Catastrophic forgetting and pseudorehearsal in neural networks

The following is the text of my posting to the Connectionists email list (Sept. 1998) describing my research on catastrophic forgetting and the pseudorehearsal solution in neural networks. Recently this research has been carried out in collaboration with Marcus Frean and Simon McCallum.


The catastrophic forgetting (catastrophic interference, serial learning) problem has come up in this thread. In most neural networks most of the time, learning new information disrupts (even eliminates) old information. I want to quickly describe what we think is an interesting and general solution to this problem.

First, a comment on rehearsal. The catastrophic forgetting problem can be solved with rehearsal - relearning old items as new items are learned. A range of rehearsal regimes have been explored (see for example Murre, 1992; Robins, 1995; and the "interleaved learning" referred to earlier in this thread by Jay McClelland from McClelland, McNaughton, & O'Reilly, 1995).
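
To make the regime concrete, here is a rough sketch (my own illustration, not code from any of the papers cited) of rehearsal in a tiny backprop network: every sweep that trains the new item also retrains the stored old items. The layer sizes, learning rate, and function names are arbitrary choices for the sketch.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Tiny one-hidden-layer network: 4 inputs, 8 hidden units, 3 outputs.
    W1 = rng.normal(0.0, 0.5, (8, 4))
    W2 = rng.normal(0.0, 0.5, (3, 8))

    def forward(x):
        h = sigmoid(W1 @ x)
        return h, sigmoid(W2 @ h)

    def train_step(x, t, lr=0.5):
        """One backprop step on a single (input, target) pair, squared error."""
        global W1, W2
        h, y = forward(x)
        d_out = (y - t) * y * (1 - y)           # output deltas (sigmoid units)
        d_hid = (W2.T @ d_out) * h * (1 - h)    # hidden deltas
        W2 -= lr * np.outer(d_out, h)
        W1 -= lr * np.outer(d_hid, x)

    # Rehearsal: each sweep trains the new item together with the (still
    # available) old items, so the old mappings are continually relearned.
    def learn_with_rehearsal(new_item, old_items, sweeps=200):
        for _ in range(sweeps):
            train_step(*new_item)
            for old in old_items:
                train_step(*old)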

Rehearsal is an effective solution as long as the previously learned items are actually available for relearning. It may be, however, that the old items have been lost, or that it is not practical for some reason to store them. In any case, storing old items for rehearsal seems somewhat artificial, as it requires that they be available on demand from some source outside the network, which would seem to make the network itself redundant.

It is possible to achieve the benefits of rehearsal, however, even when there is no access to old items. This "pseudorehearsal" mechanism, introduced in Robins (1995), is based on the relearning of artificially constructed populations of "pseudoitems" instead of the actual old items.

In MLP / backprop type networks a pseudoitem is constructed by generating a new input vector at random and passing it forward through the network in the standard way. Whatever output vector this input generates becomes the associated target output. Rehearsing these pseudoitems during new learning protects the old items in the same way that rehearsing the real old items does. Why does it work?
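
Continuing the sketch above, pseudoitem construction is just this (uniform random inputs are only one possible choice; the sampling distribution and the number of pseudoitems are free parameters):

    # A pseudoitem pairs a random input with whatever the current network
    # outputs for it; that output is then fixed as the target.
    def make_pseudoitems(n, input_dim=4):
        pseudo = []
        for _ in range(n):
            x = rng.random(input_dim)       # random input vector
            _, y = forward(x)               # current network's response
            pseudo.append((x, y.copy()))    # freeze the response as the target
        return pseudo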

The essence of preventing catastrophic forgetting is to localise changes to the function instantiated by the network so that it changes only in the immediate vicinity of the new item to be learned. Rehearsal localises changes by relearning ("fixing") the original training data points. Pseudorehearsal localises change by relearning ("fixing") other points randomly chosen from the function (the pseudoitems). (Work in progress suggests that simply using a "local" learning algorithm such as an RBF is not enough).

Pseudorehearsal is the generation of approximations of old knowledge to be rehearsed as needed. The method is very effective, and has been further explored in a number of papers (Robins, 1996; Frean & Robins, 1998; Ans & Rousset, 1997; French, 1997; and as a part of work described in Silver & Mercer, 1998). Pseudorehearsal enables sequential learning (the learning of new information at any time) in a neural network.
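
Again continuing the sketch: when a new item arrives and the real old items are no longer available, pseudoitems are generated first and then interleaved with the new item exactly as real old items would be (the figure of 32 pseudoitems below is just an illustrative value).

    # Pseudorehearsal: rehearse a snapshot of the network's current function
    # (the pseudoitems) while the new item is being learned.
    def learn_with_pseudorehearsal(new_item, n_pseudo=32, sweeps=200):
        pseudo = make_pseudoitems(n_pseudo)
        for _ in range(sweeps):
            train_step(*new_item)
            for p in pseudo:
                train_step(*p)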

Extending these ideas to dynamical networks (such as Hopfield nets), we can rehearse randomly chosen attractors to preserve previously learned items / attractors during new learning (Robins & McCallum 1998). Here the distinction between rehearsal and pseudorehearsal starts to break down, as randomly chosen attractors naturally contain a mixture of both real old items / learned attractors and pseudoitems / spurious attractors.
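
Roughly, attractors can be sampled by relaxing the network from random states, as in this sketch (Hebbian storage, asynchronous +/-1 updates; it is my illustration only, and the papers cited differ in the update rule and in how the sampled attractors are then relearned):

    # Hopfield-type net: +/-1 units, symmetric weights, zero diagonal.
    N = 16
    W_hop = np.zeros((N, N))

    def hebbian_store(pattern):
        """Store one +/-1 pattern with the standard Hebbian rule."""
        global W_hop
        W_hop += np.outer(pattern, pattern) / N
        np.fill_diagonal(W_hop, 0)

    def settle(state, steps=500):
        """Run asynchronous unit updates from a starting state; for a small
        net this many updates is enough in practice to reach an attractor."""
        s = state.copy()
        for _ in range(steps):
            i = rng.integers(N)
            s[i] = 1 if W_hop[i] @ s >= 0 else -1
        return s

    # Relax from random states: some of the resulting attractors are genuine
    # stored patterns, others are spurious -- hence the blurring of rehearsal
    # and pseudorehearsal in this setting.
    def sample_attractors(n):
        return [settle(rng.choice([-1, 1], size=N)) for _ in range(n)]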

We have already linked pseudorehearsal in MLP networks to the consolidation of information during sleep (Robins, 1996). In the context of Hopfield type nets another proposed solution to catastrophic forgetting based on unlearning spurious attractors has also been linked to sleep (eg Hopfield, Feinstein & Palmer, 1983; Crick & Mitchison, 1983; Christos, 1996). We are currently exploring the relationship between this *unlearning* and our *relearning* based accounts. Details of the input patterns, architecture, and learning algorithm are all significant in determining the efficacy of the two approaches (we think our approach has advantages, but this is work in progress!).

References

Ans,B. & Rousset,S. (1997) Avoiding Catastrophic Forgetting by Coupling Two Reverberating Neural Networks. Comptes Rendus de l'Académie des Sciences, Sciences de la Vie, 320, 989 - 997.

Christos, G. (1996) Investigation of the Crick-Mitchison Reverse-Learning Dream Sleep Hypothesis in a Dynamic Setting. Neural Networks, 9, 427 - 434.

Crick,F. & Mitchison,G. (1983) The Function of Dream Sleep. Nature, 304, 111 - 114.

Frean,M.R. & Robins,A.V. (1998) Catastrophic Forgetting and "Pseudorehearsal" in Linear Networks. In Downs,T., Frean,M. & Gallagher,M. (Eds), Proceedings of the Ninth Australian Conference on Neural Networks. Brisbane: University of Queensland, 173 - 178.

French,R.M. (1997) Pseudo-recurrent Connectionist Networks: An Approach to the Sensitivity Stability Dilemma. Connection Science, 9, 353 - 380.

Hopfield,J., Feinstein,D. & Palmer,R. (1983) 'Unlearning' has a Stabilizing Effect in Collective Memories. Nature, 304, 158 - 159.

McClelland,J., McNaughton,B. & O'Reilly,R. (1995) Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 419-457.

Murre,J.M.J. (1992) Learning and Categorization in Modular Neural Networks. Hillsdale, NJ: Erlbaum.

Robins,A. (1995) Catastrophic Forgetting, Rehearsal, and Pseudorehearsal. Connection Science, 7, 123 - 146.

Robins,A. (1996) Consolidation in Neural Networks and in the Sleeping Brain. Connection Science, 8, 259 - 275.

Robins,A. & McCallum,S. (1998) Pseudorehearsal and the Catastrophic Forgetting Solution in Hopfield Type Networks. Connection Science, 10, 121 - 135.

Silver,D. & Mercer,R. (1998) The Task Rehearsal Method of Sequential Learning. Department of Computer Science, University of Western Ontario, Technical Report #517.