The following lines are quoted from the Pseudo-Rehearsal for Catastrophic Forgetting article by Explosion AI: A crucial detail in this process is that the "revision exercises" that you're mixing into the new material *must not* be produced by the weights you're currently optimising. You should keep the model that generates the revision material static. Otherwise, the model can stabilise on trivial solutions. If you're streaming the examples, you'll need to hold two copies of the model in memory. Alternatively, you can pre-parse a batch of text, and then use the annotations to stabilise your fine-tuning.
I have a few questions about this.
What do you mean when you say that the "revision exercises" you're mixing into the new material *must not* be produced by the weights you're currently optimising?
How can I hold two copies of the model in memory, as specified?
So that article is a bit old now, and it describes a somewhat experimental approach. That said, there's now an experimental API for pseudo-rehearsal in spaCy that you can try out: the nlp.resume_training() method takes care of creating the internal model copies, and then you can use nlp.rehearse, passing in batches of raw text.
Again, these APIs are experimental, so your mileage may vary with them. I don't have firm recommendations for how many texts to use for rehearsal, or how large to make the batch sizes, for instance.
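In case it helps, here's a minimal sketch of how the two calls might fit together, assuming a spaCy v3 install with the en_core_web_sm pipeline. The annotated example and the revision texts are made-up placeholders, and the exact rehearse signature has changed between versions, so check the docs for your install:

```python
import random
import spacy
from spacy.training import Example

# Load the pretrained pipeline whose existing knowledge we want to preserve.
nlp = spacy.load("en_core_web_sm")

# Placeholder data: one new annotated example plus some raw "revision" texts.
# In practice you'd use your own annotations and texts from the original domain.
train_data = [
    ("Uber blew through $1 million a week", {"entities": [(0, 4, "ORG")]}),
]
revision_texts = [
    "Raw text drawn from the kind of data the model was originally trained on.",
    "The static internal copy of the model produces the targets for these texts.",
]

# Only update the NER component; leave the rest of the pipeline untouched.
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.select_pipes(disable=other_pipes):
    # resume_training() sets up the internal frozen copy used for rehearsal
    # and returns an optimizer to pass to both update() and rehearse().
    optimizer = nlp.resume_training()
    for epoch in range(10):
        random.shuffle(train_data)
        losses = {}
        # Normal update on the new annotated examples.
        for text, annotations in train_data:
            example = Example.from_dict(nlp.make_doc(text), annotations)
            nlp.update([example], sgd=optimizer, losses=losses)
        # Rehearsal update on raw text: no gold annotations are needed,
        # because the frozen copy predicts the targets internally.
        rehearsal_batch = [
            Example.from_dict(nlp.make_doc(text), {}) for text in revision_texts
        ]
        nlp.rehearse(rehearsal_batch, sgd=optimizer, losses=losses)
        print(epoch, losses)
```

The frozen copy that resume_training() creates is the "second model in memory" the article talks about, so you don't have to manage it yourself; you just need to supply raw texts from the original domain for the rehearsal batches.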
I also have a doubt about shuffling the data internally to get better accuracy. How far is that true? And can that help curb the catastrophic forgetting problem?