nlp.rehearse does not work

apparat · October 4, 2023, 6:57am

I am trying to do dynamic model training: the model has to be retrained whenever, for example, employees add new data to it. Basically, my app extracts data from documents, and if something is wrong, people check what went wrong and the model is updated.

I have 3 solutions in mind:

1. Keep one JSON file or database of examples, always add new examples to it, and retrain the whole model from scratch every time (I guess this is the worst solution).
2. Create a new NER pipeline every time I train the model (but I think the model will run slower, because it is possible to end up with 100-200 new pipelines).
3. The best solution I have found: use pseudo-rehearsal.

Here is my code that retrains the model:

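import random

from spacy.training import Example

# `model` is assumed to be the loaded spaCy pipeline and `data` a list of dicts
# with "text" and "annotations" (each annotation has "start", "end", "label").
optimizer = model.resume_training()

for itn in range(1000):
    random.shuffle(data)
    losses = {}
    for item in data:
        doc = model.make_doc(item['text'])
        ents = []
        for annotation in item['annotations']:
            start = annotation.get('start')
            end = annotation.get('end')
            label = annotation.get('label')
            if start is None or end is None or label is None:
                continue
            # char_span returns None when the offsets do not align with token boundaries
            if doc.char_span(start, end, label=label) is not None:
                ents.append((start, end, label))
        # Example.from_dict accepts entities as (start_char, end_char, label) tuples
        example = Example.from_dict(doc, {"entities": ents})
        model.rehearse([example], sgd=optimizer, losses=losses)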

When I use model.rehearse, the model does not update at all, although the code runs without errors.

When I use model.update instead, everything works, but then I run into the problem called "catastrophic forgetting".

Am I doing something wrong, or can this feature not do what I need? Thank you!

ryanwesslen · October 4, 2023, 12:16pm

hi @apparat!

Thanks for your question and welcome to the Prodigy community!

Could you post your message on spaCy's GitHub discussions forum?

Your question is specific to spaCy (nlp.rehearse), so you are best off posting there. That's where the spaCy core team answers questions, and you'll get a much faster response. This forum is for Prodigy-specific questions.

Thanks for your understanding!
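
For anyone landing here with the same symptom, a possible explanation (hedged, based on how the spaCy docs describe the experimental rehearse API): a rehearsal update only nudges the current model toward the predictions of the initial model, so calling it on its own right after resume_training may leave the weights essentially unchanged. Pseudo-rehearsal is usually described as interleaving normal updates on the new annotations with rehearsal updates on raw, unannotated text. The sketch below shows only that interleaving in plain Python; `train_with_rehearsal`, `update_fn` and `rehearse_fn` are hypothetical names, not spaCy functions.

```python
import random
from typing import Any, Callable, Dict, Iterator, List, Sequence

Batch = List[Any]


def minibatches(items: Sequence[Any], size: int) -> Iterator[Batch]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def train_with_rehearsal(
    new_examples: Sequence[Any],
    raw_texts: Sequence[str],
    update_fn: Callable[[Batch], None],
    rehearse_fn: Callable[[Batch], None],
    epochs: int = 10,
    batch_size: int = 8,
    seed: int = 0,
) -> Dict[str, int]:
    """Alternate supervised updates with rehearsal updates.

    update_fn   -- hypothetical callback that learns from annotated examples
    rehearse_fn -- hypothetical callback that rehearses on unannotated texts
    Returns how many times each callback was invoked.
    """
    rng = random.Random(seed)
    examples = list(new_examples)
    texts = list(raw_texts)
    counts = {"update": 0, "rehearse": 0}
    for _ in range(epochs):
        rng.shuffle(examples)
        for batch in minibatches(examples, batch_size):
            # Step 1: learn the new annotations.
            update_fn(batch)
            counts["update"] += 1
            # Step 2: rehearse on a random sample of raw text, if any is available.
            if texts:
                sample_size = min(batch_size, len(texts))
                rehearse_fn(rng.sample(texts, sample_size))
                counts["rehearse"] += 1
    return counts
```

In a spaCy pipeline, `update_fn` would presumably wrap nlp.update(batch, sgd=optimizer, losses=losses) and `rehearse_fn` would wrap nlp.rehearse on Example objects built from unannotated text, with the optimizer coming from nlp.resume_training() as in the original post. Whether this behaves well for a given model is something to confirm on the spaCy discussions board.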
