I am trying to do dynamic model training: I need to retrain my model every time employees add new data to it. My app extracts data from documents, and if something is extracted incorrectly, people correct it and the model is updated with the correction.
I have three possible solutions in mind:
- Keep one JSON file or database of examples, keep adding new examples to it, and retrain the whole model from scratch every time (I guess this is the worst solution).
- Create a new NER pipeline every time I train the model (but I think the app would then run slower, because there could be 100-200 new pipelines).
- The best solution I have found: use pseudo-rehearsal.
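One form of pseudo-rehearsal that only needs a normal update: before training, run the original model over a sample of older, unannotated texts and treat its predictions as extra training items alongside the human corrections, so each update also rewards the old behaviour. Below is a minimal pure-Python sketch of building that mixed training set. `build_rehearsal_mix`, `raw_texts` and `predict` are hypothetical names; `predict` is assumed to wrap a copy of the model loaded before any updates and to return `(start_char, end_char, label)` tuples. The output uses the same `{"text", "annotations"}` format as the training loop in the question.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# (start_char, end_char, label) as returned by the hypothetical `predict`
Entity = Tuple[int, int, str]


def build_rehearsal_mix(
    new_items: Sequence[Dict],
    raw_texts: Sequence[str],
    predict: Callable[[str], List[Entity]],
    ratio: float = 1.0,
    seed: int = 0,
) -> List[Dict]:
    """Return the corrected items plus pseudo-labelled 'revision' items, shuffled.

    `predict` should wrap the ORIGINAL model (a copy loaded before any
    updates), so its outputs preserve what the model already knew.
    `ratio` is the number of revision items per corrected item.
    """
    rng = random.Random(seed)
    # Never ask for more revision texts than are available
    n_revision = min(len(raw_texts), int(len(new_items) * ratio))
    sampled = rng.sample(list(raw_texts), n_revision)
    # Label the sampled texts with the original model's predictions
    revision = [
        {
            "text": text,
            "annotations": [
                {"start": s, "end": e, "label": label}
                for s, e, label in predict(text)
            ],
        }
        for text in sampled
    ]
    mixed = list(new_items) + revision
    rng.shuffle(mixed)
    return mixed
```

With spaCy, `predict` could be `lambda t: [(e.start_char, e.end_char, e.label_) for e in original(t).ents]`, where `original` is a separately loaded, never-updated copy of the pipeline; the mixed list is then trained with the usual update call.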
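Here is the code I use to retrain the model:

    import random

    from spacy.training import Example

    # `model` is the loaded spaCy pipeline; `data` is a list of
    # {"text": ..., "annotations": [{"start": ..., "end": ..., "label": ...}]}
    optimizer = model.resume_training()
    for itn in range(1000):
        random.shuffle(data)
        losses = {}
        for item in data:
            doc = model.make_doc(item["text"])
            entities = []
            for annotation in item["annotations"]:
                start = annotation.get("start")
                end = annotation.get("end")
                label = annotation.get("label")
                # Skip incomplete annotations
                if start is None or end is None or label is None:
                    continue
                # char_span returns None if the offsets don't match token boundaries
                span = doc.char_span(start, end, label=label)
                if span is not None:
                    entities.append((span.start_char, span.end_char, label))
            # Gold entities go into the reference dict; the input doc stays unannotated
            example = Example.from_dict(doc, {"entities": entities})
            model.rehearse([example], sgd=optimizer, losses=losses)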
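Human corrections can also contain incomplete or overlapping spans, and as far as I know spaCy raises an error when entity offsets overlap. A minimal pure-Python sketch of cleaning one item's `annotations` into `(start, end, label)` tuples before calling `Example.from_dict`, keeping the longest span when two overlap (similar in spirit to `spacy.util.filter_spans`, which works on `Span` objects instead of offsets). `clean_annotations` is a hypothetical helper, not a spaCy function.

```python
from typing import Dict, List, Optional, Tuple


def clean_annotations(
    annotations: List[Dict[str, Optional[object]]],
) -> List[Tuple[int, int, str]]:
    """Drop incomplete or empty annotations and resolve overlaps (longest wins)."""
    complete = []
    for a in annotations:
        start, end, label = a.get("start"), a.get("end"), a.get("label")
        if start is None or end is None or label is None or start >= end:
            continue
        complete.append((int(start), int(end), str(label)))
    # Longest spans first; ties broken by the earliest start
    complete.sort(key=lambda ent: (-(ent[1] - ent[0]), ent[0]))
    kept: List[Tuple[int, int, str]] = []
    for s, e, lab in complete:
        # Keep a span only if it does not overlap anything already kept
        if all(e <= ks or s >= ke for ks, ke, _ in kept):
            kept.append((s, e, lab))
    return sorted(kept)
```

The `char_span` check from the loop is still worth keeping, since this helper does not know where token boundaries are.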
When I use model.rehearse, training runs without errors, but the model does not update at all.
When I use model.update, everything works, but now I run into the problem called "catastrophic forgetting".
Am I doing something wrong, or can this feature not do what I need? Thank you!
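A possible explanation, assuming spaCy v3: as far as I understand the source, `rehearse` trains the current weights toward the copy of the model that `resume_training()` stores, and it reads only the input docs (`example.predicted`), not the gold entities. Directly after `resume_training()` the two copies are identical, which may be why calling `rehearse` on its own leaves the model unchanged. The pattern it seems intended for is interleaving: `update` on the corrected examples and `rehearse` on unannotated text in the same loop. Below is a pure-Python sketch of that schedule; `interleave_update_rehearse` and `minibatches` are hypothetical helpers, and `update`/`rehearse` are callables you would bind to the real methods.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def minibatches(items: Sequence[T], size: int) -> Iterator[List[T]]:
    # Plain fixed-size batching (spaCy also ships spacy.util.minibatch)
    for i in range(0, len(items), size):
        yield list(items[i : i + size])


def interleave_update_rehearse(
    new_examples: Sequence[T],
    raw_examples: Sequence[T],
    update: Callable[[List[T]], None],
    rehearse: Callable[[List[T]], None],
    batch_size: int = 8,
) -> List[Tuple[str, int]]:
    """After each `update` on corrected data, `rehearse` on a batch of raw
    data (cycling through raw batches if they run out). Returns a log of
    (step name, batch size) pairs."""
    raw_batches = list(minibatches(raw_examples, batch_size))
    log: List[Tuple[str, int]] = []
    for i, batch in enumerate(minibatches(new_examples, batch_size)):
        update(batch)
        log.append(("update", len(batch)))
        if raw_batches:
            raw_batch = raw_batches[i % len(raw_batches)]
            rehearse(raw_batch)
            log.append(("rehearse", len(raw_batch)))
    return log
```

With spaCy, the callables might be `lambda b: model.update(b, sgd=optimizer, losses=losses)` and `lambda b: model.rehearse(b, sgd=optimizer, losses=losses)`, with the raw examples built as `Example.from_dict(model.make_doc(text), {})` from older, unannotated texts.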