How to use ner.correct --update

Hello there!
I am new to Prodigy and, first of all, I wanted to thank you for your great tool and awesome support!

I want to perform named entity recognition with some new labels, and I am trying to follow your annotation flowchart (it would be really helpful if it were updated for the newest Prodigy/spaCy versions). Pretrained models are not really helpful as far as I can tell, and according to your flowchart I should train a new model from scratch.

I have unlabeled text data, so I started by making some patterns and using ner.manual for some initial annotations. Then I trained a blank model on that dataset (I also tried the 3 variations of models provided by spaCy as a starting point, and they indeed had worse initial results than the blank model).
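
For reference, a patterns file of the kind mentioned above is a JSONL file with one pattern per line, where each entry has a "label" and a "pattern" (either a list of token attributes or an exact string). The sketch below only builds that text; the labels and terms (GADGET, BRAND, "iphone", and so on) are made-up placeholders, not taken from my actual data.

```python
import json

# Hypothetical labels and terms, purely for illustration.
patterns = [
    # Token-based pattern: one dict per token, matched case-insensitively.
    {"label": "GADGET", "pattern": [{"lower": "iphone"}]},
    {"label": "GADGET", "pattern": [{"lower": "apple"}, {"lower": "watch"}]},
    # String pattern: matched as an exact phrase.
    {"label": "BRAND", "pattern": "Samsung"},
]


def to_jsonl(records):
    """Serialize a list of dicts as JSONL (one JSON object per line)."""
    return "\n".join(json.dumps(record) for record in records) + "\n"


patterns_jsonl = to_jsonl(patterns)
print(patterns_jsonl, end="")
```

Saving that text to a file (for example patterns.jsonl, a name I picked here) gives something that can be passed to ner.manual through its --patterns option.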

First, I wanted to confirm that my approach so far is correct.
Secondly, I need to add more labeled data and train further in order to improve my model. This led to some questions on how to proceed.

Should I use ner.manual with patterns to just add more data and retrain again? In that case, I assume that I should continue training the model exported from my initial training instead of the blank model.
Or should I use ner.correct --update with my initial model? In that case, does the model need to be retrained, or is it trained at the same time as the annotations are added to the database?

Thanks.

Hi @pkras!

Your approach sounds reasonable. If it's a new label scheme, definitely start off with some manual annotation and train your model on the new data as you intend to do. :+1:

Now, ner.correct comes in handy if you already have a model that predicts something, and you just need to correct its predictions.

  • You can use the --update parameter so that the model in the loop is updated with the annotations you collect. Its predictions then adjust as you annotate, which makes the annotation process easier.
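
As a rough sketch of what that invocation could look like, the snippet below assembles a ner.correct command line with the --update flag. The dataset name, model path, source file and labels are hypothetical placeholders; the snippet only builds and prints the command string, so running the result still requires Prodigy to be installed.

```python
import shlex


def ner_correct_command(dataset, model, source, labels, update=True):
    """Build a `prodigy ner.correct` command line as a shell-safe string."""
    cmd = ["prodigy", "ner.correct", dataset, model, source,
           "--label", ",".join(labels)]
    if update:
        # Update the model in the loop with the collected annotations.
        cmd.append("--update")
    return shlex.join(cmd)


# All names and paths below are assumptions made for illustration.
command = ner_correct_command(
    dataset="ner_corrected",
    model="./output/model-best",
    source="./more_texts.jsonl",
    labels=["GADGET", "BRAND"],
)
print(command)
```

Saving the corrections to their own dataset, separate from the ner.manual one, keeps it easy to see where each annotation came from.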

However, at the end you should always train a new model with all of the annotations (both from ner.manual and ner.correct). Proper training can make multiple passes over your data to get better results, which the model in the loop cannot do (it just helps you with annotation).

Hi @ljvmiranda921,

Thanks for the reply!
I think I now get how ner.correct --update works.
So I should train a new model at the end, but should it be a new blank model (with all the data), or should I retrain the model I saved from my previous training (again with all the data)?
Also, does making multiple passes mean retraining my model on the same data again?

Thanks again for your help!

Glad to be of help!

> So I should train a new model at the end, but a new blank model (with all the data) or retrain the model I have saved on my previous training (again with all the data)?

It's the former. It makes more sense to train a blank model with all your data than to train incrementally. Ideally, the results of the two would be similar, but the incremental approach can introduce forgetting effects, where the model loses some of what it learned earlier.
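
To make the two options concrete, here is a small sketch that assembles both variants of the training command, assuming Prodigy v1.11 or later, where `prodigy train` takes an output directory and a comma-separated list of datasets via --ner. The dataset names and paths are hypothetical, and the snippet only prints the commands rather than running them.

```python
import shlex


def train_command(output_dir, datasets, base_model=None):
    """Build a `prodigy train` command line as a shell-safe string."""
    cmd = ["prodigy", "train", output_dir, "--ner", ",".join(datasets)]
    if base_model is not None:
        # Start from an existing pipeline instead of a blank one.
        cmd += ["--base-model", base_model]
    return shlex.join(cmd)


# Hypothetical dataset names: one from ner.manual, one from ner.correct.
all_data = ["ner_manual", "ner_corrected"]

# Recommended here: a fresh model trained on everything.
from_scratch = train_command("./output_v2", all_data)

# Incremental alternative: continue from the previously trained model.
incremental = train_command(
    "./output_v2_incremental", all_data, base_model="./output/model-best"
)

print(from_scratch)
print(incremental)
```

Training both variants and comparing their evaluation scores is one way to check whether the incremental route actually hurts on your data.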

> multiple passes means retraining my model on the same data again?

Oh, it just means that internally during training you can use the usual ML tricks, such as dropout, training for multiple epochs, and learning rate schedulers, to improve your results. :smile:
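
If it helps to picture what "multiple epochs" means, here is a toy sketch in plain Python. It is not spaCy's training loop: it fits a single weight with stochastic gradient descent, revisits the same three examples every epoch, and shrinks the learning rate over time with a simple exponential schedule. The data, the function name and all hyperparameters are made up for illustration, and dropout is left out to keep it short.

```python
import random


def train(data, epochs=20, base_lr=0.1, decay=0.9, seed=0):
    """Fit y = w * x with SGD, passing over the same data once per epoch."""
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for epoch in range(epochs):
        # Simple exponential learning-rate schedule: smaller steps later on.
        lr = base_lr * (decay ** epoch)
        # Shuffle so each pass sees the examples in a different order.
        rng.shuffle(data)
        for x, y in data:
            grad = 2 * (w * x - y) * x  # gradient of the squared error
            w -= lr * grad
    return w


# Toy data generated by y = 2 * x, so the ideal weight is 2.0.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

one_pass = train(data, epochs=1)
many_passes = train(data, epochs=20)
print(f"after 1 epoch:   w = {one_pass:.4f}")
print(f"after 20 epochs: w = {many_passes:.4f}")
```

A single pass leaves the weight noticeably off, while repeated passes over the same examples bring it very close to the target. The model in the loop only sees each annotation as it comes in, which is why a proper training run afterwards is worth it.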

Ok, got it!

Thank you for your assistance!