I am currently writing my master's thesis on Named Entity Recognition, and I am using Prodigy to see if I can improve the results obtained from spaCy. My dataset contains 50 reports from different financial institutions from the last 6 years. Each document has around 580 pages on average. I have annotated 5 of the reports in the dataset using the ner.teach recipe.
I have divided the annotations into different datasets so I can observe the effect of the annotations in each experiment. In the first experiment I used at most 100 annotations per label, in the second experiment 200 annotations per label, and so on until the antepenultimate experiment. In the penultimate experiment I trained a model with word vectors (en_core_web_lg). In the last one I trained a model with word vectors and pretrained tok2vec weights.
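In case it's useful, this is roughly how the per-label cap can be built from a Prodigy dataset using the database API (the dataset names and the cap value below are just placeholders, not my exact script):

```python
from collections import defaultdict
from prodigy.components.db import connect

db = connect()  # uses the database settings from prodigy.json
examples = db.get_dataset("ner_annotations_full")  # placeholder dataset name

cap = 100  # 200, 300, ... for the later experiments
counts = defaultdict(int)
subset = []
for eg in examples:
    labels = {span["label"] for span in eg.get("spans", [])}
    # keep the example only if none of its labels has reached the cap yet
    if labels and all(counts[label] < cap for label in labels):
        subset.append(eg)
        for label in labels:
            counts[label] += 1

db.add_dataset("ner_annotations_100")  # placeholder name for the capped set
db.add_examples(subset, datasets=["ner_annotations_100"])
```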
To test the models I trained, I selected a small portion of one of the reports, which contains around 100 entities. First, I ran the spaCy model without using Prodigy (without any annotations) and got 61 entities back. The results of the models trained with the annotations are worse than those of the spaCy model without any annotations: the precision gets better, but the recall gets worse, which means I lose too many entities. I would like some help to improve the results, or a suggestion about something I might be doing wrong.
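For reference, a minimal sketch of the comparison I describe above, assuming the trained pipelines were saved to disk (the paths and the test file are placeholders):

```python
import spacy
from collections import Counter

test_text = open("test_snippet.txt").read()  # the ~100-entity excerpt (placeholder path)

base_nlp = spacy.load("en_core_web_sm")        # unmodified spaCy model
trained_nlp = spacy.load("./models/exp1-100")  # model trained with Prodigy (placeholder path)

for name, nlp in [("base", base_nlp), ("trained", trained_nlp)]:
    doc = nlp(test_text)
    # how many entities each model finds, broken down by label
    print(name, len(doc.ents), Counter(ent.label_ for ent in doc.ents))
```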
Hi! Can you share some more details on how exactly you trained your model and what's in the annotations? Are you updating the model with examples of all entity types or only selected ones? Are you adding new entity types? Is it possible that the model is overfitting on the new examples and forgetting previously correct predictions?
The models were trained using Prodigy's train recipe, and the base model was either en_core_web_sm or en_vectors_web_lg. The annotations are either manual (ner.manual) or binary (ner.teach). I have annotations for most entity types, but not all of them, and I added one new entity type.
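For context, the annotation commands were along these lines (the dataset names and label set are placeholders):

```
prodigy ner.manual ner_manual_fin en_core_web_sm ./reports.jsonl --label ORG,MONEY,REPORT_TYPE
prodigy ner.teach ner_teach_fin en_core_web_sm ./reports.jsonl --label ORG,MONEY,REPORT_TYPE
```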
I do not understand how I am losing entities that were correct in the base model.
Also, I would like to know if the training is cumulative, that is, whether training improves the base model.
Which version of Prodigy are you using, and are you training from both manual and binary annotations combined? The latest version using spaCy v3 (Prodigy v1.11+) should support mixing manual and binary data and treat the examples accordingly – for instance, if you're updating from manual annotations, you want to consider all unannotated tokens as "not an entity", but if you're updating from binary annotations, you want to consider all unannotated tokens as unknown.
So one possible explanation for what you're seeing in the results is that your model was updated from binary annotations with all unannotated tokens interpreted as "outside an entity". The model would then learn that each of those examples contains only the single annotated entity and nothing else, which pushes it towards predicting fewer entities overall, and that would explain why your recall drops.
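To make the difference concrete, here's a small spaCy v3 sketch (independent of Prodigy) of the two interpretations: "O" means "definitely not an entity", while "-" means "unknown/missing":

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
doc = nlp.make_doc("Barclays reported record profits in London")

# Manual-style annotation: only entity offsets are given, so every
# unannotated token is interpreted as "O" (not an entity).
manual = Example.from_dict(doc, {"entities": [(0, 8, "ORG")]})
print(manual.get_aligned_ner())

# Binary-style annotation: only the accepted span is known; "-" marks
# the remaining tokens as missing, so they don't count as "not an entity".
binary = Example.from_dict(doc, {"ner": ["U-ORG", "-", "-", "-", "-", "-"]})
print(binary.get_aligned_ner())
```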
If you're using Prodigy v1.10 or lower, you'd have to run two separate training experiments: one with --binary and one without (using the manual examples). If you're using v1.11, could you share the training command you ran?
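For reference, the two setups would look roughly like this (dataset and output names are placeholders):

```
# Prodigy v1.10 and below: train from each type of annotation separately
prodigy train ner manual_dataset en_core_web_sm --output ./model-manual
prodigy train ner binary_dataset en_core_web_sm --output ./model-binary --binary

# Prodigy v1.11+: manual and binary sets can be combined in one run
prodigy train ./model-combined --ner manual_dataset,binary_dataset --base-model en_core_web_sm
```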
Also, how did you perform the evaluation? Did you use the figures returned by the train command, or did you run a separate evaluation? Did you use a dedicated evaluation set?
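If you don't have a dedicated evaluation set yet, it can be as simple as a held-out JSONL file with gold entities that every model gets scored against. A rough sketch with spaCy v3, assuming the file stores a "text" plus Prodigy-style "spans" per line:

```python
import json
import spacy
from spacy.training import Example

nlp = spacy.load("./models/exp1-100")  # placeholder path to a trained pipeline

examples = []
with open("eval.jsonl") as f:  # hypothetical held-out file, never used for training
    for line in f:
        eg = json.loads(line)
        entities = [(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])]
        doc = nlp.make_doc(eg["text"])
        examples.append(Example.from_dict(doc, {"entities": entities}))

scores = nlp.evaluate(examples)
print(scores["ents_p"], scores["ents_r"], scores["ents_f"])
print(scores["ents_per_type"])
```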
This is a pretty common effect you see in machine learning and NLP more generally if your model is overfitting on new examples. It's also sometimes referred to as "catastrophic forgetting": if you're updating your model with new examples without "reminding it" of what it previously predicted correctly, it can end up "forgetting" what it previously predicted. To prevent this, you usually want to train with a mix of examples and labels, including examples of what the model previously predicted.
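One simple way to do that is to run the original model over some raw text and keep its predictions as additional "silver" examples that you mix in with your new annotations. A rough sketch (the texts are placeholders):

```python
import spacy
from spacy.training import Example

base_nlp = spacy.load("en_core_web_sm")

# Placeholder raw texts, e.g. paragraphs sampled from the reports
raw_texts = ["Barclays reported record profits in London.", "..."]

revision_examples = []
for doc in base_nlp.pipe(raw_texts):
    entities = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
    # keep the base model's own predictions as "silver" annotations
    revision_examples.append(
        Example.from_dict(base_nlp.make_doc(doc.text), {"entities": entities})
    )

# Mix revision_examples with your new gold examples when updating the model,
# so it keeps seeing what it previously got right.
```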
I'm not 100% sure I understand this question but if you're training from a base model, the existing weights will be updated. If you train from scratch, the model weights will be randomly initialized and updated from the examples. Training from a base model will always change its weights, ideally in a way that makes it better. But depending on the data and training setup, it's of course also possible to update the weights in a way that makes them perform worse on your evaluation data.
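In spaCy v3 terms, the two starting points look like this (just to illustrate the difference):

```python
import spacy

# Updating a base model: start from the pretrained weights of an existing
# pipeline and change them with your examples.
nlp_from_base = spacy.load("en_core_web_sm")

# Training from scratch: start from a blank pipeline whose NER weights are
# randomly initialized and learned entirely from your examples.
nlp_from_scratch = spacy.blank("en")
nlp_from_scratch.add_pipe("ner")
```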