I started training a blank spaCy NER model: I did some pattern bootstrapping and then annotated with `ner.teach`.
After that, I batch-trained my model and got a very low accuracy of 31%. When I tested the output model against some generic text, I realised that it tags almost all the text (single tokens or two-word spans) with the label.
What seems to be the problem here? Thanks!
If you're starting from scratch, the `ner.teach` recipe isn't always the most efficient, as it takes some time for the model to learn. You might be better off using the `--no-missing` flag for the initial training. This tells the model that the data contains no entities that weren't included in your annotations. By default, the `ner.teach` recipe doesn't let the model assume that, because you're only saying yes or no to specific suggestions.
I think what's happening is that you haven't rejected many of the suggestions, since they're all from your patterns file. This doesn't give the model much to learn from: it isn't learning what *isn't* an entity. If you keep annotating, you should get more suggestions from the model itself, which will let you tell it what isn't an entity. But it may still take longer to learn this way than with a slightly different approach: doing manual annotation and training with the `--no-missing` flag.
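To make that workflow concrete, a command sequence could look roughly like this. The dataset, label, and file names are placeholders, and the exact flags depend on your Prodigy version (this sketch assumes a v1.x setup where `ner.manual` accepts `--patterns` and `ner.batch-train` supports `--no-missing`):

```shell
# Annotate manually, using your patterns file to pre-highlight candidate spans
prodigy ner.manual my_dataset blank:en raw_texts.jsonl --label MY_LABEL --patterns patterns.jsonl

# Batch-train with --no-missing, so every unannotated token counts as a non-entity
prodigy ner.batch-train my_dataset blank:en --output ./my_model --label MY_LABEL --no-missing
```

Because `ner.manual` produces complete, gold-standard spans for each example, training with `--no-missing` lets the model treat everything you didn't highlight as a true negative, which is exactly the signal it's missing when all your `ner.teach` answers are accepts.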