How do I fix an NER model that is making 'phantom' predictions?

Hi! I’ve got an NER model that I’m pretty happy with. One remaining problem is that the model will sometimes predict entities that aren’t in the gold annotations at all. So it’s not simple label confusion, e.g. predicting PERSON when I have an ORG: the model finds an entity where the gold data says there is none. Is there a way to remove this behavior from the model?

Just to clarify: do you mean the model predicts an entity type that you didn’t train it for? If so, the most likely explanation is that you trained your entities on top of an existing model, so its pretrained labels are still active. You could avoid this by starting from a vectors-only package like en_vectors_web_lg instead.
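For reference, here’s a minimal sketch of that setup, assuming spaCy v2 (the version en_vectors_web_lg is built for); the PERSON label is just a placeholder for your own label set:

```python
import spacy

# The vectors-only package ships word vectors but no pretrained
# pipeline components, so there are no existing NER weights or
# labels to leak into your model.
nlp = spacy.load("en_vectors_web_lg")

# Add a fresh, blank NER component and register your own labels.
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
ner.add_label("PERSON")  # placeholder: add your own labels here

# Initialize the new component's weights before training.
optimizer = nlp.begin_training()
```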

No, the type is fine (it’s constrained to the labels I trained). The model was trained from a blank NER pipeline. It’s just that the model predicts an entity where, according to the gold set, there is none. It’s too eager to see an entity, for want of a better explanation.

Okay, so what sort of accuracy figures are you seeing, in terms of precision, recall and F-score? If I’m understanding correctly, what you’re describing is low precision: entities the model predicts that aren’t in the gold data count as false positives, and those drag precision down.
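If you don’t have those numbers yet, here’s a rough way to get them with spaCy v2’s Scorer; the model path and gold_data are placeholders for your own model and annotations:

```python
import spacy
from spacy.gold import GoldParse
from spacy.scorer import Scorer

nlp = spacy.load("/path/to/your/model")  # placeholder path

# Placeholder annotations: swap in your own gold data.
gold_data = [
    ("Carol joined Apple.", {"entities": [(0, 5, "PERSON"), (13, 18, "ORG")]}),
]

scorer = Scorer()
for text, annots in gold_data:
    gold = GoldParse(nlp.make_doc(text), entities=annots["entities"])
    pred = nlp(text)  # run the model over the raw text
    scorer.score(pred, gold)

# ents_p is precision: phantom entities are false positives, so
# they pull this number down while leaving recall untouched.
print(scorer.scores["ents_p"], scorer.scores["ents_r"], scorer.scores["ents_f"])
```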

One explanation could be that you just need more data. You could try running Prodigy’s ner.train-curve recipe to check that. It trains the model on increasing subsets of the data. The idea is that if your model scores noticeably worse with only 80% of your current examples, it’ll probably get noticeably better if you annotate a further 20%.
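If you’re curious what that looks like under the hood, here’s a rough do-it-yourself sketch of the same idea in spaCy v2; train_data and dev_data are toy placeholders for your own (text, annotations) pairs:

```python
import random
import spacy
from spacy.util import minibatch
from spacy.gold import GoldParse
from spacy.scorer import Scorer

# Toy placeholders: swap in your own annotated examples.
train_data = [
    ("Carol joined Apple.", {"entities": [(0, 5, "PERSON"), (13, 18, "ORG")]}),
    ("Bob works at Google.", {"entities": [(0, 3, "PERSON"), (13, 19, "ORG")]}),
]
dev_data = [
    ("Alice left Amazon.", {"entities": [(0, 5, "PERSON"), (11, 17, "ORG")]}),
]

def train(examples, n_iter=10):
    # Train a fresh blank model on the given subset of examples.
    nlp = spacy.blank("en")
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner)
    for _, annots in examples:
        for _, _, label in annots["entities"]:
            ner.add_label(label)
    optimizer = nlp.begin_training()
    for _ in range(n_iter):
        random.shuffle(examples)
        for batch in minibatch(examples, size=8):
            texts, annots = zip(*batch)
            nlp.update(texts, annots, sgd=optimizer, drop=0.2)
    return nlp

def f_score(nlp, examples):
    # Entity F-score on a held-out set, via the same Scorer as above.
    scorer = Scorer()
    for text, annots in examples:
        gold = GoldParse(nlp.make_doc(text), entities=annots["entities"])
        scorer.score(nlp(text), gold)
    return scorer.scores["ents_f"]

# Train on growing slices of the data: if the score is still climbing
# between the 75% and 100% runs, more annotation should keep helping.
for frac in (0.25, 0.5, 0.75, 1.0):
    subset = train_data[: max(1, int(len(train_data) * frac))]
    nlp = train(list(subset))
    print("{:.0%} of data -> F-score {:.3f}".format(frac, f_score(nlp, dev_data)))
```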