Trained model does not generalize

I have built an NER model using Prodigy. I annotated the data based on dictionaries. The model can now predict the entities that are in the dictionary, but it fails in two cases:

  1. Suppose the entity is MOVIE and the model was trained with "Titanic" as a MOVIE. When I use "Titanics" / "Titanicc" instead of "Titanic", it cannot identify it as a movie entity.

  2. Suppose "Game of Thrones" is not in the dictionary but other movies are. When a user mentions "Game of Thrones", why doesn't the model understand that it could be a movie?

How can I tackle these two issues?

Hi @ta13,

I'm curious: (1) how many examples have you collected, (2) what kind of entities are you training for, and (3) what dataset are you working on (e.g. tweets, Reddit comments, etc.)? Generally, even if an NER model can generalize from context, it still needs enough representative examples.

Usually, NER models look into the context or semantics of the word to determine its entity. As a contrived example, when we look for MOVIE, it's possible that the NER model looks for instances of the word "watching" (we're watching X), "theatre" (we went to the theatre for X), etc. to determine that a particular word / token is a MOVIE. Thus, to properly train an NER model, we need to provide samples that can help surface that pattern. It's not simple matching, but of course it won't hurt to have frequent entities covered explicitly in the data.
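To make the "context" point concrete, here's a minimal sketch of what such training data could look like in spaCy's character-offset format. The example texts, titles, and the `check_offsets` helper are all hypothetical; the point is that the same MOVIE label appears in several different surrounding contexts:

```python
# Hypothetical training samples in spaCy's (text, annotations) offset format.
# The same MOVIE label shows up in varied contexts ("watching", "theatre"),
# so the model can learn from surrounding words rather than memorizing titles.
TRAIN_DATA = [
    ("We're watching Titanic tonight", {"entities": [(15, 22, "MOVIE")]}),
    ("We went to the theatre for Inception", {"entities": [(27, 36, "MOVIE")]}),
    ("Have you seen Parasite yet?", {"entities": [(14, 22, "MOVIE")]}),
]

def check_offsets(data):
    """Sanity-check that each (start, end) span extracts the intended title."""
    spans = []
    for text, ann in data:
        for start, end, label in ann["entities"]:
            spans.append((text[start:end], label))
    return spans
```

Running `check_offsets(TRAIN_DATA)` is a cheap way to catch off-by-one span errors before training, since misaligned offsets silently hurt NER quality.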

So to answer your questions:

  1. Perhaps the dataset lacks generalizable instances of "Titanic." You can try some data augmentation techniques (e.g. using skweak, nlpaug, etc.) to improve your results.
  2. Similar to the one above, your dataset should have enough representative examples of what you're looking for. If it doesn't have samples pertaining to "Game of Thrones" or "GOT", then it may not be able to detect that later on. Even if the NER model looks for the context, it pays to have these frequent entities show up in your dataset. :slight_smile:
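For point 1, here's a rough, hypothetical sketch of how you could generate misspelled variants of a dictionary entry yourself (a much simpler stand-in for what a library like nlpaug does; `typo_variants` is not a real library function):

```python
def typo_variants(title):
    """Generate simple misspellings of an entity string for augmentation:
    a trailing 's', a doubled final character, and adjacent character swaps.
    E.g. "Titanic" -> "Titanics", "Titanicc", "iTtanic", ...
    """
    variants = {title + "s", title + title[-1]}
    for i in range(len(title) - 1):
        chars = list(title)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]  # swap neighbors
        swapped = "".join(chars)
        if swapped != title:
            variants.add(swapped)
    return sorted(variants)
```

You'd then substitute these variants for the original title in copies of your annotated sentences, so the model sees noisy surface forms in the same contexts.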


Thanks for your comment. But the problem I am facing is that the model is not learning well. I have 17 entities in total, pre-annotated with entity dictionaries, but it's hard for the model to predict non-dictionary entities. What is the best way to achieve this? To train the model I am using prodigy train with spaCy's custom config, and I evaluate on 20% of the data. The accuracy is high, around 99%, maybe because both the train and test data cover the dictionary words, but when I provide a new entity example, it's tough for the model to predict. I need a suggestion here.


Hi @ta13 ,

If the model isn't generalizing, my guess is that the dataset you're training on is not representative, or the 17 entities are ambiguous / difficult to predict (some of them may "semantically overlap," making it hard to differentiate one from the other).

If the 99% accuracy is coming from the evaluation (dev) data, then perhaps what you're testing on is not entirely representative. You might want to update your evaluation data to better represent the examples you'll see at runtime.
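One quick sanity check here (a hypothetical helper, not part of Prodigy or spaCy): measure how many entity strings in your evaluation set also appear in your training set. If nearly all of them do, the 99% accuracy mostly reflects memorized dictionary entries rather than generalization:

```python
def entity_overlap(train_ents, dev_ents):
    """Fraction of distinct dev entity strings that also appear in training.
    A value near 1.0 suggests the eval set mostly re-tests memorized
    dictionary entries instead of the model's ability to generalize."""
    train_set = {e.lower() for e in train_ents}
    dev_set = {e.lower() for e in dev_ents}
    if not dev_set:
        return 0.0
    return len(dev_set & train_set) / len(dev_set)
```

If the overlap is high, consider holding out some dictionary entries entirely from training so the evaluation actually tests unseen entities.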

In summary, there are two things to check here: (1) whether your training and evaluation data are truly representative of the examples you'll see at runtime, and (2) whether your 17 labels and their corresponding annotations are correct and consistent.