Improve custom NER model performance for different input texts


I trained a custom NER model using spaCy 3.6.0 a while ago. It recognizes two labels (HARDSKILL, SOFTSKILL) and was trained on 15K manually labeled job posting texts (using Prodigy Local). It performs acceptably when the input text is a job posting, but its quality drops when the input is something else (e.g., a curriculum or a syllabus). I know I need to do some further training to improve performance, but I have the following questions:

  • Should I train one isolated model per type of input text (i.e., one more custom NER model for syllabi and another one for curricula), or can I "resume" the training of my current model by gathering samples of the input texts where it performs poorly (i.e., retrain my current NER model with additional syllabus and curriculum texts)?
    • If I need to train "one specialized model per type of input text", how do I "chain" the predictions (i.e., how do I combine the entities returned by model 1 + model 2 + model 3 without ending up with overlapping spans)? I was thinking of something like this, but I do not know if it would be the right approach.
    • If, on the contrary, I can "retrain my current custom NER model with more texts of the poorly performing types", is there a command I could use for this retraining? Any additional recommendations or reading material? BTW, I will probably label the new texts using Prodigy.
    • To avoid confusion, please address only the option you recommend as the answer.

Thanks and BR,


Hi @dave-espinosa,

Whether training separate NER models per data type will be more effective than training one model depends on how different the data types are and how much data you have available for each type. Honestly, I think it's hard to say upfront, and you will get the best answer through experimentation.

For option 1), i.e. one NER model per data type, I think you'd need a custom spaCy pipeline for each (along the lines of the example from the spaCy board) and then another component that implements some logic for choosing the final prediction, probably picking the prediction from the model with the highest confidence.
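As a rough illustration of that "choose the final prediction" step, here is a minimal sketch in plain Python. It assumes each model can hand you a confidence score per entity, which as far as I know spaCy's `ner` component does not expose out of the box, so the `score` field, the `ScoredSpan` type, and the model names are all hypothetical. The logic is a simple greedy selection, not a spaCy or Prodigy API.

```python
from typing import List, NamedTuple


class ScoredSpan(NamedTuple):
    start: int    # start character offset (inclusive)
    end: int      # end character offset (exclusive)
    label: str
    score: float  # hypothetical confidence supplied by the model
    source: str   # hypothetical name of the model that produced the span


def merge_predictions(spans: List[ScoredSpan]) -> List[ScoredSpan]:
    """Greedily keep the highest-scoring spans that do not overlap."""
    # Highest score first; ties go to the longer span.
    ranked = sorted(spans, key=lambda s: (-s.score, -(s.end - s.start)))
    kept: List[ScoredSpan] = []
    for cand in ranked:
        # Keep the candidate only if it overlaps none of the kept spans.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    # Return the survivors in document order.
    return sorted(kept, key=lambda s: s.start)


# Offsets refer to the text "Python programming and teamwork".
predictions = [
    ScoredSpan(0, 6, "HARDSKILL", 0.90, "job_posting_model"),
    ScoredSpan(0, 18, "HARDSKILL", 0.60, "syllabus_model"),
    ScoredSpan(23, 31, "SOFTSKILL", 0.80, "curriculum_model"),
]
merged = merge_predictions(predictions)
for span in merged:
    print(span)
```

If you have no usable scores, `spacy.util.filter_spans` is an existing alternative: it resolves overlaps by preferring the longest span.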

For option 2), I think the best strategy would be to add new data to the dataset and annotate it with Prodigy's teach recipe (ner.teach), which will serve the examples that the model is most unsure of first.
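To make the "most unsure first" idea concrete, here is a toy sketch of uncertainty-based ordering. It is not Prodigy's implementation (Prodigy works on a stream and its sorting is more involved); the `score` values and the `most_uncertain_first` helper are made up for illustration.

```python
from typing import Dict, List


def most_uncertain_first(examples: List[Dict]) -> List[Dict]:
    """Order examples so that scores closest to 0.5 come first."""
    return sorted(examples, key=lambda eg: abs(eg["score"] - 0.5))


# Hypothetical model scores for three candidate examples.
candidates = [
    {"text": "Course syllabus: linear algebra", "score": 0.95},
    {"text": "Led weekly team meetings", "score": 0.52},
    {"text": "Module 3 covers negotiation", "score": 0.30},
]
queue = most_uncertain_first(candidates)
for eg in queue:
    print(eg["score"], eg["text"])
```

The point is that confident predictions (scores near 0 or 1) go to the back of the queue, so your annotation time is spent where the model is least sure.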

When you add samples of the new data types (syllabi and curricula), it's probably best to add a data type identifier to the meta of each example so that you can easily split the data by type for your experiments.
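A minimal sketch of what that could look like, assuming JSONL input where each example has a `"text"` and a `"meta"` dict. The key name `doc_type` and the sample texts are placeholders I made up; use whatever identifier suits your data.

```python
import json
from collections import defaultdict

# Hypothetical examples; "doc_type" is an assumed key name.
examples = [
    {"text": "We are hiring a Python developer.", "meta": {"doc_type": "job_posting"}},
    {"text": "Week 1: introduction to statistics.", "meta": {"doc_type": "syllabus"}},
    {"text": "2019-2021: team lead at a logistics firm.", "meta": {"doc_type": "curriculum"}},
    {"text": "Week 2: probability.", "meta": {"doc_type": "syllabus"}},
]

# Serialize as JSONL: one JSON object per line.
jsonl = "\n".join(json.dumps(eg) for eg in examples)

# Later, group the examples by data type, e.g. to evaluate each type separately.
by_type = defaultdict(list)
for line in jsonl.splitlines():
    eg = json.loads(line)
    by_type[eg["meta"]["doc_type"]].append(eg)

counts = {doc_type: len(egs) for doc_type, egs in by_type.items()}
print(counts)
```

With the identifier in place, you can compare "one model trained on everything" against "one model per type" on the same per-type evaluation splits.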