improve custom NER model accuracy

aoleiReiz · January 27, 2021, 1:17pm

We trained a custom NER model from blank:en and got about 73% validation accuracy, but in testing the model seems to have learned some patterns incorrectly:

When the text is "135dea28e4f8 distribute frankly simon simon england", it predicts "135dea28e4f8" as an entity. After removing the last few words, so that only "135dea28e4f8 distribute " remains, it no longer marks "135dea28e4f8" as an entity.
In our annotations, whether a word is an entity depends on its semantic context: sometimes it is an entity, sometimes not.
Is there anything we can do to improve the performance?

ines · January 29, 2021, 12:59am

Hi! If you have examples of concepts and patterns that your model currently gets wrong, a great way to improve it is to include examples like this (correctly annotated) in the training data. For instance, sentences prefixed with a random ID where the number is consistently annotated as an entity (or not, depending on the behaviour you're looking for). Similarly, if you have ambiguous concepts ("apple" the company vs. "apple" the fruit), you can include more examples of those in different contexts.
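As a rough sketch of what that could look like for the ID case: the snippet below builds a handful of short and long texts around random hex IDs and annotates the ID consistently in each one, using the JSONL task format with `text` and `spans` (character offsets plus a label). The label name `ID`, the templates, and the helper `make_id_examples` are all made up for illustration, so swap in your own label scheme and real sentences from your data.

```python
import json
import random


def make_id_examples(templates, n_ids=3, label="ID", seed=0):
    """Build annotated tasks of the form {"text": ..., "spans": [...]}.

    Each template contains one "{id}" placeholder. The label "ID" is a
    hypothetical name, not one taken from the original dataset.
    """
    rng = random.Random(seed)
    examples = []
    for template in templates:
        for _ in range(n_ids):
            # Random 12-character hex string, similar to "135dea28e4f8".
            fake_id = "".join(rng.choice("0123456789abcdef") for _ in range(12))
            start = template.index("{id}")
            text = template.replace("{id}", fake_id, 1)
            span = {"start": start, "end": start + len(fake_id), "label": label}
            examples.append({"text": text, "spans": [span]})
    return examples


# Same ID pattern in short, long and mid-sentence contexts (illustrative only).
templates = [
    "{id} distribute",
    "{id} distribute frankly simon simon england",
    "distribute {id} frankly",
]
examples = make_id_examples(templates)

# One JSON object per line, ready to be written to a .jsonl file.
jsonl = "\n".join(json.dumps(eg) for eg in examples)
```

If the ID should only be an entity in some contexts, the same idea applies in reverse: include texts where it appears with an empty `spans` list so the model also sees the negative cases.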

Are you using any pretrained embeddings, like word vectors etc.? This can often give you a significant boost in accuracy because your model starts off with at least some concept of the words in your data. For a quick experiment, just try using en_core_web_lg or en_vectors_web_lg as the base model instead of just blank:en, and see how that improves your results.

(Later, you could also use data-to-spacy to export your annotations and test spacy-nightly. It's still a pre-release and you'll have to use a separate Python environment for it, but you'll be able to experiment with initialising your model with transformer embeddings, which could improve your results even further. The #1 thing to focus on IMO is still the data, though – if you know what the model is getting wrong, there's a great opportunity here to give it more examples it can learn from.)
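To find out what the model is getting wrong in the first place, one option is a small probe that feeds it every word-level prefix of a text and checks whether the first token is still predicted as an entity. This is only a sketch: `probe_prefixes`, `flips` and `toy_predict` are hypothetical helpers, and `toy_predict` is a stand-in that fakes the behaviour described above so the example runs without a trained model.

```python
def probe_prefixes(text, predict):
    """Run predict on each word-level prefix of text.

    predict is any callable returning (start_char, end_char, label) tuples.
    Returns (prefix, first_token_is_entity) pairs, shortest prefix first.
    """
    words = text.split()
    rows = []
    for n in range(1, len(words) + 1):
        prefix = " ".join(words[:n])
        spans = predict(prefix)
        first_is_entity = any(start == 0 for start, _end, _label in spans)
        rows.append((prefix, first_is_entity))
    return rows


def flips(rows):
    """Return the prefixes whose result differs from the full text's result."""
    full_flag = rows[-1][1]
    return [prefix for prefix, flag in rows if flag != full_flag]


def toy_predict(text):
    """Stand-in model: tags the first token only if the text has 4+ words.

    With a real spaCy pipeline `nlp`, this could instead be something like:
    lambda t: [(e.start_char, e.end_char, e.label_) for e in nlp(t).ents]
    """
    words = text.split()
    if len(words) >= 4:
        return [(0, len(words[0]), "ID")]
    return []


text = "135dea28e4f8 distribute frankly simon simon england"
rows = probe_prefixes(text, toy_predict)
unstable = flips(rows)  # prefixes where the prediction changes
```

The prefixes that end up in `unstable` are good candidates to annotate correctly and add to the training data.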
