Improving spaCy's existing NER entities

Hi Matt, Ines,

I've just started using Prodigy, and it seems to be working well so far.

I'm looking at analysing English translations of Arabic texts and need to improve NER detection on many Arabic terms. For example, 'Islam' is often categorised as an 'ORG' when it should be 'NORP', and many of the names associated with 'PERSON' require improvement.

When there are numerous NER categories to be taught, do you recommend annotating a small number of them over numerous passes of the dataset, or all of them together?

So far, I've chosen to annotate them all together, using the following workflow:

The NER categories that require improvement are PERSON, NORP, FAC, ORG, GPE, PRODUCT, EVENT, WORK_OF_ART and LAW. The dataset contains 4,104 sentences.

The recipe I've been using is as follows:

  • prodigy ner.manual text_terms en_core_web_md full_text.jsonl --label "PERSON,NORP,FAC,ORG,GPE,PRODUCT,EVENT,WORK_OF_ART,LAW"

  • prodigy ner.teach text_terms en_core_web_md full_text.jsonl --label "PERSON,NORP,FAC,ORG,GPE,PRODUCT,EVENT,WORK_OF_ART,LAW"

  • prodigy ner.batch-train text_terms en_core_web_md --output C:/Users/Steve/.prodigy/text_term_model/ --eval-split 0.8 --label "PERSON,NORP,FAC,ORG,GPE,LOC,PRODUCT,EVENT,WORK_OF_ART,LAW"

This workflow gives the following results:
Correct 1557
Incorrect 403
Baseline 0.649
Accuracy 0.794
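(For what it's worth, the accuracy figure printed by ner.batch-train is just correct decisions over total decisions, which checks out against the numbers above:)

```shell
# Sanity check: accuracy = correct / (correct + incorrect)
awk 'BEGIN { printf "%.3f\n", 1557 / (1557 + 403) }'   # prints 0.794
```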

Thank you,

Steve

Hi Steve,

The ner.teach recipe works well to quickly improve a few categories, especially if you're not too worried about accuracy on the other categories. It's a bit of a tricky approach, though, because the model finds it hard to learn from the binary feedback. So sometimes it works well, but other times it struggles a bit, and it's hard to combine with full annotations.

If you need all of the categories, you might consider the ner.make-gold recipe. This lets you correct the model's output, so I think this might be the one you want. Remember to add the --no-missing flag to the ner.batch-train command as well, to tell the model that all of the information is there.
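As a rough sketch, reusing the paths and model from your example (I'm assuming a fresh dataset name, text_terms_gold, so the corrected annotations don't mix with the binary ner.teach ones), the workflow might look like:

```shell
# Correct the model's predictions so every entity in each example is annotated
prodigy ner.make-gold text_terms_gold en_core_web_md full_text.jsonl --label "PERSON,NORP,FAC,ORG,GPE,PRODUCT,EVENT,WORK_OF_ART,LAW"

# Train with --no-missing, so unannotated tokens are treated as "not an entity"
# rather than "unknown"
prodigy ner.batch-train text_terms_gold en_core_web_md --output C:/Users/Steve/.prodigy/text_term_model/ --no-missing --label "PERSON,NORP,FAC,ORG,GPE,PRODUCT,EVENT,WORK_OF_ART,LAW"
```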

That's really useful, thank you Matt. I'll let you know how I get on!