Fixing NER Spans

Hi,

I can see that there is a recipe for giving a binary yes/no answer on whether a named entity/span is correct, and there is a recipe for manually marking the named entities. Is there a way to combine both recipes? We would like to be able to say yes/no, this entity is correct or not, but if something like “Washington Smith high school” gets marked as just “Washington Smith” during named entity recognition, we would like to be able to fix the entity instead of marking it as incorrect.

Yes – if I understand your question correctly, the recipe you’re looking for is ner.make-gold. See here or the respective section in your PRODIGY_README.html for more details.

The ner.make-gold recipe uses the model to show you the predicted entities for the selected label(s) and makes them editable, so you can manually correct or remove them.
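A typical invocation looks something like the sketch below – the dataset name, source file and label set are just placeholders for your own data:

```bash
# Stream in raw text, show the model's predicted PERSON/ORG entities
# and let the annotator correct, add or remove them manually.
# "ner_gold" and news_headlines.jsonl are placeholder names.
prodigy ner.make-gold ner_gold en_core_web_sm news_headlines.jsonl --label PERSON,ORG
```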

Excellent, I thought the make-gold only had the ability to manually mark entities, but I do see now that it also suggests them, allowing users to accept/deny. Thanks!

My ner.make-gold mostly does not suggest entities, so I always need to mark the entity and click accept.

  • In cases with no entity, I would not mark anything and click accept (the empty box). Is that right, or do I have to click reject in this case?
  • Is it also useful to mark some non-entities (which have frequently been suggested by the ner.teach recipe) and click reject?

Yes, that's correct and very important, actually! The fact that a sentence is "correct" and includes no entities is just as important for the model to learn from.

This depends on how you're using the data to train your model later on. If you use ner.gold-to-spacy or a similar approach to convert the annotations and then train your model assuming that the annotations are complete, adding wrong examples manually won't make a difference. If you accept an example, it's then clear that it's gold standard, and that entities that are not labelled in the example must be wrong.
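For the conversion step, the command looks roughly like this – the dataset name and output path are placeholders, and the exact arguments may differ slightly depending on your Prodigy version:

```bash
# Convert the gold-standard annotations in the "ner_gold" dataset into
# spaCy's training format and write them to a placeholder output file.
prodigy ner.gold-to-spacy ner_gold /tmp/ner_train_data.jsonl
```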

However, if you're working with sparse data and you can't assume that every annotated example is complete, adding negative examples might help – especially if there are noticeable mistakes that are easy to reproduce in your data (so you can explicitly reject them).

You can also try to pre-train a model with annotations you've already collected, and then load it back into ner.make-gold to see what it suggests, and correct those predictions manually. So for example, you start off with en_core_web_sm, annotate for a bit, update the model with your new annotations and then load the updated model for the next annotation session.
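A rough sketch of that loop, assuming the ner.batch-train recipe and placeholder dataset names, paths and labels:

```bash
# 1) Pre-train a model on the annotations collected so far.
#    "ner_gold", the output path and the labels are placeholders.
prodigy ner.batch-train ner_gold en_core_web_sm --output /tmp/ner_model --eval-split 0.2

# 2) Load the updated model back into ner.make-gold for the next
#    annotation session, so it suggests its (hopefully better) predictions.
prodigy ner.make-gold ner_gold_v2 /tmp/ner_model news_headlines.jsonl --label PERSON,ORG
```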
