Fixing NER Spans

Hi,

I can see that there is a recipe for giving a binary yes/no answer on whether a named entity/span is correct, and there is a recipe for manually marking the named entities. Is there a way to combine both recipes? We would like to be able to say yes/no, this entity is correct or not, but if something like “Washington Smith high school” gets marked as just “Washington Smith” during named entity recognition, we would like to be able to fix the entity instead of marking it as incorrect.

Yes – if I understand your question correctly, the recipe you’re looking for is ner.make-gold. See here or the respective section in your PRODIGY_README.html for more details.

The ner.make-gold recipe uses the model to show you the predicted entities for the selected label(s) and makes them editable, so you can manually correct or remove them.
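A typical invocation looks something like the sketch below – the dataset name, source file and label set are just placeholders for your own data:

```bash
# Stream in raw text, show the model's predicted PERSON/ORG entities
# and let the annotator correct, add or remove them manually.
# "ner_gold" and news_headlines.jsonl are placeholder names.
prodigy ner.make-gold ner_gold en_core_web_sm news_headlines.jsonl --label PERSON,ORG
```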

Excellent, I thought the make-gold only had the ability to manually mark entities, but I do see now that it also suggests them, allowing users to accept/deny. Thanks!

My ner.make-gold mostly does not suggest entities, so I always need to mark the entity and click accept.

  • In cases with no entity, I would not mark anything and click accept (the empty box). Is that right, or do I have to click reject in this case?
  • Is it also useful to mark some non-entities (which have frequently been suggested by the ner.teach recipe) and click reject?

Yes, that's correct and very important, actually! The fact that a sentence is "correct" and includes no entities is just as important for the model to learn from.

This depends on how you're using the data to train your model later on. If you use ner.gold-to-spacy or a similar approach to convert the annotations and then train your model assuming that the annotations are complete, adding wrong examples manually won't make a difference. If you accept an example, it's then clear that it's gold standard, and that entities that are not labelled in the example must be wrong.
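For the conversion step, the command looks roughly like this – the dataset name and output path are placeholders, and the exact arguments may differ slightly depending on your Prodigy version:

```bash
# Convert the gold-standard annotations in the "ner_gold" dataset into
# spaCy's training format and write them to a placeholder output file.
prodigy ner.gold-to-spacy ner_gold /tmp/ner_train_data.jsonl
```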

However, if you're working with sparse data and you can't assume that every annotated example is complete, adding negative examples might help – especially if there are noticeable mistakes that are easy to reproduce in your data (so you can explicitly reject them).

You can also try to pre-train a model with annotations you've already collected, and then load it back into ner.make-gold to see what it suggests, and correct those predictions manually. So for example, you start off with en_core_web_sm, annotate for a bit, update the model with your new annotations and then load the updated model for the next annotation session.
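A rough sketch of that loop, assuming the ner.batch-train recipe and placeholder dataset names, paths and labels:

```bash
# 1) Pre-train a model on the annotations collected so far.
#    "ner_gold", the output path and the labels are placeholders.
prodigy ner.batch-train ner_gold en_core_web_sm --output /tmp/ner_model --eval-split 0.2

# 2) Load the updated model back into ner.make-gold for the next
#    annotation session, so it suggests its (hopefully better) predictions.
prodigy ner.make-gold ner_gold_v2 /tmp/ner_model news_headlines.jsonl --label PERSON,ORG
```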
