What is meaning of "fully correct" in NER active learning instructions?


I am new to Prodigy and excited to try the active learning features for NER. I see in the docs that I should only accept answers that are "fully correct." But when I use active learning via the ner.teach recipe on a newly trained model, Prodigy seems to show me only one annotated NER span per text example. This confuses me in cases where multiple spans should be tagged.

Does "fully correct" for Prodigy's ner.teach mean that one span is fully correct, or that all spans in one text example are fully correct? For instance, say I have a country label and the text example "I traveled to Germany and Spain." Suppose my current model identifies/highlights the token Germany but not the token Spain. Does this count as "fully correct"? The token Germany is highlighted correctly, so the highlighting itself is fully correct (no false positives). But it misses Spain, and thus cannot be said to be fully correct.

I guess my question is: does "fully correct" refer to the fact that all spans in the example text are tagged correctly (no false positives or false negatives), or does "fully correct" refer to the fact that a single, displayed highlighted span is tagged correctly?

hi @AbeHandler,

Thanks for your question and welcome to the Prodigy community :wave:

For ner.teach, each decision is about one span at a time. This is why it's also called a "binary" annotation in the docs:

The binary ner.teach workflow implements uncertainty sampling with beam search: for each example, the annotation model gets a number of analyses and asks you to accept or reject the entity analyses it’s most uncertain about. Based on your decisions, the model is updated in the loop and guided towards better predictions. Even if you don’t update with the full gold-standard annotation, confirming correct analyses and eliminating “dead ends” can very quickly move the model towards the right answer, since all token-based tagging decisions influence each other. Collecting binary feedback is also very fast and allows the annotator to focus on one concept and decision at a time, reducing the potential for human error and inconsistent data.

So yes, in your example accepting would be correct, because the question is only evaluating whether "Germany" is a correct span. Very likely, the next span it suggests for that text would be "Spain".
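To illustrate, a single binary ner.teach question for that sentence might look roughly like this. This is a sketch of Prodigy's task JSON format written as a Python dict; the label name and character offsets are illustrative, not output from your model:

```python
# Sketch of one binary ner.teach question: the model proposes a single
# span and the annotator accepts or rejects just that span.
binary_task = {
    "text": "I traveled to Germany and Spain.",
    "spans": [
        # Only ONE suggested span per question, even though the text
        # contains two country mentions.
        {"start": 14, "end": 21, "label": "COUNTRY"},  # "Germany"
    ],
    # The span shown is correct, so the right answer is "accept",
    # even though "Spain" is not tagged in this question.
    "answer": "accept",
}

# The highlighted text can be recovered from the offsets:
span = binary_task["spans"][0]
print(binary_task["text"][span["start"]:span["end"]])  # -> Germany
```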

If you want annotations that are "fully correct" in the sense of the whole example, that's what the ner.correct recipe is for. Suggestions from ner.correct should only be accepted once all entities in the text are identified.
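By contrast, a fully corrected example, as you'd save it from ner.correct, carries all entity spans for the text before being accepted. Again a sketch in Prodigy's task format, with illustrative offsets and label:

```python
# Sketch of a gold-standard annotation (ner.correct): every entity in
# the text is labeled before the example is accepted.
gold_task = {
    "text": "I traveled to Germany and Spain.",
    "spans": [
        {"start": 14, "end": 21, "label": "COUNTRY"},  # "Germany"
        {"start": 26, "end": 31, "label": "COUNTRY"},  # "Spain"
    ],
    "answer": "accept",  # accepted only once both entities are tagged
}

for span in gold_task["spans"]:
    print(gold_task["text"][span["start"]:span["end"]])
```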

There are some old posts that mention the "information loss" when using binary annotations as opposed to "gold" (or manual) annotations:

Hope this helps!

Thanks! That is very helpful. I get it now!

If the team is still working on the docs for Prodigy, I would consider clarifying this a little. I found it a bit confusing at first, but now it makes sense.

Thanks @AbeHandler! Good point. We'll look into updating the docs to see if we can better communicate this.