No worries. For NER, the gold-standard annotations are essentially the complete and correct set of entities for the text.
When you annotate with Prodigy, you’re usually only collecting annotations for one particular part of the document at a time. This is fine, because it still gives us plenty of gradients to train on, which will likely improve the model. But it also means that you won’t necessarily cover all entities that occur in the document, or entity-related information for all tokens (e.g. whether a token is part of an entity or not).
For example, the gold-standard NER annotations for the sentence “Bill Jerome Holmes is a person and Facebook is not” would be:
('B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'U-ORG', 'O', 'O')
B = beginning of an entity, I = inside an entity, L = last token of an entity, U = entity unit (i.e. a single-token entity) and O = outside an entity.
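If you want to double-check how a sentence maps to these tags, spaCy ships a helper that converts character offsets to BILUO tags. A small sketch, assuming spaCy v3 where the helper lives in spacy.training (in v2 it was spacy.gold.biluo_tags_from_offsets):

```python
import spacy
from spacy.training import offsets_to_biluo_tags

nlp = spacy.blank("en")  # blank pipeline, we only need the tokenizer
doc = nlp("Bill Jerome Holmes is a person and Facebook is not")

# Character offsets of the two entities in the sentence above
entities = [(0, 18, "PERSON"), (35, 43, "ORG")]
print(offsets_to_biluo_tags(doc, entities))
# ['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'U-ORG', 'O', 'O']
```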
So let’s assume you’ve come across this sentence in Prodigy and accepted “Facebook” as an ORG. The state of the gold-standard annotations will then look like this:
('?', '?', '?', '?', '?', '?', '?', 'U-ORG', '?', '?')
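Just to be clear, that tuple is only how I’m picturing the state (Prodigy doesn’t literally store a list of strings like this), but conceptually, your accept pins down exactly one position and leaves the rest unknown:

```python
# Conceptual sketch only, not Prodigy's actual internal representation:
# every token starts out unknown ('?'), and accepting "Facebook" as an
# ORG fixes exactly one position.
tokens = "Bill Jerome Holmes is a person and Facebook is not".split()
state = ["?"] * len(tokens)
state[tokens.index("Facebook")] = "U-ORG"
print(tuple(state))
# ('?', '?', '?', '?', '?', '?', '?', 'U-ORG', '?', '?')
```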
ner.make-gold will keep iterating over your data and asking you questions about the entities it contains, until it has filled in all the blanks (marked with a ? in my example). The possible annotations all have different probabilities, and for some of them we already know that they’re invalid. For example, the token before U-ORG can’t be a B- token (i.e. the beginning of an entity), because B- can only be followed by an I- (inside) or L- (last). As you annotate, you also add more constraints that should, hopefully, narrow in on the one correct solution.
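To make that transition rule concrete, here’s a toy check of which tags can legally precede which. It’s my own sketch of the constraint described above, not Prodigy’s actual constraint logic:

```python
def can_precede(prev_tag: str, tag: str) -> bool:
    """Can prev_tag directly precede tag in a valid BILUO sequence?"""
    prev_action, prev_label = (prev_tag.split("-") + [""])[:2]
    action, label = (tag.split("-") + [""])[:2]
    if prev_action in ("B", "I"):
        # B- and I- must be continued by I- or L- of the same entity
        return action in ("I", "L") and label == prev_label
    # O, L- and U- close or stay outside an entity, so the next tag
    # has to start something new (B-, U-) or be O
    return action in ("B", "U", "O")

print(can_precede("B-PERSON", "U-ORG"))  # False: the token before U-ORG can't be B-
print(can_precede("O", "U-ORG"))         # True
```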
Whether you really need gold-standard annotations for what you’re doing is a different question. If you’re creating a training corpus or evaluation data, you’ll likely want annotations that cover everything that’s in the document. If you just want to improve the model or the overall accuracy, you might be better off simply feeding Prodigy more examples and more data that it can learn and generalise from. This is also more fun and less tedious than going over the same data again and again.