Mixing Positive and Negative Examples in Training Set for NER Modeling

I was looking at the post "NER annotations format with positives and negatives examples" (Oct 19).

The question posed there was whether the format

  {"text": "cat is an animal and so is dog, while sandwich is not.",
    "spans": [{"start": 0, "end": 3, "label": "ANIMAL"}, {"start": 27, "end": 30, "label": "ANIMAL"}, {"start": 38, "end": 46, "label": "ANIMAL"}]}

is OK.

Ines replied that:

"You can keep the top-level answer as "accept", but add an additional "answer" to each span in the "spans". Alternatively, you could also duplicate the example and create 3 versions: one for each span, and then a top-level answer."
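To check my understanding of the second option, here is a minimal sketch in plain Python of how one task with per-span answers could be duplicated into one task per span. The helper name `split_by_span` and the per-span answers in the sample task (accept, accept, reject) are my own assumptions for illustration; they are not from the original post or from Prodigy's API.

```python
def split_by_span(task, default_answer="accept"):
    """Create one copy of the task per span, moving the span's
    "answer" (if present) to the top level of the copy."""
    tasks = []
    for span in task.get("spans", []):
        # Drop the per-span answer; it becomes the top-level answer.
        span_copy = {k: v for k, v in span.items() if k != "answer"}
        tasks.append({
            "text": task["text"],
            "spans": [span_copy],
            "answer": span.get("answer", default_answer),
        })
    return tasks


# Hypothetical input: the per-span answers below are assumed.
task = {
    "text": "cat is an animal and so is dog, while sandwich is not.",
    "spans": [
        {"start": 0, "end": 3, "label": "ANIMAL", "answer": "accept"},
        {"start": 27, "end": 30, "label": "ANIMAL", "answer": "accept"},
        {"start": 38, "end": 46, "label": "ANIMAL", "answer": "reject"},
    ],
    "answer": "accept",
}

single_span_tasks = split_by_span(task)
for t in single_span_tasks:
    span = t["spans"][0]
    print(t["text"][span["start"]:span["end"]], span["label"], t["answer"])
```

Each resulting task carries exactly one span and one top-level answer, so a positive and a negative decision about the same text never share a task.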

So according to Ines, if we would like all the examples to be considered as negative, should the example be formatted as follows?

 {"text": "cat is an animal and so is dog, while sandwich is not.",
        "spans": [{"start": 0, "end": 3, "label": "ANIMAL", "answer": "accept"}, {"start": 27, "end": 30, "label": "ANIMAL", "answer": "…"}, {"start": 38, "end": 46, "label": "ANIMAL"}],
        "answer": "reject"}

And a last question: can we use this example format if we want to train a model using our own training loop?

Or is adding negative examples (i.e. "answer": "reject") only possible when we train the model using a Prodigy recipe?

Yes, if you mark the whole task as "answer": "reject", the spans it contains are treated as negative examples. The feedback the model is updated with is "we don't know anything else about this text, but we do know that these entities are not correct".

Note that this will only work if you train with Prodigy and set --binary, as this lets you take advantage of binary and incomplete annotations.
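As a rough illustration, a fully rejected task could be built and serialized as one JSONL line like this. This is only a sketch in plain Python: the helper `make_rejected_task` is a hypothetical name, and the offset check is an added sanity test rather than something Prodigy requires.

```python
import json


def make_rejected_task(text, spans):
    """Build a task whose spans are all negative examples.
    Raises ValueError if a span's offsets fall outside the text."""
    for span in spans:
        if not (0 <= span["start"] < span["end"] <= len(text)):
            raise ValueError(f"Span offsets out of range: {span}")
    return {"text": text, "spans": spans, "answer": "reject"}


text = "cat is an animal and so is dog, while sandwich is not."
# "sandwich" is not an ANIMAL, so the whole task is rejected.
rejected = make_rejected_task(
    text, [{"start": 38, "end": 46, "label": "ANIMAL"}]
)

# One JSON object per line is the JSONL layout.
jsonl_line = json.dumps(rejected)
print(jsonl_line)
```

Keeping accepted and rejected spans in separate tasks avoids any ambiguity about which spans the top-level "reject" applies to.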

Yes, training from binary and incomplete annotations is a bit more complex. Before we can update, we first need to work out which direction to move the model in based on what we know, even if we don't know the correct answer. My slides here show an example of the process and the idea behind it: https://speakerdeck.com/inesmontani/belgium-nlp-meetup-rapid-nlp-annotation-through-binary-decisions-pattern-bootstrapping-and-active-learning?slide=12
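To make the idea concrete, here is a deliberately simplified toy sketch, not Prodigy's or spaCy's actual implementation. It assumes the model scores a handful of candidate labels for a single span, removes the labels we know are wrong, renormalizes the model's own probabilities over what remains, and uses that as the update target. The scores, the label set and the function names are all made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def binary_update_target(probs, rejected):
    """Zero out rejected labels and renormalize the rest, so the
    target is the model's best guess among labels not ruled out."""
    allowed = {k: p for k, p in probs.items() if k not in rejected}
    total = sum(allowed.values())
    if total == 0:
        raise ValueError("Every candidate label was rejected")
    return {k: allowed.get(k, 0.0) / total for k in probs}


# Hypothetical scores for the span "sandwich" ("O" = no entity).
scores = {"ANIMAL": 2.0, "FOOD": 1.0, "O": 0.5}
probs = softmax(scores)

# We only know that ANIMAL is wrong, not which label is right.
target = binary_update_target(probs, rejected={"ANIMAL"})

# Cross-entropy gradient with respect to the scores: p - target.
gradient = {k: probs[k] - target[k] for k in probs}
print(gradient)
```

The gradient is positive for the rejected label and negative for the others, so a gradient-descent step lowers the rejected label's score and raises the rest in proportion to what the model already believed.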
