NER annotations format with positive and negative examples

LucieG · October 29, 2019, 8:57am

I have a NER task for which I have gold standard annotations. Some are positive examples, some are negative examples. From what I understand I should use db-in to enter these annotations into a db and use ner.batch-train to build a model to recognize my entity of interest.

However, I am unsure how to format the JSON file for the gold standard annotations. The examples I found only cover a single annotation per piece of text. How should the file be formatted when a given piece of text has several entities, some positive and some negative?

{"text":"cat is an animal and so is dog, while sandwich is not.",
"spans": [{"start": 0, "end": 3, "label": "ANIMAL"},{"start": 27, "end": 30, "label": "ANIMAL"},{"start": 38, "end": 46, "label": "ANIMAL"}],
"answer":["accept","accept","reject"]
}
Something like this?

ines · October 29, 2019, 12:48pm

Hi! You can keep the top-level answer as "accept", but add an additional "answer" to each span in the "spans". Alternatively, you could also duplicate the example and create 3 versions: one for each span, each with its own top-level answer.
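To make the first option concrete, here is a minimal sketch that builds one JSONL line for the sentence from the question, with a top-level "accept" and an "answer" on each span. The `annotations` list and the use of `str.index` to find offsets are assumptions for illustration only; real gold data would normally already carry character offsets.

```python
import json

text = "cat is an animal and so is dog, while sandwich is not."

# (surface string, label, per-span answer), taken from the question above.
# This tuple layout is hypothetical, not a Prodigy format.
annotations = [
    ("cat", "ANIMAL", "accept"),
    ("dog", "ANIMAL", "accept"),
    ("sandwich", "ANIMAL", "reject"),
]

spans = []
for surface, label, answer in annotations:
    # str.index is safe here only because each word occurs once in the text.
    start = text.index(surface)
    spans.append(
        {"start": start, "end": start + len(surface), "label": label, "answer": answer}
    )

# One example per text: top-level answer stays "accept", each span has its own.
example = {"text": text, "spans": spans, "answer": "accept"}
line = json.dumps(example)
print(line)
```

Writing one such line per text gives a .jsonl file in the shape described above.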

LucieG · October 29, 2019, 12:56pm

Thanks for the quick reply!
Doesn't the alternative solution you propose mean that the model would try to learn from the first example that cat is an animal but dog is not, while at the same time trying to predict in the second example that dog is an animal but cat isn't?

ines · October 29, 2019, 1:23pm

If you're training with Prodigy, spans on the same input are merged and assigned the correct answer. So if the text is the same and all examples have the same input hash, their annotations are treated as annotations on the same text. Under the hood, Prodigy will also produce one example with 3 spans that each have an "answer". (So you might as well do that yourself if you have control over the data conversion – there's not really a good argument for creating one example per span. Just wanted to mention that this is also possible.)
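As a rough illustration of that merging idea (not Prodigy's actual implementation), the sketch below groups one-span-per-example records by their text and moves each record's top-level answer onto its span. The function name `merge_by_text` is hypothetical, and grouping on the raw text is an assumed stand-in for the input hash mentioned above.

```python
def merge_by_text(examples):
    """Merge examples that share the same text into one example whose
    spans each carry an "answer" copied from the original top-level answer."""
    merged = {}
    for eg in examples:
        # Assumption: identical text stands in for an identical input hash.
        target = merged.setdefault(
            eg["text"], {"text": eg["text"], "spans": [], "answer": "accept"}
        )
        for span in eg.get("spans", []):
            target["spans"].append({**span, "answer": eg["answer"]})
    return list(merged.values())


text = "cat is an animal and so is dog, while sandwich is not."
per_span_examples = [
    {"text": text, "spans": [{"start": 0, "end": 3, "label": "ANIMAL"}], "answer": "accept"},
    {"text": text, "spans": [{"start": 27, "end": 30, "label": "ANIMAL"}], "answer": "accept"},
    {"text": text, "spans": [{"start": 38, "end": 46, "label": "ANIMAL"}], "answer": "reject"},
]

merged_examples = merge_by_text(per_span_examples)
print(merged_examples)
```

The three duplicated examples collapse into a single example with three spans, so "cat" is never presented as a non-entity just because it is absent from the "dog" record.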

LucieG · October 29, 2019, 1:39pm

It would seem that the reject answers are ignored nonetheless (I should have 11421 of them)

I used

python -m prodigy db-in OSE_AE_annotations2 all_training_annotations.jsonl

Imported 27688 annotations for 'OSE_AE_annotations2' to database SQLite
Added 'accept' answer to 27688 annotations

With all_annotations.jsonl containing examples of the format you described (unless I misunderstood)
i.e. {text, spans [{start, end, label, answer},{start, end, label, answer},...]}, with some reject answers and some accept answers

ines · October 29, 2019, 2:34pm

Are you referring to the "Added 'accept' answer to 27688 annotations" message? That's just the top-level "answer" property. Each example needs a top-level answer, and if that's not set in the data you import, db-in will add "accept" by default (or whatever you specify as the --answer argument).
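Since that message only reports the top-level answer, one way to check that the per-span rejects are really in the file is to count them directly. This is a small sketch under the assumption that each line is a JSON object with a "spans" list as described earlier in the thread; `count_span_answers` is a hypothetical helper, not part of Prodigy.

```python
import json
from collections import Counter


def count_span_answers(lines):
    """Count the "answer" value on every span across JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        example = json.loads(line)
        for span in example.get("spans", []):
            # Spans without their own answer are tallied as "missing".
            counts[span.get("answer", "missing")] += 1
    return counts


# Tiny in-memory sample; for a real file, pass an open file handle instead,
# e.g. open("all_training_annotations.jsonl", encoding="utf8").
sample_lines = [
    '{"text": "cat and dog", "spans": ['
    '{"start": 0, "end": 3, "label": "ANIMAL", "answer": "accept"}, '
    '{"start": 8, "end": 11, "label": "ANIMAL", "answer": "accept"}]}',
    '{"text": "a sandwich", "spans": ['
    '{"start": 2, "end": 10, "label": "ANIMAL", "answer": "reject"}]}',
]

span_counts = count_span_answers(sample_lines)
print(span_counts)
```

If the "reject" count on the real file matches the expected number, the span-level answers survived the conversion, regardless of what db-in reports for the top level.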
