NER annotation format with positive and negative examples

I have an NER task for which I have gold-standard annotations. Some are positive examples and some are negative examples. From what I understand, I should use db-in to import these annotations into a dataset and then use ner.batch-train to build a model that recognizes my entity of interest.
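i.e. something along these lines (dataset, file and output names are just placeholders):

python -m prodigy db-in my_dataset gold_annotations.jsonl
python -m prodigy ner.batch-train my_dataset en_core_web_sm --output /tmp/ner-model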

However, I'm unsure how to format the JSON file for the gold-standard annotations. The examples I found only cover the case of a single annotation for a given piece of text. How should I format the JSON file when one piece of text contains several entities, some positive and some negative?

{"text":"cat is an animal and so is dog, while sandwich is not.",
"spans": [{"start": 0, "end": 3, "label": "ANIMAL"},{"start": 27, "end": 30, "label": "ANIMAL"},{"start": 38, "end": 46, "label": "ANIMAL"}],
"answer":["accept","accept","reject"]
}
Something like this?

Hi! You can keep the top-level "answer" as "accept", but add an additional "answer" to each span in "spans". Alternatively, you could also duplicate the example and create three versions, one for each span, each with its own top-level answer.
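For instance, with per-span answers your example above could look like this:

{"text": "cat is an animal and so is dog, while sandwich is not.",
 "spans": [{"start": 0, "end": 3, "label": "ANIMAL", "answer": "accept"},
           {"start": 27, "end": 30, "label": "ANIMAL", "answer": "accept"},
           {"start": 38, "end": 46, "label": "ANIMAL", "answer": "reject"}],
 "answer": "accept"}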

Thanks for the quick reply!
Doesn't the alternative solution you propose mean that the model would try to learn from the first example that cat is an animal but dog is not, while at the same time trying to learn from the second example that dog is an animal but cat isn't?

If you're training with Prodigy, spans on the same input are merged and assigned the correct answer. So if the text is the same and all examples have the same input hash, their annotations are treated as annotations on the same text. Under the hood, Prodigy will also produce one example with 3 spans that each have an "answer". (So you might as well do that yourself if you have control over the data conversion – there's not really a good argument for creating one example per span. Just wanted to mention that this is also possible.)
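To make that concrete, the duplicated version would be three examples like this:

{"text": "cat is an animal and so is dog, while sandwich is not.", "spans": [{"start": 0, "end": 3, "label": "ANIMAL"}], "answer": "accept"}
{"text": "cat is an animal and so is dog, while sandwich is not.", "spans": [{"start": 27, "end": 30, "label": "ANIMAL"}], "answer": "accept"}
{"text": "cat is an animal and so is dog, while sandwich is not.", "spans": [{"start": 38, "end": 46, "label": "ANIMAL"}], "answer": "reject"}

Since all three share the same text (and therefore the same input hash), they'd be merged back into the single three-span example shown above.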

It would seem that the reject answers are ignored nonetheless (I should have 11421 of them).

I used

python -m prodigy db-in OSE_AE_annotations2 all_training_annotations.jsonl

✨ Imported 27688 annotations for 'OSE_AE_annotations2' to database SQLite
Added 'accept' answer to 27688 annotations

With all_training_annotations.jsonl containing examples in the format you described (unless I misunderstood), i.e. {"text": ..., "spans": [{"start": ..., "end": ..., "label": ..., "answer": ...}, ...]}, with some reject answers and some accept answers.

Are you referring to the "Added 'accept' answer to 27688 annotations" message? That's just the top-level "answer" property. Each example needs a top-level answer, and if that's not set in the data you import, db-in will add "accept" by default (or whatever you specify as the --answer argument).
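If you want to verify that the per-span reject answers actually made it into the dataset, here's a quick sketch using the database API (dataset name taken from your command above):

from collections import Counter
from prodigy.components.db import connect

db = connect()  # connects to the database configured for your installation
examples = db.get_dataset("OSE_AE_annotations2")  # all examples in the dataset

# Per-span answers live on each span and are separate from the top-level "answer"
span_answers = Counter(
    span.get("answer", "<none>")
    for eg in examples
    for span in eg.get("spans", [])
)
print(span_answers)  # should show both 'accept' and 'reject' counts

If the reject count there matches your 11421, the import worked, and the spans will be assigned the correct answers during training.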