Hi!
Traditionally, NER annotation in Prodigy allows only one label per token.
However, for Prodigy 1.11, we've created a new recipe, spans.manual,
that lets you annotate overlapping and nested spans. Your input would look something like this (newlines added for readability; they wouldn't be in your JSONL file):
{"text":"I took tylenol.",
"tokens":[{"text":"I","start":0,"end":1,"id":0,"ws":true},
{"text":"took","start":2,"end":6,"id":1,"ws":true},
{"text":"tylenol","start":7,"end":14,"id":2,"ws":false},
{"text":".","start":14,"end":15,"id":3,"ws":false}],
"spans":[{"start":7,"end":14,"token_start":2,"token_end":2,"label":"Medication"},
{"start":7,"end":14,"token_start":2,"token_end":2,"label":"Generic"}]}
And then with
prodigy spans.manual my_output blank:en input.jsonl -l Medication,Generic
those spans would be pre-annotated.
For more information on the upcoming 1.11 release, currently available as a "nightly" release, see this thread: ✨ Prodigy nightly: spaCy v3 support, UI for overlapping spans, improved feeds & more