Loading pre-annotated data that has multiple sub-labels per word

SofieVL · June 27, 2021, 2:53pm

Hi!

Traditionally, NER annotation in Prodigy allows only one label per token.

However, for Prodigy 1.11, we've created a new recipe, spans.manual, that lets you annotate overlapping and nested spans. Your input would look something like this (I added newlines for readability; in your JSONL file, each record has to be on a single line):

{"text":"I took tylenol.",
"tokens":[{"text":"I","start":0,"end":1,"id":0,"ws":true},
{"text":"took","start":2,"end":6,"id":1,"ws":true},
{"text":"tylenol","start":7,"end":14,"id":2,"ws":false},
{"text":".","start":14,"end":15,"id":3,"ws":false}],
"spans":[{"start":7,"end":14,"token_start":2,"token_end":2,"label":"Medication"},
{"start":7,"end":14,"token_start":2,"token_end":2,"label":"Generic"}]}
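If your pre-annotated data stores a list of labels per word, a small conversion script can produce records in the format above. This is a minimal sketch, assuming a hypothetical input where each text comes with a dict mapping a token index to that word's labels. make_task, write_jsonl and the regex tokenizer are illustrative names, not part of Prodigy. It only handles single-word spans (token_start == token_end), and the regex will not always split text the way blank:en does (e.g. "don't"), so in practice you'd want to reuse the same tokenization as the pipeline you pass to the recipe.

```python
import json
import re

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
# It is NOT guaranteed to match spaCy's blank:en tokenization.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def make_task(text, word_labels):
    """Build one task with "tokens" and "spans" in the format shown above.

    word_labels: assumed input format, mapping token index -> list of labels.
    """
    tokens = []
    for i, m in enumerate(TOKEN_RE.finditer(text)):
        start, end = m.start(), m.end()
        # ws: is this token followed by whitespace?
        ws = end < len(text) and text[end].isspace()
        tokens.append({"text": m.group(), "start": start, "end": end, "id": i, "ws": ws})

    spans = []
    for idx in sorted(word_labels):
        tok = tokens[idx]
        # One span per label, so the same word can carry several labels
        for label in word_labels[idx]:
            spans.append({
                "start": tok["start"],
                "end": tok["end"],
                "token_start": idx,
                "token_end": idx,
                "label": label,
            })
    return {"text": text, "tokens": tokens, "spans": spans}


def write_jsonl(tasks, path):
    # JSONL: one compact JSON object per line, no pretty-printing
    with open(path, "w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")


task = make_task("I took tylenol.", {2: ["Medication", "Generic"]})
write_jsonl([task], "input.jsonl")
```

For "I took tylenol." this yields the same tokens and two overlapping spans as the example record above, written as a single line of input.jsonl.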

Then, when you run

prodigy spans.manual my_output blank:en input.jsonl -l Medication,Generic

those spans will show up pre-annotated:

[Image: the pre-annotated Medication and Generic spans in the spans.manual interface]
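Before loading a large file, it can help to sanity-check it. Below is a hypothetical pre-flight check (check_tasks is an illustrative name, not a Prodigy function) that reads the JSONL, flags span labels missing from the set you plan to pass with -l, and verifies that each span's character offsets line up with its token_start/token_end. The label check is a convenience assumption; I'm not claiming spans.manual itself rejects such labels.

```python
import json


def check_tasks(path, labels):
    """Return a list of human-readable problems found in a spans JSONL file."""
    problems = []
    with open(path, encoding="utf8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            task = json.loads(line)
            tokens = task.get("tokens", [])
            for span in task.get("spans", []):
                if span["label"] not in labels:
                    problems.append(f"line {line_no}: label {span['label']!r} not in -l labels")
                ts, te = span["token_start"], span["token_end"]
                if not (0 <= ts <= te < len(tokens)):
                    problems.append(f"line {line_no}: token indices out of range")
                    continue
                # Character offsets must match the first and last token of the span
                if tokens[ts]["start"] != span["start"] or tokens[te]["end"] != span["end"]:
                    problems.append(f"line {line_no}: span offsets don't match tokens")
    return problems


# Usage: write the example record from above and check it
example = {
    "text": "I took tylenol.",
    "tokens": [
        {"text": "I", "start": 0, "end": 1, "id": 0, "ws": True},
        {"text": "took", "start": 2, "end": 6, "id": 1, "ws": True},
        {"text": "tylenol", "start": 7, "end": 14, "id": 2, "ws": False},
        {"text": ".", "start": 14, "end": 15, "id": 3, "ws": False},
    ],
    "spans": [
        {"start": 7, "end": 14, "token_start": 2, "token_end": 2, "label": "Medication"},
        {"start": 7, "end": 14, "token_start": 2, "token_end": 2, "label": "Generic"},
    ],
}
with open("input.jsonl", "w", encoding="utf8") as f:
    f.write(json.dumps(example) + "\n")

print(check_tasks("input.jsonl", {"Medication", "Generic"}))  # → []
```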

For more information on the upcoming 1.11 release, currently available as a "nightly" release, see this thread: ✨ Prodigy nightly: spaCy v3 support, UI for overlapping spans, improved feeds & more
