Loading pre-annotated data that has multiple sub-labels per word

Jhodson · June 25, 2021, 1:27pm

Hello,

I currently have pre-annotated data in which some words require multiple labels in a hierarchical format. For example:

Text: "I took tylenol."

Tylenol - Label: Medication
Tylenol - Sub-label: Polar
Tylenol - Sub-label: Generic
etc.

Currently, the format I use to load this with a single label is:

{
'text': 'I took tylenol.',
'tokens': etc.. ,
'spans':[{'start':7,'end':14,'token_start':2,'token_end':2,'label':'Medication'}]
}

Loading this format as JSONL with the prodigy mark recipe highlights Tylenol as Medication, which is a great first step. How can I edit this format to include multiple sub-labels on the same word?

SofieVL · June 27, 2021, 2:53pm

Hi!

Traditionally, NER annotation in Prodigy allows only one label per token.

However, for Prodigy 1.11, we've created a new recipe spans.manual that will allow you to annotate overlapping and nested spans. Your input would look something like this (added newlines for readability but those wouldn't be in your JSONL file):

{"text":"I took tylenol.",

"tokens":[{"text":"I","start":0,"end":1,"id":0,"ws":true},
{"text":"took","start":2,"end":6,"id":1,"ws":true},
{"text":"tylenol","start":7,"end":14,"id":2,"ws":false},
{"text":".","start":14,"end":15,"id":3,"ws":false}],

"spans":[{"start":7,"end":14,"token_start":2,"token_end":2,"label":"Medication"},
{"start":7,"end":14,"token_start":2,"token_end":2,"label":"Generic"}]}

And then with

prodigy spans.manual my_output blank:en input.jsonl -l Medication,Generic

those spans would be preannotated:

[image]

For more information on the upcoming 1.11 release, currently available as a "nightly" release, see this thread: ✨ Prodigy nightly: spaCy v3 support, UI for overlapping spans, improved feeds & more
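
If the pre-annotated data lives in some other structure, a record like the one above could be generated with a short script. The following is only a minimal sketch: the (word, labels) input shape, the make_task and tokenize helpers, and the naive regex tokenizer are all assumptions for illustration and are not part of Prodigy. In practice the tokens should come from the same tokenizer the recipe uses (here blank:en), and the lookup below only finds the first occurrence of each word.

```python
import json
import re


def tokenize(text):
    """Naive tokenizer (an assumption): runs of word characters or single punctuation marks."""
    tokens = []
    for i, match in enumerate(re.finditer(r"\w+|[^\w\s]", text)):
        end = match.end()
        tokens.append({
            "text": match.group(),
            "start": match.start(),
            "end": end,
            "id": i,
            # "ws" is true when the token is followed by whitespace
            "ws": end < len(text) and text[end].isspace(),
        })
    return tokens


def make_task(text, annotations):
    """Build one task dict; annotations is a hypothetical list of (word, [labels]) pairs."""
    tokens = tokenize(text)
    spans = []
    for word, labels in annotations:
        start = text.lower().find(word.lower())  # first occurrence only
        if start == -1:
            raise ValueError(f"{word!r} not found in text")
        end = start + len(word)
        inside = [t for t in tokens if t["start"] >= start and t["end"] <= end]
        # Refuse spans that do not line up exactly with token boundaries
        if not inside or inside[0]["start"] != start or inside[-1]["end"] != end:
            raise ValueError(f"{word!r} does not align with token boundaries")
        # One span per label, all sharing the same offsets
        for label in labels:
            spans.append({
                "start": start,
                "end": end,
                "token_start": inside[0]["id"],
                "token_end": inside[-1]["id"],
                "label": label,
            })
    return {"text": text, "tokens": tokens, "spans": spans}


task = make_task("I took tylenol.", [("tylenol", ["Medication", "Polar", "Generic"])])
print(json.dumps(task))  # one JSON object per line, as JSONL expects
```

Each printed line would go into input.jsonl as a single record, and every label used in the spans (here Medication, Polar and Generic) would need to be listed after -l in the command above.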
