Does data need to be re-annotated to use the train recipe for predicting span labels after the rel.manual recipe was used?

I've annotated sentences for joint entity and relation by following this page: https://prodi.gy/docs/dependencies-relations#ner-joint

This page states that it is possible to do "entity and relation annotation at the same time."

Specifically, I annotated 10k+ sentences using the command from the #ner-joint section of the docs above:

prodigy rel.manual annotations en_core_web_sm sentences.jsonl --span-label country,language --label EXP,NEG
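For context, here is roughly what I understand a single saved example from rel.manual to look like (the field values below are invented for illustration):

```python
# Rough shape of one rel.manual example, as I understand the format
# (text, offsets, and labels here are made up for illustration).
record = {
    "text": "They speak French in France.",
    "spans": [
        {"start": 11, "end": 17, "token_start": 2, "token_end": 2, "label": "language"},
        {"start": 21, "end": 27, "token_start": 4, "token_end": 4, "label": "country"},
    ],
    "relations": [
        {
            "head": 2,
            "child": 4,
            "label": "EXP",
            "head_span": {"start": 11, "end": 17, "token_start": 2, "token_end": 2, "label": "language"},
            "child_span": {"start": 21, "end": 27, "token_start": 4, "token_end": 4, "label": "country"},
        }
    ],
    "answer": "accept",
}

# Every span should carry character offsets; the KeyError I'm hitting
# suggests at least one span in my data is missing "start".
assert all("start" in s and "end" in s for s in record["spans"])
```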

My goal is now to train a named entity recognition model to predict only the label of "country". How might I do this?

Here is what I've tried:

  1. prodigy train --spancat annotations yields the error:

     File "/home/jaan/miniconda3/envs/dev/lib/python3.8/site-packages/prodigy/recipes/data_utils.py", line 951, in infer_spancat_suggester
       char_span = doc.char_span(span["start"], span["end"])
     KeyError: 'start'

  2. prodigy train --ner annotations yields the error:

     ✘ Invalid data for component 'ner'

  3. prodigy train --parser annotations successfully trains a model for dependency parsing. However, this model only predicts the relation labels EXP and NEG, not the span labels that were annotated.
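If it turns out I need to narrow the data down to just the "country" spans first, this is the kind of post-processing I had in mind (a sketch using only the stdlib; the filenames and the db-out export step are my assumptions):

```python
import json

def keep_country_spans(example: dict) -> dict:
    """Keep only spans labelled 'country' and drop the relations,
    so the example can be used for single-label NER training."""
    filtered = dict(example)
    filtered["spans"] = [
        s for s in example.get("spans", []) if s.get("label") == "country"
    ]
    filtered.pop("relations", None)  # NER training doesn't need relations
    return filtered

def filter_file(in_path: str, out_path: str) -> int:
    """Read a JSONL export (e.g. from `prodigy db-out annotations`),
    write a country-only version, and return the number of examples kept."""
    kept = 0
    with open(in_path, encoding="utf8") as fin, \
         open(out_path, "w", encoding="utf8") as fout:
        for line in fin:
            example = keep_country_spans(json.loads(line))
            if example["spans"]:  # skip examples with no country spans
                fout.write(json.dumps(example) + "\n")
                kept += 1
    return kept
```

The idea would then be to import the filtered file into a new dataset (e.g. with prodigy db-in) and train NER on that, but I'm not sure whether that step is actually required.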

Is it actually impossible to train a named entity recognition model on data annotated with the rel.manual recipe?

Does this mean the data must be re-annotated using the ner.manual recipe?

Thanks so much! Currently blocked on this.

Hi! First, the answer to your question: no, you shouldn't have to re-annotate anything. The span data created with rel.manual should be in the same format expected by the NER and span categorizer training, so it's definitely strange that something is missing here and you ended up with a span without a start.

Which version of Prodigy are you using? Could you add a print statement before that line to see what the example it fails on looks like? Maybe you somehow ended up with an empty span, which would indicate an issue in the relations UI :thinking:
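To make that check easier, here's a quick stdlib-only sketch that scans an exported JSONL file (e.g. from prodigy db-out annotations > annotations.jsonl; the filename is just an example) and reports any spans that are missing character offsets:

```python
import json

def find_broken_spans(path: str):
    """Yield (line_number, span) for every span that is missing
    a 'start' or 'end' character offset in a JSONL export."""
    with open(path, encoding="utf8") as f:
        for i, line in enumerate(f, start=1):
            example = json.loads(line)
            for span in example.get("spans", []):
                if "start" not in span or "end" not in span:
                    yield i, span

# Example usage:
# for line_no, span in find_broken_spans("annotations.jsonl"):
#     print(f"line {line_no}: {span}")
```

If that turns up an empty or offset-less span, it would confirm the data issue rather than a problem with the train recipe.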