Does data need to be re-annotated to use the train recipe for predicting span labels after the rel.manual recipe was used?

I've annotated sentences for joint entity and relation extraction by following this page:

This page states that it is possible to do "entity and relation annotation at the same time."

Specifically, I annotated 10k+ sentences using the command from the #ner-joint section of the docs above:

prodigy rel.manual annotations en_core_web_sm sentences.jsonl --span-label country,language --label EXP,NEG

My goal is now to train a named entity recognition model to predict only the label of "country". How might I do this?

Here is what I've tried:

  1. prodigy train --spancat annotations yields the error:
  File "/home/jaan/miniconda3/envs/dev/lib/python3.8/site-packages/prodigy/recipes/", line 951, in infer_spancat_suggester
    char_span = doc.char_span(span["start"], span["end"])
KeyError: 'start'
  2. prodigy train --ner annotations yields the error:
✘ Invalid data for component 'ner'
  3. I used prodigy train --parser annotations to successfully train a model to do dependency parsing. However, this model only predicts the labels EXP,NEG and does not predict the span labels that have been annotated.

Is it true that it is impossible to train a named entity recognition system if data has been annotated using the rel.manual recipe?

Does this mean that data must be re-annotated using the ner.manual recipe?

Thanks so much! Currently blocked on this.

Hi! First, the answer to your question: No, you shouldn't have to re-annotate anything. The span data created with rel.manual should be in the same format that the NER and span categorizer training expects. So it's definitely strange that something is missing here and that you ended up with a span without a start.

Which version of Prodigy are you using? Also, could you add a print statement before that line to see what the example it fails on looks like? Maybe you somehow ended up with an empty span, which would indicate an issue in the relations UI :thinking:
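
If editing the installed recipe file is inconvenient, here is a rough sketch of another way to find the offending example. It assumes you have exported the dataset to a file first, e.g. with prodigy db-out annotations > annotations.jsonl, and that each example stores its spans as a list of dicts under "spans". The helper names load_jsonl and find_incomplete_spans are made up for illustration and are not part of Prodigy.

```python
import json


def load_jsonl(path):
    """Read one JSON object per non-empty line (the format written by db-out)."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def find_incomplete_spans(examples, required=("start", "end")):
    """Return (example_index, missing_keys, span) for every span that
    lacks one of the required character-offset keys."""
    problems = []
    for i, eg in enumerate(examples):
        # "spans" may be absent or None on examples without any spans
        for span in eg.get("spans") or []:
            missing = [key for key in required if key not in span]
            if missing:
                problems.append((i, missing, span))
    return problems


# Usage (assumes annotations.jsonl was exported beforehand):
# for index, missing, span in find_incomplete_spans(load_jsonl("annotations.jsonl")):
#     print(index, missing, span)
```

If this prints anything, the listed spans are the ones the training recipes would trip over, and posting one of them here would help narrow down where it came from.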