The output from either recipe is very similar.
Suppose I annotate this one example:
{"text": "Hi. My name is Vincent"}
via both of these interfaces:
# NER interface
python -m prodigy ner.manual example-ner blank:en examples.jsonl --label name
# SPANCAT interface
python -m prodigy spans.manual example-span blank:en examples.jsonl --label name
Then the output is nearly identical.
NER
This is the output from python -m prodigy db-out example-ner:
{"text":"Hi. My name is Vincent","_input_hash":1333440749,"_task_hash":39342451,"_is_binary":false,"tokens":[{"text":"Hi","start":0,"end":2,"id":0,"ws":false},{"text":".","start":2,"end":3,"id":1,"ws":true},{"text":"My","start":4,"end":6,"id":2,"ws":true},{"text":"name","start":7,"end":11,"id":3,"ws":true},{"text":"is","start":12,"end":14,"id":4,"ws":true},{"text":"Vincent","start":15,"end":22,"id":5,"ws":false}],"_view_id":"ner_manual","spans":[{"start":15,"end":22,"token_start":5,"token_end":5,"label":"name"}],"answer":"accept","_timestamp":1658923692}
SPANCAT
This is the output from python -m prodigy db-out example-span:
{"text":"Hi. My name is Vincent","_input_hash":1333440749,"_task_hash":39342451,"tokens":[{"text":"Hi","start":0,"end":2,"id":0,"ws":false},{"text":".","start":2,"end":3,"id":1,"ws":true},{"text":"My","start":4,"end":6,"id":2,"ws":true},{"text":"name","start":7,"end":11,"id":3,"ws":true},{"text":"is","start":12,"end":14,"id":4,"ws":true},{"text":"Vincent","start":15,"end":22,"id":5,"ws":false}],"_view_id":"spans_manual","spans":[{"start":15,"end":22,"token_start":5,"token_end":5,"label":"name"}],"answer":"accept","_timestamp":1658923725}
Spans
In particular, you'll notice that the spans are identical. The only fields that differ between the two outputs are _view_id (ner_manual vs. spans_manual), the _is_binary flag, and the _timestamp.
"spans":[{"start":15,"end":22,"token_start":5,"token_end":5,"label":"name"}]
In fact, in this case, you can even run:
python -m prodigy train --spancat example-ner
python -m prodigy train --spancat example-span
As far as spancat is concerned, it just tries to learn from the annotated "spans". The main difference is that NER doesn't allow spans to overlap, while spancat does. So I don't think you'll need to worry about a "translation" when it comes to the NER -> span annotations.
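To make the overlap difference concrete, here's a small spaCy sketch. The longer name and the surname label are made up for illustration:
import spacy

nlp = spacy.blank("en")
doc = nlp("My name is Vincent Warmerdam")

# Span groups (what spancat trains on, "sc" being the default key)
# happily accept overlapping spans.
doc.spans["sc"] = [
    doc.char_span(11, 28, label="name"),     # "Vincent Warmerdam"
    doc.char_span(19, 28, label="surname"),  # "Warmerdam"
]

# Entities (what NER trains on) must not overlap, so this raises a
# ValueError about conflicting doc.ents.
doc.ents = [
    doc.char_span(11, 28, label="name"),
    doc.char_span(19, 28, label="surname"),
]
Since NER annotations are just non-overlapping spans, they're always valid input for spancat; the reverse isn't guaranteed.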
The only part that I might be worried about is the tokenizer. There might be some edge cases if you're interested in fine-tuning a transformer model, because the character offsets in your spans need to line up with the token boundaries that the tokenizer produces. Are there any issues that you've come across while training? If so, could you share the commands that you ran together with the error message?
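In the meantime, if you want to check for alignment issues up front, a small script like this might help. It's a sketch that assumes you've exported your annotations with db-out to a file called annotations.jsonl:
import json
import spacy

# Assumption: swap in the pipeline whose tokenizer you'll actually train
# with (e.g. a transformer-based one) instead of the blank English pipeline.
nlp = spacy.blank("en")

with open("annotations.jsonl", encoding="utf8") as f:
    for line in f:
        eg = json.loads(line)
        doc = nlp(eg["text"])
        for span in eg.get("spans", []):
            # char_span returns None when the character offsets don't
            # line up with token boundaries under this tokenizer.
            if doc.char_span(span["start"], span["end"], alignment_mode="strict") is None:
                print("Misaligned span:", span, "in:", eg["text"])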