We are trying to manually review our relation annotations between two spans.
This is the recipe I am using:
prodigy rel.manual db_name en_core_web_lg /pathtojsonl.jsonl --label attributed_to --span-label ENT1,ENT2 --wrap
Here is one example from the data:
{"text": "Roberts-Smith denies any wrongdoing.", "tokens": [{"text": "Roberts-Smith", "start": 0, "end": 13, "id": 0}, {"text": "denies", "start": 14, "end": 20, "id": 1}, {"text": "any wrongdoing", "start": 21, "end": 35, "id": 2}, {"text": ".", "start": 35, "end": 36, "id": 3}], "spans": [{"start": 21, "end": 35, "token_start": 2, "token_end": 3, "label": "ENT2"}, {"start": 0, "end": 13, "token_start": 0, "token_end": 1, "label": "ENT1"}], "relations": [{"head": 2, "child": 0, "label": "attributed_to", "head_span": {"start": 21, "end": 35, "token_start": 2, "token_end": 3, "label": "ENT2"}, "child_span": {"start": 0, "end": 13, "token_start": 0, "token_end": 1, "label": "ENT1"}}]}
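For reference, here is a minimal sketch of how the spans in an example like this could be checked against the custom tokens stored alongside them. It assumes that token_end is meant to be the index of the last token in the span (inclusive); that is my assumption about the expected format. check_spans is a hypothetical helper, not part of Prodigy.

```python
def check_spans(example):
    """List spans whose token indices disagree with their character offsets.

    Assumption: token_end is the index of the last token in the span
    (inclusive). check_spans is a hypothetical helper, not a Prodigy API.
    """
    # Map character offsets to token ids using the tokens stored in the example.
    start_to_id = {tok["start"]: tok["id"] for tok in example["tokens"]}
    end_to_id = {tok["end"]: tok["id"] for tok in example["tokens"]}
    mismatches = []
    for span in example.get("spans", []):
        first = start_to_id.get(span["start"])
        last = end_to_id.get(span["end"])
        # covered is None when the offsets do not fall on token boundaries.
        covered = None if first is None or last is None else (first, last)
        stated = (span["token_start"], span["token_end"])
        if covered != stated:
            mismatches.append((span["label"], stated, covered))
    return mismatches


# Trimmed copy of the example above (relations omitted).
example = {
    "text": "Roberts-Smith denies any wrongdoing.",
    "tokens": [
        {"text": "Roberts-Smith", "start": 0, "end": 13, "id": 0},
        {"text": "denies", "start": 14, "end": 20, "id": 1},
        {"text": "any wrongdoing", "start": 21, "end": 35, "id": 2},
        {"text": ".", "start": 35, "end": 36, "id": 3},
    ],
    "spans": [
        {"start": 21, "end": 35, "token_start": 2, "token_end": 3, "label": "ENT2"},
        {"start": 0, "end": 13, "token_start": 0, "token_end": 1, "label": "ENT1"},
    ],
}

for label, stated, covered in check_spans(example):
    print(label, "stated:", stated, "covered by offsets:", covered)
```

Under that inclusive assumption, both spans in this example are reported, because each one covers a single token while its token_end is one higher than its token_start. I have not confirmed whether this is what triggers the warning.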
When I run the recipe, I get this warning message:
⚠ Skipped 2 span(s) that were already present in the input data because
the tokenization didn't match.
⚠ Skipped 2 span(s) that were already present in the input data because
the tokenization didn't match.
⚠ Skipped 2 span(s) that were already present in the input data because
the tokenization didn't match.
The relations are highlighted correctly, but the span labels appear on some examples and are missing on others.
The problem is that when I run db-out, all the span labels are missing for the examples that triggered the warning.
How should I handle this warning so that I can use my custom tokens and spans?