I am getting the error below when running ner.batch-train for a new entity. It occurs both when training a blank model and when training an existing spaCy model (such as en_core_web_sm).
The error is strange because it is raised even though the two entities reported as conflicting are identical.
File "gold.pyx", line 715, in spacy.gold.GoldParse.init
File "gold.pyx", line 925, in spacy.gold.biluo_tags_from_offsets
ValueError: [E103] Trying to set conflicting doc.ents: '(2, 43, 'adminagent')' and '(2, 43, 'adminagent')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
I understand the intent behind the error, but I am puzzled because there is only one entity in the annotation. Here is the annotation that triggers the error:
{'text': ' CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, as Administrative Agent',
'_input_hash': -1415453208,
'_task_hash': 354718826,
'tokens': [{'text': ' ', 'start': 0, 'end': 2, 'id': 0},
{'text': 'CREDIT', 'start': 2, 'end': 8, 'id': 1},
{'text': 'SUISSE', 'start': 9, 'end': 15, 'id': 2},
{'text': 'AG', 'start': 16, 'end': 18, 'id': 3},
{'text': ',', 'start': 18, 'end': 19, 'id': 4},
{'text': 'CAYMAN', 'start': 20, 'end': 26, 'id': 5},
{'text': 'ISLANDS', 'start': 27, 'end': 34, 'id': 6},
{'text': ' ', 'start': 35, 'end': 37, 'id': 7},
{'text': 'BRANCH', 'start': 37, 'end': 43, 'id': 8},
{'text': ',', 'start': 43, 'end': 44, 'id': 9},
{'text': 'as', 'start': 45, 'end': 47, 'id': 10},
{'text': 'Administrative', 'start': 48, 'end': 62, 'id': 11},
{'text': 'Agent', 'start': 63, 'end': 68, 'id': 12}],
'_session_id': 'adminagent-default',
'_view_id': 'ner_manual',
'spans': [{'start': 2,
'end': 43,
'token_start': 1,
'token_end': 8,
'label': 'adminagent'}],
'answer': 'accept'}
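To rule out a misaligned span in this record, a minimal sketch like the following can confirm that the span's character offsets agree with the tokens it points to. The helper name `span_matches_tokens` is hypothetical, and the task is trimmed to the two tokens the span references; it assumes only the field names shown in the annotation above.

```python
def span_matches_tokens(task):
    """Return True if every span's character offsets line up with its tokens."""
    tokens = {t["id"]: t for t in task["tokens"]}
    for span in task["spans"]:
        first = tokens[span["token_start"]]
        last = tokens[span["token_end"]]
        # The span should start where its first token starts
        # and end where its last token ends.
        if span["start"] != first["start"] or span["end"] != last["end"]:
            return False
    return True


# Trimmed copy of the annotation above (only the boundary tokens are kept).
task = {
    "tokens": [
        {"text": "CREDIT", "start": 2, "end": 8, "id": 1},
        {"text": "BRANCH", "start": 37, "end": 43, "id": 8},
    ],
    "spans": [
        {"start": 2, "end": 43, "token_start": 1, "token_end": 8,
         "label": "adminagent"}
    ],
}

print(span_matches_tokens(task))  # True
```

For this record the check passes, so the span itself looks consistent with the tokenization.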
I have to annotate a large corpus with a large number of entities, so the steps I took are as follows:
1. Annotated a few samples using ner.manual
2. Used ner.batch-train to create a seed model based on the annotations from (1)
3. Used the seed model from (2) to create another set of annotations for the entity with the ner.teach binary annotation recipe
I used the same dataset for the annotations in steps 1 and 3, and I am now trying to train a model on the annotations from step 3 using ner.batch-train, which is when I get the conflict error.
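Since both annotation passes went into one dataset, one possible explanation (an assumption, not something confirmed here) is that the same input now carries the same accepted span twice, once from ner.manual and once from ner.teach. A rough sketch for checking this on exported examples follows. It assumes the examples are available as a list of dicts with the fields shown above; `find_repeated_spans` is a hypothetical helper, not part of Prodigy or spaCy.

```python
from collections import Counter, defaultdict


def find_repeated_spans(examples):
    """Map each _input_hash to the accepted spans that occur more than once."""
    counts = defaultdict(Counter)
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # only accepted annotations are counted here
        for span in eg.get("spans", []):
            key = (span["start"], span["end"], span["label"])
            counts[eg["_input_hash"]][key] += 1

    repeated = {}
    for input_hash, span_counts in counts.items():
        dupes = [key for key, n in span_counts.items() if n > 1]
        if dupes:
            repeated[input_hash] = dupes
    return repeated


# Two illustrative records for the same text: one as it might come from
# ner.manual and one from ner.teach (contents are made up for the example).
examples = [
    {"_input_hash": -1415453208, "answer": "accept",
     "spans": [{"start": 2, "end": 43, "label": "adminagent"}]},
    {"_input_hash": -1415453208, "answer": "accept",
     "spans": [{"start": 2, "end": 43, "label": "adminagent"}]},
]

print(find_repeated_spans(examples))
# {-1415453208: [(2, 43, 'adminagent')]}
```

If this reports repeated spans, keeping the ner.manual and ner.teach annotations in separate datasets would be one way to test whether the duplication is what triggers E103.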
Please advise.