Hi, I'm using a model that I retrained with Prodigy for more accurate NER detection. The model was trained with spaCy 2.1 and a previous version of Prodigy.
After updating both packages (Prodigy 1.8.4, spaCy 2.2.1, redownloaded models for v2.2) and trying to retrain the model with the existing dataset, which worked fine before, I now get an error:
```
Loaded model en_core_web_sm
Using 5% of accept/reject examples (933) for evaluation
Using 100% of remaining examples (17739) for training
Dropout: 0.15 Batch size: 16 Iterations: 4
BEFORE 0.671
Correct 589
Incorrect 289
Entities 2870
Unknown 2241
# LOSS RIGHT WRONG ENTS SKIP ACCURACY
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/virostatiq/PycharmProjects/prodigy_annotation/venv/lib/python3.6/site-packages/prodigy/__main__.py", line 380, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 212, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/virostatiq/PycharmProjects/prodigy_annotation/venv/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/virostatiq/PycharmProjects/prodigy_annotation/venv/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/virostatiq/PycharmProjects/prodigy_annotation/venv/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 621, in batch_train
    examples, batch_size=batch_size, drop=dropout, beam_width=beam_width
  File "cython_src/prodigy/models/ner.pyx", line 362, in prodigy.models.ner.EntityRecognizer.batch_train
  File "cython_src/prodigy/models/ner.pyx", line 441, in prodigy.models.ner.EntityRecognizer._update
  File "gold.pyx", line 597, in spacy.gold.GoldParse.__init__
  File "gold.pyx", line 809, in spacy.gold.biluo_tags_from_offsets
```
**ValueError: [E103] Trying to set conflicting doc.ents: '(0, 4, '!ORG')' and '(0, 4, 'PERSON')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.**
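As far as I can tell, the error means the same characters (0 to 4) carry two different annotations for one text. Here is a rough sketch of how I would look for such cases. It assumes the exported records are dicts with `text`, `spans` (each with `start`, `end`, `label`) and `answer`; the function name `find_conflicting_spans` and the sample records are made up for illustration. It only flags candidates, and I am not sure it matches exactly what triggers E103 internally.

```python
from collections import defaultdict
from itertools import combinations


def find_conflicting_spans(examples):
    """Return (text, span_a, span_b) for spans on the same text that
    overlap in character offsets but differ in label or answer."""
    spans_by_text = defaultdict(list)
    for eg in examples:
        for span in eg.get("spans", []):
            spans_by_text[eg["text"]].append(
                (span["start"], span["end"], span["label"], eg.get("answer"))
            )

    conflicts = []
    for text, spans in spans_by_text.items():
        # Drop exact duplicates while keeping the original order.
        unique_spans = list(dict.fromkeys(spans))
        for a, b in combinations(unique_spans, 2):
            overlaps = a[0] < b[1] and b[0] < a[1]
            if overlaps and a[2:] != b[2:]:
                conflicts.append((text, a, b))
    return conflicts


# Made-up records in the assumed layout, mirroring the '!ORG' vs 'PERSON' case.
examples = [
    {"text": "Acme hired Bob", "answer": "reject",
     "spans": [{"start": 0, "end": 4, "label": "ORG"}]},
    {"text": "Acme hired Bob", "answer": "accept",
     "spans": [{"start": 0, "end": 4, "label": "PERSON"}]},
    {"text": "Acme hired Bob", "answer": "accept",
     "spans": [{"start": 11, "end": 14, "label": "PERSON"}]},
]

conflicts = find_conflicting_spans(examples)
for text, a, b in conflicts:
    print(repr(text), a, b)
```

With the sample records this reports the ORG/PERSON pair on characters 0 to 4 and leaves the non-overlapping span alone.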
How could I fix the dataset?