Hi,
I have successfully merged two datasets without any errors. The datasets were annotated by two different annotators using ner.manual.
I then tried to train a model on this merged dataset with ner.batch-train, using the following command:
prodigy ner.batch-train dataset en_core_web_lg --output ./tmp/model --n-iter 10 --eval-split 0.2 --dropout 0.2
We keep encountering the following error, which is hard to interpret. Could you please help us understand it?
Loaded model en_core_web_lg
Using 20% of accept/reject examples (21) for evaluation
Using 100% of remaining examples (1373) for training
Dropout: 0.2 Batch size: 4 Iterations: 10
BEFORE 0.017
Correct 3
Incorrect 177
Entities 434
Unknown 431
# LOSS RIGHT WRONG ENTS SKIP ACCURACY
43%|███████████████████████████████████████████████████████████████████████████████████▊ | 584/1373 [00:05<00:07, 101.96it/s]['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-REQNUM', 'I-REQNUM', 'L-REQNUM', 'O', 'O', 'B-PLACE', 'I-PLACE', 'I-PLACE', 'I-PLACE', 'I-PLACE', 'I-PLACE', 'I-PLACE', 'I-PLACE', 'I-PLACE', 'L-PLACE', 'O', 'O', 'B-ISSUE', 'I-ISSUE', 'I-ISSUE', 'I-ISSUE', 'L-ISSUE', 'O']
['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-REQNUM', 'I-REQNUM', 'L-REQNUM', 'O', 'O', 'O', 'O', 'O', 'B-PHONE', 'I-PHONE', 'I-PHONE', 'I-PHONE', 'I-PHONE', 'I-PHONE', 'I-PHONE', 'I-PHONE', 'L-PHONE', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
['O', 'O', 'O', 'U-ORG', 'O', 'O', 'O', 'U-DATE', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
Traceback (most recent call last):
File "/anaconda3/envs/venv36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/anaconda3/envs/venv36/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/anaconda3/envs/venv36/lib/python3.6/site-packages/prodigy/__main__.py", line 380, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 212, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/anaconda3/envs/venv36/lib/python3.6/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/anaconda3/envs/venv36/lib/python3.6/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/anaconda3/envs/venv36/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 621, in batch_train
examples, batch_size=batch_size, drop=dropout, beam_width=beam_width
File "cython_src/prodigy/models/ner.pyx", line 362, in prodigy.models.ner.EntityRecognizer.batch_train
File "cython_src/prodigy/models/ner.pyx", line 453, in prodigy.models.ner.EntityRecognizer._update
File "cython_src/prodigy/models/ner.pyx", line 446, in prodigy.models.ner.EntityRecognizer._update
File "cython_src/prodigy/models/ner.pyx", line 447, in prodigy.models.ner.EntityRecognizer._update
File "/anaconda3/envs/venv36/lib/python3.6/site-packages/spacy/language.py", line 457, in update
proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
File "nn_parser.pyx", line 413, in spacy.syntax.nn_parser.Parser.update
File "nn_parser.pyx", line 519, in spacy.syntax.nn_parser.Parser._init_gold_batch
File "transition_system.pyx", line 86, in spacy.syntax.transition_system.TransitionSystem.get_oracle_sequence
File "transition_system.pyx", line 148, in spacy.syntax.transition_system.TransitionSystem.set_costs
ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means the GoldParse was not correct. For example, are all labels added to the model?
We are able to train the model on one dataset, export it, and retrain the exported model on the second dataset. The problem only arises when we train on the merged dataset.
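Since each dataset trains fine on its own, one guess on our side (not confirmed) is that the two annotators marked the same text with spans that overlap or disagree, and that this only shows up once both sets of annotations are combined. Below is a minimal sketch of how this could be checked. It assumes the merged dataset has been exported to a list of dicts with "text", "spans" (each with "start", "end", "label") and optionally "answer" keys; the function name find_conflicting_spans is made up for illustration and is not part of Prodigy.

```python
from collections import defaultdict


def find_conflicting_spans(examples):
    """Group accepted spans by text and report spans that overlap each other.

    `examples` is assumed to be a list of dicts with "text", "spans" and
    (optionally) "answer" keys. This is a hypothetical helper, not a Prodigy API.
    """
    spans_by_text = defaultdict(set)
    for eg in examples:
        # Only look at accepted annotations; a missing answer is treated as accept.
        if eg.get("answer", "accept") != "accept":
            continue
        for span in eg.get("spans", []):
            # Using a set drops spans that both annotators marked identically.
            spans_by_text[eg["text"]].add((span["start"], span["end"], span["label"]))

    conflicts = {}
    for text, spans in spans_by_text.items():
        clashes = []
        furthest = None  # span with the largest end offset seen so far
        for span in sorted(spans):
            # A span starting before the furthest end seen so far overlaps it.
            if furthest is not None and span[0] < furthest[1]:
                clashes.append((furthest, span))
            if furthest is None or span[1] > furthest[1]:
                furthest = span
        if clashes:
            conflicts[text] = clashes
    return conflicts


if __name__ == "__main__":
    merged = [
        {"text": "a b c", "spans": [{"start": 0, "end": 3, "label": "ORG"}], "answer": "accept"},
        {"text": "a b c", "spans": [{"start": 2, "end": 5, "label": "PLACE"}], "answer": "accept"},
    ]
    for text, clashes in find_conflicting_spans(merged).items():
        print(text, clashes)
```

If this reports nothing for the merged data, overlapping spans are probably not the cause, and the problem lies elsewhere.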
Also, could you please help us understand what the numbers 584/1373 mean in the following progress bar?
(If 1373 is meant to be the number of annotations, it does not match our count: the merged dataset contains more than 1373 annotations.)
43%|███████████████████████████████████████████████████████████████████████████████████▊ | 584/1373
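To make the count question concrete: "number of annotations" could mean several things, and we are not sure which one the progress bar reports. The sketch below shows three candidate counts on the same assumed export format (a list of dicts with "text" and "spans" keys). The function name summarize_counts is hypothetical and only for illustration.

```python
def summarize_counts(examples):
    """Return three different ways of counting a dataset.

    `examples` is assumed to be a list of dicts with "text" and "spans" keys.
    This is a hypothetical helper, not a Prodigy API.
    """
    unique_texts = {eg["text"] for eg in examples}
    total_spans = sum(len(eg.get("spans", [])) for eg in examples)
    return {
        "records": len(examples),           # one per saved annotation task
        "unique_texts": len(unique_texts),  # the same text annotated twice counts once
        "spans": total_spans,               # individual highlighted entities
    }


if __name__ == "__main__":
    merged = [
        {"text": "a b c", "spans": [{"start": 0, "end": 1, "label": "ORG"}]},
        {"text": "a b c", "spans": [{"start": 2, "end": 3, "label": "PLACE"}]},
        {"text": "x y", "spans": []},
    ]
    print(summarize_counts(merged))
```

If the same texts appear in both annotators' datasets, the number of unique texts will be smaller than the number of records, which might account for a lower total than we expected.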
Thanks.