Hi team, I frequently run into this error when trying to train NER models on annotation data collected with the ner.correct recipe.
ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means that the model can't be updated in a way that's valid and satisfies the correct annotations specified in the GoldParse. For example, are all labels added to the model? If you're training a named entity recognizer, also make sure that none of your annotated entity spans have leading or trailing whitespace or punctuation. You can also use the experimental debug-data command to validate your JSON-formatted training data.
Things I have tried so far -
No luck. Does anyone have any tips on how I can move forward?
spacy : 2.3.0
prodigy : 1.10.8
P.S. - I never used to face this issue before: Prodigy would just throw a warning for misaligned tokens and carry on. Now I am stuck and unable to proceed. All of this started happening recently after some updates.
Hi! That's definitely strange. Do you know what changed, i.e. did you upgrade Prodigy? Also, when you ran the check for misaligned spans with Doc.char_span, did it turn up any spans that were misaligned or otherwise problematic? Finally, can you export your data with data-to-spacy and run spaCy's debug-data over it, and see if that turns up anything that looks suspicious?
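For reference, a minimal sketch of that alignment check: Doc.char_span returns None when the character offsets of a span don't line up with token boundaries, so you can loop over your annotated spans and flag the ones that come back as None. (The example text and offsets below are made up for illustration; swap in your own records and labels.)

```python
import spacy

# A blank English pipeline is enough here - we only need the tokenizer.
nlp = spacy.blank("en")

def find_misaligned(text, spans):
    """Return the (start, end, label) spans whose character offsets
    don't align with token boundaries in the given text."""
    doc = nlp(text)
    bad = []
    for start, end, label in spans:
        # char_span returns None if (start, end) doesn't match token boundaries
        if doc.char_span(start, end, label=label) is None:
            bad.append((start, end, label))
    return bad

# Example: "hel" cuts the token "hello" in half, so it's misaligned;
# "hello" (0, 5) lines up with a token and is fine.
print(find_misaligned("hello world", [(0, 3, "ORG"), (0, 5, "ORG")]))
```

Running this over every example in your exported JSONL should tell you exactly which annotations (if any) the tokenizer can't reconcile.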
**=========================== Data format validation ===========================**
✔ Corpus is loadable
**=============================== Training stats ===============================**
Training pipeline: tagger, parser, ner
Starting with blank model 'en'
4490 training docs
1122 evaluation docs
✔ No overlap between training and evaluation data
**============================== Vocab & Vectors ==============================**
ℹ 184242 total words in the data (13764 unique)
ℹ No word vectors present in the model
**========================== Named Entity Recognition ==========================**
ℹ 6 new labels, 0 existing labels
0 missing values (tokens with '-' label)
⚠ 1 entity span(s) with punctuation
✔ Good amount of examples for all labels
✔ Examples without occurrences available for all labels
✔ No entities consisting of or starting/ending with whitespace
Entity spans consisting of or starting/ending with punctuation can not be
trained with a noise level > 0.
**=========================== Part-of-speech Tagging ===========================**
ℹ 1 label in data (57 labels in tag map)
✘ Label '-' not found in tag map for language 'en'
**============================= Dependency Parsing =============================**
ℹ Found 184242 sentences with an average length of 1.0 words.
ℹ 1 label in train data
ℹ 1 label in projectivized train data
**================================== Summary ==================================**
✔ 5 checks passed
⚠ 1 warning
✘ 1 error
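The debug-data warning about "1 entity span(s) with punctuation" points at the likely culprit for E024. One way to clean that up before retraining is to shrink any span whose offsets start or end on punctuation or whitespace - a hypothetical helper sketch, assuming your annotations use Prodigy-style character offsets:

```python
import string

# Characters that must not appear at a span boundary for NER training.
_BOUNDARY_CHARS = string.punctuation + string.whitespace

def strip_span_boundaries(text, start, end):
    """Shrink (start, end) offsets so the span no longer starts or
    ends with punctuation or whitespace. Returns the new offsets."""
    while start < end and text[start] in _BOUNDARY_CHARS:
        start += 1
    while end > start and text[end - 1] in _BOUNDARY_CHARS:
        end -= 1
    return start, end

# Example: a span that accidentally includes trailing "." and "!"
text = "Acme, Inc.!"
print(strip_span_boundaries(text, 0, 11))  # offsets now cover "Acme, Inc"
```

After trimming, it's worth re-running debug-data to confirm the punctuation warning (and ideally the '-' tag-map error, which suggests a stray placeholder label in the exported data) is gone.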