Hi team, I frequently run into this error when trying to train NER models on annotation data collected with the ner.correct recipe.
ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means that the model can't be updated in a way that's valid and satisfies the correct annotations specified in the GoldParse. For example, are all labels added to the model? If you're training a named entity recognizer, also make sure that none of your annotated entity spans have leading or trailing whitespace or punctuation. You can also use the experimental debug-data command to validate your JSON-formatted training data.
Things I have tried so far -
No luck. Does anyone have any tips on how I can move forward?
spacy : 2.3.0
prodigy : 1.10.8
P.S. - I never used to face this issue before: Prodigy would just throw a warning for misaligned tokens and carry on. Now I am stuck and unable to proceed. All of this started happening recently after some updates.
Hi! That's definitely strange. Do you know what changed, i.e. did you upgrade Prodigy? Also, when you ran the check for misaligned spans with Doc.char_span, did it turn up any spans that were misaligned or otherwise problematic? Finally, can you export your data with data-to-spacy and run spaCy's debug-data over it, and see if that turns up anything that looks suspicious?
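For reference, a minimal sketch of that alignment check: Doc.char_span returns None when the character offsets of a span don't line up with token boundaries, so you can loop over your annotated spans and flag the ones that come back as None. (The example text and offsets below are made up for illustration; swap in your own records and labels.)

```python
import spacy

# A blank English pipeline is enough here - we only need the tokenizer.
nlp = spacy.blank("en")

def find_misaligned(text, spans):
    """Return the (start, end, label) spans whose character offsets
    don't align with token boundaries in the given text."""
    doc = nlp(text)
    bad = []
    for start, end, label in spans:
        # char_span returns None if (start, end) doesn't match token boundaries
        if doc.char_span(start, end, label=label) is None:
            bad.append((start, end, label))
    return bad

# Example: "hel" cuts the token "hello" in half, so it's misaligned;
# "hello" (0, 5) lines up with a token and is fine.
print(find_misaligned("hello world", [(0, 3, "ORG"), (0, 5, "ORG")]))
```

Running this over every example in your exported JSONL should tell you exactly which annotations (if any) the tokenizer can't reconcile.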
**=========================== Data format validation ===========================**
✔ Corpus is loadable
**=============================== Training stats ===============================**
Training pipeline: tagger, parser, ner
Starting with blank model 'en'
4490 training docs
1122 evaluation docs
✔ No overlap between training and evaluation data
**============================== Vocab & Vectors ==============================**
ℹ 184242 total words in the data (13764 unique)
ℹ No word vectors present in the model
**========================== Named Entity Recognition ==========================**
ℹ 6 new labels, 0 existing labels
0 missing values (tokens with '-' label)
⚠ 1 entity span(s) with punctuation
✔ Good amount of examples for all labels
✔ Examples without occurrences available for all labels
✔ No entities consisting of or starting/ending with whitespace
Entity spans consisting of or starting/ending with punctuation can not be
trained with a noise level > 0.
**=========================== Part-of-speech Tagging ===========================**
ℹ 1 label in data (57 labels in tag map)
✘ Label '-' not found in tag map for language 'en'
**============================= Dependency Parsing =============================**
ℹ Found 184242 sentences with an average length of 1.0 words.
ℹ 1 label in train data
ℹ 1 label in projectivized train data
**================================== Summary ==================================**
✔ 5 checks passed
⚠ 1 warning
✘ 1 error
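The debug-data warning about "1 entity span(s) with punctuation" points at the likely culprit for E024. One way to clean that up before retraining is to shrink any span whose offsets start or end on punctuation or whitespace - a hypothetical helper sketch, assuming your annotations use Prodigy-style character offsets:

```python
import string

# Characters that must not appear at a span boundary for NER training.
_BOUNDARY_CHARS = string.punctuation + string.whitespace

def strip_span_boundaries(text, start, end):
    """Shrink (start, end) offsets so the span no longer starts or
    ends with punctuation or whitespace. Returns the new offsets."""
    while start < end and text[start] in _BOUNDARY_CHARS:
        start += 1
    while end > start and text[end - 1] in _BOUNDARY_CHARS:
        end -= 1
    return start, end

# Example: a span that accidentally includes trailing "." and "!"
text = "Acme, Inc.!"
print(strip_span_boundaries(text, 0, 11))  # offsets now cover "Acme, Inc"
```

After trimming, it's worth re-running debug-data to confirm the punctuation warning (and ideally the '-' tag-map error, which suggests a stray placeholder label in the exported data) is gone.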