Recipe ner.batch-train results in ValueError: [E030]

Hello, I have updated Prodigy from 1.7.1 to 1.8.0, as well as spaCy to the latest version 2.1.4. I have also downloaded the latest version of en_vectors_web_lg (2.1.0), but when I try to train a model using the ner.batch-train recipe, I get the following error: "ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].is_sent_start."

Interestingly, I was able to use this recipe successfully with the older versions of spaCy/Prodigy.

I would really appreciate any help or suggestions on how to solve this error without rolling back to the previous versions.

Thank you very much.

Hi! That’s definitely strange – I just had a look and the ner.batch-train recipe should add the "sentencizer" component automatically if it’s not present in the model’s pipeline :thinking: Could you post the full traceback of where the error is raised?

And what happens if you create your own version of the base model with the sentencizer pre-added? Like this:

import spacy

# Load the vectors-only model, add a sentencizer to its pipeline
# and save the result to disk so it can be used as a base model
nlp = spacy.load("en_vectors_web_lg")
sentencizer = nlp.create_pipe("sentencizer")
nlp.add_pipe(sentencizer)
nlp.to_disk("/path/to/en_vectors_with_sentencizer")

Hi Ines, thank you for getting back to me so fast. So…
The full traceback looks like this:

Loaded model en_vectors_web_lg
Using 20% of accept/reject examples (681) for evaluation
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/d/Projects/Coach/.venv/lib/python3.7/site-packages/prodigy/__main__.py", line 380, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 212, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/Users/d/Projects/Coach/.venv/lib/python3.7/site-packages/plac_core.py", line 328, in __call__
    cmd, result = parser.consume(arglist)
  File "/Users/d/Projects/Coach/.venv/lib/python3.7/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/d/Projects/Coach/.venv/lib/python3.7/site-packages/prodigy/recipes/ner.py", line 602, in batch_train
    examples = list(split_sentences(model.orig_nlp, examples))
  File "cython_src/prodigy/components/preprocess.pyx", line 45, in split_sentences
  File "doc.pyx", line 595, in sents
ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].is_sent_start.

When I created my own version of the base model with the sentencizer as you suggested, I still see the same error:

Loaded model en_vectors_with_sentencizer
Using 20% of accept/reject examples (681) for evaluation
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/d/Projects/Coach/.venv/lib/python3.7/site-packages/prodigy/__main__.py", line 380, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 212, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/Users/d/Projects/Coach/.venv/lib/python3.7/site-packages/plac_core.py", line 328, in __call__
    cmd, result = parser.consume(arglist)
  File "/Users/d/Projects/Coach/.venv/lib/python3.7/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/d/Projects/Coach/.venv/lib/python3.7/site-packages/prodigy/recipes/ner.py", line 602, in batch_train
    examples = list(split_sentences(model.orig_nlp, examples))
  File "cython_src/prodigy/components/preprocess.pyx", line 45, in split_sentences
  File "doc.pyx", line 595, in sents
ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].is_sent_start.

Thanks! :+1: Btw, assuming that your training examples are sentences (and not long paragraphs etc.), you could probably work around this issue by setting the --unsegmented flag, which should skip the sentence splitting.

Also, to get to the bottom of this, could you double-check one more thing for me? When you export your dataset, are there any single-token (e.g. one word) examples in there?

I just tested ner.batch-train with the large vectors model and the good news is, the sentence splitting does seem to work with the sentencizer. However, the is_sentenced check in spaCy (whether sentence boundaries have been applied) currently has one limitation: because the first token’s is_sent_start always defaults to True, it can’t tell whether boundaries have been applied if there’s only one token. We want to solve this in the future by rewriting the way sentence boundaries are stored in spaCy – but for now, this might explain why you’re seeing the error here.

Hi Ines, thank you so much for your quick reply and support around this issue.
My training data consists mostly of sentences of around 20-200 words, so will the --unsegmented flag work OK for texts of that size?

To answer your question: yes, I think I do have single-token samples. Here is a sample of my training data:

{
  "text": "Hola everyone! It's big lunch wed. Serving from 11am to 4pm. 3061 Riverside dr. 90027 (under the bridge) Ricky.",
  "_input_hash": -91823319,
  "_task_hash": 1505548957,
  "tokens": [
    {
      "text": "Hola",
      "start": 0,
      "end": 4,
      "id": 0
    },
    {
      "text": "everyone",
      "start": 5,
      "end": 13,
      "id": 1
    },
    {
      "text": "!",
      "start": 13,
      "end": 14,
      "id": 2
    },
    {
      "text": "It",
      "start": 15,
      "end": 17,
      "id": 3
    },
    {
      "text": "'s",
      "start": 17,
      "end": 19,
      "id": 4
    },
    {
      "text": "big",
      "start": 20,
      "end": 23,
      "id": 5
    },
    {
      "text": "lunch",
      "start": 24,
      "end": 29,
      "id": 6
    },
    {
      "text": "we",
      "start": 30,
      "end": 32,
      "id": 7
    },
    {
      "text": "d",
      "start": 32,
      "end": 33,
      "id": 8
    },
    {
      "text": ".",
      "start": 33,
      "end": 34,
      "id": 9
    },
    {
      "text": "Serving",
      "start": 35,
      "end": 42,
      "id": 10
    },
    {
      "text": "from",
      "start": 43,
      "end": 47,
      "id": 11
    },
    {
      "text": "11",
      "start": 48,
      "end": 50,
      "id": 12
    },
    {
      "text": "am",
      "start": 50,
      "end": 52,
      "id": 13
    },
    {
      "text": "to",
      "start": 53,
      "end": 55,
      "id": 14
    },
    {
      "text": "4",
      "start": 56,
      "end": 57,
      "id": 15
    },
    {
      "text": "pm",
      "start": 57,
      "end": 59,
      "id": 16
    },
    {
      "text": ".",
      "start": 59,
      "end": 60,
      "id": 17
    },
    {
      "text": "3061",
      "start": 61,
      "end": 65,
      "id": 18
    },
    {
      "text": "Riverside",
      "start": 66,
      "end": 75,
      "id": 19
    },
    {
      "text": "dr",
      "start": 76,
      "end": 78,
      "id": 20
    },
    {
      "text": ".",
      "start": 78,
      "end": 79,
      "id": 21
    },
    {
      "text": "90027",
      "start": 80,
      "end": 85,
      "id": 22
    },
    {
      "text": "(",
      "start": 86,
      "end": 87,
      "id": 23
    },
    {
      "text": "under",
      "start": 87,
      "end": 92,
      "id": 24
    },
    {
      "text": "the",
      "start": 93,
      "end": 96,
      "id": 25
    },
    {
      "text": "bridge",
      "start": 97,
      "end": 103,
      "id": 26
    },
    {
      "text": ")",
      "start": 103,
      "end": 104,
      "id": 27
    },
    {
      "text": "Ricky",
      "start": 105,
      "end": 110,
      "id": 28
    },
    {
      "text": ".",
      "start": 110,
      "end": 111,
      "id": 29
    }
  ],
  "spans": [
    {
      "start": 48,
      "end": 52,
      "token_start": 12,
      "token_end": 13,
      "label": "start_time"
    },
    {
      "start": 56,
      "end": 59,
      "token_start": 15,
      "token_end": 16,
      "label": "end_time"
    },
    {
      "start": 61,
      "end": 79,
      "token_start": 18,
      "token_end": 21,
      "label": "address"
    },
    {
      "start": 80,
      "end": 85,
      "token_start": 22,
      "token_end": 22,
      "label": "zip"
    }
  ],
  "answer": "accept"
}

So I tried to train with the --unsegmented flag and got the following error:

Loaded model en_vectors_web_lg
Using 20% of accept/reject examples (681) for evaluation
Using 100% of remaining examples (2726) for training
Dropout: 0.2  Batch size: 16  Iterations: 10  


BEFORE      0.000              
Correct     0    
Incorrect   1495
Entities    0                  
Unknown     0                  

#            LOSS         RIGHT        WRONG        ENTS         SKIP         ACCURACY  
 13%|████████████████████████████                                                                                                                                                                                             | 352/2726 [00:02<00:18, 130.20it/s]['O', 'O', 'O', 'O', 'U-city', 'O', 'B-address', 'I-address', 'I-address', 'L-address', 'O', 'U-time_range', 'O', 'O', 'O', 'O', 'O', 'U-location', 'O', 'O', 'O', 'B-address', 'I-address', 'I-address', 'L-address', 'O', 'U-time_range', 'O']
['O', 'O', 'O', 'U-date', 'U-truck', 'O', 'O', 'O', 'U-city', 'B-address', 'L-address', 'O', 'B-city', 'L-city', 'U-zip', 'O', 'U-time_range', 'O']
['O', 'O', 'O', 'O', 'O', 'U-date', 'B-time_range', 'I-time_range', 'I-time_range', 'L-time_range', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-address', 'I-address', 'L-address', 'O', 'O', 'B-location', 'I-location', 'L-location', 'O']
['O', 'O', 'U-date', 'O', 'O', 'O', 'O', 'O', 'O', 'B-address', 'I-address', 'I-address', 'L-address', 'O', 'B-city', 'L-city', 'U-state', 'U-zip', 'O', 'O', 'B-time_range', 'I-time_range', 'L-time_range', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
['U-address', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-intersection', 'U-truck']
['U-!time_range', 'U-!time_range', 'U-!start_time', 'U-!location', 'U-!start_time', 'U-!end_time', 'U-!end_time', 'U-!address', 'U-!zip', 'U-!address', 'U-!city', 'U-!location', 'U-!start_time', 'U-!address', 'U-!city', 'U-!end_time', 'U-!end_time', 'U-!location', 'U-!end_time', 'U-!end_time', 'U-!end_time', 'U-!end_time', 'U-!end_time', 'O']
['U-address', 'U-zip', 'U-zip', 'U-zip', 'U-intersection', 'U-truck']
['O', 'O', 'O', 'O', 'O', 'B-address', 'I-address', 'I-address', 'L-address', 'U-date', 'O', 'O', 'U-city', 'O', 'O']
['U-address', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-intersection', 'U-truck']
['U-address', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-zip', 'U-intersection', 'U-truck']
['U-address', 'U-zip', 'U-intersection', 'U-truck']
['U-address', 'U-zip', 'U-intersection', 'U-truck']
['O', 'B-time_range', 'I-time_range', 'L-time_range', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'U-truck', 'O', 'O', 'U-city', 'O', 'O', 'B-time_range', 'I-time_range', 'L-time_range', 'O', 'B-address', 'I-address', 'L-address', 'O', 'O']
['O', 'O', 'O', 'U-date', 'O', 'O', 'O', 'O', 'U-location', 'O', 'B-intersection', 'I-intersection', 'I-intersection', 'L-intersection', 'U-zip', 'O', 'B-time_range', 'I-time_range', 'L-time_range', 'O', 'O', 'O']
['O', 'O', 'O', 'B-date', 'L-date', 'O', 'O', 'O', 'U-truck', 'U-truck', 'U-truck', 'U-truck', 'U-truck', 'O', 'O']
['O', 'U-date', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-time_range', 'I-time_range', 'I-time_range', 'L-time_range', 'O', 'B-address', 'I-address', 'L-address', 'O', 'O', 'B-location', 'I-location', 'L-location', 'O', 'O', 'O']
Traceback (most recent call last):                                                                                                                                                                                                                                
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/d/Projects/Coach/.venv/lib/python3.7/site-packages/prodigy/__main__.py", line 380, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 212, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/Users/d/Projects/Coach/.venv/lib/python3.7/site-packages/plac_core.py", line 328, in __call__
    cmd, result = parser.consume(arglist)
  File "/Users/d/Projects/Coach/.venv/lib/python3.7/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/d/Projects/Coach/.venv/lib/python3.7/site-packages/prodigy/recipes/ner.py", line 621, in batch_train
    examples, batch_size=batch_size, drop=dropout, beam_width=beam_width
  File "cython_src/prodigy/models/ner.pyx", line 362, in prodigy.models.ner.EntityRecognizer.batch_train
  File "cython_src/prodigy/models/ner.pyx", line 453, in prodigy.models.ner.EntityRecognizer._update
  File "cython_src/prodigy/models/ner.pyx", line 446, in prodigy.models.ner.EntityRecognizer._update
  File "cython_src/prodigy/models/ner.pyx", line 447, in prodigy.models.ner.EntityRecognizer._update
  File "/Users/dlukianenko/Projects/Foodtrucks/Coach/.venv/lib/python3.7/site-packages/spacy/language.py", line 457, in update
    proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
  File "nn_parser.pyx", line 413, in spacy.syntax.nn_parser.Parser.update
  File "nn_parser.pyx", line 519, in spacy.syntax.nn_parser.Parser._init_gold_batch
  File "transition_system.pyx", line 86, in spacy.syntax.transition_system.TransitionSystem.get_oracle_sequence
  File "transition_system.pyx", line 148, in spacy.syntax.transition_system.TransitionSystem.set_costs
ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means the GoldParse was not correct. For example, are all labels added to the model?

The --unsegmented flag only means that Prodigy won’t apply the sentence segmenter to split your texts into sentences. If your examples are already pre-segmented, this is fine – but if your data contains lots of really long texts, you probably want to split them, because otherwise training may be slow and the long texts may throw off the model. So it should be fine in your case.

Ahh, I meant examples that consist of only one token. So basically, where "text" has only one word. Do you find any of those as well?
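If you want to scan an exported dataset for those programmatically, something like this should work. This is a minimal sketch, not part of Prodigy: `find_single_token_examples` is a hypothetical helper name, and `dataset.jsonl` stands in for whatever file your `db-out` export produced.

```python
import json

def find_single_token_examples(jsonl_path):
    """Yield (line_number, text) for every example in a Prodigy JSONL
    export whose "tokens" list contains exactly one entry."""
    with open(jsonl_path, encoding="utf8") as f:
        for i, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue  # skip blank lines in the export
            eg = json.loads(line)
            if len(eg.get("tokens", [])) == 1:
                yield i, eg.get("text", "")

# Example usage:
# for line_no, text in find_single_token_examples("dataset.jsonl"):
#     print(line_no, repr(text))
```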

Do any of the entity spans you’ve annotated start or end on whitespace characters? In spaCy v2.1, it’s now “illegal” for the named entity recognizer to predict entities that start or end with whitespace, or consist of only whitespace – for example, "\n", but also "hello\n". This should be a really helpful change, because those entities are pretty much always wrong, and making them “illegal” limits the options and moves the entity recognizer towards correct predictions. But it also means that if your data contains training examples like this, you probably want to remove or fix them.
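To find those spans in a single exported example, a small helper like this could work – a sketch only, with `whitespace_boundary_spans` as an illustrative name (it’s not part of Prodigy or spaCy):

```python
def whitespace_boundary_spans(eg):
    """Return the annotated spans in a Prodigy example dict that start
    or end on a whitespace character, or consist only of whitespace --
    the kind of span spaCy v2.1's NER now treats as illegal."""
    text = eg["text"]
    bad = []
    for span in eg.get("spans", []):
        span_text = text[span["start"]:span["end"]]
        # strip() changes the text iff it starts or ends with whitespace
        # (a whitespace-only span strips down to the empty string)
        if span_text != span_text.strip():
            bad.append(span)
    return bad
```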

So yes, I do have annotations where "text" is only one word or a symbol like an emoji, e.g.:

{
  "text": "\n.",
  "_input_hash": -1423933053,
  "_task_hash": -453702377,
  "spans": [
    {
      "token_start": 0,
      "token_end": 0,
      "start": 0,
      "end": 1,
      "text": "\n",
      "label": "location",
      "source": "locations",
      "input_hash": -1423933053
    },
    {
      "token_start": 1,
      "token_end": 1,
      "start": 1,
      "end": 2,
      "text": ".",
      "label": "time_range",
      "source": "locations",
      "input_hash": -1423933053
    }
  ],
  "tokens": [
    {
      "text": "\n",
      "start": 0,
      "end": 1,
      "id": 0
    },
    {
      "text": ".",
      "start": 1,
      "end": 2,
      "id": 1
    }
  ],
  "answer": "reject"
}

and I do have spans that start with "\n", as in the sample above.

So should I clean up the annotations by removing spans that have "\n" at their start/end position?
Also, should I remove the annotations where I have a single-word token?

That’d be the easiest solution, yes. I think you should also be able to change it to "answer": "ignore" for those examples, instead of deleting them. You can use the db-out command to export the data as JSONL, edit the file and then re-import it to a fresh set using db-in. So you’ll also always have a copy of the original dataset and don’t lose any information.
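The editing step in between could look something like this – a minimal sketch that flips the answer to "ignore" for the two problem cases discussed above. `mark_ignored` is just an illustrative name, and you’d adjust the criteria and file paths to your data:

```python
import json

def mark_ignored(in_path, out_path):
    """Copy a Prodigy JSONL export, setting "answer": "ignore" on
    examples that have only a single token, or an annotated span that
    starts or ends on whitespace. All other examples pass through
    unchanged, so no information is lost."""
    with open(in_path, encoding="utf8") as f_in, \
         open(out_path, "w", encoding="utf8") as f_out:
        for line in f_in:
            eg = json.loads(line)
            text = eg.get("text", "")
            whitespace_span = any(
                text[s["start"]:s["end"]] != text[s["start"]:s["end"]].strip()
                for s in eg.get("spans", [])
            )
            if len(eg.get("tokens", [])) == 1 or whitespace_span:
                eg["answer"] = "ignore"
            f_out.write(json.dumps(eg) + "\n")
```

The cleaned file can then be re-imported to a fresh dataset with db-in, as described above.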

@ines I’m running into a similar issue, but so far as I can see there are no errant newlines in my data. Are there other characters that are banned in a span?

@oneextrafact Have you checked for other types of whitespace, like regular spaces?

Yes, that was it. The advice you gave here was very helpful for removing them, and after that everything worked fine. Thanks!!
