I then started batch training:
python3 -m prodigy ner.batch-train resume_ner en_core_web_lg --output resume-model --label "SKILL,ROLE,EMPLOYER" --eval-split 0.2 --n-iter 6 --batch-size 8
which results in:
ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means the GoldParse was not correct. For example, are all labels added to the model?
Hi! Did you use the same Prodigy version for annotation and training, or did you collect the annotations in a previous version?
It's likely that this is related to the update to spaCy v2.1, which is stricter about gold standard data and constraints for the parser and named entity recognizer. See my reply from this thread:
So you might want to double-check the data and see if you have any "illegal" spans in there. It's usually pretty rare and removing them should be no problem, because in most cases, they'd be rejected suggestions anyway.
Thanks for the prompt response – it's very helpful.
In my understanding, in the trained model I annotated ‘Artificial Intelligence’ and ‘Machine Learning’ as single entities under the label SKILL. Is this the reason for the error?
If so, is there any way to handle such multi-word entities / tokens?
Multi-word entities are no problem – in fact, this is one of the key features of NER.
But spaCy now explicitly raises errors for spans that start or end with whitespace characters, or consist of only whitespace. So "Artificial Intelligence" is totally fine – but an annotated entity for "\nArtificial Intelligence" or "\n" would be invalid.
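To make that concrete, here's a hypothetical record in Prodigy's JSONL format (the text and offsets are made up for illustration). The first span slices to "\nArtificial Intelligence" because the highlight accidentally includes the leading newline, so spaCy v2.1 would reject it; the second span is the valid version of the same entity:
{"text": "Skills:\nArtificial Intelligence", "spans": [{"start": 7, "end": 31, "label": "SKILL"}]}
{"text": "Skills:\nArtificial Intelligence", "spans": [{"start": 8, "end": 31, "label": "SKILL"}]}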
I completed training the model, but it gives the error below when I try to check the model's accuracy.
ValueError: [E024] Could not find an optimal move to supervise the parser. Usually, this means the GoldParse was not correct. For example, are all labels added to the model?
I found that the error is caused by whitespace.
Is there any way to remove the whitespace? Please send me a reference link or code.
You can export your dataset by running the db-out command and then check the JSONL file:
prodigy db-out resume_ner > resume_ner.jsonl
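To spot the problematic records in that export, a quick sketch like the following should work – it assumes the standard Prodigy record format with "text" and "spans", and the file name from the command above, and just prints any annotated entity that starts or ends with whitespace (or is whitespace only):
import json

with open("resume_ner.jsonl", encoding="utf8") as f:
    for i, line in enumerate(f):
        eg = json.loads(line)
        for span in eg.get("spans", []):
            entity = eg["text"][span["start"]:span["end"]]
            # flag entities with leading/trailing whitespace, or whitespace-only entities
            if entity != entity.strip():
                print(i, repr(entity), span)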
After you’ve removed the problematic spans or have corrected them, you can then reimport the data to a new dataset:
prodigy db-in resume_ner_fixed resume_ner.jsonl
You can probably also write a script to find the problematic entities automatically and then exclude them, and add the result to a new dataset. I haven’t tested this yet, but something like this should work:
from prodigy.components.db import connect

db = connect()
examples = db.get_dataset("resume_ner")
fixed_examples = []

def is_whitespace_entity(text):
    # True if the entity text starts or ends with whitespace,
    # or consists of a single whitespace character
    whitespace = (" ", "\n")  # etc.
    if text.startswith(whitespace) or text.endswith(whitespace):
        return True
    for char in whitespace:
        if text == char:
            return True
    return False

for eg in examples:
    new_spans = []
    for span in eg.get("spans", []):
        entity = eg["text"][span["start"]:span["end"]]
        if not is_whitespace_entity(entity):  # keep only valid spans
            new_spans.append(span)
    eg["spans"] = new_spans
    fixed_examples.append(eg)

db.add_dataset("resume_ner_fixed")
db.add_examples(fixed_examples, ["resume_ner_fixed"])
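Once the fixed dataset exists, you should be able to rerun the same batch-train command from before, just pointed at the new dataset name, e.g.:
python3 -m prodigy ner.batch-train resume_ner_fixed en_core_web_lg --output resume-model --label "SKILL,ROLE,EMPLOYER" --eval-split 0.2 --n-iter 6 --batch-size 8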
@pathapatisivayya There’s not really a good general-purpose answer to that. If we could know automatically what would improve accuracy, we’d implement that as a single script.
Some things you could try:
Try running ner.train-curve to see whether accuracy improves as more data is used. If so, you can try continuing to annotate (see the example commands after this list).
If your annotations are complete, you can try adding the --no-missing flag. If they’re not complete, you can try running ner.silver-to-gold to make sure there are no missing entities.
You can try starting from a blank model, or training word vectors on your data.
You can try using spacy pretrain to learn an initial vector representation.
You can try analysing your errors, and either building a rule-based dictionary, or refining your annotation scheme.
You can try training a text classifier to filter out irrelevant texts that might distract your model.
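For the first two points, the commands would look roughly like this, reusing the dataset and model names from earlier in the thread – the exact options may differ between versions, so check prodigy ner.train-curve --help to confirm the available flags:
python3 -m prodigy ner.train-curve resume_ner en_core_web_lg --eval-split 0.2 --n-iter 6
python3 -m prodigy ner.batch-train resume_ner en_core_web_lg --output resume-model --label "SKILL,ROLE,EMPLOYER" --eval-split 0.2 --n-iter 6 --batch-size 8 --no-missing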
I’m sorry but I don’t think I understand your question.
We really can’t provide much project-specific advice, as this crosses past questions of how to use Prodigy into much more general questions about how to solve specific problems with NLP or ML technologies.
If you need urgent help with your project, you might try posting a request to hire a freelancer in the consultants thread: spaCy/prodigy consultants?