You can export your dataset by running the db-out
command and then check the JSONL file:
prodigy db-out resume_ner > resume_ner.jsonl
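If you first want to see which spans are actually problematic, you can scan the exported JSONL for entities with stray leading or trailing whitespace. This is just a sketch, not a built-in Prodigy feature — `flag_whitespace_spans` is a hypothetical helper, and it takes an iterable of lines so you can pass it an open file like `flag_whitespace_spans(open("resume_ner.jsonl"))`:

```python
import json

def flag_whitespace_spans(lines):
    """Collect (text, span) pairs whose span text has stray whitespace."""
    flagged = []
    for line in lines:
        eg = json.loads(line)
        for span in eg.get("spans", []):
            entity = eg["text"][span["start"]:span["end"]]
            if entity != entity.strip():
                flagged.append((eg["text"], span))
    return flagged

# Made-up record: the second span accidentally includes the leading space
rows = [json.dumps({"text": "Jane Doe ACME", "spans": [
    {"start": 0, "end": 8, "label": "PERSON"},  # "Jane Doe" – fine
    {"start": 8, "end": 13, "label": "ORG"},    # " ACME" – flagged
]})]
print(flag_whitespace_spans(rows))
```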
After you’ve removed the problematic spans or have corrected them, you can then reimport the data to a new dataset:
prodigy db-in resume_ner_fixed resume_ner.jsonl
You could probably also write a script that finds the problematic entities automatically, excludes them and adds the result to a new dataset. I haven't tested this yet, but something like the following should work:
```python
from prodigy.components.db import connect

db = connect()
examples = db.get_dataset("resume_ner")
fixed_examples = []

def is_whitespace_entity(text):
    # True if the entity text starts or ends with stray whitespace
    whitespace = (" ", "\n")  # etc.
    return text.startswith(whitespace) or text.endswith(whitespace)

for eg in examples:
    new_spans = []
    for span in eg.get("spans", []):
        # The slice of the text covered by the span
        entity = eg["text"][span["start"]:span["end"]]
        if not is_whitespace_entity(entity):
            new_spans.append(span)
    eg["spans"] = new_spans
    fixed_examples.append(eg)

db.add_dataset("resume_ner_fixed")
db.add_examples(fixed_examples, ["resume_ner_fixed"])
```
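If you want to sanity-check the filtering logic before touching the database, you can run the same idea on a hand-made task dict. The text and spans below are made up, and the `strip()`-based check is just an equivalent shortcut for the whitespace test (it also catches empty and all-whitespace spans):

```python
def is_whitespace_entity(text):
    # Flags spans that are empty, all whitespace, or have stray
    # leading/trailing whitespace
    return not text.strip() or text != text.strip()

# Made-up example mimicking Prodigy's task format
eg = {"text": "Alice  Smith", "spans": [
    {"start": 0, "end": 5},   # "Alice" – kept
    {"start": 5, "end": 7},   # "  " – pure whitespace, dropped
    {"start": 6, "end": 12},  # " Smith" – leading space, dropped
]}
eg["spans"] = [
    span for span in eg["spans"]
    if not is_whitespace_entity(eg["text"][span["start"]:span["end"]])
]
print(eg["spans"])  # only the "Alice" span survives
```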