It seems like you have labels that were misnamed HARDSKILL in your annotations.
The simplest approach is to change those labels. Let's say your annotations are in a Prodigy dataset called ner_dataset. First, export them:
python -m prodigy db-out ner_dataset > ner_dataset.jsonl
Then run a Python script that changes any span labels with HARDSKILL to HARDSKILLS. This isn't the most elegant approach, but it's a simple way to correct them:
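import srsly
examples = list(srsly.read_jsonl("ner_dataset.jsonl"))  # load everything so it can be written back out
for eg in examples:
    for span in eg.get("spans") or []:  # some examples may have no spans
        span["label"] = "HARDSKILLS" if span["label"] == "HARDSKILL" else span["label"]
srsly.write_jsonl("new_ner_dataset.jsonl", examples)  # corrected copy used in the next step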
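Before reloading, it may be worth checking which labels actually occur in a file. Here is a minimal sketch using only the standard library. The helper name count_span_labels is hypothetical, and it assumes each line of the export is a JSON object with an optional "spans" list whose items carry a "label" key:

```python
import json
from collections import Counter


def count_span_labels(path):
    """Count how often each span label occurs in a JSONL file."""
    counts = Counter()
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            eg = json.loads(line)
            # Examples without annotations may have no "spans" key (or a null value).
            for span in eg.get("spans") or []:
                counts[span["label"]] += 1
    return counts
```

Calling count_span_labels("new_ner_dataset.jsonl") and printing the result gives a quick view of every label left in the file, so any remaining misspelling stands out.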
Then load that new .jsonl file into a new dataset:
python -m prodigy db-in new_ner_dataset new_ner_dataset.jsonl
That should work. Now you should be able to train:
python -m prodigy train --ner new_ner_dataset my_model --eval-split 0.2
What's more important, though, is to take note that your labeling process has some gaps (e.g., allowing label misspellings) and to try to find ways to improve it. One way to prevent this in the future is to use a simple .txt file with your label names.
For example, in your main directory, have a file named labels.txt with one label per line.
Now you can run:
python -m prodigy ner.manual ner_dataset blank:en my_input_data.jsonl --label labels.txt
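For data that was annotated before you switched to a label file, you could also compare the exported spans against labels.txt to catch stray labels. This is a rough sketch, assuming labels.txt holds one label per line and the export is JSONL with a "spans" list per example. Both function names are hypothetical and not part of Prodigy:

```python
import json


def read_labels(path):
    """Read one label per line, skipping blank lines."""
    with open(path, encoding="utf8") as f:
        return {line.strip() for line in f if line.strip()}


def find_unknown_labels(jsonl_path, allowed):
    """Return the set of span labels in the file that are not in `allowed`."""
    unknown = set()
    with open(jsonl_path, encoding="utf8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            eg = json.loads(line)
            for span in eg.get("spans") or []:
                if span["label"] not in allowed:
                    unknown.add(span["label"])
    return unknown
```

An empty result from find_unknown_labels("ner_dataset.jsonl", read_labels("labels.txt")) would suggest every span uses a label from the file.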
Relatedly, as I think you're likely using named multi-user sessions, be sure to set the PRODIGY_ALLOWED_SESSIONS environment variable. This prevents users from accidentally mistyping their session name, which can cause similar problems down the road.
For example, if PRODIGY_ALLOWED_SESSIONS=alex,jo, then only ?session=alex and ?session=jo would be allowed and other names would raise an error.
Hope this helps!