Hi @mattdr,
Thanks for your message, and welcome to the Prodigy community!
At a glance, nothing seems wrong with your command's syntax.
First, let's check that your model was trained for the labels you're providing. Can you look at the meta.json file in models/model-last? Does it have:
"ner":[
"MODULE",
"LOGISTICS",
"PRODUCT",
"HR",
"POLICY"
]
How's the performance? You can view the scores in the same meta.json.
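If it's easier than reading the file by eye, here's a minimal sketch that runs both checks. It assumes the spaCy v3 layout for meta.json, where the label list sits under "labels" → "ner" and the overall entity scores under "performance" as ents_p / ents_r / ents_f. The helper names (missing_labels, ner_scores) are just for illustration, so adjust the keys if your file looks different.

```python
import json
from pathlib import Path

# Labels passed to the recipe, copied from the list above.
EXPECTED = ["MODULE", "LOGISTICS", "PRODUCT", "HR", "POLICY"]


def missing_labels(meta, expected):
    """Return the expected labels that are absent from the model's NER labels."""
    # Assumption: spaCy v3 stores component labels under meta["labels"]["ner"].
    trained = set(meta.get("labels", {}).get("ner", []))
    return [label for label in expected if label not in trained]


def ner_scores(meta):
    """Return the overall entity precision/recall/F-score, if they are present."""
    # Assumption: scores live under meta["performance"] with these key names.
    perf = meta.get("performance", {})
    return {key: perf[key] for key in ("ents_p", "ents_r", "ents_f") if key in perf}


if __name__ == "__main__":
    meta_path = Path("models/model-last/meta.json")
    if meta_path.exists():
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        print("Missing labels:", missing_labels(meta, EXPECTED))
        print("NER scores:", ner_scores(meta))
    else:
        print(f"No meta.json found at {meta_path}")
```

An empty "Missing labels" list means the model knows all five labels. Anything listed there was never seen during training.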
Just curious, how did you train the model? If you could provide the command that would be great.
Did you train it from scratch (i.e., with a blank model) or by fine-tuning a pretrained pipeline?
If you fine-tuned, you may be dealing with catastrophic forgetting.
Since you're creating a couple of new entity types, you should likely train your model from scratch. We discuss this in the docs and in the Prodigy NER flowchart.
On the 2nd part: "you're not recognizing the sentences".
How did you get the final_dataset.jsonl file? Did you db-out your annotations? You can run prodigy stats dataset_cw to show basic stats.
Does dataset_cw have any overlap with final_dataset.jsonl? Are they 100% the same sentences? Do either or both have annotations? Data duplication/hashing may be coming into play.
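To get a quick read on the overlap, here's a rough sketch that compares the two sources by their "text" field. It assumes you've exported the dataset first (e.g. prodigy db-out dataset_cw > dataset_cw.jsonl; the file name dataset_cw.jsonl is my placeholder) and that both files are JSONL with a "text" key on each line. Note this compares stripped raw text only. It doesn't reproduce Prodigy's own input/task hashes, so treat it as an approximation.

```python
import json
from pathlib import Path


def load_texts(path):
    """Collect the stripped "text" values from a JSONL file into a set."""
    texts = set()
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            text = json.loads(line).get("text")
            if text:
                texts.add(text.strip())
    return texts


def overlap_report(texts_a, texts_b):
    """Count texts unique to each side and texts shared by both."""
    shared = texts_a & texts_b
    return {
        "only_a": len(texts_a - shared),
        "only_b": len(texts_b - shared),
        "shared": len(shared),
    }


if __name__ == "__main__":
    # Hypothetical export name; final_dataset.jsonl is the file from your command.
    exported = Path("dataset_cw.jsonl")
    source = Path("final_dataset.jsonl")
    if exported.exists() and source.exists():
        print(overlap_report(load_texts(exported), load_texts(source)))
    else:
        print("Export the dataset with db-out and check both file paths first.")
```

If "shared" is close to the size of final_dataset.jsonl, most of your input has already been annotated in dataset_cw, which would fit the duplication/hashing explanation.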