Thanks @mattdr for the details! That's helpful.
Nice performance and great work scaling up your annotations!
I still can't find anything obvious.
Could you try modifying your
ner.correct command to save your annotations to a different dataset, e.g.:
prodigy ner.correct dataset_cw_gold models/model-last final_dataset.jsonl --label MODULE,LOGISTICS,PRODUCT,HR,POLICY
I named it "gold" because
ner.correct used to be called
ner.make-gold (see here), and its output is sometimes thought of as "gold standard" annotations.
Also, could you try running
print-stream on a sample of your
final_dataset.jsonl (e.g., a new file containing just the first 10 records)?
prodigy print-stream models/model-last final_dataset_first10.jsonl
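To create that final_dataset_first10.jsonl sample, one quick way (assuming a Unix-like shell) is head, since JSONL stores one record per line:

```shell
# Take the first 10 records of the source file
# (JSONL is one record per line, so head yields a valid JSONL sample)
head -n 10 final_dataset.jsonl > final_dataset_first10.jsonl
```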
This recipe will print the model's predictions. Just a heads up: it scores every record in the source file, which is why I recommended a smaller sample, though you can certainly try it on more records.
To confirm it's working, you can even replace
models/model-last with a pretrained pipeline like
en_core_web_sm and check that you see what you'd expect.
Also -- cool trick: you can use your model to score previously annotated data like:
prodigy print-stream models/model-last dataset:dataset_cw
or just the subset you've answered a certain way (accept, reject, or ignore). For example, you can score any annotations you've ignored by running:
prodigy print-stream models/model-last dataset:dataset_cw:ignore
Alternatively, you can try your model in
spacy-streamlit (GitHub - explosion/spacy-streamlit: 👑 spaCy building blocks and visualizers for Streamlit apps).
This gives a great interface for showing the model to users/non-data scientists but can also help make sure the model is predicting as you expect.
If you find examples where the model predicts entities correctly outside of Prodigy, but they still don't show up in Prodigy, let us know.