That's great to hear!
I'd recommend all uppercase. I think spaCy can handle either, but Prodigy only shows labels in the UI as uppercase, so using uppercase everywhere avoids any mismatch.
Also, three other tips.
First, to keep your command short (and avoid misspellings in a long list of labels), you can also specify your labels in a local text file. For example, if your folder contains a file named labels.txt:
# labels.txt
ACTOR
ARTIFACT
FIRM
INFRASTRUCTURE
JURISDICTION
METRIC
PRODUCT
REGULATION
REGULATOR
SERVICE
Then you can run it with:
python -m prodigy ner.manual test_ner model-best-all_text ./test_text.jsonl --label ./labels.txt
I mention this because I noticed that in your original command you accidentally listed REGULATOR twice (REGULATOR,REGULATOR). I don't think this will cause any problem, but it goes to show that with such a long list, it's really easy to misspell or misspecify a label. If you put the labels in a local .txt file once, you'll always be consistent.
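If you want to catch repeated labels before launching a recipe, here is a minimal sketch of a check you could run on the file. The helper name find_duplicate_labels is hypothetical (it is not part of Prodigy or spaCy), and skipping blank lines and lines starting with # is a choice made by this sketch, not a statement about how Prodigy parses the file.

```python
from collections import Counter


def find_duplicate_labels(lines):
    """Return labels that occur more than once, in first-seen order.

    Hypothetical helper: blank lines and lines starting with '#' are
    skipped, and labels are compared in uppercase.
    """
    labels = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            labels.append(line.upper())
    counts = Counter(labels)
    return [label for label in counts if counts[label] > 1]


if __name__ == "__main__":
    sample = ["# labels.txt", "ACTOR", "REGULATION", "REGULATOR", "REGULATOR"]
    print(find_duplicate_labels(sample))
```

To check a real file, you could pass in the result of Path("labels.txt").read_text().splitlines().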
The second tip is for debugging built-in recipes like ner.correct when you run into issues like this: you can view the source code of all built-in recipes. Run prodigy stats to find the Location: path of your Prodigy installation; the recipes are in the recipes folder under that path (e.g., you can find ner.correct in the ner.py script).
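As a rough sketch of that lookup, the function below builds the expected file path from the Location: value and a recipe name. The function name recipe_source_path is hypothetical, and the rule that the part before the dot names the file is an assumption generalized from the ner.correct / ner.py example, so it may not hold for every recipe.

```python
from pathlib import Path


def recipe_source_path(location, recipe_name):
    """Guess where a built-in recipe's source lives.

    location: the path printed after "Location:" by `prodigy stats`.
    recipe_name: e.g. "ner.correct".
    Assumption: the part before the first dot is the file name.
    """
    module = recipe_name.split(".", 1)[0]
    return Path(location) / "recipes" / f"{module}.py"


if __name__ == "__main__":
    # The location below is a placeholder, not a real installation path.
    print(recipe_source_path("/path/to/prodigy", "ner.correct"))
```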
By adding custom logging, you can better diagnose any issues or questions you have with any of the recipes. What's more, once you learn some of the common syntax and conventions, you can reuse them and begin developing your own custom recipes.
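One simple way to add logging of your own is to wrap the stream of examples in a generator that logs each one as it passes through. This is only a sketch using Python's standard logging module, not Prodigy's own logging helper. The name log_stream is hypothetical, and it assumes each task is a dict that may have a "text" key.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("recipe_debug")


def log_stream(stream, name="stream"):
    """Yield each task unchanged, logging its keys and a text preview.

    Assumption: tasks are dicts, optionally with a "text" key.
    """
    for i, task in enumerate(stream):
        preview = str(task.get("text", ""))[:60]
        logger.info("%s task %d: keys=%s text=%r", name, i, sorted(task), preview)
        yield task


if __name__ == "__main__":
    tasks = [{"text": "The regulator fined the firm."}, {"text": "A new product."}]
    for _ in log_stream(tasks, name="demo"):
        pass
```

In a copy of a recipe, you could wrap the stream with log_stream(...) just before it is returned, so you can see exactly what reaches the UI.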
If you had still had issues, this was going to be my next recommendation. I mention it in case it helps you debug your next recipe question.
Last, you have a really cool workflow. If you want a more reproducible workflow, consider converting your project into a spaCy project (which is being renamed to the weasel package). For example, here's an example template for Prodigy. This makes follow-on steps much easier, like converting your model into a spaCy package or running that model as a Streamlit app or a FastAPI app.
Let me know if you have any questions!