Fine-tuned NER transformer model not labelling in Prodigy ner.correct

Hello!

I have successfully fine-tuned a HuggingFace transformer model in spaCy. When I load the model in a notebook and run it on a Doc object, it returns the expected entities very well.

When I load the same model into Prodigy in a ner.correct recipe, no entity labels appear.

displaCy view of entity labels when the model is loaded and run in spaCy in a Jupyter notebook.

When the same model is loaded as the model in the ner.correct recipe, there are no entity labels.

Command in virtual environment:

python -m prodigy ner.manual test_ner model-best-all_text ./test_text.jsonl --label ACTOR,ARTIFACT,FIRM,INFRASTRUCTURE,JURISDICTION,METRIC,PRODUCT,REGULATION,REGULATOR,REGULATOR,SERVICE

prodigy app:

Hi @gdean!

Thanks for your question and welcome to the Prodigy community :wave:

Can you provide the exact command you were running? I'm wondering if you missed some of the labels or had some issues with it.

For example, can you confirm you're not getting any predicted labels when running:

python -m prodigy ner.correct test_ner model-best-all_text ./test_text.jsonl --label ACTOR,ARTIFACT,FIRM,INFRASTRUCTURE,JURISDICTION,METRIC,PRODUCT,REGULATION,REGULATOR,REGULATOR,SERVICE

where your fine-tuned model is model-best-all_text?

This is identical to the command you provided, only swapping ner.manual for ner.correct.

If you were running ner.manual, this is the expected behavior. The only time ner.manual can show labels is when the input (source) file already has labels on it.
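For reference, a source line that ner.manual would display with pre-filled labels looks roughly like this. This is a minimal sketch: the text, offsets, and span values below are made up for illustration, using Prodigy's standard "spans" task format.

```python
import json

# Hypothetical pre-annotated task: ner.manual will display these spans
# because they already exist in the source file (one JSON object per line).
task = {
    "text": "The REGULATOR fined the FIRM.",
    "spans": [
        {"start": 4, "end": 13, "label": "REGULATOR"},  # text[4:13] == "REGULATOR"
        {"start": 24, "end": 28, "label": "FIRM"},      # text[24:28] == "FIRM"
    ],
}

# One line of a JSONL source file:
line = json.dumps(task)
print(line)
```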

Hi @ryanwesslen - thanks for your reply!

Apologies - I pasted in the incorrect command line. I can confirm that when I run the command provided with ner.correct, Prodigy is still not showing any labels.

Thanks for the update.

Any chance the labels in your model differ in case from what you're putting in your Prodigy command? This can happen if, say, your spaCy model's labels are lower-cased but you're using upper-case. To check your spaCy model's labels:

import spacy
nlp = spacy.load("en_core_web_sm") # use model-best-all_text instead
nlp.get_pipe('ner').labels
('CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART')

Just curious, are you able to see any spans if you use the print-stream recipe:

python -m prodigy print-stream spacy_model source

If it doesn't work, I'm wondering if there may be some tokenization misalignment problems. Did you modify the tokenizer in your pipeline?
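To illustrate what misalignment means: an entity span's character offsets must land exactly on token boundaries, or the span can't be rendered. The sketch below uses a plain whitespace split as a stand-in for spaCy's real tokenizer (which is more sophisticated, e.g. it would split "EU-wide" further), so treat it as an approximation of the idea only.

```python
def token_boundaries(text):
    """(start, end) character offsets of whitespace-separated tokens.
    Simplified stand-in for a real tokenizer."""
    offsets, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        offsets.append((start, start + len(tok)))
        pos = start + len(tok)
    return offsets

def span_aligns(text, start, end):
    """True if (start, end) begins and ends exactly on token boundaries."""
    bounds = token_boundaries(text)
    return any(s == start for s, _ in bounds) and any(e == end for _, e in bounds)

text = "EU-wide regulation applies."
print(span_aligns(text, 8, 18))  # "regulation": aligned under this tokenization
print(span_aligns(text, 0, 2))   # "EU" inside "EU-wide": misaligned under this tokenization
```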

Also, just curious, what versions of Prodigy and spaCy are you using? You can see both by running:

prodigy stats
spacy info

Hi @ryanwesslen - thank you for the helpful feedback!

When I checked the labels in the model, it turned out they were all lowercase. When I switched the labels to lowercase in the command line, the labels appeared!

Just out of curiosity, is it a best practice to keep labels in all caps for spaCy and Prodigy?

Thanks so much for your help!!


That's great to hear! :sweat_smile:

I'd recommend all uppercase. I think spaCy can handle either, but Prodigy will only show labels in the UI as uppercase, so keeping them uppercase everywhere avoids any mismatches.
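One way to catch this kind of mismatch early is to compare the model's labels against the ones you pass on the command line, case-sensitively. A minimal sketch; the label tuples below are made up, and in practice model_labels would come from nlp.get_pipe("ner").labels:

```python
# Stub standing in for nlp.get_pipe("ner").labels on a fine-tuned model:
model_labels = ("actor", "artifact", "firm")
# What was typed after --label on the command line:
cli_labels = ["ACTOR", "ARTIFACT", "FIRM"]

# Labels that won't match the model at all:
missing = [lab for lab in cli_labels if lab not in model_labels]
# Of those, the ones that differ only by case:
case_only = [lab for lab in missing if lab.lower() in {m.lower() for m in model_labels}]

print("Not in model:", missing)
print("Case mismatch only:", case_only)
```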

Also, three other tips:

First, to keep your command short (and avoid possible misspellings in a long labels list), you can also specify your labels in a local text file. For example, if your folder has a file named labels.txt:

# labels.txt
ACTOR
ARTIFACT
FIRM
INFRASTRUCTURE
JURISDICTION
METRIC
PRODUCT
REGULATION
REGULATOR
SERVICE

Then you can run it with:

python -m prodigy ner.manual test_ner model-best-all_text ./test_text.jsonl --label ./labels.txt

I mention this because I noticed that in your original command, you accidentally put REGULATOR,REGULATOR twice. I don't think this causes any problem, but it goes to show that with such a long list, it's really easy to misspell or misspecify a label. If you put each one once in a local .txt file, you'll always be consistent :slight_smile:
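To guarantee the file matches the model exactly, you could even generate it from the pipeline rather than typing it by hand. A sketch under the assumption that your fine-tuned model loads as model-best-all_text; the label tuple below is a stub standing in for the model's real labels:

```python
# In practice:
#   import spacy
#   nlp = spacy.load("model-best-all_text")
#   labels = nlp.get_pipe("ner").labels
# Stub for illustration:
labels = ("ACTOR", "ARTIFACT", "FIRM", "REGULATOR")

# De-duplicate and sort, then write one label per line:
with open("labels.txt", "w", encoding="utf8") as f:
    f.write("\n".join(sorted(set(labels))) + "\n")

print(open("labels.txt", encoding="utf8").read())
```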

The second tip is that when you run into issues like this and need to debug a built-in recipe like ner.correct, you can view the source code of all built-in recipes. They live in the recipes folder under the Location: path shown by prodigy stats (e.g., you can find ner.correct in the ner.py script).

By adding custom logging, you can better diagnose any issues or questions you have with any of the recipes. Better yet, once you learn some of the common syntax and conventions, you can reuse them and begin developing your own custom recipes.

If you had still had issues, this was going to be my next recommendation; I mention it in case it helps you debug a future recipe question.

Last - you have a really cool workflow. If you want it to be more reproducible, consider converting your project into a spaCy project (which is being renamed to the weasel package). For example, here's an example template for Prodigy. This will make reproducibility much easier, e.g., converting your model into a spaCy package, or running that model as a Streamlit app or a FastAPI app.

Let me know if you have any questions!