Problem with custom model - ner train

Hi, unfortunately I'm having problems training a custom NER model. Here's what I see during training, and I'm pretty sure it's wrong: the table shows no scores and no P/R/F details:


python -m prodigy train --ner correct_PER_MAIL_NG_data --base-model custom_model_email01 --label-stats

If I change the model to the standard spacy model, it works:


python -m prodigy train --ner correct_PER_MAIL_NG_data --base-model en_core_web_md --label-stats

I have created the custom model as follows:

nlp = spacy.load("en_core_web_md")
ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "EMAIL", "pattern": [{"LIKE_EMAIL": True}]}]
ruler.add_patterns(patterns)
nlp.to_disk("custom_model_email01")


And it also seems to work:

doc = nlp("Apple is opening office.")
print([(ent.text, ent.label_) for ent in doc.ents])
[('Apple', 'ORG'), ('', 'EMAIL')]

I can use the custom model to annotate and create a dataset with ner.correct:


python -m prodigy ner.correct correct_PER_MAIL_NG_data custom_model_email01 ./NG_data_meta.jsonl --label PERSON,EMAIL --update

But something seems to go wrong during the training process. Any idea what I am doing wrong?

Thanks Alfred

============================== ✨  Prodigy Stats ==============================

Version          1.11.2
Location         C:\Users\xxx\Miniconda3\lib\site-packages\prodigy
Prodigy Home     C:\Users\xxx\.prodigy
Platform         Windows-10-10.0.18362-SP0
Python Version   3.8.3
Database Name    SQLite
Database Id      sqlite
Total Datasets   5
Total Sessions   16

correct_PER_MAIL_NG_data.jsonl (87.7 KB)


Hi! It looks like the problem here isn't the training itself (which seems to run fine) but rather the scores and score weights defined in the config, which decide which scores to show in the table and how to calculate the final score.

You can test this by comparing the training.score_weights section in your config.cfg (or nlp.config) of the two pipelines.
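For reference, the score weights a default NER training config generates usually look like this (a sketch; exact values can differ depending on how the config was created):

```ini
[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null
```

If the custom pipeline's section differs from this (for example, if all weights ended up as `null` or `0.0`), that would explain the empty score columns in the training table.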

It's definitely confusing that adding the entity ruler would change this, and I don't immediately see why your score weights would end up different in the custom pipeline :thinking:

Thanks @ines for the ideas. The training.score_weights are the same (custom model left):

The only difference is the entity ruler:

Maybe what I'm trying to do isn't possible. My idea was to create a gold standard with ner.correct, using a custom model (based on en_core_web_md) for person names and emails (entity ruler). Afterwards I wanted to use the dataset to optimise the person names and to evaluate the email recognition. I know the entity ruler itself can't be optimised, but I want to keep improving the person names and at the same time get a score for the entity ruler's emails.

Like this, but I don't want to create a statistical model for the email recognition:

My use case looks like this:

  • Recognition, optimisation and evaluation of person names (ToDo: F_Name L_Name with the span recognizer)
  • Recognition and evaluation of emails with entity ruler (like_email)
  • One pipeline that does everything
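A single pipeline along those lines would place the entity_ruler after the trained ner component, so the ruler's EMAIL spans are added on top of the statistical predictions (a sketch of the relevant config.cfg section; component names as in a default NER config):

```ini
[nlp]
lang = "en"
pipeline = ["tok2vec", "ner", "entity_ruler"]
```

With the entity_ruler's default settings it won't overwrite entities the ner component has already predicted, so the two can coexist in one pipeline.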

Is it possible to create and evaluate a custom model with one gold standard that contains all the information/annotations? Or are there other best practices for such a use case?

Thanks Alfred

Hi! Thanks for the detailed report. We looked into this further and were able to reproduce the issue you were seeing. It turns out there was a small bug in the mechanism that decides which output measures to show. It was caused by a conflict between the NER component and the entity_ruler, as both produce the same output measures. We will patch this in the next bugfix release.

In the meantime, just to ensure you can continue with your work, would it be an option for you to first train the NER model with Prodigy as you intended to, and only afterwards add the entity_ruler to the trained version? I think in principle this should boil down to the same thing, because the entity_ruler will be unaffected by prodigy train anyway.
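The workaround could look roughly like this (a sketch: in practice you would load the pipeline that `prodigy train` produced, e.g. from an output directory of your choosing; a blank English pipeline is used here only so the example is self-contained, and the output path is hypothetical):

```python
import spacy

# In practice: nlp = spacy.load("./output/model-best") after prodigy train.
# A blank English pipeline stands in for the trained model here.
nlp = spacy.blank("en")

# Add the entity ruler as the last component, i.e. after the (trained) ner.
# With its default settings, the ruler only adds spans that don't overlap
# entities that are already set on the doc.
ruler = nlp.add_pipe("entity_ruler", last=True)
ruler.add_patterns([{"label": "EMAIL", "pattern": [{"LIKE_EMAIL": True}]}])

doc = nlp("Please write to someone@example.com")
print([(ent.text, ent.label_) for ent in doc.ents])

# Then save the combined pipeline, e.g.:
# nlp.to_disk("./custom_model_email02")
```

Since prodigy train doesn't touch the entity_ruler, adding it after training should give the same combined behaviour as training with it in place.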


Thank you @SofieVL for your quick feedback and the workaround - I will try it. I can also evaluate the entity_ruler with the spacy evaluate CLI after labelling and training.

Thanks Alfred


It works now :slight_smile: , THANK YOU!