Having trouble training a blank model with new entity types

kdziemia · October 17, 2018, 9:27pm

I’m attempting to train a model with a couple custom entities to recognize exercises and quantities.
Here’s an example phrase: “Four sets of twenty reps of standing calf raises and five sets of nine reps deadlifts”.
The model should recognize both the rep and set counts, as well as the exercise being performed. Most of the entities span multiple words.
My training dataset has about 100 of these types of phrases with an additional 250 phrases that talk about fitness subjects in general.

I’ve attempted to train a proof-of-concept model by following this video:
https://prodi.gy/docs/video-new-entity-type

I’ve created a jsonl file containing the training data formatted as such:

{"text":"incline chest press one set of six repetitions"}
{"text":"i did twenty burpees and thirty two push ups"}
{"text":"four hyperextensions and sixteen sit ups"}

I’ve created a patterns jsonl file formatted as such:

{"label":"EXERCISE","pattern":[{"lower":"squat"}]}
{"label":"EXERCISE","pattern":[{"lower":"leg"},{"lower":"press"}]}
{"label":"EXERCISE","pattern":[{"lower":"lunge"}]}
{"label":"EXERCISE","pattern":[{"lower":"deadlift"}]}
{"label":"EXERCISE","pattern":[{"lower":"leg"},{"lower":"extension"}]}
{"label":"EXERCISE","pattern":[{"lower":"leg"},{"lower":"curl"}]}
{"label":"QUANTITY","pattern":[{"lower":"six"}]}
{"label":"QUANTITY","pattern":[{"lower":"twenty"},{"lower":"one"}]}
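Since many of my phrases use plural forms, one option is to generate the patterns file so that it covers both singular and plural variants. This is only a rough sketch: `make_patterns` and `pluralize` are hypothetical helpers (not part of Prodigy or spaCy), and the pluralization rule is a naive assumption that will miss irregular forms.

```python
import json

def pluralize(word):
    # Naive rule (assumption): add "es" after s/x/ch/sh, otherwise add "s".
    if word.endswith(("s", "x", "ch", "sh")):
        return word + "es"
    return word + "s"

def make_patterns(label, phrases):
    """Build one token pattern per phrase, plus a variant with the last token pluralized."""
    patterns = []
    for phrase in phrases:
        tokens = phrase.lower().split()
        plural = tokens[:-1] + [pluralize(tokens[-1])]
        for variant in (tokens, plural):
            patterns.append(
                {"label": label, "pattern": [{"lower": tok} for tok in variant]}
            )
    return patterns

exercise_patterns = make_patterns("EXERCISE", ["squat", "leg press", "deadlift"])

# One JSON object per line, matching the patterns.jsonl layout shown above.
for entry in exercise_patterns:
    print(json.dumps(entry, separators=(",", ":")))
```

The output could be redirected into patterns.jsonl and passed to the --patterns argument.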

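I’ve decided to train the named entity recognizer from scratch, as the default entities are not relevant to me. I’ve created and saved a blank 'en' model:

import spacy

nlp = spacy.blank('en')  # blank English pipeline with no pretrained components
nlp.add_pipe(nlp.create_pipe('ner'))  # add an untrained entity recognizer
nlp.begin_training()  # initialize the model weights
nlp.to_disk('/path/to/model')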

I've performed training using:

prodigy ner.teach workouts model training_data.jsonl --patterns patterns.jsonl --label "EXERCISE, QUANTITY"

I output the model using:

prodigy ner.batch-train workouts model --output workouts-model --label "EXERCISE, QUANTITY" --eval-split 0.2 --n-iter 25 --batch-size 8

The resulting model is entirely inaccurate. I’ve noticed that none of the patterns are ever matched during annotation. I’m not sure whether this is because my phrases often use plural forms of the exercises. Should I be using "lemma" instead of "lower"? Will the lemma attribute even work with a blank model? I'm also getting very strange entity suggestions, and the model has difficulty recognizing multi-word quantities.
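To see whether the patterns can match the training text at all, a quick offline check may help. This is a rough sketch under stated assumptions: it uses whitespace splitting as a stand-in for spaCy's tokenizer (which handles punctuation differently), it only understands the "lower" attribute, and `pattern_matches` is a hypothetical helper rather than anything from Prodigy or spaCy.

```python
def pattern_matches(pattern, text):
    """Return True if the pattern's lowercase tokens occur consecutively in the text."""
    wanted = [tok["lower"] for tok in pattern]
    # Assumption: whitespace splitting approximates the real tokenizer.
    tokens = text.lower().split()
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))

texts = [
    "incline chest press one set of six repetitions",
    "i did twenty burpees and thirty two push ups",
    "four hyperextensions and sixteen sit ups",
]
patterns = [
    {"label": "EXERCISE", "pattern": [{"lower": "squat"}]},
    {"label": "QUANTITY", "pattern": [{"lower": "six"}]},
    {"label": "QUANTITY", "pattern": [{"lower": "twenty"}, {"lower": "one"}]},
]

# Count how many texts each pattern matches.
hits = {}
for entry in patterns:
    key = " ".join(tok["lower"] for tok in entry["pattern"])
    hits[key] = sum(pattern_matches(entry["pattern"], t) for t in texts)

for key, count in hits.items():
    print(f"{key}: {count} matching text(s)")
```

On these three sample texts only "six" matches. A "lower" pattern compares the exact lowercased token, so a singular pattern will not match a plural form in the text, and exercises missing from the patterns file never match at all.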

I've also tried the ner.manual recipe, but I get "No tasks available" after tagging the first 9 phrases.

I'm not quite sure what to try next.

FYI, I previously built a similar model to identify foods, quantities, and units using only spaCy, with a similarly sized training dataset, and got quite good results given the limited data. I'm giving Prodigy a try because manually tagging entities in every phrase was too time-consuming, but the workflow looks very different.

honnibal · October 18, 2018, 5:44pm

Are you using v1.5.1? We’ve just released a fix for the problem I think you’re having. If you haven’t updated yet, try v1.6.1, and please let us know if the problem still persists.

kdziemia · October 22, 2018, 3:27pm

Updating to 1.6.1 has fixed the issues. Thank you!
