It does take a while for the model to learn the category, so at first it’s expected that the model will suggest a lot of unlikely entities. A little bit of background on how this works:
The recipe takes the output of the pattern matcher and the output of the NER model, and interleaves the two streams. This means it’s trying to show you roughly one suggestion from the model for each suggestion from the pattern matcher. The accepted matches from the pattern matcher are added as training examples for the model, and the model also learns when you click yes or no on its own suggestions.
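To make the interleaving concrete, here’s a minimal sketch in plain Python. It’s an illustration of the idea only, not Prodigy’s actual implementation (the real combiner also handles scoring, deduplication etc.), and the example dicts are made up:

```python
from itertools import zip_longest

def interleave(pattern_stream, model_stream):
    # Alternate suggestions from the two sources, roughly one-for-one.
    # If one stream runs out, keep yielding from the other.
    for pattern_eg, model_eg in zip_longest(pattern_stream, model_stream):
        if pattern_eg is not None:
            yield pattern_eg
        if model_eg is not None:
            yield model_eg

# Toy usage: one stream of pattern matches, one of model suggestions.
patterns = iter([{"text": "Berlin", "source": "pattern"}])
model = iter([
    {"text": "Python", "source": "model", "score": 0.06},
    {"text": "pandas", "source": "model", "score": 0.03},
])
print(list(interleave(patterns, model)))
```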
In the screenshot you’ve attached, the model has only assigned a score of 0.06 to that suggestion, so pretty low. But even if all of the predictions are low, the model still asks you some of them as questions. If it didn’t, you’d have no way to escape situations where the model assigns low scores but is miscalibrated.
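One simple way to get that behaviour is to judge scores relative to the stream instead of against a fixed threshold. This is a hypothetical policy to show the idea, not Prodigy’s exact sorter:

```python
def select_questions(stream, smoothing=0.8):
    # Compare each score to a running average of recent scores. If every
    # score in the stream is low, the average drops too, so the relatively
    # most promising suggestions still get asked as questions.
    avg = None
    for eg in stream:
        if avg is None:
            avg = eg["score"]
        else:
            avg = smoothing * avg + (1 - smoothing) * eg["score"]
        if eg["score"] >= avg:
            yield eg

stream = iter([
    {"text": "Berlin", "score": 0.06},
    {"text": "the", "score": 0.01},
    {"text": "spaCy", "score": 0.05},
])
# All absolute scores are low, but "Berlin" and "spaCy" still get asked.
print([eg["text"] for eg in select_questions(stream)])
```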
So, you do need to click through the examples for a while to bootstrap the model this way. Sometimes the pattern-based bootstrapping works really well, especially if the entity category is easy to learn and common in your dataset. In other situations, it’s better to use one of the other modes to get initial annotations, e.g. `ner.make-gold`, and once you have a dataset of correct examples, train a model with `ner.batch-train`. Sometimes that’s a better way to get started.
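Here’s roughly what that two-step workflow looks like, wrapped in Python for illustration. Everything in it is a placeholder except the recipe names: the dataset, label, file and output paths are made up, and the exact arguments can differ between Prodigy versions, so check each recipe’s `--help` output first:

```python
import subprocess

# 1) Collect gold-standard annotations by correcting the suggestions.
#    This starts the annotation server; stop it when you're done.
subprocess.run([
    "prodigy", "ner.make-gold", "my_dataset", "en_core_web_sm",
    "my_data.jsonl", "--label", "MY_LABEL",
], check=True)

# 2) Once the dataset has enough correct examples, train a model from it.
subprocess.run([
    "prodigy", "ner.batch-train", "my_dataset", "en_core_web_sm",
    "--output", "/tmp/my_model", "--label", "MY_LABEL",
], check=True)
```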
In summary, active learning suffers from a cold-start problem: the initial model doesn’t know enough to make good suggestions, so it’s hard to make initial progress. The patterns file helps a lot with this, but if the model still struggles, one of the other workflows above is often a better way to get started.