Training a new Pattern

Is there any empirical analysis of how many new samples are required, relative to the size of the corpus the model was trained on, to train a new pattern? As we deploy Prodigy models in production, this would be an important metric for ensuring certainty of outcomes. I will run some benchmarks on this, but wanted to understand whether there is any historical analysis from this perspective.

Assuming by pattern you mean entity type: There’s not really a way to say, because it depends on how hard the learning problem is. Some things to consider:

  • How common is the entity?
  • How diverse are the instances?
  • How ambiguous are the instances?

If you have an entity that’s made up of only a single word, and that word is common, and that word is always an entity, any model will learn this super quickly. On the other hand, if you’re trying to tag long phrases with huge surface variation, and whether the phrase is an entity depends on context, you’ll need a lot of examples.

The best advice we’ve been able to give is to plot out a dose/response curve of data vs accuracy. This is implemented in the ner.train-curve recipe.
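To see the idea outside of the recipe: a dose/response curve just means training on increasing fractions of your annotations and evaluating each model against the same held-out set. Below is a minimal, generic sketch of that loop — plain scikit-learn on a made-up toy dataset, not Prodigy's recipe — just to show the shape of the curve; you'd swap in your own exported annotations (e.g. via `db-out`).

```python
# Minimal dose/response (training data vs. accuracy) sketch.
# The toy texts/labels below are placeholders -- swap in your own annotations,
# e.g. exported from Prodigy. This uses plain scikit-learn, not Prodigy's
# recipe, purely to illustrate the idea.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

texts = [
    "loved the product", "excellent quality", "works perfectly", "very happy",
    "great value", "fantastic support", "highly recommend", "five stars",
    "broke after a week", "terrible quality", "waste of money", "very unhappy",
    "poor support", "would not recommend", "stopped working", "one star",
]
labels = ["POS"] * 8 + ["NEG"] * 8

# Hold out one fixed evaluation set so every point on the curve is comparable.
train_texts, eval_texts, train_labels, eval_labels = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

for fraction in (0.25, 0.5, 0.75, 1.0):
    if fraction < 1.0:
        # Stratified subsample of the training annotations.
        subset_texts, _, subset_labels, _ = train_test_split(
            train_texts, train_labels, train_size=fraction,
            stratify=train_labels, random_state=0,
        )
    else:
        subset_texts, subset_labels = train_texts, train_labels

    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(subset_texts)
    X_eval = vectorizer.transform(eval_texts)
    model = LogisticRegression(max_iter=1000).fit(X_train, subset_labels)
    accuracy = accuracy_score(eval_labels, model.predict(X_eval))
    print(f"{fraction:.0%} of training data ({len(subset_texts)} examples): "
          f"accuracy {accuracy:.3f}")
```

If accuracy is still climbing steeply at 100% of the data, more annotation is likely to pay off; if it has flattened out, you're probably near what the current setup can learn.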

By pattern here I meant text classification, not entity or term recognition.

Also, I see that spaCy is integrated with https://hazyresearch.github.io/snorkel/. It could be useful for creating a synthetic dataset using Snorkel.
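In case it helps anyone reading along, the rough shape of that workflow is: write a few labeling functions over unlabelled text, apply them, and let Snorkel's label model combine their noisy votes into training labels. A minimal sketch, assuming the pip-installable `snorkel` package's labeling API (0.9+) rather than the older HazyResearch release linked above; the label names, keywords, and toy data are made up for illustration:

```python
# Minimal sketch of weak supervision with Snorkel labeling functions.
# Labels, keywords, and the toy DataFrame are assumptions for illustration;
# see the Snorkel docs for the full workflow.
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

@labeling_function()
def lf_positive_keywords(x):
    # Vote POSITIVE if an obviously positive keyword appears, otherwise abstain.
    return POSITIVE if any(w in x.text.lower() for w in ("love", "excellent", "great")) else ABSTAIN

@labeling_function()
def lf_negative_keywords(x):
    # Vote NEGATIVE if an obviously negative keyword appears, otherwise abstain.
    return NEGATIVE if any(w in x.text.lower() for w in ("broken", "refund", "terrible")) else ABSTAIN

df = pd.DataFrame({"text": [
    "I love this, excellent build quality",
    "Arrived broken, I want a refund",
    "Delivery took three days",
]})

# Apply the labeling functions to get a label matrix, then combine their
# noisy, overlapping votes into a single label per example.
applier = PandasLFApplier(lfs=[lf_positive_keywords, lf_negative_keywords])
L_train = applier.apply(df=df)

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=100, seed=0)
df["label"] = label_model.predict(L=L_train)  # -1 where all LFs abstained
print(df)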