Can you increase the question batch size in ner.teach active learning?

themrmax · June 19, 2020, 5:29pm

For my current task, I'm spending around 10 seconds labeling (I'm getting 10 examples per batch) and then around 60 seconds waiting for the next batch. Is it possible to increase the batch size so I have a (much) larger batch before I start labelling? Or is this the wrong way to think about things, e.g. maybe

1. I am reaching the limits of what active learning can help with
2. I should be using keywords to filter down my dataset (a lot of my examples don't contain any entities)

On this second point, is there a way to leverage the patterns file for this, i.e. only annotate examples which have a pattern match?

ines · June 21, 2020, 4:17pm

Which model are you using and how long are your texts? If you're using ner.teach, Prodigy will not just ask the model for the best analysis, but multiple possible analyses – so the longer your texts, the longer this may take. So if you're not already doing this, try using shorter examples, like single sentences.

In addition to that, you can also experiment with changing the batch_size setting. When the queue is running low, Prodigy will fetch the next batch of examples in the background, so maybe you can find a good trade-off batch size where it takes you long enough to annotate so that the model already has the next batch ready in the background.
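To make that trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of Prodigy: `min_batch_size` and its parameters are hypothetical names, and the timing model (a fixed overhead per fetch plus a per-example model cost) is an assumption you would need to check against your own measurements.

```python
import math
from typing import Optional


def min_batch_size(
    annotate_secs_per_example: float,
    fetch_overhead_secs: float,
    model_secs_per_example: float,
) -> Optional[int]:
    """Smallest batch size for which annotating one batch takes at least
    as long as preparing the next one in the background.

    Assumed (hypothetical) timing model:
        fetch time = fetch_overhead_secs + model_secs_per_example * batch_size
    Returns None when the model is slower per example than the annotator,
    because no batch size can hide the wait in that case.
    """
    headroom = annotate_secs_per_example - model_secs_per_example
    if headroom <= 0:
        return None
    return max(1, math.ceil(fetch_overhead_secs / headroom))


# Figures from the question: about 10 s to label 10 examples, about 60 s waiting.
# If the 60 s were pure fixed overhead, a larger batch would help:
print(min_batch_size(10 / 10, 60.0, 0.0))

# If the 60 s instead scale with the batch (6 s per example), it would not:
print(min_batch_size(10 / 10, 0.0, 6.0))
```

If the second case is closer to what you measure, a bigger batch will not remove the wait, and shorter texts or pre-filtering the data are more likely to help.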

If you get the feeling that you're not seeing enough examples for the given score threshold, then filtering your data by keywords or pattern matches is definitely an option. It will give your model more positive examples to learn from. If you only want to annotate examples with matches, check out the match recipe, which is documented under Built-in Recipes in the Prodigy docs. If you're annotating entities, you want to set --label-span to add the matched label to the span.
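If you would rather pre-filter the source file yourself before loading it into a recipe, a minimal standard-library sketch could look like the following. It does not use Prodigy's API: `filter_by_keywords` is a hypothetical helper, and it assumes each example is a dict with a "text" key, as in a typical JSONL source file.

```python
import re
from typing import Dict, Iterable, Iterator, List


def filter_by_keywords(
    examples: Iterable[Dict], keywords: List[str]
) -> Iterator[Dict]:
    """Yield only the examples whose "text" contains at least one keyword.

    Matching is case-insensitive and anchored on word boundaries, so it
    works best for keywords that start and end with a letter or digit.
    """
    if not keywords:
        raise ValueError("Provide at least one keyword")
    # Escape each keyword so it is matched literally, not as a regex.
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    for example in examples:
        if pattern.search(example.get("text", "")):
            yield example


# Hypothetical examples, shaped like lines of a JSONL source file
examples = [
    {"text": "Apple hired a new CEO."},
    {"text": "Nothing to see here."},
    {"text": "She works at apple."},
]
for example in filter_by_keywords(examples, ["apple"]):
    print(example["text"])
```

The filtered examples can then be written back out as JSONL and used as the source. Keep in mind that this kind of filtering biases the data towards texts that mention the keywords, so the model sees fewer examples without entities.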

Alternatively, you could also start with a fully manual or semi-manual round of annotation using ner.manual (with patterns) or ner.correct (if you have a model that already predicts something).
