textcat.teach model init: db-based or session-only?

Hello there!

For each label I am currently:

  • bootstrapping the model with patterns
  • running textcat.teach with the prefer_high_scores sorter and annotating until the progress bar shows around 90% (usually something over 1000 examples)
  • running textcat.batch-train, which typically achieves around a 75% F-score (a rough sketch of these commands is below)
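
Roughly, the commands look like this (dataset, source and pattern file names are placeholders, and the exact flags may vary by Prodigy version):

```
# placeholder names: my_dataset, news.jsonl, patterns.jsonl, MY_LABEL
# bootstrap with seed patterns and annotate
# (the prefer_high_scores sorter is set inside the recipe itself,
#  so I'm using a copy of the recipe with the sorter swapped)
prodigy textcat.teach my_dataset en_core_web_sm news.jsonl \
    --label MY_LABEL --patterns patterns.jsonl

# batch-train on everything annotated so far
prodigy textcat.batch-train my_dataset en_core_web_sm --output /tmp/textcat_model
```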

At this point, I would like to boost the performance by adding more examples using textcat.teach with the prefer_uncertain sorter. (Hopefully this workflow is sensible, or should I rather be focusing on hyperparameter tuning?) However, when I start textcat.teach again, the progress bar suggests that the model in the loop is only being trained on the current session's annotations.

Is there any way to initialise the model in the loop from all the examples already in the db?

This is correct, it always starts from the base model – because otherwise, we'd essentially have to run textcat.batch-train under the hood before each annotation session. So instead of wrapping that in textcat.teach, you can just run that step yourself with the settings you need, pre-train your model and then use that artifact as the base model.

So when you run textcat.teach for the second time, you can pass in the path to the model you trained with textcat.batch-train instead of the base model (en_core_web_sm etc.).
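
For example, something along these lines (paths and dataset names are placeholders, and the exact flags depend on your Prodigy version):

```
# pre-train on everything already in the dataset and save the artifact
prodigy textcat.batch-train my_dataset en_core_web_sm --output /tmp/textcat_pretrained

# start the next teach session from that artifact instead of the base model
prodigy textcat.teach my_dataset /tmp/textcat_pretrained news.jsonl --label MY_LABEL
```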


Excellent! Thanks for the clarification!

Is there actually any benefit to using prefer_uncertain on textcat.batch-train models trained for each label separately, compared to using prefer_uncertain on a single textcat.batch-train model trained on a merged dataset of all the labelled examples?

My expectation is that the uncertain cases suggested for annotation should be the same, but I might be missing something...