Best use of `textcat.teach`

ines · June 18, 2020, 9:26am

The idea of the `textcat.teach` recipe is that it uses the model in the loop to select the most relevant examples for annotation, based on the score (e.g. prioritising the examples with a score closest to 0.5, as those may be the most "uncertain" predictions). This also means that the recipe will skip examples with high and low scores, so you're not going to see all examples in your dataset. The recipe uses an exponential moving average to decide which scores to consider. This prevents Prodigy from getting stuck if the model ends up in a state where it produces mostly high or mostly low scores.
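To make the selection idea more concrete, here is a minimal sketch in plain Python. It is not Prodigy's actual implementation: the function name `select_uncertain`, the `decay` parameter and the exact update rule are assumptions made for illustration. It only shows why comparing against a moving average, rather than a fixed threshold, keeps examples flowing even when all scores are high.

```python
from typing import Iterable, Iterator, Optional, Tuple


def select_uncertain(
    scored_stream: Iterable[Tuple[float, str]], decay: float = 0.8
) -> Iterator[str]:
    """Yield examples whose uncertainty is at least the running average.

    Illustrative sketch only: `decay` and the update rule are assumptions,
    not Prodigy's internals.
    """
    ema: Optional[float] = None  # exponential moving average of uncertainty
    for score, example in scored_stream:
        # 1.0 at a score of 0.5, falling to 0.0 at scores of 0.0 or 1.0
        uncertainty = 1.0 - 2.0 * abs(score - 0.5)
        if ema is None or uncertainty >= ema:
            yield example
        # Update the average so the bar adapts to what the model produces
        ema = uncertainty if ema is None else decay * ema + (1.0 - decay) * uncertainty


# A model that only produces high scores: a fixed cutoff such as
# "uncertainty >= 0.5" would select nothing from this stream.
stream = [(0.90, "a"), (0.95, "b"), (0.92, "c"), (0.97, "d"), (0.91, "e")]
selected = list(select_uncertain(stream, decay=0.5))
print(selected)  # ['a', 'c', 'e']
```

Because the bar moves with the recent scores, the least confident of the recent examples still get through, and the very confident ones ("b" and "d" here) are skipped.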

If you're starting completely from scratch with a new model and you're annotating labels that might not be equally distributed, this workflow can be less effective because your model doesn't know anything yet. It would take a very long time to collect enough examples of all labels to teach it something meaningful, so that it can actually "participate" properly.
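As a rough back-of-the-envelope illustration of why this takes so long: if a label occurs in a fraction `p` of the texts and the examples arrive in roughly random order (which is effectively the case while the model's scores carry no information), you need to look at about `n / p` examples on average to find `n` of that label. The function name and the numbers below are hypothetical.

```python
def expected_examples_needed(n_wanted: int, label_frequency: float) -> float:
    """Average number of randomly ordered examples to review before
    seeing `n_wanted` examples of a label with the given frequency."""
    if not 0.0 < label_frequency <= 1.0:
        raise ValueError("label_frequency must be in (0, 1]")
    return n_wanted / label_frequency


# Hypothetical numbers: 50 examples each of a common and a rare label
for frequency in (0.30, 0.01):
    needed = expected_examples_needed(50, frequency)
    print(f"frequency {frequency:.2f}: about {needed:.0f} examples to review")
```

For a label that appears in 1% of texts, that is on the order of 5,000 examples to find 50 positives.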

So it might make sense to start with a manual workflow like `textcat.manual` and annotate a small sample from scratch. You can then pretrain your model on that to give it a head start. It can also help to use `--patterns` on `textcat.teach` to make sure that pattern matches are always shown if they occur (e.g. to show examples that may be part of rarer classes).
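For the `--patterns` part, a patterns file is a JSONL file where each line has a `label` and a `pattern`. Here is a small sketch that generates such lines from a list of trigger terms. The labels and terms are made up for illustration, and splitting on whitespace is only an approximation of how spaCy would tokenize a phrase, so check the output against your own data.

```python
import json

# Hypothetical labels and trigger terms; replace with ones from your data.
TERMS_BY_LABEL = {
    "REFUND": ["refund", "money back"],
    "SHIPPING": ["delivery", "tracking number"],
}


def make_patterns(terms_by_label):
    """Build one token-based match pattern per trigger term."""
    patterns = []
    for label, terms in terms_by_label.items():
        for term in terms:
            # One {"lower": ...} dict per token, matched case-insensitively.
            # Whitespace splitting is a simplification of real tokenization.
            tokens = [{"lower": tok} for tok in term.lower().split()]
            patterns.append({"label": label, "pattern": tokens})
    return patterns


def to_jsonl(patterns):
    """Serialize patterns as newline-delimited JSON, one pattern per line."""
    return "\n".join(json.dumps(p) for p in patterns)


patterns = make_patterns(TERMS_BY_LABEL)
print(to_jsonl(patterns))
```

You'd save the printed lines to a file (e.g. `patterns.jsonl`) and pass its path via `--patterns` when running `textcat.teach`.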
