Hi @qu-genesis,
I think the uncertainty sampling with multiple labels in `textcat.teach` introduced more ambiguous cases that confused the model's previously learned patterns. This may be related to the fact that the sampling was suboptimal because it worked with multiple labels while updating only one label at a time (as discussed in my previous post).
I would definitely recommend working with one label at a time to gather more examples from the underrepresented classes. You can then merge those datasets into the format expected by spaCy's `textcat_multilabel` by exporting them with `data_to_spacy`.
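To illustrate the merging step, here is a minimal sketch (plain Python, no Prodigy API; the session variables and the `accept` flag are hypothetical stand-ins for what you'd export from separate one-label-at-a-time sessions) of how per-label decisions combine into the `cats` dictionaries that `textcat_multilabel` expects:

```python
from collections import defaultdict

# Hypothetical one-label-at-a-time sessions: (text, label, accepted?)
session_sports = [("Team wins final", "SPORTS", True),
                  ("Markets rally today", "SPORTS", False)]
session_finance = [("Team wins final", "FINANCE", False),
                   ("Markets rally today", "FINANCE", True)]

def merge_sessions(*sessions):
    """Merge single-label sessions into one cats dict per text,
    the non-exclusive label -> 0.0/1.0 shape used for multilabel textcat."""
    merged = defaultdict(dict)
    for session in sessions:
        for text, label, accepted in session:
            merged[text][label] = 1.0 if accepted else 0.0
    return dict(merged)

merged = merge_sessions(session_sports, session_finance)
print(merged["Team wins final"])  # {'SPORTS': 1.0, 'FINANCE': 0.0}
```

In practice `data_to_spacy` handles this conversion for you; the sketch just shows why annotating each label separately still composes into one multilabel dataset.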
Some users have tried implementing `textcat.teach` with multiple labels so that the model gets updated on all labels with each non-exclusive annotation, but I'm not sure how that would affect the effectiveness of the active learning. It might take longer to converge than focused one-label-at-a-time sessions.
Re: number of examples
Definitely more are needed; I would recommend following spaCy's general-purpose advice here. You can also run `spacy debug data` once you've converted your data to spaCy's `DocBin` format to get more structural insights into your dataset.
Re: document length & architecture
All the architectures use some sort of pooling over the token vector representations. This lets them process documents of arbitrary length, but you're definitely right in thinking that it can lead to context dilution. Some architectures are more prone to this than others: the bag-of-words (BOW) model just pools n-gram representations without taking token position into account, which makes it the least appropriate for long texts. For a CPU solution, the ensemble model would probably be better, as it uses attention in the `textcat` component; if you can work on a GPU, a transformer-based architecture would handle long-distance relationships in the text even better thanks to self-attention.
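For reference, switching to the ensemble architecture is done in the training config. Below is a hedged, partial excerpt (a sketch, not a complete config: the `tok2vec` sublayer and other settings are omitted, and you should generate the full config with `spacy init config`):

```ini
# Illustrative excerpt only: selects the CPU-friendly ensemble
# architecture for a multilabel text classifier.
[components.textcat_multilabel.model]
@architectures = "spacy.TextCatEnsemble.v2"
nO = null

[components.textcat_multilabel.model.linear_model]
@architectures = "spacy.TextCatBOW.v2"
exclusive_classes = false
ngram_size = 1
no_output_layer = false
```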
That said, splitting the text into paragraphs and preprocessing the input to select the most representative parts makes a lot of sense; it is also the spaCy developers' general advice, since it uses memory more efficiently.
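A minimal sketch of that preprocessing idea (the splitting and selection strategy here is deliberately naive and illustrative; choose whatever notion of "representative" fits your documents):

```python
def select_paragraphs(text, max_paragraphs=3, keyword=None):
    """Split a long document on blank lines and keep the first few
    paragraphs, optionally only those containing a keyword, before
    passing the result to the text classifier."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    if keyword:
        paragraphs = [p for p in paragraphs if keyword.lower() in p.lower()]
    return paragraphs[:max_paragraphs]

doc = "Intro about finance.\n\nDetails on markets.\n\nUnrelated footer."
print(select_paragraphs(doc, max_paragraphs=2))
# ['Intro about finance.', 'Details on markets.']
```

Running the classifier on a few selected paragraphs instead of the whole document keeps the pooled representation focused and the memory footprint small.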