Pattern files for textcat.teach

Yes, your solution is really elegant, actually! Since the annotations all have the same format and are collected with the same process (binary feedback on text plus a label), you could also store everything in one dataset. It just makes it more difficult to revert your changes if you make a mistake or want to try something else in the textcat.teach step.
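To illustrate the shared format, here's a minimal sketch (with invented example texts and a hypothetical label) of why keeping the steps in separate datasets is cheap: since every record has the same shape, a combined view for training is just concatenation, while the per-step datasets stay untouched and easy to discard or redo.

```python
# Hypothetical examples – the texts, label and "answer" values below are
# invented for illustration; only the shared record shape matters.
seed_annotations = [
    {"text": "great service, would recommend", "label": "POSITIVE", "answer": "accept"},
    {"text": "arrived two weeks late", "label": "POSITIVE", "answer": "reject"},
]
teach_annotations = [
    {"text": "friendly and fast support", "label": "POSITIVE", "answer": "accept"},
]

# Because both steps produce the same format, merging for a final
# training run is trivial – and reverting just means dropping one list.
combined = seed_annotations + teach_annotations
print(len(combined))  # total number of merged examples
```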

The --patterns approach on textcat.teach was intended to be the "simpler solution" – but as it turned out in this thread, it does have some limitations, especially for getting over the cold start problem with rare categories.
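For reference, a patterns file is just JSONL where each line has a `"label"` and a `"pattern"` (either a token pattern or an exact string). Here's a small sketch that writes one – the label `FRAUD`, the trigger phrases and the file name are all made-up examples:

```python
import json

# Invented example patterns: token patterns (list of dicts, matched
# per token) and exact-string patterns can be mixed in the same file.
patterns = [
    {"label": "FRAUD", "pattern": [{"lower": "scam"}]},
    {"label": "FRAUD", "pattern": [{"lower": "ponzi"}, {"lower": "scheme"}]},
    {"label": "FRAUD", "pattern": "money laundering"},
]

with open("fraud_patterns.jsonl", "w", encoding="utf8") as f:
    for p in patterns:
        f.write(json.dumps(p) + "\n")
```

You'd then pass the file via `--patterns fraud_patterns.jsonl` on `textcat.teach`. The patterns help surface candidate examples early on, but as discussed above, they won't fully solve the cold start for very rare categories.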

Yes, for the final model, you should be able to train on all the annotations from scratch. At least, there's no reason why you'd have to use your pre-trained model as a baseline. If you want, you could try both approaches and compare the results – if there's a significant difference, that would be very interesting!