Can I use the model training data again as the source data for textcat.teach?

kaorisugi · September 9, 2020, 12:43pm

Hello!
I want to use textcat.teach to update a text classification model that does polarity checking. The model was trained on annotation data created with textcat.manual.
So I have one question.
As the title says, is there any adverse effect of using the same .jsonl data that was used for training as the source data when actively learning a model with textcat.teach?
I'm going to use a .jsonl file without annotations as the source, not a dataset.
What I'm worried about is that using the same data as in training may leak answers and have a negative impact on active learning.

ines · September 10, 2020, 11:53am

Hi! If your goal is to update the model you already have and to improve it, updating it on the same data wouldn't really work, because there's nothing new for the model to learn from. If you use the same examples again during textcat.teach, the annotations you're creating in that process are basically a subset of what you've already collected manually.
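If the only source file at hand partly overlaps with the data that was already annotated, one option is to drop the overlapping texts before feeding the file to textcat.teach. The following is a minimal sketch using only the Python standard library, not a Prodigy API. It assumes each JSONL line is an object with a "text" key; the helper names and the sample lines are hypothetical.

```python
import json


def load_texts(jsonl_lines):
    """Collect the "text" values from an iterable of JSONL lines."""
    return {json.loads(line)["text"] for line in jsonl_lines if line.strip()}


def drop_seen(source_lines, seen_texts):
    """Yield parsed examples whose text is not in seen_texts."""
    for line in source_lines:
        if not line.strip():
            continue  # skip blank lines
        example = json.loads(line)
        if example["text"] not in seen_texts:
            yield example


# Hypothetical sample data standing in for real files.
annotated_lines = [
    '{"text": "Great product", "label": "POSITIVE", "answer": "accept"}',
    '{"text": "Arrived broken", "label": "POSITIVE", "answer": "reject"}',
]
source_lines = [
    '{"text": "Great product"}',
    '{"text": "Works as described"}',
    '{"text": "Arrived broken"}',
    '{"text": "Would not buy again"}',
]

seen = load_texts(annotated_lines)
unseen = list(drop_seen(source_lines, seen))
```

With real files, the same functions can be given open file handles instead of lists, and the remaining examples written back out one JSON object per line.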

If you have more raw unseen text, it's better to use that instead. textcat.teach will pre-select examples based on their scores and ideally guide you towards annotating the more relevant examples so you won't have to label everything.
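To illustrate the general idea of selecting examples by score, here is a simplified sketch of uncertainty-based ranking. It is not Prodigy's actual implementation: it assumes each example already comes with a single model score between 0 and 1, and it simply orders examples by how close that score is to 0.5. The function name and the sample scores are hypothetical.

```python
def rank_by_uncertainty(scored_examples):
    """Sort (score, text) pairs so scores closest to 0.5 come first."""
    return sorted(scored_examples, key=lambda pair: abs(pair[0] - 0.5))


# Hypothetical scores, as if produced by an existing polarity model.
scored = [
    (0.95, "I love it"),
    (0.52, "It is fine I guess"),
    (0.10, "Terrible"),
    (0.40, "Not sure about this"),
]

ranked = rank_by_uncertainty(scored)
most_uncertain_text = ranked[0][1]
```

Examples the model is already confident about end up at the back of the queue, which is why raw text the model has not seen tends to be the more useful source.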

kaorisugi · September 11, 2020, 1:10am

Thank you for your answer!
I understand that the model can't learn anything new from the same data. I will work on updating the model with new data.
Prodigy is a great tool. I'm always looking forward to your updates.

ines · September 11, 2020, 10:11am

Thanks, that's nice to hear!
