textcat.teach seems to be asking questions that are already in the dataset

ivan · April 8, 2022, 8:40pm

I have noticed that quite often textcat.teach is re-asking questions about text that has already been annotated.

Why does it do this? Is it because my model is bad at those examples, and it is trying to reinforce the category by creating more of them?

koaning · April 11, 2022, 10:02am

Hi Ivan!

There are a few things that might be happening here. It's possible that textcat.teach queues up questions about the same text, but with different labels. This is especially likely if your model produces uncertain scores for those examples. It's also possible that two copies of your text differ by a very small change (an extra space, for example), which causes our hashing function to treat them as different examples.
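To make the two cases concrete, here is a minimal sketch in plain Python. It does not use Prodigy's actual hashing code: the helpers `input_hash` and `task_hash` are hypothetical stand-ins that only mimic the idea behind the `_input_hash` and `_task_hash` values Prodigy stores on each example, and the texts and labels are made up.

```python
import hashlib
import json


def input_hash(task):
    """Hash only the raw text (illustrative stand-in, not Prodigy's real function)."""
    return hashlib.sha256(task["text"].encode("utf-8")).hexdigest()


def task_hash(task):
    """Hash the text together with the label the question is about (also a stand-in)."""
    payload = json.dumps(
        {"text": task["text"], "label": task.get("label")}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up examples for illustration only.
a = {"text": "The delivery was late.", "label": "COMPLAINT"}
b = {"text": "The delivery was late.", "label": "SHIPPING"}    # same text, different label
c = {"text": "The delivery was late. ", "label": "COMPLAINT"}  # trailing space added

# Case 1: same text, different label. The input is identical, the question is not.
same_text_different_label = (
    input_hash(a) == input_hash(b) and task_hash(a) != task_hash(b)
)

# Case 2: a tiny change in the text produces a completely different hash.
whitespace_changes_hash = input_hash(a) != input_hash(c)

print("same text, different question:", same_text_different_label)
print("trailing space changes the hash:", whitespace_changes_hash)
```

If the repeated questions in your dataset share an input hash but differ in label, that points to the first explanation. If texts that look identical have different input hashes, that points to the second.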

Do you perhaps have some examples that you can share?

ivan · April 21, 2022, 6:51pm

Thank you, Koaning! I will put together some examples of this. I am reasonably sure the examples were not slightly different, but it could definitely be because the model was doing poorly on those examples.

Related topics

Topic	Tags	Replies	Views	Activity
Textcat - same data keeps appearing	usage, textcat	3	515	July 23, 2019
Same task presented for every pattern match	enhancement, textcat	1	559	November 30, 2019
textcat.teach presents same annotation task if text snippet contains multiple patterns	enhancement, usage, textcat, solved	11	1668	June 3, 2019
textcat.teach showing same text twice (and not using active learning?)	textcat	15	2300	August 15, 2018
Can I use the model training data again as the source data for textcat.teach?	usage, textcat, solved	3	432	September 11, 2020