Reject or skip examples for text classifier annotations

mikael · November 28, 2018, 10:01pm

I'm going through 50k customer service chat conversations and using textcat.teach to label them.
Example labels: payment, order info, cancellation.

Many of the messages inside a conversation are not relevant at the moment, like welcome phrases and confirmations.

My question is whether I should reject them or simply skip them.
I started out with annotations for the payment label and got quite a good score, but after thousands of annotations, the majority of them rejects, the score for a relevant payment conversation dropped to close to zero.

Can you confirm that skip is a good solution at this point?

honnibal · November 29, 2018, 1:04pm

You might find that a two-stage pipeline will work better for your use-case here. If you have a really easy problem of rejecting 99% of your messages, it can be good to have a very simple model that performs that task as an initial filter. Then you can run your more powerful model on the remaining examples.

There are several good open-source solutions for simple text classification problems. The text classification solutions in scikit-learn are efficient and easy to use, so you might want to try that. The text classification solution in FastText is also pretty good.

Once you’ve filtered out the messages you can easily tell are irrelevant, you can work on just the relevant cases. This will make annotation faster, and should also improve your model accuracy, because the classes will be much more balanced.
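As a rough sketch of the first-stage filter described above, something like the following could work with scikit-learn. The toy messages, the 0/1 labels, the `keep_relevant` helper and the 0.5 threshold are all made up for illustration; in practice you would train on a few hundred of your own messages marked relevant or irrelevant and tune the threshold on held-out data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 1 = relevant to some label, 0 = irrelevant chatter.
train_texts = [
    "my payment was declined twice",
    "i want to cancel my order",
    "where is my order right now",
    "the invoice shows the wrong amount",
    "hello and welcome to our support chat",
    "thanks, have a nice day",
    "ok great",
    "one moment please",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Stage 1: a cheap bag-of-words model that only decides relevant vs. irrelevant.
relevance_filter = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
relevance_filter.fit(train_texts, train_labels)


def keep_relevant(messages, threshold=0.5):
    """Return the messages whose predicted probability of being relevant
    is at least `threshold` (an assumed cut-off, not a recommended value)."""
    if not messages:
        return []
    # Column 1 holds the probability of class 1, because classes_ is sorted [0, 1].
    probs = relevance_filter.predict_proba(messages)[:, 1]
    return [m for m, p in zip(messages, probs) if p >= threshold]


incoming = [
    "welcome, how can i help you",
    "i was charged twice for my payment",
    "ok thanks",
]
# Stage 2 (annotation and the more powerful classifier) sees only these messages.
relevant = keep_relevant(incoming)
```

A lower threshold lets more borderline messages through to the second stage, which is usually the safer direction if missing a relevant message is costly.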

mikael · November 29, 2018, 9:16pm

Thank you for your response, I'll try that.
