Prodigy Tutorial Video: Finding Bad Labels for Text Classification

I'm happy to announce that we've got yet another Prodigy tutorial on our YouTube channel! This one is about finding bad labels for text classification in the Google Emotions dataset. It covers several useful techniques for finding candidate bad labels, and the video even investigates annotator disagreement.

You can watch the new video here or on YouTube.

The content is designed to be interesting for Prodigy users, but it's general enough that a broader data science audience might also learn a trick or two. I genuinely hope that these videos help folks get better datasets for their models. There's a lot of work to do in this space!


This is a beautiful guide, thank you. It's also a lightning tour of different NLP methods!

One way we found to quickly weed out bad labels was to export the data, open it in Excel and use AutoFilter with different keywords (very much like the heuristic step you applied in Python). It needs a careful approach, since it's possible to fill down and ruin a lot of labels, but it's also super fast.
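For anyone who'd rather stay in Python, here's a minimal sketch of the same keyword-filter idea. It assumes the annotations were exported as JSONL with hypothetical `text` and `label` fields (adjust these to whatever your export actually contains), and the keyword-to-label mapping is something you'd write yourself. The function names `load_jsonl` and `keyword_candidates` are made up for illustration. Because it only returns flagged copies for review and never edits the labels, there's no equivalent of an accidental fill-down.

```python
import json
import re
from typing import Dict, Iterable, List


def load_jsonl(path: str) -> List[Dict]:
    """Read one JSON object per line (assumes a JSONL export)."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def keyword_candidates(
    examples: Iterable[Dict],
    keyword_to_label: Dict[str, str],
) -> List[Dict]:
    """Flag examples whose text contains a keyword but whose label differs
    from the label that keyword suggests. These are candidates for manual
    review, not confirmed errors. The input examples are never modified."""
    flagged = []
    for ex in examples:
        for keyword, expected in keyword_to_label.items():
            # Whole-word, case-insensitive match, so "happy" won't hit "unhappy".
            pattern = rf"\b{re.escape(keyword)}\b"
            if re.search(pattern, ex["text"], re.IGNORECASE) and ex["label"] != expected:
                # Copy the example and record why it was flagged.
                flagged.append({**ex, "keyword": keyword, "expected": expected})
                break  # one flag per example is enough for review
    return flagged


if __name__ == "__main__":
    examples = [
        {"text": "I am so happy today", "label": "joy"},
        {"text": "This makes me happy", "label": "anger"},
        {"text": "Thanks a lot!", "label": "gratitude"},
        {"text": "I feel unhappy", "label": "sadness"},
    ]
    keywords = {"happy": "joy", "thanks": "gratitude"}
    for row in keyword_candidates(examples, keywords):
        print(row["text"], "->", row["label"], "(expected:", row["expected"] + ")")
```

Like the Excel approach, this is only as good as the keyword list: it surfaces suspicious rows quickly, but a human still has to decide which labels are actually wrong.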
