textcat.teach for multi-class classification

If I want to use textcat.teach to help label 100+ classes, how would that work? What's the recommended workflow - is there an alternative way that's better?

hi @riwikiwi!

Thanks for your question and welcome to the Prodigy community :wave:

textcat.teach assumes you already have a model. Do you have one? How were the labels for that model created? And why 100 classes?

One nice thing about textcat.teach is that it collects binary annotations (as opposed to manual ones): simple yes/no decisions, instead of asking the annotator to choose from all possible labels (see the docs). The annotator just says yes or no when presented with an example and a candidate label, which reduces the cognitive load compared with choosing from 100+ classes at a time.
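To make the yes/no idea concrete, here is a minimal sketch of what binary annotations look like as data. The `text`, `label` and `answer` keys follow Prodigy's binary text classification format; the helper names and the example labels are made up for illustration, and the real recipe picks the candidate label from the model's scores rather than looping over every label.

```python
def binary_tasks(text, candidate_labels):
    """Yield one yes/no question per candidate label for a single text."""
    for label in candidate_labels:
        yield {"text": text, "label": label}


def record_answer(task, accepted):
    """Attach the annotator's decision to a task without mutating it."""
    return {**task, "answer": "accept" if accepted else "reject"}


# Hypothetical example: two candidate labels for one text.
tasks = list(binary_tasks("Where is my refund?", ["BILLING", "SHIPPING"]))
annotated = [record_answer(tasks[0], True), record_answer(tasks[1], False)]
```

Each record answers exactly one question ("is this text BILLING?"), so the annotator never has to scan the full label list.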

But I'm guessing you may not have a model yet, so you'd need to start with textcat.manual.

You'll save yourself a lot of headaches if you can find any way to simplify your problem, especially at first as you learn more about your data and create your first workflow:

Dealing with very large label sets or hierarchical labels

If you’re working on a task that involves more than 10 or 20 labels, it’s often better to break the annotation task up a bit more, so that annotators don’t have to remember the whole annotation scheme. Remembering and applying a complicated annotation scheme can slow annotation down a lot, and lead to much less reliable annotations. Because Prodigy is programmable, you don’t have to approach the annotations the same way you want your models to work. You can break up the work so that it’s easy to perform reliably, and then merge everything back later when it’s time to train your models.

If your annotation scheme is mutually exclusive (that is, texts receive exactly one label), you’ll often want to organize your labels into a hierarchy, grouping similar labels together. For instance, let’s say you’re working on a chat bot that supports 200 different intents. Choosing between all 200 intents will be very difficult, so you should do a first pass where you annotate much more general categories. You’d then take all the texts annotated for some general type, such as information, and set up a new annotation task to sort them into more specific subtypes. This lets the annotators study up on that part of the annotation scheme, so they can make more reliable decisions.
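As a rough sketch of the two-pass idea in the quote above: the first pass assigns a general category, and the second pass only offers the subtypes of that category. The hierarchy, the label names and the `second_pass_batches` helper are all invented for illustration; the `options` / `accept` keys follow Prodigy's multiple-choice task format.

```python
from collections import defaultdict

# Hypothetical two-level hierarchy: general category -> specific intents.
HIERARCHY = {
    "information": ["opening_hours", "pricing"],
    "account": ["reset_password", "close_account"],
}


def second_pass_batches(first_pass):
    """Group accepted first-pass examples by general category.

    Each second-pass task only lists the subtypes of its general category,
    so annotators choose between a handful of options instead of all intents.
    """
    batches = defaultdict(list)
    for eg in first_pass:
        if eg.get("answer") != "accept":
            continue  # skip rejected or ignored examples
        for general in eg.get("accept", []):
            options = [{"id": sub, "text": sub} for sub in HIERARCHY[general]]
            batches[general].append({"text": eg["text"], "options": options})
    return dict(batches)


first_pass = [
    {"text": "When do you open?", "accept": ["information"], "answer": "accept"},
    {"text": "I forgot my password", "accept": ["account"], "answer": "accept"},
    {"text": "asdf", "accept": [], "answer": "ignore"},
]
batches = second_pass_batches(first_pass)
```

Each batch could then be saved as its own input file and annotated in a separate session, so annotators only need to study one part of the scheme at a time.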


If you can set your labels up as a hierarchy, this is the approach we recommend.

Also, rules (patterns) can be really helpful. This is especially the case if you have some prior knowledge about each group and know some terms you could start with. You can even use terms.teach to generate more terms quickly.

Also, one outside-the-box idea: if you know the names of all 100 classes (this is usually the case in a business problem where you're told the classification types), consider using an LLM via terms.openai.fetch to generate related terms (patterns) for each class. I don't know how much value the zero-shot recipes would add for that many categories, but you can test them out if you install v1.12.
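Whichever way you collect the terms (by hand, with terms.teach, or from an LLM), they need to end up in a patterns file. Here's a minimal sketch of turning a per-class term list into JSONL pattern lines. The class names and terms are made up, and the whitespace split is a simplification of real tokenization; the `label` / `pattern` structure with `lower` token attributes is the usual match pattern format.

```python
import json

# Hypothetical seed terms per class.
TERMS_BY_LABEL = {
    "BILLING": ["refund", "credit card"],
    "SHIPPING": ["delivery", "tracking number"],
}


def term_to_pattern(label, term):
    """Build one case-insensitive token pattern for a (possibly multi-word) term."""
    tokens = [{"lower": word} for word in term.lower().split()]
    return {"label": label, "pattern": tokens}


pattern_lines = [
    json.dumps(term_to_pattern(label, term))
    for label, terms in TERMS_BY_LABEL.items()
    for term in terms
]
```

Writing `pattern_lines` to a file, one entry per line, gives you something you can pass to a recipe's patterns argument.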

Alternatively, maybe for 20-30 classes you could try something like bulk labeling:

Unfortunately, there's not one perfect solution but hopefully, this gives you a few ideas of options you can experiment with. Hope this helps!

Hi @ryanwesslen, thank you so much for your answer. I already have a dataset manually labeled with the 100 classes (the business case requires there to be 100 classes). How do I go from there if I were to use textcat.teach on unlabeled data? And how would the binary annotations work: if I say no to one class, would it classify the example as the next closest one? Thank you!

So, first you'd need to create your first model.

First, make sure your annotations are in the right format for spaCy or Prodigy.

Here's a post where we go through the data format:

You'll need to decide whether your classes are mutually exclusive (textcat) or not (textcat_multilabel), which can affect how the data is formatted. See that post for more details.
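For a rough idea of what the conversion could look like, here is a sketch that turns existing (text, label) rows into multiple-choice style records for the mutually exclusive case. The label set and rows are placeholders, `to_choice_record` is a made-up helper, and the linked post remains the authority on the exact format your setup needs.

```python
import json

# Stand-in for the full label set (100+ in the real project).
ALL_LABELS = ["BILLING", "SHIPPING", "RETURNS"]


def to_choice_record(text, label, all_labels=ALL_LABELS):
    """Convert one (text, label) row into a choice-style annotation.

    Mutually exclusive classes means exactly one label ID in "accept".
    """
    if label not in all_labels:
        raise ValueError(f"Unknown label: {label}")
    return {
        "text": text,
        "options": [{"id": name, "text": name} for name in all_labels],
        "accept": [label],
        "answer": "accept",
    }


rows = [("Where is my refund?", "BILLING"), ("My parcel is late", "SHIPPING")]
jsonl_lines = [json.dumps(to_choice_record(text, label)) for text, label in rows]
```

For the non-exclusive (textcat_multilabel) case, the only change in this sketch would be allowing several IDs in `accept`. The resulting lines can be written to a .jsonl file and imported with db-in.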

Then, once your annotations are in the Prodigy DB, use data-to-spacy to export the annotations and train with spacy train. You can train with prodigy train instead. It has the advantage of being quicker to start with, but it's harder to reconfigure down the road because it hides the config file, so I tend to recommend learning data-to-spacy/spacy train early.

Also you may want very early to create a dedicated holdout (evaluation) dataset. This will make your experiments down the road much easier to read as your evaluation dataset is staying the same. If you don't specify a dedicated hold out dataset, Prodigy will create a random partition for evaluation. However, this can change each time so if you rerun, you may get different results simply due to a new holdout (evaluation) dataset. Be sure to use the eval: prefix with either data-to-spacy or prodigy train, e.g., --textcat dataset,eval:eval_dataset.
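If your labeled data currently lives in a single file, one way to get a stable holdout is to split it once with a fixed seed and load the two halves as separate datasets. This is only a sketch: `split_holdout`, the 20% fraction and the seed are arbitrary choices, and with 100 classes you may want a stratified split instead so rare classes show up in both halves.

```python
import random


def split_holdout(examples, eval_fraction=0.2, seed=0):
    """Return (train, evaluation) lists; the same seed gives the same split."""
    shuffled = list(examples)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


examples = [{"text": f"example {i}"} for i in range(10)]
train, evaluation = split_holdout(examples)
train_again, evaluation_again = split_holdout(examples)
```

Because the evaluation half never changes between runs, differences in scores come from the model or the training data, not from a reshuffled holdout.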

Sort of. Here's a post (see the slides, which cover NER, but the same idea applies to textcat) that provides some detail. Essentially, the model is updated only for the known (binary) label, and the other labels are treated as missing.
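To illustrate "treated as missing", here is a toy version of the update signal. It is not spaCy's or Prodigy's actual implementation, just the standard gradient of a per-label log loss (score minus target) with everything except the annotated label masked out. The scores and label names are invented.

```python
def binary_update_gradient(scores, label, answer):
    """Per-label gradient when only one label's yes/no answer is known.

    The annotated label gets (score - target); every other label gets 0.0
    because its true value is unknown, so it contributes nothing.
    """
    target = 1.0 if answer == "accept" else 0.0
    return {
        name: (score - target if name == label else 0.0)
        for name, score in scores.items()
    }


scores = {"BILLING": 0.7, "SHIPPING": 0.2, "RETURNS": 0.1}
grad = binary_update_gradient(scores, "BILLING", "reject")
```

So saying no to BILLING pushes the BILLING score down, but it does not assign the text to SHIPPING or any other "next closest" class. Those labels simply receive no update from this answer.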

What's worth mentioning is that this approach was designed for a reasonable number of labels. With 100+ labels, I'm a little more skeptical about how well it would work, especially if you run textcat.teach without a well-trained model. textcat.teach assumes you have a model that can measure uncertainty well, that is, it "knows" what it doesn't know. If you have only a small amount of data across all labels, and many labels are imbalanced, the model can't measure uncertainty well and textcat.teach will struggle.
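Here's a simplified picture of why the quality of the scores matters. Uncertainty sampling prefers examples whose score is closest to 0.5. This sketch sorts a small batch all at once, which is not how Prodigy's own sorter works on a stream, and the scores are made up.

```python
def most_uncertain(scored, n=2):
    """Return the n (score, text) pairs whose score is closest to 0.5."""
    return sorted(scored, key=lambda item: abs(item[0] - 0.5))[:n]


scored = [
    (0.97, "I want my money back"),
    (0.52, "Can you help me with this?"),
    (0.10, "Where is my parcel?"),
    (0.45, "It still hasn't arrived and I was charged"),
]
queue = most_uncertain(scored)
```

If the model was trained on only a few examples per label, the scores themselves are unreliable, so "closest to 0.5" stops meaning "most informative" and the selection is close to arbitrary.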

I would recommend a "bottom-up" approach, where you start with good/well-balanced labels, and then only add imbalanced/rare/poor-performing labels slowly:

  1. Before applying this to your entire dataset (100+ labels), start with a small subset of labels (6-8) for which you know you have a good number of examples. Train an initial model on only those labels. This gives you a good benchmark model. If one or two of those labels aren't performing as well as you'd like, you could add more annotations for them and then retrain a new model from scratch.
  2. Then slowly expand by adding more labels in small groups. You'll first need to add them to your training data, then use prior knowledge to focus on the labels that need the most help, for example those that are severely imbalanced or where the model performs poorly.
  3. Consider using textcat.correct instead of textcat.teach early on. textcat.correct still shows the model's predictions in the UI, which makes things a little easier since your job is just to correct them. You can even pass a --threshold parameter so that only predictions scoring above that threshold are pre-selected.
  4. Only use textcat.teach once you have a sufficient number of examples for that label. Also consider using patterns (see docs) in combination.
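For the first step, a quick count over your existing annotations shows which labels are good candidates to start with. This is a minimal sketch: `pick_starting_labels` is a made-up helper, and the cutoff of 50 examples and the limit of 8 labels are arbitrary numbers to adjust for your data.

```python
from collections import Counter


def pick_starting_labels(labels, min_examples=50, max_labels=8):
    """Return the most frequent labels that have at least min_examples each."""
    counts = Counter(labels)
    frequent = [
        (label, count)
        for label, count in counts.most_common()
        if count >= min_examples
    ]
    return [label for label, _ in frequent[:max_labels]]


# Hypothetical label column from an existing dataset.
labels = ["BILLING"] * 60 + ["SHIPPING"] * 55 + ["RETURNS"] * 5
starting = pick_starting_labels(labels)
```

Labels that miss the cutoff (RETURNS here) are the ones to add later in small groups, once the benchmark model on the starting set looks reasonable.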

Hope this helps!