textcat.teach surprising UI for multilabel

Ran the following command:

prodigy textcat.teach insults-w-negs-4 blank:en  ~/datasets/reddit-comment-corpus/RC_2011-08.bz2 --loader reddit --label NOT_INSULT,INSULT,AWESOME

The UI at localhost:8080 shows this:

I would have thought I'd get three options in the same way that you do for textcat.manual. Am I misunderstanding?

If you check the docs, you'll see that the textcat.teach recipe uses a binary interface. That means it only gives you the option to say whether or not the current prediction is correct.

You could alternatively use textcat.correct if you'd like a full manual interface where you can correct the model.

Thanks for clarifying. So in that case, what is the intended use case of passing multiple labels to textcat.teach? Error messaging suggests textcat.teach requires at least one --label, and it runs fine on 3 -- what is it doing with them if the UI only gives me the option to accept or reject?

A binary interface doesn't imply that there is only one label. It merely implies that what is on display is either correct or wrong. Maybe these examples help clarify the expectations a bit.

Binary vs. Manual in NER

For example, this is an interface for ner.correct. It allows you to take the model output and correct it. In the case of NER, you can still assign multiple labels to the text because this is a manual interface.

But now consider ner.teach. This just shows one annotation at a time and you can only tell the interface if the current annotation is correct or not.

The binary interfaces are less expressive, but annotation is a whole lot quicker because each example is a simple yes/no decision.

Binary vs. Manual in Textcat

Let's now consider how this would work for text classification. This is one example for textcat.correct. This is a manual interface, and it allows you to select each class individually.

But now, let's consider textcat.teach. That would look like this:

Again, this is a binary interface, so you'd only be able to say whether the suggestion is correct or not. It's less expressive, but it's faster to annotate because each decision is a yes/no. Note that in this case you'd still be able to attach more than one class to an example, but each class would pop up as a separate example. Typically these classes would be suggested by a spaCy model that tries to predict the label for each example.
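
To make the difference concrete, here is a rough sketch in plain Python. It does not call Prodigy at all. The helper names (binary_tasks, manual_task, merge_binary_answers) and the scores are made up for illustration, and the dict keys ("text", "label", "options", "answer") are my assumption about the general shape of the tasks, so check the docs before relying on them. The real recipe also decides which suggestions to show you, which this sketch leaves out.

```python
LABELS = ["NOT_INSULT", "INSULT", "AWESOME"]


def binary_tasks(text, scores):
    """Binary style: one yes/no question per (text, label) pair.

    `scores` stands in for a model's per-label predictions (hypothetical).
    """
    return [
        {"text": text, "label": label, "meta": {"score": score}}
        for label, score in scores.items()
    ]


def manual_task(text, labels):
    """Manual style: a single question that lists every label as an option."""
    return {"text": text, "options": [{"id": lab, "text": lab} for lab in labels]}


def merge_binary_answers(answered):
    """Collect the accepted labels per text from answered binary tasks."""
    merged = {}
    for task in answered:
        accepted = merged.setdefault(task["text"], [])
        if task["answer"] == "accept":
            accepted.append(task["label"])
    return merged


# Three labels produce three separate yes/no questions for the same text.
tasks = binary_tasks(
    "you are the best", {"NOT_INSULT": 0.7, "INSULT": 0.1, "AWESOME": 0.8}
)
# Simulate an annotator accepting two suggestions and rejecting one.
decisions = {"NOT_INSULT": "accept", "INSULT": "reject", "AWESOME": "accept"}
answered = [dict(task, answer=decisions[task["label"]]) for task in tasks]
print(merge_binary_answers(answered))
```

So passing three labels to a binary recipe doesn't change what a single question looks like. It only changes how many different yes/no questions can come up for the same text.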

Does this help?

Yep, thank you.