Congrats on such a great product. You won’t believe it, but my beta invite came through just as I was on the couch manually labelling a spreadsheet of 100k+ rows. Needless to say, I was excited!
I’m just after some best practice/workflow advice.
What’s the best way to classify multiple labels (150+) into a single model?
I work in the chatbot space and need to run intent detection on inbound messages and respond with the correct label (to link to the correct answer from a different app).
Consider a QnA bot with 150 topics, so 150 labels.
Should I run textcat.teach for every label, one at a time, and then afterwards run textcat.batch-train for each label? If so, should I output the batch-train to the same model each time and then start again with the next label for textcat.teach?
I totally get the workflow for one label on one model; I’m just after advice on how to add many more labels.
I think you’ll do well creating terminology lists to bootstrap your categories. Start off with a couple of seed terms, and then build out the word list using the terms.teach recipe, as shown at the start of the insults classifier video. This will help you create initial models for each of your labels.
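For example, the terminology step for one label might look roughly like this on the command line (the dataset name and seed terms below are just placeholders):

```
prodigy terms.teach greeting_terms en_core_web_lg --seeds "hello, hi, hey, morning"
```

You can then turn the collected terms into seeds or match patterns to bootstrap textcat.teach for that label.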
You want to get to the point where you have a single dataset with at least a few positive and negative examples for each of your labels. Prodigy assumes labels are not mutually exclusive, i.e. that each text can have multiple labels. If that’s not true for your domain, then you know that all examples that are positive for one class will be negative examples for the other classes. To take advantage of this knowledge, you can create ‘reject’ examples for the other classes once one class has been accepted. This logic is left for you to implement because label schemes can have complicated dependencies, e.g. some labels may be mutually exclusive, others not.
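If your labels really are fully mutually exclusive, a small script along these lines could derive those negatives for you (the dataset names and labels below are placeholders, not anything built into Prodigy):

```python
# Minimal sketch: turn each accepted example into explicit negatives for all
# other labels, assuming the labels are fully mutually exclusive.
from prodigy.components.db import connect

LABELS = ["GREETING", "GOODBYE", "THANKS"]  # ...all 150 of your labels

db = connect()
examples = db.get_dataset("intent_annotations")  # your textcat annotations

negatives = []
for eg in examples:
    if eg.get("answer") != "accept":
        continue
    for label in LABELS:
        if label != eg["label"]:
            negatives.append({"text": eg["text"], "label": label, "answer": "reject"})

if "intent_negatives" not in db.datasets:
    db.add_dataset("intent_negatives")
db.add_examples(negatives, datasets=["intent_negatives"])
```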
Overall I suggest you let your workflow evolve as you go. It’s a boot-strapping process: hopefully every bit of knowledge you’re adding can be used to make the knowledge collection easier. The optimal procedure for this will be different for every problem, so we’ve tried to give you a variety of tools that compose well.
You’ll find yourself moving text in and out of the database, merging records, etc. This is all by design. Similarly, you’ll want to write little bits of Python (or shell, if you’re perverse enough to prefer it) as you go. This is also by design. We wanted to avoid a problem we often find with developer tools, especially hosted ones: they often end up creating this parallel language of scripts and configurations that’s actually just worse than Python. We assume you know at least one programming language pretty well, so we wanted to make sure we let you use it, instead of creating more arbitrary stuff.
I think your last para was just what I needed. The realisation that it’s designed for us to manipulate with our own scripts and code, to use it as a tool rather than an off-the-shelf solution (which, yes, you’re right, is much better).
Follow-up question (shout if you want me to make a new thread).
How much of a hack would it be (or is it even possible) to have short sentences or phrases in a seed list for terms.teach and textcat.teach, rather than single words? Something like:
how are you getting on
how is your morning so far
how do you feel
how is your day going
how is it going
how is your evening
is everything all right
how are the things going
I’m fine and you
how has your day been going
how is your day being
how are you
how are you today
how have you been
The easiest way for now would be to simply pre-train your model with the examples you already have, so it doesn't start off at zero and has at least some concept of your labels. See this spaCy example for an end-to-end text classification training script. The model you save to disk can be directly loaded into Prodigy.
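For example, something along these lines (the dataset name, model path, input file and label are all placeholders for your own setup):

```
prodigy textcat.teach intent_dataset /path/to/pretrained-model your_texts.jsonl --label GREETING
```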
Alternatively, if you want to use terms.teach for phrases, you'll need a model with vocab and vectors containing multi-word tokens. This is a little more complicated, though, because you'll need to retokenize the text so phrases are one token. If you're bootstrapping the text classification with terms.teach, the model you later use for textcat.teach needs access to the same vectors. So you'll have to either write a wrapper for textcat.teach that adds your custom merging/tokenization logic, or package that with your spaCy model. The best way to achieve this would be to use a custom pipeline component.
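To give a rough idea of the retokenization part, here's a minimal sketch assuming spaCy v2-style pipeline components (the phrases and model name are just examples, and you'd still need to handle the vectors side):

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.util import filter_spans

nlp = spacy.load("en_core_web_lg")
phrases = ["how are you", "how is it going", "how have you been"]

matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("PHRASES", None, *[nlp.make_doc(p) for p in phrases])

def merge_phrases(doc):
    # merge every matched phrase into a single token so it's treated as one unit
    spans = [doc[start:end] for _, start, end in matcher(doc)]
    with doc.retokenize() as retokenizer:
        for span in filter_spans(spans):  # drop overlapping matches
            retokenizer.merge(span)
    return doc

nlp.add_pipe(merge_phrases, first=True)
```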
I'm preparing to train a multi-label classifier with a little more than 20 labels and would like some input on how best to do that with regard to the annotation process.
For starters, I plan to annotate ~500 positive examples for each label to see where that gets me. Do you think it would be best to create a multiple choice-style interface with all labels, or run a separate session for each label?
If I ran separate sessions, then I would probably stream in training examples at random so as not to annotate the same text over and over. But would that create a bias in the model? The bias coming from having examples in my training data that haven't been annotated with all possible labels.
I really like the one-decision-at-a-time design of Prodigy, but at the same time it seems a bit impractical to annotate the same examples over and over again. And it seems overwhelming to decide for more than 20 labels at a time.
I have the same question, but I am not using “.teach”. I am using manual annotation with the choice view because this is for collecting gold data from SMEs. I plan to use “.teach” when creating training data. Do we have a response for this?
Using the choice interface for manual annotation is a good solution here – if you don’t want to go through one label at a time, you can use the multiple choice view to select one or more labels in one go.
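A custom recipe for that can be pretty small. Here’s a rough sketch (the recipe name, labels and JSONL source are placeholders):

```python
import prodigy
from prodigy.components.loaders import JSONL

LABELS = ["GREETING", "GOODBYE", "THANKS"]  # ...your 20+ labels

@prodigy.recipe("textcat.manual-choice")
def textcat_manual_choice(dataset, source):
    options = [{"id": label, "text": label} for label in LABELS]

    def add_options(stream):
        for eg in stream:
            eg["options"] = options  # render all labels as multiple-choice options
            yield eg

    return {
        "dataset": dataset,
        "stream": add_options(JSONL(source)),
        "view_id": "choice",
        "config": {"choice_style": "multiple"},  # allow selecting several labels
    }
```

You’d then start it with something like prodigy textcat.manual-choice my_dataset data.jsonl -F recipe.py.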
If you want to use a recipe like textcat.teach and improve the model’s suggestions in the loop, that’s a bit different: for a workflow like this, the idea is that you don’t want to be looking at all labels for all examples, and should instead focus on the most relevant ones (e.g. the most uncertain predictions). In that case, it also makes more sense to look at the labels separately – the most relevant examples and corrections for label A might be completely different from those for label B.
So a possible workflow could be this:
1. Collect an initial dataset of gold-standard annotations using a multiple-choice interface. (Don’t forget to collect enough for a separate evaluation set!)
2. Pre-train a model and evaluate it. Here, you can also look at the mistakes and the labels that are most problematic. Maybe there are some labels the model mostly gets right, and others it struggles with.
3. Run textcat.teach for the labels that need improvement and give feedback on the model’s suggestions. If you feel like you need to fine-tune the example selection (e.g. to skip more examples), you can always write your own sorter like prefer_uncertain that takes a stream of (score, example) tuples and decides whether to send out an example for annotation, based on its score (see the sketch below).
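For example, a custom sorter is really just a generator over those (score, example) tuples. A minimal sketch, with an arbitrary threshold:

```python
def prefer_very_uncertain(scored_stream, threshold=0.1):
    # only send out examples whose score is close to 0.5, i.e. the most uncertain
    for score, example in scored_stream:
        if abs(score - 0.5) < threshold:
            yield example
```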