Text classification scoring

Hi,

I am trying to train a textcat model on around 60% of my data, and then use the trained model to predict labels for the remaining 40%, which is unseen and unlabeled. This is provided the model trained on the 60% achieves a decent level of accuracy.

My question is: is it possible to obtain a confidence score for the predictions my model makes when I run it against the remaining 40% of my data? The idea is that I would like to focus on, say, only those predictions with a confidence below 80%, so I can potentially save time instead of manually annotating every single example.

You might propose that I use textcat.teach, which has an active learning loop built in. However, I have more than 20 labels for my texts, and the list is not exhaustive (I have yet to explore all of the possible labels). So by using textcat.teach, I would potentially be limiting my annotation task to the existing list of labels.

Hoping for some advice, thanks.

Yes, that sounds like the type of workflow that should be easy to implement with a custom recipe :slightly_smiling_face: For instance, you could stream in your remaining data, process each text with your pretrained model and then only send out an example if your criteria are met – for instance, if the highest-scoring label has a score under your given threshold, if a certain label has a low score, or if all labels have low scores. How you define that is up to you and just comes down to doing the math on doc.cats, a dictionary of scores keyed by label name.

Here's a quick example:

import prodigy
from prodigy.components.loaders import JSONL
import spacy

@prodigy.recipe("textcat.score")
def textcat_score(dataset, spacy_model, source, threshold=0.8):
    nlp = spacy.load(spacy_model)
    
    def get_stream():
        stream = JSONL(source)
        # Mostly for speed so we can pipe examples through nlp.pipe
        eg_tuples = ((eg["text"], eg ) for eg in stream)
        for doc in nlp.pipe(eg_tuples, as_tuples=True):
            # Decide whether you want to send out example, based on predicted scores
            highest_score_cat = max(doc.cats, key=doc.cats.get)
            if doc.cats[highest_score_cat] <= threshold:
                # Format example, e.g. attach label, choice options etc.
                eg["label"] = highest_score_cat
                eg["meta"] = {"score": doc.cats[highest_score_cat]}
                yield eg

    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "classification"  # or choice, blocks etc.
    }

You might want to adjust the logic that decides whether to send out an example, depending on your model. The objective is a bit different if your categories are not mutually exclusive, and in that case you might want to look at all labels. You could also focus on examples with scores between 0.4 and 0.6, as those are the most "uncertain" predictions.
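For instance, here's a rough sketch of what those alternative criteria could look like as small helper functions called from inside get_stream() – the function names and the 0.4 to 0.6 band are just illustrative:

def is_uncertain_exclusive(cats, low=0.4, high=0.6):
    # Mutually exclusive labels: send out the example if the top score
    # falls into the "uncertain" band around 0.5
    top_label = max(cats, key=cats.get)
    return low <= cats[top_label] <= high

def is_uncertain_multilabel(cats, low=0.4, high=0.6):
    # Non-exclusive labels: each label is effectively its own binary decision,
    # so send out the example if *any* label's score is uncertain
    return any(low <= score <= high for score in cats.values())

In the recipe, you'd then swap the threshold check for something like if is_uncertain_multilabel(doc.cats).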

In the recipe above, it's just sending out the examples with a top-level "label" and rendering them in the classification interface. But you could also use the choice interface with label options instead, or a custom interface with blocks that show the text, options and maybe an input field to leave comments during the data exploration phase.
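If you go with the choice interface, the stream formatting could look roughly like this – a sketch that reuses the JSONL loader and a loaded nlp object as in the recipe above, with a hypothetical get_choice_stream helper:

from prodigy.components.loaders import JSONL

def get_choice_stream(nlp, source, threshold=0.8):
    stream = JSONL(source)
    eg_tuples = ((eg["text"], eg) for eg in stream)
    for doc, eg in nlp.pipe(eg_tuples, as_tuples=True):
        highest_score_cat = max(doc.cats, key=doc.cats.get)
        if doc.cats[highest_score_cat] <= threshold:
            # One selectable option per label the model knows about
            eg["options"] = [{"id": label, "text": label} for label in doc.cats]
            # Pre-select the model's best guess so you only need to correct it
            eg["accept"] = [highest_score_cat]
            eg["meta"] = {"score": round(doc.cats[highest_score_cat], 3)}
            yield eg

The recipe would then return "view_id": "choice", and you can set "config": {"choice_style": "multiple"} if more than one label can apply per example.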

If you're strictly viewing this as a type of "data exploration" or "label scheme development" task, then keeping the label scheme open like this can definitely make sense. If your label scheme is mutually exclusive, though, I'd recommend switching to a fixed label scheme as soon as you can, since the presence and absence of one label has an impact on all the other labels – so once you make a change, you kind of have to review all the other annotations again anyway.