Textcat.teach with Multiple Choice & Update Model

I am trying to create a custom recipe that does text classification with multiple choice while updating the default TextClassifier model at the same time, using the textcat.teach recipe as a base.
I have the multiple choice + classification with predictions working. I am also able to update the answer before it is written to the DB, so I can correct the label in case the prediction was not correct.
What I am not sure about is how to update the model on the fly with the annotation value (accept/reject), and how to check that it's working. Do I need to call "model.update" in my review_answers function?

Here is my config for the multiple choice and the update with my custom method:

return {
    "view_id": "blocks",
    "dataset": dataset,  # Name of dataset to save annotations
    "stream": stream,  # Incoming stream of examples
    "config": {
        "blocks": [
            {"view_id": "choice"},
        ],
        "lang": nlp.lang,  # Additional config settings, mostly for app UI
    },
    "update": review_answers,  # Update callback, called with batch of answers
    "exclude": exclude,  # List of dataset names to exclude
}

Then, my review_answers function looks like this:

def review_answers(answers):
    # Overwrite the tag before writing to disk
    for eg in answers:
        # Update the label tag with the answer from the annotator
        if eg["answer"] == "accept" and len(eg["accept"]) > 0:
            eg["label"] = eg["accept"][0]
    model.update # Is this the right way to update the model?

The flow of my recipe is basically the same as in the textcat.teach recipe:

# Load the stream from a JSONL file and return a generator that yields a
# dictionary for each example in the data.
stream = JSONL(source)

# Load the spaCy model
nlp = spacy.load(spacy_model)

# Initialize Prodigy's text classifier model, which outputs
# (score, example) tuples
model = TextClassifier(nlp, label)

if patterns is None:
    # No patterns are used, so just use the model to suggest examples
    # and only use the model's update method as the update callback
    predict = model
    update = model.update  # Do I need to do model.update here?? update is not used
else:
    # Initialize the pattern matcher and load in the JSONL patterns.
    # Set the matcher to not label the highlighted spans, only the task.
    matcher = PatternMatcher(nlp, prior_correct=5.0, prior_incorrect=5.0,
                             label_span=False, label_task=True)
    matcher = matcher.from_disk(patterns)
    # Combine the text classifier and the matcher, interleave their
    # suggestions and update both at the same time
    predict, update = combine_models(model, matcher)

# Use the prefer_uncertain sorter to focus on suggestions that the model
# is most uncertain about (i.e. with a score closest to 0.5). The model
# yields (score, example) tuples and the sorter yields just the example
stream = prefer_uncertain(predict(stream))

stream = add_options(stream, label)

Thanks for any guidance you can provide.

Hi! To update the annotation model, you can call model.update with a batch of examples in Prodigy's format. Since your recipe is creating predict/update functions depending on whether you're using patterns or not, you probably want to be using that update function. For example:

if patterns is None:
    # etc.
    predict = model
    update = model.update
else:
    # etc.
    predict, update = combine_models(model, matcher)

def review_answers(answers):
    # Overwrite the label before updating the model
    for eg in answers:
        if eg["answer"] == "accept" and len(eg["accept"]) > 0:
            eg["label"] = eg["accept"][0]
    update(answers)  # call whichever update function was assigned above

Alternatively, you could also create a spaCy text classifier directly and then update that, just like you'd update it during training (by calling nlp.update with texts and annotations). This removes one layer of abstraction introduced by Prodigy's annotation model, and makes it easier to see what's going on.
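For instance, a minimal sketch of that conversion step, without Prodigy's annotation model in between (the texts, labels and answer dicts below are made up for illustration; the resulting train_data is in the (text, {"cats": ...}) format that nlp.update accepts in spaCy v2):

```python
# Hypothetical batch of answers from the multiple-choice UI
answers = [
    {"text": "This is great", "answer": "accept", "accept": ["POSITIVE"]},
    {"text": "Not good at all", "answer": "accept", "accept": ["NEGATIVE"]},
    {"text": "Skip me", "answer": "reject", "accept": []},
]
LABELS = ["POSITIVE", "NEGATIVE"]  # assumed label set

train_data = []
for eg in answers:
    if eg["answer"] == "accept" and eg["accept"]:
        # One-hot "cats" dict: 1.0 for the chosen option, 0.0 for the rest
        cats = {label: float(label == eg["accept"][0]) for label in LABELS}
        train_data.append((eg["text"], {"cats": cats}))

# texts, annotations = zip(*train_data)
# nlp.update(texts, annotations, sgd=optimizer, losses=losses)
```

Rejected or empty answers are simply dropped here; how you want to use rejects as negative signal is up to you.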

(The annotation model helps with dealing with binary accept/reject annotations, converting data etc., but in your case, the annotation part is pretty straightforward: you have multiple choice options, and it's pretty clear what the annotations "mean" and how you want to use them to update your model.)

Note that any modifications you make to the answers in the update callback won't be reflected in the database – the annotations in the dataset will always match whatever the annotator saw and created. This is by design, because it'd otherwise be too easy to accidentally overwrite your annotations in place when performing the update and potentially destroying datapoints.
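If you do want to keep the corrected labels around, one option is to persist modified copies yourself from inside the callback. A sketch, assuming you're happy writing them to a separate JSONL file (the file name is made up, and the return value exists only to make the sketch easy to check):

```python
import copy
import json

def review_answers(answers):
    corrected = []
    for eg in answers:
        eg = copy.deepcopy(eg)  # work on a copy, don't mutate the originals
        if eg["answer"] == "accept" and eg["accept"]:
            eg["label"] = eg["accept"][0]
        corrected.append(eg)
    # Persist the corrected copies separately (hypothetical path);
    # the Prodigy dataset itself still stores what the annotator saw.
    with open("corrected_answers.jsonl", "a", encoding="utf8") as f:
        for eg in corrected:
            f.write(json.dumps(eg) + "\n")
    return corrected
```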