Multi label tagging

Is there a step-by-step guide to creating a recipe for multi-label tagging?
I’m looking at the Custom multilabel categorization recipe, but it’s unclear to me how to use it end to end.
With textcat, we tagged some data and then trained the model through the recipe.
How can I train a model using the “accept” value of the JSON? (Do I need to pull it out and then use spaCy the good old way?)

Thanks.

Prodigy’s textcat.batch-train recipe is optimised to train from binary annotations, so if you have data in this format, you can train your model using the built-in workflow.
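
For reference, here is a minimal sketch of what turning one choice-interface example into binary annotations could look like. It assumes each binary task is a dict with "text", "label" and "answer" keys, and that labels missing from the "accept" list should count as rejected. The function name choice_to_binary is made up for illustration and is not part of Prodigy.

```python
def choice_to_binary(eg, options):
    """Expand one choice-interface example into one binary task per label.

    Assumption: a label that is not in eg["accept"] is treated as rejected.
    """
    accepted = eg.get("accept", [])
    return [
        {
            "text": eg["text"],
            "label": label,
            "answer": "accept" if label in accepted else "reject",
        }
        for label in options
    ]


# Hypothetical annotated example, as it might come out of the choice interface
example = {"text": "Hello world", "accept": ["LABEL_A", "LABEL_C"], "answer": "accept"}
tasks = choice_to_binary(example, ["LABEL_A", "LABEL_B", "LABEL_C"])
```

Each resulting task carries exactly one label, so three options produce three binary tasks per text.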

If you want to convert data collected with the choice interface to the binary accept/reject format, you might also find this thread relevant:

Pulling the labels out yourself and training with spaCy directly is also an option. If you’ve collected data using the choice interface, you could then do something like this:

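options = ['LABEL_A', 'LABEL_B', 'LABEL_C']
data = []  # use this for training later
# `examples` is the list of annotated tasks (dicts) from your dataset
for eg in examples:
    accepted = eg.get('accept', [])  # labels selected in the choice interface
    # True if the label was selected, False otherwise
    cats = {label: label in accepted for label in options}
    data.append((eg['text'], cats))  # append takes one argument, so pass a tuple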

Basically, you just need to check whether the "accept" list of each annotated example includes the label (the label was selected by the user and should apply) or whether it doesn’t. The above code will produce data in the following format, which you can use to train your model in spaCy:

data = [
    ('Hello world', {'LABEL_A': True, 'LABEL_B': False, 'LABEL_C': True})
]
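
If the annotations have been exported to JSONL (one JSON object per line), the same conversion could be wrapped up roughly as follows. This is only a sketch: the function name load_choice_data and the sample lines are made up for illustration, and skipping every example whose "answer" is not "accept" is an assumption about how you want to treat rejected or ignored tasks.

```python
import json


def load_choice_data(jsonl_lines, options):
    """Convert exported choice annotations into (text, cats) tuples."""
    data = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        eg = json.loads(line)
        # Assumption: only tasks that were submitted with "accept" are kept
        if eg.get("answer") != "accept":
            continue
        accepted = eg.get("accept", [])
        cats = {label: label in accepted for label in options}
        data.append((eg["text"], cats))
    return data


# Hypothetical export: in practice these lines would be read from your file
sample_lines = [
    '{"text": "Hello world", "accept": ["LABEL_A", "LABEL_C"], "answer": "accept"}',
    '{"text": "Skipped example", "accept": [], "answer": "ignore"}',
]
train_data = load_choice_data(sample_lines, ["LABEL_A", "LABEL_B", "LABEL_C"])
```

With a real file, you could pass the open file object directly, since iterating over it yields one line at a time.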