Multi label tagging

Raphael · September 9, 2018, 8:27am

Is there a step-by-step guide to creating a recipe for multi-label tagging?
I’m looking at the “Custom multilabel categorization recipe” thread, but it’s very unclear to me how to use it end to end.
With textcat, we tagged some data and then trained the model through the recipe.
How can I train a model using the “accept” value of the JSON? (Do I need to pull it out and then use spaCy the good old way?)

Thanks.

ines · September 10, 2018, 10:02am

Prodigy's textcat.batch-train recipe is optimised to train from binary annotations, so if you have data in this format, you can train your model using the built-in workflow.

If you want to convert data collected with the choice interface to the binary accept/reject format, you might also find this thread relevant:
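As a rough sketch of that kind of conversion (not taken from the linked thread): each choice example can be expanded into one binary task per label, marked "accept" if the label was selected and "reject" otherwise. The helper name `choice_to_binary` is made up for illustration, and skipping examples with `"answer": "ignore"` is an assumption about how you'd want to treat skipped tasks.

```python
def choice_to_binary(examples, options):
    """Expand choice-interface examples into one binary task per label."""
    binary = []
    for eg in examples:
        # Assumption: tasks the annotator skipped carry no label information
        if eg.get('answer') == 'ignore':
            continue
        accepted = set(eg.get('accept', []))
        for label in options:
            binary.append({
                'text': eg['text'],
                'label': label,
                # selected labels become "accept", all others "reject"
                'answer': 'accept' if label in accepted else 'reject',
            })
    return binary


examples = [{'text': 'Hello world', 'accept': ['LABEL_A', 'LABEL_C'], 'answer': 'accept'}]
binary_tasks = choice_to_binary(examples, ['LABEL_A', 'LABEL_B', 'LABEL_C'])
```

The resulting list has one dict per text and label combination, which you could then save to a JSONL file and import into a new dataset for training.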

Pulling the labels out and training with spaCy directly is also an option. If you've collected data using the choice interface, you could then do something like this:

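options = ['LABEL_A', 'LABEL_B', 'LABEL_C']
data = []  # use this for training later
# examples: the annotated tasks (dicts) collected with the choice interface
for eg in examples:
    accepted = eg.get('accept', [])  # labels selected by the annotator
    # map every option to True if it was selected, False otherwise
    cats = {label: label in accepted for label in options}
    data.append((eg['text'], cats))  # append a single (text, cats) tuple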

Basically, you just need to check whether the "accept" list of each annotated example includes the label (the label was selected by the user and should apply) or not. The above code will produce data in the following format, which you can use to train your model in spaCy:

data = [
    ('Hello world', {'LABEL_A': True, 'LABEL_B': False, 'LABEL_C': True})
]
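One extra step you'll likely need: spaCy's text classifier reads the labels from a "cats" key in the annotations dict, so each cats dict has to be nested under that key before training. Here's a minimal sketch, assuming the data list looks like the one above. The helper name `to_spacy_format` is made up for illustration, and converting the booleans to 1.0 / 0.0 is optional.

```python
def to_spacy_format(data):
    """Wrap each cats dict under the 'cats' key that spaCy's textcat reads."""
    train_data = []
    for text, cats in data:
        # True/False become 1.0/0.0 scores for each label
        scores = {label: float(value) for label, value in cats.items()}
        train_data.append((text, {'cats': scores}))
    return train_data


data = [
    ('Hello world', {'LABEL_A': True, 'LABEL_B': False, 'LABEL_C': True})
]
train_data = to_spacy_format(data)
```

Each entry in train_data is then a (text, annotations) pair that you can pass to your spaCy training loop.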
