Custom multilabel categorization recipe

ines · July 5, 2018, 10:07am

Yes, spaCy's text classifier supports multiple, non-exclusive categories. We usually recommend making several passes over the data, one for each category. Even with the extra passes, this is often faster overall, because the annotator only has to think about one label at a time.

But it all really depends on the use case, so if you want to annotate all labels at once, the "choice" interface could be a good option. The custom recipes workflow has an end-to-end example of this.

To allow multiple selections, you also want to add 'config': {'choice_style': 'multiple'} to the components returned by your custom recipe. The data you collect from the recipe will have an added "accept" key with a list of all selected labels. For example:

{
    "text": "This is a text",
    "options": [
        {"id": "CLASS_A", "text": "Class A"},
        {"id": "CLASS_B", "text": "Class B"},
        {"id": "CLASS_C", "text": "Class C"}
    ],
    "answer": "accept",
    "accept": ["CLASS_B", "CLASS_C"]
}
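
On the recipe side, the setup described above could look roughly like the sketch below. It only shows the plain-Python parts: `add_options` and `build_components` are hypothetical helper names, and the option text is simply derived from the label ID. In an actual custom recipe, the returned dictionary would come from your function decorated with `@prodigy.recipe`, and the stream would come from one of Prodigy's loaders.

```python
def add_options(stream, labels):
    """Attach the same list of choice options to every incoming task."""
    # Assumption: the display text is derived from the ID, e.g. CLASS_A -> Class A
    options = [
        {'id': label, 'text': label.replace('_', ' ').title()}
        for label in labels
    ]
    for task in stream:
        task = dict(task)  # copy so the original task isn't modified
        task['options'] = options
        yield task


def build_components(dataset, stream, labels):
    """Return the components a custom recipe function would return."""
    return {
        'dataset': dataset,                      # dataset to save annotations to
        'stream': add_options(stream, labels),   # tasks with "options" added
        'view_id': 'choice',                     # use the choice interface
        'config': {'choice_style': 'multiple'},  # allow multiple selections
    }


components = build_components(
    'my_dataset',
    [{'text': 'This is a text'}],
    ['CLASS_A', 'CLASS_B', 'CLASS_C'],
)
tasks = list(components['stream'])
```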

If you want to train directly with spaCy, you can then convert that data to the cats format:

labels = ['CLASS_A', 'CLASS_B', 'CLASS_C']
training_data = []

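# collected_annotations: list of annotated examples (dicts) from your dataset
for eg in collected_annotations:
    # selected option IDs; empty list if nothing was selected
    accepted = eg.get('accept', [])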
    text = eg['text']
    # dictionary of all labels – if label is in accepted list, value is
    # set to True, otherwise it's set to False
    cats = {label: label in accepted for label in labels}
    training_data.append((text, {'cats': cats}))

Alternatively, you can also generate examples in Prodigy's binary annotation style and use the built-in textcat.teach recipe. Here, you could just duplicate each example once per label, add the "label" value and set "answer" to "accept" or "reject", depending on whether the label was selected or not.
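
A minimal sketch of that conversion, assuming the collected examples look like the JSON above. `to_binary_examples` is a hypothetical helper name, not part of Prodigy:

```python
def to_binary_examples(collected_annotations, labels):
    """Turn multiple-choice annotations into one binary example per label."""
    binary_examples = []
    for eg in collected_annotations:
        accepted = eg.get('accept', [])  # selected option IDs, if any
        for label in labels:
            binary_examples.append({
                'text': eg['text'],
                'label': label,
                # accept if the label was selected, reject otherwise
                'answer': 'accept' if label in accepted else 'reject',
            })
    return binary_examples


labels = ['CLASS_A', 'CLASS_B', 'CLASS_C']
collected = [{
    'text': 'This is a text',
    'answer': 'accept',
    'accept': ['CLASS_B', 'CLASS_C'],
}]
binary = to_binary_examples(collected, labels)
```

Each input example produces one output example per label, so the resulting list can be quite a bit larger than the original dataset.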
