Custom multilabel categorization recipe

Yes, spaCy's text classifier supports multiple, non-exclusive categories. We usually recommend making several passes over the data, one for each category. This is often still faster overall, because the annotator only has to think about one label at a time.

But it all really depends on the use case, so if you want to annotate all labels at once, the "choice" interface could be a good option. The custom recipes workflow has an end-to-end example of this.

To allow multiple selections, you also want to add 'config': {'choice_style': 'multiple'} to the components returned by your custom recipe. The data you collect from the recipe will have an added "accept" key with a list of all selected labels. For example:

{
    "text": "This is a text",
    "options": [
        {"id": "CLASS_A", "text": "Class A"},
        {"id": "CLASS_B", "text": "Class B"},
        {"id": "CLASS_C", "text": "Class C"}
    ],
    "answer": "accept",
    "accept": ["CLASS_B", "CLASS_C"]
}
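To get tasks into that shape, each incoming example needs an "options" list before it's sent out for annotation. Here's a minimal sketch of such a helper in plain Python, assuming your stream is an iterable of dicts with a "text" key. The names add_options and OPTIONS are just illustrative, not part of Prodigy's API:

```python
import copy

# Hypothetical label set, matching the example task above.
OPTIONS = [
    {"id": "CLASS_A", "text": "Class A"},
    {"id": "CLASS_B", "text": "Class B"},
    {"id": "CLASS_C", "text": "Class C"},
]


def add_options(stream, options=OPTIONS):
    """Yield a copy of each task with the choice options attached."""
    for task in stream:
        task = dict(task)  # shallow copy so the incoming task isn't mutated
        task["options"] = copy.deepcopy(options)
        yield task


# Assumed input: any iterable of {"text": ...} dicts.
stream = [{"text": "This is a text"}]
tasks = list(add_options(stream))
```

In a custom recipe, you'd wrap your stream with a function like this and return it in the recipe's components, together with the 'config' setting mentioned above.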

If you want to train directly with spaCy, you can then convert that data to the cats format:

labels = ['CLASS_A', 'CLASS_B', 'CLASS_C']
training_data = []

# collected_annotations: the annotated tasks (dicts) collected by the recipe
for eg in collected_annotations:
    accepted = eg['accept']
    text = eg['text']
    # dictionary of all labels – if label is in accepted list, value is
    # set to True, otherwise it's set to False
    cats = {label: label in accepted for label in labels}
    training_data.append((text, {'cats': cats}))

Alternatively, you can also generate examples in Prodigy's binary annotation style and use the built-in textcat.teach recipe. Here, you could create one copy of each example per label, add the "label" value, and set "answer" to "accept" or "reject", depending on whether the label was selected or not.
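That conversion could look roughly like this. It's only a sketch: to_binary_examples is a made-up helper name, and skipping tasks that weren't answered with "accept" is an assumption on my part, since it's not clear what a rejected or ignored multiple-choice task should mean for the individual labels.

```python
def to_binary_examples(collected_annotations, labels):
    """Turn multiple-choice annotations into one binary example per label."""
    binary_examples = []
    for eg in collected_annotations:
        # Assumption: only tasks the annotator accepted carry a usable selection.
        if eg.get("answer") != "accept":
            continue
        accepted = set(eg.get("accept", []))
        for label in labels:
            binary_examples.append({
                "text": eg["text"],
                "label": label,
                # Selected labels become "accept", all others "reject".
                "answer": "accept" if label in accepted else "reject",
            })
    return binary_examples


labels = ["CLASS_A", "CLASS_B", "CLASS_C"]
collected_annotations = [
    {"text": "This is a text", "answer": "accept", "accept": ["CLASS_B", "CLASS_C"]},
]
binary_examples = to_binary_examples(collected_annotations, labels)
```

Each multiple-choice task then yields one record per label, which you can add to a dataset and use as binary annotations.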
