I've been using Prodigy and spaCy very successfully for various NER tasks for some time.
I'm now trying to train a multi-label text classification model for news articles.
I already have pre-annotated data that contains news headlines and the labels applicable to each example.
What I cannot find anywhere is the format of the input JSONL file for multi-label text classification. I can find examples of single-label binary classifiers, like the INSULTS dataset in the tutorial, where a "label" key is provided along with the text. But for multiple labels, am I supposed to provide a list of labels under this key, repeat each example once for every label that applies to it, or provide an "accept" key listing all the applicable labels, similar to what the textcat.manual recipe does?
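To make the three options concrete, here is a small Python sketch of the records each one would produce. The headline and label names are made up for illustration, and I don't know which of these layouts (if any) Prodigy actually expects; that is exactly what I'm asking.

```python
import json

# Hypothetical example: this headline and these label names are invented.
headline = "Central bank raises rates as inflation hits tech stocks"
labels = ["ECONOMY", "TECHNOLOGY"]


def as_label_list(text, labels):
    """Option 1: a single record with a list of labels under the "label" key."""
    return [{"text": text, "label": list(labels)}]


def as_repeated_examples(text, labels):
    """Option 2: the same text repeated, one record per applicable label."""
    return [{"text": text, "label": label} for label in labels]


def as_accept_list(text, labels):
    """Option 3: a single record with an "accept" key listing all labels."""
    return [{"text": text, "accept": list(labels)}]


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    candidates = [
        ("list under 'label'", as_label_list),
        ("one example per label", as_repeated_examples),
        ("'accept' list", as_accept_list),
    ]
    for name, build in candidates:
        print(f"# {name}")
        print(to_jsonl(build(headline, labels)))
```

Options 1 and 3 keep one line per headline, while option 2 produces as many lines as there are labels for that headline.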
The documentation is lacking on this subject. In the docs for Text Classification, under the section "I already have annotations and just want to train a model", the docs say that we need to supply a "text" key along with a "spans" key. Surely this is for NER model training and not for text classification, right?