How do I add my categories as labels to my pipeline?


I am using spaCy, and my dataset is a list of emails. As part of pre-processing, I cleaned up the data by removing stop words, disclaimers and greetings. Each email belongs to a category, and in total I have about 30 categories.
Now my question is: how can I add my list of categories to the pipeline?
Would it be like this?

> text_cat = nlp.create_pipe("textcat", config={"exclusive_classes": True, "architecture": "simple_cnn"})
> nlp.add_pipe(text_cat, last=True)
> text_cat.add_label("Cat1")
> text_cat.add_label("Cat2")
> .
> .
> .

And for the load_data function, what would my cats be? I don't know what format the cats should have.

In the spaCy docs/examples:

def load_data(limit=0, split=0.8):
    """Load data from the IMDB dataset."""
    # Partition off part of the train data for evaluation
    train_data, _ =
    train_data = train_data[-limit:]
    texts, labels = zip(*train_data)
    cats = [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in labels]
    split = int(len(train_data) * split)
    return (texts[:split], cats[:split]), (texts[split:], cats[split:])

In my case, my categories are strings, so I don't know what the right value of cats would be for me in this line: cats = [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in labels]

I tried adding my list of categories to cats in the load_data function, but when the code reaches the training step, at this line:
nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)

I get this error:

ValueError: could not convert string to float: 'Cat1'
I could only find examples with POSITIVE and NEGATIVE, and with 0 and 1 as categories.

Hi! This forum is intended for questions specifically about Prodigy. Sometimes that touches spaCy as well, but for general usage questions like this one, we now have the GitHub discussion board, so maybe you could repost your question there?
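
As a quick pointer in the meantime: the ValueError suggests the text categorizer received the category name itself where it expected a number or boolean. In the IMDB example, each entry of cats is a dict that maps every label to True or False, so a rough sketch for string categories could look like the following. The function name make_cats and the three category names are made up for illustration, and this assumes one category per email, as with "exclusive_classes": True.

```python
def make_cats(labels, all_categories):
    """Build one {category: bool} dict per text, mirroring the
    {"POSITIVE": ..., "NEGATIVE": ...} dicts in the IMDB example."""
    return [
        {cat: (cat == label) for cat in all_categories}
        for label in labels
    ]


# Hypothetical category names; in practice this would be the full list
# of about 30 categories, the same ones passed to text_cat.add_label().
all_categories = ["Cat1", "Cat2", "Cat3"]

# One string label per email, in the same order as the texts.
labels = ["Cat2", "Cat1"]

cats = make_cats(labels, all_categories)
```

Each dict in cats would then take the place of the POSITIVE/NEGATIVE dict returned by load_data, and the same all_categories list could be looped over to call text_cat.add_label(cat) once per category.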