How to add my categories as labels to my pipeline?

Shahrzad_Fa · December 17, 2020, 10:31pm

Hello,

I am using spaCy, and my dataset is a list of emails. As part of pre-processing, I cleaned up the data by removing stop words, disclaimers and greetings. Each email belongs to a category, and in total I have about 30 categories.
Now my question is: how can I add my list of categories to the pipeline?
Would it be like this?

> text_cat=nlp.create_pipe( "textcat", config={"exclusive_classes": True, "architecture": "simple_cnn"})
> nlp.add_pipe(text_cat, last=True)
> 
> text_cat.add_label("Cat1")
> text_cat.add_label("Cat2")
> .
> .
> .

And for the load_data function, what would my cats be? I don't know what format the cats should have.

In the spaCy docs/examples:

def load_data(limit=0, split=0.8):
    """Load data from the IMDB dataset."""
    # Partition off part of the train data for evaluation
    train_data, _ = thinc.extra.datasets.imdb()
    random.shuffle(train_data)
    train_data = train_data[-limit:]
    texts, labels = zip(*train_data)
    cats = [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in labels]
    split = int(len(train_data) * split)
    return (texts[:split], cats[:split]), (texts[split:], cats[split:])

In my case, my categories are strings, so I don't know what the right value for cats would be in this line:
cats = [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in labels]
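
One way to generalize that line to string categories is to build, for each example, a dict with one key per category and a True value only for the category the email belongs to. The sketch below uses only plain Python; `make_cats` is a hypothetical helper name, and the three labels stand in for the real list of about 30 categories.

```python
def make_cats(labels, categories):
    """Turn string labels into "cats" dicts with exactly one True per example.

    labels:     one category name per example, e.g. ["Cat2", "Cat1"]
    categories: the full list of category names (assumed known up front)
    """
    return [{cat: cat == label for cat in categories} for label in labels]


# Hypothetical placeholder data, not taken from the real dataset.
categories = ["Cat1", "Cat2", "Cat3"]
labels = ["Cat2", "Cat1"]

cats = make_cats(labels, categories)
print(cats[0])
```

With mutually exclusive classes, each resulting dict has the same shape as the POSITIVE/NEGATIVE dicts in the IMDB example, only with more keys.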

I tried adding my list of categories to cats in the load_data function, but when the code reaches the training step, in this line:
nlp.update(texts, annotations, sgd=optimizer, drop=0.2, losses=losses)

I get this error:

ValueError: could not convert string to float: 'Cat1'
I could only find examples that use POSITIVE and NEGATIVE, with 0 and 1 as the category values.
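
The error message suggests that a string such as 'Cat1' ended up in a place where a number was expected, most likely as a value inside the cats annotations. A small check run before training can point at the offending example. This is only a sketch: `check_cats` is a hypothetical helper, not part of spaCy, and it assumes each annotation is a dict mapping label names to booleans or numbers.

```python
from numbers import Real


def check_cats(cats, labels):
    """Raise a descriptive error if any cats dict is not label -> bool/number."""
    known = set(labels)
    for i, example in enumerate(cats):
        if not isinstance(example, dict):
            raise TypeError(
                f"example {i}: expected a dict, got {type(example).__name__}"
            )
        unknown = set(example) - known
        if unknown:
            raise ValueError(f"example {i}: unknown labels {sorted(unknown)}")
        for label, value in example.items():
            # bool counts as a Real number in Python, str does not.
            if not isinstance(value, Real):
                raise ValueError(
                    f"example {i}: value for {label!r} is {value!r}, "
                    "expected a bool or number"
                )


# Passes silently: every value is a bool or a float.
check_cats([{"Cat1": True, "Cat2": 0.0}], ["Cat1", "Cat2"])
```

Calling it with something like `[{"Cat1": "Cat1"}]` raises a ValueError that names the example index and the bad value, which is easier to trace than the float conversion error raised during training.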

ines · December 20, 2020, 11:00pm

Hi! This forum is intended for questions specifically about Prodigy. Sometimes that touches spaCy as well, but for general usage questions like this one, we now have the GitHub discussion board: https://github.com/explosion/spaCy/discussions. So maybe you could repost your question there?
