Scorer for Text Classification

AlbanoBorba · May 18, 2020, 4:58pm

I'm trying to evaluate my multilabel textcat with Scorer, but scorer.scores doesn't return anything.

from spacy.gold import GoldParse  # spaCy v2.x
from spacy.scorer import Scorer

def evaluate(nlp, test_data):
    scorer = Scorer()
    for text, label in test_data:
        # text: "my example text"
        # label: {"cats": {"cat1": 0.0, "cat2": 1.0, "cat3": 0.0}}
        doc_gold_text = nlp.make_doc(text)
        gold = GoldParse(doc_gold_text, cats=label["cats"])
        pred_value = nlp(text)
        scorer.score(pred_value, gold)

    return scorer.scores
    # returns: {} (actually, the other fields all return zero too)

The model makes good predictions, and I use this same function to evaluate NER in another model successfully. Am I doing something wrong here, or does Scorer not work for textcat?

AlbanoBorba · May 18, 2020, 8:43pm

I don't know what's wrong with the code above, but I tried a different approach and it worked, using nlp.evaluate():

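from spacy.gold import GoldParse  # spaCy v2.x

def evaluate(nlp, test_data):
    # build (doc, gold) pairs; nlp.evaluate() runs the pipeline and returns a Scorer
    eval_input = [(nlp.make_doc(text), GoldParse(nlp.make_doc(text), cats=label["cats"])) for text, label in test_data]
    scorer = nlp.evaluate(eval_input)

    return scorer.scores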
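A possible explanation for the empty result, based on the spaCy v2.2/v2.3 source (treat it as an assumption and check it against your installed version): nlp.evaluate() creates its scorer as Scorer(pipeline=self.pipeline), while a bare Scorer() doesn't know the textcat labels and therefore skips textcat scoring. Creating the scorer as Scorer(pipeline=nlp.pipeline) in the first function may be enough to fix it. Independently of spaCy's Scorer, the sketch below computes per-label precision, recall and F1 directly from gold and predicted cats dicts. per_label_prf is a hypothetical helper, and the 0.5 threshold is an assumed cut-off, not necessarily what spaCy uses internally.

```python
from typing import Dict, List, Tuple

Cats = Dict[str, float]


def per_label_prf(
    examples: List[Tuple[Cats, Cats]], threshold: float = 0.5
) -> Dict[str, Dict[str, float]]:
    """Per-label precision, recall and F1 for a multi-label textcat.

    Each example is (gold_cats, predicted_cats), e.g. (label["cats"], doc.cats).
    Only labels present in the gold dicts are scored. A label counts as
    predicted when its score >= threshold (an assumed cut-off).
    """
    counts: Dict[str, List[int]] = {}  # label -> [tp, fp, fn]
    for gold, pred in examples:
        for label, gold_value in gold.items():
            tp_fp_fn = counts.setdefault(label, [0, 0, 0])
            is_gold = gold_value >= 0.5  # gold cats are 0.0 / 1.0
            is_pred = pred.get(label, 0.0) >= threshold
            if is_gold and is_pred:
                tp_fp_fn[0] += 1
            elif is_pred:
                tp_fp_fn[1] += 1
            elif is_gold:
                tp_fp_fn[2] += 1
    results = {}
    for label, (tp, fp, fn) in counts.items():
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        results[label] = {"p": p, "r": r, "f": f}
    return results
```

To use it, collect pairs such as (label["cats"], nlp(text).cats) over the test set and pass the list in.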

Mayank · July 13, 2020, 11:41am

Hi @ines

We're trying to solve a multi-label text classification problem and used Prodigy for annotation. There are 10 classes, and about 25% of the samples are positive (have one or more labels). We trained the model with the spaCy CLI, using en_vectors_web_lg as our base model.

Our goal is to aggregate (sum or average) the scores across 100 predictions and arrive at relative aggregate scores for the 10 classes.
However, there is a large variance in the scores, which is causing problems:

The HIGH scores for two different positive samples of the same class are very different. For one sample, it is "label A": 0.177, and for another it is "label A": 0.667. Why is there so much variance? Do we need to normalize?
The order of magnitude of the LOW scores also varies a lot between samples, ranging from 10^-1 to 10^-3. Again, why so much variance? Do we need to normalize?
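The aggregation described above can be sketched in plain Python, assuming each prediction is available as a doc.cats-style dict mapping label to score. aggregate_cats is a hypothetical helper, and the optional rescaling to relative shares is one possible post-processing choice, not something spaCy does for you.

```python
from typing import Dict, List

Cats = Dict[str, float]


def aggregate_cats(predictions: List[Cats], normalize: bool = True) -> Cats:
    """Average each label's score over many predictions (e.g. doc.cats dicts).

    A label missing from a prediction contributes 0 to its average.
    With normalize=True the averages are rescaled to sum to 1, giving each
    label's relative share. This does not turn independent per-label scores
    into calibrated probabilities; it only makes them comparable as shares.
    """
    if not predictions:
        return {}
    totals: Cats = {}
    for cats in predictions:
        for label, score in cats.items():
            totals[label] = totals.get(label, 0.0) + score
    averages = {label: total / len(predictions) for label, total in totals.items()}
    if normalize:
        grand_total = sum(averages.values())
        if grand_total > 0:
            averages = {label: avg / grand_total for label, avg in averages.items()}
    return averages
```

Averaging keeps each label's scale, so a label that is rarely positive still ends up with a small share; the rescaling step only changes the totals, not the ranking between labels.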

I have attached a screenshot below.

My last question is about the model architecture: does spaCy use a sigmoid activation function for multi-label classification?
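To make the sigmoid/softmax distinction concrete, here is a generic sketch in plain Python (not spaCy's actual output layer) that turns the same made-up logits into independent sigmoid scores and into softmax scores. The logit values and helper names are hypothetical. Only the softmax scores are constrained to sum to 1; each sigmoid score is computed on its own, and a few units of difference in the logits already spread the low scores over several orders of magnitude.

```python
import math
from typing import Dict

Logits = Dict[str, float]


def sigmoid_scores(logits: Logits) -> Dict[str, float]:
    """Independent per-label probabilities (multi-label): each in (0, 1), no sum constraint."""
    return {label: 1.0 / (1.0 + math.exp(-z)) for label, z in logits.items()}


def softmax_scores(logits: Logits) -> Dict[str, float]:
    """Mutually exclusive class probabilities: non-negative and summing to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - m) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


logits = {"label A": 0.5, "label B": -2.0, "label C": -6.0}
print(sigmoid_scores(logits))
print(softmax_scores(logits))
```

Which activation a particular textcat model uses depends on its configuration; the sketch only shows why scores from independent sigmoids need not add up to 1.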

Thanks

kapilok · July 15, 2020, 3:02am

Hi @ines,

This is a follow-up to Mayank's post about the "major variance in scoring which is causing problems".
Do you have any suggestions for how we can fix the variance in the scores?
We were expecting softmax scores, where the scores/probabilities across the classes sum to 1, but that's not the case. We also don't have much insight into the architecture of the model.

Thanks for your advice.

Kapil

SofieVL · July 22, 2020, 2:27pm

Hi Kapil and Mayank,

You said you have a "multi-label" textcat problem. Does that mean that one sample text can in fact be annotated with multiple positive labels? In that case, the output probabilities wouldn't sum to 1, because the different labels are treated as separate, "parallel" classification problems.

If, however, you have a "multi-class" textcat problem where only one class can be positive per sample, you'd need to set exclusive_classes to True in your textcat model, and then a softmax output layer would indeed be used (https://github.com/explosion/spaCy/blob/master/spacy/_ml.py#L702-L707).
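Whether exclusive_classes fits can be checked from the annotations themselves. A minimal sketch, assuming the annotations are available as cats dicts like the ones earlier in this thread; labels_look_exclusive is a hypothetical helper, and 0.5 is an assumed cut-off for a positive label.

```python
from typing import Dict, Iterable


def labels_look_exclusive(annotations: Iterable[Dict[str, float]]) -> bool:
    """Return True if every example has exactly one positive label.

    `annotations` are cats dicts such as {"cat1": 0.0, "cat2": 1.0}.
    If this holds for all the data, the task is multi-class (exclusive);
    if any example has zero or several positives, it is multi-label.
    """
    for cats in annotations:
        positives = sum(1 for value in cats.values() if value >= 0.5)
        if positives != 1:
            return False
    return True
```

Examples with no positive label at all also return False here, since a softmax layer always distributes the full probability mass over the classes and so cannot represent "none of the above" without an extra class.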

Can you share the exact script you're running to perform the text classification, and the exact parameters of your problem?
