textcat.teach not taking into account label value

mathetes · December 4, 2018, 4:50pm

Hi, I have a question about the textcat.teach recipe when using a model in the loop.

I first added some categories to a Spanish base model.
Then I trained those categories with some labeled data.
I meant to use this partially trained model to bootstrap my labeling process in Prodigy, but I'm seeing some unexpected behavior: the score shown matches the score of the first label when I run the same example manually through spaCy, not the score of the label I set. Surely this is not the intended output?

I'm launching Prodigy from bash like so:

prodigy textcat.teach test models/output_model input.txt --label busqueda

Given that command, I'm seeing this example

And when I load the same model with spaCy, this is the output I get:

nlp("Compre un generador y no funciona").cats
{'boleta/factura': 0.2774198651313782, 'busqueda': 0.08486541360616684, ...}

Notice that the score in Prodigy is the same as the score of the first label in spaCy. This is consistent across all examples.

Thanks

honnibal · December 5, 2018, 2:18am

Thanks! This was a bug in the meta attribute that displays the score. It doesn’t affect the way the examples are sorted, but it was changing the displayed score, giving the confusing results. We should have a new point release uploaded soon, which will include the fix.

mathetes · December 6, 2018, 12:07am

That’s great to hear!
Just to be sure, I dug deeper to understand what was going on. The bug seems to come from the TextClassifier model itself when it is asked to predict on an example.

import spacy
from prodigy.models.textcat import TextClassifier

nlp = spacy.load(spacy_model, disable=['ner', 'parser'])
model = TextClassifier(nlp, {'label 1', 'label 2', 'label 3'}, long_text=long_text)
next(model(stream))  # first (score, example) tuple from the model

The output I’m seeing from the above is a tuple with the scoring and the annotated example, as described in the documentation. The problem is that all the examples come with the same label, even if I pass multiple labels to the model, as in the code above.

Is this what you were referring to?

ines · December 6, 2018, 2:21pm

In your example code, does the model you’re loading have a pre-trained text classifier component? If not, it’s possible that the model only asks about one label first and then adjusts accordingly. The built-in annotation recipes are also designed to focus on one label at a time.

I think what Matt was referring to was a much more superficial mistake: The task’s "meta" property (which is what’s displayed in the UI) wasn’t always overwritten correctly, so the score that was displayed was the score from a different label. This didn’t have an impact on the actual scoring, only on what’s shown in the UI.
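Until the fix is released, the displayed value could in principle be corrected on the user side. The following is only a rough sketch, not Prodigy's actual fix: `fix_displayed_score` and `get_cats` are hypothetical names, and it assumes the stream yields (score, task) tuples and that `get_cats(text)` returns a dict like `nlp(text).cats`.

```python
def fix_displayed_score(scored_stream, get_cats):
    """Overwrite meta["score"] with the score of each task's own label.

    scored_stream: iterable of (score, task) tuples (assumed shape).
    get_cats: hypothetical callable mapping a text to a {label: score} dict,
              e.g. lambda text: nlp(text).cats
    """
    for score, task in scored_stream:
        cats = get_cats(task["text"])
        task = dict(task)  # copy so the incoming task is not mutated
        meta = dict(task.get("meta", {}))
        meta["score"] = cats[task["label"]]
        task["meta"] = meta
        # The sorting score is passed through unchanged; only the display changes.
        yield score, task


# Stand-in for a real model, using two scores quoted in this thread.
fake_cats = {"boleta/factura": 0.2774198651313782, "busqueda": 0.08486541360616684}
stream = [(0.5, {"text": "Compre un generador y no funciona",
                 "label": "busqueda",
                 "meta": {"score": 0.2774198651313782}})]
fixed = list(fix_displayed_score(stream, lambda text: fake_cats))
print(fixed[0][1]["meta"]["score"])
```

With a real model, `get_cats` would be something like `lambda text: nlp(text).cats`.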

mathetes · December 7, 2018, 12:15pm

Yes, it does. It has 22 trained classes, some with thousands of examples. Sorry if the generic class names confused the situation. Those are definitely not the names of the classes I'm using, nor the actual code I tested. In the code, the class names match.

Ok, what I posted in my follow-up is entirely different then. I understand that the process is designed to focus on one label at a time, and I intend to use it that way, but I tested the code above just to trace and narrow the problem down.

Look at the output of the code I pasted earlier:

[
{
    "text": "Me puede indicar como instalar un reseptaculo para ducha 70×70",
    "_input_hash": -440991298,
    "_task_hash": -712696987,
    "label": "busqueda",
    "score": 0.63885098695755,
    "priority": 0.63885098695755,
    "spans": [],
    "meta": {
    "score": 0.63885098695755
    }
},
{
    "text": "Compre un generador y no funciona",
    "_input_hash": -1278129354,
    "_task_hash": -1836282783,
    "label": "busqueda",
    "score": 0.2774198353290558,
    "priority": 0.2774198353290558,
    "spans": [],
    "meta": {
    "score": 0.2774198353290558
    }
},
{
    "text": "Hola.. terraza Sao Paolo",
    "_input_hash": -1251212495,
    "_task_hash": -133394796,
    "label": "busqueda",
    "score": 0.2961369454860687,
    "priority": 0.2961369454860687,
    "spans": [],
    "meta": {
    "score": 0.2961369454860687
    }
}]

This is the output if I load the model manually and run those examples through it:

>>> import spacy
>>> nlp = spacy.load("models/output_model")
>>> nlp("Me puede indicar como instalar un reseptaculo para ducha 70×70").cats
{"boleta/factura": 0.6388508677482605, "busqueda": 0.28845229744911194, "carroDeCompra": 0.26851773262023926, 
"contacto": 0.29197776317596436, "cyberBusqueda": 0.16447336971759796, "cyberStock": 0.07532192021608353, 
"despacho": 0.3997839093208313, "errorCompra": 0.03661337494850159, "errorContacto": 0.19127494096755981, 
"garantia": 0.6039203405380249, "horarioAtencion": 0.038573991507291794, "infoCompra": 0.0891498476266861, 
"mediosPago": 0.2964952290058136, "newsletter": 0.3004630208015442, "oportunidades": 0.08744868636131287, 
"paginaWeb": 0.20932042598724365, "reclamo": 0.2141648828983307, "retiroTienda": 0.2355145961046219, 
"stockOportunidades": 0.3207305073738098, "ubiTienda": 0.08180715888738632}
>>> nlp("Compre un generador y no funciona").cats
{"boleta/factura": 0.2774198651313782, "busqueda": 0.08486541360616684, "carroDeCompra": 0.14844946563243866, 
"contacto": 0.07637190073728561, "cyberBusqueda": 0.12072107195854187, "cyberStock": 0.44110289216041565, 
"despacho": 0.8482178449630737, "errorCompra": 0.12313196063041687, "errorContacto": 0.4583171308040619, 
"garantia": 0.38733163475990295, "horarioAtencion": 0.056911077350378036, "infoCompra": 0.300380140542984, 
"mediosPago": 0.4579319953918457, "newsletter": 0.0994899645447731, "oportunidades": 0.14953915774822235, 
"paginaWeb": 0.3992502987384796, "reclamo": 0.04635193571448326, "retiroTienda": 0.45770877599716187, 
"stockOportunidades": 0.09573712944984436, "ubiTienda": 0.04304426535964012}
>>> nlp("Hola.. terraza Sao Paolo").cats
{"boleta/factura": 0.29613691568374634, "busqueda": 0.1275823414325714, "carroDeCompra": 0.1026281863451004, 
"contacto": 0.14645826816558838, "cyberBusqueda": 0.03767079859972, "cyberStock": 0.04783276841044426, 
"despacho": 0.6191720366477966, "errorCompra": 0.11303283274173737, "errorContacto": 0.4571975767612457, 
"garantia": 0.2942344844341278, "horarioAtencion": 0.02736266516149044, "infoCompra": 0.2167942076921463, 
"mediosPago": 0.1288776397705078, "newsletter": 0.051116943359375, "oportunidades": 0.2680467367172241, 
"paginaWeb": 0.07719093561172485, "reclamo": 0.012597997672855854, "retiroTienda": 0.07658756524324417, 
"stockOportunidades": 0.0920182541012764, "ubiTienda": 0.06460288166999817}

Notice the point I made before: the score in the first output matches the score of the very first label ("boleta/factura") in the second output, not the score of "busqueda". This affects the meta value, as @honnibal said, but it also affects the other keys ("score" and "priority").
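The comparison can also be done programmatically. This is a minimal sketch in plain Python: `labels_matching_score` is a helper name made up for illustration, and the numbers are copied from the outputs above (only two of the labels are included for brevity). A small tolerance is needed because the two outputs differ slightly in the last digits.

```python
import math


def labels_matching_score(cats, score, rel_tol=1e-6):
    """Return the labels in `cats` whose score equals `score` within rel_tol."""
    return [label for label, value in cats.items()
            if math.isclose(value, score, rel_tol=rel_tol)]


# Scores for "Compre un generador y no funciona", copied from the outputs above.
cats = {"boleta/factura": 0.2774198651313782, "busqueda": 0.08486541360616684}
task = {"label": "busqueda", "score": 0.2774198353290558}

matches = labels_matching_score(cats, task["score"])
print(task["label"], "->", matches)  # busqueda -> ['boleta/factura']
```

The task is labeled "busqueda", but its score only matches the entry for "boleta/factura".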
