textcat.teach not taking into account label value

textcat
to-be-released

(Juan Roberto Honorato) #1

Hi, I have a question about the textcat.teach recipe when using a model in the loop.

  1. I first added some categories to a Spanish base model.
  2. Then I trained those categories with some labeled data.
  3. I meant to use this partially trained model to bootstrap my labeling process in Prodigy, but I’m seeing some unexpected behavior: the score Prodigy shows seems to be the score of the first label when I run the example manually through spaCy, not the score of the label I set. Surely this is not the intended output?

I’m launching Prodigy from bash like so:

prodigy textcat.teach test models/output_model input.txt --label busqueda

With that command, I’m seeing this example:

Loading the same model with spaCy, I get this output:

nlp("Compre un generador y no funciona").cats
{'boleta/factura': 0.2774198651313782, 'busqueda': 0.08486541360616684, …}

Notice that the score in Prodigy is the same as the score of the first label in spaCy. This is consistent across all examples.

Thanks


(Matthew Honnibal) #2

Thanks! This was a bug in the meta attribute that displays the score. It doesn’t affect the way the examples are sorted, but it was changing the displayed score, which produced the confusing results. We should have a new point release uploaded soon, which will include the fix.


(Juan Roberto Honorato) #3

That’s great to hear!
Just to be sure, I dug deeper to understand what was going on. The bug seems to come from the TextClassifier model itself when it is asked to predict on an example.

import spacy
from prodigy.models.textcat import TextClassifier

nlp = spacy.load(spacy_model, disable=['ner', 'parser'])
model = TextClassifier(nlp, {'label 1', 'label 2', 'label 3'}, long_text=long_text)
next(model(stream))  # first (score, example) tuple from the stream

The output of the code above is a tuple of the score and the annotated example, as described in the documentation. The problem is that all the examples come back with the same label, even when I pass multiple labels to the model as in the code above.

Is this what you were referring to?
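To check this over more than one example, one could tally the labels of the first few tuples the model yields. This is a minimal sketch, assuming (as described above) that iterating over `model(stream)` yields `(score, example)` tuples whose example dicts carry a `"label"` key; `count_labels` is a hypothetical helper, not part of Prodigy, and the usage below feeds it hand-written tuples instead of a real model.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Tuple


def count_labels(scored_stream: Iterable[Tuple[float, dict]], n: int = 100) -> Counter:
    """Count the "label" values of the first n (score, example) tuples."""
    counts = Counter()
    for _score, eg in islice(scored_stream, n):
        counts[eg["label"]] += 1
    return counts


# Hand-written tuples standing in for model(stream) (hypothetical data)
fake_stream = [
    (0.64, {"text": "a", "label": "busqueda"}),
    (0.28, {"text": "b", "label": "busqueda"}),
    (0.30, {"text": "c", "label": "busqueda"}),
]
print(count_labels(fake_stream))  # Counter({'busqueda': 3})
```

With a real model, something like `count_labels(model(stream), n=50)` would show whether more than one label ever appears. Because this consumes items from the generator, a fresh stream would be needed afterwards.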


(Ines Montani) #4

In your example code, does the model you’re loading have a pre-trained text classifier component? If not, it’s possible that the model only asks about one label first and then adjusts accordingly. The built-in annotation recipes are also designed to focus on one label at a time.

I think what Matt was referring to was a much more superficial mistake: The task’s "meta" property (which is what’s displayed in the UI) wasn’t always overwritten correctly, so the score that was displayed was the score from a different label. This didn’t have an impact on the actual scoring, only on what’s shown in the UI.
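The invariant behind this fix is that the displayed meta score must come from the same label as the task itself. A minimal sketch of that invariant, assuming the task layout shown in the JSON output later in this thread (`label`, `score`, `priority`, `meta`); `make_task` is hypothetical and not Prodigy's actual implementation, and the scores are the ones from the first post.

```python
from typing import Dict


def make_task(text: str, cats: Dict[str, float], label: str) -> dict:
    """Build a hypothetical textcat task for a single label.

    score, priority and the displayed meta score are all read from
    cats[label], so the UI cannot show another label's score.
    """
    score = cats[label]
    return {
        "text": text,
        "label": label,
        "score": score,
        "priority": score,
        # A fresh meta dict per task, rather than one shared/reused object
        "meta": {"score": score},
    }


cats = {"boleta/factura": 0.2774198651313782, "busqueda": 0.08486541360616684}
task = make_task("Compre un generador y no funciona", cats, "busqueda")
print(task["meta"]["score"])  # 0.08486541360616684
```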


(Juan Roberto Honorato) #5

Yes, it does. It has 22 trained classes, some with thousands of examples. Sorry if the generic class names confused the situation; those are definitely not the names of the classes I’m using, nor the actual code I tested. In the code I ran, the class names match.

OK, so what I posted in my follow-up is a different issue, then. I understand that the process is designed to focus on one label at a time, and I intend to use it that way, but I ran the code above just to trace the problem and narrow it down.

Look at the output of the code I pasted earlier:

[
  {
    "text": "Me puede indicar como instalar un reseptaculo para ducha 70×70",
    "_input_hash": -440991298,
    "_task_hash": -712696987,
    "label": "busqueda",
    "score": 0.63885098695755,
    "priority": 0.63885098695755,
    "spans": [],
    "meta": {
      "score": 0.63885098695755
    }
  },
  {
    "text": "Compre un generador y no funciona",
    "_input_hash": -1278129354,
    "_task_hash": -1836282783,
    "label": "busqueda",
    "score": 0.2774198353290558,
    "priority": 0.2774198353290558,
    "spans": [],
    "meta": {
      "score": 0.2774198353290558
    }
  },
  {
    "text": "Hola.. terraza Sao Paolo",
    "_input_hash": -1251212495,
    "_task_hash": -133394796,
    "label": "busqueda",
    "score": 0.2961369454860687,
    "priority": 0.2961369454860687,
    "spans": [],
    "meta": {
      "score": 0.2961369454860687
    }
  }
]

And this is the output when I load the model manually and run the same examples through it:

>>> import spacy
>>> nlp = spacy.load("models/output_model")
>>> nlp("Me puede indicar como instalar un reseptaculo para ducha 70×70").cats
{"boleta/factura": 0.6388508677482605, "busqueda": 0.28845229744911194, "carroDeCompra": 0.26851773262023926, 
"contacto": 0.29197776317596436, "cyberBusqueda": 0.16447336971759796, "cyberStock": 0.07532192021608353, 
"despacho": 0.3997839093208313, "errorCompra": 0.03661337494850159, "errorContacto": 0.19127494096755981, 
"garantia": 0.6039203405380249, "horarioAtencion": 0.038573991507291794, "infoCompra": 0.0891498476266861, 
"mediosPago": 0.2964952290058136, "newsletter": 0.3004630208015442, "oportunidades": 0.08744868636131287, 
"paginaWeb": 0.20932042598724365, "reclamo": 0.2141648828983307, "retiroTienda": 0.2355145961046219, 
"stockOportunidades": 0.3207305073738098, "ubiTienda": 0.08180715888738632}
>>> nlp("Compre un generador y no funciona").cats
{"boleta/factura": 0.2774198651313782, "busqueda": 0.08486541360616684, "carroDeCompra": 0.14844946563243866, 
"contacto": 0.07637190073728561, "cyberBusqueda": 0.12072107195854187, "cyberStock": 0.44110289216041565, 
"despacho": 0.8482178449630737, "errorCompra": 0.12313196063041687, "errorContacto": 0.4583171308040619, 
"garantia": 0.38733163475990295, "horarioAtencion": 0.056911077350378036, "infoCompra": 0.300380140542984, 
"mediosPago": 0.4579319953918457, "newsletter": 0.0994899645447731, "oportunidades": 0.14953915774822235, 
"paginaWeb": 0.3992502987384796, "reclamo": 0.04635193571448326, "retiroTienda": 0.45770877599716187, 
"stockOportunidades": 0.09573712944984436, "ubiTienda": 0.04304426535964012}
>>> nlp("Hola.. terraza Sao Paolo").cats
{"boleta/factura": 0.29613691568374634, "busqueda": 0.1275823414325714, "carroDeCompra": 0.1026281863451004, 
"contacto": 0.14645826816558838, "cyberBusqueda": 0.03767079859972, "cyberStock": 0.04783276841044426, 
"despacho": 0.6191720366477966, "errorCompra": 0.11303283274173737, "errorContacto": 0.4571975767612457, 
"garantia": 0.2942344844341278, "horarioAtencion": 0.02736266516149044, "infoCompra": 0.2167942076921463, 
"mediosPago": 0.1288776397705078, "newsletter": 0.051116943359375, "oportunidades": 0.2680467367172241, 
"paginaWeb": 0.07719093561172485, "reclamo": 0.012597997672855854, "retiroTienda": 0.07658756524324417, 
"stockOportunidades": 0.0920182541012764, "ubiTienda": 0.06460288166999817}

This is the case I described before: the score in the first output matches the score of the very first label ("boleta/factura") in the second output, not the score of "busqueda". As @honnibal said, this applies to the meta value, but it applies to the other keys ("score" and "priority") as well.
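The comparison can be made mechanical: for each task, look up which spaCy label has a score equal to the task's score. This is a sketch using the numbers posted above, with the spaCy `cats` trimmed to four of the labels; since the two outputs differ only in the trailing digits, it compares with a small tolerance. `matching_labels` and the `1e-6` tolerance are illustrative choices, not anything from Prodigy or spaCy.

```python
from typing import Dict, List

# Task scores from the first output (Prodigy)
tasks = [
    {"text": "Me puede indicar como instalar un reseptaculo para ducha 70×70",
     "label": "busqueda", "score": 0.63885098695755},
    {"text": "Compre un generador y no funciona",
     "label": "busqueda", "score": 0.2774198353290558},
    {"text": "Hola.. terraza Sao Paolo",
     "label": "busqueda", "score": 0.2961369454860687},
]

# spaCy doc.cats from the second output, trimmed to four labels
cats_by_text = {
    tasks[0]["text"]: {"boleta/factura": 0.6388508677482605, "busqueda": 0.28845229744911194,
                       "despacho": 0.3997839093208313, "garantia": 0.6039203405380249},
    tasks[1]["text"]: {"boleta/factura": 0.2774198651313782, "busqueda": 0.08486541360616684,
                       "despacho": 0.8482178449630737, "garantia": 0.38733163475990295},
    tasks[2]["text"]: {"boleta/factura": 0.29613691568374634, "busqueda": 0.1275823414325714,
                       "despacho": 0.6191720366477966, "garantia": 0.2942344844341278},
}


def matching_labels(score: float, cats: Dict[str, float], tol: float = 1e-6) -> List[str]:
    """Return the labels whose spaCy score is within tol of the task score."""
    return [label for label, value in cats.items() if abs(value - score) <= tol]


for task in tasks:
    cats = cats_by_text[task["text"]]
    # Task label, spaCy score for that label, labels whose score matches the task score
    print(task["label"], cats[task["label"]], matching_labels(task["score"], cats))
```

In this data every task score matches "boleta/factura" and none matches the spaCy score for "busqueda", which is the pattern described above.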