KeyError: 'label' when running textcat.teach

(Lez) #1

Hello,
I’m running my first textcat.teach recipe. When progress reached 50% (edit: I later found this occurred at any percentage, unpredictably), I got the following chain of errors and was also unable to save from the UI:

19:12:16 - Exception when serving /give_answers
Traceback (most recent call last):
  File "C:\Users\User\Anaconda3\lib\site-packages\waitress\channel.py", line 338, in service
    task.service()
  File "C:\Users\User\Anaconda3\lib\site-packages\waitress\task.py", line 169, in service
    self.execute()
  File "C:\Users\User\Anaconda3\lib\site-packages\waitress\task.py", line 399, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "hug\api.py", line 424, in hug.api.ModuleSingleton.__call__.api_auto_instantiate
  File "C:\Users\User\Anaconda3\lib\site-packages\falcon\api.py", line 244, in __call__
    responder(req, resp, **params)
  File "hug\interface.py", line 734, in hug.interface.HTTP.__call__
  File "hug\interface.py", line 709, in hug.interface.HTTP.__call__
  File "hug\interface.py", line 649, in hug.interface.HTTP.call_function
  File "hug\interface.py", line 100, in hug.interface.Interfaces.__call__
  File "C:\Users\User\Anaconda3\lib\site-packages\prodigy\app.py", line 101, in give_answers
    controller.receive_answers(answers)
  File "cython_src\prodigy\core.pyx", line 98, in prodigy.core.Controller.receive_answers
  File "cython_src\prodigy\util.pyx", line 277, in prodigy.util.combine_models.update
  File "cython_src\prodigy\models\textcat.pyx", line 169, in prodigy.models.textcat.TextClassifier.update
KeyError: 'label'

Any ideas?

Thanks


(Ines Montani) #2

Thanks for the report – this is very strange! For some reason, an annotated example that came back from the web app seems not to have a 'label' key – which should never happen in text classification mode. Did you notice anything strange in the UI, like a question without a label? And what did your textcat.teach command look like?

If you still have your browser with Prodigy open, your latest batch of annotations will still be there (if a request fails, the app moves the batch back to the “outbox” and tries again next time). So you could stop the server and add a print statement to the give_answers function in prodigy/app.py that outputs the batch of examples Prodigy receives back:

def give_answers(answers=[]):
    print(answers)  # show the raw batch received back from the web app
    # etc.

You can then restart the server and hit the “save” button again (don’t reload the browser, though!). You’ll likely still hit the same error when the annotations reach the text classification model, but the /give_answers endpoint should have printed the batch it received back. This will hopefully make it easier to debug what’s going on!


(Lez) #3

Thanks Ines,

Unfortunately, printing answers didn’t reveal anything useful. I wondered whether the problem might be due to the default spaCy model (en_core_web_sm) being too, erm, small for my use case. Switching to en_core_web_lg instead appears to have resolved the problem, although I don’t know why.
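For reference, the only thing I changed was the model argument in the command – something like this (dataset, file and label names here are placeholders):

prodigy textcat.teach my_dataset en_core_web_lg my_texts.jsonl --label MY_LABEL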

Cheers

Lez


(Shahar Weinstock) #4

Hi,
I just started using the new version of textcat.teach that supports patterns instead of seeds.
I entered some seed terms and used terms.to-patterns to create a patterns file.
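The commands were roughly these (dataset, file and label names are placeholders):

prodigy terms.to-patterns my_seed_terms my_patterns.jsonl --label MY_LABEL
prodigy textcat.teach my_dataset en_core_web_sm my_texts.jsonl --label MY_LABEL --patterns my_patterns.jsonl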

When I use textcat.teach, the annotation goes well until I get some examples that come from the pattern matcher. After I annotate one of those examples, the annotations can’t be saved and I get this notification in the UI:
[screenshot: error notification in the UI]

And this is what I see in my shell:

06:20:41 - Exception when serving /give_answers
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.6/site-packages/waitress/channel.py", line 338, in service
    task.service()
  File "/opt/anaconda3/lib/python3.6/site-packages/waitress/task.py", line 169, in service
    self.execute()
  File "/opt/anaconda3/lib/python3.6/site-packages/waitress/task.py", line 399, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "hug/api.py", line 424, in hug.api.ModuleSingleton.__call__.api_auto_instantiate
  File "/opt/anaconda3/lib/python3.6/site-packages/falcon/api.py", line 244, in __call__
    responder(req, resp, **params)
  File "hug/interface.py", line 734, in hug.interface.HTTP.__call__
  File "hug/interface.py", line 709, in hug.interface.HTTP.__call__
  File "hug/interface.py", line 649, in hug.interface.HTTP.call_function
  File "hug/interface.py", line 100, in hug.interface.Interfaces.__call__
  File "/opt/anaconda3/lib/python3.6/site-packages/prodigy/app.py", line 101, in give_answers
    controller.receive_answers(answers)
  File "cython_src/prodigy/core.pyx", line 98, in prodigy.core.Controller.receive_answers
  File "cython_src/prodigy/util.pyx", line 277, in prodigy.util.combine_models.update
  File "cython_src/prodigy/models/textcat.pyx", line 167, in prodigy.models.textcat.TextClassifier.update
KeyError: 'label'

When I end the session I do get a message that all my annotations were saved.

I’m pretty sure this has something to do with the patterns, but I don’t know what I’m doing wrong…


(Ines Montani) #5

Thanks! Something similar was already reported in this thread and it’s been very mysterious. Technically, annotations in textcat.teach should always have an added label, so the update callback should never receive tasks without one.

But now that you mention the patterns, this might actually lead us closer to the problem and an explanation. Did you notice anything strange in the UI, like a task without a label? And could you check your patterns.jsonl file and make sure that all entries have a "label" assigned?
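For reference, every line in the patterns file should be a JSON object with both a "label" and a "pattern" – for example (the label and patterns here are made up):

{"label": "MY_LABEL", "pattern": [{"lower": "amazing"}]}
{"label": "MY_LABEL", "pattern": "very cool"}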

Btw, if you still have your browser open, your last batch of annotations won’t be lost. You can simply exit the Prodigy server and restart it with a different, non-active-learning recipe, and then hit “save” again in the web app. (Just don’t reload the browser!) Restarting the server has no effect as long as the web app doesn’t make any requests – but it’ll make the /give_answers endpoint available again, which lets you save your progress to the dataset. (You could also edit the app.py as I describe above.)
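For example, something like the built-in mark recipe would do here (the recipe choice and names are just an example):

prodigy mark my_dataset my_texts.jsonl --view-id classification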


(Shahar Weinstock) #6

No. The examples that came from patterns looked like a NER task, but I guess that’s by design, right?

I checked, they all have a label.

Thanks!


(Ines Montani) #7

Thanks for checking! I think I know what the problem might be – if my suspicion is correct, we should definitely be able to push a fix for this in v1.4.1!

Yes, the span of text that was matched should be highlighted, to show you why the example was selected. However, the task should still have a text classification-style label at the top – which I assume might have been missing?

Update: Found the cause of the issue and fixed it. The fix will be included in the upcoming v1.4.1 – sorry again for the inconvenience here.

Forgot to add: here’s a quick workaround in the meantime – the solution is very simple. Essentially, you check each example in the stream, and if it has spans (i.e. it was generated by a pattern), you take the first span’s label and add it to the example as the top-level "label":

def fix_labels(stream):
    for eg in stream:
        spans = eg.get('spans')
        if spans:
            # copy the matched span's label up to the task level
            eg['label'] = spans[0]['label']
        yield eg

stream = prefer_uncertain(predict(stream))
stream = fix_labels(stream)

(Ines Montani) #8

Just released v1.4.1 which includes an update to the pattern matcher that fixes this problem. The PatternMatcher class can take two additional keyword arguments:

| Argument | Type | Description |
| --- | --- | --- |
| label_span | bool | If True, the matched label will be added to the highlighted span – if False, the span will be highlighted without a label. |
| label_task | bool | If True, the matched label will be added to the "label" of the annotation task. If False, no label will be added to the task. |

By default, textcat recipes set no label on the span but a label on the task, and ner recipes set no label on the task but a label on the matched span.
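If you’re writing a custom recipe, you can set those flags yourself when creating the matcher – a minimal sketch (the exact import path and the from_disk loading shown here are assumptions, so double-check against your version):

from prodigy.models.matcher import PatternMatcher

# add the matched label to the task, but not to the highlighted span
# (the textcat default described above)
matcher = PatternMatcher(nlp, label_span=False, label_task=True)
matcher = matcher.from_disk('my_patterns.jsonl')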


(Chris Cunningham) #9

I’m seeing a recurrence of the issue in v1.4.0. The only thing I noticed was that one result tagged the text category as though it were NER; otherwise the category appears as the header.


(Ines Montani) #10

Ah, just fixed a typo in my post above – I meant v1.4.1! (v1.4.0 was the version that introduced the new pattern matcher for textcat.teach.)


(Chris Cunningham) #11

When will v1.4.1 be released? And is the workaround above meant to go in the textcat.teach recipe?


(Ines Montani) #12

v1.4.1 is the current version of Prodigy and was released on March 26 – see here for the changelog:

(And yes, the workaround for v1.4.0 which I describe above is supposed to wrap the stream in textcat.teach.)