KeyError: 'label' when running textcat.teach

(Lez) #1

Hello,
I’m running my first textcat.teach recipe. When progress reached 50% (edit: I later found this occurred at any percentage, unpredictably), I got the following chain of errors and was also unable to save from the UI:

19:12:16 - Exception when serving /give_answers
Traceback (most recent call last):
  File "C:\Users\User\Anaconda3\lib\site-packages\waitress\channel.py", line 338, in service
    task.service()
  File "C:\Users\User\Anaconda3\lib\site-packages\waitress\task.py", line 169, in service
    self.execute()
  File "C:\Users\User\Anaconda3\lib\site-packages\waitress\task.py", line 399, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "hug\api.py", line 424, in hug.api.ModuleSingleton.__call__.api_auto_instantiate
  File "C:\Users\User\Anaconda3\lib\site-packages\falcon\api.py", line 244, in __call__
    responder(req, resp, **params)
  File "hug\interface.py", line 734, in hug.interface.HTTP.__call__
  File "hug\interface.py", line 709, in hug.interface.HTTP.__call__
  File "hug\interface.py", line 649, in hug.interface.HTTP.call_function
  File "hug\interface.py", line 100, in hug.interface.Interfaces.__call__
  File "C:\Users\User\Anaconda3\lib\site-packages\prodigy\app.py", line 101, in give_answers
    controller.receive_answers(answers)
  File "cython_src\prodigy\core.pyx", line 98, in prodigy.core.Controller.receive_answers
  File "cython_src\prodigy\util.pyx", line 277, in prodigy.util.combine_models.update
  File "cython_src\prodigy\models\textcat.pyx", line 169, in prodigy.models.textcat.TextClassifier.update
KeyError: 'label'

Any ideas?

Thanks


(Ines Montani) #2

Thanks for the report – this is very strange! For some reason, an annotated example that came back from the web app seems not to have a 'label' key – which should never happen in text classification mode. Did you notice anything strange in the UI, like a question without a label? And what did your textcat.teach command look like?

If you still have your browser with Prodigy open, your latest batch of annotations will still be there (if a request fails, the app moves the batch back to the “outbox” and tries again next time). So you could stop the server and add a print statement to the give_answers function in prodigy/app.py that outputs the batch of examples Prodigy receives back:

def give_answers(answers=[]):
    print(answers)  # show the raw batch received back from the web app
    # etc.

You can then restart the server and hit the “save” button again (don’t reload the browser, though!). You’ll likely still hit the same error when the annotations reach the text classification model, but the /give_answers endpoint should have printed the batch it received back. This will hopefully make it easier to debug what’s going on!


(Lez) #3

Thanks Ines,

Unfortunately, printing answers didn’t reveal anything useful. I wondered whether the problem might be due to the default spaCy model (en_core_web_sm) being too, erm, small for my use case. Switching to en_core_web_lg instead appears to have resolved the problem, although I don’t know why.
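For reference, the only thing I changed was the model argument in the command – something like this (dataset, file and label names here are placeholders):

prodigy textcat.teach my_dataset en_core_web_lg my_texts.jsonl --label MY_LABEL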

Cheers

Lez


(Shahar Weinstock) #4

Hi,
I just started using the new version of textcat.teach that supports patterns instead of seeds.
I entered some seed terms and used terms.to-patterns to create a patterns file.
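The commands were roughly these (dataset, file and label names are placeholders):

prodigy terms.to-patterns my_seed_terms my_patterns.jsonl --label MY_LABEL
prodigy textcat.teach my_dataset en_core_web_sm my_texts.jsonl --label MY_LABEL --patterns my_patterns.jsonl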

When I use textcat.teach, the annotation goes well until I get some examples that come from the pattern matcher. After I annotate one of those examples, the annotations can’t be saved and I get this notification in the UI:
[screenshot: error notification in the UI]

And this is what I see in my shell:

06:20:41 - Exception when serving /give_answers
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.6/site-packages/waitress/channel.py", line 338, in service
    task.service()
  File "/opt/anaconda3/lib/python3.6/site-packages/waitress/task.py", line 169, in service
    self.execute()
  File "/opt/anaconda3/lib/python3.6/site-packages/waitress/task.py", line 399, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "hug/api.py", line 424, in hug.api.ModuleSingleton.__call__.api_auto_instantiate
  File "/opt/anaconda3/lib/python3.6/site-packages/falcon/api.py", line 244, in __call__
    responder(req, resp, **params)
  File "hug/interface.py", line 734, in hug.interface.HTTP.__call__
  File "hug/interface.py", line 709, in hug.interface.HTTP.__call__
  File "hug/interface.py", line 649, in hug.interface.HTTP.call_function
  File "hug/interface.py", line 100, in hug.interface.Interfaces.__call__
  File "/opt/anaconda3/lib/python3.6/site-packages/prodigy/app.py", line 101, in give_answers
    controller.receive_answers(answers)
  File "cython_src/prodigy/core.pyx", line 98, in prodigy.core.Controller.receive_answers
  File "cython_src/prodigy/util.pyx", line 277, in prodigy.util.combine_models.update
  File "cython_src/prodigy/models/textcat.pyx", line 167, in prodigy.models.textcat.TextClassifier.update
KeyError: 'label'

When I end the session I do get a message that all my annotations were saved.

I’m pretty sure this has something to do with the patterns, but I don’t know what I’m doing wrong…


(Ines Montani) #5

Thanks! Something similar was already reported in this thread and it’s been very mysterious. Technically, annotations in textcat.teach should always have an added label, so the update callback should never receive tasks without one.

But now that you mention the patterns, this might actually lead us closer to the problem and an explanation. Did you notice anything strange in the UI, like a task without a label? And could you check your patterns.jsonl file and make sure that all entries have a "label" assigned?
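For reference, every line in the patterns file should be a JSON object with both a "label" and a "pattern" – for example (the label and patterns here are made up):

{"label": "MY_LABEL", "pattern": [{"lower": "amazing"}]}
{"label": "MY_LABEL", "pattern": "very cool"}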

Btw, if you still have your browser open, your last batch of annotations won’t be lost. You can simply exit the Prodigy server and restart it with a different, non-active-learning recipe, and then hit “save” again in the web app. (Just don’t reload the browser!) Restarting the server has no effect as long as the web app doesn’t make any requests – but it’ll make the /give_answers endpoint available again, which lets you save your progress to the dataset. (You could also edit the app.py as I describe above.)
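For example, something like the built-in mark recipe would do here (the recipe choice and names are just an example):

prodigy mark my_dataset my_texts.jsonl --view-id classification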


(Shahar Weinstock) #6

No. The examples that came from patterns looked like a NER task, but I guess that’s by design, right?

I checked, they all have a label.

Thanks!


(Ines Montani) #7

Thanks for checking! I think I know what the problem might be – if my suspicion is correct, we should definitely be able to push a fix for this in v1.4.1!

Yes, the span of text that was matched should be highlighted, to show you why the example was selected. However, the task should still have a text classification-style label at the top – which I assume might have been missing?

Update: Found the cause of the issue and fixed it. The fix will be included in the upcoming v1.4.1 – sorry again for the inconvenience here.

Forgot to add: here’s a quick workaround in the meantime – the solution is very simple. Essentially, you check each example in the stream, and if it has spans (i.e. it was generated by a pattern), you take the first span’s label and add it to the example as the top-level "label":

def fix_labels(stream):
    for eg in stream:
        spans = eg.get('spans')
        if spans:
            # copy the matched span's label up to the task level
            eg['label'] = spans[0]['label']
        yield eg

stream = prefer_uncertain(predict(stream))
stream = fix_labels(stream)

(Ines Montani) #8

Just released v1.4.1 which includes an update to the pattern matcher that fixes this problem. The PatternMatcher class can take two additional keyword arguments:

| Argument | Type | Description |
| --- | --- | --- |
| label_span | bool | If True, the matched label will be added to the highlighted span – if False, the span will be highlighted without a label. |
| label_task | bool | If True, the matched label will be added to the "label" of the annotation task. If False, no label will be added to the task. |

By default, textcat recipes set no label on the span but a label on the task, and ner recipes set no label on the task but a label on the matched span.
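If you’re writing a custom recipe, you can set those flags yourself when creating the matcher – a minimal sketch (the exact import path and the from_disk loading shown here are assumptions, so double-check against your version):

from prodigy.models.matcher import PatternMatcher

# add the matched label to the task, but not to the highlighted span
# (the textcat default described above)
matcher = PatternMatcher(nlp, label_span=False, label_task=True)
matcher = matcher.from_disk('my_patterns.jsonl')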


(Chris Cunningham) #9

I’m seeing a recurrence of the issue in v1.4.0. The only thing I noticed was that one result tagged the text category as though it were NER; otherwise the category appears as the header.


(Ines Montani) #10

Ah, just fixed a typo in my post above – I meant v1.4.1! (v1.4.0 was the version that introduced the new pattern matcher for textcat.teach.)


(Chris Cunningham) #11

When will v1.4.1 be released? And is the workaround above meant to go in the textcat.teach recipe?


(Ines Montani) #12

v1.4.1 is the current version of Prodigy and was released on March 26 – see here for the changelog:

(And yes, the workaround for v1.4.0 which I describe above is supposed to wrap the stream in textcat.teach.)