KeyError: 'label' in Prodigy 1.8

Hi, I’m running textcat.batch-train and get the following error output:

PS C:\Users\nlp\data> python -m prodigy textcat.batch-train net_web_terms en_core_web_lg --output model -es 0.2

Loaded model en_core_web_lg
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Program Files\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python37\lib\site-packages\prodigy\__main__.py", line 380, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src\prodigy\core.pyx", line 212, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "C:\Program Files\Python37\lib\site-packages\plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "C:\Program Files\Python37\lib\site-packages\plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\prodigy\recipes\textcat.py", line 205, in batch_train
    examples = convert_options_to_cats(examples)
  File "cython_src\prodigy\components\preprocess.pyx", line 265, in prodigy.components.preprocess.convert_options_to_cats
KeyError: 'label'

Hi! What’s in your net_web_terms dataset you’re training from? From the error, it looks like there might be an example in there that doesn’t have a "label" key (e.g. the annotated label), and also isn’t a multiple-choice example. Maybe you accidentally added the annotations to a dataset that already had examples from a different task in there?

You can export your dataset using the db-out command and then inspect the JSONL file in an editor.

Thanks for the prompt reply! After the export, I counted the number of “label” keys and it was equal to the number of lines. Below are 2 lines from the json file as a sample. Anything else I could try?

{"text":"A vulnerability in the web-based management interface of Cisco Webex Events Center, Cisco Webex Meeting Center, Cisco Webex Support Center, and Cisco Webex Training Center could allow an unauthenticated, remote attacker to conduct a cross-site scripting (XSS) attack against a user of the web-based management interface of the affected service.","meta":{"source":"CVE-2018-15436","url":"http://www.ibm.com/support/docview.wss?uid=swg22016346","score":0.9962189908},"_input_hash":578901348,"_task_hash":-1247858894,"tokens":[{"text":"A","start":0,"end":1,"id":0},{"text":"vulnerability","start":2,"end":15,"id":1},{"text":"in","start":16,"end":18,"id":2},{"text":"the","start":19,"end":22,"id":3},{"text":"web","start":23,"end":26,"id":4},{"text":"-","start":26,"end":27,"id":5},{"text":"based","start":27,"end":32,"id":6},{"text":"management","start":33,"end":43,"id":7},{"text":"interface","start":44,"end":53,"id":8},{"text":"of","start":54,"end":56,"id":9},{"text":"Cisco","start":57,"end":62,"id":10},{"text":"Webex","start":63,"end":68,"id":11},{"text":"Events","start":69,"end":75,"id":12},{"text":"Center","start":76,"end":82,"id":13},{"text":",","start":82,"end":83,"id":14},{"text":"Cisco","start":84,"end":89,"id":15},{"text":"Webex","start":90,"end":95,"id":16},{"text":"Meeting","start":96,"end":103,"id":17},{"text":"Center","start":104,"end":110,"id":18},{"text":",","start":110,"end":111,"id":19},{"text":"Cisco","start":112,"end":117,"id":20},{"text":"Webex","start":118,"end":123,"id":21},{"text":"Support","start":124,"end":131,"id":22},{"text":"Center","start":132,"end":138,"id":23},{"text":",","start":138,"end":139,"id":24},{"text":"and","start":140,"end":143,"id":25},{"text":"Cisco","start":144,"end":149,"id":26},{"text":"Webex","start":150,"end":155,"id":27},{"text":"Training","start":156,"end":164,"id":28},{"text":"Center","start":165,"end":171,"id":29},{"text":"could","start":172,"end":177,"id":30},{"text":"allow","start":178,"end":183,"id":31},{"text":"an","start":184,"end":186,"id":32},{"text":"unauthenticated","start":187,"end":202,"id":33},{"text":",","start":202,"end":203,"id":34},{"text":"remote","start":204,"end":210,"id":35},{"text":"attacker","start":211,"end":219,"id":36},{"text":"to","start":220,"end":222,"id":37},{"text":"conduct","start":223,"end":230,"id":38},{"text":"a","start":231,"end":232,"id":39},{"text":"cross","start":233,"end":238,"id":40},{"text":"-","start":238,"end":239,"id":41},{"text":"site","start":239,"end":243,"id":42},{"text":"scripting","start":244,"end":253,"id":43},{"text":"(","start":254,"end":255,"id":44},{"text":"XSS","start":255,"end":258,"id":45},{"text":")","start":258,"end":259,"id":46},{"text":"attack","start":260,"end":266,"id":47},{"text":"against","start":267,"end":274,"id":48},{"text":"a","start":275,"end":276,"id":49},{"text":"user","start":277,"end":281,"id":50},{"text":"of","start":282,"end":284,"id":51},{"text":"the","start":285,"end":288,"id":52},{"text":"web","start":289,"end":292,"id":53},{"text":"-","start":292,"end":293,"id":54},{"text":"based","start":293,"end":298,"id":55},{"text":"management","start":299,"end":309,"id":56},{"text":"interface","start":310,"end":319,"id":57},{"text":"of","start":320,"end":322,"id":58},{"text":"the","start":323,"end":326,"id":59},{"text":"affected","start":327,"end":335,"id":60},{"text":"service","start":336,"end":343,"id":61},{"text":".","start":343,"end":344,"id":62}],"spans":[{"start":63,"end":68,"text":"Webex","rank":0,"label":"NETPORT_WEB","score":0.9962189908,"source":"en_core_web_lg","input_hash":578901348}],"answer":"accept"}
{"text":"An attacker could exploit this vulnerability by gaining local access to a system running Microsoft Windows and protected by Cisco Immunet or Cisco AMP for Endpoints and executing a malicious file.","meta":{"source":"CVE-2018-15437","url":"http://www.ibm.com/support/docview.wss?uid=swg22016346","score":0.9845065568},"_input_hash":-1980660223,"_task_hash":614246028,"tokens":[{"text":"An","start":0,"end":2,"id":0},{"text":"attacker","start":3,"end":11,"id":1},{"text":"could","start":12,"end":17,"id":2},{"text":"exploit","start":18,"end":25,"id":3},{"text":"this","start":26,"end":30,"id":4},{"text":"vulnerability","start":31,"end":44,"id":5},{"text":"by","start":45,"end":47,"id":6},{"text":"gaining","start":48,"end":55,"id":7},{"text":"local","start":56,"end":61,"id":8},{"text":"access","start":62,"end":68,"id":9},{"text":"to","start":69,"end":71,"id":10},{"text":"a","start":72,"end":73,"id":11},{"text":"system","start":74,"end":80,"id":12},{"text":"running","start":81,"end":88,"id":13},{"text":"Microsoft","start":89,"end":98,"id":14},{"text":"Windows","start":99,"end":106,"id":15},{"text":"and","start":107,"end":110,"id":16},{"text":"protected","start":111,"end":120,"id":17},{"text":"by","start":121,"end":123,"id":18},{"text":"Cisco","start":124,"end":129,"id":19},{"text":"Immunet","start":130,"end":137,"id":20},{"text":"or","start":138,"end":140,"id":21},{"text":"Cisco","start":141,"end":146,"id":22},{"text":"AMP","start":147,"end":150,"id":23},{"text":"for","start":151,"end":154,"id":24},{"text":"Endpoints","start":155,"end":164,"id":25},{"text":"and","start":165,"end":168,"id":26},{"text":"executing","start":169,"end":178,"id":27},{"text":"a","start":179,"end":180,"id":28},{"text":"malicious","start":181,"end":190,"id":29},{"text":"file","start":191,"end":195,"id":30},{"text":".","start":195,"end":196,"id":31}],"spans":[{"start":147,"end":150,"text":"AMP","rank":0,"label":"NETPORT_WEB","score":0.9845065568,"source":"en_core_web_lg","input_hash":-1980660223}],"answer":"reject"}

Thanks, that’s good to know and definitely eliminates the most common source of the problem!

Which version of Prodigy are you using? (You can check by running the prodigy stats command.) The latest release is v1.8.2, so if you’re not on that, could you try upgrading and see if the problem still occurs? You can download the latest version via the personal download link you received in your first confirmation email.

Thank you. It is the same error after upgrading to 1.8.2. If you like, I can send you my jsonl file.

Feel free to email it over, but I think I found the problem – sorry, I also only searched for "label" in your data, but didn’t read the examples properly.

If you run textcat.batch-train, it expects one top-level "label" key on each example, describing the text category. This is also the data that will be created by any of the textcat recipes.
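
If it helps, here is a minimal Python sketch (plain standard library, not part of Prodigy) for checking this on an exported file. It separates examples that have "label" as a top-level key from those that don't, which a plain text search for "label" can't tell apart. The function name and the two shortened sample records are made up for illustration.

```python
import json


def split_by_top_level_label(lines):
    """Partition JSONL lines by whether the example has a top-level "label" key."""
    with_label, without_label = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        example = json.loads(line)
        # Only keys of the outermost object count; a "label" nested
        # inside "spans" is not seen by this check.
        if "label" in example:
            with_label.append(example)
        else:
            without_label.append(example)
    return with_label, without_label


# Shortened, hypothetical records: the first only has a label inside "spans",
# the second has the top-level label.
sample = [
    '{"text": "Cisco Webex", "spans": [{"start": 6, "end": 11, "label": "NETPORT_WEB"}], "answer": "accept"}',
    '{"text": "Cisco Webex", "label": "NETPORT_WEB", "answer": "accept"}',
]

ok, missing = split_by_top_level_label(sample)
print(len(ok), "with a top-level label,", len(missing), "without")
```

To run it on a real export, you could pass an open file handle instead of the sample list, since iterating over a file yields one line at a time.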

However, your data seems to contain named entity annotations? A "label" occurs, but only in the "spans" (highlighted entities in the text). So the text classifier is confused and doesn’t know what to do with it, because it expects text classification annotations.

If you meant to train an NER model, you probably want to run ner.batch-train instead? Alternatively, if you created NER annotations, but you want to use them to train a text classifier, you could write a little conversion script that looks at the annotated spans and creates examples with one top-level label that should apply to the whole text.
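
A conversion script along those lines could look roughly like the sketch below. It is only an illustration and rests on an assumption you'd need to check for your own data: that a text should get a category whenever an entity with that label was annotated in it, with the original "answer" carried over unchanged. The function name and the shortened sample record are hypothetical.

```python
import json


def spans_to_textcat(example):
    """Yield one example with a top-level "label" per distinct span label.

    Assumption: the span label doubles as the text category, and the
    accept/reject answer given for the span applies to the whole text.
    """
    labels = {span["label"] for span in example.get("spans", []) if "label" in span}
    for label in sorted(labels):
        yield {"text": example["text"], "label": label, "answer": example["answer"]}


# Shortened, hypothetical NER-style record.
ner_example = {
    "text": "protected by Cisco Immunet or Cisco AMP for Endpoints",
    "spans": [{"start": 36, "end": 39, "text": "AMP", "label": "NETPORT_WEB"}],
    "answer": "reject",
}

converted = list(spans_to_textcat(ner_example))
for record in converted:
    print(json.dumps(record))  # one JSON object per line, i.e. JSONL
```

The output could then be saved as a JSONL file and imported into a separate, new dataset, so the text classification examples don't get mixed with the NER annotations again.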

OK, yes. I’m afraid I did a little of both ner and textcat teaching. ner.batch-train worked! Thanks again!
