KeyError: 'label' Error with Prodigy 1.10.7

Hello,

I am trying to train a grammar tool for certain ungrammaticality patterns for English in a similar way to Training a grammar tool - #2 by ines.

Here are a few examples from the labeled dataset in json format (after labeling in Prodigy):

{"text":"Energy Australia will do practically all the work","label":"BAD_GRAMMAR","_input_hash":-1212092456,"_task_hash":-510335938,"_session_id":null,"_view_id":"classification","answer":"reject"}
{"text":"And they are there for there customers","label":"BAD_GRAMMAR","_input_hash":-1323887238,"_task_hash":1000448416,"_session_id":null,"_view_id":"classification","answer":"accept"}

Each example/line has the "BAD_GRAMMAR" label.

Below is the command for training a model with 'prodigy train textcat':

!python -m prodigy train textcat new_set ./Desktop/Retraining_POS_Tagger/tagger_model_3 --output ./Desktop/Grammaticality_Classifier/grammaticality_model --eval-id test_dataset -TE

However, I get the following error message:

"""
✔ Loaded model './Desktop/Retraining_POS_Tagger/tagger_model_3'
Traceback (most recent call last):
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/runpy.py", line 193, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/atakanince/groupsolver_env/lib/python3.8/site-packages/prodigy/main.py", line 53, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 321, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/Users/atakanince/groupsolver_env/lib/python3.8/site-packages/plac_core.py", line 367, in call
cmd, result = parser.consume(arglist)
File "/Users/atakanince/groupsolver_env/lib/python3.8/site-packages/plac_core.py", line 232, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/Users/atakanince/groupsolver_env/lib/python3.8/site-packages/prodigy/recipes/train.py", line 103, in train
data, labels = merge_data(nlp, **merge_cfg)
File "/Users/atakanince/groupsolver_env/lib/python3.8/site-packages/prodigy/recipes/train.py", line 402, in merge_data
for eg in convert_options_to_cats(textcat_validated, exclusive=textcat_exclusive):
File "cython_src/prodigy/components/preprocess.pyx", line 353, in prodigy.components.preprocess.convert_options_to_cats
KeyError: 'label'
"""

I have no idea what's wrong. Help would be much appreciated.

Best,
-Atakan

Hm, the data sample you posted seems OK to me.

To double check, I loaded those two examples in a custom db db_4097 and ran

prodigy train textcat db_4097 blank:en --eval-id db_4097

which ran without issue.

Have you double-checked the format of your eval set? Are you 100% sure there's a label annotation in each and every example?

Or could you provide a longer data sample (both for train and test) that helps me reproduce the issue?
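If it helps, a check like the following can catch a missing "label" key quickly (a minimal sketch; the helper name and file path are just illustrative):

```python
import json

def missing_label_lines(lines):
    """Return the 1-based indices of JSONL lines without a 'label' key."""
    bad = []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if "label" not in json.loads(line):
            bad.append(i)
    return bad

# Usage (file name is illustrative):
# with open("eval.jsonl", encoding="utf8") as f:
#     print(missing_label_lines(f))
```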

Hi Sofie,

Thank you for your prompt response. Here is the eval dataset:

"""
{"text":"There products are good.","label":"BAD_GRAMMAR","_input_hash":-1645639406,"_task_hash":-1410066889,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"There great products are awesome.","label":"BAD_GRAMMAR","_input_hash":-819965774,"_task_hash":-1994949002,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"There is a shop around the corner.","label":"BAD_GRAMMAR","_input_hash":-471097359,"_task_hash":-437723881,"_session_id":null,"_view_id":"classification","answer":"reject"}
{"text":"Are there elections soon?","label":"BAD_GRAMMAR","_input_hash":836169608,"_task_hash":1490833088,"_session_id":null,"_view_id":"classification","answer":"reject"}
{"text":"There product is high quality.","label":"BAD_GRAMMAR","_input_hash":715642260,"_task_hash":746544323,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Is there product high quality?","label":"BAD_GRAMMAR","_input_hash":1034366084,"_task_hash":1326463427,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"There life is better with us.","label":"BAD_GRAMMAR","_input_hash":-34810624,"_task_hash":1149767129,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Is there life outside?","label":"BAD_GRAMMAR","_input_hash":855489025,"_task_hash":676857513,"_session_id":null,"_view_id":"classification","answer":"reject"}

"""

Here is some more data from the train dataset:

"""
{"text":"AOPA & EAA both cost a fraction of the annual dues at NBAA yet they are super interactive with there members and it is easy to know what issues they are working on.","label":"BAD_GRAMMAR","_input_hash":-996205124,"_task_hash":-1132416227,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Again freedom of speech everyone has that right to speak there mind","label":"BAD_GRAMMAR","_input_hash":1644417985,"_task_hash":-1304798905,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Alota people lost there jobs prices of food going up petrol","label":"BAD_GRAMMAR","_input_hash":36461726,"_task_hash":-303123952,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"I know you guys making alota money","label":"BAD_GRAMMAR","_input_hash":-2019791434,"_task_hash":-227280561,"_session_id":null,"_view_id":"classification","answer":"reject"}
{"text":"But help the poor people","label":"BAD_GRAMMAR","_input_hash":-1057632978,"_task_hash":-410494013,"_session_id":null,"_view_id":"classification","answer":"reject"}
{"text":"Are there precautions in place? Do you sanitized after each patient?","label":"BAD_GRAMMAR","_input_hash":660409566,"_task_hash":-1666602451,"_session_id":null,"_view_id":"classification","answer":"reject"}
{"text":"Australia is nice country there environment there sense of people","label":"BAD_GRAMMAR","_input_hash":738950169,"_task_hash":572933876,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Australia such as beautiful country there many people so kindness","label":"BAD_GRAMMAR","_input_hash":1370149089,"_task_hash":1863762462,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Bcz it is good to study in different country u can know traditional and there cultures","label":"BAD_GRAMMAR","_input_hash":-2024846060,"_task_hash":2102297053,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"And also food is delicious","label":"BAD_GRAMMAR","_input_hash":-458338878,"_task_hash":2092736225,"_session_id":null,"_view_id":"classification","answer":"ignore"}
{"text":"Because it is a matter of freedon of speech. Like it or not everyone should be allowed to speak there opinions through social media. Thats what it is there for","label":"BAD_GRAMMAR","_input_hash":-1574744707,"_task_hash":1059412996,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Because there juice is sweet and really good","label":"BAD_GRAMMAR","_input_hash":-1390572401,"_task_hash":1896111745,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Because they lie on anything that they push for there political agenda","label":"BAD_GRAMMAR","_input_hash":1708305497,"_task_hash":-354000252,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Been there many times but prefer Tenerife.","label":"BAD_GRAMMAR","_input_hash":132587129,"_task_hash":-144354063,"_session_id":null,"_view_id":"classification","answer":"reject"}
{"text":"Being atound other people and they don't want to cover there face or stay 6 feet away","label":"BAD_GRAMMAR","_input_hash":1781422388,"_task_hash":-27939727,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"But can't wait to get a lunch lunch or lunch tomorrow or tomorrow I'll be there tomorrow morning and then I pick up the kids tomorrow or lunch lunch or","label":"BAD_GRAMMAR","_input_hash":1880689730,"_task_hash":1450162884,"_session_id":null,"_view_id":"classification","answer":"reject"}
{"text":"By training ing there employees to use technology","label":"BAD_GRAMMAR","_input_hash":-1012638018,"_task_hash":-401584579,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Do you think there should be a rule, where everyone gets a standing ovation once in there life. You already started read on...","label":"BAD_GRAMMAR","_input_hash":1013251118,"_task_hash":-850667332,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"It is trying to get people to start reading on Amazon read","label":"BAD_GRAMMAR","_input_hash":-1749264730,"_task_hash":1804529721,"_session_id":null,"_view_id":"classification","answer":"ignore"}
{"text":"Easy to communicate with there people and so many peaceful university","label":"BAD_GRAMMAR","_input_hash":-798600673,"_task_hash":-21034858,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Energy Australia are always trying to ensure there customers know they are trying to make the most affordable and clean energy","label":"BAD_GRAMMAR","_input_hash":1836122933,"_task_hash":-589373510,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Everyone In This World Should Get A Standing Innovation. At Least Once In There Life. # Read","label":"BAD_GRAMMAR","_input_hash":264735208,"_task_hash":2069375211,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Excitement but the question would be are there real savings?","label":"BAD_GRAMMAR","_input_hash":-444669240,"_task_hash":-1184069775,"_session_id":null,"_view_id":"classification","answer":"reject"}
{"text":"Fashion nova. There way of making you feel comfortable and fitted to feel pretty is amazing","label":"BAD_GRAMMAR","_input_hash":-1618707928,"_task_hash":2120577728,"_session_id":null,"_view_id":"classification","answer":"accept"}
{"text":"Graphics as well as the landscape/setting looks appealing. Would want to know roughly the size of the island, are there alot of dungeons to explore, are there guilds to join,","label":"BAD_GRAMMAR","_input_hash":1921931521,"_task_hash":934089965,"_session_id":null,"_view_id":"classification","answer":"reject"}
{"text":"Hard to find people who see things there way I dio and should read books","label":"BAD_GRAMMAR","_input_hash":495779735,"_task_hash":1632352676,"_session_id":null,"_view_id":"classification","answer":"accept"}
"""

The validation dataset has a label on each line.

I double-checked, and all lines in the train dataset have a label as well.

Thanks!

This is frustrating, because I can't replicate your issue, which makes it really difficult to debug on my end.
What I've done is take your two data samples, store them in a JSONL file, read them into Prodigy with db-in, and then run

prodigy train textcat train blank:en --eval-id test -TE

which gives me

✔ Loaded model 'blank:en'
Created and merged data for 24 total examples
Created and merged data for 8 total examples
Using 24 train / 8 eval (from 'test')
Component: textcat | Batch size: compounding | Dropout: 0.2 | Iterations: 10
ℹ Baseline accuracy: 100.000

=========================== ✨  Training the model ===========================

#    Loss       F-Score
--   --------   --------
1    7.00       100.000
2    7.00       100.000
...
Label         F-Score
-----------   -------
BAD_GRAMMAR   100.000
...

Could you try the same - running with the limited datasets and a blank English model? Then perhaps change to your custom model and see whether it runs on the sample data? Then change the training set, and only after that change the test set, to see when the error starts occurring?

Thank you so much for looking into this Sofie!
As training data, I am using only 23 examples and the same test dataset I sent to you.
The following is giving the same error:

"""
!python -m prodigy train textcat new_set_2 blank:en --eval-id test_dataset -TE
"""

When I split the train dataset for validation instead, there is no error message:

"""
!python -m prodigy train textcat new_set_2 blank:en -es 0.2 -TE
"""

When I try the custom model with the small dataset as above, it works fine:

"""
!python -m prodigy train textcat new_set_2 ./Desktop/Retraining_POS_Tagger/tagger_model_3 -es 0.2 -TE
"""

When I use the full train dataset, both blank:en and the custom model fail:

"""
!python -m prodigy train textcat new_set blank:en -es 0.2 -TE
"""

"""
!python -m prodigy train textcat new_set ./Desktop/Retraining_POS_Tagger/tagger_model_3 -es 0.2 -TE
"""

It looks like the problem is with both the train and test dataset files. Both are JSON. Can I email them to you?

-Atakan

Yes, you can email them to sofie at explosion.ai, then I can hopefully replicate and help you debug this!

I imported the train and test datasets with db-in using new names (new_set_2 and test_dataset_2 instead of new_set and test_dataset, respectively), and it worked:

✔ Loaded model './Desktop/Retraining_POS_Tagger/tagger_model_3'
Created and merged data for 149 total examples
Created and merged data for 8 total examples
Using 149 train / 8 eval (from 'test_dataset_2')
Component: textcat | Batch size: compounding | Dropout: 0.2 | Iterations: 10
ℹ Baseline accuracy: 100.000

=========================== ✨ Training the model ===========================

#    Loss       F-Score
--   --------   --------
1    32.00      100.000
2    32.00      100.000
3    32.00      100.000
4    32.00      100.000
5    18.00      100.000
6    4.88       100.000
7    4.00       100.000
8    4.88       100.000
9    4.00       100.000
10   4.00       100.000

============================= ✨ Results summary =============================

Label         F-Score
-----------   -------
BAD_GRAMMAR   100.000

Best F-Score   100.000
Baseline       100.000

Does that mean your issue is resolved then?
Perhaps the datasets with the other names contained some older, incorrect examples from a previous experiment?

That's what I think too. One more question: in the report I shared above, the baseline accuracy is 100, as is the best F-score, after training with ~300 examples. When I try the model on both grammatical and ungrammatical sentences, the score is 1. Is that because the model does not have enough data? How much data would I need to train such a model?

Thanks in advance.

-Atakan

I'm not sure I understand what you mean? If the score is 1, that means 100%.
Also, if you're measuring on just 8 evaluation examples, that might be a bit too few to get a reliable performance score.

Oh, sorry for not being clear. When I test the model with a grammatical sentence, I get 1.0 (100% ungrammatical).

grammar_nlp("Their book is weak").cats
{'BAD_GRAMMAR': 1.0}

When I test it with an ungrammatical one, I get the same score:

grammar_nlp("There book is weak.").cats
{'BAD_GRAMMAR': 1.0}

I would expect some score lower than 1.0 for grammatical cases. I was wondering whether I get 1.0 for both grammatical and ungrammatical cases because the model has not learned anything yet and I need to label more data.

Thanks,
-Atakan

Right, I hadn't looked into the details of your annotation/challenge yet.

So if I understand correctly, you're training a textcat with examples labelled "BAD_GRAMMAR", and that is the only label you're feeding the classifier. What happens then is that the classifier will simply learn to predict that all possible input is BAD_GRAMMAR, because it hasn't received any counter-examples. If it just always predicts bad grammar, the training loss is zero and the ML algorithm is happy, but your classifier will not be very useful.

So, what you'd need to do is make sure that you also include examples that have good grammar. Only then does it become a challenge for the ML algorithm, and only then will it actually try to learn the difference.
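As a rough sketch of what that could look like when converting the annotations (variable names here are illustrative, not Prodigy internals; "accept" is treated as a positive BAD_GRAMMAR example and "reject" as a counter-example):

```python
# Illustrative mapping from binary accept/reject annotations to textcat
# training data with both positive and negative signal.
annotations = [
    {"text": "There book is weak.", "answer": "accept"},   # bad grammar
    {"text": "Their book is weak.", "answer": "reject"},   # good grammar
]

train_data = []
for eg in annotations:
    # accept -> 1.0 (is bad grammar), reject -> 0.0 (is not)
    score = 1.0 if eg["answer"] == "accept" else 0.0
    train_data.append((eg["text"], {"cats": {"BAD_GRAMMAR": score}}))

# train_data now also contains a counter-example (BAD_GRAMMAR = 0.0),
# which gives the classifier something to actually distinguish.
```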

Thank you Sofie! Actually, the training data has 239 ungrammatical and 83 grammatical examples. I guess the dataset is not big enough.

-Atakan

Hi Atakan,

I think there's one more issue with your training command: the fact that you're using -TE, which means the labels are interpreted as "mutually exclusive". This means that exactly one true label is expected per instance, so the setting really only applies when you're training on more than one label.

In the case of training on one label, as in your use-case, this -TE setting has a bit of an unexpected consequence. The internal validation will only look at the set of labels that got applied, which is always "BAD_GRAMMAR" in your case. The evaluation will, artificially, always say this is 100% correct because it saw the right label.

Instead, remove the -TE setting and you'll get a much more realistic training performance, which should increase as you add more data or train longer.
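A tiny illustration of why a single label under exclusive evaluation is trivially 100% correct (a hypothetical helper, not actual Prodigy code):

```python
def predict_exclusive(scores):
    # With mutually exclusive classes, the predicted label is the
    # highest-scoring one. With only one label, that is always
    # BAD_GRAMMAR, no matter how low the score is.
    return max(scores, key=scores.get)

for score in (0.01, 0.5, 0.99):
    # The "right" label is always picked, so label-level accuracy is 100%.
    assert predict_exclusive({"BAD_GRAMMAR": score}) == "BAD_GRAMMAR"
```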

Hi Sofie,

Thank you so much! Now I get realistic results.

-Atakan
