prodigy sense2vec.to-patterns KeyError: 'word'

Hi, I'm getting the error below when running the following command:

prodigy sense2vec.to-patterns my_sense2vec en_core_web_lg PROD_SIO --output-file my_patterns.jsonl
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/natu/Documents/Azumo/fb-supplier-natu/venv/lib/python3.8/site-packages/prodigy/__main__.py", line 60, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 300, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/Users/natu/Documents/my-project/venv/lib/python3.8/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/Users/natu/Documents/my-project/venv/lib/python3.8/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/natu/Documents/my-project/venv/lib/python3.8/site-packages/sense2vec/prodigy_recipes.py", line 194, in to_patterns
    terms = set([eg["word"] for eg in examples if eg["answer"] == "accept"])
  File "/Users/natu/Documents/my-project/venv/lib/python3.8/site-packages/sense2vec/prodigy_recipes.py", line 194, in <listcomp>
    terms = set([eg["word"] for eg in examples if eg["answer"] == "accept"])
KeyError: 'word'

I don't know exactly what I'm doing wrong; this used to work.
Thanks!

Hi! What's in your PROD_SIO dataset? Maybe you accidentally added some other stuff to it that wasn't created by accepting/rejecting sense2vec suggestions?

The sense2vec.to-patterns recipe expects each example to follow the format created by sense2vec.teach, so each record should have an entry "word" containing the original string of the word you accepted.
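
The line that fails in the traceback is `terms = set([eg["word"] for eg in examples if eg["answer"] == "accept"])`, so a single record without a "word" key is enough to raise the KeyError. Here's a minimal sketch in plain Python for finding those records and collecting the accepted terms while skipping them. It assumes your examples are already loaded as a list of dicts. The function names are hypothetical helpers, not part of sense2vec or Prodigy.

```python
from typing import Any, Dict, Iterable, List, Set

Example = Dict[str, Any]


def find_invalid_examples(examples: Iterable[Example]) -> List[Example]:
    """Return records missing the "word" key that sense2vec.to-patterns reads."""
    return [eg for eg in examples if "word" not in eg]


def accepted_terms(examples: Iterable[Example]) -> Set[str]:
    """Collect accepted words like the recipe does, but skip foreign records.

    Hypothetical helper: the real recipe indexes eg["word"] directly and
    raises KeyError if any record lacks it.
    """
    return {
        eg["word"]
        for eg in examples
        if "word" in eg and eg.get("answer") == "accept"
    }


examples = [
    {"word": "laptop", "answer": "accept"},
    {"word": "tablet", "answer": "reject"},
    {"text": "Some annotated sentence", "answer": "accept"},  # no "word" key
]
print(find_invalid_examples(examples))  # → [{'text': 'Some annotated sentence', 'answer': 'accept'}]
print(accepted_terms(examples))  # → {'laptop'}
```

Any record that shows up in find_invalid_examples probably came from a different recipe and is what trips up sense2vec.to-patterns.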

One way to check this would be to run db-out and export your PROD_SIO dataset. Then you can see all examples and maybe find the ones that come from somewhere else. You can always edit the exported data and re-import it into a new dataset with db-in.
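
If the export is large, you could filter it with a small script instead of editing it by hand. This is a minimal sketch that assumes the db-out output is newline-delimited JSON (one example per line). clean_export is a hypothetical helper: it copies only the records that have a "word" key and reports how many it dropped.

```python
import json
from typing import Tuple


def clean_export(in_path: str, out_path: str) -> Tuple[int, int]:
    """Copy only sense2vec-style records (those with a "word" key).

    Returns (kept, dropped) so you can see how many foreign records
    were in the exported dataset.
    """
    kept = dropped = 0
    with open(in_path, encoding="utf8") as src, open(out_path, "w", encoding="utf8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines in the export
            eg = json.loads(line)
            if "word" in eg:
                dst.write(json.dumps(eg) + "\n")
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

A possible workflow, with hypothetical file and dataset names: export with prodigy db-out my_sense2vec > my_sense2vec.jsonl, call clean_export("my_sense2vec.jsonl", "my_sense2vec_clean.jsonl"), then import the result into a fresh dataset with prodigy db-in my_sense2vec_clean my_sense2vec_clean.jsonl and point sense2vec.to-patterns at that dataset. Double-check the exact db-out and db-in arguments with --help for your Prodigy version.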

Thanks @ines!

I see what you mean; I'll try to check that out. (Actually, my dataset is my_sense2vec and PROD_SIO is my label :slight_smile:)

Natu

Ah, sorry! I think I copy-pasted this once and used the wrong value :sweat_smile: But yes, I think there's definitely something in your dataset that shouldn't have gone in there and if you remove that, it should work as expected again.