Error when seeds in terms.teach include an empty string

I’ve been trying to run the terms.teach recipe, along the lines of the insults classification demo. I downloaded and installed the en_vectors_web_lg model, assuming that this would fix the problem.

When I run the command:

prodigy terms.teach financial_seeds en_vectors_web_lg --seed finance.txt

I get the following output:


Initialising with 12 seed terms from finance.txt
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/crivera5/.virtual_envs/spacy_tools_python3/lib/python3.5/site-packages/prodigy/__main__.py", line 235, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 129, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/Users/crivera5/.virtual_envs/spacy_tools_python3/lib/python3.5/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/Users/crivera5/.virtual_envs/spacy_tools_python3/lib/python3.5/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/crivera5/.virtual_envs/spacy_tools_python3/lib/python3.5/site-packages/prodigy/recipes/terms.py", line 95, in teach
    accept_doc = Doc(nlp.vocab, words=seeds)
  File "spacy/tokens/doc.pyx", line 154, in spacy.tokens.doc.Doc.__init__ (spacy/tokens/doc.cpp:5578)
  File "spacy/tokens/doc.pyx", line 512, in spacy.tokens.doc.Doc.push_back (spacy/tokens/doc.cpp:10841)
AssertionError

I figured out the cause of the error: my seed text file had a blank line, which resulted in an empty string in the seeds list. Perhaps this should be ‘fixed’?

Thanks for the report and analysis – will be fixed in the next release! :+1:

Thanks for the tool. It seems great so far.

More info:
The error came from the call Doc(nlp.vocab, words=seeds) in terms.teach (see the traceback above), which raises an AssertionError when seeds contains an empty string.
It could be fixed either in get_seeds(), by filtering out the empty lines, or in Doc itself.
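
In the meantime, here is a rough sketch of the filtering idea, assuming a plain-text seed file with one term per line. Note that read_seeds is a hypothetical helper written for illustration, not Prodigy's actual get_seeds(), and the sample terms are made up:

```python
import os
import tempfile


def read_seeds(path):
    """Read one seed term per line, skipping blank or whitespace-only lines.

    Hypothetical helper for illustration; not part of Prodigy.
    """
    with open(path, encoding="utf8") as f:
        stripped = (line.strip() for line in f)
        # Empty strings are falsy, so blank lines are dropped here.
        return [term for term in stripped if term]


# Small demo with a temporary seed file containing blank lines.
with tempfile.TemporaryDirectory() as tmp_dir:
    seed_path = os.path.join(tmp_dir, "finance.txt")
    with open(seed_path, "w", encoding="utf8") as f:
        f.write("stock\nbond\n\nequity\n   \ndividend\n")
    seeds = read_seeds(seed_path)

print(seeds)
```

Writing the cleaned list back to the seed file before rerunning the command should avoid the empty string reaching Doc.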

Thanks again!!!

Will be fixed in the upcoming Prodigy v0.3.0! :tada: