Confusing error for tasks with "text": null

I used ner.teach to annotate a bunch of data to improve the NER model. I then ran ner.batch-train on it, and successfully updated and saved the new model. When I turn around and try to use the new model in ner.teach, I get the following error:

ahalterman$ prodigy ner.teach ner_db ner_model brazil2.jsonl --label LOC,GPE
Traceback (most recent call last):
  File "/Users/ahalterman/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/ahalterman/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/prodigy/__main__.py", line 238, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 143, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/util.pyx", line 173, in prodigy.util.suggest_view_id
  File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/toolz/itertoolz.py", line 368, in first
    return next(iter(seq))
  File "cython_src/prodigy/components/sorters.pyx", line 127, in __iter__
  File "cython_src/prodigy/components/sorters.pyx", line 53, in genexpr
  File "cython_src/prodigy/models/ner.pyx", line 215, in __call__
  File "cython_src/prodigy/models/ner.pyx", line 185, in get_tasks
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "cython_src/prodigy/models/ner.pyx", line 151, in predict_spans
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "cython_src/prodigy/components/preprocess.pyx", line 12, in split_sentences
  File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/spacy/language.py", line 531, in pipe
    for doc, context in izip(docs, contexts):
  File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/spacy/language.py", line 554, in pipe
    for doc in docs:
  File "nn_parser.pyx", line 369, in pipe
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "nn_parser.pyx", line 369, in pipe
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "pipeline.pyx", line 395, in pipe
  File "cytoolz/itertoolz.pyx", line 1046, in cytoolz.itertoolz.partition_all.__next__ (cytoolz/itertoolz.c:14538)
  File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/spacy/language.py", line 710, in _pipe
    for doc in docs:
  File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/spacy/language.py", line 534, in <genexpr>
    docs = (self.make_doc(text) for text in texts)
  File "/Users/ahalterman/anaconda3/lib/python3.6/site-packages/spacy/language.py", line 357, in make_doc
    return self.tokenizer(text)
  File "tokenizer.pyx", line 80, in spacy.tokenizer.Tokenizer.__call__
TypeError: object of type 'NoneType' has no len()

What’s puzzling is that I can load and use the new model with spaCy without any trouble. Any thoughts?

Whoops, disregard. The error came from the data, not from loading the model: there was a {"text": null} line in the JSONL. That seems like something that could be caught and skipped, or at least reported with a clearer error.
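One way to find lines like this before starting a recipe is to scan the JSONL file yourself. Below is a minimal sketch that uses only the standard library. It assumes one JSON object per line and that a text-based recipe needs a non-empty string under "text". The function name `find_bad_text_lines` is hypothetical and not part of Prodigy.

```python
import json


def find_bad_text_lines(path):
    """Return (line_number, raw_line) pairs whose "text" is missing, null or blank."""
    bad = []
    with open(path, encoding="utf8") as f:
        for i, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines instead of failing on them
            task = json.loads(line)
            # Non-dict tasks have no "text" at all, so treat them as bad too
            text = task.get("text") if isinstance(task, dict) else None
            if not isinstance(text, str) or not text.strip():
                bad.append((i, line))
    return bad
```

For example, calling `find_bad_text_lines("brazil2.jsonl")` and printing the result would list the line number of the `{"text": null}` entry, which can then be fixed or removed.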


Thanks for updating with your solution! :+1:

And yes, I agree. For the upcoming version, we’ve added better error handling, plus a logging option: if you run Prodigy with the PRODIGY_LOGGING environment variable set, it logs everything that’s going on. There’s also a “verbose” mode that outputs the individual tasks, so you can see exactly which input Prodigy fails on. This should hopefully make problems like this easier to debug.

While Prodigy checks for invalid tasks (i.e. anything that’s not a dictionary), there’s currently no check for null values, since what counts as valid is recipe-specific. For example, an image task could have "text": null and still be valid. So my ideas for a solution would be:

  • Log a warning for falsy values where strings are expected.
  • Add validation functions that can be wrapped around a stream, e.g. one for text-based tasks, one for image-based tasks, etc. The built-in recipes would wrap the stream by default. In custom recipes, you could either write your own or use one of the functions provided by Prodigy (which will of course be documented as well). I think this solution is actually pretty nice, and it fits well with Prodigy’s component-based architecture and philosophy.
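
The second idea could look roughly like the sketch below. This is not Prodigy's API. `validate_text_tasks` and `validate_image_tasks` are hypothetical names, and the sketch assumes a stream is any iterable of task dicts. Each validator passes valid tasks through unchanged and logs a warning for anything that would crash a text- or image-based recipe, then skips it. This combines both ideas from the list.

```python
import logging

logger = logging.getLogger("stream_validation")


def _validate(stream, key):
    """Yield tasks whose `key` holds a non-empty string; warn about and skip the rest."""
    for i, task in enumerate(stream):
        if not isinstance(task, dict):
            logger.warning("Skipping task %d: not a dictionary (%r)", i, task)
            continue
        value = task.get(key)
        if not isinstance(value, str) or not value:
            logger.warning("Skipping task %d: expected a string for %r, got %r", i, key, value)
            continue
        yield task


def validate_text_tasks(stream):
    # Text-based recipes (e.g. NER) need a non-empty "text" string.
    return _validate(stream, "text")


def validate_image_tasks(stream):
    # Image tasks may legitimately have "text": null, so only "image" is checked.
    return _validate(stream, "image")
```

Because the validators are generators, they stay lazy. A custom recipe could apply one with `stream = validate_text_tasks(stream)` before the stream reaches the model.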

I ran into a similar problem with terms.to-patterns. Running

prodigy terms.to-patterns drugs_terms drugs_patterns.jsonl --label DRUG

gives:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/ce_acceptance_environment/lib/python3.5/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/user/miniconda3/envs/ce_acceptance_environment/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/user/miniconda3/envs/ce_acceptance_environment/lib/python3.5/site-packages/prodigy/__main__.py", line 253, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 150, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/user/miniconda3/envs/ce_acceptance_environment/lib/python3.5/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/user/miniconda3/envs/ce_acceptance_environment/lib/python3.5/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/user/miniconda3/envs/ce_acceptance_environment/lib/python3.5/site-packages/prodigy/recipes/terms.py", line 162, in to_patterns
    .format(len(terms), dataset))
TypeError: object of type 'NoneType' has no len()

Thanks for the report!

This indicates that the dataset you're creating the patterns from (drugs_terms) is either empty or, more likely, doesn't exist at all. Prodigy should probably handle this better and exit with a clear error message. Will fix this, thanks!
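Until then, a custom script can guard against this by checking the dataset before using it. Below is a minimal sketch. It assumes the database lookup returns `None` for a missing dataset, which matches the `len()` on `NoneType` in the traceback. `load_terms` is a hypothetical helper, and `get_dataset` is passed in as a stand-in for whatever lookup you use. Neither is presented as Prodigy's actual code.

```python
import sys


def load_terms(get_dataset, dataset):
    """Fetch examples via `get_dataset` and exit with a readable message if there are none.

    `get_dataset` is any callable returning a list of examples, or None when the
    dataset does not exist (an assumption for this sketch).
    """
    terms = get_dataset(dataset)
    if terms is None:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit("Can't find dataset '{}'. Check the name or create it first.".format(dataset))
    if not terms:
        sys.exit("Dataset '{}' is empty. Add some terms before creating patterns.".format(dataset))
    return terms
```

Checking for `None` and for an empty list separately makes it possible to tell a typo in the dataset name apart from a dataset that exists but has no annotations yet.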
