Having errors when loading my own data

I am really enjoying using Prodigy. I tried the GitHub Issues dataset, and it works perfectly fine :slight_smile:

prodigy dataset gh_issues "Classify issues on GitHub"
prodigy textcat.teach gh_issues en_core_web_sm "docs" --api github --label DOCUMENTATION

I then tried to create a different dataset from my own data by following the example here.


Pinterest Hires Its First Head of Diversity
Airbnb and Others Set Terms for Employees to Cash Out

prodigy textcat.teach my_set path/to/my_data.txt --label POLITICS

However, I received the following error.

Traceback (most recent call last):
  File "/home/pacmann/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/pacmann/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/pacmann/anaconda3/lib/python3.6/site-packages/prodigy/__main__.py", line 235, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 130, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/pacmann/anaconda3/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/home/pacmann/anaconda3/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/pacmann/anaconda3/lib/python3.6/site-packages/prodigy/recipes/textcat.py", line 42, in teach
    nlp = spacy.load(spacy_model)
  File "/home/pacmann/anaconda3/lib/python3.6/site-packages/spacy/__init__.py", line 15, in load
    return util.load_model(name, **overrides)
  File "/home/pacmann/anaconda3/lib/python3.6/site-packages/spacy/util.py", line 108, in load_model
    return load_model_from_path(Path(name), **overrides)
  File "/home/pacmann/anaconda3/lib/python3.6/site-packages/spacy/util.py", line 136, in load_model_from_path
    meta = get_model_meta(model_path)
  File "/home/pacmann/anaconda3/lib/python3.6/site-packages/spacy/util.py", line 181, in get_model_meta
    raise IOError("Could not read meta.json from %s" % meta_path)
OSError: Could not read meta.json from my_data.jsonl/meta.json

I would like to know if there are any suggestions. I’ve been stuck with this error for weeks now and need some help.

Thanks, glad you like Prodigy!

I think the problem here has nothing to do with your data and is much simpler: the second argument of textcat.teach, after the dataset name, is the base model you want to start off with, for example spaCy’s en_core_web_sm. The model is used for basic text processing and tokenization, and will be updated with your annotations. The third argument is the data you want to load in. So what happened here is that Prodigy treated path/to/my_data.txt as a model, passed it to spacy.load, and failed because no meta.json could be read from that path.
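As a rough sketch, the command from your post with the arguments in that order might look like the following. It assumes en_core_web_sm is installed and reuses the placeholder path from your post. The variable names (dataset, spacy_model, data_source, cmd) are only there for illustration, so the order is visible before anything runs.

```shell
# Argument order for textcat.teach: dataset, base model, data source.
dataset="my_set"
spacy_model="en_core_web_sm"         # assumption: any installed spaCy model could go here
data_source="path/to/my_data.txt"    # placeholder path from the original post

# Build the command as a string so it can be reviewed first.
cmd="prodigy textcat.teach $dataset $spacy_model $data_source --label POLITICS"
echo "$cmd"
# eval "$cmd"   # uncomment to actually start the annotation session
```

If the printed command has the model name in second position and your file in third, it should no longer try to read meta.json from your data path.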

You can see the command signature and available arguments here, or by typing:

prodigy textcat.teach --help

FYI, I ran into the same issue as above when using the example from the documentation:

$ prodigy textcat.teach my_set path/to/my_data.txt --label POLITICS

@beckerfuffle Ah damn, so there’s a bug in the example! I totally missed that. Fixing, thanks! :+1:
