Unable to convert Prodigy JSONL to spaCy training JSON

I just got Prodigy today to help with training a few spaCy models, but I hit a dead end right away.

After labeling some datasets with ner.manual and exporting them to JSONL, I'm unable to convert them to spaCy's JSON format.

I keep getting this error after running `python -m spacy convert ./annotations.jsonl . --converter jsonl -l en`:

```
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/benbot/Documents/dev/projects/standup-ninja/service/env/lib/python3.7/site-packages/spacy/__main__.py", line 33, in <module>
    plac.call(commands[command], sys.argv[1:])
  File "/Users/benbot/Documents/dev/projects/standup-ninja/service/env/lib/python3.7/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/Users/benbot/Documents/dev/projects/standup-ninja/service/env/lib/python3.7/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/benbot/Documents/dev/projects/standup-ninja/service/env/lib/python3.7/site-packages/spacy/cli/convert.py", line 106, in convert
    no_print=no_print,
  File "/Users/benbot/Documents/dev/projects/standup-ninja/service/env/lib/python3.7/site-packages/spacy/cli/converters/jsonl2json.py", line 24, in ner_jsonl2json
    ents = record["spans"]
KeyError: 'spans'
```

I'm pretty new to spaCy and Prodigy, so if anyone can lend a hand I'd be eternally grateful.

Hi! If you're using the latest Prodigy, you shouldn't have to use spaCy's generic JSONL converter; you can use the data-to-spacy recipe instead. It'll also give you more detailed feedback if something goes wrong (e.g. if you're trying to convert a non-NER dataset to NER data).

If it also tells you that one of your examples is missing "spans", the most likely explanation is that there are some non-NER annotations in your dataset. Maybe you annotated something else before and then added the NER annotations to the same set? You can always use db-out to export the dataset and inspect it manually to see what's in there.
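A minimal sketch of that manual check, assuming the dataset was exported with db-out to a JSONL file and that every NER record should carry a "spans" list (which is what the converter in the traceback above tries to read). The helper name `find_records_without_spans` is hypothetical; it just lists the records that lack "spans" together with the keys they do have, so non-NER annotations stand out.

```python
import json


def find_records_without_spans(lines):
    """Return (line_index, sorted_keys) for each JSONL record with no "spans" key.

    `lines` can be any iterable of JSON strings, e.g. an open file handle.
    Blank lines are skipped but still counted, so indices match file lines.
    """
    missing = []
    for i, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # ignore empty lines at the end of the export
        record = json.loads(line)
        if "spans" not in record:
            missing.append((i, sorted(record.keys())))
    return missing


# Made-up sample: one NER record and one text classification record.
sample = [
    '{"text": "Apple is in Cupertino", "spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "accept"}',
    '{"text": "Great product!", "label": "POSITIVE", "answer": "accept"}',
]
for index, keys in find_records_without_spans(sample):
    print(f"line {index + 1}: no 'spans', keys = {keys}")
```

On a real export you would pass the open file instead, e.g. `with open("annotations.jsonl", encoding="utf8") as f: print(find_records_without_spans(f))`, where the file name is whatever you passed to db-out.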


Hi,

I am trying to convert a Prodigy training set JSONL (from the output model directory after training) to spaCy JSON using the data-to-spacy recipe. It looks like this recipe expects a dataset in the SQLite database rather than the corresponding *.jsonl file:
✘ Can't find '/training.jsonl' in database 'SQLite

In this case, '/training.jsonl' is the JSONL file that Prodigy NER training writes to the final model directory. I would have used the SQLite dataset I annotated with ner.manual, but I only have access to the training JSONL in the main model directory. So, can I convert just this JSONL to spaCy JSON, and if so, how?
Thanks!

Yes, as you can also see in the recipe docs, the data-to-spacy recipe expects dataset names, not files.

If you only have a file, a quick workaround is to import it into a new dataset using db-in and then convert that dataset to spaCy's format using data-to-spacy.
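If you'd rather not go through the database at all, another option is to convert the file yourself. This is a minimal sketch, not what data-to-spacy does internally. It assumes each record has "text", a "spans" list with character offsets "start"/"end" and a "label", and an "answer" field. It targets the simple `(text, {"entities": [(start, end, label)]})` tuples used with `nlp.update` in custom spaCy v2 training loops, not the JSON format that `spacy convert` produces. The helper name is hypothetical.

```python
import json


def prodigy_ner_to_spacy_tuples(lines):
    """Convert Prodigy-style NER JSONL lines into (text, {"entities": [...]}) tuples.

    Hypothetical helper. Keeps records whose "answer" is "accept" (records
    without an "answer" key are treated as accepted) and skips records that
    have no "spans" key, since those are not NER annotations.
    """
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("answer", "accept") != "accept":
            continue  # drop rejected or ignored annotations
        if "spans" not in record:
            continue  # e.g. a text classification record mixed into the set
        entities = [(s["start"], s["end"], s["label"]) for s in record["spans"]]
        examples.append((record["text"], {"entities": entities}))
    return examples


# Made-up sample: one accepted NER example and one rejected one.
sample = [
    '{"text": "Apple is in Cupertino", "spans": [{"start": 0, "end": 5, "label": "ORG"}, {"start": 12, "end": 21, "label": "GPE"}], "answer": "accept"}',
    '{"text": "Ignore me", "spans": [], "answer": "reject"}',
]
print(prodigy_ner_to_spacy_tuples(sample))
```

Going through db-in and data-to-spacy is still the more robust route, because the recipe validates the data and reports problems, while this sketch silently skips anything it doesn't recognize.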