We have a Prodigy database in PostgreSQL holding annotations that we created with Prodigy 1.6.1. We’re in the process of upgrading to Prodigy 1.8.3, and as part of that we’ve switched to SQLite.
We exported the annotations from this database to a JSONL file using db-out (with Prodigy 1.8.3), and that worked fine.
Now, to import them into a SQLite database, we’re running the following commands:
$ prodigy dataset my_dataset_name "My English Dataset"
$ prodigy db-in my_dataset_name my_dataset_name.jsonl
The last command fails with the following traceback:
17:19:20 - APP: Using Hug endpoints (deprecated)
17:19:21 - DB: Initialising database SQLite
17:19:21 - DB: Connecting to database SQLite
17:19:21 - LOADER: Using file extension 'jsonl' to find loader
17:19:21 - LOADER: Loading stream from jsonl
Traceback (most recent call last):
  File "cython_src/prodigy/components/loaders.pyx", line 145, in prodigy.components.loaders.JSONL
  File "/usr/local/lib/python3.7/site-packages/srsly/_json_api.py", line 37, in json_loads
    return ujson.loads(data)
ValueError: Trailing data

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/site-packages/prodigy/__main__.py", line 372, in <module>
    plac.call(commands[command], arglist=args, eager=False)
  File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/usr/local/lib/python3.7/site-packages/prodigy/__main__.py", line 225, in db_in
    annotations = [set_hashes(eg) for eg in annotations]
  File "/usr/local/lib/python3.7/site-packages/prodigy/__main__.py", line 225, in <listcomp>
    annotations = [set_hashes(eg) for eg in annotations]
  File "cython_src/prodigy/components/loaders.pyx", line 152, in JSONL
ValueError: Failed to load task (invalid JSON).
{"text":"\n3.","_input_hash":1308918942,"_task_has ... end":492,"label":"ENTITY_TYPE"}],"answer":"accept"}
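The `ValueError: Trailing data` raised by `ujson.loads` generally means a line parsed as one complete JSON value but had more content after it, for example two records ending up on the same line. To narrow this down, a minimal sketch like the one below could report which lines of the exported file are not exactly one JSON object. It uses only Python’s standard `json` module, not Prodigy’s own loader, and the script name `check_jsonl.py` is hypothetical. One assumption to keep in mind: Python’s parser is stricter than ujson in some respects (for instance, raw control characters inside strings), so it may flag lines that ujson would accept.

```python
import json
import sys
from typing import Iterable, List, Tuple


def check_jsonl(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) for each line that is not exactly one JSON object."""
    decoder = json.JSONDecoder()
    problems = []
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are skipped here
        try:
            # raw_decode parses one JSON value and returns where it stopped
            obj, end = decoder.raw_decode(text)
        except json.JSONDecodeError as err:
            problems.append((lineno, f"invalid JSON: {err.msg} (column {err.colno})"))
            continue
        if end != len(text):
            # Something follows the first value on this line ("trailing data")
            snippet = text[end:end + 40]
            problems.append((lineno, f"trailing data after column {end}: {snippet!r}"))
        elif not isinstance(obj, dict):
            problems.append((lineno, "not a JSON object"))
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_jsonl.py my_dataset_name.jsonl
    for path in sys.argv[1:]:
        with open(path, encoding="utf8") as f:
            for lineno, problem in check_jsonl(f):
                print(f"{path}:{lineno}: {problem}")
```

Running it against my_dataset_name.jsonl should point at the line number of the record shown in the error above, which makes it easier to inspect the raw bytes of that line directly.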
The following dependencies are installed:
blis==0.2.4
cachetools==3.1.1
certifi==2019.3.9
chardet==3.0.4
cymem==2.0.2
falcon==1.4.1
hug==2.4.8
idna==2.8
jsonschema==2.6.0
murmurhash==1.0.2
numpy==1.16.4
peewee==2.10.2
plac==0.9.6
preshed==2.0.1
prodigy==1.8.3
psycopg2-binary==2.8.2
PyJWT==1.7.1
python-mimeparse==1.6.0
requests==2.22.0
six==1.12.0
spacy==2.1.4
srsly==0.0.6
thinc==7.0.4
toolz==0.9.0
tqdm==4.32.1
urllib3==1.25.3
waitress==1.2.1
wasabi==0.2.2
Any assistance is greatly appreciated!