Trailing data

Hello!
I created the data in Prodigy's format as described here:

Then I saved it using both variants, once by writing newline-separated lines and once with srsly, as described here:

But I still get an error:

python -m prodigy ner.manual output_db de_core_news_sm danil_test.jsonl --loader json -l PRODNAME,MTRL,ENNUM,TEMPER
Using 4 label(s): PRODNAME, MTRL, ENNUM, TEMPER
Traceback (most recent call last):
  File "..\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "..\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "..\python files\repositories\d2b_prodigy\venv\lib\site-packages\prodigy\__main__.py", line 53, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src\prodigy\core.pyx", line 331, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src\prodigy\core.pyx", line 353, in prodigy.core._components_to_ctrl
  File "cython_src\prodigy\core.pyx", line 142, in prodigy.core.Controller.__init__
  File "cython_src\prodigy\components\feeds.pyx", line 56, in prodigy.components.feeds.SharedFeed.__init__
  File "cython_src\prodigy\components\feeds.pyx", line 155, in prodigy.components.feeds.SharedFeed.validate_stream
  File "..\python files\repositories\d2b_prodigy\venv\lib\site-packages\toolz\itertoolz.py", line 376, in first
    return next(iter(seq))
  File "cython_src\prodigy\components\preprocess.pyx", line 128, in add_tokens
  File "cython_src\prodigy\components\filters.pyx", line 37, in filter_duplicates
  File "cython_src\prodigy\components\filters.pyx", line 13, in filter_empty
  File "cython_src\prodigy\components\loaders.pyx", line 24, in _rehash_stream
  File "cython_src\prodigy\components\loaders.pyx", line 157, in JSON
  File "..\python files\repositories\d2b_prodigy\venv\lib\site-packages\srsly\_json_api.py", line 38, in json_loads
    return ujson.loads(data)
ValueError: Trailing data

Both files I got look like this:

What can I do to get a proper JSONL file that Prodigy can read?

Hi! I think the problem here is that you've set --loader json in the command, even though your file is a JSONL file. So under the hood, Prodigy tries to parse the whole file as a single JSON document. That fails because the data is newline-delimited JSON: the parser reads the first object, finds more content after it, and raises "ValueError: Trailing data".

You should be able to just remove the loader and Prodigy will guess it correctly from the file extension.
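
To illustrate the difference, here is a minimal sketch using only Python's standard json module rather than Prodigy's own loaders. The two records and the load_jsonl helper are made up for the example. Note that the standard library reports this failure as "Extra data", whereas ujson (used by srsly in the traceback above) words it as "Trailing data".

```python
import json

# Two hypothetical records in newline-delimited JSON (JSONL) form:
# one complete JSON object per line.
jsonl_text = '{"text": "Stahlrohr EN 10216"}\n{"text": "Temperatur 200 C"}\n'


def load_jsonl(text):
    """Parse each non-empty line as its own JSON document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Parsing the whole text as ONE JSON document fails: the parser finishes
# the first object and then finds a second one it did not expect.
try:
    json.loads(jsonl_text)
    whole_file_error = None
except json.JSONDecodeError as err:
    whole_file_error = err.msg

# Parsing line by line works, which is what a JSONL loader does.
records = load_jsonl(jsonl_text)
print(whole_file_error)
print(len(records), "records loaded")
```

If the line-by-line parse succeeds on your file, the file itself is valid JSONL and only the loader setting needs to change.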

Thank you, that helped!