Prodigy 1.12.0 throws errors with ner.openai recipes

I tried to get the ner.openai recipes working on macOS (Apple Silicon) and Linux (Intel) with Prodigy 1.12.0, using Python 3.9, 3.10, and 3.11. I tried the examples from the documentation page, but I always get errors; see below for a trace.

prodigy ner.openai.correct recipe-ner examples.jsonl --label dish,ingredient,equipment
Using 3 labels from model: dish, ingredient, equipment
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.9/site-packages/prodigy/", line 63, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 868, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/core.pyx", line 153, in prodigy.core.Controller.from_components
  File "cython_src/prodigy/core.pyx", line 297, in prodigy.core.Controller.__init__
  File "cython_src/prodigy/components/stream.pyx", line 178, in
  File "cython_src/prodigy/components/stream.pyx", line 193, in
  File "cython_src/prodigy/components/stream.pyx", line 326, in
  File "/usr/local/lib/python3.9/site-packages/prodigy/recipes/openai/", line 235, in to_stream
    for ex in openai(stream, batch_size=batch_size, nlp=nlp):
  File "cython_src/prodigy/components/openai.pyx", line 242, in set_hashes
  File "cython_src/prodigy/components/openai.pyx", line 289, in format_suggestions
  File "cython_src/prodigy/components/preprocess.pyx", line 172, in add_tokens
  File "/usr/local/lib/python3.9/site-packages/spacy/", line 1538, in pipe
    for doc in docs:
  File "/usr/local/lib/python3.9/site-packages/spacy/", line 1582, in pipe
    for doc in docs:
  File "/usr/local/lib/python3.9/site-packages/spacy/", line 1579, in <genexpr>
    docs = (self._ensure_doc(text) for text in texts)
  File "/usr/local/lib/python3.9/site-packages/spacy/", line 1528, in <genexpr>
    docs_with_contexts = (
  File "cython_src/prodigy/components/preprocess.pyx", line 165, in genexpr
  File "cython_src/prodigy/components/openai.pyx", line 268, in stream_suggestions
  File "cython_src/prodigy/components/openai.pyx", line 695, in batch_sequence
  File "cython_src/prodigy/components/loaders.pyx", line 47, in _add_attrs
  File "cython_src/prodigy/components/filters.pyx", line 50, in filter_duplicates
  File "cython_src/prodigy/components/filters.pyx", line 21, in filter_empty
  File "cython_src/prodigy/components/loaders.pyx", line 41, in _rehash_stream
  File "cython_src/prodigy/components/source.pyx", line 693, in load_noop
  File "cython_src/prodigy/components/source.pyx", line 107, in __iter__
  File "cython_src/prodigy/components/source.pyx", line 108, in prodigy.components.source.Source.__iter__
  File "cython_src/prodigy/components/source.pyx", line 578, in read
  File "/usr/local/lib/python3.9/site-packages/srsly/", line 39, in json_loads
    return ujson.loads(data)
ValueError: Expected object or value

The OpenAI keys seem to be OK. Any ideas?

That's very strange.

There have been a few updates since v1.12.0; the most recent version right now is v1.12.4, so you could try that. But the part of the codebase that you're using hasn't been updated, so while it might be worth trying, something else might be going awry.

If I had to guess, I'd say there's something strange happening in your examples.jsonl file. But I'm assuming you're just passing text through?

Could you try running this with logging enabled, possibly with the traceback setting? That might give us some more information to go on. Make sure that you remove any sensitive information before posting the logs on the forum, though!

Yes, I thought so too and replaced the usual suspects like apostrophes and braces. jq seems to like it, and I can't see anything wrong there. Here's the example file:

  "text": "Sriracha sauce goes really well with hoisin stir fry, but you should add it after you use the wok."

The log output is:

12:40:12: INIT: Setting all logging levels to 20
12:40:12: RECIPE: Calling recipe 'ner.openai.correct'
12:40:12: RECIPE: Starting recipe ner.openai.correct
Using 3 labels from model: dish, ingredient, equipment
12:40:13: get_stream: Loading .jsonl file
12:40:13: get_stream: Rehashing stream
12:40:13: get_stream: Removing duplicates
12:40:13: CONFIG: Using config from global prodigy.json
12:40:13: VALIDATE: Validating components returned by recipe
12:40:13: CONTROLLER: Initialising from recipe
12:40:13: VALIDATE: Creating validator for view ID 'blocks'
12:40:13: VALIDATE: Validating Prodigy and recipe config
12:40:13: PREPROCESS: Tokenizing examples (running tokenizer only)
12:40:13: FILTER: Filtering duplicates from stream
12:40:13: FILTER: Filtering out empty examples for key 'text'

After that the trace appears.

Just to check: are there no locals() shown when you pass PRODIGY_LOG_LOCALS=1?

Ah! Found the bug :slightly_smiling_face: .

This is a JSON-formatted file:

    "text": "Sriracha sauce goes really well with hoisin stir fry, but you should add it after you use the wok."

But what you want is JSONL:

{"text": "Sriracha sauce goes really well with hoisin stir fry, but you should add it after you use the wok."}

In JSONL, each line must be a complete JSON object on its own.
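The difference is easy to see in a few lines of Python. This is just a sketch using the standard library's json module; note that srsly (which Prodigy uses, as in the traceback above) parses with ujson, so the exact error message differs slightly:

```python
import json

# One line taken out of a pretty-printed JSON file is NOT valid on its own:
# after the "text" string there's trailing data the parser can't handle.
json_line = '  "text": "Sriracha sauce goes really well with hoisin stir fry."'
try:
    json.loads(json_line)
except json.JSONDecodeError as err:
    print("not valid as a JSONL line:", err)

# A proper JSONL line: the whole record is one JSON object on one line.
jsonl_line = '{"text": "Sriracha sauce goes really well with hoisin stir fry."}'
record = json.loads(jsonl_line)
print(record["text"])
```

This also explains why jq didn't complain: jq happily reads pretty-printed JSON, so it won't flag a file that is valid JSON but invalid JSONL.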

I'll make a note to see if we can catch this better in the future. I'll also update the docs: the example I added there isn't copy-paste friendly, and that would've saved you a headache here. Sorry about that!
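If you already have a JSON file holding a list of records, converting it to JSONL is only a few lines. This is a hypothetical helper (the function name and filenames are illustrative, and it assumes the file's top-level structure is a list of objects):

```python
import json
from pathlib import Path

def json_to_jsonl(src: str, dest: str) -> None:
    """Convert a JSON file containing a list of records to JSONL,
    writing one compact JSON object per line."""
    records = json.loads(Path(src).read_text(encoding="utf8"))
    with open(dest, "w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

# Example: json_to_jsonl("examples.json", "examples.jsonl")
```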

Ah, I should have remembered that, my bad. Thanks for the reminder!

Oh, no worries! We could've totally provided you with a better error message here, and the docs should be copy-paste friendly.

Thanks for letting us know :slightly_smiling_face:!