Error while running a variant of the old ner_manual recipe in Python 3.12

Hi,

I'm running a modified version of the recipe at prodigy-recipes/ner/ner_manual.py at master · explosion/prodigy-recipes · GitHub (changed the name and made some other small modifications). While running the recipe, I encounter an error.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/sina/Documents/Earnin/projects/Payroll/prodigy/lib/python3.12/site-packages/prodigy/__main__.py", line 50, in <module>
    main()
  File "/Users/sina/Documents/Earnin/projects/Payroll/prodigy/lib/python3.12/site-packages/prodigy/__main__.py", line 44, in main
    controller = run_recipe(run_args)
                 ^^^^^^^^^^^^^^^^^^^^
  File "cython_src/prodigy/cli.pyx", line 135, in prodigy.cli.run_recipe
  File "cython_src/prodigy/core.pyx", line 155, in prodigy.core.Controller.from_components
  File "cython_src/prodigy/core.pyx", line 307, in prodigy.core.Controller.__init__
  File "cython_src/prodigy/components/stream.pyx", line 191, in prodigy.components.stream.Stream.is_empty
  File "cython_src/prodigy/components/stream.pyx", line 230, in prodigy.components.stream.Stream.peek
  File "cython_src/prodigy/components/stream.pyx", line 343, in prodigy.components.stream.Stream._get_from_iterator
  File "cython_src/prodigy/components/source.pyx", line 755, in load_noop
  File "cython_src/prodigy/components/source.pyx", line 109, in __iter__
  File "cython_src/prodigy/components/source.pyx", line 110, in prodigy.components.source.Source.__iter__
  File "cython_src/prodigy/components/source.pyx", line 365, in read
  File "cython_src/prodigy/components/decorators.pyx", line 118, in inner
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/inspect.py", line 3242, in bind
    return self._bind(args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/inspect.py", line 3231, in _bind
    raise TypeError(
TypeError: got an unexpected keyword argument 'use_chars'

If I change the line stream = add_tokens(nlp, stream, use_chars=highlight_chars) to stream = add_tokens(nlp, stream), the error goes away and I can start the app, but I need the character-level annotation option to be available.

Any idea what may be the root cause and how I could solve this?

Thanks!

Welcome to the forum @sinazahedi!

I have to admit that we have fallen behind a bit with updating the open-source Prodigy recipes repo to the latest Prodigy API. Sorry about that! The highlight-chars feature has been re-implemented as a front-end toggle and requires a slightly different config now. I've updated the open-source ner.manual recipe, so it should work as is. You'll see it's just a matter of adding the ner_manual_highlight_chars key to the recipe config (apart from removing the unnecessary argument to add_tokens, like you did already).
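
For anyone adapting their own copy of the recipe, here is a minimal sketch of what the returned components could look like after that change. It is not the updated recipe itself: the helper name build_ner_manual_components and the example dataset name are made up for illustration, the @prodigy.recipe decorator and the add_tokens(nlp, stream) call are left out so the snippet runs without Prodigy installed, and only the ner_manual_highlight_chars key is taken from the reply above.

```python
from typing import Any, Dict, Iterable, List


def build_ner_manual_components(
    dataset: str,
    stream: Iterable[dict],
    labels: List[str],
    highlight_chars: bool = False,
) -> Dict[str, Any]:
    """Build the dict a ner.manual-style recipe would return (hypothetical helper)."""
    return {
        "view_id": "ner_manual",  # annotation interface used by the recipe
        "dataset": dataset,       # dataset the annotations are saved to
        "stream": stream,         # tokenized examples, e.g. from add_tokens(nlp, stream)
        "config": {
            "labels": labels,
            # Character highlighting is now switched on via the config,
            # not via a use_chars argument to add_tokens.
            "ner_manual_highlight_chars": highlight_chars,
        },
    }


# Example values are assumptions for illustration only.
components = build_ner_manual_components(
    "my_dataset", [{"text": "Hello world"}], ["PERSON", "ORG"], highlight_chars=True
)
print(components["config"])
```

In the real recipe, the highlight_chars value would typically come from the recipe's command-line flag, as in the original script.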

As a reminder, character-level highlighting will most likely produce spans that are not aligned with the current token boundaries. To train a spaCy model on data annotated at the character level, you'll need to adjust the tokenization in post-processing so that span boundaries coincide with token boundaries.
Hopefully, there are clear patterns and the tokenization can be adjusted via custom tokenization rules, implemented e.g. as a custom spaCy tokenizer, which can be easily integrated into a pipeline.
However, if character-level annotation is needed to compensate for errors in data preprocessing, e.g. amalgamated words or weird spacing, then it's strongly recommended to fix the preprocessing instead, because such cases will be hard to capture with tokenization rules.
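
To get a feel for how many annotations are affected before writing any tokenization rules, one option is to list the spans whose character offsets do not fall on token boundaries. The sketch below assumes annotated examples carry "tokens" and "spans" lists with "start"/"end" character offsets (end exclusive), as in Prodigy's task format; the function name find_misaligned_spans and the example text are hypothetical.

```python
from typing import Dict, List


def find_misaligned_spans(task: Dict) -> List[Dict]:
    """Return the spans whose start or end does not coincide with a token boundary."""
    token_starts = {tok["start"] for tok in task["tokens"]}
    token_ends = {tok["end"] for tok in task["tokens"]}
    return [
        span
        for span in task.get("spans", [])
        if span["start"] not in token_starts or span["end"] not in token_ends
    ]


# Made-up example: the second span covers only "Acme" inside the token "AcmeCorp".
example = {
    "text": "Paid by AcmeCorp today",
    "tokens": [
        {"text": "Paid", "start": 0, "end": 4, "id": 0},
        {"text": "by", "start": 5, "end": 7, "id": 1},
        {"text": "AcmeCorp", "start": 8, "end": 16, "id": 2},
        {"text": "today", "start": 17, "end": 22, "id": 3},
    ],
    "spans": [
        {"start": 8, "end": 16, "label": "ORG"},
        {"start": 8, "end": 12, "label": "ORG"},
    ],
}

for span in find_misaligned_spans(example):
    # Print the offending text so recurring patterns are easy to spot.
    print(span["label"], repr(example["text"][span["start"]:span["end"]]))
```

Looking at the printed fragments across a dataset should make it clearer whether the misalignments follow a pattern that tokenizer rules can handle, or whether they point to a preprocessing problem.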

Thank you! It's working after applying the changes.
