Hi,
I'm running an annotation project with Prodigy, and we want to do a second pass over our annotations. To do this, I need to load a copy of our first-round annotations back into Prodigy. I read elsewhere on this forum that I should be able to run the command-line script with the output from the previous session as input and have those annotations highlighted in the new session. However, when I try that, I get the following error:
Traceback (most recent call last):
File "/home/USER/miniconda3/envs/prodigy/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/USER/miniconda3/envs/prodigy/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/USER/miniconda3/envs/prodigy/lib/python3.10/site-packages/prodigy/__main__.py", line 62, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 389, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "cython_src/prodigy/core.pyx", line 73, in prodigy.core.Controller.from_components
File "cython_src/prodigy/core.pyx", line 170, in prodigy.core.Controller.__init__
File "cython_src/prodigy/components/feeds.pyx", line 104, in prodigy.components.feeds.Feed.__init__
File "cython_src/prodigy/components/feeds.pyx", line 150, in prodigy.components.feeds.Feed._init_stream
File "cython_src/prodigy/components/stream.pyx", line 107, in prodigy.components.stream.Stream.__init__
File "cython_src/prodigy/components/stream.pyx", line 58, in prodigy.components.stream.validate_stream
File "cython_src/prodigy/components/preprocess.pyx", line 168, in add_tokens
File "cython_src/prodigy/components/preprocess.pyx", line 264, in prodigy.components.preprocess._add_tokens
File "cython_src/prodigy/components/preprocess.pyx", line 226, in prodigy.components.preprocess.sync_spans_to_tokens
TypeError: string indices must be integers
What might be causing the error? I'm using this script to run Prodigy:
PRODIGY_PORT=8082 prodigy ner.manual database en_core_web_sm data/for_prodigy/data.jsonl --label data/labels
Each entry in the jsonl file I load looks like this (with specific data anonymized here):
{'tokens': <STR LIST OF TOKENS>, 'tags': <STR LIST OF TAGS>, 'spans': [{'start': 13, 'end': 30, 'token_start': 3, 'token_end': 5, 'label': <STR LABEL>}, {'text': <STR TOKEN>, 'start': 31, 'end': 37, 'pattern': 1031155696, 'token_start': 6, 'token_end': 6, 'label': <STR LABEL>}, {'text': <STR TOKEN>, 'start': 38, 'end': 43, 'pattern': -847644489, 'token_start': 7, 'token_end': 7, 'label': <STR LABEL>}, {'start': 52, 'end': 60, 'token_start': 10, 'token_end': 10, 'label': <STR LABEL>}], 'text': <STR TEXT>}
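To make the shape of these entries easier to check, here is a minimal sketch that reads JSONL lines and counts the Python type of each item in the 'tokens' field. The function name token_types and the sample record are hypothetical and only for illustration. The comparison point is an assumption on my part: as far as I understand, Prodigy's token-level format uses one dict per token (with keys such as text, start, end and id) rather than plain strings, so a count of 'str' here may be relevant to the TypeError above.

```python
import json
from collections import Counter


def token_types(lines):
    """Count the Python type names of the items in each record's 'tokens' list."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for token in record.get("tokens", []):
            counts[type(token).__name__] += 1
    return counts


# Hypothetical sample shaped like my data: tokens stored as plain strings.
sample = ['{"text": "Hello world", "tokens": ["Hello", "world"], "spans": []}']
print(token_types(sample))
```

On the real file, this could be called with an open file handle, for example token_types(open("data/for_prodigy/data.jsonl", encoding="utf-8")).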
Any help would be much appreciated, and I'm happy to post further information.