Hey,
I have used Prodigy (the ner.manual and ner.correct recipes) to produce an annotated NER/span dataset that extracts qualification entities from job descriptions. Having given the annotation strategy some further thought, I would like to subsume one of the classes under another: a class I thought was useful at the outset turns out not to be, and those instances would be better labelled as one of the other, more common classes.
I exported the dataset with the db-out recipe, opened the .jsonl file, and search-replaced the original label with the one I'd now like to use. I then ran the db-in recipe on this file to create a corrected dataset, which imported without errors and has the expected number of instances.
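For reference, the relabelling step could also be done by parsing each line as JSON rather than with a text search-replace, so that only the "label" field of each span is touched. This is only a minimal sketch: the label names and file paths are placeholders, and it assumes every record is a JSON object whose "spans" value (if present) is a list of dicts with a "label" key, which is what I'd expect from ner.manual output.

```python
import json

# Placeholder label names; substitute the real ones.
OLD_LABEL = "OLD_LABEL"
NEW_LABEL = "NEW_LABEL"


def relabel_record(record, old_label, new_label):
    """Return a copy of one record with old_label renamed in its spans."""
    updated = dict(record)
    if "spans" in record:
        # Assumes each span is a dict with a "label" key.
        updated["spans"] = [
            {**span, "label": new_label} if span.get("label") == old_label else span
            for span in record["spans"]
        ]
    return updated


def relabel_jsonl(in_path, out_path, old_label, new_label):
    """Read a JSONL export line by line and write a relabelled copy."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            fixed = relabel_record(record, old_label, new_label)
            dst.write(json.dumps(fixed) + "\n")
```

Calling something like relabel_jsonl("export.jsonl", "relabelled.jsonl", OLD_LABEL, NEW_LABEL) (both file names hypothetical) would then produce the file to pass to db-in.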
However, when I try to train a spancat model on this dataset, I get the following traceback:
Auto-generating config with spaCy
Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2288.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2288.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Projects\PROJECT-NAME\prodigy_env\lib\site-packages\prodigy\__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src\prodigy\core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "C:\Projects\PROJECT-NAME\prodigy_env\lib\site-packages\plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "C:\Projects\PROJECT-NAME\prodigy_env\lib\site-packages\plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "C:\Projects\PROJECT-NAME\prodigy_env\lib\site-packages\prodigy\recipes\train.py", line 261, in train
    train_config = prodigy_config(
  File "C:\Projects\PROJECT-NAME\prodigy_env\lib\site-packages\prodigy\recipes\train.py", line 112, in prodigy_config
    config = generate_default_config(pipes, lang, base_nlp, silent=silent)
  File "C:\Projects\PROJECT-NAME\prodigy_env\lib\site-packages\prodigy\recipes\train.py", line 571, in generate_default_config
    suggester = infer_spancat_suggester(examples, nlp)
  File "C:\Projects\PROJECT-NAME\prodigy_env\lib\site-packages\prodigy\recipes\data_utils.py", line 959, in infer_spancat_suggester
    char_span = doc.char_span(span["start"], span["end"])
TypeError: string indices must be integers
All I can think is that I've somehow altered the input data, but I can't work out how I might have done this.
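Reading the last frame, the TypeError seems to mean that at least one `span` reaching `doc.char_span` is a string rather than a dict, which is what would happen if a record's "spans" value were a dict or a string instead of a list of dicts. Below is a rough sketch of how the exported file could be checked for that. The function name is my own, and the expected record shape (a "spans" list of dicts with integer "start" and "end") is an assumption based on the traceback, not something taken from the Prodigy source.

```python
import json


def find_malformed_spans(lines):
    """Return (line_number, problem) pairs for records with unexpected spans."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        spans = record.get("spans", [])
        if not isinstance(spans, list):
            kind = type(spans).__name__
            problems.append((line_no, f"'spans' is {kind}, expected list"))
            continue
        for index, span in enumerate(spans):
            if not isinstance(span, dict):
                kind = type(span).__name__
                problems.append((line_no, f"span {index} is {kind}, expected dict"))
            elif not all(isinstance(span.get(key), int) for key in ("start", "end")):
                problems.append((line_no, f"span {index} lacks integer start/end"))
    return problems
```

Passing an open file handle for the db-out export (for example opened with encoding="utf-8") to find_malformed_spans would list any offending lines; an empty list would suggest the problem lies elsewhere.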
I'd really appreciate any help.