Data-to-spacy with custom config

Hi,
When trying to export the data-to-spacy with an existing config, it gives the following error:

    if pipe not in config["components"]:
TypeError: string indices must be integers

Looking at train.py, the problem seems to be around line 493: the config is generated when nothing is passed in, but it is not loaded when a file is given (as the other recipes in the same file do).
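To illustrate why that produces this particular error, here is a minimal sketch in plain Python, with no Prodigy or spaCy involved. It assumes the `--config` value reaches the lookup as an unparsed path string; the file name is only an example.

```python
# A path given on the command line arrives as a plain string,
# not as a parsed, dict-like config.
config = "nl_driving_license.cfg"

try:
    # Same lookup shape as the failing line: a str indexed with a str key.
    "textcat_multilabel" not in config["components"]
except TypeError as err:
    error_message = str(err)
    print(f"TypeError: {error_message}")
```

Indexing a string with a string key raises a `TypeError`, which is why the message talks about string indices rather than a missing file or a missing section.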

trying to export the data-to-spacy with an existing config

Could you share the command that you tried running, as well as the existing config that you're referring to? If you see an error, could you also share the full traceback? I'm also wondering, are you using a standard config file that spaCy generated or did you add a custom one? If you added a custom one, could you share it as well?

Also, what version of Prodigy are you running?

Sorry for the really late reply, my email notifications weren't enabled.

I'm running the following command:

 python -m prodigy data-to-spacy nl_driving_license_data --textcat-multilabel nl_driving_license --verbose --config .\nl_driving_license.cfg --lang nl
ℹ Using language 'nl'

============================== Generating data ==============================
Components: textcat_multilabel
Merging training and evaluation data for 1 components
  - [textcat_multilabel] Training: 487 | Evaluation: 121 (20% split)
Training: 487 | Evaluation: 121
Labels: textcat_multilabel (7)
  - [textcat_multilabel] B, No license, C, CE, E, BE, D
✔ Saved 487 training examples
nl_driving_license_data\train.spacy
✔ Saved 121 evaluation examples
nl_driving_license_data\dev.spacy

============================= Generating config =============================
Traceback (most recent call last):
  File "C:\Users\Roland\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Roland\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Work\staa\.venv\lib\site-packages\prodigy\__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src\prodigy\core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "D:\Work\staa\.venv\lib\site-packages\plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "D:\Work\staa\.venv\lib\site-packages\plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "D:\Work\staa\.venv\lib\site-packages\prodigy\recipes\train.py", line 494, in data_to_spacy
    config = generate_config(config, base_nlp, base_model, list(pipes))
  File "D:\Work\staa\.venv\lib\site-packages\prodigy\recipes\train.py", line 584, in generate_config
    if pipe not in config["components"]:
TypeError: string indices must be integers

I'm using a standard config file generated by spaCy.

This is the diff of the fix:

diff --git a/.venv/Lib/site-packages/prodigy/recipes/train.py b/.venv/Lib/site-packages/prodigy/recipes/train.py
--- a/.venv/Lib/site-packages/prodigy/recipes/train.py	(date 1654615364487)
+++ b/.venv/Lib/site-packages/prodigy/recipes/train.py	
@@ -492,6 +492,8 @@
     base_nlp = nlp if base_model is not None else None
     if config is None:
         config = generate_default_config(pipes, lang, base_nlp)
+    else:
+        config = load_config(config)
     config = generate_config(config, base_nlp, base_model, list(pipes))
     msg.good("Generated training config")
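
For anyone reading along, the pattern behind that change can be sketched outside Prodigy. This is only an illustration under assumptions: `load_config_stub` and `resolve_config` are hypothetical names, and the stub uses the standard library's `configparser` rather than the `load_config` that train.py actually calls.

```python
import configparser
import tempfile
from pathlib import Path
from typing import Optional, Union


def load_config_stub(path: Union[str, Path]) -> dict:
    """Hypothetical stand-in for a real config loader: parse a .cfg file into a dict."""
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf8")
    # Collect every [components.<name>] section under a "components" key.
    components = {
        section.split(".", 1)[1]: dict(parser[section])
        for section in parser.sections()
        if section.startswith("components.")
    }
    return {"components": components}


def resolve_config(config: Optional[Union[str, Path]], pipes: list) -> dict:
    """Return a dict-like config whether or not a path was supplied."""
    if config is None:
        # Stand-in for generating a default config for the requested pipes.
        return {"components": {pipe: {} for pipe in pipes}}
    # The branch that was missing: a path has to be loaded before it is indexed.
    return load_config_stub(config)


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "example.cfg"
    cfg_path.write_text(
        '[components.textcat_multilabel]\nfactory = "textcat_multilabel"\n',
        encoding="utf8",
    )
    loaded = resolve_config(cfg_path, ["textcat_multilabel"])

generated = resolve_config(None, ["textcat_multilabel"])
print("textcat_multilabel" in loaded["components"])
print("textcat_multilabel" in generated["components"])
```

Either way, the code that runs afterwards always receives something it can index with `["components"]`.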

I just confirmed everything locally, and I gotta say ... high-five to you! Not only have you found a bug, you also seem to have found a fix.

I'll work on a PR and a test to ensure this does not happen again. Thanks for reporting it!
