(nightly) OSError: [E053] Could not read config.cfg

Hi,

I just joined the Prodigy nightly program. First of all, thanks for this amazing tool!

I encountered this issue when trying to run train on an NER dataset.
The dataset was exported from the stable version of Prodigy with db-out and reimported into a new environment with db-in; I don't know whether that has an impact.

prodigy train --ner text_annotation_master
ℹ Using CPU

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/prodigy/__main__.py", line 54, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 479, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/prodigy/recipes/train.py", line 218, in train
    config = prodigy_config(
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/prodigy/recipes/train.py", line 100, in prodigy_config
    corpus_config = spacy.util.load_config(CONFIG_READER_PATH)
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/spacy/util.py", line 545, in load_config
    raise IOError(Errors.E053.format(path=config_path, name="config.cfg"))
OSError: [E053] Could not read config.cfg from /home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/prodigy/default_config_reader.cfg

Thanks!

Thanks, and sorry about that! This looks like a Python packaging issue: the .cfg file seems to have been excluded during packaging.

I'm just building wheels for the next nightly that fixes this, but in the meantime you should be able to work around it by creating the file manually. Put the following in a file called default_config_reader.cfg:

[corpora]
@readers = "prodigy.MergedCorpus.v1"
# Percentage of examples held back for evaluation
eval_split = 0.2
# Percentage of examples to use (e.g. 0.5 will use half)
sample_size = 1.0

[corpora.textcat]
@readers = "prodigy.TextCatCorpus.v1"
datasets = []
eval_datasets = []
exclusive = false

[corpora.ner]
@readers = "prodigy.NERCorpus.v1"
datasets = []
eval_datasets = []
missing_tag = "O"

[corpora.parser]
@readers = "prodigy.ParserCorpus.v1"
datasets = []
eval_datasets = []

[corpora.tagger]
@readers = "prodigy.TaggerCorpus.v1"
datasets = []
eval_datasets = []
missing_value = ""

... and then put that file here:

/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/prodigy/default_config_reader.cfg
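
If you're not sure where that directory is in your own environment, a quick sanity check along these lines should print the expected location and confirm the new file parses once it's in place (paths will differ per venv, so treat these as examples):

from pathlib import Path
import prodigy
import spacy

# Locate the installed prodigy package and the path where the config reader is expected.
cfg_path = Path(prodigy.__file__).parent / "default_config_reader.cfg"
print(cfg_path)

# Once the file exists, this should load without raising E053.
config = spacy.util.load_config(cfg_path)
print(config["corpora"]["@readers"])  # expected: "prodigy.MergedCorpus.v1"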

Thanks for the quick answer. After putting the new file in place, I get this error:

catalogue.RegistryError: [E893] Could not find function 'prodigy.MergedCorpus.v1' in function registry 'readers'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.

Available names: spacy.Corpus.v1, spacy.JsonlCorpus.v1, spacy.read_labels.v1, srsly.read_json.v1, srsly.read_jsonl.v1, srsly.read_msgpack.v1, srsly.read_yaml.v1

Just released a new nightly update that should fix this! :slightly_smiling_face:
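
If you want to double-check which readers your environment actually exposes after upgrading, you can list what's registered with spaCy. This only prints the registered names; the prodigy.* readers should appear once the new nightly is installed:

import spacy

# Print every reader function registered in spaCy's "readers" registry.
for name in sorted(spacy.registry.readers.get_all()):
    print(name)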

Thanks, it works now, but I've run into another issue:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/prodigy/__main__.py", line 54, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 505, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/prodigy/recipes/train.py", line 235, in train
    return _train(
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/prodigy/recipes/train.py", line 166, in _train
    spacy_train(nlp, output_dir, use_gpu=gpu_id, stdout=stdout)
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/spacy/training/loop.py", line 90, in train
    clean_output_dir(output_path)
  File "/home/workstation/WebstormProjects/ml_labelization/venv/lib/python3.8/site-packages/spacy/training/loop.py", line 340, in clean_output_dir
    if path is not None and path.exists():
AttributeError: 'str' object has no attribute 'exists'

I fixed it by passing

spacy_train(nlp, Path(output_dir), use_gpu=gpu_id, stdout=stdout)

instead of

spacy_train(nlp, output_dir, use_gpu=gpu_id, stdout=stdout)

Line 166 in prodigy/recipes/train.py
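
For context, the underlying problem is simply that pathlib.Path has an .exists() method while a plain str does not, so spaCy's clean_output_dir fails when it receives a string. A minimal illustration (the directory name is made up):

from pathlib import Path

output_dir = "some_output_dir"      # passed through as a plain str
# output_dir.exists()               # AttributeError: 'str' object has no attribute 'exists'
print(Path(output_dir).exists())    # works once wrapped in a Path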

Okay, fixed in v1.11.0a3 :sweat_smile:

I just realized that it no longer works if no output directory is passed on the command line. Here is the whole _train function with the fix, which is working for me:

def _train(
    config: Config,
    *,
    output_dir: Optional[Union[str, Path]] = None,
    gpu_id: int,
    overrides: Dict[str, Any],
    silent: bool = False,
) -> Tuple[Optional[Dict[str, Any]], Optional[Dict[str, Any]]]:
    # This is a small hack, but we're basically registering a custom logger so
    # we can track the full stats on each log step
    BASELINE = None
    SCORES = None
    # Convert to a Path up front (or keep None) so spacy_train and spaCy's
    # clean_output_dir always receive a Path or None, never a plain string.
    output_path = Path(output_dir) if output_dir else None

    @spacy.registry.loggers("prodigy.ConsoleLogger.v1")
    def console_logger(progress_bar: bool = False):
        spacy_setup_printer = spacy_console_logger(progress_bar)

        def setup_printer(
            nlp: Language, stdout: IO = sys.stdout, stderr: IO = sys.stderr
        ):
            spacy_log_step, finalize = spacy_setup_printer(nlp, stdout, stderr)

            def log_step(info: Optional[Dict[str, Any]]) -> None:
                nonlocal SCORES, BASELINE
                if info:
                    if BASELINE is None:
                        BASELINE = info
                    SCORES = info
                spacy_log_step(info)

            return log_step, finalize

        return setup_printer

    msg.divider("Initializing pipeline", show=not silent)
    # TODO: Should we add a before_to_disk callback that removes this again?
    config["training"]["logger"] = {"@loggers": "prodigy.ConsoleLogger.v1"}
    with show_validation_error(None, hint_fill=False):
        config = load_config_from_str(config.to_str(), overrides=overrides)
        nlp = spacy_init_nlp(config, use_gpu=gpu_id)
    msg.good("Initialized pipeline", show=not silent)
    msg.divider("Training pipeline", show=not silent)
    stdout = sys.stdout if not silent else open(os.devnull, "w")
    try:
        spacy_train(nlp, output_path, use_gpu=gpu_id, stdout=stdout)
        return BASELINE, SCORES
    except KeyboardInterrupt:
        msg.warn("Aborted", exits=0)
    return BASELINE, SCORES

Sorry :sweat_smile:

Sorry, I should have tested this properly! I'll push another update.