PRODIGY_CONFIG_OVERRIDES settings dropout, batch_size and eval_frequency not working

I have set up PRODIGY_CONFIG_OVERRIDES as follows:

export PRODIGY_CONFIG_OVERRIDES='{"batch_size":32, "dropout":0.3, "eval_frequency":100}'

When I run training with prodigy and logging set to VERBOSE, I see:

10:39:23: INIT: Setting all logging levels to 10
10:39:23: RECIPE: Calling recipe 'train'
ℹ Using GPU: 0

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
10:39:24: CONFIG: Using config from global prodigy.json

10:39:24: CONFIG: Merging config from CLI overrides
{'batch_size': 32, 'dropout': 0.30000000000000004, 'eval_frequency': 100}

However, the program then proceeds as if none of the settings were changed. The generated config.cfg file for the model also does not contain the batch_size, dropout and eval_frequency settings from my overrides.

Is there any way to actually CONFIRM that it is using these settings? I have also tried prefixing the settings with their section names (e.g. "training.eval_frequency": 100), with no change.


Michael Wade

P.S. I am only setting the eval_frequency so I can see if it is picking up my overrides.

Hi! The problem here is that the PRODIGY_CONFIG_OVERRIDES setting is intended to overwrite the Prodigy configuration, i.e. everything that's typically in your prodigy.json.
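To illustrate the merge semantics (this is just a sketch, not Prodigy's actual implementation): the env var is parsed as JSON and layered on top of the prodigy.json settings, so only keys that belong at that level take effect. Note that prodigy.json's own batch_size controls how many annotation tasks are sent to the web app at a time, which is unrelated to spaCy's training batching:

```python
import json
import os


def merged_prodigy_config(base: dict) -> dict:
    """Sketch of the override semantics: JSON from the env var
    is merged on top of the prodigy.json settings.
    Illustrative only, not Prodigy's real code."""
    overrides = json.loads(os.environ.get("PRODIGY_CONFIG_OVERRIDES", "{}"))
    return {**base, **overrides}


os.environ["PRODIGY_CONFIG_OVERRIDES"] = '{"batch_size": 32}'
config = merged_prodigy_config({"batch_size": 10, "port": 8080})
print(config["batch_size"])  # 32 -- Prodigy's annotation batch size,
                             # not spaCy's training batching
```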

In your case, it sounds like you want to overwrite spaCy training config parameters. You can do this just like you would with spacy train, i.e. by providing them as CLI overrides. For example:

prodigy train --ner foo,bar --training.batch_size 32

I went ahead and tried it using the following command line:

prodigy train --label-stats --ner drd_interrog_ner_gold,drd_interrog_ner_silver ./prod_models_silver10_trf --gpu-id 0 --verbose --base-model en_core_web_trf --training.batch_size 32

I then get the following error:

=========================== Initializing pipeline ===========================
✘ Config validation error
training -> batch_size extra fields not permitted

{'train_corpus': 'corpora.train', 'dev_corpus': '', 'seed': 0, 'gpu_allocator': None, 'dropout': 0.1, 'accumulate_gradient': 3, 'patience': 5000, 'max_epochs': 0, 'max_steps': 20000, 'eval_frequency': 1000, 'frozen_components': ['tagger', 'parser', 'attribute_ruler', 'lemmatizer'], 'before_to_disk': {'@misc': 'prodigy.todisk_cleanup.v1'}, 'annotating_components': [], 'logger': {'@loggers': 'prodigy.ConsoleLogger.v1'}, 'batch_size': 32, 'batcher': {'@batchers': 'spacy.batch_by_padded.v1', 'discard_oversize': True, 'get_length': None, 'size': 2000, 'buffer': 256}, 'optimizer': {'@optimizers': 'Adam.v1', 'beta1': 0.9, 'beta2': 0.999, 'L2_is_weight_decay': True, 'L2': 0.01, 'grad_clip': 1.0, 'use_averages': True, 'eps': 1e-08, 'learn_rate': {'@schedules': 'warmup_linear.v1', 'warmup_steps': 250, 'total_steps': 20000, 'initial_rate': 5e-05}}, 'score_weights': {'tag_acc': None, 'dep_uas': None, 'dep_las': None, 'dep_las_per_type': None, 'sents_p': None, 'sents_r': None, 'sents_f': None, 'lemma_acc': None, 'ents_f': 0.16, 'ents_p': 0.0, 'ents_r': 0.0, 'ents_per_type': None, 'speed': 0.0}}

If I don't use --training.batch_size, it goes ahead and runs.

Just as a test, I ran your command line exactly as given and got the same "extra fields not permitted" error. For clarity: I am using Prodigy 1.11.7 on Ubuntu 21.10, with Python 3.9.7 and spaCy 3.2.3.


Ah sorry, I had just copied over your overrides without actually checking that they're valid config settings. spaCy's config doesn't actually have a batch_size setting in the [training] block:

So it's expected that spaCy complains here. You can customise the batching for nlp.pipe and nlp.evaluate via nlp.batch_size, and the batching strategy during training via training.batcher.
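As a sketch of where these settings live in the generated config.cfg (the [training.batcher] values here mirror the defaults visible in the error dump above; the nlp.batch_size value is illustrative):

```ini
[nlp]
# batch size used by nlp.pipe and nlp.evaluate
batch_size = 64

[training.batcher]
# batching strategy used during training
@batchers = "spacy.batch_by_padded.v1"
size = 2000
buffer = 256
discard_oversize = true
get_length = null
```

On the command line, the equivalent dot-notation overrides would be e.g. --nlp.batch_size 64 or --training.batcher.size 2000.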