How to set overrides?

I am trying to override the batch size when training an NER model.

I have tried using:

python -m prodigy train task_3_concepts_related_ILN_GOLD --ner ct_images_75_25_500_REVIEW --label-stats --base-model en_core_web_trf --gpu-id 0 --training.batcher.size.start=32 --training.batcher.size.stop=128
ℹ Using GPU: 0

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
ℹ Using config from base model
✔ Generated training config

=========================== Initializing pipeline ===========================
✘ Error parsing config overrides
training -> batcher -> size -> start   not a section value that can be overwritten

and:


python -m prodigy train task_3_concepts_related_ILN_GOLD --ner ct_images_75_25_500_REVIEW --label-stats --base-model en_core_web_trf --gpu-id 0 --training.batch_size 128
ℹ Using GPU: 0

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
ℹ Using config from base model
✔ Generated training config

=========================== Initializing pipeline ===========================
✘ Config validation error
training -> batch_size   extra fields not permitted

{
 'train_corpus': 'corpora.train',
 'dev_corpus': 'corpora.dev',
 'seed': 0,
 'gpu_allocator': None,
 'dropout': 0.1,
 'accumulate_gradient': 3,
 'patience': 5000,
 'max_epochs': 0,
 'max_steps': 20000,
 'eval_frequency': 1000,
 'frozen_components': ['tagger', 'parser', 'attribute_ruler', 'lemmatizer'],
 'before_to_disk': {'@misc': 'prodigy.todisk_cleanup.v1'},
 'annotating_components': [],
 'logger': {'@loggers': 'prodigy.ConsoleLogger.v1'},
 'batch_size': 128,
 'batcher': {'@batchers': 'spacy.batch_by_padded.v1', 'discard_oversize': True, 'get_length': None, 'size': 2000, 'buffer': 256},
 'optimizer': {'@optimizers': 'Adam.v1', 'beta1': 0.9, 'beta2': 0.999, 'L2_is_weight_decay': True, 'L2': 0.01, 'grad_clip': 1.0, 'use_averages': True, 'eps': 1e-08, 'learn_rate': {'@schedules': 'warmup_linear.v1', 'warmup_steps': 250, 'total_steps': 20000, 'initial_rate': 5e-05}},
 'score_weights': {'tag_acc': None, 'dep_uas': None, 'dep_las': None, 'dep_las_per_type': None, 'sents_p': None, 'sents_r': None, 'sents_f': None, 'lemma_acc': None, 'ents_f': 0.16, 'ents_p': 0.0, 'ents_r': 0.0, 'ents_per_type': None, 'speed': 0.0}
}

How can I check what the current batch size is and how can I customize it?

You can also customise the training procedure by giving Prodigy a config.cfg file for training. You can generate one from the quickstart widget in the spaCy docs. This might be a better way to explore hyperparameters, because you'll have the full list of settings at your disposal.
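For example, something along these lines should get you going. I'm writing the second command from memory, so do check prodigy train --help to confirm that your version accepts a --config argument:

python -m spacy init config config.cfg --lang en --pipeline ner --optimize accuracy --gpu
python -m prodigy train task_3_concepts_related_ILN_GOLD --ner ct_images_75_25_500_REVIEW --gpu-id 0 --config config.cfg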

I just generated a default config via that interface. It contains many settings, some of which relate to batch sizes. Here are two that I found.

[nlp]
# Default batch size to use with nlp.pipe and nlp.evaluate
batch_size = 1000

...

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001

The first setting, nlp.batch_size, doesn't change the training procedure. It sets the default batch size for calls to nlp.pipe and nlp.evaluate, which can speed up inference.

The [training.batcher.size] setting refers to a Thinc schedule (compounding.v1) that gradually increases the batch size as training progresses. I'm assuming this is the setting you're interested in changing, since it's the one that influences training.
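If you go the config.cfg route, you can edit that block directly. For example, with the 32 to 128 range you were trying (these values are just illustrative, not a recommendation):

[training.batcher.size]
@schedules = "compounding.v1"
start = 32
stop = 128
compound = 1.001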

Let me know if this helps :slightly_smiling_face:

I think what's happening here is that the config Prodigy generates from the en_core_web_trf base model doesn't have a training.batch_size setting to override, and its batcher size is a plain value (size = 2000 in your output) rather than a compounding schedule, which is why training.batcher.size.start isn't accepted either. If you use the default NER config instead, you should be able to change the --training.batcher.size.start and --training.batcher.size.stop values.
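To make that concrete, here's roughly what I'd expect, although I haven't run these against your datasets. With the default NER config, where the batcher size is a compounding schedule, the overrides from your first attempt should be accepted:

python -m prodigy train task_3_concepts_related_ILN_GOLD --ner ct_images_75_25_500_REVIEW --training.batcher.size.start=32 --training.batcher.size.stop=128

With the en_core_web_trf config, where the batcher size is a plain integer, you could try overriding that value directly. Keep in mind that spacy.batch_by_padded.v1 measures size in padded tokens (the longest sequence in the batch times the number of sequences), not in examples, so the number isn't comparable to a batch size of 128 documents:

python -m prodigy train task_3_concepts_related_ILN_GOLD --ner ct_images_75_25_500_REVIEW --base-model en_core_web_trf --gpu-id 0 --training.batcher.size=3000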