How to set overrides?

I am trying to override the batch size when training an NER model.

I have tried using:

python -m prodigy train task_3_concepts_related_ILN_GOLD --ner ct_images_75_25_500_REVIEW --label-stats --base-model en_core_web_trf --gpu-id 0 --training.batcher.size.start=32 --training.batcher.size.stop=128
ℹ Using GPU: 0

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
ℹ Using config from base model
✔ Generated training config

=========================== Initializing pipeline ===========================
✘ Error parsing config overrides
training -> batcher -> size -> start   not a section value that can be overwritten

and:


python -m prodigy train task_3_concepts_related_ILN_GOLD --ner ct_images_75_25_500_REVIEW --label-stats --base-model en_core_web_trf --gpu-id 0 --training.batch_size 128
ℹ Using GPU: 0

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
ℹ Using config from base model
✔ Generated training config

=========================== Initializing pipeline ===========================
✘ Config validation error
training -> batch_size   extra fields not permitted

{
 'train_corpus': 'corpora.train',
 'dev_corpus': 'corpora.dev',
 'seed': 0,
 'gpu_allocator': None,
 'dropout': 0.1,
 'accumulate_gradient': 3,
 'patience': 5000,
 'max_epochs': 0,
 'max_steps': 20000,
 'eval_frequency': 1000,
 'frozen_components': ['tagger', 'parser', 'attribute_ruler', 'lemmatizer'],
 'before_to_disk': {'@misc': 'prodigy.todisk_cleanup.v1'},
 'annotating_components': [],
 'logger': {'@loggers': 'prodigy.ConsoleLogger.v1'},
 'batch_size': 128,
 'batcher': {'@batchers': 'spacy.batch_by_padded.v1', 'discard_oversize': True, 'get_length': None, 'size': 2000, 'buffer': 256},
 'optimizer': {'@optimizers': 'Adam.v1', 'beta1': 0.9, 'beta2': 0.999, 'L2_is_weight_decay': True, 'L2': 0.01, 'grad_clip': 1.0, 'use_averages': True, 'eps': 1e-08, 'learn_rate': {'@schedules': 'warmup_linear.v1', 'warmup_steps': 250, 'total_steps': 20000, 'initial_rate': 5e-05}},
 'score_weights': {'tag_acc': None, 'dep_uas': None, 'dep_las': None, 'dep_las_per_type': None, 'sents_p': None, 'sents_r': None, 'sents_f': None, 'lemma_acc': None, 'ents_f': 0.16, 'ents_p': 0.0, 'ents_r': 0.0, 'ents_per_type': None, 'speed': 0.0}
}

How can I check what the current batch size is and how can I customize it?

You can also customise the training procedure by giving Prodigy a config.cfg file for training. You can generate one from the quickstart widget in the spaCy docs. This might be a better way to explore hyperparameters, because you'll have the full list of settings at your disposal.
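For example, something along these lines should get you going. I'm writing the second command from memory, so do check prodigy train --help to confirm that your version accepts a --config argument:

python -m spacy init config config.cfg --lang en --pipeline ner --optimize accuracy --gpu
python -m prodigy train task_3_concepts_related_ILN_GOLD --ner ct_images_75_25_500_REVIEW --gpu-id 0 --config config.cfg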

I just generated a default config via that interface. It contains many settings, some of which relate to batch sizes. Here are two that I found.

[nlp]
# Default batch size to use with nlp.pipe and nlp.evaluate
batch_size = 1000

...

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001

The first setting, nlp.batch_size, doesn't change the training procedure. It sets the default batch size for calls to nlp.pipe and nlp.evaluate, which can speed up inference.

The [training.batcher.size] setting refers to a Thinc schedule (compounding.v1) that gradually increases the batch size as training progresses. I'm assuming this is the setting you're interested in changing, since it's the one that influences training.
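If you go the config.cfg route, you can edit that block directly. For example, with the 32 to 128 range you were trying (these values are just illustrative, not a recommendation):

[training.batcher.size]
@schedules = "compounding.v1"
start = 32
stop = 128
compound = 1.001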

Let me know if this helps :slightly_smiling_face:

I think what's happening here is that the config Prodigy generates from the en_core_web_trf base model doesn't have a training.batch_size setting to override, and its batcher size is a plain value (size = 2000 in your output) rather than a compounding schedule, which is why training.batcher.size.start isn't accepted either. If you use the default NER config instead, you should be able to change the --training.batcher.size.start and --training.batcher.size.stop values.
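To make that concrete, here's roughly what I'd expect, although I haven't run these against your datasets. With the default NER config, where the batcher size is a compounding schedule, the overrides from your first attempt should be accepted:

python -m prodigy train task_3_concepts_related_ILN_GOLD --ner ct_images_75_25_500_REVIEW --training.batcher.size.start=32 --training.batcher.size.stop=128

With the en_core_web_trf config, where the batcher size is a plain integer, you could try overriding that value directly. Keep in mind that spacy.batch_by_padded.v1 measures size in padded tokens (the longest sequence in the batch times the number of sequences), not in examples, so the number isn't comparable to a batch size of 128 documents:

python -m prodigy train task_3_concepts_related_ILN_GOLD --ner ct_images_75_25_500_REVIEW --base-model en_core_web_trf --gpu-id 0 --training.batcher.size=3000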