Hi!
`prodigy train` has been updated significantly in v1.11 to make use of the new and more powerful `spacy train` command that was released with spaCy v3. This new training command generates a config file that contains all the settings for training. With Prodigy v1.11, you have two main options to use this training command:
- You call `prodigy data_to_spacy` to convert the Prodigy datasets into a format suitable for training with spaCy. This command will generate an output directory with all the relevant data files AND a default configuration file. You can then edit that configuration file as you see fit.
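For example, assuming your annotations live in an NER dataset called `my_ner_data` (a hypothetical name, substitute your own), the call could look something like this:

```
python -m prodigy data_to_spacy ./output --ner my_ner_data
```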
You can find some more information on how the configuration file is structured in the spaCy docs; more specifically, there is a section on the `training` part of it. There, you'll find a `batcher` setting that typically looks something like this:
```
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0
```
What this means is that by default, a compounding batch size is used: the batch size starts at `start`, gets multiplied by `compound` after each batch, and is capped at `stop`. You can edit the `start` and `stop` values, or any other setting, according to your use case.
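For example, to begin with larger batches and cap them sooner, you could edit the schedule block like this (the values here are just for illustration):

```
[training.batcher.size]
@schedules = "compounding.v1"
start = 200
stop = 500
compound = 1.001
t = 0.0
```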
Once you're done editing the config, you can call `spacy train` with it:
```
python -m spacy train output/config.cfg --paths.train output/train.spacy --paths.dev output/dev.spacy
```
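If you also want the trained pipeline saved to disk, you can add spaCy's `--output` flag (the directory name here is just an example):

```
python -m spacy train output/config.cfg --output ./trained_model --paths.train output/train.spacy --paths.dev output/dev.spacy
```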
- As a second option, you can call `prodigy train` and the training will start immediately. Behind the scenes, there will still be a config file that gets generated with default values. To overwrite those values, you can pass them on the command line directly, like so:
```
prodigy train output ... --training.batcher.size.start=35 --training.batcher.size.stop=50
```
Once you start overwriting the config like this though, I'd personally advise working with `data_to_spacy` instead and editing the config file yourself. If you want a steady batch size, you could for instance remove the whole `compounding` block and just make it something like this:
```
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null
size = 50
```
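And if you'd rather batch by number of examples instead of number of words, spaCy also ships a `spacy.batch_by_sequence.v1` batcher. A minimal sketch of that block could look like this:

```
[training.batcher]
@batchers = "spacy.batch_by_sequence.v1"
size = 50
get_length = null
```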
Hope that clarifies things!