Understanding training output for textcat_multilabel - steps vs epochs

I'm trying to understand the training output for textcat_multilabel.

As I understand it, an epoch means one iteration over all of the training data; in my case that's 24,404 documents. A step, as I understand it, is one batch, i.e. one optimizer update (since accumulate_gradient = 1). The training part of my config looks like this:

[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 20000
max_epochs = 0
max_steps = 60000
eval_frequency = 800
frozen_components = []
annotating_components = []
before_to_disk = null

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001

[training.score_weights]
cats_score = 1.0
cats_score_desc = null
cats_micro_p = null
cats_micro_r = null
cats_micro_f = null
cats_macro_p = null
cats_macro_r = null
cats_macro_f = null
cats_macro_auc = null
cats_f_per_type = null
cats_macro_auc_per_type = null

As I understand spacy.batch_by_words.v1, the batch size increases over training. Is it correctly understood that documents are grouped into batches of between 100 and 1000 total words (and that a document longer than the current batch size becomes its own batch)? So assuming all my documents had more than 1000 words, I'd effectively have a batch size of one document?
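
To check my mental model, here is a rough pure-Python sketch of how I picture word-based batching with a compounding size schedule. This is not spaCy's actual implementation (it ignores the tolerance margin, and the document word counts are made up):

```python
# Hypothetical sketch of "batch by words" with a compounding size
# schedule; not spaCy's implementation, and tolerance is ignored.

def compounding(start, stop, compound):
    """Yield a word-count target that grows by `compound` per batch."""
    value = start
    while True:
        yield min(value, stop)
        value *= compound

def batch_by_words(doc_lengths, size_schedule, discard_oversize=False):
    """Group docs (given as word counts) into batches of ~target words."""
    target = next(size_schedule)
    batch, batch_words = [], 0
    for n_words in doc_lengths:
        if n_words > target:
            # Oversize doc: dropped, or emitted as a batch of its own.
            if not discard_oversize:
                yield [n_words]
            continue
        if batch_words + n_words > target:
            yield batch
            batch, batch_words = [], 0
            target = next(size_schedule)
        batch.append(n_words)
        batch_words += n_words
    if batch:
        yield batch

# Made-up word counts: with mostly long docs, most batches are one doc.
docs = [1200, 300, 450, 80, 2000, 150, 900]
for batch in batch_by_words(docs, compounding(100, 1000, 1.001)):
    print(batch)
```

Running this, nearly every long document ends up in a batch by itself, which is what I suspect is happening in my run.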

Now I started wondering about this because I saw that I reached the next epoch at step 16800, which leaves me with an average batch size of 24404 / 16800 ≈ 1.45 documents. Is that right? In general my documents are pretty big, but performance is good, so I don't need to chop them into smaller docs and average over those. Still, maybe I could benefit from fiddling with the batching strategy. Any comments on that?
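
For what it's worth, here is a quick back-of-the-envelope check on the numbers above (my own arithmetic, assuming one batch per optimizer step since accumulate_gradient = 1):

```python
import math

# With compound = 1.001, the word target grows from 100 to 1000 over
# roughly log(1000/100) / log(1.001) batches, then stays capped at 1000.
batches_until_cap = math.log(1000 / 100) / math.log(1.001)
print(round(batches_until_cap))  # ~2304 batches

# One epoch of 24404 docs took ~16800 steps, so on average:
print(f"{24404 / 16800:.2f} docs per batch")  # 1.45
```

So for most of the epoch the target is already at the 1000-word cap, and ~1.45 docs per batch would be consistent with documents that are mostly near or above 1000 words.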

Just a quick note: this question might be a better fit for the spaCy discussions forum (Discussions · explosion/spaCy · GitHub).

For future reference: here is the discussion thread.