Understanding training output for textcat_multilabel - steps vs epochs

I'm trying to understand the training output for textcat_multilabel.

As I understand it, an epoch means one iteration over all of the training data; in my case that's 24,404 documents. A step, as I understand it, is one batch, i.e. one optimizer update (since accumulate_gradient = 1). The training part of my config looks like this:

[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 20000
max_epochs = 0
max_steps = 60000
eval_frequency = 800
frozen_components = []
annotating_components = []
before_to_disk = null

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001

[training.score_weights]
cats_score = 1.0
cats_score_desc = null
cats_micro_p = null
cats_micro_r = null
cats_micro_f = null
cats_macro_p = null
cats_macro_r = null
cats_macro_f = null
cats_macro_auc = null
cats_f_per_type = null
cats_macro_auc_per_type = null

As I understand spacy.batch_by_words.v1, the batch size increases over training. Is it correctly understood that documents are grouped into batches of between 100 and 1000 total words (and that a document longer than the current batch size becomes its own batch)? So assuming all my documents had more than 1000 words, I'd effectively have a batch size of one document?
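
To check my mental model, here is a rough pure-Python sketch of how I picture word-based batching with a compounding size schedule. This is not spaCy's actual implementation (it ignores the tolerance margin, and the document word counts are made up):

```python
# Hypothetical sketch of "batch by words" with a compounding size
# schedule; not spaCy's implementation, and tolerance is ignored.

def compounding(start, stop, compound):
    """Yield a word-count target that grows by `compound` per batch."""
    value = start
    while True:
        yield min(value, stop)
        value *= compound

def batch_by_words(doc_lengths, size_schedule, discard_oversize=False):
    """Group docs (given as word counts) into batches of ~target words."""
    target = next(size_schedule)
    batch, batch_words = [], 0
    for n_words in doc_lengths:
        if n_words > target:
            # Oversize doc: dropped, or emitted as a batch of its own.
            if not discard_oversize:
                yield [n_words]
            continue
        if batch_words + n_words > target:
            yield batch
            batch, batch_words = [], 0
            target = next(size_schedule)
        batch.append(n_words)
        batch_words += n_words
    if batch:
        yield batch

# Made-up word counts: with mostly long docs, most batches are one doc.
docs = [1200, 300, 450, 80, 2000, 150, 900]
for batch in batch_by_words(docs, compounding(100, 1000, 1.001)):
    print(batch)
```

Running this, nearly every long document ends up in a batch by itself, which is what I suspect is happening in my run.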

Now I started wondering about this because I saw that I reached the next epoch at step 16800, which leaves me with an average batch size of 24404 / 16800 ≈ 1.45 documents. Is that right? In general my documents are pretty big, but performance is good, so I don't need to chop them into smaller docs and average over those. Still, maybe I could benefit from fiddling with the batching strategy. Any comments on that?
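
For what it's worth, here is a quick back-of-the-envelope check on the numbers above (my own arithmetic, assuming one batch per optimizer step since accumulate_gradient = 1):

```python
import math

# With compound = 1.001, the word target grows from 100 to 1000 over
# roughly log(1000/100) / log(1.001) batches, then stays capped at 1000.
batches_until_cap = math.log(1000 / 100) / math.log(1.001)
print(round(batches_until_cap))  # ~2304 batches

# One epoch of 24404 docs took ~16800 steps, so on average:
print(f"{24404 / 16800:.2f} docs per batch")  # 1.45
```

So for most of the epoch the target is already at the 1000-word cap, and ~1.45 docs per batch would be consistent with documents that are mostly near or above 1000 words.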

Just a quick note: this question might be a better fit for the spaCy discussions forum (Discussions · explosion/spaCy · GitHub).

For future reference: here is the discussion thread.