Very strange epoch duration

Hello, I am training my model with this code:

import random
from pathlib import Path

from spacy.util import minibatch, compounding

# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]

with nlp.disable_pipes(*other_pipes):  # only train NER
    # reset and initialize the weights randomly – but only if we're
    # training a new model
    # nlp.begin_training()

    for itn in range(N_ITER):
        random.shuffle(TRAIN_DATA)
        losses = {}
        # batch up the examples using spaCy's minibatch
        batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))

        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(
                texts,  # batch of texts
                annotations,  # batch of annotations
                drop=0.2,  # dropout - make it harder to memorise data
                losses=losses,
            )

        print("Losses:", losses, itn)

        # save the model after each iteration
        output_dir = Path(OUTPUT + str(itn))
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

Could someone explain why I am getting these strange durations?

drwxr-xr-x 4 root root 4.0K Jun 28 12:24 0
drwxr-xr-x 4 root root 4.0K Jun 28 13:31 1
drwxr-xr-x 4 root root 4.0K Jun 28 13:35 2

From 1 to 2, only 4 minutes?

Hi @damiano,

It’s hard to say much from your description. It seems like you’re highlighting the file times in a directory listing? They’re getting closer together as your training progresses, and you wonder why? Is this right?

Assuming that’s the case, I’d guess it has to do with your minibatch sizes. You’re using the spaCy util compounding, which starts out with a very small batch size (4) and grows it during training until it reaches the ceiling (32). This would probably explain why the training process starts out taking a long time per epoch and then gets faster as it progresses.
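
One way to check is to time each epoch directly rather than relying on the directory timestamps. Here is a minimal sketch of your own epoch loop with timing added (same variable names as in your snippet, meant to sit inside your disable_pipes block):

import time

for itn in range(N_ITER):
    epoch_start = time.perf_counter()

    random.shuffle(TRAIN_DATA)
    losses = {}
    for batch in minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001)):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, drop=0.2, losses=losses)

    # print how long this epoch took alongside the losses
    print("Epoch", itn, "took", round(time.perf_counter() - epoch_start, 1), "s, losses:", losses)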

You can verify the batch-size growth itself by running a small snippet:

from spacy.util import compounding

sizes = compounding(4.0, 32.0, 1.001)
for i in range(100):
    print(next(sizes))

Even after 100 batches, the batch size has only grown to:

4.4072278811487084
4.411635109029857
4.416046744138886
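
And since each step just multiplies the current size by the compound factor (capping at the stop value), you can work out roughly how many batches it takes before the generator actually reaches 32. A quick sketch, assuming that behaviour:

from math import ceil, log

from spacy.util import compounding

# closed form: 4.0 * 1.001 ** n >= 32.0  =>  n >= log(32 / 4) / log(1.001)
print(ceil(log(32.0 / 4.0) / log(1.001)))  # roughly 2,080 batches

# or count it straight from the generator
sizes = compounding(4.0, 32.0, 1.001)
steps = 0
while next(sizes) < 32.0:
    steps += 1
print(steps)

So with a compound rate of 1.001 it takes on the order of 2,000 batches before you’re training with full batches of 32, which fits with the first epochs being much slower than the later ones.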