Very strange epoch duration

Hello, I am training my model with this code:

import random
from pathlib import Path

from spacy.util import minibatch, compounding

# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]

with nlp.disable_pipes(*other_pipes):  # only train NER
    # reset and initialize the weights randomly – but only if we're
    # training a new model
    # nlp.begin_training()

    for itn in range(N_ITER):
        random.shuffle(TRAIN_DATA)
        losses = {}
        # batch up the examples using spaCy's minibatch
        batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))

        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(
                texts,  # batch of texts
                annotations,  # batch of annotations
                drop=0.2,  # dropout - make it harder to memorise data
                losses=losses,
            )

        print("Losses:", losses, itn)

        # save the model after each iteration
        output_dir = Path(OUTPUT + str(itn))
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

Could someone explain why I am getting these strange durations?

drwxr-xr-x 4 root root 4.0K Jun 28 12:24 0
drwxr-xr-x 4 root root 4.0K Jun 28 13:31 1
drwxr-xr-x 4 root root 4.0K Jun 28 13:35 2

From 1 to 2, only 4 minutes?

Hi @damiano,

It’s hard to say much from your description. It seems like you’re highlighting the file times in a directory listing? They’re getting closer together as your training progresses, and you wonder why? Is this right?

Assuming that’s the case, I’d guess it has to do with your minibatch sizes. You’re using the spaCy util compounding, which starts out with a very small batch size (4) and grows it during training until it reaches the ceiling (32). This would probably explain why the training process starts out taking a long time per epoch and then gets faster as it progresses.
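
One way to check is to time each epoch directly rather than relying on the directory timestamps. Here is a minimal sketch of your own epoch loop with timing added (same variable names as in your snippet, meant to sit inside your disable_pipes block):

import time

for itn in range(N_ITER):
    epoch_start = time.perf_counter()

    random.shuffle(TRAIN_DATA)
    losses = {}
    for batch in minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001)):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, drop=0.2, losses=losses)

    # print how long this epoch took alongside the losses
    print("Epoch", itn, "took", round(time.perf_counter() - epoch_start, 1), "s, losses:", losses)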

You can verify the batch-size growth itself by running a small snippet:

from spacy.util import compounding

sizes = compounding(4.0, 32.0, 1.001)
for i in range(100):
    print(next(sizes))

Even after 100 batches, the batch size has only grown to:

4.4072278811487084
4.411635109029857
4.416046744138886
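
And since each step just multiplies the current size by the compound factor (capping at the stop value), you can work out roughly how many batches it takes before the generator actually reaches 32. A quick sketch, assuming that behaviour:

from math import ceil, log

from spacy.util import compounding

# closed form: 4.0 * 1.001 ** n >= 32.0  =>  n >= log(32 / 4) / log(1.001)
print(ceil(log(32.0 / 4.0) / log(1.001)))  # roughly 2,080 batches

# or count it straight from the generator
sizes = compounding(4.0, 32.0, 1.001)
steps = 0
while next(sizes) < 32.0:
    steps += 1
print(steps)

So with a compound rate of 1.001 it takes on the order of 2,000 batches before you’re training with full batches of 32, which fits with the first epochs being much slower than the later ones.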