While using active learning with a custom model in Prodigy, I was wondering if there is a way to keep yielding data to the UI without waiting for model training to complete.
I am using the two base callbacks: “predict”, which yields the data to the UI, and “update”, which trains the model in batches after every n sentences and makes new predictions every time the model is updated.
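To make the setup concrete, the recipe is structured roughly like this (a simplified sketch – `MyModel` and its methods are placeholders for my actual custom model, and the view_id / batch size are just examples):

```python
import prodigy

@prodigy.recipe("custom-al")
def custom_al(dataset, source):
    model = MyModel(source)  # placeholder for my custom model wrapper

    def predict():
        # stream generator: keep yielding the current low-confidence
        # sentences to the UI
        while True:
            for eg in model.get_uncertain_examples():
                yield eg

    def update(answers):
        # called with each batch of answered tasks: retrain the model,
        # then re-score the remaining sentences
        model.train(answers)
        model.rescore()

    return {
        "dataset": dataset,
        "stream": predict(),
        "update": update,
        "view_id": "ner_manual",
        "config": {"batch_size": 10},
    }
```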
So every time the update function is training the model, we get ‘Loading…’ on the screen, and since my model’s training time can be very long, this leads to long waiting times.
So, is there a workaround to yield the data without waiting for the model training to complete?
Hi! Are you sure that what takes so long is the updating and not the predicting? The update callback shouldn’t necessarily be blocking; it’s just something that runs in the background, usually while you’re annotating the examples in the current queue.
However, there are several things that could make fetching a new batch of tasks take a long time: long texts in NER mode (because generating all possible analyses for the text takes a while), a large model with word vectors, rare labels that are almost never predicted, or lots of duplicates in the data (because most of the time will be spent skipping examples).
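For context: if your stream wraps the model’s scores in one of the built-in sorters like `prefer_uncertain`, the sorter may have to consume and discard a lot of scored examples before it finds enough it wants to send out, which is where the time spent “skipping” comes from. For example:

```python
from prodigy.components.sorters import prefer_uncertain

# model(stream) is assumed to yield (score, example) tuples; the sorter
# only emits examples it considers worth annotating and skips the rest,
# so lots of duplicates or rarely predicted labels mean lots of skipping
# before a full batch of tasks can be sent out
stream = prefer_uncertain(model(stream))
```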
Which recipe are you using and which model are you starting with? And what’s the configured batch size?
No, I am not sure; that was just a guess, because I am predicting right after training the model: both training and prediction run sequentially in one process spawned inside the “update” function, and the “predict” function just yields the sentences from the updated dictionary.
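Roughly, the update callback does the following (simplified; `unlabelled_pool`, `uncertain_sentences` and `pick_low_confidence` are placeholders for my actual code):

```python
def update(answers):
    # retrain on the newly answered sentences, then re-score the
    # remaining pool; both steps run sequentially in this one process
    model.train(answers)
    scored = model.predict(unlabelled_pool)
    # refresh the dictionary that the "predict" generator yields from
    uncertain_sentences.update(pick_low_confidence(scored))
```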
No, skipping lots of sentences shouldn’t be the reason, as the dictionary that the sentences are yielded from only ever contains low-confidence data.
I am using a custom recipe that yields out sentences in batches of size 10, and I am starting with a blank model.
So, do you mean that predict and update are two parallel processes? As far as I observed, both the update and predict functions have the same PID.
Ah, sorry, I just realised that the way I described this was bad – just edited my original post. What I meant to say was: what happens asynchronously are the actions on the front-end (sending answers, requesting a new batch) and the actions on the back-end. The updating usually happens while you're annotating the current queue in the app, so it'd really have to take very long for it to have an impact like that.
This is definitely pretty strange if you're working with batches of only 10. How are you updating the model in your update callback? Are you updating it with all of the examples or only the current batch?
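For comparison, the pattern we'd usually expect is for the callback to only update on the small batch of answers it receives, something like this (just a sketch – `model.update` is a stand-in for whatever your training code does):

```python
def update(answers):
    # Prodigy calls this with just the latest batch of answered tasks
    # (10 in your case), so updating on only these examples keeps each
    # call cheap instead of retraining on everything collected so far
    batch = [eg for eg in answers if eg["answer"] == "accept"]
    loss = model.update(batch)  # stand-in for your actual training code
    return loss
```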