If there is a failure during update(), new examples aren't committed to the database

grantus · March 9, 2021, 11:30pm

As a general design thing - when new data is sent to update(...) naturally we would like to trigger a retrain of the model.

The issue is that if something happens (say a CUDA OOM, which isn't uncommon), the update never returns - and we lose those new examples forever.

I've noticed that there is a before_db() function call. Is update() called after the database transaction has completed?

If not, could we have an after_db() function added?

ines · March 10, 2021, 2:15am

Hi! If updating or saving to the database fails for whatever reason, the outgoing batch of answers will stay on the client. So if you haven't closed the browser, you should always be able to restart the server and re-submit the annotations by hitting "save" in the UI.

The update function is called before the answers are placed in the database. The before_db callback lets you modify the examples you store (which doesn't affect the examples you're updating from). This is mostly relevant if the JSON data you're sending around includes things like base64-encoded image data that you need in the UI and maybe for updating a model, but don't want to store in the database.
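To illustrate the base64 case, here is a minimal sketch of what a before_db callback could look like. The "image" and "path" keys are assumptions about the task format, not something stated in this thread, so adjust them to whatever your examples actually contain.

```python
def before_db(examples):
    # Receives the batch of answers that is about to be stored and
    # returns the examples that should actually be written to the database.
    for eg in examples:
        image = eg.get("image", "")
        # Assumption: inline image data is stored as a "data:" URI under
        # "image", and the original file location (if any) under "path".
        if isinstance(image, str) and image.startswith("data:"):
            # Swap the large base64 payload for the lightweight reference.
            eg["image"] = eg.get("path")
    return examples
```

In a custom recipe, a function like this would typically be returned alongside the other components under the "before_db" key.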

grantus · March 10, 2021, 4:17am

Thanks Ines,

I know that currently we don't have the option of an after_db callback. But, as a feature request that would be really handy in a future version. In this way we can handle retraining over the full data set once we know 100% it's in the database.

Our current "solution" is to fork out a sub-process so that update returns almost instantly - but there is an obvious race condition here (e.g., when we go to re-run our full training loop there's a non-zero chance the database hasn't yet finished inserting all the updated annotations).

ines · March 10, 2021, 10:16am

So if I understand your requirement correctly, you basically want to perform some action after the examples have been saved to the database, right? Instead of your subprocess approach, a potentially much simpler solution could be to just take care of saving to the database yourself in the update callback and set "dataset": False in the recipe. Under the hood, Prodigy mostly just calls into add_examples, so you could also do this:

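def update(answers):
    # db and dataset come from the enclosing recipe function.
    # Save first, so the answers are stored before any custom logic runs.
    db.add_examples(answers, datasets=[dataset])
    your_logic_here(answers)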

Would this work for you?
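For anyone landing here later, the suggestion above can be sketched a bit more fully so that a training failure (such as a CUDA OOM) doesn't take the saved answers down with it. This is only a sketch under stated assumptions: InMemoryDB is a hypothetical stand-in for the real database object so the snippet runs on its own, and make_update and retrain are illustrative names, not part of Prodigy.

```python
import logging

logger = logging.getLogger("recipe")


class InMemoryDB:
    """Hypothetical stand-in for the database object used in the thread."""

    def __init__(self):
        self.datasets = {}

    def add_examples(self, examples, datasets):
        # Mirrors the call shape from the thread: add_examples(answers, datasets=[...])
        for name in datasets:
            self.datasets.setdefault(name, []).extend(examples)


def make_update(db, dataset, retrain):
    """Build an update callback that saves first and retrains afterwards."""

    def update(answers):
        # 1. Persist the batch before doing anything that might fail.
        db.add_examples(answers, datasets=[dataset])
        # 2. Retrain. If this raises, the answers are already stored.
        try:
            retrain(answers)
        except Exception:
            logger.exception(
                "Retraining failed; %d answers are already saved", len(answers)
            )

    return update
```

With "dataset": False set in the recipe, as described above, the update callback becomes the only place where answers are saved, so they aren't stored twice.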

grantus · March 16, 2021, 12:18am

Thanks, that should do the job!
Sorry I didn't see this update.
