If there is a failure during update(), new examples aren't committed to the database

As a general design point: when new data is sent to update(...), we naturally want to trigger a retrain of the model.

The issue is that if something goes wrong (say a CUDA OOM, which isn't uncommon), update() never returns, and we lose those new examples forever.

I've noticed that there is a before_db() callback. Is update() called after the database transaction has completed?

If not, could we have an after_db() callback added?

Hi! If updating or saving to the database fails for whatever reason, the outgoing batch of answers will stay on the client. So if you haven't closed the browser, you should always be able to restart the server and re-submit the annotations by hitting "save" in the UI.

The update function is called before the answers are placed in the database. The before_db callback lets you modify the examples you store (which doesn't affect the examples you're updating from). This is mostly relevant if the JSON data you're sending around includes things like base64-encoded image data that you need in the UI and maybe for updating a model, but don't want to store in the database.
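As a rough illustration of that use case, here is a minimal sketch of a before_db callback that drops inline image data before storage. It assumes the callback receives a list of example dicts and returns the list to store. The "image" and "path" keys and the "data:" prefix check are assumptions made for the example and may not match your data.

```python
import copy


def before_db(examples):
    """Return copies of the examples with inline base64 image data removed."""
    # Work on a deep copy so the examples passed in are left untouched.
    stored = copy.deepcopy(examples)
    for eg in stored:
        image = eg.get("image")
        if isinstance(image, str) and image.startswith("data:"):
            # Assumption: each example also carries a "path" field that can
            # replace the inline data. If it is missing, None is stored.
            eg["image"] = eg.get("path")
    return stored
```

The copy is a precaution: it keeps the stored version and the version used elsewhere clearly separate, at the cost of some extra memory for large batches.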

Thanks Ines,

I know that there's currently no option for an after_db callback, but as a feature request, it would be really handy in a future version. That way we could retrain over the full dataset once we know for certain it's in the database.

Our current "solution" is to fork a subprocess so that update returns almost instantly, but there is an obvious race condition here (e.g., when we go to re-run our full training loop, there's a non-zero chance the database hasn't yet finished inserting all the updated annotations).

So if I understand your requirement correctly, you basically want to perform some action after the examples have been saved to the database, right? Instead of your subprocess approach, a potentially much simpler solution could be to just take care of saving to the database yourself in the update callback and set "dataset": False in the recipe. Under the hood, Prodigy mostly just calls into add_examples, so you could also do this:

def update(answers):
    db.add_examples(answers, datasets=[dataset])
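To make the ordering explicit, here is a slightly fuller sketch of the same idea: save first, then retrain, and catch training failures so a crash can't cost you the annotations. The InMemoryDB class, the make_update helper and the retrain argument are hypothetical names used only so the example runs on its own. In a real recipe you would pass in the actual database object and your own training function.

```python
class InMemoryDB:
    """Hypothetical stand-in that mimics the add_examples call shown above."""

    def __init__(self):
        self.rows = {}

    def add_examples(self, examples, datasets):
        for name in datasets:
            self.rows.setdefault(name, []).extend(examples)


def make_update(db, dataset, retrain):
    """Build an update callback that stores answers before retraining."""

    def update(answers):
        # 1. Persist the answers first, so they survive a training failure.
        db.add_examples(answers, datasets=[dataset])
        # 2. Only then retrain. Errors are caught so update() still returns.
        try:
            retrain(answers)
        except Exception as err:  # e.g. a RuntimeError on a CUDA OOM
            print(f"Retraining failed, but {len(answers)} answers are saved: {err}")

    return update
```

Because the save happens synchronously inside update, there is no race between inserting the annotations and kicking off the training loop.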

Would this work for you?

Thanks, that should do the job!
Sorry, I didn't see this update earlier.