Automatically run train command

vaibhav · May 18, 2021, 10:43am

Hi

I don't know whether what I am trying to do is possible within Prodigy, so please let me know.

Here is what I am asking. Suppose I am working with the ner.manual command in Prodigy and have annotated, say, 500 paragraphs. I would like Prodigy to automatically run the train command on this annotated data, evaluate the model, and store it along with its statistics at a location on my local PC. Then, once I have annotated another 500 paragraphs (1000 in total), I want the process to repeat: run the train command starting from the already trained model, evaluate the model again, and store it with its statistics under a different name. The same process should repeat after every further 500 annotations.

Is this really possible, or am I being too ambitious?

ines · May 19, 2021, 4:17am

Hi! Just to make sure I understand the idea correctly, you want this to kinda happen automatically in the background, right?

It should definitely be possible to set this up, it just comes down to finding a clever solution that works and is efficient. The main thing to consider is that you probably want to run the training in a separate process, or even on a separate machine. In theory, you could make your recipe's update method check the count of the current dataset and trigger the training, even in a subprocess. But this can easily slow down your machine or get messy if something fails because it's all in a subprocess.
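To make the first idea a bit more concrete, here is a rough sketch of an update callback that counts incoming answers and launches training in a subprocess each time the total crosses a multiple of 500. The names MilestoneTrigger, make_update and launch_training are made up for illustration, and train_and_export.py (with its --output flag) is a hypothetical wrapper script that would call prodigy train with whatever arguments your Prodigy version expects. Treat it as a starting point under those assumptions, not a tested recipe.

```python
import subprocess
import sys
from typing import Callable, List, Optional


class MilestoneTrigger:
    """Call `on_milestone` each time the count crosses a multiple of `step`."""

    def __init__(self, step: int, on_milestone: Callable[[int], None],
                 start_count: int = 0) -> None:
        self.step = step
        self.on_milestone = on_milestone
        # Highest multiple of `step` that is already covered.
        self.last_milestone = (start_count // step) * step

    def check(self, total_count: int) -> Optional[int]:
        milestone = (total_count // self.step) * self.step
        if milestone > self.last_milestone:
            self.last_milestone = milestone
            self.on_milestone(milestone)
            return milestone
        return None


def launch_training(milestone: int) -> None:
    # Hypothetical wrapper script and flag: replace with your own training
    # and export command. Popen returns immediately, so annotation continues.
    subprocess.Popen([sys.executable, "train_and_export.py",
                      "--output", f"models/ner_{milestone}"])


def make_update(trigger: MilestoneTrigger,
                start_count: int = 0) -> Callable[[List[dict]], None]:
    """Build a function to return as the "update" component of a recipe."""
    total = start_count

    def update(answers: List[dict]) -> None:
        nonlocal total
        total += len(answers)  # running count of answers received so far
        trigger.check(total)

    return update
```

In a custom recipe you would return make_update(MilestoneTrigger(500, launch_training)) under the "update" key. Naming the output directory after the milestone keeps each model and its statistics separate.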

A potentially better option would be to have an external service that connects to your DB and keeps checking the dataset count (db.count_dataset) every X seconds and starts your training and data export if you have 500 examples, and so on. If you want to run it on a different machine, e.g. a server, you could use a shared remote database like MySQL or PostgreSQL, and connect to it from the annotation machine and the training machine.
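As a rough sketch of that external service: the loop below polls a count function every few seconds and calls a training function whenever a new multiple of 500 is reached. watch_dataset, run_training and main are illustrative names, "my_ner_dataset" is a placeholder, and train_and_export.py (with its --output flag) is a hypothetical script standing in for your actual prodigy train and export commands. The count function is passed in, so the loop itself does not depend on Prodigy.

```python
import subprocess
import sys
import time
from typing import Callable, Optional


def watch_dataset(count_fn: Callable[[], int],
                  on_milestone: Callable[[int], None],
                  step: int = 500,
                  interval: float = 60.0,
                  last_milestone: int = 0,
                  max_polls: Optional[int] = None,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll `count_fn` and fire `on_milestone` at each new multiple of `step`."""
    polls = 0
    while max_polls is None or polls < max_polls:
        milestone = (count_fn() // step) * step
        if milestone > last_milestone:
            on_milestone(milestone)
            last_milestone = milestone
        polls += 1
        sleep(interval)
    return last_milestone


def run_training(milestone: int) -> None:
    # Hypothetical script and flag: swap in your own training/export command.
    # This blocks until training finishes, so runs never overlap.
    subprocess.run([sys.executable, "train_and_export.py",
                    "--output", f"models/ner_{milestone}"], check=True)


def main() -> None:
    # Requires Prodigy; connect() reads the database settings from prodigy.json.
    from prodigy.components.db import connect

    db = connect()
    watch_dataset(lambda: db.count_dataset("my_ner_dataset"), run_training)
```

Calling main() on the training machine would start the watcher. If the service is restarted, passing the last trained count as last_milestone avoids retraining on milestones that were already handled.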

vaibhav · May 20, 2021, 11:42am

Thanks Ines! I will try this out.
