Automatically run train command

Hi

I don't know whether what I am trying to do is possible within Prodigy, so please let me know.

Suppose I am working with the ner.manual command and have annotated 500 paragraphs. I would like Prodigy to automatically run the train command on this annotated data, evaluate the model, and store it along with the statistics in a local folder on my PC. Then, once I have annotated another 500 paragraphs (1,000 in total), I want the process to repeat: run the train command starting from the already trained model, evaluate it again, and store the model and its statistics under a different name. The same should happen again after every further 500 annotations.

Is this really possible, or am I being too ambitious?

Hi! Just to make sure I understand the idea correctly, you want this to kinda happen automatically in the background, right?

It should definitely be possible to set this up, it just comes down to finding a clever solution that works and is efficient. The main thing to consider is that you probably want to run the training in a separate process, or even on a separate machine. In theory, you could make your recipe's update method check the count of the current dataset and trigger the training, even in a subprocess. But this can easily slow down your machine or get messy if something fails because it's all in a subprocess.
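To make the in-recipe idea a bit more concrete, here is a minimal sketch of an update callback that counts incoming answers and fires a launcher every 500. It assumes the callback receives a list of answers per batch; `make_update`, `launch_training` and the script name `train_and_save.py` are hypothetical names made up for illustration, and the counter only sees answers from the current session rather than the full dataset.

```python
import subprocess
import sys


def make_update(batch_size, launch):
    """Build an update callback that calls `launch(total)` each time the
    number of answers seen in this session crosses a multiple of batch_size."""
    state = {"seen": 0, "next_at": batch_size}

    def update(answers):
        state["seen"] += len(answers)
        # A large batch can cross more than one threshold at once.
        while state["seen"] >= state["next_at"]:
            launch(state["next_at"])
            state["next_at"] += batch_size

    return update


def launch_training(total):
    """Start training in a separate process without blocking annotation.
    'train_and_save.py' is a hypothetical script you would write yourself."""
    return subprocess.Popen([sys.executable, "train_and_save.py", str(total)])
```

In a custom recipe, you would return something like `"update": make_update(500, launch_training)` among the recipe components. Nothing here watches the subprocess, so a failed training run would go unnoticed, which is part of why this approach can get messy.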

A potentially better option would be to have an external service that connects to your DB and keeps checking the dataset count (db.count_dataset) every X seconds and starts your training and data export if you have 500 examples, and so on. If you want to run it on a different machine, e.g. a server, you could use a shared remote database like MySQL or PostgreSQL, and connect to it from the annotation machine and the training machine.
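The external watcher could look roughly like the sketch below. It is only an outline under some assumptions: `watch_dataset`, `get_count` and `on_milestone` are hypothetical names, the count and training functions are passed in so the loop itself does not depend on Prodigy, and what the training step actually runs (the train command, where the model and statistics are saved) is left to you.

```python
import time


def watch_dataset(get_count, on_milestone, step=500, interval=60.0,
                  max_polls=None, sleep=time.sleep):
    """Poll get_count() every `interval` seconds and call on_milestone(count)
    whenever the count has passed the next multiple of `step`.
    Returns the next threshold once max_polls is reached (None = run forever)."""
    next_at = step
    polls = 0
    while max_polls is None or polls < max_polls:
        count = get_count()
        if count >= next_at:
            on_milestone(count)
            # Skip ahead so a big jump triggers only one training run.
            next_at = (count // step + 1) * step
        polls += 1
        sleep(interval)
    return next_at


# Possible wiring (assumes Prodigy is installed and a dataset named
# "my_ner_data" exists; run_training is a function you would write):
#
#   from prodigy.components.db import connect
#   db = connect()
#   watch_dataset(lambda: db.count_dataset("my_ner_data"), run_training)
```

Because the watcher only needs a database connection, it can run on the training machine while annotation continues elsewhere, as long as both point at the same shared database.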

Thanks Ines! I will try this out.