on_exit function to save the model's state

Can you please tell me the syntax of the on_exit function, so that I can save my model's state for the next session?

You can find the detailed API documentation of the individual recipe components in the PRODIGY_README.html, which is available for download with Prodigy.

The on_exit function takes the controller as its argument (which also gives you access to the database, session ID and other settings). Aside from that, you can structure it however you like. In custom recipes, working with closures is often very nice – for example:

import prodigy


@prodigy.recipe('custom-recipe')
def custom_recipe(dataset):  # etc.
    model = LOAD_YOUR_MODEL_HERE()
    # etc.

    def on_exit(controller):
        # called when you stop the Prodigy server; the closure gives
        # you access to the model loaded above
        SAVE_YOUR_MODEL_HERE()

    return {
        'dataset': dataset,
        'on_exit': on_exit,
        # etc.
    }
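To run a custom recipe like this from the command line, you point Prodigy to the file containing it via the -F flag. The recipe name, dataset name and file name below are just placeholders:

prodigy custom-recipe my_dataset -F recipe.py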

Alternatively, you could also write a little reusable function like this:

def save_model(nlp, output_path):
    def on_exit(controller):
        nlp.to_disk(output_path)
    return on_exit
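In your recipe, you'd then pass the result of save_model as the 'on_exit' component, for example like this (nlp is whatever model you loaded, and the output path is just a placeholder):

return {
    'dataset': dataset,
    'on_exit': save_model(nlp, '/path/to/output'),
    # etc.
}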

One thing worth noting (also in case others come across this issue later): The model you’re training during annotation is updated online and won’t be as good as a model trained with one of the batch-train recipes (which make several passes over the data, shuffle the data, set a dropout rate etc.). So saving the model in a recipe is totally fine if you want to keep reusing it in the next annotation session – but if you actually want to use the model, you should always batch train it on the collected annotations.


So, first I must train my model using ner.teach and save those annotations. How should I then use those annotations for batch training?

Should I use the model saved by ner.teach in ner.batch-train?

No, the model you train during ner.teach is mainly used to help Prodigy suggest better examples for annotation. The annotations will then be saved in a dataset and the intermediate model will be discarded. You can then use the ner.batch-train recipe to train a new model from the annotations in your dataset.

For example, let’s say you want to improve the PERSON entity of spaCy’s default English model and you’ve used ner.teach to collect a few hundred examples:

prodigy ner.teach person_dataset en_core_web_sm your_data.jsonl --label PERSON

Your annotations will then be saved to the dataset person_dataset. When you use ner.batch-train, you then specify the same dataset and model. This will batch train a model from the annotations in the set. The model will be similar to the one that helped you collect annotations during ner.teach – just better.

prodigy ner.batch-train person_dataset en_core_web_sm --output /tmp/model --label PERSON

The --output argument specifies the output directory – so in this case, the trained model will be saved to /tmp/model.
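Once it's trained, that directory can be loaded like any other spaCy model. For example (the example text is obviously just a placeholder):

import spacy

nlp = spacy.load('/tmp/model')
doc = nlp("Ada Lovelace worked with Charles Babbage.")
print([(ent.text, ent.label_) for ent in doc.ents])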

Here are some more details and examples: