Can you please tell me the syntax of the on_exit function so that I can save my model’s state for the next session?
You can find the detailed API documentation of the individual recipe components in the PRODIGY_README.html, which is available for download with Prodigy.
The on_exit function takes the controller as its argument (which also gives you access to the database, session ID and other settings). Aside from that, you can structure it however you like. In custom recipes, working with closures is often very nice – for example:
import prodigy

@prodigy.recipe('custom-recipe')
def custom_recipe(dataset):  # etc.
    model = LOAD_YOUR_MODEL_HERE()
    # etc.

    def on_exit(controller):
        SAVE_YOUR_MODEL_HERE()

    return {
        'dataset': dataset,
        'on_exit': on_exit,
        # etc.
    }
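To run a custom recipe like the one above, you can point Prodigy to the Python file containing it via the -F argument (the dataset name and file name here are just placeholders):
prodigy custom-recipe my_dataset -F recipe.py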
Alternatively, you could also write a little reusable function like this:
def save_model(nlp, output_path):
    def on_exit(controller):
        nlp.to_disk(output_path)
    return on_exit
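You can then plug that into the components returned by your recipe. Here's a minimal sketch (the output path and the LOAD_YOUR_MODEL_HERE placeholder are just stand-ins for whatever your recipe actually uses):

@prodigy.recipe('custom-recipe')
def custom_recipe(dataset):
    nlp = LOAD_YOUR_MODEL_HERE()
    # etc.
    return {
        'dataset': dataset,
        'on_exit': save_model(nlp, '/path/to/output'),
        # etc.
    }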
One thing worth noting (also in case others come across this issue later): the model you’re training during annotation is updated online and won’t be as good as a model trained with one of the batch-train recipes (which make several passes over the data, shuffle the data, set a dropout rate etc.). So saving the model in a recipe is totally fine if you want to keep reusing it in the next annotation session – but if you actually want to use the model, you should always batch train it on the collected annotations.
So, first I must train my model using ner.teach and save those annotations. How should I then use those annotations for batch training? Should I use the model saved by ner.teach in ner.batch-train?
No, the model you train during ner.teach is mainly used to help Prodigy suggest better examples for annotation. The annotations will then be saved in a dataset and the intermediate model will be discarded. You can then use the ner.batch-train recipe to train a new model from the annotations in your dataset.
For example, let’s say you want to improve the PERSON entity of spaCy’s default English model and you’ve used ner.teach to collect a few hundred examples:
prodigy ner.teach person_dataset en_core_web_sm your_data.jsonl --label PERSON
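Here, your_data.jsonl is your raw input text, one JSON object per line with a "text" key – for example, something like:
{"text": "Barack Obama was born in Hawaii."}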
Your annotations will then be saved to the dataset person_dataset. When you use ner.batch-train, you then specify the same dataset and model. This will batch train a model from the annotations in the set. The model will be similar to the one that helped you collect annotations during ner.teach – just better.
prodigy ner.batch-train person_dataset en_core_web_sm --output /tmp/model --label PERSON
The --output argument specifies the output directory – so in this case, the trained model will be saved to /tmp/model.
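Once training has finished, that directory can be loaded like any other spaCy model. A quick sketch (the example text is made up):

import spacy

# load the model directory written by ner.batch-train --output
nlp = spacy.load('/tmp/model')
doc = nlp("Barack Obama was born in Hawaii.")
print([(ent.text, ent.label_) for ent in doc.ents])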
Here are some more details and examples:
- ner.batch-train recipe with example
- Named entity recognition workflow – this shows an end-to-end example of collecting annotations with ner.teach and training a model with ner.batch-train.