Resuming annotation with a model in the loop

Hi all,

I have a query regarding annotation in text classification/NER. It might be an easy one.

Once I close the front-end browser (after saving my annotations) and later restart it with the same data (say I can't annotate all of the data in one go), the same sentences repeat from the start with updated scores, i.e. the scores differ from the first iteration. Is it normal for all of the sentences to be repeated again with updated scores?


Shouldn't the sentences be skipped in the second iteration? I can understand seeing some of them again, since the model is still unsure about them, but I don't understand why ALL of the sentences repeat.


By default, Prodigy makes very few assumptions about your stream of data, so when you exit the Prodigy server (i.e. quit it in your terminal), it will start again at the beginning of the stream. However, you can tell it to exclude annotations from one or more datasets using the --exclude argument. When you start the server again, you won't be asked about the tasks you've already annotated:
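Conceptually, exclusion is just a filter over the incoming stream: tasks that hash to something already in the dataset get skipped. Here's a rough, illustrative sketch of that idea (this is not Prodigy's actual implementation, and the helper names are made up; Prodigy uses its own internal hashing scheme):

```python
import hashlib
import json

def task_hash(task):
    # Hash the serialized task so the same example always maps to the same key.
    # (Illustrative only - Prodigy's real hashing works differently.)
    return hashlib.md5(json.dumps(task, sort_keys=True).encode("utf8")).hexdigest()

def exclude_seen(stream, annotated_tasks):
    """Yield only tasks that haven't been annotated yet."""
    seen = {task_hash(task) for task in annotated_tasks}
    for task in stream:
        if task_hash(task) not in seen:
            yield task

stream = [{"text": "Hello world"}, {"text": "New sentence"}]
annotated = [{"text": "Hello world"}]
print([t["text"] for t in exclude_seen(stream, annotated)])  # ['New sentence']
```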

prodigy ner.teach your_dataset en_core_web_sm data.jsonl --exclude your_dataset

You can also exclude multiple datasets, e.g. --exclude set_one,set_two. Excluding sets is also useful when creating evaluation data, because you want to make sure your evaluation set doesn't contain tasks from your training set, and vice versa.

When you re-load the web app (and keep the server running), Prodigy will simply make another request to the /get_questions endpoint and fetch a new batch of tasks from the stream. So you shouldn’t see any duplication here.

If you want to stop annotating and start again later, you can always restore the model in the loop by training from the already collected annotations, and then using that pre-trained model in the next annotation session. For example:

prodigy ner.teach your_dataset en_core_web_sm data.jsonl

prodigy ner.batch-train your_dataset en_core_web_sm /output-model
prodigy ner.teach your_dataset /output-model data.jsonl --exclude your_dataset

The /output-model will be trained on the examples collected in the previous session, so it will be very similar to the model you had in the loop before – often much better, though, because the batch-train recipes use multiple iterations and other tricks to improve accuracy, like shuffling the data, setting a dropout rate, etc.
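The effect of multiple iterations with shuffling can be sketched as a generic training-loop skeleton like the one below. This is not the actual batch-train recipe – the `update_model` callback is a stand-in for whatever performs one update step and returns the batch loss:

```python
import random

def batch_train(examples, update_model, n_iter=10, dropout=0.2, batch_size=8):
    """Generic multi-epoch training loop: reshuffle every epoch, train in batches.
    update_model is assumed to apply one update step and return the batch loss."""
    losses = []
    for epoch in range(n_iter):
        random.shuffle(examples)          # new example order each epoch
        epoch_loss = 0.0
        for i in range(0, len(examples), batch_size):
            batch = examples[i:i + batch_size]
            epoch_loss += update_model(batch, dropout=dropout)
        losses.append(epoch_loss)
    return losses
```

Seeing the data in a different order each epoch, with some dropout applied, typically generalizes better than a single streaming pass like the one made during the live annotation session.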

Each model you save out with the batch-train recipe will also include two JSONL files containing the training and evaluation data. This means you'll always be able to reproduce the results, or restore the training data from a previous model (if you've made a mistake, want to try adding examples from a different source, etc.).
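Since the files are newline-delimited JSON, they're easy to load back in with a few lines of Python, e.g. to inspect or recombine them (check your saved model directory for the exact file names, as they may differ by version):

```python
import json
from pathlib import Path

def read_jsonl(path):
    """Read one annotation task per line from a JSONL file."""
    with Path(path).open(encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `read_jsonl("/output-model/training.jsonl")` would give you back the list of training tasks as dictionaries.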



That solves a lot of my doubts.
Yes, I couldn't figure out how to use the same model for annotating the dataset later on (after saving the model to a directory), but I'll be able to do that now. :slight_smile:
Your support rocks!!

Thanks a lot!!
