Hi! And don’t worry, we know that Prodigy introduces a lot of new concepts, and we’re still working on building up a good set of best practices so we can provide more guidance on how to tackle different NLP problems efficiently.
This depends on what you’re trying to do. Are you trying to improve an existing model, or teach it a new category? And how many annotations did you collect?
As a first step, it’s always nice to look at your annotations and check out the data you’ve created. The db-out command lets you export the annotations saved in your dataset to a JSONL file, which you can then open in your editor. (Even if you want to train with spaCy directly or with a completely different library, you still have these annotations and can use them however you like.)
prodigy db-out your_dataset_name > exported_file.jsonl
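If you want a quick overview instead of scrolling through the file, you could also summarize the export with a few lines of Python. Here’s a minimal sketch that counts answers and entity labels – the field names ("text", "spans", "answer") follow Prodigy’s NER format, and the sample records are made up for illustration:

```python
# Quick way to eyeball an exported dataset: count answers and labels.
import json
from collections import Counter

def summarize(lines):
    answers = Counter()
    labels = Counter()
    for line in lines:
        record = json.loads(line)
        answers[record.get("answer", "none")] += 1
        for span in record.get("spans") or []:
            labels[span.get("label", "none")] += 1
    return answers, labels

# Made-up sample records in Prodigy's JSONL format:
sample = [
    '{"text": "I like Berlin", "spans": [{"start": 7, "end": 13, "label": "CITY"}], "answer": "accept"}',
    '{"text": "No entities here", "spans": [], "answer": "reject"}',
]
answers, labels = summarize(sample)
print(answers)  # Counter({'accept': 1, 'reject': 1})
print(labels)   # Counter({'CITY': 1})
```

To run it on your real export, you can pass the file directly, e.g. `summarize(open("exported_file.jsonl"))`.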
If you have labelled a few hundred examples, a good next step could be to run a training experiment and see how your annotations are improving the model. You can do this by running
ner.batch-train (see here for details). The following command will use the annotations in your dataset, train a model for 10 iterations and save it to the output directory:
prodigy ner.batch-train your_dataset --output /path/to/model --n-iter 10
If the results look good, you can then try to load the model into spaCy and test it. spaCy lets you load models from a file path, so you can do the following:
import spacy
nlp = spacy.load('/path/to/model')
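If you want to make that sanity check repeatable, you could wrap loading and entity inspection in a small helper – just a sketch, where the function names and the model path are placeholders of mine:

```python
def format_entities(doc):
    # Pull out (text, label) pairs for each entity the model predicted.
    return [(ent.text, ent.label_) for ent in doc.ents]

def check_model(model_dir, text):
    import spacy  # imported here so format_entities stays usable on its own
    nlp = spacy.load(model_dir)  # the directory you passed to --output
    return format_entities(nlp(text))

# Example (once a trained model exists on disk):
# print(check_model('/path/to/model', 'Some text containing your new entity type.'))
```

Looking at the `(text, label)` pairs for a handful of example sentences is usually enough to tell whether the training run produced something reasonable.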
If you’ve only been annotating so far, no model will have been saved yet – models are only saved after training. Even if you run an annotation recipe with a “model in the loop”, that model isn’t saved afterwards: it won’t be as good as a model trained over multiple iterations, so you should always run the training process as a separate step. When you run a command like
ner.batch-train, you can specify the output directory where the model will be saved (see the example above).
In case you haven’t seen it yet, you might also find this video useful – it shows a full end-to-end workflow of using Prodigy to train a new entity type: