Getting Started Questions

I hate to admit that I’m still having trouble getting started. I’ve created a dataset and used ner.manual to label text spans but I’m not sure what comes next. It seems like there are a lot of options. If I want the simplest way forward, what should I do next?

Use ner.teach or ner.batch-train? If so, will my annotations be in the correct format after using ner.manual? Or do I need to convert them to JSONL?

Also, how do I figure out where my spaCy model has been saved? It’s not very obvious to me.


Hi! And don’t worry, we know that Prodigy introduces a lot of new concepts, and we’re still working on building up a good set of best practices so we can provide more guidance on how to tackle various NLP problems most efficiently.

This depends on what you’re trying to do. Are you trying to improve an existing model, or teach it a new category? And how many annotations did you collect?

As a first step, it’s always nice to look at your annotations and check out the data you’ve created. The db-out command lets you export the annotations saved in your dataset to a JSONL file, which you can then open in your editor. (Even if you want to train with spaCy directly or with a completely different library, you have these annotations and you can use them however you like.)

prodigy db-out your_dataset_name > exported_file.jsonl
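Each line of the exported file is one JSON object. As a quick sketch of inspecting it programmatically, here’s how you could parse a line and print the labelled spans — the example record below is illustrative (fields like "text", "spans" and "answer" are typical of ner.manual output, but your records may contain additional keys):

```python
import json

# One illustrative record, roughly in the shape db-out produces
line = '{"text": "Apple is opening a store in Paris.", "spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "accept"}'

record = json.loads(line)
for span in record.get("spans", []):
    # Slice the original text using the span offsets
    print(record["text"][span["start"]:span["end"]], span["label"])
```

In practice you’d loop over the lines of exported_file.jsonl the same way.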

If you have labelled a few hundred examples, a good next step could be to run a training experiment and see how your annotations are improving the model. You can do this by running ner.batch-train (see here for details). The following command will use the annotations in the dataset, train a model with 10 iterations and save it to /path/to/model.

prodigy ner.batch-train your_dataset --output /path/to/model --n-iter 10

If the results look good, you can then try to load the model into spaCy and test it. spaCy lets you load models from a file path, so you can do the following:

import spacy
nlp = spacy.load('/path/to/model')
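To actually see what the loaded model predicts, you can run it over some text and iterate over doc.ents. A minimal sketch — it uses spacy.blank("en") as a stand-in so the snippet runs on its own; swap in your trained model directory in practice (the example sentence is arbitrary):

```python
import spacy

# In practice, load your trained model instead:
# nlp = spacy.load('/path/to/model')
# spacy.blank("en") is used here only so the sketch runs standalone
nlp = spacy.blank("en")

doc = nlp("Apple is opening a new store in Paris.")
for ent in doc.ents:
    # Entities predicted by the model (a blank model predicts none)
    print(ent.text, ent.label_)
```

With a properly trained model, the loop would print each predicted entity alongside its label.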

If you’ve only been annotating so far, no models will have been saved – this only happens after training. Even if you run an annotation recipe with a “model in the loop”, this model won’t be saved afterwards – it won’t be as good as a model trained using multiple iterations, so you should always run the training process as a separate step afterwards. When you run a command like ner.batch-train, you can specify the output directory where the model will be saved (see example above).

In case you haven’t seen it yet, you might also find this video useful, which shows a full end-to-end workflow of using Prodigy to train a new entity type: