Prodigy to spaCy Guide


I admit that I’m entirely new to this world of NLP, NER, spaCy and Prodigy. At first it was very overwhelming to try to learn all this stuff. But as soon as I discovered Prodigy I bought the license, as it feels super welcoming to newcomers like me.

I have a use case where I need to map either a single English word or a group of words to a machine-readable label. For example, if the raw text is “My business phone number is 123-234-2323”, I want the model to give me the label “BUSINESS_PHONE_NUMBER”. As another example, if the raw text is “For verification purpose, enter the last 4 digits of either your social security number or your Bank of America debit card number”, the trained model should be able to predict “SSN” and “DEBIT_CARD”. For my use case, any particular raw text would always be between 7-8 words. I’m guessing ner.manual would work for me, but I’m not sure. I’m also not sure which spaCy model I should use for a use case like this. If NER is the right way to go, then please help with the questions below.

Okay so what I want is a step-by-step guide to:

  1. Train a NER model (using both manual and PatternBased) using Prodigy with my custom corpus.
  2. Evaluate/Test the trained model with some test data and repeat Step 1 & 2 as necessary.
  3. Export the NER model from Prodigy and import it into spaCy. It would be great if I could train and evaluate the NER model as new data is collected, and have spaCy automatically pick up the updated model as it is being trained.
  4. Build a REST API on top of spaCy to expose the NER service.

So far I’ve followed a few tutorials, the documentation and some YouTube videos, but I don’t think I fully understand the whole workflow.

I want to mention my progress so far:

  1. I was able to install Prodigy on macOS Mojave with CPython 3.5.0.
  2. I have created a Prodigy dataset, using the following command
    prodigy dataset [custom-dataset-name] [custom-dataset-description]
  3. Manually teach named entities, using the following commands
    prodigy ner.manual [custom-dataset-name] [spacy-model-name] [path-to-raw-text-json-file] --label [path-to-named-entities-file]
  4. I see that the prodigy.db in the ~/.prodigy directory is populated with the training data.

Now I want to know:

  1. How can I test this training on evaluation data and see if the NER model is indeed working?
  2. How do I edit a wrong prediction while I’m testing?
  3. Once I’m satisfied, how do I export the NER model, so that I can use it with spaCy as a standalone REST API?
  4. And still better, how can I maintain a continuous sync between the Prodigy NER model and the spaCy NER model instead of having to export and import every time I train new data?

Any help in clarifying these questions to me would be greatly appreciated. If there are related and existing documents, videos that I have missed looking at please share them too.

I can’t wait to play with this! :slight_smile:

Hi, that’s nice to hear! Welcome to the Prodigy community :wave:

Named entity recognition is especially powerful if you need to generalise based on examples of real-world objects and phrases in context. To achieve the best results, the category of things should be well-defined – for example, PERSON or CITY are useful categories, while CRIME_LOCATION or VICTIM would be very difficult to learn (“victim” is not a category of people and “crime location” isn’t a category of location – it’s all situational).

For some of the categories you describe, you might actually want to try a rule-based approach using spaCy’s Matcher (see here for details), especially if the phrases you’re looking for follow a consistent pattern. You might also want to explore predicting broader categories and then using other features like the dependency parse to extract the information you need. For example, you could train a category BANK, which would apply to “Bank of America”, and then look for the syntactic parent (e.g. “debit card” or “account” etc.). See here for the visualized example:

I explain this approach in more detail in this thread. How you end up writing these rules obviously depends on your data, but I think you’ll be able to achieve much better results this way than if you tried to predict fuzzy categories in one go.
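
To make the rule-based idea a bit more concrete, here is a minimal sketch of spaCy’s Matcher applied to one of the sentences from the question. The label names and token patterns are hypothetical and only chosen for illustration, and the `matcher.add` call assumes spaCy v3 (on spaCy v2 the equivalent call takes `None` as the second argument, followed by the pattern). Real rules would depend on your data.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough here: these patterns only need the tokenizer.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical labels and patterns mirroring the examples in the question.
# Assumes the spaCy v3 signature: matcher.add(name, [pattern, ...]).
matcher.add("SSN", [[{"LOWER": "social"}, {"LOWER": "security"}, {"LOWER": "number"}]])
matcher.add("DEBIT_CARD", [[{"LOWER": "debit"}, {"LOWER": "card"}]])


def find_labels(text):
    """Return the sorted names of all patterns that match somewhere in the text."""
    doc = nlp(text)
    return sorted({nlp.vocab.strings[match_id] for match_id, start, end in matcher(doc)})


text = (
    "For verification purpose, enter the last 4 digits of either your social "
    "security number or your Bank of America debit card number"
)
labels = find_labels(text)
print(labels)
```

Matching on the lowercase form of each token keeps the rules case-insensitive. Anything that does not follow such a fixed wording is where a statistical model, or a combination of both approaches, becomes more useful.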

If you haven’t seen it already, check out @honnibal’s talk on how to define NLP problems and solve them through iteration. It shows some examples of using Prodigy, and discusses approaches for framing different kinds of problems and finding out whether something is an NER task or maybe a better fit for text classification, or a combination of statistical and rule-based systems.

You might also find this video helpful. It shows an end-to-end workflow of using Prodigy to train a new entity type from a handful of seed terms, all the way to a loadable spaCy model. It also shows how to use match patterns to quickly bootstrap more examples of relevant entity candidates:

The en_core_web_sm model is usually a good baseline model to start with: it’s small, includes all the pre-trained NER categories, as well as the weights for the tagger and parser. Just keep in mind that if you do need some of the other pre-trained categories, you should always include examples of what the model previously got right when you train it. Otherwise, the model may overfit on the new data and “forget” what it previously knew.

If you don’t need any of the other pre-trained capabilities, you can also start off with a blank model. In this example, the blank model is exported to /path/to/blank_en_model, which you can then use as the model argument in Prodigy.

import spacy

nlp = spacy.blank('en')
nlp.add_pipe(nlp.create_pipe('ner'))  # add blank NER component
nlp.add_pipe(nlp.create_pipe('sentencizer'))  # add sentence boundary detector, just in case
nlp.begin_training()  # initialize weights
nlp.to_disk('/path/to/blank_en_model')  # save out model

The ner.batch-train recipe supports passing in an --eval-id argument. This is the name of the evaluation dataset the model is evaluated against. (If no evaluation set is specified, Prodigy will hold back a certain percentage of your training data – but that’s obviously a less reliable evaluation).

The evaluation dataset is a regular Prodigy dataset – so you could repeat step 3 and use ner.manual to label your evaluation data. If you already have a labelled set, you can convert it to Prodigy’s JSON format and then use the db-in command to import the data.
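
As a rough sketch of that conversion step: the snippet below assumes your existing labelled set is available as (text, [(start, end, label), ...]) pairs with character offsets, which is a hypothetical input format, and writes one JSON object per line with a "text" and a list of "spans". The helper name is made up for this example.

```python
import json


def to_prodigy_jsonl(examples):
    """Convert (text, [(start, end, label), ...]) pairs into newline-delimited JSON."""
    lines = []
    for text, annotations in examples:
        spans = []
        for start, end, label in annotations:
            # Reject character offsets that fall outside the text.
            if not 0 <= start < end <= len(text):
                raise ValueError("bad span offsets for text: %r" % text)
            spans.append({"start": start, "end": end, "label": label})
        lines.append(json.dumps({"text": text, "spans": spans}))
    return "\n".join(lines)


# Hypothetical labelled example based on the sentence from the question.
text = "My business phone number is 123-234-2323"
phrase = "business phone number"
start = text.index(phrase)
examples = [(text, [(start, start + len(phrase), "BUSINESS_PHONE_NUMBER")])]

jsonl = to_prodigy_jsonl(examples)
print(jsonl)
```

The resulting string can be written to a .jsonl file and imported with db-in. It's worth checking a few records by slicing the text with the span offsets before importing, since off-by-one offsets are easy to introduce.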

The ner.batch-train recipe lets you define an --output argument, which is the directory the trained model will be exported to. This directory will be a loadable spaCy model, so in order to use and test it, you can pass the directory path to spacy.load. For example, let’s say you run the following command to train the model:

prodigy ner.batch-train your_dataset en_core_web_sm --output /path/to/model --n-iter 10 --eval-id your_evaluation_dataset

You can then do this in spaCy:

import spacy

nlp = spacy.load('/path/to/model')
doc = nlp("This is some sentence with possible entities")
for ent in doc.ents:
    print(ent.text, ent.label_)
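
If you want to spot-check predictions like these against labels you already know to be correct, one simple option is to compare exact spans yourself. This is only an illustrative sketch and not necessarily how Prodigy computes the accuracy it reports. The function name and the offsets below are made up for the example.

```python
def score_spans(gold, predicted):
    """Exact-match precision, recall and F-score over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical character offsets. With a loaded model, the predicted spans could be
# built as [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents].
gold = [(65, 87, "SSN"), (112, 122, "DEBIT_CARD")]
predicted = [(65, 87, "SSN"), (100, 122, "DEBIT_CARD")]

precision, recall, f_score = score_spans(gold, predicted)
print(precision, recall, f_score)
```

Here a span only counts as correct if its start, end and label all match, so the second prediction is scored as wrong even though it overlaps the expected entity.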

How you set up the REST API is up to you. In general, it’s recommended to only load the model once, e.g. at the top level (and not on every request). I personally like using the library Hug (which also powers Prodigy’s REST API btw). Here’s an example:

import hug
import spacy

nlp = spacy.load('/path/to/model')

@hug.post('/get_entities')
def get_entities(text):
    doc = nlp(text)
    ents = [{'text': ent.text, 'label': ent.label_} for ent in doc.ents]
    return {'ents': ents}

For inspiration, you might also want to check out the spacy-services repo, which includes the source for the microservices powering our demos and visualizers. If you like GraphQL, here’s an experimental repo with a GraphQL API I built a while ago. (There’s probably some room for improvement here, since I’m pretty new to GraphQL.)

Once you have a model that predicts something, you can start improving it by correcting its predictions. The most efficient way to do this is to use the ner.teach recipe, which will show you the predictions that the model is most uncertain about, and will ask you to accept or reject the suggestions. As you annotate, the model in the loop is updated and its predictions are adjusted.

prodigy ner.teach your_new_dataset /path/to/model your_data.jsonl --label YOUR_LABEL

The idea here is to find the best possible training data that has the highest impact. In most cases, what you care about is the model’s accuracy overall, not just the accuracy on some very specific examples. It’s tempting to focus on single examples, but it’s often useful to take a step back and look at the bigger picture.

If you want to label without a model in the loop and create a gold-standard training set, i.e. one that contains the full correct parse of the text, you can also use the ner.make-gold recipe. It will stream in the model’s predictions and let you edit them by hand. The idea here is that it’s likely much more efficient and faster than doing everything from scratch, especially if the model gets a lot right already. If the model is correct 70% of the time, you only need to manually label and correct the remaining 30% (instead of doing 100% by hand).

If you annotate new data and want to update and improve the model, you do have to retrain it. This is usually a good thing, though, because it allows you to keep a clear separation between the individual model versions and the data the models were trained on, and it makes your experiments more repeatable. That’s also why Prodigy generally encourages you to create separate datasets for every experiment you run. You should always be able to reproduce any given model state - otherwise, it becomes very difficult to reason about what’s going on and how the annotations affect the predictions.

That said, there’s still a lot you can automate! Prodigy is fully scriptable, so you could write a Python or Bash script that runs periodically, trains a model from one or more given datasets, outputs the model to a timestamped directory, writes out a file with all the config, compares the accuracy to the previous results and reports it back to you. If the model improved, you can then deploy it – if not, you can investigate why the new data caused a drop in accuracy (Did the model overfit on the new data? Does the dataset include conflicting annotations? Did you introduce a new concept that was difficult to learn? etc.)
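
Here is a minimal sketch of what the first pieces of such a script could look like. It only builds the ner.batch-train command shown earlier with a timestamped output directory and decides whether a run counts as an improvement. The function names and the directory layout are assumptions made for this example, and how you obtain the accuracy numbers to compare is left open.

```python
from datetime import datetime
from pathlib import Path


def build_train_command(dataset, base_model, eval_id, output_root, n_iter=10, now=None):
    """Return (command, output_dir) for one timestamped ner.batch-train run."""
    now = now or datetime.now()
    # Hypothetical layout: one sub-directory per run, named after the start time.
    output_dir = Path(output_root) / now.strftime("%Y%m%d-%H%M%S")
    command = [
        "prodigy", "ner.batch-train", dataset, base_model,
        "--output", str(output_dir),
        "--n-iter", str(n_iter),
        "--eval-id", eval_id,
    ]
    return command, output_dir


def improved(new_accuracy, previous_accuracy):
    """A run counts as an improvement if there is no earlier result or it scores higher."""
    return previous_accuracy is None or new_accuracy > previous_accuracy


command, output_dir = build_train_command(
    "your_dataset", "en_core_web_sm", "your_evaluation_dataset",
    "/path/to/models", now=datetime(2018, 10, 1, 12, 0, 0),
)
print(" ".join(command))

# To actually train (requires Prodigy to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Keeping the command construction separate from the call that runs it makes the script easy to dry-run, and the timestamped directory keeps every model version next to the settings that produced it.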


Thank you so much @ines. You have given me a lot of content to consume and digest. I will follow your suggestions and get back if I get stuck somewhere. :slight_smile:


We regard this thread--all the questions and all the responses--as comprehensive and reliable guidance for our use case and procedures for migrating into production. So thank you both!

Only a single request for confirmation: do any of the spaCy/Prodigy releases subsequent to Oct 2018 impact the accuracy of this thread? (I don't detect anything, even with the recent revision of Prodigy train.)

Yeah, it all still looks accurate to me :slightly_smiling_face: Some of the workflows mentioned have been improved, for instance:

  • better general-purpose training with prodigy train
  • better interoperability with spaCy via prodigy data-to-spacy
  • ner.make-gold is now called ner.correct
  • for REST APIs, I'd recommend FastAPI