How to manage multiple annotators?

I am working on a project involving multiple annotators using Prodigy to create a gold standard dataset for validating an NLP model. Specifically, we have a rule-based model that analyzes clinical notes and extracts snippets from each note based on the presence of a particular concept of interest. We then have subject matter experts (SMEs) use Prodigy to annotate the same snippets using the textcat.manual recipe as a binary classification task with the labels TRUE and FALSE.

The environment is highly restricted due to the nature of the data: it has no internet access, and the SMEs I work with are not technical. Thus, I have made launching and using Prodigy as easy as possible. This is our current workflow:

  1. I first extract the snippets output by the model into the format required by Prodigy; each concept of interest gets its own jsonl file
  2. I have a bash script that installs all the dependencies, asks the user for the concept they would like to annotate, and launches Prodigy (through a Python script that runs the recipe)
  3. This Prodigy session launches with a specific prodigy.json that is written out to disk by the Python script, since it contains user-specific information
  4. Each concept and user gets its own database, and within it each session gets its own dataset, starting with ds_1 and incrementing for each session
  5. Once a reasonable number of snippets has been annotated by two annotators for the same concept, I process them and calculate inter-rater agreement and other metrics to refine the annotation guidelines.

I have slightly modified the default Prodigy interface. Here is a screenshot of the interface that I've shared in my instructions to the SMEs:


When I post-process, I just specify the path to the database and its name in db_settings and extract all the annotated snippets. This seems to work well, but there might be a better way to handle multiple annotators.

What we want is an easy way for annotators to discuss specific snippets. For example, if annotator 1 has a problem with a specific snippet, they can note the snippet ID and share it with annotator 2. Annotator 2 can then look up that snippet on Prodigy.


  1. Is there a better way to implement what I've described? I have to work around heavy restrictions.
  2. Is there a way to search snippets within Prodigy so that multiple annotators can collaborate on specific snippets and develop annotation guidelines before annotating "for real"?

Thank you.

hi @sudarshan85!

First off - great work! This is a fascinating workflow, and especially impressive given such a highly restricted environment.

Several thoughts. Let's start with your core question:

Step 1: Enable flagging

Have you considered using flagging for annotators to "flag" problems with each record?

Just add this to your global prodigy.json: "show_flag": true

@koaning has a great tutorial on this:

With this, annotators can flag problems in the moment, then keep moving.

Step 2: (Optional) Add text inputs for the recipient annotator and a message

You could also create a custom interface with blocks: two text inputs, one for who to send the example to (the other annotator) and one for the message/explanation.

Something like this:


import prodigy
from prodigy.components.preprocess import add_tokens
import requests
import spacy


@prodigy.recipe("textcat-with-comments")
def textcat_with_comments(dataset, lang="en"):
    # Use blocks to combine the classification UI with two text inputs
    blocks = [
        {"view_id": "classification", "label": "concept_of_interest"},
        {"view_id": "text_input", "field_id": "send_to", "field_label": "Send annotation to:", "field_suggestions": ["Steve", "Cindy", "Oliver", "Deepak"]},
        {"view_id": "text_input", "field_id": "comments", "field_rows": 3, "field_label": "Explain your decision"},
    ]

    def get_stream():
        res = requests.get("").json()  # URL omitted here
        for fact in res:
            yield {"text": fact["text"]}

    nlp = spacy.blank(lang)           # blank spaCy pipeline for tokenization
    stream = get_stream()             # set up the stream
    stream = add_tokens(nlp, stream)  # tokenize the stream

    return {
        "dataset": dataset,          # the dataset to save annotations to
        "view_id": "blocks",         # set the view_id to "blocks"
        "stream": stream,            # the stream of incoming examples
        "config": {
            "blocks": blocks,        # add the blocks to the config
        },
    }
Vincent has created another awesome video just on that:

Perhaps you could try using some custom JavaScript to only reveal the text_input blocks when the item is flagged.

Step 3: Route the flagged annotations to other annotators

Now that each flagged example is identified in the DB along with who to send it to and the comment text, you'd need to write a script that serves those examples as a stream to that annotator. This will depend a bit on your workflow, but the simplest way is to export the flagged examples into separate .jsonl files, one per annotator the flagged examples are intended for, for secondary review.
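A minimal sketch of that export step, assuming the "send_to" field from Step 2 and Prodigy's built-in "flagged" key (the dataset and file names here are made up). After exporting a dataset with prodigy db-out, you could split the flagged examples like this:

```python
import json
from collections import defaultdict


def split_flagged(examples):
    """Group flagged examples by the annotator named in "send_to"."""
    by_annotator = defaultdict(list)
    for eg in examples:
        if eg.get("flagged") and eg.get("send_to"):
            by_annotator[eg["send_to"]].append(eg)
    return by_annotator


def write_per_annotator(examples, prefix="flagged_for"):
    # one .jsonl per recipient, ready to serve as a new stream
    for annotator, egs in split_flagged(examples).items():
        with open(f"{prefix}_{annotator}.jsonl", "w", encoding="utf8") as f:
            for eg in egs:
                f.write(json.dumps(eg) + "\n")
```

For example, after `prodigy db-out ds_1 > ds_1.jsonl`, load the lines with json.loads and pass them to write_per_annotator.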

You could then repeat this, and if there's still no resolution (e.g., the 2nd annotator agrees there's something wrong), create a new route that uses the review recipe to adjudicate the 2+ reviews. The reviewer could be some arbiter like a manager or the most senior teammate.

Step 4: Update the Annotation Guidelines

One other suggestion: examples that meet some quality bar (e.g., the 2nd reviewer also flags that example) could be saved as examples you add to your annotation guidelines, which can be shown in Prodigy with instructions: "path/to/my_page.html". Perhaps you could use a .jinja template to automatically populate the instructions HTML.
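For instance, a sketch with Jinja2 (the template, field names, and output path are all hypothetical; the "comments" field comes from the Step 2 interface):

```python
from jinja2 import Template

# hypothetical template for an instructions page of curated examples
GUIDELINES = Template("""
<h2>Annotation guidelines: {{ concept }}</h2>
<ul>
{% for eg in examples %}
  <li><b>{{ eg["answer"] }}</b>: {{ eg["text"] }}<br/>
      <i>{{ eg.get("comments", "") }}</i></li>
{% endfor %}
</ul>
""")


def render_guidelines(concept, examples, path="my_page.html"):
    """Write the curated examples to the HTML page used for `instructions`."""
    html = GUIDELINES.render(concept=concept, examples=examples)
    with open(path, "w", encoding="utf8") as f:
        f.write(html)
    return html
```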

We wrote a case study on how the Guardian used a similar follow-up process to improve their annotation guidelines:

Perhaps the post can help SMEs who are new to NLP appreciate the workflow (e.g., what annotation guidelines are used for, an example of flagging, etc.).

Just curious - are you aware of the difference between your global and local prodigy.json files, and overrides?

When you run Prodigy, it will first check if a global configuration file exists. It will also check the current working directory for a prodigy.json or .prodigy.json. This allows you to overwrite specific settings on a project-by-project basis.

Make sure to use all three for the three levels of your service.

  • global: Settings for all users (e.g., all users are connected to the right database)
  • user: Settings for each user; can also be thought of as "project"-based, as it applies to any tasks run by the user
  • task: Setting on the task level for each user
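As a rough illustration of the three levels (all paths and values here are made up; PRODIGY_CONFIG_OVERRIDES is Prodigy's environment variable for per-process overrides):

```
# ~/.prodigy/prodigy.json  (global: settings for all users)
{"db": "sqlite", "db_settings": {"sqlite": {"name": "annotations.db"}}}

# ./prodigy.json  (user/project: overrides the global file)
{"show_flag": true}

# task level: per-launch overrides via an environment variable
PRODIGY_CONFIG_OVERRIDES='{"instructions": "path/to/my_page.html"}' prodigy textcat.manual ...
```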

Sure, that's definitely an option. Just want to make sure - are you aware of named multi-user sessions? This approach tends to be the more popular and default behavior for multiple annotators. But running 1 dataset / 1 port per user is another option. Here's a good pro/con of each:

It's worth noting we're planning to make some of our biggest changes to Prodigy next week with the v1.12 release. We currently have the release candidate out now:

First, I'd be curious of your thoughts on task routing:

One of the motivations for this is partial_overlap, where you may specify that all annotators review x% of the data. That data may be used for calibration, like inter-annotator agreement. Alternatively, you may set up routes based on different criteria - one being the flagged information mentioned previously. With a little trial-and-error, I bet you could build a sophisticated routing system with multiple stages.
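For instance, a custom task router is just a function that decides which named sessions see each task. A hypothetical sketch combining flag-based routing with full overlap (the "flagged"/"send_to" fields are from the earlier steps, and the controller attribute below is an assumption, not a confirmed API):

```python
def route_flagged(ctrl, session_id, item):
    """Send flagged items only to the named recipient; everything else to everyone."""
    if item.get("flagged") and item.get("send_to"):
        # route the flagged example to the annotator it was addressed to
        return [item["send_to"]]
    # full overlap: every known session annotates the task
    return list(ctrl.session_ids)  # assumed attribute listing known sessions
```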

Just curious - while you have no internet, would it be possible to run Docker? This may help you run instances that could even be scaled up on premise (say, with Kubernetes).

We also wrote new deployment docs where we provide ideas of how to deploy Prodigy via Docker (among other ways):

We're also planning to release new "Metrics" components soon that include built-in and custom metrics like IAA. We decided to move this to a follow-up of our initial v1.12.0 release, but I can post back once it's available (either through an alpha or a release). One benefit of having IAA integrated with Prodigy would be computing the metrics on the fly - as a means of ongoing annotation QA, which could make task routing even more powerful!
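Until those components land, inter-annotator agreement for a binary TRUE/FALSE task is easy to compute by hand. A minimal Cohen's kappa sketch for two annotators (pure Python; no Prodigy-specific assumptions):

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' labels over the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # observed agreement: fraction of items where both annotators agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # chance agreement, from each annotator's label frequencies
    expected = sum(
        (labels_a.count(label) / n) * (labels_b.count(label) / n)
        for label in set(labels_a) | set(labels_b)
    )
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)
```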

In v1.12, check out the new filter_by_patterns recipe. Perhaps it can help :slight_smile:

Those are my initial thoughts, but the team can look over your post next week and see if there are additional suggestions.

Hope this helps!

Hi @ryanwesslen,

I've been out for the past week and will start working on this again next week. Thank you for your kind words about what we've done!

I started reading your reply, stopped, scrolled, and was amazed at the length and depth of the reply. I felt that I should post a reply and thank you for the effort. I will read this detailed reply and update the post based on my progress.

Thank you!


I think this is the easiest method right now. I'm going to try it out and get feedback from the SMEs and see how it goes.

I am, but I decided to write out a global JSON file for each user because the code base that runs the script is shared. The SMEs don't really like using git commands to clone repos. Since it's shared and each configuration is different, I can't place it in the local code directory.

The admins don't like using Docker. I understand that might not be a satisfying answer :slight_smile: but my hands are tied with respect to installs. It's hard, but that's the environment I have to work in.