Multiple annotators with different data

This definitely sounds feasible. Do you want each annotator to annotate with a model in the loop, or do you have static data you want to go through more or less in order?

I think the best solution here would be to put your service in the middle and let it handle the authentication. So, the user "logs on" and makes a request to your service. Your service authenticates the user and creates a session token etc. If this was successful, your service will start a Prodigy session for the user, and pass the user ID and all other details to the recipe. The recipe will then communicate with your service and request a stream of tasks. Your service will know which user is making requests, so it can construct the stream accordingly.
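To make the last part concrete, here's a minimal sketch of the service-side logic. All the names here (tasks_for_user, completed_by_user, the "id" field) are hypothetical – adapt them to however your service stores its data:

```python
# Hypothetical service-side helper: build a per-user stream by skipping
# tasks this user has already completed. Names and fields are made up
# for illustration - structure it however fits your service.

def tasks_for_user(user_id, all_tasks, completed_by_user):
    """Yield tasks for one user, skipping the ones they've already done."""
    done = completed_by_user.get(user_id, set())
    for task in all_tasks:
        if task["id"] not in done:
            yield task

# Usage: the recipe requests tasks for a user, the service filters
all_tasks = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]
completed = {"alice": {1}}
stream = tasks_for_user("alice", all_tasks, completed)
```

Since the recipe knows the user ID, every request it makes can include it, and the service can keep the per-user bookkeeping in one place.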

You can also look at the prodigy.serve function in prodigy/__init__.py if you want to implement your own solution that executes a recipe and starts the Prodigy server. But I'm not even sure this will be necessary in your case.


You can find more details on the exact formats in your PRODIGY_README.html (available for download with Prodigy). If you're looking for the format of the annotation tasks in the stream, see the "Annotation task formats" section. A stream is an iterable of dictionaries, each describing one annotation task. So your API could simply return a list of objects, e.g. [{"text": "hello world"}] etc.
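For example, if your API returns a JSON list of task objects, the recipe side only needs to parse it and iterate – the payload below is purely illustrative:

```python
import json

# Hypothetical API response body: a JSON list of annotation tasks
body = '[{"text": "hello world"}, {"text": "another example"}]'

# On the recipe side, the stream is just an iterable of those dicts
stream = (task for task in json.loads(body))
first = next(first_iter := stream)  # first task off the stream
```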

To avoid exhausting the stream, you might want to write a little wrapper that keeps making requests if the queue is running low. I posted a little example for a custom loader in this thread – the example was supposed to show how to implement your own data loader for Twitter etc., but you can also easily adapt it for your use case:

import requests  # needs the requests library

def custom_loader():
    page = 0   # if the API is paged, keep a counter
    while True:
        r = requests.get('http://some-api', params={'page': page})
        response = r.json()
        for item in response['results']:  # or however the response is structured
            yield {'text': item['text']}  # etc.
        page += 1  # once a page is exhausted, move on to the next

You can also add any other custom properties to your annotation task – like a user identifier. Anything that you add to a task's "meta" object will be displayed in the bottom right corner of the annotation card in the web app.
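For example (the field names inside "meta" are entirely up to you):

```python
# A task with custom properties. Top-level keys like "user_id" are
# passed through untouched; everything in "meta" is also shown in the
# bottom right corner of the annotation card in the web app.
task = {
    "text": "hello world",
    "user_id": "annotator-42",           # hypothetical custom property
    "meta": {"user": "annotator-42"},    # displayed in the UI
}
```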

When Prodigy processes a stream, it will assign an _input_hash based on the input text, and a _task_hash based on the input and the features to annotate, e.g. the spans or labels. This lets you determine whether two tasks are the same. So your service can look at the task hashes, and check if a user has already annotated a task. It can also check if the tasks that went out to the user all came back annotated – and if not (for example, if the user just closes their browser and doesn't save), send them out again.
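That bookkeeping can be as simple as comparing sets of hashes – a sketch, assuming your service keeps track of the tasks it sent out (the function name is made up):

```python
# Hypothetical service-side check: which tasks went out but never came
# back annotated? Those are the ones to send out again.

def unreturned_tasks(sent_tasks, returned_hashes):
    """Return the tasks whose _task_hash never came back annotated."""
    return [t for t in sent_tasks if t["_task_hash"] not in returned_hashes]

sent = [
    {"text": "a", "_task_hash": 111},
    {"text": "b", "_task_hash": 222},
]
returned = {111}  # only the first task was saved by the annotator
to_resend = unreturned_tasks(sent, returned)  # contains the task with hash 222
```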

If you haven't seen it already, there's also this thread on using multiple annotators, in which I explain a bit more about the hashing.
