Custom Recipe - Stream Parameters

I wrote a custom recipe that implements both stream() and update(), which interface with an external API to GET messages and POST annotations, respectively.

Naively, I would implement my GET endpoint to return the oldest “un-annotated” message, and the POST would then mark that message as annotated. It would be great if I could further filter the stream – select by message_id, select by message type, etc. Is this possible?

This is my stream function:

import requests

def stream():
    while True:
        # i want to add parameters here, from the GET parameters in the browser
        url = 'http://localhost:8000/message'
        yield requests.get(url).json()
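The update() counterpart – the one that POSTs annotations back – could be sketched like this (the endpoint and field names here are assumptions, not my exact code):

```python
import requests

API_URL = 'http://localhost:8000/message'  # same endpoint as in stream()

def make_annotation(task):
    # reduce an answered Prodigy task to what the API needs
    # ('message_id' is an assumed field set by the API when serving tasks)
    return {'message_id': task.get('message_id'), 'answer': task.get('answer')}

def update(answers):
    # Prodigy calls this with batches of answered tasks
    for task in answers:
        requests.post(API_URL, json=make_annotation(task))
```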

Ideally, I was thinking that I could put GET parameters in my browser when using prodigy, and then pass those parameters to my API in my stream() function.

Hi! I hope I understand your question and use case correctly – you want to customise what’s fetched from your stream, right?

In that case, using custom arguments in your custom recipe might be the most convenient solution? All arguments to the recipe function automatically become arguments on the command line. So you could do something like this:

def custom_recipe(dataset, msg_type, order_by):  # <-- whatever you want

    def stream():
        while True:
            # add the parameters to your request
            params = {'type': msg_type, 'order_by': order_by}
            url = 'http://localhost:8000/message'
            yield requests.get(url, params=params).json()

    # other stuff here

    return {
        'dataset': dataset,
        'stream': stream,
        # etc.
    }

You can then execute the recipe from the command line like this (where recipe.py is the file containing your recipe):

prodigy custom-recipe dataset_name comment date -F recipe.py

The @recipe decorator also lets you describe your arguments using Plac’s annotation format. The descriptions will be shown when you run the recipe with --help on the command line. You can also define the argument type (so values will be converted accordingly), and whether it’s positional, an option or a flag. For example, here are some made-up parameters:

@prodigy.recipe('custom-recipe',
    dataset=("The dataset to use", "positional", None, str),
    msg_type=("The message type", "option", "t", str),
    order_by=("Order stream by", "option", "o", str),
    per_page=("Items per response", "option", "p", int),
    include_title=("Include message titles", "flag", "i", bool))
def custom_recipe(dataset, msg_type=None, order_by='created',
                  per_page=10, include_title=False):
    """Custom recipe that integrates API."""
    def stream():
        while True:
            params = { ... } # and so on

On the command line, you can then see the recipe description and arguments by typing --help:

prodigy custom-recipe --help -F recipe.py

And usage could look like this:

prodigy custom-recipe your_dataset --msg-type comment --order-by date --per-page 20 --include-title -F recipe.py

Or, shortcuts:

prodigy custom-recipe your_dataset -t comment -o date -p 20 -i -F recipe.py

Thank you @ines, but unfortunately that’s not what I’m looking for.

I’m looking for controls/parameters from the browser. Your approach would seemingly require restarting the Prodigy process every time a parameter changes. Put another way: what if I have two annotators, each working at the same time? I want to send one annotator to the app with one set of GET params in the URL, and the other annotator with a different set.

I’d like to pass these GET params dynamically into the stream() function and on to my internal API requests, so that they can provide different tasks accordingly.

Does that make sense? Is that possible?

Ah okay!

And no, the app is explicitly connected to one standalone Prodigy process. Even if it were possible to make the stream generator yield different things via the browser, there’s only one stream generator per process – and if it changed, it would change for everyone connected to that process.
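To illustrate why this matters: a Python generator is a single stateful object, so two consumers pulling from the same one share its state rather than each getting their own view of the stream.

```python
def stream():
    # stand-in for a Prodigy stream: an infinite, stateful generator
    i = 0
    while True:
        i += 1
        yield i

s = stream()
annotator_a = [next(s) for _ in range(3)]  # gets items 1, 2, 3
annotator_b = [next(s) for _ in range(3)]  # gets items 4, 5, 6 – not 1, 2, 3
```

Changing what the generator yields mid-flight would affect both consumers equally.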

If you’re working with multiple annotators, you usually want one process per user. The reason is that most Prodigy sessions are inherently stateful – for example, as soon as you’re putting a model in the loop or updating anything, you need a clear separation of the models. You also usually want to store the annotations in different datasets, possibly with different metadata assigned, so you can evaluate and compare them more easily.
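A rough sketch of what that could look like – one process per annotator, each with its own dataset, port and stream parameters (this assumes the port can be overridden via a PRODIGY_PORT environment variable; the dataset names and flags are made up):

```shell
# one Prodigy process per annotator
PRODIGY_PORT=8080 prodigy custom-recipe dataset_alice -t comment  -F recipe.py &
PRODIGY_PORT=8081 prodigy custom-recipe dataset_bob   -t question -F recipe.py &
```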

In your case, it looks like you already have the “one provider, multiple consumer” stuff figured out via your API (which is usually the “hard part”) – so it’d make much more sense to start the recipe multiple times on different ports with different parameters. @andy’s multiuser Prodigy extension has a nice example of this.

If you want to do this more elegantly via the browser, you could have a simple app that takes care of launching those processes automatically based on a URL. For example:

  1. user accesses a URL that encodes the desired parameters
  2. your service starts your recipe script on an available port, creates a new dataset if necessary and passes in the parameters
  3. once Prodigy is running, user is redirected to the web app on the given host/port and can start annotating
  4. optional: kill process after certain period of inactivity and free up the port
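The steps above could be sketched like this – a helper for step 2 that turns the requested parameters into a Prodigy command on a free port (everything here is illustrative: the recipe name, file and flags are assumptions):

```python
import os
import socket
import subprocess

def find_free_port():
    # ask the OS for an unused port
    with socket.socket() as s:
        s.bind(('', 0))
        return s.getsockname()[1]

def build_command(dataset, msg_type, order_by):
    # assemble the Prodigy invocation for one annotator session
    return ['prodigy', 'custom-recipe', dataset,
            '--msg-type', msg_type, '--order-by', order_by,
            '-F', 'recipe.py']

def launch(dataset, msg_type, order_by):
    port = find_free_port()
    cmd = build_command(dataset, msg_type, order_by)
    # assumes the port can be set via a PRODIGY_PORT env var
    proc = subprocess.Popen(cmd, env={**os.environ, 'PRODIGY_PORT': str(port)})
    return port, proc  # redirect the user to this port; track proc for cleanup
```

The service would then redirect the user to the returned port (step 3) and keep the process handle around so it can be killed after inactivity (step 4).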