Labels in mark, and multiuser access to prodigy

Thanks for your feedback – and nice to hear that you like Prodigy so far!

You should be able to set both a --label and an optional --view-id on the command line when using the mark recipe. If a label is provided, it will also be added to the annotation examples. For example:

prodigy mark my_dataset my_data.jsonl --label LABEL --view-id classification

The --view-id tells Prodigy to use the "classification" interface, i.e. display the label on top and render the example content underneath.
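
To make "added to the annotation examples" a bit more concrete, here's a minimal sketch in plain Python. It doesn't call Prodigy at all: the helper name `add_label` is made up for illustration, and it assumes each task is a dict loaded from one line of the JSONL file, with the label stored under a `"label"` key.

```python
import json


def add_label(tasks, label):
    """Yield a copy of each task dict with the given label attached.

    Hypothetical helper for illustration only, not part of Prodigy.
    """
    for task in tasks:
        labelled = dict(task)  # copy, so the input task stays untouched
        labelled["label"] = label
        yield labelled


# Two lines as they might appear in a file like my_data.jsonl (made-up content)
lines = ['{"text": "First example"}', '{"text": "Second example"}']
tasks = [json.loads(line) for line in lines]
labelled_tasks = list(add_label(tasks, "LABEL"))
```

With the "classification" interface, the label is what gets displayed on top of each example.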

As for multi-user access: yes, that's definitely possible! Matt's comment on this thread goes into more detail and outlines a possible strategy for using Prodigy with multiple annotators, including how to structure a "single producer, multiple consumer" forward queue and a "multiple producer, single consumer" backward queue.

The easiest way to implement this would be to create a simple, custom recipe that orchestrates the whole thing. Here's some pseudocode to illustrate the concept:

import prodigy
import requests


@prodigy.recipe('multi-annotator')
def multi_annotator_manager(dataset):
    # for simplicity, let's assume you're using a REST API – of course, you
    # might want to solve this more elegantly
    SERVICE = 'http://your-annotation-queue-provider'
    # get a "unique" stream for the session via your annotation queue provider
    # (this assumes the provider responds with a JSON list of task dicts)
    stream = requests.get(SERVICE).json()

    def update(examples):
        # this function will be called every time Prodigy receives a batch of
        # annotated tasks back from the client – instead of updating the model,
        # you can also use it to update your provider
        requests.post(SERVICE, json=examples)

    def on_load(ctrl):
        # this function will be called when the service starts – the controller
        # also gives you access to the database via ctrl.db in case you need it
        existing_annotations = ctrl.db.get_dataset(dataset)
        print("There are {} annotations in the set".format(len(existing_annotations)))

    def on_exit(ctrl):
        # this function will be called when the annotation session ends
        session_dataset = ctrl.db.get_dataset(ctrl.session_id)
        print("This session annotated {} examples".format(len(session_dataset)))

    return {
        'dataset': dataset,  # all annotations will still be saved to the same dataset
        'stream': stream,
        'update': update,
        'on_load': on_load,
        'on_exit': on_exit,
        # other stuff here
    }
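
The annotation queue provider itself isn't shown above. As a rough, in-memory illustration of the two queues, here's a sketch using only the standard library. Everything in it is an assumption for demonstration purposes: the function name `run_annotation_round`, the annotator names, and the fact that each "annotator" simply accepts every task. A real provider would sit behind a service and hand tasks to separate Prodigy sessions.

```python
import queue
import threading


def run_annotation_round(tasks, annotators):
    """Distribute tasks to several consumers and collect their answers."""
    forward = queue.Queue()   # single producer -> multiple consumers
    backward = queue.Queue()  # multiple producers -> single consumer

    # The single producer fills the forward queue up front.
    for task in tasks:
        forward.put(task)

    def annotate(name):
        # Each annotator takes tasks until the forward queue is empty,
        # so no task is handed out twice.
        while True:
            try:
                task = forward.get_nowait()
            except queue.Empty:
                return
            # Stand-in for a human decision: always "accept".
            backward.put({**task, "answer": "accept", "annotator": name})

    workers = [threading.Thread(target=annotate, args=(name,)) for name in annotators]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    # The single consumer drains the backward queue into one result list.
    results = []
    while not backward.empty():
        results.append(backward.get())
    return results


results = run_annotation_round(
    [{"text": "a"}, {"text": "b"}, {"text": "c"}],
    ["alice", "bob"],
)
```

The point of the forward queue is that each task goes to exactly one annotator, while the backward queue funnels all answers back into one place, e.g. the shared dataset.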

If you're using a built-in recipe, you can also import it and wrap it in your custom recipe. Prodigy recipes are simple Python functions that return a dictionary of components, so you can call them with the recipe arguments, receive back a dictionary, modify it and return it from your custom recipe. See my comment here for more background on this. (The example shows how to overwrite the database component, but of course, the same strategy works for overwriting the stream etc.)
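
Here's a small sketch of that wrapping pattern. To keep it runnable without Prodigy installed, `builtin_recipe` is a hypothetical stand-in for whichever built-in recipe function you'd import, and its arguments and returned keys are assumptions. In a real setup, `custom_recipe` would also carry the `@prodigy.recipe` decorator.

```python
def builtin_recipe(dataset, source):
    # Hypothetical stand-in for an imported built-in recipe: it returns
    # a dictionary of components, like any other recipe function.
    return {
        "dataset": dataset,
        "stream": ({"text": line} for line in source),
        "view_id": "classification",
    }


def custom_recipe(dataset, source):
    # 1. Execute the wrapped recipe with the recipe arguments.
    components = builtin_recipe(dataset, source)
    original_stream = components["stream"]

    # 2. Modify the component you care about, here the stream.
    def tagged_stream():
        for task in original_stream:
            task["meta"] = {"source": "custom wrapper"}
            yield task

    components["stream"] = tagged_stream()
    # 3. Return the modified dictionary from the custom recipe.
    return components


components = custom_recipe("my_dataset", ["one", "two"])
first_task = next(components["stream"])
```

All components you don't touch (the dataset and view ID in this sketch) pass through unchanged.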

You can find more details on custom recipes and the controller and database API in the PRODIGY_README.html.

Btw, when creating multiple Prodigy sessions programmatically, keep in mind that the session ID is generated from the current timestamp (up to seconds). This means that you may see an error if you're trying to start two sessions within the same second – see this thread for more details. (The upcoming version of Prodigy will include a hook that lets you customise the session ID behaviour.)
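
One possible workaround is to stagger the session starts so that no two fall within the same second. The sketch below is only an illustration of that idea: `wait_for_new_second` is a made-up helper, and how you actually launch each session is left out.

```python
import time


def wait_for_new_second(previous_start, now=time.time, sleep=time.sleep):
    """Block until the clock shows a later whole second than previous_start.

    Hypothetical helper; `now` and `sleep` can be swapped out for testing.
    """
    current = now()
    while previous_start is not None and int(current) <= int(previous_start):
        # Sleep for the remainder of the current second, then check again.
        sleep(int(previous_start) + 1 - current)
        current = now()
    return current


started = []
last_start = None
for name in ["alice", "bob"]:
    last_start = wait_for_new_second(last_start)
    started.append((name, int(last_start)))
    # start the Prodigy session for `name` here
```

This adds a delay of at most one second per session.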
