Labels in mark, and multiuser access to prodigy

Good job on the great product; your hard work is clearly paying off :blush:

First

I want to know if I can change the visual look of the mark recipe. With textcat, I can clearly see the “label” above the plain text in the box. How can I do the same for mark, so that the label is shown on top of the box and is also saved with the annotations?

Second

Is it possible to access the same Prodigy dataset from other PCs and annotate the same .jsonl file, using any Prodigy recipe? That is, I run Prodigy on my PC with the host 0.0.0.0 so that other PCs can access it from the browser. In my case, I want multiple users to access the same web application and annotate from the same folder without being served the same examples, and in the end have all annotations saved to one dataset.

Thanks for your feedback – and nice to hear that you like Prodigy so far! :grinning:

You should be able to both set a --label and also an optional --view-id on the command line when using the mark recipe. If a label is provided, it will also be added to the annotation examples. For example:

prodigy mark my_dataset my_data.jsonl --label LABEL --view-id classification

The --view-id tells Prodigy to use the “classification” interface, i.e. display the label on top and render the example content underneath.

As for multi-user access: yes, that’s definitely possible! Matt’s comment on this thread goes into more detail and outlines a possible strategy for using Prodigy with multiple annotators, including how to structure a “single producer, multiple consumer” forward queue and a “multiple producer, single consumer” backward queue.

The easiest way to implement this would be to create a simple, custom recipe that orchestrates the whole thing. Here’s some pseudocode to illustrate the concept:

import prodigy
import requests

@prodigy.recipe('multi-annotator')
def multi_annotator_manager(dataset):
    # for simplicity, let's assume you're using a REST API – of course, you
    # might want to solve this more elegantly
    SERVICE = 'http://your-annotation-queue-provider'
    # get a "unique" stream for the session via your annotation queue provider
    # (this assumes the service responds with a JSON list of task dicts)
    stream = requests.get(SERVICE).json()

    def update(examples):
        # this function will be called every time Prodigy receives a batch of
        # annotated tasks back from the client – instead of updating the model,
        # you can also use it to update your provider
        # (sent as a JSON body rather than form-encoded data)
        requests.post(SERVICE, json=examples)

    def on_load(ctrl):
        # this function will be called when the service starts – the controller
        # also gives you access to the database via ctrl.db in case you need it
        existing_annotations = ctrl.db.get_dataset(dataset)
        print("There are {} annotations in the set".format(len(existing_annotations)))

    def on_exit(ctrl):
        # this function will be called when the annotation session ends
        session_dataset = ctrl.db.get_dataset(ctrl.session_id)
        print("This session annotated {} examples".format(len(session_dataset)))

    return {
        'dataset': dataset,  # all annotations will still be saved to the same dataset
        'stream': stream,
        'update': update,
        'on_load': on_load,
        'on_exit': on_exit
        # other stuff here
    }
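
If it helps, here’s a rough, self-contained sketch of what the queue logic behind such an annotation queue provider could look like. The class and method names (AnnotationQueues, next_batch, submit, drain) are made up for illustration and aren’t part of Prodigy. It keeps everything in memory using Python’s queue module, whereas a real setup would sit behind a service and persist its state.

```python
import queue


class AnnotationQueues:
    """Hypothetical in-memory stand-in for an annotation queue provider."""

    def __init__(self, tasks):
        # forward queue: filled once here, consumed by many annotator sessions
        self.forward = queue.Queue()
        for task in tasks:
            self.forward.put(task)
        # backward queue: filled by many annotator sessions, consumed in one place
        self.backward = queue.Queue()

    def next_batch(self, size):
        """Hand out up to `size` tasks; each task goes to exactly one caller."""
        batch = []
        while len(batch) < size:
            try:
                batch.append(self.forward.get_nowait())
            except queue.Empty:
                break
        return batch

    def submit(self, annotated):
        """Called by any annotator session with its answered tasks."""
        for example in annotated:
            self.backward.put(example)

    def drain(self):
        """Single consumer: collect everything that has been submitted so far."""
        collected = []
        while True:
            try:
                collected.append(self.backward.get_nowait())
            except queue.Empty:
                return collected
```

In this sketch, each session would build its stream from next_batch, pass its answers to submit from the update callback, and one process would call drain to collect the results. Because queue.Queue is thread-safe, two sessions asking at the same time never receive the same task.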

If you’re using a built-in recipe, you can also import it and wrap it in your custom recipe. Prodigy recipes are simple Python functions that return a dictionary of components, so you can call them with the recipe arguments, receive back a dictionary, modify it and return it from your custom recipe. See my comment here for more background on this. (The example shows how to overwrite the database component, but of course, the same strategy works for overwriting the stream etc.)
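
To make the wrapping idea a bit more concrete, here’s a minimal sketch of the pattern in plain Python. Note that builtin_recipe is only a hypothetical stand-in so the example runs on its own; in practice you’d import the actual recipe function from Prodigy and register custom_recipe with the @prodigy.recipe decorator. The dedupe helper is likewise just an illustration of a replacement stream.

```python
def builtin_recipe(dataset, source):
    """Hypothetical stand-in for a built-in recipe function."""
    stream = ({"text": line} for line in source)
    return {"dataset": dataset, "stream": stream, "view_id": "text"}


def dedupe(stream):
    """Example replacement stream: skip texts that have already been seen."""
    seen = set()
    for task in stream:
        if task["text"] not in seen:
            seen.add(task["text"])
            yield task


def custom_recipe(dataset, source):
    # call the wrapped recipe with the recipe arguments to get its components
    components = builtin_recipe(dataset, source)
    # overwrite whichever component you need, in this case the stream
    components["stream"] = dedupe(components["stream"])
    # hand the modified dictionary back, just like any other recipe would
    return components
```

All other components (dataset, view ID and so on) are passed through untouched, so the wrapped recipe keeps behaving as before apart from the part you replaced.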

You can find more details on custom recipes and the controller and database API in the PRODIGY_README.html.

Btw, when creating multiple Prodigy sessions programmatically, keep in mind that the session ID is generated from the current timestamp (up to seconds). This means that you may see an error if you’re trying to start two sessions within the same second – see this thread for more details. (The upcoming version of Prodigy will include a hook that lets you customise the session ID behaviour.)
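
As a small illustration of why this happens: any ID derived from a timestamp with second-level resolution will be identical for two sessions started within the same second. The format string below is an assumption chosen for the example, not necessarily the exact format Prodigy uses.

```python
from datetime import datetime, timedelta


def session_id(now):
    # assumed format, only meant to illustrate the second-level resolution
    return now.strftime("%Y-%m-%d_%H-%M-%S")


first = datetime(2018, 1, 1, 12, 0, 0, 100000)
second = first + timedelta(milliseconds=500)  # started within the same second
third = first + timedelta(seconds=1)          # started one second later

same_second_clash = session_id(first) == session_id(second)
one_second_apart_clash = session_id(first) == session_id(third)
```

So if you’re launching sessions from a script, spacing the starts at least a second apart should sidestep the clash for now.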

Thank you for the detailed reply, it helped a lot :blush:

I’d like to use this workflow to add some new entities that weren’t in my original gold set, but I’m running into an issue. I have exported my original dataset to ./annotations/ner_gold/ner_gold_person.jsonl (via prodigy db-out), and when I run the following:

prodigy mark ner_gold ./annotations/ner_gold/ner_gold_person.jsonl --view-id ner_manual --label "PERSON,COLLEGE,LOC"

I get mostly the interface I expect, except that instead of the 3 labels I asked for, I get a single one that says NO_LABEL. ner_gold was an empty dataset, so I wondered if adding a single gold record via ner.make-gold would help. ner.make-gold worked as expected, but prodigy mark still had the NO_LABEL behavior.

I think my version is the newest: prodigy-1.5.1-cp35.cp36-cp35m.cp36m-macosx_10_13_x86_64.whl. Am I missing a step here?

Ah, the way this works is a bit unideal, sorry! What’s happening here is that in the manual mode, the config (the recipe config or, in theory, prodigy.json) also needs to expose a 'labels' setting with a list of available labels for the menu. The mark recipe doesn’t set this from --label, which is why you only see NO_LABEL.
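
Just to show what that would mean in a custom recipe, here’s a rough sketch that takes the comma-separated label string and puts it into the config of a recipe’s components. The helper name add_label_menu is made up, and the sketch assumes the components dictionary carries its settings under a 'config' key.

```python
def add_label_menu(components, label_arg):
    """Expose the labels to the manual interface via the recipe config."""
    # "PERSON,COLLEGE,LOC" becomes a list of label names
    labels = [label.strip() for label in label_arg.split(",") if label.strip()]
    # copy the existing config (if any) so the original dict isn't mutated
    config = dict(components.get("config", {}))
    config["labels"] = labels
    return dict(components, config=config)


components = {"dataset": "ner_gold", "view_id": "ner_manual"}
components = add_label_menu(components, "PERSON,COLLEGE,LOC")
```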

You could just use ner.manual instead, though? This should do exactly what you need, and even give you more settings and options.

Yep, confirmed, thanks! Any way to get an --unsegmented variant into there too? A lot of my entities (e.g. addresses) are ending up split across segments.

Hmm, ner.manual shouldn’t segment sentences at all. Is it possible that the data you’re loading in is already segmented? Maybe from a previous annotation session?

You’re absolutely right; thanks and sorry for the noise!
