Setting session name as a config or CLI option

kinghuang · January 31, 2020, 6:56pm

Is it possible to set a session name via a config or CLI option? I have a team where each person is running Prodigy locally, but I'd still like to set the session so that we can trace where answers are coming from when we combine datasets.

I know that the session can be set with the ?session=name query parameter. But it would be nice to preset the session to a static value for these individually run instances of Prodigy.

ines · February 1, 2020, 11:43am

Ah, so if I understand the use case correctly, you want to attach additional meta info to the data you're annotating that's preserved when you merge datasets?

Recipes support a get_session_id function that was initially added to override the default timestamp session IDs (e.g. if you're starting Prodigy instances programmatically and end up with multiple sessions per second). So in a custom recipe, you could add a command-line argument for the annotator name and then include this in the components your recipe returns:

"get_session_id": lambda: annotator_name

However, in that case, you might as well keep the automatic timestamp session ID and add your metadata to each example in the stream before you send it out for annotation. Any custom properties added to the annotation tasks will be passed through and saved in the database.

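from prodigy.components.loaders import JSONL

def add_meta_to_stream(stream):
    for eg in stream:
        eg["annotator_name"] = annotator_name
        yield eg  # without this, the function returns None instead of a stream

stream = JSONL(source)  # or whatever
stream = add_meta_to_stream(stream)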
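
Once every saved example carries an annotator_name property, tracing answers after combining datasets could look roughly like the sketch below. It uses only the standard library and a hand-written list standing in for a merged export; the example records and the count_by_annotator helper are made up for illustration and are not part of Prodigy.

```python
from collections import Counter

# Hypothetical merged export: each saved example keeps the custom
# "annotator_name" property next to the "answer" it was given.
merged = [
    {"text": "first", "answer": "accept", "annotator_name": "king"},
    {"text": "second", "answer": "reject", "annotator_name": "alex"},
    {"text": "third", "answer": "accept", "annotator_name": "king"},
]


def count_by_annotator(examples):
    """Count examples per annotator; examples without the field are
    grouped under "unknown"."""
    return Counter(eg.get("annotator_name", "unknown") for eg in examples)


counts = count_by_annotator(merged)
print(dict(counts))
```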

A downside of this approach is that you need to write a custom recipe, or at least wrap an existing recipe function so you can add your custom arguments and logic. And you need to edit it if you ever want to add more metadata (like an internal project ID etc).

A more elegant approach I can think of: use a custom loader script that takes command-line arguments and adds the annotator name (and any other metadata) to the stream, then pipe that forward into Prodigy. All recipes that take an input source can also read from standard input. So you could write a custom loader script like this:

# loader.py
import json
import sys
from prodigy.components.loaders import JSONL

filename = sys.argv[1]  # rudimentary arg parsing
username = sys.argv[2]
examples = JSONL(filename)
for eg in examples:
    eg["annotator_name"] = username
    # write one JSON object per line so the receiving end can parse it
    print(json.dumps(eg))

And then call it like this – the - source value tells Prodigy to read from standard input, i.e. the data you're piping forward:

python loader.py ./data.jsonl king | prodigy ner.manual your_dataset en_core_web_sm - --label ONE,TWO

This will now stream in the data and add "annotator_name": "king" to all examples that come in. If you ever want to add more metadata, you can modify your loader and take more arguments. You could also read from environment variables or somewhere else – this really depends on what you prefer.
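
For the environment variable option, a minimal sketch could look like this. It assumes a variable called ANNOTATOR_NAME and a script called loader_env.py (both names are made up for this example), and it reads the JSONL file with the standard library only, so it doesn't depend on Prodigy's loaders.

```python
# loader_env.py (hypothetical file name)
import json
import os
import sys


def add_annotator(lines, annotator_name):
    """Yield JSON strings with an added "annotator_name" property.

    Blank lines in the input are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        eg = json.loads(line)
        eg["annotator_name"] = annotator_name
        yield json.dumps(eg)


def main(filename):
    # ANNOTATOR_NAME is an assumed variable name; fall back to "unknown"
    annotator = os.environ.get("ANNOTATOR_NAME", "unknown")
    with open(filename, encoding="utf8") as f:
        for out in add_annotator(f, annotator):
            print(out)


# Only runs when the script is called with a file path
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

You'd then call it the same way as before, just with the name coming from the environment: ANNOTATOR_NAME=king python loader_env.py ./data.jsonl | prodigy ner.manual your_dataset en_core_web_sm - --label ONE,TWO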

kinghuang · February 3, 2020, 5:15pm

That's really interesting! I didn't know it was possible to attach arbitrary metadata.

The idea of the custom loader fits well with the design of some Kafka stream processors here. Thanks for the detailed response!
