Setting session name as a config or CLI option

ines · February 1, 2020, 11:43am

Ah, so if I understand the use case correctly, you want to attach additional meta info to the data you're annotating that's preserved when you merge datasets?

Recipes support a get_session_id function that was initially added to override the default timestamp session IDs (e.g. if you're starting Prodigy instances programmatically and end up with multiple sessions per second). So in a custom recipe, you could add a command-line argument for the annotator name and then return it as the session ID in the dictionary your recipe returns:

"get_session_id": lambda: annotator_name

However, in that case, you might as well keep the automatic timestamp session ID and add your meta data to each example in the stream before you send it out for annotation. Any custom properties added to the annotation tasks will be passed through and saved in the database.

def add_meta_to_stream(stream):
    for eg in stream:
        eg["annotator_name"] = annotator_name
        yield eg  # pass the modified example on, otherwise the stream is empty

stream = JSONL(source)  # or whatever
stream = add_meta_to_stream(stream)

A downside of this approach is that you need to write a custom recipe, or at least wrap an existing recipe function so you can add your custom arguments and logic. And you need to edit it if you ever want to add more meta data (like an internal project ID etc).
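If you do go the recipe route, one way to avoid touching the function for every new field is to pass the metadata in as keyword arguments. This is only a sketch: the `**meta` signature and the `project_id` field are assumptions made for illustration, not something built into Prodigy, and the stream here is a plain list of dicts standing in for whatever your loader returns.

```python
def add_meta_to_stream(stream, **meta):
    """Yield every task with the given metadata fields merged in."""
    for eg in stream:
        # Build a new dict so the incoming examples are left untouched
        yield {**eg, **meta}


# Stand-in for a real stream of annotation tasks
tasks = [{"text": "First example"}, {"text": "Second example"}]
tagged = list(
    add_meta_to_stream(tasks, annotator_name="king", project_id="demo-1")
)
```

Adding another field later then only changes the call, not the generator itself.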

A more elegant approach I can think of: use a custom loader script that takes command-line arguments and adds the annotator name (and any other metadata) to the stream, then pipe that forward into Prodigy. All recipes that take an input source can also read from standard input. So you could write a custom loader script like this:

# loader.py
import json
import sys
from prodigy.components.loaders import JSONL

filename = sys.argv[1]  # rudimentary arg parsing
username = sys.argv[2]
examples = JSONL(filename)
for eg in examples:
    eg["annotator_name"] = username
    # print valid JSON, one example per line (print(eg) would output a Python dict repr)
    print(json.dumps(eg))

And then call it like this – the - source value tells Prodigy to read from standard input, i.e. the data you're piping forward:

python loader.py ./data.jsonl king | prodigy ner.manual your_dataset en_core_web_sm - --label ONE,TWO

This will now stream in the data and add "annotator_name": "king" to all examples that come in. If you ever want to add more meta, you can modify your loader and take more arguments. You could also read from environment variables or somewhere else – this really depends on what you prefer.
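As a rough sketch of the "more arguments or environment variables" idea, here is a loader variant that only uses the standard library. The `--annotator` and `--meta KEY=VALUE` options and the `ANNOTATOR_NAME` environment variable are names I made up for this example, so adjust them to whatever fits your setup.

```python
import argparse
import json
import os


def add_meta(lines, meta):
    """Parse JSONL lines and yield each record as a JSON string with meta merged in."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        eg = json.loads(line)
        eg.update(meta)
        yield json.dumps(eg)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Add metadata to a JSONL stream")
    parser.add_argument("filename")
    # Fall back to the (assumed) ANNOTATOR_NAME environment variable
    parser.add_argument("--annotator", default=os.environ.get("ANNOTATOR_NAME"))
    # Repeatable option for extra fields, e.g. --meta project_id=demo-1
    parser.add_argument("--meta", action="append", default=[], metavar="KEY=VALUE")
    args = parser.parse_args(argv)
    if not args.annotator:
        parser.error("pass --annotator or set ANNOTATOR_NAME")

    meta = {"annotator_name": args.annotator}
    for pair in args.meta:
        key, _, value = pair.partition("=")
        meta[key] = value

    with open(args.filename, encoding="utf8") as f:
        for out in add_meta(f, meta):
            print(out)  # one JSON object per line on standard output
```

To use it as a script, call `main()` under the usual `if __name__ == "__main__":` guard and pipe the output into Prodigy with `-` as the source, exactly as with the loader above.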
