Ah, so if I understand the use case correctly, you want to attach additional meta info to the data you're annotating that's preserved when you merge datasets?
Recipes support a get_session_id function that was initially added to override the default timestamp session IDs (e.g. if you're starting Prodigy instances programmatically and end up with multiple sessions per second). So in a custom recipe, you could add a command-line argument for the annotator name and then return it as the session ID in the components your recipe returns:
"get_session_id": lambda: annotator_name
However, in that case, you might as well keep the automatic timestamp session ID and add your metadata to each example in the stream before you send it out for annotation. Any custom properties added to the annotation tasks will be passed through and saved in the database.
def add_meta_to_stream(stream):
    for eg in stream:
        eg["annotator_name"] = annotator_name
        # yield the modified task, otherwise the stream would be empty
        yield eg

stream = JSONL(source)  # or whatever
stream = add_meta_to_stream(stream)
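The generator above picks up annotator_name from the surrounding scope. As a small variation, here's a sketch that passes the metadata in explicitly, so the same wrapper can add several fields at once. It uses plain dicts as a stand-in for a real Prodigy stream, and make_meta_adder, project_id and the example values are made up for illustration:

```python
def make_meta_adder(**meta):
    """Return a function that wraps a stream and adds the given keys to each example."""
    def add_meta_to_stream(stream):
        for eg in stream:
            # existing keys on the task are kept; meta keys are added or overwritten
            eg.update(meta)
            yield eg
    return add_meta_to_stream


# hypothetical values, e.g. parsed from the recipe's command-line arguments
add_meta = make_meta_adder(annotator_name="king", project_id="internal-42")

# stand-in for a loaded stream such as JSONL(source)
stream = [{"text": "First example"}, {"text": "Second example"}]
annotated = list(add_meta(stream))
print(annotated[0])
```

Since the wrapper is still a generator, examples are only modified as they're consumed, so this should work the same way for large input files.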
A downside of this approach is that you need to write a custom recipe, or at least wrap an existing recipe function, so you can add your custom arguments and logic. You'd also need to edit it whenever you want to add more metadata (like an internal project ID).
A more elegant approach I can think of: use a custom loader script that takes command-line arguments and adds the annotator name (and any other metadata) to the stream, then pipe that forward into Prodigy. All recipes that take an input source can also read from standard input. So you could write a custom loader script like this:
# loader.py
import json
import sys
from prodigy.components.loaders import JSONL

filename = sys.argv[1]  # rudimentary arg parsing
username = sys.argv[2]
examples = JSONL(filename)
for eg in examples:
    eg["annotator_name"] = username
    # print one JSON object per line so it can be parsed from standard input
    print(json.dumps(eg))
And then call it like this (the - source value tells Prodigy to read from standard input, i.e. the data you're piping forward):
python loader.py ./data.jsonl king | prodigy ner.manual your_dataset en_core_web_sm - --label ONE,TWO
This will now stream in the data and add "annotator_name": "king" to all examples that come in. If you ever want to add more metadata, you can modify your loader and take more arguments. You could also read from environment variables or somewhere else, depending on what you prefer.
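To illustrate the "more arguments" and environment variable ideas, here's a rough sketch of a more flexible loader. It only uses the standard library and reads the JSONL file itself instead of going through Prodigy's loader. The file name loader_meta.py, the --meta KEY=VALUE option and the ANNOTATOR_NAME variable are all assumptions for the sake of the example, not anything Prodigy defines:

```python
# loader_meta.py (hypothetical name)
import argparse
import json
import os
import sys


def parse_meta(pairs):
    """Turn ["key=value", ...] into a dict of metadata."""
    meta = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        meta[key] = value
    return meta


def add_meta(lines, meta):
    """Yield one JSON string per non-empty input line, with meta merged in."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        eg = json.loads(line)
        eg.update(meta)
        yield json.dumps(eg)


def main(argv):
    parser = argparse.ArgumentParser(description="Add metadata to a JSONL stream")
    parser.add_argument("filename")
    parser.add_argument("--meta", nargs="*", default=[], metavar="KEY=VALUE")
    args = parser.parse_args(argv)
    meta = parse_meta(args.meta)
    # ANNOTATOR_NAME is an assumed variable name, used only if not set via --meta
    env_name = os.environ.get("ANNOTATOR_NAME")
    if env_name and "annotator_name" not in meta:
        meta["annotator_name"] = env_name
    with open(args.filename, encoding="utf8") as f:
        for out in add_meta(f, meta):
            print(out)


# only run as a script when arguments are given
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

You'd then pipe it forward the same way as before, e.g. python loader_meta.py ./data.jsonl --meta annotator_name=king project_id=42 | prodigy ner.manual your_dataset en_core_web_sm - --label ONE,TWO. Note that values passed this way all end up as strings.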