Saving out annotations by session ID

ainoyatov · April 20, 2020, 10:26pm

Hi @ines, hope this message finds you well.

My use case:

We have our own machine learning model that currently accepts data in .csv format for training.
We first extract data from this ML model, convert it to JSONL format and feed it to the stream.
Then, using Prodigy, we re-annotate, export a JSONL file, convert it to .csv and feed it back to our ML model.

With a custom recipe, everything is working as intended. But now I want to use PRODIGY_ALLOWED_SESSIONS so that only permitted users can annotate.

The audience is non-technical and won't have access to the terminal. All they will see is the UI.

Most of the work takes place in the on_exit function:

import csv

def on_exit(controller):
    # Load the annotations saved under this session and keep only the accepted ones
    examples = controller.db.get_dataset(controller.session_id)
    examples = [x for x in examples if x["answer"] == "accept"]
    for row in examples:
        bodyList = []
        for span in row.get('spans', []):
            # Collect the text of every token covered by the span (token_end is inclusive)
            tokens = [token['text'] for token in row.get('tokens', [])
                      if span['token_start'] <= token['id'] <= span['token_end']]
            bodyList.append(['col1', 'col2', mergeElements(tokens), 'col4', span['label']])

        # Open the file once per example and append all of its rows
        with open("AnnotatedOn" + controller.session_id + ".csv", 'at', newline='') as csvFile:
            write = csv.writer(csvFile)
            write.writerows(bodyList)

return {
    "view_id": view_id,
    "dataset": dataset,
    "stream": stream,
    "config": {
        "lang": nlp.lang,
        "labels": labels
    },
    "on_exit": on_exit
}

Note: the mergeElements function just merges the annotated terms after a cleanup. It works as expected.

I understand that if I set the environment variable like this:

import os
os.environ["PRODIGY_ALLOWED_SESSIONS"] = 'manchesterUnited'

I need to use the get_session_id function, which takes the controller as its argument.

The ultimate goal is to write the annotator's name into the CSV file name. I will add fileTime = datetime.now().strftime('%b%d_%Y_%H:%M:%S') to distinguish annotated work, which I know I will lose when I use get_session_id().

The goal is to have a file name like:

AnnotatedBy_manchesterUnited_On_fileTime.csv

Can you point me in the right direction? Thank you!

ines · April 21, 2020, 9:03am

Hi! I think the get_session_id function might not be what you want and the solution is actually a bit easier. The get_session_id callback just generates you one session ID on the fly – it was introduced to make it easy to programmatically launch multiple instances of Prodigy, without getting clashes because they're launched within the same millisecond.

If you're using named multi-user sessions, the same controller can have multiple sessions that are added and defined at runtime. The controller.session_id doesn't reflect that – instead, the ID of the session is written to the data.

So if you want to export the data to files at the end of the annotation process, you could just load the dataset and look at the "_session_id" value of each task dict. This will contain the name of the session.
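As a rough sketch of that idea, the snippet below groups task dicts by their "_session_id" value and writes one CSV per session. The helper names (group_by_session, span_rows, export_by_session), the two-column row layout and the "unknown" fallback are assumptions made for illustration, not part of Prodigy. The exact format of the "_session_id" value may also differ between versions, so check what your data actually contains.

```python
import csv
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def group_by_session(examples):
    """Group accepted task dicts by their "_session_id" value."""
    grouped = defaultdict(list)
    for eg in examples:
        if eg.get("answer") == "accept":
            # Assumption: tasks without a session ID are filed under "unknown"
            grouped[eg.get("_session_id", "unknown")].append(eg)
    return dict(grouped)


def span_rows(eg):
    """Yield one (text, label) pair per span; token_end is treated as inclusive."""
    tokens = eg.get("tokens", [])
    for span in eg.get("spans", []):
        words = [t["text"] for t in tokens
                 if span["token_start"] <= t["id"] <= span["token_end"]]
        yield " ".join(words), span["label"]


def export_by_session(examples, out_dir=".", stamp=None):
    """Write one CSV per session and return the paths that were written."""
    # Dashes replace colons in the time because some file systems reject colons
    stamp = stamp or datetime.now().strftime("%b%d_%Y_%H-%M-%S")
    paths = []
    for session, egs in group_by_session(examples).items():
        path = Path(out_dir) / f"AnnotatedBy_{session}_On_{stamp}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            for eg in egs:
                writer.writerows(span_rows(eg))
        paths.append(path)
    return paths
```

Inside on_exit, this could be called with the examples loaded from the recipe's dataset, for instance export_by_session(controller.db.get_dataset(dataset)), assuming the same get_dataset call used in the original post. The plain " ".join stands in for the poster's mergeElements helper.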

(The upcoming version of Prodigy will also have a few more helper functions and properties on the controller to get the names of all currently active sessions or all annotations by session, so you don't have to put that together yourself.)

Setting PRODIGY_ALLOWED_SESSIONS to manchesterUnited will mean that only ?session=manchesterUnited is valid, and accessing the app with any other session name will raise an error. This does not replace authentication or anything, but it can help prevent typos and wrongly attributed annotations.
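A minimal sketch of restricting session names from Python might look like the following. It assumes PRODIGY_ALLOWED_SESSIONS takes a comma-separated list of names and that the variable is set before the Prodigy server starts. The helpers allow_sessions and session_url, the second name "liverpool", and the localhost:8080 address are illustrative assumptions, not part of Prodigy.

```python
import os


def allow_sessions(names):
    """Set PRODIGY_ALLOWED_SESSIONS to a comma-separated list of session names."""
    value = ",".join(names)
    # Assumption: this must run before the Prodigy server is started
    os.environ["PRODIGY_ALLOWED_SESSIONS"] = value
    return value


def session_url(name, host="localhost", port=8080):
    """Build the link an annotator would open (host and port are assumptions)."""
    return f"http://{host}:{port}/?session={name}"


allow_sessions(["manchesterUnited", "liverpool"])
for name in ["manchesterUnited", "liverpool"]:
    print(session_url(name))
```

Each annotator then only needs their own link, which suits a non-technical audience that never sees the terminal.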

ainoyatov · April 21, 2020, 5:06pm

It was under my nose this entire time. Thank you @ines

You guys rock, if I haven't said it already!
