Hi @ines, hope this message finds you well.
My use case:
- We have our own machine learning model that currently accepts training data in .csv format
- We first extract data from this ML model, convert it to JSONL and feed it to the stream
- Then, using Prodigy, we re-annotate, export a JSONL file, convert it to .csv and feed it back to our ML model.
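To make the conversion step above concrete, here is a minimal sketch of turning CSV rows into JSONL lines for the stream. The column name `text` and the helper name `csv_to_jsonl` are assumptions for illustration only; the real column layout of our CSV is not shown in this post.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text, text_column="text"):
    """Convert CSV content into JSONL, one {"text": ...} task per row.

    `text_column` is a hypothetical column name; adjust it to the real CSV.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each output line is a standalone JSON object with a "text" key.
    lines = [json.dumps({"text": row[text_column]}) for row in reader]
    return "\n".join(lines)


print(csv_to_jsonl("text,label\nhello world,A\n"))
```

Each resulting line is one task dictionary with a `"text"` key; any extra columns could be carried along under a `"meta"` key if they are needed later.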
With a custom recipe, everything is working as intended. But now I want to use PRODIGY_ALLOWED_SESSIONS so that only permitted users can annotate.
The annotators are non-technical and won't have access to the terminal. All they will see is the UI.
Most of the work takes place in the on_exit function:
def on_exit(controller):
    examples = controller.db.get_dataset(controller.session_id)
    examples = [x for x in examples if x["answer"] == "accept"]
    for row in examples:
        bodyList = []
        for span in row.get('spans'):
            raw_annotations = {span['label']: [token['text'] for token in row.get('tokens') if
                                               (token['id'] in range(span['token_start'], span['token_end'] + 1))]}
            for k, v in raw_annotations.items():
                bodyList.append(
                    ['col1', 'col2', mergeElements(v), 'col4', k])
        for line in bodyList:
            with open("AnnotatedOn" + controller.session_id + ".csv", 'at') as csvFile:
                write = csv.writer(csvFile)
                write.writerow(line)
return {
    "view_id": view_id,
    "dataset": dataset,
    "stream": stream,
    "config": {
        "lang": nlp.lang,
        "labels": labels
    },
    "on_exit": on_exit
}
Note: the mergeElements function just joins the annotated tokens after some clean-up. It works as expected.
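For reference, the span-to-row logic in on_exit can be sketched as a small standalone function. This is only an illustration under assumptions: `spans_to_rows` is a hypothetical helper name, and `" ".join` stands in for mergeElements, whose real clean-up logic is not shown here. As in the code above, `token_end` is treated as inclusive.

```python
def spans_to_rows(example, merge=" ".join):
    """Build one CSV row per annotated span in a single example.

    `merge` is a stand-in for mergeElements (assumption: it takes a list
    of token texts and returns one string).
    """
    tokens = example.get("tokens") or []
    rows = []
    # `or []` guards against examples that have no "spans" key at all.
    for span in example.get("spans") or []:
        words = [
            token["text"]
            for token in tokens
            if span["token_start"] <= token["id"] <= span["token_end"]
        ]
        rows.append(["col1", "col2", merge(words), "col4", span["label"]])
    return rows


example = {
    "tokens": [
        {"id": 0, "text": "New"},
        {"id": 1, "text": "York"},
        {"id": 2, "text": "rocks"},
    ],
    "spans": [{"token_start": 0, "token_end": 1, "label": "GPE"}],
}
print(spans_to_rows(example))
```

Collecting the rows first would also allow the CSV file to be opened once and written with a single `writer.writerows(rows)` call, instead of reopening the file for every line.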
I understand that if I set the environment variable, for example
import os
os.environ["PRODIGY_ALLOWED_SESSIONS"] = 'manchesterUnited'
I need to use the get_session_id function, which takes the controller as its argument.
The ultimate goal is to write the annotator's name into the CSV file name. I will add fileTime = datetime.now().strftime('%b%d_%Y_%H:%M:%S')
to tell annotation runs apart, since I know I will lose the timestamp when I use get_session_id()
The goal is to have a file name like:
AnnotatedBy_manchesterUnited_On_fileTime.csv
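A minimal sketch of how that file name could be assembled, assuming the annotator's name is already available as a plain string. The helper name `build_csv_name` and its parameters are hypothetical; how to obtain the name from the controller is exactly the open question.

```python
from datetime import datetime


def build_csv_name(annotator, when=None, time_format="%b%d_%Y_%H:%M:%S"):
    """Return a file name like AnnotatedBy_<annotator>_On_<timestamp>.csv.

    `annotator` is assumed to be the session name as a plain string.
    """
    when = when or datetime.now()
    file_time = when.strftime(time_format)
    return f"AnnotatedBy_{annotator}_On_{file_time}.csv"


print(build_csv_name("manchesterUnited"))
```

One caveat: the `:` characters produced by `%H:%M:%S` are not allowed in Windows file names, so a format such as `%H-%M-%S` may be safer if the files are ever opened on Windows.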
Can you point me in the right direction? Thank you!