I'm trying to set up an annotation task via a custom recipe, so that each text item has three different multiple-choice questions associated with it. E.g.
"My text here"
Is this True (options: True/False)
Select best option (options: ABC)
Select all that apply (options: XYZ)
I managed this by using custom HTML, but to simplify the recipe and the downstream processing, I was wondering: is something like this possible with blocks, without adding custom HTML or JS to the recipe?
So far I'm having no luck: at best, the questions get conflated with each other, so that you can only select one answer across all the annotation tasks combined (True, False, A, B, C, X, Y or Z).
So: is it possible to have multiple multiple-choice annotation tasks per item, without going all in on custom HTML? A sketch of what I tried is below.
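For reference, here's roughly what I tried (a sketch, simplified; if I understand correctly, the choice interface reads its options from the task itself and stores the selection in the task's single "accept" field, which would explain the collision):

```python
import prodigy

@prodigy.recipe("multi-question-attempt",
                dataset=("Dataset to save to", "positional", None, str))
def multi_question_attempt(dataset: str):
    blocks = [
        {"view_id": "html", "html_template": "{{text}}"},
        # One choice block per question was the idea, but every choice
        # block reads from and writes to the same task-level fields.
        {"view_id": "choice"},
    ]
    stream = [{
        "text": "My text here",
        # All eight options end up in one task-level list, so the three
        # questions behave like a single combined question.
        "options": [{"id": o, "text": o}
                    for o in ["True", "False", "A", "B", "C", "X", "Y", "Z"]],
    }]
    return {
        "dataset": dataset,
        "view_id": "blocks",
        "stream": stream,
        "config": {"blocks": blocks, "choice_style": "single"},
    }
```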
Thanks for your question and welcome to the Prodigy community!
Unfortunately, there's not a built-in way without custom HTML.
That's an interesting use case. I'm curious, could you describe a bit more?
What's the logic for which questions each example gets? Is it fixed, e.g. based on rules over the input text or metadata? Also, are the questions predefined, or can they change over time?
We have dabbled with conditional logic in the past using jinja templates:
But this still requires HTML/JavaScript and may not do much to help.
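For example, something along these lines (a minimal sketch using jinja2 on the Python side; the field names are made up for illustration):

```python
from jinja2 import Template

# Only render the questions whose pre-annotations exist in the entry's meta.
QUESTIONS = Template("""
{% if meta.is_relevant is defined %}<p>Is the text relevant?</p>{% endif %}
{% if meta.best_option is defined %}<p>Select the best option:</p>{% endif %}
""")

html = QUESTIONS.render(meta={"is_relevant": True})
```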
Is there any way you could break up your task into three separate tasks? I understand the hesitancy to do everything in one task, but we've found that a lot of complex decisions that hit dead ends like this can be improved by breaking the task down. No worries if that's not an option, but we'd be happy to brainstorm more if you're curious!
Hi @ryanwesslen, and thanks for the welcome and the answer! I think this will save me from some further headaches, haha!
> That's an interesting use case. I'm curious, could you describe a bit more?
> What's the logic for which questions each example gets? Is it fixed, e.g. based on rules over the input text or metadata? Also, are the questions predefined, or can they change over time?
We're basically using an LLM to perform several classical and less-classical NLP tasks simultaneously, and need to annotate some of the data post hoc to get estimates of the model's performance on each task. So, the questions are the same for each text item.
We considered splitting the annotation tasks apart, but since the texts are quite long, we felt it's better for the annotators (and therefore for the annotation quality) if they only have to read the text once and answer all the questions at once. Having just a single recipe/workflow to deal with also simplifies the related data engineering work.
In any case, custom HTML gets the job done, but extracting the annotators' answers is a bit more work (as they end up stored within an HTML string).
I also just noticed that the custom recipe is not storing the annotators' answers correctly. This might have something to do with how we're showing the LLM's answers by pre-selecting the radio button that corresponds to each one.
If you have any input on how to fix this, I'd be very thankful. Below is a simplified version of the recipe (only the first question is included; the other two essentially repeat the same block of code). The input data is in JSONL format, with each entry having the fields "text" and "meta".
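For reference, a minimal input line looks like this (any further meta fields omitted):

```json
{"text": "My text here", "meta": {"is_relevant": true}}
```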
import json

import prodigy


@prodigy.recipe(
    "review-posts",
    dataset=("The dataset to save to", "positional", None, str),
    source_file_path=("The source file path to load data from", "positional", None, str),
)
def review_doc(dataset: str, source_file_path: str):
    def data_stream():
        with open(source_file_path, "r") as file:
            for entry_str in file:
                # Skip empty or whitespace-only lines
                if not entry_str.strip():
                    continue
                try:
                    entry = json.loads(entry_str)
                    context = entry["text"]  # the main text content
                    # Yield a task for 'is_relevant', pre-selecting the
                    # radio button that matches the LLM's answer
                    if "is_relevant" in entry["meta"]:
                        options_relevant = [True, False]
                        relevant_html = " ".join(
                            f"<input type='radio' name='is_relevant' value='{option}' "
                            f"{'checked' if entry['meta']['is_relevant'] == option else ''}> {option}"
                            for option in options_relevant
                        )
                        yield {
                            "html": f"Context: {context}<br><br>Is the text relevant?<br>{relevant_html}",
                            "meta": {
                                "field_name": "is_relevant",
                                "original_annotation": entry["meta"]["is_relevant"],
                            },
                        }
                except json.JSONDecodeError:
                    print(f"Failed to decode JSON: {entry_str}")
                    continue

    return {
        "dataset": dataset,
        "view_id": "html",
        "stream": data_stream(),
    }
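(In case it helps anyone reading along: my current suspicion is that the html view never reads the state of raw `<input>` elements back into the task on its own, so the pre-checked radios are effectively display-only. If I understand the docs right, the selection has to be pushed into the task explicitly with custom JavaScript via `window.prodigy.update`, roughly like this:)

```python
# A sketch of the fix I'm considering: listen for changes on the radio
# group and write the chosen value into the current task, so it gets
# saved along with the annotation.
JAVASCRIPT = """
document.addEventListener('change', (event) => {
    if (event.target.name === 'is_relevant') {
        window.prodigy.update({is_relevant: event.target.value})
    }
})
"""

# ...and in the recipe's return value:
# return {
#     "dataset": dataset,
#     "view_id": "html",
#     "stream": data_stream(),
#     "config": {"javascript": JAVASCRIPT},
# }
```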