Hi all,
I need to load labels from a file, and the labels are different for every example.
Any ideas on how to do this?
Could you give a tangible example of the task that you're trying to perform? For the choice interface you should be able to add the options per example in a custom recipe. Is that what you're referring to?
I have a source text and 3 candidates of translation. I want to annotate the best translation.
The point is that I don't know if it's possible to pass translations as labels. In the worst case I can show the translations with corresponding numbers, and the labels will be 1, 2 and 3.
I wrote a quick custom recipe based on the example in the docs here.
import prodigy
from prodigy.components.loaders import JSONL


@prodigy.recipe(
    "translation",
    dataset=("The dataset to save to", "positional", None, str),
)
def translation(dataset):
    """Pick the best translation for each text using the choice interface."""
    stream = ({"text": f"hello there, this is text #{i}"} for i in range(10))
    stream = add_options(stream)  # add options to each task
    return {
        "dataset": dataset,  # save annotations in this dataset
        "view_id": "choice",  # use the choice interface
        "stream": stream,
    }


def add_options(stream):
    # Helper function to add options to every task in a stream
    options = [
        {"id": "A", "text": "hallo daaro"},
        {"id": "B", "text": "hi daar"},
        {"id": "C", "text": "hoi"},
    ]
    for task in stream:
        task["options"] = options
        yield task
You'll notice that it's programmatically adding translation options in the add_options function. This results in a user interface that looks like this.
When I annotate a single example, here's what prodigy db-out shows me:
{
  "text": "hello there, this is text #0",
  "options": [
    {
      "id": "A",
      "text": "hallo daaro"
    },
    {
      "id": "B",
      "text": "hi daar"
    },
    {
      "id": "C",
      "text": "hoi"
    }
  ],
  "_input_hash": -1981681889,
  "_task_hash": -756079975,
  "_view_id": "choice",
  "config": {
    "choice_style": "single"
  },
  "accept": [
    "B"
  ],
  "answer": "accept",
  "_timestamp": 1685696545
}
Notice how the annotated example shows that the option with id B was selected and that the text for this option is stored as well? Isn't this what you'd need? Feel free to elaborate if I'm misunderstanding.
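If it helps, here is a minimal sketch of how the chosen translation could be recovered from an exported record like the one above. It only uses the standard library, and the helper name selected_texts is made up for this illustration; it is not part of Prodigy.

```python
import json


def selected_texts(example):
    """Return the text of every option whose id is listed under "accept"."""
    # Build a lookup from option id to option text for this one example
    text_by_id = {opt["id"]: opt["text"] for opt in example.get("options", [])}
    return [text_by_id[i] for i in example.get("accept", []) if i in text_by_id]


# A trimmed copy of the annotated record shown above
record = json.loads(
    """
    {
      "text": "hello there, this is text #0",
      "options": [
        {"id": "A", "text": "hallo daaro"},
        {"id": "B", "text": "hi daar"},
        {"id": "C", "text": "hoi"}
      ],
      "accept": ["B"],
      "answer": "accept"
    }
    """
)

print(selected_texts(record))
```

Because the options are stored with each annotation, the lookup works even when every example has its own set of options.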
Thank you @koaning.
In the example you provided, you add the same options to every example: "hallo daaro", "hi daar" and "hoi" will be shown for every example. What I want to do is add different option texts for every example.
I might change the logic in add_options to reflect that. It's a bit of an arbitrary example, but let's add an index value to the stream and change the option value based on that.
import prodigy
from prodigy.components.loaders import JSONL


@prodigy.recipe(
    "translation",
    dataset=("The dataset to save to", "positional", None, str),
)
def translation(dataset):
    """Pick the best translation for each text using the choice interface."""
    stream = ({"text": f"hello there, this is text #{i}", "i": i} for i in range(10))
    stream = add_options(stream)  # add options to each task
    return {
        "dataset": dataset,  # save annotations in this dataset
        "view_id": "choice",  # use the choice interface
        "stream": stream,
    }


def add_options(stream):
    # Helper function to add options to every task in a stream
    for task in stream:
        options = [
            {"id": "A", "text": "hallo daaro"},
            {"id": "B", "text": "hi daar"},
            {"id": "C", "text": "hoi"},
            {"id": "D", "text": "hai"},
        ]
        # Just to make it explicit, you can do anything with custom code here.
        # Use the task's own index so that the options differ per example:
        # even tasks keep B and D, odd tasks keep A and C.
        options = [o for j, o in enumerate(options) if (j + task["i"]) % 2]
        # In this example I'm doing something fairly arbitrary, but you might also
        # add custom translation texts in here as well instead of filtering out options.
        task["options"] = options
        yield task
Ok, so as I understand it, I can read all the data (for example from a file) and create the stream manually with the corresponding options. Thank you for the explanation! It was very useful.
Yep! That should work.
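As a rough sketch of that idea, assuming the data lives in a CSV file with one source text and three candidate translations per row: the column names (source, candidate_1, candidate_2, candidate_3) and the function name stream_from_csv are hypothetical, so adapt them to whatever your file actually looks like. The generator only relies on the "text" and "options" keys that the choice interface uses in the recipe above.

```python
import csv
import io


def stream_from_csv(handle):
    """Yield one choice task per CSV row, with that row's candidates as options."""
    for row in csv.DictReader(handle):
        # Hypothetical column names; change these to match your own file
        candidates = [row["candidate_1"], row["candidate_2"], row["candidate_3"]]
        yield {
            "text": row["source"],
            "options": [
                {"id": str(n), "text": cand}
                for n, cand in enumerate(candidates, start=1)
            ],
        }


# In-memory stand-in for a real file, so the sketch runs as-is
sample = io.StringIO(
    "source,candidate_1,candidate_2,candidate_3\n"
    "hello there,hallo daar,hi daar,hoi\n"
    "good morning,goedemorgen,goede morgen,morgen\n"
)
tasks = list(stream_from_csv(sample))
print(tasks[0])
```

In a recipe, the generator returned by stream_from_csv(open(path, newline="")) could be passed as the "stream" value instead of the toy stream, with no separate add_options step.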
Alternatively you can also create an examples.jsonl file with the translations pre-calculated so that you don't need to perform data fetching tasks while the stream is being generated on the Prodigy side. This can be very helpful if you're fetching the translations from laggy APIs. In that case you'd only need to write some custom code to make sure the translations in the .jsonl file end up in the options key appropriately.
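A minimal sketch of that custom code, assuming each line of the .jsonl file has a "text" key and a "translations" list. Both the "translations" field name and the jsonl_to_tasks helper are assumptions made for this example; your file may be laid out differently.

```python
import io
import json


def jsonl_to_tasks(lines):
    """Move each record's pre-computed translations into the "options" key."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # "translations" is a hypothetical field name for the pre-computed candidates
        translations = record.pop("translations")
        record["options"] = [
            {"id": chr(ord("A") + n), "text": text}  # ids become A, B, C, ...
            for n, text in enumerate(translations)
        ]
        yield record


# In-memory stand-in for examples.jsonl, so the sketch runs as-is
sample = io.StringIO(
    '{"text": "hello there", "translations": ["hallo daar", "hi daar", "hoi"]}\n'
    '{"text": "thank you", "translations": ["dank je", "bedankt", "dank u"]}\n'
)
tasks = list(jsonl_to_tasks(sample))
print(tasks[1])
```

Since an open file object iterates over its lines, the same function should work on open("examples.jsonl", encoding="utf8") without loading the whole file into memory.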
That is exactly what I wanted to do! Thank you a lot.