Different labels for every example

zparcheta · June 1, 2023, 9:17am

Hi all,
I need to load labels from a file, and the labels are different for every example.
Any ideas on how to do this?

koaning · June 2, 2023, 8:38am

Could you give a tangible example of the task that you're trying to perform? For the choice interface you should be able to add the options per example in a custom recipe. Is that what you're referring to?

zparcheta · June 2, 2023, 8:46am

I have a source text and three candidate translations, and I want to annotate the best translation.
The thing is, I don't know whether it's possible to pass the translations as labels. In the worst case I can show the translations with corresponding numbers and use 1, 2 and 3 as the labels.

koaning · June 2, 2023, 9:03am

I wrote a quick custom recipe based on the example in the docs here.

import prodigy

@prodigy.recipe(
    "translation",
    dataset=("The dataset to save to", "positional", None, str),
)
def translation(dataset):
    """Pick the best translation for each text using the choice interface."""
    stream = ({"text": f"hello there, this is text #{i}"} for i in range(10))
    stream = add_options(stream)  # add options to each task

    return {
        "dataset": dataset,   # save annotations in this dataset
        "view_id": "choice",  # use the choice interface
        "stream": stream,
    }

def add_options(stream):
    # Helper function to add options to every task in a stream
    options = [
        {"id": "A", "text": "hallo daaro"},
        {"id": "B", "text": "hi daar"},
        {"id": "C", "text": "hoi"},
    ]
    for task in stream:
        task["options"] = options
        yield task

You'll notice that it's programmatically adding translation options in the add_options function. This results in a user interface that looks like this.

When I annotate a single example, here's what prodigy db-out shows me:

{
  "text": "hello there, this is text #0",
  "options": [
    {
      "id": "A",
      "text": "hallo daaro"
    },
    {
      "id": "B",
      "text": "hi daar"
    },
    {
      "id": "C",
      "text": "hoi"
    }
  ],
  "_input_hash": -1981681889,
  "_task_hash": -756079975,
  "_view_id": "choice",
  "config": {
    "choice_style": "single"
  },
  "accept": [
    "B"
  ],
  "answer": "accept",
  "_timestamp": 1685696545
}

Notice how the annotated example shows that the option with id B was selected and that the text for this option is stored as well? Isn't this what you'd need? Feel free to elaborate if I'm misunderstanding.
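
If you need the chosen translation text rather than just its id, a minimal sketch like the one below can look it up from a record exported with db-out. The helper name selected_texts is hypothetical and not part of Prodigy; it only assumes the "options" and "accept" keys shown in the output above.

```python
def selected_texts(example):
    """Return the text of every option the annotator selected."""
    # Map each option id to its text, e.g. {"A": "hallo daaro", ...}
    by_id = {opt["id"]: opt["text"] for opt in example.get("options", [])}
    # "accept" holds the ids of the selected options
    return [by_id[i] for i in example.get("accept", []) if i in by_id]


# A trimmed copy of the annotated example above
example = {
    "text": "hello there, this is text #0",
    "options": [
        {"id": "A", "text": "hallo daaro"},
        {"id": "B", "text": "hi daar"},
        {"id": "C", "text": "hoi"},
    ],
    "accept": ["B"],
    "answer": "accept",
}
print(selected_texts(example))  # ['hi daar']
```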

zparcheta · June 2, 2023, 9:35am

Thank you @koaning.
In the example you provided, you add the same options to every example: "hallo daaro", "hi daar" and "hoi" will be shown each time. What I want is to show different option texts for every example.

koaning · June 2, 2023, 9:40am

I might change the logic in add_options to reflect that. It's a bit of an arbitrary example, but let's add an index value to the stream and change the option value based on that.

import prodigy

@prodigy.recipe(
    "translation",
    dataset=("The dataset to save to", "positional", None, str),
)
def translation(dataset):
    """Pick the best translation for each text using the choice interface."""
    stream = ({"text": f"hello there, this is text #{i}", "i": i} for i in range(10))
    stream = add_options(stream)  # add options to each task

    return {
        "dataset": dataset,   # save annotations in this dataset
        "view_id": "choice",  # use the choice interface
        "stream": stream,
    }

def add_options(stream):
    # Helper function to add a different set of options to each task in a stream
    for task in stream:
        options = [
            {"id": "A", "text": "hallo daaro"},
            {"id": "B", "text": "hi daar"},
            {"id": "C", "text": "hoi"},
            {"id": "D", "text": "hai"},
        ]
        # Just to make it explicit, you can do anything with custom code here.
        # Use the task's own index "i" so the options differ between examples:
        # even-numbered tasks get A and C, odd-numbered tasks get B and D.
        options = [o for j, o in enumerate(options) if j % 2 == task["i"] % 2]
        # In this example I'm doing something fairly arbitrary, but you might also
        # add custom translation texts in here instead of filtering out options.
        task["options"] = options
        yield task

zparcheta · June 2, 2023, 9:49am

Ok, so as I understand it, I can read all the data (for example from a file) and build the stream manually with the corresponding options. Thank you for the explanation! It was very useful.

koaning · June 2, 2023, 9:51am

Yep! That should work.

Alternatively you can also create an examples.jsonl file with the translations pre-calculated so that you don't need to perform data fetching tasks while the stream is being generated on the Prodigy side. This can be very helpful if you're fetching the translations from laggy APIs. In that case you'd only need to write some custom code to make sure the translations in the .jsonl file end up in the options key appropriately.
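
As a rough sketch of that custom code: assume each line of examples.jsonl looks like {"source": "...", "translations": ["...", "...", "..."]}. The field names "source" and "translations" and the helper names below are made up for illustration, so adjust them to your own file. Each record is turned into a task with its own "options" list, which is the format the choice interface expects.

```python
import json


def to_choice_task(record):
    """Turn one pre-computed record into a task for the choice interface."""
    # "source" and "translations" are assumed field names, not a Prodigy convention
    options = [
        {"id": str(n), "text": translation}
        for n, translation in enumerate(record["translations"], start=1)
    ]
    return {"text": record["source"], "options": options}


def stream_from_lines(lines):
    """Yield one choice task per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield to_choice_task(json.loads(line))


# Usage with in-memory lines; an open file handle works the same way:
#     with open("examples.jsonl", encoding="utf-8") as f:
#         stream = stream_from_lines(f)
lines = [
    '{"source": "hello there", "translations": ["hallo daar", "hi daar", "hoi"]}',
    '{"source": "good morning", "translations": ["goedemorgen", "goede morgen"]}',
]
tasks = list(stream_from_lines(lines))
```

Inside the recipe you would then return this generator as the "stream" value, in place of the hard-coded stream in the examples above.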

zparcheta · June 2, 2023, 12:21pm

That is exactly what I wanted to do! Thank you a lot.
