Custom recipe to support multiple labelling schemes, pass text via URL

Hi, I have two questions regarding custom recipes:

1)
For each comment, I would like the annotator to assign one label under each of three labelling schemes:

Where - A list of 8 locations
What - A list of things that can happen at those locations
Discipline - Team that should take care of the problem

For example:
"I was at the library and bullies stole my ipad"
Where: Library
What: Theft
Discipline: Security

"The roof in our dorm is leaking"
Where: Dormitories
What: Leak
Discipline: Maintenance

I would like to do something like this to minimize the time each annotator spends on each text:

    blocks = [
        {"view_id": "choice", "field_id": "where_label", "options": wheres},
        {"view_id": "choice", "text": None, "field_id": "what_label", "options": whats},
        {"view_id": "choice", "text": None, "field_id": "discipline_label", "options": disciplines},
    ]

This doesn't work exactly right, all the choice blocks are connected, not independent. (I don't think the field_id does anything here either)

Is there a way to do this with custom recipes, that is, one label set per labelling scheme for each example?

I have been able to do this, after a fashion, with a single option list that is the superset of all the labelling scheme options, allowing multiple selections by setting choice_style. I'm wondering if there is a better way :)
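For concreteness, the superset workaround described above might be sketched roughly like this. The label lists are abbreviated placeholders taken from the two examples (the real "where" list has 8 entries), and the `scheme:label` id prefix and the helper names are my own assumptions, not anything Prodigy requires.

```python
# Abbreviated placeholder label sets; the real lists are longer.
SCHEMES = {
    "where": ["Library", "Dormitories"],
    "what": ["Theft", "Leak"],
    "discipline": ["Security", "Maintenance"],
}


def build_options(schemes):
    """Flatten all schemes into one option list with scheme-prefixed ids."""
    return [
        {"id": f"{scheme}:{label}", "text": f"{scheme.title()}: {label}"}
        for scheme, labels in schemes.items()
        for label in labels
    ]


def add_options(stream, options):
    """Attach the same superset option list to every incoming task."""
    for eg in stream:
        yield {**eg, "options": options}


options = build_options(SCHEMES)
tasks = list(add_options([{"text": "The roof in our dorm is leaking"}], options))
```

A recipe would then return these tasks with "view_id": "choice" and "choice_style": "multiple" in its config. The prefix makes it possible to tell afterwards which scheme each selected option belongs to.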

2)
Is it possible to pass the text to be annotated via the URL, a bit like the way the session is passed? I have a Microsoft Power BI report that can be used to efficiently identify underperforming classes, but no easy way to integrate it back into Prodigy, since Power BI has very limited ability to write back to databases or data sources. I am able to generate URLs that could send our annotators back to Prodigy.

Finally, thank you for such a great tool and excellent documentation!

Ivan

Answer to Question #2

I'll answer this question first because it's the easiest: you cannot pass text to be annotated via the URL. You can, however, generate a .jsonl file upfront with all the examples that you might be interested in. That means that you could download the relevant data from a third party, store it locally, and then use it in Prodigy.
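As a minimal sketch of that upfront step, assuming the comments have already been exported as a list of strings (the `comments` list and the `write_jsonl` helper are hypothetical), the file could be written with the standard library:

```python
import json
from pathlib import Path


def write_jsonl(texts, path):
    """Write one {"text": ...} record per line, skipping empty comments."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for text in texts:
            text = text.strip()
            if not text:
                continue
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


# Hypothetical comments, e.g. exported from a report.
comments = [
    "I was at the library and bullies stole my ipad",
    "",
    "The roof in our dorm is leaking",
]
written = write_jsonl(comments, "examples.jsonl")
```

The resulting examples.jsonl can then be passed to a recipe as its source file.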

Answer to Question #1

"This doesn't work exactly right, all the choice blocks are connected, not independent."

Could you elaborate on what you mean by this? Am I understanding correctly that you want an interface where the answer to the "where" question influences the possible options for the "discipline" question?

Thanks for the info on the URL parameter.

I would like the choice blocks to support different, independent label sets in one pass, so that the user has to select one "where", one "what" and one "discipline".

At the moment, if the options are exclusive, the user can only select one option across all the label sets. If they are set up to allow multiple labels, the user can choose many options from each label set (I can probably handle this with a validation function?).
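A validation function of that kind might look roughly like the sketch below. It assumes the multiple-choice setup stores the selected option ids under "accept", that option ids carry a hypothetical `scheme:` prefix such as "where:Library", and that the function is returned from the recipe as the "validate_answer" component, which, as I understand the docs, is called with each answered example and can raise an error to reject the submission.

```python
# Assumed scheme prefixes; option ids look like "where:Library".
SCHEMES = ("where", "what", "discipline")


def validate_answer(eg):
    """Raise ValueError unless exactly one option per scheme is selected."""
    if eg.get("answer") != "accept":
        return  # rejected or ignored examples need no labels
    selected = eg.get("accept", [])
    for scheme in SCHEMES:
        picked = [opt for opt in selected if opt.startswith(f"{scheme}:")]
        if len(picked) != 1:
            raise ValueError(
                f"Select exactly one '{scheme}' option (got {len(picked)})."
            )
```

For example, an answer with two "where" options selected would raise an error, while an answer with one option per scheme would pass silently.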

Regards,
Ivan

I wrote a quick demo of the behavior that I think you're interested in, to make the issue more tangible, but feel free to correct me if I misunderstood.

I'm starting out with a jsonl file with some examples.

    {"text": "hi there"}
    {"text": "this is great"}
    {"text": "yet another example"}

Next, I write a custom recipe.

    # recipe.py
    import prodigy
    import srsly

    @prodigy.recipe(
        "my-custom-recipe",
        dataset=("Dataset to save answers to", "positional", None, str),
        jsonl_file=("JSONL file to label", "positional", None, str),
    )
    def my_custom_recipe(dataset, jsonl_file):
        # Load your own stream from anywhere you want
        stream = list(srsly.read_jsonl(jsonl_file))

        # One HTML block to show the text, then one free-text field per labelling scheme
        blocks = [
            {"view_id": "html", "html_template": "{{text}}"},
            {"view_id": "text_input", "field_label": "where", "text": None, "field_id": "where_label", "field_suggestions": ["here", "there"]},
            {"view_id": "text_input", "field_label": "what", "text": None, "field_id": "what_label", "field_suggestions": ["this", "that"]},
            {"view_id": "text_input", "field_label": "discipline", "text": None, "field_id": "discipline_label", "field_suggestions": ["little", "much"]},
        ]

        return {
            "dataset": dataset,
            "view_id": "blocks",
            "stream": stream,
            "config": {"blocks": blocks},
        }

Next, I run the labelling interface via:

    python -m prodigy my-custom-recipe blocks-demo examples.jsonl -F recipe.py

This gives me an interface with text fields, which also has some suggestions when the user hits the "down" key. The text interface also allows the user to write something different if need be.

I label a first example, hit the save button in the upper left-hand corner, and export the data.

    > python -m prodigy db-out blocks-demo | jq
    {
      "text": "hi there",
      "_input_hash": 1561737964,
      "_task_hash": 1685558882,
      "_view_id": "blocks",
      "what_label": "that",
      "discipline_label": "little",
      "where_label": "there",
      "answer": "accept",
      "_timestamp": 1651564832
    }

Is this the behavior you're interested in?

Koaning,

This is actually better than I imagined :) Thanks for the detailed example! I love the dropdown list suggestions; they are much cleaner for large sets of labels than a big radio box.

Happy to hear it!

I am curious, though, since there is still one thought experiment worth doing.

I wonder if it's perhaps faster/better to have three separate labelling interfaces. Right now, an annotator needs to be concerned with three choices at a time. Part of me can imagine, though I don't know for sure, that labelling one thing at a time can be less of a mental burden and might therefore also lead to more/better labels. I'll leave it up to you to decide if this is an experiment worth doing, but if you're curious I'd love to hear any feedback that you might have.

That is a good point, and I think it has a lot of merit.

The reason I like to do it all in one pass is that each of our records takes a bit of work to figure out what actually happened, such as the Failure Mode and the Failure Cause. It feels like more work to visit the same comment twice, but maybe it would be faster if I could do it hierarchically, where annotators label the Modes in one session and those labels feed into another session for the Causes. That would take a bit more orchestration on my part to direct users to one interface and then to another, but it is not impossible.
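To make the hierarchical idea a bit more tangible, here is a rough sketch of how exported first-pass annotations could be turned into tasks for a second session. The "what_label" field follows the demo above, while the function name and the "mode" key under "meta" are hypothetical; it assumes the first pass has been exported as a list of dicts.

```python
def second_pass_stream(first_pass, mode_field="what_label"):
    """Yield follow-up tasks that carry the first-pass label in "meta"."""
    for eg in first_pass:
        # Only accepted examples that actually have a first-pass label move on.
        if eg.get("answer") != "accept" or not eg.get(mode_field):
            continue
        yield {"text": eg["text"], "meta": {"mode": eg[mode_field]}}


# Hypothetical first-pass export.
first_pass = [
    {"text": "The roof in our dorm is leaking", "what_label": "Leak", "answer": "accept"},
    {"text": "unclear comment", "answer": "ignore"},
]
tasks = list(second_pass_stream(first_pass))
```

The second session would then only see comments that already have a Mode, with that Mode carried along on each task for the annotator labelling the Cause.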

The other thing I could do is NER. NER is good at picking out the relevant things in a sentence, but I would still need the class labels to combine all the entities into something useful, so classification seems like less work overall.

Thanks for your thoughts!