No, but that's a nice idea! You can easily write your own little converter script for this, though:
```python
from prodigy.components.db import connect

db = connect()                               # connect to the database
examples = db.get_dataset('choice_dataset')  # get the dataset
textcat_examples = []                        # collect reformatted examples here

for eg in examples:
    accepted = eg.get('accept', [])  # get the list of accepted IDs, e.g. ['FINANCIAL']
    for accepted_id in accepted:
        textcat_examples.append({'text': eg['text'], 'label': accepted_id})
```
You can then save out the `textcat_examples` to a JSONL file and add it to a dataset using `db-in`, or add it to your database straight away by creating a new dataset and adding the list of examples to it. You should then be able to use that dataset to train with `textcat.batch-train`.
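To make the "save out to JSONL" step concrete, here's a minimal sketch. The conversion logic mirrors the loop above; the file name and the sample input are made up for illustration:

```python
import json

def convert_examples(examples):
    """Flatten choice annotations into one textcat example per accepted label."""
    textcat_examples = []
    for eg in examples:
        for accepted_id in eg.get('accept', []):
            textcat_examples.append({'text': eg['text'], 'label': accepted_id})
    return textcat_examples

def write_jsonl(path, lines):
    """Write one JSON object per line (the JSONL format db-in expects)."""
    with open(path, 'w', encoding='utf8') as f:
        for line in lines:
            f.write(json.dumps(line) + '\n')

# example input, shaped like what db.get_dataset() returns
annotations = [{'text': 'Stocks rallied today.', 'accept': ['FINANCIAL', 'NEWS']}]
converted = convert_examples(annotations)
write_jsonl('textcat_data.jsonl', converted)
# converted is:
# [{'text': 'Stocks rallied today.', 'label': 'FINANCIAL'},
#  {'text': 'Stocks rallied today.', 'label': 'NEWS'}]
```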
If you want to do this even more elegantly, you could also add an `on_exit` hook to your recipe that is run when you exit the Prodigy server and automatically adds the reformatted tasks to a new dataset. The `on_exit` function takes the controller as its argument, which gives you access to the database and the already annotated examples of the current session. You can find an example of this in the custom recipes workflow.
```python
def on_exit(ctrl):
    # get annotations of the current session
    examples = ctrl.db.get_dataset(ctrl.session_id)
    textcat_examples = convert_examples(examples)  # convert the examples
    # add them to your other dataset (needs to exist in the database)
    ctrl.db.add_examples(textcat_examples, datasets=['textcat_examples'])
```

Note that `datasets` should be a list (or tuple) of dataset names – `('textcat_examples')` without a trailing comma is just a string, not a tuple.
This depends on what exactly you're trying to do – do you want to recreate the seed selection functionality of the `textcat` recipes in your custom choice recipe? You can see how the stream with seeds is composed in `prodigy/recipes/textcat.py`, or use the `PatternMatcher` from the NER recipes to find terms in your incoming stream. A stream of annotation examples is just a simple generator btw – so you can also implement your own, custom matching logic.
By default, Prodigy tries to make as few assumptions about your streams as possible. Within the same session, duplicate tasks will be filtered out – but when you start a new session, Prodigy will not assume any state. However, once this bug is resolved in the upcoming release, you'll be able to specify the `--exclude` argument or return a list of dataset IDs as the `'exclude'` setting returned by your recipe. This tells Prodigy not to ask you questions that were already annotated in those datasets. For example, you can set it to the current dataset name, or use the ID of your evaluation set to make sure that examples don't appear in both your training and evaluation set.
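As a rough sketch of what that could look like in a recipe's return value – the recipe name, stream, and dataset names here are placeholders, and the `'exclude'` setting assumes the upcoming release described above:

```python
def choice_recipe(dataset, eval_dataset):
    stream = [{'text': 'Some incoming example'}]  # placeholder stream
    return {
        'dataset': dataset,       # dataset to save annotations to
        'stream': stream,
        'view_id': 'choice',
        # skip tasks already annotated in these datasets, e.g. the
        # current dataset and a held-out evaluation set
        'exclude': [dataset, eval_dataset],
    }

components = choice_recipe('choice_dataset', 'choice_eval')
# components['exclude'] == ['choice_dataset', 'choice_eval']
```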