question about function add_labels_options

Kasra · June 5, 2019, 1:27pm

Hello,

I am trying to add the recipe textcat.manual to a previous prodigy version. The reason is that at the moment I am not ready to switch to the newest spacy version so I cannot upgrade to the latest prodigy version either.
The question is: can you please show me the code of the function add_label_options(stream, label)?
I tried to add something like the following and it works, but when I run textcat.print-dataset to look at the result, I see N/A instead of the label that I chose.

def add_options(stream, options):
    options = [{'id': index + 1, 'text': option} for index, option in enumerate(options)]

    for task in stream:
        task['options'] = options
        yield task

Example of print-dataset output:

n/a       N/A  ...Sentence...

Thanks

ines · June 5, 2019, 1:52pm

Yes, what you did is correct: this is pretty much exactly what that helper function does. Since the id property can be anything, I'd suggest using the option text itself as the id. The id is added to the "accept" list in the annotated task, so using the label here makes it easier to extract the selected labels later on.
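As a rough sketch of that suggestion (this is not the actual source of Prodigy's add_label_options helper, and the parameter name labels is only illustrative), the function from the question could use each label as both the id and the text of its option:

```python
def add_options(stream, labels):
    """Attach one multiple-choice option per label to every task in the stream."""
    # Assumption: each label doubles as the option id, so the "accept" list
    # of an annotated task holds label names instead of numeric ids.
    options = [{"id": label, "text": label} for label in labels]
    for task in stream:
        task["options"] = options
        yield task
```

With this variant, an annotated task where the first option was selected would carry something like "accept": ["SPORTS"] rather than "accept": [1].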

The N/A output is expected, because the textcat recipes (training, printing etc.) all look for a "label" key, but in your case you only have "options". In v1.8.x, we've updated the training recipes to also accept the options format, but you can also do it yourself by converting the annotations with selected options back to single tasks with a "label". Here's an example:

import copy

def convert_options_to_label(examples):
    converted = []
    for eg in examples:
        selected = eg.get("accept", [])  #  get selected options
        for label in selected:
            new_eg = copy.deepcopy(eg)
            new_eg["label"] = label
            converted.append(new_eg)
    return converted

You could then save the result of this to a new dataset and run textcat.print-dataset over it, or train a model with textcat.batch-train. If your labels are mutually exclusive, you probably also want to create examples for all other labels that weren't selected and set "answer": "reject". That way, you can explicitly tell the model that you know that only the selected label applies and all others don't.

Kasra · June 7, 2019, 8:43am

Thanks ines! I tried your suggestion regarding the conversion to label and it works now.
Regarding your second suggestion, is the following function correct?

def transform_in_exclusive(examples):
    exclusive_examples = []
    for eg in examples:
        options = [option['text'] for option in eg.get("options", [])]  # get options
        selected = eg.get("label")  # get selected options
        for option in options:
            if option != selected:
                new_eg = copy.deepcopy(eg)
                new_eg["label"] = option
                new_eg["answer"] = "reject"
                exclusive_examples.append(new_eg)
    exclusive_examples.extend(examples)
    return exclusive_examples

And then in the batch_train (Prodigy version 1.7) code I will do:

...
examples = DB.get_dataset(dataset)
examples = convert_options_to_label(examples)
examples = transform_in_exclusive(examples)
labels = {eg["label"] for eg in examples}
...

Thanks,

ines · June 7, 2019, 8:47am

I haven’t run it, but looks good to me!

If you want, you could probably even combine this into one single function: for each example, iterate over the options, deepcopy the example, assign the label, and then set the "answer" to "accept" if option == selected, and otherwise to "reject".
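A minimal, untested-against-Prodigy sketch of that combined function could look like the following. The function name is made up for illustration, it assumes the option ids are the label names (as suggested earlier in the thread), and skipping tasks that weren't accepted as a whole is an extra assumption, not something discussed above:

```python
import copy


def convert_options_to_exclusive_labels(examples):
    """Turn each multiple-choice annotation into one task per option."""
    converted = []
    for eg in examples:
        # Assumption: only tasks the annotator accepted carry a usable selection.
        if eg.get("answer") != "accept":
            continue
        selected = set(eg.get("accept", []))  # ids of the selected options
        for option in eg.get("options", []):
            new_eg = copy.deepcopy(eg)
            # Assumption: the option id is the label name.
            new_eg["label"] = option["id"]
            new_eg["answer"] = "accept" if option["id"] in selected else "reject"
            converted.append(new_eg)
    return converted
```

This would replace the two separate calls to convert_options_to_label and transform_in_exclusive with a single pass over the dataset.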
