I am trying to add the textcat.manual recipe to an older Prodigy version. The reason is that I am not ready to switch to the newest spaCy version at the moment, so I cannot upgrade to the latest Prodigy version either.
The question is: can you please show me the code of the function add_label_options(stream, label)?
I tried to add something like the following and it works, but when I run textcat.print-dataset to look at the result, I see N/A instead of the label that I chose.
def add_options(stream, options):
    options = [{'id': index + 1, 'text': option} for index, option in enumerate(options)]
    for task in stream:
        task['options'] = options
        yield task
Yes, what you did is correct. This is pretty much exactly what that helper function does. Since the id property can be anything, I'd suggest using the option text (the label) as the id as well. The id will be added to the "accept" list in the annotated task, and using the label here makes it easier to extract the selected labels later on.
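To illustrate the suggestion, here is a minimal sketch of that variant, assuming the options are plain label strings. The function name add_options_with_label_ids and the sample texts and labels are made up for this example and are not part of Prodigy:

```python
def add_options_with_label_ids(stream, labels):
    """Attach the same choice options to every task, using the label as the option id."""
    # Using the label string as the id means the "accept" list of an
    # annotated task will contain the label itself instead of a number.
    options = [{"id": label, "text": label} for label in labels]
    for task in stream:
        task["options"] = options
        yield task


# Hypothetical input tasks, only for demonstration
stream = [{"text": "This is great"}, {"text": "This is awful"}]
tasks = list(add_options_with_label_ids(stream, ["POSITIVE", "NEGATIVE"]))
```

With ids like these, a task where the first option was selected should come back with "accept": ["POSITIVE"], so no lookup from a numeric id back to the label is needed.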
The N/A is expected, because the textcat recipes (training, printing, etc.) all look for a "label" key, but in your case you only have "options". In v1.8.x, we've updated the training recipes to also accept the options format, but you can also do it yourself by converting the annotations with selected options back to single tasks with a "label". Here's an example:
import copy

def convert_options_to_label(examples):
    converted = []
    for eg in examples:
        selected = eg.get("accept", [])  # get selected options
        for label in selected:
            new_eg = copy.deepcopy(eg)
            new_eg["label"] = label
            converted.append(new_eg)
    return converted
You could then save the result of this to a new dataset and run textcat.print-dataset over it, or train a model with textcat.batch-train. If your labels are mutually exclusive, you probably also want to create examples for all other labels that weren't selected and set "answer": "reject". That way, you can explicitly tell the model that you know that only the selected label applies and all others don't.
Thanks Ines! I tried your suggestion regarding the conversion to a label and it works now.
Regarding your second suggestion, is the following function correct?
def transform_in_exclusive(examples):
    exclusive_examples = []
    for eg in examples:
        options = [option['text'] for option in eg.get("options", [])]  # get options
        selected = eg.get("label")  # get selected option
        for option in options:
            if option != selected:
                new_eg = copy.deepcopy(eg)
                new_eg["label"] = option
                new_eg["answer"] = "reject"
                exclusive_examples.append(new_eg)
    exclusive_examples.extend(examples)
    return exclusive_examples
And then in the batch_train code (Prodigy version 1.7) I will do:
...
examples = DB.get_dataset(dataset)
examples = convert_options_to_label(examples)
examples = transform_in_exclusive(examples)
labels = {eg["label"] for eg in examples}
...
If you want, you could probably even combine this into one single function: for each example, iterate over the options, deepcopy the example, assign the label, and then set the "answer" to "accept" if option == selected, and otherwise to "reject".
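A rough sketch of what that combined function could look like, assuming the option ids are the label strings and each example has exactly one selected option. The function name options_to_exclusive_labels and the sample data are hypothetical, just for illustration:

```python
import copy


def options_to_exclusive_labels(examples):
    """Turn each choice annotation into one task per option, accepted or rejected."""
    converted = []
    for eg in examples:
        selected = eg.get("accept", [])  # ids of the selected options
        for option in eg.get("options", []):
            new_eg = copy.deepcopy(eg)
            # Assumption: the option id is the label string itself
            new_eg["label"] = option["id"]
            new_eg["answer"] = "accept" if option["id"] in selected else "reject"
            converted.append(new_eg)
    return converted


# Hypothetical annotated example, only for demonstration
examples = [
    {
        "text": "This is great",
        "options": [
            {"id": "POSITIVE", "text": "POSITIVE"},
            {"id": "NEGATIVE", "text": "NEGATIVE"},
        ],
        "accept": ["POSITIVE"],
        "answer": "accept",
    }
]
converted = options_to_exclusive_labels(examples)
```

Note that an example with an empty "accept" list would produce only "reject" tasks here, so you may want to filter out skipped or ignored annotations before converting.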