Sure! This shouldn’t actually be too difficult. The main difference with a “choice” dataset is that each example has an "accept" property containing the IDs of the selected labels. The textcat.batch-train recipe, on the other hand, expects one example for each label.
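To make the difference concrete, here is a rough illustration of the two shapes. The text and label IDs are made up for the example, and real records carry extra fields such as hashes:

```python
# Hypothetical record from a "choice" dataset: one example, with the IDs of
# all selected options listed under "accept".
choice_eg = {
    "text": "The match went to extra time.",
    "options": [
        {"id": "SPORTS", "text": "Sports"},
        {"id": "NEWS", "text": "News"},
        {"id": "POLITICS", "text": "Politics"},
    ],
    "accept": ["SPORTS", "NEWS"],
    "answer": "accept",
}

# What a text classification recipe expects instead: one example per label,
# each with a single "label" value.
textcat_egs = [
    {"text": choice_eg["text"], "label": label, "answer": "accept"}
    for label in choice_eg["accept"]
]
```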
So you could write a little script that loads the data, iterates over the examples, and copies each one once per selected label. I haven’t tested this yet, but something along these lines should work:
import copy

from prodigy.components.db import connect
from prodigy.util import set_hashes

db = connect()
examples = db.get_dataset('your_dataset_name')
converted_examples = []  # export this later
for eg in examples:
    if eg['answer'] != 'accept':
        continue  # skip if the example wasn't accepted
    labels = eg['accept']  # the selected label(s)
    for label in labels:
        new_eg = copy.deepcopy(eg)  # copy the example
        new_eg['label'] = label  # add the label
        # not 100% sure if necessary but set new hashes just in case
        new_eg = set_hashes(new_eg, overwrite=True)
        converted_examples.append(new_eg)
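For the "export this later" step, one option is to write the list out as JSONL (one JSON object per line). This is only a minimal sketch using the standard library: the `write_jsonl` helper, the sample records and the output file name are all made up for illustration, and in practice you would pass in the `converted_examples` list from the script above.

```python
import json
import os
import tempfile


def write_jsonl(examples, path):
    """Write each example as one JSON object per line."""
    with open(path, "w", encoding="utf8") as f:
        for eg in examples:
            f.write(json.dumps(eg) + "\n")


# Stand-in data so the sketch runs on its own (hypothetical labels).
sample_examples = [
    {"text": "Some text", "label": "LABEL_A", "answer": "accept"},
    {"text": "Some text", "label": "LABEL_B", "answer": "accept"},
]

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "converted_examples.jsonl")
write_jsonl(sample_examples, out_path)
```

You should then be able to load the file into a fresh dataset with the `prodigy db-in` command and train from that dataset, although I haven’t tried this end to end.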