Converting choice annotations to textcat annotations

Sure! This shouldn’t actually be too difficult :) The main difference is that each example in a “choice” dataset has an "accept" property containing the IDs of the selected labels. The textcat.batch-train recipe, on the other hand, expects one example for each label.

So you could write a little script that loads the data, iterates over the examples and copies each one once per selected label. I haven’t tested this yet, but something along these lines should work:

from prodigy.components.db import connect
from prodigy.util import set_hashes
import copy

db = connect()
examples = db.get_dataset('your_dataset_name')

converted_examples = []  # export this later

for eg in examples:
    if eg['answer'] != 'accept':
        continue  # skip if the example wasn't accepted
    labels = eg['accept']  # the selected label(s)
    for label in labels:
        new_eg = copy.deepcopy(eg)  # copy the example
        new_eg['label'] = label  # add the label
        # not 100% sure if necessary but set new hashes just in case
        new_eg = set_hashes(new_eg, overwrite=True)
        converted_examples.append(new_eg)  # collect the copy so it can be exported
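
If you want to try the conversion logic without a Prodigy database first, here is a rough standalone sketch of the same idea using plain dicts. The function names `choice_to_textcat` and `to_jsonl` and the sample data are made up for illustration, and this version leaves out the re-hashing step, so treat it as a way to check the output shape rather than a drop-in replacement for the script above.

```python
import copy
import json


def choice_to_textcat(examples):
    """Return one copy of each accepted example per selected label.

    Hypothetical helper: assumes each example is a dict with an "answer"
    key and an "accept" list of label IDs, as in a "choice" dataset.
    """
    converted = []
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # skip rejected or ignored examples
        for label in eg.get("accept", []):
            new_eg = copy.deepcopy(eg)  # leave the original untouched
            new_eg["label"] = label
            converted.append(new_eg)
    return converted


def to_jsonl(examples):
    """Serialize examples as newline-delimited JSON, one example per line."""
    return "\n".join(json.dumps(eg) for eg in examples)


# Made-up sample data, only to show the shape of the input and output
sample = [
    {
        "text": "Great phone",
        "options": [{"id": "POSITIVE"}, {"id": "TECH"}],
        "accept": ["POSITIVE", "TECH"],
        "answer": "accept",
    },
    {"text": "Skipped one", "accept": [], "answer": "ignore"},
]

converted = choice_to_textcat(sample)
print(to_jsonl(converted))
```

The first sample example was accepted with two labels, so it ends up as two examples, one per label. The second one is dropped because its answer isn't "accept". You could then save the JSONL output to a file and, assuming your Prodigy version has the db-in command, import it into a new dataset to train from.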