Converting choice annotations to textcat annotations

ines · August 27, 2018, 3:00pm

Sure! This shouldn’t actually be too difficult The main difference of a “choice” dataset is that it has an "accept" property containing the IDs of the selected labels. The textcat.batch-train recipe on the other hand expects one example for each label.

So you could write a little script that loads the data, iterates over the examples and copies them once for each label. I haven’t tested this yet, but something along those lines should work:

from prodigy.components.db import connect
from prodigy.util import set_hashes
import copy

db = connect()
examples = db.get_dataset('your_dataset_name')

converted_examples = []  # export this later

for eg in examples:
    if eg['answer'] != 'accept':
        continue  # skip if the example wasn't accepted
    labels = eg['accept']  # the selected label(s)
    for label in labels:
        new_eg = copy.deepcopy(eg)  # copy the example
        new_eg['label'] = label  # add the label
        # not 100% sure if necessary but set new hashes just in case
        new_eg = set_hashes(new_eg, overwrite=True)
        converted_examples.append(new_eg)

Topic		Replies	Views
Multi label tagging usage , textcat	1	1180	September 10, 2018
Textcat correct recipe usage , textcat , solved	1	629	September 16, 2020
make-gold or manual for textcat? usage , textcat	2	1157	April 24, 2018
textcat teach examples from source or from dataset usage , textcat	10	839	August 15, 2019
From Choice annotations to binary annotations with Teach usage , textcat , spacy	4	986	January 2, 2019

Converting choice annotations to textcat annotations

Related topics