Annotate multiple JSONL files into multiple datasets


I am currently using Prodigy version 1.10 and need some help with my use case. I'm embedding the Prodigy UI in an iframe in a Plotly Dash app, and every time an annotator visits the dashboard, I have to supply them with 2 JSONL files of text strings to annotate. The annotations from these 2 JSONL files should go into 2 different datasets.

Is there an existing recipe (I couldn't find anything related to this in the documentation) where I can supply the names of 2 datasets along with 2 JSONL files and the rest is taken care of? Or is there a way to find out programmatically, in Python, when all the strings in a JSONL file have been annotated, i.e., when Prodigy shows "No tasks available"?
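For the second question, one possible approach (a sketch, not a built-in Prodigy feature) is to compare the input JSONL against the annotations already saved, for example a dataset exported with `prodigy db-out`. The sketch below assumes every task has a `"text"` field and treats a task as done once any answer (accept, reject or ignore) exists for that text; the function names are hypothetical. Prodigy itself deduplicates by `_input_hash`, so matching on hashes would be more robust than matching raw text.

```python
import json
from typing import Iterable, List


def load_jsonl(path: str) -> List[dict]:
    """Read one JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def remaining_tasks(tasks: Iterable[dict], annotated: Iterable[dict]) -> List[dict]:
    """Return the tasks whose "text" does not appear in any saved annotation.

    Assumption: tasks are identified by their "text" field alone.
    """
    done = {eg["text"] for eg in annotated}
    return [task for task in tasks if task["text"] not in done]
```

If `remaining_tasks(load_jsonl(input_path), load_jsonl(export_path))` returns an empty list, every text in that input file has received an answer, which is the situation in which the UI would report that no tasks are available.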

Hi! Sorry for only getting to this now; I think I missed this thread earlier. At the moment, Prodigy expects you to pick one dataset per instance to save the annotations to. You could work around that by calling db.add_examples explicitly in your recipe's update callback and choosing the target dataset based on values in the data.

If you're generating the JSONL files programmatically and can add the destination dataset to each JSON record, you could do something like this:
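Adding the destination dataset to each record could look roughly like the sketch below, which reads records from each input file and stores the target dataset name under a `"dataset"` key (the field the `update` callback reads). The helper names and the idea of merging both files into one combined input file are assumptions for illustration, not part of Prodigy.

```python
import json
from typing import Dict, Iterable, Iterator


def tag_records(lines: Iterable[str], dataset: str) -> Iterator[dict]:
    """Parse JSONL lines and attach the name of the dataset each record belongs to."""
    for line in lines:
        if line.strip():  # skip blank lines
            record = json.loads(line)
            record["dataset"] = dataset  # routing field read later in update()
            yield record


def merge_tagged(files_to_datasets: Dict[str, str], out_path: str) -> None:
    """Write one combined JSONL file from several inputs, each tagged with its target dataset."""
    with open(out_path, "w", encoding="utf8") as out:
        for path, dataset in files_to_datasets.items():
            with open(path, encoding="utf8") as f:
                for record in tag_records(f, dataset):
                    out.write(json.dumps(record) + "\n")
```

For example, `merge_tagged({"file_a.jsonl": "dataset_a", "file_b.jsonl": "dataset_b"}, "combined.jsonl")` (hypothetical file and dataset names) produces a single input file that one Prodigy session can serve, while each record still remembers where its annotation should be stored.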

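```python
from prodigy.components.db import connect

# in your recipe, and returned as "update": update
def update(answers):
    db = connect()
    for eg in answers:
        dataset = eg["dataset"]  # name of the target dataset, stored in the example
        if dataset not in db:
            db.add_dataset(dataset)  # create the target dataset if it doesn't exist yet
        db.add_examples([eg], datasets=[dataset])
```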
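A variation on the snippet above, sketched under some assumptions: it groups each batch by target dataset so `add_examples` is called once per dataset rather than once per example, and it sends answers without a `"dataset"` field to a fallback dataset (the name `"unrouted"` is hypothetical). The `db` argument only needs an `add_examples(examples, datasets=...)` method like Prodigy's, so the `InMemoryDB` class here is a hypothetical stand-in for trying the logic without a database. In a real recipe you would pass the object returned by `connect()` and call `route_answers(answers, db)` from `update`.

```python
from collections import defaultdict
from typing import Dict, List


def route_answers(answers: List[dict], db, default_dataset: str = "unrouted") -> Dict[str, int]:
    """Save each answer to the dataset named in its "dataset" field.

    Answers are grouped first, so add_examples runs once per target dataset
    per batch. Returns the number of answers saved to each dataset.
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for eg in answers:
        groups[eg.get("dataset", default_dataset)].append(eg)
    for dataset, examples in groups.items():
        db.add_examples(examples, datasets=[dataset])
    return {name: len(examples) for name, examples in groups.items()}


class InMemoryDB:
    """Hypothetical stand-in exposing the same add_examples(examples, datasets) shape."""

    def __init__(self) -> None:
        self.data: Dict[str, List[dict]] = defaultdict(list)

    def add_examples(self, examples: List[dict], datasets=()) -> None:
        for name in datasets:
            self.data[name].extend(examples)
```

Grouping mainly reduces the number of database calls for larger batches; the per-example loop above is perfectly fine for small batches.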

Hi, I tried something like this and it worked. Thanks a lot for helping out. You're the best! I've never seen a team reply to every single thread with this much dedication. Thanks.
