Using Prodigy to confirm or reject existing document labels

nuno · January 5, 2019, 11:29am

Hi!

I'm working on text classification and have a list of documents that have already been assigned a label outside of Prodigy. However, some of these documents are labeled incorrectly, or I simply don’t want to include them when training a text classification model.

I thought I could use Prodigy to confirm or reject each document together with its label. I want to use the textcat.teach interface to accept/reject document-label pairs, but I want Prodigy to suggest exactly the label that was previously assigned.

Is there a way to do this with Prodigy?

Thanks.

ines · January 5, 2019, 11:46am

Yes, that should be pretty straightforward! Let’s assume your input data looks something like this:

{"text": "This is a text", "label": "LABEL_ONE"}
{"text": "This is another text", "label": "LABEL_TWO"}

This is how Prodigy usually represents texts with a label, and the format should hopefully be very easy to generate from whichever format you already have.
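If your existing labels happen to live in a CSV file, a minimal sketch of converting them to this JSONL format with only the Python standard library could look like the following. The CSV column names `text` and `label`, the file names and the helper names are all assumptions for illustration; adjust them to your data.

```python
import csv
import json


def csv_to_tasks(lines):
    """Yield Prodigy-style tasks from CSV lines with (assumed) 'text' and 'label' columns."""
    for row in csv.DictReader(lines):
        text = (row.get("text") or "").strip()
        label = (row.get("label") or "").strip()
        # Skip rows missing either field: there would be nothing to confirm or reject.
        if text and label:
            yield {"text": text, "label": label}


def convert(csv_path, jsonl_path):
    """Write one JSON object per line (JSONL) for every usable CSV row."""
    with open(csv_path, newline="", encoding="utf8") as f_in, \
            open(jsonl_path, "w", encoding="utf8") as f_out:
        for task in csv_to_tasks(f_in):
            f_out.write(json.dumps(task, ensure_ascii=False) + "\n")
```

Calling something like `convert("your_labels.csv", "your_data.jsonl")` would then give you a file in the format shown above.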

You can then load the data into Prodigy and annotate it. The mark recipe takes the exact data it receives, renders it in the interface you specify and presents it to you so you can accept/reject it. So that's basically exactly what you want to do. Here’s an example:

prodigy mark your_dataset your_data.jsonl --view-id classification

You can then run db-out to export your dataset, and each example will have an "answer" key with either "accept", "reject" or "ignore". You can then use that information to filter the examples so you only have the accepted ones, double-check the rejected answers or do whatever else you need.
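As a rough sketch, assuming you exported the dataset with something like `prodigy db-out your_dataset > annotations.jsonl` (the output file name is just an example), you could split the examples by their "answer" value in plain Python. The function names here are hypothetical helpers, not part of Prodigy.

```python
import json
from collections import defaultdict


def split_by_answer(lines):
    """Group exported examples by their "answer" key ("accept", "reject" or "ignore")."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        eg = json.loads(line)
        groups[eg.get("answer")].append(eg)
    return groups


def load_accepted(path):
    """Return only the accepted examples from a JSONL export."""
    with open(path, encoding="utf8") as f:
        return split_by_answer(f)["accept"]
```

The accepted examples can go straight into your training data, while `split_by_answer(...)["reject"]` gives you the ones worth double-checking.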

Here’s the simplified and annotated source of the mark recipe btw so you can see what’s happening under the hood when you run it:

github.com/explosion/prodigy-recipes/blob/master/other/mark.py

# coding: utf8
from __future__ import unicode_literals

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.util import split_string
from collections import Counter


# Recipe decorator with argument annotations: (description, argument type,
# shortcut, type / converter function called on value before it's passed to
# the function). Descriptions are also shown when typing --help.
@prodigy.recipe('mark',
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    view_id=("ID of annotation interface", "option", "o", str),
    exclude=("Names of datasets to exclude", "option", "e", split_string)
)
def mark(dataset, source, view_id, exclude=None):
    """

(This file has been truncated.)
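The `Counter` import suggests the recipe keeps a tally of the answers it receives. Since the rest of the file is truncated here, the following is only a standalone sketch of that idea in plain Python, not the recipe's actual code; the function names are hypothetical. Prodigy recipes can return an `update` callback that receives batches of answered tasks, and a function like `receive_answers` below has that shape.

```python
from collections import Counter


def make_answer_counter():
    """Return a callback that tallies answers, plus the Counter it updates."""
    counts = Counter()

    def receive_answers(answers):
        # Each answered task is a dict with an "answer" key.
        for eg in answers:
            counts[eg["answer"]] += 1

    return receive_answers, counts


def summarize(counts):
    """Format the tallies, e.g. for printing when an annotation session ends."""
    return ", ".join(f"{key}: {counts[key]}" for key in ("accept", "reject", "ignore"))
```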

nuno · January 5, 2019, 12:31pm

This was exactly what I needed. Thanks for the very fast reply! You always provide awesome support!
