Add Meta data for each class label during annotation

HSM · May 21, 2020, 3:07pm

Hi, I was wondering if I can include more information about each label in the textcat.teach UI.
The label is just an integer ID, and it would be great to include a paragraph for each label so the annotator can decide.

ines · May 22, 2020, 11:24am

If it's mostly about providing more details and explanations for the annotator, you can use the "instructions" setting (see the Web Application page of the Prodigy docs). It accepts HTML, so you can format things nicely and include any info that might be relevant during annotation.
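
As a rough sketch of the instructions idea, the snippet below renders a dict of label descriptions into an HTML file. The label IDs, the descriptions, the helper name build_instructions_html and the file name instructions.html are all made up for illustration, and it assumes the "instructions" setting is then pointed at the path of the generated file.

```python
import html
from pathlib import Path

# Hypothetical integer label IDs and descriptions, for illustration only
LABEL_DESCRIPTIONS = {
    "1": "Something about label 1",
    "2": "Something about label 2",
}


def build_instructions_html(descriptions):
    """Render one heading and one paragraph per label as an HTML string."""
    parts = ["<h2>Label descriptions</h2>"]
    for label, text in descriptions.items():
        # Escape the values so stray "<" or "&" can't break the markup
        parts.append(f"<h3>{html.escape(label)}</h3>")
        parts.append(f"<p>{html.escape(text)}</p>")
    return "\n".join(parts)


instructions_html = build_instructions_html(LABEL_DESCRIPTIONS)
# Hypothetical output path; the "instructions" setting would point here
Path("instructions.html").write_text(instructions_html, encoding="utf8")
```

Generating the file from the same dict that holds the label descriptions keeps the instructions and any per-example meta text in sync.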

If your annotation process is binary, another option would be to wrap the stream, check the "label" assigned to each example and then add an entry to the "meta" with the label description. This will then be displayed in the bottom right corner of the annotation card.

If your labels are integer IDs, you might also consider replacing them with more descriptive strings to make it easier and faster for the annotator to decide. It doesn't matter to the model, and you can always do a search and replace in the data afterwards to convert the labels back – it's still much more efficient than if your annotators have to frequently check instructions and read several paragraphs to know what the label IDs mean.
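
A minimal sketch of that search and replace, done in Python rather than by hand. The mapping, the label names and the helper rename_labels are hypothetical, and the examples are assumed to be plain dicts with a "label" key.

```python
# Hypothetical mapping from integer label IDs to descriptive names
ID_TO_NAME = {"1": "BILLING_QUESTION", "2": "TECHNICAL_ISSUE"}
# Inverse mapping, used to convert the labels back after annotation
NAME_TO_ID = {name: id_ for id_, name in ID_TO_NAME.items()}


def rename_labels(examples, mapping):
    """Yield copies of the examples with "label" replaced via the mapping."""
    for eg in examples:
        eg = dict(eg)  # shallow copy so the input example is not modified
        # Labels missing from the mapping are left unchanged
        eg["label"] = mapping.get(eg["label"], eg["label"])
        yield eg


stream = [{"text": "My invoice is wrong", "label": "1"}]
# Before annotation: show descriptive names to the annotator
annotated = list(rename_labels(stream, ID_TO_NAME))
# After annotation: convert back to the original integer IDs
restored = list(rename_labels(annotated, NAME_TO_ID))
```

The round trip only works if the descriptive names are unique, since the inverse mapping would otherwise lose entries.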

HSM · May 22, 2020, 4:59pm

I'm using this recipe file for labelling. Could you help me modify it to show meta data about each label?

import json
from pathlib import Path

import spacy
from prodigy import recipe
from prodigy.components.db import connect
from prodigy.models.matcher import PatternMatcher


@recipe('textcat.simple-teach',
    dataset=("Dataset ID", "positional", None, str),
    source_file=("File path or stdin", "positional", None, Path),
    patterns=("Path to match patterns file", "positional", None, Path),
    label=("Label to annotate", "option", "L", str)
)
def simple_teach(dataset, source_file, patterns, label="LABEL"):
    DB = connect()
    nlp = spacy.blank('en')
    matcher = PatternMatcher(nlp, label_span=False, label_task=True).from_disk(patterns)

    # For this example, I assume the source file is already formatted as jsonl
    stream = (json.loads(line) for line in open(source_file))
    stream = (eg for score, eg in matcher(stream))
    return {
        'view_id': 'classification',
        'dataset': dataset,
        'stream': stream,
        'update': None,
        'config': {'lang': 'en', 'labels': [label]}
    }

ines · May 25, 2020, 11:03am

It's kind of up to you, but one thing you could do is something like this:

LABEL_DESCRIPTIONS = {
    "LABEL_A": "Something about label A", 
    "LABEL_B": "Something about label B"
}

def add_label_meta(stream):
    for eg in stream:
        label = eg["label"]
        # Create the "meta" dict if the example doesn't have one yet
        eg.setdefault("meta", {})["label_info"] = LABEL_DESCRIPTIONS.get(label, "n/a")
        yield eg

And then just add that at the end to update the meta for each example, based on its label:

stream = (eg for score, eg in matcher(stream))
stream = add_label_meta(stream)

HSM · May 26, 2020, 10:27pm

Thank you Ines. Is there a way to make the text size of this metadata bigger?

ines · May 27, 2020, 11:11am

You can use the "global_css" config setting to add custom CSS overrides. The meta has the class .prodigy-meta, so you can do something like:

"global_css": ".prodigy-meta { font-size: 16px}"
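
In the custom recipe posted earlier in this thread, one place to put this would be the 'config' dict that the recipe returns. A minimal sketch, where build_config and meta_font_size_px are hypothetical names introduced only for illustration:

```python
def build_config(label, meta_font_size_px=16):
    """Return a recipe config dict with a CSS override for the meta text."""
    return {
        "lang": "en",
        "labels": [label],
        # Doubled braces produce literal { } inside the f-string
        "global_css": f".prodigy-meta {{ font-size: {meta_font_size_px}px }}",
    }


# The recipe would then return {..., 'config': build_config(label)}
config = build_config("LABEL_A", meta_font_size_px=18)
```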
