Hi, I was wondering if I can include more information about each label in the textcat.teach UI.
The labels are just integer IDs, so it would be great to include a paragraph for each label to help the annotator decide.
If it's mostly about providing more details and explanations for the annotator, you can use the "instructions" setting (see the "Web Application" page in the Prodigy docs). It accepts HTML, so you can format things nicely and include any info that might be relevant during annotation.
If your annotation process is binary, another option would be to wrap the stream, check the "label" assigned to each example and then add an entry to the "meta" with the label description. This will then be displayed in the bottom right corner of the annotation card.
If your labels are integer IDs, you might also consider replacing them with more descriptive strings to make it easier and faster for the annotator to decide. It doesn't matter to the model, and you can always do a search and replace in the data afterwards to convert the labels back – it's still much more efficient than if your annotators have to frequently check instructions and read several paragraphs to know what the label IDs mean.
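As a rough sketch of that search-and-replace step: the mapping and label names below are made up for illustration, and the examples are plain dicts rather than anything loaded from a Prodigy dataset. The same helper can map the IDs to descriptive names before annotation and back to IDs afterwards.

```python
# Hypothetical mapping from integer label IDs to descriptive names.
ID_TO_NAME = {"0": "BILLING_QUESTION", "1": "TECHNICAL_ISSUE"}
NAME_TO_ID = {name: label_id for label_id, name in ID_TO_NAME.items()}


def relabel(examples, mapping):
    """Yield copies of the examples with "label" replaced via the mapping.

    Labels that are not in the mapping are left unchanged.
    """
    for eg in examples:
        eg = dict(eg)  # shallow copy, so the input examples are not modified
        eg["label"] = mapping.get(eg["label"], eg["label"])
        yield eg


# Before annotation: show descriptive names instead of IDs.
raw = [{"text": "My invoice is wrong", "label": "0"}]
readable = list(relabel(raw, ID_TO_NAME))

# After annotation: convert the names back to the original IDs.
annotated = [dict(eg, answer="accept") for eg in readable]
restored = list(relabel(annotated, NAME_TO_ID))
```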
I'm using this recipe file for labelling. Could you help me modify it to show meta data about each label?
import spacy
from prodigy.models.matcher import PatternMatcher
from pathlib import Path
import json
from prodigy import recipe
from prodigy.components.db import connect

@recipe('textcat.simple-teach',
    dataset=("Dataset ID", "positional", None, str),
    source_file=("File path or stdin", "positional", None, Path),
    patterns=("Path to match patterns file", "positional", None, Path),
    label=("Label to annotate", "option", "L", str)
)
def simple_teach(dataset, source_file, patterns, label="LABEL"):
    DB = connect()
    nlp = spacy.blank('en')
    matcher = PatternMatcher(nlp, label_span=False, label_task=True).from_disk(patterns)
    # For this example, I assume the source file is already formatted as jsonl
    stream = (json.loads(line) for line in open(source_file))
    stream = (eg for score, eg in matcher(stream))
    return {
        'view_id': 'classification',
        'dataset': dataset,
        'stream': stream,
        'update': None,
        'config': {'lang': 'en', 'labels': [label]}
    }
It's kind of up to you, but one thing you could do is something like this:
LABEL_DESCRIPTIONS = {
    "LABEL_A": "Something about label A",
    "LABEL_B": "Something about label B"
}

def add_label_meta(stream):
    for eg in stream:
        label = eg["label"]
        # Create the "meta" dict if the example doesn't have one yet
        meta = eg.setdefault("meta", {})
        meta["label_info"] = LABEL_DESCRIPTIONS.get(label, "n/a")
        yield eg
And then just add that at the end to update the meta for each example, based on its label:
stream = (eg for score, eg in matcher(stream))
stream = add_label_meta(stream)
Thank you, Ines. Is there a way to make the text size of this metadata bigger?
You can use the "global_css" config setting to add custom CSS overrides. The meta has the class .prodigy-meta, so you can do something like:
"global_css": ".prodigy-meta { font-size: 16px}"