CSV File Text Annotation

Hi,

I have a CSV file of feedback that I need to annotate using about 8 primary labels and around 25 secondary labels. I am unable to find the right documentation for it. Can you help me with this?

Hi! You can either read in your CSV file directly or convert it to JSONL. Check out this docs section for the expected file formats: https://prodi.gy/docs/api-loaders#input

How you do the annotation depends on what exactly you need to label. If you need to assign labels to the whole texts, check out the docs on text classification. Also see this section for tips on how to efficiently handle large and/or hierarchical label schemes, like your primary and secondary labels: https://prodi.gy/docs/text-classification#large-label-sets
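As a rough illustration of one way to handle a two-level scheme, you could annotate the primary labels first and then build a second stream that only shows the secondary labels belonging to each selected primary label. The sketch below is plain Python, not a built-in Prodigy feature. It assumes the first pass used the choice interface, so each annotated example stores the selected option IDs under "accept". The label names and the SECONDARY_LABELS mapping are hypothetical placeholders for your own scheme.

```python
# Hypothetical hierarchy: primary label -> its secondary labels.
SECONDARY_LABELS = {
    "Billing & Payment": ["Refund", "Late fee"],
    "Website Issues": ["Page not loading", "Broken link"],
}


def add_secondary_options(examples, hierarchy=SECONDARY_LABELS):
    """Yield one choice task per (example, selected primary label) pair.

    Assumes each example is a dict with "text" and an "accept" list holding
    the primary labels selected in the first annotation pass.
    """
    for eg in examples:
        for primary in eg.get("accept", []):
            secondary = hierarchy.get(primary, [])
            if not secondary:
                continue  # no secondary labels defined for this primary label
            yield {
                "text": eg["text"],
                # keep the original meta and record which primary label applies
                "meta": {**eg.get("meta", {}), "primary": primary},
                "options": [{"id": s, "text": s} for s in secondary],
            }
```

Each annotator then only ever sees a handful of options at a time instead of all 25 secondary labels at once.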

Hi Ines, can you give me an example? I am really struggling to make this work. I am trying to annotate feedback in a CSV file that has 2 columns, "ID" and "Feedback". I tried the recipe below so that I can get "ID" in my JSONL output. But when I run the command, I see an error saying:

[x] Error while validating stream: no first example
This likely means that your stream is empty.

I ran the following command:
python -m prodigy feedback_recipe dataset "C:/Users/...../feedback_prodigy_test.csv" -F temp.py

1.) From where should I run the command?
2.) Can you explain what "dataset" means in the command?
3.) Is the command that I ran correct?

Below is the recipe:

import csv
import prodigy

@prodigy.recipe('feedback_recipe',
    dataset=prodigy.recipe_args['dataset'],
    file_path=("C:/Users/......../feedback_prodigy_test.csv", "positional", None, str))
def feedback_recipe(dataset, file_path):
    """Annotate the feedbacks using different labels."""
    stream = custom_csv_loader(file_path)  # load in the CSV file
    stream = add_options(stream)  # add options to each task

    return {
        'dataset': dataset,   # save annotations in this dataset
        'view_id': 'choice',  # use the choice interface
        'config': {'choice_style': 'multiple'},
        'stream': stream,
    }

def custom_csv_loader(file_path):
    with open(file_path) as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            id = row.get('ResponseId')
            text = row.get('Feedback_Explanation')
            yield {'text': text, 'meta': {'id': id}}

def add_options(stream):
    # Helper function to add options to every task in a stream
    options = [
        {"id": "Billing & Payment", "text": "Billing & Payment"},
        {"id": "Registration & Sign-In", "text": "Registration & Sign-In"},
        {"id": "Website Issues", "text": "Website Issues"},
        {"id": "Customer Service", "text": "Customer Service"},
    ]
    for task in stream:
        task["options"] = options
        yield task

If you're trying to load a CSV file "as is", the text should be in a column named text or Text. Otherwise, Prodigy can't know where to look for it. Check out the link I posted above for examples of the data format: https://prodi.gy/docs/api-loaders#input
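If you'd rather convert the file than rename the column, here is a minimal sketch using only the Python standard library. It writes one JSON object per row, with the feedback under "text" and the ID kept in "meta". The function name csv_to_jsonl is made up for this example, and the column names "ID" and "Feedback" are assumptions taken from your description, so adjust them to match the actual header row of your CSV.

```python
import csv
import json


def csv_to_jsonl(csv_path, jsonl_path, text_column="Feedback", id_column="ID"):
    """Convert a CSV file to JSONL with the text stored under the "text" key.

    Returns the number of tasks written. The default column names are
    assumptions and must match the CSV header exactly.
    """
    count = 0
    # "utf-8-sig" also handles files that start with a byte order mark
    with open(csv_path, newline="", encoding="utf-8-sig") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            text = (row.get(text_column) or "").strip()
            if not text:
                continue  # skip rows without any feedback text
            task = {"text": text, "meta": {"id": row.get(id_column)}}
            dst.write(json.dumps(task, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Calling csv_to_jsonl with your CSV path and an output path should give you a file you can pass to a built-in recipe as the source. If it returns 0, the column names most likely don't match the header.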

You shouldn't really need a custom recipe for what you're trying to do; textcat.manual should do all you need. Check out the documentation on text classification that I linked above.

You might want to check out the "Prodigy 101" guide, which explains how to get started, how to run commands and what the arguments mean and how to set up your annotation projects. It also has a glossary at the end that explains the most common terms, like "dataset" etc.
