CSV File Text Annotation

dinnuv · March 8, 2020, 1:06pm

Hi,

I have a CSV file with feedback that I need to annotate using about 8 primary labels and around 25 secondary labels. I am unable to find the right documentation for it. Can you help me with this?

ines · March 9, 2020, 10:23am

Hi! You can either read in your CSV file directly or convert it to JSONL. Check out this docs section for the expected file formats: https://prodi.gy/docs/api-loaders#input

How you do the annotation depends on what exactly you need to label. If you need to assign labels to the whole texts, check out the docs on text classification. Also see this section for tips on how to efficiently handle large and/or hierarchical label schemes, like your primary and secondary labels: https://prodi.gy/docs/text-classification#large-label-sets
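As a rough illustration of one way to handle a hierarchical scheme, here is a minimal sketch that annotates in two passes: first the primary label, then only the secondary labels that belong to the chosen primary. The label names, the HIERARCHY dict and the helper functions are all hypothetical, and the task dicts only assume the "text" / "options" / "accept" keys used by the choice interface.

```python
# Hypothetical hierarchy: primary label -> its secondary labels.
# Replace with your real 8 primary and ~25 secondary labels.
HIERARCHY = {
    "Billing": ["Refund", "Invoice error"],
    "Website": ["Login", "Page not loading"],
}


def make_options(labels):
    """Turn label names into the option dicts the choice interface expects."""
    return [{"id": label, "text": label} for label in labels]


def primary_tasks(texts):
    """First pass: offer only the primary labels for every text."""
    options = make_options(HIERARCHY)  # iterating a dict yields its keys
    for text in texts:
        yield {"text": text, "options": options}


def secondary_tasks(annotated):
    """Second pass: for each accepted primary label, offer its secondary labels."""
    for eg in annotated:
        for primary in eg.get("accept", []):
            yield {
                "text": eg["text"],
                "options": make_options(HIERARCHY[primary]),
                "meta": {"primary": primary},
            }
```

The second pass would be fed the annotations saved from the first one, so each question stays small instead of showing all labels at once.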

dinnuv · March 9, 2020, 7:02pm

Hi Ines, can you give me an example? I am really struggling to make this work. I am trying to annotate feedback in a CSV file that has two columns, "ID" and "Feedback". I tried the recipe below so that I can get the "ID" in my JSONL output. But when I run the command, I see an error saying:

[x] Error while validating stream: no first example
This likely means that your stream is empty.

I ran the following command:
python -m prodigy feedback_recipe dataset "C:/Users/...../feedback_prodigy_test.csv" -F temp.py

1.) From where should I run the command?
2.) Can you explain the details regarding "dataset" in the command?
3.) Is the command that I ran correct?

Below is the recipe:

import csv
import prodigy

@prodigy.recipe('feedback_recipe',
    dataset=prodigy.recipe_args['dataset'],
    file_path=("C:/Users/......../feedback_prodigy_test.csv", "positional", None, str))
def feedback_recipe(dataset, file_path):
    """Annotate the feedbacks using different labels."""
    stream = custom_csv_loader(file_path)  # load in the CSV file
    stream = add_options(stream)  # add options to each task

    return {
        'dataset': dataset,   # save annotations in this dataset
        'view_id': 'choice',  # use the choice interface
        'config': {'choice_style': 'multiple'},
        'stream': stream,
    }

def custom_csv_loader(file_path):
    with open(file_path) as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            id = row.get('ResponseId')
            text = row.get('Feedback_Explanation')
            yield {'text': text, 'meta': {'id': id}}

def add_options(stream):
    # Helper function to add options to every task in a stream
    options = [
        {"id": "Billing & Payment", "text": "Billing & Payment"},
        {"id": "Registration & Sign-In", "text": "Registration & Sign-In"},
        {"id": "Website Issues", "text": "Website Issues"},
        {"id": "Customer Service", "text": "Customer Service"},
    ]
    for task in stream:
        task["options"] = options
        yield task

ines · March 11, 2020, 9:02am

If you're trying to load a CSV file "as is", the text should be in a column named text or Text. Otherwise, Prodigy can't know where to look for it. Check out the link I posted above for examples of the data format: https://prodi.gy/docs/api-loaders#input
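If renaming the column isn't an option, converting the file to JSONL is another route. Here's a minimal sketch using only the standard library, assuming the columns are called "ID" and "Feedback" as described above. The function name csv_to_jsonl and its parameters are made up for illustration and are not part of Prodigy.

```python
import csv
import json


def csv_to_jsonl(csv_path, jsonl_path, text_column="Feedback", id_column="ID"):
    """Write one JSON object per line with a "text" key and the ID in "meta".

    Returns the number of rows written, so an empty result is easy to spot.
    """
    count = 0
    # utf-8-sig drops a byte order mark that some spreadsheet exports
    # put in front of the first column name.
    with open(csv_path, newline="", encoding="utf-8-sig") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            text = (row.get(text_column) or "").strip()
            if not text:
                continue  # skip rows without any feedback text
            task = {"text": text, "meta": {"id": row.get(id_column)}}
            dst.write(json.dumps(task, ensure_ascii=False) + "\n")
            count += 1
    return count
```

If the function returns 0, the column names passed in most likely don't match the header row of the CSV, which would also leave you with an empty stream.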

You shouldn't really need a custom recipe for what you're trying to do – textcat.manual should do all you need? Check out the documentation on text classification here: https://prodi.gy/docs/text-classification

You might want to check out the "Prodigy 101" guide, which explains how to get started, how to run commands and what the arguments mean and how to set up your annotation projects. It also has a glossary at the end that explains the most common terms, like "dataset" etc.
