CSV with NER classifications to dataset

MarioNavarrete · December 13, 2018, 7:03pm

Hi, I already have a dataset with Text, Entity and Label columns, and I want to use it as input to tag another dataset that only has a Text column. How can I do that with Prodigy? All I see is that, to start working, people type "prodigy dataset NAME ' '", and I don't know where that NAME file comes from.

ines · December 13, 2018, 7:39pm

Sorry if this was confusing. What Prodigy calls a “dataset” is the named set in its database that the annotations you create are saved to, not a file you have to provide. So when you run prodigy dataset your_cool_dataset, Prodigy will create an empty set called “your_cool_dataset” in the database.

When you annotate, you can tell Prodigy to save all labelled examples there. When you’re done, you can use that dataset to train a model, or run the db-out command to export it to a file to use it in a different process.

The data you want to label and load is usually specified as the source argument. Prodigy supports loading in CSV files if they contain a text or Text column. Alternatively, you can also convert your data to JSON or JSONL (see the PRODIGY_README.html for details on the format).
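If the existing CSV should be converted to JSONL first, a rough sketch might look like the following. It assumes one row per entity with columns named Text, Entity and Label (as described in the question), and it assumes the target format is one JSON object per line with a "text" key and an optional "spans" list of character offsets; check the PRODIGY_README.html for the exact format your version expects. The function names rows_to_tasks and csv_to_jsonl are made up for this illustration and are not part of Prodigy.

```python
import csv
import io
import json


def rows_to_tasks(rows):
    """Group (Text, Entity, Label) rows into one task per unique text.

    Hypothetical helper, not a Prodigy API. Assumes each row holds one entity.
    """
    tasks = {}
    for row in rows:
        text = row["Text"]
        task = tasks.setdefault(text, {"text": text, "spans": []})
        entity = row.get("Entity") or ""
        label = row.get("Label") or ""
        if not entity or not label:
            continue  # a text without an annotated entity still becomes a task
        # Simplification: only the first occurrence of the entity is matched.
        start = text.find(entity)
        if start == -1:
            continue  # entity string not found in the text, skip it
        span = {"start": start, "end": start + len(entity), "label": label}
        if span not in task["spans"]:
            task["spans"].append(span)
    return list(tasks.values())


def csv_to_jsonl(csv_file, jsonl_file):
    """Write one JSON object per line to jsonl_file."""
    for task in rows_to_tasks(csv.DictReader(csv_file)):
        jsonl_file.write(json.dumps(task) + "\n")


# Small in-memory example; real use would open the CSV and JSONL files instead.
sample = (
    "Text,Entity,Label\n"
    "Apple hired Tim Cook,Apple,ORG\n"
    "Apple hired Tim Cook,Tim Cook,PERSON\n"
    "No entities here,,\n"
)
out = io.StringIO()
csv_to_jsonl(io.StringIO(sample), out)
print(out.getvalue())
```

Matching by the first occurrence is a shortcut: if the same entity string appears several times in a text, or the offsets need to line up with token boundaries, the lookup would need to be more careful.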

For example, the following command will start the ner.manual recipe so you can label data by hand:

prodigy ner.manual your_dataset en_core_web_sm /path/to/data.csv --label PERSON,ORG

ner.manual - the name of the recipe to run
your_dataset - the name of a dataset in the Prodigy database to save the examples to
en_core_web_sm - name of an installed spaCy model used for tokenization
/path/to/data.csv - the path to your data (can also be a JSON or JSONL file)
--label PERSON,ORG - the labels that will be available

When you’re done, you can export the annotated dataset and check it out:

prodigy db-out your_dataset > some_file.jsonl
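To check the exported file from Python, something like this sketch could work. It assumes each line of the export is a JSON object with an "answer" field (such as "accept" or "reject") and a "spans" list whose items carry a "label"; the function name summarize_export is hypothetical and only used for illustration.

```python
import json
from collections import Counter


def summarize_export(lines):
    """Count answers, and entity labels among accepted examples.

    Hypothetical helper; the "answer" and "spans" field names are assumptions
    about the exported records.
    """
    answers = Counter()
    labels = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        answer = record.get("answer", "unknown")
        answers[answer] += 1
        if answer == "accept":
            for span in record.get("spans", []):
                labels[span["label"]] += 1
    return answers, labels


# In-memory example; real use would iterate over open("some_file.jsonl").
example_lines = [
    '{"text": "Apple hired Tim Cook", "answer": "accept", '
    '"spans": [{"start": 0, "end": 5, "label": "ORG"}]}',
    '{"text": "Nothing here", "answer": "reject", "spans": []}',
]
answers, labels = summarize_export(example_lines)
print(dict(answers), dict(labels))
```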
