Hi, I already have a dataset with Text, Entity and Label columns, and I want to use it as input to tag another dataset that only has a Text column. How can I do that with Prodigy? All I can see is that to start working they type "prodigy dataset NAME", and I don't know where that NAME file comes from.
Sorry if this was confusing. What Prodigy calls a “dataset” is the set in its database that the annotations you create will be saved to, not the input data you want to label. So when you run prodigy dataset your_cool_dataset, Prodigy will create an empty set called “your_cool_dataset” in the database.
When you annotate, you can tell Prodigy to save all labelled examples there. When you’re done, you can use that dataset to train a model, or run the db-out command to export it to a file for use in a different process.
The data you want to load and label is usually specified as the source argument. Prodigy supports loading CSV files if they contain a Text column. Alternatively, you can convert your data to JSON or JSONL (see the PRODIGY_README.html for details on the format).
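If you'd rather convert the CSV to JSONL first, here is a minimal Python sketch. It assumes each JSONL line is an object with a "text" key (check the PRODIGY_README.html for the exact format), and the function name and column default are only illustrative:

```python
import csv
import json


def csv_to_jsonl(csv_path, jsonl_path, text_column="Text"):
    """Write one {"text": ...} JSON object per non-empty CSV row.

    Assumption: the loader expects a "text" key on each line.
    Returns the number of lines written.
    """
    written = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            text = (row.get(text_column) or "").strip()
            if not text:
                continue  # skip rows that have no usable text
            dst.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            written += 1
    return written
```

You would then pass the resulting .jsonl file as the source argument instead of the CSV.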
For example, the following command will start the ner.manual recipe so you can label data by hand:
prodigy ner.manual your_dataset en_core_web_sm /path/to/data.csv --label PERSON,ORG
ner.manual - the name of the recipe to run
your_dataset - the name of a dataset in the Prodigy database to save the examples to
en_core_web_sm - the name of an installed spaCy model used for tokenization
/path/to/data.csv - the path to your data (can also be a JSON or JSONL file)
--label PERSON,ORG - the labels that will be available
When you’re done, you can export the annotated dataset and check it out:
prodigy db-out your_dataset > some_file.jsonl
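To check the export from Python, something like the sketch below could work. It assumes each exported line is a JSON object with an "answer" field ("accept" for kept examples) and a "spans" list whose items carry a "label". Those field names are assumptions about the export format, so compare them against your own file first:

```python
import json
from collections import Counter


def count_labels(jsonl_path):
    """Count entity labels across accepted examples in an exported file."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            example = json.loads(line)
            # Assumed field: "answer" marks accepted/rejected examples.
            if example.get("answer") != "accept":
                continue
            # Assumed field: "spans" holds the labelled entities.
            for span in example.get("spans") or []:
                counts[span.get("label")] += 1
    return counts
```

Calling count_labels("some_file.jsonl") returns a Counter mapping each label to how often it was annotated, which is a quick way to see whether both PERSON and ORG actually show up in your data.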