Names only for annotation project

erijcken · May 7, 2021, 12:40pm

Hi,

I am new to Prodigy and haven't fully figured out the paradigm.
For a project, I would like to manually annotate names in texts. My team has developed our own model to recognize the names, so I only want to use the annotated texts as a gold standard for evaluating our model.

To do so, I have a CSV file, texts.csv, with the text in one of the columns. Do I need to convert this file to JSON, or can I run Prodigy directly on the CSV file?

Also, what command do I need to run to start the ner.manual recipe with this dataset?

I suppose I have to start with:

!python -m prodigy ner.manual

However, it is unclear to me how I should run the rest. Can someone help me with this?

ines · May 8, 2021, 11:52pm

Hi! In case you haven't seen it, you might find the "First steps" guide helpful: "Prodigy 101 – everything you need to know". It shows an example of a manual NER annotation workflow, walks you through the different steps and explains the concepts. You can copy-paste the ner.manual command from there and adjust the arguments for your use case, for example the path to your texts and the labels you want to use.
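As a rough sketch of what the adjusted command could look like for the setup described above: the snippet below only assembles and prints the command string, it does not start Prodigy. The dataset name names_gold and the label PERSON are made-up placeholders, and blank:en assumes English texts that only need a tokenizer, so swap in whatever fits your project.

```python
import shlex

# All values below are illustrative assumptions, not taken from the docs.
dataset = "names_gold"   # hypothetical name of the dataset to save annotations to
model = "blank:en"       # blank English pipeline, used only for tokenization
source = "./texts.csv"   # input file; the text must be in a "Text" or "text" column
labels = ["PERSON"]      # hypothetical label for the names you want to mark

# ner.manual takes the dataset, the model and the source, plus the labels.
command = [
    "python", "-m", "prodigy", "ner.manual",
    dataset, model, source,
    "--label", ",".join(labels),
]
command_str = shlex.join(command)
print(command_str)
```

In a notebook, the printed string can be run in a cell with a leading !, the same way as the command in the question.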

Under the hood, the different workflows (also called "recipes") are Python functions that take different arguments, depending on the workflow. You can specify those arguments on the command line. In the documentation, you can typically hover over the commands to view more details about the arguments.

You can also view the full API and available arguments on the "Built-in Recipes" page of the docs.

Yes, you can also provide a .csv file as the input. Just make sure that the text is in a column named Text or text. See the "Loaders and Input Data" page of the docs for examples of the supported input formats.

We often recommend JSON because it's more flexible and lets you represent nested data structures better, for example lists of spans and tokens. This is also the output format produced by Prodigy, as described on the "Annotation interfaces" page of the docs.
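If you do decide to convert the CSV, here is a minimal sketch using only the Python standard library. It writes one JSON object with a "text" key per line (JSONL). The function name csv_to_jsonl, the file names and the default column name are assumptions for illustration, so pass whichever column actually holds your text.

```python
import csv
import json


def csv_to_jsonl(csv_path, jsonl_path, text_column="text"):
    """Write one {"text": ...} JSON object per CSV row and return the number written."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            text = (row.get(text_column) or "").strip()
            if not text:
                continue  # skip rows where the text column is missing or empty
            dst.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count


# Example call (hypothetical file names):
# csv_to_jsonl("texts.csv", "texts.jsonl", text_column="text")
```

The resulting texts.jsonl can then be passed as the source instead of the CSV file.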
