Names only for annotation project

Hi,

I am new to Prodigy and haven't fully figured out the paradigm.
For a project, I would like to manually annotate names in texts. My team has developed its own model to recognize names, so I only want to use the annotated texts as a gold standard for our model.

To do so, I have a CSV file, texts.csv, with the text in one of the columns. Do I need to convert this file to JSON, or can I run Prodigy on the CSV file directly?

Also, what command do I need to run to start ner.manual with this dataset?

I suppose I have to start with:

!python -m prodigy ner.manual

However, it is unclear to me how I should run the rest. Can someone help me with this?

Hi! In case you haven't seen it, you might find the "First steps" guide helpful: Prodigy 101 – everything you need to know. It shows an example of a manual NER annotation workflow, walks you through the different steps and explains the concepts. You can copy-paste the ner.manual command from there and adjust the arguments for your use case, for example the path to your texts and the labels you want to use.

Under the hood, the different workflows (also called "recipes") are Python functions that take different arguments, depending on the workflow. You can specify those arguments on the command line. In the documentation, you can typically hover over the commands to view more details about the arguments.

You can also view the full API and the available arguments for each recipe here: Built-in Recipes
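Since the question already runs Prodigy from a notebook, here is a minimal Python sketch of how the ner.manual arguments fit together. The helper `build_ner_manual_cmd`, the dataset name `names_gold` and the label `PERSON` are hypothetical; the argument order (dataset, spaCy pipeline, source, `--label`) mirrors the Prodigy 101 example, so double-check it against the Built-in Recipes reference for your version.

```python
import shlex
import sys


def build_ner_manual_cmd(dataset, spacy_model, source, labels):
    """Return the argument list for `python -m prodigy ner.manual`.

    Positional order follows the Prodigy 101 example:
    dataset name, spaCy pipeline (or "blank:<lang>"), input source,
    followed by the --label option.
    """
    return [
        sys.executable, "-m", "prodigy", "ner.manual",
        dataset,                      # dataset the annotations are saved to
        spacy_model,                  # used for tokenization only
        source,                       # path to the input file
        "--label", ",".join(labels),  # comma-separated list of labels
    ]


# Hypothetical values: pick your own dataset name and label(s).
cmd = build_ner_manual_cmd("names_gold", "blank:en", "./texts.csv", ["PERSON"])
print(shlex.join(cmd))
```

In a notebook you can paste the printed line after `!`, or pass `cmd` to `subprocess.run` to start the annotation server. Using `sys.executable` launches Prodigy with the same Python environment the notebook is running in.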

Yes, you can also provide a .csv file as the input, as long as the text is in a column named Text or text. See here for examples of the supported input formats: Loaders and Input Data
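If the text column in texts.csv has a different name, one option is to rename it before loading. This is a minimal sketch using only the standard library `csv` module; `rename_text_column` is a hypothetical helper, and the original column name is whatever your file uses. Only the header changes: the cell values are copied unchanged, so character offsets stay comparable with your own model's output.

```python
import csv


def rename_text_column(src_path, dst_path, column):
    """Copy a CSV, renaming `column` to "text"; all other columns are kept."""
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        if column not in (reader.fieldnames or []):
            raise ValueError(f"no column named {column!r} in {src_path}")
        # New header: same column order, only the text column is renamed.
        fieldnames = ["text" if name == column else name for name in reader.fieldnames]
        rows = [
            {("text" if key == column else key): value for key, value in row.items()}
            for row in reader
        ]
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # cell values are written unchanged
    return len(rows)
```

For example, `rename_text_column("texts.csv", "texts_prodigy.csv", "content")` would work for a file whose text column is called `content`.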

We often recommend JSON because it's more flexible and better at representing nested data structures, for example lists of spans and tokens. It's also the output format produced by Prodigy: Annotation interfaces
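If you'd rather switch to JSON, here is a minimal conversion sketch. It writes newline-delimited JSON (.jsonl), one task per line, which is the format used in the Prodigy 101 example. `csv_to_jsonl`, `text_column` and `meta_columns` are hypothetical names. Keeping a row ID in `meta` (which Prodigy displays on the annotation card) makes it easier to match each gold-standard annotation back to your model's prediction for the same row.

```python
import csv
import json


def csv_to_jsonl(csv_path, jsonl_path, text_column="text", meta_columns=()):
    """Convert each CSV row into one {"text": ..., "meta": {...}} JSON line."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        if text_column not in (reader.fieldnames or []):
            raise ValueError(f"no column named {text_column!r} in {csv_path}")
        with open(jsonl_path, "w", encoding="utf-8") as dst:
            for row in reader:
                text = row[text_column] or ""
                if not text.strip():
                    continue  # skip empty rows rather than creating blank tasks
                # The text is NOT stripped, so character offsets of the
                # annotated spans line up with the original CSV text.
                task = {"text": text}
                meta = {col: row[col] for col in meta_columns if col in row}
                if meta:
                    task["meta"] = meta
                dst.write(json.dumps(task, ensure_ascii=False) + "\n")
                count += 1
    return count
```

You would then point ner.manual at the resulting texts.jsonl instead of the CSV.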