When to use db-in vs. ner.manual


I am a little confused about which scenarios call for "db-in" and which call for "ner.manual". Can you please help?

  1. Start annotations from scratch with a patterns file.
    --> I ran prodigy db-in dev ./raw_dev.jsonl, then prodigy ner.manual dev en_custom ./raw_dev.jsonl --label LOC --patterns patterns.jsonl,
    and I got the message "No tasks available".

  2. Increase number of examples to be annotated
    --> Do I use db-in or ner.manual with the same dataset name?

  3. Edit annotated examples
    --> After I use db-out to export the annotated examples, if I want to edit or review them, do I use db-in or ner.manual with a new dataset name?

Thank you.

Hi! The db-in command is only intended to import existing annotations into your Prodigy datasets – for example, if you've already labelled data with some other process and want to combine it with new annotations or if you want to re-import annotations to a new dataset.

If you just want to annotate data, you do not have to import anything upfront – you can just start the server with your input data and Prodigy will stream it in, let you annotate and save the collected annotations to the database.

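The reason you're seeing "No tasks available" after importing the data is that Prodigy will skip questions that are already in the dataset. Since you've already imported the raw data into the dataset you're annotating into, every incoming example is treated as already done and nothing is left to show. If you skip the db-in step and run ner.manual on the raw file with a dataset that doesn't contain those examples yet, the tasks should show up.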
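To make the skipping behaviour concrete, here is a minimal sketch in plain Python. It is not Prodigy's actual code: the `task_hash` and `filter_stream` helpers are hypothetical, and hashing only the text is a simplifying assumption standing in for whatever Prodigy does internally. It just shows why a stream comes back empty once the same raw examples are already stored in the dataset.

```python
import hashlib


def task_hash(task):
    """Hypothetical stand-in for Prodigy's hashing: hash only the text."""
    return hashlib.sha1(task["text"].encode("utf-8")).hexdigest()


def filter_stream(stream, dataset):
    """Yield only the tasks whose hash is not already in the dataset."""
    seen = {task_hash(example) for example in dataset}
    for task in stream:
        if task_hash(task) not in seen:
            yield task


# Made-up raw examples standing in for raw_dev.jsonl.
raw = [{"text": "Berlin is in Germany."}, {"text": "Paris is in France."}]

# Case 1: the raw data was imported first, so every task counts as seen.
after_import = list(filter_stream(raw, dataset=list(raw)))

# Case 2: the dataset is empty, so every task is queued for annotation.
fresh = list(filter_stream(raw, dataset=[]))

print(len(after_import), len(fresh))
```

In the first case nothing is left to annotate, which corresponds to the "No tasks available" message. In the second case both examples come through.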

For editing annotated examples that you've exported, it depends on what you want to do with the data: if you want to re-annotate the exported JSON examples, for instance to correct them, you can load them back into ner.manual. If you just want to add them to a new dataset so Prodigy can train from them, or so you can add more examples to them later, you can use db-in with a new dataset name.
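Before choosing between those two routes, it can help to look at what the export contains. The following is a rough sketch, assuming the export is newline-delimited JSON where each record has a "text", an "answer" and a list of "spans" with a "label". The sample records and the `summarize` helper are made up for illustration.

```python
import json
from collections import Counter

# Made-up sample standing in for the contents of a db-out export.
exported = """\
{"text": "Berlin is in Germany.", "spans": [{"start": 0, "end": 6, "label": "LOC"}, {"start": 13, "end": 20, "label": "LOC"}], "answer": "accept"}
{"text": "Nothing to see here.", "spans": [], "answer": "accept"}
{"text": "Paris, maybe?", "spans": [{"start": 0, "end": 5, "label": "LOC"}], "answer": "reject"}
"""


def summarize(jsonl_text):
    """Count answers and span labels across newline-delimited JSON records."""
    answers = Counter()
    labels = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        answers[record.get("answer", "missing")] += 1
        for span in record.get("spans", []):
            labels[span["label"]] += 1
    return answers, labels


answers, labels = summarize(exported)
print(dict(answers), dict(labels))
```

If the counts show many rejected or suspicious examples, re-annotating them in ner.manual is probably the better route. If the data looks clean, importing it with db-in under a new dataset name should be enough.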