prodigy db-in

I'm trying to run ner.train_curve. However, my training set was broken into several small pieces and reviewed separately, so I need to combine them into one dataset. I used the merge_spans function to merge all the reviewed datasets and then tried db-in into one of the existing datasets. However, that dataset was not changed at all. Then I tried db-in with a new_dataset, but I could not find that in prodigy stats -l either. Thanks.
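For anyone landing on this thread: the combining step can also be done with plain Python before importing. A minimal sketch, assuming the reviewed pieces are JSONL files on disk (the file-handling helper and its name are made up for illustration, not part of Prodigy):

```python
import json
from pathlib import Path

def combine_jsonl(paths, out_path):
    """Concatenate several JSONL files into one, skipping blank lines."""
    examples = []
    for path in paths:
        for line in Path(path).read_text(encoding="utf8").splitlines():
            if line.strip():
                examples.append(json.loads(line))
    with open(out_path, "w", encoding="utf8") as f:
        for eg in examples:
            f.write(json.dumps(eg) + "\n")
    return examples
```

You could then import the combined file in one go, e.g. `prodigy db-in your_dataset combined.jsonl`.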

Hi! Did you see a success message after importing and what did it say? And did you run it all on the same machine? Maybe double-check you didn't accidentally write to a different database (e.g. a newly created SQLite database in the home directory)?

Hi Ines,

No, I did not see a success message, but it does display the JSONL file I'm trying to add.
I just checked my Prodigy home and, yes, you are right, there is a new database created. So how do I make sure that I'm writing into the one I'm using?

Also, for the jsonl file I'm importing, which fields are necessary? text, meta, spans, answer?

You can use your prodigy.json (in your home directory or current working directory) to specify the SQLite database settings. Alternatively, you can also use the PRODIGY_HOME environment variable to change the location of the home directory where Prodigy looks for the config and creates the database file.
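For example, a prodigy.json along these lines points Prodigy at an explicit SQLite file (the path below is a placeholder; point it at the directory where your existing database lives):

```json
{
  "db": "sqlite",
  "db_settings": {
    "sqlite": {
      "name": "prodigy.db",
      "path": "/path/to/your/prodigy/home"
    }
  }
}
```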

For NER, you need at least a "text", "spans" (list, empty or with entities) and an "answer". All other fields are optional. You can find an example of the data format here: Annotation interfaces · Prodigy · An annotation tool for AI, Machine Learning & NLP
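As a sketch, one line of an NER import file might look like the dictionary below (the text, labels, and offsets are invented for illustration; the character offsets must match the span text exactly):

```python
import json

# One annotated example in Prodigy's NER format: "spans" offsets are
# character indices into "text", and "answer" records the review decision.
example = {
    "text": "Apple opened a new office in Berlin.",
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"},
        {"start": 29, "end": 35, "label": "GPE"},
    ],
    "answer": "accept",
}

# Each example becomes one line in the .jsonl file.
line = json.dumps(example)
```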

When you train, Prodigy will also validate the examples against a schema and raise an error if required fields are missing.
