How to reuse the prodigy.db to retrain the older (spacy v2) ner custom model

rahul1 · December 5, 2022, 5:38am

How to transfer prodigyold.db (containing older prodigy annotated dataset) to new prodigy.db

Step: Move the old database file into the .prodigy folder.
Check that the database file name does not contain any spaces or symbols (otherwise you will get errors in the following steps). Your .prodigy folder should now contain two database files (prodigy.db and prodigyold.db) and a JSON file (prodigy.json).
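
If you want to check the file name before continuing, here is a minimal Python sketch. It assumes that only letters, digits, underscores and hyphens are safe, which is a conservative guess on my part and not a rule documented by Prodigy. The function names are hypothetical.

```python
import re
from pathlib import Path


def is_safe_db_name(filename: str) -> bool:
    """Return True if the name has only letters, digits, '_' or '-' and ends in .db.

    The allowed character set is an assumption, chosen to be conservative.
    """
    return re.fullmatch(r"[A-Za-z0-9_-]+\.db", filename) is not None


def list_db_files(prodigy_home: Path) -> list:
    """Return the sorted names of all .db files in the given folder."""
    return sorted(p.name for p in Path(prodigy_home).glob("*.db"))
```

For example, is_safe_db_name("prodigyold.db") passes, while a name such as "prodigy old.db" is rejected because of the space.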

Step: Change the database name in prodigy.json so that it points to the old database.

{
  "db": "sqlite",
  "db_settings": {
    "sqlite": {
      "name": "prodigyold.db",
      "path": "./.prodigy"
    }
  }
}

If you are working over SSH, you can edit the file directly in the terminal with vi (tutorials are easy to find online). Alternatively, edit the file on your local machine, upload it to the server, and move it into the .prodigy folder.
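
As an alternative to editing by hand, the database name can be switched with a few lines of Python. This is only a sketch: it assumes prodigy.json is plain JSON with the layout shown above, and set_sqlite_db_name is a hypothetical helper, not part of Prodigy.

```python
import json
from pathlib import Path


def set_sqlite_db_name(config_path, db_name):
    """Point the SQLite settings in a prodigy.json file at db_name.

    Other keys already present in the file are left untouched.
    """
    config_path = Path(config_path)
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config["db"] = "sqlite"
    # Create the nested settings if they are missing, then update the name.
    sqlite_settings = config.setdefault("db_settings", {}).setdefault("sqlite", {})
    sqlite_settings["name"] = db_name
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling set_sqlite_db_name with the path to your prodigy.json and "prodigyold.db" selects the old database; calling it again with "prodigy.db" switches back.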

Step: Check which datasets the old database contains by running the following command:

prodigy stats -l

The output shows the number of datasets and sessions, as well as the dataset names. In my case there is only one dataset (old_dataset). You can train a model directly from the old dataset, without first importing it into the new database:

prodigy train /home/gebruiker/Documenten/ --ner old_dataset

However, it is good practice to export the annotated data from those datasets as JSONL files, as shown below.

Step: Export the dataset from the old prodigyold.db using the db-out command to produce a JSONL file.
Beware: before running db-out, make sure prodigy.json points to the old database ('prodigyold.db' in my case), as shown above.

prodigy db-out old_dataset > ./old_data.jsonl
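
Before importing the file elsewhere, it can be worth checking that the export is complete and well formed. The sketch below counts the exported examples and tallies the "answer" field that Prodigy stores on each annotation. summarize_jsonl is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize_jsonl(path):
    """Return (number of examples, counts per answer) for a JSONL export.

    Blank lines are skipped; a line that is not valid JSON raises ValueError.
    Examples without an "answer" key are counted under "missing".
    """
    answers = Counter()
    total = 0
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                task = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"line {line_number} is not valid JSON: {err}") from err
            total += 1
            answers[task.get("answer", "missing")] += 1
    return total, dict(answers)
```

Running summarize_jsonl("./old_data.jsonl") should give a total that matches the number of annotations you expect in old_dataset.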

Step: Change the database name in prodigy.json back to the new database.

{
  "db": "sqlite",
  "db_settings": {
    "sqlite": {
      "name": "prodigy.db",
      "path": "./.prodigy"
    }
  }
}

Step: Create new_dataset in prodigy.db from the exported old_data.jsonl.

prodigy db-in new_dataset ./old_data.jsonl --rehash

You can now use new_dataset to train a model with the prodigy train recipe.
