How to transfer prodigyold.db (containing an older Prodigy annotated dataset) to a new prodigy.db
Step: move the old database file to the .prodigy folder.
Check that the database file name does not contain any spaces or symbols (otherwise you will get an error in the following steps). Your .prodigy folder now contains two database files (prodigy.db and prodigyold.db) and a JSON file (prodigy.json).
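The name check can be automated. The following is only a minimal sketch: the rule it applies (letters, digits, underscores and hyphens followed by .db) is my own conservative assumption, not something Prodigy documents, and is_safe_db_name is a hypothetical helper name.

```python
import re
from pathlib import Path

# Assumed rule (not from the Prodigy docs): letters, digits, underscore
# and hyphen only, followed by the ".db" extension.
SAFE_DB_NAME = re.compile(r"[A-Za-z0-9_-]+\.db")


def is_safe_db_name(filename):
    """Return True if the file name has no spaces or unusual symbols."""
    return SAFE_DB_NAME.fullmatch(Path(filename).name) is not None


# Report which candidate names would need renaming before the next steps.
for name in ["prodigyold.db", "prodigy old.db", "prodigy(old).db"]:
    print(name, "->", "ok" if is_safe_db_name(name) else "rename first")
```

If a name is flagged, rename the file before moving on.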
Step: change the name of the db file in prodigy.json so that it points to the old database
{
  "db": "sqlite",
  "db_settings": {
    "sqlite": {
      "name": "prodigyold.db",
      "path": "./.prodigy"
    }
  }
}
This step can be done in the SSH terminal using vi (more information is available online). Alternatively, edit the file in your local environment, upload it over SSH, and move it to the .prodigy folder.
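If you would rather not edit the file by hand, a few lines of Python can switch the database name. This is a rough sketch that assumes prodigy.json is plain JSON with the layout shown above; set_sqlite_db_name is a hypothetical helper, not part of Prodigy.

```python
import json
from pathlib import Path


def set_sqlite_db_name(config_path, db_name):
    """Point the "sqlite" settings in prodigy.json at db_name.

    All other keys in the file are left as they are.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["db"] = "sqlite"
    # Create the nested sections if they are missing, then set the name.
    sqlite_settings = config.setdefault("db_settings", {}).setdefault("sqlite", {})
    sqlite_settings["name"] = db_name
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, set_sqlite_db_name(".prodigy/prodigy.json", "prodigyold.db") selects the old database, and the same call with "prodigy.db" switches back. Adjust the path to wherever your prodigy.json actually lives.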
Step: check which datasets are in your old database by running the following command
prodigy stats -l
The output shows the number of datasets and sessions, as well as the dataset names. In my case there is only one dataset (old_dataset). A model can be trained directly from the old dataset, without first loading it into the new database:
prodigy train /home/gebruiker/Documenten/ --ner old_dataset
However, it is good practice to export the annotated data from those datasets as JSONL files, as shown below.
Step: export the dataset from the old prodigyold.db using the db-out command to produce a JSONL file.
Beware: before running db-out, make sure prodigy.json names the old database ('prodigyold.db' in my case), as shown above.
prodigy db-out old_dataset > ./old_data.jsonl
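Before importing the export into the new database, it can be worth checking that the file is complete and well-formed. The sketch below only assumes that the export holds one JSON object per line; check_jsonl is a hypothetical helper name.

```python
import json


def check_jsonl(path):
    """Count the records in a JSONL file; raise ValueError on a malformed line."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"line {line_number} is not valid JSON: {err}") from err
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number} is not a JSON object")
            count += 1
    return count
```

Calling check_jsonl("./old_data.jsonl") returns the number of exported examples, which you can compare with the count that prodigy stats reports for old_dataset.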
Step: change the name of the db file in prodigy.json back to the new database
{
  "db": "sqlite",
  "db_settings": {
    "sqlite": {
      "name": "prodigy.db",
      "path": "./.prodigy"
    }
  }
}
Step: create new_dataset in prodigy.db from the exported old_data.jsonl
prodigy db-in new_dataset ./old_data.jsonl --rehash
You can now use new_dataset to train a model with the prodigy train recipe.