Can't access annotated sentences

Hi,

I'm using Prodigy v1.11.5 with the ner.manual recipe and named multi-user sessions on a Google Cloud VM, where we have annotated about 60K sentences.

When I run the command below

prodigy train first_model --ner ner_method --eval-split 0.2

it only sees 4 annotations.

I also ran

prodigy db-out ner_method output

and it only exports 4 sentences to the .jsonl file.

I can access the annotated sentences by querying the SQLite database directly. Here are screenshots of the queries:

[Screenshot 2022-01-04 at 08.19.33: SQLite queries on the Prodigy database]
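For reference, the manual check can be scripted to show how many examples each dataset actually holds. This is a minimal sketch that assumes Prodigy's SQLite file uses a dataset table, an example table and a link table joining them (dataset_id, example_id); those table and column names are an assumption to verify with .schema in the sqlite3 shell, and the function names are hypothetical.

```python
from __future__ import annotations

import sqlite3

# ASSUMPTION: the Prodigy SQLite file stores datasets in a `dataset` table,
# annotated tasks in an `example` table, and dataset/example pairs in a
# `link` table with `dataset_id` and `example_id` columns. Run `.schema`
# in the sqlite3 shell to confirm this for your Prodigy version.
PER_DATASET_QUERY = """
SELECT d.name, COUNT(l.example_id)
FROM dataset AS d
LEFT JOIN link AS l ON l.dataset_id = d.id
GROUP BY d.id, d.name
ORDER BY d.name
"""


def examples_per_dataset(conn: sqlite3.Connection) -> dict[str, int]:
    """Map every dataset name to the number of examples linked to it.

    The LEFT JOIN keeps datasets that have no examples (count 0).
    """
    return {name: count for name, count in conn.execute(PER_DATASET_QUERY)}


def total_examples(conn: sqlite3.Connection) -> int:
    """Count all rows in the example table, whichever dataset they belong to."""
    return conn.execute("SELECT COUNT(*) FROM example").fetchone()[0]
```

Calling examples_per_dataset(sqlite3.connect("prodigy.db")) on the file you queried by hand, and comparing the result with prodigy stats -l, shows quickly whether both are looking at the same data.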

When I run

prodigy db-out ner_method_tutku output

it says "Can't find 'ner_method_tutku' in database SQLite".

Hi @senol!

Hmm, does the dataset ner_method_tutku exist? Perhaps you meant ner_method-tutku (note the dash vs. underscore, based on your screenshot). You can check which datasets exist by running:

prodigy stats -l
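To catch dash vs. underscore typos like this before running db-out, a small helper could compare the requested name against the dataset names listed by prodigy stats -l. This is a hypothetical sketch using Python's standard difflib, not part of Prodigy itself; the function name is made up for illustration.

```python
from __future__ import annotations

import difflib


def suggest_dataset_names(requested: str, available: list[str], n: int = 3) -> list[str]:
    """Return the existing dataset names most similar to `requested`.

    An exact match is returned on its own. Otherwise difflib ranks the
    candidates by similarity ratio, keeping at most `n` of them above
    get_close_matches' default cutoff of 0.6.
    """
    if requested in available:
        return [requested]
    return difflib.get_close_matches(requested, available, n=n)
```

With the datasets from this thread, suggest_dataset_names("ner_method_tutku", ["ner_method", "ner_method_reviewed", "ner_method-tutku"]) ranks "ner_method-tutku" first, because it differs only in the separator character.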

Whenever you run prodigy db-out, it exports only the annotations that were saved to that dataset. To sanity-check: are you saving your annotations? Perhaps the other texts were saved to the named-session datasets (i.e., ner_method-XXXX)?
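To see what a db-out export actually contains, the JSONL file can be summarised per answer and per session. This is a minimal sketch assuming each line is one task that may carry an "answer" field ("accept", "reject" or "ignore") and, for named multi-user sessions, a "_session_id" field such as "ner_method-tutku"; both field names are assumptions about the export format, and summarize_export is a hypothetical helper.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path


def summarize_export(path: str | Path) -> dict[str, Counter]:
    """Count answers and session IDs in a JSONL file written by db-out.

    ASSUMPTION: tasks may contain "answer" and "_session_id" keys; tasks
    missing a key are counted under "<none>".
    """
    answers: Counter = Counter()
    sessions: Counter = Counter()
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            task = json.loads(line)
            answers[task.get("answer", "<none>")] += 1
            sessions[task.get("_session_id", "<none>")] += 1
    return {"answers": answers, "sessions": sessions}
```

If the session counter shows only a handful of tasks, the export came from a dataset (or a database) that simply does not hold the other annotations.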

Thanks for the quick reply.

Below are the datasets: ner_method has 4 annotations and ner_method_reviewed has 0 annotations.

All the other annotations (about 60K) are in the example table of the Prodigy database, as you can see in the screenshot in my first message. But I don't know how to access them to train a model.

I also tried "ner_method-tutku", and it still throws the "Can't find 'ner_method-tutku' in database SQLite" error. It isn't in the list of datasets either.

Are you sure that the .db file you're looking at and querying is the same database Prodigy is accessing (in the same location as shown when you run prodigy stats)? Maybe you ended up with two SQLite databases on the cloud machine?

It looks like the ner_method dataset, as well as named-session datasets like ner_method-tutku, are definitely present in the database you queried, so Prodigy should be able to find them if it's accessing the same DB.
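One way to check for a second database is to search the whole home directory, including hidden folders, which a file browser or a plain ls may not show. This is a minimal sketch using os.walk from the standard library; the helper name is hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_db_files(root: str | Path, suffix: str = ".db") -> list[Path]:
    """Recursively collect every file under `root` ending in `suffix`.

    os.walk descends into hidden directories such as ~/.prodigy too,
    so a database tucked away there is listed alongside visible ones.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                found.append(Path(dirpath) / name)
    return sorted(found)
```

Running find_db_files(Path.home()) on the VM and getting more than one prodigy.db back would confirm the "two databases" theory.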

Here is my folder, and there is only one .db file. I can manually query the DB and find the annotations in the example table, but Prodigy can't find them.

Did you customise the path to the Prodigy database in your prodigy.json? Your screenshot from prodigy stats shows the Prodigy home directory as /home/datascience_in_tourism/.prodigy (note the . here). So this is where Prodigy would look for prodigy.db by default. Maybe you ended up with two databases by accident?
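The lookup described here can be sketched as a small function that reports where Prodigy would expect its SQLite file. It is a simplified sketch under stated assumptions: the home directory is $PRODIGY_HOME if set, else ~/.prodigy; a prodigy.json in that home may set db_settings.sqlite.path and db_settings.sqlite.name; the file name defaults to prodigy.db. It ignores a local prodigy.json in the working directory and non-SQLite backends, and the function name is hypothetical.

```python
from __future__ import annotations

import json
import os
from pathlib import Path


def default_prodigy_db_path(env: dict[str, str] | None = None,
                            config: dict | None = None) -> Path:
    """Return the SQLite path Prodigy would likely use (simplified sketch).

    ASSUMPTIONS: home is $PRODIGY_HOME or ~/.prodigy; settings come from
    <home>/prodigy.json unless `config` is passed in; db_settings.sqlite
    may override the directory ("path") and file name ("name").
    """
    env = os.environ if env is None else env
    home = Path(env.get("PRODIGY_HOME", Path.home() / ".prodigy")).expanduser()
    if config is None:
        cfg_file = home / "prodigy.json"
        config = json.loads(cfg_file.read_text(encoding="utf8")) if cfg_file.exists() else {}
    sqlite_cfg = config.get("db_settings", {}).get("sqlite", {})
    directory = Path(sqlite_cfg.get("path", home)).expanduser()
    return directory / sqlite_cfg.get("name", "prodigy.db")
```

Printing default_prodigy_db_path() and comparing it with the .db file you query by hand makes a mismatch like the one in this thread easy to spot.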

You're right! There is a ".prodigy" directory, and I don't know how it got there. I replaced the .db file and now Prodigy sees the dataset. I really appreciate your help!
