Number of tasks doesn't match number of items in input file

Hello, we've been running ner.manual off a JSONL file with 1046 unique entries, but Prodigy now says it has no more tasks with only 711 in the database. What might be going on? Is there some reason Prodigy might have skipped certain lines?

Hi! When you restart the server with the same data and dataset, do you get new examples? Or do you see "no tasks available"?

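If you do see new tasks, one explanation could be that the app was refreshed in the browser in between. A refresh requests a fresh batch, so tasks from the batch that was open at the time can go unanswered until the server is restarted.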

If you don't see new tasks, it means that Prodigy thinks that all unique tasks are already in the database. Or, phrased differently: the task hashes created for the incoming examples are all already in the dataset.
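To rule out the input file itself, one rough check is to compare the distinct entries in the JSONL file with an export of the dataset (for example from `prodigy db-out`). The sketch below uses only the standard library and a stand-in hash over the "text" field. It is not Prodigy's own task hashing, and the function names are hypothetical.

```python
import hashlib
import json


def text_hash(example):
    """Stand-in hash over the "text" field (NOT Prodigy's real task hash)."""
    return hashlib.sha1(example["text"].encode("utf-8")).hexdigest()


def missing_examples(input_lines, annotated_lines):
    """Return unique input examples whose hash is absent from the annotated set."""
    annotated = {
        text_hash(json.loads(line)) for line in annotated_lines if line.strip()
    }
    seen = set()
    missing = []
    for line in input_lines:
        if not line.strip():
            continue  # skip blank lines
        example = json.loads(line)
        h = text_hash(example)
        if h in seen:
            continue  # duplicate within the input file
        seen.add(h)
        if h not in annotated:
            missing.append(example)
    return missing
```

Both arguments can be open file handles, since iterating a file yields its lines. If the returned list is non-empty, those entries were never saved, which points to skipped batches rather than duplicates in the input.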

Ah, okay. We restarted and there are new tasks, so it must have been reloads that caused it. Thank you!


Follow-up question: when I restart Prodigy, it picks back up where it left off, but when a coworker uses the exact same command to launch it, it gives us "Total: 0". What's the reason for that?

What do you mean by "Total: 0"? The total count of annotations in the database? Maybe double-check that you're both connecting to the same database (you can set PRODIGY_LOGGING=basic to see more debugging info). The default database is a local SQLite database on disk, so your coworker might be connecting to a new database on their local machine instead of whichever database you're storing your annotations in.
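One quick way to compare setups is for each user to print the directory their session would resolve to. This is a minimal sketch, assuming the home directory is taken from PRODIGY_HOME when set and otherwise defaults to `.prodigy` in the user's home folder, with the SQLite file named `prodigy.db`. The helper name is hypothetical and not part of Prodigy.

```python
import os
from pathlib import Path


def resolve_prodigy_home(environ=None, user_home=None):
    """Return the assumed Prodigy home: PRODIGY_HOME if set, else ~/.prodigy."""
    environ = os.environ if environ is None else environ
    user_home = Path.home() if user_home is None else Path(user_home)
    custom = environ.get("PRODIGY_HOME")
    return Path(custom) if custom else user_home / ".prodigy"


if __name__ == "__main__":
    home = resolve_prodigy_home()
    print("Prodigy home:", home)
    # Assumed default file name for the SQLite database.
    print("Default SQLite file:", home / "prodigy.db")
```

If two users see different paths here, they are most likely reading and writing separate database files.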

That's right, the progress section in the web UI shows a total of 0 annotations in the database when my coworker launches Prodigy.

When coworker launches Prodigy:

11:43:10 - DB: Initialising database SQLite
11:43:10 - DB: Connecting to database SQLite
11:43:10 - DB: Loading dataset 'CJMinorNamesOct2019' (0 examples)

When I launch Prodigy:

11:41:09 - DB: Initialising database SQLite
11:41:09 - DB: Connecting to database SQLite
11:41:09 - DB: Loading dataset 'CJMinorNamesOct2019' (942 examples)

We are using the default database and we are launching Prodigy from the same machine. We put Prodigy in a shared location on a VM which we're RDPing into.

I figured out what was happening -- Prodigy created a separate database file in both of our personal home folders. I was able to fix that by setting PRODIGY_HOME and adding an explicit path to a database file in a shared location.
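For anyone hitting the same issue, a fix along these lines can be sketched as a small script that writes a `prodigy.json` into a shared directory. The `db` and `db_settings` keys follow Prodigy's SQLite configuration as I understand it, so check them against the docs for your version. The function name and any paths passed in are placeholders.

```python
import json
from pathlib import Path


def write_shared_config(prodigy_home, db_dir, db_name="prodigy.db"):
    """Write a prodigy.json into `prodigy_home` that points SQLite at `db_dir`."""
    config = {
        "db": "sqlite",
        "db_settings": {"sqlite": {"name": db_name, "path": str(db_dir)}},
    }
    home = Path(prodigy_home)
    home.mkdir(parents=True, exist_ok=True)  # create the shared folder if needed
    config_path = home / "prodigy.json"
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config_path
```

Each user would then set PRODIGY_HOME to that shared directory before launching, so every session reads the same config and the same database file.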