How to use the review recipe on two datasets from two different Jupyter notebooks

Hello everyone,

My colleague and I are running Prodigy in two different notebooks on Kubeflow and annotating a dataset in parallel. We now want to use the Prodigy review recipe to compare our work. How can we export a dataset (not the jsonl) from one notebook and import it into the other? Is that even possible, or do we have to use a single notebook, given that both databases are stored in SQLite?

Thanks to all,
Christian

hi @cw90!

Thanks for your question and welcome to the Prodigy community :wave:

Prodigy doesn't have a concept of a notebook, and notebooks aren't required for Prodigy. Given the title of your post, I suspect you mean a Jupyter notebook. We have a JupyterLab extension that allows you to run Prodigy and annotate in Jupyter, but this is optional and may be obscuring the real problem.

What is the Prodigy command that you and your colleagues are running?

I don't understand this question. It's not really a notebook that's different.

Perhaps you have different SQLite databases? I'm not familiar with Kubeflow. Are your commands running on different machines? (Please correct me if you are running on the same machine/server.)

One thing you may want to do is run prodigy stats and find the Prodigy Home location. That folder contains prodigy.db, which is your actual SQLite database. The output of prodigy stats also shows the number of datasets. If your number differs from your colleague's, that would suggest you have different SQLite databases.
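If comparing counts by eye isn't enough, here is a minimal Python sketch that lists the dataset names in two database files and reports which ones appear in only one of them. It is only an illustration: the table name "dataset" and the column name "name" are assumptions about how prodigy.db is laid out, not documented API, and the function names are made up for this example. Work on copies of the files if you are unsure.

```python
import sqlite3
from pathlib import Path


def list_dataset_names(db_path, table="dataset", column="name"):
    """Return the sorted names stored in one SQLite file.

    ASSUMPTION: the default table and column names are guesses about
    the prodigy.db schema; change them if your file looks different.
    """
    db_file = Path(db_path)
    if not db_file.is_file():
        raise FileNotFoundError(f"No database file at {db_file}")
    conn = sqlite3.connect(db_file)
    try:
        rows = conn.execute(f'SELECT "{column}" FROM "{table}"').fetchall()
    finally:
        conn.close()
    return sorted(row[0] for row in rows)


def compare_databases(path_a, path_b):
    """Report which names appear in only one of the two database files."""
    names_a = set(list_dataset_names(path_a))
    names_b = set(list_dataset_names(path_b))
    return {
        # True means both paths resolve to the very same file on disk.
        "same_file": Path(path_a).resolve() == Path(path_b).resolve(),
        "only_in_a": sorted(names_a - names_b),
        "only_in_b": sorted(names_b - names_a),
        "shared": sorted(names_a & names_b),
    }
```

Calling compare_databases with the two prodigy.db paths (taken from each person's Prodigy Home) should make it obvious whether you are both writing to the same database or to two separate ones.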

If you can point to his database (or he can point to yours), you can change the path of your database in your prodigy.json configuration settings:

{
  "db": "sqlite",
  "db_settings": {
    "sqlite": {
      "name": "prodigy.db",
      "path": "/custom/path"
    }
  }
}

I'm a bit confused by your use of the word "databases". Did you mean datasets?

Using Prodigy via the Jupyter extension works best for data scientists running Prodigy alone (no other annotators, developers, etc.). Is there a reason you need to run Prodigy via a notebook rather than through a terminal/CLI?

Hi @ryanwesslen,

Thanks for the reply, I appreciate having such a supportive community.

Thanks for pointing us to prodigy stats. It seems we've found a way to move a dataset from one Jupyter notebook to the other using the built-in db-out and db-in commands. We exported the first dataset to a JSONL file with db-out, then imported that file into the other notebook with db-in, creating a new dataset for it. That should let us use Prodigy's review recipe to compare our annotations.
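Before running review on the imported data, it can help to confirm that the two exports actually cover the same inputs. The sketch below is a rough check, not anything built into Prodigy: it assumes each exported record carries the _input_hash field that Prodigy adds to examples, and the function names and file names are hypothetical.

```python
import json


def load_jsonl(lines):
    """Parse an iterable of JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def overlap_by_input_hash(examples_a, examples_b, key="_input_hash"):
    """Count inputs annotated by both people, or by only one of them.

    ASSUMPTION: every record identifies its input via `key`; records
    without that field are ignored.
    """
    hashes_a = {eg[key] for eg in examples_a if key in eg}
    hashes_b = {eg[key] for eg in examples_b if key in eg}
    return {
        "shared": len(hashes_a & hashes_b),
        "only_a": len(hashes_a - hashes_b),
        "only_b": len(hashes_b - hashes_a),
    }


# Example usage with hypothetical file names for the two db-out exports:
# with open("annotator_a.jsonl", encoding="utf8") as fa, \
#         open("annotator_b.jsonl", encoding="utf8") as fb:
#     print(overlap_by_input_hash(load_jsonl(fa), load_jsonl(fb)))
```

If "shared" comes back as zero, the two datasets probably were not created from the same source texts, and comparing them in review would not show any overlapping annotations.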

However, we're now getting the following error message:

This has already been reported by a user in "Comparing new ner.manual dataset to a revised database" (Prodigy Support), so we'll take a look there to try to solve the problem. But if you have other ideas, we'd appreciate your support and knowledge sharing :slight_smile:

Thanks again for the support!
Christian