Issues Setting Custom Database

I have been struggling to get Prodigy to use my custom database. I have followed the examples available in support, the Mondigy code, and the documentation, all to no avail.

I wrote a custom recipe which I debugged using the default SQLite database adapter. All worked well. I would like to store my results in MongoDB, so I began working on a custom adapter. In my custom recipe I tried returning the following dictionary
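
A minimal sketch of what such a return value could look like, for readers following along. It assumes the recipe component keys `dataset`, `view_id`, `stream`, and `db` as I understand them from the recipe documentation. `MongoDatabaseStub` and `build_components` are hypothetical names, and the stub keeps data in memory instead of talking to MongoDB.

```python
from typing import Any, Dict, Iterable, List


class MongoDatabaseStub:
    """Hypothetical stand-in for a custom MongoDB adapter (not a Prodigy class)."""

    def __init__(self) -> None:
        # In-memory storage so the sketch runs without a MongoDB server.
        self.datasets: Dict[str, List[dict]] = {}

    def add_dataset(self, name: str) -> None:
        # Create the dataset only if it does not exist yet.
        self.datasets.setdefault(name, [])


def build_components(dataset: str, stream: Iterable[dict], db: Any) -> Dict[str, Any]:
    """Assemble the dictionary a custom recipe function would return."""
    return {
        "dataset": dataset,
        "view_id": "text",
        "stream": stream,
        "db": db,  # custom database object instead of the default SQLite one
    }


components = build_components("debug_dataset", [{"text": "hello"}], MongoDatabaseStub())
```

The point of the sketch is only that the database object travels with the other recipe components. A real adapter would need to implement whatever methods the installed Prodigy version expects.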


Initially, everything seems to work fine (i.e., new datasets and sessions are created in MongoDB). However, when I try to save annotations via the Prodigy UI (using the default save button), the annotations are saved to SQLite. A little hacking revealed that sometime after the controller is instantiated, Controller.set_db is called with the default SQLite database.

To address this issue, I subclassed Controller to prevent modification of the Database after construction
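
The pattern can be sketched as follows. This is an illustration only: `BaseController` is a hypothetical stand-in that models nothing but a `db` attribute and a `set_db` method, and `PinnedDbController` is a made-up name. The real Controller has a different constructor and many more responsibilities.

```python
class BaseController:
    """Hypothetical stand-in for the real controller; only models the `db` attribute."""

    def __init__(self, db):
        self.db = db

    def set_db(self, db):
        # Default behavior: replace the current database.
        self.db = db


class PinnedDbController(BaseController):
    """Keeps the database passed at construction and ignores later replacements."""

    def set_db(self, db):
        # Deliberately a no-op so a later call cannot swap out the custom database.
        return None


baseline = BaseController(db="mongo")
baseline.set_db("sqlite")  # the base class lets the database be replaced

controller = PinnedDbController(db="mongo")
controller.set_db("sqlite")  # simulates the later call made after construction
```

After both calls, `baseline.db` has been replaced while `controller.db` still holds the database it was constructed with. Silently ignoring the call is a workaround and can hide legitimate reconfiguration.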


My custom recipe returns an instance of the subclassed Controller

Note that while the return statement is not shown, I do return the controller.

This is very hacky, but I can now store annotations in my MongoDB collection
[Image: screenshot of the three MongoDB documents written after a single annotation save]
Seen above are the three MongoDB documents written as the result of a single annotation save. The first entry is the debug_dataset. The second is a default session that I don't know how to disable. The final item is the session for one of my predefined users (i.e., user '1'). Note that the annotations are correctly saved in both the debug_dataset and the session for user '1'.

The issue at this point is that the session names created for my users at connect time do not match the sessions prepopulated from the PRODIGY_ALLOWED_SESSIONS environment variable. The session names should be <dataset_name>-<session_name>. As seen in the image above, the session names created at connect time are no_dataset-<session_name>. Note that if I call Controller.get_session_name('1'), I get the desired session name, debug_dataset-1. See the following debug output for evidence


The first line is printed in Controller.confirm_session
The second line is printed from my session factory. The list is Controller.session_ids and no_dataset-1 is the session_id being created.
The third line is printed from Controller.add_session
The fourth line is printed from my task router. The list is Controller.session_ids and no_dataset-1 is the session_id we're routing for.

TL;DR
I believe there is a bug in Controller preventing the use of custom databases and possibly a bug creating session names using a custom database.

Any support would be greatly appreciated.

Prodigy Stats
Version: 1.13.3
Platform: macOS-13.5.2-arm64-arm-64bit
Python Version: 3.10.11
Spacy Version: 3.6.1
Database Name: SQLite
Database Id: sqlite
Total Datasets: 1
Total Sessions: 301

After a lot of debugging, I discovered that dataset needs to be specified in the Controller config, like so
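
To illustrate why a missing dataset entry would produce the names seen earlier, here is a small sketch. It is not Prodigy's implementation: `make_session_id` is a hypothetical helper, and the `"no_dataset"` fallback is an assumption chosen only because it reproduces the observed no_dataset-<session_name> names.

```python
from typing import Dict


def make_session_id(config: Dict[str, str], session: str) -> str:
    """Hypothetical helper: prefix the session name with the configured dataset."""
    # Assumed fallback; it mirrors the "no_dataset" prefix observed in the session names.
    dataset = config.get("dataset", "no_dataset")
    return f"{dataset}-{session}"


# Without a dataset in the config, the prefix falls back to the placeholder.
without_dataset = make_session_id({}, "1")

# With the dataset in the config, the name follows <dataset_name>-<session_name>.
with_dataset = make_session_id({"dataset": "debug_dataset"}, "1")
```

Under these assumptions, `without_dataset` is `no_dataset-1` and `with_dataset` is `debug_dataset-1`, which matches the mismatch described above and the fix of adding dataset to the config.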

Additionally, it seems that Prodigy sets the database according to the config at least once during startup. This explains why my custom DB was getting replaced, but I don't think this behavior is desirable.

It took me days to resolve these issues. Updated documentation and a clear tutorial on how to add your own custom database would have saved me a lot of time.

I fear that mondigy hasn't been maintained in two years and that it's no longer compatible with the Prodigy codebase. Many methods have been added to the database class in version 1.12 to accommodate task routers, which is one of the reasons why it won't work for recent versions.

I'm curious, do you have anything specific configured in your prodigy.json file with regards to databases?

I understand the frustration. In the longer term we are planning to replace the current ORM with SQLAlchemy, and we can't consider integrations or documentation improvements on this topic until this refactoring has taken place. That said, there are currently no plans to support MongoDB directly.