I have been struggling to get Prodigy to use my custom database. I have followed the examples available in support, the Mondigy code, and the documentation, all to no avail.
I wrote a custom recipe, which I debugged using the default SQLite database adapter. All worked well. I would like to store my results in MongoDB, so I began working on a custom adapter. In my custom recipe, I tried returning the following dictionary:
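For readers following along, the general shape of such a return value can be sketched as below. This is an illustration under stated assumptions, not my exact code: `InMemoryDatabase` is a hypothetical stand-in for the MongoDB adapter, its method names only approximate the custom database interface, and the `@prodigy.recipe` decorator is left out so the snippet runs without Prodigy installed. The `db` key is the recipe component used to hand Prodigy a custom database object.

```python
from typing import Any, Dict, Iterable, List


class InMemoryDatabase:
    """Hypothetical stand-in for a custom (e.g. MongoDB-backed) adapter.

    The method names approximate a custom database interface; they are
    assumptions for illustration, not a verified copy of Prodigy's API.
    """

    def __init__(self) -> None:
        # Maps a dataset or session name to its stored examples.
        self._datasets: Dict[str, List[dict]] = {}

    def __contains__(self, name: str) -> bool:
        return name in self._datasets

    def add_dataset(self, name: str, meta: Any = None, session: bool = False) -> List[dict]:
        # Create the dataset if it does not exist yet and return its examples.
        return self._datasets.setdefault(name, [])

    def add_examples(self, examples: Iterable[dict], datasets: Iterable[str] = ()) -> None:
        # Store the same examples under every named dataset/session.
        examples = list(examples)
        for name in datasets:
            self._datasets.setdefault(name, []).extend(examples)

    def get_dataset(self, name: str, default: Any = None) -> Any:
        return self._datasets.get(name, default)


def custom_recipe(dataset: str, stream: Iterable[dict]) -> Dict[str, Any]:
    """Return recipe components, passing the custom database via "db"."""
    return {
        "dataset": dataset,
        "view_id": "text",
        "stream": stream,
        "db": InMemoryDatabase(),  # custom database object instead of the default
    }
```

With a real adapter, the object passed under `db` would wrap a MongoDB client instead of a dictionary.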
Initially, everything seems to work fine (i.e., new datasets and sessions are created in MongoDB). However, when I try to save annotations via the Prodigy UI (using the default save button), the annotations are saved to SQLite. A little hacking revealed that, sometime after the controller is instantiated, Controller.set_db is called with the default SQLite database.
To address this issue, I subclassed Controller to prevent modification of the database after construction.
My custom recipe returns an instance of the subclassed Controller.
Note that while the return statement is not shown, I do return the controller.
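The idea behind the workaround can be sketched as follows. This is a minimal illustration, not the actual subclass: `BaseController` is a hypothetical stand-in for Prodigy's Controller, reduced to a `db` attribute and a `set_db` method, and the real constructor signature is not reproduced here.

```python
class BaseController:
    """Hypothetical stand-in for Prodigy's Controller (assumption).

    Only the parts relevant to the workaround are modelled: a `db`
    attribute and a `set_db` method that replaces it.
    """

    def __init__(self, db):
        self.db = None
        self.set_db(db)

    def set_db(self, db):
        self.db = db


class PinnedDbController(BaseController):
    """Keeps the first database it is given and ignores later replacements."""

    def set_db(self, db):
        # Once a database has been set, silently ignore further calls,
        # such as a later call that passes the default SQLite database.
        if getattr(self, "db", None) is not None:
            return
        super().set_db(db)


controller = PinnedDbController(db="custom-mongo-adapter")
controller.set_db("default-sqlite")  # ignored by the subclass
print(controller.db)
```

Silently ignoring the call hides whatever is triggering it, so logging inside the overridden `set_db` may help locate the caller.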
This is very hacky, but I can now store annotations in my MongoDB collection.
Seen above are the three MongoDB documents written as the result of a single annotation save. The first entry is the debug_dataset. The second is a default session that I don't know how to disable. The final item is the session for one of my predefined users (i.e., user '1'). Note that the annotations are correctly saved in both the debug_dataset and the session for user '1'.
The issue at this point is that the session names created for my users at connect time do not match the sessions prepopulated when specifying the PRODIGY_ALLOWED_SESSIONS environment variable. The session names should be <dataset_name>-<session_name>. As seen in the image above, the session names created at connect time are no_dataset-<session_name>. Note that if I call Controller.get_session_name('1'), I get the desired session name, debug_dataset-1. See the following debug output for evidence.
The first line is printed in Controller.confirm_session.
The second line is printed from my session factory. The list is Controller.session_ids, and no_dataset-1 is the session_id being created.
The third line is printed from Controller.add_session.
The fourth line is printed from my task router. The list is Controller.session_ids, and no_dataset-1 is the session_id we're routing for.
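The mismatch described above can be checked mechanically. The sketch below is a hypothetical helper, not part of Prodigy: it assumes PRODIGY_ALLOWED_SESSIONS holds a comma-separated list of session names and that the expected full name is <dataset_name>-<session_name>, as described above.

```python
import os
from typing import Iterable, List, Mapping, Optional


def allowed_session_names(dataset: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the expected '<dataset_name>-<session_name>' ids.

    Assumes PRODIGY_ALLOWED_SESSIONS is a comma-separated list of names.
    """
    env = os.environ if env is None else env
    raw = env.get("PRODIGY_ALLOWED_SESSIONS", "")
    return [f"{dataset}-{name.strip()}" for name in raw.split(",") if name.strip()]


def unexpected_sessions(
    dataset: str, created: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the created session ids that are not in the expected set."""
    expected = set(allowed_session_names(dataset, env))
    return [session_id for session_id in created if session_id not in expected]


# Example with the names from this post: 'no_dataset-1' is flagged.
env = {"PRODIGY_ALLOWED_SESSIONS": "1"}
print(unexpected_sessions("debug_dataset", ["no_dataset-1"], env))
```

Running a check like this against Controller.session_ids inside the session factory would show whether a session id was built before the dataset name was known.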
TL;DR
I believe there is a bug in Controller preventing the use of custom databases and possibly a bug creating session names using a custom database.
Any support would be greatly appreciated.
Prodigy Stats
Version: 1.13.3
Platform: macOS-13.5.2-arm64-arm-64bit
Python Version: 3.10.11
Spacy Version: 3.6.1
Database Name: SQLite
Database Id: sqlite
Total Datasets: 1
Total Sessions: 301