Hacking a multi-tenant Prodigy: problem with identical session IDs


I am trying to serve multiple users with Prodigy so they can annotate subtitles. To this end, I have a single Postgres database that all the running Prodigy instances point to.

When I start them up I get the following error:

peewee.IntegrityError: duplicate key value violates unique constraint “dataset_name”
DETAIL: Key (name)=(2017-11-21_17-16-29) already exists.

It seems that whenever Prodigy starts, it tries to insert a row into the dataset table whose name is a timestamp unique only to the second, so starting a lot of instances at once against a single database causes collisions.

I just wanted to check that this is a current constraint of Prodigy, and that we should store things in separate databases for the moment.


Thanks for the report! This is an interesting constraint we hadn’t considered: When you start a new Prodigy annotation session, a semi-“unique” session ID is generated from the current timestamp. This allows the user and database to distinguish between individual sessions that add to the same dataset.
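
To make the collision concrete, here is a minimal sketch of a timestamp-based session ID. It assumes the ID uses the same `YYYY-MM-DD_HH-MM-SS` layout as the key in the error message above (2017-11-21_17-16-29); the function name `timestamp_session_id` is hypothetical and this is not Prodigy's actual implementation.

```python
from datetime import datetime
from typing import Optional


def timestamp_session_id(now: Optional[datetime] = None) -> str:
    """Build a session ID from the current time, to one-second resolution.

    Hypothetical sketch: assumes the same layout as the key in the error
    message (e.g. 2017-11-21_17-16-29), not Prodigy's actual code.
    """
    now = now or datetime.now()
    return now.strftime("%Y-%m-%d_%H-%M-%S")


# Two instances started within the same second (only the microseconds differ)
# get the same ID, so the second insert hits the unique constraint.
first = timestamp_session_id(datetime(2017, 11, 21, 17, 16, 29, 100))
second = timestamp_session_id(datetime(2017, 11, 21, 17, 16, 29, 900000))
print(first, second, first == second)
```

Because everything below one second is dropped by the format string, any two sessions started in the same second map to the same dataset name.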

But it also means that if you start two sessions within the same second, their session IDs will be identical, which causes the error. In theory, this is desired behaviour, but we should probably consider adding an option to let the user plug in their own get_session_id() function or simply supply their own custom session ID.
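
As a sketch of what a user-supplied ID could look like (the function `unique_session_id` and its naming scheme are hypothetical, not an existing Prodigy option), one could keep the timestamp for readability and append the instance name plus a short random suffix:

```python
import re
import uuid
from datetime import datetime
from typing import Optional


def unique_session_id(instance_name: str, now: Optional[datetime] = None) -> str:
    """Hypothetical custom session ID: timestamp + instance name + random suffix.

    - the timestamp keeps IDs sortable and human-readable
    - the instance name shows which annotator/instance created the session
    - the uuid4 suffix makes a same-second collision practically impossible
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    # Restrict the name to letters, digits, "-" and "_" so it is safe as a key.
    safe_name = re.sub(r"[^A-Za-z0-9_-]", "-", instance_name)
    return f"{stamp}-{safe_name}-{uuid.uuid4().hex[:8]}"


# Two instances started in the very same second still get distinct IDs.
t = datetime(2017, 11, 21, 17, 16, 29)
print(unique_session_id("alice", t))
print(unique_session_id("bob", t))
```

The random suffix alone would already prevent the collision; the instance name is there only to make the dataset table easier to read when many annotators share one database.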

For now, the simplest workaround would probably be to make sure that you’re always waiting at least one second between starting sessions.
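
If you launch the instances from a script, a minimal sketch of that workaround is to stagger the launches with `time.sleep`. The `start_staggered` helper is hypothetical, and the demo commands are stand-ins; substitute the real command line for each of your Prodigy instances.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def start_staggered(commands: Sequence[Sequence[str]], delay: float = 1.0) -> List[subprocess.Popen]:
    """Start each command as a background process, waiting `delay` seconds
    between launches so no two launches fall in the same wall-clock second."""
    processes = []
    for i, cmd in enumerate(commands):
        if i > 0:
            time.sleep(delay)  # sleeps at least `delay` seconds
        processes.append(subprocess.Popen(list(cmd)))
    return processes


if __name__ == "__main__":
    # Stand-in commands for illustration only; replace with your real instances.
    demo = [[sys.executable, "-c", "print('instance started')"] for _ in range(3)]
    procs = start_staggered(demo, delay=1.0)
    for p in procs:
        p.wait()
```

This only spaces out the launches. The session ID is created inside each process once it has started, so if some instances take noticeably longer to boot than others, a delay larger than one second is the safer choice.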

Btw, if you haven’t seen it already, check out this comment by Matt on strategies for managing multiple annotators with Prodigy.

Makes sense. Thanks!
