"Structured" use of SQL backends?

phdowling · April 30, 2019, 3:05pm

Hi guys. I’ve noticed that Prodigy doesn’t really use its SQL-based backends in a “structured” way, but rather encodes all of its data as blobs, which can then only be accessed through the library. I am not sure if this was a deliberate design decision, but I would say it’s definitely a pretty limiting one.

The main drawback is that maintaining data becomes really hard, since it can only be done programmatically via the library. I can’t simply join labels created in Prodigy to other data in my DB, let alone view labels that users have created and allow them to be changed manually through some other interface (like a Postgres browser UI).

I would sincerely request that you consider changing this in future versions, since it would greatly improve the usability of Prodigy and make data management a lot easier.

(If there is some obvious workaround that I am missing of course, please let me know and disregard what I wrote here.)

ines · April 30, 2019, 4:16pm

This is a good question! Ultimately, Prodigy is pretty agnostic to what you pass around in the task dictionary. There are some conventions, like the key "label" or "spans" in some built-in recipes and interfaces, but most of it is up to you. In the default database model we ship with Prodigy, we tried to make fewer assumptions about what the example data means and how to translate it to a database schema. When we built Prodigy, we didn’t want to make any decisions here that’d be very difficult to reverse and potentially lock users in and cause migration issues down the line. Instead, we focused on making the Database handler fully customisable.
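To make the blob-versus-columns trade-off concrete, here is a minimal sketch of unpacking one JSON-encoded task dictionary into flat rows that could be loaded into a regular table. The task shape (keys "text", "answer", "label" and "spans") follows the conventions mentioned above but is an assumption about your data, and `flatten_task` is a hypothetical helper, not part of Prodigy.

```python
import json


def flatten_task(raw):
    """Turn one JSON-encoded task into a list of flat row dicts.

    Assumes the task may carry a top-level "label" and/or a list of
    "spans", each span having "start", "end" and "label" keys.
    """
    task = json.loads(raw)
    base = {"text": task.get("text"), "answer": task.get("answer")}
    rows = []
    # A top-level label becomes one row without character offsets.
    if "label" in task:
        rows.append({**base, "label": task["label"], "start": None, "end": None})
    # Each span becomes its own row with offsets into the text.
    for span in task.get("spans", []):
        rows.append({
            **base,
            "label": span.get("label"),
            "start": span.get("start"),
            "end": span.get("end"),
        })
    return rows


# Hypothetical example task, encoded the way a blob column might hold it.
example_blob = json.dumps({
    "text": "Apple hires in Berlin",
    "answer": "accept",
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"},
        {"start": 15, "end": 21, "label": "GPE"},
    ],
})
example_rows = flatten_task(example_blob)
```

Rows in this shape can be inserted into an ordinary table and joined against other data. The catch is that any keys the helper does not know about are dropped, which is the kind of schema assumption the default model avoids.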

If you do have more specific requirements (or opinions on how you want your database to be structured), Prodigy lets you pass in your very own Database class that can take full control over how to store and retrieve the data. It just needs to expose the methods that Prodigy expects. (Also see this thread for more details.)
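As a rough illustration of that idea, the sketch below stores each annotation in queryable columns while also keeping the full task as JSON. It uses only Python's built-in `sqlite3` module. The class name `StructuredStore`, the table layout and the method names `add_examples` and `get_dataset` are assumptions made for this example, so check the Prodigy documentation for the exact methods a custom Database class has to expose.

```python
import json
import sqlite3


class StructuredStore:
    """Hypothetical storage class: structured columns plus the raw task."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS annotation ("
            "id INTEGER PRIMARY KEY, "
            "dataset TEXT NOT NULL, "
            "text TEXT, "
            "label TEXT, "
            "answer TEXT, "
            "content TEXT NOT NULL)"
        )

    def add_examples(self, examples, dataset):
        # Copy a few well-known keys into columns; keep everything as JSON.
        rows = [
            (dataset, eg.get("text"), eg.get("label"), eg.get("answer"), json.dumps(eg))
            for eg in examples
        ]
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT INTO annotation (dataset, text, label, answer, content) "
                "VALUES (?, ?, ?, ?, ?)",
                rows,
            )

    def get_dataset(self, dataset):
        # Rebuild the original task dictionaries from the stored JSON.
        cur = self.conn.execute(
            "SELECT content FROM annotation WHERE dataset = ? ORDER BY id",
            (dataset,),
        )
        return [json.loads(content) for (content,) in cur]


store = StructuredStore()
store.add_examples(
    [
        {"text": "Great product", "label": "POSITIVE", "answer": "accept"},
        {"text": "Never again", "label": "POSITIVE", "answer": "reject"},
    ],
    dataset="reviews",
)
# The label and answer columns can now be queried or joined with plain SQL.
accepted = store.conn.execute(
    "SELECT text FROM annotation WHERE dataset = 'reviews' AND answer = 'accept'"
).fetchall()
```

Keeping the full JSON next to the extracted columns means the original task can always be reconstructed, while the columns serve joins and manual inspection in a database browser.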
