Using Google Firestore as database or not

Prodigy lets you pass in a custom Database class via the "db" setting returned by a custom recipe, or via an entry point of your own Python package installed in the same environment.

So you can write a class that exposes the same methods and properties as the built-in Database class, but reads from and writes to your remote Firestore database instead. For instance, it'd have a method add_examples that takes a list of examples and a list of one or more dataset names, and adds those examples to each of the given datasets in your custom database. The datasets property returns a list of all dataset names in your custom database, and so on.

It's also possible that some of the methods won't need to do anything in your case. For example, I'm not sure reconnecting is an issue with Firestore, so your reconnect method could just do nothing. Similarly, the link and unlink methods are really only used internally within Prodigy's existing database class, so in your Firestore implementation you can write examples to a table (a collection, in Firestore terms) directly if you want. (Not 100% sure what the best practices are for Firestore/Firebase.)
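Here's a rough sketch of what such a class could look like. Only add_examples, the datasets property and a no-op reconnect come from the description above; the extra method names (add_dataset, get_dataset), the collection layout (a hypothetical "prodigy_datasets" root collection with an "examples" subcollection per dataset) and the FakeClient helper are my own assumptions, so check components/db.py for the exact signatures your Prodigy version expects. The client only needs to behave like google.cloud.firestore.Client (collection(), document(), set(), add(), stream(), to_dict()); the small in-memory fake is there just so you can try the logic without a Firestore project.

```python
import uuid
from typing import Any, Dict, Iterable, List, Optional


class FirestoreDB:
    """Hypothetical Prodigy-style database backed by a Firestore-like client."""

    def __init__(self, client: Any, root: str = "prodigy_datasets") -> None:
        self.client = client
        self.root = root  # assumed top-level collection: one document per dataset

    @property
    def datasets(self) -> List[str]:
        # Every document in the root collection is treated as a dataset
        return [snap.id for snap in self.client.collection(self.root).stream()]

    def __contains__(self, name: str) -> bool:
        return name in self.datasets

    def add_dataset(self, name: str, meta: Optional[Dict[str, Any]] = None) -> None:
        # Hypothetical helper: store dataset metadata on the dataset document
        self.client.collection(self.root).document(name).set({"meta": meta or {}})

    def add_examples(self, examples: Iterable[Dict[str, Any]],
                     datasets: Iterable[str] = ()) -> None:
        examples = list(examples)  # materialize so generators work for every dataset
        for name in datasets:
            if name not in self:
                self.add_dataset(name)
            ref = self.client.collection(self.root).document(name).collection("examples")
            for eg in examples:
                ref.add(dict(eg))  # write straight to the collection, no link table

    def get_dataset(self, name: str) -> List[Dict[str, Any]]:
        # Note: real Firestore auto-IDs don't guarantee insertion order
        ref = self.client.collection(self.root).document(name).collection("examples")
        return [snap.to_dict() for snap in ref.stream()]

    def reconnect(self) -> None:
        # Nothing to do: the Firestore client manages its own connections
        return None


# --- Minimal in-memory stand-in for google.cloud.firestore.Client (testing only) ---
class _FakeSnapshot:
    def __init__(self, doc_id: str, data: Dict[str, Any]) -> None:
        self.id = doc_id
        self._data = data

    def to_dict(self) -> Dict[str, Any]:
        return dict(self._data)


class _FakeDocument:
    def __init__(self, doc_id: str) -> None:
        self.id = doc_id
        self.data: Optional[Dict[str, Any]] = None  # None = reference only, not written
        self.subcollections: Dict[str, "_FakeCollection"] = {}

    def set(self, data: Dict[str, Any]) -> None:
        self.data = dict(data)

    def collection(self, name: str) -> "_FakeCollection":
        return self.subcollections.setdefault(name, _FakeCollection())


class _FakeCollection:
    def __init__(self) -> None:
        self.docs: Dict[str, _FakeDocument] = {}

    def document(self, doc_id: str) -> _FakeDocument:
        return self.docs.setdefault(doc_id, _FakeDocument(doc_id))

    def add(self, data: Dict[str, Any]):
        doc = self.document(uuid.uuid4().hex)  # auto-generated ID, like Firestore
        doc.set(data)
        return None, doc

    def stream(self):
        for doc_id, doc in self.docs.items():
            if doc.data is not None:  # only documents that were actually written
                yield _FakeSnapshot(doc_id, doc.data)


class FakeClient:
    def __init__(self) -> None:
        self._collections: Dict[str, _FakeCollection] = {}

    def collection(self, name: str) -> _FakeCollection:
        return self._collections.setdefault(name, _FakeCollection())
```

Against a real project you'd pass something like `firestore.Client()` (from `google.cloud import firestore`) instead of `FakeClient()`. Storing examples in a subcollection per dataset is just one option; you could also keep a single examples collection and store the dataset names as a field on each example.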

For details on the API, you can check out the Readme or the source of components/db.py in your Prodigy installation.
