Using custom DB (Google Spanner)

GalM · March 24, 2022, 10:28am

Hi,

I'd like to connect Prodigy to read and write the text data from/to Spanner.
Our text samples are stored as plain text in a Spanner table (table = "input_data", column name = "text", type = STRING).
The annotations will be stored in another table (table = "annotations").
I'm trying to implement most of the database functions described in the Prodigy documentation (Database · Prodigy · An annotation tool for AI, Machine Learning & NLP).
I'm not really sure about the structure that is needed. For example, what should the output of get_examples look like?
If someone has already implemented a connection to a custom database that isn't one of those available through peewee, it would be great to learn from it.

Thanks

kirilov · March 24, 2022, 10:36am

Hi @GalM ,

You can check out this package, which implements most of the database functions for MongoDB. I've used it as a starting point for my own custom annotation setup.

GalM · March 24, 2022, 12:53pm

Thanks @kirilov , I'll take a look.

GalM · March 27, 2022, 11:54am

Hi,

I still need your help with this issue.
It's unclear which components are required.
Also, what table structure is needed for Prodigy's internal tables, and which tables should be created?
The package linked in the previous comment is Mongo-specific and uses an old Prodigy version.
Can you please add documentation listing the exact components needed?

Thanks

GalM · March 28, 2022, 12:39pm

If I find a way to connect to Spanner using psql, will this become a simpler task?

ines · March 28, 2022, 3:35pm

You can find the default table structure here: Database · Prodigy · An annotation tool for AI, Machine Learning & NLP. However, if you're using your own database, you can also decide on your own schema. Ultimately, all Prodigy will do is ask for examples or give you examples to store, so if your database can perform these actions, it's up to you how you want to store the data.

If you want to implement a Database class that slots into Prodigy just like the built-in one, you can see the methods that should be implemented in the mondigy implementation here: https://github.com/jdagdelen/mondigy/blob/6c7928121faf1ba7d78fe29367bacd997ecc1f24/mondigy/database.py#L37
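
As a rough illustration of the "ask for examples or give you examples to store" contract, here is a minimal in-memory sketch. The method names (add_dataset, add_examples, get_dataset, get_task_hashes) are modeled on the ones listed in the Prodigy database docs, but the signatures are simplified assumptions and this is not the complete interface. The class name InMemoryDatabase and the dict-based storage are hypothetical stand-ins for real Spanner queries.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional

Task = Dict[str, Any]


class InMemoryDatabase:
    """Hypothetical stand-in for a Spanner-backed class (assumed interface)."""

    def __init__(self) -> None:
        # Dataset name -> metadata, and dataset name -> stored task dicts.
        self._meta: Dict[str, Dict[str, Any]] = {}
        self._examples: Dict[str, List[Task]] = defaultdict(list)

    def __contains__(self, name: str) -> bool:
        return name in self._meta

    @property
    def datasets(self) -> List[str]:
        return sorted(self._meta)

    def add_dataset(self, name: str, meta: Optional[Dict[str, Any]] = None) -> None:
        # Create the dataset record only if it does not exist yet.
        self._meta.setdefault(name, dict(meta or {}))

    def add_examples(self, examples: Iterable[Task], datasets: Iterable[str]) -> None:
        # Store every annotated task under each dataset it belongs to.
        examples = list(examples)
        for name in datasets:
            self.add_dataset(name)
            self._examples[name].extend(examples)

    def get_dataset(self, name: str, default: Any = None) -> Any:
        # Return the stored task dicts, or `default` for an unknown dataset.
        if name not in self._meta:
            return default
        return list(self._examples[name])

    def get_task_hashes(self, *names: str) -> set:
        # Collect the task hashes already stored in the named datasets.
        return {eg["_task_hash"] for n in names for eg in self._examples.get(n, [])}
```

In this sketch, each stored example is a plain dict such as {"text": "...", "_input_hash": 1, "_task_hash": 2, "answer": "accept"}, and the read methods hand those dicts back unchanged. A Spanner version would replace the two dictionaries with reads and writes against the "annotations" table.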

If there's a way to automate this so you can connect to Postgres directly, then yes, you should be able to just use the Postgres integration out-of-the-box.

GalM · March 29, 2022, 6:45am

Thanks a lot!

GalM · March 31, 2022, 9:37am

Hi,

I've implemented the necessary parts (I think).
Now I'm running:
prodigy spans.manual text_annotation blank:en - --label FORM,TAX_FILER --loader spanner_loader
And I get ✘ No loader found for 'spanner_loader'.
spanner_loader file:

I'm running the prodigy command from the same folder where the loader file sits.

Thanks

ines · April 2, 2022, 8:58am

You also need to tell Prodigy where to find your loader by name. One option is to not make it a recipe and register it:

from prodigy.util import registry

@registry.loaders.register("spanner_loader")
def spanner_loader(source):
    ...

The loader will always receive whatever you pass in as the source argument on the CLI – for instance, the mondigy package uses this to provide a configuration file.
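
To make the shape of such a loader concrete, here is a minimal sketch of the function body. fetch_rows is a hypothetical placeholder that returns hard-coded rows; a real version would query the "text" column of the "input_data" table in Spanner, using source for its configuration. The registration decorator is left out so the sketch runs without Prodigy installed.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def fetch_rows(source: str) -> Iterable[Tuple[str]]:
    # Hypothetical placeholder: a real version would read the "text" column
    # of the "input_data" table in Spanner, configured via `source`.
    return [("First text sample",), ("Second text sample",)]


def spanner_loader(source: str) -> Iterator[Dict[str, Any]]:
    # Yield one task dict per row; the "text" key carries the raw text.
    for (text,) in fetch_rows(source):
        yield {"text": text, "meta": {"source": source}}
```

Because the function is a generator, rows are turned into tasks lazily as the stream is consumed.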

(The more advanced solution that mondigy uses is to wrap everything in a package and register the loader as an entry point: mondigy/setup.cfg at 6c7928121faf1ba7d78fe29367bacd997ecc1f24 · jdagdelen/mondigy · GitHub)

Alternatively, you can also make your loader write to standard output, i.e. by calling print(j) instead of yield j. If your loader writes to standard output, you can use it by piping its output forward into the recipe and setting the source to - so it reads from standard input. For example:

prodigy spanner-loader -F spanner_loader.py | prodigy ner.manual dataset blank:en - --label X,Y,Z
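
A minimal sketch of the standard-output variant, assuming the receiving recipe expects one JSON object per line on standard input. The function name to_jsonl_lines and the sample tasks are hypothetical; a real script would read the tasks from Spanner.

```python
import json
import sys
from typing import Any, Dict, Iterable, Iterator


def to_jsonl_lines(tasks: Iterable[Dict[str, Any]]) -> Iterator[str]:
    # Serialize each task dict as a single line of JSON.
    for task in tasks:
        yield json.dumps(task)


if __name__ == "__main__":
    # Placeholder tasks; a real script would fetch them from Spanner.
    sample_tasks = [{"text": "First text sample"}, {"text": "Second text sample"}]
    for line in to_jsonl_lines(sample_tasks):
        sys.stdout.write(line + "\n")
```

Serializing with json.dumps matters here: printing a dict directly would emit Python's repr with single quotes, which is not valid JSON.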

GalM · April 3, 2022, 6:54am

Thanks @ines.
When using the final option, I get an error:

  File "/Users/gal/.pyenv/versions/3.9.4/envs/april-dev-venv/lib/python3.9/site-packages/prodigy/components/db.py", line 84, in connect
    raise ValueError(f"Invalid database id: {db_id}")
ValueError: Invalid database id: spanner

After implementing the whole DB class ourselves, how do we stop Prodigy from trying to use prodigy.components.db?

GalM · April 3, 2022, 6:59am

Also, regarding the second option you've suggested:
from prodigy.util import registry: "util" is not found.

ines · April 4, 2022, 3:48pm

That's strange. What's the exact error message you're seeing, and is Prodigy installed correctly in the environment?

GalM · April 5, 2022, 7:37am

Prodigy works fine (without Spanner), so I guess it's installed correctly (?).
Sorry, when running it directly through the Python CLI, the import works fine. PyCharm just doesn't recognize it.

When trying to run it with the registry option I get:

File "/Users/gal/.pyenv/versions/3.9.4/envs/april-dev-venv/lib/python3.9/site-packages/prodigy/components/db.py", line 84, in connect
    raise ValueError(f"Invalid database id: {db_id}")
ValueError: Invalid database id: spanner

GalM · April 5, 2022, 8:36am

It looks like it tries to connect using Prodigy's connect function rather than my custom DB.

ines · April 6, 2022, 11:57am

Ah, that's likely because of Cython, so it's just the editor not being able to resolve the module.

It looks like this is related to the database itself, not the loader. If you're not using entry points to tell Prodigy where to find the code (e.g. like mondigy does here: https://github.com/jdagdelen/mondigy/blob/6c7928121faf1ba7d78fe29367bacd997ecc1f24/setup.cfg#L26) you can also register it explicitly, e.g.:

from prodigy.util import registry

@registry.databases("spanner")
def spanner_db():
    return YourDatabaseInstance()
