I'd like to connect Prodigy to read and write the text data from/to Spanner.
Our text samples are stored as plain text in a Spanner table (table = "input_data", column name = "text", type = STRING).
The annotations will sit in another table (table = "annotations").
I'm trying to implement most of the database functions described in the docs (Database · Prodigy · An annotation tool for AI, Machine Learning & NLP).
I'm not really sure about the structure that is needed. For example, what should the output of get_examples look like?
If someone has already implemented a connection to a custom database that isn't one of those available through peewee, it would be great to learn from it.
You can check out this package, which implements most of the database functions for MongoDB. I've used it as a starting point for my own custom annotation setup.
I still need your help with this issue.
It's unclear which components are required.
Also, what table structure is needed for Prodigy's internal tables, and which tables should be created?
The package linked in the previous comment is Mongo specific and uses an old Prodigy version.
Can you please add documentation listing the exact components needed?
You can find the default table structure here: Database · Prodigy · An annotation tool for AI, Machine Learning & NLP. However, if you're using your own database, you can also decide on your own schema. Ultimately, all Prodigy will do is ask for examples or give you examples to store, so if your database can perform these actions, it's up to you how you want to store the data.
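To make the "ask for examples / give you examples to store" idea concrete, here is a minimal sketch of a store that keeps each example as a JSON blob. It uses SQLite purely as a stand-in for Spanner, and the class name, method names and table layout are illustrative assumptions, not Prodigy's confirmed interface or default schema. The only thing it relies on is that examples are plain JSON-serializable dictionaries.

```python
import json
import sqlite3


class SimpleExampleStore:
    """Hypothetical store: one row per example, content kept as JSON text."""

    def __init__(self, conn):
        self.conn = conn
        # Assumed layout, not Prodigy's default tables.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS annotations ("
            "dataset TEXT, task_hash INTEGER, content TEXT)"
        )

    def add_examples(self, examples, dataset):
        # Serialize each example dict and tag it with the dataset name.
        rows = [(dataset, eg.get("_task_hash"), json.dumps(eg)) for eg in examples]
        self.conn.executemany("INSERT INTO annotations VALUES (?, ?, ?)", rows)
        self.conn.commit()

    def get_dataset(self, dataset):
        # Return the stored examples as dicts, in insertion order.
        cur = self.conn.execute(
            "SELECT content FROM annotations WHERE dataset = ? ORDER BY rowid",
            (dataset,),
        )
        return [json.loads(content) for (content,) in cur]


store = SimpleExampleStore(sqlite3.connect(":memory:"))
store.add_examples(
    [{"text": "Form 1040", "_task_hash": 1, "answer": "accept"}],
    "text_annotation",
)
examples = store.get_dataset("text_annotation")
```

Keeping the whole example as one JSON column means the store does not need to know which keys a given recipe adds, so the same table can hold annotations from different interfaces.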
If there's a way to automate this so you can connect to Postgres directly, then yes, you should be able to just use the Postgres integration out-of-the-box.
I've implemented the necessary parts (I think).
Now I'm running: prodigy spans.manual text_annotation blank:en - --label FORM,TAX_FILER --loader spanner_loader
And I get ✘ No loader found for 'spanner_loader'.
spanner_loader file:
You also need to tell Prodigy where to find your loader by name. One option is to not make it a recipe and register it:
from prodigy.util import registry

@registry.loaders.register("spanner_loader")
def spanner_loader(source):
    ...
The loader will always receive whatever you pass in as the source argument on the CLI – for instance, the mondigy package uses this to provide a configuration file.
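As a rough sketch of what the loader body could look like: it takes the source value, fetches rows, and yields one dictionary with a "text" key per row. The `fetch_rows` helper is a hypothetical stand-in for the real Spanner query against the "input_data" table and returns hard-coded rows here. The registry decorator is left out so the snippet runs without Prodigy installed.

```python
def fetch_rows(source):
    """Hypothetical stand-in for a Spanner query such as
    SELECT text FROM input_data; returns one-column rows."""
    return [("Form 1040 filed jointly",), (None,), ("Schedule C attached",)]


def spanner_loader(source):
    # `source` is whatever was passed as the source argument on the CLI.
    for (text,) in fetch_rows(source):
        if text:  # skip NULL or empty rows
            yield {"text": text}


tasks = list(spanner_loader("input_data"))
```

Because the function is a generator, rows are converted one at a time as they are requested rather than being loaded into memory all at once.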
Alternatively, you can also make your loader write to standard output, i.e. by calling print(j) instead of yield j. If your loader writes to standard output, you can use it by piping its output forward into the recipe and setting the source to - so it reads from standard input. For example:
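A minimal sketch of the standard-output approach, assuming the texts have already been fetched from Spanner (the two hard-coded strings and the `write_tasks` name are placeholders for illustration): each task is written as one JSON object per line.

```python
import json
import sys


def write_tasks(texts, out=sys.stdout):
    # One JSON object per line, so the output can be piped into a recipe.
    for text in texts:
        out.write(json.dumps({"text": text}) + "\n")


if __name__ == "__main__":
    # Placeholder data; a real script would query the input_data table here.
    write_tasks(["Form 1040 filed jointly", "Schedule C attached"])
```

Assuming the script is saved as spanner_loader.py, it could then be piped into the recipe with something along the lines of: python spanner_loader.py | prodigy spans.manual text_annotation blank:en - --label FORM,TAX_FILER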
Prodigy works fine (without Spanner), so I guess it's installed correctly (?).
Sorry, when running directly through the Python CLI, it works fine. PyCharm just doesn't recognize it.
When trying to run it with the registry option I get:
  File "/Users/gal/.pyenv/versions/3.9.4/envs/april-dev-venv/lib/python3.9/site-packages/prodigy/components/db.py", line 84, in connect
    raise ValueError(f"Invalid database id: {db_id}")
ValueError: Invalid database id: spanner