data-to-spacy with custom db plugin

Hi!
I've collected some annotations using ner.teach and would now like to convert them to spaCy's format using data-to-spacy. I'm using a custom db plugin in my Prodigy recipe to save the annotations to MongoDB instead of SQLite.

I've read through the documentation for data-to-spacy, but I can't find a way to specify my db plugin so that the --ner argument looks for the dataset in the right place.

Currently I'm trying the command:

prodigy data-to-spacy ./output.json --ner-missing --ner my_dataset

But I'm getting:

✘ Can't find 'my_dataset' in database 'SQLite'

Hi! How does your database plugin work under the hood? Does it register the database via entry points, or are you using a custom recipe that changes the database and returns it via the components?

It looks like Prodigy's connect() helper ends up connecting to the default SQLite database instead of your custom one. If the database is registered via entry points, Prodigy should be able to find it automatically if your custom database ID is set in your prodigy.json. If not, that's definitely confusing.

If you've been using a custom recipe that implements the custom database, the problem is that data-to-spacy will connect to the default database. So you could either edit the recipe to load from your custom DB instead, or wrap your custom DB as a Python package and make it expose your database class via an entry point so Prodigy can find it by string name (see this thread for some background).
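For the entry point route, here's a rough sketch of what the packaging could look like. All names in it (the package prodigy_mongodb, the class MongoDatabase, the ID "mongodb") are made up for illustration, and I'm assuming "prodigy_db" is the entry point group Prodigy checks for databases, so please double-check that against the docs for your version.

```python
import json

# Keyword arguments for setuptools.setup() in the plugin's setup.py.
# In a real setup.py you would write:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
SETUP_KWARGS = {
    "name": "prodigy-mongodb",        # hypothetical package name
    "version": "0.1.0",
    "packages": ["prodigy_mongodb"],  # hypothetical module with your DB class
    "entry_points": {
        # Assumption: "prodigy_db" is the group Prodigy scans for databases.
        # Format: "<database ID> = <module path>:<class name>"
        "prodigy_db": ["mongodb = prodigy_mongodb.db:MongoDatabase"],
    },
}

# Matching prodigy.json setting: "db" holds the database ID registered above.
PRODIGY_JSON = json.dumps({"db": "mongodb"}, indent=2)

print(PRODIGY_JSON)
```

Once the package is installed in the same environment as Prodigy (e.g. with pip install -e .) and the ID is set in your prodigy.json, built-in commands like data-to-spacy should be able to resolve the database by that name instead of falling back to SQLite.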

Thank you! For some reason I missed that I could use entry points to plug in the database manager. The link to the thread you sent was extremely helpful. I've moved my database manager to a plugin and Prodigy is reading it fine now!
