Issue with terms.teach recipe wrapper saving to SQLite (custom db setup)

graforlock · September 5, 2019, 3:50pm

Hello,

We've been building a MongoDB wrapper for Prodigy and have made good progress with it.

However, when wrapping the terms.teach recipe in a custom recipe, some of the data is saved to SQLite before the Save button is ever used (as far as I can tell from the Prodigy source code). The result is that some of the data ends up in an SQLite file and the rest ends up in MongoDB, which is far from ideal.

I'm not an expert, but there seem to be two different kinds of JSON objects stored in the examples table. The ones that only get saved to SQLite are the initial seed values from the CSV, written as soon as the prodigy command is run.

The others seem to be the annotated data, which is written when saving.

First question: what is the reason for these two seemingly different kinds of rows in the same table?

Second question: couldn't the db be passed to terms.teach, the way it works for ner.teach, for instance, so that everything is saved in a single place?

ines · September 5, 2019, 4:06pm

Hi! The saving in the web app calls the exact same API endpoint, no matter if you hit "Save" or if Prodigy saves the examples in the background.

But I think what might be happening in your custom recipe is this: when the terms.teach recipe starts, the seed terms are saved to the database automatically, because they should also be part of the patterns. If you want the recipe to use a custom database, make sure that this step also writes to your custom database instead of the SQLite database. See here:

github.com/explosion/prodigy-recipes/blob/67909be93fb1f3df1b7510661746b0002264070a/terms/terms_teach.py#L28-L31


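from prodigy import set_hashes
from prodigy.components.db import connect

DB = connect()  # database configured in prodigy.json (SQLite by default)
if dataset and dataset in DB:  # dataset and seeds are the recipe's arguments
    # Store every seed term as an accepted example in the dataset
    seed_tasks = [set_hashes({'text': s, 'answer': 'accept'}) for s in seeds]
    DB.add_examples(seed_tasks, datasets=[dataset])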

However you've structured your MongoDB integration, this call should be made to the custom DB as well.
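A minimal sketch of that idea: the seed-saving step writes to whatever database object it is given instead of a hard-coded `connect()`. `InMemoryDB` and `save_seed_terms` are hypothetical names. `InMemoryDB` only stands in for a MongoDB-backed wrapper so the example runs without a server; it implements the two operations the snippet uses (`dataset in db` and `add_examples(examples, datasets=[...])`), plus `add_dataset`/`get_dataset` for the demo. The `set_hashes` call is left out so the sketch runs without Prodigy installed.

```python
from typing import Dict, Iterable, List


class InMemoryDB:
    """Hypothetical stand-in for a MongoDB-backed database wrapper."""

    def __init__(self) -> None:
        self._datasets: Dict[str, List[dict]] = {}

    def add_dataset(self, name: str) -> None:
        self._datasets.setdefault(name, [])

    def __contains__(self, name: str) -> bool:
        return name in self._datasets

    def add_examples(self, examples: List[dict], datasets: List[str]) -> None:
        for name in datasets:
            self._datasets.setdefault(name, []).extend(examples)

    def get_dataset(self, name: str) -> List[dict]:
        return list(self._datasets.get(name, []))


def save_seed_terms(db, dataset: str, seeds: Iterable[str]) -> int:
    """Add seed terms as accepted examples to the db that is passed in."""
    if not dataset or dataset not in db:
        return 0
    # The recipe also wraps each task in Prodigy's set_hashes(); omitted here.
    seed_tasks = [{"text": s, "answer": "accept"} for s in seeds]
    db.add_examples(seed_tasks, datasets=[dataset])
    return len(seed_tasks)


db = InMemoryDB()
db.add_dataset("my_terms")
save_seed_terms(db, "my_terms", ["apple", "banana"])
print(db.get_dataset("my_terms"))
# [{'text': 'apple', 'answer': 'accept'}, {'text': 'banana', 'answer': 'accept'}]
```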

graforlock · September 5, 2019, 6:28pm

Sounds good. I'll save the seeds before calling the teach function, as long as that doesn't break anything.
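One way that plan could look, as a sketch under a few assumptions: `my_terms_teach` is a hypothetical copy of the recipe with the `connect()`/`add_examples` block removed (otherwise the seeds would still land in SQLite), and the custom database is handed back through the `"db"` entry of the recipe's components dictionary so annotations saved from the web app go to the same place. `teach_with_custom_db` and `_DemoDB` are also hypothetical names used only for illustration.

```python
from functools import partial
from typing import Any, Callable, Dict, List


def teach_with_custom_db(
    build_components: Callable[[], Dict[str, Any]],
    custom_db: Any,
    dataset: str,
    seeds: List[str],
) -> Dict[str, Any]:
    """Save seeds to custom_db first, then build the recipe and route saves there."""
    # 1. Seeds go to the custom database before the recipe runs.
    if dataset in custom_db:
        tasks = [{"text": s, "answer": "accept"} for s in seeds]
        custom_db.add_examples(tasks, datasets=[dataset])
    # 2. Build the recipe's components dict as usual.
    components = build_components()
    # 3. Annotations saved from the web app go to the same database.
    components["db"] = custom_db
    return components


class _DemoDB:
    """Hypothetical minimal database used only for this demo."""

    def __init__(self) -> None:
        self.saved: Dict[str, list] = {"my_terms": []}

    def __contains__(self, name: str) -> bool:
        return name in self.saved

    def add_examples(self, examples: list, datasets: List[str]) -> None:
        for name in datasets:
            self.saved[name].extend(examples)


def my_terms_teach(dataset: str, seeds: List[str]) -> Dict[str, Any]:
    """Hypothetical recipe copy without the connect()/add_examples block."""
    return {"dataset": dataset, "stream": iter([])}


demo_db = _DemoDB()
seeds = ["apple", "banana"]
components = teach_with_custom_db(
    partial(my_terms_teach, "my_terms", seeds), demo_db, "my_terms", seeds
)
```

Passing the recipe in as a zero-argument callable (here via `functools.partial`) keeps the wrapper independent of the recipe's exact signature.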

ines · September 5, 2019, 6:40pm

The dataset just holds the saved annotations, so the seed terms are saved upfront to make sure they're also in the set (and you don't have to click through them again). It shouldn't have any other implications for the recipe.
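To illustrate why having the seeds in the dataset saves clicks, here is a simplified, hypothetical filter (`skip_known_terms`) that drops candidate terms whose text is already among the saved examples. It compares raw text for readability and is not Prodigy's actual implementation; Prodigy's own deduplication generally relies on the `_input_hash`/`_task_hash` values that `set_hashes` adds.

```python
from typing import Iterable, Iterator, List


def skip_known_terms(candidates: Iterable[str], saved_examples: List[dict]) -> Iterator[dict]:
    """Yield candidate terms as tasks, skipping terms already in the dataset."""
    # Texts already stored, e.g. the seed terms that were added upfront
    known = {eg["text"] for eg in saved_examples}
    for term in candidates:
        if term not in known:
            known.add(term)  # also skip repeats within the stream itself
            yield {"text": term}


saved = [{"text": "apple", "answer": "accept"}]
print(list(skip_known_terms(["apple", "pear", "apple", "plum"], saved)))
# [{'text': 'pear'}, {'text': 'plum'}]
```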
