Doubts about databases

alvaro.marlo · August 4, 2021, 1:41pm

Hi,

I have some questions about the database structure when I use the drop recipe.

In the dataset table, the entries that are not sessions are deleted. Why are the sessions still in the database? Is there a reason for that?

In the example table, no entries are deleted, so if I upload the same content again, it will be duplicated with the same input hash and task hash. Is that irrelevant, or is it better to remove the entries from example to keep the database clean?

Thanks, and sorry for my English

ines · August 6, 2021, 12:13am

Hi! At the moment, the session datasets aren't always explicitly linked to a given dataset, so you can remove a regular dataset independently from a session dataset. If you don't want any of the session datasets, you could fetch all session dataset names, filter for the timestamp names and then remove them all. This is probably easiest to do by calling into the Database API from Python (see the Database page of the Prodigy documentation).
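
A minimal sketch of that cleanup, in case it helps. It assumes the auto-generated session names look like `2021-08-04_13-41-00` (that pattern is an assumption, so check it against your own database first) and that your Prodigy version exposes `connect`, `db.sessions` and `db.drop_dataset` as described in the Database API docs. The helper names `timestamp_sessions`, `drop_timestamp_sessions` and `run_against_prodigy` are made up for illustration.

```python
import re

# Assumed shape of auto-generated session names, e.g. "2021-08-04_13-41-00".
# Adjust the pattern if your session names look different.
TIMESTAMP_NAME = re.compile(r"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}")


def timestamp_sessions(session_names):
    """Return only the names that fully match the timestamp pattern."""
    return [name for name in session_names if TIMESTAMP_NAME.fullmatch(name)]


def drop_timestamp_sessions(db, dry_run=True):
    """Drop timestamp-named session datasets and return their names.

    With dry_run=True nothing is deleted; the matching names are only returned.
    `db` is expected to provide `.sessions` and `.drop_dataset(name)`.
    """
    targets = timestamp_sessions(db.sessions)
    if not dry_run:
        for name in targets:
            db.drop_dataset(name)
    return targets


def run_against_prodigy(dry_run=True):
    """Hypothetical entry point: connect using the settings in prodigy.json."""
    from prodigy.components.db import connect  # requires Prodigy to be installed

    return drop_timestamp_sessions(connect(), dry_run=dry_run)
```

Calling `run_against_prodigy()` first with the default `dry_run=True` only lists what would be removed, which is a cheap way to confirm the filter before deleting anything.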

While one example can be linked to multiple datasets, you can also have the same example included in different datasets multiple times. There are definitely use cases where you might want this (e.g. if you're creating multiple versions of the same annotation by different people, or if you want multiple and slightly different versions of the same dataset).

I wouldn't worry too much about carefully grooming your database, especially if you're working with text. An individual example isn't that large, so your database will stay relatively small for a long time.
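
If you'd still like to see how many repeats you have before deciding whether any cleanup is worth it, here is a rough sketch. It assumes the examples are plain dicts carrying the `_input_hash` and `_task_hash` keys, as you'd get when loading a dataset through the Database API. The function name `duplicate_hash_pairs` and the sample data are made up for illustration.

```python
from collections import Counter


def duplicate_hash_pairs(examples):
    """Map each (_input_hash, _task_hash) pair seen more than once to its count."""
    counts = Counter(
        (eg.get("_input_hash"), eg.get("_task_hash")) for eg in examples
    )
    return {pair: n for pair, n in counts.items() if n > 1}


# Hypothetical examples: the first two share both hashes, the third does not.
sample = [
    {"text": "hello", "_input_hash": 1, "_task_hash": 10},
    {"text": "hello", "_input_hash": 1, "_task_hash": 10},
    {"text": "world", "_input_hash": 2, "_task_hash": 20},
]
repeats = duplicate_hash_pairs(sample)
```

This only reports the repeats and doesn't delete anything, since having the same example more than once can be intentional.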
