Dropping a session from annotations

DerDiego13 · April 22, 2023, 9:05pm

I used spans.correct and then closed the window without hitting save, as I didn't want to save the annotations. However, they somehow got saved and are now in the dataset. With "prodigy stats -ls" I can see all my sessions. Is there a way to drop the last one from the database so that I can just continue and redo those annotations?

ryanwesslen · April 24, 2023, 5:48pm

hi @DerDiego13,

Thanks for your question, and welcome to the Prodigy community!

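You can drop a single session from Python by passing the session's dataset name to db.drop_dataset:

from prodigy.components.db import connect
db = connect()
# Drops only the session dataset, not the whole "mynerdata" dataset
db.drop_dataset("mynerdata-ryan")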

Of course, it's good to be cautious, so you may want to run db-out first to save a backup of your data before dropping anything.
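If you would rather take the backup from Python, here is a minimal sketch. It assumes only that the database object exposes get_dataset_examples(name), as used elsewhere in this thread, and that a missing dataset comes back as None. The helper name backup_examples and the output path are hypothetical.

```python
import json
from pathlib import Path


def backup_examples(db, dataset_name, out_path):
    """Write every example in `dataset_name` to a JSONL file.

    `db` is assumed to expose get_dataset_examples(name), like the
    object returned by prodigy.components.db.connect().
    Returns the number of examples written.
    """
    examples = db.get_dataset_examples(dataset_name)
    if examples is None:
        # Assumption: a dataset or session that does not exist yields None.
        raise ValueError(f"No dataset or session named {dataset_name!r}")
    with Path(out_path).open("w", encoding="utf-8") as f:
        for eg in examples:
            # One JSON object per line, keeping non-ASCII text readable.
            f.write(json.dumps(eg, ensure_ascii=False) + "\n")
    return len(examples)
```

Calling something like backup_examples(connect(), "mynerdata-ryan", "ryan_backup.jsonl") before the drop leaves a file you can inspect or re-import if the wrong session was removed.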

Here's an example. Let's assume I'm saving my annotations into a dataset called mynerdata.

Let's assume these annotations have been saved from two sessions: diego (20 records annotated) and ryan (10 records annotated).

python -m prodigy ner.correct mynerdata en_core_web_sm data/news_headlines.jsonl --label ORG
Using 1 label(s): ORG

✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

✔ Saved 30 annotations to database SQLite
Dataset: mynerdata
Session ID: 2023-04-24_13-37-59

You can now inspect the dataset in Python using the database components:

>>> from prodigy.components.db import connect
>>> db = connect()
>>> len(db.get_dataset("mynerdata"))
30

Let's now view diego's and ryan's annotations:

>>> len(db.get_dataset_examples("mynerdata-diego"))
20
>>> len(db.get_dataset_examples("mynerdata-ryan"))
10

Now we'll call db.drop_dataset, but on the session we want to drop (mynerdata-ryan) rather than on the whole dataset:

>>> db.drop_dataset("mynerdata-ryan")
True
>>> len(db.get_dataset_examples("mynerdata"))
20
>>> len(db.get_dataset_examples("mynerdata-diego"))
20
>>> len(db.get_dataset_examples("mynerdata-ryan"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'NoneType' has no len()

So now mynerdata only has diego's annotations, and the session for ryan no longer exists.
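To double-check which sessions are still represented in a dataset, you could tally the examples by session. This is only a sketch: it assumes each saved example carries a "_session_id" key, which may not hold for every Prodigy version or recipe, and count_by_session is a hypothetical helper, not part of Prodigy.

```python
from collections import Counter


def count_by_session(examples, key="_session_id"):
    """Count examples per session ID.

    Assumption: each example is a dict that may carry a "_session_id"
    key. Examples without the key are counted under None.
    """
    return Counter(eg.get(key) for eg in examples)
```

Running count_by_session(db.get_dataset_examples("mynerdata")) before and after the drop should make it easy to see whether the session you meant to remove is really gone.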

You can also use this same logic to clean up old sessions based on some criteria (e.g., remove any sessions without any annotations).
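A rough sketch of such a cleanup, under stated assumptions: it uses only get_dataset_examples and drop_dataset from this thread, expects the caller to supply the session names (for example, copied from the output of "prodigy stats -ls"), and skips names that come back as None. The function name and the dry_run flag are hypothetical.

```python
def drop_empty_sessions(db, session_names, dry_run=True):
    """Find (and optionally drop) session datasets that hold no examples.

    `db` is assumed to expose get_dataset_examples(name) and
    drop_dataset(name). Returns the names of the empty sessions.
    """
    empty = []
    for name in session_names:
        examples = db.get_dataset_examples(name)
        # None is treated as "session does not exist" and left alone.
        if examples is not None and len(examples) == 0:
            empty.append(name)
            if not dry_run:
                db.drop_dataset(name)
    return empty
```

With dry_run=True (the default here) nothing is deleted, so you can review the returned list first and then call the function again with dry_run=False.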

Hope this helps!

DerDiego13 · April 25, 2023, 8:49am

Hi, thank you very much for the reply. I will look into it.
For now, I used "prodigy review" (as I needed to correct a few anyway), so I will certainly come across the ones that were mistakenly added and can correct them along the way. Luckily, the dataset is not that large yet.
