Dropping a session from annotations

I used spans.correct and then closed the window without hitting save, as I didn't want to save the annotations. However, they somehow got saved and are now part of the dataset. With "prodigy stats -ls" I can see all my sessions. Is there a way to drop the last one from the database so that I can continue and redo them?

hi @DerDiego13,

Thanks for your question and welcome to the Prodigy community :wave:

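You can drop a single session by passing its session name to db.drop_dataset, just as you would pass a dataset name:

from prodigy.components.db import connect

db = connect()  # connect to the configured Prodigy database
db.drop_dataset("mynerdata-ryan")  # the session to drop, not the whole dataset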

Of course, it's good to be cautious, so you may want to run db-out first to save a backup of your data before dropping anything.
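
If you'd rather take the backup from Python than via db-out, here is a minimal sketch that writes a session's examples to a JSONL file and only then drops it. The helper name backup_and_drop and the backup path are hypothetical; it relies only on the get_dataset_examples and drop_dataset calls used in this thread, and takes the db object as a parameter.

```python
import json
from pathlib import Path


def backup_and_drop(db, session_name, backup_path):
    """Write a session's examples to a JSONL file, then drop the session.

    `db` is assumed to be the object returned by
    prodigy.components.db.connect(). This is a hypothetical
    convenience wrapper, not part of Prodigy itself.
    """
    examples = db.get_dataset_examples(session_name)
    if examples is None:
        # Nothing is stored under that name, so there is nothing to back up or drop.
        return 0
    path = Path(backup_path)
    with path.open("w", encoding="utf-8") as f:
        for eg in examples:
            # One JSON object per line (JSONL).
            f.write(json.dumps(eg) + "\n")
    # Drop only after the backup file has been written.
    db.drop_dataset(session_name)
    return len(examples)
```

You would call it as backup_and_drop(connect(), "mynerdata-ryan", "ryan_backup.jsonl"). It returns the number of examples it backed up, so 0 means the session was not found.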

Here's an example. Let's assume I'm saving my annotations into a dataset called mynerdata.

Let's assume these annotations have been saved from two sessions: diego (20 records annotated) and ryan (10 records annotated).

python -m prodigy ner.correct mynerdata en_core_web_sm data/news_headlines.jsonl --label ORG
Using 1 label(s): ORG

✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

✔ Saved 30 annotations to database SQLite
Dataset: mynerdata
Session ID: 2023-04-24_13-37-59

You can now inspect the dataset in Python using the database components:

>>> from prodigy.components.db import connect
>>> db = connect()
>>> len(db.get_dataset("mynerdata"))
30

Let's now view diego's and ryan's annotations:

>>> len(db.get_dataset_examples("mynerdata-diego"))
20
>>> len(db.get_dataset_examples("mynerdata-ryan"))
10

Now we'll call db.drop_dataset, but pass it the session we want to drop (mynerdata-ryan) rather than the whole dataset:

>>> db.drop_dataset("mynerdata-ryan")
True
>>> len(db.get_dataset_examples("mynerdata"))
20
>>> len(db.get_dataset_examples("mynerdata-diego"))
20
>>> len(db.get_dataset_examples("mynerdata-ryan"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'NoneType' has no len()

So mynerdata now contains only diego's annotations, and the ryan session no longer exists: get_dataset_examples returns None for it, which is why len() raises the TypeError.
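
If you want to check counts without tripping over that TypeError, a small helper can treat a missing session as zero. This is only a sketch: count_examples is a hypothetical name, not a Prodigy function, and it assumes get_dataset_examples returns None for an unknown name, as it did in the transcript above.

```python
def count_examples(db, name):
    """Return the number of examples stored under `name`, or 0 if it is missing.

    `db` is assumed to be the object returned by
    prodigy.components.db.connect().
    """
    examples = db.get_dataset_examples(name)
    # A dropped or unknown session comes back as None rather than an empty list.
    return 0 if examples is None else len(examples)
```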

You can also use the same logic to clean up old sessions based on some criteria (e.g., remove any sessions without any annotations).
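
As a rough sketch of the "remove sessions without annotations" case: the helper below collects every session with no examples and, unless dry_run is left on, drops them. The name drop_empty_sessions and the dry_run flag are hypothetical, and the code assumes the database object exposes the session names as db.sessions, so check that against your Prodigy version before running it for real.

```python
def drop_empty_sessions(db, dry_run=True):
    """Find sessions that contain no examples and optionally drop them.

    `db` is assumed to be the object returned by
    prodigy.components.db.connect(), with a `sessions` attribute
    listing session names (an assumption; verify for your version).
    Returns the list of empty session names.
    """
    empty = []
    # Iterate over a snapshot, since dropping changes the set of sessions.
    for name in list(db.sessions):
        examples = db.get_dataset_examples(name)
        if not examples:  # covers both None and an empty list
            empty.append(name)
    if not dry_run:
        for name in empty:
            db.drop_dataset(name)
    return empty
```

Run it once with the default dry_run=True to see which sessions would go, then again with dry_run=False once you are happy with the list (and ideally after a db-out backup).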

Hope this helps!

Hi, thank you very much for the reply. I will have a look into it.
For now, I used "prodigy review" (as I needed to correct a few anyway), so I will certainly come across the ones that were added by mistake and can correct them along the way. Luckily, the dataset is not that large yet.