Dropping a session from annotations

Hi @DerDiego13,

Thanks for your question and welcome to the Prodigy community :wave:

You can run:

from prodigy.components.db import connect
db = connect()
db.drop_dataset("mydata-ryan")

Of course, it's good to be cautious, so you may want to run db-out first to save a backup of your data before dropping anything.
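If you'd rather take the backup from the same Python session, here's a minimal sketch. The helper name backup_dataset and the output path are my own placeholders, not part of Prodigy. It only relies on db.get_dataset_examples, which returns None when the dataset doesn't exist.

```python
import json
from pathlib import Path


def backup_dataset(db, name, out_path):
    """Write every example in dataset `name` to a JSONL file; return the count.

    `db` is assumed to be the object returned by
    prodigy.components.db.connect(). This helper is hypothetical,
    not a Prodigy API.
    """
    examples = db.get_dataset_examples(name)
    if examples is None:
        # Missing datasets come back as None rather than an empty list.
        raise ValueError(f"Dataset {name!r} not found")
    with Path(out_path).open("w", encoding="utf8") as f:
        for eg in examples:
            f.write(json.dumps(eg) + "\n")
    return len(examples)


# Usage (requires Prodigy to be installed):
#   from prodigy.components.db import connect
#   db = connect()
#   backup_dataset(db, "mydata-ryan", "mydata-ryan-backup.jsonl")
```

Keep the file around until you've checked that the remaining annotations look right.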

Here's an example. Let's assume I'm saving my annotations into a dataset called mynerdata.

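Let's assume these annotations have been saved from two named sessions: diego (20 records annotated) and ryan (10 records annotated). Prodigy stores each named session as its own session dataset, here mynerdata-diego and mynerdata-ryan.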

python -m prodigy ner.correct mynerdata en_core_web_sm data/news_headlines.jsonl --label ORG
Using 1 label(s): ORG

✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

✔ Saved 30 annotations to database SQLite
Dataset: mynerdata
Session ID: 2023-04-24_13-37-59

You can now inspect the dataset in Python using the database components:

>>> from prodigy.components.db import connect
>>> db = connect()
>>> len(db.get_dataset_examples("mynerdata"))
30

Let's now view diego's and ryan's annotations:

>>> len(db.get_dataset_examples("mynerdata-diego"))
20
>>> len(db.get_dataset_examples("mynerdata-ryan"))
10

Now we'll call db.drop_dataset, passing the name of the session we want to drop (mynerdata-ryan) rather than the main dataset:

>>> db.drop_dataset("mynerdata-ryan")
True
>>> len(db.get_dataset_examples("mynerdata"))
20
>>> len(db.get_dataset_examples("mynerdata-diego"))
20
>>> len(db.get_dataset_examples("mynerdata-ryan"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'NoneType' has no len()

So now mynerdata only has diego's annotations, and ryan's session no longer exists. That's why the lookup returns None and len() fails.
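If you're scripting these checks, you may want to guard against that None. Here's a small sketch; count_examples is a hypothetical helper name of mine, not something Prodigy provides:

```python
def count_examples(db, name):
    """Return the number of examples in `name`, or 0 if it doesn't exist.

    `db` is assumed to be the object returned by
    prodigy.components.db.connect().
    """
    examples = db.get_dataset_examples(name)
    # A dropped or unknown dataset comes back as None, so avoid len(None).
    return 0 if examples is None else len(examples)
```

With this, checking mynerdata-ryan after the drop gives you 0 instead of a traceback.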

You can also use this same logic to clean up old sessions based on some criteria (e.g., remove any sessions without any annotations).
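A rough sketch of that cleanup could look like the following. The function name drop_empty_sessions and the dry_run flag are my own additions. It assumes db.sessions lists the session dataset names, and it treats both None and an empty list as "no annotations".

```python
def drop_empty_sessions(db, dry_run=True):
    """Find session datasets without annotations and optionally drop them.

    `db` is assumed to be the object returned by
    prodigy.components.db.connect(). Returns the names of the empty
    sessions; nothing is deleted while dry_run is True.
    """
    empty = []
    # Copy the names first so dropping doesn't disturb the iteration.
    for name in list(db.sessions):
        examples = db.get_dataset_examples(name)
        if not examples:  # None (missing) or an empty list
            if not dry_run:
                db.drop_dataset(name)
            empty.append(name)
    return empty


# Usage (requires Prodigy to be installed):
#   from prodigy.components.db import connect
#   db = connect()
#   print(drop_empty_sessions(db))           # preview only
#   drop_empty_sessions(db, dry_run=False)   # actually drop them
```

I'd run it with dry_run=True first and check the returned names before deleting anything for real.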

Hope this helps!