Hi @DerDiego13,
Thanks for your question, and welcome to the Prodigy community!
You can drop a single session's annotations from Python:
from prodigy.components.db import connect
db = connect()
db.drop_dataset("mydata-ryan")
Of course, it's good to be cautious, so you may want to run db-out first to save a backup of your data before dropping anything.
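If you'd rather take the backup from the same Python session, here is a minimal sketch. It assumes db.get_dataset_examples returns a list of JSON-serializable dicts (or None when the dataset doesn't exist, as in the traceback further down). The helper name backup_dataset and the output file name are made up for illustration; this is a plain-Python alternative to db-out, not part of Prodigy's API.

```python
import json
from pathlib import Path


def backup_dataset(db, name, path):
    """Write every example in dataset `name` to a JSONL file; return the count.

    `db` is assumed to be the object returned by
    prodigy.components.db.connect(). Only get_dataset_examples is used.
    """
    examples = db.get_dataset_examples(name)
    if examples is None:
        # Assumption: a missing dataset comes back as None.
        raise ValueError(f"Dataset {name!r} not found")
    with Path(path).open("w", encoding="utf-8") as f:
        for eg in examples:
            # One JSON object per line (JSONL).
            f.write(json.dumps(eg, ensure_ascii=False) + "\n")
    return len(examples)


# Usage (hypothetical file name):
# from prodigy.components.db import connect
# backup_dataset(connect(), "mynerdata", "mynerdata_backup.jsonl")
```

Backing up the parent dataset (mynerdata) rather than only the session means you can restore everything if you drop the wrong session by mistake.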
Here's an example. Let's assume I'm saving my annotations into a dataset called mynerdata.
Let's assume these annotations have been saved from two sessions: diego (20 records annotated) and ryan (10 records annotated).
python -m prodigy ner.correct mynerdata en_core_web_sm data/news_headlines.jsonl --label ORG
Using 1 label(s): ORG
✨ Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!
✔ Saved 30 annotations to database SQLite
Dataset: mynerdata
Session ID: 2023-04-24_13-37-59
You can now inspect the dataset in Python using the database components:
>>> from prodigy.components.db import connect
>>> db = connect()
>>> len(db.get_dataset("mynerdata"))
30
Let's now view diego's and ryan's annotations:
>>> len(db.get_dataset_examples("mynerdata-diego"))
20
>>> len(db.get_dataset_examples("mynerdata-ryan"))
10
Now we'll use db.drop_dataset, but only on the session we want to drop (mynerdata-ryan):
>>> db.drop_dataset("mynerdata-ryan")
True
>>> len(db.get_dataset_examples("mynerdata"))
20
>>> len(db.get_dataset_examples("mynerdata-diego"))
20
>>> len(db.get_dataset_examples("mynerdata-ryan"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'NoneType' has no len()
So now mynerdata only has diego's annotations, and the session for ryan no longer exists: get_dataset_examples returns None for it, which is why len() raises the TypeError above.
You can also use this same logic to clean up old sessions based on some criteria (e.g., remove any sessions without any annotations).
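As a rough sketch of that kind of cleanup, something like the following could work. It assumes db.sessions lists the session dataset names and that get_dataset_examples returns None or an empty list for a session with no annotations; please check both against your Prodigy version. The function names find_empty_sessions and drop_empty_sessions are hypothetical helpers, not Prodigy functions.

```python
def find_empty_sessions(db):
    """Return the names of session datasets that contain no examples."""
    empty = []
    for name in db.sessions:  # assumption: list of session dataset names
        # Treat None (nothing found) the same as an empty list.
        examples = db.get_dataset_examples(name) or []
        if len(examples) == 0:
            empty.append(name)
    return empty


def drop_empty_sessions(db, dry_run=True):
    """Drop empty sessions and return their names.

    With dry_run=True (the default) nothing is deleted, so you can
    review the list before committing to it.
    """
    empty = find_empty_sessions(db)
    if not dry_run:
        for name in empty:
            db.drop_dataset(name)
    return empty


# Usage:
# from prodigy.components.db import connect
# db = connect()
# print(drop_empty_sessions(db))          # review what would be dropped
# drop_empty_sessions(db, dry_run=False)  # actually drop
```

The dry run is the default on purpose: dropping is not reversible without a backup, so it's worth looking at the list first. You can swap the emptiness check for any other criterion, such as a session name prefix.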
Hope this helps!