I used spans.correct and then closed the window without hitting save, as I didn't want to save the annotations. However, they somehow got saved and are now in the dataset. With "prodigy stats -ls" I can see all my sessions. Is there a way to drop the last one from the database so I can just continue and redo them?
Thanks for your question and welcome to the Prodigy community!
You can run:
from prodigy.components.db import connect
db = connect()
db.drop_dataset("mydata-ryan")
Of course, it's good to be cautious, so you may want to run db-out before dropping anything, to save a backup of your data.
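If you'd rather do the backup from Python, here's a rough sketch of a backup-then-drop helper. It assumes `db` is the object returned by `connect()`. The name `backup_and_drop` and the output path are made up for illustration. `get_dataset_examples` is the method used further down in this thread; some Prodigy versions expose `get_dataset` instead, so adjust it to whatever your version provides.

```python
import json
from pathlib import Path


def backup_and_drop(db, name, backup_path):
    """Write the examples stored under `name` to a JSONL file, then drop it.

    Hypothetical helper, not part of Prodigy. `db` is assumed to behave like
    the object returned by prodigy.components.db.connect().
    Returns the number of examples that were backed up.
    """
    examples = db.get_dataset_examples(name)
    if examples is None:
        # Nothing is stored under that name, so there is nothing to back up or drop.
        return 0
    with Path(backup_path).open("w", encoding="utf8") as f:
        for eg in examples:
            f.write(json.dumps(eg) + "\n")
    # Only drop once the backup file has been written.
    db.drop_dataset(name)
    return len(examples)


# Possible usage (requires Prodigy to be installed):
# from prodigy.components.db import connect
# backup_and_drop(connect(), "mydata-ryan", "mydata-ryan-backup.jsonl")
```

Writing the file before calling `drop_dataset` means a failed write leaves the session untouched.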
Here's an example. Let's assume I'm saving my annotations into a dataset called mynerdata, and that these annotations have been saved from two sessions: diego (20 records annotated) and ryan (10 records annotated).
python -m prodigy ner.correct mynerdata en_core_web_sm data/news_headlines.jsonl --label ORG
Using 1 label(s): ORG

✨ Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

✔ Saved 30 annotations to database SQLite
Dataset: mynerdata
Session ID: 2023-04-24_13-37-59
You can now check the dataset in Python using the database components:
>>> from prodigy.components.db import connect
>>> db = connect()
>>> len(db.get_dataset("mynerdata"))
30
Let's now view how many annotations each session has:
>>> len(db.get_dataset_examples("mynerdata-diego"))
20
>>> len(db.get_dataset_examples("mynerdata-ryan"))
10
Now we'll use db.drop_dataset, but only for the session we want to drop (mynerdata-ryan):
>>> db.drop_dataset("mynerdata-ryan")
True
>>> len(db.get_dataset_examples("mynerdata"))
20
>>> len(db.get_dataset_examples("mynerdata-diego"))
20
>>> len(db.get_dataset_examples("mynerdata-ryan"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'NoneType' has no len()
Now mynerdata only has diego's annotations, and the session for ryan can no longer be found.
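The TypeError in the output above comes from calling len() on None, which is what the lookup returned once the session was gone. If you'd rather get a count of 0 than an exception, a small wrapper along these lines could help. This is only a sketch: `count_examples` is a hypothetical helper, and `db` is assumed to be the object returned by `connect()`.

```python
def count_examples(db, name):
    """Return how many examples are stored under `name`, or 0 if it is missing.

    Hypothetical helper, not part of Prodigy. Relies on the lookup returning
    None for a dataset or session that does not exist, as seen in the
    traceback above.
    """
    examples = db.get_dataset_examples(name)
    return 0 if examples is None else len(examples)
```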
You can also use this same logic to clean up old sessions based on some criteria (e.g., removing any sessions that have no annotations).
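A minimal sketch of that kind of cleanup might look like the following. It assumes `db` is the object returned by `connect()` and that you pass in the session names yourself (I believe the database object also exposes a list of session names, but check that against your Prodigy version). The helper name `drop_empty_sessions` and the `dry_run` flag are hypothetical, not part of Prodigy.

```python
def drop_empty_sessions(db, session_names, dry_run=True):
    """Find, and optionally drop, sessions that exist but hold no examples.

    Hypothetical helper, not part of Prodigy. `session_names` is a list of
    session dataset names such as ["mynerdata-diego", "mynerdata-ryan"].
    With dry_run=True nothing is deleted; the empty sessions are only reported.
    Returns the list of empty session names.
    """
    empty = []
    for name in session_names:
        examples = db.get_dataset_examples(name)
        if examples is None:
            # Unknown name: there is nothing to drop.
            continue
        if len(examples) == 0:
            if not dry_run:
                db.drop_dataset(name)
            empty.append(name)
    return empty
```

Running it once with the default `dry_run=True` lets you check which sessions would be removed before anything is actually deleted.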
Hope this helps!
Hi, thank you very much for the reply. I will have a look into it.
For now, I used "prodigy review" (as I needed to correct a few anyway) and will certainly come across the ones that were falsely added and then correct them along the way. Luckily, the dataset is not yet that long.