Old examples are automatically added to new dataset

I’m seeing this behavior too. Unfortunately, with our workflow we were hoping to reuse dataset names, so switching to a new database file isn’t a great option for us. Here’s a short interactive session that demonstrates the issue (the timestamped lines are Prodigy’s log output).

from prodigy.components.db import connect
from prodigy import set_hashes
examples = [{'text': 'Example1', 'label': 'Nice', 'answer': 'accept'},
            {'text': 'Example2', 'label': 'Nice', 'answer': 'reject'}]
examples = [set_hashes(eg) for eg in examples]
db = connect()
23:32:52 - DB: Initialising database SQLite
23:32:52 - DB: Connecting to database SQLite
assert 'cmgtest' not in db
db.add_dataset('cmgtest')
23:33:03 - DB: Creating dataset 'cmgtest'
<prodigy.components.db.Dataset object at 0x110df2080>
db.add_examples(examples, ['cmgtest'])
23:33:10 - DB: Getting dataset 'cmgtest'
23:33:10 - DB: Added 2 examples to 1 datasets
print(len(db.get_dataset('cmgtest')))
23:33:16 - DB: Loading dataset 'cmgtest' (2 examples)
2
db.drop_dataset('cmgtest')
23:33:30 - DB: Removed dataset 'cmgtest'
True
assert 'cmgtest' not in db
db.add_dataset('cmgtest')
23:33:43 - DB: Creating dataset 'cmgtest'
<prodigy.components.db.Dataset object at 0x110df20b8>
print(len(db.get_dataset('cmgtest')))
23:33:51 - DB: Loading dataset 'cmgtest' (1 examples)
1
print(db.get_dataset('cmgtest'))
23:34:00 - DB: Loading dataset 'cmgtest' (1 examples)
[{'label': 'Nice', '_input_hash': 1582969015, 'answer': 'reject', '_task_hash': 19451014, 'text': 'Example2'}]
db.add_examples(examples, ['cmgtest'])
23:34:18 - DB: Getting dataset 'cmgtest'
23:34:18 - DB: Added 2 examples to 1 datasets
print(len(db.get_dataset('cmgtest')))
23:34:29 - DB: Loading dataset 'cmgtest' (3 examples)
3
print(db.get_dataset('cmgtest'))
23:34:33 - DB: Loading dataset 'cmgtest' (3 examples)
[{'label': 'Nice', '_input_hash': 1582969015, 'answer': 'reject', '_task_hash': 19451014, 'text': 'Example2'}, {'label': 'Nice', '_input_hash': -544789127, 'answer': 'accept', '_task_hash': 1326324553, 'text': 'Example1'}, {'label': 'Nice', '_input_hash': 1582969015, 'answer': 'reject', '_task_hash': 19451014, 'text': 'Example2'}]
db.drop_dataset('cmgtest')
23:34:37 - DB: Removed dataset 'cmgtest'
True
assert 'cmgtest' not in db
db.add_dataset('cmgtest')
23:34:47 - DB: Creating dataset 'cmgtest'
<prodigy.components.db.Dataset object at 0x110df2a20>
print(len(db.get_dataset('cmgtest')))
23:34:51 - DB: Loading dataset 'cmgtest' (3 examples)
3
db.add_examples(examples, ['cmgtest'])
23:34:58 - DB: Getting dataset 'cmgtest'
23:34:58 - DB: Added 2 examples to 1 datasets
print(len(db.get_dataset('cmgtest')))
23:35:03 - DB: Loading dataset 'cmgtest' (5 examples)
5

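As the session shows, the recreated dataset already contains one old example before I add anything, and after the second drop and recreate it starts out with three. Is there any way to wipe out the examples too? These examples belong only to that dataset, so when I call drop_dataset() I’m happy for all of the associated examples to be removed as well.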
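
In case it helps anyone reproduce this without retyping the session, here is a rough sketch that wraps the same steps in a function. The name `dataset_reuse_counts` is my own and not part of Prodigy; the function only uses the calls that appear in the session above (`in`, `add_dataset`, `add_examples`, `get_dataset`, `drop_dataset`) and assumes they behave as they did there.

```python
def dataset_reuse_counts(db, name, examples):
    """Create, fill, drop and recreate a dataset, returning the sizes seen.

    `db` is assumed to be the object returned by connect(). This helper is
    hypothetical (not a Prodigy function) and mirrors the manual session.
    """
    assert name not in db, "pick a dataset name that does not exist yet"

    # First life of the dataset: create it and add the examples.
    db.add_dataset(name)
    db.add_examples(examples, [name])
    after_first_add = len(db.get_dataset(name))

    # Drop it and recreate it under the same name.
    db.drop_dataset(name)
    assert name not in db
    db.add_dataset(name)
    # If the drop removed everything, this should be 0.
    after_recreate = len(db.get_dataset(name))

    # Second life: add the same examples again.
    db.add_examples(examples, [name])
    after_second_add = len(db.get_dataset(name))

    # Drop again so the name is free for the next run.
    db.drop_dataset(name)
    return after_first_add, after_recreate, after_second_add
```

Called as `dataset_reuse_counts(connect(), 'cmgtest', examples)` with the hashed examples from above, I would expect `(2, 0, 2)` if dropping a dataset also cleared its examples. In my session the corresponding numbers were 2, 1 and 3.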