Deleting certain annotation sessions

Hi @ines,
I think I made a mistake when annotating: I used the same dataset for both text categorization labels and NER. While this may not technically be an issue, I think it skews training, if I am not mistaken?

For example, I did two sessions annotating one NER label (maybe around 200+ annotations). When I run the ner.batch-train command, I see output like below:

[Abhishek:~/Projects/Git-Repositories/spaCy] [NM-NLP] master(+25/-6) 4s ± prodigy ner.batch-train event_labels en_core_web_lg --output-model /tmp/model --eval-split 0.2

Loaded model en_core_web_lg
Using 20% of accept/reject examples (3078) for evaluation
Using 100% of remaining examples (12385) for training
Dropout: 0.2  Batch size: 16  Iterations: 10

BEFORE     0.000
Correct    0
Incorrect  24
Entities   4714
Unknown    4714

Why do I see such a huge number when I did not annotate that many examples at all? Is it picking up all the annotations? If so, is there any way I can delete sessions so that I can purge this dataset of the NER stuff and do it on another dataset?

Thanks in advance. :slight_smile:

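Yes, it definitely looks like your event_labels dataset contains far more than the 200 or so NER annotations: the output shows 12385 examples used for training, plus another 3078 held back for evaluation. Maybe you accidentally added all of your annotations to the same dataset?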

To get to the bottom of this, you could run the db-out command to export the dataset to a file and inspect it. That should give you a better idea of what’s going on here. For example:

prodigy db-out event_labels > event_labels.jsonl

This will give you a JSONL file with all the records. You can edit it manually, or use a script to filter it. When you’re done, you can use the db-in command to add the updated data to a new dataset.
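If you go the script route, here is a rough sketch of what the filtering could look like. It rests on an assumption you should check against your own export: that NER annotations carry a non-empty "spans" list and text classification annotations do not. The function names and output file names are made up for illustration.

```python
import json


def split_by_task(lines):
    """Split exported JSONL lines into (ner_records, other_records).

    Assumption: a record counts as NER if it has a non-empty "spans"
    list. Inspect your own export before relying on this rule.
    """
    ner, other = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("spans"):
            ner.append(record)
        else:
            other.append(record)
    return ner, other


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def split_file(in_path, ner_path, other_path):
    """Read the db-out export and write the two filtered files."""
    with open(in_path, encoding="utf-8") as f:
        ner, other = split_by_task(f)
    write_jsonl(ner_path, ner)
    write_jsonl(other_path, other)
    return len(ner), len(other)


# Example call (hypothetical output file names):
# split_file("event_labels.jsonl", "ner_only.jsonl", "textcat_only.jsonl")
```

Each output file can then be loaded into its own new dataset with db-in, for instance `prodigy db-in event_labels_ner ner_only.jsonl`, where the dataset name is just a placeholder.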