Deleting certain annotation sessions

Hi @ines,
I think I made a mistake when annotating and used the same dataset for both text categorization labels and NER. While this may not technically be an issue, it skews training, if I'm not mistaken?

For example, I did two sessions annotating one NER label (maybe around 200+ annotations). When I run the ner.batch-train command, I see output like the following:

[Abhishek:~/Projects/Git-Repositories/spaCy] [NM-NLP] master(+25/-6) 4s ± prodigy ner.batch-train event_labels en_core_web_lg --output-model /tmp/model --eval-split 0.2

Loaded model en_core_web_lg
Using 20% of accept/reject examples (3078) for evaluation
Using 100% of remaining examples (12385) for training
Dropout: 0.2  Batch size: 16  Iterations: 10


BEFORE     0.000
Correct    0
Incorrect  24
Entities   4714
Unknown    4714

Why do I see such huge numbers when I didn't annotate anywhere near that many examples? Is it picking up all the annotations in the dataset? If so, is there any way I can delete sessions, so that I can purge the NER annotations from this dataset and redo them in a separate one?

Thanks in advance. :slight_smile:

Yes, it definitely looks like your event_labels dataset contains far more examples than your two NER sessions would produce – 3078 held out for evaluation plus 12385 used for training, according to the output. Maybe you accidentally added all of your annotations to the same dataset?
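
To check what actually ended up in the dataset, you could also inspect it from Python via Prodigy's database API. Here's a minimal sketch – it assumes the default database settings, and that NER examples can be recognised by a "spans" key while text classification examples carry a top-level "label" (which may not hold if your textcat recipe also stored pattern spans):

from collections import Counter
from prodigy.components.db import connect

# Connect to the Prodigy database (uses the settings from your prodigy.json)
db = connect()

# Load every annotation stored in the dataset
examples = db.get_dataset("event_labels")
print("Total examples:", len(examples))

# Rough breakdown by answer and by task type
# (assumption: "spans" means NER, otherwise textcat)
answers = Counter(eg.get("answer") for eg in examples)
task_types = Counter("ner" if eg.get("spans") else "textcat" for eg in examples)
print("Answers:", answers)
print("Task types (rough guess):", task_types)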

To get to the bottom of this, you could run the db-out command to export the dataset to a file and inspect it – maybe this will give you a better idea of what's going on here? For example:

prodigy db-out event_labels > event_labels.jsonl

This will give you a JSONL file with all the records. You can edit it manually, or use a script to filter it. When you’re done, you can use the db-in command to add the updated data to a new dataset.
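
If you want to script the filtering step, something like the sketch below might work. It's not a definitive solution – it assumes the NER annotations can be identified by a "spans" key, which may not be true if your text classification examples also store pattern spans, so do double-check the output:

import json

# Split the exported records into NER and textcat files
# (assumption: NER tasks have a "spans" key, textcat tasks don't)
with open("event_labels.jsonl", encoding="utf8") as f:
    records = [json.loads(line) for line in f if line.strip()]

ner_records = [eg for eg in records if eg.get("spans")]
textcat_records = [eg for eg in records if not eg.get("spans")]

with open("event_labels_ner.jsonl", "w", encoding="utf8") as f:
    f.writelines(json.dumps(eg) + "\n" for eg in ner_records)

with open("event_labels_textcat.jsonl", "w", encoding="utf8") as f:
    f.writelines(json.dumps(eg) + "\n" for eg in textcat_records)

print(len(ner_records), "NER examples,", len(textcat_records), "textcat examples")

You can then use db-in to load each file into its own dataset – the dataset names here are just placeholders, so pick whatever fits your project:

prodigy db-in event_labels_ner event_labels_ner.jsonl
prodigy db-in event_labels_textcat event_labels_textcat.jsonl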