I think I made a mistake while annotating: I used the same dataset for both text classification labels and NER annotations. This may not technically be a problem, but am I right that it skews training?
For example, I did two sessions annotating a single NER label (maybe 200+ annotations in total). But when I run the
`ner.batch-train` command, I see output like the following:
```
[Abhishek:~/Projects/Git-Repositories/spaCy] [NM-NLP] master(+25/-6) 4s
± prodigy ner.batch-train event_labels en_core_web_lg --output-model /tmp/model --eval-split 0.2

Loaded model en_core_web_lg
Using 20% of accept/reject examples (3078) for evaluation
Using 100% of remaining examples (12385) for training
Dropout: 0.2  Batch size: 16  Iterations: 10

BEFORE     0.000
Correct    0
Incorrect  24
Entities   4714
Unknown    4714
```
Why am I seeing such large numbers when I didn't annotate anywhere near that many examples? Is it picking up all the annotations in the dataset, including the text classification ones? If so, is there a way to delete sessions so that I can purge the NER annotations from this dataset and redo them in a separate one?
Thanks in advance.