Hi @ines,
I think I made a mistake while annotating and used the same dataset for both text categorization labels and NER annotations. Even if that isn't technically a problem, I suspect it skews training. Is that right?
For example, I did two sessions annotating a single NER label (roughly 200+ annotations). When I run the ner.batch-train command, I see output like this:
```
[Abhishek:~/Projects/Git-Repositories/spaCy] [NM-NLP] master(+25/-6) 4s ± prodigy ner.batch-train event_labels en_core_web_lg --output-model /tmp/model --eval-split 0.2
Loaded model en_core_web_lg
Using 20% of accept/reject examples (3078) for evaluation
Using 100% of remaining examples (12385) for training
Dropout: 0.2  Batch size: 16  Iterations: 10

BEFORE     0.000
Correct    0
Incorrect  24
Entities   4714
Unknown    4714
```
Why do I see such a huge number of examples when I didn't annotate anywhere near that many? Is it picking up all of the annotations in the dataset, including the text categorization ones? If so, is there any way to delete individual sessions so that I can purge the NER annotations from this dataset and redo them in a separate one?
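In case it helps explain what I'm after, this is roughly what I was considering doing with the database API to split the annotations out myself. I'm assuming connect(), get_dataset() and add_examples() from prodigy.components.db are the right calls, that the NER examples can be told apart by their "spans" key, and the event_labels_ner name is just a placeholder:

```python
# Rough sketch, not tested: split the mixed dataset with the Prodigy DB API.
# Assumption: NER answers carry a non-empty "spans" key, textcat answers don't,
# and "event_labels_ner" is a new dataset name I'd create for the NER half.
from prodigy.components.db import connect

db = connect()  # connect to the database configured in prodigy.json

examples = db.get_dataset("event_labels")  # all annotations in the mixed set

# Separate NER annotations (with entity spans) from the textcat ones
ner_examples = [eg for eg in examples if eg.get("spans")]
textcat_examples = [eg for eg in examples if not eg.get("spans")]

# Copy the NER annotations into their own dataset and train from that instead
db.add_dataset("event_labels_ner")
db.add_examples(ner_examples, datasets=["event_labels_ner"])

print(len(ner_examples), "NER examples,", len(textcat_examples), "textcat examples")
```

If that's roughly on the right track, I could point ner.batch-train at the new dataset, but a built-in way to remove specific sessions would obviously be cleaner.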
Thanks in advance.