I see this pattern
- [ner] Training: 1738 | Evaluation: 326 (20% split)
Training: 352 | Evaluation: 88
Why is there a discrepancy? Which one should I trust?
When I run prodigy stats, the dataset has 1738 annotations and they are all accepted.
Hi @nvasil!
Have you seen this related post?
I suspect you either have duplicates or you have merged entity spans of annotations on the same data. In the second case, if you’ve accepted/rejected several entities on the same text, those will be combined into one example.
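To illustrate why the count can shrink, here is a minimal sketch of the idea, assuming annotations are grouped by their `_input_hash` and their spans are combined. The `merge_by_input` helper and the sample data are hypothetical and only for illustration; this is not Prodigy's actual implementation.

```python
def merge_by_input(examples):
    """Hypothetical helper: group annotations that share an _input_hash
    and combine their spans into a single example per input text."""
    merged = {}
    for eg in examples:
        key = eg["_input_hash"]
        if key not in merged:
            merged[key] = {"text": eg["text"], "_input_hash": key, "spans": []}
        # Track spans already collected so exact duplicates are not added twice.
        seen = {(s["start"], s["end"], s["label"]) for s in merged[key]["spans"]}
        for span in eg.get("spans", []):
            sig = (span["start"], span["end"], span["label"])
            if sig not in seen:
                merged[key]["spans"].append(span)
                seen.add(sig)
    return list(merged.values())


# Made-up sample data: two annotations on the same text, one on another text.
annotations = [
    {"text": "Apple hired Tim Cook", "_input_hash": 1, "answer": "accept",
     "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
    {"text": "Apple hired Tim Cook", "_input_hash": 1, "answer": "accept",
     "spans": [{"start": 12, "end": 20, "label": "PERSON"}]},
    {"text": "Berlin is a city", "_input_hash": 2, "answer": "accept",
     "spans": [{"start": 0, "end": 6, "label": "GPE"}]},
]

merged = merge_by_input(annotations)
print(len(annotations), "annotations ->", len(merged), "examples")
```

If most of your texts were annotated several times (for example, once per entity), the number of examples after merging can be much smaller than the number of annotations in the dataset.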
Be sure to turn on logging by setting PRODIGY_LOGGING=basic. That should show the dedup step explicitly.
The final count is the one you should go with (as long as you're comfortable with Prodigy's default behavior of deduping examples, merging entities, etc.).
Let me know if this helps!