Fewer annotations used for training than exist in the dataset

Hello guys!

I have been using Prodigy and spaCy for a while now, and I must thank you for building such an awesome product! It has really made coding, debugging and annotating NLP tasks a lot easier and more organised than ever.

I ran into a small issue when training an NER model with the train recipe. Have a look at the screenshot below:

The training runs smoothly, no issues there. However, look at the stats for this dataset: it contains 1000 annotations, all with the answer "accept". Yet when I start training a model, only 837 examples are available for training. What happens to the remaining 163 samples? Do they never appear in training, and if they are dropped, why?

Thanks & Regards,

Hi! If you look at the examples in your dataset, are there any duplicate annotations, e.g. annotations on the same text? Or did your data end up with examples that share the same hashes? Examples with identical hashes are merged before training, so you end up with a lower number than what's in your actual dataset.
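To check whether duplicates explain the gap, you can count the hashes yourself. Here's a minimal sketch, assuming you've exported your dataset (e.g. with `prodigy db-out`) to a list of dicts carrying the `_input_hash` / `_task_hash` fields Prodigy adds — the three example records below are made up for illustration:

```python
from collections import Counter

# Made-up examples in the shape Prodigy stores them: _input_hash identifies
# the input text, _task_hash identifies the text plus its annotations.
examples = [
    {"text": "Apple buys startup", "_input_hash": 101, "_task_hash": 201, "answer": "accept"},
    {"text": "Apple buys startup", "_input_hash": 101, "_task_hash": 201, "answer": "accept"},  # exact duplicate
    {"text": "Rates rise again",   "_input_hash": 102, "_task_hash": 202, "answer": "accept"},
]

# Count how often each task hash occurs; any count > 1 is a duplicate
# that would be merged into a single training example.
task_counts = Counter(ex["_task_hash"] for ex in examples)
duplicates = {h: n for h, n in task_counts.items() if n > 1}

print(len(examples))     # total annotations in the dataset → 3
print(len(task_counts))  # unique tasks left after merging → 2
print(duplicates)        # hashes occurring more than once → {201: 2}
```

If the number of unique task hashes matches the training count you're seeing (837 here), merging of duplicates is the explanation.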