I am training a model with the NER `batch_train` recipe code, which I have customized so that I can bring in my own dataset.
I have defined my own evaluation set of 43 examples (i.e. not a random split), and I know that each example contains exactly two spans/entities: one of Category 1 and one of Category 2.
However, when I train my model, I see the following:
It seems to be evaluating against 70 entities, whereas I expected 86 (43 × 2). The model trains without errors, but reaches only 60% accuracy, which is lower than I would expect. The gap between 70 and 86 makes me think I am doing something wrong when I construct my dataset.
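A first sanity check, before suspecting the recipe itself, is to count the gold spans in the evaluation data directly. This is a minimal sketch that assumes each example is a Prodigy-style dict with a `"text"` string and a `"spans"` list of `{"start", "end", "label"}` dicts. The function name `audit_eval_set` and the label names `CATEGORY_1`/`CATEGORY_2` are hypothetical placeholders for whatever the real dataset uses.

```python
from collections import Counter

# Hypothetical label names: replace with the two labels used in the real dataset.
EXPECTED_LABELS = {"CATEGORY_1", "CATEGORY_2"}


def audit_eval_set(examples):
    """Count gold spans per label and flag examples that do not have
    exactly one span of each expected label (hypothetical helper)."""
    label_counts = Counter()
    problems = []
    for i, eg in enumerate(examples):
        labels = [span["label"] for span in eg.get("spans", [])]
        label_counts.update(labels)
        # Each example should contain exactly one span per expected label.
        if Counter(labels) != Counter(EXPECTED_LABELS):
            problems.append((i, labels))
    total = sum(label_counts.values())
    return total, label_counts, problems


# Tiny made-up example: the second record is missing its CATEGORY_2 span.
examples = [
    {"text": "Alice visited Paris",
     "spans": [{"start": 0, "end": 5, "label": "CATEGORY_1"},
               {"start": 14, "end": 19, "label": "CATEGORY_2"}]},
    {"text": "Bob stayed home",
     "spans": [{"start": 0, "end": 3, "label": "CATEGORY_1"}]},
]
total, counts, problems = audit_eval_set(examples)
print(total)     # 3
print(problems)  # [(1, ['CATEGORY_1'])]
```

If this reports 86 spans and an empty problem list for the real evaluation set, the data file itself is probably fine and the missing entities are more likely being dropped later, for example when the character offsets are converted to token-based annotations.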
How do I debug this? Ideally I would like to see which 70 entities the model is evaluating against (and therefore which 16 it is missing), but I am also open to other suggestions or advice.
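One common way for entities to go missing is character offsets that do not line up with token boundaries: in spaCy, `Doc.char_span` returns `None` for such offsets, so those spans can drop out of the token-level annotation. The sketch below is a hypothetical, dependency-free helper that flags offsets that are out of range, include surrounding whitespace, disagree with an optional `"text"` field on the span, overlap another span, or start/end in the middle of a word. The mid-word test is only a heuristic stand-in for the real tokenizer; running each span through `doc.char_span` with the same spaCy model used for training is the more reliable check.

```python
def find_suspicious_spans(examples):
    """Return (example_index, span, reason) tuples for spans whose offsets
    look likely to be misaligned or dropped (heuristic, hypothetical helper)."""
    issues = []
    for i, eg in enumerate(examples):
        text = eg["text"]
        spans = sorted(eg.get("spans", []), key=lambda s: (s["start"], s["end"]))
        prev_end = -1
        for span in spans:
            start, end = span["start"], span["end"]
            if not (0 <= start < end <= len(text)):
                issues.append((i, span, "offsets out of range or empty"))
                continue
            surface = text[start:end]
            if surface != surface.strip():
                issues.append((i, span, "leading/trailing whitespace"))
            # Some span dicts also store the covered text; check it agrees.
            if "text" in span and span["text"] != surface:
                issues.append((i, span, "span text does not match text[start:end]"))
            # Heuristic: a boundary between two alphanumeric characters is
            # probably inside a token, so the span would not align.
            if start > 0 and text[start - 1].isalnum() and text[start].isalnum():
                issues.append((i, span, "starts mid-word"))
            if end < len(text) and text[end - 1].isalnum() and text[end].isalnum():
                issues.append((i, span, "ends mid-word"))
            if start < prev_end:
                issues.append((i, span, "overlaps previous span"))
            prev_end = max(prev_end, end)
    return issues


# Made-up records: the second one has a truncated span and a span with a leading space.
examples = [
    {"text": "Alice visited Paris",
     "spans": [{"start": 0, "end": 5, "label": "A"},
               {"start": 14, "end": 19, "label": "B"}]},
    {"text": "Bob stayed home",
     "spans": [{"start": 0, "end": 2, "label": "A"},
               {"start": 3, "end": 10, "label": "B"}]},
]
for i, span, reason in find_suspicious_spans(examples):
    print(i, span, reason)
```

Listing the flagged spans per example this way would also give a rough answer to which of the 86 expected entities fail to survive conversion.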