Span Categorizer - Labels Prediction

Hello,
I'm having an issue with the span categorizer. I'm annotating data with 10 labels and training a model on it. Whenever I train the model, 4 of these labels always have 0 accuracy. I thought I needed to add more data for these labels, but even after doing that nothing changed. I've attached a screenshot of the results.

[Attached screenshot: Capture]

Thanks a lot for your help.

Prodigy version: 1.11.5

Hi @hmousa961 ,

As a sanity check, do both your training and evaluation data include examples of these four labels? If you don't have many examples and use a random split, you could end up with very few examples of those labels to train on and to evaluate against. As an illustration, if the evaluation set contains only one example of a label and the model gets it wrong, you immediately get 0 accuracy for that label.
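To make the random-split point concrete, here is a minimal sketch. The dataset (500 examples of a common label, 5 of a rare one), the label names, and the 20% evaluation fraction are all made up for illustration. It just shows how the number of rare-label examples in the evaluation set varies from one random split to the next.

```python
import random
from collections import Counter


def split_label_counts(labels, eval_fraction=0.2, seed=0):
    """Randomly split a list of labels and count them on each side.

    Simplification: one label per example. Real span data can have
    several spans per example, but the effect is the same.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    eval_part, train_part = shuffled[:n_eval], shuffled[n_eval:]
    return Counter(train_part), Counter(eval_part)


# Hypothetical, imbalanced dataset (not taken from the thread).
labels = ["COMMON"] * 500 + ["RARE"] * 5

for seed in range(5):
    train, dev = split_label_counts(labels, seed=seed)
    print(f"seed={seed}  train RARE: {train['RARE']}  eval RARE: {dev['RARE']}")
```

With so few rare examples, some splits may put only one or even none of them in the evaluation set, which makes the score for that label very unstable.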

To help you debug, you can first check for imbalanced data. Try running:

python -m spacy debug data path/to/file

and check whether the label distributions in your training and evaluation data are roughly the same.
Cf. https://spacy.io/api/cli#debug-data
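If you'd rather count labels yourself, a rough sketch like the one below may help. It assumes your annotations are available as JSONL where each line is a task with a "spans" list and each span has a "label" key. The sample lines and label names are invented, and `count_span_labels` is just a helper written for this example, not part of Prodigy or spaCy.

```python
import json
from collections import Counter


def count_span_labels(lines):
    """Count span labels across JSONL lines of annotated tasks."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        task = json.loads(line)
        # Assumption: tasks without an "answer" key count as accepted.
        if task.get("answer", "accept") != "accept":
            continue  # skip rejected or ignored examples
        for span in task.get("spans", []):
            counts[span["label"]] += 1
    return counts


# Made-up sample standing in for an exported annotation file.
sample = [
    '{"text": "a b c", "spans": [{"start": 0, "end": 1, "label": "LABEL_A"}], "answer": "accept"}',
    '{"text": "d e f", "spans": [{"start": 0, "end": 1, "label": "LABEL_A"}, {"start": 2, "end": 3, "label": "LABEL_B"}], "answer": "accept"}',
    '{"text": "g h i", "spans": [{"start": 0, "end": 1, "label": "LABEL_B"}], "answer": "reject"}',
]

for label, n in count_span_labels(sample).most_common():
    print(label, n)
```

Running this separately on your training and evaluation files and comparing the counts per label should show quickly whether the four problem labels are underrepresented on either side.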

Thank you for your help. I'll check it and let you know if it works.

Yes, I do. It's not as much data as for the other labels, but I've been annotating data for these labels. Is there a data threshold for the model to start predicting?

How many instances of these labels do you have in your evaluation data in total? The thing is, if you only have 2 examples and the model gets both wrong, that's an accuracy of 0. If you have 10 examples and it gets 2 wrong, that's 80%. So depending on the number of examples you have, the evaluation can be quite brittle and less representative. Similarly, if you only have very few examples to learn from, the model is more likely to struggle learning a given label.
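The arithmetic above can be written out as a small sketch. The numbers are the ones from the examples in this reply, and `accuracy` and `possible_scores` are hypothetical helpers for illustration only.

```python
def accuracy(correct, total):
    """Fraction of evaluation examples the model got right."""
    return correct / total


def possible_scores(total):
    """All accuracy values that are reachable with `total` examples."""
    return [k / total for k in range(total + 1)]


print(accuracy(0, 2))      # 2 examples, both wrong -> 0.0
print(accuracy(8, 10))     # 10 examples, 2 wrong   -> 0.8
print(possible_scores(2))  # [0.0, 0.5, 1.0]
```

With only 2 evaluation examples the score can only be 0, 0.5 or 1, so a single mistake moves it by 50 points. With 10 examples each mistake costs 10 points, which is still coarse but less extreme.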

Okay, now I understand how it works. Thank you so much for the help.