Hello,
I'm having an issue with the spans categorizer. I'm annotating data with 10 labels and training a model on it. Whenever I train the model, 4 of these labels always have 0 accuracy. I thought I needed to add more data for these labels, but even after doing that, nothing changed. I've attached a screenshot of the results.
To sanity-check: do both your training and evaluation data include examples of these four labels? If you only have a handful of examples and use a random split, you could end up with very few examples per split for the model to learn and evaluate those labels on. As an illustration, if your evaluation set contains only one example of a label and the model gets it wrong, you immediately get 0 accuracy for that label.
To help you debug, a good first step is to check whether your data is imbalanced by counting how many annotated spans you have per label.
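A minimal sketch of that per-label count, assuming your annotations are in Prodigy-style JSONL (one record per line with a "spans" list; the inline sample records here are hypothetical, so point this at your own export instead):

```python
import json
from collections import Counter

# Hypothetical sample records; in practice, read these from your
# exported annotations file (e.g. one json.loads per line of a .jsonl).
lines = [
    '{"text": "Acme Corp hired Jane", "spans": [{"start": 0, "end": 9, "label": "ORG"}, {"start": 16, "end": 20, "label": "PERSON"}]}',
    '{"text": "Jane moved to Paris", "spans": [{"start": 0, "end": 4, "label": "PERSON"}, {"start": 14, "end": 19, "label": "LOC"}]}',
]

counts = Counter()
for line in lines:
    example = json.loads(line)
    # Count one occurrence per annotated span, grouped by label.
    for span in example.get("spans", []):
        counts[span["label"]] += 1

# Print labels from most to least frequent to spot imbalance at a glance.
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

If some labels show up orders of magnitude less often than others, that imbalance alone can explain why the model never predicts them.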
Yes, I do. It's not as much data as for the other labels, but I've been annotating more data for them. Is there a data threshold for the model to start predicting?
How many instances of these labels do you have in your evaluation data in total? The thing is, if you only have 2 examples and the model gets both wrong, that's an accuracy of 0. If you have 10 examples and it gets 2 wrong, that's 80%. So depending on the number of examples you have, the evaluation can be quite brittle and not very representative. Similarly, if the model only has very few examples to learn from, it's more likely to struggle to learn a given label.
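To make the brittleness above concrete, here's a tiny sketch of the same arithmetic: with a two-example evaluation set, a single mistake swings the per-label score by 50 points, while a larger set absorbs errors more gracefully.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of evaluation examples the model got right."""
    return correct / total

# 2 examples, both wrong: per-label accuracy collapses to 0.
print(accuracy(0, 2))   # 0.0

# 10 examples, 2 wrong: same number of mistakes looks far less dramatic.
print(accuracy(8, 10))  # 0.8
```

This is why a per-label score of exactly 0 on a rare label often says more about the size of the evaluation set than about the model.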