I'm not entirely sure whether this is a spaCy issue or a Prodigy issue.
I've been testing accuracy between NER and spancat with a few labels to see which gives me better results. For the first dataset I had two labels, which I annotated together using Prodigy (once for NER and once for spancat). The next label I annotated by itself in a new dataset. With NER I got roughly 77%, but with spancat I get 0%. I was super puzzled by this and thought maybe I didn't have enough examples; it's a very easy label with patterns, so I got up to 2,000 examples fairly quickly, and I'm still getting 0% on precision, recall, and F-score.
Just to be clear: I've created each dataset twice, once for NER and once for spancat.
I merged the two spancat datasets together (the first with two labels, the second with one label) and trained again; that run also scored 0% across the board, even though the two-label dataset had worked fine before. Any idea what might be going on here? I also exported the datasets to spaCy and got the same results.
Are you on the very latest version of spaCy?
There was a bug in the initial release of the spancat component that led to exactly this experience (I ran into this exact thing myself).
I am; I'm on spaCy 3.1.2. I believe I came across the posts about the issue you're referring to, where a temporary fix was to edit the spancat.py file. I opened the file, and it looks like the fix referenced in those threads has already been applied.
I believe I may have found the culprit, though. When I train with --base-model blank:en, I get all-zero scores like the screenshot above. However, if I change --base-model to, say, en_core_web_sm, I now get scores.
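For reference, the two runs differed only in the base model, roughly like this (the dataset and output directory names here are placeholders, not the ones I actually used):

```shell
# Run that produced all-zero spancat scores
# (hypothetical dataset/output names):
prodigy train ./output_blank --spancat my_spancat_dataset --base-model blank:en

# Identical data, different base model — scores are now reported:
prodigy train ./output_sm --spancat my_spancat_dataset --base-model en_core_web_sm
```

Everything else (dataset, labels, Prodigy version) was held constant between the two commands.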
I checked all three config files (two for the individual datasets, one for the combined set). The only differences are the max_size, which I assume is correct given the difference in n-grams (the combined set had the larger of the two), and the location of the labels (of course).
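For context, the max_size difference shows up in the suggester block of the generated config. Assuming the config uses spaCy's n-gram range suggester, that section looks roughly like this (the sizes shown are illustrative, not my actual values):

```ini
[components.spancat.suggester]
@misc = "spacy.ngram_range_suggester.v1"
min_size = 1
# max_size tracks the longest annotated span, so the
# combined dataset ends up with the larger of the two values
max_size = 5
```

Since the suggester only proposes candidate spans within these size bounds, a mismatch here would affect which spans the model can score at all.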