I'm not entirely sure whether this is a spaCy issue or a Prodigy issue.
I've been testing accuracy between NER and spancat with a few labels to see which gives me better results. For the first dataset I had two labels, which I annotated together using Prodigy (once for NER and once for spancat). The next label I annotated by itself in a new dataset. With NER I got roughly 77%, but with spancat I get 0%. I was super puzzled by this and thought maybe I didn't have enough examples; it's a very easy label with patterns, so I got up to 2,000 examples fairly quickly, and I'm still getting 0% on precision, recall, and F-score.
Just to be clear: I've created each dataset twice, once for NER and once for spancat.
I merged the two spancat datasets together (the first with two labels, the second with one label) and trained again; that run also scored 0% across the board, even though the two-label dataset had worked fine before. Any idea what might be going on here? I also exported the datasets to spaCy and got the same results.
Are you on the very latest version of spaCy?
There was a bug in the initial release of the spancat component that led to exactly this experience (I ran into this exact thing myself).
I am; I'm on spaCy 3.1.2. I believe I came across the posts about the issue you're referring to, where a temporary fix was to edit the spancat.py file. I opened the file, and it looks like the fix referenced in those threads has already been applied.
I believe I may have found the culprit, though. When I train with --base-model blank:en, I get all-zero scores like the screenshot above. However, if I change --base-model to, say, en_core_web_sm, I now get scores.
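For reference, the two runs differed only in the base model, roughly like this (the dataset and output directory names here are placeholders, not the ones I actually used):

```shell
# Run that produced all-zero spancat scores
# (hypothetical dataset/output names):
prodigy train ./output_blank --spancat my_spancat_dataset --base-model blank:en

# Identical data, different base model — scores are now reported:
prodigy train ./output_sm --spancat my_spancat_dataset --base-model en_core_web_sm
```

Everything else (dataset, labels, Prodigy version) was held constant between the two commands.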
I checked all three config files (two for the individual datasets, one for the combined set). The only differences are the max_size, which I assume is correct given the difference in n-grams (the combined set had the larger of the two), and the location of the labels (of course).
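For context, the max_size difference shows up in the suggester block of the generated config. Assuming the config uses spaCy's n-gram range suggester, that section looks roughly like this (the sizes shown are illustrative, not my actual values):

```ini
[components.spancat.suggester]
@misc = "spacy.ngram_range_suggester.v1"
min_size = 1
# max_size tracks the longest annotated span, so the
# combined dataset ends up with the larger of the two values
max_size = 5
```

Since the suggester only proposes candidate spans within these size bounds, a mismatch here would affect which spans the model can score at all.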