Spancat gives scores of 0

Hello,
I am running training in Prodigy using spancat, and the scores keep coming out as 0 the whole time.

```
  E     #   LOSS TOK2VEC  LOSS SPANCAT  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE
---  ----  ------------  ------------  ----------  ----------  ----------  ------
  0     0       2298.99       6946.99        0.02        0.01       38.58    0.00
  0   200       7729.63      34740.68        0.00        0.00        0.00    0.00
  0   400          0.00       1008.98        0.00        0.00        0.00    0.00
  0   600          0.00       1021.90        0.00        0.00        0.00    0.00
  1   800          0.00        983.89        0.00        0.00        0.00    0.00
  1  1000          0.12        990.56        0.00        0.00        0.00    0.00
```

I'm also running into a similar issue. At times it seems to work, but it's very hard to replicate, and it's very frustrating to figure out what exactly is going on. I've gone through the support pages, but I'm not sure whether the team has determined what could be causing this problem yet. Any additional help would be greatly appreciated!

hi @Mohammad and @padejumo,

Thanks for your questions and sorry you're having issues.

Could it be a memory issue?

For example, perhaps try modifying your n-gram suggester, or try running on a smaller amount of training data (e.g., just the first 100 records).
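If you're using the built-in n-gram suggester, the relevant block in your training config looks roughly like this. It's a minimal sketch that assumes your component is named spancat and uses spacy.ngram_suggester.v1, so adjust the names and sizes to match your own config:

```ini
[components.spancat.suggester]
@misc = "spacy.ngram_suggester.v1"
# fewer and smaller n-gram sizes mean fewer candidate spans per doc, which lowers memory use
sizes = [1, 2, 3]
```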

Relatedly, you may want to use spacy debug data to get basic stats on your spans and see if there are any issues.
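For example, if you've exported a spaCy corpus and config (the paths below are just placeholders), something like:

```
python -m spacy debug data ./config.cfg --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy
```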

This related post shows some of that information. In this case, it seems like the sentencizer was the problem and needed to be added:

Also, this user found there was an issue with duplicate spans:

@padejumo I'm sorry to hear about your frustration. Be sure to check out the spaCy GitHub Discussions for spaCy-related problems like training. While prodigy train looks like a Prodigy command, it's just a wrapper around spacy train, so you may be dealing with a spaCy issue rather than a Prodigy one. The spaCy core team answers questions in that forum, so it'll be easier to search and post there for these problems. I know it may be frustrating that we have two forums, but spaCy and Prodigy, while they sometimes overlap, typically have different support needs. We're looking at ways to improve searching across both and improve the overall user experience, so thanks for your understanding in the short term.

Since you mentioned replication/reproducibility, what helps us tremendously is a fully reproducible example. For instance, if you can provide your spaCy and Prodigy versions (spacy info and prodigy stats) as well as a small reproducible example (say, a .jsonl with a few sample records), that goes a long way. I know sometimes you can't share examples due to data privacy, but the more you provide, the quicker we can help and debug.
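For reference, a single span annotation in Prodigy's .jsonl format looks roughly like the record below; the text, labels, and offsets are made up purely to illustrate the shape of the data:

```json
{"text": "Acme Corp hired Jane Doe in 2021.", "spans": [{"start": 0, "end": 9, "label": "ORG"}, {"start": 16, "end": 24, "label": "PERSON"}], "answer": "accept"}
```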

I'm realizing now that it may be a memory issue; possibly it's running into a silent OOM error. I'm fine-tuning on a Hugging Face transformer. The proposed solutions involve reducing the n-grams, the nlp batch size, or the training.batcher size.
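For anyone hitting the same thing, here's a rough sketch of where those settings live in a spaCy config; it assumes a transformer pipeline with the padded batcher, and the values are only examples of what you might lower:

```ini
[nlp]
# batch size used when running the pipeline over texts
batch_size = 64

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
# lowering size reduces peak memory during training
size = 1000
buffer = 256
discard_oversize = true
```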

The spans I'm labeling are on the larger side, so I need n-grams between 5 and 20. Are there any other suggestions to resolve this issue? It doesn't seem like NER runs into this same problem; at what point would you suggest switching over to NER instead of spancat?

hi @padejumo,

I'm glad you found the issue.

That's really hard to say because it likely depends on the context and your own preferences. One important point: if you annotated with a spans recipe, make sure you didn't create any overlapping spans, as overlaps are permissible with spans recipes but not with ner. If you try to train ner on data with overlapping spans, you'll get an error. A quick way to check is sketched below.
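As a quick, hypothetical check (it assumes your annotations are exported to a .jsonl file with Prodigy-style character-offset spans; adjust the file name and field names to yours):

```python
import json

# Hypothetical overlap check over a Prodigy-style .jsonl export.
# Assumes each record has a "spans" list with character "start"/"end" offsets.
with open("annotations.jsonl", encoding="utf8") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        spans = sorted(record.get("spans", []), key=lambda s: (s["start"], s["end"]))
        for prev, curr in zip(spans, spans[1:]):
            if curr["start"] < prev["end"]:  # next span starts before the previous one ends
                print(f"Record {i}: overlapping spans {prev} and {curr}")
```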

Not sure if you've seen it, but there's also a spaCy template project that compares spancat vs. ner:

You could also try modifying your suggester function. Also, are you using data-to-spacy so you can run spacy train instead of prodigy train? I doubt that'll do much for memory, but it does give you the ability to inspect span characteristics via spacy debug data (see the ner_spancat_compare project for more details).
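A rough sketch of that workflow (the dataset and folder names are placeholders, and flags can vary by Prodigy version, so check prodigy data-to-spacy --help):

```
# export annotations to a spaCy corpus plus a config (placeholder names)
prodigy data-to-spacy ./corpus --spancat my_spans_dataset

# the generated config then works with spacy debug data and spacy train directly
python -m spacy train ./corpus/config.cfg --paths.train ./corpus/train.spacy --paths.dev ./corpus/dev.spacy --output ./output
```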

It may not be an option, but is there any way you could reframe your problem by splitting it up in some way and using textcat instead? This post mentions that and a few related ideas.

Hope this helps!

How did you see the silent OOM error?