spans.correct recipe

lnatprodigy · August 14, 2021, 2:26pm

Hi everyone,

I was wondering whether there are plans for a spans.correct recipe. It would be extremely useful for my use case.
I already have an NER model and I'd like to move to a SpanCat model, because my entities consist of multiple spans that I'd also like to recognize. I could easily train an initial SpanCat model on my NER data and use it for spans.correct.

Is something like this planned for the near future?

A second thing that would be very useful is being able to use patterns in non-manual recipes.

ines · August 15, 2021, 1:46am

Hi! I moved your comment to a separate thread, because that's a better fit. And yes, this is a nice idea and I already had this in mind. So we'll definitely be adding this in the future!

As for using patterns in non-manual recipes: this is a bit more difficult and introduces the problem of how matches vs. predictions should be handled, and which to prefer in case there are overlaps. For NER use cases, you could default to showing either the prediction or the pattern match if they disagree. It's often useful to see both, but you still want to make sure that your final data ends up with only one version. And while the span categorizer can predict overlapping spans, you'd often still want to pick the one span that's most consistent. For instance, the model may predict "the" + noun phrase, while your pattern describes only the noun phrase. In that case, you want to make sure that your final data only ends up with one of them, not both.

The "comparing annotations" workflow described in this thread goes in a similar direction, and it's definitely something you could implement in a custom recipe: Recipe for comparing NER model and manual annotation - #3 by haishao
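To make the disagreement case concrete, here is a minimal sketch in plain Python of how a custom recipe might detect where predictions and pattern matches overlap without being identical. It assumes spans are dicts with character offsets under `start` and `end` plus a `label`; the helper names `overlaps` and `find_conflicts` are made up for illustration and are not part of Prodigy.

```python
def overlaps(a, b):
    """Return True if two spans share at least one character."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def find_conflicts(predictions, matches):
    """Return (prediction, match) pairs that overlap but are not identical."""
    conflicts = []
    for pred in predictions:
        for match in matches:
            identical = (pred["start"], pred["end"], pred["label"]) == (
                match["start"],
                match["end"],
                match["label"],
            )
            if overlaps(pred, match) and not identical:
                conflicts.append((pred, match))
    return conflicts


text = "I saw the red car yesterday"
# Hypothetical model prediction: "the" + noun phrase
predictions = [{"start": 6, "end": 17, "label": "NP"}]
# Hypothetical pattern match: only the noun phrase
matches = [{"start": 10, "end": 17, "label": "NP"}]

for pred, match in find_conflicts(predictions, matches):
    print(text[pred["start"]:pred["end"]], "vs", text[match["start"]:match["end"]])
```

The conflicting pairs could then be shown to the annotator side by side, so a single version can be picked before the example is saved.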

lnatprodigy · August 15, 2021, 7:26am

That's awesome!

Thank you, I'll look into that.

ines · August 17, 2021, 1:14am

Just released v1.11.1, which includes a spans.correct workflow: https://prodi.gy/docs/recipes#spans-correct

lnatprodigy · August 17, 2021, 10:45am

I thought some more about this problem, and I think the easiest and probably most pragmatic approach would be to let the user decide whether patterns or predictions get priority, and just discard the other.
For example, in my use case, since I'm coming from an existing NER model, it would be awesome to use the model to predict what the NER model had covered (first use the NER data to train a SpanCat model, which I already did) and then use patterns to help me annotate the additional, overlapping spans (none of which were covered by the NER model, which means there should be no conflicts).
I'm sure I'm a bit biased here and this thinking may be very specific to my use case, but I thought I'd share my thoughts.
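Roughly what I have in mind, as a plain-Python sketch and not anything that exists in Prodigy: `merge_spans` and its `prefer` argument are hypothetical names. I'm assuming spans are dicts with `start`, `end` and `label`, and that a "conflict" means two overlapping spans with the same label, so overlapping spans with different labels are all kept.

```python
def merge_spans(predictions, matches, prefer="predictions"):
    """Combine model predictions and pattern matches.

    Spans from the preferred source are always kept. A span from the
    other source is discarded if it overlaps a preferred span that has
    the same label (assumed definition of a conflict).
    """
    if prefer not in ("predictions", "patterns"):
        raise ValueError("prefer must be 'predictions' or 'patterns'")
    if prefer == "predictions":
        primary, secondary = predictions, matches
    else:
        primary, secondary = matches, predictions

    merged = list(primary)
    for span in secondary:
        conflict = any(
            span["label"] == kept["label"]
            and span["start"] < kept["end"]
            and kept["start"] < span["end"]
            for kept in primary
        )
        if not conflict:
            merged.append(span)
    return sorted(merged, key=lambda s: (s["start"], s["end"]))


# Hypothetical spans over "I saw the red car yesterday"
predictions = [{"start": 6, "end": 17, "label": "NP"}]
matches = [
    {"start": 10, "end": 17, "label": "NP"},     # conflicts with the prediction
    {"start": 10, "end": 13, "label": "COLOR"},  # new overlapping span, no conflict
]

print(merge_spans(predictions, matches, prefer="predictions"))
print(merge_spans(predictions, matches, prefer="patterns"))
```

With `prefer="predictions"` the model's NP span wins and the pattern's COLOR span is added on top. With `prefer="patterns"` the model's NP span is dropped in favour of the pattern's.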
