Feature Request: choosing key for spancat labels in data-to-spacy

This feature request originated in the GitHub discussion "Using multiple SpanCat models in one pipeline" (explosion/spaCy, Discussion #12462).

When generating training data for SpanCat models with a custom key using prodigy data-to-spacy, you have to manually move the data from the default sc key to the model's custom key.
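For anyone landing here, the manual step could look roughly like the sketch below. It assumes the exported spans sit under doc.spans["sc"] in a .spacy file; the helper names (move_span_group, rekey_spacy_file), the target key and the file paths are made up for illustration and are not part of Prodigy or spaCy.

```python
def move_span_group(doc, source_key="sc", target_key="my_spans"):
    """Move all spans stored under source_key to target_key on one doc.

    Works on any object with a dict-like `.spans` attribute, such as a
    spaCy Doc. Spans already stored under target_key are kept.
    """
    if source_key == target_key or source_key not in doc.spans:
        return doc
    moved = list(doc.spans[source_key])
    existing = list(doc.spans[target_key]) if target_key in doc.spans else []
    # Assigning a plain list lets spaCy create a fresh group for target_key.
    doc.spans[target_key] = existing + moved
    del doc.spans[source_key]
    return doc


def rekey_spacy_file(input_path, output_path, target_key, lang="en"):
    """Rewrite a .spacy file so that the "sc" spans live under target_key.

    Assumption: a blank pipeline for `lang` is enough to restore the docs.
    """
    # Imported here so that move_span_group can be used without spaCy.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)
    docs = DocBin().from_disk(input_path).get_docs(nlp.vocab)
    rekeyed = [move_span_group(doc, "sc", target_key) for doc in docs]
    DocBin(docs=rekeyed, store_user_data=True).to_disk(output_path)
```

You would then call something like rekey_spacy_file("corpus/train.spacy", "corpus/train_custom.spacy", "my_spans") once per exported file (paths are hypothetical) and point the training config at the rewritten files.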

It would be easier if you could somehow configure the target key in the data-to-spacy command.

Okay, it would be awesome if the target key could be read from the configuration, but this is a bit far-fetched at the moment =)
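To illustrate what "read from the configuration" could mean, here is a rough sketch that looks up the spans_key of every spancat component in a training config. It deliberately uses Python's standard configparser instead of spaCy's own config loader, and the function name find_spancat_keys and the example config are invented for this post; this is not how data-to-spacy works today.

```python
import configparser
import json

# Hypothetical excerpt of a training config with two spancat components.
EXAMPLE_CONFIG = """
[components.spancat_a]
factory = "spancat"
spans_key = "entities_a"

[components.spancat_b]
factory = "spancat"

[components.ner]
factory = "ner"
"""


def find_spancat_keys(config_text):
    """Return {component_name: spans_key} for every spancat component."""
    # Interpolation is disabled so that ${...} variables are left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    keys = {}
    for section in parser.sections():
        parts = section.split(".")
        # Only look at top-level component blocks like [components.name].
        if len(parts) != 2 or parts[0] != "components":
            continue
        # Values in the config are JSON-encoded, e.g. "spancat" with quotes.
        if json.loads(parser[section].get("factory", "null")) != "spancat":
            continue
        # Fall back to "sc", the default key of spaCy's spancat component.
        keys[parts[1]] = json.loads(parser[section].get("spans_key", '"sc"'))
    return keys


if __name__ == "__main__":
    print(find_spancat_keys(EXAMPLE_CONFIG))
```

With more than one spancat component in the config, a recipe would still need some way to decide which key the annotations belong to, so this only covers the lookup part.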

On the other hand: how could you use such a model in recipes like spans.correct?

Hi Benjamin.

Interesting use-case you have there!

I'll need to pick this up as a discussion with other team members and get back to you on this. I want to be careful in introducing new arguments to data-to-spacy because it might get unwieldy to support many of these settings. There are many models that we support in that command, and we might need to allow for many extra settings if we go down this route.

That said, I am wondering if there are other things we might be able to do to make this easier, because your use-case certainly seems fair.

Will report back later this week!

Hi Vincent,

Yes, I understand. Another road to take would be to make it configurable whether SpanCat models overwrite or just extend the spans under doc.spans["sc"] (which was basically the original problem).

But this again might introduce other problems.

As my use case might be an edge case, a workaround as discussed in the linked GitHub discussion is okay for me.

Looking forward to your reply.

We just had a discussion on this topic and the consensus is that it makes sense to keep the current data-to-spacy recipe simple and to not add extra arguments. However, this is something we might want to revisit in Prodigy v2. There's certainly a window of opportunity to lean more on a config system for this sort of thing, which might allow users like yourself to really customise specific parts of the recipes.

Can't make any promises on what will eventually get implemented, but I can confirm we're eager to revisit this once it's time for v2!

Sounds like a promising long-term strategy!

And in the meantime there are the workarounds in the linked GitHub discussion above :+1:
