Feature Request: choosing key for spancat labels in data-to-spacy

b2m · March 28, 2023, 2:48pm

This feature request originated in Using multiple SpanCat models in one pipeline · explosion/spaCy · Discussion #12462 · GitHub

When generating training data for SpanCat models with a custom key using prodigy data-to-spacy, you have to manually move the data from the default sc key to the model's custom key.

It would be easier if you could somehow configure the target key in the data-to-spacy command.

Okay, it would be awesome if the target key could be read from the configuration, but this is a bit far-fetched at the moment =)

On the other hand: how could you use such a model in recipes like spans.correct?

koaning · March 29, 2023, 12:05pm

Hi Benjamin.

Interesting use-case you have there!

I'll need to pick this up as a discussion with other team members and get back to you on this. I want to be careful in introducing new arguments to data-to-spacy because it might get unwieldy to support many of these settings. There are many models that we support in that command, and we might need to allow for many extra settings if we go down this route.

That said, I am wondering if there are other things we might be able to do to make this easier, because your use-case certainly seems fair.

Will report back later this week!

b2m · March 29, 2023, 12:49pm

Hi Vincent,

Yes, I understand. Another road to take would be to make it configurable whether SpanCat models overwrite or just extend the spans under doc.spans["sc"] (which basically was the original problem).

But this again might introduce other problems.

As my use case might be an edge case, a workaround as discussed in the linked GitHub discussion is okay for me.

Looking forward to your reply.

koaning · March 31, 2023, 2:00pm

We just had a discussion on this topic and the consensus is that it makes sense to keep the current data-to-spacy recipe simple and to not add extra arguments. However, this is certainly something we might want to revisit in Prodigy v2. There's certainly a window of opportunity to leverage a config system more for this sort of thing that might allow users like yourself to really customise specific parts of the recipes.

Can't make any promises on what will eventually get implemented, but I can confirm we're eager to revisit this once it's time for v2!

b2m · April 3, 2023, 5:50am

Sounds like a promising long-term strategy!

And in the meantime there are the workarounds in the linked GitHub discussion above.
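For readers landing here, the manual step described at the top of the thread (moving spans from the default sc key to a custom key) could look roughly like the sketch below. It is not taken from the linked GitHub discussion and is not part of Prodigy. The function names, the target key "my_key", and the file paths are placeholders chosen for illustration. It assumes the .spacy files written by data-to-spacy are ordinary spaCy DocBin files.

```python
from pathlib import Path


def move_span_group(doc, source_key="sc", target_key="my_key"):
    """Move the spans stored under source_key to target_key on a single Doc.

    Hypothetical helper: any existing spans under target_key are overwritten.
    """
    if source_key == target_key or source_key not in doc.spans:
        return doc
    # Copy into a plain list so the group is rebuilt under the new key.
    doc.spans[target_key] = list(doc.spans[source_key])
    del doc.spans[source_key]
    return doc


def rekey_docbin(in_path, out_path, target_key, source_key="sc", lang="en"):
    """Rewrite a .spacy file so its span annotations live under target_key."""
    # spaCy is imported here so move_span_group stays usable on its own.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)  # only the vocab is needed to restore the docs
    doc_bin = DocBin().from_disk(Path(in_path))
    docs = [
        move_span_group(doc, source_key, target_key)
        for doc in doc_bin.get_docs(nlp.vocab)
    ]
    DocBin(docs=docs, store_user_data=True).to_disk(Path(out_path))
```

A call such as rekey_docbin("corpus/train.spacy", "corpus/train_custom.spacy", target_key="my_key") would then be run once per exported file (paths are placeholders), with the spancat component's spans_key in the training config set to the same key.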
