I'll need to discuss this with other team members and get back to you. I want to be careful about introducing new arguments to data-to-spacy because it might get unwieldy to support many of these settings. That command supports many models, and we might need to allow for many extra settings if we go down this route.
That said, I am wondering if there are other things we might be able to do to make this easier, because your use-case certainly seems fair.
Yes, I understand. Another road to take would be to make it configurable whether SpanCat models overwrite or just extend the spans under doc.spans["sc"] (which was basically the original problem).
But this again might introduce other problems.
As my use case might be an edge case, a workaround as discussed in the linked GitHub discussion is okay for me.
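For readers following along, the overwrite-versus-extend distinction mentioned above can be sketched in plain Python. This is only an illustration of the two behaviours, not spaCy or Prodigy code: the `set_spans` helper and its `overwrite` flag are hypothetical, a plain dict stands in for doc.spans, and `(start, end, label)` tuples stand in for span objects.

```python
# Hypothetical illustration only: a dict stands in for doc.spans and
# (start, end, label) tuples stand in for spaCy Span objects.
def set_spans(spans, key, predicted, overwrite=True):
    """Store predicted spans under `key`, replacing or extending what is there."""
    if overwrite:
        # Overwrite: whatever was stored under the key is discarded.
        spans[key] = list(predicted)
    else:
        # Extend: keep existing spans and append only the new ones.
        merged = list(spans.get(key, []))
        for span in predicted:
            if span not in merged:
                merged.append(span)
        spans[key] = merged
    return spans


existing = {"sc": [(0, 2, "PERSON")]}
predicted = [(0, 2, "PERSON"), (5, 7, "ORG")]

overwritten = set_spans(dict(existing), "sc", [(5, 7, "ORG")], overwrite=True)
extended = set_spans(dict(existing), "sc", predicted, overwrite=False)
```

With overwriting, the earlier PERSON span is lost; with extending, both spans end up under the same key and the duplicate is skipped.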
We just had a discussion on this topic, and the consensus is that it makes sense to keep the current data-to-spacy recipe simple and not add extra arguments. However, this is something we might want to revisit in Prodigy v2. There's a window of opportunity to lean more on a config system for this sort of thing, which might allow users like yourself to customise specific parts of the recipes.
I can't make any promises about what will eventually get implemented, but I can confirm we're eager to revisit this once it's time for v2!