✨ Prodigy nightly: spaCy v3 support, UI for overlapping spans & more

Ah, it seems like this is another and probably more common edge case with Prodigy's logic to auto-generate a config from a base model (also just came up in this thread): it currently copies the entire config, including the initialization settings that only run before training. Every model's config records those settings so you know exactly how the artifact was created – but the settings may refer to external files or code that's not required at runtime and not necessarily included. So what you're getting here is the exact

This is a tricky problem and I need to think about how we can best solve it :thinking: On the one hand, using a base model should give you the exact same config settings so you can train with the same configuration as the original pipeline. On the other hand, we need to guard against missing resources and references, because otherwise using a base model will likely fail 80% of the time. But we also can't make any assumptions about the model initialization settings because those could be anything (especially since we also want to support third-party pipelines like scispaCy etc.) :thinking::thinking::thinking:

In the meantime, you can find the orth variants here: https://github.com/explosion/spacy-lookups-data/blob/master/spacy_lookups_data/data/de_orth_variants.json

You can probably also remove this part of the config, because it's mostly an extra for data augmentation and not necessarily required if you're updating the model with more data.
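If you go the "remove it" route, here's a rough sketch of what that edit amounts to, written against a plain nested dict so it runs without spaCy installed. The `drop_augmenter` helper and the example excerpt are hypothetical, and I'm assuming the orth variants are referenced from a `corpora.train.augmenter` block, so check the section names against the config you actually got:

```python
import copy


def drop_augmenter(config: dict) -> dict:
    """Return a copy of a nested config dict with the training augmenter disabled.

    Hypothetical helper for illustration; not part of Prodigy or spaCy.
    """
    cleaned = copy.deepcopy(config)
    train = cleaned.get("corpora", {}).get("train")
    if isinstance(train, dict) and "augmenter" in train:
        # None corresponds to `augmenter = null` in a .cfg file
        train["augmenter"] = None
    return cleaned


# Assumed excerpt of a config copied from a base model. The keys and the
# file path are illustrative and may differ in your pipeline.
example = {
    "corpora": {
        "train": {
            "@readers": "spacy.Corpus.v1",
            "path": "train.spacy",
            "augmenter": {
                "@augmenters": "spacy.orth_variants.v1",
                "level": 0.1,
                "lower": 0.5,
                "orth_variants": {
                    "@readers": "srsly.read_json.v1",
                    "path": "assets/orth_variants.json",
                },
            },
        }
    }
}

cleaned = drop_augmenter(example)
```

In the `.cfg` file itself, the equivalent would be setting `augmenter = null` in that block and deleting its sub-sections, so nothing points at the missing JSON file anymore.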