Hi @ines,
I have gathered a 15k dataset in German. I ran the prodigy ner commands with different models. I couldn't find a vectors-only model for German, so I used the large German one. I was expecting higher accuracy with de_core_news_lg than with en_vectors because of the language match. May I know why there is a 4% difference?
I think the problem might be that you're "resuming" the weights from de_core_news_lg, instead of starting from fresh weights and just using the vectors. Can you paste the commands you used for each model?
Yes, I do think the issue is the weight resuming. You can create a model that keeps the vectors but drops the trained pipeline components like this:
```python
import spacy

# Load the full pretrained German pipeline, including its word vectors
nlp = spacy.load("de_core_news_lg")

# Temporarily remove all pipeline components, then save to disk:
# the saved model keeps the vocab and vectors, but none of the
# trained component weights
with nlp.disable_pipes(*nlp.pipe_names):
    nlp.to_disk("./de_vectors_news_lg")
```
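You can then pass that directory to your training command as the base model. A sketch, assuming a Prodigy v1.x recipe and that your dataset is named `german_ner` (the dataset name and output path are placeholders):

```shell
# Train NER starting from the vectors-only model, instead of
# resuming the weights of the full pretrained pipeline
prodigy ner.batch-train german_ner ./de_vectors_news_lg --output ./trained_model
```

This way the model starts from fresh component weights but still benefits from the German vectors.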
Alternatively, let's say you want to make a new model with your own vectors. You can do that with the spacy init-model command, as described here: https://spacy.io/usage/vectors-similarity#converting . You can convert vectors from tools like FastText, so you could use the pretrained vectors from fasttext.cc
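For example, after downloading German vectors from fasttext.cc, something like this should work (the output directory is a placeholder, and the vectors file name should match whatever file you actually downloaded):

```shell
# Create a blank German model containing the FastText vectors
python -m spacy init-model de ./de_fasttext_vectors --vectors-loc cc.de.300.vec.gz
```

The resulting directory can then be used as a base model the same way as the vectors-only model above.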