German NER model

Hi @ines,
I have gathered a 15k-example dataset in German. I ran the prodigy train command with different base models. I couldn't find a vectors-only model for German, so I used the large German pipeline. I was expecting higher accuracy from de_core_news_lg than from en_vectors_web_lg because of the language match. Why is there a 4% difference?

en_vectors_web_lg - 92%
de_core_news_lg - 88%

Thanks,

Hi @mystuff,

I think the problem might be that you're "resuming" the weights from de_core_news_lg, instead of starting from new weights and just using the vectors. Can you paste the commands you used for each model?

Hi @honnibal,
Thanks for the reply. Here are the commands I used:

python -m prodigy train ner de_20000 de_core_news_lg --output de_20000_core_lg
python -m prodigy train ner de_20000 en_vectors_web_lg --output de_20000_en_vectors

Hi @mystuff,

Yes, I do think the issue is the resumed weights. You can create a model that keeps the vectors but drops the trained pipeline components like this:

import spacy

# Load the full German pipeline, then serialize it with all trained
# components (tagger, parser, NER) disabled, so only the tokenizer,
# vocab and word vectors are written to disk.
nlp = spacy.load("de_core_news_lg")
with nlp.disable_pipes(*nlp.pipe_names):
    nlp.to_disk("./de_vectors_news_lg")
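The stripped model can then be passed to Prodigy in place of the full pipeline. A sketch reusing the dataset name from your commands above (the output directory name is just illustrative):

```shell
# Train from the vectors-only model, so NER weights start fresh
python -m prodigy train ner de_20000 ./de_vectors_news_lg --output de_20000_de_vectors
```

This way the training starts from new weights initialized with the German vectors, rather than continuing from de_core_news_lg's existing NER weights.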

Alternatively, if you want to build a new model with your own vectors, you can do that with the spacy init-model command, as described here: https://spacy.io/usage/vectors-similarity#converting. You can convert vectors from tools like FastText, so you could use the pretrained vectors from fasttext.cc.
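For the init-model route, a minimal sketch, assuming the pretrained German fastText vectors from fasttext.cc (cc.de.300.vec.gz is their standard distribution name) and spaCy v2's init-model CLI:

```shell
# Download the pretrained German fastText vectors (large file, ~1.2 GB)
wget https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.de.300.vec.gz

# Create a fresh, blank German model containing only these vectors
python -m spacy init-model de ./de_fasttext_vectors --vectors-loc cc.de.300.vec.gz
```

The resulting ./de_fasttext_vectors directory can then be used as the base model for prodigy train, just like the stripped de_core_news_lg model.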