NER: English training dataset for German language

Hi @ines, I have English training data in Prodigy format for 10 entities, and I am getting 90% accuracy for my domain, thanks to both Prodigy and spaCy.
Now I need to do the same NER in German.

Will the existing English training data work for German NER with spaCy's multi-language model?
If not, what is the best approach in this situation?
Do I need to collect German training data separately?

Glad to hear your model is working well :smiley:

Typically, you'd create different training data for different languages, yes. You want the training data to be as close as possible to the data that the model will see at runtime. For a German NER model, that'd be German text with entities. There are also certain language differences that can have an impact: in English text, capitalisation can be a strong indicator for named entities and the model can take advantage of that. In German, that's not the case at all, because all nouns are capitalised. So that should probably be reflected in your training data - otherwise, your model may get very confused.

That said, if the entities you're looking for are similar and you already have annotated data, there's no need to do everything from scratch. For example, you could use your annotated English entities to create match patterns and then use those in ner.manual. This will pre-highlight those entities for you, so you have less work when creating your German data.
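
As a rough sketch of that idea, here is one way you could turn existing annotations into a patterns file. It assumes your English examples are dicts with "text", "spans" (each with "start", "end" and "label") and an "answer" key, and it emits exact-string patterns of the form {"label": ..., "pattern": ...}. The function names and the sample data are made up for illustration, so adjust them to your own export.

```python
import json


def spans_to_patterns(examples):
    """Collect de-duplicated exact-string patterns from annotated spans."""
    seen = set()
    patterns = []
    for eg in examples:
        # Only use examples that were accepted during annotation.
        if eg.get("answer", "accept") != "accept":
            continue
        for span in eg.get("spans", []):
            phrase = eg["text"][span["start"]:span["end"]].strip()
            key = (span["label"], phrase)
            if phrase and key not in seen:
                seen.add(key)
                patterns.append({"label": span["label"], "pattern": phrase})
    return patterns


def to_jsonl(patterns):
    """Serialise patterns as newline-delimited JSON, one pattern per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in patterns)


# Hypothetical sample annotations, standing in for your exported English data.
examples = [
    {
        "text": "Siemens opened an office in Berlin.",
        "spans": [
            {"start": 0, "end": 7, "label": "ORG"},
            {"start": 28, "end": 34, "label": "GPE"},
        ],
        "answer": "accept",
    },
    {
        "text": "Siemens hired 200 engineers.",
        "spans": [{"start": 0, "end": 7, "label": "ORG"}],
        "answer": "accept",
    },
    {
        "text": "Apple pie is tasty.",
        "spans": [{"start": 0, "end": 5, "label": "ORG"}],
        "answer": "reject",
    },
]

patterns = spans_to_patterns(examples)
print(to_jsonl(patterns))
```

You could save that output to a .jsonl file and pass it to ner.manual via the --patterns argument. Names of companies, products and so on will often carry over to German text unchanged, but other entity phrases will probably need translating or extending by hand before the patterns match anything useful.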

Thanks for your reply. I will try that approach.