Turkish language, for which spaCy doesn’t yet provide pre-trained models


(Omer Yavuz) #1

Do you have a model for the Turkish language?

I saw that

What is the best roadmap for creating a new model?
What is the minimum information I need to create a new model?

Thanks for your advice

(Ines Montani) #2

If you want to train a model from scratch and you also want to train a tagger and dependency parser, you probably want to start off with an existing treebank. More details here:

Using Prodigy, you can then create your own named entity annotations. This is, by the way, how the named entity recognizer of the new Greek model (currently available for spacy-nightly) was trained. It’s a slightly more involved process, because you want to make sure you have enough examples (at least a few thousand, fully annotated) and also a good evaluation set. Because you need gold-standard annotations (all labels annotated and no missing values), you probably want to use the ner.manual recipe and label the data by hand.
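
To make the "enough examples and a good evaluation" point a bit more concrete, here is a minimal sketch in plain Python. It assumes the annotations are dicts with a "text" string and a list of "spans" holding character offsets and a label, which is the general shape of manually annotated NER data. The helper names (`has_valid_spans`, `split_train_eval`), the label set and the example sentences are all made up for illustration. Note that the check only catches structural problems (bad offsets, unknown labels, overlapping spans); it cannot tell whether an entity was missed, so it is no substitute for careful manual labelling.

```python
import random


def has_valid_spans(example, labels):
    """Return True if all spans are in bounds, use a known label and do not overlap."""
    text = example["text"]
    spans = sorted(example.get("spans", []), key=lambda s: (s["start"], s["end"]))
    prev_end = 0
    for span in spans:
        if span["label"] not in labels:
            return False
        if not (0 <= span["start"] < span["end"] <= len(text)):
            return False
        if span["start"] < prev_end:  # overlaps the previous span
            return False
        prev_end = span["end"]
    return True


def split_train_eval(examples, eval_fraction=0.2, seed=0):
    """Shuffle deterministically and hold out a fraction for evaluation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


# Hypothetical label set and toy examples (assumptions for this sketch)
LABELS = {"PERSON", "LOC"}
examples = [
    {"text": "Ankara is the capital of Turkey.",
     "spans": [{"start": 0, "end": 6, "label": "LOC"},
               {"start": 25, "end": 31, "label": "LOC"}]},
    {"text": "Orhan Pamuk was born in Istanbul.",
     "spans": [{"start": 0, "end": 11, "label": "PERSON"},
               {"start": 24, "end": 32, "label": "LOC"}]},
    {"text": "No entities here.", "spans": []},
    # Broken on purpose: the end offset runs past the end of the text
    {"text": "Izmir", "spans": [{"start": 0, "end": 9, "label": "LOC"}]},
]

clean = [ex for ex in examples if has_valid_spans(ex, LABELS)]
train, dev = split_train_eval(clean)
print(len(clean), "usable examples:", len(train), "train,", len(dev), "evaluation")
```

Keeping the held-out evaluation examples separate from the start means you always measure accuracy on text the model has not seen during training. With only a few thousand examples, a fixed seed also keeps the split stable between experiments.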