Turkish: a language that spaCy doesn’t yet provide pre-trained models for

omer · February 4, 2019, 9:29pm

Do you have a model for the Turkish language?

I saw that

What is the best roadmap for creating a new model?
What is the minimum information I need to create a new model?

Thanks for your advice

ines · February 6, 2019, 1:37pm

If you want to train a model from scratch and you also want to train a tagger and dependency parser, you probably want to start off with an existing treebank. More details here:

Using Prodigy, you can then create your own named entity annotations. This is, by the way, how the named entity recognizer of the new Greek model (currently available for spacy-nightly) was trained. It’s a slightly more involved process, because you want to make sure you have enough examples (at least a few thousand, fully annotated) and also a good evaluation. Because you need gold-standard annotations (all labels annotated and no missing values), you probably want to use the ner.manual recipe and label the data by hand.
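
As a rough illustration of what checking hand-labelled data might look like, here is a minimal sketch in plain Python. It assumes each example is a dict with a "text" and a list of "spans" given as character offsets with "start", "end" and "label" keys. The function name, the label set and the sample sentence are made up for this example. Note that it can only catch inconsistent spans; it can't tell you whether an entity was missed, which is the part that really needs careful manual annotation.

```python
# Hypothetical label scheme, purely for illustration.
LABELS = {"PERSON", "LOC"}


def find_span_problems(example, allowed_labels):
    """Return a list of human-readable problems found in one annotated example."""
    text = example["text"]
    problems = []
    previous_end = 0
    # Walk the spans in text order so overlaps are easy to detect.
    for span in sorted(example.get("spans", []), key=lambda s: s["start"]):
        start, end, label = span["start"], span["end"], span["label"]
        if not (0 <= start < end <= len(text)):
            problems.append(f"span {start}-{end} is empty or outside the text")
            continue
        if label not in allowed_labels:
            problems.append(f"unknown label {label!r}")
        if text[start:end] != text[start:end].strip():
            problems.append(f"span {start}-{end} has leading or trailing whitespace")
        if start < previous_end:
            problems.append(f"span {start}-{end} overlaps the previous span")
        previous_end = max(previous_end, end)
    return problems


example = {
    "text": "Ahmet Ankara'da yaşıyor.",
    "spans": [
        {"start": 0, "end": 5, "label": "PERSON"},  # "Ahmet"
        {"start": 6, "end": 12, "label": "LOC"},    # "Ankara"
    ],
}
print(find_span_problems(example, LABELS))  # prints []
```

Running a check like this over the whole dataset before training can surface slips such as a stray space inside a span or a label that isn't part of your scheme.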

leotsiakolou · January 23, 2020, 10:53am

To train a named entity recognizer in this particular context (a language like Turkish or Greek, without a pre-trained model), do we need to have a dependency parser? Or is it enough to annotate only named entities? What is the recommended number of sentences for, say, 5 different entity types?

ines · January 23, 2020, 12:50pm

spaCy's components can be trained independently, so you don't need a parser to train an entity recognizer. (The xx_ent_wiki_sm model is an example of a model that was only trained on NER annotations and that only has an NER component.)

How many sentences you need really depends on your data, entity frequencies etc. We typically recommend annotating at least a few hundred to a few thousand examples for meaningful results that you can draw conclusions from. If you're using Prodigy, you can periodically run the train-curve recipe to check if your model is improving with more data and to see if you're on the right track.
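
To make the idea behind a training curve concrete, here is a small sketch in plain Python. It is not how Prodigy's train-curve recipe is implemented, just the general pattern: train on growing portions of the data and score each model on the same held-out examples. The train_fn and eval_fn arguments are placeholders you would supply yourself, and the fractions are an assumption chosen for illustration.

```python
import random


def train_curve(train_examples, dev_examples, train_fn, eval_fn,
                fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Train on growing subsets and return (fraction, n_examples, score) tuples.

    train_fn(examples) -> model and eval_fn(model, dev_examples) -> score are
    hypothetical callables provided by the caller.
    """
    shuffled = list(train_examples)
    # Shuffle once with a fixed seed so every subset is a prefix of the same order.
    random.Random(seed).shuffle(shuffled)
    results = []
    for fraction in fractions:
        n = max(1, int(len(shuffled) * fraction))
        model = train_fn(shuffled[:n])
        results.append((fraction, n, eval_fn(model, dev_examples)))
    return results


def report(results):
    """Print one line per subset, with the score change versus the previous subset."""
    previous = None
    for fraction, n, score in results:
        delta = "" if previous is None else f" ({score - previous:+.3f})"
        print(f"{fraction:>5.0%}  {n:>6} examples  score {score:.3f}{delta}")
        previous = score
```

If the score is still climbing between the last two steps, more annotation is likely to help. If it has flattened out, it may be worth looking at the label scheme or the data quality instead.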
