Unsupervised Training of spaCy language model using a specialized Corpus

a.konstantinidis · June 26, 2020, 2:28pm

My task involves Greek text used by a branch of the Greek State Administration. As a starting point, I will use the large Greek spaCy model.

However, since the text is specialized, I would like to fine-tune the Greek spaCy model on this particular corpus, using unsupervised learning. My hope is that, after fine-tuning, the existing Greek spaCy model will perform better on NER tasks for documents drawn from that corpus.

Could you please advise how such fine-tuning can be accomplished? Please be generous in suggesting possible approaches, tutorials and pertinent links, if you know of any.

honnibal · June 29, 2020, 10:08am

Hi @a.konstantinidis,

I think what you want is the spacy pretrain command, which you can find documented here: https://spacy.io/api/cli#pretrain
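As a rough illustration of the data preparation step: in spaCy v2.x, the pretrain command reads raw text from a JSONL file in which each line is a JSON object with a "text" key. The sketch below writes such a file from a list of strings. The function name `write_pretrain_jsonl`, the file name `corpus.jsonl` and the whitespace normalization are assumptions made for this example, not part of spaCy.

```python
import json
from pathlib import Path


def write_pretrain_jsonl(texts, path):
    """Write one JSON object per line, each with a "text" key.

    Whitespace runs (including newlines) are collapsed to single spaces and
    empty documents are skipped. Returns the number of lines written.
    Hypothetical helper for illustration only.
    """
    path = Path(path)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for text in texts:
            text = " ".join(text.split())  # collapse all whitespace runs
            if not text:
                continue  # skip documents that are empty after cleanup
            # ensure_ascii=False keeps the Greek characters readable in the file
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
            count += 1
    return count
```

With a file written by something like `write_pretrain_jsonl(my_texts, "corpus.jsonl")`, the v2.x invocation would look roughly like `python -m spacy pretrain corpus.jsonl el_core_news_lg ./pretrain_output`, where the output directory name is again an assumption. As far as I understand the v2.x docs, this produces tok2vec weights that you then pass to `spacy train` via `--init-tok2vec` when training NER on labeled data; it does not by itself update the NER component of the existing model. The CLI changed in spaCy v3, so check the docs for the version you have installed.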

There's also been some discussion of pretraining on the forum before. For example, you can look at these threads: https://support.prodi.gy/search?q=pretrain

In general, we can only provide limited support here for spaCy-only questions that don't involve Prodigy directly, as we need to make sure the forum stays more or less on-topic. Fortunately, spaCy has quite an active community, so you should be able to find a lot of information from other users, and if you need more direct help, there are several consultants who know the software well.
