Improving a NER model with transformers (model size issue)

Hi,
I’m trying to improve a NER model for Ancient Greek, a low-resource language, with a transformer (a spaCy model trained on 4,000 NER annotations produced with Prodigy). I tried to pretrain my own transformer on the largest corpus I could build (1.7 GB), but that model hurts the overall accuracy of the spaCy pipeline: it drops from 85 to 40. (I don’t know why this happens: too little pretraining data?)
So I have turned to xlm-roberta-base, which brings my model’s accuracy to 91%, but the trained spaCy model is huge, twice as large as en_core_web_trf. This must be the result of the size of xlm-roberta-base. Which other multilingual transformer could I use, other than bert-base-multilingual, which does not perform as well as RoBERTa in my case? I could not find a distilled version of xlm-roberta-base.

Or is there a way to reduce the size of the spaCy transformer model?

Thanks for the interesting question! The largest part of the XLM-RoBERTa base model is its vocabulary. Since the vocabulary has ~250,000 pieces, 732 MiB of the model's parameters are embeddings.
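That figure is just the embedding matrix: roughly 250k pieces × 768 dimensions × 4 bytes per float32 weight. A quick back-of-the-envelope check (the exact vocabulary size of 250,002 comes from the model config):

```python
# Rough size of xlm-roberta-base's embedding matrix in float32.
pieces, hidden_size, bytes_per_weight = 250_002, 768, 4
print(pieces * hidden_size * bytes_per_weight / 1024**2)  # ~732.4 MiB
```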

Did you try the bert-base-greek-uncased-v1 model? Since its vocabulary consists of only 35,000 pieces, this model is considerably smaller than XLM-RoBERTa-base.
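If you want to compare candidate encoders before training anything, you can read their vocabulary and hidden sizes straight from the Hugging Face configs without downloading the weights. A small sketch; I'm assuming the Greek BERT model is published on the Hub as nlpaueb/bert-base-greek-uncased-v1:

```python
# Compare embedding-matrix sizes of candidate encoders from their configs only.
from transformers import AutoConfig

for name in ("xlm-roberta-base", "nlpaueb/bert-base-greek-uncased-v1"):
    cfg = AutoConfig.from_pretrained(name)
    mib = cfg.vocab_size * cfg.hidden_size * 4 / 1024**2  # float32 bytes -> MiB
    print(f"{name}: {cfg.vocab_size:,} pieces, ~{mib:.0f} MiB of embeddings")
```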

Another solution, which would require some implementation work, would be to prune the vocabulary of your fine-tuned XLM-RoBERTa model. For example, you could run the model over a larger unannotated Greek corpus, keep track of which pieces are used, and then remove the embeddings for the pieces that are never used. You would then map the piece identifiers from the tokenizer to the new embedding matrix.
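A minimal sketch of that idea, assuming a fine-tuned checkpoint on disk and a plain-text corpus file (both paths below are hypothetical placeholders); wiring the pruned matrix and the ID remapping back into the spaCy pipeline is left out:

```python
# Sketch: find which pieces a Greek corpus actually uses and keep only
# those rows of the embedding matrix. Paths are hypothetical placeholders.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("path/to/finetuned-xlm-roberta")

# 1. Collect the pieces used on a large unannotated Greek corpus,
#    always keeping the special tokens (<s>, </s>, <pad>, <unk>, <mask>).
used_ids = set(tokenizer.all_special_ids)
with open("corpus.txt", encoding="utf-8") as f:
    for line in f:
        used_ids.update(tokenizer(line, add_special_tokens=False)["input_ids"])

# 2. Keep only the embedding rows for pieces that were seen.
old_embeddings = model.get_input_embeddings().weight.detach()  # (250002, 768)
kept_ids = sorted(used_ids)
new_embeddings = old_embeddings[kept_ids].clone()              # (n_kept, 768)

# 3. Map old piece ids to rows of the pruned matrix, so tokenizer output
#    can be translated before it reaches the model.
old_to_new = {old: new for new, old in enumerate(kept_ids)}

print(f"kept {len(kept_ids):,} of {old_embeddings.shape[0]:,} pieces")
```

The remaining implementation work is replacing the model's embedding layer with the pruned matrix and applying the `old_to_new` remapping to every batch of tokenizer output before it reaches the model.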

Hi Daniel,

thanks for your response.

I will try both suggestions. I did not use bert-base-greek because it was trained on a Modern Greek corpus, whereas xlm-roberta seems to have included (by mistake, I guess) the Ancient Greek texts that are on the web. But I will give bert-base-greek a try and then see if I can prune xlm-roberta. I will also experiment with pretraining tok2vec on the corpus I have and see if I get comparable results.

Thanks