How can I apply transfer learning to the Vietnamese accounting domain? (I have very little training data for this specific domain.)

I saw an example of NER in Vietnamese in spaCy/examples.py on the master branch of explosion/spaCy on GitHub.

I have data in the Vietnamese accounting domain (100 A4 pages of text, which I created myself).

Please point me in the right direction for applying transfer learning from general Vietnamese text to (or complementing it with) accounting-specific Vietnamese text. My question is quite urgent, so please help me. My text in the Vietnamese accounting domain: Microsoft OneDrive.

I cannot build an NER model for the Vietnamese accounting domain from scratch, because that would require too much training data. I have only a little data (about 100 A4 pages, which took me 3 months to prepare). I need a way to transfer an existing general Vietnamese NER model to the Vietnamese accounting domain. I hope you understand my problem.

For English, I saw the guide "Training a NAMED ENTITY RECOGNITION MODEL with Prodigy and Transfer Learning" on YouTube. For Vietnamese, I don't know what the chances of success are with transfer learning on accounting-domain text. Please tell me about the opportunities and the hard problems with my approach. To sum up: I need NER for Vietnamese accounting posting data with little training data.

Thank you. If you have any questions, please let me know.

I do not speak Vietnamese, so my advice may need to be taken with a grain of salt. I'm aware of an effort here that seems to be a pre-trained Vietnamese model, but I don't know if it comes with a reliable named entity recognizer.

I suppose my best advice at the moment is to not worry too much about transfer learning and to instead put more effort into labeling and active learning. There's nothing wrong with training a spaCy model from scratch, and many models can still reach reasonable performance on limited training data.

Have you seen our ner.teach recipe?
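
To give a feel for what active learning means here, this is a minimal sketch of uncertainty sampling in plain Python. It is not Prodigy's actual implementation: the function names, the example texts and the scores are all made up for illustration, and I'm assuming you already have some model that assigns each candidate a score between 0 and 1. The idea is simply to annotate the examples the model is least sure about first, so that each label you add teaches it as much as possible.

```python
from typing import List, Tuple


def uncertainty(score: float) -> float:
    """Return 1.0 when the score is 0.5 (model is unsure), 0.0 when it is 0 or 1."""
    return 1.0 - abs(score - 0.5) * 2.0


def pick_for_annotation(
    scored: List[Tuple[str, float]], batch_size: int
) -> List[Tuple[str, float]]:
    """Return the batch_size candidates whose scores are closest to 0.5."""
    ranked = sorted(scored, key=lambda item: uncertainty(item[1]), reverse=True)
    return ranked[:batch_size]


# Hypothetical (text, model score) pairs; in practice the scores would come
# from whatever NER model you have trained so far.
candidates = [
    ("sentence A", 0.97),  # model is confident: little to learn from a label
    ("sentence B", 0.52),  # model is unsure: worth annotating
    ("sentence C", 0.08),  # model is confident in the other direction
    ("sentence D", 0.41),  # model is fairly unsure
]

batch = pick_for_annotation(candidates, batch_size=2)
for text, score in batch:
    print(f"{text}\t{score:.2f}")
```

With only about 100 pages of text, spending your annotation time on the uncertain cases like this is usually a better use of effort than labeling everything in order.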