Annotate using ner.manual for a new language

daniseyy · October 26, 2019, 5:42am

Hi, I'm trying to annotate new data using the ner.manual recipe for Indonesian, a language that doesn't have any existing models. Should I train using en_core_web_sm, or are there other options for doing this?

Thanks in advance

ines · October 26, 2019, 11:45am

Hi! The ner.manual recipe only uses the model for tokenization, so all you need is an Indonesian tokenizer (which spaCy already supports). Once you're ready to train, you can then start off with a blank Indonesian model and train your entity recognizer from scratch.

To save out a blank model that only includes the tokenization rules and no pipeline components, you can run the following:

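import spacy

# "id" is the language code for Indonesian; the blank model has tokenization rules only
nlp = spacy.blank("id")
# Write the model directory to disk so it can be passed to Prodigy
nlp.to_disk("/path/to/model")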

Or as a handy one-liner on the command line:

python -c "import spacy; spacy.blank('id').to_disk('/path/to/model')"

In Prodigy, you can then pass in /path/to/model as the base model during annotation and later for training. Also see this thread for more details and examples for working with languages that don't have pre-trained models: Working with languages not yet supported by Spacy
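As a quick sanity check before annotating, here is a minimal sketch that saves the blank model, loads it back, and tokenizes a sentence. It assumes spaCy is installed; the temporary directory and the sample sentence are made up for illustration and are not from the thread.

```python
import tempfile
from pathlib import Path

import spacy

# Blank Indonesian pipeline: tokenization rules only, no trained components.
nlp = spacy.blank("id")

# Hypothetical output location; a temp dir keeps the example self-contained.
model_dir = Path(tempfile.mkdtemp()) / "id_blank_model"
nlp.to_disk(model_dir)

# Load the saved directory back, as you would any model stored on disk.
nlp_loaded = spacy.load(model_dir)

# Made-up sample sentence, used only to inspect the tokenization.
text = "Presiden Joko Widodo mengunjungi Jakarta."
doc = nlp_loaded(text)
tokens = [token.text for token in doc]

print(nlp_loaded.lang)        # language code stored with the model
print(nlp_loaded.pipe_names)  # pipeline components (none for a blank model)
print(tokens)
```

If the tokens look reasonable for your data, the same directory should work as the base model for ner.manual.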

daniseyy · October 27, 2019, 6:27am

Perfect! Thank you so much
