Annotate using ner.manual for a new language

ines · October 26, 2019, 11:45am

Hi! The ner.manual recipe only uses the model for tokenization, so all you need is an Indonesian tokenizer (which spaCy already supports). Once you're ready to train, you can then start off with a blank Indonesian model and train your entity recognizer from scratch.

To save out a blank model that only includes the tokenization rules and no pipeline components, you can run the following:

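import spacy
# Blank Indonesian pipeline: tokenizer only, no pipeline components
nlp = spacy.blank("id")
# Write the model to a directory that Prodigy can load
nlp.to_disk("/path/to/model")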

Or as a handy one-liner on the command line:

python -c "import spacy; spacy.blank('id').to_disk('/path/to/model')"

In Prodigy, you can then pass in /path/to/model as the base model, both during annotation and later for training. Also see this thread for more details and examples of working with languages that don't have pre-trained models: Working with languages not yet supported by Spacy
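
If you want to double-check the saved model before annotating, here is a minimal sketch that writes the blank model to disk, loads it back and tokenizes a sentence. The temporary directory stands in for /path/to/model, and the example sentence and variable names are only illustrative assumptions, not part of Prodigy or spaCy.

```python
import tempfile
from pathlib import Path

import spacy

# Blank Indonesian pipeline: tokenization rules only, no pipeline components.
nlp = spacy.blank("id")

# Save to a temporary directory (a stand-in for /path/to/model) and reload it,
# the same way a base model would be loaded from a path.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "blank_id_model"  # hypothetical directory name
    nlp.to_disk(model_path)
    reloaded = spacy.load(model_path)

# Illustrative sentence, chosen only for this example.
doc = reloaded("Saya tinggal di Jakarta.")
tokens = [token.text for token in doc]

print(reloaded.lang)        # language code of the reloaded model
print(reloaded.pipe_names)  # pipeline components (none for a blank model)
print(tokens)               # the tokens that would be shown for highlighting
```

If the component list is empty and the tokens look sensible for your text, the model should be fine to use as the base model for ner.manual.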
