Since upgrading to spaCy 3.0, I'm having trouble figuring out how to do something similar to what is described here:
Could someone show an updated example?
Thanks!
Hi! The general approach described in the thread you linked should still work the same way in spaCy v3, but you no longer need the hack of overwriting `nlp.tokenizer`. Instead, you can register a custom tokenizer by adding the `@spacy.registry.tokenizers` decorator to your function: https://spacy.io/usage/linguistic-features#custom-tokenizer-training
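Here is a rough sketch of what such a registered function could look like. It assumes a simple tokenizer that splits on spaces only, and the registry name `whitespace_tokenizer` and the `WhitespaceTokenizer` class are hypothetical, loosely following the pattern on the docs page above. The registered function returns a callback that spaCy calls with the `nlp` object when it builds the pipeline.

```python
import spacy
from spacy.tokens import Doc


class WhitespaceTokenizer:
    """Hypothetical tokenizer that splits text on single spaces only."""

    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        words = text.split(" ")
        spaces = [True] * len(words)
        # Runs of spaces produce empty strings: keep them as whitespace tokens
        for i, word in enumerate(words):
            if word == "":
                words[i] = " "
                spaces[i] = False
        # Drop a trailing whitespace token; otherwise the last token has no space after it
        if words[-1] == " ":
            words = words[:-1]
            spaces = spaces[:-1]
        else:
            spaces[-1] = False
        return Doc(self.vocab, words=words, spaces=spaces)


# Hypothetical name; it must match the "@tokenizers" value used in the config
@spacy.registry.tokenizers("whitespace_tokenizer")
def create_whitespace_tokenizer():
    # spaCy calls this inner function with the nlp object during setup
    def create_tokenizer(nlp):
        return WhitespaceTokenizer(nlp.vocab)

    return create_tokenizer


# Equivalent to an [nlp.tokenizer] block in a config file
nlp = spacy.blank(
    "en", config={"nlp": {"tokenizer": {"@tokenizers": "whitespace_tokenizer"}}}
)
doc = nlp("What's happened to me? he thought.")
print([token.text for token in doc])
```

Because the tokenizer is resolved from the registry, nothing has to be assigned to `nlp.tokenizer` by hand after the pipeline is created.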
In the config, you can then write:

```ini
[nlp.tokenizer]
@tokenizers = "your_custom_tokenizer_name"
```
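To check that a config block like this resolves to your function, the following sketch parses a minimal config string and builds a blank pipeline from it. It assumes a hypothetical registry name `bare_tokenizer` that returns spaCy's built-in `Tokenizer` with no prefix, suffix, infix or special-case rules, so text is split on whitespace only.

```python
import spacy
from spacy.tokenizer import Tokenizer
from spacy.util import load_config_from_str, load_model_from_config


# Hypothetical name, referenced by the [nlp.tokenizer] block below
@spacy.registry.tokenizers("bare_tokenizer")
def create_bare_tokenizer():
    def create_tokenizer(nlp):
        # No prefix/suffix/infix rules or special cases: whitespace splitting only
        return Tokenizer(nlp.vocab)

    return create_tokenizer


CONFIG = """
[nlp]
lang = "en"
pipeline = []

[nlp.tokenizer]
@tokenizers = "bare_tokenizer"
"""

config = load_config_from_str(CONFIG)
# auto_fill=True fills in the defaults for every setting this minimal config omits
nlp = load_model_from_config(config, auto_fill=True)
print([token.text for token in nlp("don't stop-now")])
```

With the default English tokenizer, "don't" and "stop-now" would be split further; here they stay whole because the registered tokenizer has no rules.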
You also no longer have to manually edit the `__init__.py` of your package after running `spacy package`. Instead, you can use the `--code` argument on the CLI and point it to the Python file containing your custom functions. That file will then be packaged with the pipeline automatically: https://spacy.io/api/cli#package