How to incorporate document metadata in spaCy 3.0?

Since upgrading to spaCy 3.0 I am having trouble figuring out how to do something similar to what is described here:

Could someone show an updated example?


Hi! The general approach described in the thread you linked should still work the same way in spaCy v3. The difference is that you no longer need the hack of overwriting nlp.tokenizer: you can register a custom tokenizer by adding the @spacy.registry.tokenizers decorator to a function that creates it.
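Here is a minimal sketch of what that registration could look like. The MetadataTokenizer class, the "meta" extension attribute and the JSON input format are all assumptions made up for illustration, not something spaCy prescribes. The sketch also splits on whitespace only to stay short; a real pipeline would probably wrap the language's default tokenizer instead.

```python
import json

import spacy
from spacy.tokens import Doc

# Hypothetical extension attribute that stores per-document metadata.
if not Doc.has_extension("meta"):
    Doc.set_extension("meta", default=None)


class MetadataTokenizer:
    """Illustrative tokenizer: expects a JSON string with "text" and "meta" keys."""

    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        record = json.loads(text)
        # Whitespace splitting only, to keep the sketch short.
        words = record["text"].split()
        doc = Doc(self.vocab, words=words)
        doc._.meta = record.get("meta", {})
        return doc


@spacy.registry.tokenizers("your_custom_tokenizer_name")
def create_metadata_tokenizer():
    # The registered function returns a callable that receives the nlp object
    # and returns the tokenizer.
    def create_tokenizer(nlp):
        return MetadataTokenizer(nlp.vocab)

    return create_tokenizer


# Usage: reference the registered name in the config of a blank pipeline.
config = {"nlp": {"tokenizer": {"@tokenizers": "your_custom_tokenizer_name"}}}
nlp = spacy.blank("en", config=config)
doc = nlp(json.dumps({"text": "Hello world", "meta": {"source": "forum"}}))
print([token.text for token in doc], doc._.meta)
```

Passing the metadata inside a JSON string is just one way to get it through nlp(); the same idea should work with any string format your tokenizer can parse.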

In the config, you can then write:

[nlp.tokenizer]
@tokenizers = "your_custom_tokenizer_name"

You also no longer have to manually edit your package after running spacy package. Instead, you can use the --code argument on the CLI and point it to the Python file containing your custom functions. That file will then be packaged with the pipeline automatically.