Use custom tokenizer in data-to-spacy recipe

I am developing a model with spaCy using data from Prodigy. I'm using a custom tokenizer, and I want the exported spaCy data to use that tokenization. Is there a way to run the data-to-spacy command with a tokenizer specified in an external file?

Hi @alvaro.marlo!

Yes, it seems you can pass the -F flag to point to a Python script that defines your tokenizer function. It seems to work, even though -F wasn't originally designed for this.
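As a rough illustration of what such a script could look like, here is a minimal sketch that registers a tokenizer through spaCy's registry. It is an assumption about the setup, not the exact snippet from this thread: the file name `tokenizer.py`, the registry name `whitespace_tokenizer.v1`, the dataset name, and the whitespace-only behaviour are all hypothetical placeholders for your own tokenizer. The config fragment and command in the comments are also assumptions and may need adjusting for your Prodigy version.

```python
# tokenizer.py (hypothetical file name)
import spacy
from spacy.tokenizer import Tokenizer
from spacy.util import registry


# "whitespace_tokenizer.v1" is a made-up registry name for this example.
@registry.tokenizers("whitespace_tokenizer.v1")
def create_whitespace_tokenizer():
    def create_tokenizer(nlp):
        # A Tokenizer built with only the vocab has no prefix, suffix or
        # infix rules, so it splits on whitespace only. Replace this with
        # your own tokenizer construction.
        return Tokenizer(nlp.vocab)

    return create_tokenizer


# Assumed config.cfg fragment that selects the registered tokenizer:
#
#   [nlp.tokenizer]
#   @tokenizers = "whitespace_tokenizer.v1"
#
# Assumed invocation (dataset and paths are placeholders):
#
#   prodigy data-to-spacy ./corpus --ner my_dataset --config config.cfg -F tokenizer.py
```

To check the tokenizer outside of Prodigy, you could build a blank pipeline with `spacy.blank("en", config={"nlp": {"tokenizer": {"@tokenizers": "whitespace_tokenizer.v1"}}})` after importing the script and inspect the tokens it produces.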

Hope this helps!