I am developing a model with spaCy using data from Prodigy. I'm using a custom tokenizer, and I want the exported spaCy data to use that specific tokenization. Is there a way to run the data-to-spacy command with the tokenizer specified in an external file?
Hi @alvaro.marlo!
Yes, it seems like you can pass the -F flag with a script that contains a tokenizer function. It seems to work even though it wasn't originally designed to do this.
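As a rough sketch of what such a script could look like (not necessarily the exact setup that was tested): the file below registers a tokenizer with spaCy's `tokenizers` registry, so that importing the file via -F makes the name available. The registry name `hyphen_tokenizer.v1`, the function names, and the hyphen-splitting rule are all made up for illustration. It also assumes spaCy v3 and that the config used by data-to-spacy points its `[nlp.tokenizer]` block at the registered name.

```python
import re

import spacy
from spacy.tokenizer import Tokenizer

# Hypothetical rule for illustration: treat every hyphen as its own token.
HYPHEN_INFIX = re.compile(r"-")


# "hyphen_tokenizer.v1" is a made-up name; use whatever your config refers to.
@spacy.registry.tokenizers("hyphen_tokenizer.v1")
def create_hyphen_tokenizer():
    def make_tokenizer(nlp):
        # Splits on whitespace first, then splits each chunk on hyphens.
        return Tokenizer(nlp.vocab, infix_finditer=HYPHEN_INFIX.finditer)

    return make_tokenizer


# Assumed config entry that selects the tokenizer registered above:
#
#   [nlp.tokenizer]
#   @tokenizers = "hyphen_tokenizer.v1"

if __name__ == "__main__":
    # Quick local check; this does not run when the file is only imported.
    nlp = spacy.blank("en")
    nlp.tokenizer = create_hyphen_tokenizer()(nlp)
    print([token.text for token in nlp("state-of-the-art model")])
```

Running the file directly is a quick way to confirm the tokenization is what you expect before exporting the data.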
Hope this helps!