@kak-to-tak How is your custom tokenizer implemented? Prodigy uses the model's `nlp.make_doc` method to create a tokenized `Doc` from the string of text. By default, this calls into `nlp.tokenizer`, so your custom tokenization should be implemented via the model's tokenizer.
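For instance, here's a minimal sketch of plugging a custom tokenizer into a spaCy pipeline so that `nlp.make_doc` picks it up (the whitespace-splitting logic is just a placeholder for illustration):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

def custom_tokenizer(text):
    # Placeholder logic: naive whitespace tokenization.
    # Your real tokenizer goes here; it just needs to return a Doc.
    words = text.split()
    return Doc(nlp.vocab, words=words)

# Replace the pipeline's tokenizer so nlp.make_doc uses it
nlp.tokenizer = custom_tokenizer

doc = nlp.make_doc("N.Y. is busy")
# "N.Y." stays a single token because we only split on whitespace
```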
Alternatively, you can feed in pre-tokenized data that has a `"tokens"` property. See here for an example of the format: https://prodi.gy/docs/api-interfaces#dep
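As a rough sketch, a task in that format can be built like this (the whitespace split is again just a stand-in for your real tokenization; each token carries `"text"`, `"start"`, `"end"` and `"id"` keys, with character offsets into the original text):

```python
def to_prodigy_task(text):
    # Hypothetical helper: build a task dict with a "tokens" list,
    # recording character offsets for each token.
    tokens = []
    offset = 0
    for i, word in enumerate(text.split()):
        start = text.index(word, offset)
        end = start + len(word)
        tokens.append({"text": word, "start": start, "end": end, "id": i})
        offset = end
    return {"text": text, "tokens": tokens}

task = to_prodigy_task("Hello world")
```

You'd then stream dicts like this into your recipe instead of plain `{"text": ...}` entries, and Prodigy will use the provided tokens rather than re-tokenizing.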