Using xlm-roberta model for tokenization

Hi Ines, thank you for your suggestions :D.

To give you a little more context: we have tasks similar to NER, where annotators highlight phrases in a sentence and assign a topic label to each phrase. What we're trying is the ner.manual recipe with a blank:en model. One requirement is to use the xlm-roberta model for tokenization, so that annotation uses the same tokenizer we use to train our model.

The first solution seems appealing to me, as it handles tokenization alignment automatically. I wonder if we can do this in Prodigy 1.10, since we prefer to stay on a stable release. Correct me if I'm wrong, but spaCy 3 will only be supported in the upcoming Prodigy v1.11. Do you have a rough estimate of when v1.11 will become a stable release?

I also stumbled on the Custom Tokenizer thread, where you suggested saving out the model and packaging it with spacy_package. Do you think that solution would work for my problem?