My goal is to train a model for NER, and I have a question regarding tokenization.
In my pipeline I want to use BERT, but I'm not sure if that means I have to use BERT's tokenizer during annotation.
Is the bert.ner.manual recipe from the docs only meant for cases where the data is fed to BERT directly, or should I also use it when BERT is part of a spaCy NER model?
There is this image in the spaCy docs:
It makes it look like a separate tokenizer is used whether or not I'm using a transformer, so I'm not sure I should be annotating the data as if it were fed directly into BERT.
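To illustrate the mismatch I'm worried about, here's a minimal sketch comparing the two tokenizations (assuming spacy and transformers are installed; bert-base-uncased is just an example model):

```python
# Minimal sketch of the two tokenizations I'm comparing.
# Assumes spacy and transformers are installed; bert-base-uncased is just an example.
import spacy
from transformers import AutoTokenizer

text = "Annotating data for NER isn't always straightforward."

# spaCy's rule-based tokenizer (the tokens the NER component predicts over)
nlp = spacy.blank("en")
print([token.text for token in nlp(text)])

# BERT's WordPiece tokenizer (the tokens the transformer itself consumes)
bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(bert_tokenizer.tokenize(text))
```

The two lists don't line up token-for-token, which is exactly what I'm unsure how to handle when annotating.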
I'm not sure I'm making myself clear, but I hope someone can help me out.
Any hints would be much appreciated.