This is a warning coming from transformers or tokenizers internally, and you don't see actual errors because long sequences are truncated before they're passed to the model. If it happens rarely, you can probably ignore it. If it's frequent, you may want to adjust the window and stride for the transformer span getter in your config (a sketch of the relevant block is below).
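For reference, this is roughly what the span getter block looks like in a transformer pipeline's `config.cfg`. The exact section path and values depend on your pipeline, so treat the numbers here as placeholders rather than recommendations:

```ini
[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
# window: number of spaCy tokens per span passed to the transformer
# (these can expand into more wordpieces after subword tokenization)
window = 128
# stride: offset between consecutive spans; stride < window means spans overlap
stride = 96
```

Lowering `window` (and `stride` along with it) shortens the wordpiece sequences handed to the underlying model, which should make the truncation warning stop appearing.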
span getter in your config. See: Receiving the warning messgae 'Token indices are too long' even after validating doc length is under max sequence length · Discussion #9277 · explosion/spaCy · GitHub