Obviously there isn't one for the annotation tool, apart from the inconvenience, but I suspect the model may cut off the ends of sentences that are too long at the training stage.
If there is such a limitation, what is it?
Thank you!
No, spaCy doesn’t do that – it’ll never just truncate your data. (There’s currently only a limit of ~100k characters per Doc object, to prevent memory issues that can occur with the new neural network models. But that’s usually not a problem.)
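To make the character limit concrete, here’s a minimal sketch (assuming a recent spaCy release, where the limit is exposed as `nlp.max_length`; the exact default depends on the version). A text longer than the limit raises an error instead of being silently truncated:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# The per-Doc character limit is exposed as nlp.max_length; the default
# depends on the spaCy version, and it exists to guard against the memory
# usage of the neural network models on very long texts.
print(nlp.max_length)

# Hypothetical oversized input, one character over the limit
very_long_text = "a" * (nlp.max_length + 1)
try:
    doc = nlp(very_long_text)
except ValueError as err:
    # spaCy refuses the text rather than cutting it off
    print("Too long for a single Doc:", err)
```

If you really do need to process longer documents, you can raise `nlp.max_length` yourself or split the text first – splitting is usually the better option.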
If you’re using Prodigy with an active learning-powered recipe, it’s recommended to use shorter texts. For each text, Prodigy will search for the best-scoring parses (see also: beam search), so if the sentences are very long, this can easily be less effective and consume more memory. We also generally recommend working with shorter texts because they’re much easier for the human annotators to process. A single annotation decision should be very quick and ideally only take a few seconds – and that’s much easier if the texts are split into smaller chunks (see the sketch below).
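If your source texts are long, one common approach is to pre-split them into sentences and feed those to Prodigy as individual tasks. Here’s a minimal sketch (using the spaCy v3-style `add_pipe` API and spaCy’s bundled `srsly` helper for JSONL; the file name and texts are just placeholders):

```python
import spacy
import srsly  # installed alongside spaCy, handles JSONL

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # lightweight rule-based sentence splitter

long_texts = [
    "First long document. It has several sentences. Each one becomes a task.",
    "Second long document. Also split into smaller chunks for annotation.",
]

# One JSONL record per sentence, using the {"text": ...} format Prodigy reads
examples = (
    {"text": sent.text}
    for doc in nlp.pipe(long_texts)
    for sent in doc.sents
)
srsly.write_jsonl("sentences.jsonl", examples)
```

The resulting file can then be passed as the input source to your recipe, so each annotation decision only covers one short sentence.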