If you're using a format that can be read line by line, e.g. .txt or .jsonl, the file size doesn't matter, because the data is streamed in one line at a time rather than loaded into memory all at once.
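To illustrate what "streamed in line by line" means in practice, here's a minimal sketch of a generator that reads a .jsonl file lazily. The function name `stream_jsonl` and the file layout are assumptions for illustration only; this is not the tool's actual loader.

```python
import json
from typing import Dict, Iterator


def stream_jsonl(path: str) -> Iterator[Dict]:
    """Yield one example per line without loading the whole file.

    Hypothetical helper for illustration, not the library's own loader.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:  # the file object is itself a lazy line iterator
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Because only one line is held in memory at a time, the same code works whether the file has a hundred lines or a hundred million.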
.txt is okay if you're working with sentences, but it's not a great format if your examples include line breaks: when a file is read line by line, there's no good way to define where an example starts and ends or how to split up the data. In that case, you probably want to use a more flexible format like .json or .jsonl instead. If the recipe you're using performs sentence segmentation, you can disable it using the --unsegmented flag.
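As a rough sketch of why .jsonl handles line breaks better: JSON escapes a newline inside a string as `\n`, so a multi-line example still occupies exactly one line in the file. The helper name `write_jsonl` is hypothetical, and the `"text"` key is assumed to be the field your recipe reads.

```python
import json
from typing import Iterable


def write_jsonl(texts: Iterable[str], path: str) -> None:
    """Write each text as one JSON object per line.

    Hypothetical helper; assumes the recipe reads a "text" field.
    """
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            # json.dumps escapes embedded newlines, so the record stays on one line
            f.write(json.dumps({"text": text}) + "\n")


if __name__ == "__main__":
    write_jsonl(["First line.\nSecond line.", "A single sentence."], "examples.jsonl")
```

Each line of the resulting file is a complete example, so the boundaries stay unambiguous even when the text itself spans several lines.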
You typically want to focus on a sentence or paragraph per example if you're annotating entities, because there's no advantage in annotating longer documents, and it's a lot easier for the annotator. If you're annotating text categories, you can also use longer documents – that really depends on your task and the data.
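If your source documents are long and you want paragraph-sized examples for entity annotation, one simple pre-processing option is to split on blank lines before loading the data. This is only a sketch: the function name `split_paragraphs` is made up, and it assumes paragraphs are separated by one or more blank lines and that the recipe reads a `"text"` field.

```python
import re
from typing import Dict, List


def split_paragraphs(document: str) -> List[Dict[str, str]]:
    """Split a document into one example per paragraph.

    Hypothetical helper; assumes blank lines separate paragraphs.
    """
    # A paragraph break is a newline, optional whitespace, then another newline
    paragraphs = re.split(r"\n\s*\n", document)
    return [{"text": p.strip()} for p in paragraphs if p.strip()]
```

For text classification you might skip this step and keep each document as a single example, depending on the task.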