Best annotation strategy for NER

Hi! Most NER implementations, including spaCy's default NER model, look at a fairly narrow context window, e.g. a few surrounding words on either side. So there's usually no advantage in labelling the whole text at once as opposed to sentence by sentence or paragraph by paragraph. In fact, it can be counterproductive: if you design your annotation scheme so that it needs context that's very far away, you could end up collecting data that your model might not be able to learn from.

Splitting your text up shouldn't be too difficult. You can use spaCy with either a pretrained model or the rule-based sentencizer to split the text into sentences. This also makes annotation much easier, because you move through the data faster and save more often.
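
Here's a minimal sketch of the rule-based option, assuming spaCy v3 (where built-in components are added by string name). The `split_into_sentences` helper and the example text are just made up for illustration, and the sentencizer only splits on punctuation, so a pretrained pipeline may work better on messy text.

```python
import spacy

# Blank English pipeline: tokenizer only, no pretrained model required.
nlp = spacy.blank("en")
# Rule-based sentence boundary detection based on punctuation (spaCy v3 API).
nlp.add_pipe("sentencizer")


def split_into_sentences(text):
    """Hypothetical helper: return the non-empty sentences of `text` as strings."""
    doc = nlp(text)
    return [sent.text.strip() for sent in doc.sents if sent.text.strip()]


# Made-up example text, one longer document split into annotation-sized pieces.
text = "Apple is opening an office in Berlin. Tim Cook announced it on Monday."
sentences = split_into_sentences(text)

for sentence in sentences:
    print(sentence)
```

Each string in `sentences` can then be queued up as its own annotation example instead of the full document.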