Best annotation strategy for NER

ines · November 4, 2019, 6:46pm

Hi! Most NER implementations, including spaCy's default NER model, only look at a very narrow context window, e.g. a few surrounding words on either side of each token. So there's usually no advantage in labelling the whole text at once as opposed to sentence by sentence or paragraph by paragraph. In fact, it can sometimes be counterproductive: if you design your annotation scheme so that it depends on context that's very far away, you could end up collecting data that your model isn't able to learn from.

Splitting your text up shouldn't be too difficult. You can always use spaCy with a pretrained model or rule-based sentencizer and split the text into sentences. This will also make it much easier to annotate, because you get to move through the data faster and save more often.
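
As a rough sketch of the rule-based option, here's how splitting could look with spaCy's `sentencizer` on a blank English pipeline. This assumes spaCy v3 (where `nlp.add_pipe` takes the component name as a string); the `split_into_sentences` helper and the example text are made up for illustration.

```python
import spacy


def split_into_sentences(text):
    """Split raw text into sentence strings using the rule-based sentencizer."""
    # A blank pipeline needs no pretrained model download (assumes spaCy v3).
    nlp = spacy.blank("en")
    # The sentencizer sets sentence boundaries based on punctuation.
    nlp.add_pipe("sentencizer")
    doc = nlp(text)
    return [sent.text.strip() for sent in doc.sents if sent.text.strip()]


# Hypothetical example text, only for illustration.
text = (
    "Apple is opening an office in Berlin. "
    "Tim Cook announced it on Monday. "
    "Hiring starts next year."
)
sentences = split_into_sentences(text)
for sentence in sentences:
    print(sentence)
```

Each sentence can then be annotated as its own example. If your text has lots of abbreviations or unusual punctuation, a pretrained model's sentence boundaries may work better than the punctuation rules.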
