NER best practice: long paragraphs or sentences

PlataleaMinor · May 14, 2020, 9:27am

I would like to know how experts handle this in an NLP workflow.

I have a project with PDF files. From each document, I would like to use NER to extract the name and the reason for resignation. To build the training set, I used my own code to split each PDF into sentences (using spaCy) and loaded each sentence into Prodigy for labeling and training.

My questions are:
1.) Should I use long paragraphs or pages instead of sentences for labeling? Some of the sentences are not complete sentences.
2.) Should I run the model on long paragraphs or pages, given that it was mostly trained on sentences rather than long paragraphs or pages?

Thank you for any comments or recommendations.

ines · May 14, 2020, 5:53pm

Hi! It's important that the examples your model sees during training are similar to the examples the model will see at runtime. So if you want to run your model over sentences, you should train it on sentences and then it also makes more sense to annotate sentences.

So as long as it's consistent, it doesn't matter that much whether you're using longer paragraphs or shorter sentences. We typically recommend annotating shorter texts because they're quicker to read/scan and you collect more datapoints overall. If you're annotating data for NER, this also makes it more obvious when a narrow context window makes it difficult to make the annotation decision. If the annotator struggles with this, the model is also less likely to make the distinction. (You can read more on this here.) So if you can, I'd say going with shorter segments is better.
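As a rough illustration of the consistency point, here is a minimal sketch in which the same splitting function prepares the annotation data and segments documents at runtime. Everything in it is an assumption made for illustration: the regex splitter is a naive stand-in for a real sentence segmenter such as spaCy's, the helper names (`split_into_segments`, `to_annotation_tasks`, `predict_document`) and the `doc_id` value are hypothetical, and `predict_segment` stands for whatever trained model you call per segment. The output uses the JSONL layout with `"text"` and `"meta"` keys that Prodigy can load.

```python
import json
import re

# Naive stand-in for a real sentence segmenter: split on whitespace that
# follows ".", "!" or "?". Swap in your actual splitter here.
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_into_segments(text):
    """Split raw text into sentence-like segments, dropping empty ones."""
    return [seg.strip() for seg in _SENT_BOUNDARY.split(text) if seg.strip()]


def to_annotation_tasks(doc_id, text):
    """Build one annotation task per segment and record where it came from."""
    return [
        {"text": seg, "meta": {"doc_id": doc_id, "segment": i}}
        for i, seg in enumerate(split_into_segments(text))
    ]


def predict_document(text, predict_segment):
    """Apply a per-segment predictor using the SAME splitter as in training."""
    return [(seg, predict_segment(seg)) for seg in split_into_segments(text)]


if __name__ == "__main__":
    page = "Jane Doe resigned on Monday. She cited health reasons."
    # One JSON object per line, ready to save as a .jsonl file for annotation.
    for task in to_annotation_tasks("doc-001", page):
        print(json.dumps(task))
```

The point of the design is only that training and runtime share one segmentation step, so the model never sees inputs shaped differently from what it was trained on.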
