Prodigy sentence splitting during ner.correct

mwadhwa · February 17, 2021, 11:01pm

Hi,

I am using Prodigy to train an initial model, and then as an annotation tool for correcting the model's suggested spans. I set "split_sents": false in my prodigy.json, since the entire paragraph context is very important for the NER task we are training.

However, I see a difference in how the data is loaded for ner.manual and ner.correct: in the former, I see the paragraph as is, whereas in ner.correct I see the text split into sentences. Any idea why this might be?

Any help would be appreciated!

Thanks

ines · February 17, 2021, 11:52pm

Hi! By default, ner.correct will split the text into sentences, but you can disable this by setting the --unsegmented flag on the CLI. (The config setting "split_sents_threshold" lets you define the minimum character length required for a text to be split, if sentence segmentation is enabled.)
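
To make the behaviour described above concrete, here is a minimal sketch in plain Python. It is not Prodigy's implementation: the function name texts_to_annotate and its parameters are hypothetical, and the regex is a naive stand-in for the sentence boundaries a spaCy model would set. It only illustrates the decision: keep the text whole when segmentation is off or the text is shorter than the threshold, otherwise split it.

```python
import re


def texts_to_annotate(text, unsegmented=False, split_sents_threshold=0):
    """Hypothetical helper: return the pieces of `text` that would be shown."""
    # Mirrors the idea of --unsegmented: keep the paragraph as one example.
    if unsegmented:
        return [text]
    # Texts shorter than the threshold (in characters) are not split.
    if len(text) < split_sents_threshold:
        return [text]
    # Naive sentence splitter (assumption): break on whitespace that follows
    # sentence-final punctuation. A real pipeline would use the model's
    # sentence boundaries instead.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


paragraph = "Acme hired Jane Doe. She starts in May."
print(texts_to_annotate(paragraph))                    # split into sentences
print(texts_to_annotate(paragraph, unsegmented=True))  # kept as one paragraph
```

With segmentation disabled, each incoming paragraph stays a single annotation task, which is what you want when the surrounding context matters for the labels.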

mwadhwa · February 18, 2021, 1:25am

Got it, thanks! Would you suggest using this flag while training the NER model as well? Or does ner.train pick up the sentence-splitting setting from the config?

Thank you!

ines · February 24, 2021, 1:14pm

Sorry, I somehow missed the follow-up question! You shouldn't have to set anything during training – if you're splitting the sentences during annotation, the sentences will be saved as separate examples. If not, they'll be saved as multi-sentence examples, and those will also be used to update the model.
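
As a rough illustration of why nothing needs to change at training time, the sketch below assumes annotations are stored as dictionaries with a "text" and a list of "spans" carrying character offsets. The helper to_training_pairs and the sample data are hypothetical, not part of Prodigy. Whether an example holds one sentence or a whole paragraph, its span offsets are relative to its own text, so both shapes convert to training pairs in the same way.

```python
def to_training_pairs(examples):
    """Hypothetical helper: turn stored examples into (text, annotations) pairs."""
    pairs = []
    for eg in examples:
        # Offsets are relative to this example's own text, whatever its length.
        entities = [(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])]
        pairs.append((eg["text"], {"entities": entities}))
    return pairs


spans = [
    {"start": 0, "end": 4, "label": "ORG"},
    {"start": 11, "end": 19, "label": "PERSON"},
]

# Annotated with sentence splitting: one stored example per sentence.
split_examples = [
    {"text": "Acme hired Jane Doe.", "spans": spans},
    {"text": "She starts in May.", "spans": []},
]

# Annotated unsegmented: one stored multi-sentence example.
unsplit_examples = [
    {"text": "Acme hired Jane Doe. She starts in May.", "spans": spans},
]

for text, annotations in to_training_pairs(split_examples + unsplit_examples):
    print(repr(text), annotations)
```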
