Prodigy sentence splitting during ner.correct

Hi,

I am using Prodigy to train an initial model, and then as an annotation tool for correcting the spans the model suggests. I set "split_sents":false in my prodigy.json, since the full paragraph context is very important for the NER task we are training.

However, I see a difference in how the data is loaded for ner.manual and ner.correct: in the former, I see the paragraph as is, whereas in ner.correct I see it split into sentences. Any idea why this might be?

Help will be appreciated!

Thanks

Hi! By default, ner.correct will split the text into sentences, but you can disable this by setting the --unsegmented flag on the CLI. (The config setting "split_sents_threshold" lets you define the minimum character length required for a text to be split, if sentence segmentation is enabled.)


Got it, thanks! Would you suggest using this flag while training the NER model as well? Or does ner.train pick up the sentence-splitting setting from the config?

Thank you!

Sorry, I somehow missed the follow-up question! You shouldn't have to set anything during training – if you're splitting the sentences during annotation, the sentences will be saved as separate examples. If not, they'll be saved as multi-sentence examples, and those will also be used to update the model.
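
To make the difference concrete, here is a rough sketch in plain Python of what ends up as an example in each case. This is not Prodigy's actual implementation: `naive_sentences` and `to_examples` are hypothetical helpers made up for illustration, and the regex splitter only stands in for a real sentence segmenter. The `threshold` argument mimics the idea behind "split_sents_threshold" (texts shorter than it are left whole).

```python
import re


def naive_sentences(text):
    # Hypothetical stand-in for a real sentence segmenter:
    # split on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def to_examples(text, unsegmented=False, threshold=0):
    """Return the examples that would be shown (and saved) for one input text."""
    # With segmentation disabled, or for texts below the minimum length,
    # the whole text stays one multi-sentence example.
    if unsegmented or len(text) < threshold:
        return [{"text": text}]
    # Otherwise every sentence becomes its own example.
    return [{"text": sent} for sent in naive_sentences(text)]


paragraph = "Acme hired Jane Doe in 2019. She now leads the Berlin office."

segmented = to_examples(paragraph)                      # one example per sentence
unsegmented = to_examples(paragraph, unsegmented=True)  # one example for the paragraph

print(len(segmented), len(unsegmented))
```

Either way, whatever you annotated is what's in the dataset, so training just uses the examples in the shape they were saved.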
