I am using Prodigy to train an initial model, and then as an annotation tool for correcting the spans the model suggests. I set "split_sents": false in my prodigy.json, since the full paragraph context is very important for the NER task we are training.
However, I see a difference in how the data is loaded for ner.manual and ner.correct: in the former, I see the paragraph as is, whereas in ner.correct I see it split into sentences. Any idea why this might be?
Hi! By default, ner.correct will split the text into sentences, but you can disable this by setting the --unsegmented flag on the CLI. (The config setting "split_sents_threshold" lets you define the minimum character length required for a text to be split, if sentence segmentation is enabled.)
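To make the interaction between the flag and the threshold concrete, here is a rough sketch in plain Python. It is not Prodigy's actual implementation: the function names `naive_sentences` and `prepare_examples` are made up for illustration, the regex splitter only stands in for the model's real sentence segmentation, and treating the threshold as "texts shorter than this stay unsplit" is an assumption based on the description above.

```python
import re


def naive_sentences(text):
    """Very rough sentence splitter, for illustration only (hypothetical helper)."""
    # Split on whitespace that follows ., ! or ? and drop empty pieces.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def prepare_examples(text, unsegmented=False, split_sents_threshold=0):
    """Return the texts that would be shown as separate annotation tasks."""
    # With segmentation disabled, the whole paragraph stays one example.
    if unsegmented:
        return [text]
    # Assumption: texts shorter than the threshold are left unsplit.
    if len(text) < split_sents_threshold:
        return [text]
    return naive_sentences(text)


paragraph = "Acme Corp hired Jane Doe. She starts in Berlin next month."

# Default behaviour: one task per sentence.
print(prepare_examples(paragraph))
# Segmentation disabled: one task for the whole paragraph.
print(prepare_examples(paragraph, unsegmented=True))
# Segmentation enabled, but the text is shorter than the threshold.
print(prepare_examples(paragraph, split_sents_threshold=500))
```

In this sketch, only the first call yields two separate examples; the other two keep the paragraph intact, which is the behaviour you want when the paragraph context matters.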
Sorry, I somehow missed the follow-up question! You shouldn't have to set anything during training: if you split the text into sentences during annotation, each sentence is saved as a separate example. If not, the texts are saved as multi-sentence examples, and those will also be used to update the model.