consolidating unsegmented and segmented annotations

mhigginslp · March 16, 2018, 6:13pm

I have annotations for multiple labels: some were collected with the automatic sentence segmentation in ner.teach, and others with that function disabled. I would like to combine the annotations and train a single model on them. I’m guessing that just mushing them together will result in bad training and difficult evaluation.

honnibal · March 17, 2018, 12:39am

Maybe mushing them together won’t be so bad? It’s okay if texts are different lengths. I’ve actually been meaning to play with this more as a data augmentation strategy.

One unfortunate thing about the split_sentences() method is that it doesn’t currently save the original input hash. This makes it difficult to reconstruct the original stream. We’ll definitely be fixing this in the next version.
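
As a possible stopgap while the original hash isn't stored, a segmented example could be matched back to its source document by plain substring search, with its span offsets shifted accordingly. The sketch below is only an illustration, not part of Prodigy: `realign_segment` is a hypothetical helper, and it assumes each example is a dict with a "text" key and a "spans" list whose entries carry "start", "end" and "label".

```python
def realign_segment(original_text, segment):
    """Return the segment's spans with offsets relative to original_text.

    Hypothetical helper: assumes `segment` is a dict with "text" and an
    optional "spans" list of {"start", "end", "label"} dicts.
    Returns None if the segment text is not found in the original.
    """
    # Position of the segment inside the full document (first match only).
    offset = original_text.find(segment["text"])
    if offset == -1:
        return None
    # Copy each span and shift its character offsets by that position.
    return [
        {**span, "start": span["start"] + offset, "end": span["end"] + offset}
        for span in segment.get("spans", [])
    ]


original = "Apple is in Cupertino. Google is in Mountain View."
segment = {
    "text": "Google is in Mountain View.",
    "spans": [{"start": 0, "end": 6, "label": "ORG"}],
}
realigned = realign_segment(original, segment)
```

Because `str.find` returns the first match, a sentence that occurs more than once in the same document would be ambiguous, so this only works cleanly when the segment texts are unique within their source.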

NNN · February 14, 2022, 3:17pm

Hi,

I was exploring the --unsegmented argument and came across this thread. I was just wondering whether it is now possible to use outputs of the same dataset that were tagged both with and without -U?

Thanks
