Hello,
My team labelled 3,000 sentences for one entity using ner.teach and saved the annotations to a database.
We then labelled the same stream of 3,000 sentences independently using ner.correct, with some of the labels from en_core_web_lg used to pre-highlight entities.
We intended to merge this dataset with our first dataset. However, we did not use the --unsegmented flag, so our second stream was split into sentences and the resulting dataset is much longer.
Is there any way to 'unsplit' the sentences in the second dataset so that it matches the structure of the first (while keeping the spans) and can be merged with it?
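To illustrate the kind of thing we have in mind, here is a rough sketch in plain Python. It is not a Prodigy API: the function name unsplit_example is made up, and it assumes each example is a dict with a "text" key and a "spans" list using character offsets ("start", "end", "label"). It also assumes we still have the original unsegmented text and know which sentence examples came from it, in order.

```python
def unsplit_example(original_text, sentence_examples):
    """Re-attach sentence-level spans to the original, unsegmented text.

    Hypothetical helper: assumes the sentence examples are in document
    order and that each sentence text occurs verbatim in original_text.
    """
    merged_spans = []
    cursor = 0  # search position, so repeated sentences map to the right place
    for eg in sentence_examples:
        offset = original_text.find(eg["text"], cursor)
        if offset == -1:
            raise ValueError(f"Sentence not found in original: {eg['text']!r}")
        for span in eg.get("spans", []):
            # Shift character offsets from sentence-relative to text-relative.
            # Token-level indices are dropped because they would no longer
            # be valid after merging.
            merged_spans.append({
                "start": span["start"] + offset,
                "end": span["end"] + offset,
                "label": span["label"],
            })
        cursor = offset + len(eg["text"])
    return {"text": original_text, "spans": merged_spans}


# Small made-up example to show the intended behaviour.
original = "Acme hired Bob. Bob moved to Paris."
sentences = [
    {"text": "Acme hired Bob.", "spans": [{"start": 0, "end": 4, "label": "ORG"}]},
    {"text": "Bob moved to Paris.", "spans": [{"start": 13, "end": 18, "label": "GPE"}]},
]
merged = unsplit_example(original, sentences)
recovered = [merged["text"][s["start"]:s["end"]] for s in merged["spans"]]
```

Here recovered should contain the same entity strings that were highlighted in the individual sentences, now indexed against the full text.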
Many thanks for any help