Segmentation and newlines in ner.manual

KMLDS · May 4, 2018, 6:42pm

Thanks @ines - I assumed that was the reason for displaying whitespace characters. In my case, I will have a couple of subject matter experts labeling documents in a format that is familiar to them. What I’m missing with the current rendering is the visual cues from paragraph breaks, bulleted lists and similar. If I either remove the whitespace or break my training examples down into smaller chunks (e.g. only showing the text between ‘\n\n’ tokens), it will take them much longer to get through the documents we want to label.

For later modeling efforts on this task, there is no semantic difference between ‘\n’ and ‘ ’. It also doesn’t really matter to me whether trailing or preceding ‘\n\n’ tokens are captured in a span: I can just remove them from the training data or the model outputs, since they have no importance to the task at hand.
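To make that cleanup step concrete, here is a minimal sketch of what I have in mind. It assumes each annotated example is a dict with a "text" string and a list of "spans" carrying "start" and "end" character offsets; the helper names `trim_span_whitespace` and `normalize_newlines` are made up for illustration and are not part of Prodigy.

```python
def trim_span_whitespace(text, spans):
    """Shrink each span so it neither starts nor ends on whitespace.

    Assumes spans are dicts with "start"/"end" character offsets into text.
    """
    trimmed = []
    for span in spans:
        start, end = span["start"], span["end"]
        # Move the start forward past leading whitespace (including "\n").
        while start < end and text[start].isspace():
            start += 1
        # Move the end backward past trailing whitespace.
        while end > start and text[end - 1].isspace():
            end -= 1
        # Drop spans that consisted of whitespace only.
        if start < end:
            trimmed.append({**span, "start": start, "end": end})
    return trimmed


def normalize_newlines(text):
    """Replace every "\n" with a single space.

    The replacement is one character for one character, so existing
    span offsets stay valid.
    """
    return text.replace("\n", " ")


# Hypothetical example: the annotator's selection swallowed the
# surrounding paragraph breaks.
example = {
    "text": "Intro\n\nAcme Corp\n\nhired Bob.",
    "spans": [{"start": 5, "end": 18, "label": "ORG"}],
}
clean_spans = trim_span_whitespace(example["text"], example["spans"])
clean_text = normalize_newlines(example["text"])
```

Trimming first and normalizing second (or the other way round) gives the same offsets, because neither step changes the length of the text.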
