Hi Team, we are currently working on NER with relations, but the UI does not render for lengthy documents (around 2 or 3 pages). Is there any option for this?
Is there a reason why you can't split the data into smaller segments? I can imagine that long pieces of text require a lot of scrolling and can make it much harder to annotate.
I sometimes like to use spaCy to preprocess my data, via something like:
import spacy

nlp = spacy.load("en_core_web_md")

# I'm assuming a list/generator of texts loaded in Python
orig_examples = ...

# This function can then turn them into separate JSON blobs, one per sentence
def to_sentences(orig_examples):
    for doc in nlp.pipe(orig_examples):
        for sent in doc.sents:
            yield {"text": sent.text}

new_examples = to_sentences(orig_examples)
This uses the sentence-splitting property (doc.sents) of a spaCy Doc
object. Might that suffice?
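If it helps, here's a minimal sketch of how you might write the split examples to disk so they can be loaded for annotation. I'm using srsly here (it ships alongside spaCy), and the filename is just an example:

import srsly

# Hypothetical output path; point this at wherever your annotation tool reads from
srsly.write_jsonl("sentences.jsonl", new_examples)

Each line of the resulting file will be a JSON blob like {"text": "..."}, which keeps every annotation task down to a single sentence instead of multiple pages.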
A colleague of mine mentioned that this answer on the forum may also be relevant: