Hi Team, we are currently working on NER with relations, but the UI does not render for lengthy documents (around 2 or 3 pages). Is there any option for this?
Is there a reason why you can't split the data into smaller segments? I can imagine that long pieces of text require a lot of scrolling and can make it much harder to annotate.
I sometimes like to use spaCy to preprocess my data, via something like:
import spacy

nlp = spacy.load("en_core_web_md")

# I'm assuming a list/generator of texts loaded in Python
orig_examples = ...

# This function can then turn them into separate JSON blobs, one per sentence
def to_sentences(orig_examples):
    for doc in nlp.pipe(orig_examples):
        for sent in doc.sents:
            yield {"text": sent.text}

new_examples = to_sentences(orig_examples)
This uses the sentence-splitting property (doc.sents) of a spaCy Doc
object. Might that suffice?
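If it helps, here's a minimal sketch of how you might write the split examples to disk so they can be loaded for annotation. I'm using srsly here (it ships alongside spaCy), and the filename is just an example:

import srsly

# Hypothetical output path; point this at wherever your annotation tool reads from
srsly.write_jsonl("sentences.jsonl", new_examples)

Each line of the resulting file will be a JSON blob like {"text": "..."}, which keeps every annotation task down to a single sentence instead of multiple pages.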
A colleague of mine mentioned that this answer on the forum may also be relevant: