Trying to link words in two spans to form one entity in Prodigy.

I hope I understand your question correctly – but I think the underlying thing here is that the model (and pretty much any computational process) will read text in from top to bottom, left to right, character by character. The two spans may be aligned visually – but to the machine, they’re far apart and just not a sequence.

Entity spans are defined as a sequence of tokens, and that’s what the entity recognizer is trying to predict – so something like B-O-O-O-I-L wouldn’t be considered a valid entity sequence. Predicting O right after B would be considered an illegal move. (For more details on transition-based NER, you might want to check out this video).
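To make the “illegal move” idea concrete, here’s a minimal sketch of a BILOU validity check. The helper `is_valid_bilou` is hypothetical, not part of spaCy’s or Prodigy’s API – it just encodes the transition rules described above:

```python
# Sketch: checking whether a BILOU tag sequence forms valid entity spans.
# is_valid_bilou is a hypothetical helper for illustration only.
def is_valid_bilou(tags):
    """Return True if the tag sequence contains only legal transitions."""
    prev = "O"
    for tag in tags:
        if tag[0] in ("I", "L"):
            # I and L may only continue a span opened by B or I
            if prev[0] not in ("B", "I"):
                return False
        else:
            # B, U and O may not interrupt a span that's still open
            if prev[0] in ("B", "I"):
                return False
        prev = tag
    # No unfinished span at the end
    return prev[0] not in ("B", "I")

print(is_valid_bilou(["B", "I", "L"]))                      # True
print(is_valid_bilou(["B", "O", "O", "O", "I", "L"]))       # False: O after B
```

The B-O-O-O-I-L case from above fails immediately, because the O after B interrupts the open span – which is exactly why two visually aligned but non-adjacent token runs can’t be one entity.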

Is there a specific reason you want the column heading (?) included in the entity span? If your example is representative of the type of data you’re working with, a purely NER-based approach seems less than ideal. There are only short text fragments, and what you consider a “sequence” isn’t even an actual sequence.

So you might want to experiment with only predicting more generic concepts and then using the surrounding token context to resolve those back to their headings. See this thread for some ideas and inspiration. You could also try adding a pre-processing step that reformats your raw text so that the true word order matches the logical word order. Finally (this is more experimental), you could try framing the problem as a computer vision task (!), predict the information based on its position in the document or section, and then extract the text content from the predicted bounding boxes afterwards.
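To illustrate the pre-processing idea: if your raw text comes out as whitespace-aligned columns, you could reorder it so each value directly follows its heading before running NER. This is just a sketch over a made-up input format – real extracted text will need its own parsing logic:

```python
# Sketch: reorder column-aligned text so the logical order ("heading: value")
# matches the token order the model will actually read.
# columns_to_records and the 4-space separator are illustrative assumptions.
def columns_to_records(lines, sep="    "):
    """Split whitespace-aligned columns and pair each value with its heading."""
    rows = [[cell.strip() for cell in line.split(sep) if cell.strip()]
            for line in lines]
    headings, *data = rows
    records = []
    for row in data:
        for heading, value in zip(headings, row):
            records.append(f"{heading}: {value}")
    return records

lines = [
    "Invoice No    Date",
    "12345    2021-03-01",
]
print(columns_to_records(lines))
# ['Invoice No: 12345', 'Date: 2021-03-01']
```

After a step like this, each heading and its value form a real token sequence, so a span-based annotation (and the entity recognizer) can handle them normally.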