NER for tagging start and end of span vs. spancat

ryanwesslen · July 3, 2023, 10:13pm

Thanks for the background!

In general, we tend to recommend shorter examples, but I can understand if this is a bit tricky.

This is somewhat similar -- here's a quick way to split by a token and then create a new file to load as your source file (replace \xa0 with \n).

However, it doesn't respect your constraint of keeping the newlines at the end of the doc. I'm wondering if there's some logic you could add as an if statement that would skip the split when the newline is at the end of the doc.
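For what it's worth, here's a rough sketch of that idea in plain Python. It isn't the snippet mentioned above, just one way it might look: `split_keep_trailing` and `write_jsonl` are hypothetical helper names, and I'm assuming your docs are plain strings, that `\n` is the token you split on, and that you want a JSONL source file with a `"text"` key per line.

```python
import json


def split_keep_trailing(text, sep="\n"):
    """Split text on sep, leaving any separators at the very end of the doc
    attached to the last chunk instead of splitting on them."""
    # Peel the trailing separators off first so they are never split on.
    body = text
    while sep and body.endswith(sep):
        body = body[: -len(sep)]
    trailing = text[len(body):]

    # Split the rest and drop chunks that are empty or whitespace-only.
    chunks = [chunk for chunk in body.split(sep) if chunk.strip()]

    # Re-attach the trailing separators to the final chunk.
    if chunks:
        chunks[-1] += trailing
    return chunks


def write_jsonl(docs, path):
    """Write one {"text": ...} record per chunk (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            for chunk in split_keep_trailing(doc):
                f.write(json.dumps({"text": chunk}) + "\n")


print(split_keep_trailing("first line\nsecond line\n\n"))
# ['first line', 'second line\n\n']
```

Stripping the trailing newlines before the split does the same job as an if statement inside the loop, and it avoids having to track whether you're on the last token. You'd then call something like `write_jsonl(my_docs, "source.jsonl")` and load that file as your source.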

Back to your original question: I was able to talk with a member of the spaCy dev team, who suggested that spancat would likely be a better fit. ner doesn't predict entities across sentence boundaries, which matters here because you have more than 100 spans that cross them (and they aren't easy to drop).

Hope this helps and let us know if you have further questions!
