HTML Source Sentence Boundary Detection Prodigy

nix411 · November 28, 2019, 11:45am

I've made a parser that segments HTML into a list of paragraphs (whenever a new line appears on the rendered HTML page, a new paragraph starts). Is that something you could use? However, note what Matthew Honnibal said:
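The parser itself isn't posted in the thread. As a rough illustration only, here is a minimal sketch using Python's standard-library `html.parser`. It assumes that "a new line on the page" corresponds to block-level tags and `<br>`; the tag sets and the names `ParagraphSplitter` and `html_to_paragraphs` are hypothetical, not the poster's code.

```python
from html.parser import HTMLParser

# Assumed, non-exhaustive set of tags that start a new visual line when rendered.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "tr", "table", "blockquote",
              "pre", "section", "article", "header", "footer"} | {f"h{i}" for i in range(1, 7)}
# Tags whose contents are never visible text.
SKIP_TAGS = {"script", "style"}


class ParagraphSplitter(HTMLParser):
    """Collects visible text and starts a new paragraph at every block-level tag."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = []
        self._skip_depth = 0

    def _flush(self):
        # Collapse runs of whitespace and emit the buffered text as one paragraph.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>.
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text not followed by a block tag


def html_to_paragraphs(html):
    parser = ParagraphSplitter()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


print(html_to_paragraphs(
    "<div><p>First <b>para</b>.</p><p>Second<br>line</p><script>var x=1;</script></div>"
))
# → ['First para.', 'Second', 'line']
```

This works on tags alone and ignores CSS, so elements restyled as block or inline (e.g. `display: block` on a `<span>`) won't be split the way a browser would render them; a rendering-based approach would be more faithful at the cost of a browser dependency.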

I have this exact issue. I'd actually like to use some of the non-text aspects as features, but I don't know how to achieve that within spaCy. I've laid out my challenge in another thread.
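The thread doesn't say how such non-text features would be modelled. One framework-agnostic first step, sketched here under assumptions, is to keep each run of text together with features derived from the HTML tags enclosing it. The feature names (`tag`, `path`, `bold`, `heading`) and the helper `extract_layout_features` are hypothetical choices for illustration; attaching them to spaCy objects (for example via custom extension attributes registered with `Span.set_extension`) is a separate step not shown.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed onto the tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
HEADING_TAGS = {f"h{i}" for i in range(1, 7)}
BOLD_TAGS = {"b", "strong"}
SKIP_TAGS = {"script", "style"}


class LayoutFeatureExtractor(HTMLParser):
    """Records each text run with (hypothetical) features from its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, outermost first
        self.runs = []   # (text, features) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag; this tolerates
        # inner tags that were never explicitly closed.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        open_tags = set(self.stack)
        if not text or open_tags & SKIP_TAGS:
            return
        self.runs.append((text, {
            "tag": self.stack[-1] if self.stack else None,
            "path": "/".join(self.stack),
            "bold": bool(open_tags & BOLD_TAGS),
            "heading": bool(open_tags & HEADING_TAGS),
        }))


def extract_layout_features(html):
    parser = LayoutFeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.runs


for text, feats in extract_layout_features("<h1>Title</h1><p>Some <b>bold</b> text</p>"):
    print(text, feats)
```

Keeping features as plain dicts keyed by text run leaves open how they are later used, e.g. as extra inputs to a classifier or as span-level attributes after tokenization.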

