identify legal terms

ines · November 14, 2019, 12:43pm

Hi! There's no easy answer and it really depends on your data, what you're trying to extract, and so on. In some cases, like a citation or case name, you might be able to predict the span directly as a named entity recognition task. In other cases, the span boundaries are going to be very difficult to learn and it makes a lot more sense to predict a category over the whole sentence. And then there are things that can be extracted using token-based rules, or a combination of rules and more general linguistic features. Ultimately, you want to try out different approaches and evaluate them on a representative set of annotated examples to find out what works best.
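To make the "try it and evaluate" idea a bit more concrete, here's a minimal sketch of the rule-based option, scored against a tiny annotated set. It uses plain regular expressions rather than spaCy so that it runs without any dependencies. The pattern, the helper names (`extract_citations`, `evaluate`, `span_of`) and the example sentences are all made up for illustration, so treat it as a starting point under those assumptions rather than a recommended implementation.

```python
import re

# Hypothetical pattern for UK-style neutral citations such as "[2016] UKSC 8"
# or "[2004] EWCA Civ 1270". Real data will need a broader set of rules.
NEUTRAL_CITATION = re.compile(r"\[\d{4}\]\s+[A-Z]+(?:\s+[A-Z][a-z]+)?\s+\d+")


def extract_citations(text):
    """Return (start, end) character spans of citation-like strings."""
    return [m.span() for m in NEUTRAL_CITATION.finditer(text)]


def evaluate(examples):
    """Span-level precision and recall over (text, gold_spans) pairs."""
    tp = fp = fn = 0
    for text, gold in examples:
        predicted = set(extract_citations(text))
        gold = set(gold)
        tp += len(predicted & gold)
        fp += len(predicted - gold)
        fn += len(gold - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def span_of(text, phrase):
    """Helper to build a gold span from a phrase that occurs in the text."""
    start = text.index(phrase)
    return (start, start + len(phrase))


# Invented example sentences, standing in for real annotated data.
t1 = "The court followed Example v Sample [2016] UKSC 8 on this point."
t2 = "See Alpha v Beta (1999) 2 WLR 100 for the earlier rule."
EXAMPLES = [
    (t1, [span_of(t1, "[2016] UKSC 8")]),
    (t2, [span_of(t2, "(1999) 2 WLR 100")]),
]

precision, recall = evaluate(EXAMPLES)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On these two sentences the rule finds the bracketed citation but misses the report-style one, which is exactly the kind of gap an evaluation set is supposed to surface. The same `evaluate` loop could be pointed at an NER model's predictions to compare the two approaches on equal terms.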

I'd highly recommend checking out Daniel Hoadley's work on Blackstone, a spaCy pipeline and model for processing legal texts (in English). The README features a bunch of examples and there are also blog posts that discuss some of the considerations, like when to model a task as a named entity recognition problem, when to do text classification etc. And some of the components make very clever use of rules to detect abbreviations and improve sentence boundary detection.
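Just to illustrate why abbreviation rules matter for sentence boundaries in legal text, here's a rough standalone sketch. This is not how Blackstone implements it: the abbreviation list and the `split_sentences` function are hypothetical, and the logic is deliberately naive.

```python
import re

# Hypothetical, deliberately short list of legal abbreviations that end in a
# period but usually do not end a sentence.
ABBREVIATIONS = {"v.", "para.", "s.", "art.", "no."}


def split_sentences(text):
    """Split on sentence-final punctuation, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        # Text from the current sentence start up to and including the
        # punctuation mark of this boundary candidate.
        candidate = text[start:match.start() + 1]
        last_token = candidate.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # e.g. "s. 12" or "para. 40": not a real boundary
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


text = "The claimant relied on s. 12 of the Act. The judge disagreed at para. 40."
for sentence in split_sentences(text):
    print(sentence)
```

A naive split on every period would cut this text after "s." and "para.", which then hurts anything downstream that works sentence by sentence, such as a sentence-level text classifier.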
