identify legal terms

Hello,

I am trying to identify phrases that refer to laws, for example:

  • …de acuerdo con lo dispuesto en el artículo 38.4 de la Ley 30/1992, de 26 de noviembre, de Régimen Jurídico...
  • …de conformidad con el artículo 6.2.a).1.º, párrafo segundo, del Real Decreto…
  • ...Infringir la Ley 1098 de 2006 en lo relativo a la prestación de servicios...
  • ...No dar aplicación a los mandatos de la Ley 1751 de 2015, en lo
    correspondiente a la prestación de los servicios de salud .

Can you tell me the best practice to achieve this?

Thanks...

Hi! There's no easy answer and it really depends on your data, what you're trying to extract, and so on. In some cases like a citation or case name, you might be able to predict the span directly as a named entity recognition task. In other cases, this is going to be very difficult to learn and it makes a lot more sense to predict a category over the whole sentence. And then there are things that can be extracted using token-based rules or a combination of rules and more general linguistic features. Ultimately, you want to try out different approaches and evaluate them on a representative set of annotated examples to find out what works best.
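To make the rule-based option and the evaluation step a bit more concrete, here's a minimal sketch. It uses plain regular expressions from Python's standard library rather than spaCy's token-based matcher, and the patterns are assumptions derived only from the four example sentences above, so real data will almost certainly need more variants. The names `LAW_REF`, `ARTICLE_REF`, `find_law_references`, `find_articles` and `precision_recall` are all made up for illustration.

```python
import re

# Hypothetical patterns, derived only from the example sentences in the question.
LAW_REF = re.compile(
    r"""
    \b(?:Ley|Real[ ]Decreto)   # kind of norm
    \s+(?P<number>\d+)         # norm number, e.g. 30 or 1098
    (?:/|\s+de\s+)             # "30/1992" or "1098 de 2006"
    (?P<year>\d{4})\b          # four-digit year
    """,
    re.VERBOSE,
)

# Captures only the leading numeric part of an article reference ("38.4", "6.2").
ARTICLE_REF = re.compile(r"\bart[íi]culo\s+(?P<article>\d+(?:\.\d+)*)")


def find_law_references(text):
    """Return the matched law citation strings, in order of appearance."""
    return [m.group(0) for m in LAW_REF.finditer(text)]


def find_articles(text):
    """Return the numeric part of each article reference."""
    return [m.group("article") for m in ARTICLE_REF.finditer(text)]


def precision_recall(predicted, gold):
    """Compare predicted and gold citation strings as sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    text = "...Infringir la Ley 1098 de 2006 en lo relativo a la prestación de servicios..."
    print(find_law_references(text))
```

Note that the article pattern stops at the numeric prefix, so "6.2.a).1.º" only yields "6.2". That's the kind of gap you'd only notice by running `precision_recall` against examples you've annotated by hand, and it's a good signal for when rules stop being enough and a trained model is worth trying.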

I'd highly recommend checking out Daniel Hoadley's work on Blackstone, a spaCy pipeline and model for processing legal texts (in English). The README features a bunch of examples and there are also blog posts that discuss some of the considerations, such as when to model a task as a named entity recognition problem and when to do text classification. Some of the components also make very clever use of rules to detect abbreviations and improve sentence boundary detection.