We currently use regular expressions to identify references in our documents. For example, in the following phrase, the number 9.6 should be captured:
- "in section 9.6 of this charter"
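
To make the current approach concrete, here is a minimal sketch of the kind of pattern involved. The pattern and the function name `find_section_refs` are made up for illustration and are not our actual rules; it only handles a dotted number that follows the word "section".

```python
import re

# Hypothetical pattern: the word "section" followed by a dotted number such as 9.6.
SECTION_REF = re.compile(r"\bsection\s+(\d+(?:\.\d+)*)", re.IGNORECASE)

def find_section_refs(text):
    """Return the dotted numbers that follow the word 'section' in text."""
    return SECTION_REF.findall(text)

print(find_section_refs("in section 9.6 of this charter"))  # ['9.6']
```

A pattern like this works for one phrasing only, so every new style of reference needs its own rule.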
If possible, we would like to train an ML model to do this automatically. The trouble is that there are many different kinds of references, even within a single collection of documents:
- LR 9.3.11 R does not apply
- article 14(3) second paragraph
- article 5(2)(c) of the
- in the case of paragraphs (5) and (7)
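
If this were treated as a learning problem, one possible framing (an assumption on my part, not something we have built) is token-level labelling, where each token is tagged as the beginning of a reference (B-REF), inside one (I-REF), or outside (O). The sketch below converts a sentence plus hand-marked character spans into such tags; the function name `bio_tags` and the tag names are hypothetical.

```python
import re

def bio_tags(text, spans):
    """Split text on whitespace and tag each token B-REF, I-REF, or O.

    spans is a list of (start, end) character offsets marking references;
    a token is tagged only if it lies entirely inside a span.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end in spans:
            if match.start() >= start and match.end() <= end:
                # The first token of a span gets B-REF, later ones I-REF.
                tag = "B-REF" if match.start() == start else "I-REF"
                break
        tagged.append((match.group(), tag))
    return tagged

text = "in section 9.6 of this charter"
start = text.index("9.6")  # hand-marked span covering the number 9.6
for token, tag in bio_tags(text, [(start, start + len("9.6"))]):
    print(token, tag)
```

Labelled examples in this shape are what a sequence-labelling model would be trained on, whichever model is eventually chosen.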
There are many more types that I have not listed here. A human can usually tell at a glance what is a reference and what is not, but is there a way to teach a computer to do this automatically?