Can the NER recognize groups of words? Should I use patterns?

ehaynie63 · October 22, 2018, 3:39pm

Hi,

I have so far been unsuccessful at getting spaCy to recognize legal citations as separate entities. These are the entities I want it to find:

Arizonans for Official English v. Arizona, 520 U.S. 43 (1997)
Bell v. Wolfish, 441 U.S. 520 (1979)
City of L.A. v. Lyons, 461 U.S. 95 (1983)
City of Ontario, Cal. v. Quon, 130 S.Ct. 2619 (2010)

I’ve used ner.manual so far. Would it be better to use something else? A text classifier isn’t ideal because it wouldn’t extract the citations from the text.

Where should I go from here? Here’s the result of training my model.

Loaded model en_core_web_sm
Using 50% of accept/reject examples (108) for evaluation
Using 100% of remaining examples (210) for training
Dropout: 0.2 Batch size: 5 Iterations: 10

BEFORE 0.000
Correct 0
Incorrect 18
Entities 407
Unknown 0

#    LOSS      RIGHT  WRONG  ENTS   SKIP  ACCURACY
01   102.558   0      18     1841   0     0.000
02   76.975    0      18     1865   0     0.000
03   66.574    0      18     2040   0     0.000
04   57.477    0      18     1988   0     0.000
05   50.854    0      18     1868   0     0.000
06   42.160    0      18     1979   0     0.000
07   32.963    0      18     1972   0     0.000
08   33.265    0      18     1854   0     0.000
09   30.458    0      18     1881   0     0.000
10   30.371    0      18     2006   0     0.000

Correct 0
Incorrect 18
Baseline 0.000
Accuracy 0.000

honnibal · October 22, 2018, 10:24pm

I suspect this type of entity will be difficult to learn, as it’s quite long. You might find that patterns do better. However, your dataset is very small, so it’s difficult to conclude much from this experiment. The same approach might well succeed if you give it 10-20 times more data.
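
Because these citations follow a fairly regular shape (case name, "v.", case name, volume, reporter, page, year), one way to try the pattern idea is a rule that finds candidates and emits character-offset spans you could review and correct to build a larger training set. This is a minimal sketch that swaps in a plain Python regular expression instead of spaCy's token-based match patterns. The label `CITATION`, the short reporter list, and the function name `find_citations` are hypothetical. The output dicts use `start`/`end`/`label` character offsets, similar in shape to Prodigy's `spans`.

```python
import re

# Hypothetical label name; use whatever label your dataset already uses.
LABEL = "CITATION"

# One word of a party name: a capitalized word or an abbreviation like "L.A." or "Cal."
WORD = r"[A-Z][A-Za-z.']*"
# Lowercase connectors allowed inside a party name, e.g. "City of Ontario"
CONNECTOR = r"(?:of|for|the|and)"
# A party: capitalized words, optionally joined by connectors or a comma ("Ontario, Cal.")
PARTY = rf"{WORD}(?:,?\s+(?:{CONNECTOR}\s+)*{WORD})*"
# Assumed, incomplete reporter list: U.S., S.Ct./S. Ct., F., F.2d, F.3d
REPORTER = r"U\.S\.|S\.\s?Ct\.|F\.(?:\s?[23]d)?"

CITATION_RE = re.compile(
    rf"\b(?P<case>{PARTY}\s+v\.\s+{PARTY}),\s+"
    rf"(?P<volume>\d+)\s+(?P<reporter>{REPORTER})\s+"
    r"(?P<page>\d+)\s+\((?P<year>\d{4})\)"
)


def find_citations(text):
    """Return a start/end/label dict for every citation the regex matches."""
    return [
        {"start": m.start(), "end": m.end(), "label": LABEL, "text": m.group(0)}
        for m in CITATION_RE.finditer(text)
    ]


if __name__ == "__main__":
    text = (
        "The plaintiffs relied on Bell v. Wolfish, 441 U.S. 520 (1979) "
        "and City of Ontario, Cal. v. Quon, 130 S.Ct. 2619 (2010)."
    )
    for span in find_citations(text):
        print(span["text"])
    # prints:
    # Bell v. Wolfish, 441 U.S. 520 (1979)
    # City of Ontario, Cal. v. Quon, 130 S.Ct. 2619 (2010)
```

This rule is deliberately crude: capitalized words directly before the case name (such as "See" or a sentence-initial "In") get absorbed into the first party, and short-form citations or reporters missing from the list are not matched. Its job is to produce candidates quickly so that collecting 10-20 times more annotated examples is less work, not to replace the model.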
