EntityRuler and ner.match - different behavior

ines · July 11, 2019, 9:31am

@Anji.Vaidyula It really sounds like the problem is the capitalisation then!

Are you sure you want to train a statistical model for this, then? If you haven't done it already, it might be worth running a quick evaluation to see what your baseline is. For instance, if you're already getting 95% accuracy using only your rules, training a model may be kind of a waste of time.
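To make the baseline check concrete, here is a minimal sketch of how you could score the rules against a small hand-labelled set. It assumes spans are compared as exact `(start, end, label)` tuples; the function name `evaluate_spans` and the sample data are made up for illustration and aren't part of spaCy or Prodigy.

```python
def evaluate_spans(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans.

    `gold` and `predicted` are parallel lists holding one list of spans per text.
    """
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, predicted):
        gold_set, pred_set = set(gold_spans), set(pred_spans)
        tp += len(gold_set & pred_set)   # spans the rules got exactly right
        fp += len(pred_set - gold_set)   # spans the rules produced in error
        fn += len(gold_set - pred_set)   # gold spans the rules missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: two texts, gold annotations vs. what the rules matched.
gold = [
    [(0, 9, "ISSUER")],
    [(4, 12, "ISSUER"), (20, 28, "ISSUER")],
]
rules = [
    [(0, 9, "ISSUER")],
    [(4, 12, "ISSUER"), (30, 35, "ISSUER")],
]
scores = evaluate_spans(gold, rules)
print(scores)
```

If the rules alone already score very high on a representative sample, that's a good hint that a statistical model won't add much.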

Oh, I meant you could do something like: extract all ORG entities and save the result in Prodigy's JSONL format. Then change the label from ORG to ISSUER, load the file into Prodigy and annotate the examples. Even if only 25% of those orgs are issuers, that's still 25% less manual work for you.
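Roughly, the relabelling step could look like the sketch below. It assumes the extracted entities are already in Prodigy's task format (a `"text"` plus a list of `"spans"` with `"start"`, `"end"` and `"label"`); the helper names `relabel_tasks` and `to_jsonl` and the example text are hypothetical.

```python
import json


def relabel_tasks(tasks, old_label="ORG", new_label="ISSUER"):
    """Keep only spans labelled `old_label` and rename them to `new_label`."""
    for task in tasks:
        spans = [
            {**span, "label": new_label}  # copy the span, don't mutate the input
            for span in task.get("spans", [])
            if span["label"] == old_label
        ]
        if spans:  # skip texts that have no candidate to review
            yield {"text": task["text"], "spans": spans}


def to_jsonl(tasks):
    """Serialise tasks as newline-delimited JSON, one task per line."""
    return "\n".join(json.dumps(task) for task in tasks)


# Hypothetical output of the extraction step.
extracted = [
    {
        "text": "Acme Corp issued bonds in Paris.",
        "spans": [
            {"start": 0, "end": 9, "label": "ORG"},
            {"start": 26, "end": 31, "label": "GPE"},
        ],
    },
    {"text": "Nothing to see here.", "spans": []},
]
issuer_tasks = list(relabel_tasks(extracted))
jsonl = to_jsonl(issuer_tasks)
print(jsonl)
```

You'd write that string to a `.jsonl` file and load it with a manual recipe, so the pre-highlighted ISSUER spans only need to be confirmed or corrected.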

This depends on what you're trying to do. The very abstract patterns are unlikely to accurately capture the spans you're looking for. So it really comes down to what gives you the best results.

Both are possible and have different implications. See the documentation here: Rule-based matching · spaCy Usage Documentation

The entity ruler is designed to integrate with spaCy’s existing statistical models and enhance the named entity recognizer. If it’s added before the "ner" component, the entity recognizer will respect the existing entity spans and adjust its predictions around it. This can significantly improve accuracy in some cases. If it’s added after the "ner" component, the entity ruler will only add spans to the doc.ents if they don’t overlap with existing entities predicted by the model. To overwrite overlapping entities, you can set overwrite_ents=True on initialization.
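To illustrate the "after ner" case, here is a simplified pure-Python model of the overlap rule described in that passage. It is not spaCy's implementation, just a sketch that assumes entities are `(start, end, label)` token spans with an exclusive end; `merge_rule_spans` is a made-up helper, and only the `overwrite_ents` flag name is taken from the docs.

```python
def overlaps(a, b):
    """True if two (start, end, label) spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def merge_rule_spans(model_ents, rule_spans, overwrite_ents=False):
    """Combine model entities with rule matches, mimicking the documented rule."""
    ents = list(model_ents)
    for span in rule_spans:
        clashes = [ent for ent in ents if overlaps(ent, span)]
        if clashes and not overwrite_ents:
            continue  # default: existing entities win, the rule match is dropped
        ents = [ent for ent in ents if ent not in clashes]  # rule match wins
        ents.append(span)
    return sorted(ents)


# Hypothetical case: the model predicted a two-token ORG, the rules match a
# three-token ISSUER over the same tokens plus an unrelated ISSUER later on.
model_ents = [(0, 2, "ORG")]
rule_spans = [(0, 3, "ISSUER"), (5, 6, "ISSUER")]

kept = merge_rule_spans(model_ents, rule_spans)
replaced = merge_rule_spans(model_ents, rule_spans, overwrite_ents=True)
print(kept)      # [(0, 2, 'ORG'), (5, 6, 'ISSUER')]
print(replaced)  # [(0, 3, 'ISSUER'), (5, 6, 'ISSUER')]
```

So without `overwrite_ents`, the rules can only fill gaps the model left open. With it, any overlapping model prediction is replaced by the rule match.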

If you just annotate, the model won't be changed at all. If you later use that data to train a model, the statistical model will be updated, not the entity ruler (which is really just a collection of static rules). My comment here explains some of this in more detail.
