EntityRuler and ner.match - different behavior

@Anji.Vaidyula It really sounds like the problem is the capitalisation then!

Are you sure you want to train a statistical model for this, then? If you haven't done it already, it might be worth running a quick evaluation to see what your baseline is. For instance, if you're already getting 95% accuracy using only your rules, training a model may be kind of a waste of time.
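A quick baseline check like that could be sketched roughly as follows. This is only an illustration: `evaluate_spans` is a hypothetical helper (not part of spaCy or Prodigy), and it assumes you have a small hand-labelled set where both the gold and the rule-based entities are available as `(start, end, label)` tuples per text.

```python
def evaluate_spans(gold, predicted):
    """Compare rule-based spans against gold spans, example by example.

    gold, predicted: lists of sets of (start, end, label) tuples,
    one set per text, in the same order. Only exact matches count.
    """
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, predicted):
        tp += len(gold_spans & pred_spans)   # spans the rules got right
        fp += len(pred_spans - gold_spans)   # spans the rules invented
        fn += len(gold_spans - pred_spans)   # spans the rules missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f_score": f_score}


# Made-up offsets, purely for illustration
gold = [{(0, 9, "ISSUER")}, {(4, 12, "ISSUER"), (20, 25, "ISSUER")}]
rules = [{(0, 9, "ISSUER")}, {(4, 12, "ISSUER"), (30, 35, "ISSUER")}]
scores = evaluate_spans(gold, rules)
print(scores)
```

If the numbers from the rules alone are already high enough for your use case, that's a good sign you may not need a trained model at all.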

Oh, I meant you could do something like: extract all ORG entities and save the result in Prodigy's JSONL format. Then change the label from ORG to ISSUER, load the file into Prodigy and annotate the examples. Even if only 25% of those orgs are issuers, that's still 25% less manual work for you.
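Here's a minimal sketch of that relabelling step. The helper names `relabel_to_tasks` and `write_jsonl` and the file name are made up for this example, and it assumes the entities are already available as character offsets. With spaCy, you could build that input from `(doc.text, [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents])`.

```python
import json


def relabel_to_tasks(examples, source_label="ORG", target_label="ISSUER"):
    """Turn (text, entities) pairs into task dicts with a "text" and "spans".

    examples: iterable of (text, [(start_char, end_char, label), ...]).
    Only entities labelled source_label are kept, relabelled as target_label.
    Texts without any matching entity are skipped.
    """
    for text, ents in examples:
        spans = [
            {"start": start, "end": end, "label": target_label}
            for start, end, label in ents
            if label == source_label
        ]
        if spans:
            yield {"text": text, "spans": spans}


def write_jsonl(path, tasks):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")


# Hypothetical input, as it might come out of a pretrained pipeline
examples = [
    ("Acme Corp issued new bonds.", [(0, 9, "ORG")]),
    ("Nothing to see here.", []),
]
tasks = list(relabel_to_tasks(examples))
write_jsonl("issuer_candidates.jsonl", tasks)
```

The resulting file can then be used as the input source for a manual annotation session, where you accept, correct or remove the pre-highlighted ISSUER spans.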

This depends on what you're trying to do. The very abstract patterns are unlikely to accurately capture the spans you're looking for. So it really comes down to what gives you the best results.

Both are possible and have different implications. See the documentation here: Rule-based matching · spaCy Usage Documentation

The entity ruler is designed to integrate with spaCy’s existing statistical models and enhance the named entity recognizer. If it’s added before the "ner" component, the entity recognizer will respect the existing entity spans and adjust its predictions around it. This can significantly improve accuracy in some cases. If it’s added after the "ner" component, the entity ruler will only add spans to the doc.ents if they don’t overlap with existing entities predicted by the model. To overwrite overlapping entities, you can set overwrite_ents=True on initialization.
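To make the "after" case a bit more concrete, here's a small sketch. It assumes the spaCy v3 API (`nlp.add_pipe("entity_ruler", ...)`; in v2 you'd construct `EntityRuler(nlp, overwrite_ents=True)` yourself), and it uses a blank pipeline with a hand-set ORG span as a stand-in for a model prediction, so no trained model is needed. The pattern, text and the function name `run_ruler_after_model` are made up for illustration.

```python
import spacy
from spacy.tokens import Span


def run_ruler_after_model(overwrite_ents):
    """Apply an entity ruler to a doc that already has an overlapping entity."""
    nlp = spacy.blank("en")
    ruler = nlp.add_pipe("entity_ruler", config={"overwrite_ents": overwrite_ents})
    ruler.add_patterns([{"label": "ISSUER", "pattern": "Acme Corp"}])

    doc = nlp.make_doc("Acme Corp issued new bonds.")
    # Stand-in for what the "ner" component might have predicted
    doc.ents = [Span(doc, 0, 2, label="ORG")]
    # Run the ruler on top of the existing entities
    doc = ruler(doc)
    return [(ent.text, ent.label_) for ent in doc.ents]


print(run_ruler_after_model(overwrite_ents=False))  # existing ORG span is kept
print(run_ruler_after_model(overwrite_ents=True))   # rule replaces it with ISSUER
```

In a real pipeline with a trained model, you'd control the order with `before="ner"` or `after="ner"` when adding the component instead of setting `doc.ents` by hand.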

If you just annotate, the model won't be changed at all. If you do use that data to train from later on, the statistical model will be updated, not the entity ruler (which is really just a collection of static rules). My comment here explains some of this in more detail: