NER or PhraseMatcher?

Yes, that’s correct. The entity recognizer will respect entity spans that were already set by previous pipeline components, and use their boundaries as constraints for its predictions.
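Here’s a minimal sketch of that behaviour, using spaCy v3 syntax and assuming you have `en_core_web_sm` installed. A rule-based `EntityRuler` added before the `"ner"` component sets its spans first, and the statistical model only predicts entities around them:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Rule-based component added *before* the statistical NER component
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([{"label": "STREET", "pattern": "Richard Wagner Straße"}])

doc = nlp("We met at Richard Wagner Straße and talked about Richard Wagner.")
print([(ent.text, ent.label_) for ent in doc.ents])
# The span matched by the ruler keeps its STREET label; the model can still
# predict other entities (likely "Richard Wagner" as PERSON) around it.
```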

If you have good rules, you could also use them to bootstrap training data for your model and improve the entity recognizer, without having to label anything from scratch. This would then allow you to go beyond your rules and be able to label, say, “May-Ayim-Ufer 9”, even if none of those components were part of your gazetteer. Here’s an example of a possible workflow:

  1. Create gazetteers for your categories and write rules to handle ambiguity (e.g. “Richard Wagner” vs. “Richard Wagner Straße”).
  2. Add your rule-based component to your spaCy pipeline, parse lots of text and extract the text plus entities (see the sketch after this list).
  3. Load the data into Prodigy and run ner.manual to see the entities and correct them if necessary. If your rules are good and 90% accurate, you only have to change something in about 10% of the cases. So this should be super quick.
  4. Use the created data as gold-standard training data for your model.
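To make steps 2–3 more concrete, here’s a rough sketch of a PhraseMatcher-based gazetteer component that sets entities and dumps text plus span offsets as JSONL, which you can then load into Prodigy to correct. The gazetteer terms, the component name `street_gazetteer` and the file name are just made up for illustration:

```python
import json
import spacy
from spacy.language import Language
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
from spacy.util import filter_spans

STREET_TERMS = ["Richard Wagner Straße", "May-Ayim-Ufer"]  # your gazetteer

nlp = spacy.blank("de")
matcher = PhraseMatcher(nlp.vocab)
matcher.add("STREET", [nlp.make_doc(term) for term in STREET_TERMS])

@Language.component("street_gazetteer")
def street_gazetteer(doc):
    # Turn matches into entity spans, dropping overlaps
    spans = [Span(doc, start, end, label="STREET") for _, start, end in matcher(doc)]
    doc.ents = filter_spans(spans)
    return doc

nlp.add_pipe("street_gazetteer")

# Parse lots of text and write out text + entity offsets in a JSONL format
# that Prodigy's ner.manual can load for correction.
texts = ["Wir treffen uns am May-Ayim-Ufer in Berlin."]
with open("bootstrap_data.jsonl", "w", encoding="utf8") as f:
    for doc in nlp.pipe(texts):
        spans = [{"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
                 for ent in doc.ents]
        f.write(json.dumps({"text": doc.text, "spans": spans}) + "\n")
```

Once you’ve corrected the annotations in Prodigy, you can export them and use them as gold-standard training data for the entity recognizer (step 4).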