I want to add a new pipeline component (EntityMatcher), and I am following an example presented here.
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
list_of_drugs = ['insulin', 'aspirin', 'humalog', 'lantus', 'tamsulosin', 'amlodipine']
class EntityMatcher(object):
    name = 'entity_matcher'

    def __init__(self, nlp, terms, label):
        patterns = [nlp(term) for term in terms]
        self.matcher = PhraseMatcher(nlp.vocab)
        self.matcher.add(label, None, *patterns)

    def __call__(self, doc):
        matches = self.matcher(doc)
        spans = []
        for label, start, end in matches:
            span = Span(doc, start, end, label=label)
            spans.append(span)
        doc.ents = spans
        return doc
Now I have two options. First, I start with the original pipeline:
nlp = spacy.load('en_core_web_lg')
doc = nlp(u'Apple is looking at buying U.K. aspirin and tamsulosin startup for $1 billion')
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 52 62 MONEY
Then I want to add a new component:
entity_matcher = EntityMatcher(nlp, list_of_drugs, 'DRUG')
nlp.add_pipe(entity_matcher)
print(nlp.pipe_names)
['tagger', 'parser', 'ner', 'entity_matcher']
Applying this extended pipeline to a similar sentence then gives only the drugs:
doc = nlp(u'Apple is looking at buying U.K. startup for production of aspirin and tamsulosin for $1 billion')
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
aspirin 58 65 DRUG
tamsulosin 70 80 DRUG
I’m sure I’m missing something here, but I couldn’t find it in the docs. Is there a simple tweak to combine the original NER with the custom one?
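My only guess so far, which I have not confirmed, is that the assignment `doc.ents = spans` in `__call__` replaces whatever the `ner` component set earlier instead of adding to it. To make the behaviour I am after concrete, here is a minimal sketch that uses plain `(start_char, end_char, label)` tuples rather than spaCy objects. `find_span` and `merge_entities` are hypothetical helpers of mine, not spaCy functions, and the ORG/GPE/MONEY entries are what I assume the statistical NER would return for this sentence, not the result of a real run.

```python
text = ('Apple is looking at buying U.K. startup for production of '
        'aspirin and tamsulosin for $1 billion')


def find_span(text, phrase, label):
    """Return (start_char, end_char, label) for the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


# Assumed output of the statistical 'ner' component (illustrative only).
ner_ents = [
    find_span(text, 'Apple', 'ORG'),
    find_span(text, 'U.K.', 'GPE'),
    find_span(text, '$1 billion', 'MONEY'),
]

# What the custom matcher finds for the drug list.
matcher_ents = [
    find_span(text, 'aspirin', 'DRUG'),
    find_span(text, 'tamsulosin', 'DRUG'),
]


def merge_entities(existing, new):
    """Keep every existing entity and add the new ones that do not overlap any kept entity."""
    merged = list(existing)
    for start, end, label in new:
        # Two half-open ranges overlap if each one starts before the other ends.
        overlaps = any(start < e_end and e_start < end
                       for e_start, e_end, _ in merged)
        if not overlaps:
            merged.append((start, end, label))
    return sorted(merged)


for start, end, label in merge_entities(ner_ents, matcher_ents):
    print(text[start:end], start, end, label)
```

In the component itself I would expect the equivalent to be extending the existing `doc.ents` with the matched spans instead of overwriting them, but I do not know whether that is the intended way to do it or how overlapping spans should be handled.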