For important reasons, we train and maintain two entirely separate models. I want to pass the spaCy document produced by model A to model B, so that model B adds to the entities model A has already tagged.
But model B seems to discard all the entities set by model A. How can I supplement the entities tagged by model A without ever overwriting them? My intuition is that this can be done by adding model B as a custom pipeline component to model A, but even then, wouldn't the entity attributes be overwritten?
I'm currently working around this "manually" with the code below, but is there a spaCy way to preserve entities?
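from spacy.pipeline import merge_entities
# `med` and `drug` are the two separately trained spaCy pipelines, loaded earlier (not shown).
# merge_entities collapses each multi-token entity into a single token (spaCy v2-style add_pipe).
med.add_pipe(merge_entities)
drug.add_pipe(merge_entities)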
attributes = {'STRENGTH', 'FREQUENCY', 'DURATION', 'FORM', 'ROUTE', 'DOSAGE'}
sent = 'Patient took insulin glargine as needed and she took tylenol for two weeks.'
a = drug(sent)
b = med(sent)
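# Keep (label, text) for DRUG tokens from the drug model; every other token becomes None.
indexes = [(token.ent_type_, token.text) if token.ent_type_ == 'DRUG' else None for token in a]
# Keep (label, text) for attribute tokens from the med model; every other token becomes None.
indexes2 = [(token.ent_type_, token.text) if token.ent_type_ in attributes else None for token in b]
print(len(indexes), indexes)
print(len(indexes2), indexes2)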
>>> 13 [None, None, ('DRUG', 'insulin glargine'), None, None, None, None, None, ('DRUG', 'tylenol'), None, None, None, None]
>>> 10 [None, None, None, ('FREQUENCY', 'as needed'), None, None, None, None, ('DURATION', 'for two weeks'), None]
# The goal: 'Patient took insulin glargine as needed and she took tylenol for two weeks.'
>>> 10 [None, None, ('DRUG', 'insulin glargine'), ('FREQUENCY', 'as needed'), None, None, None, ('DRUG','tylenol'), ('DURATION', 'for two weeks'), None]
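To make the "never overwrite model A" rule concrete, here is a minimal sketch of the merge logic in plain Python, independent of spaCy. It works on (start_char, end_char, label) tuples rather than tokens, so the differing tokenizations of the two merged documents don't matter. The helpers `merge_spans` and `span_of` are hypothetical names made up for illustration, the offsets are computed from the example sentence rather than taken from real model output, and the conflicting 'FORM' span is invented to show the priority rule.

```python
# Sketch: combine entity spans from two models without overwriting the first model's spans.
# Spans are (start_char, end_char, label) tuples. All names here are illustrative.

def merge_spans(primary, secondary):
    """Return the primary spans plus any secondary spans that do not overlap a kept span."""
    merged = list(primary)
    for start, end, label in secondary:
        # Two half-open character ranges overlap if each starts before the other ends.
        overlaps = any(start < k_end and k_start < end for k_start, k_end, _ in merged)
        if not overlaps:
            merged.append((start, end, label))
    return sorted(merged)


def span_of(text, phrase, label):
    """Locate the first occurrence of phrase in text and return its labeled span."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


sent = 'Patient took insulin glargine as needed and she took tylenol for two weeks.'

# Stand-ins for the entities each model would produce.
drug_spans = [span_of(sent, 'insulin glargine', 'DRUG'),
              span_of(sent, 'tylenol', 'DRUG')]
med_spans = [span_of(sent, 'as needed', 'FREQUENCY'),
             span_of(sent, 'for two weeks', 'DURATION'),
             span_of(sent, 'insulin', 'FORM')]  # hypothetical conflict with a DRUG span

combined = merge_spans(drug_spans, med_spans)
labeled = [(label, sent[start:end]) for start, end, label in combined]
print(labeled)
```

With real documents, the tuples would presumably come from each entity's `start_char`, `end_char` and `label_`, and the merged offsets could be turned back into spans with `doc.char_span(...)` before assigning them to `doc.ents`. That last step assumes the offsets line up with token boundaries in the target document, which I haven't verified for these models.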