Merge Entities Error

@ronnie Sorry if this was confusing and frustrating – we hadn’t thought this through from end to end, so there’s currently an awkward gap here. But the next update to spaCy will include factories for both merge_entities and merge_noun_chunks out of the box. This means that when you load your model and the pipeline specifies one of those components, spaCy will know what to do. (We’re actually just working on that!)

In the meantime, the simplest fix would be to remove 'merge_entities' from your meta.json and re-add the function manually after loading the model. From within a Prodigy recipe, you can also just import the component as prodigy.components.preprocess.merge_entities.

import spacy

def merge_entities(doc):
    # Collect character offsets first, so they stay valid while tokens are merged
    spans = [(e.start_char, e.end_char, e.root.tag, e.root.dep, e.label)
             for e in doc.ents]
    for start, end, tag, dep, ent_type in spans:
        doc.merge(start, end, tag=tag, dep=dep, ent_type=ent_type)
    return doc

nlp = spacy.load('/path/to/your/model')
# Run the merger right after the entity recognizer
nlp.add_pipe(merge_entities, name='merge_entities', after='ner')
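
If you'd rather not edit meta.json by hand, removing the entry could look roughly like this. It's only a sketch: it assumes the component names live in a list under the "pipeline" key of your meta.json, and the helper name remove_pipe_from_meta is made up for illustration.

```python
import json
from pathlib import Path


def remove_pipe_from_meta(meta_path, pipe_name="merge_entities"):
    """Drop pipe_name from the "pipeline" list of a meta.json file.

    Hypothetical helper. Returns True if the file was changed.
    """
    meta_path = Path(meta_path)
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    # Assumption: component names are stored as a list under "pipeline"
    pipeline = meta.get("pipeline", [])
    if pipe_name not in pipeline:
        return False
    meta["pipeline"] = [name for name in pipeline if name != pipe_name]
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf8")
    return True
```

You'd call it once as remove_pipe_from_meta('/path/to/your/model/meta.json') and then load the model and add the component manually as shown above.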

The above solution still means you have to do this manually after loading the model. A more elegant solution would be to include the component in your model’s __init__.py and then add a factory to Language that lets spaCy initialise your component. My comment on this thread has more details on this.

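from spacy.language import Language

def entity_merger(nlp, **cfg):
    # Factory: called with the nlp object, returns the component function
    return merge_entities

Language.factories['merge_entities'] = lambda nlp, **cfg: entity_merger(nlp, **cfg)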

You can then package your model with spacy package (this is important, because you want spaCy to execute the package and its __init__.py!) and it will be able to load the merge_entities component. However, since spaCy will be providing a built-in factory for this, you hopefully won’t have to implement this yourself! (It might be useful in the future, though, if you ever end up writing more complex custom components.)
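
To make the factory idea a bit more concrete, here's a toy stand-in for what happens at load time. This is not spaCy's actual code, just a plain-Python sketch: FACTORIES, build_pipeline and the fake merge_entities (which works on a plain list instead of a Doc) are all made-up for illustration. The only assumption is that each name in the model's pipeline is looked up in a dict of factories and called with nlp to get the component.

```python
# Toy registry: maps a pipeline name to a factory (nlp, **cfg) -> component.
# Stand-in for Language.factories, not the real thing.
FACTORIES = {}


def merge_entities(doc):
    # Placeholder component: just records that it ran on the "doc" (a list).
    doc.append("merge_entities")
    return doc


def entity_merger(nlp, **cfg):
    # The factory ignores nlp and cfg here and simply returns the component.
    return merge_entities


FACTORIES["merge_entities"] = entity_merger


def build_pipeline(nlp, pipe_names):
    """Resolve each pipeline name to a component via its factory."""
    pipeline = []
    for name in pipe_names:
        if name not in FACTORIES:
            # Roughly the situation described above: the pipeline lists a
            # component, but nothing knows how to create it.
            raise KeyError("Can't find factory for '{}'".format(name))
        pipeline.append((name, FACTORIES[name](nlp)))
    return pipeline
```

With the factory registered, build_pipeline(None, ["merge_entities"]) gives back the component. Without it, the lookup fails, which is the gap the built-in factories are meant to close.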
