Merge Entities Error

Tags: terms, spacy, done

(Madhu Jahagirdar) #1

After I trained model using Merge Entities , I am getting the following error. Do i need to install anything ?

/home/madhujahagirdar/bionlp-gpu/venv/lib/python3.5/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
“”")
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/madhujahagirdar/bionlp-gpu/venv/lib/python3.5/site-packages/prodigy/__main__.py", line 248, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 150, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/madhujahagirdar/bionlp-gpu/venv/lib/python3.5/site-packages/plac_core.py", line 328, in __call__
    cmd, result = parser.consume(arglist)
  File "/home/madhujahagirdar/bionlp-gpu/venv/lib/python3.5/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/madhujahagirdar/bionlp-gpu/venv/lib/python3.5/site-packages/prodigy/recipes/textcat.py", line 106, in batch_train
    nlp = spacy.load(input_model, disable=['ner'])
  File "/home/madhujahagirdar/bionlp-gpu/venv/lib/python3.5/site-packages/spacy/__init__.py", line 19, in load
    return util.load_model(name, **overrides)
  File "/home/madhujahagirdar/bionlp-gpu/venv/lib/python3.5/site-packages/spacy/util.py", line 117, in load_model
    return load_model_from_path(Path(name), **overrides)
  File "/home/madhujahagirdar/bionlp-gpu/venv/lib/python3.5/site-packages/spacy/util.py", line 157, in load_model_from_path
    component = nlp.create_pipe(name, config=config)
  File "/home/madhujahagirdar/bionlp-gpu/venv/lib/python3.5/site-packages/spacy/language.py", line 215, in create_pipe
    raise KeyError("Can't find factory for '{}'.".format(name))
KeyError: "Can't find factory for 'merge_entities'."


(Ines Montani) #2

Thanks for the report! The problem here is that terms.train-vectors adds a new merge_entities component to the pipeline, which is then recorded in the model’s meta.json. So when you load the model back in, spaCy tries to find a factory for that component to initialise it (just like it does for the 'tagger' or 'parser').

Sorry about that – the way this is currently handled isn’t ideal, and we need to go back and think about how best to solve it. For now, you can simply remove the 'merge_entities' component from the "pipeline" setting in your model’s meta.json and add the component manually after loading the model:

from prodigy.components.preprocess import merge_entities

nlp = spacy.load('your_model')
nlp.add_pipe(merge_entities, name='merge_entities')
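If you’d rather script the meta.json edit than do it by hand, here’s a minimal sketch – the helper name is my own, and the idea is just to filter the component out of the "pipeline" list:

```python
import json
from pathlib import Path

def remove_pipe_from_meta(model_path, component):
    """Remove a pipeline component from a model's meta.json in place."""
    meta_path = Path(model_path) / 'meta.json'
    meta = json.loads(meta_path.read_text())
    # Drop the offending entry, e.g. 'merge_entities'
    meta['pipeline'] = [p for p in meta['pipeline'] if p != component]
    meta_path.write_text(json.dumps(meta, indent=4))
    return meta['pipeline']
```

After this, spacy.load() won’t go looking for a 'merge_entities' factory, and you can re-add the component via nlp.add_pipe() as shown above.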

This ensures that the entities are merged so the vectors you’ve trained for the merged entities are available. Here’s the function for reference:

def merge_entities(doc):
    """Preprocess a spaCy doc, merging entities into a single token.
    Best used with nlp.add_pipe(merge_entities).

    doc (spacy.tokens.Doc): The Doc object.
    RETURNS (Doc): The Doc object with merged noun entities.
    """
    spans = [(e.start_char, e.end_char, e.root.tag, e.root.dep, e.label)
             for e in doc.ents]
    for start, end, tag, dep, ent_type in spans:
        doc.merge(start, end, tag=tag, dep=dep, ent_type=ent_type)
    return doc

Alternatively, you could also package your model using the spacy package command and add an entry to Language.factories that initialises the pipeline component – my comments on this thread have more details on this solution.


(Madhu Jahagirdar) #3

What would be the method for merge_noun_chunks?

raise KeyError("Can't find factory for '{}'.".format(name))
KeyError: "Can't find factory for 'merge_noun_chunks'."


(Ines Montani) #4

Sorry, I should have added that one as well. It’s also a preprocessor, so you can import it and add the Prodigy component to your pipeline:

from prodigy.components.preprocess import merge_noun_chunks

nlp = spacy.load('your_model')
nlp.add_pipe(merge_noun_chunks, name='merge_noun_chunks')

Or use the function instead:

def merge_noun_chunks(doc):
    """Preprocess a spaCy Doc, merging noun chunks. Best used with
    nlp.add_pipe(merge_noun_chunks).

    doc (spacy.tokens.Doc): The Doc object.
    RETURNS (Doc): The Doc object with merged noun chunks.
    """
    if not doc.is_parsed:
        return doc
    spans = [(np.start_char, np.end_char, np.root.tag, np.root.dep)
             for np in doc.noun_chunks]
    for start, end, tag, dep in spans:
        doc.merge(start, end, tag=tag, dep=dep)
    return doc
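One detail worth noting in both functions: the (start, end, …) tuples are snapshotted into a list before any merging happens, because mutating the Doc while iterating over doc.ents or doc.noun_chunks would invalidate the spans mid-loop. Here’s the same idea illustrated with a plain token list and token indices (no spaCy involved – with token indices the merges also have to be applied right-to-left, so earlier merges can’t shift later indices):

```python
def merge_token_spans(tokens, spans):
    """Merge each (start, end) token range into a single token.
    Spans are snapshotted and applied right-to-left, so a merge
    never shifts the indices of spans not yet processed.
    """
    for start, end in sorted(spans, reverse=True):
        tokens[start:end] = [' '.join(tokens[start:end])]
    return tokens

merge_token_spans(['New', 'York', 'is', 'a', 'city'], [(0, 2)])
# -> ['New York', 'is', 'a', 'city']
```

(The spaCy functions above sidestep the index-shifting problem differently: doc.merge() takes character offsets, which earlier merges don’t move.)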

(Madhu Jahagirdar) #5

from prodigy.components.preprocess import merge_noun_chunks
from prodigy.components.preprocess import merge_entities

nlp = spacy.load("/Users/philips/Development/BigData/RS/annotation/Prodigy/Classification_Model/followup_recommendation_radreportw2veconly/")
nlp.add_pipe(merge_noun_chunks, name='merge_noun_chunks')
nlp.add_pipe(merge_entities, name='merge_entities')

I get the following error:


AttributeError                            Traceback (most recent call last)
in <module>()
      2 from prodigy.components.preprocess import merge_entities
      3
----> 4 nlp = spacy.load("/Users/philips/Development/BigData/RS/annotation/Prodigy/Classification_Model/followup_recommendation_radreportw2veconly/")
      5 nlp.add_pipe(merge_noun_chunks, name='merge_noun_chunks')
      6 nlp.add_pipe(merge_entities, name='merge_entities')

~/Development/BigData/RS/annotation/venv/lib/python3.5/site-packages/spacy/__init__.py in load(name, **overrides)
     17             "to load. For example:\nnlp = spacy.load('{}')".format(depr_path),
     18             'error')
---> 19     return util.load_model(name, **overrides)
     20
     21

~/Development/BigData/RS/annotation/venv/lib/python3.5/site-packages/spacy/util.py in load_model(name, **overrides)
    115         return load_model_from_package(name, **overrides)
    116     if Path(name).exists():  # path to model data directory
--> 117         return load_model_from_path(Path(name), **overrides)
    118     elif hasattr(name, 'exists'):  # Path or Path-like to model data
    119         return load_model_from_path(name, **overrides)

~/Development/BigData/RS/annotation/venv/lib/python3.5/site-packages/spacy/util.py in load_model_from_path(model_path, meta, **overrides)
    157         component = nlp.create_pipe(name, config=config)
    158         nlp.add_pipe(component, name=name)
--> 159     return nlp.from_disk(model_path)
    160
    161

~/Development/BigData/RS/annotation/venv/lib/python3.5/site-packages/spacy/language.py in from_disk(self, path, disable)
    636         if not (path / 'vocab').exists():
    637             exclude['vocab'] = True
--> 638         util.from_disk(path, deserializers, exclude)
    639         self._path = path
    640         return self

~/Development/BigData/RS/annotation/venv/lib/python3.5/site-packages/spacy/util.py in from_disk(path, readers, exclude)
    520     for key, reader in readers.items():
    521         if key not in exclude:
--> 522             reader(path / key)
    523     return path
    524

~/Development/BigData/RS/annotation/venv/lib/python3.5/site-packages/spacy/language.py in <lambda>(p, proc)
    632         if not hasattr(proc, 'to_disk'):
    633             continue
--> 634         deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
    635     exclude = {p: False for p in disable}
    636     if not (path / 'vocab').exists():

nn_parser.pyx in spacy.syntax.nn_parser.Parser.from_disk()

~/Development/BigData/RS/annotation/venv/lib/python3.5/site-packages/thinc/neural/_classes/model.py in from_bytes(self, bytes_data)
    349         if isinstance(name, bytes):
    350             name = name.decode('utf8')
--> 351         dest = getattr(layer, name)
    352         copy_array(dest, param[b'value'])
    353         i += 1

AttributeError: 'FunctionLayer' object has no attribute 'vectors'

Meta.json

{
  "license": "CC BY-SA 3.0",
  "url": "https://explosion.ai",
  "lang": "en",
  "sources": [
    "OntoNotes 5",
    "Common Crawl"
  ],
  "name": "core_web_sm",
  "pipeline": [
    "tagger",
    "parser",
    "textcat"
  ],
  "version": "2.0.0",
  "description": "English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.",
  "email": "contact@explosion.ai",
  "speed": {
    "gpu": null,
    "nwords": 291344,
    "cpu": 5122.3040471407
  },
  "parent_package": "spacy",
  "spacy_version": ">=2.0.0a18",
  "author": "Explosion AI",
  "accuracy": {
    "uas": 91.7237657538,
    "ents_f": 85.2975560875,
    "ents_r": 85.6312524451,
    "ents_p": 84.9664503965,
    "tags_acc": 97.0403350292,
    "las": 89.800872413,
    "token_acc": 99.8698372794
  },
  "vectors": {
    "vectors": 569319,
    "width": 300,
    "keys": 503161
  }
}


(Ronnie Taarnborg) #6

Hi!

I’m not sure if I should start a new topic or post my (what I think is related) question here? But here goes (sorry in advance if it should be a separate topic):

I have trained a Danish word2vec model on 2.2 million posts from Facebook pages belonging to Danish media sites and politicians, with the intent of building a topic classifier (inspired by @ines’ video tutorial on how to train an insult classifier). I’ve trained the model using the terms.train-vectors recipe with the merge entities flag. However, I get a similar error when using the terms.teach recipe with the trained model.

If I remove the merge_entities component from the meta.json everything works fine, but obviously the merge_entities component is not used.

Is it possible to modify the terms.teach recipe so that it includes the merge_entities component?

Thanks!


(Ines Montani) #7

@ronnie Sorry if this was confusing and frustrating – we hadn’t thought this through from end to end, so there’s currently an awkward gap here. But the next update to spaCy will include factories for both merge_entities and merge_noun_chunks out of the box. This means that when you load your model and the pipeline specifies one of those components, spaCy will know what to do. (We’re actually just working on that!)

In the meantime, the simplest fix would be to remove the 'merge_entities' entry from your meta.json and re-add the function manually. From within a Prodigy recipe, you can also just import the component as prodigy.components.preprocess.merge_entities.

def merge_entities(doc):
    spans = [(e.start_char, e.end_char, e.root.tag, e.root.dep, e.label)
             for e in doc.ents]
    for start, end, tag, dep, ent_type in spans:
        doc.merge(start, end, tag=tag, dep=dep, ent_type=ent_type)
    return doc
nlp = spacy.load('/path/to/your/model')
nlp.add_pipe(merge_entities, name='merge_entities', after='ner')

The above solution still means you have to do this manually after loading the model. A more elegant solution would be to include the component in your model’s __init__.py and then add a factory to Language that lets spaCy initialise your component. My comment on this thread has more details on this.

def entity_merger(nlp, **cfg):
    return merge_entities

Language.factories['merge_entities'] = lambda nlp, **cfg: entity_merger(nlp, **cfg)

You can then package your model with spacy package (this is important, because you want spaCy to execute the package and its __init__.py!) and it will be able to load the merge_entities component. However, since spaCy will be providing a built-in factory for this, you hopefully won’t have to implement this yourself! (It might be useful in the future, though, if you ever end up writing more complex custom components.)


(Ines Montani) #8

Quick update: The following commit adds merge_entities and merge_noun_chunks as built-in factories, so spaCy will be able to create and add them if they’re present in a model’s meta.json, without requiring custom modifications. The fix will be included in the next spaCy release.


(Ines Montani) #9

Just released spaCy v2.0.10, which includes built-in factories for merge_entities and merge_noun_chunks. The new version is compatible with Prodigy v1.4.0, so you should be able to run the following in your Prodigy environment:

pip install -U spacy