Packaging model


I'm trying to package a model with one custom pipe, "entity_optimiser", using the spacy package command. I amended the generated __init__.py to write to Language.factories, and the setup.py to include all packages. On loading, the model can't find my modules or the custom component. Everything works as expected when using the source code.

Could you please take a look below and see whether anything stands out as obviously wrong?

Model directory - abridged

|   meta.json
|       en_cit_nb_ner-20.1.23.tar.gz
|   |   en_cit_nb_ner-20.01.23
|   |   meta.json
|   |
|   |
|   \---en_cit_nb_ner
|       |   meta.json
|       |   tokenizer
|       |
|       |
|       +---entity_optimiser
|       |   |   address_instance_label_prior_vector
|       |   |   data.json
|       |   |
|       |   |
|       |   |
|       |   +---spacy_en_model
|       |   |   |
|       |   |   +---ner
|       |   |   |
|       |   |   +---parser
|       |   |   |
|       |   |   +---tagger
|       |   |   |
|       |   |   \---vocab
|       |
|       +---ner
|       |
|       +---vocab

The __init__.py in the top directory contains:

from __future__ import unicode_literals

from pathlib import Path
from spacy.util import load_model_from_init_py, get_model_meta

from spacy.language import Language

from .en_cit_nb_ner.entity_optimiser.entity_optimiser_master import Entity_Optimiser

__version__ = get_model_meta(Path(__file__).parent)['version']

def load(**overrides):
    return load_model_from_init_py(__file__, **overrides)

Language.factories["entity_optimiser"] = lambda nlp, **cfg: Entity_Optimiser()

Here is the error

Python 3.6.7 (default, Jul  2 2019, 02:21:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import spacy
>>> nlp = spacy.load("en_cit_nb_ner")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda351\envs\python_36\lib\site-packages\spacy\", line 27, in load
    return util.load_model(name, **overrides)
  File "C:\Anaconda351\envs\python_36\lib\site-packages\spacy\", line 134, in load_model
    return load_model_from_package(name, **overrides)
  File "C:\Anaconda351\envs\python_36\lib\site-packages\spacy\", line 154, in load_model_from_package
    cls = importlib.import_module(name)
  File "C:\Anaconda351\envs\python_36\lib\importlib\", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "C:\Anaconda351\envs\python_36\lib\site-packages\en_cit_nb_ner\", line 9, in <module>
    from .en_cit_nb_ner.entity_optimiser.entity_optimiser_master import Entity_Optimiser
  File "C:\Anaconda351\envs\python_36\lib\site-packages\en_cit_nb_ner\en_cit_nb_ner\", line 11, in <module>
    from entity_optimiser_master import Entity_Optimiser
ModuleNotFoundError: No module named 'entity_optimiser_master'

Hi! Are you sure the __init__.py you shared here is the same one that's packed with the model? The line shown in the traceback looks different:

from entity_optimiser_master import Entity_Optimiser

If that's what's in the __init__.py that it's trying to load, the error makes sense, because it uses an absolute import instead of a relative import pointing to the actual module.
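To illustrate the difference between the two import styles, here is a minimal sketch that uses only the standard library. It builds a throwaway package in a temporary directory and tries both import lines. The package name demo_model and the helper functions are made up for this example; only the module and class names are taken from the traceback above.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root, import_line):
    """Create a throwaway package whose __init__.py contains import_line."""
    pkg = Path(root) / "demo_model"  # hypothetical package name
    pkg.mkdir()
    # Stand-in for the module that defines the custom component.
    (pkg / "entity_optimiser_master.py").write_text(
        "class Entity_Optimiser:\n    pass\n"
    )
    (pkg / "__init__.py").write_text(import_line + "\n")
    return pkg


def try_import(import_line):
    """Return 'ok' if the package imports, else the exception class name."""
    with tempfile.TemporaryDirectory() as root:
        build_demo_package(root, import_line)
        sys.path.insert(0, root)
        try:
            sys.modules.pop("demo_model", None)
            importlib.invalidate_caches()
            importlib.import_module("demo_model")
            return "ok"
        except ImportError as err:
            return type(err).__name__
        finally:
            # Clean up so the two attempts do not influence each other.
            sys.path.remove(root)
            sys.modules.pop("demo_model", None)
            sys.modules.pop("demo_model.entity_optimiser_master", None)


# Absolute import: Python searches sys.path, not the package directory.
absolute_result = try_import("from entity_optimiser_master import Entity_Optimiser")
# Relative import: resolved against the package that contains __init__.py.
relative_result = try_import("from .entity_optimiser_master import Entity_Optimiser")
print(absolute_result, relative_result)
```

The absolute form only works when the directory containing entity_optimiser_master.py happens to be on sys.path, which would explain why it can work when running from source but not after installing the package.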

Hi @ines,

When I run spacy package on my model dir, it puts my scripts in a folder called en_cit_nb_ner-20.01.23. This folder contains my custom component. I thought I could change the name of this dir and remove the version number so I could import the module and my component. This however later gave me an issue whereby spacy load couldn't read meta.json.

What is the best practice to get my component added to libraries in the model init?

I hope I'm making sense.

I think it might be easiest to put the other .py file in the same directory as the __init__.py. The component directories are also the directories used to store the model data, and depending on how your component is set up, you don't want it to overwrite your scripts in there.

Looking at your directory structure again, I think another problem here is that you have a model (spacy_en_model) nested within your model package (en_cit_nb_ner). Ideally, your directory structure should look something like this (also see the docs here):

└── your_model
    ├── your_component   # data serialized by "your_component"    
    |   └── data.json    
    ├── ner              # data for "ner" component
    ├── parser           # data for "parser" component
    ├── tagger           # data for "tagger" component
    ├── vocab            # model vocabulary
    ├── meta.json        # model meta.json with name, language and pipeline
    └── tokenizer        # tokenization rules

your_component here could be your entity_optimiser. After you run spacy package, the directory would look like this:

└── packaged model
    ├── meta.json                     # model meta data
    ├── setup.py                      # setup file for pip installation
    └── name_of_your_model            # 📦 model package
        ├── __init__.py               # init for pip installation
        ├── meta.json                 # model meta data
        └── your_model                # model data, see diagram above

This structure needs to stay like this, otherwise Python can't package it. It needs the setup.py at the root, and then a module, e.g. name_of_your_model, with an __init__.py that it can load. This will be the name of the Python package.
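As a quick sanity check before running pip install, the layout described above can be verified with a few lines of standard-library Python. This is only a sketch: check_package_layout is a hypothetical helper, not part of spaCy, and it only checks that the expected files and directories exist, not what they contain.

```python
from pathlib import Path


def check_package_layout(root, package_name):
    """Return a list of problems found in a packaged-model directory.

    Hypothetical helper: it checks only for the files named in the
    layout above and does not validate their contents.
    """
    root = Path(root)
    module_dir = root / package_name
    expected = [
        root / "setup.py",           # setup file for pip installation
        root / "meta.json",          # model meta data
        module_dir / "__init__.py",  # makes the module importable
        module_dir / "meta.json",    # meta data shipped with the module
    ]
    problems = [
        "missing: " + path.relative_to(root).as_posix()
        for path in expected
        if not path.is_file()
    ]
    # The model data is expected in a subdirectory of the module directory.
    if module_dir.is_dir() and not any(p.is_dir() for p in module_dir.iterdir()):
        problems.append("no model data directory inside " + package_name)
    return problems
```

Calling check_package_layout with the path to the packaged model and the package name (for example "en_cit_nb_ner") returns an empty list when all the expected pieces are present, and a list of short messages otherwise.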

Thank you @ines. I'm trying to get my head around your proposed solution. The reason there is a spaCy model nested in there is that my model is a custom NER plus my component, and for one of the entities the component needs noun chunks and spaCy's NER, for which I'm loading one of your basic models. It works when not packaged. Would it not work in a packaged model?

This is my structure in the working...

My high level structure is:

And my component structure is:

|   addresses # dir
|   spacy core_web_sm # dir, contains spacy model used for spacy entities and noun chunks to assist one of the optimiser modules
|   entity_optimiser_master.py # defines my pipe cls Entity_Optimiser
|   prob_features_given_address_labels # pickle file, contains data for one of the optimiser modules
|   address_instance_label_prior_vector # pickle file, contains data for one of the optimiser modules
|   data.json # contains data used by multiple optimiser modules

So the above looks to me as per the docs

When I run spacy package on it (w/out altering any file names), I get:

|   meta.json
    |   meta.json
    |   __init__.py # This is where I'm trying to add to Language.factories from the entity_optimiser module, but I can't import from en_model0-20.01.23
        |   meta.json
        |   tokenizer
        |   |   address_instance_label_prior_vector
        |   |   data.json
        |   |
        |   |
        |   |
        |   |
        |   |
        |   |
        |   |   prob_features_given_address_labels
        |   |
        |   |
        |   +---addresses
        |   +---spacy_en_model

I resolved it and can share my solution shortly. Anna
