Saving a trained NER model as a loadable module

I have a problem with converting a trained NER model into a loadable module. I believe I'm following the steps laid out in the Get Started / First Steps guide and the tutorial video on training an insult recogniser.

The following is going to be slightly long and meandering, mainly because I've included all the output I get so you can see exactly what's happening. (NB: I can see that the Prodigy support page strips the tabs from my code below; I suppose you'll just have to imagine the indentation in the for and if blocks, believe me it's there.)

First, I created a dataset:

prodigy dataset eng_model "English model, version 1.0" --author MN

    ✨  Successfully added 'eng_model' to database SQLite.

Then, I annotated my own training data with the en_core_web_sm model (for entity type 'ORG' only, just to start somewhere) and saved the annotations in the eng_model dataset:

prodigy ner.teach eng_model en_core_web_sm traindata.txt --label ORG

     ✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

^C('\nSaved 1415 annotations to database', 'SQLite')
('Dataset:', 'eng_model')
('Session ID:', '2017-09-25_19-30-47', '\n')

This seemed to work out just fine, as the output above hopefully shows. I went through something like 1300 examples before exhausting my training data, which took me slightly less than an hour (you've made this so easy, thanks for that!)

I then ran the batch-train script to train the model:

prodigy ner.batch-train eng_model en_core_web_sm --output /tmp/model --eval-split 0.5 --label ORG

Loaded model en_core_web_sm
Using 50% of examples (597) for evaluation
Using 100% of remaining examples (601) for training
Dropout: 0.2  Batch size: 32  Iterations: 10  


BEFORE     0.491     
Correct    26
Incorrect  27
Entities   477       
Unknown    205       

     
#          LOSS       RIGHT      WRONG      ENTS       SKIP       ACCURACY  
01         0.610      42         11         483        0          0.792     
02         0.339      48         5          448        0          0.906                                  
03         0.203      47         6          440        0          0.887                                  
04         0.138      46         7          449        0          0.868                                  
05         0.092      48         5          429        0          0.906                                  
06         0.062      48         5          437        0          0.906                                  
07         0.043      49         4          437        0          0.925                                  
08         0.033      46         7          441        0          0.868                                  
09         0.026      48         5          425        0          0.906                                  
10         0.014      47         6          437        0          0.887                                  

Correct    49
Incorrect  4
Baseline   0.491     
Accuracy   0.925     

Model: /tmp/model
Training data: /tmp/model/training.jsonl
Evaluation data: /tmp/model/evaluation.jsonl

So, as far as I can see, everything worked out, and the updated model was placed in /tmp/model.

To test if my updated model actually made a difference I wrote a small script (derived from one of the examples on the prodigy website):


import spacy
import en_core_web_sm

text = ''' ...(not shown)... '''

# Print entity labels and text for the untrained model:
nlp = spacy.load('en_core_web_sm')
doc = nlp(text)
print("\nEntities found before training:")
for ent in doc.ents:
    if ent.label_=='ORG':
        print(ent.label_, ent.text)

# Load the trained model:
nlp = spacy.load('/tmp/model')
doc = nlp(text)

# Print entity labels and text
print("\nEntities found after training:")
for ent in doc.ents:
    if ent.label_=='ORG':
        print(ent.label_, ent.text)

The script found slightly different organisations in the input text using en_core_web_sm and the updated model in /tmp/model, respectively, and the updated model performed slightly better, which is obviously what I'd like it to do. I interpret this to mean that the models are actually different (and that the updated one is slightly better than the original, to boot, at least for my purposes). All good so far (I think).
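In case anyone wants to do a similar before/after comparison, the two entity sets can be diffed directly. Here's a small sketch; the helper itself is plain Python, and the spaCy-specific part (pulling the entities out of a Doc) is shown as a comment:

```python
def org_spans(ents):
    # Keep only ORG entities, as (text, start_char) pairs, so the
    # two models' results can be compared as sets.
    return {(text, start) for text, label, start in ents if label == "ORG"}

# With spaCy, this would be fed from a Doc, e.g.:
#   ents = [(e.text, e.label_, e.start_char) for e in nlp(text).ents]
before = org_spans([("Apple", "ORG", 0), ("London", "GPE", 10)])
after = org_spans([("Apple", "ORG", 0), ("Acme", "ORG", 20)])
print(sorted(after - before))  # [('Acme', 20)]
```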

My problem arises when I try to turn the trained model into a loadable spaCy package. I believe I'm following the guidelines to the letter (please tell me if I'm not):

spacy package /tmp/model /tmp --create-meta

    Generating meta.json
    Enter the package settings for your model.

    Model language (default: en): en
    Model name (default: model): model_TEST  
    Model version (default: 0.0.0): 
    Required spaCy version (default: >=2.0.0a14,<3.0.0): 
    Model description: 'Test model'
    Author: MN
    Author email: 
    Author website: 
    License (default: CC BY-NC 3.0): 

    Enter your model's pipeline components
    If set to 'True', the default pipeline is used. If set to 'False', the
    pipeline will be disabled. Components should be specified as a
    comma-separated list of component names, e.g. tensorizer, tagger,
    parser, ner. For more information, see the docs on processing pipelines.

    Pipeline components (default: True): 

    Successfully created package 'en_model_TEST-0.0.0'
    /tmp/en_model_TEST-0.0.0

    To build the package, run `python setup.py sdist` in this directory.

I cd'ed to the /tmp/en_model_TEST-0.0.0 directory and from there ran the setup:

python setup.py sdist

running sdist
running egg_info
creating en_model_TEST.egg-info
writing dependency_links to en_model_TEST.egg-info/dependency_links.txt
writing requirements to en_model_TEST.egg-info/requires.txt
writing top-level names to en_model_TEST.egg-info/top_level.txt
writing en_model_TEST.egg-info/PKG-INFO
writing manifest file 'en_model_TEST.egg-info/SOURCES.txt'
reading manifest file 'en_model_TEST.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'en_model_TEST.egg-info/SOURCES.txt'
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md

running check
warning: check: missing required meta-data: url

warning: check: missing meta-data: if 'author' supplied, 'author_email' must be supplied too

creating en_model_TEST-0.0.0
creating en_model_TEST-0.0.0/en_model_TEST
creating en_model_TEST-0.0.0/en_model_TEST.egg-info
creating en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0
creating en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/ner
creating en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/parser
creating en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/tagger
creating en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/tensorizer
creating en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/vocab
copying files to en_model_TEST-0.0.0...
copying MANIFEST.in -> en_model_TEST-0.0.0
copying meta.json -> en_model_TEST-0.0.0
copying setup.py -> en_model_TEST-0.0.0
copying en_model_TEST/__init__.py -> en_model_TEST-0.0.0/en_model_TEST
copying en_model_TEST/meta.json -> en_model_TEST-0.0.0/en_model_TEST
copying en_model_TEST.egg-info/PKG-INFO -> en_model_TEST-0.0.0/en_model_TEST.egg-info
copying en_model_TEST.egg-info/SOURCES.txt -> en_model_TEST-0.0.0/en_model_TEST.egg-info
copying en_model_TEST.egg-info/dependency_links.txt -> en_model_TEST-0.0.0/en_model_TEST.egg-info
copying en_model_TEST.egg-info/not-zip-safe -> en_model_TEST-0.0.0/en_model_TEST.egg-info
copying en_model_TEST.egg-info/requires.txt -> en_model_TEST-0.0.0/en_model_TEST.egg-info
copying en_model_TEST.egg-info/top_level.txt -> en_model_TEST-0.0.0/en_model_TEST.egg-info
copying en_model_TEST/en_model_TEST-0.0.0/evaluation.jsonl -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0
copying en_model_TEST/en_model_TEST-0.0.0/meta.json -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0
copying en_model_TEST/en_model_TEST-0.0.0/tokenizer -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0
copying en_model_TEST/en_model_TEST-0.0.0/training.jsonl -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0
copying en_model_TEST/en_model_TEST-0.0.0/ner/cfg -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/ner
copying en_model_TEST/en_model_TEST-0.0.0/ner/lower_model -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/ner
copying en_model_TEST/en_model_TEST-0.0.0/ner/moves -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/ner
copying en_model_TEST/en_model_TEST-0.0.0/ner/tok2vec_model -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/ner
copying en_model_TEST/en_model_TEST-0.0.0/ner/upper_model -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/ner
copying en_model_TEST/en_model_TEST-0.0.0/parser/cfg -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/parser
copying en_model_TEST/en_model_TEST-0.0.0/parser/lower_model -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/parser
copying en_model_TEST/en_model_TEST-0.0.0/parser/moves -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/parser
copying en_model_TEST/en_model_TEST-0.0.0/parser/tok2vec_model -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/parser
copying en_model_TEST/en_model_TEST-0.0.0/parser/upper_model -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/parser
copying en_model_TEST/en_model_TEST-0.0.0/tagger/cfg -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/tagger
copying en_model_TEST/en_model_TEST-0.0.0/tagger/model -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/tagger
copying en_model_TEST/en_model_TEST-0.0.0/tagger/tag_map -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/tagger
copying en_model_TEST/en_model_TEST-0.0.0/tensorizer/cfg -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/tensorizer
copying en_model_TEST/en_model_TEST-0.0.0/tensorizer/model -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/tensorizer
copying en_model_TEST/en_model_TEST-0.0.0/vocab/keys -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/vocab
copying en_model_TEST/en_model_TEST-0.0.0/vocab/lexemes.bin -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/vocab
copying en_model_TEST/en_model_TEST-0.0.0/vocab/strings.json -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/vocab
copying en_model_TEST/en_model_TEST-0.0.0/vocab/vectors -> en_model_TEST-0.0.0/en_model_TEST/en_model_TEST-0.0.0/vocab
Writing en_model_TEST-0.0.0/setup.cfg
creating dist
Creating tar archive
removing 'en_model_TEST-0.0.0' (and everything under it)

So, I get warnings for not putting in my email address and website URL, but I find it hard to believe that those are the problem. Also, when I check the dist directory, the package file is there:

ls dist

en_model_TEST-0.0.0.tar.gz

I then install the model with pip:

pip install dist/en_model_TEST-0.0.0.tar.gz

Processing ./dist/en_model_TEST-0.0.0.tar.gz
Requirement already satisfied: spacy-nightly<3.0.0,>=2.0.0a14 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from en-model-TEST==0.0.0)
Requirement already satisfied: msgpack-python in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: cymem<1.32,>=1.30 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: ujson>=1.35 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: regex==2017.4.5 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: plac<1.0.0,>=0.9.6 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: dill<0.3,>=0.2 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: ftfy<5.0.0,>=4.4.2 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: preshed<2.0.0,>=1.0.0 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: msgpack-numpy in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: murmurhash<0.29,>=0.28 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: thinc<6.9.0,>=6.8.1 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: pathlib in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: numpy>=1.7 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: six in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from requests<3.0.0,>=2.13.0->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: idna<2.7,>=2.5 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from requests<3.0.0,>=2.13.0->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from requests<3.0.0,>=2.13.0->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from requests<3.0.0,>=2.13.0->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: html5lib in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from ftfy<5.0.0,>=4.4.2->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: wcwidth in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from ftfy<5.0.0,>=4.4.2->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: wrapt in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from thinc<6.9.0,>=6.8.1->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: tqdm<5.0.0,>=4.10.0 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from thinc<6.9.0,>=6.8.1->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: cytoolz<0.9,>=0.8 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from thinc<6.9.0,>=6.8.1->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: termcolor in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from thinc<6.9.0,>=6.8.1->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: setuptools>=18.5 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from html5lib->ftfy<5.0.0,>=4.4.2->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: webencodings in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from html5lib->ftfy<5.0.0,>=4.4.2->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Requirement already satisfied: toolz>=0.8.0 in /home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages (from cytoolz<0.9,>=0.8->thinc<6.9.0,>=6.8.1->spacy-nightly<3.0.0,>=2.0.0a14->en-model-TEST==0.0.0)
Building wheels for collected packages: en-model-TEST
  Running setup.py bdist_wheel for en-model-TEST ... done
  Stored in directory: /home/mede/.cache/pip/wheels/aa/97/e2/468fe0e132d693852ddf090467827a936060e5c1d959a20b1f
Successfully built en-model-TEST
Installing collected packages: en-model-TEST
Successfully installed en-model-TEST-0.0.0

So, no warnings or errors there. Gives one hope, doesn't it? :slight_smile: To test if the module is now loadable I use a slightly modified version of the script mentioned above:


import spacy
import en_core_web_sm

text = ''' ...(not shown)... '''

# Print entity labels and text for the untrained model:
nlp = spacy.load('en_core_web_sm')
doc = nlp(text)
print("\nEntities found before training:")
for ent in doc.ents:
    if ent.label_=='ORG':
        print(ent.label_, ent.text)

# Load the trained model:
import en_model_TEST
nlp = spacy.load('en_model_TEST')           # doesn't work, nor does en_model_TEST.load()
#nlp = spacy.load('en_model_TEST_0.0.0')    # doesn't work either, nor does en_model_TEST_0.0.0.load()
#nlp = spacy.load('model_TEST')             # doesn't work either, nor does model_TEST.load()
doc = nlp(text)

# Print entity labels and text
print("\n-------------------------------------------\nEntities found after training:")
for ent in doc.ents:
    if ent.label_=='ORG':
        print(ent.label_, ent.text)

This fails at the line "nlp = spacy.load('en_model_TEST')", and I get the following output:

Traceback (most recent call last):
  File "TUTORIAL_use_trained_model.py", line 126, in <module>
    nlp = spacy.load('en_model_TEST') 
  File "/home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages/spacy/__init__.py", line 13, in load
    return util.load_model(name, **overrides)
  File "/home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages/spacy/util.py", line 110, in load_model
    raise IOError("Can't find model '%s'" % name)
OSError: Can't find model 'en_model_TEST'

Shouldn't spaCy be able to find the trained model at this point? Is there an extra step to make the model discoverable for spaCy / Python? Have I misunderstood the procedure on some basic level? Please let me know what I've done wrong. Thanks in advance!

Thanks for sharing your process and results – this is looking good and, from what I can tell, you did everything correctly – especially considering that loading the model from a directory worked fine :smiley:

The spaCy error above occurs if spaCy fails to find the model name in either the shortcut links or the installed packages via pip (and it’s not a path to a directory either).
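Roughly, the lookup order can be pictured like this. This is a simplified sketch, not spaCy's actual implementation (the shortcut-link check is omitted), but it shows why an installed package whose name doesn't match is simply not found:

```python
import importlib
from pathlib import Path


def resolve_model(name):
    # Simplified sketch of spacy.load()'s lookup order: shortcut
    # links in spacy/data are checked first (omitted here), then
    # installed packages, then a path to a model directory.
    try:
        importlib.import_module(name)
        return "package"
    except ImportError:
        pass
    if Path(name).is_dir():
        return "directory"
    raise IOError("Can't find model '%s'" % name)


print(resolve_model("json"))  # package (any importable module counts in this sketch)
```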

When you run pip list, does the model show up in the list of installed packages? And do you get an error when you run:

import en_model_test
nlp = en_model_test.load()

Maybe the capitalisation of TEST could be the cause of the error here? As far as I know, pip packages are usually listed in lowercase – and I remember reading something about package names being converted to lowercase on build. spaCy’s internal function that checks whether a model package is installed doesn’t actually convert the name to lowercase (which it probably should btw – we hadn’t thought of this potential issue before).
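For reference, pip's name normalization (as specified in PEP 503) collapses runs of hyphens, underscores and dots and lowercases the result, which is why an exact, case-sensitive match on the original name can fail. A minimal sketch:

```python
import re


def pip_normalize(name):
    # Package-name normalization per PEP 503: runs of "-", "_" and
    # "." collapse to a single "-", and the result is lowercased.
    # This is why "en_model_TEST" won't survive a round trip through
    # pip with its capitalisation intact.
    return re.sub(r"[-_.]+", "-", name).lower()


print(pip_normalize("en_model_TEST"))  # en-model-test
```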

The capitalisation of ‘TEST’ in the model name did indeed appear to be the issue. I’ve now re-run the steps from ner.batch-train with a model name without any capitals (ingeniously named ‘en_model_blah’). The model now shows up when I run pip list and I can import it and load it without any problems (see below). Lesson learned: only lower case names from now on :slight_smile:

However, now I get a new error, which appears to have to do with something in the Doc class. Here’s my script as it looks now:


text = ''' ...(not shown)... '''

import spacy

# Print entity labels and text for the untrained model:
import en_core_web_sm
nlp = spacy.load('en_core_web_sm')
blah1 = nlp(text)
print("\n-------------------------------------------\nEntities found before training:")
for ent in blah1.ents:
    if ent.label_ == 'ORG':
        print(ent.label_, ent.text)

# Load the trained model:
import en_model_blah
nlp = spacy.load('en_model_blah')

blah2 = nlp(text)


This gives the following output:

Traceback (most recent call last):
  File "TUTORIAL_use_trained_model.py", line 137, in <module>
    blah2 = nlp(text)
  File "/home/mede/Desktop/_OVERFOERES/CluedIn/NER_extraction/spacyNER/prodigy_installation/virtualenv_prodigy/lib/python3.5/site-packages/spacy/language.py", line 275, in __call__
    doc = proc(doc)
TypeError: 'str' object is not callable

So, the error doesn’t arise until I try to use the trained model on the input text. The first time, using the en_core_web_sm model, it works without a hitch. Any idea what the issue is here?

Thanks for updating – the case-sensitivity issue is definitely good to know. Will patch this in spaCy, just to be sure.

doc = proc(doc)
TypeError: 'str' object is not callable

This error likely means that your model’s meta.json includes an unknown pipeline component, and spaCy fails to resolve the string name (e.g. 'tensorizer') to a built-in component.

Check your model package's meta.json, compare it to the meta.json of en_core_web_sm and make sure the "pipeline" setting is identical.
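A quick way to eyeball this is to read the "pipeline" entry out of both files. Here's a sketch; the demo below uses a temporary stand-in file, so in practice you'd point it at /tmp/model/meta.json and at the meta.json inside your installed en_core_web_sm package:

```python
import json
import tempfile
from pathlib import Path


def read_pipeline(meta_path):
    # Return the "pipeline" list from a model's meta.json
    # (empty list if the key is missing).
    meta = json.loads(Path(meta_path).read_text(encoding="utf8"))
    return meta.get("pipeline", [])


# Demo with a temporary file standing in for a real meta.json:
with tempfile.TemporaryDirectory() as d:
    sample = Path(d) / "meta.json"
    sample.write_text(json.dumps(
        {"pipeline": ["tensorizer", "tagger", "parser", "ner"]}))
    print(read_pipeline(sample))  # ['tensorizer', 'tagger', 'parser', 'ner']
```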

Sorry if this is a little confusing. We’re currently working on making this stuff easier and more intuitive for the next spaCy alpha release. We’ll also improve the batch-train commands in Prodigy and let them take care of the model meta – e.g. name the model after the dataset, add the correct language and make sure it has the correct pipeline set. Prodigy knows all of this, so there’s actually no need to make the user re-enter all those details. This would mean that when you package your model, you won’t have to use the --create-meta flag anymore and can simply use the existing meta.json that was created by Prodigy.

Yes, that was indeed the issue.

The "pipeline" entry in en_core_web_sm's meta.json was ["tensorizer", "tagger", "parser", "ner"], while the pipeline entry in the trained model's meta.json was empty. I re-ran the steps from:

spacy package /tmp/model /tmp --create-meta

… and filled in the same elements when queried for the pipeline info. That fixed the issue, and I am now able to load the module, similarly to how I load en_core_web_sm. Thanks a lot! :slight_smile:

One small comment, in case anyone else is reading this: be aware that the create-meta prompt is a tad finicky. When queried for pipeline info I first entered:

tensorizer,tagger,parser,ner

… so without any whitespace after the commas. However, this didn't give the correct format in the resulting meta.json file, and I got the same error as before when trying to run my model on the test text (i.e. "TypeError: 'str' object is not callable").

I then re-did it like this:

tensorizer, tagger, parser, ner

… (i.e. with a whitespace after each comma) and that made the resulting meta.json file’s pipeline info look identical to that of en_core_web_sm (see above). Minor detail, I suppose, and probably not hard to figure out if you know what you’re looking for, but there it is.

Thanks again for the help :+1:

Oh, and thanks for the rapid response time, by the way! :+1: :+1: :+1:

Glad you got it working, and sorry about the small hiccups. (But it's also nice to see that the "hard part" worked well and the main problems were small bugs or weirdnesses in the packaging process!)

Will fix the pipeline prompt in spacy package to prevent this issue in the future. The pipeline string entered by the user should be handled as [p.strip() for p in pipeline.split(',')], so that both comma and comma + whitespace are handled properly.
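In other words, something like:

```python
def parse_pipeline(value):
    # Split a user-entered component list on commas and strip the
    # surrounding whitespace, so "a,b" and "a, b" parse identically
    # (empty entries from stray commas are dropped too).
    return [p.strip() for p in value.split(",") if p.strip()]


assert parse_pipeline("tensorizer,tagger,parser,ner") == \
       parse_pipeline("tensorizer, tagger, parser, ner") == \
       ["tensorizer", "tagger", "parser", "ner"]
```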
