Can't find model 'ja_ginza'.

I am trying to build a customised classification model, building on what ja_ginza already provides.

Rather than write code to train a classifier using spaCy, I want to use the convenience provided by Prodigy recipes:

prodigy train ner my_data ja_ginza --output ./models/new_ginza

However, I get the error below:

OSError: [E050] Can't find model 'ja_ginza'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

Furthermore, trying to download the model using spaCy:

spacy download ja_ginza

returns this:

✘ No compatible model found for 'ja_ginza' (spaCy v2.3.2).

It would be great if you could add support for this, but is there a workaround in the meantime?

I just tried to get around this by saving the Ginza model locally to disk and using the command below to reference the local model.

prodigy train ner my_data ./models/ja_ginza --output ./models/new_ginza

I also updated Prodigy to the current version:

$ prodigy stats
Version          1.10.4                        
Platform         Linux-4.4.0-87-generic-x86_64-with-glibc2.17
Python Version   3.8.1                         

I now get this error instead:

KeyError: "[E002] Can't find factory for 'CompoundSplitter'. This usually happens when spaCy calls `nlp.create_pipe` with a component name that's not built in - for example, when constructing the pipeline from a model's meta.json. If you're using a custom component, you can write to `Language.factories['CompoundSplitter']` or remove it from the model meta and add it via `nlp.add_pipe` instead."

I'm not quite sure what I need to do here. Specifically, what do "write to `Language.factories['CompoundSplitter']`" and "add it via `nlp.add_pipe`" refer to?

Digging in a bit more, I find that the "Can't find factory for 'CompoundSplitter'" error only seems to occur when running this command on Linux.

On MacOS, training runs smoothly and a new model is created.

prodigy train ner test_ner ./models/ja_ginza/ --output ./models/new_ginza

Here are stats on each environment.

MacOS

Version          1.10.4                        
Location         /Users/me/.pyenv/versions/3.6.1/lib/python3.6/site-packages/prodigy
Prodigy Home     /Users/me/.prodigy            
Platform         Darwin-19.6.0-x86_64-i386-64bit
Python Version   3.6.1                         
Database Name    SQLite                        
Database Id      sqlite                        
Total Datasets   9                             
Total Sessions   60 

Linux

Version          1.10.4                        
Location         /home/me/.pyenv/versions/Prodigy2/lib/python3.8/site-packages/prodigy
Prodigy Home     /home/me/.prodigy       
Platform         Linux-4.4.0-87-generic-x86_64-with-glibc2.17
Python Version   3.8.1                         
Database Name    SQLite                        
Database Id      sqlite                        
Total Datasets   5                             
Total Sessions   16 

I would rather not change the Python version on Linux, but what could be going on here?
Thank you, as always.

Glad you got it working! I think ultimately, this comes down to how that package was implemented – it's not an "official" spaCy model we distribute, which is why the spacy download command won't work. So I also don't know the details of how it's packaged etc.

The Linux/MacOS difference likely happens because you're running different Python environments here – maybe one of them has the Ginza package or additional dependencies installed that are needed to find the component, and maybe the other one doesn't?
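One quick way to compare the two environments is to run the same small check in each of them. The following is only a sketch: the helper names `is_installed` and `report` are made up for illustration, and it assumes the Ginza library is importable under the top-level name `ginza`.

```python
import importlib.util
import sys


def is_installed(name):
    """Return True if a top-level module called `name` can be found
    by the interpreter that is running this script."""
    return importlib.util.find_spec(name) is not None


def report(packages=("spacy", "prodigy", "ginza")):
    """Print which interpreter is running and which packages it can see.

    The package names are assumptions: "ginza" is assumed to be the
    import name of the Ginza library.
    """
    print("Interpreter:", sys.executable)
    results = {}
    for name in packages:
        results[name] = is_installed(name)
        print(f"{name:10s} {'found' if results[name] else 'MISSING'}")
    return results


if __name__ == "__main__":
    report()
```

If a package shows up as found on one machine and missing on the other, the two environments differ, and that would be a likely explanation for the different behaviour.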

The hints in the error message about `Language.factories` and `nlp.add_pipe` refer to the implementation details of the custom component CompoundSplitter, which I assume is provided by the Ginza library. If a third-party library exposes a custom component, it needs to make it available to spaCy, otherwise spaCy won't know how to set it up. One way to do this is via entry points, or by code that runs as part of the model package.
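If you want to see which components the installed packages advertise, you can list the entry points in the current environment. spaCy v2 looks for custom component factories in the `spacy_factories` entry point group. Whether Ginza registers CompoundSplitter this way is an assumption on my part, and `list_entry_points` is just an illustrative helper name, so treat this as a rough sketch (it needs Python 3.8+ for `importlib.metadata`):

```python
from importlib import metadata


def list_entry_points(group):
    """Return {name: "module:attr"} for every entry point that installed
    packages register under `group` in the current environment."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions: selectable entry points object
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: plain dict keyed by group name
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    factories = list_entry_points("spacy_factories")
    if not factories:
        print("No packages in this environment register spaCy factories.")
    for name, target in sorted(factories.items()):
        print(f"{name} -> {target}")
```

If CompoundSplitter is listed in the environment where training works but not in the one where it fails, then the package that provides it is missing from the failing environment.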

Thanks Ines - I can confirm that the Ginza package was missing from the Linux Python environment.
Installing it resolved the issue 🙂
