Annotation using my own custom spaCy pipeline

Hello,

I have defined a basic custom pipeline in spaCy that removes accents, special characters, stop words, etc., and tokenizes the text.
I then used nlp.to_disk(...) to export it and, after moving all the custom code into a dedicated Python file, tried to create a package with the spaCy CLI (python -m spacy package ...).
The output of that command is a folder containing, among other things, a dist folder, a folder named after the pipeline, a meta.json file and a setup.py file.
I am doing this because the next step is to tag my text with Prodigy, and I would like my custom pipeline to be applied before the NER step.
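
For context, my component looks roughly like this (the component name, the custom attribute and the paths are just illustrative, the real code does a bit more):

    import unicodedata

    import spacy
    from spacy.language import Language
    from spacy.tokens import Doc

    # custom attribute to hold the cleaned text (attribute name is illustrative)
    Doc.set_extension("clean_text", default="", force=True)

    @Language.component("clean_text")
    def clean_text(doc):
        # strip accents, drop stop words and punctuation, keep the rest
        doc._.clean_text = " ".join(
            unicodedata.normalize("NFKD", token.text).encode("ascii", "ignore").decode()
            for token in doc
            if not token.is_stop and not token.is_punct
        )
        return doc

    nlp = spacy.blank("en")
    nlp.add_pipe("clean_text")
    nlp.to_disk("./my_pipeline")
    # then: python -m spacy package ./my_pipeline ./packages --code my_components.py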

I saw that recipes such as ner.correct let you specify the spaCy pipeline to use. I tried passing the one I exported, but I always get an error such as "OSError: [E053] Could not read meta.json from en_myP-3.1.0.tar.gz".
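
The command I ran looked roughly like this (the dataset name, source file and labels are placeholders):

    prodigy ner.correct my_dataset en_myP-3.1.0.tar.gz ./texts.jsonl --label PERSON,ORG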

Can you please help me understand what I am doing wrong, and how I can fix it?

Thank you for your help, it's very much appreciated.

Regards,
Mauro

Hi! When you run spacy package, it creates a pip-installable Python package. So to use it, you can run pip install /path/to/en_myP-3.1.0.tar.gz. After installation, you can then load the model via its name in Prodigy – in your case, en_myP.
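
For example (my_dataset, texts.jsonl and the labels below are placeholders, use whatever you have):

    pip install /path/to/en_myP-3.1.0.tar.gz
    prodigy ner.correct my_dataset en_myP ./texts.jsonl --label PERSON,ORG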

Alternatively, you can also point Prodigy to a file path of the model data directory (the same directory you packaged with spacy package).
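
For example, assuming ./my_pipeline is the directory you exported with nlp.to_disk (again with placeholder names):

    prodigy ner.correct my_dataset ./my_pipeline ./texts.jsonl --label PERSON,ORG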
