spaCy on AWS EC2

This is a request for community link-up(s) on the topic of spaCy (not Prodigy) on AWS EC2. This seemed like the closest thread.

Based on AWS error logs, we strongly suspect that our EC2-installed instance of spaCy is not properly selecting the Linux version. Evidence: the various .pcx files in the \site-packages folder seem to be missing.
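One way to check this kind of suspicion is to count the files in the installed package by suffix and compare that with the compiled-extension suffixes the running interpreter can import. The following is a minimal sketch using only the standard library; `count_suffixes` is a hypothetical helper name, and the choice of `spacy` as the package to inspect is an assumption taken from this thread.

```python
import importlib.machinery
import importlib.util
from collections import Counter
from pathlib import Path


def count_suffixes(package_dir):
    """Count the files under package_dir by final suffix ('.py', '.so', '.pyd', ...)."""
    counts = Counter()
    for path in Path(package_dir).rglob("*"):
        if path.is_file():
            counts[path.suffix] += 1
    return counts


# Suffixes of compiled extension modules that this interpreter is able to import.
print("Loadable extension suffixes:", importlib.machinery.EXTENSION_SUFFIXES)

# Locate the installed package without importing it, then summarise its files.
spec = importlib.util.find_spec("spacy")
if spec is None:
    print("spacy is not importable from this interpreter")
else:
    for location in spec.submodule_search_locations or []:
        print(location, dict(count_suffixes(location)))
```

If the counts show compiled files whose suffix does not appear among the loadable suffixes, the package was probably built for, or copied from, a different platform.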

We're experienced and comfortable with EC2 deployments and with spaCy source files. But before we dive into the instance further, we'd much like to connect with others who are deploying spaCy on EC2. Down-in-the-weeds exchanges could then be very productive and, we hope, mutually helpful.

Thanks for a re-direct (to another thread) or for response(s) from any fellow-EC2'rs!

Honnibal and/or Ines, we are unable to port a local Windows-platform spaCy app plus a Prodigy-generated model to our AWS Linux-platform instance.

We strongly suspect a version conflict somewhere among spaCy and Prodigy dependencies. I will come back immediately with precise error log snippets. But for right now, would you please confirm that your response in this (closed) issue from Dec 6, 2018 is in fact current and accurate https://github.com/explosion/spaCy/issues/1634 ?

You are a life line to our company, and we deeply appreciate your help!

Something is still preventing spaCy from running on our Linux instance at AWS. The following summarizes findings, the AWS error log, and specific claims and questions:

Not surprisingly, the spaCy directory structures differ between our local Windows install and the Linux install at AWS.

  1. The Win install has only a single path to spaCy (and its tree) via a single \Local directory:
    C:\Users\RonT\AppData\Local\Programs\Python\Python37\Lib\site-packages\spacy

  2. The Linux install at AWS shows both a \lib and a \lib64:
    [ec2-user@ip-172-31-46-245 venv] cd local
    [ec2-user@ip-172-31-46-245 local] ls
    bin include lib lib64

  3. There is no \spacy in the \site-packages under Linux’s \lib. There is a \spacy under \lib64

  4. Suspicious: the error log shows invocations through commingled and disparate paths:
    '/opt/python/run/venv/local/lib64/python3.6/site-packages',
    '/opt/python/run/venv/local/lib/python3.6/site-packages',
    ['/opt/python/current/app',
    '/opt/python/run/venv/local/lib64/python3.6/site-packages',
    '/opt/python/run/venv/local/lib/python3.6/site-packages',
    '/opt/python/run/venv/lib64/python3.6',
    '/opt/python/run/venv/lib/python3.6',
    '/opt/python/run/venv/lib64/python3.6/site-packages',
    '/opt/python/run/venv/lib/python3.6/site-packages',
    '/opt/python/run/venv/lib64/python3.6/lib-dynload',
    '/usr/lib64/python3.6',
    '/usr/lib/python3.6']

  5. Claims and question:

  • The AWS instance is 64-bit Linux.
  • There are no hard-coded paths in any application code that references spaCy.
  • Does spaCy generate any new code at run-time? For example, we think that .pyx files are generated at run-time, but we have yet to examine this closely.
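To see which of the search-path entries listed above actually contains a spacy directory, and which location the interpreter would resolve first, a short diagnostic can be run inside the same virtualenv. This is a minimal sketch using only the standard library; `locate_package` and `resolved_origin` are hypothetical helper names and are not part of spaCy.

```python
import importlib.util
import sys
from pathlib import Path


def locate_package(name, search_paths=None):
    """Return each search-path entry that holds a top-level package or module called name."""
    entries = sys.path if search_paths is None else search_paths
    hits = []
    for entry in entries:
        base = Path(entry or ".")  # an empty entry means the current directory
        if (base / name).is_dir() or (base / f"{name}.py").is_file():
            hits.append(str(base))
    return hits


def resolved_origin(name):
    """Return the file the import system would load for name, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Every sys.path entry that contains something named "spacy", in search order.
print(locate_package("spacy"))
# The single location the interpreter would actually use.
print(resolved_origin("spacy"))
```

If the first list has more than one entry, the earliest one in `sys.path` order is the one that gets imported, which should match the path printed on the second line.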

Hi @ronaldcturner,

Both spaCy and Prodigy use standard Python approaches to distribution and installation. We're pretty confident these are working correctly: we usually get errors within minutes if something's breaking. There are tens of thousands of spaCy installations, and a few thousand Prodigy installations. Nothing is different about installing on AWS.

You can find detailed setup instructions for spaCy here: https://spacy.io/usage . You should probably start fresh, since I'm not sure what might have gone wrong with your local state.

If you still can't get it set up, you might try looking for a consultant to help you. You can try the consultant thread here: spaCy/prodigy consultants? . But since this is a spaCy issue rather than a specific Prodigy issue, you might actually have better luck on a more general platform. For instance, there are many people advertising spaCy experience here: https://www.upwork.com/search/profiles/?nbs=1&q=spaCy

Hello Honnibal:
From the start we have been haunted by this early and by-now-unfair review of NLP libraries. Thank you SINCERELY for strong re-assurance of the "hardness" of spaCy.

Our "AWS" problem was in fact a Prodigy usage error on our part: failure to adhere to the documentation's "recommendation" to create a Python package for our custom models. You warn explicitly of "failing somewhere down the line when calling spacy.load()". The "somewhere" that we isolated is a Linux-level difference between Fedora (no spaCy problem, even without Python packaging) and Ubuntu (spaCy failed because we didn't Python-package the model).

Our Prodigy/spaCy lesson learned:

  • Assume that every deployment platform expects a Python-pure application. So export custom models as Python packages.
  • For the Prodigy "Export Model" step in the NER flowchart, extend our procedure to include Python packaging, beyond simply Language.to_disk().
  • Revise our internal documentation to augment the spaCy "recommendation" in "Using Your Own Models": “In order for your model to present as 100% Pythonic to whatever variety of underlying platform, it is imperative that you wrap…”
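As an illustration of the first two points, a deployment script could check up front whether the model reference it is about to load is an installed Python package (for example, one built with `python -m spacy package` and then installed with pip) or only a bare directory on disk. The following is a minimal sketch under those assumptions; `describe_model_source` is a hypothetical helper, not a spaCy function, and it uses only the standard library.

```python
import importlib.util
from pathlib import Path


def describe_model_source(name_or_path):
    """Classify a model reference as 'package', 'directory', or 'missing'."""
    text = str(name_or_path)
    # Only plain identifiers can be package names; this also keeps path-like
    # strings such as "./model" away from the import machinery.
    if text.isidentifier() and importlib.util.find_spec(text) is not None:
        return "package"
    if Path(text).is_dir():
        return "directory"
    return "missing"


# Hypothetical usage: refuse to start unless the model is an installed package.
# The model name below is a placeholder, not a real package.
source = describe_model_source("our_custom_ner_model")
if source != "package":
    print(f"Model reference resolves as {source!r}; expected an installed package")
```

Failing at startup with a clear message is easier to diagnose than an error that surfaces later inside spacy.load().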

We can't thank you enough for the power of Prodigy to accelerate custom model development. And thanks again also for ample documentation and kind support!

(Close this thread whenever you wish.)
