Thanks for reporting this. It’s been a long journey to make sure we keep spaCy wheels maintained on conda-forge, to help exactly your use-case: making sure Windows users don’t need to install a build environment. Having this little library missing definitely feels like snatching defeat from the jaws of victory!
As a workaround, you could create an mmh3.py file that exposes a single function, hash(string). That should be all you need: Prodigy only imports mmh3 in a couple of places, and only calls that one function.
I think the following should work:
# Save this file as mmh3.py somewhere it can be imported, e.g. in your current working directory.
from spacy.strings import hash_string as hash
The only difference is that mmh3.hash returns 32-bit signed integers, while spacy.strings.hash_string() returns 64-bit unsigned integers. We used mmh3.hash because there were some problems serialising and deserialising the 64-bit values on some platforms. In particular, you should check that the hashes are coming back the same after the examples are passed to the
If you have problems, you should be able to use any other library or method for hashing strings into 32-bit ints. The exact hash values aren't referenced anywhere; they just need to be internally consistent.
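If the 64-bit values do cause trouble, here is a rough sketch of a standard-library-only mmh3.py that stays within the signed 32-bit range. It uses BLAKE2b from hashlib rather than MurmurHash3, so the numbers won't match what the real mmh3 produces, which should only matter if you already have hashes stored from the real library. The optional seed argument is an assumption on my part, added in case a caller passes one. Note that Python's built-in hash() isn't a good substitute here, because string hashes are randomised per process by default.

```python
# mmh3.py: stand-in for the real mmh3 package (not MurmurHash3).
import hashlib


def hash(string, seed=0):
    """Map a str or bytes value to a signed 32-bit int, stable across runs."""
    data = string.encode("utf8") if isinstance(string, str) else bytes(string)
    # Prefix the seed so that different seeds produce different inputs.
    # (The seed parameter is an assumption, kept for call compatibility.)
    payload = str(seed).encode("ascii") + b":" + data
    # Ask BLAKE2b for a 4-byte digest, i.e. exactly 32 bits.
    digest = hashlib.blake2b(payload, digest_size=4).digest()
    # Interpret those 4 bytes as a signed integer in [-2**31, 2**31).
    return int.from_bytes(digest, "little", signed=True)
```

Because the result always fits in 32 bits, it should avoid the serialisation problems described above, and the same string gives the same hash in every session.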