@andy Thanks, I didn’t know msgpack had that limit.
As a mitigation, you could try passing vocab=False
to the model.to_disk()
. You’ll then have to manage the vocab loading separately. If you want to keep loading from a single directory, you could copy it in after saving from the source. Alternatively, you could load the vocab from a single place each time, keeping the model and the vocab/vectors separate.
If you like you might want to make a subclass of English
(or whatever other Language
) that handles the to/from bytes/disk differently in this way, for convenience.