terms.teach: OverflowError: Python int too large to convert to SQLite INTEGER

Amy_H · January 16, 2019, 3:12pm

OS: Windows 10
Python: 3.7
Dist: Conda
pip installed prodigy without issues

Just got a research trial license yesterday (thanks, btw, can’t wait to show my colleagues at Northwestern!). Installed everything smoothly, default SQLite, etc. Began this training video with work-specific training (https://prodi.gy/docs/video-new-entity-type) and got the following error in conda console:

(spacy_env) C:\Users\ash9984>python -m prodigy terms.teach CAPS_terms en_core_web_lg --seeds "anxiety"
Initialising with 1 seed terms: anxiety

? Starting the web server at http://localhost:8080 …
Open the app in your browser and start annotating!

08:53:09 - Task queue depth is 1

09:02:19 - Exception when serving /give_answers
Traceback (most recent call last):
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\waitress\channel.py", line 336, in service
    task.service()
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\waitress\task.py", line 175, in service
    self.execute()
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\waitress\task.py", line 452, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\hug\api.py", line 423, in api_auto_instantiate
    return module.__hug_wsgi__(*args, **kwargs)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\falcon\api.py", line 244, in __call__
    responder(req, resp, **params)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\hug\interface.py", line 793, in __call__
    raise exception
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\hug\interface.py", line 766, in __call__
    self.render_content(self.call_function(input_parameters), context, request, response, **kwargs)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\hug\interface.py", line 703, in call_function
    return self.interface(**parameters)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\hug\interface.py", line 100, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\prodigy\app.py", line 173, in give_answers
    controller.receive_answers(answers, session_id=session_id)
  File "cython_src\prodigy\core.pyx", line 127, in prodigy.core.Controller.receive_answers
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\prodigy\components\db.py", line 303, in add_examples
    content=content)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\peewee.py", line 4977, in create
    inst.save(force_insert=True)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\peewee.py", line 5170, in save
    pk_from_cursor = self.insert(**field_dict).execute()
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\peewee.py", line 3584, in execute
    cursor = self._execute()
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\peewee.py", line 2939, in _execute
    return self.database.execute_sql(sql, params, self.require_commit)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\peewee.py", line 3830, in execute_sql
    cursor.execute(sql, params or ())
OverflowError: Python int too large to convert to SQLite INTEGER

I get this error if I feed in more than 1 training term (eg, “anxiety, depression”). If I use just 1 term, server connection works but once I hit 21 session choices, I get above error in conda console and “ERROR: Couldn’t save annotations. make sure the server is running correctly.” message in training console. I’ve tried to suss out why this is happening multiple ways (variations of reject, accept, ignore, etc) and 21 seems to always trigger a disconnect from the server. Time doesn’t seem to matter, as I let minutes pass between choices.

I also reinstalled everything in a virtual environment this morning to see if that changed anything but sadly the same error persists.

Thanks in advance!!

ines · January 16, 2019, 3:21pm

Hi! Thanks for the detailed report and sorry for the frustration. It looks like some annotation it's trying to save causes SQLite to fail, which in turn kills the Python process – and as soon as the web app tries to request new questions or send back answers, it notices that the server is gone and complains as well.

OverflowError: Python int too large to convert to SQLite INTEGER

Could you double-check that the Python version you're running is 64-bit (not 32) and that your environment is on Python 3+?
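
For anyone else checking this: `python --version` does not always print the bitness (only some builds, such as Anaconda's, include it). A minimal sketch using only the standard library that reports it directly; the function name `python_bitness` is just an illustrative choice:

```python
import struct
import sys


def python_bitness():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


print(f"Python {sys.version_info.major}.{sys.version_info.minor}, {python_bitness()}-bit")
```

Run it inside the same environment you start Prodigy from (here, `spacy_env`), since each environment can have its own interpreter.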

Amy_H · January 16, 2019, 3:23pm

Wow! Thanks for the quick response! No need to apologize, I’m a big fan of y’alls work!!!

Here ya go:
(base) C:\Users\ash9984>python --version
Python 3.6.3 :: Anaconda custom (64-bit)

Virtual:
(spacy_env) C:\Users\ash9984>python --version
Python 3.7.2

ines · January 16, 2019, 5:00pm

Thanks and hmmm, this is really really strange. Could you run conda list and check which version of sqlite it has installed by default? Maybe you’ve ended up with some old version with bad defaults compiled into it (which is a known issue).

To help debug this, could you find line 302 in prodigy/components/db.py and add a print statement above it that outputs the example it’s adding (to find the last one it eventually fails on)? For example, like this:

print(eg)
eg = Example.create(input_hash=eg[INPUT_HASH_ATTR],
                    task_hash=eg[TASK_HASH_ATTR],
                    content=content)

To find the location of your Prodigy installation, you can run the following:

python -c "import prodigy; print(prodigy.__file__)"

Finally, if this is all too annoying and you just want to get started, it might be easier to install MySQL on your system. In your prodigy.json, you can set "db": "mysql" and then use the "db_settings" to specify your username, database and password. See here for details.

Amy_H · January 16, 2019, 11:00pm

conda list produced:

sqlite 3.26.0 he774522_0

Just added print(eg) to line 301 in virtual env version (thanks for the location script, that saved me some time!):

Initializing:

(spacy_env) C:\Users\ash9984>python -m prodigy terms.teach CAPS_terms en_core_web_lg --seeds "anxiety"
Initialising with 1 seed terms: anxiety
{'text': 'anxiety', 'answer': 'accept', '_input_hash': 6298237553007272678, '_task_hash': 6567957362502149308}

At 21 choices I get:

16:58:20 - Task queue depth is 1
{'text': 'depression', 'meta': {'score': 0.8066005695}, '_input_hash': 14703391357354852000, '_task_hash': 4519653915177422000, 'answer': 'accept'}
16:59:23 - Exception when serving /give_answers
Traceback (most recent call last):
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\waitress\channel.py", line 336, in service
    task.service()
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\waitress\task.py", line 175, in service
    self.execute()
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\waitress\task.py", line 452, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\hug\api.py", line 423, in api_auto_instantiate
    return module.__hug_wsgi__(*args, **kwargs)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\falcon\api.py", line 244, in __call__
    responder(req, resp, **params)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\hug\interface.py", line 793, in __call__
    raise exception
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\hug\interface.py", line 766, in __call__
    self.render_content(self.call_function(input_parameters), context, request, response, **kwargs)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\hug\interface.py", line 703, in call_function
    return self.interface(**parameters)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\hug\interface.py", line 100, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\prodigy\app.py", line 173, in give_answers
    controller.receive_answers(answers, session_id=session_id)
  File "cython_src\prodigy\core.pyx", line 127, in prodigy.core.Controller.receive_answers
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\prodigy\components\db.py", line 304, in add_examples
    content=content)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\peewee.py", line 4977, in create
    inst.save(force_insert=True)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\peewee.py", line 5170, in save
    pk_from_cursor = self.insert(**field_dict).execute()
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\peewee.py", line 3584, in execute
    cursor = self._execute()
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\peewee.py", line 2939, in _execute
    return self.database.execute_sql(sql, params, self.require_commit)
  File "C:\Users\ash9984\AppData\Local\Continuum\anaconda3\envs\spacy_env\lib\site-packages\peewee.py", line 3830, in execute_sql
    cursor.execute(sql, params or ())
OverflowError: Python int too large to convert to SQLite INTEGER

Hope this helps!

honnibal · January 17, 2019, 12:32pm

Hi @Amy_H,

Thanks for raising this, it definitely looks like a bug. We first thought it was in SQLite, but the hash values printed there sure are bigger than 32bit, so it looks to me like the mmh3 dependency we’re using for this has an error somewhere that’s only being triggered on Windows, possibly only with specific compilation settings.

I had a look at the source for the mmh3_hash function we’re calling, which is here: https://github.com/hajimes/mmh3/blob/master/mmh3module.cpp#L29 . I can’t see the specific error, but it’s not hard to imagine this C code could be hitting a subtle error, perhaps around the sizing of the long or int variables.

In spaCy we’ve been using our own wrapper of MurmurHash: https://github.com/explosion/murmurhash . I think the best solution will be to add the function we need to this, and switch Prodigy over to this library.

The bad news is it’s pretty hard to offer you a mitigation that will let you keep working in the meantime. I’m working on this though — maybe I can figure something out.

As a sanity check as well, could you give the mmh3 version that you’ve got in your conda environment?
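
For reference, the error itself can be reproduced without Prodigy. This is a minimal sketch that assumes only Python's standard-library `sqlite3` module; the table and column names are made up for illustration, and the two hash values are copied from the output posted above. SQLite stores INTEGER values as signed 64-bit numbers, so anything above 2**63 - 1 is rejected by the Python driver before it reaches the database:

```python
import sqlite3

# Values copied from the print(eg) output earlier in this thread.
first_hash = 6298237553007272678    # "anxiety": fits in a signed 64-bit integer
failing_hash = 14703391357354852000  # "depression": larger than 2**63 - 1

conn = sqlite3.connect(":memory:")
# Hypothetical table, only for this demonstration.
conn.execute("CREATE TABLE example (input_hash INTEGER)")

conn.execute("INSERT INTO example VALUES (?)", (first_hash,))

error_message = None
try:
    conn.execute("INSERT INTO example VALUES (?)", (failing_hash,))
except OverflowError as err:
    error_message = str(err)

print(error_message)
conn.close()
```

If this reading is right, the seed term saves fine and the crash only appears once a hash happens to land above the signed 64-bit limit.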

honnibal · January 17, 2019, 1:16pm

Okay, I think I have a plan that should let us get a mitigation in place. The plan is:

1. I get a dev version of murmurhash up on PyPI with a replacement hash function.
2. You check that the replacement hash function doesn’t have the same bug.
3. If the replacement hash function works, then you uninstall mmh3, and we drop in a replacement file that just calls into murmurhash instead.

After we replace the mmh3 module, Prodigy should be none the wiser, and things should work correctly.

Could you try this for me?

python -m pip install murmurhash==1.1.0.dev0
python -c "import mmh3; print(mmh3.hash('anxiety'))"
python -c "import murmurhash; print(murmurhash.hash('anxiety'))"

We’re hoping that the first call (into mmh3) produces the bad number that’s larger than 2**32, while the second call (into murmurhash) produces the correct result.
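
To judge the two printed numbers without eyeballing them, a small range check can help. This is only a sketch; `fits_signed` is a hypothetical helper, not part of mmh3 or murmurhash, and the sample values below are illustrative rather than real hash outputs:

```python
def fits_signed(value, bits):
    """Return True if value is representable as a signed integer of the given width."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


# Illustrative values only: one inside the 32-bit range, one well outside it.
for value in (-123456789, 2**40):
    print(value, "fits in 32 bits:", fits_signed(value, 32),
          "| fits in 64 bits:", fits_signed(value, 64))
```

Paste in whatever the two `python -c` commands print. A healthy 32-bit hash should pass the 32-bit check; a value that fails even the 64-bit check cannot be stored in an SQLite INTEGER column.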

If this works, the next step is to uninstall mmh3, probably with something like conda uninstall mmh3. We then need to get Prodigy importing a replacement. I think it should be fine to create a file mmh3.py in your working directory, with the following contents:

from murmurhash import hash

Finally, run something like python -m prodigy to see if the replacement module is being imported correctly. If it’s not being found on the working directory, we might need to drop it into the conda virtualenv somewhere.
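
One way to check which file Python would actually pick up for `mmh3` is to ask the import system. A minimal sketch using the standard-library `importlib`; `module_origin` is a hypothetical helper name:

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("mmh3 resolves to:", module_origin("mmh3"))
```

Run it from the same working directory you start Prodigy from. If the path points at your own mmh3.py, the replacement is being found; if it prints None, neither the original package nor the replacement is importable from there.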

Amy_H · January 18, 2019, 6:59pm

So sorry for the lag on this- work projects got crazy.

Here's what happened:

Collecting murmurhash==1.1.0.dev0
  Downloading https://files.pythonhosted.org/packages/c2/77/585d84ef5f0423c0c1d5163bfe68b7b8b6df3ea074963acaa9c36c8eae60/murmurhash-1.1.0.dev0.tar.gz
Building wheels for collected packages: murmurhash
  Running setup.py bdist_wheel for murmurhash ... done
  Stored in directory: C:\Users\ash9984\AppData\Local\pip\Cache\wheels\66\0d\34\ac8bcbf74f7db9130ec060a0000e071a88af7635342b1a2ea6
Successfully built murmurhash
thinc 6.12.1 has requirement murmurhash<1.1.0,>=0.28.0, but you'll have murmurhash 1.1.0.dev0 which is incompatible.
spacy 2.0.16 has requirement murmurhash<1.1.0,>=0.28.0, but you'll have murmurhash 1.1.0.dev0 which is incompatible.
Installing collected packages: murmurhash
  Found existing installation: murmurhash 1.0.1
    Uninstalling murmurhash-1.0.1:
      Successfully uninstalled murmurhash-1.0.1
Successfully installed murmurhash-1.1.0.dev0

(spacy_env) C:\Users\ash9984>python -c "import mmh3; print(mmh3.hash('anxiety'))"
3518314200804635422

(spacy_env) C:\Users\ash9984>python -c "import murmurhash; print(murmurhash.hash('anxiety'))"
-1859125401

Does this look like what you were hoping for/expecting?

Thanks for taking the time Matt (and Ines!). Very impressed with the level of support I've already received!

Amy_H · January 18, 2019, 9:49pm

I proceeded with your instructions:

(spacy_env) C:\Users\ash9984>conda uninstall mmh3
Collecting package metadata: done
Solving environment: failed

PackagesNotFoundError: The following packages are missing from the target environment:
  - mmh3

(spacy_env) C:\Users\ash9984>python -m prodigy

  ?  Available recipes:
  ner.match, ner.teach, ner.manual, ner.make-gold, ner.eval, ner.eval-ab,
  ner.batch-train, ner.train-curve, ner.print-best, ner.print-stream,
  ner.print-dataset, ner.gold-to-spacy, ner.iob-to-gold

  textcat.teach, textcat.batch-train, textcat.train-curve, textcat.eval,
  textcat.print-stream, textcat.print-dataset

  dep.teach, dep.batch-train, dep.train-curve, compare, pos.teach,
  pos.make-gold, pos.batch-train, pos.train-curve, pos.gold-to-spacy,
  terms.train-vectors, terms.teach, terms.to-patterns, mark, image.manual,
  image.test


  ?  Available commands:
  dataset, drop, stats, pipe, db-in, db-out


(spacy_env) C:\Users\ash9984>python -m prodigy terms.teach CAPS_terms en_core_web_lg --seeds "anxiety"
Initialising with 1 seed terms: anxiety
{'text': 'anxiety', 'answer': 'accept', '_input_hash': -358848061, '_task_hash': -817917627}

  ?  Starting the web server at http://localhost:8080 ...
  Open the app in your browser and start annotating!

At 21 selections it spits this out:

15:44:50 - Task queue depth is 1
{'text': 'depression', 'meta': {'score': 0.8066005695}, '_input_hash': 1147727864, '_task_hash': -2069453297, 'answer': 'accept'}
{'text': 'panic', 'meta': {'score': 0.7935524238}, '_input_hash': 898628126, '_task_hash': 1230930292, 'answer': 'accept'}
{'text': 'insomnia', 'meta': {'score': 0.7930415441}, '_input_hash': 1265564640, '_task_hash': -1287351673, 'answer': 'accept'}
{'text': 'nervousness', 'meta': {'score': 0.7917400127}, '_input_hash': -1688713714, '_task_hash': 863857025, 'answer': 'accept'}
{'text': 'stress', 'meta': {'score': 0.7867360162}, '_input_hash': -1711672912, '_task_hash': 2074732802, 'answer': 'accept'}
{'text': 'disorder', 'meta': {'score': 0.7840409770000001}, '_input_hash': 1533702439, '_task_hash': -1829597718, 'answer': 'accept'}
{'text': 'symptoms', 'meta': {'score': 0.7823031737}, '_input_hash': -2059339190, '_task_hash': 269617525, 'answer': 'accept'}
{'text': 'irritability', 'meta': {'score': 0.773979784}, '_input_hash': -1395564595, '_task_hash': -2002428720, 'answer': 'accept'}
{'text': 'disorders', 'meta': {'score': 0.7714788198}, '_input_hash': -129257277, '_task_hash': -394331402, 'answer': 'accept'}
{'text': 'pain', 'meta': {'score': 0.7712528458}, '_input_hash': -1599602915, '_task_hash': 1091723846, 'answer': 'accept'}

I think it’s fixed? This is good, right?

ines · January 19, 2019, 2:32pm

Yes, if it doesn’t crash anymore, this indicates that the problem is resolved.

(Basically, what went on here was that there’s likely a bug in the hashing library we use to create the input hash and task hash values for each task saved to the database. The bug is only triggered under very specific conditions and platform combinations, and you happened to be the unlucky person to trigger it for the first time ever in over a year.)

Amy_H · January 23, 2019, 3:35pm

Yeah, that tracks.

Thanks y’all.

reb-greazy · February 6, 2019, 9:18pm

Greetings, we recently installed prodigy with everything running smoothly. I started adapting this training video (https://prodi.gy/docs/video-insults-classifier) to an application relevant to our company. I got the same “Error: Couldn’t save annotations. Make sure the server is running correctly” error after exactly 21 session choices. I got this error after 21 annotations when using both SQLite and PostgreSQL. I followed the solution given by Matthew involving:

However, we got the same hash number when using both mmh3 and murmurhash:

Therefore, the work around created in this thread is not working for us. Could you please advise?

Thanks in advance,
Rebekah

honnibal · February 7, 2019, 12:25am

@reb-greazy Hmm, that’s confusing!

Could you print the tasks which are failing, so we can see what the text and its hash is? We want to make sure there’s a hash that’s greater than 32bits there, and then verify what the hash value is when we use mmh3. If we’re not getting the same value by calling the library directly, that’s very confusing, and we’ll know where to look. If we do get a value greater than 32bits out of mmh3 and we also get that value out of murmurhash, then I’ll need to push a fix to that dev version of murmurhash.

reb-greazy · February 7, 2019, 12:53am

Here is the call and initialization:

After 21 annotations, I get:

I hope this is helpful. Thanks in advance.

ines · February 7, 2019, 1:17am

Thanks for helping us debug this! Could you add the print statement to the recipe as discussed here?

This will give us the exact term it fails on when it's trying to add it to the database.

reb-greazy · February 7, 2019, 3:42pm

Ines,

I added the print statement as instructed:

After additional testing, I noticed that I will get the same error (see below) every time I try saving from the web-application (even after just one annotation).

Here is the basic logging from my terminal:

Please let me know what additional info I can provide. Thanks so much for your help with this!

ines · February 7, 2019, 3:50pm

Thanks so much! One thing that's not 100% clear from your screenshot yet: What's the last example it prints before the error occurs and the process dies? This should also be something like {'text': 'something', 'answer': '...'}. This will help us debug, because that example is the culprit.

This makes sense because once the server has died, all connections the app is trying to make fail.

The underlying error happens when Prodigy tries to save the example to the database. This doesn't happen instantly: the app usually waits until it has one full batch of answers ready (minus the history, which is kept in the app so you can quickly undo) and then sends it out. A batch consists of 10 examples, so on the 21st annotation, Prodigy has one full batch of 10 answers plus 10 in the history. It sends the 10 back, Prodigy tries to write them to the database and, bam, the error happens. This likely explains the "magic number" of 21.
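
The arithmetic behind the "magic number" can be sketched with a toy model. This is not Prodigy's actual code: the function name and the exact send rule (send the oldest batch once the app holds more than one batch plus the history) are assumptions made for illustration, using the batch size of 10 and history of 10 described above:

```python
def annotations_that_trigger_a_save(total, batch_size=10, history_size=10):
    """Toy model: return the annotation counts at which a batch is sent to the server."""
    pending = []   # answers still held in the browser
    triggers = []  # annotation numbers at which a batch gets sent
    for n in range(1, total + 1):
        pending.append(n)
        # Assumed rule: send once there is more than one full batch plus the history.
        if len(pending) > batch_size + history_size:
            del pending[:batch_size]  # the oldest batch goes to the server
            triggers.append(n)
    return triggers


print(annotations_that_trigger_a_save(45))
```

Under these assumptions the first save, and therefore the first chance for the database write to fail, happens on annotation 21.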

reb-greazy · February 7, 2019, 4:52pm

Ines, sorry I am still pretty new at working with prodigy. I understand what you are saying regarding needing to know the last example. However, I added the print statement to the recipe but it seems to only print that information after the batch of 10 or right in the beginning as it is initializing the seeds. It continues to print in batches of 10 even when I am getting the saving error in the web application. However, I can find the 21st example by searching for the word it stopped on this session:

The 21st example in this session was the word “assumption”: {'text': 'assumption', 'meta': {'score': 0.7629092975159285}, '_input_hash': -1957380416, '_task_hash': -1661728975}

reb-greazy · February 11, 2019, 7:57pm

@ines Can I provide any additional information to help debug this problem?

Thanks!

ines · February 11, 2019, 8:55pm

Sorry, think I forgot to reply to your previous comment!

Yeah, that makes sense, because the answers are also sent back in batches of 10. It loops over each example and adds it to the database – and on one of them, the SQLite database will eventually complain and the whole thing will fail (this doesn't have to be the 21st example – it can be any example in the previous 10).

So just to confirm, is the last example you see before the error occurs the "assumption" example you posted?
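
Since any example in the batch can be the one that fails, it may be quicker to scan the printed examples for hashes that cannot be stored than to guess from the order. A rough sketch: `find_unstorable` is a hypothetical helper, and the sample batch reuses two examples printed earlier in this thread. It only covers the overflow case, so it will find nothing if the hashes are already within range, as in the "assumption" example:

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def find_unstorable(batch, keys=("_input_hash", "_task_hash")):
    """Return (example, key) pairs whose hash is outside the signed 64-bit range."""
    problems = []
    for eg in batch:
        for key in keys:
            value = eg.get(key)
            if isinstance(value, int) and not INT64_MIN <= value <= INT64_MAX:
                problems.append((eg, key))
    return problems


# Examples copied from the print(eg) output earlier in this thread.
batch = [
    {"text": "anxiety", "_input_hash": 6298237553007272678, "_task_hash": 6567957362502149308},
    {"text": "depression", "_input_hash": 14703391357354852000, "_task_hash": 4519653915177422000},
]

for eg, key in find_unstorable(batch):
    print(eg["text"], key, eg[key])
```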
