OverflowError: Maximum recursion level reached

I am having a peculiar error when using a custom teach method. I am not entirely sure if Prodigy is the problem.

My custom teach method returns (score, task) tuples from its __call__ method.

I’m actually using a sklearn model and its partial_fit method to update. I wasn’t sure what to return from the Model.update method. The README says to return a dictionary, but that gives me an error as well.
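For context, here’s a stripped-down sketch of what I mean – the featurization and helper names are placeholders of mine, not Prodigy API:

```python
def make_update(model, featurize):
    # Wrap a scikit-learn-style model exposing partial_fit() as the
    # recipe's update callback. `featurize` is a placeholder.
    def update(answers):
        X = [featurize(task) for task in answers]
        y = [1 if task.get("answer") == "accept" else 0 for task in answers]
        model.partial_fit(X, y, classes=[0, 1])
        return 0.0  # unsure what to return here – a float? a dict errors
    return update
```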

Thanks

OverflowError: Maximum recursion level reached
Exception when serving /give_answers
Traceback (most recent call last):
  File "/Users/apewu/writelab/aes/venv/lib/python3.6/site-packages/waitress/channel.py", line 338, in service
    task.service()
  File "/Users/apewu/writelab/aes/venv/lib/python3.6/site-packages/waitress/task.py", line 169, in service
    self.execute()
  File "/Users/apewu/writelab/aes/venv/lib/python3.6/site-packages/waitress/task.py", line 399, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "/Users/apewu/writelab/aes/venv/lib/python3.6/site-packages/hug/api.py", line 421, in api_auto_instantiate
    return module.__hug_wsgi__(*args, **kwargs)
  File "/Users/apewu/writelab/aes/venv/lib/python3.6/site-packages/falcon/api.py", line 242, in __call__
    responder(req, resp, **params)
  File "/Users/apewu/writelab/aes/venv/lib/python3.6/site-packages/hug/interface.py", line 692, in __call__
    self.render_content(self.call_function(input_parameters), request, response, **kwargs)
  File "/Users/apewu/writelab/aes/venv/lib/python3.6/site-packages/hug/interface.py", line 633, in call_function
    return self.interface(**parameters)
  File "/Users/apewu/writelab/aes/venv/lib/python3.6/site-packages/hug/interface.py", line 99, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "/Users/apewu/writelab/aes/venv/lib/python3.6/site-packages/prodigy/app.py", line 66, in give_answers
    controller.receive_answers(answers)
  File "cython_src/prodigy/core.pyx", line 78, in prodigy.core.Controller.receive_answers
  File "cython_src/prodigy/components/db.pyx", line 183, in prodigy.components.db.Database.add_examples
  File "cython_src/prodigy/components/db.pyx", line 188, in prodigy.components.db.Database.add_examples
OverflowError: Maximum recursion level reached

Using Prodigy 0.4, Python 3.6.3, spaCy 2.0.0a17.

Hmm, this is interesting… The error occurs when adding the answers to the database. The last frame in the stack trace mostly consists of a ujson.dumps() call on each answer. I did some googling and came across issues around dumping generators – but I don’t see how that could happen here, since it’s storing the annotations it receives from the web app (always a list of dictionaries).

Unless your update function is somehow modifying the answers it receives in place? Could you add some logging around this and also check that the answers received by your update function look reasonable?

update typically returns the loss – but you should also be able to make it return something else if you need to. The return value is passed into the progress function as the keyword argument loss – if a progress function exists. And the float calculated by the progress function is then returned by the REST API and shown in the progress bar on the front-end. This lets you calculate an estimated annotation progress based on the loss.
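Roughly, the wiring looks like this – a sketch only: the `loss` keyword is the documented part, while the other progress arguments and the scaling factor are assumptions for illustration and may differ between Prodigy versions:

```python
# Sketch of how update() and progress() can interact in a custom recipe.
# Only the `loss` keyword is described behaviour; the other arguments
# and the 10.0 scaling are made up for illustration.

def update(answers):
    # answers: the annotated tasks received back from the web app
    accepted = [task for task in answers if task.get("answer") == "accept"]
    loss = float(len(answers) - len(accepted))  # stand-in for a real loss
    return loss  # this return value is passed to progress() as `loss`

def progress(session=0, total=0, loss=0.0):
    # Map the loss to a 0..1 estimate shown in the front-end progress bar
    return max(0.0, min(1.0, 1.0 - loss / 10.0))
```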

Ah right, I do morph the stream of dictionaries.

So for a stream of tasks, I add a doc key where a spaCy Doc object is stored. A spaCy Doc object is not serializable, so that’s where it was giving me errors. I also call tolist() on all of the NumPy arrays so they are serializable too.

Inside update(), I use the doc for featurization and updating the weights. I dropped the doc object from the stream of tasks after updating the weights, and the issue was fixed!
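In case it helps anyone else, roughly what I ended up doing – the "doc" key is from my own recipe, not Prodigy’s API:

```python
import json

def make_serializable(task):
    # Shallow-copy so the answers aren't mutated in place
    task = dict(task)
    # spaCy Doc objects can't be dumped to JSON, so drop the key
    task.pop("doc", None)
    # NumPy arrays expose tolist(); convert them to plain lists
    for key, value in task.items():
        if hasattr(value, "tolist"):
            task[key] = value.tolist()
    return task
```

I run this over the tasks right after update() has used the doc, so the database only ever sees plain JSON.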

Thanks


Thanks for the update – nice to hear that you got it working!

Just thinking about how to solve this in Prodigy going forward. The main question is: Should the update() function be allowed to mutate the annotations received from the front-end and thus modify what’s stored in the database?

  • CONS: Can easily lead to unexpected results and issues like this one. Annotation tasks need to be JSON-serializable objects anyways in order to be rendered by the web application. If a user wants to attach additional data, this can be done before annotation, not after. Also, the database should always reflect the exact state of collected annotations, i.e. the same tasks that were shown to the annotator on the front-end.
  • PROS: Maybe there will be cases in the future where a model should compute additional data based on the collected annotations and answers, and store this with the task? The update function would be the only place for that, since it gives the user access to the answers before they are stored. However, maybe there should be a more explicit API for this then? Randomly mutating a list is a pretty unsatisfying solution.

Ya for context, I am doing some mutation shenanigans so I can do classification directly on spans, one at a time. For each sentence, I have a span_fun that “flattens” the task so that each span and its label becomes its own task.

e.g. going from one task

{
    'text': 'My name is Barack Obama and Donald Trump.',
    'spans': [
        {'start': 11, 'end': 23, 'label': 'DEMOCRAT'},
        {'start': 28, 'end': 40, 'label': 'REPUBLICAN'},
    ],
    'doc': My name is Barack Obama and Donald Trump.,
}

to one task per span

{
    'text': 'My name is Barack Obama and Donald Trump.',
    'spans': [
        {'start': 11, 'end': 23, 'label': 'DEMOCRAT'},
    ],
    'doc': My name is Barack Obama and Donald Trump.,
},
{
    'text': 'My name is Barack Obama and Donald Trump.',
    'spans': [
        {'start': 28, 'end': 40, 'label': 'REPUBLICAN'},
    ],
    'doc': My name is Barack Obama and Donald Trump.,
}

My span_fun needs the doc object to get the span. Though there are probably better ways to code up this task (maybe adding a state variable separate from the task).
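Something like this sketch is what I mean by a separate state variable – keeping the Doc in a side dict instead of on the task itself (all the names here are my own, not Prodigy’s):

```python
def span_fun(stream, state):
    # Flatten each task into one task per span; stash the Doc in `state`
    # so the tasks stored in the database stay JSON-serializable.
    for task in stream:
        for i, span in enumerate(task.get("spans", [])):
            state[(task["text"], i)] = task.get("doc")  # non-JSON state
            yield {
                "text": task["text"],
                "spans": [span],
                "span_id": i,  # key back into the shared state
            }
```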

So are the labels stored after the update() function runs? Would it be possible to store the annotations first and then update?