route_average_per_task is slow and results in timeouts with No Tasks Available


Thanks a lot for Prodigy.
I've been trying task routers recently with partial overlap (average=2.5). I am on prodigy==1.12.7.

I ran into issues where my users would not get tasks systematically. After some debugging (not easy to set up, maybe I missed a tutorial), I understood it was due to Prodigy iterating over the stream looking for valid examples with the task router.
As the first ~2k samples have already been labelled, they are just skipped, but this results in a wait of more than 3 minutes (depending on the size of the machine) that often ends in a timeout.
I did some digging and apparently it's mainly bound by I/O to the database, because get_hash_count is called in the router for every user for every session.

When I tried to dig into why it was slow and whether I could improve it, I was blocked by route_average_per_task being closed source, and the code that is available does not seem to work quite the same way. For example: how does `annotations_per_task: 2.5` work?

Could you provide the code of the router and/or advice on how to improve this (performance and debugging tips)?
Should I expect it to be this slow? Is there any way to limit or speed up this call, for example by caching the results?
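
To make the caching idea concrete, here is a minimal sketch in plain Python. It does not use Prodigy's API: `CachedHashCounter` and `slow_lookup` are hypothetical names, and `slow_lookup` only stands in for whatever database call returns the number of annotations for a task hash.

```python
from collections import Counter
from typing import Callable, Dict


class CachedHashCounter:
    """Keep per-hash annotation counts in memory so repeated lookups skip the database."""

    def __init__(self, fetch_count: Callable[[int], int]) -> None:
        # fetch_count is a hypothetical stand-in for the slow database lookup.
        self._fetch_count = fetch_count
        self._counts: Dict[int, int] = {}

    def get(self, task_hash: int) -> int:
        # Only hit the database the first time a hash is seen.
        if task_hash not in self._counts:
            self._counts[task_hash] = self._fetch_count(task_hash)
        return self._counts[task_hash]

    def increment(self, task_hash: int) -> None:
        # Call this when a new annotation is saved so the cached value stays current.
        self._counts[task_hash] = self.get(task_hash) + 1


# Usage with a fake "database" that records how often each hash is queried.
db_calls = Counter()
stored = {101: 3, 102: 0}


def slow_lookup(task_hash: int) -> int:
    db_calls[task_hash] += 1
    return stored.get(task_hash, 0)


cache = CachedHashCounter(slow_lookup)
first = cache.get(101)
second = cache.get(101)  # served from memory, no second database query
cache.increment(102)
```

A cache like this is only safe if every write goes through the same process. If several servers write to the same database, the cached counts can go stale.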

Thanks for your help!

Hi @Arnault,

Thanks for your post.

There was a task router fix in v1.13.2 that fixed this issue:

The fixes include:

  • Task router issues occurred due to treating the average variable as non-constant, leading to overwrites.
  • The db.get_hash_count call inaccurately counted hashes by querying both the task and session tables.
  • Users annotating examples from previous sessions could still receive tasks, but on restart, their presence in the annot pool hindered item distribution to other users.

I suspect the get_hash_count issue may be the cause of your problem. Can you update to the latest version of Prodigy and see if the problem remains?

Hi @ryanwesslen,

Thank you for your answer.
I cannot update at the moment, as this is my old personal license.
I am evaluating buying licenses for my current company. :slight_smile:
So I might come back soon. For now, I think I am going to create my own task router and try to understand the issues from these changes.
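
A custom router along these lines could be sketched as follows. This is not the built-in router's code, which is not available. It assumes that a fractional average such as 2.5 means some tasks go to 2 annotators and others to 3, so that the mean comes out at 2.5. `pick_annotators` is a hypothetical helper that takes a task hash and the known session names directly, so it would still need to be wrapped in a function with the signature Prodigy expects for task routers.

```python
import math
import random
from typing import List


def pick_annotators(task_hash: int, session_ids: List[str], average: float) -> List[str]:
    """Pick annotators for one task so the expected number of annotators equals `average`."""
    # Seed the generator with the task hash so the same task always gets the same result.
    rng = random.Random(task_hash)
    base = math.floor(average)
    # The fractional part is the probability of adding one extra annotator.
    extra = 1 if rng.random() < (average - base) else 0
    # Never ask for more annotators than there are sessions.
    n = min(base + extra, len(session_ids))
    # Sort first so the outcome does not depend on the order sessions were registered in.
    return rng.sample(sorted(session_ids), n)


sessions = ["ds-alice", "ds-bob", "ds-carol", "ds-dave"]
sizes = [len(pick_annotators(h, sessions, 2.5)) for h in range(2000)]
mean_size = sum(sizes) / len(sizes)
```

Because the choice depends only on the task hash and the list of sessions, no database lookup is needed per task. The trade-off is that the assignment changes if the set of sessions changes.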

Thank you!

For information, I managed to make it work.