Hi!
Thanks a lot for Prodigy.
I've been trying task routers recently with partial overlap (an average of 2.5 annotations per task). I am on prodigy==1.12.7.
I ran into issues where my users would not reliably get tasks. After some debugging (not easy to set up, maybe I missed a tutorial), I understood it was due to Prodigy iterating over the stream looking for valid examples with the task router.
As the first ~2k samples have already been labelled, they are just skipped, but this results in a wait of more than 3 minutes (depending on the size of the machine), which often leads to timeouts.
I did some digging and apparently it's mainly bound by the I/O to the database due to `get_hash_count`, which is called in the router for every user for every session.
When I tried to dig into why it was slow and whether I could improve it, I was blocked by `route_average_per_task` being closed source, and the code that is available does not seem to work quite the same way. For example: how does `annotations_per_task: 2.5` work?
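To make the question concrete, here is my guess at what a fractional average could mean, as a minimal sketch. This is not Prodigy's code: `annotators_for_task` and its logic are my own assumption (every task gets 2 annotators, and about half of the tasks get a third one, decided deterministically from the task hash).

```python
import hashlib
from typing import List


def annotators_for_task(task_hash: int, session_ids: List[str], average: float) -> List[str]:
    """Hypothetical router logic, NOT Prodigy's implementation.

    Assigns floor(average) annotators to every task, plus one extra
    annotator for a fraction of tasks equal to the fractional part.
    """
    if not session_ids:
        return []
    base = int(average)
    frac = average - base
    # Derive a stable pseudo-random value in [0, 1) from the task hash,
    # so the same task always gets the same decision.
    digest = hashlib.sha256(str(task_hash).encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64
    n = min(base + (1 if u < frac else 0), len(session_ids))
    # Rotate the starting annotator by the hash to spread the load.
    start = task_hash % len(session_ids)
    return [session_ids[(start + i) % len(session_ids)] for i in range(n)]
```

With `average=2.5` this gives each task either 2 or 3 annotators, averaging out to about 2.5 over many tasks. Is this close to what the built-in router does?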
Could you provide the code of the router and/or advice on how to improve this (performance and debugging tips)?
Should I expect this to be this slow? Is there any way to limit or speed up this call, for example by caching the results?
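For the caching idea, this is roughly what I have in mind, as a sketch only. `CachedHashCount`, `count_fn`, and the TTL are hypothetical names of my own, not part of Prodigy; `count_fn` stands in for whatever does the actual database lookup. The obvious trade-off is that counts can be stale for up to `ttl_seconds`, so a task might briefly be assigned to more annotators than intended.

```python
import time
from typing import Callable, Dict, Tuple


class CachedHashCount:
    """Hypothetical time-limited cache around a per-hash count lookup."""

    def __init__(
        self,
        count_fn: Callable[[int], int],  # assumed: does the real DB query
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._count_fn = count_fn
        self._ttl = ttl_seconds
        self._clock = clock
        # task_hash -> (time the value was fetched, cached count)
        self._cache: Dict[int, Tuple[float, int]] = {}

    def __call__(self, task_hash: int) -> int:
        now = self._clock()
        hit = self._cache.get(task_hash)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh, skip the database
        value = self._count_fn(task_hash)
        self._cache[task_hash] = (now, value)
        return value

    def invalidate(self, task_hash: int) -> None:
        """Drop one entry, e.g. after a new annotation for that task is saved."""
        self._cache.pop(task_hash, None)
```

A custom router could then call the cached wrapper instead of hitting the database for every session, and invalidate an entry whenever an answer for that task comes in. Would something like this be safe to do?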
Thanks for your help!