Task Routing's problem: I want to get 5 annotations per task but it doesn't work.

kaorisugi · August 24, 2023, 11:05am

Continuing the discussion from Task routers - Problem in JSONL output/Annotators Problem:

Hello!
Task Routing, and especially the annotations_per_task setting, is what I have been waiting for. Thanks for developing this.
I would like to use Task Routing with the textcat.manual recipe.
I prepared the following dataset based on the article linked above.

{"text":"1"}
{"text":"2"}
{"text":"3"}
{"text":"4"}
{"text":"5"}
...
{"text":"200"}

Then I prepared the following command.
My expectation is that each of the 200 samples will receive 5 annotations (5 x 200 = 1000).
This will be done by 50 people (50 sessions), and each person will annotate about 20 samples (1000 ÷ 50 = 20).
I set "annotations_per_task": 5 in the config and added a few other settings needed for my task.

PRODIGY_ALLOWED_SESSIONS="user1,user2,user3,user4,user5,user6,user7,user8,user9,user10,user11,user12,user13,user14,user15,user16,user17,user18,user19,user20,user21,user22,user23,user24,user25,user26,user27,user28,user29,user30,user31,user32,user33,user34,user35,user36,user37,user38,user39,user40,user41,user42,user43,user44,user45,user46,user47,user48,user49,user50" PRODIGY_LOGGING=verbose PRODIGY_CONFIG_OVERRIDES='{"annotations_per_task": 5, "allow_work_stealing": false, "total_examples_target" : 20, "choice_style": "single", "choice_auto_accept" : true, "batch_size": 1, "auto_count_stream": true}' python3 -m prodigy textcat.manual test_anno_per_task_5 /path/to/data/examples-200.jsonl --label number,text

Then, after all 50 sessions were completed and I exported the data with db-out, a number of samples did not have 5 annotations. Each session received a "No tasks available." message before reaching 20 annotations.
The following are the counts per _task_hash (the same results were obtained for _input_hash).

Counter({985633209: 5, -556031871: 5, 329569934: 5, -1541557956: 5, 1163974572: 5, 192484930: 5, 528576632: 5, 16089945: 5, -2026718971: 5, 1182163795: 5, -1154883125: 5, 266669984: 5, -550346988: 5, 2018616092: 5, 185203310: 5, 30789149: 5, -1493531608: 5, -1683186191: 5, 1439392813: 5, 1949915751: 5, -1581206528: 5, 96950103: 5, 613896265: 5, -777789492: 5, 811274923: 5, 335429472: 5, -1105790372: 5, 1709963953: 5, 678744456: 5, -311834425: 5, 635060206: 5, -50156546: 5, -1946269986: 5, -1258641755: 5, 701617668: 5, 1401556973: 5, -474789062: 5, 1560417868: 5, 1162353785: 5, -1221112846: 5, -1066193932: 5, 1398921421: 4, 361978880: 3, -2139080284: 3, 575893851: 2, -254879685: 2, -918995978: 2, -373626418: 2, 1601894163: 1, -1741484794: 1, 987982384: 1, -1397241190: 1, 121797093: 1, -356912473: 1, -111179660: 1, -1457640643: 1, -1410622120: 1, -673123202: 1, 1484163477: 1, -284385253: 1, 135052315: 1, -2115160802: 1, -974546365: 1, -1163790235: 1, 38560155: 1, -1852320233: 1, -798051285: 1, 2081734424: 1, 1181759398: 1, 1855000249: 1, -2013003049: 1, -2042941621: 1, -982861136: 1, 1983761515: 1, 1777964706: 1, 949713221: 1, 904041387: 1, -450564470: 1, 699214634: 1, -1646456394: 1, 634107789: 1, -660019446: 1, -723195853: 1, -1321424780: 1, 91618038: 1, -582536465: 1, -1114571286: 1, 288509414: 1, -46560300: 1, -1739469801: 1, 1302896661: 1, -1305412236: 1, 1050232530: 1, -1760464575: 1, 1009222982: 1, -1034701401: 1, -2046352622: 1, -1867533323: 1, 1849880476: 1, -233250456: 1, -180294626: 1, -1767077153: 1, 1425277282: 1, -1350308508: 1, 2067531768: 1, -864168898: 1, 1967062351: 1, -1248113859: 1, -219564559: 1, 283591064: 1, -209632410: 1, 1118550926: 1, -1729700932: 1, 462571023: 1, 1320552297: 1, 235175096: 1, 2131024112: 1, -941922759: 1, -2019298900: 1, -1217966674: 1, 648801801: 1, 1049894917: 1, -2129863201: 1, -805111550: 1, 312127709: 1, -1695386030: 1, 1305419342: 1, -354062993: 1, 352352375: 1, 1200874921: 1, 986240793: 1, -1807644168: 1, -1187686917: 
1, -1172738312: 1, 636429977: 1, -764442540: 1, -758264452: 1, 173208930: 1, -1211930598: 1, -1998097200: 1, -1004836253: 1, -1639136: 1, 1787390710: 1, -786843049: 1, 743142045: 1, 1138201042: 1, -532916708: 1, 224893920: 1, -52228360: 1, -2050838859: 1, -426991815: 1, 718454782: 1, -443673036: 1, 1041143928: 1, 2146508570: 1, 53452206: 1, -213358472: 1, 1099867013: 1, -1531576616: 1, -627299351: 1, -655091577: 1, -1220129460: 1, 1580970564: 1, 942602387: 1, -386022397: 1, 1089089563: 1, -1567467545: 1, 1462722653: 1, -42686089: 1, -1042138423: 1, 381463746: 1, -1718923612: 1, -1535411413: 1, 462368655: 1, -366234124: 1, 438831090: 1, 704811896: 1, -377152113: 1, -1305763770: 1, -1408192729: 1, 1476922744: 1, 1802917067: 1, 1620370207: 1, 348510874: 1, 719386162: 1, -1778119712: 1, -156364477: 1, 624572654: 1, -1706477502: 1, -1500828014: 1, 394787983: 1, 361305380: 1, 853456830: 1, 57204667: 1, -742677904: 1, -1990743135: 1, -997379918: 1, -1778230216: 1, -1323157068: 1, -1812111617: 1})
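
For anyone who wants to reproduce this kind of per-hash count, here is a minimal sketch of how it could be computed from a db-out export. The helper names count_annotations and under_annotated, and the file name annotations.jsonl, are illustrative assumptions and not part of Prodigy.

```python
import json
from collections import Counter


def count_annotations(jsonl_lines, key="_task_hash"):
    """Count how many saved annotations share each hash value."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        counts[json.loads(line)[key]] += 1
    return counts


def under_annotated(counts, annotations_per_task=5):
    """Return only the hashes with fewer annotations than requested."""
    return {h: n for h, n in counts.items() if n < annotations_per_task}


# Tiny inline sample standing in for an export created with
# `prodigy db-out test_anno_per_task_5 > annotations.jsonl` (hypothetical file name).
sample = [
    '{"text": "1", "_task_hash": 11}',
    '{"text": "1", "_task_hash": 11}',
    '{"text": "2", "_task_hash": 22}',
]
counts = count_annotations(sample)
print(sum(counts.values()), "annotations over", len(counts), "tasks")
print(under_annotated(counts, annotations_per_task=2))
```

With a real export, the lines of the opened file can be passed in place of the sample list, and key="_input_hash" gives the count per input instead of per task.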

The progress of test_anno_per_task_5DATASET is as follows: Total should be 1000, but it is not.

============================ Annotation Progress ============================

           New   Unique   Total   Unique
--------   ---   ------   -----   ------
Aug 2023   375      200     375      200

Is there something wrong with the command? For example, does my preferred config setting have any effect?
Version information is as follows.

============================== ✨  Prodigy Stats ==============================

Version          1.13.0                        
Location         /data/{user_name}/proofread-data/proofread_annotation/.v_proofread_annotation/lib/python3.9/site-packages/prodigy
Prodigy Home     /home/{user_name}/.prodigy         
Platform         Linux-5.4.0-1093-aws-x86_64-with-glibc2.27
Python Version   3.9.15                        
Spacy Version    3.5.4                         
Database Name    SQLite                        
Database Id      sqlite                        
Total Datasets   422                           
Total Sessions   9927

koaning · August 28, 2023, 9:32am

Hi there!

Thanks for reaching out. I just created and ran a unit test with your settings and didn't see anything unexpected. I'd like to dive a little deeper though, just to make sure that I'm not missing anything.

Just to check, are you 100% sure that the users all hit "save" before stopping? I'd mainly like to rule it out because it could happen that "No tasks available" appears and that the user leaves the session open without saving the final items from their stream.

I couldn't help but notice that you set your logging to verbose. Did you also store these logs? If so, could you share them? You should be able to see each task getting routed to each user. Through these logs we might be able to confirm if users may have forgotten to hit "save".

Your expectation that each of the 200 samples will get 5 annotations is correct. The only thing to keep in mind is that we're using a hashing mechanism to ensure consistency, which means that we don't guarantee that each user will get exactly 20 samples.
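
To illustrate why a hash-based assignment gives every task the same number of annotators without giving every annotator the same number of tasks, here is a small sketch. It uses rendezvous-style hashing purely as a stand-in; the route function is hypothetical and is not Prodigy's actual router.

```python
import hashlib
from collections import Counter


def route(input_hash, sessions, annotations_per_task=5):
    """Pick sessions for one task by ranking them on a stable hash.

    Illustrative only: rendezvous-style hashing, not Prodigy's implementation.
    """
    def score(session):
        digest = hashlib.sha256(f"{input_hash}:{session}".encode("utf8")).hexdigest()
        return int(digest, 16)

    # The same task always yields the same ranking, so the choice is repeatable.
    return sorted(sessions, key=score)[:annotations_per_task]


sessions = [f"user{i}" for i in range(1, 51)]
load = Counter()
for input_hash in range(200):  # stand-ins for the 200 examples
    for session in route(input_hash, sessions):
        load[session] += 1

print(sum(load.values()))                      # 1000 assignments in total (200 x 5)
print(min(load.values()), max(load.values()))  # per-session counts vary around 20
```

The total is fixed by construction, but the per-session counts depend on how the hashes fall, which is why "about 20" is the right expectation rather than "exactly 20".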

As for whether something is wrong with the command: I don't think so. The only thing that I noticed is that you use the auto_count_stream setting, which was deprecated in v1.12. I'm also wondering about the total_examples_target setting in your situation, because it may not play nice with the hashing trick in our router. But that should only cause the progress bar in the front-end to behave differently; it should not influence the task routing itself.

I'll do some more digging and will report if I've spotted anything. If you happen to be able to share the logs though I'd gladly dig into that.

kaorisugi · September 1, 2023, 5:45am

Thank you for checking this topic, @koaning!
I did the annotation test myself, so I am pretty sure I hit the save button. To be precise, when the "no tasks available" message appeared, the save button changed to a check mark and it appeared that no further saving was possible.

Below is the log from when I resumed the command. (I have hidden the directory usernames, just in case.)
Hopefully this will help you figure something out!

e[1;38;5;135m05:26:58e[0m: INIT: Setting all logging levels to 10
e[1;38;5;135m05:27:00e[0m: CLI: limiting user sessions to list: user1, user2, user3, user4, user5, user6, user7, user8, user9, user10, user11, user12, user13, user14, user15, user16, user17, user18, user19, user20, user21, user22, user23, user24, user25, user26, user27, user28, user29, user30, user31, user32, user33, user34, user35, user36, user37, user38, user39, user40, user41, user42, user43, user44, user45, user46, user47, user48, user49, user50
e[1;38;5;135m05:27:00e[0m: RECIPE: Calling recipe 'textcat.manual'
Using 2 label(s): number, text
e[1;38;5;135m05:27:00e[0m: RECIPE: Starting recipe textcat.manual
e[1;38;5;135m05:27:00e[0m: {'dataset': 'test_anno_per_task_5', 'source': '/data/{user}/proofread-data/proofread_annotation/prodigy_test/data/examples-200.jsonl', 'loader': None, 'label': ['number', 'text'], 'exclusive': False, 'exclude': None}
e[1;38;5;135m05:27:00e[0m: RECIPE: Annotating with 2 labels
e[1;38;5;135m05:27:00e[0m: ['number', 'text']
e[1;38;5;135m05:27:00e[0m: get_stream: Loading .jsonl file
e[1;38;5;135m05:27:00e[0m: get_stream: Rehashing stream
e[1;38;5;135m05:27:00e[0m: get_stream: Removing duplicates
e[1;38;5;135m05:27:00e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:27:00e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:27:00e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:27:00e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[38;5;3m⚠ Config setting 'choice_style' defined in recipe is overwritten by a
different value set in the global or local prodigy.json. This may lead to
unexpected results and potentially changes to the core behavior of the recipe.
If that's surprising, you should probably remove the setting 'choice_style' from
your prodigy.json.e[0m
e[1;38;5;135m05:27:00e[0m: VALIDATE: Validating components returned by recipe
e[1;38;5;135m05:27:00e[0m: CONTROLLER: Initialising from recipe
e[1;38;5;135m05:27:00e[0m: {'before_db': None, 'config': {'labels': ['number', 'text'], 'choice_style': 'single', 'choice_auto_accept': True, 'exclude_by': 'input', 'auto_count_stream': True, 'dataset': 'test_anno_per_task_5', 'recipe_name': 'textcat.manual', 'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'batch_size': 1}, 'dataset': 'test_anno_per_task_5', 'db': True, 'exclude': None, 'get_session_id': None, 'metrics': None, 'on_exit': None, 'on_load': None, 'progress': <prodigy.components.progress.TargetTotalProgressEstimator object at 0x7fdfea0c31c0>, 'self': <prodigy.core.Controller object at 0x7fdfea0c31f0>, 'session_factory': None, 'stream': <prodigy.components.stream.Stream object at 0x7fdfea0c37f0>, 'task_router': None, 'update': <cyfunction _noop_update at 0x7fdfea645110>, 'validate_answer': None, 'view_id': 'choice'}
e[1;38;5;135m05:27:00e[0m: VALIDATE: Creating validator for view ID 'choice'
e[1;38;5;135m05:27:00e[0m: VALIDATE: Validating Prodigy and recipe config
e[1;38;5;135m05:27:00e[0m: FILTER: Filtering duplicates from stream
e[1;38;5;135m05:27:00e[0m: {'by_input': True, 'by_task': True, 'stream': <_cython_3_0_0b3.generator object at 0x7fdfea0d1180>, 'warn_fn': <bound method Printer.warn of <wasabi.printer.Printer object at 0x7fe05304abb0>>, 'warn_threshold': 0.4}
e[1;38;5;135m05:27:00e[0m: FILTER: Filtering out empty examples for key 'text'
e[1;38;5;135m05:27:00e[0m: PREPROCESS: Add multiple choice options for 2 labels
e[1;38;5;135m05:27:00e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:27:00e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:27:00e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:27:00e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:27:00e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:27:00e[0m: DB: Connecting to database SQLite
e[1;38;5;135m05:27:00e[0m: DB: Creating unstructured dataset '2023-09-01_05-27-00'
e[1;38;5;135m05:27:00e[0m: {'created': datetime.datetime(2023, 8, 24, 9, 16, 16)}
e[1;38;5;135m05:27:00e[0m: STREAM: Created queue for test_anno_per_task_5-user1.
e[1;38;5;135m05:27:00e[0m: STREAM: Created queue for test_anno_per_task_5-user2.
e[1;38;5;135m05:27:00e[0m: STREAM: Created queue for test_anno_per_task_5-user3.
e[1;38;5;135m05:27:01e[0m: STREAM: Created queue for test_anno_per_task_5-user4.
e[1;38;5;135m05:27:01e[0m: STREAM: Created queue for test_anno_per_task_5-user5.
e[1;38;5;135m05:27:01e[0m: STREAM: Created queue for test_anno_per_task_5-user6.
e[1;38;5;135m05:27:01e[0m: STREAM: Created queue for test_anno_per_task_5-user7.
e[1;38;5;135m05:27:01e[0m: STREAM: Created queue for test_anno_per_task_5-user8.
e[1;38;5;135m05:27:02e[0m: STREAM: Created queue for test_anno_per_task_5-user9.
e[1;38;5;135m05:27:02e[0m: STREAM: Created queue for test_anno_per_task_5-user10.
e[1;38;5;135m05:27:02e[0m: STREAM: Created queue for test_anno_per_task_5-user11.
e[1;38;5;135m05:27:02e[0m: STREAM: Created queue for test_anno_per_task_5-user12.
e[1;38;5;135m05:27:02e[0m: STREAM: Created queue for test_anno_per_task_5-user13.
e[1;38;5;135m05:27:03e[0m: STREAM: Created queue for test_anno_per_task_5-user14.
e[1;38;5;135m05:27:03e[0m: STREAM: Created queue for test_anno_per_task_5-user15.
e[1;38;5;135m05:27:03e[0m: STREAM: Created queue for test_anno_per_task_5-user16.
e[1;38;5;135m05:27:03e[0m: STREAM: Created queue for test_anno_per_task_5-user17.
e[1;38;5;135m05:27:03e[0m: STREAM: Created queue for test_anno_per_task_5-user18.
e[1;38;5;135m05:27:04e[0m: STREAM: Created queue for test_anno_per_task_5-user19.
e[1;38;5;135m05:27:04e[0m: STREAM: Created queue for test_anno_per_task_5-user20.
e[1;38;5;135m05:27:04e[0m: STREAM: Created queue for test_anno_per_task_5-user21.
e[1;38;5;135m05:27:04e[0m: STREAM: Created queue for test_anno_per_task_5-user22.
e[1;38;5;135m05:27:04e[0m: STREAM: Created queue for test_anno_per_task_5-user23.
e[1;38;5;135m05:27:05e[0m: STREAM: Created queue for test_anno_per_task_5-user24.
e[1;38;5;135m05:27:05e[0m: STREAM: Created queue for test_anno_per_task_5-user25.
e[1;38;5;135m05:27:05e[0m: STREAM: Created queue for test_anno_per_task_5-user26.
e[1;38;5;135m05:27:05e[0m: STREAM: Created queue for test_anno_per_task_5-user27.
e[1;38;5;135m05:27:05e[0m: STREAM: Created queue for test_anno_per_task_5-user28.
e[1;38;5;135m05:27:06e[0m: STREAM: Created queue for test_anno_per_task_5-user29.
e[1;38;5;135m05:27:06e[0m: STREAM: Created queue for test_anno_per_task_5-user30.
e[1;38;5;135m05:27:06e[0m: STREAM: Created queue for test_anno_per_task_5-user31.
e[1;38;5;135m05:27:06e[0m: STREAM: Created queue for test_anno_per_task_5-user32.
e[1;38;5;135m05:27:06e[0m: STREAM: Created queue for test_anno_per_task_5-user33.
e[1;38;5;135m05:27:07e[0m: STREAM: Created queue for test_anno_per_task_5-user34.
e[1;38;5;135m05:27:07e[0m: STREAM: Created queue for test_anno_per_task_5-user35.
e[1;38;5;135m05:27:07e[0m: STREAM: Created queue for test_anno_per_task_5-user36.
e[1;38;5;135m05:27:07e[0m: STREAM: Created queue for test_anno_per_task_5-user37.
e[1;38;5;135m05:27:07e[0m: STREAM: Created queue for test_anno_per_task_5-user38.
e[1;38;5;135m05:27:08e[0m: STREAM: Created queue for test_anno_per_task_5-user39.
e[1;38;5;135m05:27:08e[0m: STREAM: Created queue for test_anno_per_task_5-user40.
e[1;38;5;135m05:27:08e[0m: STREAM: Created queue for test_anno_per_task_5-user41.
e[1;38;5;135m05:27:08e[0m: STREAM: Created queue for test_anno_per_task_5-user42.
e[1;38;5;135m05:27:08e[0m: STREAM: Created queue for test_anno_per_task_5-user43.
e[1;38;5;135m05:27:09e[0m: STREAM: Created queue for test_anno_per_task_5-user44.
e[1;38;5;135m05:27:09e[0m: STREAM: Created queue for test_anno_per_task_5-user45.
e[1;38;5;135m05:27:09e[0m: STREAM: Created queue for test_anno_per_task_5-user46.
e[1;38;5;135m05:27:09e[0m: STREAM: Created queue for test_anno_per_task_5-user47.
e[1;38;5;135m05:27:10e[0m: STREAM: Created queue for test_anno_per_task_5-user48.
e[1;38;5;135m05:27:10e[0m: STREAM: Created queue for test_anno_per_task_5-user49.
e[1;38;5;135m05:27:10e[0m: STREAM: Created queue for test_anno_per_task_5-user50.
e[1;38;5;135m05:27:10e[0m: CORS: initialized with wildcard "*" CORS origins

✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

e[32mINFOe[0m:     Started server process [e[36m9022e[0m]
e[32mINFOe[0m:     Waiting for application startup.
e[32mINFOe[0m:     Application startup complete.
e[32mINFOe[0m:     Uvicorn running on e[1mhttp://localhost:8080e[0m (Press CTRL+C to quit)
e[32mINFOe[0m:     127.0.0.1:58030 - "e[1mGET /?session=user1 HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:58030 - "e[1mGET /bundle.js HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:29:17e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:29:17e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:29:17e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:29:17e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:29:17e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:29:17e[0m: DB: Connecting to database SQLite
e[32mINFOe[0m:     127.0.0.1:58030 - "e[1mGET /project/user1 HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:58030 - "e[1mGET /robotocondensed-bold.woff2 HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:58040 - "e[1mGET /lato-regular.woff2 HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:29:17e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:29:17e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:29:17e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:29:17e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:29:17e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:29:17e[0m: DB: Connecting to database SQLite
e[1;38;5;135m05:29:17e[0m: POST: /get_session_questions
e[1;38;5;135m05:29:17e[0m: CONTROLLER: Getting batch of questions for session: test_anno_per_task_5-user1
e[32mINFOe[0m:     127.0.0.1:58048 - "e[1mGET /lato-bold.woff2 HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:58048 - "e[1mGET /favicon.ico HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:29:21e[0m: ROUTER: Routing item with _input_hash=272586206 -> ['test_anno_per_task_5-user7', 'test_anno_per_task_5-user41']
e[1;38;5;135m05:29:21e[0m: ROUTER: Routing item with _input_hash=379431406 -> ['test_anno_per_task_5-user7']
e[1;38;5;135m05:29:37e[0m: RESPONSE: /get_session_questions (0 examples)
e[1;38;5;135m05:29:37e[0m: {'tasks': [], 'total': 375, 'progress': 0.9, 'session_id': 'test_anno_per_task_5-user1'}
e[32mINFOe[0m:     127.0.0.1:58030 - "e[1mPOST /get_session_questions HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:48030 - "e[1mGET /?session=user1 HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:48030 - "e[1mGET /bundle.js HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:30:03e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:30:03e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:30:03e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:30:03e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:30:03e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:30:03e[0m: DB: Connecting to database SQLite
e[32mINFOe[0m:     127.0.0.1:48030 - "e[1mGET /project/user1 HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:48046 - "e[1mGET /favicon.ico HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:30:03e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:30:03e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:30:03e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:30:03e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:30:03e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:30:03e[0m: DB: Connecting to database SQLite
e[1;38;5;135m05:30:03e[0m: POST: /get_session_questions
e[1;38;5;135m05:30:03e[0m: CONTROLLER: Getting batch of questions for session: test_anno_per_task_5-user1
e[1;38;5;135m05:30:03e[0m: RESPONSE: /get_session_questions (0 examples)
e[1;38;5;135m05:30:03e[0m: {'tasks': [], 'total': 375, 'progress': 0.9, 'session_id': 'test_anno_per_task_5-user1'}
e[32mINFOe[0m:     127.0.0.1:48046 - "e[1mPOST /get_session_questions HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:52418 - "e[1mGET /?session=user41 HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:52418 - "e[1mGET /bundle.js HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:30:15e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:30:15e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:30:15e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:30:15e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:30:15e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:30:16e[0m: DB: Connecting to database SQLite
e[32mINFOe[0m:     127.0.0.1:52418 - "e[1mGET /project/user41 HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:30:16e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:30:16e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:30:16e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:30:16e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:30:16e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:30:16e[0m: DB: Connecting to database SQLite
e[1;38;5;135m05:30:16e[0m: POST: /get_session_questions
e[1;38;5;135m05:30:16e[0m: CONTROLLER: Getting batch of questions for session: test_anno_per_task_5-user41
e[1;38;5;135m05:30:16e[0m: RESPONSE: /get_session_questions (0 examples)
e[1;38;5;135m05:30:16e[0m: {'tasks': [], 'total': 375, 'progress': 0.5, 'session_id': 'test_anno_per_task_5-user41'}
e[32mINFOe[0m:     127.0.0.1:52418 - "e[1mPOST /get_session_questions HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:52418 - "e[1mGET /?session=user41 HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:52418 - "e[1mGET /bundle.js HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:30:19e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:30:19e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:30:19e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:30:19e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:30:19e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:30:19e[0m: DB: Connecting to database SQLite
e[32mINFOe[0m:     127.0.0.1:52418 - "e[1mGET /project/user41 HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:30:19e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:30:19e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:30:19e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:30:19e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:30:19e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:30:19e[0m: DB: Connecting to database SQLite
e[1;38;5;135m05:30:19e[0m: POST: /get_session_questions
e[1;38;5;135m05:30:19e[0m: CONTROLLER: Getting batch of questions for session: test_anno_per_task_5-user41
e[1;38;5;135m05:30:19e[0m: RESPONSE: /get_session_questions (0 examples)
e[1;38;5;135m05:30:19e[0m: {'tasks': [], 'total': 375, 'progress': 0.5, 'session_id': 'test_anno_per_task_5-user41'}
e[32mINFOe[0m:     127.0.0.1:52418 - "e[1mPOST /get_session_questions HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:52434 - "e[1mGET /favicon.ico HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:49688 - "e[1mGET /?session=user7 HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:49688 - "e[1mGET /bundle.js HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:31:16e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:31:16e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:31:16e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:31:16e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:31:16e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:31:16e[0m: DB: Connecting to database SQLite
e[32mINFOe[0m:     127.0.0.1:49688 - "e[1mGET /project/user7 HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:31:16e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:31:16e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:31:16e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:31:16e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:31:16e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:31:16e[0m: DB: Connecting to database SQLite
e[1;38;5;135m05:31:16e[0m: POST: /get_session_questions
e[1;38;5;135m05:31:16e[0m: CONTROLLER: Getting batch of questions for session: test_anno_per_task_5-user7
e[1;38;5;135m05:31:16e[0m: RESPONSE: /get_session_questions (0 examples)
e[1;38;5;135m05:31:16e[0m: {'tasks': [], 'total': 375, 'progress': 0.6, 'session_id': 'test_anno_per_task_5-user7'}
e[32mINFOe[0m:     127.0.0.1:49688 - "e[1mPOST /get_session_questions HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:49688 - "e[1mGET /?session=user7 HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:49688 - "e[1mGET /bundle.js HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:31:18e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:31:18e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:31:18e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:31:18e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:31:18e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:31:18e[0m: DB: Connecting to database SQLite
e[32mINFOe[0m:     127.0.0.1:49688 - "e[1mGET /project/user7 HTTP/1.1e[0m" e[32m200 OKe[0m
e[1;38;5;135m05:31:18e[0m: CONFIG: Using config from global prodigy.json
e[1;38;5;135m05:31:18e[0m: /home/{user}/.prodigy/prodigy.json
e[1;38;5;135m05:31:18e[0m: CONFIG: Merging config from CLI overrides
e[1;38;5;135m05:31:18e[0m: {'annotations_per_task': 5, 'allow_work_stealing': False, 'total_examples_target': 20, 'choice_style': 'single', 'choice_auto_accept': True, 'batch_size': 1, 'auto_count_stream': True}
e[1;38;5;135m05:31:18e[0m: DB: Initializing database SQLite
e[1;38;5;135m05:31:18e[0m: DB: Connecting to database SQLite
e[1;38;5;135m05:31:18e[0m: POST: /get_session_questions
e[1;38;5;135m05:31:18e[0m: CONTROLLER: Getting batch of questions for session: test_anno_per_task_5-user7
e[1;38;5;135m05:31:18e[0m: RESPONSE: /get_session_questions (0 examples)
e[1;38;5;135m05:31:18e[0m: {'tasks': [], 'total': 375, 'progress': 0.6, 'session_id': 'test_anno_per_task_5-user7'}
e[32mINFOe[0m:     127.0.0.1:49688 - "e[1mPOST /get_session_questions HTTP/1.1e[0m" e[32m200 OKe[0m
e[32mINFOe[0m:     127.0.0.1:49694 - "e[1mGET /favicon.ico HTTP/1.1e[0m" e[32m200 OKe[0m

kaorisugi · September 1, 2023, 6:45am

I have come up with one possibility for this problem: I used a dataset name that was already in use, which may have caused the glitch. So I retested with a completely new dataset name and got the expected results.

However, under certain conditions I ran into the problem again: namely, when I stopped and re-executed the command before all sessions had completed their annotations.
If I do not stop and restart the command, there seems to be no problem with the annotations_per_task setting.

Unfortunately, in my organization, we have to stop and rerun the command periodically for security reasons.
Is it possible to prevent the annotations_per_task problem under these conditions?

koaning · September 1, 2023, 9:13am

Ah yeah, that might have an effect. Our task router checks the database state before sending the task to users. If the task has already been annotated enough times it will be skipped. However, that should not affect the total number of annotations that you'd eventually have. You might see different annotations within the session, but restarting the server shouldn't cause the task router to route the tasks differently.
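
As a rough sketch of the check described above, assuming the router can look up how many annotations are already saved per input hash: the function name sessions_still_needed and the sample counts are hypothetical and only illustrate the idea, not the real implementation.

```python
from collections import Counter


def sessions_still_needed(input_hash, saved_counts, annotations_per_task=5):
    """How many more sessions should receive this task, given what is saved."""
    return max(0, annotations_per_task - saved_counts.get(input_hash, 0))


# Hypothetical database state after a restart: input hash -> saved annotations.
saved_counts = Counter({101: 5, 102: 3, 103: 0})

for input_hash in (101, 102, 103):
    needed = sessions_still_needed(input_hash, saved_counts)
    if needed == 0:
        print(input_hash, "skipped: already has enough annotations")
    else:
        print(input_hash, "route to", needed, "more session(s)")
```

Under this model a restart changes which tasks a session sees next, but the saved total per task should still converge on annotations_per_task.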

I do want to make sure that nothing is broken here, so I will dive into this again keeping the "server restart" in mind. Again, the task router is designed to be robust against this, but it's always possible that there's a bug on our side. If there's anything on your end that you can share that might help me investigate, do let me know!

One thing: looking at your logs I notice only two lines from the task router.

ROUTER: Routing item with _input_hash=272586206 -> ['test_anno_per_task_5-user7', 'test_anno_per_task_5-user41']
ROUTER: Routing item with _input_hash=379431406 -> ['test_anno_per_task_5-user7']

This indeed suggests that the task already has annotations. Otherwise it would have routed each example to five users.
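
If you kept the verbose logs, a short script along these lines could summarise how many sessions each item was routed to. It is a sketch that assumes the ROUTER lines keep the format shown above; the helper name routed_sessions is made up for illustration.

```python
import ast
import re

# Matches lines such as:
# ROUTER: Routing item with _input_hash=272586206 -> ['...-user7', '...-user41']
ROUTER_LINE = re.compile(r"ROUTER: Routing item with _input_hash=(-?\d+) -> (\[.*\])")


def routed_sessions(log_lines):
    """Map each _input_hash found in the log to the sessions it was routed to."""
    routed = {}
    for line in log_lines:
        match = ROUTER_LINE.search(line)
        if match:
            # The session list is printed as a Python list literal.
            routed[int(match.group(1))] = ast.literal_eval(match.group(2))
    return routed


log_lines = [
    "ROUTER: Routing item with _input_hash=272586206 -> ['test_anno_per_task_5-user7', 'test_anno_per_task_5-user41']",
    "ROUTER: Routing item with _input_hash=379431406 -> ['test_anno_per_task_5-user7']",
]

for input_hash, sessions in routed_sessions(log_lines).items():
    print(input_hash, "->", len(sessions), "session(s)")
# 272586206 -> 2 session(s)
# 379431406 -> 1 session(s)
```

An item routed to fewer than annotations_per_task sessions is consistent with some annotations for it already being in the database.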

koaning · September 1, 2023, 11:05am

Yep! I found a bug and I am working on a patch. With a bit of luck it should be out next week.

Thanks for reporting! This issue was definitely related to restarting the server frequently but our task router should be robust against that.

kaorisugi · September 4, 2023, 5:17am

That is great to hear! I will wait until the fixed task router is available. Thanks for checking it out.

koaning · September 8, 2023, 8:46am

The bugfix should be live now! When you download version 1.13.2, the issue should be gone.

If the issue persists though, do let me know!

kaorisugi · October 16, 2023, 1:58am

Thanks for fixing the bug! I will post again if I notice anything else.
Thanks for taking care of this.
