Continuing the discussion from Task routers - Problem in JSONL output/Annotators Problem:
Hello!
Task Routing, especially the annotations_per_task setting, is what I have been waiting for. Thanks for developing this.
I would like to use Task Routing with the textcat.manual recipe.
I prepared the following DATASET based on the article in the link above.
{"text":"1"}
{"text":"2"}
{"text":"3"}
{"text":"4"}
{"text":"5"}
...
{"text":"200"}
I then prepared the following command.
I expect each of the 200 samples to receive 5 annotations (5 × 200 = 1000 annotations in total).
The work is split across 50 people (50 sessions), so each person should annotate about 20 samples (1000 ÷ 50 = 20).
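The arithmetic behind that expectation, written out as a small check. The function name expected_workload is just illustrative, and it assumes the tasks are spread evenly across sessions.

```python
def expected_workload(
    n_tasks: int, annotations_per_task: int, n_sessions: int
) -> tuple[int, float]:
    """Return (total annotations, annotations per session), assuming an even split."""
    total = n_tasks * annotations_per_task
    return total, total / n_sessions


total, per_session = expected_workload(
    n_tasks=200, annotations_per_task=5, n_sessions=50
)
print(total, per_session)
```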
I set "annotations_per_task": 5 in the config and added a few other settings needed for my task.
PRODIGY_ALLOWED_SESSIONS="user1,user2,user3,user4,user5,user6,user7,user8,user9,user10,user11,user12,user13,user14,user15,user16,user17,user18,user19,user20,user21,user22,user23,user24,user25,user26,user27,user28,user29,user30,user31,user32,user33,user34,user35,user36,user37,user38,user39,user40,user41,user42,user43,user44,user45,user46,user47,user48,user49,user50" PRODIGY_LOGGING=verbose PRODIGY_CONFIG_OVERRIDES='{"annotations_per_task": 5, "allow_work_stealing": false, "total_examples_target" : 20, "choice_style": "single", "choice_auto_accept" : true, "batch_size": 1, "auto_count_stream": true}' python3 -m prodigy textcat.manual test_anno_per_task_5 /path/to/data/examples-200.jsonl --label number,text
After all 50 sessions were completed, I exported the dataset with db-out and found that many samples did not receive 5 annotations. Each session received a "No tasks available." message before reaching 20 annotations.
The following are the counts per _task_hash (the same results were obtained for _input_hash).
Counter({985633209: 5, -556031871: 5, 329569934: 5, -1541557956: 5, 1163974572: 5, 192484930: 5, 528576632: 5, 16089945: 5, -2026718971: 5, 1182163795: 5, -1154883125: 5, 266669984: 5, -550346988: 5, 2018616092: 5, 185203310: 5, 30789149: 5, -1493531608: 5, -1683186191: 5, 1439392813: 5, 1949915751: 5, -1581206528: 5, 96950103: 5, 613896265: 5, -777789492: 5, 811274923: 5, 335429472: 5, -1105790372: 5, 1709963953: 5, 678744456: 5, -311834425: 5, 635060206: 5, -50156546: 5, -1946269986: 5, -1258641755: 5, 701617668: 5, 1401556973: 5, -474789062: 5, 1560417868: 5, 1162353785: 5, -1221112846: 5, -1066193932: 5, 1398921421: 4, 361978880: 3, -2139080284: 3, 575893851: 2, -254879685: 2, -918995978: 2, -373626418: 2, 1601894163: 1, -1741484794: 1, 987982384: 1, -1397241190: 1, 121797093: 1, -356912473: 1, -111179660: 1, -1457640643: 1, -1410622120: 1, -673123202: 1, 1484163477: 1, -284385253: 1, 135052315: 1, -2115160802: 1, -974546365: 1, -1163790235: 1, 38560155: 1, -1852320233: 1, -798051285: 1, 2081734424: 1, 1181759398: 1, 1855000249: 1, -2013003049: 1, -2042941621: 1, -982861136: 1, 1983761515: 1, 1777964706: 1, 949713221: 1, 904041387: 1, -450564470: 1, 699214634: 1, -1646456394: 1, 634107789: 1, -660019446: 1, -723195853: 1, -1321424780: 1, 91618038: 1, -582536465: 1, -1114571286: 1, 288509414: 1, -46560300: 1, -1739469801: 1, 1302896661: 1, -1305412236: 1, 1050232530: 1, -1760464575: 1, 1009222982: 1, -1034701401: 1, -2046352622: 1, -1867533323: 1, 1849880476: 1, -233250456: 1, -180294626: 1, -1767077153: 1, 1425277282: 1, -1350308508: 1, 2067531768: 1, -864168898: 1, 1967062351: 1, -1248113859: 1, -219564559: 1, 283591064: 1, -209632410: 1, 1118550926: 1, -1729700932: 1, 462571023: 1, 1320552297: 1, 235175096: 1, 2131024112: 1, -941922759: 1, -2019298900: 1, -1217966674: 1, 648801801: 1, 1049894917: 1, -2129863201: 1, -805111550: 1, 312127709: 1, -1695386030: 1, 1305419342: 1, -354062993: 1, 352352375: 1, 1200874921: 1, 986240793: 1, -1807644168: 1, -1187686917: 
1, -1172738312: 1, 636429977: 1, -764442540: 1, -758264452: 1, 173208930: 1, -1211930598: 1, -1998097200: 1, -1004836253: 1, -1639136: 1, 1787390710: 1, -786843049: 1, 743142045: 1, 1138201042: 1, -532916708: 1, 224893920: 1, -52228360: 1, -2050838859: 1, -426991815: 1, 718454782: 1, -443673036: 1, 1041143928: 1, 2146508570: 1, 53452206: 1, -213358472: 1, 1099867013: 1, -1531576616: 1, -627299351: 1, -655091577: 1, -1220129460: 1, 1580970564: 1, 942602387: 1, -386022397: 1, 1089089563: 1, -1567467545: 1, 1462722653: 1, -42686089: 1, -1042138423: 1, 381463746: 1, -1718923612: 1, -1535411413: 1, 462368655: 1, -366234124: 1, 438831090: 1, 704811896: 1, -377152113: 1, -1305763770: 1, -1408192729: 1, 1476922744: 1, 1802917067: 1, 1620370207: 1, 348510874: 1, 719386162: 1, -1778119712: 1, -156364477: 1, 624572654: 1, -1706477502: 1, -1500828014: 1, 394787983: 1, 361305380: 1, 853456830: 1, 57204667: 1, -742677904: 1, -1990743135: 1, -997379918: 1, -1778230216: 1, -1323157068: 1, -1812111617: 1})
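For reference, counts like the ones above can be computed from a db-out export roughly as follows. This is a minimal sketch: the helper names count_by_key and under_annotated are my own, the sample lines are made-up stand-ins for exported records, and the file name in the comment is a placeholder for wherever the db-out output was saved.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_key(lines: Iterable[str], key: str = "_task_hash") -> Counter:
    """Count how many exported annotations share the same value for `key`."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        counts[json.loads(line)[key]] += 1
    return counts


def under_annotated(counts: Counter, target: int = 5) -> dict:
    """Return the hashes that received fewer than `target` annotations."""
    return {h: n for h, n in counts.items() if n < target}


# Made-up stand-in records; real ones come from the db-out export.
sample = [
    '{"text": "1", "_input_hash": 11, "_task_hash": 101}',
    '{"text": "1", "_input_hash": 11, "_task_hash": 101}',
    '{"text": "2", "_input_hash": 22, "_task_hash": 202}',
]
counts = count_by_key(sample)
print(counts)
print(under_annotated(counts, target=2))

# With a real export (placeholder file name):
# with open("annotations.jsonl", encoding="utf-8") as f:
#     counts = count_by_key(f)
```

Passing key="_input_hash" gives the per-input counts instead.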
The progress of the test_anno_per_task_5 dataset is as follows. Total should be 1000, but it is 375.
============================ Annotation Progress ============================

            New   Unique   Total   Unique
--------    ---   ------   -----   ------
Aug 2023    375      200     375      200
Is there something wrong with the command? For example, could any of the config settings I added be interfering with annotations_per_task?
Version information is as follows.
============================== ✨ Prodigy Stats ==============================
Version 1.13.0
Location /data/{user_name}/proofread-data/proofread_annotation/.v_proofread_annotation/lib/python3.9/site-packages/prodigy
Prodigy Home /home/{user_name}/.prodigy
Platform Linux-5.4.0-1093-aws-x86_64-with-glibc2.27
Python Version 3.9.15
Spacy Version 3.5.4
Database Name SQLite
Database Id sqlite
Total Datasets 422
Total Sessions 9927