Task routers - Problem in JSONL output/Annotators Problem

I used the task routers to annotate a sample of 30 chunks with the ner.manual recipe.

Following the task routers documentation, I first created data.jsonl with 30 chunks and then updated prodigy.json with annotation_per_task set to 3.

My annotator pool has 5 annotators, and per the prodigy.json config file at least 3 annotators should see each chunk. This is a partial-overlap setup.

Only 2 of the 5 annotators were available at the time. user1, who started first, got 30 chunks, and user2, who started about 10 minutes after user1, got 9 chunks. My questions are:

  1. Once annotation was done, I ran prodigy db-out <dataset> <directory>. The output contains 30 chunks, all annotated by user1 (identified by "_annotator_id":"group2_v3-user1") and none by user2. The 9 chunks that user2 handled are missing.
    Am I missing a step? Please comment on this.

  2. I assume that because user1 and user2 were the only ones active at that moment and finished the annotation, user3, user4 and user5 can no longer see any chunks. Is this what happens when we use task routing? Or will user3, user4 and user5 still see their dedicated chunks later, even though user1 and user2 have finished their allotted chunks?

prodigy.json

{
    "theme": "basic",
    "custom_theme": {},
    "batch_size": 10,
    "port": xxxx,
    "host": "xxx.yy.zzz.aaa",
    "cors": true,
    "db": "sqlite",
    "db_settings":{"sqlite": 
        {
            "name": "prodigy.db",
            "path": "path/to/my/prodigy"
        }},
    "api_keys": {},
    "validate": false,
    "auto_create": true,
    "auto_exclude_current": true,
    "instant_submit": false,
    "annotation_per_task": 3,
    "show_stats": true,
    "hide_meta": false,
    "show_flag": false,
    "instructions": false,
    "swipe": false,
    "split_sents_threshold":5000,
    "diff_style": "words",
    "html_template": false,
    "global_css": null,
    "javascript": null,
    "writing_dir": "ltr",
    "hide_true_newline_tokens": false,
    "ner_manual_require_click": false,
    "ner_manual_label_style": "list",
    "choice_style": "single",
    "choice_auto_accept": false,
    "darken_image": 0,
    "show_bounding_box_center": false,
    "preview_bounding_boxes": false,
    "shade_bounding_boxes": false
}

I hope this helps to answer my queries.

Thanks in advance
Vinoth Kumar S

Just to check, did you set the PRODIGY_ALLOWED_SESSIONS environment variable upfront so Prodigy is aware of all the annotators? If not, there's this caveat to be aware of.

It's also explained in detail in the docs.

That said, I may have spotted the issue in your configuration file. You seem to have configured annotation_per_task. The setting you want is annotations_per_task; note the extra "s" in "annotations". I would've expected the validation system to catch this though, so I'll do a deep dive to figure out what may have gone wrong there.

Could you confirm if the annotations_per_task setting fixes things? If not I'll gladly dive deeper into this, but I'd like to rule out that this issue is caused by a typo first.
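If it helps, a quick way to rule out typos like this is to compare the keys in prodigy.json against the setting names you expect. This is only a sketch and not part of Prodigy: find_suspect_keys is a hypothetical helper, and EXPECTED_KEYS is an assumption that lists only a few settings mentioned in this thread, not Prodigy's full list.

```python
import difflib
import json

# Assumption: a few setting names from this thread, NOT Prodigy's complete list.
EXPECTED_KEYS = ["allow_work_stealing", "annotations_per_task", "batch_size"]


def find_suspect_keys(config, expected=EXPECTED_KEYS):
    """Map each unexpected key to the expected key it closely resembles, if any."""
    suspects = {}
    for key in config:
        if key in expected:
            continue
        # A high cutoff so only near-identical names are flagged.
        close = difflib.get_close_matches(key, expected, n=1, cutoff=0.9)
        if close:
            suspects[key] = close[0]
    return suspects


if __name__ == "__main__":
    config = json.loads('{"batch_size": 10, "annotation_per_task": 3, "theme": "basic"}')
    for wrong, right in find_suspect_keys(config).items():
        print(f"'{wrong}' looks like a typo of '{right}'")
```

Keys that are simply not in the expected list (such as "theme" here) are ignored; only near-misses are reported.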

Hi Vincent,

  1. I always set PRODIGY_ALLOWED_SESSIONS='user1,user2,user3,user4,user5' so that Prodigy is aware of the sessions it will handle.

  2. I changed the setting to annotations_per_task (added the 's').

Before solving the issues: does the new task routing feature work with v1.11?

Total number of chunks: 30. Prodigy split the 30 chunks as follows:
user1 --> 18
user2 --> 10
user3 --> 7
user4 --> 0 (as he accessed the link only after the other users had finished)
user5 --> 15

Issues that came up after your suggestions:

  1. After running prodigy db-out, I checked the JSONL file and found only annotations from user1 and user5: 10 records for user1 and 20 for user5, which does not match the number of chunks user1 and user5 actually tagged.

Where are the chunks tagged by user2 and user3? And the remaining chunks tagged by user1?

  2. Four annotators tagged the data, so the output should contain more records than the actual data, i.e. more than 30.

Can you please help on this?

Thanks

Before diving into this further it might be good to highlight two mechanisms that might be at play here.

Mechanism 1: Hashing

As the video explains, we're using a hashing trick to allocate the examples consistently. In the long run, these hashes will cause an even distribution, but even then, not a perfect one. That's because the hashes that we use, while evenly distributed, can still be seen as "random". So this might explain some of the deviations that you're seeing here.
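To make the "consistent but not perfectly even" point concrete, here is a toy sketch of hash-based allocation. It is not Prodigy's actual router: the route function, the use of SHA-256 and the ranking scheme are all assumptions made purely for illustration.

```python
import hashlib
from collections import Counter

ANNOTATORS = ["user1", "user2", "user3", "user4", "user5"]


def route(text, annotators=ANNOTATORS, n=3):
    """Pick n annotators for a text by ranking them on a (text, annotator) hash."""
    def score(annotator):
        digest = hashlib.sha256(f"{text}|{annotator}".encode("utf8")).hexdigest()
        return int(digest, 16)

    # The same text always yields the same ranking, so routing is repeatable.
    return sorted(annotators, key=score)[:n]


if __name__ == "__main__":
    counts = Counter()
    for i in range(1, 31):  # 30 dummy texts, "1" to "30"
        counts.update(route(str(i)))
    # Each text gets exactly 3 annotators, so the counts sum to 90, but the
    # individual annotators will usually not end up with exactly 18 each.
    print(dict(counts))
```

The allocation depends only on the text and the annotator name, so it does not matter who logs in first. The flip side is that the per-annotator totals are only roughly balanced on small datasets.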

If you're interested in checking this, you should be able to turn on logging (set it to verbose) and see output appear whenever an allocation is made by one of our internal task routers.

Mechanism 2: Work Stealing

Prodigy also comes with a work-stealing mechanic, which is described at the bottom of the task routers docs.

To quote what is mentioned there:

Besides task routing, Prodigy also offers a work stealing mechanic that might also influence who will annotate the examples. Work stealing occurs whenever an annotator has an empty queue and can use the queue of another user to keep annotating.

Work stealing is a preventive mechanism to avoid the loss of records in a stream. There might be a situation where an annotator requests a batch of examples for annotation, effectively locking those examples, but never actually annotates them. Work stealing enables annotators who reach the end of a shared stream to annotate these locked examples.

So it could also be that work stealing is causing the imbalance. You can manually force it off by setting "allow_work_stealing" to false in prodigy.json.

I'll now dive into some of your questions.

Task routers were introduced in v1.12 and they are not available in earlier versions. Is there a reason why you need v1.11?

It's not entirely clear what you mean by "chunk" here, but I'm assuming that you're referring to an example that needs to get annotated.

Is it possible for you to run Prodigy one more time but with the verbose logging turned on? That might give us more of a window to see what is happening on your end. Theoretically it is possible that you have duplicates in your examples.jsonl file too, which would also result in fewer examples being annotated than you might expect. Another reason why some of the annotations might not show up is that some of the users forgot to hit "save" before signing off. The logs should help give us a hint.
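If you want to check for duplicates yourself, a small script along these lines could help. It is a sketch that assumes every row has a "text" key and that a duplicate means an exact text match; Prodigy's own hashing may take other keys into account. find_duplicate_texts is a hypothetical helper name.

```python
import json
from collections import Counter


def find_duplicate_texts(lines):
    """Return {text: count} for every "text" value that occurs more than once."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            counts[json.loads(line)["text"]] += 1
    return {text: n for text, n in counts.items() if n > 1}


if __name__ == "__main__":
    sample = ['{"text": "1"}', '{"text": "2"}', '{"text": "1"}']
    print(find_duplicate_texts(sample))
    # To check a real file, pass the open file handle instead:
    # with open("examples.jsonl", encoding="utf8") as f:
    #     print(find_duplicate_texts(f))
```

An empty result means no exact duplicates were found under that assumption.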

I just tried reproducing this locally, and while I wasn't able to find any issues, I figured that it couldn't hurt to share my findings and steps taken.

I started with this dataset.

{"text":"1"}
{"text":"2"}
{"text":"3"}
{"text":"4"}
{"text":"5"}
{"text":"6"}
{"text":"7"}
{"text":"8"}
{"text":"9"}
{"text":"10"}
{"text":"11"}
{"text":"12"}
{"text":"13"}
{"text":"14"}
{"text":"15"}
{"text":"16"}
{"text":"17"}
{"text":"18"}
{"text":"19"}
{"text":"20"}
{"text":"21"}
{"text":"22"}
{"text":"23"}
{"text":"24"}
{"text":"25"}
{"text":"26"}
{"text":"27"}
{"text":"28"}
{"text":"29"}
{"text":"30"}

It's really just a dummy dataset that has exactly 30 rows. This dataset was used by this recipe call:

PRODIGY_ALLOWED_SESSIONS="user1,user2,user3,user4,user5" PRODIGY_CONFIG_OVERRIDES='{"annotations_per_task": 3, "allow_work_stealing": false}' PRODIGY_LOGGING=verbose python -m prodigy ner.manual issue-6702 en_core_web_sm examples-30.jsonl --label number

Notice that I'm setting the sessions upfront, assigning 3 annotations per task and disallowing work stealing. I'm also turning on the verbose logs.

Next, I started a browser and opened five tabs, one for each user.

As I opened these tabs, I also saw logs appear. These are the logs from the first time the task router triggered, for user1.

14:38:57: ROUTER: Routing item with _input_hash=-2045454197 -> ['issue-6702-2-user4', 'issue-6702-2-user5', 'issue-6702-2-user2']
14:38:57: ROUTER: Routing item with _input_hash=-784123405 -> ['issue-6702-2-user1', 'issue-6702-2-user5', 'issue-6702-2-user4']
14:38:57: ROUTER: Routing item with _input_hash=-805513229 -> ['issue-6702-2-user2', 'issue-6702-2-user5', 'issue-6702-2-user3']
14:38:57: ROUTER: Routing item with _input_hash=-1835389134 -> ['issue-6702-2-user2', 'issue-6702-2-user4', 'issue-6702-2-user1']
14:38:57: ROUTER: Routing item with _input_hash=-1991218384 -> ['issue-6702-2-user2', 'issue-6702-2-user1', 'issue-6702-2-user5']
14:38:57: ROUTER: Routing item with _input_hash=-1639312927 -> ['issue-6702-2-user4', 'issue-6702-2-user2', 'issue-6702-2-user5']
14:38:57: ROUTER: Routing item with _input_hash=372028165 -> ['issue-6702-2-user1', 'issue-6702-2-user3', 'issue-6702-2-user4']
14:38:58: ROUTER: Routing item with _input_hash=2018066853 -> ['issue-6702-2-user4', 'issue-6702-2-user2', 'issue-6702-2-user1']
14:38:58: ROUTER: Routing item with _input_hash=42709192 -> ['issue-6702-2-user3', 'issue-6702-2-user1', 'issue-6702-2-user4']
14:38:58: ROUTER: Routing item with _input_hash=1254603855 -> ['issue-6702-2-user1', 'issue-6702-2-user5', 'issue-6702-2-user2']
14:38:58: ROUTER: Routing item with _input_hash=-1789291377 -> ['issue-6702-2-user4', 'issue-6702-2-user5', 'issue-6702-2-user1']
14:38:58: ROUTER: Routing item with _input_hash=1626462944 -> ['issue-6702-2-user5', 'issue-6702-2-user1', 'issue-6702-2-user4']
14:38:58: ROUTER: Routing item with _input_hash=855192632 -> ['issue-6702-2-user3', 'issue-6702-2-user1', 'issue-6702-2-user5']

You'll notice that it keeps polling until it has 10 examples for user1. That's why there are more than 10 lines. The tasks are distributed somewhat evenly, but not perfectly because of the hashing.
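If you collect the verbose logs on your end, you could tally the routing decisions with something like the sketch below. The regular expression is an assumption based only on the ROUTER lines shown above, and count_routed is a hypothetical helper, so the pattern may need adjusting if your log format differs.

```python
import ast
import re
from collections import Counter

# Pattern derived from the ROUTER log lines above; not an official log format.
ROUTER_LINE = re.compile(r"ROUTER: Routing item with _input_hash=(-?\d+) -> (\[.*\])")


def count_routed(log_lines):
    """Count how many routed items were assigned to each session name."""
    counts = Counter()
    for line in log_lines:
        match = ROUTER_LINE.search(line)
        if match:
            # The session list is printed as a Python list literal.
            counts.update(ast.literal_eval(match.group(2)))
    return counts


if __name__ == "__main__":
    logs = [
        "14:38:57: ROUTER: Routing item with _input_hash=-2045454197 -> ['issue-6702-2-user4', 'issue-6702-2-user5', 'issue-6702-2-user2']",
        "14:38:57: ROUTER: Routing item with _input_hash=-784123405 -> ['issue-6702-2-user1', 'issue-6702-2-user5', 'issue-6702-2-user4']",
    ]
    for session, n in sorted(count_routed(logs).items()):
        print(session, n)
```

Lines that do not match the pattern are skipped, so the full log output can be passed in unfiltered.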

I proceeded by annotating all of these examples, simply hitting accept everywhere. Then I stopped the recipe and called db-out, which resulted in the following output:

{"text":"2","_input_hash":-784123405,"_task_hash":985633209,"_is_binary":false,"tokens":[{"text":"2","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288501,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"4","_input_hash":-1835389134,"_task_hash":678744456,"_is_binary":false,"tokens":[{"text":"4","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288502,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"5","_input_hash":-1991218384,"_task_hash":-1581206528,"_is_binary":false,"tokens":[{"text":"5","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288502,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"7","_input_hash":372028165,"_task_hash":1401556973,"_is_binary":false,"tokens":[{"text":"7","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288502,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"8","_input_hash":2018066853,"_task_hash":-1154883125,"_is_binary":false,"tokens":[{"text":"8","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288502,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"9","_input_hash":42709192,"_task_hash":811274923,"_is_binary":false,"tokens":[{"text":"9","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288503,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"10","_input_hash":1254603855,"_task_hash":16089945,"_is_binary":false,"tokens":[{"text":"10","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288503,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"11","_input_hash":-1789291377,"_task_hash":-474789062,"_is_binary":false,"tokens":[{"text":"11","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288503,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"12","_input_hash":1626462944,"_task_hash":2018616092,"_is_binary":false,"tokens":[{"text":"12","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288503,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"13","_input_hash":855192632,"_task_hash":335429472,"_is_binary":false,"tokens":[{"text":"13","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288504,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"14","_input_hash":-423927864,"_task_hash":-311834425,"_is_binary":false,"tokens":[{"text":"14","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288504,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"15","_input_hash":-902457195,"_task_hash":185203310,"_is_binary":false,"tokens":[{"text":"15","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288504,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"17","_input_hash":933542244,"_task_hash":-1066193932,"_is_binary":false,"tokens":[{"text":"17","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288505,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"19","_input_hash":2035079888,"_task_hash":30789149,"_is_binary":false,"tokens":[{"text":"19","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288505,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"21","_input_hash":-1139507680,"_task_hash":-1258641755,"_is_binary":false,"tokens":[{"text":"21","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288505,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"22","_input_hash":1765252975,"_task_hash":-1221112846,"_is_binary":false,"tokens":[{"text":"22","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288505,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"25","_input_hash":230261373,"_task_hash":329569934,"_is_binary":false,"tokens":[{"text":"25","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288506,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"27","_input_hash":-738557658,"_task_hash":613896265,"_is_binary":false,"tokens":[{"text":"27","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288506,"_annotator_id":"issue-6702-user1","_session_id":"issue-6702-user1"}
{"text":"1","_input_hash":-2045454197,"_task_hash":1182163795,"_is_binary":false,"tokens":[{"text":"1","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288508,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"3","_input_hash":-805513229,"_task_hash":1162353785,"_is_binary":false,"tokens":[{"text":"3","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288509,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"4","_input_hash":-1835389134,"_task_hash":678744456,"_is_binary":false,"tokens":[{"text":"4","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288509,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"5","_input_hash":-1991218384,"_task_hash":-1581206528,"_is_binary":false,"tokens":[{"text":"5","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288509,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"6","_input_hash":-1639312927,"_task_hash":-777789492,"_is_binary":false,"tokens":[{"text":"6","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288510,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"8","_input_hash":2018066853,"_task_hash":-1154883125,"_is_binary":false,"tokens":[{"text":"8","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288510,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"10","_input_hash":1254603855,"_task_hash":16089945,"_is_binary":false,"tokens":[{"text":"10","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288510,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"14","_input_hash":-423927864,"_task_hash":-311834425,"_is_binary":false,"tokens":[{"text":"14","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288510,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"15","_input_hash":-902457195,"_task_hash":185203310,"_is_binary":false,"tokens":[{"text":"15","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288511,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"16","_input_hash":-343056218,"_task_hash":96950103,"_is_binary":false,"tokens":[{"text":"16","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288511,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"17","_input_hash":933542244,"_task_hash":-1066193932,"_is_binary":false,"tokens":[{"text":"17","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288511,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"18","_input_hash":-1213183927,"_task_hash":-556031871,"_is_binary":false,"tokens":[{"text":"18","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288511,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"20","_input_hash":-883828987,"_task_hash":-1493531608,"_is_binary":false,"tokens":[{"text":"20","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288512,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"21","_input_hash":-1139507680,"_task_hash":-1258641755,"_is_binary":false,"tokens":[{"text":"21","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288512,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"23","_input_hash":348472867,"_task_hash":-1683186191,"_is_binary":false,"tokens":[{"text":"23","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288512,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"24","_input_hash":1124906533,"_task_hash":1709963953,"_is_binary":false,"tokens":[{"text":"24","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288512,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"25","_input_hash":230261373,"_task_hash":329569934,"_is_binary":false,"tokens":[{"text":"25","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288513,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"28","_input_hash":-165836021,"_task_hash":-1541557956,"_is_binary":false,"tokens":[{"text":"28","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288513,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"29","_input_hash":-539359274,"_task_hash":1560417868,"_is_binary":false,"tokens":[{"text":"29","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288513,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"30","_input_hash":1814743553,"_task_hash":-550346988,"_is_binary":false,"tokens":[{"text":"30","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288514,"_annotator_id":"issue-6702-user2","_session_id":"issue-6702-user2"}
{"text":"3","_input_hash":-805513229,"_task_hash":1162353785,"_is_binary":false,"tokens":[{"text":"3","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288519,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"7","_input_hash":372028165,"_task_hash":1401556973,"_is_binary":false,"tokens":[{"text":"7","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288519,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"9","_input_hash":42709192,"_task_hash":811274923,"_is_binary":false,"tokens":[{"text":"9","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288520,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"13","_input_hash":855192632,"_task_hash":335429472,"_is_binary":false,"tokens":[{"text":"13","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288520,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"14","_input_hash":-423927864,"_task_hash":-311834425,"_is_binary":false,"tokens":[{"text":"14","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288520,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"15","_input_hash":-902457195,"_task_hash":185203310,"_is_binary":false,"tokens":[{"text":"15","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288520,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"16","_input_hash":-343056218,"_task_hash":96950103,"_is_binary":false,"tokens":[{"text":"16","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288521,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"22","_input_hash":1765252975,"_task_hash":-1221112846,"_is_binary":false,"tokens":[{"text":"22","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288521,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"23","_input_hash":348472867,"_task_hash":-1683186191,"_is_binary":false,"tokens":[{"text":"23","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288521,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"24","_input_hash":1124906533,"_task_hash":1709963953,"_is_binary":false,"tokens":[{"text":"24","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288521,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"26","_input_hash":1310182142,"_task_hash":266669984,"_is_binary":false,"tokens":[{"text":"26","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288522,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"27","_input_hash":-738557658,"_task_hash":613896265,"_is_binary":false,"tokens":[{"text":"27","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288522,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"29","_input_hash":-539359274,"_task_hash":1560417868,"_is_binary":false,"tokens":[{"text":"29","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288522,"_annotator_id":"issue-6702-user3","_session_id":"issue-6702-user3"}
{"text":"1","_input_hash":-2045454197,"_task_hash":1182163795,"_is_binary":false,"tokens":[{"text":"1","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288527,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"2","_input_hash":-784123405,"_task_hash":985633209,"_is_binary":false,"tokens":[{"text":"2","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288528,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"4","_input_hash":-1835389134,"_task_hash":678744456,"_is_binary":false,"tokens":[{"text":"4","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288528,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"6","_input_hash":-1639312927,"_task_hash":-777789492,"_is_binary":false,"tokens":[{"text":"6","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288528,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"7","_input_hash":372028165,"_task_hash":1401556973,"_is_binary":false,"tokens":[{"text":"7","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288528,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"8","_input_hash":2018066853,"_task_hash":-1154883125,"_is_binary":false,"tokens":[{"text":"8","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288529,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"9","_input_hash":42709192,"_task_hash":811274923,"_is_binary":false,"tokens":[{"text":"9","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288529,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"11","_input_hash":-1789291377,"_task_hash":-474789062,"_is_binary":false,"tokens":[{"text":"11","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288529,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"12","_input_hash":1626462944,"_task_hash":2018616092,"_is_binary":false,"tokens":[{"text":"12","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288530,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"16","_input_hash":-343056218,"_task_hash":96950103,"_is_binary":false,"tokens":[{"text":"16","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288530,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"18","_input_hash":-1213183927,"_task_hash":-556031871,"_is_binary":false,"tokens":[{"text":"18","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288530,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"19","_input_hash":2035079888,"_task_hash":30789149,"_is_binary":false,"tokens":[{"text":"19","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288530,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"20","_input_hash":-883828987,"_task_hash":-1493531608,"_is_binary":false,"tokens":[{"text":"20","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288531,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"24","_input_hash":1124906533,"_task_hash":1709963953,"_is_binary":false,"tokens":[{"text":"24","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288531,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"25","_input_hash":230261373,"_task_hash":329569934,"_is_binary":false,"tokens":[{"text":"25","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288531,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"26","_input_hash":1310182142,"_task_hash":266669984,"_is_binary":false,"tokens":[{"text":"26","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288531,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"27","_input_hash":-738557658,"_task_hash":613896265,"_is_binary":false,"tokens":[{"text":"27","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288532,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"28","_input_hash":-165836021,"_task_hash":-1541557956,"_is_binary":false,"tokens":[{"text":"28","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288532,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"29","_input_hash":-539359274,"_task_hash":1560417868,"_is_binary":false,"tokens":[{"text":"29","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288532,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"30","_input_hash":1814743553,"_task_hash":-550346988,"_is_binary":false,"tokens":[{"text":"30","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288532,"_annotator_id":"issue-6702-user4","_session_id":"issue-6702-user4"}
{"text":"1","_input_hash":-2045454197,"_task_hash":1182163795,"_is_binary":false,"tokens":[{"text":"1","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288535,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"2","_input_hash":-784123405,"_task_hash":985633209,"_is_binary":false,"tokens":[{"text":"2","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288536,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"3","_input_hash":-805513229,"_task_hash":1162353785,"_is_binary":false,"tokens":[{"text":"3","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288536,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"5","_input_hash":-1991218384,"_task_hash":-1581206528,"_is_binary":false,"tokens":[{"text":"5","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288536,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"6","_input_hash":-1639312927,"_task_hash":-777789492,"_is_binary":false,"tokens":[{"text":"6","start":0,"end":1,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288536,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"10","_input_hash":1254603855,"_task_hash":16089945,"_is_binary":false,"tokens":[{"text":"10","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288537,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"11","_input_hash":-1789291377,"_task_hash":-474789062,"_is_binary":false,"tokens":[{"text":"11","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288537,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"12","_input_hash":1626462944,"_task_hash":2018616092,"_is_binary":false,"tokens":[{"text":"12","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288537,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"13","_input_hash":855192632,"_task_hash":335429472,"_is_binary":false,"tokens":[{"text":"13","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288537,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"17","_input_hash":933542244,"_task_hash":-1066193932,"_is_binary":false,"tokens":[{"text":"17","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288538,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"18","_input_hash":-1213183927,"_task_hash":-556031871,"_is_binary":false,"tokens":[{"text":"18","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288538,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"19","_input_hash":2035079888,"_task_hash":30789149,"_is_binary":false,"tokens":[{"text":"19","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288538,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"20","_input_hash":-883828987,"_task_hash":-1493531608,"_is_binary":false,"tokens":[{"text":"20","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288539,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"21","_input_hash":-1139507680,"_task_hash":-1258641755,"_is_binary":false,"tokens":[{"text":"21","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288539,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"22","_input_hash":1765252975,"_task_hash":-1221112846,"_is_binary":false,"tokens":[{"text":"22","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288539,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"23","_input_hash":348472867,"_task_hash":-1683186191,"_is_binary":false,"tokens":[{"text":"23","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288539,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"26","_input_hash":1310182142,"_task_hash":266669984,"_is_binary":false,"tokens":[{"text":"26","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288540,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"28","_input_hash":-165836021,"_task_hash":-1541557956,"_is_binary":false,"tokens":[{"text":"28","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288540,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}
{"text":"30","_input_hash":1814743553,"_task_hash":-550346988,"_is_binary":false,"tokens":[{"text":"30","start":0,"end":2,"id":0,"ws":false}],"_view_id":"ner_manual","answer":"accept","_timestamp":1690288540,"_annotator_id":"issue-6702-user5","_session_id":"issue-6702-user5"}

It's 90 lines, which is exactly what I'd expect given that there are three annotations per example (3 x 30 = 90). Next, I'll perform a count on that dataset.

import polars as pl 

pl.read_ndjson("issue-6702.jsonl").groupby("_annotator_id").agg(pl.col("_input_hash").count())

This yields the following table.

┌──────────────────┬─────────────┐
│ _annotator_id    ┆ _input_hash │
│ ---              ┆ ---         │
│ str              ┆ u32         │
╞══════════════════╪═════════════╡
│ issue-6702-user4 ┆ 20          │
│ issue-6702-user3 ┆ 13          │
│ issue-6702-user1 ┆ 18          │
│ issue-6702-user2 ┆ 20          │
│ issue-6702-user5 ┆ 19          │
└──────────────────┴─────────────┘
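The other half of the expectation, that every example received exactly three annotations, could be checked in a similar way. This is a minimal sketch using only the standard library. The function name `annotations_per_input` is made up here, and it assumes the `db-out` export format shown above, with one JSON record per line.

```python
import json
from collections import Counter


def annotations_per_input(lines):
    """Count how many annotation records exist for each `_input_hash`."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        counts[record["_input_hash"]] += 1
    return counts


# Tiny inline sample in the same shape as the export (values are made up).
sample = [
    '{"text": "3", "_input_hash": 1, "_annotator_id": "demo-user1"}',
    '{"text": "3", "_input_hash": 1, "_annotator_id": "demo-user2"}',
    '{"text": "5", "_input_hash": 2, "_annotator_id": "demo-user1"}',
]
counts = annotations_per_input(sample)
print(counts)
```

On the real export, passing the open file object for `issue-6702.jsonl` to `annotations_per_input` and inspecting `set(counts.values())` should show only the value 3 if every example was annotated three times.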

The distribution isn't perfect, again because of the hashing, but it doesn't feel out of bounds. The reason why we use hashes here, instead of round-robin, has to do with the consistent mapping. The hashing trick really guarantees that a specific hash is mapped to a specific user, even if the order of the stream were to change. With a round-robin approach the allocation might change after the server restarts and new data is added.
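The difference can be illustrated with a small sketch. This is not Prodigy's actual routing code: the functions `route_by_hash` and `route_round_robin` and the rendezvous-style selection rule are assumptions made up for illustration. It only demonstrates the property described above, namely that an assignment derived from the example's hash does not depend on the order of the stream, while a round-robin assignment does.

```python
import hashlib

ANNOTATORS = ["user1", "user2", "user3", "user4", "user5"]


def route_by_hash(input_hash, annotators, n=3):
    """Pick n annotators using only the example's hash (illustrative rule)."""
    def score(name):
        key = f"{input_hash}:{name}".encode("utf8")
        return int(hashlib.sha256(key).hexdigest(), 16)

    # Rank annotators by a per-(example, annotator) score and keep the first n.
    return sorted(annotators, key=score)[:n]


def route_round_robin(stream, annotators, n=3):
    """Assign n annotators per example based on its position in the stream."""
    routes = {}
    for i, input_hash in enumerate(stream):
        routes[input_hash] = [
            annotators[(i * n + k) % len(annotators)] for k in range(n)
        ]
    return routes


# A few input hashes taken from the export above, in two different orders.
stream = [-805513229, -1991218384, -1639312927, 1254603855]
reordered = list(reversed(stream))

hash_a = {h: route_by_hash(h, ANNOTATORS) for h in stream}
hash_b = {h: route_by_hash(h, ANNOTATORS) for h in reordered}
print(hash_a == hash_b)  # True: the order of the stream does not matter

rr_a = route_round_robin(stream, ANNOTATORS)
rr_b = route_round_robin(reordered, ANNOTATORS)
print(rr_a == rr_b)  # False: the same example lands on different annotators
```

The trade-off is the one visible in the table: a hash-based rule balances the load only approximately, but it stays stable across restarts and stream changes.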

Out of curiosity, if you were to follow these steps, do you see something different?

Hi Vincent,

Thanks for the replies. I am still working on this. Initially I tried task routers with v1.11, which is why I ran into the problem.

Today we purchased the current version (1.12) and installed it successfully on my machine.

Using the same config file prodigy.json with the parameter "annotations_per_task": 3, I ran the command PRODIGY_ALLOWED_SESSIONS="user1,user2,user3,user4,user5" python -m prodigy ner.manual group4 blank:en data2.jsonl --label LABEL1,LABEL2

I am getting the error below:
'PRODIGY_ALLOWED_SESSIONS' is not recognized as an internal or external command, operable program or batch file.

So I added the path C:\Users\xxx\xxx\xxx\Python\Python311\site-packages\prodigy to my environment variables, but I still get the same error.

If I remove "annotations_per_task" from the config file and run the simple command python -m prodigy ner.manual sample1 blank:en data2.jsonl --label LABEL1,LABEL2, it works and the server is hosted.

The problem only occurs with PRODIGY_ALLOWED_SESSIONS and with other environment variables, such as export PRODIGY_HOME, export PRODIGY_CONFIG and so on.

I also tried to set the environment variable manually in the CMD prompt:

conda env config vars set my_var=PRODIGY_ALLOWED_SESSIONS
reactivated the environment: conda activate test-env
conda env config vars list
output >> my_var = PRODIGY_ALLOWED_SESSIONS

I am still facing the error: 'PRODIGY_ALLOWED_SESSIONS' is not recognized as an internal or external command, operable program or batch file.

Thanks

It seems that you're running Windows, so it might be that your shell expects environment variables to be set differently. I found this document, but I can't give too much advice here since I am not a Windows user.

Alternatively, you might also configure a .env file and use python-dotenv to force the variables that way. If you go down this route, the command may look something like:

# Assuming there's a `.env` file in the folder
dotenv run -- python -m prodigy ner.manual <arguments ...>
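For what it's worth, the `VAR=value command` prefix is POSIX shell syntax, which would explain why the Windows command prompt treats `PRODIGY_ALLOWED_SESSIONS` as a command name. A shell-independent option might be to set the variable from Python and start Prodigy as a subprocess. The sketch below reuses the dataset, file and label names from the command earlier in this thread; the helper names `build_env`, `build_command` and `launch` are made up for illustration.

```python
import os
import subprocess
import sys


def build_env(sessions, base_env=None):
    """Return a copy of the environment with PRODIGY_ALLOWED_SESSIONS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PRODIGY_ALLOWED_SESSIONS"] = ",".join(sessions)
    return env


def build_command(dataset, source, labels):
    """Build the same ner.manual command used earlier in this thread."""
    return [
        sys.executable, "-m", "prodigy", "ner.manual",
        dataset, "blank:en", source, "--label", ",".join(labels),
    ]


def launch(cmd, env):
    """Start the Prodigy server with the prepared environment."""
    subprocess.run(cmd, env=env, check=True)


env = build_env(["user1", "user2", "user3", "user4", "user5"])
cmd = build_command("group4", "data2.jsonl", ["LABEL1", "LABEL2"])

# Show what would be executed; call launch(cmd, env) to start the server.
print(env["PRODIGY_ALLOWED_SESSIONS"])
print(" ".join(cmd[1:]))
```

Calling `launch(cmd, env)` at the end of the script starts the server. Because the variable is passed through the `env` argument of `subprocess.run`, this should behave the same from CMD, PowerShell or a Linux terminal.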

Hi,

I successfully hosted it from a Linux terminal, but Windows is still a big question. I tried every option I could find for setting environment variables on Windows, with no improvement. Why is such a long process needed to set environment variables when none of the solutions works? I am still exploring this myself.

If anyone has a solution, please post it here.

Thanks