I am currently running Prodigy for multiple people using separate instances. I use a generator that yields each task to feed the stream, like below (taken from "No tasks available" on page refresh):
def get_stream(stream):
    while True:
        for task in stream:
            yield task
My main problem is that the yield command sometimes “skips” tasks. For example, it won’t display the first task in Prodigy but it displays the second task instead. Please note that the example, dataset and link databases are all empty when this happens, so it is obviously not Prodigy auto-filtering based on what is in the example database.
I am still using an older version of Prodigy for this (Prodigy 1.6). Prodigy 1.7 returns “JWT: caller token did not validate” when I try to start it up; my guess is that it clashes with the NGINX server we are using to serve the Prodigy apps.
Yeah, I am not sure either. I have included a little snippet of the log file. It is a little long, but it basically shows that Prodigy is passing over the first task and only yielding the second task. It just doesn’t seem to log “RESPONSE: /get_questions (1 examples)” after my first task (see the line “ERROR: “RESPONSE: /get_questions not called” !”, which I added by hand to mark the spot). Is there any way to force the RESPONSE call? Are there any other filters in the Prodigy system apart from filtering on the example table?
21:44:24 - Cur Self: 6 (serving new task)
21:44:24 - Input/Task hashes in Example Table: {(-1107351659, -1355042854), (886294252, 1440874809), (-1875289351, 220242905), (-2012468421, -1982285274), (299792362, -109640114)}
21:44:24 - Input/Task hash of task to serve Prodigy: (526205998, -556761580)
21:44:24 - Multi-Annotator Logic: True
21:44:24 - Return to __iter__
21:44:24 - __iter__ says task is True
21:44:24 - ABOUT TO YIELD TASK
ERROR: "RESPONSE: /get_questions not called" !
21:44:26 - YIELDED TASK
21:44:28 - Cur Self: 7 (serving new task)
21:44:28 - Input/Task hashes in Example Table: {(-1107351659, -1355042854), (886294252, 1440874809), (-1875289351, 220242905), (-2012468421, -1982285274), (299792362, -109640114)}
21:44:28 - Input/Task hash of task to serve Prodigy: (-1265051456, 554531250)
21:44:28 - Multi-Annotator Logic: True
21:44:28 - Return to __iter__
21:44:28 - __iter__ says task is True
21:44:28 - ABOUT TO YIELD TASK
21:44:30 - RESPONSE: /get_questions (1 examples)
21:44:30 - POST: /give_answers (received 1)
21:44:30 - CONTROLLER: Receiving 1 answers
21:44:30 - DB: Getting dataset '2019-03-19_21-43-52'
21:44:30 - DB: Getting dataset 'multi-server'
21:44:30 - DB: Getting dataset '2019-03-19_21-43-52'
21:44:30 - DB: Added 1 examples to 2 datasets
21:44:30 - CONTROLLER: Added 1 answers to dataset 'multi-server' in database MySQL
21:44:30 - RESPONSE: /give_answers
Ah, I think I figured it out. It is skipping over annotations that are in the example table but are “ignored”. There were some ignored annotations in the example database at the time but I thought they didn’t count like “accepted” and “rejected” annotations. Is this the expected behavior? I think I made an assumption that ignored annotations can reappear later in Prodigy (like basically ‘ignore’ the annotation to do later).
Additionally, building on this, is there a way to prevent ignored annotations from being saved to the example/dataset/link table?
Thanks for looking into this – and no, this shouldn't be expected behaviour at all. Re-annotating datasets with answers is totally a supported and expected workflow, and the app will normally just override existing answers.
I'll need to investigate this, but off the top of my head, I can't think of anywhere this would be happening. (But if this is what's happening, at least we've found a pattern, which makes debugging a lot easier!) Is there anything particularly custom you're doing in your recipe?
I see, that is good to know that this is not the expected behavior.
If it helps you, I found this error while starting a Prodigy instance when the ignored annotations were in the database. Then I find that Prodigy skips over the ignored annotations that are already in the example database. But I am not sure about the case in which the Prodigy instance is already running and the ignored annotations are generated during the session - I’ll let you know when I test that scenario out.
I am doing my own annotator logic where I make sure that no annotator annotates a sentence twice and that each sentence is only annotated three times. But I don’t think my logic is interfering, as it is letting the right annotations through (i.e. you can see “Multi-Annotator Logic: True” in the log in my previous post, meaning that the sentence will be sent off to Prodigy).
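For readers following along, a check of this kind could look roughly like the sketch below. This is not the poster's actual code: the AnnotationTracker class, its method names and the in-memory bookkeeping are all hypothetical, and a real setup would presumably read these counts from the database.

```python
from collections import defaultdict

MAX_ANNOTATIONS = 3  # each sentence should be annotated at most three times


class AnnotationTracker:
    """Hypothetical in-memory bookkeeping of who annotated which sentence."""

    def __init__(self, max_per_task=MAX_ANNOTATIONS):
        self.max_per_task = max_per_task
        self.annotators_by_task = defaultdict(set)

    def should_serve(self, task_id, annotator):
        # Serve only if this annotator has not seen the sentence yet
        # and the sentence has not reached its annotation limit.
        seen_by = self.annotators_by_task.get(task_id, set())
        return annotator not in seen_by and len(seen_by) < self.max_per_task

    def record(self, task_id, annotator):
        self.annotators_by_task[task_id].add(annotator)


tracker = AnnotationTracker()
tracker.record("sentence-1", "alice")
tracker.record("sentence-1", "bob")
alice_again = tracker.should_serve("sentence-1", "alice")  # already annotated it
carol_first = tracker.should_serve("sentence-1", "carol")  # new annotator, limit not reached
tracker.record("sentence-1", "carol")
dave_late = tracker.should_serve("sentence-1", "dave")     # limit of three reached
```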
Yes, that'd definitely be interesting! Also, if the existing answer is the cause here, one quick fix should be to overwrite it or remove it from the task dict before sending it out. And if that doesn't help, that'd also be interesting, since it means something else would be going on, if I understand this correctly?
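A minimal sketch of that quick fix, assuming the stored decision lives under an "answer" key on each task dict. The strip_answers helper is hypothetical and only illustrates the idea; it would wrap whatever stream the recipe already builds.

```python
import copy


def strip_answers(stream):
    """Yield copies of each task without a previously saved "answer" key."""
    for task in stream:
        task = copy.deepcopy(task)  # avoid mutating the caller's dicts
        task.pop("answer", None)    # no-op if the task was never answered
        yield task


tasks = [
    {"text": "first sentence", "answer": "ignore"},
    {"text": "second sentence"},
]
cleaned = list(strip_answers(tasks))
```

The original dicts are left untouched, so the saved answers stay available for any other logic that still needs them.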
I tested the scenario out - it seems that Prodigy doesn’t load any ignored annotations that were already in the example table when the Prodigy server is started up. So in a way, the Prodigy ignore mechanism works perfectly if there are no ignored annotations in the example table - I tested this out and it works. But if there are some ignored annotations in the example table at the time that the prodigy server is started up, Prodigy skips over the ones that were already in the example table - subsequent ignored annotations seem to cycle around as is the expected behavior.
Thanks for the fix for the JWT - I’ll look into it
Ahh okay – but technically, this should happen regardless of the annotation decision (whether it's ignore, accept or reject). The exclude mechanism works based on the hashes, and they do not take the answer into account.
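To illustrate why the decision doesn't matter here, a toy sketch. It assumes nothing about how Prodigy actually computes its hashes: toy_task_hash is a made-up stand-in that simply leaves the "answer" key out before hashing.

```python
import hashlib
import json


def toy_task_hash(task, ignore=("answer",)):
    """Hash a task dict while leaving out the keys listed in `ignore`."""
    content = {k: v for k, v in task.items() if k not in ignore}
    payload = json.dumps(content, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


base = {"text": "A sentence to annotate", "label": "PERSON"}
accepted = dict(base, answer="accept")
ignored = dict(base, answer="ignore")

# The answer is not part of the hash, so both decisions hash identically.
same_hash = toy_task_hash(accepted) == toy_task_hash(ignored)

# An exclude set built from an ignored example also matches the fresh task.
excluded = {toy_task_hash(ignored)}
would_skip = toy_task_hash(base) in excluded
```

Under this kind of scheme, an example that was ignored earlier is excluded on startup just like an accepted or rejected one.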
For your use case, it might make sense to set "auto_exclude_current" to false in your config. With this setting, Prodigy will not automatically exclude the examples in the current dataset on startup, which lets you manage the exclude mechanism yourself and at runtime.
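A rough sketch of what managing this yourself could look like. Only the "auto_exclude_current" setting comes from the suggestion above; the filter_seen helper, the seen set and the use of a "_task_hash" key on each task are assumptions made for illustration.

```python
# Config override as described above; it would go in the recipe's config
# or in prodigy.json (written there as JSON, i.e. false instead of False).
config = {"auto_exclude_current": False}


def filter_seen(stream, seen_hashes, key="_task_hash"):
    """Skip tasks whose hash is in `seen_hashes`; the set is checked lazily,
    so hashes added while the server is running take effect immediately."""
    for task in stream:
        if task.get(key) in seen_hashes:
            continue
        yield task


seen = {111}  # hypothetical hashes of examples that should not be served again
stream = [
    {"text": "already handled", "_task_hash": 111},
    {"text": "still open", "_task_hash": 222},
]
remaining = list(filter_seen(stream, seen))

seen.add(222)  # e.g. after an answer for this task has been saved
remaining_later = list(filter_seen(stream, seen))
```

Since the set is under your control, ignored examples can simply be left out of it so that they come around again later.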