Hello,
We're having a couple of different issues with our app.
The most pressing one is SSO. We want to integrate with Microsoft Entra, and the integration itself works fine. The issue is that the session ID provided is a non-human-readable hash value, and we'd like to use the username instead. Is there a way to change this within a custom recipe? Our use case requires that an administrator be able to pre-assign tasks within Prodigy, which at the moment means we need a username the admin can easily type into the metadata fields we're providing.
The other issue is with a dynamic stream. We would like to pull in new tasks at runtime, even after the stream is empty. The biggest problem is that our users might upload as few as one example at a time. Because of the default batch size of 10, the stream will not pull anything in until at least 10 examples are available. I changed the batch size to 1, which let us fetch one example at a time. The new problem is the auto-save feature: even with instant submit set to false, answers are still saved to the database as soon as they're received. Our requirements state that users should be able to review their answers before saving, which becomes impossible if we're running an infinite loop that pulls in as few as one example at a time.
Any help would be appreciated.
Hi @kwaddle,
Apologies for a slightly delayed response!
Re session ID
OIDC vendors, including Microsoft Entra, provide a /userinfo endpoint that maps the session-ID hash to (at minimum) the user's email. Currently, Prodigy does not provide a function to securely call this endpoint from the recipe level, but you could work around that by generating the mapping elsewhere and incorporating it into the task-routing logic. Alternatively, if you need it to be dynamic (e.g. annotators change often or you regularly onboard new ones), you could implement the call to Entra's /userinfo endpoint directly from the recipe.
Then, once the mapping is available, inside a custom task router you can consult the current session IDs (an attribute of the Controller object), look up the email each one corresponds to, and match the question based on the email value. Do reach out if you need assistance implementing this kind of task router (assuming the mapping can be made available).
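For illustration, here's a minimal sketch of such a task router, assuming you've already built the session-ID-to-email mapping. The `SESSION_TO_EMAIL` dict and the `assigned_to` meta field are hypothetical names for this example, not Prodigy built-ins, and the sketch is untested:

```python
from typing import List

# Hypothetical mapping you'd build yourself, e.g. from Entra's /userinfo
# endpoint or a static file maintained by the administrator.
SESSION_TO_EMAIL = {
    "a1b2c3d4": "alice@example.com",  # opaque SSO session-ID hash -> email
    "e5f6a7b8": "bob@example.com",
}

def email_task_router(ctrl, session_id: str, item: dict) -> List[str]:
    """Route `item` to this session only if the task's pre-assigned
    annotator email (in meta["assigned_to"]) matches the session's email.
    Returning [] skips the task for this session; unassigned tasks go to
    whichever session asks for them."""
    email = SESSION_TO_EMAIL.get(session_id)
    assigned = item.get("meta", {}).get("assigned_to")
    if assigned is None:
        return [session_id]  # no pre-assignment: anyone can take it
    return [session_id] if email == assigned else []
```

In a custom recipe you'd return this function under the `"task_router"` key of the components dict.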
Re dynamic stream
It's true that once an example has been sent to the server it is impossible to edit it without going back to the DB. The example must be present in the browser's cache to enable the "undo" logic. In Prodigy, examples are sent to the server immediately if `instant_submit` is set to `True`, or once a batch of answers has been collected. With `batch_size=1`, this is equivalent to having `instant_submit` set to `True`.
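For reference, both settings can be set from the `config` dict a custom recipe returns (a hedged sketch; the dataset name and other components here are placeholders):

```python
def my_recipe_components(stream):
    """Sketch of the components dict a custom Prodigy recipe returns.
    Only the "config" entries are the point here."""
    return {
        "dataset": "my_dataset",      # hypothetical dataset name
        "stream": stream,
        "view_id": "text",
        "config": {
            "batch_size": 10,         # upper limit per /get_questions call
            "instant_submit": False,  # keep answers in the browser until a batch is full
        },
    }
```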
In any case, you shouldn't need `instant_submit` or `batch_size=1` to meet the requirement you describe. The pulling logic does not wait until a full batch of input examples becomes available: on each call to the `/get_questions` endpoint, the session tries to queue as many examples as possible given the task-routing constraints, with `batch_size` as the upper (not the lower!) limit. In other words, if questions become available in the input stream, annotators should be able to pull them by refreshing the tab. The refresh matters because it triggers an active call to the `/get_questions` endpoint.
If that is not what you're observing, we should probably look at how your loader is implemented.
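If it helps, here's a minimal sketch of a loader that keeps polling for new input instead of ending when the source is empty. The `fetch` callable is hypothetical and stands in for however your app receives uploads (a directory watch, a queue, a DB query):

```python
import time
from typing import Callable, Dict, Iterator, List

def infinite_stream(
    fetch: Callable[[], List[Dict]],
    poll_interval: float = 2.0,
) -> Iterator[Dict]:
    """Yield examples forever: emit whatever `fetch()` returns, then
    wait `poll_interval` seconds before polling again. Because the
    generator never raises StopIteration, /get_questions can keep
    serving new uploads as they arrive."""
    while True:
        for eg in fetch():
            yield eg
        time.sleep(poll_interval)  # back off before polling again
```

You'd pass the resulting generator as the recipe's `"stream"` component.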
Thanks so much for the feedback!
I love your solution to the SSO issue! However, we've concluded that implementing it would push our delivery date further past the end of the sprint than we'd like, and it duplicates functionality we had planned for a future release. So we've taken on the tech debt of manually setting session names, rather than the worse tech debt of spending extra time on something that would later need to be replaced.
Ah, OK, I think the issue was that I was not refreshing. The behavior I saw was an infinite loop looking for more examples that never returned anything when I only uploaded one or two with the default batch size. We've also shifted requirements on this, and are now planning to emulate an infinite stream with some kind of orchestrator or factory that spins up new instances as needed.
Thanks so much for your response!