It is very easy to have multiple annotators on one recipe. But it's almost too easy?
When we pass the Prodigy server link to our annotators, it is very easy to make mistakes (typos, negligence, etc.) in the session names (/?session=mathew).
Is it possible to restrict the names (say through a hook) when creating a new dataset with the session name?
The auto_create=False option doesn't work for new session names.
Yes, I definitely know what you mean. At the moment, the app is completely agnostic to what any of the inputs "mean", including the dataset names but also the sessions or even the data. All validation happens on the server. This is pretty much by design.
The named sessions are a bit special in that way, because they are passed to the app, and then passed back to Prodigy via the /give_answers endpoint. You can't really validate it there because it means you'd only get an error after submitting your first batch and pretty much lose all your progress. And in any case, we can only return custom error codes to the app and can't ever raise server-side errors with named sessions because it'd kill the entire process.
So I think the best option that's consistent with the current architecture would be to validate on the server when we request the dataset metadata. The global or recipe config could then expose a list of allowed names (or maybe even optional regular expressions?). If the session doesn't match, the server could return a specific error code so the app can show a message like: "Invalid session ID".
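To make the idea concrete, here is a rough sketch of what such a check could look like. This is not Prodigy's implementation: the function name validate_session, the INVALID_SESSION code and both parameters are made up for illustration. It only shows the allowlist-plus-regex logic, returning an error code instead of raising.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical error code the server could send back to the app.
# This is an illustration only, not part of Prodigy's API.
INVALID_SESSION = "INVALID_SESSION"


def validate_session(
    session_id: str,
    allowed_names: Iterable[str] = (),
    allowed_patterns: Iterable[str] = (),
) -> Tuple[bool, Optional[str]]:
    """Check a session name against exact names and optional regexes.

    Returns (ok, error_code). Nothing is raised, so a bad session name
    cannot take down the server process.
    """
    names = set(allowed_names)
    patterns = list(allowed_patterns)
    # No restrictions configured: accept any session name.
    if not names and not patterns:
        return True, None
    if session_id in names:
        return True, None
    # fullmatch so that "mathew2" does not pass a pattern like "mathew".
    if any(re.fullmatch(pattern, session_id) for pattern in patterns):
        return True, None
    return False, INVALID_SESSION


if __name__ == "__main__":
    print(validate_session("mathew", allowed_names=["mathew", "anna"]))
    print(validate_session("matthew", allowed_names=["mathew", "anna"]))
    print(validate_session("annotator_7", allowed_patterns=[r"annotator_\d+"]))
```

Running the check when the dataset metadata is requested means a typo like /?session=matthew would be rejected before the annotator submits anything.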
Quick update on this: as of v1.8, Prodigy supports a PRODIGY_ALLOWED_SESSIONS environment variable that defines a list of allowed session names.
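One way to avoid typos altogether is to generate both the variable's value and the annotator links from the same list. The sketch below assumes the variable takes a comma-separated string (worth checking against the docs for your version) and that the server runs at http://localhost:8080. The annotator names and both helper functions are made up for illustration.

```python
from typing import List
from urllib.parse import urlencode

# Hypothetical annotator names and server address; adjust to your setup.
ANNOTATORS = ["mathew", "anna", "li"]
BASE_URL = "http://localhost:8080"


def allowed_sessions_value(names: List[str]) -> str:
    """Build the value for PRODIGY_ALLOWED_SESSIONS.

    Assumes a comma-separated list of names without spaces.
    """
    return ",".join(names)


def session_link(base_url: str, name: str) -> str:
    """Build the link to hand to one annotator, e.g. .../?session=mathew."""
    return f"{base_url}/?{urlencode({'session': name})}"


if __name__ == "__main__":
    # Value to export before starting the server.
    print(f"PRODIGY_ALLOWED_SESSIONS={allowed_sessions_value(ANNOTATORS)}")
    # One ready-made link per annotator, so nobody types the name by hand.
    for annotator in ANNOTATORS:
        print(session_link(BASE_URL, annotator))
```

Because the links and the allowed list come from one source, a name that is not in the list never ends up in a link.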