We're trying to stand up multiple instances of Prodigy at once behind a single URL so that people can annotate multiple datasets without needing to restart Prodigy. Our approach is to run multiple Docker containers on one network, each running a different Prodigy ner.make-gold command, and to reach all of them through an nginx reverse proxy that maps host.com/dataset-name/ to each instance.
The nginx configuration for this is simple, but it depends on Prodigy supporting a url_prefix argument passed to waitress.serve(), and on the frontend being aware of the prefix and using it for its REST calls.
Is this something you could support? Alternatively, if you have any better ideas for how to accomplish what we're doing, I'd love to hear them.
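To make the server-side half of the request concrete, here is a minimal sketch of what a URL prefix does at the WSGI level: the prefix is moved from PATH_INFO to SCRIPT_NAME before the app sees the request. This is an illustration using only the standard library, not Prodigy's or waitress's actual code; `with_url_prefix`, `demo_app`, `call`, and the `/api/example` path are hypothetical names made up for the example. It does not address the frontend half, which would still need to build its REST URLs with the prefix.

```python
from wsgiref.util import setup_testing_defaults


def with_url_prefix(app, prefix):
    """Wrap a WSGI app so it can be mounted under `prefix` (e.g. "/dataset-a")."""
    cleaned = prefix.strip("/")
    if not cleaned:
        # No prefix requested: serve the app unchanged.
        return app
    prefix = "/" + cleaned

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path == prefix or path.startswith(prefix + "/"):
            # Move the prefix from PATH_INFO to SCRIPT_NAME so the app
            # routes on the remaining path only.
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
            environ["PATH_INFO"] = path[len(prefix):] or "/"
        return app(environ, start_response)

    return wrapped


def demo_app(environ, start_response):
    """Hypothetical stand-in app: echoes the routing variables it received."""
    body = f"{environ['SCRIPT_NAME']}|{environ['PATH_INFO']}".encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def call(app, path):
    """Invoke a WSGI app in-process for `path` and return the body as text."""
    environ = {}
    setup_testing_defaults(environ)
    environ["SCRIPT_NAME"] = ""
    environ["PATH_INFO"] = path
    chunks = app(environ, lambda status, headers: None)
    return b"".join(chunks).decode()


if __name__ == "__main__":
    app = with_url_prefix(demo_app, "/dataset-a/")
    print(call(app, "/dataset-a/api/example"))
```

With a wrapper like this, nginx could forward host.com/dataset-a/... untouched and each instance would strip its own prefix. The remaining problem is the one raised above: the frontend has to know the prefix too.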
We actually provide the source for the server app within Prodigy, so if worse comes to worst you could make the changes you need in app.py -- have a look within your Prodigy installation. Unfortunately, there's no easy way to make the front-end aware of the prefix without recompiling the app.
I've definitely struggled through this before as well, so I feel the pain. What I've ended up concluding is that path-based routing is just more trouble than it's worth, and you're much better off just using subdomains. Apparently Apache has some features that allow link rewriting, so that the path-based routing will work as expected, but I usually use Nginx, and I didn't manage to get it to work.
Yeah, we had patched the functionality into app.py and the JS bundle (lots of fun doing that one), but decided it would be too much friction to support that through Prodigy updates.
In the meantime we're using subdomains through Apache (the Nginx config to do it is actually much simpler and performs better than the Apache version, but Apache has better open-source support for OpenID authentication without having to pay for Nginx Plus). The drawbacks are that it's a bigger pain to manage different subdomains rather than a single domain, and our session logins for one model won't persist across subdomains without far more configuration.
Ah damn. Sorry you've had to go through so much yak-shaving. I would agree that patching the bundle and the app.py probably isn't the approach you want going forward.
I initially found managing subdomains annoying as well, but after I set up wildcard rules for the DNS forwarding and wildcard SSL certificates it wasn't too bad.
The other web server we often recommend to people is Traefik. We're using this for Prodigy Scale and it makes the routing really easy, because it integrates with the service discovery -- so as soon as the new service is published, it's routable on the subdomain for that service name. But I don't think it has OpenID support...
Not a problem, all part of the process. Was fun to dig around a bit more into the Prodigy source, anyway.
Traefik looks cool! We use Kong for our Kubernetes service mesh / API gateway, but have considered moving over to Traefik. Sadly our Kubernetes cluster is not quite ready for production, so this is being handled outside that.
Wildcarded SSL & DNS rules are actually the reason that the Apache configuration is relatively straightforward, so that's not my concern. I meant more that it's a bigger pain to coordinate our annotators onto the proper URLs. We're doing subdomains by dataset name, so we have to tell them to log in to dataset-b.example.com and so on (and update them whenever new datasets are added), rather than say "log in to annotation.example.com and click the link for the datasets you're knowledgeable in."
Do appreciate the suggestions, though, and glad to hear what we're doing is not so far off from what you've done now and in the past.
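One possible way to soften the coordination problem while staying on subdomains is a small landing page that lists a link per dataset. The sketch below only generates the HTML list; it assumes one subdomain per dataset name under example.com, and `dataset_url` and `render_index` are hypothetical helpers, not part of Prodigy. It does not solve the session-login issue mentioned earlier in the thread.

```python
from html import escape


def dataset_url(name, base_domain="example.com", scheme="https"):
    """Build the subdomain URL for one dataset, e.g. https://dataset-b.example.com/."""
    return f"{scheme}://{name}.{base_domain}/"


def render_index(datasets, base_domain="example.com"):
    """Render an HTML list with one link per dataset, sorted by name."""
    items = "\n".join(
        f'  <li><a href="{escape(dataset_url(name, base_domain))}">{escape(name)}</a></li>'
        for name in sorted(datasets)
    )
    return f"<ul>\n{items}\n</ul>"


if __name__ == "__main__":
    # Dataset names here are placeholders; in practice they would come from
    # wherever the list of running instances is kept.
    print(render_index(["dataset-b", "dataset-a"]))
```

Served from something like annotation.example.com, a page of this kind would only need regenerating when a dataset is added, instead of sending annotators a new URL each time.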