User authentication for Prodigy web app

I have some JavaScript I’d like to inject into the Prodigy theme to add custom authentication that controls who has access to annotate. I’m having trouble working out from the documentation the best way to do this. I’m planning on using the textcat.teach recipe. Thanks!

In theory, you can inject your own scripts via custom HTML templates or even by modifying static/index.html. However, this only really works for things like analytics integration, not for logic that’s supposed to interact with the standalone, pre-compiled React application.

Instead, a better (and possibly more secure) solution would probably be to handle this on the server side, and only start up and serve the Prodigy app if the user is already authenticated. My comment in this thread describes a solution to a very similar question. This gives you maximum flexibility over how you present the annotator login. And with a custom recipe (e.g. one that wraps the built-in textcat.teach), you can easily put your own service in between Prodigy and the annotator(s), and even manage the annotation tasks that get sent out on a per-user basis. If you search the forum for “multiple annotators”, you’ll find some more examples of this.
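
To make the per-user idea concrete, here is a minimal sketch of splitting one stream of tasks between annotators. It assumes each task is a dict with a "text" key and that you know the annotator names up front; the function name tasks_for_annotator is made up for illustration and is not part of Prodigy. Wiring this into a custom recipe is left out, since that depends on your setup.

```python
import hashlib


def tasks_for_annotator(stream, annotator, annotators):
    """Yield only the tasks from `stream` that belong to `annotator`.

    Ownership is decided by a stable hash of the task text, so every task
    goes to exactly one annotator and the split is the same on every run.
    """
    ordered = sorted(annotators)  # fixed order so the assignment is stable
    for task in stream:
        digest = hashlib.sha256(task["text"].encode("utf-8")).digest()
        owner = ordered[int.from_bytes(digest[:8], "big") % len(ordered)]
        if owner == annotator:
            yield task


# Hypothetical usage: each logged-in user only sees their own share.
# alice_stream = tasks_for_annotator(all_tasks, "alice", ["alice", "bob"])
```

Hashing the text rather than counting tasks means the assignment doesn't change if the server restarts or the stream is re-read from the start.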

There are also some pretty cool solutions contributed by other users – for example a snippet for basic authentication and a script for a bare-bones multi-annotator setup by @andy.
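
Those contributed snippets aren't reproduced here, but the server-side check itself is small. A minimal sketch, assuming HTTP Basic auth and a plain dict of usernames and passwords; check_basic_auth and the users dict are hypothetical names, not part of Prodigy or of the snippets mentioned above. A real deployment would store hashed passwords and serve everything over HTTPS.

```python
import base64
import hmac


def check_basic_auth(header, users):
    """Return the username if the Authorization header is valid, else None.

    `header` is the raw header value, e.g. "Basic YWxpY2U6czNjcmV0".
    `users` maps usernames to passwords (an assumption for this sketch).
    """
    if not header or not header.startswith("Basic "):
        return None
    try:
        raw = base64.b64decode(header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return None
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    expected = users.get(username)
    if expected is None:
        return None
    # compare_digest avoids leaking how many characters matched
    if hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8")):
        return username
    return None
```

Whatever sits in front of Prodigy would run a check like this on each request and only pass the request on when it returns a username.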


In case anyone stumbles upon this before I get a chance to finish writing up our solution: we were able to set up an authentication app in Express.js that uses http-proxy-middleware to proxy requests to Prodigy after the user is authenticated. The Prodigy application only accepts incoming requests from the proxy application, which prevents unauthorized users from accessing the app. Excluding the authentication portions of the code, the only thing needed to proxy requests to Prodigy was:

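var express = require('express');
// http-proxy-middleware 0.x exports the factory function directly;
// 1.x and later export it as { createProxyMiddleware } instead.
var proxy = require('http-proxy-middleware');

var app = express();

var options = {
    target: 'http://prodigy:8080', // host and port where Prodigy is running
    pathRewrite: {
        '^/proxy': '/'     // strip the /proxy prefix before forwarding
    }
};
var prodigyProxy = proxy(options);
// The authentication middleware (not shown) is registered before these routes.
app.use('/proxy', prodigyProxy);
app.use('**', prodigyProxy); // catch-all so requests outside /proxy also reach Prodigy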

Thanks for sharing, very elegant! Looking forward to reading more about your solution and what you’ve built with Prodigy 😊

(I’ve renamed the thread btw to make it a little easier to find!)

Adding a comment just to bump this issue. The lack of even rudimentary authentication support seems a pretty big oversight. One shouldn’t have to roll their own authentication middleware proxy or hack app.py from the Prodigy source in order to have more options than ‘only a developer can train on their own machine’ and ‘your model is open for the entire world to train’.

@mlwelles We’ve left this open-ended so far because we’ve found most users have fairly specific requirements from their organisation about how things need to be secured. Since a lot of users have to plug in their own authentication anyway, we wanted to make sure we didn’t have anything in place that might take effort to undo.

Your models definitely shouldn’t be “open for the entire world to train”, though! Prodigy is a developer tool, so the normal usage is either the developer working themselves, or other people on the local network. I agree that a built-in solution for basic authentication would be a good addition.

I am currently struggling with a similar problem. I need to set up Prodigy on a cloud server and access it through a proxy. Can you elaborate a bit more on how you managed to do it? How did you configure the Prodigy application to only accept traffic from the proxy?

@henninglebbaeus It should really be exactly the same as hosting any Flask or Django app. The same goes for Rails or Node, I guess.

You pretty much always want to have these apps listening on localhost, on some port. Then you have an application like nginx, Apache, Traefik etc. proxying connections from the WAN to that port.

I like Traefik, as it’s a bit easier to configure. The following config should work:

# Write this to /etc/traefik/routes.toml
[frontends]
  [frontends.frontend1]
  backend = "backend1"
  [frontends.frontend1.routes.r1]
  rule = "Host:$ip" # Fill this in with the hostname or IP of the server

[backends]
  [backends.backend1]
    [backends.backend1.servers.server1]
    url = "http://127.0.0.1:8080"

Here’s an example service file. You’ll write this to /etc/systemd/system/traefik.service. Enable the service so it starts at boot with systemctl enable traefik, and start it right away with systemctl start traefik.

[Unit]
Description=The Traefik HTTP and reverse proxy server
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
ExecStart=/usr/bin/traefik --file --file.filename=/etc/traefik/routes.toml

[Install]
WantedBy=multi-user.target

@honnibal Thanks for your quick response.

However, I’m still struggling to get it running. Since we are operating on Google Cloud Platform, I installed Prodigy on one virtual machine. I set up an HTTP Load Balancer as a proxy that basically listens on a static IP and forwards traffic to the virtual machine on port 8080.

When I run a basic “hello world” Node server using Express on the VM, everything works smoothly. However, when I run the Prodigy server on that same port, it does not work.

Any ideas why it fails?

Hmm. What IP is express listening on? Perhaps you need to set the PRODIGY_HOST environment variable (or alternatively, set the "host" key in your ~/.prodigy/prodigy.json file).
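
For reference, the "host" key can be edited by hand or with a few lines of Python. A minimal sketch, assuming the config file is plain JSON at the path you pass in; the helper name set_prodigy_host is made up for illustration and is not part of Prodigy. Setting the PRODIGY_HOST environment variable before starting Prodigy is the other route.

```python
import json
from pathlib import Path


def set_prodigy_host(config_path, host):
    """Set the "host" key in a prodigy.json file, keeping all other settings.

    Creates the file (and its parent directory) if it does not exist yet.
    Returns the resulting config dict.
    """
    path = Path(config_path).expanduser()
    if path.exists():
        config = json.loads(path.read_text(encoding="utf-8"))
    else:
        config = {}
    config["host"] = host
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Hypothetical usage; the IP below is a placeholder for the machine's LAN address.
# set_prodigy_host("~/.prodigy/prodigy.json", "10.128.0.2")
```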

By default, Prodigy listens on localhost. I haven’t used the HTTP Load Balancer workflow, but I’m guessing it expects the service to be listening on the LAN IP. I’m not sure how it could work if the service were to listen only on localhost.

Note that if you set Prodigy to listen on a WAN IP, then the service will be accessible directly, which I expect will bypass any authentication you’re trying to set up.
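
A quick way to sanity-check a host value before using it is to classify the address. This is only a sketch using Python's standard ipaddress module; the function name bind_exposure and its labels are made up for illustration. It says nothing about firewall rules, which decide whether the port is actually reachable from outside.

```python
import ipaddress


def bind_exposure(host):
    """Roughly classify how widely a service bound to `host` is reachable."""
    if host == "localhost":
        return "loopback"
    addr = ipaddress.ip_address(host)  # raises ValueError for other hostnames
    if addr.is_unspecified:  # 0.0.0.0 or ::, i.e. every interface on the machine
        return "all-interfaces"
    if addr.is_loopback:  # only reachable from the same machine
        return "loopback"
    if addr.is_private:  # LAN ranges such as 10.0.0.0/8 or 192.168.0.0/16
        return "private"
    return "public"  # directly reachable unless a firewall blocks the port


for candidate in ["127.0.0.1", "0.0.0.0", "10.128.0.2"]:
    print(candidate, "->", bind_exposure(candidate))
```

Anything other than "loopback" means the authentication proxy can be bypassed unless a firewall restricts who may connect to the Prodigy port.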

Setting the host to “0.0.0.0” (any IP on the machine) solved it! Thanks a lot, @honnibal 👍
