Image Classification Consensus

Thanks for the kind words, and I'm glad the concept of the tool resonates with you! We hate having to "program" in YAML or by clicking through a web interface, so we wanted to make a tool where the scripting was front and centre.

Focusing on the deployment aspects first:

Prodigy will be exactly as easy to deploy this way as a "hello world" Flask app. Prodigy itself just hosts the REST API, including the single-page app. So whatever steps you'd normally do (e.g. add a reverse proxy, add a domain name with https, etc.) apply here as well.

The super hacky zero-setup alternative is to host from a local machine with the ngrok.com tool. See the thread "Tip: Making a local Prodigy service externally accessible with ngrok.com".

For the database, you can configure Prodigy to store to any SQL database by providing config either via environment variables or in the prodigy.json file. You can also configure it from the recipe instead. We use the Peewee ORM, so again it should be exactly the same sort of process as setting up DB connectivity for a normal app. The default is to just use SQLite. Actually, I think if you point the DB at a persistent disk, you could just use that. I think it'll be fine.
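As a rough sketch of the "SQLite on a persistent disk" option, the snippet below builds the database part of a prodigy.json. The "db" and "db_settings" key names are my recollection of the documented layout, so please check them against the docs for your version, and the path is a purely hypothetical mount point.

```python
import json

# Assumed layout of the database section of prodigy.json.
# "/mnt/persistent/prodigy" is a hypothetical mount point for a persistent disk.
config = {
    "db": "sqlite",
    "db_settings": {
        "sqlite": {
            "name": "prodigy.db",
            "path": "/mnt/persistent/prodigy",
        }
    },
}

# Serialise to the JSON text you would merge into prodigy.json.
config_text = json.dumps(config, indent=2)
print(config_text)
```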

You'll need to do the inter-annotator agreement stuff yourself, but otherwise I think everything should be basically built-in. You'll probably also end up with a couple of Python functions for custom recipes, just for convenience. The data format is just JSONL. Your use-case is very much the sort of thing we were thinking of, so I don't think you'll have much trouble.
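Since agreement is the part you'd write yourself, here is a minimal sketch of Cohen's kappa between two annotators over exported JSONL records. It assumes each record carries "_input_hash", "_session_id" and "answer" fields and that the task is a simple accept/reject decision. The helper names are made up for illustration, and a multiple-choice interface would need a different comparison.

```python
import json
from collections import Counter, defaultdict


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def cohen_kappa(records, session_a, session_b):
    """Cohen's kappa over the examples that both sessions annotated."""
    # Group answers by example (assumed "_input_hash"), then by session.
    by_example = defaultdict(dict)
    for rec in records:
        by_example[rec["_input_hash"]][rec["_session_id"]] = rec["answer"]
    pairs = [
        (answers[session_a], answers[session_b])
        for answers in by_example.values()
        if session_a in answers and session_b in answers
    ]
    if not pairs:
        raise ValueError("no examples annotated by both sessions")
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    counts_a = Counter(a for a, _ in pairs)
    counts_b = Counter(b for _, b in pairs)
    # Chance agreement from each annotator's label distribution.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Tiny in-memory example; real records would come from load_jsonl(...).
records = [
    {"_input_hash": 1, "_session_id": "imgs-alex", "answer": "accept"},
    {"_input_hash": 1, "_session_id": "imgs-jo", "answer": "accept"},
    {"_input_hash": 2, "_session_id": "imgs-alex", "answer": "accept"},
    {"_input_hash": 2, "_session_id": "imgs-jo", "answer": "reject"},
    {"_input_hash": 3, "_session_id": "imgs-alex", "answer": "reject"},
    {"_input_hash": 3, "_session_id": "imgs-jo", "answer": "reject"},
    {"_input_hash": 4, "_session_id": "imgs-alex", "answer": "reject"},
    {"_input_hash": 4, "_session_id": "imgs-jo", "answer": "reject"},
]
kappa = cohen_kappa(records, "imgs-alex", "imgs-jo")
print(kappa)  # prints 0.5
```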

One feature you'll need is being able to identify which annotations were done by which users. This is referred to as "multi-user sessions" in Prodigy. You can give each annotator a version of the URL differentiated by query parameter. Here's the section in the docs on this:

Multi-user sessions

This update was shipped in preparation for the upcoming Prodigy Scale, a full-featured, standalone application for large-scale multi-user annotation projects powered by Prodigy.

As of v1.7.0, Prodigy supports multiple named sessions within the same instance. This makes it easier to implement custom multi-user workflows and to control the data that's sent out to individual annotators.

To create a custom named session, add ?session=xxx to the annotation app URL. For example, annotator Alex may access a running Prodigy project via http://localhost:8080/?session=alex. Internally, this will request and send back annotations with a session identifier consisting of the current dataset name and the session ID – for example, ner_person-alex. Every time annotator Alex labels examples for this dataset, their annotations will be associated with this session identifier.
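To hand out links, something like the following sketch could generate one URL per annotator, along with the session identifier described above (dataset name, hyphen, session ID). The function names and the base URL are illustrative only and not part of Prodigy.

```python
from urllib.parse import urlencode


def session_url(base_url, annotator):
    """Build the annotation app URL with a ?session=... query parameter."""
    query = urlencode({"session": annotator})
    return f"{base_url.rstrip('/')}/?{query}"


def session_identifier(dataset, annotator):
    """Identifier format described in the docs: dataset name plus session ID."""
    return f"{dataset}-{annotator}"


for name in ["alex", "jo"]:
    print(session_url("http://localhost:8080", name), session_identifier("ner_person", name))
```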

The "feed_overlap" setting in your prodigy.json or recipe config lets you configure how examples should be sent out across multiple sessions. By default (true), each example in the dataset will be sent out once for each session, so you'll end up with overlapping annotations (e.g. one per example per annotator). Setting "feed_overlap" to false will send out each example in the data once to whoever is available. As a result, your data will have each example labelled only once in total.
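To make the two "feed_overlap" modes concrete, here is a toy model of the behaviour described above. It is not how Prodigy schedules examples internally: the round-robin split is only a stand-in for "whoever is available", and the function name is made up.

```python
def distribute(examples, sessions, feed_overlap=True):
    """Toy model: which examples each session would end up seeing."""
    if feed_overlap:
        # Every session receives every example once.
        return {session: list(examples) for session in sessions}
    # No overlap: each example goes to exactly one session.
    # Round-robin stands in for "whoever is available".
    assigned = {session: [] for session in sessions}
    for i, example in enumerate(examples):
        assigned[sessions[i % len(sessions)]].append(example)
    return assigned


examples = ["img1", "img2", "img3"]
sessions = ["alex", "jo"]
overlapping = distribute(examples, sessions, feed_overlap=True)
exclusive = distribute(examples, sessions, feed_overlap=False)
print(overlapping)
print(exclusive)
```

With overlap you get one annotation per example per annotator, which is what you want for agreement statistics. Without it, the total number of annotations equals the number of examples.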

As of v1.8.0, the PRODIGY_ALLOWED_SESSIONS environment variable lets you define comma-separated string names of sessions that are allowed to be set via the app. For instance, PRODIGY_ALLOWED_SESSIONS=alex,jo would only allow ?session=alex and ?session=jo, and other parameters would raise an error.
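If you want to sanity-check annotator links before sending them out, a small sketch like this mirrors the rule described above. It only illustrates the comma-separated format and is not Prodigy's own validation code. Treating an unset variable as "no restriction" is my assumption, and the function names are hypothetical.

```python
import os


def allowed_sessions(environ=os.environ):
    """Parse the comma-separated PRODIGY_ALLOWED_SESSIONS value into names."""
    raw = environ.get("PRODIGY_ALLOWED_SESSIONS", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def is_allowed(session, environ=os.environ):
    """True if the session name is listed (or, by assumption, if nothing is listed)."""
    names = allowed_sessions(environ)
    return not names or session in names


# Use a plain dict in place of the real environment for the example.
env = {"PRODIGY_ALLOWED_SESSIONS": "alex,jo"}
print(is_allowed("alex", env), is_allowed("sam", env))
```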