Working collaboratively with Prodigy

Hi @kushal_pythonist!

Thanks for your question.

First, it's important to determine the "what" you'll be labeling:

Here's more of the "how" to implement your project.

First, if your annotators are on the same network (e.g., server), check out our documentation on named multi-user sessions. You can start a Prodigy session as you would normally:

python -m prodigy ner.manual ner_news_headlines blank:en ./news_headlines.jsonl --label PERSON,ORG,PRODUCT,LOCATION

✨ Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

To create a custom named session, add ?session=xxx to the annotation app URL. For example, annotator Alex may access a running Prodigy project via http://localhost:8080/?session=alex. Internally, this will request and send back annotations with a session identifier consisting of the current dataset name and the session ID. With the dataset used above, that would be ner_news_headlines-alex. Every time annotator Alex labels examples for this dataset, their annotations will be associated with this session identifier.
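If you have several annotators, it can help to generate the links rather than type them by hand. Here's a minimal sketch using only the Python standard library. The helper name `session_url` and the annotator names are made up for illustration, and the base URL assumes the default host and port shown above.

```python
from urllib.parse import urlencode


def session_url(base_url: str, session: str) -> str:
    """Return the annotation app URL for one named session."""
    # urlencode escapes characters that are not safe in a query string
    return f"{base_url.rstrip('/')}/?{urlencode({'session': session})}"


# Hypothetical annotator names; adjust the base URL to your own server
urls = {name: session_url("http://localhost:8080", name) for name in ["alex", "jo"]}
for name, url in urls.items():
    print(f"{name}: {url}")
```

Each annotator then opens their own link, and their annotations are saved under their own session identifier.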

The "feed_overlap" setting in your prodigy.json or recipe config lets you configure how examples should be sent out across multiple sessions. If true, each example in the dataset will be sent out once for each session, so you’ll end up with overlapping annotations (e.g. one per example per annotator). Setting "feed_overlap" to false will send out each example in the data once to whoever is available. As a result, your data will have each example labelled only once in total.
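To make the difference concrete, here's a toy model of the two modes. This is not how Prodigy implements its feed internally. It only illustrates the outcome, and it assumes a simple round-robin as a stand-in for "whoever is available". The function name `distribute` and the example data are hypothetical.

```python
from itertools import cycle


def distribute(examples, sessions, feed_overlap):
    """Toy model of which session sees which example (not Prodigy's internals)."""
    if feed_overlap:
        # Every session receives every example
        return {session: list(examples) for session in sessions}
    assigned = {session: [] for session in sessions}
    # Round-robin stands in for "whoever is available" (an assumption)
    for example, session in zip(examples, cycle(sessions)):
        assigned[session].append(example)
    return assigned


examples = ["a", "b", "c", "d"]
sessions = ["alex", "jo"]
overlapping = distribute(examples, sessions, feed_overlap=True)
split = distribute(examples, sessions, feed_overlap=False)
print(sum(len(v) for v in overlapping.values()))  # 8 annotations: one per example per session
print(sum(len(v) for v in split.values()))  # 4 annotations: one per example in total
```

So overlap is what you want if you plan to compare annotators or measure agreement, and no overlap is what you want if you simply need to get through the data faster.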

As of v1.8.0, the PRODIGY_ALLOWED_SESSIONS environment variable lets you define comma-separated string names of sessions that are allowed to be set via the app. For instance, PRODIGY_ALLOWED_SESSIONS=alex,jo would only allow ?session=alex and ?session=jo, and other parameters would raise an error.
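One way to set this is to build the environment in a small launcher script. The sketch below only prepares the command and environment and leaves the actual launch commented out, since it needs Prodigy installed. The annotator names are hypothetical, and the recipe arguments are copied from the command above.

```python
import os
import sys

annotators = ["alex", "jo"]  # hypothetical session names

# Copy the current environment and add the comma-separated allow list
env = dict(os.environ, PRODIGY_ALLOWED_SESSIONS=",".join(annotators))

command = [
    sys.executable, "-m", "prodigy",
    "ner.manual", "ner_news_headlines", "blank:en", "./news_headlines.jsonl",
    "--label", "PERSON,ORG,PRODUCT,LOCATION",
]

print(env["PRODIGY_ALLOWED_SESSIONS"])

# To start the server with this environment (requires Prodigy):
# import subprocess
# subprocess.run(command, env=env, check=True)
```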

An alternative approach is to run one session per annotator, each with its own port and dataset.

This is harder to manage manually once you go beyond a few annotators. Some community members have found tools like tmux helpful.
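As a rough sketch of what that could look like, the helper below builds one command and environment per annotator without launching anything. The dataset naming scheme (`ner_news_headlines_<name>`), the helper name `annotator_jobs` and the annotator names are assumptions for illustration. It also assumes the port is set via the PRODIGY_PORT environment variable; setting "port" in a per-annotator config would be another option.

```python
import os
import sys


def annotator_jobs(annotators, base_port=8080):
    """Return (command, env) pairs, one per annotator, each with its own dataset and port."""
    jobs = []
    for offset, name in enumerate(annotators):
        command = [
            sys.executable, "-m", "prodigy",
            "ner.manual", f"ner_news_headlines_{name}",  # hypothetical dataset naming
            "blank:en", "./news_headlines.jsonl",
            "--label", "PERSON,ORG,PRODUCT,LOCATION",
        ]
        # Assumption: the port is read from the PRODIGY_PORT environment variable
        env = dict(os.environ, PRODIGY_PORT=str(base_port + offset))
        jobs.append((command, env))
    return jobs


jobs = annotator_jobs(["alex", "jo"])
for command, env in jobs:
    print(env["PRODIGY_PORT"], " ".join(command[1:]))
```

Each pair can then be started in its own terminal or background process, for example with `subprocess.Popen(command, env=env)`.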

Second, if you're running Prodigy on a different machine from the one your annotators use, you'll need to change the Prodigy host from localhost to "0.0.0.0".

To do this, either modify the configuration in the prodigy.json file or set the PRODIGY_HOST environment variable.
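If you'd rather script the config change, here's a minimal sketch that sets the "host" key in a prodigy.json file while leaving the other settings untouched. The helper name `set_prodigy_host` is made up, and the demo runs against a temporary file so it doesn't touch your real config. Pass the path to your own prodigy.json when you use it for real.

```python
import json
import tempfile
from pathlib import Path


def set_prodigy_host(config_path, host="0.0.0.0"):
    """Set the "host" key in a prodigy.json file, keeping other settings intact."""
    path = Path(config_path)
    # Start from the existing settings if the file is already there
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config["host"] = host
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo on a throwaway file with one existing setting
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "prodigy.json"
    demo_path.write_text('{"feed_overlap": false}', encoding="utf-8")
    updated = set_prodigy_host(demo_path)
    written = json.loads(demo_path.read_text(encoding="utf-8"))

print(updated)
```

For a one-off run, setting the environment variable in the shell before starting Prodigy achieves the same thing without editing any file.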

As mentioned, for production setups, you may want to consider reverse proxies. Also, be aware that you may have to handle firewalls or other issues depending on your server setup.

Developing comprehensive guidelines for working collaboratively is hard, because a lot of design decisions depend on the unique network setup. There are many posts tagged server, as well as cloud posts tagged aws or google-cloud, that may help depending on your setup.

Hope this helps!