Working collaboratively with Prodigy

kushal_pythonist · January 18, 2023, 1:14pm

I purchased licenses (more than 5) to work on an NER project, but I don't know how to work collaboratively with these licensed Prodigy accounts. I have 5 people working with me. What tasks should I set up, and how should we go through the process? Can anyone please explain it and help me complete the project?

ryanwesslen · January 18, 2023, 9:34pm

hi @kushal_pythonist!

Thanks for your question.

First, it's important to determine the "what" you'll be labeling:

Here's more of the "how" to implement your project.

First, if your annotators are on the same network (e.g., server), check out our documentation on named multi-user sessions. You can start a Prodigy session as you would normally:

python -m prodigy ner.manual ner_news_headlines blank:en ./news_headlines.jsonl --label PERSON,ORG,PRODUCT,LOCATION

✨ Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

To create a custom named session, add ?session=xxx to the annotation app URL. For example, annotator Alex may access a running Prodigy project via http://localhost:8080/?session=alex. Internally, this will request and send back annotations with a session identifier consisting of the current dataset name and the session ID – for example, ner_news_headlines-alex for the command above. Every time annotator Alex labels examples for this dataset, their annotations will be associated with this session identifier.
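As a rough illustration of the named-session URLs described above, here is a minimal Python sketch that builds one link per annotator. The annotator names are hypothetical, and the base URL assumes the default address Prodigy prints at startup; adjust both to your setup.

```python
from urllib.parse import urlencode

# Default address printed by Prodigy at startup (assumed unchanged here).
BASE_URL = "http://localhost:8080/"


def session_url(annotator: str, base_url: str = BASE_URL) -> str:
    """Return the annotation app URL with a named session for one annotator."""
    return f"{base_url}?{urlencode({'session': annotator})}"


# Hypothetical annotator names, for illustration only.
annotators = ["alex", "jo", "sam"]
urls = {name: session_url(name) for name in annotators}

for name, url in urls.items():
    print(f"{name}: {url}")
```

Each annotator then opens only their own link, so their annotations stay tied to their session name.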

The "feed_overlap" setting in your prodigy.json or recipe config lets you configure how examples should be sent out across multiple sessions. If true, each example in the dataset will be sent out once for each session, so you’ll end up with overlapping annotations (e.g. one per example per annotator). Setting "feed_overlap" to false will send out each example in the data once to whoever is available. As a result, your data will have each example labelled only once in total.
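To make the setting concrete, the sketch below prints the JSON fragment you would merge into your prodigy.json. The helper name is made up for this example; only the "feed_overlap" key comes from the description above.

```python
import json


def feed_overlap_fragment(overlap: bool) -> dict:
    """Build the config fragment for the "feed_overlap" setting.

    True  -> every named session receives every example (overlapping labels).
    False -> each example is sent out once, to whichever session asks first.
    """
    return {"feed_overlap": overlap}


# Example: split the work across annotators so each example is labelled once.
fragment_text = json.dumps(feed_overlap_fragment(False), indent=2)
print(fragment_text)
```

Merge the printed fragment into your existing prodigy.json rather than replacing the file, so other settings are kept.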

As of v1.8.0, the PRODIGY_ALLOWED_SESSIONS environment variable lets you define comma-separated string names of sessions that are allowed to be set via the app. For instance, PRODIGY_ALLOWED_SESSIONS=alex,jo would only allow ?session=alex and ?session=jo, and other parameters would raise an error.
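If you keep your annotator names in a list, a small helper like the one below can build the comma-separated value for PRODIGY_ALLOWED_SESSIONS. This is only a sketch: the function name and the validation rules are my own assumptions, not part of Prodigy.

```python
def allowed_sessions_value(names):
    """Join session names into the comma-separated form the variable expects."""
    cleaned = [name.strip() for name in names]
    for name in cleaned:
        # A comma inside a name would be read as a separator, so reject it.
        if not name or "," in name:
            raise ValueError(f"Invalid session name: {name!r}")
    return ",".join(cleaned)


# Hypothetical annotators; prints a line you could put in front of the command.
value = allowed_sessions_value(["alex", "jo"])
print(f"PRODIGY_ALLOWED_SESSIONS={value}")
```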

An alternative approach is where you have one session with a unique port and dataset for each annotator:

This is harder to manage manually once you go beyond a few annotators. Some community members have found tools like tmux helpful for this:
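For the one-process-per-annotator approach, a sketch like the following can generate the commands so you don't have to type them by hand. It assumes each process gets its port through the PRODIGY_PORT environment variable and that datasets are named with an annotator suffix; the naming scheme and annotator names are hypothetical.

```python
import shlex


def annotator_commands(annotators, base_dataset, source, labels, first_port=8080):
    """Return one shell command per annotator, each with its own port and dataset."""
    commands = []
    for offset, name in enumerate(annotators):
        port = first_port + offset            # unique port per annotator
        dataset = f"{base_dataset}_{name}"    # unique dataset per annotator (assumed naming)
        recipe = [
            "prodigy", "ner.manual", dataset, "blank:en", source,
            "--label", ",".join(labels),
        ]
        commands.append(f"PRODIGY_PORT={port} {shlex.join(recipe)}")
    return commands


cmds = annotator_commands(
    ["alex", "jo"],
    "ner_news_headlines",
    "./news_headlines.jsonl",
    ["PERSON", "ORG", "PRODUCT", "LOCATION"],
)
for cmd in cmds:
    print(cmd)
```

You could then paste each printed line into its own tmux window or pane.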

Second, if you're running Prodigy on a different machine than the one your annotators are on, you'll need to change the Prodigy host from localhost to "0.0.0.0".

To do this, either modify the "host" setting in the prodigy.json file or set the PRODIGY_HOST environment variable.
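If you start Prodigy from a Python script, the environment-variable route might look roughly like this. It is a sketch only: the helper name is made up, and the launch line is commented out so the snippet runs on a machine without Prodigy installed.

```python
import os


def prodigy_env(host="0.0.0.0", base_env=None):
    """Copy the environment and set PRODIGY_HOST so the server accepts remote connections."""
    env = dict(os.environ if base_env is None else base_env)
    env["PRODIGY_HOST"] = host
    return env


env = prodigy_env()
command = [
    "prodigy", "ner.manual", "ner_news_headlines", "blank:en",
    "./news_headlines.jsonl", "--label", "PERSON,ORG,PRODUCT,LOCATION",
]
print("PRODIGY_HOST =", env["PRODIGY_HOST"])

# Uncomment on the server where Prodigy is installed:
# import subprocess
# subprocess.run(command, env=env, check=True)
```

Annotators would then use the server's address in place of localhost in their session URLs.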

As mentioned, for production setups, you may want to consider reverse proxies. Also, be aware that you may have to handle firewalls or other issues depending on your server setup.

Developing comprehensive guidelines for working collaboratively is hard, because a lot of design decisions depend on the unique network setup. There are many server posts, as well as cloud posts tagged aws or google-cloud, that may help depending on your setup.

Hope this helps!

kushal_pythonist · January 20, 2023, 2:42am

Let's take an example here:
I want to annotate NER, and I have 5 people lined up.
I use this command:

prodigy ner.manual resume blank:en /home/random/Documents/anno/outputfiles/randompersonresume.jsonl --label PERSON,LOCATION,UNIVERSITY,ROLE,ORGANIZATION,DATE,DOMAIN,PROJECT,EXPERIENCE,HARDSKILLS,SOFTSKILLS,DEGREE

to annotate. On the first day, they complete their annotations. On the second day, do we start the same way again, or is there another way?
I know this may be a silly question, but I need to solve this issue ASAP. Thank you, much appreciated.

ryanwesslen · January 20, 2023, 8:08pm

hi @kushal_pythonist!

Unfortunately, the answer is "it depends" on your use case, and Ines' post is the best advice for how to start your project.

A lot of the decisions you need to make are based on what your use case is. Matt discussed this as the base of the "ML Hierarchy of Needs":

I suspect you're in a rush, so you may not have time to step back and focus on these problems.

If you need something, the closest step-by-step instructions I can provide is our NER flowchart:

Hope this helps!

kushal_pythonist · January 21, 2023, 3:06am

Thank you so much, @ryanwesslen. I will surely watch the video you provided. And thanks again for your constant help and guidance.
