Thanks for your question and welcome to the Prodigy community
Please see this post:
This post also provides more details on ngrok:
But as mentioned, there are a lot of ways this can be done, and the right approach really depends on your setup (e.g., on-premise vs. cloud, which cloud provider, security/firewall requirements, reverse proxy/load balancer). This is why there isn't a simple step-by-step tutorial. I'd recommend checking out more of the multi-user tagged posts, which have many examples.
The team is working very hard on Prodigy Teams, which will provide this functionality:
We'll post updates once it becomes publicly available.
I managed to connect to the Prodigy session from my phone and other devices via ngrok.
Is there a way to track each annotator's progress (number of annotated records)?
Also, currently, when I refresh the ngrok link, the session's and the total progress both reset to 0%, as if no annotations were done. Is it really deleting previous progress? If so, how do I prevent that?
Lastly, from where can I access the annotated dataset by each annotator?
Right now it separates by metadata (document), but you could modify it to use _session_id. I really like the idea from the post of creating a Streamlit app like this (FYI, that one uses a different approach for tracking annotators: giving each annotator their own port/Prodigy dataset).
prodigy progress news_headlines_person,news_headlines_org

================================== Legend ==================================

New       New annotations collected in interval
Total     Total annotations collected
Unique    Unique examples (not counting multiple annotations of same example)

================================= Progress =================================

               New   Unique   Total   Unique
-----------   ----   ------   -----   ------
10 Jul 2021   1123      733    1123      733
12 Jul 2021    200      200    1323      933
13 Jul 2021    831      711    2154     1644
14 Jul 2021    157      150    2311     1790
15 Jul 2021   1464     1401    3775     3191
It doesn't break down by "session_id"; however, you can view the underlying recipe and modify it (e.g., replace the date with "_session_id" when it prints out the table, or something along those lines). To find the location of the recipe, run python -m prodigy stats and find the Location: path. Open that folder, then look for /recipes/commands.py. You can then use a modified copy as a custom recipe and run it with the -F flag, e.g., -F new_recipe.py. If you get it to work, please post it back so other members of the community can use it!
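As a rough illustration of the per-session tallying the modified recipe would do, here's a minimal sketch. It assumes you've loaded the dataset's examples as a list of dicts; Prodigy stores the named session under the "_session_id" key of each saved example (the session names and texts below are made up):

```python
from collections import Counter

def counts_by_session(examples):
    # Tally saved annotations per named session; Prodigy stores the
    # session under the "_session_id" key of each saved example.
    return Counter(eg.get("_session_id", "unknown") for eg in examples)

# Hypothetical examples, shaped like records loaded from the database
examples = [
    {"text": "Apple hires ...", "_session_id": "news_headlines-alice"},
    {"text": "Google buys ...", "_session_id": "news_headlines-alice"},
    {"text": "Tesla ships ...", "_session_id": "news_headlines-bob"},
]
print(counts_by_session(examples))
```

From there, printing a table per session instead of per date is just a matter of iterating over the counter.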
Did you annotate more than 10 examples and/or make sure to click "Save"? By default, the first 10 examples (since batch_size is 10 by default) will not be saved to the database unless you click save or get through those first 10 (at which point they'll automatically be saved and a new batch retrieved).
Are you using named multi-user sessions (e.g., appending ?session=alice to the app's URL)? You likely want to, as there's otherwise no way to identify your data by annotator. Also, be aware of the feed_overlap setting (that is, whether you want overlapping or non-overlapping annotations).
No. If you saved the annotations to the database, then it's not "deleting" anything. I suspect the problem is that the annotations were made (fewer than 10) but never saved to the DB by clicking save.
While reading, I saw that the current version of Prodigy is fine for multi-user annotation using the same port, right? I'll be able to let each annotator annotate the whole dataset, and I'll be able to get each annotator's dataset separately, right?
Is this do-able for the single port approach?
You were right, I did not save the progress. However, when I save the first 10, then go on to annotate more records and exit or refresh, the new progress is still not being saved. Is there a way of enforcing autosave?
Yes I am. I will make sure to change the settings of feed_overlap as I want each annotator to annotate the whole dataset.
Thanks again, Ryan. I really appreciate the detailed answers. You make this platform extremely beginner-friendly.
Correct. An alternative to named multi-user sessions is to run each annotator on a different port/dataset.
Here's a good pro/con of each:
It sounds like you want "overlapping" annotations, i.e., you need to set feed_overlap to true. You can do this in prodigy.json or as an override.
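For example, a minimal prodigy.json (the feed_overlap line is the only relevant setting here):

```json
{
  "feed_overlap": true
}
```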
I just responded to a similar question; see the bottom of that post for more on the difference feed_overlap makes:
If you want to autosave, you can either set batch_size to 1 or set instant_submit to true in prodigy.json (the config). The one downside of these approaches is that they disable undo, as you'll no longer be holding a batch on the client before sending it to the DB.
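As a sketch, either setting in your prodigy.json will do it (you only need one of the two):

```json
{
  "batch_size": 1,
  "instant_submit": true
}
```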
I can't remember the specific differences but this post seems to discuss it more:
Also, the progress bar is not updated in real time; it's only updated when a new batch is retrieved. We've been debating modifying this in a future version, but there are some unintended consequences in high-latency environments with multiple annotators that can cause issues.
If you want something more real-time, you could check out the update callback. @koaning has a great video on using it to track annotator speed:
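To sketch the idea without tying it to a full recipe: Prodigy calls a recipe's "update" callback with each batch of answered examples, so you can tally per-session counts as answers arrive. The tallying below is plain Python with a simulated batch; in a real recipe you'd return the function under the "update" key of the recipe's components dict:

```python
from collections import defaultdict

session_counts = defaultdict(int)

def update(answers):
    # Prodigy passes the batch of answered examples to this callback;
    # here we just tally how many answers each session has submitted.
    for eg in answers:
        session_counts[eg.get("_session_id", "unknown")] += 1

# Simulated batch, shaped like what the callback might receive
update([
    {"answer": "accept", "_session_id": "proj-alice"},
    {"answer": "reject", "_session_id": "proj-bob"},
    {"answer": "accept", "_session_id": "proj-alice"},
])
print(dict(session_counts))
```

From there you could log the counts, push them to a dashboard, etc.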
Since you're setting feed_overlap to true, be sure to update Prodigy today. Yesterday we released v1.11.9, which fixed a lingering bug of "duplicated" annotations in high-latency/multi-annotator sessions.
Also, it's worth mentioning that the team is working very hard on releasing Prodigy v2 in a few months. This will add even greater customization to feed_overlap (e.g., what if you want only a certain percentage of examples to overlap across annotators?).
For the release of Prodigy v2, we have some exciting new features and a significant redesign of the way examples are sent to different annotators. Specifically, this redesign will eliminate the need for this tradeoff and should eliminate 100% of unwanted duplicates.
It will also bring more customization to the feed_overlap setting, like setting the number of annotations you want for each task, or configuring a percentage of examples to have overlapping annotations. We're even working on registering custom policies to distribute work to different annotators.
Also, since you're looking at having multiple annotators, be sure to check out @pmbaumgartner's great inter-annotator agreement recipes. These can help you "calibrate" how consistent annotators are (see Peter's wonderful NormConf talk on why calibration is important). We'd love feedback as this project evolves!
Thanks again for your questions (and sorry for dumping a ton of resources on you). Hopefully you have plenty to work with; please post back if you have interesting experiments, or link to a blog/paper if you're successful!
I haven't used ngrok extensively (i.e., beyond having only one annotator). You may need to research whether and how you can pass the session name through it. Remember, ngrok is a "quick-and-dirty" tool; it is by no means a perfect solution, especially when handling many annotators. You may also want to try localtunnel as an alternative.
The current dataset is not included in the list. Does the Prodigy session need to end for it to save, or can I access the dataset while it's running? My session is still running because some annotators haven't finished yet.
No, you don't need to end your session. Annotations are saved to the database after each batch, i.e., whenever you click the "save" button or complete one batch (by default, 10 annotations). You can also export a dataset at any time, e.g., with prodigy db-out your_dataset_name.
For example, run a recipe with a new dataset. Annotate 10 examples, but don't save or go on to the 11th example (assuming you still have the default batch size of 10). Then try to look at the records in that dataset: you likely won't see anything. Now click the save button (or proceed past the 10th example) and look at your data again. You should see your examples; if you don't, you may not be looking at the correct dataset.
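To make the batching behavior concrete, here's a toy simulation (nothing Prodigy-specific; just a sketch of a client-side buffer with the default batch size):

```python
class ToyClient:
    """Toy model of the client-side answer buffer (default batch_size=10)."""

    def __init__(self, batch_size=10):
        self.batch_size = batch_size
        self.buffer = []    # answers held in the browser
        self.database = []  # answers persisted server-side

    def annotate(self, example):
        self.buffer.append(example)
        if len(self.buffer) >= self.batch_size:
            self.save()  # a full batch is sent automatically

    def save(self):
        # Equivalent to clicking the "save" button
        self.database.extend(self.buffer)
        self.buffer.clear()

client = ToyClient()
for i in range(9):
    client.annotate({"id": i})
print(len(client.database))  # still 0: nothing persisted yet

client.annotate({"id": 9})   # the 10th answer completes the batch
print(len(client.database))  # 10
```

This is why the first few annotations seem to "disappear" on refresh: they were only ever in the buffer.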
Also, consider adding logging when running your recipes. With logging on, you'll see when your examples are sent to the database (plus, it'll confirm the name of your dataset).