Starting a Multi-user session

NoorKhalifa · January 23, 2023, 7:07pm

I read all the docs and support cases but I did not manage to find a step by step tutorial for starting a multi-user session.

I set up the session on localhost and managed to enter separate sessions by adding ?session=noor, but I can only do that on my laptop.

But how do I send this to someone who does not use my laptop and internet connection? My annotators are not in the same place/company.

ryanwesslen · January 23, 2023, 7:30pm

hi @NoorKhalifa!

Thanks for your question and welcome to the Prodigy community!

Please see this post:

This post too provides more details on ngrok:

But as mentioned, there are a lot of ways this can be done, and it really depends on your setup (e.g., on-premise vs. cloud, which cloud provider, security/firewall requirements, reverse proxy/load balancer). This is why there isn't a simple step-by-step tutorial. I would recommend checking out more of the multi-user tagged posts, which have many examples.
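
As a rough illustration of what ends up being shared once the server is reachable from outside: each annotator gets the same public address with their own ?session= value, as in the ?session=noor example above. This is only a minimal sketch; the public URL and the annotator names below are made up.

```python
from urllib.parse import urlencode

# Hypothetical public address of the Prodigy server (e.g. a tunnel URL).
BASE_URL = "https://example-tunnel.ngrok.io"


def session_link(base_url: str, annotator: str) -> str:
    """Build a per-annotator link using the ?session= query parameter."""
    return f"{base_url.rstrip('/')}/?{urlencode({'session': annotator})}"


# Made-up annotator names, one named session each.
links = {name: session_link(BASE_URL, name) for name in ["noor", "alex"]}
for name, link in links.items():
    print(name, link)
```

Each annotator then opens only their own link, so their answers are tagged with their session name.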

The team is working very hard on Prodigy Teams that provides this functionality:

We'll post updates once it becomes publicly available.

NoorKhalifa · January 24, 2023, 6:09pm

Thanks Ryan!

I managed to connect the Prodigy session on my phone and other devices via Ngrok.

Is there a way to track each annotator's progress (number of annotated records)?

Also, currently, when I refresh the Ngrok link, the session's and the total's progress changes to 0% as if no annotations were done. Is it really deleting previous progress? If so, how do I prevent that?

Lastly, from where can I access the annotated dataset by each annotator?

ryanwesslen · January 24, 2023, 6:40pm

Check out this:

Right now it separates by metadata (document), but you could modify it to use _session_id. I really like the idea from the post of creating a Streamlit app like this (FYI, this uses a different approach for tracking annotators by giving each annotator their own port/Prodigy dataset).

Also you may want to check out the progress recipe:

prodigy progress news_headlines_person,news_headlines_org
================================== Legend ==================================

New         New annotations collected in interval
Total       Total annotations collected
Unique      Unique examples (not counting multiple annotations of same example)
================================= Progress =================================

               New   Unique   Total   Unique
-----------   ----   ------   -----   ------
10 Jul 2021   1123      733    1123      733
12 Jul 2021    200      200    1323      933
13 Jul 2021    831      711    2154     1644
14 Jul 2021    157      150    2311     1790
15 Jul 2021   1464     1401    3775     3191

It doesn't break down by "_session_id"; however, you can view the underlying recipe and modify it (e.g., replace time with "_session_id" when it prints out the table, or something like it). To find the location of the recipe, run python -m prodigy stats and find the Location: path. Open that folder, then look for /recipes/commands.py. You can then use that as a custom recipe and run it with -F new_recipe.py. If you get either to work, please post it back so other members of the community can use it!
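
If modifying the recipe feels like too much, a simpler route is to count saved examples per "_session_id" yourself. A minimal sketch, not the progress recipe itself: the examples list is made-up sample data standing in for examples loaded from your Prodigy dataset, and the session names are assumptions.

```python
from collections import Counter

# Made-up sample data standing in for examples loaded from a Prodigy dataset.
examples = [
    {"text": "a", "answer": "accept", "_session_id": "my_dataset-noor"},
    {"text": "b", "answer": "reject", "_session_id": "my_dataset-noor"},
    {"text": "a", "answer": "accept", "_session_id": "my_dataset-alex"},
]


def count_by_session(examples):
    """Count annotations per session; examples without a session go under 'unknown'."""
    return Counter(eg.get("_session_id", "unknown") for eg in examples)


counts = count_by_session(examples)
for session, n in counts.most_common():
    print(f"{session:<20} {n:>5}")
```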

Did you annotate more than 10 examples and/or make sure to click "Save"? By default, the first 10 examples (as batch_size is 10 by default) will not be saved to the database unless you save them or get through those first 10 (then it'll automatically save and retrieve a new batch).

Are you using named multi-user sessions? You likely would want to, as there's no other way to identify your data by annotator. Also, be sure to be aware of the difference feed_overlap makes (that is, do you want overlapping or non-overlapping annotations).

No. If you saved the annotations to the database, then it's not "deleting" anything. I suspect your problem was that the annotations were made (fewer than 10), but you didn't save them to the DB by clicking save.

Yes, use get_dataset and filter by "_session_id".

from prodigy.components.db import connect

db = connect()
examples = db.get_dataset("my_dataset")

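# filter examples by "_session_id" (the session name below is a made-up example)
noor_examples = [eg for eg in examples if eg.get("_session_id") == "my_dataset-noor"]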

This is similar to the earlier post.

Since you're annotating on your phone, you may also like @koaning's Prodigy Short on tips on running Prodigy on a mobile device:

NoorKhalifa · January 24, 2023, 6:54pm

While reading, I saw that the current version of Prodigy is fine for multi-user annotations using the same port, right? I will be able to allow each annotator to annotate the whole dataset and I will be able to get each annotated dataset separately, right?

Is this do-able for the single port approach?

You were right, I did not save the progress. However, when I save the first 10 then go on to annotate more records and exit or refresh, the new progress is still not being saved. Is there a way of enforcing autosave?

Yes I am. I will make sure to change the settings of feed_overlap as I want each annotator to annotate the whole dataset.

Thanks again, Ryan. I really appreciate the detailed answers. You make this platform extremely beginner-friendly.

ryanwesslen · January 24, 2023, 7:18pm

Correct. An alternative to multi-user named sessions is to run each annotator on a different port/dataset.

Here's a good pro/con of each:

It sounds like you want "overlapping" annotations, i.e., you need to set feed_overlap to true. You can do this in prodigy.json or as an override.

I just responded to a similar example; see the bottom of the difference of feed_overlap:

If you want to autosave, you can either set your batch_size to 1 or set instant_submit to true in prodigy.json (config). The one downside of these approaches is that you lose the undo, as you'll no longer be holding the batch in the client before sending it to the DB.
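
For reference, here is a sketch of how those settings could be merged into a prodigy.json from Python. The keys feed_overlap, instant_submit and batch_size are the ones discussed above; the helper name and the temporary demo path are assumptions, and in practice you would point it at your real prodigy.json.

```python
import json
import tempfile
from pathlib import Path


def update_config(config_path: Path, overrides: dict) -> dict:
    """Merge overrides into a JSON config file, keeping any existing keys."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.update(overrides)
    config_path.write_text(json.dumps(config, indent=2))
    return config


overrides = {
    "feed_overlap": True,    # every named session sees every example
    "instant_submit": True,  # send each answer to the DB right away (no undo)
}
# Alternative to instant_submit: overrides = {"feed_overlap": True, "batch_size": 1}

# Demo in a temporary folder; replace with the path to your own prodigy.json.
demo_path = Path(tempfile.mkdtemp()) / "prodigy.json"
config = update_config(demo_path, overrides)
print(json.dumps(config, indent=2))
```

Merging instead of overwriting keeps whatever other settings are already in the file.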

I can't remember the specific differences but this post seems to discuss it more:

Also, the Progress Bar is not updated real time; it's only updated when a new batch is retrieved. We've been debating modifying this in a future version but there are some unintended consequences in high latency environments with multiple annotators that can cause issues.

If you wanted something more realtime, you could check out the update callback. @koaning has a great video on it to track annotator speed:
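
To give a feel for the idea, here is a minimal sketch of an update-style callback that tallies answers per "_session_id" as batches arrive. This is not the code from the video; the progress dictionary and the simulated batch are made up, and hooking the function into your own custom recipe is left as an assumption.

```python
import time
from collections import defaultdict

# Running totals per session: number of answers and time of the last batch.
progress = defaultdict(lambda: {"count": 0, "last_seen": None})


def update(answers):
    """Callback in the style of a recipe's `update` component.

    Receives a batch of annotated examples and tallies them by `_session_id`.
    """
    now = time.time()
    for eg in answers:
        session = eg.get("_session_id", "unknown")
        progress[session]["count"] += 1
        progress[session]["last_seen"] = now
    print({s: p["count"] for s, p in progress.items()})


# In a custom recipe, the function would be returned under the "update" key
# of the recipe's components. Here it is called on a simulated batch:
update([
    {"text": "a", "answer": "accept", "_session_id": "my_dataset-noor"},
    {"text": "b", "answer": "accept", "_session_id": "my_dataset-noor"},
])
```

Note that the callback only fires when a batch is sent back, so with the default batch size the counts still move in steps of 10.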

Since you're setting feed_overlap to true, be sure to update Prodigy today. Yesterday, we released v1.11.9, which fixed a lingering bug of "duplicated" annotations in high-latency sessions with multiple annotators.

Also, it's worth mentioning the team is working very hard on releasing Prodigy v2 in a few months. This will add even greater customization to feed_overlap (e.g., what if you want only a certain percentage overlapping across annotators).

For the release of Prodigy v2, we have some exciting new features and a significant redesign to the way examples are sent to different annotators. Specifically, this redesign will eliminate the need for this tradeoff and should eliminate 100% of unwanted duplicates.

It will also bring more customization to the feed_overlap setting, like setting the number of annotations you want for each task, or configuring a percentage of examples to have overlapping annotations. We're even working on registering custom policies to distribute work to different annotators.

Also, since you're looking at having multiple annotators, be sure to check out @pmbaumgartner's great Inter-annotator Agreement recipes. These can help you better "calibrate" how consistent annotators are (see Peter's wonderful NormConf talk on why calibration is important). We'd love feedback as this project is evolving!

Thanks again for your questions (sorry about dumping a ton of resources). Hopefully you have plenty of resources and please post back if you have interesting experiments or link to blog/paper if you're successful!

NoorKhalifa · January 24, 2023, 8:45pm

Thanks again, Ryan! I will refer to all the resources and see what works for me. I appreciate the resources provided and will definitely post back once I'm done with my project.

kushal_pythonist · January 25, 2023, 1:33pm

This works perfectly fine on local host, but not on ngrok platform

ryanwesslen · January 25, 2023, 2:05pm

I haven't used ngrok extensively (i.e., beyond only having one annotator). You may need to research it to find how you can pass the session name (if it's possible). Remember, ngrok is a "quick-and-dirty" tool. It is by no means a perfect solution, especially when handling many annotators. You may also want to try localtunnel as an alternative.

kushal_pythonist · January 26, 2023, 11:34am

These are the sessions I have created. They run smoothly on localhost, but when I use ngrok to expose the ports and share them with different people for annotating, the documents are the same. Why?

ryanwesslen · January 26, 2023, 12:47pm

hi @kushal_pythonist!

I haven't used ngrok beyond one annotator. ngrok is a third-party tool so we don't offer support. Can you check out the ngrok documentation to find out?

NoorKhalifa · March 14, 2023, 1:48pm

I am running my Prodigy session on a Raspberry Pi as a server. Can I access the annotated datasets using Python on my Windows machine?

from prodigy.components.db import connect
db = connect()
examples = db.get_dataset_examples("my_dataset")

I ran this code, but the variable examples does not show anything

ryanwesslen · March 14, 2023, 2:03pm

Are you 100% sure you have the right name of the dataset and/or that there's actually data in that dataset?

Let's assume you saved data into a Prodigy dataset called my_dataset.

Can you run?

prodigy print-dataset my_dataset

Or another alternative is:

python -m prodigy db-out my_dataset > my_dataset.jsonl

I don't think running Python on a Windows machine would be a problem. Just curious, can you run:

print(db.datasets)

and do you see my_dataset (or the name of the dataset you're looking for)?

I did notice that if you run db.get_dataset_examples("my_dataset") and you don't have a dataset named my_dataset, it won't print a warning that "Data set doesn't exist".
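
Because of that, it can help to check the name yourself before reading. A minimal sketch, assuming db is the object returned by connect(); load_examples and FakeDB are hypothetical names, and FakeDB is only a stand-in so the snippet runs without a Prodigy install.

```python
def load_examples(db, name):
    """Return the examples in `name`, failing loudly if the dataset is missing."""
    if name not in db.datasets:
        raise ValueError(f"Dataset {name!r} not found. Available: {db.datasets}")
    return db.get_dataset_examples(name)


class FakeDB:
    """Stand-in for the object returned by prodigy.components.db.connect()."""

    datasets = ["dataset_a", "dataset_b"]  # made-up dataset names

    def get_dataset_examples(self, name):
        return [{"text": "example", "answer": "accept"}]


db = FakeDB()
print(len(load_examples(db, "dataset_a")))
try:
    load_examples(db, "my_dataset")
except ValueError as err:
    print(err)
```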

NoorKhalifa · March 14, 2023, 2:19pm

I ran

print(db.datasets)

and got the list

['ner_news_headlines', 'comment_class', 'news_topic', 'news_topics']

The current dataset is not included in the list. Does the Prodigy session need to end so that it saves, or can I access the dataset while it is running? My session is still running because some annotators have not finished yet.

ryanwesslen · March 14, 2023, 2:40pm

No, you don't need to end your session. Annotations are saved to the database after each batch. That is, click the "save" button or complete 1 batch, which by default is 10 annotations.

For example, run a recipe into a new dataset. Annotate 10 examples, but don't save or go to the 11th example (if you still have the default batch size of 10). Then try to look at the records in that dataset. You will likely not see anything. However, click the save button (or proceed). Now try to look at your data again. You should see your examples. If you don't, then you may not be looking at the correct dataset.

Also, consider adding logging when running your recipes. When you do this, you should see your examples when they're sent to the database (and it'll also confirm the name of your database).

NoorKhalifa · March 14, 2023, 3:23pm

I ran

python -m prodigy db-out my_dataset > my_dataset.jsonl

on my Raspberry Pi and it worked. I believe that the command must be run on the device that initially ran the Prodigy session.

It would be great if that is not the case and the dataset can be accessed from anywhere.
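
One workaround, as a sketch only: by default Prodigy stores annotations in a local SQLite file on the machine running the server, which would explain why the Windows machine lists different datasets. Assuming the exported my_dataset.jsonl is copied over from the Pi, it can be read with plain Python and split by "_session_id". The two demo records and the helper names below are made up.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def group_by_session(examples):
    """Group examples by `_session_id` ('unknown' if the key is missing)."""
    groups = defaultdict(list)
    for eg in examples:
        groups[eg.get("_session_id", "unknown")].append(eg)
    return dict(groups)


# Demo with a made-up two-line export; replace with the path to the copied file.
demo_path = Path(tempfile.mkdtemp()) / "my_dataset.jsonl"
demo_path.write_text(
    '{"text": "a", "answer": "accept", "_session_id": "my_dataset-noor"}\n'
    '{"text": "a", "answer": "reject", "_session_id": "my_dataset-alex"}\n',
    encoding="utf8",
)
by_session = group_by_session(read_jsonl(demo_path))
for session, egs in by_session.items():
    print(session, len(egs))
```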
