NER Team

kushalrsharma · January 23, 2023, 6:45pm

Hello,
My team and I are currently using my company’s Prodigy license to annotate data for NER, which I plan to analyze later with spaCy. I have more than 5 annotators. As part of the annotation process, is there a way to resume annotation where you left off in your previous session?
Please advise.

ryanwesslen · January 24, 2023, 1:39pm

hi @kushalrsharma!

Thanks for your question.

By default, Prodigy should resume annotation from where you left off, so I'm not sure why you're seeing something different.

Since you have multiple annotators, have you changed feed_overlap?

The "feed_overlap" setting in your prodigy.json or recipe config lets you configure how examples should be sent out across multiple sessions. If true, each example in the dataset will be sent out once for each session, so you’ll end up with overlapping annotations (e.g. one per example per annotator). Setting "feed_overlap" to false will send out each example in the data once to whoever is available. As a result, your data will have each example labelled only once in total.

This setting will change the behavior of how annotations are sent out across multiple annotators. By default, feed_overlap is false, so where Prodigy starts after stopping the server will depend on what was the last annotation saved to the database (be sure that annotators click the "save" button when they're done so that the last batch is saved to the database).

kushalrsharma · January 24, 2023, 4:11pm

Is this weird?

export PRODIGY_ALLOWED_SESSIONS=Simran,Saru,Anupam,Heera,Kshitiz,Nirajan,Samikshya,Saurav,Prakash

!python -m prodigy spans.manual span_resume  blank:en /home/kushal/Documents/spacyprodigy/Prakash/jsonlfiles/resume365.jsonl --label PROFILE,PROFILE_NAME,PROFILE_EMAIL,PROFILE_ADDRESS,PROFILE_SOCIAL,PROFILE_SUMMARY,EXPERIENCE,EXPERIENCE_ORG,EXPERIENCE_ADDRESS,EXPERIENCE_STARTDATE,EXPERIENCE_ENDDATE,EXPERIENCE_SKILLS,EXPERIENCE_KNOWLEDGE,ACADEMICS,ACADEMIC_INSTITUTIONS,ACADEMIC_ADDRESS,ACADEMIC_STARTDATE,ACADEMIC_ENDDATE,ACADEMIC_GPA,ACADEMIC_COURSES

When I use http://localhost:8080/?session=Nirajan and http://localhost:8080/?session=Saru, the data shown across sessions is the same. Why?

prodigy.json 
{
	"feed_overlap": false
}

What can be done to show different data in different sessions? Is it due to this file, /home/kushal/Documents/spacyprodigy/Prakash/jsonlfiles/resume365.jsonl, since it consists of only data?

ryanwesslen · January 24, 2023, 5:17pm

hi @kushalrsharma,

When all sessions have the same data, we call that "overlapping" annotations.

However, it seems that you want "non-overlapping" annotations, i.e., each example in the data once to whoever is available. As a result, your data will have each example labelled only once in total.

This is exactly what setting "feed_overlap" to false will do (this is the value by default).

But it seems like you're saying that your "feed_overlap" is already set to false and it is still not producing that behavior, right?

Try to run:

PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": false}' python -m prodigy ...

This should override everything. I'm concerned you may have accidentally set "feed_overlap": true at some point, or that you're not pointing to the correct prodigy.json. You can technically have one prodigy.json for your project and a global one. (You may want to run python -m prodigy stats to verify the path of your global prodigy.json.)
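To see which file ends up deciding "feed_overlap", a rough check like the one below may help. It is only a sketch under stated assumptions, not Prodigy's own config loader: it assumes the global file is `prodigy.json` inside `PRODIGY_HOME` (falling back to `~/.prodigy`), that a `prodigy.json` in the working directory overrides it, and that `PRODIGY_CONFIG_OVERRIDES` wins over both. It ignores recipe-level config. The function name `resolve_feed_overlap` is made up for illustration.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Tuple


def load_json(path: Path) -> Dict[str, Any]:
    """Return the parsed JSON file, or an empty dict if the file is missing."""
    if not path.is_file():
        return {}
    with path.open(encoding="utf8") as f:
        return json.load(f)


def resolve_feed_overlap(
    global_dir: Path, project_dir: Path, overrides: str = ""
) -> Tuple[bool, str]:
    """Report the feed_overlap value and where it came from.

    Assumed precedence (lowest to highest): default, global prodigy.json,
    project prodigy.json, PRODIGY_CONFIG_OVERRIDES.
    """
    value, source = False, "default"  # the thread states the default is false
    candidates = (
        ("global", global_dir / "prodigy.json"),
        ("project", project_dir / "prodigy.json"),
    )
    for label, path in candidates:
        config = load_json(path)
        if "feed_overlap" in config:
            value, source = bool(config["feed_overlap"]), f"{label}: {path}"
    if overrides.strip():
        parsed = json.loads(overrides)
        if "feed_overlap" in parsed:
            value, source = bool(parsed["feed_overlap"]), "PRODIGY_CONFIG_OVERRIDES"
    return value, source


if __name__ == "__main__":
    home = Path(os.environ.get("PRODIGY_HOME", Path.home() / ".prodigy"))
    overrides = os.environ.get("PRODIGY_CONFIG_OVERRIDES", "")
    print(resolve_feed_overlap(home, Path.cwd(), overrides))
```

Run it from the same directory and shell where you start Prodigy, so that it sees the same working directory and environment variables.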

Also, you may want to consider resetting your overrides:

export PRODIGY_CONFIG_OVERRIDES="{}"

Last, when providing examples, please provide reproducible examples for us. This will help you get faster responses. Since we don't have your data, it is impossible for us to confirm what the problem is and that we're talking about the same issue. I've created the example below to help you see what the difference should be.

Using this data:
nyt_text_dedup.jsonl (18.5 KB)

`feed_overlap: false` ("non-overlapping")

PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": false}' python3 -m prodigy ner.manual ner_ex blank:en nyt_text_dedup.jsonl --label ORG

First, open browser for session1: "ryan"

Don't annotate any examples.

Then open a 2nd browser for session2: "kushu"

Notice how this starts "kushu" at record number 10 (since batch_size is 10).

`feed_overlap: true` (overlapping)

PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": true}' python3 -m prodigy ner.manual ner_ex blank:en nyt_text_dedup.jsonl --label ORG

First, open browser for session1: "ryan"

Don't annotate any examples.

Then open a 2nd browser for session2: "kushu"

Notice how this starts "kushu" at record number 0 (that is, "ryan" and "kushu" have the same data, hence their annotations "overlap").
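The two runs above can be mimicked with a toy model. This is only a sketch of the observable behavior, not Prodigy's actual feed implementation; the class `ToyFeed` and its attributes are invented for illustration. With `feed_overlap` false, all sessions draw from one shared queue; with `feed_overlap` true, each session keeps its own position in the same data.

```python
from collections import defaultdict
from typing import Dict, List


class ToyFeed:
    """Toy model of batch distribution across sessions (not Prodigy's code)."""

    def __init__(self, n_examples: int, batch_size: int, feed_overlap: bool) -> None:
        self.examples = list(range(n_examples))  # record numbers 0..n-1
        self.batch_size = batch_size
        self.feed_overlap = feed_overlap
        self.shared_cursor = 0  # one position for everyone (feed_overlap false)
        self.session_cursor: Dict[str, int] = defaultdict(int)  # one per session

    def next_batch(self, session: str) -> List[int]:
        """Hand the next batch of record numbers to the given session."""
        if self.feed_overlap:
            start = self.session_cursor[session]
            self.session_cursor[session] = start + self.batch_size
        else:
            start = self.shared_cursor
            self.shared_cursor = start + self.batch_size
        return self.examples[start:start + self.batch_size]


non_overlapping = ToyFeed(n_examples=30, batch_size=10, feed_overlap=False)
print(non_overlapping.next_batch("ryan")[0])   # 0
print(non_overlapping.next_batch("kushu")[0])  # 10

overlapping = ToyFeed(n_examples=30, batch_size=10, feed_overlap=True)
print(overlapping.next_batch("ryan")[0])   # 0
print(overlapping.next_batch("kushu")[0])  # 0
```

In the non-overlapping case "ryan" is handed records 0 to 9 just by opening the browser, which is why "kushu" starts at record 10 even though nothing has been annotated yet.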

Hope this helps to clarify and let me know if this clears up any confusion!

kushal_pythonist · January 25, 2023, 4:07am

Thank You @ryanwesslen it worked.

kushalrsharma · January 25, 2023, 7:12am

@ryanwesslen does this work when different users are connected to the same wifi? I urgently need this.

ryanwesslen · January 25, 2023, 12:24pm

hi @kushalrsharma!

I think the answer is yes. I'm a bit confused on what you mean by "connected to the same wifi". Typically, they need access to the server that hosts your Prodigy session, and the host setting needs to be changed (e.g., to 0.0.0.0).
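As a small illustration, once the server is reachable from other machines, each annotator opens the same address with their own session name. The sketch below builds those links from the comma-separated format used for PRODIGY_ALLOWED_SESSIONS earlier in the thread. The IP address `192.168.1.20` is a made-up placeholder for the machine running Prodigy, and `session_urls` is a hypothetical helper, not part of Prodigy.

```python
from typing import Dict
from urllib.parse import urlencode


def session_urls(host: str, port: int, allowed_sessions: str) -> Dict[str, str]:
    """Map each session name to the URL that annotator should open.

    allowed_sessions is a comma-separated string, the same format used for
    PRODIGY_ALLOWED_SESSIONS earlier in this thread.
    """
    names = [name.strip() for name in allowed_sessions.split(",") if name.strip()]
    return {
        name: f"http://{host}:{port}/?{urlencode({'session': name})}"
        for name in names
    }


# 192.168.1.20 is a placeholder; use the address of the machine hosting Prodigy.
urls = session_urls("192.168.1.20", 8080, "Simran,Saru,Nirajan")
for name, url in urls.items():
    print(name, url)
```

The links only work if the annotators' machines can actually reach that address and port, which is where the firewall and network points below come in.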

But please remember -- there are many challenges in setting up annotators to access from external computers like firewall, reverse proxy for setting up https, authentication, etc. Here are just a few examples:

My main takeaway is to please keep searching past tickets. While we'll do our best to help, scaling up to many annotators is a non-trivial problem. We have found that many users can create such processes on their own, but they need good skills in managing networks/security, and it takes time to develop these.

