NER Team

Hello,
My team and I are currently using my company's license for Prodigy to annotate data for NER, which I'm interested in analyzing later with spaCy. I have more than 5 annotators. As part of the annotation process, is there a way to resume annotation where you left off in your previous session and continue from there?
Please advise.

hi @kushalrsharma!

Thanks for your question.

By default, Prodigy should resume annotation where you left off, so I'm not sure why you think that isn't the case.

Since you have multiple annotators, have you changed feed_overlap?

The "feed_overlap" setting in your prodigy.json or recipe config lets you configure how examples are sent out across multiple sessions. If true, each example in the dataset will be sent out once for each session, so you'll end up with overlapping annotations (e.g. one per example per annotator). Setting "feed_overlap" to false will send out each example in the data once to whoever is available. As a result, your data will have each example labelled only once in total.
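To make the difference concrete, here is a toy model of the two settings. It is not Prodigy's implementation: the function name `distribute` is hypothetical, and round-robin assignment is only a stand-in for "whoever is available". It just counts how many annotations you end up with in each mode.

```python
from typing import Dict, List


def distribute(examples: List[str], sessions: List[str], feed_overlap: bool) -> Dict[str, List[str]]:
    """Toy model of how examples reach sessions (not Prodigy's real code).

    feed_overlap=True:  every session receives every example.
    feed_overlap=False: each example goes to exactly one session; round-robin
                        here stands in for "whoever is available".
    """
    assigned: Dict[str, List[str]] = {name: [] for name in sessions}
    for i, example in enumerate(examples):
        if feed_overlap:
            for name in sessions:
                assigned[name].append(example)
        else:
            assigned[sessions[i % len(sessions)]].append(example)
    return assigned


examples = [f"doc{i}" for i in range(6)]
sessions = ["Nirajan", "Saru"]

overlap = distribute(examples, sessions, feed_overlap=True)
no_overlap = distribute(examples, sessions, feed_overlap=False)

print(sum(len(v) for v in overlap.values()))     # 12: one annotation per example per annotator
print(sum(len(v) for v in no_overlap.values()))  # 6: each example labelled once in total
```

With two annotators and six examples, overlapping mode doubles the annotation count, while non-overlapping mode splits the same six examples between the sessions.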

This setting changes how annotations are sent out across multiple annotators. By default, feed_overlap is false, so where Prodigy starts after you restart the server depends on the last annotations saved to the database (make sure annotators click the "save" button when they're done so that the last batch is saved to the database).
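A simplified sketch of why saving matters, assuming resumption works by skipping examples whose hash is already stored in the dataset. The hashing function and names below are hypothetical stand-ins, not Prodigy's own hashing.

```python
import hashlib
from typing import Iterable, Iterator, Set


def example_hash(text: str) -> str:
    # Hypothetical stand-in for Prodigy's own example hashing.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def resume_stream(source: Iterable[str], saved_hashes: Set[str]) -> Iterator[str]:
    """Yield only examples that have no saved annotation yet."""
    for text in source:
        if example_hash(text) not in saved_hashes:
            yield text


source = ["resume A", "resume B", "resume C", "resume D"]
# A and B were saved to the database; C was answered in the browser
# but the batch was never saved, so it is sent out again.
saved = {example_hash("resume A"), example_hash("resume B")}
print(list(resume_stream(source, saved)))  # ['resume C', 'resume D']
```

Anything answered but not saved is simply not in the database, so it comes back the next time the server starts.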

Is this weird?

export PRODIGY_ALLOWED_SESSIONS=Simran,Saru,Anupam,Heera,Kshitiz,Nirajan,Samikshya,Saurav,Prakash
python -m prodigy spans.manual span_resume blank:en /home/kushal/Documents/spacyprodigy/Prakash/jsonlfiles/resume365.jsonl --label PROFILE,PROFILE_NAME,PROFILE_EMAIL,PROFILE_ADDRESS,PROFILE_SOCIAL,PROFILE_SUMMARY,EXPERIENCE,EXPERIENCE_ORG,EXPERIENCE_ADDRESS,EXPERIENCE_STARTDATE,EXPERIENCE_ENDDATE,EXPERIENCE_SKILLS,EXPERIENCE_KNOWLEDGE,ACADEMICS,ACADEMIC_INSTITUTIONS,ACADEMIC_ADDRESS,ACADEMIC_STARTDATE,ACADEMIC_ENDDATE,ACADEMIC_GPA,ACADEMIC_COURSES

When I open http://localhost:8080/?session=Nirajan and http://localhost:8080/?session=Saru, the data shown in both sessions is the same. Why?

prodigy.json 
{
	"feed_overlap": false
}

What can be done to show different data in different sessions? Is it because of /home/kushal/Documents/spacyprodigy/Prakash/jsonlfiles/resume365.jsonl, since it consists of only data?

hi @kushalrsharma,

When all sessions have the same data, we call that "overlapping" annotations.

However, it seems that you want "non-overlapping" annotations, i.e., each example in the data is sent out once to whoever is available. As a result, your data will have each example labelled only once in total.

This is exactly what setting "feed_overlap" to false will do (this is the value by default).

But it seems like you're saying that you think your "feed_overlap" is still set to false, but it is not producing that behavior, right?

Try running:

PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": false}' python -m prodigy ...

This should override everything. I'm concerned you may have accidentally set "feed_overlap": true at some point, or that you're not pointing to the correct prodigy.json. You can have both a project-level prodigy.json and a global one. (You may want to run python -m prodigy stats to verify the path to your global prodigy.json.)
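Here is a simplified model of how those layers could interact, assuming the global prodigy.json is applied first, a prodigy.json in the working directory overrides it, and PRODIGY_CONFIG_OVERRIDES wins over both. It does not model recipe config, and the function name `effective_config` is hypothetical.

```python
import json
from typing import Any, Dict, Optional


def effective_config(global_cfg: Dict[str, Any],
                     project_cfg: Dict[str, Any],
                     overrides_json: Optional[str]) -> Dict[str, Any]:
    """Simplified model of layered settings: later layers win."""
    merged: Dict[str, Any] = {}
    merged.update(global_cfg)    # global prodigy.json
    merged.update(project_cfg)   # prodigy.json in the working directory
    if overrides_json:           # value of PRODIGY_CONFIG_OVERRIDES, if set
        merged.update(json.loads(overrides_json))
    return merged


global_cfg = {"feed_overlap": True}    # e.g. set long ago and forgotten
project_cfg = {"feed_overlap": False}

print(effective_config(global_cfg, project_cfg, None)["feed_overlap"])                      # False
# A stale override exported in the shell silently wins over both files:
print(effective_config(global_cfg, project_cfg, '{"feed_overlap": true}')["feed_overlap"])  # True
# Resetting the overrides to "{}" leaves the files in charge again:
print(effective_config(global_cfg, project_cfg, "{}")["feed_overlap"])                      # False
```

This is why a leftover exported override, or a second prodigy.json you forgot about, can make the setting look like it's being ignored.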

Also, you may want to consider resetting your overrides:

export PRODIGY_CONFIG_OVERRIDES="{}"

Last, when asking questions, please provide reproducible examples. This will help you get faster responses. Since we don't have your data, we can't confirm what the problem is or that we're talking about the same issue. I've created the example below to help you see the difference between the two settings.

Using this data:
nyt_text_dedup.jsonl (18.5 KB)

feed_overlap: false ("non-overlapping")

PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": false}' python3 -m prodigy ner.manual ner_ex blank:en nyt_text_dedup.jsonl --label ORG

First, open a browser for session1: "ryan"

Don't annotate any examples.

Then open a 2nd browser for session2: "kushu"

Notice how this starts "kushu" at record number 10 (since batch_size is 10).

feed_overlap: true (overlapping)

PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": true}' python3 -m prodigy ner.manual ner_ex blank:en nyt_text_dedup.jsonl --label ORG

First, open a browser for session1: "ryan"

Don't annotate any examples.

Then open a 2nd browser for session2: "kushu"

Notice how this starts "kushu" at record number 0 (that is, "ryan" and "kushu" have the same data, hence their annotations "overlap").
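The two walkthroughs can be summarized with a toy model of where each session's first batch starts. It assumes a single shared cursor into the stream when feed_overlap is false, and an independent read from the beginning when it is true; the function name is hypothetical and the record count of 50 is just an illustrative number.

```python
from typing import Dict, List

BATCH_SIZE = 10  # the batch_size mentioned in the walkthrough


def first_batches(n_records: int, session_order: List[str], feed_overlap: bool) -> Dict[str, range]:
    """Toy model: which record indices each session's first batch covers."""
    batches: Dict[str, range] = {}
    next_index = 0  # shared cursor, only used when feed_overlap is False
    for name in session_order:
        if feed_overlap:
            start = 0  # every session reads the stream from the beginning
        else:
            start = next_index
            next_index += BATCH_SIZE
        batches[name] = range(start, min(start + BATCH_SIZE, n_records))
    return batches


print(first_batches(50, ["ryan", "kushu"], feed_overlap=False)["kushu"].start)  # 10
print(first_batches(50, ["ryan", "kushu"], feed_overlap=True)["kushu"].start)   # 0
```

Even though "ryan" hasn't annotated anything, his first batch has already been sent out, so in non-overlapping mode "kushu" starts with the next one.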

Hope this helps to clarify and let me know if this clears up any confusion!

Thank you @ryanwesslen, it worked.

@ryanwesslen does this work for different users connected to the same Wi-Fi? I really need this.

hi @kushalrsharma!

I think the answer is yes, though I'm a bit confused about what you mean by "connected to the same wifi". Typically, annotators need network access to the server that hosts your Prodigy session, and the server's host setting needs to be changed (e.g., to 0.0.0.0).
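On a shared network, a localhost URL only works on the machine running Prodigy itself; other annotators would use that machine's network address instead. Here's a small sketch that builds one ?session= URL per name in your PRODIGY_ALLOWED_SESSIONS value. The address 192.168.1.20 is purely hypothetical (substitute your server's actual IP), and it assumes the server is listening on 0.0.0.0 and port 8080 as above.

```python
from typing import List


def session_urls(allowed_sessions: str, server_ip: str, port: int = 8080) -> List[str]:
    """Build one ?session= URL per name in a comma-separated sessions string."""
    names = [name.strip() for name in allowed_sessions.split(",") if name.strip()]
    return [f"http://{server_ip}:{port}/?session={name}" for name in names]


allowed = "Simran,Saru,Anupam,Heera,Kshitiz,Nirajan,Samikshya,Saurav,Prakash"
# 192.168.1.20 is a hypothetical LAN address of the machine running Prodigy.
for url in session_urls(allowed, "192.168.1.20"):
    print(url)
```

Each annotator then opens only their own URL, so their answers are stored under their own session name.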

But please remember: there are many challenges in letting annotators access Prodigy from external computers, such as firewalls, setting up a reverse proxy for HTTPS, authentication, etc. Here are just a few examples:

My main takeaway: please remember to keep searching past tickets. While we'll do our best to help, scaling up to many annotators is a non-trivial problem. We have found that many users can create such processes on their own, but it takes good skills in managing networks and security, and it takes time to develop these processes.