feed_overlap true not working for multiple annotators

Hi,

We're running a test classification with the textcat.manual recipe and multiple annotators, but the feed_overlap = true setting isn't working as expected. We were hoping that all annotators would receive all entries; however, when one annotator finishes, the interface says "no tasks available". Each annotator has their own session. We've been using the same workflow for a while and it worked well, so I don't understand why this is an issue now.

thanks,
Veronica

Hi! Which Prodigy version are you using, are you on the latest v1.11?

Hi, I'm using version 1.10.8.

If you're able to upgrade, could you try the latest v1.11? It includes various updates and improvements to the stream logic and will probably solve the problems you're seeing with overlapping feeds.

Alternatively, if your goal is to have every example annotated by everyone, a shared feed offers little advantage over just running separate instances. So you could also run Prodigy multiple times on different ports, one instance per annotator. This keeps the sessions entirely separate, and you can even shut down individual instances if one annotator finishes before the others.
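For example, a minimal sketch of that multi-instance setup (dataset names, ports, and labels here are placeholders, and I'm assuming the PRODIGY_PORT environment override — you can equally set "port" in a per-instance prodigy.json):

```shell
# One Prodigy instance per annotator, each on its own port and
# writing its annotations to its own dataset:
PRODIGY_PORT=8081 prodigy textcat.manual my_dataset_annotator1 ./data.jsonl --label LABEL_A,LABEL_B &
PRODIGY_PORT=8082 prodigy textcat.manual my_dataset_annotator2 ./data.jsonl --label LABEL_A,LABEL_B &
```

Each annotator then just opens their own port, and you can stop an individual process once that annotator is done.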

Hi,

first of all, many thanks for such a cool annotation tool, I really like working with it :slight_smile:

Second, I think we are facing a similar issue as the one described above.
Today, we tried a parallel session with 4 annotators on a combined relation and entity annotation task. Every annotator was supposed to see each example, so we would end up with four annotated datasets to compare.

Here are the details:
Version: 1.11.4

Command:

PRODIGY_CONFIG=prodigy.json PRODIGY_ALLOWED_SESSIONS=name1,name2,name3,name4 PRODIGY_BASIC_AUTH_USER=username PRODIGY_BASIC_AUTH_PASS=password prodigy rel.manual ner_rels_pilot blank:en data/data.jsonl --label REL_1,REL_2 --span-label ENTITY_1,ENTITY_2

Prodigy config:

{
  "auto_exclude_current": true,
  "batch_size": 5,
  "feed_overlap": true,
  "host": HOST,
  "port": PORT,
  "show_flag": true,
  "show_stats": true,
  "total_examples_target": 14
}

We had 14 examples: first 7 English (EN) examples, then 7 pre-tokenized Japanese (JA) examples. Normally we would not mix the languages, of course, but we wanted to test some specifics of our guidelines.

The procedure was as follows:
I started the service on the server and created a session for each annotator. In the beginning, we all saw the same examples. We annotated the first two EN examples and then wanted to skip ahead to the JA examples, so instead of annotating the remaining EN examples, we rejected each of them until we reached the JA ones.
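(For reference, and assuming the standard named-session mechanism that goes with PRODIGY_ALLOWED_SESSIONS: each annotator opened the app with their session name in the query string, along these lines, with HOST and PORT as in the config above:)

```
http://HOST:PORT/?session=name1
http://HOST:PORT/?session=name2
```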
We encountered two problems:

  1. After rejecting some of the EN examples, Prodigy displayed "no tasks available" for all of us, although we clearly had not annotated all of the examples. For three of us, I could resolve this by restarting the server and refreshing the annotation interface once; we could then skip the EN examples (we saw some duplicates) and arrived at the JA ones. For our fourth annotator, this did not work for some reason; it only worked after restarting the server twice and refreshing the browser (Chrome) several times. This is a bit unfortunate, since the annotators cannot restart the service themselves. Usually we wouldn't want to skip examples like this, but still, this is not supposed to happen, is it?
     I could reproduce this first problem with a simple dummy test file, two annotators (me and myself :) ) and 14 test samples: again, the "no tasks available" message popped up in the middle of annotating.

Do you think we were maybe skipping the examples too fast?
I would have assumed that the streams would not interact with each other, since every annotator has to see every example and feed_overlap was set to true.

  2. Although we are using Prodigy 1.11.4, we still get duplicate examples. This happened not only in the setting described above with only 14 examples, but also in other projects I set up, e.g. for text classification (textcat.teach). Compared to the previous Prodigy version it occurs less frequently, but it still happens. In the textcat.teach scenario (two annotators, but only one actually annotating) we saw repetitions of 4-5 examples after every ~40 examples. After restarting the service, we have not had any duplicates so far (~150 examples without duplicates).

I saw that you recommended running one instance per annotator -- I guess that would be manageable for projects with only two annotators, but it seems a lot less elegant. Or should I maybe be patient and wait for Prodigy Teams?

Thanks a lot in advance!
Best,
Lisa

Thanks for the super detailed report! I'm investigating this.