First of all, many thanks for such a cool annotation tool -- I really like working with it!
Second, I think we are facing a similar issue to the one described above.
Today, we tried a parallel session with 4 annotators and combined relation and entity annotation. Every annotator was supposed to see each example, so that we would end up with four annotated datasets to compare.
Here are the details:
PRODIGY_CONFIG=prodigy.json PRODIGY_ALLOWED_SESSIONS=name1,name2,name3,name4 PRODIGY_BASIC_AUTH_USER=username PRODIGY_BASIC_AUTH_PASS=password prodigy rel.manual ner_rels_pilot blank:en data/data.jsonl --label REL_1,REL_2 --span-label ENTITY_1,ENTITY_2
We had 14 examples (first 7 English (EN) examples, then 7 pre-tokenized Japanese (JA) examples) -- normally, we would not mix languages, of course, but we wanted to test some specifics of our guidelines.
The procedure was as follows:
I started the service on the server and created a session for each annotator. In the beginning, we all saw the same examples. We annotated the first two EN examples and then wanted to skip ahead to the JA examples. To do so, we did not annotate the remaining EN examples, but rejected each one of them until we reached the JA examples.
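For context, each annotator opened their own named session in the browser via the session query parameter (host and port stand in for our actual server address):

```
http://<host>:<port>/?session=name1
http://<host>:<port>/?session=name2
```

and analogously for name3 and name4.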
We encountered two problems:
- After rejecting some of the EN examples, Prodigy displayed "no tasks available" (for all of us), although we clearly had not worked through all of the examples. For three of us, I could resolve this by restarting the server and refreshing the annotation interface once. We could then skip the EN examples (we saw some duplicates) and arrived at the JA ones. For our fourth annotator, this did not work for some reason; it only helped after restarting the server twice and refreshing the browser (Chrome) several times. This is a bit unfortunate, since the annotators cannot restart the service by themselves. Usually we wouldn't want to skip examples like this, but still, this is not supposed to happen, is it?
I could reproduce the first problem (two annotators (me & myself :), 14 test samples, suddenly no tasks available) with a simple dummy test file; here, again, the "no tasks available" message popped up in the middle of annotating.
Do you think we were maybe skipping through the examples too fast?
I would have assumed that the streams would not interact with each other, since every annotator has to see every example and feed_overlap was set to true.
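For completeness, the relevant part of our prodigy.json is just this setting (a minimal sketch; the file also contains a few unrelated keys I've left out here):

```json
{
  "feed_overlap": true
}
```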
- Although we are using Prodigy 1.11.4, we still get duplicate examples. This happened not only in the setting described above, where we only had 14 examples, but also in other projects I have set up, e.g. for text classification (textcat.teach). Compared to the previous Prodigy version it occurs less frequently, but it still happens. In the textcat.teach scenario (two annotators, but only one actually annotating) we saw repetitions of 4-5 examples after roughly every 40 examples. Since restarting the service, we have not had duplicates so far (~150 examples without duplicates).
I saw that you recommended using one session per annotator -- I guess that would be manageable for projects with only two annotators, but a lot less elegant (I think). Or should I just be patient and wait for Prodigy Teams?
Thanks a lot in advance!