Issue in multi-session mode: duplicated annotation tasks and different order?

Hi! We set up a Prodigy installation to allow three individuals to label our set of annotation tasks.

We have two requirements: 1) each annotation task has to be labeled by each coder (and only once by each coder), 2) the annotation tasks should be shown to each coder in the same order.

Our pre-tests showed that both conditions were met in our setup. Further, we have not set the relevant configuration parameter, which for the first requirement would be feed_overlap, so this should default to true and be fine. For the second requirement, I believe I found something in the documentation back then, too, but cannot find it currently (IIRC, there is an option to randomize the order in which tasks are shown, which is disabled by default and thus should be fine in our case).

However, in our main study, we find that both requirements are partially not met. Specifically, two coders have a fair amount of overlap, whereas the third (who started a day later than the other two) seems to be labeling completely different annotation tasks from the overall stream. So the order of the annotation tasks is not the same for everyone.

Did we miss something in our configuration or might there be a bug involved?

Cheers,
Felix

Update: Out of ca. 900 tasks, only 65 were coded by every annotator. The first two annotators have a higher overlap, but the core issue remains, i.e., the tasks are shown in a different order. I'm running version 1.8.4.

Update 2: One thing I noticed during our tests is that the progress bar shows infinity as the maximum number of tasks, whereas in earlier tests a finite number was shown that matched the number of annotation tasks in our input file.


Update 3: I just verified locally, with two sessions running in the same process (i.e., accessed via the same HTTP port etc.), that if the first session annotates, say, 10 tasks and saves them (Ctrl+S), and the second session then opens the browser, the 11th task is shown instead of the 1st, which is what I would expect, since the sessions should see the same order and overlap is set to true.

Thanks for the detailed report, I hope we can get to the bottom of this quickly.

Could you provide the Prodigy command you're running? I'm especially interested in whether you're using a recipe that performs example selection (i.e., active learning, via a recipe like ner.teach or textcat.teach). If you're using active learning, some of the examples will be skipped during annotation, which would explain some of what you're seeing here.
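To illustrate the skipping effect (a toy sketch, not Prodigy's actual sorter internals): an active-learning stream only sends out examples the model is uncertain about, so confidently scored examples never reach the annotator at all.

```python
def skip_confident(scored_stream, low=0.35, high=0.65):
    """Toy uncertainty filter: only yield examples whose model score
    falls in the uncertain band; the rest are silently skipped."""
    for score, eg in scored_stream:
        if low <= score <= high:
            yield eg


scored = [(0.95, {"text": "a"}), (0.50, {"text": "b"}),
          (0.05, {"text": "c"}), (0.60, {"text": "d"})]
kept = [eg["text"] for eg in skip_confident(scored)]
# only the uncertain examples "b" and "d" are sent out
```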

Thanks for the quick reply!

The command to start Prodigy is:

prodigy newstsa polnewstargetsentiment -F /prodigy/newstsarecipe.py /prodigy/ccnc_anno_equalbin.jsonl

We're using a custom recipe (AFAIK no active learning involved), the source code of which you can find attached:

import prodigy
from prodigy.components.loaders import JSONL


@prodigy.recipe('newstsa',
                dataset=prodigy.recipe_args['dataset'],
                file_path=("Path to texts", "positional", None, str))
def sentiment(dataset, file_path):
    """Annotate the sentiment of texts using different mood options."""
    stream = JSONL(file_path)  # load in the JSONL file
    stream = add_options(stream)  # add options to each task

    return {
        'dataset': dataset,  # save annotations in this dataset
        'view_id': 'choice',  # use the choice interface
        "config": {
            "choice_auto_accept": True,  # auto-accept example, once the users selects an option
            "instructions": "/prodigy/manual.html"
        },
        'on_exit': on_exit,
        'stream': stream,
    }


def add_options(stream):
    """Helper function to add options to every task in a stream."""
    options = [{'id': 'positive', 'text': '😊 positive'},
               {'id': 'neutral', 'text': '😶 neutral'},
               {'id': 'negative', 'text': '🙁 negative'},
               {'id': 'posneg', 'text': '😊+🙁 pos. and neg.'}]
    for task in stream:
        task['options'] = options
        yield task


def on_exit(controller):
    """Get all annotations in the dataset, filter out the accepted tasks,
    count them by the selected options and print the counts."""
    # taken from https://prodi.gy/docs/workflow-custom-recipes#example-choice
    examples = controller.db.get_dataset(controller.dataset)
    examples = [eg for eg in examples if eg['answer'] == 'accept']
    for option in ('positive', 'neutral', 'negative', 'posneg'):
        count = get_count_by_option(examples, option)
        print('Annotated {} {} examples'.format(count, option))


def get_count_by_option(examples, option):
    filtered = [eg for eg in examples if option in eg['accept']]
    return len(filtered)

Just one follow-up, only slightly related to this. Given the previously described command to start Prodigy and the recipe, is it normal that the log states that two datasets are created, i.e., polnewstargetsentiment3 (as expected) and 2019-10-16_23-01-37?

23:01:38 - DB: Creating dataset 'polnewstargetsentiment3'
Added dataset polnewstargetsentiment3 to database SQLite.
23:01:38 - DB: Loading dataset 'polnewstargetsentiment3' (0 examples)
23:01:38 - DB: Creating dataset '2019-10-16_23-01-37'

Okay, regarding the length thing: you can either provide a stream that provides a __len__, or if it's a generator, you can include a progress callback. That's why your recipe doesn't show the number of examples remaining: your stream is a generator, so it doesn't have a length.
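As a minimal stand-alone sketch of the difference (plain Python, not using Prodigy itself): a generator has no `__len__`, while materializing the stream into a list gives it one, which is what allows a finite progress total to be computed.

```python
def add_options(stream):
    # same shape as the recipe's helper: attach options to each task
    options = [{"id": "positive"}, {"id": "negative"}]
    for task in stream:
        task["options"] = options
        yield task


tasks = [{"text": str(i)} for i in range(5)]

gen = add_options(iter(tasks))                 # generator: len(gen) would raise TypeError
materialized = list(add_options(iter(tasks)))  # list: len() works
# len(materialized) == 5, so a finite progress total can be shown
```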

Regarding the question orders, are your annotators connecting with the session query parameters? If so, I would've thought they'd each get their own feed, and the feed_overlap setting defaults to true. So I'm not immediately sure what could be going wrong here.

The "feed_overlap" setting in your prodigy.json or recipe config lets you
configure how examples should be sent out across multiple sessions. By default
(true), each example in the dataset will be sent out once for each session,
so you'll end up with overlapping annotations (e.g. one per example per annotator).
Setting "feed_overlap" to false will send out each example in the data
once to whoever is available. As a result, your data will have each example
labelled only once in total.

Edit: The second dataset is the session, which seems to be a timed session rather than a named session. That does look like a clue that you're not using a query parameter to differentiate your annotators, which might be what's going wrong?

Okay, regarding the length thing: you can either provide a stream that provides a __len__, or if it's a generator, you can include a progress callback. That's why your recipe doesn't show the number of examples remaining: your stream is a generator, so it doesn't have a length.

Thanks, that's good to know and I should be able to resolve the missing progress with this.

Regarding the question orders, are your annotators connecting with the session query parameters? If so, I would've thought they'd each get their own feed, and the feed_overlap setting defaults to true. So I'm not immediately sure what could be going wrong here.

Yes, in the study we conducted, each annotator had their own session id, e.g., one annotator would use http://example.com/?session=anno1 while each of the other annotators had their own unique session id, such as anno2, etc.

Edit: The second dataset is the session, which seems to be a timed session rather than a named session. That does look like a clue that you're not using a query parameter to differentiate your annotators, which might be what's going wrong?

This is actually from our new study setup, in which each annotator (due to the issues I described in this support thread) has their own Prodigy process, running on a different port and with a different dataset. By setting it up this way, I hope to circumvent the duplication and ordering problems. But what do you mean by query parameter? Are you referring to the session id? How would I set the query parameter properly? And what's the effect of not setting it manually, given that each annotator in the new setup has their own process and dataset?

Hi @fhamborg,

This is a quirk of mixing named sessions with the old-style timestamps. Timestamp sessions are created even if you use named sessions, because the logic handling the two lives in different places. The timestamp sessions don't contain anything in this case, so I'd say it's normal. We could consider suppressing those sessions in the future.

I'm trying to reproduce your issue given:

recipe.py

import prodigy
from prodigy.components.loaders import JSONL


@prodigy.recipe(
    "sentiment_choices",
    dataset=prodigy.recipe_args["dataset"],
    file_path=("Path to texts", "positional", None, str),
)
def sentiment(dataset, file_path):
    """Annotate the sentiment of texts using different mood options."""
    stream = JSONL(file_path)  # load in the JSONL file
    stream = add_options(stream)  # add options to each task

    return {
        "dataset": dataset,  # save annotations in this dataset
        "view_id": "choice",  # use the choice interface
        "config": {
            "choice_auto_accept": True,  # auto-accept example, once the users selects an option
            # "instructions": "/prodigy/manual.html",
        },
        "on_exit": on_exit,
        "stream": stream,
    }


def add_options(stream):
    """Helper function to add options to every task in a stream."""
    options = [
        {"id": "positive", "text": "😊 positive"},
        {"id": "neutral", "text": "😶 neutral"},
        {"id": "negative", "text": "🙁 negative"},
        {"id": "posneg", "text": "😊+🙁 pos. and neg."},
    ]
    for task in stream:
        task["options"] = options
        yield task


def on_exit(controller):
    """Get all annotations in the dataset, filter out the accepted tasks,
    count them by the selected options and print the counts."""
    # taken from https://prodi.gy/docs/workflow-custom-recipes#example-choice
    examples = controller.db.get_dataset(controller.dataset)
    examples = [eg for eg in examples if eg["answer"] == "accept"]
    for option in ("positive", "neutral", "negative", "posneg"):
        count = get_count_by_option(examples, option)
        print("Annotated {} {} examples".format(count, option))


def get_count_by_option(examples, option):
    filtered = [eg for eg in examples if option in eg["accept"]]
    return len(filtered)

run.sh

prodigy sentiment_choices sentiment-dataset "./data.jsonl" -F ./recipe.py

data.jsonl

{"text": "1", "label": "TEST"}
{"text": "2", "label": "TEST"}
{"text": "3", "label": "TEST"}
{"text": "4", "label": "TEST"}
{"text": "5", "label": "TEST"}
{"text": "6", "label": "TEST"}
{"text": "7", "label": "TEST"}
{"text": "8", "label": "TEST"}
{"text": "9", "label": "TEST"}
{"text": "10", "label": "TEST"}
{"text": "11", "label": "TEST"}
{"text": "12", "label": "TEST"}
{"text": "13", "label": "TEST"}
{"text": "14", "label": "TEST"}
{"text": "15", "label": "TEST"}

(the dataset I used for testing was actually 1,000 entries, trimmed here for brevity)

When I start your recipe (with the instructions line commented out) I observe that each named user session gets their own copy of the data, in the same order, starting at 1 and increasing by 1 for each annotation.

Here's a gif of what it looks like for me: https://giphy.com/gifs/XHGuRWz3HCgSDsUdlD

Can you help me understand what you're doing differently?

Hi @justindujardin, thanks for your answers.
So, I guess my slightly off-topic question regarding whether that log message is okay is answered now :slight_smile: Thanks.

Regarding the order issue, I guess we were/are basically doing the same thing. When I tested this locally by myself, e.g., using different tabs and different session ids for, say, 10 or 20 tasks, I found the tasks shown in the correct order without duplicates in each session, independently of what I had already labeled in the other sessions. That's why I was surprised to see that this was not the case while conducting our main study with three different annotators. I'm not sure what the key difference between the two setups (local pre-test vs. main study on our server) is, but a few that come to mind:

  • strongly different times of annotation, e.g., two coders started their annotations within a few hours, whereas the other coder started a day later
  • longer annotation times, i.e., each coder spent a couple of hours coding, also incl. intermediate breaks; then hours or a day later they continued their annotation
  • one coder reported that they kept the same tab open the whole time (also during longer breaks), whereas another closed the tab when they finished their work for the day

Besides such differences, there should be no technical difference, since in both cases I was using the same docker image to run prodigy, i.e., only the computer running the container changed.

If you have any idea of what could be potential root causes of this issue, I'll gladly try to provide more information.

Just FYI, and maybe for others reading this thread because they have similar issues: we have now changed the setup slightly and run the main study with three Prodigy processes, each with its own PRODIGY_HOME, PRODIGY_PORT and its own dataset. Moreover, we have applied the stream function suggested by Ines (Struggling to create a multiple choice image classification). These two changes seem to work better, but I haven't been able to check in detail yet, since not all coders have resumed their annotations.


That's a great clue. I think what is happening is that you're experiencing a long-standing quirk of Prodigy, where closing/reopening the tab causes examples to get missed by the user. Luckily, when this happens the examples aren't gone forever, just until you restart the annotation task (at which point they end up in the queue again).

Prodigy streams don't have any built-in mechanism for returning unanswered questions, which means that if the client holds items that are never returned with answers, the server doesn't know, and the next time you ask it will hand out questions starting after the last one it previously sent. You can see the effect here: https://giphy.com/gifs/SvcmLM0NPixm5vTapk
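A minimal simulation of that effect (hypothetical, not Prodigy's code): the server hands out batches from a shared generator, so a batch fetched by a tab that is closed without answering is simply skipped, because the generator has already advanced past it.

```python
from itertools import islice


def make_server(examples, batch_size=2):
    it = iter(examples)

    def get_questions():
        # each request advances the shared stream, whether or not
        # the previous batch ever came back with answers
        return list(islice(it, batch_size))

    return get_questions


get_questions = make_server([1, 2, 3, 4, 5, 6])
first = get_questions()   # tab opens, receives [1, 2]
# ...the tab is closed without answering; nothing tells the server...
second = get_questions()  # reopened tab receives [3, 4], not [1, 2] again
```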

Do you think this could account for the issues you have seen? I can see how this is not great for a use-case with restrictions like yours. I'm looking into an enhancement that will work with your requirements and will follow up here when I'm done with that.

If what I've described is your problem, in the short term you can ask your annotators to keep the tab open until they are done with the task.

Hi Justin, what the GIF you posted shows is probably exactly the behavior we experienced. But I didn't fully get why, in your GIF example, coolname-2 suddenly also gets task #10. Shouldn't coolname-2 see task #0 first, since it is running in its own session, which should (?) be totally independent of coolname?

I wondered about this too. It showed 10 because I had already navigated to that particular session once before recording the video.

Here's a fresh run after dropping the test dataset: http://www.giphy.com/gifs/MFayYRIUwrPk9mEfea

@fhamborg I've come up with a solution to enforce your constraints. I tried to make it work with the current prodigy version but it required client modifications, so it should be available with the next release.

The way it works is that you specify a custom repeating feed class that causes the stream to start over at the last answered question each time questions are requested. Then you set the client option that tells the server about any answered questions the client holds that are undelivered. The effect is that you get the same questions in the same order every time until you answer them, even if you close the tab or hit refresh: http://www.giphy.com/gifs/iGeg41sdvj21e1a4f1

The updates to your recipe would then look something like this:

import prodigy
from prodigy.components.db import connect
from prodigy.components.feeds import RepeatingFeed
from prodigy.components.loaders import JSONL


@prodigy.recipe(
    "sentiment_choices",
    dataset=prodigy.recipe_args["dataset"],
    file_path=("Path to texts", "positional", None, str),
)
def sentiment(dataset, file_path):
    """Annotate the sentiment of texts using different mood options."""
    stream = JSONL(file_path)  # load in the JSONL file
    stream = add_options(stream)  # add options to each task
    DB = connect()  # use the connect helper imported above
    return {
        "dataset": dataset,  # save annotations in this dataset
        "view_id": "choice",  # use the choice interface
        "config": {
            "choice_auto_accept": True,  # auto-accept example, once the users selects an option,
            "hint_pending_answers": True,  # don't send questions we've already answered but haven't returned
            "feed_class": RepeatingFeed,  # repeat previously sent questions
            "feed_filters": [],
        },
        "on_exit": on_exit,
        "stream": stream,
    }


def add_options(stream):
    """Helper function to add options to every task in a stream."""
    options = [
        {"id": "positive", "text": "😊 positive"},
        {"id": "neutral", "text": "😶 neutral"},
        {"id": "negative", "text": "🙁 negative"},
        {"id": "posneg", "text": "😊+🙁 pos. and neg."},
    ]
    for task in stream:
        task["options"] = options
        yield task


def on_exit(controller):
    """Get all annotations in the dataset, filter out the accepted tasks,
    count them by the selected options and print the counts."""
    # taken from https://prodi.gy/docs/workflow-custom-recipes#example-choice
    examples = controller.db.get_dataset(controller.dataset)
    examples = [eg for eg in examples if eg["answer"] == "accept"]
    for option in ("positive", "neutral", "negative", "posneg"):
        count = get_count_by_option(examples, option)
        print("Annotated {} {} examples".format(count, option))


def get_count_by_option(examples, option):
    filtered = [eg for eg in examples if option in eg["answer"]]
    return len(filtered)

Hi @justindujardin
That's great news! So, basically I'd need to add most importantly two lines to my recipe:

"hint_pending_answers": True,  # don't send questions we've already answered but haven't returned
"feed_class": RepeatingFeed,  # repeat previously sent questions

Right? I'm not sure whether I saw that behaviour in the GIF, though: in the GIF, the user does four annotations, saves, and then opens the same session in another tab (which starts at the next annotation task). What you describe sounds like you could close a tab and, when reopening it (= the session), it would always first show the task that comes directly after the last annotated task of the same session, or did I misunderstand?

Yeah, that's right

Yep, exactly

Sorry about that, the app that I use for GIFs limits me to 15 seconds, so it's difficult to be slow and deliberate in my actions. You're right that you can close/reopen/refresh the tab and it will show you the same question until you answer it. :sweat_smile:

Alright, that's perfect - I'm looking forward to the new update! :slight_smile:

In the meantime, using the looped setup described in the last paragraph of Issue in multi-session mode: duplicated annotation tasks and different order?, two annotators repeatedly receive an internal server error. Looking at the logs, I find:

Exception when serving /give_answers
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/waitress/channel.py", line 336, in service
    task.service()
  File "/usr/local/lib/python3.6/site-packages/waitress/task.py", line 175, in service
    self.execute()
  File "/usr/local/lib/python3.6/site-packages/waitress/task.py", line 452, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "/usr/local/lib/python3.6/site-packages/hug/api.py", line 451, in api_auto_instantiate
    return module.__hug_wsgi__(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/falcon/api.py", line 244, in __call__
    responder(req, resp, **params)
  File "/usr/local/lib/python3.6/site-packages/hug/interface.py", line 789, in __call__
    raise exception
  File "/usr/local/lib/python3.6/site-packages/hug/interface.py", line 762, in __call__
    self.render_content(self.call_function(input_parameters), context, request, response, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/hug/interface.py", line 698, in call_function
    return self.interface(**parameters)
  File "/usr/local/lib/python3.6/site-packages/hug/interface.py", line 100, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/prodigy/_api/hug_app.py", line 282, in give_answers
    controller.receive_answers(answers, session_id=session_id)
  File "cython_src/prodigy/core.pyx", line 148, in prodigy.core.Controller.receive_answers
  File "/usr/local/lib/python3.6/site-packages/prodigy/components/db.py", line 378, in add_examples
    input_hash=eg[INPUT_HASH_ATTR],
KeyError: '_input_hash'

What could be the issue here? Restarting the server helps, but unfortunately only temporarily. The full recipe is:

import prodigy
from prodigy.components.db import connect
from prodigy.components.loaders import JSONL


@prodigy.recipe('newstsa',
                dataset=prodigy.recipe_args['dataset'],
                file_path=("Path to texts", "positional", None, str))
def sentiment(dataset, file_path):
    """Annotate the sentiment of texts using different mood options."""
    stream = get_stream_loop(file_path, dataset)

    return {
        'dataset': dataset,  # save annotations in this dataset
        'view_id': 'choice',  # use the choice interface
        "config": {
            "choice_auto_accept": True,  # auto-accept example, once the users selects an option
            "instructions": "/prodigy/manual.html"
        },
        'on_exit': on_exit,
        'stream': stream,
    }


def get_stream_loop(file_path, dataset):
    # prevent the case where no tasks are shown even though unlabeled tasks remain
    # https://support.prodi.gy/t/struggling-to-create-a-multiple-choice-image-classification/1345/2
    db = connect()
    while True:
        stream = get_stream(file_path)
        hashes_in_dataset = db.get_task_hashes(dataset)
        yielded = False
        for eg in stream:
            # Only send out the task if its hash isn't in the dataset yet,
            # so we don't create duplicates
            if eg["_task_hash"] not in hashes_in_dataset:
                yield eg
                yielded = True
        if not yielded:
            break


def get_stream(file_path):
    stream = JSONL(file_path)  # load in the JSONL file
    stream = add_options(stream)  # add options to each task

    for eg in stream:
        eg = prodigy.set_hashes(eg)
        yield eg


def add_options(stream):
    """Helper function to add options to every task in a stream."""
    options = [{'id': 'positive', 'text': '😊 positive'},
               {'id': 'neutral', 'text': '😶 neutral'},
               {'id': 'negative', 'text': '🙁 negative'},
               {'id': 'posneg', 'text': '😊+🙁 pos. and neg.'}]
    for task in stream:
        task['options'] = options
        yield task


def on_exit(controller):
    """Get all annotations in the dataset, filter out the accepted tasks,
    count them by the selected options and print the counts."""
    # taken from https://prodi.gy/docs/workflow-custom-recipes#example-choice
    examples = controller.db.get_dataset(controller.dataset)
    examples = [eg for eg in examples if eg['answer'] == 'accept']
    for option in ('positive', 'neutral', 'negative', 'posneg'):
        count = get_count_by_option(examples, option)
        print('Annotated {} {} examples'.format(count, option))


def get_count_by_option(examples, option):
    filtered = [eg for eg in examples if option in eg['accept']]
    return len(filtered)
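The looping/dedup idea in get_stream_loop can be sketched in isolation like this (a toy: hashlib stands in for prodigy.set_hashes, and a plain set stands in for db.get_task_hashes, which the real recipe re-queries on every pass):

```python
import hashlib
import json


def set_hashes(eg):
    # toy stand-in for prodigy.set_hashes: derive a stable task hash
    digest = hashlib.md5(json.dumps(eg, sort_keys=True).encode()).hexdigest()
    return {**eg, "_task_hash": digest}


def stream_loop(examples, annotated_hashes):
    # keep cycling over the source until a full pass yields nothing new
    while True:
        yielded = False
        for eg in (set_hashes(e) for e in examples):
            if eg["_task_hash"] not in annotated_hashes:
                yield eg
                yielded = True
        if not yielded:
            break


examples = [{"text": "1"}, {"text": "2"}, {"text": "3"}]
annotated, seen = set(), []
for eg in stream_loop(examples, annotated):
    seen.append(eg["text"])          # "annotate" the task...
    annotated.add(eg["_task_hash"])  # ...and record its hash
# each task comes out exactly once, then the loop terminates
```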

What you describe sounds like this issue that was fixed in 1.8.5, have you updated since it was released?

Ah, thanks for the hint; we were still on 1.8.4. I updated and we haven't seen the issue since.
