Can’t set feed_overlap override in custom recipe

I have a custom recipe (below) that uses blocks to combine two input elements: a three-point choice and a text comment field. Our global prodigy.json file has feed_overlap set to false. We want to build a calibration recipe in which every session labels every task, so the recipe sets feed_overlap to True in its config. When we launch the recipe, it does not honor that setting. Am I missing something here?


import random
from typing import List, Optional
import prodigy
from prodigy.components.loaders import JSONL
from prodigy.util import split_string


# Helper that attaches the three rating options to every task in the stream.
def preprocess_stream(stream):

    blocks = [
        {"view_id": "choice", "text": None},
        {"view_id": "text_input", "field_rows": 2, "field_label": "Optional Notes"},
    ]

    options = [
        {"id": "1", "text": "1 - not interesting"},
        {"id": "2", "text": "2 - somewhat interesting"},
        {"id": "3", "text": "3 - highly interesting"},
    ]

    for task in stream:
        task["options"] = options
        yield task


# Recipe decorator with argument annotations: (description, argument type,
# shortcut, type / converter function called on value before it's passed to
# the function). Descriptions are also shown when typing --help.
@prodigy.recipe(
    "user_post_interest.likert",
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    exclude=("Names of datasets to exclude", "option", "e", split_string),
)
def user_post_interest_likert(
    dataset: str,
    source: str,
    exclude: Optional[List[str]] = None,
):
    """
    Rate how interesting each text is on a three-point scale, with an
    optional free-text notes field. Both inputs are rendered together
    using the blocks interface.
    """

    # Load the stream from a JSONL file and return a generator that yields a
    # dictionary for each example in the data.
    stream = JSONL(source)

    stream = preprocess_stream(stream)

    return {
        "view_id": "blocks",
        "dataset": dataset,  # Name of dataset to save annotations
        "stream": stream,  # Incoming stream of examples
        "config": {
            "blocks": [
                {"view_id": "choice"},
                {
                    "view_id": "text_input",
                    "field_rows": 2,
                    "field_label": "Optional notes can be added",
                },
            ],
            "choice_style": "single",
            "feed_overlap": True,
            "exclude_by": "task",

        },
    }

I just tried your recipe locally, using this examples.jsonl file.

{"text": "example 1"}
{"text": "example 2"}
{"text": "example 3"}
{"text": "example 4"}
{"text": "example 5"}

This was my call in the terminal.

python -m prodigy user_post_interest.likert issue-6402 examples.jsonl -F recipe.py

When I ran this and opened the localhost link, I got this warning.

⚠ The running recipe is configured for multiple annotators using named
sessions with feed_overlap=True, but a client is requesting questions using the
default session. For this recipe, open the app with ?session=name added to the
URL or set feed_overlap to False in your configuration.

This suggests to me that there's nothing wrong with your recipe, but that there might be a prodigy.json file in your folder that's overriding this. Could you check that?
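One way to check this is to print what each candidate prodigy.json sets for feed_overlap. The sketch below is only an illustration: it assumes the global file lives in ~/.prodigy (or in the directory named by the PRODIGY_HOME environment variable) and the local one in the directory Prodigy is launched from. The helper names read_feed_overlap and candidate_config_paths are made up for this example.

```python
import json
import os
from pathlib import Path


def read_feed_overlap(paths):
    """Return {path: feed_overlap value} for every config file that exists.

    A value of None means the file exists but does not set feed_overlap.
    """
    found = {}
    for path in paths:
        path = Path(path)
        if path.is_file():
            with path.open(encoding="utf8") as f:
                found[str(path)] = json.load(f).get("feed_overlap")
    return found


def candidate_config_paths():
    # Assumed locations: the global config directory (PRODIGY_HOME or
    # ~/.prodigy) and the current working directory.
    home = Path(os.environ.get("PRODIGY_HOME", Path.home() / ".prodigy"))
    return [home / "prodigy.json", Path.cwd() / "prodigy.json"]


def report():
    # Print one line per config file that was found.
    for path, value in read_feed_overlap(candidate_config_paths()).items():
        print(f"{path}: feed_overlap={value!r}")
```

Calling report() from the directory where the recipe is launched lists every file that was found, which makes an unexpected local override easy to spot.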

If that doesn't help, please let me know because there might be a bug. That said, another "fix" would be to manually override the setting from the command line. You should be able to run this via this one-liner:

PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": true}' python -m prodigy user_post_interest.likert issue-6402 examples.jsonl -F recipe.py
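If typing the JSON by hand is error-prone (note the lowercase true), the same override can be built from Python. This is only a sketch: it assumes Prodigy reads PRODIGY_CONFIG_OVERRIDES as a JSON string, as in the one-liner above, and the helper names build_overrides_env and launch_with_overlap are hypothetical.

```python
import json
import os
import subprocess
import sys


def build_overrides_env(overrides, base_env=None):
    """Return a copy of the environment with PRODIGY_CONFIG_OVERRIDES set."""
    env = dict(os.environ if base_env is None else base_env)
    # json.dumps turns Python's True into JSON's lowercase true.
    env["PRODIGY_CONFIG_OVERRIDES"] = json.dumps(overrides)
    return env


def launch_with_overlap():
    # Same command as the one-liner above, started from Python instead
    # of the shell. Not called automatically; run it explicitly.
    cmd = [
        sys.executable, "-m", "prodigy", "user_post_interest.likert",
        "issue-6402", "examples.jsonl", "-F", "recipe.py",
    ]
    subprocess.run(cmd, env=build_overrides_env({"feed_overlap": True}), check=True)
```

Calling launch_with_overlap() should be equivalent to the shell one-liner, with the JSON quoting handled by json.dumps.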

The app loads, and I am connecting to our instance by appending ?session=nick to the end of our host address.

Prodigy loads a task. If I label it, save the annotation, and then reload the instance with a new session name such as newUser, the first task shown is the next task in the stream, NOT the original first task again. This is not the expected behavior.
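To make the expectation concrete, here is a toy model of the two settings. It is not Prodigy's actual feed logic, just an illustration of "every session sees every task" versus "each task is handed out once"; the function name assign_tasks and the round-robin hand-out are assumptions made for this example.

```python
from itertools import cycle


def assign_tasks(tasks, sessions, feed_overlap):
    """Toy model of which session sees which task (not Prodigy's real feed)."""
    if feed_overlap:
        # Every session gets its own copy of the full stream.
        return {session: list(tasks) for session in sessions}
    # Each task is handed out exactly once; round-robin stands in for
    # "whichever session asks next".
    assigned = {session: [] for session in sessions}
    for task, session in zip(tasks, cycle(sessions)):
        assigned[session].append(task)
    return assigned


tasks = ["example 1", "example 2", "example 3", "example 4"]
print(assign_tasks(tasks, ["nick", "newUser"], feed_overlap=True))
print(assign_tasks(tasks, ["nick", "newUser"], feed_overlap=False))
```

With overlap enabled, newUser should start at "example 1" just like nick did. What I am seeing matches the second call instead.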

It's worth noting that if I change the global setting in the prodigy.json file and relaunch the app, the behavior is as expected. In our environment we want the default to be feed_overlap=False for almost all projects, and to enable overlap only for specific tasks.

Unfortunately, the way our Prodigy instances are configured, we cannot use the command-line override. Our implementation uses GitLab CI/CD with variables, so the python -m prodigy portion is hardcoded into the YAML config that constructs the commands we send through the pipeline.

What else could I do to help triage this, or to confirm it's a bug and not a configuration issue on our end?

That's strange, since that's not the behavior that I'm seeing. I've also set feed_overlap to false in my global prodigy.json file and I'm able to run through these steps.

Here's me first checking in with /?session=alice.

After annotating I move on to /?session=bob.

On both occasions I hit save, and I've annotated "example 1" twice. Just to double-check, can you confirm that you hit save? Then, when I close Prodigy, I'm able to export the annotations to confirm this:

python -m prodigy db-out issue-6402

And this is what I see.

{"text":"example 1","options":[{"id":"1","text":"1 - not interesting"},{"id":"2","text":"2 - somewhat interesting"},{"id":"3","text":"3 - highly interesting"}],"_input_hash":-872502053,"_task_hash":839879573,"_view_id":"blocks","config":{"choice_style":"single"},"accept":["2"],"user_input":"yes","answer":"accept","_timestamp":1677924419,"_annotator_id":"issue-6402-alice","_session_id":"issue-6402-alice"}
{"text":"example 1","options":[{"id":"1","text":"1 - not interesting"},{"id":"2","text":"2 - somewhat interesting"},{"id":"3","text":"3 - highly interesting"}],"_input_hash":-872502053,"_task_hash":839879573,"_view_id":"blocks","config":{"choice_style":"single"},"accept":["1"],"user_input":"no","answer":"accept","_timestamp":1677924432,"_annotator_id":"issue-6402-bob","_session_id":"issue-6402-bob"}
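The same check can be scripted against the db-out export. This is a minimal sketch, assuming the export has been saved as JSONL lines and that each record carries the _input_hash and _session_id fields shown above. The function names sessions_per_input and missing_overlap are chosen for illustration.

```python
import json
from collections import defaultdict


def sessions_per_input(lines):
    """Map each _input_hash to the set of sessions that annotated it."""
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        seen[record["_input_hash"]].add(record.get("_session_id"))
    return dict(seen)


def missing_overlap(lines, expected_sessions):
    """Return {input_hash: sessions that have not annotated it yet}."""
    expected = set(expected_sessions)
    return {
        input_hash: expected - sessions
        for input_hash, sessions in sessions_per_input(lines).items()
        if expected - sessions
    }
```

For the two records above, missing_overlap(lines, ["issue-6402-alice", "issue-6402-bob"]) would come back empty, since both sessions annotated the same input hash.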

It seems that I'm currently not able to reproduce your issue, but I'd gladly hear it if I overlooked something.

Unfortunately, the way our Prodigy instances are configured, we cannot use the command-line override. Our implementation uses GitLab CI/CD with variables, so the python -m prodigy portion is hardcoded into the YAML config that constructs the commands we send through the pipeline.

Would a local prodigy.json file work in that scenario then? You wouldn't need to override manually in that case since the local file should override the global one. Alternatively, you could also export the variable beforehand?

export PRODIGY_CONFIG_OVERRIDES='{"feed_overlap": true}'

Should that not work, GitLab allows you to configure variables from the .gitlab-ci.yml file. You can read more about that in their docs. Would that suffice?

What else could I do to help triage this, or to confirm it's a bug and not a configuration issue on our end?

Just to be on the safe side, could you share some of your environment info? You can get a summary by running python -m prodigy stats.