instant_submit and no tasks available

Hi.

I'm writing a custom recipe for text classification.
We need the annotation results to be saved instantly, which I think can be done by setting instant_submit to true.
However, when I set the option to true, Prodigy displays 'No tasks available' after presenting as many annotation tasks as batch_size (e.g., 10), even though more annotation tasks remain.
If I unset instant_submit, everything works fine. So I think setting instant_submit affects when the 'No tasks available' message is displayed.
Could you help me resolve this issue?

Hi! Could you share a bit more about your workflow? Which recipe are you using and if it's custom, how is it configured?

Our workflow is (1) loading images (e.g., png files) from an external database as a stream, (2) adding about 10 options to each task, and (3) displaying the images as tasks by using an html template.

The recipe I made is as follows:

@prodigy.recipe(
        'newdomain-annotation',
        dataset = ('Dataset to save answers to', 'positional', None, str),
        view_id = ('Annotation interface', 'positional', None, str),
        file_path = ('Path to texts', 'positional', None, str),
)
def annotation(dataset, view_id='blocks', file_path='tasks.txt'):
    stream = load_my_custom_stream(file_path)
    stream = add_options(stream)

    return {
        'dataset': dataset,
        'view_id': view_id,
        'stream': stream,
        'update': update,
        'on_exit': on_exit,
        'config': {
            'custom_theme': {'cardMaxWidth': 1200},
            'instructions': 'instructions.html',
            'batch_size': 15,
            'host': '0.0.0.0',
            'port': 8001,
            'show_status': True,
            'show_flag': True,
            'feed_overlab': False,
            'instant_submit': True,
            'blocks': [
                {'view_id': 'html', 'html_template': '''<a href="{{img}}" target="_blank"> <img width="800" height="2000" border="0" align="center" src="{{img}}"/> </a>'''}, {'view_id': 'choice'}
            ]
        }
    }

def main():
    os.environ['PRODIGY_ALLOWED_SESSIONS'] = 'user1,user2'
    os.environ['PRODIGY_LOGGING'] = 'basic'
    prodigy.serve('newdomain-annotation newdomain_annotation_dataset')

I don't think there is anything special in my recipe, and it works well as long as the instant_submit option is not set.

Thanks for the details!

Do the images come in as URLs or do you encode them as base64? The "no tasks available" message is shown when Prodigy receives an empty batch. This typically happens if there's nothing left to annotate, but it could also happen if the stream takes too long to serve the next batch. It's confusing, though, why instant_submit makes a difference here :thinking: Because even in this case, Prodigy will still request a new batch of batch_size tasks in the background if the queue is running low.

Do you see a new batch of tasks when you refresh the browser?
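
To check whether slow loading is a factor, one option is to wrap the stream in a generator that reports how long each task takes to arrive. This is only a rough sketch: timed_stream and its parameters are hypothetical names, not part of Prodigy, and the one-second threshold is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Iterable, Iterator


def timed_stream(
    stream: Iterable[Dict],
    slow_after: float = 1.0,
    report: Callable[[str], None] = print,
    clock: Callable[[], float] = time.perf_counter,
) -> Iterator[Dict]:
    """Yield tasks unchanged and report every task that took longer than
    `slow_after` seconds to arrive from the underlying stream.

    Hypothetical diagnostic helper, not a Prodigy API.
    """
    iterator = iter(stream)
    index = 0
    while True:
        start = clock()
        try:
            task = next(iterator)
        except StopIteration:
            return
        elapsed = clock() - start
        if elapsed > slow_after:
            report(f"task {index} took {elapsed:.2f}s to load")
        yield task
        index += 1
```

In the recipe above, this could be applied as stream = timed_stream(add_options(load_my_custom_stream(file_path))). If tasks regularly take several seconds to arrive, the loading time is worth looking at first.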

Also, this is unrelated but I just noticed there's a small typo in the config: feed_overlab.
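
Typos like this are easy to miss because a misspelled key is just another dictionary entry. A small check along the following lines could flag them before the server starts. This is a sketch under assumptions: EXPECTED_KEYS is a hand-maintained set containing only the settings mentioned in this thread (not a complete list of Prodigy settings), and find_suspect_keys is a hypothetical helper.

```python
from difflib import get_close_matches
from typing import Dict, Iterable, List, Tuple

# Assumption: the settings you intend to use, maintained by hand.
# These are only the names that appear in this thread.
EXPECTED_KEYS = {
    "batch_size", "blocks", "custom_theme", "feed_overlap", "host",
    "instant_submit", "instructions", "port", "show_flag", "show_status",
}


def find_suspect_keys(
    config: Dict, expected: Iterable[str] = EXPECTED_KEYS
) -> List[Tuple[str, str]]:
    """Return (key, suggestion) pairs for config keys that are not in
    `expected` but closely resemble one of the expected names."""
    candidates = sorted(expected)
    suspects = []
    for key in config:
        if key in candidates:
            continue
        matches = get_close_matches(key, candidates, n=1, cutoff=0.8)
        if matches:
            suspects.append((key, matches[0]))
    return suspects


config = {"batch_size": 15, "feed_overlab": False, "instant_submit": True}
for key, suggestion in find_suspect_keys(config):
    print(f"Unknown setting '{key}', did you mean '{suggestion}'?")
```

Keys that are unknown but not close to any expected name are left alone, so custom settings do not trigger false warnings.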

Thank you for your kind response. I agree that, as you said, the batch loading time may be causing the problem. I will check that first. And thank you for pointing out the typo!

Hi, @ines!
We have the same issue with instant_submit. We are loading text samples from a .jsonl file located in an S3 bucket as a stream via dvc.api.open(). When instant_submit is on, a new batch of tasks loads only after a refresh. Without instant_submit, tasks load as expected until all of the samples are labeled. However, new batches load slowly, which might be causing the problem.

def get_dvc_stream(repo, path, rev, github_token):
    repo_url = f"https://{github_token}@github.com/{repo}"
    with dvc.api.open(path=path, repo=repo_url, rev=rev, mode='rb') as fsource:
        for line in fsource.readlines():
            yield json.loads(line)

@recipe('clausescat.manual',
        dataset=("Dataset to save answers to", "positional", None, str),
        label=("Comma-separated label(s) to annotate or text file with one label per line", "option", "l", get_labels)
        )
def clauses_categorisation(dataset, label):

    options = [{"id": id, "text": text} for id, text in enumerate(label)]

    blocks = [
        {"view_id": "text"},
        {"view_id": "choice", "text": None},
        {"view_id": "text_input", "field_rows": 3, "field_label": "Comment"}
    ]

    def get_stream():
        stream = get_dvc_stream(settings.dvc_repo,
                                settings.dvc_data_path,
                                settings.dvc_rev,
                                settings.github_token)
        for eg in stream:
            yield {'text': eg['text'], "options": options, 'meta': eg['meta']}

    stream = get_stream()

    return {
        "dataset": dataset,  # the dataset to save annotations to
        "view_id": "blocks",  # set the view_id to "blocks"
        "stream": stream,  # the stream of incoming examples
        "config": {
            "blocks": blocks,  # add the blocks to the config
            "card_css": {"text-align": "left"}
        }
    }

Here is the config for the annotation server:

from settings import settings
import prodigy
import recipes

prodigy_config = {"db": "postgresql",
                  "db_settings":
                      {"postgresql":
                           {"dbname": "prodigy",
                            "user": settings.database_username,
                            "password": settings.database_password,
                            "host": settings.database_host,
                            "port": settings.database_port}
                       },
                  "host": "0.0.0.0",
                  "show_flag": True,
                  "custom_theme":
                      {"cardMaxWidth": "95%",
                       "smallText": 16},
                  "feed_overlap": True,
                  "hide_meta": False,
                  "instant_submit": settings.instant_submit}

print(settings.prodigy_recipe_cmd)
prodigy.serve(settings.prodigy_recipe_cmd, **prodigy_config)
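
Since slow loading is the suspected cause, one thing that could be tried is reading the whole remote file into memory when the recipe starts, so that the download does not happen while the first batch is being requested. The following is only a sketch: preload_jsonl and stream_from_list are hypothetical helpers, it assumes the file fits in memory, and nothing in this thread establishes that it changes the instant_submit behaviour.

```python
import io
import json
from typing import Dict, Iterable, Iterator, List


def preload_jsonl(fsource: Iterable) -> List[Dict]:
    """Read every non-empty line of a JSONL file object into a list.

    Works with text or binary file objects, e.g. the one returned by
    dvc.api.open(...) in get_dvc_stream above.
    """
    examples = []
    for line in fsource:
        if line.strip():
            examples.append(json.loads(line))
    return examples


def stream_from_list(examples: List[Dict], options: List[Dict]) -> Iterator[Dict]:
    """Turn preloaded examples into tasks with choice options attached."""
    for eg in examples:
        yield {"text": eg["text"], "options": options, "meta": eg.get("meta", {})}


# Stand-in for the remote file, for illustration only.
fake_file = io.BytesIO(b'{"text": "first", "meta": {"id": 1}}\n{"text": "second"}\n')
examples = preload_jsonl(fake_file)
stream = stream_from_list(examples, [{"id": 0, "text": "LABEL_A"}])
```

In the recipe, preload_jsonl would be called inside the with dvc.api.open(...) block before the recipe returns, instead of from within the generator.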

Hi @ines,

I'm running into the same issue with instant_submit=true and the audio.manual recipe. This seems to be a bug with instant_submit.

Hi @ines !
I am also facing the same issue when using instant_submit=true with the textcat.manual task. Are there any fixes available?

Yes, we ended up doing some refactoring for the upcoming v1.11 (currently available as a nightly pre-release) and it should resolve the likely underlying problem. So if you haven't checked it out yet, give it a try and see how you go :slightly_smiling_face:

Hello,

Since Prodigy v1.11 only works with spaCy v3, do you have a workaround for the instant_submit issue that shows "no tasks available" at the end of the first batch?

Thank you for your help.

We don't currently backport individual fixes to older versions: the fixes are often complex, so maintaining multiple versions in parallel can get pretty messy. However, the data format of Prodigy hasn't changed, so in theory you should be able to run different versions of Prodigy in separate environments in parallel, annotate with the newer version, and use the old version to export annotations for training with spaCy v2 (e.g. using data-to-spacy) until you're ready to upgrade to v3.