Update callback leads to "No tasks available"

Hi!

If I understand correctly, after every "batch_size" of annotations, the update callback is called by Prodigy while the client asks for another "batch_size" of tasks, so the annotator can keep annotating without waiting.
Everything works well, but my update function (which I use to optimise the hyper-parameters of a model) may take longer to finish than the annotator takes to annotate the batch of files. When that happens there is a "Loading..." screen (no problem with that), but then, when the update ends, it says "No tasks available", which is a problem: there are still a lot of files left.
I think that if we refresh the page everything turns out fine, but that doesn't seem very practical, as the annotator doesn't know whether it's really finished or not.

Is this the intended behaviour? Is there a way to overcome this (other than just increasing the batch size or hoping the update ends in time)?

I replaced what's inside my update callback with just a sleep and the same thing happened, so I assume it's not due to my function.

Thank you,
Jim


Hi! Could you share some more details about your recipe? What does it do and how is the update callback implemented? And how do you have it configured in the config and/or your prodigy.json? Also, which version of Prodigy are you running?

Hi!
I'm running version 1.10.6.
This recipe is used to annotate chunks of audio with a voice activity detection model in the loop.

My prodigy.json file looks like this:

{
  "theme": "basic",
  "custom_theme": {},
  "buttons": ["accept", "reject", "ignore", "undo"],
  "batch_size": 3,
  "history_size": 3,
  "port": 8080,
  "host": "localhost",
  "cors": true,
  "db": "sqlite",
  "db_settings": {},
  "validate": true,
  "auto_exclude_current": true,
  "instant_submit": false,
  "feed_overlap": false,
  "auto_count_stream": false,
  "total_examples_target": 0,
  "ui_lang": "en",
  "project_info": ["dataset", "session", "lang", "recipe_name", "view_id", "label"],
  "show_stats": false,
  "hide_meta": false,
  "show_flag": false,
  "instructions": false,
  "swipe": false,
  "swipe_gestures": { "left": "accept", "right": "reject" },
  "split_sents_threshold": false,
  "html_template": false,
  "global_css": null,
  "javascript": null,
  "writing_dir": "ltr",
  "show_whitespace": false,
  "exclude_by": "task"
}

The config for the recipe:

@recipe(
    "audio.test",
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("Data to annotate (file path or '-' to read from standard input)", "positional", None, str),
    chunk=("split long audio files into shorter chunks of that many seconds each","option",None,float),
    loader=("Loader to use", "option", "lo", str),
    keep_base64=("If 'audio' loader is used: don't remove base64-encoded data from the data on save", "flag", "B", bool),
    autoplay=("Autoplay audio when a new task loads", "flag", "A", bool),
    fetch_media=("Convert URLs and local paths to data URIs", "flag", "FM", bool),
    exclude=("Comma-separated list of dataset IDs whose annotations to exclude", "option", "e", split_string),
)
def test(
    dataset: str,
    source: Union[str, Iterable[dict]],
    loader: Optional[str] = "audio",
    chunk: float = 10.0,
    autoplay: bool = False,
    keep_base64: bool = False,
    fetch_media: bool = False,
    exclude: Optional[List[str]] = None,
) -> Dict[str, Any]:
  
    label = ["Speech"]

    return {
        "view_id": "audio_manual",
        "dataset": dataset,
        "stream": sad_manual_stream(pipeline, source, chunk=chunk),
        "before_db": remove_base64 if not keep_base64 else None,
        "exclude": exclude,
        "update": update,
        "config": {
            "labels": label,
            "audio_autoplay": autoplay,
            "force_stream_order": False,
            "show_audio_minimap": False,
        },
    }
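
For reference, I start the recipe with a command along these lines (the dataset name, audio folder and recipe file name are just placeholders):

prodigy audio.test my_dataset ./my_audio_folder -F recipe.py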

And I tried it with this update function:

def update(answers):
    prodigy.log('---- UPDATE ----')
    time.sleep(10)
    prodigy.log('----   END  ----')

The sad_manual_stream takes a voice activity detection pipeline and the path to the audio files: it loops over all the audio files in that path, cuts each one into chunks and yields the task dictionaries (built with pyannote-audio, http://pyannote.github.io/ ). It works fine without the update callback, but with this one (or the real one) it stops with "No tasks available." after the update ends. Here are the last logs:

INFO:     127.0.0.1:51017 - "POST /get_session_questions HTTP/1.1" 200 OK                                                                                                                                                                
14:02:25: POST: /give_answers (received 3)                                                                                                                                                                                                   
14:02:25: CONTROLLER: Receiving 3 answers                                                                                                                                                                                                    
14:02:25: ---- UPDATE ----                                                                                                                                                                                                                   
14:02:35: ----   END  ----                                                                                                                                                                                                                      
14:02:35: DB: Getting dataset '2021-09-22_13-58-16'                                                                                                                                                                                          
14:02:35: DB: Getting dataset 'test'                                                                                                                                                                                                   
14:02:35: DB: Getting dataset '2021-09-22_13-58-16'                                                                                                                                                                                          
14:02:35: DB: Added 3 examples to 2 datasets                                                                                                                                                                                                 
14:02:35: CONTROLLER: Added 3 answers to dataset 'test' in database SQLite                                                                                                                                                             
14:02:35: RESPONSE: /give_answers                                                                                                                                                                                                            
INFO:     127.0.0.1:51017 - "POST /give_answers HTTP/1.1" 200 OK

I also tried replacing the stream with get_stream from prodigy.components.loaders (stream = get_stream(source, loader=loader, dedup=True, rehash=True)), and I still get "No tasks available." after the same kind of logs (CONTROLLER: Receiving, ...), so I don't really know what causes this, maybe the configuration...

Jim

For example, take this simple recipe that yields a random string in a while True loop:

from typing import Iterable, Dict, Any
import time
import string
import random

from prodigy.core import recipe


def update(answers):
    print('start update')
    time.sleep(15)
    print('end update')


def dummystream() -> Iterable[Dict]:
    while True:
        letters = string.ascii_lowercase
        t = ''.join(random.choice(letters) for i in range(10))
        yield {
            "text": t,
        }


@recipe(
    "simple",
    dataset=("Dataset to save annotations to", "positional", None, str),
)
def simple(
    dataset: str,
) -> Dict[str, Any]:

    stream = dummystream()

    return {
        "view_id": "classification",
        "dataset": dataset,
        "stream": stream,
        "update": update,
    }

If the update callback has not finished before the annotator finishes annotating the batch, it shows "No tasks available" after the loading screen.
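
To reproduce, I save the recipe above as simple_recipe.py (the file and dataset names are arbitrary) and start it with:

prodigy simple some_dataset -F simple_recipe.py

Then I just answer the queued tasks faster than the 15-second sleep in update.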

Jim

Hi @ines,

Jumping in as I am working closely with @Jpetiot on this project.

In short, we noticed that when update takes a long time to complete (i.e. longer than it takes to empty the task queue), Prodigy thinks that there is no longer any task to perform and displays "No tasks available" to the user, and it stays like that even after update eventually completes.

Maybe this is by design (is it?), but what we think should happen is one of these two options:

  1. display a "Loading..." message until update completes, then fetch new tasks from the stream and send them for annotation
  2. run update asynchronously so that it does not prevent the stream from generating new tasks (see the sketch below) -- though we do foresee potential issues due to calling a second update when the first one still has not completed...
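
For option 2, this is roughly what we have in mind, as a sketch only: fine_tune() below is just a placeholder for our actual pyannote.audio training code, and a lock keeps two fine-tuning runs from overlapping.

import threading

train_lock = threading.Lock()

def fine_tune(examples):
    # Placeholder for the real pyannote.audio fine-tuning code
    ...

def update(answers):
    batch = list(answers)  # keep a copy, since training happens after update returns
    def worker():
        with train_lock:  # never let two fine-tuning runs overlap
            fine_tune(batch)
    # Return immediately so Prodigy can keep asking the stream for new tasks
    threading.Thread(target=worker, daemon=True).start()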

What do you think?

"Why does yourupdate takes so long to complete? ", you may ask.
We are trying to fine-tune a pyannote.audio voice activity detection model every time a new batch of annotations is completed, so that we can quickly adapt to the domain of the data currently being annotated (e.g. background noise, language, you name it), hence improve the automatic pre-annotation and hence speed up the annotation process.

Let us know if you think of a better way to do that! Thanks!