"No tasks available" on page refresh

sean · December 23, 2017, 8:23am

Using the sentiment example recipe, when I refresh the page or open it in another tab I get the message “No tasks available” even though I have not labeled anything.

Steps to reproduce:

1. Run the recipe with the code and example data provided in the docs.
2. Open Prodigy in the browser.
3. Refresh the page or open Prodigy in a new tab, and you’ll see the message “No tasks available”.

This seems to persist until I restart the server again.

ines · December 23, 2017, 8:29am

This is likely because the example only includes 3 annotation tasks, i.e. only one batch. When you load Prodigy for the first time, it’ll fetch the first batch from the server. On second load, there’s no second batch available anymore, so you see the “No tasks available” message.

To avoid this and make Prodigy keep yielding tasks until they’re all annotated, you can wrap the streaming logic in a while True loop:

def get_stream(stream):
    while True:
        for task in stream:
            yield task

Maybe this should be mentioned somewhere in the docs – I left it out of that particular example to keep it simple and not distract from the other, more important aspects of the workflow.

MajidD4t1qbit · December 27, 2018, 7:05pm

I have the exact same issue with image.manual, but the provided snippet does not solve the problem for me (unless I'm using it wrong). When the page is refreshed a few times and it says "No tasks available", the input stream to this function (which would come from preprocess.fetch_images or loaders.get_stream in the case of image.manual) has actually reached its end. This means that get_stream in the snippet just keeps spinning in the while True loop without yielding anything.
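To illustrate the behaviour described here, a minimal sketch in plain Python (no Prodigy calls). The names make_stream and get_stream_bounded are made up for the demo, and the wrapper uses a bounded number of passes instead of while True so that it terminates:

```python
def make_stream():
    # Stand-in for a loader: yields three tasks, then stops.
    for i in range(3):
        yield {"text": f"example {i}"}


def get_stream_bounded(stream, max_passes):
    # Same shape as the `while True` wrapper, but with a bounded number
    # of passes so this demo terminates.
    for _ in range(max_passes):
        for task in stream:
            yield task


stream = make_stream()
first_pass = list(stream)   # consumes all three tasks
second_pass = list(stream)  # the same generator object is now exhausted

# Five passes over one generator still produce only three tasks.
wrapped = list(get_stream_bounded(make_stream(), max_passes=5))
print(len(first_pass), len(second_pass), len(wrapped))  # prints: 3 0 3
```

Looping over an exhausted generator again yields nothing, so wrapping it in a loop does not bring the tasks back. The stream has to be re-created from its source.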

Now, if I change the snippet to re-create the stream when it's fully traversed as follows, it does not stop even after all of the images are traversed.

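from prodigy.components.loaders import get_stream
from prodigy.components.preprocess import fetch_images

def wrap_stream(source, api, loader):
    # Never terminates: there is no condition that breaks out of the loop
    while True:
        # Re-create the stream from the source on every pass
        stream = get_stream(source, api=api, loader=loader, input_key='image')
        stream = fetch_images(stream)
        for task in stream:
            yield task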

I'm still trying to fix this, but I'd also appreciate any pointers.

ines · December 27, 2018, 7:26pm

Ah yes – sorry, I forgot to actually add logic to refresh the stream generator.

At any time in your loop, you can break – but you have to decide when the stream is actually "done". When you're sending out the new questions, you don't always know what answers were "lost" and what answers are still being answered. Maybe the annotator is just taking a while and still has the questions in the queue – in that case, you might not want to send them out again immediately.

A very basic solution would be to check the hashes of the incoming examples against the hashes already present in the current dataset and break if all hashes are covered. You might find this thread useful, which explains a lot of this in more detail and has some examples:

MajidD4t1qbit · December 27, 2018, 8:19pm

Thank you for the quick response and the reference to the thread, very informative!

The approach of checking the hashes in the incoming stream against the hashes in the dataset makes sense to me. However, there is still one point of confusion:

I’m using prodigy.components.loaders.get_stream to create my stream. get_stream starts from the unannotated examples when the Prodigy process is first started (meaning that it does not show the already annotated images again when I restart the process). That’s why I was thinking that it already includes the hash checking logic. Is that actually the case?

Because of that behaviour, I was hoping that by simply recreating the stream by calling get_stream, I’ll be able to fetch the unannotated examples. What am I missing here?

ines · December 27, 2018, 9:59pm

Btw, one small thing I forgot to mention: It's probably not viable to call db.get_task_hashes (and make a request to the database) within the for loop, e.g. for each task. So you might want to add logic that only calls it every X seconds or every X tasks.
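A rough sketch of what "only every X tasks" could look like, in plain Python. Everything here is an assumption for illustration: make_stream is a hypothetical callable that re-creates the stream, get_done_hashes is a hypothetical stand-in for the db.get_task_hashes lookup, and each task is assumed to carry a "_task_hash" key:

```python
def stream_until_covered(make_stream, get_done_hashes, check_every=10):
    """Re-create the stream until no unannotated task is left."""
    while True:
        done = set(get_done_hashes())  # one lookup at the start of each pass
        sent_since_check = 0
        sent_this_pass = 0
        for task in make_stream():
            if sent_since_check >= check_every:
                # Refresh the known hashes only every `check_every` sent tasks
                done = set(get_done_hashes())
                sent_since_check = 0
            if task["_task_hash"] in done:
                continue  # already annotated, skip it
            yield task
            sent_since_check += 1
            sent_this_pass += 1
        if sent_this_pass == 0:
            break  # nothing left to send: every hash is covered


# Simulated usage: every task that is sent out gets answered right away.
tasks = [{"text": t, "_task_hash": i} for i, t in enumerate("abc", start=1)]
answered = set()
seen = []
for task in stream_until_covered(lambda: iter(tasks), lambda: answered):
    seen.append(task["_task_hash"])
    answered.add(task["_task_hash"])
```

Note that this version sends a question out again on the next pass if it hasn't been answered yet, so it doesn't address the "annotator is just taking a while" case mentioned earlier.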

You could also move the hash checking logic into the update callback of your recipe that's executed every time new answers are received (before they are stored in the database). The stream generator can respond to external state – for example, a global IS_DONE that is set to True by your update callback as soon as the new incoming hashes + the existing hashes = all hashes. In that case, the stream loop would break and stop sending out questions. With a low batch_size, you should get very little to no overlap here.
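The update-callback idea could be sketched roughly like this. It is only an outline under assumptions: StreamState is a hypothetical helper that stands in for the global IS_DONE, make_stream is a hypothetical callable that re-creates the stream, and tasks are assumed to carry a "_task_hash" key. In a real recipe, state.update would be what you return as the update callback:

```python
class StreamState:
    """Shared state between the update callback and the stream generator."""

    def __init__(self, all_hashes, existing_hashes=()):
        self.all_hashes = set(all_hashes)
        self.seen_hashes = set(existing_hashes)
        self.is_done = self.all_hashes <= self.seen_hashes

    def update(self, answers):
        # Called with each batch of new answers
        self.seen_hashes.update(eg["_task_hash"] for eg in answers)
        # Done once new incoming hashes + existing hashes cover all hashes
        self.is_done = self.all_hashes <= self.seen_hashes


def get_stream(make_stream, state):
    while not state.is_done:
        for task in make_stream():
            if state.is_done:
                break  # stop sending out questions
            if task["_task_hash"] not in state.seen_hashes:
                yield task


# Simulated usage: each task is answered as soon as it is sent out.
tasks = [{"text": t, "_task_hash": i} for i, t in enumerate("abc", start=1)]
state = StreamState(all_hashes={1, 2, 3})
sent = []
for task in get_stream(lambda: iter(tasks), state):
    sent.append(task["_task_hash"])
    state.update([task])
```

This avoids querying the database from inside the stream loop, since the set of answered hashes is kept in memory and only grows when answers come in.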

As for whether get_stream already includes the hash checking logic: by default, it doesn't – but you should be able to set rehash=True when calling get_stream!
