Hi,

I am using a custom recipe for multi-label text classification, but after a few annotations I get "No tasks available". It only works again if I restart the Prodigy server.

I'm on Prodigy version 1.10.8.
import prodigy
from prodigy.components.loaders import JSONL


@prodigy.recipe(
    "article_cat",
    dataset=("The dataset to save to", "positional", None, str),
    file_path=("Path to texts", "positional", None, str),
)
def article_cat(dataset, file_path):
    """Annotate the category of articles using multiple-choice options."""
    stream = JSONL(file_path)     # load in the JSONL file
    stream = add_options(stream)  # add options to each task
    blocks = [
        {"view_id": "html"},
        {"view_id": "text"},
        {"view_id": "choice", "text": None, "html": None},
    ]
    return {
        "dataset": dataset,       # save annotations in this dataset
        "view_id": "blocks",      # set the view_id to "blocks"
        "stream": list(stream),   # materialize the stream up front
        "config": {
            "blocks": blocks,     # add the blocks to the config
        },
    }


def add_options(stream):
    # Helper function to add options to every task in a stream
    options = [
        {"id": "1", "text": "A"},
        {"id": "2", "text": "B"},
        {"id": "3", "text": "C"},
        {"id": "4", "text": "D"},
    ]
    # ... and a few more labels
    for task in stream:
        task["options"] = options
        yield task
I am not using the multi-user setup. However, I am running 5 different Prodigy instances with the same recipe.py script and different input data files (and different output datasets), all on the same machine.

Here is my prodigy.json.
This all looks reasonable! Just one quick comment: the auto_count_stream and total_examples_target settings were both only introduced in v1.11, so they won't have any effect in v1.10. If you want to use them, you should upgrade to v1.11 – if you can, it would be interesting to try this in a separate environment to see if it solves the problem you're seeing.
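For reference, once you're on v1.11, those settings can also go into the "config" dict your recipe returns (they should cascade like other settings from prodigy.json). A quick sketch of what the tail of your recipe could look like – the values here are placeholders for illustration, not recommendations:

return {
    "dataset": dataset,
    "view_id": "blocks",
    "stream": stream,
    "config": {
        "blocks": blocks,
        # Only respected in v1.11+ – silently ignored in v1.10.
        # Placeholder values, adjust for your data:
        "auto_count_stream": True,
        "total_examples_target": 1000,
    },
}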
I've tried out your recipe with the same settings and some random data file, and I can't seem to reproduce the problem. Some things to check on your end:
- What's in the input JSONL files? Do they contain duplicates? How many examples are in them? Do you see "No tasks available" at the beginning of the file, or do you actually hit the end? (Maybe you want to set "force_stream_order": true so that refreshing the browser doesn't request the next batch? This only makes sense if you only have one user per instance, though. Also see the example line after this list for what each task needs to contain.)
- Since you're running multiple instances, do you have enough memory?
- Can you share some more details on the data you're using? How large is your data file, and did you confirm that it still includes examples that are not yet present in your dataset? Does your data contain duplicates that could be excluded? (If so, see the sketch below for one way to filter them out.)
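Since your blocks render both an "html" and a "text" view, each incoming task also needs both of those fields. A made-up example of what one line of your input JSONL would need to look like (field values invented for illustration):

{"text": "Some article text ...", "html": "<strong>Some headline</strong>"}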
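And if the files do turn out to contain duplicates, one option is to filter them out in the recipe before they ever reach the web app, using the task hashes. A minimal sketch – the helper name filter_seen_inputs is just made up for illustration:

from prodigy import set_hashes

def filter_seen_inputs(stream):
    # Hash each task and skip any whose input hash we've already seen,
    # so exact duplicates never get queued for annotation.
    seen = set()
    for task in stream:
        task = set_hashes(task)
        if task["_input_hash"] not in seen:
            seen.add(task["_input_hash"])
            yield task

You'd plug it in right after add_options in your recipe, e.g. stream = filter_seen_inputs(stream).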