No tasks available in prodigy==1.11.8 when batch_size=1, instant_submit=True but there should be tasks available

I want multiple annotators to label NER data, with each sample seen by only one person (no two annotators should see the same sample). I am using prodigy==1.11.8, and this is my .prodigy/prodigy.json, configured to avoid duplicates (two people labelling the same example):

{
    "feed_overlap": false,
    "exclude_by": "input",
    "batch_size": 1,
    "instant_submit": true
}
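
The behaviour this config is meant to produce can be pictured with a toy model. The sketch below is not Prodigy's feed implementation; `distribute` is a hypothetical helper and the headlines are made-up placeholders. It hands every task from one shared queue to exactly one session, which is the outcome expected from "feed_overlap": false.

```python
from collections import deque


def distribute(tasks, sessions, batch_size=1):
    """Toy model only: give each task to exactly one session.

    Sessions take turns pulling `batch_size` tasks from a shared queue,
    so no task is ever handed to two sessions.
    """
    queue = deque(tasks)
    assigned = {name: [] for name in sessions}
    while queue:
        for name in sessions:
            for _ in range(batch_size):
                if not queue:
                    break
                assigned[name].append(queue.popleft())
    return assigned


# Hypothetical data: five headlines shared between two named sessions.
tasks = [f"headline {i}" for i in range(5)]
assigned = distribute(tasks, ["hao", "hao2"], batch_size=1)
for name, items in assigned.items():
    print(name, items)
```

In this model every task is assigned until the queue is exhausted, so "No tasks available" should only appear once all samples have been handed out.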

I followed the example on https://prodi.gy/docs and started the labelling process via:

PRODIGY_LOGGING=basic prodigy ner.manual ner_news_headlines blank:en ./annotated_news_headlines-ORG-PERSON-LOCATION-ner.jsonl --label PERSON,ORG,PRODUCT,LOCATION

However, after labelling only 5% of the data in this toy example, the page shows "No tasks available" no matter how often I refresh it (in each named multi-user session, as well as in the default session without a session name), even though tasks remain. Screenshots attached:



Even after I stop the labelling script with Ctrl-C and restart it, the same problem persists. I cannot label in either the named sessions or the default session; even if I use a new session name, it still shows "No tasks available".

Is this behavior expected? If not, what is the best way to fix it?

This is the log after I restarted the script. All named sessions and the unnamed default session still show "No tasks available":

root@junwang-ec2:/workspaces/multitask-llm-rnd/labelling/tutorial/ner# PRODIGY_LOGGING=basic prodigy ner.manual ner_news_headlines blank:en ./annotated_news_headlines-ORG-PERSON-LOCATION-ner.jsonl --label PERSON,ORG,PRODUCT,LOCATION
17:13:08: INIT: Setting all logging levels to 20
17:13:09: RECIPE: Calling recipe 'ner.manual'
Using 4 label(s): PERSON, ORG, PRODUCT, LOCATION
17:13:09: RECIPE: Starting recipe ner.manual
17:13:09: RECIPE: Annotating with 4 labels
17:13:09: LOADER: Using file extension 'jsonl' to find loader
17:13:09: LOADER: Loading stream from jsonl
17:13:09: LOADER: Rehashing stream
17:13:09: CONFIG: Using config from global prodigy.json
17:13:09: VALIDATE: Validating components returned by recipe
17:13:09: CONTROLLER: Initialising from recipe
17:13:09: VALIDATE: Creating validator for view ID 'ner_manual'
17:13:09: VALIDATE: Validating Prodigy and recipe config
17:13:09: CONFIG: Using config from global prodigy.json
17:13:09: DB: Initializing database SQLite
17:13:09: DB: Connecting to database SQLite
17:13:09: DB: Creating dataset '2023-01-05_17-13-09'
17:13:09: FEED: Initializing from controller
17:13:09: PREPROCESS: Tokenizing examples (running tokenizer only)
17:13:09: FILTER: Filtering duplicates from stream
17:13:09: FILTER: Filtering out empty examples for key 'text'
17:13:09: CORS: initialized with wildcard "*" CORS origins

✨  Starting the web server at http://0.0.0.0:8080 ...
Open the app in your browser and start annotating!

INFO:     Started server process [15713]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO:     127.0.0.1:56852 - "GET /?session=hao HTTP/1.1" 200 OK
INFO:     127.0.0.1:56852 - "GET /bundle.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:56852 - "GET /project/hao HTTP/1.1" 200 OK
INFO:     127.0.0.1:56838 - "GET /favicon.ico HTTP/1.1" 200 OK
17:13:18: POST: /get_session_questions
17:13:18: CONTROLLER: Getting batch of questions for session: ner_news_headlines-hao
17:13:18: FEED: Finding next batch of questions in stream
17:13:18: FEED: re-adding open tasks to stream
17:13:18: FEED: Stream is empty
17:13:18: FEED: adding tasks from other sessions to ner_news_headlines-hao queue.
17:13:18: FEED: batch of questions requested for session ner_news_headlines-hao: 0
17:13:18: RESPONSE: /get_session_questions (0 examples)
INFO:     127.0.0.1:56852 - "POST /get_session_questions HTTP/1.1" 200 OK
INFO:     127.0.0.1:56852 - "GET /?session=hao2 HTTP/1.1" 200 OK
INFO:     127.0.0.1:56852 - "GET /bundle.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:56852 - "GET /project/hao2 HTTP/1.1" 200 OK
17:13:24: POST: /get_session_questions
17:13:24: CONTROLLER: Getting batch of questions for session: ner_news_headlines-hao2
17:13:24: FEED: Finding next batch of questions in stream
17:13:24: FEED: re-adding open tasks to stream
17:13:24: FEED: adding tasks from other sessions to ner_news_headlines-hao2 queue.
17:13:24: FEED: batch of questions requested for session ner_news_headlines-hao2: 0
17:13:24: RESPONSE: /get_session_questions (0 examples)
INFO:     127.0.0.1:56852 - "POST /get_session_questions HTTP/1.1" 200 OK
INFO:     127.0.0.1:56858 - "GET /favicon.ico HTTP/1.1" 200 OK
INFO:     127.0.0.1:56858 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:56858 - "GET /bundle.js HTTP/1.1" 200 OK
17:13:27: GET: /project
INFO:     127.0.0.1:56858 - "GET /project HTTP/1.1" 200 OK
INFO:     127.0.0.1:56852 - "GET /favicon.ico HTTP/1.1" 200 OK
17:13:27: POST: /get_session_questions
17:13:27: CONTROLLER: Getting batch of questions for session: None
17:13:27: FEED: Finding next batch of questions in stream
17:13:27: FEED: re-adding open tasks to stream
17:13:27: FEED: adding tasks from other sessions to None queue.
17:13:27: FEED: batch of questions requested for session None: 0
17:13:27: RESPONSE: /get_session_questions (0 examples)
INFO:     127.0.0.1:56858 - "POST /get_session_questions HTTP/1.1" 200 OK
INFO:     127.0.0.1:46524 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:46524 - "GET /bundle.js HTTP/1.1" 200 OK
17:13:35: GET: /project
INFO:     127.0.0.1:46524 - "GET /project HTTP/1.1" 200 OK
INFO:     127.0.0.1:46528 - "GET /favicon.ico HTTP/1.1" 200 OK
17:13:36: POST: /get_session_questions
17:13:36: CONTROLLER: Getting batch of questions for session: None
17:13:36: FEED: Finding next batch of questions in stream
17:13:36: FEED: re-adding open tasks to stream
17:13:36: FEED: adding tasks from other sessions to None queue.
17:13:36: FEED: batch of questions requested for session None: 0
17:13:36: RESPONSE: /get_session_questions (0 examples)
INFO:     127.0.0.1:46524 - "POST /get_session_questions HTTP/1.1" 200 OK
INFO:     127.0.0.1:50652 - "GET /?session=hao4 HTTP/1.1" 200 OK
INFO:     127.0.0.1:50652 - "GET /bundle.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:50652 - "GET /project/hao4 HTTP/1.1" 200 OK
17:14:44: POST: /get_session_questions
17:14:44: CONTROLLER: Getting batch of questions for session: ner_news_headlines-hao4
17:14:44: FEED: Finding next batch of questions in stream
17:14:44: FEED: re-adding open tasks to stream
17:14:44: FEED: adding tasks from other sessions to ner_news_headlines-hao4 queue.
17:14:44: FEED: batch of questions requested for session ner_news_headlines-hao4: 0
17:14:44: RESPONSE: /get_session_questions (0 examples)
INFO:     127.0.0.1:50652 - "POST /get_session_questions HTTP/1.1" 200 OK
INFO:     127.0.0.1:50652 - "GET /?session=hao6 HTTP/1.1" 200 OK
INFO:     127.0.0.1:50652 - "GET /bundle.js HTTP/1.1" 200 OK
INFO:     127.0.0.1:50652 - "GET /project/hao6 HTTP/1.1" 200 OK
17:14:46: POST: /get_session_questions
17:14:46: CONTROLLER: Getting batch of questions for session: ner_news_headlines-hao6
17:14:46: FEED: Finding next batch of questions in stream
17:14:46: FEED: re-adding open tasks to stream
17:14:46: FEED: adding tasks from other sessions to ner_news_headlines-hao6 queue.
17:14:46: FEED: batch of questions requested for session ner_news_headlines-hao6: 0
17:14:46: RESPONSE: /get_session_questions (0 examples)
INFO:     127.0.0.1:50652 - "POST /get_session_questions HTTP/1.1" 200 OK
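
A log like the one above can be condensed into per-session counts with a small script. The sketch below is not part of Prodigy: it relies only on the "FEED: batch of questions requested for session ..." line format visible in this log, and `summarize` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "17:13:18: FEED: batch of questions requested for session NAME: 0"
REQUEST = re.compile(r"batch of questions requested for session (\S+): (\d+)")


def summarize(log_text):
    """Count requests and returned questions per session name."""
    totals = defaultdict(lambda: {"requests": 0, "questions": 0})
    for match in REQUEST.finditer(log_text):
        session, count = match.group(1), int(match.group(2))
        totals[session]["requests"] += 1
        totals[session]["questions"] += count
    return dict(totals)


# Three lines copied from the log above.
sample = """\
17:13:18: FEED: batch of questions requested for session ner_news_headlines-hao: 0
17:13:27: FEED: batch of questions requested for session None: 0
17:13:36: FEED: batch of questions requested for session None: 0
"""
summary = summarize(sample)
for session, stats in summary.items():
    print(session, stats)
```

Applied to the full log, every session shows zero questions returned, which matches the "Stream is empty" message logged on the first request.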

Related to https://support.prodi.gy/t/no-tasks-available-when-batch-size-1-instant-submit-true/6215/2 but not identical, since I changed the prodigy version and config.

My version is prodigy==1.11.8a4, and this is my config:

{
    "feed_overlap": false,
    "batch_size": 10,
    "experimental_feed": true
}

This is my command:

PRODIGY_ALLOWED_SESSIONS=hao,hao2 PRODIGY_LOGGING=basic prodigy ner.manual ner_news_headlines blank:en ./annotated_news_headlines-ORG-PERSON-LOCATION-ner.jsonl --label PERSON,ORG,PRODUCT,LOCATION

After labelling a few samples, I get "No tasks available" even though tasks should remain. After restarting the script, I sometimes get new samples and sometimes don't, but it always ends in "No tasks available" when there are clearly samples left to annotate:


Is this behavior expected, and if not, how can I fix it?
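
One way to confirm that samples really are left is to compare the source file with an export of the dataset, for example one produced with `prodigy db-out ner_news_headlines > annotations.jsonl` (the output file name is an assumption). The sketch below matches examples by their "text" value, which only approximates Prodigy's input hashing, and `load_texts` and `remaining` are hypothetical helper names.

```python
import json


def load_texts(path):
    """Read the "text" value of every non-empty line in a JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)["text"] for line in f if line.strip()]


def remaining(source_texts, annotated_texts):
    """Return source texts that do not appear among the annotated texts."""
    done = set(annotated_texts)
    return [text for text in source_texts if text not in done]


# In-memory demo with made-up headlines; with real files, pass the results
# of load_texts(source_path) and load_texts(export_path) instead.
source = ["headline a", "headline b", "headline c"]
annotated = ["headline b"]
left = remaining(source, annotated)
print(f"{len(left)} of {len(source)} examples still unannotated")
```

If this reports unannotated examples while the app shows "No tasks available", the stream is being exhausted or filtered before those examples reach any session.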

hi @wjhtomwjh!

Thanks for your question and welcome to the Prodigy community :wave:

Thank you for your detailed notes. I combined both issues as it's easier for us to manage.

This is great timing. We are very close to releasing a new update for Prodigy which includes several bug fixes for duplicate and missing annotations. I know one specific bug was caught for batch_size=1, so this could be it.

Next week I'll check in with our dev team and pass along your issue. I'll keep you informed when we're getting close to release.

Thanks again for your help!

hi @wjhtomwjh!

Hopefully you saw yesterday's post for Prodigy v1.11.9. We automatically email all active users about new versions.

We fixed a bug that I suspect is what you were running into. If you have feedback or are still having issues, I recommend replying directly to our v1.11.9 post, as that's where we'll handle any issues with it.