Progress Bar returning 0% and returning to first document in dataset at task completion

Hi,

I'm wondering if this is a bug related to v1.12.0. I just saw some updates on the Changelog, so I will update and test locally to see if the latest patch fixes this.

Multiple annotators have reported seeing the following when they complete a task:

Instead of showing No tasks available on the screen and 100% on the progress bar, it shows 0% and returns to the first document in the dataset. I haven't made any changes to this recipe, but I have installed v1.12.0.

I tested with the latest patch release (v1.12.4), and I'm still seeing some odd behavior with the progress bar and when tasks are completed.

I tried two different recipes and got the following results. Each dataset I used had 10 total documents.

Test #1:
Problem: progress bar starts at 100%:

After I annotated the full set of 10 documents:

Screen Shot 2023-07-24 at 2.31.17 PM

You can still press accept after you've finished annotating the dataset.

Stats (14 annotations, but only 10 documents):

(prodigy) cheyannebaird@Cheyannes-MacBook-Pro:~/posh/datasets/prodigy-datasets/data/annotated$ prodigy stats progress_bar_test-cheyanne

============================== ✨  Prodigy Stats ==============================

Version          1.12.4                        
                  
============================== ✨  Dataset Stats ==============================

Dataset       progress_bar_test-cheyanne
Created       2023-07-24 14:04:37       
Description   None                      
Author        None                      
Annotations   14                        
Accept        14                        
Reject        0                         
Ignore        0   

Test #2:
This task started at 1%:

Not at 50% when halfway through, but at 10%:

After I've annotated all of the documents:

Screen Shot 2023-07-24 at 2.33.37 PM

Ends with 11 documents annotated instead of 10 (dataset has 10). Says 0% complete:

Error (issue with starlette?)

Task exception was never retrieved
future: <Task finished name='Task-33' coro=<RequestResponseCycle.run_asgi() done, defined at /opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py:394> exception=KeyError('_input_hash')>
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 399, in run_asgi
    self.logger.error(msg, exc_info=exc)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/logging/__init__.py", line 1475, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/logging/__init__.py", line 1589, in _log
    self.handle(record)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/logging/__init__.py", line 1598, in handle
    if (not self.disabled) and self.filter(record):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/logging/__init__.py", line 806, in filter
    result = f.filter(record)
  File "cython_src/prodigy/_util.pyx", line 203, in prodigy._util.ServerErrorFilter.filter
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 396, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/middleware/cors.py", line 92, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/middleware/cors.py", line 147, in simple_response
    await self.app(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/fastapi/routing.py", line 165, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/prodigy/app.py", line 572, in give_answers
    controller.receive_answers(
  File "cython_src/prodigy/core.pyx", line 540, in prodigy.core.Controller.receive_answers
  File "cython_src/prodigy/core.pyx", line 557, in prodigy.core.Controller.receive_answers
  File "cython_src/prodigy/core.pyx", line 657, in prodigy.core.Controller._db_add_examples
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/prodigy/components/db.py", line 758, in add_examples
    input_hash=eg[INPUT_HASH_ATTR],
KeyError: '_input_hash'
Task exception was never retrieved
future: <Task finished name='Task-35' coro=<RequestResponseCycle.run_asgi() done, defined at /opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py:394> exception=KeyError('_input_hash')>
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 399, in run_asgi
    self.logger.error(msg, exc_info=exc)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/logging/__init__.py", line 1475, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/logging/__init__.py", line 1589, in _log
    self.handle(record)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/logging/__init__.py", line 1598, in handle
    if (not self.disabled) and self.filter(record):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/logging/__init__.py", line 806, in filter
    result = f.filter(record)
  File "cython_src/prodigy/_util.pyx", line 203, in prodigy._util.ServerErrorFilter.filter
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 396, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/middleware/cors.py", line 92, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/middleware/cors.py", line 147, in simple_response
    await self.app(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/fastapi/routing.py", line 165, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/prodigy/app.py", line 572, in give_answers
    controller.receive_answers(
  File "cython_src/prodigy/core.pyx", line 540, in prodigy.core.Controller.receive_answers
  File "cython_src/prodigy/core.pyx", line 557, in prodigy.core.Controller.receive_answers
  File "cython_src/prodigy/core.pyx", line 657, in prodigy.core.Controller._db_add_examples
  File "/opt/homebrew/Caskroom/miniforge/base/envs/prodigy/lib/python3.9/site-packages/prodigy/components/db.py", line 758, in add_examples
    input_hash=eg[INPUT_HASH_ATTR],
KeyError: '_input_hash'

Stats:

============================== ✨  Dataset Stats ==============================

Dataset       progress_bar_test_v2-cheyanne
Created       2023-07-24 14:20:00          
Description   None                         
Author        None                         
Annotations   18                           
Accept        18                           
Reject        0                            
Ignore        0 

You could still press the accept button after completing the documents.

This is indeed strange/interesting. We may not be able to do a deep dive on this today, but we'll certainly come back to this during the week. It does seem like there is an _input_hash missing somewhere.

One thing to perhaps check; when you start the recipe ... do you see a validation error appear that warns you that the hashes are not set? At the moment we still try to automatically hash everything, but if an input hash is missing somewhere we need to figure out how that happened.
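
If it helps with checking this, here is a minimal sketch of a check you could run over the tasks your recipe produces before they reach the UI. The helper name `find_missing_hashes` and the sample tasks are made up for illustration. `_input_hash` is the key from the traceback, and `_task_hash` is the companion key Prodigy sets alongside it. This is only a diagnostic under those assumptions, not how Prodigy validates tasks internally.

```python
# Keys that every task is expected to carry once hashing has run.
REQUIRED_HASH_KEYS = ("_input_hash", "_task_hash")


def find_missing_hashes(examples):
    """Return (index, missing_keys) for each example lacking a hash key.

    `examples` is any iterable of task dicts, e.g. the stream a recipe yields.
    """
    problems = []
    for index, eg in enumerate(examples):
        missing = [key for key in REQUIRED_HASH_KEYS if key not in eg]
        if missing:
            problems.append((index, missing))
    return problems


# Made-up sample tasks: the second one has lost its input hash.
sample = [
    {"text": "first doc", "_input_hash": 1, "_task_hash": 2},
    {"text": "second doc", "_task_hash": 3},
]
print(find_missing_hashes(sample))
```

An empty list means every task carried both hashes, which would point away from the recipe and towards whatever happens to the task between the server and the browser.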

One more question @cheyanneb: what are you using as the source argument for these tests? What kind of file? Or are you using a dataset?

It would be super helpful if you could share what your prodigy CLI command looks like.

Here is a sample command that I ran locally to test this. The source is a .jsonl file.

PRODIGY_ALLOWED_SESSIONS=cheyanne prodigy helpful-banking-moments progress_bar_test /Users/cheyannebaird/posh/progress_bar_tests/progress_bar_test.jsonl -F /Users/cheyannebaird/posh/annotation-service/src/annotator/recipes/helpful_banking_moments.py 

Thanks for sharing! By default, progress is now calculated from the source that is used. See the new docs on progress calculation here: Components and Functions · Prodigy · An annotation tool for AI, Machine Learning & NLP

The odd progress bar numbers you're seeing reflect what percentage of your source file has been read by the Prodigy Stream. For example, if you have a batch_size of 10 and a source file with 10 lines of JSONL, 100% of the file has already been read by the first batch. Hopefully that explains some of the weirder progress numbers.
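
To make that concrete, here is a simplified sketch of source-based progress. It counts lines rather than whatever Prodigy measures internally, and the function names are hypothetical, so treat it as an illustration of the idea and not the actual implementation.

```python
def source_progress(lines_read: int, total_lines: int) -> float:
    """Fraction of the source that has been read so far, capped at 1.0."""
    if total_lines <= 0:
        return 0.0
    return min(lines_read / total_lines, 1.0)


def progress_after_batches(total_lines: int, batch_size: int, batches_fetched: int) -> float:
    """Progress once the frontend has fetched a given number of batches."""
    # A batch can never read more lines than the source contains.
    lines_read = min(batches_fetched * batch_size, total_lines)
    return source_progress(lines_read, total_lines)


# 10-line source, batch_size of 10: the first batch reads the whole file.
print(progress_after_batches(total_lines=10, batch_size=10, batches_fetched=1))  # 1.0
# Same source with a batch_size of 5: one batch reads half of it.
print(progress_after_batches(total_lines=10, batch_size=5, batches_fetched=1))   # 0.5
```

Note that nothing here depends on how many answers were submitted, which is why the bar can show 100% before a single document has been annotated.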

We now recommend using the total_examples_target setting if you want progress calculated from an exact number of annotations saved to the database.
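
As a rough sketch of the arithmetic that a target-based calculation implies (the function name is hypothetical, and the cap at 100% is an assumption for illustration, not confirmed Prodigy behavior):

```python
def target_progress(saved_annotations: int, total_examples_target: int) -> float:
    """Progress as saved annotations divided by the configured target."""
    if total_examples_target <= 0:
        # No target configured, so this calculation does not apply.
        return 0.0
    # Assumption: never report more than 100%, even if extra answers are saved.
    return min(saved_annotations / total_examples_target, 1.0)


# A target of 10 matches the 10-document test files in this thread.
for saved in (0, 5, 10, 14):
    print(saved, target_progress(saved, 10))
```

With a target of 10, five saved annotations give 0.5, which is the halfway reading you were expecting in Test #2.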

As for the errors you're seeing, these basically shouldn't happen: you shouldn't run into a case where you're submitting answers without an _input_hash, which is what the Starlette error you mentioned is saying. My intuition is that we have a bug in our recent frontend updates that led to that attribute not being sent back to the server, so those examples were not marked as done and were recycled in a subsequent batch.
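
One way to confirm the recycling on your side is to count how often each input appears in the saved annotations, for example after exporting the dataset to JSONL. This is a minimal sketch: `repeated_inputs` is a made-up helper and the sample records are invented, with only the `_input_hash` key taken from the traceback.

```python
from collections import Counter


def repeated_inputs(annotations):
    """Map each _input_hash that was saved more than once to its count."""
    counts = Counter(
        eg["_input_hash"] for eg in annotations if "_input_hash" in eg
    )
    return {input_hash: n for input_hash, n in counts.items() if n > 1}


# Invented records: input 1 was annotated twice, input 2 once.
sample = [
    {"_input_hash": 1, "answer": "accept"},
    {"_input_hash": 2, "answer": "accept"},
    {"_input_hash": 1, "answer": "accept"},
]
print(repeated_inputs(sample))  # {1: 2}
```

If the 14 annotations from Test #1 collapse to 10 distinct input hashes, the extra 4 are repeats of documents that were served again.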

It's pretty hard to reproduce this specific issue without your actual recipe and a sample of the data you used (e.g. the 10 examples in this test JSONL file). Is that something you'd be willing to share privately?

Also, am I right that this only comes up at the end of the source file? If I'm reading your first message correctly it seems like your annotators are not blocked until the stream repeats at the end?

Yes I can share these privately. Let me know where to send!

You can send them to kabir@explosion.ai

Related discussion: total_examples_target pulls the number of docs in the dataset instead of being hard coded - #2 by ryanwesslen

Version 1.12.7 is out with a small fix for this progress issue. Thanks again for the detailed report!