Progress bar shows 100% but annotators still get text for labelling

Hi,

Please find my labelling details.

image

Yes, the 18687 is a result of how "annotations_per_task" works. I'm happy with this.

Just a split among 10 annotators.

image

Now the query starts here.

Annotators keep getting text for labelling until the condition "annotations_per_task: 3" is satisfied. Meanwhile, the progress bar in the left side panel shows 100%. What does this 100% mean? See the screenshot below for reference.

Thanks,
Vinoth Kumar S

hi @Vinoth,

That's an excellent question. We have docs that describe the progress bar in more detail, especially after more recent updates. I've highlighted in bold what is likely the issue:

Under the hood, Prodigy has a few abstractions to separate the different concerns of the application.

Diagram of different abstractions

  1. The Controller makes sure that all the settings are parsed correctly and can be seen as the central object that delegates tasks to other objects.

  2. The Session represents a user session during which data will get annotated. If multiple annotators are annotating then they can be identified by the name that is passed via /?session=. Anonymous annotations will get assigned a session that is denoted by the creation timestamp.

  3. The Stream represents the stream of data that still needs to be annotated. The stream maintains a queue of examples for each session. This way, we have a single place where task routing might occur.

  4. The Source represents the original data source for the stream. This may refer to a file on disk but can also refer to a set of files or to a Prodigy dataset.

The Source object can represent a file on disk together with a read position. This file will have a known size upfront. That means that whenever a new block of data is read from the file, the read position can update, allowing us to estimate the annotation progress. However, because this estimate is only aware of the current position in the Source, it is not aware of anything that's happening in the Stream or Controller objects.

This offers a tractable way to estimate progress, but it has a few consequences that can be unintuitive in multiuser scenarios.

  • The Source-based progress tracks the progress through the input file rather than completed annotations. This means that even when starting a single-annotator flow, there will already be some non-zero progress, as some of the file will have been consumed in the process of session initialization.
  • It is possible for one annotator to reach the end of their queue before the rest do. When this happens, the Source will have a position at the end of the file. So the Source-based progress bar will show 100% to every annotator.
  • The Source is unaware of the task router and the current sessions. Depending on the Prodigy configuration, it could be possible for new annotators to join at any given time. That also means that a person could join and immediately see significant progress when they arrive at the annotation interface.
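
To make the second point concrete, here is a small toy model. It is not Prodigy's actual implementation: `ToySource` and `simulate` are hypothetical names, and the routing is simplified so that every item goes to every session. It only illustrates why a progress value based on the read position can reach 100% while another annotator's queue is still full.

```python
from collections import deque


class ToySource:
    """Hypothetical stand-in for a source with a known size and a read position."""

    def __init__(self, items):
        self.items = list(items)
        self.position = 0

    def read(self, n):
        # Hand out the next block and advance the read position.
        batch = self.items[self.position:self.position + n]
        self.position += len(batch)
        return batch

    @property
    def progress(self):
        # Progress only knows how far the source has been read.
        return self.position / len(self.items) if self.items else 1.0


def simulate(n_items=12, sessions=("alice", "bob"), batch_size=3):
    source = ToySource(range(n_items))
    queues = {name: deque() for name in sessions}
    while source.progress < 1.0:
        batch = source.read(batch_size)
        # Simplified routing: every session receives every item.
        for queue in queues.values():
            queue.extend(batch)
        # Only the first session annotates; the others have not started yet.
        queues[sessions[0]].clear()
    remaining = {name: len(queue) for name, queue in queues.items()}
    return source.progress, remaining


progress, remaining = simulate()
print(progress, remaining)
```

In this sketch the fast annotator drains the source, so the source-based progress is 1.0 even though the second queue still holds all 12 items.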

To prevent these scenarios, it may be more convenient to set a target via total_examples_target in your prodigy.json file or to use a custom progress callback.
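
As a rough sketch of the first option, the snippet below builds the relevant settings and prints them as JSON that could be merged into prodigy.json. The helper `build_config` is hypothetical, and the value 1000 is a placeholder rather than a recommendation; only the two setting names come from this thread.

```python
import json


def build_config(total_examples_target, annotations_per_task):
    """Return the settings discussed in this thread as a plain dict."""
    if total_examples_target <= 0:
        raise ValueError("total_examples_target must be a positive integer")
    return {
        "annotations_per_task": annotations_per_task,
        "total_examples_target": total_examples_target,
    }


# Placeholder target of 1000 examples; pick a value that fits your project.
config_text = json.dumps(build_config(1000, 3), indent=2)
print(config_text)
```

The printed object can be copied into an existing prodigy.json alongside any other settings already there.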

Thanks Ryan,

So the progress bar depends purely on the data from the Source (which is the input to Prodigy labelling).

Even though the progress bar shows 100%, texts can still be available for further tagging in the Stream.

Am I right?

In my case, group1 is still labelling even after 100%. Is this normal?

Per the docs, it's likely one of your annotators finished their queue first.

It is possible for one annotator to reach the end of their queue before the rest do. When this happens, the Source will have a position at the end of the file. So the Source-based progress bar will show 100% to every annotator.

If you know how many examples you want each annotator to annotate, then try setting the target via total_examples_target in your prodigy.json file.
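
If it helps to work out candidate numbers, here is a minimal arithmetic sketch. The task count of 6229 is an assumption, chosen only because 6229 × 3 matches the 18687 mentioned earlier in the thread, and `annotation_targets` is a hypothetical helper. Whether the target should be the overall total or the per-annotator share depends on how progress is tracked in your setup, so please check that against the docs.

```python
import math


def annotation_targets(n_tasks, annotations_per_task, n_annotators):
    """Return (total annotations needed, share per annotator rounded up)."""
    total = n_tasks * annotations_per_task
    per_annotator = math.ceil(total / n_annotators)
    return total, per_annotator


# Assumed numbers: 6229 tasks, 3 annotations per task, 10 annotators.
total, per_annotator = annotation_targets(6229, 3, 10)
print(total)          # 18687
print(per_annotator)  # 1869
```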

Thanks for the reply.

Yes, you are correct. One of my annotators finished their queue.

Now I get the gist of how the progress bar works.

Last question.

Can we update the config file while labelling is in progress? If I enable "allow_work_stealing", will it apply to the upcoming tasks in the queue?