There is a progress bar for the teach method, but it would be great to also see how many samples remain to be labeled. Since Prodigy expects a generator, the user can calculate the number of samples inside the custom recipe.
Then maybe the config returned by the recipe could include that number:
...
n_samples = ...  # calculate total number of possible labels
return {
    ...
    'n_samples': n_samples,
    ...
}
The sidebar could then show this number and decrement it with each label.
This would be helpful since some commands like teach return only a subset of the data, and every time we reload the page, it pops off another batch of data, so it’s hard to estimate how many samples are left.
Thanks, that’s a good idea! We might actually be able to handle this pretty conveniently via the stream. In the active learning-powered recipe, the progress is calculated based on the loss, but in all other cases, Prodigy also checks if the original stream returned by the recipe exposes a __len__. If so, this is used to calculate the progress (and could then also be used to expose the number of remaining examples).
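To make the `__len__` check a bit more concrete, here is a minimal sketch of how a progress value could be derived from a stream that exposes a length. This is not Prodigy's actual implementation: `estimate_progress` and its `n_annotated` argument are hypothetical names used only for illustration.

```python
def estimate_progress(stream, n_annotated):
    """Return the fraction of the stream annotated so far, or None
    if the stream does not expose a length (hypothetical helper)."""
    # Plain generators have no __len__, so no estimate is possible.
    if not hasattr(stream, "__len__"):
        return None
    total = len(stream)
    # An empty stream counts as complete and avoids division by zero.
    if total == 0:
        return 1.0
    # Cap at 1.0 in case more answers come in than the stated total.
    return min(n_annotated / total, 1.0)


# A list has a length, a generator does not.
print(estimate_progress(["a", "b", "c", "d"], 1))
print(estimate_progress((x for x in "abcd"), 1))
```

The number of remaining examples would then simply be the total minus the number already annotated.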
Since Prodigy only checks for the __len__, this would also allow exposing a stream length for generators, e.g. like this:
class StreamWrapper(object):
    def __init__(self, stream, total):
        self.stream = stream
        self.total = total

    def __iter__(self):
        yield from self.stream

    def __len__(self):
        return self.total

stream = StreamWrapper(stream, 123456)
A solution like that could be useful if you want to stream in a large number of examples, but you do know the number upfront (e.g. if you’re reading them from a database or something).
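As a rough sketch of that idea, assuming the examples live in a JSONL file rather than a database: the file is scanned once to get the count, then read lazily. `SizedStream`, `count_lines`, `read_jsonl` and `sized_jsonl_stream` are made-up helper names, not part of Prodigy. Note that `remaining` here counts examples that have been pulled from the stream, which is not the same as examples that have been annotated.

```python
import json


class SizedStream:
    """Lazy stream with a known total, exposed via __len__ (hypothetical)."""

    def __init__(self, stream, total):
        self.stream = stream
        self.total = total
        self.seen = 0  # examples pulled from the stream so far

    def __iter__(self):
        for example in self.stream:
            self.seen += 1
            yield example

    def __len__(self):
        return self.total

    @property
    def remaining(self):
        # Pulled is not the same as annotated: batches may be queued up.
        return max(self.total - self.seen, 0)


def count_lines(path):
    """Count non-empty lines without loading the file into memory."""
    with open(path, encoding="utf8") as f:
        return sum(1 for line in f if line.strip())


def read_jsonl(path):
    """Yield one parsed example per non-empty line."""
    with open(path, encoding="utf8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def sized_jsonl_stream(path):
    """Build a lazy stream whose length is known upfront."""
    return SizedStream(read_jsonl(path), count_lines(path))
```

In a custom recipe, something like `stream = sized_jsonl_stream(source)` could then be returned as the stream, so that the length check described above finds a `__len__`.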