Feature Request: Number of samples remaining


There is a progress bar for the teach method, but it would be great to also see how many samples are left to label. Since Prodigy expects a generator, the user could calculate the total number of samples inside the custom recipe.

Then maybe the recipe could include it in the config it returns:

n_samples = ...  # calculate total number of possible labels

return {
    'n_samples': n_samples,
    # ... other recipe components
}

It would then show up in the sidebar and decrement with each labelled example.

This would be helpful since some commands like teach will return a subset of the data, and also every time we reload the page, it pops off a batch of data so it’s hard to estimate how many samples are left.

Thanks, that’s a good idea! We might actually be able to handle this pretty conveniently via the stream. In the active learning-powered recipe, the progress is calculated based on the loss, but in all other cases, Prodigy also checks if the original stream returned by the recipe exposes a __len__. If so, this is used to calculate the progress (and could then also be used to expose the number of remaining examples).
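To make the idea concrete, here is a rough sketch of how a remaining count and a progress value could be derived from a stream that exposes a __len__. This is not Prodigy's actual implementation: the function name and the n_answered argument are made up for illustration.

```python
def remaining_and_progress(stream, n_answered):
    """Hypothetical helper (not part of Prodigy).

    Returns (remaining, progress) if the stream exposes __len__,
    otherwise (None, None), e.g. for a plain generator.
    """
    if not hasattr(stream, "__len__"):
        return None, None
    total = len(stream)
    if total == 0:
        # Nothing to annotate, so treat the task as complete.
        return 0, 1.0
    remaining = max(total - n_answered, 0)
    progress = min(n_answered / total, 1.0)
    return remaining, progress
```

For a plain generator, the helper returns (None, None), which matches the case where no length-based progress can be shown.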

Since Prodigy only checks for the __len__, this would also allow exposing a stream length for generators, e.g. like this:

class StreamWrapper(object):
    def __init__(self, stream, total):
        self.stream = stream
        self.total = total

    def __iter__(self):
        yield from self.stream

    def __len__(self):
        return self.total

stream = StreamWrapper(stream, 123456)

A solution like that could be useful if you want to stream in a large number of examples, but you do know the number upfront (e.g. if you’re reading them from a database or something).
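As a sketch of the "known upfront" case, the example below counts the records in a JSONL file in one cheap pass and then streams them lazily through the wrapper. The helper names (count_lines, read_jsonl, make_stream) and the file layout are assumptions for illustration, not Prodigy APIs. With a database, the count would presumably come from a COUNT query instead.

```python
import json


class StreamWrapper:
    """Wraps a generator and reports a known total via __len__."""

    def __init__(self, stream, total):
        self.stream = stream
        self.total = total

    def __iter__(self):
        yield from self.stream

    def __len__(self):
        return self.total


def count_lines(path):
    # One pass over the file that only counts non-empty lines.
    with open(path, encoding="utf8") as f:
        return sum(1 for line in f if line.strip())


def read_jsonl(path):
    # Lazily yield one parsed record per non-empty line.
    with open(path, encoding="utf8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def make_stream(path):
    # Hypothetical helper: a lazy stream that still knows its length.
    return StreamWrapper(read_jsonl(path), count_lines(path))
```

The file is read twice (once to count, once to stream), but the examples are never all held in memory at the same time.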


This worked too, since a list also has a __len__ (as long as the whole stream fits in memory):

stream = list(stream)