There is the progress bar for the teach method, but it would be great to be able to see how many samples remain to be labeled. Since Prodigy expects a generator, the user can calculate the number of samples inside the custom recipe.
Then maybe in the config returned by the recipe, we could return something like
n_samples = ...  # calculate total number of possible labels
That number would then show in the sidebar and decrement with each label.
This would be helpful since some commands like teach will return only a subset of the data, and every time we reload the page, it pops off another batch of data, so it’s hard to estimate how many samples are left.
Thanks, that’s a good idea! We might actually be able to handle this pretty conveniently via the stream. In the active learning-powered recipes, the progress is calculated based on the loss, but in all other cases, Prodigy also checks if the original stream returned by the recipe exposes a __len__. If so, this is used to calculate the progress (and could then also be used to expose the number of remaining examples).
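To make the idea concrete, here is a rough sketch of how a length-based progress and a "remaining" count could be derived from a stream. This is only an illustration, not Prodigy's actual implementation: the function names estimate_progress and remaining_examples and the n_annotated counter are hypothetical.

```python
def estimate_progress(stream, n_annotated):
    """Return progress as a fraction in [0, 1], or None if the length is unknown."""
    # Plain generators have no __len__, so no progress can be computed for them.
    if not hasattr(stream, "__len__"):
        return None
    total = len(stream)
    if total == 0:
        return 1.0
    return min(n_annotated / total, 1.0)


def remaining_examples(stream, n_annotated):
    """Return how many examples are left, or None if the length is unknown."""
    if not hasattr(stream, "__len__"):
        return None
    return max(len(stream) - n_annotated, 0)
```

With a list of four examples and one annotation, this gives a progress of 0.25 and three remaining examples. For a plain generator, both functions return None.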
Since Prodigy only checks for the __len__, this would also allow exposing a stream length for generators, e.g. like this:
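class StreamWrapper:
    def __init__(self, stream, total):
        self.stream = stream
        self.total = total

    def __len__(self):
        # Report the known total so the length check succeeds
        return self.total

    def __iter__(self):
        yield from self.stream

stream = StreamWrapper(stream, 123456)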
A solution like that could be useful if you want to stream in a large number of examples, but you do know the number upfront (e.g. if you’re reading them from a database or something).
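As a minimal sketch of that use case, the example below counts the records in a JSONL file in a cheap first pass and then streams them lazily through a wrapper that reports the count. The file-based source and the helper names count_lines, read_jsonl and load_stream_with_length are assumptions made for illustration. With a database, the total would instead come from something like a count query.

```python
import json


class StreamWrapper:
    """Wrap a generator and expose a known length via __len__."""

    def __init__(self, stream, total):
        self.stream = stream
        self.total = total

    def __len__(self):
        return self.total

    def __iter__(self):
        yield from self.stream


def count_lines(path):
    """Count non-empty lines without loading the whole file into memory."""
    with open(path, encoding="utf8") as f:
        return sum(1 for line in f if line.strip())


def read_jsonl(path):
    """Lazily yield one parsed record per non-empty line."""
    with open(path, encoding="utf8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def load_stream_with_length(path):
    """Hypothetical helper: a lazy stream whose total is known upfront."""
    return StreamWrapper(read_jsonl(path), count_lines(path))
```

The trade-off is one extra pass over the file to get the count. The records themselves are still read one at a time. Like any generator, the wrapped stream can only be iterated once.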