Sorter Batch Size / Local Sorters?

When correcting the annotations on a number of large images, there is a substantial loading time. Based on the recipe in "Using Prodigy to train a new Computer Vision object detection model", it looks like the prefer_uncertain function pulls all (or a large number) of the images through the loading and prediction pipeline. I wrote a small wrapper function for the sorters, local_wrapper, so that they only operate on small minibatches. Presumably there is a better way to adjust the ‘batch size’? It didn't appear in PRODIGY_README.html#sorters, and all of the docstrings have been stripped out. Setting “batch_size”: 10 in prodigy.json also does not seem to affect the results. I ask because the model I am running is very time-consuming, and it would be nice to run it on fewer samples, particularly during development.

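```python
from itertools import islice

def local_wrapper(sorter_func, n=10):
    """Run sorter_func on consecutive minibatches of at most n examples."""
    def _new_sorter(in_stream):
        in_stream = iter(in_stream)  # one shared iterator, so slices don't restart
        while True:
            m_batch = list(islice(in_stream, n))
            if not m_batch:  # stream exhausted
                return
            yield from sorter_func(m_batch)  # sort this minibatch only
    return _new_sorter
```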
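One way to confirm that a minibatch wrapper like this really limits the up-front work is to count how many images a stand-in model has scored by the time the first example comes out. The sketch below is self-contained: `minibatch_sorter` is a standalone equivalent of the wrapper, while `fake_model` and `most_uncertain_first` are hypothetical stand-ins for the real prediction step and for a sorter such as prefer_uncertain. None of these names are Prodigy APIs.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Scored = Tuple[float, Any]
scored_by_model: List[str] = []  # records every image the stand-in model scores


def fake_model(images: Iterable[Tuple[str, float]]) -> Iterator[Scored]:
    """Hypothetical stand-in for an expensive model: scores images lazily."""
    for name, score in images:
        scored_by_model.append(name)
        yield score, name


def most_uncertain_first(batch: List[Scored]) -> List[Any]:
    """Toy sorter: order one minibatch by how close each score is to 0.5."""
    return [eg for score, eg in sorted(batch, key=lambda pair: abs(pair[0] - 0.5))]


def minibatch_sorter(sorter_func: Callable[[List[Scored]], Iterable[Any]], n: int = 10):
    """Apply sorter_func to consecutive minibatches of at most n examples."""
    def _sorter(stream: Iterable[Scored]) -> Iterator[Any]:
        stream = iter(stream)
        while True:
            batch = list(islice(stream, n))
            if not batch:
                return
            yield from sorter_func(batch)
    return _sorter


images = [("img0", 0.9), ("img1", 0.55), ("img2", 0.1), ("img3", 0.5), ("img4", 0.7)]
sorted_stream = minibatch_sorter(most_uncertain_first, n=2)(fake_model(images))
first = next(sorted_stream)  # "img1"
print(scored_by_model)       # ['img0', 'img1']: only one minibatch scored so far
```

With n=2, only the first two images have gone through the model when the first example is yielded, rather than the entire stream; the trade-off is that the ordering is only "most uncertain first" within each minibatch.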

Ah, cool to see that you're trying the image recipe!

The prefer_uncertain sorter has an initial "warm-up" period during which it conditions its moving averages. The size of this pre-batch is defined by the first_n attribute (currently 64 and not exposed as an argument – but we can easily fix that!). In the meantime, you should be able to simply overwrite first_n after the sorter is initialised:

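```python
from prodigy.components.sorters import prefer_uncertain

sorted_stream = prefer_uncertain(stream)  # stream yields (score, example) tuples
sorted_stream.first_n = 10  # shrink the warm-up pre-batch from the default 64
```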

The initial pre-batching is the only batching the sorter will do – after that, it simply yields the examples, provided they meet the threshold. I guess a pre-batch of 64 examples was slightly more optimised for working with text – for images, it definitely makes sense to adjust that.
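To make the warm-up behaviour concrete, here is a minimal sketch of a sorter that works the way this reply describes: it pre-batches first_n examples to initialise a running average of uncertainty, then yields each later example as soon as it clears that average. This is not Prodigy's implementation; the class name `WarmupUncertainSorter`, the uncertainty formula and the exponential `decay` update are assumptions chosen for illustration. Because first_n is only read when iteration starts, overwriting it after construction takes effect, mirroring the `sorted_stream.first_n = 10` trick.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, Tuple

ScoredExample = Tuple[float, Any]  # (model score, example)


def uncertainty(score: float) -> float:
    """1.0 when the score is 0.5 (most uncertain), 0.0 at 0.0 or 1.0."""
    return 1.0 - abs(score - 0.5) * 2


class WarmupUncertainSorter:
    """Hypothetical sketch of a warm-up sorter; NOT Prodigy's prefer_uncertain."""

    def __init__(self, stream: Iterable[ScoredExample], first_n: int = 64,
                 decay: float = 0.9) -> None:
        self.stream = stream
        self.first_n = first_n  # only read in __iter__, so it can be overwritten
        self.decay = decay      # weight of the old average in each update

    def __iter__(self) -> Iterator[Any]:
        stream = iter(self.stream)
        # Warm-up: the only batching step. Pull first_n examples and use them
        # to initialise the running average of uncertainty.
        warmup = list(islice(stream, self.first_n))
        if not warmup:  # empty stream (or first_n < 1): nothing to calibrate on
            return
        avg = sum(uncertainty(score) for score, _ in warmup) / len(warmup)
        for score, eg in warmup:
            if uncertainty(score) >= avg:
                yield eg
        # After warm-up: update the average one example at a time and yield
        # each example as soon as it meets the threshold.
        for score, eg in stream:
            u = uncertainty(score)
            avg = self.decay * avg + (1 - self.decay) * u
            if u >= avg:
                yield eg


scored = [(0.5, "eg0"), (0.99, "eg1"), (0.45, "eg2"),
          (0.01, "eg3"), (0.5, "eg4"), (0.98, "eg5")]
sorted_stream = WarmupUncertainSorter(scored)
sorted_stream.first_n = 2  # overwrite after initialisation
print(list(sorted_stream))  # ['eg0', 'eg2', 'eg4']
```

Only the warm-up step pulls a fixed block of examples; after that, each example is scored and yielded (or skipped) individually, which is why lowering first_n is what shortens the initial wait.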

Damn, the docstrings shouldn't be stripped out! I spent so much time writing nice docstrings for the internals so that you could call help() on them if you need or want to :sweat: We'll check our compiler settings and hopefully fix that for the next release!
