When correcting the annotations on a large number of images, there is a substantial loading delay because (judging by the recipe here: Using Prodigy to train a new Computer Vision object detection model) the prefer_uncertain function pulls all, or at least a large number of, images through the loading and prediction pipeline. I wrote a small wrapper for the sorters, local_wrapper, so that they only operate on small minibatches. Presumably there is a better way to adjust the 'batch size'? It doesn't appear in PRODIGY_README.html#sorters, and all of the docstrings have been stripped out. Setting "batch_size": 10 in prodigy.json also does not seem to affect the results. I ask because the model I am running is very time-consuming, and it would be nice to run on fewer samples, particularly during development.
from itertools import islice

def local_wrapper(sorter_func, n=10):
    """Wrap a sorter so it only ever sees minibatches of n + 1 examples."""
    def _new_sorter(in_stream):
        for first_ele in in_stream:
            # Pull up to n more examples off the stream to form a minibatch
            m_batch = [first_ele] + list(islice(in_stream, n))
            yield from sorter_func(m_batch)
    return _new_sorter
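To show what I mean, here is the wrapper run against a stand-in sorter (a plain batch sort by distance from 0.5, standing in for an uncertainty-based sorter so the example is self-contained and doesn't need Prodigy installed); the stream names and scores are made up for illustration:

```python
from itertools import islice

def local_wrapper(sorter_func, n=10):
    """Wrap a sorter so it only ever sees minibatches of n + 1 examples."""
    def _new_sorter(in_stream):
        for first_ele in in_stream:
            # Pull up to n more examples off the stream to form a minibatch
            m_batch = [first_ele] + list(islice(in_stream, n))
            yield from sorter_func(m_batch)
    return _new_sorter

def toy_sorter(batch):
    # Stand-in for an uncertainty sorter: order (score, example) tuples
    # by how close the score is to 0.5
    return sorted(batch, key=lambda se: abs(se[0] - 0.5))

stream = iter([(0.9, "a"), (0.5, "b"), (0.1, "c"), (0.55, "d")])
result = list(local_wrapper(toy_sorter, n=1)(stream))
print(result)
# → [(0.5, 'b'), (0.9, 'a'), (0.55, 'd'), (0.1, 'c')]
```

With n=1 the sorter is only ever handed two examples at a time, so nothing beyond the current minibatch is pulled through the loading and prediction pipeline before the first examples come out.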