combine_models - the effect of batch_size?

Hello team!

I have been trying to get the prefer_high_scores sorter to look over a larger window of examples, so that the bootstrapped (although relatively rare) texts are presented first. I came across this thread, where
the function combine_models is discussed.

Would setting a higher batch_size on combine_models achieve just that? I have found nothing about it in the docs, so maybe it has already been deprecated. The goal is simply to get the examples containing the patterns to come up first (currently I have to run the rule-based matcher in a separate script to pre-assign the scores, which is not very convenient).
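For reference, here is roughly what my separate pre-scoring script looks like (the file names and score values are just placeholders for my actual setup):

```python
import spacy
import srsly
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
# Assuming token-based patterns in Prodigy's JSONL format, e.g.
# {"label": "LABEL", "pattern": [{"lower": "example"}]}
for entry in srsly.read_jsonl("my_patterns.jsonl"):
    matcher.add(entry["label"], [entry["pattern"]])

def pre_score(stream):
    # Give pattern matches a high score so that prefer_high_scores
    # surfaces them first
    for eg in stream:
        doc = nlp(eg["text"])
        eg["score"] = 0.9 if matcher(doc) else 0.1
        yield eg

srsly.write_jsonl("pre_scored.jsonl", pre_score(srsly.read_jsonl("raw_texts.jsonl")))
```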

Thank you in advance!
Best wishes,
Jan

Hi! There are two batch sizes here: first, the batch size used to partition the two generators and interleave them, and second, the batch size that Prodigy uses to divide the final stream into batches.
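To make the first one concrete, the idea is roughly this (a simplified sketch, not Prodigy's actual source):

```python
from itertools import islice, tee

def combined_predict(model_one, model_two, batch_size=32):
    # Simplified sketch: duplicate the stream, let each model score its
    # copy, then interleave the two (score, example) generators batch-wise.
    def predict(stream):
        stream_one, stream_two = tee(stream)
        generators = [model_one(stream_one), model_two(stream_two)]
        while generators:
            for gen in list(generators):
                batch = list(islice(gen, batch_size))
                if not batch:
                    generators.remove(gen)
                    continue
                yield from batch  # (score, example) tuples
    return predict
```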

The batch_size on combine_models is less relevant here and is mostly used for efficiency. In the end, the predict function still just yields (score, example) tuples. The batch_size setting in Prodigy is what decides how many examples are fetched from the stream at once, how many are sent to the web app, and how many are sent back to the server.
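For example, a custom recipe along the lines of textcat.teach that combines a text classifier with a pattern matcher and sorts with prefer_high_scores could look like this (a rough sketch based on the documented API, so details may differ slightly between versions; the recipe name is a placeholder):

```python
import spacy
import prodigy
from prodigy.components.loaders import JSONL
from prodigy.components.sorters import prefer_high_scores
from prodigy.models.matcher import PatternMatcher
from prodigy.models.textcat import TextClassifier
from prodigy.util import combine_models

@prodigy.recipe("textcat.teach-high-scores")
def teach_high_scores(dataset, spacy_model, source, label, patterns):
    nlp = spacy.load(spacy_model)
    model = TextClassifier(nlp, label.split(","))
    matcher = PatternMatcher(nlp).from_disk(patterns)
    # predict yields (score, example) tuples produced by both models
    predict, update = combine_models(model, matcher)
    stream = prefer_high_scores(predict(JSONL(source)))
    return {
        "dataset": dataset,
        "stream": stream,
        "update": update,
        "view_id": "classification",
        # this batch_size controls how many examples are fetched from
        # the stream and sent to the web app at once
        "config": {"batch_size": 50},
    }
```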

See my reply here for more ideas on how you could solve your problem.

Oh, I see!

So far, I have compared the bootstrapped textcat.teach stream with the textcat.manual stream, and since the patterns were not very common, the streams were almost identical. I will try using a higher batch_size setting.

Thank you so much!
