This all looks reasonable! Just one quick comment: the auto_count_stream and total_examples_target settings were both only introduced in v1.11, so they won't have any effect in v1.10. If you want to use them, you'll need to upgrade to v1.11. If you can, it would be interesting to try this in a separate environment to see if it solves the problem you're seeing.
I've tried out your recipe with the same settings and a random data file, and I can't seem to reproduce the problem. Some things to check on your end:
- What's in the input JSONL files? Do they contain duplicates? How many examples are in them? Do you see "No tasks available" at the beginning of the file or do you actually hit the end? (Maybe you want to set "force_stream_order": true so that refreshing the browser doesn't request the next batch? This only makes sense if you only have one user per instance, though.)
- Since you're running multiple instances, do you have enough memory?
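
For the first check, here's a rough sketch of a script that counts the examples in a JSONL file and reports duplicates. It assumes every line is a JSON object with a "text" key and treats two examples as duplicates if their text is identical. That's my assumption for illustration and may not match exactly how Prodigy decides that two examples are the same. The summarize_jsonl helper is a made-up name, not part of Prodigy.

```python
import json
import sys
from collections import Counter
from pathlib import Path


def summarize_jsonl(path, key="text"):
    """Return (number of examples, {value: count} for values seen more than once).

    Assumption: each non-empty line is a JSON object and `key` identifies
    the example. Records without `key` are grouped under None.
    """
    counts = Counter()
    total = 0
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            record = json.loads(line)
            total += 1
            counts[record.get(key)] += 1
    duplicates = {value: n for value, n in counts.items() if n > 1}
    return total, duplicates


if __name__ == "__main__":
    # Usage: python check_jsonl.py file1.jsonl file2.jsonl
    for path in sys.argv[1:]:
        total, duplicates = summarize_jsonl(path)
        extra_rows = sum(n - 1 for n in duplicates.values())
        print(f"{path}: {total} examples, {extra_rows} redundant rows "
              f"across {len(duplicates)} duplicated values")
```

If the number of redundant rows is large relative to the total, that could explain running out of tasks earlier than the raw line count suggests.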