I think there are several ways to make this work, and which one fits best depends on the details of your use case. When you load in your files, do they show up in alphabetical order? (Under the hood, Prodigy calls Path.iterdir, which I thought was alphabetical, but it turns out the order depends on your operating system.)
Instead of loading in a directory of images, you could also load in a JSONL file (and set --loader jsonl when you start image.manual). If an image task specifies a "text", that's going to be used in the history pane in the sidebar. So you could use that to give the history entries more readable names, and make sure the files are presented in the right order:
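Here's a rough sketch of how such a JSONL file could be generated. It's only an illustration: the helper names `build_tasks` and `write_jsonl`, the list of file extensions, and the choice to use the file name as the "text" are all my assumptions. The "image" value here is a local path, which is also an assumption. Depending on your setup, you may need a URL or base64-encoded data instead.

```python
import json
from pathlib import Path

# Assumed set of image file extensions; adjust to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_tasks(image_dir):
    """Return one task dict per image, explicitly sorted by file name."""
    paths = sorted(
        (p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_EXTENSIONS),
        key=lambda p: p.name,  # don't rely on the OS-dependent iterdir order
    )
    # "text" is what would show up as the readable name in the history.
    return [{"image": str(p), "text": p.stem} for p in paths]


def write_jsonl(tasks, out_path):
    """Write one JSON object per line."""
    with open(out_path, "w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")
```

You could then call something like `write_jsonl(build_tasks("./images"), "tasks.jsonl")` and pass the resulting file to the recipe with `--loader jsonl`.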
You can also include other metadata in the tasks – e.g. "group": 1. This will be passed through with the data and saved in the database (and you can later use that to more easily filter your annotations).
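As a minimal sketch of that filtering step, assuming you've exported your annotations to a JSONL file (for example with `prodigy db-out`) and that each task carries the "group" key from the example above. The function names `load_jsonl` and `filter_by_group` are just made up for illustration:

```python
import json


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def filter_by_group(examples, group):
    """Keep only the examples whose "group" value matches."""
    return [eg for eg in examples if eg.get("group") == group]
```

Examples without a "group" key are simply skipped, since `eg.get("group")` returns `None` for them.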
One thing to note about the history is that it will always show you the 10 most recently edited examples (which makes sense). So if an annotator clicks on an older annotation to change it, it will show up on top, because it was most recently edited. This means you can't rely on the history always showing the original group order.
Datasets in Prodigy are what annotations are saved to – to load in data, you just stream it in from a directory, a file or a Python script. So you can set up your data like I described above and stream in page by page from multiple PDFs in order, however you like. That's a pretty standard use case.
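To illustrate the "page by page from multiple PDFs" idea, here's a hedged sketch of a Python generator. It assumes a hypothetical layout where each PDF has already been exported to its own folder of page images (e.g. `doc2/page_1.png`), and it uses a natural sort so that page 2 comes before page 10. The names `natural_key` and `stream_pages` are mine, and how you hook the generator up to Prodigy (for example in a custom recipe) is left out here.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that compares digit runs as numbers: page_2 < page_10."""
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", name)
    ]


def stream_pages(root):
    """Yield one task per page, document by document, in natural order."""
    pdf_dirs = sorted(
        (d for d in Path(root).iterdir() if d.is_dir()),
        key=lambda d: natural_key(d.name),
    )
    for group, pdf_dir in enumerate(pdf_dirs, start=1):
        pages = sorted(
            (p for p in pdf_dir.iterdir() if p.is_file()),
            key=lambda p: natural_key(p.name),
        )
        for page_no, page in enumerate(pages, start=1):
            yield {
                "image": str(page),  # assumed: a path your setup can load
                "text": f"{pdf_dir.name}, page {page_no}",
                "group": group,  # one group number per source document
            }
```

Because it's a generator, pages are produced lazily and in a fixed order, no matter how the operating system happens to list the files.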
I'm not 100% sure I understand what the problem is or what's still missing. If you stream in your pages in order, the annotator can work on them and if they need to go back to a previous page, they can go back, check something or make a correction, and then move on.
The history shows the most recently edited or created annotations, in chronological order. This is typically what's expected from the annotation history: if it wasn't showing you the annotations in "historical" order, it would be pretty confusing. The items kept in the history are the examples that haven't yet been sent back to the server, so making the history longer would mean delaying sending examples back (see my comment here for details).