Kugelfischer That's basically how the original sampling works today. The problem we found is that many file types that are inefficient overall (VHDX, for instance) fill in their data sequentially starting at the front of the file, so we'd sample a bunch of data, find nothing compressible, and then give up on a lot of later data that would've compressed. You can work around that by sampling much more often, but the operation is expensive, and we're hoping to find something cheaper and more deterministic. We also want to understand the file type so we don't waste cycles sampling at all. My ultimate goal is to have compression on by default, but I can't do that until this work is done.
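To make the idea concrete, here's a minimal sketch (not our actual implementation) of what "sampling more spread out" could look like: instead of probing only the head of the file, compress small chunks at evenly spaced offsets and estimate compressibility from those. The names and thresholds (`looks_compressible`, `CHUNK`, `SAMPLES`, `RATIO_THRESHOLD`) are all hypothetical, and zlib stands in for whatever compressor the file system actually uses.

```python
# Hypothetical probe: sample chunks spread across the file instead of
# only the beginning, so front-loaded formats like VHDX don't fool us.
import os
import zlib

CHUNK = 64 * 1024          # bytes per sample (assumed)
SAMPLES = 8                # number of probe points (assumed)
RATIO_THRESHOLD = 0.9      # "compressible" if samples shrink below this (assumed)

def looks_compressible(path: str) -> bool:
    size = os.path.getsize(path)
    if size == 0:
        return False
    step = max(size // SAMPLES, CHUNK)
    total_in = total_out = 0
    with open(path, "rb") as f:
        for offset in range(0, size, step):
            f.seek(offset)
            data = f.read(CHUNK)
            if not data:
                break
            total_in += len(data)
            # Level 1 is a cheap probe; we only need a rough ratio estimate.
            total_out += len(zlib.compress(data, 1))
    return total_out / total_in < RATIO_THRESHOLD
```

Even this still pays the cost of reading and compressing the samples, which is why a deterministic, file-type-aware check would be preferable.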
I have a very sharp algorithm dev I'm noodling on this with; it's just a question of priorities for now. We're very busy and have an 'ok' solution in the meantime.