How to track block size pre-compression?
I'd like to audit my many-gigabyte data streams by reading arbitrary chunks and verifying that they contain the files I expect. It's easy to snip up the index file so that it reads only the blocks I want. Unfortunately, the sizes in the index file refer to the compressed data stream, and it's not obvious how to map those recorded sizes to the uncompressed block sizes.
I could wrap the compressor in a program that measures the input and output sizes and records them, but compression is done in parallel, and since the block number isn't passed around, it's not obvious how to reliably correlate a size computed this way with the correct block.
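For concreteness, here is a minimal sketch of the kind of wrapper I mean. It assumes the compressor reads a block on stdin and writes it to stdout, and that one block fits in memory. The script itself, the `SIZE_LOG` variable, and the `sizes.log` default are made-up names for illustration, not anything the tool provides.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: run a compressor on stdin and log input/output sizes."""
import os
import subprocess
import sys


def measure(cmd, data):
    """Feed `data` to `cmd` on stdin; return (in_size, out_size, output)."""
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return len(data), len(result.stdout), result.stdout


def main(cmd):
    # Assumption: a single block is small enough to hold in memory.
    data = sys.stdin.buffer.read()
    in_size, out_size, output = measure(cmd, data)

    # Pass the compressed block through unchanged.
    sys.stdout.buffer.write(output)
    sys.stdout.buffer.flush()

    # Append one "uncompressed compressed" pair per block.
    # SIZE_LOG is a hypothetical environment variable for this sketch.
    log_path = os.environ.get("SIZE_LOG", "sizes.log")
    with open(log_path, "a") as log:
        log.write(f"{in_size} {out_size}\n")


# Only act when a compressor command is given, e.g. `wrapper.py gzip -c`.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

This only produces unordered size pairs. With parallel workers the log lines land in completion order, so the only thing tying a pair back to a block is the compressed size, and that is ambiguous whenever two blocks compress to the same size.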
Any suggestions? Having a way to pass the block number to a command in the script would make this really simple, and it should be pretty easy to implement. Even more convenient would be for the procs to pass through the original size (and record both sizes in the index), but that's harder and possibly not well defined for all procs.