test: optimize storage tests RAM usage through the use of Blobber and math/rand/v2

What does this MR do?

This MR optimizes the RAM usage of the storage drivers test suite in order to resolve issues in #1482 (closed). It builds upon the changes made in:

!2034 (merged) !2035 (merged) !2036 (merged) !2037 (merged) !2038 (merged)

Improvements:

  • using Blobber to serve all blobs from the pre-allocated 2GiB of random data (this was not the case before), so we allocate the memory only once and limit pressure on the garbage collector
  • optimized memory usage in a few places by passing the random number generator (RNG) directly instead of byte slices
  • prioritized usage of ReaderFrom and WriterTo, which minimizes the use of intermediate memory and reduces CPU usage and test execution time
  • fixed potential test reliability/validation issues where blobs stored in the backend were being overwritten with identical data, resulting in no effective changes in the storage.
  • improved test debugging in TestWriteReadLargeStreams:
    • replaced checksum comparison with byte-by-byte stream comparison, so we can now see the actual differences
    • tests now print the seed used to initialize the RNG, which will enable us to debug issues like #1470 (closed)

The memory usage of the current master is:

$ go tool pprof mem.prof 
File: azure.test
Type: alloc_space
Time: Jan 13, 2025 at 2:10pm (CET)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 7292.42MB, 99.46% of 7332.22MB total
Dropped 235 nodes (cum <= 36.66MB)
Showing top 10 nodes out of 11
      flat  flat%   sum%        cum   cum%
 6140.37MB 83.75% 83.75%  6140.87MB 83.75%  io.ReadAll
    1024MB 13.97% 97.71%     1024MB 13.97%  github.com/docker/distribution/registry/storage/driver/testsuites.randomContents.func1
  128.04MB  1.75% 99.46%   128.04MB  1.75%  bufio.NewWriterSize (inline)
         0     0% 99.46%   128.54MB  1.75%  github.com/docker/distribution/registry/storage/driver/azure/v2.(*driver).Writer
         0     0% 99.46%   128.04MB  1.75%  github.com/docker/distribution/registry/storage/driver/azure/v2.(*driver).newWriter (inline)
         0     0% 99.46%   128.04MB  1.75%  github.com/docker/distribution/registry/storage/driver/base.(*Base).Writer
         0     0% 99.46%  7297.93MB 99.53%  github.com/docker/distribution/registry/storage/driver/testsuites.(*DriverSuite).TestConcurrentFileStreams.func1
         0     0% 99.46%  7298.43MB 99.54%  github.com/docker/distribution/registry/storage/driver/testsuites.(*DriverSuite).testFileStreams
         0     0% 99.46%     1024MB 13.97%  github.com/docker/distribution/registry/storage/driver/testsuites.randomContents
         0     0% 99.46%  1029.26MB 14.04%  sync.(*Once).Do (inline)

and with these changes:

$ go tool pprof azure.test mem.prof
File: azure.test
Type: alloc_space
Time: Jan 20, 2025 at 5:45pm (CET)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 3304.88MB, 95.57% of 3458.18MB total
Dropped 339 nodes (cum <= 17.29MB)
Showing top 10 nodes out of 99
      flat  flat%   sum%        cum   cum%
    2048MB 59.22% 59.22%     2048MB 59.22%  github.com/docker/distribution/testutil.NewBlobberFactory
  510.55MB 14.76% 73.99%   510.55MB 14.76%  bytes.growSlice
  249.54MB  7.22% 81.20%   755.78MB 21.85%  github.com/docker/distribution/testutil.CreateRandomTarFile
  236.08MB  6.83% 88.03%   236.08MB  6.83%  bufio.NewWriterSize
  182.62MB  5.28% 93.31%   182.62MB  5.28%  io.ReadAll
   35.07MB  1.01% 94.32%   545.12MB 15.76%  io.copyBuffer
   23.52MB  0.68% 95.00%    23.52MB  0.68%  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/exported.getWeightTables
    9.50MB  0.27% 95.28%    17.50MB  0.51%  encoding/xml.(*Decoder).Token
    5.50MB  0.16% 95.44%    26.50MB  0.77%  encoding/xml.(*Decoder).unmarshal
    4.50MB  0.13% 95.57%   115.73MB  3.35%  github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next

So we have roughly halved the memory usage: total allocations dropped from 7332.22MB to 3458.18MB, a reduction of about 3.8GiB. Further reduction requires refactoring how we handle temporary Tar files. We can't really drop pre-allocating the 2GiB of memory in the tests though, as we need a big contiguous blob of memory to exercise chunking in the drivers (i.e. we would allocate this memory anyway).

In case you are curious: mem_profile

Related to #1482 (closed)

Author checklist

  • Assign one of conventional-commit prefixes to the MR.
    • fix: Indicates a bug fix, triggers a patch release.
    • feat: Signals the introduction of a new feature, triggers a minor release.
    • perf: Focuses on performance improvements that don't introduce new features or fix bugs, triggers a patch release.
    • docs: Updates or changes to documentation. Does not trigger a release.
    • style: Changes that do not affect the code's functionality. Does not trigger a release.
    • refactor: Modifications to the code that do not fix bugs or add features but improve code structure or readability. Does not trigger a release.
    • test: Changes related to adding or modifying tests. Does not trigger a release.
    • chore: Routine tasks that don't affect the application, such as updating build processes, package manager configs, etc. Does not trigger a release.
    • build: Changes that affect the build system or external dependencies. May trigger a release.
    • ci: Modifications to continuous integration configuration files and scripts. Does not trigger a release.
    • revert: Reverts a previous commit. It could result in a patch, minor, or major release.
  • Feature flags
    • This change does not require a feature flag
    • Added feature flag: ( Add the Feature flag tracking issue link here )
  • Unit-tests
    • Unit-tests are not required
    • I added unit tests
  • Documentation:
  • database changes including schema/background migrations:
    • Change does not introduce database changes
    • MR includes DB changes
      • Do not include code that depends on the schema migrations in the same commit. Split the MR into two or more.
      • Manually run up and down migrations in a postgres.ai production database clone and post a screenshot of the result here.
      • If adding new queries, extract a query plan from postgres.ai and post the link here. If changing existing queries, also extract a query plan for the current version for comparison.
        • I do not have access to postgres.ai and have made a comment on this MR asking for these to be run on my behalf.
      • If adding new background migration, follow the guide for performance testing new background migrations and add a report/summary to the MR with your analysis.
  • Ensured this change is safe to deploy to individual stages in the same environment (cny -> prod). State-related changes can be troublesome due to having parts of the fleet processing (possibly related) requests in different ways.
  • If the change contains a breaking change, apply the breaking change label.
  • If the change is considered high risk, apply the label high-risk-change
  • Changes cannot be rolled back
    • Change can be safely rolled back
    • Change can't be safely rolled back
      • Apply the label cannot-rollback.
      • Add a section to the MR description that includes the following details:
        • The reasoning behind why a release containing the presented MR can not be rolled back (e.g. schema migrations or changes to the FS structure)
        • Detailed steps to revert/disable a feature introduced by the same change where a migration cannot be rolled back. (note: ideally MRs containing schema migrations should not contain feature changes.)
        • Ensure this MR does not add code that depends on these changes that cannot be rolled back.
Documentation/resources

Code review guidelines

Go Style guidelines

Reviewer checklist

  • Ensure the commit and MR title are still accurate.
  • If the change contains a breaking change, verify the breaking change label.
  • If the change is considered high risk, verify the label high-risk-change
  • Identify if the change can be rolled back safely. (note: all other reasons for not being able to rollback will be sufficiently captured by major version changes).
Edited by João Pereira
