test: optimize storage tests RAM usage through the use of Blobber and math/rand/v2
What does this MR do?
This MR optimizes the RAM usage of the storage drivers test suite in order to resolve issues in #1482 (closed). It builds upon the changes made in:
!2034 (merged) !2035 (merged) !2036 (merged) !2037 (merged) !2038 (merged)
Improvements:
- using Blobber to serve all blobs from a single pre-allocated 2GiB buffer of random data (this was not the case before), so memory is allocated only once and pressure on the garbage collector is limited
- optimized memory usage in a few places by passing the random number generator (RNG) directly instead of byte slices
- prioritize usage of ReaderFrom and WriterTo, which minimize the use of intermediate memory and reduce CPU usage and test execution time
- fixed potential test reliability/validation issues where blobs stored in the backend were being overwritten with identical data, resulting in no effective changes in the storage
- improved test debugging in TestWriteReadLargeStreams:
  - replaced the checksum comparison with a byte-by-byte stream comparison, so we can now see the actual differences
  - tests now print the seed used to initialize the RNG, which will enable us to debug issues like #1470 (closed)
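The RNG-passing and debugging points above can be illustrated with a small Go sketch. It is not the code from this MR: rngReader, newStream and firstDiff are hypothetical names, and PCG is simply one of the sources offered by math/rand/v2. The sketch streams reproducible data straight from a seeded generator and reports the offset of the first differing byte instead of comparing checksums.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"math/rand/v2"
)

// rngReader is a hypothetical helper: it streams pseudo-random bytes straight
// from the generator, so callers never hold the whole payload in a byte slice.
type rngReader struct {
	rng *rand.Rand
}

func (r *rngReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = byte(r.rng.Uint32())
	}
	return len(p), nil
}

// newStream returns a reader yielding n reproducible bytes for the given seed.
func newStream(seed uint64, n int64) io.Reader {
	rng := rand.New(rand.NewPCG(seed, seed))
	return io.LimitReader(&rngReader{rng: rng}, n)
}

// firstDiff compares two streams byte by byte. It returns the offset of the
// first mismatch, or -1 if both streams have identical content and length.
func firstDiff(a, b io.Reader) (int64, error) {
	ra, rb := bufio.NewReader(a), bufio.NewReader(b)
	for off := int64(0); ; off++ {
		ca, errA := ra.ReadByte()
		cb, errB := rb.ReadByte()
		if errA != nil && errA != io.EOF {
			return off, errA
		}
		if errB != nil && errB != io.EOF {
			return off, errB
		}
		if errA == io.EOF || errB == io.EOF {
			if errA == errB {
				return -1, nil // both streams ended together
			}
			return off, nil // one stream is shorter than the other
		}
		if ca != cb {
			return off, nil
		}
	}
}

func main() {
	seed := uint64(42) // assumption: a real test would pick this at random
	fmt.Printf("seed: %d\n", seed)

	same, _ := firstDiff(newStream(seed, 1<<16), newStream(seed, 1<<16))
	other, _ := firstDiff(newStream(seed, 1<<16), newStream(seed+1, 1<<16))
	fmt.Println("same seed, first difference at:", same)
	fmt.Println("different seed, first difference at:", other)
}
```

Because the seed is printed, a failing run can be replayed with the same data, and the reported offset points at the exact byte where the streams diverge.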
The memory usage of the current master is:
$ go tool pprof mem.prof
File: azure.test
Type: alloc_space
Time: Jan 13, 2025 at 2:10pm (CET)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 7292.42MB, 99.46% of 7332.22MB total
Dropped 235 nodes (cum <= 36.66MB)
Showing top 10 nodes out of 11
flat flat% sum% cum cum%
6140.37MB 83.75% 83.75% 6140.87MB 83.75% io.ReadAll
1024MB 13.97% 97.71% 1024MB 13.97% github.com/docker/distribution/registry/storage/driver/testsuites.randomContents.func1
128.04MB 1.75% 99.46% 128.04MB 1.75% bufio.NewWriterSize (inline)
0 0% 99.46% 128.54MB 1.75% github.com/docker/distribution/registry/storage/driver/azure/v2.(*driver).Writer
0 0% 99.46% 128.04MB 1.75% github.com/docker/distribution/registry/storage/driver/azure/v2.(*driver).newWriter (inline)
0 0% 99.46% 128.04MB 1.75% github.com/docker/distribution/registry/storage/driver/base.(*Base).Writer
0 0% 99.46% 7297.93MB 99.53% github.com/docker/distribution/registry/storage/driver/testsuites.(*DriverSuite).TestConcurrentFileStreams.func1
0 0% 99.46% 7298.43MB 99.54% github.com/docker/distribution/registry/storage/driver/testsuites.(*DriverSuite).testFileStreams
0 0% 99.46% 1024MB 13.97% github.com/docker/distribution/registry/storage/driver/testsuites.randomContents
0 0% 99.46% 1029.26MB 14.04% sync.(*Once).Do (inline)
and with these changes:
$ go tool pprof azure.test mem.prof
File: azure.test
Type: alloc_space
Time: Jan 20, 2025 at 5:45pm (CET)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 3304.88MB, 95.57% of 3458.18MB total
Dropped 339 nodes (cum <= 17.29MB)
Showing top 10 nodes out of 99
flat flat% sum% cum cum%
2048MB 59.22% 59.22% 2048MB 59.22% github.com/docker/distribution/testutil.NewBlobberFactory
510.55MB 14.76% 73.99% 510.55MB 14.76% bytes.growSlice
249.54MB 7.22% 81.20% 755.78MB 21.85% github.com/docker/distribution/testutil.CreateRandomTarFile
236.08MB 6.83% 88.03% 236.08MB 6.83% bufio.NewWriterSize
182.62MB 5.28% 93.31% 182.62MB 5.28% io.ReadAll
35.07MB 1.01% 94.32% 545.12MB 15.76% io.copyBuffer
23.52MB 0.68% 95.00% 23.52MB 0.68% github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/exported.getWeightTables
9.50MB 0.27% 95.28% 17.50MB 0.51% encoding/xml.(*Decoder).Token
5.50MB 0.16% 95.44% 26.50MB 0.77% encoding/xml.(*Decoder).unmarshal
4.50MB 0.13% 95.57% 115.73MB 3.35% github.com/Azure/azure-sdk-for-go/sdk/azcore/internal/exported.(*Request).Next
So total allocations dropped from 7332.22MB to 3458.18MB: memory usage was roughly halved, a reduction of about 3.8GiB. Reducing it further requires refactoring how we handle temporary tar files. We can't really drop the 2GiB pre-allocation in the tests though, as we need a big contiguous blob of memory to exercise chunking in the drivers (i.e. we would allocate this memory anyway).
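To show why a single pre-allocated buffer does not multiply memory usage, here is a minimal sketch under stated assumptions. blobPool, newBlobPool and view are hypothetical names, the actual testutil Blobber API is not shown in this description and may differ, and the pool is 1MiB instead of 2GiB to keep the example light.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/rand/v2"
)

// blobPool is a hypothetical stand-in for a shared pool of random data.
type blobPool struct {
	data []byte
}

// newBlobPool allocates size bytes once and fills them from a seeded PCG
// generator, so the same seed always yields the same contents.
func newBlobPool(seed uint64, size int) *blobPool {
	rng := rand.New(rand.NewPCG(seed, seed))
	data := make([]byte, size)
	i := 0
	for ; i+8 <= size; i += 8 {
		binary.LittleEndian.PutUint64(data[i:], rng.Uint64())
	}
	for ; i < size; i++ {
		data[i] = byte(rng.Uint32())
	}
	return &blobPool{data: data}
}

// view returns a window into the pool without copying. The three-index slice
// caps the capacity, so an accidental append cannot overwrite later bytes.
func (p *blobPool) view(offset, length int) []byte {
	return p.data[offset : offset+length : offset+length]
}

func main() {
	const poolSize = 1 << 20 // 1MiB for the example only
	seed := uint64(2025)
	pool := newBlobPool(seed, poolSize)
	fmt.Printf("seed: %d\n", seed)

	small := pool.view(0, 1024)
	large := pool.view(0, poolSize)
	// Both views point into the same backing array, so this prints "true".
	fmt.Println(&small[0] == &large[0])
}
```

Every blob handed to a test is a slice of the same backing array, so the allocation cost is paid once regardless of how many blobs of whatever size the tests request.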
Related to #1482 (closed)
Author checklist
- Assign one of conventional-commit prefixes to the MR.
  - fix: Indicates a bug fix, triggers a patch release.
  - feat: Signals the introduction of a new feature, triggers a minor release.
  - perf: Focuses on performance improvements that don't introduce new features or fix bugs, triggers a patch release.
  - docs: Updates or changes to documentation. Does not trigger a release.
  - style: Changes that do not affect the code's functionality. Does not trigger a release.
  - refactor: Modifications to the code that do not fix bugs or add features but improve code structure or readability. Does not trigger a release.
  - test: Changes related to adding or modifying tests. Does not trigger a release.
  - chore: Routine tasks that don't affect the application, such as updating build processes, package manager configs, etc. Does not trigger a release.
  - build: Changes that affect the build system or external dependencies. May trigger a release.
  - ci: Modifications to continuous integration configuration files and scripts. Does not trigger a release.
  - revert: Reverts a previous commit. It could result in a patch, minor, or major release.
- Feature flags
  - This change does not require a feature flag
  - Added feature flag: ( Add the Feature flag tracking issue link here )
- Unit-tests
  - Unit-tests are not required
  - I added unit tests
- Documentation:
  - Documentation is not required
  - I added documentation
  - I created or linked to an existing issue for every added or updated TODO, BUG, FIXME or OPTIMIZE prefixed comment
- database changes including schema/background migrations:
  - Change does not introduce database changes
  - MR includes DB changes
    - Do not include code that depends on the schema migrations in the same commit. Split the MR into two or more.
    - Manually run up and down migrations in a postgres.ai production database clone and post a screenshot of the result here.
    - If adding new queries, extract a query plan from postgres.ai and post the link here. If changing existing queries, also extract a query plan for the current version for comparison.
    - I do not have access to postgres.ai and have made a comment on this MR asking for these to be run on my behalf.
  - If adding new background migration, follow the guide for performance testing new background migrations and add a report/summary to the MR with your analysis.
- Ensured this change is safe to deploy to individual stages in the same environment (cny->prod). State-related changes can be troublesome due to having parts of the fleet processing (possibly related) requests in different ways.
- If the change contains a breaking change, apply the breaking change label.
- If the change is considered high risk, apply the label high-risk-change
- Changes cannot be rolled back
  - Change can be safely rolled back
  - Change can't be safely rolled back
    - Apply the label cannot-rollback.
    - Add a section to the MR description that includes the following details:
    - The reasoning behind why a release containing the presented MR can not be rolled back (e.g. schema migrations or changes to the FS structure)
    - Detailed steps to revert/disable a feature introduced by the same change where a migration cannot be rolled back. (note: ideally MRs containing schema migrations should not contain feature changes.)
    - Ensure this MR does not add code that depends on these changes that cannot be rolled back.
Reviewer checklist
- Ensure the commit and MR title are still accurate.
- If the change contains a breaking change, verify the breaking change label.
- If the change is considered high risk, verify the label high-risk-change
- Identify if the change can be rolled back safely. (note: all other reasons for not being able to rollback will be sufficiently captured by major version changes).
Edited by João Pereira
