As I understand it, it's already bundled in Omnibus and behind a flag (@craig-gomes can you confirm / point us in the direction of how to configure?). If this is the case, this will only take a day at most.
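If it is indeed gated behind a flag, the opt-in would presumably look something like the following. This is only a sketch pending confirmation: `gitlab-ctl pg-upgrade` and `gitlab-psql` are real Omnibus commands, but the exact target-version option is an assumption, so check the command's `--help` output first.

```shell
# Sketch: opting an Omnibus install into the bundled PostgreSQL 11.
# The --target-version option is an assumption; verify with:
#   sudo gitlab-ctl pg-upgrade --help
sudo gitlab-ctl pg-upgrade --target-version=11

# Confirm which server version the bundled cluster is now running.
sudo gitlab-psql -c 'SHOW server_version;'
```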
Looking to complete this tomorrow. The 10k environment is now upgraded and some quick tests are showing results in line with previous runs. The full run will happen automatically overnight.
The first spike, I believe, is api_v4_projects_merge_requests, a known problem endpoint. It did reduce from 100% to 80%, but I've seen this endpoint still spike higher on occasion.
The second spike on 11.x can be ignored in the comparison context. It comes from a brand new search test that is known to cause this spike (the test wasn't present for the 10.x run, but it causes the same behavior there).
The third spike can also be ignored. This is the web_user test mentioned earlier, which is unrealistic on this environment.
In conclusion, PG11 is performant and is actually a performance gain on the whole. While achieved RPS remained consistent, with some specific gains, response times have notably improved, by around 7% at the median. CPU, memory and I/O usage also remained consistent.
Will close with the above but happy to reopen as required. Well done to the team for the upgrade, both in terms of the performance and in the super smooth upgrade process in Omnibus.
Do you by chance have the Postgres config handy that was used for the test cluster? A SHOW ALL in psql would suffice, in case you can get that easily. Also: what is the total memory available on that system? Thanks!
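For reference, one quick way to capture both on the test box (a sketch; `gitlab-psql` assumes the Omnibus-bundled wrapper, and the output filename is arbitrary):

```shell
# Dump the full running configuration of the cluster to a file.
sudo gitlab-psql -c 'SHOW ALL;' > pg11_settings.txt

# Report total and available memory on the host (Linux).
free -h
```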
@grantyoung Just wondering, were there any tests run that involved Sidekiq, especially import/export? It looks like PG11 improved performance by about 7%, which is good to hear. We're currently puzzling over a significant slowdown we were seeing in export jobs on staging, which appeared to coincide with the Postgres upgrade, but it might just be coincidence: gitlab-org/gitlab#216540 (closed)
No, import/export wasn't tested as part of this work, as it was a performance test. Testing export time with PG 11 is more of a functional test imo. I'm unsure if the team who did the PG11 work tested that.
> Testing export time with PG 11 is more of a functional test imo.
How so? I think of performance as being a non-functional property, unless you're in the business of "selling performance" :-) I did not mean to say that these tests should have happened as part of this test suite, of course, but I would disagree with saying that Sidekiq workloads should not be performance tested. They are just as affected by changes in database or caching infrastructure as our online workloads are. If they suffer performance regressions, it will have knock-on effects on the entire product, and we should know about that.
We were trying to make steps in that direction with https://gitlab.com/gitlab-org/memory-team/import-export-performance but it's fairly crude currently, since all we have is durations. But now we're at a point where we're seeing 30% slowdowns in these tests and we don't know why. It is likely not related to the pg upgrade, but all I'm saying is that we should ultimately treat this with the same attention as req/rep workloads.
Yeah, that's all a fair shout; I used the incorrect term there (functional), apologies. Let me clarify below.
This is a subjective take of mine, but I find the terms functional and non-functional aren't used as much these days, as both are now equally important (performance really is part of functionality for users). For example, if an export took an incredible amount of time, like 48 hours, I would say the feature isn't really functional.
What I meant to say above when I said functional is that testing (singular) export time with a new DB version should happen earlier in the test chain and be part of the quality gate criteria, performed by the core team working on the feature.
Testing the feature at scale though (multiple concurrent exports) is something that could happen later in the test chain as that falls more neatly in that traditional "non-functional" space. That's something we (Quality) could explore as an automated test.
Happy to discuss that further. Back to export though: a 30% drop is indeed notable. Are you wanting some help with investigating that?
Thanks for clarifying, that's a really good take on this actually.
> Are you wanting some help with investigating that?
I gotta admit I'm stuck right now, and I had meant to time-box the investigation, since I'm also conscious of how much time we might spend on this, and we might multiply it the more folks we involve. We don't even know right now whether the slowdown also happens on prod, because we don't run these tests against prod, only staging.
I posted some updates in the ticket around things I had looked at, but so far they look like dead ends. If you have any ideas for what else to consider, do let me know!
Grant Young changed title from "Test Postgres11 against the 10k reference architecture environment" to "Performance Test Postgres11 against the 10k reference architecture environment"
Grant Young changed the description