Review optimal Webservice Puma configuration on Hybrid Architectures
As part of the ongoing work to bring Hybrid environment support into GET, it was noticed that performance results for the prospective 10k Hybrid Reference Architecture were slower than those for the standard 10k architecture.
To start, here is a comparison of the results between the two architectures:
Test TTFB Results | 10k | 10k Hybrid 56W/560T | Comparison (10k - Hybrid) |
---|---|---|---|
api_v4_groups | 125.51 | 115.84 | 9.67 |
api_v4_groups_group | 5720.13 | 5209.57 | 510.56 |
api_v4_groups_group_subgroups | 137.44 | 206.21 | -68.77 |
api_v4_groups_issues | 2163.58 | 2791.93 | -628.35 |
api_v4_groups_merge_requests | 1920.77 | 1735.94 | 184.83 |
api_v4_groups_projects | 1804.07 | 2166.57 | -362.5 |
api_v4_projects | 3451.27 | 4870.84 | -1419.57 |
api_v4_projects_deploy_keys | 58.92 | 71.1 | -12.18 |
api_v4_projects_issues | 371.84 | 1719.03 | -1347.19 |
api_v4_projects_issues_issue | 267.64 | 1265.89 | -998.25 |
api_v4_projects_languages | 50.42 | 77.06 | -26.64 |
api_v4_projects_merge_requests | 261.18 | 889.1 | -627.92 |
api_v4_projects_merge_requests_merge_request | 112.99 | 310.92 | -197.93 |
api_v4_projects_merge_requests_merge_request_changes | 2301.62 | 3365.4 | -1063.78 |
api_v4_projects_merge_requests_merge_request_commits | 103.37 | 118.12 | -14.75 |
api_v4_projects_merge_requests_merge_request_discussions | 260.13 | 691.78 | -431.65 |
api_v4_projects_project | 162.85 | 257.84 | -94.99 |
api_v4_projects_project_pipelines | 70 | 74.97 | -4.97 |
api_v4_projects_project_pipelines_pipeline | 82.82 | 83.18 | -0.36 |
api_v4_projects_project_services | 55.29 | 54.17 | 1.12 |
api_v4_projects_releases | 2585.22 | 3330.96 | -745.74 |
api_v4_projects_repository_branches | 105.82 | 173.02 | -67.2 |
api_v4_projects_repository_branches_branch | 75.03 | 93.38 | -18.35 |
api_v4_projects_repository_commits | 69.6 | 85.53 | -15.93 |
api_v4_projects_repository_commits_commit | 122.97 | 134.14 | -11.17 |
api_v4_projects_repository_commits_commit_diff | 125.54 | 140.67 | -15.13 |
api_v4_projects_repository_compare_commits | 174.21 | 252.5 | -78.29 |
api_v4_projects_repository_files_file | 104.35 | 118.65 | -14.3 |
api_v4_projects_repository_files_file_blame | 9852.56 | 10771.93 | -919.37 |
api_v4_projects_repository_files_file_raw | 100.21 | 102.97 | -2.76 |
api_v4_projects_repository_tags | 1301.79 | 1468.94 | -167.15 |
api_v4_projects_repository_tree | 99.25 | 98.24 | 1.01 |
api_v4_user | 44.17 | 44.13 | 0.04 |
api_v4_users | 81.74 | 88.6 | -6.86 |
git_ls_remote | 63.89 | 56.82 | 7.07 |
git_pull | 86.3 | 86.67 | -0.37 |
git_push | 582.37 | 589.12 | -6.75 |
scenario_api_list_group_variables | 128.43 | 85.09 | 43.34 |
scenario_api_list_project_variables | 143.22 | 106.02 | 37.2 |
scenario_api_new_branches | 363.51 | 353.2 | 10.31 |
scenario_api_new_commits | 455.28 | 443.42 | 11.86 |
scenario_api_new_group_variables | 98.11 | 65.11 | 33 |
scenario_api_new_issues | 945.42 | 246.17 | 699.25 |
scenario_api_new_project_variables | 106.59 | 83.5 | 23.09 |
web_group | 158.3 | 163.35 | -5.05 |
web_group_issues | 394.36 | 352.04 | 42.32 |
web_group_merge_requests | 379.98 | 355.12 | 24.86 |
web_project | 353.91 | 281.77 | 72.14 |
web_project_branches | 573.22 | 535.52 | 37.7 |
web_project_commit | 9268.39 | 9238.67 | 29.72 |
web_project_commits | 651.69 | 496.7 | 154.99 |
web_project_file_blame | 3777 | 4464.9 | -687.9 |
web_project_file_rendered | 2246.41 | 2607.8 | -361.39 |
web_project_file_source | 2021.99 | 2655.58 | -633.59 |
web_project_files | 281.08 | 227.19 | 53.89 |
web_project_issue | 1085.45 | 1069.01 | 16.44 |
web_project_issues | 427.59 | 334.06 | 93.53 |
web_project_merge_request_changes | 526.83 | 556.27 | -29.44 |
web_project_merge_request_commits | 783.18 | 779.79 | 3.39 |
web_project_merge_request_discussions | 3405.51 | 3688.91 | -283.4 |
web_project_merge_requests | 366.69 | 369.31 | -2.62 |
web_project_pipelines | 726.77 | 710.8 | 15.97 |
web_project_pipelines_pipeline | 1447.44 | 1388.35 | 59.09 |
web_project_tags | 921.55 | 873.63 | 47.92 |
web_user | 130.54 | 90.07 | 40.47 |
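
The Comparison column is the 10k value minus the 10k Hybrid value, so a negative number means the Hybrid environment was slower for that test. As a rough sketch of how the column can be reproduced and turned into a relative change, the snippet below uses four rows copied from the table. The function names and the percentage figure are illustrative additions and not part of the original test tooling.

```python
# Sample rows copied from the table above: (test, 10k TTFB, 10k Hybrid TTFB).
ROWS = [
    ("api_v4_groups", 125.51, 115.84),
    ("api_v4_projects_issues", 371.84, 1719.03),
    ("scenario_api_new_issues", 945.42, 246.17),
    ("web_project_file_blame", 3777.0, 4464.9),
]


def compare(standard, hybrid):
    """Return (difference, percent change of Hybrid relative to standard).

    difference = standard - hybrid, matching the Comparison column,
    so a negative difference means Hybrid was slower.
    """
    difference = round(standard - hybrid, 2)
    percent = round((hybrid - standard) / standard * 100, 1)
    return difference, percent


def summarize(rows):
    """Map each test name to its (difference, percent) pair."""
    return {name: compare(standard, hybrid) for name, standard, hybrid in rows}


if __name__ == "__main__":
    for name, (difference, percent) in summarize(ROWS).items():
        verdict = "slower" if difference < 0 else "faster"
        print(f"{name}: {difference:+.2f} ({abs(percent):.1f}% {verdict} on Hybrid)")
```

Looking at the relative change alongside the absolute difference helps separate large regressions on fast endpoints from small shifts on endpoints that are slow in both environments.
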
For many tests, most notably the api_v4_projects endpoints, the Hybrid environment notably underperforms the normal environment, although some tests (for example scenario_api_new_issues) are faster. This is attributed to a difference in Puma configuration. The current recommended Puma configuration for each architecture is:
- 10k - Omnibus automatically configures Puma to fill the available CPU and memory. This currently translates to 26 workers on each of the 3 nodes (32 vCPU, 28.8 GB memory) in the reference architecture. The default thread count per worker is 4, chosen after extensive testing to find the sweet spot. Total count = 78W/312T
- 10k Hybrid - Recommended to be 28 pods each with 2 workers and 10 threads across 4 . Total count = 56W/560T
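
The totals quoted in the two bullets follow from multiplying units by workers per unit, and then by threads per worker. A minimal sketch of that arithmetic is shown below. The `PumaLayout` class is a hypothetical helper introduced only for illustration, with the inputs taken from the bullets above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PumaLayout:
    """Hypothetical helper describing one Puma deployment layout."""

    units: int               # nodes (Omnibus) or pods (Hybrid)
    workers_per_unit: int    # Puma workers on each node or pod
    threads_per_worker: int  # Puma threads in each worker

    @property
    def total_workers(self):
        return self.units * self.workers_per_unit

    @property
    def total_threads(self):
        return self.total_workers * self.threads_per_worker

    def label(self):
        """Format totals in the W/T shorthand used in this issue."""
        return f"{self.total_workers}W/{self.total_threads}T"


# Figures taken from the two bullets above.
STANDARD_10K = PumaLayout(units=3, workers_per_unit=26, threads_per_worker=4)
HYBRID_10K = PumaLayout(units=28, workers_per_unit=2, threads_per_worker=10)

if __name__ == "__main__":
    print("10k:", STANDARD_10K.label())
    print("10k Hybrid:", HYBRID_10K.label())
```

Put side by side, the Hybrid layout has fewer workers (56 against 78) but more threads in total (560 against 312).
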
In extensive Puma testing we found that, for raw performance, worker count should still be treated as the main lever. Thread count behaves more like a sweet spot: anything higher than 4 returned little value and, if anything, degraded performance.
However, these findings are based on Omnibus tests. The task is therefore to review what the optimal Puma configuration is in Kubernetes to achieve performance comparable to the normal environment, taking into account the added considerations k8s brings, such as CPU and memory limits and avoiding resource limit kills.
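
One possible starting point for that review is to work out how many pods would be needed to match the 78 workers of the standard 10k environment at different workers-per-pod settings. The sketch below does only that arithmetic. The candidate values of 3 and 4 workers per pod are hypothetical, and the sketch does not model pod CPU or memory requests and limits, which would have to be validated separately.

```python
import math


def pods_for_worker_parity(target_workers, workers_per_pod):
    """Return the smallest pod count whose total workers reach the target."""
    if workers_per_pod < 1:
        raise ValueError("workers_per_pod must be at least 1")
    return math.ceil(target_workers / workers_per_pod)


# 78 workers is the standard 10k total quoted above.
TARGET_WORKERS = 78
# 4 threads per worker is the Omnibus sweet spot quoted above.
THREADS_PER_WORKER = 4

if __name__ == "__main__":
    # 2 is the current Hybrid setting; 3 and 4 are hypothetical candidates.
    for workers_per_pod in (2, 3, 4):
        pods = pods_for_worker_parity(TARGET_WORKERS, workers_per_pod)
        workers = pods * workers_per_pod
        threads = workers * THREADS_PER_WORKER
        print(f"{workers_per_pod} workers/pod -> {pods} pods = {workers}W/{threads}T")
```

Worker parity is only one candidate target. Since the Omnibus findings may not carry over to Kubernetes, any layout derived this way would still need to be confirmed by rerunning the performance tests.
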