Review optimal Webservice Puma configuration on Hybrid Architectures

As part of the ongoing work to bring Hybrid environment support into GET, we noticed that performance results for the prospective 10k Hybrid Reference Architecture were slower than those for the standard 10k architecture.

To start, here are the results for the two architectures side by side:

Test (TTFB results)	10k	10k Hybrid 56W/560T	Comparison (10k minus Hybrid)
api_v4_groups	125.51	115.84	9.67
api_v4_groups_group	5720.13	5209.57	510.56
api_v4_groups_group_subgroups	137.44	206.21	-68.77
api_v4_groups_issues	2163.58	2791.93	-628.35
api_v4_groups_merge_requests	1920.77	1735.94	184.83
api_v4_groups_projects	1804.07	2166.57	-362.5
api_v4_projects	3451.27	4870.84	-1419.57
api_v4_projects_deploy_keys	58.92	71.1	-12.18
api_v4_projects_issues	371.84	1719.03	-1347.19
api_v4_projects_issues_issue	267.64	1265.89	-998.25
api_v4_projects_languages	50.42	77.06	-26.64
api_v4_projects_merge_requests	261.18	889.1	-627.92
api_v4_projects_merge_requests_merge_request	112.99	310.92	-197.93
api_v4_projects_merge_requests_merge_request_changes	2301.62	3365.4	-1063.78
api_v4_projects_merge_requests_merge_request_commits	103.37	118.12	-14.75
api_v4_projects_merge_requests_merge_request_discussions	260.13	691.78	-431.65
api_v4_projects_project	162.85	257.84	-94.99
api_v4_projects_project_pipelines	70	74.97	-4.97
api_v4_projects_project_pipelines_pipeline	82.82	83.18	-0.36
api_v4_projects_project_services	55.29	54.17	1.12
api_v4_projects_releases	2585.22	3330.96	-745.74
api_v4_projects_repository_branches	105.82	173.02	-67.2
api_v4_projects_repository_branches_branch	75.03	93.38	-18.35
api_v4_projects_repository_commits	69.6	85.53	-15.93
api_v4_projects_repository_commits_commit	122.97	134.14	-11.17
api_v4_projects_repository_commits_commit_diff	125.54	140.67	-15.13
api_v4_projects_repository_compare_commits	174.21	252.5	-78.29
api_v4_projects_repository_files_file	104.35	118.65	-14.3
api_v4_projects_repository_files_file_blame	9852.56	10771.93	-919.37
api_v4_projects_repository_files_file_raw	100.21	102.97	-2.76
api_v4_projects_repository_tags	1301.79	1468.94	-167.15
api_v4_projects_repository_tree	99.25	98.24	1.01
api_v4_user	44.17	44.13	0.04
api_v4_users	81.74	88.6	-6.86
git_ls_remote	63.89	56.82	7.07
git_pull	86.3	86.67	-0.37
git_push	582.37	589.12	-6.75
scenario_api_list_group_variables	128.43	85.09	43.34
scenario_api_list_project_variables	143.22	106.02	37.2
scenario_api_new_branches	363.51	353.2	10.31
scenario_api_new_commits	455.28	443.42	11.86
scenario_api_new_group_variables	98.11	65.11	33
scenario_api_new_issues	945.42	246.17	699.25
scenario_api_new_project_variables	106.59	83.5	23.09
web_group	158.3	163.35	-5.05
web_group_issues	394.36	352.04	42.32
web_group_merge_requests	379.98	355.12	24.86
web_project	353.91	281.77	72.14
web_project_branches	573.22	535.52	37.7
web_project_commit	9268.39	9238.67	29.72
web_project_commits	651.69	496.7	154.99
web_project_file_blame	3777	4464.9	-687.9
web_project_file_rendered	2246.41	2607.8	-361.39
web_project_file_source	2021.99	2655.58	-633.59
web_project_files	281.08	227.19	53.89
web_project_issue	1085.45	1069.01	16.44
web_project_issues	427.59	334.06	93.53
web_project_merge_request_changes	526.83	556.27	-29.44
web_project_merge_request_commits	783.18	779.79	3.39
web_project_merge_request_discussions	3405.51	3688.91	-283.4
web_project_merge_requests	366.69	369.31	-2.62
web_project_pipelines	726.77	710.8	15.97
web_project_pipelines_pipeline	1447.44	1388.35	59.09
web_project_tags	921.55	873.63	47.92
web_user	130.54	90.07	40.47
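
The Comparison column is the 10k result minus the 10k Hybrid result, so a negative value means the Hybrid environment was slower. As a minimal sketch of how that column (plus a relative change, which is not in the original table) could be recomputed, the snippet below uses four rows copied from the table. The `compare` helper is a hypothetical name introduced only for illustration.

```python
# A few rows copied from the table above: (test, 10k TTFB, 10k Hybrid TTFB).
rows = [
    ("api_v4_projects_issues", 371.84, 1719.03),
    ("api_v4_projects_merge_requests", 261.18, 889.10),
    ("scenario_api_new_issues", 945.42, 246.17),
    ("web_project_file_blame", 3777.00, 4464.90),
]


def compare(standard, hybrid):
    """Return (difference, percent change of Hybrid relative to the 10k result)."""
    # Same sign convention as the Comparison column: negative means Hybrid is slower.
    diff = round(standard - hybrid, 2)
    pct = round((hybrid - standard) / standard * 100, 1)
    return diff, pct


for name, standard, hybrid in rows:
    diff, pct = compare(standard, hybrid)
    verdict = "Hybrid slower" if diff < 0 else "Hybrid faster"
    print(f"{name}: diff={diff}, hybrid change={pct:+}% ({verdict})")
```

Looking at the relative change alongside the absolute difference helps separate small shifts on fast endpoints from large regressions such as `api_v4_projects_issues`.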

The Hybrid environment is notably slower than the standard environment on many endpoints, particularly the API tests (a negative Comparison value means Hybrid was slower). This is due to a difference in Puma configuration. Puma is currently recommended to be configured as follows on the 10k and 10k Hybrid architectures:

10k - Configured automatically by Omnibus to fill the available CPU and memory. This currently translates to 26 workers on each of the 3 nodes (32 vCPU, 28.8 GB memory each) in the reference architecture. The default thread count per worker is 4, chosen after extensive testing to find the sweet spot. Total count = 78W/312T
10k Hybrid - Recommended to be 28 pods, each with 2 workers and 10 threads per worker, across 4 . Total count = 56W/560T
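
The totals above follow from simple multiplication. The sketch below reproduces them from the figures given in this issue; `puma_totals` is a hypothetical helper name, and "instances" stands for nodes on the standard architecture and pods on the Hybrid one.

```python
def puma_totals(instances, workers_per_instance, threads_per_worker):
    """Return (total workers, total threads) for a Puma deployment."""
    workers = instances * workers_per_instance
    threads = workers * threads_per_worker
    return workers, threads


# 10k: 3 nodes, 26 workers per node, 4 threads per worker.
standard = puma_totals(3, 26, 4)
# 10k Hybrid: 28 pods, 2 workers per pod, 10 threads per worker.
hybrid = puma_totals(28, 2, 10)

print(f"10k:        {standard[0]}W/{standard[1]}T")
print(f"10k Hybrid: {hybrid[0]}W/{hybrid[1]}T")
print(f"Worker gap: {standard[0] - hybrid[0]}")
```

The Hybrid setup has more threads in total but 22 fewer workers than the standard architecture.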

In extensive Puma testing we found that worker count is still the main lever for raw performance. Thread count behaves more like a sweet spot: anything higher than 4 returned little value and, if anything, degraded performance.

However, these findings are based on Omnibus tests. The task is therefore to review what the optimal Puma configuration is in Kubernetes to achieve performance comparable to the standard environment, taking into account the added considerations k8s brings, such as CPU and memory limits and avoiding resource limit kills.
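
As a starting point for that review, one option is to list pod layouts that would match the standard architecture's 78 workers at 4 threads each. The sketch below does only that arithmetic. The cap on workers per pod is an assumption chosen for illustration, and the code does not model CPU or memory limits, which would still need to be measured on a real cluster.

```python
TARGET_WORKERS = 78      # total workers on the standard 10k architecture
THREADS_PER_WORKER = 4   # sweet spot found in the Omnibus tests


def candidate_layouts(target_workers, max_workers_per_pod):
    """Yield (pods, workers_per_pod) pairs whose total reaches the target.

    The pod count is rounded up so the total never falls below the target.
    """
    for workers_per_pod in range(1, max_workers_per_pod + 1):
        pods = -(-target_workers // workers_per_pod)  # ceiling division
        yield pods, workers_per_pod


# max_workers_per_pod=4 is a hypothetical cap, not a recommendation.
for pods, workers_per_pod in candidate_layouts(TARGET_WORKERS, 4):
    total = pods * workers_per_pod
    print(f"{pods} pods x {workers_per_pod} workers = "
          f"{total}W/{total * THREADS_PER_WORKER}T")
```

Each candidate would then need to be tested against the same suite to see whether the Omnibus findings on workers and threads hold in Kubernetes.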

Edited Mar 08, 2021 by Grant Young