Submitting code review with 64 notes to MR can cause rack timeouts
Summary
Drafting a code review with 64 notes for an MR with 123 files, (+3701, −2569 changes), causes a rack timeout.
From the customer reporting this problem:
I have a user who was drafting a code review for what I'd consider a medium sized MR (123 files, +3701, −2569). He has 64 comments in his draft MR. There are 6 participants in the merge request; the original author, and 5 other people who have submitted code review comments. There are 125 commits and 61 pipelines associated with the MR.
When this user (or myself, impersonating this user), tries to submit the pending code review comments, it causes the CPU on the system to peg at 100% for a minute or so, mostly of puma worker processes. It also causes hundreds of megabytes of data to be pulled from the database. And the submission doesn't succeed, it times out after 60 seconds, so the code review comments still show as pending. Of course, the user tried submitting a few more times, all with the same result, leading to several minutes of degraded performance.
It looks like each time, some of the comments were submitted, and some were not; if you go down the list of comments in the review, the ones at the top have been submitted a number of times, while the one at the bottom has never been submitted, and a few close to the end have been submitted one or two times.
Steps to reproduce
- Currently unreproducible *
Example Project
What is the current bug behavior?
- Notes for code review get submitted but fails to complete and errors with a rack timeout
What is the expected correct behavior?
Relevant logs and/or screenshots
Error from logs (some parts removed for privacy):
{
"action": "publish",
"controller": "Projects::MergeRequests::DraftsController",
"correlation_id": "01HGM0D1CY1RFW7960XW3PTXZG",
"cpu_s": 13.082121,
"db_cached_count": 130,
"db_ci_cached_count": 0,
"db_ci_count": 0,
"db_ci_duration_s": 0.0,
"db_ci_replica_cached_count": 0,
"db_ci_replica_count": 0,
"db_ci_replica_duration_s": 0.0,
"db_ci_replica_wal_cached_count": 0,
"db_ci_replica_wal_count": 0,
"db_ci_wal_cached_count": 0,
"db_ci_wal_count": 0,
"db_count": 1421,
"db_duration_s": 36.95823,
"db_main_cached_count": 130,
"db_main_count": 1421,
"db_main_duration_s": 31.53,
"db_main_replica_cached_count": 0,
"db_main_replica_count": 0,
"db_main_replica_duration_s": 0.0,
"db_main_replica_wal_cached_count": 0,
"db_main_replica_wal_count": 0,
"db_main_wal_cached_count": 0,
"db_main_wal_count": 0,
"db_primary_cached_count": 130,
"db_primary_count": 1421,
"db_primary_duration_s": 31.53,
"db_primary_wal_cached_count": 0,
"db_primary_wal_count": 0,
"db_replica_cached_count": 0,
"db_replica_count": 0,
"db_replica_duration_s": 0.0,
"db_replica_wal_cached_count": 0,
"db_replica_wal_count": 0,
"db_write_count": 258,
"duration_s": 59.97863,
"exception.backtrace": [
"lib/gitlab/database/load_balancing/connection_proxy.rb:107:in `public_send'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:107:in `block in read_using_load_balancer'",
"lib/gitlab/database/load_balancing/load_balancer.rb:137:in `block in read_write'",
"lib/gitlab/database/load_balancing/load_balancer.rb:224:in `retry_with_backoff'",
"lib/gitlab/database/load_balancing/load_balancer.rb:126:in `read_write'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:106:in `read_using_load_balancer'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:48:in `select_all'",
"app/models/todo.rb:292:in `target'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `public_send'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:127:in `block in write_using_load_balancer'",
"lib/gitlab/database/load_balancing/load_balancer.rb:137:in `block in read_write'",
"lib/gitlab/database/load_balancing/load_balancer.rb:224:in `retry_with_backoff'",
"lib/gitlab/database/load_balancing/load_balancer.rb:126:in `read_write'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `write_using_load_balancer'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:78:in `transaction'",
"app/services/todo_service.rb:300:in `block in create_todos'",
"app/services/todo_service.rb:296:in `map'",
"app/services/todo_service.rb:296:in `create_todos'",
"app/services/todo_service.rb:361:in `create_mention_todos'",
"app/services/todo_service.rb:333:in `handle_note'",
"app/services/todo_service.rb:147:in `new_note'",
"app/services/notes/create_service.rb:88:in `when_saved'",
"app/services/notes/create_service.rb:40:in `block in execute'",
"app/services/notes/create_service.rb:62:in `execute_quick_actions'",
"app/services/notes/create_service.rb:23:in `execute'",
"app/services/draft_notes/publish_service.rb:61:in `create_note_from_draft'",
"app/services/draft_notes/publish_service.rb:37:in `block in publish_draft_notes'",
"app/services/draft_notes/publish_service.rb:35:in `map'",
"app/services/draft_notes/publish_service.rb:35:in `publish_draft_notes'",
"app/services/draft_notes/publish_service.rb:11:in `execute'",
"app/controllers/projects/merge_requests/drafts_controller.rb:56:in `publish'",
"ee/lib/gitlab/ip_address_state.rb:10:in `with'",
"ee/app/controllers/ee/application_controller.rb:45:in `set_current_ip_address'",
"app/controllers/application_controller.rb:498:in `set_current_admin'",
"lib/gitlab/session.rb:11:in `with_session'",
"app/controllers/application_controller.rb:489:in `set_session_storage'",
"lib/gitlab/i18n.rb:114:in `with_locale'",
"lib/gitlab/i18n.rb:120:in `with_user_locale'",
"app/controllers/application_controller.rb:480:in `set_locale'",
"app/controllers/application_controller.rb:473:in `set_current_context'",
"lib/gitlab/metrics/elasticsearch_rack_middleware.rb:16:in `call'",
"lib/gitlab/middleware/memory_report.rb:13:in `call'",
"lib/gitlab/middleware/speedscope.rb:13:in `call'",
"lib/gitlab/database/load_balancing/rack_middleware.rb:23:in `call'",
"lib/gitlab/middleware/rails_queue_duration.rb:33:in `call'",
"lib/gitlab/etag_caching/middleware.rb:21:in `call'",
"lib/gitlab/metrics/rack_middleware.rb:16:in `block in call'",
"lib/gitlab/metrics/web_transaction.rb:46:in `run'",
"lib/gitlab/metrics/rack_middleware.rb:16:in `call'",
"lib/gitlab/jira/middleware.rb:19:in `call'",
"lib/gitlab/middleware/go.rb:20:in `call'",
"lib/gitlab/middleware/query_analyzer.rb:11:in `block in call'",
"lib/gitlab/database/query_analyzer.rb:37:in `within'",
"lib/gitlab/middleware/query_analyzer.rb:11:in `call'",
"lib/gitlab/middleware/multipart.rb:173:in `call'",
"lib/gitlab/middleware/read_only/controller.rb:50:in `call'",
"lib/gitlab/middleware/read_only.rb:18:in `call'",
"lib/gitlab/middleware/same_site_cookies.rb:27:in `call'",
"lib/gitlab/middleware/path_traversal_check.rb:25:in `call'",
"lib/gitlab/middleware/handle_malformed_strings.rb:21:in `call'",
"lib/gitlab/middleware/basic_health_check.rb:25:in `call'",
"lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'",
"lib/gitlab/middleware/request_context.rb:15:in `call'",
"lib/gitlab/middleware/webhook_recursion_detection.rb:15:in `call'",
"config/initializers/fix_local_cache_middleware.rb:11:in `call'",
"lib/gitlab/middleware/compressed_json.rb:44:in `call'",
"lib/gitlab/middleware/rack_multipart_tempfile_factory.rb:19:in `call'",
"lib/gitlab/middleware/sidekiq_web_static.rb:20:in `call'",
"lib/gitlab/metrics/requests_rack_middleware.rb:79:in `call'",
"lib/gitlab/middleware/release_env.rb:13:in `call'"
],
"exception.class": "Rack::Timeout::RequestTimeoutException",
"exception.message": "Request ran for longer than 60000ms ",
"format": "json",
"gitaly_calls": 58,
"gitaly_duration_s": 2.303203,
"mem_bytes": 319988011,
"mem_mallocs": 1213148,
"mem_objects": 2636124,
"mem_total_bytes": 425432971,
"meta.caller_id": "Projects::MergeRequests::DraftsController#publish",
"meta.client_id": "user/27",
"meta.feature_category": "code_review_workflow",
"meta.user_id": 27,
"method": "POST",
"params": [
{
"key": "noteable_type",
"value": "MergeRequest"
},
{
"key": "noteable_id",
"value": 13380
},
{
"key": "note",
"value": "[FILTERED]"
},
{
"key": "approve",
"value": false
},
{
"key": "approval_password",
"value": "[FILTERED]"
},
{
"key": "namespace_id",
"value": "[REDACTED]"
},
{
"key": "project_id",
"value": "[REDACTED]"
},
{
"key": "merge_request_id",
"value": "629"
},
{
"key": "draft",
"value": {
"approval_password": "[FILTERED]",
"approve": false,
"note": "[FILTERED]",
"noteable_id": 13380,
"noteable_type": "MergeRequest"
}
}
],
"path": "/[REDACTED]/-/merge_requests/629/drafts/publish",
"pid": 3405326,
"queue_duration_s": 0.039656,
"rate_limiting_gates": [],
"redis_action_cable_calls": 63,
"redis_action_cable_duration_s": 0.710389,
"redis_action_cable_read_bytes": 64,
"redis_action_cable_write_bytes": 5182,
"redis_allowed_cross_slot_calls": 1,
"redis_cache_calls": 249,
"redis_cache_duration_s": 1.821822,
"redis_cache_read_bytes": 312,
"redis_cache_write_bytes": 40382,
"redis_calls": 1347,
"redis_duration_s": 6.965468,
"redis_feature_flag_calls": 16,
"redis_feature_flag_duration_s": 0.203122,
"redis_feature_flag_read_bytes": 1596,
"redis_feature_flag_write_bytes": 1917,
"redis_queues_calls": 124,
"redis_queues_duration_s": 0.667577,
"redis_queues_metadata_calls": 62,
"redis_queues_metadata_duration_s": 0.525367,
"redis_queues_metadata_read_bytes": 124,
"redis_queues_metadata_write_bytes": 13020,
"redis_queues_read_bytes": 124,
"redis_queues_write_bytes": 50812,
"redis_read_bytes": 5769,
"redis_repository_cache_calls": 1,
"redis_repository_cache_duration_s": 0.000505,
"redis_repository_cache_read_bytes": 102,
"redis_repository_cache_write_bytes": 48,
"redis_sessions_allowed_cross_slot_calls": 1,
"redis_sessions_calls": 3,
"redis_sessions_duration_s": 0.001454,
"redis_sessions_read_bytes": 204,
"redis_sessions_write_bytes": 592,
"redis_shared_state_calls": 829,
"redis_shared_state_duration_s": 3.035232,
"redis_shared_state_read_bytes": 3243,
"redis_shared_state_write_bytes": 33810,
"redis_write_bytes": 145763,
"remote_ip": "10.255.1.17",
"request_urgency": "low",
"status": 500,
"target_duration_s": 5,
"time": "2023-12-02T00:57:05.448Z",
"ua": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0",
"user_id": 27,
"view_duration_s": 0.0,
"worker_id": "puma_24"
}
Output of checks
Results of GitLab environment info
Affects gitlab-ee 16.5.1
Expand for output related to GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
Expand for output related to the GitLab application check
(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:check SANITIZE=true
)(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true
)(we will only investigate if the tests are passing)
Possible fixes
It looks like the 500 error is popping up due to this function: https://gitlab.com/gitlab-org/gitlab/-/blob/v16.5.1-ee/app/services/todo_service.rb?ref_type=tags#L352-362
I believe it's timing out when creating todos for mentioned_users
.