Sporadic Bad Gateway 502 user-specific errors in global code search
Summary
While searching gitlab-org/gitlab on gitlab.com, I came across 500 and 502 errors that persisted across multiple requests, refreshes, and search attempts with the same user but could not be reproduced with other users. After a brief discussion with @changzhengliu and @terrichu, we decided to create this issue for future reference.
https://gitlab.com/search?search=%22web_url%22+%22namespace%22&nav_source=navbar&project_id=2009901&group_id=9970&search_code=true&repository_ref=master
Steps to reproduce
TBD
What is the current bug behavior?
Application
On the application level, this manifests as a normal HTTP 500 error.
Logs
At the Rails and logging level, however, we can reliably observe a 502 error followed by a 500 error with a Zoekt stack trace.
502 Bad Gateway
<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n
In some cases, the HTML body of the 502 response appears to have been handed to the JSON parser, which failed on it:
unexpected character (after ) at line 1, column 1 [parse.c:804] in '<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n
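To illustrate the failure mode above, here is a minimal sketch of a guard that checks the HTTP status and content type before attempting to parse a search backend response as JSON. It is not the actual `Gitlab::Search::Zoekt::Client#parse_response` implementation: `parse_search_response` and `UpstreamResponseError` are hypothetical names, and Ruby's standard `JSON` library stands in for whatever parser `Gitlab::Json` uses.

```ruby
require 'json'

# Hypothetical error class for illustration only; not part of GitLab.
class UpstreamResponseError < StandardError; end

# Hypothetical helper: refuse to JSON-parse a response that is not a
# successful JSON response, so an nginx 502 HTML page surfaces as a clear
# upstream error instead of a parser failure.
def parse_search_response(status:, content_type:, body:)
  unless (200..299).cover?(status)
    raise UpstreamResponseError, "search backend returned HTTP #{status}"
  end

  unless content_type.to_s.start_with?('application/json')
    raise UpstreamResponseError, "unexpected content type: #{content_type.inspect}"
  end

  JSON.parse(body)
rescue JSON::ParserError => e
  # A 2xx response that still carries a malformed body.
  raise UpstreamResponseError, "invalid JSON from search backend: #{e.message}"
end
```

With a guard like this, the nginx error page would be reported as "search backend returned HTTP 502" rather than as an "unexpected character" parse error, which may make such incidents easier to spot in the logs.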
500 Failed to open TCP connection to 10.216.8.36:443
Errno::ECONNREFUSED
Failed to open TCP connection to 10.216.8.36:443 (Connection refused - connect(2) for \"10.216.8.36\" port 443)
"lib/gitlab/json.rb:107:in `rescue in adapter_load'",
"lib/gitlab/json.rb:102:in `adapter_load'",
"lib/gitlab/json.rb:28:in `parse'",
"ee/lib/gitlab/search/zoekt/client.rb:165:in `parse_response'",
"ee/lib/gitlab/search/zoekt/client.rb:50:in `search'",
"ee/lib/gitlab/search/zoekt/client.rb:15:in `search'",
"ee/lib/gitlab/zoekt/search_results.rb:152:in `zoekt_search_and_wrap'",
"ee/lib/gitlab/zoekt/search_results.rb:118:in `search_as_found_blob'",
"ee/lib/gitlab/zoekt/search_results.rb:96:in `block in blobs'",
"ee/lib/gitlab/zoekt/search_results.rb:95:in `blobs'",
"ee/lib/gitlab/zoekt/search_results.rb:37:in `blobs_count'",
"ee/lib/gitlab/zoekt/search_results.rb:33:in `formatted_count'",
"app/controllers/search_controller.rb:102:in `block in count'",
"app/models/application_record.rb:73:in `block (2 levels) in with_fast_read_statement_timeout'",
"app/models/concerns/cross_database_modification.rb:92:in `block in transaction'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:111:in `public_send'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:111:in `block in read_using_load_balancer'",
"lib/gitlab/database/load_balancing/load_balancer.rb:63:in `read'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:110:in `read_using_load_balancer'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:75:in `transaction'",
"lib/gitlab/database.rb:359:in `block in transaction'",
"lib/gitlab/database.rb:358:in `transaction'",
"app/models/concerns/cross_database_modification.rb:83:in `transaction'",
"app/models/application_record.rb:70:in `block in with_fast_read_statement_timeout'",
"lib/gitlab/database/load_balancing/session.rb:95:in `fallback_to_replicas_for_ambiguous_queries'",
"app/models/application_record.rb:69:in `with_fast_read_statement_timeout'",
"app/controllers/search_controller.rb:101:in `count'",
"app/controllers/application_controller.rb:547:in `block in allow_gitaly_ref_name_caching'",
"lib/gitlab/gitaly_client.rb:457:in `allow_ref_name_caching'",
"app/controllers/application_controller.rb:546:in `allow_gitaly_ref_name_caching'",
"ee/lib/gitlab/ip_address_state.rb:10:in `with'",
"ee/app/controllers/ee/application_controller.rb:45:in `set_current_ip_address'",
"app/controllers/application_controller.rb:498:in `set_current_admin'",
"lib/gitlab/session.rb:11:in `with_session'",
"app/controllers/application_controller.rb:489:in `set_session_storage'",
"lib/gitlab/i18n.rb:114:in `with_locale'",
"lib/gitlab/i18n.rb:120:in `with_user_locale'",
"app/controllers/application_controller.rb:480:in `set_locale'",
"app/controllers/application_controller.rb:473:in `set_current_context'",
"ee/lib/omni_auth/strategies/group_saml.rb:41:in `other_phase'",
"lib/gitlab/metrics/elasticsearch_rack_middleware.rb:16:in `call'",
"lib/gitlab/middleware/memory_report.rb:13:in `call'",
"lib/gitlab/middleware/speedscope.rb:13:in `call'",
"lib/gitlab/database/load_balancing/rack_middleware.rb:23:in `call'",
"lib/gitlab/middleware/rails_queue_duration.rb:33:in `call'",
"lib/gitlab/etag_caching/middleware.rb:21:in `call'",
"lib/gitlab/metrics/rack_middleware.rb:16:in `block in call'",
"lib/gitlab/metrics/web_transaction.rb:46:in `run'",
"lib/gitlab/metrics/rack_middleware.rb:16:in `call'",
"lib/gitlab/middleware/go.rb:20:in `call'",
"lib/gitlab/middleware/query_analyzer.rb:11:in `block in call'",
"lib/gitlab/database/query_analyzer.rb:37:in `within'",
"lib/gitlab/middleware/query_analyzer.rb:11:in `call'",
"lib/gitlab/middleware/multipart.rb:173:in `call'",
"lib/gitlab/middleware/read_only/controller.rb:50:in `call'",
"lib/gitlab/middleware/read_only.rb:18:in `call'",
"lib/gitlab/middleware/same_site_cookies.rb:27:in `call'",
"lib/gitlab/middleware/path_traversal_check.rb:25:in `call'",
"lib/gitlab/middleware/handle_malformed_strings.rb:21:in `call'",
"lib/gitlab/middleware/basic_health_check.rb:25:in `call'",
"lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'",
"lib/gitlab/middleware/request_context.rb:15:in `call'",
"lib/gitlab/middleware/webhook_recursion_detection.rb:15:in `call'",
"config/initializers/fix_local_cache_middleware.rb:11:in `call'",
"lib/gitlab/middleware/compressed_json.rb:44:in `call'",
"lib/gitlab/middleware/rack_multipart_tempfile_factory.rb:19:in `call'",
"lib/gitlab/middleware/sidekiq_web_static.rb:20:in `call'",
"lib/gitlab/metrics/requests_rack_middleware.rb:79:in `call'",
"lib/gitlab/middleware/release_env.rb:13:in `call'"
What is the expected correct behavior?
The query is performed as expected:
https://gitlab.com/search?search=%22web_url%22+%22namespace%22&nav_source=navbar&project_id=2009901&group_id=9970&search_code=true&repository_ref=master
Relevant logs and/or screenshots
See jdsalaro_zoekt_500s.json, as exported from https://log.gprd.gitlab.net/app/discover#/?_g=h@e5dc602&_a=h@590ff3b
Possible fixes
TBD, although this seems like a temporary condition arising at the Kubernetes/network level.
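If the condition is indeed transient, one possible mitigation is to retry refused connections a bounded number of times with a short backoff. The sketch below only illustrates that idea and is not a proposed patch: `with_connection_retry` and its parameters are hypothetical, and whether retrying is appropriate for search requests is an open question.

```ruby
# Hypothetical helper: run the block, retrying when the TCP connection is
# refused. The delay doubles after each failed attempt. `sleeper` is
# injectable so the backoff can be observed without actually sleeping.
def with_connection_retry(attempts: 3, base_delay: 0.1, sleeper: ->(seconds) { sleep(seconds) })
  tries = 0
  begin
    tries += 1
    yield
  rescue Errno::ECONNREFUSED
    # Give up and re-raise once the attempt budget is exhausted.
    raise if tries >= attempts

    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end
```

A caller would wrap the request that reaches the search backend, for example `with_connection_retry { perform_search_request }`, where `perform_search_request` is a placeholder for the real call.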