Unable to fetch pod logs with Elasticstack integration
Summary
Currently, if the Elasticstack integration is enabled for a certificate-connected Kubernetes cluster, attempting to fetch pod logs fails with the message "There was an error fetching the logs".
I couldn't find the error in Kibana, but it does appear in Sentry.
The error does not occur with the same setup on a self-managed instance (14.2.5). Using the Rails console on GitLab.com, I was able to reproduce the specific error being encountered:
[ gprd ] production> cluster = Clusters::Cluster.find(171751)
=> #<Clusters::Cluster id: 171751, user_id: 4554657, provider_type: "user", platform_type: "kubernetes", created_at: "2022-01-17 18:04:23.012468000 +0000", updated_at: "2022-01-17 18:24:12.100096000 +0000", enabled: true, name: "autodevops-og", environment_scope: "*", cluster_type: "project_type", domain: "clan-...
[ gprd ] production> client = cluster&.elasticsearch_client
=> #<Elasticsearch::Transport::Client:0x00007f6bdf4b6228 @options={:url=>"https://[redacted]/api/v1/namespaces/gitlab-managed-apps/services/elastic-stack-elasticsearch-master:9200/proxy", :logger=>nil, :tracer=>nil, :reload_connections=>false, :retry_on_failure=>false, :reload_on_failure=>false, :randomize_ho...
[ gprd ] production> client.search
Traceback (most recent call last):
2: from (irb):20
1: from lib/gitlab/instrumentation/elasticsearch_transport.rb:12:in `perform_request'
Faraday::ConnectionFailed (SSL peer certificate or SSH remote key was not OK)
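The traceback shows that TLS verification of the cluster's API server (reached through the Kubernetes service proxy URL) failed. A certificate-connected cluster is registered with its own CA certificate, and its API server usually presents a certificate signed by that private CA. One plausible cause, which is an assumption and is not confirmed by the Sentry trace, is that the client returned by `elasticsearch_client` no longer trusts that CA. The sketch below uses only Ruby's standard `openssl` library to show the failure mode: a certificate signed by a private CA verifies against a trust store that contains the CA and fails against one that does not. `build_cert`, `trusted?`, and the `CN` values are hypothetical and stand in for the real cluster; the sketch does not touch GitLab code.

```ruby
require "openssl"

# Hypothetical helper: generate an RSA key and a certificate for `subject`.
# Without an issuer, the certificate is self-signed (a stand-in for the
# cluster's private CA).
def build_cert(subject:, issuer_cert: nil, issuer_key: nil, ca: false)
  key = OpenSSL::PKey::RSA.new(2048)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # X.509 v3
  cert.serial = rand(1..2**32)
  cert.subject = OpenSSL::X509::Name.parse(subject)
  cert.issuer = issuer_cert ? issuer_cert.subject : cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600

  if ca
    # Mark the certificate as a CA so it can anchor a chain.
    ef = OpenSSL::X509::ExtensionFactory.new
    ef.subject_certificate = cert
    ef.issuer_certificate = cert
    cert.add_extension(ef.create_extension("basicConstraints", "CA:TRUE", true))
    cert.add_extension(ef.create_extension("keyUsage", "keyCertSign,cRLSign", true))
  end

  # Sign with the issuer's key, or with our own key when self-signed.
  cert.sign(issuer_key || key, OpenSSL::Digest.new("SHA256"))
  [cert, key]
end

# Hypothetical helper: does `leaf` chain to one of `trusted_cas`?
def trusted?(leaf, trusted_cas)
  store = OpenSSL::X509::Store.new
  trusted_cas.each { |ca_cert| store.add_cert(ca_cert) }
  store.verify(leaf)
end

ca_cert, ca_key = build_cert(subject: "/CN=example-cluster-ca", ca: true)
server_cert, _server_key = build_cert(
  subject: "/CN=example-kube-apiserver",
  issuer_cert: ca_cert,
  issuer_key: ca_key
)

puts trusted?(server_cert, [ca_cert]) # prints "true": the cluster CA is trusted
puts trusted?(server_cert, [])        # prints "false": no trust anchor for the chain
```

The second call mirrors the "SSL peer certificate or SSH remote key was not OK" outcome: the server certificate itself is valid, but the verifying side has nothing to chain it to. Whether the change referenced under "Possible fixes" actually stops passing the cluster CA to the transport would still need to be checked against the code.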
Steps to reproduce
- Create or find a project deployed to a certificate-connected Kubernetes cluster.
- Install Elasticstack on the cluster, following our docs.
- Enable the Elasticstack integration in the cluster settings.
- Attempt to view the pod logs for a given deployment.
What is the current bug behavior?
Fetching pod logs fails with an SSL certificate verification error.
What is the expected correct behavior?
Pod logs are fetched successfully.
Output of checks
This bug happens on GitLab.com.
Possible fixes
Sentry shows that this error started appearing on January 12th, with commit 4724cb4d929. That commit contains the changes from this MR, and January 12th is also the day the feature flag associated with that MR was enabled. I believe this change is related to this issue.