
Support for environments with only AWS IMDSv2 enabled

Proposal

AWS recommends that customers disable IMDSv1. For example, AWS EKS best practice advises disabling IMDSv1 for nodes and pods.

A number of GitLab components rely on AWS IMDS, and it's not clear how many of them support IMDSv2.

Customers who deploy GitLab with IMDSv1 disabled are likely to have a bad experience, as some functionality will work and some will not.

This is likely to result in issues and tickets in which the broken functionality, such as IAM, is identified as the problem, rather than the fact that the customer has disabled IMDSv1.

Purpose of this issue

This is a pseudo-epic, intended to act as an SSOT (single source of truth) for IMDSv2 issues and to provide more context for the shift to IMDSv2.

A number of issues will be raised to specific engineering groups. Product managers may want to associate those issues to their own epics, so this issue has not been promoted to an epic.

Origin of this issue

A customer raised a ticket to troubleshoot cloud-native backups that weren't working. GitLab team members can read more in the ticket.

The cause was that they had disabled IMDSv1, and s3cmd doesn't seem to support IMDSv2.

The customer has turned IMDSv1 back on, as they are concerned that they would keep finding functionality that doesn't work with IMDSv1 disabled.

What is IMDSv2

  • Instance Metadata Service Version 2
  • IMDS is the AWS API that's available at 169.254.169.254
  • One use case is obtaining credentials in an environment that uses IAM.

For example, looking at the s3cmd code, it makes an HTTP connection to 169.254.169.254, and then:

request('GET', "/latest/meta-data/iam/security-credentials/")

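This returns the name of the IAM role attached to the instance. A follow-up GET to that role's path under security-credentials/ returns a JSON payload, from which AccessKeyId, SecretAccessKey, and Token can be extracted.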
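The extraction step described above can be sketched as follows. This is a minimal illustration, not s3cmd's actual code: the function name `parse_credentials` and the sample values are hypothetical, and the sample payload only assumes the three field names mentioned above plus a couple of fields that the IMDS credentials document normally carries. In IMDS the JSON document is served from the role-specific path (the bare `security-credentials/` path lists the role name).

```python
import json


def parse_credentials(payload: str) -> dict:
    """Pull the three credential fields out of an IMDS credentials document."""
    doc = json.loads(payload)
    return {
        "access_key": doc["AccessKeyId"],
        "secret_key": doc["SecretAccessKey"],
        "token": doc["Token"],
    }


# Placeholder values for illustration only; these are not real credentials.
sample = json.dumps({
    "Code": "Success",
    "AccessKeyId": "AKIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "Token": "example-session-token",
    "Expiration": "2030-01-01T00:00:00Z",
})

creds = parse_credentials(sample)
```

A missing field raises `KeyError`, which is usually preferable to silently continuing with partial credentials.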

However, it looks like s3cmd is only using IMDSv1, because the steps documented for using IMDSv2 are:

  • obtain a session token with: PUT "http://169.254.169.254/latest/api/token"
  • include that token in GET requests to the instance metadata service
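The two IMDSv2 steps above can be sketched with the Python standard library. This is a hedged illustration rather than a patch for s3cmd: the helper names (`build_token_request`, `build_metadata_request`, `fetch_role_names`) are hypothetical, and the 21600-second TTL is simply the maximum AWS allows, chosen here as an assumption. The header names are the ones AWS documents for IMDSv2.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"
TOKEN_TTL_SECONDS = 21600  # assumption: use the maximum TTL (six hours)


def build_token_request(ttl: int = TOKEN_TTL_SECONDS) -> urllib.request.Request:
    """Step 1: PUT to the token endpoint, asking for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: GET a metadata path, presenting the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_names(timeout: float = 2.0) -> str:
    """Run both steps. Only meaningful on an EC2 instance with IMDS enabled."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    req = build_metadata_request(
        "/latest/meta-data/iam/security-credentials/", token
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()
```

The request builders are kept separate from the network call so they can be inspected off-instance; `fetch_role_names()` itself will time out or fail anywhere the metadata address is not reachable.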

Details on availability

Demand is likely to grow

A quick search identified a number of articles and posts recommending IMDSv1 be disabled. (List moved to a comment)

EKS support took a few months to be announced, and I found suggestions that other AWS components also lacked full IMDSv2 support for varying lengths of time.

However, as AWS advises turning off IMDSv1, it's only a matter of time before this becomes common practice and customers require full support.

What GitLab support exists?

What is the nature of this support?

If we have added support for IMDSv2, it would be useful to know what functionality in GitLab:

  • Requires configuration to use IMDSv2 and, once configured, uses only IMDSv2 (so customers can 'tick boxes' on which bits are switched over), or
  • Uses IMDSv2 if available, falling back to v1 only if necessary.
  • Can handle IMDSv1 being turned off.

Customers who want to follow best practice, or who are directed to use IMDSv2 for compliance reasons, don't want to scope GitLab's support by turning off IMDSv1 and seeing what breaks.

Adding a section to our documentation would be helpful, so customers can see what's supported, what isn't, and what explicit configuration changes are needed.

Example:

The spec code associated with the Fog update includes this comment:

# If IMDSv2 is disabled, we should still fall back to IMDSv1

This implies that Fog will automatically use v2 if available.
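The "use v2 if available, otherwise fall back to v1" behaviour implied by that comment can be sketched generically. This is not Fog's implementation (Fog is Ruby); the function `fetch_with_fallback` and the stub callables are hypothetical names used only to illustrate the control flow.

```python
from typing import Callable, Optional


def fetch_with_fallback(
    get_token: Callable[[], str],
    get_metadata: Callable[[Optional[str]], str],
) -> str:
    """Prefer IMDSv2; fall back to a token-less (IMDSv1) request."""
    try:
        token: Optional[str] = get_token()
    except OSError:
        # Token endpoint unreachable or refused: treat IMDSv2 as unavailable.
        token = None
    # With a token this is an IMDSv2 request; with None it is IMDSv1.
    return get_metadata(token)


# Stubs standing in for real HTTP calls, so the flow can be exercised anywhere.
def token_ok() -> str:
    return "example-token"


def token_unavailable() -> str:
    raise OSError("token endpoint unavailable")


def fake_metadata(token: Optional[str]) -> str:
    return "served via v2" if token else "served via v1"


with_v2 = fetch_with_fallback(token_ok, fake_metadata)
without_v2 = fetch_with_fallback(token_unavailable, fake_metadata)
```

A client written this way keeps working when IMDSv1 is turned off, because the token path succeeds and the token-less request is never attempted. A client that only ever issues the token-less request is the case that breaks.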

Functionality inventory

What functionality do we have in GitLab, in the broadest sense, that uses IMDSv1 to obtain IAM credentials, or uses IMDS for anything else, and so would be impacted if IMDSv1 were disabled, per AWS best practice?

| Description | Status | Group | Scoping issue | Resolving issue/MR |
| --- | --- | --- | --- | --- |
| Fog (Rails) | fixed | ~"group::ecosystem" | n/a | MR (13.7): !48519 (merged) |
| Docker Machine Executor | fixed | ~"group::runner" | n/a | MR (13.9?): gitlab-org/ci-cd/docker-machine!49 (merged) |
| Helm backups (s3cmd) | fixed | ~"group::distribution" | n/a | gitlab-org/charts/gitlab#2787 |
| Runners - shared cache | unknown | ~"group::runner" | gitlab-runner#28027 | TBC |
| Runners - uploading artifacts | unknown | ~"group::runner" | gitlab-runner#28027 | TBC |
| Fargate runner executor | unknown | ~"group::runner" | gitlab-runner#28027 | TBC |
| Kubernetes executor | unknown | ~"group::runner" | gitlab-runner#28027 | TBC |
| Using ECR for runner containers | unknown | ~"group::runner" | gitlab-runner#28027 | TBC |
| Container registry (S3 storage) | OK | ~"group::package" | #334890 (closed) | |
| Dependency proxy | OK | ~"group::package" | #334890 (closed) | TBC |
| Backups to object storage | OK | ~"group::geo" | #334891 | |
| Deploying Lambda functions | OK | ~"group::configure" | #334894 (closed) | (feature removed) |
| Kubernetes agent | Doesn't apply | ~"group::configure" | #334894 (comment 615371261) | TBC |
| Continuous Deployment to AWS Elastic Container Service | unknown | ~"group::environments" | #334895 | TBC |
Edited by Grant Young