Skip to content

Fix flaky 2FA login specs

Jessie Young requested to merge jy-unquarantine into master

What does this MR do and why?

Fixes #402322 (closed)

These feature specs started failing regularly, but not 100% of the time, recently. This commit was the first instance where the failure started occurring regularly.

In previous commits in this MR, I added verbose puts statements to see why TOTP verification was failing. Some pipelines where it faild:

In each of these pipelines, /spec/features/users/login_spec.rb:324 is failing. The logged data:

# in spec

TIME: 2023-03-28 21:13:23 UTC
USER ID: 15
USER OTP SECRET: IH2QUCNO5XCCRL4YO72VBCY2ICEDBUTS
USER OTP: 641912

# in lib/gitlab/auth/otp/strategies/devise.rb

TIME: 2023-03-28 21:14:08 UTC
CURRENT USER ID: 15
CURRENT USER OTP SECRET: IH2QUCNO5XCCRL4YO72VBCY2ICEDBUTS
ENTERED OTP CODE: 641912
CURRENT OTP: 227675
# in spec

TIME: 2023-03-28 20:43:50 UTC
USER ID: 14
USER OTP SECRET: QUT2L7V5Z2WG5R64W4SK3ZRCT3FWP7L2
USER OTP: 958221

# in lib/gitlab/auth/otp/strategies/devise.rb

TIME: 2023-03-28 20:44:35 UTC
CURRENT USER ID: 14
CURRENT USER OTP SECRET: QUT2L7V5Z2WG5R64W4SK3ZRCT3FWP7L2
ENTERED OTP CODE: 958221
CURRENT OTP: 998530

Note that in each test case the OTP SECRET matches, and the entered OTP code works, but more than 30 seconds has elapsed between the logging in the test context and where the OTP is being validated.

The OTP validation logic in lib/gitlab/auth/otp/strategies/devise.rb goes to the devise-two-factor gem here:

      # This defaults to the model's otp_secret
      # If this hasn't been generated yet, pass a secret as an option
      def validate_and_consume_otp!(code, options = {})
        otp_secret = options[:otp_secret] || self.otp_secret
        return false unless code.present? && otp_secret.present?

        totp = otp(otp_secret)

        if self.consumed_timestep
          # reconstruct the timestamp of the last consumed timestep
          after_timestamp = self.consumed_timestep * otp.interval
        end

        if totp.verify(code.gsub(/\s+/, ""), drift_behind: self.class.otp_allowed_drift, drift_ahead: self.class.otp_allowed_drift, after: after_timestamp)
          return consume_otp!
        end

        false
      end

The totp.verify call goes to the ROTP gem:

    # Verifies the OTP passed in against the current time OTP
    # and adjacent intervals up to +drift+.  Excludes OTPs
    # from `after` and earlier.  Returns time value of
    # matching OTP code for use in subsequent call.
    # @param otp [String] the one time password to verify
    # @param drift_behind [Integer] how many seconds to look back
    # @param drift_ahead [Integer] how many seconds to look ahead
    # @param after [Integer] prevent token reuse, last login timestamp
    # @param at [Time] time at which to generate and verify a particular
    #   otp. default Time.now
    # @return [Integer, nil] the last successful timestamp
    #   interval
    def verify(otp, drift_ahead: 0, drift_behind: 0, after: nil, at: Time.now)
      timecodes = get_timecodes(at, drift_behind, drift_ahead)

      timecodes = timecodes.select { |t| t > timecode(after) } if after

      result = nil
      timecodes.each do |t|
        result = t * interval if super(otp, generate_otp(t))
      end
      result
    end

The most relevant detail here is the drift_ahead and drift_behind parameter. GitLab uses the default value for otp_allowed_drift, which is 30 seconds. If you note the timestamps in both failures above, they are more than 30 seconds apart. As a result, the entered OTP was no longer valid by the time it was being validated.

It is not clear why this drift issue only became apparent now. Perhaps a recent change in the GitLab codebase is resulting in much slower request times? I am not sure. But either way, the user login feature specs are not designed to test drift issues with TOTP, so freezing time in these tests is OK and will ensure that we do not have failures related to clock drift.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Jessie Young

Merge request reports

Loading