Fix flaky 2FA login specs
What does this MR do and why?
Fixes #402322 (closed)
These feature specs started failing regularly, but not 100% of the time, recently. This commit was the first instance where the failure started occurring regularly.
In previous commits in this MR, I added verbose puts
statements to see why TOTP verification was failing. Some pipelines where it faild:
- https://gitlab.com/gitlab-org/gitlab/-/jobs/4019220891
- https://gitlab.com/gitlab-org/gitlab/-/jobs/4019074537
In each of these pipelines, /spec/features/users/login_spec.rb:324
is failing. The logged data:
# in spec
TIME: 2023-03-28 21:13:23 UTC
USER ID: 15
USER OTP SECRET: IH2QUCNO5XCCRL4YO72VBCY2ICEDBUTS
USER OTP: 641912
# in lib/gitlab/auth/otp/strategies/devise.rb
TIME: 2023-03-28 21:14:08 UTC
CURRENT USER ID: 15
CURRENT USER OTP SECRET: IH2QUCNO5XCCRL4YO72VBCY2ICEDBUTS
ENTERED OTP CODE: 641912
CURRENT OTP: 227675
# in spec
TIME: 2023-03-28 20:43:50 UTC
USER ID: 14
USER OTP SECRET: QUT2L7V5Z2WG5R64W4SK3ZRCT3FWP7L2
USER OTP: 958221
# in lib/gitlab/auth/otp/strategies/devise.rb
TIME: 2023-03-28 20:44:35 UTC
CURRENT USER ID: 14
CURRENT USER OTP SECRET: QUT2L7V5Z2WG5R64W4SK3ZRCT3FWP7L2
ENTERED OTP CODE: 958221
CURRENT OTP: 998530
Note that in each test case the OTP SECRET
matches, and the entered OTP code works, but more than 30 seconds has elapsed between the logging in the test context and where the OTP is being validated.
The OTP validation logic in lib/gitlab/auth/otp/strategies/devise.rb
goes to the devise-two-factor
gem here:
# This defaults to the model's otp_secret
# If this hasn't been generated yet, pass a secret as an option
def validate_and_consume_otp!(code, options = {})
otp_secret = options[:otp_secret] || self.otp_secret
return false unless code.present? && otp_secret.present?
totp = otp(otp_secret)
if self.consumed_timestep
# reconstruct the timestamp of the last consumed timestep
after_timestamp = self.consumed_timestep * otp.interval
end
if totp.verify(code.gsub(/\s+/, ""), drift_behind: self.class.otp_allowed_drift, drift_ahead: self.class.otp_allowed_drift, after: after_timestamp)
return consume_otp!
end
false
end
The totp.verify
call goes to the ROTP
gem:
# Verifies the OTP passed in against the current time OTP
# and adjacent intervals up to +drift+. Excludes OTPs
# from `after` and earlier. Returns time value of
# matching OTP code for use in subsequent call.
# @param otp [String] the one time password to verify
# @param drift_behind [Integer] how many seconds to look back
# @param drift_ahead [Integer] how many seconds to look ahead
# @param after [Integer] prevent token reuse, last login timestamp
# @param at [Time] time at which to generate and verify a particular
# otp. default Time.now
# @return [Integer, nil] the last successful timestamp
# interval
def verify(otp, drift_ahead: 0, drift_behind: 0, after: nil, at: Time.now)
timecodes = get_timecodes(at, drift_behind, drift_ahead)
timecodes = timecodes.select { |t| t > timecode(after) } if after
result = nil
timecodes.each do |t|
result = t * interval if super(otp, generate_otp(t))
end
result
end
The most relevant detail here is the drift_ahead
and drift_behind
parameter. GitLab uses the default value for otp_allowed_drift
, which is 30 seconds. If you note the timestamps in both failures above, they are more than 30 seconds apart. As a result, the entered OTP was no longer valid by the time it was being validated.
It is not clear why this drift issue only became apparent now. Perhaps a recent change in the GitLab codebase is resulting in much slower request times? I am not sure. But either way, the user login feature specs are not designed to test drift issues with TOTP, so freezing time in these tests is OK and will ensure that we do not have failures related to clock drift.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.