Skip to content
Snippets Groups Projects

Use http read total timeout by default (logging errors rather than raising)

Merged Andy Schoenen requested to merge 334930_use-http-total-timeout-by-default into master

What does this MR do?

Related to https://gitlab.com/gitlab-org/gitlab/-/issues/334930 (this is a confidential issue but it was decided that we don't have to follow the security workflow)

In https://gitlab.com/gitlab-org/security/gitlab/-/merge_requests/1379 (internal only) we introduced a total read timeout for HTTP requests. This changed a fundamental part of the system, but we couldn't put it behind a feature flag and roll it out slowly while making sure it doesn't break anything. Instead, we added an opt-in option and used the new timeout only where it was necessary. Now it looks like the timeout is stable, so we can make it the default behavior.

This MR changes:

  • all requests default to using the read total timeout, but only the existing calls that pass use_read_total_timeout will raise an exception (current behaviour) all other calls will instead log the error. This allows us to monitor impacts of making the read total timeout the default.
  • adds skip_read_total_timeout opt-out option. Skipping the total timeout is needed in some cases where a request is expected to be streamed, like downloading a file.
  • response.to_s to response.body in several places because to_s is not guaranteed to return body as a string (see httparty/response.rb#L88-L94)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Edited by Luke Duncalfe

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Piotr Skorupa approved this merge request

    approved this merge request

  • Piotr Skorupa removed review request for @pskorupa

    removed review request for @pskorupa

    • Resolved by Luke Duncalfe

      :wave: @pskorupa, thanks for approving this merge request.

      Please consider starting a new pipeline if:

      • This is the first time the merge request is approved, or
      • The merge request is ready to be merged, and there has not been a merge request pipeline in the last 2 hours.

      For more info, refer to the guideline.

  • Andy Schoenen added 1 commit

    added 1 commit

    • 73ac5227 - Use named subject in download_blob_serivce_spec

    Compare with previous version

  • Hey @.luke would you be up for the maintainer review?

  • Andy Schoenen requested review from @.luke

    requested review from @.luke

  • Luke Duncalfe
    • Resolved by Luke Duncalfe

      Skipping the total timeout is needed in some cases where a request is expected to be streamed, like downloading a file.

      @Andysoiron What happens to a streaming request when total timeout is applied that means we can't apply it?

      I'm assuming this means that those parts of the app are "vulnerable" to https://gitlab.com/gitlab-org/gitlab/-/issues/334930, but because both parts of the app that are skipping the total read timeout are interacting with external services that can't be controlled by an attacker (LFS object storage and wherever we store container images, unlike webhooks) then this is okay.

      I've left one suggestion! When I've approved I'll also send this to a second maintainer for extra :eyes:.

  • Luke Duncalfe removed review request for @.luke

    removed review request for @.luke

  • Andy Schoenen requested review from @.luke

    requested review from @.luke

  • Andy Schoenen requested review from @mhenriksen

    requested review from @mhenriksen

  • Michael Henriksen approved this merge request

    approved this merge request

    • Resolved by Luke Duncalfe

      :wave: @mhenriksen, thanks for approving this merge request.

      Please consider starting a new pipeline if:

      • This is the first time the merge request is approved, or
      • The merge request is ready to be merged, and there has not been a merge request pipeline in the last 2 hours.

      For more info, refer to the guideline.

  • Luke Duncalfe
    • Resolved by Luke Duncalfe

      Thanks, @Andysoiron!

      I've left a couple of suggestions!

      I'm beginning to wonder if we should deploy an intermediary step, which might increase the overall time to deliver this feature, where we continue to raise the exception for the HTTP calls that currently use use_read_total_timeout (the calls that we know were vulnerable due to the server being external), and for all other calls, we log to Sentry rather than raise.

      Then after a couple of weeks, we can see if there should be some parts of the app that should be exempt, before finalising the feature.

      This is because I wonder if a possibility exists that there is a part of the app that performs non-streaming requests, and that previously was able to initiate long network requests, that might now find some of those requests fail? If I understand, read_timeout doesn't apply once the remote server is sending data.

      It could be a safer way to deploy this change in a way that doesn't disrupt services.

      What do you think?

      I think the change would look something like this:

      diff --git a/lib/gitlab/http.rb b/lib/gitlab/http.rb
      index 7e45cd216f5..2ba3184aad1 100644
      --- a/lib/gitlab/http.rb
      +++ b/lib/gitlab/http.rb
      @@ -43,16 +43,22 @@ def self.perform_request(http_method, path, options, &block)
                 options
               end
      
      -      unless options.has_key?(:use_read_total_timeout)
      +      if options[:skip_read_total_timeout]
               return httparty_perform_request(http_method, path, options_with_timeouts, &block)
             end
      
             start_time = Gitlab::Metrics::System.monotonic_time
             read_total_timeout = options.fetch(:timeout, DEFAULT_READ_TOTAL_TIMEOUT)
      +      read_total_timeout_logged = false
      
             httparty_perform_request(http_method, path, options_with_timeouts) do |fragment|
               elapsed = Gitlab::Metrics::System.monotonic_time - start_time
      -        raise ReadTotalTimeout, "Request timed out after #{elapsed} seconds" if elapsed > read_total_timeout
      +
      +        if options.has_key?(:use_read_total_timeout)
      +          raise ReadTotalTimeout, "Request timed out after #{elapsed} seconds" if elapsed > read_total_timeout
      +        else
      +          # Log here unless read_total_timeout_logged
      +        end
      
               block.call fragment if block
  • Luke Duncalfe removed review request for @.luke

    removed review request for @.luke

  • Andy Schoenen added 974 commits

    added 974 commits

    Compare with previous version

  • Andy Schoenen added 2 commits

    added 2 commits

    • f4af0e70 - Use http read total timeout by default
    • deaa3ad7 - Log error unless use_read_total_timeout

    Compare with previous version

  • Andy Schoenen added 1 commit

    added 1 commit

    • 0f26f4c1 - Skip total timeout if stream_body

    Compare with previous version

  • Thank you @.luke! I've applied your suggestions. Can you have another look?

  • Andy Schoenen requested review from @.luke and removed review request for @mhenriksen

    requested review from @.luke and removed review request for @mhenriksen

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading