
Improve docker timeouts

Merged Kamil Trzciński requested to merge improve-docker-timeouts into master
All threads resolved!

What does this MR do?

Tries to resolve the state of #2408 (closed).

Most of the errors that users see are related to the Docker API taking a long time to process requests. For example, any I/O-heavy operation can prevent Docker Engine from responding quickly enough. Increasing the timeouts gives Docker Engine more room.
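
For context, the timeouts in question bound individual Docker API calls. Below is a minimal, hedged sketch using the current Docker Go SDK, not the runner's actual code; the 300 s value is only an example in line with the figure mentioned later in the thread:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		log.Fatal(err)
	}

	// Every Docker API call is bounded by its context deadline. When the daemon
	// is busy with heavy I/O, a short deadline surfaces as a timeout error;
	// raising it gives Docker Engine more room to respond.
	ctx, cancel := context.WithTimeout(context.Background(), 300*time.Second)
	defer cancel()

	if _, err := cli.Ping(ctx); err != nil {
		log.Fatalf("Docker Engine did not respond in time: %v", err)
	}
}
```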

Does this MR meet the acceptance criteria?

  • Documentation created/updated
  • Tests
    • Added for this feature/bug
    • All builds are passing
  • Branch has no merge conflicts with master (if you do - rebase it please)

What are the relevant issue numbers?

cc @tmaczukin

Related to #2408 (closed)

Edited by Kamil Trzciński

Merge request reports


Activity

  • Kamil Trzciński changed the description

  • mentioned in issue #2408 (closed)

  • added 1 commit

  • @ayufan Should we maybe make this configurable from config.toml? I mean, raising the default values is IMO a good change, but in some environments it may still not be enough. With the possibility to configure these values in config.toml, every user could adjust the settings to match their environment, no matter how slow it is.

  • Author Maintainer

    I don't think we can unconditionally make it very high. We should rather figure out good defaults.

  • I think it should be a configurable value in the runner (config.toml). For example, we can only use GitLab CI if it is able to download large Docker images from our private repo, and this process can take several minutes.
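
For illustration only, a user-configurable timeout could look roughly like the sketch below; the api_timeout key and the Go field name are assumptions for this example, not existing runner options:

```go
package docker

import "time"

// Config mirrors a hypothetical [runners.docker] section of config.toml.
// The api_timeout key is an assumed name and is expressed in seconds.
type Config struct {
	APITimeout int `toml:"api_timeout,omitempty"`
}

// apiTimeout falls back to a built-in default when the user has not set
// anything, so a raised default and per-user overrides can coexist.
func apiTimeout(c Config) time.Duration {
	if c.APITimeout > 0 {
		return time.Duration(c.APITimeout) * time.Second
	}
	return 5 * time.Minute // example default, in line with the 300 s mentioned later in the thread
}
```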

  • Author Maintainer

    @tjurak Are large downloads affected by this? AFAIK, this is already handled OK.

  • Author Maintainer

    @tjurak I would not set unreasonably big timeouts, as that prevents failing fast. I would rather have reasonable timeouts, where: 1. we make it clear that something timed out, and for how long, 2. we retry timed-out operations wherever we can.

    We don't yet retry all the operations that we could. I plan to fix that after this MR.
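
A rough sketch of that retry idea (illustrative names, not the runner's actual retry logic): only failures that look like timeouts are retried, so other errors still fail fast.

```go
package docker

import (
	"errors"
	"log"
	"net"
	"time"
)

// retryOnTimeout re-runs op a few times, but only when the failure looks like
// a timeout; any other error is returned immediately, so we still fail fast.
func retryOnTimeout(attempts int, op func() error) error {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		var nerr net.Error
		if !errors.As(err, &nerr) || !nerr.Timeout() {
			return err // not a timeout: fail fast
		}
		wait := time.Duration(i) * time.Second
		log.Printf("attempt %d timed out (%v), retrying in %s", i, err, wait)
		time.Sleep(wait)
	}
	return err
}
```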

  • Yes, but right now when I try to download a relatively small image from Docker Hub over a slower internet connection, the download takes about 3-5 minutes, and every time I get this error:

    Cannot connect to the Docker daemon at unix:///var/run/docker.sock

    When the delay is small and the image is already downloaded, it works every time. But sometimes it is necessary to download a new Docker image, and that can take a few minutes, so it would be nice to be able to set the timeouts manually if needed.

    Developers are unable to "preload" the Docker images they will need onto the runner server. GitLab handles this nicely, except that the timeout is too short. Maybe some retry strategy would also be useful.

    Regarding 'fail-fast', I understand that, but it applies rather to small projects with small dependencies. We have large projects with huge dependencies that we have to test all together. Fail-fast is not something we care about so much, as we need to run the tests safely without false positives and false errors, and this usually happens during the night.

    So to sum up: I really think you should come up with some reasonable timeouts and retries, but let the end user change these attributes in the runner so everyone can adjust them when needed (documented in the runner docs). There are a lot of teams like us; we know what we're doing, and it will be our decision to live with "longer" fail-fasts...

    Right now, the timeouts and false errors during builds are "deal breakers" for us in using the integrated CI (although I like it much more than Jenkins CI).

    Edited by Tomas Jurak
  • Author Maintainer

    Thanks. Let's try to solve these timeouts here. Maybe we need different timeouts for pulling :) I'm happy to introduce them if needed.

  • Author Maintainer

    I made this MR to improve error messages: !964 (merged), so we know what kind of failure it is.
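
Loosely sketched (this is not the actual code of !964), the gist of such an improvement is to say which operation failed and how long it ran before failing:

```go
package docker

import (
	"fmt"
	"time"
)

// timedOp wraps an operation so that its error says what was being done and
// how long it ran before failing, e.g. "pulling image failed after 5m0s: ...".
func timedOp(name string, op func() error) error {
	start := time.Now()
	if err := op(); err != nil {
		return fmt.Errorf("%s failed after %s: %w", name, time.Since(start).Round(time.Second), err)
	}
	return nil
}
```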

  • A special timeout for pulling could be nice, but I would prefer to be able to override the default settings when needed. In the changes I can see the new timeout is 300s, but this can still lead to false errors, as some of our Docker images are not accessible over fast ethernet (for security reasons), so downloading them can take even around 15 minutes :-)

  • Author Maintainer

    I think that downloading should have an unlimited timeout for as long as you are still pulling something.

  • Yes, I agree. And the other timeouts should take effect after the pull is done (ideally still overridable by the administrator of the local GitLab runners).
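
One hedged way to express this "no timeout while the pull is still making progress" idea is an idle timeout: abort only when no data has arrived from the daemon for a while, instead of capping the total pull duration. Everything below is illustrative, not the runner's implementation:

```go
package docker

import (
	"fmt"
	"io"
	"time"
)

// copyWithIdleTimeout copies the pull progress stream and resets its deadline
// every time data arrives, so a slow but active pull never times out while a
// stalled one does. A real implementation would also cancel the pending read.
func copyWithIdleTimeout(dst io.Writer, src io.Reader, idle time.Duration) error {
	type chunk struct {
		data []byte
		err  error
	}
	ch := make(chan chunk)
	go func() {
		for {
			buf := make([]byte, 32*1024)
			n, err := src.Read(buf)
			ch <- chunk{buf[:n], err}
			if err != nil {
				return
			}
		}
	}()

	timer := time.NewTimer(idle)
	defer timer.Stop()
	for {
		select {
		case c := <-ch:
			if len(c.data) > 0 {
				if _, err := dst.Write(c.data); err != nil {
					return err
				}
			}
			if c.err == io.EOF {
				return nil
			}
			if c.err != nil {
				return c.err
			}
			// Progress was made: push the idle deadline forward.
			if !timer.Stop() {
				<-timer.C
			}
			timer.Reset(idle)
		case <-timer.C:
			return fmt.Errorf("no pull progress for %v", idle)
		}
	}
}
```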

  • Author Maintainer

    I would really love to see where all these errors actually come from, so having !964 (merged) is going to help.

  • Tomasz Maczukin resolved all discussions

  • Tomasz Maczukin approved this merge request

  • Tomasz Maczukin mentioned in commit 2aac6418

  • mentioned in issue #3391 (closed)
