OpenConnect VPN projects / ocserv / Issues / #310
Issue created Jun 09, 2020 by Alan Jowett (@Alan_Jowett, Developer)

ocserv with 5000 concurrent client connections results in a high rate of connection failures (99% of connections fail)

In this state, the ocserv-sm socket queue depth shows 128 / 128 entries (as reported by ss -ax), suggesting that ocserv-sm is unable to keep up with requests. The server recovers from this state once the clients time out.

Client logs show timeouts while waiting for TLS responses from the server, suggesting that SM is still working on requests from clients that have already timed out.

The code already has an option to sleep unconditionally after the accept, but this stalls subsequent processing by the ocserv-main process, resulting in slower connect times (it blocks the worker process from completing establishment of the connection). If the sleep time is too low, the server still gets flooded.

Here is what the preferred behavior would be:

  1. Only apply the mitigation when the SM process is busy.
  2. Delay accepting new clients when SM is backed up.
  3. Continue processing existing clients while waiting for SM to recover.

Here are the proposed changes:

  1. Add a timer that can be used to delay processing of an accept.
  2. On an accept, arm the timer.
  3. If another accept arrives before the timer expires, check the SM queue depth.
  4. If the SM queue is backlogged, add the accept to a pending accept queue; otherwise accept and re-arm the timer.
  5. When the timer fires, process any pending accepts.

This gives the following behaviors:

  1. TCP connections arrive with (interval > timer period) -> No change in behavior.
  2. TCP connections arrive with (interval < timer period) and (SM queue < threshold) -> No change in behavior.
  3. TCP connections arrive with (interval < timer period) and (SM queue > threshold) -> Accept is delayed until queue recovers.
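
To make the proposal concrete, here is a rough Python model of the timer and pending accept queue. It is only a sketch, not ocserv code: the class name `AcceptGate`, its methods, the millisecond timestamps, and the `sm_queue_depth` callback are all made up for illustration. It also assumes one reading of "delayed until queue recovers": if the timer fires while SM is still above the threshold, the timer is re-armed and the pending accepts stay queued.

```python
from collections import deque


class AcceptGate:
    """Hypothetical model of the proposed accept throttling (not ocserv code)."""

    def __init__(self, timer_period_ms, sm_threshold, sm_queue_depth):
        self.timer_period_ms = timer_period_ms  # how long the timer stays armed
        self.sm_threshold = sm_threshold        # assumed SM backlog threshold
        self.sm_queue_depth = sm_queue_depth    # callable returning current SM queue depth
        self.deadline = None                    # None means the timer is not armed
        self.pending = deque()                  # accepts delayed while SM is backed up
        self.accepted = []                      # connections handed on for processing

    def _backlogged(self):
        return self.sm_queue_depth() > self.sm_threshold

    def on_timer(self, now_ms):
        """Fire the timer if it has expired; return the accepts released."""
        if self.deadline is None or now_ms < self.deadline:
            return []
        self.deadline = None
        if self.pending and self._backlogged():
            # Assumption: SM has not recovered yet, so keep waiting.
            self.deadline = now_ms + self.timer_period_ms
            return []
        released = list(self.pending)
        self.pending.clear()
        self.accepted.extend(released)
        return released

    def on_accept(self, conn, now_ms):
        """Handle a new connection; return True if it was accepted immediately."""
        self.on_timer(now_ms)  # an expired timer fires before the new accept
        timer_armed = self.deadline is not None
        if timer_armed and self._backlogged():
            self.pending.append(conn)  # delay this accept until the timer fires
            return False
        self.accepted.append(conn)
        self.deadline = now_ms + self.timer_period_ms  # accept and (re-)arm
        return True
```

Existing clients are untouched by this gate: only new accepts are queued, and nothing in it blocks, which is the difference from the unconditional sleep.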

Thoughts?
