Optimize Latency for AI-Powered Features (by Reducing Sidekiq Dependency)
Description
AI-powered features are experiencing elevated latency in normal operation, and they become unavailable entirely during Sidekiq outages. This issue aims to explore and implement solutions that reduce latency and improve reliability for these features.
Problem Statement
- AI-powered features are experiencing high latency due to Sidekiq processing.
- During Sidekiq outages, these features become unusable, impacting user experience.
- The current architecture adds unnecessary round-trip delays, affecting response times.
Proposed Solutions
1. Workhorse Offloading:
- Pros:
- Reduces load on Rails/Puma workers
- Minimizes time spent in Rails
- Similar to existing code suggestion implementation
- Cons:
- Rails loses visibility of AI gateway responses
- Requires maintaining connection logic in both Rails and Workhorse
- Limited by current WebSocket support for some AI features
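A minimal sketch of the Rails side of this offloading pattern: instead of calling the AI gateway itself, Rails returns a special header instructing Workhorse where to proxy the request. The payload field names and the gateway URL below are illustrative assumptions; the real contract is defined by Workhorse's send-data handlers.

```ruby
require 'base64'
require 'json'

# Build the value for the Gitlab-Workhorse-Send-Data response header.
# Workhorse expects "<type>:<base64-encoded JSON params>"; the JSON keys
# here ("URL", "Header") are illustrative, not the exact contract.
def workhorse_send_url_header(url:, headers: {})
  payload = { 'URL' => url, 'Header' => headers }
  "send-url:#{Base64.urlsafe_encode64(payload.to_json)}"
end

header = workhorse_send_url_header(
  url: 'https://ai-gateway.example.com/v1/completions', # hypothetical endpoint
  headers: { 'Authorization' => ['Bearer <token>'] }
)
# In a controller/API class, Rails would then set:
#   response.headers['Gitlab-Workhorse-Send-Data'] = header
# and Workhorse, not Puma, would stream the gateway response to the client.
```

This keeps the Puma worker free while the (potentially slow) AI gateway call is in flight, which is the main latency win of this option.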
2. Direct Communication with AI Gateway:
- Pros:
- Eliminates Sidekiq dependency
- Potentially reduces overall latency
- Cons:
- Requires significant architectural changes
- May need to move logic out of Rails into AI gateway
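A sketch of what direct communication could look like from Rails, assuming a plain HTTP completions endpoint on the AI gateway (URL and auth scheme are hypothetical). Because there is no Sidekiq buffer in this model, tight timeouts are essential so a slow gateway cannot pin a Puma worker indefinitely.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Build the request separately so it can be inspected/tested without
# performing network I/O.
def build_gateway_request(uri, prompt)
  req = Net::HTTP::Post.new(uri)
  req['Content-Type'] = 'application/json'
  req['Authorization'] = 'Bearer <token>' # hypothetical auth scheme
  req.body = { prompt: prompt }.to_json
  req
end

def call_gateway(prompt)
  uri = URI('https://ai-gateway.example.com/v1/completions') # hypothetical
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: true,
                  open_timeout: 1,   # fail fast if the gateway is unreachable
                  read_timeout: 30) do |http| # cap how long a Puma worker waits
    http.request(build_gateway_request(uri, prompt))
  end
end
```

The trade-off is visible here: the synchronous call is simple and removes a queueing hop, but capacity planning must now account for Puma workers being held for the full gateway response time.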
3. ActionController::Live with WebSockets:
- Pros:
- Enables real-time updates
- Could reduce load on Sidekiq workers
- Cons:
- May spawn uncontrolled threads in Rails
- Could complicate capacity planning
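For context on this option, ActionController::Live streams by writing to `response.stream`, typically using the standard Server-Sent Events wire format. The sketch below writes SSE frames to any IO object so the format can be exercised without a running Rails app; in a real controller you would include `ActionController::Live` and pass `response.stream` instead of the `StringIO` used here.

```ruby
require 'stringio'

# Write one Server-Sent Events frame to an IO-like object.
def write_sse(io, data, event: nil)
  io.write("event: #{event}\n") if event
  io.write("data: #{data}\n\n") # blank line terminates the event
end

stream = StringIO.new
write_sse(stream, 'partial token', event: 'completion')
write_sse(stream, '[DONE]')
```

The "uncontrolled threads" concern above comes from each live-streaming request holding a dedicated Rails thread open for the duration of the stream.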
4. Fibers for Lightweight Concurrency:
- Pros:
- Improves job processing efficiency within Ruby threads
- Better resource utilization for I/O-bound tasks
- Cons:
- May require significant refactoring of existing code
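A toy illustration of the Fiber idea: each "request" yields while it would be waiting on I/O, so a single thread can interleave many in-flight calls. The waiting is simulated with counted steps here; in practice a fiber scheduler (Ruby 3.x `Fiber.set_scheduler`, or the async gem) performs the yielding automatically at blocking I/O points.

```ruby
# Each fiber represents one in-flight AI request; Fiber.yield stands in
# for "waiting on the gateway".
def make_fiber(name, steps)
  Fiber.new do
    steps.times { |i| Fiber.yield("#{name}: step #{i}") }
    "#{name}: done"
  end
end

fibers = [make_fiber('req-a', 2), make_fiber('req-b', 2)]
log = []

# Round-robin scheduler: resume each fiber once per pass, dropping
# fibers that have finished.
until fibers.empty?
  fibers.reject! do |f|
    log << f.resume
    !f.alive?
  end
end
# log now shows req-a and req-b interleaved on a single thread.
```

The interleaved log is the point: I/O-bound work from many requests shares one thread cooperatively, which is why this option promises better resource utilization, at the cost of refactoring blocking call sites.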
Possible Next Steps
- Conduct detailed latency analysis to identify bottlenecks
- Set specific latency targets for AI-powered features
- Create proof-of-concept implementations for proposed solutions
- Consult with infrastructure team on feasibility and impact of each approach
- Evaluate performance gains and integration complexity for each solution
Links
- Code Suggestions Implementation
  - URL: https://gitlab.com/gitlab-org/gitlab/blob/8b47a4e4bbc33858f18d623634f63cd94ad9138d/ee/lib/api/code_suggestions.rb#L101
  - Context: Current implementation of code suggestions using Workhorse offloading.
- Merge Request for Workhorse Implementation
  - MR: Serve completions endpoint through Workhorse (!126957, merged)
  - Context: Implementation of serving completions through Workhorse.
- Related Issue: Completion Worker Delay
  - Issue: https://gitlab.com/gitlab-org/gitlab/-/issues/482625
  - Context: Highlights problems with Sidekiq processing delays affecting AI feature responsiveness.
- Code Suggestion Performance Dashboard
  - Epic: &12224
  - Context: Detailed performance analysis for code suggestions, including latency breakdowns.
- Sidekiq Incident Report
  - Issue: 2024-09-03: The sidekiq_queueing SLI of the sid... (gitlab-com/gl-infra/production#18489, closed)
  - Context: Details of a recent Sidekiq outage affecting AI features.
- Flamegraph Profiling Guide
  - URL: https://gitlab.com/gitlab-com/runbooks/-/blob/v2.198.1/docs/tutorials/how_to_use_flamegraphs_for_perf_profiling.md
  - Context: Guide on using Flamegraphs for performance profiling.
- Latency Monitoring Dashboard
  - URL: https://log.gprd.gitlab.net/app/dashboards#/view/3684dc90-73f6-11ee-ac5b-8f88ebd04638
  - Context: Dashboard showing median values for Sidekiq scheduling latency and worker run duration.
- ActionController::Live Documentation
  - URL: https://api.rubyonrails.org/classes/ActionController/Live.html
  - Context: Official documentation for ActionController::Live, a proposed solution.
- Vertex AI Claude Integration
  - URL: https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude
  - Context: Documentation on using Claude models via Google Vertex AI.
Edited by David O'Regan