Global Search Stability, Performance, and Scalability.

What are you trying to do? Articulate your objectives using absolutely no jargon.

Enable the GitLab Operations and Infrastructure teams to manage Global Search with less than one headcount.

How is it done today, and what are the limits of current practice?

Today, management of the infrastructure is shared between Operations and the Global Search group, while the product matures toward a state that requires less management.

What's new in your approach and why do you think it will be successful?

The Global Search Elastic based for core/free/
Need a disaster recovery plan
Enough stability that indexing does not have to be paused manually if the index stops for any reason
Auto-scaling to accommodate additional usage peaks or additional storage needs
A more efficient storage-to-usage ratio, since storage is charged for in Elastic Cloud
The GitLab Admin console should have more flexibility and capability to remotely modify the ES cluster when needed
Performance and growth should be planned
Search Abuse prevention should require less manual intervention
Dashboard improvements for monitoring.
Performance Testing framework
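
One way to picture the stability item above (indexing that does not need to be paused by hand) is a small circuit breaker that pauses indexing after repeated failures and resumes on its own after a cool-down. This is only a sketch under stated assumptions: the class name `IndexingBreaker`, its thresholds, and its methods are hypothetical and are not part of GitLab or Elasticsearch.

```python
import time


class IndexingBreaker:
    """Hypothetical helper: pauses indexing after repeated failures, resumes after a cool-down."""

    def __init__(self, max_failures=3, cooldown_seconds=60.0, clock=time.monotonic):
        self.max_failures = max_failures          # assumed threshold, tune per cluster
        self.cooldown_seconds = cooldown_seconds  # assumed pause length
        self.clock = clock                        # injectable so the logic can be tested
        self.failures = 0
        self.paused_at = None

    def allow(self):
        """Return True if an indexing batch may be sent right now."""
        if self.paused_at is None:
            return True
        if self.clock() - self.paused_at >= self.cooldown_seconds:
            # Cool-down elapsed: resume automatically, no manual unpause needed.
            self.paused_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        # A successful batch resets the consecutive-failure counter.
        self.failures = 0

    def record_failure(self):
        # Pause once the number of consecutive failures reaches the threshold.
        self.failures += 1
        if self.failures >= self.max_failures:
            self.paused_at = self.clock()
```

An indexing worker would call `allow()` before each batch and `record_success()` or `record_failure()` afterwards; while the breaker is paused, work would stay in the queue rather than being dropped.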

Who cares? If you're successful, what difference will it make?

Reduce the overall cost to manage Search in SaaS
Allow features to be added to Global Search more efficiently
Serve information that currently comes from PostgreSQL (PG) more cost-effectively
Reduce the Global Search engineering team's need to manage the infrastructure (shift left)
Improve stability and performance for larger self-managed customers

What are the risks and the payoffs?

High value, high availability, and higher-quality iterations as performance concerns decrease with this automation.

How much will it cost?

Infrastructure and headcount time. (This is included in the current priority, and no additional funding is being requested.)

How long will it take?

1 quarter as the top priority.
3 quarters of ongoing work.

What are the midterm and final "exams" to check for success?

Error Budget is consistently green based on manual changes and managing abuse
- This was the case up until the changes to Error Budget reporting.
The data layer in ES is updated to stay efficient as usage and content change and grow.
- Sharding changes to Notes index.
- Making each content type its own index.
- More efficient sharding strategy.
DR: runbook in the event of a failure.
DR: restore from the confirmed good snapshot.
Robust indexing pipeline (queue-based, with a write-ahead log (WAL))
High availability: multiple zones and regions for fallback and duplication
Training and documentation for the Operations team.
The Global Search team spends less than 10% of its time in the milestone on performance and infrastructure needs and changes, because the Operations team is properly enabled to take ownership.
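
The "queue-based, with WAL" indexing pipeline in the list above can be illustrated with a toy example. The sketch below is not GitLab's indexing code; the `WalQueue` class, the JSON-lines log format, and the file name are assumptions made for illustration. Each operation is appended to a log file and synced to disk before it enters the in-memory queue, so a restarted worker can rebuild the unacknowledged work by replaying the log. It does not handle a partially written final line or log compaction.

```python
import json
import os
import tempfile
from collections import deque


class WalQueue:
    """Hypothetical indexing queue backed by an append-only write-ahead log."""

    def __init__(self, path):
        self.path = path
        self.pending = deque()
        self.next_seq = 0
        self._replay()

    def _replay(self):
        # Rebuild the queue from the log: every "index" record that was never acked.
        if not os.path.exists(self.path):
            return
        entries, acked = [], set()
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                if record["type"] == "ack":
                    acked.add(record["seq"])
                else:
                    entries.append(record)
                self.next_seq = max(self.next_seq, record["seq"] + 1)
        self.pending.extend(e for e in entries if e["seq"] not in acked)

    def _append(self, record):
        # Write and fsync before changing in-memory state; this is the WAL step.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def enqueue(self, doc):
        record = {"type": "index", "seq": self.next_seq, "doc": doc}
        self._append(record)
        self.pending.append(record)
        self.next_seq += 1
        return record["seq"]

    def ack(self, seq):
        # Called after the search cluster has confirmed the write.
        self._append({"type": "ack", "seq": seq})
        self.pending = deque(e for e in self.pending if e["seq"] != seq)


def crash_and_recover():
    """Enqueue two documents, ack one, then reopen the log as a restarted worker would."""
    path = os.path.join(tempfile.mkdtemp(), "indexing.wal")
    queue = WalQueue(path)
    first = queue.enqueue({"note_id": 1})
    queue.enqueue({"note_id": 2})
    queue.ack(first)
    recovered = WalQueue(path)  # simulates a process restart
    return [entry["doc"] for entry in recovered.pending]
```

Calling `crash_and_recover()` leaves only the unacknowledged document in the recovered queue, which is the property that lets indexing resume after a failure without manual re-queuing.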

How would this fit into the GitLab DR plan?

Steps to transition back to Ops as primary.

Training: Elasticsearch Operations and Fundamentals in Elastic Cloud
Documentation for Elastic Cloud Operations
Elastic Cloud Runbooks
Elasticsearch production incident management documentation

Edited Jul 21, 2025 by 🤖 GitLab Bot 🤖