Upgrade Cluster for log.gprd.gitlab.net
In discussions with Elastic via https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23550, we have learned that the easiest way to get log.gprd.gitlab.net to Elastic 8.x will likely be to build a new cluster and switch log shipping over to it.
@dawsmith - further discussions with Elastic as of 2023-07-03 have shown this upgrade to be about the same difficulty.
Upgrading in place will be difficult, and since retention is only 7 days, a new build may be the best approach.
Older info:
We are seeking input from our Elastic account and support contacts on how best to configure the new cluster. That input should be available to us next week, around June 15.
We covered that we would like to:
- Evaluate increased retention per discussions in https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23011
- Handle log ingestion overload per learnings in https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/23681
- Keep Elastic credit usage at the same burn rate.
Based on the Elastic Calculator, we may be able to go with:
- General Purpose or Compute Optimized Hardware profile.
- 3 or 4 tiers relying heavily on Cold/Frozen tiers for the 2-14+ day retention.
- Extra Ingest instances.
We would need to create new ILM policies in https://gitlab.com/gitlab-com/runbooks/-/tree/master/elastic/managed-objects to match the tiering we decide upon.
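As a rough illustration of what a tiered ILM policy could look like, here is a minimal Python sketch that builds a policy body with hot, cold, frozen, and delete phases. The phase ages, the rollover age, and the `found-snapshots` repository name are all placeholder assumptions, not values we have agreed on, and the helper `build_ilm_policy` is hypothetical. The actual policies would need to follow the format already used in the managed-objects directory.

```python
import json


def build_ilm_policy(hot_rollover_age="1d", cold_after="2d",
                     frozen_after="7d", delete_after="14d",
                     snapshot_repository="found-snapshots"):
    """Return an ILM policy body with hot, cold, frozen and delete phases.

    All default values are illustrative assumptions, not agreed settings.
    """
    return {
        "policy": {
            "phases": {
                # Hot: indices take writes and roll over once they reach max_age.
                "hot": {
                    "min_age": "0ms",
                    "actions": {"rollover": {"max_age": hot_rollover_age}},
                },
                # Cold: entered cold_after past rollover; lower recovery priority.
                "cold": {
                    "min_age": cold_after,
                    "actions": {"set_priority": {"priority": 0}},
                },
                # Frozen: data is served from a searchable snapshot.
                "frozen": {
                    "min_age": frozen_after,
                    "actions": {
                        "searchable_snapshot": {
                            "snapshot_repository": snapshot_repository
                        }
                    },
                },
                # Delete: the index is removed at the end of the retention window.
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_ilm_policy(), indent=2))
```

Changing `delete_after` is the knob that would extend or shorten overall retention, while `cold_after` and `frozen_after` control how much data stays on the more expensive tiers.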
Per Elastic, the recommended upgrade path would then be to switch the logging pubsub to point to the new cluster.