Kubernetes Fault Tolerance Feedback issue

Feedback: GitLab Runner Fault Tolerance Feature

Overview

We've recently implemented fault tolerance for GitLab Runner, which allows Runner Managers to resume running jobs after a restart or failure. The feature addresses orphaned Kubernetes pods and jobs left stuck in the "Running" state when a Runner Manager restarts.

The initial implementation supports:

Kubernetes executor with attach strategy
File store for saving job execution state
Seamless resumption of running jobs after manager restarts

Feedback Request

We'd like to gather feedback from users who have tested this feature to help guide future improvements. Please share your experiences with:

General functionality - Did the feature work as expected? Were you able to resume jobs after Runner Manager restarts?
Configuration - Was the configuration intuitive? Did you encounter any issues setting up the store or other options?
Deployment scenarios - How did you deploy Runner with fault tolerance enabled? For example: a single instance, multiple instances, the Helm chart, or the Runner Operator.
Performance impact - Did you notice any performance changes when fault tolerance was enabled?
Store behavior - How did the File store perform in your environment? Did you see any issues with cleanup, disk space usage, or job resumption?
Edge cases - Did you encounter any unexpected behavior or edge cases we should address?
Feature requests - What additional capabilities would make this feature more valuable to you? (e.g., additional store types, support for other executors)

Usage Information

Please include the following information when providing feedback:

GitLab Runner version
Executor configuration (relevant parts of your config.toml)
Deployment environment (Kubernetes version, cloud provider if applicable)
Any relevant error messages or logs
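For reference, the executor portion of a config.toml might look like the minimal sketch below. It assumes the attach strategy is selected by setting the FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY feature flag to false. The runner name and namespace are placeholders, and the fault tolerance store settings are not shown, so please add the ones you actually used.

```toml
[[runners]]
  name = "example-k8s-runner"   # placeholder name
  executor = "kubernetes"

  # Use the attach execution strategy instead of the legacy exec strategy
  [runners.feature_flags]
    FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY = false

  [runners.kubernetes]
    namespace = "gitlab-runner"   # placeholder namespace
```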

Future Development

Based on your feedback, we plan to:

Consider supporting additional store types (e.g., Redis)
Potentially expand support to other executors
Improve handling of edge cases and error scenarios
Enhance the fault tolerance documentation

Your input is valuable in helping us prioritize these improvements. Thank you for testing this feature!