Image Resizing: [2] Sidecar process-in-WH approach
Idea
We do the resizing in a separate process that stays alive for the whole WH lifecycle and is responsible for image resizing tasks. Workhorse would communicate with this process through some form of IPC (sockets, RPC, ...).
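To make the IPC idea concrete, here is a minimal sketch of what a wire format between workhorse and the sidecar could look like, assuming a hand-rolled length-prefixed framing over a socket. The `ResizeRequest` type and its fields are hypothetical, not an existing workhorse API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ResizeRequest is a hypothetical message workhorse could send to the sidecar.
type ResizeRequest struct {
	Width uint32
	Path  string
}

// encodeRequest frames a request as: width (4 bytes) | path length (4 bytes) | path.
func encodeRequest(r ResizeRequest) []byte {
	buf := make([]byte, 8+len(r.Path))
	binary.BigEndian.PutUint32(buf[0:4], r.Width)
	binary.BigEndian.PutUint32(buf[4:8], uint32(len(r.Path)))
	copy(buf[8:], r.Path)
	return buf
}

// decodeRequest is the sidecar-side counterpart, rejecting malformed frames.
func decodeRequest(buf []byte) (ResizeRequest, error) {
	if len(buf) < 8 {
		return ResizeRequest{}, fmt.Errorf("short message")
	}
	w := binary.BigEndian.Uint32(buf[0:4])
	n := binary.BigEndian.Uint32(buf[4:8])
	if int(n) != len(buf)-8 {
		return ResizeRequest{}, fmt.Errorf("bad length")
	}
	return ResizeRequest{Width: w, Path: string(buf[8:])}, nil
}

func main() {
	msg := encodeRequest(ResizeRequest{Width: 64, Path: "/tmp/avatar.png"})
	req, _ := decodeRequest(msg)
	fmt.Println(req.Width, req.Path)
}
```

In practice we might prefer gRPC or plain HTTP over a UNIX socket rather than a custom framing; the point is only that the contract between the two processes stays small.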
Pros
- Better control over Mem/CPU than the Inbound approach (#230516 (closed))
- There is a chance it may be cheaper in terms of CPU/Mem/overall latency than starting/forking a new process on each request (to be verified)
- Fault isolation from main request serving process
- Can be scaled independently of serving processes
- No new service definition required in Omnibus; it remains an implementation detail of workhorse
- Allows us to abstract away the actual scaling implementation by defining a simple IPC interface; see also Variations (we could decide to start with a simple library-based scaler approach, but later swap it out for a more powerful service like imgproxy)
- Makes it easy to degrade gracefully to the current serving approach if the sidecar falls over, since we could simply serve the original image as before
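The last two pros could be captured in a small abstraction. The sketch below is illustrative only (none of these type names exist in workhorse today): a `Scaler` interface hides whether resizing happens in a library or a sidecar service, and a fallback wrapper implements the graceful degradation to serving the original image.

```go
package main

import (
	"errors"
	"fmt"
)

// Scaler abstracts the resize backend (hypothetical interface).
type Scaler interface {
	Resize(img []byte, width int) ([]byte, error)
}

// libraryScaler stands in for an in-process imaging library (stubbed here).
type libraryScaler struct{}

func (libraryScaler) Resize(img []byte, width int) ([]byte, error) {
	return img, nil // real code would decode, resize, re-encode
}

// withFallback degrades gracefully: if the backend fails, serve the original.
type withFallback struct{ inner Scaler }

func (w withFallback) Resize(img []byte, width int) ([]byte, error) {
	out, err := w.inner.Resize(img, width)
	if err != nil {
		return img, nil // sidecar down: fall back to the original image
	}
	return out, nil
}

// brokenScaler simulates a sidecar that has fallen over.
type brokenScaler struct{}

func (brokenScaler) Resize([]byte, int) ([]byte, error) {
	return nil, errors.New("sidecar unavailable")
}

func main() {
	s := withFallback{inner: brokenScaler{}}
	out, err := s.Resize([]byte("original"), 64)
	fmt.Println(string(out), err) // falls back to the original bytes
}
```

Swapping `libraryScaler` for an imgproxy-backed implementation later would then not touch any calling code.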
Cons
- May be tricky to implement
- Will need health monitoring and strategies to restart if down
- No existing examples in the WH (utils in `/cmd` use another approach)
- Cannot benefit from existing workhorse utilities such as logging, prometheus integration etc. (although most of this can be reinstated via https://gitlab.com/gitlab-org/labkit)
- Higher memory use overall for workhorse nodes
Variations
Variation 1: Sidecar uses custom golang module + imaging library
This is similar to the embedded approach, where a library would resize images in-process, except that it would now run in its own process. Pros: this could be a simple evolution of the embedded approach, since we'd just extract the existing code into a new module and process.
Variation 2: Use stand-alone service
We can use a dedicated scaling service as a sidecar. For instance, imgproxy can bind to a UNIX domain socket so workhorse could talk to it: https://github.com/imgproxy/imgproxy/issues/296
Pros: the image scaling work is already done for us and it should "just work", i.e. faster iteration. Cons: we drag in some baggage we don't need, since workhorse already functions as a proxy. This would also almost certainly mean higher overall memory use for workhorse nodes.
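On the workhorse side, talking HTTP over a UNIX domain socket is straightforward with a custom dialer. The socket path and the request URL below are placeholders, not imgproxy's actual documented interface.

```go
package main

import (
	"context"
	"net"
	"net/http"
)

// unixClient returns an *http.Client that sends every request over the given
// UNIX domain socket instead of TCP.
func unixClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

func main() {
	client := unixClient("/var/run/imgproxy.sock") // hypothetical socket path
	// The host part of the URL is ignored by the custom dialer; only the
	// request path reaches the sidecar. Errors are expected here since no
	// sidecar is actually running.
	_, _ = client.Get("http://sidecar/placeholder")
}
```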
Concerns
For each concern, it is good to have a strategy for how to evolve the solution to address it, along with an estimate of the effort involved.
Concern: Does it fit the WH philosophy? (to ask WH/Infra folks)
Solution: TBD
Concern: Does this fit our Kubernetes roadmap? In terms of scaling out, would it be preferable for services to run in their own pods rather than as a sidecar? If it runs as a sidecar on the same pod, can or should we containerize it?
Security prerequisites
As stated by Jeremy in https://gitlab.com/gitlab-com/gl-security/engineering/-/issues/1043#note_388815325:
- Absolutely avoid processing `svg` files: here is a post mortem of the 3rd party service we are currently using to resize Gitter images. Anyhow, it should not be an issue because `svg` files don't need to be resized.
- Enforce strict limits on inputs:
- file size
- picture size
  - cross-check that the extension matches the file signature
- Sandbox the library that will do the image processing, i.e. don't run it under the same Linux account as Rails/Workhorse/Gitaly.
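The input-limit checks above could be sketched as follows. The allow-list, size cap, and function names are assumptions for illustration; note that `svg` is deliberately absent from the allow-list, and a dimension check would additionally need something like `image.DecodeConfig` before full decoding.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffFormat inspects magic bytes; a hypothetical allow-list covering only
// the formats avatar resizing needs. svg is intentionally not recognised.
func sniffFormat(data []byte) string {
	switch {
	case bytes.HasPrefix(data, []byte("\x89PNG\r\n\x1a\n")):
		return "png"
	case bytes.HasPrefix(data, []byte{0xFF, 0xD8, 0xFF}):
		return "jpeg"
	default:
		return ""
	}
}

const maxFileSize = 1 << 20 // 1 MiB: hypothetical limit

// validate enforces the limits before any bytes reach the resizer: size cap,
// format allow-list, and extension-vs-signature cross-check.
func validate(data []byte, claimedExt string) error {
	if len(data) > maxFileSize {
		return fmt.Errorf("file too large: %d bytes", len(data))
	}
	format := sniffFormat(data)
	if format == "" {
		return fmt.Errorf("unsupported or unrecognised format")
	}
	if format != claimedExt {
		return fmt.Errorf("extension %q does not match signature %q", claimedExt, format)
	}
	return nil
}

func main() {
	jpeg := []byte{0xFF, 0xD8, 0xFF, 0xE0}
	fmt.Println(validate(jpeg, "jpeg")) // accepted
	fmt.Println(validate(jpeg, "png"))  // rejected: extension mismatch
}
```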