Image Resizing: spawn a separate process per request
On each resizing request, we spawn/fork a separate process responsible for performing the resize.
- Probably the most granular control over memory/CPU: we could set limits for each process and control the rate at which we spawn/fork new ones.
- mk: let's clarify this (don't understand)
- Would not require a long-living process like in the Sidecar approach (#230517 (closed))
- mk: how is that a pro though?
- It seems that we already do something similar when executing our /cmdutils (to check if it works like that)
- mk: you mean for consistency reasons?
- Failures are isolated from the main serving process
- Easy to evolve into from the embedded approach
- Can rely on existing tools to do the heavy lifting (e.g. …)
- May be expensive in terms of resources (latency/CPU/memory hit still to be measured)
- Memory thrashing: need to page in and out the same program data constantly
- Zombies: since we would be running hundreds of thousands of these every hour, we might create zombies that get stuck or escape the parent PID (e.g. via a double fork), so we can no longer reap them
For each concern, we should have a strategy for evolving the solution to address it, along with an estimate of the effort involved.
Concern: Does it fit the WH philosophy? (to ask WH/Infra folks)
As stated by Jeremy in https://gitlab.com/gitlab-com/gl-security/engineering/-/issues/1043#note_388815325:
> Absolutely avoid processing svg files: here is a post mortem of the 3rd party service we are currently using to resize Gitter images. Anyhow it should not be an issue because svg files don't need to be resized.
- Enforce strict limits on inputs:
- file size
- picture size
- crosscheck extension matches signature
- Sandbox the library that will do the image processing, i.e. don't run it under the same Linux account as Rails/Workhorse/Gitaly.
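The input-validation bullets above (size cap, extension/signature cross-check) could look roughly like this sketch. The size limit, the accepted formats, and the function name are all assumptions for illustration; the real allowlist would be whatever formats the resizer supports.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// Magic-byte signatures for the formats we would accept; anything
// else is rejected outright (notably svg, per the quote above).
var signatures = map[string][]byte{
	".png":  {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A},
	".jpg":  {0xFF, 0xD8, 0xFF},
	".jpeg": {0xFF, 0xD8, 0xFF},
	".gif":  []byte("GIF8"),
}

// maxFileSize is an assumed cap; tune per deployment.
const maxFileSize = 5 << 20 // 5 MiB

// validate enforces the size limit and cross-checks that the file's
// magic bytes match its claimed extension.
func validate(filename string, data []byte) error {
	if len(data) > maxFileSize {
		return fmt.Errorf("file too large: %d bytes", len(data))
	}
	sig, ok := signatures[filepath.Ext(strings.ToLower(filename))]
	if !ok {
		return fmt.Errorf("unsupported extension in %q", filename)
	}
	if !bytes.HasPrefix(data, sig) {
		return fmt.Errorf("content of %q does not match its extension", filename)
	}
	return nil
}

func main() {
	pngHeader := []byte{0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}
	fmt.Println(validate("photo.png", pngHeader))
	fmt.Println(validate("photo.png", []byte{0xFF, 0xD8, 0xFF}))
}
```

A check like this rejects mismatched content before it ever reaches the image library; the picture-dimension limit would additionally require decoding the header (e.g. with `image.DecodeConfig`) inside the sandboxed process.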