Image Resizing: [5] Resizing via Sidekiq - static/upfront or lazily
Idea
Instead of resizing images on request, we produce resized versions of the same image ahead of time.
Pros
- Moves image scaling out of the request path and into a background job, so no image resizing happens during site rendering (away from web/Workhorse). This is especially beneficial for larger images such as content images, where resizing can be costly.
- It introduces few new moving parts, since it appears that we could leverage Sidekiq + CarrierWave to perform the scaling (CarrierWave already supports ImageMagick: https://www.rubydoc.info/github/jnicklas/carrierwave/CarrierWave/MiniMagick).
- Resizing could happen on the upload path, where latency is less critical.
- Queue workers can be scaled as needed.
- It's flexible: with a combined lazy/static approach we don't need to store all image versions. We could pre-create only the most popular ones, and serve the original (or next closest) image size while recreating missing versions in the background.
- We are no longer resizing inline.
- Sidekiq cannot be flooded with duplicate requests, since we prevent multiple resize jobs for the same image from entering Sidekiq while the first one is waiting to complete.
- Easy to maintain: easy to add new sizes, easy to support different sizes per model type (user avatars, group avatars, project avatars).
- Configurable: it can easily be disabled or put behind a feature flag.
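The Sidekiq + CarrierWave integration mentioned above could be sketched as a versioned uploader. This is an illustrative sketch only: the version names and dimensions are placeholder assumptions, not the final set, and the real uploader class may look different.

```ruby
# Sketch only: assumes the carrierwave and mini_magick gems.
# Version names (:x40, :x64) and dimensions are placeholder assumptions.
class AvatarUploader < CarrierWave::Uploader::Base
  include CarrierWave::MiniMagick

  # Each `version` block produces an extra file next to the original,
  # e.g. avatar.png -> avatar_x40.png (the exact file naming scheme
  # depends on uploader configuration).
  version :x40 do
    process resize_to_fill: [40, 40]
  end

  version :x64 do
    process resize_to_fill: [64, 64]
  end
end
```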
Cons
- It is unclear which sizes we should generate upfront, since most of them will likely never be requested, resulting in a lot of image data sitting unused
- Answered in this comment: #232616 (comment 393263867). We will generate the image sizes that cover the majority of use cases: projects - top 3 sizes cover 92%; users - top 6 cover 92%; groups - top 3 cover 95%.
- Since it would only work for new uploads, we would either have to go back and rescale GitLab's entire image library, or combine it with a semi-dynamic strategy where, if we do not yet have a rescaled image, we kick off that job lazily.
- Answer: the second option. We do not need to rescale the entire image library; we will kick off the jobs lazily when images are requested that have not yet been resized. Rescaling the entire image library is not desirable and could result in unneeded conversions for little-used projects and their images.
- Since resized images are not part of the serving layer at all, the amount of bookkeeping increases: we need to know which sizes we have already generated to be able to tell Workhorse which one to deliver.
- Answer: we will still serve /uploads/-/system/user/avatar/895869/avatar.png?width=40. If the avatar_x40.png version is present, it will be served; if not, the original avatar.png (or the next biggest size) will be served.
- We will likely have to rely on additional object storage to hold all of these pre-scaled images, and it is unclear how that would happen for self-managed, where this might not be an option
- Answer: the additional storage needed for avatars would be roughly 80 GB; it's answered here: #232616 (comment 393263867). For self-managed instances it will default to local storage, which adds space requirements. The question remains what will happen on self-managed instances where this might not be an option; we could disable the resizing feature for them.
Variations
There are some variations to this approach that might be somewhere in between dynamic and static scaling.
For instance, instead of generating each size ahead of time, we could still do it as a background job, but lazily: whenever a requested size has not yet been generated, we create a fire-and-forget job that generates all versions, and serve the original image (or the next largest icon) in the meantime. The next time the same request comes in, the respective size exists.
Similar to how you warm a cache, we could synthesize these requests before shipping, in order to pre-generate images that we know are requested frequently (e.g. avatars for gitlab-org/gitlab).
We could also pre-create only the most popular sizes (for example, only 40 and 24 for user avatars) and lazily serve/recreate the other sizes.
It is also possible to lazily recreate only the specific requested size instead of recreating all sizes.
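The lazy flow described above can be simulated in plain Ruby. The class and method names here are hypothetical; the hash and array stand in for object storage and the Sidekiq queue:

```ruby
# Plain-Ruby simulation of the lazy variation: serve what exists now,
# enqueue a fire-and-forget job for what is missing.
class LazyResizer
  def initialize
    @store = { original: "avatar.png" } # size => stored filename
    @queue = []                         # pending "background jobs"
  end

  # Serve the requested size if present; otherwise serve the original
  # and enqueue a job that generates the missing version.
  def serve(width)
    return @store[width] if @store.key?(width)

    @queue << width
    @store[:original]
  end

  # Simulates the background worker draining the queue.
  def drain_queue!
    @queue.uniq.each { |w| @store[w] = "avatar_x#{w}.png" }
    @queue.clear
  end
end
```

The first request for a missing size falls back to the original; after the background job runs, the same request is served the resized version.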
Concerns
For each concern, it is good to have a strategy for how to evolve the solution to address it, and an estimate of the effort involved.
- Concern: What will happen with self-managed instances that don't support object storage? The feature would add space requirements to their local storage, which may not be an option.
  - Solution: TBD
- Concern: We need a way to prevent multiple resize requests from entering Sidekiq while waiting for the first one to complete.
  - Solution: We need to introduce a new :until_executed deduplication strategy which locks until the previously scheduled job finishes (gitlab-com/gl-infra/scalability#195 (closed)). This will prevent the same resize background job from running multiple times simultaneously.
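The intent of the :until_executed strategy can be shown with a minimal in-process sketch: a job key is locked when first enqueued and released only after the job finishes, so duplicate enqueues in between are dropped. The class below is hypothetical; the real implementation would use a Redis-backed lock shared across processes.

```ruby
require "set"

# Minimal in-process sketch of an "until_executed" deduplication strategy.
class DeduplicatingQueue
  def initialize
    @locks = Set.new
    @jobs = []
  end

  # Returns true if the job was enqueued, false if it was deduplicated.
  def enqueue(key)
    return false if @locks.include?(key)

    @locks.add(key)
    @jobs << key
    true
  end

  # Run all queued jobs, releasing each lock only after execution
  # (this is what distinguishes until_executed from until_executing).
  def work
    @jobs.each do |key|
      yield key
      @locks.delete(key)
    end
    @jobs.clear
  end
end
```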
- Concern: How should we approach content images in the future?
  - Which sizes should we support?
  - What additional storage will this cost?
  - Solution: TBD
- Concern: If there is no exact match, do we serve the original or the next biggest size?
  - Solution: We could check for the next biggest version that exists, but for this PoC it is not required; we can improve this in a later iteration.
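The fallback lookup described above fits in a few lines. This is a sketch under the assumption that `available` is the list of widths already generated for a given upload (a hypothetical input, not an existing helper):

```ruby
# Prefer an exact match, otherwise the next biggest existing version,
# otherwise fall back to the original image.
def pick_version(requested, available)
  return requested if available.include?(requested)

  available.select { |w| w > requested }.min || :original
end
```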
- Concern: After storing an image version, should we create a corresponding Upload record?
  - Solution: We probably should, since it would store the information about image version sizes. However, it would increase the number of records in the database; at the moment we have 1.7 million records in this table for avatars alone. If we decide to record versions in the Upload record, we will need to add a new version_name column, and we should provide a way to properly restore the correct version uploader from an Upload record.
- Concern: How do we roll out? By a percentage of users? By project? If we do not pre-create any image sizes and go with the fully lazy approach, this could generate a lot of Sidekiq resize background jobs, one for each user/project/group avatar.
  - Solution: We could enable the feature flag first for popular projects (e.g. gitlab-org/gitlab) and then gradually roll the flag out to a specific percentage of users, so we avoid generating a flood of resizing background jobs.
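Percentage-of-actors gating can be sketched as a stable hash bucket per user, similar in spirit to how Flipper-style feature flags bucket actors. The flag name and the exact bucketing formula below are assumptions for illustration, not GitLab's actual implementation:

```ruby
require "zlib"

# Hash the actor into a stable bucket 0..99 and compare with the rollout
# percentage. The same actor always lands in the same bucket, so the flag
# is sticky as the percentage is ramped up.
def rollout_enabled?(flag, actor_id, percentage)
  Zlib.crc32("#{flag}:#{actor_id}") % 100 < percentage
end
```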
Security prerequisites
- Concern: Masquerading SVGs or PNGs.
  - Solution: This is a common issue for both the static and dynamic approaches. We created a separate issue for it: https://gitlab.com/gitlab-org/gitlab/-/issues/235140. We are whitelisting content types for avatars, and we are using MiniMagick to detect the actual content type: https://gitlab.com/gitlab-org/gitlab/-/issues/235140#note_398571766
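A simplified sketch of the content-sniffing idea: verify that a blob claiming to be a PNG actually starts with the PNG magic bytes and is not XML/SVG markup under a .png name. The helper name is hypothetical, and this is deliberately simplified; the real solution relies on MiniMagick to determine the actual format.

```ruby
# Eight-byte PNG file signature.
PNG_SIGNATURE = "\x89PNG\r\n\x1a\n".b

# Returns true when a blob declared as PNG is actually markup
# (e.g. starts with "<svg" or "<?xml"). Simplified: a blob that is
# neither a valid PNG nor markup still returns false here.
def masquerading_png?(blob)
  return false if blob.byteslice(0, 8) == PNG_SIGNATURE

  blob.lstrip.start_with?("<")
end
```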
Notes
- List of all avatar sizes ordered by popularity: #227388 (comment 391901232)
- Storage cost for avatars: #232616 (comment 393263867)