When GitLab Pages sites are served from object storage, does the whole site get downloaded as a zip archive, or are the individual files downloaded separately? Is there a way to cache this locally?
The zip archive is not downloaded into Pages. Instead, we make multiple requests. For example, the first time we access a domain:
- Create a resource in memory by fetching 1 byte, to ensure we support ranged requests. This also gives us the total size of the archive.
- Load the zip metadata.
- Load all archive file names into memory.
- Get the requested file's offset, for example for `index.html`.
- Get the actual bytes of the file.
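The steps above can be sketched with Go's standard library (this is a standalone illustration, not the actual Pages code): an `io.ReaderAt` that translates every read into an HTTP `Range` request, an `httptest` server standing in for object storage, and `archive/zip` reading the central directory through it.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// httpReaderAt issues one Range request per ReadAt call, mimicking how
// Pages reads a remote zip archive without downloading it whole.
type httpReaderAt struct {
	url      string
	requests int // stands in for requests made to object storage
}

func (r *httpReaderAt) ReadAt(p []byte, off int64) (int, error) {
	r.requests++
	req, err := http.NewRequest(http.MethodGet, r.url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+int64(len(p))-1))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return io.ReadFull(resp.Body, p)
}

// fetchIndex walks through the steps above and returns the contents of
// index.html plus the number of ranged requests it took.
func fetchIndex() (string, int) {
	// A tiny zip archive standing in for the deployment in object storage.
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, _ := zw.Create("index.html")
	w.Write([]byte("<h1>hello</h1>"))
	zw.Close()

	// http.ServeContent handles the Range header for us.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		http.ServeContent(w, req, "site.zip", time.Now(), bytes.NewReader(buf.Bytes()))
	}))
	defer srv.Close()

	// Step 1: fetch a single byte to confirm ranged requests are supported
	// and learn the archive's total size from the Content-Range header.
	req, _ := http.NewRequest(http.MethodGet, srv.URL, nil)
	req.Header.Set("Range", "bytes=0-0")
	resp, _ := http.DefaultClient.Do(req)
	resp.Body.Close()
	var size int64
	fmt.Sscanf(resp.Header.Get("Content-Range"), "bytes 0-0/%d", &size)

	// Steps 2-3: zip.NewReader reads the central directory (metadata and
	// all file names) through ranged ReadAt calls.
	ra := &httpReaderAt{url: srv.URL}
	zr, err := zip.NewReader(ra, size)
	if err != nil {
		panic(err)
	}

	// Steps 4-5: the offset of index.html is now known; read its bytes.
	for _, zf := range zr.File {
		if zf.Name == "index.html" {
			rc, _ := zf.Open()
			body, _ := io.ReadAll(rc)
			rc.Close()
			return string(body), ra.requests
		}
	}
	return "", ra.requests
}

func main() {
	body, n := fetchIndex()
	fmt.Printf("index.html: %s (%d ranged requests)\n", body, n)
}
```

The key point is that only small slices of the archive ever cross the network: the size probe, the central directory at the end of the zip, and the bytes of the one file requested.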
The resource (zip archive) is now cached. If we want to load additional files within the cached period:
- For `another_file.html`:
  1. Get the file offset.
  2. Get the bytes.
- For `index.html` again: the file offset is cached, so we just need one request to fetch the bytes.
The test `TestOpenCached` shows this in action.
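The effect of the cache can be sketched with `archive/zip` alone (again an illustration, not the Pages implementation): once the central directory is in memory, opening any file touches only that file's bytes. Here `countingReaderAt` stands in for the cached remote resource, counting each read as if it were a request to object storage.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// countingReaderAt counts ReadAt calls; each call stands in for a ranged
// request to object storage.
type countingReaderAt struct {
	r     *bytes.Reader
	calls int
}

func (c *countingReaderAt) ReadAt(p []byte, off int64) (int, error) {
	c.calls++
	return c.r.ReadAt(p, off)
}

// openFromCache opens every file in an archive whose metadata is already
// loaded, returning how many extra reads each file's contents cost.
func openFromCache() (names []string, extraReads []int) {
	// Build an archive with two pages, standing in for a deployment.
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range []string{"index.html", "another_file.html"} {
		w, _ := zw.Create(name)
		w.Write([]byte("content of " + name))
	}
	zw.Close()

	cra := &countingReaderAt{r: bytes.NewReader(buf.Bytes())}
	// Reading the central directory costs a few calls up front...
	zr, err := zip.NewReader(cra, int64(buf.Len()))
	if err != nil {
		panic(err)
	}

	// ...but afterwards every offset is cached in zr.File, so opening a
	// file reads only that file's header and data.
	for _, zf := range zr.File {
		before := cra.calls
		rc, _ := zf.Open()
		io.Copy(io.Discard, rc)
		rc.Close()
		names = append(names, zf.Name)
		extraReads = append(extraReads, cra.calls-before)
	}
	return names, extraReads
}

func main() {
	names, extra := openFromCache()
	for i, name := range names {
		fmt.Printf("%s: %d additional read(s)\n", name, extra[i])
	}
}
```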
An extra refresh request might happen if the resource is accessed during the `zip_cache_refresh` period. If the content changed (for example, after a new Pages deployment), the resource in memory needs to be updated, triggering the five requests again (it's treated as a new resource).
If you would like to reduce the number of requests made to object storage, you could extend the cache timings. This might be a good solution if your Pages content doesn't change too often.
For example, you could increase `zip_cache_expiration` and reduce `zip_cache_refresh`. If the archive expires in 10 minutes and the refresh period is only 10s, we would always serve the cached content without a refresh for the first 9m50s; a request that comes in at, say, 9m55s triggers a refresh. The risk of doing this is that you may see old content, potentially indefinitely, because a refresh only extends the resource's lifetime in memory rather than re-fetching it.
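On an Omnibus GitLab install, that tuning would look roughly like the following in `/etc/gitlab/gitlab.rb` (the setting names here mirror the Pages `zip_cache_*` flags; verify them against the Pages administration docs for your GitLab version before applying, then run `gitlab-ctl reconfigure`):

```ruby
# Assumed Omnibus setting names -- check your version's docs.
gitlab_pages['zip_cache_expiration'] = "600s" # keep a cached archive for 10 minutes
gitlab_pages['zip_cache_refresh']    = "10s"  # only refresh if accessed in the last 10s before expiry
```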