Tech evaluation: Object storage using presigned URLs
This is a follow-up tech evaluation from #355 (closed)
@ayufan thanks for your input on Slack! (Copying it here.)
- We need to use pre-signed URLs from GitLab. That way we don't need any credentials on Pages, and whether the .zip is used can be controlled exclusively by Rails; the link would have an encoded, Rails-controlled expiry date
- If serving from .zip, I think we likely need to define the maximum archive size that we can support, filter the relevant files (the public/ folder only), and hold them somewhere in memory. I would assume that we could make the number of cached archives and files-in-archives configurable and optimise it towards cache-hit ratio; GitLab.com would likely allow using a ton of memory if needed
- I would likely break the support for Content-Range when serving files, as I don't think that this is cheaply possible with .zip
- GitLab Workhorse does have OpenArchive, which supports local and remote archives, but it is not performance optimised: the HTTP requests are badly aligned and this will likely need to be improved, so just copy-pasting will not give great performance yet
@vshushlin started a discussion: Oh, I thought that !136 (closed) had some object storage implementation, while it only serves from zip files on disk
🙈 I have a very simple idea for an alternative PoC:
- We can copy (and maybe slightly modify) https://gitlab.com/gitlab-org/gitlab/-/blob/84c0ffe12646b9bae1fdf2e576cde7f01f8ded73/lib/api/job_artifacts.rb#L75-94 to the Pages API https://gitlab.com/gitlab-org/gitlab/-/blob/75f8d42bb443d0a6101a9c2f6b65c607cd95efd4/lib/api/internal/pages.rb#L19
- Then we can return the `job_id` in the API. Pages will get the specific file and proxy it to the user.
- Later we can add cache for it.
Alternatively, we can get the whole artifacts zip archive and use the current zip reading code, then we'll need to cache those files.
I don't think that adding object-storage-specific code to Pages is a good idea. We already have it in Workhorse, so we can just use the API. It's slower, but much simpler.
Diagram/proposal
sequenceDiagram
participant U as User
participant P as gitlab-pages
participant G as gitlab-workhorse and rails
participant OS as Object Storage
U->>P: 1. username.gitlab.io/index.html
P->>G: 2. GET /api/v4/internal/pages?host=username.gitlab.io
G->>P: 3. {... lookup_paths: [{source: {type: "zip", path: "presignedURL"}}],...}
loop zipartifacts
P->>P: 4. reader := OpenArchive(presignedURL)
P->>OS: 5. GET presignedURL
OS->>P: 6. .zip file
P->>P: 7. reader.find(public/index.html)
P->>P: 8. go func(){ cache({host,reader}) } ()
end
P->>U: 9. username.gitlab.io/index.html
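Steps 4 to 8 of the diagram can be sketched in Go as follows. This is only an illustration under stated assumptions: `archiveCache`, `openArchive`, `readFile` and `newFakeStorage` are hypothetical names (not the Workhorse `OpenArchive`), the whole archive is downloaded into memory instead of using aligned range requests, caching happens synchronously rather than in a goroutine, and an `httptest` server stands in for object storage.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// archiveCache keeps one opened zip reader per host (step 8 in the diagram).
// Hypothetical type: a real cache would also bound memory and expire entries.
type archiveCache struct {
	mu      sync.Mutex
	readers map[string]*zip.Reader
}

func newArchiveCache() *archiveCache {
	return &archiveCache{readers: make(map[string]*zip.Reader)}
}

// openArchive downloads the whole archive into memory (steps 4-6) and
// rejects anything larger than maxSize bytes.
func openArchive(url string, maxSize int64) (*zip.Reader, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	// Read one byte past the limit so an oversized archive can be detected.
	data, err := io.ReadAll(io.LimitReader(resp.Body, maxSize+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > maxSize {
		return nil, fmt.Errorf("archive exceeds %d bytes", maxSize)
	}
	return zip.NewReader(bytes.NewReader(data), int64(len(data)))
}

// readFile returns one file from the archive for host, opening and caching
// the archive on first use (steps 7-8, done synchronously here).
func (c *archiveCache) readFile(host, url, name string, maxSize int64) ([]byte, error) {
	c.mu.Lock()
	r, ok := c.readers[host]
	c.mu.Unlock()
	if !ok {
		var err error
		if r, err = openArchive(url, maxSize); err != nil {
			return nil, err
		}
		c.mu.Lock()
		c.readers[host] = r
		c.mu.Unlock()
	}
	f, err := r.Open(name)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(f)
}

// newFakeStorage serves a single in-memory zip, standing in for object storage.
func newFakeStorage(files map[string]string) *httptest.Server {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for name, body := range files {
		fw, err := w.Create(name)
		if err != nil {
			panic(err)
		}
		if _, err := fw.Write([]byte(body)); err != nil {
			panic(err)
		}
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
	data := buf.Bytes()
	return httptest.NewServer(http.HandlerFunc(func(rw http.ResponseWriter, _ *http.Request) {
		rw.Write(data)
	}))
}

func main() {
	srv := newFakeStorage(map[string]string{"public/index.html": "<h1>hi</h1>"})
	defer srv.Close()
	cache := newArchiveCache()
	body, err := cache.readFile("username.gitlab.io", srv.URL+"/public.zip", "public/index.html", 1<<20)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", body)
}
```

Because the cache is keyed by host, a second request for the same domain does not touch object storage at all, which is the property the PoC is meant to demonstrate.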
Proposal
In this PoC we will hardcode the value returned from `/api/v4/internal/pages` to reduce the scope. I will use MinIO, which is already supported in the GDK. I'll also shamelessly steal and slightly modify the `zipartifacts` package from Workhorse.
To address #377 (comment 367358348) the source type should be `"zip"` so that Pages can serve from `.zip` regardless of the path (pre-signed URL or disk path).
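Since a single `"zip"` source type covers both cases, Pages has to tell a pre-signed URL from a disk path by looking at the path itself. One possible way to do that is sketched below; `sourceKind` and `classifyPath` are hypothetical names, and treating only `http`/`https` URLs as remote is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// sourceKind says where a zip archive should be read from.
type sourceKind int

const (
	kindDisk sourceKind = iota
	kindRemote
)

// classifyPath treats absolute http(s) URLs as remote archives and
// everything else as a path on local disk.
func classifyPath(path string) sourceKind {
	u, err := url.Parse(path)
	if err == nil && (u.Scheme == "http" || u.Scheme == "https") && u.Host != "" {
		return kindRemote
	}
	return kindDisk
}

func main() {
	for _, p := range []string{
		"https://presigned.url/public.zip",
		"/shared/pages/domain/project/public.zip",
	} {
		fmt.Println(p, "remote:", classifyPath(p) == kindRemote)
	}
}
```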
Outcomes
We now have &3901 (closed) and &3902 (closed), with parent &1316 (closed), to track all future efforts.
Rails
- Allow deploying Pages as `.zip` archives with a `max_archive_size`. gitlab#208135 (closed)
- On deploy -> check size -> store `public.zip` either on disk or in object storage depending on the features enabled. Also tracked in gitlab#208135 (closed)
- Update `/api/v4/internal/pages` -> return a `"source"."type": "zip"` with a path gitlab#225840 (closed), e.g.
{
  "lookup_paths": [
    {
      "source": {
        "type": "zip",
        "path": "https://presigned.url/public.zip"
      }
    }
  ]
}
The `path` can also be a disk path, e.g. `/shared/pages/domain/project/public.zip`.
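On the Pages side, a response of this shape could be decoded along these lines. The struct and function names (`Lookup`, `LookupPath`, `Source`, `parseLookup`, `zipPaths`) are hypothetical and only the fields shown in the example above are modelled; the real API response contains more.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Source mirrors the "source" object from the example response.
type Source struct {
	Type string `json:"type"`
	Path string `json:"path"`
}

// LookupPath models only the part of a lookup path relevant to zip serving.
type LookupPath struct {
	Source Source `json:"source"`
}

// Lookup is the top-level response object.
type Lookup struct {
	LookupPaths []LookupPath `json:"lookup_paths"`
}

// parseLookup decodes the JSON body returned by the internal API.
func parseLookup(body []byte) (*Lookup, error) {
	var l Lookup
	if err := json.Unmarshal(body, &l); err != nil {
		return nil, err
	}
	return &l, nil
}

// zipPaths returns the path of every lookup path whose source type is "zip".
func zipPaths(l *Lookup) []string {
	var paths []string
	for _, lp := range l.LookupPaths {
		if lp.Source.Type == "zip" {
			paths = append(paths, lp.Source.Path)
		}
	}
	return paths
}

func main() {
	body := []byte(`{"lookup_paths":[{"source":{"type":"zip","path":"https://presigned.url/public.zip"}}]}`)
	l, err := parseLookup(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(zipPaths(l))
}
```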
Pages (Go)
- Extract the `resolvePath` logic from disk serving into its own package so it can be shared. #421 (closed)
- Add package `zip` with `zip/reader` gitlab#28784 (closed)
- Add `zip` serving to Pages - this allows serving from disk or pre-signed URLs from object storage gitlab#28784 (closed)
- Implement a zip reader caching mechanism #422 (closed)
- Add metrics for zip serving #423 (closed)
- While testing I hit #371, so I think it would be valuable to work on that issue first.