Serve HTML build artifacts as a website, a.k.a. GitLab Pages
Moved from https://dev.gitlab.org/gitlab/gitlabhq/issues/2290
GitHub Pages is popular; it is something I loved to use, and GitHub is actively marketing it to large customers as a unique benefit.
We want it too, but have to decide whether to use a standalone application that builds it or one of the two GitLab CI flows below.
Requirements:
- If you serve HTML from user-generated content, it is best to use a different domain name to prevent XSS attacks against credentials.
- The user should be allowed to choose their own FQDN to serve the content from
- It uses the Jekyll setup from https://github.com/github/pages-gem
Standalone app advantages:
- Easy to do, see https://github.com/Glavin001/GitLab-Pages
- Better user experience because there are fewer steps to set up
- It is what GitHub did as well: https://github.com/blog/1992-eight-lessons-learned-hacking-on-github-pages-for-six-months
- Easier to do with a different domain name and the user's own FQDN
Using GitLab CI advantages:
- Allows reuse of all improvements for other purposes
- Can leverage existing runner infrastructure
GitLab CI flow with runners on the pages server:
- need a `.gitlab-ci.yml` file in the project
- have a server that listens for the right FQDNs
- need a runner per project to ensure you can't push to someone else's project
GitLab CI flow with runners external to the pages server:
- need a `.gitlab-ci.yml` file
- need to set up a runner
- need a server that listens for the right FQDNs
- the runner uploads build artifacts to the coordinator
- the coordinator pushes the build artifacts to the server
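As a rough sketch, the external-runner flow above might be driven by a `.gitlab-ci.yml` along these lines (the image name and the `artifacts` key are illustrative only; none of this syntax is implemented):

```yaml
# Hypothetical .gitlab-ci.yml for the external-runner flow.
# Image name and 'artifacts' syntax are illustrative, not an implemented schema.
image: gitbook-middleman-docker-image
build:
  script: middleman build
  # The runner uploads this directory to the coordinator,
  # which then pushes it to the pages server.
  artifacts: public/
```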
@dzaporozhets what do you think? I think a standalone app looks the easiest.
- there is a server that runs our 'GitLab Pages' software that has its own FQDN and is authenticated for all projects on GitLab CI
- you have an `artifacts` attribute in `.gitlab-ci.yml` that triggers a webhook on the pages server FQDN when artifacts are ready
- when the pages server gets the artifacts webhook, it starts downloading the build artifacts from GitLab CI through the API
- GitLab Pages makes the artifacts available on FQDN/project-name
Additionally, a `cname` attribute in `.gitlab-ci.yml` is also passed to the GitLab Pages server in the `artifacts` webhook. This serves to update the routing table of Nginx (maybe dynamically, in Lua). Users need to ensure that the DNS of the cname resolves to the IP address of the GitLab Pages server.

So in `.gitlab-ci.yml` you have:
```
image gitbook-middleman-docker-image
upload artifacts pages.example.com
cname www.example.com
```
The advantage of doing it via GitLab CI instead of doing everything in Pages is:
- Easy to get multiple runners
- People can select their own processor/image
- No need to give GitLab Pages access to GitLab; it only needs access to GitLab CI build artifacts

The disadvantage is that build artifacts get copied twice.
What we also can do is change GitLab Pages => GitLab Artifact and always upload build artifacts to an app separate from GitLab CI. In that case it should not only serve HTML but also allow people to preview/download PDFs and other files (if there is no index.html).
I want to describe the option to store the artifacts in GitLab CI and add a simple routing layer to CI, preventing the need for another application.
The problem is that artifacts normally need to be versioned (per SHA1) and need access control. We can store versioned artifacts in /var/opt/gitlab/artifacts/PROJECT_NAME/SHA1.
Some projects will not want to use versioned artifacts (because they get too big) and only store and show the latest version(s). But they will probably still need access control.
Projects with the `sites` attribute are added to a directory /sites where a PROJECT_NAME symlink points to the latest SHA1. Having ci.example.com/sites/PROJECT_NAME serve the assets would be possible with a simple NGINX rule that forwards ci.example.com/sites to the /sites directory with the symlinks. The cname functionality would maybe require a dynamic Nginx rule.

The `.gitlab-ci.yml` file would look like:
```
image gitbook-middleman-docker-image
artifacts sites
```
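Combined with the versioned storage described earlier, the resulting layout on the CI server could look like this (the project name and SHA1s are illustrative placeholders):

```
/var/opt/gitlab/artifacts/project-a/
    3f5c0af.../                # one directory per build SHA1
    9d2e1bb.../                # latest build
/sites/
    project-a -> /var/opt/gitlab/artifacts/project-a/9d2e1bb.../
```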
Thinking about this GitLab Sites idea further, and taking into account:
- Docker images that can now be specified in .gitlab-ci.yml
- Zero config CI
- Artifacts upload https://gitlab.com/gitlab-org/gitlab-ce/issues/3028
To host the site on Amazon S3 you just need to execute two steps:
- You define s3 credentials for a bucket in GitLab CI.
- You add a `.gitlab-ci.yml` file that defines a Jekyll processor Docker image and has `artifacts upload --permissions public` to set public rights: http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html

I think this is extremely clean and shows that we're on the right track with the Docker images and artifacts upload.
If you want a custom domain (www.example.com) you should name the bucket appropriately and add a `cname` attribute to let the assets uploader execute some extra steps: http://docs.aws.amazon.com/AmazonS3/latest/dev/website-hosting-custom-domain-walkthrough.html Apart from that, you should of course add a new record to your DNS configuration.

For use on-premises, or to offer gitlabsites.com, we need software for S3-compatible buckets (Ceph) and a webserver around it.
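Putting the two steps together, the S3 flow's `.gitlab-ci.yml` might look roughly like this (the image name and the `artifacts`/`cname` keys are speculative, mirroring the fragments above):

```yaml
# Hypothetical .gitlab-ci.yml for the S3 flow; none of these keys are final.
image: jekyll-docker-image                 # a Jekyll processor image
build:
  script: jekyll build
  # Upload the build output to the configured S3 bucket with public rights.
  artifacts: upload --permissions public
# Bucket/custom domain; tells the uploader to execute the extra S3 steps.
cname: www.example.com
```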
This is very close to something that can be done natively with S3 buckets. You can serve a static site from any bucket with a simple setting, and you can just push to them directly.
My worry is that we wouldn't make it much easier than that. See also middleman-s3_sync, which does this for you.
Any step the user doesn't completely understand is a step they won't take. Supporting S3 for power users is great, but for the regular user, setting it up with gitlabsites.com needs to be really straightforward.
`.gitlab-ci.yml` may already be too complicated.