Support Docker Compose `deploy` Specification for HA in incus-compose

This issue tracks the implementation of the Docker Compose `deploy` specification in incus-compose, so that users can define high-availability (HA) and scaling configurations in a familiar, declarative way. The goal is to align with the Compose spec while taking advantage of Incus-specific features (e.g., live migration, system containers).


Key Features to Implement

  1. deploy.replicas
    • Basic N-replica scaling is already implemented.
    • Remaining work: auto-distribute replicas across cluster nodes when shared storage is configured.
    • Warn if shared storage is missing.
  2. deploy.mode
    • Support replicated (default) and global (one container per node).
  3. deploy.placement
    • Support constraints and preferences to control container placement.
    • Example: placement.constraints: node.labels.storage == ssd.
  4. deploy.restart_policy
    • Map condition to Incus boot config (boot.autostart, boot.autorestart).
    • delay and max_attempts have no Incus equivalent and will be ignored.
  5. deploy.update_config
    • Implement rolling updates (e.g., parallelism: 1, delay: 10s).
  6. deploy.rollback_config (Future)
    • Implement rollback support for failed updates. Deferred for now (see Out of Scope).

Out of Scope (For Now)

  • deploy.endpoint_mode: Requires external load balancer integration.
  • deploy.rollback_config: Lower priority than core HA features.

Proposed UX for incus-compose HA

1. Basic Replica Scaling

services:
  web:
    image: nginx:alpine
    deploy:
      replicas: 3

  • Behavior:
    • Creates 3 containers named web-1, web-2, web-3.
    • Auto-distributes them across available nodes (if shared storage is configured).
    • Warns if shared storage is missing.
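
The naming and distribution behavior above could be sketched roughly as follows. This is a minimal illustration, not the incus-compose implementation: `plan_replicas`, the `SHARED_DRIVERS` set, and the round-robin strategy are all assumptions made for the example, and a target of `None` stands for "no explicit placement".

```python
from itertools import cycle

# Assumption: storage drivers treated as shared across cluster members.
SHARED_DRIVERS = {"ceph", "cephfs", "lvmcluster"}


def plan_replicas(service, replicas, nodes, pool_driver):
    """Return ({instance_name: node_or_None}, warnings) for one service."""
    if not nodes:
        raise ValueError("at least one cluster node is required")
    names = [f"{service}-{i}" for i in range(1, replicas + 1)]
    if pool_driver not in SHARED_DRIVERS:
        # No shared storage: skip explicit placement and warn instead.
        warning = (
            f"{service}: storage driver {pool_driver!r} is not shared; "
            "replicas are not distributed"
        )
        return {name: None for name in names}, [warning]
    # Round-robin over the sorted node names.
    targets = cycle(sorted(nodes))
    return {name: next(targets) for name in names}, []


placement, warnings = plan_replicas("web", 3, ["node1", "node2", "node3"], "ceph")
```

Round-robin is only the simplest option; a real scheduler would probably also weigh node load and any placement settings.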

2. Global Mode (One Container per Node)

services:
  agent:
    image: monitoring-agent:latest
    deploy:
      mode: global

  • Behavior:
    • Creates one container on each node in the Incus cluster.
    • Automatically starts a container on new nodes as they join the cluster.
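
One way to think about global mode is as a reconciliation step: compare the current cluster members with the nodes that already host an instance of the service, and create whatever is missing. The sketch below assumes a hypothetical `reconcile_global` helper and a `<service>-<node>` naming scheme; neither is defined by the proposal, and how the step gets triggered when a node joins is left open.

```python
def reconcile_global(service, nodes, existing):
    """Return the (instance_name, node) pairs that still need to be created.

    `existing` maps instance name -> node for instances that already exist.
    The "<service>-<node>" naming scheme is an assumption for illustration.
    """
    covered = set(existing.values())
    return [
        (f"{service}-{node}", node)
        for node in sorted(nodes)
        if node not in covered
    ]


# A third node has joined since the last run; only it needs a new instance.
missing = reconcile_global(
    "agent",
    ["node1", "node2", "node3"],
    {"agent-node1": "node1", "agent-node2": "node2"},
)
```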

3. Placement Constraints and Preferences

services:
  db:
    image: postgres:14
    deploy:
      replicas: 2
      placement:
        constraints:
          - node.labels.storage == ssd
        preferences:
          - spread: instanceId

  • Behavior:
    • Only deploys containers on nodes labeled storage=ssd.
    • Spreads containers across nodes to avoid co-location.
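
Constraint evaluation could look roughly like the sketch below. It only handles the `node.labels.<key> == <value>` and `!=` forms, and it takes node labels as a plain dictionary; where those labels would actually be stored in Incus is an open question, so `node_labels`, `parse_constraint`, and `eligible_nodes` are hypothetical names used for illustration only.

```python
import re

# Matches e.g. "node.labels.storage == ssd" or "node.labels.zone != b".
CONSTRAINT_RE = re.compile(r"^node\.labels\.([\w.-]+)\s*(==|!=)\s*(\S+)$")


def parse_constraint(text):
    """Split a constraint string into (label, operator, value)."""
    match = CONSTRAINT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported constraint: {text!r}")
    return match.groups()


def eligible_nodes(node_labels, constraints):
    """Return the sorted names of nodes that satisfy every constraint."""
    parsed = [parse_constraint(c) for c in constraints]
    result = []
    for node, labels in node_labels.items():
        # A "==" constraint needs a matching label; "!=" needs a mismatch.
        if all((labels.get(key) == value) == (op == "==")
               for key, op, value in parsed):
            result.append(node)
    return sorted(result)


nodes = {"node1": {"storage": "ssd"}, "node2": {"storage": "hdd"}, "node3": {}}
ssd_nodes = eligible_nodes(nodes, ["node.labels.storage == ssd"])
```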

4. Restart Policy

services:
  worker:
    image: myworker:latest
    deploy:
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3

  • Behavior:
    • Restarts the container on unexpected exit (boot.autorestart=true).
    • delay and max_attempts are not supported by Incus and will be ignored.
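
A possible shape for the mapping is sketched below. Only the `on-failure` row comes from this proposal; the `none` and `any` rows are assumptions added to make the example complete, and `boot_config` is a hypothetical helper name. The function also reports which keys were ignored so the caller can surface them to the user.

```python
# Assumed mapping; only the "on-failure" entry is taken from the proposal.
CONDITION_TO_BOOT = {
    "none": {"boot.autostart": "false", "boot.autorestart": "false"},
    "on-failure": {"boot.autorestart": "true"},
    "any": {"boot.autostart": "true", "boot.autorestart": "true"},
}

# Keys with no Incus equivalent, per the proposal.
IGNORED_KEYS = ("delay", "max_attempts")


def boot_config(restart_policy):
    """Return (incus_config, ignored_keys) for a restart_policy mapping."""
    # The Compose spec defaults condition to "any" when it is omitted.
    condition = restart_policy.get("condition", "any")
    if condition not in CONDITION_TO_BOOT:
        raise ValueError(f"unknown restart condition: {condition!r}")
    ignored = [key for key in IGNORED_KEYS if key in restart_policy]
    return dict(CONDITION_TO_BOOT[condition]), ignored


config, ignored = boot_config(
    {"condition": "on-failure", "delay": "5s", "max_attempts": 3}
)
```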

5. Rolling Updates

services:
  web:
    image: nginx:alpine
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        order: start-first

  • Behavior:
    • Updates containers one at a time.
    • Starts the new container before stopping the old one (start-first).
    • Waits 10 seconds between updates.
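
The update sequence can be expressed as a plan of ordered steps before anything touches the cluster. The sketch below is an assumption-laden illustration: `rolling_plan`, `parse_seconds`, and the step names `start-new`, `stop-old`, and `sleep` are invented for the example, and the duration parser covers only simple values such as `10s`, not the full Compose duration syntax.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_seconds(text):
    """Parse a simple duration such as '10s' or '2m' into seconds."""
    match = re.fullmatch(r"(\d+)(s|m|h)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def rolling_plan(instances, parallelism=1, delay="0s", order="stop-first"):
    """Return an ordered list of (action, target) steps for an update."""
    if order == "start-first":
        actions = ("start-new", "stop-old")
    else:
        actions = ("stop-old", "start-new")
    wait = parse_seconds(delay)
    # Treat parallelism 0 as "update everything in one batch".
    size = parallelism or len(instances) or 1
    batches = [instances[i:i + size] for i in range(0, len(instances), size)]
    plan = []
    for index, batch in enumerate(batches):
        for action in actions:
            plan.extend((action, name) for name in batch)
        # Pause between batches, but not after the last one.
        if wait and index < len(batches) - 1:
            plan.append(("sleep", wait))
    return plan


plan = rolling_plan(["web-1", "web-2", "web-3"], 1, "10s", "start-first")
```

Building the plan first makes it easy to print a dry run, and gives a natural place to decide what happens if a node fails partway through.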

6. Resource Limits

services:
  app:
    image: myapp:latest
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 1G

  • Behavior:
    • Sets CPU and memory limits for the container (already implemented).

Tasks

  • Implement node-aware distribution for deploy.replicas across cluster nodes.
  • Add shared storage validation when using replicas or global mode.
  • Implement deploy.mode: global.
  • Implement deploy.placement.constraints and preferences.
  • Map deploy.restart_policy.condition to Incus boot config.
  • Implement deploy.update_config for rolling updates.
  • Document HA examples (e.g., 3-node cluster with replicas: 3).

Open Questions

  1. Should incus-compose automatically create node labels (e.g., storage=ssd) if they don't exist, or require users to label nodes manually?
  2. How should incus-compose handle node failures during rolling updates (e.g., pause, continue, or rollback)?
  3. Should we support deploy.rollback_config in the initial implementation, or defer it to a later release?