Docker Image 24.04 rebase - non-root user issue.

Root Cause Analysis: Ubuntu 24.04 Base Image Causing Boot Loop in Existing Containers

Overview

When updating our Docker base image from Ubuntu 22.04 to Ubuntu 24.04, existing containers failed to start and got stuck in a boot loop. This issue was traced to a conflict caused by the presence of a default ubuntu user in the new 24.04 image. Because of how user IDs (UIDs) were assigned, our own non-root user could no longer access mounted files.

Affected Versions

  • 4.4.4: Unaffected
  • 4.4.5 – 4.4.6: Affected by the user-permissions conflict
  • 4.4.7: Will include the fix to remove the shipped ubuntu user and restore normal operation

Timeline

  1. Initial Update (Version 4.4.5)

    • A pull request changing the base image from ubuntu:22.04 to ubuntu:24.04 was merged.
    • Local tests on fresh containers passed without issues.
  2. Deployment (Version 4.4.6)

    • The updated Docker image was rolled out to production.
    • Existing containers attempted to restart and entered a boot loop.
  3. Investigation

    • Discovered that ubuntu:24.04 images ship with a default ubuntu user (UID 1000).
    • Our container Dockerfile created a new non-root user on the assumption that it would receive UID 1000. With that UID already taken by ubuntu, our user was assigned a different UID.
    • Mismatched UIDs prevented file access on mounted volumes.
  4. Resolution & Testing

    • Explored two options:
      1. Use the shipped ubuntu user and remove the custom user.
      2. Remove the shipped ubuntu user and continue creating our custom user.
    • Chose option 2 to maintain better control over user privileges (especially sudo usage).
    • A fix was implemented and tested; new builds confirmed that both fresh and existing containers now operate correctly.

Root Cause

  1. User ID Conflict

    • Ubuntu 24.04 ships with a default ubuntu user with UID 1000.
    • Our non-root user creation expected UID 1000. With that UID already taken, our user received a different UID and could no longer access files owned by UID 1000 on mounted volumes.
  2. Inherited Permissions

    • Mounted volumes rely on consistent UID mappings.
    • Any conflict in UID ownership leads to immediate permission errors.
  3. Inconsistent Testing

    • Fresh-container testing did not reveal the conflict.
    • The issue appeared only when existing containers with persistent volumes were restarted.
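The conflict described above can be checked directly by asking which account, if any, holds UID 1000 in an image's passwd file. The following is a minimal shell sketch; the function name `uid_owner` is illustrative and not part of our tooling.

```shell
#!/bin/sh
# uid_owner UID [PASSWD_FILE]
# Print the name of the account that holds UID in a passwd-format file.
# Prints nothing if no account has that UID.
# PASSWD_FILE defaults to /etc/passwd.
uid_owner() {
    awk -F: -v uid="$1" '$3 == uid { print $1 }' "${2:-/etc/passwd}"
}
```

One way to use it is to dump the passwd file of a base image with `docker run --rm ubuntu:24.04 cat /etc/passwd > passwd.2404` and then run `uid_owner 1000 passwd.2404`. Based on the investigation above, this should report the shipped ubuntu user on 24.04.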

Impact

  • Production Outage

    • Affected versions (4.4.5 – 4.4.6) caused existing Docker containers to go into a boot loop, resulting in downtime or service disruption.
  • Time & Effort

    • Engineering time was spent diagnosing and resolving the UID and permission mismatch.

Resolution

High-Level Fix

  1. Remove the shipped ubuntu user

    RUN touch /var/mail/ubuntu \
        && chown ubuntu /var/mail/ubuntu \
        && userdel -r ubuntu
    • Ensures UID 1000 is freed up for our custom user.
  2. Continue to create our own non-root user

    RUN useradd -g root -M crafty \
        && mkdir /crafty \
        && chown -R crafty:root /crafty
    • We maintain one non-root user (crafty) with the appropriate permissions and ownership.
  3. Retain sudo Usage

    • Removing the default ubuntu user prevents unauthorized or duplicate sudo usage.
    • Our custom user is not a sudoer. sudo is used only during container init, to step down safely from root to the custom user as required by our workflow.
  4. Target Release

    • This fix will be included in version 4.4.7, ensuring no further conflicts for users upgrading from unaffected versions (4.4.4) or already affected versions (4.4.5 – 4.4.6).
  5. Users deploying after 4.4.4

    • Users who deployed a fresh instance of an affected version (4.4.5 – 4.4.6) will need to repair permissions after upgrading to 4.4.7. To do this, place a file (an empty text file will do) in the import/ mount and restart the container. The file can be removed once crafty has fully booted.
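The permission repair for fresh 4.4.5 – 4.4.6 deployments can be sketched in shell as follows. The helper name `place_repair_trigger`, the file name `repair-permissions.txt`, and the host path and container name in the usage note are all assumptions for illustration; substitute the host directory that backs your import/ mount.

```shell
#!/bin/sh
# place_repair_trigger IMPORT_DIR
# Create an empty placeholder file in the host directory that is mounted
# as import/ in the container. The file name is arbitrary (hypothetical here);
# per the step above, any file in the mount will do.
place_repair_trigger() {
    import_dir="$1"
    if [ ! -d "$import_dir" ]; then
        echo "not a directory: $import_dir" >&2
        return 1
    fi
    touch "$import_dir/repair-permissions.txt"
}
```

For example, run `place_repair_trigger ./docker/import` (hypothetical host path), then `docker restart crafty_container` (hypothetical container name). Once crafty has fully booted, the placeholder can be deleted with `rm ./docker/import/repair-permissions.txt`.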

Validation

  • Fresh Build Testing

    • Verified that containers built from scratch function correctly and that our custom user is assigned UID 1000 as intended.
  • Upgrade Testing

    • Tested existing containers with persistent volumes to confirm they start without permission issues under the updated image.
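A check of this kind can be approximated with a small shell helper that lists any path in a volume not owned by the expected UID. This is a sketch only; the function name `list_foreign_owned` is an assumption, and it is not the test procedure that was actually run.

```shell
#!/bin/sh
# list_foreign_owned DIR UID
# Print every path under DIR (including DIR itself) whose owner is not UID.
# Empty output means the whole tree is owned by UID.
list_foreign_owned() {
    find "$1" ! -user "$2"
}
```

Inside a running container, `list_foreign_owned /crafty 1000` would be expected to print nothing when ownership matches the custom user's intended UID.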

Preventive Measures

  1. Regular Image Audits

    • Prior to upgrading base images, review release notes (especially new Ubuntu LTS versions) to identify default user or permission changes.
  2. Automated Testing for Existing Containers

    • Expand CI/CD pipelines to include restarting containers with existing volumes to quickly catch user-permission conflicts.
  3. Version Documentation

    • Clearly document which versions are affected and note the resolved version (4.4.7).
    • Keep a changelog that highlights significant changes to base images or user-management practices.
  4. Upstream Communication

    • Continue monitoring Ubuntu release bugs (e.g., Launchpad #2005129) and Docker community channels for changes in default user configurations.
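As one possible aid for the image audit in measure 1, the passwd files of the old and new base images can be compared to surface accounts that a new release adds. The sketch below assumes both files have already been exported from the images; the function name `new_accounts` is hypothetical.

```shell
#!/bin/sh
# new_accounts OLD_PASSWD NEW_PASSWD
# Print "name:uid" for every account that appears in NEW_PASSWD
# but not in OLD_PASSWD (both in passwd format).
new_accounts() {
    awk -F: '
        FILENAME == ARGV[1] { seen[$1] = 1; next }  # remember accounts in the old image
        !($1 in seen)       { print $1 ":" $3 }     # report accounts only in the new image
    ' "$1" "$2"
}
```

The input files can be produced with `docker run --rm ubuntu:22.04 cat /etc/passwd > old.passwd` and the equivalent command for `ubuntu:24.04`, followed by `new_accounts old.passwd new.passwd`. Any account reported in the regular-user UID range deserves a closer look before the rebase is merged.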

Conclusion

Switching to ubuntu:24.04 in MR !812 (merged), released in v4.4.5, introduced a default ubuntu user with UID 1000 that conflicted with our own non-root user creation process. This conflict primarily affected existing containers with persistent volumes in versions 4.4.5 – 4.4.6. By removing the default user and relying on our custom user, we resolved the UID conflict. This change is tested, verified, and slated for release in version 4.4.7, ensuring continuity and stability for our deployments.

Edited by Iain Powrie