    feat: implement prefix-level versioning exclusion (#14828) · ad8e6110
    Krishnan Parthasarathi authored
    Spark/Hadoop workloads that use the Hadoop MR
    Committer v1/v2 algorithms upload objects to a
    temporary prefix in a bucket. These objects are
    'renamed' to a different prefix on job commit.
    In a versioned bucket, the temporary objects leave
    versions behind, so object storage admins are forced
    to configure separate ILM policies to expire these
    objects and their versions to reclaim space.
    
    Our solution:
    
    This can be avoided by marking objects under these
    prefixes as excluded from versioning, as shown below.
    Objects under excluded prefixes are also excluded
    from replication and don't require ILM policies to
    prune unnecessary versions.
    
    - MinIO extension to the bucket versioning configuration
    ```xml
    <VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
      <Status>Enabled</Status>
      <ExcludeFolders>true</ExcludeFolders>
      <ExcludedPrefixes>
        <Prefix>app1-jobs/*/_temporary/</Prefix>
      </ExcludedPrefixes>
      <ExcludedPrefixes>
        <Prefix>app2-jobs/*/__magic/</Prefix>
      </ExcludedPrefixes>
      <!-- .. up to 10 prefixes in all -->
    </VersioningConfiguration>
    ```
    Note: `ExcludeFolders` excludes all folders in a bucket
    from versioning. This is required to prevent parent
    folders from accumulating delete markers, especially
    those shared across Spark workloads spanning
    projects/teams.
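
To make the two exclusion rules in the configuration above concrete, here is a minimal sketch of how a key could be checked against them. It is not MinIO's implementation: `is_version_excluded` is a hypothetical name, and the sketch assumes that `*` matches any run of characters (including `/`), that a pattern covers every key beneath the prefix it names, and that a "folder" is a key ending in `/`.

```python
from fnmatch import fnmatchcase

# Patterns taken from the example configuration above.
EXCLUDED_PREFIXES = ["app1-jobs/*/_temporary/", "app2-jobs/*/__magic/"]


def is_version_excluded(key, prefixes=EXCLUDED_PREFIXES, exclude_folders=True):
    """Return True if `key` would skip versioning under the assumed rules."""
    # Assumption: a "folder" is an object whose key ends with a slash.
    if exclude_folders and key.endswith("/"):
        return True
    # Assumption: a pattern covers every key beneath the matching prefix,
    # so a trailing "*" is appended before glob matching.
    return any(fnmatchcase(key, pattern + "*") for pattern in prefixes)
```

Under these assumptions, a task attempt file such as `app1-jobs/job-42/_temporary/0/part-0000` is excluded, while the committed `app1-jobs/job-42/output/part-0000` is versioned as usual.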
    
    - To enable version exclusion on a list of prefixes
    
    ```
    mc version enable --excluded-prefixes "app1-jobs/*/_temporary/,app2-jobs/*/__magic/" --exclude-prefix-marker myminio/test
    ```
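
Clients that set the configuration through the S3 API rather than `mc` need the XML body shown earlier. The following is a rough sketch of generating it, assuming one `ExcludedPrefixes` element per prefix and the 10-prefix limit noted in the example. `build_versioning_config` is a hypothetical helper, not part of any MinIO SDK.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"
MAX_EXCLUDED_PREFIXES = 10  # limit stated in the example configuration


def build_versioning_config(prefixes, exclude_folders=True):
    """Return the versioning configuration XML as a string."""
    if len(prefixes) > MAX_EXCLUDED_PREFIXES:
        raise ValueError("at most 10 excluded prefixes are allowed")
    root = ET.Element("VersioningConfiguration", xmlns=S3_NS)
    ET.SubElement(root, "Status").text = "Enabled"
    ET.SubElement(root, "ExcludeFolders").text = "true" if exclude_folders else "false"
    # One <ExcludedPrefixes> wrapper per prefix, mirroring the example XML.
    for prefix in prefixes:
        wrapper = ET.SubElement(root, "ExcludedPrefixes")
        ET.SubElement(wrapper, "Prefix").text = prefix
    return ET.tostring(root, encoding="unicode")
```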