Revisit Debian packages unicity
Context
Given the current schema, in particular:
classDiagram
Project "1" <-- "0..n" Package
Package "1" <-- "0..n" PackageFile
Package "1" <-- "0..n" DebianPublication
DebianProjectDistribution "1" <-- "0..n" DebianPublication
Project "1" <-- "0..n" DebianProjectDistribution
PackageFile "1" <-- "0..1" DebianFileMetadatum
class DebianProjectDistribution {
+bigint :id
+bigint :project_id
+text :codename
}
class DebianPublication {
+bigint :package_id
+bigint :distribution_id
}
class DebianFileMetadatum {
+bigint :package_file_id
+smallint : file_type
+text : component
+text : architecture
+jsonb : fields
}
We currently enforce unicity on packages for a given distribution (see #unique_debian_package_name in app/models/packages/package.rb#L403-414).
We don't enforce any unicity on package files or Debian file metadata.
However, the specification has a section about duplicate packages:
A repository must not include different packages (different content) with the same package name, version, and architecture. When a repository is meant to be used as a supplement to another repository this should hold for the joint main+supplement repository as well.
A Sources index may contain multiple versions of one source package. A Packages index may contain multiple versions of one binary package, for the same architecture and/or multiple architectures (that is, all and the native architecture). In the official Debian archive, this is used to keep around old versions of an Architecture: all package that is still needed by the other packages.
Here, the word "package" refers to what we call a package file in the GitLab codebase.
From this, we should probably enforce unicity based on the Package, Version and Architecture fields (from fields in the packages_debian_file_metadata table) for a given distribution (only for types .deb and .udeb). However, this won't work for future group packages unless we enforce this at a higher level (which could break when moving the project to a different group).
We should probably also handle the case where the duplicated package file is bit-for-bit identical, and ignore the duplicate in that case.
Alternatively, we can leave this duty to the user.
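The check discussed above could look roughly like the following plain-Ruby sketch. It is not the GitLab implementation: DebFile, CHECKED_TYPES, unicity_key and check_duplicate are hypothetical names, the files of one distribution are modeled as an in-memory array, and comparing only files of the same type is an assumption.

```ruby
# Hypothetical in-memory stand-in for a package file plus its Debian file metadata.
DebFile = Struct.new(:file_type, :fields, :sha256, keyword_init: true)

# Only binary packages (.deb and .udeb) take part in the check.
CHECKED_TYPES = %i[deb udeb].freeze

# Unicity key built from the control fields stored in the metadata.
def unicity_key(file)
  file.fields.values_at('Package', 'Version', 'Architecture')
end

# Returns :ok when there is no duplicate, :identical when a file with the same
# key and the same checksum already exists, and :conflict when the key matches
# but the content differs.
def check_duplicate(distribution_files, incoming)
  return :ok unless CHECKED_TYPES.include?(incoming.file_type)

  match = distribution_files.find do |file|
    # Assumption: a .deb is only compared with other .deb files, and likewise for .udeb.
    file.file_type == incoming.file_type && unicity_key(file) == unicity_key(incoming)
  end
  return :ok if match.nil?

  match.sha256 == incoming.sha256 ? :identical : :conflict
end
```

With this shape, an :identical result can be ignored silently, while :conflict is the case the specification forbids.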
Proposal
- Change package unicity !111027 (merged):
  - from (name, version, distribution) within a project (when debian? && !version.nil?, with condition not_pending_destruction)
  - to (name, version) within a project (when debian?, with condition not_pending_destruction)
- Document package unicity !117492
- Change package file unicity !117492
  - from no unicity
  - to (file_name) within a project
- Ensure file_name matches expectations (i.e. equals the file name generated from the Debian file metadata)
- Handle package file name conflicts gracefully (to allow building the same source for different architectures):
  - if same package and sha256sum -> mark incoming package file as pending destruction
  - otherwise -> fail
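The last two items can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions: PackageFile, FileNameMismatch, FileNameConflict, expected_file_name and register_file are hypothetical names, the files of a project are modeled as an array, and the expected name assumes the usual Debian convention of Package_Version_Architecture with the epoch left out.

```ruby
# Hypothetical stand-in for a package file row within one project.
PackageFile = Struct.new(:package_id, :file_name, :sha256, :status, keyword_init: true)

class FileNameMismatch < StandardError; end
class FileNameConflict < StandardError; end

# File name implied by the control fields.
# Assumption: <Package>_<Version>_<Architecture>.<extension>, with any epoch ("1:") dropped.
def expected_file_name(fields, extension)
  version = fields.fetch('Version').sub(/\A\d+:/, '')
  "#{fields.fetch('Package')}_#{version}_#{fields.fetch('Architecture')}.#{extension}"
end

# Registers an incoming file in the project, applying the proposed rules.
def register_file(project_files, incoming, fields, extension)
  expected = expected_file_name(fields, extension)
  unless incoming.file_name == expected
    raise FileNameMismatch, "expected #{expected}, got #{incoming.file_name}"
  end

  # Files already pending destruction do not count towards unicity.
  existing = project_files.find do |file|
    file.file_name == incoming.file_name && file.status != :pending_destruction
  end

  if existing.nil?
    incoming.status = :default
  elsif existing.package_id == incoming.package_id && existing.sha256 == incoming.sha256
    # Same package and same content: keep the existing file, discard the new one.
    incoming.status = :pending_destruction
  else
    raise FileNameConflict, "#{incoming.file_name} already exists with different content"
  end

  project_files << incoming
  incoming
end
```

Marking the identical duplicate as pending destruction rather than rejecting it lets a second build of the same source (for another architecture) upload its Architecture: all files again without failing.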