Mirroring Debian/Ubuntu/... repositories
- https://wiki.ubuntu.com/Mirrors
  - releases mirror (http://de.archive.ubuntu.com/ubuntu-releases/) = 💿 images
  - vs. archive mirror (http://de.archive.ubuntu.com/ubuntu/) = 📦 packages
- https://wiki.debian.org/DebianRepository/Format
  - `deb http://security.ubuntu.com/ubuntu jammy-security main restricted` in `/etc/apt/sources.list` means (for modern Debian/Ubuntu versions!):
    - fetch http://security.ubuntu.com/ubuntu/dists/jammy-security/InRelease
    - check 🔏 signature of `InRelease`
    - from there on: no more signature checks, but use and check the file hashes from `InRelease`
    - use `InRelease` to see if (on amd64) `main/binary-amd64/Packages.xz` (and `restricted/binary-amd64/Packages.xz`) are present
    - if so, do not fetch them directly but from `main/binary-amd64/by-hash/SHA256/$hashFromInRelease`; the path is relative to the directory containing `InRelease`
    - `Packages.xz` has lines like `Filename: pool/main/a/accountsservice/accountsservice_22.07.5-2ubuntu1.3_amd64.deb` and `SHA256: f8ed0...`, with paths relative to the **URL in `/etc/apt/sources.list`**
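
The index lookup described above can be sketched in Python. This is a minimal sketch, not how APT itself is implemented: it assumes the `InRelease` text has already been downloaded and its signature verified, and the names `parse_sha256_section`, `by_hash_url` and `SAMPLE_INRELEASE` (with its placeholder digests `aaaa`/`bbbb`) are made up for illustration.

```python
import posixpath

# Shortened, made-up stand-in for a verified InRelease body (digests are placeholders).
SAMPLE_INRELEASE = """\
Origin: Ubuntu
Suite: jammy-security
Acquire-By-Hash: yes
SHA256:
 aaaa 100 main/binary-amd64/Packages.xz
 bbbb 50 restricted/binary-amd64/Packages.xz
"""


def parse_sha256_section(release_text: str) -> dict[str, tuple[str, int]]:
    """Map each path listed under 'SHA256:' to (hex digest, size in bytes)."""
    entries = {}
    in_section = False
    for line in release_text.splitlines():
        if line.startswith("SHA256:"):
            in_section = True
        elif in_section and line.startswith(" "):
            digest, size, path = line.split()
            entries[path] = (digest, int(size))
        elif in_section:
            break  # next field or the signature block ends the section
    return entries


def by_hash_url(base_url: str, suite: str, index_path: str, entries) -> str:
    """Build the by-hash URL for an index file listed in InRelease."""
    digest, _size = entries[index_path]
    parts = [
        base_url.rstrip("/"),
        "dists",
        suite,
        posixpath.dirname(index_path),  # by-hash lives next to the index file
        "by-hash",
        "SHA256",
        digest,
    ]
    return "/".join(p for p in parts if p)
```

With the sample above, `by_hash_url("http://security.ubuntu.com/ubuntu", "jammy-security", "main/binary-amd64/Packages.xz", parse_sha256_section(SAMPLE_INRELEASE))` yields a URL below `dists/jammy-security/main/binary-amd64/by-hash/SHA256/`. A mirroring tool would additionally compare the downloaded file's size and SHA256 against the parsed entry.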
- Size (for one backup/point in time):
  - Ubuntu, https://wiki.ubuntu.com/Mirrors: 1.5 TB
  - Debian, https://www.debian.org/mirror/size.en.html: 4.5 TB
- ➡ need some form of deduplication (with deduplication, maybe 5-10 TB per year for Ubuntu; this depends heavily on how the mirror is going to be used!)
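
To make the deduplication argument concrete, here is a rough sketch of how stored volume compares with and without deduplication. It assumes each snapshot is represented as a mapping from content hash to file size (such a mapping could, for example, be built from the `SHA256` and `Size` fields in `Packages`); the function name `storage_needed` is illustrative only.

```python
def storage_needed(snapshots: list[dict[str, int]]) -> tuple[int, int]:
    """Return (naive_bytes, deduplicated_bytes) for a list of snapshots.

    Each snapshot maps a content hash to the size of that file in bytes.
    """
    # Naive: every snapshot is stored as a full, independent copy.
    naive = sum(sum(snap.values()) for snap in snapshots)

    # Deduplicated: a file with a given hash is stored once across all snapshots.
    unique: dict[str, int] = {}
    for snap in snapshots:
        unique.update(snap)
    return naive, sum(unique.values())
```

For two snapshots that share most of their packages, the deduplicated total grows only by the size of the files that actually changed between them.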
- Existing projects:
  - https://snapshot.debian.org/ (90 TB for 15 years ≈ 6 TB per year, but probably more in recent years)
    - since 2005: https://snapshot.debian.org/archive/debian/
    - https://salsa.debian.org/snapshot-team/snapshot
      - deduplicates using https://en.wikipedia.org/wiki/Content-addressable_storage
      - filename is the SHA1 of the file content
      - database with "mirrorruns" containing a mapping of file paths to SHA1
      - needs to run a custom web server to serve this!
    - allows replacing URLs in `/etc/apt/sources.list` (see above!) with something like https://snapshot.debian.org/archive/debian/20091004T111800Z/
      - using the timestamp of a non-existing snapshot redirects to the last snapshot before the given timestamp
    - does not use Memento (https://www.rfc-editor.org/rfc/rfc7089.html), which would probably be a good addition (TODO: fully read and understand RFC 7089)
  - https://askubuntu.com/questions/499871/is-there-ubuntus-analogue-of-snapshot-debian-org
  - Internet Archive also has some random snapshots: https://web.archive.org/web/20210307030553/http://archive.ubuntu.com/ubuntu/dists/hirsute/
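
The storage scheme described for snapshot.debian.org (files stored under their SHA1, plus a per-run mapping of paths to SHA1, plus timestamp fallback) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: an in-memory dict stands in for the database, all blobs sit in one flat directory, and the class `SnapshotStore` and helper `demo` are hypothetical names, not code from the snapshot project.

```python
import bisect
import hashlib
import tempfile
from pathlib import Path


class SnapshotStore:
    """Toy content-addressable store with per-run path -> SHA1 mappings."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.runs: dict[str, dict[str, str]] = {}  # timestamp -> {path: sha1}

    def add_file(self, data: bytes) -> str:
        digest = hashlib.sha1(data).hexdigest()
        blob = self.root / digest  # the file name is the SHA1 of the content
        if not blob.exists():      # identical content is stored only once
            blob.write_bytes(data)
        return digest

    def record_run(self, timestamp: str, files: dict[str, bytes]) -> None:
        self.runs[timestamp] = {p: self.add_file(d) for p, d in files.items()}

    def resolve(self, timestamp: str, path: str) -> bytes:
        # Timestamps like 20091004T111800Z sort chronologically as strings.
        stamps = sorted(self.runs)
        idx = bisect.bisect_right(stamps, timestamp) - 1  # last run at or before
        if idx < 0:
            raise KeyError(f"no snapshot at or before {timestamp}")
        return (self.root / self.runs[stamps[idx]][path]).read_bytes()


def demo() -> tuple[int, bytes, bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        store = SnapshotStore(Path(tmp))
        store.record_run("20091004T111800Z", {"dists/x/Release": b"v1", "pool/a.deb": b"pkg"})
        store.record_run("20091005T111800Z", {"dists/x/Release": b"v2", "pool/a.deb": b"pkg"})
        blobs = len(list(Path(tmp).iterdir()))
        return (
            blobs,
            store.resolve("20091004T235959Z", "dists/x/Release"),
            store.resolve("20091005T111800Z", "dists/x/Release"),
        )
```

In `demo`, the unchanged `pool/a.deb` is stored once even though it appears in both runs, and a request for a timestamp between the two runs falls back to the earlier one. Serving such a layout over HTTP is what requires the custom web server mentioned above.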
- Mirror list at https://launchpad.net/ubuntu/+archivemirrors
  - Some support rsync
  - Maybe useful scripts at https://wiki.ubuntu.com/Mirrors/Scripts
Edited by Rafael Gieschke