network connectivity issues in capm3 upgrades from 1.1.1 to main
Summary
The last two nightly pipelines for the upgrade from 1.1.1 to main failed for capm3-ha-rke2-virt-ubuntu:
https://gitlab.com/sylva-projects/sylva-core/-/jobs/7723851502
https://gitlab.com/sylva-projects/sylva-core/-/jobs/7722806431
In both cases, the upgraded nodes fail to reach the cluster, and the cloud-init logs show a connectivity issue.
The network configuration apparently succeeds:
2024-09-02 21:44:05 :: [ 3.754508] cloud-init[731]: Cloud-init v. 24.1.3-0ubuntu1~22.04.5 running 'init' at Mon, 02 Sep 2024 21:44:05 +0000. Up 3.73 seconds.
2024-09-02 21:44:05 :: [ 3.761648] cloud-init[731]: ci-info: +++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++++
2024-09-02 21:44:05 :: [ 3.763160] cloud-init[731]: ci-info: +-----------+------+----------------------------+---------------+--------+-------------------+
2024-09-02 21:44:05 :: [ 3.764647] cloud-init[731]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
2024-09-02 21:44:05 :: [ 3.766138] cloud-init[731]: ci-info: +-----------+------+----------------------------+---------------+--------+-------------------+
2024-09-02 21:44:05 :: [ 3.767645] cloud-init[731]: ci-info: | bond0 | True | 192.168.10.23 | 255.255.255.0 | global | 52:54:00:44:44:02 |
2024-09-02 21:44:05 :: [ 3.769127] cloud-init[731]: ci-info: | bond0.100 | True | 192.168.100.23 | 255.255.255.0 | global | 52:54:00:44:44:02 |
2024-09-02 21:44:05 :: [ 3.770629] cloud-init[731]: ci-info: | bond0.100 | True | fe80::5054:ff:fe44:4402/64 | . | link | 52:54:00:44:44:02 |
2024-09-02 21:44:05 :: [ 3.772112] cloud-init[731]: ci-info: | ens4 | True | . | . | . | 52:54:00:44:44:02 |
2024-09-02 21:44:05 :: [ 3.773607] cloud-init[731]: ci-info: | ens5 | True | . | . | . | 52:54:00:44:44:02 |
2024-09-02 21:44:05 :: [ 3.775111] cloud-init[731]: ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
2024-09-02 21:44:05 :: [ 3.776702] cloud-init[731]: ci-info: +-----------+------+----------------------------+---------------+--------+-------------------+
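The table can be checked mechanically. The sketch below (Python, standard library only) parses the ci-info rows and reports which devices got a global IPv4 address and which devices share bond0's MAC address. In this log, ens4 and ens5 have no address but carry bond0's MAC, which is consistent with them being bond members, and bond0.100 looks like a VLAN interface on top of bond0. The helper names (`parse_net_devices`, `summarize_devices`) are hypothetical and not part of cloud-init or Sylva; a shared MAC is a hint, not proof, of bond membership.

```python
import re
from typing import Dict, List, Tuple

# Rows copied from the "Net device info" table in the log above (prefixes stripped).
TABLE = """\
| bond0 | True | 192.168.10.23 | 255.255.255.0 | global | 52:54:00:44:44:02 |
| bond0.100 | True | 192.168.100.23 | 255.255.255.0 | global | 52:54:00:44:44:02 |
| bond0.100 | True | fe80::5054:ff:fe44:4402/64 | . | link | 52:54:00:44:44:02 |
| ens4 | True | . | . | . | 52:54:00:44:44:02 |
| ens5 | True | . | . | . | 52:54:00:44:44:02 |
| lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
"""

IPV4 = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")
FIELDS = ("device", "up", "address", "mask", "scope", "hw")


def parse_net_devices(text: str) -> List[Dict[str, str]]:
    """Parse ci-info rows like '| dev | up | addr | mask | scope | hw |'."""
    rows = []
    for line in text.splitlines():
        # Start at the first '|' so full cloud-init log lines work too;
        # '+----+' border lines contain no '|' and are skipped.
        start = line.find("|")
        if start < 0:
            continue
        cells = [c.strip() for c in line[start:].strip().strip("|").split("|")]
        if len(cells) != len(FIELDS) or cells[0] == "Device":
            continue  # header row or malformed line
        rows.append(dict(zip(FIELDS, cells)))
    return rows


def summarize_devices(rows: List[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (devices with a global IPv4 address, devices sharing bond0's MAC)."""
    with_ipv4 = sorted({r["device"] for r in rows
                        if r["scope"] == "global" and IPV4.match(r["address"])})
    bond_mac = next(r["hw"] for r in rows if r["device"] == "bond0")
    same_mac = sorted({r["device"] for r in rows if r["hw"] == bond_mac})
    return with_ipv4, same_mac


if __name__ == "__main__":
    print(summarize_devices(parse_net_devices(TABLE)))
    # → (['bond0', 'bond0.100'], ['bond0', 'bond0.100', 'ens4', 'ens5'])
```

This only confirms that addresses were assigned locally; it says nothing about whether traffic actually leaves the node, which is what the next part of the log calls into question.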
However, DNS resolution fails later on, when cloud-init tries to download miniserve:
2024-09-02 21:44:20 :: [ 18.182381] cloud-init[1742]: vdc 252:32 0 200G 0 disk
2024-09-02 21:44:20 :: [ 18.188841] cloud-init[1742]: >> Installing miniserve for log collection in CI
2024-09-02 21:44:20 :: [ 18.194212] cloud-init[1742]: % Total % Received % Xferd Average Speed Time Time Time Current
2024-09-02 21:44:20 :: [ 18.194316] cloud-init[1742]: Dload Upload Total Spent Left Speed
2024-09-02 21:44:40 :: [ 38.072662] cloud-init[1742]:
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:04 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:05 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:06 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:07 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:08 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:09 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:11 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:12 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:13 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:14 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:15 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:16 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:17 --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:18 --:--:-- 0
curl: (6) Could not resolve host: github.com
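curl's exit code narrows the failure down: 6 means the host name could not be resolved, while 7 means a connection could not be established. Exit code 6 only says the lookup failed, and a lookup also fails when the DNS server is unreachable, so it is consistent with the connectivity issue described here. A minimal log-triage sketch, assuming the `curl: (N) message` line format seen above; `classify_curl_error` is a hypothetical helper, and the table lists only a few documented curl exit codes:

```python
import re
from typing import Optional, Tuple

# A few documented curl exit codes relevant to node bootstrap failures.
CURL_EXIT_CODES = {
    5: "proxy could not be resolved (DNS)",
    6: "host could not be resolved (DNS)",
    7: "could not connect to host (routing/firewall)",
    28: "operation timed out",
}

# Matches 'curl: (6) Could not resolve host: github.com' anywhere in a line,
# including when it is glued to the end of a progress-meter line.
CURL_ERROR = re.compile(r"curl: \((\d+)\) (.*)$")


def classify_curl_error(line: str) -> Optional[Tuple[int, str, str]]:
    """Return (exit code, category, message) for a 'curl: (N) ...' line, else None."""
    match = CURL_ERROR.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, CURL_EXIT_CODES.get(code, "other curl error"), match.group(2)


if __name__ == "__main__":
    print(classify_curl_error("0curl: (6) Could not resolve host: github.com"))
    # → (6, 'host could not be resolved (DNS)', 'Could not resolve host: github.com')
```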
The issue is that in 1.1.1 we were not using a bond in the libvirt-metal pods; this feature was added in !2153 (merged).
Because the 1.1.1 installation uses the older libvirt-metal pods, they cannot handle the newer bond configuration during the upgrade.
We should change the CI values to install 1.1.1 with a bond configuration.
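For reference, the bond layout visible in the cloud-init table (ens4 and ens5 in bond0, bond0.100 on top) can be written down as netplan v2 data, which is what cloud-init renders on Ubuntu. This is an illustration of the target layout only, not the Sylva CI values format. Assumptions: the bond mode (`active-backup`) is not shown in the log, the VLAN id 100 is inferred from the `bond0.100` name, and `bond_netplan` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def bond_netplan(host: int, bond_mode: str = "active-backup") -> Dict[str, Any]:
    """Netplan v2 layout mirroring the log: ens4 + ens5 -> bond0, VLAN 100 on bond0.

    bond_mode is an assumption; the log does not show which mode is used.
    """
    return {
        "network": {
            "version": 2,
            # Physical NICs carry no addresses of their own; they are bond members.
            "ethernets": {"ens4": {"dhcp4": False}, "ens5": {"dhcp4": False}},
            "bonds": {
                "bond0": {
                    "interfaces": ["ens4", "ens5"],
                    "addresses": [f"192.168.10.{host}/24"],  # 255.255.255.0 == /24
                    "parameters": {"mode": bond_mode},
                }
            },
            # VLAN 100 on top of the bond, matching the bond0.100 row in the log.
            "vlans": {
                "bond0.100": {
                    "id": 100,
                    "link": "bond0",
                    "addresses": [f"192.168.100.{host}/24"],
                }
            },
        }
    }


if __name__ == "__main__":
    # Node .23 from the log above; dumped as JSON for inspection.
    print(json.dumps(bond_netplan(23), indent=2))
```

Nameservers are deliberately left out, since the log does not show which resolvers the nodes are configured with.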