
Network connectivity issues in capm3 upgrades from 1.1.1 to main

Summary

The last two nightly pipelines upgrading from 1.1.1 to main failed for capm3-ha-rke2-virt-ubuntu:

https://gitlab.com/sylva-projects/sylva-core/-/jobs/7723851502
https://gitlab.com/sylva-projects/sylva-core/-/jobs/7722806431

In both cases, the updated nodes fail to reach the cluster, and the cloud-init logs show a connectivity issue.

Network configuration apparently succeeds:

2024-09-02 21:44:05 ::  [    3.754508] cloud-init[731]: Cloud-init v. 24.1.3-0ubuntu1~22.04.5 running 'init' at Mon, 02 Sep 2024 21:44:05 +0000. Up 3.73 seconds.
2024-09-02 21:44:05 ::  [    3.761648] cloud-init[731]: ci-info: +++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++++
2024-09-02 21:44:05 ::  [    3.763160] cloud-init[731]: ci-info: +-----------+------+----------------------------+---------------+--------+-------------------+
2024-09-02 21:44:05 ::  [    3.764647] cloud-init[731]: ci-info: |   Device  |  Up  |          Address           |      Mask     | Scope  |     Hw-Address    |
2024-09-02 21:44:05 ::  [    3.766138] cloud-init[731]: ci-info: +-----------+------+----------------------------+---------------+--------+-------------------+
2024-09-02 21:44:05 ::  [    3.767645] cloud-init[731]: ci-info: |   bond0   | True |       192.168.10.23        | 255.255.255.0 | global | 52:54:00:44:44:02 |
2024-09-02 21:44:05 ::  [    3.769127] cloud-init[731]: ci-info: | bond0.100 | True |       192.168.100.23       | 255.255.255.0 | global | 52:54:00:44:44:02 |
2024-09-02 21:44:05 ::  [    3.770629] cloud-init[731]: ci-info: | bond0.100 | True | fe80::5054:ff:fe44:4402/64 |       .       |  link  | 52:54:00:44:44:02 |
2024-09-02 21:44:05 ::  [    3.772112] cloud-init[731]: ci-info: |    ens4   | True |             .              |       .       |   .    | 52:54:00:44:44:02 |
2024-09-02 21:44:05 ::  [    3.773607] cloud-init[731]: ci-info: |    ens5   | True |             .              |       .       |   .    | 52:54:00:44:44:02 |
2024-09-02 21:44:05 ::  [    3.775111] cloud-init[731]: ci-info: |     lo    | True |         127.0.0.1          |   255.0.0.0   |  host  |         .         |
2024-09-02 21:44:05 ::  [    3.776702] cloud-init[731]: ci-info: +-----------+------+----------------------------+---------------+--------+-------------------+
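The "Net device info" rows can also be checked mechanically when comparing several failed jobs. Below is a minimal Python sketch, assuming the six-column ci-info table format shown above; `parse_net_device_info` is a hypothetical helper, not part of cloud-init or sylva-core. Run on the rows above, it reports IPv4 addresses on bond0 and bond0.100, while ens4 and ens5 carry no address and share bond0's MAC, which is consistent with them being bond members.

```python
import re
from typing import Dict, List

# Matches a data row of cloud-init's "Net device info" table, e.g.
# "ci-info: |   bond0   | True |  192.168.10.23  | 255.255.255.0 | global | 52:54:00:44:44:02 |"
# Border lines start with "ci-info: +" and therefore never match.
ROW = re.compile(r"ci-info: \|(.+)\|\s*$")


def parse_net_device_info(lines: List[str]) -> Dict[str, List[str]]:
    """Map each device to the addresses cloud-init reported for it.

    A '.' in the Address column means "no address" and is not recorded,
    so enslaved bond members end up with an empty list.
    """
    devices: Dict[str, List[str]] = {}
    for line in lines:
        m = ROW.search(line)
        if not m:
            continue
        cells = [c.strip() for c in m.group(1).split("|")]
        # Expect Device | Up | Address | Mask | Scope | Hw-Address; skip the header row.
        if len(cells) != 6 or cells[0] == "Device":
            continue
        device, address = cells[0], cells[2]
        addrs = devices.setdefault(device, [])
        if address != ".":
            addrs.append(address)
    return devices
```

Devices that appear with an empty list are the ones to compare against the bond definition expected by the libvirt-metal side.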

But DNS resolution fails later on:

2024-09-02 21:44:20 ::  [   18.182381] cloud-init[1742]: vdc  252:32   0  200G  0 disk
2024-09-02 21:44:20 ::  [   18.188841] cloud-init[1742]: >> Installing miniserve for log collection in CI
2024-09-02 21:44:20 ::  [   18.194212] cloud-init[1742]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2024-09-02 21:44:20 ::  [   18.194316] cloud-init[1742]:                                  Dload  Upload   Total   Spent    Left  Speed
2024-09-02 21:44:40 ::  [   38.072662] cloud-init[1742]:
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:06 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:07 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:08 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:09 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:11 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:12 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:13 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:14 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:15 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:16 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:17 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:18 --:--:--     0
curl: (6) Could not resolve host: github.com
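curl exit code 6 means the host name could not be resolved, so this failure happens at the DNS step, before any TCP connection is attempted. A minimal Python sketch to separate the two cases on an affected node follows. It assumes a Python 3 interpreter is available on the node (cloud-init itself runs on Python). `explain_curl_exit` and `can_resolve` are hypothetical helper names, and the code table lists only the three curl exit codes most relevant here.

```python
import socket

# A few exit codes from the curl man page ("EXIT CODES" section).
CURL_EXIT_CODES = {
    6: "could not resolve host",
    7: "failed to connect to host",
    28: "operation timeout",
}


def explain_curl_exit(code: int) -> str:
    """Return a short description of a curl exit code."""
    return CURL_EXIT_CODES.get(code, f"other curl error ({code})")


def can_resolve(host: str) -> bool:
    """Return True if the system resolver returns at least one address for `host`."""
    try:
        return len(socket.getaddrinfo(host, None)) > 0
    except socket.gaierror:
        # Raised for NXDOMAIN and for unreachable or timed-out name servers alike.
        return False
```

On a node in the state above, `can_resolve("github.com")` would be expected to return False. If it returns True while curl still fails, the problem has moved past DNS to routing or connectivity.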

The root cause is that 1.1.1 did not use bonding in the libvirt-metal pods; this feature was added in !2153 (merged).

Because the 1.1.1 installation uses an older version of the libvirt-metal pods, they cannot handle the newer bond configuration applied during the upgrade.

We should change the CI values to install 1.1.1 with the bond configuration.