Overhaul Terraform nodes handling

brunos requested to merge overhaul-terraform-nodes-handling into devel

Version Control Information

Source branch: overhaul-terraform-nodes-handling
Target branch: devel

Commits:

* tf: Drop default set of nodes
  
  No longer builds a default set of nodes with Terraform when no nodes are
   configured.
  Implicit behavior should be avoided where it is non-obvious or unintuitive.
  
  Additionally makes the yk8s Terraform module require at least one master node
   to be given.
  
  This change is breaking for all users that build clusters while only specifying
   master or worker defaults.
  
  Breaking: true

* tf: Provide control over the whole node name
  
  Allows the user to configure the whole name of nodes.
  Master and worker nodes are now stored together in the `terraform.nodes`
   parent table and distinguished by the mandatory `role` attribute [1].
  The full node name is given as the table name, while for gateway nodes a
   common name can be configured.
   If a cluster name is configured it is prefixed to all node names.
  
  In terms of behavior, this is a non-breaking change.
  
  [1] Caveat: Changing the role of a Terraform node will rebuild the node
      which might be unexpected.
  
  Breaking: true
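
The layout described in this commit can be pictured with a small Python sketch (illustrative only, not code from this MR). It takes a dict shaped like the `terraform.nodes` table, splits it by the mandatory `role` attribute and prefixes the cluster name if one is set. The node names, the "-" separator and the helper name `split_nodes_by_role` are assumptions made for the example; the actual module may join names differently.

```python
def split_nodes_by_role(nodes, cluster_name=""):
    """Split a `terraform.nodes`-like mapping into master and worker maps."""
    # Assumption: the cluster name is joined with "-"; the commit message only
    # says that it is prefixed to all node names.
    prefix = f"{cluster_name}-" if cluster_name else ""
    masters, workers = {}, {}
    for name, attrs in nodes.items():
        role = attrs.get("role")
        if role == "master":
            masters[prefix + name] = attrs
        elif role == "worker":
            workers[prefix + name] = attrs
        else:
            # `role` is mandatory, so anything else is rejected.
            raise ValueError(f"node {name!r}: role must be 'master' or 'worker'")
    return masters, workers


# Hypothetical node names, keyed by the full node name as in the new layout.
nodes = {
    "cp-a": {"role": "master"},
    "cp-b": {"role": "master"},
    "compute-1": {"role": "worker"},
}
masters, workers = split_nodes_by_role(nodes, cluster_name="demo")
print(sorted(masters))
print(sorted(workers))
```

Because the role decides which map a node lands in, changing it moves the node to a different resource set, which matches the caveat in [1] that the node gets rebuilt.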

* tf: Streamline generation of instance resources
  
  Directly uses the node object maps `master_nodes`, `worker_nodes` and
   `gateway_nodes` respectively for the generation of
   openstack_compute_instance_v2 resources instead of relying on the
   openstack_networking_port_v2 resource type.
  
  This improves clarity and aligns with the surrounding code.

* tf: Group default variables
  
  Consolidates all variables that tune node defaults into tables in
   config.toml [1] and objects in Terraform. This creates a cleaner and more
   intuitive configuration for pre-setting values across worker/master/gateway
   nodes.
  
  + Makes the anti affinity group configurable per worker node
    This fits the pattern of providing each node attribute via the node defaults
     table as well as via the individual node tables.
    Also no server group is created without need anymore.
  
  In terms of behavior, this is a non-breaking change.
  
  [1] See the release notes for an example.
  
  Breaking: true

* tf: Decouple gateway nodes from availability zones
  
  - Changes the gateway names to be index rather than availability zone based,
    see reason a) below.
    This also changes the fip_description and volume_name of the gateways because
    they depended on the gateway's availability zone.
  
    Breaking: Terraform v1.3 provides no feasible way of renaming resources using
              for_each. One must manually rename these resources.
  
  - Adds `gateway_count` to configure the number of gateway nodes to create,
    see reason b) below. To retain the previous behavior this variable
    defaults to the number of availability zones specified when
    `spread_gateways_across_azs=true`.
  
  When the user disables availability zone management [1], nodes are not
   explicitly assigned to an availability zone leaving the choice to the
   cloud controller.
  In this case...
  a) it is misleading to still name the gateway nodes after availability zones
     when they could end up in a different one or none at all.
  b) it is of no use to create as many gateway nodes as availability zones when
     they might not be spread across them anyway.
  
  [1] using `spread_gateways_across_azs=false` or formerly
      `enable_az_management=false`
  
  Breaking: true
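
A rough Python sketch of the `gateway_count` default described in this commit (not the Terraform code itself). The function names, the "-<index>" name suffix and the error raised when spreading is disabled are assumptions; the commit message only states the default for `spread_gateways_across_azs=true`.

```python
def resolve_gateway_count(azs, spread_gateways_across_azs, gateway_count=None):
    """Return how many gateway nodes to create."""
    if gateway_count is not None:
        return gateway_count  # an explicit setting always wins
    if spread_gateways_across_azs:
        return len(set(azs))  # previous behavior: one gateway per AZ
    # Assumption: the commit message states no default for this case,
    # so the sketch simply asks for an explicit value.
    raise ValueError("gateway_count must be set when gateways are not spread")


def gateway_names(common_name, count):
    """Index-based names; the "-<index>" suffix format is an assumption."""
    return [f"{common_name}-{i}" for i in range(count)]


azs = ["az1", "az2", "az3"]
count = resolve_gateway_count(azs, spread_gateways_across_azs=True)
print(gateway_names("gw", count))
```

With index-based names, a gateway's name no longer claims an availability zone it may not actually run in.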

* tf: Fix availability zone configuration
  
  Removes the ability to let Terraform automatically choose an availability zone
   per master/worker node if none was explicitly set, but retains it for the
   gateway nodes.
   To that end, removes `enable_az_management` and adds
   `spread_gateways_across_azs`.
   A node's availability zone must now be explicitly configured, otherwise the
   choice is left to the cloud controller.
  
  This solves three issues (see also [1]):
  1. When a node had no availability zone configured but
     `enable_az_management=true` was set, the node still got one assigned which
     is unintuitive.
  2. Previously availability zones were assigned by iterating over all nodes,
     causing reassignments to different availability zones when nodes were added
     or removed in between, which ultimately led to unnecessary rebuilds,
     the very thing we want to eliminate (see #575).
  3. The distribution across availability zones was not evened out when some
     nodes were forced to an availability zone via config.
     The cloud controller may be better at doing the distribution in case no
     explicit selection was made.
  
  [1] https://gitlab.com/yaook/k8s/-/issues/575#note_1875262146
  
  Breaking: true
  Part-of: #575
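
Issue 2 above can be reproduced with a small Python sketch. The exact iteration scheme used previously is not shown here, so a plain round-robin over the node order is assumed; the node and zone names are made up.

```python
def assign_azs_round_robin(node_names, azs):
    """Implicit assignment (assumed scheme): cycle through AZs in node order."""
    return {name: azs[i % len(azs)] for i, name in enumerate(node_names)}


azs = ["az1", "az2", "az3"]
before = assign_azs_round_robin(["w0", "w1", "w2", "w3"], azs)
after = assign_azs_round_robin(["w0", "w2", "w3"], azs)  # w1 removed

# Nodes that still exist but now land in a different availability zone.
moved = sorted(name for name in after if after[name] != before[name])
print(moved)  # ['w2', 'w3']
```

Removing one node in the middle shifts every later node to another zone, and a changed zone means a rebuild. With explicitly configured zones, the remaining nodes keep theirs.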

* tf: Use maps instead of lists to configure nodes

  Changes the config.toml format to use one table per Terraform node, bundling
   its attributes rather than splitting them flat across lists. See the release
   note for an example.
  This propagates further down to the Terraform module where each set of nodes
   (masters, workers, gateways) is stored as a map of objects with the node name
   as access key [1].

  This change allows us to properly add, remove, recreate and rotate Terraform
   nodes without affecting other ones. Because nodes were previously tied to
   their list index, this was not possible (except for the last node in the list).

  In terms of behavior this is a non-breaking change.

  Also:
  - Increases terraform_min_version to 1.3 in order to use `optional()`, see
     Terraform changelog [2].
  - Changes the type of the `azs` variable from list to set, because gateway
     node names are directly based on `azs` and must be unique.

  [1] openstack_compute_instance_v2 Terraform resources are uniquely identified
      by their name.
  [2] https://github.com/hashicorp/terraform/blob/v1.3/CHANGELOG.md#130-september-21-2022

  Breaking: true
  Resolves: #575
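
Why list indices tie nodes together while map keys do not can be shown with a short Python sketch. It only models resource addressing; the address strings and node names are illustrative and not taken from the module.

```python
def index_addresses(names):
    """List-style: each resource is addressed by its position in the list."""
    return {f"instance[{i}]": name for i, name in enumerate(names)}


def key_addresses(names):
    """Map-style: each resource is addressed by its node name."""
    return {f'instance["{name}"]': name for name in names}


def changed(before, after):
    """Addresses whose node differs or that vanished between two states."""
    return sorted(addr for addr in before if after.get(addr) != before[addr])


old = ["w0", "w1", "w2"]
new = ["w0", "w2"]  # w1 removed from the middle

print(changed(index_addresses(old), index_addresses(new)))
print(changed(key_addresses(old), key_addresses(new)))
```

With index addressing, removing `w1` also touches the slot that held `w2`; with name keys only `w1` itself is affected.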

* tf: Permanently migrate count to for_each

Description

This merge request attempts to significantly improve the configuration and handling of Terraform nodes in the config.toml and the yk8s Terraform module. It also solves the issue that, under some circumstances, nodes cannot be changed without affecting other ones. Several breaking changes are made.

Closes: #575 (closed)


Merge Prerequisites

  • MR title (and description) are descriptive
  • Code is readable and syntactically correct
  • Code is understandable
  • Documentation has been updated, if necessary
  • Commit messages look good
  • Release note file in RST format added in latest commit
Edited by brunos
