Confusing behaviour when using multiple same-port CI services on Kubernetes executor

Summary

We're struggling with CI services that use the same kind of image (e.g. MySQL). We spent quite some time debugging this yesterday, and here are our findings.

Debugging 2 MySQL services in CI with CI_DEBUG_SERVICES: "true" defined at the job level and exactly the same MySQL image (tag) for both:

  • with variables MYSQL_USER and MYSQL_PASSWORD defined at the job level:
    • there are logs reporting problems with port binding, for both default ports 3306 and 33060:
      [service:mysql-sql-master] 2024-11-29T08:09:09.015431177Z 2024-11-29T08:09:09.015247Z 0 [ERROR] [MY-011300] [Server] Plugin mysqlx reported: 'Setup of bind-address: '*' port: 33060 failed, `bind()` failed with error: Address already in use (98). Do you already have another mysqld server running with Mysqlx ?'
      [service:mysql-sql-master] 2024-11-29T08:09:09.015457341Z 2024-11-29T08:09:09.015299Z 0 [ERROR] [MY-013597] [Server] Plugin mysqlx reported: 'Value '*' set to `Mysqlx_bind_address`, X Plugin can't bind to it. Skipping this value.'
      [service:mysql-sql-shard-1] 2024-11-29T08:09:09.015431177Z 2024-11-29T08:09:09.015247Z 0 [ERROR] [MY-011300] [Server] Plugin mysqlx reported: 'Setup of bind-address: '*' port: 33060 failed, `bind()` failed with error: Address already in use (98). Do you already have another mysqld server running with Mysqlx ?'
      [service:mysql-sql-shard-1] 2024-11-29T08:09:09.015457341Z 2024-11-29T08:09:09.015299Z 0 [ERROR] [MY-013597] [Server] Plugin mysqlx reported: 'Value '*' set to `Mysqlx_bind_address`, X Plugin can't bind to it. Skipping this value.'
      [service:mysql-sql-shard-1] 2024-11-29T08:09:09.314558406Z 2024-11-29T08:09:09.314465Z 0 [ERROR] [MY-010262] [Server] Can't start server: Bind on TCP/IP port: Address already in use
      [service:mysql-sql-shard-1] [service:mysql-sql-master] 2024-11-29T08:09:09.314597305Z 2024-11-29T08:09:09.314496Z 0 [ERROR] [MY-010257] [Server] Do you already have another mysqld server running on port: 3306 ?2024-11-29T08:09:09.314558406Z 2024-11-29T08:09:09.314465Z 0 [ERROR] [MY-010262] [Server] Can't start server: Bind on TCP/IP port: Address already in use
      [service:mysql-sql-master] [service:mysql-sql-shard-1] 2024-11-29T08:09:09.314597305Z 2024-11-29T08:09:09.314496Z 0 [ERROR] [MY-010257] [Server] Do you already have another mysqld server running on port: 3306 ?2024-11-29T08:09:09.314605472Z 2024-11-29T08:09:09.314519Z 0 [ERROR] [MY-010119] [Server] Aborting
      [service:mysql-sql-master] 2024-11-29T08:09:09.314605472Z 2024-11-29T08:09:09.314519Z 0 [ERROR] [MY-010119] [Server] Aborting
    • the aforementioned errors do not fail the job -> the job's script section is executed, where a different database is created on each host manually by calling:
      mysql -h sql-master --default-character-set=utf8mb4 -uroot -proot -e "CREATE DATABASE test_database_master;"
      mysql -h sql-shard-1 --default-character-set=utf8mb4 -uroot -proot -e "CREATE DATABASE \`test_database_shard-1\`;"
    • verifying the databases' creation in the debug terminal returns BOTH databases on each host. In practice it looks like the same database server answers, no matter which host is given in the verification commands:
      mysql -h sql-master -uroot -proot -e "show databases;"
      mysql -h sql-shard-1 -uroot -proot -e "show databases;"
  • defining the same variables as in the previous case, but at each service's level, plus different MYSQL_DATABASE and MYSQL_TCP_PORT (3306 and 3307) values per service:
    • according to the logs, both databases are created on both services at the same time (which is unexpected and incorrect):
      [service:mysql-sql-shard-1] 2024-11-28T12:31:36.742703078Z 2024-11-28 12:31:36+00:00 [Note] [Entrypoint]: Creating database test_database_master
      [service:mysql-sql-master] 2024-11-28T12:31:36.742703078Z 2024-11-28 12:31:36+00:00 [Note] [Entrypoint]: Creating database test_database_master
      [service:mysql-sql-shard-1] 2024-11-28T12:31:37.940238636Z 2024-11-28 12:31:37+00:00 [Note] [Entrypoint]: Creating database test_database_shard-1
      [service:mysql-sql-master] 2024-11-28T12:31:37.940238636Z 2024-11-28 12:31:37+00:00 [Note] [Entrypoint]: Creating database test_database_shard-1
    • there are logs reporting problems with port binding, but only for port 33060
    • the aforementioned errors do not fail the job -> the job's script section is executed
    • verifying the databases' creation via the debug terminal without a port (e.g. mysql -h <host> ...) works for both database hosts (the aliases in the service definitions), although in theory it should not work for the latter (3307 is defined explicitly there), and both databases are incorrectly present in the results. With a port given (e.g. mysql -h <host> -P <port> ...), only the second database is returned. Probably the host does not matter and both databases are incorrectly created on the first service
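
The bind errors in the logs above are what one would expect when two servers share a single network namespace and try to listen on the same port. Below is a minimal Python sketch of that mechanism only, not of the runner or of mysqld. The helper names `bind_listener` and `second_bind_errno` are made up for illustration, and plain TCP sockets on loopback stand in for the two MySQL services.

```python
import errno
import socket


def bind_listener(port: int) -> socket.socket:
    """Bind and listen on 127.0.0.1:port, as a service process would."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def second_bind_errno(port: int = 0):
    """Start one listener, then try to start a second one on the same port.

    Returns the errno raised by the second attempt, or None if the second
    attempt unexpectedly succeeded.
    """
    first = bind_listener(port)  # port 0 lets the OS pick a free port
    taken_port = first.getsockname()[1]
    try:
        second = bind_listener(taken_port)
    except OSError as exc:
        return exc.errno
    else:
        second.close()
        return None
    finally:
        first.close()


if __name__ == "__main__":
    # Expected to report "Address already in use" for the second listener.
    print(second_bind_errno() == errno.EADDRINUSE)
```

If the services ran in separate network namespaces, each would have its own port space and the second bind would not collide, which is why these errors hint at a shared one.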

Debugging 2 MySQL services in CI with CI_DEBUG_SERVICES: "true" defined at the job level and different MySQL image tags, even if they resolve to exactly the same image, e.g. mysql:8.0 and mysql:8.0.40 (at the time of debugging both pointed to the same MySQL Server 8.0.40-1.el9):

  • with variables MYSQL_USER and MYSQL_PASSWORD defined at each service's level:
    • there are logs reporting the same port-binding problems for both default ports
    • the aforementioned errors do not fail the job -> the job's script section is executed, where a different database is created on each host manually, in the same manner as above
    • verifying the databases' creation via the debug terminal gives the same results as in the previous case
  • with variables MYSQL_USER and MYSQL_PASSWORD and a different MYSQL_DATABASE defined at each service's level:
    • according to the logs, the databases are created correctly (the right database on the right host):
      [service:mysql-sql-master] 2024-11-29T08:40:11.969208897Z 2024-11-29 08:40:11+00:00 [Note] [Entrypoint]: Creating database test_database_master
      [service:mysql-sql-shard-1] 2024-11-29T08:40:23.661053115Z 2024-11-29 08:40:23+00:00 [Note] [Entrypoint]: Creating database test_database_shard-1
    • there are logs reporting the same port-binding problems for both default ports
    • the aforementioned errors do not fail the job -> the job's script section is executed
    • verifying the databases' creation via the debug terminal returns only the database created for the first host, no matter which host is given in the command (e.g. mysql -h <host> ...)
  • defining the same variables as in the previous case, plus a different MYSQL_TCP_PORT (3306 and 3307) at each service's level:
    • according to the logs, the databases are created correctly (the right database on the right host):
      [service:mysql-sql-master] 2024-11-29T08:40:11.969208897Z 2024-11-29 08:40:11+00:00 [Note] [Entrypoint]: Creating database test_database_master
      [service:mysql-sql-shard-1] 2024-11-29T08:40:23.661053115Z 2024-11-29 08:40:23+00:00 [Note] [Entrypoint]: Creating database test_database_shard-1
    • there are logs reporting problems with port binding, but only for port 33060
    • the aforementioned errors do not fail the job -> the job's script section is executed
    • verifying the databases' creation via the debug terminal without a port (e.g. mysql -h <host> ...) works for both database hosts (the aliases in the service definitions), although in theory it should not work for the latter (3307 is defined explicitly there). With no port or the default 3306, it correctly returns only the database from the first service; with 3307, it correctly returns only the database from the second service. In both cases the host (alias) given in the command does not matter: the two may be used interchangeably
  • defining the same variables as in the previous case, but with MYSQL_TCP_PORT set to 3307 and 3308 (skipping the default 3306) at each service's level:
    • according to the logs, the databases are created correctly (the right database on the right host)
    • there are logs reporting problems with port binding, but only for port 33060
    • the aforementioned errors do not fail the job -> the job's script section is executed
    • verifying the databases' creation via the debug terminal is impossible without a port; with a port given, a different database is correctly returned for each service

Conclusion

It looks like with the Kubernetes executor all services are created within the same pod and all hosts/aliases point to localhost. In practice this makes it impossible to spawn multiple services of the same type that use the same port (unless you provide a custom one), and it also means that hosts/aliases are not fully distinguishable (all of them point to every service; a service can be singled out only in conjunction with its port). Moreover, it is also possible to connect to the MySQL service using... the Redis or Kafka host/alias, if you define them as CI services!
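
To illustrate why the alias stops mattering once every name resolves to the same loopback address, here is a small Python sketch. It is a simulation under assumptions, not the runner's behaviour: `start_fake_service` and `who_answers` are hypothetical helpers, the two "services" are trivial TCP servers that just send their name, and "localhost" and "127.0.0.1" stand in for two different service aliases.

```python
import socket
import threading


def start_fake_service(name: str):
    """Listen on a free loopback port and greet every client with `name`.

    Returns (server_socket, port).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(5)

    def serve() -> None:
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # accept failed, e.g. the server socket was closed
            with conn:
                conn.sendall(name.encode())

    threading.Thread(target=serve, daemon=True).start()
    return server, server.getsockname()[1]


def who_answers(host: str, port: int) -> str:
    """Connect to host:port and return the name sent by whoever listens there."""
    with socket.create_connection((host, port), timeout=5) as conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks).decode()


if __name__ == "__main__":
    _, master_port = start_fake_service("sql-master")
    _, shard_port = start_fake_service("sql-shard-1")
    # Two different names for the same address, two different ports.
    for host in ("localhost", "127.0.0.1"):
        for port in (master_port, shard_port):
            print(host, port, "->", who_answers(host, port))
```

In this simulation the answering service is decided by the port alone, which matches what we saw when swapping sql-master and sql-shard-1 in the mysql commands.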

This is super confusing and because of this our other jobs use DinD, which we wanted to avoid in this case.

The most viable option for us is a combination of:

  • using different tags of MySQL (even though they point to the same image)
  • defining a port for both services, preferably non-default ones, to avoid connecting to MySQL without a specific port and accidentally reaching the wrong instance
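
As a sketch of the second point, verification helpers can be written so that a port is always passed to the client. The Python below is only an illustration: `mysql_argv` and `show_databases` are hypothetical names, the root/root credentials mirror the commands earlier in this report, and the ports in the usage example are the 3307/3308 pair from the last test case. It uses only the mysql client options `-h`, `-P`, `-u`, `-p` and `-e`.

```python
import subprocess


def mysql_argv(host, port, sql, user="root", password="root"):
    """Build a mysql client command line that always names the port."""
    return [
        "mysql",
        "-h", host,
        "-P", str(port),  # explicit port: in our tests this selects the instance
        f"-u{user}",
        f"-p{password}",
        "-e", sql,
    ]


def show_databases(host, port):
    """Run `show databases;` against host:port and return the client's stdout."""
    result = subprocess.run(
        mysql_argv(host, port, "show databases;"),
        capture_output=True,
        text=True,
        check=True,  # raise if the client exits with a non-zero status
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the commands; running them requires the mysql client
    # and reachable services.
    print(" ".join(mysql_argv("sql-master", 3307, "show databases;")))
    print(" ".join(mysql_argv("sql-shard-1", 3308, "show databases;")))
```

Making the port a required argument means a check cannot silently fall back to the default 3306 and hit whichever instance happens to listen there.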

What is the current bug behavior?

  • Services that are supposed to be separated point to the same service (multiple hosts/aliases can be used to connect to one service, e.g. MySQL)
  • The behaviour is confusing and differs based on the small details pointed out above

What is the expected correct behavior?

  • Services should be properly separated or at least errors like "port already in use" should fail the job
  • It shouldn't matter if multiple services use the same image (like mysql:8.0) or not

Results of GitLab environment info

I don't have access to infrastructure where I can run the env check. As far as I can tell, we use GitLab 17.6 Premium, on-premises.

Results of GitLab application Check

I don't have access to infrastructure where I can run env check.

Possible fixes

🤷

Edited Dec 09, 2024 by Grzegorz Korba