
Fix random number generator for ServiceDiscovery::Sampler

What does this MR do and why?

When rolling this out on production, we noticed that all Rails servers were getting the same selected addresses: gitlab-com/gl-infra/production#8036 (comment 1171858328). We have rolled this back on production until we fix it.

After much head scratching and driving myself insane testing this locally, I finally came to the realization that rand always generates a float smaller than 1, and Random.new(seed) always gives the same values if seed is a float less than 1. This is presumably because it's expecting an integer, so any such float ends up truncated to 0. As such I've switched to generating the seed as an integer. I also decided to use Random.new_seed, as this seems to be the more explicit way of getting a seed value, even though it is just some very large integer.
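
A minimal sketch of the truncation described above. The float values are arbitrary examples standing in for what rand might return; the only assumption is that Random.new converts its seed to an integer before using it:

```ruby
# Kernel#rand with no argument returns a Float in [0, 1); these are example values.
float_seeds = [0.0001, 0.25, 0.9999]

# Random#seed reports the seed the generator actually uses after conversion.
effective_seeds = float_seeds.map { |seed| Random.new(seed).seed }

# Generators with the same effective seed produce the same shuffle.
shuffles = float_seeds.map { |seed| [1, 2, 3, 4].shuffle(random: Random.new(seed)) }

puts "effective seeds:   #{effective_seeds.inspect}"
puts "distinct shuffles: #{shuffles.uniq.size}"

# An integer seed is used as given.
puts "integer seed kept: #{Random.new(42).seed}"
```

Every float below 1 collapses to the same effective seed, which is why every process ended up with the same ordering.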

We can also see this behaviour in staging, as we rolled this out there last week as well:

Screen_Shot_2022-11-15_at_3.05.54_pm

source.

This shows that we're disproportionately choosing some pgbouncers over others (some have 0 connections). Since we didn't roll back staging, we can merge this change, and once it's deployed to staging we should hopefully see these lines converge and become evenly distributed. Once we see that, we should be OK to try rolling it out to production again.
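
To illustrate why some pgbouncers end up with no connections, here is a rough simulation. It is not the actual ServiceDiscovery::Sampler code: the port list, the process count, and the first_choices helper are hypothetical, and it assumes each process shuffles the addresses with its own generator and connects to the first one (Array#tally needs Ruby 2.7 or later):

```ruby
# Hypothetical pgbouncer ports, mirroring the local setup used for validation.
PORTS = [6432, 6433, 6434, 6435].freeze
PROCESSES = 1_000

# Simulates PROCESSES independent processes. Each one builds a generator from
# the given seed source, shuffles the ports, and picks the first entry.
# Returns a hash of port => number of processes that picked it.
def first_choices(seed_source)
  Array.new(PROCESSES) do
    PORTS.shuffle(random: Random.new(seed_source.call)).first
  end.tally
end

float_seeded = first_choices(-> { rand })               # Float seed in [0, 1)
integer_seeded = first_choices(-> { Random.new_seed })  # large Integer seed

puts "Float seeds:   #{float_seeded}"
puts "Integer seeds: #{integer_seeded}"
```

With float seeds every simulated process picks the same port, while integer seeds should spread the picks across all four ports.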

To be extra careful, I also verified that Random.new_seed gives a new number every time the Ruby process starts up, rather than starting from the same number every time, even in a fresh Docker container. I wanted to validate this because my first thought was that the problem had something to do with Kubernetes pods generating the same random number every time on startup:

$ docker run --rm ruby ruby -e 'puts "Random number: #{Random.new_seed}"'
Random number: 176004222810440936052995754997845339307
$ docker run --rm ruby ruby -e 'puts "Random number: #{Random.new_seed}"'
Random number: 158076779882009195494252748041964262917
$ docker run --rm ruby ruby -e 'puts "Random number: #{Random.new_seed}"'
Random number: 121861738313815470270957465324479091575
$ docker run --rm ruby ruby -e 'puts "Random number: #{Random.new_seed}"'
Random number: 187922300241423054260652959166859157043

Screenshots or screen recordings

irb(main):001:0> rand()
=> 0.9877912783753516
irb(main):002:0> rand()
=> 0.24387352687578734

irb(main):005:0> [1,2,3,4].shuffle(random: Random.new(rand()))
=> [3, 4, 2, 1]
irb(main):006:0> [1,2,3,4].shuffle(random: Random.new(rand()))
=> [3, 4, 2, 1]
irb(main):007:0> [1,2,3,4].shuffle(random: Random.new(rand()))
=> [3, 4, 2, 1]

irb(main):008:0> Random.new_seed
=> 250491254891584330886895437192910398408
irb(main):009:0> Random.new_seed
=> 223472397698967265728569619421196790104

irb(main):010:0> [1,2,3,4].shuffle(random: Random.new(Random.new_seed))
=> [1, 4, 3, 2]
irb(main):011:0> [1,2,3,4].shuffle(random: Random.new(Random.new_seed))
=> [1, 2, 4, 3]
irb(main):012:0> [1,2,3,4].shuffle(random: Random.new(Random.new_seed))
=> [2, 1, 3, 4]

How to set up and validate locally

You can use the same instructions from !101994 (merged).

Before

You can see that all the Rails processes (and GDK has a few) choose ports 6432 and 6433. Using the pgbouncer console you'll see a bunch of connections for 6432 and 6433 and none for the other pgbouncers, which only show the client from the psql command itself.

PgBouncer show clients
$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6432 -d pgbouncer -c 'show clients'
 type | user  |        database         | state  |   addr    | port  | local_addr | local_port |       connect_time       |       request_time       | wait | wait_us | close_needed |     ptr     |    link     | remote_pid | tls
------+-------+-------------------------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+-------------+------------+-----
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 50854 | 127.0.0.1  |       6432 | 2022-11-15 15:10:39 AEDT | 2022-11-15 15:10:39 AEDT |    0 |       0 |            0 | 0x134008210 | 0x144815410 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 50865 | 127.0.0.1  |       6432 | 2022-11-15 15:10:40 AEDT | 2022-11-15 15:10:40 AEDT |    0 |       0 |            0 | 0x134008440 | 0x144815640 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 50894 | 127.0.0.1  |       6432 | 2022-11-15 15:10:43 AEDT | 2022-11-15 15:10:44 AEDT |    0 |       0 |            0 | 0x1340088a0 | 0x144815aa0 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 50910 | 127.0.0.1  |       6432 | 2022-11-15 15:10:48 AEDT | 2022-11-15 15:10:48 AEDT |    0 |       0 |            0 | 0x134008d00 | 0x144815cd0 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 50915 | 127.0.0.1  |       6432 | 2022-11-15 15:10:49 AEDT | 2022-11-15 15:10:49 AEDT |    0 |       0 |            0 | 0x134009160 | 0x144816360 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 50948 | 127.0.0.1  |       6432 | 2022-11-15 15:11:03 AEDT | 2022-11-15 15:11:04 AEDT |    0 |       0 |            0 | 0x1340095c0 | 0x1448167c0 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50866 | 127.0.0.1  |       6432 | 2022-11-15 15:10:40 AEDT | 2022-11-15 15:10:41 AEDT |    0 |       0 |            0 | 0x134008670 | 0x144815870 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50897 | 127.0.0.1  |       6432 | 2022-11-15 15:10:44 AEDT | 2022-11-15 15:10:44 AEDT |    0 |       0 |            0 | 0x134008ad0 | 0x144815f00 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50912 | 127.0.0.1  |       6432 | 2022-11-15 15:10:48 AEDT | 2022-11-15 15:10:48 AEDT |    0 |       0 |            0 | 0x134008f30 | 0x144816130 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50918 | 127.0.0.1  |       6432 | 2022-11-15 15:10:49 AEDT | 2022-11-15 15:10:49 AEDT |    0 |       0 |            0 | 0x134009390 | 0x144816590 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50949 | 127.0.0.1  |       6432 | 2022-11-15 15:11:03 AEDT | 2022-11-15 15:11:03 AEDT |    0 |       0 |            0 | 0x1340097f0 | 0x1448169f0 |          0 |
 C    | dylan | pgbouncer               | active | 127.0.0.1 | 51012 | 127.0.0.1  |       6432 | 2022-11-15 15:11:32 AEDT | 2022-11-15 15:11:32 AEDT |    0 |       0 |            0 | 0x134009a20 |             |          0 |
(12 rows)

$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6433 -d pgbouncer -c 'show clients'
 type | user  |        database         | state  |   addr    | port  | local_addr | local_port |       connect_time       |       request_time       | wait | wait_us | close_needed |     ptr     |    link     | remote_pid | tls
------+-------+-------------------------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+-------------+------------+-----
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 50896 | 127.0.0.1  |       6433 | 2022-11-15 15:10:44 AEDT | 2022-11-15 15:10:44 AEDT |    0 |       0 |            0 | 0x14780e040 | 0x147814c10 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 50909 | 127.0.0.1  |       6433 | 2022-11-15 15:10:48 AEDT | 2022-11-15 15:10:48 AEDT |    0 |       0 |            0 | 0x14780e4a0 | 0x1478152a0 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 50916 | 127.0.0.1  |       6433 | 2022-11-15 15:10:49 AEDT | 2022-11-15 15:10:49 AEDT |    0 |       0 |            0 | 0x14780e900 | 0x147815700 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 50951 | 127.0.0.1  |       6433 | 2022-11-15 15:11:03 AEDT | 2022-11-15 15:11:04 AEDT |    0 |       0 |            0 | 0x14780ed60 | 0x147815b60 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50855 | 127.0.0.1  |       6433 | 2022-11-15 15:10:39 AEDT | 2022-11-15 15:10:40 AEDT |    0 |       0 |            0 | 0x14780de10 | 0x147814e40 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50898 | 127.0.0.1  |       6433 | 2022-11-15 15:10:44 AEDT | 2022-11-15 15:10:44 AEDT |    0 |       0 |            0 | 0x14780e270 | 0x147815070 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50911 | 127.0.0.1  |       6433 | 2022-11-15 15:10:48 AEDT | 2022-11-15 15:10:48 AEDT |    0 |       0 |            0 | 0x14780e6d0 | 0x1478154d0 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 50917 | 127.0.0.1  |       6433 | 2022-11-15 15:10:49 AEDT | 2022-11-15 15:10:49 AEDT |    0 |       0 |            0 | 0x14780eb30 | 0x147815930 |          0 |
 C    | dylan | pgbouncer               | active | 127.0.0.1 | 51018 | 127.0.0.1  |       6433 | 2022-11-15 15:11:34 AEDT | 2022-11-15 15:11:34 AEDT |    0 |       0 |            0 | 0x14780ef90 |             |          0 |
(9 rows)


$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6434 -d pgbouncer -c 'show clients'
 type | user  | database  | state  |   addr    | port  | local_addr | local_port |       connect_time       |       request_time       | wait | wait_us | close_needed |     ptr     | link | remote_pid | tls
------+-------+-----------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+------+------------+-----
 C    | dylan | pgbouncer | active | 127.0.0.1 | 51036 | 127.0.0.1  |       6434 | 2022-11-15 15:11:42 AEDT | 2022-11-15 15:11:42 AEDT |    0 |       0 |            0 | 0x138009810 |      |          0 |
(1 row)

$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6435 -d pgbouncer -c 'show clients'
 type | user  | database  | state  |   addr    | port  | local_addr | local_port |       connect_time       |       request_time       | wait | wait_us | close_needed |     ptr     | link | remote_pid | tls
------+-------+-----------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+------+------------+-----
 C    | dylan | pgbouncer | active | 127.0.0.1 | 51049 | 127.0.0.1  |       6435 | 2022-11-15 15:11:46 AEDT | 2022-11-15 15:11:46 AEDT |    0 |       0 |            0 | 0x160808210 |      |          0 |
(1 row)

After

Now there seems to be a random selection of connections across all pgbouncers:

PgBouncer show clients
$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6432 -d pgbouncer -c 'show clients'
 type | user  |        database         | state  |   addr    | port  | local_addr | local_port |       connect_time       |       request_time       | wait | wait_us | close_needed |     ptr     |    link     | remote_pid | tls
------+-------+-------------------------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+-------------+------------+-----
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 51535 | 127.0.0.1  |       6432 | 2022-11-15 15:15:03 AEDT | 2022-11-15 15:15:03 AEDT |    0 |       0 |            0 | 0x15d008440 | 0x15c011410 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 51544 | 127.0.0.1  |       6432 | 2022-11-15 15:15:04 AEDT | 2022-11-15 15:15:04 AEDT |    0 |       0 |            0 | 0x15d008670 | 0x15c011870 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 51564 | 127.0.0.1  |       6432 | 2022-11-15 15:15:13 AEDT | 2022-11-15 15:15:13 AEDT |    0 |       0 |            0 | 0x15d0088a0 | 0x15c011aa0 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51533 | 127.0.0.1  |       6432 | 2022-11-15 15:15:02 AEDT | 2022-11-15 15:15:02 AEDT |    0 |       0 |            0 | 0x15d008210 | 0x15c011640 |          0 |
 C    | dylan | pgbouncer               | active | 127.0.0.1 | 51581 | 127.0.0.1  |       6432 | 2022-11-15 15:15:19 AEDT | 2022-11-15 15:15:19 AEDT |    0 |       0 |            0 | 0x15d008ad0 |             |          0 |
(5 rows)

$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6433 -d pgbouncer -c 'show clients'
 type | user  |        database         | state  |   addr    | port  | local_addr | local_port |       connect_time       |       request_time       | wait | wait_us | close_needed |     ptr     |    link     | remote_pid | tls
------+-------+-------------------------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+-------------+------------+-----
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 51485 | 127.0.0.1  |       6433 | 2022-11-15 15:14:56 AEDT | 2022-11-15 15:14:56 AEDT |    0 |       0 |            0 | 0x12800b810 | 0x13900c810 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 51498 | 127.0.0.1  |       6433 | 2022-11-15 15:14:57 AEDT | 2022-11-15 15:14:57 AEDT |    0 |       0 |            0 | 0x12800ba40 | 0x13900ca40 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51537 | 127.0.0.1  |       6433 | 2022-11-15 15:15:03 AEDT | 2022-11-15 15:15:03 AEDT |    0 |       0 |            0 | 0x12800bc70 | 0x13900cea0 |          0 |
 C    | dylan | pgbouncer               | active | 127.0.0.1 | 51590 | 127.0.0.1  |       6433 | 2022-11-15 15:15:23 AEDT | 2022-11-15 15:15:23 AEDT |    0 |       0 |            0 | 0x12800bea0 |             |          0 |
(4 rows)

$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6434 -d pgbouncer -c 'show clients'
 type | user  |        database         | state  |   addr    | port  | local_addr | local_port |       connect_time       |       request_time       | wait | wait_us | close_needed |     ptr     |    link     | remote_pid | tls
------+-------+-------------------------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+-------------+------------+-----
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 51531 | 127.0.0.1  |       6434 | 2022-11-15 15:15:02 AEDT | 2022-11-15 15:15:02 AEDT |    0 |       0 |            0 | 0x12d00bc70 | 0x13d80de10 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 51536 | 127.0.0.1  |       6434 | 2022-11-15 15:15:03 AEDT | 2022-11-15 15:15:03 AEDT |    0 |       0 |            0 | 0x12d00c0d0 | 0x13d80e6d0 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51486 | 127.0.0.1  |       6434 | 2022-11-15 15:14:56 AEDT | 2022-11-15 15:14:56 AEDT |    0 |       0 |            0 | 0x12d00b810 | 0x13d80e040 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51499 | 127.0.0.1  |       6434 | 2022-11-15 15:14:57 AEDT | 2022-11-15 15:14:57 AEDT |    0 |       0 |            0 | 0x12d00ba40 | 0x13d80e270 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51534 | 127.0.0.1  |       6434 | 2022-11-15 15:15:02 AEDT | 2022-11-15 15:15:02 AEDT |    0 |       0 |            0 | 0x12d00bea0 | 0x13d80e4a0 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51546 | 127.0.0.1  |       6434 | 2022-11-15 15:15:04 AEDT | 2022-11-15 15:15:04 AEDT |    0 |       0 |            0 | 0x12d00c300 | 0x13d80eb30 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51565 | 127.0.0.1  |       6434 | 2022-11-15 15:15:13 AEDT | 2022-11-15 15:15:13 AEDT |    0 |       0 |            0 | 0x12d00c530 | 0x13d80ed60 |          0 |
 C    | dylan | pgbouncer               | active | 127.0.0.1 | 51598 | 127.0.0.1  |       6434 | 2022-11-15 15:15:26 AEDT | 2022-11-15 15:15:26 AEDT |    0 |       0 |            0 | 0x12d00c760 |             |          0 |
(8 rows)

$ PGPASSWORD=gitlab psql -U $(whoami) -h localhost -p 6435 -d pgbouncer -c 'show clients'
 type | user  |        database         | state  |   addr    | port  | local_addr | local_port |       connect_time       |       request_time       | wait | wait_us | close_needed |     ptr     |    link     | remote_pid | tls
------+-------+-------------------------+--------+-----------+-------+------------+------------+--------------------------+--------------------------+------+---------+--------------+-------------+-------------+------------+-----
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 51532 | 127.0.0.1  |       6435 | 2022-11-15 15:15:02 AEDT | 2022-11-15 15:15:02 AEDT |    0 |       0 |            0 | 0x12f00d010 | 0x11d00b810 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 51545 | 127.0.0.1  |       6435 | 2022-11-15 15:15:04 AEDT | 2022-11-15 15:15:04 AEDT |    0 |       0 |            0 | 0x12f00d470 | 0x11d00bc70 |          0 |
 C    | dylan | gitlabhq_development    | active | 127.0.0.1 | 51566 | 127.0.0.1  |       6435 | 2022-11-15 15:15:13 AEDT | 2022-11-15 15:15:14 AEDT |    0 |       0 |            0 | 0x12f00d8d0 | 0x11d00c0d0 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51538 | 127.0.0.1  |       6435 | 2022-11-15 15:15:03 AEDT | 2022-11-15 15:15:03 AEDT |    0 |       0 |            0 | 0x12f00d240 | 0x11d00ba40 |          0 |
 C    | dylan | gitlabhq_development_ci | active | 127.0.0.1 | 51547 | 127.0.0.1  |       6435 | 2022-11-15 15:15:04 AEDT | 2022-11-15 15:15:04 AEDT |    0 |       0 |            0 | 0x12f00d6a0 | 0x11d00bea0 |          0 |
 C    | dylan | pgbouncer               | active | 127.0.0.1 | 51603 | 127.0.0.1  |       6435 | 2022-11-15 15:15:29 AEDT | 2022-11-15 15:15:29 AEDT |    0 |       0 |            0 | 0x12f00db00 |             |          0 |
(6 rows)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Dylan Griffith
