mtail not working properly for WALE backups
At ~13:08 UTC, mtail
stopped reading the /var/log/gitlab/postgresql/current
file where WAL-E logs are written to.
At that specific moment it seems the logfile was rotated:
vlopez@db1:~$ sudo stat /var/log/gitlab/postgresql/current | grep 'Access: 2017'
Access: 2017-07-13 13:08:13.059648952 +0000
/cc @bjk-gitlab
This is blocking https://gitlab.com/gitlab-com/infrastructure/issues/2227
Designs
- Show closed items
Activity
-
Newest first Oldest first
-
Show all activity Show comments only Show history only
- Victor Lopez mentioned in issue #2227 (closed)
mentioned in issue #2227 (closed)
Is there anything in the mtail log to indicate a problem handling the file rotation?
- Author
Sadly I just realized we didn't have logging enabled. The
-logtostderr
flag is missing, so I'll push a fix for that now. - Author
mtail
seems to keep reading the file after being rotated, but it's not updating the counters:vlopez@db1:~$ cat trace.out | egrep -v "pselect6|epoll" 35310 read(8, "\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0current\0\0\0\0\0\0\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 65536) = 48 35310 read(12, "2017-07-14_11:22:11.43952 db1 postgresql: 2017-07-14 11:22:11 GMT [57433]: [1-1] LOG: incomplete startup packet\n", 4096) = 113 35310 read(12, <unfinished ...> 35310 <... read resumed> "", 4096) = 0 20048 read(12, "", 4096) = 0 35317 read(8, "\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0current\0\0\0\0\0\0\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0current\0\0\0\0\0\0\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0current\0\0\0\0\0\0\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 65536) = 144 35317 read(12, "2017-07-14_11:22:12.10842 db1 postgresql: wal_e.worker.upload INFO MSG: begin archiving a file\n2017-07-14_11:22:12.10865 db1 postgresql: DETAIL: Uploading \"pg_xlog/0000000200001A3E000000F9\" to \"s3://asdfoo/0000000200001A3E000000F9.lzo\".\n2017-07-14_11:22:12.10876 db1 postgresql: STRUCTURED: time=2017-07-14T11:22:12.108046-00 pid=57448 action=push-wal key=s3://asdfoo/0000000200001A3E000000F9.lzo prefix=db1/ seg=0000000200001A3E000000F9 state=begin\n", 4096) = 524 20048 read(12, "", 4096) = 0 20048 read(12, <unfinished ...> 20048 <... read resumed> "", 4096) = 0 20048 read(12, <unfinished ...> 20048 <... read resumed> "", 4096) = 0 20048 read(12, <unfinished ...> 20048 <... read resumed> "", 4096) = 0 20048 read(12, <unfinished ...> 20048 <... read resumed> "", 4096) = 0 20048 read(12, "", 4096) = 0 20048 read(8, "\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0current\0\0\0\0\0\0\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0current\0\0\0\0\0\0\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\2\0\0\0\0\0\0\0\20\0\0\0current\0\0\0\0\0\0\0\0\0\4\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 65536) = 144 20048 read(12, <unfinished ...> 20048 <... read resumed> "2017-07-14_11:22:13.24234 db1 postgresql: wal_e.worker.upload INFO MSG: completed archiving to a file\n2017-07-14_11:22:13.24257 db1 postgresql: DETAIL: Archiving to \"s3://asdfoo/0000000200001A3E000000F9.lzo\" complete at 9207.24KiB/s.\n2017-07-14_11:22:13.24265 db1 postgresql: STRUCTURED: time=2017-07-14T11:22:13.241631-00 pid=57448 action=push-wal key=s3://asdfoo/0000000200001A3E000000F9.lzo prefix=db1/ rate=9207.24 seg=0000000200001A3E000000F9 state=complete\n", 4096) = 537 35310 read(12, "", 4096) = 0 35310 read(12, <unfinished ...> 35310 <... read resumed> "", 4096) = 0 35310 read(12, "", 4096) = 0 20048 read(12, <unfinished ...> 20048 <... read resumed> "", 4096) = 0 20048 read(12, "", 4096) = 0 20048 read(12, "", 4096) = 0
Edited by Victor Lopez - Victor Lopez mentioned in merge request gitlab-cookbooks/gitlab-prometheus!281 (merged)
mentioned in merge request gitlab-cookbooks/gitlab-prometheus!281 (merged)
- Author
It seems gitlab-cookbooks/gitlab-prometheus!281 (merged) was generating this problem.
The
mtail
fix was applied at12:43 UTC
:vlopez@db1:~$ ps -ef | grep -i [m]tail/mtail root 25360 25358 0 12:42 ? 00:00:18 /opt/prometheus/mtail/mtail -progs /opt/prometheus/mtail/progs -logs /var/log/gitlab/postgresql/current -log_dir /var/log/mtail
And the logfile was rotated at
15:11 UTC
:vlopez@db1:~$ sudo stat /var/log/gitlab/postgresql/current File: '/var/log/gitlab/postgresql/current' Size: 19828392 Blocks: 38728 IO Block: 4096 regular file Device: 801h/2049d Inode: 1285066 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2017-07-14 15:11:39.521985145 +0000 Modify: 2017-07-14 15:11:39.521985145 +0000 Change: 2017-07-14 15:11:39.521985145 +0000 Birth: -
- Victor Lopez closed
closed