Migrate all gitlab.com data to 'premium' Azure storage
Some of the things that keep biting us:
- It is not always obvious that the NFS server is dead. `ls` might work on some directories, giving operators the wrong impression.
- 'Restart' actions on Azure can take a very long time. Our guess is that this is because we have a large number of virtual hard drives attached, which slows down the boot process. The really nasty thing about these slow boots is that the Azure API tells you the NFS server is running long before you can reach it. But if you get impatient and click 'restart' a second time, that second restart gets queued and will hit you just when you think NFS is finally back.
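
Since neither `ls` nor the Azure API status is a reliable signal, one option is to probe the NFS port directly and wait for it before touching 'restart' again. This is only a rough sketch: the function names, the timeouts, and the choice of TCP port 2049 as the health signal are assumptions for illustration, not something we run today. An open port also does not prove that every export is healthy, only that the server is accepting connections.

```python
import socket
import time

NFS_PORT = 2049  # standard NFS port; assumed to be what our server listens on


def nfs_port_reachable(host, port=NFS_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def wait_until_reachable(host, port=NFS_PORT, deadline_s=1800.0, interval_s=10.0):
    """Poll until the port answers or deadline_s elapses. Returns True on success."""
    give_up_at = time.monotonic() + deadline_s
    while time.monotonic() < give_up_at:
        if nfs_port_reachable(host, port):
            return True
        time.sleep(interval_s)
    return False
```

The idea would be to call `wait_until_reachable("<nfs host>")` after the first restart and only consider a second restart if it returns `False`, which avoids the queued-restart trap described above.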
Our current best guess about the root cause of the disk read errors on the NFS server is that we are hitting the maximum IOPS of our Azure 'storage account'. As a short-term fix we will move the Postgres and Redis disks out of the storage account used by the NFS server, but we do not think this will buy us much time.
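
To make the IOPS reasoning concrete, here is a back-of-the-envelope sketch. Every number in it is an assumption: the per-account and per-disk limits are placeholders that should be checked against the current Azure documentation, and the disk counts are made up rather than taken from our setup. It only shows why moving a couple of disks out of a shared storage account frees little headroom when many disks remain.

```python
def account_iops_headroom(disk_count, per_disk_iops, account_limit_iops):
    """Return account limit minus worst-case demand (negative means oversubscribed)."""
    worst_case_demand = disk_count * per_disk_iops
    return account_limit_iops - worst_case_demand


# Placeholder limits (assumptions, verify against Azure docs).
ACCOUNT_LIMIT_IOPS = 20_000
PER_DISK_IOPS = 500

# Hypothetical disk counts before and after moving two database disks out.
before = account_iops_headroom(50, PER_DISK_IOPS, ACCOUNT_LIMIT_IOPS)
after = account_iops_headroom(48, PER_DISK_IOPS, ACCOUNT_LIMIT_IOPS)

print(before)  # -5000
print(after)   # -4000
```

With these placeholder numbers the account stays oversubscribed after the move, which matches the expectation that the short-term fix will not buy much time.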
cc @jnijhof
Update 2015-11-10: we now know that, contrary to what we thought, gitlab.com was not running on SSD-backed 'premium' storage. We are going to migrate all data to premium storage.