
[#695] Automatically create new shmid if previous shmid is removed instead of requiring MUPIP RUNDOWN -RELINKCTL

Narayanan Iyer requested to merge nars1/YDB:ydb695 into master

Background

  • The function relinkctl_open() (in sr_unix/relinkctl.c) used to check whether the relinkctl file header contained a shmid that no longer exists in the system. If so, it issued a REQRLNKCTLRNDWN error. This required the user to run mupip rundown -relinkctl, an unnecessary extra step.

Fix

  • relinkctl_open() already had logic to create a new shmid (for the case where the current process is the one creating the relinkctl file). When we detect that the shmid has been removed, we now fall through to this pre-existing logic to create a new shmid, update the relinkctl file header to record the new shmid, and return successfully (instead of issuing an unfriendly REQRLNKCTLRNDWN error).

  • One subtlety in the implementation is that we used to attach to the shared memory using do_shmat() while not holding an exclusive lock on the relinkctl file. This means that when we detect that the shmid has been removed, we cannot simply reobtain the exclusive lock and create a new shmid, because another process could have done exactly the same thing while we had released the lock.

    Therefore, once we reobtain the lock, we continue on to the next iteration of the pre-existing do/while loop, but this time we do not release the lock before the do_shmat() call (a new rctl_do_shmat_count variable records whether this is the first do_shmat() call or not). If the next iteration also finds that the shmid is removed, we can safely clear it and create a new shmid because we have held the exclusive lock throughout that iteration.

  • Another subtlety is that we now reach the pre-existing logic with the rctl_existed variable set to TRUE, which that logic is not designed to handle. So this variable is set to FALSE before we fall through. More importantly, the variables user_id, group_id and perm, whose initialization depended on rctl_existed and which the pre-existing logic relies on, are now also initialized before we fall through. This meant invoking gtm_permissions() in yet another place in the code, so that logic has been moved into a new macro GET_USER_ID_GROUP_ID_AND_PERM, which is now invoked from two places in this file.
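
The retry-under-lock idea described above can be sketched roughly as follows. This is a simplified in-memory model, not the code in sr_unix/relinkctl.c: the types sim_system and sim_rctl_file, the helpers sim_do_shmat() and sim_shmget(), and the lock_releases counter are all hypothetical stand-ins, and only the name rctl_do_shmat_count is taken from this merge request. A for loop is used here in place of the real do/while loop.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAX_SEGS 16
#define SIM_NO_SHMID (-1)

/* Hypothetical model of the system's shared memory segments. */
typedef struct {
    bool exists[SIM_MAX_SEGS]; /* exists[id] is true while segment id is alive */
    int  next_id;              /* next id handed out by sim_shmget() */
} sim_system;

/* Hypothetical model of the relinkctl file: header shmid plus lock state. */
typedef struct {
    int  shmid;  /* shmid recorded in the file header */
    bool locked; /* true while this process holds the exclusive lock */
} sim_rctl_file;

/* Stand-in for do_shmat(): succeeds only if the segment still exists. */
static bool sim_do_shmat(const sim_system *sys, int shmid)
{
    return (shmid >= 0) && (shmid < SIM_MAX_SEGS) && sys->exists[shmid];
}

/* Stand-in for creating a new segment; returns SIM_NO_SHMID when full. */
static int sim_shmget(sim_system *sys)
{
    if (sys->next_id >= SIM_MAX_SEGS)
        return SIM_NO_SHMID;
    sys->exists[sys->next_id] = true;
    return sys->next_id++;
}

/* Returns the shmid attached to; counts how often the lock was dropped
 * before an attach attempt. */
int sim_relinkctl_open(sim_rctl_file *f, sim_system *sys, int *lock_releases)
{
    int rctl_do_shmat_count = 0;

    for (;;) {
        f->locked = true;          /* (re)obtain the lock and read the header */
        int shmid = f->shmid;

        if (rctl_do_shmat_count == 0) {
            /* First attempt only: attach without holding the lock. */
            f->locked = false;
            (*lock_releases)++;
        }
        rctl_do_shmat_count++;

        if (sim_do_shmat(sys, shmid)) {
            f->locked = false;
            return shmid;          /* segment exists, nothing to repair */
        }
        if (!f->locked)
            continue;              /* lock was dropped: retry while holding it */

        /* Lock held since the header was read: safe to replace the shmid. */
        assert(f->locked);
        f->shmid = sim_shmget(sys);
        f->locked = false;
        return f->shmid;
    }
}
```

In this model, opening a file whose header points at a removed segment takes two iterations: the first drops the lock and fails to attach, the second keeps the lock, fails again, and only then records a new shmid in the header.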

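As an illustration of the macro refactoring in the last bullet, here is a minimal sketch of one initialization step shared by two call sites. Only the macro name GET_USER_ID_GROUP_ID_AND_PERM and the variable names rctl_existed, user_id, group_id and perm come from this merge request. The macro body, the helper sim_permissions() (a stand-in for gtm_permissions(), whose real signature is not shown here), the shmid_removed flag and the returned values are assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for gtm_permissions(); fixed values for the sketch. */
static bool sim_permissions(int *user_id, int *group_id, int *perm)
{
    *user_id = 1000;
    *group_id = 1000;
    *perm = 0660;
    return true;
}

/* Assumed shape of the macro: one place that fills in all three variables. */
#define GET_USER_ID_GROUP_ID_AND_PERM(USER_ID, GROUP_ID, PERM, OK)        \
    do {                                                                   \
        (OK) = sim_permissions(&(USER_ID), &(GROUP_ID), &(PERM));          \
    } while (0)

/* Returns the permissions used for creation, 0 if nothing had to be
 * created, or -1 if the permissions could not be determined. */
int sim_create_path(bool rctl_existed, bool shmid_removed)
{
    int  user_id = -1, group_id = -1, perm = -1;
    bool ok = false;

    if (rctl_existed && shmid_removed) {
        /* New call site: the file exists but its shmid is gone. */
        GET_USER_ID_GROUP_ID_AND_PERM(user_id, group_id, perm, ok);
        rctl_existed = false;  /* creation logic expects a "new file" state */
    } else if (!rctl_existed) {
        /* Original call site: this process is creating the file. */
        GET_USER_ID_GROUP_ID_AND_PERM(user_id, group_id, perm, ok);
    }
    if (rctl_existed)
        return 0;              /* attached to an existing segment */
    if (!ok)
        return -1;
    return perm;               /* creation logic would use all three values */
}
```

Both paths that reach the creation logic go through the same macro, so the creation logic never sees user_id, group_id or perm uninitialized.
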
Test

  • The existing relink/rundown subtest already tests scenarios where a process is killed and the relinkctl file shmid is removed (with the relinkctl file intact). That test no longer issues a REQRLNKCTLRNDWN error so all that was needed is fixing that reference file. No new test is necessary.
