Skip to content
  • Narayanan Iyer's avatar
    [#25] Fix DATA RACE issues with easyAPIDefaultDataSize,... · c74a703c
    Narayanan Iyer authored
    [#25] Fix DATA RACE issues with easyAPIDefaultDataSize, easyAPIDefaultSubscrCnt and easyAPIDefaultSubscrSize variables in easy_api.go
    
    Background
    ----------
    * Running the `go/randomWalk` subtest hundreds of times showed a rare failure involving two types
      of DATA RACE identified by the Go race detector (randomly enabled by test framework).
    
    * The first type was a write/write data race. The race detector identified that line 407 below
      can be concurrently executed by multiple goroutines.
    
      ```go
      Writer goroutine
      ----------------
         387 func NodeNextE(tptoken uint64, errstr *BufferT, varname string, subary []string) ([]string, error) {
           .
         403     // Check if we had an INVSTRLEN error (too small an output buffer)
         404     errorcode := ErrorCode(err)
         405     if int(YDB_ERR_INSUFFSUBS) == errorcode {
         406             // This is INSUFFSUBS - pickup number of subscripts we actually need and reallocate
      -> 407             easyAPIDefaultSubscrCnt = dbsubs.ElemUsed()
         408             dbsubs.Alloc(easyAPIDefaultSubscrCnt, easyAPIDefaultSubscrSize) // Reallocate and reset dbsubs
         409             continue
         410     }
      ```
    
    * The second type was a read/write data race. The race detector identified that lines 407 and 455
      below could be concurrently executed by multiple goroutines.
    
      ```go
      Reader goroutine
      ----------------
         446 func NodePrevE(tptoken uint64, errstr *BufferT, varname string, subary []string) ([]string, error) {
           .
      -> 455         dbsubs.Alloc(easyAPIDefaultSubscrCnt, easyAPIDefaultSubscrSize)
    
      Writer goroutine
      ----------------
         387 func NodeNextE(tptoken uint64, errstr *BufferT, varname string, subary []string) ([]string, error) {
           .
         403                         // Check if we had an INVSTRLEN error (too small an output buffer)
         404                         errorcode := ErrorCode(err)
         405                         if int(YDB_ERR_INSUFFSUBS) == errorcode {
         406                                 // This is INSUFFSUBS - pickup number of subscripts we actually need and reallocate
      -> 407                                 easyAPIDefaultSubscrCnt = dbsubs.ElemUsed()
         408                                 dbsubs.Alloc(easyAPIDefaultSubscrCnt, easyAPIDefaultSubscrSize) //
      ```
    
    Fix
    ---
    * The global variable `easyAPIDefaultSubscrCnt` is now made a constant and any code that previously used
      to change its value now uses a local variable instead. This avoids the data race condition. The only
      consequence of this is that changes to this global variable that previously used to persist for future
      EasyAPI operations will not persist now so the allocations will start from the default every time. This
      is considered okay given this is the EasyAPI and performance is not the primary concern (ease of use is).
    
    * Although the race detector did not identify issues yet, similar issues exist with the global variables
      `easyAPIDefaultDataSize` and `easyAPIDefaultSubscrSize` too. So they are made constants as well.
    
    * While reviewing `easy_api.go`, the reallocation logic in various places was found to lack a `Free()` call
      to free the older (and insuffiently sized) buffer before allocating the bigger buffer. This might not
      be an issue since Go would eventually garbage collect them but it seems better to right away free memory
      that we know is not necessary and so the `Free()` calls are now inserted before the `Alloc()` calls.
    
    * While reviewing `easy_api.go`, `for` loops were observed to have a terminating condition that checked
      `easyAPIDefaultDataSize` reaching `YDB_MAX_STR`. This did not seem necessary so I removed it making those
      for loops infinite loops. The body of the for loop has a break the moment reallocation of buffers is not
      needed so the loop is guaranteed to terminate most commonly in 1 iteration (even 2 iterations should be
      very rare if the value of that node is increasing in size so rapidly).
    
    * After the above changes, the `go/unit_tests` subtest infinite looped in the `TestNodeNextE` subtest. This
      is because of a pre-existing longstanding bug in function `NodePrevE()` where the new number of subscripts
      was incorrectly picked up from `dbkey.Subary.ElemUsed()` instead of from `dbsubs.ElemUsed()`.  The former
      is not updated by the `dbkey.NodePrevST()` call. And so we will be setting `subscrCnt` to the exact same
      value instead of the expanded subscript count. Previously, when the global variable `easyAPIDefaultSubscrCnt`
      was being used, it most likely hid this issue by not exercising the buggy code path. Nevertheless, this is
      now fixed. And the `go/unit_tests` subtest passes fine.
    c74a703c