1. 03 Dec, 2021 1 commit
  2. 24 Nov, 2021 1 commit
      Fix triggers/randomtriggers test failure · 723d78cb
      Brad Westhafer authored
      The triggers/randomtriggers test failed in internal testing due to a bug in the gentrigload^randomtriggers M routine which generates the expected output for one of the MUPIP TRIGGER commands in the test. This test uses randomly generated triggers and a specific combination of randomly generated triggers could cause the test to fail as described below:
      
      * Firstly, add multiple triggers on the same global variable. For example, triggers b#1, b#2 on ^b.
      * Secondly, delete some but not all of these triggers on this global variable, making sure that there is a "missing" trigger somewhere in the sequence. In the above example, delete b#1.
      * Thirdly, add at least one new trigger on the global. Due to the bug, this will cause the variable tracking the number of triggers to be set to the index of the new trigger. In the above example, the new trigger would be b#3 and gentrigload would think that there are 3 triggers on ^b when there are actually 2.
      * Fourthly, delete all tr...
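      Step 3 can be illustrated with a minimal Python sketch. This is not the gentrigload^randomtriggers M code, and the helper names are hypothetical. It assumes, as in the example above, that a new trigger gets the index one past the highest index in use. The sketch contrasts overwriting the tracked count with the new trigger's index against incrementing it.

```python
def next_index(triggers):
    """Assumed naming rule: one past the highest index in use (b#2 -> b#3)."""
    return max(triggers, default=0) + 1


def add_trigger_buggy(triggers, count):
    """Mimics the bug: the tracked count is overwritten with the new trigger's index."""
    index = next_index(triggers)
    triggers.add(index)
    return index


def add_trigger_fixed(triggers, count):
    """Increments the tracked count, so it always equals the number of triggers."""
    triggers.add(next_index(triggers))
    return count + 1


def simulate(add_trigger):
    """Replay the example: add b#1 and b#2, delete b#1, then add one more trigger."""
    triggers, count = set(), 0
    for _ in range(2):
        count = add_trigger(triggers, count)  # no gap yet, so both versions agree
    triggers.discard(1)  # delete b#1, leaving a gap in the sequence
    count -= 1
    count = add_trigger(triggers, count)  # adds b#3
    return count, len(triggers)


print(simulate(add_trigger_buggy))  # (3, 2): tracked count 3, but only b#2 and b#3 exist
print(simulate(add_trigger_fixed))  # (2, 2)
```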
      723d78cb
  3. 23 Nov, 2021 1 commit
  4. 19 Nov, 2021 1 commit
      Various enhancements to instructions for installing test system locally in README.md · f8922721
      Brad Westhafer authored
      This commit makes several enhancements to README.md:
      
      * Updates all references to `R131` or `T131` to refer to `R133` or `T133`, as current development versions of YottaDB are considered r1.33 since r1.32 has been released.
      * Clarifies some previously ambiguous passages.
      * Moves the section on setting up the YottaDB build script so that it comes after, rather than before, the section on cloning the YDBTest repository.
      * Adds a comment to the build script explaining the purpose of the `machtype` variable and why the line setting it might need to be modified depending on the user's machine.
      * Adds the argument that disables multi-system tests. Users running E_ALLs in a local environment will likely want to use it, as such tests would otherwise fail because the user does not have access to the distributed setup needed to run them.
      f8922721
  5. 26 Oct, 2021 3 commits
      [YDB#793] Enhance r130/ydb566 to also test for ydb793 fix to ydb566 bug · 6e7822c0
      Brad Westhafer authored
      This commit enhances the existing r130/ydb566 test to also test 2 additional scenarios for external call tables:
      
      * Comments at the beginning of the external call table before the shared library line
      * Comments at the end of the shared library line
      
      Neither of these cases worked in external call tables prior to the ydb793 fix; both are fixed by ydb!1043. The call-in table part of the test does not need to change: call-in tables do not use the shared library line, and the part of the test that checks comments in call-in tables already has a comment on its first line.
      6e7822c0
      Enhance v43001e/C9C11002163 to use com/sstepgbl.m to better analyze rare timeout failures · 887a0487
      Brad Westhafer authored
      The v43001e/C9C11002163 test has failed twice on ARMV6L machines over the past few months. Both failures involved the test taking much longer than expected, resulting in a diff like the one below:
      
      ```diff
      13c13
      < Hello from cln2163
      ---
      > Read timed out
      ```
      
      `sstepgbl.m` records (in a global `^%sstepgbl`) the timeline of a process as it executes each M line. This will give us more information the next time this test fails so we can see where the delay happened.
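      `sstepgbl.m` itself is M code and is not reproduced here. As a rough Python analog of the same idea, the sketch below uses `sys.settrace` to record a timestamp for every line a function executes, and then finds the line followed by the longest gap, which is where a delay would show up. The `timeline` list stands in for the `^%sstepgbl` global; `trace_lines` and `slowest_step` are hypothetical helpers.

```python
import sys
import time


def trace_lines(func, *args):
    """Run func(*args), recording (line number, timestamp) for every line it executes."""
    timeline = []  # stands in for the ^%sstepgbl global

    def tracer(frame, event, arg):
        # Only trace frames running the target function's code object.
        if frame.f_code is not func.__code__:
            return None
        if event == "line":
            timeline.append((frame.f_lineno, time.monotonic()))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, timeline


def slowest_step(timeline):
    """Return (line number, seconds) for the line followed by the longest gap."""
    gaps = [(a[0], b[1] - a[1]) for a, b in zip(timeline, timeline[1:])]
    return max(gaps, key=lambda g: g[1])
```

Because a "line" event fires just before a line runs, the gap attributed to a line is the time that line took before the next one started.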
      887a0487
  6. 13 Oct, 2021 1 commit
  7. 12 Oct, 2021 1 commit
  8. 04 Oct, 2021 2 commits
      Address v62000/gtm7824 failure by moving heavy computations before the mupip integ · 8f854777
      Brad Westhafer authored
      The v62000/gtm7824 test failed on an ARMV6L machine that was under heavy load because another test was running at the same time. In the test, a `MUPIP INTEG` command waits for 30 seconds (due to a white box test) so that the INTEG process can set a flag allowing the `waitforOLIstart.csh` process to proceed. In the failure, the `offset.csh` script, which runs after the INTEG starts but before `waitforOLIstart.csh`, took 39 seconds. As a result, the INTEG finished before `waitforOLIstart.csh` started, causing the test to fail. The fix is to run `offset.csh` before the INTEG so that the INTEG will not terminate early on slower machines under heavy load.
      8f854777
      [YDB#782] New r134/ydb782 subtest to test ydb_lock_incr_s() calls in child... · 0077327f
      Narayanan Iyer authored
      [YDB#782] New r134/ydb782 subtest to test ydb_lock_incr_s() calls in child process while parent holds lock
      
      * Note that this test fails as follows with a Debug build of YDB that does not have the YDB#782 fixes.
      
        ```diff
        18c18
        < ## Child : Verify return status from ydb_lock_incr_s() is YDB_LOCK_TIMEOUT
        ---
        > %YDB-F-ASSERT, Assert failed in sr_port/op_lock2.c line 314 for expression (!pvt_ptr2->granted && (pvt_ptr2 == pvt_ptr1))
        26c26
        < ## Child : Verify return status from ydb_lock_incr_s() is YDB_LOCK_TIMEOUT
        ---
        > %YDB-F-ASSERT, Assert failed in sr_port/op_lock2.c line 314 for expression (!pvt_ptr2->granted && (pvt_ptr2 == pvt_ptr1))
        ```
      
      * Note that this test runs fine with a Release build of YDB that does not have the YDB#782 fixes.
        I suspect that, given enough time, one could come up with a test using multiple locks that exposes a
        user-visible issue, but I am not 100% sure. I do not want to spend a lot of time on this as the underlying
        code issue is fixed anyway. The only consequence is that we might incorrectly classify this as a
        user-visible issue when it could be a Debug-build-only issue, which is considered acceptable.
      0077327f
  9. 01 Oct, 2021 1 commit
      Fix v52000/C9H03002835 test to check if a socket OPEN command timed out · efd7dab8
      Brad Westhafer authored
      The v52000/C9H03002835 test failed in internal testing after an attempt to open a socket resulted in a timeout. This revealed that the test did not check whether the attempt to open the socket succeeded, and printed "Connection to receiver established" regardless of whether the connection was actually established.
      
      As an `OPEN` command that times out before successfully establishing a connection with a device sets `$test` to `0` (meaning `FALSE`), the fix is to check if `$test` is `0` immediately after the `OPEN` command. If it is (meaning the `OPEN` timed out), the test fails and immediately halts with an appropriate error message. Otherwise (meaning the `OPEN` succeeded), the test writes the `Connection to receiver established` message and continues.
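      The same check-before-reporting pattern, sketched in Python rather than M: `socket.create_connection` with a timeout plays the role of `OPEN` with a timeout, and whether it raises plays the role of `$test`. The function name `open_receiver` and the failure message are illustrative assumptions, not the test's code.

```python
import socket


def open_receiver(host, port, timeout):
    """Try to connect; report success only if the connection was really established."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:  # covers both timeouts (socket.timeout) and refused connections
        # Analogous to $test being 0 after a timed-out OPEN: fail instead of continuing.
        return None, f"TEST-E-FAIL : could not open socket to receiver: {exc}"
    return conn, "Connection to receiver established"
```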
      efd7dab8
  10. 30 Sep, 2021 1 commit
      Fix v54003/C9L06003421 test failures on ARM by allowing up to 10 seconds time... · 2b04dc31
      Brad Westhafer authored
      Fix v54003/C9L06003421 test failures on ARM by allowing up to 10 seconds time gap (not 5); Reduce time gap from 5 to 2 seconds on x86_64
      
      The v54003/C9L06003421 test has been failing on ARM machines in the weekend test with a `Mismatch in timestamp` error in the part of the test, towards the end, that compares the EOF timestamp to the tp_resolve time. In all failure instances, the time gap was 7 seconds. This commit fixes these failures by allowing a time gap of up to 10 seconds on ARM machines.

      We also found that the 5-second time gap is excessive for x86_64, so it is reduced from 5 to 2 seconds. Reducing it to 1 second produced 2 failures out of 500; at 2 seconds, there were 0 failures out of 500.
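      A minimal sketch of the per-architecture tolerance described above. The 10-second and 2-second limits come from this commit; treating any machine name starting with `arm` (or equal to `aarch64`) as ARM, comparing the absolute difference, and the function names are assumptions for illustration.

```python
def allowed_gap_seconds(machine):
    """Maximum tolerated gap between the EOF timestamp and the tp_resolve time."""
    m = machine.lower()
    if m.startswith("arm") or m == "aarch64":  # assumed ARM detection
        return 10  # ARM: all observed failures had a 7-second gap
    return 2       # x86_64: 2 seconds gave 0 failures out of 500


def timestamps_match(eof_time, resolve_time, machine):
    """True if the two timestamps (in seconds) are within the allowed gap."""
    return abs(eof_time - resolve_time) <= allowed_gap_seconds(machine)
```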
      
      This commit also fixes an error with the `copyright.py` script that caused the end of the `All rights reserved.` line to be misaligned when automatically updating copyrights. The line had used spaces instead of tabs.
      2b04dc31
  11. 27 Sep, 2021 1 commit
      Try address v52000/D9G12002636 subtest failure; Also use com/sstepgbl.m to... · 7fb9ae71
      Brad Westhafer authored
      Try address v52000/D9G12002636 subtest failure; Also use com/sstepgbl.m to better analyze future rare failures
      
      This commit makes 2 changes to address v52000/D9G12002636 test failures:
      
      * Firstly, the second interrupter job now waits for ^cnt to be non-zero instead of waiting for ^drvactive. This is because we had a failure in which ^cnt was 0 at the end of the test, causing the test to fail with no interrupts. Continuing to also wait for ^drvactive to be `1` would be redundant, since the intrdrv() function does not set `cnt` to a non-zero value until 14 lines after it sets `drvactive` to 1.
      
      * Secondly, com/sstepgbl.m is added to the test's M code to better analyze future failures. This records the line by line timeline of the process which will be helpful for timing failures.
      7fb9ae71
  12. 13 Sep, 2021 1 commit
      Simplify setuprustenv.csh · 77c57760
      Joshua Nelson authored
      Now that YDBRust's minimum supported version is 1.41, we can configure
      the build optimization for dependencies directly in Cargo.toml instead
      of through an environment variable.
      77c57760
  13. 10 Sep, 2021 1 commit
  14. 02 Sep, 2021 1 commit
      Fix rare r130/ydb484 subtest failure with a TRESTNOT error · d3b21290
      Narayanan Iyer authored
      * The `r130_0/ydb484` subtest failed in one rare instance with the following diff.
      
        ```diff
        $ cat ydb484.diff
        1270a1271,1275
        > ZSTATUS=AllCmdPostConditionalTest+407^ydb484,%YDB-E-TRESTNOT, Cannot TRESTART, transaction is not restartable,%YDB-E-TRESTLOC, Transaction start: AllCmdPostConditionalTest+407^ydb484, Transaction failure: RetSame+6^ydb484
        > ZSTATUS=AllCmdPostConditionalTest+409^ydb484,%YDB-E-TLVLZERO, Transaction is not in progress
        >        -> PASS from TCOMMIT postconditionaltest
        >        -> PASS from TROLLBACK postconditionaltest
        >        -> PASS from TSTART postconditionaltest
        ```
      
      * The `tstart` at `AllCmdPostConditionalTest+407^ydb484` had nothing following the command, which meant
        the transaction was not restartable, so the `TRESTNOT` error is explainable. Even though this test does
        not run multiple processes, a process can in rare cases step on its own buffers and cause a restart, so
        it is not surprising that this happened in this test.
      
      * The fix to avoid this rare failure is to change `tstart` to `tstart ()`. The `()` causes the transaction
        to become restartable.
      d3b21290
  15. 01 Sep, 2021 3 commits
      [YDB#775] New r134/ydb775 subtest to test that LOCKs obtained inside... · 01df7840
      Narayanan Iyer authored
      [YDB#775] New r134/ydb775 subtest to test that LOCKs obtained inside TSTART/TCOMMIT are released on TRESTART
      
      * `singleproc^ydb775` implements the `tprestartlock.m` test case at YDB#775 (comment 666796604).
      
      * `multiproc^ydb775` implements the `tplock.m` test case at YDB#775.
      01df7840
      [YDB#775] Enhance locks/locks_main subtest to test YDB#775 · a79b5602
      Narayanan Iyer authored
      * The `d002014` label in `locks/inref/locks1.m` has been enhanced to do a check of
        the M locks held by a process not just after a `trollback` but also after a `trestart`.
        This serves as a good simple test of YDB#775. The logic to check the locks already
        existed and is now moved to a `lockcheck` label. This label is now invoked from 2 places
        whereas it was invoked only from 1 place previously.
      a79b5602
      [#364] Various docker enhancements to YDBTest · 9270f5a8
      Sam Habiel authored
      - Add ability to pass YDB source code as a volume. This way, you can
        test a local copy of YottaDB against the test system.
      - Debug replication tests and make various fixes to get them working.
      - Add -shell and -rootshell to the script for debugging purposes.
      - docker/install_yottadb.csh renamed to
        docker/build_and_install_yottadb.csh.
      - Since gtmcrypt got moved ([YDB#306],
        [YottaDB/Util/YDBEncrypt#1]), the `build_and_install_yottadb.csh` script needed
        to be updated.
      - Add ability to pass the YDBTest system as a volume.
      - Protect ourselves if user does not pass arguments by displaying help
        for gtmtest.
      9270f5a8
  16. 20 Aug, 2021 1 commit
  17. 12 Aug, 2021 1 commit
      Address v54000/C9D08002390 weekend test failure on ARMV6L by extending the... · 9f45c1b9
      Brad Westhafer authored
      Address v54000/C9D08002390 weekend test failure on ARMV6L by extending the wait_for_log timeout on ARMVXL.
      
      The v54000/C9D08002390 test failed on an ARMV6L machine during weekend tests because the `do_dse_flush.done_1` file did not show up until 72 seconds after the 120-second `wait_for_log.csh` timeout had expired. This caused further issues in the diff: the `TRACE_WRITERSTUCK` file was moved prematurely and the DSE process created another such file, causing the later checks that no `TRACE_WRITERSTUCK` file exists to fail.
      
      The fix involves extending the default timeout for `wait_for_log.csh` on ARMVXL to 5 minutes instead of 2 to give all ARM tests sufficient time to finish.
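      A rough Python sketch of a wait-for-file loop with an architecture-dependent default timeout, mirroring the 2-minute default and the 5-minute ARMVXL value described above. `wait_for_log.csh` itself is not reproduced here. Interpreting ARMVXL as `armv<digit>l` (for example `armv6l`, `armv7l`), the polling interval, and the function names are assumptions.

```python
import os
import re
import time


def default_timeout(machine):
    """120 seconds by default; 300 seconds on ARMVXL (assumed to mean armv<digit>l)."""
    return 300 if re.fullmatch(r"armv\dl", machine.lower()) else 120


def wait_for_file(path, timeout, poll_interval=1.0):
    """Poll until path exists; return True if it appeared before the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```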
      9f45c1b9
  18. 10 Aug, 2021 1 commit
  19. 05 Aug, 2021 2 commits
      [#364] Create Dockerfile to allow easy use of test system for outside... · f39d61b9
      Sam Habiel authored and committed
      [#364] Create Dockerfile to allow easy use of test system for outside contributors and fix up instructions
      
      This commit makes two major changes to the YDBTest system.
      - Add docker support in the `docker` folder.
      - Modify `-stdout` so that it accepts an argument of 0, 1, or 2 for different levels
      of verbosity. This is needed because interactive users who cannot receive mail
      messages, as well as the docker scripts, need to see brief test output.
      `-stdout 1` is used by the docker run.csh script to print subtest
      results. The previous bare `-stdout` is still supported and is equivalent to
      `-stdout 2`, which is "very verbose" output.
      
      How to run
      ----------
      See `docker/notes.txt` for instructions on building and running.
      
      Detailed Changes for docker folder
      ----------------------------------
      - `README.md`: updated to fix various errors and omissions discovered
      while developing the docker files; added instructions for building a docker
      image.
      - `docker/Dockerfile`: definition of how to build the ydbtest image.
      - `docker/cshrc`: sets up the tcsh environment for ydbtest.
      - `docker/run.csh`: script that is launched by default to run `gtmtest.csh`, the
      YDBTest entrypoint. `run.csh` starts the logging service (rsyslogd), configures
      the serverconf.txt file, makes sure the test output area is writable in case it
      is passed from the host as a volume, and runs `gtmtest.csh -nomail -noencrypt
      -fg -stdout 1` plus any other arguments.
      - `docker/entry.csh`: almost identical to `run.csh`, except that it prints out
      some debug messages to the user and drops them into a shell. It's intended to be
      used to debug problems in the test system. See `docker/notes.txt` for information
      on how to invoke it.
      - `docker/install_yottadb.csh` builds and installs YottaDB and copies a bunch of
      files required by the test system. A major limitation now is that it only builds
      the master branch from the main repo at gitlab. It is possible to modify the
      Dockerfile and this script to take arguments from the `docker build` command to
      build a different YottaDB repo and/or branch, but that's left as a future
      enhancement after the proof-of-concept.
      - `docker/serverconf.txt` is a skeleton serverconf.txt required by the test
      system. It's modified by the run scripts to put in the correct runtime
      information.
      
      Detailed Changes for com folder for stdout change
      -------------------------------------------------
      Add new behavior for `-stdout`:
      `-stdout` = same as previous behavior (very verbose)
      `-stdout 0` = no verbose output
      `-stdout 1` = print each subtest result (what docker will use)
      `-stdout 2` = very verbose. Same as -stdout.
      
      Revert `com/do_random_settings.csh` changes in "Fix rare v62000/gtm8086
      test hang in recent JNLSWITCHRETRY test changes
      (YDB#235)", commit 794ed6d0;
      the change there caused an undefined variable issue
      (`$gtm_test_jnlfile_sync`).
      
      Remove message from `com/gtm_test_ipv6_random.csh` as it prints to output.
      Users can run with `-stdout 2` to see exactly what is happening.
      
      In `submit.awk`, don't print the message "The rough results are..."
      when doing verbose output, as the verbose output makes this message
      superfluous. Also, account for multiple non-zero values of `$tst_stdout`.
      
      In `submit_test.csh`:
      - Only set echo, verbose if `-stdout` is 0 (when output is redirected to
      `$tst.log`) or when `-stdout` is 2 (when verbose output is requested).
      Therefore, `-stdout 1` only prints subtest results.
      - Redo how watchdog process is invoked to suppress shell backgrounding
      messages.
      - Print test results to screen if `-stdout` is 1 or 2.
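      The documented `-stdout` levels amount to a small decision table. The Python sketch below only illustrates the behavior listed above and is not the csh logic in `gtmtest.csh` or `submit_test.csh`; the function names and dictionary keys are hypothetical, and what happens when `-stdout` is absent is deliberately left out.

```python
def stdout_level(args):
    """Map the documented -stdout forms to a level; a bare -stdout means 2."""
    if "-stdout" not in args:
        return None  # behavior without -stdout is not covered by this sketch
    i = args.index("-stdout")
    nxt = args[i + 1] if i + 1 < len(args) else None
    if nxt in ("0", "1", "2"):
        return int(nxt)
    return 2  # bare -stdout keeps the previous, very verbose behavior


def behavior(level):
    """What submit_test.csh does at each level, per the commit message."""
    return {
        # echo/verbose only when output goes to $tst.log (0) or very verbose output is requested (2)
        "set_echo_verbose": level in (0, 2),
        # test results are printed to the screen at levels 1 and 2
        "print_results": level in (1, 2),
    }
```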
      f39d61b9
      Disable dbg-only ydb_lockhash_n_bits env var (induces lock collisions) as... · ed1b9555
      Narayanan Iyer authored
      Disable dbg-only ydb_lockhash_n_bits env var (induces lock collisions) as YDB#297 has been reverted as part of YDB#673
      
      Main fix
      --------
      * See diff of commit for detailed comment on why we need to disable this env var.
      
      Misc notes
      ----------
      * Note that currently `ydb_lockhash_n_bits` is randomly enabled in the test framework but we do not see
        any test failures because of a bug in line 1101 (which is encountered just before the `ydb_lockhash_n_bits`
        processing happens) and results in an `undefined variable` error for the variable `gtm_test_jnlpool_sync`.
      
        ```sh
        com/do_random_settings.csh
        --------------------------
            1100 if !($?gtm_test_jnlpool_sync) then
        --> 1101         if ((1 != $gtm_test_jnlpool_sync) && (2 >= $randnumbers[44])) then
        ```
      
      * Once the above issue is fixed (this will happen in a separate commit), we will start noticing test failures/hangs
        due to `ydb_lockhash_n_bits` env var usage.
      
        One prominent failure symptom that we will see is an assert failure.
      
        ```diff
        > %YDB-F-ASSERT, Assert failed in sr_port/mlk_shrblk_delete_if_empty.c line 113 for expression (hash == d->hash)
        ```
      
        Another prominent failure symptom that we will see is a test hang.
      
      * This commit avoids all these failures by disabling the env var before that code path becomes reachable.
      ed1b9555
  20. 27 Jul, 2021 1 commit
      Disable timing/largelvarray subtest on AARCH64 Debian as well. · 1cb6b148
      Brad Westhafer authored
      * Since timing/largelvarray was disabled on AARCH64 Ubuntu a couple weeks ago, we've seen 3 failures in in-house testing on AARCH64 Debian.
      
      * In failure 1, the average was expected to increase by at most 2x but increased by 6.3x
      
      ```diff
      1c1,97
      < PASS from largelvarray : Average node creation time remained within limits from 2**10 to 2**22 nodes
      ---
      > TEST-E-FAIL : Average(10) = 2.1728515625 but Average(22) = 13.7046444416046142 (more than 2x)
      > FAIL from largelvarray : Average node creation time increased more than 2x from 2**10 to 2**22 nodes
      ```
      
      * In failure 2, the average increased by 3.68x
      
      ```diff
      1c1,97
      < PASS from largelvarray : Average node creation time remained within limits from 2**10 to 2**22 nodes
      ---
      > TEST-E-FAIL : Average(10) = 4.5029296875 but Average(22) = 16.5863103866577148 (more than 2x)
      ```
      
      * In failure 3, the average increased by 8.52x
      
      ```diff
      1c1,97
      < PASS from largelvarray : Average node creation time remained within limits from 2**10 to 2**22 nodes
      ---
      > TEST-E-FAIL : Average(10) = 1.3837890625 but Average(22) = 11.795029878616333 (more than 2x)
      ```
      
      * Since we've seen 3 failures on AARCH64 machines running Debian in 2 weekend test runs since disabling it on AARCH64 machines running Ubuntu, we are now disabling the test for AARCH64.
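      The pass/fail rule in these diffs amounts to comparing the ratio of the two averages against 2. The Python sketch below is an illustration, not the test's M code, and `growth_check` is a hypothetical name. It recomputes the growth factors from the three failures quoted above.

```python
def growth_check(avg10, avg22, limit=2.0):
    """Return (ratio, passed) for Average(22) relative to Average(10)."""
    ratio = avg22 / avg10
    return ratio, ratio <= limit


# (Average(10), Average(22)) from the three failures above
FAILURES = [
    (2.1728515625, 13.7046444416046142),
    (4.5029296875, 16.5863103866577148),
    (1.3837890625, 11.795029878616333),
]

for avg10, avg22 in FAILURES:
    ratio, passed = growth_check(avg10, avg22)
    print(f"{ratio:.2f}x", "PASS" if passed else "FAIL")  # 6.31x, 3.68x, 8.52x, all FAIL
```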
      1cb6b148
  21. 26 Jul, 2021 1 commit
      Fix sudo/ydb306 failures and update outref for sudo/plugins and sudo/pluginsonly · d0da71ce
      Brad Westhafer authored
      This commit fixes a couple of failures for the sudo/ydb306 test:
      
      * The test was failing because the reference file was never updated after the YDB!1012 change that added "Now installing YDBZlib" to the output for the `--zlib` option.
      * On an in-house Arch machine, the test would sometimes fail with a `NONUTF8LOCALE` error because the locale was automatically set to `ANSI_X3.4-1968`. The fix is to do the `$switch_chset "UTF-8"` earlier which ensures that a UTF-8 locale is set. This is the same thing that is done in the sudo/plugins and sudo/pluginsonly tests to avoid `NONUTF8LOCALE` errors.
      
      This commit also updates the sudo/plugins and sudo/pluginsonly tests to reflect the change in YDB!1020 that suppresses the `Cloning into '.'...` message. This change was made because this message did not show up on an in-house RHEL7 machine causing these tests to fail.
      d0da71ce
  22. 20 Jul, 2021 2 commits
      Fix rare v60000_1/wc_blocked subtest hang due to %YDB-E-INSUNKNOWN error · d144821c
      Narayanan Iyer authored
      Background
      ----------
      * We noticed a hang in one run of the `v60000_1/wc_blocked` subtest in in-house testing.
        This happens very rarely (we saw a hang only in the 41st of a hundred test runs).
      
      * The test starts replication between `INST1` (source side) and `INST2` (receiver side) in line 362 below.
        Soon afterwards, at line 372, it crashes the receiver side (`INST2`). And a little later at line 407,
        after a rollback, restarts the receiver and expects replication to resume fine.
      
        ```sh
        v60000/u_inref/wc_blocked.csh
        -----------------------------
            361 # start both instances
        --> 362 $MSR START INST1 INST2 RP
            363
              .
              .
            371 # crash secondary and take the backup of crashed database
        --> 372 $MSR CRASH INST2
            373
            374 echo
            375
              .
              .
            406 # try to start the receiver again; this time should not get any error messages
        --> 407 $MSR STARTRCV INST1 INST2
            408
            409 echo
            410
            411 # stop both instances
        --> 412 $MSR STOP INST1 INST2
        ```
      
      * In the hung test run, the restart of the receiver server (at line 407 above) failed with the following error
        and the receiver server process terminated.
      
        ```
        %YDB-E-INSUNKNOWN, Supplementary Instance INSTANCE2 has no instance definition for non-Supplementary Instance INSTANCE1
        ```
      
      * And because of this, the `$MSR STOP INST1 INST2` (at line 412 above) hung indefinitely waiting for the backlog
        to be cleared (which it never will because the receiver server is no longer up and running).
      
      * The `INSUNKNOWN` error is possible if the receiver side was crashed before the source side instance
        information was recorded in the receiver side instance file as part of the first start of replication.
      
      * See prior commit 356ed6e1 for more details on a similar issue encountered in
        another subtest and how to fix that.
      
      Fix
      ---
      * Like in the prior commit, the fix for this failure is to wait for the `New History Content` message to show up
        in the update process log before crashing the receiver side (in line 372 above). This ensures the instance
        information of the source side is recorded in the receiver side instance file and avoids `INSUNKNOWN` errors
        in later receiver side restarts.
      d144821c
      Fix rare v61000_1/gtm7858 subtest timing failure · 9741c17b
      Narayanan Iyer authored
      * The `v61000_1/gtm7858` subtest failed once in an in-house test run with the following diff.
      
        ```diff
        22a23,24
        > GETOPER-E-NOTFOUND : 2021-07-17 04:45:59 'Process [0-9]* was requested to resume processing' NOT found in syslog between 2021-07-17 04:40:52 and 2021-07-17 04:45:53 after waiting for 300 seconds.
        .
        .
        ```
      
      * This is a white-box test case that sets `gtm_white_box_test_case_number` env var to `99`. This
        corresponds to the `WBTEST_HOLD_GTMSOURCE_SRV_LATCH` macro in the `YDB` project.
      
        In this case, the source server has code to sleep indefinitely after grabbing the latch. It would
        then be waiting for a concurrent online rollback invocation to send it a `SIGCONT`. And that
        would break its otherwise indefinite sleep loop and it would then continue with its processing.
      
      * In the failed test run, the source server (which runs in the background after the `$MSR START INST1 INST2 RP`
        command returns) did not reach the point where it grabs the latch by the time the test script next invoked
        the online rollback. In this case, the online rollback finishes without sending a `SIGCONT` signal to the
        source server and so the source server when it reaches the code where it grabs the latch sleeps indefinitely
        waiting for a `SIGCONT` that never comes. This in turn causes the test to fail because the `getoper.csh` call
        does not see the expected `requested to resume` message in the syslog.
      
      * The fix is to log a message in the source server just before it is about to go to the sleep loop
        (changes in YDB!1017) and wait for that message to show up in the test script before we proceed
        to invoke the online rollback (changes in this commit as part of the `YDBTest` project).
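      The fix is a handshake: the sleeper announces that it holds the latch and is about to sleep, and the controller waits for that announcement before acting. The Python threading sketch below illustrates only the ordering and is not YDB or test-script code. `threading.Event` objects stand in for the new log message and for `SIGCONT`; the rollback "sends" the wake-up only if the latch is already held, matching the behavior described above; and a finite timeout replaces the indefinite sleep so the sketch terminates.

```python
import threading


def scenario(wait_for_message, timeout=1.0):
    """Return True if the sleeping 'source server' was woken by the 'rollback'."""
    latch_held = threading.Event()  # set when the server holds the latch and logs its message
    resume = threading.Event()      # stands in for SIGCONT
    woke = []

    def source_server():
        latch_held.set()                   # grab the latch and log "about to sleep"
        woke.append(resume.wait(timeout))  # sleep until woken (or time out)

    def online_rollback():
        if latch_held.is_set():  # SIGCONT is only sent if the latch is held
            resume.set()

    worker = threading.Thread(target=source_server)
    if wait_for_message:
        worker.start()
        latch_held.wait(timeout)  # the fix: wait for the log message first
        online_rollback()
    else:
        online_rollback()         # worst case of the race: rollback runs first
        worker.start()
    worker.join()
    return woke[0]
```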
      9741c17b
  23. 19 Jul, 2021 4 commits
      [YDB#705] New sudo/pluginsonly test for installing plugins with --plugins-only · de63d104
      Brad Westhafer authored
      This commit adds a sudo/pluginsonly test that tests that installing plugins with the ydbinstall script option `--plugins-only` works correctly for YDBAIM, YDBCrypt, YDBPosix and YDBZlib with and without UTF-8.
      de63d104
      Fix sudo/plugins failures by renaming $utf8 variable to $utf · 15bab2b4
      Brad Westhafer authored
      The sudo/plugins test failed occasionally with the wget error `Error during wget of YottaDB distribution file https://gitlab.com/api/v4/projects/7957109/repository/tags/`. The failure was caused by the test using a variable name ($utf8) that is sometimes set in `com/set_gtmroutines.csh`.
      
      When `com/set_gtmroutines.csh` set $utf8 to `/utf8`, this overrode the test's own $utf8 variable, which was set to `--utf8 default`. That confused the ydbinstall script into thinking that the test should install a version of YottaDB called `utf8`. When it attempted to wget this version from gitlab, the wget command failed because no `utf8` tag exists on gitlab in the YDB repo. The fix is to rename the variable in the sudo/plugins test from $utf8 to $utf to eliminate the conflict with the test system's $utf8 variable.
      15bab2b4
      Fix spurious timing/gtm9115 failures by tolerating up to 10% slowdown for 16 digit %DO · 72f293fd
      Brad Westhafer authored
      This commit addresses a couple of spurious timing/gtm9115 test failures by tolerating up to a 10 percent slowdown for 16 digit %DO, just like we already do for 14 digit %HO and 16 digit %OH. We've seen 2 such failures in 3 months, which is even less frequent than the 14 digit %HO and 16 digit %OH failures were before !1149.
      72f293fd
      Fix v62000_0/gtm7926rcvr subtest failure due to malformed prior version choice · 6bde689f
      Narayanan Iyer authored
      * In-house testing of the `v62000_0/gtm7926rcvr` subtest failed once with the following diff.
      
        ```sh
        $ cat gtm7926rcvr.diff
        0a1
        > if: Badly formed number.
      
        $ cat priorver.txt
        V63009_R131C
        ```
      
      * In this case, the random prior version that was chosen had a `_R131C` suffix. This caused
        the `ydbrel` variable in `com/ydb_prior_ver_check.csh` to end up with the value `131C`, which
        is not the integer that the `if ($ydbrel < 124)` check expects, causing the
        `if: Badly formed number` error.
      
        This is fixed by taking only the numeric portion (first 3 digits past the `_R`) and setting
        that value to the `ydbrel` variable.
      
      * In addition, `com/random_ver.csh` (which first picks the random prior version) has been enhanced
        to only pick versions that have an `_Rxxx` suffix where `xxx` is an even number (an even number
        corresponds to a production build of the prior version, whereas odd numbers correspond to development
        interim builds, which cannot always be safely chosen for testing).
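      A minimal Python sketch of both changes, using the `V63009_R131C` example from the diff. The real logic lives in the csh scripts `com/ydb_prior_ver_check.csh` and `com/random_ver.csh`, which are not shown here. The regular expressions, the reading of "suffix" as "at the end of the version name", and the function names are assumptions.

```python
import re

_RELEASE = re.compile(r"_R(\d{3})")           # first 3 digits after _R
_PRODUCTION = re.compile(r"_R\d{2}[02468]$")  # ends in _Rxxx with an even xxx


def ydbrel(version):
    """Numeric YottaDB release after _R, ignoring trailing letters: V63009_R131C -> 131."""
    match = _RELEASE.search(version)
    return int(match.group(1)) if match else None


def is_production(version):
    """True only for versions ending in _Rxxx where xxx is even (production builds)."""
    return _PRODUCTION.search(version) is not None


def pick_candidates(versions):
    """Filter prior versions down to those considered safe to pick at random."""
    return [v for v in versions if is_production(v)]
```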
      6bde689f
  24. 16 Jul, 2021 5 commits
      Fix rare io_0/zshow_principal subtest failure on a slow/loaded system · 2d2f686b
      Narayanan Iyer authored
      * We had one rare failure of the `io_0/zshow_principal` subtest on an ARMV6L system with the following diff.
      
        ```diff
        --- zshow_principal/zshow_principal.diff ---
        28,61d27
        < ZSHOW "D" with $PRINCIPAL output redirected to a file - zshowprin1.out
        < /dev/pts/* OPEN TERMINAL NOPAST NOESCA NOREADS TYPE WIDTH=* LENG=*
        < 0-out OPEN RMS STREAM
        < A("D",1)="/dev/pts/* OPEN TERMINAL NOPAST NOESCA NOREADS TYPE WIDTH=* LENG=* "
        < A("D",2)="0-out OPEN RMS STREAM "
        < ZSHOW "D" with $PRINCIPAL input redirected by |
        < 0 OPEN FIFO STREAM
        < 0-out /dev/pts/* OPEN TERMINAL NOCENE NOPAST NOESCA NOREADS TYPE NOWRAP WIDTH=* LENG=*
        < TESTING PIPE1
        < A("D",1)="0 OPEN FIFO STREAM "
        < A("D",2)="0-out /dev/pts/* OPEN TERMINAL NOCENE NOPAST NOESCA NOREADS TYPE NOWRAP WIDTH=* LENG=* "
        < ZSHOW "D" with $PRINCIPAL output redirected by |
        < /dev/pts/* OPEN TERMINAL NOPAST NOESCA NOREADS TYPE WIDTH=* LENG=*
        < 0-out OPEN FIFO STREAM
        < A(...
        ```
    • Disable timing/largelvarray subtest on AARCH64 Ubuntu (keep it enabled on AARCH64 Debian) · 1b3dc271
      Narayanan Iyer authored
      * Recently we had 2 failures of the `timing_0/largelvarray` subtest in in-house testing.
      
      * In failure 1, the average was expected to increase by at most 2x, but it increased by about 5.68x.
      
        ```diff
        1c1,97
        < PASS from largelvarray : Average node creation time remained within limits from 2**10 to 2**22 nodes
        ---
        > TEST-E-FAIL : Average(10) = 1.537109375 but Average(22) = 8.72900605201721191 (more than 2x)
        > FAIL from largelvarray : Average node creation time increased more than 2x from 2**10 to 2**22 nodes
        ```
      
      * In failure 2, the average increased by about 5.53x (against a limit of 2x).
      
        ```diff
        1c1,97
        < PASS from largelvarray : Average node creation time remained within limits from 2**10 to 2**22 nodes
        ---
        > TEST-E-FAIL : Average(10) = .71484375 but Average(22) = 3.95405507087707519 (more than 2x)
        > FAIL from largelvarray : Average node creation time increased more than 2x from 2**10 to 2**22 nodes
        ```
      
      * Both failures were on an AARCH64 system running `Ubuntu 20.04` whereas an AARCH64 system running `Debian 11`
        (aka `Bullseye`) always passed this test.
      
      * It is not clear why only `Ubuntu 20.04` systems end up with this failure. Since `Debian 11` AARCH64 systems
        pass this subtest and continue to exercise it, the subtest is disabled only on Ubuntu AARCH64 systems.
      
      * If this subtest starts failing on Debian AARCH64 systems too, we will investigate it further.
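      The pass/fail arithmetic behind both diffs can be sketched in Python. This is only an illustration: the actual
      check is done by the `largelvarray` M routine, and the function name `check_growth` and its `limit` parameter
      are hypothetical. The sketch assumes the test divides the average node creation time at 2**22 nodes by the
      average at 2**10 nodes and fails when that ratio exceeds 2.

```python
from typing import Tuple


def check_growth(avg_small: float, avg_large: float,
                 limit: float = 2.0) -> Tuple[bool, float]:
    """Return (passed, ratio) where ratio = avg_large / avg_small.

    The check passes when the average grew by at most `limit` times.
    """
    ratio = avg_large / avg_small
    return ratio <= limit, ratio


# Average(10) and Average(22) values copied from the two failure diffs.
FAILURES = [
    (1.537109375, 8.72900605201721191),
    (0.71484375, 3.95405507087707519),
]

for avg10, avg22 in FAILURES:
    passed, ratio = check_growth(avg10, avg22)
    print(f"ratio = {ratio:.2f} -> {'PASS' if passed else 'FAIL'}")
```

      Both ratios (about 5.68 and 5.53) are well above the 2x limit, which is why both runs reported `TEST-E-FAIL`.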
    • Enhance tptime/tptime subtest to help debug rare test failure in unwztrap entryref · 7c8d1840
      Narayanan Iyer authored
      * The `tptime_0/tptime` subtest once had a rare failure with the following diff.
      
        ```diff
        --- tptime/tptime.diff ---
        141c141
        <  Subtest unwztrap completed successfully (got interrupt)
        ---
        >  Subtest unwztrap failed - duration was 14 seconds - min: 15  max: 30
        ```
      
      * We do a sequence of 15 iterations of `HANG 1` and expect the total elapsed time to be at least 15 seconds
        and at most 30 seconds (the upper bound allows for system load). But the measured elapsed time was 14 seconds.
      
      * This means that `HANG 1` slept for less than `1 second` in at least one iteration. That should not happen,
        as `HANG` is supposed to guarantee that `at least` the specified time has elapsed before it returns.
      
      * Another possible explanation for the failure is `ntpd` adjusting the system time backwards.
      
      * Towards better determining exactly what happened, the test M program has been enhanced to record each M line
        ...
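      The duration check and the `ntpd` concern can be illustrated with a small Python sketch. The names
      `timed_hangs` and `duration_ok` are hypothetical, `time.sleep` only stands in for `HANG`, and this commit does
      not say which clock the M test uses to measure elapsed time. The sketch measures with `time.monotonic()`, which
      Python documents as a clock that cannot go backwards and is not affected by system clock updates, so a backward
      step of the wall clock by `ntpd` could not shorten the measured duration the way it could for a time-of-day
      measurement.

```python
import time


def timed_hangs(iterations: int = 15, hang_seconds: float = 1.0) -> float:
    """Sleep `iterations` times and return the total elapsed time in seconds.

    time.monotonic() cannot go backwards, so a wall-clock adjustment (for
    example by ntpd) cannot make the measured duration shorter than it was.
    """
    start = time.monotonic()
    for _ in range(iterations):
        time.sleep(hang_seconds)  # stands in for M's HANG 1
    return time.monotonic() - start


def duration_ok(elapsed: float, min_seconds: float = 15.0,
                max_seconds: float = 30.0) -> bool:
    """Mirror the subtest's bounds: at least min_seconds, at most max_seconds."""
    return min_seconds <= elapsed <= max_seconds
```

      For example, `duration_ok(14)` returns `False`, matching the reported failure, while
      `duration_ok(timed_hangs())` takes about 15 seconds to run and should return `True` on a lightly loaded system.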
    • Fix rare msreplic_F_1/errors subtest failure due to a test timing issue · 86c57a2c
      Narayanan Iyer authored
      * This is an interesting failure. It is a multi-host test.
      
      * Pasted below are the parts of the diff file that contain the relevant primary and secondary errors.
        The primary error is the Assert failure.
      
        ```diff
        $ cat errors.diff
        276a277
        > TEST-E-MULTISITE replic action failed.Pls. check msr_execute_##FILTERED##NO.csh and all logs related to ##FILTERED## ##TIMESTAMP##
        .
        > %YDB-E-NOJNLPOOL, No journal pool info found in the replication instance of msreplic_F_1/errors/instance2/mumps.repl
        .
        > %YDB-E-UPDSYNC2MTINS, Can only UPDATERESYNC with an empty instance file
        .
        > %YDB-E-NORECVPOOL, No receiver pool info found in the replication instance of msreplic_F_1/errors/instance3/mumps.repl
        .
        > remotehost:msreplic_F_1/errors/instance3/inst_create.out
        > %YDB-F-ASSERT, Assert failed in sr_unix/repl_inst_create.c line 133 for expression (FALSE || WBTEST_ENABLED(WBTEST_REPLINSTSTNDALN))
        .
        ```
      
      * `Line 372` below is the corre...
    • [YottaDB/DBMS/YDBOcto#716] Update `copyright.py` to refer to the current year dynamically · 15285b59
      Ashok Bhaskar authored
      * Updates `tools/ci/copyright.py` to determine the current year programmatically from the local time instead
        of using a hard-coded value that had to be updated manually each year.
      
      * Tracked by YottaDB/DBMS/YDBOcto#716
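      A minimal sketch of the idea, assuming nothing about the actual contents of `tools/ci/copyright.py`: the year
      is read from the local time at run time, so no edit is needed each January. The helper names `current_year`
      and `notice_is_current`, and the `Copyright (c) <year>` / `Copyright (c) <start>-<year>` notice format, are
      hypothetical.

```python
import re
import time


def current_year() -> str:
    """Current year according to local time, as a 4-digit string (e.g. "2021")."""
    return time.strftime("%Y", time.localtime())


def notice_is_current(line: str) -> bool:
    """True if a copyright notice ends its year or year range with the current year.

    This replaces a comparison against a hard-coded constant such as YEAR = "2021".
    """
    pattern = r"Copyright \(c\) (?:\d{4}-)?" + current_year() + r"\b"
    return re.search(pattern, line) is not None
```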
  25. 15 Jul, 2021 1 commit
  26. 14 Jul, 2021 1 commit