This project is mirrored from https://*****@github.com/tarantool/tarantool.git.
  1. 29 Nov, 2020 2 commits
    • fiber: handle fiber cancellation for fiber.cond · c6a85142
      Sergey Ostanevich authored

      Before this patch, fiber.cond():wait() simply returned in a cancelled
      fiber, whereas fiber.channel():get() threw a "fiber is canceled"
      error. This patch unifies the behaviour of channels and condvars.
      It also fixes a related net.box problem, #4834, since fiber.cond
      now checks for fiber cancellation.
      
      Closes #4834
      Closes #5013
      Co-authored-by: Oleg Babin <olegrok@tarantool.org>
      
      @TarantoolBot document
      Title: fiber.cond():wait() throws if fiber is cancelled
      
      Now fiber.cond():wait() throws an error if the waiting fiber is
      cancelled.
    • relay: preparation for fiber:cond to throw · 29e04361
      Sergey Ostanevich authored
      fiber_cond_wait() will set an error if the fiber is cancelled.
      As a result, the current diag of the fiber can be overwritten during
      wal_clear_watcher(). To prevent this, copying the diag from the
      relay into the current fiber is moved to the exit of
      relay_subscribe_f().
      
      Part of #5013
  2. 13 Nov, 2020 1 commit
  3. 12 Nov, 2020 1 commit
  4. 11 Nov, 2020 1 commit
  5. 10 Nov, 2020 9 commits
    • vclock: move to src/lib · b8d3c762
      Vladislav Shpilevoy authored
      Vclock is used in raft, which is going to be moved to src/lib.
      That means vclock should be moved there as well.
      
      It is easy, because vclock does not depend on anything in box/.
      
      Needed for #5303
    • raft: check box_raft is inited before usage · dcd8e66b
      Vladislav Shpilevoy authored
      Since box_raft is now initialized at runtime and is used from
      several subsystems (memtx for snapshots; applier for accepting
      rows; box.info for monitoring), it is easy to get the
      initialization order wrong and accidentally use the uninitialized
      global raft object.
      
      This patch adds a sanity check to ensure this does not happen. The
      raft state is set to 0 at program start. Any access to the global
      raft object then first checks that the state is not 0.
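
      A minimal C sketch of such a check follows. It assumes a reduced raft
      object whose state 0 means "not initialized". The names raft_model,
      box_raft_global_model, box_raft_init_model() and box_raft_model()
      are hypothetical stand-ins, not the actual Tarantool definitions.

      ```c
      #include <assert.h>

      /* Hypothetical reduced raft states; 0 is reserved for "not initialized". */
      enum raft_state_model {
          RAFT_STATE_MODEL_NONE = 0,
          RAFT_STATE_MODEL_FOLLOWER = 1,
          RAFT_STATE_MODEL_CANDIDATE = 2,
          RAFT_STATE_MODEL_LEADER = 3,
      };

      struct raft_model {
          enum raft_state_model state;
          unsigned long long term;
      };

      /* Static storage is zero-filled at program start, so state == 0 here. */
      static struct raft_model box_raft_global_model;

      static void
      box_raft_init_model(void)
      {
          box_raft_global_model.state = RAFT_STATE_MODEL_FOLLOWER;
          box_raft_global_model.term = 1;
      }

      /* Every access goes through this accessor, which catches use-before-init. */
      static struct raft_model *
      box_raft_model(void)
      {
          assert(box_raft_global_model.state != RAFT_STATE_MODEL_NONE);
          return &box_raft_global_model;
      }
      ```

      If a subsystem calls box_raft_model() before box_raft_init_model(),
      a debug build stops at the assertion instead of silently working with
      a zeroed object.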
      
      The initialization order will get trickier once raft stops using
      globals from replication and box, and is used by them more
      extensively.
      
      Part of #5303
    • raft: add explicit raft argument to all functions · a886aa06
      Vladislav Shpilevoy authored
      All raft functions worked with a global raft object. That would
      make it impossible to move raft to a separate module, where it
      could be properly unit-tested with multiple raft nodes in each test.
      
      The patch adds an explicit raft pointer argument to each raft
      function as a first part of moving raft to a separate library.
      
      The global object is renamed to box_raft_global so as to emphasize
      this is a global box object, not from the future raft library.
      
      Access to it should now go through the box_raft() function, which
      will get some sanity checks in the next commit.
      
      Part of #5303
    • fiber: introduce fiber.f_arg · b237799f
      Vladislav Shpilevoy authored
      Struct fiber has a member va_list f_data. It is used to forward
      arguments to the fiber function when fiber_start() is called,
      right from the caller's stack.
      
      But it is useless when a fiber is started asynchronously, with
      fiber_new() + fiber_wakeup(), and there is no way to pass anything
      into such a fiber.
      
      This patch adds a new member 'void *f_arg', which shares memory
      with va_list f_data, and can be used to pass something into the
      fiber.
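
      A minimal sketch of the idea follows. It assumes a heavily reduced
      fiber structure in which f_arg overlays f_data in an anonymous union
      (C11), so the new member costs no extra memory. The names fiber_model,
      raft_model, raft_worker_f_model() and start_async_model() are
      hypothetical. A direct call stands in for the scheduler running the
      woken-up fiber.

      ```c
      #include <assert.h>
      #include <stdarg.h>

      struct fiber_model;
      typedef int (*fiber_func_model)(struct fiber_model *self);

      /* Hypothetical reduced struct fiber with only the argument members. */
      struct fiber_model {
          union {
              /* Filled by fiber_start() from the caller's variadic arguments. */
              va_list f_data;
              /* Set by the creator before waking the fiber up asynchronously. */
              void *f_arg;
          };
          fiber_func_model f;
      };

      struct raft_model {
          unsigned long long term;
      };

      /* The worker reads its raft object from f_arg instead of a global. */
      static int
      raft_worker_f_model(struct fiber_model *self)
      {
          struct raft_model *raft = self->f_arg;
          return (int)raft->term;
      }

      /*
       * Models fiber_new() + setting f_arg + fiber_wakeup(). The direct call
       * stands in for the scheduler running the fiber later.
       */
      static int
      start_async_model(struct raft_model *raft)
      {
          struct fiber_model fiber;
          fiber.f = raft_worker_f_model;
          fiber.f_arg = raft;
          return fiber.f(&fiber);
      }
      ```

      The union works because the two members are never needed at the same
      time. f_data is only meaningful for fiber_start(), and f_arg only for
      asynchronously started fibers.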
      
      The feature is going to be used by raft. Currently the raft worker
      fiber works only with global variables, but soon it will need a
      pointer to its struct raft object. And it can't be started with
      fiber_start(), because the raft code does not yield anywhere in
      its state machine.
      
      Needed for #5303
    • raft: fix crash on candidate cfg during WAL write · 9a8688fa
      Vladislav Shpilevoy authored
      The Raft state machine crashed if it was configured to be a
      candidate during a WAL write with a known leader.

      It tried to start waiting for the leader's death, but should have
      waited for the WAL write to end first.
      
      The code tried to handle it, but the order of 'if' conditions was
      wrong. WAL write being in progress was checked last, but should
      have been checked first.
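
      A minimal sketch of the corrected ordering follows. It assumes a
      reduced state with just a WAL-write flag and a leader ID. The names
      raft_model, raft_next_step_model and raft_schedule_model() are
      hypothetical, and the real state machine has more states and
      transitions.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      enum raft_next_step_model {
          WAIT_WAL_WRITE_END,
          WAIT_LEADER_DEATH,
          START_ELECTION,
      };

      struct raft_model {
          /* A WAL write of the persistent raft state has not finished yet. */
          bool is_write_in_progress;
          /* ID of the known leader, 0 if there is none. */
          unsigned leader;
      };

      /* Decide what a node configured as a candidate should do next. */
      static enum raft_next_step_model
      raft_schedule_model(const struct raft_model *raft)
      {
          /* Checked first: nothing else may start before the write ends. */
          if (raft->is_write_in_progress)
              return WAIT_WAL_WRITE_END;
          if (raft->leader != 0)
              return WAIT_LEADER_DEATH;
          return START_ELECTION;
      }
      ```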
      
      Closes #5506
    • raft: fix crash on sm restart during WAL write · b4c4387d
      Vladislav Shpilevoy authored
      The Raft state machine crashed if it was restarted while a WAL
      write was in progress. When the machine was started, it did not
      expect that an unfinished WAL write could remain from the time it
      had been enabled earlier.

      The patch makes it continue waiting for the write to end.
      
      Part of #5506
    • raft: send state when state machine is started · 5293ee0d
      Vladislav Shpilevoy authored
      Raft didn't broadcast its state when the state machine was
      started. As a result, the state might never be sent until some
      other node generated a term number bigger than the local one.
      
      That happened when a node participated in some elections,
      accumulated a big term number, then the election was turned off,
      and a new replica was connected in a 'candidate' state. Then the
      first node was configured to be a 'voter'.
      
      The first node didn't send anything to the replica, because at
      the moment of its connection the election was off.
      
      So the replica started from term 1, tried to start elections in
      this term, but was ignored by the first node. It waited for
      election timeout, bumped the term to 2, and the process was
      repeated until the replica reached the first node's term + 1. This
      could take a very long time.
      
      The patch fixes it: now Raft broadcasts its state when it is
      enabled, to cover the replicas connected while it was disabled.
      
      Closes #5499
    • test: fix a typo in election_basic · 455c6d18
      Vladislav Shpilevoy authored
      The typo led to the election timeout not being reset to the
      default value. It was left at 1000, and as a result the subsequent
      election tests could run extremely long.
      
      Part of #5499
    • raft: fix crash in worker fiber · 03512e53
      Vladislav Shpilevoy authored
      The Raft worker fiber does all the heavy and yielding jobs. There
      are two of them: disk write and network broadcast. Disk write
      yields. Network broadcast is slow, so it happens at most once per
      event loop iteration.
      
      The worker on each iteration checks if any of these 2 jobs is
      active, and if not, it goes to sleep until an explicit wakeup.
      
      But there was a bug. Before going to sleep, it did a yield and then
      checked that there was nothing to do. However, new tasks could
      appear during the yield; the check then failed, leading to a crash.
      
      The patch reorganizes this part of the code so now the worker does
      not yield between checking new tasks and going to sleep.
      
      No test, because the bug is extremely hard to reproduce, and I
      don't want to clog this part of the code with error injections.
  6. 03 Nov, 2020 4 commits
    • core: fix static_alloc buffer overflow · 99d6c8a4
      Sergey Ostanevich authored and Vladislav Shpilevoy committed

      A static buffer overflow in the thread-local pool caused random
      failures on the OSX platform. It was caused by an incorrect use of
      the allocator result.
      
      Fixes #5312
      Co-authored-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
    • txn: warn "too long WAL" on write, not on commit · 58cc0822
      Vladislav Shpilevoy authored
      "Too long WAL write" is supposed to warn a user that either the
      disk write was too long, or the event loop is too slow, maybe due
      to certain fibers not doing yields often enough.
      
      It was printed by the code doing the transaction commit. As a
      result, for synchronous transactions the check also included the
      replication time, often overflowing a threshold and printing
      "too long WAL write" even when it had nothing to do with a WAL
      write or the event loop being too slow.
      
      The patch makes the warning be checked and printed right after the
      WAL write, not after commit.
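
      A minimal sketch of where the check now sits follows. It assumes
      timestamps in seconds and a fixed threshold. The names
      txn_timing_model and wal_write_is_too_long_model() are hypothetical.
      The point is only that the measured duration ends at the WAL write,
      so time a synchronous transaction spends waiting for replicas does
      not count.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      struct txn_timing_model {
          double start;        /* transaction submitted to the WAL */
          double wal_written;  /* WAL write finished */
          double committed;    /* commit finished (may include replication wait) */
      };

      /* Called right after the WAL write; `committed` is deliberately unused. */
      static bool
      wal_write_is_too_long_model(const struct txn_timing_model *t,
                                  double threshold)
      {
          return t->wal_written - t->start > threshold;
      }
      ```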
      
      Closes #5139
    • txn: split complete into success and fail paths · bd5b7166
      Vladislav Shpilevoy authored
      txn_complete used to handle all the transaction outcomes:
      - manual rollback;
      - error at WAL write;
      - successful WAL write and commit;
      - successful WAL write and wait for synchronization with replicas.
      
      The code became a mess after synchronous replication was
      introduced. This patch splits txn_complete's code into multiple
      pieces.
      
      Now WAL write success and fail are handled by
      txn_on_journal_write() exclusively. It also runs the WAL write
      triggers. It was very strange to call them from txn_complete().
      
      txn_on_journal_write() also checks if the transaction is
      synchronous, and if it is not, it completes it with
      txn_complete_success() whose code is simple now, and only works
      on committing the changes.
      
      In case of fail the transaction always ends up in
      txn_complete_fail().
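
      A minimal sketch of that dispatch follows. It assumes a reduced
      transaction with just a WAL result and a synchronous flag. All
      *_model names are hypothetical, and trigger execution and the limbo
      are only indicated in comments.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      enum txn_outcome_model {
          TXN_MODEL_PENDING,
          TXN_MODEL_COMMITTED,
          TXN_MODEL_ROLLED_BACK,
          TXN_MODEL_WAITING_SYNC,
      };

      struct txn_model {
          bool is_sync;  /* needs confirmation from replicas */
          int wal_rc;    /* WAL write result: 0 on success, negative on error */
          enum txn_outcome_model outcome;
      };

      static void
      txn_complete_fail_model(struct txn_model *txn)
      {
          txn->outcome = TXN_MODEL_ROLLED_BACK;
      }

      static void
      txn_complete_success_model(struct txn_model *txn)
      {
          txn->outcome = TXN_MODEL_COMMITTED;
      }

      /* Called by the journal when the write is done. */
      static void
      txn_on_journal_write_model(struct txn_model *txn)
      {
          /* WAL write triggers would run here. */
          if (txn->wal_rc != 0) {
              txn_complete_fail_model(txn);
              return;
          }
          if (txn->is_sync) {
              /* The limbo completes it later, on confirmation or rollback. */
              txn->outcome = TXN_MODEL_WAITING_SYNC;
              return;
          }
          txn_complete_success_model(txn);
      }
      ```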
      
      These success and fail functions are now used by the limbo as
      well. It turned out that all the places finishing a transaction
      always know whether they want to fail it or complete it successfully.
      
      This should also remove a few ifs from the hot code of transaction
      commit.
      
      The patch simplifies the code in order to fix the false warning
      about too long WAL write for synchronous transactions, which is
      currently printed not at WAL write, but at commit. These two events
      are far from each other for synchro requests.
      
      Part of #5139
    • txn: rename txn_complete_async to txn_on_journal_write · a512cb8a
      Vladislav Shpilevoy authored
      The function is called only by the journal when a write is finished.

      Besides, it does not necessarily complete the transaction: in case
      of synchronous replication a WAL write is not enough for
      completion. Hence it shouldn't have 'complete' in its name.
      
      Also the function is never used outside of txn.c, so it is removed
      from txn.h and is now static.

      The patch is a preparation for not spamming "too long WAL write" on
      synchronous transactions, because there the warning is simply
      misleading.
      
      Part of #5139
  7. 02 Nov, 2020 2 commits
  8. 01 Nov, 2020 7 commits
    • test: fix hanging of vinyl/gh.test.lua · 42c64d06
      Alexander V. Tikhonov authored and Kirill Yukhin committed
      The vinyl/gh.test.lua test was previously fixed in commit:

        94dc5bdd ('test: gh test hangs after gh-4957-too-many-upserts')

      by adding a fiber.sleep(1) workaround to avoid the error raised
      after the previously run vinyl/gh-4957-too-many-upserts.test.lua
      test. It can be fixed in a different way. On the one hand, the new
      change leaves the found issue untouched, so that it can be resolved
      within the open GitHub issue. On the other hand, it lets the
      test-run tool work around the issue with its fragile list feature,
      which keeps testing stable, since the issue is flaky and passes on
      reruns.

      This fix replaces the infinite waiting loop with the
      test_run:wait_cond() routine, created especially for such
      situations, which has a timeout so that the test does not hang
      until the global timeout occurs. This lets testing continue even
      after a failure.
      
      Needed for #5141
    • test: fix flaky vinyl/gh-4957-too-many-upserts · 5c09e52b
      Alexander V. Tikhonov authored and Kirill Yukhin committed
      Added a restart of the current server to resolve issue #5141, which
      is reproduced in the test:
      
        vinyl/gh-5141-invalid-vylog-file.test.lua
      
      Added a test-run filter on the box.snapshot error message:
      
        'Invalid VYLOG file: Slice [0-9]+ deleted but not registered'
      
      to avoid printing changing data in the results file, so that its
      checksum can be used in the test-run fragile list to rerun the test
      as flaky.
      
      Part of #5141
    • test: create reproducer for #5141 · 0e6a61ac
      Alexander V. Tikhonov authored and Kirill Yukhin committed
      Created a stable reproducer for issue #5141:
      
        box.snapshot()
        ---
       -- ok
       +- error: 'Invalid VYLOG file: Slice <NUM> deleted but not registered'
        ...
      
      which occurred flakily in the vinyl/ suite tests when run after
      the test:

        vinyl/gh-4957-too-many-upserts.test.lua

      as a new standalone test:

        vinyl/gh-5141-invalid-vylog-file.test.lua

      based on the test:

        vinyl/gh-4957-too-many-upserts.test.lua

      Since the issue does not reproduce on FreeBSD 12, the test is
      disabled there with:
      
        vinyl/gh-5141-invalid-vylog-file.skipcond
      
      Needed for #5141
    • test: add test filter for box.snapshot · 5ad18786
      Alexander V. Tikhonov authored and Kirill Yukhin committed
      Added a test-run filter on the box.snapshot error message:
      
        'Invalid VYLOG file: Slice [0-9]+ deleted but not registered'
      
      to avoid printing changing data in the results file, so that its
      checksum can be used in the test-run fragile list to rerun the test
      as flaky. Also added checksums to the fragile list for the following
      tests:
      
        vinyl/iterator.test.lua                       gh-5141
        vinyl/snapshot.test.lua                       gh-4984
      
      Needed for #5141
      Needed for #4984
    • gitlab-ci: test default gcc on CentOS 7 · 12453d51
      Alexander V. Tikhonov authored and Kirill Yukhin committed
      Sometimes it is convenient to use the default compiler on CentOS 7.
      Added a test job that builds with the default compilers:
      
        CC=/usr/bin/gcc
        CXX=/usr/bin/g++
      
      Closes #4941
    • build: add Werror flag on packages builds · 07360d03
      Alexander V. Tikhonov authored and Kirill Yukhin committed
      Added the ENABLE_WERROR flag to the build options to enable -Werror.
      
      Part of #4941
    • build: fix Werror warning in test/unit:fiber*.c* · 6b3ed663
      Alexander V. Tikhonov authored and Kirill Yukhin committed
      Building with gcc-4.8.5 on CentOS 7 revealed an issue:
      
      cd /source/build/usr/src/debug/tarantool-2.6.0.144/test/unit && /usr/bin/g++ ... -Wp,-D_FORTIFY_SOURCE=2 ... -O2 ... -O0 -o CMakeFiles/fiber.test.dir/fiber.cc.o -c /source/build/usr/src/debug/tarantool-2.6.0.144/test/unit/fiber.cc
      In file included from /usr/include/inttypes.h:25:0,
                       from /source/build/usr/src/debug/tarantool-2.6.0.144/src/lib/small/small/region.h:34,
                       from /source/build/usr/src/debug/tarantool-2.6.0.144/src/lib/core/memory.h:33,
                       from /source/build/usr/src/debug/tarantool-2.6.0.144/test/unit/fiber.cc:1:
      /usr/include/features.h:330:4: error: #warning _FORTIFY_SOURCE requires compiling with optimization (-O) [-Werror=cpp]
       # warning _FORTIFY_SOURCE requires compiling with optimization (-O)
      
      It happened because the _FORTIFY_SOURCE=2 flag requires optimization
      (e.g. -O1 or -O2), but the last optimization flag on the command
      line was -O0. To fix it, the unneeded '-O0' flag was removed from the
      test/unit/CMakeLists.txt file. This flag became unneeded after the
      commit:

        aa78a941 ("test/uint: fiber")

      in which the test was completely rewritten.
      
      Needed for #4941
  9. 30 Oct, 2020 5 commits
    • test: update test-run · 8a0d45f2
      Alexander Turenko authored
      Store *.reject files in ${BUILD}/test/var/rejects/<...>/ instead of
      ${SOURCE}/test/<...>/.
      
      The past approach led to problems with testing when an out-of-source
      build is used and the sources are on a read-only filesystem. The main
      problem arises when a test fails, but is marked as fragile and
      should be run again. A test failure implies storing the .reject
      file, so testing fails on the attempt to write to the read-only
      filesystem, and the re-run is not performed.
      
      Side effect for the in-source build: since the test/var/ directory is
      gitignored, the *.reject files will not be shown in `git status`
      output anymore.
      
      https://github.com/tarantool/test-run/pull/209
      
      Follows up #4874
    • box/memtx: support bitset indexes for binary fields · 14bf2fdd
      Mary Feofanova authored and Nikita Pettik committed
      Closes #5071
      
      @TarantoolBot document
      Title: memtx: varbinary supported in bitset indexes
      Now it is possible to create bitset indexes on the fields of varbinary type,
      e.g.: s:create_index('b', {type = 'bitset', parts = {1, "varbinary"}})
    • luajit: bump new version · d3b1c481
      Kirill Yukhin authored
      * test: fix warnings spotted by luacheck
    • box/lua: new way to define index parts · 42374a16
      Mary Feofanova authored and Nikita Pettik committed
      Previously accepted formats of index parts:
      parts = {field1, type1, field2, type2}, or
      parts = {{field1, type1, ...}, {field2, type2, ...}}
      
      Now the extra braces can be omitted if there is only one part:
      parts = {field1, type1, ...}
      
      Closes #2866
    • gitlab-ci: enhance jobs with jepsen tests · 4ab0ddcc
      Sergey Bronnikov authored and Kirill Yukhin committed
      To run Jepsen tests in different configurations, we need to
      parametrize the run script, so lein options and the number of nodes
      are passed via environment variables. By default, the script runs
      the tests against Tarantool built from the latest commit.
      
      Added these configurations:
      
      - single instance
      - single instance with enabled TXM
      - cluster with enabled Raft
      - cluster with enabled Raft and TXM
      
      Closes #5437
  10. 29 Oct, 2020 1 commit
  11. 23 Oct, 2020 1 commit
  12. 22 Oct, 2020 6 commits
    • memtx: fix a bug in unlinking story lists · 2699549f
      Aleksandr Lyapunov authored and Kirill Yukhin committed
      Tx stories must be linked into the correct doubly linked list.
      Preserve this.
      
      Part of #5423
    • memtx: fix a bug in TX that caused deletion of a dirty tuple · b610758b
      Aleksandr Lyapunov authored and Kirill Yukhin committed
      Tuple references in the TX history were a mess. Now they are
      reworked under the following assumptions:
       * a clean tuple belongs to a space, and the space implicitly holds
         a reference to the tuple;
       * a dirty tuple belongs to the TX manager, and a reference is held
         in the corresponding story.
      
      Closes #5423
    • raft: fix an assertion failure on transition to voter · d4de0ed1
      Serge Petrenko authored and Kirill Yukhin committed
      When an instance is configured as a candidate, it has a leader death
      timer ticking constantly to schedule an election as soon as the
      leader disappears.
      When the instance receives the leader's heartbeat, it resets the timer
      to its initial value.
      
      When being a voter, the instance ignores heartbeats, since it has
      nothing to wait for. So its timer must be stopped. Otherwise it'll try
      to schedule a new election and fail.
      
      Stop the timer on transition from candidate to voter.
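
      A minimal sketch of the fix follows. It reduces the timer to a
      boolean and its expiry to a function call. The names raft_model and
      the *_model() functions are hypothetical. The assertion marks the
      failure the patch prevents.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      struct raft_model {
          bool is_candidate;
          bool is_timer_active;  /* leader death timer */
          int elections_started;
      };

      static void
      raft_cfg_is_candidate_model(struct raft_model *raft, bool is_candidate)
      {
          raft->is_candidate = is_candidate;
          /*
           * A candidate waits for the leader to die; a voter waits for
           * nothing, so the timer is stopped on the transition to voter.
           */
          raft->is_timer_active = is_candidate;
      }

      /* Fired when the leader death timer expires. */
      static void
      raft_on_leader_death_timeout_model(struct raft_model *raft)
      {
          /* Only a candidate may schedule elections. */
          assert(raft->is_candidate);
          raft->elections_started++;
      }

      /* Models the event loop: the callback runs only while the timer ticks. */
      static void
      raft_tick_model(struct raft_model *raft)
      {
          if (raft->is_timer_active)
              raft_on_leader_death_timeout_model(raft);
      }
      ```

      If raft_cfg_is_candidate_model() left the timer running for a voter,
      the next tick would hit the assertion.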
    • raft: don't drop GC when restart relay recovery · e5009dc4
      Vladislav Shpilevoy authored and Kirill Yukhin committed
      When a node becomes a leader, it restarts relay recovery cursors
      to re-send all the data since the last acked row.
      
      But when the recovery was restarted, the relay lost the trigger
      that used to update the GC state in the TX thread.
      
      The patch preserves the trigger.
      
      Follow up for #5433
    • raft: use local LSN in relay recovery restart · a0a60102
      Vladislav Shpilevoy authored and Kirill Yukhin committed
      When a Raft node is elected as a leader, it should resend all its
      data to the followers from the last acked vclock, because while
      the node was not a leader, the other instances ignored all the
      changes from it.
      
      The resending is done via restart of the recovery cursor in the
      relay thread. When the cursor was restarted, it used the last
      acked vclock to find the needed xlog file. But it didn't set the
      local LSN component, which was 0 (replicas don't send it).
      
      When in reality the component was not zero, the recovery cursor
      still tried to find the oldest xlog file containing the first local
      row, and couldn't: the first local row may be long gone.
      
      The patch makes the restart keep the local LSN component
      unchanged, as it was used by the previous recovery cursor, before
      the restart.
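
      A minimal sketch of how the restart vclock could be built follows.
      It assumes a plain array of LSNs where component 0 counts local rows,
      the ones replicas never acknowledge. The names vclock_model and
      relay_restart_vclock_model() are hypothetical, not Tarantool's
      vclock API.

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <string.h>

      #define VCLOCK_MAX_MODEL 32

      struct vclock_model {
          int64_t lsn[VCLOCK_MAX_MODEL];
      };

      /*
       * Build the position for the restarted recovery cursor: take the acked
       * vclock, whose component 0 is always 0 because replicas do not send
       * it, and keep component 0 from the previous cursor instead.
       */
      static void
      relay_restart_vclock_model(struct vclock_model *out,
                                 const struct vclock_model *acked,
                                 const struct vclock_model *prev_cursor)
      {
          memcpy(out, acked, sizeof(*out));
          out->lsn[0] = prev_cursor->lsn[0];
      }
      ```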
      
      Closes #5433
    • test: added new checksums for flaky tests · c596d313
      Alexander V. Tikhonov authored and Kirill Yukhin committed
        box/net.box_incorrect_iterator_gh-841.test.lua  gh-5434
        replication/election_basic.test.lua             gh-5368
        replication/election_qsync.test.lua             gh-5430
        replication/election_qsync_stress.test.lua      gh-5395
        replication/gh-4402-info-errno.test.lua         gh-5366
        replication/gh-5426-election-on-off.test.lua    gh-5433
        wal_off/snapshot_stress.test.lua                gh-5431