1. 14 Jul, 2020 2 commits
  2. 11 Jul, 2020 1 commit
  3. 09 Jul, 2020 4 commits
  4. 07 Jul, 2020 2 commits
  5. 01 Jul, 2020 3 commits
  6. 30 Jun, 2020 3 commits
  7. 29 Jun, 2020 1 commit
  8. 27 Jun, 2020 2 commits
  9. 24 Jun, 2020 1 commit
    • Fix packetmath_1 float tests for arm/aarch64. · 7222f0b6
      Antonio Sánchez authored
      Added the missing `pmadd<Packet2f>` for NEON. This gives a significant
      improvement in precision over the previous `pmul+padd`, which was causing
      the `pcos` tests to fail. Also added an approximate comparison against
      `std::sin`/`std::cos`, since otherwise any result satisfying
      `a^2+b^2=1` would pass.
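
      A minimal sketch of why a fused multiply-add is more precise than a
      separate multiply and add: the fused form rounds once, while the separate
      form rounds the product before adding. Here `std::fma` stands in for a
      vector `pmadd`, and the helper names are hypothetical; this is not
      Eigen's NEON code.

      ```cpp
      #include <cmath>

      // Hypothetical helper: a*b + c computed with a single rounding.
      float fused_madd(float a, float b, float c) {
        return std::fma(a, b, c);
      }

      // Hypothetical helper: a*b is rounded to float first, then added to c.
      float separate_madd(float a, float b, float c) {
        // volatile stops the compiler from contracting a*b + c into an FMA.
        volatile float product = a * b;
        return product + c;
      }
      ```

      With `a = 1 + 2^-13`, `b = 1 - 2^-13` and `c = -1`, the exact result is
      `-2^-26`. `fused_madd` returns exactly that, while `separate_madd` returns
      `0`, because `a*b = 1 - 2^-26` first rounds to `1.0f`.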
      
      Modified the `log(denorm)` tests. Denormals are not supported on all
      systems (where unsupported, `denorm_min()` returns `::min`), are always
      flushed to zero on 32-bit arm, and can be configured to flush to zero on
      sse/avx/aarch64. This leads to inconsistent results across systems
      (e.g. `-inf` vs `nan`). Added a check for denormal support and excluded
      ARM.
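
      A sketch of a run-time denormal check, assuming a test wants to skip a
      `log(denorm)` expectation when subnormals are unavailable or flushed.
      `denormals_usable` is a hypothetical helper, not Eigen's test code. A
      compile-time trait alone is not enough, because flush-to-zero can be
      switched on by compiler flags or FPU state.

      ```cpp
      #include <limits>

      // Hypothetical helper: true if subnormal floats exist for the type and
      // actually survive arithmetic at run time.
      bool denormals_usable() {
        // Without subnormal support, denorm_min() falls back to min().
        if (!(std::numeric_limits<float>::denorm_min() <
              std::numeric_limits<float>::min())) {
          return false;
        }
        // Halving the smallest normal float yields a subnormal unless flushed.
        volatile float smallest_normal = std::numeric_limits<float>::min();
        volatile float half = smallest_normal / 2.0f;
        return half != 0.0f;
      }
      ```

      A test could then guard its denormal expectation with
      `if (denormals_usable())` instead of hard-coding a per-platform result.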
      
      Removed the logistic exactness test, since the scalar and vectorized
      versions follow different code paths due to differences in `pexp` and
      `pmadd`, which results in slightly different values. For example, the
      exactness test always fails on arm, aarch64, and altivec.
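
      A sketch of why exact equality between two implementations is fragile:
      two algebraically equal logistic formulas round differently, much as two
      code paths built on different exp and multiply-add primitives would. The
      helper names and the `1e-5` tolerance are illustrative assumptions, not
      Eigen's code.

      ```cpp
      #include <algorithm>
      #include <cmath>

      // logistic(x) = 1 / (1 + e^-x)
      float logistic_recip(float x) { return 1.0f / (1.0f + std::exp(-x)); }

      // Same function as e^x / (1 + e^x). std::exp(x) overflows to inf for
      // x above roughly 88, so this form is only used on moderate inputs here.
      float logistic_ratio(float x) {
        float e = std::exp(x);
        return e / (1.0f + e);
      }

      // Relative comparison: accepts values that agree to within rel_tol.
      bool approx_equal(float a, float b, float rel_tol) {
        return std::fabs(a - b) <= rel_tol * std::max(std::fabs(a), std::fabs(b));
      }
      ```

      Comparing the two forms with `approx_equal` rather than `==` keeps the
      test meaningful without depending on which code path ran.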
  10. 23 Jun, 2020 1 commit
  11. 22 Jun, 2020 1 commit
    • Add missing Packet2l/Packet2ul ops for NEON. · ff4e7a08
      Antonio Sánchez authored
      The multiply (`pmul`) and comparison operators (`pcmp_lt`, `pcmp_le`,
      `pcmp_eq`) are currently missing for the `Packet2l` and `Packet2ul`
      packets. This leads to compile errors in the `packetmath.cpp` tests
      with clang. Here we add and test the missing ops.
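
      A portable sketch of what these ops compute on a two-lane signed 64-bit
      packet. `Packet2lSketch` and the `*_sketch` functions are hypothetical
      stand-ins, not Eigen's NEON implementation; an unsigned `Packet2ul`
      version would be analogous with `uint64_t` lanes.

      ```cpp
      #include <array>
      #include <cstddef>
      #include <cstdint>

      // Hypothetical stand-in for a two-lane int64 packet such as Packet2l.
      using Packet2lSketch = std::array<std::int64_t, 2>;

      // Lane-wise multiply. NEON has no 64-bit integer vector multiply, so a
      // lane-by-lane product is a common fallback. Overflow is undefined
      // behavior here, as for any signed int64_t multiply.
      Packet2lSketch pmul_sketch(const Packet2lSketch& a, const Packet2lSketch& b) {
        return Packet2lSketch{{a[0] * b[0], a[1] * b[1]}};
      }

      // SIMD comparisons produce a mask per lane: all bits set (-1) when the
      // predicate holds, 0 otherwise.
      template <typename Pred>
      Packet2lSketch compare_lanes(const Packet2lSketch& a, const Packet2lSketch& b, Pred pred) {
        Packet2lSketch mask{};
        for (std::size_t i = 0; i < mask.size(); ++i) mask[i] = pred(a[i], b[i]) ? -1 : 0;
        return mask;
      }

      Packet2lSketch pcmp_lt_sketch(const Packet2lSketch& a, const Packet2lSketch& b) {
        return compare_lanes(a, b, [](std::int64_t x, std::int64_t y) { return x < y; });
      }

      Packet2lSketch pcmp_le_sketch(const Packet2lSketch& a, const Packet2lSketch& b) {
        return compare_lanes(a, b, [](std::int64_t x, std::int64_t y) { return x <= y; });
      }

      Packet2lSketch pcmp_eq_sketch(const Packet2lSketch& a, const Packet2lSketch& b) {
        return compare_lanes(a, b, [](std::int64_t x, std::int64_t y) { return x == y; });
      }
      ```

      The all-ones mask convention lets later code select lanes with bitwise
      AND/OR instead of branching.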
      
      Tested:
      ```
      $ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
      $ adb push packetmath /data/local/tmp/
      $ adb shell "/data/local/tmp/packetmath"
      
      $ arm-linux-gnueabihf-g++ -mfpu=neon -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
      $ adb push packetmath /data/local/tmp/
      $ adb shell "/data/local/tmp/packetmath"
      
      $ clang++ -target aarch64-linux-android21 -static -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
      $ adb push packetmath /data/local/tmp/
      $ adb shell "/data/local/tmp/packetmath"
      
      $ clang++ -target armv7-linux-android21 -static -mfpu=neon -I./ '-DEIGEN_TEST_PART_9=1' '-DEIGEN_TEST_PART_10=1' test/packetmath.cpp -o packetmath
      $ adb push packetmath /data/local/tmp/
      $ adb shell "/data/local/tmp/packetmath"
      ```
  12. 21 Jun, 2020 1 commit
    • Added missing NEON pcasts, update packetmath tests. · 03ebdf6a
      Antonio Sánchez authored
      The NEON `pcast` operators are now all implemented and tested for the
      existing packets. This required adding an eight-argument
      `pcast(a,b,c,d,e,f,g,h)` to `GenericPacketMath.h` for casting between
      `int64_t` and `int8_t`.
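
      A sketch of why an 8:1 cast needs eight arguments: a 128-bit register
      holds 2 `int64_t` lanes or 16 `int8_t` lanes, so filling one `int8_t`
      packet takes eight `int64_t` packets. The types and `pcast_8to1_sketch`
      are hypothetical stand-ins, and truncating to the low 8 bits is this
      sketch's choice, not a statement about Eigen's `pcast`.

      ```cpp
      #include <array>
      #include <cstddef>
      #include <cstdint>

      // Hypothetical stand-ins for a 2 x int64 packet and a 16 x int8 packet.
      using Packet2lSketch = std::array<std::int64_t, 2>;
      using Packet16cSketch = std::array<std::int8_t, 16>;

      // 8:1 narrowing cast: lane j of input k lands in output lane 2*k + j.
      // static_cast keeps the low 8 bits (modular wrap, guaranteed since C++20).
      Packet16cSketch pcast_8to1_sketch(const Packet2lSketch& a, const Packet2lSketch& b,
                                        const Packet2lSketch& c, const Packet2lSketch& d,
                                        const Packet2lSketch& e, const Packet2lSketch& f,
                                        const Packet2lSketch& g, const Packet2lSketch& h) {
        const Packet2lSketch* inputs[8] = {&a, &b, &c, &d, &e, &f, &g, &h};
        Packet16cSketch out{};
        for (std::size_t k = 0; k < 8; ++k) {
          for (std::size_t j = 0; j < 2; ++j) {
            out[2 * k + j] = static_cast<std::int8_t>((*inputs[k])[j]);
          }
        }
        return out;
      }
      ```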
      
      Removed the incorrect `HasHalfPacket` definition for NEON's
      `Packet2l`/`Packet2ul`.
      
      Adjustments were also made to the `packetmath` tests. These include:
      - minor bug fixes for the cast tests (e.g. 4:1 casts, casting only
        packets that are vectorizable)
      - added 8:1 cast tests
      - reworked random number generation
        - the original produced uninteresting 0-to-0 casts for many casts
          between floating-point and integer types, and exhibited signed
          overflow undefined behavior
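
      A sketch of the kind of random input a float-to-integer cast test
      needs, assuming the goal is values that are neither trivially zero nor
      out of range (converting an out-of-range float to an integer is
      undefined behavior). `random_castable_float` is a hypothetical helper,
      not Eigen's test code, and covers only the float-to-integer direction.

      ```cpp
      #include <cstdint>
      #include <limits>
      #include <random>

      // Hypothetical test helper: draw a float spread over the whole range of
      // the small integer type To, so the cast is defined and rarely 0 -> 0.
      template <typename To>
      float random_castable_float(std::mt19937& rng) {
        // Limited to 8- and 16-bit targets: their limits are exactly
        // representable as float, so even the upper bound converts safely.
        static_assert(sizeof(To) <= 2, "sketch only handles 8/16-bit targets");
        std::uniform_real_distribution<float> dist(
            static_cast<float>(std::numeric_limits<To>::lowest()),
            static_cast<float>(std::numeric_limits<To>::max()));
        return dist(rng);
      }
      ```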
      
      Tested:
      ```
      $ aarch64-linux-gnu-g++ -static -I./ '-DEIGEN_TEST_PART_ALL=1' test/packetmath.cpp -o packetmath
      $ adb push packetmath /data/local/tmp/
      $ adb shell "/data/local/tmp/packetmath"
      ```
  13. 20 Jun, 2020 1 commit
  14. 18 Jun, 2020 1 commit
  15. 16 Jun, 2020 3 commits
    • Update `things you can do` message using cmake commands · cf7adf3a
      Nicolas Mellado authored
      Print `cmake` commands instead of `make` commands, since they should work for any generator.
    • Run two independent chains, when reducing tensors. · 231ce215
      Ilya Tokar authored
      Running two chains exposes more instruction-level parallelism by
      allowing both chains to execute at the same time.
      
      Results are a bit noisy, but for medium lengths we almost hit the
      theoretical upper bound of 2x.
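
      A scalar sketch of the idea, assuming a plain sum over a `std::vector`
      rather than Eigen's packet- and thread-based tensor reducer: with one
      accumulator every add waits on the previous one, while two accumulators
      form independent dependency chains the CPU can overlap. The function
      names are hypothetical.

      ```cpp
      #include <cstddef>
      #include <vector>

      // One accumulator: a single serial chain of dependent adds.
      float sum_one_chain(const std::vector<float>& v) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < v.size(); ++i) acc += v[i];
        return acc;
      }

      // Two accumulators: even and odd elements feed independent chains,
      // combined once at the end.
      float sum_two_chains(const std::vector<float>& v) {
        float acc0 = 0.0f, acc1 = 0.0f;
        std::size_t i = 0;
        for (; i + 1 < v.size(); i += 2) {
          acc0 += v[i];
          acc1 += v[i + 1];
        }
        if (i < v.size()) acc0 += v[i];  // leftover element for odd sizes
        return acc0 + acc1;
      }
      ```

      Because floating-point addition is not associative, the two versions can
      differ in the last bits, which is also why compilers do not make this
      transformation on their own unless reassociation is allowed (e.g. GCC's
      `-fassociative-math`).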
      
      ```
      name                                                   old time/op        new time/op     delta
      BM_fullReduction_16T/3        [using 16 threads]       17.3ns ±11%        17.4ns ± 9%        ~           (p=0.178 n=18+19)
      BM_fullReduction_16T/4        [using 16 threads]       17.6ns ±17%        17.0ns ±18%        ~           (p=0.835 n=20+19)
      BM_fullReduction_16T/7        [using 16 threads]       18.9ns ±12%        18.2ns ±10%        ~           (p=0.756 n=20+18)
      BM_fullReduction_16T/8        [using 16 threads]       19.8ns ±13%        19.4ns ±21%        ~           (p=0.512 n=20+20)
      BM_fullReduction_16T/10       [using 16 threads]       23.5ns ±15%        20.8ns ±24%     -11.37%        (p=0.000 n=20+19)
      BM_fullReduction_16T/15       [using 16 threads]       35.8ns ±21%        26.9ns ±17%     -24.76%        (p=0.000 n=20+19)
      BM_fullReduction_16T/16       [using 16 threads]       38.7ns ±22%        27.7ns ±18%     -28.40%        (p=0.000 n=20+19)
      BM_fullReduction_16T/31       [using 16 threads]        146ns ±17%          74ns ±11%     -49.05%        (p=0.000 n=20+18)
      BM_fullReduction_16T/32       [using 16 threads]        154ns ±19%          84ns ±30%     -45.79%        (p=0.000 n=20+19)
      BM_fullReduction_16T/64       [using 16 threads]        603ns ± 8%         308ns ±12%     -48.94%        (p=0.000 n=17+17)
      BM_fullReduction_16T/128      [using 16 threads]       2.44µs ±13%        1.22µs ± 1%     -50.29%        (p=0.000 n=17+17)
      BM_fullReduction_16T/256      [using 16 threads]       9.84µs ±14%        5.13µs ±30%     -47.82%        (p=0.000 n=19+19)
      BM_fullReduction_16T/512      [using 16 threads]       78.0µs ± 9%        56.1µs ±17%     -28.02%        (p=0.000 n=18+20)
      BM_fullReduction_16T/1k       [using 16 threads]        325µs ± 5%         263µs ± 4%     -19.00%        (p=0.000 n=20+16)
      BM_fullReduction_16T/2k       [using 16 threads]       1.09ms ± 3%        0.99ms ± 1%      -9.04%        (p=0.000 n=20+20)
      BM_fullReduction_16T/4k       [using 16 threads]       7.66ms ± 3%        7.57ms ± 3%      -1.24%        (p=0.017 n=20+20)
      BM_fullReduction_16T/10k      [using 16 threads]       65.3ms ± 4%        65.0ms ± 3%        ~           (p=0.718 n=20+20)
      ```
    • Pedro Caldeira authored
      a475bf14
  16. 14 Jun, 2020 1 commit
  17. 11 Jun, 2020 4 commits
  18. 09 Jun, 2020 1 commit
  19. 08 Jun, 2020 1 commit
  20. 05 Jun, 2020 2 commits
  21. 04 Jun, 2020 3 commits
  22. 03 Jun, 2020 1 commit