Aocl integration updated

Add AMD Optimizing CPU Libraries (AOCL) Integration Support

Summary

This MR adds optional support for the AMD Optimizing CPU Libraries (AOCL) to Eigen. It targets significant performance improvements for mathematical operations on AMD Zen processors through three components: vectorized math functions (VRDA), optimized BLAS operations (BLIS), and LAPACK routines (libFLAME).

Key Benefits

  • Significant speedup for element-wise math functions (exp, sin, cos, sqrt, log) on large vectors
  • Optimized matrix operations with automatic SIMD instruction selection
  • Zero breaking changes - existing Eigen code works unchanged
  • Optional integration via compile-time flags with graceful fallback

Technical Implementation

Integration Strategy

  • Uses Eigen's template specialization system for seamless operation dispatch
  • Threshold-based activation (vectors ≥128 elements) so that small vectors avoid library call overhead
  • Automatic fallback to standard Eigen when AOCL is unavailable
  • Optimized path for contiguous memory with compatible storage orders
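
The threshold and fallback rules above can be sketched as a small runtime dispatcher. This is only an illustration under stated assumptions: `kAoclThreshold`, `Path`, `batch_exp_stub`, and `exp_dispatch` are hypothetical names, the stub is a plain loop standing in for a vendor batch routine, and the MR itself performs the dispatch through Eigen template specializations rather than a function like this.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical constant mirroring the ">= 128 elements" rule above.
constexpr std::size_t kAoclThreshold = 128;

// Which code path handled a call; returned only to make the dispatch visible.
enum class Path { Scalar, Vectorized };

// Stand-in for a vendor batch routine. This is NOT the AOCL API; it is a
// plain loop so the sketch compiles and runs without AOCL installed.
inline void batch_exp_stub(std::size_t n, const double* in, double* out) {
  for (std::size_t i = 0; i < n; ++i) out[i] = std::exp(in[i]);
}

// Use the batch routine only when a backend is available and the vector is
// long enough to amortize the call overhead; otherwise fall back to the
// element-wise loop.
inline Path exp_dispatch(const std::vector<double>& in,
                         std::vector<double>& out, bool backend_available) {
  assert(in.size() == out.size());
  if (backend_available && in.size() >= kAoclThreshold) {
    batch_exp_stub(in.size(), in.data(), out.data());
    return Path::Vectorized;
  }
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = std::exp(in[i]);
  return Path::Scalar;
}
```

With this shape, a 64-element input always takes the scalar path, and a 256-element input takes the batch path only when the backend flag is set, which is the graceful-fallback behavior described in the list.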

AOCL Components

  1. Vector Math Library (VML): VRDA functions for double-precision transcendentals
  2. BLIS: Single- and multithreaded BLAS implementations optimized for AMD
  3. libFLAME: LAPACK operations with AMD hardware optimization

Files Added/Modified

New Files

  • Eigen/src/Core/AOCL_Support.h - Central configuration and header inclusion
  • Eigen/src/Core/Assign_AOCL.h - Template specialization dispatch layer
  • cmake/FindAOCL.cmake - CMake module for library detection
  • bench/benchmark_aocl.cpp - Performance validation benchmark suite
  • doc/UsingAOCL.dox - User documentation

Modified Files

  • CMakeLists.txt - AOCL benchmark target with MT/ST configuration
  • Eigen/Core - Integration of AOCL headers

Configuration & Usage

Build Configuration

# Single-threaded (maximum compatibility)
cmake .. -DEIGEN_BUILD_AOCL_BENCH=ON -DEIGEN_AOCL_BENCH_USE_MT=OFF
# Multithreaded (maximum performance)
cmake .. -DEIGEN_BUILD_AOCL_BENCH=ON -DEIGEN_AOCL_BENCH_USE_MT=ON

Preprocessor Options

#define EIGEN_USE_AOCL_ALL    // Single-threaded integration
#define EIGEN_USE_AOCL_MT     // Multithreaded integration
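
As a rough illustration of how these macros select a mode at compile time, the sketch below maps them to an enum. `AoclMode` and `aocl_mode` are hypothetical helpers and are not part of Eigen or this MR. Giving `EIGEN_USE_AOCL_MT` precedence when both macros are defined is an assumption. Following Eigen's convention for other backend macros such as `EIGEN_USE_MKL_ALL`, the chosen macro would presumably be defined before the first Eigen header is included.

```cpp
// Hypothetical helper, not part of Eigen or this MR: reports which
// integration mode the preprocessor options above would select.
enum class AoclMode { EigenDefault, SingleThreaded, Multithreaded };

constexpr AoclMode aocl_mode() {
#if defined(EIGEN_USE_AOCL_MT)
  // Assumption: the multithreaded macro wins if both are defined.
  return AoclMode::Multithreaded;
#elif defined(EIGEN_USE_AOCL_ALL)
  return AoclMode::SingleThreaded;
#else
  // Neither macro defined: standard Eigen code paths are used.
  return AoclMode::EigenDefault;
#endif
}
```

Compiled without either macro, `aocl_mode()` yields `AoclMode::EigenDefault`, which corresponds to the unchanged behavior for existing Eigen code.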

Architecture & Compatibility

AMD Processor Support

  • Zen/Zen2: AVX2 optimization
  • Zen3: Improved cache hierarchy and IPC
  • Zen4/Zen5: AVX-512 support with automatic detection
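
For readers unfamiliar with runtime SIMD detection, the following sketch shows the general technique using the GCC/Clang x86 builtin `__builtin_cpu_supports`. How AOCL performs its own detection is not described in this MR, so this is not a description of AOCL internals. `widest_simd` is a hypothetical helper name.

```cpp
#include <string>

// Illustrative runtime feature check. Returns the widest SIMD level found,
// or "unknown" on compilers or targets without the x86 builtin.
inline std::string widest_simd() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  __builtin_cpu_init();  // harmless if the CPU model was already initialized
  if (__builtin_cpu_supports("avx512f")) return "avx512";
  if (__builtin_cpu_supports("avx2")) return "avx2";
  return "baseline";
#else
  return "unknown";
#endif
}
```

On a Zen4 or Zen5 machine one would expect this to report `avx512`, and on Zen through Zen3 `avx2`, matching the list above.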

Platform Compatibility

  • Primary: Linux x86_64 with AMD processors
  • Compilers: GCC 13+, AOCC 5.0+
  • AOCL: Requires AOCL 5.0+ installation

Safety & Quality

Thread Safety

  • Single-threaded: Thread-safe, no OpenMP dependencies
  • Multithreaded: Requires OpenMP, uses thread-local operations

Backward Compatibility

  • Zero breaking changes to existing APIs
  • Optional opt-in integration
  • Graceful fallback when AOCL unavailable
  • Preserves all Eigen memory safety guarantees

Code Quality

  • MPL 2.0 licensed with comprehensive documentation
  • Clang-formatted to Eigen standards

Testing & Validation

Benchmark Suite

  • Vector math operations across different sizes
  • Matrix multiplication performance validation
  • Real-world scenarios (financial risk computation)
  • Automatic AOCL configuration detection
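
A size sweep of the kind listed above might look like the following minimal sketch. It is not taken from bench/benchmark_aocl.cpp: `time_exp_pass` and `sweep_sizes` are hypothetical names, the sizes are chosen only to straddle the 128-element threshold, and it times a plain `std::exp` loop rather than Eigen or AOCL calls.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Time one element-wise exp pass over n doubles; returns seconds.
inline double time_exp_pass(std::size_t n) {
  std::vector<double> in(n, 0.5), out(n);
  const auto t0 = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < n; ++i) out[i] = std::exp(in[i]);
  const auto t1 = std::chrono::steady_clock::now();
  volatile double sink = n ? out[n - 1] : 0.0;  // keep the result observable
  (void)sink;
  return std::chrono::duration<double>(t1 - t0).count();
}

// Sweep sizes on both sides of the 128-element activation threshold and
// collect (size, seconds) rows.
inline std::vector<std::pair<std::size_t, double>> sweep_sizes() {
  std::vector<std::pair<std::size_t, double>> rows;
  for (std::size_t n : {64, 128, 1024, 65536}) {
    rows.emplace_back(n, time_exp_pass(n));
  }
  return rows;
}
```

A real benchmark would repeat each measurement and report a median or minimum, since a single pass over a small vector is dominated by timer noise.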

Integration Testing

  • Numerical accuracy validation
  • Performance regression prevention
  • Compiler compatibility verification with Clang, AOCC, and GCC
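
Numerical accuracy validation typically compares an accelerated result against a reference result under a relative tolerance. The sketch below shows one way to compute that metric. `max_rel_error` is a hypothetical helper, and the tolerance this MR's tests actually use is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest relative difference between a reference vector and a candidate
// vector of equal length. The denominator is clamped to a tiny positive
// value so a zero reference entry does not cause a division by zero.
inline double max_rel_error(const std::vector<double>& ref,
                            const std::vector<double>& got) {
  assert(ref.size() == got.size());
  double worst = 0.0;
  for (std::size_t i = 0; i < ref.size(); ++i) {
    const double scale = std::max(std::abs(ref[i]), 1e-300);
    worst = std::max(worst, std::abs(ref[i] - got[i]) / scale);
  }
  return worst;
}
```

A test would then assert that `max_rel_error(reference, accelerated)` stays below a bound chosen for the function under test, for example a small multiple of machine epsilon for `exp` on moderate inputs.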
