AOCL integration updated
Add AMD Optimizing CPU Libraries (AOCL) Integration Support
Summary
This MR adds support for AMD Optimizing CPU Libraries (AOCL) to speed up mathematical operations on AMD Zen processors. It covers three components: vectorized math functions (VRDA), optimized BLAS operations (BLIS), and LAPACK routines (libFLAME).
Key Benefits
- Significant speedup for elementwise math functions (exp, sin, cos, sqrt, log) on large vectors
- Optimized matrix operations with automatic SIMD instruction selection
- Zero breaking changes - existing Eigen code works unchanged
- Optional integration via compile-time flags with graceful fallback
Technical Implementation
Integration Strategy
- Uses Eigen's template specialization system for seamless operation dispatch
- Threshold-based activation (vectors ≥128 elements) to avoid call overhead on small inputs
- Automatic fallback to standard Eigen when AOCL is unavailable
- Dispatch restricted to contiguous memory with compatible storage orders
AOCL Components
- Vector Math Library (VML): VRDA functions for double-precision transcendentals
- BLIS: Single/multithreaded BLAS implementations optimized for AMD
- libFLAME: LAPACK operations with AMD hardware optimization
Files Added/Modified
New Files
- Eigen/src/Core/AOCL_Support.h - Central configuration and header inclusion
- Eigen/src/Core/Assign_AOCL.h - Template specialization dispatch layer
- cmake/FindAOCL.cmake - CMake module for library detection
- bench/benchmark_aocl.cpp - Performance validation benchmark suite
- doc/UsingAOCL.dox - User documentation
Modified Files
- CMakeLists.txt - AOCL benchmark target with MT/ST configuration
- Eigen/Core - Integration of AOCL headers
Configuration & Usage
Build Configuration
# Single-threaded (maximum compatibility)
cmake .. -DEIGEN_BUILD_AOCL_BENCH=ON -DEIGEN_AOCL_BENCH_USE_MT=OFF
# Multithreaded (maximum performance)
cmake .. -DEIGEN_BUILD_AOCL_BENCH=ON -DEIGEN_AOCL_BENCH_USE_MT=ON
Preprocessor Options
#define EIGEN_USE_AOCL_ALL // Single-threaded integration
#define EIGEN_USE_AOCL_MT // Multithreaded integration
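As a rough illustration of how the two macros could select a code path at compile time, the sketch below reports which backend a translation unit would see. The enum `MathBackend` and the helpers `selected_backend` and `backend_name` are hypothetical and not part of this MR. Giving `EIGEN_USE_AOCL_MT` precedence when both macros are defined is an assumption. As with Eigen's other `EIGEN_USE_*` backend macros, the define is presumably placed before any Eigen header is included.

```cpp
#include <string>

// To opt in, define at most one of these before including any Eigen header:
// #define EIGEN_USE_AOCL_ALL  // single-threaded integration
// #define EIGEN_USE_AOCL_MT   // multithreaded integration

// Hypothetical enum used only for this illustration.
enum class MathBackend { PlainEigen, AoclSingleThreaded, AoclMultithreaded };

// Reports the backend selected by the macros visible at this point.
// Assumption: the multithreaded macro wins if both are defined.
constexpr MathBackend selected_backend() {
#if defined(EIGEN_USE_AOCL_MT)
  return MathBackend::AoclMultithreaded;
#elif defined(EIGEN_USE_AOCL_ALL)
  return MathBackend::AoclSingleThreaded;
#else
  return MathBackend::PlainEigen;
#endif
}

// Human-readable name, e.g. for printing in a benchmark header.
inline std::string backend_name(MathBackend backend) {
  switch (backend) {
    case MathBackend::AoclMultithreaded:
      return "AOCL (multithreaded)";
    case MathBackend::AoclSingleThreaded:
      return "AOCL (single-threaded)";
    default:
      return "plain Eigen";
  }
}
```

When neither macro is defined, `selected_backend()` yields `MathBackend::PlainEigen`, which matches the opt-in behaviour described in this MR.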
Architecture & Compatibility
AMD Processor Support
- Zen/Zen2: AVX2 optimization
- Zen3: Enhanced cache and IPC improvements
- Zen4/Zen5: AVX-512 support with automatic detection
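To check which of the SIMD tiers listed above a given host reports, something like the following sketch may help. It is not AOCL's own detection mechanism. It relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which are only available on x86 targets, and falls back to a baseline answer elsewhere. The names `SimdTier`, `detect_simd_tier`, and `tier_name` are hypothetical.

```cpp
#include <string>

// Hypothetical tiers matching the processor list above.
enum class SimdTier { Baseline, Avx2, Avx512 };

// Queries the host CPU at run time. On non-x86 targets or unsupported
// compilers the function conservatively reports Baseline.
inline SimdTier detect_simd_tier() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx512f")) return SimdTier::Avx512;
  if (__builtin_cpu_supports("avx2")) return SimdTier::Avx2;
#endif
  return SimdTier::Baseline;
}

// Human-readable label for logging.
inline std::string tier_name(SimdTier tier) {
  switch (tier) {
    case SimdTier::Avx512:
      return "AVX-512";
    case SimdTier::Avx2:
      return "AVX2";
    default:
      return "baseline";
  }
}
```

Going by the list above, a Zen4 or Zen5 host would be expected to report the AVX-512 tier and a Zen to Zen3 host the AVX2 tier.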
Platform Compatibility
- Primary: Linux x86_64 with AMD processors
- Compilers: GCC 13+, AOCC 5.0+
- AOCL: Requires AOCL 5.0+ installation
Safety & Quality
Thread Safety
- Single-threaded: Thread-safe, no OpenMP dependencies
- Multithreaded: Requires OpenMP, uses thread-local operations
Backward Compatibility
- Zero breaking changes to existing APIs
- Optional opt-in integration
- Graceful fallback when AOCL is unavailable
- Preserves all Eigen memory safety guarantees
Code Quality
- MPL 2.0 licensed with comprehensive documentation
- Clang-formatted to Eigen standards
Testing & Validation
Benchmark Suite
- Vector math operations across different sizes
- Matrix multiplication performance validation
- Real-world scenarios (financial risk computation)
- Automatic AOCL configuration detection
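A minimal timing harness in the spirit of the vector-math benchmarks might look like the sketch below. It is not the code in bench/benchmark_aocl.cpp: the helpers `best_time_seconds` and `time_elementwise_exp` are hypothetical, and the workload is a plain `std::exp` loop rather than an Eigen expression, so it builds without Eigen or AOCL.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Runs `fn` `repeats` times and returns the fastest wall-clock time in seconds.
// Returns +infinity if `repeats` is not positive.
template <typename Fn>
double best_time_seconds(Fn&& fn, int repeats) {
  using clock = std::chrono::steady_clock;
  double best = std::numeric_limits<double>::infinity();
  for (int r = 0; r < repeats; ++r) {
    const auto start = clock::now();
    fn();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    best = std::min(best, elapsed.count());
  }
  return best;
}

// Times an elementwise exp over `n` doubles; returns 0.0 for empty workloads.
inline double time_elementwise_exp(std::size_t n, int repeats) {
  if (n == 0 || repeats <= 0) return 0.0;
  std::vector<double> x(n, 0.5), y(n);
  volatile double sink = 0.0;  // keeps the optimizer from discarding the loop
  return best_time_seconds(
      [&] {
        for (std::size_t i = 0; i < n; ++i) y[i] = std::exp(x[i]);
        sink = y[n / 2];
      },
      repeats);
}
```

Calling `time_elementwise_exp` for sizes on both sides of the 128-element threshold, once with and once without the AOCL macros, would give a simple before/after comparison.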
Integration Testing
- Numerical accuracy validation
- Performance regression prevention
- Compiler compatibility verified with Clang, AOCC, and GCC
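The numerical accuracy check can be sketched as a comparison between a reference result and a candidate result from the accelerated path. The helpers `max_relative_error` and `within_tolerance` below are hypothetical, and any tolerance passed in is the caller's assumption, since this description does not state the accuracy bound used by the tests.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Largest relative error of `candidate` against `reference`, measured over
// the common prefix of the two vectors.
inline double max_relative_error(const std::vector<double>& reference,
                                 const std::vector<double>& candidate) {
  const std::size_t n = std::min(reference.size(), candidate.size());
  double worst = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    // Guard against division by zero for reference values at or near 0.
    const double denom =
        std::max(std::abs(reference[i]), std::numeric_limits<double>::min());
    worst = std::max(worst, std::abs(candidate[i] - reference[i]) / denom);
  }
  return worst;
}

// True when both vectors have equal length and agree within `rel_tol`.
inline bool within_tolerance(const std::vector<double>& reference,
                             const std::vector<double>& candidate,
                             double rel_tol) {
  return reference.size() == candidate.size() &&
         max_relative_error(reference, candidate) <= rel_tol;
}
```

In a test, `reference` could hold results computed with plain Eigen and `candidate` the results of the same expression built with one of the AOCL macros defined.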