GLAS | Optimize multi-core scanning for balanced execution time

Current Situation

The multi-core feature currently splits the list of rules used by the engine based on the number of available cores (e.g., for 5 cores, rules are split into 5 groups). However, the split follows only the order of the rules on disk and ignores how long each rule takes to run, so it does not balance the work across cores.

Problem

This naive distribution can lead to significant imbalances in scan duration across cores, as highlighted in customer feedback (see: #514156 (comment 2327291172)).
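To make the imbalance concrete, here is a minimal sketch of an order-based split. It assumes the current split produces contiguous chunks of the rule list (the issue only says the split follows disk order), and the rule IDs and timings are hypothetical numbers chosen for illustration, not measurements.

```python
from typing import Dict, List


def split_by_order(rule_ids: List[str], cores: int) -> List[List[str]]:
    """Split rules into `cores` contiguous chunks, preserving list order.

    Assumption: this mimics the current behaviour; the real engine may chunk differently.
    """
    base, extra = divmod(len(rule_ids), cores)
    groups, start = [], 0
    for i in range(cores):
        end = start + base + (1 if i < extra else 0)
        groups.append(rule_ids[start:end])
        start = end
    return groups


def group_totals(groups: List[List[str]], timings_ms: Dict[str, float]) -> List[float]:
    """Sum the execution time of the rules assigned to each core."""
    return [sum(timings_ms[rule] for rule in group) for group in groups]


# Hypothetical per-rule timings in milliseconds, listed in "disk order".
timings_ms = {
    "rule-a": 900, "rule-b": 800, "rule-c": 50,
    "rule-d": 40, "rule-e": 30, "rule-f": 20,
}
groups = split_by_order(list(timings_ms), cores=3)
totals = group_totals(groups, timings_ms)
print(totals)  # [1700, 90, 50]
```

With these numbers the two slow rules land on the same core, so the scan takes as long as that core (1700 ms) while the other two cores sit idle after 90 ms and 50 ms.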

Proposed Solution

Cache per-rule execution times in an artifact and use them to distribute rules across cores:

  1. First Scan ("Alignment Scan"): Create timing stats Artifact

    • Generate a JSON file mapping rule IDs to execution times (rule ID -> milliseconds)
    • Note: This initial scan will not be optimized
  2. Subsequent Scans: Optimize Distribution

    • Load cached timings during Lightz-AIO init
    • Distribute rules based on execution times to balance load
    • Share cache across scans via artifact publishing
    • Expected result: more balanced per-core execution times and a shorter overall scan
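A minimal sketch of the timing cache as a flat JSON object (rule ID -> milliseconds), as described above. The function names, the file name `rule-timings.json`, and the choice to treat an unreadable cache as "no cache" are assumptions for illustration; the issue does not specify them.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def save_timings(path: Path, timings_ms: Dict[str, float]) -> None:
    """Write the rule ID -> milliseconds mapping as a JSON object."""
    path.write_text(json.dumps(timings_ms, indent=2, sort_keys=True), encoding="utf-8")


def load_timings(path: Path) -> Optional[Dict[str, float]]:
    """Return the cached timings, or None if the cache is missing or unusable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, unreadable file, or invalid JSON
        return None
    if not isinstance(data, dict):
        return None
    # Keep only entries whose value is a non-negative number.
    return {
        str(rule_id): float(ms)
        for rule_id, ms in data.items()
        if isinstance(ms, (int, float)) and not isinstance(ms, bool) and ms >= 0
    }
```

Returning `None` for a missing or corrupt file lets the caller treat both cases the same way as a first scan with no cache.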

Implementation Plan

1. Timing Collection

  • Use the stats the engine publishes when run with the -testing flag
  • Collect per-rule scan timing data
  • Generate initial cache file

Note: Collecting these stats currently requires parsing the engine logs, which is not straightforward. We should implement a better way to collect them.

2. Cache Distribution

  • Research how exactly we can share a cache artifact between different GLAS scans
    • We would like to keep it as a "hidden" artifact like the report artifact.

3. Distribution Algorithm

  • Load cache at startup
  • Sort rules by execution time
  • Use greedy distribution:
    • Start with longest-running rules
    • Assign to core with lowest total time
  • Fall back to the current method if no cache is available
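The steps above can be sketched as follows: sort rules by cached time, then repeatedly give the next-longest rule to the core with the lowest running total. This is a sketch under assumptions, not the planned implementation: the names are hypothetical, rules missing from the cache are given `default_ms` (the issue does not say how to handle them), and the fallback assumes the current method is a contiguous split in list order.

```python
import heapq
from typing import Dict, List, Optional


def split_by_order(rule_ids: List[str], cores: int) -> List[List[str]]:
    """Stand-in for the current method: contiguous chunks in list order (assumption)."""
    base, extra = divmod(len(rule_ids), cores)
    groups, start = [], 0
    for i in range(cores):
        end = start + base + (1 if i < extra else 0)
        groups.append(list(rule_ids[start:end]))
        start = end
    return groups


def distribute_rules(
    rule_ids: List[str],
    cores: int,
    timings_ms: Optional[Dict[str, float]],
    default_ms: float = 0.0,
) -> List[List[str]]:
    """Greedy longest-first assignment of rules to cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not timings_ms:
        return split_by_order(rule_ids, cores)  # no cache: keep current behaviour

    def cost(rule: str) -> float:
        return timings_ms.get(rule, default_ms)

    groups: List[List[str]] = [[] for _ in range(cores)]
    # Min-heap of (total time so far, core index); the least-loaded core pops first.
    heap = [(0.0, core) for core in range(cores)]
    for rule in sorted(rule_ids, key=cost, reverse=True):
        total, core = heapq.heappop(heap)
        groups[core].append(rule)
        heapq.heappush(heap, (total + cost(rule), core))
    return groups


# Hypothetical timings in milliseconds.
example = {"rule-a": 900, "rule-b": 800, "rule-c": 50, "rule-d": 40, "rule-e": 30, "rule-f": 20}
print(distribute_rules(list(example), 3, example))
# [['rule-a'], ['rule-b'], ['rule-c', 'rule-d', 'rule-e', 'rule-f']]
```

In this example the per-core totals become 900, 800, and 140 ms, so the slowest core is bounded by the single longest rule rather than by two slow rules that happened to be adjacent on disk. The heap keeps each assignment at O(log cores); a linear scan for the minimum would work equally well for small core counts.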

4. Testing

  • Unit test cache operations
  • Performance benchmarking
  • Validate proper rule splitting based on the cached timing information
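One check that could back the validation step is that a split is a true partition: every rule is assigned to exactly one core and nothing unexpected appears. The helper below is a hypothetical sketch of such a check, not an existing GLAS function.

```python
from collections import Counter
from typing import List


def validate_split(rule_ids: List[str], groups: List[List[str]]) -> List[str]:
    """Return a list of problems; an empty list means every rule is assigned exactly once."""
    expected = Counter(rule_ids)
    assigned = Counter(rule for group in groups for rule in group)
    problems = []
    # Counter subtraction keeps only positive counts.
    for rule in sorted(expected - assigned):
        problems.append(f"missing: {rule}")
    for rule in sorted(assigned - expected):
        problems.append(f"unexpected or duplicated: {rule}")
    return problems
```

A unit test can then assert that `validate_split(rule_ids, groups)` returns an empty list for both the cached and the fallback distribution.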
Edited by Mher Tolpin