kv-cache

Projects with this topic

Ayi NEDJIMI / KVortex

VRAM to RAM Offloader for AI and vLLM - High-Performance C++23 KV Cache Engine with Multi-Stream GPU Transfers

https://ayinedjimi-consultants.fr

AI cpp23 cuda GPU-computing high-perform... kv-cache llm-inference machine-lear... vllm vram-offload cpp deep-learning gpu inference nvidia vRAM

Updated May 22, 2026

Ayi NEDJIMI / flashquant

Extreme KV Cache Compression for LLM Inference — C++17/CUDA implementation of TurboQuant (arXiv 2504.19874). 7.5x compression, <2% quality loss.

https://ayinedjimi-consultants.fr

compression cpp cuda flash-attention gpu inference kv-cache llm machine-lear... PyTorch quantization transformer turboquant vllm

Updated May 22, 2026
