turboquant

Projects with this topic

    Ayi NEDJIMI / flashquant

    Extreme KV Cache Compression for LLM Inference — C++17/CUDA implementation of TurboQuant (arXiv 2504.19874). 7.5x compression, <2% quality loss.

    https://ayinedjimi-consultants.fr

    compression cpp cuda flash-attention gpu inference kv-cache llm machine-lear... PyTorch quantization transformer turboquant vllm
    Updated May 22, 2026