Projects with this topic
useful Gentoo overlay: curated ebuilds, AI, tools & science
Experiments with Subliminal Learning
VRAM to RAM Offloader for AI and vLLM - High-Performance C++23 KV Cache Engine with Multi-Stream GPU Transfers
Extreme KV Cache Compression for LLM Inference — C++17/CUDA implementation of TurboQuant (arXiv 2504.19874). 7.5x compression, <2% quality loss.
Track AI spending across API providers, MCP tools, subscriptions, and self-hosted GPUs from your terminal. One CLI to see what you're actually paying — Anthropic, OpenAI, OpenRouter, vLLM on your own hardware — with real TCO math. Compare your self-hosted $/MTok vs cloud pricing. Local-first, no SaaS required.