Machine Learning / AI / LLM Platforms Integration
Description
As NuNet evolves into a decentralized computation platform for AI, integrating Machine Learning (ML), Artificial Intelligence (AI), and Large Language Model (LLM) platforms is essential to support dynamic, distributed inference and training workloads across an ecosystem of heterogeneous devices.
This issue aims to formalize and implement support for integrating popular ML/LLM serving backends such as Ollama, Hugging Face, and OpenAI-compatible inference endpoints, while remaining hardware-agnostic (NVIDIA, AMD, Intel) and adaptable to decentralized orchestration across IPFS/libp2p-connected nodes.
Goals
- Enable plug-and-play deployment of ML/LLM backends as NuNet-compatible workloads across consumer and enterprise-grade devices.
- Standardize API exposure for LLMs (chat-based, multimodal, vision, embedding).
- Bridge the compute layer (Docker containers, GPUs, CPUs, accelerators) with AI-specific runtimes (e.g., Transformers, GGUF loaders, Ollama) to support low-latency real-time applications.
- Support concurrent multi-model workloads, ensuring that multiple LLMs (e.g., Mistral, LLaMA, Gemma) can run across separate containers with GPU-specific affinity logic (a sketch of a matching workload descriptor follows this list).
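To ground the plug-and-play and GPU-affinity goals, here is a minimal sketch of what a workload descriptor could look like. Everything here (the `LLMWorkload` name, its fields, the example values) is an illustrative assumption, not an existing NuNet API:

```python
from dataclasses import dataclass, field

@dataclass
class LLMWorkload:
    """Hypothetical descriptor for a NuNet-compatible LLM workload."""
    model_name: str        # e.g. an Ollama tag such as "llama3:8b"
    modality: str          # "chat", "vision", "embedding", ...
    container_image: str   # Docker image that serves the model
    min_vram_gb: float     # minimum GPU memory the model needs
    gpu_affinity: list[str] = field(default_factory=list)  # preferred vendors, in order

# Example: a chat workload that prefers NVIDIA but can run on AMD or Intel GPUs.
workload = LLMWorkload(
    model_name="llama3:8b",
    modality="chat",
    container_image="ollama/ollama:latest",
    min_vram_gb=8.0,
    gpu_affinity=["nvidia", "amd", "intel"],
)
```

A descriptor of this shape is what would let a scheduler match workloads to heterogeneous devices without hard-coding any one backend.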
Context & Motivation
With the growing decentralization of AI infrastructure, especially in the wake of community-led model development and local inference frameworks (e.g., Ollama, LM Studio, AutoGPTQ), it is crucial for NuNet to:
- Provide native compatibility with AI workloads being adopted in real-world scenarios like decentralized research (DeSci), real-time vision applications, chat-based agents, and edge-based inferencing.
- Address resource awareness in model deployment by automatically mapping models to nodes based on available VRAM, CPU cores, and bandwidth through a decentralized scheduling strategy (see the sketch after this list).
- Align with the broader mission of NuNet: to democratize access to computational resources and AI capabilities by enabling anyone with spare computing capacity to contribute to a global AI network.
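As referenced above, a minimal sketch of VRAM/CPU/bandwidth-aware mapping could look like the following. The `Node` shape, the scoring order, and the thresholds are illustrative assumptions, not the actual DMS scheduling logic:

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Hypothetical snapshot of a node's advertised resources."""
    node_id: str
    free_vram_gb: float
    cpu_cores: int
    bandwidth_mbps: float

def pick_node(nodes: list[Node], min_vram_gb: float, min_cores: int = 2) -> Node | None:
    """Pick the node best suited to host a model, or None if nothing fits.

    VRAM and CPU act as hard floors; among eligible nodes, prefer the one
    with the most spare VRAM, breaking ties by bandwidth.
    """
    eligible = [n for n in nodes
                if n.free_vram_gb >= min_vram_gb and n.cpu_cores >= min_cores]
    if not eligible:
        return None
    return max(eligible, key=lambda n: (n.free_vram_gb, n.bandwidth_mbps))
```

Real scheduling would also weigh locality, reputation, and price; this only shows the resource-floor-plus-headroom pattern.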
Avimanyu Bandyopadhyay has already prototyped:
- Multi-GPU Ollama-based LLM orchestration across multiple machines.
- Model selection based on VRAM availability (e.g., ~40 GB LLaMA 3 inference on an RTX 3090, falling back to the 8B variant on lighter GPUs); see the attached table for more context and the sketch below.
- Custom Docker and OpenWebUI deployments for inference across NVIDIA, AMD, and Intel GPUs.
- Deployed the OpenVoice voice-cloning tool on Gradio through DMS.
- Deployed JupyterLab through DMS, providing decentralized access to Jupyter notebooks for Python, R, and Julia, with additional terminal access.
- Role-playing multi-agent AI simulations using containerized endpoints.
These foundational experiments now need to be hardened and formalized into reusable patterns across the NuNet device ecosystem.
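As a concrete illustration of the VRAM-based fallback prototyped above, the following sketch probes free GPU memory with `nvidia-smi` and picks an Ollama model tag. The 40 GB threshold mirrors the figure quoted above but is illustrative, and the probe is NVIDIA-only by construction:

```python
import subprocess

def free_vram_gb(gpu_index: int = 0) -> float:
    """Query free VRAM on an NVIDIA GPU via nvidia-smi (NVIDIA-only sketch)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits", f"--id={gpu_index}"],
        text=True,
    )
    return float(out.strip()) / 1024.0  # MiB -> GiB

def choose_llama3_tag(vram_gb: float) -> str:
    """Fall back to the 8B variant when the larger model will not fit."""
    return "llama3:70b" if vram_gb >= 40 else "llama3:8b"

print(choose_llama3_tag(free_vram_gb()))
```

An AMD or Intel equivalent would swap out the probe (e.g., rocm-smi or sysfs) while keeping the same selection interface.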
Tasks
| Task | Description | Status |
|---|---|---|
| 1 | Define standard architecture for running LLMs via OpenWebUI on decentralized nodes | |
| 2 | Create Dockerfiles for deploying various models (LLaMA 3, Mistral, Gemma, Moondream, etc.) | |
| 3 | Build logic to select best-suited models based on GPU VRAM/availability | |
| 4 | Enable multi-model support within a single machine with workload routing (see the sketch below) | |
| 5 | Create videos and provide user and developer guides for deploying and scaling LLMs on NuNet | |
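For the multi-model routing task above, a minimal per-machine sketch could be as simple as a registry mapping model names to container endpoints. The port layout and model names are assumptions (11434 is Ollama's default port; the others are hypothetical):

```python
# Hypothetical routing table: model name -> local serving endpoint.
MODEL_ENDPOINTS = {
    "llama3":    "http://127.0.0.1:11434",  # Ollama's default port
    "mistral":   "http://127.0.0.1:11435",  # hypothetical second container
    "moondream": "http://127.0.0.1:11436",  # hypothetical third container
}

def route(model: str) -> str:
    """Resolve which container endpoint serves a request for `model`."""
    try:
        return MODEL_ENDPOINTS[model]
    except KeyError:
        raise ValueError(f"no endpoint registered for model '{model}'") from None
```

In a real deployment the registry would be populated from container registration metadata rather than hard-coded.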
Next Steps
- Finalize OpenWebUI integration with support for model filtering, usage monitoring, and multi-endpoint support.
- Collaborate with DMS and MCP teams to expose LLM containers as service endpoints with registration metadata.
- Enable fine-grained usage metrics and response-latency logging to improve resource scheduling (see the sketch after this list).
- Plan long-term for training support (H2O LLM Studio, DeepSpeed, Lightning) once inference workflows are stable.
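For the latency-logging step above, a minimal sketch, assuming a simple decorator around the inference call with JSON-lines output as a stand-in for a real metrics sink:

```python
import functools
import json
import time

def log_latency(endpoint: str):
    """Record wall-clock latency of each call as a JSON line (illustrative sink)."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                record = {"endpoint": endpoint,
                          "latency_ms": round((time.perf_counter() - start) * 1000, 2)}
                print(json.dumps(record))
        return inner
    return wrap

@log_latency("llama3-chat")
def infer(prompt: str) -> str:
    return "..."  # placeholder for the actual model call
```

Metrics like these are what the scheduler would consume to improve placement decisions over time.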
Additional Notes
- This effort ties directly into the strategic direction of NuNet to support AI-as-a-Service at scale.
- Integration opens the door for token-based access to LLMs, supporting grant-based or DAO-governed resource allocation.
- It is important to maintain open-source, hardware-agnostic principles, especially in regions where dependence on NVIDIA hardware is not sustainable.