Machine Learning / AI / LLM Platforms Integration
Description
As NuNet evolves into a decentralized computation platform for AI, integrating Machine Learning (ML), Artificial Intelligence (AI), and Large Language Model (LLM) platforms is essential to support dynamic, distributed inference and training workloads across an ecosystem of heterogeneous devices.
This issue aims to formalize and implement support for integrating popular ML/LLM serving backends such as Ollama, Hugging Face, and OpenAI-compatible inference endpoints, while remaining hardware-agnostic (NVIDIA, AMD, Intel) and adaptable to decentralized orchestration across IPFS/libp2p-connected nodes.
Goals
- Enable plug-and-play deployment of ML/LLM backends as NuNet-compatible workloads across consumer and enterprise-grade devices.
- Standardize API exposure for LLMs (chat-based, multimodal, vision, embedding).
- Bridge the compute layer (Docker containers, GPUs, CPUs, accelerators) with AI-specific runtimes (e.g., Transformers, GGUF loaders, Ollama) to support low-latency real-time applications.
- Support concurrent multi-model workloads, ensuring that multiple LLMs (e.g., Mistral, LLaMA, Gemma) can run across separate containers with GPU-specific affinity logic (a sketch of a matching workload descriptor follows this list).
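To ground the plug-and-play and GPU-affinity goals, here is a minimal sketch of what a workload descriptor could look like. Everything here (the `LLMWorkload` name, its fields, the example values) is an illustrative assumption, not an existing NuNet API:

```python
from dataclasses import dataclass, field

@dataclass
class LLMWorkload:
    """Hypothetical descriptor for a NuNet-compatible LLM workload."""
    model_name: str        # e.g. an Ollama tag such as "llama3:8b"
    modality: str          # "chat", "vision", "embedding", ...
    container_image: str   # Docker image that serves the model
    min_vram_gb: float     # minimum GPU memory the model needs
    gpu_affinity: list[str] = field(default_factory=list)  # preferred vendors, in order

# Example: a chat workload that prefers NVIDIA but can run on AMD or Intel GPUs.
workload = LLMWorkload(
    model_name="llama3:8b",
    modality="chat",
    container_image="ollama/ollama:latest",
    min_vram_gb=8.0,
    gpu_affinity=["nvidia", "amd", "intel"],
)
```

A descriptor of this shape is what would let a scheduler match workloads to heterogeneous devices without hard-coding any one backend.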
Context & Motivation
With the growing decentralization of AI infrastructure, especially in the wake of community-led model development and local inference frameworks (e.g., Ollama, LM Studio, AutoGPTQ), it is crucial for NuNet to:
- Provide native compatibility with AI workloads being adopted in real-world scenarios like decentralized research (DeSci), real-time vision applications, chat-based agents, and edge-based inferencing.
- Address resource awareness in model deployment by automatically mapping models to nodes based on available VRAM, CPU cores, and bandwidth through a decentralized scheduling strategy (see the sketch after this list).
- Align with the broader mission of NuNet: to democratize access to computational resources and AI capabilities by enabling anyone with spare computing capacity to contribute to a global AI network.
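As referenced above, a minimal sketch of VRAM/CPU/bandwidth-aware mapping could look like the following. The `Node` shape, the scoring order, and the thresholds are illustrative assumptions, not the actual DMS scheduling logic:

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Hypothetical snapshot of a node's advertised resources."""
    node_id: str
    free_vram_gb: float
    cpu_cores: int
    bandwidth_mbps: float

def pick_node(nodes: list[Node], min_vram_gb: float, min_cores: int = 2) -> Node | None:
    """Pick the node best suited to host a model, or None if nothing fits.

    VRAM and CPU act as hard floors; among eligible nodes, prefer the one
    with the most spare VRAM, breaking ties by bandwidth.
    """
    eligible = [n for n in nodes
                if n.free_vram_gb >= min_vram_gb and n.cpu_cores >= min_cores]
    if not eligible:
        return None
    return max(eligible, key=lambda n: (n.free_vram_gb, n.bandwidth_mbps))
```

Real scheduling would also weigh locality, reputation, and price; this only shows the resource-floor-plus-headroom pattern.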
Avimanyu Bandyopadhyay has already prototyped:
- Multi-GPU Ollama-based LLM orchestration across multiple machines.
- Model selection based on VRAM availability (e.g., ~40 GB LLaMA 3 inference on an RTX 3090, falling back to the 8B variant on lighter GPUs); see the attached table for more context and the sketch below.
- Custom Docker and OpenWebUI deployments for inference across NVIDIA, AMD, and Intel GPUs.
- Deployed the OpenVoice voice-cloning tool on Gradio through DMS.
- Deployed JupyterLab through DMS, providing decentralized access to Jupyter notebooks for Python, R, and Julia, with additional terminal access.
- Role-playing multi-agent AI simulations using containerized endpoints.
These foundational experiments now need to be hardened and formalized into reusable patterns across the NuNet device ecosystem.
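As a concrete illustration of the VRAM-based fallback prototyped above, the following sketch probes free GPU memory with `nvidia-smi` and picks an Ollama model tag. The 40 GB threshold mirrors the figure quoted above but is illustrative, and the probe is NVIDIA-only by construction:

```python
import subprocess

def free_vram_gb(gpu_index: int = 0) -> float:
    """Query free VRAM on an NVIDIA GPU via nvidia-smi (NVIDIA-only sketch)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits", f"--id={gpu_index}"],
        text=True,
    )
    return float(out.strip()) / 1024.0  # MiB -> GiB

def choose_llama3_tag(vram_gb: float) -> str:
    """Fall back to the 8B variant when the larger model will not fit."""
    return "llama3:70b" if vram_gb >= 40 else "llama3:8b"

print(choose_llama3_tag(free_vram_gb()))
```

An AMD or Intel equivalent would swap out the probe (e.g., rocm-smi or sysfs) while keeping the same selection interface.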
Tasks
| Task | Description | Status |
|---|---|---|
| 1 | Define standard architecture for running LLMs via OpenWebUI on decentralized nodes | |
| 2 | Create Dockerfiles for deploying various models (LLaMA 3, Mistral, Gemma, Moondream, etc.) | |
| 3 | Build logic to select best-suited models based on GPU VRAM/availability | |
| 4 | Enable multi-model support within a single machine with workload routing (see the sketch below) | |
| 5 | Create videos and provide user and developer guides for deploying and scaling LLMs on NuNet | |
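For the multi-model routing task above, a minimal per-machine sketch could be as simple as a registry mapping model names to container endpoints. The port layout and model names are assumptions (11434 is Ollama's default port; the others are hypothetical):

```python
# Hypothetical routing table: model name -> local serving endpoint.
MODEL_ENDPOINTS = {
    "llama3":    "http://127.0.0.1:11434",  # Ollama's default port
    "mistral":   "http://127.0.0.1:11435",  # hypothetical second container
    "moondream": "http://127.0.0.1:11436",  # hypothetical third container
}

def route(model: str) -> str:
    """Resolve which container endpoint serves a request for `model`."""
    try:
        return MODEL_ENDPOINTS[model]
    except KeyError:
        raise ValueError(f"no endpoint registered for model '{model}'") from None
```

In a real deployment the registry would be populated from container registration metadata rather than hard-coded.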
Next Steps
- Finalize OpenWebUI integration with support for model filtering, usage monitoring, and multi-endpoint support.
- Collaborate with DMS and MCP teams to expose LLM containers as service endpoints with registration metadata.
- Enable fine-grained usage metrics and response-latency logging to improve resource scheduling (see the sketch after this list).
- Plan long-term for training support (H2O LLM Studio, DeepSpeed, Lightning) once inference workflows are stable.
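For the latency-logging step above, a minimal sketch, assuming a simple decorator around the inference call with JSON-lines output as a stand-in for a real metrics sink:

```python
import functools
import json
import time

def log_latency(endpoint: str):
    """Record wall-clock latency of each call as a JSON line (illustrative sink)."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                record = {"endpoint": endpoint,
                          "latency_ms": round((time.perf_counter() - start) * 1000, 2)}
                print(json.dumps(record))
        return inner
    return wrap

@log_latency("llama3-chat")
def infer(prompt: str) -> str:
    return "..."  # placeholder for the actual model call
```

Metrics like these are what the scheduler would consume to improve placement decisions over time.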
Additional Notes
- This effort ties directly into the strategic direction of NuNet to support AI-as-a-Service at scale.
- Integration opens the door for token-based access to LLMs, supporting grant-based or DAO-governed resource allocation.
- It is important to maintain open-source, hardware-agnostic principles, especially in regions where dependence on NVIDIA hardware is not sustainable.