SplitUp: Decentralized AI Inference on Consumer Hardware

This project won Solana Superteam's first AI3 Hackathon in London!!

We're shaking things up, and will be back with an updated README soon!

The hackathon-winning version is Git commit fa3db00a611a7b8754bd2cddaeed81c358c45719.

Run any size AI model across distributed consumer GPUs with efficient verification on Solana

🚀 The Problem We Solve

Modern AI models like LLaMA-70B require 80-140GB VRAM, but consumer GPUs only have 8-24GB. Current solutions force centralization or expensive hardware. Verification adds 100%+ overhead in traditional decentralized systems.

SplitUp solves this with automatic model partitioning and our Proof of Sampling Protocol (PoSP) with just 8% verification overhead.

```mermaid
flowchart LR
    subgraph "The SplitUp Solution"
        LM[Large 70B Model] --> |Auto-Partition| P1[Task 1: 12GB]
        P1 --> |Intermediate Result| P2[Task 2: 12GB]
        P2 --> |Intermediate Result| P3[Task 3: 12GB]
        P3 --> |Intermediate Result| P4["..."]
        P4 --> |Intermediate Result| P5[Task N: 12GB]
        P5 --> FR[Final Result]

        P1 -.-> |Assigned to| N1[Consumer GPU 1]
        P2 -.-> |Assigned to| N2[Consumer GPU 2]
        P3 -.-> |Assigned to| N3[Consumer GPU 3]
        P4 -.-> |Assigned to| N4[Consumer GPU ...]
        P5 -.-> |Assigned to| N5[Consumer GPU N]
    end
```
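As a rough sanity check on the VRAM figures quoted earlier, the minimum number of 12GB tasks follows from a ceiling division. This is only a lower bound under the assumption that weights dominate memory use; the helper name `min_partitions` is made up for this illustration, and activations and communication buffers are ignored.

```python
import math

def min_partitions(model_vram_gb: float, target_vram_gb: float) -> int:
    """Lower bound on the number of tasks needed to stay within the per-GPU budget."""
    if target_vram_gb <= 0:
        raise ValueError("target_vram_gb must be positive")
    return math.ceil(model_vram_gb / target_vram_gb)

# The 80-140GB range quoted for a 70B model, split into 12GB tasks.
for size_gb in (80, 140):
    print(f"{size_gb} GB model -> at least {min_partitions(size_gb, 12)} tasks of 12 GB")
```

So a model in that range needs somewhere between 7 and 12 consumer GPUs at a 12GB target, before any overhead is counted.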

🔑 Key Technical Advantages

| Feature | SplitUp | Others |
| --- | --- | --- |
| VRAM Distribution | ✅ Run any size model on consumer GPUs | ❌ Limited by single node VRAM |
| Verification Overhead | ✅ Only 8% overhead (PoSP) | ❌ 100%+ overhead |
| Memory Safety | ✅ Tensor-only operations | ❌ Often allows arbitrary code |
| Hardware Compatibility | ✅ Any GPU (NVIDIA, AMD, Intel) | ❌ Often vendor-specific |
| Developer Experience | ✅ TinyGrad compatible | ❌ Complex custom APIs |
| Economic Model | ✅ Mathematically optimal incentives | ❌ Vulnerable to dishonesty |

💻 How It Works

Our system integrates EigenTensor's memory-safe computation with Solana's efficient contract platform:

```mermaid
sequenceDiagram
    participant Client as AI Developer
    participant Contract as Solana Contracts
    participant Node as GPU Nodes

    Client->>Contract: 1. Choose model, submit input
    Contract->>Contract: 2. Pick nodes to run computation
    Contract->>Node: 3. Assign tasks to specialized nodes
    Node->>Node: 4. Execute partial computation
    Contract->>Contract: 5. Verify 8% of results randomly
    Node->>Contract: 6. Submit verified results
    Contract->>Client: 7. Return complete output
```
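The flow in the diagram can be approximated off-chain with a toy simulation. This is a minimal sketch, not the contract logic: `run_pipeline`, the `cheaters` parameter, and the choice to verify by re-executing the task are all assumptions made for illustration, and tasks are plain Python callables rather than GPU kernels.

```python
import random

def run_pipeline(tasks, nodes, x, verify_rate=0.08, cheaters=(), rng=None):
    """Toy walk-through of steps 2-7: assign, execute, sample, and chain results."""
    rng = rng or random.Random()
    trace = []
    for task in tasks:
        node = rng.choice(nodes)               # steps 2-3: pick a node and assign the task
        y = task(x)                            # step 4: the node runs its partial computation
        if node in cheaters:
            y = y + 1                          # a dishonest node reports a wrong value
        sampled = rng.random() < verify_rate   # step 5: sample a fraction of results
        if sampled and task(x) != y:           # assumed check: re-execute and compare
            raise ValueError(f"{node} failed verification")
        trace.append((node, sampled))
        x = y                                  # step 6: the result feeds the next task
    return x, trace                            # step 7: final output goes back to the client

output, trace = run_pipeline(
    tasks=[lambda v: v + 1, lambda v: v * 2],
    nodes=["gpu-a", "gpu-b"],
    x=3,
    rng=random.Random(0),
)
print(output)  # (3 + 1) * 2 = 8
```

Setting `verify_rate=1.0` and listing a node in `cheaters` makes the run fail deterministically, which is a convenient way to see what the sampling step is meant to catch.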

Auto-Partitioning Magic

```python
# Define your model using TinyGrad-compatible code.
# LLaMAModel, config, input_ids, and auto_partition are assumed to be in scope.
model = LLaMAModel(config)
outputs = model(input_ids)

# Automatically partition for distributed execution
partitions = auto_partition(
    graph_program=outputs,
    target_vram=12 * 1024**3,  # 12GB target, in bytes
)
```
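To give a feel for what partitioning against a VRAM target involves, here is a deliberately simplified greedy sketch. It is not the `auto_partition` implementation: it assumes a linear chain of layers described only by their weight sizes, and the name `greedy_partition` is hypothetical. The real engine works on a computational graph and also optimizes communication between partitions.

```python
def greedy_partition(layer_bytes, target_vram):
    """Group consecutive layers into tasks whose total size stays within target_vram."""
    partitions, current, used = [], [], 0
    for index, size in enumerate(layer_bytes):
        if size > target_vram:
            raise ValueError(f"layer {index} alone exceeds the VRAM target")
        if current and used + size > target_vram:
            partitions.append(current)   # close the current task and start a new one
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        partitions.append(current)
    return partitions

GIB = 1024 ** 3
layers = [5 * GIB, 5 * GIB, 4 * GIB, 8 * GIB, 3 * GIB]
print(greedy_partition(layers, 12 * GIB))  # [[0, 1], [2, 3], [4]]
```

Greedy packing keeps adjacent layers together, which matches the pipeline shape in the flowchart, but it makes no attempt to minimize the size of the tensors passed between tasks.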

🏗️ Technical Architecture

Our system consists of five integrated layers:

Solana Contract Layer
- Model Registry: Stores model metadata, the structure of its computational DAG (made up of "tasks"), and its tensor interfaces
- Task Registry: Specifies input and output tensor interfaces for each task, VRAM requirements, and weight file locations
- Node Registry: Tracks node specializations, stake amounts, etc.
- Model Execution Contract: Assigns tasks based on optimal allocation, tracks execution state, and handles result aggregation
- Verification Contract: Implements PoSP consensus with VRF for 8% random verification
- Staking Contract: Manages deposits, withdrawals, and slashing conditions
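The registry entries described above can be pictured as plain records. The sketch below uses Python dataclasses purely for illustration; the on-chain account layout is not described here, so every field name, the `connect` check, and the `example://` URIs are assumptions rather than the project's actual schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TensorSpec:
    name: str
    shape: tuple
    dtype: str

@dataclass
class Task:
    task_id: str
    inputs: list       # TensorSpec values this task consumes
    outputs: list      # TensorSpec values this task produces
    vram_bytes: int
    weights_uri: str   # placeholder; the real URI scheme is not specified here

@dataclass
class Model:
    model_id: str
    tasks: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (producer_id, consumer_id) pairs of the DAG

    def add_task(self, task):
        self.tasks[task.task_id] = task

    def connect(self, src_id, dst_id):
        """Add a DAG edge only if the producer emits a tensor the consumer expects."""
        src, dst = self.tasks[src_id], self.tasks[dst_id]
        if not set(src.outputs) & set(dst.inputs):
            raise ValueError(f"{src_id} produces no tensor that {dst_id} consumes")
        self.edges.append((src_id, dst_id))

tokens = TensorSpec("tokens", (1, 128), "int32")
hidden = TensorSpec("hidden", (1, 4096), "float16")
logits = TensorSpec("logits", (1, 32000), "float16")

model = Model("demo-model")
model.add_task(Task("embed", [tokens], [hidden], 12 * 1024**3, "example://weights/embed.safetensors"))
model.add_task(Task("head", [hidden], [logits], 12 * 1024**3, "example://weights/head.safetensors"))
model.connect("embed", "head")
```

Checking tensor interfaces when an edge is added is one way to make mismatched tasks fail at registration time instead of during inference.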

Node Execution Layer
- Task Executor: Uses TinyGrad for GPU execution with device-optimized machine code
- Pre-loading System: Downloads and verifies weight files, pre-loads into GPU memory, optimizes for multi-task handling
- Heartbeat Service: Sends regular heartbeats to the Oracle Committee
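How weight files are verified is not spelled out above. One common approach, assumed here for illustration, is to compare a SHA-256 digest of the downloaded file against a digest recorded with the task; the function names below are hypothetical.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files never need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_weights(path, expected_hex):
    """Return True when the file's digest matches the expected hex string."""
    return sha256_of_file(path) == expected_hex.lower()
```

A node would call `verify_weights` after downloading and before pre-loading into GPU memory, refusing the task if the check fails.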

Verification Layer
- Proof of Sampling Protocol: 8% random verification
- Economic incentives: Dishonesty becomes unprofitable
- VRF-based validator selection: Prevents manipulation
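The sampling decision can be sketched as a deterministic function of a random seed and the task identifier. Note the substitution: a plain SHA-256 hash stands in for the VRF output here, so this shows only the thresholding step, not a verifiable random function (a real VRF also yields a proof that the output was computed correctly). The name `is_sampled` is an assumption.

```python
import hashlib

def is_sampled(seed: bytes, task_id: str, rate: float = 0.08) -> bool:
    """Map (seed, task_id) to a 64-bit draw and select it with probability `rate`."""
    digest = hashlib.sha256(seed + task_id.encode("utf-8")).digest()
    draw = int.from_bytes(digest[:8], "big")   # integer in [0, 2**64)
    return draw < int(rate * 2**64)            # integer threshold avoids float rounding at the edges

# Fraction of 10,000 hypothetical task ids selected for verification.
selected = sum(is_sampled(b"epoch-seed", f"task-{i}") for i in range(10_000))
print(selected / 10_000)
```

Because the draw depends only on the seed and the task id, anyone holding the seed can recompute which results were due for verification.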

Storage Layer
- Model Definitions: Stores complete model specifications with DAG structure and task relationships
- Weight Files: Efficiently stores weights in safetensors format with standardized URI scheme
- Tensor Data: Handles intermediate results with automatic garbage collection and efficient serialization
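One simple way to garbage-collect intermediate results is to count how many downstream tasks still need each tensor and drop it after the last read. The sketch below assumes that policy; the source does not say how collection actually works, and `TensorStore` is a hypothetical in-memory stand-in for the storage backend.

```python
class TensorStore:
    """Keeps each intermediate tensor until all of its consumers have read it."""

    def __init__(self):
        self._data = {}
        self._pending = {}   # key -> number of reads still expected

    def put(self, key, value, consumers):
        if consumers <= 0:
            return           # nothing will read this tensor, so do not keep it
        self._data[key] = value
        self._pending[key] = consumers

    def take(self, key):
        value = self._data[key]          # raises KeyError once the tensor is collected
        self._pending[key] -= 1
        if self._pending[key] == 0:
            del self._data[key], self._pending[key]
        return value

    def __len__(self):
        return len(self._data)

store = TensorStore()
store.put("task1/hidden", [0.1, 0.2], consumers=2)
store.take("task1/hidden")
store.take("task1/hidden")
print(len(store))  # 0: the tensor was released after its second and last read
```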

Client Interface Layer
- Model Deployment CLI: Analyzes model structure for optimal partitioning, creates task definitions, uploads weight files
- Node Management CLI: Registers node capabilities, manages stake deposits and withdrawals, monitors performance

🛠️ Hackathon Deliverables

We've built a complete end-to-end prototype:

EigenTensor Integration
- Memory-safe tensor operations
- TinyGrad-compatible API
- Automatic computational graph analysis

Auto-Partitioning Engine
- Splits models to fit target VRAM constraints
- Optimizes communication between partitions
- Creates clean tensor interfaces between tasks

Solana Programs
- Model and Task Registry: Track model definitions and tasks
- Node Registry: Register ML compute nodes
- Execution Contract: Coordinate inference tasks between nodes
- Verification Contract: Implement PoSP with 8% overhead

Developer Tools
- splitup-deploy: For model developers to register models
- splitup-node: For GPU owners to participate in the marketplace
- Web interface for job submission and monitoring

MNIST Demo
- Next.js UI with Tailwind CSS that detects digits drawn on a canvas
- Interactive web demo showcasing model partitioning
- End-to-end flow from model submission to result visualization

🔐 Security & Economics

Our Proof of Sampling Protocol creates a Nash equilibrium where honesty is the dominant strategy:

- Only 8% of work gets verified (vs. the 100%+ overhead of traditional approaches)
- Verification reward: 1.2× computation cost
- Slashing amount: 10× computation cost

Economic security is mathematically guaranteed when:
```
p > C/((1-r)(R+S))
```
where p = verification probability, C = computation cost, r = collusion fraction, R = reward, and S = slashing amount.
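The condition is easy to evaluate numerically. The sketch below plugs in the figures listed above, reading R and S literally as 1.2× and 10× the computation cost and assuming no collusion (r = 0); that reading, and the function name, are assumptions.

```python
def min_verification_probability(cost, reward, slash, collusion=0.0):
    """Right-hand side of p > C / ((1 - r)(R + S))."""
    if not 0 <= collusion < 1:
        raise ValueError("collusion fraction must be in [0, 1)")
    return cost / ((1 - collusion) * (reward + slash))

C = 1.0  # normalise the computation cost
threshold = min_verification_probability(C, reward=1.2 * C, slash=10 * C)
print(f"{threshold:.4f}")  # 0.0893
```

Under this literal reading the bound comes out at roughly 8.9% for r = 0, slightly above the 8% sampling rate, and it rises as the collusion fraction grows. The quoted figures may be rounded, or R and S may be defined differently than assumed here; with the same reading, a slashing amount above roughly 11.3× the computation cost would bring the bound under 8%.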

🌐 Advanced Features

- Fault Tolerance: Automatic task reassignment for failed nodes (diagram 7)
- Optimal Assignment: Nodes can handle multiple adjacent tasks (diagram 9)
- Parallel Execution: Independent DAG branches execute simultaneously (diagram 6)
- Dynamic Scaling: Execution adapts to available marketplace capacity
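Parallel execution of independent branches amounts to grouping the task DAG into "waves": every task in a wave has all of its inputs ready, so the tasks within it can run at the same time. The sketch below uses Kahn's topological sort to compute those waves; the task names and the `execution_waves` helper are illustrative assumptions, not the scheduler used by the execution contract.

```python
from collections import defaultdict

def execution_waves(tasks, edges):
    """Group DAG tasks into waves; tasks in the same wave have no mutual dependencies."""
    indegree = {task: 0 for task in tasks}
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    wave = sorted(task for task in tasks if indegree[task] == 0)
    waves = []
    while wave:
        waves.append(wave)
        ready = []
        for task in wave:
            for child in children[task]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        wave = sorted(ready)

    if sum(len(w) for w in waves) != len(tasks):
        raise ValueError("task graph contains a cycle")
    return waves

tasks = ["embed", "branch_a", "branch_b", "merge"]
edges = [("embed", "branch_a"), ("embed", "branch_b"),
         ("branch_a", "merge"), ("branch_b", "merge")]
print(execution_waves(tasks, edges))
# [['embed'], ['branch_a', 'branch_b'], ['merge']]
```

The same wave structure is a natural place to hook in fault tolerance: a task whose node fails can be reassigned without disturbing tasks in later waves, since they have not started yet.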

📚 Learn More

Built for AI Web3 Hackathon 2025
Contact: [email protected]

ToxicPine/planning-win
