7txr/cloud-controller

Cloud Controller Core

Internal infrastructure automation system for the AstroidMC network. Manages containerized Minecraft game server lifecycle with auto-scaling, capacity management, and Pterodactyl panel integration.

Technical Overview

This controller provides automated lifecycle management of game server instances across the AstroidMC distributed infrastructure. The system implements event-driven orchestration using RabbitMQ message queues, Redis state storage, and the Pterodactyl REST API for container provisioning.

Architecture:

  • Asynchronous event processing via RabbitMQ AMQP consumers
  • Redis-backed server registry for O(1) lookup performance
  • Thread-safe capacity evaluation with 30-second check intervals
  • Pterodactyl API integration for Docker container orchestration
  • Per-game-mode capacity policies with configurable scaling thresholds

Dependencies

Required:

  • Python 3.8+
  • Pterodactyl Panel (v1.0+) with application API access
  • RabbitMQ server (v3.8+)
  • Redis server (v6.0+)
  • Docker container runtime on Pterodactyl nodes

Python Packages:

  • pika (RabbitMQ client)
  • redis (Redis client)
  • requests (HTTP client for Pterodactyl API)
  • pyyaml (configuration parsing)

Optional:

  • BungeeCord/Velocity proxy (for player routing integration)
  • Prometheus/Grafana (for metrics visualization)

System Configuration

Redis Setup

Connection Configuration:

redis:
  host: "10.0.1.50"              # Internal Redis server
  port: 6379
  password: "astroid_redis_2025"
  db: 0
  key_prefix: "minecraft:servers:"

Data Structures Used:

  • minecraft:servers:{game_type}:{server_id} - Hash map of server metadata
  • Hash fields: pterodactyl_id, ip, port, player_count, max_players, status, created_at, last_updated
  • TTL: None (persistent until explicit deletion)
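
As a rough illustration of this layout, the sketch below builds a key from the configured prefix and writes the metadata hash. The helper names (`server_key`, `register_server`) and the initial `status` value are hypothetical and not taken from the controller; `client` is assumed to be a `redis.Redis` instance, or anything else that exposes `hset`.

```python
from datetime import datetime, timezone

KEY_PREFIX = "minecraft:servers:"  # matches key_prefix in the config above


def server_key(game_type, server_id, prefix=KEY_PREFIX):
    """Build the Redis key for one server, e.g. minecraft:servers:bedwars:server-abc123."""
    return f"{prefix}{game_type}:{server_id}"


def register_server(client, game_type, server_id, pterodactyl_id, ip, port, max_players):
    """Write the metadata hash for a new server (hypothetical helper).

    `client` is assumed to be created elsewhere, for example with
    redis.Redis(host=..., port=..., password=..., db=0, decode_responses=True).
    """
    key = server_key(game_type, server_id)
    fields = {
        "pterodactyl_id": pterodactyl_id,
        "ip": ip,
        "port": port,
        "player_count": 0,
        "max_players": max_players,
        "status": "starting",  # assumed initial status value
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    client.hset(key, mapping=fields)  # no TTL is set, so the key persists
    return key
```

Because no expiry is attached, an entry stays in Redis until something deletes it explicitly, which matches the "TTL: None" note above.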

RabbitMQ Setup

Connection Configuration:

rabbitmq:
  host: "10.0.1.51"              # Internal RabbitMQ server
  port: 5672
  username: "cloud_controller"
  password: "astroid_rmq_2025"
  vhost: "/minecraft"

Queue Topology:

queues:
  spawn_request: "spawn_request"      # Inbound: Server spawn requests
  server_ready: "server_ready"        # Outbound: Server availability notifications
  server_empty: "server_empty"        # Inbound: Empty server notifications
  player_count: "player_count"        # Inbound: Player count updates

Message Durability:

  • All queues configured as durable (survive broker restart)
  • Messages published with delivery_mode=2 (persistent)
  • Automatic acknowledgment disabled (manual ack after processing)
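
A minimal sketch of how these three settings could look with pika, assuming a channel opened elsewhere (for example from `pika.BlockingConnection`). The function names and the choice to reject failed messages without requeueing are assumptions for illustration, not the controller's actual code.

```python
import json

QUEUES = ["spawn_request", "server_ready", "server_empty", "player_count"]


def declare_queues(channel):
    """Declare every queue as durable so it survives a broker restart."""
    for name in QUEUES:
        channel.queue_declare(queue=name, durable=True)


def publish_persistent(channel, queue_name, payload):
    """Publish a JSON message with delivery_mode=2 (persistent)."""
    import pika  # imported here so the rest of the sketch loads without pika

    channel.basic_publish(
        exchange="",  # the default exchange routes by queue name
        routing_key=queue_name,
        body=json.dumps(payload),
        properties=pika.BasicProperties(delivery_mode=2),
    )


def make_callback(handler):
    """Wrap `handler` so a message is acknowledged only after it succeeds."""
    def on_message(ch, method, properties, body):
        try:
            handler(json.loads(body))
        except Exception:
            # Assumption: failed messages are rejected without requeueing
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
        else:
            ch.basic_ack(delivery_tag=method.delivery_tag)
    return on_message


def consume(channel, queue_name, handler):
    """Start a blocking consumer with automatic acknowledgment disabled."""
    channel.basic_consume(
        queue=queue_name,
        on_message_callback=make_callback(handler),
        auto_ack=False,
    )
    channel.start_consuming()
```

With this arrangement, a message whose handler never finishes stays unacknowledged and the broker can redeliver it once the consumer's connection closes.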

Pterodactyl Integration

API Configuration:

pterodactyl:
  panel_url: "https://panel.astroidmc.net"
  api_key: "ptla_xxxxxxxxxxxxxxxxxxxxxxxxxxxx"  # Application API key
  node_id: 1                                    # Default node for server deployment

Required API Permissions:

  • server.create - Provision new server instances
  • server.read - Query server status and metadata
  • server.delete - Deprovision servers
  • server.control - Start/stop/restart operations

Capacity Management Configuration

Game Mode Policies

Each game mode has independent capacity settings. Example configuration for BedWars:

minigames:
  bedwars:
    min_servers: 2                    # Minimum running servers (always maintained)
    max_servers: 20                   # Maximum allowed servers (hard limit)
    empty_buffer_count: 3             # Target empty servers ready for instant joins
    max_players_per_server: 16        # Capacity per server instance
    spawn_threshold_percent: 75       # Spawn new server when 75% capacity reached
    despawn_empty_after: 300          # Seconds before empty server is terminated
    nest_id: 1                        # Pterodactyl nest ID
    egg_id: 3                         # Pterodactyl egg ID
    user_id: 1                        # Default server owner user ID
    docker_image: "ghcr.io/astroidmc/bedwars-server:1.8.8"
    startup: "java -Xms512M -Xmx2G -jar server.jar"
    environment:
      GAME_MODE: "BUNGEE"
      MAX_PLAYERS: "16"
      SERVER_TYPE: "BEDWARS"
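
To show how such a policy might map onto a Pterodactyl create-server call, here is a hedged sketch that only builds the request and sends nothing. The endpoint path and body field names follow the community documentation of the Pterodactyl application API as I understand it, so verify them against your panel version. The `limits` and `feature_limits` values and the `allocation_id` parameter are assumptions, since the policy above does not define them.

```python
def build_create_request(panel_url, api_key, policy, name, allocation_id):
    """Return (url, headers, body) for a create-server call; nothing is sent."""
    url = panel_url.rstrip("/") + "/api/application/servers"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    body = {
        "name": name,
        "user": policy["user_id"],
        "egg": policy["egg_id"],
        "docker_image": policy["docker_image"],
        "startup": policy["startup"],
        "environment": dict(policy["environment"]),  # copy, leave the policy untouched
        # Assumed resource limits (MB for memory and disk); not part of the policy above
        "limits": {"memory": 2560, "swap": 0, "disk": 5120, "io": 500, "cpu": 100},
        "feature_limits": {"databases": 0, "backups": 0},
        # Assumed: the caller has already picked a free allocation on the node
        "allocation": {"default": allocation_id},
    }
    return url, headers, body
```

The result could then be sent with `requests.post(url, headers=headers, json=body, timeout=10)`, checking the response status before writing anything to Redis.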

Scaling Algorithm

The controller evaluates capacity every check_interval seconds (default: 30):

Spawn Decision Logic:

total_capacity = running_servers * max_players_per_server
current_utilization = total_player_count / total_capacity if total_capacity > 0 else 0
empty_servers = count(servers where player_count == 0)

if empty_servers < empty_buffer_count:
    spawn_new_server()

if current_utilization >= (spawn_threshold_percent / 100):
    spawn_new_server()

if running_servers < min_servers:
    spawn_servers_to_reach_minimum()
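
The spawn rules above can be sketched as a pure Python function. This is an illustration under stated assumptions rather than the controller's implementation: it differs from the pseudocode by starting at most one server per cycle for the buffer and threshold rules combined, by guarding against zero capacity, and by capping the result at `max_servers`. The `Policy` class and `servers_to_spawn` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    min_servers: int
    max_servers: int
    empty_buffer_count: int
    max_players_per_server: int
    spawn_threshold_percent: int


def servers_to_spawn(player_counts, policy):
    """Return how many servers to start this cycle, given per-server player counts."""
    running = len(player_counts)
    empty = sum(1 for count in player_counts if count == 0)
    capacity = running * policy.max_players_per_server
    utilization = sum(player_counts) / capacity if capacity else 0.0

    wanted = 0
    if empty < policy.empty_buffer_count:
        wanted = max(wanted, 1)  # keep the empty-server buffer topped up
    if utilization >= policy.spawn_threshold_percent / 100:
        wanted = max(wanted, 1)  # utilization threshold reached
    if running < policy.min_servers:
        wanted = max(wanted, policy.min_servers - running)

    # Never exceed the hard limit
    return max(0, min(wanted, policy.max_servers - running))
```

With the BedWars values, `servers_to_spawn([], Policy(2, 20, 3, 16, 75))` asks for two servers to reach the minimum, while twenty full servers yield zero because the hard limit is already reached.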

Despawn Decision Logic:

for each server:
    if player_count == 0 AND idle_time >= despawn_empty_after:
        if running_servers > min_servers:
            despawn_server()
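
A matching sketch for the despawn side, again as a pure function. The shape of `servers` (a mapping from server ID to player count and the time the server became empty) is an assumption made for the example; the controller may track idle time differently.

```python
def servers_to_despawn(servers, now, min_servers, despawn_empty_after):
    """Pick empty servers that have been idle for at least `despawn_empty_after` seconds.

    `servers` maps server_id -> (player_count, empty_since), where empty_since
    is a timestamp in seconds, or None while players are online (assumed shape).
    """
    running = len(servers)
    victims = []
    for server_id, (player_count, empty_since) in servers.items():
        if running - len(victims) <= min_servers:
            break  # keep the baseline running
        if player_count != 0 or empty_since is None:
            continue
        if now - empty_since >= despawn_empty_after:
            victims.append(server_id)
    return victims
```

Counting already selected servers against `min_servers` keeps a single cycle from dropping below the baseline when several servers time out at once.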

Example Scaling Behavior (BedWars):

  • 0 players: 2 servers running (min_servers)
  • 16 players (100% of 1 server): 3-4 servers running (1 active + buffer)
  • 48 players (75% of 4 servers): 5-6 servers running (triggers spawn threshold)
  • 320 players: 20 servers running (max_servers limit reached)

Performance Metrics

Server Provisioning Times:

  • Pterodactyl API call: ~500ms
  • Docker container start: 15-30 seconds
  • Minecraft server initialization: 20-60 seconds
  • Total time to accepting connections: 40-90 seconds

Capacity Evaluation Performance:

  • Redis query time: <5ms per game mode
  • Full evaluation cycle: <100ms for 10 game modes
  • RabbitMQ message processing: <10ms per message

Installation

Prerequisites

Windows Environment:

  • Python 3.8+ installed and added to PATH
  • Git for Windows
  • Access to AstroidMC internal network (VPN required if remote)

Network Access Required:

  • panel.astroidmc.net:443 (Pterodactyl API)
  • 10.0.1.50:6379 (Redis)
  • 10.0.1.51:5672 (RabbitMQ)

Setup Procedure

1. Clone Repository

git clone https://github.com/astroidmc/cloud-controller.git
cd cloud-controller

2. Install Dependencies

pip install -r requirements.txt

3. Configuration

# Create configuration from template
copy config.yml.example config.yml

# Edit with internal credentials
notepad config.yml

Update the following sections with AstroidMC internal values:

  • Pterodactyl API key (obtain from panel.astroidmc.net/admin/api)
  • Redis password (check internal documentation)
  • RabbitMQ credentials (check internal documentation)

4. Verify Dependencies

python -c "import pika, redis, requests; print('Dependencies OK')"
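
The import check above only confirms that the packages load. To also confirm that the three endpoints listed under "Network Access Required" are reachable, a small standard-library sketch like the following could be used. It tests TCP reachability only, not credentials, and the function names are hypothetical.

```python
import socket

# Endpoints from the "Network Access Required" list
ENDPOINTS = {
    "Pterodactyl API": ("panel.astroidmc.net", 443),
    "Redis": ("10.0.1.50", 6379),
    "RabbitMQ": ("10.0.1.51", 5672),
}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(endpoints=ENDPOINTS):
    """Map each endpoint name to its reachability."""
    return {name: can_connect(host, port) for name, (host, port) in endpoints.items()}
```

Calling `check_all()` from a Python prompt returns a dictionary such as `{"Redis": True, ...}`; a `False` entry usually points at VPN or firewall problems rather than at the controller itself.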

5. Run Controller

# Development mode (foreground)
python main.py

# Production mode (see the Network Deployment section)

Internal Implementation Notes

Caching Strategy

No Local Caching:

  • Redis acts as single source of truth for server state
  • All server lookups query Redis directly (O(1) hash operations)
  • No in-memory cache in controller process
  • Prevents stale data across controller restarts

Redis Key Structure:

minecraft:servers:bedwars:server-abc123
  {
    "pterodactyl_id": "12345",
    "ip": "10.0.2.10",
    "port": 25565,
    "player_count": 8,
    "max_players": 16,
    "status": "running",
    "created_at": "2025-10-20T10:30:00Z",
    "last_updated": "2025-10-20T10:35:00Z"
  }

Thread Safety

Two-Thread Model:

  • Main thread runs capacity evaluation loop
  • Separate thread for RabbitMQ consumer
  • Thread-safe queue for message passing between threads
  • No shared mutable state between threads

Concurrency Model:

# Main thread: Capacity evaluation (blocking)
while True:
    evaluate_all_game_modes()  # Sequential evaluation
    time.sleep(check_interval)

# Consumer thread: RabbitMQ message processing
def callback(ch, method, properties, body):
    message_queue.put(body)    # Thread-safe queue
    ch.basic_ack(delivery_tag=method.delivery_tag)
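
To make the hand-off between the two threads concrete, here is a self-contained sketch using the standard library's `queue.Queue`. The worker function stands in for the RabbitMQ consumer thread, and all names are hypothetical.

```python
import queue
import threading


def consumer_worker(incoming, outbox):
    """Stand-in for the RabbitMQ consumer thread: forwards raw bodies to the main thread."""
    for body in incoming:
        outbox.put(body)  # queue.Queue does its own locking


def drain(outbox, handle):
    """Main-thread side: handle everything queued so far without blocking."""
    handled = 0
    while True:
        try:
            body = outbox.get_nowait()
        except queue.Empty:
            return handled
        handle(body)
        handled += 1


def demo():
    """Run one worker to completion, then drain its messages on the calling thread."""
    outbox = queue.Queue()
    worker = threading.Thread(
        target=consumer_worker, args=([b"a", b"b", b"c"], outbox), daemon=True
    )
    worker.start()
    worker.join()
    seen = []
    count = drain(outbox, seen.append)
    return count, seen
```

Draining with `get_nowait()` lets the main loop pick up pending messages at the start of each evaluation cycle without ever waiting on the consumer thread.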

Critical Sections:

  • Pterodactyl API calls use requests library (thread-safe)
  • Redis operations use redis-py (connection pooling, thread-safe)
  • RabbitMQ message queue uses queue.Queue from the standard library (thread-safe)

Data Persistence

Normal Shutdown:

  1. SIGTERM/SIGINT signal received
  2. Stop accepting new RabbitMQ messages
  3. Process remaining messages in queue
  4. Close RabbitMQ connection gracefully
  5. Close Redis connection pool
  6. Exit process (exit code 0)
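
The shutdown sequence above could be wired up roughly as follows. This is a sketch under assumptions: `evaluate`, `drain`, and `close_connections` are caller-supplied callbacks, and step 2 (stopping the consumer) is left to the caller because it depends on how the consumer thread is built.

```python
import signal
import threading

shutdown_requested = threading.Event()


def request_shutdown(signum, frame):
    """Signal handler: only sets a flag; the main loop does the actual cleanup."""
    shutdown_requested.set()


def install_signal_handlers():
    """Register the handler for SIGINT and SIGTERM (call from the main thread)."""
    signal.signal(signal.SIGINT, request_shutdown)
    signal.signal(signal.SIGTERM, request_shutdown)


def run(evaluate, drain, close_connections, check_interval=30):
    """Main loop that follows the shutdown steps listed above."""
    while not shutdown_requested.is_set():
        evaluate()
        shutdown_requested.wait(check_interval)  # wakes early on shutdown
    drain()              # step 3: process messages still queued
    close_connections()  # steps 4-5: RabbitMQ first, then the Redis pool
    return 0             # step 6: exit code for sys.exit()
```

Waiting on the event instead of calling `time.sleep` means a signal ends the current pause immediately rather than after up to `check_interval` seconds.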

Emergency Shutdown:

  • Server state persists in Redis (no data loss)
  • RabbitMQ messages requeued if not acknowledged
  • Pterodactyl servers continue running independently
  • Controller restart resumes from Redis state

Data Loss Scenarios:

  • Redis failure: Controller cannot operate (fails fast)
  • RabbitMQ failure: Messages the broker has not yet written to disk can be lost (rare); persisted but unacknowledged messages are redelivered after recovery
  • Pterodactyl API failure: Retry logic with exponential backoff
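
The retry behavior mentioned for Pterodactyl API failures could look like this minimal sketch. The attempt count, delays, and function name are assumptions; the controller's actual values are not documented here.

```python
import time


def call_with_backoff(func, attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n seconds (capped) and retry."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller log and move on
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

For an API call, `retry_on` would typically be narrowed to `requests.exceptions.RequestException` so that programming errors are not retried. The `sleep` parameter exists so tests can record delays instead of waiting.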

Performance Characteristics

Memory Usage:

  • Base process: ~50-100 MB
  • Per game mode: ~5-10 MB
  • Total for 10 game modes: ~150-200 MB

CPU Usage:

  • Idle: <1% (sleeping between evaluations)
  • During evaluation: 5-15% spike (API calls, Redis queries)
  • Average: 2-5% on dual-core system

Network Traffic:

  • Redis queries: ~1-5 KB/s average
  • RabbitMQ messages: ~0.5-2 KB/s average
  • Pterodactyl API: Burst traffic during provisioning (~100 KB per server spawn)

Configuration Reference

General Settings

| Parameter | Type | Default | Description |
|---|---|---|---|
| log_level | string | INFO | Logging verbosity: DEBUG, INFO, WARNING, ERROR, CRITICAL |
| log_file | string | cloud_controller.log | Log file path (relative or absolute) |
| check_interval | integer | 30 | Seconds between capacity evaluation cycles |
| server_startup_timeout | integer | 180 | Maximum seconds to wait for server ready signal |

Pterodactyl Settings

| Parameter | Type | Description |
|---|---|---|
| panel_url | string | Pterodactyl panel URL (https://panel.astroidmc.net) |
| api_key | string | Application API key from panel |
| node_id | integer | Default node ID for server deployment |

Redis Settings

| Parameter | Type | Description |
|---|---|---|
| host | string | Redis server IP (10.0.1.50) |
| port | integer | Redis port (6379) |
| password | string | Redis authentication password |
| db | integer | Database number (0-15) |
| key_prefix | string | Prefix for all Redis keys |

RabbitMQ Settings

| Parameter | Type | Description |
|---|---|---|
| host | string | RabbitMQ server IP (10.0.1.51) |
| port | integer | AMQP port (5672) |
| username | string | Authentication username |
| password | string | Authentication password |
| queues.* | string | Queue name mappings |

Game Mode Policy Settings

Each game mode under minigames: supports these parameters:

| Parameter | Type | Description |
|---|---|---|
| min_servers | integer | Minimum running servers (baseline capacity) |
| max_servers | integer | Maximum allowed servers (hard limit) |
| empty_buffer_count | integer | Target number of empty servers |
| max_players_per_server | integer | Player capacity per server |
| spawn_threshold_percent | integer | Utilization percentage triggering spawn (0-100) |
| despawn_empty_after | integer | Seconds before empty server termination |
| nest_id | integer | Pterodactyl nest ID |
| egg_id | integer | Pterodactyl egg ID |
| user_id | integer | Default server owner user ID |
| docker_image | string | Docker image for server container |
| startup | string | Startup command for container |
| environment.* | map | Environment variables for container |

Complete configuration documentation: docs/CONFIGURATION.md
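
As one way to catch configuration mistakes before the controller starts, the policy parameters can be checked with a small validator. This is a sketch: the set of required keys (with `environment` treated as optional) and the specific range checks are assumptions derived from the parameter descriptions above.

```python
REQUIRED_KEYS = {
    "min_servers", "max_servers", "empty_buffer_count", "max_players_per_server",
    "spawn_threshold_percent", "despawn_empty_after", "nest_id", "egg_id",
    "user_id", "docker_image", "startup",
}


def validate_policy(name, policy):
    """Return a list of problems; an empty list means the policy looks sane."""
    missing = sorted(REQUIRED_KEYS - set(policy))
    if missing:
        return [f"{name}: missing key {key!r}" for key in missing]

    problems = []
    if not 0 <= policy["min_servers"] <= policy["max_servers"]:
        problems.append(f"{name}: need 0 <= min_servers <= max_servers")
    if not 0 <= policy["spawn_threshold_percent"] <= 100:
        problems.append(f"{name}: spawn_threshold_percent must be within 0-100")
    if policy["max_players_per_server"] <= 0:
        problems.append(f"{name}: max_players_per_server must be positive")
    if policy["despawn_empty_after"] < 0:
        problems.append(f"{name}: despawn_empty_after must not be negative")
    return problems
```

Running it over every entry under `minigames:` after loading `config.yml` (for example with `yaml.safe_load`) gives a list that can be logged and used to refuse startup.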

RabbitMQ Event Schemas

Spawn Request (Inbound)

Queue: spawn_request
Producer: BungeeCord lobby servers, other game servers

{
  "type": "bedwars",
  "players": 8,
  "priority": "normal",
  "requested_by": "lobby-01"
}

Controller Action:

  • Validates game type exists in configuration
  • Checks if spawn is needed (not at max_servers)
  • Provisions server via Pterodactyl API
  • Responds with server_ready message when available
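
The first validation step could be sketched as below. The defaults applied to missing `players` and `priority` fields are assumptions, as is the function name; the schema above does not say which fields are optional.

```python
import json


def parse_spawn_request(body, known_game_types):
    """Decode a spawn_request body and return it as a dict, or raise ValueError."""
    try:
        msg = json.loads(body)
    except (TypeError, ValueError) as exc:  # JSONDecodeError is a ValueError
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(msg, dict):
        raise ValueError("payload must be a JSON object")

    game_type = msg.get("type")
    if not isinstance(game_type, str) or game_type not in known_game_types:
        raise ValueError(f"unknown game type: {game_type!r}")

    players = msg.get("players", 0)  # assumption: a missing count means 0
    if isinstance(players, bool) or not isinstance(players, int) or players < 0:
        raise ValueError("players must be a non-negative integer")

    return {
        "type": game_type,
        "players": players,
        "priority": msg.get("priority", "normal"),  # assumed default
        "requested_by": msg.get("requested_by"),
    }
```

Here `known_game_types` would be the keys under `minigames:` in `config.yml`, so a request for an unconfigured mode is rejected before any Pterodactyl call is made.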

Server Ready (Outbound)

Queue: server_ready
Consumer: BungeeCord proxy servers

{
  "server_id": "bedwars-abc123",
  "type": "bedwars",
  "ip": "10.0.2.10",
  "port": 25565,
  "pterodactyl_id": "12345",
  "timestamp": "2025-10-20T10:30:00Z"
}

Proxy Action:

  • Registers server in proxy server list
  • Begins routing players to new server
  • Updates load balancer state

Server Empty (Inbound)

Queue: server_empty
Producer: Game server plugins

{
  "server_id": "bedwars-abc123",
  "type": "bedwars",
  "timestamp": "2025-10-20T10:35:00Z"
}

Controller Action:

  • Updates Redis state with zero player count
  • Starts idle timer for server
  • Despawns server after despawn_empty_after seconds

Player Count Update (Inbound)

Queue: player_count
Producer: Game server plugins (periodic updates)

{
  "server_id": "bedwars-abc123",
  "type": "bedwars",
  "player_count": 12,
  "max_players": 16,
  "timestamp": "2025-10-20T10:40:00Z"
}

Controller Action:

  • Updates Redis server metadata
  • Used in capacity evaluation calculations
  • Triggers spawn logic if utilization thresholds exceeded

Complete event documentation: docs/EVENTS.md

Network Deployment

Multi-Server Setup

Infrastructure Requirements:

  • Minimum 1 Pterodactyl node with Docker runtime
  • Shared Redis instance accessible from all nodes
  • Shared RabbitMQ instance accessible from all nodes
  • Network connectivity between all components

Deployment Topology:

[Controller VM]
  - Runs cloud-controller Python process
  - Connects to: Redis, RabbitMQ, Pterodactyl API
  - CPU: 2 cores, RAM: 2 GB, Disk: 10 GB

[Redis VM: 10.0.1.50]
  - Standalone Redis instance
  - Persistence: AOF enabled
  - CPU: 1 core, RAM: 1 GB

[RabbitMQ VM: 10.0.1.51]
  - Standalone RabbitMQ instance
  - Persistence: Queue durability enabled
  - CPU: 1 core, RAM: 1 GB

[Pterodactyl Node(s)]
  - Docker host for game servers
  - Managed by Pterodactyl panel
  - CPU: 8+ cores, RAM: 16+ GB per node

Scaling Considerations:

  • Single controller instance (no horizontal scaling needed)
  • Redis can be clustered for high availability
  • RabbitMQ can be clustered for message durability
  • Pterodactyl supports multiple nodes (horizontal scaling)

High Availability Setup

Controller Redundancy:

  • Primary controller runs on VM-CTRL-01
  • Standby controller can run on VM-CTRL-02
  • Use systemd or supervisor for automatic restart
  • No active-active support (single writer to Redis)

Failover Procedure:

  1. Monitor primary controller health
  2. If primary fails, start standby controller
  3. Standby reads state from Redis
  4. Continues capacity management seamlessly

Data Redundancy:

  • Redis: Configure AOF persistence + RDB snapshots
  • RabbitMQ: Enable queue mirroring in cluster mode
  • Pterodactyl: Database backups (MySQL/MariaDB)

Troubleshooting

Common Issues

Controller fails to start:

Symptom: Python process exits immediately with connection error

Diagnostic Steps:

# Test Redis connectivity
redis-cli -h 10.0.1.50 -a astroid_redis_2025 ping

# Test RabbitMQ connectivity
Test-NetConnection -ComputerName 10.0.1.51 -Port 5672

# Check credentials in config.yml
Select-String -Path config.yml -Pattern "password"

Solution:

  • Verify internal network connectivity (VPN active?)
  • Check credentials against internal documentation
  • Ensure Redis/RabbitMQ services are running

Servers not spawning:

Symptom: Controller logs show capacity evaluation, but no Pterodactyl API calls

Diagnostic Steps:

# Enable debug logging
# In config.yml: log_level: DEBUG

# Check capacity calculation
# Look for: "Current utilization: X%, empty servers: Y"

Solution:

  • Verify game mode configuration exists in config.yml
  • Check if max_servers limit reached
  • Confirm spawn_threshold_percent is not too high
  • Verify Pterodactyl API key has correct permissions

Servers not despawning:

Symptom: Empty servers remain running beyond despawn_empty_after

Diagnostic Steps:

# Check Redis server state
redis-cli -h 10.0.1.50 -a astroid_redis_2025 HGETALL "minecraft:servers:bedwars:server-abc123"

# Verify player_count is 0
# Check last_updated timestamp

Solution:

  • Ensure server plugins send server_empty messages
  • Verify RabbitMQ consumer thread is running
  • Check if server count is at min_servers (won't despawn)

Debug Mode

Enable verbose logging in config.yml:

general:
  log_level: DEBUG
  log_file: cloud_controller_debug.log

Debug Output Includes:

  • RabbitMQ message payloads (full JSON)
  • Pterodactyl API request/response bodies
  • Redis query details (keys, values)
  • Capacity evaluation calculations
  • Thread state information

Performance Impact:

  • Increased log file size (~10x larger)
  • Minimal CPU/memory impact (<5% increase)
  • Safe to run in production for diagnostics

Log Location:

  • Windows: cloud-controller\cloud_controller.log
  • Linux: /var/log/cloud-controller/cloud_controller.log (systemd)

Development Guidelines

Adding New Game Modes

1. Update Configuration

Add new game mode to config.yml:

minigames:
  skywars:  # New game mode
    min_servers: 1
    max_servers: 10
    empty_buffer_count: 2
    max_players_per_server: 12
    spawn_threshold_percent: 80
    despawn_empty_after: 300
    nest_id: 1
    egg_id: 5  # SkyWars egg
    user_id: 1
    docker_image: "ghcr.io/astroidmc/skywars-server:1.8.8"
    startup: "java -Xms512M -Xmx2G -jar server.jar"
    environment:
      GAME_MODE: "BUNGEE"
      MAX_PLAYERS: "12"

2. Restart Controller

No code changes required - controller reads configuration on startup.

3. Verify Deployment

  • Controller logs show new game mode in evaluation cycle
  • Redis keys created for new game mode
  • Servers spawn based on min_servers setting

Code Style Standards

Async Operations:

  • Use Pterodactyl API asynchronously where possible
  • Avoid blocking main evaluation thread
  • Use queue.Queue for cross-thread communication

Thread Safety:

  • No shared mutable state between threads
  • Use thread-safe data structures (Queue, redis-py connection pool)
  • Document any new threading requirements

Input Validation:

  • Validate all RabbitMQ message payloads
  • Check Pterodactyl API responses for errors
  • Handle missing configuration keys gracefully

Error Logging:

  • Use logging module (not print statements)
  • Log exceptions with full stack trace
  • Include context in error messages (game mode, server ID, etc.)

Try/Except Usage:

try:
    # Risky operation (API call, Redis query)
    result = pterodactyl_api.create_server(...)
except requests.exceptions.RequestException as e:
    logger.error(f"Failed to create server: {e}", exc_info=True)
    # Don't crash - continue with next evaluation
except Exception as e:
    logger.critical(f"Unexpected error: {e}", exc_info=True)
    # Critical errors may require process restart

Testing Checklist

Before deploying changes:

  • Test with single game mode in config.yml
  • Test with multiple game modes (min 3)
  • Verify server spawn logic with different player counts
  • Verify server despawn after idle timeout
  • Test Redis connection failure (graceful degradation)
  • Test RabbitMQ connection failure (retry logic)
  • Test Pterodactyl API failure (error handling)
  • Run for 1 hour minimum in test environment
  • Monitor memory usage (no leaks)
  • Check log output for errors/warnings

Maintenance

Regular Tasks

Daily:

  • Review controller logs for errors/warnings
  • Check Redis memory usage
  • Verify RabbitMQ queue depths

Weekly:

  • Review Pterodactyl server count vs configuration limits
  • Analyze capacity utilization trends
  • Update Docker images for game servers

Monthly:

  • Review and rotate log files
  • Update Python dependencies (pip install -U -r requirements.txt)
  • Review configuration for optimization opportunities

Database Maintenance

Redis Monitoring:

# Connect to Redis
redis-cli -h 10.0.1.50 -a astroid_redis_2025

# Check memory usage
INFO memory

# List all server keys
KEYS minecraft:servers:*

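# Count servers per game mode (run from the shell, not inside redis-cli)
redis-cli -h 10.0.1.50 -a astroid_redis_2025 --scan --pattern "minecraft:servers:bedwars:*" | wc -l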

Redis Cleanup:

# Remove stale server entries (manual cleanup)
# Find servers not in Pterodactyl anymore

redis-cli -h 10.0.1.50 -a astroid_redis_2025
SCAN 0 MATCH minecraft:servers:* COUNT 100

# Delete specific server
DEL minecraft:servers:bedwars:server-abc123
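
Finding entries that no longer exist in Pterodactyl can be scripted instead of done by hand. The sketch below assumes a redis-py client created with `decode_responses=True` and a collection of live Pterodactyl server IDs obtained separately from the panel API; the function name is hypothetical, and it only reports keys without deleting them.

```python
def find_stale_keys(client, live_pterodactyl_ids, pattern="minecraft:servers:*"):
    """Return Redis keys whose pterodactyl_id is not among the live panel IDs."""
    live = {str(server_id) for server_id in live_pterodactyl_ids}
    stale = []
    # scan_iter walks the keyspace incrementally, unlike KEYS
    for key in client.scan_iter(match=pattern, count=100):
        if client.hget(key, "pterodactyl_id") not in live:
            stale.append(key)
    return stale
```

Reviewing the returned list before issuing `DEL` keeps the cleanup a manual decision, in line with the procedure above.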

RabbitMQ Monitoring:

# List queues
rabbitmqctl list_queues

# Purge stuck messages (emergency only)
rabbitmqctl purge_queue spawn_request

Backup Procedures

Configuration Backup:

# Backup config.yml (automated via Git)
git add config.yml
git commit -m "Update configuration"
git push

Redis Backup:

# Redis handles persistence automatically (AOF + RDB)
# Backup files located in /var/lib/redis/

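# Manual backup (BGSAVE runs in the background)
redis-cli -h 10.0.1.50 -a astroid_redis_2025 BGSAVE

# Wait until LASTSAVE reports a new timestamp, then copy dump.rdb to the backup location
redis-cli -h 10.0.1.50 -a astroid_redis_2025 LASTSAVE
cp /var/lib/redis/dump.rdb /backups/redis/dump-$(date +%Y%m%d).rdb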

Version History

2.0.0 - Major rewrite with RabbitMQ integration

  • Replaced polling with event-driven architecture
  • Added per-game-mode capacity policies
  • Implemented Redis state storage
  • Improved error handling and logging

1.x.x - Legacy versions (deprecated)

  • Simple polling-based capacity management
  • Single game mode support

Maintainer: Stijn Jakobs
Repository: Internal
Last Updated: 2025-10-20

About

An automated system for managing Minecraft minigame arena servers on Pterodactyl nodes. This cloud controller uses RabbitMQ for messaging, the Pterodactyl API for server management, and Redis for synchronization with Orion proxies.
