Internal infrastructure automation system for the AstroidMC network. Manages containerized Minecraft game server lifecycle with auto-scaling, capacity management, and Pterodactyl panel integration.
This controller provides automated lifecycle management of game server instances across the AstroidMC distributed infrastructure. The system implements event-driven orchestration using RabbitMQ message queues, Redis state storage, and the Pterodactyl REST API for container provisioning.
Architecture:
- Asynchronous event processing via RabbitMQ AMQP consumers
- Redis-backed server registry for O(1) lookup performance
- Thread-safe capacity evaluation with 30-second check intervals
- Pterodactyl API integration for Docker container orchestration
- Per-game-mode capacity policies with configurable scaling thresholds
Required:
- Python 3.8+
- Pterodactyl Panel (v1.0+) with application API access
- RabbitMQ server (v3.8+)
- Redis server (v6.0+)
- Docker container runtime on Pterodactyl nodes
Python Packages:
- pika (RabbitMQ client)
- redis (Redis client)
- requests (HTTP client for Pterodactyl API)
- pyyaml (configuration parsing)
Optional:
- BungeeCord/Velocity proxy (for player routing integration)
- Prometheus/Grafana (for metrics visualization)
Connection Configuration:
redis:
  host: "10.0.1.50" # Internal Redis server
  port: 6379
  password: "astroid_redis_2025"
  db: 0
  key_prefix: "minecraft:servers:"
Data Structures Used:
- minecraft:servers:{game_type}:{server_id} - Hash map of server metadata
- Keys: pterodactyl_id, ip, port, player_count, max_players, status, created_at
- TTL: None (persistent until explicit deletion)
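As a rough illustration of the key layout above, the helpers below build and split a server key. The function names are hypothetical and not taken from the controller's code; the default prefix mirrors `key_prefix` from the configuration.

```python
from typing import Tuple

KEY_PREFIX = "minecraft:servers:"  # mirrors key_prefix in config.yml


def server_key(game_type: str, server_id: str, prefix: str = KEY_PREFIX) -> str:
    """Build the Redis key for one server (hypothetical helper)."""
    return f"{prefix}{game_type}:{server_id}"


def split_server_key(key: str, prefix: str = KEY_PREFIX) -> Tuple[str, str]:
    """Return (game_type, server_id) for a key built by server_key()."""
    if not key.startswith(prefix):
        raise ValueError(f"unexpected key prefix: {key!r}")
    game_type, _, server_id = key[len(prefix):].partition(":")
    if not game_type or not server_id:
        raise ValueError(f"malformed server key: {key!r}")
    return game_type, server_id
```

Keeping key construction in one place means a change to `key_prefix` only has to be made once.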
Connection Configuration:
rabbitmq:
  host: "10.0.1.51" # Internal RabbitMQ server
  port: 5672
  username: "cloud_controller"
  password: "astroid_rmq_2025"
  vhost: "/minecraft"
Queue Topology:
queues:
  spawn_request: "spawn_request" # Inbound: Server spawn requests
  server_ready: "server_ready" # Outbound: Server availability notifications
  server_empty: "server_empty" # Inbound: Empty server notifications
  player_count: "player_count" # Inbound: Player count updates
Message Durability:
- All queues configured as durable (survive broker restart)
- Messages published with delivery_mode=2 (persistent)
- Automatic acknowledgment disabled (manual ack after processing)
API Configuration:
pterodactyl:
  panel_url: "https://panel.astroidmc.net"
  api_key: "ptla_xxxxxxxxxxxxxxxxxxxxxxxxxxxx" # Application API key
  node_id: 1 # Default node for server deployment
Required API Permissions:
- server.create - Provision new server instances
- server.read - Query server status and metadata
- server.delete - Deprovision servers
- server.control - Start/stop/restart operations
Each game mode has independent capacity settings. Example configuration for BedWars:
minigames:
  bedwars:
    min_servers: 2 # Minimum running servers (always maintained)
    max_servers: 20 # Maximum allowed servers (hard limit)
    empty_buffer_count: 3 # Target empty servers ready for instant joins
    max_players_per_server: 16 # Capacity per server instance
    spawn_threshold_percent: 75 # Spawn new server when 75% capacity reached
    despawn_empty_after: 300 # Seconds before empty server is terminated
    nest_id: 1 # Pterodactyl nest ID
    egg_id: 3 # Pterodactyl egg ID
    user_id: 1 # Default server owner user ID
    docker_image: "ghcr.io/astroidmc/bedwars-server:1.8.8"
    startup: "java -Xms512M -Xmx2G -jar server.jar"
    environment:
      GAME_MODE: "BUNGEE"
      MAX_PLAYERS: "16"
      SERVER_TYPE: "BEDWARS"
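To show how a game-mode block like the one above could feed a provisioning call, here is a minimal sketch that maps it to a request body for the panel's server-creation endpoint. The function name, the resource limits and the `allocation_id` argument are assumptions (the configuration above does not define them), and the field names follow a common reading of the Pterodactyl application API, so check them against the panel version in use.

```python
from typing import Any, Dict


def build_create_server_payload(
    name: str,
    mode_cfg: Dict[str, Any],
    allocation_id: int,
    memory_mb: int = 2048,
    disk_mb: int = 5120,
) -> Dict[str, Any]:
    """Map one minigames.<mode> config block to a create-server request body."""
    return {
        "name": name,
        "user": mode_cfg["user_id"],
        "egg": mode_cfg["egg_id"],
        "docker_image": mode_cfg["docker_image"],
        "startup": mode_cfg["startup"],
        "environment": dict(mode_cfg.get("environment", {})),
        # Assumed resource limits; the configuration does not specify them.
        "limits": {"memory": memory_mb, "swap": 0, "disk": disk_mb, "io": 500, "cpu": 0},
        "feature_limits": {"databases": 0, "backups": 0, "allocations": 1},
        # Assumed: the caller has already picked a free allocation on the node.
        "allocation": {"default": allocation_id},
    }
```

The resulting dictionary would be sent as JSON with the application API key in a Bearer `Authorization` header; the 2048 MB default simply matches the `-Xmx2G` flag in the startup command.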
The controller evaluates capacity every check_interval seconds (default: 30):
Spawn Decision Logic:
total_capacity = running_servers * max_players_per_server
current_utilization = total_player_count / total_capacity
empty_servers = count(servers where player_count == 0)
if empty_servers < empty_buffer_count:
    spawn_new_server()
if current_utilization >= (spawn_threshold_percent / 100):
    spawn_new_server()
if running_servers < min_servers:
    spawn_servers_to_reach_minimum()
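The spawn rules above can be sketched as a pure function, which makes them easy to unit-test. This is an illustration under stated assumptions, not the controller's actual code: `ModePolicy` and `servers_to_spawn` are hypothetical names, the buffer and utilization rules are collapsed into at most one extra server per cycle, zero running servers is treated as 0% utilization to avoid dividing by zero, and the result is clamped to `max_servers`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModePolicy:
    min_servers: int
    max_servers: int
    empty_buffer_count: int
    max_players_per_server: int
    spawn_threshold_percent: int


def servers_to_spawn(player_counts: List[int], policy: ModePolicy) -> int:
    """Return how many servers to start this cycle.

    player_counts holds one entry per running server.
    """
    running = len(player_counts)
    wanted = 0
    # Always climb back to the configured minimum.
    if running < policy.min_servers:
        wanted = policy.min_servers - running
    empty = sum(1 for count in player_counts if count == 0)
    capacity = running * policy.max_players_per_server
    utilization = sum(player_counts) / capacity if capacity else 0.0
    # A buffer shortfall or a crossed utilization threshold asks for one more server.
    if empty < policy.empty_buffer_count or utilization >= policy.spawn_threshold_percent / 100:
        wanted = max(wanted, 1)
    # Never exceed the hard limit.
    return max(0, min(wanted, policy.max_servers - running))
```

With the BedWars policy, an empty fleet yields 2 (the minimum), and a fleet with three full servers plus one empty server yields 1 because the empty buffer is below 3.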
Despawn Decision Logic:
for each server:
    if player_count == 0 AND idle_time >= despawn_empty_after:
        if running_servers > min_servers:
            despawn_server()
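A minimal Python rendering of the despawn rule is shown below. The function name and the shape of `idle_seconds` are assumptions made for the example; removing the longest-idle servers first is an arbitrary choice that the pseudocode does not prescribe.

```python
from typing import Dict, List


def servers_to_despawn(
    idle_seconds: Dict[str, float],
    running_servers: int,
    min_servers: int,
    despawn_empty_after: int,
) -> List[str]:
    """Pick empty servers whose idle time has expired, keeping min_servers running.

    idle_seconds maps server_id -> seconds since the server became empty;
    servers that still have players are simply absent from the mapping.
    """
    victims: List[str] = []
    # Longest-idle first, so the oldest empty servers are removed first.
    for server_id, idle in sorted(idle_seconds.items(), key=lambda kv: -kv[1]):
        if idle >= despawn_empty_after and running_servers - len(victims) > min_servers:
            victims.append(server_id)
    return victims
```

Like the pseudocode, this sketch does not consult `empty_buffer_count`; whether the real controller keeps the buffer alive when despawning is not stated here.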
Example Scaling Behavior (BedWars):
- 0 players: 2 servers running (min_servers)
- 16 players (100% of 1 server): 3-4 servers running (1 active + buffer)
- 48 players (75% of 4 servers): 5-6 servers running (triggers spawn threshold)
- 320 players: 20 servers running (max_servers limit reached)
Server Provisioning Times:
- Pterodactyl API call: ~500ms
- Docker container start: 15-30 seconds
- Minecraft server initialization: 20-60 seconds
- Total time to accepting connections: 40-90 seconds
Capacity Evaluation Performance:
- Redis query time: <5ms per game mode
- Full evaluation cycle: <100ms for 10 game modes
- RabbitMQ message processing: <10ms per message
Windows Environment:
- Python 3.8+ installed and added to PATH
- Git for Windows
- Access to AstroidMC internal network (VPN required if remote)
Network Access Required:
- panel.astroidmc.net:443 (Pterodactyl API)
- 10.0.1.50:6379 (Redis)
- 10.0.1.51:5672 (RabbitMQ)
1. Clone Repository
git clone https://github.com/astroidmc/cloud-controller.git
cd cloud-controller
2. Install Dependencies
pip install -r requirements.txt
3. Configuration
# Create configuration from template
copy config.yml.example config.yml
# Edit with internal credentials
notepad config.yml
Update the following sections with AstroidMC internal values:
- Pterodactyl API key (obtain from panel.astroidmc.net/admin/api)
- Redis password (check internal documentation)
- RabbitMQ credentials (check internal documentation)
4. Verify Dependencies
python -c "import pika, redis, requests, yaml; print('Dependencies OK')"
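The one-liner above only confirms that the packages import; it does not touch the network. A minimal TCP reachability probe for the three endpoints listed under "Network Access Required" might look like the sketch below. `can_connect` and `check_endpoints` are hypothetical helper names, and a successful TCP connect says nothing about whether the credentials are valid.

```python
import socket
from typing import Dict, Tuple

# Endpoints from the "Network Access Required" list.
ENDPOINTS: Dict[str, Tuple[str, int]] = {
    "pterodactyl": ("panel.astroidmc.net", 443),
    "redis": ("10.0.1.50", 6379),
    "rabbitmq": ("10.0.1.51", 5672),
}


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(
    endpoints: Dict[str, Tuple[str, int]], timeout: float = 3.0
) -> Dict[str, bool]:
    """Probe every endpoint and report reachability by name."""
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

Calling `check_endpoints(ENDPOINTS)` from a machine on the internal network (or over the VPN) returns a name-to-boolean mapping that can be printed or logged before starting the controller.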
5. Run Controller
# Development mode (foreground)
python main.py
# Production mode (see Deployment section)
No Local Caching:
- Redis acts as single source of truth for server state
- All server lookups query Redis directly (O(1) hash operations)
- No in-memory cache in controller process
- Prevents stale data across controller restarts
Redis Key Structure:
minecraft:servers:bedwars:server-abc123
{
  "pterodactyl_id": "12345",
  "ip": "10.0.2.10",
  "port": 25565,
  "player_count": 8,
  "max_players": 16,
  "status": "running",
  "created_at": "2025-10-20T10:30:00Z",
  "last_updated": "2025-10-20T10:35:00Z"
}
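Redis hashes store every field as a string, so numeric fields such as `port` and `player_count` need converting when they are read back (for example from `HGETALL` via redis-py with `decode_responses=True`). The dataclass below is a hypothetical sketch of that conversion, not the controller's actual model; falling back to `created_at` when `last_updated` is missing is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ServerRecord:
    pterodactyl_id: str
    ip: str
    port: int
    player_count: int
    max_players: int
    status: str
    created_at: str
    last_updated: str

    @classmethod
    def from_hash(cls, fields: Dict[str, str]) -> "ServerRecord":
        """Convert the all-string mapping returned by HGETALL into typed fields."""
        return cls(
            pterodactyl_id=fields["pterodactyl_id"],
            ip=fields["ip"],
            port=int(fields["port"]),
            player_count=int(fields["player_count"]),
            max_players=int(fields["max_players"]),
            status=fields["status"],
            created_at=fields["created_at"],
            # Assumed fallback for records written before last_updated existed.
            last_updated=fields.get("last_updated", fields["created_at"]),
        )

    def is_empty(self) -> bool:
        """True when no players are connected."""
        return self.player_count == 0
```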
Thread Layout:
- Main thread runs capacity evaluation loop
- Separate thread for RabbitMQ consumer
- Thread-safe queue for message passing between threads
- No shared mutable state between threads
Concurrency Model:
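import time

# Main thread: Capacity evaluation (blocking)
while True:
    evaluate_all_game_modes()  # Sequential evaluation
    time.sleep(check_interval)

# Consumer thread: RabbitMQ message processing
def callback(ch, method, properties, body):
    message_queue.put(body)  # Thread-safe queue.Queue hand-off to the main thread
    # Acknowledged on hand-off: a message still waiting in message_queue
    # when the process dies is not redelivered by RabbitMQ.
    ch.basic_ack(delivery_tag=method.delivery_tag)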
Critical Sections:
- Pterodactyl API calls use requests library (thread-safe)
- Redis operations use redis-py (connection pooling, thread-safe)
- RabbitMQ message queue uses queue.Queue (thread-safe)
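The hand-off between the consumer thread and the main thread can be tried without a broker. In the sketch below a plain function stands in for the pika callback and feeds a standard-library `queue.Queue`; `consumer` and `drain` are hypothetical names, and the real controller's internals may differ.

```python
import queue
import threading
from typing import List

message_queue: "queue.Queue[bytes]" = queue.Queue()


def consumer(bodies: List[bytes]) -> None:
    """Stand-in for the pika callback: hand each message body to the main thread."""
    for body in bodies:
        message_queue.put(body)


def drain(max_messages: int = 100) -> List[bytes]:
    """Main-thread side: pull whatever is waiting without blocking."""
    drained: List[bytes] = []
    while len(drained) < max_messages:
        try:
            drained.append(message_queue.get_nowait())
        except queue.Empty:
            break
    return drained


worker = threading.Thread(
    target=consumer,
    args=([b'{"type": "bedwars"}', b'{"type": "skywars"}'],),
    daemon=True,
)
worker.start()
worker.join()      # the real consumer thread would keep running
pending = drain()  # would be called once per evaluation cycle
```

Because only the queue is shared, neither side needs an explicit lock, which is the property the "no shared mutable state" rule relies on.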
Normal Shutdown:
- SIGTERM/SIGINT signal received
- Stop accepting new RabbitMQ messages
- Process remaining messages in queue
- Close RabbitMQ connection gracefully
- Close Redis connection pool
- Exit process (exit code 0)
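The shutdown sequence above can be sketched with a `threading.Event` that the signal handler sets and the main loop watches. All names here are hypothetical; the RabbitMQ and Redis teardown steps are left as comments because they depend on the controller's own connection objects.

```python
import queue
import signal
import threading
from typing import Callable

stop_requested = threading.Event()


def request_stop(signum: int, frame: object) -> None:
    """Signal handler: only set a flag; the main loop does the actual cleanup."""
    stop_requested.set()


def install_handlers() -> None:
    """Register the handler for SIGINT and SIGTERM (call from the main thread)."""
    signal.signal(signal.SIGINT, request_stop)
    signal.signal(signal.SIGTERM, request_stop)


def run(
    pending: "queue.Queue[bytes]",
    handle: Callable[[bytes], None],
    check_interval: float,
) -> int:
    """Loop until a stop is requested, then process what is left and return 0."""
    while not stop_requested.is_set():
        # Capacity evaluation would run here.
        stop_requested.wait(timeout=check_interval)  # wakes early on a signal
    # Process remaining messages in the queue.
    while True:
        try:
            handle(pending.get_nowait())
        except queue.Empty:
            break
    # Closing the RabbitMQ connection and the Redis pool would follow here.
    return 0
```

Waiting on the event instead of calling `time.sleep` lets the process react to SIGTERM immediately rather than after the current 30-second interval.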
Emergency Shutdown:
- Server state persists in Redis (no data loss)
- RabbitMQ messages requeued if not acknowledged
- Pterodactyl servers continue running independently
- Controller restart resumes from Redis state
Data Loss Scenarios:
- Redis failure: Controller cannot operate (fails fast)
- RabbitMQ failure: Messages the broker has not yet persisted may be lost (rare); unacknowledged deliveries are requeued
- Pterodactyl API failure: Retry logic with exponential backoff
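The retry behaviour for Pterodactyl API failures could look roughly like the helper below. It is a generic sketch rather than the controller's actual retry code: the attempt count, delays and default exception types are assumptions. When wrapping `requests` calls, pass `retry_on=(requests.exceptions.RequestException,)`, since the `requests` exceptions do not derive from the built-in `ConnectionError`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(); on a retryable error wait base_delay * 2**n (capped) and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller log and move on
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter exists so tests can record the delays instead of actually waiting.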
Memory Usage:
- Base process: ~50-100 MB
- Per game mode: ~5-10 MB
- Total for 10 game modes: ~150-200 MB
CPU Usage:
- Idle: <1% (sleeping between evaluations)
- During evaluation: 5-15% spike (API calls, Redis queries)
- Average: 2-5% on dual-core system
Network Traffic:
- Redis queries: ~1-5 KB/s average
- RabbitMQ messages: ~0.5-2 KB/s average
- Pterodactyl API: Burst traffic during provisioning (~100 KB per server spawn)
| Parameter | Type | Default | Description |
|---|---|---|---|
| log_level | string | INFO | Logging verbosity: DEBUG, INFO, WARNING, ERROR, CRITICAL |
| log_file | string | cloud_controller.log | Log file path (relative or absolute) |
| check_interval | integer | 30 | Seconds between capacity evaluation cycles |
| server_startup_timeout | integer | 180 | Maximum seconds to wait for server ready signal |
| Parameter | Type | Description |
|---|---|---|
| panel_url | string | Pterodactyl panel URL (https://panel.astroidmc.net) |
| api_key | string | Application API key from panel |
| node_id | integer | Default node ID for server deployment |
| Parameter | Type | Description |
|---|---|---|
| host | string | Redis server IP (10.0.1.50) |
| port | integer | Redis port (6379) |
| password | string | Redis authentication password |
| db | integer | Database number (0-15) |
| key_prefix | string | Prefix for all Redis keys |
| Parameter | Type | Description |
|---|---|---|
| host | string | RabbitMQ server IP (10.0.1.51) |
| port | integer | AMQP port (5672) |
| username | string | Authentication username |
| password | string | Authentication password |
| queues.* | string | Queue name mappings |
Each game mode under the minigames key supports these parameters:
| Parameter | Type | Description |
|---|---|---|
| min_servers | integer | Minimum running servers (baseline capacity) |
| max_servers | integer | Maximum allowed servers (hard limit) |
| empty_buffer_count | integer | Target number of empty servers |
| max_players_per_server | integer | Player capacity per server |
| spawn_threshold_percent | integer | Utilization percentage triggering spawn (0-100) |
| despawn_empty_after | integer | Seconds before empty server termination |
| nest_id | integer | Pterodactyl nest ID |
| egg_id | integer | Pterodactyl egg ID |
| user_id | integer | Default server owner user ID |
| docker_image | string | Docker image for server container |
| startup | string | Startup command for container |
| environment.* | map | Environment variables for container |
Complete configuration documentation: docs/CONFIGURATION.md
Queue: spawn_request
Producer: BungeeCord lobby servers, other game servers
{
  "type": "bedwars",
  "players": 8,
  "priority": "normal",
  "requested_by": "lobby-01"
}
Controller Action:
- Validates game type exists in configuration
- Checks if spawn is needed (not at max_servers)
- Provisions server via Pterodactyl API
- Responds with server_ready message when available
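The first step, validating the payload, might be sketched as follows. `parse_spawn_request` and `InvalidMessage` are hypothetical names; treating `players` and `priority` as optional with defaults of `0` and `"normal"` is an assumption, since the message format above does not say which fields are required.

```python
import json
from typing import Any, Dict, Set


class InvalidMessage(ValueError):
    """Raised when a spawn_request payload cannot be used."""


def parse_spawn_request(body: bytes, known_types: Set[str]) -> Dict[str, Any]:
    """Decode and validate one spawn_request message body."""
    try:
        msg = json.loads(body)
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise InvalidMessage(f"not valid JSON: {exc}") from exc
    if not isinstance(msg, dict):
        raise InvalidMessage("payload must be a JSON object")
    game_type = msg.get("type")
    # The game type must match a key under minigames in config.yml.
    if not isinstance(game_type, str) or game_type not in known_types:
        raise InvalidMessage(f"unknown game type: {game_type!r}")
    players = msg.get("players", 0)
    if isinstance(players, bool) or not isinstance(players, int) or players < 0:
        raise InvalidMessage(f"players must be a non-negative integer, got {players!r}")
    return {
        "type": game_type,
        "players": players,
        "priority": msg.get("priority", "normal"),
        "requested_by": msg.get("requested_by"),
    }
```

A consumer would catch `InvalidMessage`, log the payload with context, and acknowledge the message so that a malformed request is not redelivered forever.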
Queue: server_ready
Consumer: BungeeCord proxy servers
{
  "server_id": "bedwars-abc123",
  "type": "bedwars",
  "ip": "10.0.2.10",
  "port": 25565,
  "pterodactyl_id": "12345",
  "timestamp": "2025-10-20T10:30:00Z"
}
Proxy Action:
- Registers server in proxy server list
- Begins routing players to new server
- Updates load balancer state
Queue: server_empty
Producer: Game server plugins
{
  "server_id": "bedwars-abc123",
  "type": "bedwars",
  "timestamp": "2025-10-20T10:35:00Z"
}
Controller Action:
- Updates Redis state with zero player count
- Starts idle timer for server
- Despawns server after despawn_empty_after seconds
Queue: player_count
Producer: Game server plugins (periodic updates)
{
  "server_id": "bedwars-abc123",
  "type": "bedwars",
  "player_count": 12,
  "max_players": 16,
  "timestamp": "2025-10-20T10:40:00Z"
}
Controller Action:
- Updates Redis server metadata
- Used in capacity evaluation calculations
- Triggers spawn logic if utilization thresholds exceeded
Complete event documentation: docs/EVENTS.md
Infrastructure Requirements:
- Minimum 1 Pterodactyl node with Docker runtime
- Shared Redis instance accessible from all nodes
- Shared RabbitMQ instance accessible from all nodes
- Network connectivity between all components
Deployment Topology:
[Controller VM]
- Runs cloud-controller Python process
- Connects to: Redis, RabbitMQ, Pterodactyl API
- CPU: 2 cores, RAM: 2 GB, Disk: 10 GB
[Redis VM: 10.0.1.50]
- Standalone Redis instance
- Persistence: AOF enabled
- CPU: 1 core, RAM: 1 GB
[RabbitMQ VM: 10.0.1.51]
- Standalone RabbitMQ instance
- Persistence: Queue durability enabled
- CPU: 1 core, RAM: 1 GB
[Pterodactyl Node(s)]
- Docker host for game servers
- Managed by Pterodactyl panel
- CPU: 8+ cores, RAM: 16+ GB per node
Scaling Considerations:
- Single controller instance (no horizontal scaling needed)
- Redis can be clustered for high availability
- RabbitMQ can be clustered for message durability
- Pterodactyl supports multiple nodes (horizontal scaling)
Controller Redundancy:
- Primary controller runs on VM-CTRL-01
- Standby controller can run on VM-CTRL-02
- Use systemd or supervisor for automatic restart
- No active-active support (single writer to Redis)
Failover Procedure:
- Monitor primary controller health
- If primary fails, start standby controller
- Standby reads state from Redis
- Continues capacity management seamlessly
Data Redundancy:
- Redis: Configure AOF persistence + RDB snapshots
- RabbitMQ: Enable queue mirroring in cluster mode
- Pterodactyl: Database backups (MySQL/MariaDB)
Controller fails to start:
Symptom: Python process exits immediately with connection error
Diagnostic Steps:
# Test Redis connectivity
redis-cli -h 10.0.1.50 -a astroid_redis_2025 ping
# Test RabbitMQ connectivity
Test-NetConnection -ComputerName 10.0.1.51 -Port 5672
# Check credentials in config.yml
Select-String -Path config.yml -Pattern "password"
Solution:
- Verify internal network connectivity (VPN active?)
- Check credentials against internal documentation
- Ensure Redis/RabbitMQ services are running
Servers not spawning:
Symptom: Controller logs show capacity evaluation, but no Pterodactyl API calls
Diagnostic Steps:
# Enable debug logging
# In config.yml: log_level: DEBUG
# Check capacity calculation
# Look for: "Current utilization: X%, empty servers: Y"
Solution:
- Verify game mode configuration exists in config.yml
- Check if max_servers limit reached
- Confirm spawn_threshold_percent is not too high
- Verify Pterodactyl API key has correct permissions
Servers not despawning:
Symptom: Empty servers remain running beyond despawn_empty_after
Diagnostic Steps:
# Check Redis server state
redis-cli -h 10.0.1.50 -a astroid_redis_2025 HGETALL "minecraft:servers:bedwars:server-abc123"
# Verify player_count is 0
# Check last_updated timestamp
Solution:
- Ensure server plugins send server_empty messages
- Verify RabbitMQ consumer thread is running
- Check if server count is at min_servers (won't despawn)
Enable verbose logging in config.yml:
general:
  log_level: DEBUG
  log_file: cloud_controller_debug.log
Debug Output Includes:
- RabbitMQ message payloads (full JSON)
- Pterodactyl API request/response bodies
- Redis query details (keys, values)
- Capacity evaluation calculations
- Thread state information
Performance Impact:
- Increased log file size (~10x larger)
- Minimal CPU/memory impact (<5% increase)
- Safe to run in production for diagnostics
Log Location:
- Windows: cloud-controller\cloud_controller.log
- Linux: /var/log/cloud-controller/cloud_controller.log (systemd)
1. Update Configuration
Add new game mode to config.yml:
minigames:
  skywars: # New game mode
    min_servers: 1
    max_servers: 10
    empty_buffer_count: 2
    max_players_per_server: 12
    spawn_threshold_percent: 80
    despawn_empty_after: 300
    nest_id: 1
    egg_id: 5 # SkyWars egg
    user_id: 1
    docker_image: "ghcr.io/astroidmc/skywars-server:1.8.8"
    startup: "java -Xms512M -Xmx2G -jar server.jar"
    environment:
      GAME_MODE: "BUNGEE"
      MAX_PLAYERS: "12"
2. Restart Controller
No code changes required - controller reads configuration on startup.
3. Verify Deployment
- Controller logs show new game mode in evaluation cycle
- Redis keys created for new game mode
- Servers spawn based on min_servers setting
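Because a new game mode is picked up purely from configuration, a quick sanity check on the parsed block before restarting can catch typos early. The validator below is a hypothetical sketch: the list of required keys is taken from the parameter table, `environment` is treated as optional, and the specific range checks are assumptions.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = (
    "min_servers", "max_servers", "empty_buffer_count", "max_players_per_server",
    "spawn_threshold_percent", "despawn_empty_after", "nest_id", "egg_id",
    "user_id", "docker_image", "startup",
)


def validate_game_mode(name: str, cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the block looks sane."""
    problems = [f"{name}: missing key '{key}'" for key in REQUIRED_KEYS if key not in cfg]
    if problems:
        return problems
    if cfg["min_servers"] > cfg["max_servers"]:
        problems.append(f"{name}: min_servers exceeds max_servers")
    if not 0 <= cfg["spawn_threshold_percent"] <= 100:
        problems.append(f"{name}: spawn_threshold_percent must be within 0-100")
    if cfg["max_players_per_server"] < 1:
        problems.append(f"{name}: max_players_per_server must be at least 1")
    return problems
```

The input is the dictionary that a YAML loader such as `yaml.safe_load` produces for one entry under `minigames`.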
Async Operations:
- Use Pterodactyl API asynchronously where possible
- Avoid blocking main evaluation thread
- Use queue.Queue for cross-thread communication
Thread Safety:
- No shared mutable state between threads
- Use thread-safe data structures (Queue, redis-py connection pool)
- Document any new threading requirements
Input Validation:
- Validate all RabbitMQ message payloads
- Check Pterodactyl API responses for errors
- Handle missing configuration keys gracefully
Error Logging:
- Use logging module (not print statements)
- Log exceptions with full stack trace
- Include context in error messages (game mode, server ID, etc.)
Try/Except Usage:
import logging
import requests

logger = logging.getLogger(__name__)

try:
    # Risky operation (API call, Redis query)
    result = pterodactyl_api.create_server(...)
except requests.exceptions.RequestException as e:
    logger.error(f"Failed to create server: {e}", exc_info=True)
    # Don't crash - continue with next evaluation
except Exception as e:
    logger.critical(f"Unexpected error: {e}", exc_info=True)
    # Critical errors may require process restart
Before deploying changes:
- Test with single game mode in config.yml
- Test with multiple game modes (min 3)
- Verify server spawn logic with different player counts
- Verify server despawn after idle timeout
- Test Redis connection failure (graceful degradation)
- Test RabbitMQ connection failure (retry logic)
- Test Pterodactyl API failure (error handling)
- Run for 1 hour minimum in test environment
- Monitor memory usage (no leaks)
- Check log output for errors/warnings
Daily:
- Review controller logs for errors/warnings
- Check Redis memory usage
- Verify RabbitMQ queue depths
Weekly:
- Review Pterodactyl server count vs configuration limits
- Analyze capacity utilization trends
- Update Docker images for game servers
Monthly:
- Review and rotate log files
- Update Python dependencies (pip install -U -r requirements.txt)
- Review configuration for optimization opportunities
Redis Monitoring:
# Connect to Redis
redis-cli -h 10.0.1.50 -a astroid_redis_2025
# Check memory usage
INFO memory
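# List all server keys (KEYS blocks Redis while it runs; prefer SCAN on a busy instance)
KEYS minecraft:servers:*
# Count servers per game mode (run from the shell, not inside the redis-cli session)
redis-cli -h 10.0.1.50 -a astroid_redis_2025 --scan --pattern "minecraft:servers:bedwars:*" | wc -l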
Redis Cleanup:
# Remove stale server entries (manual cleanup)
# Find servers not in Pterodactyl anymore
redis-cli -h 10.0.1.50 -a astroid_redis_2025
SCAN 0 MATCH minecraft:servers:* COUNT 100
# Delete specific server
DEL minecraft:servers:bedwars:server-abc123
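Finding "servers not in Pterodactyl anymore" comes down to a set difference between what Redis holds and what the panel reports. The helper below is a hypothetical sketch of that comparison only; gathering the inputs (for example with redis-py's `scan_iter(match="minecraft:servers:*")` plus `hget(key, "pterodactyl_id")`, and a server listing from the panel API) is left to the caller.

```python
from typing import Dict, Iterable, List, Set


def find_stale_keys(redis_servers: Dict[str, str], panel_ids: Iterable[str]) -> List[str]:
    """Return Redis keys whose pterodactyl_id no longer exists on the panel.

    redis_servers maps Redis key -> pterodactyl_id (read from each hash);
    panel_ids lists the server IDs the Pterodactyl API currently reports.
    """
    live: Set[str] = {str(pid) for pid in panel_ids}
    return sorted(key for key, pid in redis_servers.items() if str(pid) not in live)
```

Reviewing the returned keys by hand before issuing `DEL` keeps the cleanup manual, as the procedure above intends.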
RabbitMQ Monitoring:
# List queues
rabbitmqctl list_queues
# Purge stuck messages (emergency only)
rabbitmqctl purge_queue spawn_request
Configuration Backup:
# Backup config.yml (automated via Git)
git add config.yml
git commit -m "Update configuration"
git push
Redis Backup:
# Redis handles persistence automatically (AOF + RDB)
# Backup files located in /var/lib/redis/
# Manual backup
redis-cli -h 10.0.1.50 -a astroid_redis_2025 BGSAVE
# Copy dump.rdb to backup location
cp /var/lib/redis/dump.rdb /backups/redis/dump-$(date +%Y%m%d).rdb
2.0.0 - Major rewrite with RabbitMQ integration
- Replaced polling with event-driven architecture
- Added per-game-mode capacity policies
- Implemented Redis state storage
- Improved error handling and logging
1.x.x - Legacy versions (deprecated)
- Simple polling-based capacity management
- Single game mode support
Maintainer: Stijn Jakobs
Repository: Internal
Last Updated: 2025-10-20