
hanrok commented Nov 18, 2025

No description provided.

This commit adds support for mlx-whisper as a backend option, optimized
for Apple Silicon (M1/M2/M3) Macs. MLX leverages Apple's Metal GPU and
unified memory for hardware-accelerated inference.

Changes:
- Add MLX_WHISPER to the BackendType enum with an is_mlx_whisper() method
- Create the WhisperMLX transcriber wrapper (transcriber_mlx.py; sketched
  after this list)
  * Maps standard model sizes to MLX community models
  * Implements transcribe() with language-detection support
  * Returns segments compatible with the WhisperLive base interface
- Create the ServeClientMLXWhisper backend class (mlx_whisper_backend.py;
  skeleton below)
  * Extends ServeClientBase with an MLX-specific implementation
  * Supports single-model mode for memory efficiency
  * Thread-safe model access with locking
  * Graceful fallback to faster_whisper if MLX is unavailable
- Update server.py initialize_client() to instantiate the MLX backend
- Update the run_server.py CLI to include mlx_whisper in the backend options
- Add the mlx-whisper dependency to requirements/server.txt
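
A minimal sketch of what the WhisperMLX wrapper could look like, assuming
the documented mlx_whisper.transcribe() API; the mlx-community repo names
in the mapping are assumptions, and the real table in transcriber_mlx.py
may differ:

  import mlx_whisper

  class WhisperMLX:
      # Map standard model sizes to MLX community repos (assumed names;
      # see https://huggingface.co/mlx-community for the published models).
      MODEL_MAP = {
          "tiny": "mlx-community/whisper-tiny",
          "small.en": "mlx-community/whisper-small.en-mlx",
          "large-v3": "mlx-community/whisper-large-v3-mlx",
      }

      def __init__(self, model_size):
          # Accept a known size, or pass a full HF repo path straight through.
          self.model_path = self.MODEL_MAP.get(model_size, model_size)

      def transcribe(self, audio, language=None):
          # language=None lets mlx-whisper auto-detect; the result dict
          # carries "segments" and the detected "language".
          result = mlx_whisper.transcribe(
              audio, path_or_hf_repo=self.model_path, language=language
          )
          return result["segments"], result["language"]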

The backend follows the same pluggable architecture as other backends
(faster_whisper, tensorrt, openvino) and implements the required
transcribe_audio() and handle_transcription_output() methods.
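
A hedged skeleton of how the locking and those two required methods could
fit together; the import path is illustrative and the helper names in
handle_transcription_output() follow the faster_whisper backend's pattern,
so verify both against the actual code:

  import threading

  from whisper_live.serve_client_base import ServeClientBase  # assumed path

  class ServeClientMLXWhisper(ServeClientBase):
      # One shared model instance for single-model mode, guarded by a lock
      # so concurrent clients never decode on it at the same time.
      SINGLE_MODEL = None
      SINGLE_MODEL_LOCK = threading.Lock()

      def transcribe_audio(self, input_sample):
          # Serialize access to the shared MLX model while decoding.
          with self.SINGLE_MODEL_LOCK:
              segments, _language = self.transcriber.transcribe(
                  input_sample, language=self.language
              )
          return segments

      def handle_transcription_output(self, result, duration):
          # Fold new segments into the running transcript and push them to
          # the websocket client, mirroring the faster_whisper backend.
          if result:
              last_segment = self.update_segments(result, duration)
              segments = self.prepare_segments(last_segment)
              if segments:
                  self.send_transcription_to_client(segments)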

Usage:
  python run_server.py --backend mlx_whisper --model small.en

A follow-up commit adds:

1. MLX Model Path CLI Argument:
   - Add a --mlx_model_path/-mlx argument to run_server.py
   - The server can now specify the MLX model, overriding the client's choice
   - Supports model sizes (e.g. small.en) and HF repos (e.g. mlx-community/whisper-large-v3-turbo)
   - Integrated into single_model mode support
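
   The wiring for that flag might look like the following; the argument
   name comes from this description, while the surrounding parser setup is
   assumed from run_server.py:

     import argparse

     parser = argparse.ArgumentParser()  # the existing run_server.py parser
     parser.add_argument(
         "--mlx_model_path", "-mlx",
         type=str,
         default=None,
         help="MLX model for the server to load: a size such as 'small.en' "
              "or a HF repo such as 'mlx-community/whisper-large-v3-turbo'. "
              "Overrides the model requested by the client.",
     )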

2. Server-side MLX Model Configuration:
   - Update server.run() to accept an mlx_model_path parameter
   - Update recv_audio(), handle_new_connection(), and initialize_client()
   - The MLX backend now uses the server-specified model when provided
   - Falls back to the client-specified model when it is not
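
   The precedence rule reduces to a single expression; the names below are
   illustrative rather than the actual server.py identifiers:

     def resolve_mlx_model(server_model_path, client_model):
         # --mlx_model_path wins when set; otherwise honor the model the
         # client requested in its initial websocket options.
         return server_model_path if server_model_path else client_model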

3. Microphone Test Script (test_mlx_microphone.py):
   - Real-time transcription test with microphone input
   - Command-line args for host, port, model, language, translate
   - User-friendly interface with status messages
   - Saves output to SRT file
   - Proper error handling and cleanup
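
   Such a script can be built on the existing WhisperLive client; the
   keyword arguments below follow the whisper_live.client.TranscriptionClient
   API as shown in the project README, so treat them as assumptions to
   verify against your version:

     from whisper_live.client import TranscriptionClient

     client = TranscriptionClient(
         "localhost", 9090,   # host and port of the running server
         lang="en",
         translate=False,
         model="small.en",
     )
     client()  # with no arguments, the client streams from the microphone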

4. GPU Verification Tool (verify_mlx_gpu.py):
   - Checks if running on Apple Silicon (M1/M2/M3)
   - Verifies MLX and mlx-whisper installation
   - Tests GPU access with a sample computation
   - Optional MLX Whisper model loading test
   - Provides instructions for monitoring GPU usage:
     * Activity Monitor (GUI)
     * powermetrics (Terminal)
     * asitop (Third-party)
   - Comprehensive summary with actionable recommendations
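
   The platform check and the sample computation need only the standard
   library plus documented MLX calls; a minimal sketch:

     import platform

     def verify_mlx_gpu():
         # Apple Silicon Macs report Darwin/arm64.
         if platform.system() != "Darwin" or platform.machine() != "arm64":
             print("Not on Apple Silicon; MLX acceleration is unavailable.")
             return

         try:
             import mlx.core as mx
         except ImportError:
             print("MLX not installed: pip install mlx mlx-whisper")
             return

         # Sample computation: a matmul evaluated on the default device,
         # which is the GPU on Apple Silicon.
         a = mx.random.normal((1024, 1024))
         mx.eval(a @ a)
         print(f"MLX default device: {mx.default_device()}")

     if __name__ == "__main__":
         verify_mlx_gpu()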

Usage Examples:
  # Server with specific MLX model
  python run_server.py --backend mlx_whisper --mlx_model_path small.en

  # Verify GPU functionality
  python verify_mlx_gpu.py

  # Test with microphone
  python test_mlx_microphone.py --model small.en --lang en
zoq (Contributor) commented Nov 24, 2025

This is huge; I will test the backend this week.
