# LMX Setup
This guide covers installing and configuring Opta LMX on your Apple Silicon machine. LMX is designed to run on a dedicated inference server — typically a Mac Studio on your local network.
## Hardware Requirements
LMX requires Apple Silicon with sufficient unified memory to load your target models. The minimum and recommended configurations are:
### Recommended Configurations
| Tier | Chip | Memory | Model Range |
|---|---|---|---|
| Minimum | M1 Pro / M2 Pro | 32GB | 7B - 14B models |
| Recommended | M2 Ultra / M3 Ultra | 64GB - 128GB | 30B - 70B models |
| Ideal | M3 Ultra | 192GB | 70B+ models, multiple concurrent |
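The tiers above follow from a simple back-of-the-envelope calculation: a quantized model's weight footprint is roughly parameter count × bits per weight, plus headroom for the KV cache and activations. A minimal sketch (the 1.25× overhead factor is an assumption for illustration, not an LMX constant):

```python
def estimate_model_gib(params_billion: float, bits: int = 4,
                       overhead: float = 1.25) -> float:
    """Rough unified-memory footprint for a quantized model, in GiB.

    params_billion: parameter count in billions (e.g. 30 for a 30B model)
    bits:           quantization width in bits per weight
    overhead:       assumed fudge factor for KV cache / activations
    """
    weight_bytes = params_billion * 1e9 * bits / 8
    return weight_bytes * overhead / 2**30

# A 4-bit 30B model comes out around 17.5 GiB, which is consistent with
# the ~18GB figure LMX reports when loading Qwen3-30B (see startup log below),
# and fits comfortably on a 32GB machine.
print(f"{estimate_model_gib(30):.1f} GiB")
```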
## Python Environment
LMX requires Python 3.12 or later. Use a virtual environment to isolate dependencies:
Check your Python version:

```
python3 --version
Python 3.12.8
```

Create and activate a virtual environment:

```
python3 -m venv .venv && source .venv/bin/activate
```

The `.venv/` directory is excluded from Syncthing via `.stignore`.

## Installation
### pip Install
```
pip install -e '.[dev]'
```

This installs LMX from the local source tree. The `-e` flag enables editable mode, so changes to the source are reflected immediately without reinstalling.
Key dependencies installed:

- `mlx` / `mlx-lm` — Apple MLX framework and model utilities
- `fastapi` + `uvicorn` — HTTP server
- `transformers` — tokenizer support
- `huggingface-hub` — model downloading
## Configuration
LMX reads configuration from environment variables and an optional config file. The primary settings are:
```toml
[server]
host = "0.0.0.0"
port = 1234

[models]
# Default model to load on startup
default = "mlx-community/Qwen3-30B-A3B-4bit"

# Model search paths
paths = [
  "~/.cache/huggingface/hub",
  "~/models"
]

[inference]
max_tokens = 4096
temperature = 0.7
context_length = 32768

[memory]
# Maximum percentage of unified memory to use
max_memory_pct = 85

# Auto-unload model if memory exceeds this threshold
oom_threshold_pct = 90
```

Any setting can be overridden by an environment variable with the `OPTA_LMX_` prefix. For example, `OPTA_LMX_PORT=5678` overrides the port setting.

## Starting LMX
Activate the virtual environment:

```
source .venv/bin/activate
```

Start the server:

```
python -m opta_lmx.main
```

Expected startup output:

```
INFO: LMX starting on 0.0.0.0:1234
INFO: Loading model: mlx-community/Qwen3-30B-A3B-4bit
INFO: Model loaded in 2.3s (VRAM: 18.4GB)
INFO: Ready for inference
```
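Because model loading takes a few seconds, scripts that start LMX and immediately send requests can race the startup. A small poll-until-healthy helper bridges that gap; this is a standard-library sketch, not part of LMX (the probe function is injectable so the logic is easy to test):

```python
import time
import urllib.error
import urllib.request

def default_probe(url: str) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def wait_until_healthy(url: str, timeout: float = 60.0,
                       interval: float = 1.0, probe=default_probe) -> bool:
    """Poll `url` until it responds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(interval)
    return False

# Typical use after launching the server:
# wait_until_healthy("http://localhost:1234/healthz")
```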
Verify the server is responding:

```
curl http://localhost:1234/healthz
{"status":"ok"}
```

## launchd Service
For production use, run LMX as a launchd service so it starts automatically on boot and restarts on crash.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
 "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.opta.lmx</string>
    <key>ProgramArguments</key>
    <array>
        <string>/path/to/.venv/bin/python</string>
        <string>-m</string>
        <string>opta_lmx.main</string>
    </array>
    <key>WorkingDirectory</key>
    <string>/path/to/1M-Opta-LMX</string>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/opta-lmx.stdout.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/opta-lmx.stderr.log</string>
</dict>
</plist>
```

Load the service:

```
launchctl load ~/Library/LaunchAgents/com.opta.lmx.plist
```

Confirm it is registered:

```
launchctl list | grep opta.lmx
12345   0       com.opta.lmx
```
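Filling in the path placeholders by hand is error-prone. One option is to render the plist from a template; the sketch below is a hypothetical convenience script, not part of LMX (log keys omitted for brevity, `string.Template` from the standard library):

```python
from string import Template

# Trimmed plist template: $venv and $project are substitution slots.
PLIST_TEMPLATE = Template("""\
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
 "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.opta.lmx</string>
    <key>ProgramArguments</key>
    <array>
        <string>$venv/bin/python</string>
        <string>-m</string>
        <string>opta_lmx.main</string>
    </array>
    <key>WorkingDirectory</key>
    <string>$project</string>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
""")

def render_plist(venv: str, project: str) -> str:
    """Substitute absolute paths into the launchd plist template."""
    return PLIST_TEMPLATE.substitute(venv=venv, project=project)

# Example with made-up paths; use your real ones:
# print(render_plist("/Users/me/1M-Opta-LMX/.venv", "/Users/me/1M-Opta-LMX"))
```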
Replace `/path/to/` in the plist with the actual absolute paths to your LMX virtual environment and project directory.

## Verification
Run these checks from your MacBook to confirm LMX is accessible over the LAN:
Liveness check:

```
curl http://192.168.188.11:1234/healthz
{"status":"ok"}
```

Readiness check (model loaded):

```
curl http://192.168.188.11:1234/readyz
{"ready":true,"model":"qwen3-30b-a3b"}
```

Test inference:

```
curl http://192.168.188.11:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3-30b-a3b","messages":[{"role":"user","content":"Hi"}]}'
```

Check the model list:

```
curl http://192.168.188.11:1234/admin/models
{"models":[{"id":"qwen3-30b-a3b","loaded":true,"vram_gb":18.4}]}
```
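The same checks can be driven from Python. A minimal standard-library client for the chat endpoint is sketched below; the request shape mirrors the curl call above, and error handling is omitted for brevity:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str,
                       content: str) -> urllib.request.Request:
    """Build a POST request for LMX's /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://192.168.188.11:1234", "qwen3-30b-a3b", "Hi")
# To actually send it (assumes the response follows the OpenAI-style schema):
# with urllib.request.urlopen(req, timeout=120) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```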