LMX Setup

This guide covers installing and configuring Opta LMX on your Apple Silicon machine. LMX is designed to run on a dedicated inference server — typically a Mac Studio on your local network.

Hardware Requirements

LMX requires Apple Silicon with sufficient unified memory to load your target models. The minimum and recommended configurations are:

Tier         Chip                  Memory         Model Range
Minimum      M1 Pro / M2 Pro       32GB           7B - 14B models
Recommended  M2 Ultra / M3 Ultra   64GB - 128GB   30B - 70B models
Ideal        M3 Ultra              192GB          70B+ models, multiple concurrent

Unified memory sizing
As a rule of thumb, a 4-bit quantized model requires roughly 0.5 - 0.6GB of memory per billion parameters, so a 70B model needs approximately 40GB. Always leave headroom for the OS and MLX runtime overhead.
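The sizing arithmetic is easy to sketch yourself. This is an illustrative estimate, not part of LMX: it assumes 4 bits (0.5 bytes) per parameter and a guessed ~20% overhead factor for the KV cache and runtime.

```python
def estimate_memory_gb(params_billions: float,
                       bits_per_param: float = 4.0,
                       overhead: float = 1.2) -> float:
    """Rough memory estimate: parameter count times quantized width,
    plus ~20% overhead (the overhead factor is a guess, not measured)."""
    bytes_per_param = bits_per_param / 8
    return params_billions * bytes_per_param * overhead

print(round(estimate_memory_gb(70), 1))  # -> 42.0 (GB for a 70B model at 4-bit)
print(round(estimate_memory_gb(30), 1))  # -> 18.0
```

Compare the results against the memory tiers in the table above and leave extra headroom if you plan to keep multiple models resident.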

Python Environment

LMX requires Python 3.12 or later. Use a virtual environment to isolate dependencies:

Verify Python version (must be 3.12+)
python3 --version
Python 3.12.8
Create and activate a virtual environment
python3 -m venv .venv && source .venv/bin/activate
System Python
Do not install LMX into the system Python. Always use a virtual environment. The .venv/ directory is excluded from Syncthing via .stignore.

Installation

pip Install

Install LMX in editable mode with dev dependencies
pip install -e '.[dev]'

This installs LMX from the local source tree. The -e flag enables editable mode so changes to the source are reflected immediately.

Key dependencies installed:

  • mlx / mlx-lm — Apple MLX framework and model utilities
  • fastapi + uvicorn — HTTP server
  • transformers — Tokenizer support
  • huggingface-hub — Model downloading

Configuration

LMX reads configuration from environment variables and an optional config file. The primary settings are:

~/.config/opta/lmx/config.toml
[server]
host = "0.0.0.0"
port = 1234

[models]
# Default model to load on startup
default = "mlx-community/Qwen3-30B-A3B-4bit"

# Model search paths
paths = [
  "~/.cache/huggingface/hub",
  "~/models"
]

[inference]
max_tokens = 4096
temperature = 0.7
context_length = 32768

[memory]
# Maximum percentage of unified memory to use
max_memory_pct = 85
# Auto-unload model if memory exceeds this threshold
oom_threshold_pct = 90
Environment overrides
All config values can be overridden with environment variables using the OPTA_LMX_ prefix. For example, OPTA_LMX_PORT=5678 overrides the port setting.
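The override pattern can be shown with a minimal sketch. The helper name here is hypothetical (not part of the LMX codebase); the `OPTA_LMX_PORT` variable and the default port follow the example above.

```python
import os

def effective_port(file_value: int) -> int:
    """An OPTA_LMX_PORT environment variable wins over the config-file value."""
    return int(os.environ.get("OPTA_LMX_PORT", file_value))

os.environ["OPTA_LMX_PORT"] = "5678"
print(effective_port(1234))  # -> 5678

del os.environ["OPTA_LMX_PORT"]
print(effective_port(1234))  # -> 1234
```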

Starting LMX

1. Activate the virtual environment:

   source .venv/bin/activate

2. Start LMX in the foreground:

   python -m opta_lmx.main
   INFO:     LMX starting on 0.0.0.0:1234
   INFO:     Loading model: mlx-community/Qwen3-30B-A3B-4bit
   INFO:     Model loaded in 2.3s (VRAM: 18.4GB)
   INFO:     Ready for inference

3. Verify the server is responding:

   curl http://localhost:1234/healthz
   {"status":"ok"}
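When scripting startup (for example in a deploy script), it helps to poll the health endpoint until the model finishes loading rather than sleeping a fixed interval. A standalone sketch using only the standard library; the URL and timeout are placeholders:

```python
import time
import urllib.error
import urllib.request

def wait_for_health(url: str, timeout: float = 120.0) -> bool:
    """Poll a health endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(1.0)
    return False
```

Call it as `wait_for_health("http://localhost:1234/healthz")` after launching the server; large models can take a while to load, so size the timeout accordingly.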

launchd Service

For production use, run LMX as a launchd service so it starts automatically on boot and restarts on crash.

~/Library/LaunchAgents/com.opta.lmx.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.opta.lmx</string>
  <key>ProgramArguments</key>
  <array>
    <string>/path/to/.venv/bin/python</string>
    <string>-m</string>
    <string>opta_lmx.main</string>
  </array>
  <key>WorkingDirectory</key>
  <string>/path/to/1M-Opta-LMX</string>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/opta-lmx.stdout.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/opta-lmx.stderr.log</string>
</dict>
</plist>
Install and start the launchd service
launchctl load ~/Library/LaunchAgents/com.opta.lmx.plist
Verify the service is running
launchctl list | grep opta.lmx
12345	0	com.opta.lmx
Update paths
Replace /path/to/ in the plist with the actual absolute paths to your LMX virtual environment and project directory.

Verification

Run these checks from your MacBook to confirm LMX is accessible over the LAN:

1. Liveness check:

   curl http://192.168.188.11:1234/healthz
   {"status":"ok"}

2. Readiness check (model loaded):

   curl http://192.168.188.11:1234/readyz
   {"ready":true,"model":"qwen3-30b-a3b"}

3. Send a test completion request:

   curl http://192.168.188.11:1234/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{"model":"qwen3-30b-a3b","messages":[{"role":"user","content":"Hi"}]}'

4. Check the model list:

   curl http://192.168.188.11:1234/admin/models
   {"models":[{"id":"qwen3-30b-a3b","loaded":true,"vram_gb":18.4}]}
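If you automate these checks, the JSON shapes shown above are straightforward to assert on. A sketch whose response fields are taken from the examples in this section:

```python
import json

def loaded_models(payload: str) -> list[str]:
    """Return the ids of models reported as loaded by /admin/models."""
    data = json.loads(payload)
    return [m["id"] for m in data["models"] if m["loaded"]]

example = '{"models":[{"id":"qwen3-30b-a3b","loaded":true,"vram_gb":18.4}]}'
print(loaded_models(example))  # -> ['qwen3-30b-a3b']
```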