LMX Setup

This guide covers installing and configuring Opta LMX on your Apple Silicon machine. LMX is designed to run on a dedicated inference server — typically a Mac Studio on your local network.

Hardware Requirements

LMX requires Apple Silicon with sufficient unified memory to load your target models. The minimum and recommended configurations are:

Tier         Chip                  Memory         Model Range
Minimum      M1 Pro / M2 Pro       32GB           7B - 14B models
Recommended  M2 Ultra / M3 Ultra   64GB - 128GB   30B - 70B models
Ideal        M3 Ultra              192GB          70B+ models, multiple concurrent

Unified memory sizing
As a rule of thumb, a 4-bit quantized model requires roughly 0.5 - 0.6GB of memory per billion parameters, so a 70B model needs approximately 40GB. Always leave headroom for the OS and MLX runtime overhead.
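The sizing arithmetic is easy to sketch yourself. This is an illustrative estimate, not part of LMX: it assumes 4 bits (0.5 bytes) per parameter and a guessed ~20% overhead factor for the KV cache and runtime.

```python
def estimate_memory_gb(params_billions: float,
                       bits_per_param: float = 4.0,
                       overhead: float = 1.2) -> float:
    """Rough memory estimate: parameter count times quantized width,
    plus ~20% overhead (the overhead factor is a guess, not measured)."""
    bytes_per_param = bits_per_param / 8
    return params_billions * bytes_per_param * overhead

print(round(estimate_memory_gb(70), 1))  # -> 42.0 (GB for a 70B model at 4-bit)
print(round(estimate_memory_gb(30), 1))  # -> 18.0
```

Compare the results against the memory tiers in the table above and leave extra headroom if you plan to keep multiple models resident.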

Python Environment

LMX requires Python 3.12 or later. Use a virtual environment to isolate dependencies:

Verify Python version (must be 3.12+)
python3 --version
Python 3.12.8
Create and activate a virtual environment
python3 -m venv .venv && source .venv/bin/activate
System Python
Do not install LMX into the system Python. Always use a virtual environment. The .venv/ directory is excluded from Syncthing via .stignore.

Installation

pip Install

Install LMX in editable mode with dev dependencies
pip install -e '.[dev]'

This installs LMX from the local source tree. The -e flag enables editable mode so changes to the source are reflected immediately.

Key dependencies installed:

  • mlx / mlx-lm — Apple MLX framework and model utilities
  • fastapi + uvicorn — HTTP server
  • transformers — Tokenizer support
  • huggingface-hub — Model downloading

Configuration

LMX reads configuration from environment variables and an optional config file. The primary settings are:

~/.config/opta/lmx/config.toml
[server]
host = "0.0.0.0"
port = 1234

[models]
# Default model to load on startup
default = "mlx-community/Qwen3-30B-A3B-4bit"

# Model search paths
paths = [
  "~/.cache/huggingface/hub",
  "~/models"
]

[inference]
max_tokens = 4096
temperature = 0.7
context_length = 32768

[memory]
# Maximum percentage of unified memory to use
max_memory_pct = 85
# Auto-unload model if memory exceeds this threshold
oom_threshold_pct = 90
Environment overrides
All config values can be overridden with environment variables using the OPTA_LMX_ prefix. For example, OPTA_LMX_PORT=5678 overrides the port setting.
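The override pattern can be shown with a minimal sketch. The helper name here is hypothetical (not part of the LMX codebase); the `OPTA_LMX_PORT` variable and the default port follow the example above.

```python
import os

def effective_port(file_value: int) -> int:
    """An OPTA_LMX_PORT environment variable wins over the config-file value."""
    return int(os.environ.get("OPTA_LMX_PORT", file_value))

os.environ["OPTA_LMX_PORT"] = "5678"
print(effective_port(1234))  # -> 5678

del os.environ["OPTA_LMX_PORT"]
print(effective_port(1234))  # -> 1234
```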

Starting LMX

1. Activate the virtual environment:

   source .venv/bin/activate

2. Start LMX in the foreground:

   python -m opta_lmx.main
   INFO:     LMX starting on 0.0.0.0:1234
   INFO:     Loading model: mlx-community/Qwen3-30B-A3B-4bit
   INFO:     Model loaded in 2.3s (VRAM: 18.4GB)
   INFO:     Ready for inference

3. Verify the server is responding:

   curl http://localhost:1234/healthz
   {"status":"ok"}
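When scripting startup (for example in a deploy script), it helps to poll the health endpoint until the model finishes loading rather than sleeping a fixed interval. A standalone sketch using only the standard library; the URL and timeout are placeholders:

```python
import time
import urllib.error
import urllib.request

def wait_for_health(url: str, timeout: float = 120.0) -> bool:
    """Poll a health endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(1.0)
    return False
```

Call it as `wait_for_health("http://localhost:1234/healthz")` after launching the server; large models can take a while to load, so size the timeout accordingly.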

launchd Service

For production use, run LMX as a launchd service so it starts automatically on boot and restarts on crash.

~/Library/LaunchAgents/com.opta.lmx.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.opta.lmx</string>
  <key>ProgramArguments</key>
  <array>
    <string>/path/to/.venv/bin/python</string>
    <string>-m</string>
    <string>opta_lmx.main</string>
  </array>
  <key>WorkingDirectory</key>
  <string>/path/to/1M-Opta-LMX</string>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/opta-lmx.stdout.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/opta-lmx.stderr.log</string>
</dict>
</plist>
Install and start the launchd service
launchctl load ~/Library/LaunchAgents/com.opta.lmx.plist
Verify the service is running
launchctl list | grep opta.lmx
12345	0	com.opta.lmx
Update paths
Replace /path/to/ in the plist with the actual absolute paths to your LMX virtual environment and project directory.

Verification

Run these checks from your MacBook to confirm LMX is accessible over the LAN:

1. Liveness check:

   curl http://192.168.188.11:1234/healthz
   {"status":"ok"}

2. Readiness check (model loaded):

   curl http://192.168.188.11:1234/readyz
   {"ready":true,"model":"qwen3-30b-a3b"}

3. Send a test completion request:

   curl http://192.168.188.11:1234/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{"model":"qwen3-30b-a3b","messages":[{"role":"user","content":"Hi"}]}'

4. Check the model list:

   curl http://192.168.188.11:1234/admin/models
   {"models":[{"id":"qwen3-30b-a3b","loaded":true,"vram_gb":18.4}]}
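If you automate these checks, the JSON shapes shown above are straightforward to assert on. A sketch whose response fields are taken from the examples in this section:

```python
import json

def loaded_models(payload: str) -> list[str]:
    """Return the ids of models reported as loaded by /admin/models."""
    data = json.loads(payload)
    return [m["id"] for m in data["models"] if m["loaded"]]

example = '{"models":[{"id":"qwen3-30b-a3b","loaded":true,"vram_gb":18.4}]}'
print(loaded_models(example))  # -> ['qwen3-30b-a3b']
```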