LMX Setup

This guide covers installing and configuring Opta LMX on an Apple Silicon machine. LMX is designed to run as a dedicated inference server, typically an Apple Silicon host on your local network that other machines reach over the LAN.

Hardware Requirements

LMX requires Apple Silicon with sufficient unified memory to load your target models. The minimum and recommended configurations are:

Recommended Configurations

Tier	Chip	Memory	Model Range
Minimum	M1 Pro / M2 Pro	32GB	7B - 14B models
Recommended	M2 Ultra / M3 Ultra	64GB - 128GB	30B - 70B models
Ideal	M3 Ultra	192GB	70B+ models, multiple models loaded concurrently

Unified memory sizing

As a rule of thumb, a model requires roughly 0.5 to 0.6GB of memory per billion parameters at 4-bit quantization: each weight takes half a byte, plus a small allowance for quantization metadata. A 70B model therefore needs approximately 40GB. Always leave headroom for the OS and MLX runtime overhead.
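The sizing arithmetic can be sketched in a few lines of Python. This is an illustrative estimate, not part of LMX: the function names are hypothetical, and the 15% overhead factor is an assumption chosen so that a 70B model lands near the 40GB figure quoted above.

```python
def estimate_model_memory_gb(
    params_billion: float,
    bits_per_weight: int = 4,
    overhead_factor: float = 1.15,  # assumed allowance, not an LMX constant
) -> float:
    """Rough unified-memory estimate for a quantized model, in GB."""
    # One billion parameters at N bits each is N/8 GB of raw weights.
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * overhead_factor


def fits_in_memory(
    params_billion: float,
    total_memory_gb: float,
    max_memory_pct: float = 85,
) -> bool:
    """Check the estimate against a percentage budget of total memory."""
    budget_gb = total_memory_gb * max_memory_pct / 100
    return estimate_model_memory_gb(params_billion) <= budget_gb
```

For example, estimate_model_memory_gb(70) gives about 40GB, so under these assumptions a 70B model fits the 85% budget of a 64GB machine but not that of a 32GB machine.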

Python Environment

LMX requires Python 3.12 or later. Use a virtual environment to isolate dependencies:

Verify Python version (must be 3.12+)

python3 --version

Python 3.12.8

Create and activate a virtual environment

python3 -m venv .venv && source .venv/bin/activate

System Python

Do not install LMX into the system Python. Always use a virtual environment. The .venv/ directory is excluded from Syncthing via .stignore.

Installation

pip Install

Install LMX in editable mode with dev dependencies

pip install -e '.[dev]'

This installs LMX from the local source tree. The -e flag enables editable mode so changes to the source are reflected immediately.

Key dependencies installed:

mlx / mlx-lm — Apple MLX framework and model utilities
fastapi + uvicorn — HTTP server
transformers — Tokenizer support
huggingface-hub — Model downloading

Configuration

LMX reads configuration from environment variables and an optional config file. The primary settings are:

~/.config/opta/lmx/config.toml

[server]
host = "0.0.0.0"
port = 1234

[models]
# Default model to load on startup
default = "mlx-community/Qwen3-30B-A3B-4bit"

# Model search paths
paths = [
  "~/.cache/huggingface/hub",
  "~/models"
]

[inference]
max_tokens = 4096
temperature = 0.7
context_length = 32768

[memory]
# Maximum percentage of unified memory to use
max_memory_pct = 85
# Auto-unload model if memory exceeds this threshold
oom_threshold_pct = 90

Environment overrides

All config values can be overridden with environment variables using the OPTA_LMX_ prefix. For example, OPTA_LMX_PORT=5678 overrides the port setting.
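To make the override mechanism concrete, here is a minimal sketch of how prefix-based overrides could be layered on top of a parsed config. It is not LMX's actual implementation: the flat name matching (OPTA_LMX_PORT mapping to the port key in whichever section defines it) is inferred from the single example above, and the helper names are hypothetical.

```python
import os

PREFIX = "OPTA_LMX_"


def _coerce(raw: str, current):
    """Convert the env string to the type of the value it replaces."""
    if isinstance(current, bool):  # check bool first: bool is a subclass of int
        return raw.lower() in ("1", "true", "yes")
    if isinstance(current, int):
        return int(raw)
    if isinstance(current, float):
        return float(raw)
    return raw


def apply_env_overrides(cfg: dict, environ=None) -> dict:
    """Return a copy of cfg with OPTA_LMX_* variables applied.

    Assumes every top-level entry in cfg is a section (a dict of settings).
    """
    environ = os.environ if environ is None else environ
    merged = {section: dict(values) for section, values in cfg.items()}
    for name, raw in environ.items():
        if not name.startswith(PREFIX):
            continue
        key = name[len(PREFIX):].lower()
        for values in merged.values():
            if key in values:
                values[key] = _coerce(raw, values[key])
    return merged
```

With a config containing port = 1234 under [server], passing {"OPTA_LMX_PORT": "5678"} as environ yields a copy whose port is the integer 5678, while the original dict is left unchanged.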

Starting LMX

Activate the virtual environment

source .venv/bin/activate

Start the server

Start LMX in the foreground

python -m opta_lmx.main

INFO:     LMX starting on 0.0.0.0:1234
INFO:     Loading model: mlx-community/Qwen3-30B-A3B-4bit
INFO:     Model loaded in 2.3s (VRAM: 18.4GB)
INFO:     Ready for inference

Verify the server is responding

curl http://localhost:1234/healthz

{"status":"ok"}

launchd Service

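For production use, run LMX as a launchd service so it restarts on crash and starts automatically. Because the plist below is installed as a per-user LaunchAgent, the service starts when that user logs in rather than at boot.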

~/Library/LaunchAgents/com.opta.lmx.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.opta.lmx</string>
  <key>ProgramArguments</key>
  <array>
    <string>/path/to/.venv/bin/python</string>
    <string>-m</string>
    <string>opta_lmx.main</string>
  </array>
  <key>WorkingDirectory</key>
  <string>/path/to/1M-Opta-LMX</string>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/opta-lmx.stdout.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/opta-lmx.stderr.log</string>
</dict>
</plist>

Install and start the launchd service

launchctl load ~/Library/LaunchAgents/com.opta.lmx.plist

Verify the service is running

launchctl list | grep opta.lmx

12345	0	com.opta.lmx

Update paths

Replace /path/to/ in the plist with the actual absolute paths to your LMX virtual environment and project directory.
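Rather than editing the placeholders by hand, the plist can be generated with Python's standard plistlib module. The sketch below mirrors the keys shown above; the function name is hypothetical, and it assumes the virtual environment lives at .venv inside the project directory, as created earlier in this guide.

```python
import plistlib
from pathlib import Path


def build_lmx_plist(project_dir: Path) -> bytes:
    """Render the launchd job definition as XML plist bytes."""
    if not project_dir.is_absolute():
        # launchd does not expand ~ or resolve relative paths.
        raise ValueError("project_dir must be an absolute path")
    job = {
        "Label": "com.opta.lmx",
        "ProgramArguments": [
            str(project_dir / ".venv" / "bin" / "python"),
            "-m",
            "opta_lmx.main",
        ],
        "WorkingDirectory": str(project_dir),
        "RunAtLoad": True,
        "KeepAlive": True,
        "StandardOutPath": "/tmp/opta-lmx.stdout.log",
        "StandardErrorPath": "/tmp/opta-lmx.stderr.log",
    }
    return plistlib.dumps(job)  # XML format is the default
```

Write the returned bytes to ~/Library/LaunchAgents/com.opta.lmx.plist, then load the service as shown above.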

Verification

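Run these checks from another machine on the network, such as your MacBook, to confirm LMX is reachable over the LAN. Replace lmx-host.local with the hostname or IP address of your LMX server.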

Liveness check

curl http://lmx-host.local:1234/healthz

{"status":"ok"}

Readiness check (model loaded)

curl http://lmx-host.local:1234/readyz

{"ready":true,"model":"qwen3-30b-a3b"}
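Model loading can take a while after the server starts, so scripts that depend on LMX may want to poll the readiness endpoint instead of checking once. The following is a minimal sketch using only the standard library. The function names, retry counts, and delays are illustrative assumptions; the only details taken from this guide are the /readyz path and the "ready" field in its response.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(base_url: str, attempts: int = 30, delay: float = 2.0,
                     fetch=fetch_json) -> dict:
    """Poll /readyz until it reports ready, or raise TimeoutError."""
    for _ in range(attempts):
        try:
            body = fetch(f"{base_url}/readyz")
            if body.get("ready") is True:
                return body
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server unreachable, error status, or non-JSON body: retry
        time.sleep(delay)
    raise TimeoutError(f"{base_url} not ready after {attempts} attempts")
```

For example, wait_until_ready("http://lmx-host.local:1234") returns the readiness payload once the model is loaded. The fetch parameter exists so the polling logic can be exercised without a live server.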

Test inference

Send a test completion request

curl http://lmx-host.local:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3-30b-a3b","messages":[{"role":"user","content":"Hi"}]}'
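The same request can be sent from Python with the standard library. This sketch builds the identical JSON payload as the curl command above; the function names are hypothetical, and because this guide does not show the response body, the code simply returns the decoded JSON without assuming its shape.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Construct the POST request for a single-turn chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling chat("http://lmx-host.local:1234", "qwen3-30b-a3b", "Hi") performs the same test as the curl command. The generous timeout is a guess to allow for slow first-token latency on large models.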

Check model list

curl http://lmx-host.local:1234/admin/models

{"models":[{"id":"qwen3-30b-a3b","loaded":true,"vram_gb":18.4}]}
