# API Reference

LMX exposes an OpenAI-compatible API for inference, alongside admin endpoints for model management and monitoring. All endpoints are unauthenticated; access control relies on keeping the server on a trusted LAN.
## Base URL

When running locally on the dedicated Apple Silicon host, use `http://localhost:1234`. From other devices on the LAN, substitute the host's IP address for `localhost`.
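A minimal sketch of resolving the base URL in a client. The `LMX_BASE_URL` environment variable is a hypothetical convention for this example, not an LMX feature:

```python
import os

# Default to the local listener; override with LMX_BASE_URL when calling
# from another device on the LAN (the variable name is this example's
# convention, not something LMX itself reads).
BASE_URL = os.environ.get("LMX_BASE_URL", "http://localhost:1234")

def endpoint(path: str) -> str:
    """Join the base URL with an endpoint path, normalizing slashes."""
    return BASE_URL.rstrip("/") + "/" + path.lstrip("/")
```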
## Inference Endpoints
### Chat Completions
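A sketch of a chat completion call using only the standard library. The `/v1/chat/completions` path follows the OpenAI convention the document names; whether `model` may be omitted when a single model is loaded is an assumption:

```python
import json
import urllib.request

def build_chat_request(messages, model=None, stream=False, **params):
    """Build an OpenAI-style chat completion payload.

    Omitting `model` and letting the server route to the currently
    loaded model is an assumption; pass it explicitly when in doubt.
    """
    body = {"messages": messages, "stream": stream, **params}
    if model is not None:
        body["model"] = model
    return body

def chat(base_url, payload, timeout=120):
    """POST the payload to the OpenAI-compatible chat endpoint."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

payload = build_chat_request(
    [{"role": "user", "content": "Hello"}],
    model="my-model",  # placeholder name
    temperature=0.7,
)
# resp = chat("http://localhost:1234", payload)
# print(resp["choices"][0]["message"]["content"])
```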
#### Streaming (SSE)

Set `"stream": true` in the request body to receive Server-Sent Events. Each event is a JSON chunk following the OpenAI streaming format:
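A sketch of consuming that stream: each wire line is `data: {...}` and the stream terminates with `data: [DONE]`, per the OpenAI streaming format. The sample chunk values are illustrative:

```python
import json

def iter_sse_deltas(lines):
    """Yield content deltas from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Illustrative sample of the wire format:
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: [DONE]',
]
text = "".join(iter_sse_deltas(sample))  # "Hello"
```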
### WebSocket Streaming
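A sketch of deriving the WebSocket URL from the HTTP base URL. The `/v1/ws` path is purely an assumption for illustration; check your LMX build for the actual route:

```python
def ws_url(base_url: str, path: str = "/v1/ws") -> str:
    """Derive a WebSocket URL from the HTTP base URL.

    The default `/v1/ws` path is an assumption, not a documented route.
    """
    scheme = "wss" if base_url.startswith("https") else "ws"
    host = base_url.split("://", 1)[1].rstrip("/")
    return f"{scheme}://{host}{path}"

# With a WebSocket client library, send the same JSON body used for
# HTTP chat completions, one message per request, e.g.:
# async with websockets.connect(ws_url("http://localhost:1234")) as ws:
#     await ws.send(json.dumps(payload))
#     async for message in ws:
#         ...  # each message is a streaming chunk
```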
### Embeddings
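A sketch of the OpenAI-compatible embeddings shape (`/v1/embeddings`): the request takes one or more inputs, and the response carries one vector per input, in order. The sample values are illustrative:

```python
def build_embeddings_request(inputs, model=None):
    """Payload for an OpenAI-style /v1/embeddings request."""
    body = {"input": inputs}
    if model is not None:
        body["model"] = model
    return body

# Illustrative response: one embedding per input, index-aligned.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.01, -0.02]},
        {"object": "embedding", "index": 1, "embedding": [0.03, 0.04]},
    ],
}
vectors = [item["embedding"] for item in sample_response["data"]]
```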
### Reranking
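Reranking has no OpenAI-standard shape, so this sketch follows the common Cohere/Jina-style convention; the path, field names, and response shape are all assumptions to verify against your LMX build:

```python
def build_rerank_request(query, documents, top_n=None, model=None):
    """Payload in the common Cohere/Jina rerank shape (an assumption)."""
    body = {"query": query, "documents": documents}
    if top_n is not None:
        body["top_n"] = top_n
    if model is not None:
        body["model"] = model
    return body

# Illustrative response: (index, relevance_score) pairs, best first.
sample_response = {
    "results": [
        {"index": 2, "relevance_score": 0.91},
        {"index": 0, "relevance_score": 0.34},
    ]
}
ranked = [(r["index"], r["relevance_score"]) for r in sample_response["results"]]
```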
## Health Endpoints
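A sketch of polling a health endpoint. The `/health` path and the response fields (`status`, `model_loaded`) are assumptions; a health check should be cheap enough for a load balancer or watchdog to poll every few seconds:

```python
import json
import urllib.request

def get_health(base_url, path="/health", timeout=5):
    """GET a health endpoint (path and response shape are assumptions)."""
    with urllib.request.urlopen(base_url.rstrip("/") + path,
                                timeout=timeout) as resp:
        return json.load(resp)

def is_ready(health: dict) -> bool:
    """Ready = server up AND a model loaded (assumed response fields)."""
    return health.get("status") == "ok" and bool(health.get("model_loaded"))

# health = get_health("http://localhost:1234")
# if not is_ready(health): ...  # e.g. trigger a model load
```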
## Admin Endpoints
### List Models
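A sketch of parsing a model list in the OpenAI `GET /v1/models` shape; whether LMX's admin listing adds extra fields is unknown, and the model names below are illustrative:

```python
def model_ids(list_response: dict) -> list:
    """Extract model ids from an OpenAI-style model list response."""
    return [m["id"] for m in list_response.get("data", [])]

# Illustrative response (model names are placeholders):
sample = {
    "object": "list",
    "data": [
        {"id": "llama-3.1-8b", "object": "model"},
        {"id": "qwen2.5-7b", "object": "model"},
    ],
}
ids = model_ids(sample)
```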
### Load Model
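A sketch of issuing a load request. The `/admin/models/load` path and the `{"model": ...}` body are assumptions; the generous timeout reflects that loading a large model can take minutes:

```python
import json
import urllib.request

def build_load_request(model_id: str) -> dict:
    """Body for a load request (the field name is an assumption)."""
    return {"model": model_id}

def load_model(base_url: str, model_id: str, timeout: float = 600):
    """POST to the admin load endpoint (path is an assumption).

    Long timeout: large models can take minutes to load into memory.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/admin/models/load",
        data=json.dumps(build_load_request(model_id)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A failed load surfaces through the error codes below, e.g. `model-not-found` (404) or `storage-full` (507).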
### Unload Model
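A sketch of the unload call, split so the request can be built without sending it. The `/admin/models/unload` path and body shape are assumptions; after unloading, inference requests fail with `no-model-loaded` (503) until another model is loaded:

```python
import json
import urllib.request

def build_unload(base_url: str, model_id: str) -> urllib.request.Request:
    """Build (but do not send) the unload request; path/body are assumptions."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/admin/models/unload",
        data=json.dumps({"model": model_id}).encode(),
        headers={"Content-Type": "application/json"},
    )

def unload_model(base_url: str, model_id: str, timeout: float = 60):
    """POST the unload request and return the parsed response."""
    with urllib.request.urlopen(build_unload(base_url, model_id),
                                timeout=timeout) as resp:
        return json.load(resp)
```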
### Metrics Stream
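A sketch of consuming a metrics stream, assuming it is delivered as SSE like the inference stream; the endpoint path and the field names (`tokens_per_second`, `memory_used_gb`) are assumptions for illustration:

```python
import json

def iter_metrics(lines):
    """Parse SSE `data:` lines from a metrics stream into dicts.

    Field names in the sample below are assumptions.
    """
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

# Illustrative wire sample (blank lines separate SSE events):
sample = [
    'data: {"tokens_per_second": 42.5, "memory_used_gb": 18.2}',
    '',
    'data: {"tokens_per_second": 41.9, "memory_used_gb": 18.2}',
]
readings = list(iter_metrics(sample))
```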
## Error Codes
LMX returns standard HTTP status codes with a JSON error body:
| Code | HTTP | Description |
|---|---|---|
| `no-model-loaded` | 503 | No model in memory. Load one first. |
| `model-not-found` | 404 | Requested model not found on disk. |
| `storage-full` | 507 | Insufficient disk space for model download. |
| `lmx-timeout` | 504 | Inference timed out (model took too long). |
| `oom-unloaded` | 503 | Model was unloaded due to OOM pressure. |
| `invalid-request` | 400 | Malformed request body or missing fields. |
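A sketch of client-side error handling against this table. The exact JSON error body shape (`{"error": {"code": ..., "message": ...}}`) is an assumption; which codes are worth retrying is a judgment call based on the descriptions above:

```python
import json
import urllib.error

# Codes from the table above that describe transient conditions:
# retry after loading a model or backing off.
RETRYABLE = {"no-model-loaded", "oom-unloaded", "lmx-timeout"}

def parse_error(body_bytes: bytes):
    """Extract (code, message) from a JSON error body (assumed shape)."""
    err = json.loads(body_bytes)["error"]
    return err.get("code"), err.get("message")

def should_retry(code: str) -> bool:
    """True for transient conditions; invalid-request etc. never succeed."""
    return code in RETRYABLE

# Typical use with urllib:
# try:
#     resp = chat(base_url, payload)
# except urllib.error.HTTPError as e:
#     code, msg = parse_error(e.read())
#     if should_retry(code): ...

sample = b'{"error": {"code": "no-model-loaded", "message": "Load a model first."}}'
code, msg = parse_error(sample)
```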