Dashboard
The Opta Local Web dashboard provides real-time visibility into your LMX inference server -- VRAM utilization, loaded models, token throughput, and server health at a glance.
Dashboard Overview
The dashboard is the default landing page of Opta Local Web. It is organized into a grid of status panels, each updating in real time through the SSE connection to LMX. The layout is designed to give you immediate situational awareness: whether the server is healthy, how much memory is available, which models are loaded, and how fast inference is running.
VRAM Gauge
The VRAM gauge is a circular progress indicator that shows the current unified memory utilization of your Apple Silicon GPU. It displays:
- Current usage -- how many GB of unified memory are allocated to loaded models
- Total capacity -- the total unified memory pool (e.g., 192GB on a Mac Studio with M2 Ultra)
- Percentage fill -- rendered as a circular arc with smooth animation
The gauge color shifts from green to amber to red as utilization increases, providing instant visual feedback on memory pressure. When VRAM is nearly full, loading additional models risks OOM conditions -- the gauge helps you decide when to unload one model before loading another.
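The green/amber/red mapping can be sketched as a simple threshold function. The 75% and 90% breakpoints below are illustrative assumptions, not the dashboard's documented thresholds:

```typescript
// Map VRAM utilization to a gauge color.
// NOTE: the 75% / 90% breakpoints are assumptions for illustration;
// the dashboard's actual thresholds are not specified here.
type GaugeColor = "green" | "amber" | "red";

function gaugeColor(usedGb: number, totalGb: number): GaugeColor {
  const pct = (usedGb / totalGb) * 100;
  if (pct >= 90) return "red";
  if (pct >= 75) return "amber";
  return "green";
}

// 87.2 GB of 192 GB is about 45% -- comfortably green.
console.log(gaugeColor(87.2, 192.0)); // green
```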
Active Models
The active models panel lists every model currently loaded into GPU memory. Each entry shows:
- Model name -- the identifier used for inference requests
- Memory footprint -- how many GB this model occupies
- Quantization -- the quantization level (e.g., Q4_K_M, Q8_0, F16)
- Status indicator -- whether the model is idle, actively inferring, or loading
You can load and unload models directly from this panel. Loading a model sends a request to the LMX admin API, and the panel updates in real time as the model initializes and enters the ready state. Unloading frees the GPU memory immediately.
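A minimal sketch of the requests the panel issues. The endpoint paths (`/admin/models/load`, `/admin/models/unload`) and the payload shape are hypothetical placeholders -- consult the LMX admin API reference for the real routes:

```typescript
// Build load/unload requests for the LMX admin API.
// NOTE: paths and payload shape below are assumptions for illustration.
interface AdminRequest {
  url: string;
  method: "POST";
  body: string;
}

function loadModelRequest(base: string, model: string): AdminRequest {
  return { url: `${base}/admin/models/load`, method: "POST", body: JSON.stringify({ model }) };
}

function unloadModelRequest(base: string, model: string): AdminRequest {
  return { url: `${base}/admin/models/unload`, method: "POST", body: JSON.stringify({ model }) };
}

// The panel would then dispatch it, e.g.:
//   await fetch(req.url, { method: req.method, body: req.body,
//                          headers: { "Content-Type": "application/json" } });
const req = loadModelRequest("http://localhost:8080", "qwen3-72b");
console.log(req.url); // http://localhost:8080/admin/models/load
```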
Throughput Chart
The throughput panel shows a rolling history of inference speed measured in tokens per second. Data arrives via the SSE stream from /admin/events and is stored in a client-side circular buffer of 300 entries.
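The 300-entry client-side buffer can be sketched as a small ring buffer fed by parsed SSE `data:` lines. The payload shape matches the `/admin/events` examples shown on this page; the class and function names are illustrative:

```typescript
// Fixed-capacity history of throughput samples: once full, each new
// sample evicts the oldest one.
class RingBuffer<T> {
  private buf: T[] = [];
  constructor(private capacity: number) {}
  push(item: T): void {
    this.buf.push(item);
    if (this.buf.length > this.capacity) this.buf.shift(); // drop oldest
  }
  values(): T[] { return [...this.buf]; }
}

const history = new RingBuffer<number>(300);

// Parse one SSE "data:" line and record the sample if it is a
// throughput event; other event types are ignored here.
function record(line: string): void {
  const event = JSON.parse(line.replace(/^data:\s*/, ""));
  if (event.type === "throughput") history.push(event.tokens_per_sec);
}

record('data: {"type":"throughput","tokens_per_sec":42.3,"model":"qwen3-72b"}');
console.log(history.values()); // [ 42.3 ]
```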
The chart renders as a line graph showing throughput over time. Spikes indicate active inference requests; flat lines indicate idle periods. This is useful for:
- Verifying that inference is running at expected speeds
- Identifying throughput degradation under concurrent load
- Comparing performance between different models or quantizations
```
GET /admin/events
Accept: text/event-stream

data: {"type":"throughput","tokens_per_sec":42.3,"model":"qwen3-72b"}

data: {"type":"vram","used_gb":87.2,"total_gb":192.0}

data: {"type":"model_status","model":"qwen3-72b","state":"ready"}
```
Server Status
Status badges at the top of the dashboard indicate the health of each component in the stack:
- LMX Server -- connected or disconnected, with latency
- Active Requests -- number of concurrent inference requests
- Uptime -- how long the LMX server has been running
Each badge is color-coded: green for healthy, amber for degraded, and red for offline. The badges update every heartbeat cycle.
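The badge coloring can be sketched as a function of connectivity and latency. The 500 ms "degraded" latency threshold is an assumption for illustration; the dashboard's actual criteria are not specified here:

```typescript
// Derive a badge color from a component's health snapshot.
// NOTE: the degraded-latency threshold is an assumption for illustration.
type BadgeColor = "green" | "amber" | "red";

interface ComponentStatus {
  connected: boolean;
  latencyMs?: number; // round-trip latency to the component, if known
}

function badgeColor(s: ComponentStatus, degradedLatencyMs = 500): BadgeColor {
  if (!s.connected) return "red"; // offline
  if (s.latencyMs !== undefined && s.latencyMs > degradedLatencyMs) {
    return "amber"; // reachable but slow
  }
  return "green"; // healthy
}

console.log(badgeColor({ connected: true, latencyMs: 12 })); // green
```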
Health Polling
In addition to the SSE event stream, the dashboard runs a periodic heartbeat check against the LMX health endpoint. This independent polling mechanism ensures the dashboard can detect when the SSE connection silently drops (e.g., due to network interruption) and display an accurate offline state.
The heartbeat interval is configurable but defaults to 10 seconds. If three consecutive heartbeats fail, the dashboard transitions to a disconnected state and displays a reconnection indicator.
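The failure-counting logic above can be sketched as a small state machine: any success resets the counter, and three consecutive failures flip the dashboard to disconnected. The class name is illustrative:

```typescript
// Track consecutive heartbeat failures; three in a row (the default
// described above) means the dashboard should show a disconnected state.
class HeartbeatMonitor {
  private failures = 0;
  constructor(private maxFailures = 3) {}
  recordSuccess(): void { this.failures = 0; } // any success resets the count
  recordFailure(): void { this.failures++; }
  isDisconnected(): boolean { return this.failures >= this.maxFailures; }
}

// In the dashboard this would be driven by a timer, e.g.
// setInterval(pollHealth, 10_000) with the default 10 s interval.
const hb = new HeartbeatMonitor();
hb.recordFailure();
hb.recordFailure();
console.log(hb.isDisconnected()); // false -- only two consecutive failures
hb.recordFailure();
console.log(hb.isDisconnected()); // true -- three in a row
```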