# Introduction to Opta Local
Opta Local is a private, local-first AI stack for developers who want to run large language models on their own Apple Silicon hardware. No cloud dependency, no data leaving your network, no subscription fees for inference.
## What is Opta Local?
Opta Local is a vertically integrated system that connects a command-line interface, a local inference server, and a web dashboard into a single cohesive developer experience. It lets you chat with AI models, run autonomous coding tasks, manage sessions, and monitor your hardware -- all from your local network.
Unlike cloud-only AI tools, Opta Local runs entirely on your LAN. Your prompts, responses, and session data never leave your machines. The stack is designed for Apple Silicon Macs with large unified memory pools (64GB+), where local models can run at speeds competitive with cloud APIs.
## The Three Components
The Opta Local stack consists of three layered components that work together:
| Component | Role | Runs On |
|---|---|---|
| Opta CLI + Daemon | Terminal interface and background orchestration service. Handles chat, task execution, session management, permissions, and tool routing. | Your workstation (MacBook, desktop) |
| Opta LMX | Apple Silicon inference server. Serves models via an OpenAI-compatible API using MLX for optimized Metal GPU inference. | Mac Studio / Mac Pro (high-memory host) |
| Opta Local Web | Browser-based dashboard for real-time monitoring, chat, model management, and VRAM usage tracking. | Any device on your LAN (or remotely via tunnel) |
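Before digging into each component, it can help to confirm that all three are reachable from your workstation. The sketch below is illustrative, not part of the official tooling; the hosts and ports are the defaults mentioned on this page, and the LMX address in particular will differ on your LAN.

```python
"""Reachability check for the three Opta Local components (illustrative)."""
import socket

# (name, host, port) -- defaults from the table above; adjust for your LAN.
ENDPOINTS = [
    ("Opta daemon",    "127.0.0.1",      9999),
    ("Opta LMX",       "192.168.188.11", 1234),
    ("Opta Local Web", "127.0.0.1",      3004),
]

def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, host, port in ENDPOINTS:
        status = "up" if is_reachable(host, port) else "unreachable"
        print(f"{name:<16} {host}:{port:<5} {status}")
```

A plain TCP connect is enough for a liveness check here; it does not verify that the service behind the port is healthy, only that something is listening.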
## Who is it For?
Opta Local is built for a specific audience:
- Developers with Apple Silicon hardware -- particularly Mac Studios or Mac Pros with 96GB+ unified memory, capable of running 70B+ parameter models locally.
- Privacy-conscious engineers who want AI assistance without sending proprietary code or sensitive data to cloud providers.
- Power users who want full control over model selection, inference parameters, and tool permissions.
- Teams who want to share a local inference server across multiple workstations on a LAN.
## Key Benefits

### Privacy
Every prompt, response, and session stays on your local network. There is no telemetry, no cloud logging, and no data retention by third parties. Your code and conversations are yours alone.
### Speed

Apple Silicon's unified memory architecture lets models load directly into GPU-accessible memory without PCIe bottlenecks. A Mac Studio with 192GB of unified memory can run 70B-parameter models at 40+ tokens per second -- comparable to or faster than many cloud API endpoints.
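The memory headroom claim is easy to sanity-check with back-of-envelope arithmetic. The quantization widths below are assumptions for illustration; real footprints also include the KV cache and runtime overhead on top of the weights.

```python
# Rough weight-memory estimate for a quantized model (illustrative only;
# ignores KV cache and runtime overhead).
def model_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB for a model at a given quantization."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# 70B weights at 4-bit quantization: ~35 GB, comfortable in a 192 GB pool.
print(model_footprint_gb(70, 4))   # 35.0
# Even unquantized fp16 weights fit, though with far less headroom.
print(model_footprint_gb(70, 16))  # 140.0
```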
### Control
You choose which models to run, which tools to enable, and what permissions to grant. The CLI's permission system lets you approve or deny individual tool invocations. There are no opaque safety filters -- you set the guardrails.
### No Recurring Costs
After the initial hardware investment, inference is free. No per-token pricing, no API rate limits, no monthly subscriptions. Run as many queries as your hardware can handle.
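One way to reason about the trade-off is a break-even calculation. Every number below is a hypothetical assumption for illustration, not a quoted hardware or API price.

```python
# Hypothetical break-even sketch: both inputs are illustrative assumptions.
def breakeven_mtok(hardware_cost_usd: float, cloud_price_per_mtok: float) -> float:
    """Millions of tokens at which local hardware matches cloud spend."""
    return hardware_cost_usd / cloud_price_per_mtok

# e.g. an assumed $6,000 workstation vs. an assumed $10 per million tokens:
print(breakeven_mtok(6000, 10))  # 600.0
```

Past that point, additional inference is limited only by hardware throughput, not billing.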
## Architecture Overview
The following diagram shows how the three components connect:

```
opta chat / opta do / opta tui          CLI commands (your terminal)
        |
        v
opta daemon       127.0.0.1:9999        Background orchestration service
        |  HTTP v3 REST + WS streaming
        v
Opta LMX          192.168.188.11:1234   Apple Silicon inference server
        |  OpenAI-compatible /v1/chat/completions
        v
Opta Local Web    localhost:3004        Browser dashboard + chat UI
```

The CLI is your primary interface. When you run `opta chat` or `opta do`, the CLI connects to the daemon (starting it automatically if needed). The daemon manages sessions, enforces permissions, and proxies inference requests to LMX over your LAN. The web dashboard provides a visual interface for the same stack, connecting to LMX directly for monitoring and chat.
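Because LMX exposes an OpenAI-compatible `/v1/chat/completions` endpoint, any OpenAI-style client can talk to it directly. The sketch below uses only the standard library; the URL is the LAN address from the diagram above, and the model name is a placeholder -- substitute whatever model your LMX instance has loaded.

```python
"""Minimal chat request against Opta LMX's OpenAI-compatible endpoint."""
import json
from urllib import request

# LAN address from the architecture diagram; adjust for your network.
LMX_URL = "http://192.168.188.11:1234/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model: str, prompt: str) -> str:
    """POST a chat request to LMX and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode()
    req = request.Request(
        LMX_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Requires a running LMX instance with a loaded model, e.g.:
# reply = chat("llama-3.3-70b", "Say hello in five words.")
```

The same payload shape works for any OpenAI-compatible server, which is what lets existing tooling point at LMX by swapping the base URL.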
## Next Steps
Ready to get started? The next page walks you through installing the CLI and verifying your setup.