# Introduction to Opta Local
Opta Local is a private, local-first AI stack for developers who want to run large language models on their own Apple Silicon hardware. No cloud dependency, no data leaving your network, no subscription fees for inference.
## What is Opta Local?
Opta Local is a vertically integrated system that connects a command-line interface, a local inference server, and a web dashboard into a single cohesive developer experience. It lets you chat with AI models, run autonomous coding tasks, manage sessions, and monitor your hardware -- all from your local network.
Unlike cloud-only AI tools, Opta Local runs entirely on your LAN. Your prompts, responses, and session data never leave your machines. The stack is designed for Apple Silicon Macs with large unified memory pools (64GB+), where local models can run at speeds competitive with cloud APIs.
## The Three Components
The Opta Local stack consists of three layered components that work together:
| Component | Role | Runs On |
|---|---|---|
| Opta CLI + Daemon | Terminal interface and background orchestration service. Handles chat, task execution, session management, permissions, and tool routing. | Your workstation (MacBook, desktop) |
| Opta LMX | Apple Silicon inference server. Serves models via an OpenAI-compatible API using MLX for optimized Metal GPU inference. | Mac Studio / Mac Pro (high-memory host) |
| Opta Local Web | Browser-based dashboard for real-time monitoring, chat, model management, and VRAM usage tracking. | Any device on your LAN (or remotely via tunnel) |
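Before digging into each component, it can help to confirm that all three are reachable from your workstation. The sketch below is illustrative, not part of the official tooling; the hosts and ports are the defaults mentioned on this page, and the LMX address in particular will differ on your LAN.

```python
"""Reachability check for the three Opta Local components (illustrative)."""
import socket

# (name, host, port) -- defaults from the table above; adjust for your LAN.
ENDPOINTS = [
    ("Opta daemon",    "127.0.0.1",      9999),
    ("Opta LMX",       "192.168.188.11", 1234),
    ("Opta Local Web", "127.0.0.1",      3004),
]

def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, host, port in ENDPOINTS:
        status = "up" if is_reachable(host, port) else "unreachable"
        print(f"{name:<16} {host}:{port:<5} {status}")
```

A plain TCP connect is enough for a liveness check here; it does not verify that the service behind the port is healthy, only that something is listening.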
## Who is it For?
Opta Local is built for a specific audience:
- Developers with Apple Silicon hardware -- particularly Mac Studios or Mac Pros with 96GB+ unified memory, capable of running 70B+ parameter models locally.
- Privacy-conscious engineers who want AI assistance without sending proprietary code or sensitive data to cloud providers.
- Power users who want full control over model selection, inference parameters, and tool permissions.
- Teams who want to share a local inference server across multiple workstations on a LAN.
## Key Benefits

### Privacy
Every prompt, response, and session stays on your local network. There is no telemetry, no cloud logging, and no data retention by third parties. Your code and conversations are yours alone.
### Speed

Apple Silicon's unified memory architecture lets models load directly into GPU-accessible memory without PCIe bottlenecks. A Mac Studio with 192GB of unified memory can run 70B-parameter models at 40+ tokens per second -- comparable to or faster than many cloud API endpoints.
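The memory headroom claim is easy to sanity-check with back-of-envelope arithmetic. The quantization widths below are assumptions for illustration; real footprints also include the KV cache and runtime overhead on top of the weights.

```python
# Rough weight-memory estimate for a quantized model (illustrative only;
# ignores KV cache and runtime overhead).
def model_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB for a model at a given quantization."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# 70B weights at 4-bit quantization: ~35 GB, comfortable in a 192 GB pool.
print(model_footprint_gb(70, 4))   # 35.0
# Even unquantized fp16 weights fit, though with far less headroom.
print(model_footprint_gb(70, 16))  # 140.0
```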
### Control
You choose which models to run, which tools to enable, and what permissions to grant. The CLI's permission system lets you approve or deny individual tool invocations. There are no opaque safety filters -- you set the guardrails.
### No Recurring Costs
After the initial hardware investment, inference is free. No per-token pricing, no API rate limits, no monthly subscriptions. Run as many queries as your hardware can handle.
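One way to reason about the trade-off is a break-even calculation. Every number below is a hypothetical assumption for illustration, not a quoted hardware or API price.

```python
# Hypothetical break-even sketch: both inputs are illustrative assumptions.
def breakeven_mtok(hardware_cost_usd: float, cloud_price_per_mtok: float) -> float:
    """Millions of tokens at which local hardware matches cloud spend."""
    return hardware_cost_usd / cloud_price_per_mtok

# e.g. an assumed $6,000 workstation vs. an assumed $10 per million tokens:
print(breakeven_mtok(6000, 10))  # 600.0
```

Past that point, additional inference is limited only by hardware throughput, not billing.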
## Architecture Overview
The following diagram shows how the three components connect:

```
opta chat / opta do / opta tui          CLI commands (your terminal)
        |
        v
opta daemon       127.0.0.1:9999        Background orchestration service
        |  HTTP v3 REST + WS streaming
        v
Opta LMX          192.168.188.11:1234   Apple Silicon inference server
        |  OpenAI-compatible /v1/chat/completions
        v
Opta Local Web    localhost:3004        Browser dashboard + chat UI
```

The CLI is your primary interface. When you run `opta chat` or `opta do`, the CLI connects to the daemon (starting it automatically if needed). The daemon manages sessions, enforces permissions, and proxies inference requests to LMX over your LAN. The web dashboard provides a visual interface for the same stack, connecting to LMX directly for monitoring and chat.
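Because LMX exposes an OpenAI-compatible `/v1/chat/completions` endpoint, any OpenAI-style client can talk to it directly. The sketch below uses only the standard library; the URL is the LAN address from the diagram above, and the model name is a placeholder -- substitute whatever model your LMX instance has loaded.

```python
"""Minimal chat request against Opta LMX's OpenAI-compatible endpoint."""
import json
from urllib import request

# LAN address from the architecture diagram; adjust for your network.
LMX_URL = "http://192.168.188.11:1234/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model: str, prompt: str) -> str:
    """POST a chat request to LMX and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode()
    req = request.Request(
        LMX_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Requires a running LMX instance with a loaded model, e.g.:
# reply = chat("llama-3.3-70b", "Say hello in five words.")
```

The same payload shape works for any OpenAI-compatible server, which is what lets existing tooling point at LMX by swapping the base URL.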
## Next Steps
Ready to get started? The next page walks you through installing the CLI and verifying your setup.