Chat

The Opta Local Web chat interface provides a full streaming conversation experience in your browser, backed by the same models running on your LMX inference server.

Chat Interface

The chat view presents a familiar conversational layout: your messages on one side, model responses on the other, rendered in a scrolling message list. You compose messages in an input area at the bottom of the screen, with keyboard shortcuts for submission.

The interface supports multi-turn conversations with full context retention. Each message is displayed with the model identity and timestamp, and the entire conversation history is maintained in the browser session.

Model Picker

Before starting a conversation (or at any point during one), you can select which loaded model to use via the model picker. The picker shows only models currently loaded in LMX memory -- it pulls the active model list from the same SSE stream that powers the dashboard.

Selecting a different model mid-conversation switches inference to that model for subsequent messages. Previous messages in the conversation are preserved and re-sent as context to the new model.
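In request terms, switching models only changes the `model` field on the next call; the accumulated history is re-sent unchanged. A minimal sketch, assuming the OpenAI-compatible message shape (the `buildChatRequest` helper is ours for illustration, not part of Opta):

```typescript
// Sketch: switching models mid-conversation. The message shape
// follows the OpenAI-compatible chat API; buildChatRequest is a
// hypothetical helper, not an Opta function.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
}

function buildChatRequest(model: string, history: ChatMessage[]): ChatRequest {
  // The full history rides along as context, so the newly selected
  // model sees everything said so far.
  return { model, messages: history, stream: true };
}

// Usage: same history, different model for the next turn.
const history: ChatMessage[] = [
  { role: "user", content: "Explain unified memory architecture" },
  { role: "assistant", content: "Unified memory lets CPU and GPU share one pool..." },
  { role: "user", content: "Now summarize that in one sentence." },
];
const nextTurn = buildChatRequest("qwen3-72b", history);
```

Because the history is replayed in full, the new model has complete context, but no hidden state carries over from the previous model.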

Model switching
Switching models mid-conversation is useful for comparing response quality. Ask the same question to different models within a single session to evaluate their capabilities side by side.

Streaming Responses

Responses stream token by token as they are generated by LMX. The web client connects to the standard OpenAI-compatible streaming endpoint and renders tokens incrementally as they arrive.

Streaming Request
POST /v1/chat/completions
Content-Type: application/json

{
  "model": "qwen3-72b",
  "messages": [
    { "role": "user", "content": "Explain unified memory architecture" }
  ],
  "stream": true
}

While a response streams, the interface shows a typing indicator and a running token count. The response text renders with progressive markdown formatting -- headings, code blocks, and lists appear as soon as enough tokens have been received to parse them.
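Each streamed chunk arrives as a Server-Sent Events `data:` line carrying a JSON delta, with a `data: [DONE]` sentinel closing the stream -- the standard OpenAI-compatible format. A sketch of the per-line token extraction a client performs (the function name is ours):

```typescript
// Sketch: extract the token text from one SSE line of an
// OpenAI-compatible /v1/chat/completions stream.
// Returns null for non-data lines, the [DONE] sentinel, and
// role-only or finish chunks that carry no content.

function parseStreamLine(line: string): string | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return null;
  const chunk = JSON.parse(payload);
  // Each chunk carries an incremental delta; content may be absent.
  return chunk.choices?.[0]?.delta?.content ?? null;
}
```

Appending each non-null result to the message buffer, then re-running the markdown renderer on the buffer, is what produces the progressive formatting described above.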

Tool Execution Feedback

When the model requests tool execution (file reads, web searches, code execution), the chat interface displays tool call cards inline with the conversation. Each card shows:

  • Tool name -- which tool the model invoked
  • Arguments -- the parameters passed to the tool
  • Result -- the output returned to the model
  • Duration -- how long the tool execution took

Tool cards are collapsible -- click to expand or collapse the details. This keeps the conversation readable while still providing full transparency into what tools the model used and what data they returned.
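The information on a card maps directly onto the tool-call exchange: the call the model requested plus the result returned to it. A sketch of the data a card might render from -- the field names here are our assumption, not Opta's actual schema:

```typescript
// Sketch of the data behind a tool call card (assumed schema,
// not Opta's real types).

interface ToolCallCard {
  toolName: string;                // which tool the model invoked
  args: Record<string, unknown>;   // parameters passed to the tool
  result: string;                  // output returned to the model
  durationMs: number;              // how long execution took
  expanded: boolean;               // collapsible UI state
}

function renderCardText(card: ToolCallCard): string {
  // Collapsed cards keep the conversation readable: name and timing only.
  const header = `${card.toolName} (${card.durationMs} ms)`;
  return card.expanded
    ? `${header}\n${JSON.stringify(card.args)}\n${card.result}`
    : header;
}
```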

Session Continuity

Sessions started in the Opta CLI can be continued in the web interface, and vice versa. The daemon maintains session state independently of the client, so you can:

  • Start a coding conversation in your terminal with opta chat
  • Switch to the web dashboard to continue the same session visually
  • Return to the CLI and pick up where you left off

Session continuity works because both the CLI and web dashboard connect to the same daemon session store. The session ID is the key -- any client that knows the session ID can resume the conversation.
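Resumption can be pictured as: look up the stored history by session ID, append the new turn, and send the request as usual. A conceptual sketch -- the session shape and helper below are illustrative assumptions, not the daemon's real API:

```typescript
// Conceptual sketch of session resumption. The StoredSession shape
// and continueSession helper are assumptions for illustration;
// consult the daemon's actual API for the real session schema.

interface StoredSession {
  id: string;                                  // the key any client can resume with
  model: string;
  messages: { role: string; content: string }[];
}

function continueSession(session: StoredSession, userText: string) {
  // Any client holding the session ID rebuilds the same request:
  // restored history plus the new turn.
  return {
    model: session.model,
    messages: [...session.messages, { role: "user", content: userText }],
    stream: true,
  };
}
```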

Session persistence
Session data is persisted to disk by the daemon. Conversations survive daemon restarts and can be resumed hours or days later with full context.

Markdown Rendering

Model responses are rendered with full markdown support, including:

  • Headings (h1 through h6)
  • Fenced code blocks with syntax highlighting
  • Inline code spans
  • Bold, italic, and strikethrough text
  • Ordered and unordered lists
  • Blockquotes
  • Tables
  • Links (opening in new tabs)

Code blocks include a copy button for extracting snippets. The markdown renderer matches the Opta design system -- dark backgrounds for code blocks, violet accent for links, and consistent typography throughout.
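The copy button only needs the raw text of a fenced block, not its highlighted HTML. A sketch of extracting fenced blocks from a markdown string -- a simplified regex approach for illustration; a real renderer walks its parse tree instead:

```typescript
// Sketch: pull the raw text of fenced code blocks out of a markdown
// string, e.g. to feed a copy-to-clipboard button. Simplified regex
// approach; a real renderer uses the parser's AST.

const FENCE = "`".repeat(3); // triple backtick, built up for readability here

function extractCodeBlocks(markdown: string): string[] {
  const blocks: string[] = [];
  // Match FENCE + optional info string + newline, then capture the
  // body non-greedily up to the closing FENCE.
  const pattern = new RegExp(FENCE + "[^\\n]*\\n([\\s\\S]*?)" + FENCE, "g");
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    blocks.push(match[1]);
  }
  return blocks;
}
```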