Architecture
How Salmex I/O is structured internally — from message ingestion to response delivery, across 8 distinct layers.
Overview
Salmex I/O is a personal AI operations platform, not a chatbot. It ships as a single Go binary containing 8 layers, each with a clear responsibility boundary. Every component communicates through a typed event bus — there are no hidden side channels or shared mutable state between layers.
The entire system runs locally. Your data never leaves your machine unless you choose to call an external LLM API. There is no telemetry, no phoning home, and no cloud dependency. The only outbound network calls are the ones you explicitly configure: LLM provider APIs and search engine APIs.
The 8 layers, in order of the request lifecycle:
- Channel — message ingestion from Telegram, CLI, Web UI, and other interfaces
- Gateway — authentication, routing, and lane queue management
- Agent — the ReAct loop that reasons, acts, and observes
- Judge — risk assessment and approval gating for tool calls
- Memory — three-tier storage with hybrid retrieval
- Plugin — JSON-RPC 2.0 subprocess extensions
- Scheduler — persistent cron and natural language job scheduling
- Event Bus — typed events, SSE streaming, and internal pub/sub
System Architecture
The diagram below shows all 8 layers grouped by function. Cross-cutting concerns — the event bus, config manager, delivery tracking, and platform toolkit — span every layer.
Each layer is implemented as a Go package with explicit interfaces. Dependencies flow downward: channels depend on the gateway, the gateway depends on the agent, and the agent depends on the judge, memory, and plugin layers. The event bus and scheduler operate orthogonally, reacting to events emitted by any layer.
Data Flow
A message follows a deterministic lifecycle from input to response. Every stage emits events that drive the real-time UI and audit log.
Message lifecycle
- Arrival — a message arrives via a channel adapter (Telegram webhook, CLI stdin, Web UI WebSocket). The adapter normalizes it into a unified message envelope.
- Gateway — the gateway authenticates the request, determines the target lane, and enqueues the message. Lanes provide priority-based routing: urgent requests jump the queue.
- Agent pickup — the agent picks up the next message from its assigned lane and enters the reason-act-observe loop.
- Judge evaluation — each tool call the agent proposes passes through the judge for risk assessment. Low-risk calls proceed automatically; high-risk calls require explicit user approval.
- Memory query — the agent queries the memory system for relevant context. Hybrid retrieval combines semantic search (vector embeddings) with BM25 keyword matching to surface the most useful facts.
- Response assembly — the agent assembles its final response and delivers it back through the originating channel adapter.
- Event emission — events are emitted at each stage. SSE streams push these events to web clients for real-time UI updates. Internal subscribers handle logging, metrics, and side effects.
Every message carries a unique trace ID through the entire lifecycle. You can follow a single request from channel ingestion through agent reasoning, tool execution, and response delivery in the audit log.
Agent System
The agent is the core processing engine. It implements the ReAct (Reason-Act-Observe) pattern — a loop where the agent reasons about the current state, decides on an action, executes it, observes the result, and repeats until the task is complete or a limit is reached.
ReAct loop
Each iteration of the loop follows the same structure:
- Reason — the agent receives the conversation history, memory context, and available tools. It produces a reasoning step explaining what it plans to do and why.
- Act — based on the reasoning, the agent selects a tool and provides arguments. The tool call is sent to the judge for risk evaluation before execution.
- Observe — the tool result is returned to the agent. It evaluates whether the task is complete, needs another iteration, or has encountered an error.
- Repeat — if the task is not complete, the agent loops back to the reasoning step with the new observation added to context.
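The four steps above can be sketched as a small Go loop. This is an illustrative skeleton, not Salmex I/O's internal types — the `Step` fields and `runLoop` signature are assumptions:

```go
package main

import "fmt"

// Step is one reason-act-observe iteration. All names here are
// illustrative -- the real agent types are internal to Salmex I/O.
type Step struct {
	Thought string // reasoning about the current state
	Tool    string // chosen action ("" means the agent is done)
	Result  string // observation from executing the tool
}

// runLoop repeats reason -> act -> observe until the agent stops
// choosing tools or the iteration limit is hit.
func runLoop(reason func(history []Step) Step, act func(tool string) string, maxIters int) []Step {
	var history []Step
	for i := 0; i < maxIters; i++ {
		step := reason(history) // Reason: pick the next action from context
		if step.Tool == "" {    // no tool chosen: task complete
			history = append(history, step)
			break
		}
		step.Result = act(step.Tool) // Act + Observe
		history = append(history, step)
	}
	return history
}

func main() {
	history := runLoop(
		func(h []Step) Step {
			if len(h) == 0 {
				return Step{Thought: "need the file contents", Tool: "file.read"}
			}
			return Step{Thought: "done"} // no tool -> finish
		},
		func(tool string) string { return "ok:" + tool },
		10,
	)
	fmt.Println(len(history), history[0].Result) // 2 ok:file.read
}
```

Note how each observation is appended to the history before the next reasoning step, which is what lets the agent build on earlier results.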
Tool pipeline
The agent has access to a set of built-in tools and any tools registered by plugins. Built-in tools include:
- Search — web search via configured engines (Perplexity, Brave, Google)
- Code editor — read, write, and edit files with diff-based modifications
- File manager — list, copy, move, and delete files within sandboxed directories
- Shell — execute shell commands with configurable timeout and working directory
- Plugins — any tools registered by JSON-RPC 2.0 plugin subprocesses
Coder agent
The coder is an embedded coding agent that can read, write, and edit files and execute commands. It operates as a specialized mode of the main agent, with access to the code editor, file manager, and shell tools. The coder understands project structure, can navigate codebases, and applies edits with full diff context.
Loop runner
The loop runner manages multi-step tool chains with error recovery. If a tool call fails, the loop runner can retry with backoff, skip the failed step and continue, or escalate to the user. It enforces maximum iteration limits to prevent runaway loops.
Token and cost tracking
Every LLM call records input tokens, output tokens, and cost. These metrics are tracked per conversation and per day. The agent reports cumulative cost at the end of each response, and budget limits can halt execution before overspending.
Memory System
Memory operates across three tiers, each serving a different temporal scope. The tiers work together to give the agent the right context at the right time without overwhelming its context window.
Session memory
Full conversation transcripts preserved per chat session. Session memory contains every user message, agent response, tool call, and tool result in order. It is the agent's complete record of the current conversation and persists until the user clears it.
Working memory
Active goals, recent tool results, and ephemeral context. Working memory is the agent's scratchpad — it holds the current task state, intermediate results, and any context that is relevant right now but does not need to persist long-term. Working memory is cleared when the task completes.
Long-term memory
Extracted facts stored in PostgreSQL with pgvector. Long-term memory is the agent's persistent knowledge base. It contains facts, preferences, decisions, and patterns extracted from past conversations.
Hybrid retrieval
Memory retrieval combines two strategies to maximize recall and precision:
- Semantic search — queries are embedded as vectors and matched against stored fact embeddings using cosine similarity. This captures meaning even when wording differs.
- BM25 keyword search — a traditional term-frequency search that excels at exact matches, names, identifiers, and technical terms that embeddings may flatten.
Results from both strategies are merged and ranked. The top-k facts are injected into the agent's context window before each reasoning step.
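The documentation does not name the merge algorithm, but reciprocal rank fusion (RRF) is one common way to combine two ranked lists without comparing their raw scores. A minimal sketch, with the conventional k = 60 damping constant:

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRanks merges two ranked result lists with reciprocal rank
// fusion: each result scores 1/(k + rank) per list it appears in,
// so items ranked well by both strategies rise to the top. RRF is
// an assumed choice here -- the actual merge logic is internal.
func fuseRanks(semantic, keyword []string, topK int) []string {
	const k = 60.0
	scores := map[string]float64{}
	for rank, id := range semantic {
		scores[id] += 1.0 / (k + float64(rank+1))
	}
	for rank, id := range keyword {
		scores[id] += 1.0 / (k + float64(rank+1))
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if scores[ids[i]] != scores[ids[j]] {
			return scores[ids[i]] > scores[ids[j]]
		}
		return ids[i] < ids[j] // deterministic tie-break
	})
	if len(ids) > topK {
		ids = ids[:topK]
	}
	return ids
}

func main() {
	top := fuseRanks(
		[]string{"fact-a", "fact-b", "fact-c"}, // semantic ranking
		[]string{"fact-b", "fact-d"},           // BM25 ranking
		3,
	)
	fmt.Println(top) // fact-b wins: it appears in both lists
}
```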
Memory extraction
The agent automatically extracts facts, preferences, and decisions from conversations. Extraction runs as a post-processing step after each conversation turn. Extracted facts are stored with source attribution, timestamp, and an initial relevance score.
Memory decay
Unused memories gradually reduce in relevance score. Each time a fact is retrieved and used by the agent, its relevance score is boosted. Facts that are never retrieved decay over time, ensuring the most useful knowledge stays prominent while stale information fades.
All memory — session, working, and long-term — is stored in your local PostgreSQL instance. Embeddings are generated locally if you use Ollama, or sent to your configured provider's embedding API. No memory data is ever sent to Salmex I/O servers.
Judge & Safety
The judge is a separate evaluation layer that sits between the agent and tool execution. Every tool call the agent proposes is evaluated for risk before it runs. The judge uses a dedicated LLM call — separate from the agent's reasoning — to assess potential impact.
Risk levels
The judge classifies each tool call into one of four risk tiers:
| Level | Action | Example |
|---|---|---|
| None | Auto-approve | Reading a file, web search |
| Low | Log only | Writing to a known project file |
| Medium | Notify user | Installing a package, modifying config |
| High | Require explicit approval | Deleting files, running destructive commands |
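The tier-to-action mapping in the table translates directly into a dispatch. The actions come from the table; the Go type and function names are mine:

```go
package main

import "fmt"

// RiskLevel mirrors the four-tier table above.
type RiskLevel int

const (
	RiskNone RiskLevel = iota
	RiskLow
	RiskMedium
	RiskHigh
)

// actionFor returns what the judge does at each tier, per the table.
func actionFor(r RiskLevel) string {
	switch r {
	case RiskNone:
		return "auto-approve"
	case RiskLow:
		return "log only"
	case RiskMedium:
		return "notify user, proceed unless they intervene"
	case RiskHigh:
		return "block until explicit approval"
	}
	return "unknown"
}

func main() {
	fmt.Println(actionFor(RiskHigh)) // block until explicit approval
}
```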
Escalation flow
When the judge assigns a Medium or High risk level, the escalation flow activates:
- The agent proposes an action with tool name and arguments.
- The judge evaluates the action and assigns a risk level with reasoning.
- For Medium risk: the user is notified and the action proceeds unless they intervene within the configured window.
- For High risk: execution is blocked until the user explicitly approves or denies the action.
There is no way to auto-approve High-risk actions. Destructive operations like file deletion, database modifications, and unrecognized shell commands always require your explicit approval. This is by design and cannot be overridden in configuration.
Budget tracking
The judge enforces cost limits at two levels: per conversation and per day. When a limit is reached, the agent is stopped mid-loop and the user is notified. Budget limits prevent runaway costs from long-running agent loops or expensive model calls.
Audit log
Every judge decision is recorded in the audit log with the full context: the proposed action, the risk assessment, the reasoning, and the outcome (approved, denied, or timed out). The audit log is stored in PostgreSQL and is queryable through the CLI and Web UI.
Channel System
Channels are the interface layer between external messaging platforms and the Salmex I/O core. Each channel adapter translates platform-specific message formats into a unified message envelope that the gateway understands.
Unified message envelope
Every message, regardless of origin, is normalized into the same structure: sender identity, channel type, content (text, attachments, or both), timestamp, and metadata. This means the agent and all downstream layers are completely channel-agnostic — they never need to know whether a message came from Telegram, the CLI, or the Web UI.
Channel adapters
Each adapter handles the platform-specific details:
- Telegram — webhook-based. Handles bot commands, inline queries, and media attachments. Supports Markdown formatting in responses.
- CLI — stdin/stdout interface for scripting and terminal use. Supports piped input and streaming output.
- Web UI — WebSocket connection to the SvelteKit frontend. Supports real-time streaming, rich formatting, and interactive elements.
- Future — Slack, Discord, and WhatsApp adapters are planned. The adapter interface is stable and documented for third-party implementations.
Multi-channel
The same agent instance, with the same memory and configuration, is accessible from every connected channel. A conversation started on Telegram can be continued from the Web UI. The agent remembers context regardless of which channel delivered the message.
Delivery tracking
Each outbound message carries a delivery status: sent, delivered, or read. The delivery tracker updates status based on channel-specific acknowledgements (Telegram delivery receipts, WebSocket acks, etc.). Failed deliveries are retried with exponential backoff.
Scheduler
The scheduler enables the agent to perform actions on a schedule — recurring tasks, deferred actions, and timed reminders. It runs as an independent layer that submits messages to the gateway at the scheduled time.
Schedule expressions
Jobs can be defined using standard cron expressions or natural language:
# Cron expression
0 9 * * 1-5 → every weekday at 9:00 AM
# Natural language (parsed to cron)
every weekday at 9am → 0 9 * * 1-5
every 6 hours → 0 */6 * * *
daily at midnight → 0 0 * * *
Persistent job queue
The job queue is backed by PostgreSQL. Jobs survive server restarts, crashes, and upgrades. Each job records its schedule, payload (the message to submit), next run time, and execution history.
Dead letter queue
Jobs that fail after the configured retry limit are moved to a dead letter queue (DLQ). The DLQ preserves the failed job with its error context for inspection. You can replay, modify, or discard dead-lettered jobs from the CLI or Web UI.
Job history
Every job execution is recorded: start time, duration, result (success or failure), and any output. Job history is queryable and powers the scheduler dashboard in the Web UI.
Plugin System
Plugins extend Salmex I/O with custom tools without modifying the core binary. Each plugin is a separate executable that communicates with Salmex I/O over JSON-RPC 2.0 through stdin/stdout.
Plugin protocol
The protocol is JSON-RPC 2.0, chosen for its simplicity and wide language support. Plugins can be written in any language that can read from stdin and write to stdout. The protocol defines four phases:
- Discover — Salmex I/O scans the plugin directory and finds executables matching the naming convention.
- Init — Salmex I/O starts the plugin subprocess and sends an initialization message with configuration.
- Register — the plugin responds with its tool definitions: name, description, parameter schema, and risk hints.
- Execute — when the agent calls a plugin tool, Salmex I/O sends the call over stdin and reads the result from stdout.
Crash isolation
Each plugin runs as an independent subprocess. If a plugin crashes, it does not take down the Salmex I/O server or affect other plugins. The plugin manager monitors subprocess health and restarts crashed plugins automatically with exponential backoff.
Tool registration
Tools registered by plugins appear alongside built-in tools in the agent's tool list. The agent does not distinguish between built-in and plugin tools — they share the same interface. Plugin tools are subject to the same judge evaluation as built-in tools.
{
"jsonrpc": "2.0",
"method": "tools.register",
"params": {
"tools": [
{
"name": "weather.forecast",
"description": "Get weather forecast for a location",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" },
"days": { "type": "integer", "default": 3 }
},
"required": ["location"]
},
"risk_hint": "none"
}
]
}
}
Event Bus
The event bus is a cross-cutting concern that connects every layer. All significant actions in the system emit typed events. Internal subscribers and external SSE clients consume these events for logging, metrics, real-time UI updates, and side effects.
Typed events
Every event has a defined type, a payload schema, and a source layer. Examples of event types:
| Event | Source | Description |
|---|---|---|
| message.created | Channel | New message received from a channel |
| agent.reasoning | Agent | Agent produced a reasoning step |
| tool.started | Agent | Tool execution began |
| tool.completed | Agent | Tool execution finished with result |
| judge.evaluated | Judge | Risk assessment completed for a tool call |
| memory.stored | Memory | New fact extracted and stored |
| response.delivered | Channel | Response sent back to the user |
SSE streaming
The Web UI connects to the server via Server-Sent Events (SSE). As events are emitted internally, they are pushed to connected SSE clients in real time. This powers the live-updating conversation view, tool execution indicators, and the system activity dashboard.
Internal subscribers
Layers can subscribe to events from other layers without direct coupling. The logging system subscribes to all events for the audit trail. The metrics collector subscribes to performance-related events for dashboard charts. The delivery tracker subscribes to response events to update message status.
Fan-out
A single event can trigger multiple handlers. When a tool.completed event fires, the logger records it, the metrics collector updates latency stats, the SSE streamer pushes it to the Web UI, and the agent's loop runner processes the result — all concurrently. Fan-out is non-blocking: a slow subscriber cannot stall the event bus or other subscribers.
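Non-blocking fan-out can be sketched with one buffered channel per subscriber. The drop-on-full policy below is an assumption — the document only guarantees that a slow subscriber cannot stall the bus:

```go
package main

import (
	"fmt"
	"sync"
)

// Bus delivers each event to every subscriber without blocking:
// a subscriber whose buffer is full misses the event rather than
// stalling the publisher or other subscribers.
type Bus struct {
	mu   sync.Mutex
	subs []chan string
}

func (b *Bus) Subscribe() <-chan string {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan string, 16) // buffer absorbs bursts
	b.subs = append(b.subs, ch)
	return ch
}

func (b *Bus) Publish(event string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		select {
		case ch <- event: // delivered without blocking
		default: // subscriber buffer full: drop instead of stalling
		}
	}
}

func main() {
	bus := &Bus{}
	logCh := bus.Subscribe() // e.g. audit logger
	uiCh := bus.Subscribe()  // e.g. SSE streamer
	bus.Publish("tool.completed")
	fmt.Println(<-logCh, <-uiCh) // tool.completed tool.completed
}
```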
Security Model
Security in Salmex I/O follows a defense-in-depth approach with multiple independent layers of protection.
Trust zones
Salmex I/O operates with two trust zones:
- Local (trusted) — requests from localhost are trusted by default. No API key is required. This is the primary mode for personal use.
- External (untrusted) — requests from non-localhost origins require API key authentication. External channels like Telegram are always treated as untrusted.
Authentication
External access requires an API key, sent via the X-API-Key header or Authorization: Bearer header. API keys are compared using constant-time comparison (crypto/subtle) to prevent timing attacks. Keys are stored hashed in the database.
Network security
- CORS — configurable allowed origins, strict by default. Only explicitly listed origins can make cross-origin requests.
- Host validation — only configured hostnames are accepted in the Host header. Requests with unexpected hosts are rejected.
- Request IDs — validated as UUID format before accepting client-provided values. Invalid request IDs are replaced with server-generated UUIDs.
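The request-ID rule can be sketched as a format check with a fallback. Validating with a regular expression is an illustrative choice; the document only says client-supplied IDs must be UUID-formatted or are replaced:

```go
package main

import (
	"fmt"
	"regexp"
)

// uuidRe accepts the canonical 8-4-4-4-12 hex UUID form.
var uuidRe = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// requestID keeps a valid client-provided ID, otherwise falls back to
// the server-generated one (UUID generation itself elided here).
func requestID(clientID, serverGenerated string) string {
	if uuidRe.MatchString(clientID) {
		return clientID
	}
	return serverGenerated
}

func main() {
	fmt.Println(requestID("123e4567-e89b-12d3-a456-426614174000", "srv-id"))
	fmt.Println(requestID("not-a-uuid", "srv-id")) // srv-id
}
```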
Tool sandboxing
File operations (read, write, edit, delete) are restricted to configured directories. The agent cannot access files outside these boundaries. Shell commands run with the server process's permissions — running Salmex I/O as a non-root user with limited filesystem access is strongly recommended.
Salmex I/O should never run as root. Create a dedicated user with access only to the directories the agent needs. The tool sandbox enforces path restrictions, but OS-level permissions provide the strongest boundary.
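The core path-restriction check can be sketched with `path/filepath`. This shows only the boundary test — the real sandbox is internal, and production code would also resolve symlinks before comparing:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// insideSandbox joins the requested path onto the allowed root,
// cleans it (collapsing any ".." segments), and verifies the result
// still lies under the root. Symlink resolution and other hardening
// are deliberately omitted from this sketch.
func insideSandbox(root, requested string) bool {
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return false
	}
	abs, err := filepath.Abs(filepath.Join(absRoot, requested))
	if err != nil {
		return false
	}
	return abs == absRoot || strings.HasPrefix(abs, absRoot+string(filepath.Separator))
}

func main() {
	fmt.Println(insideSandbox("/srv/agent", "notes/todo.md"))    // stays inside
	fmt.Println(insideSandbox("/srv/agent", "../../etc/passwd")) // traversal rejected
}
```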
No telemetry
Salmex I/O sends zero data home. There are no analytics, no crash reports, no usage metrics, and no update checks. The binary makes no network requests unless you have configured an external LLM provider or search engine. You can verify this by running Salmex I/O with network monitoring — the only outbound connections will be to your configured API endpoints.
What's next
- Getting Started — installation and initial configuration
- Run Locally — full guide to running on your own machine
- Remote Access — deploy behind nginx, Caddy, or Cloudflare