Approve Deployments From Your Phone
"I cannot go for a run, I cannot go to the movies, I cannot go for a dinner with family, I cannot even go shopping." — dakiol, Hacker News, on being on-call
The Story
It's 8pm. You're at dinner with your partner. Your phone buzzes. The monitoring alert says 500 errors on the payments service — error rate spiked from 0.1% to 12% in the last three minutes.
Normally, this is the moment dinner ends. You excuse yourself, find somewhere with decent WiFi, open the laptop, VPN into the network, SSH into the server, pull the logs, figure out what broke, write a fix, test it, deploy it, verify it's working, write up the incident. Ninety minutes minimum. Probably more. By the time you're done, the restaurant is closing and the evening is gone.
This happens often enough that you've stopped making plans on nights you're on-call. Some engineers stop making plans altogether.
The Idea
Your agent is monitoring the same alerts you are. When the 500 errors spike, it doesn't just notify you — it investigates. It pulls the logs, traces the error, reads the recent deployment diff, checks the database state, and identifies the root cause.
Then it writes the fix. Runs the tests. And sends everything to your phone.
ESCALATION — High Risk — Payments Service
500 errors on /api/payments/process since 19:57. Error rate at 12.3%, up from 0.1% baseline.
Root cause: Database schema mismatch. Migration 047_add_currency_field, deployed at 19:55, added a NOT NULL column currency to the payments table without a default value. Existing rows have NULL, and new inserts fail the constraint.
Proposed fix:
ALTER TABLE payments ALTER COLUMN currency SET DEFAULT 'USD';
UPDATE payments SET currency = 'USD' WHERE currency IS NULL;
Test results: Ran against staging snapshot — 847 rows updated, all payment processing tests pass, no constraint violations.
Impact: ~4 minutes downtime during migration. Payments queued during outage will retry automatically.
[ Approve ] [ Deny ] [ Show full logs ]
Approve
Migration applied. 847 rows updated in 2.3 seconds. Error rate dropping — 12.3% → 4.1% → 0.8% → 0.1%. Back to baseline.
Payments queued during the outage (23 transactions) are retrying now. 19 of 23 processed successfully so far. Monitoring the remaining 4.
I'll send you a full incident report in the morning. Enjoy your dinner.
Four minutes. You didn't leave the table. You didn't open a laptop. You tapped a button on your phone and went back to your meal.
What It Looks Like
Not every incident gets escalated to your phone. The Judge, the component that reviews every action before execution, evaluates risk and determines the appropriate response:
Low risk — handled silently. A non-critical background job fails and auto-retries. Disk usage crosses 80% and the agent runs the cleanup script it's used before. You never hear about it unless you ask.
Medium risk — handled, logged, morning summary. A dependency update causes a test failure in CI. The agent reverts the update, pins the previous version, and files an issue. You see it in your morning briefing with a one-line summary.
High risk — diagnosis + approval on your phone. Database migrations, production deployments, infrastructure changes. The agent does the investigation and prepares the fix, but won't execute without your explicit approval. This is the payments example above.
Critical — blocked entirely, full escalation. Anything involving data deletion, security credentials, or financial systems above a threshold. The agent describes the situation and its recommendation but does not propose autonomous action. "I found the issue but this requires manual intervention — here's what you need to do when you're at a terminal."
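One way to read the four tiers is as a routing table from risk level to response. The Python sketch below is only an illustration of that idea: the names `Risk`, `Response`, `POLICY`, and `plan_response` are hypothetical and are not taken from Salmex I/O, and the question of how the Judge arrives at a risk level in the first place is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass(frozen=True)
class Response:
    agent_may_execute: bool  # may the agent ever run the action itself?
    needs_approval: bool     # must a human approve before it runs?
    notify: str              # "none", "morning_summary", or "phone"


# Hypothetical policy table mirroring the four tiers described above.
POLICY = {
    Risk.LOW: Response(agent_may_execute=True, needs_approval=False, notify="none"),
    Risk.MEDIUM: Response(agent_may_execute=True, needs_approval=False, notify="morning_summary"),
    Risk.HIGH: Response(agent_may_execute=True, needs_approval=True, notify="phone"),
    Risk.CRITICAL: Response(agent_may_execute=False, needs_approval=False, notify="phone"),
}


def plan_response(risk: Risk) -> Response:
    """Look up how an action at the given risk level should be handled."""
    return POLICY[risk]


if __name__ == "__main__":
    for risk in Risk:
        print(risk.value, plan_response(risk))
```

Keeping the policy in a plain table, rather than scattered through conditionals, makes it easy to audit which tiers can ever reach production without a human.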
How It Works
- Channels — Monitoring alerts come in through webhooks or polling. Escalation messages go out through Telegram (or any configured channel) with inline action buttons.
- Coding agent — Investigates the incident: reads logs, traces errors through code, checks database state, reads recent diffs to find what changed. Writes the fix using the same 9+1 tool loop used for normal coding tasks.
- Exec command — Runs diagnostic commands on the server: log queries, database checks, health endpoints, test suites. All commands are logged and auditable.
- Judge — Four risk levels determine the escalation path. Risk assessment considers: what system is affected, what the proposed action is, time of day, whether similar actions have been approved before, and the blast radius of failure.
- Escalation — The human-in-the-loop mechanism. High-risk actions are packaged with full context (diagnosis, proposed fix, test results, impact assessment) and sent to your phone. You approve, deny, or ask for more information — all from the chat interface.
- Memory — Remembers past incidents, what fixes worked, what your preferences are. "Last time this migration pattern caused issues, you asked me to always test against a staging snapshot first." Over time, the agent gets better at diagnosing your specific infrastructure.
- Remote access — The agent runs where your code runs. It has the same access you would have if you SSH'd in: logs, databases, deployment tools, monitoring endpoints.
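The escalation step described above can be sketched as two pieces: a package that carries the context, and a gate that runs the fix only on explicit approval. This is a minimal Python sketch under stated assumptions. `Escalation`, `to_message`, and `handle_decision` are hypothetical names, the decision strings stand in for whatever the chat channel's buttons actually send back, and delivery over Telegram is not shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Escalation:
    service: str
    diagnosis: str
    proposed_fix: str
    test_results: str
    impact: str

    def to_message(self) -> str:
        """Render the context a human needs to decide, as plain chat text."""
        return "\n".join([
            f"ESCALATION: High Risk ({self.service})",
            f"Root cause: {self.diagnosis}",
            f"Proposed fix: {self.proposed_fix}",
            f"Test results: {self.test_results}",
            f"Impact: {self.impact}",
        ])


def handle_decision(decision: str, apply_fix: Callable[[], str]) -> str:
    """Run the fix only on an explicit 'approve'; anything else changes nothing."""
    if decision == "approve":
        return apply_fix()
    if decision == "deny":
        return "Fix discarded. No changes made."
    return "Unrecognised reply. No changes made."


if __name__ == "__main__":
    esc = Escalation(
        service="payments",
        diagnosis="currency column added without a default value",
        proposed_fix="set default 'USD' and backfill NULL rows",
        test_results="all payment processing tests pass on staging snapshot",
        impact="queued payments retry automatically",
    )
    print(esc.to_message())
    print(handle_decision("approve", lambda: "Migration applied."))
```

The design choice worth noting is that the default path is a no-op: an unrecognised or missing reply never executes the fix.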
What Breaks Without This
Without AI diagnosis (30–60 minutes manual). You get the alert, open the laptop, connect to VPN, SSH in, start reading logs. Half the time is figuring out what changed. The other half is understanding why it broke. The agent does this in under a minute because it can read code, logs, and database state simultaneously.
Without mobile approval (need laptop). Even if someone else diagnoses the problem, you still need a terminal to deploy the fix. The approval-on-phone pattern means the agent handles execution — you just authorise it.
Without the Judge (autonomous = dangerous). An AI agent that can deploy to production without approval is terrifying. The Judge is the safety layer that makes an autonomous agent viable: it ensures the agent never takes a high-risk action without human confirmation, while still letting it handle low-risk tasks on its own.
Without memory (no incident history). Every incident starts from zero. The agent doesn't know that this migration pattern has caused problems before, doesn't remember that the last time payments went down it was the same column constraint issue, doesn't know your preference for testing against staging snapshots. Memory turns a generic agent into one that knows your infrastructure.
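As a rough illustration of what incident memory buys you, the Python sketch below stores past incidents and recalls the ones that match a new alert. Everything here is an assumption made for the example: the `Incident` and `IncidentMemory` names, the fields, and the exact-match lookup on service and cause, which stands in for whatever retrieval the real memory component uses. It needs Python 3.9 or later for the `list[...]` annotations.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    service: str
    cause: str            # short label, e.g. "column-constraint"
    fix: str
    preference: str = ""  # anything the engineer asked for afterwards


@dataclass
class IncidentMemory:
    incidents: list[Incident] = field(default_factory=list)

    def remember(self, incident: Incident) -> None:
        """Append an incident; insertion order doubles as chronological order."""
        self.incidents.append(incident)

    def recall(self, service: str, cause: str) -> list[Incident]:
        """Return past incidents with the same service and cause, newest first."""
        matches = [
            i for i in self.incidents
            if i.service == service and i.cause == cause
        ]
        return matches[::-1]


if __name__ == "__main__":
    memory = IncidentMemory()
    memory.remember(Incident(
        service="payments",
        cause="column-constraint",
        fix="set default and backfill",
        preference="always test against a staging snapshot first",
    ))
    for past in memory.recall("payments", "column-constraint"):
        print(past.fix, "|", past.preference)
```

With no stored incidents, `recall` returns an empty list, which is the "every incident starts from zero" case.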
Build This
This is not a concept — it's buildable today.
Salmex I/O's Judge reviews every action before execution, the coding agent diagnoses and writes fixes, and Telegram delivers approve/deny buttons straight to your phone. Being on-call doesn't have to mean being chained to a laptop.