gateway.remote pins CLIs to cloud hosts, and how OPENCLAW_GATEWAY_PORT plus isolated state directories prevent shadow instances from stomping each other, closing with a Hosting matrix for Linux VPS versus macOS bare-metal cloud nodes across Singapore, Tokyo, Seoul, Hong Kong, US East, and US West.
Why Gateway hot reload looks broken even when it works
Gateway is a long-lived Node process that multiplexes WebSockets, CLI RPC, sessions, and channel adapters on one listener (default port 18789). Hot reload merges configuration deltas while trying to keep sessions alive, but anything that requires tearing down the listening socket cannot be faked in place. TLS termination parameters, a bind scope moving from loopback to LAN, port collisions, and auth hard gates tied to exposure class all imply a controlled bounce. Hybrid reload modes maintain internal allowlists of fields considered safe for online merge versus fields flagged restart-required. That split shifts from release to release at pre-1.0 velocity, so runbooks must defer to live logs instead of memorizing static lists.
Five recurring signatures masquerade as reload bugs but really indicate category mistakes or pointer drift.
Port hopping without restart: The old listener keeps file descriptors until the process exits, yielding EADDRINUSE or split-brain CLIs.
Non-loopback exposure missing tokens: Safety rails refuse to apply hot merges that would expose anonymous admin surfaces.
Stale remote targets: Laptops still target localhost while the authoritative Gateway migrated upstream.
Shared state directories: Second instances read foreign channel caches and produce spooky cross-effects.
Schema drift after upgrades: New defaults reorder merges, so yesterday's hot-safe knobs become restart-gated today.
Triage order: classify edits as socket-touching versus routing-only, validate remote wiring and environment variables, then revisit daemon health. For full-stack install-through-doctor coverage, read the complete Gateway deployment guide on this blog; this article stays orthogonal and deeper on reload plus multi-instance mechanics.
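The first triage step, classifying edits as socket-touching versus routing-only, can be sketched as a small diff helper. This is a minimal illustration, not Gateway's own logic: the dotted prefixes in `SOCKET_PREFIXES` are hypothetical placeholders (only `gateway.mode` and `gateway.remote` are named in this article), and the real restart-required set changes per release, so the live logs remain the authority.

```python
from typing import Any

# Hypothetical prefixes for fields that touch the listening socket.
# The real split changes between releases; confirm against Gateway logs.
SOCKET_PREFIXES = ("gateway.port", "gateway.bind", "gateway.tls", "gateway.auth", "gateway.mode")

_MISSING = object()  # sentinel so "key absent" differs from "key set to None"


def flatten(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys: {"gateway": {"port": 1}} -> {"gateway.port": 1}."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def classify_edits(old: dict[str, Any], new: dict[str, Any]) -> dict[str, list[str]]:
    """Split changed keys into socket-touching and routing-only buckets."""
    before, after = flatten(old), flatten(new)
    changed = sorted(
        key
        for key in before.keys() | after.keys()
        if before.get(key, _MISSING) != after.get(key, _MISSING)
    )
    result: dict[str, list[str]] = {"socket": [], "routing": []}
    for key in changed:
        touches_socket = any(key == p or key.startswith(p + ".") for p in SOCKET_PREFIXES)
        result["socket" if touches_socket else "routing"].append(key)
    return result
```

Any key landing in the `socket` bucket is a hint to schedule a restart window before applying the change; an empty `socket` bucket is not proof that a hot merge will succeed.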
Archive config snapshots with ticket identifiers whenever operators blur staging and production; git diff discipline beats heroic midnight merges.
Operational telemetry reinforces the story: when median CLI latency spikes only during merges that touch the gateway block, you have almost always missed a restart-required signature rather than hit systemic CPU starvation. Correlate release timestamps with channel mute incidents and you will see how skipping the bounce prolongs gray failures in which messages appear delivered upstream yet never reach model routing. Training junior responders to grep logs for those signatures pays off faster than expanding fleet-wide CPU headroom.
Finally, document which automation accounts may invoke reload-sensitive tooling. CI jobs that rewrite JSON through naive templating often introduce trailing commas or duplicate keys that Gateway parsers reject quietly until restart, which teams misattribute to flaky reload. Schema validation in CI keeps human edits and robot edits aligned.
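A CI gate for the two templating mistakes named above, trailing commas and duplicate keys, needs nothing beyond the Python standard library. The sketch below assumes the configuration file is strict JSON; if your Gateway version accepts a relaxed dialect, this check would be stricter than the parser it protects. The function and class names are illustrative.

```python
import json


class DuplicateKeyError(ValueError):
    """Raised when one JSON object repeats a key."""


def _reject_duplicates(pairs):
    # json.loads hands every object to this hook as a list of (key, value)
    # pairs, which is the only point where duplicates are still visible.
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def validate_config_text(text: str) -> list[str]:
    """Return a list of problems; an empty list means the text parsed cleanly."""
    try:
        json.loads(text, object_pairs_hook=_reject_duplicates)
    except DuplicateKeyError as exc:
        return [str(exc)]
    except json.JSONDecodeError as exc:
        # Strict JSON rejects trailing commas, so they surface here.
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    return []
```

Running the same function over human edits and robot edits before merge keeps both paths held to one standard.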
Which knobs hot-apply and which deserve a maintenance window
Instead of enumerating volatile field names, anchor decisions on transport class: listeners, TLS, and auth coupling touch the kernel-facing boundary and belong to planned restarts; routing policies, many channel parameters, model routing experiments, and formatting tweaks tend to merge online. Always confirm via Gateway logs whether apply succeeded or restart is mandatory. Public exposure continues to require synchronized reverse-proxy WebSocket upgrades and token rotation discipline documented in the VPS hardening article referenced later.
| Dimension | Usually hot-friendly | Usually restart-required |
|---|---|---|
| Transport | Channel message templates within safe subsets | Port, bind mode, TLS materials |
| Trust boundary | Allowlist expansions inside supported hot paths | Loopback to LAN transitions without auth tokens |
| Model routing | Provider aliases, sampling knobs | Structural gateway.mode shifts |
| Observability | Verbose logging when hot-merge supported | Admin UI bind changes sharing the listener |
| Topology | Progressive channel probes | Colliding second listener on identical triples |
Respect the socket boundary: hot reload optimizes routing churn, not kernel listener gymnastics.
Firewall, NGINX or Caddy snippets, and provider security groups still belong to the public Gateway playbook on this site; duplicate screenshots would only rot when vendors rename consoles.
Game-day exercises help cement the matrix: schedule tabletop drills where operators intentionally toggle a restart-required field during synthetic traffic, measuring mean time to acknowledge versus mean time to restore channel ACK rates. Plain textual timelines without video capture suffice; the muscle memory matters more than polished recordings.
gateway.remote, tokens, and isolated homes for parallel Gateways
Remote mode decouples the machine running daily CLI commands from the machine owning channels and sessions. The laptop keeps a thin configuration pointing at the cloud WebSocket endpoint plus secrets injected through environment variables rather than shell history. Parallel Gateways for staging demand triple isolation: distinct ports, distinct OPENCLAW_HOME trees, and distinct log labels so support staff never confuse stderr streams during incidents.
Host A production:
OPENCLAW_GATEWAY_PORT=18789
OPENCLAW_HOME=/var/openclaw/prod-a
Host B staging:
OPENCLAW_GATEWAY_PORT=18790
OPENCLAW_HOME=/var/openclaw/staging-b
Laptop CLI:
gateway.remote.url=wss://gateway.example.com/openclaw
gateway.remote.token=${OPENCLAW_REMOTE_TOKEN}
After each upgrade, diff remote-related keys because rapid pre-1.0 releases reorder defaults. On cloud Mac hosts, validate LaunchAgent wake behavior separately from reload semantics; sleep-induced disconnects masquerade as failed merges.
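The triple isolation rule for parallel Gateways (distinct ports, distinct OPENCLAW_HOME trees, distinct log labels) is easy to lint before a second instance ever starts. The following is a minimal sketch under stated assumptions: the `Instance` record and `isolation_problems` helper are hypothetical names, and homes are compared as POSIX paths, treating a home nested inside another as a collision.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Instance:
    label: str  # log label shown to support staff
    port: int   # value of OPENCLAW_GATEWAY_PORT
    home: str   # value of OPENCLAW_HOME


def _homes_overlap(a: str, b: str) -> bool:
    """True when the two homes are equal or one is nested inside the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def isolation_problems(instances: list[Instance]) -> list[str]:
    """Compare every pair of instances and report shared labels, ports, or homes."""
    problems: list[str] = []
    for index, first in enumerate(instances):
        for second in instances[index + 1:]:
            if first.label == second.label:
                problems.append(f"duplicate log label: {first.label}")
            if first.port == second.port:
                problems.append(f"{first.label} and {second.label} share port {first.port}")
            if _homes_overlap(first.home, second.home):
                problems.append(f"{first.label} and {second.label} have overlapping homes")
    return problems
```

With the two hosts from the example above, `isolation_problems` returns an empty list; pointing staging at a subdirectory of the production home or reusing port 18789 would be reported.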
Reverse proxies complicate remote URLs: terminating TLS at the edge means your CLI sees wss while the inner hop might still speak ws on localhost. Document three URLs explicitly (the public client URL, the inner upstream URL, and the health-check curl target) to prevent a mysterious split-brain where dashboards show green while CLIs fail handshake negotiation. Keep proxy idle timeouts longer than the longest silent gap between cron-driven bursts so intermediaries do not drop quiet connections.
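Writing the three URLs down is more useful when something checks them. The sketch below encodes two assumptions that go slightly beyond the text: plaintext ws is only acceptable when the upstream host is loopback, and the health-check target is plain HTTP or HTTPS because it is meant for curl. The function name and the `/health` path in the usage line are hypothetical.

```python
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def check_url_triple(public_url: str, upstream_url: str, health_url: str) -> list[str]:
    """Return a list of problems found in the documented URL triple."""
    problems: list[str] = []
    public = urlsplit(public_url)
    upstream = urlsplit(upstream_url)
    health = urlsplit(health_url)

    # Clients cross untrusted networks, so the public hop should be TLS.
    if public.scheme != "wss":
        problems.append("public client URL should use wss")
    # Assumption: unencrypted ws is tolerable only behind the proxy on loopback.
    if upstream.scheme == "ws" and upstream.hostname not in LOOPBACK_HOSTS:
        problems.append("plaintext ws upstream should stay on loopback")
    # curl-based health checks speak HTTP, not WebSocket.
    if health.scheme not in ("http", "https"):
        problems.append("health-check target should be http or https")
    return problems
```

For example, `check_url_triple("wss://gateway.example.com/openclaw", "ws://127.0.0.1:18789", "http://127.0.0.1:18789/health")` returns an empty list.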
When running two Gateways for blue-green experimentation, rotate tokens independently so a leaked staging secret never unlocks production channels. Pair rotation with automated reminders because parallel environments amplify forgetfulness.
Note: Treat tokens like deploy secrets: rotate when exposure classes change and store them in a vault accessible to CI, not in plaintext README fragments.
Six-step change discipline that respects channel traffic
Snapshot configuration: Export JSON plus relevant environment before edits with owner metadata.
Classify edits: Tag socket-level versus routing-only to choose restart windows proactively.
Apply and read logs: Check immediately whether the logs show a reload-applied or a restart-required signature.
Probe channels off-peak: Send single test messages on critical surfaces.
Validate remote CLI: From laptops, run read-only status commands against the intended port.
Prepare rollback: Pin previous package versions and keep diff lists on hand in case anomalies appear.
Annotate each step with responsible roles so platform engineers and application owners know who owns probes and who owns customer communications. Confusion during minute fifteen of an outage wastes more wall-clock time than any single mistyped port.
Lightweight checklists beat improvisational heroics when very tired humans operate complex meshes.
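Step one of the checklist, snapshotting configuration with owner metadata, might look like the sketch below. It assumes the configuration is already loaded as a dictionary and that the relevant environment variables share the `OPENCLAW_` prefix used in this article; variables with `TOKEN` in the name are deliberately left out so secrets never land in the archive. The `build_snapshot` name and the record layout are illustrative.

```python
import hashlib
import json
import os
from datetime import datetime, timezone


def build_snapshot(config: dict, owner: str, ticket: str, environ=None) -> dict:
    """Bundle config, non-secret OPENCLAW_* environment, and owner metadata."""
    env = os.environ if environ is None else environ
    captured_env = {
        name: value
        for name, value in sorted(env.items())
        if name.startswith("OPENCLAW_") and "TOKEN" not in name
    }
    # Canonical serialization so the hash ignores key order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return {
        "owner": owner,
        "ticket": ticket,
        "taken_at": datetime.now(timezone.utc).isoformat(),
        "config_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "config": config,
        "environment": captured_env,
    }
```

Because the hash is computed over a key-sorted serialization, two snapshots with the same `config_sha256` hold the same settings even when a templating tool reordered the file, which makes rollback comparisons quick.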
Three rules of thumb plus macOS cloud versus Linux VPS hosting
Port uniqueness: One active listener per port; parallel workspaces need parallel ports and homes.
Remote minimalism: Laptops should not accidentally spawn duplicate local Gateways while remote is authoritative.
Host matrix: Lightweight control planes fit Linux VPS; stacks demanding stable Apple APIs or desktop-class tooling pair better with macOS bare metal in the same metro as your users.
Warning: Never widen bind to any without completing auth and proxy reviews first.
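The port uniqueness rule can be checked before starting a second Gateway by trying to bind the port yourself. This is a generic TCP probe using the standard `socket` module, not an OpenClaw command; the `port_is_free` name is an assumption. The result is only a point-in-time answer, since another process can claim the port between the probe and the real start.

```python
import errno
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # No SO_REUSEADDR on purpose: the probe should fail whenever
            # another socket already holds the address.
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # permission errors and the like are not "port taken"
        return True
```

A `False` result for 18789 on a laptop that should be remote-only is a quick signal that a duplicate local Gateway may be running.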
Operating Gateway like a disposable container without understanding reload contracts burns incident time during traffic peaks. Pairing disciplined configuration merges with stable hosts turns OpenClaw into infrastructure instead of a laptop accessory. MESHLAUNCH Mac mini cloud rentals are typically the stronger fit when you need Apple Silicon exclusivity, resilient power, and elastic leases across Singapore, Tokyo, Seoul, Hong Kong, US East, and US West to park macOS-native workloads beside your Gateway control plane, backed by clear change windows instead of hoping a lid-open Mac survives the weekend.
Finance stakeholders care about this distinction because mistaken reload assumptions trigger unnecessary vertical scaling spend. Prove through logs whether incidents correlate with socket churn; if not, redirect budget toward networking clarity or duplicate staging hosts rather than oversized CPUs that idle.
Yes, when listener or auth coupling changed. Follow the complete Gateway troubleshooting guide for ladder checks before restarting blindly.
See the VPS firewall and WebSocket guide. Compare tiers on the pricing page when sizing bandwidth.
Visit the help center and attach remote URLs, ports, and token variable names to support tickets.