Five signatures that misroute hybrid OpenClaw plus Ollama incidents
Hybrid stacks multiply failure surfaces from a single vendor rate limit into a sandwich of local inference processes, Gateway WebSockets, channel adapters, tool sandboxes, and upstream hosted models. When any layer is judged by gut feel alone, week three becomes a ritual of rebooting the entire cloud Mac without a change record. The signatures below are not vocabulary flexing; they are language for change review. If you can reproduce two of them together, freeze model routing and attach rollback commands to the ticket instead of pulling another quantized file.
The first signature is fluent chat with tools that never enter the executor. Teams blame Telegram latency when the model route still points at Ollama while the tool stream lacks compatible deltas. Fix that by logging the resolved provider per request and running the same tool smoke against a cloud default control host. The second signature is curl succeeding to port 11434 from an SSH session while Gateway logs connection refused. That usually means different network namespaces or half-open loopback stacks between the container publish path and the host process. Align what the Gateway process sees as 127.0.0.1 with what your SSH session curls before opening broad firewall rules.
Chat works, tools never fire: treat as routing or streaming semantics first, not channel outage.
SSH curl works, Gateway refuses loopback: compare namespaces, IPv4 versus IPv6 bind, and Docker publish targets.
Swap climbs while CPU looks idle: GGUF weights plus browser automation on sixteen gigabyte tiers create hidden memory pressure.
Ollama flaps only after OpenClaw upgrades: diff global npm prefix, plist absolute paths, and workspace roots before blaming quantization.
Latency blamed on Singapore routing: split member-to-host RTT from model time-to-first-token with timestamps.
After you name the signature, write policy: production gateways may keep Ollama on a whitelist of low-risk skills while heavy browser runs default to cloud models. Beta quantizations belong on day-rent burn-in hosts, not on the same plist that carries customer tokens. If you still compare Docker versus install.sh delivery, read the dual-path article in parallel because volume maps decide whether weights survive a rolling release or disappear like ephemeral containers.
Cloud-only, Ollama-only, hybrid: one matrix for blast radius and skills
There is no forever-correct topology, only whether you can explain which supply chain each request used. The table is deliberately coarse so a staff engineer and a finance partner can align in ten minutes on data residency, tool stability, cost curves, and operational load. Hybrid is not fifty-fifty token split; it is task-type routing. Summaries and classification can ride a local eight billion parameter model while multi-file edits and guarded shell chains stay on hosted models with clearer tool contracts.
| Dimension | Cloud closed models | Ollama local only | Hybrid production exploration |
|---|---|---|---|
| Data residency story | Depends on vendor terms and egress audits | Weights and prompts stay inside the host boundary | Sensitive segments local, public segments cloud, needs routing discipline |
| Tooling and skills | Mature protocols, richer runbooks | More sensitive to quantization and stream deltas | Use cloud for complex tools, local for lighter tools |
| Cost spikes | Token billing makes bursts visible | Cost shifts to RAM and disk IO | Needs queues and fallback or you pay twice |
| Operational load | Low until vendor or quota drift | Medium because model files join the same runbook as Gateway | Higher but can be layered with frozen windows |
| Fit for seven-day cloud Macs | Strong for stable egress and channels | Strong for batch windows and redacted pipelines | Strong when control plane is cloud-first and data plane can be local |
Hybrid value is not a smaller API bill; it is separating resource-bound local failures from policy-bound cloud failures.
If you mix Singapore, Tokyo, Seoul, Hong Kong, US East, and US West with different instance sizes, also record which host is the single source of truth for each provider mix. Otherwise beta quantization looks like a regional outage. Pair that record with maintenance windows that avoid heavy automation peaks, and archive ollama list output next to openclaw doctor before and after each window.
Loopback topology and provider skeleton: make 127.0.0.1:11434 auditable
The stable co-hosting assumption is that Gateway and Ollama share the same user session, same network namespace, and the same launchd ordering story. Any workflow that starts Ollama only after an engineer SSHs in becomes non-reproducible by day seven. Encode dependency so port health precedes Gateway kickstart, not the reverse with channel traffic hammering a cold model daemon. Docker sidecars need explicit publish alignment so logs stop showing almost-successful handshakes that never reach the host loopback your Gateway reads.
```
curl -sS http://127.0.0.1:11434/api/tags
openclaw doctor
openclaw channels status --probe
```
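To compare what an SSH session and the Gateway's own context see on loopback, the probe below can be run once in each (for example on the host and inside the container) and the two outputs attached to the ticket. It is an added example, not OpenClaw or Ollama code; the function names and result strings are assumptions made for illustration.

```python
import socket

# Loopback literals probed directly, so name resolution cannot hide a mismatch.
LOOPBACKS = {"ipv4": (socket.AF_INET, "127.0.0.1"),
             "ipv6": (socket.AF_INET6, "::1")}

def probe_loopback(port, timeout=1.0):
    """TCP-connect to each loopback family; return {'ipv4': ..., 'ipv6': ...}."""
    results = {}
    for name, (family, addr) in LOOPBACKS.items():
        try:
            with socket.socket(family, socket.SOCK_STREAM) as s:
                s.settimeout(timeout)
                s.connect((addr, port))
            results[name] = "open"
        except OSError as exc:  # refused, timed out, or family unavailable
            results[name] = type(exc).__name__
    return results

def localhost_order(port):
    """Families in the order this process resolves the name 'localhost'."""
    try:
        infos = socket.getaddrinfo("localhost", port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return ["ipv6" if fam == socket.AF_INET6 else "ipv4" for fam, *_ in infos]

def diagnose(results, order):
    """Turn probe results into a short label for the change record."""
    open_fams = [f for f, r in results.items() if r == "open"]
    if not open_fams:
        return "no-listener-in-this-namespace"
    if len(open_fams) == 2:
        return "dual-stack-ok"
    only = open_fams[0]
    if order and order[0] != only:
        # A client that dials 'localhost' tries the closed family first.
        return f"{only}-only-bind; 'localhost' resolves to {order[0]} first"
    return f"{only}-only-bind"

if __name__ == "__main__":
    port = 11434  # the Ollama port named on this page
    res = probe_loopback(port)
    print(res, diagnose(res, localhost_order(port)))
```

If the SSH session reports an open family and the Gateway's context reports no listener, the two are in different network namespaces; an only-one-family result points at the IPv4 versus IPv6 bind instead.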
On the configuration side, write three names on the same wiki page instead of scattering them across laptops: default model for daily chat, fallback model when queue depth or time-to-first-token crosses a threshold, and tool-heavy default that stays on cloud routes. Map each name to observable metrics so on-call shifts latency from feelings to numbers. When gateway.reload boundaries matter, cross-read the hot-reload article because routing edits often stack with reload versus restart semantics.
Note: Align ollama ps timestamps with Gateway logs in ticket attachments; that beats guessing whether a new GGUF caused the flap.
Six-step hybrid runbook: freeze routing through executable fallback
Treat the runbook as an interface between automation owners and finance. Each step should emit an artifact: a ticket field, a tarball, or a timestamped log bundle. Skipping artifacts turns hybrid routing into tribal knowledge that breaks every time someone rotates off the project.
Freeze the provider matrix and exact versions: list Ollama tags, OpenClaw build, and Gateway expectations on the change record.
Back up state roots and model inventory: tarball configs, plists, environment exports, and ollama list output with a UTC timestamp.
Smoke on day-rent or pre-prod: curl loopback, doctor, channels, and one lightweight tool call before touching production traffic.
Enter the maintenance window: pause heavy queues before switching defaults so browser IO does not stack with model IO.
Turn on observability thresholds: assign owners for time-to-first-token, queue depth, Swap rate, and disk free space alerts.
Publish fallback commands: document the exact sequence to return to the cloud default model with a time box for rollback completion.
Hard thresholds for on-call manuals and metro placement
These numbers are engineering communication rails, not warranties from a silicon vendor. Tune them with your own histograms, but keep them explicit so incident reviews have something falsifiable instead of vibes.
Time-to-first-token and queue depth: when an eight billion class local model exceeds roughly two point five seconds median while idle and queue depth stays above three, auto-fallback to the cloud default and log a reason code.
Swap guardrail: on sixteen gigabyte hosts running a seven billion quantization plus single-page browser automation, treat five consecutive minutes of uncomfortable Swap write rates as a sizing incident, not noise.
Disk headroom: keep roughly thirty-five percent free for logs and temporary downloads; block new model pulls below roughly twelve percent free until cleanup runbooks finish.
Caution: Thresholds here are operational shorthand, not cloud SLA promises; cross-region RTT still needs your own probes.
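The three guardrails above can be written down as one evaluation function so that every fallback or block carries a reason code. This is an added example, not part of OpenClaw or Ollama; the reason-code names are hypothetical, and because the page gives no figure for an "uncomfortable" Swap write rate, SWAP_MB_PER_MIN is an assumed placeholder to replace with your own histogram.

```python
from statistics import median

TTFT_MEDIAN_S = 2.5      # page: roughly 2.5 s median time-to-first-token
QUEUE_DEPTH_MAX = 3      # page: queue depth stays above three
SWAP_MINUTES = 5         # page: five consecutive minutes
DISK_TARGET_FREE = 0.35  # page: keep roughly 35% free
DISK_BLOCK_PULLS = 0.12  # page: block new pulls below roughly 12% free
SWAP_MB_PER_MIN = 200    # ASSUMPTION: the page states no numeric swap rate

def evaluate(ttft_s, queue_depths, swap_mb_per_min, disk_free_frac):
    """Return reason codes (hypothetical names) for one observation window.

    ttft_s: time-to-first-token samples in seconds
    queue_depths: queue depth samples over the same window
    swap_mb_per_min: one swap-write sample per minute
    disk_free_frac: free disk space as a fraction between 0 and 1
    """
    reasons = []
    # Fallback needs both conditions: slow median AND depth never at or below 3.
    if (ttft_s and queue_depths
            and median(ttft_s) > TTFT_MEDIAN_S
            and min(queue_depths) > QUEUE_DEPTH_MAX):
        reasons.append("FALLBACK_CLOUD_TTFT_QUEUE")
    # Longest run of consecutive minutes at or above the swap rate.
    run = longest = 0
    for rate in swap_mb_per_min:
        run = run + 1 if rate >= SWAP_MB_PER_MIN else 0
        longest = max(longest, run)
    if longest >= SWAP_MINUTES:
        reasons.append("SIZING_INCIDENT_SWAP")
    if disk_free_frac < DISK_BLOCK_PULLS:
        reasons.append("BLOCK_MODEL_PULLS_DISK")
    elif disk_free_frac < DISK_TARGET_FREE:
        reasons.append("WARN_DISK_HEADROOM")
    return reasons

if __name__ == "__main__":
    # Constructed sample window, not measurements from a real host.
    print(evaluate([1.9, 2.7, 3.1, 2.6, 2.8], [4, 5, 4, 6, 4],
                   [50, 250, 300, 220, 260, 240, 90], 0.10))
```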
Relying on reinstall theater or locking to a single hosted model forces data residency stories to fight tool stability, and teams pay with weekend rebuilds. A routed, observable, fallback-aware split across bare-metal metros lets you rehearse hybrid policies on day or week rentals before committing monthly capacity. Office laptops and home machines struggle with sleep, Wi-Fi roaming, and upstream jitter while holding both Gateway long-lived sockets and large local weights. MESHLAUNCH bare-metal Mac mini cloud rental is usually the stronger operational choice because it gives stable egress, reproducible launchd units, and room to rehearse Ollama plus OpenClaw together without betting the whole production story on one fragile notebook.
Treat silent tools as routing first. Cross-read heavy tools and memory stability and open pricing when you need a fresh host profile.
Depends on immutable delivery discipline and volume maps. Compare publish ports in Docker versus install.sh and network steps in the help center.
Separate hot-reload keys from restart-only keys before the window. Read hot reload and multi-instance alongside this checklist.