Can a 64GB M4 Pro run ds4?

Not for production: the Flash q2 asymmetric weights expect at least 96GB of unified memory. Rent a 128GB cloud Mac by the day first to validate before buying hardware.

Does cloud ds4 traffic go through a third-party API?

No. ds4-server listens on your dedicated instance; point Cursor or Claude Code at that host. Weights and KV snapshots stay on your rented disk.

Can ds4 coexist with Ollama?

Yes on the same machine, but do not load two large models at full duty cycle. Reserve 96GB+ for ds4 long-context sessions; keep smaller models on Ollama.

2026 antirez ds4 on Mac: DeepSeek V4, the 96GB Wall, and Cloud Mac Rental

If you want frontier-class open weights offline on a Mac, software is no longer the blocker; RAM is. Redis author antirez shipped ds4 (DwarfStar 4) in May 2026: pure C, Metal-first, built only for DeepSeek V4 Flash. This post is for AI engineers who hit the 96GB unified-memory floor. It covers what ds4 actually does, a quant/memory matrix, and a six-step runbook to compile, load weights, and wire ds4-server to Cursor on a high-RAM cloud Mac without buying a five-figure Studio.

What is ds4 in 2026 and why antirez went single-model

llama.cpp, Ollama, and MLX already load many GGUFs. ds4 does the opposite: one model family, end-to-end—Metal graph execution, asymmetric quants, on-disk KV snapshots, tool calling, and ds4-server with OpenAI- and Anthropic-compatible endpoints. In his write-up, antirez argues the gap was never “another runtime,” but “weights fast enough to replace daily Claude calls on personal gear.”

Momentum: github.com/antirez/ds4 crossed 10k stars within days. Developers want depth on one checkpoint, not another generic loader.

Self-contained: no llama.cpp dependency; macOS production path is Metal (CPU is debug-only; README warns macOS VM bugs can kernel-panic on CPU inference).

Agent-ready: point Cursor, opencode, or Claude Code at your instance—data stays on your disk, not a hosted API.

Long context: design targets up to ~1M tokens with compressed KV plus ds4 disk snapshots so sessions survive restarts.

Real blocker: 96GB–512GB unified memory—that is what cloud Mac rental is meant to unblock.

Metal, disk KV, and 2-bit routing quants: how ds4 differs

Community reports on M-series Max machines cite roughly 463 tok/s prefill and 34 tok/s generation for Flash—always benchmark on your own box before signing SLAs.

Capability	ds4	Generic Ollama / llama.cpp
Scope	DeepSeek V4 Flash path	hundreds of GGUF architectures
macOS GPU	Metal as primary target	multi-backend, less DS-specific tuning
KV state	RAM + disk snapshots	often lost on process exit
Quant	2-bit on routed experts only	single global quant tier
Coding agents	built-in tools + compatible APIs	extra gateway assembly

Apple Silicon unified memory (UMA) lets the CPU and GPU share one pool, which is why ds4 pairs Metal with fast NVMe for KV persistence instead of treating the Mac as an afterthought.
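To see why disk KV snapshots matter, a back-of-envelope estimate with the community throughput figures quoted in this section can help. The sketch below is plain arithmetic, not a ds4 feature: est_seconds is a hypothetical helper name, and the 463 tok/s and 34 tok/s defaults are the community-reported numbers, which may not match your hardware.

```shell
#!/bin/sh
# Rough latency estimate: prefill and generation run at different speeds.
# Defaults are the community-reported figures (assumptions, not guarantees).
PREFILL_TPS="${PREFILL_TPS:-463}"
GEN_TPS="${GEN_TPS:-34}"

# est_seconds PROMPT_TOKENS OUTPUT_TOKENS -> seconds, rounded to an integer
est_seconds() {
  awk -v p="$1" -v g="$2" -v pt="$PREFILL_TPS" -v gt="$GEN_TPS" \
    'BEGIN { printf "%.0f\n", p / pt + g / gt }'
}

# Example: a 100k-token prompt followed by a 1k-token answer.
est_seconds 100000 1000
```

Under these assumptions, prefill alone for a 100k-token session takes roughly three and a half minutes, which is the cost a restored snapshot is meant to avoid after a restart.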

Citable baseline: official docs bind production inference to Metal/CUDA; asymmetric 2/8-bit Flash weights expect 96GB or 128GB UMA—below that is outside the supported path.

How much RAM for DeepSeek V4 Flash and PRO: 2026 matrix

Model / quant	Min unified RAM	Typical hardware	Buy-side order of magnitude
V4 Flash · q2	96 GB	MacBook Pro M3/M4/M5 Max	~$4k+ USD class
V4 Flash · q4	256 GB	Mac Studio Ultra	~$8k+ USD class
V4 PRO · q2	512 GB	Mac Studio M3 Ultra maxed	~$15k+ USD class

Pilot tier (96–128GB): enough for Flash q2 plus Cursor tool-calling smoke tests—ideal for daily cloud rental.

Production coding (128–256GB): parallel agents plus long context—keep ~20% RAM headroom to avoid swap thrash.

PRO experiments (512GB): rent by the week on cloud metal instead of capitalizing a one-off purchase.
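The matrix and the ~20% headroom rule above can be turned into a quick sizing check. This is a minimal sketch: the function names and tier labels (flash-q2, flash-q4, pro-q2) are hypothetical shorthand for the rows of the matrix, not ds4 options.

```shell
#!/bin/sh
# Map unified memory (GB) to the largest tier in the matrix above.
ds4_tier() {
  if [ "$1" -ge 512 ]; then echo "pro-q2"
  elif [ "$1" -ge 256 ]; then echo "flash-q4"
  elif [ "$1" -ge 96 ]; then echo "flash-q2"
  else echo "unsupported"
  fi
}

# RAM left for weights and KV after reserving ~20% headroom (integer GB).
budget_gb() {
  echo $(( $1 * 80 / 100 ))
}

# On macOS, hw.memsize reports physical memory in bytes:
#   ram_gb=$(( $(sysctl -n hw.memsize) / 1073741824 ))
ds4_tier 128     # prints flash-q2
budget_gb 128    # prints 102
```

A 64GB machine falls through to "unsupported", which matches the FAQ answer about the M4 Pro.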

Six steps to run ds4 on a cloud Mac end to end

Pick RAM for your quant: Flash pilot → 128GB instance; q4 or PRO → 256GB / 512GB to avoid re-downloading weights mid-project.

Validate Metal: system_profiler SPDisplaysDataType; ensure Command Line Tools via xcode-select -p.

Build ds4: run git clone https://github.com/antirez/ds4.git && cd ds4 && make inside a tmux session, so a dropped SSH connection does not kill the compile.

Stage weights on local NVMe: follow the repo for official vectors/GGUF paths—hundreds of GB; never use iCloud-synced folders.

Start ds4-server: bind loopback or private IP; curl /v1/models to confirm Metal, not CPU debug backend.

Agent acceptance: tunnel or Tailscale Serve; run a tool-calling coding task; verify KV snapshots survive reconnect without full prefill.
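The Metal, toolchain, and weights-location checks from the steps above can be bundled into one preflight script. This is a sketch under stated assumptions: the function names are hypothetical, the loose match on the word "Metal" assumes system_profiler mentions it for a Metal-capable GPU (the exact wording varies by macOS version), and the iCloud test only looks for iCloud Drive's local folder.

```shell
#!/bin/sh
# has_metal: reads `system_profiler SPDisplaysDataType` output on stdin and
# succeeds if any line mentions Metal (loose match; wording is an assumption).
has_metal() {
  grep -q 'Metal'
}

# is_icloud_path: succeeds if the path sits under iCloud Drive's local
# folder (~/Library/Mobile Documents).
is_icloud_path() {
  case "$1" in
    */Library/Mobile\ Documents|*/Library/Mobile\ Documents/*) return 0 ;;
    *) return 1 ;;
  esac
}

# preflight WEIGHTS_DIR: run on the cloud Mac before building ds4.
preflight() {
  system_profiler SPDisplaysDataType | has_metal \
    || { echo "no Metal GPU reported"; return 1; }
  xcode-select -p >/dev/null 2>&1 \
    || { echo "Command Line Tools missing"; return 1; }
  if is_icloud_path "$1"; then
    echo "weights dir is iCloud-synced"; return 1
  fi
  echo "preflight ok"
}
```

Call it as preflight "$HOME/models/ds4" (an example path). It does not detect Desktop or Documents folders that are synced through iCloud, so a local, non-synced directory remains the safer choice.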

SSH port forward

ssh -N -L 8080:127.0.0.1:PORT user@your-cloud-mac.example.com
export OPENAI_BASE_URL=http://127.0.0.1:8080/v1
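Once the tunnel is up, a quick probe of the models endpoint confirms that requests reach ds4-server before you point an IDE at it. This is a minimal sketch: models_url and check_server are hypothetical helper names, and the sketch does not assume anything about what the response body contains.

```shell
#!/bin/sh
# models_url BASE -> BASE/models, tolerating one trailing slash on BASE.
models_url() {
  printf '%s/models\n' "${1%/}"
}

# check_server BASE: prints the response body; fails on HTTP errors or if
# nothing answers within 5 seconds.
check_server() {
  curl -fsS --max-time 5 "$(models_url "$1")"
}

# With the tunnel running in another terminal:
#   check_server "$OPENAI_BASE_URL"
```

A non-zero exit status usually means the tunnel is down or the remote port does not match the one ds4-server is bound to.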

Skip the five-figure Mac: rent Flash, burst to PRO when needed

Buying locks capital and depreciation; cloud bare-metal turns RAM into a dial—128GB this week for Flash plugins, 512GB next week for PRO benchmarks, then power off.

Dimension	Buy Studio Ultra	High-RAM cloud Mac
Cash upfront	five-figure purchase	hourly / daily / monthly
Elasticity	new machine = new purchase	resize 128GB ↔ 512GB
Team sharing	one laptop per person	one instance, SSH roles, shift inference
Privacy	physical control	dedicated bare metal—weights never leave your disk

Generic Linux GPU VPS paths are a poor fit: ds4’s supported macOS story is Metal. This setup pairs well with the one in our parallel agent workflow post: use a 64GB cloud Mac as the control plane and a 128GB+ box as the heavy inference worker.

For teams that need stable Metal inference without a five-figure CapEx line, MESHLAUNCH high-RAM Mac mini / M4 Pro / Max bare-metal rental is usually the pragmatic path: day-rent Flash, month-lock long-context production, and burst PRO on demand, all inside your dedicated instance, not a third-party model API. See the pricing page and help center.

FAQ

Can a 64GB M4 Pro run ds4?

Not on the supported path: Flash q2 needs 96GB UMA minimum. Day-rent 128GB first, then decide on hardware.

Does cloud ds4 traffic go through a third-party API?

No. ds4-server runs on your rented instance; point your IDE base URL there. We do not proxy model payloads.

Can ds4 coexist with Ollama?

Yes, but avoid loading two large models at full duty. Reserve 96GB+ for ds4 and keep small models on Ollama. Memory tables are in the help center.
