GPT-5.6 Sol / Terra / Luna
hard numbers and runbook (2026)

91.9% TerminalBench · three-tier pricing · gov lock · Cerebras 750 t/s

On June 26, 2026, OpenAI dropped GPT-5.6 Sol, Terra, and Luna: the biggest model family of the year and the first with celestial naming (Sun / Earth / Moon). If you run agent pipelines and want numbers instead of hype, this covers the three-tier pricing matrix, the Max and Ultra modes, a full dump of TerminalBench, CTF, and ExploitBench results, the context behind the US government lock, a head-to-head with Claude Mythos 5, the timeline for Cerebras at 750 tokens/s, a six-step runbook, and six FAQs. No fluff.
01

Release, June 26, 2026: pricing matrix and why access is locked

OpenAI shipped GPT-5.6 on June 26 with solar-system naming: Sol (flagship), Terra (balanced), Luna (lightweight). Sol knocks Claude Mythos 5 off the top of TerminalBench 2.1 with a record 91.9%. For the first time in a single lineup, all three tiers reach OpenAI's High cybersecurity threshold.

Model | Workload | Input | Output | Highlight
Sol | Agents, hard coding | $5 / 1M | $30 / 1M | TerminalBench #1: 91.9%
Terra | High-volume biz API | $2.50 / 1M | $15 / 1M | GPT-5.5 perf, −50% cost
Luna | Summary, automation | $1 / 1M | $6 / 1M | 80% cheaper input vs Sol
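To see what the pricing matrix means for a concrete bill, here is a minimal sketch that prices one workload on each tier. The per-1M-token prices come from the table above; the workload size (200M input and 40M output tokens per month) and the function name are assumptions made up for illustration.

```python
# Prices in USD per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}

def workload_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload on the given tier."""
    price_in, price_out = PRICES[tier]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Hypothetical monthly workload: 200M input tokens, 40M output tokens.
workload = (200_000_000, 40_000_000)
costs = {tier: workload_cost(tier, *workload) for tier in PRICES}

for tier, usd in costs.items():
    print(f"{tier:>5}: ${usd:,.2f}")
```

Because both the input and the output price differ by the same factor, the Sol-to-Luna ratio stays at 5× whatever the input/output mix is, and Terra always lands at half of Sol. Swap in token counts from your own logs before drawing conclusions.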

Catch: at the US government's request, only ~20 vetted orgs have access right now. Broad GA is expected within a few weeks.

01

Preview-only: regular ChatGPT users don't see GPT-5.6. The API is gated to gov-approved partners, which leaves a gap in production planning.

02

5× pricing spread: Sol input costs 5× Luna's. Terra claims GPT-5.5 parity at half the price, which you can't verify without benchmarks on your own workloads.

03

Competitor vacuum: Claude Fable 5 and Mythos 5 have been offline since June 12. Gemini 3.5 Pro slipped to July. June 2026 was supposed to be the biggest AI release month ever.

04

High cyber rating: compliance teams need deployment guardrails in place before any internal rollout.

05

Incomplete system card: SWE-Bench Pro and other dimensions are not fully published. TerminalBench alone is not a basis for a production decision.

02

Sol vs Terra vs Luna: specs, modes, and what to ship to prod when

GPT-5.6 Sol is the top of OpenAI's stack. It adds two reasoning modes that were not in the lineup before:

Max

Max Mode: Sol spends extra reasoning time before responding. It trades latency for accuracy, for cases when the answer must be right, not just fast.

Ultra

Ultra Mode: spawns multiple subagents, runs them in parallel, and merges the results. This multi-agent architecture is what produced the TerminalBench record. Reserve it for genuinely complex tasks, since token burn is significantly higher.
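The spawn, run in parallel, merge shape described for Ultra Mode can be illustrated with a generic fan-out sketch. This is not OpenAI's implementation, which the source does not describe. The `subagent` stub, the majority-vote `merge`, and the answer strings are all hypothetical stand-ins.

```python
import asyncio
from collections import Counter

async def subagent(name: str, task: str) -> str:
    """Hypothetical stand-in for one subagent; a real one would call a model."""
    await asyncio.sleep(0)  # yield control, standing in for concurrent I/O
    # Stub behavior: two of three subagents agree on the same answer.
    return "patch-B" if name == "agent-2" else "patch-A"

def merge(answers: list[str]) -> str:
    """Merge by majority vote; ties resolve to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]

async def fan_out(task: str, n: int = 3) -> str:
    # Run all subagents concurrently; gather returns results in call order.
    answers = await asyncio.gather(
        *(subagent(f"agent-{i}", task) for i in range(n))
    )
    return merge(list(answers))

result = asyncio.run(fan_out("fix failing test"))
print(result)
```

With `n` subagents each doing a full pass over the task, token spend grows roughly with `n`, which is consistent with the warning about token burn above.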

GPT-5.6 Terra is for the daily enterprise grind: customer support at scale, internal tools, doc analysis. Near-GPT-5.5 performance at 50% lower cost makes it the best bang for the buck in large deployments.

GPT-5.6 Luna handles high-frequency, low-latency tasks. It is the first non-flagship OpenAI model rated High in both cybersecurity and biology.

Dimension | Sol | Terra | Luna
Context window | ~1.5M tokens | ~1.5M tokens | ~1.5M tokens
Input / output ($ per 1M tokens) | $5 / $30 | $2.50 / $15 | $1 / $6
Cyber rating | High | High | High
Ideal workload | Agents, sec research | Enterprise API scale | Drafting, classification

Mythos 5 held TerminalBench #1 for just 17 days (from June 9) before Sol took it.

03

Benchmark dump: TerminalBench, CTF, life sciences

Coding: TerminalBench 2.1 consists of 89 complex CLI planning challenges that exercise real agent behavior.

Model | Score | Mode
GPT-5.6 Sol | 91.9% | Ultra (multi-agent)
GPT-5.6 Sol | 88.8% | Standard
Claude Mythos 5 | 88.0% | Standard
GPT-5.5 | 83.4% | Standard
Gemini 3.1 Pro Preview | 70.7% | Standard
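The table mixes an Ultra run with standard-mode runs, so it helps to separate the like-for-like gap from the headline gap. A small sketch, using only the scores listed above, that measures every entry against standard-mode Sol (the choice of baseline is an assumption for illustration):

```python
# TerminalBench 2.1 scores in percent, copied from the table above.
scores = {
    "GPT-5.6 Sol (Ultra)": 91.9,
    "GPT-5.6 Sol (Standard)": 88.8,
    "Claude Mythos 5": 88.0,
    "GPT-5.5": 83.4,
    "Gemini 3.1 Pro Preview": 70.7,
}

baseline = "GPT-5.6 Sol (Standard)"

# Gap in percentage points relative to the baseline, rounded to one decimal.
gaps = {
    name: round(score - scores[baseline], 1)
    for name, score in scores.items()
}

for name, gap in gaps.items():
    print(f"{name:<24} {gap:+.1f} pts vs {baseline}")
```

Like for like, standard-mode Sol leads Mythos 5 by 0.8 points. The 3.9-point headline gap compares Sol's Ultra run against a standard-mode Mythos 5 run.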

Long-horizon agents: Agent's Last Exam

Model | Task completion (code mode)
GPT-5.6 Sol | 50.9% (only model above 50%)
GPT-5.6 Luna | Slightly above GPT-5.5

Cybersecurity: CTF hit rates

Model | Hit rate
Sol | 96.7%
Terra | 91.84%
Luna | 85.19%

ExploitBench: Sol matches Anthropic Mythos Preview at ~⅓ of the output tokens. Red-teaming: Sol cannot autonomously engineer a complete, functional exploit chain against hardened Chromium/Firefox targets.

Life sciences: on GeneBench v1, Sol matches or exceeds GPT-5.5 with fewer tokens. HealthBench Professional: 60.5, up 8.7 points from GPT-5.5.

Safety stack: real-time misuse classifiers, account-level review for sensitive workflows, 700k A100-equivalent GPU hours of automated red-teaming, universal jailbreak testing, and a specialized large reasoning model as the final filter before user-facing output.

04

6-step runbook: how not to botch GPT-5.6 access

01

Verify access tier: is your org among the ~20 approved partners? If not, stay on GPT-5.5 + Claude Opus 4.8 and set alerts on the OpenAI status pages.

02

Match model to workload: Sol (Ultra) for complex coding agents. Terra for doc pipelines and support APIs. Luna for summarization and lightweight automation. On a tight budget, Terra works as a half-price GPT-5.5 substitute.

03

Externalize model IDs: pass gpt-5.6-sol, gpt-5.6-terra, and gpt-5.6-luna through env vars. Use LiteLLM fallback chains instead of hardcoded IDs of offline models such as claude-mythos-5.

04

Regression benchmarks: replay multi-step agent tasks on your own codebase against a GPT-5.5 baseline. Profile Ultra mode's token cost and latency, and enable it only where the overhead is justified.

05

Plan for Cerebras in July: Sol on Cerebras targets up to 750 tokens/sec vs 50–150 for most frontier models today. A response that takes 10 seconds at 50 tokens/sec would finish in under 1 second. Contact OpenAI enterprise sales early for quota.

06

Compliance review: all three tiers carry a High cyber risk rating. Review classifier policies before internal deployment. Watch the US cyber EO framework around July 2, within the 30-day review window.
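Step 03 of the runbook can be sketched without committing to any particular client library. The example below reads model IDs from env vars and walks a fallback chain in plain Python. It does not use LiteLLM's own API. The env var names, the `complete` helper, and the `fake_client` stub are hypothetical; only the three model IDs come from the runbook.

```python
import os
from typing import Callable

# Hypothetical env var names; the default IDs are the ones listed in step 03.
DEFAULTS = {
    "MODEL_PRIMARY": "gpt-5.6-sol",
    "MODEL_SECONDARY": "gpt-5.6-terra",
    "MODEL_TERTIARY": "gpt-5.6-luna",
}

def fallback_chain(env=os.environ) -> list[str]:
    """Build the ordered model list, letting env vars override the defaults."""
    return [env.get(key, default) for key, default in DEFAULTS.items()]

def complete(prompt: str, call_model: Callable[[str, str], str],
             chain: list[str]) -> tuple[str, str]:
    """Try each model in order; return (model_id, response) from the first success."""
    errors = []
    for model_id in chain:
        try:
            return model_id, call_model(model_id, prompt)
        except Exception as exc:  # in production, catch the client's specific errors
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))

# Stub client for illustration: pretends Sol is gated for this org.
def fake_client(model_id: str, prompt: str) -> str:
    if model_id == "gpt-5.6-sol":
        raise PermissionError("access not granted")
    return f"[{model_id}] ok"

used, reply = complete("ping", fake_client, fallback_chain(env={}))
print(used, reply)
```

Keeping the chain in configuration means that when a model goes offline or access opens up, the change is an env var edit, not a code deploy. Passing the client in as `call_model` also lets the same chain drive the regression replays from step 04.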

05

GPT-5.6 vs Mythos 5: numbers + government restriction precedent

Category | GPT-5.6 Sol | Claude Mythos 5
TerminalBench 2.1 | 91.9% (Ultra) | 88.0%
ExploitBench | Near-identical at ~⅓ the output tokens | Strong (restricted)
Pricing (input / output per 1M tokens) | $5 / $30 | $10 / $50 (offline)
Availability | Limited preview, GA soon | Offline (export control)
Context | ~1.5M tokens | 200K tokens

On June 2, 2026, Trump signed an EO allowing up to 30 days of pre-release government access to frontier AI models. On June 26, OpenAI limited GPT-5.6 to ~20 pre-approved trusted partners, the first time the US government has formally required an AI company to restrict a model release.

Vendor | Model | Status
OpenAI | GPT-5.6 Sol/Terra/Luna | Limited preview (~20 orgs)
Anthropic | Claude Fable 5 / Mythos 5 | Forced offline June 12
Google | Gemini 3.5 Pro | Delayed to July

Timeline: right now, ~20 partners via API + Codex. In July: ChatGPT GA (Plus/Pro first), public API, and Cerebras Sol at 750 t/s for enterprise. Polymarket puts the probability of a broad release by July 31, 2026 at 87%.

A

TerminalBench 2.1: Sol Ultra at 91.9%, dethroning Mythos 5 after 17 days at #1.

B

Cerebras speed: up to 750 t/s in July — 5× to 15× faster than today's frontier.

C

Token efficiency: ExploitBench parity at ~⅓ of the output tokens vs competitors.
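The 5× to 15× range in takeaway B follows directly from the quoted rates, and the same arithmetic gives rough response times. A minimal sketch, assuming a steady decode rate and ignoring queueing, network, and time to first token; the 500-token response size is a hypothetical example.

```python
def generation_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream a response at a steady decode rate."""
    return output_tokens / tokens_per_sec

CEREBRAS_TPS = 750      # target rate quoted for Sol on Cerebras
TODAY_TPS = (50, 150)   # range quoted for most frontier models today

# Speedup against the slow and fast ends of today's range.
speedups = [CEREBRAS_TPS / tps for tps in TODAY_TPS]

# Hypothetical 500-token response.
tokens = 500
today = [generation_seconds(tokens, tps) for tps in TODAY_TPS]
fast = generation_seconds(tokens, CEREBRAS_TPS)

print(f"speedup: {min(speedups):.0f}x to {max(speedups):.0f}x")
print(f"{tokens} tokens: {today[0]:.1f}s or {today[1]:.1f}s today, {fast:.2f}s at 750 t/s")
```

The "10 seconds down to under 1 second" figure corresponds to the 50 t/s end of the range. Against a model that already decodes at 150 t/s, the same response drops from about 3.3 seconds to about 0.67 seconds.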

Heads up: cloud APIs alone give you zero buffer against government restrictions or sudden model takedowns. Shared VPS agent hosts bring resource contention and swap jitter. Buying a Mac locally means depreciation risk and uncertain upgrade cycles.

For 24/7 prod with AI agents, Sol Ultra multi-agent workflows, and Cursor/Codex eval pipelines, MESHLAUNCH Mac Mini M4 bare-metal cloud rental is usually a better fit: dedicated Apple Silicon, elastic day/week/month billing, and native launchd agent supervision. See rental pricing, Claude Fable 5 alternatives, and the AI coding assistants comparison.

FAQ

GPT-5.6 is not yet available to regular users. Right now ~20 trusted partners have access via API + Codex. A broad ChatGPT rollout is expected in July 2026. Agent host options: pricing page.

Sol: flagship with Max/Ultra, 91.9% on TerminalBench 2.1, $5/$30 per MTok. Terra: GPT-5.5-level performance at half the cost ($2.50/$15), ideal for high-volume doc and support APIs.

After the Trump EO of June 2, the White House (OSTP/ONCD) requested that OpenAI limit access during the security review. OpenAI complied but publicly opposes making this a permanent industry practice.

Sol on Cerebras: up to 750 tokens/sec from July 2026 for select enterprise customers, ~5–15× faster than the 50–150 t/s of most frontier models.

Against Mythos 5, Sol leads TerminalBench 2.1: 91.9% vs 88.0%. ExploitBench is near-identical at ⅓ the token cost. Context is ~1.5M vs 200K. Fable 5 may still lead SWE-Bench Pro; the full GPT-5.6 system card is pending.

Sol for complex coding agents and sec research. Terra for scale. Luna for drafting and automation. Sol on Cerebras after July for latency-critical realtime. Multi-model eval setup: help center.