When did openPangu 2.0 launch? HDC 2026 timeline and core parameters
Richard Yu unveiled openPangu 2.0 at Huawei Developer Conference (HDC) 2026 in Dongguan on June 12. On June 30, openPangu-2.0-Flash model weights, base inference code, and training operators went live on GitCode. Pro weights are planned for July; pre-training code, post-training code, and additional training tooling are scheduled for the second half of 2026.
| Variant | Total params | Active params | Sparsity | Context | Status |
|---|---|---|---|---|---|
| openPangu 2.0 Pro | 505B | 18B | ~28:1 | 512K | Planned July 2026 |
| openPangu 2.0 Flash | 92B | 6B | ~15:1 | 512K | Live June 30, 2026 |
512K context means processing roughly eight full-length novels in a single pass — among the longest windows in the open-weight category.
Model architecture: Full MoE structure definition, released with Flash.
Model weights: Flash live June 30; Pro planned for July.
Technical report: Architecture and training details published alongside weights.
Inference code + training operators: Base inference stack and Ascend custom operators, live June 30.
Pre-training code: Full reproducible training pipeline, H2 2026 — rare at this MoE scale.
Post-training code (SFT/RLHF): Alignment and fine-tuning toolchain, H2 2026.
Ascend training operators: High-performance kernels for MoE training on 910B clusters, H2 2026.
Why seven components matter: Most open releases ship weights plus inference only. openPangu 2.0 adds pre-training, post-training (SFT/RLHF), and Ascend operator code — a genuine full-stack open release for frontier-scale MoE.
What is openPangu 2.0 architecture? mHC routing and Ascend NPU stack
openPangu 2.0 uses a Mixture-of-Experts (MoE) design and is the first frontier LLM trained entirely without NVIDIA hardware — every training step ran on Huawei Ascend 910B NPUs, with zero A100 or H100 involvement.
mHC (Multi-Head Combinatorial) routing: Improves expert routing efficiency and reduces MoE load imbalance.
Muon optimizer: Second-order momentum scheme from Microsoft research, improving stability at large scale.
ModAttn (Modular Attention): Modular attention blocks tuned for 512K sequences.
DSA+SWA ultra-sparse attention (Flash only): Pushes sparsity further to cut inference compute.
Embedded edge variant: Native 30B on-device model — 50% faster inference, 20% less memory, runs offline on Kirin-powered phones.
| Metric | openPangu 2.0 | Industry baseline |
|---|---|---|
| Ascend single-card throughput | 2x mainstream open models | Non-Ascend-native architectures |
| Hypernode training efficiency | +30% | Standard MoE clusters |
| 512K long-sequence training throughput | +50% | 128K-context models |
| Train/infer consistency | >99% | Common MoE pain point |
| Flash-Int8 W4A8 memory | -40% vs BF16 | Full-precision Flash |
The developer stack runs on CANN (Huawei's CUDA-class runtime) plus torch_npu (PyTorch backend adapter). Standard PyTorch code switches to Ascend with import torch_npu. Deployment paths include Huawei Cloud ModelArts (managed API), GitCode Ascend Tribe (self-host), and HarmonyOS on-device integration. HarmonyOS Agent Framework 2.0 reports >90% success on complex multi-step tasks.
import torch
import torch_npu
model = load_openpangu("./openPangu-Flash")
model = model.to("npu:0")
output = model.generate(input_ids.to("npu:0"), max_new_tokens=512, temperature=0.7)
openPangu 2.0 vs DeepSeek, Qwen, Kimi, Llama 4: how to choose
| Model | Total params | Active params | Context | Training HW | Open depth |
|---|---|---|---|---|---|
| openPangu 2.0 Pro | 505B | 18B | 512K | Ascend NPU | Full stack (7 components) |
| openPangu 2.0 Flash | 92B | 6B | 512K | Ascend NPU | Full stack (7 components) |
| DeepSeek V4 Pro | 1.6T | ~200B | 128K | NVIDIA | Weights + inference |
| Qwen 3.7 Max | ~400B+ | varies | 128K | NVIDIA | Weights + inference + partial training |
| Kimi K2.7 | 1T | 32B | 256K | NVIDIA | Weights + inference |
| Llama 4 405B | 405B | — | 128K | NVIDIA | Weights + inference |
| Capability | openPangu 2.0 Pro | DeepSeek V4 Pro | Qwen 3.7 Max | Kimi K2.7 |
|---|---|---|---|---|
| Code generation | Good | Leader | Strong | Strong |
| Complex reasoning | Good | Leader | Leader | Strong |
| Tool use / Agent | Strong | Strong | Strong | Leader |
| Ultra-long context | Leader (512K) | Moderate | Moderate | Strong |
| Inference efficiency | Leader | Moderate | Moderate | Strong |
| Sovereign / no NVIDIA | Leader | Not applicable | Not applicable | Not applicable |
| Full-stack open source | Leader | Partial | Partial | Partial |
Code / reasoning → DeepSeek V4 Pro. Agent / multi-tool workflows → Kimi K2.7. Documents beyond 256K → openPangu 2.0 Pro. Sovereign AI / no NVIDIA → openPangu 2.0. Low-cost local inference → Flash (6B active, ~96GB unified memory).
Note: Independent third-party benchmarks for openPangu 2.0 are still pending. The capability matrix above reflects architecture and published specs; we will update when standardized results appear.
How to deploy openPangu 2.0: ModelArts API and GitCode six-step runbook
Register a Huawei Cloud account: Complete identity verification at huaweicloud.com — no hardware required for API access.
Subscribe via ModelArts: Navigate to ModelArts → AI Gallery → search "openPangu 2.0" and subscribe to Flash or Pro.
Obtain API endpoint and token: Copy the inference endpoint and X-Auth-Token from the console; call in Chat Completions format.
Pull weights from GitCode (self-host): Visit gitcode.com/org/ascend-tribe and clone openPangu-2.0-Flash, openPangu-2.0-Infer, openPangu-2.0-Op, and related repos.
Run Ascend inference: On a single Ascend 910B execute python inference.py --model_path ./openPangu-Flash --device npu:0 --context_length 512000 --precision bf16. Flash-Int8 W4A8 cuts memory by 40% with under 10% accuracy loss.
Domain fine-tune with LoRA: python finetune.py --model_path ./openPangu-Pro --data_path ./domain_data --method lora --lora_rank 16. Pro multi-card distributed inference needs an 8-card Ascend cluster once July weights ship.
curl -X POST "https://modelarts.${REGION}.myhuaweicloud.com/v1/infers/openpangu-2-flash/chat/completions" \
-H "Content-Type: application/json" \
-H "X-Auth-Token: ${TOKEN}" \
-d '{
"model": "openpangu-2.0-flash",
"messages": [{"role": "user", "content": "Summarize openPangu 2.0 in three sentences"}],
"max_tokens": 1024,
"temperature": 0.7
}'
| Variant | Recommended hardware | Minimum config | Notes |
|---|---|---|---|
| Flash (6B active) | Single Ascend 910B | ~96GB unified memory | Community trials on large-memory systems |
| Flash-Int8 | Single Ascend Atlas A2 | ~48GB memory | W4A8 quantization |
| Pro (18B active) | 4+ Ascend 910B cards | Multi-card cluster | Validate after July weight release |
Why openPangu 2.0 matters: export controls, sovereign AI, and cite-ready data
Under US export controls restricting advanced AI chips (A100/H100) to China, openPangu 2.0 demonstrates that frontier-scale MoE training is achievable without NVIDIA. It anchors Huawei's sovereign AI stack: HarmonyOS 7 enters the Agent era with Framework 2.0 exceeding 90% success on complex tasks, and a 30B edge model runs locally on Kirin phones without cloud dependency.
The release ships under the Huawei openPangu License: commercial use permitted, royalty-free, non-exclusive (confirm exact terms on GitCode). For teams blocked from NVIDIA procurement or building domestic AI infrastructure, this is the most complete open alternative at frontier scale.
Open-source roadmap: 2026-06-30 Flash weights + inference + operators live; 2026-07 Pro weights planned; H2 2026 pre-training, post-training, and data tooling.
Flash sparsity efficiency: 92B total with only 6B active (~6.5% per token) — inference cost near a dense 6B model while retaining a 92B knowledge pool.
Flash-Int8 quantization: W4A8 cuts memory 40% with under 10% accuracy loss — viable on ~48GB memory configs.
Benchmark disclaimer: Some capability assessments in this article are architecture-informed estimates. Independent third-party benchmark results will be added when published. Article date: July 1, 2026.
If you are wiring Agent gateways, model routing layers, or iOS/macOS automation on a local Mac, sleep disconnects, memory ceilings, and unstable gateway processes are familiar pain points. For production environments that need 7×24 uptime running OpenClaw, Hermes, or similar Agent frameworks against openPangu APIs, MESHLAUNCH cloud Mac Mini rental is usually the better fit: dedicated Apple Silicon, elastic daily/weekly/monthly billing, and routing plus CI builds on the same always-on node.
Flash ships 92B total / 6B active and went live June 30 on GitCode — best for high-concurrency API serving. Pro targets 505B total / 18B active, planned for July, and suits ultra-long document analysis and continued pre-training. Both support 512K context.
GitCode Ascend Tribe: openPangu-2.0-Flash (weights), openPangu-2.0-Flash-Int8 (quantized), openPangu-2.0-Infer (inference), openPangu-2.0-Op (Ascend operators). Fastest trial is Huawei Cloud ModelArts API. For a stable Agent host while you integrate, see cloud Mac pricing.
Yes. openPangu 2.0 is the first frontier open model trained without NVIDIA hardware — entirely on Ascend 910B with CANN and torch_npu. It fits domestic compliance and Ascend-native deployment. For infrastructure planning, see the help center.
(1) Model architecture (2) Weights (3) Technical report (4) Inference code + training operators — live June 30 (5) Pre-training code (6) Post-training code SFT/RLHF (7) Ascend training operators. Items 5–7 are scheduled H2 2026 and are exceptionally rare at this MoE scale.