Huawei’s Supernode 384 is shaking up the AI hardware scene with 300 petaFLOPs of computing muscle, roughly 1.7 times what Nvidia’s GB200 NVL72 delivers. Despite guzzling 559 kW (Nvidia’s system needs only 145 kW), this beast leverages 384 Ascend 910C processors in an all-to-all topology that’s apparently “a generation ahead.” China’s low electricity costs make the power-hungry approach viable. The David vs. Goliath battle for AI chip supremacy just got interesting.
While Nvidia has comfortably worn the crown in AI hardware for years, Huawei’s new CloudMatrix 384 “Supernode” system is making Jensen Huang’s empire look a bit less invincible. The Chinese tech giant has assembled a behemoth featuring 384 Ascend 910C AI processors in a peer-to-peer architecture that delivers approximately 300 PetaFLOPs of dense BF16 compute performance, roughly 1.7 times that of Nvidia’s GB200 NVL72.
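The aggregate comparison is easy to check with quick arithmetic; note that the ~180 PFLOPs dense BF16 figure assumed here for the GB200 NVL72 is not stated in this article:

```python
# Aggregate dense BF16 compute: CloudMatrix 384 Supernode vs. GB200 NVL72.
# The 180 PFLOPs NVL72 figure is an assumed value for illustration.
supernode_pflops = 300.0
nvl72_pflops = 180.0  # assumption: commonly cited dense BF16 spec

ratio = supernode_pflops / nvl72_pflops
print(f"Supernode aggregate compute is {ratio:.2f}x the NVL72's")  # 1.67x
```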
Let’s be real: Huawei’s individual chips aren’t winning any races. Nvidia’s GB200 processors deliver roughly triple the BF16 performance per chip, with more on-chip memory and faster bandwidth. But Huawei’s strategy isn’t about solo performance; it’s about strength in numbers, like a less-talented basketball team that still dominates through sheer depth.
The Supernode’s secret sauce? An all-to-all topology that replaces conventional Ethernet with high-speed bus connections, improving communications bandwidth by 15 times. This architectural wizardry slashes single-hop latency from 2 microseconds to 200 nanoseconds, a tenfold reduction. Not too shabby.
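The scale of that interconnect can be sketched with simple arithmetic; treating “all-to-all” as a full logical mesh between every processor pair is my assumption, since the article doesn’t describe the physical wiring:

```python
# Link-count and latency arithmetic for the all-to-all topology.
# A full logical mesh between every processor pair is an assumption;
# the article only says "all-to-all topology" over high-speed buses.
processors = 384
logical_pairs = processors * (processors - 1) // 2  # 73,536 pairs

ethernet_latency_ns = 2_000  # 2 microseconds over conventional Ethernet
bus_latency_ns = 200         # high-speed bus connection
speedup = ethernet_latency_ns / bus_latency_ns
print(logical_pairs, speedup)  # 73536 10.0
```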
When running AI models, the numbers get interesting. Meta’s LLaMA 3 achieved 132 tokens per second per card, while models from Alibaba’s Qwen and DeepSeek families reached up to 750 tokens per second per card—delivering 2.5 times better performance than traditional cluster architectures. The system has already been deployed in provinces across China, demonstrating its real-world viability beyond theoretical capabilities.
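Scaling the per-card figures to the full system gives a sense of aggregate throughput; assuming all 384 cards sustain the quoted rates concurrently is an idealization on my part, since real serving overhead would reduce it:

```python
# Idealized system-wide throughput from the quoted per-card rates.
# Assumes every card runs at the per-card figure simultaneously.
cards = 384
llama3_tokens_per_sec = cards * 132   # 50,688 tokens/s system-wide
qwen_tokens_per_sec = cards * 750     # 288,000 tokens/s system-wide
print(llama3_tokens_per_sec, qwen_tokens_per_sec)  # 50688 288000
```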
There’s a power consumption catch, though. The Supernode gulps down 559 kW compared to Nvidia’s relatively modest 145 kW; measured per unit of compute, that makes Nvidia’s solution about 2.3 times more power-efficient. But in China, where electricity costs are dropping to around $56 per MWh in some regions, performance trumps efficiency.
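Both the efficiency gap and the electricity economics fall out of simple division; the ~180 PFLOPs dense BF16 figure assumed here for the NVL72 is not stated in the article:

```python
# Performance per watt and hourly energy cost.
# The 180 PFLOPs NVL72 figure is an assumption for illustration.
supernode_eff = 300.0 / 559.0   # ~0.54 PFLOPs per kW
nvl72_eff = 180.0 / 145.0       # ~1.24 PFLOPs per kW
print(f"Nvidia advantage: {nvl72_eff / supernode_eff:.1f}x per watt")  # 2.3x

# At $56 per MWh, running the 559 kW Supernode flat-out costs about:
hourly_cost = 0.559 * 56        # MW drawn * $/MWh
print(f"${hourly_cost:.2f}/hour")  # $31.30/hour
```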
Industry analysts suggest Huawei’s scale-up solution is “arguably a generation ahead” of current offerings from Nvidia and AMD, despite using individual chips that are technically a generation behind. The system was explicitly designed to excel with Mixture-of-Experts models, addressing the growing complexity of modern AI workloads. With the global AI market projected to reach $1.85 trillion by 2030, innovations like the Supernode represent critical developments in the competitive landscape. It’s like watching David challenge Goliath—if David had 384 slingshots and didn’t mind a hefty electric bill.