At AWS’s annual re:Invent conference this week, CEO Adam Selipsky and Nvidia founder and CEO Jensen Huang shared the stage to give an in-depth look at the companies’ expanding partnership and AWS’s overall strategy for generative AI.
Selipsky laid out AWS’s “three macro layers” approach: infrastructure at the bottom, tools and models in the middle, and applications at the top of the generative AI stack. “We think about generative AI as having actually three macro layers, if you will, of a stack, and they’re all equally important. And we are investing in all three of them,” he said.
On the infrastructure front, AWS announced it will be the first cloud provider to offer Nvidia’s new GH200 “Grace Hopper” Superchips. Huang also highlighted the H200 GPU, which together with Nvidia’s TensorRT-LLM compiler delivers up to 4x faster large language model inference. “The H200, this is a really amazing thing. The combination of the brand-new TensorRT-LLM optimizing compilers for generative AI and H200 improves the throughput of large language model inference by a factor of four, reducing the cost in just one year by a factor of four,” said Huang.
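Huang’s throughput and cost figures are two sides of the same arithmetic: at a fixed hardware price per hour, cost per token falls in inverse proportion to throughput. A quick back-of-the-envelope in Python makes the relationship concrete (the instance price and baseline throughput below are illustrative assumptions, not announced figures):

```python
# Back-of-the-envelope: cost per million tokens falls linearly as
# throughput rises. The price and baseline throughput are made-up
# illustrative numbers, not AWS or Nvidia figures.
price_per_hour = 40.0    # assumed instance price, $/hour
baseline_tps = 1_000     # assumed baseline throughput, tokens/second

for speedup in (1, 2, 4):
    tps = baseline_tps * speedup
    tokens_per_hour = tps * 3600
    cost_per_m_tokens = price_per_hour / (tokens_per_hour / 1e6)
    print(f"{speedup}x throughput -> ${cost_per_m_tokens:.2f} per 1M tokens")
```

Running this shows a 4x throughput gain cutting the cost per million tokens from about $11.11 to about $2.78, matching the “factor of four” claim.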
AWS also unveiled Graviton 4, its latest-generation Arm-based server processor, claiming 30% better compute performance and 50% more cores than Graviton 3, and announced Trainium 2, its second-generation AI training chip, promising up to 4x faster training for large language models. “We realized almost 10 years ago that if we wanted to continue to push the envelope on price performance for all of your workloads, we had to reinvent general purpose computing for the cloud era all the way down to the silicon,” Selipsky noted of the company’s custom chip strategy.
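Trainium 2’s software details weren’t demonstrated on stage. As a rough sketch of what training on AWS’s Trainium chips looks like from PyTorch today, assuming the Neuron SDK’s torch-xla integration (the model, data, and hyperparameters are toy placeholders, not anything AWS showed):

```python
# Minimal sketch: one training step on a Trainium instance via torch-xla,
# which the AWS Neuron SDK plugs into. Model and batch are toy stand-ins;
# Trainium 2-specific tuning is not shown.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # resolves to a NeuronCore on a Trainium instance

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(32, 512).to(device)       # stand-in batch
labels = torch.randint(0, 10, (32,)).to(device)

optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)
loss.backward()
xm.optimizer_step(optimizer)  # steps the optimizer and syncs the XLA graph
print(float(loss))
```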
On the tools and services layer, AWS is bringing Nvidia’s DGX Cloud, the chipmaker’s AI-training-as-a-service offering, to its platform.
And at the application level, AWS launched Amazon Q, a new generative AI assistant that connects to a company’s data sources and business systems to answer questions in natural language. “With foundation models, there are two main types of workloads: training and inference. Training is to create and improve FMs by learning patterns from large amounts of training data,” Selipsky said of the foundation models that power such services.
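Amazon Q’s own API wasn’t shown in the keynote, so as a hedged illustration of the underlying pattern (a natural-language question answered against supplied business context by a managed model), here is a sketch against Amazon Bedrock’s runtime API, one layer down the same stack; the context string, question, and region are placeholders:

```python
import json
import boto3

# Sketch of the question-answering pattern Amazon Q productizes: a natural-
# language question grounded in business context, sent to a managed model.
# This calls Amazon Bedrock's runtime API directly; Amazon Q itself is a
# managed application on top, not a raw model call.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

context = "Q3 revenue was $2.1M, up 14% quarter over quarter."  # stand-in business data
question = "How did revenue change last quarter?"

body = json.dumps({
    "prompt": f"\n\nHuman: Using this context:\n{context}\n\n{question}\n\nAssistant:",
    "max_tokens_to_sample": 300,
})

response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
print(json.loads(response["body"].read())["completion"])
```

Amazon Q layers connectors, permissions, and conversation management on top of this kind of call, which is what distinguishes an application-layer product from raw model access.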
The announcements highlight AWS’s aggressive push into generative AI across infrastructure, services, and applications, aiming to give customers a complete platform for leveraging large language models and other emerging techniques. They also continue AWS’s long-running collaboration with Nvidia, which spans more than a decade of GPU innovation in the cloud.
“It’s so early in the game, and we’re incredibly excited about what we’re gonna be able to do together,” Selipsky concluded. With massive investments in specialized hardware for both training and inference, AWS is positioning itself as a dominant player in next-generation AI.