
Elon Musk Creates Largest AI Training Supercomputer in 122 Days

TH (according to VnExpress) September 4, 2024 17:20

Colossus, the supercomputer Elon Musk calls "the world's largest for AI training", became operational after just four months of construction.

Charles Liang, CEO of Supermicro, shares a photo with Elon Musk at a data center on July 2. Photo: X/Charles Liang

"This week, xAI launched the Colossus training cluster with 100,000 H100 chips. From start to finish, everything was completed in 122 days," Elon Musk wrote on X on September 3.

Colossus is the world's most powerful AI training system, and xAI plans to double it to 200,000 GPUs, including 50,000 of the newer H200 chips, "in the next few months," according to the American billionaire. The H200 is currently Nvidia's most powerful AI chip, roughly twice as powerful as the H100.

Musk also thanked the Nvidia team, along with the partners and suppliers who helped xAI complete the work on schedule.

Musk moved quickly to launch Colossus, given the supercomputer's scale. In March, the billionaire announced plans to build a so-called "Gigafactory of Compute" to train the Grok AI. In May, he said he would personally ensure the supercomputer's development stayed on track. A month later, he settled on Memphis, Tennessee as the site, despite numerous setbacks.

According to Fortune, with around 100,000 Nvidia H100 chips, Musk's center holds more GPUs than any other known single AI computing cluster. Expanding to 200,000 chips would further solidify its position as the world's largest AI training supercomputer.

The Memphis supercomputer cluster is expected to train the third generation of Musk's chatbot, Grok-3. In July, he said on a Jordan Peterson podcast that "Grok-3 will be introduced in December and will be the most powerful AI in the world when it is released."

The first beta version of Grok-2 was released to users last month. That model was trained on a cluster of 15,000 H100 chips. According to leaderboard data released by lmsys.org on August 24, Grok-2 ranks just behind OpenAI's GPT-4o, which powers ChatGPT, and Google's Gemini 1.5 Pro, and ahead of Meta's 405-billion-parameter Llama 3.1.

According to Business Insider, the new announcement makes the AI race between Elon Musk and Meta CEO Mark Zuckerberg more interesting, at least in terms of H100 ownership. Earlier estimates from Street Capital put Musk's companies at 135,000 chips and Zuckerberg's company at 350,000.

In January, Zuckerberg said Meta would have a stockpile of 600,000 chips by the end of the year, though the company has not yet disclosed exactly how many it has purchased. In July, Meta said Llama 3 had been trained on 16,000 chips and that it was building a 24,000-chip cluster to develop more advanced models.
