China launches AI specializing in math, aiming to surpass ChatGPT and Gemini

TH (according to VTC News) • August 12, 2024 13:50

Alibaba's Qwen2-Math large language models are designed to help solve complex math problems.

Alibaba's math-solving AI tool has surpassed GPT-4o, Claude 3.5 Sonnet and Gemini. (Illustrative image: Shutterstock)

Alibaba is aiming to raise the bar in AI development by launching a set of large language models (LLMs) dedicated to mathematics called Qwen2-Math, which the e-commerce giant says can outperform GPT-4o.

"Over the past year, we have spent significant effort researching and improving the reasoning capabilities of large language models, with a particular focus on their ability to solve numerical problems," the Qwen team wrote in a recent post on the developer platform GitHub.

Qwen2-Math is built on Alibaba's Qwen2 large language models, which were released in June. The math models come in three versions (1.5 billion, 7 billion and 72 billion parameters), which differ in the number of parameters they use. Parameters are the internal values a model adjusts during training as it learns to produce correct outputs from given inputs.

According to the Qwen team's post, the model with the largest number of parameters, Qwen2-Math-72B-Instruct, outperformed leading US-developed LLMs on measures of mathematical ability. These included the proprietary OpenAI GPT-4o, Anthropic's Claude 3.5 Sonnet and Google's Gemini 1.5 Pro, as well as Meta Platforms' open-weight Llama-3.1-405B.

"We hope that Qwen2-Math can contribute to the community in solving complex mathematical problems," the development team said.

According to the post, the Qwen2-Math models were tested on both English and Chinese math benchmarks. These included GSM8K, a dataset of about 8,500 linguistically diverse grade-school math word problems; OlympiadBench, an Olympiad-level bilingual multimodal science benchmark; and problems from the gaokao, China's notoriously difficult college entrance exam.

In July, Qwen2-72B-Instruct ranked behind only GPT-4o and Claude 3.5 Sonnet in the LLM rankings from SuperClue, a platform that evaluates models on criteria such as calculation, logical reasoning, coding and text comprehension.

The gap between Chinese and US AI models appears to be narrowing, according to SuperClue, which said China made significant progress in developing domestic LLMs in the first half of this year.

A separate ranking published in July by LMSYS, an AI model research organization backed by the University of California, Berkeley, placed Qwen2-72B 20th, while proprietary models from OpenAI, Anthropic and Google took most of the top 10 spots.
