DeepSeek’s reasoning AI shows power of small models, efficiently trained

DeepSeek-R1, the AI model from Chinese startup DeepSeek, soared to the top of the charts of the most downloaded and active models on the AI open-source platform Hugging Face hours after its launch last week. It also sent shockwaves through the financial markets as it prompted investors to reconsider the valuations of chipmakers like NVIDIA and the colossal investments that American AI giants are making to scale their AI businesses.

Why all the buzz? A so-called “reasoning model,” DeepSeek-R1 is a digital assistant that, according to the company, performs as well as OpenAI’s o1 on certain AI benchmarks for math and coding tasks, was trained with far fewer chips, and is approximately 96% cheaper to use.

“DeepSeek is definitely reshaping the AI landscape, challenging giants with open-source ambition and state-of-the-art innovations,” says Kaoutar El Maghraoui, a Principal Research Scientist and Manager at IBM AI Hardware.

Meanwhile, ByteDance, the Chinese tech giant that owns TikTok, recently announced its own reasoning agent, UI-TARS, which it claims outperforms OpenAI’s GPT-4o, Anthropic’s Claude and Google’s Gemini on certain benchmarks. ByteDance’s agent can read graphical interfaces, reason and take autonomous, step-by-step action.

From startups to established giants, Chinese AI companies appear to be closing the gap with their American rivals, in large part thanks to their willingness to open source or share the underlying software code with other businesses and software developers. “DeepSeek has been able to proliferate some pretty powerful models across the community,” says Abraham Daniels, a Senior Technical Product Manager for IBM’s Granite model. DeepSeek-R1 is offered on Hugging Face under an MIT license that permits unrestricted commercial use. “DeepSeek could really accelerate AI democratization,” he says.

Last summer, Chinese company Kuaishou unveiled a video-generating tool that was like OpenAI’s Sora but available to the public out of the gates. Sora was unveiled last February but was only fully released in December and even then only those with a ChatGPT Pro subscription could access all of its features. Developers on Hugging Face have also snapped up new open-source models from the Chinese tech giants Tencent and Alibaba. While Meta has open-sourced its Llama models, both OpenAI and Google have pursued a predominantly closed-source approach to their model development.

Besides the boon of open source, DeepSeek engineers used only a fraction of the highly specialized NVIDIA chips that their American competitors rely on to train their systems. For example, according to a research paper published alongside the model’s release, DeepSeek engineers said they needed only about 2,000 GPUs (graphics processing units) to train their DeepSeek-V3 model.
