Why DeepSeek Stands Out: An Analysis in Three Charts
DeepSeek, a Chinese artificial intelligence company that was relatively unknown until recently, has risen dramatically to prominence in the tech industry. The shift came after the company launched a series of large language models whose performance rivals that of models from some of the most recognized AI developers globally.
One of its most talked-about models, R1, was released on January 20. Remarkably, the company's app climbed to the top of the Apple App Store rankings, overtaking OpenAI's long-reigning ChatGPT.
DeepSeek's rapid ascent has caused a stir in Silicon Valley, especially as the company claims it developed its models at a fraction of the cost, proving that efficient AI development might not require massive funding and resources.
R1 followed its predecessor, V3, released in late December. On January 22, the company introduced another advanced model, Janus-Pro-7B, which offers multimodal capabilities, enabling it to process images as well as text.
Unique Features of DeepSeek's Models
DeepSeek's approach sets it apart in several key areas:
Model Size and Efficiency
Despite operating with a smaller team and significantly less funding compared to major U.S. tech companies, DeepSeek boasts a large, effective model that consumes fewer resources. This efficiency is achieved through a "mixture-of-experts" system, where the primary model is broken down into numerous specialized submodels, or "experts." Each expert is activated based on the specific task at hand, allowing for optimized resource usage.
For example, although V3 contains 671 billion parameters (the settings a model adjusts during training), it activates only 37 billion of them at a time. This design means DeepSeek can keep expanding its pool of experts without slowing down the whole model.
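The routing idea described above can be sketched in a few lines. The following is a minimal, illustrative mixture-of-experts layer with toy dimensions and random weights (it is not DeepSeek's actual architecture): a learned router scores every expert, but only the top-k experts actually run for a given input, so most parameters stay idle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions for illustration; real MoE layers are vastly larger.
D, N_EXPERTS, TOP_K = 8, 16, 2

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((D, D)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS))  # learned gating weights

def moe_forward(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    scores = x @ router                    # affinity with each expert
    top = np.argsort(scores)[-TOP_K:]      # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Only TOP_K of N_EXPERTS experts execute; the rest contribute no compute.
    out = sum(w * (experts[i] @ x) for w, i in zip(weights, top))
    return out, top

x = rng.standard_normal(D)
y, active = moe_forward(x)
print(f"{len(active)} of {N_EXPERTS} experts active")  # 2 of 16 experts active
```

Because only two of sixteen expert matrices are multiplied per input, compute per token grows with the top-k budget rather than with the total parameter count, which is how a 671-billion-parameter model can run while touching only 37 billion parameters at a time.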
The company employs a technique known as inference-time compute scaling, adjusting computational resources according to the task. This allows simpler questions to require fewer resources, while more complex queries can utilize the full model's power.
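The scaling idea can be illustrated with a deliberately crude sketch. The difficulty heuristic and pass counts below are hypothetical, invented for this example; they simply show the shape of the policy, where easy queries get a minimal compute budget and harder ones get more.

```python
# Hypothetical sketch: allocate more "reasoning passes" to harder queries.
# The heuristic and the pass counts are illustrative, not DeepSeek's method.

def estimate_difficulty(prompt: str) -> float:
    """Crude proxy: longer prompts with reasoning cues score as harder."""
    cues = ("prove", "derive", "optimize", "debug", "explain why")
    score = min(len(prompt) / 200, 1.0)
    score += 0.5 * any(c in prompt.lower() for c in cues)
    return min(score, 1.0)

def passes_for(prompt: str, min_passes: int = 1, max_passes: int = 8) -> int:
    """Map estimated difficulty onto a compute budget."""
    d = estimate_difficulty(prompt)
    return min_passes + round(d * (max_passes - min_passes))

print(passes_for("What is 2 + 2?"))
print(passes_for("Prove that the sum of two even numbers is even."))
```

In a real system the difficulty signal would come from the model itself rather than a string heuristic, but the budget-allocation structure is the same: cheap queries exit early, expensive ones use the full model's power.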
Cost of Training
Beyond model size, the cost and speed of training are crucial to DeepSeek's success. While leading U.S. tech firms spend billions annually on AI, DeepSeek claims to have built V3 for less than $6 million and in under two months. U.S. export restrictions on high-end Nvidia AI chips pushed DeepSeek to innovate with the less powerful Nvidia H800.
A key advance in DeepSeek's development is its mixed-precision framework, which combines full-precision 32-bit calculations with less memory-intensive 8-bit ones. By using the less accurate but faster 8-bit computation for the bulk of operations and reserving high precision for critical steps, the company saves both time and memory.
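The trade-off can be demonstrated numerically. NumPy has no 8-bit float type, so the sketch below uses int8 quantization as a stand-in for FP8: the bulk multiplication runs in 8 bits, while accumulation and rescaling happen in full precision. The scheme shown is a generic low-precision technique for illustration, not DeepSeek's specific framework.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize_int8(x):
    """Scale a float tensor into int8; a stand-in for FP8, which NumPy lacks."""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

def mixed_precision_matmul(a, b):
    """Multiply in 8-bit, accumulate and rescale in full precision."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    # Widen before the matmul so sums of many int8 products cannot overflow.
    return (qa.astype(np.int32) @ qb.astype(np.int32)) * (sa * sb)

a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)

exact = a @ b                              # full-precision reference
approx = mixed_precision_matmul(a, b)      # low-precision bulk compute
rel_err = np.abs(approx - exact).mean() / np.abs(exact).mean()
print(f"mean relative error: {rel_err:.3%}")
```

The 8-bit operands use a quarter of the memory of 32-bit floats, and the result stays close to the full-precision answer because the error-sensitive accumulation is kept in wide arithmetic.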
In many ways, these constraints have spurred significant innovation, suggesting that AI developers could achieve more with fewer resources.
Performance Benchmarks
Despite its limited resources, DeepSeek posts impressive benchmark results, matching leading U.S. AI models. For instance, R1 competes closely with OpenAI's o1 in various independent AI quality assessments.
R1 has already surpassed several other prominent models, including Google's Gemini 2.0 Flash and Anthropic's Claude 3.5 Sonnet. A notable capability is its chain-of-thought reasoning, which breaks a complex task into smaller, manageable steps and lets the model backtrack when a line of reasoning fails, much as a human would.
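The decomposition idea behind chain-of-thought can be shown with a toy worked example. The intermediate steps below are written out by hand for clarity; a reasoning model generates analogous intermediate steps itself before committing to a final answer.

```python
# Illustrative chain-of-thought decomposition of a word problem.
# A reasoning model produces steps like these as text before answering.

question = ("A train travels 120 km in 2 hours, then 90 km in 1.5 hours. "
            "What is its average speed?")

steps = [
    ("total distance (km)", 120 + 90),   # 210
    ("total time (hours)", 2 + 1.5),     # 3.5
]
answer = steps[0][1] / steps[1][1]       # average speed = distance / time

for name, value in steps:
    print(f"{name}: {value}")
print(f"average speed: {answer} km/h")   # 60.0 km/h
```

Solving in named intermediate quantities rather than one opaque expression is what makes backtracking possible: if a step is wrong, the model can revisit that step alone instead of the whole problem.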
The earlier V3 also demonstrated competitive capabilities at launch, outperforming several notable rivals. The newest addition, Janus-Pro-7B, is likewise reported to have exceeded well-known models on multiple benchmarks.
In summary, DeepSeek is not just a rising company; it is changing perceptions about what is possible within the AI development space.