Chinese AI firm DeepSeek has unveiled DeepSeek-V3, an open-source large language model (LLM) with 671 billion parameters, surpassing Meta's Llama 3.1, which has 405 billion. Despite its size, DeepSeek emphasizes efficiency: a mixture-of-experts (MoE) architecture activates only the parameters relevant to the task at hand, optimizing performance without compromising accuracy. Notably, DeepSeek-V3 is a text-only model, lacking multimodal capabilities.
Hosted on Hugging Face, DeepSeek-V3 is designed for efficient inference and cost-effective training, which the researchers achieve through Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture. The model was pre-trained on 14.8 trillion tokens and then refined with supervised fine-tuning and reinforcement learning to ensure high-quality responses. With this architecture, the model activates only the parameters needed for each task, increasing speed and precision compared with typical dense models of similar size.
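To make the selective-activation idea concrete, the sketch below shows a generic top-k mixture-of-experts layer in PyTorch: each token is scored against all experts but only its top-k experts actually run, so only a fraction of the layer's parameters participate in any forward pass. This is an illustrative sketch of MoE routing in general, not DeepSeek's DeepSeekMoE implementation, and the dimensions and expert counts are arbitrary assumptions.

```python
# Illustrative top-k mixture-of-experts routing (not DeepSeek's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Score every expert, keep only the top-k per token.
        scores = F.softmax(self.router(x), dim=-1)             # (tokens, num_experts)
        weights, indices = scores.topk(self.k, dim=-1)         # (tokens, k)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize gate weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e          # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: 4 tokens of width 16 pass through 8 experts, only 2 active per token.
moe = TopKMoE(d_model=16, d_hidden=64)
print(moe(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```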
The model was fully trained in 2.788 million GPU hours on Nvidia H800 GPUs. It also includes a load-balancing technique, first introduced in its predecessor, to prevent performance degradation. According to DeepSeek's internal testing, DeepSeek-V3 outperforms Meta's Llama 3.1 and Alibaba's Qwen 2.5 on several benchmarks, including Big-Bench Hard (BBH), MMLU, HumanEval, and MATH. However, these results have not been independently verified.
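The article does not describe the load-balancing technique itself. As a rough illustration of how bias-based expert balancing can work, the sketch below nudges a per-expert bias after each step so that routing drifts away from overloaded experts and toward underloaded ones; the update rule and constants are assumptions for illustration, not DeepSeek's published method.

```python
# Illustrative bias-based expert load balancing (assumed constants and update rule).
import numpy as np

class BiasBalancer:
    """Keeps a per-expert bias so routing spreads tokens evenly across experts."""

    def __init__(self, num_experts: int = 8, k: int = 2, gamma: float = 0.001):
        self.k = k
        self.gamma = gamma                 # bias update step size (illustrative value)
        self.bias = np.zeros(num_experts)

    def route(self, scores: np.ndarray) -> np.ndarray:
        # scores: (tokens, num_experts) router affinities. The bias only shifts
        # which experts are selected; in a real model the gate weights would
        # still come from the unbiased scores.
        return np.argsort(scores + self.bias, axis=-1)[:, -self.k:]

    def update(self, chosen: np.ndarray) -> None:
        # Count how many tokens each expert received, then push the bias down
        # for overloaded experts and up for underloaded ones.
        load = np.bincount(chosen.ravel(), minlength=self.bias.size)
        target = chosen.size / self.bias.size
        self.bias -= self.gamma * np.sign(load - target)

balancer = BiasBalancer()
scores = np.random.rand(32, 8)             # router scores for 32 tokens
chosen = balancer.route(scores)
balancer.update(chosen)
print(balancer.bias)
```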
At 671 billion parameters, DeepSeek-V3 is one of the largest open-source LLMs. While closed models such as Gemini 1.5 Pro are reported to exceed it at around one trillion parameters, DeepSeek-V3 holds the record among open-source models. The model's code is available under an MIT license on Hugging Face for both personal and commercial use, and an API is provided for developers to integrate it into their applications.
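For developers exploring the API, a minimal call might look like the following. The endpoint URL and model name are assumptions based on DeepSeek's publicly documented OpenAI-compatible interface and should be checked against the current API reference before use.

```python
# Hedged example of calling DeepSeek-V3 through its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # key issued by DeepSeek's developer platform
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed identifier for the chat model backed by DeepSeek-V3
    messages=[{"role": "user", "content": "Summarize the mixture-of-experts idea in one sentence."}],
)
print(response.choices[0].message.content)
```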