DeepSeek-R1 is based on DeepSeek-V3, a mixture-of-experts (MoE) model recently open-sourced by DeepSeek. This base model is fine-tuned using Group Relative Policy Optimization (GRPO), a reasoning-oriented variant of reinforcement learning (RL). The research team also performed knowledge distillation from DeepSeek-R1 to open-source Qwen and Llama models and released several versions of each.
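To make the "group relative" part of GRPO more concrete, here is a minimal sketch of the advantage calculation as it is commonly described: several answers are sampled for the same prompt, each gets a scalar reward, and every reward is normalized against the mean and standard deviation of its own group. The function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the example rewards are all assumptions for illustration; this is not DeepSeek's training code and it leaves out the policy update itself.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled answer relative to the other answers in its group.

    rewards: scalar rewards for several answers sampled from the same prompt.
    eps: small constant (an assumption here) that avoids division by zero
         when every answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the exact choice is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four sampled answers, two judged correct (reward 1.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that score above their group's average get a positive advantage and are reinforced, while below-average answers get a negative one. Because the baseline comes from the group itself, this formulation does not need a separately trained value model.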