{"id":1770,"date":"2025-02-05T10:03:38","date_gmt":"2025-02-05T10:03:38","guid":{"rendered":"https:\/\/serisec.com\/index.php\/2025\/02\/05\/tinyzero-researchers-replicated-deepseeks-r1-zero-model-for-just-30\/"},"modified":"2025-02-05T10:03:38","modified_gmt":"2025-02-05T10:03:38","slug":"tinyzero-researchers-replicated-deepseeks-r1-zero-model-for-just-30","status":"publish","type":"post","link":"https:\/\/serisec.com\/index.php\/2025\/02\/05\/tinyzero-researchers-replicated-deepseeks-r1-zero-model-for-just-30\/","title":{"rendered":"TinyZero \u2013 Researchers Replicated DeepSeek\u2019s R1-Zero Model for Just $30"},"content":{"rendered":"<p>TinyZero \u2013 Researchers Replicated DeepSeek\u2019s R1-Zero Model for Just $30<\/p>\n<div>\n<p>In an impressive demonstration of cost-effective AI research, a group of researchers has successfully replicated <a href=\"https:\/\/cybersecuritynews.com\/new-jailbreak-techniques-expose-deepseek-llm-vulnerabilities\/\" target=\"_blank\" rel=\"noreferrer noopener\">DeepSeek\u2019s R1-Zero<\/a> model for just $30. <\/p>\n<p>Dubbed <strong>TinyZero<\/strong>, this project focuses on countdown and multiplication tasks, leveraging reinforcement learning (RL) to enable a 3-billion-parameter (3B) base language model (LM) to develop self-verification and search abilities autonomously.<\/p>\n<p>Built on the <a href=\"https:\/\/github.com\/volcengine\/verl\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">veRL framework<\/a>, TinyZero showcases how reinforcement learning can help large language models (LLMs) evolve reasoning capabilities independently. 
<\/p>\n<p>The researchers behind this project highlight an <strong>\u201cAha!\u201d moment<\/strong>, the point in training at which the model starts to check and revise its own answers. Users can reproduce it themselves for under $30 in compute.<\/p>\n<p>For those interested in exploring the methodology, a detailed experiment log is available on <a href=\"https:\/\/wandb.ai\/jiayipan\/TinyZero\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Weights &amp; Biases<\/a>, with further insights shared in a <a href=\"https:\/\/x.com\/jiayi_pirate\/status\/1882839370505621655\">Twitter thread<\/a>. The team has also confirmed that a formal research paper is forthcoming.<\/p>\n<p>The <a href=\"https:\/\/github.com\/Jiayi-Pan\/TinyZero?fbclid=IwY2xjawIP4OVleHRuA2FlbQIxMAABHSUbdbau5YaNkJH8CyHklh-CUOcSKd6JUSzxtRJdAlxCClH_g7SsvIZyqQ_aem_-ofKtIRDXkINyzpakIuyGA\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">research team<\/a> selected the \u201ccountdown game\u201d as their test environment. In this mathematical puzzle, the model must combine a given set of numbers with basic arithmetic operations, using each number only once, to reach a target value (for example, turning 3, 5 and 7 into 22 via 3 \u00d7 5 + 7).<\/p>\n<p>This makes it a good test of problem-solving ability, since finding a valid equation requires logical reasoning and systematic trial and error. Initially, the model produced random outputs with no clear strategy. 
<\/p>\n<p>However, through reinforcement learning, <a href=\"https:\/\/www.linkedin.com\/pulse\/how-researchers-replicated-deepseeks-r1-zero-model-30-o-connor-mis-bx2dc\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">it gradually refined its approach<\/a>, developing logical reasoning skills independently.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Running TinyZero: Installation and Setup<\/strong><\/h2>\n<p>To replicate TinyZero, users can follow a straightforward setup process:<\/p>\n<h4 class=\"wp-block-heading\"><strong>Installation Steps<\/strong><\/h4>\n<ol start=\"1\" class=\"wp-block-list\">\n<li>\n<strong>Create and Activate the Environment:<\/strong> <code>conda create -n zero python=3.9<\/code>, then <code>conda activate zero<\/code>\n<\/li>\n<li>\n<strong>Install Torch (optional):<\/strong> <code>pip install torch==2.4.0 --index-url https:\/\/download.pytorch.org\/whl\/cu121<\/code> (if this step is skipped, vLLM installs a compatible version)\n<\/li>\n<li>\n<strong>Install vLLM:<\/strong> <code>pip3 install vllm==0.6.3<\/code>\n<\/li>\n<li>\n<strong>Install veRL (from the root of the cloned TinyZero repository):<\/strong> <code>pip install -e .<\/code>\n<\/li>\n<li>\n<strong>Install FlashAttention 2:<\/strong> <code>pip3 install flash-attn --no-build-isolation<\/code>\n<\/li>\n<li>\n<strong>Install Logging and Plotting Tools:<\/strong> <code>pip install wandb IPython matplotlib<\/code>\n<\/li>\n<\/ol>\n<h2 class=\"wp-block-heading\"><strong>Countdown Task: Training TinyZero<\/strong><\/h2>\n<h4 class=\"wp-block-heading\"><strong>Data Preparation<\/strong><\/h4>\n<p>Activate the environment and preprocess the dataset:<\/p>\n<pre class=\"wp-block-code\"><code>conda activate zero\npython .\/examples\/data_preprocess\/countdown.py --local_dir {path_to_your_dataset}<\/code><\/pre>\n<h4 class=\"wp-block-heading\"><strong>Training on a Single GPU<\/strong><\/h4>\n<p>For models up to <strong>1.5B<\/strong> parameters, a single GPU is sufficient. The authors note, however, that the Qwen2.5-0.5B base model fails to learn reasoning on this task:<\/p>\n<pre class=\"wp-block-code\"><code>export N_GPUS=1\nexport BASE_MODEL={path_to_your_model}\nexport DATA_DIR={path_to_your_dataset}\nexport ROLLOUT_TP_SIZE=1\nexport EXPERIMENT_NAME=countdown-qwen2.5-0.5b\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\nbash .\/scripts\/train_tiny_zero.sh<\/code><\/pre>\n<h4 class=\"wp-block-heading\"><strong>Scaling Up: Training a 3B+ Model<\/strong><\/h4>\n<p>At 3B parameters and above, the base model is able to develop more sophisticated reasoning skills. This configuration uses two GPUs, with <code>ROLLOUT_TP_SIZE<\/code> (the tensor-parallel size used for vLLM rollouts) set to match:<\/p>\n<pre class=\"wp-block-code\"><code>export N_GPUS=2\nexport BASE_MODEL={path_to_your_model}\nexport DATA_DIR={path_to_your_dataset}\nexport ROLLOUT_TP_SIZE=2\nexport EXPERIMENT_NAME=countdown-qwen2.5-3b\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\nbash .\/scripts\/train_tiny_zero.sh<\/code><\/pre>\n<h2 class=\"wp-block-heading\"><strong>Instruct Ablation: Experimenting with Qwen2.5-3B<\/strong><\/h2>\n<p>The team also experimented with an instruction-tuned version of Qwen2.5-3B. 
This requires preprocessing the dataset again with the <code>qwen-instruct<\/code> prompt template:<\/p>\n<pre class=\"wp-block-code\"><code>conda activate zero\npython examples\/data_preprocess\/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}<\/code><\/pre>\n<p>Training then uses the same two-GPU setup, with <code>BASE_MODEL<\/code> pointing to the instruct checkpoint:<\/p>\n<pre class=\"wp-block-code\"><code>export N_GPUS=2\nexport BASE_MODEL={path_to_your_model}\nexport DATA_DIR={path_to_your_dataset}\nexport ROLLOUT_TP_SIZE=2\nexport EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\nbash .\/scripts\/train_tiny_zero.sh<\/code><\/pre>\n<p>TinyZero builds on the veRL framework and uses models from the Qwen2.5 series. The research team, comprising Jiayi Pan, Junjie Zhang, Xingyao Wang, Lifan Yuan, Hao Peng, and Alane Suhr, has released the project as open source on <a href=\"https:\/\/github.com\/Jiayi-Pan\/TinyZero\">GitHub<\/a>.<\/p>\n<p>TinyZero suggests that R1-Zero-style reasoning behavior can be reproduced and studied on a <strong>remarkably small budget<\/strong>, potentially paving the way for more affordable AI research.<\/p>\n<p>The post <a href=\"https:\/\/cybersecuritynews.com\/tinyzero\/\">TinyZero \u2013 Researchers Replicated DeepSeek\u2019s R1-Zero Model for Just $30<\/a> appeared first on <a href=\"https:\/\/cybersecuritynews.com\/\">Cyber Security News<\/a>.<\/p>\n<\/div>\n<p>Balaji N<\/p>\n","protected":false},"excerpt":{"rendered":"<p>TinyZero \u2013 Researchers Replicated DeepSeek\u2019s R1-Zero Model for Just $30 In an impressive demonstration of cost-effective AI research, a group of researchers has successfully replicated DeepSeek\u2019s R1-Zero model for just $30. 
Dubbed TinyZero, this project focuses on countdown and multiplication tasks, leveraging reinforcement learning (RL) to enable a 3-billion-parameter (3B) base language model (LM) to [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[726,129,63],"tags":[130],"class_list":["post-1770","post","type-post","status-publish","format-standard","hentry","category-cyber-ai","category-cyber-security","category-cyber-security-news","tag-cyber-security-news"],"_links":{"self":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/1770"}],"collection":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/comments?post=1770"}],"version-history":[{"count":0,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/1770\/revisions"}],"wp:attachment":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/media?parent=1770"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/categories?post=1770"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/tags?post=1770"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}