1 Harbin Institute of Technology (Shenzhen) 2 Huawei Noah's Ark Lab 3 Tsinghua University
4 The Chinese University of Hong Kong 5 Zhongguancun Academy 6 Shenzhen Loop Area Institute
†Equal Contribution *Corresponding Author
- Balanced Thinking Unlocks Smarter Reasoning. Given the question ``For what real values of $x$ is $-4 < x^{4} + 4x^{2} < 21$?'', the model first obtains the intervals $(-\sqrt{3}, 0)$ and $(0, \sqrt{3})$, and then verifies whether $x = 0$ is included. However, the model redundantly checks irrelevant values even after correctly validating $x = 0$, causing overthinking. Current mitigation methods overly suppress necessary reflection, leading to underthinking. Our ReBalance dynamically controls the reasoning state, effectively balancing these two extremes.
- Superior Performance. ReBalance outperforms previous state-of-the-art methods across multiple mathematical reasoning datasets and model scales (0.5B–32B), reducing reasoning length while simultaneously improving accuracy.
- Effects of overthinking mitigation on reasoning modes. We compare the distributions of reasoning lengths for correct and incorrect predictions before and after applying overthinking mitigation methods. The reduction in reasoning lengths for correct and incorrect predictions indicates the degree to which overthinking is mitigated and underthinking is introduced, respectively. Existing methods introduce significant underthinking, whereas our ReBalance achieves a balanced reduction of both.
- Correlation between confidence and reasoning modes. We observe that the overthinking samples exhibit higher confidence variance compared to normal samples, while underthinking samples show persistently high confidence levels.
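The two confidence signatures above can be summarized with simple statistics over per-step confidence. The sketch below is illustrative, not the released implementation; the function name and the example traces are hypothetical:

```python
import numpy as np

def confidence_stats(step_confidences):
    """Summarize a trace's per-step confidence.

    Heuristic reading of the observation above: overthinking traces
    show high confidence *variance* (swinging between certainty and
    doubt during redundant re-checks), while underthinking traces
    stay persistently *high* in mean confidence.
    """
    c = np.asarray(step_confidences, dtype=float)
    return {"mean": float(c.mean()), "var": float(c.var())}

# A trace that keeps re-verifying an already-confirmed step
# oscillates between certainty and doubt, inflating variance.
overthink = confidence_stats([0.95, 0.4, 0.97, 0.35, 0.96])

# A trace that commits too early stays uniformly confident.
underthink = confidence_stats([0.97, 0.96, 0.98, 0.97, 0.96])
```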
- One-Pass Data Collection. We first perform offline one-pass data collection on a small-scale seen dataset. At each step, the steering vector is extracted at the first token of the specified layer based on confidence, and a dynamic control function is fitted to the observed model behaviors.
- Inference with Dynamic Steering. During deployment, the dynamic function outputs steering weights online based on the model's real-time confidence, thus balancing between overthinking and underthinking.
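A minimal sketch of such a control function, assuming the confidence observations above (high variance signals overthinking, persistently high mean signals underthinking). The thresholds and the tanh shape here are illustrative stand-ins for the fitted function in the released code:

```python
import math

def steering_weight(conf_mean, conf_var,
                    mean_hi=0.9, var_hi=0.05, scale=1.0):
    """Map real-time confidence statistics to a steering weight.

    Negative weight -> suppress redundant reflection (overthinking);
    positive weight -> encourage reflection (underthinking);
    zero -> leave a normal reasoning trace untouched.
    All thresholds are hypothetical; the paper fits this function
    from observed model behaviors.
    """
    if conf_var > var_hi:       # oscillating confidence: overthinking
        return -scale * math.tanh(conf_var / var_hi - 1.0)
    if conf_mean > mean_hi:     # persistently high confidence: underthinking
        return scale * math.tanh(conf_mean / mean_hi - 1.0)
    return 0.0
```

At inference time the weight would scale the steering vector added to the chosen layer's hidden state at each decoding step.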
The dynamic control function, fitted to model behavior, is visualized as a 3D surface above. As confidence signals evolve, the control function adaptively adjusts the steering weight, which in turn shifts the model between overthinking mitigation and underthinking prevention.
We warmly welcome you to try our interactive demo, where you can adjust different confidence signals and directly observe how the control function's steering behavior changes and ultimately affects the model's reasoning state.
- [2026.03.19] We release an interactive demo to intuitively showcase how our dynamic control function adjusts steering weights based on real-time model reasoning states. Try it out!
- [2026.03.12] We release the code and steering vectors for DeepSeek-R1-Distill-Qwen (1.5B, 7B), QwQ-32B, and openPangu-Embedded-7B-V1.1. Happy coding!
- [2026.01.26] Our paper has been accepted at ICLR 2026 🎖️.
- Initialize Project.
- Release the interactive demo.
- Release the code and steering vectors for Qwen3-14B.
To facilitate quick deployment and reproducibility, we have released our pre-extracted steering vectors on Hugging Face 🤗.
Step 1. Download vectors from Hugging Face
Option 1: clone the full vector repository
git lfs install
git clone https://huggingface.co/Yulin-Li/ReBalance
Option 2: download only vectors/ with huggingface_hub
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="Yulin-Li/ReBalance",
    repo_type="model",
    allow_patterns="vectors/*",
    local_dir="."
)
Then place the downloaded vectors/ folder under your local project root as:
ReBalance/
├── ...
├── transformer_inference_steer_dp.py
└── vectors/
├── DeepSeek-R1-Distill-Qwen-1.5B/
│ └── steer_vector_layer19_conf_mixed.pt
├── DeepSeek-R1-Distill-Qwen-7B/
│ └── steer_vector_layer22_conf_mixed.pt
└── QwQ-32B/
└── steer_vector_layer58_conf_mixed.pt
Step 2. Inference with dynamic steering
python transformer_inference_steer_dp.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-1.5B' \
--dataset_dir "./Data/" \
--output_path "./outputs" \
--dataset "Math_AIME2024" \
--max_generated_tokens 16000 \
--num_gpus 8 \
--steer_vector_path ./vectors/DeepSeek-R1-Distill-Qwen-1.5B/steer_vector_layer19_conf_mixed.pt \
--steer_layer 19 \
    --steer_coef -1
Step 3. Merge multi-GPU shards
python merge_shards.py \
--dir ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_AIME2024 \
    --base 'steer_temp0.7_maxlen16000'
Step 4. Evaluate merged outputs
python check.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-1.5B' \
--data_name "Math_AIME2024" \
    --generation_path "./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_AIME2024/steer_temp0.7_maxlen16000.merged.jsonl"
To better understand the underlying mechanisms of ReBalance or apply it to a broader range of models, you can conveniently obtain a lightweight steering vector (e.g., only 22 KB for QwQ-32B) in a single pass over a small-scale seen dataset.
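Conceptually, the end product of this pipeline is a single direction in hidden-state space. A minimal numpy sketch, assuming a common difference-of-means construction over hidden states from the chosen layer (the released scripts may differ in details such as calibration):

```python
import numpy as np

def extract_steering_vector(over_states, normal_states):
    """Build a unit-norm steering direction from collected hidden states.

    over_states, normal_states: (n_samples, d_model) arrays of
    layer hidden states from overthinking and normal reasoning
    traces, respectively. Illustrative sketch only.
    """
    v = over_states.mean(axis=0) - normal_states.mean(axis=0)
    return v / np.linalg.norm(v)  # normalize to a pure direction
```

Because the result is a single d_model-sized vector, it stays tiny on disk regardless of model scale, which is why even a QwQ-32B vector fits in a few kilobytes.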
Step 1. Extract hidden states and model confidence signals
python transformer_inference_dp.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-7B' \
--dataset_dir "./Data" \
--dataset "Math_Train" \
--output_path "./outputs" \
--max_generated_tokens 16000 \
--num_gpus 8 \
    --trust_remote_code
Step 2. Automated best-layer selection for confidence modeling
python hidden_config_ridge.py \
--jsonl_path ./outputs/DeepSeek-R1-Distill-Qwen-7B/Math_Train/origin_temp0.7_maxlen16000.merged.jsonl \
--hidden_dir ./outputs/DeepSeek-R1-Distill-Qwen-7B/Math_Train/ \
--layers all \
--max_files 500 \
--expected_offset 1 \
--alpha 1.0 \
--pca_components 64 \
--test_size 0.2 \
    --random_state 42
Step 3. Extract steering vectors with automatic calibration
python hidden_analysis_auto.py \
--layer_id 19 \
--jsonl_path ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_Train/origin_temp0.7_maxlen16000.merged.jsonl \
--hidden_dir ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_Train \
--save_path ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/steer_vector_layer19_conf_mixed.pt \
--max_files 500 \
    --expected_offset 1
Step 4. Dynamic steering with your extracted vectors
python transformer_inference_steer_dp.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-1.5B' \
--dataset_dir "./Data/" \
--output_path "./outputs" \
--dataset "Math_AIME2024" \
--max_generated_tokens 16000 \
--num_gpus 8 \
--steer_vector_path ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/steer_vector_layer19_conf_mixed.pt \
--steer_layer 19 \
    --steer_coef -1
Step 5. Merge multi-GPU shards
python merge_shards.py \
--dir ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_AIME2024 \
    --base 'steer_temp0.7_maxlen16000'
Step 6. Evaluate merged outputs
python check.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-1.5B' \
--data_name "Math_AIME2024" \
    --generation_path "./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_AIME2024/steer_temp0.7_maxlen16000.merged.jsonl"
Our work builds upon the codebase of SEAL, DeepSeek-R1-Distill-Qwen, Qwen3, QwQ, and openPangu. We sincerely thank the authors for their remarkable contributions.
If you find ReBalance useful in your research, please cite our paper:
@inproceedings{li2026efficient,
  title={Efficient Reasoning with Balanced Thinking},
  author={Li, Yulin and Tu, Tengyao and Ding, Li and Wang, Junjie and Zhen, Huiling and Chen, Yixin and Li, Yong and Tian, Zhuotao},
  booktitle={Proceedings of the 14th International Conference on Learning Representations},
  year={2026}
}



