1 Harbin Institute of Technology (Shenzhen) 2 Huawei Noah's Ark Lab 3 Tsinghua University
4 The Chinese University of Hong Kong 5 Zhongguancun Academy 6 Shenzhen Loop Area Institute
†Equal Contribution *Corresponding Author
- Balanced Thinking Unlocks Smarter Reasoning. Given the question ``For what real values of $x$ is $-4 < x^{4} + 4x^{2} < 21$?'', the model first obtains the intervals $(-\sqrt{3}, 0)$ and $(0, \sqrt{3})$, and then verifies whether $x = 0$ is included. However, the model redundantly checks irrelevant values even after correctly validating $x = 0$, causing overthinking. Current mitigation methods overly suppress necessary reflection, leading to underthinking. Our ReBalance dynamically controls the reasoning state, effectively balancing these two extremes.
- Superior Performance. ReBalance outperforms previous state-of-the-art methods across multiple mathematical reasoning datasets and model scales (0.5B–32B), reducing reasoning length while simultaneously improving accuracy.
- Effects of overthinking mitigation on reasoning modes. We compare the distributions of reasoning lengths for correct and incorrect predictions before and after applying overthinking mitigation methods. The reduction in reasoning lengths for correct and incorrect predictions indicates the degree to which overthinking is mitigated and underthinking is introduced, respectively. Existing methods introduce significant underthinking, whereas our ReBalance achieves a balanced reduction of both.
- Correlation between confidence and reasoning modes. We observe that the overthinking samples exhibit higher confidence variance compared to normal samples, while underthinking samples show persistently high confidence levels.
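The two confidence signatures above can be summarized with simple statistics over per-step confidence. The sketch below is illustrative, not the released implementation; the function name and the example traces are hypothetical:

```python
import numpy as np

def confidence_stats(step_confidences):
    """Summarize a trace's per-step confidence.

    Heuristic reading of the observation above: overthinking traces
    show high confidence *variance* (swinging between certainty and
    doubt during redundant re-checks), while underthinking traces
    stay persistently *high* in mean confidence.
    """
    c = np.asarray(step_confidences, dtype=float)
    return {"mean": float(c.mean()), "var": float(c.var())}

# A trace that keeps re-verifying an already-confirmed step
# oscillates between certainty and doubt, inflating variance.
overthink = confidence_stats([0.95, 0.4, 0.97, 0.35, 0.96])

# A trace that commits too early stays uniformly confident.
underthink = confidence_stats([0.97, 0.96, 0.98, 0.97, 0.96])
```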
- One-Pass Data Collection. We first perform offline one-pass data collection on a small-scale seen dataset. At each step, the steering vector is extracted at the first token of the specified layer based on confidence, and a dynamic control function is fitted to the observed model behaviors.
- Inference with Dynamic Steering. During deployment, the dynamic function outputs steering weights online based on the model's real-time confidence, thus balancing between overthinking and underthinking.
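A minimal sketch of such a control function, assuming the confidence observations above (high variance signals overthinking, persistently high mean signals underthinking). The thresholds and the tanh shape here are illustrative stand-ins for the fitted function in the released code:

```python
import math

def steering_weight(conf_mean, conf_var,
                    mean_hi=0.9, var_hi=0.05, scale=1.0):
    """Map real-time confidence statistics to a steering weight.

    Negative weight -> suppress redundant reflection (overthinking);
    positive weight -> encourage reflection (underthinking);
    zero -> leave a normal reasoning trace untouched.
    All thresholds are hypothetical; the paper fits this function
    from observed model behaviors.
    """
    if conf_var > var_hi:       # oscillating confidence: overthinking
        return -scale * math.tanh(conf_var / var_hi - 1.0)
    if conf_mean > mean_hi:     # persistently high confidence: underthinking
        return scale * math.tanh(conf_mean / mean_hi - 1.0)
    return 0.0
```

At inference time the weight would scale the steering vector added to the chosen layer's hidden state at each decoding step.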
The dynamic control function, fitted to model behavior, is visualized as a 3D surface above. As confidence signals evolve, the control function adaptively adjusts the steering weight, which in turn shifts the model between overthinking mitigation and underthinking prevention.
We warmly welcome you to try our interactive demo, where you can adjust different confidence signals and directly observe how the control function's steering behavior changes and ultimately affects the model's reasoning state.
- [2026.03.19] We release an interactive demo to intuitively showcase how our dynamic control function adjusts steering weights based on real-time model reasoning states. Try it out!
- [2026.03.12] We release the code and steering vectors for DeepSeek-R1-Distill-Qwen (1.5B, 7B), QwQ-32B, and openPangu-Embedded-7B-V1.1. Happy coding!
- [2026.01.26] Our paper has been accepted at ICLR 2026 🎖️.
- Initialize Project.
- Release the interactive demo.
- Release the code and steering vectors for Qwen3-14B.
To facilitate quick deployment and reproducibility, we have released our pre-extracted steering vectors on Hugging Face 🤗.
Step 1. Download vectors from Hugging Face
Option 1: clone the full vector repository
git lfs install
git clone https://huggingface.co/Yulin-Li/ReBalance
Option 2: download only vectors/ with huggingface_hub
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="Yulin-Li/ReBalance",
    repo_type="model",
    allow_patterns="vectors/*",
    local_dir="."
)
Then place the downloaded vectors/ folder under your local project root as:
ReBalance/
├── ...
├── transformer_inference_steer_dp.py
└── vectors/
├── DeepSeek-R1-Distill-Qwen-1.5B/
│ └── steer_vector_layer19_conf_mixed.pt
├── DeepSeek-R1-Distill-Qwen-7B/
│ └── steer_vector_layer22_conf_mixed.pt
└── QwQ-32B/
└── steer_vector_layer58_conf_mixed.pt
Step 2. Inference with dynamic steering
python transformer_inference_steer_dp.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-1.5B' \
--dataset_dir "./Data/" \
--output_path "./outputs" \
--dataset "Math_AIME2024" \
--max_generated_tokens 16000 \
--num_gpus 8 \
--steer_vector_path ./vectors/DeepSeek-R1-Distill-Qwen-1.5B/steer_vector_layer19_conf_mixed.pt \
--steer_layer 19 \
    --steer_coef -1
Step 3. Merge multi-GPU shards
python merge_shards.py \
--dir ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_AIME2024 \
    --base 'steer_temp0.7_maxlen16000'
Step 4. Evaluate merged outputs
python check.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-1.5B' \
--data_name "Math_AIME2024" \
    --generation_path "./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_AIME2024/steer_temp0.7_maxlen16000.merged.jsonl"
To better understand the underlying mechanisms of ReBalance or apply it to a broader range of models, you can conveniently obtain a lightweight steering vector (e.g., only 22 KB for QwQ-32B) in a single pass over a small-scale seen dataset.
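Conceptually, the end product of this pipeline is a single direction in hidden-state space. A minimal numpy sketch, assuming a common difference-of-means construction over hidden states from the chosen layer (the released scripts may differ in details such as calibration):

```python
import numpy as np

def extract_steering_vector(over_states, normal_states):
    """Build a unit-norm steering direction from collected hidden states.

    over_states, normal_states: (n_samples, d_model) arrays of
    layer hidden states from overthinking and normal reasoning
    traces, respectively. Illustrative sketch only.
    """
    v = over_states.mean(axis=0) - normal_states.mean(axis=0)
    return v / np.linalg.norm(v)  # normalize to a pure direction
```

Because the result is a single d_model-sized vector, it stays tiny on disk regardless of model scale, which is why even a QwQ-32B vector fits in a few kilobytes.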
Step 1. Extract hidden states and model confidence signals
python transformer_inference_dp.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-7B' \
--dataset_dir "./Data" \
--dataset "Math_Train" \
--output_path "./outputs" \
--max_generated_tokens 16000 \
--num_gpus 8 \
    --trust_remote_code
Step 2. Automated best-layer selection for confidence modeling
python hidden_config_ridge.py \
--jsonl_path ./outputs/DeepSeek-R1-Distill-Qwen-7B/Math_Train/origin_temp0.7_maxlen16000.merged.jsonl \
--hidden_dir ./outputs/DeepSeek-R1-Distill-Qwen-7B/Math_Train/ \
--layers all \
--max_files 500 \
--expected_offset 1 \
--alpha 1.0 \
--pca_components 64 \
--test_size 0.2 \
    --random_state 42
Step 3. Extract steering vectors with automatic calibration
python hidden_analysis_auto.py \
--layer_id 19 \
--jsonl_path ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_Train/origin_temp0.7_maxlen16000.merged.jsonl \
--hidden_dir ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_Train \
--save_path ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/steer_vector_layer19_conf_mixed.pt \
--max_files 500 \
    --expected_offset 1
Step 4. Dynamic steering with your extracted vectors
python transformer_inference_steer_dp.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-1.5B' \
--dataset_dir "./Data/" \
--output_path "./outputs" \
--dataset "Math_AIME2024" \
--max_generated_tokens 16000 \
--num_gpus 8 \
--steer_vector_path ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/steer_vector_layer19_conf_mixed.pt \
--steer_layer 19 \
    --steer_coef -1
Step 5. Merge multi-GPU shards
python merge_shards.py \
--dir ./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_AIME2024 \
    --base 'steer_temp0.7_maxlen16000'
Step 6. Evaluate merged outputs
python check.py \
--model_name_or_path 'DeepSeek-R1-Distill-Qwen-1.5B' \
--data_name "Math_AIME2024" \
    --generation_path "./outputs/DeepSeek-R1-Distill-Qwen-1.5B/Math_AIME2024/steer_temp0.7_maxlen16000.merged.jsonl"
Our work builds upon the codebase of SEAL, DeepSeek-R1-Distill-Qwen, Qwen3, QwQ, and openPangu. We sincerely thank the authors for their remarkable contributions.
If you find ReBalance useful in your research, please cite our paper:
@inproceedings{li2026efficient,
  title={Efficient Reasoning with Balanced Thinking},
  author={Li, Yulin and Tu, Tengyao and Ding, Li and Wang, Junjie and Zhen, Huiling and Chen, Yixin and Li, Yong and Tian, Zhuotao},
  booktitle={Proceedings of the 14th International Conference on Learning Representations},
  year={2026}
}



