5 changes: 4 additions & 1 deletion README.md
@@ -23,7 +23,7 @@ A collection of LLM with RL related papers for instruction following, reasoning,

[8] Laskin M, Wang L, Oh J, et al. **In-context reinforcement learning with algorithm distillation**[J]. arXiv preprint arXiv:2210.14215, 2022.[[link]](https://arxiv.org/pdf/2210.14215)


[9] Cheng P, Hu T, Xu H, et al. **Self-playing Adversarial Language Game Enhances LLM Reasoning**[J]. arXiv preprint arXiv:2404.10642, 2024. [[link]](https://arxiv.org/abs/2404.10642)

## RLHF(RL with Human Feedback)

@@ -45,6 +45,9 @@ A collection of LLM with RL related papers for instruction following, reasoning,

[9] Zhu B, Jiao J, Jordan M I. **Principled Reinforcement Learning with Human Feedback from Pairwise or $ K $-wise Comparisons**[J]. arXiv preprint arXiv:2301.11270, 2023.[[link]](https://arxiv.org/pdf/2301.11270)

[10] Cheng P, Yang Y, Li J, et al. **Adversarial Preference Optimization**[J]. arXiv preprint arXiv:2311.08045, 2023.[[link]](https://arxiv.org/abs/2311.08045)


### Prompt-based but RL related

[1] Madaan A, Tandon N, Gupta P, et al. **Self-refine: Iterative refinement with self-feedback**[J]. arXiv preprint arXiv:2303.17651, 2023.[[link]](https://arxiv.org/pdf/2303.17651)