Detailing requirements for new and improved MCTS #1591
Replies: 9 comments
-
Here's where the U term is currently implemented: lines 397 to 401 in 1eb0d56. Here's where the passed argument is computed: lines 1028 to 1029 in 1eb0d56.
-
This should probably be relaxed to either fewer hyperparameters, OR an equal number of hyperparameters together with a decent strength gain. It might also be the case that pst can be removed (set to 1) with the new formulas, since changing pst amounts to changing the sharpness of the policies. In the new formula, the behavior you could get from pst is approximately the behavior you would get from changing c_1 and c_3 (policies becoming more or less important in the search).
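To make the "pst changes the sharpness of policies" point concrete, here is a minimal sketch. It assumes pst acts as a softmax temperature on the policy, i.e. each prior is raised to the power 1/pst and the result is renormalized; the function name and the example priors are made up for illustration and are not taken from the codebase.

```python
def apply_policy_temperature(priors, pst):
    """Re-sharpen a policy distribution (assumed form: p_i ** (1 / pst), renormalized).

    pst > 1 flattens the policy, pst < 1 sharpens it, pst == 1 leaves it unchanged.
    """
    scaled = [p ** (1.0 / pst) for p in priors]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical priors for three moves.
priors = [0.6, 0.3, 0.1]
flat = apply_policy_temperature(priors, 2.0)   # top move loses probability mass
sharp = apply_policy_temperature(priors, 0.5)  # top move gains probability mass
same = apply_policy_temperature(priors, 1.0)   # identity
```

With pst fixed at 1 the priors pass through untouched, so any remaining control over how much the policy matters would have to come from the exploration constants, which is the trade-off described above.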
-
The formulas in the first post are supposed to minimize simple and cumulative regret. Can we list the corresponding scientific papers?
-
Good idea. Will do.
-
https://link.springer.com/content/pdf/10.1007/s10472-011-9258-6.pdf is the paper given as the source of the current PUCT formula in the original AlphaGo paper.
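For readers who don't have the formula in their head, here is a rough sketch of the PUCT exploration term in the form popularized by AlphaGo Zero, U = c_puct * P * sqrt(N_parent) / (1 + N_child). This is an assumption about the general shape only: the linked paper's variant and the code at 1eb0d56 may differ in details, and the function names and the `c_puct` default below are illustrative.

```python
import math


def puct_u(prior, parent_visits, child_visits, c_puct=1.5):
    """Exploration bonus U for one child (AlphaGo Zero-style form, assumed)."""
    return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def select_child(q_values, priors, child_visits, c_puct=1.5):
    """Return the index of the child maximizing Q + U."""
    parent_visits = sum(child_visits)
    scores = [
        q + puct_u(p, parent_visits, n, c_puct)
        for q, p, n in zip(q_values, priors, child_visits)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Two children with equal priors: the unvisited one gets the larger bonus
# and is selected even though its Q estimate is lower.
best = select_child(q_values=[0.1, 0.0], priors=[0.5, 0.5], child_visits=[10, 0])
```

The bonus shrinks as a child accumulates visits and grows with the parent's visit count, which is what drives the search back toward under-explored moves.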
-
The original PUCT formula slightly resembles the one posted above, with a few deviations:
-
As you can see, I tried to collect all the issues and PRs so far that discuss similar things. Some of them show that the people who have posted here haven't been happy with the PUCT formula for quite a while :) @oscardssmith what do you suggest on how to proceed here?
-
Current best effort
At root
In tree