Detailing requirements for new and improved MCTS #1591

oscardssmith · 2020-04-22T23:28:52Z

General Goals

Policy should only influence behavior for nodes with few visits
Monotonic
Fewer hyper-parameters
FPU (first-play urgency) feels like a hack. At 0 visits the only information we actually have is the policy prior P, so the term that deals with policy shouldn't need a special case at n=0.
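To make the FPU complaint concrete, here is a minimal sketch of a PUCT-style score with an explicit n=0 branch. It is not lc0's code: `ChildStats`, `QWithFpu`, `Score`, and the `fpu` argument are hypothetical names chosen for illustration, and the exploration term simply follows the usual cpuct * P * sqrt(N_parent) / (1 + n) shape.

```cpp
#include <cmath>
#include <cstdint>

// Illustrative child statistics; the field names are assumptions, not lc0's.
struct ChildStats {
  float p;          // Policy prior P for this move.
  float value_sum;  // Sum of backed-up values.
  std::uint32_t n;  // Visit count.
};

// Value estimate with an explicit first-play-urgency fallback.
// `fpu` is a hypothetical stand-in value that is used only while n == 0.
float QWithFpu(const ChildStats& c, float fpu) {
  return c.n == 0 ? fpu : c.value_sum / static_cast<float>(c.n);
}

// PUCT-style score: Q (or the FPU stand-in) plus an exploration term driven by P.
float Score(const ChildStats& c, std::uint32_t parent_n, float cpuct,
            float fpu) {
  const float u = cpuct * c.p * std::sqrt(static_cast<float>(parent_n)) /
                  (1.0f + static_cast<float>(c.n));
  return QWithFpu(c, fpu) + u;
}
```

The branch in `QWithFpu` is the special case in question: the exploration term is already well defined at n=0, and only the value term needs a made-up number.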

At Root

Simple regret minimization

Current best effort
At root

In tree

AlexisOlson · 2020-04-22T23:35:23Z

Here's where the U term is currently implemented:

lc0/src/mcts/node.h

Lines 397 to 401 in 1eb0d56

    // Returns U = numerator * p / N.
    // Passed numerator is expected to be equal to (cpuct * sqrt(N[parent])).
    float GetU(float numerator) const {
      return numerator * GetP() / (1 + GetNStarted());
    }

Here's where the passed argument is computed:

lc0/src/mcts/search.cc

Lines 1028 to 1029 in 1eb0d56

    const float puct_mult =
        cpuct * std::sqrt(std::max(node->GetChildrenVisits(), 1u));
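Putting the two quoted snippets together, the U term can be sketched as a pair of free functions. This is a standalone illustration, not the lc0 API: `PuctMult` and `U` are hypothetical names, and the node accessors (`GetP()`, `GetNStarted()`, `GetChildrenVisits()`) are replaced by plain arguments.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Mirrors the quoted search.cc expression:
//   cpuct * sqrt(max(children_visits, 1)).
float PuctMult(float cpuct, std::uint32_t children_visits) {
  const std::uint32_t n = std::max<std::uint32_t>(children_visits, 1);
  return cpuct * std::sqrt(static_cast<float>(n));
}

// Mirrors the quoted GetU(): numerator * p / (1 + n_started).
float U(float numerator, float p, std::uint32_t n_started) {
  return numerator * p / (1.0f + static_cast<float>(n_started));
}
```

For example, with cpuct = 2, 16 child visits, p = 0.25, and 3 started visits, `U(PuctMult(2.0f, 16), 0.25f, 3)` evaluates to 8 * 0.25 / 4 = 0.5. Note that the denominator is 1 + N rather than N, so U stays finite at zero visits, and the `max(..., 1)` keeps U from collapsing to zero before any child has been visited.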

farmersrice · 2020-04-23T01:32:34Z

> Fewer hyper-parameters

This should probably be relaxed to: fewer hyperparameters, OR the same number of hyperparameters with a decent strength gain.

It might also be possible to remove pst (set it to 1) with the new formulas, since changing pst just changes the sharpness of the policies. In the new formula, the behavior you could get from pst is approximately the behavior you would get from changing c_1 and c_3 (policies becoming more or less important in the search).
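As a rough illustration of "changing pst = changing the sharpness of policies", here is a minimal sketch of a softmax with a temperature divisor. It assumes pst acts as a temperature applied to the raw policy logits before normalization; `SoftmaxWithTemperature` is a hypothetical helper, not an lc0 function.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over raw policy logits with temperature `pst` (assumed > 0).
// pst == 1 is the plain softmax; pst > 1 flattens the distribution,
// pst < 1 sharpens it.
std::vector<float> SoftmaxWithTemperature(const std::vector<float>& logits,
                                          float pst) {
  std::vector<float> out(logits.size());
  if (logits.empty()) return out;
  const float max_logit = *std::max_element(logits.begin(), logits.end());
  float total = 0.0f;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    // Subtract the max before exponentiating for numerical stability.
    out[i] = std::exp((logits[i] - max_logit) / pst);
    total += out[i];
  }
  for (float& p : out) p /= total;
  return out;
}
```

With logits {2, 1, 0}, raising pst from 1 to 2 lowers the top move's probability and raises the others, so the priors have less influence on which children get explored first.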

kiudee (Collaborator) · 2020-04-23T14:51:34Z

The formulas in the first post are supposed to minimize simple and cumulative regret. Can we list the corresponding scientific papers?

oscardssmith (Author) · 2020-04-23T15:10:28Z

Good idea. Will do.

Naphthalin (Collaborator) · 2020-04-23T15:19:11Z

https://link.springer.com/content/pdf/10.1007/s10472-011-9258-6.pdf is the paper given as the source of the current PUCT formula in the original AlphaGo paper.

kiudee (Collaborator) · 2020-04-23T15:54:13Z

The original PUCT formula slightly resembles the one posted above, with a few deviations:

The variable M_i here is the policy of move i.

isty2e · 2020-04-23T16:24:11Z

Here is a slightly modified version of the formula. The aims are to have a few tunable parameters, to remove the if-statements, and to make the m term monotonic.

c_att is added for numerical stability when p ≈ 0, but it can be removed. This formula can probably be improved further for low-visit cases.

Naphthalin (Collaborator) · 2020-04-28T13:56:57Z

Other PRs and issues concerning the PUCT formula and policies:

#796 #818 #791 #913 #918 #746 #743 #699 #1235 #1173 #1150

Naphthalin (Collaborator) · 2020-05-01T13:35:18Z

As you can see, I tried to collect all the issues and PRs so far that discuss similar things. Some of them show that the people who have posted here haven't been happy with the PUCT formula for quite a while :)

@oscardssmith how do you suggest we proceed here?
