SPSA improvements [RFC]

Issue opened to collect info about possible future SPSA improvements.

### SPSA references
SPSA is a fairly simple algorithm for local (not global) optimization.
The wiki now has a short description of the [SPSA implementation in fishtest](https://github.com/glinscott/fishtest/wiki/Creating-my-first-test#tuning-with-spsa).
Other documentation:
* [SPSA seminal paper](https://www.jhuapl.edu/SPSA/PDF-SPSA/Spall_Implementation_of_the_Simultaneous.PDF) cited by @zamar in the forum
* [SPSA website](https://www.jhuapl.edu/SPSA/)
* [wikipedia SPSA page](https://en.wikipedia.org/wiki/Simultaneous_perturbation_stochastic_approximation)
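
For orientation, the textbook recursion from Spall's paper can be sketched as follows. This is the minimization form, with $y$ a noisy measurement of the loss and $\Delta_k$ a random vector of $\pm 1$ entries; the notation is the paper's and is not necessarily what fishtest uses internally:

```latex
\hat{\theta}_{k+1} = \hat{\theta}_k - a_k \, \hat{g}_k(\hat{\theta}_k), \qquad
\hat{g}_{k,i}(\hat{\theta}_k) =
  \frac{y(\hat{\theta}_k + c_k \Delta_k) - y(\hat{\theta}_k - c_k \Delta_k)}
       {2 \, c_k \, \Delta_{k,i}}, \qquad
a_k = \frac{a}{(A + k + 1)^{\alpha}}, \quad
c_k = \frac{c}{(k + 1)^{\gamma}}
```

Spall suggests $\alpha = 0.602$ and $\gamma = 0.101$ as practical values.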

### SPSA implementation problems/improvements
- we ask for "c_k_end" and "r_k_end" (the final parameter values), but IMO we should ask for "c" and "r" (the starting values): if those are too big, the SPSA diverges
- we use "r_k = a_k / c_k^2" instead of "r_k = a_k" (I searched the SPSA papers for a reference, without success)
- we set "c_k_end" and "r_k_end" separately for every variable to be optimized (the original SPSA uses global values): this makes sense to account for the different sensitivities of the variables, but IMO it should be handled by an internal normalization of the variable values based on the starting values and the bounds
- one iteration should be a match of 2 games, but our worker code cannot support this, so one iteration is a match of 2*N_cores games
- compute an averaged SP gradient per iteration to lower the noise
- we have experimental code (special rounding and clipping) that nobody uses: I'm afraid it's theoretically correct but not very useful given the rough way we use SPSA
- the "A" parameter should be computed from the number of games
- the worker passes rounded values to cutechess-cli: we should normalize the variable values so that all variables have the same resolution
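
To make some of the points above concrete, here is a minimal Python sketch, not the fishtest code. It assumes Spall-style gain sequences with `k` counted from 1 (`c_k = c / k**GAMMA`, `a_k = a / (A + k)**ALPHA`) and the `r_k = a_k / c_k^2` definition quoted above. The function names and the `match_result` callback are hypothetical. It shows how "c_k_end"/"r_k_end" map back to starting values, and what an iteration with an averaged SP gradient could look like.

```python
import random

ALPHA = 0.602  # exponents suggested by Spall
GAMMA = 0.101


def start_gains(c_end, r_end, num_iter, A):
    """Back out the starting c and a from the end-of-run values."""
    c = c_end * num_iter ** GAMMA
    a_end = r_end * c_end ** 2          # from r_k = a_k / c_k^2
    a = a_end * (A + num_iter) ** ALPHA
    return c, a


def gains(c, a, A, k):
    """Return (c_k, a_k, r_k) at iteration k (k starts at 1)."""
    c_k = c / k ** GAMMA
    a_k = a / (A + k) ** ALPHA
    return c_k, a_k, a_k / c_k ** 2


def spsa_step(theta, c_k, r_k, match_result, n_avg=1, rng=random):
    """One iteration, averaging the SP gradient over n_avg perturbations.

    theta, c_k and r_k are per-variable lists. match_result(plus, minus)
    is a hypothetical callback returning (wins - losses) of `plus`
    against `minus`, so the step moves towards the winning side.
    """
    grad = [0.0] * len(theta)
    for _ in range(n_avg):
        flips = [rng.choice((-1, 1)) for _ in theta]
        plus = [t + c * f for t, c, f in zip(theta, c_k, flips)]
        minus = [t - c * f for t, c, f in zip(theta, c_k, flips)]
        result = match_result(plus, minus)
        for i, f in enumerate(flips):
            grad[i] += result * f / n_avg
    # r_k * c_k equals a_k / c_k, i.e. the usual a_k * (difference / c_k) step
    return [t + r * c * g for t, r, c, g in zip(theta, r_k, c_k, grad)]
```

For example, `start_gains(c_end=4.0, r_end=0.002, num_iter=5000, A=500)` gives the starting `c` and `a` for a 5000-iteration run; `A` is set here to 10% of the iterations, following Spall's guideline. Because `r_k * c_k = a_k / c_k`, the update in `spsa_step` matches the textbook step up to a constant factor, which may be the reason for the `r_k = a_k / c_k^2` definition.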

### SPSA testing process (aka Time Control)
-------
EDIT_000
This paragraph is outdated; I kept it to avoid disrupting the chain of posts:
- read the wiki for an SPSA description <https://github.com/glinscott/fishtest/wiki/Creating-my-first-test#tuning-with-spsa>
- the experience of the last few years has shown that a very short time control does not work on fishtest:
  - with NNUE, workers running on dual CPU lose games on time at ultra short time control (USTC)
  - SPSA at LTC, or even ULTC, has a high signal-to-noise ratio that helps convergence. A ULTC match is very drawish, so in SPSA one side wins a pair of games only if the random parameter increments are somehow aligned with the gradient direction
-------

I suggest this process to make the best use of developer time and framework CPU:
- first steps: run some SPSAs at ultra short TC (e.g. 1+0.01) to find good "c_k_end" and "r_k_end" values and good starting values for the variables. This can be done either locally on a recent CPU or in fishtest.
- last step: run a final SPSA in fishtest to optimize the variables at a longer TC (e.g. STC, 20+0.2, LTC etc.)

I took an SPSA from fishtest and ran it locally, changing only the TC; the results are similar:

* 20+0.2 - original test on fishtest:
https://tests.stockfishchess.org/tests/view/5e2dade6ab2d69d58394fb5e

![20+02](https://user-images.githubusercontent.com/15718418/73590094-b480f480-44de-11ea-9cb4-8a053216980b.jpg)

* 2+0.02:

![2+002](https://user-images.githubusercontent.com/15718418/73673775-8acaf780-46af-11ea-82f6-9509a1aa5bd7.jpg)


* 1+0.01:

![1+001](https://user-images.githubusercontent.com/15718418/73673788-8ef71500-46af-11ea-936d-7c0376f04282.jpg)


* 0.5+0.01:

![05+001](https://user-images.githubusercontent.com/15718418/73673806-94545f80-46af-11ea-9110-30c1d7b6abe6.jpg)

