Curve-fitting comparison between the Adam and L-BFGS optimizers
I'm studying NN tools for theoretical chemistry simulations, especially potential energy surface (PES) fitting.
At first, I chose TensorFlow for the NN simulation. I successfully constructed a diabatic PES in TensorFlow 2.4 with the Adam optimizer, and the result has been published in J. Chem. Phys. 155, 214102 (2021). However, some reviews say that second-order optimizers, like Levenberg-Marquardt, can converge better and more efficiently. My previous simulation usually took almost 10^7 epochs over a week to converge, whereas the reviews report convergence at the 10^3-epoch level with the L-M optimizer. So I want to test the performance of a second-order optimizer on this regression problem.
- Fabio Di Marco has compared Levenberg-Marquardt and Adam with TensorFlow; the target function is the sinc function.
- Soham Pal has compared L-BFGS and Adam with PyTorch on a linear regression problem.
- A NN-PES review has compared some optimizers, but it lacks details, and MATLAB has a higher learning cost (from my point of view).
Since TensorFlow does not have an official second-order optimizer, I will use the PyTorch L-BFGS optimizer in this test.
You can find information about the L-BFGS algorithm on many websites, so I will not discuss it here. However, when you use L-BFGS in PyTorch, you need to define a 'closure' function for the gradient evaluation. I'm not very familiar with optimization algorithms and simply follow the code written by Soham Pal. The 'train' function is:
import torch
from torch.autograd import Variable

device = "cpu"  # the full script in optimizer_test/ sets device itself; CPU is assumed here

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    lm_lbfgs = model.to(device)
    # special closure function required by L-BFGS: it re-evaluates the model and returns the loss
    for batch, (X, y) in enumerate(dataloader):
        # Variable is deprecated in modern PyTorch; plain tensors work the same way
        x_ = Variable(X, requires_grad=True)
        y_ = Variable(y)

        def closure():
            # Zero gradients
            optimizer.zero_grad()
            # Forward pass
            y_pred = lm_lbfgs(x_)
            # Compute loss
            loss = loss_fn(y_pred, y_)
            # Backward pass
            loss.backward()
            return loss

        # step() may evaluate the closure several times (e.g. during the line search)
        optimizer.step(closure)
        loss = closure()
        if batch % train_size == 0:  # train_size (number of training points) is defined in the full script
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
    return loss

Also, I use the strong_wolfe line-search option; otherwise the loss becomes very large (I don't know the reason).
optimizer_lbfgs = torch.optim.LBFGS(model.parameters(), lr=1,
                                    history_size=100, max_iter=20,
                                    line_search_fn="strong_wolfe"
                                    )

The code for this test can be found in optimizer_test/.
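For comparison, the Adam runs use the standard PyTorch training loop: Adam takes exactly one gradient step per step() call, so no closure is needed. A minimal sketch of the counterpart I assume here (the learning rate is illustrative, not necessarily the value used in optimizer_test/):

optimizer_adam = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is illustrative

def train_adam(dataloader, model, loss_fn, optimizer):
    model.train()
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        optimizer.zero_grad()          # zero gradients
        loss = loss_fn(model(X), y)    # forward pass and loss
        loss.backward()                # backward pass
        optimizer.step()               # one first-order update, no closure
    return loss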
I compared two NN structures (a minimal PyTorch sketch follows the list):
- One hidden layer with 20 neurons and a linear output. I denote this t20, where "t" means the tanh activation function.
- Two hidden layers with 20 neurons each and a linear output. I denote this t20-t20.
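Here is how the two architectures can be built in PyTorch (a sketch; the layer widths follow the t20 / t20-t20 naming, and the variable names are mine, not necessarily those in optimizer_test/):

import torch.nn as nn

# t20: one hidden layer of 20 tanh neurons, linear output
model_t20 = nn.Sequential(
    nn.Linear(1, 20), nn.Tanh(),
    nn.Linear(20, 1),
)

# t20-t20: two hidden layers of 20 tanh neurons each, linear output
model_t20_t20 = nn.Sequential(
    nn.Linear(1, 20), nn.Tanh(),
    nn.Linear(20, 20), nn.Tanh(),
    nn.Linear(20, 1),
)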
I use 20000 points sampled from the sinc function: x ∈ [-1, 1], y = sinc(x) = sin(x)/x for x ≠ 0 and sinc(0) = 1.
80% of the data was randomly chosen for training.
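A sketch of how such a dataset can be generated and split (uniform sampling on the interval and full-batch loading are my assumptions about the details; note that torch.sinc uses the normalized sin(pi*x)/(pi*x) convention, so I write sin(x)/x explicitly):

import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

n_points = 20000
x = torch.rand(n_points, 1) * 2.0 - 1.0                  # uniform samples in [-1, 1]
y = torch.where(x == 0, torch.ones_like(x),              # sinc(0) = 1
                torch.sin(x) / x)                        # unnormalized sinc: sin(x)/x

dataset = TensorDataset(x, y)
n_train = int(0.8 * n_points)                            # 80% for training
train_set, test_set = random_split(dataset, [n_train, n_points - n_train])
train_loader = DataLoader(train_set, batch_size=n_train)     # full-batch training is my assumption
test_loader = DataLoader(test_set, batch_size=len(test_set))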
It is not surprising that Adam t20 performs worst, and Adam t20-t20 seems to have the same performance as L-BFGS. However, if we zoom into the boundary region:
The green line (Adam t20-t20) deviates a lot from the target.
The loss-decay curve on a logarithmic scale illustrates the fitting error better.
Adam t20-t20 is still several orders of magnitude worse than L-BFGS t20.
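A sketch of how the loss curves can be recorded and plotted on a logarithmic scale (this reuses the names from the sketches above; the epoch count, loss function, and learning rates are illustrative, not the exact settings in optimizer_test/):

import matplotlib.pyplot as plt

loss_fn = torch.nn.MSELoss()
train_size = n_train                      # used by train() for its print interval
opt_adam = torch.optim.Adam(model_t20_t20.parameters(), lr=1e-3)
opt_lbfgs = torch.optim.LBFGS(model_t20.parameters(), lr=1, history_size=100,
                              max_iter=20, line_search_fn="strong_wolfe")

loss_history_adam, loss_history_lbfgs = [], []
for epoch in range(2000):                 # epoch count is illustrative
    loss_history_adam.append(float(train_adam(train_loader, model_t20_t20, loss_fn, opt_adam)))
    loss_history_lbfgs.append(float(train(train_loader, model_t20, loss_fn, opt_lbfgs)))

plt.semilogy(loss_history_adam, label="Adam t20-t20")
plt.semilogy(loss_history_lbfgs, label="L-BFGS t20")
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.legend()
plt.show()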
The memory usage is almost the same because the networks are relatively small. However, second-order optimizers commonly need more memory to store gradient history.
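As a rough, back-of-the-envelope estimate (mine, not measured): the L-BFGS two-loop recursion keeps about 2 × history_size difference vectors the size of the parameter vector, which for these tiny networks is still well under a megabyte:

n_params = sum(p.numel() for p in model_t20_t20.parameters())   # 1 -> 20 -> 20 -> 1 gives 481 parameters
history_size = 100
extra_bytes = 2 * history_size * n_params * 4                   # float32, 4 bytes per entry
print(f"{n_params} parameters, ~{extra_bytes / 1024:.0f} KiB of L-BFGS history")  # ~376 KiB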
Please try a second-order optimizer in regression problems if possible, especially for small networks.


