Batch Learning Algorithm Convergence
Published in Richard M. Golden, Statistical Machine Learning, 2020
Intuitively, the L-BFGS algorithm works by computing a particular weighted sum of the current gradient descent search direction, the previous search direction, and the previous gradient descent search direction. The underlying principle of L-BFGS is similar to a momentum-type search direction as discussed in Example 7.1.2, yet the computational requirements are similar to those of a gradient descent algorithm. After M iterations, L-BFGS takes a gradient descent step, and then over the next M iterations weighted sums of the current gradient descent direction with previous descent directions are computed again. It can be shown that after each inner cycle of M iterations has been completed, the resulting search direction of the L-BFGS algorithm is a close approximation to the search direction of a Newton-Raphson algorithm. The number of inner iterations M is always chosen to be less than or equal to the dimension of the system state x(t).
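The following is a minimal sketch of this cyclic scheme, assuming a generic `gradient` function, a fixed step size, and a memory of M pairs; none of these names come from the text, and a practical implementation would add a line search.

```python
# Illustrative sketch of an L-BFGS-style inner cycle with memory M.
# The two-loop recursion combines the current gradient with stored
# (s, y) pairs from previous iterations to approximate a Newton direction.
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Standard two-loop recursion over the stored curvature pairs."""
    q = grad.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        alphas.append(alpha)
    if s_list:
        # Initial Hessian approximation: a multiple of the identity.
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)
    for (s, y), alpha in zip(zip(s_list, y_list), reversed(alphas)):
        rho = 1.0 / (y @ s)
        beta = rho * (y @ q)
        q += (alpha - beta) * s
    return -q  # descent direction

def lbfgs_cycles(x, gradient, M=10, step=1e-2, n_cycles=5):
    """Every M iterations the history is cleared, so the first step of each
    inner cycle is a plain gradient descent step, as described above."""
    for _ in range(n_cycles):
        s_list, y_list = [], []          # restart the inner cycle
        for _ in range(M):
            g = gradient(x)
            d = lbfgs_direction(g, s_list, y_list)
            x_new = x + step * d
            s_list.append(x_new - x)
            y_list.append(gradient(x_new) - g)
            x = x_new
    return x
```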
Investigations on Stabilization and Compression of Medical Videos
Published in J. Dinesh Peter, Steven Lawrence Fernandes, Carlos Eduardo Thomaz, Advances in Computerized Analysis in Clinical and Medical Imaging, 2019
D. Raveena Judie Dolly, D. J. Jagannath, R. Anup Raveen Jaison
Clinical videos captured from a jittery platform require stabilization. Differential motion estimation can be adopted if handheld cameras are used. After the entire video is converted to frames, a filtering algorithm can be applied to correct the motion, as suggested in ref. [1]. Much research has evolved in the areas of video compression and video stabilization. Video compression involving adaptive frame determination (AFD) yielded better subjective and objective results, as reported in ref. [2]. After adaptive frame determination, the frames are subjected to affine translation in order to reduce memory buffering, after which the whole system undergoes an affine transformation, as suggested in ref. [3]. The motion information is stored as parameters in a matrix, which occupies very little file space. These parameters are fed to an optimizer, as indicated in ref. [4], to obtain optimized results. Both the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method and limited-memory BFGS (L-BFGS) belong to the family of quasi-Newton methods. L-BFGS is preferred in many applications to reduce computer memory utilization.
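As a hedged sketch of this idea (not the pipeline of refs. [1]–[4]), the affine motion between two frames can be stored as a small parameter vector and refined with SciPy's memory-efficient L-BFGS-B optimizer; the frame variables and the squared-error cost below are assumptions for illustration.

```python
# Estimate 2-D affine motion parameters between two frames with L-BFGS-B.
import numpy as np
from scipy.ndimage import affine_transform
from scipy.optimize import minimize

def alignment_error(params, moving, reference):
    """Sum of squared intensity differences after warping `moving` by the
    2x2 affine matrix and 2-vector offset packed into `params` (6 numbers)."""
    matrix = params[:4].reshape(2, 2)
    offset = params[4:]
    warped = affine_transform(moving, matrix, offset=offset, order=1)
    return np.sum((warped - reference) ** 2)

# Synthetic frames stand in for real clinical frames here.
reference = np.random.rand(64, 64)
moving = np.roll(reference, shift=2, axis=1)      # simulated jitter
x0 = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])     # start from the identity
result = minimize(alignment_error, x0, args=(moving, reference),
                  method="L-BFGS-B")              # limited-memory quasi-Newton
stabilizing_params = result.x                     # compact motion description
```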
Nations performance evaluation during SARS-CoV-2 outbreak handling via data envelopment analysis and machine learning methods
Published in International Journal of Systems Science: Operations & Logistics, 2023
Ali Taherinezhad, Alireza Alinezhad
Limited-memory BFGS (L-BFGS) algorithm. Starting from initial random weights, the MLP minimises the loss function by repeatedly updating these weights. After computing the loss, a backward pass propagates it from the output layer to the previous layers, providing each weight parameter with an update value meant to decrease the loss. The MLP can be trained using Stochastic Gradient Descent (SGD), Adam (Kingma & Ba, 2014), or the L-BFGS algorithm. In this paper, the L-BFGS algorithm is used to optimise the weights. L-BFGS is an optimisation algorithm in the family of quasi-Newton methods that approximates the BFGS algorithm using a limited amount of computer memory. It is a popular algorithm for parameter estimation in ML. The algorithm's target problem is to minimise f(x) over unconstrained values of the real vector x, where f is a differentiable scalar function. See Liu and Nocedal (1989) for a full review of the algorithm's details.
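A minimal sketch of this setup, using scikit-learn's MLP with the L-BFGS solver; the layer sizes and the toy data are assumptions, not the study's configuration.

```python
# Train an MLP whose weights are optimised by L-BFGS.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# solver="lbfgs" selects the limited-memory quasi-Newton optimiser, which
# repeatedly updates the weights to minimise the loss computed by the
# forward and backward passes.
mlp = MLPClassifier(hidden_layer_sizes=(20,), solver="lbfgs",
                    max_iter=500, random_state=0)
mlp.fit(X, y)
print(mlp.score(X, y))
```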
Revisiting kernel logistic regression under the random utility models perspective. An interpretable machine-learning approach
Published in Transportation Letters, 2021
José Ángel Martín-Baos, Ricardo García-Ródenas, Luis Rodriguez-Benitez
One disadvantage that has been highlighted in the literature is the high computational cost of KLR (Ouyed and Allili 2018; Zhu and Hastie 2005), which has led to the development of so-called sparse KLR. In this work, Newton's method and the BFGS algorithm, the canonical methods for solving the MLE problem, have been tested. However, by using the L-BFGS-B optimization algorithm (Byrd et al. 1995), limited memory usage and a lower computational time have been achieved compared to BFGS or Newton's method. More concretely, the computational time is reduced by a factor ranging from 8 to 15. For the sake of simplicity, these results are not reported in the paper, and all the numerical results have been calculated by applying the L-BFGS-B algorithm. This makes the practical application of KLR to the field of transport feasible, since the data are usually collected through surveys, whose size is moderate. It can also be noticed that the computation time for KLR is considerably reduced when uncertainty increases, for example in the case .
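As an illustrative sketch of this kind of MLE problem (not the authors' code), a binary kernel logistic regression can be fitted by minimising a penalised negative log-likelihood with L-BFGS-B; the RBF kernel, the toy data, and the regularisation strength are assumptions.

```python
# Fit kernel logistic regression coefficients with L-BFGS-B.
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # labels in {0, 1}
K = rbf_kernel(X, X)                          # n x n kernel matrix
lam = 1e-2                                    # ridge penalty strength

def neg_log_likelihood(alpha):
    f = K @ alpha                             # latent scores
    nll = np.sum(np.logaddexp(0.0, f) - y * f)   # stable log(1 + exp(f)) - y*f
    return nll + 0.5 * lam * alpha @ K @ alpha

def gradient(alpha):
    p = 1.0 / (1.0 + np.exp(-(K @ alpha)))    # predicted probabilities
    return K @ (p - y) + lam * K @ alpha

result = minimize(neg_log_likelihood, np.zeros(len(y)),
                  jac=gradient, method="L-BFGS-B")
alpha_hat = result.x                          # fitted kernel coefficients
```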
Trust-region algorithms for training responses: machine learning methods using indefinite Hessian approximations
Published in Optimization Methods and Software, 2020
Jennifer B. Erway, Joshua Griffin, Roummel F. Marcia, Riadh Omheni
L-BFGS has a number of disadvantages for solving problems in machine learning, especially in deep learning, where the network is composed of multiple cascading layers. First, it cannot be used in an on-line learning environment without significant modifications that limit its scalability to arbitrarily large data sets. (This has given rise to recent research into stochastic L-BFGS variations, which have thus far been unable to maintain the robustness of classical L-BFGS in a stochastic mini-batch environment [7,12,18,19,31,48,58].) A third disadvantage of L-BFGS occurs if one tries to enforce positive definiteness of the L-BFGS matrices in a nonconvex setting. In this case, L-BFGS has the difficult task of approximating an indefinite matrix (the true Hessian) with a positive-definite matrix, which can result in the generation of nearly-singular matrices. Numerically, this creates the need for heuristics such as periodically reinitializing the approximation to a multiple of the identity, effectively generating a steepest-descent direction in the next iteration. This can be a significant disadvantage for neural network problems, where model quality is highly correlated with the quality of the initial steps [43].
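A hedged sketch of the heuristic mentioned above: in a nonconvex setting the curvature pairs may violate the condition needed to keep the implicit L-BFGS matrix positive definite, and a common workaround is to discard the stored pairs and fall back to a multiple of the identity, i.e. a steepest-descent step. The threshold and memory size below are assumptions.

```python
# Maintain an L-BFGS memory while enforcing the curvature condition y's > 0.
import numpy as np

def update_memory(s, y, s_list, y_list, memory=10, curvature_tol=1e-10):
    """Accept the new (s, y) pair only if it keeps the implicit L-BFGS matrix
    positive definite; otherwise reinitialise the memory (identity scaling)."""
    if y @ s > curvature_tol * np.linalg.norm(s) * np.linalg.norm(y):
        s_list.append(s)
        y_list.append(y)
        if len(s_list) > memory:          # keep only the most recent pairs
            s_list.pop(0)
            y_list.pop(0)
    else:
        # Curvature condition violated: the implicit matrix would become
        # indefinite or nearly singular, so drop the history. The next
        # search direction then reduces to (scaled) steepest descent.
        s_list.clear()
        y_list.clear()
    return s_list, y_list
```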