Mathematics - Optimization
Branches of Optimization Research #
Convex Optimization #
Convex optimization focuses on problems where the objective function and constraints are convex, ensuring a single global optimum. This field is foundational in machine learning, signal processing, and control systems due to its guaranteed convergence and efficient algorithms.
- Convex Optimization by Boyd and Vandenberghe - PDF
- Convex Optimization Theory by Dimitri P. Bertsekas - PDF
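For quick reference, the standard form used in Boyd and Vandenberghe is the following (restated here as a sketch, with $f_0, \dots, f_m$ convex):

```latex
% Standard-form convex optimization problem: f_0, ..., f_m are convex functions.
\begin{aligned}
\text{minimize}\quad   & f_0(x) \\
\text{subject to}\quad & f_i(x) \le 0, \quad i = 1, \dots, m, \\
                       & a_j^{\top} x = b_j, \quad j = 1, \dots, p.
\end{aligned}
```

Because both the objective and the feasible set are convex, any local minimizer is a global minimizer, which is what underlies the convergence guarantees mentioned above.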
Discrete, Combinatorial, and Integer Optimization #
This branch deals with optimization problems involving discrete variables, such as integers or combinatorial structures, often encountered in scheduling, network design, and logistics. The Bayesian optimization titles below, while not limited to discrete problems, are grouped here because the approach is particularly useful for optimizing expensive black-box functions; a minimal sketch follows the reading list.
- Bayesian Optimization In Action by Quan Nguyen - Amazon
- Experimentation for Engineers by David Sweet - Amazon
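Below is the minimal Bayesian optimization sketch referred to above. It is not taken from either book: it assumes scikit-learn and SciPy are available, fits a Gaussian-process surrogate, and picks the next evaluation by expected improvement over a random candidate pool.

```python
# Minimal Bayesian optimization sketch (assumes scikit-learn and SciPy).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Expensive black-box function (toy 1-D stand-in for illustration).
    return np.sin(3 * x) + 0.5 * x ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(4, 1))          # initial design
y = objective(X).ravel()

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)

    cand = rng.uniform(-2, 2, size=(256, 1))  # random candidate pool
    mu, sigma = gp.predict(cand, return_std=True)

    # Expected improvement for minimization.
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    x_next = cand[np.argmax(ei)].reshape(1, 1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("best x:", X[np.argmin(y)].item(), "best f:", y.min())
```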
Operations Research #
Operations research applies mathematical modeling and optimization to complex decision-making in logistics, supply chain, and resource allocation. It integrates techniques like linear programming, simulation, and heuristic methods to optimize real-world systems.
- Operations Research An Introduction by Hamdy A. Taha - Pearson
- Introduction to Operations Research by Frederick Hillier and Gerald Lieberman - McGraw Hill
- Julia Programming for Operations Research by Changhyun Kwon - PDF - code
- Mathematical Programming and Operations Research: Modeling, Algorithms, and Complexity. Examples in Python and Julia. Edited by Robert Hildebrand - PDF
- A First Course in Linear Optimization by Jon Lee - PDF
- Decomposition Techniques in Mathematical Programming by Conejo, Castillo, Mínguez, and García-Bertrand - Springer
- Algorithms for Optimization by Mykel J. Kochenderfer and Tim A. Wheeler - PDF
- Model Building in Mathematical Programming - Introductory modeling book by H. Paul Williams - Wiley
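As a small illustration of the linear programming workflow mentioned in the overview, here is a hedged sketch of a two-variable toy model solved with SciPy's `linprog` (SciPy is an assumption of mine, not a dependency of any of the listed books):

```python
# Toy linear program solved with SciPy's HiGHS backend (illustrative only).
from scipy.optimize import linprog

# maximize 3x + 5y  <=>  minimize -3x - 5y
c = [-3.0, -5.0]

# Resource constraints: x + 2y <= 14, 3x - y >= 0 (written as -3x + y <= 0), x - y <= 2
A_ub = [[1.0, 2.0],
        [-3.0, 1.0],
        [1.0, -1.0]]
b_ub = [14.0, 0.0, 2.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)], method="highs")
print("status:", res.message)
print("x, y =", res.x, "objective =", -res.fun)
```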
Meta-heuristics #
Meta-heuristics are high-level strategies for solving complex optimization problems where exact methods are computationally infeasible. They include nature-inspired algorithms like genetic algorithms and simulated annealing, widely used in engineering and data science.
- Metaheuristics by Patrick Siarry - Springer (open access)
- Essentials of Metaheuristics by Sean Luke - link
- Handbook of Metaheuristics by Michel Gendreau and Jean-Yves Potvin - Springer (open access)
- An Introduction to Metaheuristics for Optimization by Bastien Chopard and Marco Tomassini - Springer (open access)
- Metaheuristic and Evolutionary Computation: Algorithms and Applications by Hasmat Malik, Atif Iqbal, Puneet Joshi, Sanjay Agrawal, and Farhad Ilahi Bakhsh - Springer (open access)
- Clever Algorithms: Nature-Inspired Programming Recipes by Jason Brownlee - GitHub
- Metaheuristics: from design to implementation by El-Ghazali Talbi - Wiley
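The following is a bare-bones simulated annealing loop, one of the nature-inspired methods named above; the cooling schedule, neighborhood, and objective are illustrative assumptions rather than recipes from the listed books.

```python
# Minimal simulated annealing sketch for a 1-D continuous objective.
import math
import random

def objective(x):
    # Multimodal toy objective with several local minima.
    return x ** 2 + 10 * math.sin(3 * x)

def simulated_annealing(x0, t0=10.0, cooling=0.95, steps=2000):
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        cand = x + random.gauss(0.0, 0.5)      # propose a random neighbor
        f_cand = objective(cand)
        # Accept improvements always; accept worse moves with Boltzmann probability.
        if f_cand < fx or random.random() < math.exp(-(f_cand - fx) / t):
            x, fx = cand, f_cand
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling                            # geometric cooling schedule
    return best_x, best_f

random.seed(0)
print(simulated_annealing(x0=5.0))
```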
Dynamic Programming and Reinforcement Learning #
Dynamic programming and reinforcement learning address sequential decision-making problems, breaking them into subproblems or learning optimal policies through interaction with environments. These methods are critical in robotics, finance, and AI.
- Various titles on Dynamic Programming, Optimal Control, and Reinforcement Learning by Dimitri Bertsekas - List
- Reinforcement Learning: An Introduction (2nd Edition) by Richard Sutton and Andrew Barto - PDF
- Decision Making Under Uncertainty: Theory and Application by Mykel J. Kochenderfer - PDF
- Algorithms for Decision Making by Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray - PDF
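As a concrete illustration of the dynamic programming side, here is a value iteration sketch on a tiny Markov decision process (the transition and reward numbers are made up for illustration and are not drawn from the books above):

```python
# Value iteration on a tiny 3-state, 2-action MDP (illustrative parameters).
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9

# P[a, s, s'] = transition probability, R[s, a] = expected immediate reward.
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],   # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],   # action 1
])
R = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 10.0]])

V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * (P @ V).T          # Q[s, a] = R[s, a] + gamma * E[V(s') | s, a]
    V_new = Q.max(axis=1)              # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)
print("optimal values:", np.round(V, 3))
print("greedy policy :", policy)
```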
Constraint Programming #
Constraint programming solves problems by defining constraints that must be satisfied, often used in scheduling, planning, and configuration tasks. It excels in problems with complex logical constraints and discrete variables.
- Handbook of Constraint Programming by Francesca Rossi, Peter van Beek and Toby Walsh - Amazon
- A Tutorial on Constraint Programming by Barbara M. Smith (University of Leeds) - PDF
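A minimal sketch of the constraint-satisfaction idea: plain backtracking search on a small map-coloring instance, with no propagation or variable-ordering heuristics (a deliberate simplification, not the techniques covered in the references above).

```python
# Backtracking search for a small map-coloring CSP (no propagation, illustrative).
NEIGHBORS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
COLORS = ["red", "green", "blue"]

def consistent(var, color, assignment):
    # The only constraint: adjacent regions must receive different colors.
    return all(assignment.get(nb) != color for nb in NEIGHBORS[var])

def backtrack(assignment):
    if len(assignment) == len(NEIGHBORS):
        return assignment
    var = next(v for v in NEIGHBORS if v not in assignment)  # pick an unassigned variable
    for color in COLORS:
        if consistent(var, color, assignment):
            result = backtrack({**assignment, var: color})
            if result is not None:
                return result
    return None  # trigger backtracking in the caller

print(backtrack({}))
```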
Combinatorial Optimization #
Combinatorial optimization focuses on finding optimal solutions in discrete structures, such as graphs or sets, often using algorithms for problems like the traveling salesman or graph coloring, with applications in logistics and network design.
- Combinatorial Optimization: Algorithms and Complexity by Christos H. Papadimitriou and Kenneth Steiglitz - Amazon
- Combinatorial Optimization: Theory and Algorithms by Bernhard Korte and Jens Vygen - Springer
- A First Course in Combinatorial Optimization by Jon Lee - Amazon
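To make the traveling salesman example above concrete, here is a brute-force enumeration sketch that is exact but only feasible for very small instances (the coordinates are made up for illustration):

```python
# Exact TSP by brute-force enumeration; feasible only for very small n.
import itertools
import math

cities = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3)]   # toy coordinates

def tour_length(order):
    # Close the tour by returning to the starting city.
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

# Fix city 0 as the start to avoid counting rotations of the same tour.
best = min(((0,) + perm for perm in itertools.permutations(range(1, len(cities)))),
           key=tour_length)
print("best tour:", best, "length:", round(tour_length(best), 3))
```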
Stochastic Optimization and Control #
Stochastic optimization handles problems with uncertainty or randomness, using probabilistic models to optimize objectives. It is widely applied in machine learning, finance, and operations research for robust decision-making.
- Lectures on Stochastic Programming: Modeling and Theory (SIAM) by Shapiro, Dentcheva, and Ruszczynski - PDF
- Introductory Lectures on Stochastic Optimization by John C. Duchi - PDF
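A minimal stochastic gradient descent sketch on a synthetic least-squares problem, meant only to illustrate the "optimize under sampling noise" setting common to this literature (the problem and step-size rule are my own toy choices):

```python
# Stochastic gradient descent on a synthetic least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)        # noisy observations

x = np.zeros(d)
step0, batch = 0.5, 32
for k in range(1, 2001):
    idx = rng.integers(0, n, size=batch)          # sample a mini-batch
    grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch
    x -= (step0 / np.sqrt(k)) * grad              # diminishing step size
print("estimation error:", np.linalg.norm(x - x_true))
```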
Useful Resources #
- Prof. Nguyen Mau Nam, Convex Analysis - An introduction to convexity and nonsmooth analysis
- Ben Recht, arg min
- Prof. Dimitri P. Bertsekas, Convex Analysis and Optimization
- Prof. Dimitri P. Bertsekas, Nonlinear Programming: 3rd Edition
- Off the convex path
Posts on Optimization #
Optimization Papers in JMLR Volume 26
Optimization Research Papers in JMLR Volume 25
Optimization Research Papers in JMLR Volume 25 (2024) #
This document lists papers from JMLR Volume 25 (2024) that focus on optimization research, categorized by their primary themes. Each paper is numbered starting from 1 within its subsection, with a brief description of its key contributions to optimization theory, algorithms, or applications.
Convex Optimization #
Papers addressing convex optimization problems, including sparse NMF, differential privacy, and sparse regression.
1. Lower Complexity Bounds of Finite-Sum Optimization Problems: The Results and Construction
Authors: Yuze Han, Guangzeng Xie, Zhihua Zhang
Description: Investigates lower complexity bounds for finite-sum optimization problems in convex settings.
Ebooks & related papers on Convex Optimization
Ebooks #
- Boris Mordukhovich and Nguyen Mau Nam. An Easy Path to Convex Analysis and Applications. 2023
- Yurii Nesterov. Lectures on Convex Optimization. 2018
- Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. 2015
- Dimitri Bertsekas. Nonlinear Programming. 2016
- Boris Teodorovich Polyak. Introduction to Optimization. 1987
- R. T. Rockafellar. Convex Analysis. 1970
- H. H. Bauschke & P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. 2011
- Lieven Vandenberghe and Stephen P. Boyd. Convex Optimization. 2004
Papers #
- Yu. E. Nesterov. A method of solving a convex programming problem with convergence rate $\mathcal{O}(1/k^2)$. 1983
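Since the Nesterov (1983) paper above introduced the accelerated $\mathcal{O}(1/k^2)$ method, here is one common textbook form of that iteration on a smooth convex quadratic (the $(k-1)/(k+2)$ momentum coefficient is a standard modern choice, not necessarily the exact scheme of the original paper):

```python
# Nesterov-style accelerated gradient descent on a convex quadratic (sketch).
import numpy as np

rng = np.random.default_rng(0)
d = 20
M = rng.normal(size=(d, d))
A = M.T @ M + np.eye(d)                 # symmetric positive definite
b = rng.normal(size=d)

def grad(x):
    return A @ x - b                    # gradient of 0.5 x^T A x - b^T x

L = np.linalg.eigvalsh(A).max()         # Lipschitz constant of the gradient
x = x_prev = np.zeros(d)
for k in range(1, 501):
    y = x + (k - 1) / (k + 2) * (x - x_prev)   # momentum / extrapolation step
    x_prev = x
    x = y - (1.0 / L) * grad(y)                # gradient step at the extrapolated point
print("gradient norm:", np.linalg.norm(grad(x)))
```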
Pre-print articles on Adagrad-variant methods
1. Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models #
Authors: Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti
Abstract: Adam has been shown to outperform gradient descent on large language models by a larger margin than on other tasks, but it is unclear why. We show that a key factor in this performance gap is the heavy-tailed class imbalance found in language tasks. When trained with gradient descent, the loss of infrequent words decreases more slowly than the loss of frequent ones. This leads to a slow decrease on the average loss as most samples come from infrequent words. On the other hand, Adam and sign-based methods are less sensitive to this problem. To establish that this behavior is caused by class imbalance, we show empirically that it can be reproduced across architectures and data types, on language transformers, vision CNNs, and linear models. On a linear model with cross-entropy loss, we show that class imbalance leads to imbalanced, correlated gradients and Hessians that have been hypothesized to benefit Adam. We also prove that, in continuous time, gradient descent converges slowly on low-frequency classes while sign descent does not.
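To accompany the abstract, here is a hedged NumPy sketch of the two update rules being compared, Adam and sign descent, applied to a toy problem with very unequal gradient scales (the hyperparameters are common defaults; the toy problem is my own and is not the paper's setup):

```python
# Side-by-side update rules discussed in the abstract: Adam vs. sign descent (toy sketch).
import numpy as np

def adam_step(p, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g            # first-moment EMA
    state["v"] = b2 * state["v"] + (1 - b2) * g ** 2       # second-moment EMA
    m_hat = state["m"] / (1 - b1 ** state["t"])            # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return p - lr * m_hat / (np.sqrt(v_hat) + eps)

def sign_step(p, g, lr=1e-3):
    return p - lr * np.sign(g)                             # magnitude-free update

# Toy quadratic with very different curvatures, mimicking imbalanced gradient scales.
scales = np.array([100.0, 1.0, 0.01])
grad = lambda p: scales * p

p_adam, p_sign = np.ones(3), np.ones(3)
state = {"t": 0, "m": np.zeros(3), "v": np.zeros(3)}
for _ in range(2000):
    p_adam = adam_step(p_adam, grad(p_adam), state, lr=1e-2)
    p_sign = sign_step(p_sign, grad(p_sign), lr=1e-2)
print("Adam:", np.round(p_adam, 4), "Sign:", np.round(p_sign, 4))
```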
Pre-print articles on Adaptive Optimization
1. A simple uniformly optimal method without line search for convex optimization #
Authors: Tianjiao Li, Guanghui Lan
Abstract: Line search (or backtracking) procedures have been widely employed into first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., Lipschitz constant). In this paper, we show that line search is superfluous in attaining the optimal rate of convergence for solving a convex optimization problem whose parameters are not given a priori. In particular, we present a novel accelerated gradient descent type algorithm called auto-conditioned fast gradient method (AC-FGM) that can achieve an optimal $\mathcal{O}(1/k^2)$ rate of convergence for smooth convex optimization without requiring the estimate of a global Lipschitz constant or the employment of line search procedures. We then extend AC-FGM to solve convex optimization problems with Hölder continuous gradients and show that it automatically achieves the optimal rates of convergence uniformly for all problem classes with the desired accuracy of the solution as the only input. Finally, we report some encouraging numerical results that demonstrate the advantages of AC-FGM over the previously developed parameter-free methods for convex optimization.
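For context on the procedure the paper argues is superfluous, here is a generic Armijo backtracking line search inside plain gradient descent (a textbook sketch, not AC-FGM itself):

```python
# Gradient descent with Armijo backtracking line search (the procedure AC-FGM avoids).
import numpy as np

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

rng = np.random.default_rng(1)
d = 10
M = rng.normal(size=(d, d))
A = M.T @ M + np.eye(d)                 # symmetric positive definite quadratic
b = rng.normal(size=d)

x = np.zeros(d)
for _ in range(200):
    g = grad(x)
    t, beta, c = 1.0, 0.5, 1e-4
    # Shrink the step until the Armijo sufficient-decrease condition holds.
    while f(x - t * g) > f(x) - c * t * (g @ g):
        t *= beta
    x = x - t * g
print("gradient norm:", np.linalg.norm(grad(x)))
```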
Pre-print articles on gradient-clipping methods
1. Why gradient clipping accelerates training: A theoretical justification for adaptivity #
Authors: Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie
Abstract: We provide a theoretical explanation for the effectiveness of gradient clipping in training deep neural networks. The key ingredient is a new smoothness condition derived from practical neural network training examples. We observe that gradient smoothness, a concept central to the analysis of first-order optimization algorithms that is often assumed to be a constant, demonstrates significant variability along the training trajectory of deep neural networks. Further, this smoothness positively correlates with the gradient norm, and contrary to standard assumptions in the literature, it can grow with the norm of the gradient. These empirical observations limit the applicability of existing theoretical analyses of algorithms that rely on a fixed bound on smoothness. These observations motivate us to introduce a novel relaxation of gradient smoothness that is weaker than the commonly used Lipschitz smoothness assumption. Under the new condition, we prove that two popular methods, namely, \emph{gradient clipping} and \emph{normalized gradient}, converge arbitrarily faster than gradient descent with fixed stepsize. We further explain why such adaptively scaled gradient methods can accelerate empirical convergence and verify our results empirically in popular neural network training settings.
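The two updates analyzed in the abstract, gradient clipping and normalized gradient, are easy to state; the NumPy sketch below uses arbitrary thresholds and step sizes of my own choosing:

```python
# Gradient clipping (by global norm) and normalized gradient steps, as plain updates.
import numpy as np

def clipped_step(x, g, lr=0.1, max_norm=1.0):
    norm = np.linalg.norm(g)
    if norm > max_norm:
        g = g * (max_norm / norm)      # rescale so ||g|| == max_norm
    return x - lr * g

def normalized_step(x, g, lr=0.1, eps=1e-12):
    return x - lr * g / (np.linalg.norm(g) + eps)   # unit-length direction

# Toy objective whose gradient norm varies wildly: f(x) = ||x||^4.
grad = lambda x: 4 * np.linalg.norm(x) ** 2 * x

x_clip = x_norm = np.full(3, 5.0)
for _ in range(100):
    x_clip = clipped_step(x_clip, grad(x_clip))
    x_norm = normalized_step(x_norm, grad(x_norm))
print("clipped GD endpoint norm   :", np.linalg.norm(x_clip))
print("normalized GD endpoint norm:", np.linalg.norm(x_norm))
```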
Pre-print articles on Difference-of-Convex (DC) Programming
57. Stochastic Difference-of-Convex Optimization with Momentum #
Authors: El Mahdi Chayti, Martin Jaggi
Abstract: Stochastic difference-of-convex (DC) optimization is prevalent in numerous machine learning applications, yet its convergence properties under small batch sizes remain poorly understood. Existing methods typically require large batches or strong noise assumptions, which limit their practical use. In this work, we show that momentum enables convergence under standard smoothness and bounded variance assumptions (of the concave part) for any batch size. We prove that without momentum, convergence may fail regardless of stepsize, highlighting its necessity. Our momentum-based algorithm achieves provable convergence and demonstrates strong empirical performance.
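For orientation, here is the classical momentum-free DCA iteration on the toy decomposition $f(x) = x^4 - x^2$ with $g(x) = x^4$ and $h(x) = x^2$, both convex: linearize $h$ at the current iterate and solve the resulting convex subproblem in closed form. This is a generic DC-programming illustration, not the stochastic momentum method from the paper.

```python
# Classical DCA on f(x) = x^4 - x^2 = g(x) - h(x), with g(x) = x^4 and h(x) = x^2.
import math

x = 3.0
for k in range(50):
    # Linearize h at x: h(y) ~ h(x) + h'(x)(y - x), with h'(x) = 2x.
    # Convex subproblem: minimize y^4 - 2*x*y  =>  4y^3 = 2x  =>  y = cbrt(x/2).
    x = math.copysign(abs(x / 2.0) ** (1.0 / 3.0), x)
print("DCA stationary point:", x, " (critical points of f are 0 and ±1/sqrt(2))")
```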
Recent Advances in Research on Difference-of-Convex (DC) Programming
Second-order Stochastic Optimization methods for Machine Learning
Analysis of the Hessian #
1. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks #
Year: 2017
Authors: Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou
ArXiv ID: arXiv:1706.04454
URL: https://arxiv.org/abs/1706.04454
Abstract: We study the properties of common loss surfaces through their Hessian matrix. In particular, in the context of deep learning, we empirically show that the spectrum of the Hessian is composed of two parts: (1) the bulk centered near zero, (2) and outliers away from the bulk. We present numerical evidence and mathematical justifications to the following conjectures laid out by Sagun et al. (2016): Fixing data, increasing the number of parameters merely scales the bulk of the spectrum; fixing the dimension and changing the data (for instance adding more clusters or making the data less separable) only affects the outliers. We believe that our observations have striking implications for non-convex optimization in high dimensions. First, the flatness of such landscapes (which can be measured by the singularity of the Hessian) implies that classical notions of basins of attraction may be quite misleading. And that the discussion of wide/narrow basins may be in need of a new perspective around over-parametrization and redundancy that are able to create large connected components at the bottom of the landscape. Second, the dependence of small number of large eigenvalues to the data distribution can be linked to the spectrum of the covariance matrix of gradients of model outputs. With this in mind, we may reevaluate the connections within the data-architecture-algorithm framework of a model, hoping that it would shed light into the geometry of high-dimensional and non-convex spaces in modern applications. In particular, we present a case that links the two observations: small and large batch gradient descent appear to converge to different basins of attraction but we show that they are in fact connected through their flat region and so belong to the same basin.
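A hedged sketch of the kind of measurement the abstract describes: build the Hessian of a tiny over-parameterized model's loss by finite differences of a numerical gradient and inspect its eigenvalue spectrum (the architecture, data, and differencing scheme are my own illustrative choices; the paper's experiments are much larger).

```python
# Eigen-spectrum of the Hessian of a tiny MLP loss, via finite differences (illustrative).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)     # simple nonlinear target

def loss(theta):
    # Two-layer net with 8 hidden tanh units; theta packs all weights and biases.
    W1 = theta[:16].reshape(2, 8); b1 = theta[16:24]
    W2 = theta[24:32];             b2 = theta[32]
    p = np.tanh(X @ W1 + b1) @ W2 + b2
    return np.mean((p - y) ** 2)

def num_grad(theta, eps=1e-5):
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = eps
        g[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
    return g

theta = 0.1 * rng.normal(size=33)
H = np.zeros((33, 33))
for i in range(33):
    e = np.zeros(33); e[i] = 1e-4
    H[:, i] = (num_grad(theta + e) - num_grad(theta - e)) / (2e-4)
H = 0.5 * (H + H.T)                            # symmetrize away numerical noise
eig = np.linalg.eigvalsh(H)
print("smallest / largest eigenvalues:", eig[:3], eig[-3:])
```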
Optimization Research Papers in JMLR Volume 24
Optimization Research Papers in JMLR Volume 24 (2023) #
This document lists papers from JMLR Volume 24 (2023) that focus on optimization research, categorized by their primary themes. Each paper is numbered starting from 1 within its subsection, with a brief description of its key contributions to optimization theory, algorithms, or applications.
Convex Optimization #
Papers addressing convex optimization problems, including sparse PCA, L0 regularization, and matrix decomposition.
1. Sparse PCA: A Geometric Approach
Authors: Dimitris Bertsimas, Driss Lahlou Kitane
Description: Develops a geometric approach for sparse principal component analysis using convex optimization techniques.
Optimization Research Papers in JMLR Volume 23
Optimization Research Papers in JMLR Volume 23 (2022) #
This document lists papers from JMLR Volume 23 (2022) that focus on optimization research, categorized by their primary themes. Each paper is numbered starting from 1 within its subsection, with a brief description of its key contributions to optimization theory, algorithms, or applications.
Convex Optimization #
Papers addressing convex optimization problems, including sparse PCA, L1-regularized SVMs, and metric-constrained problems.
1. Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality
Authors: Dimitris Bertsimas, Ryan Cory-Wright, Jean Pauphilet
Description: Develops convex optimization techniques for large-scale sparse principal component analysis with certifiable near-optimal solutions.
Optimization Research Papers in JMLR Volume 22
Optimization Research Papers in JMLR Volume 22 (2021) #
This document lists papers from JMLR Volume 22 (2021) that focus on optimization research, categorized by their primary themes. Each paper is numbered starting from 1 within its subsection, with a brief description of its key contributions to optimization theory, algorithms, or applications.
Convex Optimization #
Papers addressing convex optimization problems, including clustering, Wasserstein barycenters, sparse optimization, and bandits.
1. Convex Clustering: Model, Theoretical Guarantee and Efficient Algorithm
Authors: Defeng Sun, Kim-Chuan Toh, Yancheng Yuan
Description: Proposes a convex clustering model with theoretical guarantees and an efficient algorithm.
Optimization Research Papers in JMLR Volume 21
Optimization Research Papers in JMLR Volume 21 (2020) #
This document lists papers from JMLR Volume 21 (2020) that focus on optimization research, categorized by their primary themes. Each paper is numbered starting from 1 within its subsection, with a brief description of its key contributions to optimization theory, algorithms, or applications.
Convex Optimization #
Papers addressing convex optimization problems, including complexity bounds, convergence analysis, and applications in regression and assortment optimization.