Iterative algorithms – Convergence rates
In optimization, as in other iterative numerical algorithms, we are interested in having convergence estimates. We are not only interested in showing that the error goes to $0$, although that is essential, but also at what speed the error goes to $0$. An algorithm which converges faster is, at equal cost per iteration, the better choice!
Generally, there are two points of view for convergence: convergence in terms of $n$, where $n$ is the number of iterations, or a comparison between the error at iteration $n+1$ and the error at iteration $n$. I will take the latter point of view below.
To fix the ideas, denote by $e_n = \|x_n - x^*\|$ the error estimate at iteration $n$, where $x_n$ is the current iterate and $x^*$ is the solution to the problem.
We have the following standard classification:
- linear convergence: there exists $q \in (0,1)$ such that $e_{n+1} \leq q\, e_n$; the constant $q$ is called the convergence ratio. It is easy to show that $e_n \leq q^n e_0$, so in particular $e_n \to 0$.
- sublinear convergence: $e_n \to 0$ but $(e_n)$ is not linearly convergent
- superlinear convergence: $(e_n)$ converges linearly with any positive convergence ratio $q \in (0,1)$; a sufficient condition is $\lim_{n \to \infty} e_{n+1}/e_n = 0$
- convergence of order $p > 1$: there exists $M > 0$ such that $e_{n+1} \leq M e_n^p$ for $n$ large enough; $p$ is called the order of convergence, and the case $p = 2$ has a special name: quadratic convergence
To underline the significant difference between linear and superlinear convergence consider the following examples. Let $q \in (0,1)$. Then:
- $e_n = q^n$ converges linearly to zero, but not superlinearly
- $e_n = q^{n^2}$ converges superlinearly to zero, but not quadratically
- $e_n = q^{2^n}$ converges to zero quadratically
Quadratic convergence is much faster than linear convergence.
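These behaviors are easy to observe numerically. The short Python sketch below (with $q = 0.5$, chosen only for illustration) prints the successive ratios $e_{n+1}/e_n$ for the three sequences:

```python
# Illustration of linear, superlinear and quadratic convergence
# using the sequences e_n = q^n, q^(n^2) and q^(2^n) with q = 0.5.
q = 0.5

def ratios(e):
    """Successive ratios e_{n+1}/e_n: constant for linear convergence,
    tending to 0 for superlinear convergence."""
    return [e[k + 1] / e[k] for k in range(len(e) - 1)]

linear      = [q ** n        for n in range(1, 8)]
superlinear = [q ** (n * n)  for n in range(1, 8)]
quadratic   = [q ** (2 ** n) for n in range(1, 8)]

print(ratios(linear))       # constant ratio q: linear convergence
print(ratios(superlinear))  # ratios tend to 0: superlinear convergence
# for quadratic convergence e_{n+1} / e_n^2 stays bounded (here it equals 1)
print([quadratic[k + 1] / quadratic[k] ** 2 for k in range(4)])
```

Note how for $e_n = q^{2^n}$ the ratio $e_{n+1}/e_n^2$ is exactly $1$, the signature of convergence of order $2$.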
Among optimization algorithms, the simpler ones are usually linearly convergent (bracketing algorithms: trisection, golden section search, the bisection algorithm, gradient descent). Algorithms involving higher order information or approximations of it are generally superlinear (secant method, Newton's method, BFGS or L-BFGS in higher dimensions, etc.).
There is a huge difference in practice between linear convergence and superlinear convergence. If a faster algorithm is available, using it is surely useful!
Golden search algorithm – efficient 1D gradient free optimization
Bracketing algorithms for minimizing one dimensional unimodal functions have the form:
- Suppose $[a,b]$ is an interval containing the minimizer $x^*$;
- Pick $c < d$ in $[a,b]$ and evaluate the function at these points.
- If $f(c) \leq f(d)$ then choose $[a,d]$ as the new interval;
- If $f(c) > f(d)$ then choose $[c,b]$ as the new interval.
- Stop the process when the current interval has a length smaller than a given tolerance or a given maximum number of function evaluations is reached.
The simplest algorithm corresponds to choosing $c$ and $d$ which divide the interval into three equal parts. This algorithm can be called the trisection algorithm. Below you can see the intervals and intermediary points for a few iterations of the trisection algorithm.
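A minimal Python sketch of this trisection procedure (the sample function and tolerance below are my own choices for illustration):

```python
def trisection(f, a, b, tol=1e-8, max_evals=1000):
    """Bracketing minimization of a unimodal f on [a, b]:
    at each step c, d cut [a, b] into three equal parts."""
    evals = 0
    while b - a > tol and evals < max_evals:
        c = a + (b - a) / 3
        d = a + 2 * (b - a) / 3
        if f(c) <= f(d):
            b = d          # the minimizer lies in [a, d]
        else:
            a = c          # the minimizer lies in [c, b]
        evals += 2
    return (a + b) / 2

# illustration: minimize (x - 1)^2 on [-3, 5]
print(trisection(lambda x: (x - 1) ** 2, -3, 5))  # close to 1
```

Each iteration shrinks the interval by a factor $2/3$ at the cost of two function evaluations; the golden section method improves on this by reusing one of the two evaluation points at every step.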
Optimizing a 1D function – trisection algorithm
Optimization problems take the classical form $$\min_{x \in K} f(x).$$
Not all such problems have explicit solutions, therefore numerical algorithms may help approximate potential solutions.
Numerical algorithms generally produce a sequence $(x_n)$ which approximates the minimizer. Information regarding function values and derivatives is used to generate such an approximation.
The easiest context is one dimensional optimization. The basic intuition regarding optimization algorithms starts by understanding the 1D case. Not all problems are easy to handle for a numerical optimization algorithm. Take a look at the picture below:
Maximize the circumradius of a unit diameter simplex
Given a simplex in dimension $n$ with diameter at most equal to one, what is its largest possible circumradius? The question is well posed, since a ball of diameter $2$ will cover all the points (a ball of radius $1$ centered at one of the given points).
Consider $n+1$ points $x_1, \dots, x_{n+1}$ in dimension $n$ and suppose the origin is the center of the circumscribed ball, of radius $R$. Then given two vertices $x_i, x_j$, the diameter bound gives $1 \geq |x_i - x_j|^2 = 2R^2(1 - \cos \theta_{ij})$, where $\theta_{ij}$ is the angle between the vectors $x_i$ and $x_j$. Therefore, one should maximize the minimal cosine of the angles between two such vectors.
An immediate upper bound, which is tight, can be obtained from the identity $0 \leq \left| \sum_{i=1}^{n+1} x_i \right|^2 = (n+1)R^2 + R^2 \sum_{i \neq j} \cos \theta_{ij}$. It can be seen that $\min_{i \neq j} \cos \theta_{ij} \leq -1/n$. Thus the maximal circumradius verifies $$R \leq \sqrt{\frac{n}{2(n+1)}}.$$
This bound is attained precisely when all angles between different vectors are equal and their cosine is $-1/n$, i.e. when the simplex is regular.
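As a sanity check, the circumradius of the regular simplex with unit edge can be computed numerically from the standard embedding and compared with $\sqrt{n/(2(n+1))}$ (a small NumPy sketch):

```python
import numpy as np

def regular_simplex_circumradius(n):
    """Circumradius of the regular n-simplex with unit edge length,
    computed from the embedding with vertices e_1, ..., e_{n+1}."""
    V = np.eye(n + 1)             # regular simplex with edge length sqrt(2)
    c = V.mean(axis=0)            # circumcenter = centroid, by symmetry
    R = np.linalg.norm(V[0] - c)  # circumradius for edge length sqrt(2)
    return R / np.sqrt(2)         # rescale to edge length (= diameter) 1

for n in (2, 3, 10):
    print(n, regular_simplex_circumradius(n), np.sqrt(n / (2 * (n + 1))))
```

For a regular simplex the diameter equals the edge length, so the computed value is exactly the maximal circumradius from the bound above.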
Minimal volume of some particular convex sets in the cube
Consider a cube and a convex body contained in it such that the projections of this body on every face of the cube completely cover that face. Prove that the volume of the convex body is at least equal to a third of the volume of the cube.
Graham’s Biggest Little Hexagon
Think of the following problem: What is the largest area of a hexagon with diameter equal to $1$?
As is the case with many questions similar to the one above, called polygonal isoperimetric problems, the first guess is the regular hexagon. For example, the largest area of a hexagon with fixed perimeter is attained by the regular hexagon. However, for the question above, the regular hexagon is not the best one. Graham proved in his paper “The largest small hexagon” that there exists a better competitor and he showed precisely which hexagon is optimal. More details on the history of the problem and further references can be found in Graham’s paper, the Wikipedia page or the MathWorld page.
I recently wanted to use this hexagon in some computations and I was surprised that I could not find the coordinates of such a hexagon stated explicitly. The paper “Isodiametric Problems for Polygons” by Mossinghoff was as close as possible to what I was looking for, although the construction there is not explicit. Therefore, below I present a strategy for finding the optimal hexagon and I give a precise (although approximate) set of coordinates for Graham’s hexagon.
Checking the gradient and Hessian implementation using finite differences
Solving a numerical optimization problem depends not only on the optimization algorithm you use, but also on whether you implemented correctly the gradient and, possibly, the Hessian matrix of the function you want to optimize. The correct implementation of the partial derivatives is not always a trivial calculus question, even when you have an analytic formula for your function and you just need to take care when performing the computations. Even the best students can make a typo, put a wrong sign somewhere, miss a factor, etc., and in that case the optimization algorithm simply doesn’t work as expected.
Things get even more complicated when the computation of the gradient (and Hessian) goes through some PDE model, multiplying the places in your code where an error might hide. For example, in a parametric shape optimization problem one has a parametric description of the shape, and the gradient of the function to be optimized is obtained by plugging into the associated shape derivative formula the perturbation field associated to the variation of the corresponding parameter.
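The basic idea of a finite difference check can be sketched as follows (the test function, step size and tolerances below are my own choices for illustration):

```python
import numpy as np

def check_gradient(f, grad, x, eps=1e-6):
    """Compare an analytic gradient with centered finite differences.
    Returns the maximum absolute discrepancy over all partial derivatives."""
    g = grad(x)
    fd = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        fd[i] = (f(x + e) - f(x - e)) / (2 * eps)  # O(eps^2) accurate
    return np.max(np.abs(g - fd))

# illustration on f(x) = ||x||^2, whose gradient is 2x
f = lambda x: np.dot(x, x)
grad = lambda x: 2 * x
x0 = np.array([1.0, -2.0, 3.0])
print(check_gradient(f, grad, x0))          # tiny: implementation is correct
print(check_gradient(f, lambda x: x, x0))   # large: gradient misses a factor 2
```

If the discrepancy does not decrease like $O(\varepsilon^2)$ as the step shrinks (until rounding errors take over), something is wrong in the analytic gradient.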
Gradient algorithm with optimal step: Quadratic Case: theory
In a previous post I looked at the theory behind the convergence of the gradient descent algorithm with fixed step for a quadratic function. In this post I will treat a similar topic, namely the gradient descent algorithm with optimal step for the case of a quadratic function. Let us consider again $f : \mathbb{R}^n \to \mathbb{R}$ defined by
$$f(x) = \frac{1}{2} \langle Ax, x \rangle - \langle b, x \rangle,$$
the classical quadratic function, where $A$ is symmetric positive definite and $b$ is an arbitrary vector. The gradient descent algorithm with optimal step has the form
$$x_{k+1} = x_k - \alpha_k \nabla f(x_k),$$
where the descent step $\alpha_k$ is chosen such that
$$\alpha \mapsto f(x_k - \alpha \nabla f(x_k))$$
is minimal.
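For the quadratic case this one-dimensional minimization can be done in closed form: with $r_k = \nabla f(x_k) = Ax_k - b$, minimizing $\alpha \mapsto f(x_k - \alpha r_k)$ gives $\alpha_k = \langle r_k, r_k \rangle / \langle A r_k, r_k \rangle$. A minimal NumPy sketch (the matrix and vector are chosen only for illustration):

```python
import numpy as np

def gd_optimal_step(A, b, x0, n_iter=50):
    """Gradient descent with exact line search for
    f(x) = 0.5 <Ax, x> - <b, x>, with A symmetric positive definite."""
    x = x0.astype(float)
    for _ in range(n_iter):
        r = A @ x - b                    # gradient at the current iterate
        if np.linalg.norm(r) < 1e-14:
            break
        alpha = (r @ r) / (r @ (A @ r))  # minimizes f(x - alpha r) exactly
        x = x - alpha * r
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # a small SPD matrix
b = np.array([1.0, 1.0])
x = gd_optimal_step(A, b, np.zeros(2))
print(x, np.linalg.solve(A, b))          # both approximate the minimizer A^{-1} b
```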
Gradient algorithm: Quadratic Case: theory
Everyone who did a little bit of numerical optimization knows the principle of the gradient descent algorithm. This is the simplest gradient based algorithm: given the current iterate, advance in the opposite direction of the gradient with a given step:
$$x_{k+1} = x_k - \alpha \nabla f(x_k).$$
It is straightforward to see that if $\alpha$ is small enough and $f$ is at least $C^1$ then
$$f(x_k - \alpha \nabla f(x_k)) = f(x_k) - \alpha \|\nabla f(x_k)\|^2 + o(\alpha).$$
This means that, whenever $\nabla f(x_k) \neq 0$, there exists a small enough $\alpha$ for which $f(x_{k+1}) < f(x_k)$.
Are zero-order optimization methods any good?
The short answer is yes, but only when the derivative or gradient of the objective function is not available. To fix the ideas we refer to:
- optimization algorithm: as an iterative process of searching for approximations of a (local/global) minimum of a certain function
- zero-order algorithm: an optimization method which only uses function evaluations in order to decide on the next point in the iterative process.
Therefore, in view of the definitions above, zero-order algorithms want to approximate minimizers of a function using only function evaluations; no further information on derivatives is available. Classical examples are bracketing algorithms and genetic algorithms. The objective here is not to go into detail on any of these algorithms, but to underline one basic limitation which must be taken into account whenever considering these methods.
In a zero-order optimization algorithm any decision regarding the choice of the next iterate can be made only by comparing the values of $f$ at different evaluation points. For example, look at the classical trisection algorithm for minimizing a unimodal function, i.e. a real function defined on $[a,b]$ which is decreasing on $[a, x^*]$ and increasing on $[x^*, b]$:
- given the bracketing interval $[a,b]$, choose the points $c = a + (b-a)/3$ and $d = a + 2(b-a)/3$.
- compare the values $f(c)$ and $f(d)$ in order to decide on the next bracketing interval: if $f(c) \leq f(d)$ then the new interval is $[a,d]$;
if $f(c) > f(d)$ then the new interval is $[c,b]$.
- stop when the difference $b - a$ is small enough.
One question immediately arises: can such an algorithm reach any desired precision? For example, can it reach the machine precision, i.e. the precision to which computations are done in the software used? To fix the ideas, we’ll suppose that we are in the familiar world of numbers written in double precision, where the machine precision is something like $\varepsilon \approx 2.2 \cdot 10^{-16}$. I will not go into further details regarding this, since people do whole courses on this. Note that Matlab and Octave are well known languages, and Numpy a well known library, in which the default setup uses this machine precision.
More precisely, the real numbers are stored as floating point numbers and only about $16$ significant digits are relevant in the computations. When adding two numbers $a$ and $b$ whose ratio $b/a$ is smaller than $\varepsilon$, the result will be $a$. This is due to the fact that adding the numbers requires shifting the decimal point to the same position. However, since the ratio is smaller than $\varepsilon$, after shifting the decimal point to the same position, the second number will have zeros in the first $16$ or more significant digits of $a$. Therefore the addition will not change any significant digit of $a$.
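This behavior is easy to verify in double precision (a two-line Python experiment):

```python
import sys

eps = sys.float_info.epsilon  # about 2.22e-16 in double precision
print(eps)

a = 1.0
print(a + 1e-17 == a)  # True: 1e-17 / a is below eps, the addition is lost
print(a + 1e-15 == a)  # False: this increment is still representable
```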
This issue related to computations done in floating point precision makes the question of comparing $f(c)$ and $f(d)$ in the algorithm above pointless when $c$ and $d$ are close to the minimum. In order to identify the source of the potential problem let us write the Taylor expansion of the function around the minimum $x^*$. First, note that $f'(x^*) = 0$ at the minimum, which leaves us with
$$f(x) = f(x^*) + \frac{1}{2} f''(x^*)(x - x^*)^2 + o((x - x^*)^2),$$
where $o((x - x^*)^2)$ denotes higher order terms. You may note that neglecting higher order terms shows that the function looks like a quadratic function near the optimum. This is a good thing to keep in mind, and students should not hold a grudge against professors who illustrate the behavior of optimization algorithms on quadratic functions. As it turns out, any good algorithm needs to behave well for quadratic functions, since every smooth function is eventually quadratic near a minimum.
Coming back to our previous paragraph, it turns out that the computer won’t be able to tell the difference between $f(x)$ and $f(x^*)$ if
$$\frac{1}{2} |f''(x^*)| (x - x^*)^2 \leq \varepsilon |f(x^*)|,$$
where $\varepsilon$ is the machine precision. Therefore, if
$$|x - x^*| \leq \sqrt{\varepsilon} \sqrt{\frac{2|f(x^*)|}{|f''(x^*)|}},$$
the machine won’t tell the difference between $f(x)$ and $f(x^*)$. Since the second factor in the above expression is well defined and positive in most cases, it turns out that we will not be able to decide if we are closer than $\sqrt{\varepsilon}$ to the solution based only on the values of the function $f$.
Therefore, when using a zero-order algorithm for a function with $f(x^*) \neq 0$ and $f''(x^*) \neq 0$, you cannot decide based only on the function values whether you are closer to the optimum than the square root of the machine precision, $\sqrt{\varepsilon} \approx 1.5 \cdot 10^{-8}$. This is quite a limitation, and knowing it should in practice save lots of pointless function evaluations.
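A quick experiment confirms this $\sqrt{\varepsilon}$ threshold (the quadratic below is a toy function of my own choosing, with minimum value $1$ at $x^* = 1$):

```python
# f has its minimum f(1) = 1 at x* = 1; near x*, f(x) - f(x*) = (x - 1)^2
f = lambda x: 1.0 + (x - 1.0) ** 2

h = 1e-9                      # closer to x* than sqrt(eps) ~ 1.5e-8
print(f(1.0 + h) == f(1.0))   # True: (1e-9)^2 = 1e-18 is lost next to 1
h = 1e-7                      # farther from x* than sqrt(eps)
print(f(1.0 + h) == f(1.0))   # False: (1e-7)^2 = 1e-14 is still visible
```

Within a $\sqrt{\varepsilon}$-neighborhood of the minimizer all function values are literally equal in double precision, so no comparison-based method can make progress there.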
When using derivatives or gradients there is no such problem, since we can decide if we are close enough to $x^*$ by comparing the derivative to zero, and this comparison is valid up to the precision to which we can compute $f'$.
In conclusion, use zero-order methods only if gradient based methods are out of reach. Zero-order methods are not only slowly convergent, but may also be unable to achieve the expected precision in the end.
Optimality conditions – equality constraints
Everyone knows that local minima and maxima of a function are critical points, i.e. points where the gradient is equal to zero. This is what we call an optimality condition, a condition verified by every solution of a minimization or maximization problem.
The situation changes when dealing with constraints. Consider a simple constrained example and note that the gradient of the objective function is, in general, not zero at the optimal solution. The idea is that one needs to take into account the constraint when writing optimality conditions in this case. This leads us to consider the theory of Lagrange multipliers. Every student finds this concept a bit difficult when seeing it for the first time.
Since a picture is worth 1000 words, let me share with you two pictures 🙂 Consider the minimization of a function $f$ under a constraint $g = 0$. The first picture below shows the gradients of the function and of the constraint at the optimal solution, approximated numerically. One striking property becomes obvious: the gradient of the function aligns with the gradient of the constraint. In order to see why this is necessary, let us look at another point in the constraint set where the two gradients are not aligned, shown in the second picture.
It becomes obvious that if the two gradients are not aligned, then there exists a component of $\nabla f$ which is tangent to the constraint set. Therefore, going along this tangent direction, in the sense opposite to this component, it is possible to further decrease the value of the function $f$ while still following the constraint set. Therefore, when the two gradients are not aligned we are not at a minimum (the same type of argument works for maximization problems).
Of course, this geometric argument is not the whole picture, but it gives an important insight, which directly implies the theory of Lagrange multipliers: at the optimum, the gradient should be orthogonal to the tangent space to the constraint set, and this orthogonal, under some regularity assumptions, is generated by the gradients of the constraints themselves, giving the famous relation
$$\nabla f(x^*) = \sum_{i=1}^m \lambda_i \nabla g_i(x^*).$$
Now, once we have understood the intuition behind this, it remains to see under which conditions there exists a tangent space to the constraint set, and to see rigorously why the above relation holds. But all this is left for some future post.
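The alignment of gradients is easy to observe numerically. Below is a small sketch on a toy problem of my own choosing (not the one from the pictures): minimize $f(x,y) = x + y$ on the unit circle $g(x,y) = x^2 + y^2 - 1 = 0$, whose constrained minimizer is $(-1/\sqrt{2}, -1/\sqrt{2})$.

```python
import numpy as np

# toy problem: minimize f(x, y) = x + y on the circle x^2 + y^2 = 1
grad_f = lambda p: np.array([1.0, 1.0])
grad_g = lambda p: 2.0 * p

def alignment(p):
    """2D cross product of the two gradients: zero iff they are aligned."""
    u, v = grad_f(p), grad_g(p)
    return u[0] * v[1] - u[1] * v[0]

p_opt = np.array([-1.0, -1.0]) / np.sqrt(2)  # the constrained minimizer
p_other = np.array([1.0, 0.0])               # another feasible point
print(alignment(p_opt))    # 0: gradients aligned (here lambda = -1/sqrt(2))
print(alignment(p_other))  # nonzero: f can still decrease along the circle
```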
Gradient Descent converging to a Saddle Point
Gradient descent (GD) algorithms search numerically for minimizers of functions $f : \mathbb{R}^n \to \mathbb{R}$. Once an initialization $x_0$ is chosen, the algorithm is defined by
$$x_{k+1} = x_k - \alpha_k \nabla f(x_k),$$
where $\alpha_k$ is a descent step, often found using a particular line-search procedure. It is widely known that GD algorithms converge to critical points under very mild hypotheses on the function and on the line-search procedure. Moreover, GD algorithms almost always converge to local minimizers (convergence to the global minimizer is hard to guarantee, except in the convex case).
However, the “almost always” part of the above sentence is in itself interesting. It turns out that the choice of the initialization may lead the algorithm to get stuck in a saddle point, i.e. a critical point where the Hessian matrix has both positive and negative eigenvalues. The way to imagine such examples is to note that at such a saddle point there are directions along which $f$ is minimized and directions along which $f$ is maximized. If the gradient only contains information along the directions where $f$ is minimized, the algorithm will stop there. However, the set of initializations for which GD gets stuck in a saddle point is very small, and considering slightly perturbed initializations usually solves the problem.
To better illustrate this phenomenon, let’s look at a two dimensional example:
It can be seen immediately that this function has three critical points: the origin, which is a saddle point, while the other two are local minima. A graph of this function around the origin is shown below.
Note that, looking only along the line through the saddle point on which the gradient has no transverse component, the saddle point is a minimizer. Therefore, choosing an initialization on this line will make the GD algorithm get stuck in the saddle point. Below you can see an example of such an initialization, and the trajectory of the GD algorithm is illustrated. Considering only a slight perturbation of the initialization allows the GD algorithm to escape the saddle point and to converge to a local minimizer.
One simple way to prevent GD algorithms from getting stuck in a saddle point is to consider randomized initializations, so that you avoid any bias you might have regarding the objective function.
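The phenomenon can be reproduced on a hypothetical example with the same features as in the post (this is my own toy function, not necessarily the one in the figures): $f(x,y) = \frac{1}{4}(x^2-1)^2 + \frac{1}{2}y^2$ has a saddle point at the origin, two local minima at $(\pm 1, 0)$, and along the line $x = 0$ the origin is a minimizer.

```python
import numpy as np

# f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2: saddle at (0, 0), minima at (+-1, 0);
# the gradient's x-component vanishes on the whole line x = 0, so GD started
# there can never leave that line and gets stuck at the saddle point.
grad = lambda p: np.array([p[0] * (p[0] ** 2 - 1), p[1]])

def gd(p0, step=0.1, n_iter=500):
    p = np.array(p0, dtype=float)
    for _ in range(n_iter):
        p = p - step * grad(p)
    return p

print(gd([0.0, 1.0]))   # stuck at the saddle point (0, 0)
print(gd([0.01, 1.0]))  # tiny perturbation: escapes to the minimum (1, 0)
```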
Computing the Shape Derivative of the Wentzell Problem
Computing shape derivatives is a useful skill if you do numerical shape optimization. When you want to minimize a quantity depending on a shape in a more or less direct way, you should be able to differentiate it so that gradient algorithms can be applied. Computing shape derivatives is not that easy, but below there is a method which is not that hard once you practice on some simple examples. For some bibliography on the subject I suggest the following:
- Jean CEA, Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût: this is in French, but it presents the method nicely.
- Chicco-Ruiz, Morin, Pauletti, The shape derivative of the Gauss curvature. This is not an original source of the classical results, but it contains a very nice summary of the basic notions of shape derivatives.
- Delfour, Zolesio, Shapes and Geometries
- Henrot, Pierre, Shape Variation and Optimization
- Grégoire Allaire, see the shape optimization course on his homepage
We start directly with a complicated example, containing some complex boundary terms. The Wentzell eigenvalue problem is given by the following:
$$\begin{cases} -\Delta u = 0 & \text{in } \Omega \\ -\beta \Delta_\tau u + \partial_n u = \lambda u & \text{on } \partial \Omega \end{cases}$$
It is not hard to see that the associated variational formulation is given by
$$\int_\Omega \nabla u \cdot \nabla v \, dx + \beta \int_{\partial \Omega} \nabla_\tau u \cdot \nabla_\tau v \, d\sigma = \lambda \int_{\partial \Omega} u v \, d\sigma.$$
Lagrangians – derivation under constraints
In this post I will describe a formal technique which can help in finding derivatives of functions under certain constraints. In order to see what I mean, take the following two examples:
1. If you differentiate a function with respect to a variable it does not depend on, you get zero, obviously.
2. If you differentiate the same function under a constraint linking that variable to the others (supposing the dependence is smooth), then it is not hard to see, via the chain rule, that the derivative is no longer zero in general.
This shows that adding certain constraints will change the derivative and when dealing with more sophisticated constraints, like PDE or differential equations, things get less clear.
The question of the existence of derivatives with respect to some variables, when dealing with constraints, is usually handled with the implicit function theorem: this basically says that if some variables are linked by smooth equations and the dependence can be locally inverted, then differentiability can be inferred.
The method presented below skips this step and goes directly for the computation of the derivative. This is a commonly used technique in control theory and shape optimization, where computing derivatives is essential in numerical simulations. I will come back to derivatives linked to shape optimization in some further posts. For now, let’s see how one can use the Lagrangian method for computing derivatives in some simple cases.
Example 1. Let’s compute the derivative of a quantity under an equality constraint. In order to do this we introduce a Lagrangian, which has the following form
Example of optimization under constraints – using Lagrange multipliers
Consider the functional $J$ defined by
and the set $K$ defined by .
- Prove that $J$ admits a minimizer on $K$ and find it.
- Let $K_1$ be the set . Does $J$ admit minimizers on $K_1$? If yes, determine all points in which the minimum is attained.
Proof: This is a problem which illustrates well the concepts related to optimality conditions for constrained problems. It is possible to solve it by two methods: one using only basic ideas and a second one using results from optimization theory. One can also see, by examining the basic method, how the optimality condition given by the Lagrange multipliers is recovered naturally.
Optimizing a quadratic functional under affine constraints
Consider the quadratic functional $J : \mathbb{R}^n \to \mathbb{R}$ defined by
$$J(x) = \frac{1}{2} \langle Ax, x \rangle - \langle b, x \rangle,$$
for a real $n \times n$ matrix $A$ which is symmetric positive definite and a vector $b \in \mathbb{R}^n$. Consider also the application $h : \mathbb{R}^n \to \mathbb{R}^m$ defined by $h(x) = Cx - d$, where $C$ is a real $m \times n$ matrix and $d \in \mathbb{R}^m$. Furthermore suppose that $m \leq n$ and that $C$ is of maximal rank. The objective is to study the following optimization problem under equality constraints:
$$\min_{x \in K} J(x),$$
where $K$ is defined by $K = \{x \in \mathbb{R}^n : h(x) = 0\}$.
- Show that $J$ has a unique minimizer on $K$.
- Prove that if $x^*$ is the minimizer of $J$ on $K$ then there exists a unique vector $\lambda \in \mathbb{R}^m$ such that $(x^*, \lambda)$ solves the linear system
$$\begin{pmatrix} A & C^T \\ C & 0 \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} b \\ d \end{pmatrix}.$$
- Show that the matrix of the above system is invertible.
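Assuming the standard formulation $J(x) = \frac{1}{2}\langle Ax, x\rangle - \langle b, x\rangle$ with affine constraint $Cx = d$, the pair $(x^*, \lambda)$ solves the block system with matrix $\begin{pmatrix} A & C^T \\ C & 0 \end{pmatrix}$ and right-hand side $(b, d)$. A small NumPy sketch on an instance of my own choosing:

```python
import numpy as np

# toy instance: minimize J(x) = 0.5 <Ax, x> - <b, x> subject to Cx = d
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
C = np.array([[1.0, 1.0]])               # full rank, m = 1 < n = 2
d = np.array([1.0])

n, m = A.shape[0], C.shape[0]
# assemble the KKT matrix [[A, C^T], [C, 0]] and right-hand side (b, d)
K = np.block([[A, C.T], [C, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([b, d]))
x, lam = sol[:n], sol[n:]

print(x, lam)
print(C @ x)                  # equals d: the constraint is satisfied
print(A @ x + C.T @ lam - b)  # zero: stationarity of the Lagrangian
```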
Project Euler 607
If you like solving Project Euler problems you should try Problem number 607. It’s not very hard, as it can be reduced to a small optimization problem. The idea is to find a path which minimizes time, knowing that certain regions correspond to different speeds. A precise statement of the result can be found on the official page. Here’s an image of the path which realizes the shortest time:
Project Euler tips
A few years back I started working on Project Euler problems mainly because it was fun from a mathematical point of view, but also to improve my programming skills. After solving about 120 problems I seem to have hit a wall, because the numbers involved in some of the problems were just too big for my simple brute-force algorithms.
Recently, I decided to try and see if I can do some more of these problems. I cannot say that I acquired many new techniques between 2012 and 2016 concerning the mathematics involved in these problems. My research topics are usually quite different, and my day to day programming routines are more about constructing new stuff which works fast enough than about optimizing actual code. Nevertheless, I have some experience coding in Matlab, and I realized that nested loops are to be avoided. Vectorizing the code can speed things up 100-fold.
So the point of Project Euler tasks is making things work for large numbers. Normally all problems are tested and should run within a minute on a regular machine. This brings us to choosing the right algorithms, the right simplifications, and understanding the complexity of the algorithms involved.
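The vectorization advice applies just as well in Python with NumPy; here is a toy comparison (sum of squares, with a problem size chosen only for illustration):

```python
import numpy as np
import time

n = 10 ** 6
values = np.arange(1, n + 1, dtype=np.float64)

# naive loop: accumulate the sum of squares one term at a time
t0 = time.perf_counter()
s_loop = 0.0
for v in values:
    s_loop += v * v
t_loop = time.perf_counter() - t0

# vectorized: a single call operating on the whole array at once
t0 = time.perf_counter()
s_vec = np.dot(values, values)
t_vec = time.perf_counter() - t0

print(np.isclose(s_loop, s_vec), t_loop / t_vec)  # same result, large speedup
```

(The two sums may differ in the last digits because floating point addition is performed in a different order.)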
IMC 2016 – Day 2 – Problem 7
Problem 7. Today, Ivan the Confessor prefers continuous functions $f : [0,1] \to \mathbb{R}$ satisfying $f(x) + f(y) \geq |x - y|$ for all pairs $x, y \in [0,1]$. Find the minimum of $\int_0^1 f$ over all preferred functions.
SIAM 100 digit challenge no 4 – Matlab solution
In 2002 the SIAM News magazine published a list of ten problems in numerical analysis. The answer to each of the problems should be given with an accuracy of at least ten digits. And so there came the Hundred-Dollar, Hundred-Digit Challenge. I present below my attempt at solving the 4th problem in this series, as it deals with finding the global minimum of a highly oscillating function. I confess that my approach is rather simplistic, but it solves the problem. First let’s state the problem:
Problem 4. What is the global minimum of the function
$$e^{\sin(50x)} + \sin(60 e^y) + \sin(70 \sin x) + \sin(\sin(80 y)) - \sin(10(x+y)) + \frac{x^2 + y^2}{4}?$$
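The simplistic approach can be sketched as a brute-force grid search (the box and grid size below are my own choices): the quadratic term dominates far from the origin, so the global minimum lies in a small box around it, and a fine grid gives a good starting point for a subsequent local refinement.

```python
import numpy as np

def f(x, y):
    return (np.exp(np.sin(50 * x)) + np.sin(60 * np.exp(y))
            + np.sin(70 * np.sin(x)) + np.sin(np.sin(80 * y))
            - np.sin(10 * (x + y)) + (x ** 2 + y ** 2) / 4)

# the quadratic term forces the global minimum into a small box,
# so a fine grid over [-1, 1]^2 already localizes it quite well
t = np.linspace(-1, 1, 1001)
X, Y = np.meshgrid(t, t)
Z = f(X, Y)
i, j = np.unravel_index(np.argmin(Z), Z.shape)
print(X[i, j], Y[i, j], Z[i, j])  # rough approximation of the global minimum
```

Refining the grid minimizer with a local optimizer recovers the challenge answer, approximately $-3.30687$.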