Iterative algorithms – Convergence rates
In optimization, and in iterative numerical algorithms in general, we are interested in having convergence estimates for our algorithms. We are not only interested in showing that the error goes to zero, although that is essential, but also at what speed it goes to zero. At comparable cost per iteration, an algorithm which converges faster is the better choice!
Generally, there are two points of view on convergence: estimates of the error in terms of $n$, the number of iterations, or a comparison between the error at iteration $n+1$ and the error at iteration $n$. I will take the latter point of view below.
To fix the ideas, denote by $e_n = \|x_n - x^*\|$ the error estimate at iteration $n$, where $x_n$ is the current iterate and $x^*$ is the solution to the problem.
We have the following standard classification:
- linear convergence: there exists $c \in (0,1)$ such that $e_{n+1} \leq c\, e_n$.
The constant $c$ is called the convergence ratio. It is easy to show that $e_n \leq c^n e_0$, so in particular $e_n \to 0$.
- sublinear convergence: $e_n \to 0$, but $(e_n)$ does not converge linearly.
- superlinear convergence: $(e_n)$ converges linearly with any positive convergence ratio.
A sufficient condition: $\lim_{n \to \infty} e_{n+1}/e_n = 0$.
- convergence of order $q > 1$: there exists $C > 0$ such that $e_{n+1} \leq C e_n^q$ for $n$ large enough.
$q$ is called the order of convergence; the case $q = 2$ has a special name: quadratic convergence.
To underline the significant difference between linear and superlinear convergence consider the following standard examples. Let $a_n = 2^{-n}$, $b_n = 1/n!$ and $c_n = 2^{-2^n}$. Then:
- $(a_n)$ converges linearly to zero, but not superlinearly
- $(b_n)$ converges superlinearly to zero, but not quadratically
- $(c_n)$ converges to zero quadratically
Quadratic convergence is much faster than linear convergence.
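As a quick numerical illustration (a small sketch; the model sequences $a_n = 2^{-n}$, $b_n = 1/n!$ and $c_n = 2^{-2^n}$ are standard examples of the three behaviors):

```python
from math import factorial

def model_sequences(n):
    """Model errors at step n: linear, superlinear and quadratic convergence."""
    a = 2.0 ** (-n)           # linear: a_{n+1}/a_n = 1/2, a constant ratio
    b = 1.0 / factorial(n)    # superlinear: b_{n+1}/b_n = 1/(n+1) -> 0
    c = 2.0 ** (-(2 ** n))    # quadratic: c_{n+1} = c_n^2
    return a, b, c

for n in range(1, 7):
    a, b, c = model_sequences(n)
    print(f"n={n}:  linear {a:.2e}   superlinear {b:.2e}   quadratic {c:.2e}")
```

Already at $n = 6$ the quadratic sequence is of the order $10^{-20}$, while the linear one is only at $10^{-2}$.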
Among optimization algorithms, the simpler ones are usually linearly convergent (bracketing algorithms: trisection, golden search, the bisection algorithm, gradient descent). Algorithms involving higher order information or approximations of it are generally superlinear (the secant method, Newton's method, BFGS or L-BFGS in higher dimensions, etc.).
There is a huge difference between linear convergence and superlinear convergence. If a faster algorithm is available, using it is surely worthwhile!
Golden search algorithm – efficient 1D gradient free optimization
Bracketing algorithms for minimizing one dimensional unimodal functions have the form:
- Suppose $[a, b]$ is an interval containing the minimizer $x^*$;
- Pick $x_1 < x_2$ in $(a, b)$ and evaluate the function at these points.
- If $f(x_1) < f(x_2)$ then choose $[a, x_2]$ as the next interval;
- If $f(x_1) \geq f(x_2)$ then choose $[x_1, b]$ as the next interval;
- Stop the process when the current interval has a length smaller than a given tolerance or a given maximum number of function evaluations is reached.
The simplest algorithm corresponds to choosing $x_1, x_2$ which divide the interval into three equal parts; this can be called the trisection algorithm. Below you can see the intervals and intermediary points for a few iterations of the trisection algorithm.
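A minimal sketch of this bracketing scheme (the trisection variant, with the two points placed at thirds of the interval):

```python
def trisection(f, a, b, tol=1e-8, max_evals=10_000):
    """Minimize a unimodal function f on [a, b] by trisection bracketing."""
    evals = 0
    while b - a > tol and evals < max_evals:
        x1 = a + (b - a) / 3
        x2 = b - (b - a) / 3
        evals += 2
        if f(x1) < f(x2):
            b = x2   # the minimizer lies in [a, x2]
        else:
            a = x1   # the minimizer lies in [x1, b]
    return (a + b) / 2

xmin = trisection(lambda x: (x - 1.0) ** 2, -5, 5)   # minimizer of a parabola
```

Each iteration shrinks the interval by a factor of $2/3$ at the cost of two function evaluations, which is exactly the linear convergence discussed above.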
Optimizing a 1D function – trisection algorithm
Optimization problems take the classical form
$$\min_{x \in K} f(x).$$
Not all such problems have explicit solutions, therefore numerical algorithms may help approximate potential solutions.
Numerical algorithms generally produce a sequence $(x_n)$ which approximates the minimizer. Information regarding function values and derivatives is used to generate such an approximation.
The easiest context is one dimensional optimization. The basic intuition regarding optimization algorithms starts by understanding the 1D case. Not all problems are easy to handle for a numerical optimization algorithm. Take a look at the picture below:
Find coefficients of trigonometric polynomial given samples
Suppose the values of some trigonometric polynomial
$$p(\theta) = a_0 + \sum_{k=1}^N \left(a_k \cos(k\theta) + b_k \sin(k\theta)\right)$$
are known for a set of distinct angles $\theta_j \in [0, 2\pi)$, $j = 1, \dots, M$: denote these values $v_j = p(\theta_j)$.
Question: Recover the coefficients of the trigonometric polynomial that verify $p(\theta_j) = v_j$ when $M = 2N+1$, or which best fit the values in the sense of least squares when $M > 2N+1$.
Answer: Obviously, since there are $2N+1$ unknowns, we need at least $2N+1$ equations, therefore we restrict ourselves to the case $M \geq 2N+1$. The equalities $p(\theta_j) = v_j$ produce a system of $M$ equations which has at most one solution when $M \geq 2N+1$, provided the rank of the matrix of the system is $2N+1$. Define the function
$$F(c) = \sum_{j=1}^M \left(p(\theta_j) - v_j\right)^2.$$
This function is a sum of squares, which has zero as a lower bound. The function can be written using norms as $F(c) = \|Ac - v\|^2$, where $c = (a_0, a_1, \dots, a_N, b_1, \dots, b_N)$ is the vector of coefficients, $v = (v_j)$ is the vector of values, and $A$ is the $M \times (2N+1)$ matrix whose row $j$ is $(1, \cos\theta_j, \dots, \cos(N\theta_j), \sin\theta_j, \dots, \sin(N\theta_j))$.
A straightforward computation shows that the gradient of $F$ is $\nabla F(c) = 2A^T(Ac - v)$, so critical points satisfy the normal equations $A^T A c = A^T v$.
The matrix $A^T A$ is invertible, provided all angles $\theta_j$ are distinct. This is a direct application of the formula of the Vandermonde determinant: using operations on columns you can recover a Vandermonde-type matrix built on the distinct nodes $e^{i\theta_j}$. Therefore, one can always solve the system $A^T A c = A^T v$, and the solution will minimize $F$. In particular, when $M = 2N+1$ the minimum will be equal to zero and the coefficients of the trigonometric polynomial verifying $p(\theta_j) = v_j$ will be found. When $M > 2N+1$ the best fit, in the sense of least squares, of the values $v_j$ with a trigonometric polynomial will be found.
Below, you can find a Python code which solves the problem in some particular case.
import numpy as np
import matplotlib.pyplot as plt

N = 5       # number of frequencies (2*N+1 coefficients)
M = 2*N+1   # M >= 2*N+1 samples

# function to be approximated
def fun(x):
    return np.sin(x+np.sqrt(2))+0.3*np.sin(5*x-np.sqrt(3))+0.1*np.sin(8*x-np.sqrt(7))

thetas = np.linspace(0, 2*np.pi, M, endpoint=False)     # sample angles
dthetas = np.linspace(0, 2*np.pi, 1000, endpoint=True)  # dense grid for plotting

# values of the function at sample points
vals = fun(thetas)

# construct the matrix of the system
A = np.zeros((M, 2*N+1))
A[:, 0] = 1
for i in range(0, N):
    A[:, i+1] = np.cos((i+1)*thetas)
    A[:, N+i+1] = np.sin((i+1)*thetas)

# solve the normal equations A^T A c = A^T v to find the coefficients
B = A.T @ A
print(np.shape(B))
coeffs = np.linalg.solve(B, A.T @ vals)

# function computing a trigonometric polynomial with coefficient vector c
def ptrig(x, c):
    res = np.full(np.shape(x), c[0])
    n = len(c)
    m = (n-1)//2
    for i in range(0, m):
        res = res + c[i+1]*np.cos((i+1)*x) + c[i+m+1]*np.sin((i+1)*x)
    return res

vals2 = ptrig(thetas, coeffs)

# plotting the result
plt.figure()
plt.plot(thetas, vals, '.b', label="Sample values")
plt.plot(dthetas, fun(dthetas), 'g', label="Original function")
plt.plot(dthetas, ptrig(dthetas, coeffs), 'r', label="Fitted trigonometric polynomial")
plt.legend()
plt.savefig("TrigPoly.png", dpi=100, bbox_inches='tight')
plt.show()
For the parameters chosen above the program outputs the following result. You can play with the input parameters to observe the changes.
Construct Pythagorean triangles with given constraints
This post will present some ideas related to the generation of all Pythagorean triangles satisfying a certain criterion, from an algorithmic point of view. Of course, there are infinitely many such triangles (integer sided with a right angle). Denoting by $a$ and $b$ the edges adjacent to the right angle and by $c$ the hypotenuse, we have the classical relation
$$a^2 + b^2 = c^2.$$
When $a, b, c$ are all integers, it is possible to give a more precise result regarding their possible values. Indeed, for any such triangle, there exist positive integers $k$ and $m > n$, with $m, n$ coprime and not both odd, such that
$$a = k(m^2 - n^2), \quad b = 2kmn, \quad c = k(m^2 + n^2).$$
These formulas are attributed to Euclid. The case $k = 1$ corresponds to $a, b, c$ coprime and such a triangle is called “primitive”. What is important to note is that this formula generates ALL Pythagorean triangles exactly once, and this allows us to solve the following questions:
1. Generate all Pythagorean triangles with edges at most $N$.
2. Generate all Pythagorean triangles with $a = N$ or $b = N$.
3. Generate all Pythagorean triangles with $c = N$.
Now let’s answer these questions one at a time:
1. In order to generate Pythagorean triangles with edges at most $N$ we need to loop over all $m > n \geq 1$, coprime, not both odd, with $m^2 + n^2 \leq N$, and for each primitive triangle add to the list all its “multiples” by looking at the appropriate values of $k$. One possible algorithm is
- set an empty list $L$
- loop for $m$ from $2$ to $\sqrt{N}$
- loop for $n$ from $1$ to $m-1$, with $m, n$ coprime and not both odd
- loop for $k$ from $1$ to $N/(m^2+n^2)$ and add the triangle $(k(m^2-n^2),\, 2kmn,\, k(m^2+n^2))$ to the list
- in the end $L$ will contain the desired triangles
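The steps above can be sketched as follows (a minimal version for question 1, generating all triangles with hypotenuse, and therefore all edges, at most $N$):

```python
from math import gcd, isqrt

def triangles_up_to(N):
    """All Pythagorean triangles with edges <= N via Euclid's formula;
    each triple is generated exactly once."""
    result = []
    for m in range(2, isqrt(N) + 1):
        for n in range(1, m):
            if (m - n) % 2 == 1 and gcd(m, n) == 1:    # coprime, not both odd
                a, b, c = m*m - n*n, 2*m*n, m*m + n*n  # primitive triple
                for k in range(1, N // c + 1):         # all multiples fitting in N
                    result.append((k*a, k*b, k*c))
    return result
```

Questions 2 and 3 follow the same pattern, with the loops restricted by the divisibility conditions described below.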
2. In this question one of the two legs $a = k(m^2-n^2)$ or $b = 2kmn$ should be equal to $N$. The only difficulty here is that we need to loop on the divisors of $N$. Indeed, we have two cases:
- $a = N$: for all divisors $k$ of $N$, we should find the possible factorizations $N/k = (m-n)(m+n)$ with $m, n$ not both odd and coprime, and then add the corresponding triangle to the list.
- $b = N$: find all factorizations $N = 2kmn$ and check again that the $m > n$ obtained are coprime and not both odd.
3. In this case $c = k(m^2+n^2) = N$. Therefore, one should loop on all divisors $k$ of $N$ and in each case solve
$$m^2 + n^2 = N/k,$$
where $m > n$ are coprime, not both odd. This can be done again with a loop.
These tools might come in handy when working on Project Euler problems, since often, when dealing with integer sided quantities in a triangle, things can be reduced to Pythagorean triangles. When you reach this step, it is enough to loop on these triangles and perform the requested operations.
Sum of floors of multiples of the Golden Ratio
Propose an algorithm for computing
$$S(n) = \sum_{k=1}^{n} \lfloor k\varphi \rfloor$$
for large $n$, where $\varphi = \frac{1 + \sqrt 5}{2}$ is the golden ratio. The sum $S(n)$ is the sum of the first $n$ terms in the so-called lower Wythoff sequence.
Solution: Computing a floor function and a multiplication is not complicated, therefore proposing an $O(n)$ algorithm for computing $S(n)$ is trivial. However, such an algorithm is not viable when $n$ is very large.
The path to a much more efficient algorithm goes through the notion of Beatty sequence. For a positive irrational number $r > 1$ the associated Beatty sequence is $(\lfloor kr \rfloor)_{k \geq 1}$. This notion becomes interesting when noting the following fact. If $s$ is defined by $1/r + 1/s = 1$ then the Beatty sequences $(\lfloor kr \rfloor)$ and $(\lfloor ks \rfloor)$ cover all the positive integers, each integer appearing in exactly one of the two sequences. The latter sequence is called the associated complementary Beatty sequence. Two proofs of this fact can be found in the Wikipedia link above.
It is obvious now that $S(n)$ is just a partial sum of the Beatty sequence associated to $r = \varphi$. Now let's see what the complementary sequence is. A brief computation shows that the associated $s$ is given by $s = \varphi/(\varphi - 1) = \varphi^2 = \varphi + 1$, so $\lfloor k\varphi^2 \rfloor = \lfloor k\varphi \rfloor + k$: the terms which are not of the form $\lfloor k\varphi \rfloor$ are not far from it. In order to see how this can give an efficient algorithm, let's follow the instructions below:
- denote by $N = \lfloor n\varphi \rfloor$ the largest term in $S(n)$.
- looking at all the integers up to $N$, in view of the result regarding the Beatty sequences, they are either of the form $\lfloor k\varphi \rfloor$ or of the form $\lfloor k\varphi^2 \rfloor$ (in the associated complementary sequence).
- denote by $n'$ the largest integer such that $\lfloor n'\varphi^2 \rfloor \leq N$; since exactly $n$ of the integers up to $N$ belong to the first sequence, it is given by $n' = N - n$. Then, it is not difficult to see that $\sum_{k=1}^{n'} \lfloor k\varphi^2 \rfloor$ is the difference between the sum of all positive integers up to $N$ and $S(n)$.
- In the end, using $\lfloor k\varphi^2 \rfloor = \lfloor k\varphi \rfloor + k$, we obtain
$$S(n) = \frac{N(N+1)}{2} - \frac{n'(n'+1)}{2} - S(n').$$
- by convention, state that $S(0) = 0$.
The last item above shows that a recursive algorithm can be implemented in order to compute $S(n)$. Moreover, since the computation of $S(n)$ is reduced to the computation of $S(n')$ where $n' = N - n \approx n/\varphi$, the argument decreases exponentially and the algorithm will have complexity $O(\log n)$. It will work for extremely large values of $n$, provided the language you are using can handle the required precision.
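A sketch of the recursion (the function name is mine); `isqrt(5*n*n)` computes $\lfloor n\sqrt 5 \rfloor$ exactly, so $N = \lfloor n\varphi \rfloor$ is obtained in pure integer arithmetic and the result stays exact for huge $n$:

```python
from math import isqrt

def wythoff_sum(n):
    """S(n) = sum_{k=1}^n floor(k*phi), via the Beatty sequence recursion.
    O(log n) arithmetic operations, exact integer result."""
    if n <= 0:
        return 0
    N = (n + isqrt(5 * n * n)) // 2   # floor(n*phi), computed exactly
    np_ = N - n                       # largest k with floor(k*phi^2) <= N
    return N * (N + 1) // 2 - np_ * (np_ + 1) // 2 - wythoff_sum(np_)
```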
The algorithm for computing $S(n)$ shown above can be used, for example, in computing sums related to elements in the Wythoff table, which can be expressed using Fibonacci numbers and terms in the lower Wythoff sequence.
Other values of $r$ lead to other types of sums for which the same type of algorithm can be applied. It is likely that even cases where $s$ is not explicitly related to $r$ through an integer may be handled through a similar procedure (to be confirmed).
Compute recurrent sequences fast
In many programming questions the computation of a recurrent sequence might be necessary. One of the famous such sequences is the Fibonacci sequence defined by $F_0 = 0$, $F_1 = 1$ and
$$F_{n+1} = F_n + F_{n-1}, \quad n \geq 1.$$
Writing a code which computes $F_n$ for given $n$ is a no-brainer, but the complexity may vary much depending on the implementation. As it turns out, not everyone has the right reflexes when writing such a code. In the following I present some algorithmic variants and discuss their speed and complexity. In the end I'll say a few words on how to compute quickly the $n$-th term of any recurrence relation with fixed coefficients and fixed order.
First, remember that it is possible to have an exact formula:
$$F_n = \alpha \varphi^n + \beta \psi^n,$$
where $\varphi, \psi$ are the solutions of $x^2 = x + 1$ and $\alpha, \beta$ are found from $F_0$ and $F_1$ (this gives Binet's formula $F_n = (\varphi^n - \psi^n)/\sqrt 5$). Note that $\varphi, \psi$ are irrational, so when looking for an exact formula you might need to be careful with rounding for large $n$. This could give you an algorithm using a constant number of arithmetic operations. The formula is explicit, but you need to compute two exponentials so it's not really free. Moreover, this shows that the values of $F_n$ grow exponentially fast, so you'll quickly overflow if you're not working in arbitrary precision. Finally, if you need to compute $F_n$ for really large $n$, this might not be the simplest approach, since you're working with irrationals.
1. Recursion. The simplest way to code the computation of $F_n$ is using a recursive function. Create a function fibonacci which, depending on the value of $n$, returns $n$ if $n \leq 1$ or returns fibonacci($n-1$)+fibonacci($n-2$). I remember coding this when I was a beginner, thinking it was a nice idea… It turns out, however, that this approach has exponential complexity. If you're not convinced, implement this and try to compute $F_n$ for $n$ around $40$ or $50$. You'll quickly see the exponential cost we're talking about.
In order to see why, it is enough to note that when computing $F_n$ the algorithm needs to compute $F_k$ for all $k < n$. Then for each such $k$ one also needs to compute $F_j$ for all $j < k$, etc. The only way this can be efficient is if one stores the values already computed. This is called recursion with memoization, and should always be used when coding recursive tasks where there is a risk of computing the same value multiple times. However, the memoization has an $O(n)$ memory cost.
2. Iterative sequence. The second approach is more efficient. You only need two variables, $a$ and $b$, which store $F_{n-1}$ and $F_n$. Then at each iteration, store $b$ in a temporary variable, change $b$ into $a + b$ and store the old value of $b$ in $a$. It is obvious that this approach costs $O(n)$ in execution time and $O(1)$ in memory. If you need to store all values of the sequence, you will also have an $O(n)$ memory cost. As always, with this you can go quite far in reasonable computing time (how far depends on your hardware and programming language). However, reaching astronomically large indexes is generally out of the question.
3. Matrix exponentiation. One interesting approach, which is really fast, is rewriting the second order recurrence as a first order matrix recurrence:
$$\begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} F_n \\ F_{n-1} \end{pmatrix}.$$
This implies that $\begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix} = A^n \begin{pmatrix} F_1 \\ F_0 \end{pmatrix}$, where $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$. Therefore, in order to compute $F_n$, only a matrix power is needed. Using exponentiation by squaring this can be done in $O(\log n)$ time. At the end you will obtain an (exact) integer result or a result modulo whatever integer you want. So this is the way to go if you need to compute $F_n$ for extremely large $n$.
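A minimal sketch of this approach in exact integer arithmetic (helper names are mine); the powers satisfy $A^n = \begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix}$:

```python
def mat_mult(X, Y):
    """Product of two 2x2 matrices stored as ((a, b), (c, d))."""
    return ((X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]),
            (X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]))

def fib(n):
    """F(n) via exponentiation by squaring: O(log n) matrix multiplications."""
    result = ((1, 0), (0, 1))   # identity matrix
    base = ((1, 1), (1, 0))     # the matrix A of the recurrence
    while n:
        if n & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        n >>= 1
    return result[0][1]         # entry (0, 1) of A^n is F(n)
```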
Please note that you can generalize this to recurrences of the form
$$x_{n+k} = a_{k-1} x_{n+k-1} + \dots + a_1 x_{n+1} + a_0 x_n.$$
It suffices to see that
$$\begin{pmatrix} x_{n+k} \\ x_{n+k-1} \\ \vdots \\ x_{n+1} \end{pmatrix} = \begin{pmatrix} a_{k-1} & a_{k-2} & \cdots & a_0 \\ 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} x_{n+k-1} \\ x_{n+k-2} \\ \vdots \\ x_n \end{pmatrix},$$
so in the end you just need to compute the power of a bigger ($k \times k$) matrix. The complexity stays the same!
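The general case can be sketched the same way (conventions and names are mine): the $n$-th term of a linear recurrence, modulo a given integer, via powers of the companion matrix.

```python
def kth_term(coeffs, init, n, mod):
    """n-th term (0-indexed) of x_{t+k} = coeffs[0]*x_{t+k-1} + ... + coeffs[k-1]*x_t,
    with initial values init = [x_0, ..., x_{k-1}], computed modulo mod."""
    k = len(coeffs)
    if n < k:
        return init[n] % mod

    def mul(X, Y):  # k x k matrix product modulo mod
        return [[sum(X[i][t] * Y[t][j] for t in range(k)) % mod
                 for j in range(k)] for i in range(k)]

    # companion matrix: coefficients on the first row, shifted identity below
    C = [coeffs[:]] + [[1 if j == i else 0 for j in range(k)] for i in range(k - 1)]
    P = [[1 if i == j else 0 for j in range(k)] for i in range(k)]  # identity
    e = n - (k - 1)
    while e:
        if e & 1:
            P = mul(P, C)
        C = mul(C, C)
        e >>= 1
    state = init[::-1]  # the vector (x_{k-1}, ..., x_0)
    return sum(P[0][j] * state[j] for j in range(k)) % mod
```

For the Fibonacci sequence, `kth_term([1, 1], [0, 1], n, M)` gives $F_n \bmod M$.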
What to remember from this post?
- Don’t use recursion (without memoization) when computing recurrent sequences. You won’t get very far and you’ll waste your resources!
- You can compute the $n$-th value of a recurrent sequence with constant coefficients with $O(\log n)$ complexity using matrix exponentiation. This works well even for huge $n$ when working modulo a fixed integer $M$.
Are zero-order optimization methods any good?
The short answer is yes, but only when the derivative or gradient of the objective function is not available. To fix the ideas we refer to:
- optimization algorithm: an iterative process of searching for approximations of a (local/global) minimum of a certain function
- zero-order algorithm: an optimization method which only uses function evaluations in order to decide on the next point in the iterative process.
Therefore, in view of the definitions above, zero-order algorithms try to approximate minimizers of a function using only function evaluations; no further information on derivatives is available. Classical examples are bracketing algorithms and genetic algorithms. The objective here is not to go into detail on any of these algorithms, but to underline one basic limitation which must be taken into account whenever considering these methods.
In a zero-order optimization algorithm any decision regarding the choice of the next iterate can be made only by comparing the values of $f$ at different evaluation points. For example, look at the classical trisection algorithm for minimizing a unimodal function $f$, i.e. a real function defined on $[a, b]$ which is decreasing on $[a, x^*]$ and increasing on $[x^*, b]$:
- given the bracketing interval $[a, b]$, choose the points $x_1 = a + \frac{b-a}{3}$ and $x_2 = b - \frac{b-a}{3}$.
- compare the values $f(x_1)$ and $f(x_2)$ in order to decide on the next bracketing interval:
if $f(x_1) < f(x_2)$ then $[a, b] \leftarrow [a, x_2]$
if $f(x_1) \geq f(x_2)$ then $[a, b] \leftarrow [x_1, b]$
- stop when the difference $b - a$ is small enough.
One question immediately arises: can such an algorithm reach any desired precision; for example, can it reach the machine precision, i.e. the precision to which computations are done in the software used? To fix the ideas, we'll suppose that we are in the familiar world of numbers written in double precision, where the machine precision is $\varepsilon \approx 2.2 \times 10^{-16}$. I will not go into further details regarding this, since people do whole courses on this. Note that Matlab, Octave and Numpy are well known environments in which the default setup uses this machine precision.
More precisely, the real numbers are stored as floating point numbers and only about $16$ significant decimal digits are relevant in the computations. When adding two positive numbers $x > y$ whose ratio $y/x$ is smaller than $\varepsilon$, the result is simply $x$. This is due to the fact that adding the numbers would require us to shift the decimal point to the same position. However, since the ratio is smaller than $\varepsilon$, when shifting the decimal point to the same position, the smaller number contributes zeros to all the significant digits that $x$ can store. Therefore the addition will not change any significant digits of $x$.
This issue related to computations done in floating point precision makes the comparison of $f(x_1)$ and $f(x_2)$ in the algorithm above pointless when $x_1$ and $x_2$ are close to the minimum. In order to identify the source of the potential problem, let us write the Taylor expansion of the function around the minimum $x^*$. First, note that $f'(x^*) = 0$ at the minimum, which leaves us with
$$f(x) = f(x^*) + \frac{f''(x^*)}{2}(x - x^*)^2 + o((x - x^*)^2),$$
where $o((x - x^*)^2)$ denotes higher order terms. You may note that neglecting higher order terms shows that the function looks like a quadratic function near the optimum. This is a good thing to keep in mind, and students should not hold any grudge against professors who illustrate the behavior of optimization algorithms on quadratic functions. As it turns out, any good algorithm needs to behave well for quadratic functions, since every function is eventually quadratic near a minimum.
Coming back to our previous paragraph, it turns out that the computer won't be able to tell the difference between $f(x)$ and $f(x^*)$ if
$$\frac{f''(x^*)}{2}(x - x^*)^2 \leq \varepsilon |f(x^*)|,$$
where $\varepsilon$ is the machine precision. Therefore, if
$$|x - x^*| \leq \sqrt{\varepsilon} \sqrt{\frac{2|f(x^*)|}{f''(x^*)}},$$
the machine won't tell the difference between $f(x)$ and $f(x^*)$. Since the second factor in the above expression is well defined and positive in most cases, it turns out that we will not be able to decide if we are closer than roughly $\sqrt{\varepsilon}$ to the solution based only on the values of the function $f$.
Therefore, when using a zero-order algorithm for a function with $f(x^*) \neq 0$ and $f''(x^*) > 0$, you cannot decide based only on the function values whether you are closer to the optimum than the square root of the machine precision, $\sqrt{\varepsilon} \approx 10^{-8}$. This is quite a limitation, and keeping it in mind should save lots of pointless function evaluations.
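A quick illustration of this barrier in double precision, on a toy function with minimum at $x^* = 1$ and $f(x^*) = 1$:

```python
import sys

def f(x):
    return (x - 1.0) ** 2 + 1.0    # minimum at x* = 1 with f(x*) = 1

eps = sys.float_info.epsilon        # about 2.2e-16
print(f(1.0 + 1e-9) == f(1.0))      # True: 1e-9 < sqrt(eps), values indistinguishable
print(f(1.0 + 1e-7) == f(1.0))      # False: 1e-7 > sqrt(eps) ~ 1.5e-8
```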
When using derivatives or gradients there is no such problem, since we can decide if we are close enough to $x^*$ by comparing the derivative to zero, and this comparison is valid up to the precision to which we can compute $f'$.
In conclusion, use zero-order methods only if gradient based methods are out of reach. Zero-order methods are not only slowly convergent, but may also be unable to achieve the expected precision in the end.
On Heron’s algorithm – approximating the square root
Heron’s algorithm computes an approximation of the square root of a positive real number $a$ using the iteration
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right), \quad x_0 > 0.$$
- Show that the recurrence defined above is Newton's algorithm applied to finding a zero of the function $f(x) = x^2 - a$. Show that the sequence converges to $\sqrt a$ when $x_0$ is close enough to $\sqrt a$. What is the order of convergence of the sequence?

In the following we consider the error measure
$$e_n = \frac{x_n - \sqrt a}{x_n + \sqrt a}.$$
This choice simplifies the computations and is an estimation of half the relative error when $x_n$ is close to $\sqrt a$.
- Show that the sequence $(e_n)$ verifies
$$e_{n+1} = e_n^2.$$
Deduce an explicit formula for $e_n$ in terms of $e_0$. Show that the sequence $(x_n)$ converges to $\sqrt a$ for every choice of the initial condition $x_0 > 0$.
- Show that for every $a > 0$ there exists an integer $k$ such that $4^k a \in [1, 4)$. Write a modified Heron algorithm which reduces the computation of the square root of $a$ to the computation of the square root of a real number in $[1, 4)$.

The choice of the initialization $x_0$ determines $e_0$ and has an influence on the speed of convergence. In the following we restrict ourselves to the case $a \in [1, 4)$ and we look at the choice of the initial condition. We will suppose that the initial condition depends on $a$, and we will consider the following cases: $x_0 = c$, a constant, and $x_0 = \alpha a + \beta$, a polynomial of degree $1$ in $a$. In every case we will try to minimize $\max |e_0(a)|$ for $a \in [1, 4)$.
- Show that if $x_0 = c$ is constant then $\max_{a \in [1,4)} |e_0(a)|$ is minimal for $c = \sqrt 2$. Find the explicit value of the minimum. (Hint: You could represent graphically the function $a \mapsto e_0(a)$.)
- Suppose that $x_0 = \alpha a + \beta$ is a polynomial of degree $1$ in $a$. We want to find the coefficients $\alpha$ and $\beta$ which minimize $M(\alpha, \beta) = \max_{a \in [1,4)} |e_0(a)|$.
(a) Prove that the maximum of $|e_0(a)|$ over $[1, 4]$ is attained either at the endpoints or at an interior critical point of $e_0$.
(b) Show that in order to minimize $M(\alpha, \beta)$ it is necessary that these extremal values of $|e_0|$ are equal (an equioscillation property).
(c) Conclude by finding the coefficients $\alpha$ and $\beta$ which minimize $M(\alpha, \beta)$, and find an exact expression for the minimal value.
- For $a \in [1, 4)$ compare the number of iterations necessary to arrive at a demanded precision for the two initializations studied: $x_0 = \sqrt 2$ and the optimal $x_0 = \alpha a + \beta$. Comment on the interest of using the initialization of degree $1$ compared to the constant initialization.
- Evaluate the number of operations used for the two cases and conclude which of the two approaches is more advantageous.
- Try to find the initialization of second degree which is optimal and perform a similar investigation to see if from the point of view of the number of operations this choice is well adapted or not.
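For experimenting with the questions above, a small sketch of the iteration (the stopping rule and the comparison values are my choices):

```python
def heron(a, x0, tol=1e-15, max_iter=100):
    """Heron iteration x_{n+1} = (x_n + a/x_n)/2; returns (approximation, iterations)."""
    x = x0
    for it in range(1, max_iter + 1):
        x_new = 0.5 * (x + a / x)
        if abs(x_new - x) <= tol * x_new:   # relative stopping criterion
            return x_new, it
        x = x_new
    return x, max_iter

# constant initialization x0 = sqrt(2) versus a deliberately bad one, for a in [1, 4)
root, its_good = heron(3.0, 2 ** 0.5)
_, its_bad = heron(3.0, 100.0)
```

The good initialization reaches full double precision in a handful of iterations, each of which squares the error measure $e_n$.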
p-Laplace equation – Fixed point approach
The $p$-Laplace problem is a generalization of the usual Laplace problem and is defined, for $p \in (1, \infty)$, in the following way:
$$-\Delta_p u = f, \qquad \text{where } \Delta_p u = \operatorname{div}\left(|\nabla u|^{p-2} \nabla u\right),$$
here with homogeneous Dirichlet boundary conditions $u = 0$ on $\partial\Omega$.
We are not going into details regarding the functional spaces corresponding to this problem, since in the end we are interested in showing a way to solve this equation numerically.
Project Euler tips
A few years back I started working on Project Euler problems mainly because it was fun from a mathematical point of view, but also to improve my programming skills. After solving about 120 problems I seem to have hit a wall, because the numbers involved in some of the problems were just too big for my simple brute-force algorithms.
Recently, I decided to try and see if I can do some more of these problems. I cannot say that I've acquired new techniques between 2012 and 2016 concerning the mathematics involved in these problems. My research topics are usually quite different, and my day to day programming routines are more about constructing new stuff which works fast enough than optimizing actual code. Nevertheless, I have some experience coding in Matlab, and I realized that nested loops are to be avoided. Vectorizing the code can speed things up 100-fold.
So the point of Project Euler tasks is making things go well for large numbers. Normally all problems are tested and should run within a minute on a regular machine. This brings us to choosing the right algorithms, the right simplifications and finding the complexity of the algorithms involved.
ICPC 2015 World Final Problem A
This is the solution to Problem A from the 2015 International Collegiate Programming Contest World Finals. The list of problems can be found here. This first problem consists simply of reading the parameters of the function defined below, computing its values at the integers $1, \dots, n$, and then finding the maximum decrease in the function values.
The inputs are the parameters $p, a, b, c, d, n$; the function is
$$f(k) = p\left(\sin(ak + b) + \cos(ck + d) + 2\right),$$
and the values to be considered are $f(1), \dots, f(n)$. Below is a Matlab code which works, at least for the given Sample Inputs. This should be optimized by removing the for loop: it is instant for small $n$, but it should be modified to handle the largest admissible values of $n$.
function res = Prob1_2015(p,a,b,c,d,n)
    dis = 1:n;
    vals = p*(sin(a*dis+b)+cos(c*dis+d)+2);
    mat = zeros(n,1);
    for i = 1:n
        mat(i) = vals(i)-min(vals(i:end));
    end
    res = max(mat)
The sample outputs are:
>> Prob1_2015(42,1,23,4,8,10) 104.855110477394
>> Prob1_2015(100,7,615,998,801,3) 0
>> Prob1_2015(100,432,406,867,60,1000) 399.303812592112
New version which works for large $n$. It suffices to eliminate the computation of the min at each phase of the iteration, by keeping track of the running maximum instead. This solves the problem in about 0.2 seconds for the largest test cases.
function res = Prob1_2015(p,a,b,c,d,n)
    dis = 1:n;
    vals = p*(sin(a*dis+b)+cos(c*dis+d)+2);
    mat = zeros(n,1);
    lastmax = vals(1);
    for i = 1:n
        lastmax = max(lastmax,vals(i));
        mat(i) = lastmax-vals(i);
    end
    res = max(mat)
Identifying edges and boundary points – 2D Mesh – Matlab
A triangulation algorithm often gives as output a list of points and a list of triangles, each triangle being represented by the indexes of the points which form it. Sometimes we need extra information on the structure of the triangulation, like the list of edges or the list of boundary points. I will present below two fast algorithms for doing this.
Finding the list of edges is not hard. The idea is to go through each triangle and extract all edges. The algorithm proposed below creates the adjacency matrix $A$ in the following way: for each triangle $(i, j, k)$ we set the elements $A_{ij}, A_{jk}, A_{ki}$ (and their symmetric correspondents) to be equal to $1$. The edges are then exactly the pairs of indexes corresponding to the nonzero entries of $A$.
In order to find the points on the boundary (in two dimensions), it is enough to look for the edges which are sides of only one triangle. We can do this using the adjacency matrix. Note that if $A$ is the adjacency matrix, then $A^2$ stores the number of paths of length $2$ (two sides) between two points of the triangulation. Any edge which is not on the boundary is shared by two triangles, so its endpoints are joined by two paths of length $2$. If $(i, j, k)$ is a triangle such that the edge $ij$ is on the boundary, then $(A^2)_{ij} = 1$ (there is exactly one path of length $2$, the one going through $k$). We also have $A_{ij} = 1$. Conversely, if $A_{ij} = 1$ and $(A^2)_{ij} = 1$ then $i, j$ are connected and there is a unique path of length $2$ going from $i$ to $j$. Thus, $ij$ is an edge on the boundary. Therefore, we just need to identify the indexes $i$ for which there exists $j$ with $A_{ij} = 1$ and $(A^2)_{ij} = 1$.
Below are two short Matlab codes doing these two algorithms. I guess they are close to being optimal, since only sparse and vectorized operations are used.
%p is the list of points
%T is the list of triangles, ap is the number of points
%this computes the adjacency matrix
A = min(sparse(T(:,1),T(:,2),1,ap,ap)+sparse(T(:,2),T(:,3),1,ap,ap)+sparse(T(:,3),T(:,1),1,ap,ap),1);
A = min(A+A',1);
% this finds the boundary points, whose indexes are stored in Ibord
B = A^2.*A==1;
Ibord = find(sum(B,2)>0);
Distance from a point to an ellipsoid
Here I will discuss the problem of finding the point which realizes the minimal distance from a given point $y$ to an ellipsoid. First I will look at the theoretical properties of this projection, and then I will present a numerical algorithm to find the nearest point.
This problem is interesting in itself, but it also has other applications. Sometimes in optimization problems we need to impose a quadratic constraint, so after each gradient descent step we project the result onto the constraint set, which is an ellipsoid.
Let’s begin with the problem, which can be formulated in a general way like this: we are given $y \in \mathbb{R}^n$ and a positive definite matrix $A$, which gives an ellipsoid $E = \{x \in \mathbb{R}^n : x^T A x = 1\}$. Without loss of generality assume that $A$ is diagonal, with diagonal entries $a_i > 0$ (if not, we diagonalize it using an orthogonal matrix and work in a new coordinate system). The problem becomes
$$\min_{x^T A x = 1} \|x - y\|^2$$
(note that the square is not important here, but it simplifies the computations later).
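In the diagonal case a Lagrange multiplier computation gives $x_i = y_i/(1 + t a_i)$, where the multiplier $t$ solves the scalar equation $\sum_i a_i x_i^2 = 1$. Below is a sketch (assuming $a_i > 0$ and $y$ strictly outside the ellipsoid, where the equation has a unique root $t \geq 0$) solving it by bisection:

```python
def project_to_ellipsoid(a, y, iters=200):
    """Nearest point to y on the ellipsoid {x : sum_i a_i x_i^2 = 1}, diagonal case.
    Assumes a_i > 0 and y strictly outside the ellipsoid."""
    def g(t):  # g is decreasing for t >= 0, with g(0) > 0 when y is outside
        return sum(ai * (yi / (1 + t * ai)) ** 2 for ai, yi in zip(a, y)) - 1
    lo, hi = 0.0, 1.0
    while g(hi) > 0:          # double until a sign change is bracketed
        hi *= 2
    for _ in range(iters):    # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return [yi / (1 + t * ai) for ai, yi in zip(a, y)]
```

For the unit circle ($a = (1, 1)$) and $y = (2, 0)$ this returns approximately $(1, 0)$, as expected.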