## A rational harmonic series

Can the rational numbers in the interval be enumerated as a sequence in such a way that the series is convergent?

*source:* 17th University of Michigan Undergraduate Mathematics Competition, 2000; see also *Bicycle or Unicycle?*, Problem 63

## Ten bottles of wine – Python code

In a previous post I presented a nice geometry problem: if bottles are stacked in a box so that the extremities of the bottom layer touch the sides of the box, and each bottle in the next rows rests on two bottles from the row below, then the top bottle ends up centered in the box.

Below you will find a Python code which allows you to produce pictures of stacks of bottles in a box for an arbitrary number of rows (you may need to modify the gap values in the code for more than 15 rows).

The code for drawing the bottles goes like this:

```
import numpy as np
import matplotlib.pyplot as plt

N = 15  # number of rows
r = 1   # bottle radius
# gaps between consecutive bottom bottles (add values here for N > 15)
skips = r*np.array([0,1.6,0.4,1.2,0.3,0.6,0.7,0.8,0.9,1,0.4,0.4,0.6,0.7,0.9])
skips = skips[0:N]
w = N*2*r+np.sum(skips)   # total width of the box
skips = np.cumsum(skips)  # cumulative gaps
plotw = 0.05*w            # space for plot borders

def drawing(N,r,skips):
    lin = np.linspace(0,N,N+1)[0:-1]
    xs = 2*r*lin+skips+r  # centers of the bottom bottles
    ys = np.zeros(N)
    plt.figure()
    plt.plot(xs,ys,'.r')
    # draw the bottom row of bottles
    for i in range(0,N):
        circle1 = plt.Circle((xs[i],ys[i]),r,color='b',fill=False)
        plt.gca().add_artist(circle1)
    # compute the position of the bottles from the second row up
    for i in range(1,N):
        newxs = np.zeros(N-i)
        newys = np.zeros(N-i)
        for j in range(0,N-i):
            seg = [xs[j+1]-xs[j],ys[j+1]-ys[j]]
            l = np.linalg.norm(seg)/2      # half distance between neighboring centers
            nor = np.array([-seg[1],seg[0]])
            nor = nor/np.linalg.norm(nor)  # unit normal to the segment
            mid = 0.5*np.array([xs[j+1]+xs[j],ys[j+1]+ys[j]])
            nl = np.sqrt(4*r**2-l**2)      # distance from the midpoint to the new center
            newxs[j] = mid[0]+nl*nor[0]
            newys[j] = mid[1]+nl*nor[1]
            # draw the bottle
            circle1 = plt.Circle((newxs[j],newys[j]),r,color='b',fill=False)
            plt.gca().add_artist(circle1)
        plt.plot(newxs,newys,'.r')
        xs = newxs
        ys = newys
    print("Last x coordinate: ",newxs[0]," Midpoint: ",w/2)
    # draw the box
    botx = np.array([0,w])
    boty = -r*np.ones(2)
    plt.plot(botx,boty,color='k')
    leftx = np.zeros(2)
    lefty = np.array([-r,newys[0]+r])
    plt.plot(leftx,lefty,color='k')
    rightx = w*np.ones(2)
    righty = np.array([-r,newys[0]+r])
    plt.plot(rightx,righty,color='k')
    # some plot settings
    plt.axis("scaled")
    plt.axis("off")
    plt.xlim([-plotw,w+plotw])
    plt.ylim([-r-plotw,newys[0]+r+plotw])
    plt.savefig("Bottles.png",bbox_inches='tight',pad_inches=-0.1,dpi=300)
    plt.show()

drawing(N,r,skips)
```

Here is an output obtained using this code:

The code for drawing the segments between the centers, which illustrate the rhombic pattern and indicate why the top bottle should be centered, goes like this:

```
def drawingRhombi(N,r,skips):
    lin = np.linspace(0,N,N+1)[0:-1]
    xs = 2*r*lin+skips+r  # centers of the bottom bottles
    ys = np.zeros(N)
    plt.figure()
    plt.plot(xs,ys,'.r',zorder=2)
    # segments joining consecutive bottom centers
    for i in range(0,N-1):
        plt.plot(xs[i:i+2],ys[i:i+2],color='k',zorder=1)
    for i in range(1,N):
        newxs = np.zeros(N-i)
        newys = np.zeros(N-i)
        for j in range(0,N-i):
            seg = [xs[j+1]-xs[j],ys[j+1]-ys[j]]
            l = np.linalg.norm(seg)/2
            nor = np.array([-seg[1],seg[0]])
            nor = nor/np.linalg.norm(nor)
            mid = 0.5*np.array([xs[j+1]+xs[j],ys[j+1]+ys[j]])
            nl = np.sqrt(4*r**2-l**2)
            newxs[j] = mid[0]+nl*nor[0]
            newys[j] = mid[1]+nl*nor[1]
            # segments between the centers, together with their mirror images
            plt.plot([xs[j],newxs[j]],[ys[j],newys[j]],color='k',zorder=1)
            plt.plot([xs[j+1],newxs[j]],[ys[j+1],newys[j]],color='k',zorder=1)
            plt.plot([xs[j],newxs[j]],[-ys[j],-newys[j]],color='k',zorder=1)
            plt.plot([xs[j+1],newxs[j]],[-ys[j+1],-newys[j]],color='k',zorder=1)
        plt.plot(newxs,newys,'.r',zorder=2)
        plt.plot(newxs,-newys,'.r',zorder=2)
        xs = newxs
        ys = newys
    print("Last x coordinate: ",newxs[0]," Midpoint: ",w/2)
    plt.axis("scaled")
    plt.axis("off")
    plt.xlim([r-plotw,w-r+plotw])
    plt.ylim([-newys[0]-plotw,newys[0]+plotw])
    plt.savefig("Rhombi.png",bbox_inches='tight',pad_inches=-0.1,dpi=300)
    plt.show()

drawingRhombi(N,r,skips)
```

Here is an output:

## Ten Bottles of Wine

Ten identical bottles of wine are stacked up in a bin as shown in the Figure below. Show that the top bottle is centered between the sides of the bin.

*Bicycle or Unicycle?*, D. Velleman, S. Wagon, Problem 3

One of the nice benefits of being a part of Mathematical Reviews is being able to use some points in order to buy books published by the American Mathematical Society. The book referenced above is very nice and it contains many puzzle-style problems with detailed solutions and nice pictures. The problem at hand is surely intriguing, since the hypotheses are very simple and the conclusion is beautiful.

I am not going to try and write a complete solution here. I will just give you a hint and show you a picture which should help you get to the solution yourself. I wrote a Python code which can draw a variety of configurations; notably, it can generate such pictures for more than four layers of bottles. The conclusion is still the same: the top bottle is centered, and the proof can be obtained following the same lines.

And here is an example for 10 layers of bottles!

## Continuous functions on the 2-sphere

The structure of a continuous function defined on the circle with real values is quite intuitive. No one is surprised by the following statement:

A continuous function $f : S^1 \to \Bbb{R}$ satisfies the following property: there exists $x \in S^1$ such that $f(x) = f(-x)$.

For a proof one simply should take the function $g(x) = f(x) - f(-x)$ and notice that $g(-x) = -g(x)$. Therefore, going from $x$ to $-x$ on the circle, the function $g$ must pass through zero, which implies that $f$ takes the same value at two antipodal points. The following theorems, involving the two dimensional sphere $S^2$, are less intuitive and it is not immediate to give a proof. Nevertheless, they have some cool applications.
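The one dimensional argument above is constructive, so it can be checked numerically. Below is a minimal Python sketch (the test function is an arbitrary choice of mine, not from the original argument) which locates an antipodal pair with equal values by bisection on $g(t) = f(t) - f(t+\pi)$:

```python
import math

def antipodal_point(f, tol=1e-12):
    # g changes sign between t = 0 and t = pi, since g(t + pi) = -g(t)
    g = lambda t: f(t) - f(t + math.pi)
    a, b = 0.0, math.pi
    while b - a > tol:
        m = 0.5*(a + b)
        # keep the half interval on which g still changes sign
        if g(a)*g(m) <= 0:
            b = m
        else:
            a = m
    return 0.5*(a + b)

# an arbitrary continuous function on the circle, parametrized by the angle t
f = lambda t: math.sin(t) + 0.5*math.cos(t) + 0.2*math.sin(2*t)
t0 = antipodal_point(f)
print(abs(f(t0) - f(t0 + math.pi)))  # essentially zero
```

Bisection mirrors the intermediate value argument exactly: it only uses the sign change of $g$, which is guaranteed by $g(t+\pi) = -g(t)$.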

**Theorem 1. (Borsuk-Ulam)** Let $f : S^2 \to \Bbb{R}^2$ be a continuous function. Then there exists $x \in S^2$ such that $f(x) = f(-x)$.

Note that the proof strategy from the one dimensional case does not work. The two coordinates of $g(x) = f(x) - f(-x)$ should vanish simultaneously, and it is not straightforward to see why this should happen. Two other similar results are shown below.

**Theorem 2. (Kakutani)** Let $f : S^2 \to \Bbb{R}$ be a continuous function. Then there exist three points $x_1, x_2, x_3 \in S^2$ such that $Ox_1, Ox_2, Ox_3$ are mutually orthogonal and $f(x_1) = f(x_2) = f(x_3)$.

**Theorem 3. (Dyson)** Let $f : S^2 \to \Bbb{R}$ be a continuous function. Then there exist four points $x_1, x_2, x_3, x_4 \in S^2$, vertices of a square centered at the origin, such that $f(x_1) = f(x_2) = f(x_3) = f(x_4)$.

We will not try to prove any of these results, since the techniques involved are not always elementary. However, here is a short list of cool consequences.

**Consequence 1.** At any moment in time there exists a place on Earth where the temperature and the atmospheric pressure are the same as at the antipodal point. It is enough to apply the Borsuk-Ulam theorem to the function which associates to each point on the surface of the Earth the pair (temperature, pressure), assuming both vary continuously.

**Consequence 2.** Let $K$ be a convex three dimensional body. Then there exists a cube circumscribed about $K$ (i.e. every face of the cube touches $K$).

For each direction $x \in S^2$ it is possible to associate the distance between the two tangent planes to $K$ orthogonal to $x$. This is a continuous function on the sphere. Applying Kakutani's result we find that there are three mutually orthogonal directions for which the distances between the corresponding pairs of tangent planes are the same. This gives the desired cube circumscribed about $K$. It is possible to generalize this to an arbitrary parallelepiped by using some other variants of Kakutani's result.

**Consequence 3.** Let $K$ be a three dimensional body which is star-shaped with respect to the origin. Then there exist four points on its boundary which are vertices of a square.

In order to prove this, consider the function $f(x)$ which gives the distance from the origin to the boundary of $K$ in the direction $x \in S^2$. This is a continuous function, and Dyson's theorem says that there are four points, vertices of a square centered at the origin, on which this function takes the same value. The corresponding boundary points are the vertices of the desired square.

## Shapes with minimal area circumscribed around all bodies of constant width

In a previous post I talked about a proof of the Blaschke-Lebesgue theorem (that the Reuleaux triangle is the two dimensional constant width shape with the minimal area). The proof was quite surprising, as it gave a lower bound that was exactly attained for the Reuleaux triangle.

Looking closely at the proof one can deduce the following interesting result:

Among all convex, centrally symmetric domains containing all shapes of a given constant width $w$, the regular hexagon of width $w$ has the smallest area.

To prove this result it is enough to recall the definition of the mixed area $A(K_1, K_2)$ of two convex shapes $K_1, K_2$ in the plane, given by

$|K_1+K_2| = |K_1| + 2A(K_1, K_2) + |K_2|.$

Note that $A(\cdot,\cdot)$ is monotone in each one of its arguments.

Then, given a constant width shape $B$ of width $1$, the shape $B+(-B)$ is the unit disk, of area $\pi$. Moreover, there exists a hexagon of width one circumscribed about $B$. This gives

This gives , and the equality is attained for the Reuleaux triangle.

Suppose now that there exists another shape which contains all shapes of constant width equal to and . Repeating the above argument we would also get

However, plugging the Reuleaux triangle in place of in the above inequality gives a contradiction.

Therefore, among all shapes containing copies of all constant width domains in dimension two, the regular hexagon has the minimal area.

The paper *Sets of constant width* by G.D. Chakerian gives more details about the proof shown above, as well as some ideas about the three dimensional case, which is still an open problem!

## Minimal squared sum minus sum of squares

a) Let be real numbers such that . Then

b) Prove that if then

*Solution:* a) Denote by two sums the sum of the positive terms and the sum of the negative terms among the given numbers:

It is obvious that (it is enough to expand the sums). Moreover, the equality is attained here only if and contain at most one non-zero term.

Therefore, if the original inequality would be proved. This is equivalent to . But , which directly implies .

Let us now discuss the equality case. Equality holds if and only if . Since it follows that . Moreover, equality should also hold in

This means that all but one of the elements in each of the two groups are zero, which shows that the equality case is attained for .

b) It is enough to apply a) for noting that

*Source:* The solution here was inspired by the following MathOverflow answer: link.

## Visualizing isosurfaces – Python and Mayavi

Recently I needed to plot some isosurfaces for some 3D unstructured tetrahedral meshes. I had the tetrahedral mesh of a 3D set and a P1 finite element function defined via its values at the nodes of this mesh. The objective was to visualize an isosurface of the finite element function.

Although this is quite a standard task, it was difficult for me to find the proper solution. Matplotlib did not allow me to interact with the 3D plots, so I decided to look into Mayavi. There were a lot of answers on the web concerning isosurfaces on structured meshes (obtained with meshgrids, like in Matlab), but that was not what I wanted. There were even some answers where the data from the unstructured mesh was interpolated on a structured mesh and then the isosurface was computed. This is surely unnecessary, since the fact that my mesh was made of tetrahedra makes it easy to see where the isosurface lies: just look for tetrahedra in which the values at the vertices are on different sides of the isosurface value. Fortunately, I did not give up and managed to find out how to do this in Mayavi. I found an example in some obscure place in the online documentation which did exactly that, and I present the code below.

I also had a mesh in the .mesh format (output from FreeFEM) which I needed to import into python. For this I used the “meshio” package.

```
import meshio
import numpy as np
from mayavi import mlab
from tvtk.api import tvtk

# read the mesh (FreeFEM .mesh format) using meshio
mesh = meshio.read("CurrentMesh3D.mesh")
mdet = mesh.cells
tetra = mdet[0].data  # tetrahedra
tri = mdet[1].data    # boundary triangles
pts = mesh.points
x = pts[:,0]
y = pts[:,1]
z = pts[:,2]
# values of the P1 function at the nodes
v = np.loadtxt("OnePhase3D.data")

fig = mlab.figure(bgcolor=(1,1,1),size=(400, 350))
# build a tvtk unstructured grid from the points and tetrahedra
tet_type = tvtk.Tetra().cell_type
ug = tvtk.UnstructuredGrid(points=pts)
ug.point_data.scalars = v
ug.point_data.scalars.name = "value"
ug.set_cells(tet_type, tetra)
# extract the isosurface corresponding to the value 0.5
ds = mlab.pipeline.add_dataset(ug)
iso = mlab.pipeline.iso_surface(ds,contours=[0.5,])
iso.actor.property.opacity = 0.7
iso.contour.number_of_contours = 1
# plot the boundary of the mesh, slightly transparent
l = mlab.triangular_mesh(x,y,z,tri,color=(0.3,0.3,0.3),
opacity=0.4)
mlab.show()
```

As can be seen in the code above, the isosurface is set with the iso_surface command from mlab.pipeline. Also, it is possible to plot the triangular mesh directly with mlab.triangular_mesh. One result obtained with this code is plotted below.

Of course, the above example can be adapted to work with an unstructured mesh that you already have in Python, as long as you have the tetrahedra and the values at the vertices. You will also need the triangles making up the boundary of the mesh in order to plot it. You can change the value of the isosurface, the number of isosurfaces, etc. by modifying the appropriate parts of the code.

What I found frustrating is that even though this kind of solution is already implemented, it is not easy to identify in the Mayavi documentation.

Of course, it is also possible to do all of this in ParaView, but I did not manage to find enough examples of ParaView scripts to learn how to do it. And doing it by hand in ParaView was no fun, since I had to produce many such pictures, and setting them all up to look the same by hand was just too much. (This is again an example of things that are out there, but which are only available to those who already know them... Scripting in ParaView seems powerful, but in my case, if I don't see a large palette of examples I'm unable to learn how to use it.)

So hopefully, you'll find this reference useful if you want to plot an isosurface of a P1 function on an unstructured mesh in Mayavi!

**Challenge:** Try to find the place in the Mayavi documentation where a similar example is shown... I did not manage to find the link to put it here, even though I saw it two days ago. Search engines did not help: searching directly will only give you examples on meshgrids. I hope I'm wrong and you find it quickly; if you do, please post the link in the comment section! 🙂

## IMO 2020 – Problem 2

**Problem 2.** The real numbers $a, b, c, d$ are such that $a \geq b \geq c \geq d > 0$ and $a+b+c+d = 1$. Prove that

$(a+2b+3c+4d)\, a^a b^b c^c d^d < 1.$

*Hints:* This problem is not difficult if one knows a bit of real analysis. The function $x \mapsto x^x$ is convex and can be extended continuously to $x = 0$ noting that $\lim_{x \to 0^+} x^x = 1$. Therefore, the function to be estimated can be extended continuously up to the boundary of the domain and it attains its minimum and maximum values. Moreover, the constraint defines a compact set, so there exist a minimum and a maximum for this function on the given domain. If at the maximum some of the variables were $0$, the restriction would make the inequality in the hypothesis sharp, but strict.

Having no other idea than to go ahead with the analysis, a natural idea is to see what happens when perturbing two of the variables while maintaining the constraint. First note that the case where all variables are equal gives a value which is obviously less than $1$. Taking the first variable close to $1$ gives values as close to $1$ as we want, so the maximum is surely not attained when all variables are equal. So we may well assume that at the maximum there exist two variables which are different.

Then, change these two variables by $+t$ and $-t$ respectively, which preserves the constraint. Consider the function

The derivative of this function is

It is not difficult to see that the derivative above is increasing with respect to $t$. Therefore, the function is convex and its maxima are attained when the two perturbed variables are as far apart as possible: an interior configuration with two different variables cannot be a maximum. A similar argument works when perturbing other pairs of variables. Therefore the maximal value is attained in the limit when $a = 1$ and $b = c = d = 0$, and this maximal value is $1$. Since the hypothesis clearly states $d > 0$, it follows that the upper bound is indeed $1$ but is not attained.

## IMO 2020 – Problem 1

**Problem 1.** Consider the convex quadrilateral $ABCD$. The point $P$ is in the interior of $ABCD$. The following ratio equalities hold:

$\angle PAD : \angle PBA : \angle DPA = 1:2:3 = \angle CBP : \angle BAP : \angle BPC.$

Prove that the following three lines meet in a point: the internal bisectors of angles $\angle ADP$ and $\angle PCB$ and the perpendicular bisector of segment $AB$.

*Solution:* The key to solving this problem is to understand that what is important is not the quadrilateral , but the triangle . Also, note the interesting structure of the triangles and which have one angle which is three times the other. Also, note that the angle is half the angle . Therefore, it is possible to build a cyclic quadrilateral here, by considering the point on such that . (make a figure to understand the notations).

Why is this construction useful for our problem? If you look now in the triangle you see that the angles and are equal. Therefore, the angle bisector of is the perpendicular bisector of , which goes through the circumcenter of the cyclic quadrilateral . Note that this is exactly the circumcenter of .

Now, repeating the same argument in the triangle , we can see that the two angle bisectors in the problem all go through the circumcenter of the triangle , which obviously lies on the perpendicular bisector of .

As a conclusion, what helped solve this problem was understanding the structure of the triangles with one angle three times the other, and transforming the angle bisector into something more useful, like a perpendicular bisector which goes through a circumcenter.

## IMO 2020 Problems

**Problem 1.** Consider the convex quadrilateral $ABCD$. The point $P$ is in the interior of $ABCD$. The following ratio equalities hold:

$\angle PAD : \angle PBA : \angle DPA = 1:2:3 = \angle CBP : \angle BAP : \angle BPC.$

Prove that the following three lines meet in a point: the internal bisectors of angles $\angle ADP$ and $\angle PCB$ and the perpendicular bisector of segment $AB$.

**Problem 2.** The real numbers $a, b, c, d$ are such that $a \geq b \geq c \geq d > 0$ and $a+b+c+d = 1$. Prove that

$(a+2b+3c+4d)\, a^a b^b c^c d^d < 1.$

**Problem 3.** There are $4n$ pebbles of weights $1, 2, 3, \ldots, 4n$. Each pebble is coloured in one of $n$ colours and there are four pebbles of each colour. Show that we can arrange the pebbles into two piles so that the following two conditions are both satisfied:

- The total weights of both piles are the same.
- Each pile contains two pebbles of each colour.

**Problem 4.** There is an integer $n > 1$. There are $n^2$ stations on a slope of a mountain, all at different altitudes. Each of two cable car companies, $A$ and $B$, operates $k$ cable cars; each cable car provides a transfer from one of the stations to a higher one (with no intermediate stops). The $k$ cable cars of $A$ have $k$ different starting points and $k$ different finishing points, and a cable car which starts higher also finishes higher. The same conditions hold for $B$. We say that two stations are *linked* by a company if one can start from the lower station and reach the higher one by using one or more cars of that company (no other movements between stations are allowed). Determine the smallest positive integer $k$ for which one can guarantee that there are two stations that are linked by both companies.

**Problem 5.** A deck of $n > 1$ cards is given. A positive integer is written on each card. The deck has the property that the arithmetic mean of the numbers on each pair of cards is also the geometric mean of the numbers on some collection of one or more cards.

For which $n$ does it follow that the numbers on all cards are all equal?

**Problem 6.** Prove that there exists a positive constant $c$ such that the following statement is true:

Consider an integer $n > 1$, and a set $S$ of $n$ points in the plane such that the distance between any two different points in $S$ is at least $1$. It follows that there is a line $\ell$ separating $S$ such that the distance from any point of $S$ to $\ell$ is at least $cn^{-1/3}$.

(A line $\ell$ *separates* a set of points $S$ if some segment joining two points in $S$ crosses $\ell$.)

*Note.* Weaker results with $cn^{-1/3}$ replaced by $cn^{-\alpha}$ may be awarded points depending on the value of the constant $\alpha$.

Source: imo-official.org

## IMC 2020 – Day 1 – Hints for Problem 4

**Problem 4.** A polynomial $p$ with real coefficients satisfies the equation $p(x+1)-p(x) = x^{100}$ for all $x \in \Bbb{R}$. Prove that $p(1-t) \geq p(t)$ for $0 \leq t \leq 1/2$.

(proposed by Daniil Klyuev, St. Petersburg State University) source: http://www.imc-math.org

*Solution:* This problem is related to the Bernoulli polynomials, defined by $B_0(x) = 1$ and

$B_n'(x) = nB_{n-1}(x), \qquad \int_0^1 B_n(x)\, dx = 0.$

It can be seen that the polynomial in the problem verifies the same relation as $B_{101}(x)/101$, and the integral condition only affects the free term. Note that the condition to be proved is invariant up to a translation and multiplication by a constant. Therefore, it is enough to prove that the Bernoulli polynomial verifies the required inequality. In order to solve the problem try to tackle the following ideas regarding Bernoulli polynomials.

1. Prove that they are uniquely defined by the above condition.

2. Compute the first few polynomials (or search the web for a list). Find the dominating coefficient of .

3. Show that for .

4. Consider and compute . Deduce that .

5. Prove that . Deduce that for we have

6. Use an inductive argument and points 3, 4, 5 above to see what the graphs of look like on . This should get you the result.

**Bonus questions:**

7. Show that .

8. Show that for the polynomial does not vanish on and that vanishes exactly once. (this should be clear when investigating the variations of , and noting that each of these polynomials, when differentiated, has the same sign as the previous one).

I must say that this problem seems a bit artificial, and people who have seen Bernoulli polynomials at least once in their life should have had no problem solving it. However, I find that if the right observations and questions are asked, the path to the solution can be illuminated:

- Is the polynomial unique? If not, what is the kernel of the map $p \mapsto p(x+1)-p(x)$? The inequality to be proved should be invariant when adding some polynomial in the kernel, and this turns out to be true.
- The fact that the exponent is equal to $100$ is a bit odd. The first question should be: what happens when we replace this exponent with a smaller one?
- Start with small, see if there is a pattern and then use an inductive argument.

## IMC 2020 Day 1 – Some Hints for Problems 1-2

**Problem 1.** Let $n$ be a positive integer. Compute the number of words $w$ (finite sequences of letters) that satisfy all the following requirements: (1) $w$ consists of $n$ letters, all of them from the alphabet $\{a, b, c, d\}$; (2) $w$ contains an even number of letters $a$; (3) $w$ contains an even number of letters $b$. (For example, for $n = 2$ there are $6$ such words: $aa$, $bb$, $cc$, $dd$, $cd$ and $dc$.)

(proposed by Armend Sh. Shabani, University of Prishtina)

**Hint:** In order to get a formula for the total number of words it is enough to note that an even number $2i$ of $a$'s can be distributed in $\binom{n}{2i}$ ways among the $n$ possible positions, and an even number $2j$ of $b$'s can be distributed in $\binom{n-2i}{2j}$ ways among the remaining positions. Once the $a$'s and $b$'s are there, the rest can be filled with $c$'s and $d$'s in $2^{n-2i-2j}$ ways. This gives

$\sum_{i, j \geq 0} \binom{n}{2i}\binom{n-2i}{2j} 2^{n-2i-2j}.$

Next, use some tricks regarding expansions of $(1 \pm x)^n$ to compress the sum above to $4^{n-1} + 2^{n-1}$.
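As a sanity check (my own addition, not part of the official solution), the closed form $4^{n-1}+2^{n-1}$ can be compared against a brute force count over all words for small $n$:

```python
from itertools import product

def count_words(n):
    # brute force: enumerate all 4^n words over {a,b,c,d} and keep those
    # with an even number of a's and an even number of b's
    return sum(1 for w in product("abcd", repeat=n)
               if w.count("a") % 2 == 0 and w.count("b") % 2 == 0)

for n in range(1, 9):
    assert count_words(n) == 4**(n-1) + 2**(n-1)
print(count_words(2))  # 6, matching the example in the statement
```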

**Problem 2.** Let $A$ and $B$ be $n \times n$ real matrices such that

$\mathrm{rank}(AB - BA - I_n) = 1,$

where $I_n$ is the $n \times n$ identity matrix. Prove that

$\mathrm{tr}(ABAB) - \mathrm{tr}(A^2B^2) = \frac{1}{2}n(n-1).$

(proposed by Rustam Turdibaev, V.I. Romanovskiy Institute of Mathematics)

**Hint:** Every time you hear about a rank 1 matrix you should think of "column matrix times line matrix". Indeed, writing $AB - BA - I_n = uv^T$ with column vectors $u, v$ gives $AB - BA = I_n + uv^T$. Moreover, taking the trace in the above equality and using the fact that you can perform circular permutations in products inside traces, you obtain that $v^Tu = \mathrm{tr}(uv^T) = \mathrm{tr}(AB-BA) - n = -n$.

Moreover, squaring $AB - BA = I_n + uv^T$ gives

$(AB-BA)^2 = I_n + 2uv^T + u(v^Tu)v^T = I_n + (2-n)uv^T.$

Taking the trace above gives $\mathrm{tr}\left((AB-BA)^2\right) = n + (2-n)(-n) = n^2-n$; expanding the left hand side as $2\,\mathrm{tr}(ABAB) - 2\,\mathrm{tr}(A^2B^2)$ gives the desired result!

## IMC 2020 Online – Problems – Day 1

**Problem 1.** Let $n$ be a positive integer. Compute the number of words $w$ (finite sequences of letters) that satisfy all the following requirements: (1) $w$ consists of $n$ letters, all of them from the alphabet $\{a, b, c, d\}$; (2) $w$ contains an even number of letters $a$; (3) $w$ contains an even number of letters $b$. (For example, for $n = 2$ there are $6$ such words: $aa$, $bb$, $cc$, $dd$, $cd$ and $dc$.)

(proposed by Armend Sh. Shabani, University of Prishtina)

**Problem 2.** Let $A$ and $B$ be $n \times n$ real matrices such that

$\mathrm{rank}(AB - BA - I_n) = 1,$

where $I_n$ is the $n \times n$ identity matrix. Prove that

$\mathrm{tr}(ABAB) - \mathrm{tr}(A^2B^2) = \frac{1}{2}n(n-1).$

(proposed by Rustam Turdibaev, V.I. Romanovskiy Institute of Mathematics)

**Problem 3.** Let $d \geq 2$ be an integer. Prove that there exists a constant $C(d)$ such that the following holds: For any convex polytope $K \subset \Bbb{R}^d$, which is symmetric about the origin, and any $\varepsilon \in (0, 1)$, there exists a convex polytope $L \subset \Bbb{R}^d$ with at most $C(d)\varepsilon^{1-d}$ vertices such that

$(1-\varepsilon)K \subseteq L \subseteq K.$

(For a real $\alpha$, a set $T \subset \Bbb{R}^d$ with nonempty interior is a *convex polytope with at most $\alpha$ vertices* if $T$ is a convex hull of a set of at most $\alpha$ points.)

(proposed by Fedor Petrov, St. Petersburg State University)

**Problem 4.** A polynomial $p$ with real coefficients satisfies the equation $p(x+1)-p(x) = x^{100}$ for all $x \in \Bbb{R}$. Prove that $p(1-t) \geq p(t)$ for $0 \leq t \leq 1/2$.

(proposed by Daniil Klyuev, St. Petersburg State University)

source: http://www.imc-math.org

## Construct Pythagorean triangles with given constraints

This post will present some ideas related to the generation of all Pythagorean triangles satisfying a certain criterion, from an algorithmic point of view. Of course, there are infinitely many such triangles (integer sided with a right angle). Denoting by $a$ and $b$ the edges adjacent to the right angle and by $c$ the hypotenuse, we have the classical relation

$a^2+b^2 = c^2.$

When $a, b, c$ are all integers, it is possible to give a more precise result regarding their possible values. Indeed, for any such triangle, there exist positive integers $k, m, n$ with $m > n$, with $m, n$ not both odd and coprime, such that

$a = k(m^2-n^2), \quad b = 2kmn, \quad c = k(m^2+n^2).$

These formulas are attributed to Euclid. The case $k = 1$ corresponds to $a, b, c$ coprime and such a triangle is called "primitive". What is important to note is that this formula generates ALL Pythagorean triangles exactly once, and this allows us to solve the following three questions:

1. Generate all Pythagorean triangles with edges at most $N$.

2. Generate all Pythagorean triangles with $a = N$ or $b = N$.

3. Generate all Pythagorean triangles with hypotenuse $c = N$.

Now let’s answer these questions one at a time:

1. In order to generate Pythagorean triangles with edges at most $N$ we need to loop over all $m > n \geq 1$, coprime, not both odd, with $m^2+n^2 \leq N$, and for each primitive triangle add to the list all its "multiples" by looking at the appropriate values of $k$. One possible algorithm is

- set up an empty list $L$
- loop for $m$ from $2$ to $\lfloor\sqrt{N}\rfloor$
- loop for $n$ from $1$ to $m-1$, with $m, n$ coprime and not both odd
- loop for $k$ from $1$ to $\lfloor N/(m^2+n^2)\rfloor$ and add the triangle $(k(m^2-n^2), 2kmn, k(m^2+n^2))$ to the list

- in the end $L$ will contain the desired triangles
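The algorithm above can be sketched in Python as follows (the function name is my own, and I take the bound on the hypotenuse; since $c$ is the largest side, $c \leq N$ bounds all three edges):

```python
from math import gcd, isqrt

def pythagorean_triples(N):
    """All Pythagorean triples (a, b, c) with a <= b and c <= N, generated
    from Euclid's formula a = k(m^2 - n^2), b = 2kmn, c = k(m^2 + n^2),
    with m > n >= 1 coprime and not both odd."""
    triples = []
    for m in range(2, isqrt(N) + 1):
        for n in range(1, m):
            # "not both odd" is equivalent to m - n being odd here
            if (m - n) % 2 == 1 and gcd(m, n) == 1:
                a, b, c = m*m - n*n, 2*m*n, m*m + n*n
                # add all multiples of the primitive triple with k*c <= N
                for k in range(1, N // c + 1):
                    triples.append(tuple(sorted((k*a, k*b))) + (k*c,))
    return triples

ts = pythagorean_triples(25)
print(sorted(ts))  # includes (3, 4, 5), (5, 12, 13), (7, 24, 25), ...
```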

2. In this question one of the two legs $a$ or $b$ should be equal to $N$. The only difficulty here is that we need to loop over the divisors of $N$. Indeed, we have two cases:

- $2kmn = N$: for all divisors $k$ of $N$, we should find the possible factorizations $N/(2k) = mn$ with $m, n$ not both odd and coprime, and then add the corresponding triangle to the list.
- $k(m^2-n^2) = N$: for all divisors $k$ of $N$, find all factorizations $N/k = (m-n)(m+n)$ and check again that the obtained $m, n$ are coprime and not both odd.

3. In this case $k(m^2+n^2) = N$. Therefore, one should loop over all divisors $d$ of $N$ and in each case solve

$m^2+n^2 = d,$

where $m, n$ are coprime, not both odd. This can be done again with a loop.

These tools might come in handy when working on Project Euler problems, since often, when dealing with integer sided quantities in a triangle, things can be reduced to Pythagorean triangles. When you reach this step, it is enough to loop over these triangles and perform the requested operations.

## Integer sided triangle and IA, IB and IC integers

Let $ABC$ be a triangle whose side lengths are positive integers. Denote by $I$ the incenter of the triangle and suppose also that the segments $IA, IB, IC$ have integer lengths. Prove that the inradius of the triangle is an integer.

*Solution:* Denote by $D, E, F$ the projections of the incenter on $BC, CA, AB$, respectively. Use the classical notation $a, b, c$ for the lengths of the sides of the triangle and $r$ for the inradius. Moreover, use the notations $x = AE = AF$, $y = BD = BF$, $z = CD = CE$ for the tangent lengths. Using the Pythagorean theorem in the right triangles determined by $I$, the projections and the vertices, we obtain

$IA^2 = x^2+r^2, \quad IB^2 = y^2+r^2, \quad IC^2 = z^2+r^2.$

Moreover, it can be proved that if then . Using the hypothesis, it follows that are integers. Using this and the Pythagorean relations above we find that should be an integer. However, this is not enough to conclude that would also be an integer.

Looking at the triangle , denoting and , and applying the sine rule we get

Now note that and , which gives

This complicated relation allows us to deduce that is rational. This means that with and coprime integers. Moreover, we saw that is an integer, which means that or . In the end we find that should always be an integer.

Now, is it possible that $r$ is only a half integer, i.e. $2r$ is odd? The Pythagorean relations above imply that

If is an odd integer then is also odd and of the form . Moreover, is also an integer, which by the above relation should also be odd, which means that is an integer of the form . In the end we arrive at

which is a contradiction. Therefore must be an integer!

## Sum of floors of multiples of the Golden Ratio

Propose an algorithm for computing

$S(n) = \sum_{k=1}^n \lfloor k\varphi \rfloor$

for very large $n$, where $\varphi = \frac{1+\sqrt{5}}{2}$ is the golden ratio. The sum $S(n)$ is the sum of the first $n$ terms of the so called *lower Wythoff sequence*.

*Solution:* Computing a floor function and a multiplication is not complicated, therefore proposing an $O(n)$ algorithm for computing $S(n)$ is trivial. However, such an algorithm is not viable for very large values of $n$.

The path to a much more efficient algorithm goes through the notion of a *Beatty sequence*. For a positive irrational number $r > 1$ the associated Beatty sequence is $(\lfloor kr \rfloor)_{k \geq 1}$. This notion becomes interesting when noting the following fact: if $s$ is defined by $\frac{1}{r} + \frac{1}{s} = 1$ then the Beatty sequences of $r$ and $s$ cover all the positive integers exactly once. The latter sequence is called the associated *complementary Beatty sequence*. Two proofs of this fact can be found in the Wikipedia link above.

It is obvious now that $S(n)$ is just a partial sum of the Beatty sequence associated to $r = \varphi$. Now let's see what the complementary sequence is. A brief computation shows that the associated $s$ is given by $s = \varphi + 1 = \varphi^2$, which shows that the terms $\lfloor k\varphi^2 \rfloor = \lfloor k\varphi \rfloor + k$ which are not of the form $\lfloor k\varphi \rfloor$ are not far from it. In order to see how this can give an efficient algorithm, let's follow the instructions below:

- denote by $N = \lfloor n\varphi \rfloor$ the largest term in the sum.
- looking at all the integers up to $N$, in view of the result regarding the Beatty sequences, they are either of the form $\lfloor k\varphi \rfloor$ or of the form $\lfloor m\varphi^2 \rfloor$ (in the associated complementary sequence).
- denote by $m$ the largest integer such that $\lfloor m\varphi^2 \rfloor \leq N$; since exactly $n$ of the integers up to $N$ belong to the lower Wythoff sequence, $m$ is given by $m = N - n$. Then, it is not difficult to see that $S(n)$ is the difference between the sum of all positive integers up to $N$ and $\sum_{j=1}^m \lfloor j\varphi^2 \rfloor = S(m) + \frac{m(m+1)}{2}$.
- In the end we obtain

$S(n) = \frac{N(N+1)}{2} - \frac{m(m+1)}{2} - S(m).$

- by convention, state that $S(0) = 0$.

The last item above shows that a recursive algorithm can be implemented in order to compute $S(n)$. Moreover, since the computation of $S(n)$ is reduced to the computation of $S(m)$ where $m \approx n/\varphi$, the algorithm will converge very fast: the sequence of upper bounds for the argument converges exponentially to zero. Therefore, the algorithm obtained will have complexity $O(\log n)$ and will work for extremely large values of $n$, provided the language you are using can handle the required precision.
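Here is a minimal recursive implementation in Python of the strategy described above (names are mine). It computes $\lfloor n\varphi \rfloor = \lfloor (n+\sqrt{5n^2})/2 \rfloor$ exactly in integer arithmetic via `math.isqrt`, so there is no floating point precision issue even for huge $n$:

```python
from math import isqrt

def floor_n_phi(n):
    # floor(n*phi) = floor((n + sqrt(5 n^2)) / 2), exact for arbitrary n
    return (n + isqrt(5*n*n)) // 2

def wythoff_sum(n):
    # S(n) = sum_{k=1}^{n} floor(k*phi)
    if n <= 0:
        return 0
    N = floor_n_phi(n)
    m = N - n  # number of complementary (upper Wythoff) terms up to N
    # S(n) = (1 + ... + N) - sum of upper terms = N(N+1)/2 - m(m+1)/2 - S(m)
    return N*(N + 1)//2 - m*(m + 1)//2 - wythoff_sum(m)

# brute force check for small n
for n in range(1, 200):
    assert wythoff_sum(n) == sum(floor_n_phi(k) for k in range(1, n + 1))
print(wythoff_sum(10**100) % (10**9 + 7))  # instantaneous even for huge n
```

Since the argument decreases by a factor of roughly $\varphi$ at each call, the recursion depth is about $\log_\varphi n$, in line with the $O(\log n)$ complexity discussed above.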

The algorithm for computing $S(n)$ shown above can be used, for example, in computing sums related to elements in the Wythoff table, which can be expressed using Fibonacci numbers and terms in the lower Wythoff sequence.

Other values of , like for example (with ) lead to other types of sums for which the same type of algorithm can be applied. It is likely that even cases where is not explicitly related to through an integer may be handled through a similar procedure (to be confirmed).

## Compute recurrent sequences fast

In many programming questions the computation of a recurrent sequence might be necessary. One of the most famous such sequences is the Fibonacci sequence, defined by $F_1 = F_2 = 1$ and

$F_{n+1} = F_n + F_{n-1}.$

Writing a code which computes $F_n$ for a given $n$ is a no-brainer, but the complexity may vary a lot depending on the implementation. As it turns out, not everyone has the right reflexes when writing such a code. In the following I present some algorithmic variants and discuss their speed and complexity. In the end I'll say a few words on how to compute quickly the $n$-th term of any linear recurrence with fixed coefficients and fixed order.

First, remember that it is possible to have an exact formula:

$F_n = A\lambda_1^n + B\lambda_2^n,$

where $\lambda_1, \lambda_2$ are the solutions of $\lambda^2 = \lambda+1$ and $A$ and $B$ are found from $F_1$ and $F_2$. Note that $\lambda_1, \lambda_2$ are irrational, so when looking for an exact formula you might need to be careful with precision for large $n$. This could give you a constant-time algorithm. The formula is explicit, but you need to compute two exponentials, so it's not really free. Moreover, this shows that the values of $F_n$ grow exponentially fast, so you'll quickly overflow if you're not working in arbitrary precision. Moreover, if you need to compute $F_n$ for really large $n$, this might not be the simplest approach, since you're working with irrationals.

**1. Recursion.** The simplest way to code the computation of $F_n$ is using a recursive function. Create a function *fibonacci* which, depending on the value of $n$, returns $1$ if $n \leq 2$ or returns *fibonacci*($n-1$)+*fibonacci*($n-2$). I remember coding this when I was a beginner, thinking it was a nice idea… It turns out, however, that this approach has exponential complexity. If you're not convinced, implement it and try to compute $F_n$ for $n$ around $40$ or $50$. You'll quickly see the exponential cost we're talking about.
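A minimal sketch of this naive recursion (the function name `fib_naive` and the optional call counter are my additions, used to make the exponential blow-up visible):

```python
def fib_naive(n, counter=None):
    """Naive recursive Fibonacci: the number of calls grows exponentially in n."""
    if counter is not None:
        counter[0] += 1  # count every call
    if n <= 2:
        return 1
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)

calls = [0]
print(fib_naive(25, calls))  # 75025
print(calls[0])              # tens of thousands of calls already for n = 25
```

Each call to `fib_naive(n)` triggers roughly $2 F_n$ calls in total, which is why the running time explodes.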

In order to see why, it is enough to note that when computing $F_n$ the algorithm needs to compute $F_k$ for all $k < n$. Then for each such $k$ one also needs to compute $F_j$ for all $j < k$, etc., so the same values are recomputed over and over. The only way this can be efficient is if one stores the values already computed. This is called *recursion with memoization*, and it should always be used when coding recursive tasks in which there is a risk of computing the same value multiple times. However, memoization has an $O(n)$ memory cost.
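Memoization can be sketched using the standard `functools.lru_cache` decorator (one common implementation choice, not necessarily the only one):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every computed value: O(n) memory
def fib_memo(n):
    if n <= 2:
        return 1
    return fib_memo(n - 1) + fib_memo(n - 2)

print(fib_memo(50))  # 12586269025, computed with O(n) calls instead of O(2^n)
```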

**2. Iterative sequence.** The second approach is more efficient. You only need two variables, $a$ and $b$, which store $F_{n-1}$ and $F_n$. Then at each iteration, store $a+b$ in a temporary variable, change $a$ into $b$ and store the value of the temporary variable in $b$. It is obvious that this approach costs $O(n)$ in execution time and $O(1)$ in memory. If you need to store all values of the sequence up to $F_n$, you will also have an $O(n)$ memory cost. This lets you handle quite large values of $n$ in reasonable computing time (how large depends on your hardware and programming language). However, extremely large $n$ is generally out of the question, since the cost is still linear.
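The iterative scheme above, as a short sketch (in Python the temporary variable can be hidden in a tuple assignment):

```python
def fib_iter(n):
    """Iterative Fibonacci: O(n) time, O(1) memory."""
    a, b = 1, 1  # F_1, F_2
    for _ in range(n - 2):
        a, b = b, a + b  # shift the window one step forward
    return b if n >= 2 else 1

print(fib_iter(10))  # 55
```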

**3. Matrix exponentiation.** One interesting approach, which is really fast, is rewriting the second order recurrence as a first order matrix recurrence:

$\begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix} = A \begin{pmatrix} F_n \\ F_{n-1} \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}.$

This implies that $\begin{pmatrix} F_{n+1} \\ F_n \end{pmatrix} = A^{n-1} \begin{pmatrix} F_2 \\ F_1 \end{pmatrix}$. Therefore, in order to compute $F_n$ only a matrix power is needed. Using *exponentiation by squaring* this can be done in $O(\log n)$ time. At the end you will obtain an (exact) integer result or a result modulo whatever integer you want. So this is the way to go if you need to compute $F_n$ for extremely large $n$.
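A sketch of this approach in plain Python (helper names are mine; exponentiation by squaring is easy enough to write by hand, with an optional modulus for huge $n$):

```python
def mat_mult(A, B, mod=None):
    """Multiply two square matrices, optionally modulo an integer."""
    n = len(A)
    C = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    if mod is not None:
        C = [[x % mod for x in row] for row in C]
    return C

def mat_pow(A, p, mod=None):
    """Exponentiation by squaring: O(log p) matrix multiplications."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while p:
        if p & 1:
            R = mat_mult(R, A, mod)
        A = mat_mult(A, A, mod)
        p >>= 1
    return R

def fib_fast(n, mod=None):
    """F_n via the power A^(n-1) of the matrix A = [[1,1],[1,0]]."""
    if n <= 2:
        return 1 if mod is None else 1 % mod
    return mat_pow([[1, 1], [1, 0]], n - 1, mod)[0][0]

print(fib_fast(10))  # 55
```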

Please note that you can generalize this to recurrences of the form

$x_{n+k} = a_{k-1} x_{n+k-1} + \dots + a_1 x_{n+1} + a_0 x_n.$

It suffices to see that

$\begin{pmatrix} x_{n+k} \\ x_{n+k-1} \\ \vdots \\ x_{n+1} \end{pmatrix} = \begin{pmatrix} a_{k-1} & a_{k-2} & \dots & a_0 \\ 1 & 0 & \dots & 0 \\ \vdots & \ddots & & \vdots \\ 0 & \dots & 1 & 0 \end{pmatrix} \begin{pmatrix} x_{n+k-1} \\ x_{n+k-2} \\ \vdots \\ x_n \end{pmatrix},$

so in the end you just need to compute the power of a bigger ($k \times k$) companion matrix. The complexity in $n$ stays the same!
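A hedged sketch of the general case (function name and conventions are mine; terms are 1-indexed and the companion matrix's first row holds the coefficients):

```python
def solve_recurrence(coeffs, init, n):
    """n-th term (1-indexed) of x_{m+k} = coeffs[0]*x_{m+k-1} + ... + coeffs[k-1]*x_m,
    given the first k terms init = [x_1, ..., x_k], via companion-matrix powers."""
    k = len(coeffs)
    if n <= k:
        return init[n - 1]

    def mult(A, B):
        return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(k)]
                for i in range(k)]

    # companion matrix: first row = coefficients, sub-diagonal of ones below
    M = [coeffs[:]] + [[int(j == i) for j in range(k)] for i in range(k - 1)]
    R = [[int(i == j) for j in range(k)] for i in range(k)]  # identity
    p = n - k
    while p:  # exponentiation by squaring: O(log n) multiplications
        if p & 1:
            R = mult(R, M)
        M = mult(M, M)
        p >>= 1
    # state vector is (x_k, x_{k-1}, ..., x_1); top entry of R @ state is x_n
    state = init[::-1]
    return sum(R[0][j] * state[j] for j in range(k))

print(solve_recurrence([1, 1], [1, 1], 10))     # Fibonacci: 55
print(solve_recurrence([1, 1, 1], [1, 1, 2], 10))  # Tribonacci: 149
```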

What to remember from this post?

- Don’t use recursion (without memoization) when computing recurrent sequences. You won’t get very far and you’ll waste your resources!
- You can compute the $n$-th value of a recurrent sequence with constant coefficients with $O(\log n)$ complexity using matrix exponentiation. This works well even for huge $n$ when working modulo a fixed integer $p$.

## Zero order methods – locating singularities

In my previous post I talked about zero-order optimization methods and why they present a loss of precision when dealing with minimizers of smooth functions. Near such points the function varies too slowly, so it is not possible to attain a precision greater than the square root of the machine precision, based solely on the function values.

However, the situation changes when dealing with non-smooth functions. In particular, when searching for singularities. I asked my students the following question:

In a research article you see the following figure. The author says that the downward pointing spikes correspond to solutions of some problem of interest and that they are found using the Golden Search method.

A list of the solutions is given and the author claims they are precise up to 12 significant digits. Knowing that the computations were made with a machine precision which gives about 16 significant digits, do you believe this claim? Justify your answer.

Many of the students stated that the claim was wrong, since no more than 8 digits could be guaranteed using a zero-order method. However, there were some who realized that locating the spikes, which are local minima of a unimodal function in a small neighborhood, does not enter into the problematic cases for zero-order methods.

Why is that? Well, the reason zero-order methods are imprecise around a local minimum where the function is regular is that the first derivative vanishes there, so the function behaves like a quadratic near the minimum. This is not the case in the figure above, since the spikes correspond to singularities, where the derivative is not well defined. Moreover, close to these local-minimum singularities the function varies quickly, not slowly as in the smooth case. As a consequence, zero-order methods have no problem finding the singularities to the precision to which we can compute the respective function.

In order to test this claim in practice, apply some zero-order optimization algorithm (trisection, Golden Search, Fibonacci search) to the functions:

- on
- on .

Note that the function should be singular both from the left and from the right of the minimum point. Take for example . This function varies rapidly on the right hand side of 1, but it varies slowly on the left hand side. Therefore the problem we had for smooth functions can still occur.
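As an illustration of this exercise (the exact test functions were elided above, so `abs(x - 1)` and `(x - 1)**2` on `[0, 2]` are my stand-ins), a golden-section search can be run on both a singular and a smooth minimum:

```python
import math

def golden_search(f, a, b, tol=1e-14):
    """Golden-section search for the minimum of a unimodal f on [a, b]."""
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi ~ 0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if f(c) <= f(d):
            b, d = d, c               # minimum lies in [a, d]
            c = b - invphi * (b - a)
        else:
            a, c = c, d               # minimum lies in [c, b]
            d = a + invphi * (b - a)
    return 0.5 * (a + b)

x_kink = golden_search(lambda x: abs(x - 1), 0.0, 2.0)       # singular minimum
x_smooth = golden_search(lambda x: (x - 1) ** 2, 0.0, 2.0)   # smooth minimum
print(abs(x_kink - 1))    # near machine precision
print(abs(x_smooth - 1))  # typically no better than ~sqrt(eps) ~ 1e-8
```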

**Conclusion:** pay attention to the hypotheses needed to justify the various claims you encounter. Zero-order methods are limited in precision for smooth functions, but they work perfectly fine if the function is non-smooth at its minimum point.

## Are zero-order optimization methods any good?

The short answer is yes, but only when the derivative or gradient of the objective function is not available. To fix the ideas we refer to:

- **optimization algorithm**: an iterative process of searching for approximations of a (local/global) minimum of a certain function
- **zero-order algorithm**: an optimization method which only uses function evaluations in order to decide on the next point in the iterative process.

Therefore, in view of the definitions above, zero-order algorithms approximate minimizers of a function using only function evaluations; no further information on derivatives is available. Classical examples are **bracketing algorithms** and **genetic algorithms**. The objective here is not to go into detail on any of these algorithms, but to underline one basic limitation which must be taken into account whenever considering these methods.

In a zero-order optimization algorithm any decision regarding the choice of the next iterate can be made only by **comparing the values of $f$** at different evaluation points. For example, look at the classical trisection algorithm for minimizing a unimodal function, i.e. a real function $f$ defined on $[a,b]$ which is decreasing on $[a,x^*]$ and increasing on $[x^*,b]$:

- given the bracketing interval $[a,b]$, choose the points $c = a + (b-a)/3$ and $d = a + 2(b-a)/3$.
- compare the values $f(c)$ and $f(d)$ in order to decide on the next bracketing interval: if $f(c) \leq f(d)$ then the new interval is $[a,d]$; if $f(c) > f(d)$ then the new interval is $[c,b]$.
- stop when the length $b-a$ of the bracketing interval is small enough.
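The steps above can be sketched as follows (a minimal implementation, assuming `f` is unimodal on `[a, b]`):

```python
def trisection(f, a, b, tol=1e-8):
    """Minimize a unimodal f on [a, b]; the bracket shrinks by 2/3 each step."""
    while b - a > tol:
        c = a + (b - a) / 3
        d = a + 2 * (b - a) / 3
        if f(c) <= f(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return 0.5 * (a + b)

print(trisection(lambda x: (x - 2) ** 2, 0, 5))  # close to 2
```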

One question immediately arises: can such an algorithm reach any desired precision; for example, can it reach the machine precision, i.e. the precision to which computations are done in the software used? To fix the ideas, we’ll suppose that we are in the familiar world of numbers stored in double precision, where the machine precision is $\varepsilon \approx 2.2 \times 10^{-16}$. I will not go into further details regarding this, since people do whole courses on floating point arithmetic. Note that Matlab, Octave and Numpy are well known environments whose default setup uses this machine precision.

More precisely, the real numbers are stored as floating point numbers and only about $16$ significant digits are relevant in the computations. When adding two numbers $a$ and $b$ whose ratio $|b/a|$ is smaller than $\varepsilon$, the result will simply be $a$. This is because adding the numbers requires shifting their decimal points to the same position; since the ratio is smaller than $\varepsilon$, after this shift all the significant digits of $b$ fall below the last significant digit of $a$. Therefore the addition does not change any significant digit of $a$.
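This absorption effect can be checked in two lines (in IEEE double precision, $\varepsilon = 2^{-52}$):

```python
eps = 2.0 ** -52  # machine epsilon for double precision, ~2.2e-16
print(1.0 + eps > 1.0)        # True: eps is still a visible increment at 1.0
print(1.0 + eps / 2 == 1.0)   # True: the smaller addition is absorbed entirely
```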

This issue related to computations done in floating point precision makes the comparison of $f(c)$ and $f(d)$ in the algorithm above pointless when $c$ and $d$ are close to the minimum. In order to identify the source of the potential problem, let us write the Taylor expansion of the function around the minimum $x^*$. First, note that $f'(x^*) = 0$ at the minimum, which leaves us with

$f(x) = f(x^*) + \frac{1}{2} f''(x^*)(x-x^*)^2 + o((x-x^*)^2),$

where $o((x-x^*)^2)$ denotes higher order terms. You may note that neglecting the higher order terms shows that the function looks like a *quadratic function* near the optimum. This is a good thing to keep in mind, and students should not hold a grudge against professors who illustrate the behavior of optimization algorithms on quadratic functions. As it turns out, any good algorithm needs to behave well for quadratic functions, since near a (nondegenerate) minimum every function is essentially quadratic.

Coming back to our previous paragraph, it turns out that the computer won’t be able to tell the difference between $f(x)$ and $f(x^*)$ if

$\frac{1}{2} |f''(x^*)| (x-x^*)^2 \leq \varepsilon |f(x^*)|,$

where $\varepsilon$ is the machine precision. Therefore, if

$|x - x^*| \leq \sqrt{\varepsilon}\, \sqrt{\frac{2|f(x^*)|}{|f''(x^*)|}},$

the machine won’t tell the difference between $f(x)$ and $f(x^*)$. Since the second factor in the above expression is well defined and positive in most cases, it turns out that we will not be able to decide if we are closer than roughly $\sqrt{\varepsilon}$ to the solution based only on the values of the function $f$.

Therefore, when using a zero-order algorithm for a function with $f''(x^*) \neq 0$, you cannot decide, based only on the function values, whether you are closer to the optimum than the square root of the machine precision, $\sqrt{\varepsilon} \approx 10^{-8}$. This is quite a limitation, and knowing it should save you lots of pointless function evaluations in practice.
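A quick numerical illustration of this limit, using a made-up smooth function with minimum at $x^* = 1$ and $f(x^*) = 1$:

```python
import math

f = lambda x: (x - 1.0) ** 2 + 1.0  # smooth, minimum at x* = 1, f(x*) = 1
eps = 2.0 ** -52                    # double precision machine epsilon
h = math.sqrt(eps)                  # ~1.5e-8, the precision barrier

print(f(1.0 + h / 10) == f(1.0))  # True: closer than sqrt(eps), values collide
print(f(1.0 + 10 * h) == f(1.0))  # False: far enough for the values to differ
```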

When using derivatives or gradients there is no such problem, since we can decide if we are close enough to $x^*$ by comparing the derivative to zero, and this comparison is valid up to the precision to which we can compute $f'$.

In conclusion, use zero-order methods only if gradient based methods are out of reach. Zero-order methods are not only slowly convergent, but may also be unable to achieve the expected precision in the end.

## Optimality conditions – equality constraints

Everyone knows that local minima and maxima of a function are critical points, i.e. points where the gradient is equal to zero. This is what we call an **optimality condition**, a condition verified by every solution of a minimization or maximization problem.

The situation changes when dealing with constraints. Consider a simple constrained example and note that, in general, the gradient of the objective function is not zero at the optimal solution. The idea is that one needs to take the constraint into account when writing optimality conditions in this case. This leads us to the theory of Lagrange multipliers. Every student finds this concept a bit difficult when seeing it for the first time.

Since a picture is worth 1000 words, let me share with you two pictures 🙂 Consider the minimization of a function $f$ under an equality constraint $g = 0$. The first picture below shows the gradients of the function and of the constraint at the optimal solution, approximated numerically. One striking property becomes obvious: the gradient of the function aligns with the gradient of the constraint. In order to see why this is necessary, let us look at another point in the constraint set where the two gradients are not aligned, shown in the second picture.

It becomes obvious that if the two gradients are not aligned, then there exists a component of $\nabla f$ which is tangent to the constraint set. Therefore, moving along this tangent direction, in the appropriate sense, it is possible to further decrease the value of the function $f$ while still staying on the constraint set. Therefore, when the two gradients are not aligned we are not at a minimum (the same type of argument works for maximization problems).

Of course, this geometric argument is not the whole picture, but it gives an important insight, which directly implies the theory of Lagrange multipliers: at the optimum, the gradient should be orthogonal to the tangent space to the constraint set, and this orthogonal complement, under some regularity assumptions, is generated by the gradients of the constraints themselves, giving the famous relation

$\nabla f(x^*) = \sum_i \lambda_i \nabla g_i(x^*).$
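A small numerical check of this alignment on a made-up example, not taken from the original pictures: minimize $f(x,y) = x + y$ on the unit circle $g(x,y) = x^2 + y^2 - 1 = 0$, whose minimizer is $(-1/\sqrt{2}, -1/\sqrt{2})$.

```python
import math

# Hypothetical example: f(x, y) = x + y on the circle g(x, y) = x^2 + y^2 - 1 = 0.
x = y = -1 / math.sqrt(2)  # known minimizer of f on the circle

grad_f = (1.0, 1.0)        # gradient of f
grad_g = (2 * x, 2 * y)    # gradient of g at the minimizer

# Lagrange multiplier recovered from the first component: grad f = lam * grad g
lam = grad_f[0] / grad_g[0]
print(lam * grad_g[1] - grad_f[1])  # ~0: the second component matches too

# the 2D cross product measures how far the gradients are from being parallel
cross = grad_f[0] * grad_g[1] - grad_f[1] * grad_g[0]
print(cross)  # ~0: the gradients are aligned at the optimum
```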

Now, once we understand the intuition behind this, it remains to see under which conditions there exists a tangent space to the constraint set, and to see rigorously why the above relation holds. But all this is left for a future post.