
Optimality conditions – equality constraints

May 12, 2020

Everyone knows that local minima and maxima of a function f:\Bbb{R}^n\to \Bbb{R} are critical points, i.e. points where the gradient vanishes. This is what we call an optimality condition: a condition satisfied by every solution of a minimization or maximization problem.

The situation changes when constraints are present. Consider a simple example, \min_{x=1} x^2+y^2, and note that the gradient of the objective function is not zero at the optimal solution (1,0). The idea is that the constraint must be taken into account when writing optimality conditions in this case. This leads us to the theory of Lagrange multipliers, a concept most students find difficult when they first encounter it.
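To see what happens in this simple example, write the constraint as g(x,y) = x-1 = 0. At the solution (1,0) we have

\nabla f(1,0) = (2,0), \qquad \nabla g(1,0) = (1,0),

so the gradient of the objective does not vanish, but it is a multiple of the gradient of the constraint: \nabla f(1,0) = 2\nabla g(1,0). Keep this observation in mind; it is exactly the phenomenon illustrated below.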

Since a picture is worth 1000 words, let me share with you two pictures 🙂 Consider the minimization of f(x,y) = 2x^2+y^2 under the constraint g(x,y) = (x-1)^2+(y-1)^2-0.5 = 0. The first picture below shows the gradients of the objective and of the constraint at the optimal solution, approximated numerically. One striking property becomes obvious: the gradient of the objective aligns with the gradient of the constraint. To see why this is necessary, let us look at another point of the constraint set, shown in the second picture, where the two gradients \nabla f(x), \nabla g(x) are not aligned.
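If you want to reproduce the numerical approximation behind the pictures, here is a minimal sketch in Python (assuming numpy and scipy are available; the solver and the starting point are my own choices, not necessarily those used for the pictures):

    import numpy as np
    from scipy.optimize import minimize

    f = lambda p: 2 * p[0]**2 + p[1]**2
    grad_f = lambda p: np.array([4 * p[0], 2 * p[1]])
    g = lambda p: (p[0] - 1)**2 + (p[1] - 1)**2 - 0.5
    grad_g = lambda p: np.array([2 * (p[0] - 1), 2 * (p[1] - 1)])

    # SLSQP handles equality constraints; the starting point (0.5, 0.5) lies on the circle.
    res = minimize(f, x0=[0.5, 0.5], jac=grad_f, method="SLSQP",
                   constraints=[{"type": "eq", "fun": g, "jac": grad_g}])

    gf, gg = grad_f(res.x), grad_g(res.x)
    print("approximate minimizer:", res.x)
    # The 2x2 determinant vanishes exactly when the two gradients are parallel.
    print("alignment defect (should be ~0):", gf[0] * gg[1] - gf[1] * gg[0])

At the computed minimizer the determinant is numerically zero, which is precisely the alignment visible in the first picture.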

It becomes obvious that if the two gradients are not aligned, then \nabla f(x) has a nonzero component tangent to the constraint set. Moving along the constraint set against this tangential component therefore further decreases the value of f while keeping the constraint satisfied. In other words, when the two gradients are not aligned we are not at a minimum (the same type of argument works for maximization problems).
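This descent argument is easy to check numerically. Here is a small sketch (the parametrization of the circle and the step size are illustrative choices of mine): pick a feasible point where the gradients are not aligned and take a small step along the circle against the tangential component of \nabla f.

    import numpy as np

    r = np.sqrt(0.5)
    point = lambda t: np.array([1 + r * np.cos(t), 1 + r * np.sin(t)])  # on the circle
    tangent = lambda t: np.array([-np.sin(t), np.cos(t)])               # unit tangent
    f = lambda p: 2 * p[0]**2 + p[1]**2
    grad_f = lambda p: np.array([4 * p[0], 2 * p[1]])

    t0 = 0.5                                  # a feasible point, not the minimizer
    slope = grad_f(point(t0)) @ tangent(t0)   # tangential component of grad f
    t1 = t0 - 1e-3 * slope                    # move along the circle, against it
    print(f(point(t1)) < f(point(t0)))        # True: f decreases, staying feasible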

Of course, this geometric argument is not the whole picture, but it gives an important insight which leads directly to the theory of Lagrange multipliers: at the optimum, the gradient \nabla f should be orthogonal to the tangent space of the constraint set, and, under some regularity assumptions, this orthogonal complement is generated by the gradients of the constraints themselves. For a family of equality constraints g_i(x) = 0, i=1,\dots,m, this gives the famous relation

\nabla f(x^*) +\sum_{i=1}^m \lambda_i \nabla g_i(x^*) = 0

for some Lagrange multipliers \lambda_1,\dots,\lambda_m \in \Bbb{R}.
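For the running example, with f(x,y) = 2x^2+y^2 and the single constraint g(x,y) = (x-1)^2+(y-1)^2-0.5 = 0, this relation together with the constraint itself reads

(4x^*, 2y^*) + \lambda\,(2(x^*-1), 2(y^*-1)) = (0,0), \qquad (x^*-1)^2+(y^*-1)^2 = 0.5,

a system of three equations in the three unknowns x^*, y^*, \lambda, whose solutions are precisely the two tangency points on the circle: the minimizer from the first picture and the corresponding maximizer.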

Now, once we understand the intuition behind this, it remains to see under which conditions the constraint set has a tangent space, and to prove rigorously why the above relation holds. But all this is left for some future post.