Convexity and tangent lines

Introduction

In 1991, the famous Russian mathematician V.I. Arnold published his essay “A mathematical trivium” which took issue with the culture of mathematical education at universities in the second half of the 19th century, criticizing the mode of examination that leads to “specialists [that] are not in a position to solve the the simplest problems, and do possess the rudiments of their trade” (p. 272). In order to reverse that trend, he proposes 100 model problems that every student of mathematics should be able to solve. These problems often deal in part with estimation, approximation and computation, seldolmy you would need to prove anything formally; the appropriate application of theorems takes precendence.

For example, problem no. 13 asks to evaluate the integral 010xxdx\int\limits_{0}^{10} x^x\;\textnormal{d}x (for which the indefinite integral does not have any closed-form solution) with 5% relative error. One key idea to solve it is to use the convexity of the integrand f(x):=xxf(x) := x^x by considering the relationship of the function ff with the tangent lines of any of its points. More precisely, for I=(0,10)I = (0, 10) and arbitrary a,xIa, x \in I, it holds that

f(x)f(a)+f(a)(xa)

Though why is this true? Why does the graph of a convex function always lie above the tangent lines of all of its points? In the following, I want to build some geometrical intuition as to why this relationship, at least in the one-dimensional case, is more or less obvious. Note that the statement is also true for higher dimensions. Then you just need to replace the slope f(a)f’(a) with the gradient f\nabla f.


Basic Idea

The underlying idea about the proof is the link between convexity of the function ff and the behavior of its derivatives. When you know that ff is convex, you may prove that the derivative ff’ is monotonically increasing. This is a standard textbook result. Now, the claim that the graph of a convex function always lies above the tangent line of any of its points, then translates to the statement that the graph curves away from the tangent line, positioning the graph of ff above it.

To make yourself more familiar with the idea, consider the following TikZ picture.


Proof

We have to show the claim for arbitrary x(a,b)x \in (a, b) and p(a,b)p \in (a, b). Therefore, take some point pp as in the TikZ picture. First, we consider another point x2(a,b)x_2 \in (a, b) such that x2>px_2 > p. We know that ff is convex. So, because it holds that f(p)<f(x2)f’(p) < f’(x_2) since p<x2p < x_2 and ff’ monotonically increasing due to its convexity, we have that f(x2)f(x2)f(p)x2pf(p) But this immediately implies that f(x2)f(p)+f(p)(x2p)\boxed{f(x_2) \geq f(p) + f’(p)(x_2 - p)} and so the claim holds for all x(p,b)x \in (p, b). Now, what if we consider points xx for which x<px < p, i.e. x(a,p)x \in (a, p)? Consider now x1(a,p)x_1 \in (a, p). Then, we similarly have that f(p)f(p)f(x1)px1f(x1) again due to the monotone increase of ff’ which is due to its convexity. But then f(p)f(p)f(x1)px1f(p)(1)(1)f(x1)f(p)x1pf(p)+f(p)(x1p)f(x1) and voilà, we’re done. Note that in the last step, the inequality sign changed because x1p<0x_1 - p < 0.


Application to Problem 13

Using the theorem we can infer the lower bound of the integral 110xxdx\int\limits_{1}^{10} x^x \;\text{d}x in the form of

f(x)f(10)+f(10)(x10)xx1010+1010(log(10)+1)(x10)110xxdx1101010+1010(log(10)+1)(x10)dx

Together with a upper bound on the integral and other approximations, we can then proceed to solve the problem.