2023-10-12

$$\def\R{\mathbb{R}} \def\C{\mathcal{C}} \def\ge{g_\epsilon} $$

Every time I had a look at proofs of the Euler Lagrange Equation in the past I stumbled over certain steps which looked weird. The ones that put me off most involve the notation of partial derivatives like

\begin{equation*} \frac{\partial L}{\partial f}(x,f(x),f'(x)) \; \text{and}\; \frac{\partial L}{\partial f'}(x,f(x),f'(x)), \end{equation*}

which are also used in the above mentioned Wikipedia article. In conjunction with mentions of "calculus of variations" I thought, wow, how can a function $L$ have a derivative relative to a function $f$ or even $f'$. What kind of concept is this? As a software devloper trained in strongly typed languages, the $df$ in the denominator just did not make any sense at all.

Meanwhile I realised that this is just a silly abuse of notation and $\partial L/\partial f$ denotes the partial derivative of $L$ with regard to its second parameter. I hate it! If we have, for example, $L(x, 3x^2 + px, 6x+p)$ how's that written then? Do we write $\partial L/\partial(3x^2 + px)$? Nonsense.

For the rest of this text, mainly written for myself to come back here and see the derivation in a way I understand it, I try to be as exact as possible about notation. For the partial derivative of a function $L$ of $n$ arguments I'll write $\partial_i L$ for $1\leq i\leq n$. In particular for $n=1$, $\partial_1 L$ has the exact same meaning as $L'$. The benefit of this notation is that it does not rely on the actual parameters, whether $x$ or $x^\mu$ or $f'$, it does not matter, the meaning is well defined.

So what is the Euler-Langrange Equation?

A functional is a map from a vector space to the underlying field. For the vector space we have here the sufficiently often continuously differentiable functions on $\R$, say $\C^2(\R, \R)$. Let $J:\C^2(\R,\R)\to R$ be a functional mapping functions to values in $\R$ defined as $$ J(f) = \int_a^b L(x, f(x), f'(x)) dx $$ where $L:\R^3\to\R$ is sufficently often continuously differentiable.

Now you can ask which function $f$ would make $J$ minimal or maximal at least in a sufficiently defined neighbourhood of $f$. To be fair, the Euler-Lagrange Equation does not help to find minimizing or maximizing functions $f$ but rather one such that any sufficiently small variation of $f$ either increases of decreases the value $J(f)$.

The trick employed is to change the problem into one about a function $\phi:\R\to\R$ the derivative $\partial_1\phi=\phi'$ of which shall be zero.

First we introduce a function $\eta:[a,b]\to\R$ such that $\eta(a)=\eta(b)=0$ and then define $\ge:\R\to\R$ as $\ge(x) = f(x)+\epsilon\eta(x)$. We see that $f(x) = g_0(x)$. With these definitions we now define our $\phi$ as $$ \phi(\epsilon) = J(\ge) = \int_a^b L(x, \ge(x), \ge'(x)) dx . $$

Note how $\phi(0) = J(f)$, so if we manage to find an $f$ such that $\phi'(0)=0$ then $\phi(0)$ is extremal or a saddle point and so should be $J(f)$ for small variations of $f$ as implemented by $\ge$.

$$ 0 = \phi'(\epsilon) = \frac{d}{d\epsilon} \int_a^b L(x, f(x)+\epsilon\eta(x), f'(x)+\epsilon\eta'(x)) dx $$ We may pull the derivative into the integral to get

$$ 0 = \int_a^b \frac{d}{d\epsilon} L(x, f(x)+\epsilon\eta(x), f'(x)+\epsilon\eta'(x)) dx . $$

Now comes the part to use non-weird notation for the derivatives of $L$. $L$ is a function from $\R^3$ to $\R$ and it is applied to the function $H: \R^2\to\R^3$ defined as

$$ H(\epsilon, x) = \begin{pmatrix}x \\ f(x)+\epsilon\eta(x) \\ f'(x)+\epsilon\eta'(x))\end{pmatrix}. $$

So we can write: \begin{align*} \frac{d}{d\epsilon} L(H(\epsilon, x)) &= ((\partial_1, \partial_2, \partial_3)L)(H(\epsilon, x)) \cdot \frac{d}{d\epsilon} H(\epsilon, x)\\ &\phantom{=}\scriptstyle \text{(row vector times a column vector)}\\ &= ((\partial_1, \partial_2, \partial_3)L)(H(\epsilon, x)) \cdot \frac{d}{d\epsilon} \begin{pmatrix}x \\ f(x)+\epsilon\eta(x) \\ f'(x)+\epsilon\eta'(x))\end{pmatrix} \\ &= ((\partial_1, \partial_2, \partial_3)L)(H(\epsilon, x)) \cdot \begin{pmatrix}0 \\ \eta(x) \\ \eta'(x) \end{pmatrix} \\ &=(\partial_2 L)(H(\epsilon, x))\cdot\eta(x) + (\partial_3 L)(H(\epsilon, x))\cdot\eta'(x) \end{align*}

The latter term of the sum can be integrated by parts. \begin{align*} \int_a^b (\partial_3 L)(H(\epsilon, x))\cdot\eta'(x) dx &= (\partial_3 L)(H(\epsilon, x))\cdot\eta(x)\Bigr\vert_a^b - \int_a^b \ \frac{d}{dx}\Bigl[ (\partial_3 L)(H(\epsilon, x)) \Bigr]\eta(x) dx \\ &= - \int_a^b \frac{d}{dx} \Bigl[(\partial_3 L)(H(\epsilon, x))\Bigr]\eta(x) dx \end{align*} The last line follows because $\eta(a)=\eta(b)=0$. Now we can pick up the full integral from above again which we want to see being zero.

\begin{align*} 0 &= \int_a^b \frac{d}{d\epsilon} L(H(\epsilon, x)) dx \\ &=\int_a^b \left[(\partial_2 L)(H(\epsilon, x)) - \frac{d}{dx} (\partial_3 L)(H(\epsilon, x))\right]\eta(x) dx \end{align*}

Since $\eta$ can be arbitrarily chosen, the integral can only be zero if the part inside the brackets is a constant zero. This is called the fundamental lemma of the calculus of variations. Therefore, what we need is

$$ 0 = (\partial_2 L)(H(\epsilon, x)) - \frac{d}{dx} (\partial_3 L)(H(\epsilon, x)) $$ and in particular, see where we started from, for the case $\epsilon=0$, which brings us straight to the Euler-Lagrange Equation:

$$ \boxed{0 = (\partial_2 L)(x, f(x), f'(x)) - \frac{d}{dx} (\partial_3 L)(x,f(x), f'(x))} $$