MATH411 is a continuation of MATH410, but into higher dimensions.

Derivatives in Several Variables

Limits

Let $A \subseteq R^{n}$ . Then, $x^{*}$ is a limit point of $A$ if there is a sequence $\exists {x_{k}} \subseteq A / {x^{*}}$ such that ${x_{k}} \to x^{*}$ .

In other words, there is a sequence not containing $x^{*}$ converging to it.

If we have function $f : A \to R$ , and $x^{*}$ is a limit point of $A$ , then the limit of the function is defined as

x \to x^{*} lim f (x) = l

if $\forall {x_{k}} \subseteq A / {x^{*}}$ ,

k \to \infty lim f (x_{k}) = l

Example: Existence of Limits (1)

$f (x, y) = \frac{x y}{x ^{2} + y ^{2}}$
$f : R^{2} \to {(0, 0)}$ .

$lim_{(x, y) \to (0, 0)} f (x, y)$ does not exist, as we can chose ${x_{k}} = (1/ k, 0)$ , and ${x_{k}} = (1/ k, 1/ k)$ , which have different limits as $k \to \infty$ .

Example: Existence of Limits (2)

$g (x, y) = \frac{x ^{2} y}{x ^{2} + y ^{2}}$
In this case, $lim_{(x, y) \to (0, 0)} g (x, y)$ exists and equals 0.

Let $(x_{k}, y_{k}) \to (0, 0)$ not containing $(0, 0)$ . Then,
$∣ g (x_{k}, y_{k}) ∣ \leq ∣ x_{k} ∣ \frac{∣ x _{k} y _{k} ∣}{x _{k}^{2} + y _{k}^{2}} \leq ∣ x_{k} ∣ \frac{1}{2} \to 0$

Theorem: Compositions of Limits

Let $A \subseteq R^{n}$ , $x^{*} \in A$ be a limit point. Let $f : A \to R$ , $g : A \to R$ be functions, $l_{1}, l_{2} \in R$ such that
$x \to x^{*} lim f (x) = l_{1} x \to x^{*} lim g (x) = l_{2}$
Then:

$x \to x^{*} lim f (x) + g (x) = l_{1} + l_{2}$

$x \to x^{*} lim f (x) g (x) = l_{1} l_{2}$

If $g (x) \neq = 0$ for all $x \in A$ , and $l_{2} \neq = 0$ , $x \to x^{*} lim \frac{f ( x )}{g ( x )} = \frac{l _{1}}{l _{2}}$

The quotient rule for limits is the most interesting of the 3, and there is a broad study of limits of quotients

x \to x^{*} lim \frac{f ( x )}{g ( x )}

Where $lim_{x \to x^{*}} f (x) = lim_{x \to x^{*}} g (x) = 0$ .

These limits can occur frequently, and we commonly ask if such limits exist (think of derivatives!).

Example: Limit Example

$(x, y) \to (0, 0) lim \frac{x ^{3}}{x ^{2} + y ^{2}}$
We ask if this limit exists. To determine this, we will establish a bound on the function.
$\frac{x ^{3}}{x ^{2} + y ^{2}} \leq \frac{x ^{3}}{x ^{2}} = ∣ x ∣$
For any $(x, y)$ where $x$ and $y$ are both not equal 0, our function is bounded by $∣ x ∣$ ! Thus, as $(x, y) \to (0, 0)$ , $∣ x ∣ \to 0$ , so by the Comparison Lemma, $∣ \frac{x ^{3}}{x ^{2} + y ^{2}} ∣ \to 0$ .

Thus, the limit exists and is equal to 0!

Theorem: Limit Equivalences

Let $A \subseteq R^{n}$ and let $x^{*}$ be a limit point of $A$ . For a function $f : A \to R$ , and $l \in R$ , the following assertions are equivalent:

$x \to x^{*} lim f (x) = l$ In other words, for any ${x_{k}} \in A / {x^{*}}$ , if $lim_{k \to \infty} x_{k} = x^{*}$ , then $k \to \infty lim f (x_{k}) = l$

$\forall ϵ > 0$ , there exists some $δ > 0$ such that $d (x, x^{*}) < δ \to ∣ f (x) - l ∣ < ϵ x \in A / {x^{*}}$

We can also use the following property to show that such limits exist.

A function $f : R^{n} / {0} \to R$ is homogeneous of degree $k$ if

f (t x) = t^{k} f (x) \forall t > 0, \forall x \in R^{n} / {0}

Basically, we should be able to replace $x, y$ with $t$ , and take out the $t$ into a $t^{k}$ term.

Example: Homogenoeus Functions

$g (x, y) = \frac{x ^{2} y}{x ^{2} + y ^{2}}$
Is homogeneous of degree 1, because
$g (t x, t y) = \frac{t ^{3} x y}{t ^{2} ( x ^{2} + y ^{2} )} = t g (x, y)$ $f (x, y) = \frac{x y}{x ^{2} + y ^{2}}$
Is homogeneous of degree 0, because
$f (t x, t y) = t^{0} f (x, y)$

Curiously, $g$ is homogeneous of degree 1 and has a limit to $(0, 0)$ , whereas $f$ is homogeneous of degree 0 and doesn’t. Does there suggest some generalization?

Proposition: Limits of Homogeneous Functions

If $f : R^{n} / {0} \to R$ is continuous, and homogeneous of degree $k > 0$ , then
$x \to 0 lim f (x) = 0$

Proof

We look at $f (x)$ , and we try to make the case that as $x \to 0$ , $f (x) \to 0$ .
$f (x) = f (∣∣ x ∣∣ \frac{x}{∣∣ x ∣∣})$
We write this as a product of something that approaches 0 and something that is bounded.

By assumption, $f (x)$ is homogeneous of degree $k$ , so letting $t = ∣∣ x ∣∣$ ,
$f (∣∣ x ∣∣ \frac{x}{∣∣ x ∣∣}) = ∣∣ x ∣ ∣^{k} f (\frac{x}{∣∣ x ∣∣})$
As $x \to 0$ , $∣∣ x ∣ ∣^{k} \to 0$ ! Also, as $\frac{x}{∣∣ x ∣∣} \in S^{n - 1}$ (the unit sphere of dimension $n - 1$ ), which is a sequentially compact, then continuous functions on sequentially compact sets are bounded.
$∣∣ x ∣ ∣^{k} f (\frac{x}{∣∣ x ∣∣}) \to 0$

Example: Continuity with Homogeneity

$f (x, y) = \frac{x ^{3} y}{x ^{2} + y ^{2}} \frac{\partial f}{\partial x} (0, 0) = 0$
We show that $f$ is $C^{1}$ by homogeneity.

We know that $f$ is $C^{\infty}$ in $R^{n} / {(0, 0)}$ is homogeneous of degree 2. So, $\frac{\partial f}{\partial x}$ is homogeneous of degree 1.

It is a fact that if $f \in C^{1} (R^{n} / {0})$ and is homogeneous of degree $k$ , then $\frac{\partial f}{\partial x _{i}}$ is homogeneous of degree $k - 1$ .

Let $x \neq = 0$ , $t > 0$ . Then, say $f$ is homogeneous degree $k$ .
$f (t, x) = t^{k} f (x) \frac{\partial}{\partial x _{i}} f (t x) = \frac{\partial}{\partial x _{i}} [t^{k} f (x)] = t^{k} \frac{\partial f}{\partial x _{i}} (x) t \frac{\partial f}{\partial x _{i}} (t x) = t^{k} \frac{\partial}{\partial x _{i}} t^{k} f (x) ⟹ \frac{\partial f}{\partial x _{i}} (t x) = t^{k - 1} f (x)$
By a theorem, homogeneous functions of positive degree go to 0 as $x \to 0$ . So, $f \in C^{1}$ .

This does not necessarily mean that homogeneous functions of degree $k = 0$ don’t have a limit at 0! Just that some don’t.

Any constant function (ex. $f (x) = 1$ ) have defined limits as $x \to 0$ !

Partial Derivatives

Let $f : O \to R$ , $x = (x_{1}, \dots x_{n}) \in O$ .

For $1 \leq i \leq n$ , we define the partial derivative of $f$ with respect to $x_{i}$ at $x$ as

\frac{\partial f}{\partial x _{i}} (x) = t \to 0 lim \frac{f ( x + t e _{i} ) - f ( x )}{t}

if the latter limit exists. Note that $e_{i}$ is the $i^{t h}$ basis vector,

Keep all variables constant except $x_{i}$ , and take $\frac{d}{d x _{i}}$ !

Example: Partial Derivatives and Continuity

Let $f : R^{2} \to R$ ,
$f (x, y) = {\frac{x y}{x ^{2} + y ^{2}} 0 (x, y) \neq = (0, 0) (x, y) = (0, 0)$
We noticed that $f$ is not continuous at $(0, 0)$ , and $\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}$ exist (by the quotient rule) at all $(x, y) \neq = (0, 0)$ .

Then, does $\frac{\partial f}{\partial x}$ exist at $(0, 0)$ ?
$t \to 0 lim \frac{f ( t , 0 ) - f ( 0 , 0 )}{t} = t \to 0 lim \frac{0 - 0}{t}$
Yes! It exists and is equal to 0.

Similarly, we can find $\frac{\partial f}{\partial y} = 0$ .

This is very interesting! Even though $f (x, y)$ is not continuous at $(0, 0)$ , our partial derivatives still exist! This goes against our understanding of differentiability and continuity in the single variable case.

Let $O \subseteq R^{n}$ open, $f : O R$ . Then, we say $f$ has first-order partial derivatives if for all $1 \leq i \leq n$ , the function has a partial derivative with respect to its $i^{t h}$ component, at every point in its domain.

Differentiability Need Not Imply Continuity

In the single-variable case, a function with a derivative was continuous. However, this is no longer true in multiple variables!

A function with first-order derivatives need not be continuous. Consider the following example.

Example: Differentiability Need Not Imply Continuity

Define
$f (x, y) = {\frac{x y}{x ^{2} + y ^{2}} 0 (x, y) \neq = (0, 0) (x, y) = (0, 0)$
We show that the partial derivatives of the function exist at $(0, 0)$ . For all $t$ ,
$f (0 + t e_{i}) = f (t, 0) = 0$
So,
$\frac{\partial f}{\partial x _{i}} (0, 0) = t \to 0 lim \frac{f ( 0 + t e _{i} ) - f ( 0 )}{t} = t \to 0 lim \frac{f ( t , 0 ) - f ( 0 , 0 )}{t} = 0$
However, this function is not continuous! For sequence ${(\frac{1}{k}, \frac{1}{k})} \to 0$ , $f (\frac{1}{k}, \frac{1}{k}) = \frac{1}{2}$ for all $k$ , but $f (0) = 0$ !

It is only if all partials are continuous, that our theorems from the single-variable case hold.

We say that $f$ is continuously differentiable, $C^{1}$ if it has first-order partial derivatives such that each partial derivative $\frac{\partial f}{\partial x _{i}}$ is continuous for $1 \leq i \leq n$ .

Let’s now consider second-order partial derivatives, denoted like

\frac{\partial f}{\partial x _{j} \partial x _{i}}

Where we apply the partial derivative of $x_{i}$ first, then $x_{j}$ after.

Order matters! There are some functions where swapping the order of derivatives changes the result.

We say $f$ has second-order partial derivatives of it has first-order partials, such that for $1 \leq i \leq n$ , each $\frac{\partial f}{\partial x _{i}}$ also has first-order partial derivatives (of every variable).
We say $f$ has continuous second-order partial derivatives if it has second-order partial derivatives, and each $\frac{\partial ^{2} f}{\partial x _{i} \partial x _{j}}$ are continuous.

Theorem: Partial Derivative Order

Let $O \subseteq R^{n}$ open, and let $f : O \to R$ have continuous second-order partial derivatives. Then, for any two $1 \leq i, j \leq n$ , and any $x \in O$ ,
$\frac{d}{d x _{i}} (\frac{\partial f}{\partial x _{j}}) = \frac{d}{d x _{j}} (\frac{\partial f}{\partial x _{i}})$

Proof (TODO)

Let $x = x_{i}, y = x_{j}$ . Then,
$A = f (x_{0} + r, y_{0} + r) - f (x_{0} + r, y_{0}) - f (x_{0}, y_{0} + r) + f (x_{0}, y_{0})$
Define $ϕ (x) = f (x, y_{0} + r) - f (x, y_{0})$ . Then,
$A = ϕ (x_{0} + r) - ϕ (x_{0})$
By the mean value theorem, this is equal to
$= ϕ^{'} (x_{0} + θ_{1}) * r = \frac{\partial}{\partial x} [f (x_{0} + θ_{1}, y_{0} + r) - f (x_{0} + θ_{1}, y_{0})] R$

We create two equivalent functions that converge to the two partial derivatives, respectively, forcing equality.

Directional Derivatives and MVT

Recall that in the single variable case, we had the Mean Value Theorem.

Theorem: Mean Value Theorem

Let $f : [a, b] \to R$ be continuous, and differentiable on $(a, b)$ . Then, $\exists c \in (a, b)$ such that
$f (b) - f (a) = f^{'} (c) (b - a)$

This is a really useful theorem! In this section, we generalize it to multiple variables.

This generalization requires we use the single-variable MVT!

Lemma: Mean Value Lemma

Let $O \subseteq R^{n}$ open, and let $1 \leq i \leq n$ . Let $f : O \to R$ have a partial derivative with respect to $x_{i}$ for all $x \in O$ .

Let $x \in O$ , and $a$ be a real number such that the segment between $x$ and $x + a e_{i}$ lies in $O$ . Then, $\exists θ, 0 < θ < 1$ such that
$f (x + a e_{i}) - f (x) = \frac{\partial f}{\partial x _{i}} (x + θ a e_{i}) a$

Intuition: If we view our function along an axis, we get a function on one-variable. On this, we can apply single-variable MVT!

Proof

Let $I$ be the open interval of real numbers containing 0 and $a$ . Note that by assumption, $\forall t \in I$ , $x + t e_{i}$ is in our open set $O$ .

Now, define $ϕ (t) = f (x + t e_{i})$ . Then, as $f$ has a partial derivative with respect to $x_{i}$ , we find that $ϕ (t)$ is differentiable, whose derivative is given as
$ϕ^{'} (t) = \frac{\partial f}{\partial x _{i}} (x + t e_{i})$
Thus, we can apply the single-variable MVT to find a $0 < θ < 1$ such that
$ϕ (a) - ϕ (0) = ϕ^{'} (θ a) (a - 0) f (x + a e_{i}) - f (x) = \frac{\partial f}{\partial x _{i}} (x + θ a e_{i}) a$

We use this Lemma to prove the following.

Proposition: Mean Value Proposition

Let $f : R^{n} \to R$ be a function. Assume all partials $\frac{\partial f}{\partial x _{i}}$ exist $\forall x \in R^{n}$ , $\forall i \in {1, \dots n}$ .

Choose an $x \in R^{n}$ , and an offset $h \neq = 0, h \in R^{n}$ . Then, there exists a $z_{1}, \dots z_{n}$ in the ball around $x$ of radius $∣∣ h ∣∣$ ( $B_{∣∣ h ∣∣} (x)$ ) such that
$f (x + h) - f (x) = \frac{\partial f}{\partial x _{1}} (z_{1}) h_{1} + \dots + \frac{\partial f}{\partial x _{n}} (z_{n}) h_{n}$

Proof

We prove this for $n = 2$ , though the proof can very easily be extended to more dimensions.

Let $x = (x_{1}, x_{2})$ , $x + h = (x_{1} + h, x_{2} + h)$ . Look at the difference. Our goal is to expand this difference into a sum of differences along one variable (only one variable changes), so that we can apply the Mean Value Lemma on each term!
$f (x_{1} + h, x_{2} + h) - f (x_{1}, x_{2}) = f (x_{1} + h, x_{2} + h) - f (x_{1}, x_{2} + h_{2}) + f (x_{1}, x_{2} + h_{2}) - f (x_{1}, x_{2})$
This gives us two differences, where only one variable is changing in each. In other words, we have two differences in one-dimension!
$f (x_{1} + h, x_{2} + h) - f (x_{1}, x_{2} + h_{2}) f (x_{1}, x_{2} + h_{2}) - f (x_{1}, x_{2})$
Thus, by the Mean Value Lemma,
$f (x_{1} + h, x_{2} + h) - f (x_{1}, x_{2} + h_{2}) = \frac{\partial f}{\partial x _{1}} (x_{1} + θ h_{1}, x_{2} + h_{2}) h_{1} f (x_{1}, x_{2} + h_{2}) - f (x_{1}, x_{2}) = \frac{\partial f}{\partial x _{2}} (x_{1}, x_{2} + θ_{2} h_{2}) h_{2} f (x_{1} + h, x_{2} + h) - f (x_{1}, x_{2}) = \frac{\partial f}{\partial x _{1}} (x_{1} + θ h_{1}, x_{2} + h_{2}) h_{1} + \frac{\partial f}{\partial x _{2}} (x_{1}, x_{2} + θ_{2} h_{2}) h_{2}$
Let $z_{1} = (x_{1} + θ_{1} h_{1}, x_{2} + h_{2})$ , and $z_{2} = (x_{1}, x_{2} + θ_{2} h_{2})$ . Note that each $z_{i}$ is within the ball of $B_{∣∣ h ∣∣} (x)$ . We are done!

Recall that in our definitions of partial derivatives, we differentiate a function with respect to one of the axes

t \to 0 lim \frac{f ( x + t e _{i} ) - f ( x )}{t}

But what if we wanted to differentiate in a direction that isn’t aligned with the axes? This is where directional derivatives come in!

Let $O \subseteq R^{n}$ open, and consider the function $f : O \to R^{n}$ . For a point $x \in O$ , and direction $h$ , we define the directional derivative as

\frac{\partial f}{\partial h} (x) = t \to 0 lim \frac{f ( x + tp ) - f ( x )}{t}

If the limit exists.

Now let’s define the gradient of the function, $\nabla f$ , as the row vector

\nabla f (x) = (\frac{\partial f}{\partial x _{1}}, \dots, \frac{\partial f}{\partial x _{n}})

In some cases, we can calculate the directional derivative using the gradient, which can be a lot easier than taking a limit!

Theorem: Directional Derivative Theorem

Let $O \subseteq R^{n}$ open, and let $f : R^{n} \to R$ be $C^{1}$ .

Then, $\forall x \in O$ , and all directions $\forall h \neq = 0$ , the function has a directional derivative at $x$ in the direction $h$ , which can be calculated as
$\frac{\partial f}{\partial h} = t \to 0 lim \frac{f ( x + t h ) - f ( x )}{t} = ⟨ \nabla f (x), h ⟩ = i = 1 \sum n \frac{\partial f}{\partial x _{i}} (x) h_{i}$
In other words, the inner product of $h$ with the gradient of the function!

Proof

By the Mean Value Proposition,
$\frac{f ( x + t h ) - f ( x )}{t} = \frac{1}{t} (\frac{\partial f}{\partial x _{1}} (z_{1}) t h_{1} + \dots + \frac{\partial f}{\partial x _{n}} (z_{n}) t h_{n}) = \frac{\partial f}{\partial x _{1}} (z_{1}) h_{1} + \dots + \frac{\partial f}{\partial x _{n}} (z_{n}) h_{n}$
For $z_{1}, \dots z_{n} \in B_{∣∣ t h ∣∣} (x)$ . Then, as $t \to 0$ , the ball of $B_{∣∣ t h ∣∣} (x)$ will shrink towards $x$ , forcing all $z_{i}$ ’s to converge to $x$ ! Thus, as $t \to 0,$ we have
$\frac{\partial f}{\partial h} = tt o 0 lim \frac{f ( x + t h ) - f ( x )}{t} = \frac{\partial f}{\partial x _{1}} (x) h_{1} + \dots + \frac{\partial f}{\partial x _{n}} (x) h_{n}$

Note that sometimes the directional derivative may be denoted as
$\frac{\partial f}{\partial h} = \frac{d}{d t}_{t = 0} f (x + t h)$

Theorem: The Mean Value Theorem (Multi-Variable)

Let $f : R^{n} \to R$ be continuously differentiable. Also let $x \in R^{n}$ , $h \in R^{n}$ where $h \neq = 0$ .

Then, if the segment joining $x, x + h$ lies in $O$ , then there exists $0 < θ < 1$ such that
$f (x + h) - f (x) = ⟨ \nabla f (x + θ h), h ⟩$

This is the Mean Value Proposition, with the additional assertion that $z_{1}, \dots z_{n}$ are assumed to be at the same point.

Proof

Let $ϕ : R \to R$ , $ϕ (t) = f (x + t h)$ . We know that for $t = 1, 0$ , we have
$ϕ (1) = f (x + h) ϕ (0) = f (x)$
Then, $f (x + h) - f (x) = ϕ (1) - ϕ (0) = ϕ^{'} (θ) (1 - 0)$ , $0 < θ < 1$ , by the single-variable MVT, and furthermore, as the derivative of $ϕ (t)$ is the directional derivative,
$f (x + h) - f (x) = ϕ^{'} (θ) = ⟨ \nabla f (x + θ h), h ⟩$

We can also use directional derivatives to make a few extra inferences.

Note that if $p$ is a vector of norm 1, we can interpret the directional derivative as the rate of change in a particular direction!

Theorem: Fastest Rate of Change

Let $f : R^{n} \to R$ , $C^{1}$ . Fix $x$ , and assume $\nabla f (x) \neq = 0$ . Then, the maximum of the directional derivative at $x$ is given as
$∣∣ P ∣∣ = 1 max \frac{\partial f}{\partial P} (x)$
Is attained for
$P = \frac{\nabla f ( x )}{∣∣\nabla f ( x ) ∣∣}$
In other words, the direction of the gradient.

Proof

If $∣∣ P ∣∣ = 1$ , we have
$\frac{\partial f}{\partial P} (x) = ⟨ \nabla f (x), P ⟩$
By Cauchy-Schwarz, this is
$\leq ∣∣\nabla f (x) ∣∣ \cdot ∣∣ P ∣∣ = ∣∣\nabla f (x) ∣∣$
We have an upper bound on our directional derivative! We can attain our upper bound if $P = \frac{\nabla f ( x )}{∣∣\nabla f ( x ) ∣∣}$ .
$⟨ \nabla f (x), P ⟩ = ⟨ \nabla f (x), \frac{\nabla f ( x )}{∣∣\nabla f ( x ) ∣∣} ⟩ = \frac{1}{∣∣\nabla f ( x ) ∣∣} ⟨ \nabla f (x), \nabla f (x)⟩ = ∣∣\nabla f (x) ∣∣$
We’ve found a maximizer.

Furthermore, we can use directional dervatives to prove a notion of continuity on multiple variables.

Theorem: Partial Derivatives and Continuity

Let $f : R^{n} \to R$ , and assume $f$ is continuously differentiable. Then, $f$ is continuous.

Recall that if $f$ is $C^{1}$ , then all partials exist and are continuous.

Proof

We look at $f (x + h) - f (x)$ , and claim that as $h \to 0$ , $f (x + h) \to f (x)$ .

By MVT, for some $0 < θ < 1$ ,
$∣ f (x + h) - f (x) ∣ = ∣ ⟨ \nabla f (x + θ h), h ⟩ ∣$
By Cauchy Schwarz, we can bound this by
$\leq ∣∣\nabla f (x + θ h) ∣∣ \cdot ∣∣ h ∣∣$
But as $h$ is convergent, and the functions are continuous, we can find a bound for the first term!
$∣∣\nabla f (x + θ h) ∣∣ \leq max ∣∣\nabla f (y) ∣∣ \cdot ∣∣ x - y ∣∣ \leq C$
So, this drops to 0.

By this proof, in fact, if all the partials exist $\forall x \in O$ and are bounded, then $f$ is still continuous!

We end with a small remark that will segway into the next section. Let $f : R^{n} \to R, C^{1}$ . Then,

h \to 0 lim \frac{f ( x + h ) - f ( x ) - ⟨ \nabla f ( x ) , h ⟩}{∣∣ h ∣∣} = 0

This can be proven by using Cauchy-Schwarz.

We use this to define differentiable functions! $f : R^{n} \to R$ is differentiable at $x$ if $\exists Q \in R^{n}$ such that

{\frac{f ( x + h ) - f ( x ) - ⟨ Q , h ⟩}{∣∣ h ∣∣}} \to 0

as $h \to 0$ .

This is a stronger notion than partial diffentiation! So,

$f \in C^{1}$ implies that $f$ is differentiable
$f$ differentiable implies that all parties of $f$ exist

But, the converses are not true!

Local Approximation of Real-Valued Functions

First Order Approximations

Motivation

Say we have some function, and we want to analyze the behavior of it in an area around the point $x$ . One way to do this is to choose another function $g$ that approximates $f$ , yet is simpler! We can then work with $g$ to see what properties it has (and inherits from $f$ ).

Let $O \subseteq R^{n}$ , and $x \in O$ . For a positive integer $k$ , we say that functions $f, g : O \to R$ are $k^{t h}$ order approximations of one another at $x$ if

h \to 0 lim \frac{f ( x + h ) - g ( x + h )}{∣∣ h ∣ ∣ ^{k}} = 0

We ask, can we find a first-order approximation for a given function $f$ ?

Theorem: First Order Approximation Theorem

Let $O \subseteq R^{n}$ open, $f : O \to R$ be $C^{1}$ . Then, for $x \in O$ , we have first order approximation of $f$
$h \to 0 lim \frac{f ( x + h ) - [ f ( x ) + ⟨ \nabla f ( x ) , h ⟩]}{∣∣ h ∣∣} = 0$

Proof

Recall previously that by MVT, we find $0 < θ < 1$ such that
$f (x + h) - f (x) = ⟨ \nabla f (x + θ h), h ⟩$
We can subtract both sides by $⟨ \nabla f (x), h ⟩$ and apply Cauchy Schwarz to obtain
$f (x + h) - f (x) - ⟨ \nabla f (x), h ⟩ f (x + h) - f (x) - ⟨ \nabla f (x), h ⟩ ∣ f (x + h) - f (x) - ⟨ \nabla f (x), h ⟩ ∣ \frac{∣ f ( x + h ) - f ( x ) - ⟨ \nabla f ( x ) , h ⟩ ∣}{∣∣ h ∣∣} = ⟨ \nabla f (x + θ h), h ⟩ - ⟨ \nabla f (x), h ⟩ = ⟨ \nabla f (x + θ h) - \nabla f (x), h ⟩ \leq ∣∣\nabla f (x + θ h) - \nabla f (x) ∣∣ \cdot ∣∣ h ∣∣ \leq ∣∣\nabla f (x + θ h) - \nabla f (x) ∣∣$
Because $f$ is continuously differentiable, we know that
$h \to 0 lim ∣∣\nabla f (x + θ h) - \nabla f (x) ∣∣ = 0$
So by the Comparison Lemma, we can force our original limit to be 0.

We can alternatively write this in a few ways.

Let $E (x, h)$ denote some error depending on $x$ and $h$ . Then, our approximation can be given as $f (x + h) = f (x) + ⟨ \nabla f (x), h ⟩ + E (x, h) h \to 0 lim \frac{E ( x , h )}{∣∣ h ∣∣} = 0$ As the error drops to 0 when dividing by $∣∣ h ∣∣$ , we can also say that the error is of first order, $O (∣∣ h ∣∣)$ .
Letting $y = x + h$ , $x$ fixed, we can also write our error as $f (y) = f (x) + ⟨ \nabla f (x), (y - x)⟩ + O (∣∣ x - y ∣∣)$ So if $x$ is fixed, and $y$ is sufficiently close to $x$ , then we have a close approximation!

We can also interpret this formula geometrically. In fact, interestingly enough, our first order approximation is equivalent to a tangent plane approximation of our function!

Proof

Define $G$ to be the function $G = {(x_{1}, x_{2}, f (x_{1}, x_{2}))}$ . This defines a surface in 3-dimensions. We will define the tangent plane at $(a, b)$ .

At $(a, b)$ , we find tangent directions by differentiating with respect to variables $x_{1}$ and $x_{2}$ .
$T_{1} = (1, 0, \frac{\partial f}{\partial x _{1}} (a, b)) T_{2} = (0, 1, \frac{\partial f}{\partial x _{2}} (a, b))$
We take the cross product, to find a vector orthogonal to both. This will give us a vector that is a normal to our surface.
$N = (- \frac{\partial f}{\partial x _{1}} (a, b), \frac{\partial f}{\partial x _{2}} (a, b), 1)$
We can use this to define the tangent plane at $(a, b, f (a, b))$ as
$(x_{1} - a, x_{2} - b, f (x_{1}, x_{2}) - f (a, b)) \cdot (- \frac{\partial f}{\partial x _{1}} (a, b), \frac{\partial f}{\partial x _{2}} (a, b), 1) = 0 f (x_{1}, x_{2}) - f (a, b) - \frac{\partial f}{d x _{1}} (a, b) (x_{1} - a) - \frac{\partial f}{\partial x _{2}} (a, b) (x_{2} - b) = 0 f (x_{1}, x_{2}) - [f (a, b) + \frac{\partial f}{d x _{1}} (a, b) (x_{1} - a) + \frac{\partial f}{\partial x _{2}} (a, b) (x_{2} - b)] = 0$
Thus is the same as our first-order approximation formula! Simply redefine $(x_{1}, x_{2})$ to be offsets $(a, b)$ , $(x_{1}, x_{2}) + h$ .

Second Order Approximations and Second Derivatives

Motivation

In the single-variable case, we had the second-derivative test for determining minimums and maximums. Here, we develop the corresponding test for multiple variables.

Definitions and Context

Let $A$ be an $n \times n$ matrix. Note that for any vector $x$ , the matrix-vector product

A x = y

Is equivalent to the values inner products of the $i^{t h}$ row of $A$ and $x$ ! If $A_{i}$ denotes the $i^{t h}$ row of $A$ , then

A x = (⟨ A_{1}, x ⟩, \dots, ⟨ A_{i}, x ⟩, \dots, ⟨ A_{n}, x ⟩)

This fact will be useful later!

Let $A$ be an $n \times n$ matrix. Then, the function $Q : R^{n} \to R$ given by

Q (h) = ⟨ A h, h ⟩

Is known as the quadratic function associated with the matrix $A$ .

This function gives us a clean notation for generalizing directional derivatives into higher orders!

Let $f$ be $C^{2}$ . We define the Hessian Matrix of $f$ , denoted $\nabla^{2} f$ , as the $n \times n$ matrix where for each pair of indices $i, j$ ,

(\nabla^{2} f (x))_{ij} = \frac{\partial f}{\partial x _{j} \partial x _{i}} (x)

In other words,

\nabla^{2} f (x) = \frac{\partial ^{2} f}{\partial x _{1}^{2}} ⋮ \frac{\partial ^{2} f}{\partial x _{n} \partial x _{1}} \dots ⋱ \dots \frac{\partial ^{2} f}{\partial x _{1} \partial x _{n}} ⋮ \frac{\partial ^{2} f}{\partial x _{n}^{2}}

Note that if $f$ has continuous second-order partials, then the Hessian Matrix is symmetric because the $ij$ entry would equal the $ji$ entry!

We use the quadratic function notation to define higher order directional derivatives. If $f \in C^{2} (R)$ , $x, h$ fixed, then

$\frac{d}{d t} f (x + t h) = ⟨ \nabla f (x + t h), h ⟩ = i = 1 \sum n \frac{\partial f}{\partial x _{i}} (x + t h) h_{i}$
$\frac{d ^{2}}{d t ^{2}} f (x + t h) = ⟨ \nabla^{2} f (x + t h, h)⟩ = i, j = 1 \sum n \frac{\partial ^{2} f}{\partial x _{i} \partial x _{j}} (x + t h) h_{i} h_{j}$

Notice the pattern!

Proof

For (1), this is a chain rule.

For (2), we have
$\frac{d}{d t} [\frac{d}{d t} f (x + t h)] = \frac{d}{d t} [i = 1 \sum n (\frac{\partial f}{\partial x _{i}}) (x + t h) h_{i}] = i = 1 \sum n [j = 1 \sum n \frac{\partial}{\partial x _{j}} \frac{\partial f}{\partial x _{i}} (x + t h) h_{j}] h_{i}$

Remark

If $f \in C^{3}$ , then
$\frac{d ^{3}}{d t ^{3}} f (x + t h) = i, j, k = 1 \sum n \frac{\partial ^{3} f}{\partial x _{i} \partial x _{j} \partial x _{k}} (x + t h) h_{i} h_{j} h_{k}$

In the above formulas, (2) will be quite useful in establishing a second-derivative criterion for the multi-variable case. However, we will also need some way to estimate the sizes of the values that quadratic functions can take on! These tools are given as follows.

Let $A$ be an $n \times n$ matrix, $A = (a_{ij})$ . The Hilbert-Schmidt norm of $A$ is given as

∣∣ A ∣ ∣_{HS} = (i, j = 1 \sum n a_{ij}^{2})^{1/2}

We think of the matrix as a long vector, and take the vector norm.

With this norm for a matrix, we can generalize the Cauchy-Schwarz Inequality!

Theorem: Generalized Cauchy Schwarz Inequality

Let $A$ be $n \times n$ , and $h \in R^{n}$ . Then,
$∣∣ A h ∣∣ \leq ∣∣ A ∣∣ \cdot ∣∣ h ∣∣$

Proof

$A h ∣∣ A h ∣ ∣^{2} = Row 1 ⋮ Row n h = ⟨ r_{1}, h ⟩ ⋮ ⟨ r_{n}, h ⟩ = (⟨ r_{1}, h ⟩^{2} + \dots + ⟨ r_{n}, h ⟩^{2}) \leq (∣∣ r_{1} ∣ ∣^{2} + \dots + ∣∣ r_{n} ∣ ∣^{2}) ∣∣ h ∣ ∣^{2} \leq ∣∣ h ∣ ∣^{2} i, j = 1 \sum n a_{ij}^{2} = ∣∣ A ∣ ∣^{2} ∣∣ h ∣ ∣^{2}$

We can also define the operator norm of $A$ as

∣∣ A ∣ ∣_{op} = ∣∣ h ∣∣ = 1 max ∣∣ A h ∣∣

Based on this, and the Generalized Cauchy-Schwarz Inequality, we can find that for $∣∣ h ∣∣ = 1$ ,

∣∣ A h ∣∣ \leq ∣∣ A ∣ ∣_{H S} ∣∣ A ∣ ∣_{o p} \leq ∣∣ A ∣ ∣_{H S}

Let $A$ be a $n \times n$ matrix. $A$ is positive definite if

⟨ A u, u ⟩ > 0 u \neq = 0

Similarly, $A$ is negative definite if

⟨ A u, u ⟩ < 0 u \neq = 0

Proposition: Properties of Positive Definite Matrices

Let $A$ be a positive definite matrix. Then, there exists a $c > 0$ such that
$Q (u) = ⟨ A u, u ⟩ \geq c ∣∣ u ∣ ∣^{2}$
For all $u \in R^{n}$ .

Proof

Note that the LHS and the RHS are both homogeneous degree 2.
$⟨ A (t u), t u ⟩ = t^{2} ⟨ A u, u ⟩ ∣∣ t u ∣ ∣^{2} = t^{2} ∣∣ u ∣ ∣^{2} \forall t > 0, \forall u \in R^{n}$
Thus, our equation is true for $u$ if and only if it is true for any other $t u$ , $(t > 0)$ .

Thus, it suffices to show that our equation is true for unit vectors $\frac{u}{∣∣ u ∣∣}$ , as by the earlier proposition, the argument applies for all $u$ . Then,
$⟨ A u, u ⟩ \geq c \forall∣∣ u ∣∣ = 1$
This creates a continuous function along a $S^{n - 1}$ hypersphere in $R^{n}$ , which is sequentially compact. Thus, it must have a minimum and maximum. Choose the minimum to find our $c$ .

In fact, we can find $c$ by taking the minimum of the eigenvalues.

Second Order Approximation and Second Derivative Test

Let $A \subseteq R^{n}$ , $f : A \to R$ . Also, let $x \in A$ . Then, we have the following definitions:

$x$ is a local minimizer if there exists a $δ > 0$ such that $f (x) \leq f (x + h) (x + h) \in A, \forall0 < ∣∣ h ∣∣ < δ$
$x$ is a local maximizer if there exists a $δ > 0$ such that $f (x) \geq f (x + h) \forall0 < ∣∣ h ∣∣ < δ$
$x$ is a local extreme point if it is either a local minimizer or a local maximizer for $f$ .

Note that $x$ is a strict minimizer / maximizer if the inequality is strictly less than or greater than.

In the single-variable case, we found that for a local extremum to occur, the derivative must be 0. We define the analogous case for multiple variables.

Theorem: Necessity for Local Extremum

Let $O \subseteq R^{n}$ open, and let $f : O \to R$ have first-order partial derivatives. If $x \in O$ is a local extreme point for $f$ , then
$\nabla f (x) = 0$

But unlike the single variable case, finding the $x$ ’s such that this holds is very difficult, as we get a system of equations! To help us with this, we need a more formal way to define the behaviors of functions! We define a test analogous to the single-variable Second-Derivative Test to help us with this.

By the Lagrange Remainder Theorem, recall that if $f : R \to R$ , $f^{''} (x)$ exists for every $x$ , then for all $x, h \in R$ , there exists a $0 < θ < 1$ such that

f (x + h) = f (x) + f^{'} (x) h + \frac{1}{2} f^{''} (x + θ h) h^{2}

We can generalize this to the multi-variable case!

Theorem: Multi-Variable Remainder Theorem

Let $f : R^{n} \to R$ , $C^{2}$ . Then, for $x, h \in R^{n}$ , there exists $0 < θ < 1$ such that
$f (x + h) = f (x) + ⟨ \nabla f (x), h ⟩ + \frac{1}{2} ⟨ \nabla^{2} f (x + θ h) h, h ⟩$

Proof

Let $ϕ (t) = f (x + t h)$ . Then,
$ϕ (1) = ϕ (0) + ϕ^{'} (0) + \frac{1}{2} ϕ^{''} (θ)$
For some $0 < θ < 1$ .

Notice that this holds only because we assumed the second order derivatives are continuous. We find each term to be
$ϕ^{'} (0) = \frac{d}{d t}_{t = 0} f (x + t h) = ⟨ \nabla f (x), h ⟩ ϕ^{''} (t) = \frac{d ^{2}}{d t ^{2}} f (x + t h) = ⟨ \nabla^{2} f (x + t h) h, h ⟩$

This is in fact a second order approximation of $f$ !

Theorem: Second Order Approximation Theorem

Let $f : R^{n} \to R$ , $C^{2}$ . Then,
$h \to 0 lim \frac{f ( x + h ) - [ f ( x ) + ⟨ \nabla f ( x ) , h ⟩ + \frac{1}{2} ⟨ \nabla ^{2} f ( x ) h , h ⟩]}{∣∣ h ∣ ∣ ^{2}} = 0$

Proof

$\frac{f ( x + h ) - [ f ( x ) + ⟨ \nabla f ( x ) , h ⟩ + \frac{1}{2} ⟨ \nabla ^{2} f ( x ) h , h ⟩]}{∣∣ h ∣ ∣ ^{2}} = \frac{∣ \frac{1}{2} ⟨( \nabla ^{2} f ( x + θ h ) - \nabla ^{2} f ( x )) h , h ⟩ ∣}{∣∣ h ∣ ∣ ^{2}} \leq \frac{\frac{1}{2} ∣∣ ( \nabla ^{2} f ( x + θ h ) - \nabla ^{2} f ( x )) h ∣∣∣∣ h ∣∣}{∣∣ h ∣ ∣ ^{2}} \leq \frac{1}{2} ∣∣ \nabla^{2} f (x + θ h) - θ^{2} f (x) ∣∣ \to 0$

With the Second-Order Approximation Theorem, we can define a multi-variable analogy to the Second-Derivative test.

Theorem: Second Derivative Test

Let $f : R^{n} \to R$ , $C^{2}$ .

Let $x$ be a point such that $\nabla f (x) = 0$ .

If the Hessian Matrix $\nabla^{2} f (x)$ is positive definite, then $x$ is a strict local minimizer.

If the Hessian Matrix $\nabla^{2} f (x)$ negative definite, then $x$ is a strict local maximizer.

Proof (TODO)

We know
$f (x + h) = f (x) + ⟨ \nabla f (x), h ⟩ + \frac{1}{2} ⟨ \nabla^{2} f (x) h, h ⟩ + R (h)$
With $lim_{h \to 0} \frac{R ( h )}{∣∣ h ∣ ∣ ^{2}} = 0$ .

By assumption, $\nabla f (x) = 0$ , and $\nabla^{2} f (x)$ is positive definite matrix. By definition, $\exists c$ such that $⟨ \nabla^{2} f (x) h, h ⟩ \geq c ∣∣ h ∣ ∣^{2}$ .

So,
$f (x + h) \geq f (x) + 0 + \frac{c}{2} ∣∣ h ∣ ∣^{2} + R (h)$
But we can’t guarantee that $R (h)$ is positive, and in fact, it can be negative! So now, we show that $R (h)$ could be negative, but its magnitude cannot be too large.

Since $\lim_{h \to 0$ \frac{R(h)}{||h||} = 0 $,$ \exists \delta > 0$ such that
$∣ R (h) ∣ < \frac{c}{2} ∣∣ h ∣ ∣^{2} ∣∣ h ∣∣ < δ$
Thus,
$f (x + h) \geq f (x) + \frac{c}{2} ∣∣ h ∣ ∣^{2} > f (x) 0 < ∣∣ h ∣∣ < δ$
By definition, $x$ is a strict local minimizer.

Is the converse of this theorem also true? In other words, let $f : R^{n} \to R, C^{2}$ . Assume $x$ is a local minimizer. Then $\nabla f (x) = 0$ . But what about $\nabla^{2} f (x)$ ? Does it have to be positive definite?

No! As a counterexample, let $f (x, y) = x^{4} + y^{4}$ . Then, we have a strict local minimizer at $0$ , but the $\nabla^{2} f (x)$ is not positive definite.

But then, what’s a necessary condition for a minimizer? We discuss this below.

Let $A$ be a symmetric $n \times n$ matrix.

$A$ is positive semi-definite if $⟨ A h, h ⟩ \geq 0$ for all $h \in R^{n}$ .
$A$ is negative semi-definite if $⟨ A h, h ⟩ \geq 0$ for all $h \in R^{n}$ .

Note that the inner product can now be 0, where it couldn’t be before!

Theorem: Necessity Condition for Extremum

If $f : R \to R, C^{2}$ has a minimizer at $x$ , then $f^{'} (x) = 0$ , $f^{''} (x) \geq 0$ (can be 0). In other words, $\nabla^{2} f (x)$ has to be positive semi-definite.

Proof

Look at $ϕ (t) = f (x + t h)$ . Then, $ϕ (t) \in C^{2}$ , and has a minimizer at $t = 0$ . So, $ϕ^{'} (0) = ⟨ \nabla f, h ⟩ = 0$ , and $ϕ^{''} (0) = ⟨ \nabla^{2} f (x) h, h ⟩ \geq 0$ . In other words, the matrix is positive semi-definite.

In particular, if $f : R^{n} \to R, C^{2}$ , has a local minimum at $x$ , then all

\frac{\partial ^{2}}{\partial x _{i}^{2}} f (x) \geq 0

And similarly, if $f$ has a local maximum at $x$ , then all

\frac{\partial ^{2}}{\partial x _{i}^{2}} f (x) \leq 0

Proposition (IMPORTANT FOR EXAM)

Let $U$ be open in $R^{n}$ , $f : U \to R, C^{2}$ . Assume the Laplacian of $f$ at $x$ is positive for all $x \in U$ .
$Δ f (x) = \frac{\partial ^{2} f}{\partial x _{1}^{2}} (x) + \dots + \frac{\partial ^{2} f}{\partial x _{n}^{2}} (x) > 0 \forall x \in U$
Then, $f$ has no maximizer in $U$ .

Proof

Assume by contradiction that $f$ has a maximizer in $U$ , given as $x$ . Then, $\nabla f (x) = 0$ , and $\nabla^{2} f$ is positive semi-definite. So, $\frac{\partial ^{2} f}{\partial x _{i}^{2}} \leq 0$ , meaning
$Δ f (x) = \frac{\partial ^{2} f}{\partial x _{1}^{2}} (x) + \dots + \frac{\partial ^{2} f}{\partial x _{n}^{2}} (x) \leq 0$
Which is a contradiction.

Theorem

Let $U$ be a bounded open set, $\overset{ˉ}{U} = U \cup Boundary of U$ (giving us a sequentially compact set).

Let $f \in C^{2} (U)$ , $f$ continuous on $\overset{ˉ}{U}$ , satisfy
$Δ f (x) = 0 \forall x \in U$
Then,
$\overset{ˉ}{U} max f = Boundary U max f$
In other words, the maximum always occurs at the boundary.

Proof

Because $U \subseteq \overset{ˉ}{U}$ , $max_{Boundary U} f \leq max_{\overset{ˉ}{U}} f$ trivially.

Look at $f_{ϵ} (x) = f (x) + ϵ ∣∣ x ∣ ∣^{2}$ . Note that by this, $f_{ϵ} \geq f$ . Then,
$Δ f_{ϵ} (x) = \nabla f (x) + 2 n ϵ > 0$
Thus, $f_{ϵ}$ has no interior maximum, and so the maximum of $f_{ϵ}$ occurs at the boundary of $U$ .
$\overset{ˉ}{U} max f \leq \overset{ˉ}{U} max f_{ϵ} \leq Boundary U max f + ϵ ∣∣ x ∣ ∣^{2} \leq Boundary U max f + ϵK$
Where $K$ depends on $U$ . Let $ϵ \to 0$ to get
$\overset{ˉ}{U} max f \leq Boundary U max f$
We showed the inequalities in both directions, so we have equality. We are done.

Higher Order Approximations

Let $x \in R^{n}$ , and let there be a multi-index $α = (α_{1}, \dots, α_{n})$ where $α_{i} \in {0, 1}$ (a vector of 1’s and 0’s).

The multi-index will be used to “select” things we want later on!

With the multi-index, we define operations

∣ α ∣ = α_{1} + \dots + α_{n} α! = α_{1}! \dots α_{n}! x^{α} = x_{1}^{α_{1}} \dots x_{n}^{α_{n}} \partial^{α} f = (\frac{\partial ^{∣ α ∣}}{\partial x _{1}^{α_{1}} \dots \partial x _{n}^{α_{n}}} f) (x)

Proposition: Multinomial Formula

$(x_{1} + \dots + x_{n})^{k} = ∣ α ∣ = k \sum \frac{k !}{α !} x^{α}$

Proof

We prove this by induction. Start with $n = 2$ . Then, we have
$(x_{1} + x_{2})^{k} = i = 0 \sum k \frac{k !}{i ! ( k - i )!} x_{1}^{i} x_{2}^{k - i}$
Now, suppose that our formula is true for $n - 1$ ( $n \geq 3$ ). Prove it for $n$ .
$((x_{1} + \dots + x_{n - 1}) + x_{n})^{k} = i = 0 \sum k \frac{k !}{i ! ( k - i )!} (x_{1} + \dots + x_{n - 1})^{i} x_{n}^{k - i} = i = 0 \sum k \frac{k !}{i ! ( k - i )!} ∣ β ∣ = i \sum \frac{i !}{β !} \tilde{x}^{β} x_{n}^{k - i}$
Define $α = (β, k - i)$ , $β$ having length $i$ . Then, $α$ has length $k$ , and
$\frac{1}{( k - i )! ( β )!} = \frac{1}{α !} \tilde{x}^{β} x_{n}^{k - i} = x_{1}^{β_{1}} \dots x_{n - 1}^{β_{n - 1}} x_{n}^{k - i} = x^{α}$
So,
$i = 0 \sum k \frac{k !}{i ! ( k - i )!} ∣ β ∣ = i \sum \frac{i !}{β !} \tilde{x}^{β} x_{n}^{k - i} = ∣ α ∣ = k \sum \frac{k !}{α !} x^{α}$

Let $f \in C^{k} (R^{n})$ . Look at

ϕ (t) = f (x + t h), ϕ : R \to R, C^{1}

We know that

ϕ (1) = ϕ (0) = ϕ^{'} (0) + \frac{1}{2} ϕ^{''} (0) + \dots + \frac{1}{( k - 1 )!} ϕ^{k - 1} (0)

The one dimensional Taylor expansion!

Express $ϕ^{k} (t)$ in terms of partials of $f$ , $k \in N$ .

ϕ^{'} (t) ϕ^{''} (t) ϕ^{k} (t) = ⟨ \nabla f (x + t h), h ⟩ = (h \cdot \nabla) (x + t h) = h_{1} \frac{\partial f}{\partial x _{1}} (x_{1} + t h) + \dots + h_{n} \frac{\partial f}{\partial x _{n}} (x + t h) = [(h_{1} \partial_{1} + \dots + h_{n} \partial_{n}) f] (x + t h) = ⟨ \nabla^{2} f (x + t h) h, h ⟩ = [(h_{1} \partial_{1} + \dots + h_{n} \partial_{n})^{2} f] (x + t h) = [(h_{1} \partial_{1} + \dots + h_{n} \partial_{n})^{k} f] (x + t h) = ∣ α ∣ = k \sum \frac{k !}{α !} (h^{α} \partial^{α} f) (x + t h)

Let $t = 0$ . Then, we have

\frac{1}{i !} ϕ^{i} (0) = ∣ α ∣ = i \sum \frac{1}{α !} \partial^{α} f (x) h^{α}

We have proven the following.

Theorem: Higher Order Approximations

Let $f : R^{n} \to R, C^{k}$ . Then,
$f (x + t h) = j = 0 \sum k - 1 ∣ α ∣ = j \sum \frac{1}{α !} \partial^{α} f (x) h^{α} + ∣ α ∣ = k \sum \frac{1}{α !} \partial^{α} f (x + θ h) h^{α} = j = 0 \sum k - 1 ∣ α ∣ \leq k - 1 \sum \frac{1}{α !} \partial^{α} f (x) h^{α} + ∣ α ∣ = k \sum \frac{1}{α !} \partial^{α} f (x + θ h)$
For some $0 < θ < 1$

Notice the similarity with $ϕ : R \to R, C^{k}$ .
$ϕ (x + h) = i = 0 \sum k - 1 \frac{1}{i !} ϕ^{i} (x) h^{i} + \frac{1}{k !} ϕ^{k} (x + θ h) h^{k}$
For $0 < θ < 1$ , which is the 1-dimensional approximation formula!

Linear Map Approximations of Non-Linear Mappings

Motivation

Before, we studied linear mappings, or in other words, functions that can be expressed as linear transformations. Now, we turn to examine mappings that may not necessarily be linear!

Linear Mappings

We say a function $T : R^{n} \to R^{m}$ is linear if for all $α, β \in R$ , $u, v \in R^{n}$

T (αu + β v) = α T (u) + βT (v)

Theorem: Linear Mappings as Matrices

If $T : R^{n} \to R^{m}$ is linear, then there exists a unique $m \times n$ matrix $A$ such that
$T (u) = A u$

Proof

Uniqueness

Let $A, B$ be two non-unique matrices satisfying our property. Then,
$A u - B u = T (u) - T (u) = 0 ⟹ (A - B) u = 0$
Now, observe that for any row of $(A - B)$ , we are taking the dot product of $u$ with that row. Let $u$ be the row, to see that the norm must be 0. This is only possible if the row is the 0 vector.

Existence

Let $e_{1}, \dots e_{n}$ be the standard basis. Represent $u$ as
$u = u_{1} e_{1} + \dots + u_{n} e_{n}$
Then,
$T (u) = T (u_{1} e_{1} + \dots + u_{n} T (e_{n}) = u_{1} T (e_{1}) + \dots + u_{n} T (e_{n}) = (T (e_{1}), \dots T (e_{n})) u_{1} ⋮ v_{n}$

As given above, linear transformations can be given as their matrices, and in fact, many of their properties can be expressed in terms of matrices as well!

First, we consider compositions of transformations. Let us have linear transformations $T : R^{n} \to R^{m}$ , $S : R^{m} \to R^{k}$ . Let $A, B$ be matrices such that

T (u) = A u S (u) = B u

Then, the matrix of the composition of these transformations is

T (S (u)) = A (B u) = (A B) u S (T (u)) = B (A u) = (B A) u

Which is the product of the matrices!

Let’s now consider inverses of transformations.

Theorem: Invertible Transformations

$T : R^{n} \to R^{n}$ linear, is invertible (as a function) if and only if the corresponding matrix $A$ is invertible as a matrix if and only if $det (A) \neq = 0$ .

We commonly determine that transformations are invertible by checking the matrices!

Theorem: Properties of Invertible Matrices

Let $A$ be an $n \times n$ matrix. Then, $A$ is invertible if and only if $\exists c > 0$ such that
$∣∣ A u ∣∣ \geq c ∣∣ u ∣∣ u \in R^{n}$

By definition, $A$ is invertible if there exists a matrix $A^{- 1}$ such $A A^{- 1} = I$ .

Proof

Proof ( $\leftarrow$ )

If $∣∣ A u ∣∣ \geq c ∣∣ u ∣∣$ for all $u \in R^{n}$ , then the null space of $A$ is ${0}$ , as if $A u = 0$ , then $∣∣ A u ∣∣ \geq c ∣∣ u ∣∣$ forces $u = 0$ .

From linear algebra, we know that if $A$ is on finite dimensions $n \times n$ , and the null space is ${0}$ , then the range of $A$ is $R^{n}$ and $A$ is invertible.

Proof ( $\to$ )

Conversely, if $\exists A^{- 1}$ such that $A A^{- 1} = A^{- 1} A$ , then we wish to find a $c > 0$ such that $∣∣ A u ∣∣ \geq c ∣∣ u ∣∣$ .

Because $A^{- 1} A = I$ , then $A^{- 1} A u = u$ .
$u = ∣∣ A^{- 1} A u ∣∣ \leq ∣∣ A^{- 1} ∣∣∣∣ A u ∣∣$
This is the generalized Cauchy Schwarz inequality!

Thus,
$∣∣ A u ∣∣ \geq \frac{1}{∣∣ A ^{- 1} ∣∣} ∣∣ u ∣∣$
Let $c = \frac{1}{∣∣ A ^{- 1} ∣∣}$ .

Recall that $∣∣ A ∣∣ = (\sum a_{ij}^{2})^{1/2}$ .

Is $V$ is a vector space with bases $v_{1}, \dots v_{n}$ , and also $w_{1}, \dots w_{n}$ , and the bases are related by $(v_{1}, \dots v_{n}) = (w_{1}, \dots w_{n}) C$

If the matrix of $T$ with respect to ${v_{1}, \dots v_{n}}$ is $A$ , ${w_{1}, \dots w_{n}}$ is $B$ , then $A = CB C^{- 1}$ .

Proof

We know that $(v_{1} \dots v_{n}) = (w_{1}, \dots w_{n}) C$ , then
$(T (v_{1}), \dots T (v_{n})) = (T (w_{1}), \dots T (w_{n})) C (v_{1}, \dots v_{n}) A = (w_{1}, \dots w_{n}) C A$
So, $A = C^{- 1} BC$ .

The Derivative Matrix and Differential

We consider the following classes of mappings. These are mappings that may be non-linear, and are approximatable by linear mappings.

Let $O \subseteq R^{n}$ , and consider mapping $F : O \to R^{m}$ represented as component functions

F = (F_{1}, \dots, F_{m})

We have the following definitions for $F$

$F$ is said to have first-order partial derivatives at $x \in O$ , provided that for all $1 \leq i \leq m$ , $F_{i}$ has first-order partial derivatives at $x$ .
$F$ is said to have first-order partial derivatives, if it has first-order partial derivatives for all $x \in O$ .
$F$ is said to be continuously differentiable provided that all of the $F_{i}$ ’s are continuously differentiable.

Theorem: Continuity on Mappings

Let $O \subseteq R^{n}$ , and $F : O \to R^{m}$ .

Let $F$ be continuously differentiable. Then, $F$ is continuous.

Now, define $F : O \to R^{m}$ with first-order partials at $x \in O$ . We define the derivative matrix of $F$ at $x$ , denoted $D F (x)$ , as the matrix whose $ij$ th entry is given by

D F (x)_{ij} = \frac{\partial F _{i}}{\partial x _{j}} (x) D F (x) = \frac{\partial F _{1}}{\partial x _{1}} (x) \frac{\partial F _{m}}{\partial x _{1}} (x) \dots ⋮ \dots \frac{\partial F _{1}}{\partial x _{n}} \frac{\partial F _{m}}{\partial x _{n}} ⋮ = \nabla F_{1} (x) ⋮ \nabla F_{m} (x)

We define the gradient of a function as a row vector.

We use this derivative matrix to generalize our findings in earlier sections.

Theorem: Mean Value Theorem for Mappings

Let $F : R^{m} \to R^{n}, C^{1}$ . Then, for $x, h \in R^{n}$ , we find $0 < θ_{1} < 1, \dots, 0 < θ_{m} < 1$ such that
$F (x + h) - F (x) = \nabla F_{1} (x + θ_{1} h) ⋮ \nabla F_{m} (x + θ_{m} h) h$

This is the multi-variable MVT applied to each component!

Proof

Apply the MVT for each $F_{i} : R^{n} \to R, C^{1}$ .

Mean Value Theorem Misconception

Note that above, if we chose all $θ_{i}$ ’s to be equal, then we would have
$F (x + h) - F (x) = D F (x + θ h) h$
Which seems like a very clean generalization of the MVT! However, it is not guaranteed that we can find a single $θ$ that works for each $θ_{i}$ .

Theorem: First-Order Approximation Theorem for Mappings

Let $F : R^{n} \to R^{m}, C^{1}$ . Then,
$h \to 0 lim \frac{∣∣ F ( x + h ) - [ F ( x ) + D F ( x ) h ] ∣∣}{∣∣ h ∣∣} = 0$

Proof

The $i^{t h}$ component of the above quantity, from the previous chapter, is
$\frac{F _{i} ( x + h ) - F _{i} ( x ) - ⟨ \nabla F _{i} ( x ) , h ⟩}{∣∣ h ∣∣} \to 0$

It can be shown that at $x$ , $D F (x)$ is the only matrix in which this limit holds.

Theorem

Let $F : R^{n} \to R^{m}$ . Fix $x$ , and suppose there exists an $m \times n$ matrix $A$ such that
$h \to 0 lim \frac{∣∣ F ( x + h ) - [ F ( x ) + A h ] ∣∣}{∣∣ h ∣∣} = 0$
Then, the mapping $F$ has first-order partial derivatives at $x$ , and $A = D F (x)$ .

Proof

Look at the $i^{t h}$ component.
$h \to 0 lim \frac{F _{i} ( x + h ) - F _{i} ( x ) - ⟨ \nabla F _{i} ( x ) , h ⟩}{∣∣ h ∣∣} \to 0 h \to 0 lim \frac{F _{i} ( x + h ) - F _{i} ( x ) - ⟨( a _{i 1} , \dots a _{in} ) , ( h _{1} , \dots h _{n} )⟩}{∣∣ h ∣∣} \to 0$
In particular, for $h = t e_{j}$ , $t \to 0$ , we get
$t \to 0 lim \frac{F _{i} ( x + t e _{j} ) - F _{i} ( x ) - t a _{ij}}{∣ t ∣} = 0 t \to 0 lim \frac{F _{i} ( x + t e _{j} ) - F _{i} ( x ) - t a _{ij}}{t} = 0 t \to 0 lim \frac{F _{i} ( x + t e _{j} ) - F _{i} ( x )}{t} = a_{ij} \frac{\partial F _{i}}{\partial x _{j}} (x) = a_{ij}$

We also say $F : R^{n} \to R^{m}$ is differentiable at $x$ if there exists an $A$ , $m \times n$ matrix such that

h \to 0 lim \frac{∣∣ F _{i} ( x + h ) - [ F _{i} ( x ) + A h ] ∣∣}{∣∣ h ∣∣} = 0

So, $F \in C^{1} (R^{n})$ implies that F is differentiable $\forall x \in R^{n}$ , which implies that $D F (x)$ exists $\forall x \in R^{n}$ .

These are strict implications! See the examples below.

Example: Counterexamples

The below function is an example of $f : R^{2} \to R$ for which $D f (x)$ exists $\forall x \in R^{2}$ , but there does not exist an $A$ such that
$\frac{f ( x + h ) - f ( x ) - [ a _{1} h _{1} + a _{2} h _{2} ]}{∣∣ h ∣∣} = 0 f (x_{1}, x_{2}) = {\frac{x _{1} x _{2}}{x _{1}^{2} + x _{2}^{2}} 0 (x_{1}, x_{2}) \neq = (0, 0) (x_{1}, x_{2}) = (0, 0)$

Example

Let $F : R^{n} \to R^{m}, C^{1}$ . Assume $F (0) = 0, D F (0)$ satisfies $∣∣ D F (0) h ∣∣ \geq ∣∣ h ∣∣, \forall h \in R^{n}$ .

Prove that $\exists δ > 0$ such that $∣∣ F (h) ∣∣ \geq \frac{1}{2} ∣∣ h ∣∣, \forall∣∣ h ∣∣ < δ$ .

We know that
$h \to 0 lim \frac{F ( x + h ) - F ( 0 ) - D F ( 0 ) h}{∣∣ h ∣∣} = 0$ $∣∣ F (h) ∣∣ = ∣∣ F (h) - D F (0) h + D F (0) h ∣∣ \geq ∣∣ D F (h) ∣∣ - ∣∣ F (h) - D F (0) h ∣∣ \frac{∣∣ F ( h ) ∣∣}{∣∣ h ∣∣} = \frac{∣∣ F ( h ) - D F ( 0 ) h + D F ( 0 ) h ∣∣}{∣∣ h ∣∣} \geq \frac{∣∣ D F ( h ) ∣∣}{∣∣ h ∣∣} - \frac{∣∣ F ( h ) - D F ( 0 ) h ∣∣}{∣∣ h ∣∣} \geq 1/2$
Since
$h \to 0 lim \frac{F ( x + h ) - F ( 0 ) - D F ( 0 ) h}{∣∣ h ∣∣} = 0$
We know that $\exists δ > 0$ such that
$\frac{∣∣ F ( h ) - D F ( 0 ) h ∣∣}{∣∣ h ∣∣} \leq \frac{1}{2}$

The Chain Rule

From the single-variable case, recall that for $g, f$ , we can find the derivative of $(g \circ f)^{'} (x)$ as

(g \circ f)^{'} (x) = \frac{d}{d x} g (f (x)) = g^{'} (f (x)) f^{'} (x)

We can generalize this rule to higher dimensions!

Theorem: The Chain Rule

Let $O \subseteq R^{n}$ , and let $F : O \to R^{m}$ be continuously differentiable. Also let $U \subseteq R^{m}$ to define $g : U \to R$ continuously differentiable.

Suppose that $F (O) \subseteq U$ . Then, the composition $g \circ F$ is also continuously differentiable, and for $1 \leq i \leq n$ , we can find its partial derivative as
$\frac{\partial}{\partial x _{i}} (g \circ F) (x) = \nabla (g \circ F) (x) = \nabla g (F (x)) D F (x)$

Theorem: The Chain Rule for General Mappings

Let $O \subseteq R^{n}$ open. Let $F : O \to R^{m}$ , and let $U \subseteq R^{m}$ open to define $G : U \to R^{k}$ . Let $F, G$ be continuously differentiable.

Suppose that $F (O) \subseteq U$ . Then, their composition $G \circ F$ is also continuously differentiable, and for each $x$ , we can find
$D (G \circ F) (x) = D G (F (x)) \cdot D F (x)$

The Inverse Function Theorem

Motivation

The Inverse Function Theorem provides a sufficiency condition for when a function is one-to-one and invertible, and when we can compute the inverse. We generalize this to higher dimensions here!

Inverse Function Theorem: 1D, 2D

Recall the single-variable Inverse Function Theorem.

Theorem: Inverse Function Theorem (One Dimension)

Let $f : R \to R$ continuously differentiable, and let $x_{0} \in R$ such that $f^{'} (x) \neq = 0$ .

Then, there is an open interval $I$ around $x_{0}$ , and an open interval containing $f (x_{0})$ such that the function
$f : I \to J$
Is 1-1 and onto. Furthermore, in these intervals, we can define the function inverse $f^{- 1} : J \to I$ continuously differentiable as
$f^{- 1} (y)^{'} = \frac{1}{f ^{'} ( f ^{- 1} ( y ))}$
for all $y \in V$ .

This theorem is pretty important, as it tells us when we can find a function’s inverse! More importantly, it asserts that even if an entire function is not invertible, it may have smaller intervals where it is invertible.

We now generalize this theorem to higher dimensions!

We say that an open subset of $R^{n}$ containing $x$ is a neighborhood of the point $x$ . Using this definition, we will now generalize the Inverse Function Theorem to 2 dimensions.

Theorem: Inverse Function Theorem (Two Dimensions)

Let $F : R^{2} \to R^{2}$ , $C^{1}$ . Suppose at $(x_{0}, y_{0})$ , $D F (x_{0}, y_{0})$ is invertible.

Then, there exists a neighborhood $U$ around $(x_{0}, y_{0})$ , and a neighborhood $V$ around $F (x_{0}, y_{0})$ such that
$F : U \to V$
Is 1-1 and onto. Furthermore, $F^{- 1} : V \to U$ is $C^{1}$ , and for a point $F (x, y) = (u, v)$ where $(x, y) \in U, (u, v) \in V$ , we can find the derivative matrix of the inverse as
$D F^{- 1} (u, v) = [D F (x, y)]^{- 1}$

The inverse of the derivative matrix at $(x, y)$ !

Example: Inverse Function Theorem (2D)

$F (x, y) = (x^{2} - y^{2}, 2 x y)$
We find that
$det D F (x, y) = det [2 x 2 y - 2 y 2 x] = 4 (x^{2} + y^{2})$
So for all $(x, y) \neq = (0, 0)$ , our derivative matrix is non-zero! So, by the 2D Inverse Function Theorem, for any $(x_{0}, y_{0}) \neq = (0, 0)$ , there exists a neighborhood $U$ of $(x_{0}, y_{0})$ , $V$ of $F (x_{0}, y_{0}) = (x_{0}^{2} - y_{0}^{2}, 2 x_{0} y_{0})$ such that
$F : U \to V$
Is 1-1 and onto.

We ask, what happens at $(0, 0)$ ? Does there exist a neighborhood $U$ of $(0, 0)$ such that $F$ is 1-1 on $U$ ?

In fact, the answer is that this is impossible, because $F (x, y) = F (- x, - y)$ . So, for any neighborhood around $(0, 0)$ , $F$ is not one-to-one.

Example: Inverse Function Theorem (2D, 2)

Let $ϕ : R^{2} \to R, C^{1}$ , and
$F (x, y) = (ϕ (x, y), ϕ^{2} (x, y))$
We find derivative matrix
$F (x, y) = [\frac{\partial ϕ}{\partial x} 2 ϕ \frac{\partial ϕ}{\partial x} \frac{\partial ϕ}{\partial y} 2 ϕ \frac{\partial ϕ}{\partial y}]$
As the determinant of this matrix is always 0, we find that $F$ is not invertible anywhere.

Example: Inverse Function Theorem (2D, 3)

Note that the Inverse Function Theorem provides a sufficiency condition, and is not a necessity.

To show this, we find a $F : R^{2} \to R^{2}, C^{1}$ such that
$det (D F (x_{0}, y_{0})) = 0$
Yet $F$ is 1-1 and onto. Let $F (x, y) = (x^{3}, y^{3})$ . Then, even though the determinant of the derivative matrix $0$ at $x = 0$ or $y = 0$ , $F$ is 1-1 and onto.

However, we can prove that for all 3 conclusions of our theorem are to hold, then our our assumptions must hold. Let $F : R^{2} \to R^{2}, C^{1}$ and assume that $D F (0, 0)$ is not invertible, ad $F$ is 1-1, $F$ is onto.

By way of contradiction, as
$F \circ F^{- 1} (x, y) = (x, y)$
Then $(D F) F^{- 1} (x, y) \circ (D F^{- 1}) (x, y) = I$ , but $D F$ is not invertible at some point, which is not possible, as its inverse exists!

Example: Locality of Inverse Function Theorem (2D)

$F (x, y) = (e^{x} cos y, e^{x} sin y)$
We have
$D F (x, y) = [e^{x} cos y e^{x} sin y - e^{x} sin y e^{x} cos y]$
Where $det (D F) = e^{2 x} \neq = 0$ . So, the 2D Inverse Function Theorem holds for all $(x, y)$ ! In other words, for all $(x, y)$ we can find a local neighborhood such that $F$ is 1-1 and onto.

However, this note that this theorem does not hold globally, just locally.

$F$ is not 1-1 globally as we can find $F (x, y) = F (x, y + 2 kπ)$ .

$F$ is not onto globally, as there does not exist any $(x, y)$ such that $F (x, y) = (0, 0)$ .

Stability of Non-Linear Mappings

We will now introduce concepts necessary to generalize the inverse function theorem.

Theorem

For an $n \times n$ matrix $A$ , the following are equivalent:

$A$ is invertible

$\exists c > 0$ such that $∣∣ A h ∣∣ \geq c ∣∣ h ∣∣$ , $\forall h \in R^{n}$ .

We say that a mapping $F : O \to R^{n}$ , $O$ open, is stable if $\exists c > 0$ such that

∣∣ F (x) - F (y) ∣∣ \geq c ∣∣ x - y ∣∣ \forall (x, y) \in O

Remark

$F$ stable implies that $F$ is 1-1.

As a brief proof, if $F (x) = F (y)$ , then $∣∣ F (x) - F (y) ∣∣ \geq ∣∣ x - y ∣∣ \to x = y$ .

Note that $F$ is stable if and only if $F^{- 1}$ is Lipschitz (as the inequalities are flipped!)

Interestingly, we find that matrices that are sufficiently close to an invertible matrix are also invertible. In other words, matrices that are close to 1-1 matrices are also 1-1!

Lemma

Let $A$ be an $n \times n$ matrix, and assume that $\exists c > 0$ such that
$∣∣ A h ∣∣ \geq c ∣∣ h ∣∣ h \in R^{n}$
Now let $B$ be an $n \times n$ matrix such that $∣∣ A - B ∣∣ \leq \frac{c}{2}$ . Then, $∣∣ B h ∣∣ \geq \frac{c}{2} ∣∣ h ∣∣$ .

Proof

$∣∣ B h ∣∣ = ∣∣ A h + (B - A) h ∣∣ \geq ∣∣ A h ∣∣ - ∣∣ (B - A) h ∣∣ \geq c ∣∣ h ∣∣ - \frac{c}{2} ∣∣ h ∣∣ \geq \frac{c}{2} ∣∣ h ∣∣$

This lets us prove the 1-1 condition on the General Inverse Function Theorem.

Theorem: Nonlinear Stability Theorem

Let $F : O \to R^{n}, C^{1}$ , $O$ open. Assume that we have a point $x^{*}$ such that $D F (x^{*})$ , is invertible.

Then there exists a neighborhood of $x^{*}$ such that

$F$ is stable on $V$ (implying $F$ is 1-1 on $U$ ).

The derivative matrix of $F$ is invertible $\forall x \in U$ .

Proof

For point 2, look at $det D F (x^{*}) \neq = 0$ . Thus, $\exists U$ neighborhood of $x^{*}$ such that $det D F (x) \neq = 0$ on $U$ .

For point 1, look at $F : B_{r} (x^{*}) \to R^{n}$ . If $x, y$ belong to the ball $B_{r} (x^{*})$ , we have
$F (x) - F (y) = \nabla F_{1} (p_{1}) ⋮ \nabla F_{n} (p_{n}) (x - y)$
For some $p_{1}, \dots p_{n}$ on the line from $x$ to $y$ .

As we know that $D F (x^{*})$ is invertible, then $\exists c > 0$ such that
$∣∣ D F (x^{*}) h ∣∣ \geq c ∣∣ h ∣∣ \forall h$
If $r$ is so small that
$∣∣ D F (x^{*}) - \nabla F_{1} (p_{1}) ⋮ \nabla F_{n} (p_{n}) ∣∣ < \frac{c}{2}$
For all $p_{1}, \dots p_{n} \in B_{r} (x^{*})$ . Then,
$∣∣ B (x - y) ∣∣ \geq \frac{c}{2} ∣∣ x - y ∣∣$

Minimization Principle, General Inverse Function Theorem

To prove the General Inverse Function Theorem, we will introduce an auxiliary function such that its minimizers are solutions of some given equation.

Suppose we have a $F : O \to R^{n}, C^{1}$ , $O$ open, and $x^{*} \in O$ where $D F (x^{*})$ is invertible. Then, from the previous section, we can find a neighborhood $U$ of $x^{*}$ such that $D F (x)$ is invertible for $x \in U$ and $\exists$ c > 0$ such that

∣∣ F (x) - F (y) ∣∣ \geq c ∣∣ x ∣∣ \forall x, y \in U

Using this, we can show the following.

Proposition: A Minimization Principle

Let $U$ open in $R^{n}$ , $F : U \to R^{n}$ , $C^{1}$ . Assume that the derivative matrix of $F$ is invertible $\forall x \in U$ .

Let $E (x) = ∣∣ F (x) - y ∣ ∣^{2}$ , the distance between $F (x)$ and $y$ . If $E$ has an (interior) minimizer at $x \in U$ , then $F (x) = y$ .

Proof

Suppose we have:

A function $G$ that transforms $x$ to $F (x) - y$ .

A function $ϕ$ that takes the squared norm of its input, $ϕ (z) = ⟨ z, z ⟩$

Then, $E (x) = (ϕ \circ G) (x) = ϕ (G (x))$ . Assume $x$ is a minimizer of $E$ . Then, $\nabla E (x) = 0$
$\nabla E (x) = (\nabla ϕ) (G (x)) * D G (x) = (\nabla ϕ) (G (x)) * D F (x) = 0$
Because $D F (x)$ is invertible, then
$(\nabla ϕ) (G (x)) = 0$
Note that because $ϕ (z) = ⟨ z, z ⟩$ , $\nabla ϕ (z) = 2 z$ . So,
$(\nabla ϕ) (G (x)) = 2 G (x) = 0 ⟹ G (x) = 0$
And as $G (x) = F (x) - y = 0, F (x) = y$ .

Lemma: The Open-Image Lemma

Suppose we have a $F : O \to R^{n}, C^{1}$ , $O$ open, and $x^{*} \in O$ where $D F (x^{*})$ is invertible. Then, $F (O)$ is open.

Proof

Let $y_{0} \in F (U)$ . By assumption, we know that $\exists x_{0} \in U$ such that $F (x_{0}) = y_{0}$ .

Let $S = {x \in U : ∣∣ x - x_{0} ∣∣ = R}$ , the sphere of radius $R$ centered around $x_{0}$ . We know that all points along this sphere are greater than $∣∣ F (x) - F (x_{0}) ∣∣ \geq c ∣∣ x - x_{0} ∣∣ = c R$ .

We will show that if $∣∣ y - y_{0} ∣∣ < \frac{c R}{2}$ , then there exists an $x \in B_{R} (x_{0})$ such that $F (x) = y$ .

Look at $min_{x \in \overset{ˉ}{B}_{R} (x_{0})} ∣∣ F (x) - y ∣∣$ , where $\overset{ˉ}{B}_{R} (x_{0})$ is the closed ball of radius $R$ around $x_{0}$ (includes the border). A minimizer exists, because we have a continuous function on a compact set. We furthermore rule out a boundary minimizer.

Let $x \in S$ (so, $∣∣ x - x_{0} ∣∣ = R$ ). We know that $∣∣ F (x) - y_{0} ∣∣ = ∣∣ F (x) - F (x_{0}) ∣∣ \geq c R$ . So,
$∣∣ F (x) - y ∣∣ \geq ∣∣ F (x) - F (x_{0}) ∣∣ - ∣∣ F (x_{0}) - y ∣∣ > \frac{c R}{2}$
But $∣∣ F (x_{0}) - y ∣∣ < \frac{c R}{2}$ , $x_{0} \in B_{R} (x_{0})$ , so no point on $S$ can be the minimizer. So, the minimizer must be in the interior, so by the previous lemma, the minimizer must be such that $F (x) = y$ .

Suppose we have a $F : O \to R^{n}, C^{1}$ , $O$ open, and $x^{*} \in O$ where $D F (x^{*})$ is invertible. Using the previous proofs, we have found that exists a neighborhood $U$ of $x^{*}$ , a neighborhood $V$ of F(x^*)$, such that:

$D F (x)$ is invertible for all $x \in U$
$\exists c$ such that $∣∣ F (x) - F (y) ∣∣ \geq c ∣∣ x - y ∣∣ \forall x, y \in U$
$F (U) = V$ .

By general proprties of functions, $F^{- 1} : V \to U$ is well defined. Finally, to show that the inverse is $C^{1}$ , we will prove that

(D F^{- 1} (y)) = (D F (x))^{- 1}

Proof

To prove this, it suffices to show that
$k \to 0 lim \frac{∣∣ F ^{- 1} ( y + k ) - F ^{- 1} ( y ) - [ D F ( x ) ] ^{- 1} ( k ) ∣∣}{∣∣ k ∣∣} = 0$
We use the notation $F (x) = y, F (x + h) = y + k$ . On the LHS, we have
$k \to 0 lim \frac{∣∣ ( x + h ) - x - [ D F ( x ) ] ^{- 1} [ F ( x + h ) - F ( x )] ∣∣}{∣∣ k ∣∣} = k \to 0 lim \frac{∣∣ [ D F ( x ) ] ^{- 1} [ D F ( x ) [ h ] - [ F ( x + h ) - F ( x )]] ∣∣}{∣∣ k ∣∣} \leq k \to 0 lim \frac{∣∣ [ D F ( x ) ] ^{- 1} [ F ( x + h ) - F ( x ) - D F ( x ) h ] ∣∣}{∣∣ k ∣∣}$
We want to show that $∣∣ k ∣∣ \geq C ∣∣ h ∣∣$ .
$∣∣ F (x + h) - F (x) ∣∣ \geq C ∣∣ h ∣∣ ⟹ ∣∣ k ∣∣ \geq C ∣∣ h ∣∣$
So, we have
$\leq k \to 0 lim \frac{[ D F ( x ) ] ^{- 1}}{c} \frac{∣∣ [ F ( x + h ) - F ( x ) - D F ( x ) h ] ∣∣}{∣∣ h ∣∣} \to 0$
This is a first order approximation! So, this goes to 0 as $k \to \infty$ , as then $h \to 0$ .

This gives us the General Inverse Function Theorem.

Theorem: General Inverse Function Theorem

Let $O \subseteq R^{n}$ open, and let $F : O \to R^{n}$ be $C^{1}$ . Now, let $D F (x^{*})$ be invertible for some $x^{*} \in O$ .

Then, there is a neighborhood $U$ of $x^{*}$ , a neighborhood $V$ of $F (x^{*})$ , such that $F : U \to V$ is 1-1 and onto. Furthermore, $F^{- 1} : V \to U$ is also $C^{1}$ , and for $y \in V, x \in U$ such that $F (x) = y$ ,
$D F^{- 1} (y) = [D F (x)]^{- 1}$

We also give a second proof of the inverse function theorem based on the contraction mapping principle.

Proof (Contraction Mapping Principle)

Let $F : R^{n} \to R^{n}, C^{1}$ . Let $x^{*} \in R^{n}$ where $D F (x^{*})$ is invertible. We will show that $\exists δ_{0} > 0$ such that if $∣∣ f (x^{*}) - y ∣∣ < \frac{δ _{0}}{2∣∣ D F ( x ^{*} ) ^{- 1} ∣∣}$ , then $\exists! x \in \overset{ˉ}{B}_{δ_{0}} (x^{*})$ such that $F (x) = y$ .

In other words, $F$ is locally one-to-one and onto in a local neighborhood of $x^{*}$ !

We want to solve $F (x) = y$ if and only if $x = x - (D F (x^{*}))^{- 1} (F (x) - y) = T (x)$ . We will use the contraction mapping principle to show that there exists a fixed point of $T (x)$ .

Create a sequence
$x_{k + 1} = T (x_{k}) = x_{k} - (D F (x^{*}))^{- 1} (F (x_{k}) - y)$

Remark $f : R \to R$ )

Notice the similarity to Newton’s method, which had root-finding formula (for
$x_{n + 1} = x_{n} - \frac{f ( x _{n} )}{f ^{'} ( x _{n} )}$

The main step is as follows: $\exists δ_{0} > 0$ such that
$∣∣ x - z - D F (x^{*})^{- 1} (F (x) - F (z)) ∣∣ < \frac{1}{2} ∣∣ x - z ∣∣ \forall x, z \in \overset{ˉ}{B}_{δ_{0}} (x^{*})$
The left hand side equals
$∣∣ (D F (x^{*}))^{- 1} (F (x) - F (z) - D F (x^{*}) (x - z)) ∣∣ \leq ∣∣ D F (x^{*})^{- 1} ∣∣∣∣ \nabla F_{1} (P_{1}) ⋮ \nabla F_{n} (P_{n}) - D F (x^{*}) (x - z) ∣∣$
Choose $δ_{0} > 0$ such that
$∣∣ D F (x^{*})^{- 1} ∣∣∣∣ \nabla F_{1} (P_{1}) ⋮ \nabla F_{n} (P_{n}) - D F (x^{*}) < \frac{1}{2} \forall P_{1}, \dots P_{n} \in B_{δ_{0}} (x^{*})$
So, we found a $δ_{0}$ such tha
$∣∣ T (x) - T (z) ∣∣ \leq \frac{1}{2} ∣∣ x - z ∣∣$
Next, we will show that $T$ maps its domain onto itself.
$T : \overset{ˉ}{B}_{δ_{0}} (x^{*}) \to \overset{ˉ}{B}_{δ_{0}} (x^{*})$
Let $∣∣ x - x^{*} ∣∣ \leq δ_{0}$ . Look at $T (x) - x^{*}$ . This is equal to
$∣∣ x - x^{*} - (D F) (x^{*})^{- 1} [F (x) - F (x^{*}) + F (x^{*}) - y] ∣∣ \leq ∣∣ x - x^{*} - D F (x^{*})^{- 1} [F (x) - F (x^{*})] ∣∣ + ∣∣ D F (x^{*})^{- 1} [F (x^{*}) - y] ∣∣ \leq \frac{1}{2} ∣∣ x - x^{*} ∣∣ + \frac{1}{2} ∣∣ x - x^{*} ∣∣ \leq δ_{0}$
So, $T$ is a contraction, and has a fixed point.

The Implicit Function Theorem

We now discuss the Implicit Function Theorem. This lets us create local descriptions of the set of points $u$ where a function is equal to 0, $F (u) = 0$ , also known as a level-curve!

2D Case: Dini’s Theorem

Let us have function $f : R^{2} \to R, C^{1}$ . We ask, when is the set

{(x, y) : f (x, y) = 0}

A $C^{1}$ curve?

Examples

$f (x, y) = x^{2} + y^{2} + 1 = 0$
This will yield an empty set, so we don’t have a $C^{1}$ curve.
$f (x, y) = x^{2} + y^{2} = 0$
This will yield 1 point, so we don’t have a $C^{1}$ curve.

What does it actually mean for a set to be a $C^{1}$ curve?

Intuitively, such a set is a $C^{1}$ curve if we can define a $C^{1}$ function to represent the points. More formally, we say $C \subseteq R^{2}$ is a $C^{1}$ curve if for all points in the set $(x_{0}, y_{0}) \in C$ , there is a $U$ neighborhood of $(x_{0}, y_{0})$ , and function $g : R \to R, C^{1}$ such that

C \cap U = {Graph y = g(x) or x = g(y)} \cap U

In other words, the points in the $U$ neighborhood can be represented by some localized output of a $C^{1}$ function!

Theorem: Dini's Theorem

Let $O$ open in $R^{2}$ , $f : O \to R, C^{1}$ . Let $(x_{0}, y_{0})$ be a point in $O$ , and assume $f (x_{0}, y_{0}) = 0, \frac{\partial f}{\partial y} (x_{0}, y_{0}) \neq = 0$ .

Then, $\exists r, R > 0$ , and a function $g : (x_{0} - r, x_{0} + r) \to (y_{0} - R, y_{0} + R), C^{1}$ such that $f (x, g (x)) = 0, \forall∣ x - x_{0} ∣ < r$ , and if
$(x, y) \in (x_{0} - r, x_{0} + r) \times (y_{0} - R, y_{0} + R)$
And $f (x, y) = 0$ , then $y = g (x)$ .

In this box, the 0-set of $f$ takes on a $C^{1}$ function $g (x)$ .

Proof

We know
$f (x_{0}, y_{0}) = 0, \frac{\partial f}{\partial y} (x_{0}, y_{0}) \neq = 0$
Without loss of generality, suppose $\frac{\partial f}{\partial y} (x_{0}, y_{0}) > 0$ . Then, $\exists R > 0, c > 0$ such that
$\frac{\partial f}{\partial y} \geq c > 0$
in the box $[x_{0} - R, x_{0} + R] \times [y_{0} - R, y_{0} + R]$ . So, because $f (x_{0}, y_{0}) = 0$ , we know that along the vertical line $(x_{0}, y \pm k)$ , $f$ is strictly increasing, so
$f (x_{0}, y - R) < 0 f (x_{0}, y + R) > 0$
We can find an interval around this vertical line where it is always negative around $f (x, y - R)$ and positive around $f (x, y + R)$ . In other words, $\exists r > 0$ such that $f (x, y_{0} - R) < 0$ if $∣ x - x_{0} ∣ < r$ , and $f (x, y_{0} + r) > 0$ if $∣ x - x_{0} ∣ < r$ .

By IVT, for we can find a $y$ such that for all fixed $x$ in our interval $∣ x - x_{0} ∣ < r$ , we find a $y$ in the vertical line such that $f (x, y) = 0$ . Define $g (x) = y$ , the unique $y$ such that $f (x, y) = 0$ , $y - y_{0} < R$ .

Basically, we find a 0 by IVT for every vertical line in this interval!

We have constructed a $g : (x_{0} - r, x_{0} + r) \to (y_{0} - r, y_{0} + r)$ such that
$f (x, g (x)) = 0 \forall x$
And if $f (x, y) = 0$ in our box, then $y = g (x)$ .

We have our function $g$ . We must now show that $g$ is $C^{1}$ , a we cannot guarantee our $y$ ’s on every sliver on the interval will form a continuous function or not.

To do this, we first show that $g$ is continuous. Let $x, x + h \in (x_{0} - r, x_{0} + r)$ . We look at $f (x + h, g (x + h)) - f (x, g (x))$ as the difference between two points $f (B) - f (A)$ , and apply MVT. So, $\exists P$ on the line from $A$ to $B$ such that
$0 = f (x + h, g (x + h)) - f (x, g (x)) = ⟨ \nabla f (P), (h, g (x + h) - g (x))⟩ = \frac{\partial f}{\partial x} (P) \cdot h + \frac{\partial f}{\partial y} (P) (g (x + h) - g (x))$
So, we get
$g (x + h) - g (x) = - \frac{\frac{\partial f}{\partial x} ( P )}{\frac{\partial f}{\partial y} ( P )} h$
We can find an upper bound for the numerator as we have a continuous function on a compact set! As we assumed that the numerator is positive, we can find a bound for the fraction, so $\exists M$ such that
$∣ g (x + h) - g (x) ∣ \leq \frac{∣ \frac{\partial f}{\partial x} ( P ) ∣}{∣ \frac{\partial f}{\partial y} ( P ) ∣} ∣ h ∣ \leq M ∣ h ∣$
Thus, $g$ is continuous, as $h \to 0$ , $P \to (x, g (x))$ and thus
$h \to 0 lim \frac{g ( x + h ) - g ( x )}{h} = - \frac{\frac{\partial f}{\partial x} ( x , g ( x ))}{\frac{\partial f}{\partial y} ( x , g ( x ))}$

This also gives us a formula for $g$ !

If we know $g$ is $C^{1}$ , differentiate

\frac{d}{d x} [f (x, g (x))] = 0

To get

\frac{\partial f}{\partial x} (x, g (x)) + \frac{\partial f}{\partial y} (x, g (x)) \cdot g^{'} (x) = 0

We can solve for $g^{'} (x)$ with this!

Implicit Function Theorem

This can be generalized to higher dimensions!

Remark

We can generalize this!

Let $f : R^{n + 1} \to R, C^{1}$ ,
$f (x_{0}, y_{0}) = 0 x_{0} \in R^{n}, y_{0} \in R$
And assume the gradient at this point is not 0. Then, by the same proof, we can find $R, r > 0$ such that $g : B_{r} (x_{0}) \to (y_{0} - R, y_{0} + R), C^{1}$ such that $f (x, g (x)) = 0$ !

Theorem: The Implicit Function Theorem

We look at points $(x, y) \in R^{n + k}$ , where $x \in R^{n}$ , $y \in R^{k}$ . Let $O$ be open in $R^{n + k}$ , $F : O \to R^{k}, C^{1}$ .

Let $(x_{0}, y_{0}) \in O$ such that $F (x_{0}, y_{0}) = 0$ , $D_{y} F (x_{0}, y_{0})$ invertible. Then, $\exists R, r > 0$ , and a function
$G : B_{r} (x_{0}) \to B_{R} (x_{0}), C^{1}$
Such that
$F (x, G (x)) = 0 \forall x \in B_{r} (x_{0})$
And if $x \in B_{r} (x_{0}), y \in B_{R} (y_{0})$ , and $F (x, y) = 0$ , then $y = G (x)$ . Also, $D G (x)$ can be computed by the chain rule.

Proof

Let $H : O \to R^{n + k}$ ,
$H (x, y) = (x, F (x, y))$
Clearly, $H (x_{0}, y_{0}) = (x_{0}, 0)$ , and
$DH (x_{0}, y_{0}) = [I_{n} D_{x} (x_{0}, y_{0}) 0 D_{y} (x_{0}, y_{0})]$
Because $D_{y} F (x_{0}, y_{0})$ is invertible, this matrix has a non-zero determinant, so the inverse function theorem applies!

So, $\exists R > 0$ and a neighborhood $V$ of $(x_{0}, 0)$ such that $H : R (x_{0}) \times B_{R} (y_{0}) \to V$ is is 1-1, onto with a $C^{1}$ inverse $H^{- 1} = V \to R^{n + k}$ . We define this inverse as
$H^{- 1} (x, y) = (M (x, y), N (x, y))$
We the fact that $H (M (x, y), N (x, y)) = (x, y)$ . By plugging $M, N$ into $H$ , we get
$(M (x, y), F (M (x, y), N (x, y))) = (x, y)$
So, $M (x, y) = x$
$(x, F (x, N (x, y))) = (x, y)$
Define $G (x) = N (x, 0)$ . Pick $0 < r < R$ such that $B_{r} (x_{0}) \times {0,} \subseteq V$ . Because $G (x) = N (x, 0)$ which is $C^{1}$ , $G (x)$ is too $C^{1}$ . Subbing this in, we get
$F (x, N (x, 0)) = 0$
Now, if $x \in B_{r} (x_{0}), y \in B_{R} (y_{0})$ , and $F (x, y) = 0$ , we write that $H^{- 1} (H (x, y)) = 0$
$⟹ (M (x, F (x, y)), N (x, F (x, y))) = (x, y)$
If $F (x, y) = 0$ , then $y = N (x, 0) = G (x)$ .

Example: Implicit Function Theorem

Let $F : R^{3} \to R^{2}, C^{1}$ . Assume $F (0, 0) = (0, 0)$ and
$D F (0, 0) = [010010]$
Which of the following is true? $\exists g, h \in C^{1}, g, h : (- r, r) \to R$ , $g (0) = h (0) = 0$ , such that

$F (x, g (x), h (x)) = (0, 0), \forall∣ x ∣ < r$

$F (g (y), y, h (y)) = (0, 0), \forall∣ y ∣ < r$

$F (g (z), h (z), z) = (0, 0), \forall∣ z ∣ < r$

The second one! In the implicit function theorem, we need a $Y$ such that $D_{Y} (F)$ is invertible. So, choose them to be $x, z$ , with free variable $X = y$ . Then, we can apply our implicit function theorem to get result (2).

We ask, is it possible for $F (x, g (x), h (x)) = (0, 0), \forall∣ x ∣ < r$ ? No. If the above holds, then by the chain rule, we find
$\frac{d}{d x} F (x, g (x), h (x)) = D F (x, g (x), h (x)) 1 g^{'} (x) h^{'} (x) = 0$
And at $(0, 0, 0)$ ,
$D F (0, g (0), h (0)) 1 g^{'} (0) h^{'} (0) = 0$
But this gives us $1 = 0$ , which is impossible!

Finally, we will show a formula for $D G (x)$ , $x \in B_{r} (x_{0})$ . We use the property that $F (x, G (x)) = 0$ . We know that starting with $x$ , we map

x \to (x, G (x)) \to F (x, G (x)) = 0

So, by chain rule,

D_{x, y} F (x, G (x)) \cdot D (x, G (x)) = 0 (D_{x} F (x, G (x)) D_{y} (x, G (x))) [I D G (x)] = 0 (D_{x} F) (x, G (x)) + D_{y} (x, G (x)) D G (x) = 0

We can use this to solve for $D G (x)$ !

D G (x) = - [D_{y} (x, G (x))]^{- 1} D_{x} F (x, G (x))

Example

Describe solutions to
$(x^{2} + y^{2} + z^{2})^{3} - x + z = 0 cos (x^{2} + y^{2}) + e^{z} - 2 = 0$
On the LHS, we have $F (x, y, z), F : R^{1 + 2} \to R^{2}, F (0, 0) = 0$ .

We expect this to be a curve through $(0, 0, 0)$ . We will try to describe this curve locally. We find
$D F (0) = [\partial F_{1} / \partial x \partial F_{2} / \partial x \partial F_{1} / \partial y \partial F_{2} / \partial y \partial F_{1} / \partial z \partial F_{2} / \partial z] = [- 1 0 0011]$
Define $F (X, Y), X \in R, Y \in R^{2}$ . We need $D_{y} F (0)$ invertible, so we choose $X = (y), Y = (x, z)$ (as column $x$ and $z$ in $D F (0)$ will give us an invertible matrix).

We get $F (X, G (X)) = 0$ , or in other words, $F (y, G (y)) = 0, G (y) \in R^{2}$ , so our solutions look like $(g_{1} (y), y, g_{2} (y))$ .

Surfaces and Paths in $R^{3}$

Let $f : R^{3} \to R, C^{1}$ . Look at the level set of this function, the set of points where the function is 0.

S = {(x, y, z) : f (x, y, z) = 0}

Assume $\nabla f (x, y, z) \neq = 0$ for all $x, y, z \in S$ . Then $S$ is a $C^{1}$ surface

Recall, that for $S \subseteq R^{3}$ to be a $C^{1}$ surface, $\forall x \in S$ , there exists a $W$ neighborhood of $x$ such that $S \cap W$ is a $C^{1}$ function.

Proof

Let $(x_{0}, y_{0}, z_{0}) \in S$ . Without loss of generality, assume that $\frac{\partial f}{\partial z} (x_{0}, y_{0}, z_{0}) \neq = 0$ .

By the implicit function theorem, there exists a $r, R > 0$ and a function $g : B_{r} (x_{0}, y_{0}) \to B_{R} (z_{0})$ such that
$f (x, y, g (x, y)) = 0 \forall (x, y) \in B_{r} (x_{0}, y_{0})$
And furthermore, these are the only solutions to $f (x, y, z) = 0$ in $B_{r} (x_{0}, y_{0}) \times (z_{0} - R, z_{0} + R)$ .

So we have a transformation $(x, y) \to (x, y, g (x, y))$ paramterizing $S$ near $(x_{0}, y_{0}, z_{0})$ . To figure out the tangent vectors at $(x_{0}, y_{0}, z_{0})$ , look at the change in one variable along a particular direction.

Fixing $y$ , we differentiate in the $x$ direction to get tangent $T_{1} : (1, 0, \frac{\partial g}{\partial x} (x_{0}, y_{0}))$

Fixing $x$ , we differentiate in the $y$ direction to get tangent $T_{2} : (0, 1, \frac{\partial g}{\partial y} (x_{0}, y_{0}))$

We claim that $\nabla f (x_{0}, y_{0}, z_{0}) ⊥ T_{1}, T_{2}$ . Look at $f (x, y, g (x, y)) = 0$ for all $(x, y)$ . By the chain rule,
$0 = \frac{\partial}{\partial x}_{x = x_{0}} f (x, y_{0}, g (x, y_{0})) = \frac{\partial f}{\partial x} (x_{0}, y_{0}, z_{0}) + \frac{\partial f}{\partial z} (x_{0}, y_{0}, z_{0}) \cdot \frac{\partial g}{\partial x} (x_{0}, y_{0}) = 0 = ⟨ \nabla f (x_{0}, y_{0}, z_{0}), (1, 0, \frac{\partial g}{\partial x} (x_{0}, y_{0})⟩$
So, $\nabla f (x_{0}, y_{0}, z_{0}) ⊥ T_{1}$ . By a similar argument, it is also orthogonal to $T_{2}$ .

We can find another vector orthogonal to $T_{1}, T_{2}$ by taking the cross product! Take $T_{1} \times T_{2} \neq = 0$ . Then, $\exists λ \neq = 0$ such that
$\nabla f (x_{0}, y_{0}, z_{0}) = λ (T_{1} \times T_{2})$

Let $g, h : R^{3} \to R, C^{1}$ . Define the intersection of the two function’s level sets,

C = {(x, y, z) : g (x, y, z) = h (x, y, z) = 0}

Intuitively, we’re intersecting 2 2-dimensional surfaces. So we should expect a 1-dimensional curve!

A sufficient condition for $C$ to be a 1-dimensional curve in $R^{3}$ is

\nabla g (x_{0}, y_{0}, z_{0}) \times \nabla h (x_{0}, y_{0}, z_{0}) \neq = 0

Equivalently, let $G : R^{3} \to R^{2}, G (x, y, z) = (g (x, y, z), h (x, y, z))$ . The, we require

D G (x_{0}, y_{0}, z_{0}) = [\dots \dots \nabla g \nabla h \dots \dots]

Has rank 2 for all $(x_{0}, y_{0}, z_{0}) \in C$ .

If so, without loss of generality, the derivative matrix with repect to $y, z$ , $D_{y, z} G (x_{0}, y_{0}, z_{0})$ is invertible. By the implicit function theorem, $\exists r, R$ , and

γ : (x_{0} - r, x_{0} + r) \to B_{R} (y_{0}, z_{0})

Such that $G (x, γ (x)) = 0$ for all $∣ x - x_{0} ∣ < R$ , and these are the only solutions in $B_{r} (x_{0}) \times B_{R} (y_{0}, z_{0})$ .

Thus, $C$ agrees with the graph ${(x, γ (x)) : ∣ x - x_{0} ∣ < r}$ in $(x_{0} - r, x_{0} + r) \times B_{R} (x_{0}, y_{0})$ , and we can parameterize it as

x \to (x, γ (x))

With tangent vector at $(x_{0}, y_{0}, z_{0})$ given as $T = (1, γ^{'} (x_{0}))$ , and

\nabla g (x_{0}, y_{0}, z_{0}) ⊥ T \nabla h (x_{0}, y_{0}, z_{0}) ⊥ T

These are two normals to our curve!

So, $\nabla g (x_{0}, y_{0}, z_{0}) \times \nabla h (x_{0}, y_{0}, z_{0})$ is a non-zero tangent vector to the curve, so $\exists λ \neq = 0$ such that

\nabla g \times \nabla h = λ T

We generalize.

An $n$ -dimentional manifold embedded in $R^{N}$ , $N = n + k$ . Let $F : R^{n + k} \to R^{k}, C^{1}$ , and assume that $k \times (n + k)$ matrix $D F (x_{0})$ has maximimal rank $k$ if $F (x_{0}) = 0$ .

If so, we will represent the level set

M = {X : F (X) = 0}

Locally, as a graph.

Let $X_{0} = (x_{0}, y_{0}) \in M$ , $x_{0} \in R^{n}, y_{0} \in R^{k}$ . Without loss of generality, $D_{y} F (x_{0}, y_{0})$ (the rightmost $k \times k$ entries) is invertible. Thus, $\exists r, R > 0$ and $G : B_{r} (x_{0}) \to B_{R} (y_{0})$ such that

F (x, G (x)) = 0

And these are the only solutions if $x \in B_{r} (x_{0}), y \in B_{R} (y_{0})$ . Thus, $M \cap B_{r} (x_{0}) \times B_{R} (y_{0})$ agrees with the graph $(x, G (x)) : x \in B_{r} (x_{0})$ .

This is an $n$ -dimentional manifold!

We need $n$ linearly independent tangent vectors at $(x_{0}, y_{0})$ . The process of doing this is the same— fix $n - 1$ variable, and differentiate with respect to our last variable. These are our tangent vectors!

(1, 0, \dots, \frac{\partial G}{\partial x _{1}} (x_{0}) (0, 1, \dots, \frac{\partial G}{\partial x _{2}} (x_{0}) ⋮ (0, \dots, 1, \frac{\partial G}{\partial x _{n}} (x_{0})

The range of $D F (x, G (x))$ at $x_{0}$ is the tangent space above, as

D_{x} (F (x, G (x)) = 0 (D F) (x_{0}, y_{0}) \cdot [I D G (x_{0})] = 0

So, the tangent space to $M$ at $(x_{0}, y_{0})$ is the null space of $D F (x_{0})$ .

We have $f : R^{n} \to R^{3}, C^{1}$ , and an open set $O \subseteq R^{2}$ . We want to know if $F (O)$ looks like a smooth surface.

Recall that we say that if

\frac{\partial F}{\partial x} (x, y) \times \frac{\partial F}{\partial y} (x, y) \neq = 0

Then $F (O)$ is a smooth surface at $F (x, y)$ .

This is equivalent to saying the derivative matrix of $F$ has rank 2, as the first and second row are linearly independent!

Theorem

In the general case, let $O \subseteq R^{k}$ ,
$F : O \to R^{N}, C^{1}$
Assume that $D F (x_{0})$ has rank $k$ $(N \geq k)$ . Denote $F = (F_{1}, F_{2}), F_{1} \in R^{k}, F_{2} \in R^{N}$ . Without loss of generality, assume $D F_{1} (x_{0})$ is invertible.

Then, $\exists U$ neighborhood of $x_{0}$ and $\exists V$ neighborhood of $F_{1} (x_{0})$ such that
$F (U) = {(y, G (y)) : y \in V}$
For some $G : V \to R^{N - k}, C^{1}$ .

Proof

We know that $D F_{1} (x_{0})$ is invertible. By the inverse function theorem, $\exists U$ neighborhood of $x_{0}$ and $V$ neighborhood of $F_{1} (x_{0})$ such that $F_{1} : U \to V$ is one-to-one, onto, and has a $C^{1}$ inverse $F^{- 1} : V \to U$ .

We compose
$F (x) = (F_{1} (x), F_{2} (x)) = (F_{1} (F_{1}^{- 1} (y)), F_{2} (F_{1}^{- 1} (y))) = (y, G (y)), y \in V$

Lagrange Multipliers

Case 1: Surfaces in $R^{3}$

Let $X = (x, y, z) \in R^{3}$ , and let $g : R^{3} \to R, C^{1}$ . Furthermore, define surface

S = {x \in R^{3} : g (X) = 0}

Where $\nabla g (X) \neq = 0$ if $g (X) = 0$ ( $X \in S$ ).

Let $f : R^{3} \to R, C^{1}$ , and let $X_{0}$ be such that $f (X_{0}) \leq f (X)$ (or $f (X_{0}) \geq f (X)$ ), $\forall X \in S$ . Then, $\exists λ \in R$ such that

\nabla f (x_{0}) = λ \nabla g (x_{0})

Proof

Without loss of generality, we have that
$\frac{\partial g}{\partial z} (x_{0}, y_{0}, z_{0}) \neq = 0$
By the implicit function theorem, there exists a function $h : B_{r} (x_{0}, y_{0}) \to R$ such that
${x, y, h (x, y)}$
Is equal to $S$ in the neighborhood of $X_{0}$ . Then, look at the composition $ϕ : B_{r} (x_{0}, y_{0}) \to R$ , $ϕ (x, y) = f (x, y, h (x, y))$ , which has an unconstrained (interior) minimum (or maximum) at $(x_{0}, y_{0})$ . Thus, $\nabla ϕ (x_{0}, y_{0}) = 0$ .

Let $H (x, y) = (x, y, H (x, y))$ . By the Chain Rule,
$\nabla ϕ (x_{0}, y_{0}) = \nabla f (x_{0}, y_{0}, z_{0}) \cdot DH (x_{0}, y_{0}) = \nabla f (x_{0}, y_{0}, z_{0}) 10 \frac{\partial h}{\partial x} (x_{0}, y_{0}) 01 \frac{\partial h}{\partial y} (x_{0}, y_{0})$
Each column of the derivative matrix is a tangent vector! So, $\nabla f (x_{0})$ is orthogonal to both columns.

We know that $\nabla g (x_{0})$ is also normal to our surface at $X_{0}$ . So,
$\nabla g (X_{0}) \in (span {T_{1}, T_{2}})^{⊥}$
But the orthogonal set to the span of the tangent vectors is 1-dimensional! So, $\nabla g (X_{0})$ is a basis for $(span {T_{1}, T_{2}})^{⊥}$ . As a basis, if $\nabla f (X_{0})$ is in this space, then we can form $\nabla f (X_{0})$ as a linear combination of $\nabla g (X_{0})$ .
$\nabla f (X_{0}) = λ \nabla g (X_{0})$

Note that this same argument works for the case of $g : R^{n} \to R, C^{1}$ ,

M = {x : g (x) = 0}

Assuming $\nabla g (x) \neq = 0$ if $g (x) = 0$ (then $M$ is an $n - 1$ dimensional manifold in $R^{n}$ ).

If $f : R^{n} \to R, C^{1}$ and $X_{0}$ is such that $f (x_{0}) \geq f (x)$ (or $f (x_{0}) \leq f (x)$ ), then $\exists λ \in R$ such that

\nabla f (x_{0}) = λ \nabla g (x_{0})

Case 2: Curves in $R^{3}$

Let $g, h : R^{3} \to R, C^{1}$ . Define curve

C = {X \in R^{3} : g (X) = h (X) = 0}

And assume

Rank [\nabla g (X) \nabla h (X)] = 2

If $X \in C$ (then $C$ is a $C^{1}$ curve in $R^{3}$ ).

Let $f : R^{3} \to R, C^{1}$ . Let $X_{0} \in C$ such that $f (X_{0}) \leq f (X)$ (or $f (X_{0}) \geq f (X)$ ) for all $x \in C$ . Then there exists $λ_{1}, λ_{2} \in R$ such that

\nabla f (X_{0}) = λ_{1} \nabla g (X_{0}) + λ_{2} \nabla h (X_{0})

Proof

Without loss of generality, say $D_{y, z} (g, h) (x_{0}, y_{0}, z_{0})$ is invertible. By the implicit function theorem, $\exists γ : (x_{0} - r, x_{0} + r) \to R^{2}, C^{1}$ such that $(x, γ (x))$ is equal to $C$ in a neighborhood $X_{0}$ .

Let $ϕ (x) = f (x, γ (x))$ , $γ : (x_{0} - r, x_{0} + r) \to R$ , $γ$ has an unconstrainer min (or max) at $x_{0}$ , $γ^{'} (x_{0}) = 0$ .

By the Chain Rule,
$\nabla f (x_{0}, γ (x_{0})) [1 γ^{'} (x_{0})] = \nabla f (x_{0}, γ (x_{0})) \cdot T = 0$
Is a basis for the tangent space to $C$ at $x_{0}$ .

Recall $\nabla g (x_{0}), \nabla h (x_{0})$ are linearly independent vectors orthogonal to $T$ . So,
$Span (\nabla g (x_{0}), \nabla h (x_{0})) = (Span T)^{⊥}$
We also know $\nabla f (x_{0})$ is in this space. Thus, there exists a linear combination of $\nabla g (x_{0}), \nabla h (x_{0})$ that form $\nabla f (x_{0})$ .

Let $A$ be an $n \times n$ symmetric real matrix. Let

λ = ∣∣ x ∣∣ = 1 min ⟨ A x, x ⟩

We look at the minimum of the quadratic function in the compact set given by the unit sphere.

Let $x_{0}$ be a minimizer ( $∣∣ x_{0} ∣∣ = 1$ ). Then, $A x_{0} = λ x_{0}$

Proof

$g (x) = ∣∣ x ∣ ∣^{2}$ . We try to minimize the function $f (x) = ⟨ A x, x ⟩$ . By the above theorem, at a minimizer, $\exists λ$ such that $\nabla f (x_{0}) = λ \nabla g (x_{0}) = λ 2 x$ .

We show that $\nabla f (x_{0}) = 2 A x_{0}$ . This completes our proof.

Lemma

If $f (x) = ⟨ A x, x ⟩$ , then $\nabla f (x) = 2 A x$ .

We find
$t \to 0 lim \frac{f ( x + t e _{i} ) - f ( x )}{t} = t \to 0 lim \frac{⟨ A ( x + t e _{i} ) , x + t e _{i} ⟩ - ⟨ A x , x ⟩}{t} = t \to 0 lim \frac{⟨ A x , x ⟩ + t ⟨ A x , e _{i} ⟩ + t ⟨ A e _{i} , x ⟩ + t ^{2} ⟨ A e _{i} , e _{i} ⟩ - ⟨ A x , x ⟩}{t} = ⟨ A x, e_{i} ⟩ + ⟨ A e_{i}, x ⟩ = 2 ⟨ A x, e_{i} ⟩$
This is the ith component of $2 A x$ !

The last equality is because $⟨ A e_{i}, x ⟩ = ⟨ e_{i}, A x ⟩$ !

Shu-Ye's Quartz Space 🪴

Table of Contents

MATH411

Derivatives in Several Variables

Limits

Partial Derivatives

Directional Derivatives and MVT

Local Approximation of Real-Valued Functions

First Order Approximations

Second Order Approximations and Second Derivatives

Definitions and Context

Second Order Approximation and Second Derivative Test

Higher Order Approximations

Linear Map Approximations of Non-Linear Mappings

Linear Mappings

Uniqueness

Existence

Proof ( $\leftarrow$ )

Proof ( $\to$ )

The Derivative Matrix and Differential

The Chain Rule

The Inverse Function Theorem

Inverse Function Theorem: 1D, 2D

Stability of Non-Linear Mappings

Minimization Principle, General Inverse Function Theorem

The Implicit Function Theorem

2D Case: Dini’s Theorem

Implicit Function Theorem

Surfaces and Paths in $R^{3}$

Lagrange Multipliers

Case 1: Surfaces in $R^{3}$

Case 2: Curves in $R^{3}$

Graph View

Backlinks

Shu-Ye's Quartz Space 🪴

Table of Contents

MATH411

Derivatives in Several Variables §

Limits §

Partial Derivatives §

Directional Derivatives and MVT §

Local Approximation of Real-Valued Functions §

First Order Approximations §

Second Order Approximations and Second Derivatives §

Definitions and Context §

Second Order Approximation and Second Derivative Test §

Higher Order Approximations §

Linear Map Approximations of Non-Linear Mappings §

Linear Mappings §

Uniqueness §

Existence §

Proof (←) §

Proof (→) §

The Derivative Matrix and Differential §

The Chain Rule §

The Inverse Function Theorem §

Inverse Function Theorem: 1D, 2D §

Stability of Non-Linear Mappings §

Minimization Principle, General Inverse Function Theorem §

The Implicit Function Theorem §

2D Case: Dini’s Theorem §

Implicit Function Theorem §

Surfaces and Paths in R3 §

Lagrange Multipliers §

Case 1: Surfaces in R3 §

Case 2: Curves in R3 §

Graph View

Backlinks

Derivatives in Several Variables

Limits

Partial Derivatives

Directional Derivatives and MVT

Local Approximation of Real-Valued Functions

First Order Approximations

Second Order Approximations and Second Derivatives

Definitions and Context

Second Order Approximation and Second Derivative Test

Higher Order Approximations

Linear Map Approximations of Non-Linear Mappings

Linear Mappings

Uniqueness

Existence

Proof ( $\leftarrow$ )

Proof ( $\to$ )

The Derivative Matrix and Differential

The Chain Rule

The Inverse Function Theorem

Inverse Function Theorem: 1D, 2D

Stability of Non-Linear Mappings

Minimization Principle, General Inverse Function Theorem

The Implicit Function Theorem

2D Case: Dini’s Theorem

Implicit Function Theorem

Surfaces and Paths in $R^{3}$

Lagrange Multipliers

Case 1: Surfaces in $R^{3}$

Case 2: Curves in $R^{3}$