## Setup

I’ve kept this as general as possible, since I don’t know exactly how deep learning is applied to harder problems and I wanted the result to cover as broad a case as I could.

• Let $$0\leq n\in\mathbb Z$$.
• Let $$V_k$$ be a $$d_k$$-dimensional inner product space with orthonormal basis $$e_1^{(k)},\dots,e_{d_k}^{(k)}$$, for $$k=0,\dots,n+1$$.
• Let $$b_k\in V_k$$ and $$W_k\in\mathcal L(V_{k-1},V_k)$$ for $$k=1,\dots,n+1$$.
• Let $$\sigma:\mathbb R\to\mathbb R$$ be such that
• it is continuous
• monotonically increasing (not necessarily strictly)
• absolutely continuous, with $$\sigma'(t)\leq 1$$ wherever the derivative exists. Monotonicity already guarantees that the derivative exists (Lebesgue) almost everywhere; absolute continuity is what lets us recover $$\sigma$$ from $$\sigma'$$ in the analysis below. Standard activations such as ReLU, $$\tanh$$, and the logistic sigmoid satisfy all of these.
• for $$v\in V_k$$ such that $$v=c_1e^{(k)}_1+\cdots+c_{d_k}e^{(k)}_{d_k}$$, define $$\sigma(v):=\sigma(c_1)e^{(k)}_1+\cdots+\sigma(c_{d_k})e^{(k)}_{d_k}$$.
• Let $$x_0\in V_0$$ be an input vector and define $$x_k = \sigma(W_kx_{k-1}+b_k)$$ for $$k=1,\dots,n+1$$.
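As a concrete sketch of the setup (illustrative only: NumPy arrays stand in for the $$V_k$$, and $$\tanh$$ stands in for $$\sigma$$):

```python
import numpy as np

def forward(x0, weights, biases, sigma=np.tanh):
    """Compute x_{n+1} from x_0 via x_k = sigma(W_k @ x_{k-1} + b_k).

    NumPy applies sigma coordinatewise, matching the basis-wise
    definition above; tanh is one activation meeting the assumptions
    (continuous, monotonically increasing, derivative <= 1).
    """
    x = np.asarray(x0, dtype=float)
    for W, b in zip(weights, biases):  # k = 1, ..., n+1
        x = sigma(W @ x + b)
    return x
```

Here `weights[k-1]` and `biases[k-1]` play the roles of $$W_k$$ and $$b_k$$, so `weights[k-1]` must have shape $$(d_k, d_{k-1})$$.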

## Analysis

• $$|\sigma(s)-\sigma(t)|\leq|s-t|$$: by absolute continuity, $$\sigma(s)-\sigma(t)=\int_t^s\sigma'(u)\,du$$, and $$0\leq\sigma'\leq 1$$ a.e., so $$\sigma$$ is 1-Lipschitz. (Continuity and monotonicity alone are not enough here — the Cantor function is a counterexample.) Applying this coordinatewise in the orthonormal basis gives $$\|\sigma(u)-\sigma(v)\|\leq\|u-v\|$$ for $$u,v\in V_k$$.
• If $$\|\tilde x_0 - x_0\| < \varepsilon$$, then

\begin{align*} \|\tilde x_{n+1}-x_{n+1}\| &= \|\sigma(W_{n+1}\tilde x_n + b_{n+1}) - \sigma(W_{n+1}x_n+b_{n+1})\| \\
&\leq \|W_{n+1}\tilde x_n + b_{n+1} - W_{n+1}x_n - b_{n+1}\| \\
&= \|W_{n+1}(\tilde x_n - x_n)\| \\
&\leq \|W_{n+1}\|\,\|\tilde x_n - x_n\|\\
&\leq \cdots \\
&\leq \|W_{n+1}\|\cdots\|W_1\|\,\|\tilde x_0-x_0\| \\
&\leq \|W_{n+1}\|\cdots\|W_1\|\,\varepsilon \end{align*}

where $$\|W_k\|$$ denotes the operator norm. In other words, the network is Lipschitz with constant at most $$\prod_{k=1}^{n+1}\|W_k\|$$.
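A quick numerical sanity check of this bound (an illustrative script, not part of the argument — the dimensions, seed, and the choice $$\sigma=\tanh$$ are arbitrary; for a matrix, `np.linalg.norm(W, 2)` is the operator norm, i.e. the largest singular value):

```python
import numpy as np

rng = np.random.default_rng(0)

dims = [5, 8, 8, 6, 3]  # d_0, ..., d_{n+1}, so n + 1 = 4 layers
weights = [rng.normal(size=(dims[k + 1], dims[k])) for k in range(len(dims) - 1)]
biases = [rng.normal(size=dims[k + 1]) for k in range(len(dims) - 1)]

def forward(x):
    for W, b in zip(weights, biases):
        x = np.tanh(W @ x + b)
    return x

# Perturb x_0 by a vector of norm strictly less than eps.
eps = 1e-3
x0 = rng.normal(size=dims[0])
delta = rng.normal(size=dims[0])
delta *= (0.5 * eps) / np.linalg.norm(delta)

# Product of operator (spectral) norms ||W_{n+1}|| ... ||W_1||.
lipschitz = np.prod([np.linalg.norm(W, 2) for W in weights])

gap = np.linalg.norm(forward(x0 + delta) - forward(x0))
print(gap <= lipschitz * eps)  # prints True: the bound derived above holds
```

The bound is loose in practice — $$\tanh$$ saturates, so the observed `gap` is typically far below `lipschitz * eps` — but the derivation guarantees it is never violated.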