- [[limit]], [[calculus]], [[derivative - exponential functions]], [[power rule]], [[infinitesimal]], [[partial derivative]]
# Idea
The derivative defines a rate of change as a function. It lets us determine the **slope** or **rate of change** at any point of a function. We can think of it in different ways: the graphical view, the input-nudging view, the symbolic view, and the linear approximation view.
$
f^{\prime}(a)=\lim _{x \rightarrow a} \frac{f(x)-f(a)}{x-a}
$
If $f(x) = 2x + 1$
$
\begin{aligned}
\lim _{x \rightarrow a} \frac{(2 x+1)-(2 a+1)}{x-a} & =\lim _{x \rightarrow a} \frac{2(x-a)}{x-a} \\
& =\lim _{x \rightarrow a} 2 \\
& =2
\end{aligned}
$
If $f(x) = x^2$
$
\begin{aligned}
f^{\prime}(a) & =\lim _{x \rightarrow a} \frac{f(x)-f(a)}{x-a} \\
& =\lim _{x \rightarrow a} \frac{x^2-a^2}{x-a} \\
& =\lim _{x \rightarrow a} \frac{(x-a)(x+a)}{x-a} \\
& =\lim _{x \rightarrow a}(x+a) \\
& =2 a .
\end{aligned}
$
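The two limits worked above can be checked symbolically. A minimal sketch with sympy (the symbols `x` and `a` mirror the derivation; not part of the original note):

```python
from sympy import symbols, limit

x, a = symbols('x a')

# f(x) = 2x + 1: the difference quotient simplifies to 2
d1 = limit(((2*x + 1) - (2*a + 1)) / (x - a), x, a)
print(d1)  # 2

# f(x) = x^2: the difference quotient simplifies to x + a -> 2a
d2 = limit((x**2 - a**2) / (x - a), x, a)
print(d2)  # 2*a
```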
The *laws of nature* are often expressed in terms of derivatives, which capture an important concept: *how a quantity changes in response to changes in another quantity*. See [[derivatives describe the laws of nature]].
![[20240103082530.png]]
The derivative of a function $f(x)$ describes how the output of $f$ changes when there is a small change in $x$.
In calculus, the symbol for the derivative is $\frac{dy}{dx}$. It's supposed to remind you of $\frac{\Delta y}{\Delta x}$, but the changes in $dy$ and $dx$ are now [[infinitesimal|infinitesimally]] tiny. This notation was invented by [[Gottfried Wilhelm Leibniz]] and is known as **Leibniz's notation** (whereas [[Isaac Newton]] used [[fluxions]] and [[fluents]]).
It's often defined as the **instantaneous rate of change**. But this definition is an **oxymoron**: change, by definition, requires multiple points, but instantaneous refers to one point—so change cannot be instantaneous. Yet, **instantaneous change** is the best way to describe what the derivative means.
A better way to think of the derivative is that it is the **best constant approximation** for rate of change.
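The "best constant approximation" idea can be made concrete numerically: near a point, the true change in the function is the derivative times the nudge, plus an error that shrinks faster than the nudge itself. A sketch using $f(w) = w^2$ at $w = 3$, where $f'(3) = 6$ (the function and values are illustrative):

```python
def f(w):
    return w ** 2

a, k = 3.0, 6.0  # point and its derivative f'(3) = 6
for eps in (0.1, 0.01, 0.001):
    true_change = f(a + eps) - f(a)
    error = abs(true_change - k * eps)
    print(eps, error)  # error is eps**2: much smaller than eps
```

No other constant $k$ makes the error shrink this fast, which is what "best constant approximation" means.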
## Why derivatives?
- They're an interesting example of a $\frac{0}{0}$ indeterminate form - that is, a limit of a fraction as both numerator and denominator approach 0.
- They give us information about the original function that's not obvious from that function's formula.
- They're needed to make rigorous certain ideas in the world, like "instantaneous speed."
## Example derivative questions
- How fast? How steep? How sensitive?
- How much does raising the price of an app affect the consumer demand for it?
- If one variable changes, how much does a related variable change?
The derivative of a function describes how the output of a function changes when there is a small change in an input variable.
It relates to [[elasticity]] and [[marginal effects]].
## More notation
If $J(w)$ is a function of one variable $w$, then its derivative is written
$\frac{d}{d w} J(w)$
If $J\left(w_1, w_2, \ldots, w_n\right)$ is a function of more than one variable, then we get a [[partial derivative]]:
$\frac{\partial}{\partial w_i} J\left(w_1, w_2, \ldots, w_n\right)$
or simply $\frac{\partial J}{\partial w_i}$ or $\frac{\partial}{\partial w_i} J$
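A small sympy sketch of partial derivatives, using a made-up two-variable cost $J(w_1, w_2) = w_1^2 + 3 w_1 w_2$ (the function is illustrative, not from the note):

```python
from sympy import symbols, diff

w1, w2 = symbols('w1 w2')
J = w1**2 + 3*w1*w2  # hypothetical cost function

# Differentiate with respect to one variable, holding the other fixed
dJ_dw1 = diff(J, w1)  # 2*w1 + 3*w2
dJ_dw2 = diff(J, w2)  # 3*w1
print(dJ_dw1, dJ_dw2)
```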
## Informal definition
Let's use the cost function $J(w)$ as an example. The cost $J$ is the output and $w$ is the input variable.
Let's give a 'small change' the name *epsilon* ($\epsilon$). We use Greek letters because it is traditional in mathematics to use *epsilon* ($\epsilon$) or *delta* ($\Delta$) to represent a small value. You can think of it as representing 0.001 or some other small value.
$
\begin{equation}
\text{if } w \uparrow \epsilon \text{ causes }J(w) \uparrow \text{by }k \times \epsilon \text{ then} \\
\frac{\partial J(w)}{\partial w} = k \tag{1}
\end{equation}
$
This just says if you change the input to the function $J(w)$ by a little bit (by $\epsilon$) and the output changes by $k$ times $\epsilon$, then the derivative of $J(w)$ is equal to $k$.
$J(w + \epsilon) - J(w) \approx k \epsilon$
- change in input $w$: $\epsilon$
- change in output $J$: $J(w + \epsilon) - J(w)$
Rearranging terms, we get the slope (or instantaneous rate of change), $k$:
$\frac{J(w + \epsilon) - J(w)}{\epsilon} \approx k$
where $k$ is the derivative/slope, exactly in the limit:
$\lim_{\epsilon \to 0} \frac{J(w + \epsilon) - J(w)}{\epsilon} = k = \frac{\partial J(w)}{\partial w}$
### Example
$J(w) = w^2, w = 3, \epsilon = 0.001$
$\frac{J(3 + 0.001) - J(3)}{0.001} = \frac{9.006001 - 9}{0.001} = 6.001 \approx 6$
In other words, $\frac{\partial J(w)}{\partial w} = 6$ or the slope of the function $J(w)$ at $w = 3$ is $6$.
We know analytically that $\frac{\partial J(w)}{\partial w} = 2w$. Using this derivative, we can compute the slope for any value of this function. Plugging in $w = 3$, we get $2 \times 3 = 6$.
```r
J <- function(w) {
return(w^2)
}
w <- 3
J1 <- J(w) # 3^2 = 9
eps <- 0.001
J_eps <- J(w + eps) # 3.001^2 = 9.006001
difference <- round(J_eps - J1, 3) # 0.006
# derivative of J with respect to w at w = 3, which is 6
difference / eps # 0.006 / 0.001 = 6
# derivative of J is 2w
dj_dw <- function(w) {
return(2 * w)
}
dj_dw(w) # 6
```
```python
from sympy import symbols, diff

w = symbols('w')
J = w**2            # the cost function from the example above

dJ_dw = diff(J, w)  # 2*w
dJ_dw.subs(w, 2)    # derivative at the point w = 2 -> 4
dJ_dw.subs(w, 3)    # derivative at the point w = 3 -> 6
dJ_dw.subs(w, -3)   # derivative at the point w = -3 -> -6
```
## Formal definition
$f^{\prime}(c)=\lim _{\Delta x \rightarrow 0} \frac{f(c+\Delta x)-f(c)}{\Delta x}$
$
\frac{d s}{d t}(t)=\underbrace{\frac{s(t+d t)-s(t)}{d t}}_{d t \rightarrow 0}
$
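The formal definition can be watched converging numerically: as the step $\Delta x$ shrinks, the difference quotient approaches the derivative. A sketch for $f(x) = x^2$ at $c = 3$, where $f'(c) = 6$ (illustrative choice of function and point):

```python
def f(x):
    return x ** 2

c = 3.0
for dx in (0.1, 0.01, 0.001):
    quotient = (f(c + dx) - f(c)) / dx
    # quotients are 6.1, 6.01, 6.001 (up to floating point),
    # approaching the true derivative 6 as dx -> 0
    print(dx, quotient)
```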
## Different ways to write derivatives
![[Pasted image 20210606012148.png|500]]
# References
- [3Blue1Brown - The paradox of the derivative](https://www.3blue1brown.com/lessons/derivatives)
- [Krista King - definition of derivative](https://courses.kristakingmath.com/library/derivatives-58cb6261/110494/path/step/57822221/)
- [[infinity principle]]
- [What is a derivative? (Optional) - Neural network training | Coursera](https://www.coursera.org/learn/advanced-learning-algorithms/lecture/i9Dqr/what-is-a-derivative-optional)