- [[calculus]], [[gradient descent]], [[Jacobian]], [[sigmoid function derivative]], [[product rule]]
- [[convolutional neural networks]] and [[recurrent networks]]
# Idea
We use the chain rule to differentiate composed functions, i.e., one function nested inside another. "Peel" the outermost layer/function first, then the next layer/function, multiplying the derivatives as we go.
$\frac{d}{d x} f(g(x))=f^{\prime}(g(x)) g^{\prime}(x)$
$$
\frac{d}{d x} g(h(x))=\frac{d g}{dh} \frac{dh}{d x}=\frac{d g}{d x}
$$
$(f \circ g)^{\prime}(a)=f^{\prime}(g(a)) \times g^{\prime}(a)$
In essence, the derivatives of $f$ and $g$ simply multiply, with each evaluated at its corresponding point: $f^{\prime}$ at the inner value $g(a)$, and $g^{\prime}$ at $a$ itself.
![[s20220812_200459.png]]
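A minimal numerical sanity check of the rule, using only the standard library (the evaluation point `a = 1.3` is an arbitrary choice): the chain-rule product $f^{\prime}(g(a))\,g^{\prime}(a)$ should match a finite-difference estimate of $\frac{d}{dx}f(g(x))$.

```python
import math

def numerical_derivative(fn, x, h=1e-6):
    # Central finite difference: (fn(x+h) - fn(x-h)) / (2h)
    return (fn(x + h) - fn(x - h)) / (2 * h)

# f(x) = sin(x), g(x) = x**2, so (f o g)(x) = sin(x**2)
f = math.sin
g = lambda x: x ** 2
f_prime = math.cos          # derivative of sin
g_prime = lambda x: 2 * x   # derivative of x**2

a = 1.3
chain_rule = f_prime(g(a)) * g_prime(a)              # f'(g(a)) * g'(a)
numeric = numerical_derivative(lambda x: f(g(x)), a)  # d/dx f(g(x)) at a

print(chain_rule, numeric)  # the two values agree to ~6 decimal places
```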
## Examples of function composition
$$
g(x)=\sin (x)
$$
$$
h(x)=x^{2}
$$
$$
g(h(x))=\sin \left(x^{2}\right)
$$
Note that the following is wrong — naively multiplying the two separate derivatives ignores that the outer derivative must be evaluated at the inner function's value, not at $x$:
$$
\frac{d\left(\sin \left(x^{2}\right)\right)}{d x} \neq\left(\frac{d(\sin (x))}{d x}\right)\left(\frac{d\left(x^{2}\right)}{d x}\right)
$$
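We can confirm numerically that for $\sin(x^{2})$ the chain-rule answer $\cos(x^{2}) \cdot 2x$ matches a finite-difference estimate, while the naive product $\cos(x) \cdot 2x$ does not (a sketch; the test point `a = 0.7` is arbitrary):

```python
import math

h = 1e-6
a = 0.7

# Correct derivative via the chain rule: cos(x**2) * 2x
true_deriv = math.cos(a ** 2) * 2 * a
# Finite-difference estimate of d/dx sin(x**2)
numeric = (math.sin((a + h) ** 2) - math.sin((a - h) ** 2)) / (2 * h)
# Naive (wrong) product of the separate derivatives: cos(x) * 2x
naive = math.cos(a) * 2 * a

print(true_deriv, numeric, naive)  # first two agree; the naive value is off
```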
## Examples
$\frac{d}{d x}\left(x^{2}+4 x^{3}\right)^{5}=5\left(x^{2}+4 x^{3}\right)^{4}\left(2 x+12 x^{2}\right)$
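This worked example can be verified the same way, comparing the chain-rule expression against a finite difference of $(x^{2}+4x^{3})^{5}$ (a sketch; `x = 0.5` is an arbitrary test point):

```python
h = 1e-6
x = 0.5

inner = x ** 2 + 4 * x ** 3
# Chain rule: 5 * (inner)^4 * d(inner)/dx
analytic = 5 * inner ** 4 * (2 * x + 12 * x ** 2)
# Central finite difference of (x^2 + 4x^3)^5
numeric = (((x + h) ** 2 + 4 * (x + h) ** 3) ** 5
           - ((x - h) ** 2 + 4 * (x - h) ** 3) ** 5) / (2 * h)

print(analytic, numeric)  # the two values agree closely
```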
# References
- [Mike Cohen Udemy - Derivatives: product and chain rules](https://www.udemy.com/course/deeplearning_x/learn/lecture/27841968#overview)
- [3Blue1Brown - Visualizing the chain rule and product rule](https://www.3blue1brown.com/lessons/chain-rule-and-product-rule)