- [[calculus]], [[gradient descent]], [[Jacobian]], [[sigmoid function derivative]], [[product rule]]
- [[convolutional neural networks]] and [[recurrent networks]]

# Idea

We use the chain rule when functions are nested within each other (i.e., function composition). "Peel" the outermost layer/function first, then the next layer/function, and so on.

$$\frac{d}{dx} f(g(x)) = f^{\prime}(g(x))\, g^{\prime}(x)$$

In Leibniz notation, the $dh$ terms informally "cancel":

$$\frac{d}{dx} g(h(x)) = \frac{dg}{dh} \frac{dh}{dx} = \frac{dg}{dx}$$

$$(f \circ g)^{\prime}(a) = f^{\prime}(g(a)) \times g^{\prime}(a)$$

In essence, the derivatives of $f$ and $g$ simply multiply, provided each is evaluated at the correct, corresponding point: $f^{\prime}$ at the inner value $g(a)$, and $g^{\prime}$ at $a$.

![[s20220812_200459.png]]

## Example of function composition

$$g(x) = \sin(x)$$

$$h(x) = x^{2}$$

$$g(h(x)) = \sin\left(x^{2}\right)$$

Note that simply multiplying the two derivatives, without evaluating the outer derivative at the inner function, is wrong:

$$\frac{d\left(\sin\left(x^{2}\right)\right)}{dx} \neq \left(\frac{d(\sin(x))}{dx}\right)\left(\frac{d\left(x^{2}\right)}{dx}\right)$$

## Examples

$$\frac{d}{dx}\left(x^{2}+4x^{3}\right)^{5} = 5\left(x^{2}+4x^{3}\right)^{4}\left(2x+12x^{2}\right)$$

# References

- [Mike Cohen Udemy - Derivatives: product and chain rules](https://www.udemy.com/course/deeplearning_x/learn/lecture/27841968#overview)
- [3Blue1Brown - Visualizing the chain rule and product rule](https://www.3blue1brown.com/lessons/chain-rule-and-product-rule)
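
As a sanity check (not from the sources above), the worked chain-rule derivative $\frac{d}{dx}(x^{2}+4x^{3})^{5} = 5(x^{2}+4x^{3})^{4}(2x+12x^{2})$ can be verified numerically against a central finite difference; the evaluation point `x = 0.5` and step size `h` are arbitrary choices:

```python
def f(x):
    # Composite function: outer u^5, inner u = x^2 + 4x^3
    return (x**2 + 4 * x**3) ** 5

def f_prime(x):
    # Chain rule: outer derivative evaluated at the inner function,
    # times the derivative of the inner function
    return 5 * (x**2 + 4 * x**3) ** 4 * (2 * x + 12 * x**2)

def numeric_derivative(g, x, h=1e-6):
    # Central finite-difference approximation of g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

x = 0.5
analytic = f_prime(x)
numeric = numeric_derivative(f, x)
print(analytic, numeric)  # the two values should agree to several decimal places
```

The same check applied to the "wrong" formula $\cos(x)\cdot 2x$ for $\frac{d}{dx}\sin(x^{2})$ would show a clear mismatch, since the outer derivative must be evaluated at $x^{2}$, not at $x$.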