# Idea

Sigmoid function (aka logistic or inverse logit function).

The sigmoid function $\sigma(x)=\frac{1}{1+e^{-x}}$ is frequently used in neural networks because its derivative can be written in terms of the function's own output, which makes it cheap to compute during backpropagation.

The sigmoid function can be written in two equivalent forms:

$$
\sigma(x)=\frac{1}{1+e^{-x}}
$$

$$
\sigma(x)=\frac{e^{x}}{e^{x}+1}
$$

The second form follows from the first:

$$
\frac{1}{1+e^{-x}} = \frac{1}{1+e^{-x}} \cdot \frac{e^{x}}{e^{x}} = \frac{e^{x}}{e^{x}+1}
$$

Since $\frac{e^x}{e^x} = 1$, we are in essence just multiplying $\frac{1}{1+e^{-x}}$ by 1.

The derivative of the sigmoid function $\sigma(x)$ is the sigmoid function $\sigma(x)$ multiplied by $1 - \sigma(x)$:

$$
\sigma'(x)=\frac{d}{dx}\sigma(x)=\sigma(x)(1-\sigma(x))
$$

Before we begin, here's a reminder of how to find the derivatives of exponential functions:

$$
\frac{d}{dx}e^x = e^x
$$

$$
\frac{d}{dx}e^{-3x^2 + 2x} = (-6x + 2)e^{-3x^2 + 2x}
$$

And here's the [[chain rule]]:

$$
\frac{d}{dx} \left[ f(g(x)) \right] = f'\left[g(x) \right] \cdot g'(x)
$$

Example: Find the derivative of $f(x) = (x^2 + 1)^3$:

$$
\begin{aligned}
f'(x) &= 3(x^2 + 1)^{3-1} \cdot 2x^{2-1} \\
&= 3(x^2 + 1)^2(2x) \\
&= 6x(x^2 + 1)^2
\end{aligned}
$$

## Derivative via chain rule

Line 2 of the sigmoid derivation below uses this rule.
$$
\begin{aligned}
\frac{d}{dx} \sigma(x) &= \frac{d}{dx} \left[ \frac{1}{1+e^{-x}} \right] = \frac{d}{dx}(1+e^{-x})^{-1} \\
&= -1 \cdot (1+e^{-x})^{-2}(-e^{-x}) \\
&= \frac{-e^{-x}}{-(1+e^{-x})^{2}} \\
&= \frac{e^{-x}}{(1+e^{-x})^{2}} \\
&= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} \\
&= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x} + (1 - 1)}{1+e^{-x}} \\
&= \frac{1}{1+e^{-x}} \cdot \frac{(1 + e^{-x}) - 1}{1+e^{-x}} \\
&= \frac{1}{1+e^{-x}} \left[ \frac{1 + e^{-x}}{1+e^{-x}} - \frac{1}{1+e^{-x}} \right] \\
&= \frac{1}{1+e^{-x}} \left[ 1 - \frac{1}{1+e^{-x}} \right] \\
&= \sigma(x) (1-\sigma(x))
\end{aligned}
$$

## Derivative via quotient rule

[[quotient rule]]: If $f(x) = \frac{g(x)}{h(x)}$, then $f'(x) = \frac{g'(x)h(x) - h'(x)g(x)}{(h(x))^2}$.

Example: Find the derivative of $f(x) = \frac{3x}{1 + x}$:

$$
\begin{aligned}
f'(x) &= \frac{\left(\frac{d}{dx}(3x)\right)(1+x) - \left(\frac{d}{dx}(1+x)\right)(3x)}{(1+x)^2} \\
&= \frac{3(1 + x) - 1(3x)}{(1+x)^2} \\
&= \frac{3 + 3x - 3x}{(1+x)^2} \\
&= \frac{3}{(1+x)^2}
\end{aligned}
$$

Line 2 of the sigmoid derivation below uses this rule.

$$
\begin{aligned}
\frac{d}{dx} \sigma(x) &= \frac{d}{dx} \left[ \frac{1}{1+e^{-x}} \right] \\
&= \frac{(0)(1 + e^{-x}) - (-e^{-x})(1)}{(1 + e^{-x})^2} \\
&= \frac{e^{-x}}{(1 + e^{-x})^2} \\
&= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} \\
&= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x} + (1 - 1)}{1+e^{-x}} \\
&= \frac{1}{1+e^{-x}} \cdot \frac{(1 + e^{-x}) - 1}{1+e^{-x}} \\
&= \frac{1}{1+e^{-x}} \left[ \frac{1 + e^{-x}}{1+e^{-x}} - \frac{1}{1+e^{-x}} \right] \\
&= \frac{1}{1+e^{-x}} \left[ 1 - \frac{1}{1+e^{-x}} \right] \\
&= \sigma(x) (1-\sigma(x))
\end{aligned}
$$

# References

- [Data science: Neural networks: Deriving the sigmoid derivative via chain and quotient rules](https://hausetutorials.netlify.app/posts/2019-12-01-neural-networks-deriving-the-sigmoid-derivative/)
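
# Numerical check

The identity $\sigma'(x)=\sigma(x)(1-\sigma(x))$ derived above can be sanity-checked numerically. The sketch below is not part of the original derivation: the function names `sigmoid`, `sigmoid_prime`, and `central_difference`, the sample points, and the step size `h = 1e-5` are all illustrative assumptions. It compares the closed-form derivative against a central finite-difference estimate using only the Python standard library.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x: float) -> float:
    """Closed-form derivative: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Numerical derivative estimate, (f(x + h) - f(x - h)) / (2h).

    The step size h is an assumed value chosen for illustration.
    """
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    # Arbitrary sample points; the two columns should agree closely.
    for x in (-4.0, -1.0, 0.0, 0.5, 3.0):
        analytic = sigmoid_prime(x)
        numeric = central_difference(sigmoid, x)
        print(f"x={x:+.1f}  analytic={analytic:.8f}  numeric={numeric:.8f}")
```

Note that `sigmoid_prime` reuses the value `s` it has already computed. This is the practical payoff of the identity: once $\sigma(x)$ is known from a forward pass, the derivative costs one subtraction and one multiplication.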