# Idea

Entropy in statistics is a measure of uncertainty, randomness, or [[impurity measure|impurity]]. The greater the randomness (the higher the uncertainty) or impurity, the greater the entropy.

If $p_i$ is the proportion of data labeled as class $c_i$, entropy is defined as

$$
H(S) = -p_{1} \log_{2}(p_{1}) - \ldots - p_{n} \log_{2}(p_{n}) = -\sum_{i=1}^{n} p_i \log_2(p_i)
$$

Note that $\log_2$ returns negative numbers when the input is between 0 and 1 (which probabilities are), so the leading negative sign simply flips the sign to make entropy non-negative.

If $p_i = 0$, $\log_2(0)$ is undefined, but by convention we take $0 \log_2(0) = 0$ when computing entropy.

```r
entropy <- function(probs) {
  probs <- probs[probs > 0] # drop zero probabilities: 0 * log2(0) is taken to be 0 by convention
  -sum(probs * log2(probs))
}

# 2-class examples
entropy(c(0.4, 0.6)) # probs sum to 1
# 0.9709506
entropy(c(0.5, 0.5))
# 1
entropy(c(0.01, 0.99))
# 0.08079314

# 4-class example
entropy(c(0.2, 0.3, 0.4, 0.1)) # probs sum to 1
# 1.846439
```

## Binary classification

In the case of binary classification (two classes), entropy is defined as follows:

$$p_0 = 1 - p_1$$

$$H(p_1) = -p_1 \log_2(p_1) - p_0 \log_2(p_0)$$

which is

$$H(p_1) = -p_1 \log_2(p_1) - (1-p_1) \log_2(1-p_1)$$

which has the same functional form as the [[binary cross-entropy loss function]].

```python
import numpy as np

def entropy(p):
    # p is the proportion of the positive class;
    # return 0 at the endpoints, using the 0 * log2(0) = 0 convention
    if p == 0 or p == 1:
        return 0
    else:
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)
```

## Graph of $H$ against $p$ (binary case)

Entropy is small (close to 0) when each $p_i$ is close to 0 or 1, that is, when most of the data belong to a single class. When the data are spread across many classes, the $p_i$ are not close to 0 or 1, and entropy is high.

$$H(p) = -[p \times \log_2(p) + (1-p) \times \log_2(1-p)]$$

We use $\log_2$ by convention (blue line). The natural log also works (red line), but it is less satisfying because the curve does not reach 1 at $p = 0.5$, whereas with $\log_2$ the maximum is exactly $H(0.5) = 1$.

```functionplot
---
bounds: [0, 1, 0, 1.1]
xLabel: proportion or probability (belonging to a certain class)
yLabel: entropy
disableZoom: true
---
f(x) = -x * log2(x) - (1-x) * log2(1-x)
g(x) = -x * log(x) - (1-x) * log(1-x)
```

## Entropy as a measure of [[impurity measure|impurity]]

![[20240112093833.png]]

When choosing a split in a decision tree, we measure the reduction in entropy (the information gain): the entropy at the root node minus the weighted average of the entropies of the left and right branches, where $w^{\text{left}}$ and $w^{\text{right}}$ are the fractions of examples sent to each branch (a small code sketch follows the references).

$$
H\left(p_1^{\text{root}}\right) - \left(w^{\text{left}} H\left(p_1^{\text{left}}\right) + w^{\text{right}} H\left(p_1^{\text{right}}\right)\right)
$$

# References

- [Measuring purity - Decision trees | Coursera](https://www.coursera.org/learn/advanced-learning-algorithms/lecture/6jL2z/measuring-purity)
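
## Appendix: sketch of the split-entropy formula in code

The weighted-entropy formula above can be turned into code. This is a minimal Python sketch that reuses the binary `entropy` function from earlier; the name `information_gain`, the 0/1 label-array interface, and the example split are my own illustrative choices, not taken from the lecture.

```python
import numpy as np

def entropy(p):
    # binary entropy, using the 0 * log2(0) = 0 convention at the endpoints
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(y_root, y_left, y_right):
    # y_root holds the 0/1 labels at a node; y_left and y_right are the labels
    # that a candidate split sends to the left and right branches
    w_left = len(y_left) / len(y_root)
    w_right = len(y_right) / len(y_root)
    p_root, p_left, p_right = np.mean(y_root), np.mean(y_left), np.mean(y_right)
    return entropy(p_root) - (w_left * entropy(p_left) + w_right * entropy(p_right))

# example: 10 examples at the root (5 positive); the split sends
# 5 left (4 positive) and 5 right (1 positive)
root = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
left = np.array([1, 1, 1, 1, 0])
right = np.array([1, 0, 0, 0, 0])
print(information_gain(root, left, right))  # approximately 0.28
```

When growing a tree, this quantity would be computed for every candidate split and the split with the largest reduction in entropy chosen.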