Gini impurity

# Idea
Gini impurity is an [[impurity measure]] used in [[decision trees]] to measure the *impurity* of a node. A node is "pure" (`gini=0`) when all training instances it applies to belong to the same class.

$$
G_{i}=1-\sum_{k=1}^{n} (p_{i,k})^2
$$

- $G_i$: Gini impurity at node $i$
- $p_{i,k}$: the proportion of class $k$ instances among the training instances in node $i$
- $n$: the number of classes

The Gini impurity of the green leaf node in the figure below can be computed as follows:

```python
import numpy as np

samples = 54                    # training instances in the node
value = np.array([0, 49, 5])    # instances per class
gini = 1 - np.sum((value / samples) ** 2)  # ≈ 0.168
```

[[Geron 2019 hands-on machine learning with scikit-learn keras tensorflow]]
![[Pasted image 20210515221630.png]]

# References