# Idea
Gini impurity is an [[impurity measure]] used in [[decision trees]] to quantify how mixed the classes are at a node. A node is "pure" (`gini=0`) if all the training instances it applies to belong to the same class.
$$
G_{i} = 1 - \sum_{k=1}^{n} (p_{i,k})^2
$$
- $G_i$: Gini impurity at node $i$
- $p_{i,k}$: the proportion of class $k$ instances among the training instances in node $i$
- $n$: the number of classes
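As a minimal sketch of the formula above, the helper below computes the Gini impurity from per-class instance counts. The name `gini_impurity` and its `class_counts` argument are hypothetical and chosen only for illustration; they are not part of any library API.

```python
import numpy as np

def gini_impurity(class_counts):
    """Gini impurity of a node, given the number of training instances per class."""
    counts = np.asarray(class_counts, dtype=float)
    total = counts.sum()
    if total == 0:
        raise ValueError("node has no training instances")
    proportions = counts / total  # p_{i,k} for each class k
    return 1.0 - np.sum(proportions ** 2)

pure = gini_impurity([0, 20, 0])   # every instance belongs to one class
mixed = gini_impurity([10, 10])    # two classes, evenly split
```

Here `pure` is 0 because a single class has proportion 1, and `mixed` is 0.5, the largest value possible with two classes.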
The Gini impurity of the green leaf node below can be computed as follows:
```python
import numpy as np

samples = 54                    # training instances in the node
value = np.array([0, 49, 5])    # training instances per class
gini = 1 - np.sum((value / samples) ** 2)  # ≈ 0.168
```
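Written out term by term, with the class counts 0, 49 and 5 out of 54 samples, the same computation is:

$$
G = 1 - \left(\tfrac{0}{54}\right)^2 - \left(\tfrac{49}{54}\right)^2 - \left(\tfrac{5}{54}\right)^2 \approx 1 - 0 - 0.823 - 0.009 \approx 0.168
$$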
[[Geron 2019 hands-on machine learning with scikit-learn keras tensorflow]]
![[Pasted image 20210515221630.png]]
# References