- [[conceptual definitions of entropy]], [[decision tree hyperparameters]], [[decision tree terminology]], [[Gini impurity]], [[tree ensembles]], [[decision trees versus neural networks]]

# Idea

Decision trees are built by recursively splitting observations (i.e., rows of data) on feature values. At each node, the algorithm searches for the split that yields the highest [[information gain]]. The top node is the *root node*, the bottom/final nodes are the *leaf nodes*, and the nodes in between are *decision nodes*. Finding the best splits is the most time-consuming part of [[decision tree learning]].

Decision trees can predict categorical/binary outcomes (classification trees) or continuous outcomes ([[regression trees]]).

![[20240112092046.png]]

Decision trees are the backbone of [[random forests]].

Algorithms for finding splits:

- pre-sorted algorithm: feature values are pre-sorted and all possible split points are evaluated
- [[histogram-based splitting]]

A decision tree is a [[white box model]] rather than a [[black box model]].

Scikit-learn uses the [[Classification and Regression Trees algorithm|CART algorithm]], which constructs binary trees by choosing, at each node, the feature and threshold that yield the largest information gain. Non-leaf nodes can therefore have only two children (other algorithms allow more than two children per node).

![[Pasted image 20210125220122.png]]

![[Pasted image 20210125220417.png]]

# Terminology

The *root node* is the top node, where depth = 0 (see [[decision tree maximum depth]]). *Leaf nodes* are the final nodes, i.e., nodes without any children.
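The split-scoring idea above can be sketched in a few lines of Python. This is a minimal, hypothetical `gini`/`information_gain` pair (not scikit-learn's actual implementation): a candidate binary split is scored by how much it reduces the parent's impurity, weighting each child's impurity by its share of the observations.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, left, right):
    """Impurity reduction from splitting `parent` into `left` and `right`,
    with each child's impurity weighted by its fraction of the rows."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# Toy binary outcome: a perfect split drops impurity from 0.5 to 0.
parent = np.array([1, 1, 1, 0, 0, 0])
left, right = parent[:3], parent[3:]
print(information_gain(parent, left, right))  # 0.5
```

Finding the best split then amounts to evaluating this score for every candidate feature/threshold pair and keeping the maximum, which is why splitting dominates training time.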
[[Geron 2019 hands-on machine learning with scikit-learn keras tensorflow]]

![[Pasted image 20210515221630.png]]

![[Pasted image 20210515222446.png]]

# References

- [Decision tree model - Decision trees | Coursera](https://www.coursera.org/learn/advanced-learning-algorithms/lecture/HFvPH/decision-tree-model)
- [Decision trees - Model Thinking | Coursera](https://www.coursera.org/learn/model-thinking/lecture/6DnIq/decision-trees)
- [Decision Trees - scikit-learn user guide](https://scikit-learn.org/stable/modules/tree.html)
- [Scikit-learn decision trees explained | Towards Data Science](https://towardsdatascience.com/scikit-learn-decision-trees-explained-803f3812290d)