Modeling with Trees
This blog post is about my third project as a Data Science student, in which we explored different predictive models, many of them tree-based: regression trees, decision trees, random forests, and bagged trees.
The model I chose as my final model was a random forest, for a couple of reasons. First, the data set I was dealing with was heavily imbalanced, with a 14:86 class ratio, and a random forest tends to be less affected by class imbalance than many other models.
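To make that concrete, here is a minimal sketch of fitting a random forest on an imbalanced data set. The data is synthetic (generated to roughly match a 14:86 split, not my project's actual data), and `class_weight="balanced"` is one common way to push the forest to pay attention to the minority class:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

# Synthetic stand-in for an imbalanced data set (~14:86 minority:majority).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.86, 0.14], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# class_weight="balanced" reweights classes inversely to their frequency,
# so errors on the rare class cost more during training.
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
clf.fit(X_train, y_train)
score = balanced_accuracy_score(y_test, clf.predict(X_test))
print(f"balanced accuracy: {score:.3f}")
```

Balanced accuracy averages recall over both classes, which is a more honest metric than plain accuracy when one class dominates.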
In short, a random forest works by building multiple decision trees and merging their predictions to get a more accurate and stable result. Also, instead of searching among all features for the best split at each node, each tree searches only a random subset of features. This extra randomness produces a diverse set of trees, which generally results in a better model.
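The two sources of diversity described above can be sketched by hand: each tree sees a bootstrap sample of the data and considers only a random subset of features at each split, and the "merge" is just a majority vote. This is an illustrative toy (the synthetic data and tree count are my own choices), not how you would use a forest in practice, where `RandomForestClassifier` does all of this for you:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Each tree trains on a bootstrap sample, and max_features="sqrt" makes it
# consider a random subset of features at every split -- the two sources
# of diversity in a random forest.
trees = []
for i in range(25):
    idx = rng.integers(0, len(X), len(X))  # bootstrap sample (with replacement)
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X[idx], y[idx]))

# "Merging" the trees = majority vote across their individual predictions.
votes = np.stack([t.predict(X) for t in trees])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
```

The vote averages out the mistakes of individual overfit trees, which is where the "more accurate and stable" behavior comes from.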
For these reasons, I felt a random forest would be the most suitable model in my case.