In today's class, I learned about decision trees. In essence, a decision tree is a graphical depiction of a decision-making procedure: a sequence of questions whose answers lead, step by step, to a final choice. You start at the root with the first question, and as you answer each one you move down the branches until you reach a leaf that gives the final decision.
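To make that picture concrete, here is a minimal sketch of a tiny hand-written tree as nested questions. The fruit features and thresholds are entirely made up for illustration, not from any real dataset:

```python
# A toy decision tree written as nested questions.
# Features and thresholds are invented purely for illustration.

def classify_fruit(weight_g: float, texture: str) -> str:
    """Walk down the tree one question at a time until a leaf is reached."""
    if weight_g < 150:            # root node: first question
        if texture == "smooth":   # internal node: follow-up question
            return "plum"
        return "kiwi"
    return "apple"                # leaf: final decision

print(classify_fruit(120, "smooth"))  # -> plum
print(classify_fruit(200, "rough"))   # -> apple
```

Each `if` is a node, each branch of the `if` is an edge, and each `return` is a leaf holding the final decision.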
Constructing a decision tree means choosing the most informative question to ask at each node. These questions are selected using statistical measures such as entropy, Gini impurity, and information gain, computed over the different attributes of the data. The objective is to pick, at each node, the attribute that best separates the classes, so the tree reaches a decision as efficiently as possible.
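Here is a short sketch of how those three measures can be computed for a candidate split. The toy labels at the bottom are invented just to show the numbers; real tree-building code would evaluate many candidate splits and keep the one with the highest gain:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (in bits)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: chance of mislabeling a randomly drawn sample."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy reduction achieved by splitting parent into children."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Invented labels: a split that separates the two classes fairly well.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes", "yes", "yes", "yes", "no"]
right = ["yes", "no", "no", "no", "no"]

print(f"entropy(parent) = {entropy(parent):.3f}")  # 1.000 for a 50/50 split
print(f"gini(parent)    = {gini(parent):.3f}")     # 0.500 for a 50/50 split
print(f"info gain       = {information_gain(parent, [left, right]):.3f}")
```

A split with higher information gain (or, equivalently, a larger drop in Gini impurity) is the more "instructive" question to ask at that node.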
Decision trees do have drawbacks, though, particularly when the data is heavily skewed or spread far from the mean. In our most recent Project 2, we came across a dataset where the mean was pulled far away from the majority of the data points, which reduced the effectiveness of our decision tree approach. This emphasizes how crucial it is to consider the distribution and features of the data when selecting a statistical method for analysis. Decision trees are a useful tool, but their effectiveness depends on the data they are applied to; in some cases, other statistical techniques are better suited to these scenarios.
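One quick sanity check along those lines is to compare the mean and median of a feature before committing to a method. The numbers below are illustrative only, not our actual Project 2 data; they just mimic a right-skewed feature where one large value drags the mean away from the bulk of the points:

```python
import statistics

# Illustrative values only: a right-skewed feature where one large
# observation pulls the mean far from where most points sit.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 120]

print(f"mean   = {statistics.mean(values):.1f}")  # 14.7, dragged up by the outlier
print(f"median = {statistics.median(values)}")    # 3.0, close to the bulk of the data
# A large gap between mean and median is a quick warning sign of skew,
# worth checking before trusting any analysis built on mean-like summaries.
```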