Ever wondered how machine learning works? In this two-part article we explain machine learning with Excel. If you missed the first article and video, you can read it here.
Machine learning algorithms iterate over the data again and again until they can establish the best route to take through the questions. For example, the algorithm will remove data where the entropy is 0 and re-run the calculations to find the next question where an entropy of 0 is found. An entropy of 0 on a decision tree marks a final, or decision, leaf. If an entropy greater than 0 is found, the question with the most information gain is taken as the next question, and the iteration is repeated on an amended data set depending on the answers.
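The article works this out in Excel, but the entropy formula the iterations rely on is short enough to sketch in a few lines of Python (the example labels below are made up for illustration):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits:
    H = -sum(p * log2(p)) over the proportion p of each class."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

entropy(["default", "default", "default"])  # 0 bits: a pure decision leaf
entropy(["default", "repaid"])              # 1 bit: a perfect 50/50 mix
```

When every row in a branch shares the same label, the entropy is 0 and that branch becomes a leaf; the further the mix is from pure, the closer the entropy gets to 1 bit.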
Looking at our sample data, in the first iteration entropy is calculated for each classification and for the data set as a whole. The first question is defined by the classification that gives the most information gain, or by one whose entropy is already 0.
After this, any data with an entropy of 0 is removed, and the algorithm runs again. The data removed is the data on which a decision can already be made. This continues running and branching until the best decision tree route is defined.
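The split-and-re-run loop described above can be sketched as a small greedy tree builder. This is a minimal Python sketch, not the Excel workings from the article: the `entropy` helper is repeated so the snippet stands alone, column indices stand in for the bank's questions, and the sample data is invented for illustration:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, col):
    """Entropy of the parent set minus the weighted entropy of the
    subsets produced by splitting on one question (column)."""
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[col], []).append(label)
    weighted = sum(len(b) / len(labels) * entropy(b)
                   for b in branches.values())
    return entropy(labels) - weighted

def build_tree(rows, labels, questions):
    """Top-down build: a branch with entropy 0 becomes a decision leaf;
    otherwise split on the question with the most information gain and
    recurse on each answer's subset of the data."""
    if entropy(labels) == 0:
        return labels[0]                       # pure branch: decision leaf
    if not questions:
        return Counter(labels).most_common(1)[0][0]  # no questions left
    col = max(questions, key=lambda q: information_gain(rows, labels, q))
    tree = {}
    for answer in set(row[col] for row in rows):
        sub = [(r, l) for r, l in zip(rows, labels) if r[col] == answer]
        sub_rows, sub_labels = zip(*sub)
        remaining = [q for q in questions if q != col]
        tree[answer] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return (col, tree)
```

Each recursive call repeats the same two steps from the article: drop the branches whose entropy has reached 0, then ask the highest-gain question of the rows that remain.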
Entropy and information gain can be used to create decision trees with a top-down approach. Below is the path chosen using the sample training data and an entropy and information gain model, up to the second iteration. Watch this video now, where we talk through the algorithm process and iterations to show how this has been achieved.
Banks and the finance sector use ML day in and day out to decide whether a person is creditworthy. The data is ever growing, and improvements are made as new data is added. What we have seen in this example is a bank with a list of 3 questions. Using these questions, along with the history of loan defaults and probability, the best route for questioning in a decision tree can be established.
In real life, the questions asked by a bank are more complex and the training data set is far larger, so a decision tree can quickly become a forest, with many, many iterations over the data to reach decision (leaf) nodes.
A decision tree built with entropy and information gain calculations is only one example of how decision trees can be constructed. The bones of most machine learning algorithms are probability calculations in some form, so a good understanding of statistics and probability is important if you want to create machine learning algorithms. Remember, decision trees are only one form of machine learning algorithm; the aim of this article was to give you an understanding of how machine learning and algorithms work.