## Simple Explanation of Machine Learning shown with Excel – Part 1

Every wondered how Machine learning works?  In this 2 part article we are going to explain machine learning with Excel

## Introduction

The world is full with a lot of data, pictures, music, text, video, spreadsheets.  Machine learning brings the promise of delivering meaning from that data.

Human’s learn from experience. if we touch something hot, that burning sense is stored in memory and we quickly learn not to touch it again.  Machine learning is like human learning as it learns from the past.  We feed a computer data, which represents some past experience, and then with the use of different statistical methods, we can make predictions.

Machine learning is all around us, from using Google Search, to diagnosis of skin cancer.  There are few industries yet to feel the effect.

This include the banking and finance sector.  One of the core uses of machine learning in banking has been to combat fraud and comply with AML regulations.  Banks can also use machine learning algorithms to analyse an applicant for credit, be that an individual or a business, and make approvals according to a set of pre-defined parameters. Some of these algorithms simply look at a customers credit score, age and postcode to make a decision in seconds.

To get an idea how machine learning works, we are going to focus on a bank and how they are using machine learning to make approval on loans.  We will dive into some theory and we will set up a simple algorithm to decided if a loan should be approved or rejected.

## Types of Machine Learning

There are generally 4 types of Machine learning based on their purpose:

Supervised:  This is a type of learning where by both inputs and outputs are known. For example, in supervised learning for image recognition a computer might be fed a series of pictures with labels, such as cars, vans and trucks. After feeding the computer with so many images, it should be able to classify an unlabelled image as a car, van or truck.

It is called supervised learning because the process of an algorithm learning from the training dataset is like a teacher supervising the learning of a student.

Supervised learning is used for both classification and regression problems

A classification problem is where a label is the output. Is it a car or van? Should you approve the loan or reject the loan?

A Regression problem deals with values and quantities.  Given the size of a house, predict the value.

Unsupervised Learning:  Unsupervised learning has unlabelled input data only.  The output is not known, and it is often used as exploratory analysis.   One aim is to classify & cluster data, for example grouping customers by purchasing behaviors.  Another aim is to find association rules.  For example, if you buy product Y you are also likely to buy product Z

Semi-supervised: machine learning algorithms that are semi supervised are fed small amounts of labeled data and taught how to classify a large batch of unprocessed data.

Reinforcement Learning: When an agent interacts with an environment where some actions give rewards.  For example in a computer game, the player would be the agent.  The agent must navigate a maze, at the end, is the key to the next level.  The key is the reward, the incentive.  However, it might take the agent 2-3 attempts, each time learning from their mistake in the environment, until the finally get out of the maze. ## What is a Decision Tree?

A decision tree is a set of questions used to filter information allowing you make a more informed decision. It is used in supervised learning to classify data.

Let’s look at our bank example. For a loan to be approved by a bank, the applicant completes a list of questions and these questions are used to judge if it is safe to give the loan or not.  To keep things at a very simple level, our sample bank asks 3 questions?

1. Do you own a home? Yes or No

2. What is your income bracket? Low, Average, Above average

3. What is your credit score? Below average, average, high This process is the most basic form of decision tree, each question is asked and at the end, the loan is either approved or rejected.

## Machine Learning and Decision Trees

Looking at our sample banks decision tree, how does the bank know which question, or attribute to start the tree with?  How can the bank ensure that they lend to people who won’t default and do it with the least amount of questions possible?

Using existing data on loan defaults, the bank can use this data as training data to teach a machine how to classify data and come up with a best direction to take with the questions that will minimise the number of loan defaults.

The bank will have existing data from loan application and will be able to tie this into a table showing who defaulted and who didn’t.  Like the sample shown. This existing data is known as training data because the machine will learn from this data. Machine learning will use algorithms to establish the best route to take in the decision tree, based on this past experience.

## Entropy and Information Gain

Entropy is a concept from Information Theory.  It is a measure of randomness of the information being processed.  In general, the more random the event, the more information it will contain. Entropy of 1 would suggest high probability and randomness and lots of information.  The closer is to 0, the less the less randomness and a lower probability.

The formula entropy for Entropy is:

Entropy of the data set(D) = -P1*log2(P1)-P2*log2(P2)…..

Where P= Probability

Looks a little complicated so let’s use it with our bank.

We want to use our training data to predict if an applicant should be approved or rejected for the loan.  The entropy of the dataset would therefore be the entropy of the ‘default’ column, as this is the label we want to predict.  If they person is likely to default, then we will not lend the money.

We have 18 observations.  10 default on the loans (yes) and 8 do not default (no).  Based on this we can calculate some probabilities

Probability of not defaulting = 8/18 = 0.444

Probability of default = 10/18 =0.556

Now we have our probabilities we can plug these into our entropy formula

D=-0.444*log2(0.444) – 0.556*log2(.556) = 0.991

0.991 is rather close to 1.  This suggests there is a lot of randomness in the data. If there is a lot of randomness it would be very hard to decide if the loan application should be approved or rejected.  There is a lot of information in these, such as if they own a home and their credit score. If we narrow our focus to just one piece of information, can we reduce this randomness?

Let’s calculated the entropy for the ‘home owners’ column.

We have 18 observations. 8 of which are homeowners, 10 are not.

Looking at just the home owners, 8 in total, there are 4 defaulter and 4 do not default

Therefore, the probability of a home owner defaulting is 4/8 = 0.5 and the probability of a home owner not defaulting is also 4/8 = 0.5

Entropy Home Owner  =-0.5*log2(0.5) – 0.5*log2(.5) = 1

Now looking at those that do not own homes.  There are 10 in total

The probability of a non home owner defaulting is 6/10 = 0.60 and the probability of a non home owner not defaulting is 4/10 = 0.40

Entropy Non Home Owner  =-0.60*log2(0.60) – 0.40*log2(.40) = 0.971

As there were two possible answers to the home owner question, we now have two entropy values. To get the entropy for the total column, we must combine these entropys proportionally.

Probability of being a home owner * entropy home owner = 8/18 * 1= 0.444

Probability of not being a home owner * entropy of not being a home owner  = 10/18 * 0.971 = 0.539

Entropy = 0.444 + 0.539= 0.984

We can see now that the entropy for the data set has been reduced from 0.991 to 0.984.  Although a small reduction, we can see by reducing the information we have reduced the randomness. By reducing the randomness, we increase the changes of not approving someone that will default.

There is a value attached to this reduction in randomness and this is known as information gain.  Information gain can be calculated by deducting the entropy for the restricted information from the entropy for the data set.

Information gain = 0.991 – 0.984 = 0.007

This information gain calculation comes from Information theory.

### How to Calculated Entropy and Information Gain In Excel

Now that you have be introduced to some of the probability calculations that can be used in algorithms, lets take a look at how these can be calculated in Excel.  Remember, this sample data is extremely simply and so we can easily use Excel for this demonstration.

Using Pivot tables in Excel is a very quick way of calculating probability.  All of the above seems complicated, but when you watch this video you will see how quickly you can calculated entropy and information gain

## To be Continued………..

In part 2 we will look at iterations and put all of this together to show how machine learning works.

You can access part 2 here

Do you have any questions or comments on the above?  If so please do use the comments section below, I would love to hear from you ( and I would love to reward your interaction with the STEEM token)

Don’t forget, If you care, you will share.  The sharing buttons can be found below the comments section