Luis Serrano
From Grokking Machine Learning by Luis Serrano

Recommendation systems are one of the most common and exciting applications in machine learning. Ever wonder how Netflix recommends movies, YouTube guesses which videos you may watch, or Amazon shows you products you might be interested in buying? These are all examples of recommendation systems. One simple and interesting way to see recommendation problems is to consider them classification problems. Let's start with an easy example: our very own app-recommendation system using decision trees.
Take 40% off Grokking Machine Learning by entering fccserrano into the discount code box at checkout at manning.com.
This blog post is an excerpt from the Decision Trees chapter of my book Grokking Machine Learning. A similar example also appears in this video:
Decision trees
Let's say we want to build a system that recommends to users which app to download. We have three apps in our store (shown in figure 1): Atom Count, Beehive Count, and Check Mate Mate.
The training data is a table with the platform used by the user (iPhone or Android), their age, and the app they have downloaded (in real life there are many more platforms, but for simplicity we'll assume that these are the only two options). Our table contains six people, as shown in table 1.
Table 1. A dataset with users of an app store. For each customer, we record their platform, age, and the app they downloaded.

Given this table, which app would you recommend to each of the following three customers?
What we should do is look at users similar to each customer and recommend the app that those similar users downloaded.
However, going customer by customer seems like a tedious job. Next, we'll build a decision tree to take care of all customers at once.
In this section, we see how to build an app-recommendation system using decision trees. In a nutshell, the algorithm to build a decision tree is the following: pick the feature that best determines the label, split the data according to it, and repeat for each of the resulting branches.
In other words, what we do is decide which of the two features (platform or age) is more successful at determining which app the users will download and pick this one as the root of the decision tree. Then, we iterate over the branches, always picking the most determining feature for the data in that branch, thus building our decision tree.
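As a rough sketch of this procedure (my own illustration in Python, not the book's code), the score below simply counts how many points each candidate split would misclassify; the next sections introduce the scores we actually compare: accuracy, Gini impurity, and entropy.

```python
# A minimal sketch of the recursive tree-building procedure described above.
# The helper names (misclassified, best_feature, build_tree) are my own, not the book's.
from collections import Counter

def misclassified(labels):
    """Points a leaf gets wrong if it predicts its most common label."""
    return len(labels) - Counter(labels).most_common(1)[0][1]

def best_feature(rows, labels, features):
    """Pick the most determining feature: the one whose split misclassifies the fewest points."""
    def errors(feature):
        return sum(
            misclassified([lab for row, lab in zip(rows, labels) if row[feature] == value])
            for value in set(row[feature] for row in rows))
    return min(features, key=errors)

def build_tree(rows, labels, features):
    """Return a nested dict (decision node) or a single label (leaf)."""
    if len(set(labels)) == 1 or not features:        # pure node, or no questions left to ask
        return Counter(labels).most_common(1)[0][0]
    feature = best_feature(rows, labels, features)    # root of this subtree
    branches = {}
    for value in set(row[feature] for row in rows):   # iterate over the branches
        idx = [i for i, row in enumerate(rows) if row[feature] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [f for f in features if f != feature])
    return {feature: branches}
```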
The first step to build our model is to figure out the most useful feature: in other words, the most useful question to ask. First, let's simplify our data a little bit. Let's call everyone under 20 years old "young" and everyone 20 or older "adult." Our modified dataset is shown in table 2.
Table 2. Our dataset, with the age column simplified to two categories, young and adult.

The building blocks of decision trees are questions of the form "Does the user use an iPhone?" or "Is the user young?" We need one of these to use as the root of the tree. Which one should we pick? We should pick the one that best determines the app they downloaded. To decide which question is better at this, let's compare them.
First question: Does the user use an iPhone or an Android?
This question splits the users into two groups, the iPhone users and the Android users. Each group has three users in it. But we need to keep track of which app each user downloaded. A quick look at table 2 helps us notice the following: among the iPhone users, one downloaded Atom Count and two downloaded Check Mate Mate; among the Android users, two downloaded Atom Count and one downloaded Beehive Count. The resulting decision stump is shown in figure 2.
Now let's see what happens if we split them by age.
Second question: Is the user young or adult?
This question splits the users into two groups, the young and the adult. Again, each group has three users in it. A quick look at table 2 helps us notice what each user downloaded, as follows: all three young users downloaded Atom Count, while among the adults, one downloaded Beehive Count and two downloaded Check Mate Mate.
The resulting decision stump is shown in figure 3.
Figure 3. If we split our users by age, we get this split. The young are at the left, and the adults at the right.

From looking at figures 2 and 3, which one looks like a better split? It seems that the second one (based on age) is better, because it has picked up on the fact that all three young people downloaded Atom Count. But we need the computer to figure out that age is a better feature, so we'll give it some numbers to compare. In this section, we learn three ways to compare these two splits: accuracy, Gini impurity, and entropy. Let's start with the first one: accuracy.
Accuracy: How often is our model correct?
Accuracy is the most popular metric for the effectiveness of a model. It is the fraction of correctly classified data points over the total number of data points. If you'd like a small recap, please check this video.
Suppose that we are allowed only one question, and with that one question, we must determine which app to recommend to our users. We have the following two classifiers: the decision stump that asks about the platform (classifier 1) and the decision stump that asks about the age (classifier 2).
Let's look more carefully at the classifiers. The key observation is the following: if we must recommend an app by asking only one question, our best bet is to look at all the people who gave the same answer and recommend the most common app among them.
Classifier 1: What platform do you use?
If the answer is iPhone, we recommend Check Mate Mate, because two of the three iPhone users downloaded it; this gets two of the three iPhone users right. If the answer is Android, we recommend Atom Count, because two of the three Android users downloaded it; this again gets two of the three right.
Classifier 2: Are you young or an adult?
If the answer is young, we recommend Atom Count, which all three young users downloaded, so we get all three right. If the answer is adult, we recommend Check Mate Mate, which two of the three adults downloaded, so we get two of the three right.
Notice that classifier 1 is correct four times out of six, and classifier 2 is correct five times out of six. Therefore, for this dataset, classifier 2 is better. In figure 4, you can see the two classifiers with their accuracy. Notice that the questions are reworded so that they have yes-or-no answers, which doesn't change the classifiers or the outcome.

Figure 4. Tree 1 is formed by asking the platform, and tree 2 is formed by asking the age.
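To double-check these counts, here is a small Python sketch (my own code) that scores each one-question classifier; the six rows are reconstructed from the leaf contents discussed in this post and use the simplified young/adult ages, so treat them as an assumption rather than a copy of table 2.

```python
# Accuracy of a one-question classifier: group users by their answer and
# recommend the most common app in each group. Rows are a reconstruction, not the book's table.
from collections import Counter

users = [
    ("iPhone",  "Young", "Atom Count"),
    ("iPhone",  "Adult", "Check Mate Mate"),
    ("iPhone",  "Adult", "Check Mate Mate"),
    ("Android", "Young", "Atom Count"),
    ("Android", "Young", "Atom Count"),
    ("Android", "Adult", "Beehive Count"),
]

def accuracy(feature_index):
    groups = {}
    for row in users:
        groups.setdefault(row[feature_index], []).append(row[2])
    correct = 0
    for apps in groups.values():
        recommendation = Counter(apps).most_common(1)[0][0]
        correct += sum(app == recommendation for app in apps)
    return correct / len(users)

print(accuracy(0))  # platform question -> 4/6 ≈ 0.667
print(accuracy(1))  # age question      -> 5/6 ≈ 0.833
```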
Gini impurity index: How diverse is my dataset?
The Gini impurity index, or Gini index, is another way we can compare the platform and age splits. If you'd like to learn more about the Gini index, you can continue reading, or check out this video:
The Gini impurity index

The Gini index is a measure of diversity in a dataset. In other words, if we have a set in which all the elements are similar, this set has a low Gini index, and if all the elements are different, it has a large Gini index. For clarity, consider two sets of 10 colored balls (where any two balls of the same color are indistinguishable).
Set 1 looks more pure than set 2, because set 1 contains mostly red balls and a couple of blue ones, whereas set 2 has many different colors. Next, we devise a measure of impurity that assigns a low value to set 1 and a high value to set 2. This measure of impurity relies on probability. Consider the following question:
If we pick two random elements of the set, what is the probability that they have a different color? The two elements dont need to be distinct; we are allowed to pick the same element twice.
For set 1, this probability is low, because the balls in the set have similar colors. For set 2, this probability is high, because the set is diverse, and if we pick two balls, they're likely to be of different colors. Let's calculate these probabilities. First, notice that by the law of complementary probabilities, the probability that we pick two balls of different colors is 1 minus the probability that we pick two balls of the same color:
P(picking two balls of different color) = 1 − P(picking two balls of the same color)
Now let's calculate the probability that we pick two balls of the same color. Consider a general set, where the balls have n colors. Let's call them color 1, color 2, all the way up to color n. Because the two balls must be of one of the n colors, the probability of picking two balls of the same color is the sum of the probabilities of picking two balls of each of the n colors:
P(picking two balls of the same color) = P(both balls are color 1) + P(both balls are color 2) + ⋯ + P(both balls are color n)
What we used here is the sum rule for disjoint probabilities, which states the following:
Sum rule for disjoint probabilities: If two events E and F are disjoint, namely, they never occur at the same time, then the probability of either one of them happening (the union of the events) is the sum of the probabilities of each of the events. In other words,
P(E ∪ F) = P(E) + P(F)
Now, let's calculate the probability that two balls have the same color, for each of the colors. Notice that we're picking each ball completely independently from the others. Therefore, by the product rule for independent probabilities, the probability that both balls have color 1 is the square of the probability that we pick one ball and it is of color 1. In general, if p_i is the probability that we pick a random ball and it is of color i, then

P(both balls are color i) = p_i²
Putting all these formulas together, we get

P(picking two balls of different color) = 1 − p_1² − p_2² − ⋯ − p_n²
This last formula is the Gini index of the set.
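As a quick sanity check (my own sketch, not part of the book), we can estimate this probability for a ten-ball set like set 1 below (eight red, two blue) by repeatedly drawing two balls with replacement, and compare the estimate with the closed-form value:

```python
# Estimate P(two randomly drawn balls have different colors) by simulation
# and compare with the closed-form 1 - p_1^2 - ... - p_n^2 derived above.
import random
from collections import Counter

balls = ["red"] * 8 + ["blue"] * 2                 # set 1 from the text

probs = [count / len(balls) for count in Counter(balls).values()]
closed_form = 1 - sum(p ** 2 for p in probs)       # 1 - (8/10)^2 - (2/10)^2 = 0.32

random.seed(0)
trials = 100_000
different = sum(random.choice(balls) != random.choice(balls) for _ in range(trials))
print(closed_form, different / trials)             # both close to 0.32
```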
Finally, the probability that we pick a random ball of color i is the number of balls of color i divided by the total number of balls. This leads to the formal definition of the Gini index.
Gini impurity index: In a set with m elements and n classes, with a_i elements belonging to the i-th class, the Gini impurity index is

Gini = 1 − p_1² − p_2² − ⋯ − p_n²,

where p_i = a_i/m. This can be interpreted as the probability that, if we pick two random elements out of the set, they belong to different classes.
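This definition translates directly into a few lines of Python (a sketch of my own, not the book's implementation):

```python
# Gini impurity of a list of class labels, straight from the definition:
# 1 minus the sum over classes of (a_i / m)^2.
from collections import Counter

def gini(labels):
    m = len(labels)
    return 1 - sum((a_i / m) ** 2 for a_i in Counter(labels).values())

print(gini(["red"] * 8 + ["blue"] * 2))  # set 1: 1 - 0.8^2 - 0.2^2 = 0.32
print(gini(["A", "C", "C"]))             # 1 - (1/3)^2 - (2/3)^2 ≈ 0.444
print(gini(["A", "A", "A"]))             # a pure set has Gini impurity 0
```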
Now we can calculate the Gini index for both of our sets. For clarity, the calculation of the Gini index for set 1 is illustrated in figure 5 (with red and blue replaced by black and white).
Set 1: {red, red, red, red, red, red, red, red, blue, blue} (eight red balls and two blue balls), so its Gini index is 1 − (8/10)² − (2/10)² = 1 − 0.64 − 0.04 = 0.32.
Notice that, indeed, the Gini index of set 1 is smaller than that of set 2.
Figure 5. The calculation of the Gini index for the set with eight black balls and two white balls.

How do we use the Gini index to decide which of the two ways to split the data (age or platform) is better? Clearly, if we can split the data into two purer datasets, we have performed a better split. Thus, let's calculate the Gini index of the set of labels of each of the leaves. Looking at figure 6, here are the labels of the leaves (where we abbreviate each app by the first letter in its name):
Classifier 1 (by platform): the iPhone leaf has labels {A, C, C}, and the Android leaf has labels {A, A, B}.
Classifier 2 (by age): the young leaf has labels {A, A, A}, and the adult leaf has labels {B, C, C}.
The Gini indices of the sets {A, C, C}, {A, A, B}, and {B, C, C} are all the same: 1 − (2/3)² − (1/3)² = 0.444.
The Gini index of the set {A, A, A} is 1 − (3/3)² = 0.
Classifier 1 (by platform): average Gini index = ½(0.444 + 0.444) = 0.444.
Classifier 2 (by age): average Gini index = ½(0.444 + 0) = 0.222.
Figure 6. The two ways to split the dataset, by platform and age, and their Gini index calculations.

We conclude that the second split is better, because it has a lower average Gini index.
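The same comparison can be verified in a couple of lines, reusing the gini sketch from above and the leaf contents listed earlier:

```python
# Average Gini impurity of the leaves produced by each split.
from collections import Counter

def gini(labels):  # same definition as the sketch above
    return 1 - sum((c / len(labels)) ** 2 for c in Counter(labels).values())

platform_leaves = [list("ACC"), list("AAB")]   # iPhone leaf, Android leaf
age_leaves      = [list("AAA"), list("BCC")]   # young leaf, adult leaf

def average_gini(leaves):
    return sum(gini(leaf) for leaf in leaves) / len(leaves)

print(average_gini(platform_leaves))  # ≈ 0.444
print(average_gini(age_leaves))       # ≈ 0.222 -> the age split is better
```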
Aside: The Gini impurity index should not be confused with the Gini coefficient. The Gini coefficient is used in statistics to calculate the income or wealth inequality in countries. In this post, whenever we talk about the Gini index, we are referring to the Gini impurity index.
Entropy and information gain
Entropy and information gain have beautiful combinatorial properties, and they have useful applications in information theory. If you'd like to learn more about them, I wrote a longer blog post with an accompanying video:

Shannon entropy and information gain
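Since entropy shows up alongside accuracy and the Gini index in the comparisons below, here is a minimal sketch of the entropy of a set of labels (my own code; see the linked post for the details):

```python
# Shannon entropy of a list of labels: the sum over classes of -p_i * log2(p_i).
from collections import Counter
from math import log2

def entropy(labels):
    m = len(labels)
    return sum(-(c / m) * log2(c / m) for c in Counter(labels).values())

print(entropy(["A", "A", "A"]))  # 0.0 for a pure set
print(entropy(["A", "C", "C"]))  # ≈ 0.918
print(entropy(["A", "B", "C"]))  # log2(3) ≈ 1.585, as diverse as a three-label set gets
```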
Classes of different sizes? No problem: We can take weighted averages
In the previous sections, we learned how to perform the best possible split by minimizing the average Gini impurity index or entropy. However, imagine that you have a dataset with eight data points (which, when training the decision tree, we also refer to as samples), and you split it into two datasets of sizes six and two. As you may imagine, the larger dataset should count for more in the calculations of the Gini impurity index or entropy. Therefore, instead of considering the average, we consider the weighted average, where we weight each leaf by the proportion of points corresponding to that leaf. Thus, in this case, we would weight the first Gini impurity index (or entropy) by 6/8 and the second one by 2/8. Figure 7 shows an example of a weighted average Gini impurity index and a weighted average entropy for a sample split.

Figure 7. A split of a dataset of size eight into two datasets of sizes six and two.
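In code, this is just a weighted sum of the per-leaf scores; the impurity values below are hypothetical, chosen only to illustrate the 6/8 and 2/8 weights (they are not the numbers in the figure):

```python
# Weighted average of per-leaf impurity scores, weighting each leaf by its share of the points.
def weighted_average(scored_leaves):
    """scored_leaves: list of (impurity_score, leaf_size) pairs."""
    total = sum(size for _, size in scored_leaves)
    return sum(score * size / total for score, size in scored_leaves)

# Hypothetical Gini impurities 0.444 and 0.5 for leaves of sizes 6 and 2:
print(weighted_average([(0.444, 6), (0.5, 2)]))  # = 6/8 * 0.444 + 2/8 * 0.5 ≈ 0.458
```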
Second step to build the model: Iterating
In the previous section, we learned how to split the data in the best possible way using one of the features. That is the bulk of the training process of a decision tree. All that is left to finish building our decision tree is to iterate on this step many times. In this section, we learn how to do this.
Using the three methods (accuracy, Gini index, and entropy), we decided that the best split was made using the age feature. Once we make this split, our dataset is divided into two datasets. The split into these two datasets, with their accuracy, Gini index, and entropy, is illustrated in figure 8.
Figure 8. When we split our dataset by age, we get two datasets.

Notice that the dataset on the left is pure: all the labels are the same, its accuracy is 100%, and its Gini index and entropy are both 0. There's nothing more we can do to split this dataset or to improve the classifications. Thus, this node becomes a leaf node, and when we get to that leaf, we return the prediction Atom Count.
The dataset on the right can still be divided, because it has two labels: Beehive Count and Check Mate Mate. We've used the age feature already, so let's try using the platform feature. It turns out that we're in luck, because the Android user downloaded Beehive Count, and the two iPhone users downloaded Check Mate Mate. Therefore, we can split this leaf using the platform feature and obtain the decision node shown in figure 9.
Figure 9. We can split the right leaf of the tree in figure 8 using platform and obtain two pure datasets.

After this split, we are done, because we can't improve our splits any further. Thus, we obtain the tree in figure 10.
Figure 10. The resulting decision tree has two nodes and three leaves. This tree predicts every point in the original dataset correctly.

This is the end of our process, and we have built a decision tree that classifies our entire dataset.
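For readers who want to reproduce this tree with a library, here is a sketch using scikit-learn's DecisionTreeClassifier (this assumes scikit-learn is installed; the numeric encoding of platform and age is my own choice, and the rows are the same reconstruction of the simplified dataset used earlier):

```python
# Fit a decision tree on the six reconstructed rows and print its structure.
# Encoding (an assumption): platform 0 = Android, 1 = iPhone; age 0 = young, 1 = adult.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 0], [1, 1], [1, 1], [0, 0], [0, 0], [0, 1]]
y = ["Atom Count", "Check Mate Mate", "Check Mate Mate",
     "Atom Count", "Atom Count", "Beehive Count"]

tree = DecisionTreeClassifier(criterion="gini").fit(X, y)
print(export_text(tree, feature_names=["platform", "age"]))  # splits on age first, then platform
print(tree.predict([[1, 0]]))  # a young iPhone user -> 'Atom Count'
```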
I hope you enjoyed this blog post! If you want to see more of the book, check it out on Manning's liveBook platform here.