Mean, Mode, Variance and Standard Deviation
In math notation, when we see N in a formula it refers to the Population, and when we see n it refers to a Sample.
Mode
It is the value that appears most frequently in a series. Python's standard library provides an implementation in the statistics module (statistics.mode).
Note: s² = Σ(xᵢ − x̄)² / (n − 1) is the formula for the unbiased sample variance, since we are dividing by n − 1 rather than n.
Note: taking the square root of the unbiased sample variance (to get the standard deviation) reintroduces bias.


import math

valores = [1, 2, 3, 8, 4, 9.8, 6.5, 4, 3, 8, 5, 9, 3.3, 0, 4, 7, 9]
valores_clean_float = [float(x) for x in valores]

# Mean: sum of the values divided by how many there are
media = sum(valores_clean_float) / len(valores_clean_float)

# Mode: the value with the highest count
moda = max(valores_clean_float, key=valores_clean_float.count)

# Median: middle value of the sorted series
# (average of the two middle values when the length is even)
valores_clean_float_sorted = sorted(valores_clean_float)
list_size = len(valores_clean_float_sorted)
if list_size % 2 == 0:
    mediana = (valores_clean_float_sorted[int(list_size / 2) - 1] +
               valores_clean_float_sorted[int(list_size / 2)]) / 2
else:
    mediana = valores_clean_float_sorted[math.floor(list_size / 2)]

# Population variance: mean of the squared distances from the mean
# (note this divides by n, not n - 1)
squared_distance_from_mean = [round((x - media) ** 2, 2) for x in valores_clean_float_sorted]
variance = sum(squared_distance_from_mean) / list_size

# Standard deviation: square root of the variance
standard_deviation = math.sqrt(variance)

print('Media: ', media, '- Moda: ', moda, '- Mediana: ', mediana,
      '- Variância: ', variance, '- Standard Deviation: ', standard_deviation)
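For comparison, Python's statistics module computes the same measures directly, and makes the n versus n − 1 distinction from the notes above explicit:

import statistics

valores = [1, 2, 3, 8, 4, 9.8, 6.5, 4, 3, 8, 5, 9, 3.3, 0, 4, 7, 9]
print(statistics.mean(valores))       # mean
print(statistics.mode(valores))       # mode
print(statistics.median(valores))     # median
print(statistics.pvariance(valores))  # population variance (divides by n)
print(statistics.variance(valores))   # unbiased sample variance (divides by n - 1)
print(statistics.pstdev(valores))     # population standard deviation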
Simple Linear Regression

Simple linear regression fits a line y = mx + b to pairs (xᵢ, yᵢ). The slope and intercept can be estimated directly from the data:

m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
b = ȳ − m · x̄

where x̄ and ȳ are the means of x and y.
A shortcut formula for the slope is

m = corr(x, y) × (s_y / s_x)

where corr(x, y) is the Pearson correlation between x and y, and s_x, s_y are their standard deviations.
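As a minimal sketch (not from the original post), here is the closed-form estimate for the sample data used below:

x = [1, 2, 4, 3, 5]
y = [1, 3, 3, 2, 5]

x_mean = sum(x) / len(x)  # 3.0
y_mean = sum(y) / len(y)  # 2.8

# Slope: covariance of x and y divided by the variance of x
m = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
     / sum((xi - x_mean) ** 2 for xi in x))
# Intercept: the fitted line passes through the point (x̄, ȳ)
b = y_mean - m * x_mean

print('m =', m, 'b =', b)  # m = 0.8 b = 0.4

The same line can also be approximated iteratively with stochastic gradient descent, as in the code below.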
x = [1, 2, 4, 3, 5]
y = [1, 3, 3, 2, 5]

m = 0.0
b = 0.0
grau_aprendizado = 0.01  # learning rate

# Stochastic gradient descent: 4 epochs over the dataset,
# updating m and b after every instance
for epoch in range(4):
    for i in range(len(x)):
        previsao = m * float(x[i]) + b  # prediction: y = mx + b
        erro = previsao - float(y[i])   # error = prediction(i) - y(i)
        m = m - grau_aprendizado * erro * float(x[i])
        b = b - grau_aprendizado * erro * 1.0
        print("m {} b {}".format(m, b))
Logistic Regression

Logistic regression models the probability of the default class by applying the logistic (sigmoid) function to a linear combination of the inputs:

ŷ = 1 / (1 + e^−(B0 + B1·x1 + … + Bn·xn))

The output is a value between 0 and 1 that can be rounded (e.g., at a 0.5 threshold) into a class prediction.
Each column in your input data has an associated B coefficient (a constant real value) that must be learned from your training data. The actual representation of the model that you would store in memory or in a file is the set of coefficients in the equation (the beta values, or B's).
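As a minimal sketch (not from the original post), this is how a prediction could be made once the B's are known; the single feature and the coefficient values here are made-up placeholders:

import math

def prever(b0, b1, x):
    # Logistic function applied to the linear combination B0 + B1*x
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients learned from training data
b0, b1 = -0.4, 0.85
previsao = prever(b0, b1, 2.5)
print(previsao)  # probability that the instance belongs to class 1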
Linear Discriminant Analysis – LDA

Steps

1. Calculate the mean of each input variable for each class.
2. Calculate the class probabilities (the proportion of training instances that belong to each class).
3. Calculate the variance (a single variance shared across classes).

The representation model is a discriminant function evaluated once per class; the class with the largest value is the prediction:

discriminant(x) = x × (mean / variance) − mean² / (2 × variance) + ln(probability)
4. Making predictions – Just plug the values found above into the representation model
for X = 4.667797637 and Y = 0, the discriminant value is 12.3293558
for X = 4.667797637 and Y = 1, the discriminant value is -130.3349038
We can see that the discriminant value for Y = 0 (12.3293558) is larger than the discriminant value for Y = 1 (-130.3349038), therefore the model predicts Y = 0, which we know is correct in the dataset.
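As a minimal sketch (not from the original post), the prediction step could look like this; the means, variance, and class probabilities are placeholder estimates chosen only to roughly reproduce the discriminant values quoted above:

import math

def discriminant(x, mean, variance, prob):
    # discriminant(x) = x * (mean / variance) - mean^2 / (2 * variance) + ln(P(class))
    return x * (mean / variance) - (mean ** 2) / (2.0 * variance) + math.log(prob)

x = 4.667797637
d0 = discriminant(x, mean=4.97, variance=0.832, prob=0.5)   # class Y = 0 -> ~12.3
d1 = discriminant(x, mean=20.06, variance=0.832, prob=0.5)  # class Y = 1 -> ~-130
print('Predicted Y:', 0 if d0 > d1 else 1)  # Y = 0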
CART – Classification And Regression Trees

Steps
Sample Dataset
X1           X2           Y
2.771244718  1.784783929  0
1.728571309  1.169761413  0
3.678319846  2.81281357   0
3.961043357  2.61995032   0
2.999208922  2.209014212  0
7.497545867  3.162953546  1
9.00220326   3.339047188  1
7.444542326  0.476683375  1
10.12493903  3.234550982  1
6.642287351  3.319983761  1
1. Find the best Split Point Candidate for a feature by iterating through the dataset
IF X1 < 2.7712 THEN LEFT
IF X1 >= 2.7712 THEN RIGHT
X1           Y  Group
2.771244718  0  RIGHT
1.728571309  0  LEFT
3.678319846  0  RIGHT
3.961043357  0  RIGHT
2.999208922  0  RIGHT
7.497545867  1  RIGHT
9.00220326   1  RIGHT
7.444542326  1  RIGHT
10.12493903  1  RIGHT
6.642287351  1  RIGHT
1.2 Calculate the proportion of each class on each side
LEFT (1 instance, class 0):
proportion(Y = 0) = 1 / 1 = 1.0
proportion(Y = 1) = 0 / 1 = 0.0
RIGHT (9 instances: 4 of class 0, 5 of class 1):
proportion(Y = 0) = 4 / 9 ≈ 0.444
proportion(Y = 1) = 5 / 9 ≈ 0.556
1.3 Calculate the Gini index for this candidate, weighting each group's impurity by its relative size:
Gini(group) = 1 − Σ proportion²
Gini(LEFT) = 1 − (1.0² + 0.0²) = 0.0
Gini(RIGHT) = 1 − ((4/9)² + (5/9)²) ≈ 0.494
Gini(split) = (1/10) × 0.0 + (9/10) × 0.494 ≈ 0.444
1.4 Continue iterating over the dataset until you find the lowest Gini. In this case the lowest Gini index is at the candidate X1 = 6.6422.
IF X1 < 6.6422 THEN LEFT
IF X1 >= 6.6422 THEN RIGHT
X1           Y  Group
2.771244718  0  LEFT
1.728571309  0  LEFT
3.678319846  0  LEFT
3.961043357  0  LEFT
2.999208922  0  LEFT
7.497545867  1  RIGHT
9.00220326   1  RIGHT
7.444542326  1  RIGHT
10.12493903  1  RIGHT
6.642287351  1  RIGHT
LEFT (5 instances, all class 0):
proportion(Y = 0) = 5 / 5 = 1.0, proportion(Y = 1) = 0 / 5 = 0.0, Gini(LEFT) = 0.0
RIGHT (5 instances, all class 1):
proportion(Y = 0) = 0 / 5 = 0.0, proportion(Y = 1) = 5 / 5 = 1.0, Gini(RIGHT) = 0.0
Gini(split) = (5/10) × 0.0 + (5/10) × 0.0 = 0.0
This split results in a pure Gini index (0.0), because the classes are perfectly separated: the LEFT child node will classify instances as class 0 and the RIGHT as class 1.
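As a minimal sketch (not from the original post) of the procedure above, assuming the size-weighted Gini index used in steps 1.2 and 1.3:

def gini_for_split(dataset, feature, value):
    # Split the rows into LEFT (< value) and RIGHT (>= value) groups
    left = [row for row in dataset if row[feature] < value]
    right = [row for row in dataset if row[feature] >= value]
    total = len(dataset)
    gini = 0.0
    for group in (left, right):
        if not group:
            continue
        # Proportion of each class inside the group (last column is Y)
        p0 = sum(1 for row in group if row[-1] == 0) / len(group)
        p1 = 1.0 - p0
        # Weight the group's impurity by its relative size
        gini += (len(group) / total) * (1.0 - (p0 ** 2 + p1 ** 2))
    return gini

dataset = [
    [2.771244718, 1.784783929, 0],
    [1.728571309, 1.169761413, 0],
    [3.678319846, 2.81281357, 0],
    [3.961043357, 2.61995032, 0],
    [2.999208922, 2.209014212, 0],
    [7.497545867, 3.162953546, 1],
    [9.00220326, 3.339047188, 1],
    [7.444542326, 0.476683375, 1],
    [10.12493903, 3.234550982, 1],
    [6.642287351, 3.319983761, 1],
]
print(gini_for_split(dataset, 0, 2.771244718))  # ~0.444
print(gini_for_split(dataset, 0, 6.642287351))  # 0.0 -> the pure split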
What do you think?