
Top Math and Machine Learning Algorithms Cheat Sheet – with Python Samples

Mean (all of the formulas below are equivalent)

    \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

    \[ \text{mean} = \frac{\text{sum of all values}}{\text{number of values}} \]

    \[ \bar x = \frac {x_1 + x_2 + x_3 + \dots + x_n}{n} \]

    \[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

In mathematical notation, \mu refers to the population mean, while \bar{x} refers to the sample mean.

Mode
It is the value that occurs most frequently in a series. Python's standard library ships an implementation in the statistics module.
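As a quick sketch (the sample list below is made up for illustration):

import statistics

valores = [1, 2, 3, 8, 4, 4, 3, 8, 4]
print(statistics.mode(valores))       # 4 - the most frequent value
print(statistics.multimode(valores))  # [4] - all values tied for most frequent (Python 3.8+)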

Variance – Population & Sample respectively.

    \[ \sigma^2 = \frac {\sum_{i=1}^{N} (x_i - \mu)^2} {N} \]

    \[ S^2 = \frac {\sum_{i=1}^{n} (x_i - \bar{x})^2} {n-1} \]

Note: S^2 is the unbiased sample variance, since we divide by n - 1 instead of n (Bessel's correction).

Standard Deviation – Population & Sample respectively.

    \[ \sigma = \sqrt {\frac {\sum_{i=1}^{N} (x_i - \mu)^2} {N}} \]

    \[ S = \sqrt {\frac {\sum_{i=1}^{n} (x_i - \bar{x})^2} {n-1}} \]

Note: taking the square root to obtain S reintroduces bias, so S slightly underestimates \sigma even though S^2 is unbiased for \sigma^2.
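The population-versus-sample distinction is also built into Python's statistics module: pvariance and pstdev divide by N, while variance and stdev divide by n - 1. The sample list below is made up for illustration:

import statistics

amostra = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(statistics.pvariance(amostra))  # population variance (divides by N)   -> 4.0
print(statistics.variance(amostra))   # sample variance (divides by n - 1)   -> ~4.571
print(statistics.pstdev(amostra))     # population standard deviation        -> 2.0
print(statistics.stdev(amostra))      # sample standard deviation            -> ~2.138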

The snippet below computes the mean, mode, median, and (population) variance and standard deviation by hand:

import math

valores = [1, 2, 3, 8, 4, 9.8, 6.5, 4, 3, 8, 5, 9, 3.3, 0, 4, 7, 9]
valores_clean_float = [float(x) for x in valores]

# Mean: sum of all values divided by the number of values
media = sum(valores_clean_float) / len(valores_clean_float)

# Mode: the most frequent value
moda = max(valores_clean_float, key=valores_clean_float.count)

# Median: middle value of the sorted list (average of the two middle values if the count is even)
valores_clean_float_sorted = sorted(valores_clean_float)
list_size = len(valores_clean_float_sorted)
if list_size % 2 == 0:
    mediana = (valores_clean_float_sorted[int(list_size / 2) - 1]
               + valores_clean_float_sorted[int(list_size / 2)]) / 2
else:
    mediana = valores_clean_float_sorted[math.floor(list_size / 2)]

# Population variance: mean of the squared distances from the mean
squared_distance_from_mean = [round((x - media) ** 2, 2) for x in valores_clean_float_sorted]
variance = sum(squared_distance_from_mean) / list_size

# Population standard deviation: square root of the variance
standard_deviation = math.sqrt(variance)

print('Mean:', media, '- Mode:', moda, '- Median:', mediana,
      '- Variance:', variance, '- Standard Deviation:', standard_deviation)

Simple Linear Regression

Model Representation / Inference Function
y = B0 + B1 * x, or equivalently y = mx + b
Loss / Error Function
MSE (Mean Squared Error)
Optimization Algorithm(s)
Stochastic Gradient Descent
Linear / Supervised / Regression
y is the value being predicted
B0 (or b) is called the intercept; it is obtained through the equation B0 = mean(y) - B1 * mean(x)
B1 (or m) is called the slope; it is obtained through \hat{\beta}_1 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2}, where X_i is the current value of x, \bar{X} is the mean of the x values, Y_i is the current value of y, and \bar{Y} is the mean of the y values.
A shortcut formula is B1 = corr(x, y) * \frac{stdev(y)}{stdev(x)}, where corr(x, y) is the correlation of x and y (aka Pearson's correlation coefficient), a measure of how related two variables are, in the range -1 to 1.
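A sketch of the closed-form equations above, reusing the tiny dataset from the SGD example that follows:

x = [1, 2, 4, 3, 5]
y = [1, 3, 3, 2, 5]

mean_x = sum(x) / len(x)  # 3.0
mean_y = sum(y) / len(y)  # 2.8

# B1 = sum((Xi - mean(X)) * (Yi - mean(Y))) / sum((Xi - mean(X))^2)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
     / sum((xi - mean_x) ** 2 for xi in x)

# B0 = mean(y) - B1 * mean(x)
b0 = mean_y - b1 * mean_x

print('B1 (slope):', b1, '- B0 (intercept):', b0)  # B1 = 0.8, B0 = 0.4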
The SGD version below updates m and b one training example at a time:

x = [1, 2, 4, 3, 5]
y = [1, 3, 3, 2, 5]
m = 0.0  # slope (B1)
b = 0.0  # intercept (B0)
grau_aprendizado = 0.01  # learning rate

for epoch in range(4):  # 4 passes (epochs) over the training data
    for i in range(len(x)):
        previsao = m * float(x[i]) + b  # prediction: y = mx + b
        erro = previsao - float(y[i])   # error = prediction(i) - y(i)
        m = m - grau_aprendizado * erro * float(x[i])  # update the slope
        b = b - grau_aprendizado * erro * 1.0          # update the intercept
        print("m {} b {}".format(m, b))

Logistic Regression

Model Representation / Inference Function
y = \frac{e^{B0 + B1 x}}{1 + e^{B0 + B1 x}}
Loss / Error Function
MSE
Optimization Algorithm(s)
Stochastic Gradient Descent
Linear / Supervised / Classification
y is the predicted output
B0 is the bias or intercept
B1 is the coefficient for the single value x
Each column in your input data has an associated B coefficient (a constant real value) that must be learned from your training data. The actual representation of the model that you would store in memory or in a file is the set of coefficients in the equation (the beta values, or B's).
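A minimal prediction sketch; the predict helper and the B0/B1 values below are assumed for illustration (in practice they would be learned from training data):

import math

def predict(x, b0, b1):
    # y = e^(B0 + B1*x) / (1 + e^(B0 + B1*x)), i.e. the sigmoid of B0 + B1*x
    return math.exp(b0 + b1 * x) / (1 + math.exp(b0 + b1 * x))

b0, b1 = -0.4, 0.85  # assumed coefficients, e.g. learned via SGD
probability = predict(2.0, b0, b1)
print(probability)                     # ~0.785 - probability of class 1
print(1 if probability >= 0.5 else 0)  # crisp class prediction: 1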

Linear Discriminant Analysis – LDA

discriminant(x) = x \cdot \frac{mean}{variance} - \frac{mean^2}{2 \cdot variance} + ln(probability)
Linear / Supervised / Classification
Steps

1. Calculate the mean for each class
2. Calculate the class probabilities (in this example the classes are 0 and 1)
P(y = 0) = \frac{count(y=0)}{count(y=0) + count(y=1)}
P(y = 1) = \frac{count(y=1)}{count(y=0) + count(y=1)}
3. Calculate the variance
For each instance, compute the squared difference from its class mean: SquaredDifference = (x_i - mean_k)^2. The variance is then the average of these squared differences across the whole dataset: variance = \frac{1}{count(x) - count(classes)} \sum SquaredDifference

4. Making predictions – Just plug the values found above into the representation model
for X = 4.667797637 and Y = 0
discriminant(Y = 0|x) = 4.667797637 * \frac{4.975415507}{0.832931506} - \frac{4.975415507^2} {2 * 0.832931506} + ln(0.5)
discriminant(Y = 0|x) = 12.3293558

for X = 4.667797637 and Y = 1
discriminant(Y = 1|x) = 4.667797637 * \frac{20.08706292}{0.832931506} - \frac{20.08706292^2} {2 * 0.832931506} + ln(0.5)
discriminant(Y = 1|x) = -130.3349038

We can see that the discriminant value for Y = 0 (12.3293558) is larger than the discriminant value for Y = 1 (-130.3349038), therefore the model predicts Y = 0, which we know is the correct class in the dataset.
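A sketch that reproduces the worked example, plugging the means, variance, and prior probabilities computed in the steps above into the discriminant function:

import math

def discriminant(x, mean, variance, probability):
    # discriminant(x) = x * mean/variance - mean^2 / (2*variance) + ln(probability)
    return x * mean / variance - mean ** 2 / (2 * variance) + math.log(probability)

x = 4.667797637
variance = 0.832931506
d0 = discriminant(x, 4.975415507, variance, 0.5)  # class Y = 0
d1 = discriminant(x, 20.08706292, variance, 0.5)  # class Y = 1

print(d0, d1)               # ~12.3294 and ~-130.3349
print(0 if d0 > d1 else 1)  # predict the class with the largest discriminant -> 0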

CART – Classification And Regression Trees

G = ( 1 - ( g_{1,1}^2 + g_{1,2}^2 ) ) \cdot \frac{n_{g1}}{n} + ( 1 - ( g_{2,1}^2 + g_{2,2}^2 ) ) \cdot \frac{n_{g2}}{n}
where g_{k,c} is the proportion of class-c instances in group k, n_{gk} is the size of group k, and n is the total number of instances.
Non-Linear / Supervised / Classification
Steps

Sample Dataset

X1             X2             Y
2.771244718    1.784783929    0
1.728571309    1.169761413    0
3.678319846    2.81281357     0
3.961043357    2.61995032     0
2.999208922    2.209014212    0
7.497545867    3.162953546    1
9.00220326     3.339047188    1
7.444542326    0.476683375    1
10.12493903    3.234550982    1
6.642287351    3.319983761    1

1. Find the best Split Point Candidate for a feature by iterating through the dataset

1.1 Apply the LEFT/RIGHT rule to a split point candidate (in this case the candidate value for feature X1 is 2.7712)
IF X1 < 2.7712 THEN LEFT
IF X1 >= 2.7712 THEN RIGHT
X1             Y    Group
2.771244718    0    RIGHT
1.728571309    0    LEFT
3.678319846    0    RIGHT
3.961043357    0    RIGHT
2.999208922    0    RIGHT
7.497545867    1    RIGHT
9.00220326     1    RIGHT
7.444542326    1    RIGHT
10.12493903    1    RIGHT
6.642287351    1    RIGHT

1.2 Calculate the proportions for each side related to each class
LEFT
Y = 0: \frac {1}{1} = 1.0
Y = 1: \frac {0}{1} = 0.0

RIGHT
Y = 0: \frac {4}{9} = 0.4444
Y = 1: \frac {5}{9} = 0.5555
1.3 Calculate the Gini for this candidate
Gini(X1 = 2.7712) = ( ( 1 - ( (\frac{1}{1})^2 + (\frac{0}{1})^2 ) ) * \frac{1}{10} ) + ( ( 1 - ( (\frac{4}{9})^2 + (\frac{5}{9})^2 ) ) * \frac{9}{10} )
Gini(X1 = 2.7712) = 0.4444
1.4 Continue iterating over the dataset until you find the lowest Gini. In this case the lowest Gini index occurs at X1 = 6.6422
IF X1 < 6.6422 THEN LEFT
IF X1 >= 6.6422 THEN RIGHT
X1             Y    Group
2.771244718    0    LEFT
1.728571309    0    LEFT
3.678319846    0    LEFT
3.961043357    0    LEFT
2.999208922    0    LEFT
7.497545867    1    RIGHT
9.00220326     1    RIGHT
7.444542326    1    RIGHT
10.12493903    1    RIGHT
6.642287351    1    RIGHT

LEFT
Y = 0: \frac {5}{5} = 1.0
Y = 1: \frac {0}{5} = 0.0

RIGHT
Y = 0: \frac {0}{5} = 0.0
Y = 1: \frac {5}{5} = 1.0
Gini(X1 = 6.6422) = ( ( 1 - ( (\frac{5}{5})^2 + (\frac{0}{5})^2 ) ) * \frac{5}{10} ) + ( ( 1 - ( (\frac{0}{5})^2 + (\frac{5}{5})^2 ) ) * \frac{5}{10} )
Gini(X1 = 6.6422) = 0.0
This is a split that results in a pure Gini index, because the classes are perfectly separated. The LEFT child node will classify instances as class 0 and the RIGHT as class 1.

2. Making predictions – If a given value falls to the LEFT, it is classified as class 0; if it falls to the RIGHT, as class 1.
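A minimal sketch of steps 1.1–1.4, assuming a gini_for_split helper written just for this example: it iterates over every X1 value as a candidate, applies the LEFT/RIGHT rule, and keeps the candidate with the lowest Gini index:

dataset = [
    (2.771244718, 0), (1.728571309, 0), (3.678319846, 0),
    (3.961043357, 0), (2.999208922, 0), (7.497545867, 1),
    (9.00220326, 1),  (7.444542326, 1), (10.12493903, 1),
    (6.642287351, 1),
]

def gini_for_split(dataset, split):
    n = len(dataset)
    left = [y for x, y in dataset if x < split]    # IF X1 < split THEN LEFT
    right = [y for x, y in dataset if x >= split]  # IF X1 >= split THEN RIGHT
    gini = 0.0
    for group in (left, right):
        if not group:
            continue
        # 1 - (squared proportion of class 0 + squared proportion of class 1),
        # weighted by the group's share of the dataset
        p0 = group.count(0) / len(group)
        p1 = group.count(1) / len(group)
        gini += (1 - (p0 ** 2 + p1 ** 2)) * len(group) / n
    return gini

best_split = min((x for x, _ in dataset), key=lambda s: gini_for_split(dataset, s))
print(best_split, gini_for_split(dataset, best_split))  # 6.642287351, Gini = 0.0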
