- dataset
- target (y): what we want to predict
- features (x1, x2, x3, etc.): the "inputs" that help determine y
- model: predicts y depending on the xs and some parameters
- cost function: measures the error between y and the predictions
- minimization algorithm: minimizes that error
Also note that m is the number of rows in the dataset and n is the number of features.
- dataset:
| y (price) | x1 (size) | x2 (place) | x3 (quality) |
|---|---|---|---|
| 300k | 150 m² | Paris | 4 |
| 200k | 100 m² | Lyon | 2 |
| 250k | 125 m² | Bordeaux | 3 |
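A minimal sketch of how this dataset could be held in plain Python lists (the variable names are illustrative; a categorical feature like the city would need to be encoded as numbers before training, which is skipped here):

```python
# Toy representation of the dataset above.
y = [300_000, 200_000, 250_000]     # target: price
x1 = [150, 100, 125]                # size in m²
x2 = ["Paris", "Lyon", "Bordeaux"]  # place (categorical: needs numeric encoding)
x3 = [4, 2, 3]                      # quality

m = len(y)  # m = 3 rows in the dataset
n = 3       # n = 3 features (x1, x2, x3)
```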
- model: f(x) = ax + b (an affine function)
We don't know a and b; it is the machine's role to determine them. At the beginning, we plug in random values and plot the resulting line on a graph (see the sketch below).
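A minimal sketch of that idea, keeping a single feature x (the size) and using Python's random module for the initial guess:

```python
import random

# Random starting values: the machine's job is to improve these.
a = random.uniform(-1, 1)
b = random.uniform(-1, 1)

def f(x):
    """Affine model: predicts y from a single feature x."""
    return a * x + b

# With random a and b, the prediction is wrong at first;
# plotting f over the dataset would show a badly placed line.
print(f(150))  # prediction for a 150 m² house
```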
- cost function (conventionally called J)
For that, let's use the Euclidean distance between two points on the graph: a point of the dataset and the point on the line with the same x. The squared distance between the two points is (f(x_i) - y_i)^2.
We can now define J and make the parameters a and b vary:
J(a, b) = (1/2m) * sum from i=1 to m of (f(x_i) - y_i)^2
where i is the index of the row (example) in the dataset. This function is called the Mean Squared Error.
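In code, J follows directly from that formula. A sketch assuming x and y are two lists of length m (as in the dataset snippet above):

```python
def cost(a, b, x, y):
    """Mean Squared Error: J(a, b) = (1/2m) * sum((f(x_i) - y_i)^2)."""
    m = len(y)
    return sum((a * x[i] + b - y[i]) ** 2 for i in range(m)) / (2 * m)
```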
- minimization algorithm
One of these is Gradient Descent: it finds the minimum of any convex function (like the squared error, which has no multiple local minima).
Alpha is the learning rate: it acts like a step size used to move toward the minimum in gradient descent (see the sketch below).
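A sketch of gradient descent for this affine model. The gradients below are the partial derivatives of J with respect to a and b; alpha and n_iterations are hypothetical hyperparameters to be tuned:

```python
def gradient_descent(x, y, alpha=0.01, n_iterations=1000):
    """Repeatedly steps a and b in the direction that lowers J."""
    m = len(y)
    a, b = 0.0, 0.0  # could also start from random values
    for _ in range(n_iterations):
        errors = [a * x[i] + b - y[i] for i in range(m)]
        grad_a = sum(errors[i] * x[i] for i in range(m)) / m  # dJ/da
        grad_b = sum(errors) / m                              # dJ/db
        a -= alpha * grad_a  # step of size alpha against the gradient
        b -= alpha * grad_b
    return a, b
```

In practice, alpha has to be small enough for the steps to converge (and features are often scaled first); too large and the updates overshoot the minimum.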
A matrix of dimension m x n has m rows and n columns.
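For example, with NumPy (assuming it is installed), the feature matrix of the dataset above has shape (m, n) = (3, 3); the city encoding used here is purely hypothetical:

```python
import numpy as np

# One row per example, one column per feature.
X = np.array([
    [150, 0, 4],  # Paris encoded as 0
    [100, 1, 2],  # Lyon encoded as 1
    [125, 2, 3],  # Bordeaux encoded as 2
])
print(X.shape)  # (3, 3): m = 3 rows, n = 3 columns
```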