TensorFlow: Multiple Linear Regression model from scratch with calculations explained

Jyoti Yadav
5 min read · Apr 11, 2021


For a beginner, given the pool of resources available on the internet, it becomes very difficult to understand even the basics of TensorFlow. This often ends in demotivation. Once something is dropped, it is hard to pick it up again for fear of failing once more, and each defeat makes the next attempt exponentially tougher :( . So, in order to break this vicious cycle, I am writing this very easy and understandable walkthrough for a first-time learner :) .

Source: https://xkcd.com/1831/

In this section, we will be converting the linear regression equations into TensorFlow code. I hope all of us are familiar with the basics of matrix multiplication :P . ARE WE READYYYY?????

(In case you are new to matrix operations, please refer to this link for a full explanation: https://online.stat.psu.edu/stat462/node/132/)
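(And if all you need is a quick refresher, here is a toy matrix multiplication in NumPy; the numbers are just an example of mine:)

import numpy as np

A = np.array([[1, 2],
              [3, 4]])   # a 2x2 matrix
b = np.array([[5],
              [6]])      # a 2x1 column vector
print(A @ b)             # 2x1 result: [[17], [39]] (row-by-column products, summed)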

Let’s start!

Source: https://xkcd.com/605/

The matrix representation for linear regression looks like:

Y = Xβ + ε

where Y is the n×1 vector of responses, X is the n×(p+1) design matrix whose first column is all 1s (the bias), β is the (p+1)×1 vector of coefficients, and ε is the n×1 error term.

Equation 1: https://online.stat.psu.edu/stat462/node/132/

A bit of matrix algebra on the above spits out the following formula for the least squares estimates:

b = (XᵀX)⁻¹XᵀY

Equation 2: https://online.stat.psu.edu/stat462/node/132/

Our task is to convert these equations into code using the TensorFlow library. Sounds easy?

There are two sections to this article:

  1. Multiple Linear Regression
  2. Multiple Linear Regression with manual computation of gradients

1. Multiple Linear Regression

  1. Import useful libraries

import numpy as np
import tensorflow as tf
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

2. Download data

fetched_data = fetch_california_housing()
m, n = fetched_data.data.shape
data_with_bias = np.c_[np.ones((m,1)), fetched_data.data]

Data snapshot:

The first line in the code pulls up the California housing dataset built into the scikit-learn library. The data is divided into two parts: features and target. To access the features use “fetched_data.data” and for the target use “fetched_data.target”. To pull the column names, use “fetched_data.feature_names”. The last line of the code adds a bias term (a column containing 1s) to the feature set (as depicted in matrix equation 1).
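If you want to peek at what was fetched, a quick sanity check (a snippet I am adding here; the shapes below are those of the standard scikit-learn copy of the dataset) looks like:

print(fetched_data.feature_names)   # ['MedInc', 'HouseAge', 'AveRooms', ...]
print(m, n)                         # 20640 rows, 8 features
print(data_with_bias[0])            # first row, with the leading 1 for the bias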

3. Create Tensors for feature and target variable

X = tf.constant(data_with_bias, dtype = tf.float32, name = "features")
y = tf.constant(fetched_data.target.reshape(-1,1), dtype = tf.float32, name = "target")

X and y are the features and targets respectively. Since they are going to remain the same for the entire analysis (non-stochastic), they have been declared as constants.

4. Calculate b (or theta) using the formula from equation 2

XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.linalg.inv(tf.matmul(XT, X)), XT), y)
print(theta)

XT is the transpose of matrix X, and tf.matmul performs matrix multiplication. tf.linalg.inv inverts the matrix passed to it. Try matching the line with matrix equation 2.
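As a sanity check (my addition, not part of the original recipe), you can compare the result against NumPy’s least squares solver, which solves the same problem without forming an explicit inverse:

theta_np, *_ = np.linalg.lstsq(data_with_bias, fetched_data.target.reshape(-1, 1), rcond=None)
print(theta.numpy().ravel())
print(theta_np.ravel())   # the two rows should be close; small gaps come from float32 rounding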

2. Multiple Linear Regression with manual computation of gradients

This section will help you understand how the theta calculated above can instead be learned by optimizing a loss function: in each step, theta is updated by a fraction of the loss gradient. This is the “Gradient Descent” approach. In short, the coefficients (theta) are updated in the direction of the negative slope.

The formula for the coefficient (theta) update is:

θ(next step) = θ − η ∇θ MSE(θ), where ∇θ MSE(θ) = (2/m) Xᵀ(Xθ − y)

Equation 3

Here η is the learning rate and m is the number of rows.

Let’s standardize the variables, as it will reduce the computation time for every epoch (though there are other mathematical reasons behind standardization, let’s consider this one here).

scaler = StandardScaler()
# Scale the raw features only, then re-attach the bias column of 1s;
# scaling the bias column would zero it out (it has zero variance).
scaled_data = np.c_[np.ones((m,1)), scaler.fit_transform(fetched_data.data)]
  1. Decide on the number of epochs and learning rate:

n_epoch = 1000
learning_rate = 0.0000001

n_epoch refers to the number of iterations; in case the saturation point in the loss is not reached, try increasing it. The learning rate is the fraction by which theta will update. A high learning rate will create the “exploding gradient” issue, whereas a very small value will lead to a long training time. CHOOSE IT WISELY, AS IF YOUR LIFE DEPENDS ON IT!

2. Create Tensors for Scaled feature

X = tf.constant(scaled_data, dtype = tf.float32, name = "scaled_features")  # op names cannot contain spaces

3. Initialise the theta using uniform distribution

theta = tf.Variable(tf.random.uniform([n+1, 1], -1.0, 1.0), name = "theta")

The uniform distribution has a range of [-1, 1] so that both negative and positive starting points are considered. Theta is a Variable here because its value is going to update over time. “n” is the number of features here (calculated earlier), and there will be n+1 coefficients (features + bias term). Therefore theta has n+1 rows and 1 column.

4. Start the loop

for epoch in range(n_epoch):
    y_pred = tf.matmul(X, theta, name = "predictions")
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name = "mse")
    gradients = 2/m * tf.matmul(tf.transpose(X), error)
    theta.assign(theta - learning_rate * gradients)   # keeps theta a Variable while updating it
    if epoch % 100 == 0:
        print("Epoch:", epoch, "MSE:", mse.numpy())

The process inside the loop is repeated n_epoch times. The first line inside the loop calculates the prediction, which is simply features × coefficients. The second line calculates the deviation between the predicted value and the actual value (the target variable). The third line uses the error to calculate the loss; the higher the deviation, the higher the loss, and thereby the adjustment to theta. The fourth line calculates the gradient. The fifth line finally uses the gradient to update theta in every epoch (try matching it with equation 3).
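For comparison (a sketch of mine, not part of the original recipe), TensorFlow can derive the same gradients automatically with tf.GradientTape, so the 2/m * Xᵀ(error) line never has to be hand-coded:

for epoch in range(n_epoch):
    with tf.GradientTape() as tape:
        y_pred = tf.matmul(X, theta)
        mse = tf.reduce_mean(tf.square(y_pred - y))
    grads = tape.gradient(mse, theta)        # same value as the manual gradient above
    theta.assign_sub(learning_rate * grads)  # theta <- theta - learning_rate * grads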

Whenever the epoch number is a multiple of 100, the code will print out the epoch number and the corresponding loss.

As you can see, the loss decreases with every epoch. SUCCESS!!!!
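Once the loop finishes, the learned theta can be used for predictions (a small wrap-up snippet I am adding; “final_mse” is my own name for it):

y_pred = tf.matmul(X, theta)
final_mse = tf.reduce_mean(tf.square(y_pred - y))
print("Final MSE:", final_mse.numpy())
print("First 5 predictions:", y_pred[:5].numpy().ravel())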

I will be adding more posts on TensorFlow and Neural Networks. STAY TUNED!!!!

Ending with a funny comic strip! Thanks!
