
Logistic regression implementation

PyTorch

The PyTorch implementation of logistic regression is extremely similar to that of linear regression. The two main differences are that the output of the linear layer is passed through a non-linearity and that the loss function used is the cross-entropy loss.

import torch

Once again, the steps for training a model in PyTorch, as outlined in the PyTorch documentation, are:

  1. Load Dataset
  2. Make Dataset Iterable
  3. Create Model Class
  4. Instantiate Model Class
  5. Instantiate Loss Class
  6. Instantiate Optimizer Class
  7. Train Model

In this case we are using the famous MNIST dataset, which consists of a series of images of handwritten digits. The logistic regression model will need to correctly identify the digit represented in each image.

The dsets module gives access to a number of datasets, MNIST being one of them. By setting the download parameter to True, the whole dataset is downloaded locally.

import torchvision.datasets as dsets

train_dataset = dsets.MNIST(root='./data', train=True, download=True)
test_dataset = dsets.MNIST(root='./data', train=False)

The dataset is made of a series of tuples containing two objects: a PIL image and the number it represents.

train_dataset[0]
(<PIL.Image.Image image mode=L size=28x28 at 0x7F01E73260D0>, 5)

As you can see from the PIL object repr the images are $28 \times 28$ pixels in size.

A sample of 100 images from the dataset shows that they are low-resolution grayscale handwritten digits written in different styles.

[figure: a grid of 100 sample MNIST digits]

Images in a computer are just matrices of real values representing the intensity of each pixel. This is also the format understood by machine learning algorithms. In particular, for PyTorch to be able to process the dataset, it needs to be transformed into tensors. To do that we use the transforms module and its ToTensor() function.

Doing that after loading the dataset would be more cumbersome, so in practice this step is usually performed at load time, using the transform argument.

import torchvision.transforms as transforms

train_dataset = dsets.MNIST(
    root='./data', 
    train=True, 
    download=False, 
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ]))

test_dataset = dsets.MNIST(
    root='./data', 
    train=False,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ]))

In the transform step we also specified that the tensor needs to be normalized, which helps avoid exploding or vanishing gradients. Once again, the normalization strategy adopted is standardization:

\[z = \frac{x-\mu}{\sigma}\]

We need to pass the mean $\mu$ and standard deviation $\sigma$ to the Normalize function as tuples: the first tuple contains the mean of each channel in the image and the second contains the standard deviation of each channel. In this case we only have one channel.
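As a quick sanity check, standardization can be reproduced by hand with numpy (a sketch using the MNIST statistics passed to Normalize; the sample values are made up):

```python
import numpy as np

# Hypothetical pixel intensities as produced by ToTensor(), i.e. scaled to [0, 1]
x = np.array([0.0, 0.2, 0.5, 1.0])

# Standardize with the single-channel MNIST statistics used above
mu, sigma = 0.1307, 0.3081
z = (x - mu) / sigma

print(z[0])  # background pixels (0.0) map to about -0.4242
```

This is why the background pixels in the normalized tensors all read $-0.4242$.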

Now the images in the dataset have been converted to tensors and their values normalized. Since our inputs are square images of $28 \times 28$ pixels with a single channel, each tensor is $x_i \in \mathbb{R}^{1 \times 28 \times 28}$ or, flattened, $x_i \in \mathbb{R}^{784}$.

train_dataset[0][0].shape
torch.Size([1, 28, 28])

And these are the values of the 11th row of pixels in the first image of the dataset

train_dataset[0][0][0][10]
tensor([-0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242,
        -0.4242, -0.2460, -0.4115,  1.5359,  2.7960,  0.7213, -0.4242, -0.4242,
        -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242, -0.4242,
        -0.4242, -0.4242, -0.4242, -0.4242])

We can now define some hyperparameters that will come in handy later:

import numpy as np

batch_size = 100
epochs = 3000 / (len(train_dataset) / batch_size)
input_dim = np.prod(train_dataset[0][0].shape)
output_dim = 10
alpha = 0.001
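The epochs expression above just fixes a total budget of 3000 training iterations; with the MNIST training-set size of 60000 and a batch size of 100, the arithmetic works out as:

```python
# 60000 training images split into batches of 100
iterations_per_epoch = 60000 // 100      # 600 batches per epoch

# a budget of 3000 iterations therefore corresponds to
epochs = 3000 / iterations_per_epoch     # 5 epochs

print(epochs)  # 5.0
```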

The train and test datasets are wrapped in a DataLoader, which provides functions to iterate over the dataset, split it into batches and shuffle it.

train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=batch_size,
    shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    dataset=test_dataset,
    batch_size=batch_size,
    shuffle=False
)
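Conceptually, what the DataLoader does for us can be sketched in pure Python (a simplified illustration, not the actual implementation):

```python
import random

def iterate_batches(dataset, batch_size, shuffle=True, seed=None):
    """Yield lists of batch_size samples, optionally in shuffled order."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

# toy dataset of 1000 samples in batches of 100 -> 10 batches
batches = list(iterate_batches(range(1000), batch_size=100, seed=0))
print(len(batches), len(batches[0]))  # 10 100
```

The real DataLoader additionally collates each batch into a single tensor and can prefetch batches in worker processes.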

For logistic regression we just need a linear model whose output is passed through a non-linearity, the softmax, which turns the raw scores into class probabilities. In PyTorch the softmax is applied internally by CrossEntropyLoss, so, as for linear regression, our model is just made of a Linear object instantiated with the input and output dimensions, which in this case are $28 \times 28 = 784$ for the input and $10$ for the output (one per digit).

# model = torch.nn.Linear(input_dim, output_dim)
class LogisticRegression(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LogisticRegression, self).__init__()
        self.linear = torch.nn.Linear(input_dim, output_dim)

    def forward(self, x):
        outputs = self.linear(x)
        return outputs

model = LogisticRegression(input_dim, output_dim)
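The model only outputs raw scores (logits); the softmax that converts them into probabilities is applied inside CrossEntropyLoss. For illustration only, a pure-Python softmax over the 10 logits could look like this sketch:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(sum(probs))  # 1.0 (up to floating-point error)
```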

The loss function $\mathcal{L}$ adopted in logistic regression is the cross-entropy loss. For binary classification it reads

\[\mathcal{L} = - \left(y \log(p)+(1-y)\log(1-p) \right)\]

and for $K$ classes it generalizes to $\mathcal{L} = -\sum_{k=1}^{K} y_k \log(p_k)$, which is what CrossEntropyLoss computes (together with the softmax) in a single, numerically stable step.
criterion = torch.nn.CrossEntropyLoss()
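To build intuition for the formula, the binary form of the loss can be evaluated by hand (a pure-Python sketch; PyTorch's CrossEntropyLoss handles the multi-class case for us):

```python
import math

def binary_cross_entropy(y, p):
    """Cross-entropy for a true label y in {0, 1} and predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# confident, correct prediction -> small loss
print(binary_cross_entropy(1, 0.9))   # ~0.105

# confident, wrong prediction -> large loss
print(binary_cross_entropy(1, 0.1))   # ~2.303
```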

The optimizer we are using is Stochastic Gradient Descent with a preset learning rate $\alpha$

optimizer = torch.optim.SGD(model.parameters(), lr=alpha)

The training loop is a bit more complex than in the linear regression example. Each batch of images is flattened from $100 \times 1 \times 28 \times 28$ to $100 \times 784$ before being fed to the model.

losses = []
iteration = 0
for epoch in range(int(epochs)):
    for i, (images, labels) in enumerate(train_loader):
        # flatten each image into a 784-dimensional vector
        images = images.view(-1, 28 * 28)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        losses.append((iteration, loss.item()))

        iteration += 1
        if iteration % 500 == 0:
            # evaluate accuracy on the test set
            correct = 0
            total = 0
            with torch.no_grad():
                for images, labels in test_loader:
                    images = images.view(-1, 28 * 28)
                    outputs = model(images)
                    _, predicted = torch.max(outputs, 1)
                    total += labels.size(0)
                    correct += (predicted == labels).sum().item()
            accuracy = 100 * correct / total

            print("Iteration: {}. Loss: {}. Accuracy: {}.".format(
                iteration, loss.item(), accuracy)
                 )


Iteration: 500. Loss: 0.6954136490821838. Accuracy: 83.30000305175781.
Iteration: 1000. Loss: 0.6825755834579468. Accuracy: 86.30000305175781.
Iteration: 1500. Loss: 0.5101509094238281. Accuracy: 87.45999908447266.
Iteration: 2000. Loss: 0.4781498610973358. Accuracy: 88.12000274658203.
Iteration: 2500. Loss: 0.4298083782196045. Accuracy: 88.68000030517578.
Iteration: 3000. Loss: 0.40631353855133057. Accuracy: 89.01000213623047.
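The accuracy reported above is just the percentage of argmax predictions that match the true labels; as a plain-Python sketch of that computation:

```python
def accuracy(predicted, labels):
    """Percentage of predictions equal to the true labels."""
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return 100 * correct / len(labels)

print(accuracy([7, 2, 1, 0], [7, 2, 3, 0]))  # 75.0
```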
import matplotlib.pyplot as plt

plt.plot(*zip(*losses))
[<matplotlib.lines.Line2D at 0x7f01e7281a30>]

[figure: training loss per iteration]
