Once again we are using the MNIST dataset, as in ML7.5.

This time around, instead of loading the samples as PIL images and then manually converting them to tensors, we exploit the transform argument of the dataset objects [Datasets].

We pass the transforms.ToTensor() transform, which converts a PIL Image or numpy.ndarray of shape (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] [Transforms].
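The snippets below assume the usual imports; the aliases match how the modules are referenced in the rest of this post:

import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt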

train_dataset = dsets.MNIST(
    root="./data", train=True, transform=transforms.ToTensor(), download=True
)

test_dataset = dsets.MNIST(root="./data", train=False, transform=transforms.ToTensor())
train_dataset
Dataset MNIST
    Number of datapoints: 60000
    Root location: ./data
    Split: Train
    StandardTransform
Transform: ToTensor()
len(train_dataset)
60000
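As a quick sanity check (an illustrative sketch, not part of the original workflow), we can inspect a single sample to verify the shape and value range produced by ToTensor():

sample_image, sample_label = train_dataset[0]
print(sample_image.shape, sample_image.dtype)  # torch.Size([1, 28, 28]) torch.float32
print(sample_image.min().item(), sample_image.max().item())  # values lie in [0.0, 1.0]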

The MNIST training set consists of 60000 images, but we don't want to train the algorithm on the whole set as a single batch. Instead, as discussed in ML29, we are going to train the algorithm on mini-batches of the original training set.

Given that we set the batch size to $m_b=128$ images and the number of epochs to $5$, each epoch will consist of $60000/m_b \approx 468$ mini-batches. This means that over the 5 epochs the algorithm will perform $5 \cdot 60000/m_b \approx 2343$ training iterations.

batch_size = 128
epochs = 5
iterations = epochs * len(train_dataset) // batch_size
iterations
2343

Let's make the datasets iterable with the DataLoader.

train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset, batch_size=batch_size, shuffle=True
)

test_loader = torch.utils.data.DataLoader(
    dataset=test_dataset, batch_size=batch_size, shuffle=False
)
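To see what the DataLoader yields, we can pull a single mini-batch and look at its shapes (a small illustrative check, not needed for training):

example_images, example_labels = next(iter(train_loader))
print(example_images.shape)  # torch.Size([128, 1, 28, 28]): 128 images of 1 x 28 x 28 pixels
print(example_labels.shape)  # torch.Size([128]): one integer class label per image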

Compared to the logistic regression architecture, which consists of a single linear layer followed by a cross-entropy loss function (which applies the softmax non-linearity and computes the cross-entropy), a feedforward neural network (FNN) adds one or more hidden linear layers, each followed by a non-linear activation.

[Figure: feedforward neural network architecture]
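In formulas, the network defined below computes

$$x^{(i)[1]} = \sigma\left(W^{[1]} x^{(i)} + b^{[1]}\right), \qquad \hat{y}^{(i)} = W^{[2]} x^{(i)[1]} + b^{[2]}$$

where $\sigma$ is the sigmoid non-linearity; the softmax that turns the output logits $\hat{y}^{(i)}$ into class probabilities is applied inside the loss function, as noted above.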

Our model architecture will be:

class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim)

        # Non-linearity
        self.sigmoid = nn.Sigmoid()

        # Linear function (readout)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Linear function
        out = self.fc1(x)

        # Non-linearity
        out = self.sigmoid(out)

        # Linear function (readout)
        out = self.fc2(out)
        return out

The MNIST dataset is composed of images of $28 \times 28$ pixels. Each of these images is encoded as a vector $x^{(i)} \in \mathbb{R}^{28\cdot28}$, which is mapped onto a hidden layer and then onto the output layer. The output layer must encode 10 output classes and is therefore a vector $\hat{y}^{(i)} \in \mathbb{R}^{10}$. The hidden layer can have any dimension we want; in this case we use 100 hidden units, so that the input vector is mapped to a vector $x^{(i)[1]} \in \mathbb{R}^{100}$.

input_dim = 28 * 28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)
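As an optional check, we can list the parameter shapes and count the total number of learnable parameters of this model:

for name, param in model.named_parameters():
    # fc1.weight: (100, 784), fc1.bias: (100,), fc2.weight: (10, 100), fc2.bias: (10,)
    print(name, tuple(param.shape))

print(sum(p.numel() for p in model.parameters()))  # 78500 + 1010 = 79510 parameters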

The loss function $\mathcal{L}$ of our choice is the cross-entropy loss, which, as mentioned above, takes the raw outputs (logits) of the network and applies the softmax internally.

criterion = nn.CrossEntropyLoss()
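nn.CrossEntropyLoss applies the log-softmax internally and then computes the negative log-likelihood of the true class. A tiny sketch with made-up logits illustrates the equivalence:

logits = torch.tensor([[2.0, 0.5, -1.0]])  # hypothetical raw outputs for 3 classes
target = torch.tensor([0])                 # index of the true class

manual = -torch.log_softmax(logits, dim=1)[0, target[0]]
print(manual.item())                                   # cross-entropy computed by hand
print(nn.CrossEntropyLoss()(logits, target).item())    # same value from the criterion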

The learning rate controls how fast the algorithm learns: setting it too high would result in gradient descent being unable to settle on the minimum of the loss, while setting it too small would result in a very slow learning process.

learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
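Plain SGD (without momentum) updates every parameter $\theta$ of the model at each call of optimizer.step() according to the gradient descent rule

$$\theta \leftarrow \theta - \alpha \, \nabla_{\theta} \mathcal{L}$$

where $\alpha$ is the learning rate set above.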

We are ready to start the training process.

For each epoch we exploit the DataLoader to iterate over the mini-batches. Then, for each mini-batch:

  1. we clear the gradients accumulated in the previous iteration with optimizer.zero_grad()
  2. we compute the model output model(images) for the images in the mini-batch
  3. we compute the distance $\mathcal{L}$ between predictions and labels with criterion(outputs, labels)
  4. we compute the gradients of $\mathcal{L}$ with respect to the parameters with loss.backward()
  5. we update the parameters in the direction indicated by the gradients with optimizer.step()
iter = 0
losses = []
accuracies = []
for epoch in range(epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Flatten each image into a vector of length 28*28
        images = images.view(-1, 28 * 28)

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model(images)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iter += 1
        losses.append(loss.item())

        if iter % 500 == 0:
            # Calculate accuracy on the test set
            correct = 0
            total = 0
            # Iterate through the test dataset without tracking gradients
            with torch.no_grad():
                for images, labels in test_loader:
                    # Flatten each image into a vector of length 28*28
                    images = images.view(-1, 28 * 28)

                    # Forward pass only to get logits/output
                    outputs = model(images)

                    # Get predictions from the maximum value
                    _, predicted = torch.max(outputs, 1)

                    # Total number of labels
                    total += labels.size(0)

                    # Total correct predictions
                    correct += (predicted == labels).sum()

            # Accuracy over the whole test set for this evaluation
            accuracy = (correct / total).item()
            accuracies.append(accuracy)

            # Print Loss
            print(
                "Iteration: {}. Loss: {}. Accuracy: {}".format(
                    iter, loss.item(), accuracy
                )
            )
Iteration: 500. Loss: 0.6400718688964844. Accuracy: 0.8641999959945679
Iteration: 1000. Loss: 0.43419456481933594. Accuracy: 0.8948000073432922
Iteration: 1500. Loss: 0.4028102457523346. Accuracy: 0.9046000242233276
Iteration: 2000. Loss: 0.20865440368652344. Accuracy: 0.9110000133514404

Since we stored the value of $\mathcal{L}$ at each iteration in the losses list, we can now plot its progress:

plt.plot(losses);

[Figure: training loss per iteration]
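Since we also appended the test accuracy to accuracies at each evaluation step, we can plot its progress in the same way:

plt.plot(accuracies);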
