Image Classification with COIL-100 Dataset in PyTorch
In this post, we will use PyTorch to train three different models that classify images from the COIL-100 dataset, and then compare their performance.
COIL-100 Dataset
Columbia University Image Library (COIL-100) is a dataset of color images of 100 objects. The objects were placed on a motorized turntable against a black background. The turntable was rotated through 360 degrees to vary object pose with respect to a fixed color camera. Images of the objects were taken at pose intervals of 5 degrees. This corresponds to 72 poses per object (7200 images in total). The images were size normalized.
Preparing the Data
First, we import the libraries that we are going to use for this project.
In [1]:
import torch
from torch.utils.data import Dataset, DataLoader, random_split
from PIL import Image
import pandas as pd
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F
from torchvision.utils import make_grid
Here, we create two variables: one for the path of the images folder and one for the path of a .csv file that I created, which lists the name of each image file and its label among the 100 existing classes.
In [2]:
path = '../input/coil100/coil-100/coil-100/'
csv_path = '../input/namescsv/names.csv'
In [3]:
!head "{csv_path}"
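The contents of names.csv are not shown here, so the following is only a sketch of how such a file could be generated. It assumes that the image files follow a pattern like `obj20__0.png` (object number, then rotation angle) and that the label is simply the object number; both are assumptions, and the helper names `label_from_filename` and `write_names_csv` are hypothetical. Only the column names `image` and `label` come from the dataset class defined below.

```python
import csv
import io
import re

# Assumed file-name pattern: obj<object number>__<angle>.png
FILENAME_RE = re.compile(r"^obj(\d+)__(\d+)\.png$")


def label_from_filename(fname):
    """Return the object number encoded in a COIL-100 style file name."""
    match = FILENAME_RE.match(fname)
    if match is None:
        raise ValueError(f"unexpected file name: {fname}")
    return int(match.group(1))


def write_names_csv(filenames, out):
    """Write one (image, label) row per file name to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["image", "label"])
    for fname in sorted(filenames):
        writer.writerow([fname, label_from_filename(fname)])


# Demo on two made-up file names, written to memory instead of disk
buffer = io.StringIO()
write_names_csv(["obj20__0.png", "obj3__355.png"], buffer)
print(buffer.getvalue())
```

In a real run, the list of file names would come from listing the images folder, and `out` would be an open file rather than an in-memory buffer.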
Now, we create a custom dataset by extending the Dataset class from PyTorch. This will allow us to associate the right label to each image by using the data from our .csv file.
In [4]:
class CoilDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.df = pd.read_csv(csv_file)
        self.transform = transform
        self.root_dir = root_dir

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.loc[idx]
        img_id, img_label = row['image'], row['label']
        img_fname = self.root_dir + "/" + str(img_id)
        img = Image.open(img_fname)
        if self.transform:
            img = self.transform(img)
        return img, img_label
In [5]:
# PyTorch dataset
dataset = CoilDataset(csv_path, path, transform=ToTensor())
Each image is converted into a 3-dimensional tensor of shape (3, 128, 128). The first dimension is the number of channels. As we are using color images, there are 3 channels (red, green, blue). The second and third dimensions are the height and width of the image, in this case 128px by 128px.
In [6]:
img, label = dataset[0]
print(img.shape, label)
img
torch.Size([3, 128, 128]) 20
Out[6]:
tensor([[[0.2078, 0.2078, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         ...,
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980]],

        [[0.2000, 0.2000, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         ...,
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980]],

        [[0.1255, 0.0392, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         ...,
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980],
         [0.0980, 0.0980, 0.0980,  ..., 0.0980, 0.0980, 0.0980]]])
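To make the tensor values above less mysterious, here is a small sketch of the conversion that ToTensor performs on an 8-bit RGB image: the layout changes from (height, width, channels) to (channels, height, width), and pixel values are scaled from 0..255 to 0.0..1.0. The image below is synthetic (every pixel set to 25), chosen because 25/255 is roughly 0.0980, the value that dominates the output above; presumably that is the dark background.

```python
import torch

# A synthetic 128x128 RGB image in (height, width, channel) layout,
# with every pixel value set to 25 (an illustrative choice)
hwc = torch.full((128, 128, 3), 25, dtype=torch.uint8)

# Move channels to the front, then scale 0..255 to 0.0..1.0
chw = hwc.permute(2, 0, 1).float() / 255

print(chw.shape)
print(round(chw[0, 0, 0].item(), 4))
```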
Now, we will define the hyperparameters for our models.
In [7]:
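# Hyperparameters
batch_size = 128
learning_rate = 0.001

# Other constants
in_channels = 3
input_size = in_channels * 128 * 128
# 101 outputs rather than 100: presumably the labels in names.csv run
# from 1 to 100, so index 0 is left unused (an assumption, the file is not shown)
num_classes = 101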
Training and Validation Datasets
We are going to split the dataset into 3 parts:
- Training set – used to train the model (compute the loss and adjust the weights of the model using gradient descent).
- Validation set – used to evaluate the model while training, adjust hyperparameters (learning rate etc.) and pick the best version of the model.
- Test set – used to compare different models, or different types of modeling approaches, and report the final accuracy of the model.
In [8]:
random_seed = 11
torch.manual_seed(random_seed);
In [9]:
val_size = 720
test_size = 720
train_size = len(dataset) - val_size - test_size

train_ds, val_ds, test_ds = random_split(dataset, [train_size, val_size, test_size])
len(train_ds), len(val_ds), len(test_ds)
Out[9]:
(5760, 720, 720)
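The split is 80% / 10% / 10% of the 7200 images. As a sketch of an alternative way to make it reproducible, random_split also accepts its own generator, so the split does not depend on other random calls made earlier in the notebook. The toy dataset below (just the integers 0..7199) is a stand-in for the real one, and the names `toy_ds` and `gen` are illustrative.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset with the same number of items as COIL-100
toy_ds = TensorDataset(torch.arange(7200))

# A dedicated generator keeps the split independent of the global seed
gen = torch.Generator().manual_seed(11)
train_part, val_part, test_part = random_split(
    toy_ds, [5760, 720, 720], generator=gen
)

sizes = (len(train_part), len(val_part), len(test_part))
print(sizes)

# Every item lands in exactly one of the three subsets
all_indices = (
    set(train_part.indices) | set(val_part.indices) | set(test_part.indices)
)
print(len(all_indices))
```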
We can now create data loaders for the training, validation, and test sets, to load the data in batches.
In [10]:
train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=4, pin_memory=True)
val_dl = DataLoader(val_ds, batch_size*2, num_workers=4, pin_memory=True)
test_dl = DataLoader(test_ds, batch_size*2, num_workers=4, pin_memory=True)
We can look at batches of images from the dataset using the make_grid function from torchvision. Each time the following code is run, we get a different batch, since the training data loader shuffles the indices before creating batches.
In [11]:
def show_batch(dl):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(12, 6))
        ax.set_xticks([]); ax.set_yticks([])
        ax.imshow(make_grid(images, nrow=16).permute(1, 2, 0))
        break
In [12]:
show_batch(train_dl)
Defining the Models
We are going to create three different models for this project:
- Logistic Regression.
- Deep Neural Network.
- Convolutional Neural Network.
1 Logistic Regression
Logistic regression is a supervised learning classification algorithm used to predict the probability of a target variable. In its basic form, the target or dependent variable is dichotomous: there are only two possible classes, coded as either 1 (success/yes) or 0 (failure/no). Our problem has 100 classes, so we use the multiclass generalization (multinomial, or softmax, regression): a single linear layer produces one score per class, and the cross-entropy loss turns those scores into class probabilities.
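Because COIL-100 has many classes rather than two, the model in this section is effectively the multiclass (softmax) form of logistic regression. The sketch below, on random scores rather than real model outputs, shows what F.cross_entropy computes: the softmax turns the raw scores into probabilities that sum to 1, and the loss is the average negative log-probability assigned to the correct class. The labels used here are arbitrary examples.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Random scores for 4 images and 101 outputs (not real model outputs)
logits = torch.randn(4, 101)
labels = torch.tensor([20, 64, 2, 94])

# Softmax turns each row of scores into probabilities
probs = F.softmax(logits, dim=1)

# Cross-entropy by hand: mean negative log-probability of the true class
manual_loss = -torch.log(probs[torch.arange(4), labels]).mean()

# The built-in function works directly on the raw scores
builtin_loss = F.cross_entropy(logits, labels)

print(manual_loss.item(), builtin_loss.item())
```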
Our images are of the shape 3x128x128, but we need them to be vectors of size 49,152 (3 × 128 × 128). We’ll use the .reshape method of a tensor, which will allow us to efficiently ‘view’ each image as a flat vector, without really changing the underlying data.
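A quick sketch of the flattening step on a dummy batch of zeros: the result has one row of 49,152 values per image, and for a contiguous tensor like this one, reshape returns a view on the same memory rather than a copy.

```python
import torch

# Dummy batch of 4 images, each 3x128x128
batch = torch.zeros(4, 3, 128, 128)

# Flatten each image into a single vector
flat = batch.reshape(-1, 3 * 128 * 128)
print(flat.shape)

# Same starting address in memory, so no data was copied
shares_memory = flat.data_ptr() == batch.data_ptr()
print(shares_memory)
```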
To include this additional functionality within our model, we need to define a custom model, by extending the nn.Module class from PyTorch.
In [13]:
class CoilLRModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(input_size, num_classes)

    def forward(self, xb):
        xb = xb.reshape(-1, in_channels*128*128)
        out = self.linear(xb)
        return out

    def training_step(self, batch):
        images, labels = batch
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        acc = accuracy(out, labels)         # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc.detach()}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(epoch, result['val_loss'], result['val_acc']))

model = CoilLRModel()
Evaluation Metric and Loss Function
We need a way to evaluate how well our model is performing. A natural way to do this is to find the percentage of labels that were predicted correctly (the accuracy of the predictions). Then, we’ll define an evaluate function, which will perform the validation phase, and a fit function, which will perform the entire training process.
In [14]:
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)
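To see what the accuracy function does, here is a tiny worked example with made-up scores for three images and three classes. The predicted class is the index of the largest score in each row; two of the three predictions match the labels, so the accuracy is 2/3.

```python
import torch


def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))


# Made-up scores: one row per image, one column per class
logits = torch.tensor([[2.0, 0.1, 0.3],   # predicted class 0
                       [0.2, 1.5, 0.1],   # predicted class 1
                       [0.9, 0.8, 0.1]])  # predicted class 0
labels = torch.tensor([0, 1, 2])

acc = accuracy(logits, labels)
print(acc)
```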
Training the Model
In [15]:
def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history
The fit function records the validation loss and metric from each epoch and returns a history of the training process. This is useful for debugging and visualizing the training process. Before we train the model, let’s see how the model performs on the validation set with the initial set of randomly initialized weights and biases.
Configurations like batch size, learning rate etc. need to be picked in advance when training machine learning models, and are called hyperparameters. Picking the right hyperparameters is critical for training an accurate model within a reasonable amount of time, and is an active area of research and experimentation.
In [16]:
result0 = evaluate(model, val_dl)
result0
Out[16]:
{'val_loss': 4.655781269073486, 'val_acc': 0.018830128014087677}
The initial accuracy is around 1.9%, which is close to what one might expect from a randomly initialized model (since it has roughly a 1 in 100 chance of getting a label right by guessing randomly).
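A similar sanity check works for the loss. Under the simplifying assumption that an untrained model spreads its probability roughly evenly over all 101 outputs, the cross-entropy loss should be close to the natural logarithm of 101, which is in the same range as the initial validation loss reported above.

```python
import math

num_outputs = 101  # size of the output layer used in this post

# Cross-entropy of a perfectly uniform prediction over all outputs
uniform_loss = math.log(num_outputs)

# Accuracy of a uniform random guess among the 100 real classes
chance_accuracy = 1 / 100

print(f"{uniform_loss:.4f}", chance_accuracy)
```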
In [17]:
history = fit(10, learning_rate, model, train_dl, val_dl)
Epoch [0], val_loss: 4.2210, val_acc: 0.1641
Epoch [1], val_loss: 3.8676, val_acc: 0.2875
Epoch [2], val_loss: 3.5594, val_acc: 0.4057
Epoch [3], val_loss: 3.2867, val_acc: 0.5219
Epoch [4], val_loss: 3.0482, val_acc: 0.5643
Epoch [5], val_loss: 2.8349, val_acc: 0.6155
Epoch [6], val_loss: 2.6460, val_acc: 0.6747
Epoch [7], val_loss: 2.4802, val_acc: 0.7192
Epoch [8], val_loss: 2.3304, val_acc: 0.7447
Epoch [9], val_loss: 2.1958, val_acc: 0.7738
After 10 epochs, validation accuracy went from about 1.9% to about 77%, which is a very good improvement.
In [18]:
accuracies = [r['val_acc'] for r in history]
plt.plot(accuracies, '-x')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.title('Accuracy vs. No. of epochs');
In [19]:
# Evaluate on test dataset
result = evaluate(model, test_dl)
result
Out[19]:
{'val_loss': 2.218763589859009, 'val_acc': 0.7651241421699524}
Prediction
In [20]:
def predict_image(img, model):
    xb = img.unsqueeze(0)
    yb = model(xb)
    _, preds = torch.max(yb, dim=1)
    return preds[0].item()
In [21]:
img, label = test_ds[158]
plt.imshow(img.permute(1, 2, 0))
print('Label:', label, ', Predicted:', predict_image(img, model))
Label: 64 , Predicted: 64
Saving the Model
In [22]:
torch.save(model.state_dict(), '1-coil-logistic.pth')
2 Deep Neural Network
A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers. The DNN finds the correct mathematical manipulation to turn the input into the output, whether it be a linear relationship or a non-linear relationship. The network moves through the layers calculating the probability of each output.
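The non-linear part matters. As a small sketch with arbitrary layer sizes, the code below shows that two linear layers stacked without an activation function are equivalent to a single linear layer, so depth alone adds nothing. Placing a ReLU between them, as the model in this section does, breaks that equivalence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two small linear layers (sizes chosen only for illustration)
first = nn.Linear(8, 5)
second = nn.Linear(5, 3)
x = torch.randn(4, 8)

# Stacking them without an activation...
stacked = second(first(x))

# ...gives the same result as one linear map with merged weights
merged_weight = second.weight @ first.weight
merged_bias = second.weight @ first.bias + second.bias
merged = x @ merged_weight.T + merged_bias
print(torch.allclose(stacked, merged, atol=1e-5))

# With a ReLU in between, the network is no longer a single linear map
with_relu = second(torch.relu(first(x)))
print(with_relu.shape)
```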
In [23]:
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))
In [24]:
class CoilDNNModel(nn.Module):
    """Feedforward neural network with 2 hidden layers"""
    def __init__(self, in_size, out_size):
        super().__init__()
        # hidden layer 1
        self.linear1 = nn.Linear(in_size, 2048)
        # hidden layer 2
        self.linear2 = nn.Linear(2048, 1024)
        # output layer
        self.linear3 = nn.Linear(1024, out_size)

    def forward(self, xb):
        # Flatten the image tensors
        out = xb.view(xb.size(0), -1)
        # Get intermediate outputs using hidden layer 1
        out = self.linear1(out)
        # Apply activation function
        out = F.relu(out)
        # Get intermediate outputs using hidden layer 2
        out = self.linear2(out)
        # Apply activation function
        out = F.relu(out)
        # Get predictions using output layer
        out = self.linear3(out)
        return out

    def training_step(self, batch):
        images, labels = batch
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        acc = accuracy(out, labels)         # Calculate accuracy
        return {'val_loss': loss, 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(epoch, result['val_loss'], result['val_acc']))
Using a GPU
In [25]:
torch.cuda.is_available()
Out[25]:
True
In [26]:
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')
In [27]:
device = get_default_device()
device
Out[27]:
device(type='cuda')
In [28]:
def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)
In [29]:
class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)
In [30]:
train_dl = DeviceDataLoader(train_dl, device)
val_dl = DeviceDataLoader(val_dl, device)
test_dl = DeviceDataLoader(test_dl, device)
Training the Model
In [31]:
def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history
In [32]:
model = CoilDNNModel(input_size, out_size=num_classes)
to_device(model, device)
Out[32]:
CoilDNNModel(
  (linear1): Linear(in_features=49152, out_features=2048, bias=True)
  (linear2): Linear(in_features=2048, out_features=1024, bias=True)
  (linear3): Linear(in_features=1024, out_features=101, bias=True)
)
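The layer sizes printed above also tell us how large this model is. As a quick back-of-the-envelope sketch, each linear layer has one weight per input-output pair plus one bias per output; the helper name `linear_params` is just for illustration.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# (in_features, out_features) of the three layers printed above
layer_sizes = [(49152, 2048), (2048, 1024), (1024, 101)]

total = sum(linear_params(i, o) for i, o in layer_sizes)
print(f"{total:,}")
```

Almost all of the roughly 103 million parameters sit in the first layer, because every one of the 49,152 input values is connected to every one of the 2048 hidden units.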
In [33]:
history = fit(10, learning_rate*10, model, train_dl, val_dl)
Epoch [0], val_loss: 4.4509, val_acc: 0.2058
Epoch [1], val_loss: 4.2162, val_acc: 0.2385
Epoch [2], val_loss: 3.8548, val_acc: 0.2724
Epoch [3], val_loss: 3.3516, val_acc: 0.3716
Epoch [4], val_loss: 2.8126, val_acc: 0.4919
Epoch [5], val_loss: 2.3059, val_acc: 0.5774
Epoch [6], val_loss: 1.8835, val_acc: 0.6671
Epoch [7], val_loss: 1.5763, val_acc: 0.6943
Epoch [8], val_loss: 1.3278, val_acc: 0.7539
Epoch [9], val_loss: 1.1209, val_acc: 0.8125
In [34]:
losses = [x['val_loss'] for x in history]
plt.plot(losses, '-x')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.title('Loss vs. No. of epochs');
In [35]:
accuracies = [x['val_acc'] for x in history]
plt.plot(accuracies, '-x')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.title('Accuracy vs. No. of epochs');
In [36]:
# Evaluate on test dataset
result = evaluate(model, test_dl)
result
Out[36]:
{'val_loss': 1.1144559383392334, 'val_acc': 0.8215144276618958}
Predictions
In [37]:
def predict_image(img, model):
    xb = to_device(img.unsqueeze(0), device)
    yb = model(xb)
    _, preds = torch.max(yb, dim=1)
    return preds[0].item()
In [38]:
img, label = test_ds[200]
plt.imshow(img.permute(1, 2, 0))
print('Label:', label, ', Predicted:', predict_image(img, model))
Label: 2 , Predicted: 2
Saving the Model
In [39]:
torch.save(model.state_dict(), '2-coil-dnn.pth')
3 Convolutional Neural Network
A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. CNNs are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics. They have applications in image and video recognition, recommender systems, image classification, medical image analysis, natural language processing, and financial time series.
In [40]:
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        acc = accuracy(out, labels)           # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['train_loss'], result['val_loss'], result['val_acc']))
In [41]:
class CoilCNNModel(ImageClassificationBase):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),

            nn.Flatten(),
            nn.Linear(256*1024, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, num_classes))

    def forward(self, xb):
        return self.network(xb)
In [42]:
model = CoilCNNModel(in_channels, num_classes)
model
Out[42]:
CoilCNNModel(
  (network): Sequential(
    (0): Conv2d(3, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU()
    (7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU()
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Flatten()
    (11): Linear(in_features=262144, out_features=1024, bias=True)
    (12): ReLU()
    (13): Linear(in_features=1024, out_features=512, bias=True)
    (14): ReLU()
    (15): Linear(in_features=512, out_features=101, bias=True)
  )
)
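The input size of the first linear layer, 262144 (written as 256*1024 in the model definition), follows from the layers before it. Here is a short sketch that traces the spatial size through the network using the standard output-size formula for convolution and pooling layers; the helper name `conv_out` is illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 128                                           # input images are 128x128
size = conv_out(size, kernel=3, stride=1, padding=1)  # conv: stays 128
size = conv_out(size, kernel=3, stride=1, padding=1)  # conv: stays 128
size = conv_out(size, kernel=2, stride=2)             # max pool: 64
size = conv_out(size, kernel=3, stride=1, padding=1)  # conv: stays 64
size = conv_out(size, kernel=3, stride=1, padding=1)  # conv: stays 64
size = conv_out(size, kernel=2, stride=2)             # max pool: 32

# 256 channels of 32x32 feature maps are flattened into one vector
flat_features = 256 * size * size
print(size, flat_features)
```

The 3x3 convolutions with padding 1 keep the size unchanged, and each 2x2 max pooling halves it, so the 128x128 input ends up as 256 feature maps of 32x32 (1024 values each).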
Using a GPU
In [43]:
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)
In [44]:
device = get_default_device()
device
Out[44]:
device(type='cuda')
In [45]:
train_dl = DeviceDataLoader(train_dl, device)
val_dl = DeviceDataLoader(val_dl, device)
test_dl = DeviceDataLoader(test_dl, device)
to_device(model, device);
Training the Model
In [46]:
@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase
        model.train()
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)
    return history
In [47]:
model = to_device(CoilCNNModel(in_channels, num_classes), device)
In [48]:
evaluate(model, val_dl)
Out[48]:
{'val_loss': 4.615262031555176, 'val_acc': 0.011017628014087677}
In [49]:
num_epochs = 4
opt_func = torch.optim.Adam
In [50]:
history = fit(num_epochs, learning_rate, model, train_dl, val_dl, opt_func)
Epoch [0], train_loss: 2.8882, val_loss: 0.8734, val_acc: 0.7458
Epoch [1], train_loss: 0.4132, val_loss: 0.1532, val_acc: 0.9605
Epoch [2], train_loss: 0.1390, val_loss: 0.0667, val_acc: 0.9764
Epoch [3], train_loss: 0.0844, val_loss: 0.1092, val_acc: 0.9676
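In the last epoch the training loss keeps falling while the validation loss rises again, which can be an early sign of overfitting. One possible response (not something done in this post) is to keep the model from the epoch with the lowest validation loss. The sketch below simply finds that epoch from the numbers printed above; the list is typed in by hand to keep the example self-contained.

```python
# Values copied from the training log above
history = [
    {"train_loss": 2.8882, "val_loss": 0.8734, "val_acc": 0.7458},
    {"train_loss": 0.4132, "val_loss": 0.1532, "val_acc": 0.9605},
    {"train_loss": 0.1390, "val_loss": 0.0667, "val_acc": 0.9764},
    {"train_loss": 0.0844, "val_loss": 0.1092, "val_acc": 0.9676},
]

# Index of the epoch with the lowest validation loss
best_epoch = min(range(len(history)), key=lambda i: history[i]["val_loss"])
print(best_epoch, history[best_epoch]["val_acc"])
```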
In [51]:
def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs');
In [52]:
plot_accuracies(history)
In [53]:
def plot_losses(history):
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs');
In [54]:
plot_losses(history)
Testing with individual images
In [55]:
def predict_image(img, model):
    # Convert to a batch of 1
    xb = to_device(img.unsqueeze(0), device)
    # Get predictions from model
    yb = model(xb)
    # Pick index with highest probability
    _, preds = torch.max(yb, dim=1)
    # Retrieve the class label
    return preds[0].item()
In [56]:
img, label = test_ds[0]
plt.imshow(img.permute(1, 2, 0))
print('Label:', label, ', Predicted:', predict_image(img, model))
Label: 94 , Predicted: 94
In [57]:
img, label = test_ds[100]
plt.imshow(img.permute(1, 2, 0))
print('Label:', label, ', Predicted:', predict_image(img, model))
Label: 35 , Predicted: 35
In [58]:
# Evaluate on test dataset
result = evaluate(model, test_dl)
result
Out[58]:
{'val_loss': 0.10439281165599823, 'val_acc': 0.9718549847602844}
Saving the Model
In [59]:
torch.save(model.state_dict(), '3-coil-cnn.pth')
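A saved state dict only contains the weights, so to use it later we need to create a model with the same architecture and load the weights into it. The sketch below shows the round trip on a tiny stand-in model (a single nn.Linear layer) and a temporary file, so that it runs on its own; with the models from this post, the same calls would be applied to a freshly created instance and the file names used above.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in for a trained model
trained = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt_path = os.path.join(tmp_dir, "tiny-model.pth")

    # Save only the weights
    torch.save(trained.state_dict(), ckpt_path)

    # Re-create the same architecture, then load the saved weights
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(ckpt_path))

# Switch to evaluation mode before making predictions
restored.eval()

same = torch.equal(trained.weight, restored.weight)
print(same)
```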