Training AlexNet with tips and checks on how to train CNNs: Practical CNNs in PyTorch
Overfitting a small batch, manually checking the loss, and creating data pipelines. I give a complete and detailed introduction on how to build the AlexNet model in PyTorch, with code.
- Data
- Data Loading
- Check dataloaders
- Model Construction
- Best practices for CNN
- Weight Initialization
- PyTorch specific discussion
- How to check my model is correct?
- Using Pretrained Models
- Conclusion
- References
Link to jupyter notebook
Data
- Download ImageNet. You can refer to the ImageNet site to download the data. If you have limited internet, this option is good, as you can download fewer images. Or use the ImageNet Object Localization Challenge to directly download all the files (warning: 155 GB).
- Unzip the tar.gz file using
tar xzvf file_name -C destination_path
- In the Data/CLS-LOC folder you have the train, val and test image folders. The train images are already in their class folders, i.e. the images of dogs are in a folder called dog and the images of cats are in a folder called cat. But the val images are not sorted into their class folders.
- Use this command from your terminal in the val folder
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash
It moves all the images into their respective class folders.
- As a general preprocessing step, we rescale all images so that their shorter side is 256 pixels. As this operation would otherwise repeat every time, I store the rescaled versions of the images on disk, using
find . -name "*.JPEG" | xargs -I {} convert {} -resize "256^>" {}
After doing the above steps you have a folder with all the images in their class folders, and the shorter side of every image is 256 pixels.
Data Loading
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torchvision.utils import make_grid

train_dir = '../../../Data/ILSVRC2012/train'
val_dir = '../../../Data/ILSVRC2012/val'
size = 224
batch_size = 32
num_workers = 8

data_transforms = {
    'train': transforms.Compose([
        transforms.CenterCrop(size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]),
    'val': transforms.Compose([
        transforms.CenterCrop(size),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
}
image_datasets = {
    'train': ImageFolder(train_dir, transform=data_transforms['train']),
    'val': ImageFolder(val_dir, transform=data_transforms['val']),
}
data_loader = {
    x: torch.utils.data.DataLoader(image_datasets[x],
                                   batch_size=batch_size,
                                   shuffle=True,
                                   num_workers=num_workers) for x in ['train', 'val']
}
The normalization values are precalculated for the ImageNet dataset, so we use those values for the normalization step.
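If you train on a different dataset, you can estimate these statistics yourself. Here is a minimal sketch (my own addition, assuming a plain_loader that yields un-normalized tensors in [0, 1], e.g. a dataset that only applies transforms.ToTensor()):
# Sketch: estimate per-channel mean/std of your own dataset.
def estimate_mean_std(plain_loader):
    channel_sum = torch.zeros(3)
    channel_sq_sum = torch.zeros(3)
    num_pixels = 0
    for images, _ in plain_loader:
        # images: (batch, 3, H, W), values in [0, 1]
        num_pixels += images.size(0) * images.size(2) * images.size(3)
        channel_sum += images.sum(dim=[0, 2, 3])
        channel_sq_sum += (images ** 2).sum(dim=[0, 2, 3])
    mean = channel_sum / num_pixels
    std = (channel_sq_sum / num_pixels - mean ** 2).sqrt()
    return mean, std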
# As our images are normalized we have to denormalize them and
# convert them to numpy arrays.
def imshow(img, title=None):
    img = img.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    img = std*img + mean
    img = np.clip(img, 0, 1)
    plt.imshow(img)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # Pause is necessary to display images correctly
images, labels = next(iter(data_loader['train']))
grid_img = make_grid(images[:4], nrow=4)
imshow(grid_img, title = [labels_list[x] for x in labels[:4]])
One problem that you will face with ImageNet data is getting the class names. The class names are contained in the file LOC_synset_mapping.txt.
f = open("../../Data/LOC_synset_mapping.txt", "r")
labels_dict = {}  # Get class label by folder name
labels_list = []  # Get class label by index
for line in f:
    split = line.split(' ', maxsplit=1)
    split[1] = split[1][:-1]  # Strip the trailing newline
    label_id, label = split[0], split[1]
    labels_dict[label_id] = label
    labels_list.append(label)
f.close()
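As a quick usage example (the synset id below is only an illustration), you can then look up a class either by its folder name or by its index:
# Look up a class name by its WordNet folder id
print(labels_dict['n01440764'])
# Look up a class name by its index, as produced by the dataloader labels
print(labels_list[0])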
Model Construction
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.conv_base = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2, bias=False),
            nn.BatchNorm2d(96),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),

            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),

            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.fc_base = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256*6*6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.conv_base(x)
        x = x.view(x.size(0), 256*6*6)  # Flatten the conv features
        x = self.fc_base(x)
        return x
See the division into conv_base and fc_base in the model. This is a general scheme that you will see in most implementations, i.e. dividing the model into smaller sub-models. We use 0-indexing to access the layers for now, but in future posts I will use names for the layers (as it helps with weight initialization).
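As a preview, here is a minimal sketch (my own illustration, not code from this post) of how layers can be given names inside nn.Sequential using an OrderedDict:
from collections import OrderedDict

# Sketch: a named Sequential block, so layers can be accessed as
# conv_block.conv1 / conv_block.bn1 instead of by integer index.
conv_block = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2, bias=False)),
    ('bn1', nn.BatchNorm2d(96)),
    ('relu1', nn.ReLU(inplace=True)),
    ('pool1', nn.MaxPool2d(kernel_size=3, stride=2)),
]))
print(conv_block.conv1)  # Access a layer by name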
Best practices for CNN
- Activation function: ReLU is the default choice, but LeakyReLU also works well. Always use LeakyReLU in GANs.
- Weight Initialization: Use He initialization as the default with ReLU. PyTorch provides kaiming_normal_ for this purpose.
- Preprocess data: There are two common choices: normalizing to [-1, 1] or standardizing with (x - mean)/std. We prefer the former when we know the different features do not relate to each other (see the short example after this list).
- Batch Normalization: Apply before the non-linearity, i.e. ReLU. At test time, use the running averages of the mean and variance computed during training; PyTorch maintains these for you automatically. Note: in a recent paper submitted to ICLR 2019, Fixup initialization was introduced. Using it, you don't need batchnorm layers in your model.
- Pooling layers: Apply after the non-linearity, i.e. ReLU. Different tasks may call for different pooling methods; for classification, max-pooling works well.
- Optimizer: Adam is a good choice; SGD + momentum + Nesterov is also good. fast.ai recently highlighted the AdamW optimizer. The choice of optimizer comes down to experimentation and the task at hand; look at benchmarks using different optimizers as a reference.
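For the preprocessing point above, a quick sketch of the two normalization choices with torchvision transforms (my own illustration; the first option simply uses 0.5 per channel, which maps [0, 1] tensors to [-1, 1]):
# Option 1: scale each channel from [0, 1] to [-1, 1]
normalize_simple = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

# Option 2: standardize with per-channel dataset statistics (ImageNet values here)
normalize_stats = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                       std=[0.229, 0.224, 0.225])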
Weight Initialization
model = AlexNet()

# Indices of the Conv2d layers inside conv_base and the Linear layers inside fc_base
conv_list = [0, 4, 8, 10, 12]
fc_list = [1, 4, 6]

for i in conv_list:
    torch.nn.init.kaiming_normal_(model.conv_base[i].weight)
for i in fc_list:
    torch.nn.init.kaiming_normal_(model.fc_base[i].weight)
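As a quick sanity check (my own addition): after He initialization, the standard deviation of a conv layer's weights should be close to sqrt(2 / fan_in).
import math

w = model.conv_base[0].weight                # first conv layer: 3 input channels, 11x11 kernel
fan_in = w.size(1) * w.size(2) * w.size(3)   # 3 * 11 * 11 = 363
print(w.std().item(), math.sqrt(2.0 / fan_in))  # the two values should be close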
Create the optimizer, scheduler and loss function.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)
model = model.to(device)  # Move the model to the GPU before creating the optimizer

# Cross entropy loss takes the logits directly, so we don't need to apply softmax in our CNN
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0005)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'max', verbose=True)
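Since the scheduler is created with mode='max', it expects a metric that should increase, such as validation accuracy, to be passed to step() at the end of each epoch. A minimal sketch of how that sits in a training loop (my own illustration; num_epochs is a placeholder and the training pass is elided):
num_epochs = 10  # placeholder value for illustration
for epoch in range(num_epochs):
    # ... training pass over data_loader['train'] goes here ...
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in data_loader['val']:
            inputs, labels = inputs.to(device), labels.to(device)
            _, preds = torch.max(model(inputs), 1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    val_accuracy = correct / total
    scheduler.step(val_accuracy)  # reduces the LR when val accuracy plateaus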
PyTorch specific discussion
- You have to specify the convolution padding yourself; there is no automatic "same" padding. Check this thread for a discussion on this topic (a short sketch follows this list).
- Create the optimizer after moving the model to the GPU.
- Whether to add a softmax layer to your model depends on your loss function. In the case of nn.CrossEntropyLoss, we do not need a softmax layer in the model, as that is handled by the loss function itself.
- Do not forget to zero the gradients (optimizer.zero_grad()) before each backward pass.
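For the padding point above, a quick sketch (my own illustration) of the usual "same"-style padding computation for stride 1 and an odd kernel size:
# For stride 1 and an odd kernel size, padding = (kernel_size - 1) // 2
# keeps the spatial size unchanged.
kernel_size = 5
padding = (kernel_size - 1) // 2  # -> 2
conv = nn.Conv2d(64, 64, kernel_size=kernel_size, stride=1, padding=padding)
x = torch.randn(1, 64, 32, 32)
print(conv(x).shape)  # torch.Size([1, 64, 32, 32]) -- spatial size preserved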
How to check my model is correct?
Check 1
Take a single batch and try to overfit it. If the model cannot drive the loss on a handful of samples close to zero (and the accuracy to 100%), something is wrong with the model or the training loop.
inputs, labels = next(iter(data_loader['train']))
inputs = inputs.to(device)
labels = labels.to(device)

criterion_check1 = nn.CrossEntropyLoss()
optimizer_check1 = optim.SGD(model.parameters(), lr=0.001)

model.train()
for epoch in range(200):
    optimizer_check1.zero_grad()
    outputs = model(inputs)
    loss = criterion_check1(outputs, labels)
    _, preds = torch.max(outputs, 1)
    loss.backward()
    optimizer_check1.step()
    if epoch % 10 == 0:
        print('Epoch {}: Loss = {} Accuracy = {}'.format(
            epoch+1, loss.item(), torch.sum(preds == labels).item()))
Check 2
Double-check the loss value. As a rule of thumb, with a randomly initialized model the initial cross-entropy loss should be close to ln(number of classes). So if you are doing a 10-class classification and get a loss of about 2.3 on the first iteration, that is fine, but if you are getting a loss of 100 then something is wrong.
In the output above, you can see we got a loss value of 10.85, which is acceptable considering the fact that we have 1000 classes. In case you get weird loss values, check for misplaced negative signs (for example in a custom loss implementation).
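A quick way to compute the expected value (my own addition): with C classes, a uniform prediction gives a cross-entropy of ln(C).
import math

num_classes = 1000
expected_initial_loss = math.log(num_classes)
print(expected_initial_loss)  # ~6.91 for 1000 classes, ~2.30 for 10 classes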
Using Pretrained Models
As a final exercise, we use a pretrained AlexNet to classify CIFAR-10 images.
data = np.load('../../../Data/cifar_10.npz')
alexnet = torch.load('../../../Data/Pytorch Trained Models/alexnet.pth')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x_train = torch.from_numpy(data['train_data'])
y_train = torch.from_numpy(data['train_labels'])
x_test = torch.from_numpy(data['test_data'])
y_test = torch.from_numpy(data['test_labels'])
# Create data loader
class CIFAR_Dataset(torch.utils.data.Dataset):
    """
    Generally you would not load images in __init__, as that forces all of
    them into memory. Instead you should load the images in __getitem__,
    but as CIFAR is a small dataset I load all the images into memory.
    """
    def __init__(self, x, y, transform=None):
        self.x = x
        self.y = y
        self.transform = transform

    def __len__(self):
        return self.x.size(0)

    def __getitem__(self, idx):
        image = self.x[idx]
        label = self.y[idx].item()
        if self.transform:
            image = self.transform(image)
        return (image, label)
data_transforms = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

datasets = {
    'train': CIFAR_Dataset(x_train, y_train, transform=data_transforms),
    'test': CIFAR_Dataset(x_test, y_test, transform=data_transforms)
}
data_loader = {
    x: torch.utils.data.DataLoader(datasets[x],
                                   batch_size=64,
                                   shuffle=True,
                                   num_workers=8) for x in ['train', 'test']
}
# Freeze conv layers
for param in alexnet.parameters():
    param.requires_grad = False

# Replace the last layer of the alexnet model for our 10-class dataset
# (the new nn.Linear has requires_grad=True by default, so only it will be trained)
alexnet.classifier[6] = nn.Linear(4096, 10)
alexnet = alexnet.to(device)
criterion = nn.CrossEntropyLoss()

# Create the list of params to learn
params_to_learn = []
for name, param in alexnet.named_parameters():
    if param.requires_grad:
        params_to_learn.append(param)
optimizer = optim.SGD(params_to_learn, lr=0.001, momentum=0.9, nesterov=True)
Refer to this script for how I processed the CIFAR data after downloading it from the official site. You can also download CIFAR from torchvision.datasets.
PyTorch has a very good tutorial on fine-tuning torchvision models. I give a short implementation here, with the rest of the code in the jupyter notebook.
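For completeness, a minimal sketch of the fine-tuning loop that would follow the setup above (my own condensed version; the full loop with validation is in the jupyter notebook):
# Sketch: fine-tune only the replaced final layer for a few epochs
alexnet.train()
for epoch in range(5):  # number of epochs chosen arbitrarily for illustration
    running_loss = 0.0
    for inputs, labels in data_loader['train']:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = alexnet(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
    print('Epoch {}: train loss = {:.4f}'.format(epoch, running_loss / len(datasets['train'])))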
Conclusion
We discussed how to create dataloaders and how to plot images to check that the dataloaders are correct. Then we implemented AlexNet in PyTorch and discussed some important choices when working with CNNs, like activation functions, pooling functions and weight initialization (code for He initialization was also shared). Some checks, like overfitting a small batch and manually checking the loss value, were then discussed. We concluded by using a pretrained AlexNet to classify CIFAR-10 images.
References
- https://github.com/soumith/imagenet-multiGPU.torch (helped in preprocessing the ImageNet dataset)
- https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html (many code references were taken from this tutorial)
- https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf (the AlexNet paper)