├── Basic CV
│   └── readme.md
├── utils
│   ├── logger.py
│   ├── metrics.py
│   ├── scheduler.py
│   └── visualizer.py
├── tests
│   ├── test_model.py
│   └── test_utils.py
├── architectures
│   ├── __init__.py
│   ├── classification
│   │   ├── vit.py
│   │   ├── mobilenet.py
│   │   ├── resnet.py
│   │   └── Resnet
│   │       ├── Resnet-34.py
│   │       ├── Resnet-152.py
│   │       ├── Resnet-101.py
│   │       ├── Resnet-50.py
│   │       ├── Resnet-18.py
│   │       └── readme.md
│   ├── detection
│   │   ├── yolov5.py
│   │   └── faster_rcnn.py
│   ├── segmentation
│   │   ├── unet.py
│   │   └── deeplabv3.py
│   └── captioning
│       ├── cnn_encoder.py
│       └── lstm_decoder.py
├── requirements.txt
├── projects
│   └── image_classification
│       ├── evaluate.py
│       ├── train.py
│       ├── config.yaml
│       └── README.md
├── .gitignore
└── README.md

/Basic CV/readme.md:
--------------------------------------------------------------------------------
# Basic CV
--------------------------------------------------------------------------------
/utils/logger.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/utils/metrics.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/tests/test_model.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/tests/test_utils.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/utils/scheduler.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/utils/visualizer.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/__init__.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
# Add your project dependencies here
--------------------------------------------------------------------------------
/architectures/classification/vit.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/detection/yolov5.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/segmentation/unet.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/captioning/cnn_encoder.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/captioning/lstm_decoder.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/classification/mobilenet.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/classification/resnet.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/detection/faster_rcnn.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/segmentation/deeplabv3.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/projects/image_classification/evaluate.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/projects/image_classification/train.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/projects/image_classification/config.yaml:
--------------------------------------------------------------------------------
# model and training configs
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*.pyc
__pycache__/
*.pt
*.h5
logs/
results/
datasets/
--------------------------------------------------------------------------------
/projects/image_classification/README.md:
--------------------------------------------------------------------------------
# Image Classification
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Computer Vision Projects

This repository contains PyTorch implementations of various computer vision architectures and tasks.
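
Each ResNet variant under `architectures/classification/Resnet/` is a self-contained script that ends with a small smoke test. As a minimal sketch of how to run one of them (assuming PyTorch is installed and you are in the repository root; `runpy` is used because the hyphenated file names cannot be imported with a normal `import` statement):

```python
# Run the ResNet-18 smoke test defined at the bottom of Resnet-18.py.
import runpy

runpy.run_path("architectures/classification/Resnet/Resnet-18.py",
               run_name="__main__")
# Expected to print the model summary followed by:
# Output shape: torch.Size([2, 10])
```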
--------------------------------------------------------------------------------
/architectures/classification/Resnet/Resnet-34.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F

# ---------------------------------
# Basic Residual Block (Same as ResNet-18)
# ---------------------------------
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.downsample = downsample
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x

        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))

        if self.downsample:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out

# ---------------------------------
# ResNet-34 with 34 layers
# ---------------------------------
class ResNet34(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet34, self).__init__()
        self.in_channels = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(BasicBlock, 64, 3)
        self.layer2 = self._make_layer(BasicBlock, 128, 4, stride=2)
        self.layer3 = self._make_layer(BasicBlock, 256, 6, stride=2)
        self.layer4 = self._make_layer(BasicBlock, 512, 3, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * BasicBlock.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None

        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion)
            )

        layers = [block(self.in_channels, out_channels, stride, downsample)]
        self.in_channels = out_channels * block.expansion

        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)

        x = self.layer1(x)   # 64
        x = self.layer2(x)   # 128
        x = self.layer3(x)   # 256
        x = self.layer4(x)   # 512

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)

# Test
if __name__ == "__main__":
    model = ResNet34(num_classes=10)
    x = torch.randn(2, 3, 224, 224)
    out = model(x)
    print("ResNet-34 output:", out.shape)  # (2, 10)
--------------------------------------------------------------------------------
/architectures/classification/Resnet/Resnet-152.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn

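# ResNet-152 stacks the Bottleneck block in a [3, 8, 36, 3] configuration,
# the deepest of the ImageNet ResNet variants implemented in this folder.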
# Same Bottleneck block as in ResNet-50
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))

        if self.downsample:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out


class ResNet152(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet152, self).__init__()
        self.in_channels = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(Bottleneck, 64, 3)
        self.layer2 = self._make_layer(Bottleneck, 128, 8, stride=2)
        self.layer3 = self._make_layer(Bottleneck, 256, 36, stride=2)  # 🔥 Heavy one!
        self.layer4 = self._make_layer(Bottleneck, 512, 3, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * Bottleneck.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None

        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion)
            )

        layers = [block(self.in_channels, out_channels, stride, downsample)]
        self.in_channels = out_channels * block.expansion

        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)  # 36 bottleneck blocks!
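        # For a 224x224 input, the tensor is (B, 1024, 14, 14) at this point;
        # layer4 halves the spatial size once more and expands to 2048 channels.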
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)


# Test
if __name__ == "__main__":
    model = ResNet152(num_classes=10)
    x = torch.randn(2, 3, 224, 224)
    out = model(x)
    print("ResNet-152 output:", out.shape)  # (2, 10)
--------------------------------------------------------------------------------
/architectures/classification/Resnet/Resnet-101.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn

# Same Bottleneck block used in ResNet-50
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))

        if self.downsample:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out


class ResNet101(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet101, self).__init__()
        self.in_channels = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(Bottleneck, 64, 3)
        self.layer2 = self._make_layer(Bottleneck, 128, 4, stride=2)
        self.layer3 = self._make_layer(Bottleneck, 256, 23, stride=2)  # 👈 Key difference
        self.layer4 = self._make_layer(Bottleneck, 512, 3, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * Bottleneck.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None

        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion)
            )

        layers = [block(self.in_channels, out_channels, stride, downsample)]
        self.in_channels = out_channels * block.expansion

        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)  # 23 bottleneck blocks!
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)


# Test
if __name__ == "__main__":
    model = ResNet101(num_classes=10)
    x = torch.randn(2, 3, 224, 224)
    out = model(x)
    print("ResNet-101 output:", out.shape)  # (2, 10)
--------------------------------------------------------------------------------
/architectures/classification/Resnet/Resnet-50.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn

# ---------------------------------
# Bottleneck Block for deeper ResNets (ResNet-50+)
# ---------------------------------
class Bottleneck(nn.Module):
    expansion = 4  # Final out_channels = out_channels * 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()

        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))

        if self.downsample:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out

# ---------------------------------
# ResNet-50 with Bottleneck Blocks
# ---------------------------------
class ResNet50(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet50, self).__init__()
        self.in_channels = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(Bottleneck, 64, 3)
        self.layer2 = self._make_layer(Bottleneck, 128, 4, stride=2)
        self.layer3 = self._make_layer(Bottleneck, 256, 6, stride=2)
        self.layer4 = self._make_layer(Bottleneck, 512, 3, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * Bottleneck.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None

        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion)
            )

        layers = [block(self.in_channels, out_channels, stride, downsample)]
        self.in_channels = out_channels * block.expansion

        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)

        x = self.layer1(x)   # 256
        x = self.layer2(x)   # 512
        x = self.layer3(x)   # 1024
        x = self.layer4(x)   # 2048

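        # Adaptive average pooling collapses (B, 2048, 7, 7) to (B, 2048, 1, 1)
        # before the fully connected classifier.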
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)

# Test
if __name__ == "__main__":
    model = ResNet50(num_classes=10)
    x = torch.randn(2, 3, 224, 224)
    out = model(x)
    print("ResNet-50 output:", out.shape)  # (2, 10)
--------------------------------------------------------------------------------
/architectures/classification/Resnet/Resnet-18.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F

# ----------------------------
# 1. Basic Residual Block
# ----------------------------
class BasicBlock(nn.Module):
    expansion = 1  # Used to compute output channels in the ResNet class

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        """
        A BasicBlock has two 3x3 convolutions and a skip connection.
        If input and output dimensions differ, a downsample layer is applied.
        """
        super(BasicBlock, self).__init__()

        # First convolution layer
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

        # Second convolution layer
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.downsample = downsample  # To match dimensions if needed
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x  # Save input for skip connection

        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))

        if self.downsample:
            identity = self.downsample(x)  # Match input shape to output shape

        out += identity  # Add skip connection
        out = self.relu(out)
        return out

# ----------------------------
# 2. ResNet-18 Model
# ----------------------------
class ResNet18(nn.Module):
    def __init__(self, num_classes=1000):
        """
        ResNet-18 has 4 stages with [2, 2, 2, 2] BasicBlocks.
        """
        super(ResNet18, self).__init__()
        self.in_channels = 64  # Initial channel count after first conv

        # Initial Conv Layer (stem)
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Four residual stages
        self.layer1 = self._make_layer(BasicBlock, 64, 2)
        self.layer2 = self._make_layer(BasicBlock, 128, 2, stride=2)
        self.layer3 = self._make_layer(BasicBlock, 256, 2, stride=2)
        self.layer4 = self._make_layer(BasicBlock, 512, 2, stride=2)

        # Global Average Pool and FC layer
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # Output size (1x1)
        self.fc = nn.Linear(512 * BasicBlock.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        """
        Creates a stage with multiple residual blocks.
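        The first block of a stage may downsample (stride 2 plus a 1x1 conv on
        the skip path) to halve the spatial size; the remaining blocks keep the
        shape unchanged.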
75 | """ 76 | downsample = None 77 | 78 | if stride != 1 or self.in_channels != out_channels * block.expansion: 79 | # Use 1x1 conv to match dimensions when needed 80 | downsample = nn.Sequential( 81 | nn.Conv2d(self.in_channels, out_channels * block.expansion, 82 | kernel_size=1, stride=stride, bias=False), 83 | nn.BatchNorm2d(out_channels * block.expansion) 84 | ) 85 | 86 | layers = [] 87 | # First block may have downsampling 88 | layers.append(block(self.in_channels, out_channels, stride, downsample)) 89 | self.in_channels = out_channels * block.expansion 90 | 91 | # Remaining blocks (no downsampling) 92 | for _ in range(1, blocks): 93 | layers.append(block(self.in_channels, out_channels)) 94 | 95 | return nn.Sequential(*layers) 96 | 97 | def forward(self, x): 98 | x = self.relu(self.bn1(self.conv1(x))) 99 | x = self.maxpool(x) 100 | 101 | x = self.layer1(x) # 64 102 | x = self.layer2(x) # 128 103 | x = self.layer3(x) # 256 104 | x = self.layer4(x) # 512 105 | 106 | x = self.avgpool(x) # (B, 512, 1, 1) 107 | x = torch.flatten(x, 1) # (B, 512) 108 | x = self.fc(x) # Final output logits 109 | return x 110 | 111 | # ---------------------------- 112 | # 3. Test the Model 113 | # ---------------------------- 114 | if __name__ == "__main__": 115 | model = ResNet18(num_classes=10) 116 | print(model) 117 | 118 | x = torch.randn(2, 3, 224, 224) # Dummy input 119 | out = model(x) 120 | print("Output shape:", out.shape) # (2, 10) 121 | -------------------------------------------------------------------------------- /architectures/classification/Resnet/readme.md: -------------------------------------------------------------------------------- 1 | 2 | # 📚 **ResNet Implementations for Computer Vision** 3 | 4 | Welcome to the ResNet implementation repository! Here, you’ll find various ResNet architectures implemented from scratch in **PyTorch**, including **ResNet-18**, **ResNet-34**, **ResNet-50**, **ResNet-101**, and **ResNet-152**. These are some of the most influential architectures in the field of **deep learning** and **computer vision**. 5 | 6 | This repository includes: 7 | - **Implementation of each ResNet model**. 8 | - Explanations of how ResNet works. 9 | - Use cases of ResNet and its variants. 10 | - Links to related research papers and further reading. 11 | 12 | --- 13 | 14 | ## 💡 **Overview of ResNet** 15 | 16 | **ResNet** (Residual Networks) was introduced in the paper "Deep Residual Learning for Image Recognition" by **Kaiming He et al.** in 2015. ResNet's key innovation is the introduction of **residual learning**, which allows training of very deep networks by avoiding the vanishing gradient problem through **skip connections**. 17 | 18 | The key idea is to learn the residual (difference) between input and output, instead of the direct mapping, which makes the training of deeper networks more efficient and effective. 19 | 20 | --- 21 | 22 | ## 🚀 **ResNet Architecture** 23 | 24 | ### **Basic Components of ResNet:** 25 | 1. **Residual Blocks**: Each block consists of two or more convolutional layers, where the input is directly added to the output of the convolution, bypassing one or more layers. 26 | 2. **Skip Connections**: These connections allow gradients to flow directly across layers, making deep networks trainable. 27 | 3. **Bottleneck Blocks (for deeper ResNets)**: These blocks are used in architectures like ResNet-50, ResNet-101, and ResNet-152, where the input is compressed and expanded to reduce computation and parameters. 

### **ResNet Variants**:
- **ResNet-18**: The smallest variant, with 18 layers. Ideal for quick experiments and small datasets.
- **ResNet-34**: Slightly larger, with 34 layers, for better performance on standard datasets.
- **ResNet-50**: Uses **Bottleneck blocks**. Suitable for more complex problems.
- **ResNet-101**: A deeper ResNet with 101 layers. Best for highly complex problems.
- **ResNet-152**: The deepest standard variant, with 152 layers, used for extremely complex problems and large datasets.

---

## 🏆 **Use Cases of ResNet**

ResNet is widely used across many **computer vision** tasks, including but not limited to:
- **Image Classification**: Classifying objects in images, typically using datasets like **ImageNet**.
- **Object Detection**: Using pre-trained ResNet networks as backbone feature extractors in detectors such as **Faster R-CNN** and **YOLO**.
- **Image Segmentation**: Using ResNet as the encoder in architectures such as **U-Net** and **DeepLabv3** for tasks such as medical image analysis.
- **Facial Recognition**: ResNet is used in face recognition systems because of its robust feature extraction.
- **Transfer Learning**: Pre-trained ResNet models are frequently fine-tuned for specific tasks such as image classification or object detection.

---

## 🔬 **Sources for Study**

### **Research Papers**:
- **[Deep Residual Learning for Image Recognition (ResNet paper)](https://arxiv.org/abs/1512.03385)**: The foundational paper introducing ResNet and the idea of residual learning.
- **[Identity Mappings in Deep Residual Networks (pre-activation ResNet)](https://arxiv.org/abs/1603.05027)**: Explores identity mappings in ResNet architectures for even deeper networks.
- **[Bag of Tricks for Image Classification with Convolutional Neural Networks](https://arxiv.org/abs/1812.01187)**: Practical tips for improving ResNet models on image classification tasks.

### **Books and Tutorials**:
- **[Deep Learning with Python by François Chollet](https://www.manning.com/books/deep-learning-with-python)**: A great resource for learning deep learning concepts and their applications using Keras and TensorFlow.
- **[Stanford CS231n - Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/)**: A comprehensive, in-depth course on computer vision and neural networks.
- **[PyTorch Documentation](https://pytorch.org/docs/stable/)**: Official documentation for PyTorch; essential for understanding and implementing neural networks.

---

## 🔗 **References**

1. **[Original ResNet Paper: Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)**
2. **[PyTorch Official Documentation](https://pytorch.org/docs/stable/)**
3. **[Stanford CS231n Course: Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/)**

---

## 🌟 **Acknowledgments**

Special thanks to **Kaiming He et al.** for their groundbreaking work on ResNet. This repository builds on that work to explore deep learning techniques in computer vision.
--------------------------------------------------------------------------------