├── Basic CV
│   └── readme.md
├── utils
│   ├── logger.py
│   ├── metrics.py
│   ├── scheduler.py
│   └── visualizer.py
├── tests
│   ├── test_model.py
│   └── test_utils.py
├── architectures
│   ├── __init__.py
│   ├── classification
│   │   ├── vit.py
│   │   ├── mobilenet.py
│   │   ├── resnet.py
│   │   └── Resnet
│   │       ├── Resnet-34.py
│   │       ├── Resnet-152.py
│   │       ├── Resnet-101.py
│   │       ├── Resnet-50.py
│   │       ├── Resnet-18.py
│   │       └── readme.md
│   ├── detection
│   │   ├── yolov5.py
│   │   └── faster_rcnn.py
│   ├── segmentation
│   │   ├── unet.py
│   │   └── deeplabv3.py
│   └── captioning
│       ├── cnn_encoder.py
│       └── lstm_decoder.py
├── requirements.txt
├── projects
│   └── image_classification
│       ├── evaluate.py
│       ├── train.py
│       ├── config.yaml
│       └── README.md
├── .gitignore
└── README.md

/Basic CV/readme.md:
--------------------------------------------------------------------------------
# Basic CV
--------------------------------------------------------------------------------
/utils/logger.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/utils/metrics.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/tests/test_model.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/tests/test_utils.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/utils/scheduler.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/utils/visualizer.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/__init__.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
# Add your project dependencies here
--------------------------------------------------------------------------------
/architectures/classification/vit.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/detection/yolov5.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/segmentation/unet.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/captioning/cnn_encoder.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/captioning/lstm_decoder.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/classification/mobilenet.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/classification/resnet.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/detection/faster_rcnn.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/architectures/segmentation/deeplabv3.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/projects/image_classification/evaluate.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/projects/image_classification/train.py:
--------------------------------------------------------------------------------
# TODO: implement this
--------------------------------------------------------------------------------
/projects/image_classification/config.yaml:
--------------------------------------------------------------------------------
# model and training configs
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*.pyc
__pycache__/
*.pt
*.h5
logs/
results/
datasets/
--------------------------------------------------------------------------------
/projects/image_classification/README.md:
--------------------------------------------------------------------------------
# Image Classification
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Computer Vision Projects

This repository contains PyTorch implementations of various computer vision architectures and tasks.
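
Each ResNet variant under `architectures/classification/Resnet/` is a self-contained script that ends with a small smoke test. As a minimal sketch of how to run one of them (assuming PyTorch is installed and you are in the repository root; `runpy` is used because the hyphenated file names cannot be imported with a normal `import` statement):

```python
# Run the ResNet-18 smoke test defined at the bottom of Resnet-18.py.
import runpy

runpy.run_path("architectures/classification/Resnet/Resnet-18.py",
               run_name="__main__")
# Expected to print the model summary followed by:
# Output shape: torch.Size([2, 10])
```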
--------------------------------------------------------------------------------
/architectures/classification/Resnet/Resnet-34.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F

# ---------------------------------
# Basic Residual Block (Same as ResNet-18)
# ---------------------------------
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.downsample = downsample
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x

        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))

        if self.downsample:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out

# ---------------------------------
# ResNet-34 with 34 layers
# ---------------------------------
class ResNet34(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet34, self).__init__()
        self.in_channels = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(BasicBlock, 64, 3)
        self.layer2 = self._make_layer(BasicBlock, 128, 4, stride=2)
        self.layer3 = self._make_layer(BasicBlock, 256, 6, stride=2)
        self.layer4 = self._make_layer(BasicBlock, 512, 3, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * BasicBlock.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None

        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion)
            )

        layers = [block(self.in_channels, out_channels, stride, downsample)]
        self.in_channels = out_channels * block.expansion

        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)

        x = self.layer1(x)   # 64
        x = self.layer2(x)   # 128
        x = self.layer3(x)   # 256
        x = self.layer4(x)   # 512

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)

# Test
if __name__ == "__main__":
    model = ResNet34(num_classes=10)
    x = torch.randn(2, 3, 224, 224)
    out = model(x)
    print("ResNet-34 output:", out.shape)  # (2, 10)
--------------------------------------------------------------------------------
/architectures/classification/Resnet/Resnet-152.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn

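# ResNet-152 stacks the Bottleneck block in a [3, 8, 36, 3] configuration,
# the deepest of the ImageNet ResNet variants implemented in this folder.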
# Same Bottleneck block as in ResNet-50
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))

        if self.downsample:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out


class ResNet152(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet152, self).__init__()
        self.in_channels = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(Bottleneck, 64, 3)
        self.layer2 = self._make_layer(Bottleneck, 128, 8, stride=2)
        self.layer3 = self._make_layer(Bottleneck, 256, 36, stride=2)  # 🔥 Heavy one!
        self.layer4 = self._make_layer(Bottleneck, 512, 3, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * Bottleneck.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None

        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion)
            )

        layers = [block(self.in_channels, out_channels, stride, downsample)]
        self.in_channels = out_channels * block.expansion

        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)  # 36 bottleneck blocks!
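        # For a 224x224 input, the tensor is (B, 1024, 14, 14) at this point;
        # layer4 halves the spatial size once more and expands to 2048 channels.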
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)


# Test
if __name__ == "__main__":
    model = ResNet152(num_classes=10)
    x = torch.randn(2, 3, 224, 224)
    out = model(x)
    print("ResNet-152 output:", out.shape)  # (2, 10)
--------------------------------------------------------------------------------
/architectures/classification/Resnet/Resnet-101.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn

# Same Bottleneck block used in ResNet-50
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))

        if self.downsample:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out


class ResNet101(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet101, self).__init__()
        self.in_channels = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(Bottleneck, 64, 3)
        self.layer2 = self._make_layer(Bottleneck, 128, 4, stride=2)
        self.layer3 = self._make_layer(Bottleneck, 256, 23, stride=2)  # 👈 Key difference
        self.layer4 = self._make_layer(Bottleneck, 512, 3, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * Bottleneck.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None

        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion)
            )

        layers = [block(self.in_channels, out_channels, stride, downsample)]
        self.in_channels = out_channels * block.expansion

        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)  # 23 bottleneck blocks!
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)


# Test
if __name__ == "__main__":
    model = ResNet101(num_classes=10)
    x = torch.randn(2, 3, 224, 224)
    out = model(x)
    print("ResNet-101 output:", out.shape)  # (2, 10)
--------------------------------------------------------------------------------
/architectures/classification/Resnet/Resnet-50.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn

# ---------------------------------
# Bottleneck Block for deeper ResNets (ResNet-50+)
# ---------------------------------
class Bottleneck(nn.Module):
    expansion = 4  # Final out_channels = out_channels * 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()

        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))

        if self.downsample:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out

# ---------------------------------
# ResNet-50 with Bottleneck Blocks
# ---------------------------------
class ResNet50(nn.Module):
    def __init__(self, num_classes=1000):
        super(ResNet50, self).__init__()
        self.in_channels = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(Bottleneck, 64, 3)
        self.layer2 = self._make_layer(Bottleneck, 128, 4, stride=2)
        self.layer3 = self._make_layer(Bottleneck, 256, 6, stride=2)
        self.layer4 = self._make_layer(Bottleneck, 512, 3, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * Bottleneck.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None

        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion)
            )

        layers = [block(self.in_channels, out_channels, stride, downsample)]
        self.in_channels = out_channels * block.expansion

        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)

        x = self.layer1(x)   # 256
        x = self.layer2(x)   # 512
        x = self.layer3(x)   # 1024
        x = self.layer4(x)   # 2048

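        # Adaptive average pooling collapses (B, 2048, 7, 7) to (B, 2048, 1, 1)
        # before the fully connected classifier.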
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)

# Test
if __name__ == "__main__":
    model = ResNet50(num_classes=10)
    x = torch.randn(2, 3, 224, 224)
    out = model(x)
    print("ResNet-50 output:", out.shape)  # (2, 10)
--------------------------------------------------------------------------------
/architectures/classification/Resnet/Resnet-18.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F

# ----------------------------
# 1. Basic Residual Block
# ----------------------------
class BasicBlock(nn.Module):
    expansion = 1  # Used to compute output channels in the ResNet class

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        """
        A BasicBlock has two 3x3 convolutions and a skip connection.
        If input and output dimensions differ, a downsample layer is applied.
        """
        super(BasicBlock, self).__init__()

        # First convolution layer
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)

        # Second convolution layer
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.downsample = downsample  # To match dimensions if needed
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x  # Save input for skip connection

        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))

        if self.downsample:
            identity = self.downsample(x)  # Match input shape to output shape

        out += identity  # Add skip connection
        out = self.relu(out)
        return out

# ----------------------------
# 2. ResNet-18 Model
# ----------------------------
class ResNet18(nn.Module):
    def __init__(self, num_classes=1000):
        """
        ResNet-18 has 4 stages with [2, 2, 2, 2] BasicBlocks.
        """
        super(ResNet18, self).__init__()
        self.in_channels = 64  # Initial channel count after first conv

        # Initial Conv Layer (stem)
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Four residual stages
        self.layer1 = self._make_layer(BasicBlock, 64, 2)
        self.layer2 = self._make_layer(BasicBlock, 128, 2, stride=2)
        self.layer3 = self._make_layer(BasicBlock, 256, 2, stride=2)
        self.layer4 = self._make_layer(BasicBlock, 512, 2, stride=2)

        # Global Average Pool and FC layer
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # Output size (1x1)
        self.fc = nn.Linear(512 * BasicBlock.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        """
        Creates a stage with multiple residual blocks.
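        The first block of a stage may downsample (stride 2 plus a 1x1 conv on
        the skip path) to halve the spatial size; the remaining blocks keep the
        shape unchanged.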
75 | """ 76 | downsample = None 77 | 78 | if stride != 1 or self.in_channels != out_channels * block.expansion: 79 | # Use 1x1 conv to match dimensions when needed 80 | downsample = nn.Sequential( 81 | nn.Conv2d(self.in_channels, out_channels * block.expansion, 82 | kernel_size=1, stride=stride, bias=False), 83 | nn.BatchNorm2d(out_channels * block.expansion) 84 | ) 85 | 86 | layers = [] 87 | # First block may have downsampling 88 | layers.append(block(self.in_channels, out_channels, stride, downsample)) 89 | self.in_channels = out_channels * block.expansion 90 | 91 | # Remaining blocks (no downsampling) 92 | for _ in range(1, blocks): 93 | layers.append(block(self.in_channels, out_channels)) 94 | 95 | return nn.Sequential(*layers) 96 | 97 | def forward(self, x): 98 | x = self.relu(self.bn1(self.conv1(x))) 99 | x = self.maxpool(x) 100 | 101 | x = self.layer1(x) # 64 102 | x = self.layer2(x) # 128 103 | x = self.layer3(x) # 256 104 | x = self.layer4(x) # 512 105 | 106 | x = self.avgpool(x) # (B, 512, 1, 1) 107 | x = torch.flatten(x, 1) # (B, 512) 108 | x = self.fc(x) # Final output logits 109 | return x 110 | 111 | # ---------------------------- 112 | # 3. Test the Model 113 | # ---------------------------- 114 | if __name__ == "__main__": 115 | model = ResNet18(num_classes=10) 116 | print(model) 117 | 118 | x = torch.randn(2, 3, 224, 224) # Dummy input 119 | out = model(x) 120 | print("Output shape:", out.shape) # (2, 10) 121 | -------------------------------------------------------------------------------- /architectures/classification/Resnet/readme.md: -------------------------------------------------------------------------------- 1 | 2 | # 📚 **ResNet Implementations for Computer Vision** 3 | 4 | Welcome to the ResNet implementation repository! Here, you’ll find various ResNet architectures implemented from scratch in **PyTorch**, including **ResNet-18**, **ResNet-34**, **ResNet-50**, **ResNet-101**, and **ResNet-152**. These are some of the most influential architectures in the field of **deep learning** and **computer vision**. 5 | 6 | This repository includes: 7 | - **Implementation of each ResNet model**. 8 | - Explanations of how ResNet works. 9 | - Use cases of ResNet and its variants. 10 | - Links to related research papers and further reading. 11 | 12 | --- 13 | 14 | ## 💡 **Overview of ResNet** 15 | 16 | **ResNet** (Residual Networks) was introduced in the paper "Deep Residual Learning for Image Recognition" by **Kaiming He et al.** in 2015. ResNet's key innovation is the introduction of **residual learning**, which allows training of very deep networks by avoiding the vanishing gradient problem through **skip connections**. 17 | 18 | The key idea is to learn the residual (difference) between input and output, instead of the direct mapping, which makes the training of deeper networks more efficient and effective. 19 | 20 | --- 21 | 22 | ## 🚀 **ResNet Architecture** 23 | 24 | ### **Basic Components of ResNet:** 25 | 1. **Residual Blocks**: Each block consists of two or more convolutional layers, where the input is directly added to the output of the convolution, bypassing one or more layers. 26 | 2. **Skip Connections**: These connections allow gradients to flow directly across layers, making deep networks trainable. 27 | 3. **Bottleneck Blocks (for deeper ResNets)**: These blocks are used in architectures like ResNet-50, ResNet-101, and ResNet-152, where the input is compressed and expanded to reduce computation and parameters. 

### **ResNet Variants**:
- **ResNet-18**: The smallest variant, with 18 layers. Ideal for quick experiments and small datasets.
- **ResNet-34**: Slightly larger, with 34 layers, for better performance on standard datasets.
- **ResNet-50**: Uses **Bottleneck blocks**. Suitable for more complex problems.
- **ResNet-101**: A deeper ResNet with 101 layers. Best for highly complex problems.
- **ResNet-152**: The deepest standard variant, with 152 layers, used for extremely complex problems and large datasets.

---

## 🏆 **Use Cases of ResNet**

ResNet is widely used across many **computer vision** tasks, including but not limited to:
- **Image Classification**: Classifying objects in images, typically using datasets like **ImageNet**.
- **Object Detection**: Using pre-trained ResNet networks as backbone feature extractors in detectors such as **Faster R-CNN** and **YOLO**.
- **Image Segmentation**: Using ResNet as the encoder in architectures such as **U-Net** and **DeepLabv3** for tasks such as medical image analysis.
- **Facial Recognition**: ResNet is used in face recognition systems because of its robust feature extraction.
- **Transfer Learning**: Pre-trained ResNet models are frequently fine-tuned for specific tasks such as image classification or object detection.

---

## 🔬 **Sources for Study**

### **Research Papers**:
- **[Deep Residual Learning for Image Recognition (ResNet paper)](https://arxiv.org/abs/1512.03385)**: The foundational paper introducing ResNet and the idea of residual learning.
- **[Identity Mappings in Deep Residual Networks (pre-activation ResNet)](https://arxiv.org/abs/1603.05027)**: Explores identity mappings in ResNet architectures for even deeper networks.
- **[Bag of Tricks for Image Classification with Convolutional Neural Networks](https://arxiv.org/abs/1812.01187)**: Practical tips for improving ResNet models on image classification tasks.

### **Books and Tutorials**:
- **[Deep Learning with Python by François Chollet](https://www.manning.com/books/deep-learning-with-python)**: A great resource for learning deep learning concepts and their applications using Keras and TensorFlow.
- **[Stanford CS231n - Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/)**: A comprehensive, in-depth course on computer vision and neural networks.
- **[PyTorch Documentation](https://pytorch.org/docs/stable/)**: Official documentation for PyTorch; essential for understanding and implementing neural networks.

---

## 🔗 **References**

1. **[Original ResNet Paper: Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)**
2. **[PyTorch Official Documentation](https://pytorch.org/docs/stable/)**
3. **[Stanford CS231n Course: Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/)**

---

## 🌟 **Acknowledgments**

Special thanks to **Kaiming He et al.** for their groundbreaking work on ResNet. This repository builds on that work to explore deep learning techniques in computer vision.
--------------------------------------------------------------------------------