Setup dataset from scratch

├── README.md
└── main.py


/README.md:
--------------------------------------------------------------------------------
<p align="center">
 <h1 align="center">Setup dataset from scratch</h1>
</p>

## Introduction
There are many ready-made tools for setting up datasets, such as torchvision's `ImageFolder`, which is commonly used in computer vision tasks. In the real world, however, the data you encounter `may not always be in a format` that these tools can use directly.

![z4225377581040_30ee595ce02040fce81c762705edfe2e](https://user-images.githubusercontent.com/122800932/229079504-bade2767-3acb-40c9-9e11-d44a117188af.jpg)

In that case, you need to build your `own dataset and preprocess your data accordingly`.

## How to use my code
With my code, you can:
* Load your own dataset from a folder path of your choice
* Customize the preprocessing steps based on the specific needs of your task

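Judging from `main.py`, the loader walks `<split>/<category>/<file>` and uses the position of each category in the list as its label. The sketch below reproduces only that scanning step with the standard library, so it can run without Torch. The helper name `collect_samples`, the class names `cat` and `dog`, and the temporary demo folder are assumptions made for illustration and are not part of the repository.

```python
import os
import tempfile


def collect_samples(split_dir, categories):
    """Return (image_paths, labels) for every file under split_dir/<category>/."""
    image_paths, labels = [], []
    for label, category in enumerate(categories):
        category_dir = os.path.join(split_dir, category)
        # sorted() keeps the sample order reproducible across runs
        for name in sorted(os.listdir(category_dir)):
            image_paths.append(os.path.join(category_dir, name))
            labels.append(label)
    return image_paths, labels


# Build a throwaway layout so the sketch runs anywhere: <tmp>/train/<category>/<file>
demo_root = tempfile.mkdtemp()
demo_categories = ["cat", "dog"]  # hypothetical class names
for category in demo_categories:
    os.makedirs(os.path.join(demo_root, "train", category))
    for i in range(2):
        # Empty placeholder files stand in for real images
        open(os.path.join(demo_root, "train", category, f"{i}.jpg"), "w").close()

paths, labels = collect_samples(os.path.join(demo_root, "train"), demo_categories)
print(labels)  # [0, 0, 1, 1]
```

With real data, you would point `split_dir` at your own `train` or `test` folder and list your own category folder names.
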
## Requirements:
* Python
* Torch
* PIL (installed as the `Pillow` package)
* Torchvision
* os (part of the Python standard library)

I explain this in more detail in my [Medium article](https://medium.com/@giahuy04/imagefolder-is-enough-to-set-up-your-data-3d9689498bca).

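As a quick sanity check before running the code, something like the following can report which of the required third-party packages are importable in the current environment. This is only a sketch: the helper name `check_requirements` is hypothetical, and it checks import names (`torch`, `torchvision`, `PIL`), not installed versions.

```python
import importlib.util


def check_requirements(modules=("torch", "torchvision", "PIL")):
    """Map each module name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_requirements()
for name, available in status.items():
    print(f"{name}: {'found' if available else 'missing'}")
```
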


--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import os

from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms import Compose, Resize, ToTensor


class Your_Dataset(Dataset):
    """Image dataset that reads files from <data folder>/<train|test>/<category>/."""

    def __init__(self, root, train=True, transform=None):
        self.image_paths = []
        self.labels = []
        self.categories = []  # add your categories here (one sub-folder name per class)
        self.transform = transform

        data_path = os.path.join(root, " ")  # add your path folder

        # Select the sub-folder for the requested split.
        if train:
            data_path = os.path.join(data_path, "train")
        else:
            data_path = os.path.join(data_path, "test")

        # The label of each image is the index of its category in self.categories.
        for i, category in enumerate(self.categories):
            data_files = os.path.join(data_path, category)
            # sorted() keeps the sample order reproducible across runs
            for item in sorted(os.listdir(data_files)):
                path = os.path.join(data_files, item)
                self.image_paths.append(path)
                self.labels.append(i)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        image_path = self.image_paths[index]
        # Convert to RGB so grayscale and RGBA files all yield 3-channel images.
        image = Image.open(image_path).convert("RGB")
        label = self.labels[index]
        if self.transform:
            image = self.transform(image)
        return image, label


if __name__ == "__main__":
    transform = Compose([
        Resize((224, 224)),  # you can change your image size
        ToTensor(),
    ])
    dataset = Your_Dataset(root=" ", train=True, transform=transform)  # add your root path



--------------------------------------------------------------------------------