├── README.md
└── main.py
/README.md:
--------------------------------------------------------------------------------
# Set up a dataset from scratch

## Introduction
It is true that there are many ready-made tools for setting up datasets, such as torchvision's `ImageFolder`, which is commonly used in computer vision tasks. In the real world, however, the data you encounter **may not always be in a format** that these tools can use directly.

In that case, you need to build your **own dataset** and preprocess your data accordingly.

## How to use my code
With my code, you can:
* Load a dataset stored in your own folder
* Customize the preprocessing steps to the specific needs of your task

The code expects one folder per split (`train` and `test`), each containing one subfolder per category, for example `train/<category>/<image>`.

## Requirements
* Python
* PyTorch (`torch`)
* Pillow (imported as `PIL`)
* torchvision

`os` is part of the Python standard library, so it needs no separate installation.

I explain this in more detail in my [Medium article](https://medium.com/@giahuy04/imagefolder-is-enough-to-set-up-your-data-3d9689498bca).

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import os

from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms import Compose, Resize, ToTensor

# File extensions treated as images; other files (e.g. .DS_Store) are skipped.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".gif", ".webp")


class Your_Dataset(Dataset):
    """Loads images stored as <data_path>/<train|test>/<category>/<image file>."""

    def __init__(self, root, train=True, transform=None):
        self.image_paths = []
        self.labels = []
        self.categories = []  # add your categories here, e.g. ["cat", "dog"]
        self.transform = transform

        # If your data sits in a subfolder of root, join it here,
        # e.g. os.path.join(root, "my_data").
        data_path = root

        if train:
            data_path = os.path.join(data_path, "train")
        else:
            data_path = os.path.join(data_path, "test")

        # If no categories were listed above, fall back to the subfolder names,
        # sorted so that label indices are the same on every run.
        if not self.categories:
            self.categories = sorted(
                d for d in os.listdir(data_path)
                if os.path.isdir(os.path.join(data_path, d))
            )

        # Each image's label is the index of its category in self.categories.
        for label, category in enumerate(self.categories):
            category_path = os.path.join(data_path, category)
            for item in sorted(os.listdir(category_path)):
                if item.lower().endswith(IMG_EXTENSIONS):
                    self.image_paths.append(os.path.join(category_path, item))
                    self.labels.append(label)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        image_path = self.image_paths[index]
        # Open inside a context manager so the file handle is closed promptly.
        with Image.open(image_path) as img:
            image = img.convert("RGB")
        label = self.labels[index]
        if self.transform:
            image = self.transform(image)
        return image, label


if __name__ == "__main__":
    transform = Compose([
        Resize((224, 224)),  # you can change your image size
        ToTensor(),
    ])
    # Replace the placeholder with your dataset root path.
    dataset = Your_Dataset(root="path/to/your/dataset", train=True, transform=transform)
    print(f"{len(dataset)} images in {len(dataset.categories)} categories")

--------------------------------------------------------------------------------