├── README.md
└── main.py

/README.md:
--------------------------------------------------------------------------------
# Set up a dataset from scratch

## Introduction

It is true that there are ready-made tools for loading datasets, such as torchvision's `ImageFolder`, which is commonly used in computer vision tasks. In the real world, however, the data you encounter **may not always be in a format** that these tools can use directly.

![z4225377581040_30ee595ce02040fce81c762705edfe2e](https://user-images.githubusercontent.com/122800932/229079504-bade2767-3acb-40c9-9e11-d44a117188af.jpg)

In that case, you need to build your **own dataset and preprocess your data accordingly**.

## How to use my code

With my code, you can:
* Load a dataset from your own folder path
* Customize the preprocessing steps to the specific needs of your task

## Requirements
* Python
* PyTorch (`torch`)
* torchvision
* Pillow (`PIL`)
* `os` (part of the Python standard library, no installation needed)

I explain this in more detail in my [Medium article](https://medium.com/@giahuy04/imagefolder-is-enough-to-set-up-your-data-3d9689498bca).

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import os

from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms import Compose, Resize, ToTensor


class Your_Dataset(Dataset):
    """Loads images stored as <root>/<folder>/<train|test>/<category>/<image file>."""

    def __init__(self, root, train=True, transform=None):
        self.image_paths = []
        self.labels = []
        self.categories = []  # add your categories here (the names of the class subfolders)
        self.transform = transform

        data_path = os.path.join(root, " ")  # replace " " with your dataset folder name

        # Select the split subfolder
        if train:
            data_path = os.path.join(data_path, "train")
        else:
            data_path = os.path.join(data_path, "test")

        # Each category subfolder holds the images of one class; the category's
        # index in self.categories becomes the integer label of its images.
        for i, category in enumerate(self.categories):
            data_files = os.path.join(data_path, category)
            for item in sorted(os.listdir(data_files)):  # sorted for a deterministic order
                path = os.path.join(data_files, item)
                self.image_paths.append(path)
                self.labels.append(i)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        image_path = self.image_paths[index]
        # Convert to RGB so grayscale, palette and RGBA images all end up with 3 channels
        image = Image.open(image_path).convert("RGB")
        label = self.labels[index]
        if self.transform is not None:
            image = self.transform(image)
        return image, label


if __name__ == "__main__":
    transform = Compose([
        Resize((224, 224)),  # you can change your image size
        ToTensor(),
    ])
    dataset = Your_Dataset(root=" ", train=True, transform=transform)  # add your root path

--------------------------------------------------------------------------------
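The `Your_Dataset` class in `main.py` walks a `<split>/<category>/<image file>` layout and relies on a hand-filled `categories` list. The stdlib-only sketch below isolates that path-collection step so it can be checked without images or PyTorch. It adds two assumptions that are not in the original code: when no categories are given, they are discovered from the sorted subfolder names (similar to how torchvision's `ImageFolder` assigns class indices), and files are filtered through a hypothetical `IMAGE_EXTENSIONS` set so stray files are skipped. `collect_samples` is a hypothetical helper name, not part of the repository.

```python
import os
import tempfile

# Hypothetical set of extensions treated as images; adjust it to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_samples(split_dir, categories=None):
    """Return (image_paths, labels, categories) for a <split_dir>/<category>/<file> layout.

    If categories is None (an assumption, not the original behavior), class
    subfolders are discovered and sorted by name, so label i means categories[i].
    """
    if categories is None:
        categories = sorted(
            entry for entry in os.listdir(split_dir)
            if os.path.isdir(os.path.join(split_dir, entry))
        )
    image_paths, labels = [], []
    for label, category in enumerate(categories):
        category_dir = os.path.join(split_dir, category)
        for name in sorted(os.listdir(category_dir)):
            # Skip files such as .DS_Store or notes that PIL could not open
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                image_paths.append(os.path.join(category_dir, name))
                labels.append(label)
    return image_paths, labels, categories


if __name__ == "__main__":
    # Build a tiny throwaway layout: <tmp>/train/{cat,dog}/... with empty files
    with tempfile.TemporaryDirectory() as root:
        layout = {"dog": ["d1.jpg"], "cat": ["c1.png", "c2.jpg", "notes.txt"]}
        for category, files in layout.items():
            os.makedirs(os.path.join(root, "train", category))
            for name in files:
                open(os.path.join(root, "train", category, name), "w").close()

        paths, labels, categories = collect_samples(os.path.join(root, "train"))
        print(categories)  # ['cat', 'dog']
        print(labels)      # [0, 0, 1]
        print([os.path.basename(p) for p in paths])  # ['c1.png', 'c2.jpg', 'd1.jpg']
```

Inside `__init__`, the three returned lists could be assigned to `self.image_paths`, `self.labels` and `self.categories`, leaving `__getitem__` unchanged. Passing an explicit `categories` list keeps the original hand-picked label order instead of the alphabetical one.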