├── cli ├── requirements.txt └── groupimg.py ├── gui ├── requirements.txt └── groupImgGUI.py ├── screenshot-GUI.png ├── .github ├── FUNDING.yml └── ISSUE_TEMPLATE │ ├── feature_request.md │ └── bug_report.md ├── LICENSE ├── README.md └── CODE_OF_CONDUCT.md /cli/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy==1.22.0 2 | tqdm==4.66.3 3 | Pillow==10.3.0 4 | -------------------------------------------------------------------------------- /gui/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy==1.22.0 2 | Pillow==10.3.0 3 | PySide2==5.12.0 4 | -------------------------------------------------------------------------------- /screenshot-GUI.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/victorqribeiro/groupImg/HEAD/screenshot-GUI.png -------------------------------------------------------------------------------- /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | # These are supported funding model platforms 2 | 3 | 4 | patreon: victorqribeiro 5 | 6 | 7 | 8 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | 5 | --- 6 | 7 | **Is your feature request related to a problem? Please describe.** 8 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 9 | 10 | **Describe the solution you'd like** 11 | A clear and concise description of what you want to happen. 12 | 13 | **Describe alternatives you've considered** 14 | A clear and concise description of any alternative solutions or features you've considered. 
15 | 16 | **Additional context** 17 | Add any other context or screenshots about the feature request here. 18 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | 5 | --- 6 | 7 | **Describe the bug** 8 | A clear and concise description of what the bug is. 9 | 10 | **To Reproduce** 11 | Steps to reproduce the behavior: 12 | 1. Go to '...' 13 | 2. Click on '....' 14 | 3. Scroll down to '....' 15 | 4. See error 16 | 17 | **Expected behavior** 18 | A clear and concise description of what you expected to happen. 19 | 20 | **Screenshots** 21 | If applicable, add screenshots to help explain your problem. 22 | 23 | **Desktop (please complete the following information):** 24 | - OS: [e.g. iOS] 25 | - Browser [e.g. chrome, safari] 26 | - Version [e.g. 22] 27 | 28 | **Smartphone (please complete the following information):** 29 | - Device: [e.g. iPhone6] 30 | - OS: [e.g. iOS8.1] 31 | - Browser [e.g. stock browser, safari] 32 | - Version [e.g. 22] 33 | 34 | **Additional context** 35 | Add any other context about the problem here. 
36 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Victor Ribeiro 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # groupImg 2 | 3 | A Python script to organize your images by similarity. 4 | 5 | It uses a [k-means](https://en.wikipedia.org/wiki/K-means_clustering) algorithm to separate them into clusters. 6 | 7 | Watch it working below. 8 | 9 | [![groupImg](http://img.youtube.com/vi/M6ntIaynKCg/0.jpg)](http://www.youtube.com/watch?v=M6ntIaynKCg) 10 | 11 | ## Why ? 
12 | 13 | It had been about a year since I made the switch from Windows to Linux, and I wanted to get better at the terminal. I always thought it was cool to master it. To make a long story short, one day I mounted a Linux image on my external backup drive instead of the pen drive. Some basic */dev/sd\** confusion. Anyway, I overwrote all my photos, so I used [Foremost](https://en.wikipedia.org/wiki/Foremost_(software)) (which is a great tool) to recover them. It recovered 350,000 images: miniatures, textures, profile pictures, wallpapers... and, among all that, my personal photos. So I wrote a little script to divide the images into folders, 1000 images per folder, if I'm not mistaken. I went through all the folders separating my photos from the random images that weren't important. To end the story, I came up with this script to "cluster" the photos by similarity and make things a little easier for me. 14 | 15 | ## How to use 16 | 17 | Navigate to the folder of the version you want: `cli` (Command Line Interface) or `gui` (Graphical User Interface). 18 | 19 | Install the dependencies listed in requirements.txt: 20 | 21 | ``` 22 | pip install -r requirements.txt 23 | ``` 24 | 25 | ### CLI 26 | 27 | Call the script, passing the image folder you want to organize. 28 | 29 | ``` 30 | python groupimg.py -f /home/user/Pictures/ 31 | ``` 32 | 33 | ## Parameters 34 | 35 | \-f folder where your images are (use an absolute path with a trailing slash, as in the example). 36 | ```python groupimg.py -f /home/user/Pictures/``` 37 | 38 | \-k number of folders to separate your images into (default 5). 39 | ```python groupimg.py -f /home/user/Pictures/ -k 5``` 40 | 41 | \-m move your images instead of just copying them. 42 | 43 | \-s make the algorithm consider the size of the images as a feature. 44 | 45 | ### GUI 46 | 47 | ![groupImgGUI](screenshot-GUI.png) 48 | 49 | Just run the groupImgGUI.py file. 50 | 51 | ``` 52 | python groupImgGUI.py 53 | ``` 54 | 55 | Click the Select folder button to choose the folder with the pictures you want to organize. 
56 | 57 | You can adjust the settings by checking the Settings box. 58 | 59 | N. Groups - How many groups the images should be separated into. 60 | 61 | Resample - Size to resample the image to before comparing (smaller sizes give faster results). 62 | 63 | Move - Move the images instead of copying them (useful if you are low on disk space). 64 | 65 | Size - Consider the size of the images when organizing them (useful if you want to separate thumbnails from real pictures). 66 | 67 | [![Donations](https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif)](https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=victorqribeiro%40gmail%2ecom&lc=BR&item_name=Victor%20Ribeiro&item_number=donation&currency_code=USD&bn=PP%2dDonationsBF%3abtn_donateCC_LG%2egif%3aNonHosted) 68 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. 
6 | 7 | ## Our Standards 8 | 9 | Examples of behavior that contributes to creating a positive environment include: 10 | 11 | * Using welcoming and inclusive language 12 | * Being respectful of differing viewpoints and experiences 13 | * Gracefully accepting constructive criticism 14 | * Focusing on what is best for the community 15 | * Showing empathy towards other community members 16 | 17 | Examples of unacceptable behavior by participants include: 18 | 19 | * The use of sexualized language or imagery and unwelcome sexual attention or advances 20 | * Trolling, insulting/derogatory comments, and personal or political attacks 21 | * Public or private harassment 22 | * Publishing others' private information, such as a physical or electronic address, without explicit permission 23 | * Other conduct which could reasonably be considered inappropriate in a professional setting 24 | 25 | ## Our Responsibilities 26 | 27 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. 28 | 29 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 30 | 31 | ## Scope 32 | 33 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. 
34 | 35 | ## Enforcement 36 | 37 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at victorqribeiro@gmail.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. 38 | 39 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 40 | 41 | ## Attribution 42 | 43 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version] 44 | 45 | [homepage]: http://contributor-covenant.org 46 | [version]: http://contributor-covenant.org/version/1/4/ 47 | -------------------------------------------------------------------------------- /cli/groupimg.py: -------------------------------------------------------------------------------- 1 | import os 2 | import shutil 3 | import glob 4 | import math 5 | import argparse 6 | import warnings 7 | import numpy as np 8 | from PIL import Image 9 | from tqdm import tqdm 10 | from multiprocessing.dummy import Pool as ThreadPool 11 | from multiprocessing import cpu_count 12 | 13 | Image.MAX_IMAGE_PIXELS = None 14 | warnings.simplefilter('ignore') 15 | 16 | class K_means: 17 | 18 | def __init__(self, k=3, size=False, resample=32): 19 | self.k = k 20 | self.cluster = [] 21 | self.data = [] 22 | self.end = [] 23 | self.i = 0 24 | self.size = size 25 | self.resample = resample 26 | 27 | def manhattan_distance(self,x1,x2): 28 | s = 0.0 29 | for i in range(len(x1)): 30 | s += abs( float(x1[i]) - float(x2[i]) ) 31 | return s 32 | 33 | def euclidian_distance(self,x1,x2): 34 | s = 0.0 35 | for i in 
range(len(x1)): 36 | s += (float(x1[i]) - float(x2[i])) ** 2 # accumulate squared differences 37 | return math.sqrt(s) # square root of the sum, not of each term 38 | 39 | def read_image(self,im): 40 | if self.i >= self.k : 41 | self.i = 0 42 | try: 43 | img = Image.open(im) 44 | osize = img.size 45 | img.thumbnail((self.resample,self.resample)) 46 | v = [float(p)/float(img.size[0]*img.size[1])*100 for p in np.histogram(np.asarray(img))[0]] 47 | if self.size : 48 | v += [osize[0], osize[1]] 49 | pbar.update(1) 50 | i = self.i 51 | self.i += 1 52 | return [i, v, im] 53 | except Exception as e: 54 | print("Error reading ",im,e) 55 | return [None, None, None] 56 | 57 | 58 | def generate_k_means(self): 59 | final_mean = [] 60 | for c in range(self.k): 61 | partial_mean = [] 62 | for i in range(len(self.data[0])): 63 | s = 0.0 64 | t = 0 65 | for j in range(len(self.data)): 66 | if self.cluster[j] == c : 67 | s += self.data[j][i] 68 | t += 1 69 | if t != 0 : 70 | partial_mean.append(float(s)/float(t)) 71 | else: 72 | partial_mean.append(float('inf')) 73 | final_mean.append(partial_mean) 74 | return final_mean 75 | 76 | def generate_k_clusters(self,folder): 77 | pool = ThreadPool(cpu_count()) 78 | result = pool.map(self.read_image, folder) 79 | pool.close() 80 | pool.join() 81 | self.cluster = [r[0] for r in result if r[0] is not None] 82 | self.data = [r[1] for r in result if r[1] is not None] 83 | self.end = [r[2] for r in result if r[2] is not None] 84 | 85 | def rearrange_clusters(self): 86 | isover = False 87 | while(not isover): 88 | isover = True 89 | m = self.generate_k_means() 90 | for x in range(len(self.cluster)): 91 | dist = [] 92 | for a in range(self.k): 93 | dist.append( self.manhattan_distance(self.data[x],m[a]) ) 94 | _mindist = dist.index(min(dist)) 95 | if self.cluster[x] != _mindist : 96 | self.cluster[x] = _mindist 97 | isover = False 98 | 99 | ap = argparse.ArgumentParser() 100 | ap.add_argument("-f", "--folder", required=True, help="path to image folder") 101 | ap.add_argument("-k", "--kmeans", type=int, default=5, help="how many 
groups") 102 | ap.add_argument("-r", "--resample", type=int, default=128, help="size to resample the image by") 103 | ap.add_argument("-s", "--size", default=False, action="store_true", help="use size to compare images") 104 | ap.add_argument("-m", "--move", default=False, action="store_true", help="move instead of copy") 105 | args = vars(ap.parse_args()) 106 | types = ('*.jpg', '*.JPG', '*.png', '*.jpeg') 107 | imagePaths = [] 108 | folder = args["folder"] 109 | for files in types : 110 | imagePaths.extend(sorted(glob.glob(os.path.join(folder, files)))) # join folder and pattern so a trailing slash is optional 111 | nimages = len(imagePaths) 112 | nfolders = len(str(args["kmeans"])) # digits needed to zero-pad folder names 113 | if nimages <= 0 : 114 | print("No images found!") 115 | exit() 116 | if args["resample"] < 16 or args["resample"] > 256 : 117 | print("-r should be a value between 16 and 256") 118 | exit() 119 | pbar = tqdm(total=nimages) 120 | k = K_means(args["kmeans"],args["size"],args["resample"]) 121 | k.generate_k_clusters(imagePaths) 122 | k.rearrange_clusters() 123 | for i in range(k.k) : 124 | currentFolder = os.path.join(folder, str(i+1).zfill(nfolders)) 125 | try : 126 | os.makedirs(currentFolder) 127 | except FileExistsError: 128 | print("Folder '" + currentFolder + "' already exists") 129 | except Exception as e: 130 | print("An error occurred creating folder '" + currentFolder + "': " + str(e)) 131 | action = shutil.copy2 132 | if args["move"] : 133 | action = shutil.move 134 | for i in range(len(k.cluster)): 135 | action(k.end[i], os.path.join(folder, str(k.cluster[i]+1).zfill(nfolders)+"/")) 136 | -------------------------------------------------------------------------------- /gui/groupImgGUI.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import shutil 4 | import glob 5 | import math 6 | import warnings 7 | import numpy as np 8 | from PIL import Image 9 | from multiprocessing.dummy import Pool as ThreadPool 10 | from multiprocessing import cpu_count 11 | from 
PySide2.QtCore import * 12 | from PySide2.QtGui import * 13 | from PySide2.QtWidgets import * 14 | 15 | Image.MAX_IMAGE_PIXELS = None 16 | warnings.simplefilter('ignore') 17 | 18 | class K_means: 19 | 20 | def __init__(self, k=3, size=False, resample=128): 21 | self.k = k 22 | self.cluster = [] 23 | self.data = [] 24 | self.end = [] 25 | self.i = 0 26 | self.size = size 27 | self.resample = resample 28 | 29 | def manhattan_distance(self,x1,x2): 30 | s = 0.0 31 | for i in range(len(x1)): 32 | s += abs( float(x1[i]) - float(x2[i]) ) 33 | return s 34 | 35 | def euclidian_distance(self,x1,x2): 36 | s = 0.0 37 | for i in range(len(x1)): 38 | s += (float(x1[i]) - float(x2[i])) ** 2 # accumulate squared differences 39 | return math.sqrt(s) # square root of the sum, not of each term 40 | 41 | def read_image(self,im): 42 | if self.i >= self.k : 43 | self.i = 0 44 | try: 45 | img = Image.open(im) 46 | osize = img.size 47 | img.thumbnail((self.resample,self.resample)) 48 | v = [float(p)/float(img.size[0]*img.size[1])*100 for p in np.histogram(np.asarray(img))[0]] 49 | if self.size : 50 | v += [osize[0], osize[1]] 51 | i = self.i 52 | self.i += 1 53 | return [i, v, im] 54 | except Exception as e: 55 | print("Error reading ",im,e) 56 | return [None, None, None] 57 | 58 | 59 | def generate_k_means(self): 60 | final_mean = [] 61 | for c in range(self.k): 62 | partial_mean = [] 63 | for i in range(len(self.data[0])): 64 | s = 0.0 65 | t = 0 66 | for j in range(len(self.data)): 67 | if self.cluster[j] == c : 68 | s += self.data[j][i] 69 | t += 1 70 | if t != 0 : 71 | partial_mean.append(float(s)/float(t)) 72 | else: 73 | partial_mean.append(float('inf')) 74 | final_mean.append(partial_mean) 75 | return final_mean 76 | 77 | def generate_k_clusters(self,folder): 78 | pool = ThreadPool(cpu_count()) 79 | result = pool.map(self.read_image, folder) 80 | pool.close() 81 | pool.join() 82 | self.cluster = [r[0] for r in result if r[0] is not None] 83 | self.data = [r[1] for r in result if r[1] is not None] 84 | self.end = [r[2] for r in result if r[2] is not None] 85 | 86 | 
def rearrange_clusters(self): 87 | isover = False 88 | while(not isover): 89 | isover = True 90 | m = self.generate_k_means() 91 | for x in range(len(self.cluster)): 92 | dist = [] 93 | for a in range(self.k): 94 | dist.append( self.manhattan_distance(self.data[x],m[a]) ) 95 | _mindist = dist.index(min(dist)) 96 | if self.cluster[x] != _mindist : 97 | self.cluster[x] = _mindist 98 | isover = False 99 | 100 | class groupImgGUI(QWidget) : 101 | 102 | def __init__(self, parent = None) : 103 | super(groupImgGUI, self).__init__(parent) 104 | self.dir = None 105 | self.progressValue = 0 106 | self.createSettings() 107 | layout = QVBoxLayout() 108 | self.btn = QPushButton("Select folder") 109 | self.btn.clicked.connect(self.selectFolder) 110 | self.check = QCheckBox("Settings") 111 | self.check.stateChanged.connect(self.state); 112 | self.runbtn = QPushButton("Run") 113 | self.runbtn.clicked.connect(self.run) 114 | self.progress = QProgressBar(self) 115 | self.progress.hide() 116 | layout.addWidget(self.btn) 117 | layout.addWidget(self.check) 118 | layout.addWidget(self.formGroupBox) 119 | layout.addWidget(self.progress) 120 | layout.addWidget(self.runbtn) 121 | self.setMinimumSize(300,300) 122 | self.setLayout(layout) 123 | self.setWindowTitle("groupImg - GUI") 124 | 125 | def createSettings(self) : 126 | self.formGroupBox = QGroupBox("Settings") 127 | layout = QFormLayout() 128 | self.kmeans = QSpinBox() 129 | self.kmeans.setRange(3,15) 130 | self.kmeans.setValue(3) 131 | self.sample = QSpinBox() 132 | self.sample.setRange(32, 256) 133 | self.sample.setValue(128) 134 | self.sample.setSingleStep(2) 135 | self.move = QCheckBox() 136 | self.size = QCheckBox() 137 | layout.addRow(QLabel("N. 
Groups:"), self.kmeans) 138 | layout.addRow(QLabel("Resample:"), self.sample) 139 | layout.addRow(QLabel("Move:"), self.move) 140 | layout.addRow(QLabel("Size:"), self.size) 141 | self.formGroupBox.hide() 142 | self.formGroupBox.setLayout(layout) 143 | 144 | def selectFolder(self) : 145 | QFileDialog.FileMode(QFileDialog.Directory) 146 | self.dir = QFileDialog.getExistingDirectory(self) 147 | self.btn.setText(self.dir or "Select folder") 148 | 149 | def state(self) : 150 | if self.check.isChecked() : 151 | self.formGroupBox.show() 152 | else: 153 | self.formGroupBox.hide() 154 | 155 | def disableButton(self) : 156 | self.runbtn.setText("Working...") 157 | self.runbtn.setEnabled(False) 158 | 159 | def enableButton(self) : 160 | self.runbtn.setText("Run") 161 | self.runbtn.setEnabled(True) 162 | 163 | def run(self) : 164 | self.disableButton() 165 | types = ('*.jpg', '*.JPG', '*.png', '*.jpeg') 166 | imagePaths = [] 167 | folder = self.dir 168 | if not folder.endswith("/") : 169 | folder+="/" 170 | for files in types : 171 | imagePaths.extend(sorted(glob.glob(folder+files))) 172 | nimages = len(imagePaths) 173 | nfolders = int(math.log(self.kmeans.value(), 10))+1 174 | if nimages <= 0 : 175 | QMessageBox.warning(self, "Error", 'No images found!') 176 | self.enableButton() 177 | return 178 | k = K_means(self.kmeans.value(),self.size.isChecked(),self.sample.value()) 179 | k.generate_k_clusters(imagePaths) 180 | k.rearrange_clusters() 181 | for i in range(k.k) : 182 | try : 183 | os.makedirs(folder+str(i+1).zfill(nfolders)) 184 | except Exception as e : 185 | print("Folder already exists", e) 186 | action = shutil.copy 187 | if self.move.isChecked() : 188 | action = shutil.move 189 | for i in range(len(k.cluster)): 190 | action(k.end[i], folder+"/"+str(k.cluster[i]+1).zfill(nfolders)+"/") 191 | QMessageBox.information(self, "Done", 'Done!') 192 | self.enableButton() 193 | 194 | def main(): 195 | app = QApplication(sys.argv) 196 | groupimg = groupImgGUI() 197 | 
groupimg.show() 198 | sys.exit(app.exec_()) 199 | 200 | if __name__ == '__main__': 201 | main() 202 | --------------------------------------------------------------------------------
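The clustering loop shared by `cli/groupimg.py` and `gui/groupImgGUI.py` can be hard to follow in the listings above, so here is a minimal standalone sketch of the same idea using only the standard library. It is an illustration under assumptions, not the project's code: the function names `manhattan`, `euclidean`, `centroids` and `assign` are hypothetical, the feature vectors are plain lists rather than image histograms, and the initial labels are assigned strictly round-robin (the scripts assign them from a shared counter inside a thread pool, so their starting order may differ).

```python
# Illustrative sketch only: these helper names are hypothetical and do not
# exist in groupimg.py or groupImgGUI.py.
import math


def manhattan(a, b):
    """Sum of absolute per-component differences (the metric the scripts use)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroids(data, labels, k):
    """Mean vector per cluster; an empty cluster gets an all-inf centroid."""
    dims = len(data[0])
    result = []
    for c in range(k):
        members = [v for v, lab in zip(data, labels) if lab == c]
        if members:
            result.append([sum(v[d] for v in members) / len(members) for d in range(dims)])
        else:
            result.append([float("inf")] * dims)
    return result


def assign(data, k, distance=manhattan):
    """Start from round-robin labels, then reassign until no label changes."""
    labels = [i % k for i in range(len(data))]
    changed = True
    while changed:
        changed = False
        means = centroids(data, labels, k)  # recomputed once per pass
        for i, v in enumerate(data):
            dists = [distance(v, m) for m in means]
            best = dists.index(min(dists))
            if labels[i] != best:
                labels[i] = best
                changed = True
    return labels


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    print(assign(points, 2))
```

Passing `distance=euclidean` to `assign` swaps the metric without touching the loop, which may be a convenient way to compare the two distances on a small set of vectors before changing the scripts themselves.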