├── Data
│   ├── All Gender (1).png
│   ├── Cleaned_data_all_gender.png
│   ├── Noise in face data.png
│   ├── Only man.png
│   ├── Only women.png
│   ├── Readme.md
│   ├── Time_taken and_count.png
│   └── Unprocessed data.png
├── LICENSE
└── README.md

/Data/All Gender (1).png: https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/All Gender (1).png

/Data/Cleaned_data_all_gender.png: https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/Cleaned_data_all_gender.png

/Data/Noise in face data.png: https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/Noise in face data.png

/Data/Only man.png: https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/Only man.png

/Data/Only women.png: https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/Only women.png

/Data/Readme.md:

## These are images of actors taken from the internet for educational purposes only.
## The noisy images still contain some features of a face. They were cropped out by an MTCNN face detector at a size of 300 × 300.

/Data/Time_taken and_count.png: https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/Time_taken and_count.png

/Data/Unprocessed data.png: https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/Unprocessed data.png

/LICENSE:

MIT License

Copyright (c) 2023 Nelson Joseph

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

/README.md:
# Gender Based Cleaning Algorithm

![Python](https://img.shields.io/badge/python-v3.6+-blue.svg)
[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/nelson123-lab/Gender_based_cleaning_algorithm/issues)
[![LinkedIn](https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=255)](https://www.linkedin.com/in/nelsonjoseph123/)
[![Youtube](https://img.shields.io/badge/-Youtube-black.svg?style=flat-square&logo=Youtube&colorB=900)](https://www.youtube.com/channel/UCj-j1k_3vC6F1rVgrEhDF7g)
[![Medium](https://img.shields.io/badge/-Medium-black.svg?style=flat-square&logo=Medium&colorB=000)](https://medium.com/me/stories/public)

This algorithm is mainly used for data cleansing. It determines the gender of the person in an image by analyzing their face, and it deletes the image if no face can be detected. The deletion condition can be changed to suit different requirements.

## General Script

```python
import os
import time

import cv2
import matplotlib.pyplot as plt  # Only needed to display images (see the commented lines below).
from deepface import DeepFace  # Pretrained models from the DeepFace library.
from tqdm import tqdm  # Shows a progress bar for the loop.

start = time.time()

dire = r"Location of folder in which all the files are present"
for name in tqdm(os.listdir(dire)):
    path = os.path.join(dire, name)
    try:
        img = cv2.imread(path)
        # plt.imshow(img[:, :, ::-1])  # OpenCV loads BGR; reverse the channels to display RGB.
        # plt.show()  # Display the image if required.
        # Assumes an older DeepFace release where analyze() returns a dict with a string 'gender'.
        result = DeepFace.analyze(img, actions=['gender'])
        # print("Gender: ", result['gender'])
        if result['gender'] not in ("Man", "Woman"):  # Change this condition for custom use.
            os.remove(path)
    except ValueError:  # Raised by DeepFace when no face can be detected.
        os.remove(path)
print("All is done.")  # Signals that processing has finished.

end = time.time()
print(f"Runtime of the program is {end - start}")  # Total execution time in seconds.
```

## Use Cases

## 1. To eliminate noisy photos and only keep images with human faces.
The folder combines multiple photographs collected from the internet. They show people of various genders, and some of the images are noisy or corrupt. The script can remove these noisy photos.

### Noise in Face data

The noisy images shown here are not arbitrary snapshots: each one depicts some of the features of a face. They are crops produced by an MTCNN face detector.

### Implementation

Only one line of the general script needs to change:

```python
if result['gender'] != "Man" and result['gender'] != "Woman":  # Replace the condition in the general script with this line.
    os.remove(path)
```

After running the script, we obtain the following results.
The only photographs left are those with human faces.
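The condition above compares `result['gender']` with a string, which assumes the `DeepFace.analyze` return shape of older DeepFace releases (a single dict whose `'gender'` value is the label). Recent releases instead return a list of dicts, one per detected face, where `'gender'` holds per-class scores and the label is stored under `'dominant_gender'`. The helper below is a minimal sketch, assuming those two shapes; `dominant_gender` is a hypothetical name, not part of DeepFace.

```python
from typing import Any, Dict, List, Union

# Assumption: older DeepFace releases return a single dict whose "gender"
# value is the label string; newer releases return a list of dicts (one per
# detected face) where "gender" holds per-class scores and the label is
# stored under "dominant_gender".
AnalyzeResult = Union[Dict[str, Any], List[Dict[str, Any]]]


def dominant_gender(result: AnalyzeResult) -> str:
    """Return the predicted label ("Man" or "Woman") from either result shape."""
    if isinstance(result, list):
        if not result:
            raise ValueError("analysis result contains no faces")
        result = result[0]  # Simplification: only the first detected face is used.
    if "dominant_gender" in result:
        return result["dominant_gender"]
    return result["gender"]  # Older shape: the label is stored directly.


# Hand-written examples of both shapes (not real model output):
old_style = {"gender": "Man"}
new_style = [{"gender": {"Man": 97.1, "Woman": 2.9}, "dominant_gender": "Man"}]
print(dominant_gender(old_style), dominant_gender(new_style))  # Man Man
```

In the general script, the condition would then read `if dominant_gender(result) not in ("Man", "Woman"):`. Raising `ValueError` for an empty result keeps the existing `except ValueError` branch responsible for deleting images without a face.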
A progress bar shows the cleaning status. When the loop finishes, the script prints "All is done." followed by the total execution time.

## 2. To determine how many photos contain human faces.

This uses the same directory as above. To count the photos that contain human faces, add a `count` variable and increment it instead of deleting files:

```python
import os

import cv2
from deepface import DeepFace
from tqdm import tqdm

dire = r"Location of folder in which all the files are present"
count = 0  # Number of images in which a face was found.
for name in tqdm(os.listdir(dire)):
    path = os.path.join(dire, name)
    try:
        img = cv2.imread(path)
        result = DeepFace.analyze(img, actions=['gender'])
        if result['gender'] == "Man" or result['gender'] == "Woman":
            count += 1  # A face was found in this image.
    except ValueError:  # No face detected: skip the image without deleting it.
        pass
print("No of human faces =", count)
```

The output is:

```text
No of human faces = 9
```

## 3. To only save pictures with male faces.

### Implementation

Only one line of the general script needs to change:

```python
if result['gender'] != "Man":  # Replace the condition in the general script with this line.
    os.remove(path)
```

After executing the script, the folder contains only photographs of men; all other images are deleted.
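The use cases differ only in which images get deleted. One way to make that explicit, and to preview deletions before any file is removed, is to pass the keep/delete decision in as a function. This is a minimal sketch, not the author's script: `clean_folder`, `keep_only`, `fake_analyze` and the `dry_run` flag are hypothetical names, and the stand-in analyzer replaces the DeepFace call so the example runs without a model.

```python
import os
import tempfile
from typing import Callable, List


def clean_folder(
    folder: str,
    keep: Callable[[str], bool],
    analyze: Callable[[str], str],
    dry_run: bool = True,
) -> List[str]:
    """Return the images that fail `keep`; delete them only if dry_run is False.

    `analyze` maps an image path to a gender label and raises ValueError when
    no face is found, mirroring how the scripts above use DeepFace.
    """
    to_remove = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        try:
            label = analyze(path)
        except ValueError:
            to_remove.append(path)  # No detectable face: treat as noise.
            continue
        if not keep(label):
            to_remove.append(path)
    if not dry_run:
        for path in to_remove:
            os.remove(path)
    return to_remove


def keep_only(gender: str) -> Callable[[str], bool]:
    """Predicate for a single-gender folder, e.g. keep_only("Man")."""
    return lambda label: label == gender


# With DeepFace installed, `analyze` could wrap the call used in the scripts
# above, for example (hypothetical adapter, not run here):
#     lambda path: DeepFace.analyze(cv2.imread(path), actions=["gender"])["gender"]

# Demo with a stand-in analyzer and hypothetical labels instead of DeepFace.
fake_labels = {"actor1.png": "Man", "actor2.png": "Woman", "noise.png": None}


def fake_analyze(path: str) -> str:
    label = fake_labels[os.path.basename(path)]
    if label is None:
        raise ValueError("Face could not be detected")
    return label


demo_dir = tempfile.mkdtemp()
for filename in fake_labels:
    open(os.path.join(demo_dir, filename), "w").close()  # Empty placeholder files.

would_remove = clean_folder(demo_dir, keep_only("Man"), fake_analyze)
print(sorted(os.path.basename(p) for p in would_remove))  # ['actor2.png', 'noise.png']
```

Collecting the paths first and deleting them only at the end means a dry run and a real run select exactly the same files; calling `clean_folder(..., dry_run=False)` then performs the deletion.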

## 4. To only save pictures of women's faces.

### Implementation

Only one line of the general script needs to change:

```python
if result['gender'] != "Woman":  # Replace the condition in the general script with this line.
    os.remove(path)
```

After executing the script, the folder contains only photographs of women; all other images are deleted.
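Use cases 2 to 4 each make a full pass over the folder to answer one question. When the goal is only to inspect the data before deleting anything, a single read-only pass can count every outcome at once. This is a minimal sketch under that assumption: `tally_genders` and `fake_analyze` are hypothetical names, and the stand-in analyzer replaces the DeepFace call so the example runs without a model.

```python
from collections import Counter
from typing import Callable, Iterable


def tally_genders(paths: Iterable[str], analyze: Callable[[str], str]) -> Counter:
    """Count images per predicted label, using "no face" for failed detections.

    `analyze` stands in for the DeepFace call in the scripts above: it returns
    a gender label or raises ValueError when no face is found.
    """
    counts = Counter()
    for path in paths:
        try:
            counts[analyze(path)] += 1
        except ValueError:
            counts["no face"] += 1
    return counts


# Demo with hypothetical labels instead of real images.
labels = {"a.png": "Man", "b.png": "Woman", "c.png": "Man", "d.png": None}


def fake_analyze(path: str) -> str:
    if labels[path] is None:
        raise ValueError("Face could not be detected")
    return labels[path]


counts = tally_genders(labels, fake_analyze)
print(counts["Man"], counts["Woman"], counts["no face"])  # 2 1 1
faces = counts["Man"] + counts["Woman"]  # The number that use case 2 reports.
print("No of human faces =", faces)  # No of human faces = 3
```

Nothing is deleted here, so the tally can be used to check how many images each of the cleaning variants would keep before running one of them.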
# Dependency Installation

The required libraries can be installed from [PyPI](https://pypi.org/). Installing each library with pip also installs its own dependencies.

```bash
pip install deepface
```
- DeepFace is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for Python. It is a hybrid face recognition framework wrapping state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, ArcFace and Dlib. The library is mainly powered by TensorFlow and Keras. Experiments show that humans achieve 97.53% accuracy on facial recognition tasks, and these models have already reached and surpassed that level.

```bash
pip install tqdm
```
- tqdm makes loops show a smart progress meter: just wrap any iterable with `tqdm(iterable)`.

```bash
pip install opencv-python
```
- OpenCV (Open Source Computer Vision Library: http://opencv.org) is an open-source library that includes several hundred computer vision algorithms.

```bash
pip install matplotlib
```
- Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

- The `time` and `os` modules are part of Python's standard library, so they do not need to be installed.

After installation, the libraries can be imported and used.

## Contribution

Pull requests are welcome.