├── Data
    ├── All Gender (1).png
    ├── Cleaned_data_all_gender.png
    ├── Noise in face data.png
    ├── Only man.png
    ├── Only women.png
    ├── Readme.md
    ├── Time_taken and_count.png
    └── Unprocessed data.png
├── LICENSE
└── README.md


/Data/All Gender (1).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/All Gender (1).png


--------------------------------------------------------------------------------
/Data/Cleaned_data_all_gender.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/Cleaned_data_all_gender.png


--------------------------------------------------------------------------------
/Data/Noise in face data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/Noise in face data.png


--------------------------------------------------------------------------------
/Data/Only man.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/Only man.png


--------------------------------------------------------------------------------
/Data/Only women.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/Only women.png


--------------------------------------------------------------------------------
/Data/Readme.md:
--------------------------------------------------------------------------------
1 | ## These are images of actors taken from the internet for educational purposes only.
2 | ## The noisy images are ones that actually contain face-like features. They were cropped out by an MTCNN model at a size of 300 × 300.
3 | 


--------------------------------------------------------------------------------
/Data/Time_taken and_count.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/Time_taken and_count.png


--------------------------------------------------------------------------------
/Data/Unprocessed data.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nelson123-lab/Gender_based_cleaning_algorithm/396732ff42f88d284f40b422da017fe07912a2ad/Data/Unprocessed data.png


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2023 Nelson Joseph
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | <div align="center"><img src="https://github.com/nelson123-lab/Gender_based_cleaning_algorithm/blob/74da94d77c0da72f5769494e3df7320510bcbc7e/Data/All%20Gender%20(1).png" width="900"/></div>
  2 | 
  3 | 
  4 | # Gender Based Cleaning Algorithm
  5 | 
  6 | ![Python](https://img.shields.io/badge/python-v3.6+-blue.svg)
  7 | [![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/nelson123-lab/Gender_based_cleaning_algorithm/issues)
  8 | [![LinkedIn](https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=255)](https://www.linkedin.com/in/nelsonjoseph123/)
  9 | [![Youtube](https://img.shields.io/badge/-Youtube-black.svg?style=flat-square&logo=Youtube&colorB=900)](https://www.youtube.com/channel/UCj-j1k_3vC6F1rVgrEhDF7g)
 10 | [![Medium](https://img.shields.io/badge/-Medium-black.svg?style=flat-square&logo=Medium&colorB=000)](https://medium.com/me/stories/public)
 11 | 
 12 | 
 13 | This algorithm is mainly intended for data cleaning. It predicts the gender of the person in an image from the face. If no face can be located, the image file is deleted. The algorithm can be adapted to suit different requirements.
 14 | 
 15 | ## General Script
 16 | 
 17 | ```python
 18 | 
 19 | from deepface import DeepFace # Pretrained model which is present in DeepFace library.
 20 | from tqdm import tqdm # Used to create a bar that represents process progress.
 21 | import cv2
 22 | import matplotlib.pyplot as plt
 23 | import time
 24 | import os
 25 | start = time.time()
 26 | # plt.imshow(img[:,:,::-1])
 27 | # plt.show() # To display the image if required.
 28 | 
 29 | dire = r"Location of folder in which all the files are present"
 30 | for img in tqdm(os.listdir(dire)):
 31 |     path = os.path.join(dire, img)
 32 |     try:
 33 |         # print(path)
 34 |         img = cv2.imread(path)
 35 |         result = DeepFace.analyze(img, actions= ['gender'])
 36 |         # print("Gender: ", result['gender']) 
 37 |         if result['gender'] not in ("Man", "Woman"):  # Default condition; change it for custom use.
 38 |             os.remove(path)
 39 |     except ValueError:
 40 |         os.remove(path)
 41 | print("All is done.") # To understand that all the process is finished.
 42 | time.sleep(1)
 43 | end = time.time()
 44 | print(f"Runtime of the program is {end-start}") # To print out the final execution time.
 45 | 
 46 | ```
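
The scripts in this README read the prediction as `result['gender']`. Depending on the installed DeepFace release, `DeepFace.analyze` may instead return a list with one dictionary per face, and the label may be stored under `dominant_gender`. The following is a minimal sketch under those assumptions; `extract_gender` is a hypothetical helper introduced here, not part of DeepFace.

```python
def extract_gender(result):
    """Return "Man", "Woman", or None from a DeepFace-style result."""
    # Assumption: some releases return a list with one dict per detected face.
    if isinstance(result, list):
        if not result:
            return None
        result = result[0]  # Use the first detected face.
    # Assumption: the label may be stored under "dominant_gender".
    label = result.get("dominant_gender", result.get("gender"))
    if isinstance(label, dict):
        # Assumption: "gender" may hold per-class scores; pick the highest.
        label = max(label, key=label.get) if label else None
    return label if label in ("Man", "Woman") else None
```

If the installed version behaves this way, `extract_gender(result)` could be used in place of `result['gender']` in the conditions shown in this README.
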
 47 | ## Use Cases
 48 | 
 49 | ## 1. To eliminate noisy photos and only keep images with human faces.
 50 | 
 51 | <div align="center"><img src="https://github.com/nelson123-lab/Gender_based_cleaning_algorithm/blob/6ab531cc304eaa80d52a02556cf2a75abd2b9845/Data/Unprocessed%20data.png" width="900"/></div>
 52 | 
 53 | The folder combines photographs collected from the internet. It contains photos of people of different genders, and some of the files are noisy. The script can remove these noisy photos.
 54 | 
 55 | 
 56 | ### Noise in Face data
 57 | 
 58 | <p align="center"><img src="https://github.com/nelson123-lab/Gender_based_cleaning_algorithm/blob/e44635725851404a1143618f275c41d1329ddb59/Data/Noise%20in%20face%20data.png" width="400" height="440"></p>
 59 | 
 60 | The noisy images shown here are not arbitrary snapshots. They are crops produced by an MTCNN-based face detector, so each of them resembles a face in some way.
 61 | 
 62 | ### Implementation
 63 | 
 64 | We only need to change one line of the general script, as follows:
 65 | ```python
 66 | if result['gender'] != "Man" and result['gender'] != "Woman":  # Replace the condition in the general script with this line.
 67 |     os.remove(path)
 68 | 
 69 | ```
 70 | Running the script produces the results shown below.
 71 | 
 72 | <div align="center"><img src="https://github.com/nelson123-lab/Gender_based_cleaning_algorithm/blob/6ab531cc304eaa80d52a02556cf2a75abd2b9845/Data/Cleaned_data_all_gender.png" width="900"/></div>
 73 | 
 74 | The only photographs left are those with human faces.
 75 | 
 76 | <div align="center"><img src="https://github.com/nelson123-lab/Gender_based_cleaning_algorithm/blob/6ab531cc304eaa80d52a02556cf2a75abd2b9845/Data/Time_taken%20and_count.png" width="900"/></div>
 77 | 
 78 | A progress bar shows the cleaning status.
 79 | The total execution time is printed at the end, after the text "All is done."
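
Because the script deletes files permanently, it can be useful to move rejected images aside first and inspect them. The following is a minimal sketch of that idea, not the author's method; `quarantine` and `reject_dir` are hypothetical names introduced here.

```python
import os
import shutil


def quarantine(path, reject_dir):
    """Move a rejected file into reject_dir instead of deleting it."""
    os.makedirs(reject_dir, exist_ok=True)
    target = os.path.join(reject_dir, os.path.basename(path))
    shutil.move(path, target)
    return target
```

Calling `quarantine(path, reject_dir)` where the script calls `os.remove(path)` keeps the rejected images for review. Choose a `reject_dir` outside the folder being cleaned so that later runs do not pick it up.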
 80 | 
 81 | ## 2. To determine how many photos contain human faces.
 82 | 
 83 | This uses the same directory as above. We add a `count` variable and adjust the script so that it counts the photos that contain human faces instead of deleting files.
 84 | 
 85 | ```python
 86 | from deepface import DeepFace
 87 | from tqdm import tqdm
 88 | import cv2
 89 | import os
 90 | 
 91 | dire = r"Location of folder in which all the files are present"
 92 | count = 0  # Initialize the counter.
 93 | for img in tqdm(os.listdir(dire)):
 94 |     path = os.path.join(dire, img)
 95 |     try:
 96 |         img = cv2.imread(path)
 97 |         result = DeepFace.analyze(img, actions= ['gender'])
 98 |         if result['gender'] == "Man" or result['gender'] == "Woman":
 99 |             count += 1 # Count value is incremented when a face is found.
100 |     except ValueError:
101 |         pass
102 | print("No of human faces =",count)
103 | ```
104 | The output is:
105 | 
106 | ```text
107 | No of human faces = 9
108 | ```
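
The same loop can tally each label separately instead of keeping a single count. The following is a minimal sketch, assuming a hypothetical `classify` callable that takes a file path and returns "Man" or "Woman", or raises `ValueError` when no face is found (for example, a wrapper around `DeepFace.analyze`).

```python
from collections import Counter


def tally_genders(paths, classify):
    """Count the labels returned by classify(path) over all paths."""
    counts = Counter()
    for path in paths:
        try:
            label = classify(path)
        except ValueError:
            # Mirrors the scripts above: ValueError means no face was found.
            label = None
        counts[label if label in ("Man", "Woman") else "No face"] += 1
    return counts
```

The number of human faces is then `counts["Man"] + counts["Woman"]`.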
109 | 
110 | ## 3. To only save pictures with male faces.
111 | 
112 | ### Implementation
113 | 
114 | We only need to change one line of the general script, as follows:
115 | ```python
116 | if result['gender'] != "Man":  # Replace the condition in the general script with this line.
117 |     os.remove(path)
118 | ```
119 | 
120 | After executing the script, the folder contains only photographs of men; all other images are deleted.
121 | 
122 | <p align="center"><img src="https://github.com/nelson123-lab/Gender_based_cleaning_algorithm/blob/6ab531cc304eaa80d52a02556cf2a75abd2b9845/Data/Only%20man.png" width="500" height="300"></p>
123 | 
124 | ## 4. To only save pictures of women's faces.
125 | 
126 | ### Implementation
127 | 
128 | We only need to change one line of the general script, as follows:
129 | ```python
130 | if result['gender'] != "Woman":  # Replace the condition in the general script with this line.
131 |     os.remove(path)
132 | 
133 | ```
134 | After executing the script, the folder contains only photographs of women; all other images are deleted.
135 | 
136 | <div align="center"><img src="https://github.com/nelson123-lab/Gender_based_cleaning_algorithm/blob/6ab531cc304eaa80d52a02556cf2a75abd2b9845/Data/Only%20women.png" width="900"/></div>
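
Use cases 1, 3, and 4 differ only in which labels are kept. As a minimal sketch, that choice can be expressed with one hypothetical helper, `should_remove`, whose `keep` parameter is an assumption introduced here for illustration.

```python
def should_remove(gender, keep=("Man", "Woman")):
    """Return True when an image with this predicted label should be deleted."""
    return gender not in keep


# Use case 1: keep every detected face.
print(should_remove("Woman"))                  # False
# Use case 3: keep only male faces.
print(should_remove("Woman", keep=("Man",)))   # True
# Use case 4: keep only female faces.
print(should_remove("Man", keep=("Woman",)))   # True
```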
137 | 
138 | # Dependency Installation
139 | 
140 | The required libraries can be installed from [PyPI](https://pypi.org/) with pip, which also installs their dependencies.
141 | 
142 | ```bash
143 | pip install deepface
144 | ```
145 | - DeepFace is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for Python. It is a hybrid face recognition framework wrapping state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, ArcFace and Dlib. The library is mainly powered by TensorFlow and Keras.
146 | Experiments show that human beings have 97.53% accuracy on facial recognition tasks, whereas those models have already reached and passed that accuracy level.
147 | 
148 | ```bash
149 | pip install tqdm
150 | ```
151 | - tqdm instantly makes your loops show a smart progress meter: just wrap any iterable with `tqdm(iterable)`, and you're done!
152 | 
153 | ```bash
154 | pip install opencv-python
155 | ```
156 | - OpenCV (Open Source Computer Vision Library: http://opencv.org) is an open-source library that includes several hundred computer vision algorithms.
157 | 
158 | ```bash
159 | pip install matplotlib
160 | ```
161 | - Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
162 | 
163 | - The `time` and `os` modules are part of Python's standard library, so they do not need to be installed.
164 | 
165 | You can then import the libraries and use their functionality.
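
To confirm that the installation worked before running the scripts, the import names can be checked without loading the libraries. This is a minimal sketch using the standard library; `missing_modules` is a hypothetical helper introduced here.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used by the scripts in this README (opencv-python imports as cv2).
required = ["deepface", "tqdm", "cv2", "matplotlib"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```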
166 | 
167 | ## Contribution
168 | 
169 | Pull requests are welcome.
170 | 


--------------------------------------------------------------------------------