├── Varimax.png
├── Scree-plot.png
├── Thank-you.png
├── LICENSE
├── README.md
└── Project-code.ipynb
/Varimax.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Sagar-Darji/Personality-prediction/HEAD/Varimax.png
--------------------------------------------------------------------------------
/Scree-plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Sagar-Darji/Personality-prediction/HEAD/Scree-plot.png
--------------------------------------------------------------------------------
/Thank-you.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Sagar-Darji/Personality-prediction/HEAD/Thank-you.png
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 Sagar-Darji
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Personality-prediction
2 | > Predict personality traits from the survey dataset `responses.csv` using this personality prediction project.
3 |
4 | ## Let's get started
5 | [](https://www.linkedin.com/in/sagar-darji-7b7011165/)
6 |
7 | The system predicts a person's personality and their traits from a basic survey. This can help human resources select the right candidate for a desired job profile, which in turn provides an expert workforce for the organization.
8 |
9 | ### Applications in psychology:
10 | Factor analysis has been used in the study of human intelligence and human personality as a method for comparing the outcomes of (hopefully) objective tests and to construct matrices to define correlations between these outcomes, as well as finding the factors for these results. The field of psychology that measures human intelligence using quantitative testing in this way is known as psychometrics (psycho=mental, metrics=measurement).
11 |
12 | ### Advantages:
13 | - Offers a much more objective method of testing traits such as intelligence in humans
14 | - Allows for a satisfactory comparison between the results of intelligence tests
15 | - Provides support for theories that would be difficult to prove otherwise
16 |
17 | ## Algorithm
18 | 1. Refine the data
19 | 2. Prepare the data
20 | 3. Choose the variables and compute their correlation matrix
21 | 4. Extract the factors using a method of factor analysis such as EFA (exploratory factor analysis)
22 | 5. Decide the number of factors
23 | 6. Compute the factor loadings
24 | 7. Rotate the factor loadings
25 | 8. Report the appropriate number of factors
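The steps above can be sketched end-to-end. This minimal example uses scikit-learn's `FactorAnalysis` on synthetic data (the project itself uses the `factor_analyzer` package on the survey responses, and rotation is handled later); the data and dimensions here are made up for illustration:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(42)
# Synthetic data: 100 respondents, 6 observed variables driven by 2 latent factors
latent = rng.normal(size=(100, 2))
weights = rng.normal(size=(2, 6))
observed = latent @ weights + 0.1 * rng.normal(size=(100, 6))

fa = FactorAnalysis(n_components=2)
scores = fa.fit_transform(observed)

print(fa.components_.shape)  # (2, 6): loadings of each variable on each factor
print(scores.shape)          # (100, 2): factor scores per respondent
```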
31 |
32 | ## Now understand and implement the code
33 | > Import all the libraries needed for this project:
34 | ```python
35 | # Libraries
36 | import pandas as pd
37 | import numpy as np
38 | import matplotlib.pyplot as plt
39 | ```
40 | Make a DataFrame using `pandas` and check its shape:
41 | ```python
42 | #Data
43 | df = pd.read_csv("responses.csv")
44 | df.shape
45 | ```
46 | > Out: (1010, 150)
47 |
48 | `responses.csv` has 1010 rows and 150 columns.
49 | > This means the data was collected by surveying 1010 individuals, and there are 150 different preferences and fields.
50 |
51 | MUSIC PREFERENCES (19) 0:19
52 |
53 | MOVIE PREFERENCES (12) 19:31
54 |
55 | HOBBIES & INTERESTS (32) 31:63
56 |
57 | PHOBIAS (10) 63:73
58 |
59 | HEALTH HABITS (3) 73:76
60 |
61 | PERSONALITY TRAITS, VIEWS ON LIFE & OPINIONS (57) 76:133
62 |
63 | SPENDING HABITS (7) 133:140
64 |
65 | DEMOGRAPHICS (10 ) 140:150
66 |
67 | We will take only: PERSONALITY TRAITS, VIEWS ON LIFE & OPINIONS (57) `76:133`
68 |
69 |
70 | ```python
71 | df = df.iloc[:, 76:133]
72 | df.head(5)
73 | ```
> Out:

|   | Daily events | Prioritising workload | Writing notes | Workaholism | Thinking ahead | Final judgement | Reliability | Keeping promises | Loss of interest | Friends versus money | ... | Happiness in life | Energy levels | Small - big dogs | Personality | Finding lost valuables | Getting up | Interests or hobbies | Parents' advice | Questionnaires or polls | Internet usage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.0 | 2.0 | 5.0 | 4.0 | 2.0 | 5.0 | 4.0 | 4.0 | 1.0 | 3.0 | ... | 4.0 | 5.0 | 1.0 | 4.0 | 3.0 | 2.0 | 3.0 | 4.0 | 3.0 | few hours a day |
| 1 | 3.0 | 2.0 | 4.0 | 5.0 | 4.0 | 1.0 | 4.0 | 4.0 | 3.0 | 4.0 | ... | 4.0 | 3.0 | 5.0 | 3.0 | 4.0 | 5.0 | 3.0 | 2.0 | 3.0 | few hours a day |
| 2 | 1.0 | 2.0 | 5.0 | 3.0 | 5.0 | 3.0 | 4.0 | 5.0 | 1.0 | 5.0 | ... | 4.0 | 4.0 | 3.0 | 3.0 | 3.0 | 4.0 | 5.0 | 3.0 | 1.0 | few hours a day |
| 3 | 4.0 | 4.0 | 4.0 | 5.0 | 3.0 | 1.0 | 3.0 | 4.0 | 5.0 | 2.0 | ... | 2.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | NaN | 2.0 | 4.0 | most of the day |
| 4 | 3.0 | 1.0 | 2.0 | 3.0 | 5.0 | 5.0 | 5.0 | 4.0 | 2.0 | 3.0 | ... | 3.0 | 5.0 | 3.0 | 3.0 | 2.0 | 4.0 | 3.0 | 3.0 | 3.0 | few hours a day |

5 rows × 57 columns
227 |
228 |
229 |
230 | ## 1. Prepare the Data
231 |
232 | ```python
233 | #Drop NAs
234 | df = df.dropna()
235 | #...............................................................................................
236 | #Encode categorical data
237 | from sklearn.preprocessing import LabelEncoder
238 |
239 | df = df.apply(LabelEncoder().fit_transform)
240 | df
241 | ```
242 |
243 | The `dropna()` method removes rows containing null values from the DataFrame (here the 1010 rows shrink to 864).
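A minimal illustration on a made-up two-column frame:

```python
import numpy as np
import pandas as pd

# A tiny frame with one missing value in column "a"
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

cleaned = toy.dropna()  # drops every row that contains at least one NaN
print(cleaned.shape)    # (2, 2): the middle row is removed
```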
244 |
245 | Why are we encoding the data?
246 | > The analysis requires all input and output variables to be numeric. This means that if our data contains categorical values, we must encode them as numbers before we can fit and evaluate a model.
247 |
248 | There are two types of encoding:
249 | 1. `Integer encoding`
250 | > Each unique label is mapped to an integer.
251 | 2. `One-hot encoding`
252 | > The categorical column is split into as many columns as there are categories in it. Each new column contains a "1" for rows belonging to that category and a "0" otherwise.
253 |
254 | | Before Encoding | After Encoding |
255 | |--------|--------|
256 | | Height | Height |
257 | | Tall | 0 |
258 | | Short | 1 |
259 | | Medium | 2 |
260 | | Medium | 2 |
261 | | Short | 1 |
262 | | Tall | 0 |
263 |
264 | Here, we have used `Integer encoding`: `LabelEncoder` maps each unique label of a column to an integer, as in the table above.
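A minimal sketch of both encodings on the `Height` example (note: `LabelEncoder` assigns integers in sorted label order, so its mapping differs from the order-of-appearance mapping shown in the table):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"Height": ["Tall", "Short", "Medium", "Medium", "Short", "Tall"]})

# Integer encoding: each unique label becomes an integer
# (labels are sorted first, so Medium=0, Short=1, Tall=2)
integer_encoded = df.apply(LabelEncoder().fit_transform)
print(integer_encoded["Height"].tolist())  # [2, 1, 0, 0, 1, 2]

# One-hot encoding: one 0/1 column per category
one_hot = pd.get_dummies(df["Height"])
print(one_hot.columns.tolist())  # ['Medium', 'Short', 'Tall']
```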
265 |
> Out:

|   | Daily events | Prioritising workload | Writing notes | Workaholism | Thinking ahead | Final judgement | Reliability | Keeping promises | Loss of interest | Friends versus money | ... | Happiness in life | Energy levels | Small - big dogs | Personality | Finding lost valuables | Getting up | Interests or hobbies | Parents' advice | Questionnaires or polls | Internet usage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 4 | 3 | 1 | 4 | 3 | 3 | 0 | 2 | ... | 3 | 4 | 0 | 3 | 2 | 1 | 2 | 3 | 2 | 0 |
| 1 | 2 | 1 | 3 | 4 | 3 | 0 | 3 | 3 | 2 | 3 | ... | 3 | 2 | 4 | 2 | 3 | 4 | 2 | 1 | 2 | 0 |
| 2 | 0 | 1 | 4 | 2 | 4 | 2 | 3 | 4 | 0 | 4 | ... | 3 | 3 | 2 | 2 | 2 | 3 | 4 | 2 | 0 | 0 |
| 4 | 2 | 0 | 1 | 2 | 4 | 4 | 4 | 3 | 1 | 2 | ... | 2 | 4 | 2 | 2 | 1 | 3 | 2 | 2 | 2 | 0 |
| 5 | 1 | 1 | 2 | 2 | 2 | 0 | 2 | 3 | 2 | 1 | ... | 2 | 3 | 3 | 2 | 2 | 2 | 4 | 2 | 3 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1005 | 2 | 1 | 0 | 3 | 1 | 2 | 2 | 2 | 3 | 3 | ... | 3 | 2 | 2 | 2 | 3 | 4 | 3 | 3 | 2 | 0 |
| 1006 | 0 | 2 | 0 | 4 | 4 | 4 | 4 | 3 | 0 | 1 | ... | 3 | 3 | 2 | 4 | 2 | 0 | 2 | 3 | 2 | 1 |
| 1007 | 2 | 0 | 0 | 0 | 3 | 0 | 2 | 4 | 0 | 3 | ... | 2 | 0 | 2 | 1 | 2 | 4 | 0 | 3 | 4 | 2 |
| 1008 | 2 | 0 | 4 | 0 | 2 | 3 | 3 | 3 | 4 | 2 | ... | 2 | 1 | 1 | 3 | 0 | 4 | 2 | 2 | 2 | 2 |
| 1009 | 2 | 4 | 3 | 4 | 3 | 2 | 4 | 4 | 2 | 3 | ... | 3 | 1 | 2 | 3 | 0 | 1 | 1 | 2 | 4 | 0 |

864 rows × 57 columns
564 |
565 |
566 | ## 2. Choose the Factors
567 |
568 | ```python
569 | pip install factor_analyzer
570 | ```
```
Requirement already satisfied: factor_analyzer in c:\users\dell\anaconda3\lib\site-packages (0.3.2)
Requirement already satisfied: pandas in c:\users\dell\anaconda3\lib\site-packages (from factor_analyzer) (0.25.1)
Requirement already satisfied: scipy in c:\users\dell\anaconda3\lib\site-packages (from factor_analyzer) (1.3.1)
Requirement already satisfied: numpy in c:\users\dell\anaconda3\lib\site-packages (from factor_analyzer) (1.16.5)
Requirement already satisfied: scikit-learn in c:\users\dell\anaconda3\lib\site-packages (from factor_analyzer) (0.21.3)
Requirement already satisfied: pytz>=2017.2 in c:\users\dell\anaconda3\lib\site-packages (from pandas->factor_analyzer) (2019.3)
Requirement already satisfied: python-dateutil>=2.6.1 in c:\users\dell\anaconda3\lib\site-packages (from pandas->factor_analyzer) (2.8.0)
Requirement already satisfied: joblib>=0.11 in c:\users\dell\anaconda3\lib\site-packages (from scikit-learn->factor_analyzer) (0.13.2)
Requirement already satisfied: six>=1.5 in c:\users\dell\anaconda3\lib\site-packages (from python-dateutil>=2.6.1->pandas->factor_analyzer) (1.12.0)
Note: you may need to restart the kernel to use updated packages.
```
581 |
582 | `Factor Analyzer`
583 | > Reduces a large number of variables into a smaller number of factors. This is a Python module to perform exploratory factor analysis with several optional rotations. It also includes a class to perform confirmatory factor analysis (CFA) with certain predefined techniques.
584 |
585 | What is factor rotation?
586 | > Rotation minimizes the complexity of the factor loadings to make the structure simpler to interpret.
587 |
588 | There are two types of rotation:
589 | 1. Orthogonal rotation
590 | > Constrains the factors to be uncorrelated. Although often favored, in many cases it is unrealistic to expect the factors to be uncorrelated, and forcing them to be uncorrelated makes it less likely that the rotation produces a solution with a simple structure.
591 | Methods:
592 |    1. `varimax`
593 |    > Maximizes the sum of the variances of the squared loadings, making the structure simpler.
594 |    Mathematical equation of `varimax`:
595 |    
596 |    2. `quartimax`
597 |    3. `equamax`
598 | 2. Oblique rotation
599 | > Permits the factors to be correlated with one another; it often produces a solution with a simpler structure.
600 |
601 | Here, we assume the factors are uncorrelated, so we have used the orthogonal `varimax` rotation method.
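The varimax criterion can be checked numerically. This is a simplified sketch (it ignores the loading normalization present in the full formula): varimax rewards "simple structure", where each variable loads strongly on only one factor. The loading matrices below are made up:

```python
import numpy as np

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    squared = np.asarray(loadings) ** 2
    return float(squared.var(axis=0).sum())

# Simple structure: each variable loads on exactly one factor
simple = np.array([[0.9, 0.0], [0.9, 0.0], [0.0, 0.9], [0.0, 0.9]])
# Complex structure: every variable loads equally on both factors
complex_ = np.array([[0.64, 0.64]] * 4)

print(varimax_criterion(simple) > varimax_criterion(complex_))  # True
```

Rotation searches for the orthogonal transform of the loadings that maximizes this quantity.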
602 |
603 | Now, we determine the number of factors using a scree plot.
604 | > We could also use the eigenvalues directly (e.g. the Kaiser criterion, keeping factors with eigenvalue greater than 1), but the scree plot makes the number of factors easier to find visually.
605 |
606 | ```python
607 | #Try the model with all the variables
608 | from factor_analyzer import FactorAnalyzer # pip install factor_analyzer
609 | fa = FactorAnalyzer(rotation="varimax")
610 | fa.fit(df)
611 |
612 | # Check Eigenvalues
613 | ev, v = fa.get_eigenvalues()
614 | ev
615 |
616 | # Create scree plot using matplotlib
617 | plt.scatter(range(1,df.shape[1]+1),ev)
618 | plt.plot(range(1,df.shape[1]+1),ev)
619 | plt.title('Scree Plot')
620 | plt.xlabel('Factors')
621 | plt.ylabel('Eigenvalue')
622 | plt.grid()
623 | plt.show()
624 | ```
625 | > Out:
626 |
627 | 
628 |
629 | How do we find the number of factors?
630 | > A scree plot shows the eigenvalues on the y-axis and the number of factors on the x-axis. It always displays a downward curve. The point where the slope of the curve clearly levels off (the “elbow”) indicates the number of factors that should be generated by the analysis.
631 |
632 | As you can see, the most useful factors for explaining the data are the first 5-6, after which the eigenvalues fall off significantly.
633 |
634 | We will fit the model with 5 Factors:
635 |
636 | ```python
637 | # Factor analysis with 5 factors
638 | fa = FactorAnalyzer(5, rotation="varimax")
639 | fa.fit(df)
640 | AF = fa.loadings_
641 | AF = pd.DataFrame(AF)
642 | AF.index = df.columns
643 | AF
644 | ```
645 | > Out:
646 |
|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Daily events | 0.250416 | 0.058953 | 0.206877 | 0.026094 | 0.028915 |
| Prioritising workload | -0.012803 | -0.150045 | 0.555946 | 0.078913 | 0.128156 |
| Writing notes | -0.006039 | -0.015927 | 0.420849 | 0.225307 | 0.261380 |
| Workaholism | 0.069524 | 0.029275 | 0.527082 | 0.088573 | 0.032979 |
| Thinking ahead | 0.023475 | 0.127909 | 0.530457 | 0.035213 | 0.055426 |
| Final judgement | 0.046188 | 0.112493 | 0.119861 | 0.381338 | -0.039756 |
| Reliability | 0.061028 | -0.102481 | 0.539373 | 0.073534 | -0.003491 |
| Keeping promises | 0.053358 | -0.034661 | 0.420538 | 0.121450 | -0.033511 |
| Loss of interest | 0.273777 | 0.226286 | 0.003524 | -0.149262 | 0.101882 |
| Friends versus money | 0.021279 | -0.111839 | 0.022026 | 0.381357 | -0.045824 |
| Funniness | 0.312861 | 0.131400 | -0.043014 | -0.018258 | -0.026083 |
| Fake | 0.091188 | 0.469616 | -0.024535 | -0.191798 | 0.019356 |
| Criminal damage | 0.154868 | 0.177732 | -0.112659 | -0.240721 | 0.266761 |
| Decision making | -0.287128 | 0.102033 | 0.267415 | 0.129336 | 0.158694 |
| Elections | 0.074306 | -0.015585 | 0.222003 | 0.131404 | -0.083563 |
| Self-criticism | -0.016858 | 0.398420 | 0.229116 | 0.114144 | 0.069707 |
| Judgment calls | 0.182082 | -0.010461 | 0.102263 | 0.035675 | 0.086474 |
| Hypochondria | -0.040254 | 0.258913 | -0.034874 | 0.042981 | 0.213548 |
| Empathy | -0.050152 | -0.073697 | 0.059441 | 0.324982 | 0.133754 |
| Eating to survive | -0.010608 | 0.183045 | 0.003261 | -0.015131 | -0.018874 |
| Giving | 0.082276 | -0.154549 | 0.112481 | 0.376723 | 0.234000 |
| Compassion to animals | -0.083505 | -0.002767 | -0.010424 | 0.262183 | 0.192734 |
| Borrowed stuff | -0.097017 | -0.023047 | 0.323253 | 0.171017 | 0.071189 |
| Loneliness | -0.199197 | 0.542350 | -0.019272 | 0.045942 | 0.190369 |
| Cheating in school | 0.216223 | -0.063183 | -0.384634 | -0.083940 | 0.208210 |
| Health | -0.012267 | 0.027867 | 0.131645 | 0.184296 | 0.437826 |
| Changing the past | -0.016622 | 0.482307 | -0.161320 | 0.073843 | 0.159231 |
| God | 0.047894 | 0.032281 | 0.027136 | 0.453873 | -0.025963 |
| Dreams | 0.207076 | -0.187723 | 0.078634 | 0.037709 | -0.124853 |
| Charity | 0.163161 | 0.116834 | 0.156898 | 0.354953 | -0.067795 |
| Number of friends | 0.514994 | -0.321738 | -0.086711 | 0.241070 | -0.006859 |
| Punctuality | 0.004662 | 0.090531 | -0.143569 | 0.069648 | 0.078111 |
| Lying | -0.095933 | -0.193370 | 0.001775 | 0.138092 | 0.006950 |
| Waiting | 0.032019 | -0.067715 | -0.000820 | 0.075966 | -0.329606 |
| New environment | 0.470076 | -0.129745 | -0.058912 | 0.005400 | -0.230743 |
| Mood swings | -0.086477 | 0.353226 | -0.041005 | 0.031490 | 0.404388 |
| Appearence and gestures | 0.227246 | -0.004762 | 0.105894 | 0.068825 | 0.303119 |
| Socializing | 0.537811 | -0.096245 | -0.048127 | 0.135323 | -0.039204 |
| Achievements | 0.252835 | 0.048658 | -0.042799 | -0.082401 | 0.111902 |
| Responding to a serious letter | -0.126985 | 0.087976 | -0.026876 | 0.022940 | 0.013346 |
| Children | 0.079877 | -0.134254 | 0.033040 | 0.440103 | 0.075663 |
| Assertiveness | 0.353462 | -0.094372 | 0.002509 | -0.067185 | 0.044117 |
| Getting angry | 0.051167 | 0.176922 | -0.086069 | -0.070837 | 0.532025 |
| Knowing the right people | 0.478657 | 0.022868 | 0.113503 | -0.045359 | 0.227230 |
| Public speaking | -0.385674 | 0.104662 | 0.069712 | 0.030447 | 0.190834 |
| Unpopularity | -0.082146 | 0.229228 | 0.079173 | 0.241031 | -0.031212 |
| Life struggles | -0.226293 | 0.057892 | -0.059615 | 0.384875 | 0.392060 |
| Happiness in life | 0.288585 | -0.541050 | 0.158473 | 0.051235 | -0.064525 |
| Energy levels | 0.499978 | -0.478860 | 0.037918 | 0.122773 | -0.025001 |
| Small - big dogs | 0.206696 | 0.040211 | -0.143225 | -0.203991 | -0.131298 |
| Personality | 0.259646 | -0.393197 | 0.064236 | 0.049013 | -0.056988 |
| Finding lost valuables | -0.127907 | -0.011367 | 0.163354 | 0.391951 | -0.101749 |
| Getting up | 0.012217 | 0.150551 | -0.312297 | 0.082580 | 0.121198 |
| Interests or hobbies | 0.465627 | -0.253289 | 0.065015 | 0.144827 | -0.078694 |
| Parents' advice | 0.022594 | -0.032871 | 0.243628 | 0.282252 | 0.113225 |
| Questionnaires or polls | -0.045177 | 0.114865 | 0.154309 | 0.188501 | -0.032532 |
| Internet usage | -0.046077 | 0.075435 | -0.007799 | -0.081575 | 0.048144 |
1114 |
1115 |
1116 |
1117 |
1118 |
1119 | ```python
1120 | #Get Top variables for each Factor
1121 | F = AF.unstack()
1122 | F = pd.DataFrame(F).reset_index()
1123 | F = F.sort_values(['level_0',0], ascending=False).groupby('level_0').head(5) # Top 5
1124 | F = F.sort_values(by="level_0")
1125 | F.columns=["FACTOR","Variable","Varianza_Explica"]
1126 | F = F.reset_index().drop(["index"],axis=1)
1127 | F
1128 | ```
1129 | > Out:
1130 |
|   | FACTOR | Variable | Varianza_Explica |
|---|---|---|---|
| 0 | 0 | New environment | 0.470076 |
| 1 | 0 | Energy levels | 0.499978 |
| 2 | 0 | Number of friends | 0.514994 |
| 3 | 0 | Socializing | 0.537811 |
| 4 | 0 | Knowing the right people | 0.478657 |
| 5 | 1 | Mood swings | 0.353226 |
| 6 | 1 | Self-criticism | 0.398420 |
| 7 | 1 | Fake | 0.469616 |
| 8 | 1 | Changing the past | 0.482307 |
| 9 | 1 | Loneliness | 0.542350 |
| 10 | 2 | Writing notes | 0.420849 |
| 11 | 2 | Workaholism | 0.527082 |
| 12 | 2 | Thinking ahead | 0.530457 |
| 13 | 2 | Prioritising workload | 0.555946 |
| 14 | 2 | Reliability | 0.539373 |
| 15 | 3 | Friends versus money | 0.381357 |
| 16 | 3 | Life struggles | 0.384875 |
| 17 | 3 | Finding lost valuables | 0.391951 |
| 18 | 3 | Children | 0.440103 |
| 19 | 3 | God | 0.453873 |
| 20 | 4 | Appearence and gestures | 0.303119 |
| 21 | 4 | Life struggles | 0.392060 |
| 22 | 4 | Mood swings | 0.404388 |
| 23 | 4 | Health | 0.437826 |
| 24 | 4 | Getting angry | 0.532025 |
1290 |
1291 |
1292 |
1293 |
1294 |
1295 |
1296 | ```python
1297 | #Show the Top for each Factor
1298 | F = F.pivot(columns='FACTOR')["Variable"]
1299 | F.apply(lambda x: pd.Series(x.dropna().to_numpy()))
1300 | ```
1301 | > Out:
| FACTOR | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | New environment | Mood swings | Writing notes | Friends versus money | Appearence and gestures |
| 1 | Energy levels | Self-criticism | Workaholism | Life struggles | Life struggles |
| 2 | Number of friends | Fake | Thinking ahead | Finding lost valuables | Mood swings |
| 3 | Socializing | Changing the past | Prioritising workload | Children | Health |
| 4 | Knowing the right people | Loneliness | Reliability | God | Getting angry |
1354 |
1355 |
1356 |
1357 |
1358 |
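The `unstack` / `groupby(...).head(...)` pattern used above to pick the top variables per factor can be seen on a tiny made-up loadings frame:

```python
import pandas as pd

# Hypothetical 3-variable, 2-factor loadings matrix
loadings = pd.DataFrame(
    {0: [0.9, 0.1, 0.4], 1: [0.2, 0.8, 0.3]},
    index=["var_a", "var_b", "var_c"],
)

# Long format: one row per (factor, variable) pair
long = pd.DataFrame(loadings.unstack()).reset_index()
long.columns = ["FACTOR", "Variable", "Loading"]

# Top 2 variables per factor (stable sort keeps within-factor order)
top2 = (
    long.sort_values(["FACTOR", "Loading"], ascending=False)
        .groupby("FACTOR").head(2)
        .sort_values("FACTOR", kind="stable")
)
print(top2["Variable"].tolist())  # ['var_a', 'var_c', 'var_b', 'var_c']
```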
1359 | ---
1360 |
1361 | FACTOR 1: Energy levels, Number of friends, Socializing...
1362 |
1363 | Could be: Extraversion
1364 |
1365 | ---
1366 |
1367 | FACTOR 2: Self-criticism, Fake, Loneliness...
1368 |
1369 | Looks very similar to "Neuroticism"
1370 |
1371 | ---
1372 |
1373 | FACTOR 3: Thinking ahead, Prioritising workload...
1374 |
1375 | Very similar to "Conscientiousness"
1376 |
1377 | ---
1378 |
1379 | FACTOR 4: Children, God, Finding lost valuables
1380 |
1381 | This factor could be something like "religious" or "conservative"; it may correspond to the low end of "Openness" in the Big Five model.
1382 |
1383 | ---
1384 |
1385 | FACTOR 5: Appearence and gestures, Mood swings
1386 |
1387 | Mmmm, it could be "Agreeableness". What do you think it could represent?
1388 |
1389 | ---
1390 |
1391 | ## Conclusion
1392 | The first three factors are very clear: Extraversion, Neuroticism, and Conscientiousness. The other two are not as clear, but overall this is a very interesting approximation.
1393 |
1394 | Maybe first doing a PCA to remove highly correlated variables like "God" and "Final judgement" could help.
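That idea can be sketched with scikit-learn (the data here is random and purely illustrative): PCA outputs uncorrelated components, and the reduced matrix could then be fed to the factor analysis.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 5))
# Add two near-duplicate (highly correlated) columns
data = np.hstack([base, base[:, :2] + 0.05 * rng.normal(size=(200, 2))])

pca = PCA(n_components=5)
reduced = pca.fit_transform(data)  # 7 correlated columns -> 5 uncorrelated ones
print(reduced.shape)  # (200, 5)
```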
1395 |
1396 | What do you think?
1397 |
1398 | # Thank you
1399 | 
1400 | > I appreciate especially your `Heart`
1401 |
--------------------------------------------------------------------------------
/Project-code.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 3,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "#Librerias\n",
10 | "import pandas as pd \n",
11 | "import numpy as np\n",
12 | "import matplotlib.pyplot as plt "
13 | ]
14 | },
15 | {
16 | "cell_type": "code",
17 | "execution_count": 4,
18 | "metadata": {},
19 | "outputs": [
20 | {
21 | "data": {
22 | "text/plain": [
23 | "(1010, 150)"
24 | ]
25 | },
26 | "execution_count": 4,
27 | "metadata": {},
28 | "output_type": "execute_result"
29 | }
30 | ],
31 | "source": [
32 | "#Data\n",
33 | "df = pd.read_csv(\"responses.csv\")\n",
34 | "df.shape"
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "metadata": {},
40 | "source": [
41 | "MUSIC PREFERENCES (19) 0:19\n",
42 | "\n",
43 | "MOVIE PREFERENCES (12) 19:31\n",
44 | "\n",
45 | "HOBBIES & INTERESTS (32) 31:63\n",
46 | "\n",
47 | "PHOBIAS (10) 63:73\n",
48 | "\n",
49 | "HEALTH HABITS (3) 73:76\n",
50 | "\n",
51 | "PERSONALITY TRAITS, VIEWS ON LIFE & OPINIONS (57) 76:133\n",
52 | "\n",
53 | "SPENDING HABITS (7) 133:140\n",
54 | "\n",
55 | "DEMOGRAPHICS (10 ) 140:150\n",
56 | "\n",
57 | "We will take only: PERSONALITY TRAITS, VIEWS ON LIFE & OPINIONS (57) 76:133"
58 | ]
59 | },
60 | {
61 | "cell_type": "code",
62 | "execution_count": 5,
63 | "metadata": {},
64 | "outputs": [
65 | {
66 | "data": {
67 | "text/html": [
68 | "