├── .DS_Store
├── .github
│   └── workflows
│       └── python-package-conda.yml
├── .gitignore
├── .vscode
│   └── settings.json
├── CONTRIBUTING.md
├── Examples
│   ├── AutoViz_Bokeh_Interactive_Demo.ipynb
│   ├── AutoViz_Demo.ipynb
│   ├── Boston.csv
│   └── LCA Bokeh.ipynb
├── LICENSE
├── README.md
├── autoviz
│   ├── AutoViz_Class.py
│   ├── AutoViz_Holo.py
│   ├── AutoViz_NLP.py
│   ├── AutoViz_Utils.py
│   ├── __init__.py
│   ├── __version__.py
│   ├── classify_method.py
│   ├── test.png
│   └── tests
│       ├── __init__.py
│       ├── test_autoviz_class.py
│       └── test_deps.py
├── images
│   ├── bokeh_charts.JPG
│   ├── data_clean.png
│   ├── logo.JPG
│   ├── logo.png
│   ├── server_charts.JPG
│   └── var_charts.JPG
├── old_setup.py
├── requirements-py310.txt
├── requirements-py311.txt
├── requirements.txt
├── setup.py
└── updates.md
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/.DS_Store
--------------------------------------------------------------------------------
/.github/workflows/python-package-conda.yml:
--------------------------------------------------------------------------------
1 | name: Python Package using Conda
2 |
3 | on: [push]
4 |
5 | jobs:
6 |   build-linux:
7 |     runs-on: ubuntu-latest
8 |     strategy:
9 |       matrix:
10 |         os: [ubuntu-latest, macos-latest, windows-latest]
11 |       max-parallel: 5
12 |
13 |     steps:
14 |     - uses: actions/checkout@v3
15 |     - name: Set up Python 3.10
16 |       uses: actions/setup-python@v3
17 |       with:
18 |         python-version: '3.10'
19 |     - name: Add conda to system path
20 |       run: |
21 |         # $CONDA is an environment variable pointing to the root of the miniconda directory
22 |         echo $CONDA/bin >> $GITHUB_PATH
23 |     - name: Install dependencies
24 |       run: |
25 |         conda env update --file environment.yml --name base
26 |     - name: Lint with flake8
27 |       run: |
28 |         conda install flake8
29 |         # stop the build if there are Python syntax errors or undefined names
30 |         flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
31 |         # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
32 |         flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
33 |     - name: Test with pytest
34 |       run: |
35 |         conda install pytest
36 |         pytest
37 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints/
2 | __pycache__/
3 | .idea/
4 | dist/
5 | autoviz.egg-info/
6 | build/
7 |
--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
1 | {
2 |   "python.pythonPath": "/opt/conda/bin/python"
3 | }
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing
2 |
3 | We welcome contributions from anyone, beginner or advanced. Before working on a feature, please:
4 |
5 | - search through past issues; your concern may have been raised by others before. Check
6 |   closed issues as well.
7 | - if there is no open issue for your feature request, open one to coordinate with other collaborators
8 | - write your feature
9 | - submit a pull request on this repo with:
10 |   - a brief description
11 |   - **details of the expected change(s) in behaviour**
12 |   - how to test it (if it's not obvious)
13 |
14 | Ask someone to test it.
15 |
--------------------------------------------------------------------------------
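The Boston.csv file below is the example dataset used by the demo notebooks in Examples. As a minimal sketch of its structure, the following stdlib-only code (no pandas assumed) parses the header plus the first two data rows, copied verbatim from the file; the unnamed first column is the row index and `medv` (median home value) is the usual target:

```python
import csv
import io

# First three lines of Examples/Boston.csv, copied verbatim.
sample = '''"","crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","black","lstat","medv"
"1",0.00632,18,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24
"2",0.02731,0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
'''

reader = csv.DictReader(io.StringIO(sample))
# Keep the "" index column as a string; convert every named column to float.
rows = [{k: (v if k == "" else float(v)) for k, v in r.items()} for r in reader]

print(len(rows[0]))        # → 15 (index column + 14 features/target)
print(rows[0]["medv"])     # → 24.0
```

In a checkout of this repo the whole file can be read the same way by replacing `io.StringIO(sample)` with `open("Examples/Boston.csv", newline="")`.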
/Examples/Boston.csv:
--------------------------------------------------------------------------------
1 | "","crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","black","lstat","medv"
2 | "1",0.00632,18,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24
3 | "2",0.02731,0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
4 | "3",0.02729,0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
5 | "4",0.03237,0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
6 | "5",0.06905,0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2
7 | "6",0.02985,0,2.18,0,0.458,6.43,58.7,6.0622,3,222,18.7,394.12,5.21,28.7
8 | "7",0.08829,12.5,7.87,0,0.524,6.012,66.6,5.5605,5,311,15.2,395.6,12.43,22.9
9 | "8",0.14455,12.5,7.87,0,0.524,6.172,96.1,5.9505,5,311,15.2,396.9,19.15,27.1
10 | "9",0.21124,12.5,7.87,0,0.524,5.631,100,6.0821,5,311,15.2,386.63,29.93,16.5
11 | "10",0.17004,12.5,7.87,0,0.524,6.004,85.9,6.5921,5,311,15.2,386.71,17.1,18.9
12 | "11",0.22489,12.5,7.87,0,0.524,6.377,94.3,6.3467,5,311,15.2,392.52,20.45,15
13 | "12",0.11747,12.5,7.87,0,0.524,6.009,82.9,6.2267,5,311,15.2,396.9,13.27,18.9
14 | "13",0.09378,12.5,7.87,0,0.524,5.889,39,5.4509,5,311,15.2,390.5,15.71,21.7
15 | "14",0.62976,0,8.14,0,0.538,5.949,61.8,4.7075,4,307,21,396.9,8.26,20.4
16 | "15",0.63796,0,8.14,0,0.538,6.096,84.5,4.4619,4,307,21,380.02,10.26,18.2
17 | "16",0.62739,0,8.14,0,0.538,5.834,56.5,4.4986,4,307,21,395.62,8.47,19.9
18 | "17",1.05393,0,8.14,0,0.538,5.935,29.3,4.4986,4,307,21,386.85,6.58,23.1
19 | "18",0.7842,0,8.14,0,0.538,5.99,81.7,4.2579,4,307,21,386.75,14.67,17.5
20 | "19",0.80271,0,8.14,0,0.538,5.456,36.6,3.7965,4,307,21,288.99,11.69,20.2
21 | "20",0.7258,0,8.14,0,0.538,5.727,69.5,3.7965,4,307,21,390.95,11.28,18.2
22 | "21",1.25179,0,8.14,0,0.538,5.57,98.1,3.7979,4,307,21,376.57,21.02,13.6
23 | "22",0.85204,0,8.14,0,0.538,5.965,89.2,4.0123,4,307,21,392.53,13.83,19.6
24 | "23",1.23247,0,8.14,0,0.538,6.142,91.7,3.9769,4,307,21,396.9,18.72,15.2
25 | "24",0.98843,0,8.14,0,0.538,5.813,100,4.0952,4,307,21,394.54,19.88,14.5
26 | "25",0.75026,0,8.14,0,0.538,5.924,94.1,4.3996,4,307,21,394.33,16.3,15.6
27 | "26",0.84054,0,8.14,0,0.538,5.599,85.7,4.4546,4,307,21,303.42,16.51,13.9
28 | "27",0.67191,0,8.14,0,0.538,5.813,90.3,4.682,4,307,21,376.88,14.81,16.6
29 | "28",0.95577,0,8.14,0,0.538,6.047,88.8,4.4534,4,307,21,306.38,17.28,14.8
30 | "29",0.77299,0,8.14,0,0.538,6.495,94.4,4.4547,4,307,21,387.94,12.8,18.4
31 | "30",1.00245,0,8.14,0,0.538,6.674,87.3,4.239,4,307,21,380.23,11.98,21
32 | "31",1.13081,0,8.14,0,0.538,5.713,94.1,4.233,4,307,21,360.17,22.6,12.7
33 | "32",1.35472,0,8.14,0,0.538,6.072,100,4.175,4,307,21,376.73,13.04,14.5
34 | "33",1.38799,0,8.14,0,0.538,5.95,82,3.99,4,307,21,232.6,27.71,13.2
35 | "34",1.15172,0,8.14,0,0.538,5.701,95,3.7872,4,307,21,358.77,18.35,13.1
36 | "35",1.61282,0,8.14,0,0.538,6.096,96.9,3.7598,4,307,21,248.31,20.34,13.5
37 | "36",0.06417,0,5.96,0,0.499,5.933,68.2,3.3603,5,279,19.2,396.9,9.68,18.9
38 | "37",0.09744,0,5.96,0,0.499,5.841,61.4,3.3779,5,279,19.2,377.56,11.41,20
39 | "38",0.08014,0,5.96,0,0.499,5.85,41.5,3.9342,5,279,19.2,396.9,8.77,21
40 | "39",0.17505,0,5.96,0,0.499,5.966,30.2,3.8473,5,279,19.2,393.43,10.13,24.7
41 | "40",0.02763,75,2.95,0,0.428,6.595,21.8,5.4011,3,252,18.3,395.63,4.32,30.8
42 | "41",0.03359,75,2.95,0,0.428,7.024,15.8,5.4011,3,252,18.3,395.62,1.98,34.9
43 | "42",0.12744,0,6.91,0,0.448,6.77,2.9,5.7209,3,233,17.9,385.41,4.84,26.6
44 | "43",0.1415,0,6.91,0,0.448,6.169,6.6,5.7209,3,233,17.9,383.37,5.81,25.3
45 | "44",0.15936,0,6.91,0,0.448,6.211,6.5,5.7209,3,233,17.9,394.46,7.44,24.7
46 | "45",0.12269,0,6.91,0,0.448,6.069,40,5.7209,3,233,17.9,389.39,9.55,21.2
47 | "46",0.17142,0,6.91,0,0.448,5.682,33.8,5.1004,3,233,17.9,396.9,10.21,19.3
48 | "47",0.18836,0,6.91,0,0.448,5.786,33.3,5.1004,3,233,17.9,396.9,14.15,20
49 | "48",0.22927,0,6.91,0,0.448,6.03,85.5,5.6894,3,233,17.9,392.74,18.8,16.6
50 | "49",0.25387,0,6.91,0,0.448,5.399,95.3,5.87,3,233,17.9,396.9,30.81,14.4
51 | "50",0.21977,0,6.91,0,0.448,5.602,62,6.0877,3,233,17.9,396.9,16.2,19.4
52 | "51",0.08873,21,5.64,0,0.439,5.963,45.7,6.8147,4,243,16.8,395.56,13.45,19.7
53 | "52",0.04337,21,5.64,0,0.439,6.115,63,6.8147,4,243,16.8,393.97,9.43,20.5
54 | "53",0.0536,21,5.64,0,0.439,6.511,21.1,6.8147,4,243,16.8,396.9,5.28,25
55 | "54",0.04981,21,5.64,0,0.439,5.998,21.4,6.8147,4,243,16.8,396.9,8.43,23.4
56 | "55",0.0136,75,4,0,0.41,5.888,47.6,7.3197,3,469,21.1,396.9,14.8,18.9
57 | "56",0.01311,90,1.22,0,0.403,7.249,21.9,8.6966,5,226,17.9,395.93,4.81,35.4
58 | "57",0.02055,85,0.74,0,0.41,6.383,35.7,9.1876,2,313,17.3,396.9,5.77,24.7
59 | "58",0.01432,100,1.32,0,0.411,6.816,40.5,8.3248,5,256,15.1,392.9,3.95,31.6
60 | "59",0.15445,25,5.13,0,0.453,6.145,29.2,7.8148,8,284,19.7,390.68,6.86,23.3
61 | "60",0.10328,25,5.13,0,0.453,5.927,47.2,6.932,8,284,19.7,396.9,9.22,19.6
62 | "61",0.14932,25,5.13,0,0.453,5.741,66.2,7.2254,8,284,19.7,395.11,13.15,18.7
63 | "62",0.17171,25,5.13,0,0.453,5.966,93.4,6.8185,8,284,19.7,378.08,14.44,16
64 | "63",0.11027,25,5.13,0,0.453,6.456,67.8,7.2255,8,284,19.7,396.9,6.73,22.2
65 | "64",0.1265,25,5.13,0,0.453,6.762,43.4,7.9809,8,284,19.7,395.58,9.5,25
66 | "65",0.01951,17.5,1.38,0,0.4161,7.104,59.5,9.2229,3,216,18.6,393.24,8.05,33
67 | "66",0.03584,80,3.37,0,0.398,6.29,17.8,6.6115,4,337,16.1,396.9,4.67,23.5
68 | "67",0.04379,80,3.37,0,0.398,5.787,31.1,6.6115,4,337,16.1,396.9,10.24,19.4
69 | "68",0.05789,12.5,6.07,0,0.409,5.878,21.4,6.498,4,345,18.9,396.21,8.1,22
70 | "69",0.13554,12.5,6.07,0,0.409,5.594,36.8,6.498,4,345,18.9,396.9,13.09,17.4
71 | "70",0.12816,12.5,6.07,0,0.409,5.885,33,6.498,4,345,18.9,396.9,8.79,20.9
72 | "71",0.08826,0,10.81,0,0.413,6.417,6.6,5.2873,4,305,19.2,383.73,6.72,24.2
73 | "72",0.15876,0,10.81,0,0.413,5.961,17.5,5.2873,4,305,19.2,376.94,9.88,21.7
74 | "73",0.09164,0,10.81,0,0.413,6.065,7.8,5.2873,4,305,19.2,390.91,5.52,22.8
75 | "74",0.19539,0,10.81,0,0.413,6.245,6.2,5.2873,4,305,19.2,377.17,7.54,23.4
76 | "75",0.07896,0,12.83,0,0.437,6.273,6,4.2515,5,398,18.7,394.92,6.78,24.1
77 | "76",0.09512,0,12.83,0,0.437,6.286,45,4.5026,5,398,18.7,383.23,8.94,21.4
78 | "77",0.10153,0,12.83,0,0.437,6.279,74.5,4.0522,5,398,18.7,373.66,11.97,20
79 | "78",0.08707,0,12.83,0,0.437,6.14,45.8,4.0905,5,398,18.7,386.96,10.27,20.8
80 | "79",0.05646,0,12.83,0,0.437,6.232,53.7,5.0141,5,398,18.7,386.4,12.34,21.2
81 | "80",0.08387,0,12.83,0,0.437,5.874,36.6,4.5026,5,398,18.7,396.06,9.1,20.3
82 | "81",0.04113,25,4.86,0,0.426,6.727,33.5,5.4007,4,281,19,396.9,5.29,28
83 | "82",0.04462,25,4.86,0,0.426,6.619,70.4,5.4007,4,281,19,395.63,7.22,23.9
84 | "83",0.03659,25,4.86,0,0.426,6.302,32.2,5.4007,4,281,19,396.9,6.72,24.8
85 | "84",0.03551,25,4.86,0,0.426,6.167,46.7,5.4007,4,281,19,390.64,7.51,22.9
86 | "85",0.05059,0,4.49,0,0.449,6.389,48,4.7794,3,247,18.5,396.9,9.62,23.9
87 | "86",0.05735,0,4.49,0,0.449,6.63,56.1,4.4377,3,247,18.5,392.3,6.53,26.6
88 | "87",0.05188,0,4.49,0,0.449,6.015,45.1,4.4272,3,247,18.5,395.99,12.86,22.5
89 | "88",0.07151,0,4.49,0,0.449,6.121,56.8,3.7476,3,247,18.5,395.15,8.44,22.2
90 | "89",0.0566,0,3.41,0,0.489,7.007,86.3,3.4217,2,270,17.8,396.9,5.5,23.6
91 | "90",0.05302,0,3.41,0,0.489,7.079,63.1,3.4145,2,270,17.8,396.06,5.7,28.7
92 | "91",0.04684,0,3.41,0,0.489,6.417,66.1,3.0923,2,270,17.8,392.18,8.81,22.6
93 | "92",0.03932,0,3.41,0,0.489,6.405,73.9,3.0921,2,270,17.8,393.55,8.2,22
94 | "93",0.04203,28,15.04,0,0.464,6.442,53.6,3.6659,4,270,18.2,395.01,8.16,22.9
95 | "94",0.02875,28,15.04,0,0.464,6.211,28.9,3.6659,4,270,18.2,396.33,6.21,25
96 | "95",0.04294,28,15.04,0,0.464,6.249,77.3,3.615,4,270,18.2,396.9,10.59,20.6
97 | "96",0.12204,0,2.89,0,0.445,6.625,57.8,3.4952,2,276,18,357.98,6.65,28.4
98 | "97",0.11504,0,2.89,0,0.445,6.163,69.6,3.4952,2,276,18,391.83,11.34,21.4
99 | "98",0.12083,0,2.89,0,0.445,8.069,76,3.4952,2,276,18,396.9,4.21,38.7
100 | "99",0.08187,0,2.89,0,0.445,7.82,36.9,3.4952,2,276,18,393.53,3.57,43.8
101 | "100",0.0686,0,2.89,0,0.445,7.416,62.5,3.4952,2,276,18,396.9,6.19,33.2
102 | "101",0.14866,0,8.56,0,0.52,6.727,79.9,2.7778,5,384,20.9,394.76,9.42,27.5
103 | "102",0.11432,0,8.56,0,0.52,6.781,71.3,2.8561,5,384,20.9,395.58,7.67,26.5
104 | "103",0.22876,0,8.56,0,0.52,6.405,85.4,2.7147,5,384,20.9,70.8,10.63,18.6
105 | "104",0.21161,0,8.56,0,0.52,6.137,87.4,2.7147,5,384,20.9,394.47,13.44,19.3
106 | "105",0.1396,0,8.56,0,0.52,6.167,90,2.421,5,384,20.9,392.69,12.33,20.1
107 | "106",0.13262,0,8.56,0,0.52,5.851,96.7,2.1069,5,384,20.9,394.05,16.47,19.5
108 | "107",0.1712,0,8.56,0,0.52,5.836,91.9,2.211,5,384,20.9,395.67,18.66,19.5
109 | "108",0.13117,0,8.56,0,0.52,6.127,85.2,2.1224,5,384,20.9,387.69,14.09,20.4
110 | "109",0.12802,0,8.56,0,0.52,6.474,97.1,2.4329,5,384,20.9,395.24,12.27,19.8
111 | "110",0.26363,0,8.56,0,0.52,6.229,91.2,2.5451,5,384,20.9,391.23,15.55,19.4
112 | "111",0.10793,0,8.56,0,0.52,6.195,54.4,2.7778,5,384,20.9,393.49,13,21.7
113 | "112",0.10084,0,10.01,0,0.547,6.715,81.6,2.6775,6,432,17.8,395.59,10.16,22.8
114 | "113",0.12329,0,10.01,0,0.547,5.913,92.9,2.3534,6,432,17.8,394.95,16.21,18.8
115 | "114",0.22212,0,10.01,0,0.547,6.092,95.4,2.548,6,432,17.8,396.9,17.09,18.7
116 | "115",0.14231,0,10.01,0,0.547,6.254,84.2,2.2565,6,432,17.8,388.74,10.45,18.5
117 | "116",0.17134,0,10.01,0,0.547,5.928,88.2,2.4631,6,432,17.8,344.91,15.76,18.3
118 | "117",0.13158,0,10.01,0,0.547,6.176,72.5,2.7301,6,432,17.8,393.3,12.04,21.2
119 | "118",0.15098,0,10.01,0,0.547,6.021,82.6,2.7474,6,432,17.8,394.51,10.3,19.2
120 | "119",0.13058,0,10.01,0,0.547,5.872,73.1,2.4775,6,432,17.8,338.63,15.37,20.4
121 | "120",0.14476,0,10.01,0,0.547,5.731,65.2,2.7592,6,432,17.8,391.5,13.61,19.3
122 | "121",0.06899,0,25.65,0,0.581,5.87,69.7,2.2577,2,188,19.1,389.15,14.37,22
123 | "122",0.07165,0,25.65,0,0.581,6.004,84.1,2.1974,2,188,19.1,377.67,14.27,20.3
124 | "123",0.09299,0,25.65,0,0.581,5.961,92.9,2.0869,2,188,19.1,378.09,17.93,20.5
125 | "124",0.15038,0,25.65,0,0.581,5.856,97,1.9444,2,188,19.1,370.31,25.41,17.3
126 | "125",0.09849,0,25.65,0,0.581,5.879,95.8,2.0063,2,188,19.1,379.38,17.58,18.8
127 | "126",0.16902,0,25.65,0,0.581,5.986,88.4,1.9929,2,188,19.1,385.02,14.81,21.4
128 | "127",0.38735,0,25.65,0,0.581,5.613,95.6,1.7572,2,188,19.1,359.29,27.26,15.7
129 | "128",0.25915,0,21.89,0,0.624,5.693,96,1.7883,4,437,21.2,392.11,17.19,16.2
130 | "129",0.32543,0,21.89,0,0.624,6.431,98.8,1.8125,4,437,21.2,396.9,15.39,18
131 | "130",0.88125,0,21.89,0,0.624,5.637,94.7,1.9799,4,437,21.2,396.9,18.34,14.3
132 | "131",0.34006,0,21.89,0,0.624,6.458,98.9,2.1185,4,437,21.2,395.04,12.6,19.2
133 | "132",1.19294,0,21.89,0,0.624,6.326,97.7,2.271,4,437,21.2,396.9,12.26,19.6
134 | "133",0.59005,0,21.89,0,0.624,6.372,97.9,2.3274,4,437,21.2,385.76,11.12,23
135 | "134",0.32982,0,21.89,0,0.624,5.822,95.4,2.4699,4,437,21.2,388.69,15.03,18.4
136 | "135",0.97617,0,21.89,0,0.624,5.757,98.4,2.346,4,437,21.2,262.76,17.31,15.6
137 | "136",0.55778,0,21.89,0,0.624,6.335,98.2,2.1107,4,437,21.2,394.67,16.96,18.1
138 | "137",0.32264,0,21.89,0,0.624,5.942,93.5,1.9669,4,437,21.2,378.25,16.9,17.4
139 | "138",0.35233,0,21.89,0,0.624,6.454,98.4,1.8498,4,437,21.2,394.08,14.59,17.1
140 | "139",0.2498,0,21.89,0,0.624,5.857,98.2,1.6686,4,437,21.2,392.04,21.32,13.3
141 | "140",0.54452,0,21.89,0,0.624,6.151,97.9,1.6687,4,437,21.2,396.9,18.46,17.8
142 | "141",0.2909,0,21.89,0,0.624,6.174,93.6,1.6119,4,437,21.2,388.08,24.16,14
143 | "142",1.62864,0,21.89,0,0.624,5.019,100,1.4394,4,437,21.2,396.9,34.41,14.4
144 | "143",3.32105,0,19.58,1,0.871,5.403,100,1.3216,5,403,14.7,396.9,26.82,13.4
145 | "144",4.0974,0,19.58,0,0.871,5.468,100,1.4118,5,403,14.7,396.9,26.42,15.6
146 | "145",2.77974,0,19.58,0,0.871,4.903,97.8,1.3459,5,403,14.7,396.9,29.29,11.8
147 | "146",2.37934,0,19.58,0,0.871,6.13,100,1.4191,5,403,14.7,172.91,27.8,13.8
148 | "147",2.15505,0,19.58,0,0.871,5.628,100,1.5166,5,403,14.7,169.27,16.65,15.6
149 | "148",2.36862,0,19.58,0,0.871,4.926,95.7,1.4608,5,403,14.7,391.71,29.53,14.6
150 | "149",2.33099,0,19.58,0,0.871,5.186,93.8,1.5296,5,403,14.7,356.99,28.32,17.8
151 | "150",2.73397,0,19.58,0,0.871,5.597,94.9,1.5257,5,403,14.7,351.85,21.45,15.4
152 | "151",1.6566,0,19.58,0,0.871,6.122,97.3,1.618,5,403,14.7,372.8,14.1,21.5
153 | "152",1.49632,0,19.58,0,0.871,5.404,100,1.5916,5,403,14.7,341.6,13.28,19.6
154 | "153",1.12658,0,19.58,1,0.871,5.012,88,1.6102,5,403,14.7,343.28,12.12,15.3
155 | "154",2.14918,0,19.58,0,0.871,5.709,98.5,1.6232,5,403,14.7,261.95,15.79,19.4
156 | "155",1.41385,0,19.58,1,0.871,6.129,96,1.7494,5,403,14.7,321.02,15.12,17
157 | "156",3.53501,0,19.58,1,0.871,6.152,82.6,1.7455,5,403,14.7,88.01,15.02,15.6
158 | "157",2.44668,0,19.58,0,0.871,5.272,94,1.7364,5,403,14.7,88.63,16.14,13.1
159 | "158",1.22358,0,19.58,0,0.605,6.943,97.4,1.8773,5,403,14.7,363.43,4.59,41.3
160 | "159",1.34284,0,19.58,0,0.605,6.066,100,1.7573,5,403,14.7,353.89,6.43,24.3
161 | "160",1.42502,0,19.58,0,0.871,6.51,100,1.7659,5,403,14.7,364.31,7.39,23.3
162 | "161",1.27346,0,19.58,1,0.605,6.25,92.6,1.7984,5,403,14.7,338.92,5.5,27
163 | "162",1.46336,0,19.58,0,0.605,7.489,90.8,1.9709,5,403,14.7,374.43,1.73,50
164 | "163",1.83377,0,19.58,1,0.605,7.802,98.2,2.0407,5,403,14.7,389.61,1.92,50
165 | "164",1.51902,0,19.58,1,0.605,8.375,93.9,2.162,5,403,14.7,388.45,3.32,50
166 | "165",2.24236,0,19.58,0,0.605,5.854,91.8,2.422,5,403,14.7,395.11,11.64,22.7
167 | "166",2.924,0,19.58,0,0.605,6.101,93,2.2834,5,403,14.7,240.16,9.81,25
168 | "167",2.01019,0,19.58,0,0.605,7.929,96.2,2.0459,5,403,14.7,369.3,3.7,50
169 | "168",1.80028,0,19.58,0,0.605,5.877,79.2,2.4259,5,403,14.7,227.61,12.14,23.8
170 | "169",2.3004,0,19.58,0,0.605,6.319,96.1,2.1,5,403,14.7,297.09,11.1,23.8
171 | "170",2.44953,0,19.58,0,0.605,6.402,95.2,2.2625,5,403,14.7,330.04,11.32,22.3
172 | "171",1.20742,0,19.58,0,0.605,5.875,94.6,2.4259,5,403,14.7,292.29,14.43,17.4
173 | "172",2.3139,0,19.58,0,0.605,5.88,97.3,2.3887,5,403,14.7,348.13,12.03,19.1
174 | "173",0.13914,0,4.05,0,0.51,5.572,88.5,2.5961,5,296,16.6,396.9,14.69,23.1
175 | "174",0.09178,0,4.05,0,0.51,6.416,84.1,2.6463,5,296,16.6,395.5,9.04,23.6
176 | "175",0.08447,0,4.05,0,0.51,5.859,68.7,2.7019,5,296,16.6,393.23,9.64,22.6
177 | "176",0.06664,0,4.05,0,0.51,6.546,33.1,3.1323,5,296,16.6,390.96,5.33,29.4
178 | "177",0.07022,0,4.05,0,0.51,6.02,47.2,3.5549,5,296,16.6,393.23,10.11,23.2
179 | "178",0.05425,0,4.05,0,0.51,6.315,73.4,3.3175,5,296,16.6,395.6,6.29,24.6
180 | "179",0.06642,0,4.05,0,0.51,6.86,74.4,2.9153,5,296,16.6,391.27,6.92,29.9
181 | "180",0.0578,0,2.46,0,0.488,6.98,58.4,2.829,3,193,17.8,396.9,5.04,37.2
182 | "181",0.06588,0,2.46,0,0.488,7.765,83.3,2.741,3,193,17.8,395.56,7.56,39.8
183 | "182",0.06888,0,2.46,0,0.488,6.144,62.2,2.5979,3,193,17.8,396.9,9.45,36.2
184 | "183",0.09103,0,2.46,0,0.488,7.155,92.2,2.7006,3,193,17.8,394.12,4.82,37.9
185 | "184",0.10008,0,2.46,0,0.488,6.563,95.6,2.847,3,193,17.8,396.9,5.68,32.5
186 | "185",0.08308,0,2.46,0,0.488,5.604,89.8,2.9879,3,193,17.8,391,13.98,26.4
187 | "186",0.06047,0,2.46,0,0.488,6.153,68.8,3.2797,3,193,17.8,387.11,13.15,29.6
188 | "187",0.05602,0,2.46,0,0.488,7.831,53.6,3.1992,3,193,17.8,392.63,4.45,50
189 | "188",0.07875,45,3.44,0,0.437,6.782,41.1,3.7886,5,398,15.2,393.87,6.68,32
190 | "189",0.12579,45,3.44,0,0.437,6.556,29.1,4.5667,5,398,15.2,382.84,4.56,29.8
191 | "190",0.0837,45,3.44,0,0.437,7.185,38.9,4.5667,5,398,15.2,396.9,5.39,34.9
192 | "191",0.09068,45,3.44,0,0.437,6.951,21.5,6.4798,5,398,15.2,377.68,5.1,37
193 | "192",0.06911,45,3.44,0,0.437,6.739,30.8,6.4798,5,398,15.2,389.71,4.69,30.5
194 | "193",0.08664,45,3.44,0,0.437,7.178,26.3,6.4798,5,398,15.2,390.49,2.87,36.4
195 | "194",0.02187,60,2.93,0,0.401,6.8,9.9,6.2196,1,265,15.6,393.37,5.03,31.1
196 | "195",0.01439,60,2.93,0,0.401,6.604,18.8,6.2196,1,265,15.6,376.7,4.38,29.1
197 | "196",0.01381,80,0.46,0,0.422,7.875,32,5.6484,4,255,14.4,394.23,2.97,50
198 | "197",0.04011,80,1.52,0,0.404,7.287,34.1,7.309,2,329,12.6,396.9,4.08,33.3
199 | "198",0.04666,80,1.52,0,0.404,7.107,36.6,7.309,2,329,12.6,354.31,8.61,30.3
200 | "199",0.03768,80,1.52,0,0.404,7.274,38.3,7.309,2,329,12.6,392.2,6.62,34.6
201 | "200",0.0315,95,1.47,0,0.403,6.975,15.3,7.6534,3,402,17,396.9,4.56,34.9
202 | "201",0.01778,95,1.47,0,0.403,7.135,13.9,7.6534,3,402,17,384.3,4.45,32.9
203 | "202",0.03445,82.5,2.03,0,0.415,6.162,38.4,6.27,2,348,14.7,393.77,7.43,24.1
204 | "203",0.02177,82.5,2.03,0,0.415,7.61,15.7,6.27,2,348,14.7,395.38,3.11,42.3
205 | "204",0.0351,95,2.68,0,0.4161,7.853,33.2,5.118,4,224,14.7,392.78,3.81,48.5
206 | "205",0.02009,95,2.68,0,0.4161,8.034,31.9,5.118,4,224,14.7,390.55,2.88,50
207 | "206",0.13642,0,10.59,0,0.489,5.891,22.3,3.9454,4,277,18.6,396.9,10.87,22.6
208 | "207",0.22969,0,10.59,0,0.489,6.326,52.5,4.3549,4,277,18.6,394.87,10.97,24.4
209 | "208",0.25199,0,10.59,0,0.489,5.783,72.7,4.3549,4,277,18.6,389.43,18.06,22.5
210 | "209",0.13587,0,10.59,1,0.489,6.064,59.1,4.2392,4,277,18.6,381.32,14.66,24.4
211 | "210",0.43571,0,10.59,1,0.489,5.344,100,3.875,4,277,18.6,396.9,23.09,20
212 | "211",0.17446,0,10.59,1,0.489,5.96,92.1,3.8771,4,277,18.6,393.25,17.27,21.7
213 | "212",0.37578,0,10.59,1,0.489,5.404,88.6,3.665,4,277,18.6,395.24,23.98,19.3
214 | "213",0.21719,0,10.59,1,0.489,5.807,53.8,3.6526,4,277,18.6,390.94,16.03,22.4
215 | "214",0.14052,0,10.59,0,0.489,6.375,32.3,3.9454,4,277,18.6,385.81,9.38,28.1
216 | "215",0.28955,0,10.59,0,0.489,5.412,9.8,3.5875,4,277,18.6,348.93,29.55,23.7
217 | "216",0.19802,0,10.59,0,0.489,6.182,42.4,3.9454,4,277,18.6,393.63,9.47,25
218 | "217",0.0456,0,13.89,1,0.55,5.888,56,3.1121,5,276,16.4,392.8,13.51,23.3
219 | "218",0.07013,0,13.89,0,0.55,6.642,85.1,3.4211,5,276,16.4,392.78,9.69,28.7
220 | "219",0.11069,0,13.89,1,0.55,5.951,93.8,2.8893,5,276,16.4,396.9,17.92,21.5
221 | "220",0.11425,0,13.89,1,0.55,6.373,92.4,3.3633,5,276,16.4,393.74,10.5,23
222 | "221",0.35809,0,6.2,1,0.507,6.951,88.5,2.8617,8,307,17.4,391.7,9.71,26.7
223 | "222",0.40771,0,6.2,1,0.507,6.164,91.3,3.048,8,307,17.4,395.24,21.46,21.7
224 | "223",0.62356,0,6.2,1,0.507,6.879,77.7,3.2721,8,307,17.4,390.39,9.93,27.5
225 | "224",0.6147,0,6.2,0,0.507,6.618,80.8,3.2721,8,307,17.4,396.9,7.6,30.1
226 | "225",0.31533,0,6.2,0,0.504,8.266,78.3,2.8944,8,307,17.4,385.05,4.14,44.8
227 | "226",0.52693,0,6.2,0,0.504,8.725,83,2.8944,8,307,17.4,382,4.63,50
228 | "227",0.38214,0,6.2,0,0.504,8.04,86.5,3.2157,8,307,17.4,387.38,3.13,37.6
229 | "228",0.41238,0,6.2,0,0.504,7.163,79.9,3.2157,8,307,17.4,372.08,6.36,31.6
230 | "229",0.29819,0,6.2,0,0.504,7.686,17,3.3751,8,307,17.4,377.51,3.92,46.7
231 | "230",0.44178,0,6.2,0,0.504,6.552,21.4,3.3751,8,307,17.4,380.34,3.76,31.5
232 | "231",0.537,0,6.2,0,0.504,5.981,68.1,3.6715,8,307,17.4,378.35,11.65,24.3
233 | "232",0.46296,0,6.2,0,0.504,7.412,76.9,3.6715,8,307,17.4,376.14,5.25,31.7
234 | "233",0.57529,0,6.2,0,0.507,8.337,73.3,3.8384,8,307,17.4,385.91,2.47,41.7
235 | "234",0.33147,0,6.2,0,0.507,8.247,70.4,3.6519,8,307,17.4,378.95,3.95,48.3
236 | "235",0.44791,0,6.2,1,0.507,6.726,66.5,3.6519,8,307,17.4,360.2,8.05,29
237 | "236",0.33045,0,6.2,0,0.507,6.086,61.5,3.6519,8,307,17.4,376.75,10.88,24
238 | "237",0.52058,0,6.2,1,0.507,6.631,76.5,4.148,8,307,17.4,388.45,9.54,25.1
239 | "238",0.51183,0,6.2,0,0.507,7.358,71.6,4.148,8,307,17.4,390.07,4.73,31.5
240 | "239",0.08244,30,4.93,0,0.428,6.481,18.5,6.1899,6,300,16.6,379.41,6.36,23.7
241 | "240",0.09252,30,4.93,0,0.428,6.606,42.2,6.1899,6,300,16.6,383.78,7.37,23.3
242 | "241",0.11329,30,4.93,0,0.428,6.897,54.3,6.3361,6,300,16.6,391.25,11.38,22
243 | "242",0.10612,30,4.93,0,0.428,6.095,65.1,6.3361,6,300,16.6,394.62,12.4,20.1
244 | "243",0.1029,30,4.93,0,0.428,6.358,52.9,7.0355,6,300,16.6,372.75,11.22,22.2
245 | "244",0.12757,30,4.93,0,0.428,6.393,7.8,7.0355,6,300,16.6,374.71,5.19,23.7
246 | "245",0.20608,22,5.86,0,0.431,5.593,76.5,7.9549,7,330,19.1,372.49,12.5,17.6
247 | "246",0.19133,22,5.86,0,0.431,5.605,70.2,7.9549,7,330,19.1,389.13,18.46,18.5
248 | "247",0.33983,22,5.86,0,0.431,6.108,34.9,8.0555,7,330,19.1,390.18,9.16,24.3
249 | "248",0.19657,22,5.86,0,0.431,6.226,79.2,8.0555,7,330,19.1,376.14,10.15,20.5
250 | "249",0.16439,22,5.86,0,0.431,6.433,49.1,7.8265,7,330,19.1,374.71,9.52,24.5
251 | "250",0.19073,22,5.86,0,0.431,6.718,17.5,7.8265,7,330,19.1,393.74,6.56,26.2
252 | "251",0.1403,22,5.86,0,0.431,6.487,13,7.3967,7,330,19.1,396.28,5.9,24.4
253 | "252",0.21409,22,5.86,0,0.431,6.438,8.9,7.3967,7,330,19.1,377.07,3.59,24.8
254 | "253",0.08221,22,5.86,0,0.431,6.957,6.8,8.9067,7,330,19.1,386.09,3.53,29.6
255 | "254",0.36894,22,5.86,0,0.431,8.259,8.4,8.9067,7,330,19.1,396.9,3.54,42.8
256 | "255",0.04819,80,3.64,0,0.392,6.108,32,9.2203,1,315,16.4,392.89,6.57,21.9
257 | "256",0.03548,80,3.64,0,0.392,5.876,19.1,9.2203,1,315,16.4,395.18,9.25,20.9
258 | "257",0.01538,90,3.75,0,0.394,7.454,34.2,6.3361,3,244,15.9,386.34,3.11,44
259 | "258",0.61154,20,3.97,0,0.647,8.704,86.9,1.801,5,264,13,389.7,5.12,50
260 | "259",0.66351,20,3.97,0,0.647,7.333,100,1.8946,5,264,13,383.29,7.79,36
261 | "260",0.65665,20,3.97,0,0.647,6.842,100,2.0107,5,264,13,391.93,6.9,30.1
262 | "261",0.54011,20,3.97,0,0.647,7.203,81.8,2.1121,5,264,13,392.8,9.59,33.8
263 | "262",0.53412,20,3.97,0,0.647,7.52,89.4,2.1398,5,264,13,388.37,7.26,43.1
264 | "263",0.52014,20,3.97,0,0.647,8.398,91.5,2.2885,5,264,13,386.86,5.91,48.8
265 | "264",0.82526,20,3.97,0,0.647,7.327,94.5,2.0788,5,264,13,393.42,11.25,31
266 | "265",0.55007,20,3.97,0,0.647,7.206,91.6,1.9301,5,264,13,387.89,8.1,36.5
267 | "266",0.76162,20,3.97,0,0.647,5.56,62.8,1.9865,5,264,13,392.4,10.45,22.8
268 | "267",0.7857,20,3.97,0,0.647,7.014,84.6,2.1329,5,264,13,384.07,14.79,30.7
269 | "268",0.57834,20,3.97,0,0.575,8.297,67,2.4216,5,264,13,384.54,7.44,50
270 | "269",0.5405,20,3.97,0,0.575,7.47,52.6,2.872,5,264,13,390.3,3.16,43.5
271 | "270",0.09065,20,6.96,1,0.464,5.92,61.5,3.9175,3,223,18.6,391.34,13.65,20.7
272 | "271",0.29916,20,6.96,0,0.464,5.856,42.1,4.429,3,223,18.6,388.65,13,21.1
273 | "272",0.16211,20,6.96,0,0.464,6.24,16.3,4.429,3,223,18.6,396.9,6.59,25.2
274 | "273",0.1146,20,6.96,0,0.464,6.538,58.7,3.9175,3,223,18.6,394.96,7.73,24.4
275 | "274",0.22188,20,6.96,1,0.464,7.691,51.8,4.3665,3,223,18.6,390.77,6.58,35.2
276 | "275",0.05644,40,6.41,1,0.447,6.758,32.9,4.0776,4,254,17.6,396.9,3.53,32.4
277 | "276",0.09604,40,6.41,0,0.447,6.854,42.8,4.2673,4,254,17.6,396.9,2.98,32
278 | "277",0.10469,40,6.41,1,0.447,7.267,49,4.7872,4,254,17.6,389.25,6.05,33.2
279 | "278",0.06127,40,6.41,1,0.447,6.826,27.6,4.8628,4,254,17.6,393.45,4.16,33.1
280 | "279",0.07978,40,6.41,0,0.447,6.482,32.1,4.1403,4,254,17.6,396.9,7.19,29.1
281 | "280",0.21038,20,3.33,0,0.4429,6.812,32.2,4.1007,5,216,14.9,396.9,4.85,35.1
282 | "281",0.03578,20,3.33,0,0.4429,7.82,64.5,4.6947,5,216,14.9,387.31,3.76,45.4
283 | "282",0.03705,20,3.33,0,0.4429,6.968,37.2,5.2447,5,216,14.9,392.23,4.59,35.4
284 | "283",0.06129,20,3.33,1,0.4429,7.645,49.7,5.2119,5,216,14.9,377.07,3.01,46
285 | "284",0.01501,90,1.21,1,0.401,7.923,24.8,5.885,1,198,13.6,395.52,3.16,50
286 | "285",0.00906,90,2.97,0,0.4,7.088,20.8,7.3073,1,285,15.3,394.72,7.85,32.2
287 | "286",0.01096,55,2.25,0,0.389,6.453,31.9,7.3073,1,300,15.3,394.72,8.23,22
288 | "287",0.01965,80,1.76,0,0.385,6.23,31.5,9.0892,1,241,18.2,341.6,12.93,20.1
289 | "288",0.03871,52.5,5.32,0,0.405,6.209,31.3,7.3172,6,293,16.6,396.9,7.14,23.2
290 | "289",0.0459,52.5,5.32,0,0.405,6.315,45.6,7.3172,6,293,16.6,396.9,7.6,22.3
291 | "290",0.04297,52.5,5.32,0,0.405,6.565,22.9,7.3172,6,293,16.6,371.72,9.51,24.8
292 | "291",0.03502,80,4.95,0,0.411,6.861,27.9,5.1167,4,245,19.2,396.9,3.33,28.5
293 | "292",0.07886,80,4.95,0,0.411,7.148,27.7,5.1167,4,245,19.2,396.9,3.56,37.3
294 | "293",0.03615,80,4.95,0,0.411,6.63,23.4,5.1167,4,245,19.2,396.9,4.7,27.9
295 | "294",0.08265,0,13.92,0,0.437,6.127,18.4,5.5027,4,289,16,396.9,8.58,23.9
296 | "295",0.08199,0,13.92,0,0.437,6.009,42.3,5.5027,4,289,16,396.9,10.4,21.7
297 | "296",0.12932,0,13.92,0,0.437,6.678,31.1,5.9604,4,289,16,396.9,6.27,28.6
298 | "297",0.05372,0,13.92,0,0.437,6.549,51,5.9604,4,289,16,392.85,7.39,27.1
299 | "298",0.14103,0,13.92,0,0.437,5.79,58,6.32,4,289,16,396.9,15.84,20.3
300 | "299",0.06466,70,2.24,0,0.4,6.345,20.1,7.8278,5,358,14.8,368.24,4.97,22.5
301 | "300",0.05561,70,2.24,0,0.4,7.041,10,7.8278,5,358,14.8,371.58,4.74,29
302 | "301",0.04417,70,2.24,0,0.4,6.871,47.4,7.8278,5,358,14.8,390.86,6.07,24.8
303 | "302",0.03537,34,6.09,0,0.433,6.59,40.4,5.4917,7,329,16.1,395.75,9.5,22
304 | "303",0.09266,34,6.09,0,0.433,6.495,18.4,5.4917,7,329,16.1,383.61,8.67,26.4
305 | "304",0.1,34,6.09,0,0.433,6.982,17.7,5.4917,7,329,16.1,390.43,4.86,33.1
306 | "305",0.05515,33,2.18,0,0.472,7.236,41.1,4.022,7,222,18.4,393.68,6.93,36.1
307 | "306",0.05479,33,2.18,0,0.472,6.616,58.1,3.37,7,222,18.4,393.36,8.93,28.4
308 | "307",0.07503,33,2.18,0,0.472,7.42,71.9,3.0992,7,222,18.4,396.9,6.47,33.4
309 | "308",0.04932,33,2.18,0,0.472,6.849,70.3,3.1827,7,222,18.4,396.9,7.53,28.2
310 | "309",0.49298,0,9.9,0,0.544,6.635,82.5,3.3175,4,304,18.4,396.9,4.54,22.8
311 | "310",0.3494,0,9.9,0,0.544,5.972,76.7,3.1025,4,304,18.4,396.24,9.97,20.3
312 | "311",2.63548,0,9.9,0,0.544,4.973,37.8,2.5194,4,304,18.4,350.45,12.64,16.1
313 | "312",0.79041,0,9.9,0,0.544,6.122,52.8,2.6403,4,304,18.4,396.9,5.98,22.1
314 | "313",0.26169,0,9.9,0,0.544,6.023,90.4,2.834,4,304,18.4,396.3,11.72,19.4
315 | "314",0.26938,0,9.9,0,0.544,6.266,82.8,3.2628,4,304,18.4,393.39,7.9,21.6
316 | "315",0.3692,0,9.9,0,0.544,6.567,87.3,3.6023,4,304,18.4,395.69,9.28,23.8
317 | "316",0.25356,0,9.9,0,0.544,5.705,77.7,3.945,4,304,18.4,396.42,11.5,16.2
318 | "317",0.31827,0,9.9,0,0.544,5.914,83.2,3.9986,4,304,18.4,390.7,18.33,17.8
319 | "318",0.24522,0,9.9,0,0.544,5.782,71.7,4.0317,4,304,18.4,396.9,15.94,19.8
320 | "319",0.40202,0,9.9,0,0.544,6.382,67.2,3.5325,4,304,18.4,395.21,10.36,23.1
321 | "320",0.47547,0,9.9,0,0.544,6.113,58.8,4.0019,4,304,18.4,396.23,12.73,21
322 | "321",0.1676,0,7.38,0,0.493,6.426,52.3,4.5404,5,287,19.6,396.9,7.2,23.8
323 | "322",0.18159,0,7.38,0,0.493,6.376,54.3,4.5404,5,287,19.6,396.9,6.87,23.1
324 | "323",0.35114,0,7.38,0,0.493,6.041,49.9,4.7211,5,287,19.6,396.9,7.7,20.4
325 | "324",0.28392,0,7.38,0,0.493,5.708,74.3,4.7211,5,287,19.6,391.13,11.74,18.5
326 | "325",0.34109,0,7.38,0,0.493,6.415,40.1,4.7211,5,287,19.6,396.9,6.12,25
327 | "326",0.19186,0,7.38,0,0.493,6.431,14.7,5.4159,5,287,19.6,393.68,5.08,24.6
328 | "327",0.30347,0,7.38,0,0.493,6.312,28.9,5.4159,5,287,19.6,396.9,6.15,23
329 | "328",0.24103,0,7.38,0,0.493,6.083,43.7,5.4159,5,287,19.6,396.9,12.79,22.2
330 | "329",0.06617,0,3.24,0,0.46,5.868,25.8,5.2146,4,430,16.9,382.44,9.97,19.3
331 | "330",0.06724,0,3.24,0,0.46,6.333,17.2,5.2146,4,430,16.9,375.21,7.34,22.6
332 | "331",0.04544,0,3.24,0,0.46,6.144,32.2,5.8736,4,430,16.9,368.57,9.09,19.8
333 | "332",0.05023,35,6.06,0,0.4379,5.706,28.4,6.6407,1,304,16.9,394.02,12.43,17.1
334 | "333",0.03466,35,6.06,0,0.4379,6.031,23.3,6.6407,1,304,16.9,362.25,7.83,19.4
335 | "334",0.05083,0,5.19,0,0.515,6.316,38.1,6.4584,5,224,20.2,389.71,5.68,22.2
336 | "335",0.03738,0,5.19,0,0.515,6.31,38.5,6.4584,5,224,20.2,389.4,6.75,20.7
337 | "336",0.03961,0,5.19,0,0.515,6.037,34.5,5.9853,5,224,20.2,396.9,8.01,21.1
338 | "337",0.03427,0,5.19,0,0.515,5.869,46.3,5.2311,5,224,20.2,396.9,9.8,19.5
339 | "338",0.03041,0,5.19,0,0.515,5.895,59.6,5.615,5,224,20.2,394.81,10.56,18.5
340 | "339",0.03306,0,5.19,0,0.515,6.059,37.3,4.8122,5,224,20.2,396.14,8.51,20.6
341 | "340",0.05497,0,5.19,0,0.515,5.985,45.4,4.8122,5,224,20.2,396.9,9.74,19
342 | "341",0.06151,0,5.19,0,0.515,5.968,58.5,4.8122,5,224,20.2,396.9,9.29,18.7
343 | "342",0.01301,35,1.52,0,0.442,7.241,49.3,7.0379,1,284,15.5,394.74,5.49,32.7
344 | "343",0.02498,0,1.89,0,0.518,6.54,59.7,6.2669,1,422,15.9,389.96,8.65,16.5
345 | "344",0.02543,55,3.78,0,0.484,6.696,56.4,5.7321,5,370,17.6,396.9,7.18,23.9
346 | "345",0.03049,55,3.78,0,0.484,6.874,28.1,6.4654,5,370,17.6,387.97,4.61,31.2
347 | "346",0.03113,0,4.39,0,0.442,6.014,48.5,8.0136,3,352,18.8,385.64,10.53,17.5
348 | "347",0.06162,0,4.39,0,0.442,5.898,52.3,8.0136,3,352,18.8,364.61,12.67,17.2
349 | "348",0.0187,85,4.15,0,0.429,6.516,27.7,8.5353,4,351,17.9,392.43,6.36,23.1
350 | "349",0.01501,80,2.01,0,0.435,6.635,29.7,8.344,4,280,17,390.94,5.99,24.5
351 | "350",0.02899,40,1.25,0,0.429,6.939,34.5,8.7921,1,335,19.7,389.85,5.89,26.6
352 | "351",0.06211,40,1.25,0,0.429,6.49,44.4,8.7921,1,335,19.7,396.9,5.98,22.9
353 | "352",0.0795,60,1.69,0,0.411,6.579,35.9,10.7103,4,411,18.3,370.78,5.49,24.1
354 | "353",0.07244,60,1.69,0,0.411,5.884,18.5,10.7103,4,411,18.3,392.33,7.79,18.6
355 | "354",0.01709,90,2.02,0,0.41,6.728,36.1,12.1265,5,187,17,384.46,4.5,30.1
356 | "355",0.04301,80,1.91,0,0.413,5.663,21.9,10.5857,4,334,22,382.8,8.05,18.2
357 | "356",0.10659,80,1.91,0,0.413,5.936,19.5,10.5857,4,334,22,376.04,5.57,20.6
358 | "357",8.98296,0,18.1,1,0.77,6.212,97.4,2.1222,24,666,20.2,377.73,17.6,17.8
359 | "358",3.8497,0,18.1,1,0.77,6.395,91,2.5052,24,666,20.2,391.34,13.27,21.7
360 | "359",5.20177,0,18.1,1,0.77,6.127,83.4,2.7227,24,666,20.2,395.43,11.48,22.7
361 | "360",4.26131,0,18.1,0,0.77,6.112,81.3,2.5091,24,666,20.2,390.74,12.67,22.6
362 | "361",4.54192,0,18.1,0,0.77,6.398,88,2.5182,24,666,20.2,374.56,7.79,25
363 | "362",3.83684,0,18.1,0,0.77,6.251,91.1,2.2955,24,666,20.2,350.65,14.19,19.9
364 | "363",3.67822,0,18.1,0,0.77,5.362,96.2,2.1036,24,666,20.2,380.79,10.19,20.8
365 | "364",4.22239,0,18.1,1,0.77,5.803,89,1.9047,24,666,20.2,353.04,14.64,16.8
366 | "365",3.47428,0,18.1,1,0.718,8.78,82.9,1.9047,24,666,20.2,354.55,5.29,21.9
367 | "366",4.55587,0,18.1,0,0.718,3.561,87.9,1.6132,24,666,20.2,354.7,7.12,27.5
368 | "367",3.69695,0,18.1,0,0.718,4.963,91.4,1.7523,24,666,20.2,316.03,14,21.9
369 | "368",13.5222,0,18.1,0,0.631,3.863,100,1.5106,24,666,20.2,131.42,13.33,23.1
370 | "369",4.89822,0,18.1,0,0.631,4.97,100,1.3325,24,666,20.2,375.52,3.26,50
371 | "370",5.66998,0,18.1,1,0.631,6.683,96.8,1.3567,24,666,20.2,375.33,3.73,50
372 | "371",6.53876,0,18.1,1,0.631,7.016,97.5,1.2024,24,666,20.2,392.05,2.96,50
373 | "372",9.2323,0,18.1,0,0.631,6.216,100,1.1691,24,666,20.2,366.15,9.53,50
374 | "373",8.26725,0,18.1,1,0.668,5.875,89.6,1.1296,24,666,20.2,347.88,8.88,50
375 | "374",11.1081,0,18.1,0,0.668,4.906,100,1.1742,24,666,20.2,396.9,34.77,13.8
376 | "375",18.4982,0,18.1,0,0.668,4.138,100,1.137,24,666,20.2,396.9,37.97,13.8
377 | "376",19.6091,0,18.1,0,0.671,7.313,97.9,1.3163,24,666,20.2,396.9,13.44,15
378 | "377",15.288,0,18.1,0,0.671,6.649,93.3,1.3449,24,666,20.2,363.02,23.24,13.9
379 | "378",9.82349,0,18.1,0,0.671,6.794,98.8,1.358,24,666,20.2,396.9,21.24,13.3
380 | "379",23.6482,0,18.1,0,0.671,6.38,96.2,1.3861,24,666,20.2,396.9,23.69,13.1
381 | "380",17.8667,0,18.1,0,0.671,6.223,100,1.3861,24,666,20.2,393.74,21.78,10.2
382 | "381",88.9762,0,18.1,0,0.671,6.968,91.9,1.4165,24,666,20.2,396.9,17.21,10.4
383 | "382",15.8744,0,18.1,0,0.671,6.545,99.1,1.5192,24,666,20.2,396.9,21.08,10.9
384 | "383",9.18702,0,18.1,0,0.7,5.536,100,1.5804,24,666,20.2,396.9,23.6,11.3
385 | "384",7.99248,0,18.1,0,0.7,5.52,100,1.5331,24,666,20.2,396.9,24.56,12.3
386 | "385",20.0849,0,18.1,0,0.7,4.368,91.2,1.4395,24,666,20.2,285.83,30.63,8.8
387 | "386",16.8118,0,18.1,0,0.7,5.277,98.1,1.4261,24,666,20.2,396.9,30.81,7.2
388 | "387",24.3938,0,18.1,0,0.7,4.652,100,1.4672,24,666,20.2,396.9,28.28,10.5
389 | "388",22.5971,0,18.1,0,0.7,5,89.5,1.5184,24,666,20.2,396.9,31.99,7.4
390 | "389",14.3337,0,18.1,0,0.7,4.88,100,1.5895,24,666,20.2,372.92,30.62,10.2
391 | "390",8.15174,0,18.1,0,0.7,5.39,98.9,1.7281,24,666,20.2,396.9,20.85,11.5
392 | "391",6.96215,0,18.1,0,0.7,5.713,97,1.9265,24,666,20.2,394.43,17.11,15.1
393 | "392",5.29305,0,18.1,0,0.7,6.051,82.5,2.1678,24,666,20.2,378.38,18.76,23.2
394 | "393",11.5779,0,18.1,0,0.7,5.036,97,1.77,24,666,20.2,396.9,25.68,9.7
395 | "394",8.64476,0,18.1,0,0.693,6.193,92.6,1.7912,24,666,20.2,396.9,15.17,13.8
396 | "395",13.3598,0,18.1,0,0.693,5.887,94.7,1.7821,24,666,20.2,396.9,16.35,12.7
397 | "396",8.71675,0,18.1,0,0.693,6.471,98.8,1.7257,24,666,20.2,391.98,17.12,13.1
398 | "397",5.87205,0,18.1,0,0.693,6.405,96,1.6768,24,666,20.2,396.9,19.37,12.5
399 | "398",7.67202,0,18.1,0,0.693,5.747,98.9,1.6334,24,666,20.2,393.1,19.92,8.5
400 | "399",38.3518,0,18.1,0,0.693,5.453,100,1.4896,24,666,20.2,396.9,30.59,5
401 | "400",9.91655,0,18.1,0,0.693,5.852,77.8,1.5004,24,666,20.2,338.16,29.97,6.3
402 | "401",25.0461,0,18.1,0,0.693,5.987,100,1.5888,24,666,20.2,396.9,26.77,5.6
403 | "402",14.2362,0,18.1,0,0.693,6.343,100,1.5741,24,666,20.2,396.9,20.32,7.2
404 | "403",9.59571,0,18.1,0,0.693,6.404,100,1.639,24,666,20.2,376.11,20.31,12.1
405 | "404",24.8017,0,18.1,0,0.693,5.349,96,1.7028,24,666,20.2,396.9,19.77,8.3
406 | "405",41.5292,0,18.1,0,0.693,5.531,85.4,1.6074,24,666,20.2,329.46,27.38,8.5
407 | "406",67.9208,0,18.1,0,0.693,5.683,100,1.4254,24,666,20.2,384.97,22.98,5
408 | "407",20.7162,0,18.1,0,0.659,4.138,100,1.1781,24,666,20.2,370.22,23.34,11.9
409 | "408",11.9511,0,18.1,0,0.659,5.608,100,1.2852,24,666,20.2,332.09,12.13,27.9
410 | "409",7.40389,0,18.1,0,0.597,5.617,97.9,1.4547,24,666,20.2,314.64,26.4,17.2
411 | "410",14.4383,0,18.1,0,0.597,6.852,100,1.4655,24,666,20.2,179.36,19.78,27.5
412 | "411",51.1358,0,18.1,0,0.597,5.757,100,1.413,24,666,20.2,2.6,10.11,15
413 | "412",14.0507,0,18.1,0,0.597,6.657,100,1.5275,24,666,20.2,35.05,21.22,17.2
414 | "413",18.811,0,18.1,0,0.597,4.628,100,1.5539,24,666,20.2,28.79,34.37,17.9
415 | "414",28.6558,0,18.1,0,0.597,5.155,100,1.5894,24,666,20.2,210.97,20.08,16.3
416 | "415",45.7461,0,18.1,0,0.693,4.519,100,1.6582,24,666,20.2,88.27,36.98,7
417 | "416",18.0846,0,18.1,0,0.679,6.434,100,1.8347,24,666,20.2,27.25,29.05,7.2
418 | "417",10.8342,0,18.1,0,0.679,6.782,90.8,1.8195,24,666,20.2,21.57,25.79,7.5
419 | "418",25.9406,0,18.1,0,0.679,5.304,89.1,1.6475,24,666,20.2,127.36,26.64,10.4
420 | "419",73.5341,0,18.1,0,0.679,5.957,100,1.8026,24,666,20.2,16.45,20.62,8.8
421 | "420",11.8123,0,18.1,0,0.718,6.824,76.5,1.794,24,666,20.2,48.45,22.74,8.4
422 | "421",11.0874,0,18.1,0,0.718,6.411,100,1.8589,24,666,20.2,318.75,15.02,16.7
423 | "422",7.02259,0,18.1,0,0.718,6.006,95.3,1.8746,24,666,20.2,319.98,15.7,14.2
424 | "423",12.0482,0,18.1,0,0.614,5.648,87.6,1.9512,24,666,20.2,291.55,14.1,20.8
425 | "424",7.05042,0,18.1,0,0.614,6.103,85.1,2.0218,24,666,20.2,2.52,23.29,13.4
426 | "425",8.79212,0,18.1,0,0.584,5.565,70.6,2.0635,24,666,20.2,3.65,17.16,11.7
427 | "426",15.8603,0,18.1,0,0.679,5.896,95.4,1.9096,24,666,20.2,7.68,24.39,8.3
428 | "427",12.2472,0,18.1,0,0.584,5.837,59.7,1.9976,24,666,20.2,24.65,15.69,10.2
429 | "428",37.6619,0,18.1,0,0.679,6.202,78.7,1.8629,24,666,20.2,18.82,14.52,10.9
430 | "429",7.36711,0,18.1,0,0.679,6.193,78.1,1.9356,24,666,20.2,96.73,21.52,11
431 | "430",9.33889,0,18.1,0,0.679,6.38,95.6,1.9682,24,666,20.2,60.72,24.08,9.5
432 | "431",8.49213,0,18.1,0,0.584,6.348,86.1,2.0527,24,666,20.2,83.45,17.64,14.5
433 | "432",10.0623,0,18.1,0,0.584,6.833,94.3,2.0882,24,666,20.2,81.33,19.69,14.1
434 | "433",6.44405,0,18.1,0,0.584,6.425,74.8,2.2004,24,666,20.2,97.95,12.03,16.1
435 | "434",5.58107,0,18.1,0,0.713,6.436,87.9,2.3158,24,666,20.2,100.19,16.22,14.3
436 | "435",13.9134,0,18.1,0,0.713,6.208,95,2.2222,24,666,20.2,100.63,15.17,11.7
437 | "436",11.1604,0,18.1,0,0.74,6.629,94.6,2.1247,24,666,20.2,109.85,23.27,13.4
438 | "437",14.4208,0,18.1,0,0.74,6.461,93.3,2.0026,24,666,20.2,27.49,18.05,9.6
439 | "438",15.1772,0,18.1,0,0.74,6.152,100,1.9142,24,666,20.2,9.32,26.45,8.7
440 | "439",13.6781,0,18.1,0,0.74,5.935,87.9,1.8206,24,666,20.2,68.95,34.02,8.4
441 | "440",9.39063,0,18.1,0,0.74,5.627,93.9,1.8172,24,666,20.2,396.9,22.88,12.8
442 | "441",22.0511,0,18.1,0,0.74,5.818,92.4,1.8662,24,666,20.2,391.45,22.11,10.5
443 | "442",9.72418,0,18.1,0,0.74,6.406,97.2,2.0651,24,666,20.2,385.96,19.52,17.1
444 | "443",5.66637,0,18.1,0,0.74,6.219,100,2.0048,24,666,20.2,395.69,16.59,18.4
445 | "444",9.96654,0,18.1,0,0.74,6.485,100,1.9784,24,666,20.2,386.73,18.85,15.4
446 | "445",12.8023,0,18.1,0,0.74,5.854,96.6,1.8956,24,666,20.2,240.52,23.79,10.8
447 | "446",10.6718,0,18.1,0,0.74,6.459,94.8,1.9879,24,666,20.2,43.06,23.98,11.8
448 | "447",6.28807,0,18.1,0,0.74,6.341,96.4,2.072,24,666,20.2,318.01,17.79,14.9
449 | "448",9.92485,0,18.1,0,0.74,6.251,96.6,2.198,24,666,20.2,388.52,16.44,12.6
450 | "449",9.32909,0,18.1,0,0.713,6.185,98.7,2.2616,24,666,20.2,396.9,18.13,14.1
451 | "450",7.52601,0,18.1,0,0.713,6.417,98.3,2.185,24,666,20.2,304.21,19.31,13
452 | "451",6.71772,0,18.1,0,0.713,6.749,92.6,2.3236,24,666,20.2,0.32,17.44,13.4
453 | "452",5.44114,0,18.1,0,0.713,6.655,98.2,2.3552,24,666,20.2,355.29,17.73,15.2
454 | "453",5.09017,0,18.1,0,0.713,6.297,91.8,2.3682,24,666,20.2,385.09,17.27,16.1
455 | "454",8.24809,0,18.1,0,0.713,7.393,99.3,2.4527,24,666,20.2,375.87,16.74,17.8
456 | "455",9.51363,0,18.1,0,0.713,6.728,94.1,2.4961,24,666,20.2,6.68,18.71,14.9
457 | "456",4.75237,0,18.1,0,0.713,6.525,86.5,2.4358,24,666,20.2,50.92,18.13,14.1
458 | "457",4.66883,0,18.1,0,0.713,5.976,87.9,2.5806,24,666,20.2,10.48,19.01,12.7
459 | "458",8.20058,0,18.1,0,0.713,5.936,80.3,2.7792,24,666,20.2,3.5,16.94,13.5
460 | "459",7.75223,0,18.1,0,0.713,6.301,83.7,2.7831,24,666,20.2,272.21,16.23,14.9
461 | "460",6.80117,0,18.1,0,0.713,6.081,84.4,2.7175,24,666,20.2,396.9,14.7,20
462 | "461",4.81213,0,18.1,0,0.713,6.701,90,2.5975,24,666,20.2,255.23,16.42,16.4
463 | "462",3.69311,0,18.1,0,0.713,6.376,88.4,2.5671,24,666,20.2,391.43,14.65,17.7
464 | "463",6.65492,0,18.1,0,0.713,6.317,83,2.7344,24,666,20.2,396.9,13.99,19.5
465 | "464",5.82115,0,18.1,0,0.713,6.513,89.9,2.8016,24,666,20.2,393.82,10.29,20.2
466 | "465",7.83932,0,18.1,0,0.655,6.209,65.4,2.9634,24,666,20.2,396.9,13.22,21.4
467 | "466",3.1636,0,18.1,0,0.655,5.759,48.2,3.0665,24,666,20.2,334.4,14.13,19.9
468 | "467",3.77498,0,18.1,0,0.655,5.952,84.7,2.8715,24,666,20.2,22.01,17.15,19
469 | "468",4.42228,0,18.1,0,0.584,6.003,94.5,2.5403,24,666,20.2,331.29,21.32,19.1
470 | "469",15.5757,0,18.1,0,0.58,5.926,71,2.9084,24,666,20.2,368.74,18.13,19.1
471 | "470",13.0751,0,18.1,0,0.58,5.713,56.7,2.8237,24,666,20.2,396.9,14.76,20.1
472 | "471",4.34879,0,18.1,0,0.58,6.167,84,3.0334,24,666,20.2,396.9,16.29,19.9
473 | "472",4.03841,0,18.1,0,0.532,6.229,90.7,3.0993,24,666,20.2,395.33,12.87,19.6
474 | "473",3.56868,0,18.1,0,0.58,6.437,75,2.8965,24,666,20.2,393.37,14.36,23.2
475 | "474",4.64689,0,18.1,0,0.614,6.98,67.6,2.5329,24,666,20.2,374.68,11.66,29.8
476 | "475",8.05579,0,18.1,0,0.584,5.427,95.4,2.4298,24,666,20.2,352.58,18.14,13.8
477 | "476",6.39312,0,18.1,0,0.584,6.162,97.4,2.206,24,666,20.2,302.76,24.1,13.3
478 | "477",4.87141,0,18.1,0,0.614,6.484,93.6,2.3053,24,666,20.2,396.21,18.68,16.7
479 | "478",15.0234,0,18.1,0,0.614,5.304,97.3,2.1007,24,666,20.2,349.48,24.91,12
480 | "479",10.233,0,18.1,0,0.614,6.185,96.7,2.1705,24,666,20.2,379.7,18.03,14.6
481 | "480",14.3337,0,18.1,0,0.614,6.229,88,1.9512,24,666,20.2,383.32,13.11,21.4
482 | "481",5.82401,0,18.1,0,0.532,6.242,64.7,3.4242,24,666,20.2,396.9,10.74,23
483 | "482",5.70818,0,18.1,0,0.532,6.75,74.9,3.3317,24,666,20.2,393.07,7.74,23.7
484 | "483",5.73116,0,18.1,0,0.532,7.061,77,3.4106,24,666,20.2,395.28,7.01,25
485 | "484",2.81838,0,18.1,0,0.532,5.762,40.3,4.0983,24,666,20.2,392.92,10.42,21.8
486 | "485",2.37857,0,18.1,0,0.583,5.871,41.9,3.724,24,666,20.2,370.73,13.34,20.6
487 | "486",3.67367,0,18.1,0,0.583,6.312,51.9,3.9917,24,666,20.2,388.62,10.58,21.2
488 | "487",5.69175,0,18.1,0,0.583,6.114,79.8,3.5459,24,666,20.2,392.68,14.98,19.1
489 | "488",4.83567,0,18.1,0,0.583,5.905,53.2,3.1523,24,666,20.2,388.22,11.45,20.6
490 | "489",0.15086,0,27.74,0,0.609,5.454,92.7,1.8209,4,711,20.1,395.09,18.06,15.2
491 | "490",0.18337,0,27.74,0,0.609,5.414,98.3,1.7554,4,711,20.1,344.05,23.97,7
492 | "491",0.20746,0,27.74,0,0.609,5.093,98,1.8226,4,711,20.1,318.43,29.68,8.1
493 | "492",0.10574,0,27.74,0,0.609,5.983,98.8,1.8681,4,711,20.1,390.11,18.07,13.6
494 | "493",0.11132,0,27.74,0,0.609,5.983,83.5,2.1099,4,711,20.1,396.9,13.35,20.1
495 | "494",0.17331,0,9.69,0,0.585,5.707,54,2.3817,6,391,19.2,396.9,12.01,21.8
496 | "495",0.27957,0,9.69,0,0.585,5.926,42.6,2.3817,6,391,19.2,396.9,13.59,24.5
497 | "496",0.17899,0,9.69,0,0.585,5.67,28.8,2.7986,6,391,19.2,393.29,17.6,23.1
498 | "497",0.2896,0,9.69,0,0.585,5.39,72.9,2.7986,6,391,19.2,396.9,21.14,19.7
499 | "498",0.26838,0,9.69,0,0.585,5.794,70.6,2.8927,6,391,19.2,396.9,14.1,18.3
500 | "499",0.23912,0,9.69,0,0.585,6.019,65.3,2.4091,6,391,19.2,396.9,12.92,21.2
501 | "500",0.17783,0,9.69,0,0.585,5.569,73.5,2.3999,6,391,19.2,395.77,15.1,17.5
502 | "501",0.22438,0,9.69,0,0.585,6.027,79.7,2.4982,6,391,19.2,396.9,14.33,16.8
503 | "502",0.06263,0,11.93,0,0.573,6.593,69.1,2.4786,1,273,21,391.99,9.67,22.4
504 | "503",0.04527,0,11.93,0,0.573,6.12,76.7,2.2875,1,273,21,396.9,9.08,20.6
505 | "504",0.06076,0,11.93,0,0.573,6.976,91,2.1675,1,273,21,396.9,5.64,23.9
506 | "505",0.10959,0,11.93,0,0.573,6.794,89.3,2.3889,1,273,21,393.45,6.48,22
507 | "506",0.04741,0,11.93,0,0.573,6.03,80.8,2.505,1,273,21,396.9,7.88,11.9
508 |
--------------------------------------------------------------------------------
/Examples/LCA Bokeh.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import pandas as pd\n",
10 | "import numpy as np"
11 | ]
12 | },
13 | {
14 | "cell_type": "code",
15 | "execution_count": null,
16 | "metadata": {},
17 | "outputs": [],
18 | "source": [
19 | "#datapath = 'C:/Users/Ram/Documents/Ram/Data_Sets/'\n",
20 | "datapath = ''\n",
21 | "filename = ''\n",
22 | "sep = ','\n",
23 | "target = 'chas'"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": null,
29 | "metadata": {},
30 | "outputs": [],
31 | "source": [
32 | "df = pd.read_csv(datapath+filename,sep=sep)\n",
33 | "dft = df[:]\n",
34 | "print(df.shape)\n",
35 | "df.head(1)"
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": null,
41 | "metadata": {
42 | "scrolled": false
43 | },
44 | "outputs": [],
45 | "source": [
 46 |      "from autoviz import AutoViz_Class\n",
 47 |      "AV = AutoViz_Class()  # instantiate before calling AutoViz\n",
 48 |      "dft = AV.AutoViz(datapath+filename, sep, target, \"\", header=0, verbose=1, lowess=False, chart_format='bokeh', max_rows_analyzed=150000, max_cols_analyzed=30)\n"
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": null,
54 | "metadata": {},
55 | "outputs": [],
56 | "source": []
57 | },
58 | {
59 | "cell_type": "code",
60 | "execution_count": null,
61 | "metadata": {},
62 | "outputs": [],
63 | "source": []
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": null,
68 | "metadata": {},
69 | "outputs": [],
70 | "source": []
71 | }
72 | ],
73 | "metadata": {
74 | "kernelspec": {
75 | "display_name": "Python 3",
76 | "language": "python",
77 | "name": "python3"
78 | },
79 | "language_info": {
80 | "codemirror_mode": {
81 | "name": "ipython",
82 | "version": 3
83 | },
84 | "file_extension": ".py",
85 | "mimetype": "text/x-python",
86 | "name": "python",
87 | "nbconvert_exporter": "python",
88 | "pygments_lexer": "ipython3",
89 | "version": "3.10.13"
90 | }
91 | },
92 | "nbformat": 4,
93 | "nbformat_minor": 2
94 | }
95 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # AutoViz: The One-Line Automatic Data Visualization Library
2 |
3 | 
4 |
5 | Unlock the power of **AutoViz** to visualize any dataset, any size, with just a single line of code! Plus, now you can get a quick assessment of your dataset's quality and fix DQ issues through the FixDQ() function.
6 |
7 | [](https://pepy.tech/project/autoviz)
8 | [](https://pepy.tech/project/autoviz)
9 | [](https://pepy.tech/project/autoviz)
10 | [](https://github.com/RichardLitt/standard-readme)
11 | [](https://pypi.org/project/autoviz)
12 | [](https://pypi.org/project/autoviz)
13 | [](https://github.com/AutoViML/AutoViz/blob/master/LICENSE)
14 |
15 | With AutoViz, you can easily and quickly generate insightful visualizations for your data. Whether you're a beginner or an expert in data analysis, AutoViz can help you explore your data and uncover valuable insights. Try it out and see the power of automated visualization for yourself!
16 |
17 | ## Table of Contents
18 |
32 |
33 | ## Latest
 34 | The latest updates to the `autoviz` library can be found on the [Updates](updates.md) page.
35 |
 36 | ## Important Announcement
 37 | ### Starting with version 0.1.901, an important update
 38 | We're excited to announce significant updates to our `setup.py` script: it now leverages the latest versions of our dependencies while maintaining support for older Python versions. The installation process is seamless—simply run `pip install .` in the AutoViz directory, and the script takes care of the rest, tailoring the installation to your environment.
39 |
40 | ### Feedback
41 | Your feedback is crucial! If you encounter any issues or have suggestions, please let us know through [GitHub Issues](https://github.com/AutoViML/AutoViz/issues)
42 |
43 | Thank you for your continued support and happy visualizing!
44 |
45 | ## Citation
46 | If you use AutoViz in your research project or paper, please use the following format for citations:
47 | "Seshadri, Ram (2020). GitHub - AutoViML/AutoViz: Automatically Visualize any dataset, any size with a single line of code. source code: https://github.com/AutoViML/AutoViz"
 48 | Current citations for AutoViz are listed on [Google Scholar](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C31&q=autoviz&oq=autoviz).
51 |
52 | ## Motivation
 53 | The motivation behind AutoViz is to provide a more efficient, user-friendly, and automated approach to exploratory data analysis (EDA) through quick and easy data visualization plus data quality assessment. The library is designed to help users understand patterns, trends, and relationships in their data by creating insightful visualizations with minimal effort. AutoViz is particularly useful for beginners in data analysis, as it abstracts away the complexities of various plotting libraries and techniques. For experts, it provides one more tool that can surface insights in the data they may have missed.
54 |
55 | AutoViz is a powerful tool for generating insightful visualizations with minimal effort. Here are some of its key selling points compared to other automated EDA tools:
56 |
57 | - Ease of use: AutoViz is designed to be user-friendly and accessible to beginners in data analysis, abstracting away the complexities of various plotting libraries
58 | - Speed: AutoViz is optimized for speed and can generate multiple insightful plots with just a single line of code
59 | - Scalability: AutoViz is designed to work with datasets of any size and can handle large datasets efficiently
60 | - Automation: AutoViz automates the visualization process, requiring just a single line of code to generate multiple insightful plots
61 | - Customization: AutoViz provides several options for customizing the visualizations, such as changing the chart type, color palette, etc.
62 | - Data Quality: AutoViz now provides data quality assessment by default and helps you fix DQ issues with a single line of code using the FixDQ() function
63 |
64 | ## Installation
65 |
66 | **Prerequisites**
67 | - [Anaconda](https://docs.anaconda.com/anaconda/install/)
68 |
 69 | Create a new environment, clone AutoViz, and install its dependencies:
70 |
 71 | **From source (GitHub):**
 72 | ```sh
 73 | cd <AutoViz_Destination>
 74 | git clone git@github.com:AutoViML/AutoViz.git
 75 | # or download and unzip https://github.com/AutoViML/AutoViz/archive/master.zip
 76 | conda create -n <your_env_name> python=3.7 anaconda
 77 | conda activate <your_env_name> # ON WINDOWS: `source activate <your_env_name>`
 78 | cd AutoViz
 79 | ```
80 | For Python versions below 3.10, install dependencies as follows:
81 |
82 | ```
83 | pip install -r requirements.txt
84 | ```
85 |
86 | For Python 3.10, please use:
87 |
88 | ```
89 | pip install -r requirements-py310.txt
90 | ```
91 |
92 | For Python 3.11 and above, it's recommended to use:
93 |
94 | ```
95 | pip install -r requirements-py311.txt
96 | ```
97 |
98 | These requirement files ensure that AutoViz works seamlessly with your Python environment by installing compatible versions of libraries like HoloViews, Bokeh, and hvPlot. Please select the requirement file that corresponds to your Python version to enjoy a smooth experience with AutoViz.
99 |
100 | ## Usage
101 | Discover how to use AutoViz in this Medium article.
102 |
103 | In the AutoViz directory, open a Jupyter Notebook or a terminal and use the following code to instantiate the AutoViz_Class. You can run this code step by step:
104 |
105 | ```python
106 | from autoviz import AutoViz_Class
107 | AV = AutoViz_Class()
108 | dft = AV.AutoViz(filename)
109 | ```
110 |
111 | AutoViz accepts either a filename (in CSV, TXT, or JSON format) or a pandas DataFrame as input. If you have a large dataset, you can set the `max_rows_analyzed` and `max_cols_analyzed` arguments to speed up visualization by asking AutoViz to sample your dataset.
112 |
113 | AutoViz can also create charts in multiple formats using the `chart_format` setting:
114 | - If `chart_format ='png'` or `'svg'` or `'jpg'`: Matplotlib charts are plotted inline.
115 | * Can be saved locally (using `verbose=2` setting) or displayed (`verbose=1`) in Jupyter Notebooks.
116 | * This is the default behavior for AutoViz.
117 | - If `chart_format='bokeh'`: Interactive Bokeh charts are plotted in Jupyter Notebooks.
118 | - If `chart_format='server'`, dashboards will pop up for each kind of chart on your browser.
119 | - If `chart_format='html'`, interactive Bokeh charts will be created and silently saved as HTML files under the `AutoViz_Plots` directory (under working folder) or any other directory that you specify using the `save_plot_dir` setting (during input).
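For the formats that save files (`'html'`, or `verbose=2` with image formats), the destination directory resolves as described above; a minimal sketch of that fallback, with `plot_output_dir` as a hypothetical helper name:

```python
import os

# Hypothetical sketch: with save_plot_dir=None, charts land under
# ./AutoViz_Plots in the working directory; otherwise under the
# directory you pass in.
def plot_output_dir(save_plot_dir=None):
    if save_plot_dir is None:
        return os.path.join(".", "AutoViz_Plots")
    return save_plot_dir

print(plot_output_dir())
print(plot_output_dir("my_plots"))
```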
120 |
121 |
122 | ## API
123 | Arguments for `AV.AutoViz()` method:
124 |
125 | - `filename`: Use an empty string ("") if there's no associated filename and you want to use a dataframe; in that case, pass the dataframe via the `dfte` argument. Otherwise, provide a filename and leave the `dfte` argument empty. Only one of the two can be used.
126 | - `sep`: File separator (comma, semi-colon, tab, or any column-separating value) if you use a filename above.
127 | - `depVar`: Target variable in your dataset; set it as an empty string if not applicable.
128 | - `dfte`: The pandas dataframe to plot charts from; leave it as an empty string if using a filename.
129 | - `header`: Row number of the header row in your file (0 for the first row, which is the default).
130 | - `verbose`: 0 for minimal info and charts, 1 for more info and charts, or 2 for saving charts locally without display.
131 | - `lowess`: Use regression lines for each pair of continuous variables against the target variable in small datasets; avoid using for large datasets (>100,000 rows).
132 | - `chart_format`: 'svg', 'png', 'jpg', 'bokeh', 'server', or 'html' for displaying or saving charts in various formats, depending on the verbose option.
133 | - `max_rows_analyzed`: Limit the max number of rows to use for visualization when dealing with very large datasets (millions of rows). A statistically valid sample will be used by autoviz. Default is 150000 rows.
134 | - `max_cols_analyzed`: Limit the number of continuous variables to be analyzed. Default is 30 columns.
135 | - `save_plot_dir`: Directory for saving plots. Default is None, which saves plots under the current directory in a subfolder named AutoViz_Plots. If the save_plot_dir doesn't exist, it will be created.
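Internally, a supplied dataframe takes precedence over the filename (the source simply reassigns `filename = dfte` when `dfte` is a DataFrame). A minimal sketch of that resolution, with `resolve_input` as a hypothetical helper name:

```python
import pandas as pd

# Mirrors AutoViz's input resolution: a DataFrame passed via dfte wins;
# otherwise the filename string is used.
def resolve_input(filename, dfte):
    if isinstance(dfte, pd.DataFrame):
        return dfte
    return filename

df = pd.DataFrame({"a": [1, 2]})
print(resolve_input("", df) is df)            # True
print(resolve_input("data.csv", None))        # data.csv
```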
136 |
137 | ## Examples
138 | Here are some examples to help you get started with AutoViz. Full Jupyter notebooks with code samples can be found in the [Examples](https://github.com/AutoViML/AutoViz/tree/master/Examples) folder.
139 |
140 | ### Example 1: Visualize a CSV file with a target variable
141 |
142 | ```python
143 | from autoviz import AutoViz_Class
144 | AV = AutoViz_Class()
145 |
146 | filename = "your_file.csv"
147 | target_variable = "your_target_variable"
148 |
149 | dft = AV.AutoViz(
150 | filename,
151 | sep=",",
152 | depVar=target_variable,
153 | dfte=None,
154 | header=0,
155 | verbose=1,
156 | lowess=False,
157 | chart_format="svg",
158 | max_rows_analyzed=150000,
159 | max_cols_analyzed=30,
160 | save_plot_dir=None
161 | )
162 | ```
163 |
164 | 
165 |
166 | ### Example 2: Visualize a Pandas DataFrame without a target variable
167 |
168 | ```python
169 | import pandas as pd
170 | from autoviz import AutoViz_Class
171 |
172 | AV = AutoViz_Class()
173 |
174 | data = {'col1': [1, 2, 3, 4, 5], 'col2': [5, 4, 3, 2, 1]}
175 | df = pd.DataFrame(data)
176 |
177 | dft = AV.AutoViz(
178 | "",
179 | sep=",",
180 | depVar="",
181 | dfte=df,
182 | header=0,
183 | verbose=1,
184 | lowess=False,
185 | chart_format="server",
186 | max_rows_analyzed=150000,
187 | max_cols_analyzed=30,
188 | save_plot_dir=None
189 | )
190 |
191 | ```
192 |
193 | 
194 |
195 | ### Example 3: Generate interactive Bokeh charts and save them as HTML files in a custom directory
196 |
197 | ```python
198 | from autoviz import AutoViz_Class
199 | AV = AutoViz_Class()
200 |
201 | filename = "your_file.csv"
202 | target_variable = "your_target_variable"
203 | custom_plot_dir = "your_custom_plot_directory"
204 |
205 | dft = AV.AutoViz(
206 | filename,
207 | sep=",",
208 | depVar=target_variable,
209 | dfte=None,
210 | header=0,
211 | verbose=2,
212 | lowess=False,
213 | chart_format="bokeh",
214 | max_rows_analyzed=150000,
215 | max_cols_analyzed=30,
216 | save_plot_dir=custom_plot_dir
217 | )
218 | ```
219 |
220 | 
221 |
222 | These examples should give you an idea of how to use AutoViz with different scenarios and settings. By tailoring the options and settings, you can generate visualizations that best suit your needs, whether you're working with large datasets, interactive charts, or simply exploring the relationships between variables.
223 |
224 | ## Maintainers
225 | AutoViz is actively maintained and improved by a team of dedicated developers. If you have any questions, suggestions, or issues, feel free to reach out to the maintainers:
226 |
227 | - [@AutoViML](https://github.com/AutoViML)
228 | - [@morenoh149](https://github.com/morenoh149)
229 | - [@hironroy](https://github.com/hironroy)
230 |
231 | ## Contributing
232 | We welcome contributions from the community! If you're interested in contributing to AutoViz, please follow these steps:
233 |
234 | - Fork the repository on GitHub.
235 | - Clone your fork and create a new branch for your feature or bugfix.
236 | - Commit your changes to the new branch, ensuring that you follow coding standards and write appropriate tests.
237 | - Push your changes to your fork on GitHub.
238 | - Submit a pull request to the main repository, detailing your changes and referencing any related issues.
239 |
240 | See [the contributing file](CONTRIBUTING.md)!
241 |
242 | ## License
243 | AutoViz is released under the Apache License, Version 2.0. By using AutoViz, you agree to the terms and conditions specified in the license.
244 |
245 | ## Tips
246 | Here are some additional tips and reminders to help you make the most of the library:
247 |
248 | - **Make sure to regularly upgrade AutoViz** to benefit from the latest features, bug fixes, and improvements. You can update it using `pip install --upgrade autoviz`.
249 | - **AutoViz is highly customizable, so don't hesitate to explore and experiment with various settings**, such as `chart_format`, `verbose`, and `max_rows_analyzed`. This will allow you to create visualizations that better suit your specific needs and preferences.
250 | - **Remember to delete the AutoViz_Plots directory (or any custom directory you specified) periodically** if you used the `verbose=2` option, as it can accumulate a large number of saved charts over time.
251 | - **For further guidance or inspiration, check out the Medium article on AutoViz**, as well as other online resources and tutorials.
252 |
253 | - AutoViz will visualize any sized file using a statistically valid sample.
254 | - COMMA is the default separator in the file, but you can change it.
255 | - Assumes the first row as the header in the file, but this can be changed.
256 |
257 |
258 | - **By leveraging AutoViz's powerful and flexible features**, you can streamline your data visualization process and gain valuable insights more efficiently. Happy visualizing!
259 |
260 | ## DISCLAIMER
261 | This project is not an official Google project. It is not supported by Google, and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.
--------------------------------------------------------------------------------
/autoviz/AutoViz_Class.py:
--------------------------------------------------------------------------------
1 | ############################################################################
2 | # Copyright 2019 Google LLC
3 | #
4 | # Licensed under the Apache License, Version 2.0 (the "License");
5 | # you may not use this file except in compliance with the License.
6 | # You may obtain a copy of the License at
7 | #
8 | # https://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 | #################################################################################################
16 | import os
17 | import pandas as pd
18 | ########################################
19 | import warnings
20 | from sklearn.exceptions import DataConversionWarning
21 | ####################################################################################
22 | import matplotlib
23 | import seaborn as sns
24 | import copy
25 | import time
26 | import traceback
27 | from pandas_dq import Fix_DQ, dq_report
28 | ##########################################################################################
29 | from autoviz.AutoViz_Holo import AutoViz_Holo
30 | from autoviz.AutoViz_Utils import draw_pivot_tables, draw_scatters
31 | from autoviz.AutoViz_Utils import draw_pair_scatters, draw_barplots, draw_heatmap
32 | from autoviz.AutoViz_Utils import draw_distplot, draw_violinplot, draw_date_vars, draw_catscatterplots
33 | from autoviz.AutoViz_Utils import list_difference
34 | from autoviz.AutoViz_Utils import find_remove_duplicates, classify_print_vars
35 | from autoviz.AutoViz_Utils import left_subtract
36 | from autoviz.AutoViz_NLP import draw_word_clouds
37 | #######################################################################################
38 | sns.set(style="ticks", color_codes=True)
39 | matplotlib.use('agg')
40 | warnings.filterwarnings(action='ignore', category=DataConversionWarning)
41 | warnings.filterwarnings("ignore")
42 | #######################################################################################
43 | def warn(*args, **kwargs):
44 | pass
45 | warnings.warn = warn
46 | #######################################################################################
47 | class AutoViz_Class:
48 | """
49 | ##############################################################################
50 | ############# This is not an Officially Supported Google Product! ######
51 | ##############################################################################
52 | #Copyright 2019 Google LLC ######
53 | # ######
54 | #Licensed under the Apache License, Version 2.0 (the "License"); ######
55 | #you may not use this file except in compliance with the License. ######
56 | #You may obtain a copy of the License at ######
57 | # ######
58 | # https://www.apache.org/licenses/LICENSE-2.0 ######
59 | # ######
60 | #Unless required by applicable law or agreed to in writing, software ######
61 | #distributed under the License is distributed on an "AS IS" BASIS, ######
62 | #WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.#####
63 | #See the License for the specific language governing permissions and ######
64 | #limitations under the License. ######
65 | ##############################################################################
66 | ########### AutoViz Class ######
67 | ########### by Ram Seshadri ######
68 | ########### AUTOMATICALLY VISUALIZE ANY DATA SET ######
69 | ########### Version V0.0.68 1/10/20 ######
70 | ##############################################################################
71 | ##### AUTOVIZ PERFORMS AUTOMATIC VISUALIZATION OF ANY DATA SET WITH ONE CLICK.
72 | ##### Give it any input file (CSV, txt or json) and AV will visualize it.##
73 | ##### INPUTS: #####
74 | ##### A FILE NAME OR A DATA FRAME AS INPUT. #####
75 | ##### AutoViz will visualize any sized file using a statistically valid sample.
76 | ##### - COMMA is assumed as default separator in file. But u can change it.##
77 | ##### - Assumes first row as header in file, but you can change it. ####
78 | ##### - First instantiate an AutoViz class to hold output of charts, plots.#
79 | ##### - Then call the Autoviz program with inputs as defined below. ###
80 | ##############################################################################
81 | """
82 |
83 | def __init__(self):
84 | self.overall = {
85 | 'name': 'overall',
86 | 'plots': [],
87 | 'heading': [],
88 | 'subheading': [], # "\n".join(subheading)
89 | 'desc': [], # "\n".join(subheading)
90 | 'table1_title': "",
91 | 'table1': [],
92 | 'table2_title': "",
93 | 'table2': []
94 | } ### This is for overall description and comments about the data set
95 | self.scatter_plot = {
96 | 'name': 'scatter',
97 | 'heading': 'Scatter Plot of each Continuous Variable against Target Variable',
98 | 'plots': [],
99 | 'subheading': [], # "\n".join(subheading)
100 | 'desc': [] # "\n".join(desc)
101 | } ##### This is for description and images for scatter plots ###
102 | self.pair_scatter = {
103 | 'name': 'pair-scatter',
104 | 'heading': 'Pairwise Scatter Plot of each Continuous Variable against other Continuous Variables',
105 | 'plots': [],
106 | 'subheading': [], # "\n".join(subheading)
107 | 'desc': [] # "\n".join(desc)
108 | } ##### This is for description and images for pairs of scatter plots ###
109 | self.dist_plot = {
110 | 'name': 'distribution',
111 | 'heading': 'Distribution Plot of Target Variable',
112 | 'plots': [],
113 | 'subheading': [], # "\n".join(subheading)
114 | 'desc': [] # "\n".join(desc)
115 | } ##### This is for description and images for distribution plots ###
116 | self.pivot_plot = {
117 | 'name': 'pivot',
118 | 'heading': 'Pivot Plots of all Continuous Variable',
119 | 'plots': [],
120 | 'subheading': [], # "\n".join(subheading)
121 | 'desc': [] # "\n".join(desc)
122 | } ##### This is for description and images for pivot plots ###
123 | self.violin_plot = {
124 | 'name': 'violin',
125 | 'heading': 'Violin Plots of all Continuous Variable',
126 | 'plots': [],
127 | 'subheading': [], # "\n".join(subheading)
128 | 'desc': [] # "\n".join(desc)
129 | } ##### This is for description and images for violin plots ###
130 | self.heat_map = {
131 | 'name': 'heatmap',
132 | 'heading': 'Heatmap of all Continuous Variables for target Variable',
133 | 'plots': [],
134 | 'subheading': [], # "\n".join(subheading)
135 | 'desc': [] # "\n".join(desc)
136 | } ##### This is for description and images for heatmaps ###
137 | self.bar_plot = {
138 | 'name': 'bar',
139 | 'heading': 'Bar Plots of Average of each Continuous Variable by Target Variable',
140 | 'plots': [],
141 | 'subheading': [], # "\n".join(subheading)
142 | 'desc': [] # "\n".join(desc)
143 | } ##### This is for description and images for bar plots ###
144 | self.date_plot = {
145 | 'name': 'time-series',
146 | 'heading': 'Time Series Plots of Two Continuous Variables against a Date/Time Variable',
147 | 'plots': [],
148 | 'subheading': [], # "\n".join(subheading)
149 | 'desc': [] # "\n".join(desc)
150 | } ######## This is for description and images for date time plots ###
151 | self.wordcloud = {
152 | 'name': 'wordcloud',
153 | 'heading': 'Word Cloud Plots of NLP or String vars',
154 | 'plots': [],
155 | 'subheading': [], # "\n".join(subheading)
156 | 'desc': [] # "\n".join(desc)
157 | } ######## This is for description and images for word cloud plots ###
158 | self.catscatter_plot = {
159 | 'name': 'catscatter',
160 | 'heading': 'Cat-Scatter Plots of categorical vars',
161 | 'plots': [],
162 | 'subheading': [], # "\n".join(subheading)
163 | 'desc': [] # "\n".join(desc)
164 | } ######## This is for description and images for catscatter plots ###
165 |
166 | def add_plots(self, plotname, X):
167 | """
168 | This is a simple program to append the input chart to the right variable named plotname
169 | which is an attribute of class AV. So make sure that the plotname var matches an exact
170 | variable name defined in class AV. Otherwise, this will give an error.
171 | """
172 | if X is None:
173 | ### If there is nothing to add, leave it as it is.
174 | # print("Nothing to add Plot not being added")
175 | pass
176 | else:
177 | getattr(self, plotname)["plots"].append(X)
178 |
179 | def add_subheading(self, plotname, X):
180 | """
181 | This is a simple program to append the input chart to the right variable named plotname
182 | which is an attribute of class AV. So make sure that the plotname var matches an exact
183 | variable name defined in class AV. Otherwise, this will give an error.
184 | """
185 | if X is None:
186 | ### If there is nothing to add, leave it as it is.
187 | pass
188 | else:
189 | getattr(self, plotname)["subheading"].append(X)
190 |
191 | def AutoViz(self, filename: (str or pd.DataFrame), sep=',', depVar='', dfte=None, header=0, verbose=1,
192 | lowess=False, chart_format='svg', max_rows_analyzed=150000,
193 | max_cols_analyzed=30, save_plot_dir=None):
194 | """
195 | ##############################################################################
196 | ##### AUTOVIZ PERFORMS AUTOMATIC VISUALIZATION OF ANY DATA SET WITH ONE CLICK.
197 | ##### Give it any input file (CSV, txt or json) and AV will visualize it.##
198 | ##### INPUTS: #####
199 | ##### A FILE NAME OR A DATA FRAME AS INPUT. #####
200 | ##### AutoViz will visualize any sized file using a statistically valid sample.
201 | ##### - max_rows_analyzed = 150000 ### this limits the max number of rows ###
202 | ##### that is used to display charts ###
203 | ##### - max_cols_analyzed = 30 ### This limits the number of continuous ###
204 | ##### vars that can be analyzed ####
205 | ##### - COMMA is assumed as default separator in file. But u can change it.##
206 | ##### - Assumes first row as header in file, but you can change it. ####
207 | ##### - First instantiate an AutoViz class to hold output of charts, plots.#
208 | ##### - Then call the Autoviz program with inputs as defined below. ###
209 | ##############################################################################
210 | ##### This is the main calling program in AV. It will call all the load, #####
211 | #### display and save programs that are currently outside AV. This program ###
212 | #### will draw scatter and other plots for the input data set and then ####
213 | #### call the correct variable name with add_plots function and send in ####
214 | #### the chart created by that plotting program, for example, scatter #####
215 | #### You have to make sure that add_plots function has the exact name of ####
216 | #### the variable defined in the Class AV. If not, this will give an error.##
217 | #### If verbose=0: it does not print any messages and goes into silent mode##
218 | #### This is the default. #####
219 | #### If verbose=1, it will print messages on the terminal and also display###
220 | #### charts on terminal #####
221 | #### If verbose=2, it will print messages but will not display charts, #####
222 | #### it will simply save them. #####
223 | ##############################################################################
224 | """
225 | if isinstance(dfte, pd.DataFrame): ### if there is a dataframe, choose it
226 | filename = dfte
227 |
228 | if isinstance(depVar, list):
229 | print('Since AutoViz cannot visualize multi-label targets, choosing first item in targets: %s' % depVar[0])
230 | dep_var = depVar[0]
231 | else:
232 | dep_var = copy.deepcopy(depVar)
233 | ####################################################################################
234 | if chart_format.lower() in ['bokeh', 'server', 'bokeh_server', 'bokeh-server', 'html']:
235 | dft = AutoViz_Holo(filename, sep, dep_var, header, verbose,
236 | lowess, chart_format, max_rows_analyzed,
237 | max_cols_analyzed, save_plot_dir)
238 | else:
239 | dft = self.AutoViz_Main(filename, sep, dep_var, header, verbose,
240 | lowess, chart_format, max_rows_analyzed,
241 | max_cols_analyzed, save_plot_dir)
242 | return dft
243 |
244 | def AutoViz_Main(self, filename: str or pd.DataFrame, sep=',', dep_var='', header=0, verbose=0,
245 | lowess=False, chart_format='svg', max_rows_analyzed=150000,
246 | max_cols_analyzed=30, save_plot_dir=None):
247 | """
248 | ##############################################################################
249 | ##### AUTOVIZ_MAIN PERFORMS AUTO VISUALIZATION OF ANY DATA USING MATPLOTLIB ##
250 | ##############################################################################
251 | """
252 | ######### create a directory to save all plots generated by autoviz ############
253 | ############ THis is where you save the figures in a target directory #######
254 | target_dir = 'AutoViz'
255 |
256 | if dep_var is not None:
257 | if isinstance(dep_var, list):
258 | target_dir = dep_var[0]
259 | elif isinstance(dep_var, str):
260 | if dep_var != '':
261 | target_dir = copy.deepcopy(dep_var)
262 | if save_plot_dir is None:
263 | mk_dir = os.path.join(".", "AutoViz_Plots")
264 | else:
265 | mk_dir = copy.deepcopy(save_plot_dir)
266 | if verbose == 2 and not os.path.isdir(mk_dir):
267 | os.mkdir(mk_dir)
268 | mk_dir = os.path.join(mk_dir, target_dir)
269 | if verbose == 2 and not os.path.isdir(mk_dir):
270 | os.mkdir(mk_dir)
271 | ############ Start the clock here and classify variables in data set first ########
272 | start_time = time.time()
273 |
274 | (dft, dep_var, id_cols, bool_vars, cats, continuous_vars, discrete_string_vars, date_vars, classes,
275 | problem_type, selected_cols) = classify_print_vars(filename, sep, max_rows_analyzed, max_cols_analyzed,
276 | dep_var, header, verbose)
277 |
278 | ########### This is where perform data quality checks on data ################
279 | if verbose >= 1:
280 | print('To fix these data quality issues in the dataset, import FixDQ from autoviz...')
281 | #### Run the Data Cleaning suggestions report now ############
282 |
283 | if dep_var is not None:
284 | if isinstance(dep_var, list):
285 | remaining_vars = left_subtract(list(dft), dep_var)
286 | if len(remaining_vars) == len(list(dft)):
287 | print('depVar %s not found in given dataset. Please check your input and try again' % dep_var)
288 | return dft
289 | ### run the data cleaning report with a multi-label list of targets ##
290 | data_cleaning_suggestions(dft, target=dep_var)
291 | else:
292 | ### run the data cleaning report with a single-label target ##
293 | data_cleaning_suggestions(dft, target=dep_var)
294 | else:
295 | ### run data cleaning report with no target ####
296 | data_cleaning_suggestions(dft, target='')
297 |
298 | ##### This is where we start plotting different kinds of charts depending on dependent variables
299 | if dep_var is None or dep_var == '':
300 | ##### This is when No dependent Variable is given #######
301 | if len(continuous_vars) > 1:
302 | try:
303 | svg_data = draw_pair_scatters(dft, continuous_vars, problem_type, verbose, chart_format,
304 | dep_var, classes, lowess, mk_dir)
305 | self.add_plots('pair_scatter', svg_data)
306 | except Exception as e:
307 | print(e)
308 | print('Could not draw Pair Scatter Plots')
309 | try:
310 | svg_data = draw_distplot(dft, bool_vars + cats, continuous_vars, verbose, chart_format, problem_type,
311 | dep_var, classes, mk_dir)
312 | self.add_plots('dist_plot', svg_data)
313 | except Exception as e:
314 | print(f'Could not draw Distribution Plot. {e}')
315 | try:
316 | if len(continuous_vars) > 0:
317 | svg_data = draw_violinplot(dft, dep_var, continuous_vars, verbose, chart_format, problem_type,
318 | mk_dir)
319 | self.add_plots('violin_plot', svg_data)
320 | else:
321 | svg_data = draw_pivot_tables(dft, problem_type, verbose,
322 | chart_format, dep_var, mk_dir)
323 | self.add_plots('pivot_plot', svg_data)
324 | except Exception as e:
325 | print(f'Could not draw Distribution Plots {e}')
326 | try:
327 | #### Since there is no dependent variable in this dataset, you can leave it as-is
328 | numeric_cols = dft.select_dtypes(include='number').columns.tolist()
329 | numeric_cols = list_difference(numeric_cols, date_vars)
330 | svg_data = draw_heatmap(dft, numeric_cols, verbose, chart_format, date_vars, dep_var,
331 | problem_type, mk_dir)
332 | self.add_plots('heat_map', svg_data)
333 | except Exception as e:
334 | print(f'Could not draw Heat Map {e}')
335 | if date_vars != [] and len(continuous_vars) > 0:
336 | try:
337 | svg_data = draw_date_vars(dft, dep_var, date_vars,
338 | continuous_vars, verbose, chart_format, problem_type, mk_dir)
339 | self.add_plots('date_plot', svg_data)
340 | except Exception as e:
341 | print(f'Could not draw Date Vars {e}')
342 | if len(continuous_vars) > 0 and len(cats) > 0:
343 | try:
344 | svg_data = draw_barplots(dft, cats, continuous_vars, problem_type,
345 | verbose, chart_format, dep_var, mk_dir)
346 | self.add_plots('bar_plot', svg_data)
347 | except Exception as e:
348 | print(f'Could not draw Bar Plots {e}')
349 | else:
350 | if len(cats) > 1:
351 | try:
352 | svg_data = draw_catscatterplots(dft, cats, verbose,
353 | chart_format, mk_dir=None)
354 | self.add_plots('catscatter_plot', svg_data)
355 | except Exception as e:
356 | print(f'Could not draw catscatter plots. {e}')
357 | else:
358 | if problem_type == 'Regression':
359 | ############## This is a Regression Problem #################
360 | if len(continuous_vars) > 0:
361 | try:
362 | svg_data = draw_scatters(dft,
363 | continuous_vars, verbose, chart_format, problem_type, dep_var, classes,
364 | lowess, mk_dir)
365 | self.add_plots('scatter_plot', svg_data)
366 | except Exception as e:
367 | print("Exception Drawing Scatter Plots")
368 | print(e)
369 | traceback.print_exc()
370 | print('Could not draw Scatter Plots')
371 | if len(continuous_vars) > 1:
372 | try:
373 | svg_data = draw_pair_scatters(dft, continuous_vars, problem_type, verbose, chart_format,
374 | dep_var, classes, lowess, mk_dir)
375 | self.add_plots('pair_scatter', svg_data)
376 | except Exception as e:
377 | print(f'Could not draw Pair Scatter Plots {e}')
378 | try:
379 | if type(dep_var) == str:
380 | othernums = [x for x in continuous_vars if x not in [dep_var]]
381 | else:
382 | othernums = [x for x in continuous_vars if x not in dep_var]
383 | if len(othernums) >= 1:
384 | svg_data = draw_distplot(dft, bool_vars + cats, continuous_vars, verbose, chart_format,
385 | problem_type, dep_var, classes, mk_dir)
386 | self.add_plots('dist_plot', svg_data)
387 | except Exception as e:
388 | print(f'Could not draw some Distribution Plots {e}')
389 | try:
390 | if len(continuous_vars) > 0:
391 | svg_data = draw_violinplot(dft, dep_var, continuous_vars, verbose, chart_format, problem_type,
392 | mk_dir)
393 | self.add_plots('violin_plot', svg_data)
394 | except Exception as e:
395 | print(f'Could not draw Violin Plots {e}')
396 | try:
397 | numeric_cols = [x for x in dft.select_dtypes(include='number').columns.tolist() if
398 | x not in [dep_var]]
399 | numeric_cols = list_difference(numeric_cols, date_vars)
400 | svg_data = draw_heatmap(dft,
401 | numeric_cols, verbose, chart_format, date_vars, dep_var,
402 | problem_type, mk_dir)
403 | self.add_plots('heat_map', svg_data)
404 | except Exception as e:
405 | print(f'Could not draw some Heat Maps {e}')
406 | if date_vars != [] and len(continuous_vars) > 0:
407 | try:
408 | svg_data = draw_date_vars(
409 | dft, dep_var, date_vars, continuous_vars, verbose, chart_format, problem_type, mk_dir)
410 | self.add_plots('date_plot', svg_data)
411 | except Exception as e:
412 | print(f'Could not draw some Time Series plots. {e}')
413 | if len(cats) > 0 and len(continuous_vars) == 0:
414 | ### This is somewhat duplicative with distplot (above) - hence do it only minimally!
415 | try:
416 | svg_data = draw_pivot_tables(dft, problem_type, verbose,
417 | chart_format, dep_var, mk_dir)
418 | self.add_plots('pivot_plot', svg_data)
419 | except Exception as e:
420 | print(f'Could not draw some Pivot Charts against Dependent Variable {e}')
421 | if len(continuous_vars) > 0 and len(cats) > 0:
422 | try:
423 | svg_data = draw_barplots(dft, find_remove_duplicates(cats + bool_vars), continuous_vars,
424 | problem_type, verbose, chart_format, dep_var, mk_dir)
425 | self.add_plots('bar_plot', svg_data)
426 | # self.add_plots('bar_plot',None)
427 | except Exception as e:
428 | print(f'Could not draw some Bar Charts {e}')
429 | else:
430 | if len(cats) > 1:
431 | try:
432 | svg_data = draw_catscatterplots(dft, cats, verbose,
433 | chart_format, mk_dir=None)
434 | self.add_plots('catscatter_plot', svg_data)
435 | except Exception as e:
436 | print(f'Could not draw catscatter plots... {e}')
437 | else:
438 | ############ This is a Classification Problem ##################
439 | if len(continuous_vars) > 0:
440 | try:
441 | svg_data = draw_scatters(dft, continuous_vars,
442 | verbose, chart_format, problem_type, dep_var, classes, lowess, mk_dir)
443 | self.add_plots('scatter_plot', svg_data)
444 | except Exception as e:
445 | print(e)
446 | traceback.print_exc()
447 | print('Could not draw some Scatter Plots')
448 | if len(continuous_vars) > 1:
449 | try:
450 | svg_data = draw_pair_scatters(dft, continuous_vars,
451 | problem_type, verbose, chart_format, dep_var, classes, lowess,
452 | mk_dir)
453 | self.add_plots('pair_scatter', svg_data)
454 | except Exception as e:
455 | print(f'Could not draw some Pair Scatter Plots {e}')
456 | try:
457 | if type(dep_var) == str:
458 | othernums = [x for x in continuous_vars if x not in [dep_var]]
459 | else:
460 | othernums = [x for x in continuous_vars if x not in dep_var]
461 |
462 | if len(othernums) >= 1:
463 | svg_data = draw_distplot(dft, bool_vars + cats, continuous_vars, verbose, chart_format,
464 | problem_type, dep_var, classes, mk_dir)
465 | self.add_plots('dist_plot', svg_data)
466 | else:
467 | print('No continuous var in data set: drawing categorical distribution plots')
468 | except Exception as e:
469 | print(f'Could not draw some Distribution Plots {e}')
470 | try:
471 | if len(continuous_vars) > 0:
472 | svg_data = draw_violinplot(dft, dep_var, continuous_vars, verbose, chart_format, problem_type,
473 | mk_dir)
474 | self.add_plots('violin_plot', svg_data)
475 | except Exception as e:
476 | print(f'Could not draw some Violin Plots {e}')
477 | try:
478 | numeric_cols = [x for x in dft.select_dtypes(include='number').columns.tolist() if
479 | x not in [dep_var]]
480 | numeric_cols = list_difference(numeric_cols, date_vars)
481 | svg_data = draw_heatmap(dft, numeric_cols,
482 | verbose, chart_format, date_vars, dep_var, problem_type,
483 | mk_dir)
484 | self.add_plots('heat_map', svg_data)
485 | except Exception as e:
486 | print(f'Could not draw some Heat Maps {e}')
487 | if date_vars != [] and len(continuous_vars) > 0:
488 | try:
489 | svg_data = draw_date_vars(dft, dep_var, date_vars,
490 | continuous_vars, verbose, chart_format, problem_type, mk_dir)
491 | self.add_plots('date_plot', svg_data)
492 | except Exception as e:
493 | print(f'Could not draw some Time Series plots. {e}')
494 | if len(cats) > 0 and len(continuous_vars) == 0:
495 | ### This is somewhat duplicative with distplot (above) - hence do it only minimally!
496 | try:
497 | svg_data = draw_pivot_tables(dft, problem_type, verbose,
498 | chart_format, dep_var, mk_dir)
499 | self.add_plots('pivot_plot', svg_data)
500 | except Exception as e:
501 | print(f'Could not draw some Pivot Charts against Dependent Variable {e}')
502 | if len(continuous_vars) > 0 and len(cats) > 0:
503 | try:
504 | svg_data = draw_barplots(dft, find_remove_duplicates(cats + bool_vars), continuous_vars,
505 | problem_type,
506 | verbose, chart_format, dep_var, mk_dir)
507 | self.add_plots('bar_plot', svg_data)
508 | pass
509 | except Exception as e:
510 | if verbose <= 1:
511 | print(f'Could not draw some Bar Charts {e}')
512 | pass
513 | else:
514 | if len(cats) > 1:
515 | try:
516 | svg_data = draw_catscatterplots(dft, cats, verbose,
517 | chart_format, mk_dir=None)
518 | self.add_plots('catscatter_plot', svg_data)
519 | except Exception as e:
520 | print(f'Could not draw catscatter plots. {e}')
521 | ###### Now you can check for NLP vars or discrete_string_vars to do wordcloud #######
522 | if len(discrete_string_vars) > 0:
523 | plotname = 'wordcloud'
524 | import nltk
525 | nltk.download('popular')
526 | for each_string_var in discrete_string_vars:
527 | try:
528 | svg_data = draw_word_clouds(dft, each_string_var, chart_format, plotname,
529 | dep_var, problem_type, classes, mk_dir, verbose=0)
530 | self.add_plots(plotname, svg_data)
531 | except Exception as e:
532 | print(f'Could not draw wordcloud plot for {each_string_var}. {e}')
533 | ### Now print the time taken to run charts for AutoViz #############
534 | if verbose <= 1:
535 | print('All Plots done')
536 | else:
537 | print('All Plots are saved in %s' % mk_dir)
538 | print('Time to run AutoViz = %0.0f seconds ' % (time.time() - start_time))
539 | if verbose <= 1:
540 | print('\n ###################### AUTO VISUALIZATION Completed ########################')
541 | return dft
542 |
543 |
544 | #############################################################################################
545 |
546 |
547 | # Create a new class FixDQ by inheriting from Fix_DQ
548 | class FixDQ(Fix_DQ):
549 | """
550 |     FixDQ cleans an entire training data set and applies the same steps to a
551 |     test dataset in an MLOps pipeline. FixDQ detects most issues in
552 |     your data (similar to data_cleaning_suggestions but without the `target`
553 |     related issues) in one step. It finds those issues during the
554 |     `fit` method and fixes them in the `transform` method. The fitted
555 |     transformer can then be saved (or "pickled") and applied to test data,
556 |     either at the same time or later.
557 | 
558 |     FixDQ performs the following data quality cleaning steps:
559 |     - removes ID columns from further processing
560 |     - removes zero-variance columns from further processing
561 |     - identifies rare categories and groups them into a single
562 |       category called "Rare"
563 |     - replaces infinite values with an upper bound based on the
564 |       Inter-Quartile Range
565 |     - detects mixed data types and drops those mixed-type columns
566 |       from further processing
567 |     - detects outliers and suggests removing them or using robust statistics
568 |     - detects high-cardinality features but leaves them as they are
569 |     - detects highly correlated features and drops one of them
570 |       (whichever comes first in the column sequence)
571 |     - detects duplicate rows and keeps only one copy of each
572 |       duplicated row
573 |     - detects duplicate columns and keeps only one copy
574 |     - detects skewed distributions and applies log or box-cox
575 |       transformations on them
576 |     - detects imbalanced classes and leaves them as they are
577 |     - detects feature leakage and drops a feature if it is
578 |       highly correlated to the target
579 | """
580 |
581 | def __init__(self, quantile=0.87, cat_fill_value='missing',
582 | num_fill_value=9999, rare_threshold=0.01,
583 | correlation_threshold=0.9):
584 | super().__init__() # Call the parent class constructor
585 | # Additional initialization code here
586 | self.quantile = quantile
587 | self.cat_fill_value = cat_fill_value
588 | self.num_fill_value = num_fill_value
589 | self.rare_threshold = rare_threshold
590 | self.correlation_threshold = correlation_threshold
591 |
592 |
593 | ###################################################################################
594 |
595 |
596 | def data_cleaning_suggestions(df, target=None):
597 | """
598 | This is a simple program to give data cleaning and improvement suggestions in class AV.
599 |     Make sure you send in a dataframe; otherwise it prints a message and returns None.
600 |     """
601 |     if isinstance(df, pd.DataFrame):
602 |         dqr = dq_report(data=df, target=target, html=False, csv_engine="pandas", verbose=1)
603 |         return dqr
604 |     print("Input must be a dataframe. Please check input and try again.")
605 |     return None
606 | ###################################################################################
607 |
--------------------------------------------------------------------------------
/autoviz/AutoViz_NLP.py:
--------------------------------------------------------------------------------
1 | """
2 | Copyright 2020 Google LLC. This software is provided as-is, without warranty or
3 | representation for any use or purpose. Your use of it is subject to your
4 | agreement with Google.
5 | """
6 | import pandas as pd
7 | import string
8 | import re
9 | from collections import Counter
10 |
11 | from .AutoViz_Utils import save_image_data
12 |
13 | pd.set_option('display.max_colwidth', 5000)
14 |
15 | # Contraction map
16 | c_dict = {
17 | "ain't": "am not",
18 | "aren't": "are not",
19 | "cant": "cannot",
20 | "can't": "cannot",
21 | "can't've": "cannot have",
22 | "'cause": "because",
23 | "b": "be",
24 | "bc": "because",
25 | "becos": "because",
26 | "bs": "Expletive",
27 | "cause": "because",
28 | "could've": "could have",
29 | "couldn't": "could not",
30 | "couldn't've": "could not have",
31 | "corp": "corporation",
32 | "cud": "could",
33 | "didn't": "did not",
34 | "doesn't": "does not",
35 | "don't": "do not",
36 | "execs": "executives",
37 | "fck": "fuck",
38 | "fcking": "fucking",
39 | "gon na": "going to",
40 | "hadn't": "had not",
41 | "hadn't've": "had not have",
42 | "hasn't": "has not",
43 | "haven't": "have not",
44 | "he'd": "he would",
45 | "he'd've": "he would have",
46 | "he'll": "he will",
47 | "he'll've": "he will have",
48 | "he's": "he is",
49 | "how'd": "how did",
50 | "how'd'y": "how do you",
51 | "how'll": "how will",
52 | "how's": "how is",
53 | "im": "i am",
54 | "iam": "i am",
55 | "i'd": "I would",
56 | "i'd've": "I would have",
57 | "i'll": "I will",
58 | "i'll've": "I will have",
59 | "i'm": "I am",
60 | "i've": "I have",
61 | "isn't": "is not",
62 | "it'd": "it had",
63 | "it'd've": "it would have",
64 | "it'll": "it will",
65 | "it'll've": "it will have",
66 | "it's": "it is",
67 | "let's": "let us",
68 | "ma'am": "madam",
69 | "mayn't": "may not",
70 | "mgr": "manager",
71 | "might've": "might have",
72 | "mightn't": "might not",
73 | "mightn't've": "might not have",
74 | "must've": "must have",
75 | "mustn't": "must not",
76 | "mustn't've": "must not have",
77 | "needn't": "need not",
78 | "needn't've": "need not have",
79 | "o'clock": "of the clock",
80 | "ofc": "office",
81 | "oughtn't": "ought not",
82 | "oughtn't've": "ought not have",
83 | "pics": "pictures",
84 | "shan't": "shall not",
85 | "sha'n't": "shall not",
86 | "shan't've": "shall not have",
87 | "she'd": "she would",
88 | "she'd've": "she would have",
89 | "she'll": "she will",
90 | "she'll've": "she will have",
91 | "she's": "she is",
92 | "should've": "should have",
93 | "shouldn't": "should not",
94 | "shouldn't've": "should not have",
95 | "so've": "so have",
96 | "so's": "so is",
97 | "svc": "service",
98 | "that'd": "that would",
99 | "that'd've": "that would have",
100 | "that's": "that is",
101 | "there'd": "there had",
102 | "there'd've": "there would have",
103 | "there's": "there is",
104 | "they'd": "they would",
105 | "they'd've": "they would have",
106 | "they'll": "they will",
107 | "they'll've": "they will have",
108 | "they're": "they are",
109 | "they've": "they have",
110 | "tho": "though",
111 | "to've": "to have",
112 | "wan na": "want to",
113 | "wasn't": "was not",
114 | "we'd": "we had",
115 | "we'd've": "we would have",
116 | "we'll": "we will",
117 | "we'll've": "we will have",
118 | "we're": "we are",
119 | "we've": "we have",
120 | "weren't": "were not",
121 | "what'll": "what will",
122 | "what'll've": "what will have",
123 | "what're": "what are",
124 | "what's": "what is",
125 | "what've": "what have",
126 | "when's": "when is",
127 | "when've": "when have",
128 | "where'd": "where did",
129 | "where's": "where is",
130 | "where've": "where have",
131 | "who'll": "who will",
132 | "who'll've": "who will have",
133 | "who's": "who is",
134 | "who've": "who have",
135 | "why's": "why is",
136 | "why've": "why have",
137 | "will've": "will have",
138 | "won't": "will not",
139 | "won't've": "will not have",
140 | "would've": "would have",
141 | "wouldn't": "would not",
142 | "wouldn't've": "would not have",
143 | "y'all": "you all",
144 | "y'alls": "you alls",
145 | "y'all'd": "you all would",
146 | "y'all'd've": "you all would have",
147 | "y'all're": "you all are",
148 | "y'all've": "you all have",
149 | "you'd": "you had",
150 | "you'd've": "you would have",
151 |     "you'll": "you will",
152 |     "you'll've": "you will have",
153 | "you're": "you are",
154 | "you've": "you have"
155 | }
156 |
157 |
158 | ##################################################################################
159 | def left_subtract(l1, l2):
160 | lst = []
161 | for i in l1:
162 | if i not in l2:
163 | lst.append(i)
164 | return lst
165 |
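`left_subtract` keeps the first list's order while dropping anything present in the second; a plain set difference would lose both the order and any duplicates in `l1`. A self-contained sketch of the helper above:

```python
def left_subtract(l1, l2):
    # keep items of l1 that are absent from l2, preserving l1's order
    lst = []
    for i in l1:
        if i not in l2:
            lst.append(i)
    return lst

print(left_subtract(['b', 'a', 'c', 'a'], ['a']))  # → ['b', 'c']
```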
166 |
167 | ################################################################################
168 | def return_stop_words():
169 | STOP_WORDS = ['it', "this", "that", "to", 'its', 'am', 'is', 'are', 'was', 'were', 'a',
170 | 'an', 'the', 'and', 'or', 'of', 'at', 'by', 'for', 'with', 'about', 'between',
171 | 'into', 'above', 'below', 'from', 'up', 'down', 'in', 'out', 'on', 'over', 'will', 'shall', 'could',
172 | 'under', 'again', 'further', 'then', 'once', 'all', 'any', 'both', 'each', 'would',
173 | 'few', 'more', 'most', 'other', 'some', 'such', 'only', 'own', 'same', 'so',
174 | 'than', 'too', 'very', 's', 't', 'can', 'just', 'd', 'll', 'm', 'o', 're',
175 | 've', 'y', 'ain', 'ma', 'them', 'themselves', 'they', 'he', 'she', 'ex', 'become', 'their']
176 | add_words = ["s", "m", 'you', 'not', 'get', 'no', 'via', 'one', 'still', 'us', 'u', 'hey', 'hi', 'oh', 'jeez',
177 | 'the', 'a', 'in', 'to', 'of', 'i', 'and', 'is', 'for', 'on', 'it', 'got', 'aww', 'awww',
178 | 'not', 'my', 'that', 'by', 'with', 'are', 'at', 'this', 'from', 'be', 'have', 'was',
179 | '', ' ', 'say', 's', 'u', 'ap', 'afp', '...', 'n', '\\']
180 | # stopWords = text.ENGLISH_STOP_WORDS.union(add_words)
181 | stop_words = list(set(STOP_WORDS + add_words))
182 | excl = ['will', "i'll", 'shall', "you'll", 'may', "don't", "hadn't", "hasn't", "haven't",
183 | "don't", "isn't", 'if', "mightn't", "mustn'", "mightn't", 'mightn', "needn't",
184 | 'needn', "needn't", 'no', 'not', 'shan', "shan't", 'shouldn', "shouldn't", "wasn't",
185 | 'wasn', 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't", "you'd",
186 | "you'd", "you'll", "you're", 'yourself', 'yourselves']
187 | stopWords = left_subtract(stop_words, excl)
188 | return sorted(stopWords)
189 |
190 |
191 | ##################################################################################
192 | def expandContractions(text):
193 | """
194 |     Takes in a sentence, splits it into a list of words and returns the sentence
195 |     with any contractions substituted by their expanded forms.
196 | """
197 | text_list = text.split(" ")
198 | return " ".join([c_dict.get(item, item) for item in text_list])
199 |
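The expansion is a plain dict lookup per whitespace-split token, so a contraction is only replaced when it appears as an exact token. A sketch with a two-entry stand-in map (the real `c_dict` above has roughly 150 entries):

```python
# stand-in map; the module's c_dict is much larger
c_dict = {"can't": "cannot", "won't": "will not"}

def expandContractions(text):
    # replace each token found in the map, leave other tokens untouched
    return " ".join(c_dict.get(item, item) for item in text.split(" "))

print(expandContractions("i can't and won't stop"))  # → "i cannot and will not stop"
```

Note that a token with attached punctuation (`"can't!"`) misses the lookup, which is one reason punctuation removal happens in a separate cleaning step.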
200 |
201 | #
202 | # remove entire URL
203 | def remove_URL(text):
204 | url = re.compile(r'https?://\S+|www\.\S+')
205 | return url.sub(r'', text)
206 |
207 |
208 | # Remove just HTML markup language
209 | def remove_html(text):
210 | html = re.compile(r'<.*?>')
211 | return html.sub(r'', text)
212 |
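Both helpers rely on the module-level `import re` that appears further down this file. A self-contained sketch of the same two substitutions (the sample text is illustrative):

```python
import re

url = re.compile(r'https?://\S+|www\.\S+')   # whole URL up to the next whitespace
html = re.compile(r'<.*?>')                  # non-greedy: matches one tag at a time

text = "see <b>this</b> at https://example.com today"
text = html.sub('', url.sub('', text))
print(text)  # → "see this at  today"
```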
213 |
214 | # Convert Emojis to Text
215 | import emoji
216 |
217 |
218 | def convert_emojis(text):
219 | return emoji.demojize(text)
220 |
221 |
222 | def remove_punct(text):
223 | table = str.maketrans('', '', string.punctuation)
224 | return text.translate(table)
225 |
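`remove_punct` builds a translation table once and strips characters at C speed via `str.translate`; the third argument of `str.maketrans` is the set of characters to delete. A minimal sketch, assuming only ASCII punctuation (`string.punctuation`) needs removing:

```python
import string

# third argument of maketrans = characters to delete
table = str.maketrans('', '', string.punctuation)
print("don't stop, now!".translate(table))  # → "dont stop now"
```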
226 |
227 | # Clean even further removing non-printable text
228 | # Thanks to https://www.kaggle.com/rftexas/text-only-kfold-bert
229 |
230 |
231 | def remove_stopwords(tweet):
232 |     """Removes stop words and non-printable characters from a tweet"""
233 | stop_words = return_stop_words()
234 | tweet = tweet.lower()
235 | tweet = ' '.join([x for x in tweet.split(" ") if x not in stop_words])
236 | tweet = ''.join([x for x in tweet if x in string.printable])
237 | return tweet
238 |
239 |
240 | # define a function that accepts text and returns a list of lemmas
241 | def split_into_lemmas(text):
242 | words = TextBlob(text).words
243 | text = ' '.join([word.lemmatize() for word in words])
244 | return text
245 |
246 |
247 | # Expand Slangs
248 | # Thanks to https://www.kaggle.com/rftexas/text-only-kfold-bert
249 | slangs = {
250 | "IG": "Instagram",
251 | "FB": "Facebook",
252 | "MOFO": "Expletive",
253 | "OMG": "Oh my God",
254 | "ROFL": "roll on the floor laughing",
255 | "ROFLOL": "roll on the floor laughing out loud",
256 | "ROTFLMAO": "roll on the floor laughing my ass off",
257 | "FCK": "Expletive",
258 | "LMAO": "Laugh my Ass off",
259 | "LOL": "laugh out loud",
260 | }
261 |
262 | abbreviations = {
263 | "$": " dollar ",
264 | "€": " euro ",
265 | "4ao": "for adults only",
266 | "a.m": "before midday",
267 | "a3": "anytime anywhere anyplace",
268 | "aamof": "as a matter of fact",
269 | "acct": "account",
270 | "adih": "another day in hell",
271 | "afaic": "as far as i am concerned",
272 | "afaict": "as far as i can tell",
273 | "afaik": "as far as i know",
274 | "afair": "as far as i remember",
275 | "afk": "away from keyboard",
276 | "app": "application",
277 | "approx": "approximately",
278 | "apps": "applications",
279 | "asap": "as soon as possible",
280 | "asl": "age, sex, location",
281 | "atk": "at the keyboard",
282 | "ave.": "avenue",
283 | "aymm": "are you my mother",
284 | "ayor": "at your own risk",
285 | "b&b": "bed and breakfast",
286 | "b+b": "bed and breakfast",
287 | "b.c": "before christ",
288 | "b2b": "business to business",
289 | "b2c": "business to customer",
290 | "b4": "before",
291 | "b4n": "bye for now",
292 | "b@u": "back at you",
293 | "bae": "before anyone else",
294 | "bak": "back at keyboard",
295 | "bbbg": "bye bye be good",
296 | "bbc": "british broadcasting corporation",
297 | "bbias": "be back in a second",
298 | "bbl": "be back later",
299 | "bbs": "be back soon",
300 | "be4": "before",
301 | "bfn": "bye for now",
302 | "blvd": "boulevard",
303 | "bout": "about",
304 | "brb": "be right back",
305 | "bros": "brothers",
306 | "brt": "be right there",
307 | "bsaaw": "big smile and a wink",
308 | "btch": "bitch",
309 | "btw": "by the way",
310 | "btfd": "buy the Expletive dip",
311 | "bwl": "bursting with laughter",
312 | "c/o": "care of",
313 | "cet": "central european time",
314 | "cf": "compare",
315 | "cia": "central intelligence agency",
316 | "csl": "can not stop laughing",
317 | "cu": "see you",
318 | "cul8r": "see you later",
319 | "cv": "curriculum vitae",
320 | "cwot": "complete waste of time",
321 | "cya": "see you",
322 | "cyt": "see you tomorrow",
323 | "dae": "does anyone else",
324 | "dbmib": "do not bother me i am busy",
325 | "diy": "do it yourself",
326 | "dm": "direct message",
327 | "dwh": "during work hours",
328 | "e123": "easy as one two three",
329 | "eet": "eastern european time",
330 | "eg": "example",
331 | "embm": "early morning business meeting",
332 | "encl": "enclosed",
333 | "encl.": "enclosed",
334 | "etc": "and so on",
335 | "faq": "frequently asked questions",
336 | "fawc": "for anyone who cares",
337 | "fb": "facebook",
338 | "fc": "fingers crossed",
339 | "fig": "figure",
340 | "fimh": "forever in my heart",
341 | "ft.": "feet",
342 | "ft": "featuring",
343 | "ftl": "for the loss",
344 | "ftw": "for the win",
345 | "fwiw": "for what it is worth",
346 | "fyi": "for your information",
347 | "g9": "genius",
348 | "gahoy": "get a hold of yourself",
349 | "gal": "get a life",
350 | "gcse": "general certificate of secondary education",
351 | "gfn": "gone for now",
352 | "gg": "good game",
353 | "gl": "good luck",
354 | "glhf": "good luck have fun",
355 | "gmt": "greenwich mean time",
356 | "gmta": "great minds think alike",
357 | "gn": "good night",
358 | "g.o.a.t": "greatest of all time",
359 | "goat": "greatest of all time",
360 | "goi": "get over it",
361 | "gps": "global positioning system",
362 | "gr8": "great",
363 | "gratz": "congratulations",
364 | "gyal": "girl",
365 | "h&c": "hot and cold",
366 | "hp": "horsepower",
367 | "hr": "hour",
368 | "hrh": "his royal highness",
369 | "ht": "height",
370 | "ibrb": "i will be right back",
371 | "ic": "i see",
372 | "icq": "i seek you",
373 | "icymi": "in case you missed it",
374 | "idc": "i do not care",
375 | "idgadf": "i do not give a damn Expletive",
376 | "idgaf": "i do not give a Expletive",
377 | "idk": "i do not know",
378 | "ie": "that is",
379 | "i.e": "that is",
380 | "ifyp": "i feel your pain",
381 | "iirc": "if i remember correctly",
382 | "ilu": "i love you",
383 | "ily": "i love you",
384 | "imho": "in my humble opinion",
385 | "imo": "in my opinion",
386 | "imu": "i miss you",
387 | "iow": "in other words",
388 | "irl": "in real life",
389 | "j4f": "just for fun",
390 | "jic": "just in case",
391 | "jk": "just kidding",
392 | "jsyk": "just so you know",
393 | "l8r": "later",
394 | "lb": "pound",
395 | "lbs": "pounds",
396 | "ldr": "long distance relationship",
397 | "lmao": "laugh my ass off",
398 | "lmfao": "laugh my Expletive ass off",
399 | "lol": "laugh out loud",
400 | "ltd": "limited",
401 | "ltns": "long time no see",
402 | "m8": "mate",
403 | "mf": "Expletive",
404 | "mfing": "Expletive",
405 | "mfs": "Expletive",
406 | "mfw": "my face when",
407 | "mofo": "Expletive",
408 | "mph": "miles per hour",
409 | "mr": "mister",
410 | "mrw": "my reaction when",
411 | "ms": "miss",
412 | "mte": "my thoughts exactly",
413 | "nagi": "not a good idea",
414 | "nbc": "national broadcasting company",
415 | "nbd": "not big deal",
416 | "nfs": "not for sale",
417 | "ngl": "not going to lie",
418 | "nhs": "national health service",
419 | "nrn": "no reply necessary",
420 | "nsfl": "not safe for life",
421 | "nsfw": "not safe for work",
422 | "nth": "nice to have",
423 | "nvr": "never",
424 | "nyc": "new york city",
425 | "oc": "original content",
426 | "og": "original",
427 | "ohp": "overhead projector",
428 | "oic": "oh i see",
429 | "omdb": "over my dead body",
430 | "omg": "oh my god",
431 | "omw": "on my way",
432 | "p.a": "per annum",
433 | "p.m": "after midday",
434 | "pm": "prime minister",
435 | "poc": "people of color",
436 | "pov": "point of view",
437 | "pp": "pages",
438 | "ppl": "people",
439 | "prw": "parents are watching",
440 | "ps": "postscript",
441 | "pt": "point",
442 | "ptb": "please text back",
443 | "pto": "please turn over",
444 | "qpsa": "what happens", # "que pasa",
445 | "ratchet": "rude",
446 | "rbtl": "read between the lines",
447 | "rlrt": "real life retweet",
448 | "rofl": "rolling on the floor laughing",
449 | "roflol": "rolling on the floor laughing out loud",
450 | "rotflmao": "rolling on the floor laughing my ass off",
451 | "rt": "retweet",
452 | "ruok": "are you ok",
453 | "sfw": "safe for work",
454 | "sk8": "skate",
455 | "smh": "shake my head",
456 | "sq": "square",
457 | "srsly": "seriously",
458 | "ssdd": "same stuff different day",
459 | "tbh": "to be honest",
460 |     "tbs": "tablespoonful",
461 |     "tbsp": "tablespoonful",
462 | "tfw": "that feeling when",
463 | "thks": "thank you",
464 | "tho": "though",
465 | "thx": "thank you",
466 | "tia": "thanks in advance",
467 | "til": "today i learned",
468 | "tl;dr": "too long i did not read",
469 | "tldr": "too long i did not read",
470 | "tmb": "tweet me back",
471 | "tntl": "trying not to laugh",
472 | "ttyl": "talk to you later",
473 | "u": "you",
474 | "u2": "you too",
475 | "u4e": "yours for ever",
476 | "utc": "coordinated universal time",
477 | "w/": "with",
478 | "w/o": "without",
479 | "w8": "wait",
480 | "wassup": "what is up",
481 | "wb": "welcome back",
482 | "wtf": "what the Expletive",
483 | "WTF": "what the Expletive",
484 | "wtg": "way to go",
485 | "wtpa": "where the party at",
486 | "wuf": "where are you from",
487 | "wuzup": "what is up",
488 | "wywh": "wish you were here",
489 | "yd": "yard",
490 | "ygtr": "you got that right",
491 | "ynk": "you never know",
492 | "zzz": "sleeping bored and tired"
493 | }
494 |
495 |
496 | # Thanks to https://www.kaggle.com/rftexas/text-only-kfold-bert
497 |
498 |
499 | # Thanks to https://www.kaggle.com/rftexas/text-only-kfold-bert
500 | def expandAbbreviations(sentence):
501 | text = sentence.split(" ")
502 | return " ".join([abbreviations.get(item, item) for item in text])
503 |
504 |
505 | # Thanks to https://www.kaggle.com/rftexas/text-only-kfold-bert
506 | def expandSlangs(sentence):
507 | text = sentence.split(" ")
508 | return " ".join([slangs.get(item, item) for item in text])
509 |
510 |
511 | def join_words(text):
512 | return " ".join(text)
513 |
514 |
515 | def remove_punctuations(text: str):
516 | return re.sub(r'[^\w\s]', '', text)
517 |
518 |
519 | def remove_emoji(text):
520 | emoji_pattern = re.compile("["
521 | u"\U0001F600-\U0001F64F" # emoticons
522 | u"\U0001F300-\U0001F5FF" # symbols & pictographs
523 | u"\U0001F680-\U0001F6FF" # transport & map symbols
524 | u"\U0001F1E0-\U0001F1FF" # flags (iOS)
525 | u"\U00002702-\U000027B0"
526 | u"\U000024C2-\U0001F251"
527 | "]+", flags=re.UNICODE)
528 | return emoji_pattern.sub(r'', text)
529 |
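`remove_emoji` deletes whole Unicode ranges rather than looking emojis up in a table. A sketch with just the Emoticons block (the function above compiles several more ranges into one character class):

```python
import re

# U+1F600–U+1F64F is the Unicode "Emoticons" block
emoji_pattern = re.compile("[\U0001F600-\U0001F64F]+", flags=re.UNICODE)
print(emoji_pattern.sub('', "great 😀😁 work"))  # → "great  work"
```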
530 |
531 | #### This counts emojis in a sentence which is very helpful to gauge sentiment
532 | def count_emojis(sentence):
533 | import regex
534 | import emoji
535 | emoji_counter = 0
536 | data = regex.findall(r'\X', sentence)
537 | for word in data:
538 |         if any(char in getattr(emoji, 'EMOJI_DATA', getattr(emoji, 'UNICODE_EMOJI', {})) for char in word):  # emoji>=2.0 renamed UNICODE_EMOJI to EMOJI_DATA
539 | emoji_counter += 1
540 | return emoji_counter
541 |
542 |
543 | ################################################################################
544 | import re
545 | from wordcloud import WordCloud, STOPWORDS
546 | import matplotlib.pyplot as plt
547 | from textblob import TextBlob
548 | from itertools import chain
549 |
550 | replace_spaces = re.compile('[/(){}\[\]\|@,;]')
551 | remove_special_chars = re.compile('[^0-9a-z #+_]')
552 | STOPWORDS = return_stop_words()
553 | remove_ip_addr = re.compile(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b')
554 |
555 |
556 | def clean_steps(text):
557 | text = text.replace('\n', ' ').lower() #
558 | text = remove_ip_addr.sub('', text)
559 | text = replace_spaces.sub(' ', text)
560 | text = remove_special_chars.sub('', text)
561 | text = ' '.join([w for w in text.split() if w not in STOPWORDS])
562 | return text
563 |
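`clean_steps` is order-sensitive: lowercasing must come before `remove_special_chars`, whose character class only lists `a-z`, and stop-word filtering runs on the already-normalized tokens. A self-contained sketch, with a tiny stop-word set standing in for `return_stop_words()`:

```python
import re

replace_spaces = re.compile(r'[/(){}\[\]\|@,;]')
remove_special_chars = re.compile('[^0-9a-z #+_]')
STOPWORDS = {'the', 'a', 'is'}   # stand-in for return_stop_words()

def clean_steps(text):
    text = text.replace('\n', ' ').lower()        # normalize case first
    text = replace_spaces.sub(' ', text)          # listed punctuation -> spaces
    text = remove_special_chars.sub('', text)     # drop everything else
    return ' '.join(w for w in text.split() if w not in STOPWORDS)

print(clean_steps("The Fox, is Quick!"))  # → "fox quick"
```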
564 |
565 | def clean_text(x):
566 | """
567 | ###############################################################################
568 | ## This cleans text string. Use it only as a Series.map(clean_text) function #
569 | ###############################################################################
570 |     Input must be one text string only. Don't send arrays or dataframes.
571 |     clean_text cleans one tweet at a time using the following steps:
572 |     1. expands slangs, abbreviations and contractions
573 |     2. removes stop words and punctuation, then lemmatizes the words
574 |     """
575 | x = expandSlangs(x) ### do this before lowering case since case is important for sentiment
576 | x = expandAbbreviations(x) ### this is before lowering case since case is important in sentiment
577 | x = expandContractions(x) ### this is after lowering case - just to double check
578 | x = remove_stopwords(x) ## this works well to remove a small number of stop words
579 | x = remove_punctuations(x) # this works well to remove punctuations and add spaces correctly
580 | x = split_into_lemmas(x) ## this lemmatizes text and gets it ready for wordclouds ###
581 | return x
582 |
583 |
584 | def draw_wordcloud_from_dataframe(dataframe, column):
585 | """
586 | This handy function draws a dataframe column using Wordcloud library and nltk.
587 | """
588 |
589 | ### Remember that fillna only works at dataframe level! ##
590 | X_train = dataframe[[column]].fillna("missing")
591 | ### Map function only works on Series, so you should use this ###
592 | X_train = X_train[column].map(clean_steps)
593 | ### next time, you get back a series, so just use it as is ###
594 | X_train = X_train.map(clean_text)
595 |
596 | # Dictionary of all words from train corpus with their counts.
597 |
598 | ### Fantastic way to count words using one line of code #############
599 | ### Thanks to : https://stackoverflow.com/questions/35857519/efficiently-count-word-frequencies-in-python
600 | words_counts = Counter(chain.from_iterable(map(str.split, X_train)))
601 | vocab_size = 50000
602 | top_words = sorted(words_counts, key=words_counts.get, reverse=True)[:vocab_size]
603 | text_join = ' '.join(top_words)
604 |
605 | # picture_mask = plt.imread('test.png')
606 |
607 | wordcloud1 = WordCloud(
608 | stopwords=STOPWORDS,
609 | background_color='white',
610 | width=1800,
611 | height=1400,
612 | # mask=picture_mask
613 | ).generate(text_join)
614 | return wordcloud1
615 |
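The one-line corpus count above feeds the top-N vocabulary selection; the `Counter`-over-`chain` idiom works on any iterable of strings:

```python
from collections import Counter
from itertools import chain

docs = ["good dog", "bad dog", "dog day"]   # illustrative mini-corpus
# one Counter over every token of every document
words_counts = Counter(chain.from_iterable(map(str.split, docs)))
top_words = sorted(words_counts, key=words_counts.get, reverse=True)[:2]
print(words_counts['dog'], top_words[0])  # → 3 dog
```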
616 |
617 | ################################################################################
618 | # Removes duplicates from a list to return unique values - USED ONLY ONCE
619 | def find_remove_duplicates(values):
620 | output = []
621 | seen = set()
622 | for value in values:
623 | if value not in seen:
624 | output.append(value)
625 | seen.add(value)
626 | return output
627 |
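Since Python 3.7 `dict` preserves insertion order, so the seen-set loop above has a one-line equivalent; both keep the first occurrence of each value and both require hashable items:

```python
def find_remove_duplicates(values):
    # dict keys are unique and ordered by first insertion
    return list(dict.fromkeys(values))

print(find_remove_duplicates(['b', 'a', 'b', 'c', 'a']))  # → ['b', 'a', 'c']
```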
628 |
629 | def draw_word_clouds(dft, each_string_var, chart_format, plotname,
630 | dep, problem_type, classes, mk_dir, verbose=0):
631 | dft = dft[:]
632 | width_size = 20
633 | height_size = 10
634 | imgdata_list = []
635 |
636 | if problem_type == 'Regression' or problem_type == 'Clustering':
637 | ########## This is for Regression and Clustering problems only #####
638 | num_plots = 1
639 | fig = plt.figure(figsize=(min(num_plots * width_size, 20), min(num_plots * height_size, 20)))
640 | cols = 2
641 | rows = int(num_plots / cols + 0.5)
642 | plotc = 1
643 | while plotc <= num_plots:
644 | plt.subplot(rows, cols, plotc)
645 | ax1 = plt.gca()
646 | wc1 = draw_wordcloud_from_dataframe(dft, each_string_var)
647 | plotc += 1
648 | ax1.axis("off")
649 | ax1.imshow(wc1)
650 | ax1.set_title('Wordcloud for %s' % each_string_var)
651 | image_count = 0
652 | if verbose == 2:
653 | imgdata_list.append(save_image_data(fig, chart_format,
654 | plotname, mk_dir))
655 | image_count += 1
656 | if verbose <= 1:
657 | plt.show()
658 | else:
659 | ########## This is for Classification problems only ###########
660 | num_plots = len(classes)
661 | target_vars = dft[dep].unique()
662 | fig = plt.figure(figsize=(min(num_plots * width_size, 20), min(num_plots * height_size, 20)))
663 | cols = 2
664 | rows = int(num_plots / cols + 0.5)
665 | plotc = 1
666 | while plotc <= num_plots:
667 | plt.subplot(rows, cols, plotc)
668 | ax1 = plt.gca()
669 | ax1.axis("off")
670 | dft_target = dft.loc[(dft[dep] == target_vars[plotc - 1])][each_string_var]
671 | if isinstance(dft_target, pd.Series):
672 | wc1 = draw_wordcloud_from_dataframe(pd.DataFrame(dft_target), each_string_var)
673 | else:
674 | wc1 = draw_wordcloud_from_dataframe(dft_target, each_string_var)
675 | ax1.imshow(wc1)
676 | ax1.set_title('Wordcloud for %s, target=%s' % (each_string_var, target_vars[plotc - 1]), fontsize=20)
677 | plotc += 1
678 | fig.tight_layout()
679 | ### This is where you save the fig or show the fig ######
680 | image_count = 0
681 | if verbose == 2:
682 | imgdata_list.append(save_image_data(fig, chart_format,
683 | plotname, mk_dir))
684 | image_count += 1
685 | if verbose <= 1:
686 | plt.show()
687 | ####### End of Word Clouds #############################
688 |
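The subplot grid in `draw_word_clouds` is fixed at 2 columns, and `int(num_plots / cols + 0.5)` is a round-half-up that yields just enough rows for the panels:

```python
def grid_rows(num_plots, cols=2):
    # round num_plots / cols half-up to get the row count
    return int(num_plots / cols + 0.5)

print([grid_rows(n) for n in (1, 2, 3, 4, 5)])  # → [1, 1, 2, 2, 3]
```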
--------------------------------------------------------------------------------
/autoviz/__init__.py:
--------------------------------------------------------------------------------
1 | name = "autoviz"
2 | from .__version__ import __version__, __holo_version__
3 | from .AutoViz_Class import AutoViz_Class
4 | from .AutoViz_Class import data_cleaning_suggestions
5 | from .AutoViz_Class import FixDQ
6 | ############################################################################################
7 | if __name__ == "__main__":
8 | module_type = 'Running'
9 | else:
10 | module_type = 'Imported'
11 | version_number = __version__
12 | print("""%s v%s. Please call AutoViz in this sequence:
13 | AV = AutoViz_Class()
14 | %%matplotlib inline
15 | dfte = AV.AutoViz(filename, sep=',', depVar='', dfte=None, header=0, verbose=1, lowess=False,
16 | chart_format='svg',max_rows_analyzed=150000,max_cols_analyzed=30, save_plot_dir=None)""" % (
17 | module_type, version_number))
18 | ###########################################################################################
19 |
--------------------------------------------------------------------------------
/autoviz/__version__.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """Specifies the version of the Auto_ViML package."""
3 |
4 | __title__ = "AutoViz"
5 | __author__ = "Ram Seshadri"
6 | __description__ = "Automatically Visualize any data set any size with a Single Line of Code"
7 | __url__ = "https://github.com/AutoViML/AutoViz.git"
8 | __version__ = "0.1.905"
9 | __holo_version__ = "0.0.4"
10 | __license__ = "Apache License 2.0"
11 | __copyright__ = "2020-21 Google"
12 |
--------------------------------------------------------------------------------
/autoviz/classify_method.py:
--------------------------------------------------------------------------------
1 | import random
2 |
3 | import numpy as np
4 | import pandas as pd
5 |
6 | np.random.seed(99)
7 | random.seed(42)
8 | ################################################################################
9 | #### The warnings from Sklearn are so annoying that I have to shut it off #######
10 | import warnings
11 |
12 | warnings.filterwarnings("ignore")
13 | from sklearn.exceptions import DataConversionWarning
14 |
15 | warnings.filterwarnings(action='ignore', category=DataConversionWarning)
16 |
17 |
18 | def warn(*args, **kwargs):
19 | pass
20 |
21 |
22 | warnings.warn = warn
23 | ####################################################################################
24 | from functools import reduce
25 |
26 |
27 | def left_subtract(l1, l2):
28 | lst = []
29 | for i in l1:
30 | if i not in l2:
31 | lst.append(i)
32 | return lst
33 |
34 |
35 | #################################################################################
36 | import copy
37 |
38 |
39 | def EDA_find_remove_columns_with_infinity(df, remove=False, verbose=0):
40 | """
41 | This function finds all columns in a dataframe that have infinite values (np.inf or -np.inf)
42 | It returns a list of column names. If the list is empty, it means no columns were found.
43 | If remove flag is set, then it returns a smaller dataframe with inf columns removed.
44 | """
45 | nums = df.select_dtypes(include='number').columns.tolist()
46 | dfx = df[nums]
47 | sum_rows = np.isinf(dfx).values.sum()
48 | add_cols = list(dfx.columns.to_series()[np.isinf(dfx).any()])
49 | if sum_rows > 0:
50 | if verbose > 0:
51 | print(' there are %d rows and %d columns with infinity in them...' % (sum_rows, len(add_cols)))
52 | if remove:
53 | ### here you need to use df since the whole dataset is involved ###
54 | nocols = [x for x in df.columns if x not in add_cols]
55 | if verbose > 0:
56 | print(" Shape of dataset before %s and after %s removing columns with infinity" %
57 |                       (df.shape, df[nocols].shape))
58 | return df[nocols]
59 | else:
60 | ## this will be a list of columns with infinity ####
61 | return add_cols
62 | else:
63 | ## this will be an empty list if there are no columns with infinity
64 | return add_cols
65 |
66 |
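The pandas-free core of `EDA_find_remove_columns_with_infinity` is simply "does any value in this numeric column equal ±inf". A sketch over plain lists (column names and data are illustrative):

```python
import math

data = {'a': [1.0, 2.0], 'b': [1.0, float('inf')], 'c': [-float('inf'), 3.0]}
# flag every column containing at least one infinite value
inf_cols = [col for col, vals in data.items() if any(math.isinf(v) for v in vals)]
print(inf_cols)  # → ['b', 'c']
```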
67 | ####################################################################################
68 | def classify_columns(df_preds, verbose=0):
69 | """
70 |     This function performs Exploratory Data Analysis (EDA) on the predictor columns.
71 | ######################################################################################
72 | Takes a dataframe containing only predictors to be classified into various types.
73 | DO NOT SEND IN A TARGET COLUMN since it will try to include that into various columns.
74 | Returns a data frame containing columns and the class it belongs to such as numeric,
75 | categorical, date or id column, boolean, nlp, discrete_string and cols to delete...
76 | ####### Returns a dictionary with 10 kinds of vars like the following: # continuous_vars,int_vars
77 | # cat_vars,factor_vars, bool_vars,discrete_string_vars,nlp_vars,date_vars,id_vars,cols_delete
78 | """
79 | train = copy.deepcopy(df_preds)
80 |     #### If there are 30 chars or more in a discrete_string_var, it is considered an NLP variable
81 | max_nlp_char_size = 30
82 | max_cols_to_print = 30
83 | print('#######################################################################################')
84 | print('######################## C L A S S I F Y I N G V A R I A B L E S ####################')
85 | print('#######################################################################################')
86 | print('Classifying variables in data set...')
87 |     #### Cat_Limit defines the max number of categories a column can have to be called a categorical column
88 | cat_limit = 35
89 | float_limit = 15 #### Make this limit low so that float variables below this limit become cat vars ###
90 |
91 | def add(a, b):
92 | return a + b
93 |
94 | sum_all_cols = dict()
95 | orig_cols_total = train.shape[1]
96 | # Types of columns
97 | cols_delete = []
98 | cols_delete = [col for col in list(train) if (len(train[col].value_counts()) == 1)
99 | | (train[col].isnull().sum() / len(train) >= 0.90)]
100 | inf_cols = EDA_find_remove_columns_with_infinity(train, remove=False, verbose=verbose)
101 | mixed_cols = [x for x in list(train) if len(train[x].dropna().apply(type).value_counts()) > 1]
102 | if len(mixed_cols) > 0:
103 | print(' Removing %s column(s) due to mixed data type detected...' % mixed_cols)
104 | cols_delete += mixed_cols
105 | cols_delete += inf_cols
106 | train = train[left_subtract(list(train), cols_delete)]
107 | var_df = pd.Series(dict(train.dtypes)).reset_index(drop=False).rename(
108 | columns={0: 'type_of_column'})
109 | sum_all_cols['cols_delete'] = cols_delete
110 |
111 | var_df['bool'] = var_df.apply(
112 | lambda x: 1 if x['type_of_column'] in ['bool', 'object'] and len(train[x['index']].value_counts()) == 2 else 0,
113 | axis=1)
114 | string_bool_vars = list(var_df[(var_df['bool'] == 1)]['index'])
115 | sum_all_cols['string_bool_vars'] = string_bool_vars
116 | var_df['num_bool'] = var_df.apply(lambda x: 1 if x['type_of_column'] in [np.uint8,
117 | np.uint16, np.uint32, np.uint64,
118 | 'int8', 'int16', 'int32', 'int64',
119 | 'float16', 'float32', 'float64'] and len(
120 | train[x['index']].value_counts()) == 2 else 0, axis=1)
121 | num_bool_vars = list(var_df[(var_df['num_bool'] == 1)]['index'])
122 | sum_all_cols['num_bool_vars'] = num_bool_vars
123 | ###### This is where we take all Object vars and split them into diff kinds ###
124 | discrete_or_nlp = var_df.apply(lambda x: 1 if x['type_of_column'] in ['object'] and x[
125 | 'index'] not in string_bool_vars + cols_delete else 0, axis=1)
126 | ######### This is where we figure out whether a string var is nlp or discrete_string var ###
127 | var_df['nlp_strings'] = 0
128 | var_df['discrete_strings'] = 0
129 | var_df['cat'] = 0
130 | var_df['id_col'] = 0
131 | discrete_or_nlp_vars = var_df.loc[discrete_or_nlp == 1]['index'].values.tolist()
132 | copy_discrete_or_nlp_vars = copy.deepcopy(discrete_or_nlp_vars)
133 | if len(discrete_or_nlp_vars) > 0:
134 | for col in copy_discrete_or_nlp_vars:
135 | #### first fill empty or missing vals since it will blowup ###
136 | ### Remember that fillna only works at the dataframe level!
137 | train[[col]] = train[[col]].fillna(' ')
138 | if train[col].map(lambda x: len(x) if type(x) == str else 0).max(
139 | ) >= 50 and len(train[col].value_counts()) >= int(0.9 * len(train)) and col not in string_bool_vars:
140 | var_df.loc[var_df['index'] == col, 'nlp_strings'] = 1
141 | elif train[col].map(lambda x: len(x) if type(x) == str else 0).mean(
142 | ) >= max_nlp_char_size and train[col].map(lambda x: len(x) if type(x) == str else 0).max(
143 | ) < 50 and len(train[col].value_counts()
144 | ) <= int(0.9 * len(train)) and col not in string_bool_vars:
145 | var_df.loc[var_df['index'] == col, 'discrete_strings'] = 1
146 | elif len(train[col].value_counts()) > cat_limit and len(
147 | train[col].value_counts()) <= int(0.9 * len(train)) and col not in string_bool_vars:
148 | var_df.loc[var_df['index'] == col, 'discrete_strings'] = 1
149 | elif len(train[col].value_counts()) > cat_limit and len(train[col].value_counts()
150 | ) == len(train) and col not in string_bool_vars:
151 | var_df.loc[var_df['index'] == col, 'id_col'] = 1
152 | else:
153 | var_df.loc[var_df['index'] == col, 'cat'] = 1
154 | nlp_vars = list(var_df[(var_df['nlp_strings'] == 1)]['index'])
155 | sum_all_cols['nlp_vars'] = nlp_vars
156 | discrete_string_vars = list(var_df[(var_df['discrete_strings'] == 1)]['index'])
157 | sum_all_cols['discrete_string_vars'] = discrete_string_vars
158 | ###### This happens only if a string column happens to be an ID column #######
159 | #### DO NOT Add this to ID_VARS yet. It will be done later. Don't change it easily...
160 | #### Category DTYPE vars are very special = they can be left as is and not disturbed in Python. ###
161 | var_df['dcat'] = var_df.apply(lambda x: 1 if str(x['type_of_column']) == 'category' else 0,
162 | axis=1)
163 | factor_vars = list(var_df[(var_df['dcat'] == 1)]['index'])
164 | sum_all_cols['factor_vars'] = factor_vars
165 | ########################################################################
166 | date_or_id = var_df.apply(lambda x: 1 if x['type_of_column'] in [np.uint8,
167 | np.uint16, np.uint32, np.uint64,
168 | 'int8', 'int16',
169 | 'int32', 'int64'] and x[
170 | 'index'] not in (string_bool_vars + num_bool_vars +
171 | discrete_string_vars + nlp_vars) else 0,
172 | axis=1)
173 | ######### This is where we figure out whether a numeric col is date or id variable ###
174 | var_df['int'] = 0
175 | var_df['date_time'] = 0
176 | ### if a particular column is date-time type, now set it as a date time variable ##
177 |     var_df['date_time'] = var_df.apply(lambda x: 1 if x['type_of_column'] in ['<M8[ns]', 'datetime64[ns]'] and x[
178 |         'index'] not in string_bool_vars + num_bool_vars + discrete_string_vars + nlp_vars else 0,
179 |                                        axis=1)
180 |     ### this is where we save them as date time variables ###
181 |     if len(var_df.loc[date_or_id == 1]) != 0:
182 |         for col in var_df.loc[date_or_id == 1]['index'].values.tolist():
183 |             if len(train[col].value_counts()) == len(train):
184 |                 if train[col].min() < 1900 or train[col].max() > 2050:
185 | var_df.loc[var_df['index'] == col, 'id_col'] = 1
186 | else:
187 | try:
188 | pd.to_datetime(train[col], infer_datetime_format=True)
189 | var_df.loc[var_df['index'] == col, 'date_time'] = 1
190 | except:
191 | var_df.loc[var_df['index'] == col, 'id_col'] = 1
192 | else:
193 | if train[col].min() < 1900 or train[col].max() > 2050:
194 | if col not in num_bool_vars:
195 | var_df.loc[var_df['index'] == col, 'int'] = 1
196 | else:
197 | try:
198 | pd.to_datetime(train[col], infer_datetime_format=True)
199 | var_df.loc[var_df['index'] == col, 'date_time'] = 1
200 | except:
201 | if col not in num_bool_vars:
202 | var_df.loc[var_df['index'] == col, 'int'] = 1
203 | else:
204 | pass
205 | int_vars = list(var_df[(var_df['int'] == 1)]['index'])
206 | date_vars = list(var_df[(var_df['date_time'] == 1)]['index'])
207 | id_vars = list(var_df[(var_df['id_col'] == 1)]['index'])
208 | sum_all_cols['int_vars'] = int_vars
209 | copy_date_vars = copy.deepcopy(date_vars)
210 | for date_var in copy_date_vars:
211 | #### This test is to make sure date vars are actually date vars
212 | try:
213 | pd.to_datetime(train[date_var], infer_datetime_format=True)
214 | except:
215 | ##### if not a date var, then just add it to delete it from processing
216 | cols_delete.append(date_var)
217 | date_vars.remove(date_var)
218 | sum_all_cols['date_vars'] = date_vars
219 | sum_all_cols['id_vars'] = id_vars
220 | sum_all_cols['cols_delete'] = cols_delete
221 | ## This is an EXTREMELY complicated logic for cat vars. Don't change it unless you test it many times!
222 | var_df['numeric'] = 0
223 | float_or_cat = var_df.apply(lambda x: 1 if x['type_of_column'] in ['float16',
224 | 'float32', 'float64'] else 0,
225 | axis=1)
226 | ####### We need to make sure there are no categorical vars in float #######
227 | if len(var_df.loc[float_or_cat == 1]) > 0:
228 | for col in var_df.loc[float_or_cat == 1]['index'].values.tolist():
229 | if 2 < len(train[col].value_counts()) <= float_limit and len(
230 | train[col].value_counts()) <= len(train):
231 | var_df.loc[var_df['index'] == col, 'cat'] = 1
232 | else:
233 | if col not in (num_bool_vars + factor_vars):
234 | var_df.loc[var_df['index'] == col, 'numeric'] = 1
235 | cat_vars = list(var_df[(var_df['cat'] == 1)]['index'])
236 | continuous_vars = list(var_df[(var_df['numeric'] == 1)]['index'])
237 |
238 | ######## V E R Y I M P O R T A N T ###################################################
239 | cat_vars_copy = copy.deepcopy(factor_vars)
240 | for cat in cat_vars_copy:
241 | if df_preds[cat].dtype == float:
242 | continuous_vars.append(cat)
243 | factor_vars.remove(cat)
244 | var_df.loc[var_df['index'] == cat, 'dcat'] = 0
245 | var_df.loc[var_df['index'] == cat, 'numeric'] = 1
246 | elif len(df_preds[cat].value_counts()) == df_preds.shape[0]:
247 | id_vars.append(cat)
248 | factor_vars.remove(cat)
249 | var_df.loc[var_df['index'] == cat, 'dcat'] = 0
250 | var_df.loc[var_df['index'] == cat, 'id_col'] = 1
251 |
252 | sum_all_cols['factor_vars'] = factor_vars
253 |     ##### There are a couple of extra tests you need to do to remove aberrations in cat_vars ###
254 | cat_vars_copy = copy.deepcopy(cat_vars)
255 | for cat in cat_vars_copy:
256 | if df_preds[cat].dtype == float:
257 | continuous_vars.append(cat)
258 | cat_vars.remove(cat)
259 | var_df.loc[var_df['index'] == cat, 'cat'] = 0
260 | var_df.loc[var_df['index'] == cat, 'numeric'] = 1
261 | elif len(df_preds[cat].value_counts()) == df_preds.shape[0]:
262 | id_vars.append(cat)
263 | cat_vars.remove(cat)
264 | var_df.loc[var_df['index'] == cat, 'cat'] = 0
265 | var_df.loc[var_df['index'] == cat, 'id_col'] = 1
266 | sum_all_cols['cat_vars'] = cat_vars
267 | sum_all_cols['continuous_vars'] = continuous_vars
268 | sum_all_cols['id_vars'] = id_vars
269 | ###### This is where you consolidate the numbers ###########
270 | var_dict_sum = dict(zip(var_df.values[:, 0], var_df.values[:, 2:].sum(1)))
271 | for col, sumval in var_dict_sum.items():
272 | if sumval == 0:
273 | print('%s of type=%s is not classified' % (col, train[col].dtype))
274 | elif sumval > 1:
275 |             print('%s of type=%s is classified into more than one type' % (col, train[col].dtype))
276 | else:
277 | pass
278 | ##### If there are more than 1000 unique values, then add it to NLP vars ###
279 | copy_discrete_vals = copy.deepcopy(discrete_string_vars)
280 | for each_discrete in copy_discrete_vals:
281 | if train[each_discrete].nunique() >= 1000:
282 | nlp_vars.append(each_discrete)
283 | discrete_string_vars.remove(each_discrete)
284 | elif 100 < train[each_discrete].nunique() < 1000:
285 | pass
286 | else:
287 | ### If it is less than 100 unique values, then make it categorical var
288 | cat_vars.append(each_discrete)
289 | discrete_string_vars.remove(each_discrete)
290 | sum_all_cols['discrete_string_vars'] = discrete_string_vars
291 | sum_all_cols['cat_vars'] = cat_vars
292 | sum_all_cols['nlp_vars'] = nlp_vars
293 | ############### This is where you print all the types of variables ##############
294 | ####### Returns 8 vars in the following order: continuous_vars,int_vars,cat_vars,
295 | ### string_bool_vars,discrete_string_vars,nlp_vars,date_or_id_vars,cols_delete
296 | if verbose == 1:
297 | print(" Number of Numeric Columns = ", len(continuous_vars))
298 | print(" Number of Integer-Categorical Columns = ", len(int_vars))
299 | print(" Number of String-Categorical Columns = ", len(cat_vars))
300 | print(" Number of Factor-Categorical Columns = ", len(factor_vars))
301 | print(" Number of String-Boolean Columns = ", len(string_bool_vars))
302 | print(" Number of Numeric-Boolean Columns = ", len(num_bool_vars))
303 | print(" Number of Discrete String Columns = ", len(discrete_string_vars))
304 | print(" Number of NLP String Columns = ", len(nlp_vars))
305 | print(" Number of Date Time Columns = ", len(date_vars))
306 | print(" Number of ID Columns = ", len(id_vars))
307 | print(" Number of Columns to Delete = ", len(cols_delete))
308 | if verbose >= 2:
309 | print(' Printing up to %d columns (max) in each category:' % max_cols_to_print)
310 | print(" Numeric Columns : %s" % continuous_vars[:max_cols_to_print])
311 | print(" Integer-Categorical Columns: %s" % int_vars[:max_cols_to_print])
312 | print(" String-Categorical Columns: %s" % cat_vars[:max_cols_to_print])
313 | print(" Factor-Categorical Columns: %s" % factor_vars[:max_cols_to_print])
314 | print(" String-Boolean Columns: %s" % string_bool_vars[:max_cols_to_print])
315 | print(" Numeric-Boolean Columns: %s" % num_bool_vars[:max_cols_to_print])
316 | print(" Discrete String Columns: %s" % discrete_string_vars[:max_cols_to_print])
317 | print(" NLP text Columns: %s" % nlp_vars[:max_cols_to_print])
318 | print(" Date Time Columns: %s" % date_vars[:max_cols_to_print])
319 | print(" ID Columns: %s" % id_vars[:max_cols_to_print])
320 | print(" Columns that will not be considered in modeling: %s" % cols_delete[:max_cols_to_print])
321 | ##### now collect all the column types and column names into a single dictionary to return!
322 |
323 | len_sum_all_cols = reduce(add, [len(v) for v in sum_all_cols.values()])
324 | if len_sum_all_cols == orig_cols_total:
325 | print(' %d Predictors classified...' % orig_cols_total)
326 | # print(' This does not include the Target column(s)')
327 | else:
328 | print('No of columns classified %d does not match %d total cols. Continuing...' % (
329 | len_sum_all_cols, orig_cols_total))
330 | ls = sum_all_cols.values()
331 | flat_list = [item for sublist in ls for item in sublist]
332 | if len(left_subtract(list(train), flat_list)) == 0:
333 | print(' Missing columns = None')
334 | else:
335 | print(' Missing columns = %s' % left_subtract(list(train), flat_list))
336 | return sum_all_cols
337 | ####################################################################################
338 |
--------------------------------------------------------------------------------
/autoviz/test.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/autoviz/test.png
--------------------------------------------------------------------------------
/autoviz/tests/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/autoviz/tests/__init__.py
--------------------------------------------------------------------------------
/autoviz/tests/test_autoviz_class.py:
--------------------------------------------------------------------------------
1 | import unittest
2 |
3 | from ..AutoViz_Class import AutoViz_Class
4 |
5 | class TestAutoVizClass(unittest.TestCase):
6 |     def test_add_plots(self):
7 |         self.assertIsNotNone(AutoViz_Class())
8 |
--------------------------------------------------------------------------------
/autoviz/tests/test_deps.py:
--------------------------------------------------------------------------------
1 | import unittest
2 |
3 |
4 | class DepsTest(unittest.TestCase):
5 | def test(self):
6 | # have to pip install xgboost
7 | from autoviz import AutoViz_Class as AV
8 | AVC = AV.AutoViz_Class()
9 | self.assertIsNotNone(AVC)
10 |
--------------------------------------------------------------------------------
/images/bokeh_charts.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/images/bokeh_charts.JPG
--------------------------------------------------------------------------------
/images/data_clean.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/images/data_clean.png
--------------------------------------------------------------------------------
/images/logo.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/images/logo.JPG
--------------------------------------------------------------------------------
/images/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/images/logo.png
--------------------------------------------------------------------------------
/images/server_charts.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/images/server_charts.JPG
--------------------------------------------------------------------------------
/images/var_charts.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/images/var_charts.JPG
--------------------------------------------------------------------------------
/old_setup.py:
--------------------------------------------------------------------------------
1 | import setuptools
2 |
3 | with open("README.md", "r") as fh:
4 | long_description = fh.read()
5 |
6 | setuptools.setup(
7 | name="autoviz",
8 | version="0.1.806",
9 | author="Ram Seshadri",
10 | # author_email="author@example.com",
11 | description="Automatically Visualize any dataset, any size with a single line of code",
12 | long_description=long_description,
13 | long_description_content_type="text/markdown",
14 | license='Apache License 2.0',
15 | url="https://github.com/AutoViML/AutoViz.git",
16 | packages=setuptools.find_packages(exclude=("tests",)),
17 | install_requires=[
18 | "xlrd",
19 | "wordcloud",
20 | "emoji",
21 | "numpy<1.25.0",
22 | "pandas",
23 | "pyamg",
24 | "matplotlib<=3.7.4",
25 | "seaborn>=0.12.2",
26 | "scikit-learn",
27 | "statsmodels",
28 | "nltk",
29 | "textblob",
30 | "holoviews~=1.14.9",
31 | "bokeh~=2.4.2",
32 | "hvplot~=0.7.3",
33 | "panel>=0.12.6",
34 | "xgboost>=0.82,<1.7",
35 | "fsspec>=0.8.3",
36 | "typing-extensions>=4.1.1",
37 | "pandas-dq>=1.29"
38 | ],
39 | classifiers=[
40 | "Programming Language :: Python :: 3",
41 | "Operating System :: OS Independent",
42 | ],
43 | )
44 |
--------------------------------------------------------------------------------
/requirements-py310.txt:
--------------------------------------------------------------------------------
1 | xlrd
2 | wordcloud
3 | pyamg
4 | nltk
5 | emoji
6 | textblob
7 | matplotlib<=3.7.4
8 | seaborn>=0.12.2
9 | scikit-learn
10 | statsmodels
11 | xgboost>=0.82,<1.7
12 | fsspec>=0.8.3
13 | typing-extensions>=4.1.1
14 | pandas-dq>=1.29
15 | numpy>=1.25.0
16 | hvplot>=0.9.2
17 | panel>=1.4.0
18 | holoviews>=1.15.3
19 | pandas<2.0
20 |
--------------------------------------------------------------------------------
/requirements-py311.txt:
--------------------------------------------------------------------------------
1 | xlrd
2 | wordcloud
3 | pyamg
4 | nltk
5 | emoji
6 | textblob
7 | matplotlib<=3.7.4
8 | seaborn>=0.12.2
9 | scikit-learn
10 | statsmodels
11 | xgboost>=0.82,<1.7
12 | fsspec>=0.8.3
13 | typing-extensions>=4.1.1
14 | pandas-dq>=1.29
15 | numpy>=1.25.0
16 | hvplot>=0.9.2
17 | panel>=1.4.0
18 | holoviews>=1.15.3
19 | pandas<2.0
20 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | xlrd
2 | wordcloud
3 | pyamg
4 | nltk
5 | emoji
6 | textblob
7 | matplotlib<=3.7.4
8 | seaborn>=0.12.2
9 | scikit-learn
10 | statsmodels
11 | xgboost>=0.82,<1.7
12 | fsspec>=0.8.3
13 | typing-extensions>=4.1.1
14 | pandas-dq>=1.29
15 | numpy<1.24
16 | hvplot~=0.7.3
17 | panel~=0.14.4
18 | holoviews~=1.14.9
19 | param==1.13.0
20 | pandas<2.0
21 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | import setuptools
2 | import sys
3 |
4 | with open("README.md", "r") as fh:
5 | long_description = fh.read()
6 |
7 | # Determine the Python version
8 | python_version = sys.version_info
9 |
10 | list_req = [
11 | "xlrd",
12 | "wordcloud",
13 | "emoji",
14 | # Assuming numpy version <1.25.0 is compatible with older Python versions and older HoloViews
15 | "pyamg",
16 | "scikit-learn",
17 | "statsmodels",
18 | "nltk",
19 | "textblob",
20 | "xgboost>=0.82,<1.7",
21 | "fsspec>=0.8.3",
22 | "typing-extensions>=4.1.1",
23 | "pandas-dq>=1.29"
24 | ]
25 | # Define default dependencies (compatible with older Python versions)
26 | install_requires = list_req
27 |
28 | # Extend the defaults with newer pins (for Python 3.10+ and newer HoloViews)
29 | install_requires = list_req + [
30 | # Keep most dependencies as is, adjust only where necessary
31 | "numpy>=1.24.0", # Update as needed for compatibility with newer HoloViews
32 | # Update other dependencies as needed
33 | "hvplot>=0.9.2", ###newer hvplot
34 | "holoviews>=1.16.0", # Update based on the bug fix relevant to Python 3.10
35 | # Ensure other dependencies are compatible
36 | "panel>=1.4.0", ## this is a new version of panel
37 |     "pandas>=2.0", ## newer pandas to match the newer numpy and holoviews pins
38 | "matplotlib>3.7.4", ## newer version of matplotlib
39 | "seaborn>0.12.2", ## newer version of seaborn ##
40 | ]
41 |
42 | setuptools.setup(
43 | name="autoviz",
44 | version="0.1.905",
45 | author="Ram Seshadri",
46 | description="Automatically Visualize any dataset, any size with a single line of code",
47 | long_description=long_description,
48 | long_description_content_type="text/markdown",
49 | license='Apache License 2.0',
50 | url="https://github.com/AutoViML/AutoViz.git",
51 | packages=setuptools.find_packages(exclude=("tests",)),
52 | install_requires=install_requires,
53 | classifiers=[
54 | "Programming Language :: Python :: 3",
55 | "Operating System :: OS Independent",
56 | ],
57 | )
--------------------------------------------------------------------------------
/updates.md:
--------------------------------------------------------------------------------
1 | # Latest updates and news from AutoViz!
2 |
3 | ### April 2024: AutoViz version 0.1.900+ series has some fixes for autoviz install issues
4 | You can always pip install from git, which uses the latest setup.py and works well:
5 | `!pip install git+https://github.com/AutoViML/AutoViz`
6 |
7 | But if you are using `pip install autoviz`, you may run into one of two kinds of errors. To figure out which fix applies, perform the following steps.
8 |
9 | **First, print the versions of pandas, numpy, and holoviews:**
10 | ```
11 | import pandas as pd
12 | import numpy as np
13 | import holoviews as hv
14 | print(pd.__version__, np.__version__, hv.__version__)
15 | ```
16 |
17 | If it prints something like
18 | `numpy<1.24, pandas<2.0, holoviews<=1.14.9`
19 |
20 | these are all older versions that work together, since holoviews<=1.14.9 uses an older numpy alias (`np.bool`) that numpy<1.24 still provides. However, if you are running this in Kaggle kernels, you must restart your kernel after installing autoviz, since it downgrades numpy and pandas and the change only takes effect after a restart. But if you still get this error: `"ValueError: ClassSelector parameter None value must be an instance of (function, tuple), not ."`, then you must upgrade holoviews to 1.16.0.
21 |
22 | But if the above statements print newer versions of pandas and numpy, like this:
23 |
24 | `pandas>=2.0.0 numpy>=1.24.0 holoviews>=1.16.0`
25 |
26 | then you need a newer holoviews as well. Although regular AutoViz works well with newer pandas and numpy, the older holoviews breaks it. AutoViz_Holo needs holoviews>=1.16.0 to work with newer numpy and pandas; if you don't upgrade holoviews to >=1.16.0, you will get this error: "ValueError: ClassSelector parameter None value must be an instance of (function, tuple), not ."
27 |
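The decision rules above can be sketched as a tiny helper (the function name and the cutoffs-as-code are ours, distilled from the advice in this section):

```
def holoviews_advice(pandas_ver, numpy_ver, holoviews_ver):
    """Return upgrade advice for a pandas/numpy/holoviews combination."""
    def parse(v):
        # compare only the (major, minor) parts of a version string
        return tuple(int(p) for p in v.split('.')[:2])
    newer_stack = parse(pandas_ver) >= (2, 0) or parse(numpy_ver) >= (1, 24)
    if newer_stack and parse(holoviews_ver) < (1, 16):
        return 'upgrade holoviews to >=1.16.0'
    return 'versions look compatible'

print(holoviews_advice('2.1.4', '1.26.0', '1.14.9'))  # upgrade holoviews to >=1.16.0
```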
28 | Hope this is clear. Please let us know via the issues tab in GitHub.
29 |
30 | ### December 2023: AutoViz now has modular dependency loading and improved support for Python versions 3.10+
31 | Modular Dependency Loading: Starting with version `0.1.801`, AutoViz uses a more flexible approach for importing visualization libraries. This means you only need to install certain dependencies (like hvplot and holoviews) if you plan to use specific backends (e.g., bokeh). This change significantly reduces installation issues for users on newer Python versions such as 3.10 and higher.
32 |
33 | Improved Backend Support: Depending on your Python environment, AutoViz dynamically adjusts to use compatible visualization libraries, ensuring a smoother user experience. Requirements:
34 | "holoviews>=1.14.9",
35 | "bokeh>=2.4.2",
36 | "hvplot>=0.7.3",
37 | "panel>=0.12.6".
38 |
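The lazy-loading idea can be sketched like this (a simplified illustration only, not AutoViz's actual loader; the helper name is ours). The optional backend stack is imported only when it is actually needed:

```
import importlib

def backend_available(module_name):
    """Return True if an optional plotting dependency can be imported."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False

# Only check for the bokeh stack when that backend is actually requested:
if not backend_available('holoviews'):
    print("chart_format='bokeh' needs holoviews/hvplot: pip install hvplot")
```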
39 |
40 | ### June 2023: AutoViz now has Data Quality checks and a transformer to fix your data quality
41 | From version 0.1.70, AutoViz can automatically analyze your dataset and fix data quality issues. All you have to do is `from autoviz import FixDQ` and use it like a `fit_transform` transformer. It's that easy to perform data cleaning with AutoViz!
42 |
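A minimal sketch of the calling convention (`ToyFixDQ` is a stand-in we wrote purely for illustration; the real `FixDQ` performs far more fixes):

```
import pandas as pd

class ToyFixDQ:
    # stand-in: drops constant columns and median-fills numeric NaNs,
    # just to show the fit_transform-style usage described above
    def fit_transform(self, df):
        out = df.loc[:, df.nunique(dropna=False) > 1].copy()
        for col in out.select_dtypes('number'):
            out[col] = out[col].fillna(out[col].median())
        return out

df = pd.DataFrame({'a': [1.0, None, 3.0], 'b': ['x', 'x', 'x']})
clean = ToyFixDQ().fit_transform(df)
print(clean['a'].tolist())  # [1.0, 2.0, 3.0]
```

With the real library, the call is the same shape: `FixDQ().fit_transform(df)`.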
43 | 
44 |
45 | ### Apr-2023 Update: AutoViz now creates scatter plots for categorical variables when data contains only cat variables
46 | From version 0.1.600 onwards, AutoViz automatically draws `catscatter` plots for pairs of categorical variables in a data frame. A `catscatter` plot is a type of scatter plot that shows the frequency of each combination of categories in two variables. It can be useful for exploring the relationship between categorical variables and identifying patterns or outliers. AutoViz creates these plots only if the data contains no numeric variables; otherwise they would be unnecessary.
47 |
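The counts behind a catscatter amount to a crosstab of the two categorical columns, with marker size proportional to each count (toy data below for illustration):

```
import pandas as pd

df = pd.DataFrame({'color': ['red', 'red', 'blue', 'blue', 'blue'],
                   'size':  ['S', 'M', 'S', 'S', 'M']})
freq = pd.crosstab(df['color'], df['size'])
print(freq.loc['blue', 'S'])  # 2
```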
48 | ```
49 | AutoViz is grateful to the catscatter implementation of Myr Barnés, 2020.
50 | You can see the original here: https://github.com/myrthings/catscatter/blob/master/catscatter.py
51 | # More info about this function here:
52 | # - https://towardsdatascience.com/visualize-categorical-relationships-with-catscatter-e60cdb164395
53 | # - https://github.com/myrthings/catscatter/blob/master/README.md
54 | ```
55 |
56 | ### Sep-2022 Update: AutoViz now provides data cleansing suggestions! #autoviz #datacleaning
57 | From version 0.1.50 onwards, AutoViz automatically analyzes your dataset and provides suggestions for how to clean it. It detects missing values, identifies rare categories, finds infinite values, detects mixed data types, and much more. This will tremendously speed up your data cleaning. If you have suggestions for more data cleaning steps, please file an `Issue` on our GitHub and we will gladly consider it. Here is an example of how the data cleaning suggestions look:
58 |
59 |
60 | In order to get this latest function, you must upgrade autoviz to the latest version by:
61 | ```
62 | pip install autoviz --upgrade
63 | ```
64 |
65 | In the same version, you can also get data suggestions by using `AV.AutoViz(......, verbose=1)` or by simply importing it:
66 |
67 | ```
68 | from autoviz import data_cleaning_suggestions
69 | data_cleaning_suggestions(df)
70 | ```
71 |
72 | ### Dec-23-2021 Update: AutoViz now does Wordclouds! #autoviz #wordcloud
73 | AutoViz now detects the NLP variables in your data automatically and creates word clouds for them. See this Colab notebook for an example: [AutoViz Demo with HTML setting](https://colab.research.google.com/drive/1r5QqESRZDY98FFfDOgVtMAVA_oaGtqqx?usp=sharing)
74 |
75 |
76 |
77 | ### Dec 21, 2021: AutoViz now runs on Docker containers as part of MLOps pipelines. Check out Orchest.io
78 | We are excited to announce that AutoViz and Deep_AutoViML are now available as containerized applications on Docker. This means you can build MLOps pipelines visually with a fantastic tool like [orchest.io](https://orchest.io). Here are two sample pipelines we have created:
79 |
80 | AutoViz pipeline: https://lnkd.in/g5uC-z66
81 | Deep_AutoViML pipeline: https://lnkd.in/gdnWTqCG
82 |
83 | You can find more examples and a wonderful video on [orchest's web site](https://github.com/orchest/orchest-examples)
84 | 
85 |
86 | ### Dec-17-2021 AutoViz now uses HoloViews to display dashboards with Bokeh and save them as Dynamic HTML for web serving #HTML #Bokeh #Holoviews
87 | Now you can use AutoViz to create interactive Bokeh charts and dashboards (see below), either in Jupyter Notebooks or in the browser. Use `chart_format` as follows:
88 | - `chart_format='bokeh'`: interactive Bokeh dashboards are plotted in Jupyter Notebooks.
89 | - `chart_format='server'`: dashboards for each kind of chart pop up in your web browser.
90 | - `chart_format='html'`: interactive Bokeh charts are silently saved as Dynamic HTML files under the `AutoViz_Plots` directory.
91 |
--------------------------------------------------------------------------------