├── .DS_Store
├── .github
│   └── workflows
│       └── python-package-conda.yml
├── .gitignore
├── .vscode
│   └── settings.json
├── CONTRIBUTING.md
├── Examples
│   ├── AutoViz_Bokeh_Interactive_Demo.ipynb
│   ├── AutoViz_Demo.ipynb
│   ├── Boston.csv
│   └── LCA Bokeh.ipynb
├── LICENSE
├── README.md
├── autoviz
│   ├── AutoViz_Class.py
│   ├── AutoViz_Holo.py
│   ├── AutoViz_NLP.py
│   ├── AutoViz_Utils.py
│   ├── __init__.py
│   ├── __version__.py
│   ├── classify_method.py
│   ├── test.png
│   └── tests
│       ├── __init__.py
│       ├── test_autoviz_class.py
│       └── test_deps.py
├── images
│   ├── bokeh_charts.JPG
│   ├── data_clean.png
│   ├── logo.JPG
│   ├── logo.png
│   ├── server_charts.JPG
│   └── var_charts.JPG
├── old_setup.py
├── requirements-py310.txt
├── requirements-py311.txt
├── requirements.txt
├── setup.py
└── updates.md

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/.DS_Store

--------------------------------------------------------------------------------
/.github/workflows/python-package-conda.yml:
--------------------------------------------------------------------------------
 1 | name: Python Package using Conda
 2 | 
 3 | on: [push]
 4 | 
 5 | jobs:
 6 |   build-linux:
 7 |     runs-on: ubuntu-latest
 8 |     strategy:
 9 |       matrix:
10 |         os: [ubuntu-latest, macos-latest, windows-latest]
11 |       max-parallel: 5
12 | 
13 |     steps:
14 |     - uses: actions/checkout@v3
15 |     - name: Set up Python 3.10
16 |       uses: actions/setup-python@v3
17 |       with:
18 |         python-version: '3.10'
19 |     - name: Add conda to system path
20 |       run: |
21 |         # $CONDA is an environment variable pointing to the root of the miniconda directory
22 |         echo $CONDA/bin >> $GITHUB_PATH
23 |     - name: Install dependencies
24 |       run: |
25 |         conda env update --file environment.yml --name base
26 |     - name: Lint with flake8
27 |       run: |
28 |         conda install flake8
29 |         # stop the build if there are Python syntax errors or undefined names
30 |         flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
31 |         # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
32 |         flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
33 |     - name: Test with pytest
34 |       run: |
35 |         conda install pytest
36 |         pytest
37 | 
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints/
2 | __pycache__/
3 | .idea/
4 | dist/
5 | autoviz.egg-info/
6 | build/
--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
1 | {
2 |     "python.pythonPath": "/opt/conda/bin/python"
3 | }
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Contributing
 2 | 
 3 | We welcome contributions from anyone, beginner or advanced. Before working on a feature, please:
 4 | 
 5 | - search through past issues; your concern may have been raised by others before. Check through
 6 |   closed issues as well.
 7 | - if there is no open issue for your feature request, open one up to coordinate all collaborators
 8 | - write your feature
 9 | - submit a pull request on this repo with:
10 |   - a brief description
11 |   - **details of the expected change(s) in behaviour**
12 |   - how to test it (if it's not obvious)
13 | 
14 | Ask someone to test it.
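A note on the CI workflow above: it declares a three-OS `matrix` under `strategy` but pins `runs-on` to `ubuntu-latest` (the job is named `build-linux`), so the `macos-latest` and `windows-latest` entries are never used. A minimal sketch of wiring the matrix through, on the assumption that multi-OS runs are actually intended:

```yaml
jobs:
  build:
    # Let each matrix entry pick its own runner instead of hardcoding Linux.
    runs-on: ${{ matrix.os }}
    strategy:
      max-parallel: 5
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    steps:
      - uses: actions/checkout@v3
```

Note that the conda path step (`echo $CONDA/bin >> $GITHUB_PATH`) is Linux/macOS shell syntax and would need adjusting for a Windows runner.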
15 | -------------------------------------------------------------------------------- /Examples/Boston.csv: -------------------------------------------------------------------------------- 1 | "","crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","black","lstat","medv" 2 | "1",0.00632,18,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24 3 | "2",0.02731,0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6 4 | "3",0.02729,0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7 5 | "4",0.03237,0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4 6 | "5",0.06905,0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2 7 | "6",0.02985,0,2.18,0,0.458,6.43,58.7,6.0622,3,222,18.7,394.12,5.21,28.7 8 | "7",0.08829,12.5,7.87,0,0.524,6.012,66.6,5.5605,5,311,15.2,395.6,12.43,22.9 9 | "8",0.14455,12.5,7.87,0,0.524,6.172,96.1,5.9505,5,311,15.2,396.9,19.15,27.1 10 | "9",0.21124,12.5,7.87,0,0.524,5.631,100,6.0821,5,311,15.2,386.63,29.93,16.5 11 | "10",0.17004,12.5,7.87,0,0.524,6.004,85.9,6.5921,5,311,15.2,386.71,17.1,18.9 12 | "11",0.22489,12.5,7.87,0,0.524,6.377,94.3,6.3467,5,311,15.2,392.52,20.45,15 13 | "12",0.11747,12.5,7.87,0,0.524,6.009,82.9,6.2267,5,311,15.2,396.9,13.27,18.9 14 | "13",0.09378,12.5,7.87,0,0.524,5.889,39,5.4509,5,311,15.2,390.5,15.71,21.7 15 | "14",0.62976,0,8.14,0,0.538,5.949,61.8,4.7075,4,307,21,396.9,8.26,20.4 16 | "15",0.63796,0,8.14,0,0.538,6.096,84.5,4.4619,4,307,21,380.02,10.26,18.2 17 | "16",0.62739,0,8.14,0,0.538,5.834,56.5,4.4986,4,307,21,395.62,8.47,19.9 18 | "17",1.05393,0,8.14,0,0.538,5.935,29.3,4.4986,4,307,21,386.85,6.58,23.1 19 | "18",0.7842,0,8.14,0,0.538,5.99,81.7,4.2579,4,307,21,386.75,14.67,17.5 20 | "19",0.80271,0,8.14,0,0.538,5.456,36.6,3.7965,4,307,21,288.99,11.69,20.2 21 | "20",0.7258,0,8.14,0,0.538,5.727,69.5,3.7965,4,307,21,390.95,11.28,18.2 22 | "21",1.25179,0,8.14,0,0.538,5.57,98.1,3.7979,4,307,21,376.57,21.02,13.6 23 | 
"22",0.85204,0,8.14,0,0.538,5.965,89.2,4.0123,4,307,21,392.53,13.83,19.6 24 | "23",1.23247,0,8.14,0,0.538,6.142,91.7,3.9769,4,307,21,396.9,18.72,15.2 25 | "24",0.98843,0,8.14,0,0.538,5.813,100,4.0952,4,307,21,394.54,19.88,14.5 26 | "25",0.75026,0,8.14,0,0.538,5.924,94.1,4.3996,4,307,21,394.33,16.3,15.6 27 | "26",0.84054,0,8.14,0,0.538,5.599,85.7,4.4546,4,307,21,303.42,16.51,13.9 28 | "27",0.67191,0,8.14,0,0.538,5.813,90.3,4.682,4,307,21,376.88,14.81,16.6 29 | "28",0.95577,0,8.14,0,0.538,6.047,88.8,4.4534,4,307,21,306.38,17.28,14.8 30 | "29",0.77299,0,8.14,0,0.538,6.495,94.4,4.4547,4,307,21,387.94,12.8,18.4 31 | "30",1.00245,0,8.14,0,0.538,6.674,87.3,4.239,4,307,21,380.23,11.98,21 32 | "31",1.13081,0,8.14,0,0.538,5.713,94.1,4.233,4,307,21,360.17,22.6,12.7 33 | "32",1.35472,0,8.14,0,0.538,6.072,100,4.175,4,307,21,376.73,13.04,14.5 34 | "33",1.38799,0,8.14,0,0.538,5.95,82,3.99,4,307,21,232.6,27.71,13.2 35 | "34",1.15172,0,8.14,0,0.538,5.701,95,3.7872,4,307,21,358.77,18.35,13.1 36 | "35",1.61282,0,8.14,0,0.538,6.096,96.9,3.7598,4,307,21,248.31,20.34,13.5 37 | "36",0.06417,0,5.96,0,0.499,5.933,68.2,3.3603,5,279,19.2,396.9,9.68,18.9 38 | "37",0.09744,0,5.96,0,0.499,5.841,61.4,3.3779,5,279,19.2,377.56,11.41,20 39 | "38",0.08014,0,5.96,0,0.499,5.85,41.5,3.9342,5,279,19.2,396.9,8.77,21 40 | "39",0.17505,0,5.96,0,0.499,5.966,30.2,3.8473,5,279,19.2,393.43,10.13,24.7 41 | "40",0.02763,75,2.95,0,0.428,6.595,21.8,5.4011,3,252,18.3,395.63,4.32,30.8 42 | "41",0.03359,75,2.95,0,0.428,7.024,15.8,5.4011,3,252,18.3,395.62,1.98,34.9 43 | "42",0.12744,0,6.91,0,0.448,6.77,2.9,5.7209,3,233,17.9,385.41,4.84,26.6 44 | "43",0.1415,0,6.91,0,0.448,6.169,6.6,5.7209,3,233,17.9,383.37,5.81,25.3 45 | "44",0.15936,0,6.91,0,0.448,6.211,6.5,5.7209,3,233,17.9,394.46,7.44,24.7 46 | "45",0.12269,0,6.91,0,0.448,6.069,40,5.7209,3,233,17.9,389.39,9.55,21.2 47 | "46",0.17142,0,6.91,0,0.448,5.682,33.8,5.1004,3,233,17.9,396.9,10.21,19.3 48 | 
"47",0.18836,0,6.91,0,0.448,5.786,33.3,5.1004,3,233,17.9,396.9,14.15,20 49 | "48",0.22927,0,6.91,0,0.448,6.03,85.5,5.6894,3,233,17.9,392.74,18.8,16.6 50 | "49",0.25387,0,6.91,0,0.448,5.399,95.3,5.87,3,233,17.9,396.9,30.81,14.4 51 | "50",0.21977,0,6.91,0,0.448,5.602,62,6.0877,3,233,17.9,396.9,16.2,19.4 52 | "51",0.08873,21,5.64,0,0.439,5.963,45.7,6.8147,4,243,16.8,395.56,13.45,19.7 53 | "52",0.04337,21,5.64,0,0.439,6.115,63,6.8147,4,243,16.8,393.97,9.43,20.5 54 | "53",0.0536,21,5.64,0,0.439,6.511,21.1,6.8147,4,243,16.8,396.9,5.28,25 55 | "54",0.04981,21,5.64,0,0.439,5.998,21.4,6.8147,4,243,16.8,396.9,8.43,23.4 56 | "55",0.0136,75,4,0,0.41,5.888,47.6,7.3197,3,469,21.1,396.9,14.8,18.9 57 | "56",0.01311,90,1.22,0,0.403,7.249,21.9,8.6966,5,226,17.9,395.93,4.81,35.4 58 | "57",0.02055,85,0.74,0,0.41,6.383,35.7,9.1876,2,313,17.3,396.9,5.77,24.7 59 | "58",0.01432,100,1.32,0,0.411,6.816,40.5,8.3248,5,256,15.1,392.9,3.95,31.6 60 | "59",0.15445,25,5.13,0,0.453,6.145,29.2,7.8148,8,284,19.7,390.68,6.86,23.3 61 | "60",0.10328,25,5.13,0,0.453,5.927,47.2,6.932,8,284,19.7,396.9,9.22,19.6 62 | "61",0.14932,25,5.13,0,0.453,5.741,66.2,7.2254,8,284,19.7,395.11,13.15,18.7 63 | "62",0.17171,25,5.13,0,0.453,5.966,93.4,6.8185,8,284,19.7,378.08,14.44,16 64 | "63",0.11027,25,5.13,0,0.453,6.456,67.8,7.2255,8,284,19.7,396.9,6.73,22.2 65 | "64",0.1265,25,5.13,0,0.453,6.762,43.4,7.9809,8,284,19.7,395.58,9.5,25 66 | "65",0.01951,17.5,1.38,0,0.4161,7.104,59.5,9.2229,3,216,18.6,393.24,8.05,33 67 | "66",0.03584,80,3.37,0,0.398,6.29,17.8,6.6115,4,337,16.1,396.9,4.67,23.5 68 | "67",0.04379,80,3.37,0,0.398,5.787,31.1,6.6115,4,337,16.1,396.9,10.24,19.4 69 | "68",0.05789,12.5,6.07,0,0.409,5.878,21.4,6.498,4,345,18.9,396.21,8.1,22 70 | "69",0.13554,12.5,6.07,0,0.409,5.594,36.8,6.498,4,345,18.9,396.9,13.09,17.4 71 | "70",0.12816,12.5,6.07,0,0.409,5.885,33,6.498,4,345,18.9,396.9,8.79,20.9 72 | "71",0.08826,0,10.81,0,0.413,6.417,6.6,5.2873,4,305,19.2,383.73,6.72,24.2 73 | 
"72",0.15876,0,10.81,0,0.413,5.961,17.5,5.2873,4,305,19.2,376.94,9.88,21.7 74 | "73",0.09164,0,10.81,0,0.413,6.065,7.8,5.2873,4,305,19.2,390.91,5.52,22.8 75 | "74",0.19539,0,10.81,0,0.413,6.245,6.2,5.2873,4,305,19.2,377.17,7.54,23.4 76 | "75",0.07896,0,12.83,0,0.437,6.273,6,4.2515,5,398,18.7,394.92,6.78,24.1 77 | "76",0.09512,0,12.83,0,0.437,6.286,45,4.5026,5,398,18.7,383.23,8.94,21.4 78 | "77",0.10153,0,12.83,0,0.437,6.279,74.5,4.0522,5,398,18.7,373.66,11.97,20 79 | "78",0.08707,0,12.83,0,0.437,6.14,45.8,4.0905,5,398,18.7,386.96,10.27,20.8 80 | "79",0.05646,0,12.83,0,0.437,6.232,53.7,5.0141,5,398,18.7,386.4,12.34,21.2 81 | "80",0.08387,0,12.83,0,0.437,5.874,36.6,4.5026,5,398,18.7,396.06,9.1,20.3 82 | "81",0.04113,25,4.86,0,0.426,6.727,33.5,5.4007,4,281,19,396.9,5.29,28 83 | "82",0.04462,25,4.86,0,0.426,6.619,70.4,5.4007,4,281,19,395.63,7.22,23.9 84 | "83",0.03659,25,4.86,0,0.426,6.302,32.2,5.4007,4,281,19,396.9,6.72,24.8 85 | "84",0.03551,25,4.86,0,0.426,6.167,46.7,5.4007,4,281,19,390.64,7.51,22.9 86 | "85",0.05059,0,4.49,0,0.449,6.389,48,4.7794,3,247,18.5,396.9,9.62,23.9 87 | "86",0.05735,0,4.49,0,0.449,6.63,56.1,4.4377,3,247,18.5,392.3,6.53,26.6 88 | "87",0.05188,0,4.49,0,0.449,6.015,45.1,4.4272,3,247,18.5,395.99,12.86,22.5 89 | "88",0.07151,0,4.49,0,0.449,6.121,56.8,3.7476,3,247,18.5,395.15,8.44,22.2 90 | "89",0.0566,0,3.41,0,0.489,7.007,86.3,3.4217,2,270,17.8,396.9,5.5,23.6 91 | "90",0.05302,0,3.41,0,0.489,7.079,63.1,3.4145,2,270,17.8,396.06,5.7,28.7 92 | "91",0.04684,0,3.41,0,0.489,6.417,66.1,3.0923,2,270,17.8,392.18,8.81,22.6 93 | "92",0.03932,0,3.41,0,0.489,6.405,73.9,3.0921,2,270,17.8,393.55,8.2,22 94 | "93",0.04203,28,15.04,0,0.464,6.442,53.6,3.6659,4,270,18.2,395.01,8.16,22.9 95 | "94",0.02875,28,15.04,0,0.464,6.211,28.9,3.6659,4,270,18.2,396.33,6.21,25 96 | "95",0.04294,28,15.04,0,0.464,6.249,77.3,3.615,4,270,18.2,396.9,10.59,20.6 97 | "96",0.12204,0,2.89,0,0.445,6.625,57.8,3.4952,2,276,18,357.98,6.65,28.4 98 | 
"97",0.11504,0,2.89,0,0.445,6.163,69.6,3.4952,2,276,18,391.83,11.34,21.4 99 | "98",0.12083,0,2.89,0,0.445,8.069,76,3.4952,2,276,18,396.9,4.21,38.7 100 | "99",0.08187,0,2.89,0,0.445,7.82,36.9,3.4952,2,276,18,393.53,3.57,43.8 101 | "100",0.0686,0,2.89,0,0.445,7.416,62.5,3.4952,2,276,18,396.9,6.19,33.2 102 | "101",0.14866,0,8.56,0,0.52,6.727,79.9,2.7778,5,384,20.9,394.76,9.42,27.5 103 | "102",0.11432,0,8.56,0,0.52,6.781,71.3,2.8561,5,384,20.9,395.58,7.67,26.5 104 | "103",0.22876,0,8.56,0,0.52,6.405,85.4,2.7147,5,384,20.9,70.8,10.63,18.6 105 | "104",0.21161,0,8.56,0,0.52,6.137,87.4,2.7147,5,384,20.9,394.47,13.44,19.3 106 | "105",0.1396,0,8.56,0,0.52,6.167,90,2.421,5,384,20.9,392.69,12.33,20.1 107 | "106",0.13262,0,8.56,0,0.52,5.851,96.7,2.1069,5,384,20.9,394.05,16.47,19.5 108 | "107",0.1712,0,8.56,0,0.52,5.836,91.9,2.211,5,384,20.9,395.67,18.66,19.5 109 | "108",0.13117,0,8.56,0,0.52,6.127,85.2,2.1224,5,384,20.9,387.69,14.09,20.4 110 | "109",0.12802,0,8.56,0,0.52,6.474,97.1,2.4329,5,384,20.9,395.24,12.27,19.8 111 | "110",0.26363,0,8.56,0,0.52,6.229,91.2,2.5451,5,384,20.9,391.23,15.55,19.4 112 | "111",0.10793,0,8.56,0,0.52,6.195,54.4,2.7778,5,384,20.9,393.49,13,21.7 113 | "112",0.10084,0,10.01,0,0.547,6.715,81.6,2.6775,6,432,17.8,395.59,10.16,22.8 114 | "113",0.12329,0,10.01,0,0.547,5.913,92.9,2.3534,6,432,17.8,394.95,16.21,18.8 115 | "114",0.22212,0,10.01,0,0.547,6.092,95.4,2.548,6,432,17.8,396.9,17.09,18.7 116 | "115",0.14231,0,10.01,0,0.547,6.254,84.2,2.2565,6,432,17.8,388.74,10.45,18.5 117 | "116",0.17134,0,10.01,0,0.547,5.928,88.2,2.4631,6,432,17.8,344.91,15.76,18.3 118 | "117",0.13158,0,10.01,0,0.547,6.176,72.5,2.7301,6,432,17.8,393.3,12.04,21.2 119 | "118",0.15098,0,10.01,0,0.547,6.021,82.6,2.7474,6,432,17.8,394.51,10.3,19.2 120 | "119",0.13058,0,10.01,0,0.547,5.872,73.1,2.4775,6,432,17.8,338.63,15.37,20.4 121 | "120",0.14476,0,10.01,0,0.547,5.731,65.2,2.7592,6,432,17.8,391.5,13.61,19.3 122 | 
"121",0.06899,0,25.65,0,0.581,5.87,69.7,2.2577,2,188,19.1,389.15,14.37,22 123 | "122",0.07165,0,25.65,0,0.581,6.004,84.1,2.1974,2,188,19.1,377.67,14.27,20.3 124 | "123",0.09299,0,25.65,0,0.581,5.961,92.9,2.0869,2,188,19.1,378.09,17.93,20.5 125 | "124",0.15038,0,25.65,0,0.581,5.856,97,1.9444,2,188,19.1,370.31,25.41,17.3 126 | "125",0.09849,0,25.65,0,0.581,5.879,95.8,2.0063,2,188,19.1,379.38,17.58,18.8 127 | "126",0.16902,0,25.65,0,0.581,5.986,88.4,1.9929,2,188,19.1,385.02,14.81,21.4 128 | "127",0.38735,0,25.65,0,0.581,5.613,95.6,1.7572,2,188,19.1,359.29,27.26,15.7 129 | "128",0.25915,0,21.89,0,0.624,5.693,96,1.7883,4,437,21.2,392.11,17.19,16.2 130 | "129",0.32543,0,21.89,0,0.624,6.431,98.8,1.8125,4,437,21.2,396.9,15.39,18 131 | "130",0.88125,0,21.89,0,0.624,5.637,94.7,1.9799,4,437,21.2,396.9,18.34,14.3 132 | "131",0.34006,0,21.89,0,0.624,6.458,98.9,2.1185,4,437,21.2,395.04,12.6,19.2 133 | "132",1.19294,0,21.89,0,0.624,6.326,97.7,2.271,4,437,21.2,396.9,12.26,19.6 134 | "133",0.59005,0,21.89,0,0.624,6.372,97.9,2.3274,4,437,21.2,385.76,11.12,23 135 | "134",0.32982,0,21.89,0,0.624,5.822,95.4,2.4699,4,437,21.2,388.69,15.03,18.4 136 | "135",0.97617,0,21.89,0,0.624,5.757,98.4,2.346,4,437,21.2,262.76,17.31,15.6 137 | "136",0.55778,0,21.89,0,0.624,6.335,98.2,2.1107,4,437,21.2,394.67,16.96,18.1 138 | "137",0.32264,0,21.89,0,0.624,5.942,93.5,1.9669,4,437,21.2,378.25,16.9,17.4 139 | "138",0.35233,0,21.89,0,0.624,6.454,98.4,1.8498,4,437,21.2,394.08,14.59,17.1 140 | "139",0.2498,0,21.89,0,0.624,5.857,98.2,1.6686,4,437,21.2,392.04,21.32,13.3 141 | "140",0.54452,0,21.89,0,0.624,6.151,97.9,1.6687,4,437,21.2,396.9,18.46,17.8 142 | "141",0.2909,0,21.89,0,0.624,6.174,93.6,1.6119,4,437,21.2,388.08,24.16,14 143 | "142",1.62864,0,21.89,0,0.624,5.019,100,1.4394,4,437,21.2,396.9,34.41,14.4 144 | "143",3.32105,0,19.58,1,0.871,5.403,100,1.3216,5,403,14.7,396.9,26.82,13.4 145 | "144",4.0974,0,19.58,0,0.871,5.468,100,1.4118,5,403,14.7,396.9,26.42,15.6 146 | 
"145",2.77974,0,19.58,0,0.871,4.903,97.8,1.3459,5,403,14.7,396.9,29.29,11.8 147 | "146",2.37934,0,19.58,0,0.871,6.13,100,1.4191,5,403,14.7,172.91,27.8,13.8 148 | "147",2.15505,0,19.58,0,0.871,5.628,100,1.5166,5,403,14.7,169.27,16.65,15.6 149 | "148",2.36862,0,19.58,0,0.871,4.926,95.7,1.4608,5,403,14.7,391.71,29.53,14.6 150 | "149",2.33099,0,19.58,0,0.871,5.186,93.8,1.5296,5,403,14.7,356.99,28.32,17.8 151 | "150",2.73397,0,19.58,0,0.871,5.597,94.9,1.5257,5,403,14.7,351.85,21.45,15.4 152 | "151",1.6566,0,19.58,0,0.871,6.122,97.3,1.618,5,403,14.7,372.8,14.1,21.5 153 | "152",1.49632,0,19.58,0,0.871,5.404,100,1.5916,5,403,14.7,341.6,13.28,19.6 154 | "153",1.12658,0,19.58,1,0.871,5.012,88,1.6102,5,403,14.7,343.28,12.12,15.3 155 | "154",2.14918,0,19.58,0,0.871,5.709,98.5,1.6232,5,403,14.7,261.95,15.79,19.4 156 | "155",1.41385,0,19.58,1,0.871,6.129,96,1.7494,5,403,14.7,321.02,15.12,17 157 | "156",3.53501,0,19.58,1,0.871,6.152,82.6,1.7455,5,403,14.7,88.01,15.02,15.6 158 | "157",2.44668,0,19.58,0,0.871,5.272,94,1.7364,5,403,14.7,88.63,16.14,13.1 159 | "158",1.22358,0,19.58,0,0.605,6.943,97.4,1.8773,5,403,14.7,363.43,4.59,41.3 160 | "159",1.34284,0,19.58,0,0.605,6.066,100,1.7573,5,403,14.7,353.89,6.43,24.3 161 | "160",1.42502,0,19.58,0,0.871,6.51,100,1.7659,5,403,14.7,364.31,7.39,23.3 162 | "161",1.27346,0,19.58,1,0.605,6.25,92.6,1.7984,5,403,14.7,338.92,5.5,27 163 | "162",1.46336,0,19.58,0,0.605,7.489,90.8,1.9709,5,403,14.7,374.43,1.73,50 164 | "163",1.83377,0,19.58,1,0.605,7.802,98.2,2.0407,5,403,14.7,389.61,1.92,50 165 | "164",1.51902,0,19.58,1,0.605,8.375,93.9,2.162,5,403,14.7,388.45,3.32,50 166 | "165",2.24236,0,19.58,0,0.605,5.854,91.8,2.422,5,403,14.7,395.11,11.64,22.7 167 | "166",2.924,0,19.58,0,0.605,6.101,93,2.2834,5,403,14.7,240.16,9.81,25 168 | "167",2.01019,0,19.58,0,0.605,7.929,96.2,2.0459,5,403,14.7,369.3,3.7,50 169 | "168",1.80028,0,19.58,0,0.605,5.877,79.2,2.4259,5,403,14.7,227.61,12.14,23.8 170 | 
"169",2.3004,0,19.58,0,0.605,6.319,96.1,2.1,5,403,14.7,297.09,11.1,23.8 171 | "170",2.44953,0,19.58,0,0.605,6.402,95.2,2.2625,5,403,14.7,330.04,11.32,22.3 172 | "171",1.20742,0,19.58,0,0.605,5.875,94.6,2.4259,5,403,14.7,292.29,14.43,17.4 173 | "172",2.3139,0,19.58,0,0.605,5.88,97.3,2.3887,5,403,14.7,348.13,12.03,19.1 174 | "173",0.13914,0,4.05,0,0.51,5.572,88.5,2.5961,5,296,16.6,396.9,14.69,23.1 175 | "174",0.09178,0,4.05,0,0.51,6.416,84.1,2.6463,5,296,16.6,395.5,9.04,23.6 176 | "175",0.08447,0,4.05,0,0.51,5.859,68.7,2.7019,5,296,16.6,393.23,9.64,22.6 177 | "176",0.06664,0,4.05,0,0.51,6.546,33.1,3.1323,5,296,16.6,390.96,5.33,29.4 178 | "177",0.07022,0,4.05,0,0.51,6.02,47.2,3.5549,5,296,16.6,393.23,10.11,23.2 179 | "178",0.05425,0,4.05,0,0.51,6.315,73.4,3.3175,5,296,16.6,395.6,6.29,24.6 180 | "179",0.06642,0,4.05,0,0.51,6.86,74.4,2.9153,5,296,16.6,391.27,6.92,29.9 181 | "180",0.0578,0,2.46,0,0.488,6.98,58.4,2.829,3,193,17.8,396.9,5.04,37.2 182 | "181",0.06588,0,2.46,0,0.488,7.765,83.3,2.741,3,193,17.8,395.56,7.56,39.8 183 | "182",0.06888,0,2.46,0,0.488,6.144,62.2,2.5979,3,193,17.8,396.9,9.45,36.2 184 | "183",0.09103,0,2.46,0,0.488,7.155,92.2,2.7006,3,193,17.8,394.12,4.82,37.9 185 | "184",0.10008,0,2.46,0,0.488,6.563,95.6,2.847,3,193,17.8,396.9,5.68,32.5 186 | "185",0.08308,0,2.46,0,0.488,5.604,89.8,2.9879,3,193,17.8,391,13.98,26.4 187 | "186",0.06047,0,2.46,0,0.488,6.153,68.8,3.2797,3,193,17.8,387.11,13.15,29.6 188 | "187",0.05602,0,2.46,0,0.488,7.831,53.6,3.1992,3,193,17.8,392.63,4.45,50 189 | "188",0.07875,45,3.44,0,0.437,6.782,41.1,3.7886,5,398,15.2,393.87,6.68,32 190 | "189",0.12579,45,3.44,0,0.437,6.556,29.1,4.5667,5,398,15.2,382.84,4.56,29.8 191 | "190",0.0837,45,3.44,0,0.437,7.185,38.9,4.5667,5,398,15.2,396.9,5.39,34.9 192 | "191",0.09068,45,3.44,0,0.437,6.951,21.5,6.4798,5,398,15.2,377.68,5.1,37 193 | "192",0.06911,45,3.44,0,0.437,6.739,30.8,6.4798,5,398,15.2,389.71,4.69,30.5 194 | "193",0.08664,45,3.44,0,0.437,7.178,26.3,6.4798,5,398,15.2,390.49,2.87,36.4 
195 | "194",0.02187,60,2.93,0,0.401,6.8,9.9,6.2196,1,265,15.6,393.37,5.03,31.1 196 | "195",0.01439,60,2.93,0,0.401,6.604,18.8,6.2196,1,265,15.6,376.7,4.38,29.1 197 | "196",0.01381,80,0.46,0,0.422,7.875,32,5.6484,4,255,14.4,394.23,2.97,50 198 | "197",0.04011,80,1.52,0,0.404,7.287,34.1,7.309,2,329,12.6,396.9,4.08,33.3 199 | "198",0.04666,80,1.52,0,0.404,7.107,36.6,7.309,2,329,12.6,354.31,8.61,30.3 200 | "199",0.03768,80,1.52,0,0.404,7.274,38.3,7.309,2,329,12.6,392.2,6.62,34.6 201 | "200",0.0315,95,1.47,0,0.403,6.975,15.3,7.6534,3,402,17,396.9,4.56,34.9 202 | "201",0.01778,95,1.47,0,0.403,7.135,13.9,7.6534,3,402,17,384.3,4.45,32.9 203 | "202",0.03445,82.5,2.03,0,0.415,6.162,38.4,6.27,2,348,14.7,393.77,7.43,24.1 204 | "203",0.02177,82.5,2.03,0,0.415,7.61,15.7,6.27,2,348,14.7,395.38,3.11,42.3 205 | "204",0.0351,95,2.68,0,0.4161,7.853,33.2,5.118,4,224,14.7,392.78,3.81,48.5 206 | "205",0.02009,95,2.68,0,0.4161,8.034,31.9,5.118,4,224,14.7,390.55,2.88,50 207 | "206",0.13642,0,10.59,0,0.489,5.891,22.3,3.9454,4,277,18.6,396.9,10.87,22.6 208 | "207",0.22969,0,10.59,0,0.489,6.326,52.5,4.3549,4,277,18.6,394.87,10.97,24.4 209 | "208",0.25199,0,10.59,0,0.489,5.783,72.7,4.3549,4,277,18.6,389.43,18.06,22.5 210 | "209",0.13587,0,10.59,1,0.489,6.064,59.1,4.2392,4,277,18.6,381.32,14.66,24.4 211 | "210",0.43571,0,10.59,1,0.489,5.344,100,3.875,4,277,18.6,396.9,23.09,20 212 | "211",0.17446,0,10.59,1,0.489,5.96,92.1,3.8771,4,277,18.6,393.25,17.27,21.7 213 | "212",0.37578,0,10.59,1,0.489,5.404,88.6,3.665,4,277,18.6,395.24,23.98,19.3 214 | "213",0.21719,0,10.59,1,0.489,5.807,53.8,3.6526,4,277,18.6,390.94,16.03,22.4 215 | "214",0.14052,0,10.59,0,0.489,6.375,32.3,3.9454,4,277,18.6,385.81,9.38,28.1 216 | "215",0.28955,0,10.59,0,0.489,5.412,9.8,3.5875,4,277,18.6,348.93,29.55,23.7 217 | "216",0.19802,0,10.59,0,0.489,6.182,42.4,3.9454,4,277,18.6,393.63,9.47,25 218 | "217",0.0456,0,13.89,1,0.55,5.888,56,3.1121,5,276,16.4,392.8,13.51,23.3 219 | 
"218",0.07013,0,13.89,0,0.55,6.642,85.1,3.4211,5,276,16.4,392.78,9.69,28.7 220 | "219",0.11069,0,13.89,1,0.55,5.951,93.8,2.8893,5,276,16.4,396.9,17.92,21.5 221 | "220",0.11425,0,13.89,1,0.55,6.373,92.4,3.3633,5,276,16.4,393.74,10.5,23 222 | "221",0.35809,0,6.2,1,0.507,6.951,88.5,2.8617,8,307,17.4,391.7,9.71,26.7 223 | "222",0.40771,0,6.2,1,0.507,6.164,91.3,3.048,8,307,17.4,395.24,21.46,21.7 224 | "223",0.62356,0,6.2,1,0.507,6.879,77.7,3.2721,8,307,17.4,390.39,9.93,27.5 225 | "224",0.6147,0,6.2,0,0.507,6.618,80.8,3.2721,8,307,17.4,396.9,7.6,30.1 226 | "225",0.31533,0,6.2,0,0.504,8.266,78.3,2.8944,8,307,17.4,385.05,4.14,44.8 227 | "226",0.52693,0,6.2,0,0.504,8.725,83,2.8944,8,307,17.4,382,4.63,50 228 | "227",0.38214,0,6.2,0,0.504,8.04,86.5,3.2157,8,307,17.4,387.38,3.13,37.6 229 | "228",0.41238,0,6.2,0,0.504,7.163,79.9,3.2157,8,307,17.4,372.08,6.36,31.6 230 | "229",0.29819,0,6.2,0,0.504,7.686,17,3.3751,8,307,17.4,377.51,3.92,46.7 231 | "230",0.44178,0,6.2,0,0.504,6.552,21.4,3.3751,8,307,17.4,380.34,3.76,31.5 232 | "231",0.537,0,6.2,0,0.504,5.981,68.1,3.6715,8,307,17.4,378.35,11.65,24.3 233 | "232",0.46296,0,6.2,0,0.504,7.412,76.9,3.6715,8,307,17.4,376.14,5.25,31.7 234 | "233",0.57529,0,6.2,0,0.507,8.337,73.3,3.8384,8,307,17.4,385.91,2.47,41.7 235 | "234",0.33147,0,6.2,0,0.507,8.247,70.4,3.6519,8,307,17.4,378.95,3.95,48.3 236 | "235",0.44791,0,6.2,1,0.507,6.726,66.5,3.6519,8,307,17.4,360.2,8.05,29 237 | "236",0.33045,0,6.2,0,0.507,6.086,61.5,3.6519,8,307,17.4,376.75,10.88,24 238 | "237",0.52058,0,6.2,1,0.507,6.631,76.5,4.148,8,307,17.4,388.45,9.54,25.1 239 | "238",0.51183,0,6.2,0,0.507,7.358,71.6,4.148,8,307,17.4,390.07,4.73,31.5 240 | "239",0.08244,30,4.93,0,0.428,6.481,18.5,6.1899,6,300,16.6,379.41,6.36,23.7 241 | "240",0.09252,30,4.93,0,0.428,6.606,42.2,6.1899,6,300,16.6,383.78,7.37,23.3 242 | "241",0.11329,30,4.93,0,0.428,6.897,54.3,6.3361,6,300,16.6,391.25,11.38,22 243 | "242",0.10612,30,4.93,0,0.428,6.095,65.1,6.3361,6,300,16.6,394.62,12.4,20.1 244 | 
"243",0.1029,30,4.93,0,0.428,6.358,52.9,7.0355,6,300,16.6,372.75,11.22,22.2 245 | "244",0.12757,30,4.93,0,0.428,6.393,7.8,7.0355,6,300,16.6,374.71,5.19,23.7 246 | "245",0.20608,22,5.86,0,0.431,5.593,76.5,7.9549,7,330,19.1,372.49,12.5,17.6 247 | "246",0.19133,22,5.86,0,0.431,5.605,70.2,7.9549,7,330,19.1,389.13,18.46,18.5 248 | "247",0.33983,22,5.86,0,0.431,6.108,34.9,8.0555,7,330,19.1,390.18,9.16,24.3 249 | "248",0.19657,22,5.86,0,0.431,6.226,79.2,8.0555,7,330,19.1,376.14,10.15,20.5 250 | "249",0.16439,22,5.86,0,0.431,6.433,49.1,7.8265,7,330,19.1,374.71,9.52,24.5 251 | "250",0.19073,22,5.86,0,0.431,6.718,17.5,7.8265,7,330,19.1,393.74,6.56,26.2 252 | "251",0.1403,22,5.86,0,0.431,6.487,13,7.3967,7,330,19.1,396.28,5.9,24.4 253 | "252",0.21409,22,5.86,0,0.431,6.438,8.9,7.3967,7,330,19.1,377.07,3.59,24.8 254 | "253",0.08221,22,5.86,0,0.431,6.957,6.8,8.9067,7,330,19.1,386.09,3.53,29.6 255 | "254",0.36894,22,5.86,0,0.431,8.259,8.4,8.9067,7,330,19.1,396.9,3.54,42.8 256 | "255",0.04819,80,3.64,0,0.392,6.108,32,9.2203,1,315,16.4,392.89,6.57,21.9 257 | "256",0.03548,80,3.64,0,0.392,5.876,19.1,9.2203,1,315,16.4,395.18,9.25,20.9 258 | "257",0.01538,90,3.75,0,0.394,7.454,34.2,6.3361,3,244,15.9,386.34,3.11,44 259 | "258",0.61154,20,3.97,0,0.647,8.704,86.9,1.801,5,264,13,389.7,5.12,50 260 | "259",0.66351,20,3.97,0,0.647,7.333,100,1.8946,5,264,13,383.29,7.79,36 261 | "260",0.65665,20,3.97,0,0.647,6.842,100,2.0107,5,264,13,391.93,6.9,30.1 262 | "261",0.54011,20,3.97,0,0.647,7.203,81.8,2.1121,5,264,13,392.8,9.59,33.8 263 | "262",0.53412,20,3.97,0,0.647,7.52,89.4,2.1398,5,264,13,388.37,7.26,43.1 264 | "263",0.52014,20,3.97,0,0.647,8.398,91.5,2.2885,5,264,13,386.86,5.91,48.8 265 | "264",0.82526,20,3.97,0,0.647,7.327,94.5,2.0788,5,264,13,393.42,11.25,31 266 | "265",0.55007,20,3.97,0,0.647,7.206,91.6,1.9301,5,264,13,387.89,8.1,36.5 267 | "266",0.76162,20,3.97,0,0.647,5.56,62.8,1.9865,5,264,13,392.4,10.45,22.8 268 | "267",0.7857,20,3.97,0,0.647,7.014,84.6,2.1329,5,264,13,384.07,14.79,30.7 
269 | "268",0.57834,20,3.97,0,0.575,8.297,67,2.4216,5,264,13,384.54,7.44,50 270 | "269",0.5405,20,3.97,0,0.575,7.47,52.6,2.872,5,264,13,390.3,3.16,43.5 271 | "270",0.09065,20,6.96,1,0.464,5.92,61.5,3.9175,3,223,18.6,391.34,13.65,20.7 272 | "271",0.29916,20,6.96,0,0.464,5.856,42.1,4.429,3,223,18.6,388.65,13,21.1 273 | "272",0.16211,20,6.96,0,0.464,6.24,16.3,4.429,3,223,18.6,396.9,6.59,25.2 274 | "273",0.1146,20,6.96,0,0.464,6.538,58.7,3.9175,3,223,18.6,394.96,7.73,24.4 275 | "274",0.22188,20,6.96,1,0.464,7.691,51.8,4.3665,3,223,18.6,390.77,6.58,35.2 276 | "275",0.05644,40,6.41,1,0.447,6.758,32.9,4.0776,4,254,17.6,396.9,3.53,32.4 277 | "276",0.09604,40,6.41,0,0.447,6.854,42.8,4.2673,4,254,17.6,396.9,2.98,32 278 | "277",0.10469,40,6.41,1,0.447,7.267,49,4.7872,4,254,17.6,389.25,6.05,33.2 279 | "278",0.06127,40,6.41,1,0.447,6.826,27.6,4.8628,4,254,17.6,393.45,4.16,33.1 280 | "279",0.07978,40,6.41,0,0.447,6.482,32.1,4.1403,4,254,17.6,396.9,7.19,29.1 281 | "280",0.21038,20,3.33,0,0.4429,6.812,32.2,4.1007,5,216,14.9,396.9,4.85,35.1 282 | "281",0.03578,20,3.33,0,0.4429,7.82,64.5,4.6947,5,216,14.9,387.31,3.76,45.4 283 | "282",0.03705,20,3.33,0,0.4429,6.968,37.2,5.2447,5,216,14.9,392.23,4.59,35.4 284 | "283",0.06129,20,3.33,1,0.4429,7.645,49.7,5.2119,5,216,14.9,377.07,3.01,46 285 | "284",0.01501,90,1.21,1,0.401,7.923,24.8,5.885,1,198,13.6,395.52,3.16,50 286 | "285",0.00906,90,2.97,0,0.4,7.088,20.8,7.3073,1,285,15.3,394.72,7.85,32.2 287 | "286",0.01096,55,2.25,0,0.389,6.453,31.9,7.3073,1,300,15.3,394.72,8.23,22 288 | "287",0.01965,80,1.76,0,0.385,6.23,31.5,9.0892,1,241,18.2,341.6,12.93,20.1 289 | "288",0.03871,52.5,5.32,0,0.405,6.209,31.3,7.3172,6,293,16.6,396.9,7.14,23.2 290 | "289",0.0459,52.5,5.32,0,0.405,6.315,45.6,7.3172,6,293,16.6,396.9,7.6,22.3 291 | "290",0.04297,52.5,5.32,0,0.405,6.565,22.9,7.3172,6,293,16.6,371.72,9.51,24.8 292 | "291",0.03502,80,4.95,0,0.411,6.861,27.9,5.1167,4,245,19.2,396.9,3.33,28.5 293 | 
"292",0.07886,80,4.95,0,0.411,7.148,27.7,5.1167,4,245,19.2,396.9,3.56,37.3 294 | "293",0.03615,80,4.95,0,0.411,6.63,23.4,5.1167,4,245,19.2,396.9,4.7,27.9 295 | "294",0.08265,0,13.92,0,0.437,6.127,18.4,5.5027,4,289,16,396.9,8.58,23.9 296 | "295",0.08199,0,13.92,0,0.437,6.009,42.3,5.5027,4,289,16,396.9,10.4,21.7 297 | "296",0.12932,0,13.92,0,0.437,6.678,31.1,5.9604,4,289,16,396.9,6.27,28.6 298 | "297",0.05372,0,13.92,0,0.437,6.549,51,5.9604,4,289,16,392.85,7.39,27.1 299 | "298",0.14103,0,13.92,0,0.437,5.79,58,6.32,4,289,16,396.9,15.84,20.3 300 | "299",0.06466,70,2.24,0,0.4,6.345,20.1,7.8278,5,358,14.8,368.24,4.97,22.5 301 | "300",0.05561,70,2.24,0,0.4,7.041,10,7.8278,5,358,14.8,371.58,4.74,29 302 | "301",0.04417,70,2.24,0,0.4,6.871,47.4,7.8278,5,358,14.8,390.86,6.07,24.8 303 | "302",0.03537,34,6.09,0,0.433,6.59,40.4,5.4917,7,329,16.1,395.75,9.5,22 304 | "303",0.09266,34,6.09,0,0.433,6.495,18.4,5.4917,7,329,16.1,383.61,8.67,26.4 305 | "304",0.1,34,6.09,0,0.433,6.982,17.7,5.4917,7,329,16.1,390.43,4.86,33.1 306 | "305",0.05515,33,2.18,0,0.472,7.236,41.1,4.022,7,222,18.4,393.68,6.93,36.1 307 | "306",0.05479,33,2.18,0,0.472,6.616,58.1,3.37,7,222,18.4,393.36,8.93,28.4 308 | "307",0.07503,33,2.18,0,0.472,7.42,71.9,3.0992,7,222,18.4,396.9,6.47,33.4 309 | "308",0.04932,33,2.18,0,0.472,6.849,70.3,3.1827,7,222,18.4,396.9,7.53,28.2 310 | "309",0.49298,0,9.9,0,0.544,6.635,82.5,3.3175,4,304,18.4,396.9,4.54,22.8 311 | "310",0.3494,0,9.9,0,0.544,5.972,76.7,3.1025,4,304,18.4,396.24,9.97,20.3 312 | "311",2.63548,0,9.9,0,0.544,4.973,37.8,2.5194,4,304,18.4,350.45,12.64,16.1 313 | "312",0.79041,0,9.9,0,0.544,6.122,52.8,2.6403,4,304,18.4,396.9,5.98,22.1 314 | "313",0.26169,0,9.9,0,0.544,6.023,90.4,2.834,4,304,18.4,396.3,11.72,19.4 315 | "314",0.26938,0,9.9,0,0.544,6.266,82.8,3.2628,4,304,18.4,393.39,7.9,21.6 316 | "315",0.3692,0,9.9,0,0.544,6.567,87.3,3.6023,4,304,18.4,395.69,9.28,23.8 317 | "316",0.25356,0,9.9,0,0.544,5.705,77.7,3.945,4,304,18.4,396.42,11.5,16.2 318 | 
"317",0.31827,0,9.9,0,0.544,5.914,83.2,3.9986,4,304,18.4,390.7,18.33,17.8 319 | "318",0.24522,0,9.9,0,0.544,5.782,71.7,4.0317,4,304,18.4,396.9,15.94,19.8 320 | "319",0.40202,0,9.9,0,0.544,6.382,67.2,3.5325,4,304,18.4,395.21,10.36,23.1 321 | "320",0.47547,0,9.9,0,0.544,6.113,58.8,4.0019,4,304,18.4,396.23,12.73,21 322 | "321",0.1676,0,7.38,0,0.493,6.426,52.3,4.5404,5,287,19.6,396.9,7.2,23.8 323 | "322",0.18159,0,7.38,0,0.493,6.376,54.3,4.5404,5,287,19.6,396.9,6.87,23.1 324 | "323",0.35114,0,7.38,0,0.493,6.041,49.9,4.7211,5,287,19.6,396.9,7.7,20.4 325 | "324",0.28392,0,7.38,0,0.493,5.708,74.3,4.7211,5,287,19.6,391.13,11.74,18.5 326 | "325",0.34109,0,7.38,0,0.493,6.415,40.1,4.7211,5,287,19.6,396.9,6.12,25 327 | "326",0.19186,0,7.38,0,0.493,6.431,14.7,5.4159,5,287,19.6,393.68,5.08,24.6 328 | "327",0.30347,0,7.38,0,0.493,6.312,28.9,5.4159,5,287,19.6,396.9,6.15,23 329 | "328",0.24103,0,7.38,0,0.493,6.083,43.7,5.4159,5,287,19.6,396.9,12.79,22.2 330 | "329",0.06617,0,3.24,0,0.46,5.868,25.8,5.2146,4,430,16.9,382.44,9.97,19.3 331 | "330",0.06724,0,3.24,0,0.46,6.333,17.2,5.2146,4,430,16.9,375.21,7.34,22.6 332 | "331",0.04544,0,3.24,0,0.46,6.144,32.2,5.8736,4,430,16.9,368.57,9.09,19.8 333 | "332",0.05023,35,6.06,0,0.4379,5.706,28.4,6.6407,1,304,16.9,394.02,12.43,17.1 334 | "333",0.03466,35,6.06,0,0.4379,6.031,23.3,6.6407,1,304,16.9,362.25,7.83,19.4 335 | "334",0.05083,0,5.19,0,0.515,6.316,38.1,6.4584,5,224,20.2,389.71,5.68,22.2 336 | "335",0.03738,0,5.19,0,0.515,6.31,38.5,6.4584,5,224,20.2,389.4,6.75,20.7 337 | "336",0.03961,0,5.19,0,0.515,6.037,34.5,5.9853,5,224,20.2,396.9,8.01,21.1 338 | "337",0.03427,0,5.19,0,0.515,5.869,46.3,5.2311,5,224,20.2,396.9,9.8,19.5 339 | "338",0.03041,0,5.19,0,0.515,5.895,59.6,5.615,5,224,20.2,394.81,10.56,18.5 340 | "339",0.03306,0,5.19,0,0.515,6.059,37.3,4.8122,5,224,20.2,396.14,8.51,20.6 341 | "340",0.05497,0,5.19,0,0.515,5.985,45.4,4.8122,5,224,20.2,396.9,9.74,19 342 | "341",0.06151,0,5.19,0,0.515,5.968,58.5,4.8122,5,224,20.2,396.9,9.29,18.7 
343 | "342",0.01301,35,1.52,0,0.442,7.241,49.3,7.0379,1,284,15.5,394.74,5.49,32.7 344 | "343",0.02498,0,1.89,0,0.518,6.54,59.7,6.2669,1,422,15.9,389.96,8.65,16.5 345 | "344",0.02543,55,3.78,0,0.484,6.696,56.4,5.7321,5,370,17.6,396.9,7.18,23.9 346 | "345",0.03049,55,3.78,0,0.484,6.874,28.1,6.4654,5,370,17.6,387.97,4.61,31.2 347 | "346",0.03113,0,4.39,0,0.442,6.014,48.5,8.0136,3,352,18.8,385.64,10.53,17.5 348 | "347",0.06162,0,4.39,0,0.442,5.898,52.3,8.0136,3,352,18.8,364.61,12.67,17.2 349 | "348",0.0187,85,4.15,0,0.429,6.516,27.7,8.5353,4,351,17.9,392.43,6.36,23.1 350 | "349",0.01501,80,2.01,0,0.435,6.635,29.7,8.344,4,280,17,390.94,5.99,24.5 351 | "350",0.02899,40,1.25,0,0.429,6.939,34.5,8.7921,1,335,19.7,389.85,5.89,26.6 352 | "351",0.06211,40,1.25,0,0.429,6.49,44.4,8.7921,1,335,19.7,396.9,5.98,22.9 353 | "352",0.0795,60,1.69,0,0.411,6.579,35.9,10.7103,4,411,18.3,370.78,5.49,24.1 354 | "353",0.07244,60,1.69,0,0.411,5.884,18.5,10.7103,4,411,18.3,392.33,7.79,18.6 355 | "354",0.01709,90,2.02,0,0.41,6.728,36.1,12.1265,5,187,17,384.46,4.5,30.1 356 | "355",0.04301,80,1.91,0,0.413,5.663,21.9,10.5857,4,334,22,382.8,8.05,18.2 357 | "356",0.10659,80,1.91,0,0.413,5.936,19.5,10.5857,4,334,22,376.04,5.57,20.6 358 | "357",8.98296,0,18.1,1,0.77,6.212,97.4,2.1222,24,666,20.2,377.73,17.6,17.8 359 | "358",3.8497,0,18.1,1,0.77,6.395,91,2.5052,24,666,20.2,391.34,13.27,21.7 360 | "359",5.20177,0,18.1,1,0.77,6.127,83.4,2.7227,24,666,20.2,395.43,11.48,22.7 361 | "360",4.26131,0,18.1,0,0.77,6.112,81.3,2.5091,24,666,20.2,390.74,12.67,22.6 362 | "361",4.54192,0,18.1,0,0.77,6.398,88,2.5182,24,666,20.2,374.56,7.79,25 363 | "362",3.83684,0,18.1,0,0.77,6.251,91.1,2.2955,24,666,20.2,350.65,14.19,19.9 364 | "363",3.67822,0,18.1,0,0.77,5.362,96.2,2.1036,24,666,20.2,380.79,10.19,20.8 365 | "364",4.22239,0,18.1,1,0.77,5.803,89,1.9047,24,666,20.2,353.04,14.64,16.8 366 | "365",3.47428,0,18.1,1,0.718,8.78,82.9,1.9047,24,666,20.2,354.55,5.29,21.9 367 | 
"366",4.55587,0,18.1,0,0.718,3.561,87.9,1.6132,24,666,20.2,354.7,7.12,27.5 368 | "367",3.69695,0,18.1,0,0.718,4.963,91.4,1.7523,24,666,20.2,316.03,14,21.9 369 | "368",13.5222,0,18.1,0,0.631,3.863,100,1.5106,24,666,20.2,131.42,13.33,23.1 370 | "369",4.89822,0,18.1,0,0.631,4.97,100,1.3325,24,666,20.2,375.52,3.26,50 371 | "370",5.66998,0,18.1,1,0.631,6.683,96.8,1.3567,24,666,20.2,375.33,3.73,50 372 | "371",6.53876,0,18.1,1,0.631,7.016,97.5,1.2024,24,666,20.2,392.05,2.96,50 373 | "372",9.2323,0,18.1,0,0.631,6.216,100,1.1691,24,666,20.2,366.15,9.53,50 374 | "373",8.26725,0,18.1,1,0.668,5.875,89.6,1.1296,24,666,20.2,347.88,8.88,50 375 | "374",11.1081,0,18.1,0,0.668,4.906,100,1.1742,24,666,20.2,396.9,34.77,13.8 376 | "375",18.4982,0,18.1,0,0.668,4.138,100,1.137,24,666,20.2,396.9,37.97,13.8 377 | "376",19.6091,0,18.1,0,0.671,7.313,97.9,1.3163,24,666,20.2,396.9,13.44,15 378 | "377",15.288,0,18.1,0,0.671,6.649,93.3,1.3449,24,666,20.2,363.02,23.24,13.9 379 | "378",9.82349,0,18.1,0,0.671,6.794,98.8,1.358,24,666,20.2,396.9,21.24,13.3 380 | "379",23.6482,0,18.1,0,0.671,6.38,96.2,1.3861,24,666,20.2,396.9,23.69,13.1 381 | "380",17.8667,0,18.1,0,0.671,6.223,100,1.3861,24,666,20.2,393.74,21.78,10.2 382 | "381",88.9762,0,18.1,0,0.671,6.968,91.9,1.4165,24,666,20.2,396.9,17.21,10.4 383 | "382",15.8744,0,18.1,0,0.671,6.545,99.1,1.5192,24,666,20.2,396.9,21.08,10.9 384 | "383",9.18702,0,18.1,0,0.7,5.536,100,1.5804,24,666,20.2,396.9,23.6,11.3 385 | "384",7.99248,0,18.1,0,0.7,5.52,100,1.5331,24,666,20.2,396.9,24.56,12.3 386 | "385",20.0849,0,18.1,0,0.7,4.368,91.2,1.4395,24,666,20.2,285.83,30.63,8.8 387 | "386",16.8118,0,18.1,0,0.7,5.277,98.1,1.4261,24,666,20.2,396.9,30.81,7.2 388 | "387",24.3938,0,18.1,0,0.7,4.652,100,1.4672,24,666,20.2,396.9,28.28,10.5 389 | "388",22.5971,0,18.1,0,0.7,5,89.5,1.5184,24,666,20.2,396.9,31.99,7.4 390 | "389",14.3337,0,18.1,0,0.7,4.88,100,1.5895,24,666,20.2,372.92,30.62,10.2 391 | "390",8.15174,0,18.1,0,0.7,5.39,98.9,1.7281,24,666,20.2,396.9,20.85,11.5 392 | 
"391",6.96215,0,18.1,0,0.7,5.713,97,1.9265,24,666,20.2,394.43,17.11,15.1 393 | "392",5.29305,0,18.1,0,0.7,6.051,82.5,2.1678,24,666,20.2,378.38,18.76,23.2 394 | "393",11.5779,0,18.1,0,0.7,5.036,97,1.77,24,666,20.2,396.9,25.68,9.7 395 | "394",8.64476,0,18.1,0,0.693,6.193,92.6,1.7912,24,666,20.2,396.9,15.17,13.8 396 | "395",13.3598,0,18.1,0,0.693,5.887,94.7,1.7821,24,666,20.2,396.9,16.35,12.7 397 | "396",8.71675,0,18.1,0,0.693,6.471,98.8,1.7257,24,666,20.2,391.98,17.12,13.1 398 | "397",5.87205,0,18.1,0,0.693,6.405,96,1.6768,24,666,20.2,396.9,19.37,12.5 399 | "398",7.67202,0,18.1,0,0.693,5.747,98.9,1.6334,24,666,20.2,393.1,19.92,8.5 400 | "399",38.3518,0,18.1,0,0.693,5.453,100,1.4896,24,666,20.2,396.9,30.59,5 401 | "400",9.91655,0,18.1,0,0.693,5.852,77.8,1.5004,24,666,20.2,338.16,29.97,6.3 402 | "401",25.0461,0,18.1,0,0.693,5.987,100,1.5888,24,666,20.2,396.9,26.77,5.6 403 | "402",14.2362,0,18.1,0,0.693,6.343,100,1.5741,24,666,20.2,396.9,20.32,7.2 404 | "403",9.59571,0,18.1,0,0.693,6.404,100,1.639,24,666,20.2,376.11,20.31,12.1 405 | "404",24.8017,0,18.1,0,0.693,5.349,96,1.7028,24,666,20.2,396.9,19.77,8.3 406 | "405",41.5292,0,18.1,0,0.693,5.531,85.4,1.6074,24,666,20.2,329.46,27.38,8.5 407 | "406",67.9208,0,18.1,0,0.693,5.683,100,1.4254,24,666,20.2,384.97,22.98,5 408 | "407",20.7162,0,18.1,0,0.659,4.138,100,1.1781,24,666,20.2,370.22,23.34,11.9 409 | "408",11.9511,0,18.1,0,0.659,5.608,100,1.2852,24,666,20.2,332.09,12.13,27.9 410 | "409",7.40389,0,18.1,0,0.597,5.617,97.9,1.4547,24,666,20.2,314.64,26.4,17.2 411 | "410",14.4383,0,18.1,0,0.597,6.852,100,1.4655,24,666,20.2,179.36,19.78,27.5 412 | "411",51.1358,0,18.1,0,0.597,5.757,100,1.413,24,666,20.2,2.6,10.11,15 413 | "412",14.0507,0,18.1,0,0.597,6.657,100,1.5275,24,666,20.2,35.05,21.22,17.2 414 | "413",18.811,0,18.1,0,0.597,4.628,100,1.5539,24,666,20.2,28.79,34.37,17.9 415 | "414",28.6558,0,18.1,0,0.597,5.155,100,1.5894,24,666,20.2,210.97,20.08,16.3 416 | 
"415",45.7461,0,18.1,0,0.693,4.519,100,1.6582,24,666,20.2,88.27,36.98,7 417 | "416",18.0846,0,18.1,0,0.679,6.434,100,1.8347,24,666,20.2,27.25,29.05,7.2 418 | "417",10.8342,0,18.1,0,0.679,6.782,90.8,1.8195,24,666,20.2,21.57,25.79,7.5 419 | "418",25.9406,0,18.1,0,0.679,5.304,89.1,1.6475,24,666,20.2,127.36,26.64,10.4 420 | "419",73.5341,0,18.1,0,0.679,5.957,100,1.8026,24,666,20.2,16.45,20.62,8.8 421 | "420",11.8123,0,18.1,0,0.718,6.824,76.5,1.794,24,666,20.2,48.45,22.74,8.4 422 | "421",11.0874,0,18.1,0,0.718,6.411,100,1.8589,24,666,20.2,318.75,15.02,16.7 423 | "422",7.02259,0,18.1,0,0.718,6.006,95.3,1.8746,24,666,20.2,319.98,15.7,14.2 424 | "423",12.0482,0,18.1,0,0.614,5.648,87.6,1.9512,24,666,20.2,291.55,14.1,20.8 425 | "424",7.05042,0,18.1,0,0.614,6.103,85.1,2.0218,24,666,20.2,2.52,23.29,13.4 426 | "425",8.79212,0,18.1,0,0.584,5.565,70.6,2.0635,24,666,20.2,3.65,17.16,11.7 427 | "426",15.8603,0,18.1,0,0.679,5.896,95.4,1.9096,24,666,20.2,7.68,24.39,8.3 428 | "427",12.2472,0,18.1,0,0.584,5.837,59.7,1.9976,24,666,20.2,24.65,15.69,10.2 429 | "428",37.6619,0,18.1,0,0.679,6.202,78.7,1.8629,24,666,20.2,18.82,14.52,10.9 430 | "429",7.36711,0,18.1,0,0.679,6.193,78.1,1.9356,24,666,20.2,96.73,21.52,11 431 | "430",9.33889,0,18.1,0,0.679,6.38,95.6,1.9682,24,666,20.2,60.72,24.08,9.5 432 | "431",8.49213,0,18.1,0,0.584,6.348,86.1,2.0527,24,666,20.2,83.45,17.64,14.5 433 | "432",10.0623,0,18.1,0,0.584,6.833,94.3,2.0882,24,666,20.2,81.33,19.69,14.1 434 | "433",6.44405,0,18.1,0,0.584,6.425,74.8,2.2004,24,666,20.2,97.95,12.03,16.1 435 | "434",5.58107,0,18.1,0,0.713,6.436,87.9,2.3158,24,666,20.2,100.19,16.22,14.3 436 | "435",13.9134,0,18.1,0,0.713,6.208,95,2.2222,24,666,20.2,100.63,15.17,11.7 437 | "436",11.1604,0,18.1,0,0.74,6.629,94.6,2.1247,24,666,20.2,109.85,23.27,13.4 438 | "437",14.4208,0,18.1,0,0.74,6.461,93.3,2.0026,24,666,20.2,27.49,18.05,9.6 439 | "438",15.1772,0,18.1,0,0.74,6.152,100,1.9142,24,666,20.2,9.32,26.45,8.7 440 | 
"439",13.6781,0,18.1,0,0.74,5.935,87.9,1.8206,24,666,20.2,68.95,34.02,8.4 441 | "440",9.39063,0,18.1,0,0.74,5.627,93.9,1.8172,24,666,20.2,396.9,22.88,12.8 442 | "441",22.0511,0,18.1,0,0.74,5.818,92.4,1.8662,24,666,20.2,391.45,22.11,10.5 443 | "442",9.72418,0,18.1,0,0.74,6.406,97.2,2.0651,24,666,20.2,385.96,19.52,17.1 444 | "443",5.66637,0,18.1,0,0.74,6.219,100,2.0048,24,666,20.2,395.69,16.59,18.4 445 | "444",9.96654,0,18.1,0,0.74,6.485,100,1.9784,24,666,20.2,386.73,18.85,15.4 446 | "445",12.8023,0,18.1,0,0.74,5.854,96.6,1.8956,24,666,20.2,240.52,23.79,10.8 447 | "446",10.6718,0,18.1,0,0.74,6.459,94.8,1.9879,24,666,20.2,43.06,23.98,11.8 448 | "447",6.28807,0,18.1,0,0.74,6.341,96.4,2.072,24,666,20.2,318.01,17.79,14.9 449 | "448",9.92485,0,18.1,0,0.74,6.251,96.6,2.198,24,666,20.2,388.52,16.44,12.6 450 | "449",9.32909,0,18.1,0,0.713,6.185,98.7,2.2616,24,666,20.2,396.9,18.13,14.1 451 | "450",7.52601,0,18.1,0,0.713,6.417,98.3,2.185,24,666,20.2,304.21,19.31,13 452 | "451",6.71772,0,18.1,0,0.713,6.749,92.6,2.3236,24,666,20.2,0.32,17.44,13.4 453 | "452",5.44114,0,18.1,0,0.713,6.655,98.2,2.3552,24,666,20.2,355.29,17.73,15.2 454 | "453",5.09017,0,18.1,0,0.713,6.297,91.8,2.3682,24,666,20.2,385.09,17.27,16.1 455 | "454",8.24809,0,18.1,0,0.713,7.393,99.3,2.4527,24,666,20.2,375.87,16.74,17.8 456 | "455",9.51363,0,18.1,0,0.713,6.728,94.1,2.4961,24,666,20.2,6.68,18.71,14.9 457 | "456",4.75237,0,18.1,0,0.713,6.525,86.5,2.4358,24,666,20.2,50.92,18.13,14.1 458 | "457",4.66883,0,18.1,0,0.713,5.976,87.9,2.5806,24,666,20.2,10.48,19.01,12.7 459 | "458",8.20058,0,18.1,0,0.713,5.936,80.3,2.7792,24,666,20.2,3.5,16.94,13.5 460 | "459",7.75223,0,18.1,0,0.713,6.301,83.7,2.7831,24,666,20.2,272.21,16.23,14.9 461 | "460",6.80117,0,18.1,0,0.713,6.081,84.4,2.7175,24,666,20.2,396.9,14.7,20 462 | "461",4.81213,0,18.1,0,0.713,6.701,90,2.5975,24,666,20.2,255.23,16.42,16.4 463 | "462",3.69311,0,18.1,0,0.713,6.376,88.4,2.5671,24,666,20.2,391.43,14.65,17.7 464 | 
"463",6.65492,0,18.1,0,0.713,6.317,83,2.7344,24,666,20.2,396.9,13.99,19.5 465 | "464",5.82115,0,18.1,0,0.713,6.513,89.9,2.8016,24,666,20.2,393.82,10.29,20.2 466 | "465",7.83932,0,18.1,0,0.655,6.209,65.4,2.9634,24,666,20.2,396.9,13.22,21.4 467 | "466",3.1636,0,18.1,0,0.655,5.759,48.2,3.0665,24,666,20.2,334.4,14.13,19.9 468 | "467",3.77498,0,18.1,0,0.655,5.952,84.7,2.8715,24,666,20.2,22.01,17.15,19 469 | "468",4.42228,0,18.1,0,0.584,6.003,94.5,2.5403,24,666,20.2,331.29,21.32,19.1 470 | "469",15.5757,0,18.1,0,0.58,5.926,71,2.9084,24,666,20.2,368.74,18.13,19.1 471 | "470",13.0751,0,18.1,0,0.58,5.713,56.7,2.8237,24,666,20.2,396.9,14.76,20.1 472 | "471",4.34879,0,18.1,0,0.58,6.167,84,3.0334,24,666,20.2,396.9,16.29,19.9 473 | "472",4.03841,0,18.1,0,0.532,6.229,90.7,3.0993,24,666,20.2,395.33,12.87,19.6 474 | "473",3.56868,0,18.1,0,0.58,6.437,75,2.8965,24,666,20.2,393.37,14.36,23.2 475 | "474",4.64689,0,18.1,0,0.614,6.98,67.6,2.5329,24,666,20.2,374.68,11.66,29.8 476 | "475",8.05579,0,18.1,0,0.584,5.427,95.4,2.4298,24,666,20.2,352.58,18.14,13.8 477 | "476",6.39312,0,18.1,0,0.584,6.162,97.4,2.206,24,666,20.2,302.76,24.1,13.3 478 | "477",4.87141,0,18.1,0,0.614,6.484,93.6,2.3053,24,666,20.2,396.21,18.68,16.7 479 | "478",15.0234,0,18.1,0,0.614,5.304,97.3,2.1007,24,666,20.2,349.48,24.91,12 480 | "479",10.233,0,18.1,0,0.614,6.185,96.7,2.1705,24,666,20.2,379.7,18.03,14.6 481 | "480",14.3337,0,18.1,0,0.614,6.229,88,1.9512,24,666,20.2,383.32,13.11,21.4 482 | "481",5.82401,0,18.1,0,0.532,6.242,64.7,3.4242,24,666,20.2,396.9,10.74,23 483 | "482",5.70818,0,18.1,0,0.532,6.75,74.9,3.3317,24,666,20.2,393.07,7.74,23.7 484 | "483",5.73116,0,18.1,0,0.532,7.061,77,3.4106,24,666,20.2,395.28,7.01,25 485 | "484",2.81838,0,18.1,0,0.532,5.762,40.3,4.0983,24,666,20.2,392.92,10.42,21.8 486 | "485",2.37857,0,18.1,0,0.583,5.871,41.9,3.724,24,666,20.2,370.73,13.34,20.6 487 | "486",3.67367,0,18.1,0,0.583,6.312,51.9,3.9917,24,666,20.2,388.62,10.58,21.2 488 | 
"487",5.69175,0,18.1,0,0.583,6.114,79.8,3.5459,24,666,20.2,392.68,14.98,19.1 489 | "488",4.83567,0,18.1,0,0.583,5.905,53.2,3.1523,24,666,20.2,388.22,11.45,20.6 490 | "489",0.15086,0,27.74,0,0.609,5.454,92.7,1.8209,4,711,20.1,395.09,18.06,15.2 491 | "490",0.18337,0,27.74,0,0.609,5.414,98.3,1.7554,4,711,20.1,344.05,23.97,7 492 | "491",0.20746,0,27.74,0,0.609,5.093,98,1.8226,4,711,20.1,318.43,29.68,8.1 493 | "492",0.10574,0,27.74,0,0.609,5.983,98.8,1.8681,4,711,20.1,390.11,18.07,13.6 494 | "493",0.11132,0,27.74,0,0.609,5.983,83.5,2.1099,4,711,20.1,396.9,13.35,20.1 495 | "494",0.17331,0,9.69,0,0.585,5.707,54,2.3817,6,391,19.2,396.9,12.01,21.8 496 | "495",0.27957,0,9.69,0,0.585,5.926,42.6,2.3817,6,391,19.2,396.9,13.59,24.5 497 | "496",0.17899,0,9.69,0,0.585,5.67,28.8,2.7986,6,391,19.2,393.29,17.6,23.1 498 | "497",0.2896,0,9.69,0,0.585,5.39,72.9,2.7986,6,391,19.2,396.9,21.14,19.7 499 | "498",0.26838,0,9.69,0,0.585,5.794,70.6,2.8927,6,391,19.2,396.9,14.1,18.3 500 | "499",0.23912,0,9.69,0,0.585,6.019,65.3,2.4091,6,391,19.2,396.9,12.92,21.2 501 | "500",0.17783,0,9.69,0,0.585,5.569,73.5,2.3999,6,391,19.2,395.77,15.1,17.5 502 | "501",0.22438,0,9.69,0,0.585,6.027,79.7,2.4982,6,391,19.2,396.9,14.33,16.8 503 | "502",0.06263,0,11.93,0,0.573,6.593,69.1,2.4786,1,273,21,391.99,9.67,22.4 504 | "503",0.04527,0,11.93,0,0.573,6.12,76.7,2.2875,1,273,21,396.9,9.08,20.6 505 | "504",0.06076,0,11.93,0,0.573,6.976,91,2.1675,1,273,21,396.9,5.64,23.9 506 | "505",0.10959,0,11.93,0,0.573,6.794,89.3,2.3889,1,273,21,393.45,6.48,22 507 | "506",0.04741,0,11.93,0,0.573,6.03,80.8,2.505,1,273,21,396.9,7.88,11.9 508 | -------------------------------------------------------------------------------- /Examples/LCA Bokeh.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import pandas as pd\n", 10 | "import numpy as np" 11 | ] 12 | 
}, 13 | { 14 | "cell_type": "code", 15 | "execution_count": null, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "#datapath = 'C:/Users/Ram/Documents/Ram/Data_Sets/'\n", 20 | "datapath = ''\n", 21 | "filename = ''\n", 22 | "sep = ','\n", 23 | "target = 'chas'" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": null, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "df = pd.read_csv(datapath+filename,sep=sep)\n", 33 | "dft = df[:]\n", 34 | "print(df.shape)\n", 35 | "df.head(1)" 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": null, 41 | "metadata": { 42 | "scrolled": false 43 | }, 44 | "outputs": [], 45 | "source": [ 46 | "dft = AV.AutoViz(datapath+filename, sep, target, \"\",\n", 47 | " header=0, verbose=1,\n", 48 | " lowess=False,chart_format='bokeh',max_rows_analyzed=150000,max_cols_analyzed=30)\n" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": null, 61 | "metadata": {}, 62 | "outputs": [], 63 | "source": [] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [] 71 | } 72 | ], 73 | "metadata": { 74 | "kernelspec": { 75 | "display_name": "Python 3", 76 | "language": "python", 77 | "name": "python3" 78 | }, 79 | "language_info": { 80 | "codemirror_mode": { 81 | "name": "ipython", 82 | "version": 3 83 | }, 84 | "file_extension": ".py", 85 | "mimetype": "text/x-python", 86 | "name": "python", 87 | "nbconvert_exporter": "python", 88 | "pygments_lexer": "ipython3", 89 | "version": "3.10.13" 90 | } 91 | }, 92 | "nbformat": 4, 93 | "nbformat_minor": 2 94 | } 95 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, 
January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 
39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AutoViz: The One-Line Automatic Data Visualization Library 2 | 3 | ![logo](images/logo.png) 4 | 5 | Unlock the power of **AutoViz** to visualize any dataset, any size, with just a single line of code! Plus, now you can get a quick assessment of your dataset's quality and fix DQ issues through the FixDQ() function. 
6 | 7 | [![Pepy Downloads](https://pepy.tech/badge/autoviz)](https://pepy.tech/project/autoviz) 8 | [![Pepy Downloads per week](https://pepy.tech/badge/autoviz/week)](https://pepy.tech/project/autoviz) 9 | [![Pepy Downloads per month](https://pepy.tech/badge/autoviz/month)](https://pepy.tech/project/autoviz) 10 | [![standard-readme compliant](https://img.shields.io/badge/standard--readme-OK-green.svg)](https://github.com/RichardLitt/standard-readme) 11 | [![Python Versions](https://img.shields.io/pypi/pyversions/autoviz.svg)](https://pypi.org/project/autoviz) 12 | [![PyPI Version](https://img.shields.io/pypi/v/autoviz.svg)](https://pypi.org/project/autoviz) 13 | [![PyPI License](https://img.shields.io/pypi/l/autoviz.svg)](https://github.com/AutoViML/AutoViz/blob/master/LICENSE) 14 | 15 | With AutoViz, you can easily and quickly generate insightful visualizations for your data. Whether you're a beginner or an expert in data analysis, AutoViz can help you explore your data and uncover valuable insights. Try it out and see the power of automated visualization for yourself! 16 | 17 | ## Table of Contents 18 | - [Latest](#latest) - [Important Announcement](#important-announcement) - [Citation](#citation) - [Motivation](#motivation) - [Installation](#installation) - [Usage](#usage) - [API](#api) - [Examples](#examples) 32 | 33 | ## Latest 34 | The latest updates to the `autoviz` library can be found on the Updates page. 35 | 36 | ## Important Announcement 37 | ### An important update starting with version 0.1.901 38 |
- We're excited to announce significant updates to our `setup.py` script: it now leverages the latest versions of our dependencies while maintaining support for older Python versions. The installation process is seamless: simply run `pip install .` in the AutoViz directory, and the script takes care of the rest, tailoring the installation to your environment.
39 | 40 | ### Feedback 41 | Your feedback is crucial! If you encounter any issues or have suggestions, please let us know through [GitHub Issues](https://github.com/AutoViML/AutoViz/issues). 42 | 43 | Thank you for your continued support and happy visualizing! 44 | 45 | ## Citation 46 | If you use AutoViz in your research project or paper, please use the following format for citations: 47 | "Seshadri, Ram (2020). GitHub - AutoViML/AutoViz: Automatically Visualize any dataset, any size with a single line of code. Source code: https://github.com/AutoViML/AutoViz" 48 | Current citations for AutoViz can be found on [Google Scholar](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C31&q=autoviz&oq=autoviz). 49 | 50 | 51 | 52 | ## Motivation 53 | The motivation behind AutoViz is to provide a more efficient, user-friendly, and automated approach to exploratory data analysis (EDA) through quick and easy data visualization plus data quality assessment. The library is designed to help users understand patterns, trends, and relationships in their data by creating insightful visualizations with minimal effort. AutoViz is particularly useful for beginners in data analysis, as it abstracts away the complexities of various plotting libraries and techniques. For experts, it is another tool that can surface insights into data that they may have missed. 54 | 55 | AutoViz is a powerful tool for generating insightful visualizations with minimal effort. Here are some of its key selling points compared to other automated EDA tools: 56 |
57 | 1. **Ease of use**: AutoViz is designed to be user-friendly and accessible to beginners in data analysis, abstracting away the complexities of various plotting libraries. 58 | 2. **Speed**: AutoViz is optimized for speed and can generate multiple insightful plots with just a single line of code. 59 | 3. **Scalability**: AutoViz is designed to work with datasets of any size and can handle large datasets efficiently. 60 | 4. **Automation**: AutoViz automates the visualization process, requiring just a single line of code to generate multiple insightful plots. 61 | 5. **Customization**: AutoViz provides several options for customizing the visualizations, such as changing the chart type, color palette, etc. 62 | 6. **Data Quality**: AutoViz provides data quality assessment by default and helps you fix DQ issues with a single line of code using the FixDQ() function. 63 |
64 | ## Installation 65 | 66 | **Prerequisites** 67 | - [Anaconda](https://docs.anaconda.com/anaconda/install/) 68 | 69 | To install AutoViz from source, clone the repository, create a new environment, and install the required dependencies: 70 | 71 | **From source:** 72 | ```sh 73 | cd <your_downloads_directory> 74 | git clone git@github.com:AutoViML/AutoViz.git 75 | # or download and unzip https://github.com/AutoViML/AutoViz/archive/master.zip 76 | conda create -n <your_env_name> python=3.7 anaconda 77 | conda activate <your_env_name> # ON WINDOWS: `source activate <your_env_name>` 78 | cd AutoViz 79 | ``` 80 | For Python versions below 3.10, install dependencies as follows: 81 | 82 | ``` 83 | pip install -r requirements.txt 84 | ``` 85 | 86 | For Python 3.10, please use: 87 | 88 | ``` 89 | pip install -r requirements-py310.txt 90 | ``` 91 | 92 | For Python 3.11 and above, it's recommended to use: 93 | 94 | ``` 95 | pip install -r requirements-py311.txt 96 | ``` 97 | 98 | These requirements files ensure that AutoViz works seamlessly with your Python environment by installing compatible versions of libraries like HoloViews, Bokeh, and hvPlot. Please select the requirements file that corresponds to your Python version to enjoy a smooth experience with AutoViz. 99 | 100 | ## Usage 101 | Discover how to use AutoViz in this Medium article. 102 | 103 | In the AutoViz directory, open a Jupyter Notebook or a terminal and use the following code to instantiate AutoViz_Class. You can simply run this code step by step: 104 | 105 | ```python 106 | from autoviz import AutoViz_Class 107 | AV = AutoViz_Class() 108 | dft = AV.AutoViz(filename) 109 | ``` 110 | 111 | AutoViz accepts either a filename (in CSV, TXT, or JSON format) or a pandas dataframe as input. If you have a large dataset, you can set the `max_rows_analyzed` and `max_cols_analyzed` arguments to speed up visualization by asking AutoViz to sample your dataset.
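To see roughly what that row cap means in practice, here is a minimal sketch of capping a dataset via random sampling. This is a hand-written illustration, not AutoViz's actual internals: the function name `cap_rows` and the toy data are invented for this example, and the real library samples a pandas DataFrame rather than a list.

```python
import random

def cap_rows(rows, max_rows_analyzed=150000, seed=0):
    """Return at most max_rows_analyzed randomly sampled rows.

    Hypothetical stand-in for the row cap AutoViz applies before
    plotting large datasets.
    """
    if len(rows) <= max_rows_analyzed:
        return list(rows)
    # Seeded sampling keeps the illustration reproducible
    return random.Random(seed).sample(list(rows), max_rows_analyzed)

# A toy "dataset" of 1,000 rows, capped at 100 rows for plotting:
data = [(i, i % 7) for i in range(1000)]
sampled = cap_rows(data, max_rows_analyzed=100)
print(len(sampled))  # 100
```

The same idea applies to `max_cols_analyzed`, which caps the number of continuous variables analyzed rather than the number of rows.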
112 | 113 | AutoViz can also create charts in multiple formats using the `chart_format` setting: 114 | - If `chart_format='png'`, `'svg'`, or `'jpg'`: Matplotlib charts are plotted inline. 115 | * They can be saved locally (with `verbose=2`) or displayed in Jupyter Notebooks (with `verbose=1`). 116 | * This is the default behavior for AutoViz. 117 | - If `chart_format='bokeh'`: interactive Bokeh charts are plotted in Jupyter Notebooks. 118 | - If `chart_format='server'`: dashboards will pop up for each kind of chart in your browser. 119 | - If `chart_format='html'`: interactive Bokeh charts will be created and silently saved as HTML files under the `AutoViz_Plots` directory (under the working folder) or any other directory that you specify using the `save_plot_dir` setting. 120 | 121 | 122 | ## API 123 | Arguments for the `AV.AutoViz()` method: 124 | 125 | - `filename`: Use an empty string ("") if there's no file and you want to use a dataframe; in that case, pass the dataframe via the `dfte` argument. Otherwise, provide a filename and leave `dfte` as an empty string. Only one of the two can be used. 126 | - `sep`: File separator (comma, semi-colon, tab, or any column-separating value) if you use a filename above. 127 | - `depVar`: Target variable in your dataset; set it as an empty string if not applicable. 128 | - `dfte`: The pandas dataframe for plotting charts; leave it as an empty string if using a filename. 129 | - `header`: The row number of the header row in your file (0 for the first row). 130 | - `verbose`: 0 for minimal info and charts, 1 for more info and charts, or 2 for saving charts locally without display. 131 | - `lowess`: Use regression lines for each pair of continuous variables against the target variable in small datasets; avoid using for large datasets (>100,000 rows).
132 | - `chart_format`: 'svg', 'png', 'jpg', 'bokeh', 'server', or 'html' for displaying or saving charts in various formats, depending on the `verbose` option. 133 | - `max_rows_analyzed`: Limit the max number of rows to use for visualization when dealing with very large datasets (millions of rows). A statistically valid sample will be used by AutoViz. Default is 150000 rows. 134 | - `max_cols_analyzed`: Limit the number of continuous variables to be analyzed. Default is 30 columns. 135 | - `save_plot_dir`: Directory for saving plots. Default is None, which saves plots under the current directory in a subfolder named AutoViz_Plots. If the `save_plot_dir` doesn't exist, it will be created. 136 | 137 | ## Examples 138 | Here are some examples to help you get started with AutoViz. Full Jupyter notebooks with code samples can be found in the [Examples](https://github.com/AutoViML/AutoViz/tree/master/Examples) folder. 139 | 140 | ### Example 1: Visualize a CSV file with a target variable 141 | 142 | ```python 143 | from autoviz import AutoViz_Class 144 | AV = AutoViz_Class() 145 | 146 | filename = "your_file.csv" 147 | target_variable = "your_target_variable" 148 | 149 | dft = AV.AutoViz( 150 | filename, 151 | sep=",", 152 | depVar=target_variable, 153 | dfte=None, 154 | header=0, 155 | verbose=1, 156 | lowess=False, 157 | chart_format="svg", 158 | max_rows_analyzed=150000, 159 | max_cols_analyzed=30, 160 | save_plot_dir=None 161 | ) 162 | ``` 163 | 164 | ![var_charts](images/var_charts.JPG) 165 | 166 | ### Example 2: Visualize a Pandas DataFrame without a target variable 167 | 168 | ```python 169 | import pandas as pd 170 | from autoviz import AutoViz_Class 171 | 172 | AV = AutoViz_Class() 173 | 174 | data = {'col1': [1, 2, 3, 4, 5], 'col2': [5, 4, 3, 2, 1]} 175 | df = pd.DataFrame(data) 176 | 177 | dft = AV.AutoViz( 178 | "", 179 | sep=",", 180 | depVar="", 181 | dfte=df, 182 | header=0, 183 | verbose=1, 184 | lowess=False, 185 | chart_format="server", 186 |
max_rows_analyzed=150000, 187 | max_cols_analyzed=30, 188 | save_plot_dir=None 189 | ) 190 | 191 | ``` 192 | 193 | ![server_charts](images/server_charts.JPG) 194 | 195 | ### Example 3: Generate interactive Bokeh charts and save them as HTML files in a custom directory 196 | 197 | ```python 198 | from autoviz import AutoViz_Class 199 | AV = AutoViz_Class() 200 | 201 | filename = "your_file.csv" 202 | target_variable = "your_target_variable" 203 | custom_plot_dir = "your_custom_plot_directory" 204 | 205 | dft = AV.AutoViz( 206 | filename, 207 | sep=",", 208 | depVar=target_variable, 209 | dfte=None, 210 | header=0, 211 | verbose=2, 212 | lowess=False, 213 | chart_format="html", 214 | max_rows_analyzed=150000, 215 | max_cols_analyzed=30, 216 | save_plot_dir=custom_plot_dir 217 | ) 218 | ``` 219 | 220 | ![bokeh_charts](images/bokeh_charts.JPG) 221 | 222 | These examples should give you an idea of how to use AutoViz in different scenarios and settings. By tailoring the options and settings, you can generate visualizations that best suit your needs, whether you're working with large datasets, interactive charts, or simply exploring the relationships between variables. 223 | 224 | ## Maintainers 225 | AutoViz is actively maintained and improved by a team of dedicated developers. If you have any questions, suggestions, or issues, feel free to reach out to the maintainers: 226 | 227 | - [@AutoViML](https://github.com/AutoViML) 228 | - [@morenoh149](https://github.com/morenoh149) 229 | - [@hironroy](https://github.com/hironroy) 230 | 231 | ## Contributing 232 | We welcome contributions from the community! If you're interested in contributing to AutoViz, please follow these steps: 233 | 234 | - Fork the repository on GitHub. 235 | - Clone your fork and create a new branch for your feature or bugfix. 236 | - Commit your changes to the new branch, ensuring that you follow coding standards and write appropriate tests. 237 | - Push your changes to your fork on GitHub.
238 | - Submit a pull request to the main repository, detailing your changes and referencing any related issues. 239 | 240 | See [the contributing file](CONTRIBUTING.md)! 241 | 242 | ## License 243 | AutoViz is released under the Apache License, Version 2.0. By using AutoViz, you agree to the terms and conditions specified in the license. 244 | 245 | ## Tips 246 | Here are some additional tips and reminders to help you make the most of the library: 247 | 248 | - **Make sure to regularly upgrade AutoViz** to benefit from the latest features, bug fixes, and improvements. You can update it using `pip install --upgrade autoviz`. 249 | - **AutoViz is highly customizable, so don't hesitate to explore and experiment with various settings**, such as `chart_format`, `verbose`, and `max_rows_analyzed`. This will allow you to create visualizations that better suit your specific needs and preferences. 250 | - **Remember to delete the AutoViz_Plots directory (or any custom directory you specified) periodically** if you used the `verbose=2` option, as it can accumulate a large number of saved charts over time. 251 | - **For further guidance or inspiration, check out the Medium article on AutoViz**, as well as other online resources and tutorials. 252 |
253 | - AutoViz will visualize any sized file using a statistically valid sample. 254 | - COMMA is the default separator in the file, but you can change it. 255 | - Assumes the first row as the header in the file, but this can be changed. 256 |
    257 | 258 | - **By leveraging AutoViz's powerful and flexible features**, you can streamline your data visualization process and gain valuable insights more efficiently. Happy visualizing! 259 | 260 | ## DISCLAIMER 261 | This project is not an official Google project. It is not supported by Google, and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose. -------------------------------------------------------------------------------- /autoviz/AutoViz_Class.py: -------------------------------------------------------------------------------- 1 | ############################################################################ 2 | # Copyright 2019 Google LLC 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # https://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | ################################################################################################# 16 | import os 17 | import pandas as pd 18 | ######################################## 19 | import warnings 20 | from sklearn.exceptions import DataConversionWarning 21 | #################################################################################### 22 | import matplotlib 23 | import seaborn as sns 24 | import copy 25 | import time 26 | import traceback 27 | from pandas_dq import Fix_DQ, dq_report 28 | ########################################################################################## 29 | from autoviz.AutoViz_Holo import AutoViz_Holo 30 | from autoviz.AutoViz_Utils import draw_pivot_tables, draw_scatters 31 | from autoviz.AutoViz_Utils import draw_pair_scatters, draw_barplots, draw_heatmap 32 | from autoviz.AutoViz_Utils import draw_distplot, draw_violinplot, draw_date_vars, draw_catscatterplots 33 | from autoviz.AutoViz_Utils import list_difference 34 | from autoviz.AutoViz_Utils import find_remove_duplicates, classify_print_vars 35 | from autoviz.AutoViz_Utils import left_subtract 36 | from autoviz.AutoViz_NLP import draw_word_clouds 37 | ####################################################################################### 38 | sns.set(style="ticks", color_codes=True) 39 | matplotlib.use('agg') 40 | warnings.filterwarnings(action='ignore', category=DataConversionWarning) 41 | warnings.filterwarnings("ignore") 42 | ####################################################################################### 43 | def warn(*args, **kwargs): 44 | pass 45 | warnings.warn = warn 46 | ####################################################################################### 47 | class AutoViz_Class: 48 | """ 49 | ############################################################################## 50 | ############# This is not an Officially Supported Google Product! 
###### 51 | ############################################################################## 52 | #Copyright 2019 Google LLC ###### 53 | # ###### 54 | #Licensed under the Apache License, Version 2.0 (the "License"); ###### 55 | #you may not use this file except in compliance with the License. ###### 56 | #You may obtain a copy of the License at ###### 57 | # ###### 58 | # https://www.apache.org/licenses/LICENSE-2.0 ###### 59 | # ###### 60 | #Unless required by applicable law or agreed to in writing, software ###### 61 | #distributed under the License is distributed on an "AS IS" BASIS, ###### 62 | #WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.##### 63 | #See the License for the specific language governing permissions and ###### 64 | #limitations under the License. ###### 65 | ############################################################################## 66 | ########### AutoViz Class ###### 67 | ########### by Ram Seshadri ###### 68 | ########### AUTOMATICALLY VISUALIZE ANY DATA SET ###### 69 | ########### Version V0.0.68 1/10/20 ###### 70 | ############################################################################## 71 | ##### AUTOVIZ PERFORMS AUTOMATIC VISUALIZATION OF ANY DATA SET WITH ONE CLICK. 72 | ##### Give it any input file (CSV, txt or json) and AV will visualize it.## 73 | ##### INPUTS: ##### 74 | ##### A FILE NAME OR A DATA FRAME AS INPUT. ##### 75 | ##### AutoViz will visualize any sized file using a statistically valid sample. 76 | ##### - COMMA is assumed as default separator in file. But u can change it.## 77 | ##### - Assumes first row as header in file, but you can change it. #### 78 | ##### - First instantiate an AutoViz class to hold output of charts, plots.# 79 | ##### - Then call the Autoviz program with inputs as defined below. 
### 80 | ############################################################################## 81 | """ 82 | 83 | def __init__(self): 84 | self.overall = { 85 | 'name': 'overall', 86 | 'plots': [], 87 | 'heading': [], 88 | 'subheading': [], # "\n".join(subheading) 89 | 'desc': [], # "\n".join(subheading) 90 | 'table1_title': "", 91 | 'table1': [], 92 | 'table2_title': "", 93 | 'table2': [] 94 | } ### This is for overall description and comments about the data set 95 | self.scatter_plot = { 96 | 'name': 'scatter', 97 | 'heading': 'Scatter Plot of each Continuous Variable against Target Variable', 98 | 'plots': [], 99 | 'subheading': [], # "\n".join(subheading) 100 | 'desc': [] # "\n".join(desc) 101 | } ##### This is for description and images for scatter plots ### 102 | self.pair_scatter = { 103 | 'name': 'pair-scatter', 104 | 'heading': 'Pairwise Scatter Plot of each Continuous Variable against other Continuous Variables', 105 | 'plots': [], 106 | 'subheading': [], # "\n".join(subheading) 107 | 'desc': [] # "\n".join(desc) 108 | } ##### This is for description and images for pairs of scatter plots ### 109 | self.dist_plot = { 110 | 'name': 'distribution', 111 | 'heading': 'Distribution Plot of Target Variable', 112 | 'plots': [], 113 | 'subheading': [], # "\n".join(subheading) 114 | 'desc': [] # "\n".join(desc) 115 | } ##### This is for description and images for distribution plots ### 116 | self.pivot_plot = { 117 | 'name': 'pivot', 118 | 'heading': 'Pivot Plots of all Continuous Variable', 119 | 'plots': [], 120 | 'subheading': [], # "\n".join(subheading) 121 | 'desc': [] # "\n".join(desc) 122 | } ##### This is for description and images for pivot plots ### 123 | self.violin_plot = { 124 | 'name': 'violin', 125 | 'heading': 'Violin Plots of all Continuous Variable', 126 | 'plots': [], 127 | 'subheading': [], # "\n".join(subheading) 128 | 'desc': [] # "\n".join(desc) 129 | } ##### This is for description and images for violin plots ### 130 | self.heat_map = { 131 | 
'name': 'heatmap', 132 | 'heading': 'Heatmap of all Continuous Variables for target Variable', 133 | 'plots': [], 134 | 'subheading': [], # "\n".join(subheading) 135 | 'desc': [] # "\n".join(desc) 136 | } ##### This is for description and images for heatmaps ### 137 | self.bar_plot = { 138 | 'name': 'bar', 139 | 'heading': 'Bar Plots of Average of each Continuous Variable by Target Variable', 140 | 'plots': [], 141 | 'subheading': [], # "\n".join(subheading) 142 | 'desc': [] # "\n".join(desc) 143 | } ##### This is for description and images for bar plots ### 144 | self.date_plot = { 145 | 'name': 'time-series', 146 | 'heading': 'Time Series Plots of Two Continuous Variables against a Date/Time Variable', 147 | 'plots': [], 148 | 'subheading': [], # "\n".join(subheading) 149 | 'desc': [] # "\n".join(desc) 150 | } ######## This is for description and images for date time plots ### 151 | self.wordcloud = { 152 | 'name': 'wordcloud', 153 | 'heading': 'Word Cloud Plots of NLP or String vars', 154 | 'plots': [], 155 | 'subheading': [], # "\n".join(subheading) 156 | 'desc': [] # "\n".join(desc) 157 | } ######## This is for description and images for date time plots ### 158 | self.catscatter_plot = { 159 | 'name': 'catscatter', 160 | 'heading': 'Cat-Scatter Plots of categorical vars', 161 | 'plots': [], 162 | 'subheading': [], # "\n".join(subheading) 163 | 'desc': [] # "\n".join(desc) 164 | } ######## This is for description and images for catscatter plots ### 165 | 166 | def add_plots(self, plotname, X): 167 | """ 168 | This is a simple program to append the input chart to the right variable named plotname 169 | which is an attribute of class AV. So make sure that the plotname var matches an exact 170 | variable name defined in class AV. Otherwise, this will give an error. 171 | """ 172 | if X is None: 173 | ### If there is nothing to add, leave it as it is. 
174 | # print("Nothing to add Plot not being added") 175 | pass 176 | else: 177 | getattr(self, plotname)["plots"].append(X) 178 | 179 | def add_subheading(self, plotname, X): 180 | """ 181 | This is a simple program to append the input chart to the right variable named plotname 182 | which is an attribute of class AV. So make sure that the plotname var matches an exact 183 | variable name defined in class AV. Otherwise, this will give an error. 184 | """ 185 | if X is None: 186 | ### If there is nothing to add, leave it as it is. 187 | pass 188 | else: 189 | getattr(self, plotname)["subheading"].append(X) 190 | 191 | def AutoViz(self, filename: (str or pd.DataFrame), sep=',', depVar='', dfte=None, header=0, verbose=1, 192 | lowess=False, chart_format='svg', max_rows_analyzed=150000, 193 | max_cols_analyzed=30, save_plot_dir=None): 194 | """ 195 | ############################################################################## 196 | ##### AUTOVIZ PERFORMS AUTOMATIC VISUALIZATION OF ANY DATA SET WITH ONE CLICK. 197 | ##### Give it any input file (CSV, txt or json) and AV will visualize it.## 198 | ##### INPUTS: ##### 199 | ##### A FILE NAME OR A DATA FRAME AS INPUT. ##### 200 | ##### AutoViz will visualize any sized file using a statistically valid sample. 201 | ##### - max_rows_analyzed = 150000 ### this limits the max number of rows ### 202 | ##### that is used to display charts ### 203 | ##### - max_cols_analyzed = 30 ### This limits the number of continuous ### 204 | ##### vars that can be analyzed #### 205 | ##### - COMMA is assumed as default separator in file. But u can change it.## 206 | ##### - Assumes first row as header in file, but you can change it. #### 207 | ##### - First instantiate an AutoViz class to hold output of charts, plots.# 208 | ##### - Then call the Autoviz program with inputs as defined below. ### 209 | ############################################################################## 210 | ##### This is the main calling program in AV. 
It will call all the load, ##### 211 | #### display and save programs that are currently outside AV. This program ### 212 | #### will draw scatter and other plots for the input data set and then #### 213 | #### call the correct variable name with add_plots function and send in #### 214 | #### the chart created by that plotting program, for example, scatter ##### 215 | #### You have to make sure that add_plots function has the exact name of #### 216 | #### the variable defined in the Class AV. If not, this will give an error.## 217 | #### If verbose=0: it does not print any messages and goes into silent mode## 218 | #### This is the default. ##### 219 | #### If verbose=1, it will print messages on the terminal and also display### 220 | #### charts on terminal ##### 221 | #### If verbose=2, it will print messages but will not display charts, ##### 222 | #### it will simply save them. ##### 223 | ############################################################################## 224 | """ 225 | if isinstance(dfte, pd.DataFrame): ### if there is a dataframe, choose it 226 | filename = dfte 227 | 228 | if isinstance(depVar, list): 229 | print('Since AutoViz cannot visualize multi-label targets, choosing first item in targets: %s' % depVar[0]) 230 | dep_var = depVar[0] 231 | else: 232 | dep_var = copy.deepcopy(depVar) 233 | #################################################################################### 234 | if chart_format.lower() in ['bokeh', 'server', 'bokeh_server', 'bokeh-server', 'html']: 235 | dft = AutoViz_Holo(filename, sep, dep_var, header, verbose, 236 | lowess, chart_format, max_rows_analyzed, 237 | max_cols_analyzed, save_plot_dir) 238 | else: 239 | dft = self.AutoViz_Main(filename, sep, dep_var, header, verbose, 240 | lowess, chart_format, max_rows_analyzed, 241 | max_cols_analyzed, save_plot_dir) 242 | return dft 243 | 244 | def AutoViz_Main(self, filename: str or pd.DataFrame, sep=',', dep_var='', header=0, verbose=0, 245 | lowess=False,
chart_format='svg', max_rows_analyzed=150000, 246 | max_cols_analyzed=30, save_plot_dir=None): 247 | """ 248 | ############################################################################## 249 | ##### AUTOVIZ_MAIN PERFORMS AUTO VISUALIZATION OF ANY DATA USING MATPLOTLIB ## 250 | ############################################################################## 251 | """ 252 | ######### create a directory to save all plots generated by autoviz ############ 253 | ############ This is where you save the figures in a target directory ####### 254 | target_dir = 'AutoViz' 255 | 256 | if dep_var is not None: 257 | if isinstance(dep_var, list): 258 | target_dir = dep_var[0] 259 | elif isinstance(dep_var, str): 260 | if dep_var != '': 261 | target_dir = copy.deepcopy(dep_var) 262 | if save_plot_dir is None: 263 | mk_dir = os.path.join(".", "AutoViz_Plots") 264 | else: 265 | mk_dir = copy.deepcopy(save_plot_dir) 266 | if verbose == 2 and not os.path.isdir(mk_dir): 267 | os.mkdir(mk_dir) 268 | mk_dir = os.path.join(mk_dir, target_dir) 269 | if verbose == 2 and not os.path.isdir(mk_dir): 270 | os.mkdir(mk_dir) 271 | ############ Start the clock here and classify variables in data set first ######## 272 | start_time = time.time() 273 | 274 | (dft, dep_var, id_cols, bool_vars, cats, continuous_vars, discrete_string_vars, date_vars, classes, 275 | problem_type, selected_cols) = classify_print_vars(filename, sep, max_rows_analyzed, max_cols_analyzed, 276 | dep_var, header, verbose) 277 | 278 | ########### This is where we perform data quality checks on the data ################ 279 | if verbose >= 1: 280 | print('To fix these data quality issues in the dataset, import FixDQ from autoviz...') 281 | #### Run the Data Cleaning suggestions report now ############ 282 | 283 | if dep_var is not None: 284 | if isinstance(dep_var, list): 285 | remaining_vars = left_subtract(list(dft), dep_var) 286 | if len(remaining_vars) == len(list(dft)): 287 | print('depVar %s not found in given dataset.
Please check your input and try again' % dep_var) 288 | return dft 289 | ### run the data cleaning report with a multi-label list of targets ## 290 | data_cleaning_suggestions(dft, target=dep_var) 291 | else: 292 | ### run the data cleaning report with a single-label target ## 293 | data_cleaning_suggestions(dft, target=dep_var) 294 | else: 295 | ### run data cleaning report with no target #### 296 | data_cleaning_suggestions(dft, target='') 297 | 298 | ##### This is where we start plotting different kinds of charts depending on dependent variables 299 | if dep_var is None or dep_var == '': 300 | ##### This is when No dependent Variable is given ####### 301 | if len(continuous_vars) > 1: 302 | try: 303 | svg_data = draw_pair_scatters(dft, continuous_vars, problem_type, verbose, chart_format, 304 | dep_var, classes, lowess, mk_dir) 305 | self.add_plots('pair_scatter', svg_data) 306 | except Exception as e: 307 | print(e) 308 | print('Could not draw Pair Scatter Plots') 309 | try: 310 | svg_data = draw_distplot(dft, bool_vars + cats, continuous_vars, verbose, chart_format, problem_type, 311 | dep_var, classes, mk_dir) 312 | self.add_plots('dist_plot', svg_data) 313 | except Exception as e: 314 | print(f'Could not draw Distribution Plot. 
{e}') 315 | try: 316 | if len(continuous_vars) > 0: 317 | svg_data = draw_violinplot(dft, dep_var, continuous_vars, verbose, chart_format, problem_type, 318 | mk_dir) 319 | self.add_plots('violin_plot', svg_data) 320 | else: 321 | svg_data = draw_pivot_tables(dft, problem_type, verbose, 322 | chart_format, dep_var, mk_dir) 323 | self.add_plots('pivot_plot', svg_data) 324 | except Exception as e: 325 | print(f'Could not draw Violin or Pivot Plots {e}') 326 | try: 327 | #### Since there is no dependent variable in this dataset, you can leave it as-is 328 | numeric_cols = dft.select_dtypes(include='number').columns.tolist() 329 | numeric_cols = list_difference(numeric_cols, date_vars) 330 | svg_data = draw_heatmap(dft, numeric_cols, verbose, chart_format, date_vars, dep_var, 331 | problem_type, mk_dir) 332 | self.add_plots('heat_map', svg_data) 333 | except Exception as e: 334 | print(f'Could not draw Heat Map {e}') 335 | if date_vars != [] and len(continuous_vars) > 0: 336 | try: 337 | svg_data = draw_date_vars(dft, dep_var, date_vars, 338 | continuous_vars, verbose, chart_format, problem_type, mk_dir) 339 | self.add_plots('date_plot', svg_data) 340 | except Exception as e: 341 | print(f'Could not draw Date Vars {e}') 342 | if len(continuous_vars) > 0 and len(cats) > 0: 343 | try: 344 | svg_data = draw_barplots(dft, cats, continuous_vars, problem_type, 345 | verbose, chart_format, dep_var, mk_dir) 346 | self.add_plots('bar_plot', svg_data) 347 | except Exception as e: 348 | print(f'Could not draw Bar Plots {e}') 349 | else: 350 | if len(cats) > 1: 351 | try: 352 | svg_data = draw_catscatterplots(dft, cats, verbose, 353 | chart_format, mk_dir=None) 354 | self.add_plots('catscatter_plot', svg_data) 355 | except Exception as e: 356 | print(f'Could not draw catscatter plots.
{e}') 357 | else: 358 | if problem_type == 'Regression': 359 | ############## This is a Regression Problem ################# 360 | if len(continuous_vars) > 0: 361 | try: 362 | svg_data = draw_scatters(dft, 363 | continuous_vars, verbose, chart_format, problem_type, dep_var, classes, 364 | lowess, mk_dir) 365 | self.add_plots('scatter_plot', svg_data) 366 | except Exception as e: 367 | print("Exception Drawing Scatter Plots") 368 | print(e) 369 | traceback.print_exc() 370 | print('Could not draw Scatter Plots') 371 | if len(continuous_vars) > 1: 372 | try: 373 | svg_data = draw_pair_scatters(dft, continuous_vars, problem_type, verbose, chart_format, 374 | dep_var, classes, lowess, mk_dir) 375 | self.add_plots('pair_scatter', svg_data) 376 | except Exception as e: 377 | print(f'Could not draw Pair Scatter Plots {e}') 378 | try: 379 | if type(dep_var) == str: 380 | othernums = [x for x in continuous_vars if x not in [dep_var]] 381 | else: 382 | othernums = [x for x in continuous_vars if x not in dep_var] 383 | if len(othernums) >= 1: 384 | svg_data = draw_distplot(dft, bool_vars + cats, continuous_vars, verbose, chart_format, 385 | problem_type, dep_var, classes, mk_dir) 386 | self.add_plots('dist_plot', svg_data) 387 | except Exception as e: 388 | print(f'Could not draw some Distribution Plots {e}') 389 | try: 390 | if len(continuous_vars) > 0: 391 | svg_data = draw_violinplot(dft, dep_var, continuous_vars, verbose, chart_format, problem_type, 392 | mk_dir) 393 | self.add_plots('violin_plot', svg_data) 394 | except Exception as e: 395 | print(f'Could not draw Violin Plots {e}') 396 | try: 397 | numeric_cols = [x for x in dft.select_dtypes(include='number').columns.tolist() if 398 | x not in [dep_var]] 399 | numeric_cols = list_difference(numeric_cols, date_vars) 400 | svg_data = draw_heatmap(dft, 401 | numeric_cols, verbose, chart_format, date_vars, dep_var, 402 | problem_type, mk_dir) 403 | self.add_plots('heat_map', svg_data) 404 | except Exception as e: 405 | 
print(f'Could not draw some Heat Maps {e}') 406 | if date_vars != [] and len(continuous_vars) > 0: 407 | try: 408 | svg_data = draw_date_vars( 409 | dft, dep_var, date_vars, continuous_vars, verbose, chart_format, problem_type, mk_dir) 410 | self.add_plots('date_plot', svg_data) 411 | except Exception as e: 412 | print(f'Could not draw some Time Series plots. {e}') 413 | if len(cats) > 0 and len(continuous_vars) == 0: 414 | ### This is somewhat duplicative with distplot (above) - hence do it only minimally! 415 | try: 416 | svg_data = draw_pivot_tables(dft, problem_type, verbose, 417 | chart_format, dep_var, mk_dir) 418 | self.add_plots('pivot_plot', svg_data) 419 | except Exception as e: 420 | print(f'Could not draw some Pivot Charts against Dependent Variable {e}') 421 | if len(continuous_vars) > 0 and len(cats) > 0: 422 | try: 423 | svg_data = draw_barplots(dft, find_remove_duplicates(cats + bool_vars), continuous_vars, 424 | problem_type, verbose, chart_format, dep_var, mk_dir) 425 | self.add_plots('bar_plot', svg_data) 426 | # self.add_plots('bar_plot',None) 427 | except Exception as e: 428 | print(f'Could not draw some Bar Charts {e}') 429 | else: 430 | if len(cats) > 1: 431 | try: 432 | svg_data = draw_catscatterplots(dft, cats, verbose, 433 | chart_format, mk_dir=None) 434 | self.add_plots('catscatter_plot', svg_data) 435 | except Exception as e: 436 | print(f'Could not draw catscatter plots... 
{e}') 437 | else: 438 | ############ This is a Classification Problem ################## 439 | if len(continuous_vars) > 0: 440 | try: 441 | svg_data = draw_scatters(dft, continuous_vars, 442 | verbose, chart_format, problem_type, dep_var, classes, lowess, mk_dir) 443 | self.add_plots('scatter_plot', svg_data) 444 | except Exception as e: 445 | print(e) 446 | traceback.print_exc() 447 | print('Could not draw some Scatter Plots') 448 | if len(continuous_vars) > 1: 449 | try: 450 | svg_data = draw_pair_scatters(dft, continuous_vars, 451 | problem_type, verbose, chart_format, dep_var, classes, lowess, 452 | mk_dir) 453 | self.add_plots('pair_scatter', svg_data) 454 | except Exception as e: 455 | print(f'Could not draw some Pair Scatter Plots {e}') 456 | try: 457 | if type(dep_var) == str: 458 | othernums = [x for x in continuous_vars if x not in [dep_var]] 459 | else: 460 | othernums = [x for x in continuous_vars if x not in dep_var] 461 | 462 | if len(othernums) >= 1: 463 | svg_data = draw_distplot(dft, bool_vars + cats, continuous_vars, verbose, chart_format, 464 | problem_type, dep_var, classes, mk_dir) 465 | self.add_plots('dist_plot', svg_data) 466 | else: 467 | print('No continuous var in data set: drawing categorical distribution plots') 468 | except Exception as e: 469 | print(f'Could not draw some Distribution Plots {e}') 470 | try: 471 | if len(continuous_vars) > 0: 472 | svg_data = draw_violinplot(dft, dep_var, continuous_vars, verbose, chart_format, problem_type, 473 | mk_dir) 474 | self.add_plots('violin_plot', svg_data) 475 | except Exception as e: 476 | print(f'Could not draw some Violin Plots {e}') 477 | try: 478 | numeric_cols = [x for x in dft.select_dtypes(include='number').columns.tolist() if 479 | x not in [dep_var]] 480 | numeric_cols = list_difference(numeric_cols, date_vars) 481 | svg_data = draw_heatmap(dft, numeric_cols, 482 | verbose, chart_format, date_vars, dep_var, problem_type, 483 | mk_dir) 484 | self.add_plots('heat_map', svg_data) 485 
| except Exception as e: 486 | print(f'Could not draw some Heat Maps {e}') 487 | if date_vars != [] and len(continuous_vars) > 0: 488 | try: 489 | svg_data = draw_date_vars(dft, dep_var, date_vars, 490 | continuous_vars, verbose, chart_format, problem_type, mk_dir) 491 | self.add_plots('date_plot', svg_data) 492 | except Exception as e: 493 | print(f'Could not draw some Time Series plots. {e}') 494 | if len(cats) > 0 and len(continuous_vars) == 0: 495 | ### This is somewhat duplicative with distplot (above) - hence do it only minimally! 496 | try: 497 | svg_data = draw_pivot_tables(dft, problem_type, verbose, 498 | chart_format, dep_var, mk_dir) 499 | self.add_plots('pivot_plot', svg_data) 500 | except Exception as e: 501 | print(f'Could not draw some Pivot Charts against Dependent Variable {e}') 502 | if len(continuous_vars) > 0 and len(cats) > 0: 503 | try: 504 | svg_data = draw_barplots(dft, find_remove_duplicates(cats + bool_vars), continuous_vars, 505 | problem_type, 506 | verbose, chart_format, dep_var, mk_dir) 507 | self.add_plots('bar_plot', svg_data) 508 | pass 509 | except Exception as e: 510 | if verbose <= 1: 511 | print(f'Could not draw some Bar Charts {e}') 512 | pass 513 | else: 514 | if len(cats) > 1: 515 | try: 516 | svg_data = draw_catscatterplots(dft, cats, verbose, 517 | chart_format, mk_dir=None) 518 | self.add_plots('catscatter_plot', svg_data) 519 | except Exception as e: 520 | print(f'Could not draw catscatter plots. 
{e}') 521 | ###### Now you can check for NLP vars or discrete_string_vars to do wordcloud ####### 522 | if len(discrete_string_vars) > 0: 523 | plotname = 'wordcloud' 524 | import nltk 525 | nltk.download('popular') 526 | for each_string_var in discrete_string_vars: 527 | try: 528 | svg_data = draw_word_clouds(dft, each_string_var, chart_format, plotname, 529 | dep_var, problem_type, classes, mk_dir, verbose=0) 530 | self.add_plots(plotname, svg_data) 531 | except Exception as e: 532 | print(f'Could not draw wordcloud plot for {each_string_var}. {e}') 533 | ### Now print the time taken to run charts for AutoViz ############# 534 | if verbose <= 1: 535 | print('All Plots done') 536 | else: 537 | print('All Plots are saved in %s' % mk_dir) 538 | print('Time to run AutoViz = %0.0f seconds ' % (time.time() - start_time)) 539 | if verbose <= 1: 540 | print('\n ###################### AUTO VISUALIZATION Completed ########################') 541 | return dft 542 | 543 | 544 | ############################################################################################# 545 | 546 | 547 | # Create a new class FixDQ by inheriting from Fix_DQ 548 | class FixDQ(Fix_DQ): 549 | """ 550 | FixDQ is a great way to clean an entire train data set and apply the same steps in 551 | an MLOps pipeline to a test dataset. FixDQ can be used to detect most issues in 552 | your data (similar to data_cleaning_suggestions but without the `target` 553 | related issues) in one step. Then it fixes those issues it finds during the 554 | `fit` method by the `transform` method. This transformer can then be saved 555 | (or "pickled") for applying the same steps on test data either at the same 556 | time or later. 
557 | 558 | FixDQ performs the following data quality cleaning steps: 559 | It removes ID columns from further processing 560 | It removes zero-variance columns from further processing 561 | It identifies rare categories and groups them into a single category 562 | called "Rare" 563 | It finds infinite values and replaces them with an upper bound based on 564 | the Inter Quartile Range 565 | It detects mixed data types and drops those mixed-type columns from 566 | further processing 567 | It detects outliers and suggests removing them or using robust statistics. 568 | It detects high cardinality features but leaves them as they are. 569 | It detects highly correlated features and drops one of them (whichever 570 | comes first in the column sequence) 571 | It detects duplicate rows and keeps only one copy 572 | of those rows 573 | It detects duplicate columns and keeps only one copy 574 | It detects skewed distributions and applies log or box-cox 575 | transformations on them. 576 | It detects imbalanced classes and leaves them as they are 577 | It detects feature leakage and drops one of those features if 578 | they are highly correlated to the target 579 | """ 580 | 581 | def __init__(self, quantile=0.87, cat_fill_value='missing', 582 | num_fill_value=9999, rare_threshold=0.01, 583 | correlation_threshold=0.9): 584 | super().__init__() # Call the parent class constructor 585 | # Store the thresholds used by the inherited fit/transform steps 586 | self.quantile = quantile 587 | self.cat_fill_value = cat_fill_value 588 | self.num_fill_value = num_fill_value 589 | self.rare_threshold = rare_threshold 590 | self.correlation_threshold = correlation_threshold 591 | 592 | 593 | ################################################################################### 594 | 595 | 596 | def data_cleaning_suggestions(df, target=None): 597 | """ 598 | Prints data cleaning and improvement suggestions for a dataframe (used by class AV). 599 | Make sure you send in a dataframe.
Otherwise, this will give an error. 600 | """ 601 | if isinstance(df, pd.DataFrame): 602 | dqr = dq_report(data=df, target=target, html=False, csv_engine="pandas", verbose=1) 603 | else: 604 | print("Input must be a dataframe. Please check input and try again.") 605 | dqr = None ### avoid a NameError on the return when the input is not a dataframe 606 | return dqr 607 | ################################################################################### 608 | -------------------------------------------------------------------------------- /autoviz/AutoViz_NLP.py: -------------------------------------------------------------------------------- 1 | """ 2 | Copyright 2020 Google LLC. This software is provided as-is, without warranty or 3 | representation for any use or purpose. Your use of it is subject to your 4 | agreement with Google. 5 | """ 6 | import pandas as pd 7 | import string 8 | 9 | from collections import Counter 10 | 11 | from .AutoViz_Utils import save_image_data 12 | 13 | pd.set_option('display.max_colwidth', 5000) 14 | 15 | # Contraction map 16 | c_dict = { 17 | "ain't": "am not", 18 | "aren't": "are not", 19 | "cant": "cannot", 20 | "can't": "cannot", 21 | "can't've": "cannot have", 22 | "'cause": "because", 23 | "b": "be", 24 | "bc": "because", 25 | "becos": "because", 26 | "bs": "Expletive", 27 | "cause": "because", 28 | "could've": "could have", 29 | "couldn't": "could not", 30 | "couldn't've": "could not have", 31 | "corp": "corporation", 32 | "cud": "could", 33 | "didn't": "did not", 34 | "doesn't": "does not", 35 | "don't": "do not", 36 | "execs": "executives", 37 | "fck": "fuck", 38 | "fcking": "fucking", 39 | "gon na": "going to", 40 | "hadn't": "had not", 41 | "hadn't've": "had not have", 42 | "hasn't": "has not", 43 | "haven't": "have not", 44 | "he'd": "he would", 45 | "he'd've": "he would have", 46 | "he'll": "he will", 47 | "he'll've": "he will have", 48 | "he's": "he is", 49 | "how'd": "how did", 50 | "how'd'y": "how do you", 51 | "how'll": "how will", 52 | "how's": "how is", 53 | "im": "i am", 54 | "iam": "i am", 55 |
"i'd": "I would", 56 | "i'd've": "I would have", 57 | "i'll": "I will", 58 | "i'll've": "I will have", 59 | "i'm": "I am", 60 | "i've": "I have", 61 | "isn't": "is not", 62 | "it'd": "it had", 63 | "it'd've": "it would have", 64 | "it'll": "it will", 65 | "it'll've": "it will have", 66 | "it's": "it is", 67 | "let's": "let us", 68 | "ma'am": "madam", 69 | "mayn't": "may not", 70 | "mgr": "manager", 71 | "might've": "might have", 72 | "mightn't": "might not", 73 | "mightn't've": "might not have", 74 | "must've": "must have", 75 | "mustn't": "must not", 76 | "mustn't've": "must not have", 77 | "needn't": "need not", 78 | "needn't've": "need not have", 79 | "o'clock": "of the clock", 80 | "ofc": "office", 81 | "oughtn't": "ought not", 82 | "oughtn't've": "ought not have", 83 | "pics": "pictures", 84 | "shan't": "shall not", 85 | "sha'n't": "shall not", 86 | "shan't've": "shall not have", 87 | "she'd": "she would", 88 | "she'd've": "she would have", 89 | "she'll": "she will", 90 | "she'll've": "she will have", 91 | "she's": "she is", 92 | "should've": "should have", 93 | "shouldn't": "should not", 94 | "shouldn't've": "should not have", 95 | "so've": "so have", 96 | "so's": "so is", 97 | "svc": "service", 98 | "that'd": "that would", 99 | "that'd've": "that would have", 100 | "that's": "that is", 101 | "there'd": "there had", 102 | "there'd've": "there would have", 103 | "there's": "there is", 104 | "they'd": "they would", 105 | "they'd've": "they would have", 106 | "they'll": "they will", 107 | "they'll've": "they will have", 108 | "they're": "they are", 109 | "they've": "they have", 110 | "tho": "though", 111 | "to've": "to have", 112 | "wan na": "want to", 113 | "wasn't": "was not", 114 | "we'd": "we had", 115 | "we'd've": "we would have", 116 | "we'll": "we will", 117 | "we'll've": "we will have", 118 | "we're": "we are", 119 | "we've": "we have", 120 | "weren't": "were not", 121 | "what'll": "what will", 122 | "what'll've": "what will have", 123 | "what're": "what 
are", 124 | "what's": "what is", 125 | "what've": "what have", 126 | "when's": "when is", 127 | "when've": "when have", 128 | "where'd": "where did", 129 | "where's": "where is", 130 | "where've": "where have", 131 | "who'll": "who will", 132 | "who'll've": "who will have", 133 | "who's": "who is", 134 | "who've": "who have", 135 | "why's": "why is", 136 | "why've": "why have", 137 | "will've": "will have", 138 | "won't": "will not", 139 | "won't've": "will not have", 140 | "would've": "would have", 141 | "wouldn't": "would not", 142 | "wouldn't've": "would not have", 143 | "y'all": "you all", 144 | "y'alls": "you alls", 145 | "y'all'd": "you all would", 146 | "y'all'd've": "you all would have", 147 | "y'all're": "you all are", 148 | "y'all've": "you all have", 149 | "you'd": "you had", 150 | "you'd've": "you would have", 151 | "you'll": "you you will", 152 | "you'll've": "you you will have", 153 | "you're": "you are", 154 | "you've": "you have" 155 | } 156 | 157 | 158 | ################################################################################## 159 | def left_subtract(l1, l2): 160 | lst = [] 161 | for i in l1: 162 | if i not in l2: 163 | lst.append(i) 164 | return lst 165 | 166 | 167 | ################################################################################ 168 | def return_stop_words(): 169 | STOP_WORDS = ['it', "this", "that", "to", 'its', 'am', 'is', 'are', 'was', 'were', 'a', 170 | 'an', 'the', 'and', 'or', 'of', 'at', 'by', 'for', 'with', 'about', 'between', 171 | 'into', 'above', 'below', 'from', 'up', 'down', 'in', 'out', 'on', 'over', 'will', 'shall', 'could', 172 | 'under', 'again', 'further', 'then', 'once', 'all', 'any', 'both', 'each', 'would', 173 | 'few', 'more', 'most', 'other', 'some', 'such', 'only', 'own', 'same', 'so', 174 | 'than', 'too', 'very', 's', 't', 'can', 'just', 'd', 'll', 'm', 'o', 're', 175 | 've', 'y', 'ain', 'ma', 'them', 'themselves', 'they', 'he', 'she', 'ex', 'become', 'their'] 176 | add_words = ["s", "m", 'you', 
'not', 'get', 'no', 'via', 'one', 'still', 'us', 'u', 'hey', 'hi', 'oh', 'jeez', 177 | 'the', 'a', 'in', 'to', 'of', 'i', 'and', 'is', 'for', 'on', 'it', 'got', 'aww', 'awww', 178 | 'not', 'my', 'that', 'by', 'with', 'are', 'at', 'this', 'from', 'be', 'have', 'was', 179 | '', ' ', 'say', 's', 'u', 'ap', 'afp', '...', 'n', '\\'] 180 | # stopWords = text.ENGLISH_STOP_WORDS.union(add_words) 181 | stop_words = list(set(STOP_WORDS + add_words)) 182 | excl = ['will', "i'll", 'shall', "you'll", 'may', "don't", "hadn't", "hasn't", "haven't", 183 | "isn't", 'if', "mightn't", "mustn't", 'mightn', "needn't", 184 | 'needn', 'no', 'not', 'shan', "shan't", 'shouldn', "shouldn't", "wasn't", 185 | 'wasn', 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't", "you'd", 186 | "you're", 'yourself', 'yourselves'] 187 | stopWords = left_subtract(stop_words, excl) 188 | return sorted(stopWords) 189 | 190 | 191 | ################################################################################## 192 | def expandContractions(text): 193 | """ 194 | Takes in a sentence, splits it into a list of tokens, and returns the sentence 195 | with any contractions replaced by their expanded forms from the contraction map.
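The token-by-token substitution this function performs can be tried in isolation (self-contained sketch; the tiny `mapping` dict is an illustrative subset, not the full c_dict):

```python
# Sketch of the dict.get()-based token substitution used by
# expandContractions (and its siblings expandAbbreviations/expandSlangs):
# unknown tokens fall through unchanged via mapping.get(token, token).
mapping = {"can't": "cannot", "won't": "will not"}  # illustrative subset

def expand(text, mapping):
    return " ".join(mapping.get(tok, tok) for tok in text.split(" "))

print(expand("i can't and won't stop", mapping))  # -> i cannot and will not stop
```

Because lookup is per whitespace-separated token, punctuation glued to a word (e.g. "can't,") would not match — one reason the cleaning pipeline strips punctuation in a separate step.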
196 | """ 197 | text_list = text.split(" ") 198 | return " ".join([c_dict.get(item, item) for item in text_list]) 199 | 200 | 201 | # 202 | # remove entire URL 203 | def remove_URL(text): 204 | url = re.compile(r'https?://\S+|www\.\S+') 205 | return url.sub(r'', text) 206 | 207 | 208 | # Remove just HTML markup language 209 | def remove_html(text): 210 | html = re.compile(r'<.*?>') 211 | return html.sub(r'', text) 212 | 213 | 214 | # Convert Emojis to Text 215 | import emoji 216 | 217 | 218 | def convert_emojis(text): 219 | return emoji.demojize(text) 220 | 221 | 222 | def remove_punct(text): 223 | table = str.maketrans('', '', string.punctuation) 224 | return text.translate(table) 225 | 226 | 227 | # Clean even further removing non-printable text 228 | # Thanks to https://www.kaggle.com/rftexas/text-only-kfold-bert 229 | 230 | 231 | def remove_stopwords(tweet): 232 | """Removes STOP_WORDS characters""" 233 | stop_words = return_stop_words() 234 | tweet = tweet.lower() 235 | tweet = ' '.join([x for x in tweet.split(" ") if x not in stop_words]) 236 | tweet = ''.join([x for x in tweet if x in string.printable]) 237 | return tweet 238 | 239 | 240 | # define a function that accepts text and returns a list of lemmas 241 | def split_into_lemmas(text): 242 | words = TextBlob(text).words 243 | text = ' '.join([word.lemmatize() for word in words]) 244 | return text 245 | 246 | 247 | # Expand Slangs 248 | # Thanks to https://www.kaggle.com/rftexas/text-only-kfold-bert 249 | slangs = { 250 | "IG": "Instagram", 251 | "FB": "Facebook", 252 | "MOFO": "Expletive", 253 | "OMG": "Oh my God", 254 | "ROFL": "roll on the floor laughing", 255 | "ROFLOL": "roll on the floor laughing out loud", 256 | "ROTFLMAO": "roll on the floor laughing my ass off", 257 | "FCK": "Expletive", 258 | "LMAO": "Laugh my Ass off", 259 | "LOL": "laugh out loud", 260 | } 261 | 262 | abbreviations = { 263 | "$": " dollar ", 264 | "€": " euro ", 265 | "4ao": "for adults only", 266 | "a.m": "before midday", 267 
| "a3": "anytime anywhere anyplace", 268 | "aamof": "as a matter of fact", 269 | "acct": "account", 270 | "adih": "another day in hell", 271 | "afaic": "as far as i am concerned", 272 | "afaict": "as far as i can tell", 273 | "afaik": "as far as i know", 274 | "afair": "as far as i remember", 275 | "afk": "away from keyboard", 276 | "app": "application", 277 | "approx": "approximately", 278 | "apps": "applications", 279 | "asap": "as soon as possible", 280 | "asl": "age, sex, location", 281 | "atk": "at the keyboard", 282 | "ave.": "avenue", 283 | "aymm": "are you my mother", 284 | "ayor": "at your own risk", 285 | "b&b": "bed and breakfast", 286 | "b+b": "bed and breakfast", 287 | "b.c": "before christ", 288 | "b2b": "business to business", 289 | "b2c": "business to customer", 290 | "b4": "before", 291 | "b4n": "bye for now", 292 | "b@u": "back at you", 293 | "bae": "before anyone else", 294 | "bak": "back at keyboard", 295 | "bbbg": "bye bye be good", 296 | "bbc": "british broadcasting corporation", 297 | "bbias": "be back in a second", 298 | "bbl": "be back later", 299 | "bbs": "be back soon", 300 | "be4": "before", 301 | "bfn": "bye for now", 302 | "blvd": "boulevard", 303 | "bout": "about", 304 | "brb": "be right back", 305 | "bros": "brothers", 306 | "brt": "be right there", 307 | "bsaaw": "big smile and a wink", 308 | "btch": "bitch", 309 | "btw": "by the way", 310 | "btfd": "buy the Expletive dip", 311 | "bwl": "bursting with laughter", 312 | "c/o": "care of", 313 | "cet": "central european time", 314 | "cf": "compare", 315 | "cia": "central intelligence agency", 316 | "csl": "can not stop laughing", 317 | "cu": "see you", 318 | "cul8r": "see you later", 319 | "cv": "curriculum vitae", 320 | "cwot": "complete waste of time", 321 | "cya": "see you", 322 | "cyt": "see you tomorrow", 323 | "dae": "does anyone else", 324 | "dbmib": "do not bother me i am busy", 325 | "diy": "do it yourself", 326 | "dm": "direct message", 327 | "dwh": "during work hours", 328 | 
"e123": "easy as one two three", 329 | "eet": "eastern european time", 330 | "eg": "example", 331 | "embm": "early morning business meeting", 332 | "encl": "enclosed", 333 | "encl.": "enclosed", 334 | "etc": "and so on", 335 | "faq": "frequently asked questions", 336 | "fawc": "for anyone who cares", 337 | "fb": "facebook", 338 | "fc": "fingers crossed", 339 | "fig": "figure", 340 | "fimh": "forever in my heart", 341 | "ft.": "feet", 342 | "ft": "featuring", 343 | "ftl": "for the loss", 344 | "ftw": "for the win", 345 | "fwiw": "for what it is worth", 346 | "fyi": "for your information", 347 | "g9": "genius", 348 | "gahoy": "get a hold of yourself", 349 | "gal": "get a life", 350 | "gcse": "general certificate of secondary education", 351 | "gfn": "gone for now", 352 | "gg": "good game", 353 | "gl": "good luck", 354 | "glhf": "good luck have fun", 355 | "gmt": "greenwich mean time", 356 | "gmta": "great minds think alike", 357 | "gn": "good night", 358 | "g.o.a.t": "greatest of all time", 359 | "goat": "greatest of all time", 360 | "goi": "get over it", 361 | "gps": "global positioning system", 362 | "gr8": "great", 363 | "gratz": "congratulations", 364 | "gyal": "girl", 365 | "h&c": "hot and cold", 366 | "hp": "horsepower", 367 | "hr": "hour", 368 | "hrh": "his royal highness", 369 | "ht": "height", 370 | "ibrb": "i will be right back", 371 | "ic": "i see", 372 | "icq": "i seek you", 373 | "icymi": "in case you missed it", 374 | "idc": "i do not care", 375 | "idgadf": "i do not give a damn Expletive", 376 | "idgaf": "i do not give a Expletive", 377 | "idk": "i do not know", 378 | "ie": "that is", 379 | "i.e": "that is", 380 | "ifyp": "i feel your pain", 381 | "iirc": "if i remember correctly", 382 | "ilu": "i love you", 383 | "ily": "i love you", 384 | "imho": "in my humble opinion", 385 | "imo": "in my opinion", 386 | "imu": "i miss you", 387 | "iow": "in other words", 388 | "irl": "in real life", 389 | "j4f": "just for fun", 390 | "jic": "just in case", 391 | 
"jk": "just kidding", 392 | "jsyk": "just so you know", 393 | "l8r": "later", 394 | "lb": "pound", 395 | "lbs": "pounds", 396 | "ldr": "long distance relationship", 397 | "lmao": "laugh my ass off", 398 | "lmfao": "laugh my Expletive ass off", 399 | "lol": "laugh out loud", 400 | "ltd": "limited", 401 | "ltns": "long time no see", 402 | "m8": "mate", 403 | "mf": "Expletive", 404 | "mfing": "Expletive", 405 | "mfs": "Expletive", 406 | "mfw": "my face when", 407 | "mofo": "Expletive", 408 | "mph": "miles per hour", 409 | "mr": "mister", 410 | "mrw": "my reaction when", 411 | "ms": "miss", 412 | "mte": "my thoughts exactly", 413 | "nagi": "not a good idea", 414 | "nbc": "national broadcasting company", 415 | "nbd": "not big deal", 416 | "nfs": "not for sale", 417 | "ngl": "not going to lie", 418 | "nhs": "national health service", 419 | "nrn": "no reply necessary", 420 | "nsfl": "not safe for life", 421 | "nsfw": "not safe for work", 422 | "nth": "nice to have", 423 | "nvr": "never", 424 | "nyc": "new york city", 425 | "oc": "original content", 426 | "og": "original", 427 | "ohp": "overhead projector", 428 | "oic": "oh i see", 429 | "omdb": "over my dead body", 430 | "omg": "oh my god", 431 | "omw": "on my way", 432 | "p.a": "per annum", 433 | "p.m": "after midday", 434 | "pm": "prime minister", 435 | "poc": "people of color", 436 | "pov": "point of view", 437 | "pp": "pages", 438 | "ppl": "people", 439 | "prw": "parents are watching", 440 | "ps": "postscript", 441 | "pt": "point", 442 | "ptb": "please text back", 443 | "pto": "please turn over", 444 | "qpsa": "what happens", # "que pasa", 445 | "ratchet": "rude", 446 | "rbtl": "read between the lines", 447 | "rlrt": "real life retweet", 448 | "rofl": "rolling on the floor laughing", 449 | "roflol": "rolling on the floor laughing out loud", 450 | "rotflmao": "rolling on the floor laughing my ass off", 451 | "rt": "retweet", 452 | "ruok": "are you ok", 453 | "sfw": "safe for work", 454 | "sk8": "skate", 455 | "smh": 
"shake my head", 456 | "sq": "square", 457 | "srsly": "seriously", 458 | "ssdd": "same stuff different day", 459 | "tbh": "to be honest", 460 | "tbs": "tablespooful", 461 | "tbsp": "tablespooful", 462 | "tfw": "that feeling when", 463 | "thks": "thank you", 464 | "tho": "though", 465 | "thx": "thank you", 466 | "tia": "thanks in advance", 467 | "til": "today i learned", 468 | "tl;dr": "too long i did not read", 469 | "tldr": "too long i did not read", 470 | "tmb": "tweet me back", 471 | "tntl": "trying not to laugh", 472 | "ttyl": "talk to you later", 473 | "u": "you", 474 | "u2": "you too", 475 | "u4e": "yours for ever", 476 | "utc": "coordinated universal time", 477 | "w/": "with", 478 | "w/o": "without", 479 | "w8": "wait", 480 | "wassup": "what is up", 481 | "wb": "welcome back", 482 | "wtf": "what the Expletive", 483 | "WTF": "what the Expletive", 484 | "wtg": "way to go", 485 | "wtpa": "where the party at", 486 | "wuf": "where are you from", 487 | "wuzup": "what is up", 488 | "wywh": "wish you were here", 489 | "yd": "yard", 490 | "ygtr": "you got that right", 491 | "ynk": "you never know", 492 | "zzz": "sleeping bored and tired" 493 | } 494 | 495 | 496 | # Thanks to https://www.kaggle.com/rftexas/text-only-kfold-bert 497 | 498 | 499 | # Thanks to https://www.kaggle.com/rftexas/text-only-kfold-bert 500 | def expandAbbreviations(sentence): 501 | text = sentence.split(" ") 502 | return " ".join([abbreviations.get(item, item) for item in text]) 503 | 504 | 505 | # Thanks to https://www.kaggle.com/rftexas/text-only-kfold-bert 506 | def expandSlangs(sentence): 507 | text = sentence.split(" ") 508 | return " ".join([slangs.get(item, item) for item in text]) 509 | 510 | 511 | def join_words(text): 512 | return " ".join(text) 513 | 514 | 515 | def remove_punctuations(text: str): 516 | return re.sub(r'[^\w\s]', '', text) 517 | 518 | 519 | def remove_emoji(text): 520 | emoji_pattern = re.compile("[" 521 | u"\U0001F600-\U0001F64F" # emoticons 522 | 
u"\U0001F300-\U0001F5FF" # symbols & pictographs 523 | u"\U0001F680-\U0001F6FF" # transport & map symbols 524 | u"\U0001F1E0-\U0001F1FF" # flags (iOS) 525 | u"\U00002702-\U000027B0" 526 | u"\U000024C2-\U0001F251" 527 | "]+", flags=re.UNICODE) 528 | return emoji_pattern.sub(r'', text) 529 | 530 | 531 | #### This counts emojis in a sentence which is very helpful to gauge sentiment 532 | def count_emojis(sentence): 533 | import regex 534 | import emoji 535 | emoji_counter = 0 536 | data = regex.findall(r'\X', sentence) 537 | for word in data: 538 | if any(char in emoji.UNICODE_EMOJI for char in word): 539 | emoji_counter += 1 540 | return emoji_counter 541 | 542 | 543 | ################################################################################ 544 | import re 545 | from wordcloud import WordCloud, STOPWORDS 546 | import matplotlib.pyplot as plt 547 | from textblob import TextBlob 548 | from itertools import chain 549 | 550 | replace_spaces = re.compile('[/(){}\[\]\|@,;]') 551 | remove_special_chars = re.compile('[^0-9a-z #+_]') 552 | STOPWORDS = return_stop_words() 553 | remove_ip_addr = re.compile(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b') 554 | 555 | 556 | def clean_steps(text): 557 | text = text.replace('\n', ' ').lower() # 558 | text = remove_ip_addr.sub('', text) 559 | text = replace_spaces.sub(' ', text) 560 | text = remove_special_chars.sub('', text) 561 | text = ' '.join([w for w in text.split() if w not in STOPWORDS]) 562 | return text 563 | 564 | 565 | def clean_text(x): 566 | """ 567 | ############################################################################### 568 | ## This cleans text string. Use it only as a Series.map(clean_text) function # 569 | ############################################################################### 570 | Input must be one text string only. Don't send arrays or dataframes. 571 | Clean steps cleans one tweet at a time using following steps: 572 | 1. removes URL 573 | 2. 
Removes a very small list of stop words - about 65 574 | """ 575 | x = expandSlangs(x) ### do this before lowering case since case is important for sentiment 576 | x = expandAbbreviations(x) ### this is before lowering case since case is important in sentiment 577 | x = expandContractions(x) ### this is after lowering case - just to double check 578 | x = remove_stopwords(x) ## this works well to remove a small number of stop words 579 | x = remove_punctuations(x) # this works well to remove punctuations and add spaces correctly 580 | x = split_into_lemmas(x) ## this lemmatizes text and gets it ready for wordclouds ### 581 | return x 582 | 583 | 584 | def draw_wordcloud_from_dataframe(dataframe, column): 585 | """ 586 | This handy function draws a dataframe column using Wordcloud library and nltk. 587 | """ 588 | 589 | ### Remember that fillna only works at dataframe level! ## 590 | X_train = dataframe[[column]].fillna("missing") 591 | ### Map function only works on Series, so you should use this ### 592 | X_train = X_train[column].map(clean_steps) 593 | ### next time, you get back a series, so just use it as is ### 594 | X_train = X_train.map(clean_text) 595 | 596 | # Dictionary of all words from train corpus with their counts. 
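The corpus-wide word count that follows (a `Counter` over chained token streams) can be tried in isolation (self-contained sketch with toy data standing in for `X_train`):

```python
from collections import Counter
from itertools import chain

docs = ["the cat sat", "the cat ran"]  # toy corpus standing in for X_train
# split every document, chain the resulting token streams, count in one pass
words_counts = Counter(chain.from_iterable(map(str.split, docs)))
print(words_counts["the"])  # -> 2

# same top-words selection as below: sort tokens by frequency, keep the head
top_words = sorted(words_counts, key=words_counts.get, reverse=True)[:2]
print(top_words)
```

Chaining the per-document token iterators avoids building one huge intermediate list before counting.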
597 | 598 | ### Fantastic way to count words using one line of code ############# 599 | ### Thanks to : https://stackoverflow.com/questions/35857519/efficiently-count-word-frequencies-in-python 600 | words_counts = Counter(chain.from_iterable(map(str.split, X_train))) 601 | vocab_size = 50000 602 | top_words = sorted(words_counts, key=words_counts.get, reverse=True)[:vocab_size] 603 | text_join = ' '.join(top_words) 604 | 605 | # picture_mask = plt.imread('test.png') 606 | 607 | wordcloud1 = WordCloud( 608 | stopwords=STOPWORDS, 609 | background_color='white', 610 | width=1800, 611 | height=1400, 612 | # mask=picture_mask 613 | ).generate(text_join) 614 | return wordcloud1 615 | 616 | 617 | ################################################################################ 618 | # Removes duplicates from a list to return unique values - USED ONLYONCE 619 | def find_remove_duplicates(values): 620 | output = [] 621 | seen = set() 622 | for value in values: 623 | if value not in seen: 624 | output.append(value) 625 | seen.add(value) 626 | return output 627 | 628 | 629 | def draw_word_clouds(dft, each_string_var, chart_format, plotname, 630 | dep, problem_type, classes, mk_dir, verbose=0): 631 | dft = dft[:] 632 | width_size = 20 633 | height_size = 10 634 | imgdata_list = [] 635 | 636 | if problem_type == 'Regression' or problem_type == 'Clustering': 637 | ########## This is for Regression and Clustering problems only ##### 638 | num_plots = 1 639 | fig = plt.figure(figsize=(min(num_plots * width_size, 20), min(num_plots * height_size, 20))) 640 | cols = 2 641 | rows = int(num_plots / cols + 0.5) 642 | plotc = 1 643 | while plotc <= num_plots: 644 | plt.subplot(rows, cols, plotc) 645 | ax1 = plt.gca() 646 | wc1 = draw_wordcloud_from_dataframe(dft, each_string_var) 647 | plotc += 1 648 | ax1.axis("off") 649 | ax1.imshow(wc1) 650 | ax1.set_title('Wordcloud for %s' % each_string_var) 651 | image_count = 0 652 | if verbose == 2: 653 | imgdata_list.append(save_image_data(fig, 
chart_format, 654 | plotname, mk_dir)) 655 | image_count += 1 656 | if verbose <= 1: 657 | plt.show() 658 | else: 659 | ########## This is for Classification problems only ########### 660 | num_plots = len(classes) 661 | target_vars = dft[dep].unique() 662 | fig = plt.figure(figsize=(min(num_plots * width_size, 20), min(num_plots * height_size, 20))) 663 | cols = 2 664 | rows = int(num_plots / cols + 0.5) 665 | plotc = 1 666 | while plotc <= num_plots: 667 | plt.subplot(rows, cols, plotc) 668 | ax1 = plt.gca() 669 | ax1.axis("off") 670 | dft_target = dft.loc[(dft[dep] == target_vars[plotc - 1])][each_string_var] 671 | if isinstance(dft_target, pd.Series): 672 | wc1 = draw_wordcloud_from_dataframe(pd.DataFrame(dft_target), each_string_var) 673 | else: 674 | wc1 = draw_wordcloud_from_dataframe(dft_target, each_string_var) 675 | ax1.imshow(wc1) 676 | ax1.set_title('Wordcloud for %s, target=%s' % (each_string_var, target_vars[plotc - 1]), fontsize=20) 677 | plotc += 1 678 | fig.tight_layout() 679 | ### This is where you save the fig or show the fig ###### 680 | image_count = 0 681 | if verbose == 2: 682 | imgdata_list.append(save_image_data(fig, chart_format, 683 | plotname, mk_dir)) 684 | image_count += 1 685 | if verbose <= 1: 686 | plt.show() 687 | return imgdata_list ### return the collected image data so callers can add these plots 688 | ####### End of Word Clouds ############################# 689 | -------------------------------------------------------------------------------- /autoviz/__init__.py: -------------------------------------------------------------------------------- 1 | name = "autoviz" 2 | from .__version__ import __version__, __holo_version__ 3 | from .AutoViz_Class import AutoViz_Class 4 | from .AutoViz_Class import data_cleaning_suggestions 5 | from .AutoViz_Class import FixDQ 6 | ############################################################################################ 7 | if __name__ == "__main__": 8 | module_type = 'Running' 9 | else: 10 | module_type = 'Imported' 11 | version_number = __version__ 12 | print("""%s v%s.
Please call AutoViz in this sequence: 13 | AV = AutoViz_Class() 14 | %%matplotlib inline 15 | dfte = AV.AutoViz(filename, sep=',', depVar='', dfte=None, header=0, verbose=1, lowess=False, 16 | chart_format='svg',max_rows_analyzed=150000,max_cols_analyzed=30, save_plot_dir=None)""" % ( 17 | module_type, version_number)) 18 | ########################################################################################### 19 | -------------------------------------------------------------------------------- /autoviz/__version__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """Specifies the version of the Auto_ViML package.""" 3 | 4 | __title__ = "AutoViz" 5 | __author__ = "Ram Seshadri" 6 | __description__ = "Automatically Visualize any data set any size with a Single Line of Code" 7 | __url__ = "https://github.com/AutoViML/AutoViz.git" 8 | __version__ = "0.1.905" 9 | __holo_version__ = "0.0.4" 10 | __license__ = "Apache License 2.0" 11 | __copyright__ = "2020-21 Google" 12 | -------------------------------------------------------------------------------- /autoviz/classify_method.py: -------------------------------------------------------------------------------- 1 | import random 2 | 3 | import numpy as np 4 | import pandas as pd 5 | 6 | np.random.seed(99) 7 | random.seed(42) 8 | ################################################################################ 9 | #### The warnings from Sklearn are so annoying that I have to shut it off ####### 10 | import warnings 11 | 12 | warnings.filterwarnings("ignore") 13 | from sklearn.exceptions import DataConversionWarning 14 | 15 | warnings.filterwarnings(action='ignore', category=DataConversionWarning) 16 | 17 | 18 | def warn(*args, **kwargs): 19 | pass 20 | 21 | 22 | warnings.warn = warn 23 | #################################################################################### 24 | from functools import reduce 25 | 26 | 27 | def left_subtract(l1, l2): 28 | lst 
= [] 29 | for i in l1: 30 | if i not in l2: 31 | lst.append(i) 32 | return lst 33 | 34 | 35 | ################################################################################# 36 | import copy 37 | 38 | 39 | def EDA_find_remove_columns_with_infinity(df, remove=False, verbose=0): 40 | """ 41 | This function finds all columns in a dataframe that have infinite values (np.inf or -np.inf). 42 | It returns a list of column names. If the list is empty, it means no columns were found. 43 | If the remove flag is set, it instead returns a smaller dataframe with the inf columns removed. 44 | """ 45 | nums = df.select_dtypes(include='number').columns.tolist() 46 | dfx = df[nums] 47 | sum_rows = np.isinf(dfx).values.sum() 48 | add_cols = list(dfx.columns.to_series()[np.isinf(dfx).any()]) 49 | if sum_rows > 0: 50 | if verbose > 0: 51 | print(' there are %d infinite values in %d column(s)...' % (sum_rows, len(add_cols))) 52 | if remove: 53 | ### here you need to use df since the whole dataset is involved ### 54 | nocols = [x for x in df.columns if x not in add_cols] 55 | if verbose > 0: 56 | print(" Shape of dataset before %s and after %s removing columns with infinity" % 57 | (df.shape, df[nocols].shape)) 58 | return df[nocols] 59 | else: 60 | ## this will be a list of columns with infinity #### 61 | return add_cols 62 | else: 63 | ## this will be an empty list if there are no columns with infinity 64 | return add_cols 65 | 66 | 67 | #################################################################################### 68 | def classify_columns(df_preds, verbose=0): 69 | """ 70 | Performs exploratory data analysis (EDA) by classifying every column in a dataframe of predictors. 71 | ###################################################################################### 72 | Takes a dataframe containing only predictors to be classified into various types. 73 | DO NOT SEND IN A TARGET COLUMN since it will try to include that into various columns.
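The infinity-detection idiom in `EDA_find_remove_columns_with_infinity` above reduces to a boolean mask over the numeric columns (self-contained sketch; the toy frame and column names are illustrative):

```python
import numpy as np
import pandas as pd

# toy frame: column 'a' contains an infinity, 'c' is non-numeric
df = pd.DataFrame({"a": [1.0, np.inf], "b": [1.0, 2.0], "c": ["x", "y"]})
nums = df.select_dtypes(include="number")          # restrict to numeric cols
# boolean-index the column labels by "does this column contain any inf?"
inf_cols = list(nums.columns.to_series()[np.isinf(nums).any()])
print(inf_cols)                                    # -> ['a']
print(int(np.isinf(nums).values.sum()))            # total inf cells -> 1
print(df.drop(columns=inf_cols).columns.tolist())  # -> ['b', 'c']
```

Restricting to numeric dtypes first matters because `np.isinf` raises on object/string columns.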
74 | Returns a data frame containing the columns and the class each belongs to, such as numeric, 75 | categorical, date or id column, boolean, nlp, discrete_string and cols to delete... 76 | ####### Returns a dictionary with 10 kinds of vars like the following: # continuous_vars,int_vars 77 | # cat_vars,factor_vars, bool_vars,discrete_string_vars,nlp_vars,date_vars,id_vars,cols_delete 78 | """ 79 | train = copy.deepcopy(df_preds) 80 | #### If there are 30 chars or more in a discrete_string_var, it is then considered an NLP variable 81 | max_nlp_char_size = 30 82 | max_cols_to_print = 30 83 | print('#######################################################################################') 84 | print('######################## C L A S S I F Y I N G V A R I A B L E S ####################') 85 | print('#######################################################################################') 86 | print('Classifying variables in data set...') 87 | #### cat_limit defines the max number of categories a column can have to be called a categorical column 88 | cat_limit = 35 89 | float_limit = 15 #### Make this limit low so that float variables below this limit become cat vars ### 90 | 91 | def add(a, b): 92 | return a + b 93 | 94 | sum_all_cols = dict() 95 | orig_cols_total = train.shape[1] 96 | # Types of columns 97 | cols_delete = [] 98 | cols_delete = [col for col in list(train) if (len(train[col].value_counts()) == 1) 99 | | (train[col].isnull().sum() / len(train) >= 0.90)] 100 | inf_cols = EDA_find_remove_columns_with_infinity(train, remove=False, verbose=verbose) 101 | mixed_cols = [x for x in list(train) if len(train[x].dropna().apply(type).value_counts()) > 1] 102 | if len(mixed_cols) > 0: 103 | print(' Removing %s column(s) due to mixed data type detected...'
% mixed_cols) 104 | cols_delete += mixed_cols 105 | cols_delete += inf_cols 106 | train = train[left_subtract(list(train), cols_delete)] 107 | var_df = pd.Series(dict(train.dtypes)).reset_index(drop=False).rename( 108 | columns={0: 'type_of_column'}) 109 | sum_all_cols['cols_delete'] = cols_delete 110 | 111 | var_df['bool'] = var_df.apply( 112 | lambda x: 1 if x['type_of_column'] in ['bool', 'object'] and len(train[x['index']].value_counts()) == 2 else 0, 113 | axis=1) 114 | string_bool_vars = list(var_df[(var_df['bool'] == 1)]['index']) 115 | sum_all_cols['string_bool_vars'] = string_bool_vars 116 | var_df['num_bool'] = var_df.apply(lambda x: 1 if x['type_of_column'] in [np.uint8, 117 | np.uint16, np.uint32, np.uint64, 118 | 'int8', 'int16', 'int32', 'int64', 119 | 'float16', 'float32', 'float64'] and len( 120 | train[x['index']].value_counts()) == 2 else 0, axis=1) 121 | num_bool_vars = list(var_df[(var_df['num_bool'] == 1)]['index']) 122 | sum_all_cols['num_bool_vars'] = num_bool_vars 123 | ###### This is where we take all Object vars and split them into different kinds ### 124 | discrete_or_nlp = var_df.apply(lambda x: 1 if x['type_of_column'] in ['object'] and x[ 125 | 'index'] not in string_bool_vars + cols_delete else 0, axis=1) 126 | ######### This is where we figure out whether a string var is an nlp or discrete_string var ### 127 | var_df['nlp_strings'] = 0 128 | var_df['discrete_strings'] = 0 129 | var_df['cat'] = 0 130 | var_df['id_col'] = 0 131 | discrete_or_nlp_vars = var_df.loc[discrete_or_nlp == 1]['index'].values.tolist() 132 | copy_discrete_or_nlp_vars = copy.deepcopy(discrete_or_nlp_vars) 133 | if len(discrete_or_nlp_vars) > 0: 134 | for col in copy_discrete_or_nlp_vars: 135 | #### first fill empty or missing vals since it will blow up otherwise ### 136 | ### Remember that fillna only works at the dataframe level! 
137 | train[[col]] = train[[col]].fillna(' ') 138 | if train[col].map(lambda x: len(x) if type(x) == str else 0).max( 139 | ) >= 50 and len(train[col].value_counts()) >= int(0.9 * len(train)) and col not in string_bool_vars: 140 | var_df.loc[var_df['index'] == col, 'nlp_strings'] = 1 141 | elif train[col].map(lambda x: len(x) if type(x) == str else 0).mean( 142 | ) >= max_nlp_char_size and train[col].map(lambda x: len(x) if type(x) == str else 0).max( 143 | ) < 50 and len(train[col].value_counts() 144 | ) <= int(0.9 * len(train)) and col not in string_bool_vars: 145 | var_df.loc[var_df['index'] == col, 'discrete_strings'] = 1 146 | elif len(train[col].value_counts()) > cat_limit and len( 147 | train[col].value_counts()) <= int(0.9 * len(train)) and col not in string_bool_vars: 148 | var_df.loc[var_df['index'] == col, 'discrete_strings'] = 1 149 | elif len(train[col].value_counts()) > cat_limit and len(train[col].value_counts() 150 | ) == len(train) and col not in string_bool_vars: 151 | var_df.loc[var_df['index'] == col, 'id_col'] = 1 152 | else: 153 | var_df.loc[var_df['index'] == col, 'cat'] = 1 154 | nlp_vars = list(var_df[(var_df['nlp_strings'] == 1)]['index']) 155 | sum_all_cols['nlp_vars'] = nlp_vars 156 | discrete_string_vars = list(var_df[(var_df['discrete_strings'] == 1)]['index']) 157 | sum_all_cols['discrete_string_vars'] = discrete_string_vars 158 | ###### This happens only if a string column happens to be an ID column ####### 159 | #### DO NOT Add this to ID_VARS yet. It will be done later. Don't change it easily... 160 | #### Category DTYPE vars are very special = they can be left as is and not disturbed in Python. 
### 161 | var_df['dcat'] = var_df.apply(lambda x: 1 if str(x['type_of_column']) == 'category' else 0, 162 | axis=1) 163 | factor_vars = list(var_df[(var_df['dcat'] == 1)]['index']) 164 | sum_all_cols['factor_vars'] = factor_vars 165 | ######################################################################## 166 | date_or_id = var_df.apply(lambda x: 1 if x['type_of_column'] in [np.uint8, 167 | np.uint16, np.uint32, np.uint64, 168 | 'int8', 'int16', 169 | 'int32', 'int64'] and x[ 170 | 'index'] not in (string_bool_vars + num_bool_vars + 171 | discrete_string_vars + nlp_vars) else 0, 172 | axis=1) 173 | ######### This is where we figure out whether a numeric col is date or id variable ### 174 | var_df['int'] = 0 175 | var_df['date_time'] = 0 176 | ### if a particular column is date-time type, now set it as a date time variable ## 177 | var_df['date_time'] = var_df.apply(lambda x: 1 if x['type_of_column'] in ['<M8[ns]', 'datetime64[ns]'] and x[ 178 | 'index'] not in (string_bool_vars + num_bool_vars + discrete_string_vars + nlp_vars) else 0, 179 | axis=1) 180 | ### this is where we save them as date time variables ### 181 | if len(var_df.loc[date_or_id == 1]) != 0: 182 | for col in var_df.loc[date_or_id == 1]['index'].values.tolist(): 183 | if len(train[col].value_counts()) == len(train): 184 | if train[col].min() < 1900 or train[col].max() > 2050: 185 | var_df.loc[var_df['index'] == col, 'id_col'] = 1 186 | else: 187 | try: 188 | pd.to_datetime(train[col], infer_datetime_format=True) 189 | var_df.loc[var_df['index'] == col, 'date_time'] = 1 190 | except: 191 | var_df.loc[var_df['index'] == col, 'id_col'] = 1 192 | else: 193 | if train[col].min() < 1900 or train[col].max() > 2050: 194 | if col not in num_bool_vars: 195 | var_df.loc[var_df['index'] == col, 'int'] = 1 196 | else: 197 | try: 198 | pd.to_datetime(train[col], infer_datetime_format=True) 199 | var_df.loc[var_df['index'] == col, 'date_time'] = 1 200 | except: 201 | if col not in num_bool_vars: 202 | var_df.loc[var_df['index'] == col, 'int'] = 1 203 | else: 204 | pass 205 | int_vars = list(var_df[(var_df['int'] == 1)]['index']) 206 | date_vars = list(var_df[(var_df['date_time'] == 1)]['index']) 207 | id_vars = list(var_df[(var_df['id_col'] == 1)]['index']) 208 | sum_all_cols['int_vars'] = int_vars 209 | copy_date_vars = copy.deepcopy(date_vars) 210 | for date_var in copy_date_vars: 211 | #### This test is to make sure date vars are actually 
date vars 212 | try: 213 | pd.to_datetime(train[date_var], infer_datetime_format=True) 214 | except: 215 | ##### if not a date var, then just add it to delete it from processing 216 | cols_delete.append(date_var) 217 | date_vars.remove(date_var) 218 | sum_all_cols['date_vars'] = date_vars 219 | sum_all_cols['id_vars'] = id_vars 220 | sum_all_cols['cols_delete'] = cols_delete 221 | ## This is an EXTREMELY complicated logic for cat vars. Don't change it unless you test it many times! 222 | var_df['numeric'] = 0 223 | float_or_cat = var_df.apply(lambda x: 1 if x['type_of_column'] in ['float16', 224 | 'float32', 'float64'] else 0, 225 | axis=1) 226 | ####### We need to make sure there are no categorical vars in float ####### 227 | if len(var_df.loc[float_or_cat == 1]) > 0: 228 | for col in var_df.loc[float_or_cat == 1]['index'].values.tolist(): 229 | if 2 < len(train[col].value_counts()) <= float_limit and len( 230 | train[col].value_counts()) <= len(train): 231 | var_df.loc[var_df['index'] == col, 'cat'] = 1 232 | else: 233 | if col not in (num_bool_vars + factor_vars): 234 | var_df.loc[var_df['index'] == col, 'numeric'] = 1 235 | cat_vars = list(var_df[(var_df['cat'] == 1)]['index']) 236 | continuous_vars = list(var_df[(var_df['numeric'] == 1)]['index']) 237 | 238 | ######## V E R Y I M P O R T A N T ################################################### 239 | cat_vars_copy = copy.deepcopy(factor_vars) 240 | for cat in cat_vars_copy: 241 | if df_preds[cat].dtype == float: 242 | continuous_vars.append(cat) 243 | factor_vars.remove(cat) 244 | var_df.loc[var_df['index'] == cat, 'dcat'] = 0 245 | var_df.loc[var_df['index'] == cat, 'numeric'] = 1 246 | elif len(df_preds[cat].value_counts()) == df_preds.shape[0]: 247 | id_vars.append(cat) 248 | factor_vars.remove(cat) 249 | var_df.loc[var_df['index'] == cat, 'dcat'] = 0 250 | var_df.loc[var_df['index'] == cat, 'id_col'] = 1 251 | 252 | sum_all_cols['factor_vars'] = factor_vars 253 | ##### There are a couple of extra tests you 
need to do to remove aberrations in cat_vars ### 254 | cat_vars_copy = copy.deepcopy(cat_vars) 255 | for cat in cat_vars_copy: 256 | if df_preds[cat].dtype == float: 257 | continuous_vars.append(cat) 258 | cat_vars.remove(cat) 259 | var_df.loc[var_df['index'] == cat, 'cat'] = 0 260 | var_df.loc[var_df['index'] == cat, 'numeric'] = 1 261 | elif len(df_preds[cat].value_counts()) == df_preds.shape[0]: 262 | id_vars.append(cat) 263 | cat_vars.remove(cat) 264 | var_df.loc[var_df['index'] == cat, 'cat'] = 0 265 | var_df.loc[var_df['index'] == cat, 'id_col'] = 1 266 | sum_all_cols['cat_vars'] = cat_vars 267 | sum_all_cols['continuous_vars'] = continuous_vars 268 | sum_all_cols['id_vars'] = id_vars 269 | ###### This is where you consolidate the numbers ########### 270 | var_dict_sum = dict(zip(var_df.values[:, 0], var_df.values[:, 2:].sum(1))) 271 | for col, sumval in var_dict_sum.items(): 272 | if sumval == 0: 273 | print('%s of type=%s is not classified' % (col, train[col].dtype)) 274 | elif sumval > 1: 275 | print('%s of type=%s is classified into more than one type' % (col, train[col].dtype)) 276 | else: 277 | pass 278 | ##### If there are more than 1000 unique values, then add it to NLP vars ### 279 | copy_discrete_vals = copy.deepcopy(discrete_string_vars) 280 | for each_discrete in copy_discrete_vals: 281 | if train[each_discrete].nunique() >= 1000: 282 | nlp_vars.append(each_discrete) 283 | discrete_string_vars.remove(each_discrete) 284 | elif 100 < train[each_discrete].nunique() < 1000: 285 | pass 286 | else: 287 | ### If it has 100 or fewer unique values, then make it a categorical var 288 | cat_vars.append(each_discrete) 289 | discrete_string_vars.remove(each_discrete) 290 | sum_all_cols['discrete_string_vars'] = discrete_string_vars 291 | sum_all_cols['cat_vars'] = cat_vars 292 | sum_all_cols['nlp_vars'] = nlp_vars 293 | ############### This is where you print all the types of variables ############## 294 | ####### Returns 8 vars in the following order: 
continuous_vars,int_vars,cat_vars, 295 | ### string_bool_vars,discrete_string_vars,nlp_vars,date_or_id_vars,cols_delete 296 | if verbose == 1: 297 | print(" Number of Numeric Columns = ", len(continuous_vars)) 298 | print(" Number of Integer-Categorical Columns = ", len(int_vars)) 299 | print(" Number of String-Categorical Columns = ", len(cat_vars)) 300 | print(" Number of Factor-Categorical Columns = ", len(factor_vars)) 301 | print(" Number of String-Boolean Columns = ", len(string_bool_vars)) 302 | print(" Number of Numeric-Boolean Columns = ", len(num_bool_vars)) 303 | print(" Number of Discrete String Columns = ", len(discrete_string_vars)) 304 | print(" Number of NLP String Columns = ", len(nlp_vars)) 305 | print(" Number of Date Time Columns = ", len(date_vars)) 306 | print(" Number of ID Columns = ", len(id_vars)) 307 | print(" Number of Columns to Delete = ", len(cols_delete)) 308 | if verbose >= 2: 309 | print(' Printing up to %d columns (max) in each category:' % max_cols_to_print) 310 | print(" Numeric Columns : %s" % continuous_vars[:max_cols_to_print]) 311 | print(" Integer-Categorical Columns: %s" % int_vars[:max_cols_to_print]) 312 | print(" String-Categorical Columns: %s" % cat_vars[:max_cols_to_print]) 313 | print(" Factor-Categorical Columns: %s" % factor_vars[:max_cols_to_print]) 314 | print(" String-Boolean Columns: %s" % string_bool_vars[:max_cols_to_print]) 315 | print(" Numeric-Boolean Columns: %s" % num_bool_vars[:max_cols_to_print]) 316 | print(" Discrete String Columns: %s" % discrete_string_vars[:max_cols_to_print]) 317 | print(" NLP text Columns: %s" % nlp_vars[:max_cols_to_print]) 318 | print(" Date Time Columns: %s" % date_vars[:max_cols_to_print]) 319 | print(" ID Columns: %s" % id_vars[:max_cols_to_print]) 320 | print(" Columns that will not be considered in modeling: %s" % cols_delete[:max_cols_to_print]) 321 | ##### now collect all the column types and column names into a single dictionary to return! 
322 | 323 | len_sum_all_cols = reduce(add, [len(v) for v in sum_all_cols.values()]) 324 | if len_sum_all_cols == orig_cols_total: 325 | print(' %d Predictors classified...' % orig_cols_total) 326 | # print(' This does not include the Target column(s)') 327 | else: 328 | print('No. of columns classified %d does not match %d total cols. Continuing...' % ( 329 | len_sum_all_cols, orig_cols_total)) 330 | ls = sum_all_cols.values() 331 | flat_list = [item for sublist in ls for item in sublist] 332 | if len(left_subtract(list(train), flat_list)) == 0: 333 | print(' Missing columns = None') 334 | else: 335 | print(' Missing columns = %s' % left_subtract(list(train), flat_list)) 336 | return sum_all_cols 337 | #################################################################################### 338 | -------------------------------------------------------------------------------- /autoviz/test.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/autoviz/test.png -------------------------------------------------------------------------------- /autoviz/tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/autoviz/tests/__init__.py -------------------------------------------------------------------------------- /autoviz/tests/test_autoviz_class.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | 3 | from ..AutoViz_Class import AutoViz_Class 4 | 5 | class TestAutoVizClass(unittest.TestCase): 6 | def test_add_plots(self): 7 | self.assertIsNotNone(AutoViz_Class) 8 | -------------------------------------------------------------------------------- /autoviz/tests/test_deps.py: -------------------------------------------------------------------------------- 1 | import unittest 2 | 3 | 4 | class 
DepsTest(unittest.TestCase): 5 | def test(self): 6 | # have to pip install xgboost 7 | from autoviz import AutoViz_Class as AV 8 | AVC = AV.AutoViz_Class() 9 | self.assertIsNotNone(AVC) 10 | -------------------------------------------------------------------------------- /images/bokeh_charts.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/images/bokeh_charts.JPG -------------------------------------------------------------------------------- /images/data_clean.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/images/data_clean.png -------------------------------------------------------------------------------- /images/logo.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/images/logo.JPG -------------------------------------------------------------------------------- /images/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/images/logo.png -------------------------------------------------------------------------------- /images/server_charts.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/images/server_charts.JPG -------------------------------------------------------------------------------- /images/var_charts.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AutoViML/AutoViz/63f4b3c67ca80d0148b10b4079e3fd5609e32657/images/var_charts.JPG 
-------------------------------------------------------------------------------- /old_setup.py: -------------------------------------------------------------------------------- 1 | import setuptools 2 | 3 | with open("README.md", "r") as fh: 4 | long_description = fh.read() 5 | 6 | setuptools.setup( 7 | name="autoviz", 8 | version="0.1.806", 9 | author="Ram Seshadri", 10 | # author_email="author@example.com", 11 | description="Automatically Visualize any dataset, any size with a single line of code", 12 | long_description=long_description, 13 | long_description_content_type="text/markdown", 14 | license='Apache License 2.0', 15 | url="https://github.com/AutoViML/AutoViz.git", 16 | packages=setuptools.find_packages(exclude=("tests",)), 17 | install_requires=[ 18 | "xlrd", 19 | "wordcloud", 20 | "emoji", 21 | "numpy<1.25.0", 22 | "pandas", 23 | "pyamg", 24 | "matplotlib<=3.7.4", 25 | "seaborn>=0.12.2", 26 | "scikit-learn", 27 | "statsmodels", 28 | "nltk", 29 | "textblob", 30 | "holoviews~=1.14.9", 31 | "bokeh~=2.4.2", 32 | "hvplot~=0.7.3", 33 | "panel>=0.12.6", 34 | "xgboost>=0.82,<1.7", 35 | "fsspec>=0.8.3", 36 | "typing-extensions>=4.1.1", 37 | "pandas-dq>=1.29" 38 | ], 39 | classifiers=[ 40 | "Programming Language :: Python :: 3", 41 | "Operating System :: OS Independent", 42 | ], 43 | ) 44 | -------------------------------------------------------------------------------- /requirements-py310.txt: -------------------------------------------------------------------------------- 1 | xlrd 2 | wordcloud 3 | pyamg 4 | nltk 5 | emoji 6 | textblob 7 | matplotlib<=3.7.4 8 | seaborn>=0.12.2 9 | scikit-learn 10 | statsmodels 11 | xgboost>=0.82,<1.7 12 | fsspec>=0.8.3 13 | typing-extensions>=4.1.1 14 | pandas-dq>=1.29 15 | numpy>=1.25.0 16 | hvplot>=0.9.2 17 | panel>=1.4.0 18 | holoviews>=1.15.3 19 | pandas<2.0 20 | -------------------------------------------------------------------------------- /requirements-py311.txt: 
-------------------------------------------------------------------------------- 1 | xlrd 2 | wordcloud 3 | pyamg 4 | nltk 5 | emoji 6 | textblob 7 | matplotlib<=3.7.4 8 | seaborn>=0.12.2 9 | scikit-learn 10 | statsmodels 11 | xgboost>=0.82,<1.7 12 | fsspec>=0.8.3 13 | typing-extensions>=4.1.1 14 | pandas-dq>=1.29 15 | numpy>=1.25.0 16 | hvplot>=0.9.2 17 | panel>=1.4.0 18 | holoviews>=1.15.3 19 | pandas<2.0 20 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | xlrd 2 | wordcloud 3 | pyamg 4 | nltk 5 | emoji 6 | textblob 7 | matplotlib<=3.7.4 8 | seaborn>=0.12.2 9 | scikit-learn 10 | statsmodels 11 | xgboost>=0.82,<1.7 12 | fsspec>=0.8.3 13 | typing-extensions>=4.1.1 14 | pandas-dq>=1.29 15 | numpy<1.24 16 | hvplot~=0.7.3 17 | panel~=0.14.4 18 | holoviews~=1.14.9 19 | param==1.13.0 20 | pandas<2.0 21 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import setuptools 2 | import sys 3 | 4 | with open("README.md", "r") as fh: 5 | long_description = fh.read() 6 | 7 | # Determine the Python version 8 | python_version = sys.version_info 9 | 10 | list_req = [ 11 | "xlrd", 12 | "wordcloud", 13 | "emoji", 14 | # Assuming numpy version <1.25.0 is compatible with older Python versions and older HoloViews 15 | "pyamg", 16 | "scikit-learn", 17 | "statsmodels", 18 | "nltk", 19 | "textblob", 20 | "xgboost>=0.82,<1.7", 21 | "fsspec>=0.8.3", 22 | "typing-extensions>=4.1.1", 23 | "pandas-dq>=1.29" 24 | ] 25 | # Define default dependencies (compatible with older Python versions) 26 | install_requires = list_req 27 | 28 | # Extend with newer dependencies (for recent Python versions and newer HoloViews) 29 | install_requires = list_req + [ 30 | # Keep most dependencies as is, adjust only where necessary 31 | "numpy>=1.24.0", # Update as 
needed for compatibility with newer HoloViews 32 | # Update other dependencies as needed 33 | "hvplot>=0.9.2", ### newer hvplot 34 | "holoviews>=1.16.0", # Update based on the bug fix relevant to Python 3.10 35 | # Ensure other dependencies are compatible 36 | "panel>=1.4.0", ## this is a new version of panel 37 | "pandas>=2.0", ## newer pandas (2.0 and above) 38 | "matplotlib>3.7.4", ## newer version of matplotlib 39 | "seaborn>0.12.2", ## newer version of seaborn ## 40 | ] 41 | 42 | setuptools.setup( 43 | name="autoviz", 44 | version="0.1.905", 45 | author="Ram Seshadri", 46 | description="Automatically Visualize any dataset, any size with a single line of code", 47 | long_description=long_description, 48 | long_description_content_type="text/markdown", 49 | license='Apache License 2.0', 50 | url="https://github.com/AutoViML/AutoViz.git", 51 | packages=setuptools.find_packages(exclude=("tests",)), 52 | install_requires=install_requires, 53 | classifiers=[ 54 | "Programming Language :: Python :: 3", 55 | "Operating System :: OS Independent", 56 | ], 57 | ) -------------------------------------------------------------------------------- /updates.md: -------------------------------------------------------------------------------- 1 | # Latest updates and news from AutoViz! 2 | 3 | ### April 2024: AutoViz version 0.1.900+ series has some fixes for autoviz install issues 4 | You can always use pip install from git, which uses the latest setup.py and works well! 5 | `!pip install git+https://github.com/AutoViML/AutoViz` 6 | 7 | But if you are using `pip install autoviz`, then you may get two kinds of errors. In order to know what to do, perform the following steps. 8 | 9 | **First print these 3 versions of pandas, numpy and holoviews** 
10 | ```
11 | import pandas as pd
12 | import numpy as np
13 | import holoviews as hv
14 | print(pd.__version__, np.__version__, hv.__version__)
15 | ```
16 | 17 | If it prints 18 | `numpy<1.24, pandas<2.0, holoviews<=1.14.9` 19 | 20 | These are all older versions of pandas and numpy along with older versions of holoviews<=1.14.9. These three older versions work together since holoviews uses an older numpy syntax (`np.bool`) that only numpy<1.24 supports. However, if you are running this in Kaggle kernels, you must restart your Kaggle kernel after you install autoviz, since it changes the numpy and pandas versions to older versions and requires a restart to take effect. But if you get this error: `"ValueError: ClassSelector parameter None value must be an instance of (function, tuple), not ."`, then you must upgrade holoviews to 1.16.0. 21 | 22 | But if the above statements print newer versions of pandas and numpy, like this: 
23 | ```
24 | pandas>=2.0.0 numpy>=1.24.0 holoviews>=1.16.0
25 | ```
26 | then you need newer versions of holoviews. Although regular AutoViz works well with newer pandas and numpy, the older holoviews version breaks it. AutoViz_Holo needs newer versions such as holoviews>=1.16.0 in order to work with newer numpy and pandas. For example, if you don't upgrade holoviews to >=1.16.0, you will get this error: "ValueError: ClassSelector parameter None value must be an instance of (function, tuple), not ." 27 | 28 | Hope this is clear. Please let us know via the issues tab in GitHub. 29 | 30 | ### December 2023: AutoViz now has modular dependency loading and improved support for Python versions 3.10+ 31 | 
- **Modular Dependency Loading:** AutoViz now uses a more flexible approach for importing visualization libraries, starting with version `0.1.801`. This means you only need to install certain dependencies (like hvplot and holoviews) if you plan to use specific backends (e.g., bokeh). This change significantly reduces installation issues for users on newer Python versions such as 3.10 and higher.
32 | 33 | 
- **Improved Backend Support:** Depending on your Python environment, AutoViz dynamically adjusts to use compatible visualization libraries, ensuring a smoother user experience. Requirements: 34 | "holoviews>=1.14.9", 35 | "bokeh>=2.4.2", 36 | "hvplot>=0.7.3", 37 | "panel>=0.12.6". 38 | 
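The modular dependency loading described above can be sketched as a guarded import: the interactive holoviews/hvplot stack is pulled in only when a bokeh-style backend is actually requested. This is an illustrative sketch, not AutoViz's actual code — the helper name `load_interactive_backend` is ours:

```python
def load_interactive_backend():
    """Import the holoviews/hvplot stack only when it is actually needed.

    Failing here with a clear message avoids hard import errors at package
    import time on environments (e.g. Python 3.10+) without these libraries.
    """
    try:
        import holoviews as hv
        import hvplot  # noqa: F401  # registers the .hvplot plotting accessor
        return hv
    except ImportError as exc:
        raise ImportError(
            "Interactive charts need the holoviews stack: "
            "pip install 'holoviews>=1.14.9' 'hvplot>=0.7.3' "
            "'bokeh>=2.4.2' 'panel>=0.12.6'"
        ) from exc
```

With this pattern, the base install stays lightweight and the heavier bokeh/holoviews requirements are only enforced when an interactive `chart_format` is chosen.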
39 | 40 | ### June 2023: AutoViz now has Data Quality checks and a transformer to fix your data quality 41 | From version 0.1.70, AutoViz can now automatically analyze your dataset and fix data quality issues in it. All you have to do is `from autoviz import FixDQ` and use it like a `fit_transform` transformer. It's that easy to perform data cleaning now with AutoViz! 42 | 43 | ![data_clean](images/data_clean.png) 44 | 45 | ### Apr-2023 Update: AutoViz now creates scatter plots for categorical variables when data contains only cat variables 46 | From version 0.1.600 onwards, AutoViz now automatically draws `catscatter` plots for pairs of categorical variables in a data frame. A `catscatter` plot is a type of scatter plot that shows the frequency of each combination of categories in two variables. It can be useful for exploring the relationship between categorical variables and identifying patterns or outliers. It creates these plots only if the data contains no numeric variables. Otherwise, it doesn't create them since they would be unnecessary. 47 | 48 | ``` 49 | AutoViz is grateful to the catscatter implementation of Myr Barnés, 2020. 50 | You can see the original here: https://github.com/myrthings/catscatter/blob/master/catscatter.py 51 | # More info about this function here: 52 | # - https://towardsdatascience.com/visualize-categorical-relationships-with-catscatter-e60cdb164395 53 | # - https://github.com/myrthings/catscatter/blob/master/README.md 54 | ``` 55 | 56 | ### Sep-2022 Update: AutoViz now provides data cleansing suggestions! #autoviz #datacleaning 57 | From version 0.1.50 onwards, AutoViz now automatically analyzes your dataset and provides suggestions for how to clean your data set. It detects missing values, identifies rare categories, finds infinite values, detects mixed data types, and so much more. This will help you tremendously speed up your data cleaning activities. 
If you have suggestions to add more data cleaning steps, please file an `Issue` in our GitHub and we will gladly consider it. Here is an example of how data cleaning suggestions look: 
58 | 59 | 60 | In order to get this latest function, you must upgrade autoviz to the latest version by: 61 | ``` 62 | pip install autoviz --upgrade 63 | ``` 64 | 65 | In the same version, you can also get data suggestions by using `AV.AutoViz(......, verbose=1)` or by simply importing it: 
66 | 67 | ``` 68 | from autoviz import data_cleaning_suggestions 69 | data_cleaning_suggestions(df) 70 | ``` 71 | 72 | ### Dec-23-2021 Update: AutoViz now does Wordclouds! #autoviz #wordcloud 73 | AutoViz can now create Wordclouds automatically for your NLP variables in data. It detects NLP variables automatically and creates wordclouds for them. See Colab notebook for example: [AutoViz Demo with HTML setting](https://colab.research.google.com/drive/1r5QqESRZDY98FFfDOgVtMAVA_oaGtqqx?usp=sharing) 74 | 75 | 76 | 77 | ### Dec 21, 2021: AutoViz now runs on Docker containers as part of MLOps pipelines. Check out Orchest.io 78 | We are excited to announce that AutoViz and Deep_AutoViML are now available as containerized applications on Docker. This means that you can use a fantastic tool like [orchest.io](https://orchest.io) to build MLOps pipelines visually. Here are two sample pipelines we have created: 79 | 80 | AutoViz pipeline: https://lnkd.in/g5uC-z66 81 | Deep_AutoViML pipeline: https://lnkd.in/gdnWTqCG 82 | 83 | You can find more examples and a wonderful video on [orchest's web site](https://github.com/orchest/orchest-examples) 84 | ![banner](https://github.com/rsesha/autoviz_pipeline/blob/main/autoviz_orchest.png) 85 | 86 | ### Dec-17-2021 AutoViz now uses HoloViews to display dashboards with Bokeh and save them as Dynamic HTML for web serving #HTML #Bokeh #Holoviews 87 | Now you can use AutoViz to create Interactive Bokeh charts and dashboards (see below) either in Jupyter Notebooks or in the browser. Use chart_format as follows: 88 | - `chart_format='bokeh'`: interactive Bokeh dashboards are plotted in Jupyter Notebooks. 89 | - `chart_format='server'`: dashboards will pop up for each kind of chart on your web browser. 90 | - `chart_format='html'`: interactive Bokeh charts will be silently saved as Dynamic HTML files under the `AutoViz_Plots` directory. 91 | --------------------------------------------------------------------------------
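The three `chart_format` options above can be driven from a tiny wrapper. This is a hedged sketch: it assumes autoviz is installed, and `Examples/Boston.csv` refers to the sample file in this repo; the `visualize` helper is ours, not part of AutoViz:

```python
try:
    from autoviz import AutoViz_Class
except ImportError:  # autoviz not installed in this environment
    AutoViz_Class = None


def visualize(csv_path, chart_format="html"):
    """Run AutoViz on a CSV with one of the chart_format options above.

    chart_format: 'bokeh' (inline Jupyter dashboards), 'server' (browser
    pop-ups), or 'html' (files saved under the AutoViz_Plots directory).
    """
    if AutoViz_Class is None:
        raise ImportError("pip install autoviz first")
    AV = AutoViz_Class()
    return AV.AutoViz(csv_path, chart_format=chart_format)

# Example (not executed here): visualize("Examples/Boston.csv", "html")
```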