├── .gitignore
├── LICENSE.txt
├── PyDataLens.egg-info
    ├── PKG-INFO
    ├── SOURCES.txt
    ├── dependency_links.txt
    ├── requires.txt
    └── top_level.txt
├── README.md
├── build
    └── lib
    │   ├── pydatalens
    │       ├── __init__.py
    │       ├── cleaning.py
    │       ├── eda.py
    │       ├── utils.py
    │       └── visualizations.py
    │   └── tests
    │       ├── __init__.py
    │       ├── test_cleaning.py
    │       ├── test_eda.py
    │       └── test_visualizations.py
├── dist
    ├── pydatalens-0.0.8-py3-none-any.whl
    └── pydatalens-0.0.8.tar.gz
├── docs
    ├── INSTALL.md
    ├── README.md
    └── USAGE.md
├── examples
    └── example_usage.py
├── pydatalens
    ├── Templates
    │   └── report_template.html
    ├── __init__.py
    ├── cleaning.py
    ├── eda.py
    ├── utils.py
    └── visualizations.py
├── requirements.txt
├── setup.py
└── tests
    ├── __init__.py
    ├── test_cleaning.py
    ├── test_eda.py
    └── test_visualizations.py


/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | __pycache__/
3 | *.log
4 | *.csv
5 | *.xlsx
6 | .env
7 | 


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2025 Gopalakrishnan Arjunan
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/PyDataLens.egg-info/PKG-INFO:
--------------------------------------------------------------------------------
  1 | Metadata-Version: 2.2
  2 | Name: pydatalens
  3 | Version: 0.0.8
  4 | Summary: A Python package for automatic EDA, data cleaning, and visualization.
  5 | Home-page: https://github.com/gopalakrishnanarjun/pydatalens
  6 | Author: Gopalakrishnan Arjunan
  7 | Author-email: gopalakrishnana02@gmail.com
  8 | Classifier: Programming Language :: Python :: 3
  9 | Classifier: License :: OSI Approved :: MIT License
 10 | Classifier: Operating System :: OS Independent
 11 | Requires-Python: >=3.6
 12 | Description-Content-Type: text/markdown
 13 | License-File: LICENSE.txt
 14 | Requires-Dist: pandas
 15 | Requires-Dist: numpy
 16 | Requires-Dist: matplotlib
 17 | Requires-Dist: seaborn
 18 | Dynamic: author
 19 | Dynamic: author-email
 20 | Dynamic: classifier
 21 | Dynamic: description
 22 | Dynamic: description-content-type
 23 | Dynamic: home-page
 24 | Dynamic: requires-dist
 25 | Dynamic: requires-python
 26 | Dynamic: summary
 27 | 
 28 | 
 29 | # pydatalens
 30 | 
 31 | pydatalens is a Python package designed to streamline the process of **Exploratory Data Analysis (EDA)**, **data cleaning**, and **visualization**. 
 32 | It enables data scientists and analysts to quickly prepare, explore, and gain insights from datasets with minimal effort.
 33 | 
 34 | ---
 35 | 
 36 | ## Features
 37 | 
 38 | ### 1. **Smart Summarization**
 39 | - Automatically generates a summary of the dataset, including:
 40 |   - Data types
 41 |   - Missing values
 42 |   - Descriptive statistics
 43 |   - Unique value counts
 44 | 
 45 | ### 2. **Data Cleaning**
 46 | - Detects and handles missing values using various strategies (mean, median, mode).
 47 | - Identifies and removes duplicate rows.
 48 | - Supports basic outlier detection (planned for future updates).
 49 | 
 50 | ### 3. **Correlation Analysis**
 51 | - Generates a correlation matrix to identify relationships between features.
 52 | - Provides heatmaps for better visualization.
 53 | 
 54 | ### 4. **Automatic Visualizations**
 55 | - Supports generating:
 56 |   - Histograms
 57 |   - Box plots
 58 |   - Correlation heatmaps
 59 |   - Scatter plots (planned for future updates).
 60 | 
 61 | ### 5. **Report Generation**
 62 | - Exports EDA results and visualizations into a detailed **HTML report** for easy sharing.
 63 | 
 64 | ---
 65 | 
 66 | ## Installation
 67 | 
 68 | ### Using pip (from source)
 69 | 1. Clone the repository:
 70 |    ```bash
 71 |    git clone https://github.com/gopalakrishnanarjun/pydatalens.git
 72 |    cd pydatalens
 73 |    ```
 74 | 2. Install the package:
 75 |    ```bash
 76 |    pip install -e .
 77 |    ```
 78 | 
 79 | ### Dependencies
 80 | - Python >= 3.6
 81 | - pandas >= 1.0
 82 | - numpy >= 1.18
 83 | - matplotlib >= 3.1
 84 | - seaborn >= 0.11
 85 | 
 86 | Install dependencies manually:
 87 | ```bash
 88 | pip install pandas numpy matplotlib seaborn
 89 | ```
 90 | 
 91 | ---
 92 | 
 93 | ## Quick Start
 94 | 
 95 | ### 1. Import the package
 96 | ```python
 97 | from pydatalens import eda, cleaning, visualizations
 98 | ```
 99 | 
100 | ### 2. Load a dataset
101 | ```python
102 | import pandas as pd
103 | df = pd.read_csv("your_dataset.csv")
104 | ```
105 | 
106 | ### 3. Summarize the dataset
107 | ```python
108 | print(eda.summarize(df))
109 | ```
110 | 
111 | ### 4. Handle missing values
112 | ```python
113 | df_cleaned = cleaning.handle_missing(df, strategy="mean")
114 | ```
115 | 
116 | ### 5. Visualize the data
117 | ```python
118 | visualizations.plot_histogram(df_cleaned, column="age")
119 | visualizations.correlation_heatmap(df_cleaned)
120 | ```
121 | 
122 | ---
123 | 
124 | ## Examples
125 | 
126 | ### Summarizing the Data
127 | ```python
128 | from pydatalens import eda
129 | summary = eda.summarize(df)
130 | print(summary)
131 | ```
132 | 
133 | ### Cleaning the Data
134 | ```python
135 | from pydatalens import cleaning
136 | df = cleaning.handle_missing(df, strategy="median")
137 | df = cleaning.drop_duplicates(df)
138 | ```
139 | 
140 | ### Visualizing the Data
141 | ```python
142 | from pydatalens import visualizations
143 | visualizations.plot_histogram(df, "column_name")
144 | visualizations.correlation_heatmap(df)
145 | ```
146 | 
147 | ---
148 | 
149 | ## Future Enhancements
150 | - Advanced anomaly detection.
151 | - Support for time series analysis.
152 | - Enhanced visualization options (e.g., scatter plots, pair plots).
153 | - Integration with machine learning pipelines.
154 | 
155 | ---
156 | 
157 | ## Contributing
158 | Contributions are welcome! If you'd like to contribute, please fork the repository and submit a pull request.
159 | 
160 | ---
161 | 
162 | ## License
163 | pydatalens is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.
164 | 
165 | ---
166 | 


--------------------------------------------------------------------------------
/PyDataLens.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------
 1 | LICENSE.txt
 2 | README.md
 3 | setup.py
 4 | PyDataLens.egg-info/PKG-INFO
 5 | PyDataLens.egg-info/SOURCES.txt
 6 | PyDataLens.egg-info/dependency_links.txt
 7 | PyDataLens.egg-info/requires.txt
 8 | PyDataLens.egg-info/top_level.txt
 9 | pydatalens/__init__.py
10 | pydatalens/cleaning.py
11 | pydatalens/eda.py
12 | pydatalens/utils.py
13 | pydatalens/visualizations.py
14 | pydatalens.egg-info/PKG-INFO
15 | pydatalens.egg-info/SOURCES.txt
16 | pydatalens.egg-info/dependency_links.txt
17 | pydatalens.egg-info/requires.txt
18 | pydatalens.egg-info/top_level.txt
19 | tests/__init__.py
20 | tests/test_cleaning.py
21 | tests/test_eda.py
22 | tests/test_visualizations.py


--------------------------------------------------------------------------------
/PyDataLens.egg-info/dependency_links.txt:
--------------------------------------------------------------------------------
1 | 
2 | 


--------------------------------------------------------------------------------
/PyDataLens.egg-info/requires.txt:
--------------------------------------------------------------------------------
1 | pandas
2 | numpy
3 | matplotlib
4 | seaborn
5 | 


--------------------------------------------------------------------------------
/PyDataLens.egg-info/top_level.txt:
--------------------------------------------------------------------------------
1 | pydatalens
2 | tests
3 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | 
  2 | # pydatalens
  3 | 
  4 | pydatalens is a Python package designed to streamline the process of **Exploratory Data Analysis (EDA)**, **data cleaning**, and **visualization**. 
  5 | It enables data scientists and analysts to quickly prepare, explore, and gain insights from datasets with minimal effort.
  6 | 
  7 | ---
  8 | 
  9 | ## Features
 10 | 
 11 | ### 1. **Smart Summarization**
 12 | - Automatically generates a summary of the dataset, including:
 13 |   - Data types
 14 |   - Missing values
 15 |   - Descriptive statistics (planned for future updates)
 16 |   - Unique value counts
 17 | 
 18 | ### 2. **Data Cleaning**
 19 | - Detects and handles missing values using various strategies (mean, median, mode).
 20 | - Identifies and removes duplicate rows.
 21 | - Basic outlier detection is planned for a future update.
 22 | 
 23 | ### 3. **Correlation Analysis**
 24 | - Generates a correlation matrix to identify relationships between features.
 25 | - Provides heatmaps for better visualization.
 26 | 
 27 | ### 4. **Automatic Visualizations**
 28 | - Supports generating:
 29 |   - Histograms
 30 |   - Box plots (planned for future updates)
 31 |   - Correlation heatmaps
 32 |   - Scatter plots (planned for future updates).
 33 | 
 34 | ### 5. **Report Generation**
 35 | - HTML report export is planned for a future release; a starter template is included at `pydatalens/Templates/report_template.html`.
 36 | 
 37 | ---
 38 | 
 39 | ## Installation
 40 | 
 41 | ### Using pip (from source)
 42 | 1. Clone the repository:
 43 |    ```bash
 44 |    git clone https://github.com/gopalakrishnanarjun/pydatalens.git
 45 |    cd pydatalens
 46 |    ```
 47 | 2. Install the package:
 48 |    ```bash
 49 |    pip install -e .
 50 |    ```
 51 | 
 52 | ### Dependencies
 53 | - Python >= 3.6
 54 | - pandas >= 1.0
 55 | - numpy >= 1.18
 56 | - matplotlib >= 3.1
 57 | - seaborn >= 0.11
 58 | 
 59 | Install dependencies manually:
 60 | ```bash
 61 | pip install pandas numpy matplotlib seaborn
 62 | ```
 63 | 
 64 | ---
 65 | 
 66 | ## Quick Start
 67 | 
 68 | ### 1. Import the package
 69 | ```python
 70 | from pydatalens import eda, cleaning, visualizations
 71 | ```
 72 | 
 73 | ### 2. Load a dataset
 74 | ```python
 75 | import pandas as pd
 76 | df = pd.read_csv("your_dataset.csv")
 77 | ```
 78 | 
 79 | ### 3. Summarize the dataset
 80 | ```python
 81 | print(eda.summarize(df))
 82 | ```
 83 | 
 84 | ### 4. Handle missing values
 85 | ```python
 86 | df_cleaned = cleaning.handle_missing(df, strategy="mean")
 87 | ```
 88 | 
 89 | ### 5. Visualize the data
 90 | ```python
 91 | visualizations.plot_histogram(df_cleaned, column="age")
 92 | visualizations.correlation_heatmap(df_cleaned)
 93 | ```
 94 | 
 95 | ---
 96 | 
 97 | ## Examples
 98 | 
 99 | ### Summarizing the Data
100 | ```python
101 | from pydatalens import eda
102 | summary = eda.summarize(df)
103 | print(summary)
104 | ```
105 | 
106 | ### Cleaning the Data
107 | ```python
108 | from pydatalens import cleaning
109 | df = cleaning.handle_missing(df, strategy="median")
110 | df = cleaning.drop_duplicates(df)
111 | ```
112 | 
113 | ### Visualizing the Data
114 | ```python
115 | from pydatalens import visualizations
116 | visualizations.plot_histogram(df, "column_name")
117 | visualizations.correlation_heatmap(df)
118 | ```
119 | 
120 | ---
121 | 
122 | ## Future Enhancements
123 | - Advanced anomaly detection.
124 | - Support for time series analysis.
125 | - Enhanced visualization options (e.g., scatter plots, pair plots).
126 | - Integration with machine learning pipelines.
127 | 
128 | ---
129 | 
130 | ## Contributing
131 | Contributions are welcome! If you'd like to contribute, please fork the repository and submit a pull request.
132 | 
133 | ---
134 | 
135 | ## License
136 | pydatalens is licensed under the MIT License. See the [LICENSE](LICENSE.txt) file for more details.
137 | 
138 | ---
139 | 


--------------------------------------------------------------------------------
/build/lib/pydatalens/__init__.py:
--------------------------------------------------------------------------------
1 | """
2 | PyDataLens: A Python package for automatic EDA, data cleaning, and visualization.
3 | """
4 | 
5 | from .eda import summarize, correlation
6 | from .cleaning import handle_missing, drop_duplicates
7 | from .visualizations import plot_histogram, correlation_heatmap
8 | 


--------------------------------------------------------------------------------
/build/lib/pydatalens/cleaning.py:
--------------------------------------------------------------------------------
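 1 | def handle_missing(df, strategy="mean"):
 2 |     """
 3 |     Fills missing values in the numeric columns of a DataFrame.
 4 |     Args:
 5 |         df: input DataFrame (left unmodified; a filled copy is returned).
 6 |         strategy: "mean", "median", or "mode".
 7 |     """
 8 |     if strategy not in ("mean", "median", "mode"):
 9 |         raise ValueError(f"Unknown strategy: {strategy!r}")
10 |     print(f"Handling missing values using {strategy} strategy...")
11 |     df = df.copy()  # avoid mutating the caller's DataFrame
12 |     for column in df.select_dtypes(include="number").columns:
13 |         if strategy == "mean":
14 |             fill_value = df[column].mean()
15 |         elif strategy == "median":
16 |             fill_value = df[column].median()
17 |         else:
18 |             modes = df[column].mode()
19 |             if modes.empty:  # column is entirely NaN, nothing to fill with
20 |                 continue
21 |             fill_value = modes[0]
22 |         df[column] = df[column].fillna(fill_value)
23 |     return df
24 | 
25 | def drop_duplicates(df):
26 |     """
27 |     Drops duplicate rows and returns the result as a new DataFrame.
28 |     """
29 |     print("Dropping duplicate rows...")
30 |     return df.drop_duplicates()
31 | 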


--------------------------------------------------------------------------------
/build/lib/pydatalens/eda.py:
--------------------------------------------------------------------------------
 1 | import pandas as pd
 2 | 
 3 | def summarize(df):
 4 |     """
 5 |     Summarizes the given DataFrame.
 6 |     """
 7 |     print("Generating data summary...")
 8 |     summary = {
 9 |         "Columns": df.columns.tolist(),
10 |         "Data Types": df.dtypes.tolist(),
11 |         "Missing Values": df.isnull().sum().tolist(),
12 |         "Unique Values": df.nunique().tolist(),
13 |     }
14 |     return pd.DataFrame(summary)
15 | 
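16 | def correlation(df):
17 |     """
18 |     Generates a correlation matrix for the numeric columns.
19 |     """
20 |     print("Calculating correlation matrix...")
21 |     return df.select_dtypes(include="number").corr()  # corr() raises on text columns in pandas >= 2.0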
22 | 


--------------------------------------------------------------------------------
/build/lib/pydatalens/utils.py:
--------------------------------------------------------------------------------
1 | def save_plot(filename):
2 |     """
3 |     Utility function to save a plot to a file.
4 |     """
5 |     import matplotlib.pyplot as plt
6 |     plt.savefig(filename)
7 |     print(f"Plot saved as {filename}.")
8 | 


--------------------------------------------------------------------------------
/build/lib/pydatalens/visualizations.py:
--------------------------------------------------------------------------------
 1 | import seaborn as sns
 2 | import matplotlib.pyplot as plt
 3 | 
 4 | def plot_histogram(df, column):
 5 |     """
 6 |     Plots a histogram for a column.
 7 |     """
 8 |     print(f"Generating histogram for {column}...")
 9 |     sns.histplot(df[column], kde=True)
10 |     plt.title(f"Histogram of {column}")
11 |     plt.show()
12 | 
13 | def correlation_heatmap(df):
14 |     """
15 |     Plots a heatmap of the correlation matrix.
16 |     """
17 |     print("Generating correlation heatmap...")
18 |     sns.heatmap(df.select_dtypes(include="number").corr(), annot=True, cmap="coolwarm")
19 |     plt.title("Correlation Heatmap")
20 |     plt.show()
21 | 


--------------------------------------------------------------------------------
/build/lib/tests/__init__.py:
--------------------------------------------------------------------------------
1 | """
2 | Test suite for PyDataLens.
3 | """
4 | 


--------------------------------------------------------------------------------
/build/lib/tests/test_cleaning.py:
--------------------------------------------------------------------------------
 1 | import pandas as pd
 2 | from pydatalens import cleaning
 3 | 
 4 | def test_handle_missing():
 5 |     data = {"A": [1, None, 3]}
 6 |     df = pd.DataFrame(data)
 7 |     cleaned_df = cleaning.handle_missing(df, strategy="mean")
 8 |     assert cleaned_df.isnull().sum().sum() == 0
 9 |     print("Handle missing test passed.")
10 | 
11 | def test_drop_duplicates():
12 |     data = {"A": [1, 1, 2]}
13 |     df = pd.DataFrame(data)
14 |     cleaned_df = cleaning.drop_duplicates(df)
15 |     assert cleaned_df.shape[0] == 2
16 |     print("Drop duplicates test passed.")
17 | 


--------------------------------------------------------------------------------
/build/lib/tests/test_eda.py:
--------------------------------------------------------------------------------
 1 | import pandas as pd
 2 | from pydatalens import eda
 3 | 
 4 | def test_summarize():
 5 |     data = {"A": [1, 2, None], "B": [4, None, 6]}
 6 |     df = pd.DataFrame(data)
 7 |     summary = eda.summarize(df)
 8 |     assert "Columns" in summary.columns
 9 |     print("Summarize test passed.")
10 | 
11 | def test_correlation():
12 |     data = {"A": [1, 2, 3], "B": [4, 5, 6]}
13 |     df = pd.DataFrame(data)
14 |     corr = eda.correlation(df)
15 |     assert corr.shape[0] == corr.shape[1]
16 |     print("Correlation test passed.")
17 | 


--------------------------------------------------------------------------------
/build/lib/tests/test_visualizations.py:
--------------------------------------------------------------------------------
 1 | import pandas as pd
 2 | from pydatalens import visualizations
 3 | 
 4 | def test_plot_histogram():
 5 |     data = {"A": [1, 2, 3, 4, 5]}
 6 |     df = pd.DataFrame(data)
 7 |     visualizations.plot_histogram(df, column="A")
 8 |     print("Histogram test passed.")
 9 | 
10 | def test_correlation_heatmap():
11 |     data = {"A": [1, 2, 3], "B": [4, 5, 6]}
12 |     df = pd.DataFrame(data)
13 |     visualizations.correlation_heatmap(df)
14 |     print("Correlation heatmap test passed.")
15 | 


--------------------------------------------------------------------------------
/dist/pydatalens-0.0.8-py3-none-any.whl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gopalakrishnanarjun/pydatalens/2aa676bd5ab5c0708de1b996c7d9b24e785f7330/dist/pydatalens-0.0.8-py3-none-any.whl


--------------------------------------------------------------------------------
/dist/pydatalens-0.0.8.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gopalakrishnanarjun/pydatalens/2aa676bd5ab5c0708de1b996c7d9b24e785f7330/dist/pydatalens-0.0.8.tar.gz


--------------------------------------------------------------------------------
/docs/INSTALL.md:
--------------------------------------------------------------------------------
  1 | 
  2 | # **pydatalens Installation Guide**
  3 | 
  4 | This guide explains how to install and set up the **pydatalens** package for automatic exploratory data analysis (EDA), data cleaning, and visualization.
  5 | 
  6 | ---
  7 | 
  8 | ## **Prerequisites**
  9 | 
 10 | ### 1. **Python Version**
 11 | - Ensure you have **Python 3.6 or later** installed on your system.
 12 | - You can check your Python version by running:
 13 |   ```bash
 14 |   python --version
 15 |   ```
 16 | 
 17 | ### 2. **Pip**
 18 | - Make sure you have `pip` (Python's package manager) installed and updated:
 19 |   ```bash
 20 |   python -m pip install --upgrade pip
 21 |   ```
 22 | 
 23 | ---
 24 | 
 25 | ## **Installation Steps**
 26 | 
 27 | ### **Step 1: Clone the Repository**
 28 | Download the pydatalens repository from GitHub:
 29 | ```bash
 30 | git clone https://github.com/gopalakrishnanarjun/pydatalens.git
 31 | cd pydatalens
 32 | ```
 33 | 
 34 | ### **Step 2: Create a Virtual Environment (Optional but Recommended)**
 35 | Set up a virtual environment to isolate your Python dependencies:
 36 | ```bash
 37 | python -m venv pydatalens_env
 38 | source pydatalens_env/bin/activate  # On Windows: pydatalens_env\Scripts\activate
 39 | ```
 40 | 
 41 | ### **Step 3: Install the Package**
 42 | Run the following command to install the pydatalens package and its dependencies:
 43 | ```bash
 44 | pip install -e .
 45 | ```
 46 | 
 47 | ---
 48 | 
 49 | ## **Dependencies**
 50 | 
 51 | pydatalens requires the following Python packages:
 52 | 
 53 | - **pandas**: For data manipulation
 54 | - **numpy**: For numerical operations
 55 | - **matplotlib**: For data visualizations
 56 | - **seaborn**: For advanced visualizations
 57 | 
 58 | All required dependencies will be installed automatically during the installation process. If you encounter issues, you can manually install them:
 59 | ```bash
 60 | pip install pandas numpy matplotlib seaborn
 61 | ```
 62 | 
 63 | ---
 64 | 
 65 | ## **Testing the Installation**
 66 | 
 67 | After installation, you can verify the setup by running the following commands:
 68 | 
 69 | ### **1. Check the Installation**
 70 | Run this Python command to check if the package is installed:
 71 | ```bash
 72 | python -c "import pydatalens; print('pydatalens is installed successfully!')"
 73 | ```
 74 | 
 75 | ### **2. Run Tests**
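 76 | From the repository root, install `pytest` and run each test module. The test files only define test functions, so executing them directly with `python` runs nothing:
 77 | ```bash
 78 | pip install pytest
 79 | python -m pytest tests/test_eda.py
 80 | python -m pytest tests/test_cleaning.py
 81 | python -m pytest tests/test_visualizations.py
 82 | ```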
 83 | 
 84 | ---
 85 | 
 86 | ## **Examples**
 87 | 
 88 | Once installed, you can try the package using the example script provided:
 89 | 
 90 | ### **Run the Example**
 91 | Navigate to the `examples` folder and execute the script:
 92 | ```bash
 93 | cd examples
 94 | python example_usage.py
 95 | ```
 96 | 
 97 | ---
 98 | 
 99 | ## **Common Installation Issues**
100 | 
101 | ### **1. Missing Dependencies**
102 | If some dependencies fail to install, try installing them manually:
103 | ```bash
104 | pip install -r requirements.txt
105 | ```
106 | 
107 | ### **2. Permission Issues**
108 | If you encounter permission-related errors, try:
109 | ```bash
110 | pip install --user -e .
111 | ```
112 | 
113 | ---
114 | 
115 | ## **Uninstalling pydatalens**
116 | 
117 | To remove the package from your system:
118 | ```bash
119 | pip uninstall pydatalens
120 | ```
121 | 
122 | ---
123 | 
124 | ## **Support**
125 | 
126 | If you encounter any issues or need help, please open an issue in the GitHub repository:
127 | [pydatalens GitHub Issues](https://github.com/gopalakrishnanarjun/pydatalens/issues)
128 | 
129 | Enjoy using **pydatalens** for your data analysis needs! 🚀
130 | 
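If the import check in "Testing the Installation" fails, it can help to see which of the four dependencies is missing. A minimal sketch using only the standard library; `check_dependencies` is a hypothetical helper, not part of pydatalens:

```python
from importlib.util import find_spec

# The four runtime dependencies listed in this guide.
REQUIRED = ("pandas", "numpy", "matplotlib", "seaborn")

def check_dependencies(names=REQUIRED):
    """Map each top-level module name to True if it can be located, else False."""
    # find_spec() looks a module up without importing it.
    return {name: find_spec(name) is not None for name in names}

status = check_dependencies()
for name, available in status.items():
    print(f"{name}: {'ok' if available else 'MISSING'}")
```

Any name reported as `MISSING` can then be installed with `pip install <name>`.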


--------------------------------------------------------------------------------
/docs/README.md:
--------------------------------------------------------------------------------
1 | # PyDataLens


--------------------------------------------------------------------------------
/docs/USAGE.md:
--------------------------------------------------------------------------------
  1 | 
  2 | # pydatalens Usage Guide
  3 | 
  4 | pydatalens is a Python package designed to simplify exploratory data analysis (EDA), data cleaning, and visualization. This guide provides examples of how to use the package effectively.
  5 | 
  6 | ## Installation
  7 | 
  8 | Ensure you have the required dependencies installed. You can install pydatalens using the following steps:
  9 | 
 10 | 1. Clone the repository:
 11 | 
 12 |    ```bash
 13 |    git clone https://github.com/gopalakrishnanarjun/pydatalens.git
 14 |    cd pydatalens
 15 |    ```
 16 | 
 17 | 2. Install the package:
 18 | 
 19 |    ```bash
 20 |    pip install -e .
 21 |    ```
 22 | 
 23 | ## Getting Started
 24 | 
 25 | ### Importing the Package
 26 | 
 27 | ```python
 28 | from pydatalens import eda, cleaning, visualizations
 29 | ```
 30 | 
 31 | ### Loading Data
 32 | 
 33 | You can load your dataset using Pandas:
 34 | 
 35 | ```python
 36 | import pandas as pd
 37 | 
 38 | # Load your dataset
 39 | df = pd.read_csv("data.csv")
 40 | ```
 41 | 
 42 | ## Features
 43 | 
 44 | ### 1. Summarizing Data
 45 | 
 46 | Get an overview of your dataset using the `summarize` function.
 47 | 
 48 | ```python
 49 | summary = eda.summarize(df)
 50 | print(summary)
 51 | ```
 52 | 
 53 | ### 2. Correlation Analysis
 54 | 
 55 | Calculate and display a correlation matrix.
 56 | 
 57 | ```python
 58 | correlation_matrix = eda.correlation(df)
 59 | print(correlation_matrix)
 60 | ```
 61 | 
 62 | ### 3. Handling Missing Values
 63 | 
 64 | Fill missing values using the `mean`, `median`, or `mode` strategy. Only numeric columns are filled; other columns are left unchanged.
 65 | 
 66 | ```python
 67 | df_cleaned = cleaning.handle_missing(df, strategy="mean")
 68 | ```
 69 | 
 70 | ### 4. Dropping Duplicates
 71 | 
 72 | Remove duplicate rows from the dataset.
 73 | 
 74 | ```python
 75 | df_unique = cleaning.drop_duplicates(df)
 76 | ```
 77 | 
 78 | ### 5. Visualizations
 79 | 
 80 | #### Histogram
 81 | 
 82 | Generate a histogram for a specific column.
 83 | 
 84 | ```python
 85 | visualizations.plot_histogram(df, column="age")
 86 | ```
 87 | 
 88 | #### Correlation Heatmap
 89 | 
 90 | Generate a heatmap to visualize correlations.
 91 | 
 92 | ```python
 93 | visualizations.correlation_heatmap(df)
 94 | ```
 95 | 
 96 | ## Advanced Features
 97 | 
 98 | ### Generating an EDA Report
 99 | 
100 | Future versions will include a feature to generate an HTML or PDF EDA report.
101 | 
102 | ## Example Usage
103 | 
104 | Here’s a complete example of using pydatalens:
105 | 
106 | ```python
107 | import pandas as pd
108 | from pydatalens import eda, cleaning, visualizations
109 | 
110 | # Load your dataset
111 | data = {"ColumnA": [1, None, 3, 4], "ColumnB": [10, 20, 30, 40]}
112 | df = pd.DataFrame(data)
113 | 
114 | # Summarize the dataset
115 | summary = eda.summarize(df)
116 | print(summary)
117 | 
118 | # Handle missing values
119 | df_cleaned = cleaning.handle_missing(df, strategy="mean")
120 | 
121 | # Drop duplicates
122 | df_unique = cleaning.drop_duplicates(df_cleaned)
123 | 
124 | # Visualize data
125 | visualizations.plot_histogram(df_unique, "ColumnA")
126 | visualizations.correlation_heatmap(df_unique)
127 | ```
128 | 
129 | ## Support
130 | 
131 | If you encounter any issues, feel free to open an issue on GitHub or contact the package maintainer.
132 | 
133 | ---
134 | 
135 | Happy analyzing with pydatalens!
136 | 


--------------------------------------------------------------------------------
/examples/example_usage.py:
--------------------------------------------------------------------------------
 1 | import pandas as pd
 2 | from pydatalens import eda, cleaning, visualizations
 3 | 
 4 | # Example dataset
 5 | data = {"ColumnA": [1, None, 3, 4], "ColumnB": [10, 20, 30, 40]}
 6 | df = pd.DataFrame(data)
 7 | 
 8 | # Summarize
 9 | print(eda.summarize(df))
10 | 
11 | # Clean
12 | df = cleaning.handle_missing(df, strategy="mean")
13 | df = cleaning.drop_duplicates(df)
14 | 
15 | # Visualize
16 | visualizations.plot_histogram(df, "ColumnA")
17 | visualizations.correlation_heatmap(df)
18 | 
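For the example dataset above, `eda.summarize` reports per-column missing and unique counts. The sketch below reproduces that bookkeeping in plain Python, without pandas, to show what the numbers mean. `summarize_columns` is a hypothetical helper introduced only for illustration; like pandas' `nunique()`, it does not count missing entries as a unique value.

```python
def summarize_columns(data):
    """Per-column missing and unique counts for a dict of column name -> values."""
    rows = []
    for name, values in data.items():
        present = [v for v in values if v is not None]
        rows.append({
            "Columns": name,
            "Missing Values": len(values) - len(present),
            "Unique Values": len(set(present)),  # missing entries are not counted
        })
    return rows

data = {"ColumnA": [1, None, 3, 4], "ColumnB": [10, 20, 30, 40]}
rows = summarize_columns(data)
for row in rows:
    print(row)
```

Here `ColumnA` has one missing value and three unique values, while `ColumnB` has none missing and four unique values.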


--------------------------------------------------------------------------------
/pydatalens/Templates/report_template.html:
--------------------------------------------------------------------------------
 1 | <!DOCTYPE html>
 2 | <html>
 3 | <head>
 4 |     <title>EDA Report</title>
 5 | </head>
 6 | <body>
 7 |     <h1>Exploratory Data Analysis Report</h1>
 8 |     <h2>Summary</h2>
 9 |     {{ summary_table }}
10 |     <h2>Correlation Heatmap</h2>
11 |     <img src="{{ heatmap_path }}" alt="Correlation Heatmap">
12 | </body>
13 | </html>
14 | 
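The template above exposes two placeholders, `{{ summary_table }}` and `{{ heatmap_path }}`, but none of the modules in this dump fills them in. A minimal sketch of one way to do so with plain string replacement, assuming the summary arrives as an HTML fragment (for example from `DataFrame.to_html()`) and the heatmap was saved to a file. `render_report` is a hypothetical name, not an existing pydatalens function.

```python
from html import escape

# Copy of pydatalens/Templates/report_template.html, inlined to keep the sketch self-contained.
TEMPLATE = """<!DOCTYPE html>
<html>
<head>
    <title>EDA Report</title>
</head>
<body>
    <h1>Exploratory Data Analysis Report</h1>
    <h2>Summary</h2>
    {{ summary_table }}
    <h2>Correlation Heatmap</h2>
    <img src="{{ heatmap_path }}" alt="Correlation Heatmap">
</body>
</html>
"""

def render_report(template, summary_table_html, heatmap_path):
    """Substitute the two placeholders using plain string replacement."""
    return (
        template
        .replace("{{ summary_table }}", summary_table_html)  # inserted as-is, so it must be trusted HTML
        .replace("{{ heatmap_path }}", escape(heatmap_path, quote=True))  # escaped for use in an attribute
    )

html = render_report(TEMPLATE, "<table><tr><td>ColumnA</td></tr></table>", "heatmap.png")
print(html)
```

The placeholder syntax resembles Jinja2, so a template engine would be a natural alternative; string replacement is used here only to avoid an extra dependency.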


--------------------------------------------------------------------------------
/pydatalens/__init__.py:
--------------------------------------------------------------------------------
1 | """
2 | PyDataLens: A Python package for automatic EDA, data cleaning, and visualization.
3 | """
4 | 
5 | from .eda import summarize, correlation
6 | from .cleaning import handle_missing, drop_duplicates
7 | from .visualizations import plot_histogram, correlation_heatmap
8 | 


--------------------------------------------------------------------------------
/pydatalens/cleaning.py:
--------------------------------------------------------------------------------
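 1 | def handle_missing(df, strategy="mean"):
 2 |     """
 3 |     Fills missing values in the numeric columns of a DataFrame.
 4 |     Args:
 5 |         df: input DataFrame (left unmodified; a filled copy is returned).
 6 |         strategy: "mean", "median", or "mode".
 7 |     """
 8 |     if strategy not in ("mean", "median", "mode"):
 9 |         raise ValueError(f"Unknown strategy: {strategy!r}")
10 |     print(f"Handling missing values using {strategy} strategy...")
11 |     df = df.copy()  # avoid mutating the caller's DataFrame
12 |     for column in df.select_dtypes(include="number").columns:
13 |         if strategy == "mean":
14 |             fill_value = df[column].mean()
15 |         elif strategy == "median":
16 |             fill_value = df[column].median()
17 |         else:
18 |             modes = df[column].mode()
19 |             if modes.empty:  # column is entirely NaN, nothing to fill with
20 |                 continue
21 |             fill_value = modes[0]
22 |         df[column] = df[column].fillna(fill_value)
23 |     return df
24 | 
25 | def drop_duplicates(df):
26 |     """
27 |     Drops duplicate rows and returns the result as a new DataFrame.
28 |     """
29 |     print("Dropping duplicate rows...")
30 |     return df.drop_duplicates()
31 | 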
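To make the three strategies concrete, here is a plain-Python sketch of the same arithmetic on a single column, without pandas. `fill_missing` is a hypothetical helper, not part of the package. On ties it picks the smallest mode, which mirrors taking the first entry of pandas' sorted `mode()` result.

```python
from statistics import mean, median, multimode

def fill_missing(values, strategy="mean"):
    """Return a copy of `values` with None replaced by a summary statistic."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to compute a fill value from
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    elif strategy == "mode":
        fill = min(multimode(present))  # smallest mode when several values tie
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]

column_a = [1, None, 3, 4]
print(fill_missing(column_a, "mean"))    # gap filled with 8/3
print(fill_missing(column_a, "median"))  # gap filled with 3
print(fill_missing(column_a, "mode"))    # all values tie, so the gap is filled with 1
```

With so few observations the three strategies give noticeably different fill values, which is worth keeping in mind when choosing one for small datasets.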


--------------------------------------------------------------------------------
/pydatalens/eda.py:
--------------------------------------------------------------------------------
 1 | import pandas as pd
 2 | 
 3 | def summarize(df):
 4 |     """
 5 |     Summarizes the given DataFrame.
 6 |     """
 7 |     print("Generating data summary...")
 8 |     summary = {
 9 |         "Columns": df.columns.tolist(),
10 |         "Data Types": df.dtypes.tolist(),
11 |         "Missing Values": df.isnull().sum().tolist(),
12 |         "Unique Values": df.nunique().tolist(),
13 |     }
14 |     return pd.DataFrame(summary)
15 | 
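16 | def correlation(df):
17 |     """
18 |     Generates a correlation matrix for the numeric columns.
19 |     """
20 |     print("Calculating correlation matrix...")
21 |     return df.select_dtypes(include="number").corr()  # corr() raises on text columns in pandas >= 2.0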
22 | 
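`DataFrame.corr()` computes pairwise Pearson coefficients by default. As a rough illustration of what each cell in that matrix contains, here is a plain-Python sketch. `pearson` and `correlation_matrix` are hypothetical helpers, and unlike pandas they raise `ZeroDivisionError` for a constant column instead of returning NaN.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-deviations and squared deviations; the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

matrix = correlation_matrix({"A": [1, 2, 3], "B": [4, 5, 6]})
print(matrix)
```

The two columns used here are the ones from the correlation test and are perfectly linearly related, so every entry of the matrix is 1.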


--------------------------------------------------------------------------------
/pydatalens/utils.py:
--------------------------------------------------------------------------------
1 | def save_plot(filename):
2 |     """
3 |     Utility function to save a plot to a file.
4 |     """
5 |     import matplotlib.pyplot as plt
6 |     plt.savefig(filename)
7 |     print(f"Plot saved as {filename}.")
8 | 


--------------------------------------------------------------------------------
/pydatalens/visualizations.py:
--------------------------------------------------------------------------------
 1 | import seaborn as sns
 2 | import matplotlib.pyplot as plt
 3 | 
 4 | def plot_histogram(df, column):
 5 |     """
 6 |     Plots a histogram for a column.
 7 |     """
 8 |     print(f"Generating histogram for {column}...")
 9 |     sns.histplot(df[column], kde=True)
10 |     plt.title(f"Histogram of {column}")
11 |     plt.show()
12 | 
13 | def correlation_heatmap(df):
14 |     """
15 |     Plots a heatmap of the correlation matrix.
16 |     """
17 |     print("Generating correlation heatmap...")
18 |     sns.heatmap(df.select_dtypes(include="number").corr(), annot=True, cmap="coolwarm")
19 |     plt.title("Correlation Heatmap")
20 |     plt.show()
21 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | pandas>=1.0
2 | numpy>=1.18
3 | matplotlib>=3.1
4 | seaborn>=0.11
5 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | from setuptools import setup, find_packages
 2 | 
 3 | setup(
 4 |     name="pydatalens",
 5 |     version="1.0.0",
 6 |     description="A Python package for automatic EDA, data cleaning, and visualization.",
 7 |     author='Gopalakrishnan Arjunan',
 8 |     author_email='gopalakrishnana02@gmail.com',
 9 |     long_description=open('README.md', encoding='utf-8').read(),
10 |     long_description_content_type='text/markdown',
11 |     packages=find_packages(),
12 |     install_requires=[
13 |         "pandas>=1.0",
14 |         "numpy>=1.18",
15 |         "matplotlib>=3.1",
16 |         "seaborn>=0.11",
17 |     ],
18 |     url='https://github.com/gopalakrishnanarjun/pydatalens',
19 |     classifiers=[
20 |         'Programming Language :: Python :: 3',
21 |         'License :: OSI Approved :: MIT License',
22 |         'Operating System :: OS Independent',
23 |     ],
24 |     python_requires=">=3.6",
25 | )
26 | 


--------------------------------------------------------------------------------
/tests/__init__.py:
--------------------------------------------------------------------------------
1 | """
2 | Test suite for PyDataLens.
3 | """
4 | 


--------------------------------------------------------------------------------
/tests/test_cleaning.py:
--------------------------------------------------------------------------------
 1 | import pandas as pd
 2 | from pydatalens import cleaning
 3 | 
 4 | def test_handle_missing():
 5 |     data = {"A": [1, None, 3]}
 6 |     df = pd.DataFrame(data)
 7 |     cleaned_df = cleaning.handle_missing(df, strategy="mean")
 8 |     assert cleaned_df.isnull().sum().sum() == 0
 9 |     print("Handle missing test passed.")
10 | 
11 | def test_drop_duplicates():
12 |     data = {"A": [1, 1, 2]}
13 |     df = pd.DataFrame(data)
14 |     cleaned_df = cleaning.drop_duplicates(df)
15 |     assert cleaned_df.shape[0] == 2
16 |     print("Drop duplicates test passed.")
17 | 


--------------------------------------------------------------------------------
/tests/test_eda.py:
--------------------------------------------------------------------------------
 1 | import pandas as pd
 2 | from pydatalens import eda
 3 | 
 4 | def test_summarize():
 5 |     data = {"A": [1, 2, None], "B": [4, None, 6]}
 6 |     df = pd.DataFrame(data)
 7 |     summary = eda.summarize(df)
 8 |     assert "Columns" in summary.columns
 9 |     print("Summarize test passed.")
10 | 
11 | def test_correlation():
12 |     data = {"A": [1, 2, 3], "B": [4, 5, 6]}
13 |     df = pd.DataFrame(data)
14 |     corr = eda.correlation(df)
15 |     assert corr.shape[0] == corr.shape[1]
16 |     print("Correlation test passed.")
17 | 


--------------------------------------------------------------------------------
/tests/test_visualizations.py:
--------------------------------------------------------------------------------
 1 | import matplotlib
 2 | matplotlib.use("Agg")  # non-interactive backend, so plt.show() does not block the test run
 3 | 
 4 | import pandas as pd
 5 | from pydatalens import visualizations
 6 | 
 7 | def test_plot_histogram():
 8 |     data = {"A": [1, 2, 3, 4, 5]}
 9 |     df = pd.DataFrame(data)
10 |     visualizations.plot_histogram(df, column="A")
11 |     print("Histogram test passed.")
12 | 
13 | def test_correlation_heatmap():
14 |     data = {"A": [1, 2, 3], "B": [4, 5, 6]}
15 |     df = pd.DataFrame(data)
16 |     visualizations.correlation_heatmap(df)
17 |     print("Correlation heatmap test passed.")
18 | 


--------------------------------------------------------------------------------