├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── README.md ├── app.py ├── ds.jpg └── requirements.txt /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Citizen Code of Conduct 2 | 3 | ## 1. Purpose 4 | 5 | A primary goal of Notebooker Pro is to be inclusive to the largest number of contributors, with the most varied and diverse backgrounds possible. As such, we are committed to providing a friendly, safe and welcoming environment for all, regardless of gender, sexual orientation, ability, ethnicity, socioeconomic status, and religion (or lack thereof). 6 | 7 | This code of conduct outlines our expectations for all those who participate in our community, as well as the consequences for unacceptable behavior. 8 | 9 | We invite all those who participate in Notebooker Pro to help us create safe and positive experiences for everyone. 10 | 11 | ## 2. Open [Source/Culture/Tech] Citizenship 12 | 13 | A supplemental goal of this Code of Conduct is to increase open [source/culture/tech] citizenship by encouraging participants to recognize and strengthen the relationships between our actions and their effects on our community. 14 | 15 | Communities mirror the societies in which they exist and positive action is essential to counteract the many forms of inequality and abuses of power that exist in society. 16 | 17 | If you see someone who is making an extra effort to ensure our community is welcoming, friendly, and encourages all participants to contribute to the fullest extent, we want to know. 18 | 19 | ## 3. Expected Behavior 20 | 21 | The following behaviors are expected and requested of all community members: 22 | 23 | * Participate in an authentic and active way. In doing so, you contribute to the health and longevity of this community. 24 | * Exercise consideration and respect in your speech and actions. 25 | * Attempt collaboration before conflict. 
26 | * Refrain from demeaning, discriminatory, or harassing behavior and speech. 27 | * Be mindful of your surroundings and of your fellow participants. Alert community leaders if you notice a dangerous situation, someone in distress, or violations of this Code of Conduct, even if they seem inconsequential. 28 | * Remember that community event venues may be shared with members of the public; please be respectful to all patrons of these locations. 29 | 30 | ## 4. Unacceptable Behavior 31 | 32 | The following behaviors are considered harassment and are unacceptable within our community: 33 | 34 | * Violence, threats of violence or violent language directed against another person. 35 | * Sexist, racist, homophobic, transphobic, ableist or otherwise discriminatory jokes and language. 36 | * Posting or displaying sexually explicit or violent material. 37 | * Posting or threatening to post other people's personally identifying information ("doxing"). 38 | * Personal insults, particularly those related to gender, sexual orientation, race, religion, or disability. 39 | * Inappropriate photography or recording. 40 | * Inappropriate physical contact. You should have someone's consent before touching them. 41 | * Unwelcome sexual attention. This includes, sexualized comments or jokes; inappropriate touching, groping, and unwelcomed sexual advances. 42 | * Deliberate intimidation, stalking or following (online or in person). 43 | * Advocating for, or encouraging, any of the above behavior. 44 | * Sustained disruption of community events, including talks and presentations. 45 | 46 | ## 5. Weapons Policy 47 | 48 | No weapons will be allowed at Notebooker Pro events, community spaces, or in other spaces covered by the scope of this Code of Conduct. Weapons include but are not limited to guns, explosives (including fireworks), and large knives such as those used for hunting or display, as well as any other item used for the purpose of causing injury or harm to others. 
Anyone seen in possession of one of these items will be asked to leave immediately, and will only be allowed to return without the weapon. Community members are further expected to comply with all state and local laws on this matter. 49 | 50 | ## 6. Consequences of Unacceptable Behavior 51 | 52 | Unacceptable behavior from any community member, including sponsors and those with decision-making authority, will not be tolerated. 53 | 54 | Anyone asked to stop unacceptable behavior is expected to comply immediately. 55 | 56 | If a community member engages in unacceptable behavior, the community organizers may take any action they deem appropriate, up to and including a temporary ban or permanent expulsion from the community without warning (and without refund in the case of a paid event). 57 | 58 | ## 7. Reporting Guidelines 59 | 60 | If you are subject to or witness unacceptable behavior, or have any other concerns, please notify a community organizer as soon as possible at mainakc24365@gmail.com. 61 | 62 | 63 | 64 | Additionally, community organizers are available to help community members engage with local law enforcement or to otherwise help those experiencing unacceptable behavior feel safe. In the context of in-person events, organizers will also provide escorts as desired by the person experiencing distress. 65 | 66 | ## 8. Addressing Grievances 67 | 68 | If you feel you have been falsely or unfairly accused of violating this Code of Conduct, you should notify MainakRepositor with a concise description of your grievance. Your grievance will be handled in accordance with our existing governing policies. 69 | 70 | 71 | 72 | ## 9. Scope 73 | 74 | We expect all community participants (contributors, paid or otherwise; sponsors; and other guests) to abide by this Code of Conduct in all community venues--online and in-person--as well as in all one-on-one communications pertaining to community business. 
75 | 76 | This code of conduct and its related procedures also applies to unacceptable behavior occurring outside the scope of community activities when such behavior has the potential to adversely affect the safety and well-being of community members. 77 | 78 | ## 10. Contact info 79 | 80 | mainakc24365@gmail.com 81 | 82 | ## 11. License and attribution 83 | 84 | The Citizen Code of Conduct is distributed by [Stumptown Syndicate](http://stumptownsyndicate.org) under a [Creative Commons Attribution-ShareAlike license](http://creativecommons.org/licenses/by-sa/3.0/). 85 | 86 | Portions of text derived from the [Django Code of Conduct](https://www.djangoproject.com/conduct/) and the [Geek Feminism Anti-Harassment Policy](http://geekfeminism.wikia.com/wiki/Conference_anti-harassment/Policy). 87 | 88 | _Revision 2.3. Posted 6 March 2017._ 89 | 90 | _Revision 2.2. Posted 4 February 2016._ 91 | 92 | _Revision 2.1. Posted 23 June 2014._ 93 | 94 | _Revision 2.0, adopted by the [Stumptown Syndicate](http://stumptownsyndicate.org) board on 10 January 2013. Posted 17 March 2013._ 95 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | NO ONE APART FROM ME CAN CONTRIBUTE IN THIS PROJECT 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Notebooker-Pro 2 | The best notebook maker 3 | 4 | ## Created, Patented and All Rights Reserved by Mainak Chaudhuri 5 | 6 | [View in Youtube](https://www.youtube.com/watch?v=AVg7yqMXWqk&t=17s) 7 | 8 |
9 | 10 | ## Abstract: 11 | 12 | The main idea of Notebooker Pro is to make preparing a data science notebook fast and easy. With this software, anyone can quickly get insights about an uploaded data set. It also provides visualization with 6 commonly used types of graphs. Users can compare 30 different machine learning models, of both regression and classification types, to choose the model that gives the highest accuracy for a data set at a given split ratio and random seed, both of which the user can set. 13 | 14 |
15 | 16 | ## Keywords: 17 | 18 | Data Analysis, Machine Learning, Data Visualization, Model Building, Web Application, Data Science Notebooks, Education, Speed of Development. 19 | 20 |
21 | 22 | ## Primary issues with existing techs: 23 | 24 |
25 |   1. Manual planning: To make a good notebook, the creator needs to plan it well. The notebook must be informative and include all necessary details without burying them in a flood of irrelevant information.
26 |   2. Time taken to code: Once the planning is done comes the most tedious and mentally demanding part, which is coding the notebook into existence. This can take hours, and working on the same thing for so long brings physical strain and monotony.
27 |   3. Finding a good accuracy: It is hard to find an appropriate train-test split ratio that improves the accuracy of the model. At times, a poor choice of split size leaves the training and test sets unrepresentative of each other, resulting in high training accuracy but low test accuracy.
28 |   4. Messing up the syntax while making a proper graph: Making a good graph with proper axes and parameters is also a great challenge for a beginner or a person with limited time.
29 |
30 | 31 |
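The effect of the split ratio and the seed described in issue 3 can be sketched without any libraries. This is a minimal illustration, not the app's code: `split_rows` is a hypothetical helper, and `app.py` itself delegates the split to scikit-learn's `train_test_split`.

```python
import random


def split_rows(rows, test_fraction, seed):
    """Shuffle a copy of rows with a fixed seed, then cut off the test share.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # the seed makes the shuffle repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


rows = list(range(20))
train_a, test_a = split_rows(rows, test_fraction=0.25, seed=42)
train_b, test_b = split_rows(rows, test_fraction=0.25, seed=42)

print(len(train_a), len(test_a))         # 15 5
print(test_a == test_b)                  # True: same seed, same split
print(sorted(train_a + test_a) == rows)  # True: no row lost or duplicated
```

Changing `seed` changes which rows land in the test set, which is why the same model can score differently from one seed to the next.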
32 | 33 | ## UML Diagram 34 | 35 | 36 | 37 |
38 | 39 | ## WHAT IS NOTEBOOKER PRO 40 | 41 | Notebooker Pro is user-friendly software designed to help you make a good data science notebook in a few steps. 42 | Notebooker Pro will not make the notebook for you, but it will provide all the data insights that you 43 | will need to put in your kernel. Notebooker Pro has 4 major sections: 44 | 45 | i. EDA (Exploratory Data Analysis) --> Used to find important data and statistical insights from the uploaded files 46 | 47 | ii. Visualization --> Used to perform data visualization with 6 commonly used types of graphs 48 | 49 | iii. Regression --> Loops through 30 different regression models and returns performance statistics (R-squared, RMSE and time taken) 50 | for your dataset at the chosen seed value and split size. The only thing to keep in 51 | mind is that the data must suit regression modelling. Datasets meant 52 | for classification algorithms might generate vague results, so use a proper dataset. 53 | **[e.g.: do not use classifier datasets such as iris, cancer, penguins etc.]** 54 | 55 | iv. Classification --> Loops through 30 different classification models and returns performance statistics (such as accuracy) 56 | for your dataset at the chosen seed value and split size. The only thing to keep in 57 | mind is that the data must suit classification modelling. Datasets meant 58 | for non-classification algorithms might generate vague results, so use a proper dataset. 59 | 60 | 61 | Features: 62 | Upload file => Upload only csv files. 63 | Data split => A slider that lets you choose the split ratio, between 0 and 1 64 | Random seed => Sets how the rows are shuffled into the training and testing samples. 65 | You may change it to get the best accuracy for a particular model. 66 | 67 | 68 | **You do not necessarily need to know coding to use this webapp** 69 | 70 |
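The "loop through many models and compare them" idea behind the Regression section can be sketched in plain Python. This is a toy stand-in under stated assumptions: the app relies on lazypredict's `LazyRegressor`, whereas `MeanBaseline`, `SimpleLinear` and `rank_models` below are hypothetical names written only for this illustration.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


class MeanBaseline:
    """Always predicts the mean of the training targets."""

    def fit(self, xs, ys):
        self.value = mean(ys)
        return self

    def predict(self, xs):
        return [self.value for _ in xs]


class SimpleLinear:
    """One-feature least-squares line y = a * x + b."""

    def fit(self, xs, ys):
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.a = sxy / sxx
        self.b = y_bar - self.a * x_bar
        return self

    def predict(self, xs):
        return [self.a * x + self.b for x in xs]


def rank_models(models, x_train, y_train, x_test, y_test):
    """Fit every model on the training data and sort by test-set R-squared."""
    scores = []
    for name, model in models.items():
        model.fit(x_train, y_train)
        scores.append((name, r_squared(y_test, model.predict(x_test))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data lying exactly on a line, so the linear model should rank first.
x_train = [0, 1, 2, 3, 4, 5]
y_train = [2 * x + 1 for x in x_train]
x_test = [6, 7, 8]
y_test = [2 * x + 1 for x in x_test]

ranking = rank_models(
    {"mean baseline": MeanBaseline(), "simple linear": SimpleLinear()},
    x_train, y_train, x_test, y_test,
)
for name, score in ranking:
    print(f"{name}: R^2 = {score:.3f}")
```

R-squared can be negative when a model does worse than predicting the test mean, as the baseline does here; `app.py` clamps such values to 0 before plotting.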
71 | 72 | ## Light Mode 73 | 74 | 75 | 76 |
77 | 78 | ## Dark Mode 79 | 80 | 81 | 82 |
83 | 84 | 85 | 86 | ![ss1](https://user-images.githubusercontent.com/64016811/114223200-27acb300-998d-11eb-8cc5-2e4865102971.jpg) 87 | 88 | 89 | 90 | ## The site is live at : [here](https://share.streamlit.io/mainakrepositor/notebooker-pro/app.py) 91 | 92 | 93 | -------------------------------------------------------------------------------- /app.py: -------------------------------------------------------------------------------- 1 | import streamlit as st 2 | import numpy as np 3 | import pandas as pd 4 | import plotly_express as px 5 | from sklearn import * 6 | from lazypredict.Supervised import LazyRegressor 7 | from lazypredict.Supervised import LazyClassifier 8 | from sklearn.preprocessing import LabelEncoder 9 | import matplotlib.pyplot as plt 10 | import seaborn as sns 11 | import base64 12 | import io 13 | import webbrowser 14 | from sklearn.model_selection import train_test_split 15 | from sklearn.datasets import load_diabetes 16 | from PIL import Image 17 | 18 | 19 | image = Image.open('ds.jpg') 20 | st.image(image,use_column_width=True) 21 | 22 | def main(): 23 | activities = ['EDA','Visualization','Regression','Classification','Documentation','About Us'] 24 | #st.sidebar.success('Updates Coming Soon! 
🌟🎉') 25 | option=st.sidebar.selectbox('Choose a section',activities) 26 | st.sidebar.markdown('''Use this section for finding useful insights about your data, and feel free to use them in your notebooks 27 | 28 | 🎯 Version : 1.0.2 ''') 29 | 30 | 31 | if option == 'EDA': 32 | st.subheader("Exploratory Data Analysis") 33 | 34 | data=st.file_uploader("Please upload a CSV dataset ",type=['csv']) 35 | 36 | 37 | 38 | st.warning('Your dataset goes here...') 39 | if data is not None: 40 | df=pd.read_csv(data) 41 | st.dataframe(df) 42 | st.info('Some useful data insights about your data') 43 | if st.checkbox("Display shape"): 44 | r,c = df.shape 45 | st.write('Rows = ',r,'Columns = ',c) 46 | 47 | if st.checkbox('Display columns'): 48 | st.write(df.columns) 49 | 50 | if st.checkbox('Select multiple columns'): 51 | selected_col = st.multiselect('Select preferred columns',df.columns) 52 | df1 = df[selected_col] 53 | st.dataframe(df1) 54 | 55 | if st.checkbox("Head"): 56 | st.write(df.head()) 57 | 58 | if st.checkbox('Tail'): 59 | st.write(df.tail()) 60 | 61 | if st.checkbox('Null values'): 62 | st.write(df.isnull().sum()) 63 | 64 | if st.checkbox('Data types'): 65 | st.write(df.dtypes) 66 | 67 | if st.checkbox('Random sample'): 68 | st.write(df.sample(min(20, len(df)))) # never ask for more rows than the dataset has 69 | 70 | if st.checkbox('Display correlations'): 71 | st.write(df.corr()) 72 | 73 | if st.checkbox('Summary'): 74 | st.write(df.describe(include='all').T) 75 | 76 | 77 | 78 | 79 | elif option == 'Visualization': 80 | st.subheader("Data Visualization and Graphing") 81 | 82 | st.sidebar.subheader("File Upload") 83 | 84 | # Setup file upload 85 | uploaded_file = st.sidebar.file_uploader( 86 | label="Upload your CSV file. 
(200MB max)", 87 | type=['csv']) 88 | 89 | 90 | if uploaded_file is not None: 91 | st.success('Your data goes here') 92 | 93 | 94 | try: 95 | df = pd.read_csv(uploaded_file) 96 | except Exception as e: 97 | st.warning('Data not found') 98 | 99 | 100 | 101 | global numeric_columns 102 | global non_numeric_columns 103 | try: 104 | st.write(df) 105 | numeric_columns = list(df.select_dtypes(['float', 'int']).columns) 106 | non_numeric_columns = list(df.select_dtypes(['object']).columns) 107 | non_numeric_columns.append(None) 108 | print(non_numeric_columns) 109 | except Exception as e: 110 | print(e) 111 | 112 | 113 | 114 | chart_select = st.sidebar.selectbox( 115 | label="Select the chart type", 116 | options=['Scatterplots', 'Lineplots', 'Histogram', 'Boxplot','Violinplot','Piechart'] 117 | ) 118 | 119 | st.info('The Graphs generated will be displayed here') 120 | 121 | if chart_select == 'Scatterplots': 122 | st.sidebar.subheader("Scatterplot Settings") 123 | try: 124 | x_values = st.sidebar.selectbox('X axis', options=numeric_columns) 125 | y_values = st.sidebar.selectbox('Y axis', options=numeric_columns) 126 | color_value = st.sidebar.selectbox("Color", options=non_numeric_columns) 127 | plot = px.scatter(data_frame=df, x=x_values, y=y_values, color=color_value) 128 | # display the chart 129 | st.plotly_chart(plot) 130 | except Exception as e: 131 | print(e) 132 | 133 | if chart_select == 'Lineplots': 134 | st.sidebar.subheader("Line Plot Settings") 135 | try: 136 | x_values = st.sidebar.selectbox('X axis', options=numeric_columns) 137 | y_values = st.sidebar.selectbox('Y axis', options=numeric_columns) 138 | color_value = st.sidebar.selectbox("Color", options=non_numeric_columns) 139 | plot = px.line(data_frame=df, x=x_values, y=y_values, color=color_value) 140 | st.plotly_chart(plot) 141 | except Exception as e: 142 | print(e) 143 | 144 | if chart_select == 'Histogram': 145 | st.sidebar.subheader("Histogram Settings") 146 | try: 147 | x = 
st.sidebar.selectbox('Feature', options=numeric_columns) 148 | bin_size = st.sidebar.slider("Number of Bins", min_value=10, 149 | max_value=100, value=40) 150 | color_value = st.sidebar.selectbox("Color", options=non_numeric_columns) 151 | plot = px.histogram(x=x, data_frame=df, color=color_value, nbins=bin_size) # apply the bin count chosen in the slider 152 | st.plotly_chart(plot) 153 | except Exception as e: 154 | print(e) 155 | 156 | if chart_select == 'Boxplot': 157 | st.sidebar.subheader("Boxplot Settings") 158 | try: 159 | y = st.sidebar.selectbox("Y axis", options=numeric_columns) 160 | x = st.sidebar.selectbox("X axis", options=non_numeric_columns) 161 | color_value = st.sidebar.selectbox("Color", options=non_numeric_columns) 162 | plot = px.box(data_frame=df, y=y, x=x, color=color_value) 163 | st.plotly_chart(plot) 164 | except Exception as e: 165 | print(e) 166 | 167 | if chart_select == 'Piechart': 168 | st.sidebar.subheader("Piechart Settings") 169 | try: 170 | x_values = st.sidebar.selectbox('X axis', options=numeric_columns) 171 | y_values = st.sidebar.selectbox('Y axis', options=non_numeric_columns) 172 | plot = px.pie(data_frame=df, values=x_values, names=y_values) 173 | st.plotly_chart(plot) 174 | 175 | except Exception as e: 176 | print(e) 177 | 178 | if chart_select == 'Violinplot': 179 | st.sidebar.subheader("Violin Plot Settings") 180 | try: 181 | x_values = st.sidebar.selectbox('X axis', options=numeric_columns) 182 | y_values = st.sidebar.selectbox('Y axis', options=numeric_columns) 183 | color_value = st.sidebar.selectbox("Color", options=non_numeric_columns) 184 | plot = px.violin(data_frame=df, x=x_values, y=y_values, color=color_value) 185 | st.plotly_chart(plot) 186 | except Exception as e: 187 | print(e) 188 | 189 | elif option == 'Regression': 190 | st.subheader("Regression ML Model Builder") 191 | 192 | # Model building 193 | def build_model(df): 194 | l = len(df) 195 | 196 | #df = df.iloc[:100] 197 | X = df.iloc[:,:-1] # Using all columns except for the last column as X 198 | Y = 
df.iloc[:,-1] # Selecting the last column as Y 199 | 200 | 201 | st.markdown('**1.2. Dataset dimension**') 202 | st.write('X (Independent Axis)') 203 | st.info(X.shape) 204 | st.write('Y (Dependent Axis)') 205 | st.info(Y.shape) 206 | 207 | st.markdown('**1.3. Variable details**:') 208 | st.write('X variable (first few are shown)') 209 | st.info(list(X.columns[:int(l/5)])) 210 | st.write('Y variable') 211 | st.info(Y.name) 212 | 213 | # Build lazy model 214 | X_train, X_test, Y_train, Y_test = train_test_split(X, Y,test_size = split_size,random_state = seed_number) 215 | reg = LazyRegressor(verbose=0,ignore_warnings=False, custom_metric=None) 216 | models_train,predictions_train = reg.fit(X_train, X_train, Y_train, Y_train) 217 | models_test,predictions_test = reg.fit(X_train, X_test, Y_train, Y_test) 218 | 219 | st.subheader('2.Model Performance Plot (Training Set)') 220 | 221 | st.write('Training set') 222 | st.write(predictions_train) 223 | st.markdown(filedownload(predictions_train,'training.csv'), unsafe_allow_html=True) 224 | 225 | st.write('Test set') 226 | st.write(predictions_test) 227 | st.markdown(filedownload(predictions_test,'test.csv'), unsafe_allow_html=True) 228 | 229 | st.subheader('3.Model Performance Plot(Test set)') 230 | 231 | 232 | with st.markdown('**R-squared**'): 233 | # Tall 234 | predictions_test["R-Squared"] = [0 if i < 0 else i for i in predictions_test["R-Squared"] ] 235 | plt.figure(figsize=(3, 9)) 236 | sns.set_theme(style="darkgrid") 237 | ax1 = sns.barplot(y=predictions_test.index, x="R-Squared", data=predictions_test) 238 | ax1.set(xlim=(0, 1)) 239 | st.markdown(imagedownload(plt,'plot-r2-tall.pdf'), unsafe_allow_html=True) 240 | # Wide 241 | plt.figure(figsize=(12, 3)) 242 | sns.set_theme(style="darkgrid") 243 | ax1 = sns.barplot(x=predictions_test.index, y="R-Squared", data=predictions_test) 244 | ax1.set(ylim=(0, 1)) 245 | plt.xticks(rotation=90) 246 | st.pyplot(plt) 247 | st.markdown(imagedownload(plt,'plot-r2-wide.pdf'), 
unsafe_allow_html=True) 248 | 249 | with st.markdown('**RMSE (capped at l/2)**'): 250 | # Tall 251 | predictions_test["RMSE"] = [(l/2) if i > (l/2) else i for i in predictions_test["RMSE"] ] 252 | plt.figure(figsize=(3, 9)) 253 | sns.set_theme(style="darkgrid") 254 | ax2 = sns.barplot(y=predictions_test.index, x="RMSE", data=predictions_test) 255 | st.markdown(imagedownload(plt,'plot-rmse-tall.pdf'), unsafe_allow_html=True) 256 | # Wide 257 | plt.figure(figsize=(12, 3)) 258 | sns.set_theme(style="darkgrid") 259 | ax2 = sns.barplot(x=predictions_test.index, y="RMSE", data=predictions_test) 260 | plt.xticks(rotation=90) 261 | st.pyplot(plt) 262 | st.markdown(imagedownload(plt,'plot-rmse-wide.pdf'), unsafe_allow_html=True) 263 | 264 | with st.markdown('**Calculation time**'): 265 | # Tall 266 | predictions_test["Time Taken"] = [0 if i < 0 else i for i in predictions_test["Time Taken"] ] 267 | plt.figure(figsize=(3, 9)) 268 | sns.set_theme(style="darkgrid") 269 | ax3 = sns.barplot(y=predictions_test.index, x="Time Taken", data=predictions_test) 270 | st.markdown(imagedownload(plt,'plot-calculation-time-tall.pdf'), unsafe_allow_html=True) 271 | # Wide 272 | plt.figure(figsize=(9, 3)) 273 | sns.set_theme(style="darkgrid") 274 | ax3 = sns.barplot(x=predictions_test.index, y="Time Taken", data=predictions_test) 275 | plt.xticks(rotation=90) 276 | st.pyplot(plt) 277 | st.markdown(imagedownload(plt,'plot-calculation-time-wide.pdf'), unsafe_allow_html=True) 278 | 279 | 280 | def filedownload(df, filename): 281 | csv = df.to_csv(index=False) 282 | b64 = base64.b64encode(csv.encode()).decode() # strings <-> bytes conversions 283 | href = f'<a href="data:text/csv;base64,{b64}" download="{filename}">Download {filename} File</a>' # HTML link carrying the CSV as a base64 data URI 284 | return href 285 | 286 | def imagedownload(plt, filename): 287 | s = io.BytesIO() 288 | plt.savefig(s, format='pdf', bbox_inches='tight') 289 | plt.close() 290 | b64 = base64.b64encode(s.getvalue()).decode() # strings <-> bytes conversions 291 | href = f'<a href="data:application/pdf;base64,{b64}" download="{filename}">Download {filename} File</a>' # HTML link carrying the PDF as a base64 data URI 292 | return href 293 | 294 | 
295 | 296 | with st.sidebar.header('File Uploader Section'): 297 | uploaded_file = st.sidebar.file_uploader("Upload an input as CSV file", type=["csv"]) 298 | 299 | 300 | 301 | with st.sidebar.header('Set the optimization parameters\n (Grab the slider and set to any suitable point)'): 302 | 303 | split_size = st.sidebar.slider('Data split ratio (in fraction):', 0.0, 1.0, 0.7, 0.01) 304 | seed_number = st.sidebar.slider('Set the random-seed-value :', min_value=0, max_value=100, value=100, step=5) # default must lie within [min_value, max_value] 305 | 306 | with st.sidebar.header('Project made by:'): 307 | st.write("Made by: MAINAK CHAUDHURI") 308 | 309 | #---------------------------------# 310 | 311 | st.subheader('Dataset display') 312 | 313 | 314 | 315 | if uploaded_file is not None: 316 | df = pd.read_csv(uploaded_file) 317 | st.markdown('**Snap of the dataset**') 318 | st.write(df) 319 | build_model(df) 320 | else: 321 | st.info('Upload a file') 322 | st.info('OR') 323 | if st.button('Use preloaded data instead'): 324 | st.info("Dataset used : scikit-learn diabetes (load_diabetes)") 325 | 326 | 327 | diabetes = load_diabetes() 328 | 329 | X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names).loc[:100] 330 | Y = pd.Series(diabetes.target, name='response').loc[:100] 331 | df = pd.concat( [X,Y], axis=1 ) 332 | 333 | st.markdown('Displaying results from sample preloaded data :') 334 | st.write(df.head(5)) 335 | 336 | build_model(df) 337 | 338 | elif option == 'Classification': 339 | st.subheader("Classifier ML Model Builder") 340 | 341 | def build_model(df): 342 | l = len(df) 343 | 344 | #df = df.iloc[:100] 345 | X = df.iloc[:,:-1] # Using all columns except for the last column as X 346 | Y = df.iloc[:,-1] # Selecting the last column as Y 347 | 348 | 349 | st.markdown('**1.2. Dataset dimension**') 350 | st.write('X (Independent Axis)') 351 | st.info(X.shape) 352 | st.write('Y (Dependent Axis)') 353 | st.info(Y.shape) 354 | 355 | st.markdown('**1.3. 
Variable details**:') 356 | st.write('X variable (first few are shown)') 357 | st.info(list(X.columns[:int(l/5)])) 358 | st.write('Y variable') 359 | st.info(Y.name) 360 | 361 | # Build lazy model 362 | X_train, X_test, Y_train, Y_test = train_test_split(X, Y,test_size = split_size,random_state = seed_number) 363 | clf = LazyClassifier(verbose=0,ignore_warnings=False, custom_metric=None) 364 | models_train,predictions_train = clf.fit(X_train, X_train, Y_train, Y_train) 365 | models_test,predictions_test = clf.fit(X_train, X_test, Y_train, Y_test) 366 | 367 | st.subheader('2.Model Performance Plot (Training Set)') 368 | 369 | st.write('Training set') 370 | st.write(predictions_train) 371 | st.markdown(filedownload(predictions_train,'training.csv'), unsafe_allow_html=True) 372 | 373 | st.write('Test set') 374 | st.write(predictions_test) 375 | st.markdown(filedownload(predictions_test,'test.csv'), unsafe_allow_html=True) 376 | 377 | st.subheader('3.Model Performance Plot(Test set)') 378 | 379 | 380 | with st.markdown('**Accuracy**'): 381 | # Tall 382 | predictions_test["Accuracy"] = [0 if i < 0 else i for i in predictions_test["Accuracy"] ] 383 | plt.figure(figsize=(5, 12)) 384 | sns.set_theme(style="darkgrid") 385 | ax1 = sns.barplot(y=predictions_test.index, x="Accuracy", data=predictions_test) 386 | ax1.set(xlim=(0, 1)) 387 | st.markdown(imagedownload(plt,'plot-accuracy-tall.pdf'), unsafe_allow_html=True) 388 | # Wide 389 | plt.figure(figsize=(12, 5)) 390 | sns.set_theme(style="darkgrid") 391 | ax1 = sns.barplot(x=predictions_test.index, y="Accuracy", data=predictions_test) 392 | ax1.set(ylim=(0, 1)) 393 | plt.xticks(rotation=90) 394 | st.pyplot(plt) 395 | st.markdown(imagedownload(plt,'plot-accuracy-wide.pdf'), unsafe_allow_html=True) 396 | 397 | 398 | 399 | def filedownload(df, filename): 400 | csv = df.to_csv(index=False) 401 | b64 = base64.b64encode(csv.encode()).decode() # strings <-> bytes conversions 402 | href = f'<a href="data:text/csv;base64,{b64}" download="{filename}">Download {filename} File</a>' # HTML link carrying the CSV as a base64 data URI 403 | return href 404 | 405 
| def imagedownload(plt, filename): 406 | s = io.BytesIO() 407 | plt.savefig(s, format='pdf', bbox_inches='tight') 408 | plt.close() 409 | b64 = base64.b64encode(s.getvalue()).decode() # strings <-> bytes conversions 410 | href = f'<a href="data:application/pdf;base64,{b64}" download="{filename}">Download {filename} File</a>' # HTML link carrying the PDF as a base64 data URI 411 | return href 412 | 413 | 414 | 415 | with st.sidebar.header('File Uploader Section'): 416 | uploaded_file = st.sidebar.file_uploader("Upload an input as CSV file", type=["csv"]) 417 | 418 | 419 | 420 | with st.sidebar.header('Set the optimization parameters\n (Grab the slider and set to any suitable point)'): 421 | 422 | split_size = st.sidebar.slider('Data split ratio (in fraction):', 0.0, 1.0, 0.7, 0.01) 423 | seed_number = st.sidebar.slider('Set the random-seed-value :', min_value=0, max_value=100, value=100, step=5) # default must lie within [min_value, max_value] 424 | 425 | with st.sidebar.header('Project made by:'): 426 | st.write("Made by: MAINAK CHAUDHURI") 427 | 428 | #---------------------------------# 429 | 430 | st.subheader('Dataset display') 431 | 432 | 433 | 434 | if uploaded_file is not None: 435 | df = pd.read_csv(uploaded_file) 436 | st.markdown('**Snap of the dataset**') 437 | st.write(df) 438 | build_model(df) 439 | else: 440 | st.info('Upload a file') 441 | st.info('OR') 442 | if st.button('Use preloaded data instead'): 443 | st.info("Dataset used : scikit-learn diabetes (load_diabetes)") 444 | 445 | 446 | diabetes = load_diabetes() 447 | 448 | X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names).loc[:100] 449 | Y = pd.Series(diabetes.target, name='response').loc[:100] 450 | df = pd.concat( [X,Y], axis=1 ) 451 | 452 | st.markdown('Displaying results from sample preloaded data :') 453 | st.write(df.head(5)) 454 | 455 | build_model(df) 456 | 457 | elif option == 'Documentation': 458 | st.subheader("How to use Notebooker Pro") 459 | st.markdown('''Notebooker Pro is user-friendly software designed to help you make a good data science notebook in a few steps. 
460 | Notebooker Pro will not make the notebook for you, but it will provide all the data insights that you 461 | will need to put in your kernel. Notebooker Pro has 4 major sections: 462 | 463 | i. **EDA (Exploratory Data Analysis)** --> Used to find important data and statistical insights from the uploaded files 464 | 465 | ii. **Visualization** --> Used to perform data visualization with 6 commonly used types of graphs 466 | 467 | iii. **Regression** --> Loops through **30** different regression models and returns performance statistics (R-squared, RMSE and time taken) 468 | for your dataset at the chosen seed value and split size. The only thing to keep in 469 | mind is that the data must suit regression modelling. Datasets meant 470 | for classification algorithms might generate vague results, so use a proper dataset. 471 | **[e.g.: do not use classifier datasets such as iris, cancer, penguins etc.]** 472 | 473 | iv. **Classification** --> Loops through **30** different classification models and returns performance statistics (such as accuracy) 474 | for your dataset at the chosen seed value and split size. The only thing to keep in 475 | mind is that the data must suit classification modelling. Datasets meant 476 | for non-classification algorithms might generate vague results, so use a proper dataset. 477 | 478 | 479 | **Features:** 480 | 481 | **Upload file** => Upload only csv files. 482 | 483 | **Data split** => A slider that lets you choose the split ratio, between 0 and 1 484 | 485 | **Random seed** => Sets how the rows are shuffled into the training and testing samples. 486 | You may change it to get the best accuracy for a particular model. 487 | ''') 488 | 489 | elif option == 'About Us': 490 | st.subheader("About Us 😊") 491 | st.markdown('''This web application is made by Mainak Chaudhuri. 
He is a Computer Science and Engineering student at SRM University, in the second year of B.Tech. The main idea of this application is to help beginners and data science enthusiasts chalk out a plan for preparing a good data science notebook for college projects, online courses or their portfolio. This application accepts a dataset from the user and displays useful insights about the data. It also helps the user visualize the data, compare supervised machine learning models (regression and classification are handled separately) and pick the best fit for the dataset size, split and seed values, which the user can set from the side panel. This application claims to be the first of its kind developed to date by a single developer, and it has a serving history and positive reports from 180+ users. 492 | 493 | 494 | 👉 N.B. : This application is the intellectual property of Mainak Chaudhuri and hence holds a reserved copyright. 
Any form of illegal imitation of graphics, contents or documentation without prior permission of the owner, if proved, can result in legal action against the plagiarist.''') 495 | 496 | st.success('For more info, feel free to contact @ : ') 497 | url = 'https://www.linkedin.com/in/mainak-chaudhuri-127898176/' 498 | 499 | if st.button('Mainak Chaudhuri'): 500 | webbrowser.open_new_tab(url) 501 | 502 | if __name__ == '__main__': 503 | main() 504 | 505 | 506 | 507 | 508 | -------------------------------------------------------------------------------- /ds.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MainakRepositor/Notebooker-Pro/ca35eb7530fc1f463ea6c9c21a98951bd215081b/ds.jpg -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | streamlit==0.79.0 2 | pandas==1.1.3 3 | base58==2.0.1 4 | numpy==1.19.2 5 | pillow==8.0.1 6 | plotly==4.14.1 7 | scikit-learn==0.23.2 8 | lazypredict==0.2.7 9 | seaborn==0.11.1 10 | matplotlib==3.3.3 11 | xgboost==1.1.1 12 | lightgbm==2.3.1 13 | pytest==5.4.3 14 | tqdm==4.56.0 15 | seaborn>=0.10.1 16 | jupyterlab==2.2.6 17 | bokeh==2.2.1 18 | kaleido 19 | plotly.express 20 | --------------------------------------------------------------------------------
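A requirements file like the one above can be checked for packages that are listed more than once. The following is a small stdlib-only sketch, not part of the project: `package_name` and `duplicated_packages` are hypothetical helpers, and the name normalization follows the usual convention of treating `-`, `_` and `.` as equivalent.

```python
import re
from collections import Counter

# Entries copied from requirements.txt above.
REQUIREMENTS = """\
streamlit==0.79.0
pandas==1.1.3
base58==2.0.1
numpy==1.19.2
pillow==8.0.1
plotly==4.14.1
scikit-learn==0.23.2
lazypredict==0.2.7
seaborn==0.11.1
matplotlib==3.3.3
xgboost==1.1.1
lightgbm==2.3.1
pytest==5.4.3
tqdm==4.56.0
seaborn>=0.10.1
jupyterlab==2.2.6
bokeh==2.2.1
kaleido
plotly.express
"""


def package_name(line):
    """Return the normalized project name from one requirement line."""
    raw = re.split(r"[=<>!~]", line.strip(), maxsplit=1)[0].strip()
    return re.sub(r"[-_.]+", "-", raw).lower()


def duplicated_packages(text):
    """List every project name that appears on more than one line."""
    names = [
        package_name(line)
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


print(duplicated_packages(REQUIREMENTS))  # ['seaborn']
```

Here `seaborn` is pinned once as `==0.11.1` and listed again as `>=0.10.1`; the two specifiers are compatible, so one entry would be enough.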