├── pdfs
    └── EDA_Checklist.pdf
├── images
    └── EDA_Checklist.png
├── README.md
├── LICENSE
└── EDA_Checklist.md

/pdfs/EDA_Checklist.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neuefische/datascience-infographics/HEAD/pdfs/EDA_Checklist.pdf
--------------------------------------------------------------------------------
/images/EDA_Checklist.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/neuefische/datascience-infographics/HEAD/images/EDA_Checklist.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # datascience-infographics
2 | Collection of infographics related to working in the data science field.
3 |
4 |

The EDA Checklist Explained

5 |
6 | There is no one-size-fits-all blueprint for how to do **Exploratory Data Analysis (EDA)**. A lot of EDA work is like detective work: looking for a suspect, following clues and changing direction based on the clues. Getting good at EDA does require practice, though. To help along, we created a small checklist for EDA. This is not an exhaustive list of EDA steps, but it is a minimal checklist.
7 |
8 | The infographic is available as [pdf](pdfs/EDA_Checklist.pdf) and as [png](images/EDA_Checklist.png). The icons are from [The Noun Project](https://thenounproject.com).
9 |
10 | For a detailed explanation, check the [EDA_Checklist](EDA_Checklist.md) description.
11 |
12 | ![EDA](images/EDA_Checklist.png?raw=true "EDA Checklist")
13 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 neuefische GmbH
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/EDA_Checklist.md:
--------------------------------------------------------------------------------
1 |

The EDA Checklist Explained

2 |
3 |
4 |
5 | - [Understanding](#understanding)
6 | - [Hypothesis](#hypothesis)
7 | - [Explore](#explore)
8 | - [Clean](#clean)
9 | - [Relationships](#relationships)
10 | - [Back to the hypothesis](#back-to-the-hypothesis)
11 | - [Fine tune](#fine-tune)
12 | - [Explain](#explain)
13 |
14 |
15 |
16 |
17 | There is no one-size-fits-all blueprint for how to do **Exploratory Data Analysis (EDA)**. A lot of EDA work is like detective work: looking for a suspect, following clues and changing direction based on the clues. Getting good at EDA does require practice, though. To help along, we created a small checklist for EDA. This is not an exhaustive list of EDA steps, but it is a minimal checklist.
18 |
19 | The infographic is available as [pdf](pdfs/EDA_Checklist.pdf) and as [png](images/EDA_Checklist.png). The icons are from [The Noun Project](https://thenounproject.com).
20 |
21 |
22 | ![EDA](images/EDA_Checklist.png?raw=true "EDA Checklist")
23 |
24 |
25 | ## Understanding
26 | What are you trying to achieve? Think about the problem and the data, check the description and get familiar with the domain. This is the time to look at the data description and check what columns you have and what types of values they hold.
27 |
28 | ## Hypothesis
29 | It is good practice to write down some assumptions and expectations about the data before you start analyzing. You can then try to confirm or refute them during the EDA.
30 |
31 | ## Explore
32 | This is the time to explore the data: check for missing data, extreme values and outliers. Look at the usual suspects when you look at values: the appearance of groups, skewness, the appearance of unexpected values, where the data values are centered and how widely they are spread. Does all this meet your expectations from your domain knowledge?
33 |
34 | ## Clean
35 | This one is rather clear: deal with extreme values and with missing values. Are the extreme values really outliers? Are the missing values missing for a reason or is it all random?
36 | Do your variables need to be re-expressed in order to make plots clearer?
37 |
38 | ## Relationships
39 | Are your variables correlated? Do the correlations make sense, or could they be caused by confounding factors?
40 |
41 | ## Back to the hypothesis
42 | Do not forget to check the hypotheses you made at the beginning. You might need to revisit the explore, clean and relationships phases several times until you find your answers and are confident that they are correct.
43 |
44 | ## Fine tune
45 | Several types of plots can make a good analysis hard to read and follow: plots that are too small or too large, plots missing axis information and descriptions, redundant plots and non-relevant plots.
46 |
47 | Remember that plots should be there to either confirm the expected or to show the unexpected.
48 | If you are uncomfortable deleting all the work you did, feel free to put it in an auxiliary notebook; maybe you will open it again.
49 |
50 | ## Explain
51 | The analysis we admire is usually the one that is easy to follow, so why not do the same? Add explanations to your actions, so people reading can follow your thought process: why you did things and how. This can also help in the future or in the present, as you might discover you missed something.
52 | A good practice is to keep track of this information while you do the analysis. You can even write it all down on paper and, at the end, put the important things in the right place in your analysis.
53 |
54 | ----
55 |
56 |
57 |
--------------------------------------------------------------------------------