├── .gitignore ├── Homework 1 ├── Homework1.md ├── ikea_data.csv └── img │ ├── ikea-logo.png │ ├── task4.png │ ├── task5.png │ ├── task6.png │ └── task7.png ├── Homework 2 ├── Homework2.md ├── Viet Nam-1975.csv ├── Viet Nam-1980.csv ├── Viet Nam-1985.csv ├── Viet Nam-1990.csv ├── Viet Nam-1995.csv ├── Viet Nam-2000.csv ├── Viet Nam-2005.csv ├── Viet Nam-2010.csv ├── Viet Nam-2015.csv ├── Viet Nam-2020.csv ├── Viet Nam-2025.csv ├── college_data_normalized.csv └── img │ ├── banner.png │ ├── task1.png │ ├── task2.png │ ├── task3.png │ ├── task4.png │ ├── task5.png │ └── task6.png ├── Homework 3 ├── Homework3.md ├── data │ ├── dog_travel.csv │ ├── median-housing 1.csv │ └── scorecard-models.RData └── img │ ├── banner.png │ ├── rcis.png │ ├── task2.png │ ├── task2b.png │ └── task2c.png ├── LICENSE.md ├── README.md ├── Week 1 ├── Week1-AE-Notes.md ├── Week1-AE-RMarkdown.Rmd ├── Week1-AE-RMarkdown.html ├── Week1-AE-RMarkdown.pdf └── img │ ├── Minard_Update.png │ ├── Untitled 1.png │ ├── Untitled 2.png │ ├── Untitled 3.png │ ├── Untitled 4.png │ ├── Untitled 5.png │ ├── Untitled 6.png │ ├── Untitled 7.png │ ├── Untitled 8.png │ ├── Untitled 9.png │ ├── Untitled.png │ └── olympic_feathers_print.jpeg ├── Week 10 ├── Week10-AE-Notes.md ├── Week10-AE-RMarkdown-Key.Rmd ├── Week10-AE-RMarkdown.Rmd ├── img │ ├── banner.png │ ├── choropleth.png │ ├── geomsf.png │ ├── shame.png │ ├── task1.png │ ├── task2.png │ ├── task3.png │ ├── task4-optional.png │ ├── task4.png │ └── task5.png ├── ny-counties.rds ├── ny-inc.rds └── nyc-311.csv ├── Week 11 ├── Week11-AE-Notes.md ├── Week11-AE-RMarkdown.Rmd ├── ad_aqi_tracker_data-2023.csv ├── freedom.csv └── img │ ├── syracuse.png │ ├── task0.png │ ├── task1.png │ ├── task2.png │ ├── task3.png │ ├── task4.png │ ├── task5.png │ ├── task6-1.gif │ └── task6-2.gif ├── Week 12 ├── Week12-AE-RMarkdown.Rmd └── Week12-Lecture-Examples.Rmd ├── Week 2 ├── Week2-AE-Notes.md ├── Week2-AE-RMarkdown.Rmd ├── Week2-AE-RMarkdown.html ├── Week2-AE-RMarkdown.pdf ├── homesales.csv └── img │ ├── banner.png │ ├── dataink.png │ ├── dataink2.jpg │ ├── housedata1.png │ ├── housedata10.png │ ├── housedata11.png │ ├── housedata2.png │ ├── housedata3.png │ ├── housedata4.png │ ├── housedata5.png │ ├── housedata6.png │ ├── housedata7.png │ ├── housedata8.png │ ├── housedata9.png │ └── piechart.png ├── Week 4 ├── Week4-AE-Notes.md ├── img │ ├── Truncated-Y-Axis-Data-Visualizations-Designed-To-Mislead.jpg │ ├── banner.png │ ├── task1.jpg │ ├── task2.jpg │ ├── task3.jpg │ ├── task4.jpg │ ├── task5.jpg │ ├── task6.jpg │ ├── waffle1.jpg │ ├── waffle2.png │ ├── waffle3.jpg │ ├── waffle4.jpg │ └── waffle5.jpg └── worldometer_data.csv ├── Week 5 ├── Week5-AE-Notes.md ├── college_data_normalized.csv ├── demo.Rmd └── img │ ├── ae1.png │ ├── ae2.png │ ├── ae3.png │ ├── ae4.png │ ├── banner.png │ └── shame.jpg ├── Week 6 ├── Week6-AE-Notes.md ├── Week6-AE-RMarkdown.Rmd ├── fjc-judges.RData ├── img │ ├── banner.jpg │ ├── shame.jpg │ └── staff-employment.png └── instructional-staff.csv ├── Week 7 ├── Week7-AE-Notes.md ├── Week7-AE-RMarkdown.Rmd └── img │ ├── banner.gif │ ├── fame.jpg │ ├── notes1.jpg │ ├── task1.jpg │ ├── task2.jpg │ ├── task3.png │ ├── task4.png │ ├── task5.png │ ├── task6.png │ ├── task7.png │ └── task8.png └── Week 9 ├── Week9-AE-Notes.md ├── Week9-AE-RMarkdown.Rmd ├── img ├── avoid1.jpg ├── avoid2.jpg ├── banner.png ├── colorblind-game.png ├── contrast.png ├── dont.jpg ├── task0.png ├── task1.png ├── task2.png └── task3.png └── nurses.csv /.gitignore: 
-------------------------------------------------------------------------------- 1 | demo.Rmd 2 | demo.Rmd 3 | *Solution.Rmd 4 | Homework 2/.RData 5 | Homework 2/.Rhistory 6 | *demo.Rmd 7 | -------------------------------------------------------------------------------- /Homework 1/Homework1.md: -------------------------------------------------------------------------------- 1 | ![Banner](img/ikea-logo.png) 2 | 3 | # COMP4010/5120 Data Visualization - Homework 1 Question Set 4 | 5 | This assignment dives into the world of data manipulation and visualization using the `tidyverse` package in R. You'll work with [an IKEA furniture dataset](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-11-03/), exploring various techniques to clean, transform, and visually represent the data. 6 | 7 | Throughout the assignment, you'll encounter seven tasks, each requiring you to write R code to achieve specific data manipulation and visualization goals. 8 | 9 | Feel free to experiment, explore different approaches, and consult online resources if needed. Remember to write clear and concise code with meaningful variable names and comments to enhance readability and understanding. 10 | 11 | Dataset: The dataset, [`ikea_data.csv`](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-11-03/), contains information about various IKEA furniture items, including their names, categories, prices, dimensions, and designers. 12 | 13 | ### Data dictionary 14 | 15 | |variable |class |description | 16 | |:-----------------|:---------|:-----------| 17 | |item_id |double | Item ID (irrelevant to us) | 18 | |name |character | Commercial name of items | 19 | |category |character | The furniture category | 20 | |price |double | Current price in Saudi Riyals | 21 | |old_price |character | Price of item in Saudi Riyals before discount | 22 | |sellable_online |logical | Sellable online (boolean) | 23 | |link |character | Link to the item | 24 | |other_colors |character | Whether other colors are available for the item (boolean) | 25 | |short_description |character | Description of the item | 26 | |designer |character | Designer who designed the item | 27 | |depth |double | Depth of the item in centimeters | 28 | |height |double | Height of the item in centimeters | 29 | |width |double | Width of the item in centimeters | 30 | 31 | # Submission requirements 32 | Similar to the weekly AEs, you should submit your work in 2 formats: `Rmd` and `pdf` to Canvas. 33 | Answers for non-coding questions should be added in the file as plain text. 34 | 35 | 36 | ## Task 1: Converting prices to USD 37 | 38 | Convert the price column from Saudi Riyals (SAR) to USD, using an exchange rate of 1 SAR = 0.27 USD. Create a new column, `price_usd`, containing the prices in USD. 39 | 40 | ## Task 2: Splitting multiple designers into separate rows 41 | 42 | The designer column might contain multiple designers separated by "`/`". Split these entries into separate rows, creating a new data frame with one designer per row; a minimal sketch of one possible approach is shown below.
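Below is a minimal sketch of one way to approach Tasks 1 and 2, assuming the dataset has been read into a data frame called `ikea` (the name is a placeholder) and that the `tidyverse` is loaded. Any equivalent pipeline is fine.

```R
library(tidyverse)

# read the dataset (adjust the path if your file lives elsewhere)
ikea <- read_csv("ikea_data.csv")

# Task 1: add a USD price column (1 SAR = 0.27 USD)
ikea <- ikea |>
  mutate(price_usd = price * 0.27)

# Task 2: split "Designer A/Designer B" entries into one row per designer
df_split <- ikea |>
  separate_rows(designer, sep = "/")
```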
43 | 44 | For example, consider the following example dataframe `df`: 45 | | item_id |designer |price | 46 | |:-----------------|:---------|:-----------| 47 | |1| Designer A | 1 48 | |2| Designer B/Designer C/Designer D | 15 49 | 50 | Item 2 has 3 designers, so we need to create a new dataframe `df_split` by splitting the designer column: 51 | |item_id |designer |price | 52 | |:-----------------|:---------|:-----------| 53 | |1| Designer A | 1 | 54 | |2| Designer B | 15 | 55 | |2| Designer C | 15 | 56 | |2| Designer D | 15 | 57 | 58 | 59 | Hint: There are multiple ways to achieve this, but you can check out the [documentation for `separate_rows`](https://tidyr.tidyverse.org/reference/separate_rows.html) if you need help. 60 | 61 | ## Task 3: Get the top 20 designers by number of items, excluding NA and IKEA of Sweden 62 | 63 | Based on `df_split` in Task 2, find the top 20 designers with the most products (excluding "`IKEA of Sweden`" and entries with missing values in the `designer` column) and store the result in a new dataframe called `top_designers`. The dataframe should have 2 columns showing the designer name and the corresponding number of items made by that designer: 64 | 65 | | designer | num_items| 66 | |:-----------------|:---------| 67 | | Designer Z | 74 | 68 | | Designer X | 107 | 69 | | Designer Y | 67 | 70 | | ... | ... | 71 | 72 | Hint: You may find the following functions useful: 73 | - [`filter()`](https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/filter) 74 | - [`is.na()`](https://www.statology.org/is-na/) 75 | - [`group_by()`](https://datacarpentry.org/R-genomics/04-dplyr.html#split-apply-combine_data_analysis_and_the_summarize()_function) 76 | - [`summarize()`](https://datacarpentry.org/R-genomics/04-dplyr.html#split-apply-combine_data_analysis_and_the_summarize()_function) 77 | - [`n()`](https://www.statology.org/n-function-in-r/) 78 | - [`top_n()`](https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/top_n) 79 | 80 | In your RMarkdown, answer the following question(s): 81 | - Who are the top 3 designers by number of items? 82 | - What is the number of items designed by IKEA of Sweden or by unknown designers? Why should we exclude them from the analysis? 83 | 84 | ## Task 4: Visualizing price distribution per designer with a box plot 85 | Create a boxplot to visualize the distribution of item prices (USD) for each designer identified in `top_designers`. Your plot should place designers on the x-axis and prices on the y-axis. 86 | Add appropriate axis labels and a plot title. 87 | 88 | An example plot (yours may look different): 89 | 90 | ![Box plot](img/task4.png) 91 | 92 | In your RMarkdown, answer the following question(s): 93 | - In 3-4 sentences, briefly describe the key findings from this plot. (Open ended question - write down anything useful you can see from this plot) 94 | - In your opinion, is this an effective visualization? If not, what is your suggestion to improve it? 95 | 96 | ## Task 5: Distribution of items per category with a lollipop chart 97 | In the original dataset, count the number of items in each `category` and visualize the distribution using a lollipop chart (recall Weekly AE 2). Create a lollipop chart with categories on the x-axis, item counts on the y-axis, and informative labels and a title; a minimal sketch of the geoms typically involved is shown below. 98 | 99 | **Optional**: Sort the categories in descending order of item count.
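As referenced in Task 5, one common recipe for a lollipop chart in `ggplot2` is `geom_segment()` for the stick plus `geom_point()` for the head. A minimal sketch, assuming a summary data frame `category_counts` with columns `category` and `n` (both placeholder names):

```R
ggplot(category_counts, aes(x = reorder(category, -n), y = n)) +
  geom_segment(aes(xend = reorder(category, -n), yend = 0)) +  # the "stick"
  geom_point(size = 2) +                                       # the "head"
  labs(x = "Category", y = "Number of items", title = "Items per category") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```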
100 | 101 | An example plot (yours may look different): 102 | 103 | ![Lollipop plot](img/task5.png) 104 | 105 | In your RMarkdown, answer the following question(s): 106 | - In 3-4 sentences, briefly describe the key findings from this plot. (Open ended question - write down anything useful you can see from this plot) 107 | - In your opinion, is this an effective visualization? If not, what is your suggestion to improve it? 108 | 109 | ## Task 6: Average price per category with a lollipop chart 110 | Calculate the average price (USD) for each category and visualize the distribution using a lollipop chart. 111 | 112 | **Optional**: Sort the categories by average price in descending order. 113 | 114 | An example plot (yours may look different): 115 | 116 | ![Lollipop plot](img/task6.png) 117 | 118 | In your RMarkdown, answer the following question(s): 119 | - In 3-4 sentences, briefly describe the key findings from this plot. (Open ended question - write down anything useful you can see from this plot) 120 | - In your opinion, is this an effective visualization? If not, what is your suggestion to improve it? 121 | 122 | ## Task 7: Price vs. volume relationship with a scatter plot 123 | The volume of each item is the product of its `height`, `width`, and `depth`. Plot a scatter plot with volume on the y-axis and price in USD on the x-axis. Color the points by `category`. 124 | 125 | An example plot (yours may look different): 126 | 127 | ![Scatter plot](img/task7.png) 128 | 129 | In your RMarkdown, answer the following question(s): 130 | - In 3-4 sentences, briefly describe the key findings from this plot. (Open ended question - write down anything useful you can see from this plot) 131 | - In your opinion, is this an effective visualization? If not, what is your suggestion to improve it?
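For reference, here is a minimal sketch of the kind of plot Task 7 asks for, assuming the data frame with the converted prices from Task 1 is called `ikea` (a placeholder name). Rows with missing dimensions will simply be dropped by `geom_point()`.

```R
ikea |>
  mutate(volume = height * width * depth) |>        # volume in cubic centimeters
  ggplot(aes(x = price_usd, y = volume, color = category)) +
  geom_point(alpha = 0.6) +
  labs(x = "Price (USD)", y = "Volume (cubic cm)",
       title = "Price vs. volume of IKEA items")
```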
-------------------------------------------------------------------------------- /Homework 1/img/ikea-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 1/img/ikea-logo.png -------------------------------------------------------------------------------- /Homework 1/img/task4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 1/img/task4.png -------------------------------------------------------------------------------- /Homework 1/img/task5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 1/img/task5.png -------------------------------------------------------------------------------- /Homework 1/img/task6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 1/img/task6.png -------------------------------------------------------------------------------- /Homework 1/img/task7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 1/img/task7.png -------------------------------------------------------------------------------- /Homework 2/Homework2.md: -------------------------------------------------------------------------------- 1 | ![Banner](img/banner.png) 2 | 3 | # COMP4010/5120 Data Visualization - Homework 2 Question Set 4 | 5 | This assignment introduces you to a new type of plot, the [ridgeline plot](https://www.data-to-viz.com/graph/ridgeline.html), and also requires you to apply the skills you have learned so far in the course. We will be using the same fictional VinUni enrolment data from Week 5 AE, plus some Vietnam population data to construct a [population pyramid plot](https://education.nationalgeographic.org/resource/population-pyramid/). 6 | 7 | # Submission requirements 8 | 9 | Similar to the weekly AEs and prior homeworks, you should submit your work in 2 formats: `Rmd` and `pdf` to Canvas. 10 | Answers for non-coding questions should be added in the file as plain text. 11 | 12 | ## Task 1: Creating a ridgeline plot for VinUni enrolment data 13 | 14 | ### Summary of task requirements 15 | 16 | - Recreate the example plot to the best of your ability (minor differences like font size and figure size are accepted). 17 | - Provide a short comment about what information you are able to interpret from this ridgeline plot. 18 | 19 | ### Task description 20 | 21 | Install the package `ggridges` ([quick beginner's guide](https://cran.r-project.org/web/packages/ggridges/vignettes/introduction.html)). Use `ggridges` to create a ridgeline plot of the distribution of the percentage of enrolled students by college in the fictional VinUni dataset [`college_data_normalized.csv`](college_data_normalized.csv); a minimal sketch of the core call is shown below.
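A minimal sketch of the core `ggridges` call, assuming the CSV has been read into a data frame `college_data` with columns `year`, `college`, and `pct`; the theme function and color palette described below are still up to you:

```R
library(tidyverse)
library(ggridges)

college_data <- read_csv("college_data_normalized.csv")

# one density ridge per college, over the distribution of yearly percentages
ggplot(college_data, aes(x = pct, y = college, fill = college)) +
  geom_density_ridges(alpha = 0.8) +
  labs(x = "Percentage of enrolled students", y = "College")
```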
22 | 23 | Recreate the following plot to the best of your ability (minor differences like font size and figure size are accepted), and provide a short comment about what information you are able to interpret from this ridgeline plot: 24 | 25 | ![Task 1](img/task1.png) 26 | 27 | You should do this by creating a function called `theme_college_stats()` to store your theme for the following exercises. In particular, the theme should have the following properties: 28 | 29 | - Centered and bold plot title. 30 | - Only y-axis major grid lines are visible. 31 | - Axis titles and tick labels should be bold. 32 | - Font size should be readable and does not need to be identical to the example plot provided above. 33 | 34 | Additionally, the font family used in the above plot is [Open Sans from Google Fonts](https://fonts.google.com/specimen/Open+Sans). 35 | The color palette used for color coding the colleges is: `#78d6ff` for CAS, `#ffee54` for CBM, `#992212` for CECS, and `#117024` for CHS. 36 | 37 | Finally, please provide a short comment about what information you are able to interpret from this ridgeline plot. 38 | 39 | ## Task 2: Creating a bar chart for 2118 VinUni data 40 | 41 | ### Summary of task requirements 42 | 43 | - Filter the VinUni enrolment data to extract only data from the year 2118. 44 | - Recreate the example plot to the best of your ability (minor differences like font size and figure size are accepted). 45 | 46 | ### Task description 47 | 48 | Recreate the following plot to the best of your ability (minor differences like font size and figure size are accepted): 49 | 50 | ![Task 2](img/task3.png) 51 | 52 | ## Task 3: Creating a pie chart for 2118 VinUni data 53 | 54 | ### Summary of task requirements 55 | 56 | - Recreate the example plot to the best of your ability (minor differences like font size and figure size are accepted). 57 | - Provide a short comment comparing the effectiveness of the pie chart and the bar chart from Task 2 to represent the same data. 58 | 59 | ### Task description 60 | 61 | Recreate the following plot to the best of your ability (minor differences like font size and figure size are accepted): 62 | 63 | ![Task 3](img/task2.png) 64 | 65 | Hint: If you're unsure how to create this pie chart, check out [`coord_polar()` documentation](https://ggplot2.tidyverse.org/reference/coord_polar.html). 66 | 67 | ## Task 4: Creating population pyramids for Vietnam data from 1975 to 2025 68 | 69 | ### Summary of task requirements 70 | 71 | - Create a series of 11 individual population pyramid plots for Vietnam from 1975 to 2025. 72 | - Create a theme to style the plots as close as possible to the example plot provided below (you can use other color schemes). 73 | - Provide a comment describing what information a population pyramid for a given year tells you. 74 | - Provide a comment, based on the 11 plots, describing what you think is going on with Vietnam's population. Optional: provide your hypothesis/explanation for why that is happening. 75 | 76 | ### Task description 77 | 78 | Create a series of 11 individual [population pyramid](https://education.nationalgeographic.org/resource/population-pyramid/) plots for Vietnam from 1975 to 2025, based on the data files [`Viet Nam-####.csv`](./). In particular, the x-axis should represent the percentage of total population while the y-axis should be the age group. Below is an example plot of the data for 2020: 79 | 80 | ![Task 4](img/task4.png) 81 | 82 | Please try to recreate the plot to the best of your ability.
You can pick any color scheme you would like for your plot. In the example plot above, the colors `#109466` and `#112e80` are used for female and male respectively. 83 | 84 | To avoid copying and pasting a lot of code, you can create a function called `create_population_pyramid(file_name)` to generate a population pyramid with the defined style and theme for a given data file. 85 | 86 | Please describe what information a population pyramid for a given year tells you. Based on the 11 plots you've just generated, describe what you think is going on with Vietnam's population. **Optional**: Provide your hypothesis/explanation for why that is happening. 87 | 88 | ## Task 5: Creating a line graph for the Vietnamese total population count from 1975 to 2025 89 | 90 | ### Summary of task requirements 91 | 92 | - Create a line graph for the Vietnamese total population count from 1975 to 2025. 93 | 94 | ### Task description 95 | 96 | Using all provided data files for the Vietnamese population from 1975 to 2025, create a line graph of the Vietnamese total population count from 1975 to 2025. As the data for 2025 is predicted data, plot it with a dashed line from 2020 to highlight that it is a prediction. 97 | 98 | Recreate the following plot to the best of your ability (minor differences like font size and figure size are accepted): 99 | 100 | ![Task 5](img/task5.png) 101 | 102 | You can pick any color scheme you would like for your plot. In the example plot above, the color `#112e80` is used for the lines and points and `#999999` is used for the dashed line. 103 | 104 | ## Task 6: Your own Hall of Fame/Shame 105 | 106 | Find an interesting data visualization online and provide a critique of it. It can be a great visualization or a terrible one, up to you! 107 | 108 | In 3-4 sentences: 109 | 110 | - Provide an introduction to the visualization and an image of it along with proper credit/citation. To insert an image in R Markdown, use: 111 | ```![Caption for the picture.](/path/to/image.png)``` 112 | - Describe the purpose of the visualization and the question it is attempting to answer. 113 | 114 | In bullet points: identify the strengths and weaknesses of the visualization. 115 | 116 | In two to three paragraphs (80-150 words): give a critique/analysis/constructive comments about the visualization. Apply your knowledge of effective visualization and what you've learned in the course to your analysis. Can you suggest some improvements that can be made to the visualization?
117 | 118 | Side note: inserting an image in R Markdown should look something like this: 119 | 120 | ![Task 6](img/task6.png) -------------------------------------------------------------------------------- /Homework 2/Viet Nam-1975.csv: -------------------------------------------------------------------------------- 1 | Age,M,F 2 | 0-4,3694642,3533048 3 | 5-9,3121466,3028507 4 | 10-14,2989087,2922867 5 | 15-19,2714457,2693838 6 | 20-24,2076148,2135972 7 | 25-29,1403191,1533105 8 | 30-34,1027172,1154893 9 | 35-39,984171,1109235 10 | 40-44,1010992,1158747 11 | 45-49,848021,985453 12 | 50-54,772016,901463 13 | 55-59,646767,746972 14 | 60-64,570331,678210 15 | 65-69,428088,533747 16 | 70-74,332872,443930 17 | 75-79,183815,270358 18 | 80-84,85977,143035 19 | 85-89,28564,55873 20 | 90-94,5496,14371 21 | 95-99,522,2033 22 | 100+,21,131 23 | -------------------------------------------------------------------------------- /Homework 2/Viet Nam-1980.csv: -------------------------------------------------------------------------------- 1 | Age,M,F 2 | 0-4,4086624,3896241 3 | 5-9,3601373,3485693 4 | 10-14,3020584,2990706 5 | 15-19,2838616,2875253 6 | 20-24,2608580,2663069 7 | 25-29,2021410,2110407 8 | 30-34,1359024,1506636 9 | 35-39,989298,1134796 10 | 40-44,951458,1094749 11 | 45-49,977491,1143917 12 | 50-54,811901,964290 13 | 55-59,727906,874578 14 | 60-64,590842,707845 15 | 65-69,495372,620876 16 | 70-74,345272,462280 17 | 75-79,236716,347983 18 | 80-84,107137,179331 19 | 85-89,36725,72510 20 | 90-94,7448,19518 21 | 95-99,732,2846 22 | 100+,33,196 23 | -------------------------------------------------------------------------------- /Homework 2/Viet Nam-1985.csv: -------------------------------------------------------------------------------- 1 | Age,M,F 2 | 0-4,4526915,4311312 3 | 5-9,4006081,3859743 4 | 10-14,3537375,3461674 5 | 15-19,2907232,2951899 6 | 20-24,2729170,2841305 7 | 25-29,2542777,2635707 8 | 30-34,1971260,2082154 9 | 35-39,1317979,1484041 10 | 40-44,957636,1118555 11 | 45-49,921444,1080490 12 | 50-54,938433,1121204 13 | 55-59,767940,936689 14 | 60-64,670707,835130 15 | 65-69,517712,652171 16 | 70-74,404676,543594 17 | 75-79,249893,368246 18 | 80-84,140976,235917 19 | 85-89,47322,94088 20 | 90-94,10036,26384 21 | 95-99,1036,4061 22 | 100+,48,290 23 | -------------------------------------------------------------------------------- /Homework 2/Viet Nam-1990.csv: -------------------------------------------------------------------------------- 1 | Age,M,F 2 | 0-4,4670777,4448662 3 | 5-9,4448635,4277100 4 | 10-14,3959308,3842936 5 | 15-19,3451933,3431582 6 | 20-24,2813886,2923031 7 | 25-29,2666815,2817462 8 | 30-34,2489071,2609294 9 | 35-39,1923492,2059038 10 | 40-44,1281928,1465924 11 | 45-49,929012,1104419 12 | 50-54,885849,1060693 13 | 55-59,888640,1090833 14 | 60-64,710108,898816 15 | 65-69,590784,775701 16 | 70-74,425791,577455 17 | 75-79,295573,440848 18 | 80-84,150868,256699 19 | 85-89,62777,128777 20 | 90-94,13669,36413 21 | 95-99,1525,5949 22 | 100+,69,459 23 | -------------------------------------------------------------------------------- /Homework 2/Viet Nam-1995.csv: -------------------------------------------------------------------------------- 1 | Age,M,F 2 | 0-4,4480472,4260623 3 | 5-9,4595982,4415086 4 | 10-14,4414726,4268137 5 | 15-19,3928244,3833046 6 | 20-24,3405738,3418352 7 | 25-29,2769092,2908102 8 | 30-34,2621960,2801018 9 | 35-39,2443493,2592424 10 | 40-44,1882183,2041223 11 | 45-49,1247959,1448627 12 | 50-54,894373,1085734 13 | 55-59,839338,1034514 14 | 
60-64,822764,1051791 15 | 65-69,628519,844696 16 | 70-74,488346,697378 17 | 75-79,313255,480197 18 | 80-84,180199,319823 19 | 85-89,68052,148908 20 | 90-94,18329,54131 21 | 95-99,2117,9266 22 | 100+,105,780 23 | -------------------------------------------------------------------------------- /Homework 2/Viet Nam-2000.csv: -------------------------------------------------------------------------------- 1 | Age,M,F 2 | 0-4,3678636,3482725 3 | 5-9,4430164,4238865 4 | 10-14,4554958,4400237 5 | 15-19,4357328,4241607 6 | 20-24,3880755,3813519 7 | 25-29,3382369,3406481 8 | 30-34,2750254,2899375 9 | 35-39,2595111,2791177 10 | 40-44,2408404,2577420 11 | 45-49,1846777,2023776 12 | 50-54,1212804,1429674 13 | 55-59,854513,1062360 14 | 60-64,782876,1000858 15 | 65-69,732617,994023 16 | 70-74,522859,766157 17 | 75-79,362475,588576 18 | 80-84,193320,357017 19 | 85-89,82564,192744 20 | 90-94,20254,66003 21 | 95-99,2875,15115 22 | 100+,148,1362 23 | -------------------------------------------------------------------------------- /Homework 2/Viet Nam-2005.csv: -------------------------------------------------------------------------------- 1 | Age,M,F 2 | 0-4,3673690,3421596 3 | 5-9,3645906,3460996 4 | 10-14,4357498,4198635 5 | 15-19,4280693,4236040 6 | 20-24,4061281,4043894 7 | 25-29,3806480,3688444 8 | 30-34,3393297,3340381 9 | 35-39,2745868,2870666 10 | 40-44,2584660,2775668 11 | 45-49,2399863,2566785 12 | 50-54,1830224,2011950 13 | 55-59,1181752,1403330 14 | 60-64,812941,1024571 15 | 65-69,706730,946525 16 | 70-74,611051,905409 17 | 75-79,389821,651462 18 | 80-84,224921,442767 19 | 85-89,89264,219561 20 | 90-94,24755,87404 21 | 95-99,3209,19441 22 | 100+,202,2451 23 | -------------------------------------------------------------------------------- /Homework 2/Viet Nam-2010.csv: -------------------------------------------------------------------------------- 1 | Age,M,F 2 | 0-4,3741588,3387806 3 | 5-9,3643329,3404214 4 | 10-14,3610494,3443027 5 | 15-19,4202547,4109705 6 | 20-24,4073782,4111665 7 | 25-29,3981072,3959360 8 | 30-34,3788085,3639043 9 | 35-39,3363334,3312905 10 | 40-44,2716365,2851761 11 | 45-49,2551428,2756603 12 | 50-54,2347015,2541429 13 | 55-59,1759579,1973885 14 | 60-64,1112588,1359315 15 | 65-69,732028,971634 16 | 70-74,590187,864554 17 | 75-79,456549,773649 18 | 80-84,242899,494433 19 | 85-89,104331,276123 20 | 90-94,26942,101798 21 | 95-99,3935,26470 22 | 100+,229,3319 23 | -------------------------------------------------------------------------------- /Homework 2/Viet Nam-2015.csv: -------------------------------------------------------------------------------- 1 | Age,M,F 2 | 0-4,3947110,3538149 3 | 5-9,3712754,3375324 4 | 10-14,3620222,3396949 5 | 15-19,3580980,3431778 6 | 20-24,4149889,4092957 7 | 25-29,4024289,4095666 8 | 30-34,3931499,3941614 9 | 35-39,3731891,3618417 10 | 40-44,3302109,3287568 11 | 45-49,2653235,2822345 12 | 50-54,2466112,2716899 13 | 55-59,2234321,2488514 14 | 60-64,1640149,1915359 15 | 65-69,995877,1292890 16 | 70-74,613443,889638 17 | 75-79,442039,740313 18 | 80-84,285623,589422 19 | 85-89,113576,310718 20 | 90-94,31763,129407 21 | 95-99,4340,31362 22 | 100+,284,4595 23 | -------------------------------------------------------------------------------- /Homework 2/Viet Nam-2020.csv: -------------------------------------------------------------------------------- 1 | Age,M,F 2 | 0-4,3914415,3515182 3 | 5-9,3918151,3525629 4 | 10-14,3690690,3368547 5 | 15-19,3592890,3386916 6 | 20-24,3537280,3418117 7 | 25-29,4099736,4077268 8 | 30-34,3974712,4077902 9 | 
35-39,3874635,3920484 10 | 40-44,3665987,3592153 11 | 45-49,3228325,3255484 12 | 50-54,2567440,2783503 13 | 55-59,2351065,2662918 14 | 60-64,2086301,2417969 15 | 65-69,1469944,1824485 16 | 70-74,839870,1189523 17 | 75-79,462156,766667 18 | 80-84,277636,568363 19 | 85-89,134475,375856 20 | 90-94,34972,148902 21 | 95-99,5157,40949 22 | 100+,316,5705 23 | -------------------------------------------------------------------------------- /Homework 2/Viet Nam-2025.csv: -------------------------------------------------------------------------------- 1 | Age,M,F 2 | 0-4,3693730,3341723 3 | 5-9,3889316,3501707 4 | 10-14,3859798,3501012 5 | 15-19,3596688,3313053 6 | 20-24,3523399,3343511 7 | 25-29,3565336,3420389 8 | 30-34,4122372,4066148 9 | 35-39,3940209,4038239 10 | 40-44,3805911,3871350 11 | 45-49,3573473,3537612 12 | 50-54,3106736,3192372 13 | 55-59,2421136,2706316 14 | 60-64,2177102,2575229 15 | 65-69,1868542,2302988 16 | 70-74,1240881,1680082 17 | 75-79,638319,1030491 18 | 80-84,292072,590966 19 | 85-89,130530,361834 20 | 90-94,41247,180620 21 | 95-99,5720,47634 22 | 100+,399,7778 23 | -------------------------------------------------------------------------------- /Homework 2/college_data_normalized.csv: -------------------------------------------------------------------------------- 1 | year,college,pct 2 | 2105,CBM,0.598425 3 | 2106,CBM,0.6615047779779688 4 | 2107,CBM,0.3971112533138313 5 | 2108,CBM,0.41129442916793973 6 | 2109,CBM,0.4251509105144849 7 | 2110,CBM,0.43469596320899334 8 | 2111,CBM,0.42034424444598883 9 | 2112,CBM,0.4281175536139794 10 | 2113,CBM,0.434933723243606 11 | 2114,CBM,0.4274814637646496 12 | 2115,CBM,0.4368950555476089 13 | 2116,CBM,0.45135 14 | 2117,CBM,0.4224771553436631 15 | 2118,CBM,0.4020581168683984 16 | 2119,CBM,0.3827852384477107 17 | 2120,CBM,0.3396724986988029 18 | 2121,CBM,0.3363424403658756 19 | 2122,CBM,0.3318401649092207 20 | 2123,CBM,0.3014243451343038 21 | 2124,CBM,0.2848104801538449 22 | 2125,CBM,0.2708468062389401 23 | 2105,CAS,0.0000000000000000 24 | 2106,CAS,0.0000000000000000 25 | 2107,CAS,0.2802358533686809 26 | 2108,CAS,0.2880130359507078 27 | 2109,CAS,0.2777006036420579 28 | 2110,CAS,0.2843893714869698 29 | 2111,CAS,0.2966386359950888 30 | 2112,CAS,0.3150025697332152 31 | 2113,CAS,0.3112730410604281 32 | 2114,CAS,0.2922028222913179 33 | 2115,CAS,0.2437896342917084 34 | 2116,CAS,0.2183250000000002 35 | 2117,CAS,0.2292163289630512 36 | 2118,CAS,0.2471780225758193 37 | 2119,CAS,0.2621973141588241 38 | 2120,CAS,0.2780758297633823 39 | 2121,CAS,0.2759211653813196 40 | 2122,CAS,0.2591968603821454 41 | 2123,CAS,0.2455672890421426 42 | 2124,CAS,0.2567813349718876 43 | 2125,CAS,0.2634276664873313 44 | 2105,CECS,0.221725000000000 45 | 2106,CECS,0.172349305539717 46 | 2107,CECS,0.106522534052472 47 | 2108,CECS,0.082951420714940 48 | 2109,CECS,0.081150708458565 49 | 2110,CECS,0.056642820643842 50 | 2111,CECS,0.063937730210577 51 | 2112,CECS,0.062374433490632 52 | 2113,CECS,0.066156906924902 53 | 2114,CECS,0.072518536235350 54 | 2115,CECS,0.081986363419634 55 | 2116,CECS,0.104075000000000 56 | 2117,CECS,0.139327572506952 57 | 2118,CECS,0.173330613355093 58 | 2119,CECS,0.189468118853759 59 | 2120,CECS,0.224646674940945 60 | 2121,CECS,0.247962376198162 61 | 2122,CECS,0.264568302544993 62 | 2123,CECS,0.300523108873258 63 | 2124,CECS,0.304196364150193 64 | 2125,CECS,0.315810182777259 65 | 2105,CHS,0.1776000000000000 66 | 2106,CHS,0.1620179259698497 67 | 2107,CHS,0.2161303592650151 68 | 2108,CHS,0.2177411141664120 69 | 2109,CHS,0.2159977773848913 70 | 
2110,CHS,0.2242718446601941 71 | 2111,CHS,0.2190793893483448 72 | 2112,CHS,0.1945054431621735 73 | 2113,CHS,0.1876363287710635 74 | 2114,CHS,0.2077971777086821 75 | 2115,CHS,0.2373289467410482 76 | 2116,CHS,0.2262500000000000 77 | 2117,CHS,0.2089789431863329 78 | 2118,CHS,0.1774332472006890 79 | 2119,CHS,0.1655493285397060 80 | 2120,CHS,0.1576049965968691 81 | 2121,CHS,0.1397740180546422 82 | 2122,CHS,0.1443946721636406 83 | 2123,CHS,0.1524852569502948 84 | 2124,CHS,0.1542118207240741 85 | 2125,CHS,0.1499153444964689 86 | -------------------------------------------------------------------------------- /Homework 2/img/banner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 2/img/banner.png -------------------------------------------------------------------------------- /Homework 2/img/task1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 2/img/task1.png -------------------------------------------------------------------------------- /Homework 2/img/task2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 2/img/task2.png -------------------------------------------------------------------------------- /Homework 2/img/task3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 2/img/task3.png -------------------------------------------------------------------------------- /Homework 2/img/task4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 2/img/task4.png -------------------------------------------------------------------------------- /Homework 2/img/task5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 2/img/task5.png -------------------------------------------------------------------------------- /Homework 2/img/task6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 2/img/task6.png -------------------------------------------------------------------------------- /Homework 3/Homework3.md: -------------------------------------------------------------------------------- 1 | ![Banner](img/banner.png) 2 | 3 | # COMP4010/5120 Data Visualization - Homework 3 Question Set 4 | 5 | This assignment requires you to demonstrate your knowledge of visualizing geospatial data (choropleth map) and creating and publishing an interactive Shiny dashboard. Additionally, you will also be introduced to `DALEX`, a library for ML model interpretation. This was supposed to be covered during an AE session; however, due to scheduling, a brief introductory exercise is included here instead. 6 | 7 | The assignment is separated into 3 main sections, each using a different dataset.
8 | 9 | The maximum points for this assignment is **20 points**. 10 | 11 | # Submission requirements 12 | 13 | Similar to the weekly AEs and prior homeworks, you should submit your work in 2 formats: `Rmd` and `pdf` to Canvas. 14 | The entire assignment should be contained in 1 `Rmd` file. 15 | 16 | Answers for non-coding questions should be added in the file as plain text in the provided `ANSWER` field. 17 | 18 | # Section 1: Studying popularity of baby names 19 | 20 | For this set of exercises, we are going to use the [`babynames` dataset](https://hadley.github.io/babynames/). 21 | 22 | Install the dataset by running: 23 | 24 | ```R 25 | # Install the released version from CRAN 26 | install.packages("babynames") 27 | 28 | library(babynames) 29 | 30 | applicants_df <- applicants 31 | babynames_df <- babynames 32 | lifetables_df <- lifetables 33 | births_df <- births 34 | ``` 35 | 36 | The four included datasets in this package are: 37 | 38 | - `babynames`: For each year from 1880 to 2017, the number of children of each sex given each name. All names with more than 5 uses are given. 39 | 40 | - `applicants`: The number of applicants for social security numbers (SSN) for each year for each sex. 41 | 42 | - `lifetables`: Cohort life tables data. 43 | 44 | It also includes the following data set from the US Census: 45 | 46 | - `births`: Number of live births by year, up to 2017. 47 | You may need to use multiple files, but the most important one is `babynames`. Feel free to bring in additional data sources as you wish. 48 | 49 | ## Task 1: Shiny Dashboard for Baby Naming Trends (10 pts) 50 | 51 | ### Summary of task requirements 52 | 53 | Design and implement a Shiny dashboard that reports on baby naming trends. The dashboard should appropriately use server-side interactivity. 54 | 55 | 1. **Originality**: 56 | - Avoid direct code copying from examples. 57 | - Create an original implementation. 58 | 59 | 2. **Inputs and Outputs**: 60 | - Use at least **two user inputs**. 61 | - Provide at least **three reactive outputs** (e.g., plots, tables, text). 62 | 63 | 3. **Components**: 64 | - Include at least: 65 | - One **plot** 66 | - One **table** 67 | - One **value box** 68 | - Utilize **plotly** for interactivity in at least one plot. 69 | 70 | 4. **Styling and Customization**: 71 | - Apply an **appropriate theme** for styling. 72 | - Optional: Customize further with **CSS/SCSS** if desired. 73 | 74 | 5. **Creativity and Beyond Basics**: 75 | - Extend beyond basic functionality. 76 | - Consider user experience, customization, interactivity, data sources, and insights. 77 | 78 | 6. **Publishing**: 79 | - Deploy the dashboard on **Shinyapps.io**. 80 | - Clearly print the **working app URL** in the rendered PDF for evaluation. 81 | 82 | Remember to showcase creativity and thoughtful design in your implementation! Good luck! 83 | 84 | ### Task description 85 | 86 | **Create a Shiny dashboard to report on baby naming trends.** Design and implement a Shiny dashboard that appropriately uses server-side interactivity. 87 | 88 | There are tons of examples online of Shiny apps made using the `babynames` package. You may use them for design inspiration but your implementation must be original. Don’t directly copy code from examples, and if you draw on specific features and/or code be sure to cite it. 89 | 90 | - You must use at least two user inputs and three reactive outputs (e.g. plots, tables, text). 91 | 92 | - The dashboard should include at least one plot, one table, and one value box. 
93 | - You should use `plotly` for at least one plot and customize it to maximize its interactivity. 94 | - Use an appropriate theme for styling. Feel free to further customize with CSS/SCSS if you have background knowledge, but it’s not required. 95 | 96 | **If you create a basic dashboard, expect a basic grade.** We’re looking for extending beyond the basics and some originality/creativity. Things you might think to incorporate into the dashboard: 97 | 98 | - User experience: e.g. layout (pages, tabs, arrangement of cards, etc.), design, etc. 99 | - Customization: e.g. color palettes, themes, fonts, customized theme() to blend ggplot2 plots with the dashboard theme, etc. 100 | - Interactivity: e.g. hover effects, click events, etc. 101 | - Data: e.g. more than just babynames.csv, additional data sources, data wrangling, etc. 102 | - Insights: e.g. text boxes, value boxes, etc. 103 | 104 | The dashboard should be published using Shinyapps.io and the URL for the working app clearly printed in your rendered PDF so we can easily access it during the evaluation. You will need to create an account on Shinyapps.io, but the free tier should be sufficient for this assignment. 105 | 106 | # Section 2: Adopt, don’t shop 107 | 108 | The data for this exercise comes from [The Pudding](https://github.com/the-pudding/data/blob/master/dog-shelters/README.md) via [TidyTuesday](https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-12-17). 109 | 110 | You will also need the 2 packages: `ggmap` and `maps`. For your convenience, a dataframe with the US state names and their corresponding abbreviations is provided. 111 | 112 | ```R 113 | #install.packages("ggmap") 114 | library(ggmap) 115 | 116 | #install.packages("maps") 117 | library(maps) 118 | 119 | state_abbreviations <- data.frame( 120 | state_name = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", 121 | "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", 122 | "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", 123 | "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", 124 | "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", 125 | "New Hampshire", "New Jersey", "New Mexico", "New York", 126 | "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", 127 | "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota", 128 | "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", 129 | "West Virginia", "Wisconsin", "Wyoming"), 130 | state_abbr = c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "ID", 131 | "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", 132 | "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", 133 | "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", 134 | "VA", "WA", "WV", "WI", "WY") 135 | ) 136 | ``` 137 | 138 | ## Task 2: Exploring the dog adoption dataset in the US (8 pts) 139 | 140 | ### Summary of task requirements 141 | 142 | - Create a dataframe containing the number of dogs available to adopt per `contact_state`. 143 | - Create a histogram of the number of dogs available to adopt and describe the distribution of this variable. 144 | - Create a choropleth map where each state is filled in with a color based on the number of dogs available to adopt in that state. 145 | 146 | ### Task description 147 | 148 | - Load the `data/dog_travel.csv` dataset included with `read_csv()`.
149 | 150 | - Calculate the number of dogs available to adopt per `contact_state`. Save the result as a new data frame with 2 columns, `contact_state` and `n`. Your new dataframe should look similar to the following example: 151 | 152 | ![Dogs available per `contact_state`](img/task2.png) 153 | 154 | - Make a histogram of the number of dogs available to adopt and describe the distribution of this variable. Use `binwidth = 50`. Your plot may look like the following example: 155 | 156 | ![Dogs histogram](img/task2b.png) 157 | 158 | - Use this dataset to make a map of the US states, where each state is filled in with a color based on the number of dogs available to adopt in that state. 159 | 160 | ![Map](img/task2c.png) 161 | 162 | *Hints*: 163 | 164 | - Use the `state_abbreviations` dataframe provided above as a lookup table to match state names to abbreviations. 165 | - Use a gradient color scale and a `log10` transformation. 166 | 167 | # Section 3: DALEX basics with explainers 168 | 169 | To start, make sure you have the following packages installed. 170 | 171 | ```R 172 | # packages for wrangling data and the original models 173 | library(tidyverse) 174 | #install.packages('tidymodels') 175 | library(tidymodels) 176 | #install.packages('rcis') 177 | library(rcis) 178 | 179 | # packages for model interpretation/explanation 180 | #install.packages('DALEX') 181 | library(DALEX) 182 | #install.packages('DALEXtra') 183 | library(DALEXtra) 184 | #install.packages('rattle') 185 | library(rattle) # fancy tree plots 186 | 187 | # set random number generator seed value for reproducibility 188 | set.seed(4010) 189 | ``` 190 | 191 | In this task we will utilize the `scorecard` dataset. With this dataset, we will interpret a set of machine learning models predicting the median student debt load for students graduating in 2020-21 at four-year colleges and universities as a function of university-specific factors (e.g. public vs. private school, admissions rate, cost of attendance). 192 | 193 | To read about the dataset, run `help(scorecard)` after importing `rcis`. 194 | 195 | ![rcis::scorecard](img/rcis.png) 196 | 197 | We have estimated three distinct machine learning models to predict the median student debt load for students graduating in 2020-21 at four-year colleges and universities. Each model uses the same set of predictors, but the algorithms differ. Specifically, we have estimated: 198 | 199 | - Random forest 200 | - Penalized regression 201 | - 10-nearest neighbors 202 | All models were estimated using tidymodels. We will load the training set, test set, and ML workflows from `data/scorecard-models.RData`. 203 | 204 | ```R 205 | # load RData file with all the data frames and pre-trained models 206 | load("data/scorecard-models.RData") 207 | ``` 208 | 209 | In order to generate our interpretations, we will use the [DALEX package](https://dalex.drwhy.ai/). The first step in any DALEX operation is to create an **explainer object**. This object contains all the information needed to interpret the model’s predictions. We will create explainer objects for each of the three models.
210 | 211 | For example, we can create an explainer object for the Penalized regression model (GLMNet) with: 212 | 213 | ```R 214 | # use explain_*() to create explainer object 215 | # first step of an DALEX operation 216 | explainer_glmnet <- explain_tidymodels( 217 | model = glmnet_wf, 218 | # data should exclude the outcome feature 219 | data = scorecard_train |> select(-debt), 220 | # y should be a vector containing the outcome of interest for the training set 221 | y = scorecard_train$debt, 222 | # assign a label to clearly identify model in later plots 223 | label = "penalized regression" 224 | ) 225 | ``` 226 | 227 | ## Task 3: Create explainer objects (1 pt) 228 | 229 | ### Summary of task requirements 230 | 231 | Fill in the `TODO` fields in the given snippets. 232 | 233 | ### Task description 234 | 235 | Review the syntax below for creating explainer objects using the `explain_tidymodels()` function. Then, create explainer objects for the random forest and 236 | k-nearest neighbors models. Fill in the `TODO` fields in the given snippets provided below. 237 | 238 | ```R 239 | # explainer for random forest model 240 | explainer_rf <- explain_tidymodels( 241 | model = TODO, 242 | data = scorecard_train |> select(-debt), 243 | y = scorecard_train$debt, 244 | label = "TODO" 245 | ) 246 | 247 | # explainer for nearest neighbors model 248 | explainer_kknn <- explain_tidymodels( 249 | model = TODO, 250 | data = scorecard_train |> select(-debt), 251 | y = scorecard_train$debt, 252 | label = "TODO" 253 | ) 254 | ``` 255 | 256 | ## Task 4: Feature importance (1 pt) 257 | 258 | ### Summary of task requirements 259 | 260 | Run the given snippets, and provide a **short answer (1 sentence)** in the `ANSWER` field included in the comment. 261 | 262 | ### Task description 263 | 264 | The DALEX package provides a variety of methods for interpreting machine learning models. One common method is to calculate feature importance. Feature importance measures the contribution of each predictor variable to the model’s predictions. We will use the `model_parts()` function to calculate feature importance for the random forest model. It includes a built-in `plot()` method using `ggplot2` to visualize the results. 265 | 266 | Here is an example, run this snippet and see what happens: 267 | 268 | ```R 269 | # generate feature importance measures 270 | vip_rf <- model_parts(explainer_rf) 271 | vip_rf 272 | 273 | # visualize feature importance 274 | plot(vip_rf) 275 | ``` 276 | 277 | Next, we want to examine the difference when using the ratio of the raw change in the loss function instead. Run the following snippet, examine the output and compare with the plot above. 278 | 279 | ```R 280 | # Question: Calculate feature importance for the random forest model using the ratio of the raw change in the loss function. How does this differ from the raw change? 
281 | # ANSWER: YOUR ANSWER HERE 282 | 283 | # calculate ratio rather than raw change 284 | model_parts(explainer_rf, type = "ratio") |> 285 | plot() 286 | ``` 287 | -------------------------------------------------------------------------------- /Homework 3/data/median-housing 1.csv: -------------------------------------------------------------------------------- 1 | DATE,MSPUS 2 | 1/1/1963,17800 3 | 4/1/1963,18000 4 | 7/1/1963,17900 5 | 10/1/1963,18500 6 | 1/1/1964,18500 7 | 4/1/1964,18900 8 | 7/1/1964,18900 9 | 10/1/1964,19400 10 | 1/1/1965,20200 11 | 4/1/1965,19800 12 | 7/1/1965,20200 13 | 10/1/1965,20300 14 | 1/1/1966,21000 15 | 4/1/1966,22100 16 | 7/1/1966,21500 17 | 10/1/1966,21400 18 | 1/1/1967,22300 19 | 4/1/1967,23300 20 | 7/1/1967,22500 21 | 10/1/1967,22900 22 | 1/1/1968,23900 23 | 4/1/1968,24900 24 | 7/1/1968,24800 25 | 10/1/1968,25600 26 | 1/1/1969,25700 27 | 4/1/1969,25900 28 | 7/1/1969,25900 29 | 10/1/1969,24900 30 | 1/1/1970,23900 31 | 4/1/1970,24400 32 | 7/1/1970,23000 33 | 10/1/1970,22600 34 | 1/1/1971,24300 35 | 4/1/1971,25800 36 | 7/1/1971,25300 37 | 10/1/1971,25500 38 | 1/1/1972,26200 39 | 4/1/1972,26800 40 | 7/1/1972,27900 41 | 10/1/1972,29200 42 | 1/1/1973,30200 43 | 4/1/1973,32700 44 | 7/1/1973,33500 45 | 10/1/1973,34000 46 | 1/1/1974,35200 47 | 4/1/1974,35600 48 | 7/1/1974,36200 49 | 10/1/1974,37200 50 | 1/1/1975,38100 51 | 4/1/1975,39000 52 | 7/1/1975,38800 53 | 10/1/1975,41200 54 | 1/1/1976,42800 55 | 4/1/1976,44200 56 | 7/1/1976,44400 57 | 10/1/1976,45500 58 | 1/1/1977,46300 59 | 4/1/1977,48900 60 | 7/1/1977,48800 61 | 10/1/1977,51600 62 | 1/1/1978,53000 63 | 4/1/1978,55300 64 | 7/1/1978,56100 65 | 10/1/1978,59000 66 | 1/1/1979,60600 67 | 4/1/1979,63100 68 | 7/1/1979,64700 69 | 10/1/1979,62600 70 | 1/1/1980,63700 71 | 4/1/1980,64000 72 | 7/1/1980,64900 73 | 10/1/1980,66400 74 | 1/1/1981,66800 75 | 4/1/1981,69400 76 | 7/1/1981,69200 77 | 10/1/1981,70400 78 | 1/1/1982,66400 79 | 4/1/1982,69600 80 | 7/1/1982,69300 81 | 10/1/1982,71600 82 | 1/1/1983,73300 83 | 4/1/1983,74900 84 | 7/1/1983,77400 85 | 10/1/1983,75900 86 | 1/1/1984,78200 87 | 4/1/1984,80700 88 | 7/1/1984,81000 89 | 10/1/1984,79900 90 | 1/1/1985,82800 91 | 4/1/1985,84300 92 | 7/1/1985,83200 93 | 10/1/1985,86800 94 | 1/1/1986,88000 95 | 4/1/1986,92100 96 | 7/1/1986,93000 97 | 10/1/1986,95000 98 | 1/1/1987,97900 99 | 4/1/1987,103400 100 | 7/1/1987,106000 101 | 10/1/1987,111500 102 | 1/1/1988,110000 103 | 4/1/1988,110000 104 | 7/1/1988,115000 105 | 10/1/1988,113900 106 | 1/1/1989,118000 107 | 4/1/1989,118900 108 | 7/1/1989,120000 109 | 10/1/1989,124800 110 | 1/1/1990,123900 111 | 4/1/1990,126800 112 | 7/1/1990,117000 113 | 10/1/1990,121500 114 | 1/1/1991,120000 115 | 4/1/1991,119900 116 | 7/1/1991,120000 117 | 10/1/1991,120000 118 | 1/1/1992,119500 119 | 4/1/1992,120000 120 | 7/1/1992,120000 121 | 10/1/1992,126000 122 | 1/1/1993,125000 123 | 4/1/1993,127000 124 | 7/1/1993,127000 125 | 10/1/1993,127000 126 | 1/1/1994,130000 127 | 4/1/1994,130000 128 | 7/1/1994,129700 129 | 10/1/1994,132000 130 | 1/1/1995,130000 131 | 4/1/1995,133900 132 | 7/1/1995,132000 133 | 10/1/1995,138000 134 | 1/1/1996,137000 135 | 4/1/1996,139900 136 | 7/1/1996,140000 137 | 10/1/1996,144100 138 | 1/1/1997,145000 139 | 4/1/1997,145800 140 | 7/1/1997,145000 141 | 10/1/1997,144200 142 | 1/1/1998,152200 143 | 4/1/1998,149500 144 | 7/1/1998,153000 145 | 10/1/1998,153000 146 | 1/1/1999,157400 147 | 4/1/1999,158700 148 | 7/1/1999,159100 149 | 10/1/1999,165300 150 | 1/1/2000,165300 151 | 4/1/2000,163200 152 | 7/1/2000,168800 153 | 
10/1/2000,172900 154 | 1/1/2001,169800 155 | 4/1/2001,179000 156 | 7/1/2001,172500 157 | 10/1/2001,171100 158 | 1/1/2002,188700 159 | 4/1/2002,187200 160 | 7/1/2002,178100 161 | 10/1/2002,190100 162 | 1/1/2003,186000 163 | 4/1/2003,191800 164 | 7/1/2003,191900 165 | 10/1/2003,198800 166 | 1/1/2004,212700 167 | 4/1/2004,217600 168 | 7/1/2004,213500 169 | 10/1/2004,228800 170 | 1/1/2005,232500 171 | 4/1/2005,233700 172 | 7/1/2005,236400 173 | 10/1/2005,243600 174 | 1/1/2006,247700 175 | 4/1/2006,246300 176 | 7/1/2006,235600 177 | 10/1/2006,245400 178 | 1/1/2007,257400 179 | 4/1/2007,242200 180 | 7/1/2007,241800 181 | 10/1/2007,238400 182 | 1/1/2008,233900 183 | 4/1/2008,235300 184 | 7/1/2008,226500 185 | 10/1/2008,222500 186 | 1/1/2009,208400 187 | 4/1/2009,220900 188 | 7/1/2009,214300 189 | 10/1/2009,219000 190 | 1/1/2010,222900 191 | 4/1/2010,219500 192 | 7/1/2010,224100 193 | 10/1/2010,224300 194 | 1/1/2011,226900 195 | 4/1/2011,228100 196 | 7/1/2011,223500 197 | 10/1/2011,221100 198 | 1/1/2012,238400 199 | 4/1/2012,238700 200 | 7/1/2012,248800 201 | 10/1/2012,251700 202 | 1/1/2013,258400 203 | 4/1/2013,268100 204 | 7/1/2013,264800 205 | 10/1/2013,273600 206 | 1/1/2014,275200 207 | 4/1/2014,288000 208 | 7/1/2014,281000 209 | 10/1/2014,298900 210 | 1/1/2015,289200 211 | 4/1/2015,289100 212 | 7/1/2015,295800 213 | 10/1/2015,302500 214 | 1/1/2016,299800 215 | 4/1/2016,306000 216 | 7/1/2016,303800 217 | 10/1/2016,310900 218 | 1/1/2017,313100 219 | 4/1/2017,318200 220 | 7/1/2017,320500 221 | 10/1/2017,337900 222 | 1/1/2018,331800 223 | 4/1/2018,315600 224 | 7/1/2018,330900 225 | 10/1/2018,322800 226 | 1/1/2019,313000 227 | 4/1/2019,322500 228 | 7/1/2019,318400 229 | 10/1/2019,327100 230 | 1/1/2020,329000 231 | 4/1/2020,322600 232 | 7/1/2020,337500 233 | 10/1/2020,358700 234 | 1/1/2021,369800 235 | 4/1/2021,382600 236 | 7/1/2021,411200 237 | 10/1/2021,423600 238 | 1/1/2022,433100 239 | 4/1/2022,449300 240 | 7/1/2022,468000 241 | 10/1/2022,479500 242 | 1/1/2023,429000 243 | 4/1/2023,418500 244 | 7/1/2023,435400 245 | 10/1/2023,417700 -------------------------------------------------------------------------------- /Homework 3/data/scorecard-models.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 3/data/scorecard-models.RData -------------------------------------------------------------------------------- /Homework 3/img/banner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 3/img/banner.png -------------------------------------------------------------------------------- /Homework 3/img/rcis.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 3/img/rcis.png -------------------------------------------------------------------------------- /Homework 3/img/task2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 3/img/task2.png -------------------------------------------------------------------------------- /Homework 3/img/task2b.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 3/img/task2b.png -------------------------------------------------------------------------------- /Homework 3/img/task2c.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Homework 3/img/task2c.png -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 VinUniversity 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![Banner](https://www.visualcinnamon.com/img/blog/2017/journey-into-dataviz/journey_into_dataviz_feature.jpg) 2 | # COMP4010/5120 - Data Visualization - Spring 2024 3 | This course teaches techniques and algorithms for creating effective visualizations of large datasets and their analytics, based on principles from graphic design, visual art, perceptual psychology and cognitive science. In addition to participating in class discussions, students will have to complete several short data analysis and visualization design assignments as well as a final project. Data visualization tools such as Tableau or R are considered as lab exercises. Future developments and methods for continual learning are also presented. 
4 | 5 | ## Prerequisites 6 | 7 | - Basic knowledge of programming (experience in Python is beneficial) 8 | - R and RStudio installed on your computer 9 | 10 | ## Installation 11 | 12 | To get started, you will need to install R and RStudio: 13 | 14 | - Download R from the [CRAN repository](https://cran.r-project.org/) 15 | - Download RStudio from the [RStudio Download page](https://www.rstudio.com/products/rstudio/download/) 16 | 17 | ## Canvas 18 | 19 | - [COMP4010 Canvas](https://vinuni.instructure.com/courses/1977) 20 | - [COMP5120 Canvas](https://vinuni.instructure.com/courses/1995) 21 | 22 | ## Homework Problem Sets 23 | | Weeks | Problem Set | Dataset | Deadline | 24 | | --- | --- | --- | --- | 25 | | Week 1-2 | [Problem Set](Homework%201/Homework1.md) | [Dataset](Homework%201/ikea_data.csv) | 11.59pm Monday Week 4 (March 11) | 26 | | Week 4-5 | [Problem Set](Homework%202/Homework2.md) | [Dataset](Homework%202/) | 11.59pm Monday Week 7 (April 1) | 27 | | The rest! | [Problem Set](Homework%203/Homework3.md) | [Dataset](Homework%203/data/) | 11.59pm Monday Week 15 (June 24) | 28 | 29 | ## Weekly Application Exercises (AE) 30 | 31 | For each week, read through the notes before attempting the weekly AE. 32 | The corresponding example R Markdown notebook is also provided with different formats for readability. 33 | 34 | | Week | Topic | Notes | R Markdown | 35 | | --- | --- | --- | --- | 36 | | 01 | Introduction to R and `ggplot2` | [Notes](Week%201/Week1-AE-Notes.md) | [Rmd](Week%201/Week1-AE-RMarkdown.Rmd) \ [HTML](Week%201/Week1-AE-RMarkdown.html) \ [PDF](Week%201/Week1-AE-RMarkdown.pdf) | 37 | | 02 | Practicing with Tabular Data and More Geoms | [Notes](Week%202/Week2-AE-Notes.md) | [Rmd](Week%202/Week2-AE-RMarkdown.Rmd) \ [HTML](Week%202/Week2-AE-RMarkdown.html) \ [PDF](Week%202/Week2-AE-RMarkdown.pdf) | 38 | | 03 | Guest lecture! | n/a | n/a | 39 | | 04 | Log scale & Waffle charts | [Notes](Week%204/Week4-AE-Notes.md) | n/a | 40 | | 05 | Customizing theme | [Notes](Week%205/Week5-AE-Notes.md) | n/a | 41 | | 06 | Data wrangling! | [Notes](Week%206/Week6-AE-Notes.md) | [Rmd](Week%206/Week6-AE-RMarkdown.Rmd) | 42 | | 07 | Annotations | [Notes](Week%207/Week7-AE-Notes.md) | [Rmd](Week%207/Week7-AE-RMarkdown.Rmd) | 43 | | 08 | Presentation week! | n/a | n/a | 44 | | 09 | Accessibility | [Notes](Week%209/Week9-AE-Notes.md) | [Rmd](Week%209/Week9-AE-RMarkdown.Rmd) | 45 | | 10 | Spatial (geo) data visualization | [Notes](Week%2010/Week10-AE-Notes.md) | [Rmd](Week%2010/Week10-AE-RMarkdown.Rmd) | 46 | | 11 | Time series and Animation | [Notes](Week%2011/Week11-AE-Notes.md) | [Rmd](Week%2011/Week11-AE-RMarkdown.Rmd) | 47 | | 12 | Interactive visualization with RShiny | No notes, follow the lecture! \ [MIT 6.859 Interaction Zoo](https://vis.csail.mit.edu/classes/6.859/lectures/09-InteractionZoo.pdf) \ [MIT 6.859 Narrative Visualization](https://vis.csail.mit.edu/classes/6.859/lectures/12-Narrative.pdf) \ [[VIS 20 Talk] Tilt Map](https://www.youtube.com/watch?v=wa51nQzv2Ac) \ [Immersive Storytelling](https://www.youtube.com/watch?v=8VQ1twU3RRI) | [Lecture Rmd](Week%2012/Week12-Lecture-Examples.Rmd) \ [AE Rmd](Week%2012/Week12-AE-RMarkdown.Rmd) | 48 | | 13 | ... | | | 49 | | 14 | ... | | | 50 | | 15 | ... 
| | | 51 | 52 | ## Helpful resources 53 | 54 | - [Hans Rosling's 200 Countries, 200 Years, 4 Minutes - The Joy of Stats - BBC Four](https://youtu.be/jbkSRLYSojo?si=yENI1BZSAPYKcjd7): One of the most popular talks about data visualization 55 | - [from Data to Viz](https://www.data-to-viz.com/) 56 | 57 | ## Hall of Fame - Learn from good examples 58 | 59 | - [Beautiful tutorial on plotting Australian heatwave data with R](https://github.com/njtierney/ozviridis) 60 | - [Ab interactive visualization showing how the names of makeup products reveal bias in beauty](https://pudding.cool/2021/03/foundation-names/) 61 | - [adumb - I Made a Graph of Wikipedia... This Is What I Found](https://www.youtube.com/watch?v=JheGL6uSF-4&ab_channel=adumb) 62 | 63 | ## Hall of Shame - Learn from bad examples 64 | 65 | - [Don't be counter intuitive](https://www.data-to-viz.com/caveat/counter_intuitive.html) 66 | - [5 examples of bad data visualization](https://www.jotform.com/blog/bad-data-visualization/) 67 | - [10 Good and Bad Examples of Data Visualization in 2024](https://www.polymersearch.com/blog/10-good-and-bad-examples-of-data-visualization) 68 | - [12 Bad Data Visualization Examples Explained](https://www.codeconquest.com/blog/12-bad-data-visualization-examples-explained/) 69 | 70 | ## Contributing 71 | 72 | We encourage students to contribute to this repository by providing feedback, suggesting improvements, or adding new resources. 73 | 74 | ## License 75 | 76 | This course material is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details. 77 | 78 | ## Acknowledgments 79 | 80 | - Banner image from the amazing [Visual Cinnamon](https://www.visualcinnamon.com/resources/learning-data-visualization/) 81 | 82 | --- 83 | 84 | Happy Learning! 85 | -------------------------------------------------------------------------------- /Week 1/Week1-AE-RMarkdown.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "COMP4010 - Week 1" 3 | output: 4 | html_document: default 5 | pdf_document: default 6 | date: "2024-02-20" 7 | --- 8 | 9 | # Week 1 10 | 11 | # Application Exercises 12 | 13 | ## Task 1. 14 | 15 | - Data: 16 | - Mapping: 17 | - Statistical transformation: 18 | - Geometric object: 19 | - Position adjustment: 20 | - Scale: 21 | - Coordinate system: 22 | - Faceting: 23 | 24 | ## Task 2. 25 | 26 | ```{r} 27 | # Your code here 28 | 29 | ``` 30 | 31 | ## Task 3. 32 | 33 | ```{r} 34 | # Your code here 35 | 36 | ``` 37 | 38 | # Reading Material 39 | 40 | ## Hello World! but in R 41 | 42 | Create a chunk below and create a vector of numbers and calculate its mean. (This is akin to printing 'Hello World' in other languages, statisticians are not as fun :D) 43 | 44 | ```{r} 45 | myVector <- c(1,2,3,4) 46 | mean(myVector) 47 | ``` 48 | 49 | ## Using CRAN to install ggplot2 50 | 51 | Alternatively, you can just run this in the console below. 52 | 53 | ```{r} 54 | #install.packages("ggplot2") # uncomment to install 55 | library(ggplot2) # Import 56 | ``` 57 | 58 | ## Hello to ggplot2 59 | 60 | This uses the built-in `mtcars` dataset. To preview this dataset, you can use `summary(mtcars)` or `View(mtcars)`. 61 | 62 | ```{r} 63 | View(mtcars) 64 | ``` 65 | 66 | We can see that there are the `mpg` (miles per gallons) and `wt` (weight) columns in the dataset. Let's plot the 2 dimensions on the scatterplot using `ggplot2`. 
67 | 68 | ```{r} 69 | ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_point() 70 | ``` 71 | 72 | ## Bar chart with iris dataset 73 | 74 | We can also try making a bar chart with the built-in `iris` dataset. 75 | 76 | ```{r} 77 | ggplot(data = iris, aes(x = Species, fill = Species)) + geom_bar() 78 | ``` 79 | 80 | ## Customizing plots 81 | 82 | Adding titles and labels: 83 | 84 | ```{r} 85 | ggplot(data = mtcars, aes(x = wt, y = mpg)) + 86 | geom_point() + 87 | ggtitle("Scatter Plot of mpg vs wt") + 88 | xlab("Weight") + 89 | ylab("Miles Per Gallon") 90 | ``` 91 | 92 | Changing colors: 93 | 94 | ```{r} 95 | ggplot(data = mtcars, aes(x = wt, y = mpg, color = factor(cyl))) + 96 | geom_point() 97 | ``` 98 | 99 | ### Experimenting with the Iris Dataset 100 | 101 | - **Dataset**: Iris (available in R by default) 102 | - **Task**: Create a scatter plot showing the relationship between petal length and petal width, colored by species. 103 | - **Customization**: Add a smooth regression line for each species. 104 | 105 | ```{r} 106 | ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) + 107 | geom_point() + 108 | geom_smooth(method = "lm") + 109 | ggtitle("Petal Length vs Width by Species") + 110 | theme_minimal() 111 | ``` 112 | 113 | ### Visualizing the mtcars Dataset 114 | 115 | - **Dataset**: mtcars (available in R by default) 116 | - **Task**: Create a bar plot showing the average miles per gallon (mpg) for cars with different numbers of cylinders. 117 | - **Customization**: Use a different fill color for each cylinder type and add labels for the average mpg. 118 | 119 | ```{r} 120 | ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) + 121 | geom_bar(stat = "summary", fun = mean) + 122 | geom_text(stat = 'summary', aes(label = round(..y.., 1)), vjust = -0.5) + 123 | labs(x = "Number of Cylinders", y = "Average Miles per Gallon", title = "Average MPG by Cylinder Count") + 124 | theme_bw() 125 | ``` 126 | 127 | ### Exploring the gapminder Dataset 128 | 129 | - **Dataset**: gapminder (install using **`install.packages("gapminder")`** and then **`library(gapminder)`**) 130 | - **Task**: Create a line plot showing GDP per capita over time for select countries. 131 | - **Customization**: Use different line types and colors for each country. 132 | 133 | ```{r} 134 | # install.packages("gapminder") 135 | library(gapminder) 136 | ggplot(subset(gapminder, country %in% c("Japan", "United Kingdom", "United States")), 137 | aes(x = year, y = gdpPercap, color = country, linetype = country)) + 138 | geom_line() + 139 | scale_y_log10() + 140 | ggtitle("GDP Per Capita Over Time") + 141 | theme_light() 142 | ``` 143 | 144 | ### Working with the diamonds Dataset 145 | 146 | - **Dataset**: diamonds (part of ggplot2 package) 147 | - **Task**: Create a histogram of diamond prices, faceted by cut quality. 148 | - **Customization**: Adjust the bin width and use a theme that enhances readability. 
149 | 150 | ```{r} 151 | ggplot(diamonds, aes(x = price)) + 152 | geom_histogram(binwidth = 500) + 153 | facet_wrap(~cut) + 154 | labs(title = "Diamond Prices by Cut Quality", x = "Price", y = "Count") + 155 | theme_classic() 156 | ``` 157 | -------------------------------------------------------------------------------- /Week 1/Week1-AE-RMarkdown.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/Week1-AE-RMarkdown.pdf -------------------------------------------------------------------------------- /Week 1/img/Minard_Update.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/img/Minard_Update.png -------------------------------------------------------------------------------- /Week 1/img/Untitled 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/img/Untitled 1.png -------------------------------------------------------------------------------- /Week 1/img/Untitled 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/img/Untitled 2.png -------------------------------------------------------------------------------- /Week 1/img/Untitled 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/img/Untitled 3.png -------------------------------------------------------------------------------- /Week 1/img/Untitled 4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/img/Untitled 4.png -------------------------------------------------------------------------------- /Week 1/img/Untitled 5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/img/Untitled 5.png -------------------------------------------------------------------------------- /Week 1/img/Untitled 6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/img/Untitled 6.png -------------------------------------------------------------------------------- /Week 1/img/Untitled 7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/img/Untitled 7.png -------------------------------------------------------------------------------- /Week 1/img/Untitled 8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/img/Untitled 8.png -------------------------------------------------------------------------------- /Week 1/img/Untitled 9.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/img/Untitled 9.png -------------------------------------------------------------------------------- /Week 1/img/Untitled.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/img/Untitled.png -------------------------------------------------------------------------------- /Week 1/img/olympic_feathers_print.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 1/img/olympic_feathers_print.jpeg -------------------------------------------------------------------------------- /Week 10/Week10-AE-Notes.md: -------------------------------------------------------------------------------- 1 | ![Banner](img/banner.png) 2 | 3 | # COMP4010/5120 - Week 10 Application Exercises 4 | --- 5 | 6 | # A. Application Exercises 7 | 8 | **Data:** [`nyc-311.csv`](./nyc-311.csv), [`ny-inc.rds`](./ny-inc.rds), [`ny-counties.rds`](./ny-counties.rds) 9 | 10 | ```R 11 | library(tidyverse) 12 | #install.packages('ggmap') 13 | library(ggmap) 14 | ``` 15 | 16 | Let’s first load a subset of 311 service requests in New York City. This subset includes 311 service requests related to Food Poisoning in commercial establishments (e.g. restaurants, cafeterias, food carts). 17 | 18 | ```R 19 | nyc_311 <- read_csv(file = "nyc-311.csv", show_col_types = FALSE) 20 | nyc_311 21 | ``` 22 | 23 | Store your Stadia Maps API key using the function: 24 | 25 | ```R 26 | register_stadiamaps(key = "YOUR KEY HERE", write = TRUE) 27 | ``` 28 | 29 | ## Task 1. Obtain map tiles for New York City 30 | 31 | Use [bboxfinder.com](http://bboxfinder.com) to find bounding box coordinates for New York City. Then, use `get_stadiamap()` to obtain map tiles for New York City and visualize the map. 32 | 33 | > Try using a `zoom` level of 11. 34 | 35 | ![Task 1](img/task1.png) 36 | 37 | ## Task 2. Food poisoning rates 38 | 39 | The COVID-19 pandemic caused massive disruption in the restaurant industry. Due to social distancing measures and lockdowns, restaurant traffic decreased significantly. 40 | 41 | While this had significant financial ramifications, one potentially overlooked consequence is the impact on food poisoning rates. With fewer people eating out, the number of food poisoning complaints may have decreased. 42 | 43 | Visualize the geospatial distribution of complaints related to food poisoning in NYC in March, April, and May over a four-year time period (2018-21). Construct the chart in such a way that you can make valid comparisons over time and geographically. What impact did COVID-19 have on food poisoning cases in NYC? Did it vary geographically? 44 | 45 | ![Task 2](img/task2.png) 46 | 47 | ## Task 3. Visualize food poisoning complaints on Roosevelt Island 48 | 49 | Now focus on food poisoning complaints on or around Roosevelt Island. Use `get_stadiamap()` to obtain map tiles for the Roosevelt Island region and overlay them with the food poisoning complaints. What type of chart is more effective for this task? 50 | 51 | > Consider adjusting your `zoom` for this geographic region.
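
If `ggmap` is new to you, Tasks 1-3 all follow the same basic workflow: define a bounding box, request raster tiles from Stadia Maps with `get_stadiamap()`, draw them with `ggmap()`, and then layer ordinary `ggplot2` geoms on top. Here is a minimal sketch of that workflow. The bounding box, zoom level, and points below are placeholder values for illustration, not the answers to the tasks, and the sketch assumes `register_stadiamaps()` has already been run with a valid key.

```R
library(tidyverse)
library(ggmap)

# placeholder bounding box (left/bottom/right/top, in decimal degrees), not a task answer
demo_bb <- c(left = -74.05, bottom = 40.68, right = -73.90, top = 40.80)

# fetch raster tiles for that region; a larger zoom gives more detail but a slower download
demo_map <- get_stadiamap(bbox = demo_bb, zoom = 12)

# a couple of made-up complaint locations, just to show how layers stack on top of the tiles
demo_points <- tibble(
  longitude = c(-74.00, -73.95),
  latitude  = c(40.72, 40.76)
)

# ggmap() takes the place of ggplot() as the base layer; geoms are added as usual
ggmap(demo_map) +
  geom_point(data = demo_points, aes(x = longitude, y = latitude), color = "red", size = 2)
```
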
52 | 53 | ![Task 3](img/task3.png) 54 | 55 | --- 56 | 57 | # For the next tasks: New York 2022 median household income 58 | 59 | We will use two data files for this analysis. The first contains median household incomes for each census tract in New York from 2022. The second contains the boundaries of each county in New York. 60 | 61 | ```R 62 | # useful on MacOS to speed up rendering of geom_sf() objects 63 | if (!identical(getOption("bitmapType"), "cairo") && isTRUE(capabilities()[["cairo"]])) { 64 | options(bitmapType = "cairo") 65 | } 66 | 67 | # create reusable labels for each plot 68 | map_labels <- labs( 69 | title = "Median household income in New York in 2022", 70 | subtitle = "By census tract", 71 | color = NULL, 72 | fill = NULL, 73 | caption = "Source: American Community Survey" 74 | ) 75 | 76 | # load data 77 | ny_inc <- read_rds(file = "ny-inc.rds") 78 | ny_counties <- read_rds(file = "ny-counties.rds") 79 | ``` 80 | 81 | ## Task 4. Draw a continuous choropleth of median household income 82 | 83 | Create a choropleth map of median household income in New York. Use a continuous color gradient to identify each tract’s median household income. 84 | 85 | > Use the stored `map_labels` to set the title, subtitle, and caption for this and the remaining plots. 86 | 87 | ![Task 4](img/task4.png) 88 | 89 | **OPTIONAL:** Use the `viridis` color palette. 90 | 91 | ![Task 4 - Optional](img/task4-optional.png) 92 | 93 | ## Task 5. Overlay county borders 94 | 95 | To provide better context, overlay the NY county borders on the choropleth map using the data from `ny_counties`. 96 | 97 | ![Task 5](img/task5.png) 98 | 99 | # B. Reading Material 100 | 101 | ## 1. Hall of Shame (is back!) 102 | 103 | ![Jon Schwabish - What Not to do in Data Visualization: A Walk through the Bad DataViz Hall of Shame](img/shame.png) 104 | 105 | "[Jon Schwabish](https://policyviz.com/about/) is an economist who trains people to be better at data visualization." This week we don't have just one example for the Hall of Shame, but rather an interesting video walking through numerous bad examples of data visualization used in real life, along with commentary from Professor Jon Schwabish. 106 | 107 | Check out the video here: [What Not to do in Data Visualization: A Walk through the Bad DataViz Hall of Shame](https://www.youtube.com/watch?v=KluzR75S6U0) 108 | 109 | > Talk abstract: Prepare to be amused and enlightened as we embark on a comical journey through the quirky world of bad data visualizations. In this light-hearted talk, I’ll showcase some of the most outrageous and baffling data visual blunders that have left audiences scratching their heads. From pie charts that vie you everything to bar charts that distort and mislead, you’ll see it all. I mix the comical with the serious to unveil visual missteps in the data world. Amidst the 3D exploding charts, you'll also glean valuable lessons on what not to do when crafting data visualizations. Join me for a rollicking exploration of data gone wrong and leave with a smile and a newfound appreciation for the importance of clarity and accuracy in our data-driven endeavors. 110 | 111 | 112 | ## 2. Getting a Stadia Maps API Key 113 | 114 | [Stadia Maps](https://stadiamaps.com/) is a readily available collection of APIs related to maps and geolocations.
To get started with this week's exercises, you should sign up for a Stadia Maps API key following [this guide](https://search.r-project.org/CRAN/refmans/ggmap/html/register_stadiamaps.html). 115 | 116 | ## 3. Introduction to `ggmap` in R 117 | 118 | The `ggmap` package is a valuable tool for R users who work with spatial visualization. It serves as an extension to the popular `ggplot2` package, enabling the integration of map data with the customizable plotting capabilities of `ggplot2`. Whether you are a data scientist, a geospatial analyst, or just someone interested in mapping data, `ggmap` offers a flexible and powerful approach to creating detailed and aesthetically pleasing maps. 119 | 120 | The primary feature of `ggmap` is its ability to download and render map tiles from popular mapping services like Google Maps, OpenStreetMap, and Stamen Maps. This allows users to overlay their data onto these maps, creating rich, detailed visualizations that can convey complex geographical data in an intuitive way. 121 | 122 | `ggmap` simplifies the process of geocoding addresses into latitude and longitude coordinates and vice versa, making it easier to plot location-based data. Additionally, it supports the customization of base maps, enabling users to adjust aesthetic elements such as color schemes, annotations, and layers to suit specific needs. 123 | 124 | In the following sections, we will delve into how to install `ggmap`, retrieve maps, geocode data, and customize your maps to tell stories with your data effectively. Whether you are plotting simple location data or conducting complex spatial analysis, `ggmap` provides the tools necessary to produce insightful and visually engaging maps. 125 | 126 | A great introduction to `ggmap` can be found here: [ggmap: Spatial Visualization with ggplot2](https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf) 127 | 128 | ## 4. Choropleth map 129 | 130 | A choropleth map is a type of thematic map where areas are shaded or patterned in proportion to the measurement of a statistical variable being displayed on the map, such as population density or per-capita income. The purpose of a choropleth map is to provide an easy visual way to compare data across geographic regions without needing to understand precise values. 131 | 132 | Key characteristics of choropleth maps include: 133 | 134 | - **Data Classification**: Data is typically classified into different categories or ranges, such as income levels or temperature ranges. These categories are then assigned different colors or patterns. 135 | - **Color Schemes**: Choropleth maps use color gradients to represent different data ranges. Warmer colors (e.g., red, orange) might represent higher values, while cooler colors (e.g., blue, green) might represent lower values. The choice of colors can significantly affect the readability and interpretability of the data. 136 | - **Geographic Units**: The map is divided into spatial units such as countries, states, counties, or districts. Each unit is filled with a color corresponding to the data value for that area. 137 | - **Simplicity and Accessibility**: These maps make complex data more accessible and understandable to a broad audience, facilitating easier comparison and analysis of regional data. 138 | 139 | Choropleth maps are widely used in various fields such as meteorology, public health, economics, and political science to visualize and analyze spatial data distributions.
However, they also have limitations, such as potentially misleading interpretations if the areas of the geographic units vary significantly in size, or if the data is not normalized (e.g., total values vs. per capita). 140 | 141 | > However, its downside is that regions with bigger sizes tend to have a bigger weight in the map interpretation, which includes a bias. - [from Data to Viz](https://www.data-to-viz.com/graph/choropleth.html) 142 | 143 | ![Choropleth map example](img/choropleth.png) 144 | 145 | ## 5. Geographic Information System (GIS) Data and the `sf` library 146 | 147 | A geographic information system (GIS) is a computer system for capturing, storing, checking, and displaying data related to positions on Earth’s surface. GIS can show many different kinds of data on one map, such as streets, buildings, and vegetation. This enables people to more easily see, analyze, and understand patterns and relationships. ([National Geographic](https://education.nationalgeographic.org/resource/geographic-information-system-gis/)) 148 | 149 | The `sf` package in R stands for "simple features" and is a modern approach to handling spatial and geographic data within the R programming environment. The package provides classes and functions to manipulate, process, and visualize spatial data, integrating well with the tidyverse suite of packages for data analysis. 150 | 151 | We can combine GIS data and the `sf` library to draw vector shapes based on geographical data such as country or state borders. For example, to draw a vector border of a state using GIS data in R with the `geom_sf()` function from the `sf` package, you'll need to follow a few key steps. This involves obtaining spatial data in a vector format, preparing the data, and then plotting it. 152 | 153 | You can get the spatial data from various sources such as the `rnaturalearth` and `rnaturalearthdata` libraries and use it directly with the `sf` library: 154 | 155 | ```R 156 | #install.packages("rnaturalearth") 157 | #install.packages("rnaturalearthdata") 158 | library(rnaturalearth) 159 | library(rnaturalearthdata) 160 | 161 | world <- ne_countries(scale = "medium", returnclass = "sf") 162 | ggplot(data = world) + 163 | geom_sf() 164 | ``` 165 | 166 | ![geom_sf example](img/geomsf.png) -------------------------------------------------------------------------------- /Week 10/Week10-AE-RMarkdown-Key.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Week10-AE-RMarkdown" 3 | output: html_document 4 | date: "2024-04-25" 5 | --- 6 | 7 | ```{r} 8 | library(tidyverse) 9 | #install.packages('ggmap') 10 | library(ggmap) 11 | #install.packages('sf') 12 | library(sf) 13 | library(colorspace) 14 | library(scales) 15 | ``` 16 | 17 | Let’s first load a subset of 311 service requests in New York City. This subset includes 311 service requests related to Food Poisoning in commercial establishments (e.g. restaurants, cafeterias, food carts). 18 | 19 | ```{r} 20 | nyc_311 <- read_csv(file = "nyc-311.csv", show_col_types = FALSE) 21 | nyc_311 22 | ``` 23 | 24 | # Store your Stadia Maps API key using the function 25 | 26 | ```{r} 27 | register_stadiamaps(key = "9ae33801-3948-44ed-af85-6b5081ae22bf", write = TRUE) 28 | ``` 29 | 30 | # Task 1. Obtain map tiles for New York City 31 | 32 | Use [bboxfinder.com] to find bounding box coordinates for New York City. Then, use `get_stamenmap()` to obtain map tiles for New York City and visualize the map. 33 | 34 | > Try using a `zoom` level of 11. 
35 | 36 | ```{r} 37 | # store bounding box coordinates 38 | nyc_bb <- c( 39 | left = -74.263045, 40 | bottom = 40.487652, 41 | right = -73.675963, 42 | top = 40.934743 43 | ) 44 | 45 | nyc <- get_stadiamap( 46 | bbox = nyc_bb, 47 | zoom = 11 48 | ) 49 | 50 | # plot the raster map 51 | ggmap(nyc) 52 | ``` 53 | 54 | # Task 2. Food poisoning rates 55 | 56 | The COVID-19 pandemic caused massive disruption in the restaurant industry. Due to social distancing measures and lockdowns, restaurant traffic decreased significantly. 57 | 58 | While this had significant financial ramifications, one potentially overlooked consequence is the impact on food poisoning rates. With fewer people eating out, the number of food poisoning complaints may have decreased. 59 | 60 | Visualize the geospatial distribution of complaints related to food poisoning in NYC in March, April, and May over a four-year time period (2018-21). Construct the chart in such a way that you can make valid comparisons over time and geographically. What impact did COVID-19 have on food poisoning cases in NYC? Did it vary geographically? 61 | 62 | ```{r} 63 | nyc_covid_food_poison <- nyc_311 |> 64 | # generate a year variable 65 | mutate(year = year(created_date)) |> 66 | # only keep reports in March, April, and May from 2018-21 67 | filter(month(created_date) %in% 3:5, year %in% 2018:2021) 68 | 69 | ggmap(nyc) + 70 | # add the heatmap 71 | stat_density_2d( 72 | data = nyc_covid_food_poison, 73 | mapping = aes( 74 | x = longitude, 75 | y = latitude, 76 | fill = after_stat(level) 77 | ), 78 | alpha = .1, 79 | bins = 50, 80 | geom = "polygon" 81 | ) + 82 | scale_fill_viridis_c() + 83 | facet_wrap(facets = vars(year)) 84 | ``` 85 | 86 | # Task 3. Visualize food poisoning complains on Roosevelt Island 87 | 88 | Now focus on food poisoning complaints on or around Roosevelt Island.2 Use get_stamenmap() to obtain map tiles for the Roosevelt Island region and overlay with the food poisoning complaints. What type of chart is more effective for this task? 89 | 90 | > Consider adjusting your `zoom` for this geographic region. 91 | 92 | ```{r} 93 | # Obtain map tiles for Roosevelt Island 94 | roosevelt_bb <- c( 95 | left = -73.967121, 96 | bottom = 40.748700, 97 | right = -73.937080, 98 | top = 40.774704 99 | ) 100 | roosevelt <- get_stadiamap( 101 | bbox = roosevelt_bb, 102 | zoom = 14 103 | ) 104 | 105 | # Generate a scatterplot of food poisoning complaints 106 | ggmap(roosevelt) + 107 | # add a scatterplot layer 108 | geom_point( 109 | data = filter(nyc_311, complaint_type == "Food Poisoning"), 110 | mapping = aes( 111 | x = longitude, 112 | y = latitude 113 | ), 114 | alpha = 0.2 115 | ) 116 | ``` 117 | 118 | # New York 2022 median household income 119 | 120 | We will use two data files for this analysis. The first contains median household incomes for each census tract in New York from 2022. The second contains the boundaries of each county in New York. 
121 | 122 | ```{r} 123 | # useful on MacOS to speed up rendering of geom_sf() objects 124 | if (!identical(getOption("bitmapType"), "cairo") && isTRUE(capabilities()[["cairo"]])) { 125 | options(bitmapType = "cairo") 126 | } 127 | 128 | # create reusable labels for each plot 129 | map_labels <- labs( 130 | title = "Median household income in New York in 2022", 131 | subtitle = "By census tract", 132 | color = NULL, 133 | fill = NULL, 134 | caption = "Source: American Community Survey" 135 | ) 136 | 137 | # load data 138 | ny_inc <- read_rds(file = "ny-inc.rds") 139 | ny_counties <- read_rds(file = "ny-counties.rds") 140 | 141 | ny_inc 142 | ``` 143 | 144 | ```{r} 145 | ny_counties 146 | ``` 147 | 148 | # Task 4. Draw a continuous choropleth of median household income 149 | 150 | Create a choropleth map of median household income in New York. Use a continuous color gradient to identify each tract’s median household income. Use a continuous color gradient to identify each tract’s median household income. 151 | 152 | > Use the stored `map_labels` to set the title, subtitle, and caption for this and the remaining plots. 153 | 154 | ```{r} 155 | ggplot(data = ny_inc) + 156 | # use fill and color to avoid gray boundary lines 157 | geom_sf(aes(fill = medincomeE, color = medincomeE)) + 158 | # increase interpretability of graph 159 | scale_color_continuous(labels = label_dollar()) + 160 | scale_fill_continuous(labels = label_dollar()) + 161 | map_labels 162 | ``` 163 | 164 | **OPTIONAL**: Use `viridis` color palette. 165 | ```{r} 166 | ggplot(data = ny_inc) + 167 | # use fill and color to avoid gray boundary lines 168 | geom_sf(mapping = aes(fill = medincomeE, color = medincomeE)) + 169 | # increase interpretability of graph 170 | scale_fill_continuous_sequential( 171 | palette = "viridis", 172 | rev = FALSE, 173 | aesthetics = c("fill", "color"), 174 | labels = label_dollar(), 175 | name = NULL 176 | ) + 177 | map_labels 178 | ``` 179 | 180 | # Task 5. Overlay county borders 181 | 182 | To provide better context, overlay the NY county borders on the choropleth map. 183 | 184 | ```{r} 185 | ggplot(data = ny_inc) + 186 | # use fill and color to avoid gray boundary lines 187 | geom_sf(mapping = aes(fill = medincomeE, color = medincomeE)) + 188 | # add county borders 189 | geom_sf(data = ny_counties, color = "white", fill = NA) + 190 | # increase interpretability of graph 191 | scale_fill_continuous_sequential( 192 | palette = "viridis", 193 | rev = FALSE, 194 | aesthetics = c("fill", "color"), 195 | labels = label_dollar(), 196 | name = NULL 197 | ) + 198 | map_labels 199 | ``` -------------------------------------------------------------------------------- /Week 10/Week10-AE-RMarkdown.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Week10-AE-RMarkdown" 3 | output: html_document 4 | date: "2024-04-25" 5 | --- 6 | 7 | ```{r} 8 | library(tidyverse) 9 | #install.packages('ggmap') 10 | library(ggmap) 11 | ``` 12 | 13 | Let’s first load a subset of 311 service requests in New York City. This subset includes 311 service requests related to Food Poisoning in commercial establishments (e.g. restaurants, cafeterias, food carts). 14 | 15 | ```{r} 16 | nyc_311 <- read_csv(file = "nyc-311.csv", show_col_types = FALSE) 17 | nyc_311 18 | ``` 19 | 20 | # Store your Stadia Maps API key using the function 21 | 22 | ```{r} 23 | register_stadiamaps(key = "YOUR KEY HERE", write = TRUE) 24 | ``` 25 | 26 | # Task 1. 
Obtain map tiles for New York City 27 | 28 | Use [bboxfinder.com] to find bounding box coordinates for New York City. Then, use `get_stamenmap()` to obtain map tiles for New York City and visualize the map. 29 | 30 | > Try using a `zoom` level of 11. 31 | 32 | ```{r} 33 | # store bounding box coordinates 34 | nyc_bb <- c( 35 | left = YOUR_CODE_HERE, 36 | bottom = YOUR_CODE_HERE, 37 | right = YOUR_CODE_HERE, 38 | top = YOUR_CODE_HERE 39 | ) 40 | 41 | nyc <- get_stadiamap( 42 | bbox = nyc_bb, 43 | zoom = YOUR_CODE_HERE 44 | ) 45 | 46 | # plot the raster map 47 | ggmap(nyc) 48 | ``` 49 | 50 | # Task 2. Food poisoning rates 51 | 52 | The COVID-19 pandemic caused massive disruption in the restaurant industry. Due to social distancing measures and lockdowns, restaurant traffic decreased significantly. 53 | 54 | While this had significant financial ramifications, one potentially overlooked consequence is the impact on food poisoning rates. With fewer people eating out, the number of food poisoning complaints may have decreased. 55 | 56 | Visualize the geospatial distribution of complaints related to food poisoning in NYC in March, April, and May over a four-year time period (2018-21). Construct the chart in such a way that you can make valid comparisons over time and geographically. What impact did COVID-19 have on food poisoning cases in NYC? Did it vary geographically? 57 | 58 | ```{r} 59 | nyc_covid_food_poison <- nyc_311 |> 60 | # generate a year variable 61 | mutate(year = year(created_date)) |> 62 | # only keep reports in March, April, and May from 2018-21 63 | filter(month(created_date) %in% 3:5, year %in% 2018:2021) 64 | 65 | # YOUR CODE HERE 66 | ``` 67 | 68 | # Task 3. Visualize food poisoning complains on Roosevelt Island 69 | 70 | Now focus on food poisoning complaints on or around Roosevelt Island.2 Use get_stamenmap() to obtain map tiles for the Roosevelt Island region and overlay with the food poisoning complaints. What type of chart is more effective for this task? 71 | 72 | > Consider adjusting your `zoom` for this geographic region. 73 | 74 | ```{r} 75 | # Obtain map tiles for Roosevelt Island 76 | roosevelt_bb <- c( 77 | left = YOUR_CODE_HERE, 78 | bottom = YOUR_CODE_HERE, 79 | right = YOUR_CODE_HERE, 80 | top = YOUR_CODE_HERE 81 | ) 82 | roosevelt <- get_stadiamap( 83 | bbox = roosevelt_bb, 84 | zoom = YOUR_CODE_HERE 85 | ) 86 | 87 | # YOUR CODE HERE 88 | 89 | ``` 90 | 91 | # New York 2022 median household income 92 | 93 | We will use two data files for this analysis. The first contains median household incomes for each census tract in New York from 2022. The second contains the boundaries of each county in New York. 94 | 95 | ```{r} 96 | # useful on MacOS to speed up rendering of geom_sf() objects 97 | if (!identical(getOption("bitmapType"), "cairo") && isTRUE(capabilities()[["cairo"]])) { 98 | options(bitmapType = "cairo") 99 | } 100 | 101 | # create reusable labels for each plot 102 | map_labels <- labs( 103 | title = "Median household income in New York in 2022", 104 | subtitle = "By census tract", 105 | color = NULL, 106 | fill = NULL, 107 | caption = "Source: American Community Survey" 108 | ) 109 | 110 | # load data 111 | ny_inc <- read_rds(file = "ny-inc.rds") 112 | ny_counties <- read_rds(file = "ny-counties.rds") 113 | 114 | ny_inc 115 | ``` 116 | 117 | ```{r} 118 | ny_counties 119 | ``` 120 | 121 | # Task 4. Draw a continuous choropleth of median household income 122 | 123 | Create a choropleth map of median household income in New York. 
Use a continuous color gradient to identify each tract’s median household income. Use a continuous color gradient to identify each tract’s median household income. 124 | 125 | > Use the stored `map_labels` to set the title, subtitle, and caption for this and the remaining plots. 126 | 127 | ```{r} 128 | # YOUR CODE HERE 129 | 130 | 131 | ``` 132 | 133 | **OPTIONAL**: Use `viridis` color palette. 134 | ```{r} 135 | # YOUR CODE HERE 136 | 137 | 138 | ``` 139 | 140 | # Task 5. Overlay county borders 141 | To provide better context, overlay the NY county borders on the choropleth map. 142 | 143 | ```{r} 144 | # YOUR CODE HERE 145 | 146 | 147 | ``` -------------------------------------------------------------------------------- /Week 10/img/banner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 10/img/banner.png -------------------------------------------------------------------------------- /Week 10/img/choropleth.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 10/img/choropleth.png -------------------------------------------------------------------------------- /Week 10/img/geomsf.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 10/img/geomsf.png -------------------------------------------------------------------------------- /Week 10/img/shame.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 10/img/shame.png -------------------------------------------------------------------------------- /Week 10/img/task1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 10/img/task1.png -------------------------------------------------------------------------------- /Week 10/img/task2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 10/img/task2.png -------------------------------------------------------------------------------- /Week 10/img/task3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 10/img/task3.png -------------------------------------------------------------------------------- /Week 10/img/task4-optional.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 10/img/task4-optional.png -------------------------------------------------------------------------------- /Week 10/img/task4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 10/img/task4.png 
-------------------------------------------------------------------------------- /Week 10/img/task5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 10/img/task5.png -------------------------------------------------------------------------------- /Week 10/ny-counties.rds: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 10/ny-counties.rds -------------------------------------------------------------------------------- /Week 10/ny-inc.rds: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 10/ny-inc.rds -------------------------------------------------------------------------------- /Week 11/Week11-AE-Notes.md: -------------------------------------------------------------------------------- 1 | # COMP4010/5120 - Week 11 Application Exercises 2 | 3 | --- 4 | 5 | # A. Application Exercises 6 | 7 | **Data:** [`ad_aqi_tracker_data-2023.csv`](./ad_aqi_tracker_data-2023.csv), [`freedom.csv`](./freedom.csv) 8 | 9 | ```R 10 | library(tidyverse) 11 | library(scales) 12 | library(janitor) 13 | library(colorspace) 14 | 15 | #install.packages('gganimate') 16 | library(gganimate) 17 | 18 | aqi_levels <- tribble( 19 | ~aqi_min, ~aqi_max, ~color, ~level, 20 | 0, 50, "#D8EEDA", "Good", 21 | 51, 100, "#F1E7D4", "Moderate", 22 | 101, 150, "#F8E4D8", "Unhealthy for sensitive groups", 23 | 151, 200, "#FEE2E1", "Unhealthy", 24 | 201, 300, "#F4E3F7", "Very unhealthy", 25 | 301, 400, "#F9D0D4", "Hazardous" 26 | ) 27 | 28 | freedom <- read_csv( 29 | file = "freedom.csv", 30 | na = "-" 31 | ) 32 | 33 | syr_2023 <- read_csv(file = "ad_aqi_tracker_data-2023.csv") 34 | 35 | syr_2023 <- syr_2023 |> 36 | clean_names() |> 37 | mutate(date = mdy(date)) 38 | ``` 39 | --- 40 | 41 | First, we would like to recreate this plot: 42 | 43 | ![](img/syracuse.png) 44 | 45 | Initially, our plot looks like this: 46 | 47 | ```R 48 | syr_2023 |> 49 | ggplot(aes(x = date, y = aqi_value, group = 1)) + 50 | # plot the AQI in Syracuse 51 | geom_line(linewidth = 1, alpha = 0.5) 52 | ``` 53 | 54 | ## Task 1. Add color shading to the plot based on the AQI guide. The color palette does not need to match the specific colors in the table yet. 55 | 56 | ![](img/task1.png) 57 | 58 | ## Task 2. Use the hexidecimal colors from the dataset for the color palette. 59 | 60 | ![](img/task2.png) 61 | 62 | ## Task 3. Label each AQI category on the chart 63 | 64 | Incorporate text labels for each AQI value directly into the graph. To accomplish this, you need to: 65 | 66 | - Calculate the midpoint AQI value for each category 67 | - Add a geom_text() layer to the plot with the AQI values positioned at the midpoint on the y-axis 68 | 69 | Extend the range of the x-axis to provide more horizontal space for the AQI category labels without interfering with the trend line. 70 | 71 | ![](img/task3.png) 72 | 73 | # Task 4. Clean up the plot. 74 | 75 | Add a meaningful title, axis labels, caption, etc. 76 | 77 | ![](img/task4.png) 78 | 79 | --- 80 | 81 | From this task we will be using the `freedom.csv` data to practice animating a bar chart with `gganimate`. 
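
If `gganimate` is new to you, the core idea is that you build an ordinary `ggplot` object and then add a transition (for example `transition_time()` or `transition_states()`) that tells `gganimate` which variable indexes the animation frames; `animate()` then renders the frames, interpolating between them. Below is a minimal, self-contained sketch with made-up data; the variable names and values are purely illustrative, and rendering the frames requires either the `gifski` or `magick` package to be installed.

```R
library(ggplot2)
library(gganimate)  # also needs gifski or magick installed to actually render the frames

# made-up data: one value per group per year, purely for illustration
toy <- data.frame(
  year  = rep(2000:2004, each = 3),
  group = rep(c("A", "B", "C"), times = 5),
  value = c(3, 5, 2, 4, 4, 3, 6, 3, 4, 5, 2, 6, 7, 1, 5)
)

p <- ggplot(toy, aes(x = group, y = value, fill = group)) +
  geom_col(show.legend = FALSE) +
  # gganimate substitutes the (interpolated) time of the current frame into the title
  labs(title = "Year: {frame_time}") +
  # one logical state per year; gganimate interpolates the frames in between
  transition_time(year)

# nframes and fps control how many frames are drawn and how fast they play
animate(p, nframes = 50, fps = 10)
```

Tasks 5 and 6 below follow the same pattern, just with the `freedom` data, a horizontal bar-chart layout, and a year label drawn on the plot.
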
82 | 83 | Here is the data we will be using: 84 | 85 | ```{r} 86 | freedom_to_plot <- freedom |> 87 | # calculate rowwise standard deviations (one row per country) 88 | rowwise() |> 89 | mutate(sd = sd(c_across(contains("cl_")), na.rm = TRUE)) |> 90 | ungroup() |> 91 | # find the 15 countries with the highest standard deviations 92 | relocate(country, sd) |> 93 | slice_max(order_by = sd, n = 15) |> 94 | # only keep countries with complete observations - necessary for future plotting 95 | drop_na() 96 | freedom_to_plot 97 | ``` 98 | 99 | ```{r} 100 | # calculate position rankings rather than raw scores 101 | freedom_ranked <- freedom_to_plot |> 102 | # only keep columns with civil liberties scores 103 | select(country, contains("cl_")) |> 104 | # wrangle the data to a long format 105 | pivot_longer( 106 | cols = -country, 107 | names_to = "year", 108 | values_to = "civil_liberty", 109 | names_prefix = "cl_", 110 | names_transform = list(year = as.numeric) 111 | ) |> 112 | # calculate rank within year - larger is worse, so reverse in the ranking 113 | group_by(year) |> 114 | mutate(rank_in_year = rank(-civil_liberty, ties.method = "first")) |> 115 | ungroup() |> 116 | # highlight Venezuela 117 | mutate(is_venezuela = if_else(country == "Venezuela", TRUE, FALSE)) 118 | freedom_ranked 119 | ``` 120 | ## Task 5. Fill in the code given to create a faceted bar plot by year. 121 | 122 | ![](img/task5.png) 123 | 124 | ## Task 6. Fill in the code given to turn the facet plot into an animation showing the data by year. 125 | 126 | ![](img/task6-1.gif) 127 | 128 | ```{r} 129 | # smoother transition - might take a while to render 130 | animate(freedom_bar_race, nframes = 300, fps = 100, start_pause = 10, end_pause = 10) 131 | ``` 132 | 133 | ![](img/task6-2.gif) -------------------------------------------------------------------------------- /Week 11/Week11-AE-RMarkdown.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Week11-AE-RMarkdown" 3 | output: 4 | pdf_document: default 5 | html_document: default 6 | date: "2024-04-25" 7 | --- 8 | 9 | ```{r} 10 | library(tidyverse) 11 | library(scales) 12 | library(janitor) 13 | library(colorspace) 14 | 15 | #install.packages('gganimate') 16 | library(gganimate) 17 | 18 | aqi_levels <- tribble( 19 | ~aqi_min, ~aqi_max, ~color, ~level, 20 | 0, 50, "#D8EEDA", "Good", 21 | 51, 100, "#F1E7D4", "Moderate", 22 | 101, 150, "#F8E4D8", "Unhealthy for sensitive groups", 23 | 151, 200, "#FEE2E1", "Unhealthy", 24 | 201, 300, "#F4E3F7", "Very unhealthy", 25 | 301, 400, "#F9D0D4", "Hazardous" 26 | ) 27 | 28 | 29 | # Data is in a wide format 30 | # pr_* - political rights, scores 1-7 (1 highest degree of freedom, 7 the lowest) 31 | # cl_* - civil liberties, scores 1-7 (1 highest degree of freedom, 7 the lowest) 32 | # status_* - freedom status (Free, Partly Free, and Not Free) 33 | 34 | freedom <- read_csv( 35 | file = "freedom.csv", 36 | na = "-" 37 | ) 38 | 39 | syr_2023 <- read_csv(file = "ad_aqi_tracker_data-2023.csv") 40 | 41 | syr_2023 <- syr_2023 |> 42 | clean_names() |> 43 | mutate(date = mdy(date)) 44 | ``` 45 | 46 | ```{r} 47 | syr_2023 |> 48 | ggplot(aes(x = date, y = aqi_value, group = 1)) + 49 | # plot the AQI in Syracuse 50 | geom_line(linewidth = 1, alpha = 0.5) 51 | ``` 52 | 53 | # Task 1. Add color shading to the plot based on the AQI guide. The color palette does not need to match the specific colors in the table yet. 
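
(Hint, not the answer: if `geom_rect()` is unfamiliar, the sketch below shows the general pattern used in this task, namely one shaded rectangle per row of a small band table, stretched across the full x-range with `-Inf`/`Inf` and drawn underneath a line. The band values and the series here are made up for illustration and are unrelated to the AQI data.)

```{r}
# illustrative only: shading horizontal bands behind a line with geom_rect()
demo_bands <- data.frame(
  ymin = c(0, 10, 20),
  ymax = c(10, 20, 30),
  band = c("low", "medium", "high")
)
demo_series <- data.frame(x = 1:30, y = seq(2, 28, length.out = 30))

ggplot() +
  # one rectangle per band, spanning the whole x-axis
  geom_rect(
    data = demo_bands,
    aes(xmin = -Inf, xmax = Inf, ymin = ymin, ymax = ymax, fill = band),
    alpha = 0.3
  ) +
  # the series is drawn on top of the shaded background
  geom_line(data = demo_series, aes(x = x, y = y), linewidth = 1)
```
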
54 | ```{r} 55 | syr_plot <- syr_2023 |> 56 | ggplot(aes(x = date, y = aqi_value, group = 1)) + 57 | # shade in background with colors based on AQI guide 58 | geom_rect( 59 | data = ..., 60 | aes( 61 | ymin = ..., ymax = ..., 62 | xmin = ..., xmax = ..., 63 | x = ..., y = ..., fill = color 64 | ) 65 | ) + 66 | # plot the AQI in Syracuse 67 | geom_line(linewidth = 1, alpha = 0.5) 68 | 69 | syr_plot 70 | ``` 71 | 72 | # Task 2. Use the hexidecimal colors from the dataset for the color palette. 73 | 74 | ```{r} 75 | syr_plot_colored <- syr_plot + 76 | ...() 77 | 78 | syr_plot_colored 79 | ``` 80 | 81 | # Task 3. Label each AQI category on the chart 82 | Incorporate text labels for each AQI value directly into the graph. To accomplish this, you need to: 83 | 84 | - Calculate the midpoint AQI value for each category 85 | - Add a geom_text() layer to the plot with the AQI values positioned at the midpoint on the y-axis 86 | 87 | Extend the range of the x-axis to provide more horizontal space for the AQI category labels without interfering with the trend line. 88 | 89 | ```{r} 90 | aqi_levels <- aqi_levels |> 91 | mutate(aqi_mid = ((aqi_min + aqi_max) / 2)) 92 | 93 | syr_plot_labelled <- syr_plot_colored + 94 | scale_x_date( 95 | name = NULL, date_labels = "%b %Y", 96 | limits = c(ymd("2023-01-01"), ymd("2024-03-01")) 97 | ) + 98 | # add text labels for each AQI category 99 | geom_text( 100 | data = ..., 101 | aes(x = ..., y = aqi_mid, label = level), 102 | hjust = 1, size = 6, fontface = "bold", color = "white", 103 | family = "Atkinson Hyperlegible" 104 | ) 105 | 106 | syr_plot_labelled 107 | ``` 108 | 109 | # Task 4. Clean up the plot. 110 | 111 | Add a meaningful title, axis labels, caption, etc. 112 | ```{r} 113 | 114 | syr_plot_labelled <- syr_plot_labelled + 115 | theme_minimal() + 116 | ...() 117 | 118 | syr_plot_labelled 119 | ``` 120 | 121 | --- 122 | 123 | From this task we will be using the `freedom.csv` data. 124 | 125 | ```{r} 126 | freedom_to_plot <- freedom |> 127 | # calculate rowwise standard deviations (one row per country) 128 | rowwise() |> 129 | mutate(sd = sd(c_across(contains("cl_")), na.rm = TRUE)) |> 130 | ungroup() |> 131 | # find the 15 countries with the highest standard deviations 132 | relocate(country, sd) |> 133 | slice_max(order_by = sd, n = 15) |> 134 | # only keep countries with complete observations - necessary for future plotting 135 | drop_na() 136 | freedom_to_plot 137 | ``` 138 | 139 | ```{r} 140 | # calculate position rankings rather than raw scores 141 | freedom_ranked <- freedom_to_plot |> 142 | # only keep columns with civil liberties scores 143 | select(country, contains("cl_")) |> 144 | # wrangle the data to a long format 145 | pivot_longer( 146 | cols = -country, 147 | names_to = "year", 148 | values_to = "civil_liberty", 149 | names_prefix = "cl_", 150 | names_transform = list(year = as.numeric) 151 | ) |> 152 | # calculate rank within year - larger is worse, so reverse in the ranking 153 | group_by(year) |> 154 | mutate(rank_in_year = rank(-civil_liberty, ties.method = "first")) |> 155 | ungroup() |> 156 | # highlight Venezuela 157 | mutate(is_venezuela = if_else(country == "Venezuela", TRUE, FALSE)) 158 | freedom_ranked 159 | ``` 160 | 161 | # Task 5. Fill in the code below to create a faceted bar plot by year. 
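
(Hint, not the answer: the sketch below shows the general shape of a faceted horizontal bar chart with text labels, using tiny made-up data. The variable names are placeholders; in the task you map the civil-liberty score, the within-year rank, and the Venezuela highlight instead.)

```{r}
# illustrative only: horizontal bars, one facet per year, labels anchored at the axis
demo_scores <- data.frame(
  item  = rep(c("A", "B", "C"), times = 2),
  score = c(3, 1, 2, 2, 3, 1),
  year  = rep(c(2001, 2002), each = 3)
)

ggplot(demo_scores, aes(x = score, y = item)) +
  geom_col(fill = "gray") +
  facet_wrap(vars(year)) +
  # a fixed x position plus hjust = "left" keeps each label at the start of its bar
  geom_text(aes(label = item), x = 0.1, hjust = "left") +
  labs(x = NULL, y = NULL)
```
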
162 | 163 | ```{r} 164 | freedom_faceted_plot <- freedom_ranked |> 165 | # civil liberty vs freedom rank 166 | ggplot(aes(x = ..., y = ..., fill = ...)) + 167 | geom_col(show.legend = FALSE) + 168 | # change the color palette for emphasis of Venezuela 169 | scale_fill_manual(values = c("gray", "red")) + 170 | # facet by year 171 | facet_wrap(vars(year)) + 172 | # create explicit labels for civil liberties score, 173 | # leaving room for country text labels 174 | scale_x_continuous( 175 | limits = ..., 176 | breaks = ... 177 | ) + 178 | geom_text( 179 | hjust = "...", 180 | aes(label = ...), 181 | x = ... 182 | ) + 183 | # remove extraneous theme/label components 184 | theme( 185 | panel.grid.major.y = element_blank(), 186 | panel.grid.minor = element_blank(), 187 | axis.text.y = element_blank() 188 | ) + 189 | labs(x = NULL, y = NULL) 190 | 191 | freedom_faceted_plot 192 | ``` 193 | # Task 6. Fill in the code below to turn the facet plot into an animation showing the data by year. 194 | 195 | ```{r} 196 | # If your code generates a bunch of png's instead of showing an animated object: 197 | # Install this: 198 | #install.packages("magick") 199 | # And restart your R session 200 | ``` 201 | 202 | ```{r} 203 | freedom_bar_race <- freedom_faceted_plot + 204 | # remove faceting 205 | facet_null() + 206 | # label the current year in the top corner of the plot 207 | geom_text( 208 | x = ..., y = ..., 209 | hjust = "...", 210 | aes(label = ...), 211 | size = ... 212 | ) + 213 | # define group structure for transitions 214 | aes(group = ...) + 215 | # temporal transition - ensure integer value for labeling 216 | transition_time(...) + 217 | labs( 218 | title = "Civil liberties rating, {frame_time}", 219 | subtitle = "1: Highest degree of freedom - 7: Lowest degree of freedom" 220 | ) 221 | 222 | # basic transition 223 | animate(freedom_bar_race, nframes = 30, fps = 2) 224 | ``` 225 | 226 | ```{r} 227 | # smoother transition - might take a while to render 228 | animate(freedom_bar_race, nframes = 300, fps = 100, start_pause = 10, end_pause = 10) 229 | ``` 230 | 231 | -------------------------------------------------------------------------------- /Week 11/img/syracuse.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 11/img/syracuse.png -------------------------------------------------------------------------------- /Week 11/img/task0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 11/img/task0.png -------------------------------------------------------------------------------- /Week 11/img/task1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 11/img/task1.png -------------------------------------------------------------------------------- /Week 11/img/task2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 11/img/task2.png -------------------------------------------------------------------------------- /Week 11/img/task3.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 11/img/task3.png -------------------------------------------------------------------------------- /Week 11/img/task4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 11/img/task4.png -------------------------------------------------------------------------------- /Week 11/img/task5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 11/img/task5.png -------------------------------------------------------------------------------- /Week 11/img/task6-1.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 11/img/task6-1.gif -------------------------------------------------------------------------------- /Week 11/img/task6-2.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 11/img/task6-2.gif -------------------------------------------------------------------------------- /Week 12/Week12-AE-RMarkdown.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Iris & Penguins Dataset Dashboard" 3 | author: "Your Name" 4 | output: 5 | html_document: 6 | runtime: shiny 7 | --- 8 | 9 | ```{r setup, include=FALSE} 10 | library(shiny) 11 | library(dplyr) 12 | library(ggplot2) 13 | library(plotly) 14 | library(palmerpenguins) #install.packages('palmerpenguins') 15 | library(bslib) #install.packages("bslib") 16 | library(bsicons) 17 | data(iris) 18 | penguins <- penguins_raw 19 | ``` 20 | 21 | # Starting a Shiny app 22 | 23 | ```{r} 24 | # Define UI for the Shiny app 25 | ui <- fluidPage( 26 | titlePanel("Iris and Penguin Dataset Dashboard") 27 | ) 28 | 29 | # Define server logic for the Shiny app 30 | server <- function(input, output) {} 31 | 32 | # Run the application 33 | shinyApp(ui = ui, server = server) 34 | ``` 35 | 36 | # Adding tabs 37 | 38 | ```{r} 39 | # Define UI for the Shiny app 40 | ui <- fluidPage( 41 | titlePanel("Iris and Penguin Dataset Dashboard"), 42 | tabsetPanel( 43 | tabPanel("Iris Dataset"), 44 | tabPanel("Penguin Dataset") 45 | ) 46 | ) 47 | 48 | # Define server logic for the Shiny app 49 | server <- function(input, output) {} 50 | 51 | # Run the application 52 | shinyApp(ui = ui, server = server) 53 | ``` 54 | 55 | 56 | # Adding layout and controls for Iris 57 | 58 | ```{r} 59 | # Define UI for the Shiny app 60 | ui <- fluidPage( 61 | titlePanel("Iris and Penguin Dataset Dashboard"), 62 | tabsetPanel( 63 | tabPanel("Iris Dataset", 64 | sidebarLayout( 65 | sidebarPanel( 66 | selectInput("iris_xvar", "X-axis variable", choices = names(iris)[1:4]), 67 | selectInput("iris_yvar", "Y-axis variable", choices = names(iris)[1:4], selected = names(iris)[[2]]), 68 | selectInput("iris_species", "Species", choices = unique(iris$Species), multiple = TRUE, selected = unique(iris$Species)) 69 | ), 70 | mainPanel( 71 | plotlyOutput("iris_scatterPlot"), 72 | tableOutput("iris_dataTable") 73 | ) 74 | ) 75 | ), 76 | tabPanel("Penguin Dataset") 77 | ) 78 | ) 79 | 80 | # Define server 
logic for the Shiny app 81 | server <- function(input, output) {} 82 | 83 | # Run the application 84 | shinyApp(ui = ui, server = server) 85 | ``` 86 | 87 | # Adding layout and controls for Penguins 88 | 89 | ```{r} 90 | # Define UI for the Shiny app 91 | ui <- fluidPage( 92 | titlePanel("Iris and Penguin Dataset Dashboard"), 93 | tabsetPanel( 94 | tabPanel("Iris Dataset", 95 | sidebarLayout( 96 | sidebarPanel( 97 | selectInput("iris_xvar", "X-axis variable", choices = names(iris)[1:4]), 98 | selectInput("iris_yvar", "Y-axis variable", choices = names(iris)[1:4], selected = names(iris)[[2]]), 99 | selectInput("iris_species", "Species", choices = unique(iris$Species), multiple = TRUE, selected = unique(iris$Species)) 100 | ), 101 | mainPanel( 102 | plotlyOutput("iris_scatterPlot"), 103 | tableOutput("iris_dataTable") 104 | ) 105 | ) 106 | ), 107 | tabPanel("Penguin Dataset", 108 | sidebarLayout( 109 | sidebarPanel( 110 | selectInput("penguin_xvar", "X-axis variable", choices = names(penguins)[3:6]), 111 | selectInput("penguin_species", "Species", choices = unique(penguins$Species), multiple = TRUE, selected = unique(penguins$Species)) 112 | ), 113 | mainPanel( 114 | plotlyOutput("penguin_barPlot"), 115 | tableOutput("penguin_dataTable") 116 | ) 117 | ) 118 | ) 119 | ) 120 | ) 121 | 122 | # Define server logic for the Shiny app 123 | server <- function(input, output) {} 124 | 125 | # Run the application 126 | shinyApp(ui = ui, server = server) 127 | 128 | ``` 129 | 130 | # Adding reactive filtering for Iris 131 | 132 | ```{r} 133 | # Define UI for the Shiny app 134 | ui <- fluidPage( 135 | titlePanel("Iris and Penguin Dataset Dashboard"), 136 | tabsetPanel( 137 | tabPanel("Iris Dataset", 138 | sidebarLayout( 139 | sidebarPanel( 140 | selectInput("iris_xvar", "X-axis variable", choices = names(iris)[1:4]), 141 | selectInput("iris_yvar", "Y-axis variable", choices = names(iris)[1:4], selected = names(iris)[[2]]), 142 | selectInput("iris_species", "Species", choices = unique(iris$Species), multiple = TRUE, selected = unique(iris$Species)) 143 | ), 144 | mainPanel( 145 | plotlyOutput("iris_scatterPlot"), 146 | tableOutput("iris_dataTable") 147 | ) 148 | ) 149 | ), 150 | tabPanel("Penguin Dataset", 151 | sidebarLayout( 152 | sidebarPanel( 153 | selectInput("penguin_xvar", "X-axis variable", choices = names(penguins)[3:6]), 154 | selectInput("penguin_species", "Species", choices = unique(penguins$Species), multiple = TRUE, selected = unique(penguins$Species)) 155 | ), 156 | mainPanel( 157 | plotlyOutput("penguin_barPlot"), 158 | tableOutput("penguin_dataTable") 159 | ) 160 | ) 161 | ) 162 | ) 163 | ) 164 | 165 | # Define server logic for the Shiny app 166 | server <- function(input, output) { 167 | # Iris dataset 168 | filteredIrisData <- reactive({ 169 | iris %>% 170 | filter(Species %in% input$iris_species) 171 | }) 172 | 173 | output$iris_scatterPlot <- renderPlotly({ 174 | p <- ggplot(filteredIrisData(), aes_string(x = input$iris_xvar, y = input$iris_yvar, color = "Species", text = "Species")) + 175 | geom_point() + 176 | theme_minimal() + 177 | labs(title = "Scatter Plot of Iris Dataset", x = input$iris_xvar, y = input$iris_yvar) 178 | 179 | ggplotly(p, tooltip = c("x", "y", "Species")) 180 | }) 181 | 182 | output$iris_dataTable <- renderTable({ 183 | filteredIrisData() 184 | }) 185 | } 186 | 187 | # Run the application 188 | shinyApp(ui = ui, server = server) 189 | 190 | ``` 191 | 192 | # Adding reactive filtering for Penguins 193 | 194 | ```{r} 195 | # Define UI for the Shiny app 196 | ui 
<- fluidPage( 197 | titlePanel("Iris and Penguin Dataset Dashboard"), 198 | tabsetPanel( 199 | tabPanel("Iris Dataset", 200 | sidebarLayout( 201 | sidebarPanel( 202 | selectInput("iris_xvar", "X-axis variable", choices = names(iris)[1:4]), 203 | selectInput("iris_yvar", "Y-axis variable", choices = names(iris)[1:4], selected = names(iris)[[2]]), 204 | selectInput("iris_species", "Species", choices = unique(iris$Species), multiple = TRUE, selected = unique(iris$Species)) 205 | ), 206 | mainPanel( 207 | plotlyOutput("iris_scatterPlot"), 208 | tableOutput("iris_dataTable") 209 | ) 210 | ) 211 | ), 212 | tabPanel("Penguin Dataset", 213 | sidebarLayout( 214 | sidebarPanel( 215 | selectInput("penguin_xvar", "X-axis variable", choices = names(penguins)[3:6]), 216 | selectInput("penguin_species", "Species", choices = unique(penguins$Species), multiple = TRUE, selected = unique(penguins$Species)) 217 | ), 218 | mainPanel( 219 | plotlyOutput("penguin_barPlot"), 220 | tableOutput("penguin_dataTable") 221 | ) 222 | ) 223 | ) 224 | ) 225 | ) 226 | 227 | # Define server logic for the Shiny app 228 | server <- function(input, output) { 229 | # Iris dataset 230 | filteredIrisData <- reactive({ 231 | iris %>% 232 | filter(Species %in% input$iris_species) 233 | }) 234 | 235 | output$iris_scatterPlot <- renderPlotly({ 236 | p <- ggplot(filteredIrisData(), aes_string(x = input$iris_xvar, y = input$iris_yvar, color = "Species", text = "Species")) + 237 | geom_point() + 238 | theme_minimal() + 239 | labs(title = "Scatter Plot of Iris Dataset", x = input$iris_xvar, y = input$iris_yvar) 240 | 241 | ggplotly(p, tooltip = c("x", "y", "Species")) 242 | }) 243 | 244 | output$iris_dataTable <- renderTable({ 245 | filteredIrisData() 246 | }) 247 | 248 | # Penguin dataset 249 | filteredPenguinData <- reactive({ 250 | penguins %>% 251 | filter(Species %in% input$penguin_species) 252 | }) 253 | 254 | output$penguin_barPlot <- renderPlotly({ 255 | p <- ggplot(filteredPenguinData(), aes_string(x = input$penguin_xvar, fill = "Species", text = "Species")) + 256 | geom_bar(position = "dodge") + 257 | theme_minimal() + 258 | labs(title = "Bar Plot of Penguin Dataset", x = input$penguin_xvar) 259 | 260 | ggplotly(p, tooltip = c("x", "y", "Species")) 261 | }) 262 | 263 | output$penguin_dataTable <- renderTable({ 264 | filteredPenguinData() 265 | }) 266 | } 267 | 268 | # Run the application 269 | shinyApp(ui = ui, server = server) 270 | 271 | ``` 272 | 273 | # Final version using bslib and bsicon for decorations 274 | 275 | ```{r} 276 | # Define UI for the Shiny app 277 | ui <- fluidPage( 278 | theme = bs_theme( 279 | version = 5, 280 | bootswatch = "flatly", # You can choose different themes like "cosmo", "cerulean", etc. 
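# Note: bs_theme() feeds 'primary' and 'secondary' into the corresponding Bootstrap Sass variables,
# so these colours also flow through to bslib components such as the value_box() defined below.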
281 | primary = "#2c3e50", # Customize colors as needed 282 | secondary = "#18bc9c" 283 | ), 284 | titlePanel("Iris and Penguin Dataset Dashboard"), 285 | tabsetPanel( 286 | tabPanel("Iris Dataset", 287 | sidebarLayout( 288 | sidebarPanel( 289 | selectInput("iris_xvar", "X-axis variable", choices = names(iris)[1:4]), 290 | selectInput("iris_yvar", "Y-axis variable", choices = names(iris)[1:4], selected = names(iris)[[2]]), 291 | selectInput("iris_species", "Species", choices = unique(iris$Species), multiple = TRUE, selected = unique(iris$Species)), 292 | value_box( 293 | title = "Iris Dataset", 294 | value = "Data about some flowers", 295 | showcase = bs_icon("database-fill-check"), 296 | p("Flowers", bs_icon("flower3")), 297 | p("and stuff", bs_icon("emoji-smile")) 298 | ) 299 | ), 300 | mainPanel( 301 | plotlyOutput("iris_scatterPlot"), 302 | tableOutput("iris_dataTable") 303 | 304 | ) 305 | ) 306 | ), 307 | tabPanel("Penguin Dataset", 308 | sidebarLayout( 309 | sidebarPanel( 310 | selectInput("penguin_xvar", "X-axis variable", choices = names(penguins)[3:6]), 311 | selectInput("penguin_species", "Species", choices = unique(penguins$Species), multiple = TRUE, selected = unique(penguins$Species)) 312 | ), 313 | mainPanel( 314 | plotlyOutput("penguin_barPlot"), 315 | tableOutput("penguin_dataTable") 316 | ) 317 | ) 318 | ) 319 | ) 320 | ) 321 | 322 | # Define server logic for the Shiny app 323 | server <- function(input, output) { 324 | # Iris dataset 325 | filteredIrisData <- reactive({ 326 | iris %>% 327 | filter(Species %in% input$iris_species) 328 | }) 329 | 330 | output$iris_scatterPlot <- renderPlotly({ 331 | p <- ggplot(filteredIrisData(), aes_string(x = input$iris_xvar, y = input$iris_yvar, color = "Species", text = "Species")) + 332 | geom_point() + 333 | theme_minimal() + 334 | labs(title = "Scatter Plot of Iris Dataset", x = input$iris_xvar, y = input$iris_yvar) 335 | 336 | ggplotly(p, tooltip = c("x", "y", "Species")) 337 | }) 338 | 339 | output$iris_dataTable <- renderTable({ 340 | filteredIrisData() 341 | }) 342 | 343 | # Penguin dataset 344 | filteredPenguinData <- reactive({ 345 | penguins %>% 346 | filter(Species %in% input$penguin_species) 347 | }) 348 | 349 | output$penguin_barPlot <- renderPlotly({ 350 | p <- ggplot(filteredPenguinData(), aes_string(x = input$penguin_xvar, fill = "Species", text = "Species")) + 351 | geom_bar(position = "dodge") + 352 | theme_minimal() + 353 | labs(title = "Bar Plot of Penguin Dataset", x = input$penguin_xvar) 354 | 355 | ggplotly(p, tooltip = c("x", "y", "species")) 356 | }) 357 | 358 | output$penguin_dataTable <- renderTable({ 359 | filteredPenguinData() 360 | }) 361 | } 362 | 363 | # Run the application 364 | shinyApp(ui = ui, server = server) 365 | 366 | 367 | ``` 368 | -------------------------------------------------------------------------------- /Week 12/Week12-Lecture-Examples.Rmd: -------------------------------------------------------------------------------- 1 | # Plotly 2 | 3 | ```{r} 4 | library(gapminder) 5 | 6 | #install.packages('plotly') 7 | library(plotly) 8 | 9 | gapminder_2007 <- filter( 10 | gapminder, 11 | year == 2007 12 | ) 13 | 14 | my_plot <- ggplot( 15 | data = gapminder_2007, 16 | mapping = aes( 17 | x = gdpPercap, y = lifeExp, 18 | color = continent 19 | ) 20 | ) + 21 | geom_point() + 22 | scale_x_log10() + 23 | theme_minimal() 24 | 25 | ggplotly(my_plot) 26 | ``` 27 | 28 | # Plotly tooltips 29 | ```{r} 30 | my_plot <- ggplot( 31 | data = gapminder_2007, 32 | mapping = aes( 33 | x = gdpPercap, y = lifeExp, 
34 | color = continent 35 | ) 36 | ) + 37 | geom_point(aes(text = country)) + 38 | scale_x_log10() + 39 | theme_minimal() 40 | 41 | ggplotly( 42 | my_plot, 43 | tooltip = "text" 44 | ) 45 | ``` 46 | 47 | 48 | # Plotly with bars 49 | ```{r} 50 | car_hist <- ggplot( 51 | data = mpg, 52 | mapping = aes(x = hwy) 53 | ) + 54 | geom_histogram( 55 | binwidth = 2, 56 | boundary = 0, 57 | color = "white" 58 | ) 59 | 60 | ggplotly(car_hist) 61 | ``` 62 | 63 | # Creating interactive World Bank plot 64 | ```{r} 65 | library(tidyverse) 66 | library(WDI) 67 | library(scales) 68 | library(plotly) 69 | library(colorspace) 70 | #install.packages('ggbeeswarm') 71 | library(ggbeeswarm) 72 | 73 | # get World Bank indicators 74 | indicators <- c( 75 | population = "SP.POP.TOTL", 76 | prop_women_parl = "SG.GEN.PARL.ZS", 77 | gdp_per_cap = "NY.GDP.PCAP.KD" 78 | ) 79 | 80 | wdi_parl_raw <- WDI( 81 | country = "all", indicators, extra = TRUE, 82 | start = 2022, end = 2022 83 | ) 84 | 85 | # keep actual "economies" (this may take a while to run) 86 | wdi_clean <- wdi_parl_raw |> 87 | filter(region != "Aggregates") 88 | glimpse(wdi_clean) 89 | ``` 90 | 91 | ```{r} 92 | wdi_2022 <- wdi_clean |> 93 | filter(year == 2022) |> 94 | drop_na(prop_women_parl) |> 95 | # Scale this down from 0-100 to 0-1 so that scales::label_percent() can format 96 | # it as an actual percent 97 | mutate(prop_women_parl = prop_women_parl / 100) 98 | 99 | static_plot <- ggplot( 100 | data = wdi_2022, 101 | mapping = aes(y = fct_rev(region), x = prop_women_parl, color = region) 102 | ) + 103 | geom_quasirandom() + 104 | scale_x_continuous(labels = label_percent()) + 105 | scale_color_discrete_qualitative(guide = "none") + 106 | labs(x = "% women in parliament", y = NULL, caption = "Source: The World Bank") + 107 | theme_bw(base_size = 14) 108 | static_plot 109 | ``` 110 | Make it interactive 111 | 112 | ```{r} 113 | ggplotly(static_plot) 114 | ``` 115 | 116 | Modifying the tooltip 117 | 118 | ```{r} 119 | static_plot_tooltip <- ggplot( 120 | data = wdi_2022, 121 | mapping = aes(y = fct_rev(region), x = prop_women_parl, color = region) 122 | ) + 123 | geom_quasirandom( 124 | mapping = aes(text = country) 125 | ) + 126 | scale_x_continuous(labels = label_percent()) + 127 | scale_color_discrete_qualitative() + 128 | labs(x = "% women in parliament", y = NULL, caption = "Source: The World Bank") + 129 | theme_bw(base_size = 14) + 130 | theme(legend.position = "none") 131 | 132 | ggplotly(static_plot_tooltip, 133 | tooltip = "text" 134 | ) 135 | ``` 136 | 137 | 138 | Creating custom tooltips with the format 139 | 140 | ``` 141 | Name of country 142 | X% women in parliament 143 | ``` 144 | 145 | Generate a new column using `mutate()` with the required character string 146 | 147 | - `str_glue()` to combine character strings with data values 148 | - The `
<br>` is HTML code for a line break 149 | - Use the `label_percent()` function to format numbers as percents 150 | 151 | ```{r} 152 | wdi_2022 <- wdi_clean |> 153 | filter(year == 2022) |> 154 | drop_na(prop_women_parl) |> 155 | mutate( 156 | prop_women_parl = prop_women_parl / 100, 157 | fancy_label = str_glue("{country}<br>
{label_percent(accuracy = 0.1)(prop_women_parl)} women in parliament") 158 | ) 159 | wdi_2022 |> 160 | select(country, prop_women_parl, fancy_label) |> 161 | head() 162 | ``` 163 | 164 | 165 | ```{r} 166 | static_plot_tooltip_fancy <- ggplot( 167 | data = wdi_2022, 168 | mapping = aes(y = fct_rev(region), x = prop_women_parl, color = region) 169 | ) + 170 | geom_quasirandom( 171 | mapping = aes(text = fancy_label) 172 | ) + 173 | scale_x_continuous(labels = label_percent()) + 174 | scale_color_discrete_qualitative() + 175 | labs(x = "% women in parliament", y = NULL, caption = "Source: The World Bank") + 176 | theme_bw(base_size = 14) + 177 | theme(legend.position = "none") 178 | 179 | ggplotly(static_plot_tooltip_fancy, 180 | tooltip = "text" 181 | ) 182 | ``` -------------------------------------------------------------------------------- /Week 2/Week2-AE-Notes.md: -------------------------------------------------------------------------------- 1 | ![Banner](img/banner.png) 2 | 3 | # COMP4010/5120 - Week 2 Application Exercises 4 | 5 | --- 6 | 7 | # Submission requirements: 8 | 9 | - For each week, you will need to submit **TWO** items to Canvas: Your R Markdown file in `.Rmd` and `.pdf` formats. You can export `.Rmd` to `.pdf` by using the Knit functionality. If you’re not sure how to do that please check out this video: [Tutorial on how to Knit R Markdown](https://youtu.be/8eBBPVMwTLo?si=93Vo8OOApf0vAYYH). 10 | - In the R Markdown file please provide your answers to the provided exercises (both theory and programming questions). 11 | - Answers to theory questions should be included in markdown as plain text. Answers to programming questions should be included in executable R chunks (more information in **Reading Material Section A.2.4**). 12 | 13 | --- 14 | 15 | # A. Application Exercises 16 | 17 | ## Task 1. Visualizing the house sales data as a lollipop chart 18 | 19 | Following the content from Section B.3, plot the mean area of houses by decade built using a [lollipop chart](https://www.data-to-viz.com/graph/lollipop.html). 20 | 21 | **Hint:** Think of what **geometric primitives** you need to create a lollipop chart? Refer to [this handy cheatsheet](https://rstudio.github.io/cheatsheets/html/data-visualization.html) and find the geoms you need. 22 | 23 | Your plot should look something (may not need to be exactly) like this: 24 | 25 | ![Lollipop chart](img/housedata3.png) 26 | 27 | ## Task 2. Visualizing the distribution of the number of bedrooms 28 | 29 | Your task is to visualize the distribution of the number of bedrooms. To simplify the task, let’s collapse the variable beds into a smaller number of categories and drop rows with missing values for this variable. 30 | ```R 31 | df_bed <- df |> 32 | mutate(beds = factor(beds) |> 33 | fct_collapse( 34 | "5+" = c("5", "6", "7", "9") 35 | )) |> 36 | drop_na(beds) 37 | ``` 38 | Since the number of bedrooms is effectively a categorical variable, we should select a geom appropriate for a single categorical variable. 39 | Create a bar chart visualizing the distribution of the number of bedrooms in the properties sold. Your plot should look something (may not need to be exactly) like this: 40 | 41 | ![Bedrooms bar chart](img/housedata4.png) 42 | 43 | ## Task 3. Visualizing the distribution of the number of bedrooms by the decade in which the property was built 44 | Now let’s visualize the distribution of the number of bedrooms by the decade in which the property was built. 
We will still use a bar chart but also color-code the bar segments for each decade. Now we have a few variations to consider. 45 | 46 | - [**Stacked bar chart**](https://datavizproject.com/data-type/stacked-bar-chart/) - each bar segment represents the frequency count and are stacked vertically on top of each other. 47 | - [**Dodged/Grouped bar chart**](https://chartio.com/learn/charts/grouped-bar-chart-complete-guide/) - each bar segment represents the frequency count and are placed side by side for each decade. This leaves each segment with a common origin, or baseline value of 0. 48 | - [**Relative frequency bar chart**](https://www.researchgate.net/figure/Relative-frequency-bar-chart-of-settlement-classification-per-cluster-according-to-the_fig4_330763944) - each bar segment represents the relative frequency (proportion) of each category within each decade. 49 | 50 | Generate each form of the bar chart and compare the differences. Which one do you think is the most informative? 51 | *Hint:* Read the documentation for [geom_bar()](https://ggplot2.tidyverse.org/reference/geom_bar.html) to identify an appropriate argument for specifying each type of bar chart. 52 | 53 | Your plot should look something (may not need to be exactly) like this: 54 | 55 | **Stacked bar chart** 56 | 57 | ![Stacked bar chart](img/housedata5.png) 58 | 59 | **Dodged bar chart** 60 | 61 | ![Dodged bar chart](img/housedata6.png) 62 | 63 | **Relative frequency bar chart** 64 | 65 | ![Relative frequency bar chart](img/housedata7.png) 66 | 67 | ## Task 4. Visualizing the distribution of property size by decades 68 | 69 | Now let’s evaluate the typical property size (`area`) by the decade in which the property was built. We will start by summarizing the data and then visualize the results using a bar chart and a boxplot. 70 | 71 | ```R 72 | mean_area_decade <- df |> 73 | group_by(decade_built_cat) |> 74 | summarize(mean_area = mean(area)) 75 | ``` 76 | 77 | Visualize the property size by the decade in which the property was built. Construct a bar chart reporting the average property size, as well as a [boxplot](https://r-graph-gallery.com/boxplot), [violin plot](https://r-graph-gallery.com/violin), and [strip chart](https://diametrical.co.uk/products/quickchart/advanced-charts/strip-charts/) (e.g. jittered scatterplot). What does each graph tell you about the distribution of property size by decade built? Which ones do you find to be more or less effective? 78 | 79 | *Bonus*: [XKCD on Violin Plots](https://xkcd.com/1967/) 80 | 81 | Your plot should look something (may not need to be exactly) like this: 82 | 83 | **Bar chart (mean area by decade built)** 84 | ![Bar chart (mean area by decades)](img/housedata8.png) 85 | 86 | **Box plot (area by decade built)** 87 | ![Box plot (area by decades)](img/housedata9.png) 88 | 89 | **Violin plot (area by decade built)** 90 | ![Violin plot (area by decade built)](img/housedata10.png) 91 | 92 | **Strip chart (area by decade built)** 93 | ![Strip chart (area by decade built)](img/housedata11.png) 94 | 95 | 96 | # B. Reading Material 97 | 98 | ## 1. Hall of ~~Fame~~ Shame 99 | 100 | Learning from other's mistakes is immensely beneficial as it provides you with tangible examples of what not to do in your own visualizations. 101 | The "Hall of Shame" is a curated collection of poorly executed visualizations that serve as cautionary examples of what to avoid in your own work. 
By examining these visualizations, we'll gain valuable insights into common pitfalls, ineffective techniques, and misleading representations. The "Hall of Shame" is not intended to ridicule or criticize, but rather to foster critical thinking, inspire improvement, and deepen our understanding of effective data communication. 102 | 103 | For our first one, let's look at the most common sin of data viz: pie charts! 104 | 105 | ![Pie chart](img/piechart.png) 106 | 107 | Here are some reasons why using a pie chart in this situation is not ideal: 108 | 109 | - Difficulty comparing slices: As mentioned earlier, the human eye is not good at accurately judging the relative sizes of pie chart slices, especially for more than a few slices. This makes it difficult to compare the percentage of each country or region in this chart. For instance, it’s hard to tell at a glance which is larger, the slice for China or the one for “Other”. 110 | 111 | - Limited data points: Pie charts are most effective when there are only a few data points to compare. This chart has six slices, which can make it visually complex and difficult to interpret. 112 | 113 | A better alternative for this data visualization would be a bar chart. A bar chart would display the percentages for each country or region along a horizontal or vertical axis, making it easier to compare the magnitudes and identify the highest or lowest percentages. 114 | 115 | ## 2. Data-Ink Ratio: Less is More in Data Visualization 116 | 117 | The data-ink ratio is a key principle in data visualization, emphasizing the importance of maximizing the proportion of ink used to represent the actual data compared to the total ink used in the entire visualization. This concept, championed by Edward Tufte, encourages creators to focus on essential elements that convey the data's message effectively. 118 | 119 | ![Data ink ratio](img/dataink.png) 120 | 121 | Why is a high data-ink ratio crucial in modern data viz designs? 122 | 123 | - Clarity and Focus: By minimizing unnecessary visual elements like excessive decorations or overly complex chart structures, viewers can easily grasp the data's key points without being distracted by extraneous information. This leads to a clearer understanding of the message being conveyed. 124 | - Efficiency and Impact: A high data-ink ratio promotes efficient use of visual space, allowing for the presentation of more data within the same area. This maximizes the impact of the visualization, enabling viewers to quickly extract insights from the data. 125 | - Professionalism and Aesthetics: A focus on essential elements often translates to a cleaner and more professional visual design. This aesthetic simplicity fosters a sense of trust and reliability in the data presented. 126 | 127 | ![Data ink ratio](img/dataink2.jpg) 128 | 129 | While not a strict quantitative measure, the data-ink ratio serves as a valuable guiding principle for data visualization professionals. By striving for a high data-ink ratio, creators can ensure their visualizations are clear, concise, and impactful, allowing viewers to effectively understand and utilize the presented information. 130 | 131 | ## 3. House Sales Data 132 | For the following exercises we will work with data on houses that were sold in the state of New York (USA) in 2022 and 2023. 133 | 134 | The variables include: 135 | 136 | - `property_type` - type of property (e.g. 
single family residential, townhouse, condo) 137 | - `address` - street address of property 138 | - `city` - city of property 139 | - `state` - state of property (all are New York) 140 | - `zip_code` - ZIP code of property 141 | - `price` - sale price (in dollars) 142 | - `beds` - number of bedrooms 143 | - `baths` - number of bathrooms. Full bathrooms with shower/toilet count as 1, bathrooms with just a toilet count as 0.5. 144 | - `area` - living area of the home (in square feet) 145 | - `lot_size` - size of property’s lot (in acres) 146 | - `year_built` - year home was built 147 | - `hoa_month` - monthly HOA dues. If the property is not part of an HOA, then the value is `NA` 148 | 149 | The dataset can be found in the repo for this week. It is called `homesales.csv`. We will import the data and create a new variable, `decade_built_cat`, which identifies the decade in which the home was built. It will include catch-all categories for any homes pre-1940 and post-1990. 150 | 151 | First, you can read the data `csv` in R with: 152 | 153 | ```R 154 | df <- read_csv("homesales.csv") 155 | ``` 156 | 157 | ### 3.1. Average home size by decade 158 | 159 | Let’s examine the average size of homes recently sold by their age. To simplify this task, we will split the homes by decade of construction. It will include catch-all categories for any homes pre-1940 and post-1990. Then we will calculate the average size of homes sold by decade. 160 | 161 | ```R 162 | # create decade variable 163 | df <- df |> 164 | mutate( 165 | decade_built = (year_built %/% 10) * 10, 166 | decade_built_cat = case_when( 167 | decade_built <= 1940 ~ "1940 or before", 168 | decade_built >= 1990 ~ "1990 or after", 169 | .default = as.character(decade_built) 170 | ) 171 | ) 172 | 173 | # calculate mean area by decade 174 | mean_area_decade <- df |> 175 | group_by(decade_built_cat) |> 176 | summarize(mean_area = mean(area)) 177 | mean_area_decade 178 | ``` 179 | 180 | This code snippet is using the pipe operator (`|>`) in R along with the `mutate()` function from the `dplyr` package to create two new variables (`decade_built` and `decade_built_cat`) based on an existing variable `year_built` in the dataframe `df`. 181 | 182 | - The pipe operator (`|>`) passes the dataframe `df` into the next function, which is `mutate()`, making the dataframe the first argument of the `mutate()` function. 183 | - `mutate()`: This function from the `dplyr` package is used to create new variables or modify existing ones in the dataframe. 184 | - `decade_built = (year_built %/% 10) * 10`: This line calculates the decade in which each observation's `year_built` falls. It does this by dividing `year_built` by 10 (`year_built %/% 10`), which effectively truncates the year to the nearest decade. Then, it multiplies the result by 10 to obtain the decade. For example, if `year_built` is 1995, `(1995 %/% 10) * 10` would result in 1990. 185 | - `decade_built_cat = case_when()`: This line creates a categorical variable `decade_built_cat` based on the decade information calculated in the previous step. It uses the `case_when()` function, which allows for conditional assignment based on multiple conditions. 186 | - `decade_built <= 1940 ~ "1940 or before"`: This condition checks if the decade_built is less than or equal to 1940. If true, it assigns the value `"1940 or before"` to the `decade_built_cat`. 187 | - `decade_built >= 1990 ~ "1990 or after"`: This condition checks if the decade_built is greater than or equal to 1990. 
If true, it assigns the value `"1990 or after"` to the `decade_built_cat`. 188 | - `.default = as.character(decade_built)`: This is the default condition. If the `decade_built` does not fall into either of the specified ranges, it converts the decade into a character and assigns it to `decade_built_cat`. 189 | 190 | ### 3.2. Visualizing the data as a bar chart 191 | 192 | A conventional approach to visualizing this data is a **bar chart**. Since we already calculated the average area, we can use `geom_col()` to create the bar chart. We also graph it horizontally to avoid overlapping labels for the decades. 193 | 194 | ```R 195 | ggplot( 196 | data = mean_area_decade, 197 | mapping = aes(x = mean_area, y = decade_built_cat) 198 | ) + 199 | geom_col() + 200 | labs( 201 | x = "Mean area (square feet)", y = "Decade built", 202 | title = "Mean area of houses, by decade built" 203 | ) 204 | ``` 205 | 206 | ![Bar chart](img/housedata1.png) 207 | 208 | ### 3.3. Visualizing the data as a dot plot 209 | 210 | The bar chart violates the data-ink ratio principle. The bars are not necessary to convey the information. We can use a **dot plot** instead. The dot plot is a variation of the bar chart, where the bars are replaced by dots. The dot plot is a (potentially) better choice because it uses less ink to convey the same information. 211 | 212 | ```R 213 | ggplot( 214 | data = mean_area_decade, 215 | mapping = aes(x = mean_area, y = decade_built_cat) 216 | ) + 217 | geom_point(size = 4) + 218 | labs( 219 | x = "Mean area (square feet)", y = "Decade built", 220 | title = "Mean area of houses, by decade built" 221 | ) 222 | ``` 223 | 224 | ![Point chart](img/housedata2.png) -------------------------------------------------------------------------------- /Week 2/Week2-AE-RMarkdown.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | --- 3 | --- 4 | 5 | # Week 2 6 | 7 | # Application Exercises 8 | 9 | Include `tidyverse`: 10 | 11 | ```{r} 12 | #install.packages("tidyverse") 13 | library(tidyverse) 14 | theme_set(theme_minimal()) 15 | ``` 16 | 17 | Read the data: 18 | 19 | ```{r} 20 | df <- read_csv("homesales.csv") 21 | ``` 22 | 23 | Average home size by decade: 24 | 25 | ```{r} 26 | # create decade variable 27 | df <- df |> 28 | mutate( 29 | decade_built = (year_built %/% 10) * 10, 30 | decade_built_cat = case_when( 31 | decade_built <= 1940 ~ "1940 or before", 32 | decade_built >= 1990 ~ "1990 or after", 33 | .default = as.character(decade_built) 34 | ) 35 | ) 36 | 37 | # calculate mean area by decade 38 | mean_area_decade <- df |> 39 | group_by(decade_built_cat) |> 40 | summarize(mean_area = mean(area)) 41 | mean_area_decade 42 | ``` 43 | 44 | Visualizing the data as a bar chart: 45 | 46 | ```{r} 47 | ggplot( 48 | data = mean_area_decade, 49 | mapping = aes(x = mean_area, y = decade_built_cat) 50 | ) + 51 | geom_col() + 52 | labs( 53 | x = "Mean area (square feet)", y = "Decade built", 54 | title = "Mean area of houses, by decade built" 55 | ) 56 | ``` 57 | 58 | Visualizing the data as a dot plot: 59 | 60 | ```{r} 61 | ggplot( 62 | data = mean_area_decade, 63 | mapping = aes(x = mean_area, y = decade_built_cat) 64 | ) + 65 | geom_point(size = 4) + 66 | labs( 67 | x = "Mean area (square feet)", y = "Decade built", 68 | title = "Mean area of houses, by decade built" 69 | ) 70 | ``` 71 | 72 | ## TASK 1. Visualizing the data as a lollipop chart 73 | 74 | ```{r} 75 | # YOUR CODE HERE 76 | 77 | 78 | 79 | 80 | ``` 81 | 82 | ## TASK 2. 
Visualizing the distribution of the number of bedrooms 83 | 84 | Collapse the variable `beds` into a smaller number of categories and drop rows with missing values for this variable: 85 | 86 | ```{r} 87 | df_bed <- df |> 88 | mutate(beds = factor(beds) |> 89 | fct_collapse( 90 | "5+" = c("5", "6", "7", "9") 91 | )) |> 92 | drop_na(beds) 93 | ``` 94 | 95 | ```{r} 96 | # YOUR CODE HERE 97 | 98 | 99 | 100 | 101 | ``` 102 | 103 | ## TASK 3. Visualizing the distribution of the number of bedrooms by the decade in which the property was built 104 | 105 | Stacked bar chart (number of bedrooms by the decade built): 106 | 107 | ```{r} 108 | # YOUR CODE HERE 109 | 110 | 111 | 112 | 113 | ``` 114 | 115 | Dodged bar chart (number of bedrooms by the decade built): 116 | 117 | ```{r} 118 | # YOUR CODE HERE 119 | 120 | 121 | 122 | 123 | ``` 124 | 125 | Relative frequency bar chart (number of bedrooms by the decade built): 126 | 127 | ```{r} 128 | # YOUR CODE HERE 129 | 130 | 131 | 132 | 133 | ``` 134 | 135 | ## Task 4. Visualizing the distribution of property size by decades 136 | 137 | Getting mean of area of each decade category: 138 | 139 | ```{r} 140 | mean_area_decade <- df |> 141 | group_by(decade_built_cat) |> 142 | summarize(mean_area = mean(area)) 143 | ``` 144 | 145 | Bar chart (mean area by decade built): 146 | 147 | ```{r} 148 | # YOUR CODE HERE 149 | 150 | 151 | 152 | 153 | ``` 154 | 155 | Box plot (area by decade built): 156 | 157 | ```{r} 158 | # YOUR CODE HERE 159 | 160 | 161 | 162 | 163 | ``` 164 | 165 | Violin plot (area by decade built): 166 | 167 | ```{r} 168 | # YOUR CODE HERE 169 | 170 | 171 | 172 | 173 | ``` 174 | 175 | Strip chart (area by decade built): 176 | 177 | ```{r} 178 | set.seed(4010) 179 | 180 | # YOUR CODE HERE 181 | 182 | 183 | 184 | 185 | ``` 186 | -------------------------------------------------------------------------------- /Week 2/Week2-AE-RMarkdown.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/Week2-AE-RMarkdown.pdf -------------------------------------------------------------------------------- /Week 2/img/banner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/banner.png -------------------------------------------------------------------------------- /Week 2/img/dataink.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/dataink.png -------------------------------------------------------------------------------- /Week 2/img/dataink2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/dataink2.jpg -------------------------------------------------------------------------------- /Week 2/img/housedata1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/housedata1.png -------------------------------------------------------------------------------- /Week 2/img/housedata10.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/housedata10.png -------------------------------------------------------------------------------- /Week 2/img/housedata11.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/housedata11.png -------------------------------------------------------------------------------- /Week 2/img/housedata2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/housedata2.png -------------------------------------------------------------------------------- /Week 2/img/housedata3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/housedata3.png -------------------------------------------------------------------------------- /Week 2/img/housedata4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/housedata4.png -------------------------------------------------------------------------------- /Week 2/img/housedata5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/housedata5.png -------------------------------------------------------------------------------- /Week 2/img/housedata6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/housedata6.png -------------------------------------------------------------------------------- /Week 2/img/housedata7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/housedata7.png -------------------------------------------------------------------------------- /Week 2/img/housedata8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/housedata8.png -------------------------------------------------------------------------------- /Week 2/img/housedata9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/housedata9.png -------------------------------------------------------------------------------- /Week 2/img/piechart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 2/img/piechart.png -------------------------------------------------------------------------------- /Week 4/Week4-AE-Notes.md: 
-------------------------------------------------------------------------------- 1 | ![Banner](img/banner.png) 2 | # COMP4010/5120 - Week 4 Application Exercises 3 | --- 4 | 5 | # A. Application Exercises 6 | Data source: https://github.com/imdevskp/covid_19_jhu_data_web_scrap_and_cleaning 7 | 8 | ## Task 1. COVID-19 `TotalCases` vs. `TotalDeaths` 9 | From `worldometer_data.csv`, create a plot of `TotalCases` vs. `TotalDeaths`. Describe the relationship between the two variables. 10 | 11 | ![Task 1](img/task1.jpg) 12 | 13 | ## Task 2. Log scale 14 | Apply log scales to the plot for better '*readability*'. For these two variables, is the log scale beneficial for visualization? Briefly describe your opinion of the plot before and after applying the log scale. 15 | 16 | ![Task 2](img/task2.jpg) 17 | 18 | ## Task 3. Using `ggwaffle` 19 | Refer to Section B.3.2. for information on how to install and use `ggwaffle` to create waffle charts. Use `ggwaffle` to create a waffle chart to demonstrate the distribution of continents in `worldometer_data.csv` from the `Continent` column. What information does this plot tell you about the continents? 20 | 21 | ![Task 3](img/task3.jpg) 22 | 23 | ## Task 4. Tidying the waffle plot 24 | Adjust the waffle chart to use a fixed aspect ratio so the symbols are squares. 25 | 26 | ![Task 4](img/task4.jpg) 27 | 28 | ## Task 5. Coloring the plot 29 | Use [`scale_fill_viridis_d(option='scale_name')`](https://sjspielman.git-hub.io/introverse/articles/color_fill_scales.html) to color your plot. Be sure to have `tidyverse` loaded. 30 | 31 | EDIT: [New link](https://sjspielman.github.io/introverse/articles/color_fill_scales.html) 32 | 33 | ![Task 5](img/task5.jpg) 34 | 35 | ## Task 6. Using icons 36 | Install the `emojifont` package with `install.packages("emojifont")` and import it. Use [`fontawesome`](https://cran.r-project.org/web/packages/emojifont/vignettes/emojifont.html#font-awesome) to get an icon/emoji and apply it to the waffle chart similar to the last example from [`ggwaffle doc`](https://liamgilbey.github.io/ggwaffle/). 37 | 38 | 39 | ![Task 6](img/task6.jpg) 40 | 41 | --- 42 | 43 | # B. Reading Material 44 | 45 | ## 1. Hall of Shame 46 | 47 | ![](img/Truncated-Y-Axis-Data-Visualizations-Designed-To-Mislead.jpg) 48 | 49 | Infamous for its overuse in politics, the truncated y-axis is a classic way to visually mislead. Take a look at the graph above, comparing people with jobs to people on welfare. At first glance, the visual dynamics of the graph suggest people on welfare to number four times as many as people with jobs. Numbers don’t lie, however, and when analyzed, they point out much less sensational facts than the data visualization would suggest. 50 | 51 | This type of misinformation occurs when the graph’s producers ignore convention and manipulate the y-axis. The conventional way of organizing the y-axis is to start at 0 and then go up to the highest data point in your set. By not setting the origin of the y-axis at zero, small differences become hyperbolic and therefore play more on people’s prejudices rather than their rationality. 52 | Focus on creating your data visualizations using data with a zero-baseline y-axis and watch out for truncated axes. Sometimes these distortions are done on purpose to mislead readers, but other times they’re just the consequence of not knowing how an unintentional use of a non-zero-baseline can skew data. 53 | 54 | ## 2. 
Understanding and Applying Logarithmic Scales in Data Visualization 55 | 56 | The 2 following articles provide a very nice and intuitive explanation for log scales and its benefits and use cases: 57 | - [data-viz-workshop-2021 - Plotting using logarithmic scales](https://badriadhikari.github.io/data-viz-workshop-2021/log/) 58 | - [The Power of Logarithmic Scale](https://dataclaritycorp.com/the-power-of-logarithmic-scale/) 59 | 60 | Data visualization is a crucial aspect of interpreting complex datasets, and understanding different types of scales, including logarithmic scales, is vital. This tutorial aims to provide a clear understanding of logarithmic scales and their application in data visualization, helping you to make informed choices when displaying data. 61 | 62 | ### 2.1. Grasping the Concept of Log Scales 63 | 64 | Consider the 'turn-off display after' setting in electronic devices, where options increase from minutes to hours unevenly. This intuitive understanding of varying increments is at the core of log scales. 65 | 66 | **Visualizing distances**: A linear scale might not effectively compare distances from a central point to various cities if one distance is significantly greater than the others. Using a log scale in this scenario makes the comparison more balanced and informative. 67 | 68 | **Transforming Data for Better Interpretation**: With exponential data like population growth or pandemic spread, linear scales can be misleading. Log scales provide a more nuanced view, revealing trends and patterns that might be obscured on a linear scale. 69 | Exponential variables, such as wildfire spread or population growth, demonstrate rapid increases. These can be more accurately represented on a log scale, making it easier to understand their progression. 70 | 71 | ### 2.2. When to Use Logarithmic Scales 72 | 73 | **Exponential Data**: In cases of exponential growth or decay, where data increases or decreases rapidly, log scales can transform skewed distributions into more interpretable forms. 74 | Understanding the exponential equation (f(x) = ab^x) helps in recognizing scenarios where log scales are appropriate. Remember, log scales are less suitable for data that includes negative or zero values. 75 | 76 | **Comparing Rates of Change**: In scenarios like tracking stock prices or disease spread, log scales can highlight rates of change more effectively than linear scales. They provide a clearer view of how variables are evolving over time, especially when the changes are multiplicative rather than additive. 77 | 78 | **Resolving Skewness in Data Visualization**: When certain data points dominate because of their large magnitude, log scales can prevent this skewness, allowing for a more balanced view of all data points. 79 | 80 | ### 2.3. How to Read and Interpret Logarithmic Scales 81 | 82 | **Identifying Logarithmic Scales**: Determine whether the graph is a semi-log (one axis is logarithmic) or log-log (both axes are logarithmic). Unevenly spaced grid lines are a hallmark of logarithmic scales. 83 | Understand the representation of values on a log scale: consistent multiplicative increases rather than additive ones. For instance, each tick mark could represent a tenfold increase. 84 | 85 | **Understanding Exponential Trends on Log Scales**: On a log scale, exponential trends appear as straight lines. This property can be used to identify exponential relationships within data and to compare different exponential trends. 86 | 87 | ### 2.4. 
Application in Scientific Research 88 | 89 | **Innate Understanding of Logarithmic Representation**: Studies suggest that humans may intuitively understand logarithmic scales, as they often perceive quantities in relative terms rather than absolute differences. 90 | This innate perception applies to various sensory experiences, such as brightness or sound, indicating the natural fit of log scales in representing certain types of data. 91 | 92 | **Interpreting Scientific Data**: In fields where data spans several orders of magnitude, such as in astronomy or microbiology, logarithmic scales are indispensable for meaningful visualization and comparison. 93 | 94 | ### To sum it up: 95 | 96 | Use logarithmic scales when dealing with multiplicative changes, exponential growth or decay, and data spanning several orders of magnitude. Avoid logarithmic scales for data that includes negative values, zeroes, or when linear relationships are more appropriate. Remember, the choice of scale can significantly affect the interpretation of data. By understanding and applying logarithmic scales appropriately, you can uncover patterns and insights that might otherwise remain hidden in your data visualizations. 97 | 98 | ## 3. Waffle charts (yum!) 99 | 100 | Waffle charts are an innovative way to visualize data, offering a unique and engaging alternative to traditional pie or bar charts. They are particularly effective for displaying part-to-whole relationships and for making comparisons between categories. This tutorial will guide you through the fundamentals of waffle charts, their construction, and their effective use in data visualization. 101 | 102 | ![Waffle chart & Real waffles](img/waffle1.jpg) 103 | 104 | ### 3.1. Understanding Waffle Charts 105 | 106 | **What is a Waffle Chart?** A waffle chart is a grid-based visual representation where each cell in the grid represents a small, equal portion of the whole. It’s akin to a square pie chart but offers a more granular view. 107 | Ideal for representing percentages or proportions, waffle charts can effectively communicate data in a more relatable and visually appealing way. 108 | 109 | **Comparative Strengths and Weaknesses**: 110 | 111 | - Strengths: Waffle charts are excellent for illustrating small differences, which might be overlooked in pie charts. They are visually engaging and easy to interpret. 112 | - Weaknesses: Not suitable for large datasets or for displaying changes over time. They can become cluttered and less effective if too many categories are included. 113 | 114 | ### 3.2. Creating a Waffle Chart in R using `ggwaffle`: 115 | 116 | Waffle charts can be effectively created in R using the `ggwaffle` package, which is part of the `ggplot2` ecosystem. This package simplifies the process of creating waffle charts while providing flexibility to customize the charts. 117 | 118 | Install and import the `ggwaffle` package: 119 | 120 | ```R 121 | devtools::install_github("liamgilbey/ggwaffle") 122 | library(ggwaffle) 123 | ``` 124 | 125 | From [https://liamgilbey.github.io/ggwaffle/](https://liamgilbey.github.io/ggwaffle/): 126 | 127 | > `ggwaffle` heavily relies on the usage of `ggplot2`. Much like standard `ggplot` graphs, waffle charts are created by adding layers to a base graphic. Because of the inner mechanisms of `ggplot2`, some of the necessary data transformations have to be completed outside of a standard plot creation. The function `waffle_iron` has been added to help with issue. 
128 | > `ggwaffle` also introduces a column mapping function, `aes_d`. At this stage I have no idea of how useful this is outside the context of the package, but it seemed a nice way to specify dynamic column renaming. `aes_d` is obviously coined from ggplot’s aes function and has a very similar idea. Here we are mapping column names to feed into a function so they can be renamed for used appropriately. 129 | 130 | ```R 131 | waffle_data <- waffle_iron(mpg, aes_d(group = class)) 132 | 133 | ggplot(waffle_data, aes(x, y, fill = group)) + 134 | geom_waffle() 135 | ``` 136 | 137 | ![Waffle chart 2](img/waffle2.png) 138 | 139 | In short, you first need to prepare the data with `waffle_iron()` and then plot it with `ggplot2` and `geom_waffle()`. See the [documentation for `waffle_iron()`](https://rdrr.io/github/liamgilbey/ggwaffle/man/waffle_iron.html) for more details. 140 | 141 | Let's practice with the `mtcars` dataset. Let's create a waffle chart grouped by the number of cylinders in a car. Note that `x` and `y` here are columns that were automatically created by `waffle_iron()` based on the grouping that we have requested. 142 | 143 | ```R 144 | cyl_data <- waffle_iron(mtcars, aes_d(group = cyl)) 145 | 146 | # Creating the waffle chart 147 | ggplot(cyl_data, mapping = aes(x, y, fill = group)) + 148 | geom_waffle() + 149 | ggtitle("Distribution of Cars by Cylinder Count in mtcars Dataset") 150 | ``` 151 | 152 | ![Waffle chart 3](img/waffle3.jpg) 153 | 154 | Right now it's treating our `cyl` column as a continuous numerical variable, but we want to treat the number of cylinders as discrete categories. Let's convert it to discrete factors and see what happens. 155 | 156 | ```R 157 | cyl_data <- waffle_iron(mtcars, aes_d(group = cyl)) 158 | 159 | # Creating the waffle chart 160 | ggplot(cyl_data, mapping = aes(x, y, fill = as.factor(group))) + 161 | geom_waffle() + 162 | ggtitle("Distribution of Cars by Cylinder Count in mtcars Dataset") 163 | ``` 164 | 165 | ![Waffle chart 4](img/waffle4.jpg) 166 | 167 | Much better! Let's change the theme to the default waffle theme so it looks more like a proper waffle chart. 168 | 169 | ```R 170 | cyl_data <- waffle_iron(mtcars, aes_d(group = cyl)) 171 | 172 | # Creating the waffle chart 173 | ggplot(cyl_data, mapping = aes(x, y, fill = as.factor(group))) + 174 | geom_waffle() + 175 | theme_waffle() + 176 | ggtitle("Distribution of Cars by Cylinder Count in mtcars Dataset") 177 | ``` 178 | 179 | ![Waffle chart 5](img/waffle5.jpg) 180 | 181 | For more customization, please refer to this handy [quick start documentation for `ggwaffle`](https://liamgilbey.github.io/ggwaffle/). 182 | 183 | ### 3.3. Best Practices for Clarity and Accuracy: 184 | 185 | - **Keep it simple**: Limit the number of categories to ensure the chart is easy to read and interpret. 186 | - **Consistent scale**: Ensure that each cell represents the same value across all categories for accurate comparison. 187 | - **Use color effectively**: Choose distinct, contrasting colors for different categories to enhance readability. 188 | - **Labels**: Include clear labels and a legend to convey the meaning of each color and category. This enhances the user's understanding of the chart. If necessary, include annotations or tooltips for more detailed explanations. 189 | - **Interactivity**: In digital formats, make your waffle charts interactive. Hover-over effects can display exact values or additional information, adding depth to the data presented. 190 | 191 | ### 3.4. 
Use cases 192 | 193 | Specific examples: 194 | - Displaying election results, survey data, or market share. 195 | - Comparing demographic data like age or income distribution. 196 | - Visualizing progress towards a goal, like fundraising or project milestones. 197 | 198 | ### To sum it up: 199 | 200 | Waffle charts are ideal when you want to emphasize the composition of a dataset and when dealing with a limited number of categories. 201 | Consider the nature of your data, the audience, and the context of your presentation before deciding if a waffle chart is the most effective tool for your visualization needs. 202 | Waffle charts, with their straightforward design and ease of interpretation, can be a powerful tool in your data visualization arsenal. By applying the principles outlined in this tutorial, you can enhance your presentations and make complex data more accessible and engaging for your audience. 203 | -------------------------------------------------------------------------------- /Week 4/img/Truncated-Y-Axis-Data-Visualizations-Designed-To-Mislead.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/Truncated-Y-Axis-Data-Visualizations-Designed-To-Mislead.jpg -------------------------------------------------------------------------------- /Week 4/img/banner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/banner.png -------------------------------------------------------------------------------- /Week 4/img/task1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/task1.jpg -------------------------------------------------------------------------------- /Week 4/img/task2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/task2.jpg -------------------------------------------------------------------------------- /Week 4/img/task3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/task3.jpg -------------------------------------------------------------------------------- /Week 4/img/task4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/task4.jpg -------------------------------------------------------------------------------- /Week 4/img/task5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/task5.jpg -------------------------------------------------------------------------------- /Week 4/img/task6.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/task6.jpg 
-------------------------------------------------------------------------------- /Week 4/img/waffle1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/waffle1.jpg -------------------------------------------------------------------------------- /Week 4/img/waffle2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/waffle2.png -------------------------------------------------------------------------------- /Week 4/img/waffle3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/waffle3.jpg -------------------------------------------------------------------------------- /Week 4/img/waffle4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/waffle4.jpg -------------------------------------------------------------------------------- /Week 4/img/waffle5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 4/img/waffle5.jpg -------------------------------------------------------------------------------- /Week 5/Week5-AE-Notes.md: -------------------------------------------------------------------------------- 1 | ![Banner](img/banner.png) 2 | # COMP4010/5120 - Week 5 Application Exercises 3 | --- 4 | 5 | # A. Application Exercises 6 | Suppose we have a fictional dataset of enrolment at VinUni from the year 2105 to 2125. We are tasked to create a series of statistical charts to be presented on the organization's official media streams. Due to [**organizational branding**](https://www.paradigmmarketinganddesign.com/what-is-organizational-branding/) requirements, we have to design our charts to follow the required visual language of the organization. 7 | 8 | > Organizational branding is one of the most often misunderstood concepts in marketing and design. Many make the mistake of relegating it to simply the logo or signature colors of a business or organization. But a brand is so much more than that. It is one of, if not the most valuable asset you possess. While it is not always easily defined, it is almost always easily recognized. It is how your audience perceives you and is created by a combination of your reputation, identity, and voice. In other words, organizational branding is the mechanism used to shape, cultivate, and evolve your brand. It involves many elements, including your name, logo, tagline, website, colors, collateral, messaging, positioning, graphic elements, social media, and other outreach platforms." - [What is Organizational Branding?](https://www.paradigmmarketinganddesign.com/what-is-organizational-branding/) 9 | 10 | VinUni, for instance, has comprehensive[ brand identity guidelines](https://policy.vinuni.edu.vn/all-policies/brand-identity-manual/). Their design center offers clear instructions on using the VinUni logo, color scheme, and fonts, along with downloadable resources like a PowerPoint template. 
11 | 12 | Let's say you want to develop a set of statistical charts for a VinUni presentation in a specific course, and you want to ensure they are both consistent with the university's branding and can be easily replicated. You can use the `ggplot2` package to create a custom theme that aligns with VinUni's brand identity. 13 | 14 | ## Task 1. Simple dodged bar chart for enrolment by college 15 | 16 | From `college_data_normalized.csv`, create a dodged bar chart of `pct` for each `college` with fill by `year`. Only show data from 2120 onwards. 17 | 18 | ![Task 1](img/ae1.png) 19 | 20 | ## Task 2. Customize with brand identity 21 | 22 | With the [`showtext`](https://cran.rstudio.com/web/packages/showtext/vignettes/introduction.html) library (`install.packages('showtext')`) and the [`ggplot2` theme documentation](https://ggplot2.tidyverse.org/reference/theme.html), customize the chart with corresponding fonts and color schemes. 23 | 24 | You can select any organizational branding scheme you like; the following examples follow VinUni's branding. 25 | 26 | ![Task 2](img/ae2.png) 27 | 28 | ## Task 3. Make customization reusable by creating a function 29 | 30 | Create an R function for your customizations which can be reused for future charts. For example, with the base chart: 31 | `college_plot <- ggplot(...)`, and a function called `theme_vinuni`, we can simply apply the theme to the chart by doing: 32 | 33 | ```R 34 | college_plot + 35 | theme_vinuni() 36 | ``` 37 | 38 | Create the same plot as Task 2, but using a function to apply your theme. **Note**: You don't need to include the color palette in your function. That can be done outside of the function using `scale_fill_manual` or `scale_color_manual`. 39 | 40 | ## Task 4. Create a themed multiple line graph 41 | 42 | Create a multiple line graph that shows the percentage of enrolled students for each college from 2105 to 2125, and apply the theme you previously created. Experiment with the color palette: pick colors that represent the brand you have chosen and support effective visualization. 43 | 44 | ![Task 4](img/ae4.png) 45 | 46 | 47 | --- 48 | 49 | # B. Reading Material 50 | 51 | ## 1. Hall of Shame 52 | 53 | ![](img/shame.jpg) 54 | 55 | >The pie chart in the above image shows the percentage of Americans who have tried marijuana in three different years. Now, a pie chart is used to show percentages of a whole and represents percentages at a set point in time. Due to this, the audience may mistake the visualization as showing the following information. 56 | > - All the people participating in the survey tried marijuana. 57 | > - 51 percent of the population tried marijuana today. 58 | > - 43 percent of them tried it last year. 59 | > - 34 percent of them tried marijuana in 1997. 60 | > However, the reality is entirely different. The above pie chart shows data from three different surveys. The graph is trying to show that 61 | > - Today, 51 percent of the total population has tried marijuana. 49 percent of them haven’t. 62 | > - Last year, 43 percent of the total population tried marijuana. 57 percent of them didn’t. 63 | > - In 1997, only 34 percent of the total population tried marijuana. 66 percent of them didn’t. 64 | > Thus, the above data visualization is using graphic forms in inappropriate ways to distort the data. 65 | 66 | Source: [Codeconquest](https://www.codeconquest.com/blog/12-bad-data-visualization-examples-explained/) 67 | 68 | ## 2.
Customizing `ggplot2` 69 | 70 | This week, we will delve into customizing themes in `ggplot2`, an R package widely used for creating elegant data visualizations. Specifically, we will focus on how to recreate a specific plot with customized themes. The customization involves font adjustments using the `showtext` package and applying unique color palettes based on different branding guidelines. 71 | 72 | ### Step 1: Setting Up the Font 73 | 74 | Select a font according to your chosen branding scheme. For VinUni, this is Montserrat, a versatile and modern sans-serif typeface. To use it in our plot, we first need to load the `showtext` library and add the Montserrat font. 75 | 76 | ```R 77 | library(showtext) 78 | font_add_google("Montserrat", "Montserrat") 79 | showtext_auto() 80 | ``` 81 | 82 | ### Step 2: Defining Color Palettes 83 | 84 | Our customization involves two color palettes based on branding guidelines from VinUni. Let's define these palettes in R: 85 | 86 | ```R 87 | # VinUni color palette 88 | vinuni_palette_main <- c("#35426e", "#d2ae6d", "#c83538", "#2e548a") 89 | vinuni_palette_accents <- c("#5cc6d0", "#a7c4d2", "#d2d3d5", "#4890bd", "#0087c3", "#d2ae6d") 90 | ``` 91 | 92 | ### Step 3: Creating the Plot with Custom Themes 93 | 94 | Assuming you have a plot object named `college_plot` (as built in Task 1), we can customize its theme as follows: 95 | 96 | ```R 97 | main_font = "Montserrat" 98 | 99 | college_plot + 100 | scale_fill_manual(values = vinuni_palette_accents) + 101 | theme_minimal( 102 | base_family = main_font, 103 | base_size = 11 104 | ) + 105 | theme( 106 | plot.title.position = "plot", 107 | plot.title = element_text(hjust = 0.5, face="bold", colour = vinuni_palette_main[3]), 108 | plot.subtitle = element_text(hjust = 0.5, face="bold"), 109 | legend.position = "bottom", 110 | panel.grid.major.x = element_blank(), 111 | panel.grid.minor.x = element_blank(), 112 | panel.grid.major.y = element_line(color = "grey", linewidth = 0.2), 113 | panel.grid.minor.y = element_blank(), 114 | axis.text = element_text(size = rel(1.0)), 115 | axis.text.x = element_text(face="bold", colour = vinuni_palette_main[1]), 116 | axis.text.y = element_text(face="bold", colour = vinuni_palette_main[1]), 117 | legend.text = element_text(size = rel(0.9)) 118 | ) 119 | ``` 120 | 121 | ### Step 4: Finalizing the Plot 122 | 123 | Finally, to ensure the `showtext` settings are applied correctly, we need to call `showtext_auto()` again at the end of the script.
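As a minimal sketch of this final step (the object name `final_plot` is purely illustrative and stands for whatever variable holds the themed plot from Step 3; it is not defined elsewhere in these notes):

```R
# Re-enable showtext rendering so the Montserrat font is used when the
# plot is drawn or exported (e.g. with ggsave())
showtext_auto()

# Draw the finished, themed plot
final_plot
```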
-------------------------------------------------------------------------------- /Week 5/college_data_normalized.csv: -------------------------------------------------------------------------------- 1 | year,college,pct 2 | 2105,CBM,0.598425 3 | 2106,CBM,0.6615047779779688 4 | 2107,CBM,0.3971112533138313 5 | 2108,CBM,0.41129442916793973 6 | 2109,CBM,0.4251509105144849 7 | 2110,CBM,0.43469596320899334 8 | 2111,CBM,0.42034424444598883 9 | 2112,CBM,0.4281175536139794 10 | 2113,CBM,0.434933723243606 11 | 2114,CBM,0.4274814637646496 12 | 2115,CBM,0.4368950555476089 13 | 2116,CBM,0.45135 14 | 2117,CBM,0.4224771553436631 15 | 2118,CBM,0.4020581168683984 16 | 2119,CBM,0.3827852384477107 17 | 2120,CBM,0.3396724986988029 18 | 2121,CBM,0.3363424403658756 19 | 2122,CBM,0.3318401649092207 20 | 2123,CBM,0.3014243451343038 21 | 2124,CBM,0.2848104801538449 22 | 2125,CBM,0.2708468062389401 23 | 2105,CAS,0.0000000000000000 24 | 2106,CAS,0.0000000000000000 25 | 2107,CAS,0.2802358533686809 26 | 2108,CAS,0.2880130359507078 27 | 2109,CAS,0.2777006036420579 28 | 2110,CAS,0.2843893714869698 29 | 2111,CAS,0.2966386359950888 30 | 2112,CAS,0.3150025697332152 31 | 2113,CAS,0.3112730410604281 32 | 2114,CAS,0.2922028222913179 33 | 2115,CAS,0.2437896342917084 34 | 2116,CAS,0.2183250000000002 35 | 2117,CAS,0.2292163289630512 36 | 2118,CAS,0.2471780225758193 37 | 2119,CAS,0.2621973141588241 38 | 2120,CAS,0.2780758297633823 39 | 2121,CAS,0.2759211653813196 40 | 2122,CAS,0.2591968603821454 41 | 2123,CAS,0.2455672890421426 42 | 2124,CAS,0.2567813349718876 43 | 2125,CAS,0.2634276664873313 44 | 2105,CECS,0.221725000000000 45 | 2106,CECS,0.172349305539717 46 | 2107,CECS,0.106522534052472 47 | 2108,CECS,0.082951420714940 48 | 2109,CECS,0.081150708458565 49 | 2110,CECS,0.056642820643842 50 | 2111,CECS,0.063937730210577 51 | 2112,CECS,0.062374433490632 52 | 2113,CECS,0.066156906924902 53 | 2114,CECS,0.072518536235350 54 | 2115,CECS,0.081986363419634 55 | 2116,CECS,0.104075000000000 56 | 2117,CECS,0.139327572506952 57 | 2118,CECS,0.173330613355093 58 | 2119,CECS,0.189468118853759 59 | 2120,CECS,0.224646674940945 60 | 2121,CECS,0.247962376198162 61 | 2122,CECS,0.264568302544993 62 | 2123,CECS,0.300523108873258 63 | 2124,CECS,0.304196364150193 64 | 2125,CECS,0.315810182777259 65 | 2105,CHS,0.1776000000000000 66 | 2106,CHS,0.1620179259698497 67 | 2107,CHS,0.2161303592650151 68 | 2108,CHS,0.2177411141664120 69 | 2109,CHS,0.2159977773848913 70 | 2110,CHS,0.2242718446601941 71 | 2111,CHS,0.2190793893483448 72 | 2112,CHS,0.1945054431621735 73 | 2113,CHS,0.1876363287710635 74 | 2114,CHS,0.2077971777086821 75 | 2115,CHS,0.2373289467410482 76 | 2116,CHS,0.2262500000000000 77 | 2117,CHS,0.2089789431863329 78 | 2118,CHS,0.1774332472006890 79 | 2119,CHS,0.1655493285397060 80 | 2120,CHS,0.1576049965968691 81 | 2121,CHS,0.1397740180546422 82 | 2122,CHS,0.1443946721636406 83 | 2123,CHS,0.1524852569502948 84 | 2124,CHS,0.1542118207240741 85 | 2125,CHS,0.1499153444964689 86 | -------------------------------------------------------------------------------- /Week 5/demo.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Untitled" 3 | output: html_document 4 | date: "2024-03-19" 5 | --- 6 | 7 | ```{r} 8 | library(tidyverse) 9 | library(scales) 10 | 11 | # Load the data from the CSV file 12 | data <- read_csv("college_data_normalized.csv") 13 | 14 | # Filter the data for years 2100 onwards 15 | filtered_data <- data %>% 16 | filter(year >= 2116) 17 | 18 | # Create a dodged bar chart 19 | 
college_plot <- ggplot(filtered_data, aes(x = college, y = pct, fill = as.factor(year))) + 20 | geom_col(position = position_dodge2(padding = 0.2)) + 21 | scale_y_continuous(labels = label_percent()) + 22 | labs( 23 | title = "Percentage of Enrolled Students by College", 24 | subtitle = "From 2120 onwards", 25 | x = NULL, 26 | y = NULL, 27 | fill = NULL 28 | ) 29 | 30 | college_plot 31 | ``` 32 | 33 | 34 | ```{r} 35 | 36 | #install.packages('showtext') 37 | library(showtext) 38 | font_add_google("Montserrat", "Montserrat") 39 | 40 | showtext_auto() 41 | 42 | main_font = "Montserrat" 43 | 44 | # vinuni color palette - accent colors 45 | # based on branding guideline (page 13) - https://policy.vinuni.edu.vn/all-policies/brand-identity-manual/ 46 | vinuni_palette_main <- c("#35426e", "#d2ae6d", "#c83538", "#2e548a") 47 | vinuni_palette_accents <- c( "#5cc6d0", "#a7c4d2", "#d2d3d5", "#4890bd", "#0087c3", "#d2ae6d") 48 | 49 | #maybe try google palette: https://partnermarketinghub.withgoogle.com/brands/google-assistant/overview/color-palette-and-typography/ 50 | google_palette_main <- c("#EA4335", "#4285F4", "#FBBC04", "#34A853") 51 | 52 | #or MIT color palette: https://brand.mit.edu/color 53 | mit_palette_main <- c("#750014", "#8b959e", "#ff1423", "#000000") 54 | 55 | college_plot + 56 | scale_fill_manual(values = vinuni_palette_accents) + 57 | theme_minimal( 58 | base_family = main_font, 59 | base_size = 11 60 | ) + 61 | theme( 62 | plot.title.position = "plot", 63 | plot.title = element_text(hjust = 0.5, face="bold", colour = vinuni_palette_main[3]), 64 | plot.subtitle = element_text(hjust = 0.5, face="bold"), 65 | legend.position = "bottom", 66 | panel.grid.major.x = element_blank(), 67 | panel.grid.minor.x = element_blank(), 68 | panel.grid.major.y = element_line(color = "grey", linewidth = 0.2), 69 | panel.grid.minor.y = element_blank(), 70 | axis.text = element_text(size = rel(1.0)), 71 | axis.text.x = element_text(face="bold", colour = vinuni_palette_main[1]), 72 | axis.text.y = element_text(face="bold", colour = vinuni_palette_main[1]), 73 | legend.text = element_text(size = rel(0.9)) 74 | ) 75 | 76 | showtext_auto() 77 | ``` 78 | 79 | 80 | ```{r} 81 | 82 | # Saving our theme as a function 83 | theme_vinuni <- function(base_size = 11, base_family = main_font, 84 | base_line_size = base_size / 22, 85 | base_rect_size = base_size / 22) { 86 | # Base our theme on minimal theme 87 | theme_minimal( 88 | base_family = base_family, 89 | base_size = base_size, 90 | base_line_size = base_line_size, 91 | base_rect_size = base_rect_size 92 | ) + 93 | theme( 94 | plot.title.position = "plot", 95 | plot.title = element_text(hjust = 0.5, face="bold", colour = vinuni_palette_main[3]), 96 | plot.subtitle = element_text(hjust = 0.5, face="bold"), 97 | legend.position = "bottom", 98 | panel.grid.major.x = element_blank(), 99 | panel.grid.minor.x = element_blank(), 100 | panel.grid.major.y = element_line(color = "#bbbbbb", linewidth = 0.2), 101 | panel.grid.minor.y = element_blank(), 102 | axis.text = element_text(size = rel(1.0)), 103 | axis.text.x = element_text(face="bold", colour = vinuni_palette_main[1]), 104 | axis.text.y = element_text(face="bold", colour = vinuni_palette_main[1]), 105 | legend.text = element_text(size = rel(0.9)) 106 | ) 107 | } 108 | ``` 109 | 110 | ```{r} 111 | college_plot + 112 | scale_fill_manual(values = vinuni_palette_accents) + 113 | theme_vinuni() 114 | ``` 115 | 116 | ```{r} 117 | data |> 118 | mutate( 119 | college = fct_reorder2(.f = college, .x = year, .y = pct) 120 | 
) |> 121 | ggplot(aes(x = year, y = pct, color = college)) + 122 | geom_point() + 123 | geom_line() + 124 | scale_x_continuous(limits = c(2100, 2120), breaks = seq(2100, 2120, 5)) + 125 | scale_y_continuous(labels = label_percent()) + 126 | labs( 127 | x = "Year", 128 | y = "Percent of students admitted", 129 | color = "College", 130 | title = "Percentage of Enrolled Students by College", 131 | ) + 132 | theme_vinuni() + 133 | theme(legend.position = "right") + 134 | scale_color_manual(values = vinuni_palette_main) 135 | ``` -------------------------------------------------------------------------------- /Week 5/img/ae1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 5/img/ae1.png -------------------------------------------------------------------------------- /Week 5/img/ae2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 5/img/ae2.png -------------------------------------------------------------------------------- /Week 5/img/ae3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 5/img/ae3.png -------------------------------------------------------------------------------- /Week 5/img/ae4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 5/img/ae4.png -------------------------------------------------------------------------------- /Week 5/img/banner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 5/img/banner.png -------------------------------------------------------------------------------- /Week 5/img/shame.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 5/img/shame.jpg -------------------------------------------------------------------------------- /Week 6/Week6-AE-Notes.md: -------------------------------------------------------------------------------- 1 | ![Banner](img/banner.jpg) 2 | 3 | # COMP4010/5120 - Week 6 Application Exercises 4 | --- 5 | 6 | # A. Application Exercises 7 | 8 | **Data:** [`instructional-staff.csv`](instructional-staff.csv) 9 | 10 | The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. [This report](https://www.aaup.org/sites/default/files/files/AAUP_Report_InstrStaff-75-11_apr2013.pdf) by the AAUP shows trends in instructional staff employees between 1975 and 2011, and contains an image very similar to the one given below. 11 | 12 | ![](img/staff-employment.png) 13 | 14 | Each row in this dataset represents a faculty type, and the columns are the years for which we have data. The values are percentage of hires of that type of faculty for each year. 15 | 16 | In order to recreate this visualization we need to first reshape the data to have one variable for faculty type and one variable for year. 
In other words, we will convert the data from the wide format to the long format. 17 | 18 | ## Task 1. Recreate the visualization 19 | 20 | Reshape the data so we have one row per faculty type and year, and the percentage of hires as a single column. 21 | 22 | ## Task 2. Attempt to recreate the original bar chart as best as you can. 23 | 24 | Don’t worry about theming or color palettes right now. The most important aspects to incorporate: 25 | 26 | - Faculty type on the y-axis with bar segments color-coded based on the year of the survey 27 | - Percentage of instructional staff employees on the x-axis 28 | - Begin the x-axis at 5% 29 | - Label the x-axis at 5% increments 30 | - Match the order of the legend 31 | 32 | > [forcats](https://forcats.tidyverse.org/) contains many functions for defining and adjusting the order of levels for factor variables. Factors are often used to enforce specific ordering of categorical variables in charts. 33 | 34 | ## Task 3. Let’s make it better 35 | 36 | The original plot is not very informative. It’s hard to compare the trends across each faculty type. 37 | 38 | Improve the chart by using a relative frequency bar chart with year on the y-axis and faculty type encoded using color. 39 | 40 | What are this chart’s advantages and disadvantages? 41 | 42 | ## Task 4: Let’s instead use a line chart. 43 | 44 | Graph the data with year on the x-axis and percentage of employees on the y-axis. Distinguish each faculty type using an appropriate aesthetic mapping. 45 | 46 | ## Task 5: Cleaning it up 47 | 48 | - Add a proper title and labelling to the chart 49 | - Use an optimized color palette 50 | - Order the legend values by the final value of the percentage variable 51 | 52 | ## Task 6: More improvements! 53 | 54 | Colleges and universities have come to rely more heavily on non-tenure track faculty members over time, in particular part-time faculty (e.g. contingent faculty, adjuncts). We want to show how academia is increasingly relying on part-time faculty. 55 | 56 | With your peers, sketch/design a chart that highlights the trend for part-time faculty. What type of geom would you use? What elements would you include? What would you remove? 57 | 58 | Create the chart you designed above using `ggplot2`. 59 | 60 | # B. Reading Material 61 | 62 | ## 1. Hall of Shame 63 | 64 | ![Bad bar chart](img/shame.jpg) 65 | 66 | >At first glance, the chart might seem okay. However, there are plenty of problems with it. 67 | > The Y-axis of the graph has a break in it. The lower ticks at the Y-axis are separated at \$100M. After \$700M, it suddenly jumps from \$700M to \$1,700M. Due to this, the revenue of \$490M looks bigger than the \$1,213M of government funding. Hence, it’s extremely misleading to present the scale in a way where 1.2 billion looks smaller than or almost equal to 490 million. 68 | > At first glance, a viewer will figure out that television revenue is the same as government funding. This is due to the fact that the blue part of the bar chart is almost equal to the length of the pink part of the bar chart due to the distortion of the Y-axis labels. 69 | > Another major problem in the above chart is that the Revenue and advertising revenue charts should not be separate from the main bar showing total income. They aren’t two separate bars but they are just subdividing the revenue section of the bar showing the total income.
The second bar is just showing how the blue part of the first bar is split and the third one is showing how the purple part of the second bar chart is split. 70 | > Thus, we can say that the above chart is an example of bad data visualization as it is intentionally misleading the viewer by distorting the elements of the chart. 71 | [[Source]](https://www.codeconquest.com/blog/12-bad-data-visualization-examples-explained/) 72 | 73 | ## 2. Wide data vs. long data formats 74 | 75 | In data analysis, the terms "*wide*" and "*long*" refer to two different ways of structuring a dataset, especially in the context of handling repeated measures or time series data. Understanding these formats is crucial for data manipulation, visualization, and analysis. Let's define each: 76 | 77 | **Wide Format**: 78 | 79 | - **Characteristic**: In a wide format, each subject or entity is usually in a single row, with multiple columns representing different variables, conditions, time points, or measurements. 80 | - **Example**: Consider a dataset of students with their scores in different subjects. Each student would have a single row, with separate columns for each subject (like Math, Science, English). 81 | 82 | | StudentID | MathScore | ScienceScore | EnglishScore | 83 | |-----------|-----------|--------------|--------------| 84 | | 1 | 85 | 90 | 75 | 85 | | 2 | 88 | 92 | 80 | 86 | | 3 | 90 | 94 | 85 | 87 | 88 | - **Use Case**: The wide format is more intuitive for human reading and is often the format in which data is initially collected. It's well-suited for scenarios where each column represents a distinct category. 89 | 90 | **Long Format**: 91 | 92 | - **Characteristic**: In the long format, each row represents a single observation, often a single time point or condition per subject. There are typically columns indicating the subject/entity, the variable (or time/condition), and the value. 93 | - **Example**: Using the same student dataset, a long format would list each subject's score in a separate row. 94 | 95 | | StudentID | Subject | Score | 96 | |-----------|----------|-------| 97 | | 1 | Math | 85 | 98 | | 1 | Science | 90 | 99 | | 1 | English | 75 | 100 | | 2 | Math | 88 | 101 | | 2 | Science | 92 | 102 | | 2 | English | 80 | 103 | | 3 | Math | 90 | 104 | | 3 | Science | 94 | 105 | | 3 | English | 85 | 106 | 107 | - **Use Case**: The long format is particularly useful for statistical analysis and visualization in many data analysis tools, as it allows for more consistent treatment of variables. It's also the preferred format for many types of time series or repeated measures data analysis. 108 | 109 | Essentially, the wide format is more about having separate columns for different variables, while the long format focuses on having one row per observation, making it easier to apply certain types of data manipulations and analyses. The choice between wide and long formats depends on the specific requirements of your analysis and the tools you are using. 110 | 111 | ## 3. Turning Wide Data into Long Format and Vice Versa in `R` 112 | 113 | In the `tidyverse` package in R, you can use different functions to transform data between wide and long formats. The `tidyr` package, part of the `tidyverse`, provides convenient functions for these transformations: `pivot_longer()` for turning wide data into long format, and `pivot_wider()` for the reverse. 
114 | 115 | Let's assume you have a wide data frame named `wide_data`: 116 | 117 | ```R 118 | wide_data <- tibble( 119 | id = 1:3, 120 | score_math = c(90, 85, 88), 121 | score_science = c(92, 89, 87), 122 | score_english = c(93, 90, 85) 123 | ) 124 | ``` 125 | 126 | To convert this into a long format, you'd use `pivot_longer()`: 127 | 128 | ```R 129 | library(tidyverse) 130 | 131 | long_data <- wide_data %>% 132 | pivot_longer( 133 | cols = starts_with("score"), 134 | names_to = "subject", 135 | names_prefix = "score_", 136 | values_to = "score" 137 | ) 138 | ``` 139 | 140 | In this example, `cols = starts_with("score")` selects all columns that start with `"score"` to be transformed. `names_to = "subject"` creates a new column named `"subject"` for the former column names. `names_prefix = "score_"` removes the prefix `"score_"` from these names. Finally, `values_to = "score"` specifies that the actual values go into a column named `"score"`. 141 | 142 | Let's look at the reverse process. Assuming you have a long format data frame `long_data`: 143 | 144 | ```R 145 | long_data <- tibble( 146 | id = rep(1:3, each = 3), 147 | subject = rep(c("math", "science", "english"), times = 3), 148 | score = c(90, 92, 93, 85, 89, 90, 88, 87, 85) 149 | ) 150 | ``` 151 | 152 | To convert this into a wide format, use `pivot_wider()`: 153 | 154 | ```R 155 | wide_data <- long_data %>% 156 | pivot_wider( 157 | names_from = subject, 158 | values_from = score, 159 | names_prefix = "score_" 160 | ) 161 | ``` 162 | 163 | Here, `names_from = subject` tells the function to create new columns from the unique values in the `"subject"` column. `values_from = score` indicates that the values in these new columns should come from the `"score"` column. `names_prefix = "score_"` adds a prefix to the new column names for clarity. 164 | 165 | Both `pivot_longer()` and `pivot_wider()` are highly versatile and can be customized further to accommodate more complex data reshaping tasks. 166 | 167 | 168 | ## 4. Data manipulation with `mutate()`, `fct_reorder()`, and `if_else()` 169 | 170 | ### 4.1. `mutate()` 171 | 172 | `mutate` is used in `dplyr` to create new columns or modify existing ones in a data frame or tibble. It's a key function for data transformation. 173 | 174 | With `mutate`, you can apply a function to one or more existing columns to create a new column, or you can overwrite an existing column with new values. 175 | 176 | For example: 177 | 178 | ```R 179 | library(dplyr) 180 | 181 | data <- tibble( 182 | x = 1:5, 183 | y = c("A", "B", "A", "B", "C") 184 | ) 185 | 186 | data <- data %>% 187 | mutate(z = x * 2) # Creates a new column 'z' which is twice the value of 'x' 188 | ``` 189 | 190 | ### 4.2. `fct_reorder()` 191 | 192 | `fct_reorder()` is from the `forcats` package, which is designed for handling factors (categorical data) in `R`. `fct_reorder` reorders factor levels based on the values of another variable, which is particularly useful for plotting. 193 | 194 | This function is typically used when you want the levels of a factor to be ordered based on some quantitative measure. This is often done before creating plots to ensure that factor levels are displayed in a meaningful order. 
195 | 196 | Example: 197 | 198 | ```R 199 | library(forcats) 200 | 201 | data <- tibble( 202 | category = factor(c("A", "B", "A", "B", "C")), 203 | value = c(3, 5, 2, 4, 1) 204 | ) 205 | 206 | data <- data %>% 207 | mutate(category = fct_reorder(category, value, .fun = mean)) 208 | # Reorders 'category' based on the mean of 'value' for each level 209 | ``` 210 | 211 | In this example, `fct_reorder` is used inside `mutate` to reorder the levels of the category factor based on the mean of the corresponding values. 212 | 213 | ### 4.3. `if_else()` 214 | 215 | `if_else` is another function from the `dplyr` package in `R`, which is used for conditional logic. It is similar to the base `R` `ifelse` function but is more strict and type-safe. `if_else` tests a condition and returns one value if the condition is true and another if it is false. 216 | 217 | Let's combine `mutate()`, `fct_reorder()`, and `if_else()` in examples to illustrate their usage: 218 | 219 | 220 | **Example 1: Using `mutate()` with `if_else()`** 221 | Imagine a dataset of employees with their sales numbers, and we want to classify them as "High Performer" or "Low Performer" based on whether their sales exceed a certain threshold: 222 | 223 | ```R 224 | library(dplyr) 225 | 226 | # Example dataset 227 | employee_data <- tibble( 228 | employee_id = 1:5, 229 | sales = c(200, 150, 300, 250, 100) 230 | ) 231 | 232 | # Classify employees based on their sales 233 | employee_data <- employee_data %>% 234 | mutate(performance = if_else(sales > 200, "High Performer", "Low Performer")) 235 | ``` 236 | 237 | Here, `if_else()` is used to create a new column `performance` which classifies employees based on whether their `sales` are greater than 200. 238 | 239 | **Example 2: Using `mutate()` with `fct_reorder()` and `if_else()`** 240 | Now, let's combine both in a more complex scenario: 241 | 242 | ```R 243 | # Combined dataset 244 | data <- tibble( 245 | category = factor(c("A", "B", "A", "B", "C")), 246 | value = c(3, 15, 2, 4, 1), 247 | status = c("New", "Old", "Old", "New", "New") 248 | ) 249 | 250 | # Reorder 'category' and create a new 'status_label' column 251 | data <- data %>% 252 | mutate( 253 | category = fct_reorder(category, value, .fun = mean), 254 | status_label = if_else(value > 5, "High", "Low") 255 | ) 256 | ``` 257 | 258 | In this combined example, we first reorder `category` based on the mean of `value`. Then we use `if_else` to create a new column `status_label` that assigns `"High"` or `"Low"` based on the `value` column. This showcases how `mutate()` can be used to simultaneously perform multiple transformations on a dataset. 259 | -------------------------------------------------------------------------------- /Week 6/Week6-AE-RMarkdown.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Week6-AE-RMarkdown" 3 | output: html_document 4 | date: "2024-03-27" 5 | --- 6 | 7 | ```{r} 8 | library(tidyverse) 9 | library(scales) 10 | ``` 11 | 12 | # Take a sad plot, and make it better 13 | 14 | The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. [This](https://www.aaup.org/sites/default/files/files/AAUP_Report_InstrStaff-75-11_apr2013.pdf) report by the AAUP shows trends in instructional staff employees between 1975 and 2011, and contains an image very similar to the one given below. 
15 | 16 | Each row in this dataset represents a faculty type, and the columns are the years for which we have data. The values are percentage of hires of that type of faculty for each year. 17 | 18 | ```{r} 19 | staff <- read_csv("instructional-staff.csv") 20 | staff 21 | ``` 22 | 23 | # Recreate the visualization 24 | 25 | In order to recreate this visualization we need to first reshape the data to have one variable for faculty type and one variable for year. In other words, we will convert the data from the wide format to the long format. 26 | 27 | # Task 1: Reshape the data so we have one row per faculty type and year, and the percentage of hires as a single column. 28 | 29 | ```{r} 30 | # YOUR CODE HERE 31 | 32 | ``` 33 | 34 | # Task 2: Attempt to recreate the original bar chart as best as you can. 35 | 36 | Don’t worry about theming or color palettes right now. The most important aspects to incorporate: 37 | 38 | - Faculty type on the y-axis with bar segments color-coded based on the year of the survey 39 | - Percentage of instructional staff employees on the x-axis 40 | - Begin the x-axis at 5% 41 | - Label the x-axis at 5% increments 42 | - Match the order of the legend 43 | 44 | > [forcats](https://forcats.tidyverse.org/) contains many functions for defining and adjusting the order of levels for factor variables. Factors are often used to enforce specific ordering of categorical variables in charts. 45 | 46 | ```{r} 47 | # YOUR CODE HERE 48 | 49 | ``` 50 | 51 | # Let’s make it better 52 | 53 | The original plot is not very informative. It’s hard to compare the trends across each faculty type. 54 | 55 | # Task 3: Improve the chart by using a relative frequency bar chart with year on the y-axis and faculty type encoded using color. 56 | 57 | ```{r} 58 | # YOUR CODE HERE 59 | 60 | ``` 61 | 62 | What are this chart’s advantages and disadvantages? - ADD YOUR RESPONSE HERE 63 | 64 | # Task 4: Let’s instead use a line chart. 65 | 66 | Graph the data with year on the x-axis and percentage of employees on the y-axis. Distinguish each faculty type using an appropriate aesthetic mapping. 67 | 68 | ```{r} 69 | # YOUR CODE HERE 70 | 71 | ``` 72 | 73 | # Task 5: Cleaning it up. 74 | 75 | - Add a proper title and labelling to the chart 76 | - Use an optimized color palette 77 | - Order the legend values by the final value of the percentage variable 78 | 79 | ```{r} 80 | # YOUR CODE HERE 81 | 82 | ``` 83 | 84 | # Task 6: More improvements! 85 | Colleges and universities have come to rely more heavily on non-tenure track faculty members over time, in particular part-time faculty (e.g. contingent faculty, adjuncts). We want to show how academia is increasingly relying on part-time faculty. 86 | 87 | With your peers, sketch/design a chart that highlights the trend for part-time faculty. What type of geom would you use? What elements would you include? What would you remove? 88 | 89 | - ADD YOUR RESPONSE HERE 90 | 91 | Create the chart you designed above using `ggplot2`.
92 | 93 | ```{r} 94 | # YOUR CODE HERE 95 | 96 | ``` 97 | -------------------------------------------------------------------------------- /Week 6/fjc-judges.RData: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 6/fjc-judges.RData -------------------------------------------------------------------------------- /Week 6/img/banner.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 6/img/banner.jpg -------------------------------------------------------------------------------- /Week 6/img/shame.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 6/img/shame.jpg -------------------------------------------------------------------------------- /Week 6/img/staff-employment.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 6/img/staff-employment.png -------------------------------------------------------------------------------- /Week 6/instructional-staff.csv: -------------------------------------------------------------------------------- 1 | faculty_type,1975,1989,1993,1995,1999,2001,2003,2005,2007,2009,2011 2 | Full-Time Tenured Faculty,29,27.6,25,24.8,21.8,20.3,19.3,17.8,17.2,16.8,16.7 3 | Full-Time Tenure-Track Faculty,16.1,11.4,10.2,9.6,8.9,9.2,8.8,8.2,8,7.6,7.4 4 | Full-Time Non-Tenure-Track Faculty,10.3,14.1,13.6,13.6,15.2,15.5,15,14.8,14.9,15.1,15.4 5 | Part-Time Faculty,24,30.4,33.1,33.2,35.5,36,37,39.3,40.5,41.1,41.3 6 | Graduate Student Employees,20.5,16.5,18.1,18.8,18.7,19,20,19.9,19.5,19.4,19.3 -------------------------------------------------------------------------------- /Week 7/Week7-AE-Notes.md: -------------------------------------------------------------------------------- 1 | ![Banner](img/banner.gif) 2 | 3 | # COMP4010/5120 - Week 7 Application Exercises 4 | --- 5 | 6 | # A. Application Exercises 7 | 8 | **Important documentation:** [`ggplot2 annotate()`](https://ggplot2.tidyverse.org/reference/annotate.html) 9 | 10 | **Data:** `wdi_co2_raw <- WDI(country = "all", c("SP.POP.TOTL","EN.ATM.CO2E.PC","NY.GDP.PCAP.KD"), extra = TRUE, start = 1995, end = 2023)` from the `WDI` library by World Bank. 11 | 12 | Although this dataset contains up to 2023, the CO2 emissions data is mostly `NA` from 2021 onwards, so we will be using only rows up until 2020. Also, to reduce clutter, let's only consider countries with a population greater than 200,000. 
13 | 14 | ![Data columns](img/notes1.jpg) 15 | 16 | For convenience you should rename some columns to more *usable* names: 17 | 18 | - `SP.POP.TOTL` to `population`, 19 | - `EN.ATM.CO2E.PC` to `co2_emissions`, 20 | - `NY.GDP.PCAP.KD` to `gdp_per_cap`, 21 | 22 | ```R 23 | wdi_clean <- wdi_co2_raw |> 24 | filter(region != "Aggregates") |> 25 | filter(population > 200000) |> 26 | select(iso2c, iso3c, country, year, 27 | population = SP.POP.TOTL, 28 | co2_emissions = EN.ATM.CO2E.PC, 29 | gdp_per_cap = NY.GDP.PCAP.KD, 30 | region, income 31 | ) 32 | ``` 33 | 34 | Create rankings for countries by CO2 emissions by year: 35 | 36 | ```R 37 | co2_rankings <- wdi_clean |> 38 | # Get rid of all the rows that have missing values in co2_emissions 39 | drop_na(co2_emissions) |> 40 | # Look at each year individually and rank countries based on their emissions that year 41 | mutate( 42 | ranking = rank(co2_emissions), 43 | .by = year 44 | ) 45 | ``` 46 | 47 | ## Task 1: Prepare data in wide format 48 | 49 | Convert the data frame into wide format to better see the ranking by year. 50 | 51 | ![Task 1](img/task1.jpg) 52 | 53 | ## Task 2. Data wrangling 54 | 55 | In this task you will need to prepare the data for further visualization tasks. 56 | 57 | - Calculate the difference in rankings between any 2 years (e.g. 2020 and 1995) and store in a column called `rank_diff`. 58 | - Create a column called `significant_diff` to indicate whether the rank changed by more than 30 positions, and whether it was a significant increase or significant decrease in ranking. 59 | 60 | ![Task 2](img/task2.jpg) 61 | 62 | ## Task 3. Scatter plot for changes in CO2 emission rankings between 1995 and 2020 (or the years you've chosen) 63 | 64 | Create a basic plot that visualizes the changes in CO2 emission rankings between 1995 and 2020 (or the years you've chosen). 65 | 66 | ![Task 3](img/task3.png) 67 | 68 | # Task 4. Lazy way to show change in rank 69 | 70 | Can you come up with the simplest visual element we can use to demonstrate whether a country increased, decreased, or maintained their ranking? You shouldn't need to change the data in any way. 71 | Add it to your current plot. (You may need to read the [`ggplot2 annotate()`](https://rfortherestofus.com/2023/10/annotate-vs-geoms) examples). 72 | 73 | # Task 5. Highlight significant countries 74 | 75 | Add colors to better separate the three groups of countries based on the change in ranking. Use [`geom_label_repel()`](https://r-graph-gallery.com/package/ggrepel.html) from the `ggrepel` library to add the names of the countries which had a significant change in their rank. 76 | 77 | ![Task 5](img/task5.png) 78 | 79 | # Task 6. Additional text annotations 80 | 81 | Since our plot is showing the relationship between the ranking of two different years, it essentially divides the plot into two halves. Add text annotations to provide labels for the two halves of the plot, `"Countries improving"` and `"Countries worsening"`. 82 | 83 | ![Task 6](img/task6.png) 84 | 85 | # Task 7. Using colors to redirect attention 86 | 87 | Before we added colors to highlight the different classes of countries based on their change in ranking. Now use `scale_color_manual()` to change the color of the insignificant countries to be more *uninteresting*. 88 | 89 | ![Task 7](img/task7.png) 90 | 91 | # Task 8. 
More geometric annotations 92 | 93 | With `annotate()`, use `segment` (with the `arrow` parameter) and `rect` to create boxes, arrows, and text labels that highlight the regions containing the top 25 and bottom 25 ranked countries, labelled `"Lowest emitters"` and `"Highest emitters"`. 94 | 95 | ![Task 8](img/task8.png) 96 | 97 | # B. Reading Material 98 | 99 | ## Hall of Fame! 100 | 101 | A YouTuber has created a graph of all Wikipedia articles, which showcases how much information you can extract from an unfathomable amount of data. 102 | 103 | [I Made a Graph of Wikipedia... This Is What I Found](https://www.youtube.com/watch?v=JheGL6uSF-4&ab_channel=adumb) 104 | [![I Made a Graph of Wikipedia... This Is What I Found](img/fame.jpg)](https://www.youtube.com/watch?v=JheGL6uSF-4&ab_channel=adumb) 105 | 106 | Within the same graph, using different annotation elements (mostly highlighting), the author presented multiple interesting findings like trends, relationships, communities, special communities, isolated articles (deadends, orphans), etc. 107 | 108 | The video is a great example of the power of extracting *information* from *data* through visualization. 109 | -------------------------------------------------------------------------------- /Week 7/Week7-AE-RMarkdown.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Week7-AE-RMarkdown" 3 | output: html_document 4 | date: "2024-03-27" 5 | --- 6 | 7 | ```{r} 8 | library(tidyverse) 9 | library(scales) 10 | 11 | 12 | #install.packages('WDI') 13 | library(WDI) 14 | #install.packages('ggrepel') 15 | library(ggrepel) 16 | #install.packages('ggtext') 17 | library(ggtext) 18 | ``` 19 | ```{r} 20 | indicators <- c("SP.POP.TOTL", # Population 21 | "EN.ATM.CO2E.PC", # CO2 emissions 22 | "NY.GDP.PCAP.KD") # GDP per capita 23 | 24 | # CO2 emissions data is mostly NA from 2021 onwards...
25 | wdi_co2_raw <- WDI(country = "all", indicators, extra = TRUE, 26 | start = 1995, end = 2023) 27 | ``` 28 | 29 | ```{r} 30 | wdi_clean <- wdi_co2_raw |> 31 | filter(region != "Aggregates") |> 32 | select(iso2c, iso3c, country, year, 33 | population = SP.POP.TOTL, 34 | co2_emissions = EN.ATM.CO2E.PC, 35 | gdp_per_cap = NY.GDP.PCAP.KD, 36 | region, income 37 | ) |> 38 | filter(population > 200000) 39 | ``` 40 | 41 | ```{r} 42 | co2_rankings <- wdi_clean |> 43 | # Get rid of all the rows that have missing values in co2_emissions 44 | drop_na(co2_emissions) |> 45 | # Look at each year individually and rank countries based on their emissions that year 46 | mutate( 47 | ranking = rank(co2_emissions), 48 | .by = year 49 | ) 50 | ``` 51 | 52 | 53 | # Task 1: Prepare data in wide format 54 | ```{r} 55 | # YOUR CODE HERE 56 | 57 | 58 | 59 | ``` 60 | 61 | 62 | # Task 2: Data wrangling 63 | ```{r} 64 | # YOUR CODE HERE 65 | 66 | 67 | 68 | ``` 69 | 70 | 71 | # Task 3: Scatter plot for changes in CO2 emission rankings between 1995 and 2020 72 | ```{r} 73 | # YOUR CODE HERE 74 | 75 | 76 | 77 | ``` 78 | 79 | 80 | # Task 4: Lazy way to show change in rank 81 | ```{r} 82 | # YOUR CODE HERE 83 | 84 | 85 | 86 | ``` 87 | 88 | 89 | # Task 5: Highlight significant countries 90 | ```{r} 91 | # YOUR CODE HERE 92 | 93 | 94 | 95 | ``` 96 | 97 | 98 | # Task 6: Additional text annotations 99 | ```{r} 100 | # YOUR CODE HERE 101 | 102 | 103 | 104 | ``` 105 | 106 | 107 | # Task 7: Using colors to redirect attention 108 | ```{R} 109 | # YOUR CODE HERE 110 | 111 | 112 | 113 | ``` 114 | 115 | 116 | # Task 8: More geometric annotations 117 | ```{r} 118 | # YOUR CODE HERE 119 | 120 | 121 | 122 | ``` 123 | -------------------------------------------------------------------------------- /Week 7/img/banner.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 7/img/banner.gif -------------------------------------------------------------------------------- /Week 7/img/fame.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 7/img/fame.jpg -------------------------------------------------------------------------------- /Week 7/img/notes1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 7/img/notes1.jpg -------------------------------------------------------------------------------- /Week 7/img/task1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 7/img/task1.jpg -------------------------------------------------------------------------------- /Week 7/img/task2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 7/img/task2.jpg -------------------------------------------------------------------------------- /Week 7/img/task3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 
7/img/task3.png -------------------------------------------------------------------------------- /Week 7/img/task4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 7/img/task4.png -------------------------------------------------------------------------------- /Week 7/img/task5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 7/img/task5.png -------------------------------------------------------------------------------- /Week 7/img/task6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 7/img/task6.png -------------------------------------------------------------------------------- /Week 7/img/task7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 7/img/task7.png -------------------------------------------------------------------------------- /Week 7/img/task8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 7/img/task8.png -------------------------------------------------------------------------------- /Week 9/Week9-AE-Notes.md: -------------------------------------------------------------------------------- 1 | ![Banner](img/banner.png) 2 | 3 | # COMP4010/5120 - Week 9 Application Exercises 4 | --- 5 | 6 | # A. Application Exercises 7 | 8 | **Data:** [`nurses.csv`](./nurses.csv) 9 | 10 | ```R 11 | nurses <- read_csv("nurses.csv") |> clean_names() 12 | 13 | # subset to three states 14 | nurses_subset <- nurses |> 15 | filter(state %in% c("California", "New York", "North Carolina")) 16 | ``` 17 | 18 | The following code chunk demonstrates how to add alternative text to a bar chart. The alternative text is added to the chunk header using the `fig-alt` chunk option. The text is written in Markdown and can be as long as needed. Note that fig-cap is not the same as `fig-alt`. 19 | 20 | ```R 21 | #| label: nurses-bar 22 | #| fig-cap: "Total employed Registered Nurses" 23 | #| fig-alt: "The figure is a bar chart titled 'Total employed Registered 24 | #| Nurses' that displays the numbers of registered nurses in three states 25 | #| (California, New York, and North Carolina) over a 20 year period, with data 26 | #| recorded in three time points (2000, 2010, and 2020). In each state, the 27 | #| numbers of registered nurses increase over time. The following numbers are 28 | #| all approximate. California started off with 200K registered nurses in 2000, 29 | #| 240K in 2010, and 300K in 2020. New York had 150K in 2000, 160K in 2010, and 30 | #| 170K in 2020. Finally North Carolina had 60K in 2000, 90K in 2010, and 100K 31 | #| in 2020." 
32 | 33 | nurses_subset |> 34 | filter(year %in% c(2000, 2010, 2020)) |> 35 | ggplot(aes(x = state, y = total_employed_rn, fill = factor(year))) + 36 | geom_col(position = "dodge") + 37 | scale_fill_viridis_d(option = "E") + 38 | scale_y_continuous(labels = label_number(scale = 1/1000, suffix = "K")) + 39 | labs( 40 | x = "State", y = "Number of Registered Nurses", fill = "Year", 41 | title = "Total employed Registered Nurses" 42 | ) + 43 | theme( 44 | legend.background = element_rect(fill = "white", color = "white"), 45 | legend.position = c(0.85, 0.75) 46 | ) 47 | ``` 48 | 49 | 50 | ![Task 0](img/task0.png) 51 | 52 | ## Task 1. Add alt text to line chart 53 | 54 | Add an appropriate label, caption, and alt text to make the following chart accessible for screen readers. 55 | 56 | ```R 57 | # Your label here 58 | # Your caption here 59 | # Your alt text here 60 | 61 | nurses_subset |> 62 | ggplot(aes(x = year, y = annual_salary_median, color = state)) + 63 | geom_line() + 64 | scale_y_continuous(labels = label_dollar(scale = 1/1000, suffix = "K")) + 65 | labs( 66 | x = "Year", y = "Annual median salary", color = "State", 67 | title = "Annual median salary of Registered Nurses" 68 | ) + 69 | coord_cartesian(clip = "off") + 70 | theme( 71 | plot.margin = margin(0.1, 0.9, 0.1, 0.1, "in") 72 | ) 73 | ``` 74 | 75 | ![Task 1](img/task1.png) 76 | 77 | ## Task 2. Direct labelling instead of legends 78 | 79 | Direct labelling means adding text annotations directly to the plot to describe its visual elements without relying on a legend. 80 | 81 | Create a version of the same line chart but using direct labels instead of legends. 82 | 83 | You can achieve this with just `geom_text()` but you can also check out https://r-graph-gallery.com/web-line-chart-with-labels-at-end-of-line.html for a fancier way of achieving this. 84 | 85 | ![Task 2](img/task2.png) 86 | 87 | ## Task 3. Colorblind-friendly plots 88 | 89 | Use `colorblindr` for colorblind-friendly palettes. 90 | ```{r} 91 | #remotes::install_github("wilkelab/cowplot") 92 | #install.packages("colorspace", repos = "http://R-Forge.R-project.org") 93 | #remotes::install_github("clauswilke/colorblindr") 94 | library(colorblindr) 95 | ``` 96 | 97 | Try out colorblind simulations at http://hclwizard.org/cvdemulator/ or pipe (`|>`) your plot to `cvd_grid()` to see the plot in various color-vision-deficiency simulations. 98 | 99 | With the line chart from Task 1, create 3 different plots: one with the default color scale, one with the `viridis` color scale, and one with the `OkabeIto` color scale from `colorblindr`. Show the `cvd_grid()` of each plot and describe the simulated effectiveness of the color scales for colorblind viewers. 100 | 101 | For example, the grid for the default palette should look like this. 102 | 103 | ![Task 3](img/task3.png) 104 | 105 | # B. Reading Material 106 | 107 | ## 1. Accessible data visualization 108 | 109 | Accessibility in data visualization is essential because it ensures that all individuals, regardless of disabilities or limitations, have the opportunity to understand and engage with data. 110 | 111 | Data visualizations are powerful tools for conveying complex information quickly and effectively. Ensuring these tools are accessible means that everyone, including people with disabilities such as visual impairments, cognitive differences, or mobility restrictions, can benefit from the data being presented. This democratization of information supports equality and inclusivity.
112 | 113 | Many regions have legal requirements, such as the Americans with Disabilities Act (ADA) in the U.S., which mandate that digital content, including data visualizations, be accessible to all users. Beyond legal compliance, there is an ethical imperative to ensure that no group is excluded from accessing information that could impact their personal, professional, or educational lives. 114 | 115 | Designing for accessibility often results in clearer and more comprehensible visualizations for all users, not just those with disabilities. This concept, known as the "curb-cut effect," originates from the idea that curb cuts, while designed for wheelchairs, benefit everyone including cyclists, parents with strollers, and more. Similarly, accessible data visualizations can provide benefits such as better readability and easier comprehension for a wider audience. 116 | 117 | By making data visualizations accessible, creators can engage a more diverse audience. This diversity can lead to a broader range of feedback and insights, potentially improving the data analysis and communication strategies based on a wider array of perspectives. 118 | 119 | ## 2. Accessibility factors 120 | 121 | **Visual Impairments**: This includes a range of conditions from complete blindness to various forms of low vision and color vision deficiencies (colorblindness). Visualizations should be designed so that they are still comprehensible without reliance solely on color or fine visual details. Techniques such as high contrast, alternative text descriptions, and screen reader compatibility can enhance accessibility. 122 | 123 | **Hearing Impairments**: While data visualizations are primarily visual, any accompanying audio explanations or alerts must be accessible. Providing captions or transcripts for audio content can ensure that users who are deaf or hard of hearing can still access all the information. 124 | 125 | **Cognitive Disabilities**: Individuals with cognitive disabilities, including dyslexia, autism, and intellectual disabilities, might find complex visualizations challenging. Simplifying information, avoiding sensory overload, and allowing users to control interactive elements at their own pace can help make data visualizations more accessible. 126 | 127 | **Mobility and Motor Impairments**: Some users may have difficulty with fine motor control, impacting their ability to interact with highly interactive visualizations. Designing with keyboard navigation in mind and ensuring that interactive elements are large enough to be easily clicked can help. 128 | 129 | **Technological Limitations**: Not all users have access to the latest technology. Some might be using older software or devices with lower processing power or without support for advanced graphical presentations. Ensuring that visualizations are accessible on a variety of platforms and devices is essential. 130 | 131 | **Environmental Factors**: The environment in which a visualization is accessed can impact visibility and interaction. For example, viewing a visualization in a brightly lit area or on a small smartphone screen in direct sunlight can affect how easily the information is consumed. Designing with good contrast and clarity can mitigate some of these issues. 132 | 133 | **Educational Background and Expertise**: The level of a user's expertise and familiarity with the data or the tools used to display the data can influence how effectively they can interpret a visualization. 
Providing explanatory notes, glossaries, or adjustable complexity levels can help bridge different levels of expertise. 134 | 135 | **Cultural and Linguistic Differences**: Symbols, color meanings, and layout preferences can vary widely between cultures. Additionally, ensuring that visualizations are usable by speakers of different languages by supporting localization and internationalization can expand accessibility. 136 | 137 | ## 3. Colorblindness 138 | 139 | ![Colorblindness reference chart](img/colorblind-game.png) 140 | 141 | Colorblindness, also known as color vision deficiency, is a condition where people see colors differently than those with typical color vision. There are several types of colorblindness, each affecting how certain colors are perceived. The types are generally categorized based on which particular color sensitivities are impaired. The main types of colorblindness are: 142 | 143 | - **Red-Green Colorblindness**: This is the most common form of colorblindness. It occurs due to a deficiency or absence of red cones (protan) or green cones (deutan) in the eyes. 144 | - Protanomaly: This is a reduced sensitivity to red light. Individuals with protanomaly see red, orange, and yellow as greener and less bright than typical. 145 | - Protanopia: This involves a total absence of red cones. Red appears as black, and certain shades of orange, yellow, and green can appear as yellow. 146 | - Deuteranomaly: This is a reduced sensitivity to green light. It affects the ability to distinguish between some shades of red, orange, yellow, and green, which look more similar. 147 | - Deuteranopia: This is characterized by the absence of green cones. Green appears as beige, and red looks brownish-yellow, making it hard to differentiate hues in this spectrum. 148 | - **Blue-Yellow Colorblindness**: Less common than red-green colorblindness, this type involves the blue cones in the retina. 149 | - Tritanomaly: Reduced sensitivity to blue light. Blue appears greener, and it can be difficult to distinguish yellow and red from pink. 150 | - Tritanopia: This involves a lack of blue photoreceptors. Blue appears as green and yellow looks light grey or violet. 151 | - **Complete Colorblindness (Achromatopsia)**: This rare condition involves a total absence of color vision. People with achromatopsia see the world in shades of grey. This condition is often associated with sensitivity to light, blurred vision, and involuntary eye movements (nystagmus). 152 | - **Partial Colorblindness (Achromatomaly)**: Also rare, this involves limited color perception, where colors are generally perceived but are washed out and not as vibrant as they appear to those with normal color vision. 153 | 154 | Visit http://hclwizard.org/cvdemulator/ and upload an image to get an idea of how each type of colorblindness affects color perception. 155 | 156 | So how does this affect our data visualizations? Many data visualizations rely on color differences to convey distinct categories, relationships, trends, or priorities. For individuals with colorblindness, colors that might appear distinct to others can be indistinguishable. For example, red and green, often used to signify opposing conditions such as stop and go or decrease and increase, can look nearly identical to someone with red-green colorblindness.
157 | 158 | ![Designing for colorblindness 1](img/avoid1.jpg) 159 | ![Designing for colorblindness 2](img/avoid2.jpg) 160 | 161 | 162 | Furthermore, if a visualization uses color as the sole method for distinguishing data points, colorblind users may miss out on critical information. This can lead to misunderstandings or incomplete interpretations of the data, which can be particularly problematic in fields where accurate data interpretation is crucial, such as finance, healthcare, and safety-related industries. 163 | 164 | When color is not perceived as intended, colorblind individuals may need to spend additional time and effort to understand the visualization. They might have to rely on contextual clues, legends, or labels more heavily than other viewers, which can make the process of interpreting data more cumbersome and less intuitive. 165 | 166 | Visualizations that are not designed with colorblindness in mind can also be less aesthetically pleasing to colorblind viewers, potentially decreasing engagement. Engagement is critical in many contexts, such as education, marketing, and public information dissemination. When people cannot fully perceive visual data presentations, they might feel excluded from discussions or decisions that are based on such visualizations. This exclusion can impact personal opportunities and broader societal participation. 167 | 168 | ![Don't rely only on colors](img/dont.jpg) 169 | 170 | If your audience may include colorblind viewers, consider the following measures: 171 | 172 | - **Use Colorblind-Friendly Palettes**: Opt for palettes that are distinguishable to those with various types of color vision deficiencies. Tools and resources, such as colorblind simulation software, can help designers choose appropriate colors. 173 | - **Add Text Labels and Symbols**: Incorporating text labels, symbols, or textures in addition to color can help convey information clearly to all viewers, regardless of how they perceive color. 174 | - **Employ Contrasting Shades**: Utilize contrasts not just in color but also in brightness and saturation. Different shades can help delineate data more clearly for those with color vision deficiencies. 175 | - **Utilize Patterns and Shapes**: Patterns and shapes can help differentiate elements in a chart where colors might fail. For instance, using different line styles (solid, dashed) or markers (circles, squares) can indicate different data series effectively (see the sketch below). 176 | 177 | Beyond color choice, contrast also affects how easily different people can read your visualization. To check the contrast level between two colors, you can use https://coolors.co/contrast-checker or the package [`coloratio`](https://matt-dray.github.io/coloratio/). Usually, a contrast ratio of 4.5:1 or higher (the WCAG AA threshold for normal-sized text) is desired between foreground and background colors. 
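As a concrete illustration of the measures listed above, here is a minimal ggplot2 sketch that pairs a colorblind-safe Okabe-Ito palette with a redundant shape mapping, so groups remain distinguishable even when the hues are hard to tell apart or the chart is printed in greyscale. It uses the built-in `iris` data purely as a stand-in dataset; the hex values are the published Okabe-Ito colors.

```r
library(ggplot2)

# Three colors from the Okabe-Ito palette, chosen to stay distinguishable
# under the common forms of color vision deficiency
okabe_ito <- c("#E69F00", "#56B4E9", "#009E73")

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                 color = Species, shape = Species)) +  # color AND shape encode the group
  geom_point(size = 2) +
  scale_color_manual(values = okabe_ito) +
  labs(
    title = "Redundant encoding: color + shape",
    x = "Sepal length (cm)", y = "Sepal width (cm)"
  )
```

The same idea carries over to line charts: mapping `linetype` in addition to `color` keeps the series readable in black and white or under any form of colorblindness.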
178 | 179 | ![Color contrast example](img/contrast.png) -------------------------------------------------------------------------------- /Week 9/Week9-AE-RMarkdown.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Week9-AE-RMarkdown" 3 | output: html_document 4 | date: "2024-03-27" 5 | --- 6 | 7 | ```{r} 8 | library(tidyverse) 9 | library(readxl) 10 | library(scales) 11 | #install.packages('janitor') 12 | library(janitor) 13 | ``` 14 | 15 | ```{r} 16 | nurses <- read_csv("nurses.csv") |> clean_names() 17 | 18 | # subset to three states 19 | nurses_subset <- nurses |> 20 | filter(state %in% c("California", "New York", "North Carolina")) 21 | ``` 22 | 23 | The following code chunk demonstrates how to add alternative text to a bar chart. The alternative text is added to the chunk header using the `fig-alt` chunk option. The text is written in Markdown and can be as long as needed. Note that fig-cap is not the same as `fig-alt`. 24 | 25 | ```{r} 26 | #| label: nurses-bar 27 | #| fig-cap: "Total employed Registered Nurses" 28 | #| fig-alt: "The figure is a bar chart titled 'Total employed Registered 29 | #| Nurses' that displays the numbers of registered nurses in three states 30 | #| (California, New York, and North Carolina) over a 20 year period, with data 31 | #| recorded in three time points (2000, 2010, and 2020). In each state, the 32 | #| numbers of registered nurses increase over time. The following numbers are 33 | #| all approximate. California started off with 200K registered nurses in 2000, 34 | #| 240K in 2010, and 300K in 2020. New York had 150K in 2000, 160K in 2010, and 35 | #| 170K in 2020. Finally North Carolina had 60K in 2000, 90K in 2010, and 100K 36 | #| in 2020." 37 | 38 | nurses_subset |> 39 | filter(year %in% c(2000, 2010, 2020)) |> 40 | ggplot(aes(x = state, y = total_employed_rn, fill = factor(year))) + 41 | geom_col(position = "dodge") + 42 | scale_fill_viridis_d(option = "E") + 43 | scale_y_continuous(labels = label_number(scale = 1/1000, suffix = "K")) + 44 | labs( 45 | x = "State", y = "Number of Registered Nurses", fill = "Year", 46 | title = "Total employed Registered Nurses" 47 | ) + 48 | theme( 49 | legend.background = element_rect(fill = "white", color = "white"), 50 | legend.position = c(0.85, 0.75) 51 | ) 52 | ``` 53 | 54 | # Task 1. Add alt text to line chart 55 | ```{r} 56 | # Your label here 57 | # Your caption here 58 | # Your alt text here 59 | 60 | nurses_subset |> 61 | ggplot(aes(x = year, y = annual_salary_median, color = state)) + 62 | geom_line() + 63 | scale_y_continuous(labels = label_dollar(scale = 1/1000, suffix = "K")) + 64 | labs( 65 | x = "Year", y = "Annual median salary", color = "State", 66 | title = "Annual median salary of Registered Nurses" 67 | ) + 68 | coord_cartesian(clip = "off") + 69 | theme( 70 | plot.margin = margin(0.1, 0.9, 0.1, 0.1, "in") 71 | ) 72 | ``` 73 | 74 | # Task 2. Direct labelling instead of legends 75 | 76 | Create a version of the same line chart but using direct labels instead of legends. 77 | 78 | You can achieve this with just `geom_text()` but you can also check out https://r-graph-gallery.com/web-line-chart-with-labels-at-end-of-line.html for a fancier way of achieving this. 79 | 80 | ```{r} 81 | # YOUR CODE HERE 82 | 83 | ``` 84 | 85 | # Task 3. Colorblind-friendly plots 86 | Use `colorblindr` for colorblind-friendly palettes. 
87 | ```{r} 88 | #remotes::install_github("wilkelab/cowplot") 89 | #install.packages("colorspace", repos = "http://R-Forge.R-project.org") 90 | #remotes::install_github("clauswilke/colorblindr") 91 | library(colorblindr) 92 | ``` 93 | 94 | Try out colorblind simulations at http://hclwizard.org/cvdemulator/ or pipe (`|>`) your plot into `cvd_grid()` to see the plot in various color-vision-deficiency simulations. 95 | 96 | With the line chart from Task 1, create 3 different plots: one with the default color scale, one with the `viridis` color scale, and one with the `OkabeIto` color scale from `colorblindr`. Show the `cvd_grid()` of each plot and describe the simulated effectiveness of the color scales for colorblind viewers. 97 | 98 | ```{r} 99 | # YOUR CODE HERE for default color scale 100 | 101 | ``` 102 | 103 | What do you think of the default color scale's effectiveness for colorblind viewers? 104 | 105 | YOUR ANSWER HERE 106 | 107 | --- 108 | 109 | ```{r} 110 | # YOUR CODE HERE for viridis color scale 111 | 112 | ``` 113 | 114 | What do you think of the viridis color scale's effectiveness for colorblind viewers? 115 | 116 | YOUR ANSWER HERE 117 | 118 | --- 119 | 120 | ```{r} 121 | # YOUR CODE HERE for OkabeIto color scale 122 | 123 | ``` 124 | 125 | What do you think of the OkabeIto color scale's effectiveness for colorblind viewers? 126 | 127 | YOUR ANSWER HERE 128 | 129 | 130 | -------------------------------------------------------------------------------- /Week 9/img/avoid1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 9/img/avoid1.jpg -------------------------------------------------------------------------------- /Week 9/img/avoid2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 9/img/avoid2.jpg -------------------------------------------------------------------------------- /Week 9/img/banner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 9/img/banner.png -------------------------------------------------------------------------------- /Week 9/img/colorblind-game.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 9/img/colorblind-game.png -------------------------------------------------------------------------------- /Week 9/img/contrast.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 9/img/contrast.png -------------------------------------------------------------------------------- /Week 9/img/dont.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 9/img/dont.jpg -------------------------------------------------------------------------------- /Week 9/img/task0.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 9/img/task0.png -------------------------------------------------------------------------------- /Week 9/img/task1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 9/img/task1.png -------------------------------------------------------------------------------- /Week 9/img/task2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 9/img/task2.png -------------------------------------------------------------------------------- /Week 9/img/task3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lennemo09/COMP4010-Spring24/1510166559d09c744687528820554b682c2883c1/Week 9/img/task3.png --------------------------------------------------------------------------------