class: center, middle, inverse, title-slide # Data visualization ## Lecture 2 ###
Louis SIRUGUE ### M1 APE - Fall 2022 --- <style type="text/css"> body{background-color:black;filter:invert(1)} </style> <style> .left-column {width: 65%;} .right-column {width: 30%;} </style> ### Quick reminder #### 1. Import data ```r fb <- read.csv("C:/User/Documents/ligue1.csv", encoding = "UTF-8") ``` -- <p style = "margin-bottom:1.5cm;"></p> #### 2. Class ```r is.numeric("1.6180339") # What would be the output? ``` -- ``` ## [1] FALSE ``` -- <p style = "margin-bottom:1.5cm;"></p> #### 3. Subsetting ```r fb$Home[3] ``` ``` ## [1] "Troyes" ``` --- ### Quick reminder #### 4. Packages ```r library(dplyr) ``` -- <p style = "margin-bottom:1.5cm;"></p> #### 5. The dplyr grammar .left-column[ <table class="table table-hover table-condensed" style="width: auto !important; margin-left: auto; margin-right: auto;"> <caption></caption> <thead> <tr> <th style="text-align:left;"> Function </th> <th style="text-align:left;"> Meaning </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> mutate() </td> <td style="text-align:left;"> Modify or create a variable </td> </tr> <tr> <td style="text-align:left;"> select() </td> <td style="text-align:left;"> Keep a subset of variables </td> </tr> <tr> <td style="text-align:left;"> filter() </td> <td style="text-align:left;"> Keep a subset of observations </td> </tr> <tr> <td style="text-align:left;"> arrange() </td> <td style="text-align:left;"> Sort the data </td> </tr> <tr> <td style="text-align:left;"> group_by() </td> <td style="text-align:left;"> Group the data </td> </tr> <tr> <td style="text-align:left;"> summarise() </td> <td style="text-align:left;"> Summarizes variables into 1 observation per group </td> </tr> </tbody> </table> ] -- .right-column[ <img style = "margin-top:0cm; margin-left:1.5cm;" src = "pipe.png" width = "180"/> ] --- class: inverse, hide-logo ### Warm up practice #### 1) Import `starbucks.csv` and `View()` the data <p style = "margin-bottom:1.5cm;"></p> -- #### 2) Inspect the structure of the data 
using `str()` <p style = "margin-bottom:1.5cm;"></p> -- #### 3) Use `summarise()` to compute for each beverage category the average number of calories and the number of different declinations (there is 1 row per declination) <p style = "margin-bottom:1.5cm;"></p> -- #### 4) Create a subset of the data called `maxcal` containing the variables `Beverage_category`, `Beverage_prep`, and `Calories`, for the 10 observations with the highest calorie values <center><i>You can use the row_number() function within filter() to use the row numbers as any other variable</i></center> <p style = "margin-bottom:1.5cm;"></p> -- <center><h3><i>You've got 10 minutes!</i></h3></center>
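To illustrate the `row_number()` hint above, here is a minimal sketch on a made-up data frame. The `drinks` table and its columns are hypothetical and are not the starbucks data. It assumes that sorting first and then filtering on the row number is an acceptable way to keep the top rows.

```r
library(dplyr)

# Toy data (hypothetical, for illustration only)
drinks <- data.frame(name     = c("A", "B", "C", "D"),
                     calories = c(120, 300, 45, 210))

top2 <- drinks %>%
  arrange(desc(calories)) %>%   # sort from highest to lowest calories
  filter(row_number() <= 2)     # keep only the first two rows

top2
```

Inside `filter()`, `row_number()` behaves like any other column, so it can be combined with the usual comparison operators.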
--- class: inverse, hide-logo ### Solution #### 1) Import `starbucks.csv` and `View()` the data ```r starbucks <- read.csv("C:/User/Documents/starbucks.csv") View(starbucks) ``` -- <center><img src = "view_starbucks.png" width = "1100"/></center> --- class: inverse, hide-logo ### Solution #### 1) Import `starbucks.csv` and `View()` the data ```r starbucks <- read.csv("C:/User/Documents/starbucks.csv") View(starbucks) ``` <center><img src = "view_starbucks_sep.png" width = "1100"/></center> <p style = "margin-bottom:1cm;"></p> <ul> <li>We only have <b>one variable</b> in which all values are <b>separated by semicolons</b></li> <ul> <li>We need to set the <b>sep</b> argument of the function accordingly</li> </ul> </ul> --- class: inverse, hide-logo ### Solution #### 1) Import `starbucks.csv` and `View()` the data ```r starbucks <- read.csv("C:/User/Documents/starbucks.csv") View(starbucks) ``` <center><img src = "view_starbucks_encoding.png" width = "1100"/></center> <p style = "margin-bottom:1cm;"></p> <ul> <li>We only have <b>one variable</b> in which all values are <b>separated by semicolons</b></li> <ul> <li>We need to set the <b>sep</b> argument of the function accordingly</li> <li>Like last time, we also need to set the <b>encoding</b> argument correctly</li> </ul> </ul> -- ```r *starbucks <- read.csv("C:/User/Documents/starbucks.csv", sep = ";", encoding = "UTF-8") ``` --- class: inverse, hide-logo ### Solution #### 2) Inspect the structure of the data using `str()` ```r str(starbucks) ``` -- ```text ## 'data.frame': 242 obs. of 18 variables: ## $ Beverage_category : chr "Coffee" "Coffee" "Coffee" "Coffee" ... ## $ Beverage : chr "Brewed Coffee" "Brewed Coffee" "Brewed Coffee" "Brewed Coffee" ... ## $ Beverage_prep : chr "Short" "Tall" "Grande" "Venti" ... ## $ Calories : int 3 4 5 5 70 100 70 100 150 110 ... ## $ Total.Fat : chr "0.1" "0.1" "0.1" "0.1" ... ## $ Trans.Fat : num 0 0 0 0 0.1 2 0.4 0.2 3 0.5 ... 
## $ Saturated.Fat : num 0 0 0 0 0 0.1 0 0 0.2 0 ... ## $ Sodium : int 0 0 0 0 5 15 0 5 25 0 ... ## $ Total.Carbohydrates: int 5 10 10 10 75 85 65 120 135 105 ... ## $ Cholesterol : int 0 0 0 0 10 10 6 15 15 10 ... ## $ Dietary.Fibre : int 0 0 0 0 0 0 1 0 0 1 ... ## $ Sugars : int 0 0 0 0 9 9 4 14 14 6 ... ## $ Protein : num 0.3 0.5 1 1 6 6 5 10 10 8 ... ## $ Vitamin.A : chr "0%" "0%" "0%" "0%" ... ## $ Vitamin.C : chr "0%" "0%" "0%" "0%" ... ## . ... . ... ... ... ... ... ... ``` --- class: inverse, hide-logo ### Solution #### 3) Use `summarise()` to compute for each beverage category the average number of calories and the number of different declinations (there is 1 row per declination) ```r starbucks %>% group_by(Beverage_category) %>% summarise(Declinations = n(), Mean_cal = mean(Calories)) ``` -- ``` ## # A tibble: 9 x 3 ## Beverage_category Declinations Mean_cal ## <chr> <int> <dbl> ## 1 Classic Espresso Drinks 58 140. ## 2 Coffee 4 4.25 ## 3 Frappuccino® Blended Coffee 36 277. ## 4 Frappuccino® Blended Crème 13 233. ## 5 Frappuccino® Light Blended Coffee 12 162. ## 6 Shaken Iced Beverages 18 114. ## 7 Signature Espresso Drinks 40 250 ## 8 Smoothies 9 282. ## 9 Tazo® Tea Drinks 52 177. 
``` --- class: inverse, hide-logo ### Solution #### 4) Create a subset of the data called `maxcal` containing the variables `Beverage_category`, `Beverage_prep`, and `Calories`, for the 10 observations with the highest calorie values ```r maxcal <- starbucks %>% arrange(-Calories) %>% select(Beverage_category, Beverage_prep, Calories) %>% filter(row_number() <= 10) maxcal ``` -- ``` ## Beverage_category Beverage_prep Calories ## 1 Signature Espresso Drinks 2% Milk 510 ## 2 Signature Espresso Drinks Soymilk 460 ## 3 Frappuccino® Blended Coffee Whole Milk 460 ## 4 Signature Espresso Drinks Venti Nonfat Milk 450 ## 5 Tazo® Tea Drinks 2% Milk 450 ## 6 Frappuccino® Blended Coffee Soymilk 430 ## 7 Frappuccino® Blended Coffee Venti Nonfat Milk 420 ## 8 Signature Espresso Drinks 2% Milk 400 ## 9 Tazo® Tea Drinks Soymilk 390 ## 10 Frappuccino® Blended Coffee Whole Milk 390 ``` --- <h3>Today we learn how to plot data</h3> <p style = "margin-bottom:2.5cm;"></p> .pull-left[ <p style = "margin-bottom:.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>1. The ggplot() function</b></li> <ul style = "list-style: none"> <li>1.1. Basic structure</li> <li>1.2. Axes</li> <li>1.3. Theme</li> <li>1.4. Annotation</li> </ul> </ul> <p style = "margin-bottom:1.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>2. Adding dimensions</b></li> <ul style = "list-style: none"> <li>2.1. More axes</li> <li>2.2. More facets</li> <li>2.3. More labels</li> </ul> </ul> ] .pull-right[ <ul style = "margin-left:-1cm;list-style: none"> <li><b>3. Types of geometry</b></li> <ul style = "list-style: none"> <li>3.1. Points and lines</li> <li>3.2. Barplots and histograms</li> <li>3.3. Densities and boxplots</li> </ul> </ul> <p style = "margin-bottom:1cm;"></p> <ul style = "margin-left:-1cm;list-style: none"> <li><b>4. How (not) to lie with graphics</b></li> <ul style = "list-style: none"> <li>4.1. Cumulative representations</li> <li>4.2. Axis manipulations</li> <li>4.3. 
Interpolation</li> </ul> </ul> <p style = "margin-bottom:1cm;"></p> <ul style = "margin-left:-1cm;list-style: none"><li><b>5. Wrap up!</b></li></ul> ] --- <h3>Today we learn how to plot data</h3> <p style = "margin-bottom:2.5cm;"></p> .pull-left[ <p style = "margin-bottom:.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>1. The ggplot() function</b></li> <ul style = "list-style: none"> <li>1.1. Basic structure</li> <li>1.2. Axes</li> <li>1.3. Theme</li> <li>1.4. Annotation</li> </ul> </ul> ] --- ### 1. The `ggplot()` function #### 1.1. Basic structure <ul> <li>Let's use <b>ggplot</b> on data from the <a href = "https://wid.world/">World Inequality Database</a></li> </ul> ```r wid <- read.csv("C:/User/Documents/wid.csv") ``` -- <p style = "margin-bottom:-1cm;"></p> ```r str(wid) ``` ``` ## 'data.frame': 1610 obs. of 6 variables: ## $ country : chr "Algeria" "Algeria" "Algeria" "Algeria" ... ## $ continent: chr "Africa" "Africa" "Africa" "Africa" ... ## $ year : int 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 ... ## $ fshare : num 0.0992 0.112 0.1201 0.1206 0.116 ... ## $ top1 : num 0.1003 0.0991 0.0991 0.0991 0.0991 ... ## $ inc_head : num 12611 12620 12634 12532 12546 ... ``` -- <p style = "margin-bottom:1cm;"></p> <ul> <li>It contains 1610 observations and 6 variables:</li> <ul> <li><b>continent/country/year</b>: Observation level</li> <li><b>fshare</b>: Female labor income share</li> <li><b>top1</b>: Top 1% income share</li> <li><b>inc_head</b>: National income per adult</li> </ul> </ul> --- ### 1. The `ggplot()` function #### 1.1. 
Basic structure <ul> <li><b>ggplot()</b> from ggplot2 is what we're gonna use for all our plots</li> <li>It takes the following <b>core arguments</b>:</li> <ul> <li><b>Data</b>: the values to plot</li> <li><b>Mapping</b> (aes, for aesthetics): the structure of the plot</li> <li><b>Geometry</b>: the type of plot</li> </ul> </ul> <p style = "margin-bottom:-1cm;"> <center><img src = "ggplot.png" width = "150" style="margin-left:17cm; margin-top:-8cm"/></center> -- <p style = "margin-bottom:1cm;"></p> <ul> <li><b>Data and mapping</b> should be specified within the <b>parentheses</b></li> <li><b>Geometry</b> and any <b>other element</b> should be added with a <b>+</b> sign</li> </ul> ```r ggplot(data, aes()) + geometry + anything_else ``` -- <p style = "margin-bottom:1.25cm;"></p> * You can also **apply** the `ggplot()` function to your data with a **pipe** ```r data %>% ggplot(aes()) + geometry ``` --- ### 1. The `ggplot()` function #### 1.1. Basic structure ```r ggplot(wid) # Data # ``` .left-column[ <img src="slides_files/figure-html/unnamed-chunk-29-1.png" width="90%" style="display: block; margin: auto;" /> ] .right-column[ <p style = "margin-bottom:1.5cm;"></p> <ul style = "margin-left:-1cm"> <li>We assigned data to ggplot()</li> <ul> <li>But our plot is empty</li> </ul> </ul> ] --- ### 1. The `ggplot()` function #### 1.1. Basic structure ```r ggplot(wid, aes(x = inc_head, y = top1)) # Data & aesthetics # ``` .left-column[ <img src="slides_files/figure-html/unnamed-chunk-31-1.png" width="90%" style="display: block; margin: auto;" /> ] .right-column[ <p style = "margin-bottom:1.5cm;"></p> <ul style = "margin-left:-1cm"> <li>We assigned data to ggplot()</li> <ul> <li>But our plot is empty</li> </ul> </ul> <p style = "margin-bottom:1.5cm;"></p> <ul style = "margin-left:-1cm"> <li>We assigned variables to axes</li> <ul> <li>But still nothing</li> </ul> </ul> ] --- ### 1. The `ggplot()` function #### 1.1. 
Basic structure ```r ggplot(wid, aes(x = inc_head, y = top1)) + # Data & aesthetics geom_point() # Geometry ``` .left-column[ <img src="slides_files/figure-html/unnamed-chunk-33-1.png" width="90%" style="display: block; margin: auto;" /> ] .right-column[ <p style = "margin-bottom:1.5cm;"></p> <ul style = "margin-left:-1cm"> <li>We assigned data to ggplot()</li> <ul> <li>But our plot is empty</li> </ul> </ul> <p style = "margin-bottom:1.5cm;"></p> <ul style = "margin-left:-1cm"> <li>We assigned variables to axes</li> <ul> <li>But still nothing</li> </ul> </ul> <p style = "margin-bottom:1.5cm;"></p> <ul style = "margin-left:-1cm"> <li>We need a geometry</li> <ul> <li>Points for instance</li> </ul> </ul> ] --- ### 1. The `ggplot()` function #### 1.1. Basic structure <ul> <li>You can then save the plot using the <b>ggsave()</b> function</li> <ul> <li>You just need to specify the <b>output destination</b> and it will <b>save</b> what is in your <b>plot panel</b></li> </ul> </ul> <p style = "margin-bottom:1.25cm;"></p> -- ```r ggsave("C:/User/Documents/wid.png") ``` -- <p style = "margin-bottom:1.25cm;"></p> <ul> <li>You can also <b>modify</b> the following options, which default to the <b>parameters of your plot</b> panel if unspecified:</li> <ul> <li><b>plot:</b> ggplot object</li> <li><b>width:</b> width of the plot</li> <li><b>height:</b> height of the plot</li> <li><b>units:</b> unit of the plot size ("in", "cm", "mm", "px")</li> <li><b>dpi:</b> pixel density, defaults to 300 pixels per inch</li> </ul> </ul> -- <p style = "margin-bottom:1.25cm;"></p> ```r ggsave("wid.png", plot = last_plot(), width = 16, height = 9, units = "cm", dpi = 900) ``` --- ### 1. The `ggplot()` function #### 1.2. 
Axes <ul> <li>Axes can be modified with <b>scale functions</b>, whose names depend on:</li> <ul> <li>The axis to modify</li> <li>The type of variable assigned to the axis</li> </ul> </ul> <p style = "margin-bottom:1.25cm;"> -- <table class="table table-hover table-condensed" style="width: auto !important; margin-left: auto; margin-right: auto;"> <caption>Basic scale functions</caption> <thead> <tr> <th style="text-align:left;text-align: center;"> Axis </th> <th style="text-align:left;text-align: center;"> x-axis </th> <th style="text-align:left;text-align: center;"> y-axis </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Continuous </td> <td style="text-align:left;"> scale_x_continuous() </td> <td style="text-align:left;"> scale_y_continuous() </td> </tr> <tr> <td style="text-align:left;"> Discrete </td> <td style="text-align:left;"> scale_x_discrete() </td> <td style="text-align:left;"> scale_y_discrete() </td> </tr> </tbody> </table> -- <p style = "margin-bottom:1.25cm;"> <ul> <li> The following <b>parameters</b> can be modified in these scale functions:</li> <ul> <li> <b>name:</b> The label of the corresponding axis </li> <li> <b>limits:</b> Where the axis should start and end </li> <li> <b>breaks:</b> Where to put ticks and values on the axis </li> </ul> </ul> --- ### 1. The `ggplot()` function #### 1.2. Axes ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point() # Basic structure # ``` <img src="slides_files/figure-html/unnamed-chunk-38-1.png" width="60%" style="display: block; margin: auto;" /> --- ### 1. The `ggplot()` function #### 1.2. Axes ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point() + # Basic structure * scale_x_continuous(name = "Income per adult", limits = c(0, 150000)) # Scale function ``` <img src="slides_files/figure-html/unnamed-chunk-40-1.png" width="60%" style="display: block; margin: auto;" /> --- ### 1. The `ggplot()` function #### 1.3. 
Theme() <ul> <li>You can use one of the <b>built-in ggplot2 themes</b> to easily change the layout of your plot</li> <ul> <li> ... + theme_<b>bw</b>()</li> <li> ... + theme_<b>minimal</b>()</li> <li> ... + theme_<b>dark</b>()</li> <li>You can also tune the <b>font size</b> inside these functions with the <b>base_size</b> argument</li> </ul> </ul> -- <img src="slides_files/figure-html/unnamed-chunk-41-1.png" width="100%" style="display: block; margin: auto;" /> --- ### 1. The `ggplot()` function #### 1.3. Theme() <ul> <li>You can also customize your graph using the <b>theme()</b> function</li> <ul> <li>It lets you <b>customize</b> virtually <b>anything</b></li> <li>Enter ?theme to see the <b>endless</b> list of possible <b>arguments</b></li> <li>Obviously we won't go through all of them but here are a few</li> </ul> </ul> <p style = "margin-bottom:1cm;"> -- ```r # Basic structure ggplot(wid, aes(x = inc_head, y = top1)) + geom_point() + # Axis scale_x_continuous(name = "Income per adult", limits = c(0, 150000)) + # Theme theme_minimal(base_size = 14) + * theme(# Color of the background and of its border * plot.background = element_rect(fill = "#DFE6EB", colour = "#DFE6EB"), * # Size of the axis lines * axis.line = element_line(size = rel(0.8)), * # Color of the grid lines * panel.grid = element_line(color = "gray85")) ``` --- ### 1. The `ggplot()` function #### 1.3. Theme() <img src="slides_files/figure-html/unnamed-chunk-43-1.png" width="75%" style="display: block; margin: auto;" /> --- ### 1. The `ggplot()` function #### 1.3. 
Theme() <ul> <li>Geometries can also be modified</li> <ul> <li><b>alpha:</b> opacity from 0 to 1</li> <li><b>color:</b> color of the geometry (for geometries that are filled such as bars, it will color the border)</li> <li><b>fill:</b> fill color for geometries such as bars</li> <li><b>size:</b> size of the geometry</li> <li><b>shape:</b> change shape for geometries like points</li> <li><b>linetype:</b> solid, dashed, dotted, etc., for line geometries</li> <li>...</li> </ul> </ul> -- <p style = "margin-bottom:-1cm;"> .pull-left[ <p style = "margin-bottom:1.5cm;"> ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point(size = 3, color = "#6794A7", alpha = .3, shape = 18) + theme_minimal(base_size = 14) ``` ] -- .pull-right[ <img src="slides_files/figure-html/unnamed-chunk-45-1.png" width="75%" style="display: block; margin: auto;" /> ] --- ### 1. The `ggplot()` function #### 1.4. Annotation <p style = "margin-bottom:.5cm;"> -- <ul> <li>It is sometimes useful to <b>annotate</b> a graph so that certain things become <b>more salient</b></li> <ul> <li><b>Separate</b> two groups with a <b>dashed line</b></li> <li>Add a few <b>words somewhere</b> for clarity</li> <li><b>Circle</b> a specific group of <b>data points</b></li> <li>Add <b>labels</b> to data points</li> </ul> </ul> -- <p style = "margin-bottom:1cm;"> * **Straight lines** can easily be added with their respective geometry ```r + geom_hline(yintercept = , linetype = ) ``` ```r + geom_vline(xintercept = , linetype = ) ``` -- <p style = "margin-bottom:1cm;"> * And **punctual text annotations** can be added with `annotate()` ```r + annotate("text", x = , y = , label = ) ``` --- ### 1. The `ggplot()` function #### 1.4. 
Annotation: Adding lines -- .pull-left[ ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point(size = 2, alpha = .3) + geom_hline(yintercept = .17) ``` <p style = "margin-bottom:1.14cm;"> <img src="slides_files/figure-html/unnamed-chunk-50-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point(size = 2, alpha = .3) + geom_vline(xintercept = 90000, linetype = "dashed") ``` <img src="slides_files/figure-html/unnamed-chunk-51-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ### 1. The `ggplot()` function #### 1.4. Annotation: Adding text ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point(size = 2, alpha = .3) + annotate("text", x = 125000, y = .28, label = "Relevant info", size = 5) ``` -- <img src="slides_files/figure-html/unnamed-chunk-53-1.png" width="60%" style="display: block; margin: auto;" /> --- ### 1. The `ggplot()` function <center><b>Combining everything</b></center> ```r ggplot(wid, aes(x = inc_head, y = top1)) # # # # # # # ``` <img src="slides_files/figure-html/unnamed-chunk-55-1.png" width="40%" style="display: block; margin: auto;" /> --- ### 1. The `ggplot()` function <center><b>Combining everything</b></center> ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point(size = 3, color = "#6794A7", alpha = .3, shape = 18) # # # # # # ``` <img src="slides_files/figure-html/unnamed-chunk-57-1.png" width="40%" style="display: block; margin: auto;" /> --- ### 1. The `ggplot()` function <center><b>Combining everything</b></center> ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point(size = 3, color = "#6794A7", alpha = .3, shape = 18) + geom_vline(xintercept = 90000, linetype = "dashed", size = 1, color = "#727272") # # # # # ``` <img src="slides_files/figure-html/unnamed-chunk-59-1.png" width="40%" style="display: block; margin: auto;" /> --- ### 1. 
The `ggplot()` function <center><b>Combining everything</b></center> ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point(size = 3, color = "#6794A7", alpha = .3, shape = 18) + geom_vline(xintercept = 90000, linetype = "dashed", size = 1, color = "#727272") + annotate("text", x = 125000, y = .2, label = "Outliers", size = 5, color = "#505050") # # # # ``` <img src="slides_files/figure-html/unnamed-chunk-61-1.png" width="40%" style="display: block; margin: auto;" /> --- ### 1. The `ggplot()` function <center><b>Combining everything</b></center> ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point(size = 3, color = "#6794A7", alpha = .3, shape = 18) + geom_vline(xintercept = 90000, linetype = "dashed", size = 1, color = "#727272") + annotate("text", x = 125000, y = .2, label = "Outliers", size = 5, color = "#505050") + scale_x_continuous(name = "Income per adult", limits = c(0, 150000)) # # # ``` <img src="slides_files/figure-html/unnamed-chunk-63-1.png" width="40%" style="display: block; margin: auto;" /> --- ### 1. The `ggplot()` function <center><b>Combining everything</b></center> ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point(size = 3, color = "#6794A7", alpha = .3, shape = 18) + geom_vline(xintercept = 90000, linetype = "dashed", size = 1, color = "#727272") + annotate("text", x = 125000, y = .2, label = "Outliers", size = 5, color = "#505050") + scale_x_continuous(name = "Income per adult", limits = c(0, 150000)) + scale_y_continuous(name = "Top 1% inc. share", limits = c(0, .35)) # # ``` <img src="slides_files/figure-html/unnamed-chunk-65-1.png" width="40%" style="display: block; margin: auto;" /> --- ### 1. 
The `ggplot()` function <center><b>Combining everything</b></center> ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point(size = 3, color = "#6794A7", alpha = .3, shape = 18) + geom_vline(xintercept = 90000, linetype = "dashed", size = 1, color = "#727272") + annotate("text", x = 125000, y = .2, label = "Outliers", size = 5, color = "#505050") + scale_x_continuous(name = "Income per adult", limits = c(0, 150000)) + scale_y_continuous(name = "Top 1% inc. share", limits = c(0, .35)) + theme_minimal(base_size = 14) # ``` <img src="slides_files/figure-html/unnamed-chunk-67-1.png" width="40%" style="display: block; margin: auto;" /> --- ### 1. The `ggplot()` function <center><b>Combining everything</b></center> ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point(size = 3, color = "#6794A7", alpha = .3, shape = 18) + geom_vline(xintercept = 90000, linetype = "dashed", size = 1, color = "#727272") + annotate("text", x = 125000, y = .2, label = "Outliers", size = 5, color = "#505050") + scale_x_continuous(name = "Income per adult", limits = c(0, 150000)) + scale_y_continuous(name = "Top 1% inc. share", limits = c(0, .35)) + theme_minimal(base_size = 14) + theme(plot.background = element_rect(fill = "#DFE6EB", colour = "#DFE6EB")) ``` <img src="slides_files/figure-html/unnamed-chunk-69-1.png" width="40%" style="display: block; margin: auto;" /> --- <h3>Overview</h3> <p style = "margin-bottom:2.5cm;"></p> .pull-left[ <p style = "margin-bottom:.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>1. The ggplot() function ✔</b></li> <ul style = "list-style: none"> <li>1.1. Basic structure</li> <li>1.2. Axes</li> <li>1.3. Theme</li> <li>1.4. Annotation</li> </ul> </ul> <p style = "margin-bottom:1.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>2. Adding dimensions</b></li> <ul style = "list-style: none"> <li>2.1. More axes</li> <li>2.2. More facets</li> <li>2.3. 
More labels</li> </ul> </ul> ] .pull-right[ <ul style = "margin-left:-1cm;list-style: none"> <li><b>3. Types of geometry</b></li> <ul style = "list-style: none"> <li>3.1. Points and lines</li> <li>3.2. Barplots and histograms</li> <li>3.3. Densities and boxplots</li> </ul> </ul> <p style = "margin-bottom:1cm;"></p> <ul style = "margin-left:-1cm;list-style: none"> <li><b>4. How (not) to lie with graphics</b></li> <ul style = "list-style: none"> <li>4.1. Cumulative representations</li> <li>4.2. Axis manipulations</li> <li>4.3. Interpolation</li> </ul> </ul> <p style = "margin-bottom:1cm;"></p> <ul style = "margin-left:-1cm;list-style: none"><li><b>5. Wrap up!</b></li></ul> ] --- <h3>Overview</h3> <p style = "margin-bottom:2.5cm;"></p> .pull-left[ <p style = "margin-bottom:.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>1. The ggplot() function ✔</b></li> <ul style = "list-style: none"> <li>1.1. Basic structure</li> <li>1.2. Axes</li> <li>1.3. Theme</li> <li>1.4. Annotation</li> </ul> </ul> <p style = "margin-bottom:1.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>2. Adding dimensions</b></li> <ul style = "list-style: none"> <li>2.1. More axes</li> <li>2.2. More facets</li> <li>2.3. More labels</li> </ul> </ul> ] --- ### 2. Adding dimensions #### 2.1. More axes <ul> <li>In some cases you may want to <b>convey information</b> using other means than position on an axis</li> <ul> <li>The <b>color, size, or shape</b> of a geometry can be used to represent a <b>third variable</b></li> </ul> </ul> -- <ul> <li>We can assign <b>different colors to different points</b> depending on the associated continent</li> <ul> <li>Continent should be assigned to the <i>"color axis"</i> in <b>aes()</b></li> </ul> </ul> -- ```r ggplot(wid, aes(x = inc_head, y = top1, color = continent)) + geom_point(alpha = .3) ``` <img src="slides_files/figure-html/unnamed-chunk-70-1.png" width="40%" style="display: block; margin: auto;" /> --- ### 2. 
Adding dimensions #### 2.1. More axes * If the variable assigned to the color axis is continuous, a color gradient will be used -- ```r ggplot(wid, aes(x = inc_head, y = top1, color = fshare)) + geom_point(alpha = .3) ``` <img src="slides_files/figure-html/unnamed-chunk-71-1.png" width="55%" style="display: block; margin: auto;" /> --- ### 2. Adding dimensions #### 2.1. More axes <ul> <li>Because there is no proper <i>"color axis"</i>, a <b>legend</b> is generated</li> <ul> <li>It can be seen as a <i>"color"</i> axis, just like the x- and y-axis</li> <li>And should then be modified with a <b><i>scale</i> function</b></li> </ul> </ul> -- <p style = "margin-bottom:1.5cm;"></p> .pull-left[ <center><b>Discrete color variable</b></center> ```r plot + scale_color_manual( name = "Title", values = c("red", "blue") ) ``` ] .pull-right[ <center><b>Continuous color variable</b></center> ```r plot + scale_color_gradient( name = "Title", low = "red", high = "blue" ) ``` ] <p style = "margin-bottom:1.5cm;"></p> -- <ul> <li>But color is not the only <b>property</b> that can be used as a <b>dimension</b>, you can use:</li> <ul> <li><b>size, shape, alpha</b>, ...</li> <li><b>fill, linetype</b>, ..., for relevant geometries</li> </ul> </ul> --- .pull-left[ <img src="slides_files/figure-html/unnamed-chunk-74-1.png" width="100%" style="display: block; margin: auto;" /> <img src="slides_files/figure-html/unnamed-chunk-75-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="slides_files/figure-html/unnamed-chunk-76-1.png" width="100%" style="display: block; margin: auto;" /> <img src="slides_files/figure-html/unnamed-chunk-77-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ### 2. Adding dimensions #### 2.2. 
More facets <ul> <li>Another way to <b>distinguish groups</b> is to divide the plot into <b>facets</b></li> <ul> <li>To do so, pass your faceting variable to the <b>facet_wrap()</b> function</li> </ul> </ul> -- * In `facet_wrap()`, the faceting variable must be preceded by a tilde as the first argument: -- ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point() + * facet_wrap(~continent) ``` -- <ul> <li>You can then choose the facet arrangement:</li> <ul> <li><b>nrow</b> to indicate the number of rows</li> <li><b>ncol</b> to indicate the number of columns</li> </ul> </ul> -- .pull-left[ <p style = "margin-bottom:1cm;"></p> <ul> <li>As well as whether each <b>scale</b> should be:</li> <ul> <li><b>free:</b> adjusted separately to each facet</li> <li><b>fixed:</b> common to all facets</li> </ul> </ul> ] .pull-right[ <p style = "margin-bottom:-1cm;"></p> <table class="table table-hover table-condensed" style="width: auto !important; margin-left: auto; margin-right: auto;"> <caption>scales argument in facet_wrap()</caption> <thead> <tr> <th style="text-align:left;text-align: center;"> </th> <th style="text-align:left;text-align: center;"> x fixed </th> <th style="text-align:left;text-align: center;"> x free </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;font-weight: bold;"> y fixed </td> <td style="text-align:left;"> scales = "fixed" </td> <td style="text-align:left;"> scales = "free_x" </td> </tr> <tr> <td style="text-align:left;font-weight: bold;"> y free </td> <td style="text-align:left;"> scales = "free_y" </td> <td style="text-align:left;"> scales = "free" </td> </tr> </tbody> </table> ] --- ### 2. Adding dimensions #### 2.2. More facets ```r ggplot(wid, aes(x = inc_head, y = top1)) + geom_point(alpha = .3) + facet_wrap(~continent, ncol = 3, scales = "free_x") ``` -- <img src="slides_files/figure-html/unnamed-chunk-81-1.png" width="60%" style="display: block; margin: auto;" /> --- ### 2. Adding dimensions #### 2.3. 
More labels <ul> <li>The last dimension I want to mention is the <b><i>label</i> axis</b></li> <ul> <li>When using <b>geom_text()</b> instead of geom_point(), it will plot the corresponding <b>text instead of points</b></li> </ul> </ul> -- ```r ggplot(wid %>% filter(year == 2019 & continent == "Europe"), # subset so that we can see something ``` <p style = "margin-top:-1cm"></p> -- ```r aes(x = inc_head, y = top1, label = country)) + geom_text(alpha = .6) ``` -- <img src="slides_files/figure-html/unnamed-chunk-84-1.png" width="48%" style="display: block; margin: auto;" /> --- class: inverse, hide-logo ### Practice #### 1) Reproduce this graph with the `starbucks` dataset <img src="slides_files/figure-html/unnamed-chunk-85-1.png" width="70%" style="display: block; margin: auto;" /> -- <p style = "margin-bottom:1cm;"></p> <center><h3><i>You've got 10 minutes!</i></h3></center>
--- class: inverse, hide-logo ### Solution ```r ggplot(starbucks, aes(x = Calories, y = Cholesterol)) # # # ``` <img src="slides_files/figure-html/unnamed-chunk-87-1.png" width="70%" style="display: block; margin: auto;" /> --- class: inverse, hide-logo ### Solution ```r ggplot(starbucks, aes(x = Calories, y = Cholesterol)) + geom_point() # # ``` <img src="slides_files/figure-html/unnamed-chunk-89-1.png" width="70%" style="display: block; margin: auto;" /> --- class: inverse, hide-logo ### Solution ```r ggplot(starbucks, aes(x = Calories, y = Cholesterol, size = Trans.Fat, color = Sugars)) + geom_point() # # ``` <img src="slides_files/figure-html/unnamed-chunk-91-1.png" width="70%" style="display: block; margin: auto;" /> --- class: inverse, hide-logo ### Solution ```r ggplot(starbucks, aes(x = Calories, y = Cholesterol, size = Trans.Fat, color = Sugars)) + geom_point(alpha = .3) + scale_color_gradient(low = "green", high = "red") # ``` <img src="slides_files/figure-html/unnamed-chunk-93-1.png" width="70%" style="display: block; margin: auto;" /> --- class: inverse, hide-logo ### Solution ```r ggplot(starbucks, aes(x = Calories, y = Cholesterol, size = Trans.Fat, color = Sugars)) + geom_point(alpha = .3) + scale_color_gradient(low = "green", high = "red") + theme_minimal(base_size = 14) ``` <img src="slides_files/figure-html/unnamed-chunk-95-1.png" width="70%" style="display: block; margin: auto;" /> --- <h3>Overview</h3> <p style = "margin-bottom:2.5cm;"></p> .pull-left[ <p style = "margin-bottom:.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>1. The ggplot() function ✔</b></li> <ul style = "list-style: none"> <li>1.1. Basic structure</li> <li>1.2. Axes</li> <li>1.3. Theme</li> <li>1.4. Annotation</li> </ul> </ul> <p style = "margin-bottom:1.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>2. Adding dimensions ✔</b></li> <ul style = "list-style: none"> <li>2.1. More axes</li> <li>2.2. More facets</li> <li>2.3. 
More labels</li> </ul> </ul> ] .pull-right[ <ul style = "margin-left:-1cm;list-style: none"> <li><b>3. Types of geometry</b></li> <ul style = "list-style: none"> <li>3.1. Points and lines</li> <li>3.2. Barplots and histograms</li> <li>3.3. Densities and boxplots</li> </ul> </ul> <p style = "margin-bottom:1cm;"></p> <ul style = "margin-left:-1cm;list-style: none"> <li><b>4. How (not) to lie with graphics</b></li> <ul style = "list-style: none"> <li>4.1. Cumulative representations</li> <li>4.2. Axis manipulations</li> <li>4.3. Interpolation</li> </ul> </ul> <p style = "margin-bottom:1cm;"></p> <ul style = "margin-left:-1cm;list-style: none"><li><b>5. Wrap up!</b></li></ul> ] --- <h3>Overview</h3> <p style = "margin-bottom:2.5cm;"></p> .pull-left[ <p style = "margin-bottom:.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>1. The ggplot() function ✔</b></li> <ul style = "list-style: none"> <li>1.1. Basic structure</li> <li>1.2. Axes</li> <li>1.3. Theme</li> <li>1.4. Annotation</li> </ul> </ul> <p style = "margin-bottom:1.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>2. Adding dimensions ✔</b></li> <ul style = "list-style: none"> <li>2.1. More axes</li> <li>2.2. More facets</li> <li>2.3. More labels</li> </ul> </ul> ] .pull-right[ <ul style = "margin-left:-1cm;list-style: none"> <li><b>3. Types of geometry</b></li> <ul style = "list-style: none"> <li>3.1. Points and lines</li> <li>3.2. Barplots and histograms</li> <li>3.3. Densities and boxplots</li> </ul> </ul> ] --- ### 3. Types of geometry #### 3.1. 
Points and lines <ul> <li>So far we have only drawn scatterplots, but <b>many other geometries</b> can be used</li> <ul> <li>For instance, <b>lines</b> are particularly well suited to showing <b>changes</b> over time</li> </ul> </ul> -- ```r ggplot(wid %>% filter(country == "USA"), aes(x = year, y = top1)) + geom_point() + geom_line() ``` -- <img src="slides_files/figure-html/unnamed-chunk-97-1.png" width="48%" style="display: block; margin: auto;" /> --- ### 3. Types of geometry #### 3.2. Barplots and histograms <ul> <li><b>Barplots</b>, however, are well suited to categorical \(x\) variables and continuous \(y\) variables</li> <ul> <li>Setting the <b>stat</b> argument to <b>"identity"</b> makes each bar's height equal to the corresponding <b>y value</b></li> </ul> </ul> -- ```r ggplot(wid %>% filter(continent == "South America" & year == 2019), aes(x = country, y = top1)) + geom_bar(stat = "identity") ``` -- <img src="slides_files/figure-html/unnamed-chunk-99-1.png" width="63%" style="display: block; margin: auto;" /> --- ### 3. Types of geometry #### 3.2. Barplots and histograms <ul> <li>Note that you can <b>reorder the bars</b> according to their y value using the reorder() function</li> </ul> -- ```r ggplot(wid %>% filter(continent == "South America" & year == 2019), aes(x = reorder(country, top1), y = top1)) + geom_bar(stat = "identity") ``` -- <img src="slides_files/figure-html/unnamed-chunk-101-1.png" width="68%" style="display: block; margin: auto;" /> --- ### 3. Types of geometry #### 3.2. Barplots and histograms <ul> <li>You can also set stat to <b>"count"</b> to plot the <b>number of observations</b> per category</li> <ul> <li>In that case, no variable should be assigned to the y axis</li> </ul> </ul> -- ```r ggplot(wid, aes(x = continent)) + geom_bar(stat = "count") ``` -- <img src="slides_files/figure-html/unnamed-chunk-103-1.png" width="68%" style="display: block; margin: auto;" /> --- ### 3. Types of geometry #### 3.2. 
Barplots and histograms <ul> <li>Finally, histograms can be used to describe the distribution of a continuous variable</li> <ul> <li>You can tune the bin width with <b>binwidth</b> or the number of bins with <b>bins</b></li> </ul> </ul> -- ```r ggplot(wid %>% filter(year == 2019), aes(x = fshare)) + geom_histogram(bins = 20, color = "white", fill = "steelblue") ``` -- <img src="slides_files/figure-html/unnamed-chunk-105-1.png" width="63%" style="display: block; margin: auto;" /> --- ### 3. Types of geometry #### 3.3. Densities and boxplots <ul> <li>The <b>continuous</b> equivalent of the histogram is the <b>density</b></li> <ul> <li>Similarly you can tune the <b>bandwidth</b> with the <b>bw</b> argument <i>(don't do it)</i></li> </ul> </ul> -- ```r ggplot(wid %>% filter(year == 2019), aes(x = fshare)) + geom_density(color = "white", fill = "steelblue") ``` -- <img src="slides_files/figure-html/unnamed-chunk-107-1.png" width="63%" style="display: block; margin: auto;" /> --- ### 3. Types of geometry #### 3.3. Densities and boxplots <ul> <li>A handy geometry to plot <b>densities</b> for different <b>groups</b> is the <b>violin</b></li> <ul> <li>Note that the <b>grouping variable</b> should be assigned to the <b>\(x\) axis</b></li> </ul> </ul> -- ```r ggplot(wid %>% filter(year == 2019), aes(x = continent, y = fshare)) + geom_violin(color = "white", fill = "steelblue") ``` -- <img src="slides_files/figure-html/unnamed-chunk-109-1.png" width="63%" style="display: block; margin: auto;" /> --- ### 3. Types of geometry #### 3.3. 
Densities and boxplots <ul> <li><b>Violins</b> are particularly interesting when <b>combined with boxplots</b></li> <ul> <li>When overlaying these geometries, make sure to tune the <b>width and opacity</b> appropriately</li> </ul> </ul> -- ```r ggplot(wid %>% filter(year == 2019), aes(x = continent, y = fshare)) + geom_violin(fill = "steelblue", alpha = .4) + geom_boxplot(width = .1, alpha = .4) ``` -- <img src="slides_files/figure-html/unnamed-chunk-111-1.png" width="63%" style="display: block; margin: auto;" /> --- ### 3. Types of geometry #### 3.3. Densities and boxplots <ul> <li>This is how <b>boxplots</b> are constructed:</li> </ul> -- <img src="slides_files/figure-html/unnamed-chunk-112-1.png" width="90%" style="display: block; margin: auto;" /> --- ### 3. Types of geometry #### 3.3. Densities and boxplots <ul> <li>This is how <b>boxplots</b> are constructed:</li> </ul> <img src="slides_files/figure-html/unnamed-chunk-113-1.png" width="90%" style="display: block; margin: auto;" /> --- ### 3. Types of geometry #### 3.3. Densities and boxplots <ul> <li>This is how <b>boxplots</b> are constructed:</li> </ul> <img src="slides_files/figure-html/unnamed-chunk-114-1.png" width="90%" style="display: block; margin: auto;" /> --- ### 3. Types of geometry #### 3.3. Densities and boxplots <ul> <li>This is how <b>boxplots</b> are constructed:</li> </ul> <img src="slides_files/figure-html/unnamed-chunk-115-1.png" width="90%" style="display: block; margin: auto;" /> --- <h3>Overview</h3> <p style = "margin-bottom:2.5cm;"></p> .pull-left[ <p style = "margin-bottom:.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>1. The ggplot() function ✔</b></li> <ul style = "list-style: none"> <li>1.1. Basic structure</li> <li>1.2. Axes</li> <li>1.3. Theme</li> <li>1.4. Annotation</li> </ul> </ul> <p style = "margin-bottom:1.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>2. Adding dimensions ✔</b></li> <ul style = "list-style: none"> <li>2.1. 
More axes</li> <li>2.2. More facets</li> <li>2.3. More labels</li> </ul> </ul> ] .pull-right[ <ul style = "margin-left:-1cm;list-style: none"> <li><b>3. Types of geometry ✔</b></li> <ul style = "list-style: none"> <li>3.1. Points and lines</li> <li>3.2. Barplots and histograms</li> <li>3.3. Densities and boxplots</li> </ul> </ul> <p style = "margin-bottom:1cm;"></p> <ul style = "margin-left:-1cm;list-style: none"> <li><b>4. How (not) to lie with graphics</b></li> <ul style = "list-style: none"> <li>4.1. Cumulative representations</li> <li>4.2. Axis manipulations</li> <li>4.3. Interpolation</li> </ul> </ul> <p style = "margin-bottom:1cm;"></p> <ul style = "margin-left:-1cm;list-style: none"><li><b>5. Wrap up!</b></li></ul> ] --- <h3>Overview</h3> <p style = "margin-bottom:2.5cm;"></p> .pull-left[ <p style = "margin-bottom:.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>1. The ggplot() function ✔</b></li> <ul style = "list-style: none"> <li>1.1. Basic structure</li> <li>1.2. Axes</li> <li>1.3. Theme</li> <li>1.4. Annotation</li> </ul> </ul> <p style = "margin-bottom:1.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>2. Adding dimensions ✔</b></li> <ul style = "list-style: none"> <li>2.1. More axes</li> <li>2.2. More facets</li> <li>2.3. More labels</li> </ul> </ul> ] .pull-right[ <ul style = "margin-left:-1cm;list-style: none"> <li><b>3. Types of geometry ✔</b></li> <ul style = "list-style: none"> <li>3.1. Points and lines</li> <li>3.2. Barplots and histograms</li> <li>3.3. Densities and boxplots</li> </ul> </ul> <p style = "margin-bottom:1cm;"></p> <ul style = "margin-left:-1cm;list-style: none"> <li><b>4. How (not) to lie with graphics</b></li> <ul style = "list-style: none"> <li>4.1. Cumulative representations</li> <li>4.2. Axis manipulations</li> <li>4.3. Interpolation</li> </ul> </ul> ] --- ### 4. How (not) to lie with graphics #### 4.1. 
Cumulative representations <p style = "margin-bottom:1cm;"></p> .left-column[ <center><img src = "trump.png" width = "720"/></center> ] .right-column[ <p style = "margin-bottom:2cm;"></p> <center>Donald Trump during his daily coronavirus task force briefing on April 6, 2020</center> <p style = "margin-bottom:1.5cm;"></p> <b>The legend indicates:</b> <i>''>1,790,000 tests completed through April 5''</i> ] --- ### 4. How (not) to lie with graphics #### 4.1. Cumulative representations <p style = "margin-bottom:1cm;"></p> .left-column[ <img src="slides_files/figure-html/unnamed-chunk-116-1.png" width="100%" style="display: block; margin: auto;" /> ] .right-column[ <p style = "margin-bottom:2cm;"></p> <center><b>Let's take a closer look</b></center> <p style = "margin-bottom:1.5cm;"></p> <center><i>''>1,790,000 tests completed through April 5''</i></center> <p style = "margin-bottom:1.5cm;"></p> <center>Isn't there something tricky here?</center> ] --- ### 4. How (not) to lie with graphics #### 4.1. Cumulative representations <p style = "margin-bottom:1cm;"></p> .left-column[ <img src="slides_files/figure-html/unnamed-chunk-117-1.png" width="100%" style="display: block; margin: auto;" /> ] .right-column[ <p style = "margin-bottom:1.25cm;"></p> <center><i>They plotted the <b>cumulative</b> number of tests!</i></center> <p style = "margin-bottom:1.5cm;"></p> <ul><li>This makes it <b>look</b> like an <b>exponential</b> progression</li></ul> <p style = "margin-bottom:1cm;"></p> <ul><li>Yet the daily number of tests <b>did not actually increase nearly as steeply</b></li></ul> ] --- ### 4. How (not) to lie with graphics #### 4.2. Axis manipulations .left-column[ <img src="slides_files/figure-html/unnamed-chunk-118-1.png" width="100%" style="display: block; margin: auto;" /> ] .right-column[ <p style = "margin-bottom:4cm;"></p> <center><i><b>What about this increase?</b></i></center> ] --- ### 4. How (not) to lie with graphics #### 4.2. 
Axis manipulations .left-column[ <img src="slides_files/figure-html/unnamed-chunk-119-1.png" width="100%" style="display: block; margin: auto;" /> ] .right-column[ <p style = "margin-bottom:3cm;"></p> <i><b>Same data</b>, but starting <b>from 0</b></i> <p style = "margin-bottom:1.5cm;"></p> ➜ <b>Zooming</b> in or out on a graph can be very <b>misleading</b> ] --- ### 4. How (not) to lie with graphics #### 4.2. Axis manipulations <center><img src = "foxnews.png" width = "600"/></center> --- ### 4. How (not) to lie with graphics #### 4.2. Axis manipulations .pull-left[ <img src="slides_files/figure-html/unnamed-chunk-120-1.png" width="90%" style="display: block; margin: auto;" /> <center><i><b>Misleading</b></i></center> ] -- .pull-right[ <img src="slides_files/figure-html/unnamed-chunk-121-1.png" width="90%" style="display: block; margin: auto;" /> <center><i><b>Not misleading</b></i></center> ] --- ### 4. How (not) to lie with graphics #### 4.2. Axis manipulations * But in this case, which representation is the more appropriate? .pull-left[ <img src="slides_files/figure-html/unnamed-chunk-122-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="slides_files/figure-html/unnamed-chunk-123-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ### 4. How (not) to lie with graphics #### 4.2. 
Axis manipulations * There is no universal rule, but always <b>pay attention to axes and scales</b> .pull-left[ <p style = "margin-bottom:-1cm;"></p> <img src="slides_files/figure-html/unnamed-chunk-124-1.png" width="95%" style="display: block; margin: auto;" /> <p style = "margin-bottom:-1.25cm;"></p> <img src="slides_files/figure-html/unnamed-chunk-125-1.png" width="95%" style="display: block; margin: auto;" /> ] .pull-right[ <p style = "margin-bottom:-1cm;"></p> <img src="slides_files/figure-html/unnamed-chunk-126-1.png" width="95%" style="display: block; margin: auto;" /> <p style = "margin-bottom:-1.25cm;"></p> <img src="slides_files/figure-html/unnamed-chunk-127-1.png" width="95%" style="display: block; margin: auto;" /> ] --- ### 4. How (not) to lie with graphics #### 4.2. Axis manipulations <ul> <li>⚠ <i>Be very careful with double axes</i> ⚠</li> <ul> <li>You can make them say almost anything</li> </ul> </ul> <img src="slides_files/figure-html/unnamed-chunk-128-1.png" width="90%" style="display: block; margin: auto;" /> -- <p style = "margin-top:-11.265cm"></p> <img src="slides_files/figure-html/unnamed-chunk-129-1.png" width="90%" style="display: block; margin: auto;" /> -- <p style = "margin-top:-11.265cm"></p> <img src="slides_files/figure-html/unnamed-chunk-130-1.png" width="90%" style="display: block; margin: auto;" /> -- <p style = "margin-top:-11.265cm"></p> <img src="slides_files/figure-html/unnamed-chunk-131-1.png" width="90%" style="display: block; margin: auto;" /> --- ### 4. How (not) to lie with graphics #### 4.2. Axis manipulations <ul> <li>Be careful with <b>free scales</b> in <b>facet_wrap()</b> as well</li> <ul> <li>It can make things <b>look more homogeneous</b> than they actually are</li> </ul> </ul> <img src="slides_files/figure-html/unnamed-chunk-132-1.png" width="60%" style="display: block; margin: auto;" /> --- ### 4. How (not) to lie with graphics #### 4.3. 
Interpolation * Here is the **previous graph** on the tax increase using a **line geometry** .pull-left[ <img src="slides_files/figure-html/unnamed-chunk-133-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ <p style = "margin-bottom:3cm;"></p> <center>This line has <b>infinitely many</b> points</center> <p style = "margin-bottom:2cm;"></p> <center>But <b>only two</b> of them are <b>correct</b></center> ] --- ### 4. How (not) to lie with graphics #### 4.3. Interpolation <ul> <li>This figure also has <b>finitely many actual data points</b> but feels more natural</li> <ul> <li>This is because values are <b>sufficiently close</b> to each other to be <b>treated</b> as <b>continuous</b></li> </ul> </ul> <img src="slides_files/figure-html/unnamed-chunk-134-1.png" width="60%" style="display: block; margin: auto;" /> --- ### 4. How (not) to lie with graphics #### 4.3. Interpolation <ul> <li>There is no universal rule on when to use <b>lines</b> either</li> <ul> <li>But the <b>observation level</b> should be <b>clear</b> on the graph</li> </ul> </ul> <p style = "margin-bottom:1.5cm;"></p> .pull-left[ <img src="slides_files/figure-html/unnamed-chunk-135-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="slides_files/figure-html/unnamed-chunk-136-1.png" width="100%" style="display: block; margin: auto;" /> ] --- <h3>Overview</h3> <p style = "margin-bottom:2.5cm;"></p> .pull-left[ <p style = "margin-bottom:.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>1. The ggplot() function ✔</b></li> <ul style = "list-style: none"> <li>1.1. Basic structure</li> <li>1.2. Axes</li> <li>1.3. Theme</li> <li>1.4. Annotation</li> </ul> </ul> <p style = "margin-bottom:1.75cm;"></p> <ul style = "margin-left:1.5cm;list-style: none"> <li><b>2. Adding dimensions ✔</b></li> <ul style = "list-style: none"> <li>2.1. More axes</li> <li>2.2. More facets</li> <li>2.3. 
More labels</li> </ul> </ul> ] .pull-right[ <ul style = "margin-left:-1cm;list-style: none"> <li><b>3. Types of geometry ✔</b></li> <ul style = "list-style: none"> <li>3.1. Points and lines</li> <li>3.2. Barplots and histograms</li> <li>3.3. Densities and boxplots</li> </ul> </ul> <p style = "margin-bottom:1cm;"></p> <ul style = "margin-left:-1cm;list-style: none"> <li><b>4. How (not) to lie with graphics ✔</b></li> <ul style = "list-style: none"> <li>4.1. Cumulative representations</li> <li>4.2. Axis manipulations</li> <li>4.3. Interpolation</li> </ul> </ul> <p style = "margin-bottom:1cm;"></p> <ul style = "margin-left:-1cm;list-style: none"><li><b>5. Wrap up!</b></li></ul> ] --- ### 5. Wrap up! <p style = "margin-bottom:2cm;"> <center><h4> The 3 core components of the ggplot() function </h4></center> -- <table class="table table-hover table-condensed" style="width: auto !important; margin-left: auto; margin-right: auto;"> <caption></caption> <thead> <tr> <th style="text-align:left;"> Component </th> <th style="text-align:center;"> Contribution </th> <th style="text-align:center;"> Implementation </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Data </td> <td style="text-align:center;"> Underlying values </td> <td style="text-align:center;"> ggplot(data, | data %>% ggplot(., </td> </tr> <tr> <td style="text-align:left;"> Mapping </td> <td style="text-align:center;"> Axis assignment </td> <td style="text-align:center;"> aes(x = V1, y = V2, ...)) </td> </tr> <tr> <td style="text-align:left;"> Geometry </td> <td style="text-align:center;"> Type of plot </td> <td style="text-align:center;"> + geom_point() + geom_line() + ... </td> </tr> </tbody> </table> <p style = "margin-bottom:2cm;"> -- * Any **other element** should be added with a **`+` sign** -- ```r ggplot(data, aes(x = V1, y = V2)) + geom_point() + geom_line() + anything_else() ``` --- ### 5. Wrap up! 
.pull-left[ <p style = "margin-bottom:1.75cm;"> <center><h4> Main customization tools </h4></center> <table class="table table-hover table-condensed" style="width: auto !important; margin-left: auto; margin-right: auto;"> <caption></caption> <thead> <tr> <th style="text-align:left;"> Item to customize </th> <th style="text-align:left;"> Main functions </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Axes </td> <td style="text-align:left;"> scale_[x/y]_[continuous/discrete] </td> </tr> <tr> <td style="text-align:left;"> Baseline theme </td> <td style="text-align:left;"> theme_[void/minimal/.../dark]() </td> </tr> <tr> <td style="text-align:left;"> Annotations </td> <td style="text-align:left;"> geom_[[h/v]line/text](), annotate() </td> </tr> <tr> <td style="text-align:left;"> Theme </td> <td style="text-align:left;"> theme(axis.[line/ticks].[x/y] = ..., </td> </tr> </tbody> </table> ] -- .pull-right[ <center><h4> Main types of geometry </h4></center> <table class="table table-hover table-condensed" style="width: auto !important; margin-left: auto; margin-right: auto;"> <caption></caption> <thead> <tr> <th style="text-align:left;"> Geometry </th> <th style="text-align:center;"> Function </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Bar plot </td> <td style="text-align:center;"> geom_bar() </td> </tr> <tr> <td style="text-align:left;"> Histogram </td> <td style="text-align:center;"> geom_histogram() </td> </tr> <tr> <td style="text-align:left;"> Area </td> <td style="text-align:center;"> geom_area() </td> </tr> <tr> <td style="text-align:left;"> Line </td> <td style="text-align:center;"> geom_line() </td> </tr> <tr> <td style="text-align:left;"> Density </td> <td style="text-align:center;"> geom_density() </td> </tr> <tr> <td style="text-align:left;"> Boxplot </td> <td style="text-align:center;"> geom_boxplot() </td> </tr> <tr> <td style="text-align:left;"> Violin </td> <td style="text-align:center;"> geom_violin() </td> </tr> <tr> <td 
style="text-align:left;"> Scatter plot </td> <td style="text-align:center;"> geom_point() </td> </tr> </tbody> </table> ] --- ### 5. Wrap up! .pull-left[ <center><h4> Main types of aesthetics </h4></center> <table class="table table-hover table-condensed" style="width: auto !important; margin-left: auto; margin-right: auto;"> <caption></caption> <thead> <tr> <th style="text-align:left;"> Argument </th> <th style="text-align:left;"> Meaning </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> alpha </td> <td style="text-align:left;"> opacity from 0 to 1 </td> </tr> <tr> <td style="text-align:left;"> color </td> <td style="text-align:left;"> color of the geometry </td> </tr> <tr> <td style="text-align:left;"> fill </td> <td style="text-align:left;"> fill color of the geometry </td> </tr> <tr> <td style="text-align:left;"> size </td> <td style="text-align:left;"> size of the geometry </td> </tr> <tr> <td style="text-align:left;"> shape </td> <td style="text-align:left;"> shape for geometries like points </td> </tr> <tr> <td style="text-align:left;"> linetype </td> <td style="text-align:left;"> solid, dashed, dotted, etc. </td> </tr> </tbody> </table> ] -- .pull-right[ <p style = "margin-bottom:3.25cm;"></p> <ul> <li>If specified <b>in the geometry</b></li> <ul> <li>It applies uniformly to <b>the whole geometry</b></li> </ul> </ul> <p style = "margin-bottom:1cm;"></p> <ul> <li>If assigned to a variable <b>in aes</b></li> <ul> <li>It will <b>vary with the variable</b> according to a scale documented in the legend</li> </ul> </ul> ] <br> -- ```r ggplot(data, aes(x = V1, y = V2, size = V3)) + geom_point(color = "steelblue", alpha = .6) ```
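---

### 5. Wrap up!

The distinction between an aesthetic set in the geometry and one mapped in `aes()` can be checked on real data. Below is a minimal sketch that assumes R's built-in `mtcars` dataset, chosen only so the code runs as is (it is not one of the datasets used in this lecture). The object names `p_constant`, `p_mapped` and `p_pitfall` are hypothetical.

```r
library(ggplot2)

# Constant aesthetic: set in the geometry, outside aes(),
# so every point gets the same color and no legend is drawn
p_constant <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue", alpha = .6)

# Mapped aesthetic: assigned to a variable inside aes(),
# so the color varies with hp and a legend documents the scale
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point(alpha = .6)

# Common pitfall: a constant placed inside aes() is treated as a
# one-level variable, so the points are not drawn in steelblue
p_pitfall <- ggplot(mtcars, aes(x = wt, y = mpg, color = "steelblue")) +
  geom_point()
```

Printing `p_constant` and `p_mapped` side by side should make the difference visible: only the second one comes with a color legend. In `p_pitfall`, the string is read as a category label rather than as a color, which is a frequent source of confusion.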