├── .gitignore ├── README.Rmd ├── README.md ├── README_files └── figure-gfm │ ├── pressure-1.png │ ├── unnamed-chunk-2-1.png │ ├── unnamed-chunk-3-1.png │ ├── unnamed-chunk-4-1.png │ └── unnamed-chunk-5-1.png ├── results.csv ├── table-shapes.Rproj ├── table.graffle └── table.png /.gitignore: -------------------------------------------------------------------------------- 1 | .Rhistory 2 | .RData 3 | .Rproj.user 4 | -------------------------------------------------------------------------------- /README.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | output: github_document 3 | --- 4 | 5 | # Pivot function names 6 | 7 | On 2019-03-22, I [tweeted about](https://twitter.com/hadleywickham/status/1109132826631421952) a [survey](https://forms.gle/vvYgBw1EwHK69gA17) to help me pick names for the [new pivot functions](https://tidyr.tidyverse.org/dev/articles/pivot.html) in the dev version of tidyr. 8 | 9 | In the survey, I showed a picture of two tables containing the same data, and asked participants to describe their relative shapes. This document describes the results. 10 | 11 | ![Table A has four columns (id, x, y, z) and 2 rows. Table B has three columns (id, n, x) and six rows](table.png) 12 | 13 | ```{r, include=FALSE} 14 | knitr::opts_chunk$set(comment = "#>", collapse = TRUE) 15 | ``` 16 | 17 | ```{r setup, message = FALSE} 18 | library(googlesheets) 19 | library(tidyverse) 20 | 21 | # This googlesheet is public if you want to do your own analysis 22 | key <- gs_key("1Do5R1k5sEZrwU0N1KmIjKaapHDrf7eYLdIlcGNx-MsI") 23 | results <- googlesheets::gs_read(key, col_types = list()) 24 | names(results) <- c("timestamp", "table_a", "table_b") 25 | head(results) 26 | 27 | nrow(results) 28 | 29 | # Capture for posterity 30 | write_csv(results, "results.csv") 31 | ``` 32 | 33 | ## Table A -> Table B 34 | 35 | Wider is the clear winner with ~80% of responses. 
36 | 37 | ```{r} 38 | table_a <- results %>% 39 | filter(!is.na(table_a)) %>% 40 | mutate(top3 = table_a %>% fct_lump(3) %>% fct_infreq() %>% fct_rev()) %>% 41 | count(top3) %>% 42 | mutate(prop = n / sum(n)) 43 | 44 | table_a %>% 45 | ggplot(aes(top3, prop)) + 46 | geom_col() + 47 | scale_y_continuous(labels = scales::percent) + 48 | labs( 49 | x = NULL, 50 | y = "Percent of responses" 51 | ) + 52 | coord_flip() 53 | ``` 54 | 55 | There was a wide range of write-in responses. The most popular included concise, compact, condensed, and denser. 56 | 57 | ```{r} 58 | results %>% 59 | mutate( 60 | table_a = table_a %>% 61 | str_remove("Table A is ") %>% 62 | str_remove(" than Table B") %>% 63 | str_trunc(50) 64 | ) %>% 65 | count(table_a, sort = TRUE) %>% 66 | print(n = Inf) 67 | ``` 68 | 69 | ## Table B -> Table A 70 | 71 | Longer is the clear winner with ~70% of responses. Given the number of people who suggested taller to me, I had expected it to come in much higher. Interestingly, narrower is much less common than shorter, its equivalent above. 72 | 73 | ```{r} 74 | table_b <- results %>% 75 | filter(!is.na(table_b)) %>% 76 | mutate(top3 = table_b %>% fct_lump(3) %>% fct_infreq() %>% fct_rev()) %>% 77 | count(top3) %>% 78 | mutate(prop = n / sum(n)) 79 | 80 | table_b %>% 81 | ggplot(aes(top3, prop)) + 82 | geom_col() + 83 | scale_y_continuous(labels = scales::percent) + 84 | labs( 85 | x = NULL, 86 | y = "Percent of responses" 87 | ) + 88 | coord_flip() 89 | ``` 90 | 91 | There was a wide range of write-in responses. The most popular included expanded and skinnier.
92 | 93 | ```{r} 94 | results %>% 95 | mutate( 96 | table_b = table_b %>% 97 | str_remove("Table B is ") %>% 98 | str_remove(" than Table A") %>% 99 | str_trunc(50) 100 | ) %>% 101 | count(table_b, sort = TRUE) %>% 102 | print(n = Inf) 103 | ``` 104 | 105 | ## Conclusion 106 | 107 | The new functions will be called `pivot_wider()` and `pivot_longer()`: these are not the most natural names for everyone, but they are the most popular by a large margin. I like pivot because it suggests the form of the underlying operation (a pivoting or rotation), and it is evocative to Excel users. 108 | 109 | A few alternatives that were suggested, considered, and rejected: 110 | 111 | * `VERB_long()`/`VERB_wide()`: not obvious whether they take long/wide 112 | data or return long/wide data. 113 | 114 | * `VERB_to_long()`/`VERB_to_wide()`: implies that long and wide are absolute 115 | terms. I don't think it makes sense to talk about long or wide form data; 116 | you can only say one form is longer or wider than another form. 117 | 118 | * `to_long()`/`to_wide()`: isn't a verb, and implies that there's only one 119 | operation that makes data longer/wider. The next version of tidyr will also 120 | contain functions that unnest list-columns of vectors, and that verb (name 121 | TBA) also needs directional suffixes. 122 | 123 | * `reshape_SHAPE`: too much potential for confusion with the existing 124 | `stats::reshape()`. 125 | 126 | * `gather()`/`spread()`: while some people clearly liked these functions, they 127 | were not memorable to a large number of people I talked to. 128 | 129 | I appreciate the enthusiasm that people have for naming functions!
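As a rough illustration of what the chosen names describe, the sketch below reshapes a small table between the two layouts from the survey picture. It assumes tidyr 1.0.0 or later (the first release with the pivot functions) and the tibble package. The cell values and the `value` column name are made up for the example, since the picture's contents are not reproduced in this document.

```r
library(tidyr)

# Table A layout: one row per id, one column per measurement.
# The numbers are placeholders, not survey data.
table_a <- tibble::tibble(
  id = c(1, 2),
  x  = c(10, 40),
  y  = c(20, 50),
  z  = c(30, 60)
)

# A -> B: the result is longer (6 rows) and narrower (3 columns).
table_b <- pivot_longer(
  table_a,
  cols = c(x, y, z),
  names_to = "n",       # former column names, like the picture's "n" column
  values_to = "value"   # hypothetical name for the measurement column
)

# B -> A: the result is wider (4 columns) and shorter (2 rows).
round_trip <- pivot_wider(table_b, names_from = n, values_from = value)

dim(table_b)
dim(round_trip)
```

The suffixes are relative on purpose: `pivot_longer()` only promises a result that is longer than its input, and `pivot_wider()` one that is wider, which matches the point made above about long and wide not being absolute terms.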
130 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # Pivot function names 3 | 4 | On 2019-03-22, I [tweeted 5 | about](https://twitter.com/hadleywickham/status/1109132826631421952) a 6 | [survey](https://forms.gle/vvYgBw1EwHK69gA17) to help me pick names for 7 | the [new pivot 8 | functions](https://tidyr.tidyverse.org/dev/articles/pivot.html) in the 9 | dev version of tidyr. 10 | 11 | In the survey, I showed a picture of two tables containing the same 12 | data, and asked participants to describe their relative shapes. This 13 | document describes the results. 14 | 15 | ![Table A has four columns (id, x, y, z) and 2 rows. Table B has three 16 | columns (id, n, x) and six rows](table.png) 17 | 18 | ``` r 19 | library(googlesheets) 20 | library(tidyverse) 21 | 22 | # This googlesheet is public if you want to do your own analysis 23 | key <- gs_key("1Do5R1k5sEZrwU0N1KmIjKaapHDrf7eYLdIlcGNx-MsI") 24 | results <- googlesheets::gs_read(key, col_types = list()) 25 | names(results) <- c("timestamp", "table_a", "table_b") 26 | head(results) 27 | #> # A tibble: 6 x 3 28 | #> timestamp table_a table_b 29 | #> 30 | #> 1 3/22/2019 10:30:… Table A is tidier than Tab… Table B is deeper than Tab… 31 | #> 2 3/22/2019 10:31:… Table A is wider than Tabl… Table B is taller than Tab… 32 | #> 3 3/22/2019 10:32:… Table A is wider than Tabl… Table B is taller than Tab… 33 | #> 4 3/22/2019 10:33:… Table A is wider than Tabl… Table B is longer than Tab… 34 | #> 5 3/22/2019 10:34:… Table A is wider than Tabl… Table B is longer than Tab… 35 | #> 6 3/22/2019 10:37:… Table A is shallower than … Table B is narrower than T… 36 | 37 | nrow(results) 38 | #> [1] 2649 39 | 40 | # Capture for posterity 41 | write_csv(results, "results.csv") 42 | ``` 43 | 44 | ## Table A -\> Table B 45 | 46 | Wider is the clear winner with ~80% of responses. 
47 | 48 | ``` r 49 | table_a <- results %>% 50 | filter(!is.na(table_a)) %>% 51 | mutate(top3 = table_a %>% fct_lump(3) %>% fct_infreq() %>% fct_rev()) %>% 52 | count(top3) %>% 53 | mutate(prop = n / sum(n)) 54 | 55 | table_a %>% 56 | ggplot(aes(top3, prop)) + 57 | geom_col() + 58 | scale_y_continuous(labels = scales::percent) + 59 | labs( 60 | x = NULL, 61 | y = "Percent of responses" 62 | ) + 63 | coord_flip() 64 | ``` 65 | 66 | ![](README_files/figure-gfm/unnamed-chunk-2-1.png) 67 | 68 | There was a wide range of write-in responses. The most popular included 69 | concise, compact, condensed, and denser. 70 | 71 | ``` r 72 | results %>% 73 | mutate( 74 | table_a = table_a %>% 75 | str_remove("Table A is ") %>% 76 | str_remove(" than Table B") %>% 77 | str_trunc(50) 78 | ) %>% 79 | count(table_a, sort = TRUE) %>% 80 | print(n = Inf) 81 | #> # A tibble: 99 x 2 82 | #> table_a n 83 | #> 84 | #> 1 wider 2121 85 | #> 2 shorter 328 86 | #> 3 shallower 90 87 | #> 4 Compact 4 88 | #> 5 more compact 4 89 | #> 6 Denser 3 90 | #> 7 2 91 | #> 8 Condensed 2 92 | #> 9 fatter than table B 2 93 | #> 10 Horizontal 2 94 | #> 11 smaller 2 95 | #> 12 wider and shorter 2 96 | #> 13 %>% select(x, y, z) %>% 1 97 | #> 14 3D 1 98 | #> 15 "a \"condensed\" version of Table B" 1 99 | #> 16 a gathered Table B 1 100 | #> 17 A is a condense of B 1 101 | #> 18 A is more collective than B 1 102 | #> 19 A is the pivot shape of B 1 103 | #> 20 A is the unpivot of B 1 104 | #> 21 buxom 1 105 | #> 22 By individual / by observation 1 106 | #> 23 cases by variable, i.e. wider 1 107 | #> 24 cleaner 1 108 | #> 25 coiled up compared to table B 1 109 | #> 26 colum heavy 1 110 | #> 27 column-heavy as compared to Table B 1 111 | #> 28 Compact. Far fewer data cells required in table A. 1 112 | #> 29 concise version table B 1 113 | #> 30 Concise, Row Complete, completeR, conciseR 1 114 | #> 31 condensed than table B 1 115 | #> 32 Denormslised 1 116 | #> 33 Dense 1 117 | #> 34 denser than table B.
1 118 | #> 35 distinct, indexed on ID, fact table 1 119 | #> 36 expanded by columns 1 120 | #> 37 Fater 1 121 | #> 38 fatter 1 122 | #> 39 Flatter 1 123 | #> 40 For this specific example, I would say shorter.... 1 124 | #> 41 Funkier 1 125 | #> 42 horizontal vs Table B is vertical 1 126 | #> 43 horizontal, or column-wise 1 127 | #> 44 implicit 1 128 | #> 45 In A each observation repeats only once 1 129 | #> 46 Keep it as gather/spread. 1 130 | #> 47 Less melted 1 131 | #> 48 more aggregated 1 132 | #> 49 More compact 1 133 | #> 50 more compact / succinct / concise 1 134 | #> 51 more compact and easy to read 1 135 | #> 52 more compact than table B 1 136 | #> 53 More compact. 1 137 | #> 54 more compressed. 1 138 | #> 55 more concentrated than table B 1 139 | #> 56 more concise 1 140 | #> 57 more condensed 1 141 | #> 58 More dense (more info per cell) 1 142 | #> 59 more spread 1 143 | #> 60 notidy 1 144 | #> 61 pedantic 1 145 | #> 62 pivot 1 146 | #> 63 Pivoted 1 147 | #> 64 pivoted on “n” as its columns 1 148 | #> 65 Reshape 1 149 | #> 66 row_condensed 1 150 | #> 67 Russia 1 151 | #> 68 Short fat 1 152 | #> 69 short-form of Table B. 1 153 | #> 70 short-wide 1 154 | #> 71 Shorter and fatter 1 155 | #> 72 shorter and fatter (wider) 1 156 | #> 73 shorter and wider 1 157 | #> 74 shorter and wider but not shallower 1 158 | #> 75 shorter and wider than B 1 159 | #> 76 skimpy 1 160 | #> 77 spreaded on b 1 161 | #> 78 squash 1 162 | #> 79 Squatter 1 163 | #> 80 stacked by id compared to Table B 1 164 | #> 81 Stout 1 165 | #> 82 "Table \"A\" covers less area than Table \"B\"" 1 166 | #> 83 Table 1 is better organized for the human inter... 1 167 | #> 84 Table A has grouped/summarized information from... 1 168 | #> 85 Table A has more columns but less rows 1 169 | #> 86 table A is of 2 dimensions, while B is of 1 in ... 
1 170 | #> 87 Table A looks like a waffle 1 171 | #> 88 Table A spreads more horizontally than table B 1 172 | #> 89 Table B is CR(Column to Row) transformation of ... 1 173 | #> 90 Table B is taller than Table A 1 174 | #> 91 Table is wide table b is long 1 175 | #> 92 Tabular 1 176 | #> 93 Te same table, but different presentation or da... 1 177 | #> 94 the before picture of Table B. 1 178 | #> 95 the unstacked form of Table B 1 179 | #> 96 They are similar, containing the same information 1 180 | #> 97 thicc while Table B is a sticc 1 181 | #> 98 tidier 1 182 | #> 99 unique on one variable and is wider 1 183 | ``` 184 | 185 | ## Table B -\> Table A 186 | 187 | Longer is the clear winner with ~70% of responses. Given the number of 188 | people who suggested taller to me, I had expected it to come in much 189 | higher. Interestingly, narrower is much less common than shorter, its 190 | equivalent above. 191 | 192 | ``` r 193 | table_b <- results %>% 194 | filter(!is.na(table_b)) %>% 195 | mutate(top3 = table_b %>% fct_lump(3) %>% fct_infreq() %>% fct_rev()) %>% 196 | count(top3) %>% 197 | mutate(prop = n / sum(n)) 198 | 199 | table_b %>% 200 | ggplot(aes(top3, prop)) + 201 | geom_col() + 202 | scale_y_continuous(labels = scales::percent) + 203 | labs( 204 | x = NULL, 205 | y = "Percent of responses" 206 | ) + 207 | coord_flip() 208 | ``` 209 | 210 | ![](README_files/figure-gfm/unnamed-chunk-4-1.png) 211 | 212 | There was a wide range of write-in responses. The most popular included 213 | expanded and skinnier.
214 | 215 | ``` r 216 | results %>% 217 | mutate( 218 | table_b = table_b %>% 219 | str_remove("Table B is ") %>% 220 | str_remove(" than Table A") %>% 221 | str_trunc(50) 222 | ) %>% 223 | count(table_b, sort = TRUE) %>% 224 | print(n = Inf) 225 | #> # A tibble: 96 x 2 226 | #> table_b n 227 | #> 228 | #> 1 longer 1844 229 | #> 2 taller 419 230 | #> 3 narrower 171 231 | #> 4 deeper 116 232 | #> 5 7 233 | #> 6 Vertical 2 234 | #> 7 %>% filter(n=='x'|n=='y'|n=='z') %>% 1 235 | #> 8 2d 1 236 | #> 9 a detailed version of table A 1 237 | #> 10 a panel, Table A is not 1 238 | #> 11 a spread Table A 1 239 | #> 12 "an \"expanded\" version of Table A" 1 240 | #> 13 Atomic 1 241 | #> 14 B is a sublimate of A 1 242 | #> 15 B is more individual than A 1 243 | #> 16 B is the itemized shape of A 1 244 | #> 17 B is the pivot of A 1 245 | #> 18 bigger 1 246 | #> 19 By attribute / by key-value 1 247 | #> 20 Chile 1 248 | #> 21 clearer than table B. 1 249 | #> 22 combonation of variable is unique and is longer 1 250 | #> 23 Down 1 251 | #> 24 Expanded 1 252 | #> 25 expanded by rows 1 253 | #> 26 Expanded. 1 254 | #> 27 expansive than table A 1 255 | #> 28 explicit 1 256 | #> 29 Extended 1 257 | #> 30 flatter than table A (i.e The value column is t... 1 258 | #> 31 gaunt 1 259 | #> 32 Groovier 1 260 | #> 33 "I think if I hadn't heard of \"wide\" and \"long\",..." 1 261 | #> 34 Keep it as gather/spread. 1 262 | #> 35 less compact 1 263 | #> 36 Less dense 1 264 | #> 37 long and narrow 1 265 | #> 38 longer and narrower 1 266 | #> 39 Looser 1 267 | #> 40 melter 1 268 | #> 41 more diluted than table A 1 269 | #> 42 more fragmented or elemental 1 270 | #> 43 more likely to be drafted in the NBA. Jk, it's ... 1 271 | #> 44 More melted 1 272 | #> 45 more repetitive / verbose / redundant 1 273 | #> 46 More slender 1 274 | #> 47 more stretched out. 
1 275 | #> 48 more verbose 1 276 | #> 49 more vertically stacked than table A 1 277 | #> 50 narrow-long 1 278 | #> 51 narrower and longer 1 279 | #> 52 Needs more rows for each observation 1 280 | #> 53 Normalised 1 281 | #> 54 Not deeper 1 282 | #> 55 outrolled 1 283 | #> 56 pivot 1 284 | #> 57 pivoted with “n” in its rows 1 285 | #> 58 portrait vs. Table A is landscape 1 286 | #> 59 redundundant 1 287 | #> 60 repeated observation/measurement of cases by va... 1 288 | #> 61 row heavy 1 289 | #> 62 row_expanded 1 290 | #> 63 row-heavy as compared to Table A 1 291 | #> 64 skinnier. 1 292 | #> 65 Skinnier/skinny 1 293 | #> 66 Skinny 1 294 | #> 67 Slack 1 295 | #> 68 Sparser 1 296 | #> 69 Sparser than table A 1 297 | #> 70 stack 1 298 | #> 71 Stacked 1 299 | #> 72 stretched by n compared to Table A 1 300 | #> 73 stretched out compared to Table A 1 301 | #> 74 "Table \"B\" has creates options than Table \"A\"" 1 302 | #> 75 Table A is RC transformation of Table B 1 303 | #> 76 Table B has more rows but less columns 1 304 | #> 77 Take B is skinnier 1 305 | #> 78 Tall skinny 1 306 | #> 79 taller and thinner 1 307 | #> 80 Taller and thinner 1 308 | #> 81 the after photo of Table A 1 309 | #> 82 the long-form of Table A. 1 310 | #> 83 the stacked form of Table A 1 311 | #> 84 Thinner 1 312 | #> 85 thinner and longer 1 313 | #> 86 Tidier 1 314 | #> 87 tidy 1 315 | #> 88 transaction table, not indexed on ID 1 316 | #> 89 transposed of table A 1 317 | #> 90 Unbind variables for each id 1 318 | #> 91 Unpivoted 1 319 | #> 92 Vectorial 1 320 | #> 93 verbose 1 321 | #> 94 vertical 1 322 | #> 95 wider 1 323 | #> 96 worse 1 324 | ``` 325 | 326 | ## Conclusion 327 | 328 | The new functions will be called `pivot_wider()` and `pivot_longer()`: 329 | these are not the most natural names for everyone, but they are the 330 | most popular by a large margin.
I like pivot because it suggests the 331 | form of the underlying operation (a pivoting or rotation), and it is 332 | evocative to Excel users. 333 | 334 | A few alternatives that were suggested, considered, and rejected: 335 | 336 | - `VERB_long()`/`VERB_wide()`: not obvious whether they take long/wide 337 | data or return long/wide data. 338 | 339 | - `VERB_to_long()`/`VERB_to_wide()`: implies that long and wide are 340 | absolute terms. I don’t think it makes sense to talk about long or 341 | wide form data; you can only say one form is longer or wider than 342 | another form. 343 | 344 | - `to_long()`/`to_wide()`: isn’t a verb, and implies that there’s only 345 | one operation that makes data longer/wider. The next version of 346 | tidyr will also contain functions that unnest list-columns of 347 | vectors, and that verb (name TBA) also needs directional suffixes. 348 | 349 | - `reshape_SHAPE`: too much potential for confusion with the existing 350 | `stats::reshape()`. 351 | 352 | - `gather()`/`spread()`: while some people clearly liked these 353 | functions, they were not memorable to a large number of people I 354 | talked to. 355 | 356 | I appreciate the enthusiasm that people have for naming functions\!
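The headline shares can be checked by hand from the tallies printed above. This is a back-of-the-envelope sketch in base R, not part of the original analysis: the counts are copied from the output, and it assumes the unlabelled rows in the two tallies (2 responses for Table A, 7 for Table B) are missing answers, which `filter(!is.na(...))` drops before the proportions are computed.

```r
# Counts copied from the printed output above.
n_total   <- 2649  # nrow(results)
a_missing <- 2     # unlabelled row in the Table A tally (assumed to be NA)
b_missing <- 7     # unlabelled row in the Table B tally (assumed to be NA)
n_wider   <- 2121
n_longer  <- 1844

# Shares among the non-missing answers to each question.
share_wider  <- n_wider  / (n_total - a_missing)
share_longer <- n_longer / (n_total - b_missing)

round(100 * c(wider = share_wider, longer = share_longer), 1)
```

Under that assumption the shares come out at roughly 80.1% for wider and 69.8% for longer, in line with the ~80% and ~70% quoted in the text.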
357 | -------------------------------------------------------------------------------- /README_files/figure-gfm/pressure-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/table-shapes/78976335231ed2ab27fdbf1c9d918e1d45d14b8d/README_files/figure-gfm/pressure-1.png -------------------------------------------------------------------------------- /README_files/figure-gfm/unnamed-chunk-2-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/table-shapes/78976335231ed2ab27fdbf1c9d918e1d45d14b8d/README_files/figure-gfm/unnamed-chunk-2-1.png -------------------------------------------------------------------------------- /README_files/figure-gfm/unnamed-chunk-3-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/table-shapes/78976335231ed2ab27fdbf1c9d918e1d45d14b8d/README_files/figure-gfm/unnamed-chunk-3-1.png -------------------------------------------------------------------------------- /README_files/figure-gfm/unnamed-chunk-4-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/table-shapes/78976335231ed2ab27fdbf1c9d918e1d45d14b8d/README_files/figure-gfm/unnamed-chunk-4-1.png -------------------------------------------------------------------------------- /README_files/figure-gfm/unnamed-chunk-5-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/table-shapes/78976335231ed2ab27fdbf1c9d918e1d45d14b8d/README_files/figure-gfm/unnamed-chunk-5-1.png -------------------------------------------------------------------------------- /table-shapes.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: 
Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: knitr 13 | LaTeX: XeLaTeX 14 | 15 | AutoAppendNewline: Yes 16 | StripTrailingWhitespace: Yes 17 | -------------------------------------------------------------------------------- /table.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/table-shapes/78976335231ed2ab27fdbf1c9d918e1d45d14b8d/table.graffle -------------------------------------------------------------------------------- /table.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hadley/table-shapes/78976335231ed2ab27fdbf1c9d918e1d45d14b8d/table.png --------------------------------------------------------------------------------