├── 538.tsv
├── DnDStatistics.Rproj
├── LICENSE
├── README.md
├── dataProcess.R
└── docs
    ├── 538.tsv
    ├── charTable.tsv
    ├── index.Rmd
    ├── index.html
    └── uniqueTable.tsv

/538.tsv:
--------------------------------------------------------------------------------
Race FIGHTER ROGUE WIZARD BARBARIAN CLERIC RANGER PALADIN WARLOCK MONK BARD SORCERER DRUID TOTAL
HUMAN 4,888 2,542 2,568 1,435 2,339 1,715 2,326 1,714 1,946 1,454 1,324 996 25,248
ELF 1,242 2,257 2,744 336 921 3,076 492 755 1,349 651 841 1,779 16,443
HALF-ELF 646 1,325 611 153 628 891 817 1,401 399 1,808 1,258 516 10,454
DWARF 2,009 362 395 1,323 2,199 415 971 286 405 394 264 484 9,507
DRAGONBORN 1,335 325 346 875 510 355 1,688 584 457 371 1,031 309 8,185
TIEFLING 379 798 516 198 353 272 473 2,188 309 806 1,062 281 7,634
GENASI 580 495 558 388 459 420 322 415 750 352 648 584 5,971
HALFLING 339 1,797 257 306 308 440 207 296 551 801 310 302 5,916
HALF-ORC 976 233 143 1,709 272 245 427 212 284 199 126 215 5,039
GNOME 257 600 1,360 227 304 238 151 311 196 400 257 332 4,634
GOLIATH 865 139 109 1,729 192 187 389 136 326 144 114 190 4,522
AARAKOCRA 273 362 181 313 249 572 149 203 835 279 177 275 3,868
AASIMAR 116 71 67 70 274 60 429 210 87 144 174 65 1,767
TOTAL 13,906 11,307 9,855 9,063 9,009 8,887 8,840 8,711 7,892 7,804 7,587 6,328 NA
--------------------------------------------------------------------------------
/DnDStatistics.Rproj:
--------------------------------------------------------------------------------
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 4
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT
License

Copyright (c) 2018 Burak Ogan Mancarci

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

DnD character statistics
========================

This is my experiment on doing some stats on DnD characters.

See [here](https://oganm.github.io/dndstats/) for the document.


See [here](https://github.com/oganm/dndstats/blob/master/docs/index.Rmd) for the Rmd source code for the document.

The text of this document is licensed under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.

The code blocks within the source code are licensed under the [MIT license](https://opensource.org/licenses/MIT).


## Data access

This dataset is available in 2 forms: the complete version, which includes duplicates
of characters, and a filtered version that only includes unique characters.

Go [here](docs/charTable.tsv) for the complete data and [here](docs/uniqueTable.tsv) for the filtered one. Both have
the same columns, as explained below. The code to generate these tables can be found [here](https://github.com/oganm/dndstats/blob/master/dataProcess.R).

Below are the descriptions of the columns in the files. If you think something you'd be interested
in is missing, you can let me know.

**name:** This column has hashes that represent character names. If the hashes are
the same, the names are the same. Real names are removed
to protect character anonymity. Yes, D&D characters have rights.

**race:** This is the race field as it comes out of the application. It is not really
helpful, as subrace and race information are all mixed up together and unevenly available.
It also includes some homebrew content. You probably want to use the **processedRace**
column if you are interested in this.

**background:** Background as it comes out of the application.

**date:** Time & date of input. Dates before 2018-04-16 are unreliable, as some were accidentally changed
while moving files around.

**class:** Class and level. Different classes are separated by `|` when needed.

**justClass:** Class without level. Different classes are separated by `|` when needed.

**subclass:** Subclasses. Again, separated by `|` when needed.

**level:** Total character level.

**feats:** Feats chosen by the character. Separated by `|` when needed.

**HP:** Character HP.

**AC:** Character AC.

**Str, Dex, Con, Int, Wis, Cha:** Ability scores.

**alignment:** Alignment free text field. It is a mess, don't touch it.
See **processedAlignment**, **good** and **lawful** instead.

**skills:** List of skills with proficiency. Separated by `|`.

**weapons:** List of weapons. Separated by `|`. It is somewhat of a mess as it allows free text inputs. See **processedWeapons**.

**spells:** List of spells and their levels. Spells are separated by `|`s. Each spell has its level next to it,
separated by `*`s. This is a huge mess as it's a free text field and some users included things like damage dice in them. See **processedSpells**.

**day:** A shortened version of **date**. Only includes day information.

**processedAlignment:** Processed version of the **alignment** column. The ways people wrote their alignments were manually sifted through and assigned to the matching alignment. The first character represents lawfulness (L, N, C), the second one goodness (G, N, E). An empty string means the alignment wasn't written or was unclear.

**good, lawful:** Isolated columns for goodness and lawfulness.

**processedRace:** I have gone through the ways the **race** column is filled in by the app and assigned them to the correct
races. If empty, it indicates a homebrew race not natively supported by the app.

**processedSpells:** Formatting is the same as the **spells** column, but it is cleaned up. Using string similarity, I tried
to match the spells to the full list of spells available in the official publications. A spell is removed if the spell I guessed does not have the correct level, or if the original entry neither contains all words of the guessed spell's name nor is close enough (few edits) to be recognizable. It may have a few false matches, but it should be mostly fine.

**processedWeapons:** Similar to **processedSpells**, the **weapons** column is matched to the closest official weapon with some restrictions.

**levelGroup:** Splits levels into groups, as used in the feat percentage plot.
Only present in the filtered data
but easy enough to make on your own.

--------------------------------------------------------------------------------
/dataProcess.R:
--------------------------------------------------------------------------------
library(import5eChar) # github.com/oganm/import5eChar
library(purrr)
library(readr)
library(glue)
library(digest)
library(dplyr)
library(magrittr) # provides the %<>% assignment pipe used below
library(XML)
library(ogbox) # github.com/oganm/ogbox
library(wizaRd) # github.com/oganm/wizaRd
library(stringr)
library(memoise)

# memoImportChar = memoise(importCharacter)
# saveRDS(memoImportChar,'memoImportChar.rds')
memoImportChar = readRDS('/home/oganm/gitRepos/DnDStatistics/memoImportChar.rds')

# get all char files saved everywhere
charFiles = c(list.files('/srv/shiny-server/printSheetApp/chars/', full.names = TRUE),
              list.files('/srv/shiny-server/interactiveSheet/chars/', full.names = TRUE),
              list.files('/srv/shiny-server/chars', full.names = TRUE),
              list.files('/srv/shiny-server/chars2', full.names = TRUE),
              list.files('/srv/shiny-server/chars3', full.names = TRUE),
              list.files('/srv/shiny-server/chars4', full.names = TRUE))
print('reading char files')
# use import5eChar (through the memoised importer) to read all of them
chars = charFiles %>% lapply(function(x){
    memoImportChar(file = x)
})
saveRDS(memoImportChar,'memoImportChar.rds')

# get date information.
# dates before 2018-04-16 are not reliable
fileInfo = file.info(charFiles)
# get user fingerprint and IP
fileData = charFiles %>% basename %>% strsplit('_')

# add file and user info to the characters
print('constructing char table')
chars = lapply(1:length(chars),function(i){
    char = chars[[i]]
    char$date = fileInfo$mtime[i]
    if(length(fileData[[i]]) == 1){
        char$ip = 'NULL'
        char$finger = 'NULL'
        char$hash = fileData[[i]]
    } else{
        char$finger = fileData[[i]][1]
        char$ip = fileData[[i]][2]
        char$hash = fileData[[i]][3]
    }
    char
})

# setting the names to character name and class. this won't be exposed to others
names(chars) = chars %>% map_chr(function(x){
    paste(x$Name,x$ClassField)
})

# create the table
charTable = chars %>% map(function(x){
    data.frame(ip = x$ip,
               finger = x$finger,
               hash = x$hash,
               name = x$Name,
               race = x$Race,
               background = x$Background,
               date = x$date,
               class = paste(x$classInfo[,1],x$classInfo[,3],collapse='|'),
               justClass = x$classInfo[,'Class'] %>% paste(collapse ='|'),
               subclass = x$classInfo[,'Archetype'] %>% paste(collapse ='|'),
               level = x$classInfo[,'Level'] %>% as.integer() %>% sum,
               feats = x$feats[x$feats !=''] %>% paste(collapse = '|'),
               HP = x$currentHealth,
               AC = AC(x),
               Str = x$abilityScores['Str'],
               Dex = x$abilityScores['Dex'],
               Con = x$abilityScores['Con'],
               Int = x$abilityScores['Int'],
               Wis = x$abilityScores['Wis'],
               Cha = x$abilityScores['Cha'],
               alignment = x$Alignment,
               skills = x$skillProf %>% which %>% names %>% paste(collapse = '|'),
               weapons = x$weapons %>% map_chr('name') %>% gsub("\\|","",.)
                   %>% paste(collapse = '|'),
               spells = glue('{x$spells$name %>% gsub("\\\\*|\\\\|","",.)}*{x$spells$level}') %>% glue::glue_collapse('|') %>% {if(length(.)!=1){return('')}else{return(.)}},
               day = x$date %>% format('%m %d %y'),
               stringsAsFactors = FALSE)
}) %>% do.call(rbind,.)



# post processing -----
# the way races are encoded in the app is a little silly. sub-races are
# not recorded separately. essentially, race information is lost other
# than a text field after its effects are applied during creation.
# The text field is also not too consistent. For instance, if you are a
# variant it'll simply say "Variant" but if you are a variant human
# it'll only say human.
# here, I define regexes that match races.
# kind of an overkill, as only a few races actually required special care
races = c(Aarakocra = 'Aarakocra',
          Aasimar = 'Aasimar',
          Bugbear= 'Bugbear',
          Dragonborn = 'Dragonborn',
          Dwarf = 'Dwarf',
          Elf = '(?% {.[!.
%in% unlist(align)]} %>% table %>% sort %>% names

checkAlignment = function(x,legend){
    x = names(legend)[findInList(x,legend)]
    if(length(x) == 0){
        return('')
    } else{
        return(x)
    }
}


charTable %<>% mutate(processedAlignment = alignment %>% purrr::map_chr(checkAlignment,align),
                      good = processedAlignment %>% purrr::map_chr(checkAlignment,goodEvil) %>%
                          factor(levels = c('E','N','G')),
                      lawful = processedAlignment %>%
                          purrr::map_chr(checkAlignment,lawfulChaotic) %>% factor(levels = c('C','N','L')))

charTable %<>% mutate(processedRace = race %>% sapply(function(x){
    out = races %>% sapply(function(y){
        grepl(pattern = y, x,perl = TRUE,ignore.case = TRUE)
    }) %>% which %>% names

    # keep the race only if exactly one pattern matched
    if(length(out) != 1){
        out = ''
    }

    return(out)
}))

# remove personal info

shortestDigest = function(vector){
    digested = vector %>% map_chr(digest,'sha1')
    uniqueDigested = digested %>% unique

    # find the shortest hash suffix that keeps all unique values distinct,
    # then keep one extra character as a safety margin
    collisionLimit = 1:40 %>% sapply(function(i){
        substr(uniqueDigested,40-i,40)%>% unique %>% length
    }) %>% which.max %>% {.+1}

    substr(digested, 40 - collisionLimit, 40)
}


charTable$name %<>% shortestDigest
charTable$ip %<>% shortestDigest
charTable$finger %<>% shortestDigest
charTable$hash %<>% shortestDigest

spells = wizaRd::spells

spells = c(spells, list('.'
= list(level = as.integer(99))))
class(spells) = 'list'

legitSpells =spells %>% names


processedSpells = charTable$spells %>% sapply(function(x){
    if(x==''){
        return('')
    }
    spellNames = x %>% str_split('\\|') %>% {.[[1]]} %>% str_split('\\*') %>% map_chr(1)
    spellLevels = x %>% str_split('\\|') %>% {.[[1]]} %>% str_split('\\*') %>% map_chr(2)

    # weighted edit distance to every official spell name
    # (insertions and deletions cost 2, substitutions cost 3)
    distanceMatrix = adist(tolower(spellNames), tolower(legitSpells),costs = list(ins=2, del=2, sub=3), counts = TRUE)

    rownames(distanceMatrix) = spellNames
    colnames(distanceMatrix) = legitSpells

    predictedSpell = distanceMatrix %>% apply(1,which.min) %>% {legitSpells[.]}
    distanceScores = distanceMatrix %>% apply(1,min)
    predictedSpellLevel = spells[predictedSpell] %>% purrr::map_int('level')

    # number of insertions, deletions and substitutions between each entry and its best match
    ins = attributes(distanceMatrix)$counts[,distanceMatrix %>% apply(1,which.min),'ins'] %>% as.matrix %>% diag
    del = attributes(distanceMatrix)$counts[,distanceMatrix %>% apply(1,which.min),'del'] %>% as.matrix %>% diag
    sub = attributes(distanceMatrix)$counts[,distanceMatrix %>% apply(1,which.min),'sub'] %>% as.matrix %>% diag
    # does the entry contain every word (minus filler words) of the guessed spell name?
    isItIn = predictedSpell %>% str_split(' |/') %>% map(function(x){
        x[!x %in% c('and','or','of','to','the')]
    }) %>%
        {sapply(1:length(.),function(i){
            all(sapply(.[[i]],grepl,x =spellNames[i],ignore.case=TRUE))
        })}

    spellFrame = data.frame(spellNames,predictedSpell,spellLevels,predictedSpellLevel,distanceScores,ins,del,sub,isItIn,stringsAsFactors = FALSE)

    # keep a match only if the level agrees and the entry either contains all the
    # words of the guessed name or is within a few edits of it
    spellFrame %<>% filter(as.integer(spellLevels)==predictedSpellLevel &( isItIn | (sub < 5 & del < 5 & ins < 5)))

    paste0(spellFrame$predictedSpell,'*',spellFrame$predictedSpellLevel,collapse ='|')
})
charTable$processedSpells = processedSpells

# x = 1:nrow(charTable) %>% sapply(function(i){adist(charTable$spells[i],charTable$processedSpells[i])})
# %>% {.>20} %>% {charTable$spells[.]} %>% {.[43]}
# x = 1:nrow(charTable) %>% sapply(function(i){adist(charTable$spells[i],charTable$processedSpells[i])}) %>% {.>20} %>% {charTable$spells[.]} %>% {.[70]}
# x = 1:nrow(charTable) %>% sapply(function(i){adist(charTable$spells[i],charTable$processedSpells[i])}) %>% {.>20} %>% {charTable$spells[.]} %>% {.[88]}

# download.file('https://www.dropbox.com/s/4f7zdx09nkfa9as/Core.xml?dl=1',destfile = 'Core.xml')
# allRules = xmlParse('Core.xml') %>% xmlToList()
# fightClubItems = allRules[names(allRules) == 'item']
# saveRDS(fightClubItems,'fightClubItems.rds')

# fightClubItems = readRDS('fightClubItems.rds')
# names(fightClubItems) = allRules %>% map('name') %>% as.character
#
# fightClubItems %>% map_chr('type') %>% {. %in% 'M'} %>% {fightClubItems[.]} %>% map_chr('name')
# fightClubItems %>% map_chr('type') %>% {. %in% 'R'} %>% {fightClubItems[.]} %>% map_chr('name')

legitWeapons = c(# fightClubItems %>% map_chr('type') %>% {. %in% 'M'} %>% {fightClubItems[.]} %>% map_chr('name'),
                 # fightClubItems %>% map_chr('type') %>% {.
                 # %in% 'R'} %>% {fightClubItems[.]} %>% map_chr('name'),
                 'Crossbow, Light', 'Dart', 'Shortbow', 'Sling',
                 'Blowgun', 'Crossbow, hand', 'Crossbow, Heavy', 'Longbow', 'Net',
                 'Club','Dagger','Greatclub','Handaxe','Javelin','Light hammer','Mace','Quarterstaff','Sickle','Spear','Unarmed Strike',
                 'Battleaxe','Flail','Glaive','Greataxe','Greatsword','Halberd','Lance','Longsword','Maul','Morningstar','Pike','Rapier','Scimitar','Shortsword','Trident','War pick','Warhammer','Whip')

processedWeapons = charTable$weapons %>% sapply(function(x){
    if(x==''){
        return('')
    }
    weaponNames = x %>% str_split('\\|') %>% {.[[1]]}

    distanceMatrix = adist(tolower(weaponNames), tolower(legitWeapons),costs = list(ins=2, del=2, sub=3), counts = TRUE)

    rownames(distanceMatrix) = weaponNames
    colnames(distanceMatrix) = legitWeapons

    predictedWeapon = distanceMatrix %>% apply(1,which.min) %>% {legitWeapons[.]}
    distanceScores = distanceMatrix %>% apply(1,min)

    ins = attributes(distanceMatrix)$counts[,distanceMatrix %>% apply(1,which.min),'ins'] %>% as.matrix %>% diag
    del = attributes(distanceMatrix)$counts[,distanceMatrix %>% apply(1,which.min),'del'] %>% as.matrix %>% diag
    sub = attributes(distanceMatrix)$counts[,distanceMatrix %>% apply(1,which.min),'sub'] %>% as.matrix %>% diag
    isItIn = predictedWeapon %>% str_split(' |/') %>% map(function(x){
        x[!x %in% c('and','or','of','to','the')]
    }) %>%
        {sapply(1:length(.),function(i){
            all(sapply(.[[i]],grepl,x =weaponNames[i],ignore.case=TRUE))
        })}

    weaponFrame = data.frame(weaponNames,predictedWeapon,distanceScores,ins,del,sub,isItIn,stringsAsFactors = FALSE)

    weaponFrame %<>% filter(isItIn| (sub < 2 & del < 2 & ins < 2))

    paste0(weaponFrame$predictedWeapon %>% unique,collapse ='|')
})

charTable$processedWeapons = processedWeapons

# x =
# 1:nrow(charTable) %>% sapply(function(i){adist(charTable$weapons[i],charTable$processedWeapons[i])}) %>% {.>20} %>% {charTable$weapons[.]} %>% {.[10]}


unsecureFields = c('ip','finger','hash')

charTable = charTable[!names(charTable) %in% unsecureFields]

# user id ------
# userID = c()
# pb = txtProgressBar(min = 0, max = nrow(charTable), initial = 0)
#
# for(i in 1:nrow(charTable)){
#     setTxtProgressBar(pb,i)
#     for (id in unique(userID)){
#         userChars = charTable[which(userID == id),]
#         ip = charTable$ip[i] %>% {if(is.na(.) || . =='NULL' || .==''){return("NANA")}else{.}}
#         finger = charTable$finger[i] %>% {if(is.na(.) || . =='NULL' ||. == ''){return("NANA")}else{.}}
#         hash = charTable$hash[i] %>% {if(is.na(.) || . =='NULL' || . == ''){return("NANA")}else{.}}
#
#         ipInUser = ip %in% userChars$ip
#         fingerInUser = finger %in% userChars$finger
#         hashInUser = hash %in% userChars$hash
#         if(ipInUser | fingerInUser | hashInUser){
#
#             userID = c(userID,id)
#             break
#         }
#
#     }
#
#     if(length(userID)!=i){
#         userID = c(userID, max(c(userID,0))+1)
#     }
# }
#
# charTable$userID = userID
#
#
# userID = c()
# pb = txtProgressBar(min = 0, max = nrow(charTable), initial = 0)
#
# for(i in 1:nrow(charTable)){
#     setTxtProgressBar(pb,i)
#     for (id in unique(userID)){
#         userChars = charTable[which(userID == id),]
#         ip = charTable$ip[i] %>% {if(is.na(.) || . =='NULL' || .==''){return("NANA")}else{.}}
#         finger = charTable$finger[i] %>% {if(is.na(.) || . =='NULL' ||. == ''){return("NANA")}else{.}}
#         hash = charTable$hash[i] %>% {if(is.na(.) || . =='NULL' || .
# == ''){return("NANA")}else{.}}
#
#         ipInUser = ip %in% userChars$ip
#         fingerInUser = finger %in% userChars$finger
#         hashInUser = hash %in% userChars$hash
#         if(fingerInUser | hashInUser){
#
#             userID = c(userID,id)
#             break
#         }
#
#     }
#
#     if(length(userID)!=i){
#         userID = c(userID, max(c(userID,0))+1)
#     }
# }
#
# charTable$userIDNoIP = userID
#
write_tsv(charTable, 'docs/charTable.tsv')
#
# # secure table -----
#
# # not sure about the legality of this but I may be able to share
# # the data in an anonymized form.
#
# secureTable = charTable
# secureTable$ip %<>% sapply(function(x){
#     if(x %in% c('','NULL')){
#         return('')
#     } else{
#         digest(x,'sha1')
#     }
# })
# secureTable$finger %<>% sapply(function(x){
#     if(x %in% c('','NULL')){
#         return('')
#     } else{
#         digest(x,'sha1')
#     }
# })
#
# secureTable$name %<>% sapply(digest,'sha1')
# write_tsv(secureTable,path = 'docs/hashedTable.tsv')


--------------------------------------------------------------------------------
/docs/538.tsv:
--------------------------------------------------------------------------------
Race FIGHTER ROGUE WIZARD BARBARIAN CLERIC RANGER PALADIN WARLOCK MONK BARD SORCERER DRUID TOTAL
HUMAN 4,888 2,542 2,568 1,435 2,339 1,715 2,326 1,714 1,946 1,454 1,324 996 25,248
ELF 1,242 2,257 2,744 336 921 3,076 492 755 1,349 651 841 1,779 16,443
HALF-ELF 646 1,325 611 153 628 891 817 1,401 399 1,808 1,258 516 10,454
DWARF 2,009 362 395 1,323 2,199 415 971 286 405 394 264 484 9,507
DRAGONBORN 1,335 325 346 875 510 355 1,688 584 457 371 1,031 309 8,185
TIEFLING 379 798 516 198 353 272 473 2,188 309 806 1,062 281 7,634
GENASI 580 495 558 388 459 420 322 415 750 352 648 584 5,971
HALFLING 339 1,797 257 306
308 440 207 296 551 801 310 302 5,916
HALF-ORC 976 233 143 1,709 272 245 427 212 284 199 126 215 5,039
GNOME 257 600 1,360 227 304 238 151 311 196 400 257 332 4,634
GOLIATH 865 139 109 1,729 192 187 389 136 326 144 114 190 4,522
AARAKOCRA 273 362 181 313 249 572 149 203 835 279 177 275 3,868
AASIMAR 116 71 67 70 274 60 429 210 87 144 174 65 1,767
TOTAL 13,906 11,307 9,855 9,063 9,009 8,887 8,840 8,711 7,892 7,804 7,587 6,328 NA
--------------------------------------------------------------------------------
/docs/index.Rmd:
--------------------------------------------------------------------------------
---
output: html_document
always_allow_html: yes
editor_options:
  chunk_output_type: console
---

```{r setup, include=FALSE}
library(dplyr)
library(magrittr)
library(readr)
library(stringr)
library(ggplot2)
library(cowplot)
library(glue)
library(reshape2)
library(igraph)
library(circlize)
library(patchwork)
library(plotly)
library(shiny)
library(here)
library(knitr)
library(purrr)
library(kableExtra)
library(ogbox) # github.com/oganm/ogbox
knitr::opts_chunk$set(echo = FALSE, fig.align ='center')


getUniqueTable = function(charTable){
    uniqueTable = charTable %>% arrange(desc(level)) %>% filter(!duplicated(paste(name,justClass))) %>%
        filter(!level > 20)

    # detect non-unique characters: single-class entries that were later multiclassed
    multiClassed = uniqueTable %>% filter(grepl('\\|',justClass))
    singleClassed = uniqueTable %>% filter(!grepl('\\|',justClass))


    matchingNames = multiClassed$name[multiClassed$name %in% singleClassed$name]%>% na.omit

    isDuplicate = matchingNames %>% sapply(function(nm){
        multiChar = multiClassed %>% filter(name == nm)
        singleChar = singleClassed %>% filter(name == nm)

        if(nrow(multiChar) != 1 | nrow(singleChar) != 1){
            warning('Not
1-1 match. Skipping')
            return(FALSE)
        } else{
            isSubset = str_split(multiChar$justClass,pattern = '\\|') %>% {.[[1]]} %>% {singleChar$justClass %in% .}
            isHigherLevel = multiChar$level > singleChar$level
            return(isSubset & isHigherLevel)
        }
    })

    singleClassed %<>% filter(!name %in% matchingNames[isDuplicate])

    uniqueTable = rbind(singleClassed,multiClassed)

    return(list(uniqueTable = uniqueTable,
                singleClassed = singleClassed,
                multiClassed = multiClassed))
}

# load table and get unique characters

charTable = read_tsv(here("docs/charTable.tsv"),na = 'NA')
charTable %<>% mutate(good = factor(good,levels = c('E','N','G')),
                      lawful = factor(lawful, levels = c('C','N','L')))

# group levels at common feat acquisition points. sorry fighters and rogues
charTable %<>% mutate(levelGroup = cut(level,
                                       breaks = c(0,3,7,11,15,18,20),
                                       labels = c('1-3','4-7','8-11','12-15','16-18','19-20')))

# for anyone looking at this and confused by the weird syntax
# see https://stackoverflow.com/questions/1826519/how-to-assign-from-a-function-which-returns-more-than-one-value
list[keepRevised,,] = getUniqueTable(charTable)
charTable$justClass %<>% gsub(pattern = 'Revised ', replacement = '',x = .)
charTable$class %<>% gsub(pattern = 'Revised ', replacement = '',x = .)

list[uniqueTable,singleClassed,multiClassed] = getUniqueTable(charTable)

write_tsv(uniqueTable, here('docs/uniqueTable.tsv'))


barPalette = c('#7DD4A6','#C15BC5','#D65242','#415455',
               '#D2A75C','#8FD25B','#D15B86','#A5B5BE','#727EC6',
               '#567441','#754334','#5E3A60','#77B0D0',"#CCEBC5",
               "#D9D9D9","#FCCDE5")
```

Table of Contents
=================

* [Is your D&D character rare?
II: Off-brand edition](#is-your-dd-character-rare-ii-off-brand-edition)
* [Introduction](#introduction)
* [Is Your D&D Character Rare? II](#is-your-dd-character-rare-ii)
* [Is your character archetype rare?](#is-your-character-archetype-rare)
* [Is your alignment rare?](#is-your-alignment-rare)
* [Are your feat choices rare?](#are-your-feat-choices-rare)
* [Is your multiclass combination rare?](#is-your-multiclass-combination-rare)
* [Is power gaming rare?](#is-power-gaming-rare)
* [Are your spells rare?](#are-your-spells-rare)
* [Is your game day rare?](#is-your-game-day-rare)
* [About the data](#about-the-data)
* [Data access](#data-access)
* [About this document](#about-this-document)
* [Changelog](#changelog)


# Is your D&D character rare? II: Off-brand edition

*Ogan Mancarci, 28 July 2018*

*Edited: 9 September 2018 (see [changelog](#changelog))*

## Introduction

About a year ago, FiveThirtyEight published a short article called
["Is Your D&D Character Rare?"](https://fivethirtyeight.com/features/is-your-dd-character-rare/).
It was a product of a deal between Curse and FiveThirtyEight, which meant the data
was not available to anyone else. I was a little jealous that I couldn't play with the data, and disappointed that they only counted class/race combinations and called it a day.

Shortly after, I released a few tools ([1](https://oganm.github.io/printSheetApp/),[2](https://oganm.github.io/5eInteractiveSheet/)) for a popular mobile application ([3](https://play.google.com/store/apps/details?id=com.wgkammerer.testgui.basiccharactersheet.app&hl=en_CA)), which allowed me to collect my users' character data.

After 3.5 months of data collection,
I have a whopping... `r nrow(uniqueTable)` unique characters in my database that I can play with. Well...
I'm not
as popular as DnDBeyond, but I don't see anyone else waving around hundreds of character sheets for us to
data mine, so it'll have to do.

## Is Your D&D Character Rare? II

To start with, let's redo the table from FiveThirtyEight. I am not going to pretend
I have many thousands of samples, so instead of per 100,000, this shows class and race combinations per 100
characters. In FiveThirtyEight's table, characters with multiple classes count once for each class. Here, I split each multiclassed character across its classes in proportion to its class levels. For instance, a Fighter 5/Rogue 15 character adds 0.75 to the rogue count and 0.25 to the fighter count. Homebrew and UA classes are removed.

```{r fiveThirtyEightCopy,fig.width=9}

# classes = uniqueTable$justClass %>% str_split('\\|') %>% unlist %>% unique
legitClasses = c("Warlock", "Monk", "Wizard", "Barbarian", "Sorcerer", "Paladin", "Fighter", "Druid", "Ranger", "Rogue","Cleric","Bard")
races = uniqueTable$processedRace %>% unique %>% {.[.!='']}
coOccurenceMatrix = matrix(0 , nrow=length(races),ncol = length(legitClasses))
colnames(coOccurenceMatrix) = legitClasses
rownames(coOccurenceMatrix) = races
for (i in seq_along(races)){
    for (j in seq_along(legitClasses)){
        # each character of this race contributes (levels in this class) / (total level)
        ((uniqueTable$processedRace==races[i]) * {
            classLevel =str_extract(uniqueTable$class,glue('(?<={legitClasses[j]} )[0-9]+')) %>% {.[is.na(.)] = 0;.} %>% as.integer()
            classLevel/uniqueTable$level
        }) %>% sum -> coOcc
        coOccurenceMatrix[i,j] = coOcc
    }
}

# drop classes with a weighted total below 2 and races with a weighted total below 1
coOccurenceMatrixSubset = coOccurenceMatrix[,!coOccurenceMatrix %>% apply(2,sum) %>% {.<2}]

coOccurenceMatrixSubset = coOccurenceMatrixSubset[!coOccurenceMatrixSubset %>% apply(1,sum) %>% {.<1},]

coOccurenceMatrixSubset =
    coOccurenceMatrixSubset[coOccurenceMatrixSubset %>% apply(1,sum) %>%
                                order(decreasing = FALSE),
                            coOccurenceMatrixSubset %>% apply(2,sum) %>% order(decreasing = TRUE)]

coOccurenceMatrixSubset = coOccurenceMatrixSubset/(sum(coOccurenceMatrix))* 100


classSums = coOccurenceMatrixSubset %>% apply(2,sum)
raceSums = coOccurenceMatrixSubset %>% apply(1,sum)

coOccurenceMatrixSubset = cbind(coOccurenceMatrixSubset,raceSums)


coOccurenceMatrixSubset = rbind(Total = c(classSums,NA), coOccurenceMatrixSubset)
colnames(coOccurenceMatrixSubset)[ncol(coOccurenceMatrixSubset)] = "Total"

coOccurenceFrame = coOccurenceMatrixSubset %>% melt()
names(coOccurenceFrame)[1:2] = c('Race','Class')

coOccurenceFrame %<>% mutate(fillCol = value*(Race!='Total' & Class!='Total'))


coOccurenceFrame %>% ggplot(aes(x = Class,y = Race)) +
    geom_tile(aes(fill = fillCol),show.legend = FALSE)+
    scale_fill_continuous(low = 'white',high = '#46A948',na.value = 'white')+
    # viridis::scale_fill_viridis() +
    geom_text(aes(label = value %>% round(2) %>% format(nsmall=2))) +
    scale_x_discrete(position='top') + xlab('') + ylab('') +
    theme(axis.text.x = element_text(angle = 30,vjust = 0.5,hjust = 0))

```


```{r fiveThirtyEightCorrMaths,message=FALSE}
fiveThirtyEight = read_tsv('538.tsv') %>% melt()
names(fiveThirtyEight)[2] = 'Class'

fiveThirtyEight %<>% mutate(Class = as.character(Class)) %>%
    arrange(Race,Class) %>% filter(Race !='TOTAL' & Class != 'TOTAL')

coOccurenceFrame %<>% mutate(Race = toupper(Race), Class = toupper(Class)) %>%
    arrange(Race,Class) %>%
    filter(Race %in% fiveThirtyEight$Race & Class %in% fiveThirtyEight$Class)

# FiveThirtyEight's numbers are per 100,000 characters, so dividing by 1000 puts them per 100.
# This assumes both frames now list the same race/class pairs in the same order.
corFrame = data.frame(DnDBeyond = fiveThirtyEight$value/1000, oganm = coOccurenceFrame$value,
                      class = coOccurenceFrame$Class,race = coOccurenceFrame$Race)


```

Despite the
methodological differences, these results correlate well with the DnDBeyond data (Spearman's ρ=`r round(cor(corFrame$DnDBeyond,corFrame$oganm,method = 'spearman'),2)`), even though we disagree on the exact order of popularity. The graph below plots the percentage of each class/race combination in the DnDBeyond data (as presented by FiveThirtyEight) against my data.


```{r fiveThirtyEightCorr,message=FALSE,fig.height=3.5,fig.width=3.5}

corFrame %>% ggplot(aes(x = oganm,y = DnDBeyond,text = paste(class, race))) +
    geom_point() +
    ggtitle("Class-race combination %s at\n DnDBeyond vs oganm's data") -> p

ply = plotly::ggplotly(p) %>% layout(xaxis=list(fixedrange=TRUE)) %>%
    config(displayModeBar = F) %>%
    layout(yaxis=list(fixedrange=TRUE))
ply$width = 500
# rmarkdown seems to ignore alignment set using plotly
div(ply,align = 'center')
```

## Is your character archetype rare?

This is a little hard to visualize in a single plot. Alas, we are short on space, so you'll
have to mouse over the bars to see the details. Each colored section shows an archetype's share
of all the archetypes picked for that class. Sections are ordered from bottom to top by
frequency, so the brown section always shows the most popular archetype, and popularity goes downhill (but upwards in the plot) from there.
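Each segment's height is an archetype's count divided by the number of archetype picks for that class. The following is a minimal base-R sketch of that computation on made-up data. The `picks` table and the `archetypeShares()` helper are hypothetical. The `archetypeGraph` chunk does the same with dplyr and also drops rarely seen classes.

```r
# Hypothetical picks: one row per character-class with a chosen archetype.
picks <- data.frame(
    class     = c("Fighter", "Fighter", "Fighter", "Fighter", "Rogue", "Rogue", "Rogue"),
    archetype = c("Battle Master", "Champion", "Battle Master", "Eldritch Knight",
                  "Thief", "Arcane Trickster", "Thief"),
    stringsAsFactors = FALSE
)

archetypeShares <- function(picks) {
    # number of times each archetype was picked within its class
    counts <- aggregate(list(count = rep(1L, nrow(picks))),
                        by = list(class = picks$class, archetype = picks$archetype),
                        FUN = sum)
    # share of the class total, in percent
    classTotals <- ave(counts$count, counts$class, FUN = sum)
    counts$percent <- counts$count / classTotals * 100
    # most popular archetype first within each class
    counts[order(counts$class, -counts$count), ]
}

shares <- archetypeShares(picks)
shares
```

Within each class the percentages add up to 100. Here Battle Master is 50% of the Fighter picks and Thief is two thirds of the Rogue picks.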

```{r archetypeGraph}
# split multiclassed entries; assumes justClass and subclass hold matching '|'-separated entries
classes = uniqueTable$justClass %>% str_split('\\|') %>% unlist
archetypes = uniqueTable$subclass %>% str_split('\\|') %>% unlist


archeFrame = data.frame(classes,archetypes) %>% filter(archetypes !='')
classSum = archeFrame$classes %>% table %>% sort(decreasing = TRUE)

archeFrame %<>% group_by(classes,archetypes) %>% summarize(count = n()) %>%
    arrange(classes,(count)) %>% filter(classes %in% names(which(classSum>2))) %>%
    ungroup() %>%
    mutate(archetypes = factor(archetypes,levels = archetypes)) %>%
    group_by(classes) %>%
    mutate(ratio = count/sum(count)*100) %>%
    # rank archetypes within each class so that the most popular one gets ID 1
    mutate(classArcheID = as.integer(archetypes) - max(as.integer(archetypes)) +1) %>%
    ungroup() %>%
    mutate(classArcheID = as.factor(classArcheID)) %>%
    mutate(`%` = round(ratio)) %>%
    filter(classes %in% legitClasses)

archeFrame %>%
    ggplot(aes(x = classes,y = ratio,fill = classArcheID,
               label = archetypes,hede = count,hodo = `%`)) +
    geom_bar(stat='identity') +
    theme(axis.text.x = element_text(angle = 90,vjust = 0.5,hjust = 1 ),
          legend.position = 'none') +
    scale_fill_manual(values = barPalette) +
    ggtitle('Archetype choices') + xlab('') + ylab('archetype % within class')->p

# hede and hodo are dummy aesthetics that only exist to show up in the plotly tooltip
ply = ggplotly(p,tooltip = c('label','hodo','hede')) %>% layout(xaxis=list(fixedrange=TRUE)) %>%
    config(displayModeBar = F) %>%
    layout(yaxis=list(fixedrange=TRUE))
div(ply,align='center')

```

## Is your alignment rare?

Analyzing alignment in this dataset is difficult because, unlike most other fields,
it is not mandatory. It also isn't something you are likely to forget about your
character, so there isn't much incentive to fill it in.
Only `r round(sum(uniqueTable$alignment != '')/nrow(uniqueTable)*100)`% of
characters actually filled in this field.
I know I only filled it in myself when testing
my applications. It is entirely possible that the users' choice to fill in this box
introduces a bias, so take these results with a grain of salt.

Also, since it's a free text field, some manual
processing is required to make the most of this information
(looking at you, fellows with the "Awesome" and "Super Good" alignments). But that still leaves
`r sum(uniqueTable$alignment != '')` characters, so here we go.

The plot below shows character counts for each alignment.

```{r alignment, fig.height=2,fig.width=2,fig.align='center'}

alignmentTable = uniqueTable %>% filter(processedAlignment != '')

# number of characters in each cell of the 3x3 alignment grid
alignmentCounts = alignmentTable %>% group_by(good,lawful) %>%
    summarize(Count = n())


alignmentCounts %>% ggplot(aes(y = good,
                               x = lawful,
                               fill = Count,
                               label = Count)) + geom_tile() +
    scale_fill_continuous(low = 'white',high = '#46A948',na.value = 'white') +
    geom_text() +
    ylab('Good/Evil') +
    xlab('Lawful/Chaotic') +
    scale_x_discrete(limits = c('L','N','C')) +
    theme(legend.position = 'none')->p
p
```

In general, lawful characters seem to be out of style these days. Let's look at
the tendencies of individual classes. The graph below shows the mean alignment of
each class. Multiclassed characters contribute in proportion to their class levels,
as before. You can mouse over the points to see sample sizes and mean values.
Alignments are converted to numbers from 1 to 3: 1 is Chaotic/Evil and 3 is Lawful/Good on
the corresponding scales.
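Below is a sketch of one way to compute such a weighted mean. It assumes the Good/Evil and Chaotic/Lawful columns are factors whose level order encodes the 1 to 3 scale, which is what the `as.integer()` calls in the `classAlignment` chunk suggest. The character data and the `levelIn()` helper are hypothetical. The chunk itself extracts class levels with `str_extract()` and a lookbehind.

```r
# Hypothetical characters. Factor level order encodes the numeric scale:
# E/C = 1, N = 2, G/L = 3.
good   <- factor(c("G", "N", "G", "E"), levels = c("E", "N", "G"))
lawful <- factor(c("C", "N", "L", "C"), levels = c("C", "N", "L"))
classStrings <- c("Paladin 5", "Paladin 2|Warlock 3", "Fighter 4", "Paladin 6|Sorcerer 14")
totalLevels  <- c(5, 5, 4, 20)

# Hypothetical helper: levels a character has in `className` (0 if none).
levelIn <- function(classStrings, className) {
    pattern <- paste0(".*\\b", className, " ([0-9]+).*")
    hit <- grepl(pattern, classStrings, perl = TRUE)
    out <- integer(length(classStrings))
    out[hit] <- as.integer(sub(pattern, "\\1", classStrings[hit], perl = TRUE))
    out
}

# Each character counts in proportion to its share of Paladin levels;
# characters without Paladin levels get weight 0 and drop out of the mean.
paladinWeight <- levelIn(classStrings, "Paladin") / totalLevels
paladinGood   <- weighted.mean(as.integer(good), paladinWeight)
paladinLawful <- weighted.mean(as.integer(lawful), paladinWeight)
c(good = paladinGood, lawful = paladinLawful)
```

Giving non-Paladins a weight of 0 has the same effect as filtering them out first, which is what the chunk does with `grepl()`.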

```{r classAlignment}

# mean Good/Evil score of each class; every character is weighted by its share of levels in that class
classGood = legitClasses %>% sapply(function(x){
    classAlignment = alignmentTable %>% filter(grepl(x,justClass))
    good = classAlignment %$% good
    classProportion = as.integer(stringr::str_extract(classAlignment$class,glue('(?<={x} )[0-9]+')))/classAlignment$level
    weighted.mean(good %>% as.integer,classProportion)
})

# same for the Chaotic/Lawful axis
classLawful = legitClasses %>% sapply(function(x){
    classAlignment = alignmentTable %>% filter(grepl(x,justClass))
    lawful = classAlignment %$% lawful
    classProportion = as.integer(stringr::str_extract(classAlignment$class,glue('(?<={x} )[0-9]+')))/classAlignment$level
    weighted.mean(lawful %>% as.integer,classProportion)
})

# sample size: sum of the level shares, so multiclassed characters count fractionally
classN = legitClasses %>% sapply(function(x){
    classAlignment = alignmentTable %>% filter(grepl(x,justClass))
    classProportion = sum(as.integer(stringr::str_extract(classAlignment$class,glue('(?<={x} )[0-9]+')))/classAlignment$level)
    return(classProportion)
})


classAlignments = data.frame(`Good/Evil` = classGood,`Chaotic/Lawful` = classLawful,
                             Class = legitClasses,N = classN,
                             check.names = FALSE)

classAlignments %>% ggplot(aes(y = `Good/Evil`,x = `Chaotic/Lawful`, color = Class,hede = N)) + geom_point() +
    scale_y_continuous(breaks = c(1,2,3),
                       labels = c('E','N','G'),limits = c(1,3)) +
    scale_x_continuous(breaks = c(1,2,3),
                       labels = c('C','N','L'),limits = c(3,1), trans = 'reverse')+
    ylab('Good/Evil') +
    xlab('Chaotic/Lawful') +
    scale_color_manual(values = barPalette) ->p

ply = plotly::ggplotly(p) %>% layout(xaxis=list(fixedrange=TRUE)) %>%
    config(displayModeBar = F) %>%
    layout(yaxis=list(fixedrange=TRUE))
ply$width = 400
ply$height = 300
# rmarkdown seems to ignore alignment set using plotly
div(ply,align = 'center')

```

Darn!
Most of the space in this graph is wasted. Even the good old Paladin has a
chaotic tendency. It seems 5e really helped players break with tradition. Meanwhile, Warlock is, predictably, the evilest class.

We can also
do the same for backgrounds. Since a background probably says more about a character's
backstory than a class does, we might get more information.


```{r backGroundAlignment}

# mean alignment scores for every value of `property` seen more than minRepresentation times.
# `table` is a data frame here; `%>% table` below still calls base::table because R
# skips non-function objects when looking up a function name.
getMeanAlignments = function(table, property, minRepresentation = 3){
    uniqueThing = table[[property]] %>% table %>% {.[.>minRepresentation]} %>% names
    goodThing = uniqueThing %>% sapply(function(x){
        thingAlignment = table[table[[property]] %in% x,]
        good = thingAlignment %$% good
        mean(good %>% as.integer)
    })
    lawfulThing = uniqueThing %>% sapply(function(x){
        thingAlignment = table[table[[property]] %in% x,]
        lawful = thingAlignment %$% lawful
        mean(lawful %>% as.integer)
    })

    thingCount = uniqueThing %>% sapply(function(x){
        table[table[[property]] == x,] %>% nrow
    })

    thingAligment = data.frame(`Good/Evil` = goodThing,`Chaotic/Lawful` = lawfulThing,
                               thing = uniqueThing,N = thingCount,
                               check.names = FALSE)
    names(thingAligment)[3] = property
    return(thingAligment)
}

backgroundAlignment = getMeanAlignments(alignmentTable,property = 'background')

names(backgroundAlignment)[3] = 'Background'

backgroundAlignment %>% ggplot(aes(y = `Good/Evil`,x = `Chaotic/Lawful`, color = Background,hede = N)) + geom_point() +
    scale_y_continuous(breaks = c(1,2,3),
                       labels = c('E','N','G'),limits = c(1,3)) +
    scale_x_continuous(breaks = c(1,2,3),
                       labels = c('C','N','L'),limits = c(3,1), trans = 'reverse')+
    ylab('Good/Evil') +
    xlab('Chaotic/Lawful') ->p# +
    # scale_color_manual(values = barPalette)

ply = plotly::ggplotly(p) %>%
    layout(xaxis=list(fixedrange=TRUE)) %>%
    config(displayModeBar = F) %>%
    layout(yaxis=list(fixedrange=TRUE))
ply$width = 550
ply$height = 300
# rmarkdown seems to ignore alignment set using plotly
div(ply,align = 'center')

```

This looks better. At the extremes, we have Knights, who tend to be lawful; Folk Heroes
and Hermits on the good side; Bounty Hunters, Charlatans and Urchins on the chaotic side; and Criminals
as the only background on the evil side of Neutral on the Good/Evil axis.

Obviously, the next logical step is racial profiling.

```{r raceAlignment}

raceAlignment = getMeanAlignments(alignmentTable,property = 'processedRace')


names(raceAlignment)[3] = 'Race'

raceAlignment %<>% filter(Race !='')


raceAlignment %>% ggplot(aes(y = `Good/Evil`,x = `Chaotic/Lawful`, color = Race,hede = N)) +
    geom_point() +
    scale_y_continuous(breaks = c(1,2,3),
                       labels = c('E','N','G'),limits = c(1,3)) +
    scale_x_continuous(breaks = c(1,2,3),
                       labels = c('C','N','L'),limits = c(3,1), trans = 'reverse')+
    ylab('Good/Evil') +
    xlab('Chaotic/Lawful') +
    scale_color_manual(values = barPalette) ->p

ply = plotly::ggplotly(p) %>% layout(xaxis=list(fixedrange=TRUE)) %>%
    config(displayModeBar = F) %>%
    layout(yaxis=list(fixedrange=TRUE))
ply$width = 500
ply$height = 300
# rmarkdown seems to ignore alignment set using plotly
div(ply,align = 'center')

```

Take that, racism!
Half-Orcs tend to be nicer characters than Humans (disclaimer: like most, if not all, of the one-to-one comparisons you can make here, the difference between Half-Orc and Human "goodness" is not statistically significant, p = `r alignmentTable %>% filter(processedRace %in% c('Human','Half-Orc')) %>% mutate(good = as.integer(good)) %>% lm(good~processedRace,data=.) %>% summary %$% coefficients %>% {.[2,4]} %>% round(digits = 2)`). Alas, Tieflings
are as close to Chaotic Stupid as their stereotype suggests.

## Are your feat choices rare?

Jeremy Crawford once [tweeted](https://twitter.com/jeremyecrawford/status/969020122177331200?lang=en):

> Another piece of D&D data: a majority of D&D characters don't use feats. Many players love the customization possible with feats, but a larger group of players is happy to make characters without feats. Feats are, therefore, not a driving force behind many players' choices.

Let's see whether our data agrees. At first glance, `r round(sum(uniqueTable$feats!='')/nrow(uniqueTable)*100)`% of all characters
have at least one feat. However, this figure is held down partly because a large portion (`r round(sum(uniqueTable$level %in% c(1,2,3))/nrow(uniqueTable)*100)`%)
of our characters are between levels 1 and 3, and unless they are variant humans, they cannot have feats yet. At higher levels, feat adoption rates increase markedly, suggesting that once given the opportunity, players are likely to pick a feat.
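The level breakdown in the next chunk is the share of characters with a non-empty `feats` field within each level band. Below is a base-R sketch on made-up data. The band edges are an assumption: the document's `levelGroup` column is computed elsewhere. They were chosen so that bands 2 to 4 span levels 4-15, matching the text.

```r
# Hypothetical characters: level and '|'-separated feats ('' = no feat).
chars <- data.frame(
    level = c(1, 2, 3, 4, 5, 8, 12, 16, 20, 3),
    feats = c("", "", "Lucky", "Alert", "", "Sharpshooter|Lucky",
              "War Caster", "Alert", "Lucky|Tough", ""),
    stringsAsFactors = FALSE
)

# Assumed level bands; cut() uses right-closed intervals: (0,3], (3,7], ...
levelGroup <- cut(chars$level, breaks = c(0, 3, 7, 11, 15, 20),
                  labels = c("1-3", "4-7", "8-11", "12-15", "16-20"))

hasFeat  <- chars$feats != ""
featRate <- tapply(hasFeat, levelGroup, mean) * 100   # % with at least one feat
round(featRate)
```

Taking the mean of a logical vector gives the proportion of `TRUE` values, so no explicit counting is needed.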


```{r featProportions,fig.height=4.3}
uniqueTable %>%
    filter(!is.na(levelGroup)) %>%
    group_by(levelGroup) %>%
    # label each level group with its sample size
    mutate(levelGroup2 = paste0(levelGroup,'\n(',n(),' chars)')) %>%
    ungroup() %>%
    arrange(levelGroup) %>%
    mutate(levelGroup2 = factor(levelGroup2, levels = unique(levelGroup2))) %>%
    group_by(levelGroup2) %>%
    summarise(featPopularity = sum(feats!='')/n()*100) %>%
    ggplot(aes(x = levelGroup2,y = featPopularity)) +
    geom_text(aes(label = paste(round(featPopularity),'%')),vjust=-0.25) +
    geom_bar(stat = 'identity') +
    ylab('% with at least one feat') + xlab('Level Interval') +
    ggtitle('Feat adoption by character levels')

# level groups 2-4 cover levels 4-15
commonPlayTable = uniqueTable %>% filter(as.integer(levelGroup) %in% c(2,3,4))
commonPlayFeatRate = commonPlayTable %>% {sum(.$feats!='')/nrow(.)}

```

It seems reasonable to assume that players spend most of their time between levels 4 and 15. `r round(commonPlayFeatRate*100)`% of all characters
in this range have at least one feat. As I later discovered, this is roughly in line with the
[data from DnDBeyond](https://twitter.com/BadEyeAdam/status/969435420676231169), though the percentages
here are higher overall.

**Note:** I am getting messages about how this clearly shows that Crawford was super wrong.
That's not very accurate. It is true that my data shows a higher proportion of feat
adoption than the DnDBeyond data; however, we cannot conclusively reject the statement
"a majority of D&D characters don't use feats" because of possible sampling errors. If we
only consider the level 4-15 interval, our sample size is `r nrow(commonPlayTable)`.
Based on this, we have a `r round(sqrt( commonPlayFeatRate*(1- commonPlayFeatRate)/nrow(commonPlayTable)) * 1.96*100)`% margin of error (95% confidence) on that `r round(commonPlayFeatRate*100)`%.
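The margin of error quoted in the note uses the usual normal-approximation formula, 1.96 × sqrt(p(1 − p)/n). The sketch below applies it to made-up counts (not the document's numbers) and shows base R's exact `binom.test()` interval alongside. With these hypothetical numbers the whole interval sits above 50%. Only in that situation would a sampled rate contradict a "majority without feats" claim, and only for the population the sample actually represents.

```r
# Normal-approximation (Wald) margin of error for a proportion p estimated from n
# observations; z = 1.96 gives 95% confidence.
marginOfError <- function(p, n, z = 1.96) {
    z * sqrt(p * (1 - p) / n)
}

# Hypothetical numbers, not the document's: 120 of 200 characters have a feat.
featUsers <- 120
n <- 200
p <- featUsers / n
moe <- marginOfError(p, n)
waldInterval <- c(lower = p - moe, upper = p + moe)

# Exact (Clopper-Pearson) interval from base R, for comparison.
exactInterval <- binom.test(featUsers, n)$conf.int

waldInterval
exactInterval
```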

The next step is to examine which classes pick which feats, and which feats
are the most popular. The graph below shows how often each feat is selected and by which
class. Multiclassed characters are merged into their own category to reduce clutter.
Feats selected two times or fewer are removed. Again, mouse over the bars to see the details.

```{r featBar}
# characters with at least one feat; all multiclassed characters share one 'Multiclassed' category
featedChars = uniqueTable %>%
    filter(feats!='') %>%
    mutate(justClass = {justClass[grepl('\\|',justClass)] = 'Multiclassed';justClass}) %>%
    filter(justClass %in% names(which(table(justClass)>1)))
class = featedChars$justClass
feats = featedChars$feats

uniqueFeats = feats %>% str_split('\\|') %>% unlist %>% unique %>% na.omit()

# one list element per character, named by its class
featPicks = feats %>% str_split('\\|')

names(featPicks) = class

featFrame =
    featPicks %>% melt %>% {names(.) = c('Feat','Class');.} %>%
    mutate(Feat = factor(Feat,levels = names(sort(table(Feat),decreasing = TRUE)))) %>%
    filter(Feat %in% names(which(table(Feat)>2))) %>% group_by(Feat,Class) %>% summarize(Count = n())


featFrame %>%
    ggplot(aes(x = Feat,y = Count, fill = Class)) +
    geom_bar(stat = 'identity') +
    xlab('') +
    theme(axis.text.x = element_text(angle = 90,vjust = 0.5,hjust = 1 )) +
    scale_fill_manual(values = barPalette)+
    ggtitle('Feat popularity and class preference')->p

ply = plotly::ggplotly(p) %>% config(displayModeBar = F) %>% layout(xaxis=list(fixedrange=TRUE)) %>% layout(yaxis=list(fixedrange=TRUE))

ply$height = 500

div(ply,align='center')
# version of this code that splits multiclasses into components.
# results in an ugly graph
# singleClassed %>% filter(!is.na(feats))

#
# feats = uniqueTable$feats %>% str_split('\\|') %>% unlist %>% na.omit%>%unique
#
# classes = uniqueTable$justClass %>% str_split('\\|') %>% unlist %>% unique
# featCoOccurence = matrix(0,nrow = length(feats),ncol = length(classes))
#
# for (i in seq_along(feats)){
#     for (j in seq_along(classes)){
#         ((grepl(feats[i],uniqueTable$feats,perl= TRUE)) * {
#             classLevel =str_extract(uniqueTable$class,glue('(?<={classes[j]} )[0-9]+')) %>% {.[is.na(.)] = 0;.} %>% as.integer()
#             classLevel/uniqueTable$level
#         }) %>% sum -> coOcc
#         featCoOccurence[i,j] = coOcc
#     }
# }
#
# colnames(featCoOccurence) = classes
# rownames(featCoOccurence) = feats
#
# featCoOccurence = featCoOccurence[,!featCoOccurence %>% apply(2,sum) %>% {.<2}]
# featCoOccurence = featCoOccurence[!featCoOccurence %>% apply(1,sum) %>% {.<1},]
#
# featFrame = featCoOccurence %>% melt %>% filter(value!=0) %>% {names(.)=c('Feat','Class','Count');.}
# popFeat = featFrame %>% group_by(Feat) %>% summarize(total = sum(Count)) %>% arrange(desc(total)) %>% filter(total>1)
# featFrame %<>%
#     filter(Feat %in% popFeat$Feat) %>%
#     mutate(Feat = factor(Feat,levels = popFeat$Feat),
#            Class = factor(Class, levels = sort(as.character(unique(Class)))))
#
#
# featFrame %>%
#     ggplot(aes(x = Feat,y = Count, fill = Class)) + geom_bar(stat = 'identity') +
#     xlab('') +
#     theme(axis.text.x = element_text(angle = 90,vjust = 0.5,hjust = 1 )) +
#     scale_fill_manual(values = c('#7DD4A6','#C15BC5','#D65242','#415455','#D2A75C','#8FD25B',
#                                  '#D15B86','#A5B5BE','#727EC6','#567441','#754334','#5E3A60'))+
#     ggtitle('Feat popularity and class preference')->p
#
# ply = plotly::ggplotly(p)
#
# ply$height = 500
#
# div(ply,align='center')

```

It is surprising that Elven Accuracy, a feat added in a supplement and restricted to elves and half-elves, is as
popular as many core book feats known to be highly effective. `r uniqueTable %>% filter(grepl('elf|Variant',race,ignore.case = TRUE)) %$% feats %>% {grepl('Elven A',.)} %>% {sum(.)/length(.)*100} %>% round`% of all elves and half-elves have this feat. Its appeal to both
ranged weapon attackers and spellcasters seems to make it a good choice for elves from many walks
of life. Another interesting bit
is that the Magic Initiate feat seems to be very popular among classes that already have spellcasting. I was
always under the impression that Magic Initiate's main use case would be to add some magic to a mundane
class.

We can also look into how feats synergize with each other. The network below shows how often feats
are selected together. Pairs of feats seen together only once are removed. Node sizes represent how often a feat appears together with other feats. The thickness of the line between two nodes is determined
by the number of characters in which both feats appear.
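The edge weights described here can be computed without the double `grepl()` loop in the `featNetwork` chunk. Instead, you build a character-by-feat incidence matrix and take its cross-product. Below is a base-R sketch on hypothetical feat lists. One difference: the chunk also subtracts its threshold of 1 from every count before plotting. This sketch keeps the raw counts and only zeroes out pairs seen once.

```r
# Hypothetical feat lists, one '|'-separated string per character.
featLists <- c("Crossbow Expert|Sharpshooter", "Sentinel|Polearm Master",
               "Crossbow Expert|Sharpshooter|Lucky", "Elven Accuracy|Sharpshooter",
               "Sentinel|Polearm Master", "Lucky")

featSets <- strsplit(featLists, "|", fixed = TRUE)
allFeats <- sort(unique(unlist(featSets)))

# character x feat incidence matrix: 1 if the character took the feat
incidence <- t(vapply(featSets, function(f) as.integer(allFeats %in% f),
                      integer(length(allFeats))))
colnames(incidence) <- allFeats

# crossprod() counts, for every pair of feats, the characters that took both;
# the diagonal would hold each feat's own total, which the network ignores
coOccurrence <- crossprod(incidence)
diag(coOccurrence) <- 0

# keep only pairs seen at least twice
coOccurrence[coOccurrence < 2] <- 0
coOccurrence
```

Only Crossbow Expert-Sharpshooter and Sentinel-Polearm Master survive in this toy example. The resulting matrix is symmetric, so it can be passed straight to an undirected graph constructor.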

```{r featNetwork,fig.width=7.5,fig.height=7.5}
# feat lists of characters with at least two feats
featCoOccurence = uniqueTable %>% filter(grepl("\\|",feats)) %$% feats
uniqueFeats = featCoOccurence %>% strsplit('\\|') %>% unlist %>% table %>% sort(decreasing = TRUE) %>%
    {.[.>0]}%>% names
adjMatrix = matrix(0,nrow= length(uniqueFeats),ncol = length(uniqueFeats))


# adjMatrix[i,j]: number of characters that have both feat i and feat j
for (i in seq_along(uniqueFeats)){
    for(j in seq_along(uniqueFeats)){
        if(i !=j){
            feati = grepl(x = featCoOccurence,
                          pattern = paste0('(\\||^)',uniqueFeats[i], ('(\\||$)')))
            featj = grepl(x = featCoOccurence,
                          pattern = paste0('(\\||^)',uniqueFeats[j], ('(\\||$)')))

            adjMatrix[i,j] = sum(feati & featj)
        }
    }
}
# break feat names into two lines for the plot labels
uniqueFeats %<>% str_replace(' ','\n')

rownames(adjMatrix) = uniqueFeats
colnames(adjMatrix) = uniqueFeats

# subtract the threshold so that pairs seen only once drop to 0 and disappear
threshold = 1
adjMatrix = adjMatrix-threshold
adjMatrix[adjMatrix < 1] = 0
# drop feats that are left without any connection
zeroFilter = adjMatrix %>% apply(1,sum) %>% {.!=0}
adjMatrix = adjMatrix[zeroFilter,zeroFilter]


network=graph_from_adjacency_matrix( adjMatrix, weighted=T, mode="undirected", diag=F)
E(network)$width <- E(network)$weight*2.5

maxWeight = E(network)$weight %>% max
maxStrength = strength(network) %>% max
par(mar=c(0,0,1,0))

set.seed(9) # makes the randomized layout reproducible
plot(network,
     vertex.frame.color="white",
     vertex.label.color="black",
     vertex.size = strength(network)*1.5,
     main = 'Feat synergy network',
     asp = 1)
```

Before I say anything, I should point out that the connections in this graph aren't particularly
strong. There are
too many feats and too few characters for any particular pair of feats to appear together often.
The strongest link in this graph is based on `r max(adjMatrix)+1` observations.

Yet, as it stands, the connections seem quite intuitive, so we are probably not staring at noise here.
The versatility of Elven Accuracy is visible in this graph, as it is selected both by
characters optimizing their ranged attacks and by those optimizing their spell attacks.
Crossbow Expert-Sharpshooter is known to be an effective combination to boost damage.
Sentinel-Polearm Master is amazing for battlefield control.

## Is your multiclass combination rare?

Since our dataset includes multiclassed characters, we can see which classes tend to appear
together. Note that our sample size is much smaller here (`r nrow(multiClassed)` characters). Node sizes in the
network below show how many times a class appears across all multiclassed characters. The thickness of the line between two nodes
is determined by the number of characters in which both classes appear. For instance, we see that most Rangers
multiclass with Rogues, while most Rogues multiclass with Fighters.
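Node sizes in the `multiClassingNetwork` chunk come from `igraph::strength()`, which sums the weights of the edges touching a node. The base-R sketch below, on hypothetical builds, shows what that sum counts. For two-class builds it equals the number of multiclassed characters with that class. A three-class build adds 2 to each of its classes.

```r
# Hypothetical multiclassed builds in the document's 'Class|Class' format.
builds <- c("Fighter|Rogue", "Ranger|Rogue", "Fighter|Rogue", "Paladin|Warlock",
            "Cleric|Fighter|Wizard")

classSets <- strsplit(builds, "|", fixed = TRUE)
classes <- sort(unique(unlist(classSets)))

# character x class incidence matrix
incidence <- t(vapply(classSets, function(cl) as.integer(classes %in% cl),
                      integer(length(classes))))
colnames(incidence) <- classes

adjacency <- crossprod(incidence)  # characters sharing each pair of classes
diag(adjacency) <- 0               # no self-loops, as with diag = FALSE in igraph

# weighted degree ("strength"): sum of edge weights at each class node
classStrength <- rowSums(adjacency)
classStrength
```

Fighter appears in three of these builds but has a strength of 4, because the Cleric/Fighter/Wizard build connects it to two other classes.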

```{r multiClassingNetwork}

coOccurence = multiClassed$justClass
# in case I need them ordered
uniqueClasses = coOccurence %>%
    strsplit('\\|') %>%
    unlist %>%
    table %>%
    sort(decreasing = TRUE) %>%
    names
uniqueClasses = uniqueClasses[uniqueClasses %in% legitClasses]

adjMatrix = matrix(0,nrow= length(uniqueClasses),ncol = length(uniqueClasses))

# adjMatrix[i,j]: number of multiclassed characters with levels in both class i and class j
for (i in seq_along(uniqueClasses)){
    for(j in seq_along(uniqueClasses)){
        if(i !=j){
            adjMatrix[i,j] = sum(grepl(x = coOccurence,pattern = uniqueClasses[i]) & grepl(x = coOccurence,pattern = uniqueClasses[j]))
        }
    }
}
rownames(adjMatrix) = uniqueClasses
colnames(adjMatrix) = uniqueClasses
network=graph_from_adjacency_matrix( adjMatrix, weighted=T, mode="undirected", diag=F)
E(network)$width <- E(network)$weight

maxWeight = E(network)$weight %>% max
maxStrength = strength(network) %>% max
par(mar=c(0,0,1,0))

plot(network,layout = layout_in_circle,
     vertex.frame.color="white",
     vertex.label.color="black",
     vertex.size = strength(network),
     main = 'Multiclassing network',
     asp = 1
)


```

While this network is good at showing which classes tend to be chosen together, it doesn't
say much about how levels are split between them. The graph below shows
the ratio of class levels within individual characters. A Fighter 5/Rogue 15 would appear
as a 25% data point in the Fighter column and a 75% data point in the Rogue column. This
tells us which classes are dipped into and which ones are used as the main class.
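Here is a minimal sketch of that ratio for a single character. It assumes class strings look like `Fighter 5|Rogue 15`. The `<Class> <level>` pairs match what the document's regular expressions extract, but the `|` separator is a guess, and the `classLevelShares()` helper is hypothetical. The `multiClassingProportions` chunk divides by the `level` column, whereas this sketch divides by the sum of the listed class levels. The two agree whenever every class is listed.

```r
# Hypothetical helper: fraction of a character's total level that comes from each class.
classLevelShares <- function(classString) {
    parts <- strsplit(classString, "|", fixed = TRUE)[[1]]
    classNames  <- sub(" [0-9]+$", "", parts)                      # e.g. "Fighter", "Rogue"
    classLevels <- as.integer(sub("^.* ([0-9]+)$", "\\1", parts))  # e.g. 5, 15
    setNames(classLevels / sum(classLevels), classNames)
}

classLevelShares("Fighter 5|Rogue 15")   # c(Fighter = 0.25, Rogue = 0.75)
classLevelShares("Cleric 1|Fighter 7")   # c(Cleric = 0.125, Fighter = 0.875)
```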

```{r multiClassingProportions}

# for each class: share of a multiclassed character's total level that comes from that class
multiClassProportion = lapply(uniqueClasses, function(x){
    classSubset = multiClassed %>% filter(grepl(x,justClass))

    classLevel = classSubset$class %>%
        str_extract(glue('{x} [0-9]+')) %>%
        str_extract('[0-9]+') %>%
        as.integer

    classLevel/classSubset$level

})

# total character levels, in the same order (used for the hover text)
multiClassTotalLevel = lapply(uniqueClasses, function(x){
    classSubset = multiClassed %>%
        filter(grepl(x,justClass))
    classSubset$level
})

# full class strings, in the same order (used for the hover text)
multiClassChar = lapply(uniqueClasses, function(x){
    classSubset = multiClassed %>%
        filter(grepl(x,justClass))
    classSubset$class
})

names(multiClassProportion) = uniqueClasses
names(multiClassTotalLevel) = uniqueClasses

multiClassProportion %<>%
    melt
# order classes by their mean share
order = multiClassProportion %>%
    group_by(L1) %>%
    summarise(mean = mean(value)) %>%
    arrange(desc(mean)) %$% L1

multiClassProportion$L1 %<>% factor(levels = order)
multiClassTotalLevel %<>% melt
multiClassChar %<>% melt
multiClassProportion = cbind(multiClassProportion,multiClassTotalLevel$value,multiClassChar$value)

names(multiClassProportion) = c('ClassProp','Class','Level','Char')


multiClassProportion %<>% mutate(ClassProp = round(ClassProp * 100,digits = 2))

multiClassProportion %>%
    ggplot(aes(x = Class, y = ClassProp, label = Char)) +
    geom_violin(color = "#C4C4C4", fill = "#C4C4C4") +
    geom_jitter(alpha = .5,width = 0.1) +
    theme(axis.text.x = element_text(angle = 90,vjust = 0.5,hjust = 1 )) +
    ylab('% of character level') +
    xlab('') ->p

ply = ggplotly(p) %>% layout(xaxis=list(fixedrange=TRUE)) %>%
    config(displayModeBar = F) %>%
    layout(yaxis=list(fixedrange=TRUE))
ply$width = 600
ply$x$data[[2]]$text =
    multiClassProportion$Char
ply$x$data[[1]]$hoverinfo = 'none'
div(ply,align = 'center')

```

While there is a lot of variation in the data, some conventional wisdom
shows up in the means. Warlock is famous for its dipping potential, and a Cleric
level synergizes nicely with many other class features. I am a proud player of a Fighter
with a Cleric dip myself. I would avoid reading too much into this, though. The variance
is too high and the sample size too low to make reliable inferences.

And finally, let's see which classes tend to appear in multiclassed builds compared
to single-classed ones.

```{r mutliVsSingle}

# number of characters with levels in each class, overall and among multiclassed characters
totalClass = uniqueClasses %>% sapply(function(x){grepl(x,uniqueTable$justClass) %>% sum})
multiClass = uniqueClasses %>% sapply(function(x){grepl(x,multiClassed$justClass) %>% sum})

# share of each class's characters that are multiclassed
multiProps = sort(multiClass/totalClass,decreasing = TRUE)

data.frame(Class = factor(names(multiProps),levels = names(multiProps)),Prop = multiProps*100) %>%
    ggplot(aes(x = Class, y= Prop)) +
    geom_bar(stat = 'identity') +
    xlab('') +
    geom_text(aes(label = paste(round(Prop),'%')),vjust=-0.25) +
    theme(axis.text.x = element_text(angle = 90,vjust = 0.5,hjust = 1 )) + ylab('% in multiclassed build')
```


## Is power gaming rare?

```{r multiClassingAndFeats}
highLevel = uniqueTable %>% filter(levelGroup %>% as.integer %>% {.>1})
HLmultiClassed = highLevel %>% filter(grepl('\\|',justClass))
HLsingleClassed = highLevel %>% filter(!grepl('\\|',justClass))

singleClassedFeaters = sum(HLsingleClassed$feats!='')
multiClassedFeaters = sum(HLmultiClassed$feats!='')

# one-sided hypergeometric test: if nrow(HLmultiClassed) characters were drawn at random from
# all characters above level 3, how likely would at least multiClassedFeaters of them have a feat?
highLevelFeaters = multiClassedFeaters + singleClassedFeaters
highLevelNonFeaters = nrow(highLevel) - highLevelFeaters
pVal = phyper(multiClassedFeaters - 1,   # P(X >= observed) = P(X > observed - 1)
              highLevelFeaters,           # characters with a feat
              highLevelNonFeaters,        # characters without a feat
              nrow(HLmultiClassed),       # number of multiclassed characters drawn
              lower.tail = FALSE, log.p = FALSE)
```

OK, that title is a stretch, but we have a format to stick to.

Both multiclassing and picking feats are somewhat advanced character building rules.
While they complicate the character building process, they can be used to create frighteningly
effective combinations (or leave you waiting until the end of the campaign before your build gets
everything it needs). Intuitively, it wouldn't be surprising if multiclassers were more likely
to take feats to optimize their builds. Indeed, we see that `r round(multiClassedFeaters/nrow(HLmultiClassed)*100)`%
of multiclassed characters above level 3 chose to take a feat, as opposed to
`r round(singleClassedFeaters/nrow(HLsingleClassed)*100)`% of their single-classed counterparts,
a modest difference (one-sided hypergeometric test, p=`r format.pval(pVal,digits = 2)`).

## Are your spells rare?

Like alignment, spells were annoying to deal with. The app only allows writing free
text as spells and doesn't automatically fill in anything other than cleric domain spells.
Some casters don't seem to bother filling in anything, and when they do, they sometimes
shorten the name of the spell or add things like damage dice next to it. Thanks to
Thanks to 851 | some computer magic (string distances to all existing spell names), we can identify 852 | what they are trying to say with a satisfying accuracy. The low level heavy nature of the 853 | dataset also strikes again as higher level spells appear less and less frequent. 854 | 855 | Below you see how frequently a spell is chosen by each class. Spells chosen 856 | by less than 3 people are removed. I also totally ignored multiclassed characters 857 | here because I'm not going to bother with trying to decide which spell came from 858 | which class. 859 | 860 | If you don't see high level spells that means not enough people agreed on any particular 861 | spells to make it to the table 862 | 863 | ```{r spells} 864 | # for (x in c('Wizard','Cleric','Sorcerer','Druid','Warlock','Bard')){ 865 | c('Wizard','Cleric','Sorcerer','Druid','Warlock','Bard') %>% lapply(function(x){ 866 | 867 | classFrame = singleClassed[singleClassed$justClass %in% x,] 868 | 869 | spellNames = classFrame %$% processedSpells %>% strsplit('\\|') %>% map(str_extract,'.*?(?=\\*)') %>% unlist 870 | spellLevels = classFrame %$% processedSpells %>% strsplit('\\|') %>% map(str_extract,'(?<=\\*).*') %>% unlist %>% as.integer 871 | 872 | levelCount = classFrame %$% processedSpells %>% strsplit('\\|') %>% map(str_extract,'(?<=\\*).*') %>% map(unique) %>% unlist %>% as.integer() %>% table 873 | 874 | 875 | frame = data.frame(spellNames,spellLevels,levelCount = as.integer(levelCount[spellLevels %>% as.character]),stringsAsFactors = FALSE) %>% 876 | arrange(spellLevels) %>% group_by(spellNames,spellLevels,levelCount) %>% summarise(count = n()) %>% ungroup() %>% arrange(spellLevels,desc(count)) %>% 877 | mutate(`%` = round(count/levelCount*100)) %>% filter(levelCount>1 & `%` >10 & count>2) 878 | 879 | groupSep = frame$spellLevels %>% duplicated() %>% not %>% which 880 | groupSep = c(groupSep,nrow(frame)) 881 | groupLevels= frame$spellLevels %>% unique %>% paste('Level',.) 
    groupLevels[groupLevels %in% 'Level 0'] = 'Cantrip'

    frame %<>% select(spellNames,count,`%`)

    kbl = kable(frame,caption = x,format = 'html') %>%
        kable_styling("striped", full_width = F)

    for(i in seq_along(groupLevels)){
        kbl %<>% group_rows(groupLevels[i],groupSep[i],groupSep[i+1]-1)
    }

    kbl %>% scroll_box(width = "100%", height = "250px") %>% HTML()
}) -> tables


# lay the six tables out in two rows of three
div(
    fluidRow(
        column(4,
               tables[[1]]),
        column(4,
               tables[[2]]),
        column(4,
               tables[[3]])),
    fluidRow(
        column(4,
               tables[[4]]),
        column(4,
               tables[[5]]),
        column(4,
               tables[[6]])

    )
)

```

## Is your game day rare?

My applications are purely utilitarian. One gives you
a character sheet; the other is an interactive character sheet that automates your dice rolls.
It is reasonable to think that most people would be using them shortly before or during a game. The graphs below show
how many characters were created on each day of the week, and below that there's a punch card that
shows individual hours.
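The two plots in the `gameDay` chunk are a weekday tally and a weekday-by-hour tally. Below is a base-R sketch on made-up timestamps. It uses the `$wday` field of `POSIXlt` (0 = Sunday) instead of `weekdays()`. The reason is that `weekdays()` returns day names in the session's language, so in a non-English locale every value would become `NA` when matched against English factor levels. The time zone is fixed to UTC as an assumption. As in the document, no per-user time zone correction is made.

```r
# Hypothetical submission times (2018-04-20 was a Friday).
stamps <- as.POSIXct(c("2018-04-20 18:30:00", "2018-04-21 14:05:00",
                       "2018-04-21 19:45:00", "2018-04-23 09:10:00"), tz = "UTC")

lt <- as.POSIXlt(stamps)
# $wday runs from 0 (Sunday) to 6 (Saturday) regardless of locale
dayNames <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
day <- factor(dayNames[lt$wday + 1],
              levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                         "Friday", "Saturday", "Sunday"))
hour <- lt$hour

dayCounts <- table(day)         # bar chart: characters per weekday
punchCard <- table(day, hour)   # punch card: weekday x hour counts
dayCounts
```

Keeping all seven days as factor levels means days with no submissions still show up with a count of zero.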


```{r gameDay,fig.height=8}
# keep only dates after 2018-04-16; earlier dates are unreliable
reliableDateTable = uniqueTable %>% filter(as.Date(date) > as.Date('2018-04-16'))

# weekday name and hour of day (0-23) of each submission, as recorded (no time zone correction)
days = reliableDateTable$date %>% weekdays()
hours = as.POSIXlt(reliableDateTable$date)$hour

time = data.frame(days = factor(days, levels = c('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday')), hours = hours)

# punch card: one point per weekday/hour, sized by the number of characters
time %>% group_by(days,hours) %>% summarise(Characters = n()) %>%
    ggplot(aes(x = days,y = hours,size = Characters)) + geom_point() +
    theme(axis.text.x = element_text(angle = 90,vjust = 0.5,hjust = 1 ),
          plot.margin = unit(c(0,0,0,0),'cm')) +
    xlab('') +
    ylab('Hour of Day') -> plot1

# bar plot of submissions per weekday
time %>% group_by(days) %>% summarise(Characters = n()) %>%
    ggplot(aes(x = days,y = Characters)) + geom_bar(stat ='identity') +
    theme(axis.text.x = element_blank(),
          axis.ticks.x = element_blank(),
          plot.margin = unit(c(0,0,0,0),'cm'))+
    xlab('') + ggtitle('Time of character submission') -> plot2

# stack the bar plot on top of the punch card
plot2/plot1 + plot_layout(ncol = 1, heights = c(3,5))

```

Frankly, there's not much to say here. The most popular days of the week are, obviously, the weekend and Friday. DnD
takes time. More work = less DnD.
Hours of day are somewhat unreliable as I didn't correct for user time zones. Times across the US alone, which seems to be
where most of my users are coming from, can differ by 3 hours. I could use IPs and detect locations to fix times, but I'm not
going down that rabbit hole... How long before the game a player may want their character
sheet is also a great source of variability. I mostly did this because I like punch cards...

## About the data

Unique characters are obtained by grouping the characters that share the same name and class
and picking the highest level version.
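
The name-and-class grouping can be sketched in base R as follows. This is only an illustration under assumptions, not the code in dataProcess.R. It groups on the **name** hash and **justClass** columns, uses a hypothetical toy table and skips the multiclass rule.

```r
# Hypothetical toy table: columns follow the data description
# (name hash, justClass, level) but the values are made up
chars = data.frame(
    name      = c("h1", "h1", "h2", "h3", "h3"),
    justClass = c("Fighter", "Fighter", "Wizard", "Rogue", "Cleric"),
    level     = c(3, 5, 1, 4, 2),
    stringsAsFactors = FALSE
)

# Sort so that within each name/class pair the highest level comes first
ordered = chars[order(chars$name, chars$justClass, -chars$level), ]

# Keep only the first (highest level) row of each name/class pair
uniqueChars = ordered[!duplicated(ordered[, c("name", "justClass")]), ]
print(uniqueChars)
```

`h3` survives twice because its two rows have different classes, so they are treated as different characters.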
Grouping by name and class could have merged independent characters with tropey names
like Grognak the Barbarian or Drizzt the Ranger, but manual examination of the data showed no cases of characters
that appear to be made by different people but still have the same name and class.

If a multiclassed character shares a name with a single-classed
character, I assume they are duplicates if the single-classed character is lower level and
its class matches one of the classes of the multiclassed character.

Characters above level 20 (there were `r sum(charTable$level > 20)`) were removed.

`r sum(grepl('Revised',keepRevised$class,ignore.case = TRUE))` Revised Rangers were merged back into
the Ranger class.

Most percentages are rounded to the nearest integer.

Like all data, this data comes with caveats. It comes from a subset of DnD players: those who use a
particular mobile application, also know about and use my applications, and consented
to let me keep their character sheets. I have no reason
to think that this subset is enriched for certain character building choices, but it's
something to keep in mind.


```{r statistics}
# fighters: any character with Fighter among its classes
fighterCount = grepl('Fighter',uniqueTable$class) %>% sum

# Battle Master share among fighters and its 95% margin of error (normal approximation)
battleMasterCount = uniqueTable$subclass %>% str_split('\\|') %>% unlist %>% {. %in% 'Battle Master'} %>% sum
battleMasterPercent = battleMasterCount/fighterCount
bmConfInf = sqrt(battleMasterPercent*(1-battleMasterPercent)/fighterCount) * 1.96

# the same for Champions
championCount = uniqueTable$subclass %>% str_split('\\|') %>% unlist %>% {. %in% 'Champion'} %>% sum
championPercent = championCount/fighterCount
cmConfInf= sqrt(championPercent*(1-championPercent)/fighterCount) * 1.96

```

In most parts of this document, no information is provided about whether the differences
are actually statistically significant.
Sorry about that. I didn't want to fill this place with
too much math. For instance, we have
`r battleMasterCount` Battle Masters
vs `r championCount` Champions. This is not a statistically significant difference given our sample size,
so we cannot state with high confidence that one is more popular than the other.

If you are interested in the significance of any of these measures, you can take a peek at this [article](https://en.wikipedia.org/wiki/Margin_of_error) on Wikipedia, where the needed formulas are explained.
For at least some of these measures, you should be able to get the information you need from the article.

If you have any questions, you can [mail me](mailto:ogan.mancarci@gmail.com). Mention "dndstats"
somewhere in the text so you won't be sent to spam.


## Data access

This dataset is available in 2 forms: the complete version, which includes duplicates
of characters, and a filtered version that only includes unique characters.

Go [here](https://github.com/oganm/dndstats/blob/master/docs/charTable.tsv) for the complete data and [here](https://github.com/oganm/dndstats/blob/master/docs/uniqueTable.tsv) for the filtered one. Click the Raw button
to get them in plain text. Both have the same columns, explained below, except **levelGroup**, which is only in the filtered data.
The code to generate these tables can be found [here](https://github.com/oganm/dndstats/blob/master/dataProcess.R).

Below are the descriptions of the columns in the files. If you think something you'd be interested
in is missing, you can let me know.

**name:** This column has hashes that represent character names. If the hashes are
the same, that means the names are the same. Real names are removed
to protect character anonymity. Yes, D&D characters have rights.

**race:** This is the race field as it comes out of the application.
It is not really
helpful, as subrace and race information are all mixed up together and unevenly available.
It also includes some homebrew content. You probably want to use the **processedRace**
column if you are interested in this.

**background:** Background as it comes out of the application.

**date:** Time & date of input. Dates before 2018-04-16 are unreliable, as some were accidentally changed
while moving files around.

**class:** Class and level. Different classes are separated by `|` when needed.

**justClass:** Class without level. Different classes are separated by `|` when needed.

**subclass:** Subclasses. Again, separated by `|` when needed.

**level:** Total character level.

**feats:** Feats chosen by the character. Separated by `|` when needed.

**HP:** Character HP.

**AC:** Character AC.

**Str, Dex, Con, Int, Wis, Cha:** Ability scores.

**alignment:** Alignment free text field. It is a mess, don't touch it. See **processedAlignment**, **good** and **lawful** instead.

**skills:** List of skills with proficiency. Separated by `|`.

**weapons:** List of weapons. Separated by `|`. It is somewhat of a mess as it allows free text inputs. See **processedWeapons**.

**spells:** List of spells and their levels. Spells are separated by `|`s. Each spell has its level next to it,
separated by a `*`. This is a huge mess as it's a free text field and some users included things like damage dice in it. See **processedSpells**.

**day:** A shortened version of **date**. Only includes day information.

**processedAlignment:** Processed version of the **alignment** column. The ways people wrote their alignments were manually sifted through and assigned to the matching alignment.
The first character represents lawfulness (L, N, C), the second one goodness (G, N, E). An empty string means the alignment wasn't written or was unclear.

**good, lawful:** Isolated columns for goodness and lawfulness.

**processedRace:** I went through the ways the app fills the **race** column and assigned them to the correct
races. If empty, it indicates a homebrew race not natively supported by the app.

**processedSpells:** Formatting is the same as the **spells** column, but it is cleaned up. Using string similarity, I tried
to match the spells to the full list of spells available in the official publications. A spell is removed if the spell I guessed does not have the correct level, or if it doesn't include all words of the original spell and has too many modifications to be recognizable. It may have a few false matches, but it should be mostly fine.

**processedWeapons:** Similar to **processedSpells**, the **weapons** column is matched to the closest official weapon with some restrictions.

**levelGroup:** Splits levels into groups as used in the feat percentage plot. Only present in the filtered data,
but easy enough to make on your own.


## About this document

The text of this document is licensed under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.

[Here](https://github.com/oganm/DnDStatistics/blob/master/docs/index.Rmd)'s its source code. It's not pretty.

The code blocks within the source code are licensed under the [MIT license](https://opensource.org/licenses/MIT).

## Changelog

**9 September 2018:**

* Data from 100 more characters added.

**19 August 2018:**

* Typo in data release fixed. The same name hash means the names are the same, not the characters.
* Alignment axes flipped again to match memes

**18 August 2018 2:**

* Fixed a bug that miscounted the percentage of people who wrote their alignments down
* Alignment axes flipped
* Disclaimer about feat adoption added

**18 August 2018:**

* Data from 82 additional characters incorporated. No significant changes observed.
* Links to the data added
* Spell information added
* Feat bar plot now filters out any feat taken fewer than 3 times instead of 2

**2 August 2018:**

* License information added.
* A forgotten word added.
* Data from 40 additional characters incorporated. No significant changes observed.
* Claim about increased decency of Half-Orcs softened
* Changelog added

**28 July 2018:**

* Initial release

--------------------------------------------------------------------------------