├── 538.tsv
├── DnDStatistics.Rproj
├── LICENSE
├── README.md
├── dataProcess.R
└── docs
    ├── 538.tsv
    ├── charTable.tsv
    ├── index.Rmd
    ├── index.html
    └── uniqueTable.tsv


/538.tsv:
--------------------------------------------------------------------------------
 1 | Race	FIGHTER	ROGUE	WIZARD	BARBARIAN	CLERIC	RANGER	PALADIN	WARLOCK	MONK	BARD	SORCERER	DRUID	TOTAL
 2 | HUMAN	4,888	2,542	2,568	1,435	2,339	1,715	2,326	1,714	1,946	1,454	1,324	996	25,248
 3 | ELF	1,242	2,257	2,744	336	921	3,076	492	755	1,349	651	841	1,779	16,443
 4 | HALF-ELF	646	1,325	611	153	628	891	817	1,401	399	1,808	1,258	516	10,454
 5 | DWARF	2,009	362	395	1,323	2,199	415	971	286	405	394	264	484	9,507
 6 | DRAGONBORN	1,335	325	346	875	510	355	1,688	584	457	371	1,031	309	8,185
 7 | TIEFLING	379	798	516	198	353	272	473	2,188	309	806	1,062	281	7,634
 8 | GENASI	580	495	558	388	459	420	322	415	750	352	648	584	5,971
 9 | HALFLING	339	1,797	257	306	308	440	207	296	551	801	310	302	5,916
10 | HALF-ORC	976	233	143	1,709	272	245	427	212	284	199	126	215	5,039
11 | GNOME	257	600	1,360	227	304	238	151	311	196	400	257	332	4,634
12 | GOLIATH	865	139	109	1,729	192	187	389	136	326	144	114	190	4,522
13 | AARAKOCRA	273	362	181	313	249	572	149	203	835	279	177	275	3,868
14 | AASIMAR	116	71	67	70	274	60	429	210	87	144	174	65	1,767
15 | TOTAL	13,906	11,307	9,855	9,063	9,009	8,887	8,840	8,711	7,892	7,804	7,587	6,328	NA


--------------------------------------------------------------------------------
/DnDStatistics.Rproj:
--------------------------------------------------------------------------------
 1 | Version: 1.0
 2 | 
 3 | RestoreWorkspace: Default
 4 | SaveWorkspace: Default
 5 | AlwaysSaveHistory: Default
 6 | 
 7 | EnableCodeIndexing: Yes
 8 | UseSpacesForTab: Yes
 9 | NumSpacesForTab: 4
10 | Encoding: UTF-8
11 | 
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 Burak Ogan Mancarci
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | 
 2 | DnD character statistics
 3 | ========================
 4 | 
 5 | This is my experiment in running some stats on DnD characters.
 6 | 
 7 | See [here](https://oganm.github.io/dndstats/) for the document.
 8 | 
 9 | 
10 | See [here](https://github.com/oganm/dndstats/blob/master/docs/index.Rmd) for the Rmd source code for the document.
11 | 
12 | The text of this document is licensed under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.
13 | 
14 | The code blocks within the source code are licensed under the [MIT license](https://opensource.org/licenses/MIT).
15 | 
16 | 
17 | ## Data access
18 | 
19 | This dataset is available in two forms: the complete version, which includes duplicate
20 | characters, and a filtered version that only includes unique characters.
21 | 
22 | Go [here](docs/charTable.tsv) for the complete data and [here](docs/uniqueTable.tsv) for the filtered one. Both have
23 | the same columns as explained below. The code to generate these tables can be found [here](https://github.com/oganm/dndstats/blob/master/dataProcess.R).
24 | 
25 | Below are the descriptions of the columns in the files. If you think something you'd be interested
26 | in is missing, you can let me know.
27 | 
28 | **name:** This column has hashes that represent character names. If two hashes are
29 | the same, the names are the same. Real names are removed
30 | to protect character anonymity. Yes, D&D characters have rights.
31 | 
32 | **race:** This is the race field as it comes out of the application. It is not very
33 | helpful, as subrace and race information are mixed together and unevenly available.
34 | It also includes some homebrew content. You probably want to use the **processedRace**
35 | column if you are interested in this.
36 | 
37 | **background:** Background as it comes out of the application.
38 | 
39 | **date:** Time & date of input. Dates before 2018-04-16 are unreliable, as some were accidentally changed
40 | while files were being moved around.
41 | 
42 | **class:** Class and level. Different classes are separated by `|` when needed.
43 | 
44 | **justClass:** Class without level. Different classes are separated by `|` when needed.
45 | 
46 | **subclass:** Subclasses. Again, separated by `|` when needed.
47 | 
48 | **level:** Total character level.
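Because **class** stores each class name followed by its level, the total in **level** can be recomputed from it. A minimal base-R sketch, assuming entries look like `Fighter 3|Wizard 2` (which is how `dataProcess.R` pastes class and level together); `parseClassField` is a hypothetical helper, not something shipped with the data:

```r
# Hypothetical helper: split a `class` entry such as "Fighter 3|Wizard 2"
# into one row per class, separating the class name from its level.
parseClassField <- function(classField) {
  parts <- strsplit(classField, "|", fixed = TRUE)[[1]]
  # Assumes the level is the trailing number after the last space, which
  # also handles multi-word class names such as "Revised Ranger 5".
  data.frame(
    class = sub("^(.*) (\\d+)$", "\\1", parts),
    level = as.integer(sub("^(.*) (\\d+)$", "\\2", parts)),
    stringsAsFactors = FALSE
  )
}

classes <- parseClassField("Fighter 3|Wizard 2")
classes$class       # "Fighter" "Wizard"
sum(classes$level)  # 5, which should agree with the `level` column
```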
49 | 
50 | **feats:** Feats chosen by character. Separated by `|` when needed.
51 | 
52 | **HP:** Character HP.
53 | 
54 | **AC:** Character AC.
55 | 
56 | **Str, Dex, Con, Int, Wis, Cha:** Ability scores.
57 | 
58 | **alignment:** Alignment free text field. It is a mess, don't touch it. See **processedAlignment**, **good** and **lawful** instead.
59 | 
60 | **skills:** List of skills with proficiency. Separated by `|`.
61 | 
62 | **weapons:** List of weapons. Separated by `|`. It is somewhat of a mess as it allows free text inputs. See **processedWeapons**.
63 | 
64 | **spells:** List of spells and their levels. Spells are separated by `|`. Each spell has its level next to it,
65 | separated by `*`. This is a huge mess as it's a free text field and some users included things like damage dice in them. See **processedSpells**.
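A sketch of pulling the `name*level` pairs back apart in base R. It assumes the `|` / `*` layout described above; `parseSpellField` is a hypothetical name, and on the raw **spells** column it will inherit whatever users typed, so **processedSpells** is the safer input:

```r
# Hypothetical helper: turn an entry such as "Fire Bolt*0|Shield*1"
# into a data frame with one row per spell.
parseSpellField <- function(spellField) {
  if (is.na(spellField) || spellField == "") {
    return(data.frame(spell = character(0), level = integer(0),
                      stringsAsFactors = FALSE))
  }
  entries <- strsplit(spellField, "|", fixed = TRUE)[[1]]
  pieces <- strsplit(entries, "*", fixed = TRUE)
  data.frame(
    spell = vapply(pieces, `[`, character(1), 1),
    # A missing level becomes NA rather than an error.
    level = as.integer(vapply(pieces, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}

parseSpellField("Fire Bolt*0|Shield*1|Fireball*3")
```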
66 | 
67 | **day:** A shortened version of **date**. Only includes day information.
68 | 
69 | **processedAlignment:** Processed version of the **alignment** column. The ways people wrote their alignments were manually sifted through and assigned to the matching alignment. The first character represents lawfulness (L, N, C), the second one goodness (G, N, E). An empty string means the alignment wasn't written or was unclear.
70 | 
71 | **good, lawful:** Isolated columns for goodness and lawfulness.
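These two columns are simply the two letters of **processedAlignment** pulled apart. `dataProcess.R` does it with lookup lists, but for the nine two-letter codes a plain `substr()` split should give the same result. A base-R sketch (`splitAlignment` is an illustrative name), using the same factor level order as the processing script:

```r
# First letter: law/chaos axis (L, N, C). Second letter: good/evil axis (G, N, E).
# Empty strings (unknown alignment) become NA in both factors.
splitAlignment <- function(processedAlignment) {
  data.frame(
    lawful = factor(substr(processedAlignment, 1, 1), levels = c("C", "N", "L")),
    good = factor(substr(processedAlignment, 2, 2), levels = c("E", "N", "G"))
  )
}

splitAlignment(c("LG", "CN", "NE", ""))
```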
72 | 
73 | **processedRace:** I have gone through the ways the **race** column is filled by the app and assigned them to the correct
74 | races. If empty, it indicates a homebrew race not natively supported by the app (or a race field that could not be matched to a single race).
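The race assignment in `dataProcess.R` tests every race regex against the raw field and keeps a race only when exactly one pattern matches; negative lookbehinds such as `(?<!Half-)Elf` stop `Half-Elf` from also counting as `Elf`. A reduced sketch with only a handful of the patterns (`racePatterns` and `matchRace` are illustrative names; the full pattern list lives in `dataProcess.R`):

```r
# A small subset of the race patterns; perl = TRUE is needed for lookbehind.
racePatterns <- c(Elf = "(?<!Half-)Elf",
                  "Half-Elf" = "(Half-Elf)|(^Variant)",
                  Orc = "(?<!Half-)Orc",
                  "Half-Orc" = "Half-Orc",
                  Human = "Human")

# Return the single race whose pattern matches, or "" when none or
# several do (e.g. a homebrew race or an ambiguous text field).
matchRace <- function(raceField, patterns = racePatterns) {
  hits <- names(patterns)[vapply(patterns, grepl, logical(1),
                                 x = raceField, perl = TRUE, ignore.case = TRUE)]
  if (length(hits) == 1) hits else ""
}

vapply(c("High Elf", "Half-Elf", "Variant", "Half-Orc", "Homebrew Catfolk"),
       matchRace, character(1))
```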
75 | 
76 | **processedSpells:** Formatting is the same as the **spells** column, but it is cleaned up. Using string similarity, I tried
77 | to match the spells to the full list of spells available in the official publications. A spell is removed if the spell I guessed does not have the correct level, or if the original entry doesn't contain all the words of the guessed spell and needed too many edits to be recognizable. It may have a few false matches, but it should be mostly fine.
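The spell clean-up is essentially a nearest-neighbour search on edit distance: `dataProcess.R` calls base R's `adist()` with insertions and deletions costing 2 and substitutions 3, then drops a match whose official level disagrees with the written one. A stripped-down sketch of that idea, assuming a tiny hypothetical reference list in place of the full `wizaRd` spell list, and leaving out the extra word-containment and edit-count checks the real script applies:

```r
# Tiny stand-in for the official spell list (name -> spell level).
officialSpells <- c("Fire Bolt" = 0L, "Shield" = 1L,
                    "Magic Missile" = 1L, "Fireball" = 3L)

# For each written spell, pick the closest official name by weighted edit
# distance, then keep it only if the official level matches the written one.
cleanSpells <- function(spellNames, spellLevels, reference = officialSpells) {
  d <- adist(tolower(spellNames), tolower(names(reference)),
             costs = list(ins = 2, del = 2, sub = 3))
  guess <- names(reference)[apply(d, 1, which.min)]
  keep <- reference[guess] == as.integer(spellLevels)
  guess[keep]
}

# "Sheild" is matched to Shield, but is dropped because it was written as level 2.
cleanSpells(c("firebal", "Magic Misile", "Sheild"), c("3", "1", "2"))
# "Fireball" "Magic Missile"
```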
78 | 
79 | **processedWeapons:** Similar to **processedSpells**, **weapons** column is matched to the closest official weapon with some restrictions.
80 | 
81 | **levelGroup:** Splits levels into the groups used in the feat percentage plot. Only present in the filtered data,
82 | but easy enough to make on your own.
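For the complete table, the same grouping can be rebuilt with base R's `cut()`. This sketch copies the breakpoints used in `docs/index.Rmd`; `levelGroup` as a function name is only for illustration. Intervals are right-closed, so level 3 lands in `1-3`, and levels above 20 become `NA`:

```r
# Group character levels using the breakpoints from docs/index.Rmd.
levelGroup <- function(level) {
  cut(level,
      breaks = c(0, 3, 7, 11, 15, 18, 20),
      labels = c("1-3", "4-7", "8-11", "12-15", "16-18", "19-20"))
}

levelGroup(c(1, 3, 4, 12, 20))
```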
83 | 


--------------------------------------------------------------------------------
/dataProcess.R:
--------------------------------------------------------------------------------
  1 | library(import5eChar) # github.com/oganm/import5eChar
  2 | library(purrr)
  3 | library(readr)
  4 | library(glue)
  5 | library(digest)
  6 | library(dplyr)
  7 | library(XML)
  8 | library(ogbox) # github.com/oganm/ogbox
  9 | library(wizaRd) # github.com/oganm/wizaRd
 10 | library(stringr)
 11 | library(memoise)
 12 | library(magrittr) # provides %<>%, which dplyr does not export
 13 | # memoImportChar = memoise(importCharacter)
 14 | # saveRDS(memoImportChar,'memoImportChar.rds')
 15 | memoImportChar = readRDS('/home/oganm/gitRepos/DnDStatistics/memoImportChar.rds')
 16 | 
 17 | # get all char files saved everywhere
 18 | charFiles = c(list.files('/srv/shiny-server/printSheetApp/chars/',full.names = TRUE),
 19 |               list.files('/srv/shiny-server/interactiveSheet/chars/',full.names = TRUE),
 20 |               list.files('/srv/shiny-server/chars',full.names = TRUE),
 21 |               list.files('/srv/shiny-server/chars2', full.names = TRUE),
 22 |               list.files('/srv/shiny-server/chars3', full.names = TRUE),
 23 |               list.files('/srv/shiny-server/chars4', full.names = TRUE))
 24 | print('reading char files')
 25 | # use import5eChar (through the memoised wrapper) to read all of them
 26 | chars = charFiles %>% lapply(function(x){
 27 |     memoImportChar(file = x)
 28 | })
 29 | saveRDS(memoImportChar,'memoImportChar.rds')
 30 | 
 31 | # get date information. dates before 2018-04-16 are not reliable 
 32 | fileInfo = file.info(charFiles) 
 33 | # get user fingerprint and IP
 34 | fileData = charFiles %>% basename %>% strsplit('_')
 35 | 
 36 | # add file and user info to the characters
 37 | print('constructing char table')
 38 | chars = lapply(seq_along(chars),function(i){
 39 |     char = chars[[i]]
 40 |     char$date = fileInfo$mtime[i]
 41 |     if(length(fileData[[i]]) == 1){
 42 |         char$ip = 'NULL'
 43 |         char$finger = 'NULL'
 44 |         char$hash = fileData[[i]]
 45 |     } else{
 46 |         char$finger = fileData[[i]][1]
 47 |         char$ip = fileData[[i]][2]
 48 |         char$hash = fileData[[i]][3]
 49 |     }
 50 |     char
 51 | })
 52 | 
 53 | # setting the names to character name and class. this won't be exposed to others
 54 | names(chars) = chars %>% map_chr(function(x){
 55 |     paste(x$Name,x$ClassField)
 56 | })
 57 | 
 58 | # create the table 
 59 | charTable = chars %>% map(function(x){
 60 |     data.frame(ip = x$ip,
 61 |                finger = x$finger,
 62 |                hash = x$hash,
 63 |                name = x$Name,
 64 |                race = x$Race,
 65 |                background = x$Background,
 66 |                date = x$date,
 67 |                class = paste(x$classInfo[,1],x$classInfo[,3],collapse='|'),
 68 |                justClass =  x$classInfo[,'Class'] %>% paste(collapse ='|'),
 69 |                subclass = x$classInfo[,'Archetype'] %>% paste(collapse ='|'),
 70 |                level = x$classInfo[,'Level'] %>% as.integer() %>% sum,
 71 |                feats = x$feats[x$feats !=''] %>% paste(collapse = '|'),
 72 |                HP = x$currentHealth,
 73 |                AC = AC(x),
 74 |                Str = x$abilityScores['Str'],
 75 |                Dex = x$abilityScores['Dex'],
 76 |                Con = x$abilityScores['Con'],
 77 |                Int = x$abilityScores['Int'],
 78 |                Wis = x$abilityScores['Wis'],
 79 |                Cha = x$abilityScores['Cha'],
 80 |                alignment = x$Alignment,
 81 |                skills = x$skillProf %>% which %>% names %>% paste(collapse = '|'),
 82 |                weapons = x$weapons %>% map_chr('name') %>% gsub("\\|","",.)  %>% paste(collapse = '|'),
 83 |                spells = glue('{x$spells$name %>% gsub("\\\\*|\\\\|","",.)}*{x$spells$level}') %>% glue::glue_collapse('|') %>% {if(length(.)!=1){return('')}else{return(.)}},
 84 |                day = x$date %>%  format('%m %d %y'),
 85 |                stringsAsFactors = FALSE)
 86 | }) %>% do.call(rbind,.)
 87 | 
 88 | 
 89 | 
 90 | # post processing -----
 91 | # the way races are encoded in the app is a little silly. sub-races are 
 92 | # not recorded separately. essentially race information is lost other
 93 | # than a text field after its effects are applied during creation.
 94 | # The text field is also not too consistent. For instance, if you are a 
 95 | # variant it'll simply say "Variant", but if you are a variant human
 96 | # it'll only say human
 97 | # here, I define regexes that match races.
 98 | # kind of overkill, as only a few races actually required special care
 99 | races = c(Aarakocra = 'Aarakocra',
100 |           Aasimar = 'Aasimar',
101 |           Bugbear= 'Bugbear',
102 |           Dragonborn = 'Dragonborn',
103 |           Dwarf = 'Dwarf',
104 |           Elf = '(?<!Half-)Elf',
105 |           Firbolg = 'Firbolg',
106 |           Genasi= 'Genasi',
107 |           Gith = 'Gith',
108 |           Gnome = 'Gnome',
109 |           Goblin = '(?<!Hob)Goblin',
110 |           Goliath = 'Goliath',
111 |           'Half-Elf' = '(Half-Elf)|(^Variant)',
112 |           'Half-Orc' = 'Half-Orc',
113 |           Halfling = 'Halfling',
114 |           Hobgoblin = 'Hobgoblin',
115 |           Human = 'Human',
116 |           Kenku = 'Kenku',
117 |           Kobold = 'Kobold',
118 |           Lizardfolk = 'Lizardfolk',
119 |           Orc = '(?<!Half-)Orc',
120 |           'Yuan-Ti' = 'Serpentblood',
121 |           Tabaxi = 'Tabaxi',
122 |           Tiefling ='Tiefling|Lineage',
123 |           Triton = 'Triton',
124 |           Turtle = 'Turtle|Tortle')
125 | 
126 | align = list(NG = c('NG',
127 |                     '"Good"',
128 |                     "Neuteral Good",
129 |                     "Neutral Good",
130 |                     "Nuetral Goodt",
131 |                     "Neutral/Good",
132 |                     "Neutral good",
133 |                     'neutral good',
134 |                     'neutral-good',
135 |                     'Neutral Good ',
136 |                     'Nuetral Good',
137 |                     'N/G'),
138 |              CG = c('Chaotic Good',
139 |                     'CG',
140 |                     'Chacotic Good',
141 |                     'Chaotic good',
142 |                     'Good Chaotic',
143 |                     'chaotic good',
144 |                     'Chaotic Good '),
145 |              LG = c('Lawful Good',
146 |                     'Lawful Good ',
147 |                     'L-G',
148 |                     'LG',
149 |                     'lawful good',
150 |                     'Lawful good'),
151 |              NN = c('Neutral',
152 |                     'neutral ',
153 |                     'Neutral ',
154 |                     'n',
155 |                     'N',
156 |                     'True Neutral',
157 |                     'True Neutral ',
158 |                     'neutral',
159 |                     'TN',
160 |                     'Neutral Neutral',
161 |                     'true neutral',
162 |                     'Neutral neutral'),
163 |              CN = c('Chaotic Neutral',
164 |                     'CN',
165 |                     'chaotic neutral',
166 |                     'Chaotic neutral',
167 |                     'chaotic nuetral',
168 |                     'Chaotic Nuetral',
169 |                     'cn',
170 |                     'Chaotic Neutral ',
171 |                     'neutral chaotic ',
172 |                     'neutral chaotic'),
173 |              LN = c('Lawful Neutral',
174 |                     'lawful neutral ',
175 |                     'Leal e Neutro',
176 |                     'lawful - neutral',
177 |                     'LN',
178 |                     'Lawful Neutral (good-ish)',
179 |                     'lawful neutral',
180 |                     'lawful neutral'),
181 |              NE = c('Neutral Evil'),
182 |              LE = c('Lawful Evil','LE'),
183 |              CE = c('CE','Chaotic Evil'))
184 | 
185 | goodEvil = list(`E` = c('NE','LE','CE'),
186 |                 `N` = c('LN','CN','NN'),
187 |                 `G` = c('NG','LG','CG'))
188 | 
189 | lawfulChaotic = list(`C` = c('CN','CG','CE'),
190 |                      `N` = c('NG','NE','NN'),
191 |                      `L` = c('LG','LE','LN'))
192 | 
193 | # lists any alignment text I'm not processing
194 | charTable$alignment  %>% {.[!. %in% unlist(align)]} %>% table %>% sort %>% names
195 | 
196 | checkAlignment = function(x,legend){
197 |     x = names(legend)[findInList(x,legend)]
198 |     if(length(x) == 0){
199 |         return('')
200 |     } else{
201 |         return(x)
202 |     }
203 | }
204 | 
205 | 
206 | charTable %<>% mutate(processedAlignment = alignment %>% purrr::map_chr(checkAlignment,align),
207 |                         good = processedAlignment %>% purrr::map_chr(checkAlignment,goodEvil) %>% 
208 |                             factor(levels = c('E','N','G')),
209 |                         lawful = processedAlignment %>% 
210 |                             purrr::map_chr(checkAlignment,lawfulChaotic) %>% factor(levels = c('C','N','L'))) 
211 | 
212 | charTable %<>% mutate(processedRace = race %>% sapply(function(x){
213 |     out = races %>% sapply(function(y){
214 |         grepl(pattern = y, x,perl = TRUE,ignore.case = TRUE)
215 |     }) %>% which %>% names
216 |     
217 |     if(length(out) == 0 | length(out)>1){
218 |         out = ''
219 |     }
220 |     
221 |     return(out)
222 | }))
223 | 
224 | # remove personal info
225 | 
226 | shortestDigest = function(vector){
227 |     digested  = vector %>% map_chr(digest,'sha1')
228 |     uniqueDigested =  digested %>% unique 
229 |     # shortest hash suffix that keeps all distinct hashes distinct, plus one character of margin
230 |     collisionLimit = 1:40 %>% sapply(function(i){
231 |         substr(uniqueDigested,40-i,40) %>% unique %>% length
232 |     }) %>% which.max %>% {.+1}
233 |     
234 |     substr(digested, 40 - collisionLimit, 40)
235 | }
236 | 
237 | 
238 | charTable$name %<>% shortestDigest
239 | charTable$ip %<>% shortestDigest
240 | charTable$finger %<>% shortestDigest
241 | charTable$hash %<>% shortestDigest
242 | 
243 | spells = wizaRd::spells
244 | 
245 | spells = c(spells, list('.' = list(level = as.integer(99))))
246 | class(spells) = 'list'
247 | 
248 | legitSpells =spells %>% names
249 | 
250 | 
251 | processedSpells = charTable$spells %>% sapply(function(x){
252 |     if(x==''){
253 |         return('')
254 |     }
255 |     spellNames = x %>% str_split('\\|') %>% {.[[1]]} %>% str_split('\\*') %>% map_chr(1)
256 |     spellLevels =  x %>% str_split('\\|') %>% {.[[1]]} %>% str_split('\\*') %>% map_chr(2)
257 |     
258 |     distanceMatrix = adist(tolower(spellNames), tolower(legitSpells),costs = list(ins=2, del=2, sub=3), counts = TRUE)
259 |     
260 |     rownames(distanceMatrix) = spellNames
261 |     colnames(distanceMatrix) = legitSpells
262 |     
263 |     predictedSpell = distanceMatrix %>% apply(1,which.min) %>% {legitSpells[.]}
264 |     distanceScores =  distanceMatrix %>% apply(1,min) 
265 |     predictedSpellLevel = spells[predictedSpell] %>% purrr::map_int('level')
266 |     
267 |     ins = attributes(distanceMatrix)$counts[,distanceMatrix %>% apply(1,which.min),'ins'] %>% as.matrix  %>% diag
268 |     del = attributes(distanceMatrix)$counts[,distanceMatrix %>% apply(1,which.min),'del'] %>% as.matrix %>% diag
269 |     sub = attributes(distanceMatrix)$counts[,distanceMatrix %>% apply(1,which.min),'sub'] %>% as.matrix %>% diag
270 |     isItIn = predictedSpell %>% str_split(' |/') %>% map(function(x){
271 |         x[!x %in% c('and','or','of','to','the')]
272 |     }) %>% 
273 |     {sapply(1:length(.),function(i){
274 |         all(sapply(.[[i]],grepl,x =spellNames[i],ignore.case=TRUE))
275 |     })}
276 |     
277 |     spellFrame = data.frame(spellNames,predictedSpell,spellLevels,predictedSpellLevel,distanceScores,ins,del,sub,isItIn,stringsAsFactors = FALSE)
278 |     
279 |     spellFrame %<>% filter(as.integer(spellLevels)==predictedSpellLevel &( isItIn | (sub < 5 & del < 5 & ins < 5)))
280 |     
281 |     paste0(spellFrame$predictedSpell,'*',spellFrame$predictedSpellLevel,collapse ='|')
282 | })
283 | charTable$processedSpells = processedSpells
284 | 
285 | # x = 1:nrow(charTable) %>% sapply(function(i){adist(charTable$spells[i],charTable$processedSpells[i])}) %>% {.>20} %>% {charTable$spells[.]} %>% {.[43]}
286 | # x = 1:nrow(charTable) %>% sapply(function(i){adist(charTable$spells[i],charTable$processedSpells[i])}) %>% {.>20} %>% {charTable$spells[.]} %>% {.[70]}
287 | # x = 1:nrow(charTable) %>% sapply(function(i){adist(charTable$spells[i],charTable$processedSpells[i])}) %>% {.>20} %>% {charTable$spells[.]} %>% {.[88]}
288 | 
289 | # download.file('https://www.dropbox.com/s/4f7zdx09nkfa9as/Core.xml?dl=1',destfile = 'Core.xml')
290 | # allRules = xmlParse('Core.xml') %>% xmlToList()
291 | # fightClubItems = allRules[names(allRules) == 'item']
292 | # saveRDS(fightClubItems,'fightClubItems.rds')
293 | 
294 | # fightClubItems =  readRDS('fightClubItems.rds')
295 | # names(fightClubItems) = allRules %>% map('name') %>% as.character
296 | # 
297 | # fightClubItems %>% map_chr('type') %>% {. %in% 'M'} %>% {fightClubItems[.]} %>% map_chr('name')
298 | # fightClubItems %>% map_chr('type') %>% {. %in% 'R'} %>% {fightClubItems[.]} %>% map_chr('name')
299 | 
300 | legitWeapons = c(# fightClubItems %>% map_chr('type') %>% {. %in% 'M'} %>% {fightClubItems[.]} %>% map_chr('name'),
301 |                  # fightClubItems %>% map_chr('type') %>% {. %in% 'R'} %>% {fightClubItems[.]} %>% map_chr('name'),
302 |                  'Crossbow, Light', 'Dart', 'Shortbow', 'Sling',
303 |                  'Blowgun', 'Crossbow, hand', 'Crossbow, Heavy', 'Longbow', 'Net',
304 |                  'Club','Dagger','Greatclub','Handaxe','Javelin','Light hammer','Mace','Quarterstaff','Sickle','Spear','Unarmed Strike',
305 |                  'Battleaxe','Flail','Glaive','Greataxe','Greatsword','Halberd','Lance','Longsword','Maul','Morningstar','Pike','Rapier','Scimitar','Shortsword','Trident','War pick','Warhammer','Whip')
306 | 
307 | processedWeapons = charTable$weapons %>% sapply(function(x){
308 |     if(x==''){
309 |         return('')
310 |     }
311 |     weaponNames = x %>% str_split('\\|') %>% {.[[1]]} 
312 | 
313 |     distanceMatrix = adist(tolower(weaponNames), tolower(legitWeapons),costs = list(ins=2, del=2, sub=3), counts = TRUE)
314 |     
315 |     rownames(distanceMatrix) = weaponNames
316 |     colnames(distanceMatrix) = legitWeapons
317 |     
318 |     predictedWeapon = distanceMatrix %>% apply(1,which.min) %>% {legitWeapons[.]}
319 |     distanceScores =  distanceMatrix %>% apply(1,min) 
320 |     
321 |     ins = attributes(distanceMatrix)$counts[,distanceMatrix %>% apply(1,which.min),'ins'] %>% as.matrix  %>% diag
322 |     del = attributes(distanceMatrix)$counts[,distanceMatrix %>% apply(1,which.min),'del'] %>% as.matrix %>% diag
323 |     sub = attributes(distanceMatrix)$counts[,distanceMatrix %>% apply(1,which.min),'sub'] %>% as.matrix %>% diag
324 |     isItIn = predictedWeapon %>% str_split(' |/') %>% map(function(x){
325 |         x[!x %in% c('and','or','of','to','the')]
326 |     }) %>% 
327 |     {sapply(1:length(.),function(i){
328 |         all(sapply(.[[i]],grepl,x =weaponNames[i],ignore.case=TRUE))
329 |     })}
330 |     
331 |     weaponFrame = data.frame(weaponNames,predictedWeapon,distanceScores,ins,del,sub,isItIn,stringsAsFactors = FALSE)
332 |     
333 |     weaponFrame %<>% filter(isItIn|  (sub < 2 & del < 2 & ins < 2))
334 |     
335 |     paste0(weaponFrame$predictedWeapon %>% unique,collapse ='|')
336 | })
337 | 
338 | charTable$processedWeapons = processedWeapons
339 | 
340 | # x = 1:nrow(charTable) %>% sapply(function(i){adist(charTable$weapons[i],charTable$processedWeapons[i])}) %>% {.>20} %>% {charTable$weapons[.]} %>% {.[10]}
341 | 
342 |     
343 | unsecureFields = c('ip','finger','hash')
344 | 
345 | charTable = charTable[!names(charTable) %in% unsecureFields]
346 | 
347 | # user id ------
348 | # userID = c()
349 | # pb = txtProgressBar(min = 0, max = nrow(charTable), initial = 0) 
350 | # 
351 | # for(i in 1:nrow(charTable)){
352 | #     setTxtProgressBar(pb,i)
353 | #     for (id in unique(userID)){
354 | #         userChars = charTable[which(userID == id),]
355 | #         ip = charTable$ip[i] %>% {if(is.na(.) || . =='NULL' || .==''){return("NANA")}else{.}}
356 | #         finger = charTable$finger[i] %>% {if(is.na(.) || . =='NULL' ||. == ''){return("NANA")}else{.}}
357 | #         hash = charTable$hash[i] %>% {if(is.na(.) || . =='NULL' || . == ''){return("NANA")}else{.}}
358 | #         
359 | #         ipInUser = ip %in% userChars$ip
360 | #         fingerInUser = finger %in% userChars$finger
361 | #         hashInUser = hash %in% userChars$hash
362 | #         if(ipInUser | fingerInUser | hashInUser){
363 | #             
364 | #             userID = c(userID,id)
365 | #             break
366 | #         }
367 | #         
368 | #     }
369 | #     
370 | #     if(length(userID)!=i){
371 | #         userID = c(userID, max(c(userID,0))+1)
372 | #     }
373 | # }
374 | # 
375 | # charTable$userID = userID
376 | # 
377 | # 
378 | # userID = c()
379 | # pb = txtProgressBar(min = 0, max = nrow(charTable), initial = 0) 
380 | # 
381 | # for(i in 1:nrow(charTable)){
382 | #     setTxtProgressBar(pb,i)
383 | #     for (id in unique(userID)){
384 | #         userChars = charTable[which(userID == id),]
385 | #         ip = charTable$ip[i] %>% {if(is.na(.) || . =='NULL' || .==''){return("NANA")}else{.}}
386 | #         finger = charTable$finger[i] %>% {if(is.na(.) || . =='NULL' ||. == ''){return("NANA")}else{.}}
387 | #         hash = charTable$hash[i] %>% {if(is.na(.) || . =='NULL' || . == ''){return("NANA")}else{.}}
388 | #         
389 | #         ipInUser = ip %in% userChars$ip
390 | #         fingerInUser = finger %in% userChars$finger
391 | #         hashInUser = hash %in% userChars$hash
392 | #         if(fingerInUser | hashInUser){
393 | #             
394 | #             userID = c(userID,id)
395 | #             break
396 | #         }
397 | #         
398 | #     }
399 | #     
400 | #     if(length(userID)!=i){
401 | #         userID = c(userID, max(c(userID,0))+1)
402 | #     }
403 | # }
404 | # 
405 | # charTable$userIDNoIP = userID
406 | # 
407 | write_tsv(charTable,path = 'docs/charTable.tsv')
408 | # 
409 | # # secure table -----
410 | # 
411 | # # not sure about the legality of this but I may be able to share 
412 | # # the data in an anonymized form.
413 | # 
414 | # secureTable = charTable
415 | # secureTable$ip %<>% sapply(function(x){
416 | #     if(x %in% c('','NULL')){
417 | #         return('')
418 | #     } else{
419 | #         digest(x,'sha1')
420 | #     }
421 | # })
422 | # secureTable$finger %<>% sapply(function(x){
423 | #     if(x %in% c('','NULL')){
424 | #         return('')
425 | #     } else{
426 | #         digest(x,'sha1')
427 | #     }
428 | # })
429 | # 
430 | # secureTable$name %<>% sapply(digest,'sha1')
431 | # write_tsv(secureTable,path = 'docs/hashedTable.tsv')
432 | 
433 | 


--------------------------------------------------------------------------------
/docs/538.tsv:
--------------------------------------------------------------------------------
 1 | Race	FIGHTER	ROGUE	WIZARD	BARBARIAN	CLERIC	RANGER	PALADIN	WARLOCK	MONK	BARD	SORCERER	DRUID	TOTAL
 2 | HUMAN	4,888	2,542	2,568	1,435	2,339	1,715	2,326	1,714	1,946	1,454	1,324	996	25,248
 3 | ELF	1,242	2,257	2,744	336	921	3,076	492	755	1,349	651	841	1,779	16,443
 4 | HALF-ELF	646	1,325	611	153	628	891	817	1,401	399	1,808	1,258	516	10,454
 5 | DWARF	2,009	362	395	1,323	2,199	415	971	286	405	394	264	484	9,507
 6 | DRAGONBORN	1,335	325	346	875	510	355	1,688	584	457	371	1,031	309	8,185
 7 | TIEFLING	379	798	516	198	353	272	473	2,188	309	806	1,062	281	7,634
 8 | GENASI	580	495	558	388	459	420	322	415	750	352	648	584	5,971
 9 | HALFLING	339	1,797	257	306	308	440	207	296	551	801	310	302	5,916
10 | HALF-ORC	976	233	143	1,709	272	245	427	212	284	199	126	215	5,039
11 | GNOME	257	600	1,360	227	304	238	151	311	196	400	257	332	4,634
12 | GOLIATH	865	139	109	1,729	192	187	389	136	326	144	114	190	4,522
13 | AARAKOCRA	273	362	181	313	249	572	149	203	835	279	177	275	3,868
14 | AASIMAR	116	71	67	70	274	60	429	210	87	144	174	65	1,767
15 | TOTAL	13,906	11,307	9,855	9,063	9,009	8,887	8,840	8,711	7,892	7,804	7,587	6,328	NA


--------------------------------------------------------------------------------
/docs/index.Rmd:
--------------------------------------------------------------------------------
   1 | ---
   2 | output: html_document
   3 | always_allow_html: yes
   4 | editor_options: 
   5 |   chunk_output_type: console
   6 | ---
   7 | 
   8 | ```{r setup, include=FALSE}
   9 | library(dplyr)
  10 | library(magrittr)
  11 | library(readr)
  12 | library(stringr)
  13 | library(ggplot2)
  14 | library(cowplot)
  15 | library(glue)
  16 | library(reshape2)
  17 | library(igraph)
  18 | library(circlize)
  19 | library(patchwork)
  20 | library(plotly)
  21 | library(shiny)
  22 | library(here)
  23 | library(knitr)
  24 | library(purrr)
  25 | library(kableExtra)
  26 | library(ogbox) # github.com/oganm/ogbox
  27 | knitr::opts_chunk$set(echo = FALSE, fig.align ='center')
  28 | 
  29 | 
  30 | getUniqueTable = function(charTable){
  31 |     uniqueTable = charTable %>% arrange(desc(level)) %>% filter(!duplicated(paste(name,justClass))) %>% 
  32 |         filter(!level > 20)
  33 |     
  34 |     # detect non unique characters that multiclassed
  35 |     multiClassed = uniqueTable %>% filter(grepl('\\|',justClass))
  36 |     singleClassed = uniqueTable %>% filter(!grepl('\\|',justClass))
  37 |     
  38 |     
  39 |     matchingNames = multiClassed$name[multiClassed$name %in% singleClassed$name]%>% na.omit 
  40 |     
  41 |     isDuplicate = matchingNames %>% sapply(function(nm){
  42 |         multiChar = multiClassed %>% filter(name == nm)
  43 |         singleChar = singleClassed %>% filter(name == nm)
  44 |         
  45 |         if(nrow(multiChar) != 1 | nrow(singleChar) != 1){
  46 |             warning('Not 1-1 match. Skipping')
  47 |             return(FALSE)
  48 |         } else{
  49 |             isSubset = str_split(multiChar$justClass,pattern = '\\|') %>% {.[[1]]} %>% {singleChar$justClass %in% .}
  50 |             isHigherLevel = multiChar$level > singleChar$level
  51 |             return(isSubset & isHigherLevel)
  52 |         }
  53 |     })
  54 |     
  55 |     singleClassed %<>% filter(!name %in% matchingNames[isDuplicate])
  56 |     
  57 |     uniqueTable = rbind(singleClassed,multiClassed)
  58 |     
  59 |     return(list(uniqueTable = uniqueTable,
  60 |                 singleClassed = singleClassed,
  61 |                 multiClassed = multiClassed))
  62 | }
  63 | 
  64 | # load table and get unique characters
  65 | 
  66 | charTable = read_tsv(here("docs/charTable.tsv"),na = 'NA')
  67 | charTable %<>% mutate(good = factor(good,levels = c('E','N','G')),
  68 |                       lawful =  factor(lawful, levels = c('C','N','L')))
  69 | 
  70 | # group levels at common feat acquisition points. sorry fighters and rogues
  71 | charTable %<>% mutate(levelGroup = cut(level,
  72 |                                        breaks = c(0,3,7,11,15,18,20),
  73 |                                        labels  = c('1-3','4-7','8-11','12-15','16-18','19-20')))
  74 | 
  75 | # for anyone looking at this and confused by the weird syntax
  76 | # see https://stackoverflow.com/questions/1826519/how-to-assign-from-a-function-which-returns-more-than-one-value
  77 | list[keepRevised,,] = getUniqueTable(charTable)
  78 | charTable$justClass %<>%  gsub(pattern = 'Revised ', replacement = '',x = .)
  79 | charTable$class %<>%  gsub(pattern = 'Revised ', replacement = '',x = .)
  80 | 
  81 | list[uniqueTable,singleClassed,multiClassed] = getUniqueTable(charTable)
  82 | 
  83 | write_tsv(uniqueTable,path = here('docs/uniqueTable.tsv'))
  84 | 
  85 | 
  86 | barPalette = c('#7DD4A6','#C15BC5','#D65242','#415455',
  87 |                '#D2A75C','#8FD25B','#D15B86','#A5B5BE','#727EC6',
  88 |                '#567441','#754334','#5E3A60','#77B0D0',"#CCEBC5",
  89 |                "#D9D9D9","#FCCDE5")
  90 | ```
  91 | 
  92 | Table of Contents
  93 | =================
  94 | 
  95 |    * [Is your D&amp;D character rare? II: Off-brand edition](#is-your-dd-character-rare-ii-off-brand-edition)
  96 |       * [Introduction](#introduction)
  97 |       * [Is Your D&amp;D Character Rare? II](#is-your-dd-character-rare-ii)
  98 |       * [Is your character archetype rare?](#is-your-character-archetype-rare)
  99 |       * [Is your alignment rare?](#is-your-alignment-rare)
 100 |       * [Are your feat choices rare?](#are-your-feat-choices-rare)
 101 |       * [Is your multiclass combination rare?](#is-your-multiclass-combination-rare)
 102 |       * [Is power gaming rare?](#is-power-gaming-rare)
 103 |       * [Are your spells rare?](#are-your-spells-rare)
 104 |       * [Is your game day rare?](#is-your-game-day-rare)
 105 |       * [About the data](#about-the-data)
 106 |       * [Data access](#data-access)
 107 |       * [About this document](#about-this-document)
 108 |       * [Changelog](#changelog)
 109 | 
 110 | 
 111 | # Is your D&D character rare? II: Off-brand edition
 112 | 
 113 | *Ogan Mancarci, 28 July 2018*
 114 | 
*Edited: 9 September 2018 (see [changelog](#changelog))*
 116 | 
 117 | ## Introduction
 118 | 
About a year ago, FiveThirtyEight published a short article called
["Is Your D&D Character Rare?"](https://fivethirtyeight.com/features/is-your-dd-character-rare/).
It was the product of a deal between Curse and FiveThirtyEight, which meant the data
was not available to anyone else. I was a little jealous that I couldn't play with the data, and disappointed that they only counted class-race combinations and called it a day.

Shortly after, I released a few tools ([1](https://oganm.github.io/printSheetApp/), [2](https://oganm.github.io/5eInteractiveSheet/)) for a popular mobile application ([3](https://play.google.com/store/apps/details?id=com.wgkammerer.testgui.basiccharactersheet.app&hl=en_CA)) that allowed me to collect my users' character data.

After 3.5 months of data collection,
I have a whopping... `r  nrow(uniqueTable)` unique characters in my database to play with. Well... I'm not
as popular as DnDBeyond, but I don't see anyone else waving around hundreds of character sheets for us to
data mine, so it'll have to do.
 130 | 
 131 | ## Is Your D&D Character Rare? II
 132 | 
To start with, let's redo the table from FiveThirtyEight. I am not going to pretend
that I have many thousands of samples, so instead of per 100,000, this table shows class and race combinations per 100
characters. In FiveThirtyEight's table, characters with multiple classes count once for each class. Here, I divided multiclassed characters based on the proportion of their class levels. For instance, a Fighter 5/Rogue 15 character adds 0.25 to the Fighter count and 0.75 to the Rogue count. Homebrew and UA classes are removed.
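For anyone who wants to see the level-proportion weighting in isolation, here is a minimal base R sketch. It is an illustration, not the code that built the table: it assumes class strings look like `Fighter 5|Rogue 15`, and the helper `classWeights()` and the toy `chars` table are hypothetical. Dividing by the total weight mirrors how the table is rescaled to "per 100".

```r
# Hypothetical helper: split a class string such as "Fighter 5|Rogue 15"
# into per-class weights proportional to class levels (weights sum to 1).
classWeights = function(classString){
    parts = strsplit(classString, '|', fixed = TRUE)[[1]]
    className = sub(' [0-9]+$', '', parts)          # drop the trailing level
    classLevel = as.integer(sub('^.* ', '', parts)) # keep only the level
    setNames(classLevel / sum(classLevel), className)
}

# Toy characters (made up for illustration, not from the dataset)
chars = data.frame(race  = c('Human', 'Elf', 'Human'),
                   class = c('Fighter 5|Rogue 15', 'Wizard 3', 'Fighter 2'),
                   stringsAsFactors = FALSE)

# One row per (character, class) pair with its fractional weight
weights = lapply(chars$class, classWeights)
long = data.frame(race   = rep(chars$race, lengths(weights)),
                  class  = unlist(lapply(weights, names)),
                  weight = unname(unlist(weights)),
                  stringsAsFactors = FALSE)

# Sum the weights per race/class cell and rescale to "per 100 characters"
perHundred = xtabs(weight ~ race + class, data = long) / sum(long$weight) * 100
perHundred
```

The Fighter 5/Rogue 15 human contributes 0.25 to Human/Fighter and 0.75 to Human/Rogue, so every character adds exactly 1 to the table regardless of how many classes it has.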
 136 | 
 137 | ```{r fiveThirtyEightCopy,fig.width=9}
 138 | 
 139 | # classes = uniqueTable$justClass %>% str_split('\\|') %>% unlist %>% unique
 140 | legitClasses = c("Warlock", "Monk", "Wizard", "Barbarian", "Sorcerer", "Paladin", "Fighter", "Druid", "Ranger", "Rogue","Cleric","Bard")
 141 | races = uniqueTable$processedRace %>% unique %>% {.[.!='']}
 142 | coOccurenceMatrix = matrix(0 , nrow=length(races),ncol = length(legitClasses))
 143 | colnames(coOccurenceMatrix) = legitClasses
 144 | rownames(coOccurenceMatrix) = races
 145 | for (i in seq_along(races)){
 146 |     for (j in seq_along(legitClasses)){
 147 |         ((uniqueTable$processedRace==races[i]) * {
 148 |             classLevel  =str_extract(uniqueTable$class,glue('(?<={legitClasses[j]} )[0-9]+')) %>% {.[is.na(.)] = 0;.} %>% as.integer()
 149 |             classLevel/uniqueTable$level
 150 |             }) %>% sum -> coOcc
 151 |         coOccurenceMatrix[i,j] = coOcc
 152 |     }
 153 | }
 154 | 
 155 | coOccurenceMatrixSubset = coOccurenceMatrix[,!coOccurenceMatrix %>% apply(2,sum) %>% {.<2}]
 156 | 
 157 | coOccurenceMatrixSubset = coOccurenceMatrixSubset[!coOccurenceMatrixSubset %>% apply(1,sum) %>% {.<1},]
 158 | 
 159 | coOccurenceMatrixSubset = 
 160 |     coOccurenceMatrixSubset[coOccurenceMatrixSubset %>% apply(1,sum) %>% order(decreasing = FALSE),
 161 |                             coOccurenceMatrixSubset %>% apply(2,sum) %>% order(decreasing = TRUE)]
 162 | 
 163 | coOccurenceMatrixSubset = coOccurenceMatrixSubset/(sum(coOccurenceMatrix))* 100
 164 | 
 165 | 
 166 | classSums = coOccurenceMatrixSubset %>% apply(2,sum)
 167 | raceSums = coOccurenceMatrixSubset %>% apply(1,sum)
 168 | 
 169 | coOccurenceMatrixSubset = cbind(coOccurenceMatrixSubset,raceSums)
 170 | 
 171 | 
 172 | coOccurenceMatrixSubset = rbind(Total = c(classSums,NA), coOccurenceMatrixSubset)
 173 | colnames(coOccurenceMatrixSubset)[ncol(coOccurenceMatrixSubset)] = "Total"
 174 | 
 175 | coOccurenceFrame = coOccurenceMatrixSubset %>% melt() 
 176 | names(coOccurenceFrame)[1:2] = c('Race','Class')
 177 | 
 178 | coOccurenceFrame %<>% mutate(fillCol = value*(Race!='Total' & Class!='Total'))
 179 | 
 180 | 
 181 | coOccurenceFrame %>% ggplot(aes(x = Class,y = Race)) +
 182 |     geom_tile(aes(fill = fillCol),show.legend = FALSE)+
 183 |     scale_fill_continuous(low = 'white',high = '#46A948',na.value = 'white')+
 184 |     # viridis::scale_fill_viridis() + 
 185 |     geom_text(aes(label = value %>% round(2) %>% format(nsmall=2))) + 
 186 |     scale_x_discrete(position='top') + xlab('') + ylab('') + 
 187 |     theme(axis.text.x = element_text(angle = 30,vjust = 0.5,hjust = 0)) 
 188 | 
 189 | ```
 190 | 
 191 | 
 192 | ```{r fiveThirtyEightCorrMaths,message=FALSE}
 193 | fiveThirtyEight = read_tsv('538.tsv') %>% melt()
 194 | names(fiveThirtyEight)[2] = 'Class'
 195 | 
 196 | fiveThirtyEight %<>% mutate(Class = as.character(Class)) %>%
 197 |     arrange(Race,Class) %>% filter(Race !='TOTAL' & Class != 'TOTAL')
 198 | 
 199 | coOccurenceFrame %<>% mutate(Race = toupper(Race), Class = toupper(Class)) %>% 
 200 |     arrange(Race,Class) %>% 
 201 |     filter(Race %in% fiveThirtyEight$Race & Class %in% fiveThirtyEight$Class)
 202 | 
 203 | corFrame = data.frame(DnDBeyond = fiveThirtyEight$value/1000, oganm = coOccurenceFrame$value,
 204 |            class = coOccurenceFrame$Class,race = coOccurenceFrame$Race)
 205 | 
 206 | 
 207 | ```
 208 | 
Despite the methodological differences, these results seem to correlate well with the DnDBeyond data (Spearman's ρ=`r round(cor(corFrame$DnDBeyond,corFrame$oganm,method = 'spearman'),2)`), even though we seem to disagree on the exact order of popularity. The graph below plots the % occurrence of each class/race combination in the DnDBeyond data (as presented by FiveThirtyEight) against my data.
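Spearman's ρ only compares the *ranks* of the two sets of percentages, which is why it can be high even when the exact values differ. A minimal base R illustration with made-up numbers (not values from either table): Spearman's ρ is the same as Pearson's correlation computed on ranks.

```r
# Made-up percentages for four class/race combinations (illustration only)
x = c(4.9, 2.5, 2.6, 1.4)
y = c(3.1, 2.9, 2.0, 0.9)

rho = cor(x, y, method = 'spearman')
# Same value via Pearson correlation on the ranks (there are no ties here)
rhoFromRanks = cor(rank(x), rank(y))
c(rho, rhoFromRanks)
```

The middle two combinations swap places, so ρ drops to 0.8 even though both sources agree on the most and least common combinations.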
 210 | 
 211 | 
 212 | ```{r fiveThirtyEightCorr,message=FALSE,fig.height=3.5,fig.width=3.5}
 213 | 
 214 | 
 215 | corFrame %>% ggplot(aes(x = oganm,y = DnDBeyond,text = paste(class, race))) +
 216 |     geom_point() + 
 217 |     ggtitle("Class-race combination %s at\n DnDBeyond vs oganm's data") ->p
 218 | 
 219 | ply = plotly::ggplotly(p) %>%  layout(xaxis=list(fixedrange=TRUE)) %>%
 220 |     config(displayModeBar = F) %>% 
 221 |     layout(yaxis=list(fixedrange=TRUE))
 222 | ply$width = 500
 223 | # rmarkdown seems to ignore alignment set using plotly
 224 | div(ply,align = 'center')
 225 | ```
 226 | 
 227 | ## Is your character archetype rare?
 228 | 
This is a little hard to visualize in a single plot. Alas, we are short on space, so you're going
to have to mouse over to see the details. Each colored section shows a character archetype's share
among all archetypes of its class. Sections are ordered from bottom to top by
frequency, so the brown section always shows the most popular archetype and it goes downhill (but upwards in the plot) from there.
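The within-class percentages behind the bars boil down to dividing archetype counts by each class's total. A minimal base R sketch with a made-up `chars` table (the chunk itself does this with dplyr's `group_by()` and `mutate()`):

```r
# Toy data: one row per character (made up for illustration)
chars = data.frame(
    class     = c('Fighter', 'Fighter', 'Fighter', 'Rogue', 'Rogue'),
    archetype = c('Champion', 'Battle Master', 'Battle Master', 'Thief', 'Arcane Trickster'),
    stringsAsFactors = FALSE
)

# Count archetypes per class, then divide each row by its class total
counts = table(chars$class, chars$archetype)
share = prop.table(counts, margin = 1) * 100
round(share, 1)
```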
 233 | 
 234 | ```{r archetypeGraph}
 235 | # uniqueTable$justClass
 236 | classes = uniqueTable$justClass %>% str_split('\\|') %>% unlist
 237 | archetypes = uniqueTable$subclass %>% str_split('\\|') %>% unlist
 238 | 
 239 | 
 240 | archeFrame = data.frame(classes,archetypes) %>% filter(archetypes !='') 
 241 | classSum = archeFrame$classes %>% table %>% sort(decreasing = TRUE)
 242 | 
 243 | archeFrame %<>% group_by(classes,archetypes) %>% summarize(count = n()) %>% 
 244 |     arrange(classes,(count)) %>% filter(classes %in% names(which(classSum>2))) %>% 
 245 |     ungroup() %>% 
 246 |     mutate(archetypes = factor(archetypes,levels = archetypes)) %>% 
 247 |     group_by(classes) %>% 
 248 |     mutate(ratio = count/sum(count)*100) %>%
 249 |     mutate(classArcheID = as.integer(archetypes) - max(as.integer(archetypes)) +1) %>% ungroup() %>% mutate(classArcheID = as.factor(classArcheID)) %>% 
 250 |     mutate(`%` = round(ratio)) %>% 
 251 |     filter(classes %in% legitClasses)
 252 | 
 253 | archeFrame %>% 
 254 |     ggplot(aes(x = classes,y = ratio,fill = classArcheID,
 255 |                label = archetypes,hede = count,hodo = `%`)) +
 256 |     geom_bar(stat='identity') +
 257 |      theme(axis.text.x = element_text(angle = 90,vjust = 0.5,hjust = 1 ),
 258 |            legend.position = 'none') + 
 259 |     scale_fill_manual(values = barPalette) + 
 260 |     ggtitle('Archetype choices') + xlab('') + ylab('archetype % within class')->p
 261 | 
 262 | ply = ggplotly(p,tooltip = c('label','hodo','hede')) %>% layout(xaxis=list(fixedrange=TRUE)) %>%
 263 |     config(displayModeBar = F) %>% 
 264 |     layout(yaxis=list(fixedrange=TRUE))
 265 | div(ply,align='center')
 266 | 
 267 | ```
 268 | 
 269 | ## Is your alignment rare?
 270 | 
Analysis of alignment in this dataset is difficult because, unlike most other fields,
it is not mandatory. It also isn't something you are likely to forget about your
character, so there isn't much incentive to fill it in.
Only `r round(sum(uniqueTable$alignment != '')/nrow(uniqueTable)*100)`% of
characters actually have this field filled in. I know I only filled it in myself when testing
my applications. It is entirely possible that the users' choice to fill in this box
introduces a bias, so take these results with a grain of salt.

Also, since it's a free text field, some manual
processing is required to make the most of this information
(looking at you, fellows with the "Awesome" and "Super Good" alignments). But that is
still `r sum(uniqueTable$alignment != '')` characters, so there you go:
 283 | 
 284 | The plot below shows character counts for each alignment.
 285 | 
 286 | ```{r alignment, fig.height=2,fig.width=2,fig.align='center'}
 287 | 
 288 | alignmentTable = uniqueTable %>% filter(processedAlignment != '')
 289 | 
 290 | 
 291 | 
 292 | 
 293 | alignmentCounts = alignmentTable %>% group_by(good,lawful) %>% 
 294 |     summarize(Count = n())
 295 | 
 296 | 
 297 | alignmentCounts %>% ggplot(aes(y = good,
 298 |                                x = lawful,
 299 |                                fill = Count,
 300 |                                label = Count)) + geom_tile() + 
 301 |     scale_fill_continuous(low = 'white',high = '#46A948',na.value = 'white') + 
 302 |     geom_text() + 
 303 |     ylab('Good/Evil') +
 304 |     xlab('Lawful/Chaotic') +
 305 |     scale_x_discrete(limits = c('L','N','C')) +
 306 |     theme(legend.position = 'none')->p
 307 | p
 308 | ```
 309 | 
In general, lawful characters seem to be out of style these days. Let's look at the
tendencies of individual classes. The graph below shows the mean alignment of
each class. Multiclassed characters' contributions
are weighted by their class levels, as before. You can mouse over to see sample sizes and mean values.
Alignments are encoded as numbers from 1 to 3: 1 is Chaotic/Evil and 3 is Lawful/Good on
the corresponding scales.
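To make the encoding concrete: with the factor levels set at the top of the document (`E`, `N`, `G` and `C`, `N`, `L`), converting to integers gives 1 to 3, and each character is weighted by the share of its levels in the class being summarized. A minimal sketch with made-up values:

```r
# Alignment factor with the same level order as the analysis: E < N < G
good = factor(c('G', 'N', 'E', 'G'), levels = c('E', 'N', 'G'))
as.integer(good)  # 3 2 1 3

# Hypothetical share of each character's levels spent in the class of interest
# (1 = single-classed, 0.25 = e.g. 5 levels out of 20)
classProportion = c(1, 0.25, 1, 0.5)

weighted.mean(as.integer(good), classProportion)
```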
 316 | 
 317 | ```{r classAlignment}
 318 | 
 319 | classGood = legitClasses %>% sapply(function(x){
 320 |     classAlignment = alignmentTable %>% filter(grepl(x,justClass))
 321 |     good =classAlignment %$% good
 322 |     classProportion = as.integer(stringr::str_extract(classAlignment$class,glue('(?<={x} )[0-9]+')))/classAlignment$level
 323 |     weighted.mean(good %>% as.integer,classProportion)
 324 | })
 325 | 
 326 | classLawful = legitClasses %>% sapply(function(x){
 327 |     classAlignment = alignmentTable %>% filter(grepl(x,justClass))
 328 |     lawful =classAlignment %$% lawful
 329 |     classProportion = as.integer(stringr::str_extract(classAlignment$class,glue('(?<={x} )[0-9]+')))/classAlignment$level
 330 |     weighted.mean(lawful %>% as.integer,classProportion)
 331 | })
 332 | 
 333 | classN = legitClasses %>% sapply(function(x){
 334 |     classAlignment = alignmentTable %>% filter(grepl(x,justClass))
 335 |     classProportion = sum(as.integer(stringr::str_extract(classAlignment$class,glue('(?<={x} )[0-9]+')))/classAlignment$level)
 336 |     return(classProportion)
 337 | })
 338 | 
 339 | 
 340 | classAlignments = data.frame(`Good/Evil` = classGood,`Chaotic/Lawful` = classLawful,
 341 |                              Class = legitClasses,N = classN,
 342 |                              check.names = FALSE) 
 343 | 
 344 | classAlignments %>% ggplot(aes(y = `Good/Evil`,x = `Chaotic/Lawful`, color = Class,hede = N)) + geom_point() +  
 345 |     scale_y_continuous(breaks = c(1,2,3),
 346 |                        labels = c('E','N','G'),limits = c(1,3)) + 
 347 |     scale_x_continuous(breaks = c(1,2,3),
 348 |                        labels = c('C','N','L'),limits = c(3,1), trans = 'reverse')+
 349 |     ylab('Good/Evil') +
 350 |     xlab('Chaotic/Lawful') + 
 351 |     scale_color_manual(values = barPalette) ->p
 352 | 
 353 | ply = plotly::ggplotly(p) %>%  layout(xaxis=list(fixedrange=TRUE)) %>%
 354 |     config(displayModeBar = F) %>% 
 355 |     layout(yaxis=list(fixedrange=TRUE))
 356 | ply$width = 400
 357 | ply$height = 300
 358 | # rmarkdown seems to ignore alignment set using plotly
 359 | div(ply,align = 'center')
 360 | 
 361 | ```
 362 | 
Darn! Most of the space in this graph is wasted. Even the good old Paladin has a
chaotic tendency. It seems 5e really helped players break with tradition. Meanwhile, the Warlock is predictably the evilest class.

We can also
do the same for backgrounds. Since a background probably says more about a character's
backstory than a class does, we might get more information out of it.
 369 | 
 370 | 
 371 | ```{r backGroundAlignment}
 372 | 
 373 | getMeanAlignments = function(table, property, minRepresentation = 3){
 374 |     uniqueThing = table[[property]] %>% table %>% {.[.>minRepresentation]} %>% names
 375 |     goodThing = uniqueThing %>% sapply(function(x){
 376 |         thingAlignment = table[table[[property]] %in%  x,]
 377 |         good =thingAlignment %$% good
 378 |         mean(good %>% as.integer)
 379 |     })
 380 |     lawfulThing = uniqueThing %>%  sapply(function(x){
 381 |         thingAlignment = table[table[[property]] %in%  x,]
 382 |         lawful =thingAlignment %$% lawful
 383 |         mean(lawful %>% as.integer)
 384 |     })
 385 |     
 386 |     thingCount = uniqueThing %>% sapply(function(x){
 387 |         table[table[[property]] == x,] %>% nrow
 388 |     })
 389 |     
 390 |     thingAligment = data.frame(`Good/Evil` = goodThing,`Chaotic/Lawful` = lawfulThing,
 391 |                              thing = uniqueThing,N = thingCount,
 392 |                              check.names = FALSE) 
 393 |     names(thingAligment)[3] = property
 394 |     return(thingAligment)
 395 | }
 396 | 
 397 | backgroundAlignment = getMeanAlignments(alignmentTable,property = 'background')
 398 | 
 399 | names(backgroundAlignment)[3] = 'Background'
 400 | 
 401 | backgroundAlignment %>% ggplot(aes(y = `Good/Evil`,x = `Chaotic/Lawful`, color = Background,hede = N)) + geom_point() +  
 402 |     scale_y_continuous(breaks = c(1,2,3),
 403 |                        labels = c('E','N','G'),limits = c(1,3)) + 
 404 |     scale_x_continuous(breaks = c(1,2,3),
 405 |                        labels = c('C','N','L'),limits = c(3,1), trans = 'reverse')+
 406 |     ylab('Good/Evil') +
 407 |     xlab('Chaotic/Lawful') ->p# + 
 408 |     # scale_color_manual(values = barPalette) 
 409 | 
 410 | ply = plotly::ggplotly(p) %>%  layout(xaxis=list(fixedrange=TRUE)) %>%
 411 |     config(displayModeBar = F) %>% 
 412 |     layout(yaxis=list(fixedrange=TRUE))
 413 | ply$width = 550
 414 | ply$height = 300
 415 | # rmarkdown seems to ignore alignment set using plotly
 416 | div(ply,align = 'center')
 417 | 
 418 | ```
 419 | 
This looks better. At the extremes we have Knights, who tend to be lawful; Folk Heroes
and Hermits on the good side; Bounty Hunters, Charlatans and Urchins on the chaotic side; and Criminals
as the only background on the evil side of Neutral on the Good/Evil axis.

Obviously, the next logical step is racial profiling.
 425 | 
 426 | ```{r raceAlignment}
 427 | 
 428 | raceAlignment = getMeanAlignments(alignmentTable,property = 'processedRace')
 429 | 
 430 | 
 431 | names(raceAlignment)[3] = 'Race'
 432 | 
 433 | raceAlignment %<>% filter(Race !='')
 434 | 
 435 | 
 436 | raceAlignment %>% ggplot(aes(y = `Good/Evil`,x = `Chaotic/Lawful`, color = Race,hede = N)) +
 437 |     geom_point() +  
 438 |     scale_y_continuous(breaks = c(1,2,3),
 439 |                        labels = c('E','N','G'),limits = c(1,3)) + 
 440 |     scale_x_continuous(breaks = c(1,2,3),
 441 |                        labels = c('C','N','L'),limits = c(3,1), trans = 'reverse')+
 442 |     ylab('Good/Evil') +
 443 |     xlab('Chaotic/Lawful') +
    scale_color_manual(values = barPalette) -> p
 446 | 
 447 | ply = plotly::ggplotly(p) %>%  layout(xaxis=list(fixedrange=TRUE)) %>%
 448 |     config(displayModeBar = F) %>% 
 449 |     layout(yaxis=list(fixedrange=TRUE))
 450 | ply$width = 500
 451 | ply$height = 300
 452 | # rmarkdown seems to ignore alignment set using plotly
 453 | div(ply,align = 'center')
 454 | 
 455 | ```
 456 | 
Take that, racism! Half-Orcs tend to be nicer characters than humans (disclaimer: like most, if not all, of the one-to-one comparisons you can make here, the difference between Half-Orc and Human "goodness" is not statistically significant, p = `r alignmentTable %>% filter(processedRace %in% c('Human','Half-Orc')) %>% mutate(good = as.integer(good))  %>% lm(good~processedRace,data=.) %>% summary %$% coefficients %>% {.[2,4]} %>% round(digits = 2)`). Alas, Tieflings
are as close to Chaotic Stupid as their stereotype suggests.
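The p-value above comes from regressing the 1-3 "goodness" score on race with `lm()`. With a single two-level predictor, that is the same test as a two-sample t-test with pooled variance, which the base R sketch below checks on made-up scores. Treating the 1-3 scale as numeric is itself a simplification; a rank-based test such as `wilcox.test()` would be a reasonable alternative for ordinal data.

```r
# Made-up goodness scores (1 = Evil, 2 = Neutral, 3 = Good); not real data
scores = data.frame(
    good = c(3, 3, 2, 1, 3, 2, 2, 3, 1, 2),
    race = rep(c('Half-Orc', 'Human'), each = 5),
    stringsAsFactors = FALSE
)

# p-value of the race coefficient, as in the inline calculation above
lmP = summary(lm(good ~ race, data = scores))$coefficients[2, 4]

# Equivalent two-sample t-test assuming equal variances
tP = t.test(good ~ race, data = scores, var.equal = TRUE)$p.value

c(lm = lmP, t.test = tP)
```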
 459 | 
 460 | <!-- ## Are your skills rare? -->
 461 | 
 462 | <!-- Skills are this is where I get bored -->
 463 | 
 464 | <!-- ```{r skills} -->
 465 | <!-- uniqueTable$skills %>% str_split('\\|') %>% unlist %>% table %>% sort -->
 466 | <!-- ``` -->
 467 | 
 468 | ## Are your feat choices rare?
 469 | 
 470 | Jeremy Crawford once [tweeted](https://twitter.com/jeremyecrawford/status/969020122177331200?lang=en)
 471 | 
 472 | > Another piece of D&D data: a majority of D&D characters don't use feats. Many players love the customization possible with feats, but a larger group of players is happy to make characters without feats. Feats are, therefore, not a driving force behind many players' choices. 
 473 | 
We can check whether our data agrees. At first glance, `r round(sum(uniqueTable$feats!='')/nrow(uniqueTable)*100)`% of all characters
have at least one feat. However, this number is pulled down by the fact that a significant portion (`r round(sum(uniqueTable$level %in%  c(1,2,3))/nrow(uniqueTable)*100)`%)
of our characters are between levels 1 and 3, and unless they are variant humans, they cannot have feats yet. At higher levels, feat adoption rates increase considerably, suggesting that once given the opportunity, players are likely to pick a feat.
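The level groups used here come from `cut()` with breaks at common feat acquisition points (set up at the top of the document). A minimal base R sketch of the same binning plus a per-group feat rate, on made-up characters where an empty string means no feats:

```r
# Made-up characters: levels and pipe-separated feats ('' = no feats)
level = c(1, 2, 4, 5, 8, 12, 17, 20)
feats = c('', 'Alert', '', 'Lucky', 'Sentinel|Tough', 'Lucky', '', 'Alert')

# Same breaks and labels as the analysis; intervals are right-closed, e.g. (3, 7]
levelGroup = cut(level, breaks = c(0, 3, 7, 11, 15, 18, 20),
                 labels = c('1-3', '4-7', '8-11', '12-15', '16-18', '19-20'))

# Percentage of characters with at least one feat in each level group
featRate = tapply(feats != '', levelGroup, mean) * 100
featRate
```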
 477 | 
 478 | 
 479 | ```{r featProportions,fig.height=4.3}
 480 | uniqueTable %>% 
 481 |     filter(!is.na(levelGroup)) %>% 
 482 |     group_by(levelGroup) %>% 
 483 |     mutate(levelGroup2 = paste0(levelGroup,'\n(',n(),' chars)')) %>% 
 484 |     ungroup() %>% 
 485 |     arrange(levelGroup) %>% 
 486 |     mutate(levelGroup2 = factor(levelGroup2, levels = unique(levelGroup2))) %>% 
 487 |     group_by(levelGroup2) %>% 
 488 |     summarise(featPopularity = sum(feats!='')/n()*100) %>%
 489 |     ggplot(aes(x = levelGroup2,y = featPopularity)) +
 490 |     geom_text(aes(label = paste(round(featPopularity),'%')),vjust=-0.25) + 
 491 |     geom_bar(stat = 'identity') +
 492 |     ylab('% with at least one feat') + xlab('Level Interval') + 
 493 |     ggtitle('Feat adoption by character levels')
 494 | 
 495 | commonPlayTable = uniqueTable %>% filter(as.integer(levelGroup) %in% c(2,3,4))
 496 | commonPlayFeatRate = commonPlayTable %>% {sum(.$feats!='')/nrow(.)}
 497 | 
 498 | ```
 499 | 
It is reasonable to assume that players spend most of their time between levels 4 and 15. `r round(commonPlayFeatRate*100)`% of all characters
in this range have at least one feat. As I later discovered, this is also roughly in line with the
[data in DnDBeyond](https://twitter.com/BadEyeAdam/status/969435420676231169), though the percentages
here are higher overall.
 504 | 
**Note:** I am getting messages saying this clearly shows Crawford was super wrong.
That's not very accurate. It is true that my data shows a higher proportion of feat
adoption than the D&D Beyond data; however, we cannot conclusively reject the statement
"a majority of D&D characters don't use feats" due to possible sampling errors. If we
take the level 4-15 interval into consideration, our sample size is `r nrow(commonPlayTable)`.
Based on this, we have a `r round(sqrt( commonPlayFeatRate*(1- commonPlayFeatRate)/nrow(commonPlayTable)) * 1.96*100)`% margin of error (95% confidence) on that `r round(commonPlayFeatRate*100)`%.
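The margin of error quoted above is the usual normal-approximation (Wald) interval for a proportion, `1.96 * sqrt(p * (1 - p) / n)`. Here is a minimal base R sketch with toy values (not the survey's numbers) showing why a feat rate above 50% is not automatically enough to reject "a majority don't use feats": the whole interval has to clear 50%. This only covers random sampling error, not systematic differences between my users and D&D Beyond's.

```r
# Wald 95% margin of error for a proportion p estimated from n characters
marginOfError = function(p, n, z = 1.96){
    z * sqrt(p * (1 - p) / n)
}

p = 0.55  # toy feat adoption rate
n = 200   # toy sample size
moe = marginOfError(p, n)
c(lower = p - moe, upper = p + moe)

# Reject "a majority don't use feats" only if the whole interval is above 0.5
(p - moe) > 0.5
```

With these toy numbers the interval runs from about 0.48 to 0.62, so it still includes 50%. Base R's `binom.test()` gives an exact alternative to this approximation.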
 511 | 
The next step is to examine which classes pick which feats, and which feats
are the most popular. The graph below shows how often each feat is selected and by which
class. Multiclassed characters are merged into their own category to reduce clutter.
Any feat selected two times or fewer is removed. Again, mouse over the bars to see details.
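Under the hood this amounts to relabelling multiclassed characters, splitting the pipe-separated feat lists and counting feat/class pairs. A base R sketch with made-up characters; the cutoff is a parameter here (`minPicks`, a hypothetical name) and is set lower than the graph's so the toy data keeps something:

```r
# Made-up characters; classes and feats are pipe-separated as in the data
justClass = c('Fighter', 'Fighter', 'Fighter|Rogue', 'Wizard')
feats     = c('Sentinel|Polearm Master', 'Sentinel', 'Lucky', 'Lucky|Alert')

# Any character with more than one class goes into a single bucket
classGroup = ifelse(grepl('|', justClass, fixed = TRUE), 'Multiclassed', justClass)

# One row per (character, feat) pick
featList = strsplit(feats, '|', fixed = TRUE)
picks = data.frame(Class = rep(classGroup, lengths(featList)),
                   Feat  = unlist(featList),
                   stringsAsFactors = FALSE)

# Keep feats picked more than minPicks times, then count by class
minPicks = 1
featTotals = table(picks$Feat)
keep = names(featTotals)[featTotals > minPicks]
popular = picks[picks$Feat %in% keep, ]
counts = table(popular$Feat, popular$Class)
counts
```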
 516 | 
 517 | ```{r featBar}
 518 | featedChars = uniqueTable %>%
 519 |     filter(feats!='') %>%
 520 |     mutate(justClass = {justClass[grepl('\\|',justClass)] = 'Multiclassed';justClass}) %>% 
 521 |     filter(justClass %in% names(which(table(justClass)>1)))
 522 | class = featedChars$justClass
 523 | feats = featedChars$feats
 524 | 
 525 | uniqueFeats = feats %>% str_split('\\|') %>% unlist %>% unique %>% na.omit()
 526 | 
 527 | featPicks =  feats %>% str_split('\\|') 
 528 | 
 529 | names(featPicks) = class
 530 | 
 531 | featFrame = 
 532 |     featPicks %>% melt %>% {names(.) = c('Feat','Class');.} %>%
 533 |     mutate(Feat = factor(Feat,levels = names(sort(table(Feat),decreasing = TRUE)))) %>% 
 534 |     filter(Feat %in% names(which(table(Feat)>2))) %>% group_by(Feat,Class) %>% summarize(Count = n())
 535 | 
 536 | 
 537 | featFrame %>% 
 538 |     ggplot(aes(x = Feat,y = Count, fill = Class)) +
 539 |     geom_bar(stat = 'identity') +
 540 |     xlab('') + 
 541 |     theme(axis.text.x = element_text(angle = 90,vjust = 0.5,hjust = 1 )) + 
 542 |     scale_fill_manual(values = barPalette)+
    ggtitle('Feat popularity and class preference')->p
 544 | 
 545 | ply = plotly::ggplotly(p) %>% config(displayModeBar = F) %>% layout(xaxis=list(fixedrange=TRUE)) %>% layout(yaxis=list(fixedrange=TRUE))
 546 | 
 547 | ply$height = 500
 548 | 
 549 | div(ply,align='center')
 550 | # version of this code that splits multiclasses into components. results
 551 | # in an ugly graph
 552 | # singleClassed %>% filter(!is.na(feats))
 553 | 
 554 | # 
 555 | # feats = uniqueTable$feats %>% str_split('\\|') %>% unlist %>% na.omit%>%unique
 556 | # 
 557 | # classes = uniqueTable$justClass %>% str_split('\\|') %>% unlist %>% unique
 558 | # featCoOccurence = matrix(0,nrow = length(feats),ncol = length(classes))
 559 | # 
 560 | # for (i in seq_along(feats)){
 561 | #     for (j in seq_along(classes)){
 562 | #         ((grepl(feats[i],uniqueTable$feats,perl= TRUE)) * {
 563 | #             classLevel  =str_extract(uniqueTable$class,glue('(?<={classes[j]} )[0-9]+')) %>% {.[is.na(.)] = 0;.} %>% as.integer()
 564 | #             classLevel/uniqueTable$level
 565 | #             }) %>% sum -> coOcc
 566 | #         featCoOccurence[i,j] = coOcc
 567 | #     }
 568 | # }
 569 | # 
 570 | # colnames(featCoOccurence) = classes
 571 | # rownames(featCoOccurence) = feats
 572 | # 
 573 | # featCoOccurence = featCoOccurence[,!featCoOccurence %>% apply(2,sum) %>% {.<2}]
 574 | # featCoOccurence = featCoOccurence[!featCoOccurence %>% apply(1,sum) %>% {.<1},]
 575 | # 
 576 | # featFrame = featCoOccurence %>% melt %>% filter(value!=0) %>% {names(.)=c('Feat','Class','Count');.}
 577 | # popFeat = featFrame %>% group_by(Feat) %>% summarize(total = sum(Count)) %>% arrange(desc(total)) %>% filter(total>1)
 578 | # featFrame %<>% 
 579 | #     filter(Feat %in% popFeat$Feat) %>%
 580 | #     mutate(Feat = factor(Feat,levels = popFeat$Feat),
 581 | #            Class = factor(Class, levels = sort(as.character(unique(Class)))))
 582 | # 
 583 | # 
 584 | # featFrame %>% 
 585 | #     ggplot(aes(x = Feat,y = Count, fill = Class)) + geom_bar(stat = 'identity') +
 586 | #     xlab('') + 
 587 | #      theme(axis.text.x = element_text(angle = 90,vjust = 0.5,hjust = 1 )) + 
 588 | #     scale_fill_manual(values = c('#7DD4A6','#C15BC5','#D65242','#415455','#D2A75C','#8FD25B',
 589 | #                                  '#D15B86','#A5B5BE','#727EC6','#567441','#754334','#5E3A60'))+
 590 | #     ggtitle('Feat popularity and class prefence')->p
 591 | # 
 592 | # ply = plotly::ggplotly(p)
 593 | # 
 594 | # ply$height = 500
 595 | # 
 596 | # div(ply,align='center')
 597 | 
 598 | ```
 599 | 
It is surprising that Elven Accuracy, a feat that was added in a supplement and is restricted to elves and half-elves, is as
popular as many core book feats that are known to be highly effective. `r uniqueTable %>% filter(grepl('elf|Variant',race,ignore.case = TRUE)) %$% feats %>% {grepl('Elven A',.)} %>% {sum(.)/length(.)*100} %>% round`% of all elves and half-elves have this feat. Its appeal to both
ranged weapon attackers and spellcasters seems to make it a good choice for elves from many walks
of life. Another interesting bit
is that the Magic Initiate feat seems to be very popular among classes that already have spellcasting ability. I was
always under the impression that Magic Initiate's main use case would be to add some magic to a mundane
class.
 607 | 
We can also look into how feats synergize with each other. The network below shows how often feats
are selected together. Pairs that appear together in only one character are removed. Node sizes represent how many times a feat appeared together with another feat. The thickness of the lines between the nodes is determined
by the number of characters both feats appear in.
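One compact way to get the same pairwise counts is an incidence matrix (characters × feats) multiplied by its own transpose. This is a swapped-in technique, not the loop-and-regex approach used in the chunk that draws the network, and it assumes feat names match exactly. A base R sketch with made-up characters:

```r
# Made-up feat lists, one string per character
feats = c('Crossbow Expert|Sharpshooter',
          'Crossbow Expert|Sharpshooter|Lucky',
          'Sentinel|Polearm Master',
          'Sentinel|Polearm Master',
          'Lucky|Alert')

featList = strsplit(feats, '|', fixed = TRUE)
allFeats = sort(unique(unlist(featList)))

# Incidence matrix: rows are characters, columns are feats, 1 = has the feat
incidence = t(sapply(featList, function(f) as.integer(allFeats %in% f)))
colnames(incidence) = allFeats

# Entry [i, j] = number of characters that have both feat i and feat j
coOcc = crossprod(incidence)
diag(coOcc) = 0

# Drop pairs seen in only one character
coOcc[coOcc < 2] = 0
coOcc
```

The result could be handed to igraph's `graph_from_adjacency_matrix()` just like the analysis does. Unlike the analysis, this sketch keeps the raw pair counts instead of subtracting one from every edge weight.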
 611 | 
 612 | ```{r featNetwork,fig.width=7.5,fig.height=7.5}
 613 | featCoOccurence = uniqueTable %>% filter(grepl("\\|",feats)) %$% feats
 614 | uniqueFeats = featCoOccurence %>% strsplit('\\|') %>% unlist %>% table %>% sort(decreasing = TRUE) %>% 
 615 |     {.[.>0]}%>% names
 616 | adjMatrix = matrix(0,nrow= length(uniqueFeats),ncol = length(uniqueFeats))
 617 | 
 618 | 
 619 | for (i in seq_along(uniqueFeats)){
 620 |     for(j in seq_along(uniqueFeats)){
 621 |         if(i !=j){
 622 |             feati = grepl(x = featCoOccurence,
 623 |                           pattern = paste0('(\\||^)',uniqueFeats[i], ('(\\||$)')))
 624 |             featj = grepl(x = featCoOccurence,
 625 |                           pattern = paste0('(\\||^)',uniqueFeats[j], ('(\\||$)')))
 626 |             
 627 |             adjMatrix[i,j] = sum(feati & featj)
 628 |         }
 629 |     }
 630 | }
 631 | uniqueFeats %<>% str_replace(' ','\n')
 632 | 
 633 | 
 634 | 
 635 | rownames(adjMatrix) = uniqueFeats
 636 | colnames(adjMatrix) = uniqueFeats
 637 | 
 638 | threshold = 1
 639 | adjMatrix = adjMatrix-threshold
 640 | adjMatrix[adjMatrix < 1] = 0
 641 | zeroFilter = adjMatrix %>% apply(1,sum) %>% {.!=0}
 642 | adjMatrix = adjMatrix[zeroFilter,zeroFilter]
 643 | 
 644 | 
 645 | network=graph_from_adjacency_matrix( adjMatrix, weighted=T, mode="undirected", diag=F)
 646 | E(network)$width <- E(network)$weight*2.5
 647 | 
 648 | maxWeight = E(network)$weight %>% max
 649 | maxStrength = strength(network) %>% max
 650 | par(mar=c(0,0,1,0))
 651 | 
 652 | set.seed(9)
 653 | plot(network,
 654 |      vertex.frame.color="white",
 655 |      vertex.label.color="black",
 656 |      vertex.size = strength(network)*1.5,
 657 |      main = 'Feat synergy network',
 658 |      asp = 1)
 659 | ```
 660 | 
Before I say anything, I have to point out that the connections in this graph aren't particularly
strong. There are
too many feats and too few characters for many feat pairs to appear together often.
The strongest link in this graph is based on `r max(adjMatrix)+1` observations.

Yet, as it stands, the connections seem quite intuitive, so we are probably not staring at noise here.
The versatility of Elven Accuracy is visible in this graph, as it is selected both by
characters optimizing their ranged weapon attacks and by those optimizing their spell attacks.
Crossbow Expert-Sharpshooter is known to be an effective combination to boost damage.
Sentinel-Polearm Master is amazing for battlefield control.
 671 | 
 672 | ## Is your multiclass combination rare?
 673 | 
Since our dataset includes multiclassed characters, we can see which classes tend to appear
together. Note that our sample size is much smaller here (`r nrow(multiClassed)` characters). Node sizes in the
network below show how often a class appears across all multiclassed characters. The thickness of the lines between the nodes
is determined by the number of characters both classes appear in. For instance, we see that most rangers
multiclass with rogues, while most rogues multiclass with fighters.
 679 | 
 680 | ```{r multiClassingNetwork}
 681 | 
 682 | coOccurence = multiClassed$justClass
 683 | # in case I need them ordered
 684 | uniqueClasses =   coOccurence %>% 
 685 |     strsplit('\\|') %>%
 686 |     unlist %>% 
 687 |     table %>% 
 688 |     sort(decreasing = TRUE) %>%
 689 |     names
 690 | uniqueClasses = uniqueClasses[uniqueClasses %in% legitClasses]
 691 | 
 692 | adjMatrix = matrix(0,nrow= length(uniqueClasses),ncol = length(uniqueClasses))
 693 | 
 694 | for (i in seq_along(uniqueClasses)){
 695 |     for(j in seq_along(uniqueClasses)){
 696 |         if(i !=j){
 697 |             adjMatrix[i,j] = sum(grepl(x = coOccurence,pattern = uniqueClasses[i]) &  grepl(x = coOccurence,pattern = uniqueClasses[j]))
 698 |         }
 699 |     }
 700 | }
 701 | rownames(adjMatrix) = uniqueClasses
 702 | colnames(adjMatrix) = uniqueClasses
 703 | network=graph_from_adjacency_matrix( adjMatrix, weighted=T, mode="undirected", diag=F)
 704 | E(network)$width <- E(network)$weight
 705 | 
 706 | maxWeight = E(network)$weight %>% max
 707 | maxStrength = strength(network) %>% max
 708 | par(mar=c(0,0,1,0))
 709 | 
 710 | plot(network,layout = layout_in_circle,
 711 |      vertex.frame.color="white",
 712 |      vertex.label.color="black",
 713 |      vertex.size = strength(network),
 714 |      main = 'Multiclassing network',
 715 |      asp = 1
 716 |      )
 717 | 
 718 | 
 719 | ```
 720 | 
 721 | While this network is good at showing which classes tend to be chosen together, it doesn't
 722 | give much information about how levels are distributed between them. In the graph below we look at
 723 | what fraction of each character's total level comes from each class. A Fighter 5/Rogue 15 would appear
 724 | as a 25% data point in the Fighter column and 75% in the Rogue column. This tells
 725 | us which classes are dipped into and which ones are used as the main class.
 726 | 
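The ratio is each class's level divided by the total character level. Here is a minimal sketch for a single character. It assumes the `class` column format described in the data section (`Fighter 5|Rogue 15`), and `classProportions()` is a hypothetical helper, not part of the original code.

```r
# Hypothetical helper: turn a `class` string such as "Fighter 5|Rogue 15"
# into the share of the total character level held by each class
classProportions <- function(classString) {
  parts <- trimws(strsplit(classString, "|", fixed = TRUE)[[1]])
  # the level is the number at the end of each part; the class name precedes it
  classLevels <- as.integer(regmatches(parts, regexpr("[0-9]+$", parts)))
  classNames <- trimws(sub("[0-9]+$", "", parts))
  setNames(classLevels / sum(classLevels), classNames)
}

classProportions("Fighter 5|Rogue 15")  # Fighter 0.25, Rogue 0.75
```

The chunk below does the same per class with `str_extract()` and multiplies by 100 to get percentages.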
 727 | ```{r multiClassingProportions}
 728 | 
 729 | multiClassProportion = lapply(uniqueClasses, function(x){
 730 |     classSubset = multiClassed %>% filter(grepl(x,justClass))
 731 |     
 732 |     classLevel = classSubset$class %>% 
 733 |         str_extract(glue('{x} [0-9]+')) %>% 
 734 |         str_extract('[0-9]+') %>%
 735 |         as.integer
 736 |     
 737 |     classLevel/classSubset$level
 738 |     
 739 | })
 740 | 
 741 | multiClassTotalLevel =  lapply(uniqueClasses, function(x){
 742 |     classSubset = multiClassed %>% 
 743 |         filter(grepl(x,justClass))
 744 |     totalLevel = classSubset$level
 745 | 
 746 | })
 747 | 
 748 | multiClassChar = lapply(uniqueClasses, function(x){
 749 |     classSubset = multiClassed %>% 
 750 |         filter(grepl(x,justClass))
 751 |     classInfo = classSubset$class
 752 | 
 753 | })
 754 | 
 755 | names(multiClassProportion) = uniqueClasses
 756 | names(multiClassTotalLevel) = uniqueClasses
 757 | 
 758 | multiClassProportion %<>%
 759 |     melt
 760 | order = multiClassProportion %>% 
 761 |     group_by(L1) %>% 
 762 |     summarise(mean = mean(value)) %>% 
 763 |     arrange(desc(mean)) %$% L1
 764 | 
 765 | multiClassProportion$L1 %<>% factor(levels = order)
 766 | multiClassTotalLevel %<>% melt
 767 | multiClassChar %<>% melt
 768 | multiClassProportion = cbind(multiClassProportion,multiClassTotalLevel$value,multiClassChar$value)
 769 | 
 770 | names(multiClassProportion) = c('ClassProp','Class','Level','Char')
 771 | 
 772 | 
 773 | multiClassProportion %<>% mutate(ClassProp = round(ClassProp * 100,digits = 2))
 774 | 
 775 | multiClassProportion %>%
 776 |     ggplot(aes(x = Class, y = ClassProp, label = Char)) + 
 777 |     geom_violin(color = "#C4C4C4", fill = "#C4C4C4") +
 778 |     geom_jitter(alpha = .5,width = 0.1) +
 779 |     theme(axis.text.x = element_text(angle = 90,vjust = 0.5,hjust = 1 )) + 
 780 |     ylab('% of character level') +
 781 |     xlab('') ->p
 782 | 
 783 | ply = ggplotly(p) %>%  layout(xaxis=list(fixedrange=TRUE)) %>%
 784 |     config(displayModeBar = F) %>% 
 785 |     layout(yaxis=list(fixedrange=TRUE))
 786 | ply$width = 600
 787 | ply$x$data[[2]]$text = multiClassProportion$Char
 788 | ply$x$data[[1]]$hoverinfo = 'none'
 789 | div(ply,align = 'center')
 790 | 
 791 | ```
 792 | 
 793 | While there is a lot of variation in the data, some conventional wisdom 
 794 | shows up in the means. Warlock is famous for its dipping potential, and a Cleric
 795 | level synergizes nicely with many other class features. I am a proud player of a Cleric-dipped
 796 | Fighter myself. I would avoid reading too much into this, though. The variance
 797 | is too high and the sample size too low to make reliable inferences.
 798 | 
 799 | And finally, let's see what share of the characters of each class are multiclassed
 800 | rather than single classed.
 801 | 
 802 | ```{r mutliVsSingle}
 803 | 
 804 | totalClass = uniqueClasses %>% sapply(function(x){grepl(x,uniqueTable$justClass) %>% sum})
 805 | multiClass = uniqueClasses %>% sapply(function(x){grepl(x,multiClassed$justClass) %>% sum})
 806 | 
 807 | multiProps = sort(multiClass/totalClass,decreasing = TRUE)
 808 | 
 809 | data.frame(Class = factor(names(multiProps),levels = names(multiProps)),Prop = multiProps*100) %>% 
 810 |     ggplot(aes(x = Class, y= Prop)) + 
 811 |      geom_bar(stat = 'identity') +
 812 |     xlab('') + 
 813 |         geom_text(aes(label = paste(round(Prop),'%')),vjust=-0.25) + 
 814 |      theme(axis.text.x = element_text(angle = 90,vjust = 0.5,hjust = 1 ))  + ylab('% in multiclassed build')
 815 | ```
 816 | 
 817 | 
 818 | ## Is power gaming rare?
 819 | 
 820 | ```{r multiClassingAndFeats}
 821 | highLevel = uniqueTable %>% filter(levelGroup %>% as.integer %>% {.>1})
 822 | HLmultiClassed = highLevel %>% filter(grepl('\\|',justClass))
 823 | HLsingleClassed = highLevel %>% filter(!grepl('\\|',justClass))
 824 | 
 825 | singleClassedFeaters = sum(HLsingleClassed$feats!='')
 826 | 
 827 | multiClassedFeaters = sum(HLmultiClassed$feats!='')
 828 | # one-sided hypergeometric test: P(at least this many feat takers among the high level multiclassers)
 829 | pVal = phyper(multiClassedFeaters - 1, multiClassedFeaters + singleClassedFeaters,
 830 |               nrow(highLevel) - (multiClassedFeaters + singleClassedFeaters),
 831 |               nrow(HLmultiClassed), lower.tail = FALSE)
 832 | ```
 833 | 
 834 | OK, that title is a stretch, but we have a format to stick to.
 835 | 
 836 | Both multiclassing and picking feats are somewhat advanced character building rules.
 837 | While they complicate the character building process, they can be used to create frighteningly 
 838 | effective combinations (or leave you waiting until the end of the campaign for your build to get
 839 | everything it needs). Intuitively, it wouldn't be surprising if multiclassers were more likely
 840 | to take feats to optimize their builds. Indeed, we see that `r round(multiClassedFeaters/nrow(HLmultiClassed)*100)`% 
 841 | of multiclassed characters above level 3 chose to take a feat, as opposed to 
 842 | `r round(singleClassedFeaters/nrow(HLsingleClassed)*100)`% of their single classed counterparts.
 843 | A modest difference (one-sided hypergeometric test, p=`r format.pval(pVal,digits = 2)`).
 844 | 
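The feat comparison can be framed as a one-sided hypergeometric test. Treat the high level characters with feats as "successes", then ask how likely it is to draw at least the observed number of them when picking as many characters at random as there are high level multiclassers. Since `phyper(..., lower.tail = FALSE)` returns P(X > q), `q` is set one below the observed count. Fisher's exact test with `alternative = "greater"` on the matching 2x2 table is the same test. This is a sketch with hypothetical counts, not the dataset's.

```r
# Hypothetical counts for characters above the level cut-off (not the real data)
multiWithFeat  <- 20
multiTotal     <- 60
singleWithFeat <- 50
singleTotal    <- 300

withFeat    <- multiWithFeat + singleWithFeat
withoutFeat <- multiTotal + singleTotal - withFeat

# P(X >= multiWithFeat): chance of drawing at least this many feat takers when
# multiTotal characters are picked at random from all high level characters
pHyper <- phyper(multiWithFeat - 1, withFeat, withoutFeat, multiTotal,
                 lower.tail = FALSE)

# the same question as a 2x2 table (rows: multi/single, columns: feat/no feat)
contingency <- matrix(c(multiWithFeat, singleWithFeat,
                        multiTotal - multiWithFeat, singleTotal - singleWithFeat),
                      nrow = 2,
                      dimnames = list(c("multi", "single"), c("feat", "noFeat")))
pFisher <- fisher.test(contingency, alternative = "greater")$p.value

all.equal(pHyper, pFisher)  # TRUE
```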
 845 | ## Are your spells rare?
 846 | 
 847 | Like alignment, spells were annoying to deal with. The app only allows free text
 848 | for spells and doesn't automatically fill in anything other than cleric domain spells.
 849 | Some casters don't bother filling them in at all, and those who do sometimes
 850 | shorten the name of the spell or add things like damage dice next to it. Thanks to
 851 | some computer magic (string distances to all existing spell names), we can identify
 852 | what they are trying to say with satisfying accuracy. The dataset's skew towards low level
 853 | characters also strikes again, as higher level spells appear less and less frequently.
 854 | 
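The "computer magic" is string distance matching. The actual procedure lives in dataProcess.R and may well differ. This sketch assumes a simpler approach: strip damage dice, then pick the closest official name by base R's `adist()` (a generalized Levenshtein distance). `officialSpells`, `matchSpell()` and the `maxDistance` cut-off are all hypothetical.

```r
# Hypothetical reference list; a real run would use every official spell name
officialSpells <- c("Magic Missile", "Fire Bolt", "Cure Wounds",
                    "Eldritch Blast", "Shield")

# Hypothetical helper: map free-text spell entries to the closest official name,
# or NA when even the closest one needs more than maxDistance edits
matchSpell <- function(entries, reference, maxDistance = 3) {
  # drop damage dice such as "1d10" and anything written after them
  cleaned <- trimws(gsub("[0-9]+d[0-9]+.*$", "", entries))
  # edit distances: one row per entry, one column per reference name
  distances <- adist(tolower(cleaned), tolower(reference))
  best <- apply(distances, 1, which.min)
  bestDistance <- distances[cbind(seq_along(entries), best)]
  ifelse(bestDistance <= maxDistance, reference[best], NA_character_)
}

matchSpell(c("magic misile", "Firebolt", "Eldritch Blast 1d10", "Wish"),
           officialSpells)
# "Magic Missile" "Fire Bolt" "Eldritch Blast" NA
```

A fixed edit budget is crude. A cut-off relative to name length would treat long and short spell names more evenly.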
 855 | Below you see how frequently a spell is chosen by each class. Spells chosen
 856 | by fewer than 3 characters, or by 10% or less of the characters listing spells of that level, are removed. I also completely ignored multiclassed characters
 857 | here because I'm not going to bother trying to decide which spell came from
 858 | which class.
 859 | 
 860 | If you don't see high level spells, that means not enough people agreed on any particular
 861 | spell for it to make it into the table.
 862 | 
 863 | ```{r spells}
 864 | # one table per single classed spellcasting class
 865 | c('Wizard','Cleric','Sorcerer','Druid','Warlock','Bard') %>% lapply(function(x){
 866 |     
 867 |     classFrame = singleClassed[singleClassed$justClass %in% x,] 
 868 |     
 869 |     spellNames = classFrame %$% processedSpells %>% strsplit('\\|') %>% map(str_extract,'.*?(?=\\*)') %>% unlist
 870 |     spellLevels = classFrame %$% processedSpells %>% strsplit('\\|') %>% map(str_extract,'(?<=\\*).*') %>% unlist %>% as.integer
 871 |     
 872 |     levelCount = classFrame %$% processedSpells %>% strsplit('\\|') %>% map(str_extract,'(?<=\\*).*') %>% map(unique) %>% unlist %>% as.integer() %>% table
 873 | 
 874 |     
 875 |     frame = data.frame(spellNames,spellLevels,levelCount = as.integer(levelCount[spellLevels %>% as.character]),stringsAsFactors = FALSE) %>% 
 876 |         arrange(spellLevels) %>% group_by(spellNames,spellLevels,levelCount) %>% summarise(count = n()) %>% ungroup() %>% arrange(spellLevels,desc(count)) %>% 
 877 |         mutate(`%` = round(count/levelCount*100)) %>% filter(levelCount>1 & `%` >10 & count>2)
 878 |     
 879 |     groupSep = frame$spellLevels %>% duplicated() %>% not %>% which
 880 |     groupSep = c(groupSep,nrow(frame) + 1) # sentinel so the last group ends on the last row
 881 |     groupLevels= frame$spellLevels %>% unique %>% paste('Level',.)
 882 |     groupLevels[groupLevels %in% 'Level 0'] = 'Cantrip'
 883 |     
 884 |     frame %<>% select(spellNames,count,`%`)
 885 |     
 886 |     kbl = kable(frame,caption = x,format = 'html') %>% 
 887 |         kable_styling("striped", full_width = F) 
 888 |     
 889 |     for(i in seq_along(groupLevels)){
 890 |         kbl %<>% group_rows(groupLevels[i],groupSep[i],groupSep[i+1]-1)
 891 |     }
 892 |     
 893 |     kbl %>%  scroll_box(width = "100%", height = "250px") %>% HTML()
 894 | }) -> tables
 895 | 
 896 | 
 897 | 
 898 | 
 899 | div(
 900 |     fluidRow(
 901 |         column(4,
 902 |                tables[[1]]),
 903 |         column(4,
 904 |                tables[[2]]),
 905 |         column(4,
 906 |                tables[[3]])),
 907 |     fluidRow(
 908 |         column(4,
 909 |                tables[[4]]),
 910 |         column(4,
 911 |                tables[[5]]),
 912 |         column(4,
 913 |                tables[[6]])
 914 |         
 915 |     )
 916 | )
 917 | 
 918 | ```
 919 | 
 920 | ## Is your game day rare?
 921 | 
 922 | My applications are purely utilitarian. One gives you
 923 | a character sheet, the other is an interactive character sheet that automates your dice rolls.
 924 | It is reasonable to think that most people would use them shortly before or during a game. The graph below shows
 925 | how many characters were created on each day of the week, and below that there's a punch card that 
 926 | shows individual hours. 
 927 | 
 928 | 
 929 | 
 930 | ```{r gameDay,fig.height=8}
 931 | reliableDateTable = uniqueTable %>% filter(as.Date(date) >  as.Date('2018-04-16'))
 932 | 
 933 | days = reliableDateTable$date %>% weekdays()
 934 | hours = as.POSIXlt( reliableDateTable$date)$hour
 935 | 
 936 | time = data.frame(days = factor(days, levels = c('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday')), hours = hours)
 937 | 
 938 | 
 939 | time %>% group_by(days,hours) %>% summarise(Characters = n()) %>%
 940 |     ggplot(aes(x = days,y = hours,size = Characters)) + geom_point() +
 941 |     theme(axis.text.x = element_text(angle = 90,vjust = 0.5,hjust = 1 ),
 942 |           plot.margin = unit(c(0,0,0,0),'cm')) +
 943 |     xlab('') +
 944 |     ylab('Hour of Day') -> plot1
 945 | 
 946 | time %>% group_by(days) %>% summarise(Characters = n()) %>% 
 947 |     ggplot(aes(x = days,y = Characters)) + geom_bar(stat ='identity') + 
 948 |     theme(axis.text.x = element_blank(),
 949 |           axis.ticks.x = element_blank(),
 950 |           plot.margin = unit(c(0,0,0,0),'cm'))+
 951 |     xlab('') + ggtitle('Time of character submission') -> plot2
 952 | 
 953 | plot2/plot1 + plot_layout(ncol = 1, heights = c(3,5))
 954 | 
 955 | ```
 956 | 
 957 | Frankly, there's not much to say here. The most popular days of the week are obviously the weekend and Friday. DnD 
 958 | takes time. More work = less DnD.
 959 | Hours of day are somewhat unreliable, as I didn't correct for user time zones. The contiguous US alone, which seems to be 
 960 | where most of my users are coming from,
 961 | spans 3 hours of difference. I could use IPs to detect locations and fix the times, but I'm not 
 962 | going down that rabbit hole... How long before the game a player wants their character 
 963 | sheet is also a big source of variability. I mostly did this because I like punch cards...
 964 | 
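The time zone problem is easy to illustrate. The same instant falls on different local hours depending on where the player is, and near midnight even on different weekdays. This sketch uses a hypothetical timestamp and assumes submission times are recorded in UTC, which the document does not actually state. It uses `%u` (ISO weekday number) rather than weekday names because names depend on the locale; the chunk's `weekdays()` call and its English factor levels have the same dependency.

```r
# A hypothetical submission time, assuming timestamps are stored in UTC
submitted <- as.POSIXct("2018-08-18 02:30:00", tz = "UTC")

# the same instant seen from both ends of the contiguous US
easternHour <- as.integer(format(submitted, "%H", tz = "America/New_York"))
pacificHour <- as.integer(format(submitted, "%H", tz = "America/Los_Angeles"))
easternHour - pacificHour  # 3 (EDT is UTC-4, PDT is UTC-7 in August)

# close to midnight the shift also changes the day of the week
# ("%u" is the ISO weekday number, 1 = Monday)
format(submitted, "%u", tz = "UTC")                  # "6", Saturday
format(submitted, "%u", tz = "America/Los_Angeles")  # "5", Friday
```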
 965 | ## About the data
 966 | 
 967 | Unique characters are obtained by grouping the characters that share the same name and class
 968 | and picking the highest level version. This could have merged independent characters with tropey names
 969 | like Grognak the Barbarian or Drizzt the Ranger, but manual examination of the data showed no cases of characters
 970 | that appear to be made by different people but still have the same name and class. 
 971 | 
 972 | If a multiclassed character shares a name with a single
 973 | classed character, I assume they are duplicates if the single classed character is lower level and
 974 | its class matches one of the classes of the multiclassed character. 
 975 | 
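Here is a minimal sketch of the two deduplication rules in base R. It runs on a hypothetical toy table that uses the released data's column names (`name`, `justClass`, `level`). It only illustrates the rules as described and is not the code in dataProcess.R.

```r
# Hypothetical toy table using the column names of the released data
chars <- data.frame(
  name      = c("a1", "a1", "b2", "d4", "d4"),
  justClass = c("Barbarian", "Barbarian", "Ranger", "Fighter", "Fighter|Rogue"),
  level     = c(3, 5, 2, 2, 6),
  stringsAsFactors = FALSE
)

# Rule 1: same name hash and same class -> keep only the highest level version
chars <- chars[order(chars$name, chars$justClass, -chars$level), ]
chars <- chars[!duplicated(chars[, c("name", "justClass")]), ]

# Rule 2: drop a single classed entry if a multiclassed entry with the same name
# is higher level and includes that class
isMulti <- grepl("|", chars$justClass, fixed = TRUE)
isDuplicate <- vapply(seq_len(nrow(chars)), function(i) {
  if (isMulti[i]) return(FALSE)
  candidates <- which(isMulti & chars$name == chars$name[i] &
                        chars$level > chars$level[i])
  classLists <- strsplit(chars$justClass[candidates], "|", fixed = TRUE)
  any(vapply(classLists, function(cl) chars$justClass[i] %in% cl, logical(1)))
}, logical(1))

uniqueChars <- chars[!isDuplicate, ]
uniqueChars$justClass  # "Barbarian" "Ranger" "Fighter|Rogue"
```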
 976 | Characters above level 20 (there were `r sum(charTable$level > 20)`) were removed. 
 977 | 
 978 | `r sum(grepl('Revised',keepRevised$class,ignore.case = TRUE))` Revised Rangers were merged back into
 979 | the ranger class. 
 980 | 
 981 | Most percentages are rounded to the nearest integer.
 982 | 
 983 | Like all data, this data comes with caveats. It comes from the subset of DnD players who use a
 984 | particular mobile application, who also know about and use my applications, and who consented
 985 | to let me keep their character sheets. I have no reason
 986 | to think that this selection favors certain character building choices, but it's
 987 | something to keep in mind.
 988 | 
 989 | 
 990 | ```{r statistics}
 991 | fighterCount = grepl('Fighter',uniqueTable$class) %>% sum
 992 | battleMasterCount = uniqueTable$subclass %>% str_split('\\|') %>% unlist %>% {. %in% 'Battle Master'} %>% sum
 993 | battleMasterPercent = battleMasterCount/fighterCount
 994 | bmConfInf = sqrt(battleMasterPercent*(1-battleMasterPercent)/fighterCount) * 1.96 # 95% margin of error (normal approximation)
 995 | 
 996 | 
 997 | championCount = uniqueTable$subclass %>% str_split('\\|') %>% unlist %>% {. %in% 'Champion'} %>% sum
 998 | championPercent = championCount/fighterCount
 999 | cmConfInf = sqrt(championPercent*(1-championPercent)/fighterCount) * 1.96 # 95% margin of error (normal approximation)
1000 | 
1001 | 
1002 | ```
1003 | In most parts of this document no information is provided about whether or not the differences
1004 | are actually statistically significant. Sorry about that; I didn't want to fill this place with
1005 | too much math. For instance, we can see that we have
1006 | `r battleMasterCount` Battle Masters
1007 | vs `r championCount` Champions. This is not a statistically significant difference given our sample size,
1008 | so we cannot state with high confidence that one is more popular than the other.
1009 | 
1010 | If you are interested in the significance of any of these measures, you can take a peek at this [article](https://en.wikipedia.org/wiki/Margin_of_error) on Wikipedia, which explains the formulas needed.
1011 | For at least some of these, you should be able to get the information you need from the article.
1012 | 
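The margins computed in the `statistics` chunk use the normal approximation `1.96 * sqrt(p * (1 - p) / n)`. This sketch writes that formula as a hypothetical helper, `marginOfError()`. It also shows one way to check a difference like Battle Masters vs Champions. Both are counted among the same Fighters, so an exact binomial test (`binom.test()`) on the Fighters who chose either subclass asks whether the split differs from 50/50. The counts are hypothetical, not the dataset's.

```r
# Hypothetical helper: 95% margin of error of a proportion (normal approximation),
# the same formula used for bmConfInf and cmConfInf
marginOfError <- function(count, n, z = 1.96) {
  p <- count / n
  z * sqrt(p * (1 - p) / n)
}

# Hypothetical counts, not taken from the dataset
fighters      <- 400
battleMasters <- 90
champions     <- 75

marginOfError(battleMasters, fighters)  # about 0.041, i.e. 22.5% +/- 4.1 points

# Both subclasses come from the same Fighters, so compare them directly:
# among Fighters who picked one of the two, is the split different from 50/50?
splitTest <- binom.test(battleMasters, battleMasters + champions, p = 0.5)
splitTest$p.value  # well above 0.05 for these counts
```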
1013 | If you have any questions, you can [mail me](mailto:ogan.mancarci@gmail.com). Mention "dndstats"
1014 | somewhere in the
1015 | text so your mail won't end up in spam.
1016 | 
1017 | 
1018 | ## Data access
1019 | 
1020 | This dataset is available in 2 forms: the complete version, which includes duplicates
1021 | of characters, and a filtered version that only includes unique characters.
1022 | 
1023 | Go [here](https://github.com/oganm/dndstats/blob/master/docs/charTable.tsv) for the complete data and [here](https://github.com/oganm/dndstats/blob/master/docs/uniqueTable.tsv) for the filtered one. Click the raw button
1024 | to get them in plain text. Both have the columns described below, except **levelGroup**, which is only in the filtered one.
1025 | The code to generate these tables can be found [here](https://github.com/oganm/dndstats/blob/master/dataProcess.R).
1026 | 
1027 | Below are the descriptions of the columns in the files. If you think something you'd be interested
1028 | in is missing, you can let me know.
1029 | 
1030 | **name:** This column has hashes that represent character names. If the hashes are
1031 | the same, that means the names are the same. Real names are removed
1032 | to protect character anonymity. Yes, D&D characters have rights.
1033 | 
1034 | **race:** This is the race field as it comes out of the application. It is not really
1035 | helpful, as subrace and race information are mixed up together and unevenly available.
1036 | It also includes some homebrew content. You probably want to use the **processedRace**
1037 | column if you are interested in this.
1038 | 
1039 | **background:** Background as it comes out of the application.
1040 | 
1041 | **date:** Time & date of input. Dates before 2018-04-16 are unreliable, as some were accidentally changed
1042 | while moving files around.
1043 | 
1044 | **class:** Class and level. Different classes are separated by `|` when needed.
1045 | 
1046 | **justClass:** Class without level. Different classes are separated by `|` when needed.
1047 | 
1048 | **subclass:** Subclasses. Again, separated by `|` when needed.
1049 | 
1050 | **level:** Total character level.
1051 | 
1052 | **feats:** Feats chosen by character. Separated by `|` when needed.
1053 | 
1054 | **HP:** Character HP.
1055 | 
1056 | **AC:** Character AC.
1057 | 
1058 | **Str, Dex, Con, Int, Wis, Cha:** Ability scores.
1059 | 
1060 | **alignment:** Free text alignment field. It is a mess, don't touch it. See **processedAlignment**, **good** and **lawful** instead.
1061 | 
1062 | **skills:** List of skills the character is proficient in. Separated by `|`.
1063 | 
1064 | **weapons:** List of weapons. Separated by `|`. It is somewhat of a mess, as it allows free text input. See **processedWeapons**.
1065 | 
1066 | **spells:** List of spells and their levels. Spells are separated by `|`. Each spell is followed by its level,
1067 | separated by `*` (e.g. `Fire Bolt*0`). This is a huge mess, as it's a free text field and some users included things like damage dice in it. See **processedSpells**.
1068 | 
1069 | **day:** A shortened version of **date**. Only includes day information.
1070 | 
1071 | **processedAlignment:** Processed version of the **alignment** column. The ways people wrote their alignments were manually sifted through and assigned to the matching alignment. The first character represents lawfulness (L, N, C), the second one goodness (G, N, E). An empty string means the alignment wasn't written or was unclear.
1072 | 
1073 | **good, lawful:** Isolated columns for goodness and lawfulness.
1074 | 
1075 | **processedRace:** I have gone through the ways the **race** column is filled by the app and assigned them to the correct
1076 | races. If empty, it indicates a homebrew race not natively supported by the app.
1077 | 
1078 | **processedSpells:** Formatting is the same as in the **spells** column, but it is cleaned up. Using string similarity, I tried
1079 | to match the spells to the full list of spells available in the official publications. A spell is removed if the spell I guessed does not have the correct level, or if it doesn't include all words of the original spell and has too many modifications to be recognizable. It may have a few false matches, but it should be mostly fine.
1080 | 
1081 | **processedWeapons:** Similar to **processedSpells**, **weapons** column is matched to the closest official weapon with some restrictions.
1082 | 
1083 | **levelGroup:** Splits levels into groups as used in the feat percentage plot. Only present in the filtered data,
1084 | but easy enough to make on your own.
1085 | 
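The `|`-separated columns and the `Name*Level` spell entries can be split with base R. Here is a sketch on hypothetical values written in the documented formats, not real rows. The variable names are illustrative.

```r
# Hypothetical values in the documented formats of the feats and processedSpells columns
feats <- "War Caster|Lucky"
processedSpells <- "Fire Bolt*0|Mage Hand*0|Magic Missile*1|Shield*1|Misty Step*2"

# "|" is a regex metacharacter, so split with fixed = TRUE
featList <- strsplit(feats, "|", fixed = TRUE)[[1]]
spellEntries <- strsplit(processedSpells, "|", fixed = TRUE)[[1]]

# each spell entry is "Name*Level"; split once more on "*"
spellParts <- strsplit(spellEntries, "*", fixed = TRUE)
spells <- data.frame(
  spell = vapply(spellParts, `[`, character(1), 1),
  level = as.integer(vapply(spellParts, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)

table(spells$level)  # spells listed per level, 0 = cantrip
```

To process a whole column, apply the same splitting to each value, for example with `lapply()`, much as the `spells` chunk does with `strsplit()` and `map()`.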
1086 | 
1087 | ## About this document
1088 | 
1089 | The text of this document is licensed under the [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license.
1090 | 
1091 | [Here](https://github.com/oganm/DnDStatistics/blob/master/docs/index.Rmd)'s its source code. It's not pretty. 
1092 | 
1093 | The code blocks within the source code are licensed under the [MIT license](https://opensource.org/licenses/MIT).
1094 | 
1095 | ## Changelog
1096 | 
1097 | **9 September 2018:**
1098 | * Data from 100 more characters added.
1099 | 
1100 | **19 August 2018:**
1101 | 
1102 | * Typo in data release. Same name hash means the names are the same, not the characters.
1103 | * Alignment flip again to match memes
1104 | 
1105 | **18 August 2018 2:**
1106 | 
1107 | * Fix bug that counts the percentage of people who wrote their alignments down wrong
1108 | * Flip alignment axes
1109 | * Disclaimer about feat adoption
1110 | 
1111 | **18 August 2018:**
1112 | 
1113 | * Data from additional 82 characters incorporated. No significant changes observed.
1114 | * Links to the data added
1115 | * Spell information added
1116 | * Feat bar plot now filters any feat that is taken less than 3 times instead of 2
1117 | 
1118 | **2 August 2018:** 
1119 | 
1120 | * License information added. 
1121 | * A forgotten word added.
1122 | * Data from 40 additional characters incorporated. No significant changes observed.
1123 | * Claim about increased decency of Half-Orcs softened
1124 | * Changelog added
1125 | 
1126 | **28 July 2018:** 
1127 | 
1128 | * Initial release
1129 | 
1130 | 
1131 | 
1132 | 
1133 | 


--------------------------------------------------------------------------------