├── .Rhistory
├── README.md
├── figures
    ├── use.png
    ├── wpp.png
    ├── fear.png
    ├── g.graphml
    ├── d3net.html
    ├── gender1.png
    ├── gender2.png
    ├── static.png
    ├── things.png
    ├── topicdiffs.png
    ├── topicprops.png
    ├── topiccluster.png
    ├── topicprops2.png
    └── topicprops3.png
├── data
    └── dayofarchaeology.csv
├── 006_kmeans_for_groups_of_similar_authors.r
├── 005_topic_similarity_matrix.r
├── 001_scrape_for_links_to_fulltext.r
├── 007_visualise_author_relationships.r
├── 008_analyse_commenter_author_relationships.r
├── 002_scrape_for_fulltext_etc_from_links.r
├── 004_generate_topic_model.r
├── 003_clean_fulltext.r
└── README.html


/.Rhistory:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/README.md
--------------------------------------------------------------------------------
/figures/use.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/use.png
--------------------------------------------------------------------------------
/figures/wpp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/wpp.png
--------------------------------------------------------------------------------
/figures/fear.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/fear.png
--------------------------------------------------------------------------------
/figures/g.graphml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/g.graphml
--------------------------------------------------------------------------------
/figures/d3net.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/d3net.html
--------------------------------------------------------------------------------
/figures/gender1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/gender1.png
--------------------------------------------------------------------------------
/figures/gender2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/gender2.png
--------------------------------------------------------------------------------
/figures/static.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/static.png
--------------------------------------------------------------------------------
/figures/things.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/things.png
--------------------------------------------------------------------------------
/figures/topicdiffs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/topicdiffs.png
--------------------------------------------------------------------------------
/figures/topicprops.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/topicprops.png
--------------------------------------------------------------------------------
/data/dayofarchaeology.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/data/dayofarchaeology.csv
--------------------------------------------------------------------------------
/figures/topiccluster.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/topiccluster.png
--------------------------------------------------------------------------------
/figures/topicprops2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/topicprops2.png
--------------------------------------------------------------------------------
/figures/topicprops3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/benmarwick/dayofarchaeology/HEAD/figures/topicprops3.png
--------------------------------------------------------------------------------
/006_kmeans_for_groups_of_similar_authors.r:
--------------------------------------------------------------------------------
#' Use kmeans to identify groups of similar authors

km <- kmeans(topic_df_dist, n.topics)
# get names for each cluster
allnames <- vector("list", length = n.topics)
for(i in 1:n.topics){
  allnames[[i]] <- names(km$cluster[km$cluster == i])
}

# Here's the list of authors by group
allnames
--------------------------------------------------------------------------------
/005_topic_similarity_matrix.r:
--------------------------------------------------------------------------------
#' Calculate similarity matrix
#' Shows which documents are similar to each other
#' by their proportions of topics.
#' Based on Matt Jockers' method

library(cluster)
topic_df_dist <- as.matrix(daisy(t(topic_docs), metric = "euclidean", stand = TRUE))
# Change row values to zero if greater than row minimum plus row standard deviation
# keep only closely related documents and avoid a dense spaghetti diagram
# that's difficult to interpret (hat-tip: http://stackoverflow.com/a/16047196/1036500)
topic_df_dist[ sweep(topic_df_dist, 1, (apply(topic_df_dist,1,min) + apply(topic_df_dist,1,sd) )) > 0 ] <- 0

--------------------------------------------------------------------------------
/001_scrape_for_links_to_fulltext.r:
--------------------------------------------------------------------------------
#' Get URLs to blog post full text for all posts
#' by scraping them out of each page of the
#' main blog aggregator

library(RCurl)
library(XML)

n <- 100 # determined by inspecting the first page
# pre-allocate list to fill (the first page plus n further pages)
links <- vector("list", length = n + 1)
# get URLs from the first page separately, since the URL for
# the first page doesn't follow a pattern
links[[1]] <- unname(xpathSApply(htmlParse(getURI("http://www.dayofarchaeology.com/")),"//h2/a/@href"))
for(i in 1:n){
  # track progress by showing the iteration we're up to
  print(i)
  # get all content on the i+1 th page of the main blog list
  blogdata <- htmlParse(getURI(paste0("http://www.dayofarchaeology.com/page/", i+1,"/")))
  # extract links for all posts
  links[[i+1]] <- unname(xpathSApply(blogdata,"//h2/a/@href"))
}


--------------------------------------------------------------------------------
/007_visualise_author_relationships.r:
--------------------------------------------------------------------------------
#' Visualize author similarity using force-directed network graphs

#### network diagram using Fruchterman & Reingold algorithm
# static
library(igraph)
g <- as.undirected(graph.adjacency(topic_df_dist))
layout1 <- layout.fruchterman.reingold(g, niter=500)
plot(g, layout=layout1, edge.curved = TRUE, vertex.size = 1, vertex.color= "grey", edge.arrow.size = 0, vertex.label.dist=0.5, vertex.label = NA)


# interactive in a web browser
devtools::install_github("christophergandrud/d3Network")
require(d3Network)
d3SimpleNetwork(get.data.frame(g),width = 1500, height = 800,
                textColour = "orange", linkColour = "red",
                fontsize = 10,
                nodeClickColour = "#E34A33",
                charge = -100, opacity = 0.9, file = "d3net.html")
# find the html file in working directory and open in a web browser

# for Gephi
# this line will export from R and make the file 'g.graphml'
# in the working directory, ready to open with Gephi
write.graph(g, file="g.graphml", format="graphml")
--------------------------------------------------------------------------------
/008_analyse_commenter_author_relationships.r:
--------------------------------------------------------------------------------
#' Get names of commenters on blog posts and do
#' some basic sna

# Make edge lists for further analysis

# get table of commenters' names and URLs of posts they comment on
names(commenters) <- blogtext$url
commenters_url <- stack(commenters)
names(commenters_url) <- c("commenter", "url")

# get table of commenters' names and authors of posts they comment on
names(commenters) <- blogtext$author
commenters_author<- stack(commenters)
names(commenters_author) <- c("commenter", "author")


# plots

require(igraph)
g <- graph.data.frame(commenters_author, directed=TRUE)
plot(g,
     layout=layout.fruchterman.reingold, # the layout method. see the igraph documentation for details
     main='Day of Archaeology blog comments \n(edges point to post author)', #specifies the title
     vertex.size = 5,
     vertex.label.dist=0.1, #puts the name labels slightly off the dots
     vertex.color = "red",
     vertex.frame.color='red', #the color of the border of the dots
     vertex.label.color='black', #the color of the name labels
     vertex.label.font=1, #the font of the name labels
     vertex.label=V(g)$name, #specifies the labels of the vertices. in this case the 'name' attribute is used
     vertex.label.cex=1, #specifies the size of the font of the labels. can also be made to vary
     vertex.label.family = "sans",
     edge.arrow.size = 0.4,
     edge.color = "blue"
)


# for Gephi
# this line will export from R and make the file 'g.graphml'
# in the working directory, ready to open with Gephi
write.graph(g, file="g.graphml", format="graphml")

# export edge list to CSV for other software
write.csv(commenters_author, file = 'commenters_author.csv')



--------------------------------------------------------------------------------
/002_scrape_for_fulltext_etc_from_links.r:
--------------------------------------------------------------------------------
#' Start with output from scraping for URLs, now pull full text
#' get text from each post using the URLs we just got

# make one big list of URLs
linksall <- unlist(links)
# make a data.frame to store the full text in
# and get date and author of full text also
blogtext <- data.frame(text = vector(length = length(linksall)),
                       monthday = vector(length = length(linksall)),
                       year = vector(length = length(linksall)),
                       author = vector(length = length(linksall))
)

# make a list to store the names of commenters for each blog post
commenters <- vector("list", length = length(linksall))

# loop over the URLs to
# pull full text, etc. from each URL
# includes error handling in case a field is empty, etc.
for(i in 1:length(linksall)){
  # track progress
  print(i)
  # get URL
  blogdata <- htmlParse(getURI(linksall[[i]]))
  # get text from URL
  result <- try(
    blogtext[i,1] <- xpathSApply(blogdata, "//*/section[@class='entry']", xmlValue)
  ); if(inherits(result, "try-error")) next;
  # get date of blog post
  # first month and day
  result <- try(
    blogtext[i,2] <- strsplit(xpathSApply(blogdata, "//*/abbr[@class='date time published']", xmlValue), ",")[[1]][1]
  ); if(inherits(result, "try-error")) next;
  # and then year, and remove excess white space
  result <- try(
    blogtext[i,3] <- gsub("\\s","", strsplit(xpathSApply(blogdata, "//*/abbr[@class='date time published']", xmlValue), ",")[[1]][2])
  ); if(inherits(result, "try-error")) next;
  # and author
  result <- try(
    blogtext[i,4] <- xpathSApply(blogdata, "//*/span[@class='fn']", xmlValue)
  ); if(inherits(result, "try-error")) next;
  # and the names of the commenters
  result <- try(
    commenters[[i]] <- xpathSApply(blogdata, "//*/span[@class='name']", xmlValue)
  ); if(inherits(result, "try-error")) next;

}

# add columns of URLs to the fulltext post
blogtext$url <- linksall

--------------------------------------------------------------------------------
/004_generate_topic_model.r:
--------------------------------------------------------------------------------
#' Topic modelling with MALLET using clean fulltext
#' based on http://www.cs.princeton.edu/~mimno/R/


require(mallet)
documents <- data.frame(text = blogtext$text,
                        id = make.unique(blogtext$author),
                        class = blogtext$year,
                        stringsAsFactors=FALSE)

mallet.instances <- mallet.import(documents$id, documents$text, "C:/mallet-2.0.7/stoplists/en.txt", token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

## Create a topic trainer object.
n.topics <- 30
topic.model <- MalletLDA(n.topics)

## Load our documents. We could also pass in the filename of a
## saved instance list file that we build from the command-line tools.
topic.model$loadDocuments(mallet.instances)

## Get the vocabulary, and some statistics about word frequencies.
## These may be useful in further curating the stopword list.
vocabulary <- topic.model$getVocabulary()
word.freqs <- mallet.word.freqs(topic.model)

## Optimize hyperparameters every 20 iterations,
## after 50 burn-in iterations.
topic.model$setAlphaOptimization(20, 50)

## Now train a model. Note that hyperparameter optimization is on, by default.
## We can specify the number of iterations. Here we'll use a large-ish round number.
topic.model$train(200)

## NEW: run through a few iterations where we pick the best topic for each token,
## rather than sampling from the posterior distribution.
topic.model$maximize(10)

## Get the probability of topics in documents and the probability of words in topics.
## By default, these functions return raw word counts. Here we want probabilities,
## so we normalize, and add "smoothing" so that nothing has exactly 0 probability.
doc.topics <- mallet.doc.topics(topic.model, smoothed=TRUE, normalized=TRUE)
topic.words <- mallet.topic.words(topic.model, smoothed=TRUE, normalized=TRUE)

# from http://www.cs.princeton.edu/~mimno/R/clustertrees.R
## transpose and normalize the doc topics
topic.docs <- t(doc.topics)
topic.docs <- topic.docs / rowSums(topic.docs)

## Get a vector containing short names for the topics
topics.labels <- rep("", n.topics)
for (topic in 1:n.topics) topics.labels[topic] <- paste(mallet.top.words(topic.model, topic.words[topic,], num.top.words=5)$words, collapse=" ")
# have a look at keywords for each topic
topics.labels



# create data.frame with columns as authors and rows as topics
topic_docs <- data.frame(topic.docs)
names(topic_docs) <- documents$id

# find top n topics for a certain author
df1 <- t(topic_docs[,grep("Sarah Bennett", names(topic_docs))])
colnames(df1) <- topics.labels
require(reshape2)
topic.proportions.df <- melt(cbind(data.frame(df1),
                                   document=factor(1:nrow(df1))),
                             variable.name="topic",
                             id.vars = "document")
# plot for each doc by that author
require(ggplot2)
ggplot(topic.proportions.df, aes(topic, value, fill=document)) +
  geom_bar(stat="identity") +
  ylab("proportion") +
  theme(axis.text.x = element_text(angle=90, hjust=1)) +
  coord_flip() +
  facet_wrap(~ document, ncol=5)

## cluster based on shared words
plot(hclust(dist(topic.words)), labels=topics.labels)


## How do topics differ across different years?

topic_docs_t <- data.frame(t(topic_docs))
topic_docs_t$year <- documents$class
# now we have a data frame where each row is a document and
# each column is a topic. The cells contain topic
# proportions. The next line computes the average proportion of
# each topic in all the posts in a given year.
# Note that in
# topic_docs_t$year there is one FALSE, which dirties the data
# slightly and causes warnings
df3 <- aggregate(topic_docs_t, by=list(topic_docs_t$year), FUN=mean)
# this next line transposes the wide data frame created by the above
# line into a tall data frame where each column is a year. The
# input data frame is subset using the %in% function
# to omit the last row because this
# last row is the result of the anomalous FALSE value that
# is in place of the year for one blog post. This is probably
# a result of a glitch in the blog page format. I also exclude
# the last column because it has NAs in it, a side-effect of the
# aggregate function above. Here's my original line:
# df3 <- data.frame(t(df3[-3,-length(df3)]), stringsAsFactors = FALSE)
# And below is an updated version that generalises this in case
# you have more than two years:
years <- sort(as.character(na.omit(as.numeric(as.character(unique(topic_docs_t$year))))))
df3 <- data.frame(t(df3[(df3$Group.1 %in% years),-length(df3)]), stringsAsFactors = FALSE)
# now we put on informative column names
# names(df3) <- c("y2012", "y2013")
# Here's a more general version in case you have more than two years
# or different years to what I've got:
names(df3) <- unname(sapply(years, function(i) paste0("y",i)))
# the next line removes the first row, which is just the years
df3 <- df3[-1,]
# the next line converts all the values to numbers so we can
# work on them
df3 <- data.frame(apply(df3, 2, as.numeric, as.character))
df3$topic <- 1:n.topics

# which topics differ the most between the years?

# If you have
# more than two years you will need to do things differently
# by adding in some more pairwise comparisons.
# Here is one
# pairwise comparison:
df3$diff <- df3[,1] - df3[,2]
df3[with(df3, order(-abs(diff))), ]
# # then if you had three years you might then do
# # a comparison of yrs 1 and 3
# df3$diff2 <- df3[,1] - df3[,3]
# df3[with(df3, order(-abs(diff2))), ]
# # and the other pairwise comparison of yrs 2 and 3
# df3$diff3 <- df3[,2] - df3[,3]
# df3[with(df3, order(-abs(diff3))), ]
## and so on


# plot
library(reshape2)
# we reshape from long to very long! and drop the
# 'diff' column that we computed above by using a negative
# index, that's the -4 in the line below. You'll need to change
# that value if you have more than two years, you might find
# replacing it with -ncol(df3) will do the trick, if you just
# added one diff column.
df3m <- melt(df3[,-4], id = 3)
ggplot(df3m, aes(fill = as.factor(topic), topic, value)) +
  geom_bar(stat="identity") +
  coord_flip() +
  facet_wrap(~ variable)

--------------------------------------------------------------------------------
/003_clean_fulltext.r:
--------------------------------------------------------------------------------
#' clean out non-ASCII characters and formatting

# remove non-ASCII characters (assign the result back, otherwise nothing changes)
Encoding(blogtext[,1]) <- "latin1"
blogtext[,1] <- iconv(blogtext[,1], "latin1", "ASCII", sub="")
# remove newline character
blogtext[,1] <- gsub("\n","", blogtext[,1])
# save as CSV so others can use it
write.csv(blogtext, 'dayofarchaeology.csv')

# if loading CSV file to run from here:
# blogtext <- read.csv("dayofarchaeology.csv", stringsAsFactors = FALSE)

# a few quick summary statistics

# How many posts in total?
nrow(blogtext)

# How many words in total?
length(unlist(lapply(blogtext$text, function(i) strsplit(i, " ")[[1]])))

# how many authors in total?
length(unique(blogtext$author))

# How many posts in each year?
table(blogtext$year)

# how many words per year?
length(unlist(lapply(blogtext[blogtext$year == 2012,]$text, function(i) strsplit(i, " ")[[1]])))
length(unlist(lapply(blogtext[blogtext$year == 2013,]$text, function(i) strsplit(i, " ")[[1]])))

# how many authors per year?
length(unique(blogtext[blogtext$year == 2012,]$author))
length(unique(blogtext[blogtext$year == 2013,]$author))

# words per post per year
length(unlist(lapply(blogtext[blogtext$year == 2012,]$text, function(i) strsplit(i, " ")[[1]])))/table(blogtext$year)[[1]]
length(unlist(lapply(blogtext[blogtext$year == 2013,]$text, function(i) strsplit(i, " ")[[1]])))/table(blogtext$year)[[2]]

# plot distribution of words
# how many words per post?
wpp <- data.frame(
  words = unlist(lapply(blogtext$text, function(i) length(strsplit(i, " ")[[1]]))),
  year = blogtext$year, stringsAsFactors = FALSE)

# only look at 2012 and 2013
wpp <- wpp[wpp$year %in% c(2012, 2013),]

# number of posts per year, for labels
nlabels <- table(wpp$year)

# To create the mean labels, you can use by
meds <- round(c(by(wpp$words, wpp$year, mean)),0)

# make the plot
require(ggplot2)
ggplot(wpp, aes(as.factor(year), words, label=rownames(wpp))) +
  geom_violin() +
  geom_text(data = data.frame(), aes(x = names(meds) , y = meds,
                                     label = paste("mean =", meds))) +
  xlab("Year")

# check if a certain word is present at all
require(tm)
# create corpus
corp <- Corpus(VectorSource(blogtext[,1]))
# if using CSV file do this instead of the line above
# corp <- Corpus(VectorSource(blogtext[,2]))
# process text
skipWords <- function(x) removeWords(x, stopwords("english"))
funcs <- list(tolower, removePunctuation, removeNumbers, stripWhitespace, skipWords)
corp <- tm_map(corp, FUN = tm_reduce, tmFuns = funcs)
# create document term matrix
dtm <- DocumentTermMatrix(corp, control =
                            # limit word lengths
                            list(wordLengths = c(2,10))) # ,
## A few other options for text mining
# control weighting
#  weighting = weightTfIdf,
# keep words in more than 1%
# and less than 95% of docs
# bounds = list(global = c(
#  length(corp)*0.01,length(corp)*0.95))))

# how many times does the word 'pyramid' occur in this document term matrix?
# (as.matrix() is used rather than inspect(), which only prints a sample in newer versions of tm)
dtmdf <- data.frame(as.matrix(dtm))
sum(dtmdf[, names(dtmdf) == 'pyramid'])

# Indiana Jones comparison

# list words of interest - things archys use
IJ1 <- c('trowel', 'shovel', 'spade', 'gun', 'whip', 'fedora', 'computer', 'pen')
# get word counts in document term matrix
IJ2 <- dtmdf[, intersect(names(dtmdf), IJ1)]
# find words that don't occur in dtm at all
notin <- setdiff(IJ1, names(dtmdf) )
# append cols of zeros for these words not in the dtm
IJ3 <- cbind(IJ2, replicate(length(notin), rep(0,nrow(IJ2))) )
# edit col names
names(IJ3) <- c(names(IJ2), notin)
# reshape for plotting
require(reshape2)
IJ4 <- melt(IJ3)
require(ggplot2)
ggplot(IJ4, aes(reorder(variable,-value), value)) +
  geom_bar(stat="identity") +
  xlab("Things archaeologists use in the field") +
  ylab("Term Frequency") + theme(axis.text.x = element_text(colour="grey20",size=17,angle=0,hjust=.5,vjust=.5,face="plain"),
                                 axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
                                 axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=0,face="plain"),
                                 axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=.5,face="plain"))


# list words of interest - things archys fear
IJ1 <- c('tunnels', 'cliffs', 'heat', 'cold', 'insects', 'snakes', 'bears', 'nazis', 'aliens')
# get word counts in document term matrix
IJ2 <- dtmdf[, intersect(names(dtmdf), IJ1)]
# find words that don't occur in dtm at all
notin <- setdiff(IJ1, names(dtmdf) )
# append cols of zeros for these words not in the dtm
IJ3 <- cbind(IJ2, replicate(length(notin), rep(0,nrow(IJ2))) )
# edit col names
names(IJ3) <- c(names(IJ2), notin)
# reshape for plotting
require(reshape2)
IJ4 <- melt(IJ3)
require(ggplot2)
ggplot(IJ4, aes(reorder(variable,-value), value)) +
  geom_bar(stat="identity") +
  xlab("Dangers faced by archaeologists") +
  ylab("Term Frequency") + theme(axis.text.x = element_text(colour="grey20",size=17,angle=0,hjust=.5,vjust=.5,face="plain"),
                                 axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
                                 axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=0,face="plain"),
                                 axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=.5,face="plain"))

# what kinds of artefacts do archys study?

# list words of interest - things archys study
IJ1 <- c('pottery', 'bones', 'pollen', 'stones', 'bricks', 'wood', 'metal', 'treasure', 'grail')
# get word counts in document term matrix
IJ2 <- dtmdf[, intersect(names(dtmdf), IJ1)]
# find words that don't occur in dtm at all
notin <- setdiff(IJ1, names(dtmdf) )
# append cols of zeros for these words not in the dtm
if (length(notin) == 0) {
  IJ3 <- IJ2
} else {
  IJ3 <- cbind(IJ2, replicate(length(notin), rep(0,nrow(IJ2))) )
}
# edit col names
names(IJ3) <- c(names(IJ2), notin)
# reshape for plotting
require(reshape2)
IJ4 <- melt(IJ3)
require(ggplot2)
ggplot(IJ4, aes(reorder(variable,-value), value)) +
  geom_bar(stat="identity") +
  xlab("Things archaeologists study") +
  ylab("Term Frequency") + theme(axis.text.x = element_text(colour="grey20",size=12,angle=0,hjust=.5,vjust=.5,face="plain"),
                                 axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
                                 axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=0,face="plain"),
                                 axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=.5,face="plain"))

# what about gender?

# list words of interest - gender-specific pronouns
IJ1 <- c('he', 'him', 'his', 'she', 'her', 'hers')
# get word counts in document term matrix
IJ2 <- dtmdf[, intersect(names(dtmdf), IJ1)]
# find words that don't occur in dtm at all
notin <- setdiff(IJ1, names(dtmdf) )
# append cols of zeros for these words not in the dtm
if (length(notin) == 0) {
  IJ3 <- IJ2
} else {
  IJ3 <- cbind(IJ2, replicate(length(notin), rep(0,nrow(IJ2))) )
}
# edit col names
names(IJ3) <- c(names(IJ2), notin)
# reshape for plotting
require(reshape2)
IJ4 <- melt(IJ3)
require(ggplot2)
ggplot(IJ4, aes(reorder(variable,-value), value)) +
  geom_bar(stat="identity") +
  xlab("Use of gender-specific pronouns by archaeologists") +
  ylab("Term Frequency") + theme(axis.text.x = element_text(colour="grey20",size=12,angle=0,hjust=.5,vjust=.5,face="plain"),
                                 axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
                                 axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=0,face="plain"),
                                 axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=.5,face="plain"))

# aggregate male and female pronouns
IJ5 <- data.frame(male = rowSums(cbind(IJ3$he, IJ3$him, IJ3$his)),
                  female = rowSums(cbind(IJ3$she, IJ3$her, IJ3$hers)))
IJ6 <- melt(IJ5)
require(ggplot2)
ggplot(IJ6, aes(reorder(variable,-value), value, fill = variable))+
  geom_bar(stat="identity") +
  xlab("Use of gender-specific pronouns by archaeologists") +
  ylab("Term Frequency") + theme(axis.text.x = element_text(colour="grey20",size=12,angle=0,hjust=.5,vjust=.5,face="plain"),
                                 axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
                                 axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=0,face="plain"),
                                 axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=.5,face="plain")) +
  scale_fill_manual(values=c("blue", "#FF0066"))

--------------------------------------------------------------------------------
/README.html:
--------------------------------------------------------------------------------
A Distant Reading of the Day of Archaeology

A Distant Reading of the Day of Archaeology


Introduction


The Day of Archaeology is an event where archaeologists write about their activities on a group blog. The event started in 2011 and aims to 'provide a window into the daily lives of archaeologists from all over the world'. Currently there are over 1000 posts on the blog, rather a lot to read in one sitting. Rather than closely read each post, we can do a distant reading to get some insights into the corpus. Distant reading is a term advocated by Franco Moretti to refer to efforts to understand texts through quantitative analysis and visualisation.

A quantitative method that has recently become popular for distant reading is topic modelling. To get some insights into what all these archaeologists were writing about, I've generated a topic model to find the most important themes amongst the posts. By browsing the topics I can see what the key ideas are without having to read every word of every post. This approach is inspired by Matt Jockers' analysis of the 2010 Day of Digital Humanities blog posts, and by Shawn Graham, who did a similar analysis of the 2011 Day of Archaeology blog posts and has also written an accessible introduction to topic modelling.

The questions I'm attempting to answer with this distant reading include: what is a typical day for an archaeologist? What are the different kinds of day that are represented in this collection? Do all archaeologists have generally similar days or not? As an archaeologist myself, I'm also curious to see how my day compares with others!

Method


My method uses the R programming language and a few external tools, most notably MALLET. The method should be completely reproducible using the code in this repository (go ahead and try it! If you're coming to R for the first time, I recommend using R with RStudio). Here's a quick summary of the process; do inspect the code for more details.

First, I scraped the dayofarchaeology.com site to get the links to the full text of each post (because the front and subsequent pages of the main site only give a snippet of text as a teaser). The blog has a Creative Commons Attribution-ShareAlike 3.0 License, which means we are free to copy, share and remix the blog contents, provided we give proper attribution (did I mention all of this is coming from dayofarchaeology.com?) and make the results available with the same or similar licence (I use the MIT licence for this repository, which is similar). Second, I pulled the full text from each post, along with the name of the author and the date. Third, I cleaned the text to remove unusual characters and formatting. Fourth, I generated a topic model using the latent Dirichlet allocation algorithm implemented in MALLET (it's much faster than the pure R methods). I arbitrarily set the number of topics at 30 and generated one model with all posts from 2012-2013. Fifth, I computed a similarity matrix for the authors of each post based on the mixture of topics in each author's post. Sixth, I computed a k-means cluster analysis to assign each author to a group, based on the topics detected in their post. Seventh, I visualised the groups of authors with a network graph. Each step of the method has a corresponding file of R code in this repository.
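Steps five and six can be tried out on a toy example before running the full pipeline. This is only a minimal sketch under stated assumptions: the matrix `toy_props` and its four authors are invented for illustration, and it uses base R's `dist()` in place of the `cluster::daisy()` call in the repository's script, so it shows the idea rather than reproducing the actual analysis.

```r
# Toy topic proportions (hypothetical): rows are authors, columns are topics.
toy_props <- rbind(
  author_a = c(0.80, 0.10, 0.10),
  author_b = c(0.75, 0.15, 0.10),
  author_c = c(0.10, 0.10, 0.80),
  author_d = c(0.05, 0.15, 0.80)
)
colnames(toy_props) <- paste0("topic_", 1:3)

# Step five: pairwise Euclidean distances between authors, as a square matrix.
toy_dist <- as.matrix(dist(toy_props, method = "euclidean"))

# Step six: k-means on the rows of the distance matrix, asking for 2 groups.
set.seed(1)  # k-means starts from randomly chosen centres
toy_km <- kmeans(toy_dist, centers = 2, nstart = 10)

# List the authors assigned to each group.
toy_groups <- split(names(toy_km$cluster), toy_km$cluster)
toy_groups
```

Authors whose posts have a similar mix of topics end up in the same group; the group numbers themselves are arbitrary labels.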


Results


I've put all the scraped data in a csv file here (right-click -> save link as…) in case you want to browse or do other analyses. The csv file contains the full text of each post, the name of the author, the date of publication and the URL of the post.


Summary of the corpus


In the 2012-2013 corpus there are a total of 352,558 words in 622 blog posts by 370 unique authors (as of 5pm EST 28 July 2013; a few more posts trickled in after this time, but this analysis is a weekend project, so it stops on a Sunday evening). The author count is probably an underestimate as some posts (like this very long one) are written by multiple people using a common affiliation as the author name. There were fewer posts in 2013 (n = 273) compared to 2012 (n = 348), but the average length of the posts is slightly higher in 2013 (mean = 591 words) compared to 2012 (mean = 549 words). Here's a plot of the distribution of words per post by year:

wpp
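Summary numbers like these can be computed with a few lines of base R. The sketch below uses a tiny invented data frame; the column names `year` and `text` are assumptions and may not match the columns in the csv file.

```r
# Toy stand-in for the scraped data; column names and texts are invented.
posts <- data.frame(
  year = c(2012, 2012, 2013, 2013),
  text = c("digging a trench today",
           "lab work and emails",
           "we surveyed the field with a drone",
           "museum tour for volunteers"),
  stringsAsFactors = FALSE
)

# Count whitespace-separated tokens in each post.
posts$n_words <- lengths(strsplit(trimws(posts$text), "\\s+"))

# Number of posts and mean words per post, by year.
posts_per_year <- table(posts$year)
mean_words <- tapply(posts$n_words, posts$year, mean)
print(posts_per_year)
print(mean_words)
```

A call such as `boxplot(n_words ~ year, data = posts)` would then draw the distribution of words per post for each year.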


Summary of the topic model


Here are 50 key words for each of the 30 topics generated by the LDA model:

>[1] "excavation field area stone features ground excavations tools soil excavated test discovered campus feature involved surface high late story rock bone level dating pit pits areas natural source survey structures photo forms excavate excavating flint larger located deposits open fragments trowel clear difficult geophysical floor image entire buried filled allowed"                                                                                           
174 |  [2] "field archaeologists past school professional interested ferry understanding favorite knowledge washington skills public farm individuals topic director communication screen interests responsibility profession experiences single popular article artifacts engage professionals macedonia academic real participation archeology band digs semester shows scientific future diversity enjoyed vcu unit sit civil pseudoarchaeology washingtonâ dig responsibilities"
175 |  [3] "past cemetery landscape learn understand human questions interested modern interest ceramics lab early understanding water archaeologists places identify enjoy animal record techniques bones common simply methods personal southern types lost studying simple reason helps program ceramic lives lack collecting communities othe lived step properly inside kids property locations surveying question"                                                            
176 |  [4] "digital access open archaeologists archive ads database online content reports director app text images center media free video system community linked library grey files visual literature metadata punk toronto scarf web order articles mobile videos create adding file mukurtu thousands cultural platform image publication dead filemaker databases edit asi created"                                                                                           
177 |  [5] "shelf ù"ø laarc gallery objects ùsø archive number ù"ù center lottery ø«ø ù.ù cooper ù^ø ù.ø suggested orange object completely metal registered piece dayofarch ùsù environmental discover random store øªø border solid width tweet margin auto float margin-top text-align img cfcfcf margin-left ù"ù.ø weâ medieval excavated london message textile holdsâ"                                                                                                        
178 |  [6] "rcahms copyright scotland survey chosen built aerial ordnance stone castle database crown favourite historic landscape north twitter photographs view scottish loch fort record park south east images west wall water buildings structures photograph place myarchaeology.â canmore recorded visit revealed remains monuments timber image people history island stones century cropmarks location"                                                                    
179 |  [7] "cat circus making desk bag fort centre dig observatory urns click colchester house hat green cans uppsala unit system cremation contents high article mining ice common garden yellow editor utah cwa regular session magazine friend captains folkestone auckland hold renovation army avoid newsletter care drop warm military sardine taylorâ baseball"                                                                                                              
180 |  [8] "museum objects display finds exhibition object museums coins treasure metal history antiquities coin silver database hoard scheme material social british casts case cases interested items detecting stolen classical looting tile temporary create gold market pas museumâ north hertfordshire gallery portable conservation finder looted hitchin preserved story cast stories pot vessel"                                                                           
181 |  [9] "shropshire hoard war space community bristol royalist detectorists bitterley veterans military graffiti county indigenous civil shrewsbury march garrison northern south army territory communities ludlow contemporary earth parliamentarian wem paid country family worth spacecraft nter jawoyn club force aboriginal parliamentarians brampton similar system support significant messages men castle rights relic cavalry"                                         
182 | [10] "students community school dig volunteers student children learning people archaeologists visitors experience past involved history open events undergraduate learn training education workshops skills typical schools questions town activity weeks learned outreach science friends planning educational silchester class tour workshop pottery college gave talks teaching participants placement aim tours teach contribute"                                        
183 | [11] "historic planning environment county wright vitaemilia policy law council conservation ireland committee protection issues commercial dayofarch government means rescue potential dayofarchâ knowledge development board impact resources client records carried developer officers interest process buildings july works practice natural early national remains monitoring organisations sector officer sweden significance standards money organisation"             
184 | [12] "des une archã sur est pour dans qui nous prã ologie avec par mon journã moyen fouille inrap gallery intã aux vous occupation cette diagnostic mãªme aussi donnã ont fait muller-pelletier sols ologues ologique lors exceptionnelle ces siã vie ils carine scientifique sont vestiges tre couverte viarmes chã large questions"                                                                                                                                         
185 | [13] "people itâ years world things lot interesting reading read blog write means donâ posts book head fun point canâ didnâ list weeks fact ago career books share called degree ideas pieces position youâ kind stuff context theyâ thinking huge moving issues fit historical sitting resources person phd happened play talk"                                                                                                                                              
186 | [14] "house remains century years land houses area gardens bones family middle small brown occupied called ancient bottom industrial lake fields conditions sample garden base andâ living council estate church child cut destruction remaining teeth town college salt thin britain sea point farm fine green trees shown deposit market reach chapel"                                                                                                                      
187 | [15] "material collection finds collections age excavations records artefacts iron years early excavated record pottery small late boxes box range archives publication archive ago storage store documents drawings modern original individual colleague technology documentation created bronze assistant task extensive building published making researchers began remains hill photograph centre collected catalogue town"                                               
188 | [16] "ancient museum cultural conference history social academic institute teaching writing australia dissertation living egypt city department society culture early museums focused western published funding web science european past human discipline mentioned egyptian southern sciences interested eastern tomb humanities trip associate fellowship oxford universities included relevant professor on-line places materials world"                                  
189 | [17] "interesting norfolk mola centre talk programme colleague range wood visitor festival landscape dating assemblage building neolithic closely timbers lottery anglo-saxon species grave england services involved fund coastal cba hearth senior discussion review waterlogged flots bone identification reminded specialists scotland csi radiocarbon action thames manager artist crannog graves offer specialist biggest"                                              
190 | [18] "glass century assemblage cave island leather early mosaic interesting weâ fragments white ireland nails shoe beautiful preserved discovery vessel wild norse end mountains opening covered bowl vessels bay galway shoes visible quarters piece common colleague degrees share flood imagine ale lion segni sides harbour men drawing french economic dublin roads"                                                                                                     
191 | [19] "building medieval digging buildings city small trench finds crew construction excavating walls early dig hot built wall dug road lots street hole top urban inside weeks water brick library close heat town pottery fill samples deep food dates record pot block dark basement complex cleaning trenches experienced materials nearby pipe"                                                                                                                           
192 | [20] "war camp world monuments johnâ british battle institute italy rome statue memorial evans faraday spain photos knossos italian malta command cambridge half fig bomber papers figures gave monument raf shells doa involvement spanish boat shamanic latin neolithic materials activity idea campaign crete negatives monticello pow tests japanese palatine copenhagen ioa"                                                                                             
193 | [21] "museum mound grave west creek curator complex facility group making stones corn garden native tour display interpretive manchester days incised volunteers program gift room library consists culture museum.â slide documentary gcmac trays view flint burial film educational moundsville delf norona shop educator beads displays square visitors wonderful papers check care"                                                                                       
194 | [22] "age church medieval bronze idaho iron palace ireland village early churches burials end burial farm national exmoor identified london coffin sandpoint post-medieval cist cremation land parish viking stepney close houses population peat valley happy green skeleton surrounded swca holy horse symbolic mola live mound spread boundary wooden moorland peterborough appaloosa"                                                                                     
195 | [23] "che arqueologia archaeologyâ oday projeto dei montescudaio sambaqui progetto archeologia uma con modo delle allâ sono cubatã layer piã medievale montecorvino gli archeologi attraverso mio elementi pisa baldassarri post video cad museu prof sul virtual dellâ sia questo lavorare naturalmente essere disegno bassa material autocad base file vital joinville stico"                                                                                               
196 | [24] "conservation models wessex pitt rivers lab model wiltshire south large salisbury figurines excavated wood service areas plaster scanning maritime damage space materials equipment operation condition wreck paint cleaned surface general shopping treatment remove conserving imaging diving skeleton cleaning protected corrosion aim underwater scale fragile nightingale gallery heavy camera wooden loose"                                                        
197 | [25] "itâ week weâ office bit days morning writing started start things check process archaeologists spend thatâ recording lot full afternoon hours couple london pretty main short small end finished ready final finally call donâ half idea leave computer emails friday yesterday lunch current thereâ tea lots running youâ making finish"                                                                                                                               
198 | [26] "phd human student bones ã???atalhã samples deer pottery bone turkey thesis worcester cake cardiff residues animal room neolithic collaboration difficult animals hut faunal close worcestershire jaffa jebel writing suggested slag pots hive fallow bronze favourite supportive materials huge hike peak isotope analysing smith published phds fellow changing activity laboratory researcher"                                                                          
199 | [27] "survey gis record database understand aerial recording records laser maps wales photography photographic systems total world techniques form spatial film cross trees multiple tables english discussion hill processing leader landscapes valley involved software station scanner avon tree yorkshire field system relevant norwegian andâ fields medieval structure context generally computers entered"                                                             
200 | [28] "los para por las con mã¡s arqueologã como arqueã day trabajo proyecto arqueolã desde este pero sobre expressly stated licensed creative commons attribution-sharealike unported license logos muy todo mallou histã nos hoy patrimonio ser ver hora tiempo tengo dayofarch entre historia asã tambiã madrid video dos grupo vez soy logo"                                                                                                                               
201 | [29] "artifacts historic state historical public national park cultural arkansas lab history survey resources program african graduate philadelphia preservation pennsylvania unit states society native united artifact archeological county materials portion camp center property crm exhibit foundation bridge education century outreach findings station laboratory equipment curation assistant teach document americans preserve south"                               
202 | [30] "day work time project creative licensed commons stated license expressly attribution-sharealike unported working today year heritage archaeologist find local team job projects life post important public spent large group including report place number fieldwork exciting staff visit set july future members provide colleagues hard based present worked plan management reports"
203 | 

It should be pretty obvious that these topics are generated by a probabilistic algorithm rather than carefully organised by a person. For example, medieval churches and Idaho in topic 22 and the cat circus in topic 7 are rather dissonant combinations. However, many of the topics seem quite distinctive and coherent, such as 4, 6, and 20. A few topics seem to make sense as mixtures or chimera topics, suggesting that a slightly higher number of topics might be more appropriate. Topic 25 is like an eerily garbled telegraphic text message from an unfortunate archaeologist chained to a desk (and is similar to Graham's topic 17 from 2012). Topics 28 and 30 are colophon topics dominated by the license that is attached to each post. With additional effort and time, such as analysing topic diagnostics and excluding more stopwords and non-noun parts of speech, we may be able to refine the topic model.


Basic validation of the model


We can validate the model to a basic degree by closely reading a tiny random sample of the corpus to see if the model's classifications seem accurate. For example, here we can see the mixture of topics in the posts by my friend and colleague Jacq Matthews:


Jacq's topics


That seems pretty good: Jacq's 2012 post was about field work and her duties as an officer of the Australian Archaeological Association, while her 2013 post is more about social sciences and global cultural heritage.


And here's a graph of an interesting post by Ryan Baker on using small aerial drones for archaeological photography, survey and model-making:


Ryan's topics


A good classification, with high proportions of topic 24, which includes models, camera, figurines and equipment, and topic 27, which includes survey, GIS and photography.


And here we can see that Sarah Bennett, an archaeologist in Florida, made quite different posts in each year:


Sarah's topics


Sarah's 2013 post is about volunteers cleaning up a historic cemetery, which is nicely captured by topic 3. Topic 29 reflects the public volunteer aspect of the post. Sarah's 2012 post is about the excavation of a shell midden, clearly indicated by high proportions of topics 1 and 20. Topic 20 is interesting because it seems mostly to be about the archaeology of war. We see 'shell' and 'shells' in topic 20 because the algorithm hasn't been able to distinguish between shells you eat and other kinds of shells (bombs, wrecked buildings, etc.).


While the topic model has a few comical and naive moments, my informal and brief validation indicates that it is clearly not complete nonsense and is credible as a representation of the corpus. Note that each time you replicate the generation of this model you get slightly different topics because of the probabilistic nature of topic modelling.


Visualisation of similar topics


To get a sense of relationships amongst the topics we can visualise a hierarchical clustering of topics. Here we can see that the museum topics tend to form a group distinct from the others. Excavation and field archaeology form a high-level cluster, as do the regional historical archaeology topics (on the far right). The majority of topics are quite similar to each other.


cluster of topics
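A dendrogram of this kind can be built with base R's `dist` and `hclust`. The sketch below is only an illustration: the toy topic-word weights, the Euclidean distance and the average linkage are all assumptions, and the script in this repository may measure topic similarity differently.

```r
# Toy topic-word weights: 4 topics (rows) x 5 words (columns); values are invented.
topic_words <- matrix(
  c(0.50, 0.40, 0.05, 0.03, 0.02,
    0.45, 0.45, 0.05, 0.03, 0.02,
    0.02, 0.03, 0.05, 0.45, 0.45,
    0.05, 0.05, 0.10, 0.40, 0.40),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("topic", 1:4),
                  c("museum", "display", "survey", "trench", "excavation"))
)

# Euclidean distance between topics, then agglomerative clustering.
topic_tree <- hclust(dist(topic_words), method = "average")

# Cut the tree into two branches to see which topics sit together.
branches <- cutree(topic_tree, k = 2)
print(branches)
```

Calling `plot(topic_tree)` draws the dendrogram; in this toy case the two museum-like topics land on one branch and the two excavation-like topics on the other.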


Comparison of topics in 2012 and 2013


We can get an impression of the shift in topics from 2012 to 2013 by comparing the average proportions of each topic across all documents for each year. The five topics that are the most different are 12, 28, 6, 18 and 23. Topics 12, 23 and 28 are non-English language topics, suggesting a greater international contribution in 2013. Topic 6 seems to reflect the large number of posts in 2013 by or about archaeologists working with the Royal Commission on the Ancient and Historical Monuments of Scotland.


difference in topics
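The year-on-year comparison boils down to averaging each topic's proportion within a year and ranking the differences. Here is a minimal base R sketch with invented numbers; the object names `doc_topics` and `year` are assumptions, not names taken from the repository.

```r
# Toy document-topic proportions for 4 posts and 3 topics; values are invented.
doc_topics <- matrix(
  c(0.6, 0.3, 0.1,
    0.5, 0.4, 0.1,
    0.2, 0.2, 0.6,
    0.1, 0.3, 0.6),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, paste0("topic", 1:3))
)
year <- c(2012, 2012, 2013, 2013)

# Average proportion of each topic across the posts of each year.
mean_2012 <- colMeans(doc_topics[year == 2012, , drop = FALSE])
mean_2013 <- colMeans(doc_topics[year == 2013, , drop = FALSE])

# Positive values mean the topic is more prominent in 2013.
topic_diff <- mean_2013 - mean_2012

# Rank topics by the size of the change, largest first.
ranked <- topic_diff[order(abs(topic_diff), decreasing = TRUE)]
print(round(ranked, 2))
```

Taking `head(ranked, 5)` on the real 30-topic matrix would give the five topics that changed the most between the two years.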


Groups of similar authors


Now that we've established the credibility of the topic model, we can look at how authors group together according to the mixtures of topics in their posts. Here are the groups of authors I get after a k-means analysis on topic proportions. I arbitrarily set the number of groups at 30 (you can run the code yourself and change the number to see what happens). With additional effort we could algorithmically determine the optimum number of groups. If there is a number after the name, it's because that author has more than one post on the blog. Reassuringly, most of the time we see multiple posts by the same author in the same cluster, although we saw above that one author does not always write about the same general mix of topics in their posts.

[[1]]
246 |  [1] "sarah_may1"          "Claire Bradshaw"     "MOLA.1"             
247 |  [4] "MOLA.6"              "David Gurney.3"      "MOLA.8"             
248 |  [7] "bajrjobs.4"          "EHZooarchaeologists" "Susan Greaney.1"    
249 | [10] "Karen Stewart"      
250 | 
251 | [[2]]
252 |  [1] "Laracuente"                 "Declan Moore (Moore Group)"
253 |  [3] "Emily Wright"               "Kelly Powell"              
254 |  [5] "JamesAlbone"                "David Gurney"              
255 |  [7] "Helen Wells"                "Ian Richardson"            
256 |  [9] "Paul McCulloch"             "MOLA.5"                    
257 | [11] "David Gurney.2"             "Dan Hull"                  
258 | [13] "ChrisCumberpatch"           "Charles Mount"             
259 | [15] "sylvia.warman"              "Laura Belton"              
260 | [17] "Robin Standring"            "Michelle Touton"           
261 | [19] "Emily Wright.1"             "Chris Constable.1"         
262 | [21] "Giles Carey"                "Roman Baths Museum.1"      
263 | [23] "Manda Forster"              "Magnus Reuterdahl.1"       
264 | 
265 | [[3]]
266 |  [1] "Stephen Kay"         "Caroline Goodson"    "Alexandra Knox"     
267 |  [4] "dberryman"           "Valentina"           "MOLA.7"             
268 |  [7] "MOLA.9"              "michigan"            "Zsolt Magyar"       
269 | [10] "Marcel Cornellissen" "dberryman.1"         "rrohe"              
270 | [13] "talia_shay.1"        "MOLA.15"             "MOLA.20"            
271 | [16] "MOLA.23"             "tuzusai2012"         "cornelius.1"        
272 | [19] "tuzusai2012.1"      
273 | 
274 | [[4]]
275 | [1] "cartvol.3" "MOLA.10"   "cartvol.7" "MOLA.14"   "cartvol.8"
276 | 
277 | [[5]]
278 |  [1] "Alan Simkins"                                  
279 |  [2] "RCAHMS.2"                                      
280 |  [3] "gabe"                                          
281 |  [4] "SUrachi.1"                                     
282 |  [5] "PalatineEastPotteryProject.1"                  
283 |  [6] "Francesco Ripanti"                             
284 |  [7] "David Gurney.4"                                
285 |  [8] "Amesemi"                                       
286 |  [9] "Italian National Association of Archaeologists"
287 | [10] "edlyne"                                        
288 | [11] "AMTTA"                                         
289 | [12] "ffion"                                         
290 | [13] "The Gabii Project"                             
291 | [14] "Stefano Costa.1"                               
292 | [15] "William Hafford"                               
293 | [16] "brennawalks"                                   
294 | [17] "David Gill"                                    
295 | [18] "lofttroll"                                     
296 | [19] "Henriette Roued-Cunliffe.2"                    
297 | [20] "Bob Muckle.2"                                  
298 | 
299 | [[6]]
300 |  [1] "carmean"              "Rachael Sparks"       "Heather Sebire"      
301 |  [4] "Amanda Brooks"        "Heather Cline"        "sdhaddow"            
302 |  [7] "Manchester Museum"    "Steve Compston"       "Martin Lominy"       
303 | [10] "mcarra"               "David E. Rotenizer.1" "Dena Sedar"          
304 | [13] "Heather Cline.1"      "NGO Archaeologica.2"  "Andrew Kirkland"     
305 | [16] "Amanda Brooks.1"     
306 | 
307 | [[7]]
308 |  [1] "Susan Greaney"                      "Bolton Library and Museum Services"
309 |  [3] "Sarah JaneHarknett.1"               "Shawn Graham.1"                    
310 |  [5] "Laura Burnett"                      "William Hafford.1"                 
311 |  [7] "Craig Barker"                       "Julie Cassidy"                     
312 |  [9] "Laura Burnett.2"                    "Candace Richards"                  
313 | 
314 | [[8]]
315 | [1] "cristiana"             "cristiana.1"           "Jaime Almansa Sánchez"
316 | [4] "Khawla Goussous"       "cartvol.4"             "archscotland"         
317 | [7] "Cara Jones"            "Lorna Richardson.4"    "archscotland.1"       
318 | 
319 | [[9]]
320 |  [1] "Kevin Wooldridge"                 "David Standing"                  
321 |  [3] "Jonathan Haller"                  "Kayt Armstrong"                  
322 |  [5] "clydeandavon"                     "clydeandavon.1"                  
323 |  [7] "RCAHMS.12"                        "Spencer Gavin Smith.1"           
324 |  [9] "Chiz Harward (Urban Archaeology)" "Jenny Ryder"                     
325 | [11] "RCAHMS.24"                        "Rosalind Buck"                   
326 | [13] "Cathy Dagg"                       "MOLA.11"                         
327 | [15] "Chris Green"                      "Aerial-Cam"                      
328 | [17] "ArcheoWebby.1"                    "popefinn"                        
329 | [19] "Paul"                             "Tom Goskar"                      
330 | [21] "Serra Head.1"                     "Andrew Mayfield.1"               
331 | [23] "Giles Carey.1"                    "Spencer Gavin Smith.2"           
332 | 
333 | [[10]]
334 |  [1] "Lorna Richardson"   "ArcheoWebby"        "Sophie Hay"        
335 |  [4] "saraperry"          "David Howell"       "Kayt Armstrong.1"  
336 |  [7] "Susan Johnston"     "Don Henson"         "Francesca Tronchin"
337 | [10] "Don Henson.2"       "MOLA.25"            "Kristina Killgrove"
338 | 
339 | [[11]]
340 |  [1] "Brian Kerr"            "bajrjobs"              "MOLA"                 
341 |  [4] "Dawn McLaren"          "Somayyeh Mottaghi"     "Lorna Richardson.1"   
342 |  [7] "Dana Goodburn-Brown"   "James Morris"          "Dana Goodburn-Brown.2"
343 | [10] "Mike Heyworth"         "Anne Crone"           
344 | 
345 | [[12]]
346 |  [1] "sclements"               "Cath Poucher"            "cartvol.2"              
347 |  [4] "Carole Bancroft-Turner"  "Megan Rowland"           "Jaime Almansa Sánchez.5"
348 |  [7] "Guy Hunt.1"              "judgec"                  "Don Henson.1"           
349 | [10] "judgec.1"                "Sarah MacLean"          
350 | 
351 | [[13]]
352 |  [1] "Chris Constable"     "Guy Hunt"            "Spencer Gavin Smith"
353 |  [4] "Helen Williams"      "Alice Kershaw"       "Rena MacGuire"      
354 |  [7] "Rachel Ives.2"       "MOLA.18"             "Claire Woodhead.5"  
355 | [10] "nikolah"             "Claire Woodhead.6"   "De Kogge"           
356 | 
357 | [[14]]
358 |  [1] "RCAHMS"     "RCAHMS.1"   "RCAHMS.3"   "RCAHMS.4"   "RCAHMS.5"   "RCAHMS.6"  
359 |  [7] "RCAHMS.7"   "RCAHMS.8"   "RCAHMS.9"   "RCAHMS.11"  "RCAHMS.13"  "RCAHMS.14" 
360 | [13] "RCAHMS.15"  "RCAHMS.17"  "RCAHMS.18"  "RCAHMS.19"  "RCAHMS.20"  "RCAHMS.21" 
361 | [19] "RCAHMS.22"  "RCAHMS.23"  "RCAHMS.25"  "RCAHMS.27"  "RCAHMS.28"  "RCAHMS.30" 
362 | [25] "RCAHMS.31"  "RCAHMS.32"  "RCAHMS.33"  "Garry Law"  "James Cole" "RCAHMS.34" 
363 | [31] "RCAHMS.36"  "RCAHMS.38"  "RCAHMS.39"  "RCAHMS.41"  "RCAHMS.43" 
364 | 
365 | [[15]]
366 | [1] "archaeologicalresearchcollective" "Tamira"                          
367 | [3] "cartvol.6"                       
368 | 
369 | [[16]]
370 | [1] "Sandra LozanoRubio"   "Rmadgwick"            "Sandra LozanoRubio.1"
371 | [4] "Tim Young"            "Scott Haddow"        
372 | 
373 | [[17]]
374 |  [1] "Ralph Mills"           "cartvol"               "cartvol.1"            
375 |  [4] "mmrathgaber"           "MOLA.3"                "brennawalks@gmail.com"
376 |  [7] "MOLA.12"               "Lynn Evans"            "Matt Law.3"           
377 | [10] "Dave Wilton"           "april.beisaw"          "IUSB PAFS"            
378 | [13] "MOLA.17"               "Claire Woodhead.4"     "MOLA.22"              
379 | [16] "RCAHMS.37"             "Sue Carter"            "Lynn Evans.1"         
380 | [19] "Sue Carter.1"          "Colleen Morgan"       
381 | 
382 | [[18]]
383 |  [1] "Molly Swords"              "MOLA.2"                   
384 |  [3] "RCAHMS.16"                 "Laura Puolamaki"          
385 |  [5] "Exmoorhistoricenvironment" "Molly Swords.1"           
386 |  [7] "Mary Petrich-Guy"          "Dot Boughton"             
387 |  [9] "Asa M Larsson"             "Rachel Ives"              
388 | 
389 | [[19]]
390 |   [1] "Waveney ValleyCommunityArchaeologyGroup"                                               
391 |   [2] "Glynis Irwin"                                                                          
392 |   [3] "David E. Rotenizer"                                                                    
393 |   [4] "rbakerarae"                                                                            
394 |   [5] "Keneiloe Molopyane"                                                                    
395 |   [6] "Beth Pruitt"                                                                           
396 |   [7] "Grace Krause"                                                                          
397 |   [8] "Gail Boyle"                                                                            
398 |   [9] "Sarah Bennett"                                                                         
399 |  [10] "Thiago Fossile"                                                                        
400 |  [11] "Peter Reavill"                                                                         
401 |  [12] "Scott Clark"                                                                           
402 |  [13] "Lancaster Williams"                                                                    
403 |  [14] "Jaime Almansa Sánchez.1"                                                               
404 |  [15] "Jaime Almansa Sánchez.2"                                                               
405 |  [16] "David Garcia Casas"                                                                    
406 |  [17] "Christine Morris"                                                                      
407 |  [18] ""                                                                                      
408 |  [19] "Magnus Reuterdahl"                                                                     
409 |  [20] "April Beisaw"                                                                          
410 |  [21] "Tricia Jarratt"                                                                        
411 |  [22] "Anthroprobably"                                                                        
412 |  [23] "diacarco"                                                                              
413 |  [24] "EAAPP"                                                                                 
414 |  [25] "Pedro MoyaMaleno"                                                                      
415 |  [26] "RCAHMS.10"                                                                             
416 |  [27] "Jaime Almansa Sánchez.3"                                                               
417 |  [28] "Briana Pobiner"                                                                        
418 |  [29] "jbarnes9"                                                                              
419 |  [30] "David Gurney.1"                                                                        
420 |  [31] "Helen Keremedjiev"                                                                     
421 |  [32] "Bernard K. Means"                                                                      
422 |  [33] "Kelly Abbott"                                                                          
423 |  [34] "Charlotte Douglas"                                                                     
424 |  [35] "Manchester Museum.1"                                                                   
425 |  [36] "Adam Corsini.2"                                                                        
426 |  [37] "Kelly Abbott.1"                                                                        
427 |  [38] "INRAP"                                                                                 
428 |  [39] "INRAP.1"                                                                               
429 |  [40] "drspacejunk"                                                                           
430 |  [41] "angela middleton"                                                                      
431 |  [42] "Sebastian Foxley"                                                                      
432 |  [43] "Kelly Abbott.2"                                                                        
433 |  [44] "Kelly Abbott.3"                                                                        
434 |  [45] "Claire Woodhead.2"                                                                     
435 |  [46] "INRAP.2"                                                                               
436 |  [47] "Michelle Zupan"                                                                        
437 |  [48] "magago"                                                                                
438 |  [49] "RCAHMS.26"                                                                             
439 |  [50] "Jaime Almansa Sánchez.4"                                                               
440 |  [51] "RCAHMS.29"                                                                             
441 |  [52] "Giuliano De Felice"                                                                    
442 |  [53] "ArchaeoAD"                                                                             
443 |  [54] "Becky Wragg Sykes"                                                                     
444 |  [55] "Daniel Pett"                                                                           
445 |  [56] "Lorna Richardson.2"                                                                    
446 |  [57] "Lorna Richardson.3"                                                                    
447 |  [58] "Lorna Richardson.5"                                                                    
448 |  [59] "Lorna Richardson.6"                                                                    
449 |  [60] "Lorna Richardson.7"                                                                    
450 |  [61] "Lorna Richardson.8"                                                                    
451 |  [62] "Todd Whitelaw"                                                                         
452 |  [63] "Valentina.1"                                                                           
453 |  [64] "Simone82"                                                                              
454 |  [65] "Andrea"                                                                                
455 |  [66] "Elizabeth Moore"                                                                       
456 |  [67] "Anabelle Castaño"                                                                      
457 |  [68] "Wessex Archaeology"                                                                    
458 |  [69] "Wessex Archaeology.1"                                                                  
459 |  [70] "Wessex Archaeology.2"                                                                  
460 |  [71] "Wessex Archaeology.3"                                                                  
461 |  [72] "AngelGreen"                                                                            
462 |  [73] "Laracuente.1"                                                                          
463 |  [74] "Christina O'Regan"                                                                     
464 |  [75] "Italian National Association of Archaeologists.1"                                      
465 |  [76] "Janet Jones"                                                                           
466 |  [77] "Henriette Roued-Cunliffe"                                                              
467 |  [78] "Sheena Payne-Lunn"                                                                     
468 |  [79] "Project Florence"                                                                      
469 |  [80] "Monrepos - Archaeological Research Centre and Museum for Human Behavioural Evolution.1"
470 |  [81] "MOLA.13"                                                                               
471 |  [82] "Beverly Chiarulli"                                                                     
472 |  [83] "Richard O'Brien"                                                                       
473 |  [84] "Evaristo Gestoso Rodriguez"                                                            
474 |  [85] "AMTTA.1"                                                                               
475 |  [86] "Margie"                                                                                
476 |  [87] "Sarah Bennett.1"                                                                       
477 |  [88] "duncans"                                                                               
478 |  [89] "John Worth"                                                                            
479 |  [90] "Dana Goodburn-Brown.3"                                                                 
480 |  [91] "Beth Pruitt.1"                                                                         
481 |  [92] "NGO Archaeologica"                                                                     
482 |  [93] "Philadelphia Archaeological Forum.3"                                                   
483 |  [94] "Ashley McCuistion"                                                                     
484 |  [95] "Marni Walter.1"                                                                        
485 |  [96] "AVenovcevs"                                                                            
486 |  [97] "AKOT Heritage"                                                                         
487 |  [98] "nashcl"                                                                                
488 |  [99] "michigan.1"                                                                            
489 | [100] "David Standing.1"                                                                      
490 | [101] "Robin Standring.1"                                                                     
491 | [102] "Laura Griffin"                                                                         
492 | [103] "John Worth.1"                                                                          
493 | [104] "Bernard K. Means.1"                                                                    
494 | [105] "Ian Richardson.1"                                                                      
495 | [106] "aeadams83"                                                                             
496 | [107] "cornelius"                                                                             
497 | [108] "NGO Archaeologica.1"                                                                   
498 | [109] "David Howell.1"                                                                        
499 | [110] "hinesbuwf"                                                                             
500 | [111] "eharchaeology"                                                                         
501 | [112] "Damian Shiels"                                                                         
502 | [113] "Peter Reavill.1"                                                                       
503 | [114] "Peter Reavill.2"                                                                       
504 | [115] "MOLA.16"                                                                               
505 | [116] "Peter Reavill.3"                                                                       
506 | [117] "Rachel Ives.1"                                                                         
507 | [118] "Sophie Hay.1"                                                                          
508 | [119] "Thomas Loebel"                                                                         
509 | [120] "María José Figuerero"                                                                  
510 | [121] "Carmen Ting"                                                                           
511 | [122] "Peter Reavill.4"                                                                       
512 | [123] "RCAHMS.35"                                                                             
513 | [124] "MOLA.19"                                                                               
514 | [125] "transit_monkey"                                                                        
515 | [126] "ralphj"                                                                                
516 | [127] "Peter Reavill.5"                                                                       
517 | [128] "Emily Noel-Paton"                                                                      
518 | [129] "Peter Reavill.6"                                                                       
519 | [130] "Tim Young.1"                                                                           
520 | [131] "Declan Moore (Moore Group).1"                                                          
521 | [132] "Liz Goodman"                                                                           
522 | [133] "Gaye Nayton"                                                                           
523 | [134] "John Worth.2"                                                                          
524 | [135] "MOLA.24"                                                                               
525 | [136] "alinelara"                                                                             
526 | [137] "Claire Woodhead.7"                                                                     
527 | [138] "RCAHMS.40"                                                                             
528 | [139] "Peter Reavill.8"                                                                       
529 | [140] "Cathy Dagg.1"                                                                          
530 | [141] "Helen Williams.1"                                                                      
531 | [142] "Charles Mount.1"                                                                       
532 | [143] "murosv"                                                                                
533 | [144] "Ferry"                                                                                 
534 | [145] "Grace Krause.1"                                                                        
535 | [146] "hinesbuwf.1"                                                                           
536 | [147] "David Hunter"                                                                          
537 | [148] "Peter Reavill.9"                                                                       
538 | 
539 | [[20]]
540 |  [1] "Matt Law"               "alexism"                "SUrachi"               
541 |  [4] "Rob Hedge"              "Claire Woodhead"        "Matt Law.1"            
542 |  [7] "Rebecca"                "castlesandcoprolites"   "Matt Law.2"            
543 | [10] "Dana Goodburn-Brown.1"  "Melonie Shier"          "David Osborne"         
544 | [13] "eastoxford"             "DeborahFox"             "Liza Kavanagh"         
545 | [16] "Joe Flatman"            "Pat Hadley"             "long1086"              
546 | [19] "Hembo Pagi"             "Stu Eve"                "Andy Dufton"           
547 | [22] "Sara Perry"             "Allison Mickel"         "Richard Madgwick"      
548 | [25] "Alice Forward"          "Jacqui Mulville"        "castlesandcoprolites.1"
549 | [28] "Don Henson.3"          
550 | 
551 | [[21]]
552 |  [1] "CAT"                                                                                 
553 |  [2] "CoDA_ucb.3"                                                                          
554 |  [3] "bajrjobs.1"                                                                          
555 |  [4] "Carly Hilts, Current Archaeology/Current World Archaeology"                          
556 |  [5] "Samantha Brown"                                                                      
557 |  [6] "bajrjobs.2"                                                                          
558 |  [7] "bajrjobs.3"                                                                          
559 |  [8] "Annie Partridge"                                                                     
560 |  [9] "F.R.A.G."                                                                            
561 | [10] "Christopher Merritt"                                                                 
562 | [11] "Monrepos - Archaeological Research Centre and Museum for Human Behavioural Evolution"
563 | [12] "Anne Jensen"                                                                         
564 | [13] "Xtinebean"                                                                           
565 | [14] "Nancy Grace"                                                                         
566 | [15] "Carl Carlson-Drexler"                                                                
567 | [16] "Matthew Jones"                                                                       
568 | [17] "Carly Hilts, Current Archaeology/Current World Archaeology.1"                        
569 | [18] "Terry Brock"                                                                         
570 | [19] "MOLA.21"                                                                             
571 | [20] "Robyn Antanovskii"                                                                   
572 | [21] "izoken"                                                                              
573 | [22] "Geoff Wyatt"                                                                         
574 | 
575 | [[22]]
576 |  [1] "Darlene Applegate"                   "Sean Naleimaile"                    
577 |  [3] "Mandy Ranslow"                       "Jamie Chad Brandon"                 
578 |  [5] "John Lowe"                           "cdrexler"                           
579 |  [7] "Nicole Bucchino"                     "Sean Naleimaile.1"                  
580 |  [9] "cames"                               "Claire vanNierop"                   
581 | [11] "gwynn henderson"                     "Glynis Irwin.1"                     
582 | [13] "Philadelphia Archaeological Forum"   "Philadelphia Archaeological Forum.1"
583 | [15] "Philadelphia Archaeological Forum.2" "Philadelphia Archaeological Forum.4"
584 | [17] "John Lowe.1"                         "Mandy Ranslow.1"                    
585 | [19] "Kurt Thomas Hunt"                    "Valerie M. J. Hall"                 
586 | [21] "Rebecca Duggan"                      "Lucy Johnson"                       
587 | [23] "Jamie Chad Brandon.1"               
588 | 
589 | [[23]]
590 |  [1] "SuccinctBill"           "Doug"                   "CoDA_ucb"              
591 |  [4] "Russell Alleen-Willems" "Shawn Graham"           "CoDA_ucb.1"            
592 |  [7] "Leigh Anne"             "CoDA_ucb.2"             "CoDA_ucb.4"            
593 | [10] "Beatrice Hopkinson"     "CoDA_ucb.5"             "Neil Gevaux"           
594 | [13] "APAAME"                 "Andrew Reinhard"        "Ray Moore"             
595 | [16] "cejo"                   "Doug.1"                 "Ulla Rajala"           
596 | [19] "Eric Kansa"             "Shawn Graham.2"         "emmajaneoriordan"      
597 | [22] "ADS"                    "Ethan Watrall"          "Andrew Reinhard.1"     
598 | [25] "Kasia"                  "emmajaneoriordan.1"     "Kasia.1"               
599 | 
600 | [[24]]
601 |  [1] "Francis Deblauwe"           "Jacq Matthews"             
602 |  [3] "Kathryn E. Piquette"        "Bob Muckle"                
603 |  [5] "Mark Patton"                "johnwillimas"              
604 |  [7] "SuzieThomas"                "cristiana.2"               
605 |  [9] "Kelly M"                    "Becky Wragg Sykes.1"       
606 | [11] "Alex Nagel"                 "Udjahorresnet"             
607 | [13] "Kathryn E. Piquette.1"      "Diefenerfer.1"             
608 | [15] "Nancy Lovell"               "bupap"                     
609 | [17] "terhi"                      "Henriette Roued-Cunliffe.1"
610 | [19] "Axel G. Posluschny"         "Bob Muckle.1"              
611 | [21] "ArchaeoAD.1"                "Melanie Pitkin"            
612 | 
613 | [[25]]
614 |  [1] "Adam Corsini"        "Adam Corsini.1"      "Adam Corsini.3"     
615 |  [4] "Adam Corsini.4"      "Adam Corsini.5"      "Marcel Dallinger"   
616 |  [7] "Pippa Pearce"        "CooperCenter"        "Andrew Fetherston"  
617 | [10] "Lucy Sawyer"         "Andrew Fetherston.1" "CooperCenter.1"     
618 | [13] "Andrew Fetherston.2" "Andrew Fetherston.3" "fieldwork"          
619 | [16] "Andrew Fetherston.4" "Andrew Fetherston.5"
620 | 
621 | [[26]]
622 |  [1] "Diefenerfer"                "Angela Piccini"            
623 |  [3] "Serra Head"                 "Ryan Swanson"              
624 |  [5] "PalatineEastPotteryProject" "magago.1"                  
625 |  [7] "Mathias Probst"             "Jacq Matthews.1"           
626 |  [9] "Tanya Peres Lemons"         "cartvol.5"                 
627 | [11] "Lorna Richardson.10"        "FlindersArchSoc"           
628 | [13] "tkriek"                     "Polly Peterson"            
629 | [15] "Vasilka Dimitrovska"       
630 | 
631 | [[27]]
632 |  [1] "DavidAltoft"               "ssprince"                 
633 |  [3] "Bairbre Mullee"            "Cayla Breiling"           
634 |  [5] "Sally Rodgers"             "Sarah JaneHarknett"       
635 |  [7] "Claire Woodhead.1"         "Alex Moseley"             
636 |  [9] "Samantha Colclough"        "Amanda Clarke"            
637 | [11] "talia_shay"                "Marni Walter"             
638 | [13] "Archaeology UFPI - BRAZIL" "Cara Jones.1"             
639 | [15] "sarah_may1.1"              "Charlotte Douglas.1"      
640 | [17] "Bairbre Mullee.1"          "Samantha Barnes"          
641 | [19] "Hayley Forsyth"            "Brian"                    
642 | [21] "Joanne Robinson"           "LizzieW"                  
643 | [23] "bthorn"                    "Mike Pitts"               
644 | [25] "Andrew Mayfield"           "Andrew Mayfield.2"        
645 | [27] "Brian.1"                  
646 | 
647 | [[28]]
648 |  [1] "Alfred W. Bowers Laboratory of Anthropology"
649 |  [2] "Stefano Costa"                              
650 |  [3] "Penny Johnston"                             
651 |  [4] "Frank Lynam"                                
652 |  [5] "MOLA.4"                                     
653 |  [6] "Lorna Richardson.9"                         
654 |  [7] "Helen Sharp"                                
655 |  [8] "FALSE"                                      
656 |  [9] "Rachael Sparks.1"                           
657 | [10] "Claire Woodhead.3"                          
658 | [11] "Roman Baths Museum"                         
659 | [12] "sven"                                       
660 | [13] "Dawn McLaren.1"                             
661 | [14] "RCAHMS.42"                                  
662 | 
663 | [[29]]
664 |  [1] "Bob Clarke"                         "Chiz Harward (Urban Archaeology).1"
665 |  [3] "Katy Meyers"                        "Rose"                              
666 |  [5] "mwilliams"                          "Nicola Hembrey"                    
667 |  [7] "Helen Goodchild"                    "Sue Harrington"                    
668 |  [9] "Kasia.2"                            "jpalmer"                           
669 | [11] "Stefan Sagrott"                    
670 | 
671 | [[30]]
672 |  [1] "SusanneT"                     "adamrabinowitz"              
673 |  [3] "Donna Yates"                  "Eleanor Ghey"                
674 |  [5] "Keith Fitzpatrick-Matthews"   "phdiva"                      
675 |  [7] "Keith Fitzpatrick-Matthews.1" "Laura Burnett.1"             
676 |  [9] "Charlotte Dixon"              "Peter Reavill.7"             
677 | [11] "Keith Fitzpatrick-Matthews.2" "Wendy Scott"                 
678 | 

Visualisation of author groups


Here is a static visualisation of the relationships between all the authors. It gives a quick sense that there are distinctive groups, but the image is too small to show author names, which is a major limitation.

[Figure: static visualisation]


Here is a slightly interactive visualisation, where we can see names on the nodes (click on them to magnify the name) and inspect them in more detail by dragging them around.


An even more interactive version can be downloaded here (right-click -> save link as…) and opened in Gephi.


Discussion and Conclusion


To return to the questions that motivated this little investigation, we can get some answers from what is missing from the results as well as from what is present. Key terms that are not prominent in the topics are pyramids, temples, whips and any kind of firearm. This suggests that a day in the life of the most popular Hollywood archaeologists has little in common with the days of the people contributing to the Day of Archaeology. There is a small area of intersection, as


The questions I'm attempting to answer with this distant reading include: What is a typical day for an archaeologist? What different kinds of day are represented in this collection? Do all archaeologists have generally similar days or not? As an archaeologist myself, I'm also curious to see how my day compares with others'!

--------------------------------------------------------------------------------