├── .gitignore
├── LICENSE
├── README.md
├── draw_plot.R
├── gh-comment-sentiment.csv
└── plot.png

/.gitignore:
--------------------------------------------------------------------------------
# History files
.Rhistory
.Rapp.history

# Session Data files
.RData

# Example code in package build process
*-Ex.R

# Output files from R CMD build
/*.tar.gz

# Output files from R CMD check
/*.Rcheck/

# RStudio files
.Rproj.user/

# produced vignettes
vignettes/*.html
vignettes/*.pdf

# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth

# knitr and R markdown default cache directories
/*_cache/
/cache/

# Temporary files created by R markdown
*.utf8.md
*.knit.md

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2017 Sergey Abakumoff

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
[GitHub commit comments](https://github.com/blog/42-commit-comments) let developers express opinions about code changes. The comment data was obtained from the [BigQuery GitHub Archive](https://cloud.google.com/bigquery/public-data/github), and sentiment analysis was performed on it with the Google Natural Language API. A [sample of ~80K comments](gh-comment-sentiment.csv) (~10K comments for each programming language) was used to rank programming languages by the percentage of positive and negative comments. Here is the result, [built](draw_plot.R) with R.
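
The ranking behind the plot can be sketched in a few lines of base R. This is a minimal illustration, not the script used for the plot: the toy `comments` data frame is hypothetical, and it only assumes what `draw_plot.R` also relies on, namely that each row has a `language` and a sentiment `score`, with positive scores counted as positive comments, negative scores as negative, and zero as neutral.

```r
# Hypothetical toy sample; the real data lives in gh-comment-sentiment.csv.
comments <- data.frame(
  language = c("R", "R", "R", "R", "Go", "Go", "Go", "Go"),
  score    = c(0.6, 0.2, -0.3, 0.0, 0.4, -0.5, -0.1, 0.0)
)

# Label each comment by the sign of its sentiment score.
comments$tone <- ifelse(comments$score > 0, "positive",
                 ifelse(comments$score < 0, "negative", "neutral"))

# Count comments per language and tone, then convert each row to percentages.
counts  <- table(comments$language, comments$tone)
percent <- 100 * prop.table(counts, margin = 1)

# Rank languages by their share of positive comments, highest first.
ranking <- sort(percent[, "positive"], decreasing = TRUE)
print(round(ranking))
```

Swapping `"positive"` for `"negative"` in the last step would give the ranking by share of negative comments instead.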

![](plot.png)

--------------------------------------------------------------------------------
/draw_plot.R:
--------------------------------------------------------------------------------
# Rank programming languages by the share of positive/negative commit comments.
library(ggplot2)
library(dplyr)

data <- read.csv("gh-comment-sentiment.csv", header = TRUE)

# Label each comment by the sign of its sentiment score, then compute the
# percentage of each tone per language and the midpoint of each bar segment.
stats1 <- data %>%
  select(language, score) %>%
  mutate(tone = as.factor(ifelse(score > 0, "positive", ifelse(score < 0, "negative", "neutral")))) %>%
  select(language, tone) %>%
  group_by(language, tone) %>%
  summarize(count = n()) %>%
  ungroup() %>%
  arrange(language, desc(tone)) %>%
  group_by(language) %>%
  mutate(tone_percent = 100 * count / sum(count),
         label_pos = cumsum(tone_percent) - 0.5 * tone_percent)

# Order languages by their share of positive comments (1 = most positive).
temp <- stats1 %>% filter(tone == "positive") %>% arrange(-tone_percent) %>% select(language)
temp$pos <- seq.int(nrow(temp))
stats1 <- merge(stats1, temp, by = "language")

# Stacked horizontal bars, one per language, labelled with rounded percentages.
# hjust is a constant, so it is set outside aes(); the unused ymax aesthetic is dropped.
ggplot() + theme_bw() +
  geom_bar(aes(y = tone_percent, x = reorder(language, -pos), fill = tone), data = stats1, stat = "identity") +
  geom_text(data = stats1, aes(x = language, y = label_pos, label = paste0(round(tone_percent), "%")), hjust = 0.5, size = 4) +
  labs(x = "Language", y = "Percentage of sentiment") +
  scale_fill_manual(values = c('#F45E5A', '#5086FF', '#17B12B'), guide = guide_legend(reverse = TRUE)) +
  theme(legend.position = "bottom", legend.direction = "horizontal", legend.title = element_blank()) +
  coord_flip()

--------------------------------------------------------------------------------
/plot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sAbakumoff/gh-comments-sentiment/f1d4baf55bf336b14d60975739f0fb47ad68017b/plot.png

--------------------------------------------------------------------------------