├── .gitignore ├── CODE_OF_CONDUCT ├── LICENSE ├── README.md ├── build.gradle.kts ├── gradle.properties ├── gradle └── wrapper │ ├── gradle-wrapper.jar │ └── gradle-wrapper.properties ├── gradlew ├── gradlew.bat ├── settings.gradle.kts └── src ├── main ├── kotlin │ ├── app.kt │ ├── logic │ │ ├── cli.kt │ │ ├── log.kt │ │ └── s3.kt │ └── utils │ │ ├── logging.kt │ │ └── sqldelight.kt └── sqldelight │ └── com │ └── benasher44 │ └── kloudfrontblogstats │ ├── AccessLog.sq │ └── SchemaVersion.sq └── test └── kotlin └── LogParsingTest.kt /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | .gradle 3 | build 4 | *.iml 5 | local.properties 6 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT: -------------------------------------------------------------------------------- 1 | 2 | # Contributor Covenant Code of Conduct 3 | 4 | ## Our Pledge 5 | 6 | We as members, contributors, and leaders pledge to make participation in our 7 | community a harassment-free experience for everyone, regardless of age, body 8 | size, visible or invisible disability, ethnicity, sex characteristics, gender 9 | identity and expression, level of experience, education, socio-economic status, 10 | nationality, personal appearance, race, religion, or sexual identity 11 | and orientation. 12 | 13 | We pledge to act and interact in ways that contribute to an open, welcoming, 14 | diverse, inclusive, and healthy community. 
15 | 16 | ## Our Standards 17 | 18 | Examples of behavior that contributes to a positive environment for our 19 | community include: 20 | 21 | * Demonstrating empathy and kindness toward other people 22 | * Being respectful of differing opinions, viewpoints, and experiences 23 | * Giving and gracefully accepting constructive feedback 24 | * Accepting responsibility and apologizing to those affected by our mistakes, 25 | and learning from the experience 26 | * Focusing on what is best not just for us as individuals, but for the 27 | overall community 28 | 29 | Examples of unacceptable behavior include: 30 | 31 | * The use of sexualized language or imagery, and sexual attention or 32 | advances of any kind 33 | * Trolling, insulting or derogatory comments, and personal or political attacks 34 | * Public or private harassment 35 | * Publishing others' private information, such as a physical or email 36 | address, without their explicit permission 37 | * Other conduct which could reasonably be considered inappropriate in a 38 | professional setting 39 | 40 | ## Enforcement Responsibilities 41 | 42 | Community leaders are responsible for clarifying and enforcing our standards of 43 | acceptable behavior and will take appropriate and fair corrective action in 44 | response to any behavior that they deem inappropriate, threatening, offensive, 45 | or harmful. 46 | 47 | Community leaders have the right and responsibility to remove, edit, or reject 48 | comments, commits, code, wiki edits, issues, and other contributions that are 49 | not aligned to this Code of Conduct, and will communicate reasons for moderation 50 | decisions when appropriate. 51 | 52 | ## Scope 53 | 54 | This Code of Conduct applies within all community spaces, and also applies when 55 | an individual is officially representing the community in public spaces. 
56 | Examples of representing our community include using an official e-mail address, 57 | posting via an official social media account, or acting as an appointed 58 | representative at an online or offline event. 59 | 60 | ## Enforcement 61 | 62 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 63 | reported to the community leaders responsible for enforcement at 64 | hello at benasher.co. 65 | All complaints will be reviewed and investigated promptly and fairly. 66 | 67 | All community leaders are obligated to respect the privacy and security of the 68 | reporter of any incident. 69 | 70 | ## Enforcement Guidelines 71 | 72 | Community leaders will follow these Community Impact Guidelines in determining 73 | the consequences for any action they deem in violation of this Code of Conduct: 74 | 75 | ### 1. Correction 76 | 77 | **Community Impact**: Use of inappropriate language or other behavior deemed 78 | unprofessional or unwelcome in the community. 79 | 80 | **Consequence**: A private, written warning from community leaders, providing 81 | clarity around the nature of the violation and an explanation of why the 82 | behavior was inappropriate. A public apology may be requested. 83 | 84 | ### 2. Warning 85 | 86 | **Community Impact**: A violation through a single incident or series 87 | of actions. 88 | 89 | **Consequence**: A warning with consequences for continued behavior. No 90 | interaction with the people involved, including unsolicited interaction with 91 | those enforcing the Code of Conduct, for a specified period of time. This 92 | includes avoiding interactions in community spaces as well as external channels 93 | like social media. Violating these terms may lead to a temporary or 94 | permanent ban. 95 | 96 | ### 3. Temporary Ban 97 | 98 | **Community Impact**: A serious violation of community standards, including 99 | sustained inappropriate behavior. 
100 | 101 | **Consequence**: A temporary ban from any sort of interaction or public 102 | communication with the community for a specified period of time. No public or 103 | private interaction with the people involved, including unsolicited interaction 104 | with those enforcing the Code of Conduct, is allowed during this period. 105 | Violating these terms may lead to a permanent ban. 106 | 107 | ### 4. Permanent Ban 108 | 109 | **Community Impact**: Demonstrating a pattern of violation of community 110 | standards, including sustained inappropriate behavior, harassment of an 111 | individual, or aggression toward or disparagement of classes of individuals. 112 | 113 | **Consequence**: A permanent ban from any sort of public interaction within 114 | the community. 115 | 116 | ## Attribution 117 | 118 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 119 | version 2.0, available at 120 | [https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0]. 121 | 122 | Community Impact Guidelines were inspired by 123 | [Mozilla's code of conduct enforcement ladder][Mozilla CoC]. 124 | 125 | For answers to common questions about this code of conduct, see the FAQ at 126 | [https://www.contributor-covenant.org/faq][FAQ]. Translations are available 127 | at [https://www.contributor-covenant.org/translations][translations]. 
128 | 129 | [homepage]: https://www.contributor-covenant.org 130 | [v2.0]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html 131 | [Mozilla CoC]: https://github.com/mozilla/diversity 132 | [FAQ]: https://www.contributor-covenant.org/faq 133 | [translations]: https://www.contributor-covenant.org/translations 134 | 135 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright © 2020 Ben Asher 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 4 | 5 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 6 | 7 | THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 8 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Kloudfront Blog Stats 2 | 3 | A lambda function and CLI tool for processing AWS Cloudfront access logs in Kotlin and dumping the useful data into postgres. 4 | 5 | ## Why? 
6 | 7 | My blog ([benasher.co](https://benasher.co)) is a static site hosted via S3 and AWS Cloudfront. I built this tool so that I could get longer-living page view data beyond the 60-day retention period that Cloudfront gives you in its reports. 8 | 9 | ## Goals 10 | 11 | Be able to write queries to assess: 12 | 13 | 1. Views per day, week, and month for the site and per page 14 | 1. Top referers 15 | 1. Sanitize AWS log data to prepare it to be queried 16 | 1. Run serverless to keep costs low; the primary use case is occasional usage (site owner occasionally runs queries) 17 | 18 | Cloudfront gives you reports for some of this information, but the data only goes back 60 days. Processing logs into a database allows quer 19 | 20 | ## Non-Goals 21 | 22 | 1. Store data that would allow tracking users or locations 23 | 24 | ## What it does 25 | 26 | This parses Cloudfront access logs and extracts: 27 | 28 | 1. Access date and time in UTC 29 | 1. Referer header 30 | 1. User Agent 31 | 1. Path component of the URL accessed 32 | 33 | 🚨 All paths are normalized to remove the trailing slash. Once a log is processed, the extracted data is dumped into postgres, and the log file is *deleted from S3* 🧹. 34 | 35 | ## Usage 36 | 37 | ### Important Environment Variables 38 | 39 | * [SDK credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html) 40 | * `PG_USER`: The postgres database user 41 | * `PG_PASSWORD`: The postgres database password 42 | * `PG_URL`: The postgres database url in the format: `postgresql://YOUR_DB_LOCATION/YOUR_DB_NAME` 43 | * `LOG_BUCKET_REGION` (unless supplied on the command line): The AWS region where your S3 bucket lives 44 | * `LOG_BUCKET` (unless supplied on the command line): The name of the bucket containing the logs to be parsed. 45 | 46 | ### Deploy to Lambda 47 | 48 | The steps below assume you have the AWS CLI set up and AWS credentials configured for it. 49 | 50 | 1. `./gradlew clean fatJar` 51 | 1. 
Command to create the function (pay attention to all caps variables that need substitution): 52 | ``` 53 | aws lambda create-function --function-name YOUR_FUNCTION_NAME --runtime java8 \ 54 | --zip-file fileb://build/libs/KloudfrontBlogStats-1.0-SNAPSHOT-fat.jar --handler com.benasher44.kloudfrontblogstats.AppKt::s3Handler \ 55 | --role YOUR_ROLE_FOR_LAMBDA \ 56 | --vpc-config YOUR_VPC_CONFIG \ 57 | --environment "Variables={LOG_BUCKET=YOUR_LOG_BUCKET,LOG_BUCKET_REGION=YOUR_S3_BUCKET_REGION,PG_URL=postgresql://YOUR_DB_LOCATION/YOUR_DB_NAME,PG_USER=YOUR_PG_USER,PG_PASSWORD=YOUR_PG_PASSWORD}" \ 58 | --timeout 300 \ 59 | --memory-size 512 60 | ``` 61 | 62 | ### CLI 63 | 64 | 1. `./gradlew clean fatJar` 65 | 1. `java -jar build/libs/KloudfrontBlogStats-1.0-SNAPSHOT-fat.jar --help` 66 | 67 | This is mainly useful for testing, though you could run it locally and not pay for AWS Lambda at all. By default, the CLI tool does not delete logs from S3. See the help text for how to enable that. 68 | 69 | ## Useful Resources 70 | 71 | * [Configuring and using standard logs (access logs)](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html) 72 | * [Kotlin and Groovy JVM Languages with AWS Lambda](https://aws.amazon.com/blogs/compute/kotlin-and-groovy-jvm-languages-with-aws-lambda/) 73 | * [Tutorial: Using AWS Lambda with Amazon S3](https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html) 74 | * [Tutorial: Configuring a Lambda function to access Amazon RDS in an Amazon VPC](https://docs.aws.amazon.com/lambda/latest/dg/services-rds-tutorial.html) 75 | * [Why can’t I connect to an S3 bucket using a gateway VPC endpoint?](https://aws.amazon.com/premiumsupport/knowledge-center/connect-s3-vpc-endpoint/) 76 | -------------------------------------------------------------------------------- /build.gradle.kts: -------------------------------------------------------------------------------- 1 | 2 | 3 | plugins { 4 | kotlin("jvm") version "1.4.21" 5 | 
kotlin("plugin.serialization") version "1.4.21" 6 | id("com.squareup.sqldelight") version "1.4.4" 7 | application 8 | } 9 | 10 | group = "com.benasher44" 11 | version = "1.0-SNAPSHOT" 12 | 13 | kotlin { 14 | // TODO: SQLDelight internal hack 15 | // explicitApi() 16 | target { 17 | compilations.all { 18 | kotlinOptions.jvmTarget = "1.8" 19 | } 20 | } 21 | } 22 | 23 | repositories { 24 | mavenCentral() 25 | } 26 | 27 | application { 28 | mainClass.set("com.benasher44.kloudfrontblogstats.AppKt") 29 | } 30 | 31 | sqldelight { 32 | database("KBSDatabase") { 33 | packageName = "com.benasher44.kloudfrontblogstats" 34 | dialect = "postgresql" 35 | } 36 | } 37 | 38 | val ktlintConfig by configurations.creating 39 | 40 | dependencies { 41 | ktlintConfig("com.pinterest:ktlint:0.40.0") 42 | } 43 | 44 | val ktlint by tasks.registering(JavaExec::class) { 45 | group = "verification" 46 | description = "Check Kotlin code style." 47 | classpath = ktlintConfig 48 | main = "com.pinterest.ktlint.Main" 49 | args = listOf("src/**/*.kt") 50 | } 51 | 52 | val ktlintformat by tasks.registering(JavaExec::class) { 53 | group = "formatting" 54 | description = "Fix Kotlin code style deviations." 
55 | classpath = ktlintConfig 56 | main = "com.pinterest.ktlint.Main" 57 | args = listOf("-F", "src/**/*.kt", "*.kts") 58 | } 59 | 60 | val checkTask = tasks.named("check") 61 | checkTask.configure { 62 | dependsOn(ktlint) 63 | } 64 | 65 | dependencies { 66 | implementation("com.github.ajalt.clikt:clikt:3.1.0") 67 | implementation("com.squareup.sqldelight:jdbc-driver:1.4.4") 68 | implementation("org.jetbrains.kotlinx:kotlinx-serialization-json:1.0.1") 69 | implementation("org.postgresql:postgresql:42.2.18") 70 | implementation("org.slf4j:slf4j-simple:1.7.30") 71 | implementation(platform("software.amazon.awssdk:bom:2.15.53")) 72 | implementation("software.amazon.awssdk:s3") 73 | 74 | testImplementation(kotlin("test-junit5")) 75 | testImplementation("org.junit.jupiter:junit-jupiter-api:5.7.0") 76 | testRuntimeOnly("org.junit.jupiter:junit-jupiter-engine:5.7.0") 77 | } 78 | 79 | tasks.test { 80 | useJUnitPlatform() 81 | } 82 | 83 | val fatJar by tasks.registering(Jar::class) { 84 | dependsOn(configurations.named("runtimeClasspath")) 85 | dependsOn(tasks.named("jar")) 86 | 87 | archiveClassifier.set("fat") 88 | 89 | manifest { 90 | attributes["Main-Class"] = application.mainClass.get() 91 | } 92 | 93 | val sourceClasses = sourceSets.main.get().output.classesDirs 94 | inputs.files(sourceClasses) 95 | 96 | from(files(sourceClasses)) 97 | from(files(sourceClasses)) 98 | from(configurations.runtimeClasspath.get().asFileTree.files.map { zipTree(it) }) 99 | exclude("**/*.kotlin_metadata") 100 | exclude("**/*.kotlin_module") 101 | exclude("**/*.kotlin_builtins") 102 | exclude("**/module-info.class") 103 | exclude("META-INF/maven/**") 104 | } 105 | -------------------------------------------------------------------------------- /gradle.properties: -------------------------------------------------------------------------------- 1 | kotlin.code.style=official 2 | -------------------------------------------------------------------------------- /gradle/wrapper/gradle-wrapper.jar: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/benasher44/KloudfrontBlogStats/c662788c6ed6bd07323e783b2f52183dd21d09dc/gradle/wrapper/gradle-wrapper.jar -------------------------------------------------------------------------------- /gradle/wrapper/gradle-wrapper.properties: -------------------------------------------------------------------------------- 1 | distributionBase=GRADLE_USER_HOME 2 | distributionPath=wrapper/dists 3 | distributionUrl=https\://services.gradle.org/distributions/gradle-6.7.1-bin.zip 4 | zipStoreBase=GRADLE_USER_HOME 5 | zipStorePath=wrapper/dists 6 | -------------------------------------------------------------------------------- /gradlew: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env sh 2 | 3 | # 4 | # Copyright 2015 the original author or authors. 5 | # 6 | # Licensed under the Apache License, Version 2.0 (the "License"); 7 | # you may not use this file except in compliance with the License. 8 | # You may obtain a copy of the License at 9 | # 10 | # https://www.apache.org/licenses/LICENSE-2.0 11 | # 12 | # Unless required by applicable law or agreed to in writing, software 13 | # distributed under the License is distributed on an "AS IS" BASIS, 14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | # See the License for the specific language governing permissions and 16 | # limitations under the License. 17 | # 18 | 19 | ############################################################################## 20 | ## 21 | ## Gradle start up script for UN*X 22 | ## 23 | ############################################################################## 24 | 25 | # Attempt to set APP_HOME 26 | # Resolve links: $0 may be a link 27 | PRG="$0" 28 | # Need this for relative symlinks. 
29 | while [ -h "$PRG" ] ; do 30 | ls=`ls -ld "$PRG"` 31 | link=`expr "$ls" : '.*-> \(.*\)$'` 32 | if expr "$link" : '/.*' > /dev/null; then 33 | PRG="$link" 34 | else 35 | PRG=`dirname "$PRG"`"/$link" 36 | fi 37 | done 38 | SAVED="`pwd`" 39 | cd "`dirname \"$PRG\"`/" >/dev/null 40 | APP_HOME="`pwd -P`" 41 | cd "$SAVED" >/dev/null 42 | 43 | APP_NAME="Gradle" 44 | APP_BASE_NAME=`basename "$0"` 45 | 46 | # Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. 47 | DEFAULT_JVM_OPTS='"-Xmx64m" "-Xms64m"' 48 | 49 | # Use the maximum available, or set MAX_FD != -1 to use that value. 50 | MAX_FD="maximum" 51 | 52 | warn () { 53 | echo "$*" 54 | } 55 | 56 | die () { 57 | echo 58 | echo "$*" 59 | echo 60 | exit 1 61 | } 62 | 63 | # OS specific support (must be 'true' or 'false'). 64 | cygwin=false 65 | msys=false 66 | darwin=false 67 | nonstop=false 68 | case "`uname`" in 69 | CYGWIN* ) 70 | cygwin=true 71 | ;; 72 | Darwin* ) 73 | darwin=true 74 | ;; 75 | MINGW* ) 76 | msys=true 77 | ;; 78 | NONSTOP* ) 79 | nonstop=true 80 | ;; 81 | esac 82 | 83 | CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar 84 | 85 | 86 | # Determine the Java command to use to start the JVM. 87 | if [ -n "$JAVA_HOME" ] ; then 88 | if [ -x "$JAVA_HOME/jre/sh/java" ] ; then 89 | # IBM's JDK on AIX uses strange locations for the executables 90 | JAVACMD="$JAVA_HOME/jre/sh/java" 91 | else 92 | JAVACMD="$JAVA_HOME/bin/java" 93 | fi 94 | if [ ! -x "$JAVACMD" ] ; then 95 | die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME 96 | 97 | Please set the JAVA_HOME variable in your environment to match the 98 | location of your Java installation." 99 | fi 100 | else 101 | JAVACMD="java" 102 | which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. 103 | 104 | Please set the JAVA_HOME variable in your environment to match the 105 | location of your Java installation." 
106 | fi 107 | 108 | # Increase the maximum file descriptors if we can. 109 | if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then 110 | MAX_FD_LIMIT=`ulimit -H -n` 111 | if [ $? -eq 0 ] ; then 112 | if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then 113 | MAX_FD="$MAX_FD_LIMIT" 114 | fi 115 | ulimit -n $MAX_FD 116 | if [ $? -ne 0 ] ; then 117 | warn "Could not set maximum file descriptor limit: $MAX_FD" 118 | fi 119 | else 120 | warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT" 121 | fi 122 | fi 123 | 124 | # For Darwin, add options to specify how the application appears in the dock 125 | if $darwin; then 126 | GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\"" 127 | fi 128 | 129 | # For Cygwin or MSYS, switch paths to Windows format before running java 130 | if [ "$cygwin" = "true" -o "$msys" = "true" ] ; then 131 | APP_HOME=`cygpath --path --mixed "$APP_HOME"` 132 | CLASSPATH=`cygpath --path --mixed "$CLASSPATH"` 133 | 134 | JAVACMD=`cygpath --unix "$JAVACMD"` 135 | 136 | # We build the pattern for arguments to be converted via cygpath 137 | ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null` 138 | SEP="" 139 | for dir in $ROOTDIRSRAW ; do 140 | ROOTDIRS="$ROOTDIRS$SEP$dir" 141 | SEP="|" 142 | done 143 | OURCYGPATTERN="(^($ROOTDIRS))" 144 | # Add a user-defined pattern to the cygpath arguments 145 | if [ "$GRADLE_CYGPATTERN" != "" ] ; then 146 | OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)" 147 | fi 148 | # Now convert the arguments - kludge to limit ourselves to /bin/sh 149 | i=0 150 | for arg in "$@" ; do 151 | CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -` 152 | CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option 153 | 154 | if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition 155 | eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"` 156 | else 157 | eval `echo args$i`="\"$arg\"" 158 | fi 159 | i=`expr $i 
+ 1` 160 | done 161 | case $i in 162 | 0) set -- ;; 163 | 1) set -- "$args0" ;; 164 | 2) set -- "$args0" "$args1" ;; 165 | 3) set -- "$args0" "$args1" "$args2" ;; 166 | 4) set -- "$args0" "$args1" "$args2" "$args3" ;; 167 | 5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;; 168 | 6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;; 169 | 7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;; 170 | 8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;; 171 | 9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;; 172 | esac 173 | fi 174 | 175 | # Escape application args 176 | save () { 177 | for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done 178 | echo " " 179 | } 180 | APP_ARGS=`save "$@"` 181 | 182 | # Collect all arguments for the java command, following the shell quoting and substitution rules 183 | eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS" 184 | 185 | exec "$JAVACMD" "$@" 186 | -------------------------------------------------------------------------------- /gradlew.bat: -------------------------------------------------------------------------------- 1 | @rem 2 | @rem Copyright 2015 the original author or authors. 3 | @rem 4 | @rem Licensed under the Apache License, Version 2.0 (the "License"); 5 | @rem you may not use this file except in compliance with the License. 6 | @rem You may obtain a copy of the License at 7 | @rem 8 | @rem https://www.apache.org/licenses/LICENSE-2.0 9 | @rem 10 | @rem Unless required by applicable law or agreed to in writing, software 11 | @rem distributed under the License is distributed on an "AS IS" BASIS, 12 | @rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
13 | @rem See the License for the specific language governing permissions and 14 | @rem limitations under the License. 15 | @rem 16 | 17 | @if "%DEBUG%" == "" @echo off 18 | @rem ########################################################################## 19 | @rem 20 | @rem Gradle startup script for Windows 21 | @rem 22 | @rem ########################################################################## 23 | 24 | @rem Set local scope for the variables with windows NT shell 25 | if "%OS%"=="Windows_NT" setlocal 26 | 27 | set DIRNAME=%~dp0 28 | if "%DIRNAME%" == "" set DIRNAME=. 29 | set APP_BASE_NAME=%~n0 30 | set APP_HOME=%DIRNAME% 31 | 32 | @rem Resolve any "." and ".." in APP_HOME to make it shorter. 33 | for %%i in ("%APP_HOME%") do set APP_HOME=%%~fi 34 | 35 | @rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. 36 | set DEFAULT_JVM_OPTS="-Xmx64m" "-Xms64m" 37 | 38 | @rem Find java.exe 39 | if defined JAVA_HOME goto findJavaFromJavaHome 40 | 41 | set JAVA_EXE=java.exe 42 | %JAVA_EXE% -version >NUL 2>&1 43 | if "%ERRORLEVEL%" == "0" goto execute 44 | 45 | echo. 46 | echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. 47 | echo. 48 | echo Please set the JAVA_HOME variable in your environment to match the 49 | echo location of your Java installation. 50 | 51 | goto fail 52 | 53 | :findJavaFromJavaHome 54 | set JAVA_HOME=%JAVA_HOME:"=% 55 | set JAVA_EXE=%JAVA_HOME%/bin/java.exe 56 | 57 | if exist "%JAVA_EXE%" goto execute 58 | 59 | echo. 60 | echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME% 61 | echo. 62 | echo Please set the JAVA_HOME variable in your environment to match the 63 | echo location of your Java installation. 
64 | 65 | goto fail 66 | 67 | :execute 68 | @rem Setup the command line 69 | 70 | set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar 71 | 72 | 73 | @rem Execute Gradle 74 | "%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %* 75 | 76 | :end 77 | @rem End local scope for the variables with windows NT shell 78 | if "%ERRORLEVEL%"=="0" goto mainEnd 79 | 80 | :fail 81 | rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of 82 | rem the _cmd.exe /c_ return code! 83 | if not "" == "%GRADLE_EXIT_CONSOLE%" exit 1 84 | exit /b 1 85 | 86 | :mainEnd 87 | if "%OS%"=="Windows_NT" endlocal 88 | 89 | :omega 90 | -------------------------------------------------------------------------------- /settings.gradle.kts: -------------------------------------------------------------------------------- 1 | 2 | rootProject.name = "KloudfrontBlogStats" 3 | -------------------------------------------------------------------------------- /src/main/kotlin/app.kt: -------------------------------------------------------------------------------- 1 | package com.benasher44.kloudfrontblogstats 2 | 3 | import com.benasher44.kloudfrontblogstats.logic.CLI 4 | import com.benasher44.kloudfrontblogstats.logic.S3Object 5 | import com.benasher44.kloudfrontblogstats.logic.S3Service 6 | import com.benasher44.kloudfrontblogstats.logic.enumerateLogs 7 | import com.benasher44.kloudfrontblogstats.utils.LOGGER 8 | import com.benasher44.kloudfrontblogstats.utils.logMessage 9 | import com.benasher44.kloudfrontblogstats.utils.setLogger 10 | import com.benasher44.kloudfrontblogstats.utils.withLazyConnection 11 | import software.amazon.awssdk.regions.Region 12 | import java.io.InputStream 13 | import java.io.OutputStream 14 | import java.util.zip.GZIPInputStream 15 | import kotlin.system.exitProcess 16 | 17 | private val REGION by lazy { 18 | 
requireNotNull(System.getenv("LOG_BUCKET_REGION")) { 19 | "Specify log bucket region by setting the LOG_BUCKET_REGION env var" 20 | } 21 | } 22 | 23 | private val BUCKET by lazy { 24 | requireNotNull(System.getenv("LOG_BUCKET")) { 25 | "Specify log bucket by setting the LOG_BUCKET env var" 26 | } 27 | } 28 | 29 | @Suppress("unused", "RedundantVisibilityModifier") 30 | public fun s3Handler(input: InputStream, output: OutputStream) { 31 | try { 32 | handleObjects( 33 | BUCKET, 34 | S3Service(Region.of(REGION), true) 35 | ) 36 | } finally { 37 | input.close() 38 | output.close() 39 | } 40 | } 41 | 42 | private fun handleObjects(bucket: String, s3Service: S3Service) { 43 | try { 44 | var count = 0 45 | withLazyConnection { lazyQueries -> 46 | s3Service.enumerateObjectsInBucket(bucket) { s3o, listCount -> 47 | 48 | // listCount is uninitialized at the start of a new "page" of objects 49 | if (!listCount.isInitialized()) { 50 | LOGGER.log("Listed ${listCount.value} keys.") 51 | } 52 | try { 53 | handleObject(s3o, s3Service, lazyQueries.value) 54 | count += 1 55 | } catch (e: Throwable) { 56 | LOGGER.error("Processing error (${s3o.key}): ${e.logMessage()}") 57 | } 58 | LOGGER.log("Processed ${s3o.key}") 59 | } 60 | } 61 | LOGGER.log("Processed $count objects successfully.") 62 | } catch (e: Throwable) { 63 | LOGGER.error(e.logMessage()) 64 | throw e 65 | } 66 | } 67 | 68 | private fun handleObject( 69 | s3o: S3Object, 70 | s3Service: S3Service, 71 | queries: AccessLogQueries 72 | ) { 73 | LOGGER.log("Getting ${s3o.key}") 74 | s3Service.getObject(s3o).use { downloadStream -> 75 | GZIPInputStream(downloadStream).use { input -> 76 | queries.transaction { 77 | input.enumerateLogs { date, time, referer, userAgent, path -> 78 | queries.insertLog( 79 | // date and time in UTC (postgres timestamp format) 80 | "$date $time", 81 | referer, 82 | userAgent, 83 | path 84 | ) 85 | } 86 | } 87 | } 88 | } 89 | LOGGER.log("Deleting ${s3o.key}") 90 | s3Service.deleteObject(s3o) 91 | } 92 | 
93 | fun main(args: Array<String>) = CLI { 94 | setLogger(this) 95 | try { 96 | handleObjects( 97 | bucket, 98 | S3Service(region, allowDelete) 99 | ) 100 | } catch (e: Throwable) { 101 | LOGGER.error(e.logMessage()) 102 | exitProcess(1) 103 | } 104 | exitProcess(0) 105 | }.main(args) 106 | -------------------------------------------------------------------------------- /src/main/kotlin/logic/cli.kt: -------------------------------------------------------------------------------- 1 | package com.benasher44.kloudfrontblogstats.logic 2 | 3 | import com.benasher44.kloudfrontblogstats.utils.Logger 4 | import com.github.ajalt.clikt.core.CliktCommand 5 | import com.github.ajalt.clikt.parameters.options.convert 6 | import com.github.ajalt.clikt.parameters.options.flag 7 | import com.github.ajalt.clikt.parameters.options.option 8 | import com.github.ajalt.clikt.parameters.options.required 9 | import com.github.ajalt.clikt.parameters.types.choice 10 | import software.amazon.awssdk.regions.Region 11 | import java.time.Instant 12 | import java.time.ZoneId 13 | 14 | internal class CLI(private val runLambda: CLI.() -> Unit) : CliktCommand(), Logger { 15 | 16 | val bucket by option( 17 | "--bucket", 18 | help = "The S3 bucket containing the log files" 19 | ).required() 20 | 21 | val region by option( 22 | "--region", 23 | help = "The AWS region where the S3 bucket resides" 24 | ) 25 | .choice(choices = Region.regions().map { it.id() }.toTypedArray()) 26 | .convert { Region.of(it)!! 
} 27 | .required() 28 | 29 | val allowDelete by option( 30 | "--allow-delete", 31 | help = "When set, allows the tool to delete objects after processing" 32 | ).flag(default = false) 33 | 34 | override fun run() { 35 | runLambda.invoke(this) 36 | } 37 | 38 | override fun log(msg: String) { 39 | echo(msg.logMessage()) 40 | } 41 | 42 | override fun error(msg: String) { 43 | echo(msg.logMessage(), err = true) 44 | } 45 | } 46 | 47 | private fun String.logMessage(): String = 48 | "[${Instant.now().atZone(ZoneId.systemDefault())}] $this" 49 | -------------------------------------------------------------------------------- /src/main/kotlin/logic/log.kt: -------------------------------------------------------------------------------- 1 | package com.benasher44.kloudfrontblogstats.logic 2 | 3 | import java.io.InputStream 4 | import java.net.URLDecoder 5 | import java.nio.charset.StandardCharsets.UTF_8 6 | 7 | private class LogLine(line: String, private val fields: Map<String, Int>) { 8 | private val values = line.trim().split("\t") 9 | 10 | // log values are url-encoded 11 | operator fun get(key: String): String = URLDecoder.decode( 12 | // avoid URLDecoder decoding "+" as " " 13 | values[fields[key]!!].replace("+", "%2B"), 14 | UTF_8.name() 15 | ) 16 | } 17 | 18 | internal typealias LogLambda = ( 19 | date: String, 20 | time: String, 21 | referer: String?, 22 | userAgent: String?, 23 | path: String 24 | ) -> Unit 25 | 26 | internal fun InputStream.enumerateLogs(lambda: LogLambda) { 27 | bufferedReader().useLines { lines -> 28 | lateinit var fields: Map<String, Int> 29 | val filteredLines = lines.filter { line -> 30 | if (line.startsWith("#Fields:")) { 31 | fields = line.substringAfter("#Fields:") 32 | .trim() 33 | .split(" ") 34 | .withIndex() 35 | .associate { it.value to it.index } 36 | false 37 | } else !line.startsWith("#") 38 | } 39 | for (line in filteredLines) { 40 | val values = LogLine(line, fields) 41 | 42 | // HTTP status 43 | if (values["sc-status"] != "200" && 
values["sc-status"] != 
"304") continue 44 | 45 | // HTTP method 46 | if (values["cs-method"] != "GET") continue 47 | 48 | lambda( 49 | // date and time in UTC 50 | values["date"], 51 | values["time"], 52 | 53 | // Referer header 54 | values["cs(Referer)"].nullIfEmptyLogValue()?.normalizeUrl(), 55 | 56 | // User-Agent 57 | values["cs(User-Agent)"].nullIfEmptyLogValue(), 58 | 59 | // Path 60 | values["cs-uri-stem"].normalizeUrl() 61 | ) 62 | } 63 | } 64 | } 65 | 66 | // null values are represented by "-" in the log 67 | private fun String.nullIfEmptyLogValue(): String? = 68 | this.takeUnless { it == "-" } 69 | 70 | private fun String.normalizeUrl(): String = this 71 | 72 | // trim trailing slash 73 | .trimEnd('/') 74 | 75 | // remove leading http:// 76 | .substringAfter("http://") 77 | 78 | // remove leading https:// 79 | .substringAfter("https://") 80 | -------------------------------------------------------------------------------- /src/main/kotlin/logic/s3.kt: -------------------------------------------------------------------------------- 1 | package com.benasher44.kloudfrontblogstats.logic 2 | 3 | import software.amazon.awssdk.regions.Region 4 | import software.amazon.awssdk.services.s3.S3Client 5 | import software.amazon.awssdk.services.s3.model.DeleteObjectRequest 6 | import software.amazon.awssdk.services.s3.model.GetObjectRequest 7 | import software.amazon.awssdk.services.s3.model.ListObjectsV2Request 8 | import java.io.InputStream 9 | 10 | internal data class S3Object(val bucket: String, val key: String) 11 | 12 | internal class S3Service(region: Region, private val allowDelete: Boolean) { 13 | private val client = S3Client.builder() 14 | .region(region) 15 | .build() 16 | 17 | fun getObject(s3o: S3Object): InputStream { 18 | val getRequest = GetObjectRequest.builder() 19 | .bucket(s3o.bucket) 20 | .key(s3o.key) 21 | .build() 22 | return client.getObject(getRequest) 23 | } 24 | 25 | // noop if allowDelete is false 26 | fun deleteObject(s3o: S3Object) { 27 | if (!allowDelete) 
            return
        val deleteRequest = DeleteObjectRequest.builder()
            .bucket(s3o.bucket)
            .key(s3o.key)
            .build()
        client.deleteObject(deleteRequest)
    }

    /** Calls [lambda] for each object in the bucket, paging through all objects in the bucket */
    fun enumerateObjectsInBucket(bucket: String, lambda: (S3Object, Lazy<Int>) -> Unit) {
        var nextContinuationToken: String? = null
        do {
            // build the list request, with a continuation token if we have one
            val listRequest = ListObjectsV2Request.builder()
                .bucket(bucket)
            if (nextContinuationToken != null) {
                listRequest.continuationToken(nextContinuationToken)
            }

            // get the response and store the continuation token for next time
            val response = client.listObjectsV2(listRequest.build())
            if (response.isTruncated) {
                nextContinuationToken = response.nextContinuationToken()
            }

            // call the lambda for each object in the response
            if (response.hasContents()) {
                val count = lazy { response.keyCount() }
                for (s3o in response.contents()) {
                    lambda(S3Object(bucket, s3o.key()), count)
                }
            }
        } while (response.isTruncated)
    }
}

--------------------------------------------------------------------------------
/src/main/kotlin/utils/logging.kt:
--------------------------------------------------------------------------------
package com.benasher44.kloudfrontblogstats.utils

import java.util.logging.Level

// OTHER_LOGGER can be set to be used instead of JavaLogger
private var OTHER_LOGGER: Logger?
    = null
internal fun setLogger(logger: Logger) {
    OTHER_LOGGER = logger
}

internal val LOGGER: Logger by lazy {
    OTHER_LOGGER ?: JavaLogger
}

interface Logger {
    fun log(msg: String)
    fun error(msg: String)
}

private object JavaLogger : Logger {
    // name the logger after this class (this::javaClass.name would yield the property name "javaClass")
    private val logger: java.util.logging.Logger = java.util.logging.Logger.getLogger(JavaLogger::class.java.name)

    override fun log(msg: String) {
        logger.log(Level.INFO, msg)
    }

    override fun error(msg: String) {
        logger.log(Level.SEVERE, msg)
    }
}

internal fun Throwable.logMessage(): String =
    "$this - $message: ${stackTraceToString()}"

--------------------------------------------------------------------------------
/src/main/kotlin/utils/sqldelight.kt:
--------------------------------------------------------------------------------
package com.benasher44.kloudfrontblogstats.utils

import com.benasher44.kloudfrontblogstats.AccessLogQueries
import com.benasher44.kloudfrontblogstats.KBSDatabase
import com.squareup.sqldelight.db.SqlDriver
import com.squareup.sqldelight.sqlite.driver.asJdbcDriver
import org.postgresql.ds.PGSimpleDataSource

// Connects to Postgres lazily, so no connection is opened unless the queries
// are actually used. This reduces costs associated with Aurora Serverless.
internal fun withLazyConnection(dbLambda: (queries: Lazy<AccessLogQueries>) -> Unit) {
    val lazyDriver = lazy {
        val url = System.getenv("PG_URL")
        val ds = PGSimpleDataSource()
        ds.setUrl("jdbc:$url")
        ds.user = System.getenv("PG_USER")
        ds.password = System.getenv("PG_PASSWORD")
        ds.asJdbcDriver()
    }
    try {
        val accessLogQueries = lazy {
            val driver = lazyDriver.value
            val database = KBSDatabase(driver)

            // TODO: fix setVersion query and support upgrades
            // https://github.com/AlecStrong/sql-psi/issues/136
            if (!driver.schemaVersionTableExists()) {
                database.schemaVersionQueries.transaction {
                    KBSDatabase.Schema.create(driver)
                }
                database.schemaVersionQueries.setVersion()
            }
            database.accessLogQueries
        }

        dbLambda(accessLogQueries)
    } finally {
        if (lazyDriver.isInitialized()) {
            lazyDriver.value.close()
        }
    }
}

private fun SqlDriver.schemaVersionTableExists(): Boolean {
    executeQuery(
        null,
        """|SELECT 1 FROM information_schema.tables
           |WHERE table_schema = 'public' AND
           |      table_name = 'schemaversion'
           |FETCH FIRST ROW ONLY
           |""".trimMargin(),
        0
    ).use {
        if (!it.next()) return false
        return it.getLong(0) == 1L
    }
}

--------------------------------------------------------------------------------
/src/main/sqldelight/com/benasher44/kloudfrontblogstats/AccessLog.sq:
--------------------------------------------------------------------------------
CREATE TABLE AccessLog(
  id SERIAL PRIMARY KEY,
  accessedAt TIMESTAMP NOT NULL,
  referer TEXT,
  userAgent TEXT,
  path TEXT NOT NULL
);

insertLog:
INSERT INTO AccessLog(accessedAt, referer, userAgent, path)
-- https://github.com/AlecStrong/sql-psi/issues/234
VALUES (CAST (?
AS TIMESTAMP), ?, ?, ?);
--------------------------------------------------------------------------------
/src/main/sqldelight/com/benasher44/kloudfrontblogstats/SchemaVersion.sq:
--------------------------------------------------------------------------------
CREATE TABLE SchemaVersion (
  id INT PRIMARY KEY NOT NULL,
  version INT NOT NULL
);

setVersion:
INSERT INTO SchemaVersion(id, version) VALUES (0, 1);
--------------------------------------------------------------------------------
/src/test/kotlin/LogParsingTest.kt:
--------------------------------------------------------------------------------
package com.benasher44.kloudfrontblogstats

import com.benasher44.kloudfrontblogstats.logic.enumerateLogs
import org.junit.jupiter.api.Test
import kotlin.test.assertEquals

class LogParsingTest {

    private data class LogRecord(
        val date: String,
        val time: String,
        val referer: String?,
        val userAgent: String?,
        val path: String
    )

    private val sampleLog = """
        #Version: 1.0
        #Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end
        2020-11-25	05:35:49	SIN52-C2	32396	0.0.0.0	GET	something.cloudfront.net	/img/logo.png	200	-	Mozilla/5.0%20(Macintosh;%20Intel%20Mac%20OS%20X%2010.15;%20rv:82.0)%20Gecko/20100101%20Firefox/82.0	-	-	Hit	dGVzdA==	benasher.co	https	244	0.003	-	TLSv1.3	TLS_AES_128_GCM_SHA256	Hit	HTTP/2.0	-	-	59724	0.003	Hit	image/png	32049	-	-
        2020-11-25	05:35:50	SIN52-C2	17700	0.0.0.0	GET	something.cloudfront.net	/favicon.ico	200	https://benasher.co/img/logo.png	Mozilla/5.0%20(Macintosh;%20Intel%20Mac%20OS%20X%2010.15;%20rv:82.0)%20Gecko/20100101%20Firefox/82.0	-	-	Hit	dGVzdA==	benasher.co	https	69	0.002	-	TLSv1.3	TLS_AES_128_GCM_SHA256	Hit	HTTP/2.0	-	-	59724	0.002	Hit	image/vnd.microsoft.icon	17363	-	-
        2020-11-25	05:39:29	AMS54-C1	5112	0.0.0.0	GET	something.cloudfront.net	/kotlin-binary-debugging/	200	-	Mozilla/5.0%20(compatible;%20AhrefsBot/7.0;%20+http://ahrefs.com/robot/)	-	-	Hit	dGVzdA==	benasher.co	https	113	0.012	-	TLSv1.3	TLS_AES_128_GCM_SHA256	Hit	HTTP/2.0	-	-	44862	0.011	Hit	text/html	-	-	-
    """.trimIndent()

    @Test
    fun `parses sample log`() {
        val logRecords = mutableListOf<LogRecord>()
        sampleLog.byteInputStream().enumerateLogs { date, time, referer, userAgent, path ->
            logRecords.add(
                LogRecord(date, time, referer, userAgent, path)
            )
        }
        val expectedLogRecords = listOf(
            LogRecord("2020-11-25", "05:35:49", null, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:82.0) Gecko/20100101 Firefox/82.0", "/img/logo.png"),
            LogRecord("2020-11-25", "05:35:50", "benasher.co/img/logo.png", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:82.0) Gecko/20100101 Firefox/82.0", "/favicon.ico"),
            LogRecord("2020-11-25", "05:39:29", null, "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)", "/kotlin-binary-debugging"),
        )
        assertEquals(expectedLogRecords, logRecords)
    }
}

--------------------------------------------------------------------------------