├── 1-Queries Reference.pdf
├── 2-Aggregate-Functions Reference.pdf
├── 3-Multiple-Tables-Combination Reference.pdf
├── README.md
├── SQL Server
│   └── README.md
├── UTM参数查询.md
├── Warby+Parker(SQL+PowerBi).pdf
└── 用户流失率(Churn rate).md

/1-Queries Reference.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teamowu/SQL/8793043d961c446bb3547fe4da4e9a3d74a05341/1-Queries Reference.pdf
--------------------------------------------------------------------------------
/2-Aggregate-Functions Reference.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teamowu/SQL/8793043d961c446bb3547fe4da4e9a3d74a05341/2-Aggregate-Functions Reference.pdf
--------------------------------------------------------------------------------
/3-Multiple-Tables-Combination Reference.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teamowu/SQL/8793043d961c446bb3547fe4da4e9a3d74a05341/3-Multiple-Tables-Combination Reference.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# SQL Generalizations
SQL is a programming language designed to manipulate and manage data stored in relational databases.
- A relational database is a database that organizes information into one or more tables.
- A table is a collection of data organized into rows and columns.

A statement is a string of characters that the database recognizes as a valid command.
## Six Commands Commonly Used:
- `CREATE TABLE`: creates a new table.
- `INSERT INTO`: adds a new row to a table.
- `SELECT`: queries data from a table.
- `ALTER TABLE`: changes an existing table.
- `UPDATE`: edits a row in a table.
- `DELETE FROM`: deletes rows from a table.
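
The six commands can be tried end to end without installing a database server. The following is a minimal sketch, assuming Python's built-in `sqlite3` module and a hypothetical `friends` table (neither the table nor its columns come from these notes):

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE: define a new table (hypothetical `friends` table).
cur.execute("CREATE TABLE friends (id INTEGER, name TEXT, age INTEGER)")

# INSERT INTO: add two rows.
cur.execute(
    "INSERT INTO friends (id, name, age) VALUES (1, 'Ada', 36), (2, 'Grace', 45)"
)

# ALTER TABLE: add a column; existing rows get NULL in it.
cur.execute("ALTER TABLE friends ADD COLUMN email TEXT")

# UPDATE: edit only the row that matches the WHERE condition.
cur.execute("UPDATE friends SET email = 'ada@example.com' WHERE id = 1")

# DELETE FROM: remove the rows that still have no email.
cur.execute("DELETE FROM friends WHERE email IS NULL")

# SELECT: query what is left.
rows = cur.execute("SELECT * FROM friends").fetchall()
print(rows)
conn.close()
```

Only the row that received an email address survives the `DELETE`, which shows why the `WHERE` condition matters for `UPDATE` and `DELETE`.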

## Common data types in SQL:
- Integer
- Text
- Date
- Real

## Six Common Statements in SQL:
### 1.CREATE
`CREATE TABLE table_name(
column_name_1 data_type,
column_name_2 data_type,
column_name_3 data_type);`

### 2.INSERT
`INSERT INTO table_name(
column_name_1,column_name_2,column_name_3)
VALUES (a1,b1,c1),(a2,b2,c2),(a3,b3,c3);`

### 3.SELECT
`SELECT * FROM table_name;`,`SELECT column_name FROM table_name;`

### 4.ALTER
`ALTER TABLE table_name ADD COLUMN new_column_name TEXT;`

### 5.UPDATE
`UPDATE table_name SET column_name = new_value WHERE condition;`

### 6.DELETE
`DELETE FROM table_name WHERE condition;`

### Constraints
Constraints add information about how a column can be used. They are written after the data type of a column.
They can be used to tell the database to reject inserted data that does not adhere to a certain restriction. For example:
- `CREATE TABLE table_name(
column_name_1 INTEGER PRIMARY KEY,
column_name_2 TEXT UNIQUE,
column_name_3 TEXT NOT NULL,
column_name_4 TEXT DEFAULT 'Not Applicable'
);`

The constraint keywords used here are `PRIMARY KEY` (uniquely identifies each row), `UNIQUE` (no two rows may share a value), `NOT NULL` (a value is required), and `DEFAULT value` (the value used when none is supplied).

## Chapter 1: Writing Queries
One of the core purposes of the SQL language is to retrieve information stored in a database.
This is commonly referred to as querying. Queries allow us to communicate with the database by asking questions and having the result set return data relevant to the question.

Reviews:
- `SELECT` is the clause we use every time we want to query information from a database.
- `AS` renames a column or table.
- `DISTINCT` returns unique values.
- `WHERE` is a clause that lets you filter the results of the query based on conditions that you specify.
- `LIKE` and `BETWEEN` are special operators.
- `AND` and `OR` combine multiple conditions.
- `ORDER BY` sorts the result.
- `LIMIT` specifies the maximum number of rows that the query will return.
- `CASE` creates different outputs.
### AS
`AS` is a keyword in SQL that allows you to rename a column or table using an alias. Some important things to note:
- Although it's not always necessary, it is common to surround aliases with quotes. SQLite accepts single quotes here, while standard SQL uses double quotes for identifiers.
- When using `AS`, the columns are not renamed in the table. The aliases only appear in the result.

### DISTINCT
`DISTINCT` is used to return unique values in the output. It filters out all duplicate values in the specified column(s).

### WHERE
`WHERE` restricts the result set to the rows for which the condition that follows it is true.
Comparison operators used with the `WHERE` clause are:
- `=` equal to
- `!=` not equal to
- `>` greater than
- `<` less than
- `>=` greater than or equal to
- `<=` less than or equal to
### LIKE
`LIKE` can be a useful operator when you want to compare similar values.
- The `_` means you can substitute any individual character here without breaking the pattern. For example:
  - The names Seven and Se7en both match the pattern `LIKE 'Se_en'`.
- The `%` is a wildcard character that matches zero or more characters in the pattern. For example:
  - `LIKE 'A%'` matches all values that begin with the letter 'A'.
  - `LIKE '%a'` matches all values that end with 'a'.
- We can also use `%` both before and after a pattern: `LIKE '%xxx%'` matches any value that contains 'xxx'.

### IS NULL
More often than not, the data you encounter will have missing values. NULL cannot be tested with `=` or `!=`, so we have to use these operators:
- `IS NULL`
- `IS NOT NULL`

### BETWEEN
The `BETWEEN` operator can be used in a `WHERE` clause to filter the result set within a certain range.
The values can be numbers, text or dates. An interesting point to emphasize:
- `BETWEEN` two letters does not include values that merely start with the 2nd letter. For example:
  - `SELECT * FROM movies WHERE name BETWEEN 'A' AND 'J';` returns movies whose names begin with 'A' through 'I', plus a movie named exactly 'J'. A name such as 'Jaws' sorts after 'J' and is excluded.
- `BETWEEN` two numbers is inclusive of the 2nd number. For example:
  - `SELECT * FROM movies WHERE year BETWEEN 1990 AND 1999;` returns movies from 1990 up to and including 1999.

### AND
Sometimes we want to combine multiple conditions in a `WHERE` clause to make the result set more specific and useful.
With `AND`, both conditions must be true for the row to be included in the result.

### OR
Similar to `AND`, the `OR` operator can also be used to combine multiple conditions in `WHERE`.
With `OR`, if any of the conditions is true, the row is added to the result.

### ORDER BY (DESC)
`ORDER BY` is a clause that indicates you want to sort the result set by a particular column (or by more than one column). `DESC` sorts from high to low; the default, `ASC`, sorts from low to high.
The column that we `ORDER BY` doesn't even have to be one of the columns that we're displaying.

### LIMIT
`LIMIT` is a clause that lets us specify the maximum number of rows the result set will have.
This saves space on our screen and makes our queries run faster.
For example, `SELECT * FROM movies LIMIT 10;` specifies that the result set can't have more than 10 rows.
`LIMIT` always goes at the very end of the query. Also, it is not supported in all SQL databases (SQL Server, for example, uses `TOP` or `OFFSET ... FETCH` instead).
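
The filtering and sorting clauses of this chapter can be checked on a small data set. The following is a sketch only, assuming Python's built-in `sqlite3` module and a made-up `movies` table with six rows (the rows are illustrative, not taken from the course data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movies (name, year) VALUES (?, ?)",
    [("Alien", 1979), ("Fargo", 1996), ("Ghost", 1990),
     ("Jaws", 1975), ("Se7en", 1995), ("Titanic", 1997)],
)


def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# `_` stands for exactly one character.
like_match = names("SELECT name FROM movies WHERE name LIKE 'Se_en'")

# Text range: 'Jaws' sorts after 'J', so it is not returned.
text_range = names(
    "SELECT name FROM movies WHERE name BETWEEN 'A' AND 'J' ORDER BY name"
)

# Numeric range is inclusive; newest first, at most two rows.
newest_90s = names(
    "SELECT name FROM movies WHERE year BETWEEN 1990 AND 1999 "
    "ORDER BY year DESC LIMIT 2"
)

print(like_match, text_range, newest_90s)
```

The second query is the one to watch: `BETWEEN 'A' AND 'J'` is still an inclusive range, but any name longer than the single letter 'J' compares greater than 'J'.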

### CASE
A `CASE` statement allows us to create different outputs (usually in the `SELECT` statement).
It is SQL's way of handling if-then logic. For example:
- `SELECT name,CASE WHEN genre='romance' THEN 'Chill'
WHEN genre='comedy' THEN 'Chill'
ELSE 'Intense'
END AS 'Mood'
FROM movies;`

## Chapter 2: Aggregate Functions
Calculations performed on multiple rows of a table are called aggregates.

Reviews:
- `COUNT()`: counts the number of rows
- `SUM()`: the sum of the values in a column
- `MAX()/MIN()`: the largest/smallest value
- `AVG()`: the average of the values in a column
- `ROUND()`: rounds the values in the column
- `GROUP BY` is a clause used with aggregate functions to combine data from one or more columns.
- `HAVING` limits the results of a query based on an aggregate property.

### COUNT
`COUNT()` is a function that takes the name of a column as an argument and counts the number of non-empty (non-NULL) values in that column.

### SUM
`SUM()` is a function that takes the name of a column as an argument and returns the sum of all the values in that column.

### MAX/MIN
The `MAX()` and `MIN()` functions return the highest and lowest values in a column, respectively.

### AVG()
The `AVG()` function takes a column name as an argument and returns the average value for that column.

### ROUND()
The `ROUND()` function takes two arguments inside the parentheses:
- A column name
- An integer: the number of decimal places to round to

### GROUP BY
`GROUP BY` is a clause in SQL that is used with aggregate functions. It is used in collaboration with the `SELECT` statement to arrange identical data into groups. The `GROUP BY` statement comes after any `WHERE` statements, but before `ORDER BY` or `LIMIT`.
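
To see the aggregate functions and the clause order together, here is a minimal sketch, assuming Python's built-in `sqlite3` module and an invented `movies` table with `genre` and `rating` columns (the data is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, genre TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [("A", "comedy", 6.0), ("B", "comedy", 7.0), ("C", "comedy", None),
     ("D", "drama", 8.0), ("E", "drama", 9.0), ("F", "drama", 7.0),
     ("G", "horror", 4.0)],
)

# Clause order: WHERE, then GROUP BY, then ORDER BY.
summary = conn.execute("""
    SELECT genre,
           COUNT(*),               -- all rows in the group
           COUNT(rating),          -- only rows with a non-NULL rating
           ROUND(AVG(rating), 1)   -- AVG ignores NULL values
    FROM movies
    WHERE genre != 'horror'
    GROUP BY genre
    ORDER BY genre
""").fetchall()

print(summary)
```

The comedy group has three rows but only two ratings, so `COUNT(*)` and `COUNT(rating)` differ, and the average is taken over the two non-NULL values.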
SQL lets us use column reference(s) in our `GROUP BY` that will make our lives easier:
- `1` is the first column selected
- `2` is the second column selected
- `3` is the third column selected

### HAVING
`HAVING` allows you to filter which groups to include and which to exclude.
The `HAVING` statement always comes after `GROUP BY`, but before `ORDER BY` and `LIMIT`.

## Chapter 3: Working With Multiple Tables
In order to store data efficiently, we often spread related information across multiple tables. For example:
- `orders` would contain just the information necessary to describe what was ordered:
  - order_id, customer_id, subscription_id, purchase_date
- `subscriptions` would contain the information to describe each type of subscription:
  - subscription_id, description, price_per_month, subscription_length
- `customers` would contain the information for each customer:
  - customer_id, customer_name, address

Reviews:
- `JOIN` will combine rows from different tables if the join condition is true.
- `LEFT JOIN` will return every row in the left table, and if the join condition is not met, NULL values are used to fill in the columns from the right table.
- A primary key is a column that serves as a unique identifier for the rows in the table.
- A foreign key is a column that contains the primary key of another table.
- `CROSS JOIN` lets us combine all rows of one table with all rows of another table.
- `UNION` stacks one dataset on top of another.
- `WITH` allows us to define one or more temporary tables that can be used in the final query.

### JOIN: `SELECT...FROM...JOIN...ON...;`
Because column names are often repeated across multiple tables, we use the syntax `table_name.column_name` to be sure that our requests for columns are unambiguous.
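
As a minimal sketch of the `table_name.column_name` syntax, assuming Python's built-in `sqlite3` module and cut-down versions of the `orders` and `customers` tables (only two columns each, with invented rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1), (13, 3);
""")

# customer_id exists in both tables, so every column is qualified.
joined = conn.execute("""
    SELECT orders.order_id, customers.customer_name
    FROM orders
    JOIN customers
      ON orders.customer_id = customers.customer_id
    ORDER BY orders.order_id
""").fetchall()

print(joined)
```

Order 13 points at a customer that does not exist, so the join condition is never true for it and the row does not appear in the result.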
We use this syntax in the `ON` statement, and also in `SELECT` or any other statement where we refer to column names.

### INNER JOIN: `SELECT...FROM...INNER JOIN...ON...;`
When we perform a simple `JOIN` (often called an `inner join`), our result only includes rows that match our `ON` condition.

### LEFT JOIN: `SELECT...FROM...LEFT JOIN...ON...;`
A `left join` will keep all rows from the first table, regardless of whether there is a matching row in the second table.
The final result keeps every row of the left table and omits the unmatched rows of the right table; where a left row has no match, the right table's columns are filled with NULL.

### PRIMARY KEY vs FOREIGN KEY
Each of these tables has a column that uniquely identifies each row of that table, for example:
- `order_id` for `orders`
- `subscription_id` for `subscriptions`
- `customer_id` for `customers`

These special columns are called primary keys. Primary keys have a few requirements:
- None of the values can be NULL.
- Each value must be unique (i.e., you can't have two customers with the same customer_id in the customers table).
- A table cannot have more than one primary key (although a primary key may span several columns).

When the primary key for one table appears in a different table, it is called a foreign key.
- So `customer_id` is a primary key when it appears in `customers`, but a foreign key when it appears in `orders`.

The most common types of joins will be joining a foreign key from one table with the primary key from another table. For instance:
- When we join `orders` and `customers`, we join on `customer_id`, which is a foreign key in `orders` and the primary key in `customers`.

### CROSS JOIN: `SELECT...FROM...CROSS JOIN...;`
A cross join combines all rows of one table with all rows of another table.
Notice that `cross joins` don't require an `ON` statement. You're not really joining on any columns!
Suppose we have 3 different shirts (white, grey, and olive) and 2 different pants (light denim and black). A cross join returns every pairing: `3 shirts × 2 pants = 6 combinations`
- A more common usage of `CROSS JOIN` is when we need to compare each row of a table to a list of values.

### UNION: `SELECT...FROM...UNION SELECT...FROM...;`
Sometimes we just want to stack one dataset on top of the other. The `UNION` operator allows us to do that.
SQL has strict rules for appending data:
- Tables must have the same number of columns.
- The columns must have the same data types in the same order as the first table.

### WITH: `WITH...AS(SELECT.......)SELECT...FROM...JOIN...ON...;`
Often we want to combine two tables, but one of the tables is the result of another calculation.
- The `WITH` statement allows us to perform a separate query (such as aggregating customers' subscriptions).
- `previous_results` is the alias that we will use to reference any columns from the query inside of the `WITH` clause.
- We can then go on to do whatever we want with this temporary table (such as joining it with another table).

Essentially, we are putting a whole first query inside the parentheses () and giving it a name. After that, we can use this name as if it were a table and write a new query using the first query.
- `WITH previous_results_name[a temporary table using aggregation] AS(SELECT.....)
SELECT ....FROM...JOIN...ON ...=...;`

--------------------------------------------------------------------------------
/SQL Server/README.md:
--------------------------------------------------------------------------------
# Introduction to SQL Server
- SQL Server is a relational database management system (RDBMS).
- It uses SQL as the standard programming language for interacting with the database.
- Transact-SQL, also known as T-SQL, adds a set of built-in programming constructs that are proprietary to SQL Server.

**History (major version changes)**


**SQL Server architecture (see figure)**


## Two main components
As the figure above shows, SQL Server has two main components:
- **Database Engine**
  - **Relational Engine (RE)**
    - Function: executes queries.
    - Description: requests data from the SE based on the query and processes the results.
    - Note: the RE includes query processing, memory management, thread and task management, buffer management, and distributed query processing.
  - **Storage Engine (SE)**
    - Function: stores and retrieves data from storage systems such as disks and SANs.
- **SQLOS (SQL Server Operating System)**
  - SQLOS provides many operating system services, such as memory and I/O management. Other services include exception handling and synchronization services.

## SQL Server services and tools
- For Data Management
  - SQL Server Integration Services (SSIS)
  - SQL Server Data Quality Services
  - SQL Server Master Data Services
- For developing Databases
  - SQL Server Data Tools
- For managing, deploying and monitoring Databases
  - **SQL Server Management Studio (SSMS)**
- For Data Analysis
  - SQL Server Analysis Services (SSAS)
- For Data Visualization
  - SQL Server Reporting Services (SSRS)
- For Machine Learning (Based on R)
  - ML Services (after the version of SQL Server 2016)

## SQL Server editions
When I first started, the number of editions left me thoroughly confused. The paid editions are surely the best, but which one should you choose just for practice?
- Free
  - SQL Server Developer edition: for database development and testing.
  - SQL Server Express: for small databases with up to 10 GB of disk storage.
- Paid
  - SQL Server Standard edition: has a subset of the Enterprise edition's features, and limits the number of processor cores and the amount of memory that can be configured on the server.
  - SQL Server Enterprise: includes all SQL Server features and capabilities.

# Supplementary SQL syntax
### OFFSET FETCH
- Function: skips a specified number of rows, then returns the specified number of rows that follow.
- Note: `OFFSET` and `FETCH` must be used together with `ORDER BY`; otherwise an error is raised.
- Remark: the `OFFSET` and `FETCH` clauses are better suited than the `TOP` clause for implementing query pagination.

```
ORDER BY column_list [ASC | DESC]
OFFSET offset_row_count {ROW | ROWS}
[FETCH {FIRST | NEXT} fetch_row_count {ROW | ROWS} ONLY]
```
- The `OFFSET` clause specifies the number of rows to skip before starting to return rows from the query. The `offset_row_count` can be a constant, variable, or parameter that is greater than or equal to zero.
- The `FETCH` clause specifies the number of rows to return after the `OFFSET` clause has been processed. The `fetch_row_count` can be a constant, variable, or scalar that is greater than or equal to one.
- The `OFFSET` clause is mandatory while the `FETCH` clause is optional. Also, `FIRST` and `NEXT` are synonyms, so you can use them interchangeably. Similarly, you can use `ROW` and `ROWS` interchangeably.

### TOP
- Function: returns a specified number of rows, or a specified percentage of the rows.

```
SELECT TOP (expression) [PERCENT]
    [WITH TIES]
    select_list
FROM
    table_name
ORDER BY
    column_name;
```
- `expression`:
Following the `TOP` keyword is an expression that specifies the number of rows to be returned. The expression is evaluated to a float value if `PERCENT` is used; otherwise, it is converted to a BIGINT value.
- `PERCENT`:
The `PERCENT` keyword indicates that the query returns the first N percent of rows, where N is the result of the expression.
- `WITH TIES`:
**When several rows share the same value in the sort column, all rows with that value are returned.**
`WITH TIES` allows you to return additional rows whose values match the last row in the limited result set.
Note that `WITH TIES` may cause more rows to be returned than you specify in the expression.

### LIKE
Function: a logical operator that determines whether a character string matches a specified pattern.
Remark: commonly used in the `WHERE` clause of `SELECT`, `UPDATE`, and `DELETE` statements.
```
column | expression LIKE pattern [ESCAPE escape_character]
```
The 5 common wildcards:
- `%`: any string of zero or more characters.
- `_`: any single character.
- `[list of characters]`: any single character within the specified set.
- `[character-character]`: any single character within the specified range.
- `[^list or range]`: any single character not within the list or range.


--------------------------------------------------------------------------------
/UTM参数查询.md:
--------------------------------------------------------------------------------
# Introduction to UTM parameters
UTM stands for Urchin Tracking Module. With UTM parameters in place, you can track where the traffic of a website or campaign comes from, and what share each traffic source contributes.
Suppose we attach links carrying UTM parameters to a website or an article. We can then learn, for each official-account article, the page views (PV), unique visitors (UV), new visitors, session duration, visit depth, bounce rate, and other metrics that it brings in.

## The five UTM parameters
In detail:
- Source (utm_source): identifies the website, search engine, or other source that sends the traffic. Example: utm_source=baidu
- Medium (utm_medium): identifies the medium, such as email or cost-per-click. Example: utm_medium=cpc
- Campaign name (utm_campaign): identifies a specific product promotion campaign. Example: utm_campaign=summer_spread
- Term (utm_term): usually the keywords used in paid keyword ads, or the link name or image alt text. Example: utm_term=web+analysis
- Content (utm_content): distinguishes ads or links that point to the same URL. Example: utm_content=logolink or utm_content=textlink

## SQL in practice for web analytics

```
-- Find the UTM parameters of each user's first visit
WITH first_touch AS (
    SELECT user_id,
        MIN(timestamp) AS first_touch_at
    FROM page_visits
    GROUP BY user_id)
SELECT ft.user_id,
    ft.first_touch_at,
    pv.utm_source
FROM first_touch AS ft
JOIN page_visits AS pv
    ON ft.user_id = pv.user_id
    AND ft.first_touch_at = pv.timestamp;

-- Find the UTM parameters of each user's last visit
WITH last_touch AS (
    SELECT user_id,
        MAX(timestamp) AS last_touch_at
    FROM page_visits
    GROUP BY user_id)
SELECT lt.user_id,
    lt.last_touch_at,
    pv.utm_source
FROM last_touch AS lt
JOIN page_visits AS pv
    ON lt.user_id = pv.user_id
    AND lt.last_touch_at = pv.timestamp;
```

## Case study: marketing attribution
**Business background:** CoolTShirts is a B2C website that sells all kinds of clothing. The site recently ran several marketing campaigns to increase the number of visitors and purchases. To judge the value of each channel through marketing attribution, they want to map out the customer journey, from the first visit to the site through to a completed purchase, and to use UTM parameters to find ways of optimizing their campaigns.

Available table: **page_visits**

| column | Description|
|------------|:---------:|
| user_id | A unique identifier for each visitor to a page|
| timestamp | The time at which the visitor came to the page|
| page_name | The title of the section of the page that was visited|
| utm_source| Identifies which site sent the traffic (e.g., google, newsletter, or facebook_ad)|
|utm_campaign| Identifies the specific ad or email blast (e.g., june-21-newsletter or memorial-day-sale)|

### Questions
- 1. How many product campaigns does CoolTShirts run? How many channels (sources) are used to promote them? Which campaign runs on each channel?
- 2. Which pages are on the CoolTShirts website?
- 3. What is the customer journey?
- 4. How many first touches is each campaign responsible for? How many last touches is each campaign responsible for?
- 5. How many visitors make a purchase?
- 6. How many last touches that end in a purchase is each campaign responsible for?
- 7. If 5 product campaigns are to be relaunched, which should be chosen?

```
-- Question 1:
SELECT COUNT(DISTINCT utm_campaign) FROM page_visits;

SELECT COUNT(DISTINCT utm_source) FROM
page_visits;

SELECT DISTINCT utm_campaign,utm_source FROM
page_visits;

-- Question 2:
SELECT DISTINCT page_name FROM page_visits;


-- Question 4 (first touches per campaign):
WITH first_touch AS (
    SELECT user_id,
        MIN(timestamp) as first_touch_at
    FROM page_visits
    GROUP BY user_id),
ft_attr AS (
  SELECT ft.user_id,
         ft.first_touch_at,
         pv.utm_source,
         pv.utm_campaign
  FROM first_touch ft
  JOIN page_visits pv
    ON ft.user_id = pv.user_id
    AND ft.first_touch_at = pv.timestamp
)
SELECT ft_attr.utm_source,
       ft_attr.utm_campaign,
       COUNT(*)
FROM ft_attr
GROUP BY 1, 2
ORDER BY 3 DESC;


-- Question 4 (last touches per campaign):
WITH last_touch AS (
    SELECT user_id,
        MAX(timestamp) as last_touch_at
    FROM page_visits
    GROUP BY user_id),
lt_attr AS (
  SELECT lt.user_id,
         lt.last_touch_at,
         pv.utm_source,
         pv.utm_campaign
  FROM last_touch lt
  JOIN page_visits pv
    ON lt.user_id = pv.user_id
    AND lt.last_touch_at = pv.timestamp
)
SELECT lt_attr.utm_source,
       lt_attr.utm_campaign,
       COUNT(*)
FROM lt_attr
GROUP BY 1, 2
ORDER BY 3 DESC;

-- Question 5:
SELECT COUNT(DISTINCT user_id)
FROM page_visits
WHERE page_name = '4 - purchase';

-- Question 6 (preview: last touch on the purchase page for each user)
WITH last_touch AS (
  SELECT user_id,
         MAX(timestamp) AS last_touch_at
  FROM page_visits
  WHERE page_name = '4 - purchase'
  GROUP BY user_id
  )
SELECT * FROM last_touch LIMIT 10;


-- Question 6 (last touches on the purchase page per campaign)
WITH last_touch AS (
    SELECT user_id,
        MAX(timestamp) as last_touch_at
    FROM page_visits
    WHERE page_name = '4 - purchase'
    GROUP BY user_id),
lt_attr AS (
  SELECT lt.user_id,
         lt.last_touch_at,
         pv.utm_source,
         pv.utm_campaign
  FROM last_touch lt
  JOIN page_visits pv
    ON lt.user_id = pv.user_id
    AND lt.last_touch_at = pv.timestamp
)
SELECT lt_attr.utm_source,
       lt_attr.utm_campaign,
       COUNT(*)
FROM lt_attr
GROUP BY 1, 2
ORDER BY 3 DESC;
```



--------------------------------------------------------------------------------
/Warby+Parker(SQL+PowerBi).pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teamowu/SQL/8793043d961c446bb3547fe4da4e9a3d74a05341/Warby+Parker(SQL+PowerBi).pdf
--------------------------------------------------------------------------------
/用户流失率(Churn rate).md:
--------------------------------------------------------------------------------
# Churn Rate
- Churn rate is the percentage of subscribers who cancel within a given period (usually a month), out of all subscribers in that period (including those who cancel).
- For the user base to grow, the churn rate must be lower than the new-subscriber rate for the same period.
- To calculate the churn rate, we only consider users who are already subscribed at the beginning of the month.
- Churn rate is the number of users who cancel during the month divided by the total number of subscribers: **Cancellations / Total subscribers**

## Things to consider
- Get familiar with the company's business.
- How many months has the company been operating? Which months have enough information to calculate a churn rate?
- What user segments does the company have?
- What is the overall churn trend since the company was founded?
- Compare the churn rates between user segments.
- Based on the churn rates, which segments should the company focus on, and which marketing approaches should it use to expand each segment?

## Single-month subscription churn rate
```
-- Calculate the churn rate for any single month
-- Notes:
-- - Two temporary tables: enrollments and status.
-- - status: a temporary table derived from enrollments.
-- - enrollments: users who started subscribing before 2017 and who either canceled after being charged on January 1 or never canceled (that is, everyone subscribed for January 2017).

WITH enrollments AS
(SELECT * FROM subscriptions
WHERE subscription_start < '2017-01-01'
AND (
  (subscription_end >= '2017-01-01')
  OR (subscription_end IS NULL)
)),
-- status: CASE 1 flags users who canceled during the month (1); CASE 2 flags users who were active in the month (1)
status AS
-- CASE 1: 0 if the subscription ends after January or never ends, otherwise 1
(SELECT
CASE
  WHEN (subscription_end > '2017-01-31')
    OR (subscription_end IS NULL) THEN 0
  ELSE 1
  END as is_canceled,
-- CASE 2: 1 if the subscription started before January and ended on or after January 1 (or never ended), otherwise 0
CASE WHEN subscription_start < '2017-01-01'
  AND (
    (subscription_end >= '2017-01-01')
    OR (subscription_end IS NULL)
  ) THEN 1
  ELSE 0
  END as is_active
FROM enrollments
)
SELECT 1.0 * SUM(is_canceled)/
SUM(is_active)
FROM status;
```

## Multi-month subscription churn rate
```
-- Calculate the subscription churn rate for three months
-- Notes:
-- - Four temporary tables: months, cross_join, status, status_aggregate.
-- - months: stacks (UNION) the first and last day of each of the n months to be queried (3 rows x 2 columns).
-- - cross_join: cross joins months with subscriptions, the table holding the subscription data (1000 x 3), giving a 3000 x 5 table.
-- - status: decides, for each user and month in cross_join, whether the user was active and whether the user canceled.
-- - status_aggregate: sums status to get the total active users and total cancellations for each month.

WITH months AS (
  SELECT
    '2017-01-01' AS first_day,
    '2017-01-31' AS last_day
  UNION
  SELECT
    '2017-02-01' AS first_day,
    '2017-02-28' AS last_day
  UNION
  SELECT
    '2017-03-01' AS first_day,
    '2017-03-31' AS last_day
),
cross_join AS (
  SELECT *
  FROM subscriptions
  CROSS JOIN months
),
status AS (
  SELECT
    id,
    first_day AS month,
    CASE
      WHEN (subscription_start < first_day)
        AND (
          subscription_end > first_day
          OR subscription_end IS NULL
        ) THEN 1
      ELSE 0
    END AS is_active,
    CASE
      WHEN subscription_end BETWEEN first_day AND last_day THEN 1
      ELSE 0
    END AS is_canceled
  FROM cross_join
),
status_aggregate AS (
  SELECT
    month,
    SUM(is_active) AS active,
    SUM(is_canceled) AS canceled
  FROM status
  GROUP BY month
)
SELECT
  month,
  1.0 * canceled / active AS churn_rate
FROM status_aggregate;
```

--------------------------------------------------------------------------------
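
The multi-month logic can be sanity-checked by hand on a tiny table. The following is a sketch only, assuming Python's built-in `sqlite3` module and five invented subscribers; it covers two months instead of three and folds the `status_aggregate` step into the final `SELECT`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriptions "
    "(id INTEGER, subscription_start TEXT, subscription_end TEXT)"
)
# Hypothetical subscribers; None means the subscription never ended.
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?)",
    [(1, "2016-12-01", "2017-01-15"),
     (2, "2016-12-10", None),
     (3, "2016-12-20", "2017-02-10"),
     (4, "2017-01-05", None),
     (5, "2016-11-01", "2017-02-20")],
)

churn = conn.execute("""
WITH months AS (
  SELECT '2017-01-01' AS first_day, '2017-01-31' AS last_day
  UNION
  SELECT '2017-02-01' AS first_day, '2017-02-28' AS last_day
),
cross_join AS (
  SELECT * FROM subscriptions CROSS JOIN months
),
status AS (
  SELECT id, first_day AS month,
    CASE WHEN subscription_start < first_day
          AND (subscription_end > first_day OR subscription_end IS NULL)
         THEN 1 ELSE 0 END AS is_active,
    CASE WHEN subscription_end BETWEEN first_day AND last_day
         THEN 1 ELSE 0 END AS is_canceled
  FROM cross_join
)
SELECT month, 1.0 * SUM(is_canceled) / SUM(is_active) AS churn_rate
FROM status
GROUP BY month
ORDER BY month
""").fetchall()

print(churn)
```

In January, subscribers 1, 2, 3 and 5 are active and only subscriber 1 cancels (1/4). In February, subscribers 2, 3, 4 and 5 are active and subscribers 3 and 5 cancel (2/4).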