├── README.md
└── SQL sample file

/README.md:
--------------------------------------------------------------------------------
1 | # SQL-Sample-Queries
2 | Here is where I will keep logs of what I am learning!
3 |
4 | # AVERAGE TEMPERATURE
5 | SELECT AVG(temperature)
6 | FROM `skillful-coast-340323.demos.weather_nyc`
7 | WHERE date BETWEEN '2020-06-01' AND '2020-06-30'

8 |
9 | # ALIAS
10 | Basic format for an AS query:
11 | SELECT column_name(s)
12 | FROM table_name AS alias_name;

13 |   Notice that AS is preceded by the table name and followed by the new nickname. It is a similar approach to aliasing a column:

14 | SELECT column_name AS alias_name
15 | FROM table_name;

16 |
17 | If using AS results in an error when running a query because the SQL database you are working with doesn't support it, you can leave it out. In the previous examples, the alternate syntax for aliasing a table or column would be:
18 |
19 | FROM table_name alias_name
20 |
21 | SELECT column_name alias_name
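
A small sketch to try both alias forms without a BigQuery project. It assumes Python's built-in sqlite3 module as a stand-in database; the `weather` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in database; table and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (station TEXT, temperature REAL)")
conn.executemany("INSERT INTO weather VALUES (?, ?)", [("JFK", 71.5), ("LGA", 73.0)])

# Table alias (w) and column alias (station_name) written with AS.
with_as = conn.execute(
    "SELECT w.station AS station_name FROM weather AS w ORDER BY w.station"
)
with_as_cols = [d[0] for d in with_as.description]
with_as_rows = with_as.fetchall()

# The same aliases with AS left out.
without_as = conn.execute(
    "SELECT w.station station_name FROM weather w ORDER BY w.station"
)
without_as_cols = [d[0] for d in without_as.description]
without_as_rows = without_as.fetchall()

print(with_as_cols, with_as_rows)
print(without_as_cols, without_as_rows)
```

Both forms should give the same column name and the same rows.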
22 |
23 | # Arithmetic
24 | SQL can perform arithmetic for you. Use +, -, *, and / between the columns or numbers you want to calculate with in the SELECT list. Spaces around the operators are optional.
25 | SELECT
26 | station_name,
27 | ridership_2013,
28 | ridership_2014,
29 | ridership_2014-ridership_2013 AS change_2014_raw
30 | FROM `bigquery-public-data.new_york_subway.subway_ridership_2013_present`

31 |
32 | SELECT
33 | station_name,
34 | ridership_2013,
35 | ridership_2014,
36 | ridership_2015,
37 | ridership_2016,
38 | (ridership_2013+ridership_2014+ridership_2015+ridership_2016) / 4 AS average
39 | FROM `bigquery-public-data.new_york_subway.subway_ridership_2013_present`


40 |
41 |
42 | You can also calculate percentages within your data:
43 | SELECT
44 |   Region,
45 |   Small_bags,
46 |   Total_bags,
47 |   (Small_bags / Total_bags) * 100 AS small_bags_percent
48 | FROM avocado_data.avocado_prices
49 | WHERE Total_bags <> 0


50 |
51 | You can also use != 0 in place of <> 0, or use the SAFE_DIVIDE function, which returns NULL instead of an error when the divisor is 0.
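
The two ways of guarding against division by zero can be compared in a quick sketch. This assumes Python's built-in sqlite3 module as a stand-in for BigQuery; SQLite has no SAFE_DIVIDE, so NULLIF is swapped in to get the same "NULL instead of an error" effect. The table contents are made up.

```python
import sqlite3

# In-memory stand-in database; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE avocado_prices (region TEXT, small_bags REAL, total_bags REAL)"
)
conn.executemany(
    "INSERT INTO avocado_prices VALUES (?, ?, ?)",
    [("West", 25.0, 100.0), ("East", 0.0, 0.0)],
)

# Option 1: filter out the rows whose divisor is zero.
filtered = conn.execute(
    "SELECT region, (small_bags / total_bags) * 100 AS small_bags_percent "
    "FROM avocado_prices WHERE total_bags <> 0"
).fetchall()

# Option 2: keep every row; NULLIF turns a zero divisor into NULL,
# so the percentage for that row comes back as NULL (None in Python).
kept = conn.execute(
    "SELECT region, (small_bags / NULLIF(total_bags, 0)) * 100 AS small_bags_percent "
    "FROM avocado_prices ORDER BY region"
).fetchall()

print(filtered)
print(kept)
```

The first option drops the zero-total row; the second keeps it with an empty percentage.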
52 |
53 |
54 | # BETWEEN
55 | SELECT
56 |   Date, purchase_price
57 | FROM customer_data.purchases
58 | WHERE
59 |   Date BETWEEN '2020-12-01' AND '2020-12-20'
60 |
61 | # CAST
62 | SELECT
63 |   CAST(purchase_price AS FLOAT64)
64 | FROM customer.data_purchase
65 | ORDER BY CAST(purchase_price AS FLOAT64) DESC
66 |
67 | SELECT
68 |   CAST(date AS date) AS date_only
69 | FROM customer_data.purchases
70 | The above statement changes how SQL recognizes the dates, from datetime (2020-12-12T00:00:00) to date only (2020-12-12).
71 |
72 | # CAST
73 | CAST(expression AS typename), where expression is the data to be converted and typename is the data type to be returned.
74 |
75 | ## Converting a number to a string:
76 | SELECT CAST(MyCount AS STRING) FROM MyTable
77 | In the above SQL statement, the following occurs: SELECT indicates that you will be selecting data from a table. CAST indicates that you will be converting the data you select to a different data type. AS comes before and identifies the data type you are casting to. STRING indicates that you are converting the data to a string. FROM indicates which table you are selecting the data from.
78 | ## Converting a string to a number:
79 | SELECT CAST(MyVarcharCol AS INT) FROM MyTable
80 | CAST indicates that you will be converting the data you select to a different data type. AS comes before and identifies the data type you are casting to. INT indicates that you are converting the data to an integer.
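
Why the cast matters shows up when sorting. The sketch below assumes Python's built-in sqlite3 module as a stand-in for BigQuery, so the type name is REAL rather than FLOAT64; the `purchases` table and prices are made up.

```python
import sqlite3

# In-memory stand-in database; prices are stored as text on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (purchase_price TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?)", [("9.5",), ("10.25",), ("100",)]
)

# Sorting the raw text compares character by character, so '9.5' sorts highest.
as_text = [
    row[0]
    for row in conn.execute(
        "SELECT purchase_price FROM purchases ORDER BY purchase_price DESC"
    )
]

# Casting to a number first gives the numeric order.
as_number = [
    row[0]
    for row in conn.execute(
        "SELECT CAST(purchase_price AS REAL) AS price FROM purchases ORDER BY price DESC"
    )
]

print(as_text)
print(as_number)
```

The text sort puts '9.5' first, while the numeric sort puts 100.0 first.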
81 | ## Converting a date to a string:
82 | SELECT CAST(MyDate AS STRING) FROM MyTable
83 | ## Converting a date to a datetime: datetime values have the format YYYY-MM-DD hh:mm:ss
84 | SELECT CAST(MyDate AS DATETIME) FROM MyTable
85 | ## The SAFE_CAST function: a CAST that fails returns an error in BigQuery. SAFE_CAST returns NULL instead of an error.
86 | SELECT SAFE_CAST(MyDate AS STRING) FROM MyTable
87 |
88 | # CONCAT
89 | SELECT
90 |   CONCAT(product_code, product_color) AS new_product_code
91 | FROM customer_purchase.data
92 | WHERE
93 |   Product = 'couch'


94 |
95 | SELECT usertype,
96 |   CONCAT(start_station_name, " to ", end_station_name) AS route,
97 |   COUNT(*) AS num_trips,
98 |   ROUND(AVG(CAST(tripduration AS INT64) / 60), 2) AS duration
99 | FROM big-query-public-data.ny
100 | GROUP BY start_station_name, end_station_name, usertype
101 | ORDER BY num_trips DESC
102 | LIMIT 10
103 | Reminder: Make sure to use a backtick (`) instead of an apostrophe (') around the table name in the FROM statement.
104 | About: ROUND(AVG(CAST(tripduration AS INT64) / 60), 2) AS duration
105 | BigQuery stores integers as 64-bit values, which is why there is a 64 after INT in this case. We divide tripduration by the number of seconds in a minute (60) to get minutes, and tell ROUND how far we want it to round: two decimal places (2).
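
A scaled-down version of the route query can be run locally as a sketch. It assumes Python's built-in sqlite3 module as a stand-in for BigQuery, so `||` replaces CONCAT, INTEGER replaces INT64, and the division uses 60.0 because SQLite would otherwise do integer division. The `trips` table and its rows are made up.

```python
import sqlite3

# In-memory stand-in database; tripduration is stored as text, in seconds.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (usertype TEXT, start_station_name TEXT, "
    "end_station_name TEXT, tripduration TEXT)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        ("Subscriber", "A", "B", "600"),
        ("Subscriber", "A", "B", "750"),
        ("Customer", "A", "C", "100"),
    ],
)

rows = conn.execute(
    "SELECT usertype, "
    "  start_station_name || ' to ' || end_station_name AS route, "
    "  COUNT(*) AS num_trips, "
    "  ROUND(AVG(CAST(tripduration AS INTEGER) / 60.0), 2) AS duration "
    "FROM trips "
    "GROUP BY start_station_name, end_station_name, usertype "
    "ORDER BY num_trips DESC "
    "LIMIT 10"
).fetchall()

for row in rows:
    print(row)
```

Each output row is one route per user type, with the trip count and the average duration in minutes.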
106 | # COALESCE
107 | SELECT
108 |   COALESCE(product, product_code) AS product_info
109 | FROM customer_data.purchase
110 | COALESCE returns the first non-null value in its list of arguments.
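
A quick sketch of how COALESCE picks a value, assuming Python's built-in sqlite3 module as a stand-in database. The `purchase` rows are made up.

```python
import sqlite3

# In-memory stand-in database; None becomes NULL in the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (product TEXT, product_code TEXT)")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?)",
    [("couch", "C-1"), (None, "L-7"), (None, None)],
)

# product is used when present, product_code is the fallback,
# and the result is NULL only when both are NULL.
rows = conn.execute(
    "SELECT COALESCE(product, product_code) AS product_info "
    "FROM purchase ORDER BY rowid"
).fetchall()
print(rows)
```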
111 |
112 | # COUNT / COUNT DISTINCT
113 | COUNT returns the number of rows in a specified range. COUNT DISTINCT does the same, but it does not count repeated values. Use it in the SELECT list.
114 | SELECT
115 |   COUNT(warehouse.state) AS num_states
116 | warehouse.state is the column being counted, and num_states is the new column you are creating to hold the count.
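
The difference between the counting forms is easy to see on a tiny table. This sketch assumes Python's built-in sqlite3 module as a stand-in database, with a made-up `warehouse` table.

```python
import sqlite3

# In-memory stand-in database; one state repeats and one row has no state.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse (state TEXT)")
conn.executemany(
    "INSERT INTO warehouse VALUES (?)", [("CA",), ("CA",), ("NY",), (None,)]
)

all_rows, non_null_states, distinct_states = conn.execute(
    "SELECT COUNT(*), COUNT(state), COUNT(DISTINCT state) FROM warehouse"
).fetchone()

# COUNT(*) counts every row, COUNT(state) skips NULLs,
# and COUNT(DISTINCT state) also collapses the repeated 'CA'.
print(all_rows, non_null_states, distinct_states)
```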
117 |
118 |
119 | # CASE
120 | The CASE statement goes through one or more conditions and returns a value as soon as a condition is met.
121 | SELECT
122 |   Customer_id,
123 |   CASE
124 |     WHEN first_name = 'Tnoy' THEN 'TONY'
125 |     ELSE first_name
126 |   END AS cleaned_name
127 | FROM customer.data
128 |
129 | # DISTINCT
130 | SELECT DISTINCT fuel_type
131 | FROM cars.car_info;
132 | SELECT DISTINCT name
133 | FROM playlist
134 | ORDER BY playlist_id
135 |
136 | # EXTRACT
137 | Use EXTRACT when you want only one part of a column's value, such as the year from a date that includes more than the year. This seems useful when you aren't planning on cleaning or manipulating your data before using it.
138 | SELECT
139 |   EXTRACT (YEAR FROM STARTTIME) AS year,
140 |   COUNT (*) AS number_of_rides
141 | FROM
142 |   Address.address
143 | GROUP BY Year
144 | ORDER BY year
145 |
146 | SELECT
147 |   ProductId,
148 |   SUM (Quantity) AS unitssold,
149 |   ROUND (MAX (UnitPrice), 2) AS UnitPrice,
150 |   EXTRACT (YEAR FROM DATE) AS year,
151 |   EXTRACT (MONTH FROM DATE) AS month
152 | FROM `skillful-coast-340323.sales.sales`
153 | GROUP BY
154 |   year, month, ProductId
155 | ORDER BY
156 |   year, month, ProductId
157 | LIMIT 1000
158 | UnitPrice seems to need the ROUND(MAX(..), #) wrapper because you cannot select it on its own: "SELECT list expression references column UnitPrice which is neither grouped nor aggregated at [4:5]". Trying to select Quantity on its own as a column got "SELECT list expression references column Quantity which is neither grouped nor aggregated at [3:5]". Every selected column has to be either listed in GROUP BY or wrapped in an aggregate such as SUM or MAX.
159 |
160 |
161 | # JOIN
162 | General join syntax:
163 | SELECT
164 |   -- table columns are inserted here
165 |   table_name1.column_name,
166 |   table_name2.column_name
167 | FROM
168 |   table_name1
169 | JOIN
170 |   table_name2
171 | ON table_name1.column_name = table_name2.column_name
172 | (The column name is the key, whether primary key or foreign key, that the two tables share in common.)
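
The general syntax can be tried on two tiny tables. This sketch assumes Python's built-in sqlite3 module as a stand-in database; the `customers` and `orders` tables, their columns, and their rows are made up. It runs the same join as INNER JOIN and as LEFT JOIN to show what happens to a customer with no matching order.

```python
import sqlite3

# In-memory stand-in database; Ben has no order on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, product_id TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Ben")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, "P-1")])

# INNER JOIN keeps only the rows where the shared key matches in both tables.
inner_rows = conn.execute(
    "SELECT customers.customer_name, orders.product_id "
    "FROM customers "
    "INNER JOIN orders ON customers.customer_id = orders.customer_id "
    "ORDER BY customers.customer_id"
).fetchall()

# LEFT JOIN keeps every row from the left table and fills the gaps with NULL.
left_rows = conn.execute(
    "SELECT customers.customer_name, orders.product_id "
    "FROM customers "
    "LEFT JOIN orders ON customers.customer_id = orders.customer_id "
    "ORDER BY customers.customer_id"
).fetchall()

print(inner_rows)
print(left_rows)
```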

173 | SELECT
174 |   Customers.customer_name,
175 |   Orders.product_id,
176 |   Orders.ship_date
177 | FROM
178 |   Customers
179 | INNER JOIN
180 |   Orders
181 | ON customers.customer_id = orders.customer_id
182 |
183 |
184 | SELECT
185 |   employees.name AS employee_name,
186 |   employees.role AS employee_role,
187 |   departments.name AS department_name
188 | FROM employee_data.employees
189 | INNER JOIN
190 |   employee_data.departments ON
191 |   employees.department_id = departments.department_id
192 |
193 | SELECT
194 |   employees.name AS employee_name,
195 |   employees.role AS employee_role,
196 |   departments.name AS department_name
197 | FROM employee_data.employees
198 | FULL OUTER JOIN
199 |   employee_data.departments ON
200 |   employees.department_id = departments.department_id

201 |
202 | SELECT
203 |  `bigquery-public-data.world_bank_intl_education.international_education`.country_name,
204 |   `bigquery-public-data.world_bank_intl_education.country_summary`.country_code,
205 |   `bigquery-public-data.world_bank_intl_education.international_education`.value,
206 |   `bigquery-public-data.world_bank_intl_education.country_summary`.short_name,
207 | FROM
208 |   `bigquery-public-data.world_bank_intl_education.international_education`
209 | INNER JOIN
210 |   `bigquery-public-data.world_bank_intl_education.country_summary`
211 | ON `bigquery-public-data.world_bank_intl_education.country_summary`.country_code = `bigquery-public-data.world_bank_intl_education.international_education`.country_code

212 | The same kind of query, using aliases to clean it up:

213 |
214 | SELECT
215 |   edu.country_name,
216 |   summary.country_code,
217 |   edu.value
218 | FROM
219 |   `bigquery-public-data.world_bank_intl_education.international_education` AS edu
220 | INNER JOIN
221 |   `bigquery-public-data.world_bank_intl_education.country_summary` AS summary
222 | ON edu.country_code = summary.country_code

223 |
224 | SELECT
225 |   seasons.market AS university,
226 |   seasons.name AS team_name,
227 |   seasons.wins,
228 |   seasons.losses,
229 |   seasons.ties,
230 |   mascots.mascot AS team_mascot
231 | FROM
232 |   `bigquery-public-data.ncaa_basketball.mbb_historical_teams_seasons` AS seasons
233 | LEFT JOIN
234 |   `bigquery-public-data.ncaa_basketball.mascots` AS mascots
235 | ON
236 |   seasons.team_id = mascots.id
237 | WHERE
238 |   seasons.season = 1984
239 | AND seasons.division = 1
240 | ORDER BY
241 |   seasons.market
242 |
243 | # LENGTH
244 | SELECT LENGTH(title) AS letters_in_title, album_id
245 | FROM album
246 | WHERE LENGTH(title) < 4
247 | The condition LENGTH(title) < 4 returns any albums whose names are less than 4 characters long. The filter repeats LENGTH(title) because a column alias defined in SELECT generally cannot be referenced in WHERE. A simpler complete query is SELECT * FROM album WHERE LENGTH(title) < 4. The LENGTH function counts the number of characters a string contains.
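
A short sketch of filtering on LENGTH, assuming Python's built-in sqlite3 module as a stand-in database and a made-up `album` table. The WHERE clause repeats LENGTH(title) rather than using the alias, which is the form that also works in databases that do not accept aliases in WHERE.

```python
import sqlite3

# In-memory stand-in database; titles are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (album_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO album VALUES (?, ?)", [(1, "Up"), (2, "Thriller"), (3, "IV")]
)

# Keep only the albums whose title has fewer than 4 characters.
rows = conn.execute(
    "SELECT album_id, LENGTH(title) AS letters_in_title "
    "FROM album WHERE LENGTH(title) < 4 ORDER BY album_id"
).fetchall()
print(rows)
```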
248 | TRIM
249 |
250 | # MIN/MAX
251 | SELECT
252 | MIN(length) AS min_length,
253 | MAX(length) AS max_length
254 | FROM cars.car_info;
255 |
256 | # Modulo
257 | An operator (%) that returns the remainder when one number is divided by another.
258 |
259 |
260 | # Order By
261 | SELECT *
262 | FROM movie_data.movies
263 | ORDER BY Release_date DESC


264 |
265 | SELECT *
266 | FROM movies.data
267 | WHERE Genre = 'Comedy'
268 | AND Revenue > 30000000
269 | ORDER BY Release_date DESC


270 |
271 | SELECT total
272 | FROM invoice
273 | WHERE billing_city = "Chicago"
274 | ORDER BY total ASC


275 |
276 |
277 | SELECT County_of_Residence
278 | FROM `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
279 | ORDER BY Births ASC
280 | LIMIT 10


281 |
282 | SELECT County_of_Residence
283 | FROM `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
284 | WHERE year = '2018-01-01'
285 | ORDER BY Births DESC
286 | LIMIT 10
287 | The year had to be in ' ' for it to work; since the column is in date format, it would not work as simply 2018.


288 |
289 |
290 | SELECT
291 | The meteorologists who you’re working with have asked you to get the temperature, wind speed, and precipitation for stations La Guardia and JFK, for every day in 2020, in descending order by date, and ascending order by Station ID. Use the following query to request this information:
292 | SELECT stn, date,
293 | IF(temp=9999.9, NULL, temp) AS temperature,
294 | IF(wdsp="999.9", NULL, CAST(wdsp AS Float64)) AS wind_speed,
295 | IF(prcp=99.99, 0, prcp) AS precipitation
296 | FROM `bigquery-public-data.noaa_gsod.gsod2020`
297 | WHERE stn="725030" -- La Guardia
298 | OR stn="744860" -- JFK
299 | ORDER BY date DESC, stn ASC
300 | -- Use the IF function to replace 9999.9 values, which the dataset description explains is the default value when temperature is missing, with NULLs instead.
301 | -- Use the IF function to replace 999.9 values, which the dataset description explains is the default value when wind speed is missing, with NULLs instead. -- Use the IF function to replace 99.99 values, which the dataset description explains is the default value when precipitation is missing, with 0 instead.
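
The sentinel-value cleanup can be tried on a few made-up rows. This sketch assumes Python's built-in sqlite3 module as a stand-in for BigQuery, so CASE WHEN is swapped in for the IF function and REAL for FLOAT64; the `gsod` table and its values are hypothetical.

```python
import sqlite3

# In-memory stand-in database; wdsp is stored as text, like in the query above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gsod (stn TEXT, date TEXT, temp REAL, wdsp TEXT, prcp REAL)"
)
conn.executemany(
    "INSERT INTO gsod VALUES (?, ?, ?, ?, ?)",
    [
        ("725030", "2020-01-02", 9999.9, "999.9", 99.99),  # all values missing
        ("744860", "2020-01-02", 40.1, "8.5", 0.2),
        ("725030", "2020-01-01", 38.0, "5.0", 0.0),
    ],
)

rows = conn.execute(
    "SELECT stn, date, "
    "  CASE WHEN temp = 9999.9 THEN NULL ELSE temp END AS temperature, "
    "  CASE WHEN wdsp = '999.9' THEN NULL ELSE CAST(wdsp AS REAL) END AS wind_speed, "
    "  CASE WHEN prcp = 99.99 THEN 0 ELSE prcp END AS precipitation "
    "FROM gsod "
    "WHERE stn = '725030' OR stn = '744860' "
    "ORDER BY date DESC, stn ASC"
).fetchall()

for row in rows:
    print(row)
```

The first row comes back with empty temperature and wind speed and a precipitation of 0, while the other rows keep their measured values.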
302 |
303 | # SUBSTR
304 | SELECT customer_id,
305 |   SUBSTR(country, 1, 3) AS new_country
306 | FROM customer
307 | ORDER BY country
308 | The statement SUBSTR(country, 1, 3) AS new_country retrieves the first 3 letters of each country name and stores the result in a new column named new_country. The SUBSTR function extracts a substring from a string: here it instructs the database to return 3 characters of each country, starting with the first character.

309 | SELECT Invoice_id,
310 | SUBSTR(billing_city,1,4) AS new_city
311 | FROM invoice
312 | ORDER BY billing_city
313 | billing_city = the column, 1 = starting position in the string, 4 = how many characters to return
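
A quick check of the three SUBSTR arguments, assuming Python's built-in sqlite3 module as a stand-in database and a made-up `invoice` table.

```python
import sqlite3

# In-memory stand-in database; cities are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER, billing_city TEXT)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)", [(1, "Chicago"), (2, "Boston")]
)

# Start at character 1 and return 4 characters of each city name.
rows = conn.execute(
    "SELECT invoice_id, SUBSTR(billing_city, 1, 4) AS new_city "
    "FROM invoice ORDER BY billing_city"
).fetchall()
print(rows)
```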
314 |
315 | # UPDATE
316 | UPDATE cars.car_info
317 | SET num_of_doors = "four"
318 | WHERE make = "dodge" AND fuel_type = "gas" AND body_style = "sedan";
319 |
320 | # WHERE
321 | SELECT *
322 | FROM cars.car_info
323 | WHERE num_of_doors IS NULL;
324 |
325 | SELECT *
326 | FROM movies.data
327 | WHERE Genre = 'Comedy'
328 | Because the genre is a string, you need to put ' ' around the string value. Capitalization matters.

329 |
330 | SELECT CustomerId
331 | FROM invoices
332 | WHERE BillingCountry = 'Germany' AND Total > 5
333 |
334 | # WITH
335 | WITH allows you to create temporary tables and query from them right within your query. You can also use SELECT INTO and CREATE TEMP TABLE.
336 | WITH trips_over_one_hour AS (
337 | SELECT *
338 | FROM `bigquery-public-data.new_york_citibike.citibike_trips`
339 |   WHERE tripduration >= 3600 -- tripduration is in seconds, so one hour is 3600
340 | )
341 | SELECT COUNT (*) AS cnt
342 | FROM trips_over_one_hour
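
The WITH pattern can be tried on a few made-up rows. This sketch assumes Python's built-in sqlite3 module as a stand-in database and a hypothetical `citibike_trips` table. It also assumes tripduration is stored in seconds (as the division by 60 in the CONCAT section suggests), so one hour is 3600.

```python
import sqlite3

# In-memory stand-in database; durations are in seconds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citibike_trips (bikeid INTEGER, tripduration INTEGER)")
conn.executemany(
    "INSERT INTO citibike_trips VALUES (?, ?)",
    [(1, 1800), (2, 3600), (3, 5400)],
)

# The WITH clause names a temporary result that the main query then reads from.
(cnt,) = conn.execute(
    "WITH trips_over_one_hour AS ( "
    "  SELECT * FROM citibike_trips WHERE tripduration >= 3600 "
    ") "
    "SELECT COUNT(*) AS cnt FROM trips_over_one_hour"
).fetchone()
print(cnt)
```

The temporary table only exists for the duration of this one query.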

343 |
344 | To get this one to work (I was writing it myself), I had to remove the ' ' from around the table name in FROM and make sure I had my commas in the right place. It took me a long time.
345 | WITH
346 |   longest_used_bike AS (
347 |    SELECT
348 |     bikeid,
349 |     SUM(duration_minutes) AS trip_duration
350 |    FROM
351 |     bigquery-public-data.austin_bikeshare.bikeshare_trips
352 |    GROUP BY
353 |     bikeid
354 |    ORDER BY
355 |     trip_duration DESC
356 |    LIMIT 1
357 |   )
358 |
359 | ##_find station at which longest bike leaves most often
360 | SELECT
361 |   trips.start_station_id,
362 |   COUNT(*) AS trip_ct
363 | FROM
364 |   longest_used_bike AS longest
365 | INNER JOIN
366 |   bigquery-public-data.austin_bikeshare.bikeshare_trips AS trips
367 | ON longest.bikeid = trips.bikeid
368 | GROUP BY
369 |   trips.start_station_id
370 | ORDER BY
371 |   trip_ct DESC
372 |   LIMIT 1
373 | 374 | 375 | 376 | 377 | -------------------------------------------------------------------------------- /SQL sample file: -------------------------------------------------------------------------------- 1 | # AVERAGE TEMPERATURE 2 | SELECT AVG(temperature) 3 | FROM `skillful-coast-340323.demos.weather_nyc` 4 | WHERE date BETWEEN '2020-06-01' AND '2020-06-30' 5 | 6 | 7 | # BETWEEN 8 | SELECT 9 | Date, purchase_price 10 | FROM customer_data.purchases 11 | WHERE 12 | Date BETWEEN ‘2020-12-01’ AND ‘2020-12-20’ 13 | 14 | # CAST 15 | SELECT 16 | CAST(purchase_price AS FLOAT64) 17 | FROM customer.data_purchase 18 | ORDER BY purchase_price DESC 19 | 20 | #SELECT 21 | CAST(date AS date) AS date_only 22 | FROM customer_data.purchases 23 | The above statement changes SQL from recognizing the dates as datetime (2020-12-12T0:00:00) to only date (2020-12-12) 24 | 25 | # CAST(expression AS typename) Where expression is the data to be converted and typename is the data type to be returned. 26 | 27 | ## Converting a number to a string: 28 | SELECT CAST(MyCount AS String) FROM MyTable 29 | In the above SQL statement, the following occurs: SELECT indicates that you will be selecting data from a table. CAST indicates that you will be converting the data you select to a different data type. AS comes before and identifies the data type which you are casting to. STRING indicates that you are converting the data to a string. FROM indicates which table you are selecting the data from 30 | Converting String to a number: 31 | SELECT CAST(MyVarcharCol AS INT) FROM MyTable 32 | CAST indicates that you will be converting the data you select to a different data type. AS comes before and identifies the data type which you are casting to. 
INT indicates that you are converting the data to an integer 33 | Convert date to a string: 34 | SELECT CAST(MyDate AS STRING) FROM MyTable 35 | Converting a date to a datetime: Datetime values have the format of YYYY-MM-DD hh: mm: ss format 36 | SELECT CAST (MyDate AS DATETIME) FROM MyTable 37 | The SAFE_CAST function: Using the CAST function in a query that fails returns an error in BigQuery. It returns null instead of error. 38 | SELECT SAFE_CAST (MyDate AS STRING) FROM MyTable 39 | 40 | CONCAT 41 | SELECT 42 | CONCAT(product_code, product_color) AS new_product_code 43 | FROM customer_purchase.data 44 | WHERE 45 | Product = ‘couch’ 46 | 47 | COALESCE 48 | SELECT 49 | COALESCE(product, product_code) AS product_info 50 | FROM customer_data.purchase 51 | Return non-null values in a list 52 | 53 | CASE 54 | The CASE statement goes through one or more conditions and returns a value as soon as a condition is met 55 | SELECT 56 | Customer_id 57 | CASE 58 | WHEN first_name = ‘Tnoy’ THEN ‘TONY’ 59 | ELSE first_name 60 | END AS cleaned_name 61 | FROM customer.data 62 | 63 | DISTINCT 64 | SELECT DISTINCT fuel_type 65 | FROM cars.car_info; 66 | SELECT DISTINCT name 67 | FROM playlist 68 | ORDER BY playlist_id 69 | 70 | LENGTH 71 | SELECT length (title) AS letters_in_title, album_id 72 | FROM album 73 | WHERE letters_in_title < 4 74 | The function LENGTH(title) < 4 will return any album names that are less than 4 characters long. The complete query is SELECT * FROM album WHERE LENGTH(title) < 4. The LENGTH function counts the number of characters a string contains. 
75 | TRIM 76 | 77 | MIN/MAX 78 | SELECT 79 | MIN(length) AS min_length, 80 | MAX(length) AS max_length 81 | FROM cars.car_info; 82 | 83 | 84 | 85 | Order By 86 | SELECT * 87 | FROM movie_data.movies 88 | ORDER BY Release_date DESC 89 | 90 | SELECT * 91 | FROM movies.data 92 | WHERE Genre = ‘Comedy’ 93 | AND Revenue > 30000000 94 | ORDER BY Release_date DESC 95 | 96 | SELECT total 97 | FROM invoice 98 | WHERE billing_city = "Chicago" 99 | ORDER BY total ASC 100 | 101 | 102 | SELECT County_of_Residence 103 | FROM `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality` 104 | ORDER BY Births ASC 105 | LIMIT 10 106 | 107 | SELECT County_of_Residence 108 | FROM `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality` 109 | WHERE year = '2018-01-01' 110 | ORDER BY Births DESC 111 | LIMIT 10 112 | The year had to be in ‘ ‘ for it to work, and since it is in date formate, it would not work as simply 2018. 113 | 114 | 115 | SELECT 116 | The meteorologists who you’re working with have asked you to get the temperature, wind speed, and precipitation for stations La Guardia and JFK, for every day in 2020, in descending order by date, and ascending order by Station ID. Use the following query to request this information: 117 | SELECT stn, date, 118 | IF(temp=9999.9, NULL, temp) AS temperature, 119 | IF(wdsp="999.9", NULL, CAST(wdsp AS Float64)) AS wind_speed, 120 | IF(prcp=99.99, 0, prcp) AS precipitation 121 | FROM `bigquery-public-data.noaa_gsod.gsod2020` 122 | WHERE stn="725030" -- La Guardia 123 | OR stn="744860" -- JFK 124 | ORDER BY date DESC, stn ASC 125 | -- Use the IF function to replace 9999.9 values, which the dataset description explains is the default value when temperature is missing, with NULLs instead. 126 | -- Use the IF function to replace 999.9 values, which the dataset description explains is the default value when wind speed is missing, with NULLs instead. 
-- Use the IF function to replace 99.99 values, which the dataset description explains is the default value when precipitation is missing, with NULLs instead. 127 | 128 | SUBSTR 129 | SELECT customer_id, 130 | SUBSTR(country,1,3) AS new_country 131 | FROM customer 132 | ORDER BY country 133 | The statement SUBSTR(country, 1, 3) AS new_country will retrieve the first 3 letters of each state name and store the result in a new column as new_country. The complete query is SELECT customer_id, SUBSTR(country, 1, 3) AS new_country FROM customer ORDER BY country. The SUBSTR function extracts a substring from a string. This function instructs the database to return 3 characters of each country, starting with the first character. 134 | SELECT Invoice_id, 135 | SUBSTR(billing_city,1,4) AS new_city 136 | FROM invoice 137 | ORDER BY billing_city 138 | Billing city= city, 1=starting position in string, 4=how many you return 139 | 140 | UPDATE 141 | UPDATE cars.car_info 142 | SET num_of_doors = "four" 143 | WHERE make = "dodge" AND fuel_type = "gas" AND body_style = "sedan"; 144 | 145 | WHERE 146 | SELECT * 147 | FROM cars.car_info 148 | WHERE num_of_doors IS NULL; 149 | 150 | SELECT * 151 | FROM movies.data 152 | WHERE Genre = ‘Comedy’ 153 | Because the genre is a string, you need to put the ‘ ‘ around the string name. Capitalizations matter 154 | 155 | SELECT CustomerId 156 | FROM invoices 157 | WHERE BillingCountry = 'Germany' AND Total > 5 158 | 159 | 160 | 161 | --------------------------------------------------------------------------------