├── README.md
└── SQL sample file
/README.md:
--------------------------------------------------------------------------------
# SQL-Sample-Queries
Here is where I will keep logs of what I am learning!

# AVERAGE TEMPERATURE
SELECT AVG(temperature)
FROM `skillful-coast-340323.demos.weather_nyc`
WHERE date BETWEEN '2020-06-01' AND '2020-06-30'

# ALIAS
Basic format for an AS query:
SELECT column_name(s)
FROM table_name AS alias_name;
Notice that AS is preceded by the table name and followed by the new nickname. Aliasing a column works the same way:
SELECT column_name AS alias_name
FROM table_name;

If using AS causes an error because the SQL database you are working with doesn't support it, you can leave it out. Without AS, the table and column examples become:

FROM table_name alias_name

SELECT column_name alias_name
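
To see both alias styles side by side, here is a minimal query to try in BigQuery. It is a sketch, not one of the original course queries: the `cars` rows are hypothetical sample data defined inline with WITH, so no real table is needed.

```sql
-- Hypothetical sample rows so the query runs on its own.
WITH cars AS (
  SELECT 'dodge' AS make, 4 AS num_of_doors
  UNION ALL
  SELECT 'honda', 2
)
SELECT
  c.make AS car_make,   -- column alias with AS
  c.num_of_doors doors  -- column alias without AS
FROM cars c             -- table alias without AS
ORDER BY car_make
```

This should return two rows, dodge with 4 doors and honda with 2, under the column names car_make and doors.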

# Arithmetic
SQL can do arithmetic for you: put +, -, * or / between columns in the SELECT list. Spaces around the operators are optional.
SELECT
station_name,
ridership_2013,
ridership_2014,
ridership_2014 - ridership_2013 AS change_2014_raw
FROM `bigquery-public-data.new_york_subway.subway_ridership_2013_present`

SELECT
station_name,
ridership_2013,
ridership_2014,
ridership_2015,
ridership_2016,
(ridership_2013 + ridership_2014 + ridership_2015 + ridership_2016) / 4 AS average
FROM `bigquery-public-data.new_york_subway.subway_ridership_2013_present`

You can also calculate percentages within your data:
SELECT
Region,
Small_bags,
Total_bags,
(small_bags / total_bags) * 100 AS small_bags_percent
FROM avocado_data.avocado_prices
WHERE total_bags <> 0

The WHERE filter skips rows where total_bags is 0, which would otherwise cause a division-by-zero error. You can also use != 0 in place of <> 0, or use the SAFE_DIVIDE function, which returns NULL instead of an error when dividing by zero.
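
A minimal sketch of the SAFE_DIVIDE alternative, assuming BigQuery. The two avocado rows are hypothetical inline data standing in for avocado_data.avocado_prices; one of them has total_bags = 0 on purpose.

```sql
-- Hypothetical sample rows standing in for avocado_data.avocado_prices.
WITH avocado_prices AS (
  SELECT 'Albany' AS region, 25 AS small_bags, 100 AS total_bags
  UNION ALL
  SELECT 'Boston', 0, 0
)
SELECT
  region,
  -- SAFE_DIVIDE returns NULL for the Boston row (0 / 0) instead of
  -- raising a division-by-zero error, so no WHERE filter is needed.
  SAFE_DIVIDE(small_bags, total_bags) * 100 AS small_bags_percent
FROM avocado_prices
ORDER BY region
```

Albany should come back as 25.0 and Boston as NULL. The difference from the WHERE version is that the zero row is kept (with a NULL) instead of being dropped.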

# BETWEEN
SELECT
Date, purchase_price
FROM customer_data.purchases
WHERE
Date BETWEEN '2020-12-01' AND '2020-12-20'
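
BETWEEN includes both endpoints. A quick way to check this in BigQuery, using hypothetical dates that sit just inside and just outside the range:

```sql
-- Hypothetical dates around the range boundaries.
SELECT d
FROM UNNEST([
  DATE '2020-11-30',
  DATE '2020-12-01',
  DATE '2020-12-20',
  DATE '2020-12-21'
]) AS d
WHERE d BETWEEN DATE '2020-12-01' AND DATE '2020-12-20'
ORDER BY d
```

Only 2020-12-01 and 2020-12-20 should be returned. If the column were a DATETIME or TIMESTAMP instead of a DATE, the upper bound would mean midnight at the start of December 20, so rows later that day would be left out.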

# CAST
SELECT
CAST(purchase_price AS FLOAT64)
FROM customer.data_purchase
ORDER BY CAST(purchase_price AS FLOAT64) DESC

SELECT
CAST(date AS date) AS date_only
FROM customer_data.purchases
The above statement changes SQL from recognizing the dates as datetime (2020-12-12T00:00:00) to only date (2020-12-12).

## CAST(expression AS typename)
Here, expression is the data to be converted and typename is the data type to be returned.

## Converting a number to a string:
SELECT CAST(MyCount AS STRING) FROM MyTable
In the above SQL statement, the following occurs: SELECT indicates that you will be selecting data from a table. CAST indicates that you will be converting the data you select to a different data type. AS comes right before the data type you are casting to. STRING indicates that you are converting the data to a string. FROM indicates which table you are selecting the data from.
## Converting a string to a number:
SELECT CAST(MyVarcharCol AS INT) FROM MyTable
CAST indicates that you will be converting the data you select to a different data type. AS comes right before the data type you are casting to. INT indicates that you are converting the data to an integer (in BigQuery, INT is an alias for INT64).
## Converting a date to a string:
SELECT CAST(MyDate AS STRING) FROM MyTable
## Converting a date to a datetime:
Datetime values have the format YYYY-MM-DD hh:mm:ss.
SELECT CAST(MyDate AS DATETIME) FROM MyTable
## The SAFE_CAST function:
In BigQuery, a CAST that fails returns an error and the whole query fails. SAFE_CAST works like CAST but returns NULL instead of an error.
SELECT SAFE_CAST(MyDate AS STRING) FROM MyTable
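
Casting a date to a string never fails, so the example above does not really show what SAFE_CAST is for. A minimal BigQuery sketch where it matters, using hypothetical text values, one of which is not a number:

```sql
-- Hypothetical text values; 'abc' cannot be converted to a number.
SELECT
  raw_value,
  SAFE_CAST(raw_value AS INT64) AS as_int  -- NULL for 'abc'
FROM UNNEST(['42', '7', 'abc']) AS raw_value
ORDER BY raw_value
```

With SAFE_CAST the query returns 42, 7 and NULL. Swapping in plain CAST makes the whole query fail on the 'abc' row instead.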

# CONCAT
SELECT
CONCAT(product_code, product_color) AS new_product_code
FROM customer_purchase.data
WHERE
Product = 'couch'

SELECT
usertype,
CONCAT(start_station_name, " to ", end_station_name) AS route,
COUNT(*) AS num_trips,
ROUND(AVG(CAST(tripduration AS INT64) / 60), 2) AS duration
FROM big-query-public-data.ny
GROUP BY start_station_name, end_station_name, usertype
ORDER BY num_trips DESC
LIMIT 10
Reminder: make sure to use backticks (`) rather than apostrophes (') around the table name in the FROM statement.
About ROUND(AVG(CAST(tripduration AS INT64) / 60), 2) AS duration: BigQuery's integer type, INT64, is a 64-bit integer, which is why there's a 64 after INT. Dividing by 60 (the number of seconds in a minute) converts each trip's duration from seconds to minutes, and the 2 tells ROUND to keep two decimal places.
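
One thing to watch with CONCAT: if any argument is NULL, the whole result is NULL. A minimal BigQuery sketch with hypothetical station names (the second trip has no end station recorded), using IFNULL to supply a fallback:

```sql
-- Hypothetical trips; the second one has no end station recorded.
SELECT
  CONCAT(start_station_name, ' to ', end_station_name) AS route,
  -- IFNULL swaps a NULL end station for 'unknown' before concatenating.
  CONCAT(start_station_name, ' to ', IFNULL(end_station_name, 'unknown')) AS route_filled
FROM UNNEST([
  STRUCT('Pier 40' AS start_station_name, 'Broadway' AS end_station_name),
  STRUCT('Canal St' AS start_station_name, CAST(NULL AS STRING) AS end_station_name)
])
```

The first row should read "Pier 40 to Broadway" in both columns. The second row should be NULL in route and "Canal St to unknown" in route_filled.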
# COALESCE
SELECT
COALESCE(product, product_code) AS product_info
FROM customer_data.purchase
COALESCE returns the first non-null value in its list of arguments: here, product if it is not NULL, otherwise product_code.
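
COALESCE can take more than two arguments and stops at the first one that is not NULL. A minimal BigQuery sketch with hypothetical purchase rows; the 'no info' fallback is made up for illustration:

```sql
-- Hypothetical purchase rows with missing values.
SELECT
  product,
  product_code,
  COALESCE(product, product_code, 'no info') AS product_info
FROM UNNEST([
  STRUCT('couch' AS product, 'C-100' AS product_code),
  STRUCT(CAST(NULL AS STRING) AS product, 'L-200' AS product_code),
  STRUCT(CAST(NULL AS STRING) AS product, CAST(NULL AS STRING) AS product_code)
])
```

product_info should come back as couch, L-200 and no info.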

# COUNT / COUNT DISTINCT
COUNT(*) returns the number of rows; COUNT(column) returns the number of rows where that column is not NULL. COUNT(DISTINCT column) does the same, but counts each repeated value only once. Use it in the SELECT list.
SELECT
COUNT(warehouse.state) AS num_states
warehouse.state is the column being counted, and num_states is the name of the new column that holds the count.
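
The three forms differ once a column has repeats or NULLs. A minimal BigQuery sketch with hypothetical warehouse states:

```sql
-- Hypothetical warehouse states, including a repeat and a missing value.
SELECT
  COUNT(*) AS num_rows,               -- 4: every row
  COUNT(state) AS num_states,         -- 3: non-NULL values only
  COUNT(DISTINCT state) AS num_unique -- 2: 'NY' and 'CA'
FROM UNNEST(['NY', 'NY', 'CA', NULL]) AS state
```

If the goal is "how many different states", COUNT(DISTINCT ...) is probably the one you want.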

# CASE
The CASE statement goes through one or more conditions and returns a value as soon as a condition is met. If no condition is met, it returns the ELSE value (or NULL if there is no ELSE).
SELECT
Customer_id,
CASE
WHEN first_name = 'Tnoy' THEN 'TONY'
ELSE first_name
END AS cleaned_name
FROM customer.data
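
To see "returns as soon as a condition is met" in action, here is a minimal BigQuery sketch with hypothetical customer rows and a made-up second WHEN branch that overlaps the first:

```sql
-- Hypothetical customer rows.
SELECT
  customer_id,
  CASE
    WHEN first_name = 'Tnoy' THEN 'TONY'
    WHEN first_name LIKE 'T%' THEN 'starts with T'  -- never reached for 'Tnoy'
    ELSE first_name
  END AS cleaned_name
FROM UNNEST([
  STRUCT(1 AS customer_id, 'Tnoy' AS first_name),
  STRUCT(2 AS customer_id, 'Tara' AS first_name),
  STRUCT(3 AS customer_id, 'Maria' AS first_name)
])
ORDER BY customer_id
```

Expected cleaned_name values: TONY, starts with T, Maria. 'Tnoy' also matches LIKE 'T%', but the first WHEN wins.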

# DISTINCT
SELECT DISTINCT fuel_type
FROM cars.car_info;
SELECT DISTINCT name
FROM playlist
ORDER BY playlist_id

# EXTRACT
EXTRACT is for when you want to use only one part of a date or timestamp column, such as just the year from a full date. This seems useful if you aren't planning on cleaning or manipulating your data before using it.
SELECT
EXTRACT(YEAR FROM STARTTIME) AS year,
COUNT(*) AS number_of_rides
FROM
Address.address
GROUP BY year
ORDER BY year

SELECT
ProductId,
SUM(Quantity) AS unitssold,
ROUND(MAX(UnitPrice), 2) AS UnitPrice,
EXTRACT(YEAR FROM DATE) AS year,
EXTRACT(MONTH FROM DATE) AS month
FROM `skillful-coast-340323.sales.sales`
GROUP BY
year, month, ProductId
ORDER BY
year, month, ProductId
LIMIT 1000
Because this query uses GROUP BY, every column in the SELECT list must be either grouped or aggregated. That is why UnitPrice is wrapped in MAX (ROUND(..., 2) only rounds the result to two decimal places) and Quantity is wrapped in SUM. Selecting UnitPrice on its own returned "SELECT list expression references column UnitPrice which is neither grouped nor aggregated at [4:5]", and selecting Quantity on its own returned "SELECT list expression references column Quantity which is neither grouped nor aggregated at [3:5]".
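
A minimal BigQuery sketch of EXTRACT combined with GROUP BY, using hypothetical ride dates instead of a real table:

```sql
-- Hypothetical ride dates.
SELECT
  EXTRACT(YEAR FROM ride_date) AS year,
  EXTRACT(MONTH FROM ride_date) AS month,
  COUNT(*) AS number_of_rides
FROM UNNEST([DATE '2020-01-15', DATE '2020-01-20', DATE '2021-03-02']) AS ride_date
GROUP BY year, month
ORDER BY year, month
```

This should return two groups: 2020, month 1, with 2 rides, and 2021, month 3, with 1 ride. Both extracted parts are grouped, and the only other column is an aggregate, so the "neither grouped nor aggregated" error does not come up.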

# JOIN
General join syntax:
SELECT
-- table columns are inserted here
table_name1.column_name,
table_name2.column_name
FROM
table_name1
JOIN
table_name2
ON table_name1.column_name = table_name2.column_name
Here column_name is the key the two tables share, typically a primary key in one table that appears as a foreign key in the other. JOIN on its own means INNER JOIN.

SELECT
Customers.customer_name,
Orders.product_id,
Orders.ship_date
FROM
Customers
INNER JOIN
Orders
ON customers.customer_id = orders.customer_id

SELECT
employees.name AS employee_name,
employees.role AS employee_role,
departments.name AS department_name
FROM employee_data.employees
INNER JOIN
employee_data.departments ON
employees.department_id = departments.department_id

SELECT
employees.name AS employee_name,
employees.role AS employee_role,
departments.name AS department_name
FROM employee_data.employees
FULL OUTER JOIN
employee_data.departments ON
employees.department_id = departments.department_id

SELECT
`bigquery-public-data.world_bank_intl_education.international_education`.country_name,
`bigquery-public-data.world_bank_intl_education.country_summary`.country_code,
`bigquery-public-data.world_bank_intl_education.international_education`.value,
`bigquery-public-data.world_bank_intl_education.country_summary`.short_name
FROM
`bigquery-public-data.world_bank_intl_education.international_education`
INNER JOIN
`bigquery-public-data.world_bank_intl_education.country_summary`
ON `bigquery-public-data.world_bank_intl_education.country_summary`.country_code = `bigquery-public-data.world_bank_intl_education.international_education`.country_code
The same join, cleaned up with table aliases (this version leaves out short_name):

SELECT
edu.country_name,
summary.country_code,
edu.value
FROM
`bigquery-public-data.world_bank_intl_education.international_education` AS edu
INNER JOIN
`bigquery-public-data.world_bank_intl_education.country_summary` AS summary
ON edu.country_code = summary.country_code

SELECT
seasons.market AS university,
seasons.name AS team_name,
seasons.wins,
seasons.losses,
seasons.ties,
mascots.mascot AS team_mascot
FROM
`bigquery-public-data.ncaa_basketball.mbb_historical_teams_seasons` AS seasons
LEFT JOIN
`bigquery-public-data.ncaa_basketball.mascots` AS mascots
ON
seasons.team_id = mascots.id
WHERE
seasons.season = 1984
AND seasons.division = 1
ORDER BY
seasons.market
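
To compare INNER, LEFT and FULL OUTER joins on something small, here is a sketch for BigQuery with hypothetical employees and departments tables defined inline (the names and IDs are made up):

```sql
-- Hypothetical employees and departments tables.
WITH employees AS (
  SELECT 'Ana' AS name, 1 AS department_id UNION ALL
  SELECT 'Ben', 2 UNION ALL
  SELECT 'Cy', NULL
),
departments AS (
  SELECT 1 AS department_id, 'Sales' AS name UNION ALL
  SELECT 3, 'Legal'
)
SELECT
  employees.name AS employee_name,
  departments.name AS department_name
FROM employees
LEFT JOIN departments
  ON employees.department_id = departments.department_id
ORDER BY employee_name
```

As written (LEFT JOIN) this should give 3 rows: Ana with Sales, and Ben and Cy with a NULL department. Changing LEFT JOIN to INNER JOIN leaves only Ana with Sales. FULL OUTER JOIN gives 4 rows, adding Legal with a NULL employee. Cy never matches anything, because a NULL key is not equal to any value in the ON condition.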

# LENGTH
SELECT LENGTH(title) AS letters_in_title, album_id
FROM album
WHERE LENGTH(title) < 4
The condition LENGTH(title) < 4 returns any album titles that are fewer than 4 characters long. The LENGTH function counts the number of characters a string contains. The WHERE clause repeats LENGTH(title) instead of using the letters_in_title alias because in standard SQL, including BigQuery, WHERE cannot refer to an alias defined in SELECT. Another complete version of the query is SELECT * FROM album WHERE LENGTH(title) < 4.
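
A self-contained version to try in BigQuery, using hypothetical album titles in place of the album table:

```sql
-- Hypothetical album titles.
SELECT
  title,
  LENGTH(title) AS letters_in_title
FROM UNNEST(['Up', 'Abbey Road', 'Ten']) AS title
-- Repeat the expression here: WHERE cannot see the letters_in_title alias.
WHERE LENGTH(title) < 4
ORDER BY title
```

Ten (3 characters) and Up (2) should come back, and Abbey Road (10, counting the space) should not.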
# TRIM
TRIM removes leading and trailing whitespace from a string.

# MIN/MAX
SELECT
MIN(length) AS min_length,
MAX(length) AS max_length
FROM cars.car_info;

# Modulo
Modulo is an operator (%) that returns the remainder when one number is divided by another.
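
Databases such as MySQL, PostgreSQL and SQLite accept 7 % 2 directly. As far as I know, BigQuery's GoogleSQL does not have a % operator and uses the MOD(X, Y) function instead, so this sketch assumes BigQuery and uses MOD on hypothetical numbers:

```sql
-- Hypothetical numbers.
SELECT
  n,
  MOD(n, 2) AS remainder,                       -- remainder after dividing by 2
  IF(MOD(n, 2) = 0, 'even', 'odd') AS parity
FROM UNNEST([7, 10, 15]) AS n
ORDER BY n
```

Expected: 7 gives remainder 1 (odd), 10 gives 0 (even), 15 gives 1 (odd). Checking MOD(n, 2) = 0 is a common way to split rows into even and odd.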

# Order By
SELECT *
FROM movie_data.movies
ORDER BY Release_date DESC

SELECT *
FROM movies.data
WHERE Genre = 'Comedy'
AND Revenue > 30000000
ORDER BY Release_date DESC

SELECT total
FROM invoice
WHERE billing_city = "Chicago"
ORDER BY total ASC

SELECT County_of_Residence
FROM `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
ORDER BY Births ASC
LIMIT 10

SELECT County_of_Residence
FROM `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
WHERE year = '2018-01-01'
ORDER BY Births DESC
LIMIT 10
The year had to be in quotes (' ') for this to work. The year column is stored as a date, so comparing it to the plain number 2018 does not work; the full date '2018-01-01' is needed.

# SELECT
The meteorologists you're working with have asked you to get the temperature, wind speed, and precipitation for the La Guardia and JFK stations for every day in 2020, sorted in descending order by date and ascending order by station ID. Use the following query to request this information:
SELECT stn, date,
IF(temp = 9999.9, NULL, temp) AS temperature,
IF(wdsp = "999.9", NULL, CAST(wdsp AS FLOAT64)) AS wind_speed,
IF(prcp = 99.99, 0, prcp) AS precipitation
FROM `bigquery-public-data.noaa_gsod.gsod2020`
WHERE stn = "725030" -- La Guardia
OR stn = "744860" -- JFK
ORDER BY date DESC, stn ASC
-- Use the IF function to replace 9999.9 values, which the dataset description explains is the default value when temperature is missing, with NULLs instead.
-- Use the IF function to replace 999.9 values, which the dataset description explains is the default value when wind speed is missing, with NULLs instead.
-- Use the IF function to replace 99.99 values, which the dataset description explains is the default value when precipitation is missing, with 0 instead.
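
The IF pattern above can be tried without the NOAA table. This sketch assumes BigQuery and uses two hypothetical readings, one of them the 9999.9 placeholder. It also shows NULLIF, a shorter built-in that returns NULL when its two arguments are equal:

```sql
-- Hypothetical temperature readings; 9999.9 marks a missing value.
SELECT
  temp,
  IF(temp = 9999.9, NULL, temp) AS temperature,
  NULLIF(temp, 9999.9) AS temperature_nullif  -- same result, shorter
FROM UNNEST([72.5, 9999.9]) AS temp
```

Both new columns should be 72.5 for the first reading and NULL for the placeholder. NULLIF only covers the "replace with NULL" case, so the precipitation column, which replaces 99.99 with 0, still needs IF.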

# SUBSTR
SELECT customer_id,
SUBSTR(country, 1, 3) AS new_country
FROM customer
ORDER BY country
The expression SUBSTR(country, 1, 3) AS new_country retrieves the first 3 letters of each country name and stores the result in a new column called new_country. The SUBSTR function extracts a substring from a string; here it tells the database to return 3 characters of each country, starting with the first character.
SELECT Invoice_id,
SUBSTR(billing_city, 1, 4) AS new_city
FROM invoice
ORDER BY billing_city
In SUBSTR(billing_city, 1, 4), billing_city is the column, 1 is the starting position in the string, and 4 is how many characters to return.
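
A minimal BigQuery sketch with hypothetical city names. Positions start at 1, and a string shorter than the requested length is returned as-is:

```sql
-- Hypothetical city names.
SELECT
  billing_city,
  SUBSTR(billing_city, 1, 4) AS new_city,  -- 4 characters starting at position 1
  SUBSTR(billing_city, 2, 3) AS middle     -- 3 characters starting at position 2
FROM UNNEST(['Chicago', 'Oslo', 'Rome']) AS billing_city
ORDER BY billing_city
```

Chicago should give Chic and hic. Oslo and Rome are only 4 characters long, so new_city is the whole name.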

# UPDATE
UPDATE cars.car_info
SET num_of_doors = "four"
WHERE make = "dodge" AND fuel_type = "gas" AND body_style = "sedan";

# WHERE
SELECT *
FROM cars.car_info
WHERE num_of_doors IS NULL;

SELECT *
FROM movies.data
WHERE Genre = 'Comedy'
Because Genre holds strings, the value needs quotes around it (' '). Capitalization matters: 'Comedy' and 'comedy' are different values.

SELECT CustomerId
FROM invoices
WHERE BillingCountry = 'Germany' AND Total > 5

# WITH
WITH lets you create a temporary table (a common table expression, or CTE) and query from it right within your query. Depending on the database, you can also create temporary tables with SELECT INTO or CREATE TEMP TABLE.
WITH trips_over_one_hour AS (
SELECT *
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
WHERE tripduration >= 3600 -- tripduration is in seconds, so one hour is 3600
)
SELECT COUNT(*) AS cnt
FROM trips_over_one_hour

To get this one to work (I wrote it myself), I had to remove the quotes from around the table name in the FROM clause and make sure my commas were in the right place. It took me a long time.
WITH
longest_used_bike AS (
SELECT
bikeid,
SUM(duration_minutes) AS trip_duration
FROM
bigquery-public-data.austin_bikeshare.bikeshare_trips
GROUP BY
bikeid
ORDER BY
trip_duration DESC
LIMIT 1
)

-- Find the station at which the longest-used bike leaves most often
SELECT
trips.start_station_id,
COUNT(*) AS trip_ct
FROM
longest_used_bike AS longest
INNER JOIN
bigquery-public-data.austin_bikeshare.bikeshare_trips AS trips
ON longest.bikeid = trips.bikeid
GROUP BY
trips.start_station_id
ORDER BY
trip_ct DESC
LIMIT 1
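
A small self-contained version of the WITH pattern, assuming BigQuery, with hypothetical trip rows in place of the citibike table. It also shows that one CTE can read from an earlier one in the same WITH clause:

```sql
-- Hypothetical trips; tripduration is in seconds.
WITH trips AS (
  SELECT 1 AS bikeid, 4000 AS tripduration UNION ALL
  SELECT 1, 200 UNION ALL
  SELECT 2, 3600
),
trips_over_one_hour AS (
  -- A later CTE can read from an earlier one.
  SELECT *
  FROM trips
  WHERE tripduration >= 3600  -- 3600 seconds = 1 hour
)
SELECT COUNT(*) AS cnt
FROM trips_over_one_hour
```

cnt should be 2: the 4000-second trip and the trip of exactly 3600 seconds.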
--------------------------------------------------------------------------------
/SQL sample file:
--------------------------------------------------------------------------------
# AVERAGE TEMPERATURE
SELECT AVG(temperature)
FROM `skillful-coast-340323.demos.weather_nyc`
WHERE date BETWEEN '2020-06-01' AND '2020-06-30'

# BETWEEN
SELECT
Date, purchase_price
FROM customer_data.purchases
WHERE
Date BETWEEN '2020-12-01' AND '2020-12-20'

# CAST
SELECT
CAST(purchase_price AS FLOAT64)
FROM customer.data_purchase
ORDER BY CAST(purchase_price AS FLOAT64) DESC

SELECT
CAST(date AS date) AS date_only
FROM customer_data.purchases
The above statement changes SQL from recognizing the dates as datetime (2020-12-12T00:00:00) to only date (2020-12-12).

# CAST(expression AS typename)
Here, expression is the data to be converted and typename is the data type to be returned.

## Converting a number to a string:
SELECT CAST(MyCount AS STRING) FROM MyTable
In the above SQL statement, the following occurs: SELECT indicates that you will be selecting data from a table. CAST indicates that you will be converting the data you select to a different data type. AS comes right before the data type you are casting to. STRING indicates that you are converting the data to a string. FROM indicates which table you are selecting the data from.
## Converting a string to a number:
SELECT CAST(MyVarcharCol AS INT) FROM MyTable
CAST indicates that you will be converting the data you select to a different data type. AS comes right before the data type you are casting to. INT indicates that you are converting the data to an integer (in BigQuery, INT is an alias for INT64).
## Converting a date to a string:
SELECT CAST(MyDate AS STRING) FROM MyTable
## Converting a date to a datetime:
Datetime values have the format YYYY-MM-DD hh:mm:ss.
SELECT CAST(MyDate AS DATETIME) FROM MyTable
## The SAFE_CAST function:
In BigQuery, a CAST that fails returns an error and the whole query fails. SAFE_CAST works like CAST but returns NULL instead of an error.
SELECT SAFE_CAST(MyDate AS STRING) FROM MyTable

# CONCAT
SELECT
CONCAT(product_code, product_color) AS new_product_code
FROM customer_purchase.data
WHERE
Product = 'couch'

# COALESCE
SELECT
COALESCE(product, product_code) AS product_info
FROM customer_data.purchase
COALESCE returns the first non-null value in its list of arguments: here, product if it is not NULL, otherwise product_code.

# CASE
The CASE statement goes through one or more conditions and returns a value as soon as a condition is met. If no condition is met, it returns the ELSE value (or NULL if there is no ELSE).
SELECT
Customer_id,
CASE
WHEN first_name = 'Tnoy' THEN 'TONY'
ELSE first_name
END AS cleaned_name
FROM customer.data

# DISTINCT
SELECT DISTINCT fuel_type
FROM cars.car_info;
SELECT DISTINCT name
FROM playlist
ORDER BY playlist_id

# LENGTH
SELECT LENGTH(title) AS letters_in_title, album_id
FROM album
WHERE LENGTH(title) < 4
The condition LENGTH(title) < 4 returns any album titles that are fewer than 4 characters long. The LENGTH function counts the number of characters a string contains. The WHERE clause repeats LENGTH(title) instead of using the letters_in_title alias because in standard SQL, including BigQuery, WHERE cannot refer to an alias defined in SELECT. Another complete version of the query is SELECT * FROM album WHERE LENGTH(title) < 4.

# TRIM
TRIM removes leading and trailing whitespace from a string.

# MIN/MAX
SELECT
MIN(length) AS min_length,
MAX(length) AS max_length
FROM cars.car_info;

# Order By
SELECT *
FROM movie_data.movies
ORDER BY Release_date DESC

SELECT *
FROM movies.data
WHERE Genre = 'Comedy'
AND Revenue > 30000000
ORDER BY Release_date DESC

SELECT total
FROM invoice
WHERE billing_city = "Chicago"
ORDER BY total ASC

SELECT County_of_Residence
FROM `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
ORDER BY Births ASC
LIMIT 10

SELECT County_of_Residence
FROM `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
WHERE year = '2018-01-01'
ORDER BY Births DESC
LIMIT 10
The year had to be in quotes (' ') for this to work. The year column is stored as a date, so comparing it to the plain number 2018 does not work; the full date '2018-01-01' is needed.

# SELECT
The meteorologists you're working with have asked you to get the temperature, wind speed, and precipitation for the La Guardia and JFK stations for every day in 2020, sorted in descending order by date and ascending order by station ID. Use the following query to request this information:
SELECT stn, date,
IF(temp = 9999.9, NULL, temp) AS temperature,
IF(wdsp = "999.9", NULL, CAST(wdsp AS FLOAT64)) AS wind_speed,
IF(prcp = 99.99, 0, prcp) AS precipitation
FROM `bigquery-public-data.noaa_gsod.gsod2020`
WHERE stn = "725030" -- La Guardia
OR stn = "744860" -- JFK
ORDER BY date DESC, stn ASC
-- Use the IF function to replace 9999.9 values, which the dataset description explains is the default value when temperature is missing, with NULLs instead.
-- Use the IF function to replace 999.9 values, which the dataset description explains is the default value when wind speed is missing, with NULLs instead.
-- Use the IF function to replace 99.99 values, which the dataset description explains is the default value when precipitation is missing, with 0 instead.

# SUBSTR
SELECT customer_id,
SUBSTR(country, 1, 3) AS new_country
FROM customer
ORDER BY country
The expression SUBSTR(country, 1, 3) AS new_country retrieves the first 3 letters of each country name and stores the result in a new column called new_country. The SUBSTR function extracts a substring from a string; here it tells the database to return 3 characters of each country, starting with the first character.
SELECT Invoice_id,
SUBSTR(billing_city, 1, 4) AS new_city
FROM invoice
ORDER BY billing_city
In SUBSTR(billing_city, 1, 4), billing_city is the column, 1 is the starting position in the string, and 4 is how many characters to return.

# UPDATE
UPDATE cars.car_info
SET num_of_doors = "four"
WHERE make = "dodge" AND fuel_type = "gas" AND body_style = "sedan";

# WHERE
SELECT *
FROM cars.car_info
WHERE num_of_doors IS NULL;

SELECT *
FROM movies.data
WHERE Genre = 'Comedy'
Because Genre holds strings, the value needs quotes around it (' '). Capitalization matters: 'Comedy' and 'comedy' are different values.

SELECT CustomerId
FROM invoices
WHERE BillingCountry = 'Germany' AND Total > 5
--------------------------------------------------------------------------------