├── .github
├── CODEOWNERS
└── pull_request_template.md
├── .gitignore
├── README.md
├── analyses
├── .gitkeep
├── driver_bets.sql
├── lap_time_analysis_python_worksheet.py
└── pit_stops_analysis_python_worksheet.py
├── dbt_project.yml
├── macros
├── .gitkeep
├── convert_laptime.sql
├── macros.yml
└── test_all_values_gte_zero.sql
├── models
├── column_definitions.md
├── core
│ ├── core.yml
│ ├── core_table_definitions.md
│ ├── dim_circuits.sql
│ ├── dim_constructors.sql
│ ├── dim_drivers.sql
│ ├── dim_races.sql
│ ├── dim_seasons.sql
│ ├── dim_status.sql
│ ├── fct_constructor_results.sql
│ ├── fct_constructor_standings.sql
│ ├── fct_driver_standings.sql
│ ├── fct_lap_times.sql
│ ├── fct_pit_stops.sql
│ ├── fct_qualifying.sql
│ ├── fct_results.sql
│ └── fct_sprint_results.sql
├── marts
│ ├── aggregates
│ │ ├── agg_fastest_pit_stops_by_constructor.py
│ │ ├── agg_lap_times_moving_avg.py
│ │ └── aggregates.yml
│ ├── mrt.md
│ ├── mrt.yml
│ ├── mrt_lap_times_years.sql
│ ├── mrt_pit_stops.sql
│ ├── mrt_results.sql
│ └── mrt_results_circuits.sql
├── ml
│ ├── prep_encoding_splitting
│ │ ├── covariate_encoding.py
│ │ ├── encoding_mapping.py
│ │ ├── hold_out_dataset_for_prediction.py
│ │ ├── ml_data_prep.py
│ │ └── training_testing_dataset.py
│ └── training_and_prediction
│ │ ├── apply_prediction_to_position.py
│ │ └── train_model_to_predict_position.py
├── overview.md
└── staging
│ └── formula1
│ ├── f1_sources.yml
│ ├── stage_model_descriptions.md
│ ├── staging.yml
│ ├── stg_circuits.sql
│ ├── stg_constructor_results.sql
│ ├── stg_constructor_standings.sql
│ ├── stg_constructors.sql
│ ├── stg_driver_standings.sql
│ ├── stg_drivers.sql
│ ├── stg_lap_times.sql
│ ├── stg_pit_stops.sql
│ ├── stg_qualifying.sql
│ ├── stg_races.sql
│ ├── stg_results.sql
│ ├── stg_seasons.sql
│ ├── stg_sprint_results.sql
│ └── stg_status.sql
├── packages.yml
├── seeds
└── .gitkeep
├── setup
├── existing_account_snowflake_grants.sql
├── pc_snowflake_grants.sql
└── setup_script_s3_to_snowflake.sql
├── snapshots
└── .gitkeep
└── tests
├── .gitkeep
├── lap_times_moving_avg_assert_positive_or_null.sql
└── macro_pit_stops_mean_is_positive.sql
/.github/CODEOWNERS:
--------------------------------------------------------------------------------
1 | # Owning team for this repo
2 | * @dbt-labs/solutions-architects
3 |
--------------------------------------------------------------------------------
/.github/pull_request_template.md:
--------------------------------------------------------------------------------
1 |
7 |
8 | ## Description & motivation
9 |
13 |
14 | ## To-do before merge
15 |
22 |
23 | ## Screenshots:
24 |
28 |
29 | ## Validation of models:
30 |
35 |
36 | ## Changes to existing models:
37 |
42 |
43 | ## Checklist:
44 |
50 | - [ ] My pull request represents one logical piece of work.
51 | - [ ] My commits are related to the pull request and look clean.
52 | - [ ] My SQL follows the [dbt Labs style guide](https://github.com/dbt-labs/corp/blob/master/dbt_style_guide.md).
53 | - [ ] I have materialized my models appropriately.
54 | - [ ] I have added appropriate tests and documentation to any new models.
55 | - [ ] I have updated the README file.
56 | {%- if project.warehouse == 'redshift' %}
57 | - [ ] I have added sort and dist keys to models materialized as tables.
58 | - [ ] I have validated the SQL in any late-binding views.
59 | {% endif %}
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 |
2 | target/
3 | dbt_packages/
4 | logs/
5 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | ## About the project
3 | This repository is the code for supporting both a [dbt guide](https://docs.getdbt.com/guides/dbt-ecosystem/dbt-python-snowpark/1-overview-dbt-python-snowpark) and [Snowflake quickstart](https://quickstarts.snowflake.com/guide/leverage_dbt_cloud_to_generate_ml_ready_pipelines_using_snowpark_python/index.html?index=..%2F..index#0).
4 | This repository showcases how to combine both SQL and python together in the same data pipeline to run both analytics and machine learning models on dbt Cloud. We are able to blend these together seamlessly using Snowpark for python on Snowflake.
5 | The project utilizes open source Formula 1 data. This repository, python-snowpark-formula1, is code complete and contains all models end to end.
6 |
7 |
8 | This repo create insights for:
9 | - Finding the lap time average and rolling average through the years
10 | - Predicting the position of each driver based on a decade of data
11 |
12 | First we fetch the Formula 1 data, hosted on a dbt Labs public S3 bucket. The code for this is under the `setup` folder in the file `setup_script_s3_to_snowflake.sql`.
13 | We will create a Snowflake Stage for our CSV files then use Snowflake's COPY INTO function to copy the data in from our CSV files into tables. The Formula 1 is available on Kaggle. The data is originally compiled from the Ergast Developer API.
14 | We will not be building the full pipeline as part of this workshop. Instead we will leverage an exisitng repo, fork it, and focus on our machine learning pipeline.
15 |
16 | ## Step by step quickstart
17 | You can follow along how to build out some of the pipeline's code by following the [snowflake quickstart Leverage dbt Cloud to Generate ML ready pipelines using snowpark python](https://quickstarts.snowflake.com/guide/leverage_dbt_cloud_to_generate_ml_ready_pipelines_using_snowpark_python/index.html?index=..%2F..index#0).
18 | The quickstart is abridged for time, meaning not every file in this repo is built from scratch. The quickstart utilizes a separate forked repository that has models (.sql and .py files) dropped from this repository (python-snowpark-formula1).
19 |
20 | ## Getting started
21 | You do not need to create any local setup to replicate and run this project!
22 |
23 | All you need to get started are:
24 | 1. A Snowflake account with ACCOUNTADMIN access (you can spin up a trial account)
25 | 2. A GitHub Account
26 |
27 | A dbt cloud account will be setup through a process called Partner Connect. This creates a dbt environment to login to, in browswer, that interpolates dbt code!
28 |
29 | If you'd like to run this project locally, no problem! That setup won't be covered here and you should be familiar with using dbt core and Snowflake notebook connections locally.
30 |
31 | ## What you can learn by exploring this repo:
32 | - How to use dbt with Snowflake to build scalable transformations using SQL and Python
33 | - How to use dbt SQL to prepare your data from sources to encoding
34 | - How to train a model in dbt python and use it for future prediction
35 |
36 | ## Prerequisites
37 | To get the most out this repo it's best to have:
38 | - Basic to intermediate SQL and python.
39 | - Basic understanding of dbt fundamentals. We recommend the [dbt Fundamentals course](https://courses.getdbt.com/courses/fundamentals) if you're interested.
40 | - High level understanding of machine learning processes (encoding, training, testing)
41 | - Simple ML algorithms — we will use logistic regression to keep the focus on the workflow, not algorithms!
42 |
43 | **Bonus:** if you have completed the dbt workshop, [Accelerating Data Teams with Snowflake and dbt Cloud Hands On Lab](https://quickstarts.snowflake.com/guide/accelerating_data_teams_with_snowflake_and_dbt_cloud_hands_on_lab/index.html?index=..%2F..index#0), to have hands on keyboard practice with concepts like the source, ref, tests, and docs in dbt.
44 | By having completed that workshop, you will gain the most of this dbt python + snowpark workshop.
45 |
46 | ## Roadmap and future improvements
47 | We will be updating this project from time to time for code cleanup, automation, and developing best practices.
48 | So far the list of future improvements is as follows:
49 | [ ] label encorder clean up for numeric variables
50 | [ ] ohe for the categorical variables
51 | [ ] multi-class accuracy
52 | [ ] saving accurary to a separate file to monitor model degradation/decay
53 | [ ] trying out https://github.com/omnata-labs/dbt-ml-preprocessing for some of preprocessing?
54 |
55 | ## Contributing
56 | Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated!!
57 | Really, any changes you'd like to see made to this project come to life is awesome!
58 |
59 | If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
60 |
61 | To make a contribution follow these steps:
62 | 1. Fork the Project
63 | 2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
64 | 3. Commit your Changes (git commit -m 'Add some AmazingFeature')
65 | 4. Push to the Branch (git push origin feature/AmazingFeature)
66 | 5. Open a Pull Request
67 |
68 | ## Contact
69 | Hope Watson — [LinkedIn](https://www.linkedin.com/in/hopewatson/)
70 |
71 | ## Acknowledgments
72 |
73 | I want to give a massive thanks to [Eda Johnson](https://www.linkedin.com/in/eda-johnson-saa-csa-pmp-0a2783/) at Snowflake. She was immensely helpful and in the python models you can see where her proof of concept served as the workflow we iterated on in this project.
74 |
75 | Thanks to dbt Labs and my former manager and colleague, [Amy Chen](https://www.linkedin.com/in/yuanamychen/), for giving me the time to continue building skills and becoming technically more competent each day.
76 |
--------------------------------------------------------------------------------
/analyses/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dbt-labs/python-snowpark-formula1/b6caebae7914a119c66ac16a4430f5e10faf41a8/analyses/.gitkeep
--------------------------------------------------------------------------------
/analyses/driver_bets.sql:
--------------------------------------------------------------------------------
1 | with predictions as (
2 | select * from {{ ref('apply_prediction_to_position') }}
3 | ),
4 |
5 | encoding_mapping as (
6 | select * from {{ ref('encoding_mapping') }}
7 | )
8 |
9 | select
10 | original_driver_value,
11 | AVG(position_predicted) as avg_position_predicted,
12 | encoded_driver_value,
13 | predictions.driver
14 |
15 | from predictions
16 | join encoding_mapping on predictions.driver=encoding_mapping.encoded_driver_value
17 | group by 1,3,4
18 | order by avg_position_predicted
19 |
--------------------------------------------------------------------------------
/analyses/lap_time_analysis_python_worksheet.py:
--------------------------------------------------------------------------------
1 | # The Snowpark package is required for Python Worksheets.
2 | # You can add more packages by selecting them using the Packages control and then importing them.
3 |
4 | import snowflake.snowpark as snowpark
5 | import pandas as pd
6 |
7 | def main(session: snowpark.Session):
8 | # Your code goes here, inside the "main" handler.
9 | tableName = 'MRT_LAP_TIMES_YEARS'
10 | dataframe = session.table(tableName)
11 | lap_times = dataframe.to_pandas()
12 |
13 | # print table
14 | print(lap_times)
15 |
16 | # describe the data
17 | lap_times["LAP_TIME_SECONDS"] = lap_times["LAP_TIME_MILLISECONDS"]/1000
18 | lap_time_trends = lap_times.groupby(by="RACE_YEAR")["LAP_TIME_SECONDS"].mean().to_frame()
19 | lap_time_trends.reset_index(inplace=True)
20 | lap_time_trends["LAP_MOVING_AVG_5_YEARS"] = lap_time_trends["LAP_TIME_SECONDS"].rolling(5).mean()
21 | lap_time_trends.columns = lap_time_trends.columns.str.upper()
22 |
23 | final_df = session.create_dataframe(lap_time_trends)
24 | # Return value will appear in the Results tab.
25 | return final_df
--------------------------------------------------------------------------------
/analyses/pit_stops_analysis_python_worksheet.py:
--------------------------------------------------------------------------------
1 | # The Snowpark package is required for Python Worksheets.
2 | # You can add more packages by selecting them using the Packages control and then importing them.
3 |
4 | import snowflake.snowpark as snowpark
5 | from snowflake.snowpark.functions import col
6 | import pandas as pd
7 | import numpy as np
8 |
9 | def main(session: snowpark.Session):
10 | # Your code goes here, inside the "main" handler.
11 | tableName = 'PIT_STOPS_JOINED'
12 | dataframe = session.table(tableName)
13 | pit_stops_joined = dataframe.to_pandas()
14 |
15 | # print table
16 | print(pit_stops_joined)
17 |
18 | # provide year so we do not hardcode dates
19 | year=2021
20 |
21 | # describe the data
22 | pit_stops_joined["PIT_STOP_SECONDS"] = pit_stops_joined["PIT_STOP_MILLISECONDS"]/1000
23 | fastest_pit_stops = pit_stops_joined[(pit_stops_joined["RACE_YEAR"]==year)].groupby(by="CONSTRUCTOR_NAME")["PIT_STOP_SECONDS"].describe().sort_values(by='mean')
24 | fastest_pit_stops.reset_index(inplace=True)
25 | fastest_pit_stops.columns = fastest_pit_stops.columns.str.upper()
26 |
27 | final_df = session.create_dataframe(fastest_pit_stops)
28 | # Return value will appear in the Results tab.
29 | return final_df
--------------------------------------------------------------------------------
/dbt_project.yml:
--------------------------------------------------------------------------------
1 | name: 'snowflake_python_workshop'
2 | version: '1.5.0'
3 | require-dbt-version: '>=1.3.0'
4 | config-version: 2
5 |
6 | # This setting configures which "profile" dbt uses for this project.
7 | profile: 'default'
8 |
9 | # These configurations specify where dbt should look for different types of files.
10 | # The `source-paths` config, for example, states that models in this project can be
11 | # found in the "models/" directory. You probably won't need to change these!
12 | model-paths: ["models"]
13 | analysis-paths: ["analyses"]
14 | test-paths: ["tests"]
15 | seed-paths: ["seeds"]
16 | macro-paths: ["macros"]
17 | snapshot-paths: ["snapshots"]
18 |
19 | target-path: "target" # directory which will store compiled SQL files
20 | clean-targets: # directories to be removed by `dbt clean`
21 | - "target"
22 | - "dbt_packages"
23 |
24 | models:
25 | snowflake_python_workshop:
26 | staging:
27 | +docs:
28 | node_color: "CadetBlue"
29 |
30 | marts:
31 | +materialized: table
32 | aggregates:
33 | +docs:
34 | node_color: "Maroon"
35 | +tags: "bi"
36 |
37 | core:
38 | +materialized: table
39 | +docs:
40 | node_color: "#800080"
41 |
42 | ml:
43 | +materialized: table
44 | prep_encoding_splitting:
45 | +docs:
46 | node_color: "Indigo"
47 | training_and_prediction:
48 | +docs:
49 | node_color: "Black"
--------------------------------------------------------------------------------
/macros/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dbt-labs/python-snowpark-formula1/b6caebae7914a119c66ac16a4430f5e10faf41a8/macros/.gitkeep
--------------------------------------------------------------------------------
/macros/convert_laptime.sql:
--------------------------------------------------------------------------------
1 | {% macro convert_laptime(column_name) %}
2 | TIME_FROM_PARTS(
3 | 00, -- hours (always 0)
4 | IFF(CONTAINS({{ column_name }}, ':'),
5 | LPAD(SPLIT_PART({{ column_name }}, ':', 1), 2, '0'),
6 | 00), -- minutes
7 | IFF(CONTAINS({{ column_name }}, ':'),
8 | SPLIT_PART(SPLIT_PART({{ column_name }}, ':', 2), '.', 1),
9 | SPLIT_PART({{ column_name }}, '.', 1)), -- seconds
10 | IFF(CONTAINS({{ column_name }}, ':'),
11 | RPAD(SPLIT_PART(SPLIT_PART({{ column_name }}, ':', 2), '.', 2), 9, '0'),
12 | RPAD(SPLIT_PART({{ column_name }}, '.', 2), 9, '0')) -- nanoseconds
13 | )
14 | {% endmacro %}
--------------------------------------------------------------------------------
/macros/macros.yml:
--------------------------------------------------------------------------------
1 | version: 2
2 |
3 | macros:
4 | - name: convert_laptime
5 | description: "This macro converts formatted time string into actual time variable type."
--------------------------------------------------------------------------------
/macros/test_all_values_gte_zero.sql:
--------------------------------------------------------------------------------
1 | {% macro test_all_values_gte_zero(table, column) %}
2 |
3 | select * from {{ ref(table) }} where {{ column }} < 0
4 |
5 | {% endmacro %}
--------------------------------------------------------------------------------
/models/column_definitions.md:
--------------------------------------------------------------------------------
1 | {% docs circuit_id_def %}
2 |
3 | This column is the unique identifier for the circuit.
4 |
5 | {% enddocs %}
6 |
7 | {% docs circuit_ref_def %}
8 |
9 | This column contains the circuit reference code.
10 |
11 | {% enddocs %}
12 |
13 | {% docs circuit_name_def %}
14 |
15 | This column contains the name of the circuit.
16 |
17 | {% enddocs %}
18 |
19 | {% docs circuit_location_def %}
20 |
21 | This column contains the location of the circuit.
22 |
23 | {% enddocs %}
24 |
25 | {% docs circuit_country_def %}
26 |
27 | This column contains the country where the circuit is located.
28 |
29 | {% enddocs %}
30 |
31 | {% docs circuit_lat_def %}
32 |
33 | This column contains the latitude of the circuit.
34 |
35 | {% enddocs %}
36 |
37 | {% docs circuit_lng_def %}
38 |
39 | This column contains the longitude of the circuit.
40 |
41 | {% enddocs %}
42 |
43 | {% docs circuit_alt_def %}
44 |
45 | This column contains the altitude of the circuit.
46 |
47 | {% enddocs %}
48 |
49 | {% docs circuit_url_def %}
50 |
51 | This column contains the URL to the circuit's Wikipedia page.
52 |
53 | {% enddocs %}
54 |
55 | {% docs constructor_results_id_def %}
56 |
57 | This column is the unique identifier for each constructor result.
58 |
59 | {% enddocs %}
60 |
61 | {% docs race_id_def %}
62 |
63 | This column is the unique identifier for each race.
64 |
65 | {% enddocs %}
66 |
67 | {% docs constructor_points_def %}
68 |
69 | This column contains the points scored by the constructor in the race.
70 |
71 | {% enddocs %}
72 |
73 | {% docs constructor_status_def %}
74 |
75 | This column contains the constructor's status (e.g., "+1 Lap", "Finished", etc.).
76 |
77 | {% enddocs %}
78 |
79 | {% docs constructor_ref_def %}
80 |
81 | This column contains the constructor reference code.
82 |
83 | {% enddocs %}
84 |
85 | {% docs constructor_name_def %}
86 |
87 | This column contains the name of the constructor.
88 |
89 | {% enddocs %}
90 |
91 | {% docs constructor_url_def %}
92 |
93 | This column contains the URL to the constructor's Wikipedia page.
94 |
95 | {% enddocs %}
96 |
97 | {% docs driver_standings_id_def %}
98 |
99 | This column is the unique identifier for each driver standing.
100 |
101 | {% enddocs %}
102 |
103 | {% docs driver_points_def %}
104 |
105 | This column contains the total points scored by the driver in the season.
106 |
107 | {% enddocs %}
108 |
109 | {% docs driver_position_def %}
110 |
111 | This column contains the driver's position in the standings.
112 |
113 | {% enddocs %}
114 |
115 | {% docs position_text_def %}
116 |
117 | This column contains the text representation of the driver's position in the standings.
118 |
119 | {% enddocs %}
120 |
121 | {% docs driver_wins_def %}
122 |
123 | This column contains the number of wins by the driver in the season.
124 |
125 | {% enddocs %}
126 |
127 | {% docs driver_ref_def %}
128 |
129 | This column contains the driver reference code.
130 |
131 | {% enddocs %}
132 |
133 | {% docs driver_number_def %}
134 |
135 | This column contains the driver's racing number.
136 |
137 | {% enddocs %}
138 |
139 | {% docs drivercode_def %}
140 |
141 | This column contains the driver's three-letter code.
142 |
143 | {% enddocs %}
144 |
145 | {% docs driverforename_def %}
146 |
147 | This column contains the driver's first name.
148 |
149 | {% enddocs %}
150 |
151 | {% docs driver_surname_def %}
152 |
153 | This column contains the driver's last name.
154 |
155 | {% enddocs %}
156 |
157 | {% docs driver_dob_def %}
158 |
159 | This column contains the driver's date of birth.
160 |
161 | {% enddocs %}
162 |
163 | {% docs driver_nationality_def %}
164 |
165 | This column contains the driver's nationality.
166 |
167 | {% enddocs %}
168 |
169 | {% docs driver_url_def %}
170 |
171 | This column contains the URL to the driver's Wikipedia page.
172 |
173 | {% enddocs %}
174 |
175 | {% docs year_def %}
176 | The year in which the event (e.g., race, season) took place.
177 | {% enddocs %}
178 |
179 | {% docs round_def %}
180 | The round number of the race within the season.
181 | {% enddocs %}
182 |
183 | {% docs race_name_def %}
184 | The official name of the event (e.g., race, circuit).
185 | {% enddocs %}
186 |
187 | {% docs date_def %}
188 | The date of the event (e.g., race, season start).
189 | {% enddocs %}
190 |
191 | {% docs start_time_def %}
192 | The starting time of the race or the total race time for the driver.
193 | {% enddocs %}
194 |
195 | {% docs url_def %}
196 | The URL of the event's (e.g., race, season, driver) Wikipedia page.
197 | {% enddocs %}
198 |
199 | {% docs number_def %}
200 | The car number of the driver.
201 | {% enddocs %}
202 |
203 | {% docs grid_def %}
204 | The starting grid position of the driver.
205 | {% enddocs %}
206 |
207 | {% docs position_def %}
208 | The finishing position of the driver.
209 | {% enddocs %}
210 |
211 | {% docs position_order_def %}
212 | The finishing position order of the driver.
213 | {% enddocs %}
214 |
215 | {% docs points_def %}
216 | The points earned by the driver or constructor in the race or season.
217 | {% enddocs %}
218 |
219 | {% docs laps_def %}
220 | The number of laps completed by the driver.
221 | {% enddocs %}
222 |
223 | {% docs race_milliseconds_def %}
224 | The total race time for the driver in milliseconds.
225 | {% enddocs %}
226 |
227 | {% docs results_milliseconds_def %}
228 | The total race time for the driver in milliseconds.
229 | {% enddocs %}
230 |
231 | {% docs fastest_lap_def %}
232 | The lap number of the driver's fastest lap.
233 | {% enddocs %}
234 |
235 | {% docs rank_def %}
236 | The rank of the driver's fastest lap time among all drivers.
237 | {% enddocs %}
238 |
239 | {% docs fastest_lap_time_def %}
240 | The driver's fastest lap time.
241 | {% enddocs %}
242 |
243 | {% docs fastest_lap_speed_def %}
244 | The driver's average speed during their fastest lap.
245 | {% enddocs %}
246 |
247 | {% docs result_id_def %}
248 | This column is the unique identifier/primary key for the result table.
249 | {% enddocs %}
250 |
251 | {% docs driver_id_def %}
252 | The unique identifier of the driver.
253 | {% enddocs %}
254 |
255 | {% docs constructor_id_def %}
256 | The unique identifier of the constructor/team.
257 | {% enddocs %}
258 |
259 | {% docs status_id_def %}
260 | The unique identifier of the driver's finishing status (e.g., finished, retired).
261 | {% enddocs %}
262 |
263 | {% docs lap_def %}
264 | The lap number within the race.
265 | {% enddocs %}
266 |
267 | {% docs lap_time_def %}
268 | The lap time for the driver in the format 'MM:SS.mmm'.
269 | {% enddocs %}
270 |
271 | {% docs lap_time_milliseconds_def %}
272 | The lap time for the driver in milliseconds.
273 | {% enddocs %}
274 |
275 | {% docs contructor_name_def %}
276 | The name of the constructor/team.
277 | {% enddocs %}
278 |
279 | {% docs constructor_nationality_def %}
280 | The nationality of the constructor/team.
281 | {% enddocs %}
282 |
283 | {% docs driver_ref_num_def %}
284 | The driver's permanent reference number.
285 | {% enddocs %}
286 |
287 | {% docs driver_code_def %}
288 | The driver's three-letter abbreviation code.
289 | {% enddocs %}
290 |
291 | {% docs driver_forename_def %}
292 | The driver's first name.
293 | {% enddocs %}
294 |
295 | {% docs constructor_wins_def %}
296 |
297 | This column contains the number of wins by the constructor in the season.
298 |
299 | {% enddocs %}
300 |
301 | {% docs official_lap_time_def %}
302 | The lap time for the driver in the time format.
303 | {% enddocs %}
304 |
305 | {% docs stop_number_def %}
306 | The count of pit stops during the race.
307 | {% enddocs %}
308 |
309 | {% docs pit_stop_time_def %}
310 | Formatted pit stop time (of day)
311 | {% enddocs %}
312 |
313 | {% docs pit_stop_duration_seconds_def %}
314 | Formatted time (duration) of pitstop.
315 | {% enddocs %}
316 |
317 | {% docs pit_stop_duration_def %}
318 | Pit stop time in TIME variable.
319 | {% enddocs %}
320 |
321 | {% docs pit_stop_milliseconds_def %}
322 | NUMBER for the milliseconds spent during pit stop.
323 | {% enddocs %}
324 |
325 | {% docs qualifying_position_def %}
326 | Final qualifying position.
327 | {% enddocs %}
328 |
329 | {% docs q1_time_def %}
330 | Qualifying 1 time (MM:SS.mmm format).
331 | {% enddocs %}
332 |
333 | {% docs q2_time_def %}
334 | Qualifying 2 time (MM:SS.mmm format).
335 | {% enddocs %}
336 |
337 | {% docs q3_time_def %}
338 | Qualifying 3 time (MM:SS.mmm format).
339 | {% enddocs %}
340 |
341 | {% docs race_year_def %}
342 | The year in which the race took place.
343 | {% enddocs %}
344 |
345 | {% docs race_round_def %}
346 | The round number of the race within the championship.
347 | {% enddocs %}
348 |
349 | {% docs race_date_def %}
350 | Date of the race event (YYYY-MM-DD format).
351 | {% enddocs %}
352 |
353 | {% docs race_time_def %}
354 | Scheduled time of the race event (HH:MM:SS format).
355 | {% enddocs %}
356 |
357 | {% docs race_url_def %}
358 | Wikipedia URL for the specific race event.
359 | {% enddocs %}
360 |
361 | {% docs fp1_date_def %}
362 | Date of the first free practice session (YYYY-MM-DD format).
363 | {% enddocs %}
364 |
365 | {% docs fp1_time_def %}
366 | Scheduled time of the first free practice session (HH:MM:SS format).
367 | {% enddocs %}
368 |
369 | {% docs fp2_date_def %}
370 | Date of the second free practice session (YYYY-MM-DD format).
371 | {% enddocs %}
372 |
373 | {% docs fp2_time_def %}
374 | Scheduled time of the second free practice session (HH:MM:SS format).
375 | {% enddocs %}
376 |
377 | {% docs fp3_date_def %}
378 | Date of the third free practice session (YYYY-MM-DD format).
379 | {% enddocs %}
380 |
381 | {% docs fp3_time_def %}
382 | Scheduled time of the third free practice session (HH:MM:SS format).
383 | {% enddocs %}
384 |
385 | {% docs quali_date_def %}
386 | Date of the qualifying session (YYYY-MM-DD format).
387 | {% enddocs %}
388 |
389 | {% docs quali_time_def %}
390 | Scheduled time of the qualifying session (HH:MM:SS format).
391 | {% enddocs %}
392 |
393 | {% docs sprint_date_def %}
394 | Date of the sprint race (YYYY-MM-DD format), if applicable.
395 | {% enddocs %}
396 |
397 | {% docs sprint_time_def %}
398 | Scheduled time of the sprint race (HH:MM:SS format), if applicable.
399 | {% enddocs %}
400 |
401 | {% docs driver_rank_def %}
402 | The rank of the driver for the event.
403 | {% enddocs %}
404 |
405 | {% docs status_def %}
406 | The status during a race such as how many laps down, collision, mechanical issue, finish, etc.
407 | {% enddocs %}
408 |
409 |
--------------------------------------------------------------------------------
/models/core/core.yml:
--------------------------------------------------------------------------------
1 | version: 2
2 |
3 | models:
4 | - name: dim_circuits
5 | description: '{{ doc("dim_circuits") }}'
6 | columns:
7 | - name: dim_circuit_id
8 | description: "The primary key for this table, each circuit is distinct"
9 | tests:
10 | - unique
11 | - not_null
12 | - name: circuit_id
13 | description: '{{ doc("circuit_id_def") }}'
14 | - name: circuit_ref
15 | description: '{{ doc("circuit_ref_def") }}'
16 | - name: circuit_name
17 | description: '{{ doc("circuit_name_def") }}'
18 | - name: circuit_location
19 | description: '{{ doc("circuit_location_def") }}'
20 | - name: circuit_country
21 | description: '{{ doc("circuit_country_def") }}'
22 | - name: latitude
23 | description: '{{ doc("circuit_lat_def") }}'
24 | - name: longitude
25 | description: '{{ doc("circuit_lng_def") }}'
26 | - name: altitude
27 | description: '{{ doc("circuit_alt_def") }}'
28 | - name: circuit_url
29 | description: '{{ doc("circuit_url_def") }}'
30 |
31 | - name: fct_constructor_results
32 | description: "Fact data related to constructor (ie brand) results. This is at a race level but race info isn't in this model, just the race_id."
33 | columns:
34 | - name: constructor_results_id
35 | description: "The primary key for this table, result has its own id."
36 | tests:
37 | - unique
38 | - not_null
39 | - name: race_id
40 | description: '{{ doc("race_id_def") }}'
41 | - name: constructor_id
42 | description: '{{ doc("constructor_id_def") }}'
43 | - name: constructor_points
44 | description: '{{ doc("constructor_points_def") }}'
45 | - name: status
46 | description: '{{ doc("constructor_status_def") }}'
47 |
48 | - name: fct_constructor_standings
49 | description: "Fact data related to constructor (ie brand) results. This is at a race level but race info isn't in this model, just the race_id."
50 | columns:
51 | - name: constructor_standings_id
52 | description: "The primary key for this table, result has its own id."
53 | tests:
54 | - unique
55 | - not_null
56 | - name: race_id
57 | description: '{{ doc("race_id_def") }}'
58 | - name: constructor_id
59 | description: '{{ doc("constructor_id_def") }}'
60 | - name: points
61 | description: '{{ doc("constructor_points_def") }}'
62 | - name: constructor_position
63 | description: '{{ doc("constructor_status_def") }}'
64 | - name: position_text
65 | description: '{{ doc("constructor_points_def") }}'
66 | - name: wins
67 | description: '{{ doc("constructor_wins_def") }}'
68 |
69 | - name: dim_constructors
70 | description: "Dimensional data for the constructors (teams)."
71 | columns:
72 | - name: dim_constructor_id
73 | description: "The primary key for this table, each team has its own id."
74 | tests:
75 | - unique
76 | - not_null
77 | - name: constructor_id
78 | description: '{{ doc("constructor_id_def") }}'
79 | - name: constructor_ref
80 | description: '{{ doc("constructor_ref_def") }}'
81 | - name: constructor_name
82 | description: '{{ doc("constructor_name_def") }}'
83 | - name: constructor_nationality
84 | description: '{{ doc("constructor_nationality_def") }}'
85 |
86 | - name: fct_driver_standings
87 | description: "Fact data for driver standings by race."
88 | columns:
89 | - name: driver_standings_id
90 | description: "The primary key for this table."
91 | tests:
92 | - unique
93 | - not_null
94 | - name: race_id
95 | description: '{{ doc("race_id_def") }}'
96 | - name: driver_id
97 | description: '{{ doc("driver_id_def") }}'
98 | - name: driver_points
99 | description: '{{ doc("driver_points_def") }}'
100 | - name: driver_position
101 | description: '{{ doc("driver_position_def") }}'
102 | - name: position_text
103 | description: '{{ doc("position_text_def") }}'
104 | - name: driver_wins
105 | description: '{{ doc("driver_wins_def") }}'
106 |
107 | - name: dim_drivers
108 | description: "Dimensional model stage for driver attributes."
109 | columns:
110 | - name: driver_id
111 | description: "The primary key for this table."
112 | tests:
113 | - unique
114 | - not_null
115 | - name: driver_code
116 | description: '{{ doc("driver_code_def") }}'
117 | - name: driver_ref
118 | description: '{{ doc("driver_ref_def") }}'
119 | - name: driver_number
120 | description: '{{ doc("driver_number_def") }}'
121 | - name: forename
122 | description: '{{ doc("driver_forename_def") }}'
123 | - name: surname
124 | description: '{{ doc("driver_surname_def") }}'
125 | - name: date_of_birth
126 | description: '{{ doc("driver_dob_def") }}'
127 | - name: driver_nationality
128 | description: '{{ doc("driver_nationality_def") }}'
129 |
130 | - name: fct_lap_times
131 | description: "Fact data around every lap."
132 | columns:
133 | - name: lap_times_id
134 | description: "The primary key for this table."
135 | tests:
136 | - unique
137 | - not_null
138 | - name: race_id
139 | description: '{{ doc("race_id_def") }}'
140 | - name: driver_id
141 | description: '{{ doc("driver_id_def") }}'
142 | - name: lap
143 | description: '{{ doc("laps_def") }}'
144 | - name: driver_position
145 | description: '{{ doc("driver_position_def") }}'
146 | - name: lap_time_formatted
147 | description: '{{ doc("lap_time_def") }}'
148 | - name: official_laptime
149 | description: '{{ doc("official_lap_time_def") }}'
150 | - name: lap_time_milliseconds
151 | description: '{{ doc("lap_time_milliseconds_def") }}'
152 |
153 | - name: fct_pit_stops
154 | description: "Fact data around pitstop times."
155 | columns:
156 | - name: pitstop_id
157 | description: "The primary key for this table."
158 | tests:
159 | - unique
160 | - not_null
161 | - name: race_id
162 | description: '{{ doc("race_id_def") }}'
163 | - name: driver_id
164 | description: '{{ doc("driver_id_def") }}'
165 | - name: stop_number
166 | description: '{{ doc("laps_def") }}'
167 | - name: lap
168 | description: '{{ doc("stop_number_def") }}'
169 | - name: pit_stop_time
170 | description: '{{ doc("pit_stop_time_def") }}'
171 | - name: pit_stop_duration_seconds
172 | description: '{{ doc("pit_stop_duration_seconds_def") }}'
173 | - name: pit_stop_duration
174 | description: '{{ doc("pit_stop_duration_def") }}'
175 | - name: pit_stop_milliseconds
176 | description: '{{ doc("pit_stop_milliseconds_def") }}'
177 |
178 | - name: fct_qualifying
179 | description: "Fact data around qualifying times."
180 | columns:
181 | - name: qualifying_id
182 | description: "The primary key for this table."
183 | tests:
184 | - unique
185 | - not_null
186 | - name: race_id
187 | description: '{{ doc("race_id_def") }}'
188 | - name: driver_id
189 | description: '{{ doc("driver_id_def") }}'
190 | - name: constructor_id
191 | description: '{{ doc("constructor_id_def") }}'
192 | - name: driver_number
193 | description: '{{ doc("driver_number_def") }}'
194 | - name: qualifying_position
195 | description: '{{ doc("qualifying_position_def") }}'
196 | - name: q1_time
197 | description: '{{ doc("q1_time_def") }}'
198 | - name: q2_time
199 | description: '{{ doc("q2_time_def") }}'
200 | - name: q3_time
201 | description: '{{ doc("q3_time_def") }}'
202 |
203 | - name: fct_qualifying_pivoted
204 | description: "Fact data around qualifying times."
205 | columns:
206 | - name: qualifying_pivoted_id
207 | description: "The primary key for this table."
208 | tests:
209 | - unique
210 | - not_null
211 | - name: qualifying_round
212 | description: "The round of qualifying, must be distinct values 1, 2 or 3."
213 | tests:
214 | - accepted_values:
215 | values: ['1', '2', '3']
216 |
217 | - name: dim_races
218 | description: "Dimensional data around race events."
219 | columns:
220 | - name: race_id
221 | description: "The primary key for this table."
222 | tests:
223 | - unique
224 | - not_null
225 | - name: race_year
226 | description: '{{ doc("race_year_def") }}'
227 | - name: race_round
228 | description: '{{ doc("race_round_def") }}'
229 | - name: circuit_id
230 | description: '{{ doc("circuit_id_def") }}'
231 | - name: race_name
232 | description: '{{ doc("race_name_def") }}'
233 | - name: race_date
234 | description: '{{ doc("race_date_def") }}'
235 | - name: race_time
236 | description: '{{ doc("race_time_def") }}'
237 | - name: race_url
238 | description: '{{ doc("race_url_def") }}'
239 | - name: fp1_date
240 | description: '{{ doc("fp1_date_def") }}'
241 | - name: fp1_time
242 | description: '{{ doc("fp1_time_def") }}'
243 | - name: fp2_date
244 | description: '{{ doc("fp2_date_def") }}'
245 | - name: fp2_time
246 | description: '{{ doc("fp2_time_def") }}'
247 | - name: fp3_date
248 | description: '{{ doc("fp3_date_def") }}'
249 | - name: fp3_time
250 | description: '{{ doc("fp3_time_def") }}'
251 | - name: quali_date
252 | description: '{{ doc("quali_date_def") }}'
253 | - name: quali_time
254 | description: '{{ doc("quali_time_def") }}'
255 | - name: sprint_date
256 | description: '{{ doc("sprint_date_def") }}'
257 | - name: sprint_time
258 | description: '{{ doc("sprint_time_def") }}'
259 |
260 | - name: fct_results
261 | description: "Fact data around qualifying times."
262 | columns:
263 | - name: fct_result_id
264 | description: "The primary key for this table."
265 | tests:
266 | - unique
267 | - not_null
268 | - name: result_id
269 | description: '{{ doc("result_id_def") }}'
270 | - name: race_id
271 | description: '{{ doc("race_id_def") }}'
272 | - name: driver_id
273 | description: '{{ doc("driver_id_def") }}'
274 | - name: constructor_id
275 | description: '{{ doc("constructor_id_def") }}'
276 | - name: driver_number
277 | description: '{{ doc("driver_number_def") }}'
278 | - name: grid
279 | description: '{{ doc("grid_def") }}'
280 | - name: driver_position
281 | description: '{{ doc("driver_position_def") }}'
282 | - name: position_text
283 | description: '{{ doc("position_text_def") }}'
284 | - name: position_order
285 | description: '{{ doc("position_order_def") }}'
286 | - name: points
287 | description: '{{ doc("points_def") }}'
288 | - name: laps
289 | description: '{{ doc("laps_def") }}'
290 | - name: race_time
291 | description: '{{ doc("race_time_def") }}'
292 | - name: milliseconds
293 | description: '{{ doc("results_milliseconds_def") }}'
294 | - name: fastest_lap
295 | description: '{{ doc("fastest_lap_def") }}'
296 | - name: driver_rank
297 | description: '{{ doc("driver_rank_def") }}'
298 | - name: fastest_lap_time
299 | description: '{{ doc("fastest_lap_time_def") }}'
300 | - name: fastest_lap_speed
301 | description: '{{ doc("fastest_lap_speed_def") }}'
302 | - name: status_id
303 | description: '{{ doc("status_id_def") }}'
304 |
305 | - name: dim_seasons
306 | description: "Season information"
307 | columns:
308 | - name: season_year
309 | description: "The primary key for this table."
310 | tests:
311 | - unique
312 | - not_null
313 | - name: season_url
314 | description: "The Wiki URL for F1 Season"
315 |
316 | - name: fct_sprint_results
317 | description: "Fact data around qualifying times."
318 | columns:
319 | - name: result_id
320 | description: "The primary key for this table."
321 | tests:
322 | - unique
323 | - not_null
324 | - name: race_id
325 | description: '{{ doc("race_id_def") }}'
326 | - name: driver_id
327 | description: '{{ doc("driver_id_def") }}'
328 | - name: constructor_id
329 | description: '{{ doc("constructor_id_def") }}'
330 | - name: driver_number
331 | description: '{{ doc("driver_number_def") }}'
332 | - name: grid
333 | description: '{{ doc("grid_def") }}'
334 | - name: driver_position
335 | description: '{{ doc("driver_position_def") }}'
336 | - name: position_text
337 | description: '{{ doc("position_text_def") }}'
338 | - name: position_order
339 | description: '{{ doc("position_order_def") }}'
340 | - name: points
341 | description: '{{ doc("points_def") }}'
342 | - name: laps
343 | description: '{{ doc("laps_def") }}'
344 | - name: sprint_time
345 | description: '{{ doc("sprint_time_def") }}'
346 | - name: milliseconds
347 | description: '{{ doc("results_milliseconds_def") }}'
348 | - name: fastest_lap
349 | description: '{{ doc("fastest_lap_def") }}'
350 | - name: fastest_lap_time
351 | description: '{{ doc("fastest_lap_time_def") }}'
352 | - name: status_id
353 | description: '{{ doc("status_id_def") }}'
354 |
355 |
356 | - name: dim_status
357 | description: "Status codes mapping."
358 | columns:
359 | - name: status_id
360 | description: "The primary key for this table."
361 | tests:
362 | - unique
363 | - not_null
364 | - name: status
365 | description: '{{ doc("status_def") }}'
--------------------------------------------------------------------------------
/models/core/core_table_definitions.md:
--------------------------------------------------------------------------------
1 | {% docs dim_circuits %}
2 |
3 | This table contains dimensional data related to F1 circuits.
4 |
5 | {% enddocs %}
--------------------------------------------------------------------------------
/models/core/dim_circuits.sql:
--------------------------------------------------------------------------------
1 | {{ config(
2 | tags=["dimension", "track"]
3 | ) }}
4 |
5 | WITH circuits AS (
6 | SELECT * FROM {{ ref('stg_circuits') }}
7 | )
8 |
9 | SELECT
10 | {{ dbt_utils.generate_surrogate_key(['circuit_id']) }} AS dim_circuit_id,
11 | *
12 | FROM circuits
--------------------------------------------------------------------------------
/models/core/dim_constructors.sql:
--------------------------------------------------------------------------------
1 | WITH constructors AS (
2 | SELECT * FROM {{ ref('stg_constructors') }}
3 | )
4 |
5 | SELECT
6 | {{ dbt_utils.generate_surrogate_key(['constructor_id']) }} AS dim_constructor_id,
7 | constructor_id AS constructor_id,
8 | constructor_ref AS constructor_ref,
9 | constructor_name AS constructor_name,
10 | constructor_nationality AS constructor_nationality
11 | --constructor_url AS constructor_url
12 | FROM constructors
--------------------------------------------------------------------------------
/models/core/dim_drivers.sql:
--------------------------------------------------------------------------------
1 | WITH drivers AS (
2 | SELECT
3 | driver_id AS driver_id,
4 | driver_ref AS driver_ref,
5 | driver_number AS driver_number,
6 | driver_code AS driver_code,
7 | forename AS forename,
8 | surname AS surname,
9 | date_of_birth AS date_of_birth,
10 | driver_nationality AS driver_nationality
11 | --driver_url AS driver_url
12 | FROM {{ ref('stg_drivers') }}
13 | )
14 |
15 | SELECT * FROM drivers
--------------------------------------------------------------------------------
/models/core/dim_races.sql:
--------------------------------------------------------------------------------
1 | WITH races AS (
2 | SELECT
3 | race_id AS race_id,
4 | race_year AS race_year,
5 | race_round AS race_round,
6 | circuit_id AS circuit_id,
7 | race_name AS race_name,
8 | race_date AS race_date,
9 | race_time AS race_time,
10 | --race_url AS race_url,
11 | fp1_date AS fp1_date,
12 | fp1_time AS fp1_time,
13 | fp2_date AS fp2_date,
14 | fp2_time AS fp2_time,
15 | fp3_date AS fp3_date,
16 | fp3_time AS fp3_time,
17 | quali_date AS quali_date,
18 | quali_time AS quali_time,
19 | sprint_date AS sprint_date,
20 | sprint_time AS sprint_time
21 | FROM {{ ref('stg_races') }}
22 | )
23 |
24 | SELECT * FROM races
--------------------------------------------------------------------------------
/models/core/dim_seasons.sql:
--------------------------------------------------------------------------------
1 | WITH seasons AS (
2 | SELECT
3 | season_year AS season_year,
4 | season_url AS season_url
5 | FROM {{ ref('stg_seasons') }}
6 | )
7 |
8 | SELECT * FROM seasons
--------------------------------------------------------------------------------
/models/core/dim_status.sql:
--------------------------------------------------------------------------------
1 | WITH status AS (
2 | SELECT
3 | status_id AS status_id,
4 | status AS status
5 | FROM {{ ref('stg_status') }}
6 | )
7 |
8 | SELECT * FROM status
--------------------------------------------------------------------------------
/models/core/fct_constructor_results.sql:
--------------------------------------------------------------------------------
1 | WITH constructor_results AS (
2 | SELECT * FROM {{ ref('stg_constructor_results') }}
3 | )
4 |
5 | SELECT
6 | constructor_results_id,
7 | race_id,
8 | constructor_id,
9 | constructor_points,
10 | status
11 | FROM constructor_results
--------------------------------------------------------------------------------
/models/core/fct_constructor_standings.sql:
--------------------------------------------------------------------------------
1 | WITH constructor_standings AS (
2 | SELECT
3 | constructor_standings_id AS constructor_standings_id,
4 | race_id AS race_id,
5 | constructor_id AS constructor_id,
6 | points AS points,
7 | constructor_position AS constructor_position,
8 | position_text AS position_text,
9 | wins AS wins
10 | FROM {{ ref('stg_constructor_standings') }}
11 | )
12 |
13 | SELECT * FROM constructor_standings
--------------------------------------------------------------------------------
/models/core/fct_driver_standings.sql:
--------------------------------------------------------------------------------
1 | WITH driver_standings AS (
2 | SELECT
3 | driver_standings_id AS driver_standings_id,
4 | race_id AS race_id,
5 | driver_id AS driver_id,
6 | driver_points AS driver_points,
7 | driver_position AS driver_position,
8 | position_text AS position_text,
9 | driver_wins AS driver_wins
10 | FROM {{ ref('stg_driver_standings') }}
11 | )
12 |
13 | SELECT * FROM driver_standings
--------------------------------------------------------------------------------
/models/core/fct_lap_times.sql:
--------------------------------------------------------------------------------
1 | WITH lap_times AS (
2 | SELECT
3 | {{ dbt_utils.generate_surrogate_key(['race_id', 'driver_id', 'lap']) }} AS lap_times_id,
4 | race_id AS race_id,
5 | driver_id AS driver_id,
6 | lap AS lap,
7 | driver_position AS driver_position,
8 | lap_time_formatted AS lap_time_formatted,
9 | official_laptime AS official_laptime,
10 | lap_time_milliseconds AS lap_time_milliseconds
11 | FROM {{ ref('stg_lap_times') }}
12 | )
13 |
14 | SELECT * FROM lap_times
--------------------------------------------------------------------------------
/models/core/fct_pit_stops.sql:
--------------------------------------------------------------------------------
1 | WITH pitstops AS (
2 | SELECT
3 | {{ dbt_utils.generate_surrogate_key(['race_id', 'driver_id', 'stop_number']) }} AS pitstop_id,
4 | race_id AS race_id,
5 | driver_id AS driver_id,
6 | stop_number AS stop_number,
7 | lap AS lap,
8 | pit_stop_time AS pit_stop_time,
9 | pit_stop_duration_seconds AS pit_stop_duration_seconds,
10 | pit_stop_duration AS pit_stop_duration,
11 | pit_stop_milliseconds AS pit_stop_milliseconds,
12 | max(stop_number) over (partition by race_id,driver_id) as total_pit_stops_per_race
13 | FROM {{ ref('stg_pit_stops') }}
14 | )
15 |
16 | SELECT * FROM pitstops
--------------------------------------------------------------------------------
/models/core/fct_qualifying.sql:
--------------------------------------------------------------------------------
1 | {{ config(
2 | enabled=false
3 | ) }}
4 | WITH qualifying AS (
5 | SELECT
6 | qualifying_id AS qualifying_id,
7 | race_id AS race_id,
8 | driver_id AS driver_id,
9 | constructor_id AS constructor_id,
10 | driver_number AS driver_number,
11 | qualifying_position AS qualifying_position,
12 | q1_time AS q1_time,
13 | q2_time AS q2_time,
14 | q3_time AS q3_time
15 | FROM {{ ref('stg_qualifying') }}
16 | )
17 |
18 | SELECT * FROM qualifying
19 |
--------------------------------------------------------------------------------
/models/core/fct_results.sql:
--------------------------------------------------------------------------------
1 | {{ config(
2 | tags=["race", "laptimes"],
3 | materialized = "incremental",
4 | unique_key = "result_id"
5 | ) }}
6 |
7 | WITH results AS (
8 | SELECT
9 | {{ dbt_utils.generate_surrogate_key(['result_id']) }} AS fct_result_id,
10 | result_id AS result_id,
11 | race_id AS race_id,
12 | driver_id AS driver_id,
13 | constructor_id AS constructor_id,
14 | driver_number AS driver_number,
15 | grid AS grid,
16 | driver_position AS driver_position,
17 | position_text AS position_text,
18 | position_order AS position_order,
19 | points AS points,
20 | laps AS laps,
21 | race_time AS race_time,
22 | milliseconds AS milliseconds,
23 | fastest_lap AS fastest_lap,
24 | driver_rank AS driver_rank,
25 | fastest_lap_time AS fastest_lap_time,
26 | fastest_lap_speed AS fastest_lap_speed,
27 | status_id AS status_id
28 | FROM {{ ref('stg_results') }}
29 | )
30 |
31 | SELECT * FROM results
--------------------------------------------------------------------------------
/models/core/fct_sprint_results.sql:
--------------------------------------------------------------------------------
1 | WITH sprint_results AS (
2 | SELECT
3 | result_id AS result_id,
4 | race_id AS race_id,
5 | driver_id AS driver_id,
6 | constructor_id AS constructor_id,
7 | driver_number AS driver_number,
8 | grid AS grid,
9 | driver_position AS driver_position,
10 | position_text AS position_text,
11 | position_order AS position_order,
12 | points AS points,
13 | laps AS laps,
14 | sprint_time AS sprint_time,
15 | milliseconds AS milliseconds,
16 | fastest_lap AS fastest_lap,
17 | fastest_lap_time AS fastest_lap_time,
18 | status_id AS status_id
19 | FROM {{ ref('stg_sprint_results') }}
20 | )
21 |
22 | SELECT * FROM sprint_results
--------------------------------------------------------------------------------
/models/marts/aggregates/agg_fastest_pit_stops_by_constructor.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pandas as pd
3 |
4 | def model(dbt, session):
5 | # dbt configuration
6 | dbt.config(packages=["pandas","numpy"])
7 |
8 | # get upstream data
9 | pit_stops_joined = dbt.ref("mrt_pit_stops").to_pandas()
10 |
11 | # provide year so we do not hardcode dates
12 | year=2021
13 |
14 | # describe the data
15 | pit_stops_joined["PIT_STOP_SECONDS"] = pit_stops_joined["PIT_STOP_MILLISECONDS"]/1000
16 | fastest_pit_stops = pit_stops_joined[(pit_stops_joined["RACE_YEAR"]==year)].groupby(by="CONSTRUCTOR_NAME")["PIT_STOP_SECONDS"].describe().sort_values(by='mean')
17 | fastest_pit_stops.reset_index(inplace=True)
18 | fastest_pit_stops.columns = fastest_pit_stops.columns.str.upper()
19 |
20 | return fastest_pit_stops.round(2)
--------------------------------------------------------------------------------
/models/marts/aggregates/agg_lap_times_moving_avg.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 |
3 | def model(dbt, session):
4 | # dbt configuration
5 | dbt.config(packages=["pandas"])
6 |
7 | # get upstream data
8 | lap_times = dbt.ref("mrt_lap_times_years").to_pandas()
9 |
10 | # describe the data
11 | lap_times["LAP_TIME_SECONDS"] = lap_times["LAP_TIME_MILLISECONDS"]/1000
12 | lap_time_trends = lap_times.groupby(by="RACE_YEAR")["LAP_TIME_SECONDS"].mean().to_frame()
13 | lap_time_trends.reset_index(inplace=True)
14 | lap_time_trends["LAP_MOVING_AVG_5_YEARS"] = lap_time_trends["LAP_TIME_SECONDS"].rolling(5).mean()
15 | lap_time_trends.columns = lap_time_trends.columns.str.upper()
16 |
17 | return lap_time_trends.round(1)
--------------------------------------------------------------------------------
/models/marts/aggregates/aggregates.yml:
--------------------------------------------------------------------------------
1 | version: 2
2 |
3 | models:
4 | - name: fastest_pit_stops_by_constructor
5 | description: Use the python .describe() method to retrieve summary statistics table about pit stops by constructor. Sort by average stop time ascending so the first row returns the fastest constructor.
6 | columns:
7 | - name: constructor_name
8 | description: team that makes the car
9 | tests:
10 | - unique
11 |
12 | - name: lap_times_moving_avg
13 | description: Use the python .rolling() method to calculate the 5 year rolling average of pit stop times alongside the average for each year.
14 | columns:
15 | - name: race_year
16 | description: year of the race
17 | tests:
18 | - relationships:
19 | to: ref('int_lap_times_years')
20 | field: race_year
21 |
--------------------------------------------------------------------------------
/models/marts/mrt.md:
--------------------------------------------------------------------------------
1 | # the intent of this .md is to allow for multi-line long form explanations for our intermediate transformations
2 |
3 | # below are descriptions
4 | {% docs int_results %} In this query we want to join out other important information about the race results to have a human readable table about results, races, drivers, constructors, and status.
5 | We will have 4 left joins onto our results table. {% enddocs %}
6 |
7 | {% docs int_pit_stops %} There are many pit stops within one race, aka a M:1 relationship.
8 | We want to aggregate this so we can properly join pit stop information without creating a fanout. {% enddocs %}
9 |
10 | {% docs int_lap_times_years %} Lap times are done per lap. We need to join them out to the race year to understand yearly lap time trends. {% enddocs %}
--------------------------------------------------------------------------------
/models/marts/mrt.yml:
--------------------------------------------------------------------------------
1 | version: 2
2 |
3 | models:
4 | - name: int_results
5 | description: '{{ doc("int_results") }}'
6 |
7 | - name: int_pit_stops
8 | description: '{{ doc("int_pit_stops") }}'
9 |
10 | - name: int_lap_times_years
11 | description: '{{ doc("int_lap_times_years") }}'
--------------------------------------------------------------------------------
/models/marts/mrt_lap_times_years.sql:
--------------------------------------------------------------------------------
1 | with lap_times as (
2 |
3 | select * from {{ ref('fct_lap_times') }}
4 |
5 | ),
6 | races as (
7 |
8 | select * from {{ ref('dim_races') }}
9 | ),
10 | expanded_lap_times_by_year as (
11 | select
12 | lap_times.race_id,
13 | driver_id,
14 | race_year,
15 | lap,
16 | lap_time_milliseconds
17 | from lap_times
18 | left join races
19 | on lap_times.race_id = races.race_id
20 | where lap_time_milliseconds is not null
21 | )
22 | select * from expanded_lap_times_by_year
--------------------------------------------------------------------------------
/models/marts/mrt_pit_stops.sql:
--------------------------------------------------------------------------------
1 | with base_results as (
2 |
3 | select * from {{ ref('mrt_results') }}
4 |
5 | ),
6 |
7 | pit_stops as (
8 |
9 | select * from {{ ref('fct_pit_stops') }}
10 |
11 | ),
12 |
13 | pit_stops_joined as (
14 |
15 | select
16 | base_results.race_id,
17 | race_year,
18 | base_results.driver_id,
19 | constructor_id,
20 | constructor_name,
21 | stop_number,
22 | lap,
23 | pit_stop_time,
24 | pit_stop_duration_seconds,
25 | pit_stop_milliseconds,
26 | total_pit_stops_per_race
27 | from base_results
28 | left join pit_stops
29 | on base_results.race_id=pit_stops.race_id and base_results.driver_id=pit_stops.driver_id
30 | )
31 | select * from pit_stops_joined
--------------------------------------------------------------------------------
/models/marts/mrt_results.sql:
--------------------------------------------------------------------------------
1 | with results as (
2 |
3 | select * from {{ ref('fct_results') }}
4 |
5 | ),
6 |
7 | races as (
8 |
9 | select * from {{ ref('dim_races') }}
10 |
11 | ),
12 |
13 | drivers as (
14 |
15 | select * from {{ ref('dim_drivers') }}
16 |
17 | ),
18 |
19 | constructors as (
20 |
21 | select * from {{ ref('dim_constructors') }}
22 |
23 | ),
24 |
25 | status as (
26 |
27 | select * from {{ ref('dim_status') }}
28 |
29 | ),
30 |
31 | int_results as (
32 | select
33 | result_id,
34 | results.race_id,
35 | race_year,
36 | race_round,
37 | circuit_id,
38 | race_name,
39 | race_date,
40 | results.race_time,
41 | results.driver_id,
42 | results.driver_number,
43 | forename ||' '|| surname as driver,
44 | cast(datediff('year', date_of_birth, race_date) as int) as drivers_age_years,
45 | driver_nationality,
46 | results.constructor_id,
47 | constructor_name,
48 | constructor_nationality,
49 | grid,
50 | driver_position,
51 | position_text,
52 | position_order,
53 | points,
54 | laps,
55 | milliseconds,
56 | fastest_lap,
57 | driver_rank,
58 | fastest_lap_time,
59 | fastest_lap_speed,
60 | results.status_id,
61 | status,
62 | case when driver_position is null then 1 else 0 end as dnf_flag
63 | from results
64 | left join races
65 | on results.race_id=races.race_id
66 | left join drivers
67 | on results.driver_id = drivers.driver_id
68 | left join constructors
69 | on results.constructor_id = constructors.constructor_id
70 | left join status
71 | on results.status_id = status.status_id
72 | )
73 |
74 | select * from int_results
75 |
--------------------------------------------------------------------------------
/models/marts/mrt_results_circuits.sql:
--------------------------------------------------------------------------------
1 | with int_results as (
2 |
3 | select * from {{ ref('mrt_results') }}
4 |
5 | ),
6 |
7 | int_pit_stops as (
8 | select
9 | race_id,
10 | driver_id,
11 | max(total_pit_stops_per_race) as total_pit_stops_per_race
12 | from {{ ref('mrt_pit_stops') }}
13 | group by 1,2
14 |
15 | ),
16 |
17 | circuits as (
18 |
19 | select * from {{ ref('dim_circuits') }}
20 |
21 | ),
22 |
23 | base_results as (
24 | select
25 | result_id,
26 | int_results.race_id,
27 | race_year,
28 | race_round,
29 | int_results.circuit_id,
30 | int_results.race_name,
31 | circuit_ref,
32 | circuit_location,
33 | circuit_country,
34 | latitude,
35 | longitude,
36 | altitude,
37 | total_pit_stops_per_race,
38 | race_date,
39 | race_time,
40 | int_results.driver_id,
41 | driver,
42 | driver_number,
43 | drivers_age_years,
44 | driver_nationality,
45 | constructor_id,
46 | constructor_name,
47 | constructor_nationality,
48 | grid,
49 | driver_position,
50 | position_text,
51 | position_order,
52 | points,
53 | laps,
54 | milliseconds,
55 | fastest_lap,
56 | driver_rank,
57 | fastest_lap_time,
58 | fastest_lap_speed,
59 | status_id,
60 | status,
61 | dnf_flag
62 | from int_results
63 | left join circuits
64 | on int_results.circuit_id=circuits.circuit_id
65 | left join int_pit_stops
66 | on int_results.driver_id=int_pit_stops.driver_id and int_results.race_id=int_pit_stops.race_id
67 | )
68 |
69 | select * from base_results
--------------------------------------------------------------------------------
/models/ml/prep_encoding_splitting/covariate_encoding.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import numpy as np
3 | from sklearn.preprocessing import StandardScaler,LabelEncoder,OneHotEncoder
4 | from sklearn.linear_model import LogisticRegression
5 |
6 | def model(dbt, session):
7 | # dbt configuration
8 | dbt.config(packages=["pandas","numpy","scikit-learn"])
9 |
10 | # get upstream data
11 | data = dbt.ref("ml_data_prep").to_pandas()
12 |
13 | # list out covariates we want to use in addition to outcome variable we are modeling - position
14 | covariates = data[['RACE_YEAR','RACE_NAME','GRID','CONSTRUCTOR_NAME','DRIVER','DRIVERS_AGE_YEARS','DRIVER_CONFIDENCE','CONSTRUCTOR_RELAIBLITY','TOTAL_PIT_STOPS_PER_RACE','ACTIVE_DRIVER','ACTIVE_CONSTRUCTOR', 'DRIVER_POSITION']]
15 |
16 | # filter covariates on active drivers and constructors
17 | # use fil_cov as short for "filtered_covariates"
18 | fil_cov = covariates[(covariates['ACTIVE_DRIVER']==1)&(covariates['ACTIVE_CONSTRUCTOR']==1)]
19 |
20 | # Encode categorical variables using LabelEncoder
21 | # TODO: we'll update this to both ohe in the future for non-ordinal variables!
22 | le = LabelEncoder()
23 | fil_cov['RACE_NAME'] = le.fit_transform(fil_cov['RACE_NAME'])
24 | fil_cov['CONSTRUCTOR_NAME'] = le.fit_transform(fil_cov['CONSTRUCTOR_NAME'])
25 | fil_cov['DRIVER'] = le.fit_transform(fil_cov['DRIVER'])
26 | fil_cov['TOTAL_PIT_STOPS_PER_RACE'] = le.fit_transform(fil_cov['TOTAL_PIT_STOPS_PER_RACE'])
27 |
28 | # Simply target variable "position" to represent 3 meaningful categories in Formula1
29 | # 1. Podium position 2. Points for team 3. Nothing - no podium or points!
30 | def position_index(x):
31 | if x<4:
32 | return 1
33 | if x>10:
34 | return 3
35 | else :
36 | return 2
37 |
38 | # we are dropping the columns that we filtered on in addition to our training variable
39 | encoded_data = fil_cov.drop(['ACTIVE_DRIVER','ACTIVE_CONSTRUCTOR'],1)
40 | encoded_data['POSITION_LABEL']= encoded_data['DRIVER_POSITION'].apply(lambda x: position_index(x))
41 | encoded_data_grouped_target = encoded_data.drop(['DRIVER_POSITION'],1)
42 |
43 | return encoded_data_grouped_target
--------------------------------------------------------------------------------
/models/ml/prep_encoding_splitting/encoding_mapping.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import numpy as np
3 | from sklearn.preprocessing import StandardScaler,LabelEncoder,OneHotEncoder
4 | from sklearn.linear_model import LogisticRegression
5 |
6 | def model(dbt, session):
7 | # dbt configuration
8 | dbt.config(packages=["pandas","numpy","scikit-learn"])
9 |
10 | # get upstream data
11 | data = dbt.ref("ml_data_prep").to_pandas()
12 |
13 | # list out covariates we want to use in addition to outcome variable we are modeling - position
14 | covariates = data[['RACE_YEAR','RACE_NAME','GRID','CONSTRUCTOR_NAME','DRIVER','DRIVERS_AGE_YEARS','DRIVER_CONFIDENCE','CONSTRUCTOR_RELAIBLITY','TOTAL_PIT_STOPS_PER_RACE','ACTIVE_DRIVER','ACTIVE_CONSTRUCTOR', 'DRIVER_POSITION']]
15 |
16 | # filter covariates on active drivers and constructors
17 | # use fil_cov as short for "filtered_covariates"
18 | fil_cov = covariates[(covariates['ACTIVE_DRIVER']==1)&(covariates['ACTIVE_CONSTRUCTOR']==1)]
19 |
20 | # Encode categorical variables using LabelEncoder
21 | # TODO: we'll update this to both ohe in the future for non-ordinal variables!
22 | le = LabelEncoder()
23 | fil_cov['DRIVER'] = le.fit_transform(fil_cov['DRIVER'])
24 |
25 | encoding_mapping = pd.DataFrame({'original_driver_value': le.classes_, 'encoded_driver_value': range(len(le.classes_))})
26 | encoding_mapping.columns = encoding_mapping.columns.str.upper()
27 |
28 |
29 |
30 | return encoding_mapping
31 |
32 |
33 |
--------------------------------------------------------------------------------
/models/ml/prep_encoding_splitting/hold_out_dataset_for_prediction.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 |
3 | def model(dbt, session):
4 | # dbt configuration
5 | dbt.config(packages=["pandas"], tags="predict")
6 |
7 | # get upstream data
8 | encoding = dbt.ref("covariate_encoding").to_pandas()
9 |
10 | # variable for year instead of hardcoding it
11 | year=2020
12 |
13 | # filter the data based on the specified year
14 | hold_out_dataset = encoding.loc[encoding['RACE_YEAR'] == year]
15 |
16 | return hold_out_dataset
17 |
18 |
--------------------------------------------------------------------------------
/models/ml/prep_encoding_splitting/ml_data_prep.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 |
3 | def model(dbt, session):
4 | # dbt configuration
5 | dbt.config(packages=["pandas"])
6 |
7 | # get upstream data
8 | fct_results = dbt.ref("mrt_results_circuits").to_pandas()
9 |
10 | # provide years so we do not hardcode dates in filter command
11 | start_year=2010
12 | end_year=2020
13 |
14 | # describe the data for a full decade
15 | data = fct_results.loc[fct_results['RACE_YEAR'].between(start_year, end_year)]
16 |
17 | # convert string to an integer
18 | # data['POSITION'] = data['POSITION'].astype(float)
19 |
20 | # we cannot have nulls if we want to use total pit stops
21 | data['TOTAL_PIT_STOPS_PER_RACE'] = data['TOTAL_PIT_STOPS_PER_RACE'].fillna(0)
22 |
23 | # some of the constructors changed their name over the year so replacing old names with current name
24 | mapping = {'Force India': 'Racing Point', 'Sauber': 'Alfa Romeo', 'Lotus F1': 'Renault', 'Toro Rosso': 'AlphaTauri'}
25 | data['CONSTRUCTOR_NAME'].replace(mapping, inplace=True)
26 |
27 | # create confidence metrics for drivers and constructors
28 | dnf_by_driver = data.groupby('DRIVER').sum()['DNF_FLAG']
29 | driver_race_entered = data.groupby('DRIVER').count()['DNF_FLAG']
30 | driver_dnf_ratio = (dnf_by_driver/driver_race_entered)
31 | driver_confidence = 1-driver_dnf_ratio
32 | driver_confidence_dict = dict(zip(driver_confidence.index,driver_confidence))
33 |
34 | dnf_by_constructor = data.groupby('CONSTRUCTOR_NAME').sum()['DNF_FLAG']
35 | constructor_race_entered = data.groupby('CONSTRUCTOR_NAME').count()['DNF_FLAG']
36 | constructor_dnf_ratio = (dnf_by_constructor/constructor_race_entered)
37 | constructor_relaiblity = 1-constructor_dnf_ratio
38 | constructor_relaiblity_dict = dict(zip(constructor_relaiblity.index,constructor_relaiblity))
39 |
40 | data['DRIVER_CONFIDENCE'] = data['DRIVER'].apply(lambda x:driver_confidence_dict[x])
41 | data['CONSTRUCTOR_RELAIBLITY'] = data['CONSTRUCTOR_NAME'].apply(lambda x:constructor_relaiblity_dict[x])
42 |
43 | #removing retired drivers and constructors
44 | active_constructors = ['Renault', 'Williams', 'McLaren', 'Ferrari', 'Mercedes',
45 | 'AlphaTauri', 'Racing Point', 'Alfa Romeo', 'Red Bull',
46 | 'Haas F1 Team']
47 | active_drivers = ['Daniel Ricciardo', 'Kevin Magnussen', 'Carlos Sainz',
48 | 'Valtteri Bottas', 'Lance Stroll', 'George Russell',
49 | 'Lando Norris', 'Sebastian Vettel', 'Kimi Räikkönen',
50 | 'Charles Leclerc', 'Lewis Hamilton', 'Daniil Kvyat',
51 | 'Max Verstappen', 'Pierre Gasly', 'Alexander Albon',
52 | 'Sergio Pérez', 'Esteban Ocon', 'Antonio Giovinazzi',
53 | 'Romain Grosjean','Nicholas Latifi']
54 |
55 | # create flags for active drivers and constructors so we can filter downstream
56 | data['ACTIVE_DRIVER'] = data['DRIVER'].apply(lambda x: int(x in active_drivers))
57 | data['ACTIVE_CONSTRUCTOR'] = data['CONSTRUCTOR_NAME'].apply(lambda x: int(x in active_constructors))
58 |
59 | return data
--------------------------------------------------------------------------------
/models/ml/prep_encoding_splitting/training_testing_dataset.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 |
3 | def model(dbt, session):
4 |
5 | # dbt configuration
6 | dbt.config(packages=["pandas"], tags="train")
7 |
8 | # get upstream data
9 | encoding = dbt.ref("covariate_encoding").to_pandas()
10 |
11 | # provide years so we do not hardcode dates in filter command
12 | start_year=2010
13 | end_year=2019
14 |
15 | # describe the data for a full decade
16 | train_test_dataset = encoding.loc[encoding['RACE_YEAR'].between(start_year, end_year)]
17 |
18 | return train_test_dataset
--------------------------------------------------------------------------------
/models/ml/training_and_prediction/apply_prediction_to_position.py:
--------------------------------------------------------------------------------
1 | import logging
2 | import joblib
3 | import pandas as pd
4 | import os
5 | from snowflake.snowpark import types as T
6 |
7 | DB_STAGE = 'MODELSTAGE'
8 | version = '1.0'
9 | # The name of the model file
10 | model_file_path = 'driver_position_'+version
11 | model_file_packaged = 'driver_position_'+version+'.joblib'
12 |
13 | # This is a local directory, used for storing the various artifacts locally
14 | LOCAL_TEMP_DIR = f'/tmp/driver_position'
15 | DOWNLOAD_DIR = os.path.join(LOCAL_TEMP_DIR, 'download')
16 | TARGET_MODEL_DIR_PATH = os.path.join(LOCAL_TEMP_DIR, 'ml_model')
17 | TARGET_LIB_PATH = os.path.join(LOCAL_TEMP_DIR, 'lib')
18 |
19 | # The feature columns that were used during model training
20 | # and that will be used during prediction
21 | FEATURE_COLS = [
22 | "RACE_YEAR"
23 | ,"RACE_NAME"
24 | ,"GRID"
25 | ,"CONSTRUCTOR_NAME"
26 | ,"DRIVER"
27 | ,"DRIVERS_AGE_YEARS"
28 | ,"DRIVER_CONFIDENCE"
29 | ,"CONSTRUCTOR_RELAIBLITY"
30 | ,"TOTAL_PIT_STOPS_PER_RACE"]
31 |
32 | def register_udf_for_prediction(p_predictor ,p_session ,p_dbt):
33 |
34 | # The prediction udf
35 |
36 | def predict_position(p_df: T.PandasDataFrame[int, int, int, int,
37 | int, int, int, int, int]) -> T.PandasSeries[int]:
38 | # Snowpark currently does not set the column name in the input dataframe
39 | # The default col names are like 0,1,2,... Hence we need to reset the column
40 | # names to the features that we initially used for training.
41 | p_df.columns = [*FEATURE_COLS]
42 |
43 | # Perform prediction. this returns an array object
44 | pred_array = p_predictor.predict(p_df)
45 | # Convert to series
46 | df_predicted = pd.Series(pred_array)
47 | return df_predicted
48 |
49 | # The list of packages that will be used by UDF
50 | udf_packages = p_dbt.config.get('packages')
51 |
52 | predict_position_udf = p_session.udf.register(
53 | predict_position
54 | ,name=f'predict_position'
55 | ,packages = udf_packages
56 | )
57 | return predict_position_udf
58 |
59 | def download_models_and_libs_from_stage(p_session):
60 | p_session.file.get(f'@{DB_STAGE}/{model_file_path}/{model_file_packaged}', DOWNLOAD_DIR)
61 |
62 | def load_model(p_session):
63 | # Load the model and initialize the predictor
64 | model_fl_path = os.path.join(DOWNLOAD_DIR, model_file_packaged)
65 | predictor = joblib.load(model_fl_path)
66 | return predictor
67 |
68 | # -------------------------------
69 | def model(dbt, session):
70 | dbt.config(
71 | packages = ['snowflake-snowpark-python' ,'scipy','scikit-learn' ,'pandas' ,'numpy'],
72 | materialized = "table",
73 | tags = "predict",
74 | use_anonymous_sproc=True
75 | )
76 | session._use_scoped_temp_objects = False
77 | download_models_and_libs_from_stage(session)
78 | predictor = load_model(session)
79 | predict_position_udf = register_udf_for_prediction(predictor, session ,dbt)
80 |
81 | # Retrieve the data, and perform the prediction
82 | hold_out_df = (dbt.ref("hold_out_dataset_for_prediction")
83 | .select(*FEATURE_COLS)
84 | )
85 | trained_model_file = dbt.ref("train_model_to_predict_position")
86 |
87 | # Perform prediction.
88 | new_predictions_df = hold_out_df.withColumn("position_predicted"
89 | ,predict_position_udf(*FEATURE_COLS)
90 | )
91 |
92 | return new_predictions_df
--------------------------------------------------------------------------------
/models/ml/training_and_prediction/train_model_to_predict_position.py:
--------------------------------------------------------------------------------
1 | import snowflake.snowpark.functions as F
2 | from sklearn.model_selection import train_test_split
3 | import pandas as pd
4 | from sklearn.metrics import confusion_matrix, balanced_accuracy_score
5 | import io
6 | from sklearn.linear_model import LogisticRegression
7 | from joblib import dump, load
8 | import joblib
9 | import logging
10 | import sys
11 | from joblib import dump, load
12 |
13 | logger = logging.getLogger("mylog")
14 |
15 | def save_file(session, model, path, dest_filename):
16 | input_stream = io.BytesIO()
17 | joblib.dump(model, input_stream)
18 | session._conn.upload_stream(input_stream, path, dest_filename)
19 | return "successfully created file: " + path
20 |
21 | def model(dbt, session):
22 | dbt.config(
23 | packages = ['numpy','scikit-learn','pandas','numpy','joblib','cachetools'],
24 | materialized = "table",
25 | tags = "train"
26 | )
27 | # Create a stage in Snowflake to save our model file
28 | session.sql('create or replace stage MODELSTAGE').collect()
29 |
30 | #session._use_scoped_temp_objects = False
31 | version = "1.0"
32 | logger.info('Model training version: ' + version)
33 |
34 | # read in our training and testing upstream dataset
35 | test_train_df = dbt.ref("training_testing_dataset")
36 |
37 | # cast snowpark df to pandas df
38 | test_train_pd_df = test_train_df.to_pandas()
39 | target_col = "POSITION_LABEL"
40 |
41 | # split out covariate predictors, x, from our target column position_label, y.
42 | split_X = test_train_pd_df.drop([target_col], axis=1)
43 | split_y = test_train_pd_df[target_col]
44 |
45 | # Split out our training and test data into proportions
46 | X_train, X_test, y_train, y_test = train_test_split(split_X, split_y, train_size=0.7, random_state=42)
47 | train = [X_train, y_train]
48 | test = [X_test, y_test]
49 | # now we are only training our one model to deploy
50 | # we are keeping the focus on the workflows and not algorithms for this lab!
51 | model = LogisticRegression()
52 |
53 | # fit the preprocessing pipeline and the model together
54 | model.fit(X_train, y_train)
55 | y_pred = model.predict_proba(X_test)[:,1]
56 | predictions = [round(value) for value in y_pred]
57 | balanced_accuracy = balanced_accuracy_score(y_test, predictions)
58 |
59 | # Save the model to a stage
60 | save_file(session, model, "@MODELSTAGE/driver_position_"+version, "driver_position_"+version+".joblib" )
61 | logger.info('Model artifact:' + "@MODELSTAGE/driver_position_"+version+".joblib")
62 |
63 | # Take our pandas training and testing dataframes and put them back into snowpark dataframes
64 | snowpark_train_df = session.write_pandas(pd.concat(train, axis=1, join='inner'), "train_table", auto_create_table=True, create_temp_table=True)
65 | snowpark_test_df = session.write_pandas(pd.concat(test, axis=1, join='inner'), "test_table", auto_create_table=True, create_temp_table=True)
66 |
67 | # Union our training and testing data together and add a column indicating train vs test rows
68 | return snowpark_train_df.with_column("DATASET_TYPE", F.lit("train")).union(snowpark_test_df.with_column("DATASET_TYPE", F.lit("test")))
--------------------------------------------------------------------------------
/models/overview.md:
--------------------------------------------------------------------------------
1 | {% docs __overview__ %}
2 |
3 | ## About Dataset
4 |
5 | ### Context
6 | This is a dataset pulled from Kaggle called [Formula 1 World Championship (1950 - 2023)](https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2020?select=status.csv).
7 |
8 | Formula 1 (a.k.a. F1 or Formula One) is the highest class of single-seater auto racing sanctioned by the Fédération Internationale de l'Automobile (FIA) and owned by the Formula One Group. The FIA Formula One World Championship has been one of the premier forms of racing around the world since its inaugural season in 1950. The word "formula" in the name refers to the set of rules to which all participants' cars must conform. A Formula One season consists of a series of races, known as Grands Prix, which take place worldwide on purpose-built circuits and on public roads.
9 |
10 | ### Content
11 | The dataset consists of all information on the Formula 1 races, drivers, constructors, qualifying, circuits, lap times, pit stops, championships from 1950 till the latest 2023 season.
12 |
13 | ### Acknowledgements
14 | The data is compiled from [Ergast](http://ergast.com/mrd/).
15 |
16 | ### Inspiration
17 |
18 | [](https://images.hgmsites.net)
19 |
20 | *"Races are won at the track. Championships are won at the factory."* - Mercedes (2019)
21 |
22 | With the amount of data being captured, analyzed and used to design, build and drive the Formula 1 cars is astounding. It is a global sport being followed by millions of people worldwide and it is very fascinating to see drivers pushing their limit in these vehicles to become the fastest racers in the world!
23 |
24 | {% enddocs %}
--------------------------------------------------------------------------------
/models/staging/formula1/f1_sources.yml:
--------------------------------------------------------------------------------
1 | version: 2
2 |
3 | sources:
4 | - name: formula1
5 | description: formula 1 datasets with normalized tables
6 | database: formula1
7 | schema: raw
8 | tables:
9 |
10 | - name: circuits
11 | description: One record per circuit, which is the specific race course.
12 | columns:
13 | - name: circuit_id
14 | tests:
15 | - unique
16 | - not_null
17 | description: Unique identifier for the circuit
18 | - name: circuit_ref
19 | description: Unique circuit reference
20 | - name: name
21 | description: Name of the circuit
22 | - name: location
23 | description: City or town where the circuit is located
24 | - name: country
25 | description: Country where the circuit is located
26 | - name: lat
27 | description: Latitude of the circuit
28 | - name: lng
29 | description: Longitude of the circuit
30 | - name: alt
31 | description: Altitude of the circuit in meters
32 | - name: url
33 | description: Wikipedia URL for the circuit
34 |
35 |
36 | - name: constructor_results
37 | description: This table contains race results for constructors.
38 | columns:
39 | - name: constructor_results_id
40 | description: Unique identifier for the constructor race result
41 | - name: race_id
42 | description: Unique identifier for the race
43 | - name: constructor_id
44 | description: Unique identifier for the constructor
45 | - name: points
46 | description: Points scored by the constructor in the race
47 | - name: status
48 | description: Overall status of the constructor's cars in the race
49 |
50 |
51 | - name: constructor_standings
52 | description: This table contains historical Formula 1 constructor standings.
53 | columns:
54 | - name: constructor_standings_id
55 | description: Unique identifier for the constructor standings entry
56 | - name: race_id
57 | description: Unique identifier for the race
58 | - name: constructor_id
59 | description: Unique identifier for the constructor
60 | - name: points
61 | description: Accumulated points for the constructor at this point in the season
62 | - name: position
63 | description: Current position of the constructor in the standings
64 | - name: wins
65 | description: Number of race wins for the constructor at this point in the season
66 |
67 |
68 | - name: constructors
69 | description: One record per constructor. Constructors are the teams that build their formula 1 cars.
70 | columns:
71 | - name: constructor_id
72 | description: Unique identifier for the constructor
73 | tests:
74 | - unique
75 | - not_null
76 | - name: constructor_ref
77 | description: Unique constructor reference
78 | - name: name
79 | description: Constructor name
80 | - name: nationality
81 | description: Constructor's nationality
82 | - name: url
83 | description: Wikipedia URL for the constructor
84 |
85 | - name: driver_standings
86 | description: This table contains historical Formula 1 driver standings.
87 | columns:
88 | - name: driver_standings_id
89 | description: Unique identifier for the driver standings entry
90 | - name: race_id
91 | description: Unique identifier for the race
92 | - name: driver_id
93 | description: Unique identifier for the driver
94 | - name: points
95 | description: Accumulated points for the driver at this point in the season
96 | - name: position
97 | description: Current position of the driver in the standings
98 | - name: wins
99 | description: Number of race wins for the driver at this point in the season
100 |
101 |
102 | - name: drivers
103 | description: One record per driver. This table gives details about the driver.
104 | columns:
105 | - name: driver_id
106 | tests:
107 | - unique
108 | - not_null
109 | description: Unique identifier for the driver
110 | - name: driver_ref
111 | description: Unique driver reference
112 | - name: number
113 | description: Driver number
114 | - name: code
115 | description: Driver code
116 | - name: forename
117 | description: Driver's first name
118 | - name: surname
119 | description: Driver's last name
120 | - name: dob
121 | description: Driver's date of birth
122 | - name: nationality
123 | description: Driver's nationality
124 | - name: url
125 | description: Wikipedia URL for the driver
126 |
127 | - name: lap_times
128 | description: One row per lap in each race. Lap times started being recorded in this dataset in 1984 and joined through driver_id.
129 | columns:
130 | - name: race_id
131 | description: Unique identifier for the race
132 | - name: driver_id
133 | description: Unique identifier for the driver
134 | - name: lap
135 | description: Lap number
136 | - name: position
137 | description: Position of the driver at the end of the lap
138 | - name: time
139 | description: Lap time in minutes, seconds, and milliseconds (MM:SS.mmm format)
140 | - name: milliseconds
141 | description: Lap time in milliseconds
142 |
143 | - name: pit_stops
144 | description: One row per pit stop. Pit stops do not have their own id column, the combination of the race_id and driver_id identify the pit stop.
145 | columns:
146 | - name: race_id
147 | description: Unique identifier for the race
148 | - name: driver_id
149 | description: Unique identifier for the driver
150 | - name: stop
151 | description: Sequential number of the pit stop for the driver in the race
152 | tests:
153 | - accepted_values:
154 | values: [1,2,3,4,5,6,7,8]
155 | quote: false
156 | - name: lap
157 | description: Lap number during which the pit stop occurred
158 | - name: time
159 | description: Time of the day when the pit stop took place (HH:MM:SS format)
160 | - name: duration
161 | description: Duration of the pit stop in seconds (SS.mmm format)
162 | - name: milliseconds
163 | description: Duration of the pit stop in milliseconds
164 |
165 | - name: qualifying
166 | description: This table contains qualifying session results for each driver.
167 | columns:
168 | - name: qualifying_id
169 | description: Unique identifier for the qualifying result
170 | - name: race_id
171 | description: Unique identifier for the race
172 | - name: driver_id
173 | description: Unique identifier for the driver
174 | - name: constructor_id
175 | description: Unique identifier for the constructor
176 | - name: number
177 | description: Car number
178 | - name: position
179 | description: Final qualifying position
180 | - name: q1
181 | description: Qualifying 1 time (MM:SS.mmm format)
182 | - name: q2
183 | description: Qualifying 2 time (MM:SS.mmm format), if applicable
184 | - name: q3
185 | description: Qualifying 3 time (MM:SS.mmm format), if applicable
186 |
187 | - name: races
188 | description: One race per row. Importantly this table contains the race year to understand trends.
189 | columns:
190 | - name: race_id
191 | tests:
192 | - unique
193 | - not_null
194 | description: Unique identifier for the race
195 | - name: year
196 | description: Year the race took place
197 | - name: round
198 | description: Round number in the championship
199 | - name: circuit_id
200 | description: Unique identifier for the circuit
201 | - name: name
202 | description: Name of the race
203 | - name: date
204 | description: Date of the race
205 | - name: time
206 | description: Time of the race
207 | - name: url
208 | description: Wikipedia URL for the race
209 |
210 | - name: results
211 | description: One row per result. The main table that we join out for grid and position variables.
212 | columns:
213 | - name: result_id
214 | description: Unique identifier for the race result
215 | tests:
216 | - unique
217 | - not_null
218 | - name: race_id
219 | description: Unique identifier for the race
220 | - name: driver_id
221 | description: Unique identifier for the driver
222 | - name: constructor_id
223 | description: Unique identifier for the constructor
224 | - name: number
225 | description: Car number
226 | - name: grid
227 | description: Starting grid position
228 | - name: position
229 | description: Finishing position
230 | - name: position_text
231 | description: Text representation of finishing position
232 | - name: position_order
233 | description: Order of finishing position
234 | - name: points
235 | description: Points scored in the race
236 | - name: laps
237 | description: Number of laps completed
238 | - name: time
239 | description: Time taken to complete the race
240 | - name: milliseconds
241 | description: Time taken to complete the race in milliseconds
242 | - name: fastest_lap
243 | description: Lap number of the driver's fastest lap
244 | - name: rank
245 | description: Rank of the driver's fastest lap time
246 | - name: fastest_lap_time
247 | description: Time of the driver's fastest lap
248 | - name: fastest_lap_speed
249 | description: Speed of the driver's fastest lap
250 | - name: status_id
251 | description: Unique identifier for the driver's race status
252 |
253 | - name: seasons
254 | description: This table contains information about Formula 1 seasons.
255 | columns:
256 | - name: year
257 | description: The year of the season
258 | - name: url
259 | description: Wikipedia URL for the season
260 |
261 | - name: sprint_results
262 | description: This table contains sprint race results for drivers and constructors.
263 | columns:
264 | - name: sprint_result_id
265 | description: Unique identifier for the sprint race result
266 | - name: race_id
267 | description: Unique identifier for the race
268 | - name: driver_id
269 | description: Unique identifier for the driver
270 | - name: constructor_id
271 | description: Unique identifier for the constructor
272 | - name: grid
273 | description: Starting grid position
274 | - name: position
275 | description: Finishing position
276 | - name: position_order
277 | description: Order of finishing position
278 | - name: points
279 | description: Points scored in the sprint race
280 | - name: laps
281 | description: Number of laps completed
282 | - name: time
283 | description: Time taken to complete the sprint race
284 | - name: milliseconds
285 | description: Time taken to complete the sprint race in milliseconds
286 | - name: status_id
287 | description: Unique identifier for the driver's sprint race status
288 |
289 | - name: status
290 | description: One status per row. The status contextualizes whether the race was finished or what issues arose e.g. collisions, engine, etc.
291 | columns:
292 | - name: status_id
293 | tests:
294 | - unique
295 | - not_null
296 | description: Unique identifier for the status
297 | - name: status
298 | description: Description of the status, explaining the reason for the status (e.g., finished, retired, disqualified)
--------------------------------------------------------------------------------
/models/staging/formula1/stage_model_descriptions.md:
--------------------------------------------------------------------------------
1 | {% docs stg_circuit_desc %}
2 | [](https://www.google.com/url?sa=i&url=https%3A%2F%2Fwww.spa-francorchamps.be)
3 |
4 |
5 | This is the staging model which has dimensional data for each F1 circuit.
6 |
7 | {% enddocs %}
8 |
9 | {% docs stg_constructors_desc %}
10 | [](https://www.grandprix247.com/wp-content/uploads/2020/11/pic1.jpg)
11 |
12 |
13 | This is the staging model for constructor dimensional data.
14 |
15 | {% enddocs %}
--------------------------------------------------------------------------------
/models/staging/formula1/staging.yml:
--------------------------------------------------------------------------------
1 | version: 2
2 |
3 | exposures:
4 |
5 | - name: Formula1
6 | label: F1
7 | type: analysis
8 | maturity: high
9 | url: https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2020
10 | description: >
11 | This exposure is to document the source of the data.
12 | owner:
13 | name: Ian Cooper
14 | email: ian.cooper@dbtlabs.com
15 |
16 | models:
17 | - name: stg_circuits
18 | description: '{{ doc("stg_circuit_desc") }}'
19 | columns:
20 | - name: circuit_id
21 | description: "The primary key for this table, each circuit is distinct"
22 | tests:
23 | - unique
24 | - not_null
25 | - name: circuit_ref
26 | description: '{{ doc("circuit_ref_def") }}'
27 | - name: circuit_name
28 | description: '{{ doc("circuit_name_def") }}'
29 | - name: circuit_location
30 | description: '{{ doc("circuit_location_def") }}'
31 | - name: circuit_country
32 | description: '{{ doc("circuit_country_def") }}'
33 | - name: latitude
34 | description: '{{ doc("circuit_lat_def") }}'
35 | - name: longitude
36 | description: '{{ doc("circuit_lng_def") }}'
37 | - name: altitude
38 | description: '{{ doc("circuit_alt_def") }}'
39 | - name: circuit_url
40 | description: '{{ doc("circuit_url_def") }}'
41 |
42 | - name: stg_constructor_results
43 | description: "Fact data related to constructor (ie brand) results. This is at a race level but race info isn't in this model, just the race_id."
44 | columns:
45 | - name: constructor_results_id
46 | description: "The primary key for this table, result has its own id."
47 | tests:
48 | - unique
49 | - not_null
50 | - name: race_id
51 | description: '{{ doc("race_id_def") }}'
52 | - name: constructor_id
53 | description: '{{ doc("constructor_id_def") }}'
54 | - name: constructor_points
55 | description: '{{ doc("constructor_points_def") }}'
56 | - name: status
57 | description: '{{ doc("constructor_status_def") }}'
58 |
59 | - name: stg_constructor_standings
60 | description: "Fact data related to constructor (ie brand) results. This is at a race level but race info isn't in this model, just the race_id."
61 | columns:
62 | - name: constructor_standings_id
63 | description: "The primary key for this table, result has its own id."
64 | tests:
65 | - unique
66 | - not_null
67 | - name: race_id
68 | description: '{{ doc("race_id_def") }}'
69 | - name: constructor_id
70 | description: '{{ doc("constructor_id_def") }}'
71 | - name: points
72 | description: '{{ doc("constructor_points_def") }}'
73 | - name: constructor_position
74 | description: '{{ doc("constructor_status_def") }}'
75 | - name: position_text
76 | description: '{{ doc("constructor_points_def") }}'
77 | - name: wins
78 | description: '{{ doc("constructor_wins_def") }}'
79 |
80 | - name: stg_constructors
81 | description: '{{ doc("stg_constructors_desc") }}'
82 | columns:
83 | - name: constructor_id
84 | description: "The primary key for this table, each team has its own id."
85 | tests:
86 | - unique
87 | - not_null
88 | - name: constructor_ref
89 | description: '{{ doc("constructor_ref_def") }}'
90 | - name: constructor_name
91 | description: '{{ doc("constructor_name_def") }}'
92 | - name: constructor_nationality
93 | description: '{{ doc("constructor_nationality_def") }}'
94 | - name: constructor_url
95 | description: '{{ doc("constructor_url_def") }}'
96 |
97 | - name: stg_driver_standings
98 | description: "Fact data for driver standings by race."
99 | columns:
100 | - name: driver_standings_id
101 | description: "The primary key for this table."
102 | tests:
103 | - unique
104 | - not_null
105 | - name: race_id
106 | description: '{{ doc("race_id_def") }}'
107 | - name: driver_id
108 | description: '{{ doc("driver_id_def") }}'
109 | - name: driver_code
110 | description: '{{ doc("driver_code_def") }}'
111 | - name: driver_points
112 | description: '{{ doc("driver_points_def") }}'
113 | - name: driver_position
114 | description: '{{ doc("driver_position_def") }}'
115 | - name: position_text
116 | description: '{{ doc("position_text_def") }}'
117 | - name: driver_wins
118 | description: '{{ doc("driver_wins_def") }}'
119 |
120 | - name: stg_drivers
121 | description: "Dimensional model stage for driver attributes."
122 | columns:
123 | - name: driver_id
124 | description: "The primary key for this table."
125 | tests:
126 | - unique
127 | - not_null
128 | - name: driver_ref
129 | description: '{{ doc("driver_ref_def") }}'
130 | - name: driver_number
131 | description: '{{ doc("driver_number_def") }}'
132 | - name: forename
133 | description: '{{ doc("driver_forename_def") }}'
134 | - name: surname
135 | description: '{{ doc("driver_surname_def") }}'
136 | - name: date_of_birth
137 | description: '{{ doc("driver_dob_def") }}'
138 | - name: driver_nationality
139 | description: '{{ doc("driver_nationality_def") }}'
140 | - name: driver_url
141 | description: '{{ doc("driver_url_def") }}'
142 |
143 | - name: stg_lap_times
144 | description: "Fact data around every lap."
145 | columns:
146 | - name: lap_times_id
147 | description: "The primary key for this table."
148 | tests:
149 | - unique
150 | - not_null
151 | - name: race_id
152 | description: '{{ doc("race_id_def") }}'
153 | - name: driver_id
154 | description: '{{ doc("driver_id_def") }}'
155 | - name: lap
156 | description: '{{ doc("laps_def") }}'
157 | - name: driver_position
158 | description: '{{ doc("driver_position_def") }}'
159 | - name: lap_time_formatted
160 | description: '{{ doc("lap_time_def") }}'
161 | - name: official_laptime
162 | description: '{{ doc("official_lap_time_def") }}'
163 | - name: lap_time_milliseconds
164 | description: '{{ doc("lap_time_milliseconds_def") }}'
165 |
166 | - name: stg_pit_stops
167 | description: "Fact data around pitstop times."
168 | columns:
169 | - name: pitstop_id
170 | description: "The primary key for this table."
171 | tests:
172 | - unique
173 | - not_null
174 | - name: race_id
175 | description: '{{ doc("race_id_def") }}'
176 | - name: driver_id
177 | description: '{{ doc("driver_id_def") }}'
178 | - name: stop_number
179 | description: '{{ doc("laps_def") }}'
180 | - name: lap
181 | description: '{{ doc("stop_number_def") }}'
182 | - name: pit_stop_time
183 | description: '{{ doc("pit_stop_time_def") }}'
184 | - name: pit_stop_duration_seconds
185 | description: '{{ doc("pit_stop_duration_seconds_def") }}'
186 | - name: pit_stop_duration
187 | description: '{{ doc("pit_stop_duration_def") }}'
188 | - name: pit_stop_milliseconds
189 | description: '{{ doc("pit_stop_milliseconds_def") }}'
190 |
191 | - name: stg_qualifying
192 | description: "Fact data around qualifying times."
193 | columns:
194 | - name: qualifying_id
195 | description: "The primary key for this table."
196 | tests:
197 | - unique
198 | - not_null
199 | - name: race_id
200 | description: '{{ doc("race_id_def") }}'
201 | - name: driver_id
202 | description: '{{ doc("driver_id_def") }}'
203 | - name: constructor_id
204 | description: '{{ doc("constructor_id_def") }}'
205 | - name: driver_number
206 | description: '{{ doc("driver_number_def") }}'
207 | - name: qualifying_position
208 | description: '{{ doc("qualifying_position_def") }}'
209 | - name: q1_time
210 | description: '{{ doc("q1_time_def") }}'
211 | - name: q2_time
212 | description: '{{ doc("q2_time_def") }}'
213 | - name: q3_time
214 | description: '{{ doc("q3_time_def") }}'
215 |
216 | - name: stg_races
217 | description: "Fact data around qualifying times."
218 | columns:
219 | - name: race_id
220 | description: "The primary key for this table."
221 | tests:
222 | - unique
223 | - not_null
224 | - name: race_year
225 | description: '{{ doc("race_year_def") }}'
226 | - name: race_round
227 | description: '{{ doc("race_round_def") }}'
228 | - name: circuit_id
229 | description: '{{ doc("circuit_id_def") }}'
230 | - name: race_name
231 | description: '{{ doc("race_name_def") }}'
232 | - name: race_date
233 | description: '{{ doc("race_date_def") }}'
234 | - name: race_time
235 | description: '{{ doc("race_time_def") }}'
236 | - name: race_url
237 | description: '{{ doc("race_url_def") }}'
238 | - name: fp1_date
239 | description: '{{ doc("fp1_date_def") }}'
240 | - name: fp1_time
241 | description: '{{ doc("fp1_time_def") }}'
242 | - name: fp2_date
243 | description: '{{ doc("fp2_date_def") }}'
244 | - name: fp2_time
245 | description: '{{ doc("fp2_time_def") }}'
246 | - name: fp3_date
247 | description: '{{ doc("fp3_date_def") }}'
248 | - name: fp3_time
249 | description: '{{ doc("fp3_time_def") }}'
250 | - name: quali_date
251 | description: '{{ doc("quali_date_def") }}'
252 | - name: quali_time
253 | description: '{{ doc("quali_time_def") }}'
254 | - name: sprint_date
255 | description: '{{ doc("sprint_date_def") }}'
256 | - name: sprint_time
257 | description: '{{ doc("sprint_time_def") }}'
258 |
259 | - name: stg_results
260 | description: "Fact data around qualifying times."
261 | columns:
262 | - name: result_id
263 | description: "The primary key for this table."
264 | tests:
265 | - unique
266 | - not_null
267 | - name: race_id
268 | description: '{{ doc("race_id_def") }}'
269 | - name: driver_id
270 | description: '{{ doc("driver_id_def") }}'
271 | - name: constructor_id
272 | description: '{{ doc("constructor_id_def") }}'
273 | - name: driver_number
274 | description: '{{ doc("driver_number_def") }}'
275 | - name: grid
276 | description: '{{ doc("grid_def") }}'
277 | - name: driver_position
278 | description: '{{ doc("driver_position_def") }}'
279 | - name: position_text
280 | description: '{{ doc("position_text_def") }}'
281 | - name: position_order
282 | description: '{{ doc("position_order_def") }}'
283 | - name: points
284 | description: '{{ doc("points_def") }}'
285 | - name: laps
286 | description: '{{ doc("laps_def") }}'
287 | - name: race_time
288 | description: '{{ doc("race_time_def") }}'
289 | - name: milliseconds
290 | description: '{{ doc("results_milliseconds_def") }}'
291 | - name: fastest_lap
292 | description: '{{ doc("fastest_lap_def") }}'
293 | - name: driver_rank
294 | description: '{{ doc("driver_rank_def") }}'
295 | - name: fastest_lap_time
296 | description: '{{ doc("fastest_lap_time_def") }}'
297 | - name: fastest_lap_speed
298 | description: '{{ doc("fastest_lap_speed_def") }}'
299 | - name: status_id
300 | description: '{{ doc("status_id_def") }}'
301 |
302 |
303 | - name: stg_seasons
304 | description: "Season information"
305 | columns:
306 | - name: season_year
307 | description: "The primary key for this table."
308 | tests:
309 | - unique
310 | - not_null
311 | - name: season_url
312 | description: "The Wiki URL for F1 Season"
313 |
314 | - name: stg_sprint_results
315 | description: "Fact data around qualifying times."
316 | columns:
317 | - name: result_id
318 | description: "The primary key for this table."
319 | tests:
320 | - unique
321 | - not_null
322 | - name: race_id
323 | description: '{{ doc("race_id_def") }}'
324 | - name: driver_id
325 | description: '{{ doc("driver_id_def") }}'
326 | - name: constructor_id
327 | description: '{{ doc("constructor_id_def") }}'
328 | - name: driver_number
329 | description: '{{ doc("driver_number_def") }}'
330 | - name: grid
331 | description: '{{ doc("grid_def") }}'
332 | - name: driver_position
333 | description: '{{ doc("driver_position_def") }}'
334 | - name: position_text
335 | description: '{{ doc("position_text_def") }}'
336 | - name: position_order
337 | description: '{{ doc("position_order_def") }}'
338 | - name: points
339 | description: '{{ doc("points_def") }}'
340 | - name: laps
341 | description: '{{ doc("laps_def") }}'
342 | - name: sprint_time
343 | description: '{{ doc("sprint_time_def") }}'
344 | - name: milliseconds
345 | description: '{{ doc("results_milliseconds_def") }}'
346 | - name: fastest_lap
347 | description: '{{ doc("fastest_lap_def") }}'
348 | - name: fastest_lap_time
349 | description: '{{ doc("fastest_lap_time_def") }}'
350 | - name: status_id
351 | description: '{{ doc("status_id_def") }}'
352 |
353 |
354 | - name: stg_status
355 | description: "Status codes mapping."
356 | columns:
357 | - name: status_id
358 | description: "The primary key for this table."
359 | tests:
360 | - unique
361 | - not_null
362 | - name: status
363 | description: '{{ doc("status_def") }}'
--------------------------------------------------------------------------------
/models/staging/formula1/stg_circuits.sql:
--------------------------------------------------------------------------------
1 | with
2 | circuits as (select * from {{ source('formula1', 'circuits') }}),
3 | renamed as (
4 | select
5 | circuit_id as circuit_id,
6 | circuit_ref as circuit_ref,
7 | name as circuit_name,
8 | location as circuit_location,
9 | country as circuit_country,
10 | lat as latitude,
11 | lng as longitude,
12 | to_number(iff(contains(alt, 'N'), null, alt)) as altitude,
13 | url as circuit_url
14 | from circuits
15 | )
16 | select *
17 | from renamed
18 |
--------------------------------------------------------------------------------
/models/staging/formula1/stg_constructor_results.sql:
--------------------------------------------------------------------------------
1 | with
2 | constructor_results as (
3 | select * from {{ source('formula1', 'constructor_results') }}
4 | ),
5 | renamed as (
6 | select
7 | constructor_results_id as constructor_results_id,
8 | race_id as race_id,
9 | constructor_id as constructor_id,
10 | points as constructor_points,
11 | iff(contains(status, '\N'), null, status) as status
12 | from constructor_results
13 | )
14 | select *
15 | from renamed
16 |
--------------------------------------------------------------------------------
/models/staging/formula1/stg_constructor_standings.sql:
--------------------------------------------------------------------------------
1 | with
2 | constructor_standings as (
3 | select * from {{ source('formula1', 'constructor_standings') }}
4 | ),
5 | renamed as (
6 | select
7 | constructor_standings_id as constructor_standings_id,
8 | race_id as race_id,
9 | constructor_id as constructor_id,
10 | points as points,
11 | "POSITION" as constructor_position,
12 | position_text as position_text,
13 | wins as wins
14 | from constructor_standings
15 | )
16 | select *
17 | from renamed
18 |
--------------------------------------------------------------------------------
/models/staging/formula1/stg_constructors.sql:
--------------------------------------------------------------------------------
1 | with
2 |
3 | constructors as (select * from {{ source('formula1', 'constructors') }}),
4 |
5 | renamed as (
6 | select
7 | constructor_id as constructor_id,
8 | constructor_ref as constructor_ref,
9 | name as constructor_name,
10 | nationality as constructor_nationality,
11 | url as constructor_url
12 | from constructors
13 | )
14 |
15 | select *
16 | from renamed
17 |
--------------------------------------------------------------------------------
/models/staging/formula1/stg_driver_standings.sql:
--------------------------------------------------------------------------------
1 | with
2 |
3 | driver_standings as (select * from {{ source('formula1', 'driver_standings') }}),
4 |
5 | renamed as (
6 | select
7 | driver_standings_id as driver_standings_id,
8 | race_id as race_id,
9 | driver_id as driver_id,
10 | points as driver_points,
11 | "POSITION" as driver_position,
12 | position_text as position_text,
13 | wins as driver_wins
14 | from driver_standings
15 | )
16 |
17 | select *
18 | from renamed
19 |
--------------------------------------------------------------------------------
/models/staging/formula1/stg_drivers.sql:
--------------------------------------------------------------------------------
1 | with
2 |
3 | drivers as (select * from {{ source('formula1', 'drivers') }}),
4 |
5 | renamed as (
6 | select
7 | driver_id as driver_id,
8 | driver_ref as driver_ref,
9 | number as driver_number,
10 | code as driver_code,
11 | forename,
12 | surname,
13 | dob as date_of_birth,
14 | nationality as driver_nationality,
15 | url as driver_url
16 | from drivers
17 | )
18 |
19 | select *
20 | from renamed
21 |
--------------------------------------------------------------------------------
/models/staging/formula1/stg_lap_times.sql:
--------------------------------------------------------------------------------
1 | with
2 |
3 | lap_times as (select * from {{ source('formula1', 'lap_times') }}),
4 |
5 | renamed as (
6 | select
7 | race_id as race_id,
8 | driver_id as driver_id,
9 | lap,
10 | "POSITION" as driver_position,
11 | "TIME" as lap_time_formatted,
12 | {{ convert_laptime("lap_time_formatted") }} as official_laptime,
13 | milliseconds as lap_time_milliseconds
14 | from lap_times
15 | )
16 |
17 | select
18 | {{ dbt_utils.generate_surrogate_key(["race_id", "driver_id", "lap"]) }}
19 | as lap_times_id,
20 | *
21 | from renamed
22 |
--------------------------------------------------------------------------------
/models/staging/formula1/stg_pit_stops.sql:
--------------------------------------------------------------------------------
1 | with
2 |
3 | pit_stops as (select * from {{ source('formula1', 'pit_stops') }}),
4 |
5 | renamed as (
6 | select
7 | race_id as race_id,
8 | driver_id as driver_id,
9 | stop as stop_number,
10 | lap,
11 | "TIME" as pit_stop_time,
12 | duration as pit_stop_duration_seconds,
13 | {{ convert_laptime("pit_stop_duration_seconds") }} as pit_stop_duration,
14 | milliseconds as pit_stop_milliseconds
15 | from pit_stops
16 | )
17 |
18 | select
19 | {{ dbt_utils.generate_surrogate_key(["race_id", "driver_id", "stop_number"]) }}
20 | as pitstop_id,
21 | *
22 | from renamed
23 |
--------------------------------------------------------------------------------
/models/staging/formula1/stg_qualifying.sql:
--------------------------------------------------------------------------------
1 | with
2 | qualifying as (select * from {{ source("formula1", "qualifying") }}),
3 |
4 | renamed as (
5 | select
6 | qualifying_id as qualifying_id,
7 | race_id as race_id,
8 | driver_id as driver_id,
9 | constructor_id as constructor_id,
10 | number as driver_number,
11 | "POSITION" as qualifying_position,
12 | iff(contains(q1, '\N'), null, {{ convert_laptime("q1") }}) as q1_time,
13 | iff(contains(q2, '\N'), null, {{ convert_laptime("q2") }}) as q2_time,
14 | iff(contains(q3, '\N'), null, {{ convert_laptime("q3") }}) as q3_time
15 | from qualifying
16 | )
17 |
18 | select *
19 | from renamed
20 |
--------------------------------------------------------------------------------
/models/staging/formula1/stg_races.sql:
--------------------------------------------------------------------------------
1 | with
2 |
3 | races as (
4 |
5 | select * from {{ source('formula1','races') }}
6 |
7 | ),
8 |
9 | renamed as (
10 | select
11 | race_id as race_id,
12 | "YEAR" as race_year,
13 | "ROUND" as race_round,
14 | circuit_id as circuit_id,
15 | name as race_name,
16 | "DATE" as race_date,
17 | "TIME" as race_time,
18 | url as race_url,
19 | fp1_date,
20 | fp1_time,
21 | fp2_date,
22 | fp2_time,
23 | fp3_date,
24 | fp3_time,
25 | quali_date,
26 | quali_time,
27 | sprint_date,
28 | sprint_time
29 | from races
30 | )
31 |
32 | select * from renamed
--------------------------------------------------------------------------------
/models/staging/formula1/stg_results.sql:
--------------------------------------------------------------------------------
1 | with
2 |
3 | results as (select * from {{ source('formula1', 'results') }}),
4 |
5 | renamed as (
6 | select
7 | result_id as result_id,
8 | race_id as race_id,
9 | driver_id as driver_id,
10 | constructor_id as constructor_id,
11 | number as driver_number,
12 | grid,
13 | --position::int as position,
14 | iff(contains(position, '\N'), null, position) as driver_position,
15 | position_text as position_text,
16 | position_order as position_order,
17 | points,
18 | laps,
19 | iff(contains("TIME", '\N'), null, "TIME") as race_time,
20 | iff(contains(milliseconds, '\N'), null, milliseconds) as milliseconds,
21 | iff(contains(fastest_lap, '\N'), null, fastest_lap) as fastest_lap,
22 | "RANK" as driver_rank,
23 | iff(contains(fastest_lap_time, '\N'),null,{{ convert_laptime("fastest_lap_time") }}) as fastest_lap_time,
24 | iff(contains(fastest_lap_speed, '\N'), null, fastest_lap_speed) as fastest_lap_speed,
25 | status_id as status_id
26 | from results
27 | )
28 |
29 | select *
30 | from renamed
31 |
--------------------------------------------------------------------------------
/models/staging/formula1/stg_seasons.sql:
--------------------------------------------------------------------------------
1 | with
2 | seasons as (select * from {{ source('formula1', 'seasons') }}),
3 | renamed as (
4 | select
5 | -- need surogate key with race_id, driver_id and stop
6 | "YEAR" as season_year, url as season_url
7 | from seasons
8 | )
9 | select *
10 | from renamed
11 |
--------------------------------------------------------------------------------
/models/staging/formula1/stg_sprint_results.sql:
--------------------------------------------------------------------------------
1 | with sprint_results as (select * from {{ source('formula1', 'sprint_results') }}),
2 |
3 | renamed as (
4 | select
5 | result_id as result_id,
6 | race_id as race_id,
7 | driver_id as driver_id,
8 | constructor_id as constructor_id,
9 | number as driver_number,
10 | grid as grid,
11 | iff(contains("POSITION", '\N'), null, "POSITION") as driver_position,
12 | position_text as position_text,
13 | position_order as position_order,
14 | points as points,
15 | laps as laps,
16 | iff(contains("TIME", '\N'), null, "TIME") as sprint_time,
17 | iff(contains(milliseconds, '\N'), null, milliseconds) as milliseconds,
18 | iff(contains(fastest_lap, '\N'), null, fastest_lap) as fastest_lap,
19 | iff(
20 | contains(fastest_lap_time, '\N'),
21 | null,
22 | {{ convert_laptime("fastest_lap_time") }}
23 | ) as fastest_lap_time,
24 | status_id as status_id
25 | from sprint_results
26 | )
27 |
28 | select *
29 | from renamed
30 |
--------------------------------------------------------------------------------
/models/staging/formula1/stg_status.sql:
--------------------------------------------------------------------------------
1 | WITH status AS (
2 |
3 | SELECT * FROM {{ source('formula1','status') }}
4 |
5 | ),
6 |
7 | renamed AS (
8 | SELECT
9 | status_id AS status_id,
10 | status
11 | FROM status
12 | )
13 |
14 | SELECT * FROM renamed
--------------------------------------------------------------------------------
/packages.yml:
--------------------------------------------------------------------------------
1 | packages:
2 | - package: dbt-labs/dbt_utils
3 | version: 1.0.0
--------------------------------------------------------------------------------
/seeds/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dbt-labs/python-snowpark-formula1/b6caebae7914a119c66ac16a4430f5e10faf41a8/seeds/.gitkeep
--------------------------------------------------------------------------------
/setup/existing_account_snowflake_grants.sql:
--------------------------------------------------------------------------------
1 | /*
2 | If you already have an existing account or accidentally drop your original database, you might need to grant or regrant permissions.
3 | This file is here in the project for these circumstances. A great video on this setup can be found at: https://www.youtube.com/watch?v=kbCkwhySV_I.
4 | The code below assumes that the tranformer role and transforming warehouse have already been created in Snowflake.
5 | */
6 |
7 | /*
8 | Giving access to database, schema, and all tables within the schema to our transformer role.
9 | The permissions here are for the raw data that we read in.
10 | */
11 | grant usage on database formula1 to role transformer;
12 | grant usage on schema formula1.raw to role transformer;
13 | grant select on all tables in schema formula1.raw to role transformer;
14 |
15 | /*
16 | Giving access to database, schema, and all tables within the schema to our transformer role.
17 | The permissions here are for database and schema for the transformed data that we will write to creating new schemas and tables.
18 | */
19 | grant usage on database analytics to role transformer;
20 | grant modify on database analytics to role transformer;
21 | grant monitor on database analytics to role transformer;
22 | grant create schema on database analytics to role transformer;
23 |
24 |
25 | /*
26 | Give permission for our transformer role to use the warehouse to run our compute and commands on.
27 | */
28 | grant operate on warehouse transforming to role transformer;
29 | grant usage on warehouse transforming to role transformer;
--------------------------------------------------------------------------------
/setup/pc_snowflake_grants.sql:
--------------------------------------------------------------------------------
1 | /*
2 | If you accidentally drop your original database after setting up partner connect, you might need to grant or regrant permissions to build your dbt models.
3 | This file is here in the project for these circumstances. A great video on this setup can be found at: https://www.youtube.com/watch?v=kbCkwhySV_I.
4 | The code below assumes that the PC_DBT_ROLE role and PC_DBT_WH warehouse have already been created in Snowflake as part of the partner connect process.
5 | */
6 |
7 | /*
8 | Giving access to database, schema, and all tables within the schema to our transformer role.
9 | The permissions here are for the raw data that we read in.
10 | */
11 | grant usage on database formula1 to role PC_DBT_ROLE;
12 | grant usage on schema formula1.raw to role PC_DBT_ROLE;
13 | grant select on all tables in schema formula1.raw to role PC_DBT_ROLE;
14 |
15 | /*
16 | Giving access to database, schema, and all tables within the schema to our PC_DBT_ROLE role.
17 | The permissions here are for database and schema for the transformed data that we will write to creating new schemas and tables.
18 | */
19 | grant usage on database PC_DBT_DB to role PC_DBT_ROLE;
20 | grant modify on database PC_DBT_DB to role PC_DBT_ROLE;
21 | grant monitor on database PC_DBT_DB to role PC_DBT_ROLE;
22 | grant create schema on database PC_DBT_DB to role PC_DBT_ROLE;
23 |
24 |
25 | /*
26 | Give permission for our PC_DBT_ROLE role to use the PC_DBT_WH warehouse to run our compute and commands on.
27 | */
28 | grant operate on warehouse PC_DBT_WH to role PC_DBT_ROLE;
29 | grant usage on warehouse PC_DBT_WH to role PC_DBT_ROLE;
--------------------------------------------------------------------------------
/setup/setup_script_s3_to_snowflake.sql:
--------------------------------------------------------------------------------
1 | /*
2 | This is our setup script to create a new database for the Formula1 data in Snowflake.
3 | We are copying data from a public s3 bucket into snowflake by defining our csv format and snowflake stage.
4 | */
5 | -- create and define our formula1 database
6 | create or replace database formula1;
7 | use database formula1;
8 | create or replace schema raw;
9 | use schema raw;
10 |
11 | --define our file format for reading in the csvs
12 | create or replace file format csvformat
13 | type = csv
14 | field_delimiter =','
15 | field_optionally_enclosed_by = '"',
16 | skip_header=1;
17 |
18 | --
19 | create or replace stage formula1_stage
20 | file_format = csvformat
21 | url = 's3://formula1-dbt-cloud-python-demo/formula1-kaggle-data/';
22 |
23 | -- load in the 8 tables we need for our demo
24 | -- we are first creating the table then copying our data in from s3
25 | -- think of this as an empty container or shell that we are then filling
26 |
27 | --CIRCUITS
28 | create or replace table formula1.raw.circuits (
29 | CIRCUIT_ID NUMBER(38,0),
30 | CIRCUIT_REF VARCHAR(16777216),
31 | NAME VARCHAR(16777216),
32 | LOCATION VARCHAR(16777216),
33 | COUNTRY VARCHAR(16777216),
34 | LAT FLOAT,
35 | LNG FLOAT,
36 | ALT NUMBER(38,0),
37 | URL VARCHAR(16777216)
38 | );
39 | -- copy our data from public s3 bucket into our tables
40 | copy into circuits
41 | from @formula1_stage/circuits.csv
42 | on_error='continue';
43 |
44 | --CONSTRUCTOR RESULTS
45 | create or replace table formula1.raw.constructor_results (
46 | CONSTRUCTOR_RESULTS_ID NUMBER(38,0),
47 | RACE_ID NUMBER(38,0),
48 | CONSTRUCTOR_ID NUMBER(38,0),
49 | POINTS NUMBER(38,0),
50 | STATUS VARCHAR(16777216)
51 | );
52 | copy into constructor_results
53 | from @formula1_stage/constructor_results.csv
54 | on_error='continue';
55 |
56 | --CONSTRUCTOR STANDINGS
57 | create or replace table formula1.raw.constructor_standings (
58 | CONSTRUCTOR_STANDINGS_ID NUMBER(38,0),
59 | RACE_ID NUMBER(38,0),
60 | CONSTRUCTOR_ID NUMBER(38,0),
61 | POINTS NUMBER(38,0),
62 | POSITION FLOAT,
63 | POSITION_TEXT VARCHAR(16777216),
64 | WINS NUMBER(38,0)
65 | );
66 | copy into constructor_standings
67 | from @formula1_stage/constructor_standings.csv
68 | on_error='continue';
69 |
70 | --CONSTRUCTORS
71 | create or replace table formula1.raw.constructors (
72 | CONSTRUCTOR_ID NUMBER(38,0),
73 | CONSTRUCTOR_REF VARCHAR(16777216),
74 | NAME VARCHAR(16777216),
75 | NATIONALITY VARCHAR(16777216),
76 | URL VARCHAR(16777216)
77 | );
78 | copy into constructors
79 | from @formula1_stage/constructors.csv
80 | on_error='continue';
81 |
82 | --DRIVER STANDINGS
83 | create or replace table formula1.raw.driver_standings (
84 | DRIVER_STANDINGS_ID NUMBER(38,0),
85 | RACE_ID NUMBER(38,0),
86 | DRIVER_ID NUMBER(38,0),
87 | POINTS NUMBER(38,0),
88 | POSITION FLOAT,
89 | POSITION_TEXT VARCHAR(16777216),
90 | WINS NUMBER(38,0)
91 |
92 | );
93 | copy into driver_standings
94 | from @formula1_stage/driver_standings.csv
95 | on_error='continue';
96 |
97 | --DRIVERS
98 | create or replace table formula1.raw.drivers (
99 | DRIVER_ID NUMBER(38,0),
100 | DRIVER_REF VARCHAR(16777216),
101 | NUMBER VARCHAR(16777216),
102 | CODE VARCHAR(16777216),
103 | FORENAME VARCHAR(16777216),
104 | SURNAME VARCHAR(16777216),
105 | DOB DATE,
106 | NATIONALITY VARCHAR(16777216),
107 | URL VARCHAR(16777216)
108 | );
109 | copy into drivers
110 | from @formula1_stage/drivers.csv
111 | on_error='continue';
112 |
113 | --LAP TIMES
114 | create or replace table formula1.raw.lap_times (
115 | RACE_ID NUMBER(38,0),
116 | DRIVER_ID NUMBER(38,0),
117 | LAP NUMBER(38,0),
118 | POSITION FLOAT,
119 | TIME VARCHAR(16777216),
120 | MILLISECONDS NUMBER(38,0)
121 | );
122 | copy into lap_times
123 | from @formula1_stage/lap_times.csv
124 | on_error='continue';
125 |
126 | --PIT STOPS
127 | create or replace table formula1.raw.pit_stops (
128 | RACE_ID NUMBER(38,0),
129 | DRIVER_ID NUMBER(38,0),
130 | STOP NUMBER(38,0),
131 | LAP NUMBER(38,0),
132 | TIME VARCHAR(16777216),
133 | DURATION VARCHAR(16777216),
134 | MILLISECONDS NUMBER(38,0)
135 | );
136 | copy into pit_stops
137 | from @formula1_stage/pit_stops.csv
138 | on_error='continue';
139 |
140 | --QUALIFYING
141 | create or replace table formula1.raw.qualifying (
142 | QUALIFYING_ID NUMBER(38,0),
143 | RACE_ID NUMBER(38,0),
144 | DRIVER_ID NUMBER(38,0),
145 | CONSTRUCTOR_ID NUMBER(38,0),
146 | NUMBER NUMBER(38,0),
147 | POSITION FLOAT,
148 | Q1 VARCHAR(16777216),
149 | Q2 VARCHAR(16777216),
150 | Q3 VARCHAR(16777216)
151 | );
152 | copy into qualifying
153 | from @formula1_stage/qualifying.csv
154 | on_error='continue';
155 |
156 | --RACES
157 | create or replace table formula1.raw.races (
158 | RACE_ID NUMBER(38,0),
159 | YEAR NUMBER(38,0),
160 | ROUND NUMBER(38,0),
161 | CIRCUIT_ID NUMBER(38,0),
162 | NAME VARCHAR(16777216),
163 | DATE DATE,
164 | TIME VARCHAR(16777216),
165 | URL VARCHAR(16777216),
166 | FP1_DATE VARCHAR(16777216),
167 | FP1_TIME VARCHAR(16777216),
168 | FP2_DATE VARCHAR(16777216),
169 | FP2_TIME VARCHAR(16777216),
170 | FP3_DATE VARCHAR(16777216),
171 | FP3_TIME VARCHAR(16777216),
172 | QUALI_DATE VARCHAR(16777216),
173 | QUALI_TIME VARCHAR(16777216),
174 | SPRINT_DATE VARCHAR(16777216),
175 | SPRINT_TIME VARCHAR(16777216)
176 | );
177 | copy into races
178 | from @formula1_stage/races.csv
179 | on_error='continue';
180 |
181 | --RESULTS
182 | create or replace table formula1.raw.results (
183 | RESULT_ID NUMBER(38,0),
184 | RACE_ID NUMBER(38,0),
185 | DRIVER_ID NUMBER(38,0),
186 | CONSTRUCTOR_ID NUMBER(38,0),
187 | NUMBER NUMBER(38,0),
188 | GRID NUMBER(38,0),
189 | POSITION FLOAT,
190 | POSITION_TEXT VARCHAR(16777216),
191 | POSITION_ORDER NUMBER(38,0),
192 | POINTS NUMBER(38,0),
193 | LAPS NUMBER(38,0),
194 | TIME VARCHAR(16777216),
195 | MILLISECONDS NUMBER(38,0),
196 | FASTEST_LAP NUMBER(38,0),
197 | RANK NUMBER(38,0),
198 | FASTEST_LAP_TIME VARCHAR(16777216),
199 | FASTEST_LAP_SPEED FLOAT,
200 | STATUS_ID NUMBER(38,0)
201 | );
202 | copy into results
203 | from @formula1_stage/results.csv
204 | on_error='continue';
205 |
206 | --SEASONS
207 | create or replace table formula1.raw.seasons (
208 | YEAR NUMBER(38,0),
209 | URL VARCHAR(16777216)
210 | );
211 | copy into seasons
212 | from @formula1_stage/seasons.csv
213 | on_error='continue';
214 |
215 | --SPRINT RESULTS
216 | create or replace table formula1.raw.sprint_results (
217 | RESULT_ID NUMBER(38,0),
218 | RACE_ID NUMBER(38,0),
219 | DRIVER_ID NUMBER(38,0),
220 | CONSTRUCTOR_ID NUMBER(38,0),
221 | NUMBER NUMBER(38,0),
222 | GRID NUMBER(38,0),
223 | POSITION FLOAT,
224 | POSITION_TEXT VARCHAR(16777216),
225 | POSITION_ORDER NUMBER(38,0),
226 | POINTS NUMBER(38,0),
227 | LAPS NUMBER(38,0),
228 | TIME VARCHAR(16777216),
229 | MILLISECONDS NUMBER(38,0),
230 | FASTEST_LAP VARCHAR(16777216),
231 | FASTEST_LAP_TIME VARCHAR(16777216),
232 | STATUS_ID NUMBER(38,0)
233 | );
234 | copy into sprint_results
235 | from @formula1_stage/sprint_results.csv
236 | on_error='continue';
237 |
238 | --STATUS
239 | create or replace table formula1.raw.status (
240 | STATUS_ID NUMBER(38,0),
241 | STATUS VARCHAR(16777216)
242 | );
243 | copy into status
244 | from @formula1_stage/status.csv
245 | on_error='continue';
--------------------------------------------------------------------------------
/snapshots/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dbt-labs/python-snowpark-formula1/b6caebae7914a119c66ac16a4430f5e10faf41a8/snapshots/.gitkeep
--------------------------------------------------------------------------------
/tests/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dbt-labs/python-snowpark-formula1/b6caebae7914a119c66ac16a4430f5e10faf41a8/tests/.gitkeep
--------------------------------------------------------------------------------
/tests/lap_times_moving_avg_assert_positive_or_null.sql:
--------------------------------------------------------------------------------
1 | {{
2 | config(
3 | enabled=true,
4 | severity='error',
5 | tags = ['bi']
6 | )
7 | }}
8 |
9 | with lap_times_moving_avg as ( select * from {{ ref('lap_times_moving_avg') }} )
10 |
11 | select *
12 | from lap_times_moving_avg
13 | where lap_moving_avg_5_years < 0 and lap_moving_avg_5_years is not null
--------------------------------------------------------------------------------
/tests/macro_pit_stops_mean_is_positive.sql:
--------------------------------------------------------------------------------
1 | {{
2 | config(
3 | enabled=true,
4 | severity='warn',
5 | tags = ['bi']
6 | )
7 | }}
8 |
9 |
10 | {{ test_all_values_gte_zero('fastest_pit_stops_by_constructor', 'mean') }}
--------------------------------------------------------------------------------