├── .DS_Store ├── LICENSES ├── CODE_OF_CONDUCT.md ├── LICENSE ├── LICENSE-CODE ├── README.md └── SECURITY.md ├── README.md ├── SECURITY.md ├── SQL2019BDC ├── 00 - Prerequisites.md ├── 01 - The Big Data Landscape.md ├── 02 - SQL Server BDC Components.md ├── 03 - Planning, Installation and Configuration.md ├── 04 - Operationalization.md ├── 05 - Management and Monitoring.md ├── 06 - Security.md ├── notebooks │ ├── README.md │ ├── bdc_tutorial_00.ipynb │ ├── bdc_tutorial_01.ipynb │ ├── bdc_tutorial_02.ipynb │ ├── bdc_tutorial_03.ipynb │ ├── bdc_tutorial_04.ipynb │ └── bdc_tutorial_05.ipynb └── ssms │ └── SQL Server Scripts for bdc │ ├── SQL Server Scripts for bdc.ssmssln │ └── SQL Server Scripts for bdc │ ├── 01 - Show Configuration.sql │ ├── 02 - Population Information from WWI.sql │ ├── 03 - Sales in WWI.sql │ ├── 04 - Join to HDFS.sql │ ├── 05 - Query from Data Pool.sql │ └── SQL Server Scripts for bdc.ssmssqlproj └── graphics ├── ADS-5.png ├── KubernetesCluster.png ├── WWI-001.png ├── WWI-002.png ├── WWI-003.png ├── WWI-logo.png ├── adf.png ├── ads-1.png ├── ads-2.png ├── ads-3.png ├── ads-4.png ├── ads.png ├── aks1.png ├── aks2.png ├── bdc-security-1.png ├── bdc.png ├── bdcportal.png ├── bdcsolution1.png ├── bdcsolution2.png ├── bdcsolution3.png ├── bdcsolution4.png ├── bookpencil.png ├── building1.png ├── bulletlist.png ├── checkbox.png ├── checkmark.png ├── clipboardcheck.png ├── cloud1.png ├── datamart.png ├── datamart1.png ├── datavirtualization.png ├── datavirtualization1.png ├── education1.png ├── factory.png ├── geopin.png ├── grafana.png ├── hdfs.png ├── kibana.png ├── kubectl.png ├── kubernetes1.png ├── listcheck.png ├── microsoftlogo.png ├── owl.png ├── paperclip1.png ├── pencil2.png ├── pinmap.png ├── point1.png ├── solutiondiagram.png ├── spark1.png ├── spark2.png ├── spark3.png ├── spark4.png ├── sqlbdc.png └── textbubble.png /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/.DS_Store -------------------------------------------------------------------------------- /LICENSES/CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Microsoft Open Source Code of Conduct 2 | 3 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). 4 | 5 | Resources: 6 | 7 | - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/) 8 | - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) 9 | - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns 10 | -------------------------------------------------------------------------------- /LICENSES/LICENSE: -------------------------------------------------------------------------------- 1 | Attribution 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. 
Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution 4.0 International Public License 58 | 59 | By exercising the Licensed Rights (defined below), You accept and agree 60 | to be bound by the terms and conditions of this Creative Commons 61 | Attribution 4.0 International Public License ("Public License"). To the 62 | extent this Public License may be interpreted as a contract, You are 63 | granted the Licensed Rights in consideration of Your acceptance of 64 | these terms and conditions, and the Licensor grants You such rights in 65 | consideration of benefits the Licensor receives from making the 66 | Licensed Material available under these terms and conditions. 67 | 68 | 69 | Section 1 -- Definitions. 70 | 71 | a. Adapted Material means material subject to Copyright and Similar 72 | Rights that is derived from or based upon the Licensed Material 73 | and in which the Licensed Material is translated, altered, 74 | arranged, transformed, or otherwise modified in a manner requiring 75 | permission under the Copyright and Similar Rights held by the 76 | Licensor. 
For purposes of this Public License, where the Licensed 77 | Material is a musical work, performance, or sound recording, 78 | Adapted Material is always produced where the Licensed Material is 79 | synched in timed relation with a moving image. 80 | 81 | b. Adapter's License means the license You apply to Your Copyright 82 | and Similar Rights in Your contributions to Adapted Material in 83 | accordance with the terms and conditions of this Public License. 84 | 85 | c. Copyright and Similar Rights means copyright and/or similar rights 86 | closely related to copyright including, without limitation, 87 | performance, broadcast, sound recording, and Sui Generis Database 88 | Rights, without regard to how the rights are labeled or 89 | categorized. For purposes of this Public License, the rights 90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 91 | Rights. 92 | 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. Share means to provide material to the public by any means or 116 | process that requires permission under the Licensed Rights, such 117 | as reproduction, public display, public performance, distribution, 118 | dissemination, communication, or importation, and to make material 119 | available to the public including in ways that members of the 120 | public may access the material from a place and at a time 121 | individually chosen by them. 122 | 123 | j. Sui Generis Database Rights means rights other than copyright 124 | resulting from Directive 96/9/EC of the European Parliament and of 125 | the Council of 11 March 1996 on the legal protection of databases, 126 | as amended and/or succeeded, as well as other essentially 127 | equivalent rights anywhere in the world. 128 | 129 | k. You means the individual or entity exercising the Licensed Rights 130 | under this Public License. Your has a corresponding meaning. 131 | 132 | 133 | Section 2 -- Scope. 134 | 135 | a. License grant. 136 | 137 | 1. Subject to the terms and conditions of this Public License, 138 | the Licensor hereby grants You a worldwide, royalty-free, 139 | non-sublicensable, non-exclusive, irrevocable license to 140 | exercise the Licensed Rights in the Licensed Material to: 141 | 142 | a. reproduce and Share the Licensed Material, in whole or 143 | in part; and 144 | 145 | b. produce, reproduce, and Share Adapted Material. 146 | 147 | 2. Exceptions and Limitations. 
For the avoidance of doubt, where 148 | Exceptions and Limitations apply to Your use, this Public 149 | License does not apply, and You do not need to comply with 150 | its terms and conditions. 151 | 152 | 3. Term. The term of this Public License is specified in Section 153 | 6(a). 154 | 155 | 4. Media and formats; technical modifications allowed. The 156 | Licensor authorizes You to exercise the Licensed Rights in 157 | all media and formats whether now known or hereafter created, 158 | and to make technical modifications necessary to do so. The 159 | Licensor waives and/or agrees not to assert any right or 160 | authority to forbid You from making technical modifications 161 | necessary to exercise the Licensed Rights, including 162 | technical modifications necessary to circumvent Effective 163 | Technological Measures. For purposes of this Public License, 164 | simply making modifications authorized by this Section 2(a) 165 | (4) never produces Adapted Material. 166 | 167 | 5. Downstream recipients. 168 | 169 | a. Offer from the Licensor -- Licensed Material. Every 170 | recipient of the Licensed Material automatically 171 | receives an offer from the Licensor to exercise the 172 | Licensed Rights under the terms and conditions of this 173 | Public License. 174 | 175 | b. No downstream restrictions. You may not offer or impose 176 | any additional or different terms or conditions on, or 177 | apply any Effective Technological Measures to, the 178 | Licensed Material if doing so restricts exercise of the 179 | Licensed Rights by any recipient of the Licensed 180 | Material. 181 | 182 | 6. No endorsement. Nothing in this Public License constitutes or 183 | may be construed as permission to assert or imply that You 184 | are, or that Your use of the Licensed Material is, connected 185 | with, or sponsored, endorsed, or granted official status by, 186 | the Licensor or others designated to receive attribution as 187 | provided in Section 3(a)(1)(A)(i). 188 | 189 | b. Other rights. 190 | 191 | 1. Moral rights, such as the right of integrity, are not 192 | licensed under this Public License, nor are publicity, 193 | privacy, and/or other similar personality rights; however, to 194 | the extent possible, the Licensor waives and/or agrees not to 195 | assert any such rights held by the Licensor to the limited 196 | extent necessary to allow You to exercise the Licensed 197 | Rights, but not otherwise. 198 | 199 | 2. Patent and trademark rights are not licensed under this 200 | Public License. 201 | 202 | 3. To the extent possible, the Licensor waives any right to 203 | collect royalties from You for the exercise of the Licensed 204 | Rights, whether directly or through a collecting society 205 | under any voluntary or waivable statutory or compulsory 206 | licensing scheme. In all other cases the Licensor expressly 207 | reserves any right to collect such royalties. 208 | 209 | 210 | Section 3 -- License Conditions. 211 | 212 | Your exercise of the Licensed Rights is expressly made subject to the 213 | following conditions. 214 | 215 | a. Attribution. 216 | 217 | 1. If You Share the Licensed Material (including in modified 218 | form), You must: 219 | 220 | a. retain the following if it is supplied by the Licensor 221 | with the Licensed Material: 222 | 223 | i. 
identification of the creator(s) of the Licensed 224 | Material and any others designated to receive 225 | attribution, in any reasonable manner requested by 226 | the Licensor (including by pseudonym if 227 | designated); 228 | 229 | ii. a copyright notice; 230 | 231 | iii. a notice that refers to this Public License; 232 | 233 | iv. a notice that refers to the disclaimer of 234 | warranties; 235 | 236 | v. a URI or hyperlink to the Licensed Material to the 237 | extent reasonably practicable; 238 | 239 | b. indicate if You modified the Licensed Material and 240 | retain an indication of any previous modifications; and 241 | 242 | c. indicate the Licensed Material is licensed under this 243 | Public License, and include the text of, or the URI or 244 | hyperlink to, this Public License. 245 | 246 | 2. You may satisfy the conditions in Section 3(a)(1) in any 247 | reasonable manner based on the medium, means, and context in 248 | which You Share the Licensed Material. For example, it may be 249 | reasonable to satisfy the conditions by providing a URI or 250 | hyperlink to a resource that includes the required 251 | information. 252 | 253 | 3. If requested by the Licensor, You must remove any of the 254 | information required by Section 3(a)(1)(A) to the extent 255 | reasonably practicable. 256 | 257 | 4. If You Share Adapted Material You produce, the Adapter's 258 | License You apply must not prevent recipients of the Adapted 259 | Material from complying with this Public License. 260 | 261 | 262 | Section 4 -- Sui Generis Database Rights. 263 | 264 | Where the Licensed Rights include Sui Generis Database Rights that 265 | apply to Your use of the Licensed Material: 266 | 267 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 268 | to extract, reuse, reproduce, and Share all or a substantial 269 | portion of the contents of the database; 270 | 271 | b. if You include all or a substantial portion of the database 272 | contents in a database in which You have Sui Generis Database 273 | Rights, then the database in which You have Sui Generis Database 274 | Rights (but not its individual contents) is Adapted Material; and 275 | 276 | c. You must comply with the conditions in Section 3(a) if You Share 277 | all or a substantial portion of the contents of the database. 278 | 279 | For the avoidance of doubt, this Section 4 supplements and does not 280 | replace Your obligations under this Public License where the Licensed 281 | Rights include other Copyright and Similar Rights. 282 | 283 | 284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 285 | 286 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 296 | 297 | b. 
TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 306 | 307 | c. The disclaimer of warranties and limitation of liability provided 308 | above shall be interpreted in a manner that, to the extent 309 | possible, most closely approximates an absolute disclaimer and 310 | waiver of all liability. 311 | 312 | 313 | Section 6 -- Term and Termination. 314 | 315 | a. This Public License applies for the term of the Copyright and 316 | Similar Rights licensed here. However, if You fail to comply with 317 | this Public License, then Your rights under this Public License 318 | terminate automatically. 319 | 320 | b. Where Your right to use the Licensed Material has terminated under 321 | Section 6(a), it reinstates: 322 | 323 | 1. automatically as of the date the violation is cured, provided 324 | it is cured within 30 days of Your discovery of the 325 | violation; or 326 | 327 | 2. upon express reinstatement by the Licensor. 328 | 329 | For the avoidance of doubt, this Section 6(b) does not affect any 330 | right the Licensor may have to seek remedies for Your violations 331 | of this Public License. 332 | 333 | c. For the avoidance of doubt, the Licensor may also offer the 334 | Licensed Material under separate terms or conditions or stop 335 | distributing the Licensed Material at any time; however, doing so 336 | will not terminate this Public License. 337 | 338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 339 | License. 340 | 341 | 342 | Section 7 -- Other Terms and Conditions. 343 | 344 | a. The Licensor shall not be bound by any additional or different 345 | terms or conditions communicated by You unless expressly agreed. 346 | 347 | b. Any arrangements, understandings, or agreements regarding the 348 | Licensed Material not stated herein are separate from and 349 | independent of the terms and conditions of this Public License. 350 | 351 | 352 | Section 8 -- Interpretation. 353 | 354 | a. For the avoidance of doubt, this Public License does not, and 355 | shall not be interpreted to, reduce, limit, restrict, or impose 356 | conditions on any use of the Licensed Material that could lawfully 357 | be made without permission under this Public License. 358 | 359 | b. To the extent possible, if any provision of this Public License is 360 | deemed unenforceable, it shall be automatically reformed to the 361 | minimum extent necessary to make it enforceable. If the provision 362 | cannot be reformed, it shall be severed from this Public License 363 | without affecting the enforceability of the remaining terms and 364 | conditions. 365 | 366 | c. No term or condition of this Public License will be waived and no 367 | failure to comply consented to unless expressly agreed to by the 368 | Licensor. 369 | 370 | d. 
Nothing in this Public License constitutes or may be interpreted 371 | as a limitation upon, or waiver of, any privileges and immunities 372 | that apply to the Licensor or You, including from the legal 373 | processes of any jurisdiction or authority. 374 | 375 | 376 | ======================================================================= 377 | 378 | Creative Commons is not a party to its public 379 | licenses. Notwithstanding, Creative Commons may elect to apply one of 380 | its public licenses to material it publishes and in those instances 381 | will be considered the “Licensor.” The text of the Creative Commons 382 | public licenses is dedicated to the public domain under the CC0 Public 383 | Domain Dedication. Except for the limited purpose of indicating that 384 | material is shared under a Creative Commons public license or as 385 | otherwise permitted by the Creative Commons policies published at 386 | creativecommons.org/policies, Creative Commons does not authorize the 387 | use of the trademark "Creative Commons" or any other trademark or logo 388 | of Creative Commons without its prior written consent including, 389 | without limitation, in connection with any unauthorized modifications 390 | to any of its public licenses or any other arrangements, 391 | understandings, or agreements concerning use of licensed material. For 392 | the avoidance of doubt, this paragraph does not form part of the 393 | public licenses. 394 | 395 | Creative Commons may be contacted at creativecommons.org. -------------------------------------------------------------------------------- /LICENSES/LICENSE-CODE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) Microsoft Corporation. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE 22 | -------------------------------------------------------------------------------- /LICENSES/README.md: -------------------------------------------------------------------------------- 1 | ![](https://github.com/microsoft/sqlworkshops-k8stobdc/blob/master/graphics/microsoftlogo.png?raw=true) 2 | 3 | # Workshop: Kubernetes - From Bare Metal to SQL Server Big Data Clusters 4 | 5 | #### A Microsoft Course from the SQL Server team 6 | 7 |

8 | 9 |

About this Workshop

10 | 11 | Welcome to this Microsoft solutions workshop on *Kubernetes - From Bare Metal to SQL Server Big Data Clusters*. In this workshop, you'll learn about setting up a production-grade SQL Server 2019 big data cluster environment on Kubernetes. Topics covered include: hardware, virtualization, and Kubernetes, with a full deployment of SQL Server's Big Data Cluster on the environment that you will use in the class. You'll then walk through a set of [Jupyter Notebooks](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html) in Microsoft's [Azure Data Studio](https://docs.microsoft.com/en-us/sql/azure-data-studio/what-is?view=sql-server-ver15) tool to run T-SQL, Spark, and Machine Learning workloads on the cluster. You'll also receive valuable resources to learn more and go deeper on Linux, Containers, Kubernetes and SQL Server big data clusters. 12 | 13 | The focus of this workshop is to understand the hardware, software, and environment you need to work with [SQL Server 2019's big data clusters](https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-overview?view=sql-server-ver15) on a Kubernetes platform. 14 | 15 | You'll start by understanding Containers and Kubernetes, moving on to a discussion of the hardware and software environment for Kubernetes, and then to more in-depth Kubernetes concepts. You'll follow-on with the SQL Server 2019 big data clusters architecture, and then how to use the entire system in a practical application, all with a focus on how to extrapolate what you have learned to create other solutions for your organization. 16 | 17 | > NOTE: This course is designed to be taught in-person with hardware or virtual environments provided by the instructional team. You will also get details for setting up your own hardware, virtual or Cloud environments for Kubernetes for a workshop backup or if you are not attending in-person. 18 | 19 | This [github README.MD file](https://lab.github.com/githubtraining/introduction-to-github) explains how the workshop is laid out, what you will learn, and the technologies you will use in this solution. 20 | 21 | (You can view all of the [source files for this workshop on this github site, along with other workshops as well. Open this link in a new tab to find out more.](https://github.com/microsoft/sqlworkshops-k8stobdc)) 22 | 23 |

24 | 25 |

Learning Objectives

26 | 27 | In this workshop you'll learn: 28 |
29 | 30 | - How Containers and Kubernetes work and when and where you can use them 31 | - Hardware considerations for setting up a production Kubernetes Cluster on-premises 32 | - Considerations for Virtual and Cloud-based environments for production Kubernetes Cluster 33 | 34 | The concepts and skills taught in this workshop form the starting points for: 35 | 36 | Solution Architects, to understand how to design an end-to-end solution. 37 | System Administrators, Database Administrators, or Data Engineers, to understand how to put together an end-to-end solution. 38 | 39 |

40 | 41 |

Business Applications of this Workshop

42 | 43 | Businesses require stable, secure environments at scale, which work in secure on-premises and in-cloud configurations. Using Kubernetes and Containers allows for manifest-driven DevOps practices, which further streamline IT processes. 44 | 45 |
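"Manifest-driven" here means that the desired state of the environment is captured in version-controlled files and applied to the cluster, rather than configured by hand. As a small, hypothetical illustration (the deployment name, label, and image below are placeholders, not part of this workshop's solution):

```powershell
# A tiny Kubernetes manifest held in a PowerShell here-string; in practice the
# manifest would live in source control and be applied by a CI/CD pipeline.
@"
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: hello-web
        image: nginx:1.25
        ports:
        - containerPort: 80
"@ | kubectl apply -f -

# The same manifest can be re-applied after every change; Kubernetes reconciles
# the running state to match it.
```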

46 | 47 |

Technologies used in this Workshop

48 | 49 | The solution includes the following technologies - although you are not limited to these, they form the basis of the workshop. At the end of the workshop you will learn how to extrapolate these components into other solutions. You will cover these at an overview level, with references to much deeper training provided. 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 |
| Technology | Description |
| --- | --- |
| Linux | The primary operating system used in and by Containers and Kubernetes |
| Containers | The atomic layer of a Kubernetes Cluster |
| Kubernetes | The primary clustering technology for manifest-driven environments |
| SQL Server Big Data Clusters | Relational and non-relational data at scale with Spark, HDFS and application deployment capabilities |
61 | 62 |

63 | 64 |

Before Taking this Workshop

65 | 66 | There are a few requirements for attending the workshop, listed below: 67 | - You'll need a local system that you are able to install software on. The workshop demonstrations use Microsoft Windows as an operating system and all examples use Windows for the workshop. Optionally, you can use a Microsoft Azure Virtual Machine (VM) to install the software on and work with the solution. 68 | - You must have a Microsoft Azure account with the ability to create assets for the "backup" or self-taught path. 69 | - This workshop expects that you understand computer technologies, networking, the basics of SQL Server, HDFS, Spark, and general use of Hypervisors. 70 | - The **Setup** section below explains the steps you should take prior to coming to the workshop 71 | 72 | If you are new to any of these, here are a few references you can complete prior to class: 73 | 74 | - [Microsoft SQL Server Administration and Use](https://www.microsoft.com/en-us/learning/course.aspx?cid=OD20764) 75 | - [HDFS](https://data-flair.training/blogs/hadoop-hdfs-tutorial/) 76 | - [Spark](https://www.edx.org/course/implementing-predictive-analytics-with-spark-in-az) 77 | - [Hypervisor Technologies - Hyper-V](https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/hyper-v-technology-overview) 78 | or 79 | - [Hypervisor Technologies - VMWare](https://tsmith.co/free-vmware-training/) 80 | 81 |

Setup

82 | 83 | A full prerequisites document is located here. These instructions should be completed before the workshop starts, since you will not have time to cover these in class. Remember to turn off any Virtual Machines from the Azure Portal when not taking the class so that you do not incur charges (shutting down the machine from within the VM itself is not sufficient). 84 | 85 |

86 | 87 |

Workshop Details

88 | 89 | This workshop uses Kubernetes to deploy a workload, with a focus on Microsoft SQL Server's big data clusters deployment for advanced analytics over large sets of data and Data Science workloads. 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 |
- **Primary Audience:** Technical professionals tasked with configuring, deploying and managing large-scale clustering systems
- **Secondary Audience:** Data professionals tasked with working with data at scale
- **Level:** 300
- **Type:** In-Person (self-guided possible)
- **Length:** 8 hours
100 | 101 |

102 | 103 |

Related Workshops

104 | 105 | - [50 days from zero to hero with Kubernetes](https://azure.microsoft.com/mediahandler/files/resourcefiles/kubernetes-learning-path/Kubernetes%20Learning%20Path%20version%201.0.pdf) 106 | 107 |

108 | 109 |

Workshop Modules

110 | 111 | This is a modular workshop, and in each section, you'll learn concepts, technologies and processes to help you complete the solution. 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 |
| Module | Topics |
| --- | --- |
| 01 - An introduction to Linux, Containers and Kubernetes | This module covers Container technologies and how they are different than Virtual Machines. You'll learn about the need for container orchestration using Kubernetes. |
| 02 - Hardware and Virtualization environment for Kubernetes | This module explains how to make a production-grade environment using "bare metal" computer hardware or with a virtualized platform, and most importantly the storage hardware aspects. |
| 03 - Kubernetes Concepts and Implementation | Covers deploying Kubernetes, Kubernetes contexts, cluster troubleshooting and management, services: load balancing versus node ports, understanding storage from a Kubernetes perspective and making your cluster secure. |
| 04 - SQL Server Big Data Clusters Architecture | This module will dig deep into the anatomy of a big data cluster by covering topics that include: the data pool, storage pool, compute pool and cluster control plane, active directory integration, development versus production configurations and the tools required for deploying and managing a big data cluster. |
| 05 - Using the SQL Server big data cluster on Kubernetes for Data Science | Now that your big data cluster is up, it's ready for data science workloads. This Jupyter Notebook and Azure Data Studio based module will cover the use of python and PySpark, T-SQL and the execution of Spark and Machine Learning workloads. |
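Module 03 in particular is hands-on with the kubectl client; commands along the following lines (standard kubectl, shown here from PowerShell, with placeholder names you would replace) give a feel for the contexts, troubleshooting, services and storage topics it covers:

```powershell
# Which clusters does this workstation know about, and which one is active?
kubectl config get-contexts

# Basic cluster troubleshooting
kubectl get nodes -o wide
kubectl get pods --all-namespaces
# kubectl describe pod POD-NAME --namespace NAMESPACE   # fill in real names to inspect a pod

# Services (load balancing versus node ports) and storage
kubectl get svc --all-namespaces
kubectl get pv,pvc --all-namespaces
```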
123 | 124 |

125 | 126 |

Next Steps

127 | 128 | 129 | Next, Continue to Pre-Requisites 130 | 131 | **Workshop Authors and Contributors** 132 | 133 | - [The Microsoft SQL Server Team](http://microsoft.com/sql) 134 | - [Chris Adkin](https://www.linkedin.com/in/wollatondba/), Pure Storage 135 | 136 | **Legal Notice** 137 | 138 | *Kubernetes and the Kubernetes logo are trademarks or registered trademarks of The Linux Foundation in the United States and/or other countries. The Linux Foundation and other parties may also have trademark rights in other terms used herein. This Workshop is not certified, accredited, affiliated with, nor endorsed by Kubernetes or The Linux Foundation.* -------------------------------------------------------------------------------- /LICENSES/SECURITY.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | ## Security 4 | 5 | Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/). 6 | 7 | If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below. 8 | 9 | ## Reporting Security Issues 10 | 11 | **Please do not report security vulnerabilities through public GitHub issues.** 12 | 13 | Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report). 14 | 15 | If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc). 16 | 17 | You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc). 18 | 19 | Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue: 20 | 21 | * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.) 22 | * Full paths of source file(s) related to the manifestation of the issue 23 | * The location of the affected source code (tag/branch/commit or direct URL) 24 | * Any special configuration required to reproduce the issue 25 | * Step-by-step instructions to reproduce the issue 26 | * Proof-of-concept or exploit code (if possible) 27 | * Impact of the issue, including how an attacker might exploit the issue 28 | 29 | This information will help us triage your report more quickly. 30 | 31 | If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs.
32 | 33 | ## Preferred Languages 34 | 35 | We prefer all communications to be in English. 36 | 37 | ## Policy 38 | 39 | Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd). 40 | 41 | 42 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![](graphics/microsoftlogo.png) 2 | 3 | # Workshop: SQL Server Big Data Clusters - Architecture 4 | 5 | #### A Microsoft Course from the SQL Server team 6 | 7 |

8 | 9 |
10 | 11 |
About this Workshop
12 |
Business Applications of this Workshop
13 |
Technologies used in this Workshop
14 |
Before Taking this Workshop
15 |
Workshop Details
16 |
Related Workshops
17 |
Workshop Modules
18 |
Next Steps
19 | 20 |
21 | 22 |

About this Workshop

23 | 24 | Welcome to this Microsoft solutions workshop on the architecture of *SQL Server Big Data Clusters*. In this workshop, you'll learn how SQL Server Big Data Clusters (BDC) implements large-scale data processing and machine learning, how to select and plan the proper architecture, how to train machine learning models using Python, R, Java or SparkML and operationalize those models, and how to deploy your intelligent apps side-by-side with their data. 25 | 26 | The focus of this workshop is to understand how to deploy an on-premises or local environment of a big data cluster, and understand the components of the big data solution architecture. 27 | 28 | You'll start by understanding the concepts of big data analytics, and you'll get an overview of the technologies (such as containers, container orchestration, Spark and HDFS, machine learning, and other technologies) that you will use throughout the workshop. Next, you'll understand the architecture of a BDC. You'll learn how to create external tables over other data sources to unify your data, and how to use Spark to run big queries over your data in HDFS or do data preparation. You'll review a complete solution for an end-to-end scenario, with a focus on how to extrapolate what you have learned to create other solutions for your organization. 29 | 30 | This [github README.MD file](https://lab.github.com/githubtraining/introduction-to-github) explains how the workshop is laid out, what you will learn, and the technologies you will use in this solution. To download this Lab to your local computer, click the **Clone or Download** button you see at the top right side of this page. [More about that process is here](https://help.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository). 31 | 32 | You can view all of the [courses and other workshops our team has created at this link - open in a new tab to find out more.](https://microsoft.github.io/sqlworkshops/) 33 | 34 |

35 | 36 |

Learning Objectives

37 | 38 | In this workshop you'll learn: 39 |
40 | 41 | - When to use Big Data technology 42 | - The components and technologies of Big Data processing 43 | - Abstractions such as Containers and Container Management as they relate to SQL Server and Big Data 44 | - Planning and architecting an on-premises, in-cloud, or hybrid big data solution with SQL Server 45 | - How to install SQL Server big data clusters on-premises and in the Azure Kubernetes Service (AKS) 46 | - How to work with Apache Spark 47 | - The Data Science Process to create an end-to-end solution 48 | - How to work with the tooling for BDC (Azure Data Studio) 49 | - Monitoring and managing the BDC 50 | - Security considerations 51 | 52 | Starting in SQL Server 2019, big data clusters allows for large-scale, near real-time processing of data over the HDFS file system and other data sources. It also leverages the Apache Spark framework which is integrated into one environment for management, monitoring, and security of your environment. This means that organizations can implement everything from queries to analysis to Machine Learning and Artificial Intelligence within SQL Server, over large-scale, heterogeneous data. SQL Server big data clusters can be implemented fully on-premises, in the cloud using a Kubernetes service such as Azure's AKS, and in a hybrid fashion. This allows for full, partial, and mixed security and control as desired. 53 | 54 | The goal of this workshop is to train the team tasked with architecting and implementing SQL Server big data clusters in the planning, creation, and delivery of a system designed to be used for large-scale data analytics. Since there are multiple technologies and concepts within this solution, the workshop uses multiple types of exercises to prepare the students for this implementation. 55 | 56 | The concepts and skills taught in this workshop form the starting points for: 57 | 58 | * Data Professionals and DevOps teams, to implement and operate a SQL Server big data cluster system. 59 | * Solution Architects and Developers, to understand how to put together an end-to-end solution. 60 | * Data Scientists, to understand the environment used to analyze and solve specific predictive problems. 61 | 62 |

63 |

Business Applications of this Workshop

64 | 65 | Businesses require near real-time insights from ever-larger sets of data from a variety of sources. Large-scale data ingestion requires scale-out storage and processing in ways that allow fast response times. In addition to simply querying this data, organizations want full analysis and even predictive capabilities over their data. 66 | 67 | Some industry examples of big data processing are in Retail (*Demand Prediction, Market-Basket Analysis*), Finance (*Fraud detection, customer segmentation*), Healthcare (*Fiscal control analytics, Disease Prevention prediction and classification, Clinical Trials optimization*), Public Sector (*Revenue prediction, Education effectiveness analysis*), Manufacturing (*Predictive Maintenance, Anomaly Detection*) and Agriculture (*Food Safety analysis, Crop forecasting*) to name just a few. 68 | 69 |

70 | 71 |

Technologies used in this Workshop

72 | 73 | The solution includes the following technologies - although you are not limited to these, they form the basis of the workshop. At the end of the workshop you will learn how to extrapolate these components into other solutions. You will cover these at an overview level, with references to much deeper training provided. 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 |
| Technology | Description |
| --- | --- |
| Linux | Operating system used in Containers and Container Orchestration |
| Containers | Encapsulation level for the SQL Server big data cluster architecture |
| Container Orchestration (such as Kubernetes) | Management, control plane for Containers |
| Microsoft Azure | Cloud environment for services |
| Azure Kubernetes Service (AKS) | Kubernetes as a Service |
| Apache HDFS | Scale-out storage subsystem |
| Apache Knox | The Knox Gateway provides a single access point for all REST interactions, used for security |
| Apache Livy | Job submission system for Apache Spark |
| Apache Spark | In-memory large-scale, scale-out data processing architecture used by SQL Server |
| Python, R, Java, SparkML | ML/AI programming languages used for Machine Learning and AI Model creation |
| Azure Data Studio | Tooling for SQL Server, HDFS, Big Data cluster management, T-SQL, R, Python, and SparkML languages |
| SQL Server Machine Learning Services | R, Python and Java extensions for SQL Server |
| Microsoft Team Data Science Process (TDSP) | Project, Development, Control and Management framework |
| Monitoring and Management | Dashboards, logs, APIs and other constructs to manage and monitor the solution |
| Security | RBAC, Keys, Secrets, VNETs and Compliance for the solution |
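Several of the components above (Knox, Livy, the controller, Grafana and Kibana) surface as endpoints of a deployed cluster. As a rough illustration of how they are reached once a BDC is running (assuming you have the cluster name and credentials chosen at deployment time):

```powershell
# Log in to the cluster controller; you are prompted for the cluster
# name (namespace), username and password set during deployment
azdata login

# List the externally reachable endpoints: SQL Server master instance,
# HDFS/Spark gateway (Knox), controller, Grafana and Kibana dashboards, and so on
azdata bdc endpoint list --output table
```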
96 | 97 |

98 | 99 | Condensed Lab: 100 | If you have already completed the pre-requisites for this course and are familiar with the technologies listed above, you can jump to a Jupyter Notebooks-based tutorial located here. Load these with Azure Data Studio, starting with bdc_tutorial_00.ipynb. 101 |

102 | 103 |

104 | 105 |

Before Taking this Workshop

106 | 107 | You'll need a local system that you are able to install software on. The workshop demonstrations use Microsoft Windows as an operating system and all examples use Windows for the workshop. Optionally, you can use a Microsoft Azure Virtual Machine (VM) to install the software on and work with the solution. 108 | 109 | You must have a Microsoft Azure account with the ability to create assets, specifically the Azure Kubernetes Service (AKS). 110 | 111 | This workshop expects that you understand data structures and working with SQL Server and computer networks. This workshop does not expect you to have any prior data science knowledge, but a basic knowledge of statistics and data science is helpful in the Data Science sections. Knowledge of SQL Server, Azure Data and AI services, Python, and Jupyter Notebooks is recommended. AI techniques are implemented in Python packages. Solution templates are implemented using Azure services, development tools, and SDKs. You should have a basic understanding of working with the Microsoft Azure Platform. 112 | 113 | If you are new to these, here are a few references you can complete prior to class: 114 | 115 | - [Microsoft SQL Server](https://docs.microsoft.com/en-us/sql/relational-databases/database-engine-tutorials?view=sql-server-ver15) 116 | - [Microsoft Azure](https://docs.microsoft.com/en-us/learn/paths/azure-fundamentals/) 117 | 118 | 119 |

Setup

120 | 121 | A full prerequisites document is located here. These instructions should be completed before the workshop starts, since you will not have time to cover these in class. Remember to turn off any Virtual Machines from the Azure Portal when not taking the class so that you do not incur charges (shutting down the machine from within the VM itself is not sufficient). 122 | 123 |
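If you prefer the command line to the Azure Portal for this, deallocating the VM releases its compute (and the associated compute charges); the resource group and VM names below are placeholders for your own values:

```powershell
# Deallocate the workstation VM so it stops accruing compute charges
az vm deallocate --resource-group bdc-workshop-rg --name bdc-workstation-vm

# Start it again when you return to the workshop
az vm start --resource-group bdc-workshop-rg --name bdc-workstation-vm
```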

124 | 125 |

Workshop Details

126 | 127 | This workshop uses Azure Data Studio, Microsoft Azure AKS, and SQL Server (2019 and higher) with a focus on architecture and implementation. 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 |
- **Primary Audience:** System Architects and Data Professionals tasked with implementing Big Data, Machine Learning and AI solutions
- **Secondary Audience:** Security Architects, Developers, and Data Scientists
- **Level:** 300
- **Type:** In-Person
- **Length:** 8-9 hours
138 | 139 |

140 | 141 |

Related Workshops

142 | 143 | - [Technical guide to the Cortana Intelligence Solution Template for predictive maintenance in aerospace and other businesses](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/cortana-analytics-technical-guide-predictive-maintenance) 144 | 145 |

146 | 147 |

Workshop Modules

148 | 149 | This is a modular workshop, and in each section, you'll learn concepts, technologies and processes to help you complete the solution. 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 |
| Module | Topics |
| --- | --- |
| 01 - The Big Data Landscape | Overview of the workshop, problem space, solution options and architectures |
| 02 - SQL Server BDC Components | Abstraction levels, frameworks, architectures and components within SQL Server big data clusters |
| 03 - Planning, Installation and Configuration | Mapping the requirements to the architecture design, constraints, and diagrams |
| 04 - Operationalization | Connecting applications to the solution; DDL, DML, DCL |
| 05 - Management and Monitoring | Tools and processes to manage the big data cluster |
| 06 - Security | Access and Authentication to the various levels of the solution |
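As a preview of Module 05, routine health checks on a running cluster combine `azdata` and `kubectl`; a small sketch, assuming the default `mssql-cluster` namespace name:

```powershell
# Overall health of the big data cluster services, as reported by the controller
azdata bdc status show

# The underlying Kubernetes view of the same cluster
kubectl get pods --namespace mssql-cluster
kubectl get svc --namespace mssql-cluster
```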
163 | 164 |

165 | 166 |

Next Steps

167 | 168 | Next, Continue to prerequisites 169 | 170 | 171 | # Contributing 172 | 173 | This project welcomes contributions and suggestions. Most contributions require you to agree to a 174 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us 175 | the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com. 176 | 177 | When you submit a pull request, a CLA bot will automatically determine whether you need to provide 178 | a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions 179 | provided by the bot. You will only need to do this once across all repos using our CLA. 180 | 181 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). 182 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or 183 | contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. 184 | 185 | # Legal Notices 186 | 187 | ### License 188 | Microsoft and any contributors grant you a license to the Microsoft documentation and other content in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode), see [the LICENSE file](https://github.com/MicrosoftDocs/mslearn-tailspin-spacegame-web/blob/master/LICENSE), and grant you a license to any code in the repository under [the MIT License](https://opensource.org/licenses/MIT), see the [LICENSE-CODE file](https://github.com/MicrosoftDocs/mslearn-tailspin-spacegame-web/blob/master/LICENSE-CODE). 189 | 190 | Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation 191 | may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. 192 | The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. 193 | Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653. 194 | 195 | Privacy information can be found at https://privacy.microsoft.com/en-us/ 196 | 197 | Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, 198 | or trademarks, whether by implication, estoppel or otherwise. 199 | 200 | -------------------------------------------------------------------------------- /SECURITY.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | ## Security 4 | 5 | Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/). 6 | 7 | If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/opensource/security/definition), please report it to us as described below. 
8 | 9 | ## Reporting Security Issues 10 | 11 | **Please do not report security vulnerabilities through public GitHub issues.** 12 | 13 | Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/opensource/security/create-report). 14 | 15 | If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey). 16 | 17 | You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc). 18 | 19 | Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue: 20 | 21 | * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.) 22 | * Full paths of source file(s) related to the manifestation of the issue 23 | * The location of the affected source code (tag/branch/commit or direct URL) 24 | * Any special configuration required to reproduce the issue 25 | * Step-by-step instructions to reproduce the issue 26 | * Proof-of-concept or exploit code (if possible) 27 | * Impact of the issue, including how an attacker might exploit the issue 28 | 29 | This information will help us triage your report more quickly. 30 | 31 | If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/opensource/security/bounty) page for more details about our active programs. 32 | 33 | ## Preferred Languages 34 | 35 | We prefer all communications to be in English. 36 | 37 | ## Policy 38 | 39 | Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/opensource/security/cvd). 40 | 41 | 42 | -------------------------------------------------------------------------------- /SQL2019BDC/00 - Prerequisites.md: -------------------------------------------------------------------------------- 1 | ![](../graphics/microsoftlogo.png) 2 | 3 | # Workshop: SQL Server Big Data Clusters - Architecture 4 | 5 | #### A Microsoft Course from the SQL Server team 6 | 7 |

8 | 9 |

00 - Prerequisites

10 | 11 | This workshop is taught using the following components, which you will install and configure in the sections that follow. 12 | 13 | *(Note: Due to the nature of working with large-scale systems, it may not be possible for you to set up everything you need to perform each lab exercise. Participation in each Activity is optional - we will be working through the exercises together, but if you cannot install any software or don't have an Azure account, the instructor will work through each exercise in the workshop. You will also have full access to these materials so that you can work through them later when you have more time and resources.)* 14 | 15 | For this workshop, you will use Microsoft Windows as the base workstation, although Apple and Linux operating systems can be used in production. You can download a Windows 10 Workstation `.ISO` to create a Virtual Machine on the Hypervisor of your choice for free here. 16 | 17 | The other requirements are: 18 | 19 | - **Microsoft Azure**: This workshop uses the Microsoft Azure platform to host the Kubernetes cluster (using the Azure Kubernetes Service), and optionally you can deploy a system there to act as a workstation. You can use an MSDN Account, your own account, or potentially one provided for you, as long as you can create about $100.00 (U.S.) worth of assets. 20 | - **Azure Command Line Interface**: The Azure CLI allows you to work from the command line on multiple platforms to interact with your Azure subscription, and also has control statements for AKS. 21 | - **Python (3)**: Python version 3.5 (and higher) is used by the SQL Server programs to deploy and manage a Big Data Cluster for SQL Server (BDC). 22 | - **The pip3 Package**: The Python package manager *pip3* is used to install various BDC deployment and configuration tools. 23 | - **The kubectl program**: The *kubectl* program is the command-line control feature for Kubernetes. 24 | - **The azdata utility**: The *azdata* program is the deployment and configuration tool for BDC. 25 | - **Azure Data Studio**: The *Azure Data Studio* IDE, along with various Extensions, is used for deploying the system, and querying and management of the BDC. In addition, you will use this tool to participate in the workshop. Note: You can connect to a SQL Server 2019 Big Data Cluster using any SQL Server connection tool or application, such as SQL Server Management Studio, but this course will use Microsoft Azure Data Studio for cluster management, Jupyter Notebooks and other capabilities. 26 | 27 | *Note that all following activities must be completed prior to class - there will not be time to perform these operations during the workshop.* 28 | 29 |
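Once you have completed the activities below, a quick way to confirm that the client tools are installed and on your PATH is to run a version check for each one from PowerShell (a minimal sketch; the exact version output will vary with your installation):

```powershell
# Confirm each client-side prerequisite responds; a "not recognized" error
# means that tool still needs to be installed or added to PATH.
az --version               # Azure CLI
kubectl version --client   # Kubernetes command-line client
azdata --version           # BDC deployment and management utility
python --version           # should report Python 3.5 or higher
pip3 --version             # Python package manager
```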

Activity 1: Set up a Microsoft Azure Account

30 | 31 | You have multiple options for setting up a Microsoft Azure account to complete this workshop. You can use a Microsoft Developer Network (MSDN) account, a personal or corporate account, or in some cases a pass may be provided by the instructor. (Note: for most classes, the MSDN account is best.) 32 | 33 | **If you are attending this course in-person:** 34 | Unless you are explicitly told you will be provided an account by the instructor in the invitation to this workshop, you must have your Microsoft Azure account and Data Science Virtual Machine set up before you arrive at class. There will NOT be time to configure these resources during the course. 35 | 36 |

Option 1 - Microsoft Developer Network (MSDN) Account

37 | 38 | The best way to take this workshop is to use your [Microsoft Developer Network (MSDN) benefits if you have a subscription](https://marketplace.visualstudio.com/subscriptions). 39 | 40 | - [Open this resource and click the "Activate your monthly Azure credit" button](https://azure.microsoft.com/en-us/pricing/member-offers/credit-for-visual-studio-subscribers/) 41 | 42 |

Option 2 - Use Your Own Account

43 | 44 | You can also use your own account or one provided to you by your organization, but you must be able to create a resource group and create, start, and manage a Virtual Machine and an Azure AKS cluster. 45 | 46 |

Option 3 - Use an account provided by your instructor

47 | 48 | Your workshop invitation may have stated that a Microsoft Azure account will be provided for you to use. If so, you will receive instructions for accessing it. 49 | 50 | **Unless you received explicit instructions in your workshop invitation, you must create either an MSDN or personal account. You must have an account prior to the workshop.** 51 | 52 |

Activity 2: Prepare Your Workstation

53 |
54 | The instructions that follow are the same for either a "bare metal" workstation or laptop, or a Virtual Machine. It's best to have at least 4GB of RAM on the management system, and these instructions assume that you are not planning to run the database server or any Containers on the workstation. It's also assumed that you are using a current version of Windows, either desktop or server. 55 |
56 | 57 | *(You can copy and paste all of the commands that follow in a PowerShell window that you run as the system Administrator)* 58 | 59 |

Updates

60 | 61 | First, ensure all of your updates are current. You can use the following commands to do that in an Administrator-level PowerShell session: 62 | 63 |

 64 | write-host "Standard Install for Windows. Classroom or test system only - use at your own risk!"
 65 | Set-ExecutionPolicy RemoteSigned
 66 | 
 67 | write-host "Update Windows"
 68 | Install-Module PSWindowsUpdate
 69 | Import-Module PSWindowsUpdate
 70 | Get-WindowsUpdate
 71 | Install-WindowsUpdate
 72 | 
73 | 74 | *Note: If you get an error during this update process, evaluate it to see if it is fatal. You may receive certain driver errors if you are using a Virtual Machine; these can be safely ignored.* 75 | 76 |

Install Big Data Cluster Tools

77 | 78 | Next, install the tools to work with Big Data Clusters: 79 | 80 | 81 |

Activity 3: Install BDC Tools

82 | 83 | Open this resource, and follow all instructions for the Microsoft Windows operating system 84 | 85 | 86 | **NOTE:** For the `azdata` utility step below, [use this MSI package](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-install-azdata-installer?view=sql-server-ver15) rather than the `pip` installer. 87 | 88 | - [https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-big-data-tools?view=sql-server-ver15](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-big-data-tools?view=sql-server-ver15) 89 | 90 |
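After the installations complete, it can be helpful to confirm that each tool is on the `PATH` and reports a version. The following checks are a hedged sketch - run them from a new Administrator PowerShell or Command Prompt window; the version numbers you see will depend on the releases you installed:

```powershell
# Confirm the Big Data Cluster tooling is installed and reachable
azdata --version
kubectl version --client
az --version
python --version
pip3 --version
```

If any of these commands is not recognized, close and reopen the window (so the updated `PATH` is picked up) or re-run the corresponding installer from the instructions above.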

Activity 4: Re-Update Your Workstation

91 | 92 | After this many installations, it's always a good idea to run Windows Update again: 93 | 94 |
 95 | write-host "Re-Update Windows"
 96 | Get-WindowsUpdate
 97 | Install-WindowsUpdate
 98 | 
99 | 100 | *Note 1: If you get an error during this update process, evaluate it to see if it is fatal. You may receive certain driver errors if you are using a Virtual Machine; these can be safely ignored.* 101 | 102 | **Note 2: If you are using a Virtual Machine in Azure, power off the Virtual Machine using the Azure Portal every time you are done with it. Shutting down Windows inside the VM only stops the operating system; you are still charged for the VM unless you stop (deallocate) it from the Portal. Stop the VM from the Portal whenever you are not actively using it.** 103 | 104 |

For Further Study

105 | 108 | 109 |

Next Steps

110 | 111 | Next, Continue to 01 - The Big Data Landscape. 112 | -------------------------------------------------------------------------------- /SQL2019BDC/03 - Planning, Installation and Configuration.md: -------------------------------------------------------------------------------- 1 | ![](../graphics/microsoftlogo.png) 2 | 3 | # Workshop: SQL Server Big Data Clusters - Architecture 4 | 5 | #### A Microsoft workshop from the SQL Server team 6 | 7 |

8 | 9 |

Planning, Installation and Configuration

10 | 11 | In this workshop you'll cover using a Process and various Platform components to create a Big Data Cluster for SQL Server (BDC) solution you can deploy on premises, in the cloud, or in a hybrid architecture. In each module you'll get more references, which you should follow up on to learn more. Also watch for links within the text - click on each one to explore that topic. 12 | 13 | (Make sure you check out the prerequisites page before you start. You'll need all of the items loaded there before you can proceed with the workshop.) 14 | 15 | You'll cover the following topics in this Module: 16 | 17 |
18 | 19 |
3.0 Planning your Installation
20 |
3.1 Installing on Azure Kubernetes Service
21 |
3.2 Installing locally using KubeADM
22 |
Install Class Environment on AKS
23 | 24 |
25 | 26 |

27 | 28 |

3.0 Planning your Installation

29 | 30 | NOTE: The following Module is based on the Public Preview of the Microsoft SQL Server 2019 big data cluster feature. These instructions will change as the product is updated. The latest installation instructions are located here. 31 | 32 | A Big Data Cluster for SQL Server (BDC) is deployed onto a Cluster Orchestration system (such as Kubernetes or OpenShift) using the `azdata` utility, which creates the appropriate Nodes, Pods, Containers and other constructs for the system. The installation uses various switches on the `azdata` utility, and reads from several variables contained within an internal JSON document when you run the command. Using a switch, you can change these variables. You can also dump the entire document to a file, edit it, and then call the installation that uses that file with the `azdata` command (see the sketch at the end of this section). More detail on that process is located here. 33 | 34 | For planning, it is essential that you understand the SQL Server BDC components, and have a firm understanding of Kubernetes and TCP/IP networking. You should also have an understanding of how SQL Server and Apache Spark use the "Big Four" (*CPU, I/O, Memory and Networking*). 35 | 36 | Since the Cluster Orchestration system is often made up of Virtual Machines that host the Container Images, those machines should be sized as large as practical. For the best possible performance, large physical machines that are tuned for optimal performance are the recommended physical architecture. The minimum viable production system is three Linux physical machines or virtual machines. The recommended configuration per machine is 8 CPUs, 32 GB of memory and 100 GB of storage. This configuration would support only one or two users with a standard workload, and you would want to increase the system for each additional user or heavier workload. 37 | 38 | 39 | You can deploy Kubernetes in a few ways: 40 | 41 | - In a Cloud Platform such as Azure Kubernetes Service (AKS) 42 | 43 | - In your own Cluster Orchestration system deployment using the appropriate tools such as `KubeADM` 44 | 45 | Regardless of the Cluster Orchestration system target, the general steps for setting up the system are: 46 | 47 | - Set up the Cluster Orchestration system with a Cluster target 48 | 49 | - Install the cluster tools on the administration machine 50 | 51 | - Deploy the BDC onto the Cluster Orchestration system 52 | 53 | In the sections that follow, you'll cover the general process for each of these deployments. The official documentation referenced above has the specific steps for each deployment, and the *Activity* section of this Module has the steps for deploying the BDC on AKS for the classroom environment. 54 | 55 |
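To make the "dump the configuration, edit it, deploy from the file" flow concrete, here is a hedged sketch using the deployment-configuration commands found in later azdata releases. The profile name `aks-dev-test` and the target folder `custom-bdc` are illustrative choices, and the exact sub-command names have changed between preview releases, so treat this as an outline and confirm the syntax against the official documentation for your build:

```cmd
REM Dump a built-in configuration profile to a local folder so it can be edited
azdata bdc config init --source aks-dev-test --target custom-bdc

REM Edit custom-bdc\bdc.json and custom-bdc\control.json (cluster name, replica counts, storage, endpoints)

REM Deploy the BDC using the edited configuration files
azdata bdc create --config-profile custom-bdc --accept-eula yes
```

The value of this approach over switches alone is repeatability: the edited JSON files can be checked into source control and reused for every environment you deploy.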

56 | 57 |

3.1 Installing on Azure Kubernetes Service

58 | 59 | The Azure Kubernetes Service provides the ability to create a Kubernetes cluster in the Azure portal, with the Azure CLI, or with template-driven deployment options such as Resource Manager templates and Terraform. When you deploy an AKS cluster, the Kubernetes master and all nodes are deployed and configured for you. Additional features such as advanced networking, Azure Active Directory integration, and monitoring can also be configured during the deployment process. 60 | 61 | An AKS cluster is divided into two components: the *Cluster master nodes*, which provide the core Kubernetes services and orchestration of application workloads; and the *Nodes*, which run your application workloads. 62 | 63 |
64 | 65 |
66 | 67 | The cluster master includes the following core Kubernetes components: 68 | 69 | - *kube-apiserver* - The API server is how the underlying Kubernetes APIs are exposed. This component provides the interaction for management tools, such as kubectl or the Kubernetes dashboard. 70 | 71 | - *etcd* - To maintain the state of your Kubernetes cluster and configuration, the highly available etcd is a key value store within Kubernetes. 72 | 73 | - *kube-scheduler* - When you create or scale applications, the Scheduler determines what nodes can run the workload and starts them. 74 | 75 | - *kube-controller-manager* - The Controller Manager oversees a number of smaller Controllers that perform actions such as replicating pods and handling node operations. 76 | 77 | The Nodes include the following components: 78 | 79 | - The *kubelet* is the Kubernetes agent that processes the orchestration requests from the cluster master and scheduling of running the requested containers. 80 | 81 | - Virtual networking is handled by the *kube-proxy* on each node. The proxy routes network traffic and manages IP addressing for services and pods. 82 | 83 | - The *container runtime* is the component that allows containerized applications to run and interact with additional resources such as the virtual network and storage. In AKS, Docker is used as the container runtime. 84 | 85 |
86 | 87 |
88 | 89 | For an optimal experience while validating basic scenarios with a BDC in an AKS environment, you should use at least three agent VMs with at least 4 vCPUs and 32 GB of memory each. 90 | 91 | With this background, you can find the latest specific steps to deploy a BDC on AKS here. 92 | 93 |
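As a hedged example of that sizing guidance, the Azure CLI commands below create a three-node AKS cluster and download its credentials for `kubectl`. The resource group name, cluster name, region, and VM size are illustrative choices, not values required by the workshop:

```cmd
az group create --name bdc-aks-rg --location eastus

az aks create --resource-group bdc-aks-rg --name bdc-aks ^
    --node-count 3 --node-vm-size Standard_E8s_v3 --generate-ssh-keys

az aks get-credentials --resource-group bdc-aks-rg --name bdc-aks
```

The `Standard_E8s_v3` size (8 vCPUs, 64 GB of memory) comfortably exceeds the minimum per-node guidance above; smaller sizes may deploy but can leave pods unschedulable.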

94 | 95 |

3.2 Installing Locally Using KubeADM

96 | 97 | If you choose Kubernetes as your Cluster Orchestration system, the kubeadm toolbox helps you bootstrap a Kubernetes cluster that conforms to best practices. Kubeadm also supports other cluster lifecycle functions, such as upgrades, downgrades, and managing bootstrap tokens. 98 | 99 | The kubeadm toolbox can deploy a Kubernetes cluster to physical or virtual machines, which you target by specifying their TCP/IP addresses. 100 | 101 | With this background, you can find the latest specific steps to deploy a BDC using kubeadm here. 102 | 103 |
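As a hedged sketch of the kubeadm flow, the commands below bootstrap a control plane on one Linux machine and then join the remaining machines as workers. The pod network CIDR is an example value, and the token and certificate hash are placeholders - the real values are printed by `kubeadm init` when it completes:

```bash
# On the machine that will host the control plane
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Install a pod network add-on (a CNI manifest of your choice) before scheduling workloads.
# Then, on each additional machine, join the cluster using the values kubeadm init printed:
sudo kubeadm join <control-plane-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
```

Once all machines have joined, `kubectl get nodes` on the control plane should show every node in the `Ready` state before you attempt the BDC deployment.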

104 | 105 |

Activity: Check Class Environment on AKS

106 | 107 | In this lab you will check your deployment you performed in Module 01 of the BDC on the Azure Kubernetes Service. 108 | 109 | Using the following steps, you will evaluate your Resource Group in Azure that holds your BDC on AKS that you deployed earlier. When you complete your course you can delete this Resource Group which will stop the Azure charges for this course. 110 | 111 | Steps 112 | 113 |

Log in to the Azure Portal, and locate the Resource Groups deployed for the AKS cluster. How many do you find? What do you think their purposes are?

114 | 115 |
116 |

117 |
118 | 119 |

For Further Study

120 | 123 | 124 |

Next Steps

125 | 126 | Next, Continue to Operationalization. 127 | -------------------------------------------------------------------------------- /SQL2019BDC/04 - Operationalization.md: -------------------------------------------------------------------------------- 1 | ![](../graphics/microsoftlogo.png) 2 | 3 | # Workshop: SQL Server Big Data Clusters - Architecture 4 | 5 | #### A Microsoft Course from the SQL Server team 6 | 7 |

8 | 9 |

Operationalization

10 | 11 | In this workshop you'll cover using a Process and various Platform components to create a SQL Server Big Data Clusters (BDC) solution you can deploy on premises, in the cloud, or in a hybrid architecture. In each module you'll get more references, which you should follow up on to learn more. Also watch for links within the text - click on each one to explore that topic. 12 | 13 | (Make sure you check out the prerequisites page before you start. You'll need all of the items loaded there before you can proceed with the workshop.) 14 | 15 | You'll cover the following topics in this Module: 16 | 17 |
18 |
4.0 End-To-End Solution for big data clusters
19 |
4.1 Data Virtualization
20 |
4.2 Creating a Distributed Data solution using big data clusters
21 |
4.3 Querying HDFS Data using big data clusters
22 |
23 | 24 |
25 |

26 |
27 | 28 |

4.0 End-To-End Solution for BDC

29 | 30 | Recall from The Big Data Landscape module that you learned about the Wide World Importers company. Wide World Importers (WWI) is a traditional brick and mortar business with a long track record of success, generating profits through strong retail store sales of their unique offering of affordable products from around the world. They have a traditional N-tier application that uses a front-end (mobile, web and installed) that interacts with a scale-out middle-tier software product, which in turn stores data in a large SQL Server database that has been scaled-up to meet demand. 31 | 32 |
33 | 34 |
35 | 36 | WWI has now added web and mobile commerce to their platform, which has generated a significant amount of additional data and new data formats. These new platforms were added without integrating them into the OLTP system or the Business Intelligence infrastructure. As a result, "silos" of data stores have developed, and ingesting all of this data exceeds the scale of their current RDBMS server: 37 | 38 |
39 | 40 |
41 | 42 | This presented the following four challenges - the IT team at WWI needs to: 43 | 44 | - Scale data systems to reach more consumers 45 | 46 | - Unlock business insights from multiple sources of structured and unstructured data 47 | 48 | - Apply deep analytics with high-performance responses 49 | 50 | - Enable AI into apps to actively engage with customers 51 | 52 |
53 |

54 |
55 | 56 |

Solution - Challenge 1: Scale Data System

57 | 58 | To meet these challenges, the following solution is proposed. Using the BDC platform you learned about in the 02 - BDC Components Module, the solution allows the company to keep its current codebase while enabling a flexible scale-out architecture. This answers the first challenge of working with a scale-out system for larger data environments. 59 | 60 | The following diagram illustrates the complete solution that you can use to brief your audience: 61 | 62 |
63 | 64 |
65 | 66 | In the following sections you'll dive deeper into how this scale is used to solve the rest of the challenges. 67 | 68 |

69 | 70 |

4.1 Data Virtualization - Challenge 2: Multiple Data Sources

71 | 72 | The next challenge the IT team must solve is to enable a single data query to work across multiple disparate systems, optionally joining to internal SQL Server Tables, and also at scale. 73 | 74 | Using the Data Virtualization capability you saw in the 02 - SQL Server BDC Components Module, the IT team creates External Tables using the PolyBase feature. These External Table definitions are stored in the database on the SQL Server Master Instance within the cluster. When queried by the user, the queries are engaged from the SQL Server Master Instance through the Compute Pool in the SQL Server BDC, which holds Kubernetes Nodes containing the Pods running SQL Server Instances. These Instances send the query to the PolyBase Connector at the target data system, which processes the query based on the type of target system. The results are processed and returned through the PolyBase Connector to the Compute Pool and then on to the Master Instance, and then on to the user. 75 | 76 |
77 | 78 |
79 | 80 | This process not only allows queries to disparate systems, it also allows those remote systems to hold extremely large sets of data. Normally you are querying a subset of that data, so only the results are sent back over the network. These results can be joined with internal tables for a single view, all from within the same Transact-SQL statements. 81 | 82 |
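The tutorial notebooks included with this workshop walk through this pattern end to end; the condensed T-SQL below sketches the three objects involved (file format, data source, external table) followed by a query. The column list and the `/partner_customers` HDFS path come from the workshop sample data, so adjust them for your own files:

```sql
/* Describe how the delimited text files are laid out */
CREATE EXTERNAL FILE FORMAT csv_file
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '0x22',
                      FIRST_ROW = 2, USE_TYPE_DEFAULT = TRUE));

/* Point at the HDFS store inside the cluster's Storage Pool */
CREATE EXTERNAL DATA SOURCE SqlStoragePool
WITH (LOCATION = 'sqlhdfs://controller-svc/default');

/* Surface an HDFS directory as a queryable table */
CREATE EXTERNAL TABLE partner_customers_hdfs
    (CustomerSource VARCHAR(250), CustomerName VARCHAR(250), EmailAddress VARCHAR(250))
WITH (DATA_SOURCE = SqlStoragePool, LOCATION = '/partner_customers', FILE_FORMAT = csv_file);

/* Query it - and join it to regular SQL Server tables - with plain T-SQL */
SELECT TOP 10 CustomerSource, CustomerName, EmailAddress
FROM partner_customers_hdfs
WHERE EmailAddress LIKE '%wingtip%';
```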

Activity: Load and query data in an External Table

83 | 84 | In this activity, you will load the sample data into your big data cluster environment, and then create and use an External table to query the data in HDFS. This process is similar to connecting to any PolyBase target. 85 | 86 | Steps 87 | 88 |

Open this reference, and perform all of the instructions you see there. This loads your data in preparation for the next Activity.

89 |

Open this reference, and perform all of the instructions you see there. This step shows you how to create and query an External table.

90 |

(Optional) Open this reference, and review the instructions you see there. (You must have an Oracle server that your BDC can reach to perform these steps, although you can review them even if you do not.)

91 | 92 |
93 |

94 |
95 | 96 |

4.2 Creating a Distributed Data solution using big data cluster - Challenge 3: Deep Analytics

97 | 98 | Ad-hoc queries are very useful for many scenarios. There are times when you would like to bring the data into storage, so that you can create denormalized representations of datasets, aggregated data, and other purpose-specific data tasks. 99 | 100 |
101 | 102 |
103 | 104 | Using the Data Virtualization capability you saw in the 02 - BDC Components Module, the IT team creates External Tables using PolyBase statements. These External Table definitions are stored in the database on the SQL Server Master Instance within the cluster. When queried by the user, the queries are engaged from the SQL Server Master Instance through the Compute Pool in the SQL Server BDC, which holds Kubernetes Nodes containing the Pods running SQL Server Instances. These Instances send the query to the PolyBase Connector at the target data system, which processes the query based on the type of target system. The results are processed and returned through the PolyBase Connector to the Compute Pool and then on to the Master Instance. In this case, the PolyBase statements can specify the Data Pool as the target, and the SQL Server Instances in the Data Pool store the data in a distributed fashion across multiple databases, called Shards. 105 |
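The pattern for landing data in the Data Pool is sketched below, drawn from this workshop's tutorial notebooks: an external data source that targets the Data Pool, a round-robin distributed external table, and an INSERT...SELECT that copies rows in from an HDFS external table (here assumed to be `web_clickstreams_hdfs`, defined over the Storage Pool as in the previous section). Treat it as an outline rather than a complete script:

```sql
/* External data source that targets the SQL Data Pool */
CREATE EXTERNAL DATA SOURCE SqlDataPool
WITH (LOCATION = 'sqldatapool://controller-svc/default');

/* Distributed (sharded) table stored across the Data Pool instances */
CREATE EXTERNAL TABLE web_clickstream_clicks_data_pool
    (wcs_click_date_sk BIGINT, wcs_click_time_sk BIGINT, wcs_sales_sk BIGINT,
     wcs_item_sk BIGINT, wcs_web_page_sk BIGINT, wcs_user_sk BIGINT)
WITH (DATA_SOURCE = SqlDataPool, DISTRIBUTION = ROUND_ROBIN);

/* Load it from the HDFS external table defined over the Storage Pool */
INSERT INTO web_clickstream_clicks_data_pool
SELECT wcs_click_date_sk, wcs_click_time_sk, wcs_sales_sk,
       wcs_item_sk, wcs_web_page_sk, wcs_user_sk
FROM web_clickstreams_hdfs;
```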

Activity: Load and query data into the Data Pool

107 | 108 | In this activity, you will load the sample data into your big data cluster environment, and then create and use an External table to load data into the Data Pool. 109 | 110 | Steps 111 | 112 |

Open this reference, and perform the instructions you see there. This loads data into the Data Pool.

113 |
114 |

115 |
116 | 117 |

4.3 Querying HDFS Data using big data cluster - Challenge 4: Enable AI

118 | 119 | There are three primary uses for a large cluster of data processing systems in Machine Learning and AI applications. The first is that users will be involved in the creation of the Features used in various ML and AI algorithms, and are often tasked to Label the data. These users can access the Data Pool and Storage Pool data stores directly to query and assist with this task. 120 | 121 | The SQL Server Master Instance in the BDC installs with Machine Learning Services, which allow creation, training, evaluation and persisting of Machine Learning Models. Data from all parts of the BDC are available, and Data Science oriented languages and libraries in R, Python and Java are enabled. In this scenario, the Data Scientist creates the R or Python code, and the Transact-SQL Developer wraps that code in a Stored Procedure. This code can be used to train, evaluate and create Machine Learning Models. The Models can be stored in the Master Instance for scoring, or sent on to the App Pool where the Machine Learning Server is running, waiting to accept REST-based calls from applications. 122 | 123 |
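The stored-procedure pattern described above is built on `sp_execute_external_script`, which ships with Machine Learning Services on the Master Instance. The following is a minimal, hedged sketch only: it assumes external scripts have been enabled (`sp_configure 'external scripts enabled', 1`) and uses the WideWorldImporters `Sales.OrderLines` table purely to give the Python script some input:

```sql
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
# InputDataSet arrives as a pandas DataFrame; return its summary statistics as the result set
OutputDataSet = InputDataSet.describe()
',
    @input_data_1 = N'SELECT CAST(Quantity AS FLOAT) AS Quantity FROM Sales.OrderLines'
WITH RESULT SETS ((SummaryValue FLOAT));
```

In a real solution, the script body would train or score a model rather than summarize a column, and the trained model would typically be serialized and stored in a table on the Master Instance for later scoring.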
124 | 125 |
126 | 127 | The Data Scientist has another option to create and train ML and AI models. The Spark platform within the Storage Pool is accessible through the Knox gateway, using Livy to send Spark Jobs as you learned about in the 02 - SQL Server BDC Components Module. This gives access to the full Spark platform, using Jupyter Notebooks (included in Azure Data Studio) or any other standard tools that can access Spark through REST calls. 128 | 129 | 130 |
131 |

Activity: Load data with Spark, run a Spark Notebook

132 |
133 | 134 | In this activity, you will load the sample data into your big data cluster environment using Spark, and use a Notebook in Azure Data Studio to work with it. 135 | 136 | Steps 137 | 138 |

Open this reference, and follow the instructions you see there. This loads the data in preparation for the Notebook operations.

139 |

Open this reference, and follow the instructions you see there. This simple example shows you how to work with the data you ingested into the Storage Pool using Spark.

140 | 141 |
142 |

143 |
144 | 145 |

For Further Study

146 | 153 | 154 |

Next Steps

155 | 156 | Next, Continue to Management and Monitoring. 157 | -------------------------------------------------------------------------------- /SQL2019BDC/05 - Management and Monitoring.md: -------------------------------------------------------------------------------- 1 | ![](../graphics/microsoftlogo.png) 2 | 3 | # Workshop: SQL Server Big Data Clusters - Architecture 4 | 5 | #### A Microsoft workshop from the SQL Server team 6 | 7 |

8 | 9 |

Management and Monitoring

10 | 11 | In this workshop you'll cover using a Process and various Platform components to create a SQL Server Big Data Clusters (BDC) solution you can deploy on premises, in the cloud, or in a hybrid architecture. In each module you'll get more references, which you should follow up on to learn more. Also watch for links within the text - click on each one to explore that topic. 12 | 13 | (Make sure you check out the prerequisites page before you start. You'll need all of the items loaded there before you can proceed with the workshop.) 14 | 15 | You'll cover the following topics in this Module: 16 | 17 |
18 | 19 |
5.0 Managing and Monitoring Your Solution
20 |
5.1 Using kubectl commands
21 |
5.2 Using azdata commands
22 |
5.3 Using Grafana and Kibana
23 | 24 |
25 | 26 |

27 | 28 |

5.0 Managing and Monitoring Your Solution

29 | 30 | There are two primary areas for monitoring your BDC deployment. The first deals with SQL Server 2019, and the second deals with the set of elements in the Cluster. 31 | 32 | For SQL Server, management is much as you would normally perform for any SQL Server system. You have the same type of services, surface points, security areas and other control vectors as in a stand-alone installation of SQL Server. The tools you have available for managing the Master Instance in the BDC are the same as managing a stand-alone installation, including SQL Server Management Studio, command-line interfaces, Azure Data Studio, and third party tools. 33 | 34 | For the cluster components, you have three primary interfaces to use, which you will review next. 35 | 36 |

37 | 38 |

5.1 Using kubectl commands

39 | 40 | Since the BDC lives within a Kubernetes cluster, you'll work with the kubectl command to deal with those specific components. The following list is a short version of some of the commands you can use to manage and monitor the BDC implementation of a Kubernetes cluster: 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 62 | 63 | 64 | 65 |
Command Description
az aks get-credentials --name  --resource-group 
Download the Kubernetes cluster configuration file and set the cluster context
kubectl get pods --all-namespaces
Get the status of pods in the cluster for either all namespaces or the big data cluster namespace
kubectl describe pod   -n 
Get a detailed description of a specific pod in json format output. It includes details, such as the current Kubernetes node that the pod is placed on, the containers running within the pod, and the image used to bootstrap the containers. It also shows other details, such as labels, status, and persisted volumes claims that are associated with the pod
kubectl get svc -n 
Get details for the big data cluster services. These details include their type and the IPs associated with respective services and ports. Note that BDC services are created in a new namespace created at cluster bootstrap time based on the cluster name specified in the azdata create cluster command
kubectl describe service   -n 
Get a detailed description of a service in json format output. It will include details like labels, selector, IP, external-IP (if the service is of LoadBalancer type), port, etc.
kubectl exec -it   -c  -n  -- /bin/bash 
If existing tools or the infrastructure does not enable you to perform a certain task without actually being in the context of the container, you can log in to the container using kubectl exec command. For example, you might need to check if a specific file exists, or you might need to restart services in the container
53 | 54 |
 55 |   kubectl cp pod_name:source_file_path 
 56 |   -c container_name 
 57 |   -n namespace_name 
 58 |   target_local_file_path
 59 |   
60 | 61 |
Copy files from the container to your local machine. Reverse the source and destination to copy into the container
kubectl delete pods  -n  --grace-period=0 --force
For testing availability, resiliency, or data persistence, you can delete a pod to simulate a pod failure with the kubectl delete pods command. Not recommended for production, only to simulate failure
kubectl get pods  -o yaml -n  | grep hostIP
Get the IP of the node a pod is currently running on
66 | 67 | Use this resource to learn more about these commands for troubleshooting the BDC. 69 | 70 | A full list of the **kubectl** commands is here. 71 | 72 |
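As a hedged example of putting these commands together, the snippet below lists the services in the cluster namespace and then shows the details for one of them; the LoadBalancer service that fronts the SQL Server Master Instance exposes the external IP and port you connect to from Azure Data Studio in the activity that follows. The placeholder values are whatever namespace and service names exist in your own deployment:

```cmd
REM List all services (and their external IPs and ports) in the cluster namespace
kubectl get svc -n <namespace-name>

REM Show full details, including the external IP, for a specific service
kubectl describe service <service-name> -n <namespace-name>
```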
73 |

Activity: Discover the IP Address of the BDC Master Installation, and Connect to it with Azure Data Studio

74 |
75 | 76 | In this activity, you will Get the IP Address of the Master Instance in your Cluster, and connect with Azure Data Studio. 77 | 78 | Steps 79 | 80 |

Open this resource, and follow the steps there for the AKS deployments: section.

81 | 82 |
83 |

84 |
85 | 86 |

5.2 Using azdata commands

87 | 88 | The **azdata** utility enables cluster administrators to bootstrap and manage big data clusters via the REST APIs exposed by the Controller service. The controller is deployed and hosted in the same Kubernetes namespace where the customer wants to build out a big data cluster. The Controller is responsible for core logic for deploying and managing a big data cluster. 89 | 90 | The Controller service is installed by a Kubernetes administrator during cluster bootstrap, using the azdata command-line utility. 91 | 92 | You can find a list of the switches and commands by typing: 93 | 94 |
 95 | azdata --help
 96 | 
97 | 98 | You used azdata commands to deploy your cluster, and you can also use azdata to get information about your BDC deployment, as shown in the sketch below. You should review the documentation for this command here. 99 | 100 |
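The hedged examples below log in to the controller and then report cluster health and service endpoints. Command names have shifted slightly between releases, so check `azdata --help` on your build before relying on them:

```cmd
REM Log in to the cluster controller (you will be prompted for the values you set at deployment)
azdata login

REM Show the health of the cluster services and pools
azdata bdc status show

REM List the endpoints (master instance, controller, Knox gateway, Grafana, Kibana)
azdata bdc endpoint list -o table
```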
101 |

102 |
103 | 104 |

5.3 Using Grafana and Kibana

105 | 106 | You learned about the Grafana and Kibana systems in Module 01. Microsoft has created various views within each that you can use to interact with both the SQL Server-specific and Kubernetes portions of the BDC. The Azure Data Studio big data clusters management panel shows the TCP/IP addresses for each of these systems. 107 | 108 |
109 |

110 |
111 |
112 |

113 |
114 |
115 |

116 |
117 | 118 |

Activity: Start dashboard when cluster is running in AKS 119 |

120 | 121 | To launch the Kubernetes dashboard run the following commands: 122 | 123 |
124 | az aks browse --resource-group  --name 
125 | 
126 | 127 | Note: 128 | 129 | If you get the following error: 130 | 131 |
Unable to listen on port 8001: All listeners failed to create with the following errors: Unable to create listener: Error listen tcp4 127.0.0.1:8001: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted. Unable to create listener: Error listen tcp6: address [[::1]]:8001: missing port in address error: Unable to listen on any of the requested ports: [{8001 9090}]
132 | 
133 | 134 | make sure you did not already start the dashboard from another window. 135 | 136 | When you launch the dashboard in your browser, you might get permission warnings, because RBAC is enabled by default in AKS clusters and the service account used by the dashboard does not have enough permissions to access all resources (for example, pods is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list pods in the namespace "default"). Run the following command to give the necessary permissions to kubernetes-dashboard, and then restart the dashboard: 137 | 138 |
139 | kubectl create clusterrolebinding kubernetes-dashboard -n kube-system --clusterrole=cluster-admin --serviceaccount=kube-system:kubernetes-dashboard
140 | 
141 | 142 |

143 | 144 |

For Further Study

145 | 150 | 151 |

Next Steps

152 | 153 | Next, Continue to Security. 154 | -------------------------------------------------------------------------------- /SQL2019BDC/06 - Security.md: -------------------------------------------------------------------------------- 1 | ![](../graphics/microsoftlogo.png) 2 | 3 | # Workshop: SQL Server Big Data Clusters - Architecture (CTP 3.2) 4 | 5 | #### A Microsoft workshop from the SQL Server team 6 | 7 |

8 | 9 |

Security

10 | 11 | In this workshop you'll cover using a Process and various Platform components to create a SQL Server Big Data Clusters (BDC) solution you can deploy on premises, in the cloud, or in a hybrid architecture. In each module you'll get more references, which you should follow up on to learn more. Also watch for links within the text - click on each one to explore that topic. 12 | 13 | (Make sure you check out the prerequisites page before you start. You'll need all of the items loaded there before you can proceed with the workshop.) 14 | 15 | You'll cover the following topics in this Module: 16 | 17 |
18 | 19 |
6.0 Managing BDC Security
20 |
6.1 Access
21 |
6.2 Authentication and Authorization
22 | 23 |
24 | 25 |

26 | 27 |

6.0 Managing BDC Security

28 | 29 | Authentication is the process of verifying the identity of a user or service and ensuring they are who they are claiming to be. Authorization refers to granting or denying of access to specific resources based on the requesting user's identity. This step is performed after a user is identified through authentication. 30 | 31 | *NOTE: Security will change prior to the General Availability (GA) Release. Active Directory integration is planned for production implementations.* 32 | 33 |

34 | 35 |

6.1 Access

36 | 37 | There are three endpoints for entry points to the BDC: 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
Endpoint Description
HDFS/Spark (Knox) gatewayAn HTTPS-based endpoint that proxies other endpoints. The HDFS/Spark gateway is used for accessing services like webHDFS and Livy. Wherever you see references to Knox, this is the endpoint
Controller endpointThe endpoint for the BDC management service that exposes REST APIs for managing the cluster. Some tools, such as Azure Data Studio, access the system using this endpoint
Master InstanceThe TDS endpoint for connecting to the SQL Server Master Instance in the cluster, used by tools and applications such as SQL Server Management Studio and Azure Data Studio
48 | 49 | You can see these endpoints in this diagram: 50 | 51 |
52 | 53 |
54 | 55 |

56 |
57 | 58 |

6.2 Authentication and Authorization

59 | 60 | When you create the cluster, a number of logins are created. Some of these logins are for services to communicate with each other, and others are for end users to access the cluster. 61 | Non-SQL Server end-user passwords are currently set using environment variables. These are passwords that cluster administrators use to access services: 62 | 63 |
Use Variable
Controller username
CONTROLLER_USERNAME=controller_username
Controller password
CONTROLLER_PASSWORD=controller_password
SQL Master SA password
MSSQL_SA_PASSWORD=controller_sa_password
Password for accessing the HDFS/Spark endpoint
KNOX_PASSWORD=knox_password
73 | 74 | 75 | Intra-cluster authentication 76 | Upon deployment of the cluster, a number of SQL logins are created: 77 | 78 | A special SQL login is created in the Controller SQL instance that is system managed, with sysadmin role. The password for this login is captured as a K8s secret. A sysadmin login is created in all SQL instances in the cluster, that Controller owns and manages. It is required for Controller to perform administrative tasks, such as HA setup or upgrade, on these instances. These logins are also used for intra-cluster communication between SQL instances, such as the SQL master instance communicating with a data pool. 79 | 80 | Note: In current release, only basic authentication is supported. Fine-grained access control to HDFS objects, the BDC compute and data pools, is not yet available. 81 | 82 | For Intra-cluster communication with non-SQL services within the BDC, such as Livy to Spark or Spark to the storage pool, security uses certificates. All SQL Server to SQL Server communication is secured using SQL logins. 83 | 84 |
85 |

Activity: Review Security Endpoints

86 |
87 | 88 | In this activity, you will review the endpoints exposed on the cluster. 89 | 90 | Steps 91 | 92 |

Open this reference, and read the information you see for the Service Endpoints section. This shows the addresses and ports exposed to the end-users.

93 | 94 |
95 |

96 |
97 | 98 |

For Further Study

99 | 102 | 103 | Congratulations! You have completed this workshop on SQL Server big data clusters Architecture. You now have the tools, assets, and processes you need to extrapolate this information into other applications. 104 | -------------------------------------------------------------------------------- /SQL2019BDC/notebooks/README.md: -------------------------------------------------------------------------------- 1 | ![](graphics/microsoftlogo.png) 2 | 3 | # Lab: SQL Server Big Data Clusters - Architecture 4 | 5 | #### A Microsoft Course from the SQL Server team 6 | 7 |

8 | 9 |

About this Lab

10 | 11 | Welcome to this Microsoft solutions Lab on the architecture of SQL Server Big Data Clusters. As part of a larger complete Workshop, you'll experiment with SQL Server Big Data Clusters (BDC) and how you can use it to implement large-scale data processing and machine learning. 12 | 13 | This Lab assumes you have a full understanding of the concepts of big data analytics, the technologies (such as containers, Kubernetes, Spark and HDFS, machine learning, and other technologies) that you will use throughout the Lab, and the architecture of a BDC. If you are not familiar with these topics, you can take a complete course here. 14 | 15 | In this Lab you'll learn how to create external tables over other data sources to unify your data, and how to use Spark to run big queries over your data in HDFS or do data preparation. You'll review a complete solution for an end-to-end scenario, with a focus on how to extrapolate what you have learned to create other solutions for your organization. 16 | 17 |

18 | 19 |

Before Taking this Lab

20 | 21 | This Lab expects that you understand data structures, working with SQL Server, and computer networks. This Lab does not expect you to have any prior data science knowledge, but a basic knowledge of statistics and data science is helpful in the Data Science sections. Knowledge of SQL Server, Azure Data and AI services, Python, and Jupyter Notebooks is recommended. AI techniques are implemented in Python packages. Solution templates are implemented using Azure services, development tools, and SDKs. You should have a basic understanding of working with the Microsoft Azure Platform. 22 | 23 | You need to have all of the prerequisites completed before taking this Lab. 24 | 25 | You also need a full Big Data Cluster for SQL Server up and running, with the connection endpoints and all security parameters identified. You can find out how to do that here. 26 | 27 |

28 | 29 |

Lab Notebooks

30 | 31 |

You will work through six Jupyter Notebooks using the Azure Data Studio tool. Download them and open them in Azure Data Studio, running only one cell at a time.

32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 |
NotebookTopics
bdc_tutorial_00.ipynb Overview of the Lab and Setup of the source data, problem space, solution options and architectures
bdc_tutorial_01.ipynb In this tutorial you will learn how to run standard SQL Server Queries against the Master Instance (MI) in a SQL Server big data cluster.
bdc_tutorial_02.ipynb In this tutorial you will learn how to create and query Virtualized Data in a SQL Server big data cluster.
bdc_tutorial_03.ipynb In this tutorial you will learn how to create and query a Data Mart using Virtualized Data in a SQL Server big data cluster.
bdc_tutorial_04.ipynb In this tutorial you will learn how to work with Spark Jobs in a SQL Server big data cluster.
bdc_tutorial_05.ipynb In this tutorial you will learn how to work with Spark Machine Learning Jobs in a SQL Server big data cluster.
49 | 50 |

51 | 52 | -------------------------------------------------------------------------------- /SQL2019BDC/notebooks/bdc_tutorial_00.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "kernelspec": { 4 | "name": "python3", 5 | "display_name": "Python 3" 6 | }, 7 | "language_info": { 8 | "name": "python", 9 | "version": "3.6.6", 10 | "mimetype": "text/x-python", 11 | "codemirror_mode": { 12 | "name": "ipython", 13 | "version": 3 14 | }, 15 | "pygments_lexer": "ipython3", 16 | "nbconvert_exporter": "python", 17 | "file_extension": ".py" 18 | } 19 | }, 20 | "nbformat_minor": 2, 21 | "nbformat": 4, 22 | "cells": [ 23 | { 24 | "cell_type": "markdown", 25 | "source": [ 26 | "\"Microsoft\"\r\n", 27 | "
\r\n", 28 | "\r\n", 29 | "# SQL Server 2019 big data cluster Tutorial\r\n", 30 | "## 00 - Scenario Overview and System Setup\r\n", 31 | "\r\n", 32 | "In this set of tutorials you'll work with an end-to-end scenario that uses SQL Server 2019's big data clusters to solve real-world problems. \r\n", 33 | "" 34 | ], 35 | "metadata": { 36 | "azdata_cell_guid": "d495f8be-74c3-4658-b897-ad69e6ed88ac" 37 | } 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "source": [ 42 | "## Wide World Importers\r\n", 43 | "\r\n", 44 | "Wide World Importers (WWI) is a traditional brick and mortar business that makes specialty items for other companies to use in their products. They design, sell and ship these products worldwide.\r\n", 45 | "\r\n", 46 | "WWI corporate has now added a new partnership with a company called \"AdventureWorks\", which sells bicycles both online and in-store. The AdventureWorks company has asked WWI to produce super-hero themed baskets, seats and other bicycle equipment for a new line of bicycles. WWI corporate has asked the IT department to develop a pilot program with these goals: \r\n", 47 | "\r\n", 48 | "- Integrate the large amounts of data from the AdventureWorks company including customers, products and sales\r\n", 49 | "- Allow a cross-selling strategy so that current WWI customers and AdventureWorks customers see their information without having to re-enter it\r\n", 50 | "- Incorporate their online sales information for deeper analysis\r\n", 51 | "- Provide a historical data set so that the partnership can be evaluated\r\n", 52 | "- Ensure this is a \"framework\" approach, so that it can be re-used with other partners\r\n", 53 | "\r\n", 54 | "WWI has a typical N-Tier application that provides a series of terminals, a Business Logic layer, and a Database back-end. They use on-premises systems, and are interested in linking these to the cloud. \r\n", 55 | "\r\n", 56 | "In this series of tutorials, you will build a solution using the scale-out features of SQL Server 2019, Data Virtualization, Data Marts, and the Data Lake features. " 57 | ], 58 | "metadata": { 59 | "azdata_cell_guid": "3815241f-e81e-4cf1-a48e-c6e67b0ccf7c" 60 | } 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "source": [ 65 | "## Running these Tutorials\r\n", 66 | "\r\n", 67 | "- You can read through the output of these completed tutorials if you wish - or:\r\n", 68 | "\r\n", 69 | "- You can follow along with the steps you see in these tutorials by copying the code into a SQL Query window and Spark Notebook using the Azure Data Studio tool, or you can click here to download these Jupyter Notebooks and run them in Azure Data Studio for a hands-on experience.\r\n", 70 | " \r\n", 71 | "- If you would like to run the tutorials, you'll need a SQL Server 2019 big data cluster and the client tools installed. If you want to set up your own cluster, click this reference and follow the steps you see there for the server and tools you need.\r\n", 72 | "\r\n", 73 | "- You will need to have the following: \r\n", 74 | " - Your **Knox Password**\r\n", 75 | " - The **Knox IP Address**\r\n", 76 | " - The `sa` **Username** and **Password** to your Master Instance\r\n", 77 | " - The **IP address** to the SQL Server big data cluster Master Instance \r\n", 78 | " - The **name** of your big data cluster\r\n", 79 | "\r\n", 80 | "For a complete workshop on SQL Server 2019's big data clusters, check out this resource." 
81 | ], 82 | "metadata": { 83 | "azdata_cell_guid": "1c3e4b5e-fef4-43ef-a4e3-aa33fe99e25d" 84 | } 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "source": [ 89 | "## Copy Database backups to the SQL Server 2019 big data cluster Master Instance\r\n", 90 | "\r\n", 91 | "The first step for the solution is to copy the database backups from WWI from their location on the cloud and then up to your cluster. \r\n", 92 | "\r\n", 93 | "These commands use the `curl` program to pull the files down. [You can read more about curl here](https://curl.haxx.se/). \r\n", 94 | "\r\n", 95 | "The next set of commands use the `kubectl` command to copy the files from where you downloaded them to the data directory of the SQL Server 2019 bdc Master Instance. [You can read more about kubectl here](https://kubernetes.io/docs/reference/kubectl/overview/). \r\n", 96 | "\r\n", 97 | "Note that you will need to replace the section of the script marked with `` with the name of your SQL Server 2019 bdc. (It does not need single or double quotes, just the letters of your cluster name.)\r\n", 98 | "\r\n", 99 | "Notice also that these commands assume a `c:\\temp` location, if you want to use another drive or directory, edit accordingly.\r\n", 100 | "\r\n", 101 | "Once you have edited these commands, you can open a Command Prompt *(not PowerShell)* on your system and copy and paste each block, one at a time and run them there, observing the output.\r\n", 102 | "\r\n", 103 | "In the next tutorial you will restore these databases on the Master Instance." 104 | ], 105 | "metadata": { 106 | "azdata_cell_guid": "5220c555-f819-409e-b206-de9a2dd6d434" 107 | } 108 | }, 109 | { 110 | "cell_type": "code", 111 | "source": [ 112 | "REM Create a temporary directory for the files\r\n", 113 | "md c:\\temp\r\n", 114 | "cd c:\\temp\r\n", 115 | "\r\n", 116 | "REM Get the database backups\r\n", 117 | "curl \"https://sabwoody.blob.core.windows.net/backups/WideWorldImporters.bak\" -o c:\\temp\\WWI.bak\r\n", 118 | "curl \"https://sabwoody.blob.core.windows.net/backups/AdventureWorks.bak\" -o c:\\temp\\AdventureWorks.bak\r\n", 119 | "curl \"https://sabwoody.blob.core.windows.net/backups/AdventureWorksDW.bak\" -o c:\\temp\\AdventureWorksDW.bak\r\n", 120 | "curl \"https://sabwoody.blob.core.windows.net/backups/WideWorldImportersDW.bak\" -o c:\\temp\\WWIDW.bak\r\n", 121 | "" 122 | ], 123 | "metadata": { 124 | "azdata_cell_guid": "3e1f2304-cc0a-4e0e-96e2-333401b52036" 125 | }, 126 | "outputs": [], 127 | "execution_count": 2 128 | }, 129 | { 130 | "cell_type": "code", 131 | "source": [ 132 | "REM Copy the backups to the data location on the SQL Server Master Instance\r\n", 133 | "cd c:\\temp\r\n", 134 | "kubectl cp WWI.bak master-0:/var/opt/mssql/data -c mssql-server -n \r\n", 135 | "kubectl cp WWIDW.bak master-0:/var/opt/mssql/data -c mssql-server -n \r\n", 136 | "kubectl cp AdventureWorks.bak master-0:/var/opt/mssql/data -c mssql-server -n \r\n", 137 | "kubectl cp AdventureWorksDW.bak master-0:/var/opt/mssql/data -c mssql-server -n \r\n", 138 | "" 139 | ], 140 | "metadata": { 141 | "azdata_cell_guid": "19106890-7c6c-4631-9acc-3dda4d2a50ab" 142 | }, 143 | "outputs": [], 144 | "execution_count": null 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "source": [ 149 | "## Copy Exported Data to Storage Pool\r\n", 150 | "\r\n", 151 | "Next, you'll download a few text files that will form the external data to be ingested into the Storage Pool HDFS store. 
In production environments, you have multiple options for moving data into HDFS, such as Spark Streaming or the Azure Data Factory.\r\n", 152 | "\r\n", 153 | "The first code block creates directories in the HDFS store. The second block downloads the source data from a web location. And in the final block, you'll copy the data from your local system to the SQL Server 2019 big data cluster Storage Pool.\r\n", 154 | "\r\n", 155 | "You need to replace the ``, ``, and potentially the drive letter and directory values with the appropriate information on your system. \r\n", 156 | "> (You can use **CTL-H** to open the Find and Replace dialog in the cell)" 157 | ], 158 | "metadata": { 159 | "azdata_cell_guid": "2c426b35-dc57-4dc8-819d-6642deb69110" 160 | } 161 | }, 162 | { 163 | "cell_type": "code", 164 | "source": [ 165 | "REM Make the Directories in HDFS\r\n", 166 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/product_review_data?op=MKDIRS\"\r\n", 167 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/partner_customers?op=MKDIRS\"\r\n", 168 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/partner_products?op=MKDIRS\"\r\n", 169 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/web_logs?op=MKDIRS\"\r\n", 170 | "" 171 | ], 172 | "metadata": { 173 | "azdata_cell_guid": "f2143d4e-6eb6-4bbc-864a-b417398adc21" 174 | }, 175 | "outputs": [], 176 | "execution_count": null 177 | }, 178 | { 179 | "cell_type": "code", 180 | "source": [ 181 | "REM Get the textfiles \r\n", 182 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/product_reviews.csv\" -o product_reviews.csv\r\n", 183 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/products.csv\" -o products.csv\r\n", 184 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/customers.csv\" -o customers.csv\r\n", 185 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/stockitemholdings.csv\" -o products.csv\r\n", 186 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/web_clickstreams.csv\" -o web_clickstreams.csv\r\n", 187 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/fleet-formatted.csv\" -o fleet-formatted.csv\r\n", 188 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/training-formatted.csv\" -o training-formatted.csv\r\n", 189 | "" 190 | ], 191 | "metadata": { 192 | "azdata_cell_guid": "c8a74514-2e0d-4f3c-99dd-4c541c11e15e" 193 | }, 194 | "outputs": [], 195 | "execution_count": null 196 | }, 197 | { 198 | "cell_type": "code", 199 | "source": [ 200 | "REM Copy the text files to the HDFS directories\r\n", 201 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/product_review_data/product_reviews.csv?op=create&overwrite=true\" -H \"Content-Type: application/octet-stream\" -T \"product_reviews.csv\"\r\n", 202 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/partner_customers/customers.csv?op=create&overwrite=true\" -H \"Content-Type: application/octet-stream\" -T \"customers.csv\"\r\n", 203 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/partner_products/products.csv?op=create&overwrite=true\" -H \"Content-Type: application/octet-stream\" -T \"products.csv\"\r\n", 204 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/web_logs/web_clickstreams.csv?op=create&overwrite=true\" -H \"Content-Type: application/octet-stream\" -T \"web_clickstreams.csv\"\r\n", 205 | "" 206 | ], 207 | 
"metadata": { 208 | "azdata_cell_guid": "9c9e49ef-ef0d-47c4-92fd-b7e7bfa2d2f2" 209 | }, 210 | "outputs": [], 211 | "execution_count": null 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "source": [ 216 | "## Next Step: Working with the SQL Server 2019 big data cluster Master Instance\r\n", 217 | "\r\n", 218 | "Now you're ready to open the next Python Notebook - [bdc_tutorial_01.ipynb](bdc_tutorial_01.ipynb) - to learn how to work with the SQL Server 2019 bdc Master Instance." 219 | ], 220 | "metadata": { 221 | "azdata_cell_guid": "519aa112-47e0-443b-9b27-05fc02349b09" 222 | } 223 | } 224 | ] 225 | } -------------------------------------------------------------------------------- /SQL2019BDC/notebooks/bdc_tutorial_02.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "kernelspec": { 4 | "name": "SQL", 5 | "display_name": "SQL", 6 | "language": "sql" 7 | }, 8 | "language_info": { 9 | "name": "sql", 10 | "version": "" 11 | } 12 | }, 13 | "nbformat_minor": 2, 14 | "nbformat": 4, 15 | "cells": [ 16 | { 17 | "cell_type": "markdown", 18 | "source": "\"Microsoft\"\r\n
\r\n\r\n# SQL Server 2019 big data cluster Tutorial\r\n## 02 - Data Virtualization\r\n\r\nIn this tutorial you will learn how to create and query Virtualized Data in a SQL Server big data cluster. \r\n- You'll start with creating a text file format, since that's the type of data you are reading in. \r\n- Next, you'll create a data source for the SQL Storage Pool, since that allows you to access the HDFS system in BDC. \r\n- Finally, you'll create an External Table, which uses the previous steps to access the data.\r\n", 19 | "metadata": {} 20 | }, 21 | { 22 | "cell_type": "code", 23 | "source": "/* Clean up only - run this cell only if you are repeating the tutorial! */\r\nUSE WideWorldImporters;\r\nGO\r\n\r\nDROP EXTERNAL TABLE partner_customers_hdfs\r\nDROP EXTERNAL FILE FORMAT csv_file\r\nDROP EXTERNAL DATA SOURCE SqlStoragePool\r\n", 24 | "metadata": {}, 25 | "outputs": [ 26 | { 27 | "output_type": "display_data", 28 | "data": { 29 | "text/html": "Commands completed successfully." 30 | }, 31 | "metadata": {} 32 | }, 33 | { 34 | "output_type": "display_data", 35 | "data": { 36 | "text/html": "Commands completed successfully." 37 | }, 38 | "metadata": {} 39 | }, 40 | { 41 | "output_type": "display_data", 42 | "data": { 43 | "text/html": "Total execution time: 00:00:00.179" 44 | }, 45 | "metadata": {} 46 | } 47 | ], 48 | "execution_count": 0 49 | }, 50 | { 51 | "cell_type": "code", 52 | "source": "/* Create External File Format */\r\n\r\nUSE WideWorldImporters;\r\nGO\r\n\r\nCREATE EXTERNAL FILE FORMAT csv_file\r\nWITH (\r\n FORMAT_TYPE = DELIMITEDTEXT,\r\n FORMAT_OPTIONS(\r\n FIELD_TERMINATOR = ',',\r\n STRING_DELIMITER = '0x22',\r\n FIRST_ROW = 2,\r\n USE_TYPE_DEFAULT = TRUE)\r\n);\r\nGO", 53 | "metadata": {}, 54 | "outputs": [ 55 | { 56 | "output_type": "display_data", 57 | "data": { 58 | "text/html": "Commands completed successfully." 59 | }, 60 | "metadata": {} 61 | }, 62 | { 63 | "output_type": "display_data", 64 | "data": { 65 | "text/html": "Commands completed successfully." 66 | }, 67 | "metadata": {} 68 | }, 69 | { 70 | "output_type": "display_data", 71 | "data": { 72 | "text/html": "Total execution time: 00:00:00.257" 73 | }, 74 | "metadata": {} 75 | } 76 | ], 77 | "execution_count": 1 78 | }, 79 | { 80 | "cell_type": "code", 81 | "source": "/* Create External Data Source to the Storage Pool */\r\nCREATE EXTERNAL DATA SOURCE SqlStoragePool\r\nWITH (LOCATION = 'sqlhdfs://controller-svc/default');", 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "output_type": "display_data", 86 | "data": { 87 | "text/html": "Commands completed successfully." 88 | }, 89 | "metadata": {} 90 | }, 91 | { 92 | "output_type": "display_data", 93 | "data": { 94 | "text/html": "Total execution time: 00:00:00.129" 95 | }, 96 | "metadata": {} 97 | } 98 | ], 99 | "execution_count": 2 100 | }, 101 | { 102 | "cell_type": "code", 103 | "source": "/* Create an External Table that can read from the Storage Pool File Location */\r\nCREATE EXTERNAL TABLE [partner_customers_hdfs]\r\n (\"CustomerSource\" VARCHAR(250) \r\n , \"CustomerName\" VARCHAR(250) \r\n , \"EmailAddress\" VARCHAR(250))\r\n WITH\r\n (\r\n DATA_SOURCE = SqlStoragePool,\r\n LOCATION = '/partner_customers',\r\n FILE_FORMAT = csv_file\r\n );\r\nGO", 104 | "metadata": {}, 105 | "outputs": [ 106 | { 107 | "output_type": "display_data", 108 | "data": { 109 | "text/html": "Commands completed successfully." 
110 | }, 111 | "metadata": {} 112 | }, 113 | { 114 | "output_type": "display_data", 115 | "data": { 116 | "text/html": "Total execution time: 00:00:00.662" 117 | }, 118 | "metadata": {} 119 | } 120 | ], 121 | "execution_count": 3 122 | }, 123 | { 124 | "cell_type": "code", 125 | "source": "/* Read Data from HDFS using only T-SQL */\r\n\r\nSELECT TOP 10 CustomerSource\r\n, CustomerName\r\n, EMailAddress\r\n FROM [partner_customers_hdfs] hdfs\r\nWHERE EmailAddress LIKE '%wingtip%'\r\nORDER BY CustomerSource, CustomerName;\r\nGO\r\n", 126 | "metadata": {}, 127 | "outputs": [ 128 | { 129 | "output_type": "display_data", 130 | "data": { 131 | "text/html": "(10 rows affected)" 132 | }, 133 | "metadata": {} 134 | }, 135 | { 136 | "output_type": "display_data", 137 | "data": { 138 | "text/html": "Total execution time: 00:00:00.699" 139 | }, 140 | "metadata": {} 141 | }, 142 | { 143 | "output_type": "execute_result", 144 | "metadata": {}, 145 | "execution_count": 5, 146 | "data": { 147 | "application/vnd.dataresource+json": { 148 | "schema": { 149 | "fields": [ 150 | { 151 | "name": "CustomerSource" 152 | }, 153 | { 154 | "name": "CustomerName" 155 | }, 156 | { 157 | "name": "EMailAddress" 158 | } 159 | ] 160 | }, 161 | "data": [ 162 | { 163 | "0": "AdventureWorks", 164 | "1": "Åšani Nair", 165 | "2": "åšani@wingtiptoys.com\r" 166 | }, 167 | { 168 | "0": "AdventureWorks", 169 | "1": "Åšani Sen", 170 | "2": "åšani@wingtiptoys.com\r" 171 | }, 172 | { 173 | "0": "AdventureWorks", 174 | "1": "Aakriti Bhamidipati", 175 | "2": "aakriti@wingtiptoys.com\r" 176 | }, 177 | { 178 | "0": "AdventureWorks", 179 | "1": "Aamdaal Kamasamudram", 180 | "2": "aamdaal@wingtiptoys.com\r" 181 | }, 182 | { 183 | "0": "AdventureWorks", 184 | "1": "Abel Pirvu", 185 | "2": "abel@wingtiptoys.com\r" 186 | }, 187 | { 188 | "0": "AdventureWorks", 189 | "1": "Abhaya Rambhatla", 190 | "2": "abhaya@wingtiptoys.com\r" 191 | }, 192 | { 193 | "0": "AdventureWorks", 194 | "1": "Abhra Thakur", 195 | "2": "abhra@wingtiptoys.com\r" 196 | }, 197 | { 198 | "0": "AdventureWorks", 199 | "1": "Adam Balaz", 200 | "2": "adam@wingtiptoys.com\r" 201 | }, 202 | { 203 | "0": "AdventureWorks", 204 | "1": "Adirake Narkbunnum", 205 | "2": "adirake@wingtiptoys.com\r" 206 | }, 207 | { 208 | "0": "AdventureWorks", 209 | "1": "Adirake Saenamuang", 210 | "2": "adirake@wingtiptoys.com\r" 211 | } 212 | ] 213 | }, 214 | "text/html": "
CustomerSourceCustomerNameEMailAddress
AdventureWorksÅšani Nairåšani@wingtiptoys.com\r
AdventureWorksÅšani Senåšani@wingtiptoys.com\r
AdventureWorksAakriti Bhamidipatiaakriti@wingtiptoys.com\r
AdventureWorksAamdaal Kamasamudramaamdaal@wingtiptoys.com\r
AdventureWorksAbel Pirvuabel@wingtiptoys.com\r
AdventureWorksAbhaya Rambhatlaabhaya@wingtiptoys.com\r
AdventureWorksAbhra Thakurabhra@wingtiptoys.com\r
AdventureWorksAdam Balazadam@wingtiptoys.com\r
AdventureWorksAdirake Narkbunnumadirake@wingtiptoys.com\r
AdventureWorksAdirake Saenamuangadirake@wingtiptoys.com\r
" 215 | } 216 | } 217 | ], 218 | "execution_count": 5 219 | }, 220 | { 221 | "cell_type": "code", 222 | "source": "/* Now Join Those to show customers we currently have in a SQL Server Database \r\nand the Category they qre in the External Table */\r\nUSE WideWorldImporters;\r\nGO\r\n\r\nSELECT TOP 10 a.FullName\r\n , b.CustomerSource\r\n FROM Application.People a\r\n INNER JOIN partner_customers_hdfs b ON a.FullName = b.CustomerName\r\n ORDER BY FullName ASC;\r\n GO", 223 | "metadata": {}, 224 | "outputs": [ 225 | { 226 | "output_type": "display_data", 227 | "data": { 228 | "text/html": "Commands completed successfully." 229 | }, 230 | "metadata": {} 231 | }, 232 | { 233 | "output_type": "display_data", 234 | "data": { 235 | "text/html": "(10 rows affected)" 236 | }, 237 | "metadata": {} 238 | }, 239 | { 240 | "output_type": "display_data", 241 | "data": { 242 | "text/html": "Total execution time: 00:00:00.661" 243 | }, 244 | "metadata": {} 245 | }, 246 | { 247 | "output_type": "execute_result", 248 | "metadata": {}, 249 | "execution_count": 6, 250 | "data": { 251 | "application/vnd.dataresource+json": { 252 | "schema": { 253 | "fields": [ 254 | { 255 | "name": "FullName" 256 | }, 257 | { 258 | "name": "CustomerSource" 259 | } 260 | ] 261 | }, 262 | "data": [ 263 | { 264 | "0": "Aahlada Thota", 265 | "1": "AdventureWorks" 266 | }, 267 | { 268 | "0": "Aakarsha Nookala", 269 | "1": "AdventureWorks" 270 | }, 271 | { 272 | "0": "Aakriti Bhamidipati", 273 | "1": "AdventureWorks" 274 | }, 275 | { 276 | "0": "Aamdaal Kamasamudram", 277 | "1": "AdventureWorks" 278 | }, 279 | { 280 | "0": "Abel Pirvu", 281 | "1": "AdventureWorks" 282 | }, 283 | { 284 | "0": "Abhaya Rambhatla", 285 | "1": "AdventureWorks" 286 | }, 287 | { 288 | "0": "Abhra Thakur", 289 | "1": "AdventureWorks" 290 | }, 291 | { 292 | "0": "Adam Balaz", 293 | "1": "AdventureWorks" 294 | }, 295 | { 296 | "0": "Adam Dvorak", 297 | "1": "AdventureWorks" 298 | }, 299 | { 300 | "0": "Adam Kubat", 301 | "1": "AdventureWorks" 302 | } 303 | ] 304 | }, 305 | "text/html": "
FullNameCustomerSource
Aahlada ThotaAdventureWorks
Aakarsha NookalaAdventureWorks
Aakriti BhamidipatiAdventureWorks
Aamdaal KamasamudramAdventureWorks
Abel PirvuAdventureWorks
Abhaya RambhatlaAdventureWorks
Abhra ThakurAdventureWorks
Adam BalazAdventureWorks
Adam DvorakAdventureWorks
Adam KubatAdventureWorks
" 306 | } 307 | } 308 | ], 309 | "execution_count": 6 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "source": "## Next Steps: Continue on to Working with the SQL Server Data Pool\r\n\r\nNow you're ready to open the next Python Notebook - `bdc_tutorial_03.ipynb` - to learn how to create and work with a Data Mart.", 314 | "metadata": {} 315 | } 316 | ] 317 | } -------------------------------------------------------------------------------- /SQL2019BDC/notebooks/bdc_tutorial_03.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "kernelspec": { 4 | "name": "SQL", 5 | "display_name": "SQL", 6 | "language": "sql" 7 | }, 8 | "language_info": { 9 | "name": "sql", 10 | "version": "" 11 | } 12 | }, 13 | "nbformat_minor": 2, 14 | "nbformat": 4, 15 | "cells": [ 16 | { 17 | "cell_type": "markdown", 18 | "source": "\"Microsoft\"\r\n
\r\n\r\n# SQL Server 2019 big data cluster Tutorial\r\n## 03 - Creating and Querying a Data Mart\r\n\r\nIn this tutorial you will learn how to create and query a Data Mart using Virtualized Data in a SQL Server big data cluster. \r\n\r\nWide World Importers is interested in ingesting the data from web logs from an HDFS source where they have been streamed. They want to be able to analyze the traffic to see if there is a pattern in time, products or locations. \r\n\r\nThe web logs, however, are refreshed periodically. WWI would like to keep the logs in local storage to do deeper analysis. \r\n\r\nIn this Jupyter Notebook you'll create a location to store the log files as a SQL Server Table in the SQL Data Pool, and then fill it by creating an External Table that reads HDFS.", 19 | "metadata": {} 20 | }, 21 | { 22 | "cell_type": "code", 23 | "source": "USE WideWorldImporters;\r\nGO\r\n\r\nCREATE EXTERNAL DATA SOURCE SqlDataPool\r\nWITH (LOCATION = 'sqldatapool://controller-svc/default');", 24 | "metadata": {}, 25 | "outputs": [ 26 | { 27 | "output_type": "display_data", 28 | "data": { 29 | "text/html": "Commands completed successfully." 30 | }, 31 | "metadata": {} 32 | }, 33 | { 34 | "output_type": "display_data", 35 | "data": { 36 | "text/html": "Commands completed successfully." 37 | }, 38 | "metadata": {} 39 | }, 40 | { 41 | "output_type": "display_data", 42 | "data": { 43 | "text/html": "Total execution time: 00:00:00.166" 44 | }, 45 | "metadata": {} 46 | } 47 | ], 48 | "execution_count": 3 49 | }, 50 | { 51 | "cell_type": "code", 52 | "source": "CREATE EXTERNAL TABLE [web_clickstream_clicks_data_pool]\r\n (\"wcs_click_date_sk\" BIGINT \r\n , \"wcs_click_time_sk\" BIGINT \r\n , \"wcs_sales_sk\" BIGINT \r\n , \"wcs_item_sk\" BIGINT\r\n , \"wcs_web_page_sk\" BIGINT \r\n , \"wcs_user_sk\" BIGINT)\r\n WITH\r\n (\r\n DATA_SOURCE = SqlDataPool,\r\n DISTRIBUTION = ROUND_ROBIN\r\n );\r\nGO", 53 | "metadata": {}, 54 | "outputs": [ 55 | { 56 | "output_type": "display_data", 57 | "data": { 58 | "text/html": "Commands completed successfully." 59 | }, 60 | "metadata": {} 61 | }, 62 | { 63 | "output_type": "display_data", 64 | "data": { 65 | "text/html": "Total execution time: 00:00:08.849" 66 | }, 67 | "metadata": {} 68 | } 69 | ], 70 | "execution_count": 4 71 | }, 72 | { 73 | "cell_type": "code", 74 | "source": "/* Create an External Table that can read from the Storage Pool File Location */\r\nIF NOT EXISTS(SELECT * FROM sys.external_tables WHERE name = 'web_clickstreams_hdfs')\r\nBEGIN\r\n CREATE EXTERNAL TABLE [web_clickstreams_hdfs]\r\n (\"wcs_click_date_sk\" BIGINT \r\n , \"wcs_click_time_sk\" BIGINT \r\n , \"wcs_sales_sk\" BIGINT \r\n , \"wcs_item_sk\" BIGINT\r\n , \"wcs_web_page_sk\" BIGINT \r\n , \"wcs_user_sk\" BIGINT)\r\n WITH\r\n (\r\n DATA_SOURCE = SqlStoragePool,\r\n LOCATION = '/web_logs',\r\n FILE_FORMAT = csv_file\r\n );\r\nEND", 75 | "metadata": {}, 76 | "outputs": [ 77 | { 78 | "output_type": "display_data", 79 | "data": { 80 | "text/html": "Commands completed successfully." 
81 | }, 82 | "metadata": {} 83 | }, 84 | { 85 | "output_type": "display_data", 86 | "data": { 87 | "text/html": "Total execution time: 00:00:00.223" 88 | }, 89 | "metadata": {} 90 | } 91 | ], 92 | "execution_count": 5 93 | }, 94 | { 95 | "cell_type": "code", 96 | "source": "BEGIN\r\n INSERT INTO web_clickstream_clicks_data_pool\r\n SELECT wcs_click_date_sk\r\n , wcs_click_time_sk \r\n , wcs_sales_sk \r\n , wcs_item_sk \r\n , wcs_web_page_sk \r\n , wcs_user_sk \r\n FROM web_clickstreams_hdfs\r\nEND", 97 | "metadata": {}, 98 | "outputs": [ 99 | { 100 | "output_type": "display_data", 101 | "data": { 102 | "text/html": "(6770549 rows affected)" 103 | }, 104 | "metadata": {} 105 | }, 106 | { 107 | "output_type": "display_data", 108 | "data": { 109 | "text/html": "Total execution time: 00:00:42.670" 110 | }, 111 | "metadata": {} 112 | } 113 | ], 114 | "execution_count": 6 115 | }, 116 | { 117 | "cell_type": "code", 118 | "source": "SELECT count(*) FROM [dbo].[web_clickstream_clicks_data_pool]\r\nSELECT TOP 10 * FROM [dbo].[web_clickstream_clicks_data_pool]", 119 | "metadata": {}, 120 | "outputs": [ 121 | { 122 | "output_type": "display_data", 123 | "data": { 124 | "text/html": "(1 row affected)" 125 | }, 126 | "metadata": {} 127 | }, 128 | { 129 | "output_type": "display_data", 130 | "data": { 131 | "text/html": "(10 rows affected)" 132 | }, 133 | "metadata": {} 134 | }, 135 | { 136 | "output_type": "display_data", 137 | "data": { 138 | "text/html": "Total execution time: 00:00:00.843" 139 | }, 140 | "metadata": {} 141 | }, 142 | { 143 | "output_type": "execute_result", 144 | "metadata": {}, 145 | "execution_count": 7, 146 | "data": { 147 | "application/vnd.dataresource+json": { 148 | "schema": { 149 | "fields": [ 150 | { 151 | "name": "(No column name)" 152 | } 153 | ] 154 | }, 155 | "data": [ 156 | { 157 | "0": "6770549" 158 | } 159 | ] 160 | }, 161 | "text/html": "
(No column name)
6770549
" 162 | } 163 | }, 164 | { 165 | "output_type": "execute_result", 166 | "metadata": {}, 167 | "execution_count": 7, 168 | "data": { 169 | "application/vnd.dataresource+json": { 170 | "schema": { 171 | "fields": [ 172 | { 173 | "name": "wcs_click_date_sk" 174 | }, 175 | { 176 | "name": "wcs_click_time_sk" 177 | }, 178 | { 179 | "name": "wcs_sales_sk" 180 | }, 181 | { 182 | "name": "wcs_item_sk" 183 | }, 184 | { 185 | "name": "wcs_web_page_sk" 186 | }, 187 | { 188 | "name": "wcs_user_sk" 189 | } 190 | ] 191 | }, 192 | "data": [ 193 | { 194 | "0": "37775", 195 | "1": "35460", 196 | "2": "NULL", 197 | "3": "7394", 198 | "4": "53", 199 | "5": "NULL" 200 | }, 201 | { 202 | "0": "37775", 203 | "1": "12155", 204 | "2": "NULL", 205 | "3": "13157", 206 | "4": "53", 207 | "5": "NULL" 208 | }, 209 | { 210 | "0": "37775", 211 | "1": "4880", 212 | "2": "NULL", 213 | "3": "13098", 214 | "4": "53", 215 | "5": "NULL" 216 | }, 217 | { 218 | "0": "37775", 219 | "1": "36272", 220 | "2": "NULL", 221 | "3": "6851", 222 | "4": "53", 223 | "5": "NULL" 224 | }, 225 | { 226 | "0": "37775", 227 | "1": "24922", 228 | "2": "NULL", 229 | "3": "5198", 230 | "4": "53", 231 | "5": "NULL" 232 | }, 233 | { 234 | "0": "37776", 235 | "1": "74100", 236 | "2": "NULL", 237 | "3": "16015", 238 | "4": "53", 239 | "5": "NULL" 240 | }, 241 | { 242 | "0": "37776", 243 | "1": "26833", 244 | "2": "NULL", 245 | "3": "12921", 246 | "4": "53", 247 | "5": "NULL" 248 | }, 249 | { 250 | "0": "37776", 251 | "1": "72943", 252 | "2": "NULL", 253 | "3": "5015", 254 | "4": "53", 255 | "5": "NULL" 256 | }, 257 | { 258 | "0": "37776", 259 | "1": "9387", 260 | "2": "NULL", 261 | "3": "12274", 262 | "4": "53", 263 | "5": "NULL" 264 | }, 265 | { 266 | "0": "37776", 267 | "1": "32557", 268 | "2": "NULL", 269 | "3": "11344", 270 | "4": "53", 271 | "5": "NULL" 272 | } 273 | ] 274 | }, 275 | "text/html": "
wcs_click_date_skwcs_click_time_skwcs_sales_skwcs_item_skwcs_web_page_skwcs_user_sk
3777535460NULL739453NULL
3777512155NULL1315753NULL
377754880NULL1309853NULL
3777536272NULL685153NULL
3777524922NULL519853NULL
3777674100NULL1601553NULL
3777626833NULL1292153NULL
3777672943NULL501553NULL
377769387NULL1227453NULL
3777632557NULL1134453NULL
" 276 | } 277 | } 278 | ], 279 | "execution_count": 7 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "source": "## Next Steps: Continue on to Working with Spark and ETL\r\n\r\nNow you're ready to open the next Python Notebook - `bdc_tutorial_04.ipynb` - to learn how to create and work with Spark and Extracting, Transforming and Loading data.", 284 | "metadata": {} 285 | } 286 | ] 287 | } -------------------------------------------------------------------------------- /SQL2019BDC/notebooks/bdc_tutorial_04.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "kernelspec": { 4 | "name": "pyspark3kernel", 5 | "display_name": "PySpark3" 6 | }, 7 | "language_info": { 8 | "name": "pyspark3", 9 | "mimetype": "text/x-python", 10 | "codemirror_mode": { 11 | "name": "python", 12 | "version": 3 13 | }, 14 | "pygments_lexer": "python3" 15 | } 16 | }, 17 | "nbformat_minor": 2, 18 | "nbformat": 4, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "source": "\"Microsoft\"\r\n
\r\n\r\n# SQL Server 2019 big data cluster Tutorial\r\n## 04 - Using Spark for ETL\r\n\r\nIn this tutorial you will learn how to work with Spark Jobs in a SQL Server big data cluster. \r\n\r\nMany times Spark is used to do transformations on data at large scale. In this Jupyter Notebook, you'll read a large text file into a Spark DataFrame, and then save out the top 10 examples as a table using SparkSQL.", 23 | "metadata": {} 24 | }, 25 | { 26 | "cell_type": "code", 27 | "source": "# Read the product reviews CSV files into a spark data frame, print schema & top rows\r\nresults = spark.read.option(\"inferSchema\", \"true\").csv('/product_review_data').toDF(\"Item_ID\", \"Review\")", 28 | "metadata": {}, 29 | "outputs": [ 30 | { 31 | "name": "stdout", 32 | "text": "Starting Spark application\n", 33 | "output_type": "stream" 34 | }, 35 | { 36 | "data": { 37 | "text/plain": "", 38 | "text/html": "\n
IDYARN Application IDKindStateSpark UIDriver logCurrent session?
0application_1561806272028_0001pyspark3idleLinkLink
" 39 | }, 40 | "metadata": {}, 41 | "output_type": "display_data" 42 | }, 43 | { 44 | "name": "stdout", 45 | "text": "SparkSession available as 'spark'.\n", 46 | "output_type": "stream" 47 | } 48 | ], 49 | "execution_count": 3 50 | }, 51 | { 52 | "cell_type": "code", 53 | "source": "# Save results as parquet file and create hive table\r\nresults.write.format(\"parquet\").mode(\"overwrite\").saveAsTable(\"Top_Product_Reviews\")", 54 | "metadata": {}, 55 | "outputs": [], 56 | "execution_count": 4 57 | }, 58 | { 59 | "cell_type": "code", 60 | "source": "# Execute Spark SQL commands\r\nsqlDF = spark.sql(\"SELECT * FROM Top_Product_Reviews LIMIT 10\")\r\nsqlDF.show()", 61 | "metadata": {}, 62 | "outputs": [ 63 | { 64 | "name": "stdout", 65 | "text": "+-------+--------------------+\n|Item_ID| Review|\n+-------+--------------------+\n| 72621|Works fine. Easy ...|\n| 89334|great product to ...|\n| 89335|Next time will go...|\n| 84259|Great Gift Great ...|\n| 84398|After trip to Par...|\n| 66434|Simply the best t...|\n| 66501|This is the exact...|\n| 66587|Not super magnet;...|\n| 66680|Installed as bath...|\n| 66694|Our home was buil...|\n+-------+--------------------+", 66 | "output_type": "stream" 67 | } 68 | ], 69 | "execution_count": 5 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "source": "## Next Steps: Continue on to Working with Spark and Machine Learning\r\n\r\nNow you're ready to open the final Python Notebook in this tutorial series - `bdc_tutorial_05.ipynb` - to learn how to create and work with Spark and Machine Learning.", 74 | "metadata": {} 75 | } 76 | ] 77 | } -------------------------------------------------------------------------------- /SQL2019BDC/notebooks/bdc_tutorial_05.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "kernelspec": { 4 | "name": "pysparkkernel", 5 | "display_name": "PySpark" 6 | }, 7 | "language_info": { 8 | "name": "pyspark", 9 | "mimetype": "text/x-python", 10 | "codemirror_mode": { 11 | "name": "python", 12 | "version": 3 13 | }, 14 | "pygments_lexer": "python3" 15 | } 16 | }, 17 | "nbformat_minor": 2, 18 | "nbformat": 4, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "source": [ 23 | "\"Microsoft\"\r\n", 24 | "
\r\n", 25 | "\r\n", 26 | "# SQL Server 2019 big data cluster Tutorial\r\n", 27 | "## 05 - Using Spark For Machine Learning\r\n", 28 | "\r\n", 29 | "In this tutorial you will learn how to work with Spark Jobs in a SQL Server big data cluster. \r\n", 30 | "\r\n", 31 | "Wide World Importers has refridgerated trucks to deliver temperature-sensitive products. These are high-profit, and high-expense items. In the past, there have been failures in the cooling systems, and the primary culprit has been the deep-cycle batteries used in the system.\r\n", 32 | "\r\n", 33 | "WWI began replacing the batteriess every three months as a preventative measure, but this has a high cost. Recently, the taxes on recycling batteries has increased dramatically. The CEO has asked the Data Science team if they can investigate creating a Predictive Maintenance system to more accurately tell the maintenance staff how long a battery will last, rather than relying on a flat 3 month cycle. \r\n", 34 | "\r\n", 35 | "The trucks have sensors that transmit data to a file location. The trips are also logged. In this Jupyter Notebook, you'll create, train and store a Machine Learning model using SciKit-Learn, so that it can be deployed to multiple hosts. " 36 | ], 37 | "metadata": { 38 | "azdata_cell_guid": "969bbd54-5f8e-49eb-b466-5e05633fa7be" 39 | } 40 | }, 41 | { 42 | "cell_type": "code", 43 | "source": [ 44 | "import pickle \r\n", 45 | "import pandas as pd\r\n", 46 | "import numpy as np\r\n", 47 | "import datetime as dt\r\n", 48 | "from sklearn.linear_model import LogisticRegression\r\n", 49 | "from sklearn.model_selection import train_test_split" 50 | ], 51 | "metadata": { 52 | "azdata_cell_guid": "03e7dc98-c577-4616-9708-e82908019d40" 53 | }, 54 | "outputs": [ 55 | { 56 | "output_type": "stream", 57 | "name": "stdout", 58 | "text": "Starting Spark application\n" 59 | }, 60 | { 61 | "output_type": "display_data", 62 | "data": { 63 | "text/plain": "", 64 | "text/html": "\n
IDYARN Application IDKindStateSpark UIDriver logCurrent session?
2application_1569595626385_0003pyspark3idleLinkLink
" 65 | }, 66 | "metadata": {} 67 | }, 68 | { 69 | "output_type": "stream", 70 | "name": "stdout", 71 | "text": "SparkSession available as 'spark'.\n" 72 | } 73 | ], 74 | "execution_count": 2 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "source": [ 79 | "First, download the sensor data from the location where it is transmitted from the trucks, and load it into a Spark DataFrame." 80 | ], 81 | "metadata": { 82 | "azdata_cell_guid": "18f250c3-d1fd-40e1-a652-10f948c8b0ab" 83 | } 84 | }, 85 | { 86 | "cell_type": "code", 87 | "source": [ 88 | "df = pd.read_csv('https://sabwoody.blob.core.windows.net/backups/training-formatted.csv', header=0)\r\n", 89 | "\r\n", 90 | "df.dropna()\r\n", 91 | "print(df.shape)\r\n", 92 | "print(list(df.columns))" 93 | ], 94 | "metadata": { 95 | "azdata_cell_guid": "da351ace-6907-411b-a1ab-e3e8a1be8111" 96 | }, 97 | "outputs": [ 98 | { 99 | "output_type": "stream", 100 | "name": "stdout", 101 | "text": "(10000, 74)\n['Survival_In_Days', 'Province', 'Region', 'Trip_Length_Mean', 'Trip_Length_Sigma', 'Trips_Per_Day_Mean', 'Trips_Per_Day_Sigma', 'Battery_Rated_Cycles', 'Manufacture_Month', 'Manufacture_Year', 'Alternator_Efficiency', 'Car_Has_EcoStart', 'Twelve_hourly_temperature_history_for_last_31_days_before_death_last_recording_first', 'Sensor_Reading_1', 'Sensor_Reading_2', 'Sensor_Reading_3', 'Sensor_Reading_4', 'Sensor_Reading_5', 'Sensor_Reading_6', 'Sensor_Reading_7', 'Sensor_Reading_8', 'Sensor_Reading_9', 'Sensor_Reading_10', 'Sensor_Reading_11', 'Sensor_Reading_12', 'Sensor_Reading_13', 'Sensor_Reading_14', 'Sensor_Reading_15', 'Sensor_Reading_16', 'Sensor_Reading_17', 'Sensor_Reading_18', 'Sensor_Reading_19', 'Sensor_Reading_20', 'Sensor_Reading_21', 'Sensor_Reading_22', 'Sensor_Reading_23', 'Sensor_Reading_24', 'Sensor_Reading_25', 'Sensor_Reading_26', 'Sensor_Reading_27', 'Sensor_Reading_28', 'Sensor_Reading_29', 'Sensor_Reading_30', 'Sensor_Reading_31', 'Sensor_Reading_32', 'Sensor_Reading_33', 'Sensor_Reading_34', 'Sensor_Reading_35', 'Sensor_Reading_36', 'Sensor_Reading_37', 'Sensor_Reading_38', 'Sensor_Reading_39', 'Sensor_Reading_40', 'Sensor_Reading_41', 'Sensor_Reading_42', 'Sensor_Reading_43', 'Sensor_Reading_44', 'Sensor_Reading_45', 'Sensor_Reading_46', 'Sensor_Reading_47', 'Sensor_Reading_48', 'Sensor_Reading_49', 'Sensor_Reading_50', 'Sensor_Reading_51', 'Sensor_Reading_52', 'Sensor_Reading_53', 'Sensor_Reading_54', 'Sensor_Reading_55', 'Sensor_Reading_56', 'Sensor_Reading_57', 'Sensor_Reading_58', 'Sensor_Reading_59', 'Sensor_Reading_60', 'Sensor_Reading_61']" 102 | } 103 | ], 104 | "execution_count": 8 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "source": [ 109 | "After examining the data, the Data Science team selects certain columns that they believe are highly predictive of the battery life." 
110 | ], 111 | "metadata": { 112 | "azdata_cell_guid": "12249759-7afa-402a-badc-9167ad70b5f1" 113 | } 114 | }, 115 | { 116 | "cell_type": "code", 117 | "source": [ 118 | "# Select the features used for predicting battery life\r\n", 119 | "x = df.iloc[:,1:74]\r\n", 120 | "x = x.iloc[:,np.r_[2:7, 9:73]]\r\n", 121 | "x = x.interpolate() \r\n", 122 | "\r\n", 123 | "# Select the labels only (the measured battery life) \r\n", 124 | "y = df.iloc[:,0].values.flatten()\r\n", 125 | "print('Interpolation Complete')" 126 | ], 127 | "metadata": { 128 | "azdata_cell_guid": "0806c0d5-7f0d-4528-8ec4-57c21989717f" 129 | }, 130 | "outputs": [ 131 | { 132 | "output_type": "stream", 133 | "name": "stdout", 134 | "text": "Interpolation Complete" 135 | } 136 | ], 137 | "execution_count": 9 138 | }, 139 | { 140 | "cell_type": "code", 141 | "source": [ 142 | "# Examine the features selected \r\n", 143 | "print(list(x.columns))" 144 | ], 145 | "metadata": { 146 | "azdata_cell_guid": "05de0ddd-ece4-4005-82a6-02a2cd2946bb" 147 | }, 148 | "outputs": [ 149 | { 150 | "output_type": "stream", 151 | "name": "stdout", 152 | "text": "['Trip_Length_Mean', 'Trip_Length_Sigma', 'Trips_Per_Day_Mean', 'Trips_Per_Day_Sigma', 'Battery_Rated_Cycles', 'Alternator_Efficiency', 'Car_Has_EcoStart', 'Twelve_hourly_temperature_history_for_last_31_days_before_death_last_recording_first', 'Sensor_Reading_1', 'Sensor_Reading_2', 'Sensor_Reading_3', 'Sensor_Reading_4', 'Sensor_Reading_5', 'Sensor_Reading_6', 'Sensor_Reading_7', 'Sensor_Reading_8', 'Sensor_Reading_9', 'Sensor_Reading_10', 'Sensor_Reading_11', 'Sensor_Reading_12', 'Sensor_Reading_13', 'Sensor_Reading_14', 'Sensor_Reading_15', 'Sensor_Reading_16', 'Sensor_Reading_17', 'Sensor_Reading_18', 'Sensor_Reading_19', 'Sensor_Reading_20', 'Sensor_Reading_21', 'Sensor_Reading_22', 'Sensor_Reading_23', 'Sensor_Reading_24', 'Sensor_Reading_25', 'Sensor_Reading_26', 'Sensor_Reading_27', 'Sensor_Reading_28', 'Sensor_Reading_29', 'Sensor_Reading_30', 'Sensor_Reading_31', 'Sensor_Reading_32', 'Sensor_Reading_33', 'Sensor_Reading_34', 'Sensor_Reading_35', 'Sensor_Reading_36', 'Sensor_Reading_37', 'Sensor_Reading_38', 'Sensor_Reading_39', 'Sensor_Reading_40', 'Sensor_Reading_41', 'Sensor_Reading_42', 'Sensor_Reading_43', 'Sensor_Reading_44', 'Sensor_Reading_45', 'Sensor_Reading_46', 'Sensor_Reading_47', 'Sensor_Reading_48', 'Sensor_Reading_49', 'Sensor_Reading_50', 'Sensor_Reading_51', 'Sensor_Reading_52', 'Sensor_Reading_53', 'Sensor_Reading_54', 'Sensor_Reading_55', 'Sensor_Reading_56', 'Sensor_Reading_57', 'Sensor_Reading_58', 'Sensor_Reading_59', 'Sensor_Reading_60', 'Sensor_Reading_61']" 153 | } 154 | ], 155 | "execution_count": 10 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "source": [ 160 | "The lead Data Scientist believes that a standard Regression algorithm would do the best predictions." 
161 | ], 162 | "metadata": { 163 | "azdata_cell_guid": "88ff4ca6-d8ec-49ed-8e4e-915f1664527e" 164 | } 165 | }, 166 | { 167 | "cell_type": "code", 168 | "source": [ 169 | "# Train a regression model \r\n", 170 | "from sklearn.ensemble import GradientBoostingRegressor \r\n", 171 | "model = GradientBoostingRegressor() \r\n", 172 | "model.fit(x,y)" 173 | ], 174 | "metadata": { 175 | "azdata_cell_guid": "0d8162d2-7530-44a7-8318-79ee11ffa210" 176 | }, 177 | "outputs": [ 178 | { 179 | "output_type": "stream", 180 | "name": "stdout", 181 | "text": "GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,\n learning_rate=0.1, loss='ls', max_depth=3, max_features=None,\n max_leaf_nodes=None, min_impurity_decrease=0.0,\n min_impurity_split=None, min_samples_leaf=1,\n min_samples_split=2, min_weight_fraction_leaf=0.0,\n n_estimators=100, n_iter_no_change=None, presort='auto',\n random_state=None, subsample=1.0, tol=0.0001,\n validation_fraction=0.1, verbose=0, warm_start=False)" 182 | } 183 | ], 184 | "execution_count": 11 185 | }, 186 | { 187 | "cell_type": "code", 188 | "source": [ 189 | "# Try making a single prediction and observe the result \r\n", 190 | "model.predict(x.iloc[0:1]) " 191 | ], 192 | "metadata": { 193 | "azdata_cell_guid": "11818008-b450-4799-a779-a1b6de81093b" 194 | }, 195 | "outputs": [ 196 | { 197 | "output_type": "stream", 198 | "name": "stdout", 199 | "text": "array([1323.39791998])" 200 | } 201 | ], 202 | "execution_count": 12 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "source": [ 207 | "After the model is trained, perform testing from labeled data." 208 | ], 209 | "metadata": { 210 | "azdata_cell_guid": "bdff0323-079d-4559-861a-345993135749" 211 | } 212 | }, 213 | { 214 | "cell_type": "code", 215 | "source": [ 216 | "# access the test data from HDFS by reading into a Spark DataFrame \r\n", 217 | "test_data = pd.read_csv('https://sabwoody.blob.core.windows.net/backups/fleet-formatted.csv', header=0)\r\n", 218 | "test_data.dropna()\r\n", 219 | "\r\n", 220 | "# prepare the test data (dropping unused columns) \r\n", 221 | "test_data = test_data.drop(columns=[\"Car_ID\", \"Battery_Age\"])\r\n", 222 | "test_data = test_data.iloc[:,np.r_[2:7, 9:73]]\r\n", 223 | "test_data.rename(columns={'Twelve_hourly_temperature_forecast_for_next_31_days _reversed': 'Twelve_hourly_temperature_history_for_last_31_days_before_death_l ast_recording_first'}, inplace=True) \r\n", 224 | "# make the battery life predictions for each of the vehicles in the test data \r\n", 225 | "battery_life_predictions = model.predict(test_data) \r\n", 226 | "# examine the prediction \r\n", 227 | "battery_life_predictions" 228 | ], 229 | "metadata": { 230 | "azdata_cell_guid": "25465418-a277-47d8-b258-1e7f9540ffdb" 231 | }, 232 | "outputs": [ 233 | { 234 | "output_type": "stream", 235 | "name": "stdout", 236 | "text": "array([1472.91111228, 1340.08897725, 1421.38601032, 1473.79033215,\n 1651.66584142, 1412.85617044, 1842.81351408, 1264.22762055,\n 1930.45602533, 1681.86345995])" 237 | } 238 | ], 239 | "execution_count": 13 240 | }, 241 | { 242 | "cell_type": "code", 243 | "source": [ 244 | "# prepare one data frame that includes predictions for each vehicle \r\n", 245 | "scored_data = test_data \r\n", 246 | "scored_data[\"Estimated_Battery_Life\"] = battery_life_predictions \r\n", 247 | "df_scored = spark.createDataFrame(scored_data) \r\n", 248 | "# Optionally, write out the scored data: \r\n", 249 | "# df_scored.coalesce(1).write.option(\"header\", \"true\").csv(\"/pdm\") " 250 | ], 251 | 
"metadata": { 252 | "azdata_cell_guid": "160ceaf9-81c1-4bb8-92ca-511a8b928cac" 253 | }, 254 | "outputs": [], 255 | "execution_count": 16 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "source": [ 260 | "Once you are satisfied with the Model, you can save it out using the \"Pickle\" library for deployment to other systems." 261 | ], 262 | "metadata": { 263 | "azdata_cell_guid": "af10557a-dce4-42b4-8ba7-af1faaad461c" 264 | } 265 | }, 266 | { 267 | "cell_type": "code", 268 | "source": [ 269 | "pickle_file = open('/tmp/pdm.pkl', 'wb')\r\n", 270 | "pickle.dump(model, pickle_file)\r\n", 271 | "import os\r\n", 272 | "print(os.getcwd())\r\n", 273 | "os.listdir('//tmp/')" 274 | ], 275 | "metadata": { 276 | "azdata_cell_guid": "75c70761-9c37-4446-85f4-71c4d32f6493" 277 | }, 278 | "outputs": [ 279 | { 280 | "output_type": "stream", 281 | "name": "stdout", 282 | "text": "/tmp/nm-local-dir/usercache/root/appcache/application_1569595626385_0003/container_1569595626385_0003_01_000001\n['nm-local-dir', 'hsperfdata_root', 'hadoop-root-nodemanager.pid', 'hadoop-root-datanode.pid', 'jetty-0.0.0.0-8044-node-_-any-1399597149712545407.dir', 'jetty-localhost-43849-datanode-_-any-4367549175596043164.dir', 'pdm.pkl', 'tmpo7d6l6mt']" 283 | } 284 | ], 285 | "execution_count": 18 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "source": [ 290 | "**(Optional)**\r\n", 291 | "\r\n", 292 | "You could export this model and run it at the edge or in SQL Server directly. Here's an example of what that code could look like:\r\n", 293 | "\r\n", 294 | "
\r\n",
295 |                 "\r\n",
296 |                 "DECLARE @query_string nvarchar(max) -- Query Truck Data\r\n",
297 |                 "SET @query_string='\r\n",
298 |                 "SELECT [Trip_Length_Mean], [Trip_Length_Sigma], [Trips_Per_Day_Mean], [Trips_Per_Day_Sigma], [Battery_Rated_Cycles], [Alternator_Efficiency], [Car_Has_EcoStart], [Twelve_hourly_temperature_history_for_last_31_days_before_death_last_recording_first], [Sensor_Reading_1], [Sensor_Reading_2], [Sensor_Reading_3], [Sensor_Reading_4], [Sensor_Reading_5], [Sensor_Reading_6], [Sensor_Reading_7], [Sensor_Reading_8], [Sensor_Reading_9], [Sensor_Reading_10], [Sensor_Reading_11], [Sensor_Reading_12], [Sensor_Reading_13], [Sensor_Reading_14], [Sensor_Reading_15], [Sensor_Reading_16], [Sensor_Reading_17], [Sensor_Reading_18], [Sensor_Reading_19], [Sensor_Reading_20], [Sensor_Reading_21], [Sensor_Reading_22], [Sensor_Reading_23], [Sensor_Reading_24], [Sensor_Reading_25], [Sensor_Reading_26], [Sensor_Reading_27], [Sensor_Reading_28], [Sensor_Reading_29], [Sensor_Reading_30], [Sensor_Reading_31], [Sensor_Reading_32], [Sensor_Reading_33], [Sensor_Reading_34], [Sensor_Reading_35], [Sensor_Reading_36], [Sensor_Reading_37], [Sensor_Reading_38], [Sensor_Reading_39], [Sensor_Reading_40], [Sensor_Reading_41], [Sensor_Reading_42], [Sensor_Reading_43], [Sensor_Reading_44], [Sensor_Reading_45], [Sensor_Reading_46], [Sensor_Reading_47], [Sensor_Reading_48], [Sensor_Reading_49], [Sensor_Reading_50], [Sensor_Reading_51], [Sensor_Reading_52], [Sensor_Reading_53], [Sensor_Reading_54], [Sensor_Reading_55], [Sensor_Reading_56], [Sensor_Reading_57], [Sensor_Reading_58], [Sensor_Reading_59], [Sensor_Reading_60], [Sensor_Reading_61]\r\n",
299 |                 "FROM Truck_Sensor_Readings'\r\n",
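                "-- Note: [dbo].[PredictBattLife] is assumed here to be a user-defined stored procedure (for example, one built on sp_execute_external_script that loads the pickled 'pdm' model) which scores the rows returned by @query_string; it is not created in this tutorial.\r\n",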
300 |                 "EXEC [dbo].[PredictBattLife] 'pdm', @query_string;\r\n",
301 |                 "\r\n",
302 |                 "
\r\n", 303 | "" 304 | ], 305 | "metadata": { 306 | "azdata_cell_guid": "04647e35-4f0f-4e52-a89c-6d695809da14" 307 | } 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "source": [ 312 | "## Next Steps: Continue on to other workloads in SQL Server 2019\r\n", 313 | "\r\n", 314 | "Now you're ready to work with SQL Server 2019's other features - [you can learn more about those here](https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-overview?view=sqlallproducts-allversions)." 315 | ], 316 | "metadata": { 317 | "azdata_cell_guid": "c7122be0-61a8-4b8d-a023-14212908269d" 318 | } 319 | } 320 | ] 321 | } -------------------------------------------------------------------------------- /SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc.ssmssln: -------------------------------------------------------------------------------- 1 |  2 | Microsoft Visual Studio Solution File, Format Version 12.00 3 | # SQL Server Management Studio Solution File, Format Version 18.00 4 | VisualStudioVersion = 15.0.27428.2015 5 | MinimumVisualStudioVersion = 10.0.40219.1 6 | Project("{4F2E2C19-372F-40D8-9FA7-9D2138C6997A}") = "SQL Server Scripts for bdc", "SQL Server Scripts for bdc\SQL Server Scripts for bdc.ssmssqlproj", "{3B44EB86-7B99-4EB1-ACAD-31AF5461F4E2}" 7 | EndProject 8 | Global 9 | GlobalSection(SolutionConfigurationPlatforms) = preSolution 10 | Default|Default = Default|Default 11 | EndGlobalSection 12 | GlobalSection(ProjectConfigurationPlatforms) = postSolution 13 | {3B44EB86-7B99-4EB1-ACAD-31AF5461F4E2}.Default|Default.ActiveCfg = Default 14 | {04FC7132-4830-4B67-905B-0279F580D3E8}.Default|Default.ActiveCfg = Default 15 | EndGlobalSection 16 | GlobalSection(SolutionProperties) = preSolution 17 | HideSolutionNode = FALSE 18 | EndGlobalSection 19 | GlobalSection(ExtensibilityGlobals) = postSolution 20 | SolutionGuid = {B0EEFBD1-1FE9-4FF0-95C7-81818B1BD6F8} 21 | EndGlobalSection 22 | EndGlobal 23 | -------------------------------------------------------------------------------- /SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc/01 - Show Configuration.sql: -------------------------------------------------------------------------------- 1 | /* 2 | Note: To be run in conjunction with SQL server 2019 big data clusters course only. 3 | 4 | Show Instance Version */ 5 | SELECT @@VERSION; 6 | GO 7 | 8 | /* General Configuration */ 9 | USE master; 10 | GO 11 | EXEC sp_configure; 12 | GO 13 | 14 | /* Databases on this Instance */ 15 | SELECT db.name AS 'Database Name' 16 | , Physical_Name AS 'Location on Disk' 17 | , Cast(Cast(Round(cast(mf.size as decimal) * 8.0/1024000.0,2) as decimal(18,2)) as nvarchar) 'Size (GB)' 18 | FROM sys.master_files mf 19 | INNER JOIN 20 | sys.databases db ON db.database_id = mf.database_id 21 | WHERE mf.type_desc = 'ROWS'; 22 | GO 23 | 24 | SELECT * from sys.master_files 25 | -------------------------------------------------------------------------------- /SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc/02 - Population Information from WWI.sql: -------------------------------------------------------------------------------- 1 | /* Get some general information about the data in the WWI OLTP system */ 2 | USE WideWorldImporters; 3 | GO 4 | 5 | /* Show the Populations. 6 | Where do we have the most people? 
7 | */ 8 | SELECT TOP 10 CityName as 'City Name' 9 | , StateProvinceName as 'State or Province' 10 | , sp.LatestRecordedPopulation as 'Population' 11 | , CountryName 12 | FROM Application.Cities AS city 13 | JOIN Application.StateProvinces AS sp ON 14 | city.StateProvinceID = sp.StateProvinceID 15 | JOIN Application.Countries AS ctry ON 16 | sp.CountryID=ctry.CountryID 17 | ORDER BY Population, CityName; 18 | GO -------------------------------------------------------------------------------- /SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc/03 - Sales in WWI.sql: -------------------------------------------------------------------------------- 1 | /* Show Customer Sales in WWI OLTP */ 2 | USE WideWorldImporters; 3 | GO 4 | 5 | SELECT TOP 10 s.CustomerID 6 | , s.CustomerName 7 | , sc.CustomerCategoryName 8 | , pp.FullName AS PrimaryContact 9 | , ap.FullName AS AlternateContact 10 | , s.PhoneNumber 11 | , s.FaxNumber 12 | , bg.BuyingGroupName 13 | , s.WebsiteURL 14 | , dm.DeliveryMethodName AS DeliveryMethod 15 | , c.CityName AS CityName 16 | , s.DeliveryLocation AS DeliveryLocation 17 | , s.DeliveryRun 18 | , s.RunPosition 19 | FROM Sales.Customers AS s 20 | LEFT OUTER JOIN Sales.CustomerCategories AS sc 21 | ON s.CustomerCategoryID = sc.CustomerCategoryID 22 | LEFT OUTER JOIN [Application].People AS pp 23 | ON s.PrimaryContactPersonID = pp.PersonID 24 | LEFT OUTER JOIN [Application].People AS ap 25 | ON s.AlternateContactPersonID = ap.PersonID 26 | LEFT OUTER JOIN Sales.BuyingGroups AS bg 27 | ON s.BuyingGroupID = bg.BuyingGroupID 28 | LEFT OUTER JOIN [Application].DeliveryMethods AS dm 29 | ON s.DeliveryMethodID = dm.DeliveryMethodID 30 | LEFT OUTER JOIN [Application].Cities AS c 31 | ON s.DeliveryCityID = c.CityID 32 | ORDER BY c.CityName -------------------------------------------------------------------------------- /SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc/04 - Join to HDFS.sql: -------------------------------------------------------------------------------- 1 | /* Now Join Those to show customers we currently have in a SQL Server Database 2 | and the Category they qre in the External Table */ 3 | USE WideWorldImporters; 4 | GO 5 | 6 | SELECT TOP 10 a.FullName 7 | , b.CustomerSource 8 | FROM Application.People a 9 | INNER JOIN partner_customers_hdfs b ON a.FullName = b.CustomerName 10 | ORDER BY FullName ASC; 11 | GO -------------------------------------------------------------------------------- /SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc/05 - Query from Data Pool.sql: -------------------------------------------------------------------------------- 1 | USE WideWorldImporters; 2 | GO 3 | 4 | SELECT count(*) FROM [dbo].[web_clickstream_clicks_data_pool] 5 | SELECT TOP 10 * FROM [dbo].[web_clickstream_clicks_data_pool] -------------------------------------------------------------------------------- /SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc/SQL Server Scripts for bdc.ssmssqlproj: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 2019-04-29T13:21:46.7659137-04:00 8 | SQL 9 | 104.44.142.150,31433 10 | sa 11 | SQL 12 | 13 | 30 14 | 0 15 | NotSpecified 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 8c91a03d-f9b4-46c0-a305-b5dcc79ff907:104.44.142.150,31433:False:sa 24 | 104.44.142.150,31433 25 | sa 26 | 01 - Show Configuration.sql 27 | 28 | 29 | 8c91a03d-f9b4-46c0-a305-b5dcc79ff907:104.44.142.150,31433:False:sa 30 | 104.44.142.150,31433 
31 | sa 32 | 02 - Population Information from WWI.sql 33 | 34 | 35 | 8c91a03d-f9b4-46c0-a305-b5dcc79ff907:104.44.142.150,31433:False:sa 36 | 104.44.142.150,31433 37 | sa 38 | 03 - Sales in WWI.sql 39 | 40 | 41 | 8c91a03d-f9b4-46c0-a305-b5dcc79ff907:104.44.142.150,31433:False:sa 42 | 104.44.142.150,31433 43 | sa 44 | 04 - Join to HDFS.sql 45 | 46 | 47 | 8c91a03d-f9b4-46c0-a305-b5dcc79ff907:104.44.142.150,31433:False:sa 48 | 104.44.142.150,31433 49 | sa 50 | 05 - Query from Data Pool.sql 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | -------------------------------------------------------------------------------- /graphics/ADS-5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/ADS-5.png -------------------------------------------------------------------------------- /graphics/KubernetesCluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/KubernetesCluster.png -------------------------------------------------------------------------------- /graphics/WWI-001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/WWI-001.png -------------------------------------------------------------------------------- /graphics/WWI-002.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/WWI-002.png -------------------------------------------------------------------------------- /graphics/WWI-003.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/WWI-003.png -------------------------------------------------------------------------------- /graphics/WWI-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/WWI-logo.png -------------------------------------------------------------------------------- /graphics/adf.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/adf.png -------------------------------------------------------------------------------- /graphics/ads-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/ads-1.png -------------------------------------------------------------------------------- /graphics/ads-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/ads-2.png -------------------------------------------------------------------------------- /graphics/ads-3.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/ads-3.png -------------------------------------------------------------------------------- /graphics/ads-4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/ads-4.png -------------------------------------------------------------------------------- /graphics/ads.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/ads.png -------------------------------------------------------------------------------- /graphics/aks1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/aks1.png -------------------------------------------------------------------------------- /graphics/aks2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/aks2.png -------------------------------------------------------------------------------- /graphics/bdc-security-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdc-security-1.png -------------------------------------------------------------------------------- /graphics/bdc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdc.png -------------------------------------------------------------------------------- /graphics/bdcportal.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdcportal.png -------------------------------------------------------------------------------- /graphics/bdcsolution1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdcsolution1.png -------------------------------------------------------------------------------- /graphics/bdcsolution2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdcsolution2.png -------------------------------------------------------------------------------- /graphics/bdcsolution3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdcsolution3.png -------------------------------------------------------------------------------- /graphics/bdcsolution4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdcsolution4.png 
-------------------------------------------------------------------------------- /graphics/bookpencil.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bookpencil.png -------------------------------------------------------------------------------- /graphics/building1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/building1.png -------------------------------------------------------------------------------- /graphics/bulletlist.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bulletlist.png -------------------------------------------------------------------------------- /graphics/checkbox.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/checkbox.png -------------------------------------------------------------------------------- /graphics/checkmark.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/checkmark.png -------------------------------------------------------------------------------- /graphics/clipboardcheck.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/clipboardcheck.png -------------------------------------------------------------------------------- /graphics/cloud1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/cloud1.png -------------------------------------------------------------------------------- /graphics/datamart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/datamart.png -------------------------------------------------------------------------------- /graphics/datamart1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/datamart1.png -------------------------------------------------------------------------------- /graphics/datavirtualization.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/datavirtualization.png -------------------------------------------------------------------------------- /graphics/datavirtualization1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/datavirtualization1.png -------------------------------------------------------------------------------- 
/graphics/education1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/education1.png -------------------------------------------------------------------------------- /graphics/factory.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/factory.png -------------------------------------------------------------------------------- /graphics/geopin.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/geopin.png -------------------------------------------------------------------------------- /graphics/grafana.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/grafana.png -------------------------------------------------------------------------------- /graphics/hdfs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/hdfs.png -------------------------------------------------------------------------------- /graphics/kibana.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/kibana.png -------------------------------------------------------------------------------- /graphics/kubectl.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/kubectl.png -------------------------------------------------------------------------------- /graphics/kubernetes1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/kubernetes1.png -------------------------------------------------------------------------------- /graphics/listcheck.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/listcheck.png -------------------------------------------------------------------------------- /graphics/microsoftlogo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/microsoftlogo.png -------------------------------------------------------------------------------- /graphics/owl.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/owl.png -------------------------------------------------------------------------------- /graphics/paperclip1.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/paperclip1.png -------------------------------------------------------------------------------- /graphics/pencil2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/pencil2.png -------------------------------------------------------------------------------- /graphics/pinmap.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/pinmap.png -------------------------------------------------------------------------------- /graphics/point1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/point1.png -------------------------------------------------------------------------------- /graphics/solutiondiagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/solutiondiagram.png -------------------------------------------------------------------------------- /graphics/spark1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/spark1.png -------------------------------------------------------------------------------- /graphics/spark2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/spark2.png -------------------------------------------------------------------------------- /graphics/spark3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/spark3.png -------------------------------------------------------------------------------- /graphics/spark4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/spark4.png -------------------------------------------------------------------------------- /graphics/sqlbdc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/sqlbdc.png -------------------------------------------------------------------------------- /graphics/textbubble.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/textbubble.png --------------------------------------------------------------------------------