├── .DS_Store
├── LICENSES
│   ├── CODE_OF_CONDUCT.md
│   ├── LICENSE
│   ├── LICENSE-CODE
│   ├── README.md
│   └── SECURITY.md
├── README.md
├── SECURITY.md
├── SQL2019BDC
│   ├── 00 - Prerequisites.md
│   ├── 01 - The Big Data Landscape.md
│   ├── 02 - SQL Server BDC Components.md
│   ├── 03 - Planning, Installation and Configuration.md
│   ├── 04 - Operationalization.md
│   ├── 05 - Management and Monitoring.md
│   ├── 06 - Security.md
│   ├── notebooks
│   │   ├── README.md
│   │   ├── bdc_tutorial_00.ipynb
│   │   ├── bdc_tutorial_01.ipynb
│   │   ├── bdc_tutorial_02.ipynb
│   │   ├── bdc_tutorial_03.ipynb
│   │   ├── bdc_tutorial_04.ipynb
│   │   └── bdc_tutorial_05.ipynb
│   └── ssms
│       └── SQL Server Scripts for bdc
│           ├── SQL Server Scripts for bdc.ssmssln
│           └── SQL Server Scripts for bdc
│               ├── 01 - Show Configuration.sql
│               ├── 02 - Population Information from WWI.sql
│               ├── 03 - Sales in WWI.sql
│               ├── 04 - Join to HDFS.sql
│               ├── 05 - Query from Data Pool.sql
│               └── SQL Server Scripts for bdc.ssmssqlproj
└── graphics
    ├── ADS-5.png
    ├── KubernetesCluster.png
    ├── WWI-001.png
    ├── WWI-002.png
    ├── WWI-003.png
    ├── WWI-logo.png
    ├── adf.png
    ├── ads-1.png
    ├── ads-2.png
    ├── ads-3.png
    ├── ads-4.png
    ├── ads.png
    ├── aks1.png
    ├── aks2.png
    ├── bdc-security-1.png
    ├── bdc.png
    ├── bdcportal.png
    ├── bdcsolution1.png
    ├── bdcsolution2.png
    ├── bdcsolution3.png
    ├── bdcsolution4.png
    ├── bookpencil.png
    ├── building1.png
    ├── bulletlist.png
    ├── checkbox.png
    ├── checkmark.png
    ├── clipboardcheck.png
    ├── cloud1.png
    ├── datamart.png
    ├── datamart1.png
    ├── datavirtualization.png
    ├── datavirtualization1.png
    ├── education1.png
    ├── factory.png
    ├── geopin.png
    ├── grafana.png
    ├── hdfs.png
    ├── kibana.png
    ├── kubectl.png
    ├── kubernetes1.png
    ├── listcheck.png
    ├── microsoftlogo.png
    ├── owl.png
    ├── paperclip1.png
    ├── pencil2.png
    ├── pinmap.png
    ├── point1.png
    ├── solutiondiagram.png
    ├── spark1.png
    ├── spark2.png
    ├── spark3.png
    ├── spark4.png
    ├── sqlbdc.png
    └── textbubble.png
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/.DS_Store
--------------------------------------------------------------------------------
/LICENSES/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Microsoft Open Source Code of Conduct
2 |
3 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
4 |
5 | Resources:
6 |
7 | - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
8 | - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
9 | - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
10 |
--------------------------------------------------------------------------------
/LICENSES/LICENSE:
--------------------------------------------------------------------------------
1 | Attribution 4.0 International
2 |
3 | =======================================================================
4 |
5 | Creative Commons Corporation ("Creative Commons") is not a law firm and
6 | does not provide legal services or legal advice. Distribution of
7 | Creative Commons public licenses does not create a lawyer-client or
8 | other relationship. Creative Commons makes its licenses and related
9 | information available on an "as-is" basis. Creative Commons gives no
10 | warranties regarding its licenses, any material licensed under their
11 | terms and conditions, or any related information. Creative Commons
12 | disclaims all liability for damages resulting from their use to the
13 | fullest extent possible.
14 |
15 | Using Creative Commons Public Licenses
16 |
17 | Creative Commons public licenses provide a standard set of terms and
18 | conditions that creators and other rights holders may use to share
19 | original works of authorship and other material subject to copyright
20 | and certain other rights specified in the public license below. The
21 | following considerations are for informational purposes only, are not
22 | exhaustive, and do not form part of our licenses.
23 |
24 | Considerations for licensors: Our public licenses are
25 | intended for use by those authorized to give the public
26 | permission to use material in ways otherwise restricted by
27 | copyright and certain other rights. Our licenses are
28 | irrevocable. Licensors should read and understand the terms
29 | and conditions of the license they choose before applying it.
30 | Licensors should also secure all rights necessary before
31 | applying our licenses so that the public can reuse the
32 | material as expected. Licensors should clearly mark any
33 | material not subject to the license. This includes other CC-
34 | licensed material, or material used under an exception or
35 | limitation to copyright. More considerations for licensors:
36 | wiki.creativecommons.org/Considerations_for_licensors
37 |
38 | Considerations for the public: By using one of our public
39 | licenses, a licensor grants the public permission to use the
40 | licensed material under specified terms and conditions. If
41 | the licensor's permission is not necessary for any reason--for
42 | example, because of any applicable exception or limitation to
43 | copyright--then that use is not regulated by the license. Our
44 | licenses grant only permissions under copyright and certain
45 | other rights that a licensor has authority to grant. Use of
46 | the licensed material may still be restricted for other
47 | reasons, including because others have copyright or other
48 | rights in the material. A licensor may make special requests,
49 | such as asking that all changes be marked or described.
50 | Although not required by our licenses, you are encouraged to
51 | respect those requests where reasonable. More_considerations
52 | for the public:
53 | wiki.creativecommons.org/Considerations_for_licensees
54 |
55 | =======================================================================
56 |
57 | Creative Commons Attribution 4.0 International Public License
58 |
59 | By exercising the Licensed Rights (defined below), You accept and agree
60 | to be bound by the terms and conditions of this Creative Commons
61 | Attribution 4.0 International Public License ("Public License"). To the
62 | extent this Public License may be interpreted as a contract, You are
63 | granted the Licensed Rights in consideration of Your acceptance of
64 | these terms and conditions, and the Licensor grants You such rights in
65 | consideration of benefits the Licensor receives from making the
66 | Licensed Material available under these terms and conditions.
67 |
68 |
69 | Section 1 -- Definitions.
70 |
71 | a. Adapted Material means material subject to Copyright and Similar
72 | Rights that is derived from or based upon the Licensed Material
73 | and in which the Licensed Material is translated, altered,
74 | arranged, transformed, or otherwise modified in a manner requiring
75 | permission under the Copyright and Similar Rights held by the
76 | Licensor. For purposes of this Public License, where the Licensed
77 | Material is a musical work, performance, or sound recording,
78 | Adapted Material is always produced where the Licensed Material is
79 | synched in timed relation with a moving image.
80 |
81 | b. Adapter's License means the license You apply to Your Copyright
82 | and Similar Rights in Your contributions to Adapted Material in
83 | accordance with the terms and conditions of this Public License.
84 |
85 | c. Copyright and Similar Rights means copyright and/or similar rights
86 | closely related to copyright including, without limitation,
87 | performance, broadcast, sound recording, and Sui Generis Database
88 | Rights, without regard to how the rights are labeled or
89 | categorized. For purposes of this Public License, the rights
90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar
91 | Rights.
92 |
93 | d. Effective Technological Measures means those measures that, in the
94 | absence of proper authority, may not be circumvented under laws
95 | fulfilling obligations under Article 11 of the WIPO Copyright
96 | Treaty adopted on December 20, 1996, and/or similar international
97 | agreements.
98 |
99 | e. Exceptions and Limitations means fair use, fair dealing, and/or
100 | any other exception or limitation to Copyright and Similar Rights
101 | that applies to Your use of the Licensed Material.
102 |
103 | f. Licensed Material means the artistic or literary work, database,
104 | or other material to which the Licensor applied this Public
105 | License.
106 |
107 | g. Licensed Rights means the rights granted to You subject to the
108 | terms and conditions of this Public License, which are limited to
109 | all Copyright and Similar Rights that apply to Your use of the
110 | Licensed Material and that the Licensor has authority to license.
111 |
112 | h. Licensor means the individual(s) or entity(ies) granting rights
113 | under this Public License.
114 |
115 | i. Share means to provide material to the public by any means or
116 | process that requires permission under the Licensed Rights, such
117 | as reproduction, public display, public performance, distribution,
118 | dissemination, communication, or importation, and to make material
119 | available to the public including in ways that members of the
120 | public may access the material from a place and at a time
121 | individually chosen by them.
122 |
123 | j. Sui Generis Database Rights means rights other than copyright
124 | resulting from Directive 96/9/EC of the European Parliament and of
125 | the Council of 11 March 1996 on the legal protection of databases,
126 | as amended and/or succeeded, as well as other essentially
127 | equivalent rights anywhere in the world.
128 |
129 | k. You means the individual or entity exercising the Licensed Rights
130 | under this Public License. Your has a corresponding meaning.
131 |
132 |
133 | Section 2 -- Scope.
134 |
135 | a. License grant.
136 |
137 | 1. Subject to the terms and conditions of this Public License,
138 | the Licensor hereby grants You a worldwide, royalty-free,
139 | non-sublicensable, non-exclusive, irrevocable license to
140 | exercise the Licensed Rights in the Licensed Material to:
141 |
142 | a. reproduce and Share the Licensed Material, in whole or
143 | in part; and
144 |
145 | b. produce, reproduce, and Share Adapted Material.
146 |
147 | 2. Exceptions and Limitations. For the avoidance of doubt, where
148 | Exceptions and Limitations apply to Your use, this Public
149 | License does not apply, and You do not need to comply with
150 | its terms and conditions.
151 |
152 | 3. Term. The term of this Public License is specified in Section
153 | 6(a).
154 |
155 | 4. Media and formats; technical modifications allowed. The
156 | Licensor authorizes You to exercise the Licensed Rights in
157 | all media and formats whether now known or hereafter created,
158 | and to make technical modifications necessary to do so. The
159 | Licensor waives and/or agrees not to assert any right or
160 | authority to forbid You from making technical modifications
161 | necessary to exercise the Licensed Rights, including
162 | technical modifications necessary to circumvent Effective
163 | Technological Measures. For purposes of this Public License,
164 | simply making modifications authorized by this Section 2(a)
165 | (4) never produces Adapted Material.
166 |
167 | 5. Downstream recipients.
168 |
169 | a. Offer from the Licensor -- Licensed Material. Every
170 | recipient of the Licensed Material automatically
171 | receives an offer from the Licensor to exercise the
172 | Licensed Rights under the terms and conditions of this
173 | Public License.
174 |
175 | b. No downstream restrictions. You may not offer or impose
176 | any additional or different terms or conditions on, or
177 | apply any Effective Technological Measures to, the
178 | Licensed Material if doing so restricts exercise of the
179 | Licensed Rights by any recipient of the Licensed
180 | Material.
181 |
182 | 6. No endorsement. Nothing in this Public License constitutes or
183 | may be construed as permission to assert or imply that You
184 | are, or that Your use of the Licensed Material is, connected
185 | with, or sponsored, endorsed, or granted official status by,
186 | the Licensor or others designated to receive attribution as
187 | provided in Section 3(a)(1)(A)(i).
188 |
189 | b. Other rights.
190 |
191 | 1. Moral rights, such as the right of integrity, are not
192 | licensed under this Public License, nor are publicity,
193 | privacy, and/or other similar personality rights; however, to
194 | the extent possible, the Licensor waives and/or agrees not to
195 | assert any such rights held by the Licensor to the limited
196 | extent necessary to allow You to exercise the Licensed
197 | Rights, but not otherwise.
198 |
199 | 2. Patent and trademark rights are not licensed under this
200 | Public License.
201 |
202 | 3. To the extent possible, the Licensor waives any right to
203 | collect royalties from You for the exercise of the Licensed
204 | Rights, whether directly or through a collecting society
205 | under any voluntary or waivable statutory or compulsory
206 | licensing scheme. In all other cases the Licensor expressly
207 | reserves any right to collect such royalties.
208 |
209 |
210 | Section 3 -- License Conditions.
211 |
212 | Your exercise of the Licensed Rights is expressly made subject to the
213 | following conditions.
214 |
215 | a. Attribution.
216 |
217 | 1. If You Share the Licensed Material (including in modified
218 | form), You must:
219 |
220 | a. retain the following if it is supplied by the Licensor
221 | with the Licensed Material:
222 |
223 | i. identification of the creator(s) of the Licensed
224 | Material and any others designated to receive
225 | attribution, in any reasonable manner requested by
226 | the Licensor (including by pseudonym if
227 | designated);
228 |
229 | ii. a copyright notice;
230 |
231 | iii. a notice that refers to this Public License;
232 |
233 | iv. a notice that refers to the disclaimer of
234 | warranties;
235 |
236 | v. a URI or hyperlink to the Licensed Material to the
237 | extent reasonably practicable;
238 |
239 | b. indicate if You modified the Licensed Material and
240 | retain an indication of any previous modifications; and
241 |
242 | c. indicate the Licensed Material is licensed under this
243 | Public License, and include the text of, or the URI or
244 | hyperlink to, this Public License.
245 |
246 | 2. You may satisfy the conditions in Section 3(a)(1) in any
247 | reasonable manner based on the medium, means, and context in
248 | which You Share the Licensed Material. For example, it may be
249 | reasonable to satisfy the conditions by providing a URI or
250 | hyperlink to a resource that includes the required
251 | information.
252 |
253 | 3. If requested by the Licensor, You must remove any of the
254 | information required by Section 3(a)(1)(A) to the extent
255 | reasonably practicable.
256 |
257 | 4. If You Share Adapted Material You produce, the Adapter's
258 | License You apply must not prevent recipients of the Adapted
259 | Material from complying with this Public License.
260 |
261 |
262 | Section 4 -- Sui Generis Database Rights.
263 |
264 | Where the Licensed Rights include Sui Generis Database Rights that
265 | apply to Your use of the Licensed Material:
266 |
267 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right
268 | to extract, reuse, reproduce, and Share all or a substantial
269 | portion of the contents of the database;
270 |
271 | b. if You include all or a substantial portion of the database
272 | contents in a database in which You have Sui Generis Database
273 | Rights, then the database in which You have Sui Generis Database
274 | Rights (but not its individual contents) is Adapted Material; and
275 |
276 | c. You must comply with the conditions in Section 3(a) if You Share
277 | all or a substantial portion of the contents of the database.
278 |
279 | For the avoidance of doubt, this Section 4 supplements and does not
280 | replace Your obligations under this Public License where the Licensed
281 | Rights include other Copyright and Similar Rights.
282 |
283 |
284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability.
285 |
286 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
296 |
297 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
306 |
307 | c. The disclaimer of warranties and limitation of liability provided
308 | above shall be interpreted in a manner that, to the extent
309 | possible, most closely approximates an absolute disclaimer and
310 | waiver of all liability.
311 |
312 |
313 | Section 6 -- Term and Termination.
314 |
315 | a. This Public License applies for the term of the Copyright and
316 | Similar Rights licensed here. However, if You fail to comply with
317 | this Public License, then Your rights under this Public License
318 | terminate automatically.
319 |
320 | b. Where Your right to use the Licensed Material has terminated under
321 | Section 6(a), it reinstates:
322 |
323 | 1. automatically as of the date the violation is cured, provided
324 | it is cured within 30 days of Your discovery of the
325 | violation; or
326 |
327 | 2. upon express reinstatement by the Licensor.
328 |
329 | For the avoidance of doubt, this Section 6(b) does not affect any
330 | right the Licensor may have to seek remedies for Your violations
331 | of this Public License.
332 |
333 | c. For the avoidance of doubt, the Licensor may also offer the
334 | Licensed Material under separate terms or conditions or stop
335 | distributing the Licensed Material at any time; however, doing so
336 | will not terminate this Public License.
337 |
338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
339 | License.
340 |
341 |
342 | Section 7 -- Other Terms and Conditions.
343 |
344 | a. The Licensor shall not be bound by any additional or different
345 | terms or conditions communicated by You unless expressly agreed.
346 |
347 | b. Any arrangements, understandings, or agreements regarding the
348 | Licensed Material not stated herein are separate from and
349 | independent of the terms and conditions of this Public License.
350 |
351 |
352 | Section 8 -- Interpretation.
353 |
354 | a. For the avoidance of doubt, this Public License does not, and
355 | shall not be interpreted to, reduce, limit, restrict, or impose
356 | conditions on any use of the Licensed Material that could lawfully
357 | be made without permission under this Public License.
358 |
359 | b. To the extent possible, if any provision of this Public License is
360 | deemed unenforceable, it shall be automatically reformed to the
361 | minimum extent necessary to make it enforceable. If the provision
362 | cannot be reformed, it shall be severed from this Public License
363 | without affecting the enforceability of the remaining terms and
364 | conditions.
365 |
366 | c. No term or condition of this Public License will be waived and no
367 | failure to comply consented to unless expressly agreed to by the
368 | Licensor.
369 |
370 | d. Nothing in this Public License constitutes or may be interpreted
371 | as a limitation upon, or waiver of, any privileges and immunities
372 | that apply to the Licensor or You, including from the legal
373 | processes of any jurisdiction or authority.
374 |
375 |
376 | =======================================================================
377 |
378 | Creative Commons is not a party to its public
379 | licenses. Notwithstanding, Creative Commons may elect to apply one of
380 | its public licenses to material it publishes and in those instances
381 | will be considered the “Licensor.” The text of the Creative Commons
382 | public licenses is dedicated to the public domain under the CC0 Public
383 | Domain Dedication. Except for the limited purpose of indicating that
384 | material is shared under a Creative Commons public license or as
385 | otherwise permitted by the Creative Commons policies published at
386 | creativecommons.org/policies, Creative Commons does not authorize the
387 | use of the trademark "Creative Commons" or any other trademark or logo
388 | of Creative Commons without its prior written consent including,
389 | without limitation, in connection with any unauthorized modifications
390 | to any of its public licenses or any other arrangements,
391 | understandings, or agreements concerning use of licensed material. For
392 | the avoidance of doubt, this paragraph does not form part of the
393 | public licenses.
394 |
395 | Creative Commons may be contacted at creativecommons.org.
--------------------------------------------------------------------------------
/LICENSES/LICENSE-CODE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) Microsoft Corporation.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE
22 |
--------------------------------------------------------------------------------
/LICENSES/README.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Workshop: Kubernetes - From Bare Metal to SQL Server Big Data Clusters
4 |
5 | #### A Microsoft Course from the SQL Server team
6 |
7 |
8 |
9 | ## About this Workshop
10 |
11 | Welcome to this Microsoft solutions workshop on *Kubernetes - From Bare Metal to SQL Server Big Data Clusters*. In this workshop, you'll learn about setting up a production-grade SQL Server 2019 big data cluster environment on Kubernetes. Topics covered include: hardware, virtualization, and Kubernetes, with a full deployment of SQL Server's Big Data Cluster on the environment that you will use in the class. You'll then walk through a set of [Jupyter Notebooks](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html) in Microsoft's [Azure Data Studio](https://docs.microsoft.com/en-us/sql/azure-data-studio/what-is?view=sql-server-ver15) tool to run T-SQL, Spark, and Machine Learning workloads on the cluster. You'll also receive valuable resources to learn more and go deeper on Linux, Containers, Kubernetes and SQL Server big data clusters.
12 |
13 | The focus of this workshop is to understand the hardware, software, and environment you need to work with [SQL Server 2019's big data clusters](https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-overview?view=sql-server-ver15) on a Kubernetes platform.
14 |
15 | You'll start by understanding Containers and Kubernetes, moving on to a discussion of the hardware and software environment for Kubernetes, and then to more in-depth Kubernetes concepts. You'll follow-on with the SQL Server 2019 big data clusters architecture, and then how to use the entire system in a practical application, all with a focus on how to extrapolate what you have learned to create other solutions for your organization.
16 |
17 | > NOTE: This course is designed to be taught in-person with hardware or virtual environments provided by the instructional team. You will also get details for setting up your own hardware, virtual or Cloud environments for Kubernetes for a workshop backup or if you are not attending in-person.
18 |
19 | This [github README.MD file](https://lab.github.com/githubtraining/introduction-to-github) explains how the workshop is laid out, what you will learn, and the technologies you will use in this solution.
20 |
21 | (You can view all of the [source files for this workshop on this github site, along with other workshops as well. Open this link in a new tab to find out more.](https://github.com/microsoft/sqlworkshops-k8stobdc))
22 |
23 |
24 |
25 | ## Learning Objectives
26 |
27 | In this workshop you'll learn:
28 |
29 |
30 | - How Containers and Kubernetes work and when and where you can use them
31 | - Hardware considerations for setting up a production Kubernetes Cluster on-premises
32 | - Considerations for Virtual and Cloud-based environments for production Kubernetes Cluster
33 |
34 | The concepts and skills taught in this workshop form the starting points for:
35 |
36 | * Solution Architects, to understand how to design an end-to-end solution.
37 | * System Administrators, Database Administrators, or Data Engineers, to understand how to put together an end-to-end solution.
38 |
39 |
40 |
41 | ## Business Applications of this Workshop
42 |
43 | Businesses require stable, secure environments at scale, which work in secure on-premises and in-cloud configurations. Using Kubernetes and Containers allows for manifest-driven DevOps practices, which further streamline IT processes.
44 |
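> To make "manifest-driven" a little more concrete before class, here is a brief sketch (not part of the official workshop materials). It assumes `kubectl` is installed and pointed at a working cluster, and the names used (`demo-app`, the public `nginx` image) are placeholders only.

```powershell
# A Kubernetes Deployment described declaratively in a YAML manifest,
# written to a file and applied with kubectl. Placeholder names throughout.
@"
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: nginx:1.25
"@ | Set-Content -Path .\demo-app.yaml

kubectl apply -f .\demo-app.yaml   # create or update the Deployment from the manifest
kubectl get deployments            # confirm desired and current replica counts
```

Because the desired state lives in a file, the same manifest can be versioned, reviewed, and re-applied, which is what makes the DevOps workflow "manifest-driven".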
45 |
46 |
47 | ## Technologies used in this Workshop
48 |
49 | The solution includes the following technologies - although you are not limited to these, they form the basis of the workshop. At the end of the workshop you will learn how to extrapolate these components into other solutions. You will cover these at an overview level, with references to much deeper training provided.
50 |
51 |
52 |
53 | Technology | Description |
54 | --- | --- |
55 | Linux | The primary operating system used in and by Containers and Kubernetes |
56 | Containers | The atomic layer of a Kubernetes Cluster |
57 | Kubernetes | The primary clustering technology for manifest-driven environments |
58 | SQL Server Big Data Clusters | Relational and non-relational data at scale with Spark, HDFS and application deployment capabilities |
59 |
60 |
61 |
62 |
63 |
64 | ## Before Taking this Workshop
65 |
66 | There are a few requirements for attending the workshop, listed below:
67 | - You'll need a local system that you are able to install software on. The workshop demonstrations use Microsoft Windows as an operating system and all examples use Windows for the workshop. Optionally, you can use a Microsoft Azure Virtual Machine (VM) to install the software on and work with the solution.
68 | - You must have a Microsoft Azure account with the ability to create assets for the "backup" or self-taught path.
69 | - This workshop expects that you understand computer technologies, networking, the basics of SQL Server, HDFS, Spark, and general use of Hypervisors.
70 | - The **Setup** section below explains the steps you should take prior to coming to the workshop
71 |
72 | If you are new to any of these, here are a few references you can complete prior to class:
73 |
74 | - [Microsoft SQL Server Administration and Use](https://www.microsoft.com/en-us/learning/course.aspx?cid=OD20764)
75 | - [HDFS](https://data-flair.training/blogs/hadoop-hdfs-tutorial/)
76 | - [Spark](https://www.edx.org/course/implementing-predictive-analytics-with-spark-in-az)
77 | - [Hypervisor Technologies - Hyper-V](https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/hyper-v-technology-overview)
78 | or
79 | - [Hypervisor Technologies - VMWare](https://tsmith.co/free-vmware-training/)
80 |
81 | ## Setup
82 |
83 | A full pre-requisites document is located here. These instructions should be completed before the workshop starts, since you will not have time to cover these in class. Remember to turn off any Virtual Machines from the Azure Portal when not taking the class so that you do not incur charges (shutting down the machine in the VM itself is not sufficient).
84 |
85 |
86 |
87 | ## Workshop Details
88 |
89 | This workshop uses Kubernetes to deploy a workload, with a focus on Microsoft SQL Server's big data clusters deployment for advanced analytics over large sets of data and Data Science workloads.
90 |
91 |
92 |
93 | Primary Audience: | Technical professionals tasked with configuring, deploying and managing large-scale clustering systems |
94 | Secondary Audience: | Data professionals tasked with working with data at scale |
95 | Level: | 300 |
96 | Type: | In-Person (self-guided possible) |
97 | Length: | 8 hours |
98 |
99 |
100 |
101 |
102 |
103 | ## Related Workshops
104 |
105 | - [50 days from zero to hero with Kubernetes](https://azure.microsoft.com/mediahandler/files/resourcefiles/kubernetes-learning-path/Kubernetes%20Learning%20Path%20version%201.0.pdf)
106 |
107 |
108 |
109 | ## Workshop Modules
110 |
111 | This is a modular workshop, and in each section, you'll learn concepts, technologies and processes to help you complete the solution.
112 |
113 |
114 |
115 | Module | Topics |
116 | --- | --- |
117 | 01 - An introduction to Linux, Containers and Kubernetes | This module covers Container technologies and how they are different than Virtual Machines. You'll learn about the need for container orchestration using Kubernetes. |
118 | 02 - Hardware and Virtualization environment for Kubernetes | This module explains how to make a production-grade environment using "bare metal" computer hardware or with a virtualized platform, and most importantly the storage hardware aspects. |
119 | 03 - Kubernetes Concepts and Implementation | Covers deploying Kubernetes, Kubernetes contexts, cluster troubleshooting and management, services: load balancing versus node ports, understanding storage from a Kubernetes perspective and making your cluster secure. |
120 | 04 - SQL Server Big Data Clusters Architecture | This module will dig deep into the anatomy of a big data cluster by covering topics that include: the data pool, storage pool, compute pool and cluster control plane, active directory integration, development versus production configurations and the tools required for deploying and managing a big data cluster. |
05 - Using the SQL Server big data cluster on Kubernetes for Data Science | Now that your big data cluster is up, it's ready for data science workloads. This Jupyter Notebook and Azure Data Studio based module will cover the use of python and PySpark, T-SQL and the execution of Spark and Machine Learning workloads. |
121 |
122 |
123 |
124 |
125 |
126 | ## Next Steps
127 |
128 |
129 | Next, Continue to Pre-Requisites
130 |
131 | **Workshop Authors and Contributors**
132 |
133 | - [The Microsoft SQL Server Team](http://microsoft.com/sql)
134 | - [Chris Adkin](https://www.linkedin.com/in/wollatondba/), Pure Storage
135 |
136 | **Legal Notice**
137 |
138 | *Kubernetes and the Kubernetes logo are trademarks or registered trademarks of The Linux Foundation in the United States and/or other countries. The Linux Foundation and other parties may also have trademark rights in other terms used herein. This Workshop is not certified, accredited, affiliated with, nor endorsed by Kubernetes or The Linux Foundation.*
--------------------------------------------------------------------------------
/LICENSES/SECURITY.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | ## Security
4 |
5 | Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).
6 |
7 | If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below.
8 |
9 | ## Reporting Security Issues
10 |
11 | **Please do not report security vulnerabilities through public GitHub issues.**
12 |
13 | Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report).
14 |
15 | If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).
16 |
17 | You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc).
18 |
19 | Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
20 |
21 | * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
22 | * Full paths of source file(s) related to the manifestation of the issue
23 | * The location of the affected source code (tag/branch/commit or direct URL)
24 | * Any special configuration required to reproduce the issue
25 | * Step-by-step instructions to reproduce the issue
26 | * Proof-of-concept or exploit code (if possible)
27 | * Impact of the issue, including how an attacker might exploit the issue
28 |
29 | This information will help us triage your report more quickly.
30 |
31 | If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs.
32 |
33 | ## Preferred Languages
34 |
35 | We prefer all communications to be in English.
36 |
37 | ## Policy
38 |
39 | Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd).
40 |
41 |
42 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Workshop: SQL Server Big Data Clusters - Architecture
4 |
5 | #### A Microsoft Course from the SQL Server team
6 |
7 |
8 |
9 |
10 |
11 | - About this Workshop
12 | - Business Applications of this Workshop
13 | - Technologies used in this Workshop
14 | - Before Taking this Workshop
15 | - Workshop Details
16 | - Related Workshops
17 | - Workshop Modules
18 | - Next Steps
19 |
20 |
21 |
22 |
23 |
24 | Welcome to this Microsoft solutions workshop on the architecture of *SQL Server Big Data Clusters*. In this workshop, you'll learn how SQL Server Big Data Clusters (BDC) implements large-scale data processing and machine learning, how to select and plan the proper architecture, how to train and operationalize models using Python, R, Java, or SparkML, and how to deploy your intelligent apps side-by-side with their data.
25 |
26 | The focus of this workshop is to understand how to deploy an on-premises or local environment of a big data cluster, and understand the components of the big data solution architecture.
27 |
28 | You'll start by understanding the concepts of big data analytics, and you'll get an overview of the technologies (such as containers, container orchestration, Spark and HDFS, machine learning, and other technologies) that you will use throughout the workshop. Next, you'll understand the architecture of a BDC. You'll learn how to create external tables over other data sources to unify your data, and how to use Spark to run big queries over your data in HDFS or do data preparation. You'll review a complete solution for an end-to-end scenario, with a focus on how to extrapolate what you have learned to create other solutions for your organization.
29 |
30 | This [github README.MD file](https://lab.github.com/githubtraining/introduction-to-github) explains how the workshop is laid out, what you will learn, and the technologies you will use in this solution. To download this Lab to your local computer, click the **Clone or Download** button you see at the top right side of this page. [More about that process is here](https://help.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository).
31 |
32 | You can view all of the [courses and other workshops our team has created at this link - open in a new tab to find out more.](https://microsoft.github.io/sqlworkshops/)
33 |
34 |
35 |
36 | ## Learning Objectives
37 |
38 | In this workshop you'll learn:
39 |
40 |
41 | - When to use Big Data technology
42 | - The components and technologies of Big Data processing
43 | - Abstractions such as Containers and Container Management as they relate to SQL Server and Big Data
44 | - Planning and architecting an on-premises, in-cloud, or hybrid big data solution with SQL Server
45 | - How to install SQL Server big data clusters on-premises and in the Azure Kubernetes Service (AKS)
46 | - How to work with Apache Spark
47 | - The Data Science Process to create an end-to-end solution
48 | - How to work with the tooling for BDC (Azure Data Studio)
49 | - Monitoring and managing the BDC
50 | - Security considerations
51 |
52 | Starting in SQL Server 2019, big data clusters allow for large-scale, near real-time processing of data over the HDFS file system and other data sources. They also leverage the Apache Spark framework, which is integrated into one environment for management, monitoring, and security. This means that organizations can implement everything from queries to analysis to Machine Learning and Artificial Intelligence within SQL Server, over large-scale, heterogeneous data. SQL Server big data clusters can be implemented fully on-premises, in the cloud using a Kubernetes service such as Azure's AKS, or in a hybrid fashion. This allows for full, partial, and mixed security and control as desired.
53 |
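> As a preview of the deployment modules, the sketch below shows roughly what an AKS-based installation can look like with the `az` and `azdata` tools covered later in the workshop. It is illustrative only: the resource names, region, node size, and node count are placeholders, and the exact `azdata` flags and configuration profiles depend on your azdata release, so follow the official deployment documentation when you deploy for real.

```powershell
# Illustrative sketch only - placeholder names and sizes; check the SQL Server
# Big Data Clusters deployment documentation for supported configurations.
az login
az group create --name bdc-demo-rg --location eastus
az aks create --resource-group bdc-demo-rg --name bdc-demo-aks `
              --node-count 3 --node-vm-size Standard_D8s_v3 --generate-ssh-keys
az aks get-credentials --resource-group bdc-demo-rg --name bdc-demo-aks

# azdata reads the controller and SQL Server credentials from environment variables.
$env:AZDATA_USERNAME = "admin"
$env:AZDATA_PASSWORD = "<a-strong-password>"

# Deploy using a built-in AKS development/test profile. Flag and profile names can
# vary by azdata release - verify with 'azdata bdc create --help'.
azdata bdc create --config-profile aks-dev-test --accept-eula yes

# Watch the pods come up in the cluster namespace (default name: mssql-cluster).
kubectl get pods -n mssql-cluster --watch
```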
54 | The goal of this workshop is to train the team tasked with architecting and implementing SQL Server big data clusters in the planning, creation, and delivery of a system designed to be used for large-scale data analytics. Since there are multiple technologies and concepts within this solution, the workshop uses multiple types of exercises to prepare the students for this implementation.
55 |
56 | The concepts and skills taught in this workshop form the starting points for:
57 |
58 | * Data Professionals and DevOps teams, to implement and operate a SQL Server big data cluster system.
59 | * Solution Architects and Developers, to understand how to put together an end-to-end solution.
60 | * Data Scientists, to understand the environment used to analyze and solve specific predictive problems.
61 |
62 |
63 |
64 |
65 | Businesses require near real-time insights from ever-larger sets of data from a variety of sources. Large-scale data ingestion requires scale-out storage and processing in ways that allow fast response times. In addition to simply querying this data, organizations want full analysis and even predictive capabilities over their data.
66 |
67 | Some industry examples of big data processing are in Retail (*Demand Prediction, Market-Basket Analysis*), Finance (*Fraud detection, customer segmentation*), Healthcare (*Fiscal control analytics, Disease Prevention prediction and classification, Clinical Trials optimization*), Public Sector (*Revenue prediction, Education effectiveness analysis*), Manufacturing (*Predictive Maintenance, Anomaly Detection*) and Agriculture (*Food Safety analysis, Crop forecasting*) to name just a few.
68 |
69 |
70 |
71 |
72 |
73 | The solution includes the following technologies - although you are not limited to these, they form the basis of the workshop. At the end of the workshop you will learn how to extrapolate these components into other solutions. You will cover these at an overview level, with references to much deeper training provided.
74 |
75 |
76 |
77 | Technology | Description |
78 | --- | --- |
79 | Linux | Operating system used in Containers and Container Orchestration |
80 | Containers | Encapsulation level for the SQL Server big data cluster architecture |
81 | Container Orchestration (such as Kubernetes) | Management and control plane for Containers |
82 | Microsoft Azure | Cloud environment for services |
83 | Azure Kubernetes Service (AKS) | Kubernetes as a Service |
84 | Apache HDFS | Scale-out storage subsystem |
85 | Apache Knox | The Knox Gateway provides a single access point for all REST interactions, used for security |
86 | Apache Livy | Job submission system for Apache Spark |
87 | Apache Spark | In-memory large-scale, scale-out data processing architecture used by SQL Server |
88 | Python, R, Java, SparkML | ML/AI programming languages used for Machine Learning and AI Model creation |
89 | Azure Data Studio | Tooling for SQL Server, HDFS, Big Data cluster management, T-SQL, R, Python, and SparkML languages |
90 | SQL Server Machine Learning Services | R, Python and Java extensions for SQL Server |
91 | Microsoft Data Science Process (TDSP) | Project, Development, Control and Management framework |
92 | Monitoring and Management | Dashboards, logs, API's and other constructs to manage and monitor the solution |
93 | Security | RBAC, Keys, Secrets, VNETs and Compliance for the solution |
94 |
95 |
96 |
97 |
98 |
99 | Condensed Lab:
100 | If you have already completed the pre-requisites for this course and are familiar with the technologies listed above, you can jump to a Jupyter Notebooks-based tutorial located here. Load these with Azure Data Studio, starting with bdc_tutorial_00.ipynb.
101 |
102 |
103 |
104 |
105 |
106 |
107 | You'll need a local system that you are able to install software on. The workshop demonstrations use Microsoft Windows as an operating system and all examples use Windows for the workshop. Optionally, you can use a Microsoft Azure Virtual Machine (VM) to install the software on and work with the solution.
108 |
109 | You must have a Microsoft Azure account with the ability to create assets, specifically the Azure Kubernetes Service (AKS).
110 |
111 | This workshop expects that you understand data structures and working with SQL Server and computer networks. This workshop does not expect you to have any prior data science knowledge, but a basic knowledge of statistics and data science is helpful in the Data Science sections. Knowledge of SQL Server, Azure Data and AI services, Python, and Jupyter Notebooks is recommended. AI techniques are implemented in Python packages. Solution templates are implemented using Azure services, development tools, and SDKs. You should have a basic understanding of working with the Microsoft Azure Platform.
112 |
113 | If you are new to these, here are a few references you can complete prior to class:
114 |
115 | - [Microsoft SQL Server](https://docs.microsoft.com/en-us/sql/relational-databases/database-engine-tutorials?view=sql-server-ver15)
116 | - [Microsoft Azure](https://docs.microsoft.com/en-us/learn/paths/azure-fundamentals/)
117 |
118 |
119 | ## Setup
120 |
121 | A full prerequisites document is located here. These instructions should be completed before the workshop starts, since you will not have time to cover these in class. Remember to turn off any Virtual Machines from the Azure Portal when not taking the class so that you do not incur charges (shutting down the machine in the VM itself is not sufficient).
122 |
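> If you script your environment rather than using the Azure Portal, deallocating the VM from the Azure CLI has the same effect. The resource group and VM names below are placeholders.

```powershell
# Shutting down inside Windows leaves the VM allocated (and billed).
# Deallocating from the CLI (or the Portal) releases the compute so charges stop.
az vm deallocate --resource-group workshop-rg --name workshop-vm

# Start it again when you return to the workshop:
az vm start --resource-group workshop-rg --name workshop-vm
```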
123 |
124 |
125 |
126 |
127 | This workshop uses Azure Data Studio, Microsoft Azure AKS, and SQL Server (2019 and higher) with a focus on architecture and implementation.
128 |
129 |
130 |
131 | Primary Audience: | System Architects and Data Professionals tasked with implementing Big Data, Machine Learning and AI solutions |
132 | Secondary Audience: | Security Architects, Developers, and Data Scientists |
133 | Level: | 300 |
134 | Type: | In-Person |
135 | Length: | 8-9 hours |
136 |
137 |
138 |
139 |
140 |
141 |
142 |
143 | - [Technical guide to the Cortana Intelligence Solution Template for predictive maintenance in aerospace and other businesses](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/cortana-analytics-technical-guide-predictive-maintenance)
144 |
145 |
146 |
147 |
148 |
149 | This is a modular workshop, and in each section, you'll learn concepts, technologies and processes to help you complete the solution.
150 |
151 |
163 |
164 |
165 |
166 | ## Next Steps
167 |
168 | Next, Continue to prerequisites
169 |
170 |
171 | # Contributing
172 |
173 | This project welcomes contributions and suggestions. Most contributions require you to agree to a
174 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
175 | the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
176 |
177 | When you submit a pull request, a CLA bot will automatically determine whether you need to provide
178 | a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
179 | provided by the bot. You will only need to do this once across all repos using our CLA.
180 |
181 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
182 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
183 | contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
184 |
185 | # Legal Notices
186 |
187 | ### License
188 | Microsoft and any contributors grant you a license to the Microsoft documentation and other content in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode), see [the LICENSE file](https://github.com/MicrosoftDocs/mslearn-tailspin-spacegame-web/blob/master/LICENSE), and grant you a license to any code in the repository under [the MIT License](https://opensource.org/licenses/MIT), see the [LICENSE-CODE file](https://github.com/MicrosoftDocs/mslearn-tailspin-spacegame-web/blob/master/LICENSE-CODE).
189 |
190 | Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation
191 | may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries.
192 | The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks.
193 | Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.
194 |
195 | Privacy information can be found at https://privacy.microsoft.com/en-us/
196 |
197 | Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
198 | or trademarks, whether by implication, estoppel or otherwise.
199 |
200 |
--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | ## Security
4 |
5 | Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).
6 |
7 | If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/opensource/security/definition), please report it to us as described below.
8 |
9 | ## Reporting Security Issues
10 |
11 | **Please do not report security vulnerabilities through public GitHub issues.**
12 |
13 | Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/opensource/security/create-report).
14 |
15 | If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey).
16 |
17 | You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc).
18 |
19 | Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
20 |
21 | * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
22 | * Full paths of source file(s) related to the manifestation of the issue
23 | * The location of the affected source code (tag/branch/commit or direct URL)
24 | * Any special configuration required to reproduce the issue
25 | * Step-by-step instructions to reproduce the issue
26 | * Proof-of-concept or exploit code (if possible)
27 | * Impact of the issue, including how an attacker might exploit the issue
28 |
29 | This information will help us triage your report more quickly.
30 |
31 | If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/opensource/security/bounty) page for more details about our active programs.
32 |
33 | ## Preferred Languages
34 |
35 | We prefer all communications to be in English.
36 |
37 | ## Policy
38 |
39 | Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/opensource/security/cvd).
40 |
41 |
42 |
--------------------------------------------------------------------------------
/SQL2019BDC/00 - Prerequisites.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Workshop: SQL Server Big Data Clusters - Architecture
4 |
5 | #### A Microsoft Course from the SQL Server team
6 |
7 |
8 |
9 | ## 00 prerequisites
10 |
11 | This workshop is taught using the following components, which you will install and configure in the sections that follow.
12 |
13 | *(Note: Due to the nature of working with large-scale systems, it may not be possible for you to set up everything you need to perform each lab exercise. Participation in each Activity is optional - we will be working through the exercises together, but if you cannot install any software or don't have an Azure account, the instructor will work through each exercise in the workshop. You will also have full access to these materials so that you can work through them later when you have more time and resources.)*
14 |
15 | For this workshop, you will use Microsoft Windows as the base workstation, although Apple and Linux operating systems can be used in production. You can download a Windows 10 Workstation `.ISO` to create a Virtual Machine on the Hypervisor of your choice for free here.
16 |
17 | The other requirements are:
18 |
19 | - **Microsoft Azure**: This workshop uses the Microsoft Azure platform to host the Kubernetes cluster (using the Azure Kubernetes Service), and optionally you can deploy a system there to act as a workstation. You can use an MSDN Account, your own account, or potentially one provided for you, as long as you can create about $100.00 (U.S.) worth of assets.
20 | - **Azure Command Line Interface**: The Azure CLI allows you to work from the command line on multiple platforms to interact with your Azure subscription, and also has control statements for AKS.
21 | - **Python (3)**: Python version 3.5 (and higher) is used by the SQL Server programs to deploy and manage a Big Data Cluster for SQL Server (BDC).
22 | - **The pip3 Package**: The Python package manager *pip3* is used to install various BDC deployment and configuration tools.
23 | - **The kubectl program**: The *kubectl* program is the command-line control feature for Kubernetes.
24 | - **The azdata utility**: The *azdata* program is the deployment and configuration tool for BDC.
25 | - **Azure Data Studio**: The *Azure Data Studio* IDE, along with various Extensions, is used for deploying the system, and querying and management of the BDC. In addition, you will use this tool to participate in the workshop. Note: You can connect to a SQL Server 2019 Big Data Cluster using any SQL Server connection tool or application, such as SQL Server Management Studio, but this course will use Microsoft Azure Data Studio for cluster management, Jupyter Notebooks and other capabilities.
26 |
27 | *Note that all following activities must be completed prior to class - there will not be time to perform these operations during the workshop.*
28 |
29 | ## Activity 1: Set up a Microsoft Azure Account
30 |
31 | You have multiple options for setting up a Microsoft Azure account to complete this workshop. You can use a Microsoft Developer Network (MSDN) account, a personal or corporate account, or in some cases a pass may be provided by the instructor. (Note: for most classes, the MSDN account is best)
32 |
33 | **If you are attending this course in-person:**
34 | Unless you are explicitly told you will be provided an account by the instructor in the invitation to this workshop, you must have your Microsoft Azure account and Data Science Virtual Machine set up before you arrive at class. There will NOT be time to configure these resources during the course.
35 |
36 | ### Option 1 - Microsoft Developer Network (MSDN) Account
37 |
38 | The best way to take this workshop is to use your [Microsoft Developer Network (MSDN) benefits if you have a subscription](https://marketplace.visualstudio.com/subscriptions).
39 |
40 | - [Open this resource and click the "Activate your monthly Azure credit" button](https://azure.microsoft.com/en-us/pricing/member-offers/credit-for-visual-studio-subscribers/)
41 |
42 | ### Option 2 - Use Your Own Account
43 |
44 | You can also use your own account or one provided to you by your organization, but you must be able to create a resource group and create, start, and manage a Virtual Machine and an Azure AKS cluster.
45 |
46 | ### Option 3 - Use an account provided by your instructor
47 |
48 | Your workshop invitation may have indicated that a Microsoft Azure account will be provided for you. If so, you will receive instructions for accessing it.
49 |
50 | **Unless you received explicit instructions in your workshop invitation, you must create either an MSDN or personal account. You must have an account prior to the workshop.**
51 |
52 |
Activity 2: Prepare Your Workstation
53 |
54 | The instructions that follow are the same for either a "bare metal" workstation or laptop, or a Virtual Machine. It's best to have at least 4GB of RAM on the management system, and these instructions assume that you are not planning to run the database server or any Containers on the workstation. It's also assumed that you are using a current version of Windows, either desktop or server.
55 |
56 |
57 | *(You can copy and paste all of the commands that follow in a PowerShell window that you run as the system Administrator)*
58 |
59 |
Updates
60 |
61 | First, ensure all of your updates are current. You can use the following commands to do that in an Administrator-level PowerShell session:
62 |
63 |
64 | write-host "Standard Install for Windows. Classroom or test system only - use at your own risk!"
65 | Set-ExecutionPolicy RemoteSigned
66 |
67 | write-host "Update Windows"
68 | Install-Module PSWindowsUpdate
69 | Import-Module PSWindowsUpdate
70 | Get-WindowsUpdate
71 | Install-WindowsUpdate
72 |
73 |
74 | *Note: If you get an error during this update process, evaluate it to see if it is fatal. You may receive certain driver errors if you are using a Virtual Machine; these can be safely ignored.*
75 |
76 |
Install Big Data Cluster Tools
77 |
78 | Next, install the tools to work with Big Data Clusters:
79 |
80 |
81 |
Activity 3: Install BDC Tools
82 |
83 | Open this resource, and follow all instructions for the Microsoft Windows operating system:
84 |
85 |
86 | **NOTE:** For the `azdata` utility step below, [use this MSI package](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-install-azdata-installer?view=sql-server-ver15) rather than the `pip` installer.
87 |
88 | - [https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-big-data-tools?view=sql-server-ver15](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-big-data-tools?view=sql-server-ver15)
89 |
90 |
Activity 4: Re-Update Your Workstation
91 |
92 | After this many installations, it's a good idea to run Windows Update again:
93 |
94 |
95 | write-host "Re-Update Windows"
96 | Get-WindowsUpdate
97 | Install-WindowsUpdate
98 |
99 |
100 | *Note 1: If you get an error during this update process, evaluate it to see if it is fatal. You may receive certain driver errors if you are using a Virtual Machine; these can be safely ignored.*
101 |
102 | **Note 2: If you are using a Virtual Machine in Azure, power off the Virtual Machine using the Azure Portal every time you are done with it. Shutting down the VM from within Windows only stops it running; you are still charged for the VM unless you stop it from the Portal. Stop the VM from the Portal whenever you are not actively using it.**
103 |
104 |
For Further Study
105 |
108 |
109 |
Next Steps
110 |
111 | Next, Continue to 01 - The Big Data Landscape.
112 |
--------------------------------------------------------------------------------
/SQL2019BDC/03 - Planning, Installation and Configuration.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Workshop: SQL Server Big Data Clusters - Architecture
4 |
5 | #### A Microsoft workshop from the SQL Server team
6 |
7 |
8 |
9 |
Planning, Installation and Configuration
10 |
11 | In this workshop you'll cover using a Process and various Platform components to create a Big Data Cluster for SQL Server (BDC) solution you can deploy on premises, in the cloud, or in a hybrid architecture. In each module you'll get more references, which you should follow up on to learn more. Also watch for links within the text - click on each one to explore that topic.
12 |
13 | (Make sure you check out the prerequisites page before you start. You'll need all of the items loaded there before you can proceed with the workshop.)
14 |
15 | You'll cover the following topics in this Module:
16 |
17 |
18 |
19 | - 3.0 Planning your Installation
20 | - 3.1 Installing on Azure Kubernetes Service
21 | - 3.2 Installing locally using KubeADM
22 | - Install Class Environment on AKS
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 | NOTE: The following Module is based on the Public Preview of the Microsoft SQL Server 2019 big data cluster feature. These instructions will change as the product is updated. The latest installation instructions are located here.
31 |
32 | A Big Data Cluster for SQL Server (BDC) is deployed onto a Cluster Orchestration system (such as Kubernetes or OpenShift) using the `azdata` utility which creates the appropriate Nodes, Pods, Containers and other constructs for the system. The installation uses various switches on the `azdata` utility, and reads from several variables contained within an internal JSON document when you run the command. Using a switch, you can change these variables. You can also dump the entire document to a file, edit it, and then call the installation that uses that file with the `azdata` command. More detail on that process is located here.
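As a sketch of that flow using PowerShell (the profile and target names here are placeholders, and the exact switches may differ between releases, so check the current documentation):

azdata bdc config init --source aks-dev-test --target custom-bdc
# Edit the bdc.json and control.json files in the custom-bdc folder as needed,
# then deploy using that configuration. Credentials are read from environment
# variables, which are covered in the Security module.
azdata bdc create --config-profile custom-bdc --accept-eula yes
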
33 |
34 | For planning, it is essential that you understand the SQL Server BDC components, and have a firm understanding of Kubernetes and TCP/IP networking. You should also have an understanding of how SQL Server and Apache Spark use the "Big Four" (*CPU, I/O, Memory and Networking*).
35 |
36 | Since the Cluster Orchestration system is often made up of Virtual Machines that host the Container Images, they should be sized as large as possible. For the best performance, large physical machines that are tuned for optimal performance are the recommended physical architecture. The minimum viable production system is three Linux physical machines or virtual machines. The recommended configuration per machine is 8 CPUs, 32 GB of memory and 100 GB of storage. This configuration supports only one or two users with a standard workload, so you would increase the system for each additional user or heavier workload.
37 |
38 |
39 | You can deploy Kubernetes in a few ways:
40 |
41 | - In a Cloud Platform such as Azure Kubernetes Service (AKS)
42 |
43 | - In your own Cluster Orchestration system deployment using the appropriate tools such as `KubeADM`
44 |
45 | Regardless of the Cluster Orchestration system target, the general steps for setting up the system are:
46 |
47 | - Set up Cluster Orchestration system with a Cluster target
48 |
49 | - Install the cluster tools on the administration machine
50 |
51 | - Deploy the BDC onto the Cluster Orchestration system
52 |
53 | In the sections that follow, you'll cover the general process for each of these deployments. The official documentation referenced above has the specific steps for each deployment, and the *Activity* section of this Module has the steps for deploying the BDC on AKS for the classroom environment.
54 |
55 |
56 |
57 |
58 |
59 | The Azure Kubernetes Service provides the ability to create a Kubernetes cluster in the Azure portal, with the Azure CLI, or template driven deployment options such as Resource Manager templates and Terraform. When you deploy an AKS cluster, the Kubernetes master and all nodes are deployed and configured for you. Additional features such as advanced networking, Azure Active Directory integration, and monitoring can also be configured during the deployment process.
60 |
61 | An AKS cluster is divided into two components: The *Cluster master nodes* which provide the core Kubernetes services and orchestration of application workloads; and the *Nodes* which run your application workloads.
62 |
63 |
64 |
65 |
66 |
67 | The cluster master includes the following core Kubernetes components:
68 |
69 | - *kube-apiserver* - The API server is how the underlying Kubernetes APIs are exposed. This component provides the interaction for management tools, such as kubectl or the Kubernetes dashboard.
70 |
71 | - *etcd* - To maintain the state of your Kubernetes cluster and configuration, the highly available etcd is a key value store within Kubernetes.
72 |
73 | - *kube-scheduler* - When you create or scale applications, the Scheduler determines what nodes can run the workload and starts them.
74 |
75 | - *kube-controller-manager* - The Controller Manager oversees a number of smaller Controllers that perform actions such as replicating pods and handling node operations.
76 |
77 | The Nodes include the following components:
78 |
79 | - The *kubelet* is the Kubernetes agent that processes the orchestration requests from the cluster master and handles scheduling and running the requested containers.
80 |
81 | - Virtual networking is handled by the *kube-proxy* on each node. The proxy routes network traffic and manages IP addressing for services and pods.
82 |
83 | - The *container runtime* is the component that allows containerized applications to run and interact with additional resources such as the virtual network and storage. In AKS, Docker is used as the container runtime.
84 |
85 |
86 |
87 |
88 |
89 | In the BDC in an AKS environment, for an optimal experience while validating basic scenarios, you should use at least three agent VMs with at least 4 vCPUs and 32 GB of memory each.
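A sketch of creating such a cluster with the Azure CLI (the resource group, cluster name, region and VM size are placeholders; the size shown is simply one example that meets the guidance above):

az group create --name bdc-rg --location eastus
az aks create --resource-group bdc-rg --name bdc-aks --node-count 3 --node-vm-size Standard_E8s_v3 --generate-ssh-keys
# Merge the cluster credentials into your local kubeconfig so kubectl and azdata can reach it
az aks get-credentials --resource-group bdc-rg --name bdc-aks
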
90 |
91 | With this background, you can find the latest specific steps to deploy a BDC on AKS here.
92 |
93 |
94 |
95 |
96 |
97 | If you choose Kubernetes as your Cluster Orchestration system, the kubeadm toolbox helps you bootstrap a Kubernetes cluster that conforms to best practices. Kubeadm also supports other cluster lifecycle functions, such as upgrades, downgrades, and managing bootstrap tokens.
98 |
99 | The kubeadm toolbox can deploy a Kubernetes cluster to physical or virtual machines. It works by specifying the TCP/IP addresses of the targets.
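A rough sketch of that bootstrap sequence, run on the Linux machines themselves (the pod network CIDR shown assumes a flannel network add-on; the join token and hash come from the output of kubeadm init):

# On the machine that will act as the Kubernetes master
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# On each worker machine, using the values printed by kubeadm init
sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
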
100 |
101 | With this background, you can find the latest specific steps to deploy a BDC using kubeadm here.
102 |
103 |
104 |
105 |
Activity: Check Class Environment on AKS
106 |
107 | In this lab you will check the BDC deployment on the Azure Kubernetes Service that you performed in Module 01.
108 |
109 | Using the following steps, you will evaluate your Resource Group in Azure that holds your BDC on AKS that you deployed earlier. When you complete your course you can delete this Resource Group which will stop the Azure charges for this course.
110 |
111 | Steps
112 |
113 |
Log in to the Azure Portal, and locate the Resource Groups deployed for the AKS cluster. How many do you find? What do you think their purposes are?
114 |
115 |
116 |
117 |
118 |
119 |
For Further Study
120 |
123 |
124 |
Next Steps
125 |
126 | Next, Continue to Operationalization.
127 |
--------------------------------------------------------------------------------
/SQL2019BDC/04 - Operationalization.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Workshop: SQL Server Big Data Clusters - Architecture
4 |
5 | #### A Microsoft Course from the SQL Server team
6 |
7 |
8 |
9 |
Operationalization
10 |
11 | In this workshop you'll cover using a Process and various Platform components to create a SQL Server Big Data Clusters (BDC) solution you can deploy on premises, in the cloud, or in a hybrid architecture. In each module you'll get more references, which you should follow up on to learn more. Also watch for links within the text - click on each one to explore that topic.
12 |
13 | (Make sure you check out the prerequisites page before you start. You'll need all of the items loaded there before you can proceed with the workshop.)
14 |
15 | You'll cover the following topics in this Module:
16 |
17 |
18 | - 4.0 End-To-End Solution for big data clusters
19 | - 4.1 Data Virtualization
20 | - 4.2 Creating a Distributed Data solution using big data clusters
21 | - 4.3 Querying HDFS Data using big data clusters
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 | Recall from The Big Data Landscape module that you learned about the Wide World Importers company. Wide World Importers (WWI) is a traditional brick and mortar business with a long track record of success, generating profits through strong retail store sales of their unique offering of affordable products from around the world. They have a traditional N-tier application that uses a front-end (mobile, web and installed) that interacts with a scale-out middle-tier software product, which in turn stores data in a large SQL Server database that has been scaled-up to meet demand.
31 |
32 |
33 |
34 |
35 |
36 | WWI has now added web and mobile commerce to their platform, which has generated a significant amount of additional data and data formats. These new platforms were added without being integrated into the OLTP system or the Business Intelligence infrastructure. As a result, "silos" of data stores have developed, and ingesting all of this data exceeds the scale of their current RDBMS server:
37 |
38 |
39 |
40 |
41 |
42 | This presented the following four challenges - the IT team at WWI needs to:
43 |
44 | - Scale data systems to reach more consumers
45 |
46 | - Unlock business insights from multiple sources of structured and unstructured data
47 |
48 | - Apply deep analytics with high-performance responses
49 |
50 | - Enable AI in apps to actively engage with customers
51 |
52 |
53 |
54 |
55 |
56 | Solution - Challenge 1: Scale Data System
57 |
58 | To meet these challenges, the following solution is proposed. Using the BDC platform you learned about in the 02 - BDC Components Module, the solution allows the company to keep its current codebase, while enabling a flexible scale-out architecture. This answers the first challenge of working with a scale-out system for larger data environments.
59 |
60 | The following diagram illustrates the complete solution that you can use to brief your audience with:
61 |
62 |
63 |
64 |
65 |
66 | In the following sections you'll dive deeper into how this scale is used to solve the rest of the challenges.
67 |
68 |
69 |
70 |
71 |
72 | The next challenge the IT team must solve is to enable a single data query to work across multiple disparate systems, optionally joining to internal SQL Server Tables, and also at scale.
73 |
74 | Using the Data Virtualization capability you saw in the 02 - SQL Server BDC Components Module, the IT team creates External Tables using the PolyBase feature. These External Table definitions are stored in the database on the SQL Server Master Instance within the cluster. When queried by the user, the queries are engaged from the SQL Server Master Instance through the Compute Pool in the SQL Server BDC, which holds Kubernetes Nodes containing the Pods running SQL Server Instances. These Instances send the query to the PolyBase Connector at the target data system, which processes the query based on the type of target system. The results are processed and returned through the PolyBase Connector to the Compute Pool and then on to the Master Instance, and then on to the user.
75 |
76 |
77 |
78 |
79 |
80 | This process not only allows queries to disparate systems; those remote systems can also hold extremely large sets of data. Normally you are querying a subset of that data, so only the results are sent back over the network. These results can be joined with internal tables for a single view, all from within the same Transact-SQL statements.
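For instance, once the partner_customers_hdfs External Table created in this workshop's tutorial notebooks exists, a single Transact-SQL statement can join HDFS-backed data to a local table. A minimal sketch, run here through sqlcmd from your workstation (the Master Instance IP and sa password are placeholders; 31433 is the default external port for the Master Instance):

sqlcmd -S <master-instance-ip>,31433 -U sa -P <sa-password> -d WideWorldImporters -Q "SELECT TOP 10 a.FullName, b.CustomerSource FROM Application.People a INNER JOIN partner_customers_hdfs b ON a.FullName = b.CustomerName ORDER BY a.FullName;"
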
81 |
82 |
Activity: Load and query data in an External Table
83 |
84 | In this activity, you will load the sample data into your big data cluster environment, and then create and use an External table to query the data in HDFS. This process is similar to connecting to any PolyBase target.
85 |
86 | Steps
87 |
88 |
Open this reference, and perform all of the instructions you see there. This loads your data in preparation for the next Activity.
89 |
Open this reference, and perform all of the instructions you see there. This step shows you how to create and query an External table.
90 |
(Optional) Open this reference, and review the instructions you see there. (You must have an Oracle server that your BDC can reach to perform these steps, although you can review them even if you do not.)
91 |
92 |
93 |
94 |
95 |
96 |
97 |
98 | Ad-hoc queries are very useful for many scenarios. There are also times when you would like to bring the data into storage, so that you can create denormalized representations of datasets, aggregated data, and other purpose-built data structures.
99 |
100 |
101 |
102 |
103 |
104 | Using the Data Virtualization capability you saw in the 02 - BDC Components Module, the IT team creates External Tables using PolyBase statements. These External Table definitions are stored in the database on the SQL Server Master Instance within the cluster. When queried by the user, the queries are engaged from the SQL Server Master Instance through the Compute Pool in the SQL Server BDC, which holds Kubernetes Nodes containing the Pods running SQL Server Instances. These Instances send the query to the PolyBase Connector at the target data system, which processes the query based on the type of target system. The results are processed and returned through the PolyBase Connector to the Compute Pool and then on to the Master Instance. In this case, the PolyBase statements can also specify the Data Pool as the target. The SQL Server Instances in the Data Pool store the data in a distributed fashion across multiple databases, called Shards.
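A sketch of that pattern, again through sqlcmd (connection values are placeholders; web_clickstream_clicks_data_pool is the Data Pool External Table the tutorial notebooks create, and web_clickstreams_hdfs is an assumed name for the HDFS-backed External Table that feeds it):

sqlcmd -S <master-instance-ip>,31433 -U sa -P <sa-password> -d WideWorldImporters -Q "INSERT INTO web_clickstream_clicks_data_pool SELECT wcs_click_date_sk, wcs_click_time_sk, wcs_sales_sk, wcs_item_sk, wcs_web_page_sk, wcs_user_sk FROM web_clickstreams_hdfs;"
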
105 |
106 |
Activity: Load and query data into the Data Pool
107 |
108 | In this activity, you will load the sample data into your big data cluster environment, and then create and use an External table to load data into the Data Pool.
109 |
110 | Steps
111 |
112 |
Open this reference, and perform the instructions you see there. This loads data into the Data Pool.
113 |
114 |
115 |
116 |
117 |
118 |
119 | There are three primary uses for a large cluster of data processing systems for Machine Learning and AI applications. The first is that users are involved in creating the Features used in various ML and AI algorithms, and are often tasked with Labeling the data. These users can access the Data Pool and Data Storage data stores directly to query and assist with this task.
120 |
121 | The SQL Server Master Instance in the BDC installs with Machine Learning Services, which allow the creation, training, evaluation and persistence of Machine Learning Models. Data from all parts of the BDC are available, and Data Science oriented languages and libraries in R, Python and Java are enabled. In this scenario, the Data Scientist creates the R or Python code, and the Transact-SQL Developer wraps that code in a Stored Procedure. This code can be used to train, evaluate and create Machine Learning Models. The Models can be stored in the Master Instance for scoring, or sent on to the App Pool where the Machine Learning Server is running, waiting to accept REST-based calls from applications.
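As a minimal sketch of that wrapping pattern (this assumes Machine Learning Services is enabled on the Master Instance along with the 'external scripts enabled' configuration option; connection values are placeholders):

sqlcmd -S <master-instance-ip>,31433 -U sa -P <sa-password> -Q "EXEC sp_execute_external_script @language = N'Python', @script = N'OutputDataSet = InputDataSet', @input_data_1 = N'SELECT 42 AS answer' WITH RESULT SETS ((answer INT));"
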
122 |
123 |
124 |
125 |
126 |
127 | The Data Scientist has another option to create and train ML and AI models. The Spark platform within the Storage Pool is accessible through the Knox gateway, using Livy to send Spark Jobs as you learned about in the 02 - SQL Server BDC Components Module. This gives access to the full Spark platform, using Jupyter Notebooks (included in Azure Data Studio) or any other standard tools that can access Spark through REST calls.
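As a rough sketch, a PySpark job already copied into HDFS can be submitted through the Knox gateway with a REST call to the Livy batch API (the gateway IP, Knox password, and application path are placeholders, and the gateway URL pattern may differ between releases):

curl -k -u root:<knox-password> -X POST "https://<knox-gateway-ip>:30443/gateway/default/livy/v1/batches" -H "Content-Type: application/json" -d "{ \"file\": \"/jobs/sample-spark-job.py\" }"
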
128 |
129 |
130 |
131 |
Activity: Load data with Spark, run a Spark Notebook
132 |
133 |
134 | In this activity, you will load the sample data into your big data cluster environment using Spark, and use a Notebook in Azure Data Studio to work with it.
135 |
136 | Steps
137 |
138 |
Open this reference, and follow the instructions you see there. This loads the data in preparation for the Notebook operations.
139 |
Open this reference, and follow the instructions you see there. This simple example shows you how to work with the data you ingested into the Storage Pool using Spark.
140 |
141 |
142 |
143 |
144 |
145 |
For Further Study
146 |
153 |
154 |
Next Steps
155 |
156 | Next, Continue to Management and Monitoring.
157 |
--------------------------------------------------------------------------------
/SQL2019BDC/05 - Management and Monitoring.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Workshop: SQL Server Big Data Clusters - Architecture
4 |
5 | #### A Microsoft workshop from the SQL Server team
6 |
7 |
8 |
9 |
Management and Monitoring
10 |
11 | In this workshop you'll cover using a Process and various Platform components to create a SQL Server Big Data Clusters (BDC) solution you can deploy on premises, in the cloud, or in a hybrid architecture. In each module you'll get more references, which you should follow up on to learn more. Also watch for links within the text - click on each one to explore that topic.
12 |
13 | (Make sure you check out the prerequisites page before you start. You'll need all of the items loaded there before you can proceed with the workshop.)
14 |
15 | You'll cover the following topics in this Module:
16 |
17 |
18 |
19 | - 5.0 Managing and Monitoring Your Solution
20 | - 5.1 Using kubectl commands
21 | - 5.2 Using azdata commands
22 | - 5.3 Using Grafana and Kibana
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 | There are two primary areas for monitoring your BDC deployment. The first deals with SQL Server 2019, and the second deals with the set of elements in the Cluster.
31 |
32 | For SQL Server, management is much as you would normally perform for any SQL Server system. You have the same type of services, surface points, security areas and other control vectors as in a stand-alone installation of SQL Server. The tools you have available for managing the Master Instance in the BDC are the same as managing a stand-alone installation, including SQL Server Management Studio, command-line interfaces, Azure Data Studio, and third party tools.
33 |
34 | For the cluster components, you have three primary interfaces to use, which you will review next.
35 |
36 |
37 |
38 |
39 |
40 | Since the BDC lives within a Kubernetes cluster, you'll work with the kubectl command to deal with those specific components. The following list is a short version of some of the commands you can use to manage and monitor the BDC implementation of a Kubernetes cluster:
41 |
42 |
43 |
44 | Command | Description |
45 |
46 | az aks get-credentials --name <cluster-name> --resource-group <resource-group-name> | Download the Kubernetes cluster configuration file and set the cluster context |
47 | kubectl get pods --all-namespaces | Get the status of pods in the cluster for either all namespaces or the big data cluster namespace |
48 | kubectl describe pod <pod-name> -n <namespace-name> | Get a detailed description of a specific pod in json format output. It includes details, such as the current Kubernetes node that the pod is placed on, the containers running within the pod, and the image used to bootstrap the containers. It also shows other details, such as labels, status, and persisted volume claims that are associated with the pod |
49 | kubectl get svc -n <namespace-name> | Get details for the big data cluster services. These details include their type and the IPs associated with respective services and ports. Note that BDC services are created in a new namespace, created at cluster bootstrap time based on the cluster name specified in the azdata create cluster command |
50 | kubectl describe svc <service-name> -n <namespace-name> | Get a detailed description of a service in json format output. It will include details like labels, selector, IP, external-IP (if the service is of LoadBalancer type), port, etc. |
51 | kubectl exec -it <pod-name> -c <container-name> -n <namespace-name> -- /bin/bash | If existing tools or the infrastructure do not enable you to perform a certain task without actually being in the context of the container, you can log in to the container using the kubectl exec command. For example, you might need to check if a specific file exists, or you might need to restart services in the container |
52 | kubectl cp <pod-name>:<source-file-path> <target-local-file-path> -c <container-name> -n <namespace-name> | Copy files from the container to your local machine. Reverse the source and destination to copy into the container |
53 | kubectl delete pods <pod-name> -n <namespace-name> --grace-period=0 --force | For testing availability, resiliency, or data persistence, you can delete a pod to simulate a pod failure with the kubectl delete pods command. Not recommended for production, only to simulate failure |
54 | kubectl get pods <pod-name> -o yaml -n <namespace-name> | grep hostIP | Get the IP of the node a pod is currently running on |
55 |
56 |
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 |
65 |
66 |
67 | Use this resource to learn more about these commands for troubleshooting the BDC.
69 |
70 | A full list of the **kubectl** commands is here.
71 |
72 |
73 |
Activity: Discover the IP Address of the BDC Master Installation, and Connect to it with Azure Data Studio
74 |
75 |
76 | In this activity, you will get the IP address of the Master Instance in your cluster, and connect to it with Azure Data Studio.
77 |
78 | Steps
79 |
80 |
Open this resource, and follow the steps there for the "AKS deployments" section.
81 |
82 |
83 |
84 |
85 |
86 |
87 |
88 | The **azdata** utility enables cluster administrators to bootstrap and manage big data clusters via the REST APIs exposed by the Controller service. The controller is deployed and hosted in the same Kubernetes namespace where the customer wants to build out a big data cluster. The Controller is responsible for core logic for deploying and managing a big data cluster.
89 |
90 | The Controller service is installed by a Kubernetes administrator during cluster bootstrap, using the azdata command-line utility.
91 |
92 | You can find a list of the switches and commands by typing:
93 |
94 |
95 | azdata --help
96 |
97 |
98 | You used the azdata commands to deploy your cluster, and you can use them to get information about your BDC deployment as well. You should review the documentation for this command here.
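For example, after logging in to the controller you can check the health of each service and list the endpoints the cluster exposes. A sketch (azdata prompts for the controller connection details and credentials, which are specific to your deployment):

azdata login
azdata bdc status show
azdata bdc endpoint list --output table
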
99 |
100 |
101 |
102 |
103 |
104 |
105 |
106 | You learned about the Grafana and Kibana systems in Module 01. Microsoft has created various views within each that you can use to interact with both the SQL Server-specific and Kubernetes portions of the BDC. The Azure Data Studio big data clusters management panel shows the TCP/IP addresses for each of these systems.
107 |
108 |
109 | 
110 |
111 |
112 | 
113 |
114 |
115 | 
116 |
117 |
118 |
Activity: Start dashboard when cluster is running in AKS
119 |
120 |
121 | To launch the Kubernetes dashboard run the following commands:
122 |
123 |
124 | az aks browse --resource-group <resource-group-name> --name <cluster-name>
125 |
126 |
127 | Note:
128 |
129 | If you get the following error:
130 |
131 | Unable to listen on port 8001: All listeners failed to create with the following errors: Unable to create listener: Error listen tcp4 127.0.0.1:8001: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted. Unable to create listener: Error listen tcp6: address [[::1]]:8001: missing port in address error: Unable to listen on any of the requested ports: [{8001 9090}]
132 |
133 |
134 | make sure you did not start the dashboard already from another window.
135 |
136 | When you launch the dashboard in your browser, you might get permission warnings because RBAC is enabled by default in AKS clusters and the service account used by the dashboard does not have enough permissions to access all resources (for example, pods is forbidden: User "system:serviceaccount:kube-system:kubernetes-dashboard" cannot list pods in the namespace "default"). Run the following command to give the necessary permissions to kubernetes-dashboard, and then restart the dashboard:
137 |
138 |
139 | kubectl create clusterrolebinding kubernetes-dashboard -n kube-system --clusterrole=cluster-admin --serviceaccount=kube-system:kubernetes-dashboard
140 |
141 |
142 |
143 |
144 |
For Further Study
145 |
150 |
151 |
Next Steps
152 |
153 | Next, Continue to Security.
154 |
--------------------------------------------------------------------------------
/SQL2019BDC/06 - Security.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Workshop: SQL Server Big Data Clusters - Architecture (CTP 3.2)
4 |
5 | #### A Microsoft workshop from the SQL Server team
6 |
7 |
8 |
9 |
Security
10 |
11 | In this workshop you'll cover using a Process and various Platform components to create a SQL Server Big Data Clusters (BDC) solution you can deploy on premises, in the cloud, or in a hybrid architecture. In each module you'll get more references, which you should follow up on to learn more. Also watch for links within the text - click on each one to explore that topic.
12 |
13 | (Make sure you check out the prerequisites page before you start. You'll need all of the items loaded there before you can proceed with the workshop.)
14 |
15 | You'll cover the following topics in this Module:
16 |
17 |
18 |
19 | - 6.0 Managing BDC Security
20 | - 6.1 Access
21 | - 6.2 Authentication and Authorization
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 | Authentication is the process of verifying the identity of a user or service and ensuring they are who they are claiming to be. Authorization refers to granting or denying of access to specific resources based on the requesting user's identity. This step is performed after a user is identified through authentication.
30 |
31 | *NOTE: Security will change prior to the General Availability (GA) Release. Active Directory integration is planned for production implementations.*
32 |
33 |
34 |
35 |
36 |
37 | There are three endpoints that serve as entry points to the BDC:
38 |
39 |
40 |
41 | Endpoint | Description |
42 |
43 | HDFS/Spark (Knox) gateway | An HTTPS-based endpoint that proxies other endpoints. The HDFS/Spark gateway is used for accessing services like webHDFS and Livy. Wherever you see references to Knox, this is the endpoint |
44 | Controller endpoint | The endpoint for the BDC management service that exposes REST APIs for managing the cluster. Some tools, such as Azure Data Studio, access the system using this endpoint |
45 | Master Instance | The Tabular Data Stream (TDS) endpoint for the SQL Server Master Instance. Database tools and applications, such as SQL Server Management Studio and Azure Data Studio, connect to this endpoint to run Transact-SQL queries against the cluster |
46 |
47 |
48 |
49 | You can see these endpoints in this diagram:
50 |
51 |
52 |
53 |
54 |
55 |
56 |
57 |
58 |
59 |
60 | When you create the cluster, a number of logins are created. Some of these logins are for services to communicate with each other, and others are for end users to access the cluster.
61 | End-user passwords for the non-SQL Server services are currently set using environment variables. These are the passwords that cluster administrators use to access services, listed in the table below; a short sketch of setting them before deployment follows the table:
62 |
63 |
64 |
65 | Use | Variable |
66 |
67 | Controller username | CONTROLLER_USERNAME=controller_username |
68 | Controller password | CONTROLLER_PASSWORD=controller_password |
69 | SQL Master SA password | MSSQL_SA_PASSWORD=controller_sa_password |
70 | Password for accessing the HDFS/Spark endpoint | KNOX_PASSWORD=knox_password |
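A sketch of setting these in PowerShell before you run the deployment (the values are placeholders; substitute strong passwords of your own):

$env:CONTROLLER_USERNAME = "admin"
$env:CONTROLLER_PASSWORD = "<controller-password>"
$env:MSSQL_SA_PASSWORD = "<sa-password>"
$env:KNOX_PASSWORD = "<knox-password>"
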
71 |
72 |
73 |
74 |
75 | Intra-cluster authentication
76 | Upon deployment of the cluster, a number of SQL logins are created:
77 |
78 | A special SQL login is created in the Controller SQL instance that is system managed, with the sysadmin role. The password for this login is captured as a Kubernetes secret. A sysadmin login is created in all SQL instances in the cluster, which the Controller owns and manages. It is required for the Controller to perform administrative tasks, such as HA setup or upgrade, on these instances. These logins are also used for intra-cluster communication between SQL instances, such as the SQL master instance communicating with a data pool.
79 |
80 | Note: In the current release, only basic authentication is supported. Fine-grained access control to HDFS objects and the BDC compute and data pools is not yet available.
81 |
82 | For Intra-cluster communication with non-SQL services within the BDC, such as Livy to Spark or Spark to the storage pool, security uses certificates. All SQL Server to SQL Server communication is secured using SQL logins.
83 |
84 |
85 |
Activity: Review Security Endpoints
86 |
87 |
88 | In this activity, you will review the endpoints exposed on the cluster.
89 |
90 | Steps
91 |
92 |
Open this reference, and read the information you see for the Service Endpoints section. This shows the addresses and ports exposed to the end-users.
93 |
94 |
95 |
96 |
97 |
98 |
For Further Study
99 |
102 |
103 | Congratulations! You have completed this workshop on SQL Server big data clusters Architecture. You now have the tools, assets, and processes you need to extrapolate this information into other applications.
104 |
--------------------------------------------------------------------------------
/SQL2019BDC/notebooks/README.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Lab: SQL Server Big Data Clusters - Architecture
4 |
5 | #### A Microsoft Course from the SQL Server team
6 |
7 |
8 |
9 |
10 |
11 | Welcome to this Microsoft solutions Lab on the architecture of SQL Server Big Data Clusters. As part of a larger complete Workshop, you'll experiment with SQL Server Big Data Clusters (BDC), and how you can use it to implement large-scale data processing and machine learning.
12 |
13 | This Lab assumes you have a full understanding of the concepts of big data analytics, the technologies (such as containers, Kubernetes, Spark and HDFS, machine learning, and others) that you will use throughout the Lab, and the architecture of a BDC. If you are not familiar with these topics, you can take a complete course here.
14 |
15 | In this Lab you'll learn how to create external tables over other data sources to unify your data, and how to use Spark to run big queries over your data in HDFS or do data preparation. You'll review a complete solution for an end-to-end scenario, with a focus on how to extrapolate what you have learned to create other solutions for your organization.
16 |
17 |
18 |
19 |
20 |
21 | This Lab expects that you understand data structures and working with SQL Server and computer networks. This Lab does not expect you to have any prior data science knowledge, but a basic knowledge of statistics and data science is helpful in the Data Science sections. Knowledge of SQL Server, Azure Data and AI services, Python, and Jupyter Notebooks is recommended. AI techniques are implemented in Python packages. Solution templates are implemented using Azure services, development tools, and SDKs. You should have a basic understanding of working with the Microsoft Azure Platform.
22 |
23 | ▶ You need to have all of the prerequisites completed before taking this Lab.
24 |
25 | ▶ You need a full Big Data Cluster for SQL Server up and running, and you need to have identified the connection endpoints and all security parameters. You can find out how to do that here.
26 |
27 |
28 |
29 |
30 |
31 | You will work through six Jupyter Notebooks using the Azure Data Studio tool. Download them and open them in Azure Data Studio, running only one cell at a time.
32 |
33 |
34 | Notebook | Topics |
35 |
36 | bdc_tutorial_00.ipynb | Overview of the Lab and Setup of the source data, problem space, solution options and architectures |
37 |
38 | bdc_tutorial_01.ipynb | In this tutorial you will learn how to run standard SQL Server Queries against the Master Instance (MI) in a SQL Server big data cluster. |
39 |
40 | bdc_tutorial_02.ipynb | In this tutorial you will learn how to create and query Virtualized Data in a SQL Server big data cluster. |
41 |
42 | bdc_tutorial_03.ipynb | In this tutorial you will learn how to create and query a Data Mart using Virtualized Data in a SQL Server big data cluster. |
43 |
44 | bdc_tutorial_04.ipynb | In this tutorial you will learn how to work with Spark Jobs in a SQL Server big data cluster. |
45 |
46 | bdc_tutorial_05.ipynb | In this tutorial you will learn how to work with Spark Machine Learning Jobs in a SQL Server big data cluster. |
47 |
48 |
49 |
50 |
51 |
52 |
--------------------------------------------------------------------------------
/SQL2019BDC/notebooks/bdc_tutorial_00.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "metadata": {
3 | "kernelspec": {
4 | "name": "python3",
5 | "display_name": "Python 3"
6 | },
7 | "language_info": {
8 | "name": "python",
9 | "version": "3.6.6",
10 | "mimetype": "text/x-python",
11 | "codemirror_mode": {
12 | "name": "ipython",
13 | "version": 3
14 | },
15 | "pygments_lexer": "ipython3",
16 | "nbconvert_exporter": "python",
17 | "file_extension": ".py"
18 | }
19 | },
20 | "nbformat_minor": 2,
21 | "nbformat": 4,
22 | "cells": [
23 | {
24 | "cell_type": "markdown",
25 | "source": [
26 | "
\r\n",
27 | "
\r\n",
28 | "\r\n",
29 | "# SQL Server 2019 big data cluster Tutorial\r\n",
30 | "## 00 - Scenario Overview and System Setup\r\n",
31 | "\r\n",
32 | "In this set of tutorials you'll work with an end-to-end scenario that uses SQL Server 2019's big data clusters to solve real-world problems. \r\n",
33 | ""
34 | ],
35 | "metadata": {
36 | "azdata_cell_guid": "d495f8be-74c3-4658-b897-ad69e6ed88ac"
37 | }
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "source": [
42 | "## Wide World Importers\r\n",
43 | "\r\n",
44 | "Wide World Importers (WWI) is a traditional brick and mortar business that makes specialty items for other companies to use in their products. They design, sell and ship these products worldwide.\r\n",
45 | "\r\n",
46 | "WWI corporate has now added a new partnership with a company called \"AdventureWorks\", which sells bicycles both online and in-store. The AdventureWorks company has asked WWI to produce super-hero themed baskets, seats and other bicycle equipment for a new line of bicycles. WWI corporate has asked the IT department to develop a pilot program with these goals: \r\n",
47 | "\r\n",
48 | "- Integrate the large amounts of data from the AdventureWorks company including customers, products and sales\r\n",
49 | "- Allow a cross-selling strategy so that current WWI customers and AdventureWorks customers see their information without having to re-enter it\r\n",
50 | "- Incorporate their online sales information for deeper analysis\r\n",
51 | "- Provide a historical data set so that the partnership can be evaluated\r\n",
52 | "- Ensure this is a \"framework\" approach, so that it can be re-used with other partners\r\n",
53 | "\r\n",
54 | "WWI has a typical N-Tier application that provides a series of terminals, a Business Logic layer, and a Database back-end. They use on-premises systems, and are interested in linking these to the cloud. \r\n",
55 | "\r\n",
56 | "In this series of tutorials, you will build a solution using the scale-out features of SQL Server 2019, Data Virtualization, Data Marts, and the Data Lake features. "
57 | ],
58 | "metadata": {
59 | "azdata_cell_guid": "3815241f-e81e-4cf1-a48e-c6e67b0ccf7c"
60 | }
61 | },
62 | {
63 | "cell_type": "markdown",
64 | "source": [
65 | "## Running these Tutorials\r\n",
66 | "\r\n",
67 | "- You can read through the output of these completed tutorials if you wish - or:\r\n",
68 | "\r\n",
69 | "- You can follow along with the steps you see in these tutorials by copying the code into a SQL Query window and Spark Notebook using the Azure Data Studio tool, or you can click here to download these Jupyter Notebooks and run them in Azure Data Studio for a hands-on experience.\r\n",
70 | " \r\n",
71 | "- If you would like to run the tutorials, you'll need a SQL Server 2019 big data cluster and the client tools installed. If you want to set up your own cluster, click this reference and follow the steps you see there for the server and tools you need.\r\n",
72 | "\r\n",
73 | "- You will need to have the following: \r\n",
74 | " - Your **Knox Password**\r\n",
75 | " - The **Knox IP Address**\r\n",
76 | " - The `sa` **Username** and **Password** to your Master Instance\r\n",
77 | " - The **IP address** to the SQL Server big data cluster Master Instance \r\n",
78 | " - The **name** of your big data cluster\r\n",
79 | "\r\n",
80 | "For a complete workshop on SQL Server 2019's big data clusters, check out this resource."
81 | ],
82 | "metadata": {
83 | "azdata_cell_guid": "1c3e4b5e-fef4-43ef-a4e3-aa33fe99e25d"
84 | }
85 | },
86 | {
87 | "cell_type": "markdown",
88 | "source": [
89 | "## Copy Database backups to the SQL Server 2019 big data cluster Master Instance\r\n",
90 | "\r\n",
91 | "The first step for the solution is to copy the database backups from WWI from their location on the cloud and then up to your cluster. \r\n",
92 | "\r\n",
93 | "These commands use the `curl` program to pull the files down. [You can read more about curl here](https://curl.haxx.se/). \r\n",
94 | "\r\n",
95 | "The next set of commands use the `kubectl` command to copy the files from where you downloaded them to the data directory of the SQL Server 2019 bdc Master Instance. [You can read more about kubectl here](https://kubernetes.io/docs/reference/kubectl/overview/). \r\n",
96 | "\r\n",
97 | "Note that you will need to replace the section of the script marked with `` with the name of your SQL Server 2019 bdc. (It does not need single or double quotes, just the letters of your cluster name.)\r\n",
98 | "\r\n",
99 | "Notice also that these commands assume a `c:\\temp` location, if you want to use another drive or directory, edit accordingly.\r\n",
100 | "\r\n",
101 | "Once you have edited these commands, you can open a Command Prompt *(not PowerShell)* on your system and copy and paste each block, one at a time and run them there, observing the output.\r\n",
102 | "\r\n",
103 | "In the next tutorial you will restore these databases on the Master Instance."
104 | ],
105 | "metadata": {
106 | "azdata_cell_guid": "5220c555-f819-409e-b206-de9a2dd6d434"
107 | }
108 | },
109 | {
110 | "cell_type": "code",
111 | "source": [
112 | "REM Create a temporary directory for the files\r\n",
113 | "md c:\\temp\r\n",
114 | "cd c:\\temp\r\n",
115 | "\r\n",
116 | "REM Get the database backups\r\n",
117 | "curl \"https://sabwoody.blob.core.windows.net/backups/WideWorldImporters.bak\" -o c:\\temp\\WWI.bak\r\n",
118 | "curl \"https://sabwoody.blob.core.windows.net/backups/AdventureWorks.bak\" -o c:\\temp\\AdventureWorks.bak\r\n",
119 | "curl \"https://sabwoody.blob.core.windows.net/backups/AdventureWorksDW.bak\" -o c:\\temp\\AdventureWorksDW.bak\r\n",
120 | "curl \"https://sabwoody.blob.core.windows.net/backups/WideWorldImportersDW.bak\" -o c:\\temp\\WWIDW.bak\r\n",
121 | ""
122 | ],
123 | "metadata": {
124 | "azdata_cell_guid": "3e1f2304-cc0a-4e0e-96e2-333401b52036"
125 | },
126 | "outputs": [],
127 | "execution_count": 2
128 | },
129 | {
130 | "cell_type": "code",
131 | "source": [
132 | "REM Copy the backups to the data location on the SQL Server Master Instance\r\n",
133 | "cd c:\\temp\r\n",
134 | "kubectl cp WWI.bak master-0:/var/opt/mssql/data -c mssql-server -n \r\n",
135 | "kubectl cp WWIDW.bak master-0:/var/opt/mssql/data -c mssql-server -n \r\n",
136 | "kubectl cp AdventureWorks.bak master-0:/var/opt/mssql/data -c mssql-server -n \r\n",
137 | "kubectl cp AdventureWorksDW.bak master-0:/var/opt/mssql/data -c mssql-server -n \r\n",
138 | ""
139 | ],
140 | "metadata": {
141 | "azdata_cell_guid": "19106890-7c6c-4631-9acc-3dda4d2a50ab"
142 | },
143 | "outputs": [],
144 | "execution_count": null
145 | },
146 | {
147 | "cell_type": "markdown",
148 | "source": [
149 | "## Copy Exported Data to Storage Pool\r\n",
150 | "\r\n",
151 | "Next, you'll download a few text files that will form the external data to be ingested into the Storage Pool HDFS store. In production environments, you have multiple options for moving data into HDFS, such as Spark Streaming or the Azure Data Factory.\r\n",
152 | "\r\n",
153 | "The first code block creates directories in the HDFS store. The second block downloads the source data from a web location. And in the final block, you'll copy the data from your local system to the SQL Server 2019 big data cluster Storage Pool.\r\n",
154 | "\r\n",
155 | "You need to replace the ``, ``, and potentially the drive letter and directory values with the appropriate information on your system. \r\n",
156 | "> (You can use **CTL-H** to open the Find and Replace dialog in the cell)"
157 | ],
158 | "metadata": {
159 | "azdata_cell_guid": "2c426b35-dc57-4dc8-819d-6642deb69110"
160 | }
161 | },
162 | {
163 | "cell_type": "code",
164 | "source": [
165 | "REM Make the Directories in HDFS\r\n",
166 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/product_review_data?op=MKDIRS\"\r\n",
167 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/partner_customers?op=MKDIRS\"\r\n",
168 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/partner_products?op=MKDIRS\"\r\n",
169 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/web_logs?op=MKDIRS\"\r\n",
170 | ""
171 | ],
172 | "metadata": {
173 | "azdata_cell_guid": "f2143d4e-6eb6-4bbc-864a-b417398adc21"
174 | },
175 | "outputs": [],
176 | "execution_count": null
177 | },
178 | {
179 | "cell_type": "code",
180 | "source": [
181 | "REM Get the textfiles \r\n",
182 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/product_reviews.csv\" -o product_reviews.csv\r\n",
183 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/products.csv\" -o products.csv\r\n",
184 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/customers.csv\" -o customers.csv\r\n",
185 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/stockitemholdings.csv\" -o products.csv\r\n",
186 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/web_clickstreams.csv\" -o web_clickstreams.csv\r\n",
187 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/fleet-formatted.csv\" -o fleet-formatted.csv\r\n",
188 | "curl -G \"https://sabwoody.blob.core.windows.net/backups/training-formatted.csv\" -o training-formatted.csv\r\n",
189 | ""
190 | ],
191 | "metadata": {
192 | "azdata_cell_guid": "c8a74514-2e0d-4f3c-99dd-4c541c11e15e"
193 | },
194 | "outputs": [],
195 | "execution_count": null
196 | },
197 | {
198 | "cell_type": "code",
199 | "source": [
200 | "REM Copy the text files to the HDFS directories\r\n",
201 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/product_review_data/product_reviews.csv?op=create&overwrite=true\" -H \"Content-Type: application/octet-stream\" -T \"product_reviews.csv\"\r\n",
202 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/partner_customers/customers.csv?op=create&overwrite=true\" -H \"Content-Type: application/octet-stream\" -T \"customers.csv\"\r\n",
203 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/partner_products/products.csv?op=create&overwrite=true\" -H \"Content-Type: application/octet-stream\" -T \"products.csv\"\r\n",
204 | "curl -i -L -k -u root: -X PUT \"https://:30443/gateway/default/webhdfs/v1/web_logs/web_clickstreams.csv?op=create&overwrite=true\" -H \"Content-Type: application/octet-stream\" -T \"web_clickstreams.csv\"\r\n",
205 | ""
206 | ],
207 | "metadata": {
208 | "azdata_cell_guid": "9c9e49ef-ef0d-47c4-92fd-b7e7bfa2d2f2"
209 | },
210 | "outputs": [],
211 | "execution_count": null
212 | },
213 | {
214 | "cell_type": "markdown",
215 | "source": [
216 | "## Next Step: Working with the SQL Server 2019 big data cluster Master Instance\r\n",
217 | "\r\n",
218 | "Now you're ready to open the next Python Notebook - [bdc_tutorial_01.ipynb](bdc_tutorial_01.ipynb) - to learn how to work with the SQL Server 2019 bdc Master Instance."
219 | ],
220 | "metadata": {
221 | "azdata_cell_guid": "519aa112-47e0-443b-9b27-05fc02349b09"
222 | }
223 | }
224 | ]
225 | }
--------------------------------------------------------------------------------
/SQL2019BDC/notebooks/bdc_tutorial_02.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "metadata": {
3 | "kernelspec": {
4 | "name": "SQL",
5 | "display_name": "SQL",
6 | "language": "sql"
7 | },
8 | "language_info": {
9 | "name": "sql",
10 | "version": ""
11 | }
12 | },
13 | "nbformat_minor": 2,
14 | "nbformat": 4,
15 | "cells": [
16 | {
17 | "cell_type": "markdown",
18 | "source": "
\r\n
\r\n\r\n# SQL Server 2019 big data cluster Tutorial\r\n## 02 - Data Virtualization\r\n\r\nIn this tutorial you will learn how to create and query Virtualized Data in a SQL Server big data cluster. \r\n- You'll start with creating a text file format, since that's the type of data you are reading in. \r\n- Next, you'll create a data source for the SQL Storage Pool, since that allows you to access the HDFS system in BDC. \r\n- Finally, you'll create an External Table, which uses the previous steps to access the data.\r\n",
19 | "metadata": {}
20 | },
21 | {
22 | "cell_type": "code",
23 | "source": "/* Clean up only - run this cell only if you are repeating the tutorial! */\r\nUSE WideWorldImporters;\r\nGO\r\n\r\nDROP EXTERNAL TABLE partner_customers_hdfs\r\nDROP EXTERNAL FILE FORMAT csv_file\r\nDROP EXTERNAL DATA SOURCE SqlStoragePool\r\n",
24 | "metadata": {},
25 | "outputs": [
26 | {
27 | "output_type": "display_data",
28 | "data": {
29 | "text/html": "Commands completed successfully."
30 | },
31 | "metadata": {}
32 | },
33 | {
34 | "output_type": "display_data",
35 | "data": {
36 | "text/html": "Commands completed successfully."
37 | },
38 | "metadata": {}
39 | },
40 | {
41 | "output_type": "display_data",
42 | "data": {
43 | "text/html": "Total execution time: 00:00:00.179"
44 | },
45 | "metadata": {}
46 | }
47 | ],
48 | "execution_count": 0
49 | },
50 | {
51 | "cell_type": "code",
52 | "source": "/* Create External File Format */\r\n\r\nUSE WideWorldImporters;\r\nGO\r\n\r\nCREATE EXTERNAL FILE FORMAT csv_file\r\nWITH (\r\n FORMAT_TYPE = DELIMITEDTEXT,\r\n FORMAT_OPTIONS(\r\n FIELD_TERMINATOR = ',',\r\n STRING_DELIMITER = '0x22',\r\n FIRST_ROW = 2,\r\n USE_TYPE_DEFAULT = TRUE)\r\n);\r\nGO",
53 | "metadata": {},
54 | "outputs": [
55 | {
56 | "output_type": "display_data",
57 | "data": {
58 | "text/html": "Commands completed successfully."
59 | },
60 | "metadata": {}
61 | },
62 | {
63 | "output_type": "display_data",
64 | "data": {
65 | "text/html": "Commands completed successfully."
66 | },
67 | "metadata": {}
68 | },
69 | {
70 | "output_type": "display_data",
71 | "data": {
72 | "text/html": "Total execution time: 00:00:00.257"
73 | },
74 | "metadata": {}
75 | }
76 | ],
77 | "execution_count": 1
78 | },
79 | {
80 | "cell_type": "code",
81 | "source": "/* Create External Data Source to the Storage Pool */\r\nCREATE EXTERNAL DATA SOURCE SqlStoragePool\r\nWITH (LOCATION = 'sqlhdfs://controller-svc/default');",
82 | "metadata": {},
83 | "outputs": [
84 | {
85 | "output_type": "display_data",
86 | "data": {
87 | "text/html": "Commands completed successfully."
88 | },
89 | "metadata": {}
90 | },
91 | {
92 | "output_type": "display_data",
93 | "data": {
94 | "text/html": "Total execution time: 00:00:00.129"
95 | },
96 | "metadata": {}
97 | }
98 | ],
99 | "execution_count": 2
100 | },
101 | {
102 | "cell_type": "code",
103 | "source": "/* Create an External Table that can read from the Storage Pool File Location */\r\nCREATE EXTERNAL TABLE [partner_customers_hdfs]\r\n (\"CustomerSource\" VARCHAR(250) \r\n , \"CustomerName\" VARCHAR(250) \r\n , \"EmailAddress\" VARCHAR(250))\r\n WITH\r\n (\r\n DATA_SOURCE = SqlStoragePool,\r\n LOCATION = '/partner_customers',\r\n FILE_FORMAT = csv_file\r\n );\r\nGO",
104 | "metadata": {},
105 | "outputs": [
106 | {
107 | "output_type": "display_data",
108 | "data": {
109 | "text/html": "Commands completed successfully."
110 | },
111 | "metadata": {}
112 | },
113 | {
114 | "output_type": "display_data",
115 | "data": {
116 | "text/html": "Total execution time: 00:00:00.662"
117 | },
118 | "metadata": {}
119 | }
120 | ],
121 | "execution_count": 3
122 | },
123 | {
124 | "cell_type": "code",
125 | "source": "/* Read Data from HDFS using only T-SQL */\r\n\r\nSELECT TOP 10 CustomerSource\r\n, CustomerName\r\n, EMailAddress\r\n FROM [partner_customers_hdfs] hdfs\r\nWHERE EmailAddress LIKE '%wingtip%'\r\nORDER BY CustomerSource, CustomerName;\r\nGO\r\n",
126 | "metadata": {},
127 | "outputs": [
128 | {
129 | "output_type": "display_data",
130 | "data": {
131 | "text/html": "(10 rows affected)"
132 | },
133 | "metadata": {}
134 | },
135 | {
136 | "output_type": "display_data",
137 | "data": {
138 | "text/html": "Total execution time: 00:00:00.699"
139 | },
140 | "metadata": {}
141 | },
142 | {
143 | "output_type": "execute_result",
144 | "metadata": {},
145 | "execution_count": 5,
146 | "data": {
147 | "application/vnd.dataresource+json": {
148 | "schema": {
149 | "fields": [
150 | {
151 | "name": "CustomerSource"
152 | },
153 | {
154 | "name": "CustomerName"
155 | },
156 | {
157 | "name": "EMailAddress"
158 | }
159 | ]
160 | },
161 | "data": [
162 | {
163 | "0": "AdventureWorks",
164 | "1": "Åšani Nair",
165 | "2": "åšani@wingtiptoys.com\r"
166 | },
167 | {
168 | "0": "AdventureWorks",
169 | "1": "Åšani Sen",
170 | "2": "åšani@wingtiptoys.com\r"
171 | },
172 | {
173 | "0": "AdventureWorks",
174 | "1": "Aakriti Bhamidipati",
175 | "2": "aakriti@wingtiptoys.com\r"
176 | },
177 | {
178 | "0": "AdventureWorks",
179 | "1": "Aamdaal Kamasamudram",
180 | "2": "aamdaal@wingtiptoys.com\r"
181 | },
182 | {
183 | "0": "AdventureWorks",
184 | "1": "Abel Pirvu",
185 | "2": "abel@wingtiptoys.com\r"
186 | },
187 | {
188 | "0": "AdventureWorks",
189 | "1": "Abhaya Rambhatla",
190 | "2": "abhaya@wingtiptoys.com\r"
191 | },
192 | {
193 | "0": "AdventureWorks",
194 | "1": "Abhra Thakur",
195 | "2": "abhra@wingtiptoys.com\r"
196 | },
197 | {
198 | "0": "AdventureWorks",
199 | "1": "Adam Balaz",
200 | "2": "adam@wingtiptoys.com\r"
201 | },
202 | {
203 | "0": "AdventureWorks",
204 | "1": "Adirake Narkbunnum",
205 | "2": "adirake@wingtiptoys.com\r"
206 | },
207 | {
208 | "0": "AdventureWorks",
209 | "1": "Adirake Saenamuang",
210 | "2": "adirake@wingtiptoys.com\r"
211 | }
212 | ]
213 | },
214 | "text/html": "CustomerSource | CustomerName | EMailAddress |
---|
AdventureWorks | Åšani Nair | åšani@wingtiptoys.com\r |
AdventureWorks | Åšani Sen | åšani@wingtiptoys.com\r |
AdventureWorks | Aakriti Bhamidipati | aakriti@wingtiptoys.com\r |
AdventureWorks | Aamdaal Kamasamudram | aamdaal@wingtiptoys.com\r |
AdventureWorks | Abel Pirvu | abel@wingtiptoys.com\r |
AdventureWorks | Abhaya Rambhatla | abhaya@wingtiptoys.com\r |
AdventureWorks | Abhra Thakur | abhra@wingtiptoys.com\r |
AdventureWorks | Adam Balaz | adam@wingtiptoys.com\r |
AdventureWorks | Adirake Narkbunnum | adirake@wingtiptoys.com\r |
AdventureWorks | Adirake Saenamuang | adirake@wingtiptoys.com\r |
"
215 | }
216 | }
217 | ],
218 | "execution_count": 5
219 | },
220 | {
221 | "cell_type": "code",
222 | "source": "/* Now Join Those to show customers we currently have in a SQL Server Database \r\nand the Category they qre in the External Table */\r\nUSE WideWorldImporters;\r\nGO\r\n\r\nSELECT TOP 10 a.FullName\r\n , b.CustomerSource\r\n FROM Application.People a\r\n INNER JOIN partner_customers_hdfs b ON a.FullName = b.CustomerName\r\n ORDER BY FullName ASC;\r\n GO",
223 | "metadata": {},
224 | "outputs": [
225 | {
226 | "output_type": "display_data",
227 | "data": {
228 | "text/html": "Commands completed successfully."
229 | },
230 | "metadata": {}
231 | },
232 | {
233 | "output_type": "display_data",
234 | "data": {
235 | "text/html": "(10 rows affected)"
236 | },
237 | "metadata": {}
238 | },
239 | {
240 | "output_type": "display_data",
241 | "data": {
242 | "text/html": "Total execution time: 00:00:00.661"
243 | },
244 | "metadata": {}
245 | },
246 | {
247 | "output_type": "execute_result",
248 | "metadata": {},
249 | "execution_count": 6,
250 | "data": {
251 | "application/vnd.dataresource+json": {
252 | "schema": {
253 | "fields": [
254 | {
255 | "name": "FullName"
256 | },
257 | {
258 | "name": "CustomerSource"
259 | }
260 | ]
261 | },
262 | "data": [
263 | {
264 | "0": "Aahlada Thota",
265 | "1": "AdventureWorks"
266 | },
267 | {
268 | "0": "Aakarsha Nookala",
269 | "1": "AdventureWorks"
270 | },
271 | {
272 | "0": "Aakriti Bhamidipati",
273 | "1": "AdventureWorks"
274 | },
275 | {
276 | "0": "Aamdaal Kamasamudram",
277 | "1": "AdventureWorks"
278 | },
279 | {
280 | "0": "Abel Pirvu",
281 | "1": "AdventureWorks"
282 | },
283 | {
284 | "0": "Abhaya Rambhatla",
285 | "1": "AdventureWorks"
286 | },
287 | {
288 | "0": "Abhra Thakur",
289 | "1": "AdventureWorks"
290 | },
291 | {
292 | "0": "Adam Balaz",
293 | "1": "AdventureWorks"
294 | },
295 | {
296 | "0": "Adam Dvorak",
297 | "1": "AdventureWorks"
298 | },
299 | {
300 | "0": "Adam Kubat",
301 | "1": "AdventureWorks"
302 | }
303 | ]
304 | },
305 | "text/html": "FullName | CustomerSource |
---|
Aahlada Thota | AdventureWorks |
Aakarsha Nookala | AdventureWorks |
Aakriti Bhamidipati | AdventureWorks |
Aamdaal Kamasamudram | AdventureWorks |
Abel Pirvu | AdventureWorks |
Abhaya Rambhatla | AdventureWorks |
Abhra Thakur | AdventureWorks |
Adam Balaz | AdventureWorks |
Adam Dvorak | AdventureWorks |
Adam Kubat | AdventureWorks |
"
306 | }
307 | }
308 | ],
309 | "execution_count": 6
310 | },
311 | {
312 | "cell_type": "markdown",
313 | "source": "## Next Steps: Continue on to Working with the SQL Server Data Pool\r\n\r\nNow you're ready to open the next Jupyter Notebook - `bdc_tutorial_03.ipynb` - to learn how to create and work with a Data Mart.",
314 | "metadata": {}
315 | }
316 | ]
317 | }
--------------------------------------------------------------------------------
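
The closing cells of this notebook join the HDFS-backed external table `partner_customers_hdfs` against the local `Application.People` table. As a rough sketch of how the same virtualized join could be driven programmatically rather than from Azure Data Studio, the Python snippet below submits the query through pyodbc; the server address, driver name, and credentials are placeholders (assumptions), not values defined by this workshop.

import pyodbc

# Placeholder connection details - substitute your cluster's SQL Server
# master instance endpoint and credentials (these values are assumptions).
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=YOUR-MASTER-INSTANCE-IP,31433;"
    "DATABASE=WideWorldImporters;"
    "UID=YOUR-USER;PWD=YOUR-PASSWORD"
)

# The same virtualized join used in the notebook cell above.
query = """
SELECT TOP 10 a.FullName, b.CustomerSource
  FROM Application.People a
 INNER JOIN partner_customers_hdfs b ON a.FullName = b.CustomerName
 ORDER BY a.FullName ASC;
"""

cursor = conn.cursor()
for full_name, customer_source in cursor.execute(query):
    print(full_name, customer_source)

cursor.close()
conn.close()
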
/SQL2019BDC/notebooks/bdc_tutorial_03.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "metadata": {
3 | "kernelspec": {
4 | "name": "SQL",
5 | "display_name": "SQL",
6 | "language": "sql"
7 | },
8 | "language_info": {
9 | "name": "sql",
10 | "version": ""
11 | }
12 | },
13 | "nbformat_minor": 2,
14 | "nbformat": 4,
15 | "cells": [
16 | {
17 | "cell_type": "markdown",
18 | "source": "
\r\n
\r\n\r\n# SQL Server 2019 big data cluster Tutorial\r\n## 03 - Creating and Querying a Data Mart\r\n\r\nIn this tutorial you will learn how to create and query a Data Mart using Virtualized Data in a SQL Server big data cluster. \r\n\r\nWide World Importers is interested in ingesting web log data from an HDFS source, where the logs have been streamed. They want to be able to analyze the traffic to see if there are patterns in time, products or locations. \r\n\r\nThe web logs, however, are refreshed periodically. WWI would like to keep the logs in local storage to do deeper analysis. \r\n\r\nIn this Jupyter Notebook you'll create a location to store the log files as a SQL Server Table in the SQL Data Pool, and then fill it by creating an External Table that reads HDFS.",
19 | "metadata": {}
20 | },
21 | {
22 | "cell_type": "code",
23 | "source": "USE WideWorldImporters;\r\nGO\r\n\r\nCREATE EXTERNAL DATA SOURCE SqlDataPool\r\nWITH (LOCATION = 'sqldatapool://controller-svc/default');",
24 | "metadata": {},
25 | "outputs": [
26 | {
27 | "output_type": "display_data",
28 | "data": {
29 | "text/html": "Commands completed successfully."
30 | },
31 | "metadata": {}
32 | },
33 | {
34 | "output_type": "display_data",
35 | "data": {
36 | "text/html": "Commands completed successfully."
37 | },
38 | "metadata": {}
39 | },
40 | {
41 | "output_type": "display_data",
42 | "data": {
43 | "text/html": "Total execution time: 00:00:00.166"
44 | },
45 | "metadata": {}
46 | }
47 | ],
48 | "execution_count": 3
49 | },
50 | {
51 | "cell_type": "code",
52 | "source": "CREATE EXTERNAL TABLE [web_clickstream_clicks_data_pool]\r\n (\"wcs_click_date_sk\" BIGINT \r\n , \"wcs_click_time_sk\" BIGINT \r\n , \"wcs_sales_sk\" BIGINT \r\n , \"wcs_item_sk\" BIGINT\r\n , \"wcs_web_page_sk\" BIGINT \r\n , \"wcs_user_sk\" BIGINT)\r\n WITH\r\n (\r\n DATA_SOURCE = SqlDataPool,\r\n DISTRIBUTION = ROUND_ROBIN\r\n );\r\nGO",
53 | "metadata": {},
54 | "outputs": [
55 | {
56 | "output_type": "display_data",
57 | "data": {
58 | "text/html": "Commands completed successfully."
59 | },
60 | "metadata": {}
61 | },
62 | {
63 | "output_type": "display_data",
64 | "data": {
65 | "text/html": "Total execution time: 00:00:08.849"
66 | },
67 | "metadata": {}
68 | }
69 | ],
70 | "execution_count": 4
71 | },
72 | {
73 | "cell_type": "code",
74 | "source": "/* Create an External Table that can read from the Storage Pool File Location */\r\nIF NOT EXISTS(SELECT * FROM sys.external_tables WHERE name = 'web_clickstreams_hdfs')\r\nBEGIN\r\n CREATE EXTERNAL TABLE [web_clickstreams_hdfs]\r\n (\"wcs_click_date_sk\" BIGINT \r\n , \"wcs_click_time_sk\" BIGINT \r\n , \"wcs_sales_sk\" BIGINT \r\n , \"wcs_item_sk\" BIGINT\r\n , \"wcs_web_page_sk\" BIGINT \r\n , \"wcs_user_sk\" BIGINT)\r\n WITH\r\n (\r\n DATA_SOURCE = SqlStoragePool,\r\n LOCATION = '/web_logs',\r\n FILE_FORMAT = csv_file\r\n );\r\nEND",
75 | "metadata": {},
76 | "outputs": [
77 | {
78 | "output_type": "display_data",
79 | "data": {
80 | "text/html": "Commands completed successfully."
81 | },
82 | "metadata": {}
83 | },
84 | {
85 | "output_type": "display_data",
86 | "data": {
87 | "text/html": "Total execution time: 00:00:00.223"
88 | },
89 | "metadata": {}
90 | }
91 | ],
92 | "execution_count": 5
93 | },
94 | {
95 | "cell_type": "code",
96 | "source": "BEGIN\r\n INSERT INTO web_clickstream_clicks_data_pool\r\n SELECT wcs_click_date_sk\r\n , wcs_click_time_sk \r\n , wcs_sales_sk \r\n , wcs_item_sk \r\n , wcs_web_page_sk \r\n , wcs_user_sk \r\n FROM web_clickstreams_hdfs\r\nEND",
97 | "metadata": {},
98 | "outputs": [
99 | {
100 | "output_type": "display_data",
101 | "data": {
102 | "text/html": "(6770549 rows affected)"
103 | },
104 | "metadata": {}
105 | },
106 | {
107 | "output_type": "display_data",
108 | "data": {
109 | "text/html": "Total execution time: 00:00:42.670"
110 | },
111 | "metadata": {}
112 | }
113 | ],
114 | "execution_count": 6
115 | },
116 | {
117 | "cell_type": "code",
118 | "source": "SELECT count(*) FROM [dbo].[web_clickstream_clicks_data_pool]\r\nSELECT TOP 10 * FROM [dbo].[web_clickstream_clicks_data_pool]",
119 | "metadata": {},
120 | "outputs": [
121 | {
122 | "output_type": "display_data",
123 | "data": {
124 | "text/html": "(1 row affected)"
125 | },
126 | "metadata": {}
127 | },
128 | {
129 | "output_type": "display_data",
130 | "data": {
131 | "text/html": "(10 rows affected)"
132 | },
133 | "metadata": {}
134 | },
135 | {
136 | "output_type": "display_data",
137 | "data": {
138 | "text/html": "Total execution time: 00:00:00.843"
139 | },
140 | "metadata": {}
141 | },
142 | {
143 | "output_type": "execute_result",
144 | "metadata": {},
145 | "execution_count": 7,
146 | "data": {
147 | "application/vnd.dataresource+json": {
148 | "schema": {
149 | "fields": [
150 | {
151 | "name": "(No column name)"
152 | }
153 | ]
154 | },
155 | "data": [
156 | {
157 | "0": "6770549"
158 | }
159 | ]
160 | },
161 | "text/html": ""
162 | }
163 | },
164 | {
165 | "output_type": "execute_result",
166 | "metadata": {},
167 | "execution_count": 7,
168 | "data": {
169 | "application/vnd.dataresource+json": {
170 | "schema": {
171 | "fields": [
172 | {
173 | "name": "wcs_click_date_sk"
174 | },
175 | {
176 | "name": "wcs_click_time_sk"
177 | },
178 | {
179 | "name": "wcs_sales_sk"
180 | },
181 | {
182 | "name": "wcs_item_sk"
183 | },
184 | {
185 | "name": "wcs_web_page_sk"
186 | },
187 | {
188 | "name": "wcs_user_sk"
189 | }
190 | ]
191 | },
192 | "data": [
193 | {
194 | "0": "37775",
195 | "1": "35460",
196 | "2": "NULL",
197 | "3": "7394",
198 | "4": "53",
199 | "5": "NULL"
200 | },
201 | {
202 | "0": "37775",
203 | "1": "12155",
204 | "2": "NULL",
205 | "3": "13157",
206 | "4": "53",
207 | "5": "NULL"
208 | },
209 | {
210 | "0": "37775",
211 | "1": "4880",
212 | "2": "NULL",
213 | "3": "13098",
214 | "4": "53",
215 | "5": "NULL"
216 | },
217 | {
218 | "0": "37775",
219 | "1": "36272",
220 | "2": "NULL",
221 | "3": "6851",
222 | "4": "53",
223 | "5": "NULL"
224 | },
225 | {
226 | "0": "37775",
227 | "1": "24922",
228 | "2": "NULL",
229 | "3": "5198",
230 | "4": "53",
231 | "5": "NULL"
232 | },
233 | {
234 | "0": "37776",
235 | "1": "74100",
236 | "2": "NULL",
237 | "3": "16015",
238 | "4": "53",
239 | "5": "NULL"
240 | },
241 | {
242 | "0": "37776",
243 | "1": "26833",
244 | "2": "NULL",
245 | "3": "12921",
246 | "4": "53",
247 | "5": "NULL"
248 | },
249 | {
250 | "0": "37776",
251 | "1": "72943",
252 | "2": "NULL",
253 | "3": "5015",
254 | "4": "53",
255 | "5": "NULL"
256 | },
257 | {
258 | "0": "37776",
259 | "1": "9387",
260 | "2": "NULL",
261 | "3": "12274",
262 | "4": "53",
263 | "5": "NULL"
264 | },
265 | {
266 | "0": "37776",
267 | "1": "32557",
268 | "2": "NULL",
269 | "3": "11344",
270 | "4": "53",
271 | "5": "NULL"
272 | }
273 | ]
274 | },
275 | "text/html": "wcs_click_date_sk | wcs_click_time_sk | wcs_sales_sk | wcs_item_sk | wcs_web_page_sk | wcs_user_sk |
---|
37775 | 35460 | NULL | 7394 | 53 | NULL |
37775 | 12155 | NULL | 13157 | 53 | NULL |
37775 | 4880 | NULL | 13098 | 53 | NULL |
37775 | 36272 | NULL | 6851 | 53 | NULL |
37775 | 24922 | NULL | 5198 | 53 | NULL |
37776 | 74100 | NULL | 16015 | 53 | NULL |
37776 | 26833 | NULL | 12921 | 53 | NULL |
37776 | 72943 | NULL | 5015 | 53 | NULL |
37776 | 9387 | NULL | 12274 | 53 | NULL |
37776 | 32557 | NULL | 11344 | 53 | NULL |
"
276 | }
277 | }
278 | ],
279 | "execution_count": 7
280 | },
281 | {
282 | "cell_type": "markdown",
283 | "source": "## Next Steps: Continue on to Working with Spark and ETL\r\n\r\nNow you're ready to open the next Python Notebook - `bdc_tutorial_04.ipynb` - to learn how to use Spark for Extracting, Transforming and Loading (ETL) data.",
284 | "metadata": {}
285 | }
286 | ]
287 | }
--------------------------------------------------------------------------------
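
The notebook above builds the Data Pool round trip in four steps: create the `SqlDataPool` external data source, create an external table distributed into the pool, create an external table over the HDFS `/web_logs` directory, and copy rows between them with INSERT...SELECT. As a compact reference, here is a hedged Python sketch that submits those same steps in order through pyodbc. The T-SQL is condensed from the notebook cells, with IF NOT EXISTS guards added so the sketch can be re-run; the connection values are placeholders, and the `csv_file` file format is assumed to exist already, as in the notebook.

import pyodbc

# Sketch only: replay the tutorial's Data Pool steps from Python.
STATEMENTS = [
    # 1. External data source pointing at the SQL Data Pool.
    """IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE name = 'SqlDataPool')
           CREATE EXTERNAL DATA SOURCE SqlDataPool
           WITH (LOCATION = 'sqldatapool://controller-svc/default');""",
    # 2. External table whose rows are stored in the Data Pool instances.
    """IF NOT EXISTS (SELECT 1 FROM sys.external_tables WHERE name = 'web_clickstream_clicks_data_pool')
           CREATE EXTERNAL TABLE web_clickstream_clicks_data_pool
           (wcs_click_date_sk BIGINT, wcs_click_time_sk BIGINT, wcs_sales_sk BIGINT,
            wcs_item_sk BIGINT, wcs_web_page_sk BIGINT, wcs_user_sk BIGINT)
           WITH (DATA_SOURCE = SqlDataPool, DISTRIBUTION = ROUND_ROBIN);""",
    # 3. External table over the CSV files landed in HDFS (Storage Pool).
    """IF NOT EXISTS (SELECT 1 FROM sys.external_tables WHERE name = 'web_clickstreams_hdfs')
           CREATE EXTERNAL TABLE web_clickstreams_hdfs
           (wcs_click_date_sk BIGINT, wcs_click_time_sk BIGINT, wcs_sales_sk BIGINT,
            wcs_item_sk BIGINT, wcs_web_page_sk BIGINT, wcs_user_sk BIGINT)
           WITH (DATA_SOURCE = SqlStoragePool, LOCATION = '/web_logs', FILE_FORMAT = csv_file);""",
    # 4. Fill the Data Pool table from the HDFS external table.
    """INSERT INTO web_clickstream_clicks_data_pool
       SELECT wcs_click_date_sk, wcs_click_time_sk, wcs_sales_sk,
              wcs_item_sk, wcs_web_page_sk, wcs_user_sk
         FROM web_clickstreams_hdfs;""",
]

# Placeholder endpoint and credentials - replace with your own cluster's values.
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=YOUR-MASTER-INSTANCE-IP,31433;"
                      "DATABASE=WideWorldImporters;UID=YOUR-USER;PWD=YOUR-PASSWORD",
                      autocommit=True)
cursor = conn.cursor()
for statement in STATEMENTS:
    cursor.execute(statement)

cursor.execute("SELECT COUNT(*) FROM dbo.web_clickstream_clicks_data_pool;")
print("Rows copied into the Data Pool:", cursor.fetchone()[0])
conn.close()
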
/SQL2019BDC/notebooks/bdc_tutorial_04.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "metadata": {
3 | "kernelspec": {
4 | "name": "pyspark3kernel",
5 | "display_name": "PySpark3"
6 | },
7 | "language_info": {
8 | "name": "pyspark3",
9 | "mimetype": "text/x-python",
10 | "codemirror_mode": {
11 | "name": "python",
12 | "version": 3
13 | },
14 | "pygments_lexer": "python3"
15 | }
16 | },
17 | "nbformat_minor": 2,
18 | "nbformat": 4,
19 | "cells": [
20 | {
21 | "cell_type": "markdown",
22 | "source": "
\r\n
\r\n\r\n# SQL Server 2019 big data cluster Tutorial\r\n## 04 - Using Spark for ETL\r\n\r\nIn this tutorial you will learn how to work with Spark Jobs in a SQL Server big data cluster. \r\n\r\nSpark is often used to transform data at large scale. In this Jupyter Notebook, you'll read a large text file into a Spark DataFrame, save it as a table, and then query the top 10 rows using Spark SQL.",
23 | "metadata": {}
24 | },
25 | {
26 | "cell_type": "code",
27 | "source": "# Read the product reviews CSV files into a Spark DataFrame\r\nresults = spark.read.option(\"inferSchema\", \"true\").csv('/product_review_data').toDF(\"Item_ID\", \"Review\")",
28 | "metadata": {},
29 | "outputs": [
30 | {
31 | "name": "stdout",
32 | "text": "Starting Spark application\n",
33 | "output_type": "stream"
34 | },
35 | {
36 | "data": {
37 | "text/plain": "",
38 | "text/html": "\nID | YARN Application ID | Kind | State | Spark UI | Driver log | Current session? |
---|
0 | application_1561806272028_0001 | pyspark3 | idle | Link | Link | ✔ |
"
39 | },
40 | "metadata": {},
41 | "output_type": "display_data"
42 | },
43 | {
44 | "name": "stdout",
45 | "text": "SparkSession available as 'spark'.\n",
46 | "output_type": "stream"
47 | }
48 | ],
49 | "execution_count": 3
50 | },
51 | {
52 | "cell_type": "code",
53 | "source": "# Save results as parquet file and create hive table\r\nresults.write.format(\"parquet\").mode(\"overwrite\").saveAsTable(\"Top_Product_Reviews\")",
54 | "metadata": {},
55 | "outputs": [],
56 | "execution_count": 4
57 | },
58 | {
59 | "cell_type": "code",
60 | "source": "# Execute Spark SQL commands\r\nsqlDF = spark.sql(\"SELECT * FROM Top_Product_Reviews LIMIT 10\")\r\nsqlDF.show()",
61 | "metadata": {},
62 | "outputs": [
63 | {
64 | "name": "stdout",
65 | "text": "+-------+--------------------+\n|Item_ID| Review|\n+-------+--------------------+\n| 72621|Works fine. Easy ...|\n| 89334|great product to ...|\n| 89335|Next time will go...|\n| 84259|Great Gift Great ...|\n| 84398|After trip to Par...|\n| 66434|Simply the best t...|\n| 66501|This is the exact...|\n| 66587|Not super magnet;...|\n| 66680|Installed as bath...|\n| 66694|Our home was buil...|\n+-------+--------------------+",
66 | "output_type": "stream"
67 | }
68 | ],
69 | "execution_count": 5
70 | },
71 | {
72 | "cell_type": "markdown",
73 | "source": "## Next Steps: Continue on to Working with Spark and Machine Learning\r\n\r\nNow you're ready to open the final Python Notebook in this tutorial series - `bdc_tutorial_05.ipynb` - to learn how to use Spark for Machine Learning.",
74 | "metadata": {}
75 | }
76 | ]
77 | }
--------------------------------------------------------------------------------
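
The ETL cells above rely on `inferSchema`, which makes Spark scan the CSV files once just to guess the column types. A common variation is to declare the schema up front. The sketch below shows that pattern under stated assumptions: it builds its own SparkSession (the notebook's PySpark kernel already provides `spark`), and the `/product_review_data` path and two-column layout are taken from the notebook.

# Sketch: the same read-and-register ETL step with an explicit schema
# instead of inferSchema. Assumes the /product_review_data CSVs contain the
# two columns used in the notebook (an integer item id and a review string).
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.appName("product-review-etl").getOrCreate()

schema = StructType([
    StructField("Item_ID", IntegerType(), nullable=True),
    StructField("Review",  StringType(),  nullable=True),
])

results = spark.read.schema(schema).csv("/product_review_data")

# Register as a table and query it with Spark SQL, as in the notebook.
results.write.format("parquet").mode("overwrite").saveAsTable("Top_Product_Reviews")
spark.sql("SELECT * FROM Top_Product_Reviews LIMIT 10").show(truncate=False)

Declaring the schema avoids the extra pass over the data that inferSchema performs, and it keeps column types stable even if a malformed row appears in one of the files.
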
/SQL2019BDC/notebooks/bdc_tutorial_05.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "metadata": {
3 | "kernelspec": {
4 | "name": "pysparkkernel",
5 | "display_name": "PySpark"
6 | },
7 | "language_info": {
8 | "name": "pyspark",
9 | "mimetype": "text/x-python",
10 | "codemirror_mode": {
11 | "name": "python",
12 | "version": 3
13 | },
14 | "pygments_lexer": "python3"
15 | }
16 | },
17 | "nbformat_minor": 2,
18 | "nbformat": 4,
19 | "cells": [
20 | {
21 | "cell_type": "markdown",
22 | "source": [
23 | "
\r\n",
24 | "
\r\n",
25 | "\r\n",
26 | "# SQL Server 2019 big data cluster Tutorial\r\n",
27 | "## 05 - Using Spark For Machine Learning\r\n",
28 | "\r\n",
29 | "In this tutorial you will learn how to work with Spark Jobs in a SQL Server big data cluster. \r\n",
30 | "\r\n",
31 | "Wide World Importers has refrigerated trucks to deliver temperature-sensitive products. These are high-profit, high-expense items. In the past, there have been failures in the cooling systems, and the primary culprit has been the deep-cycle batteries used in the system.\r\n",
32 | "\r\n",
33 | "WWI began replacing the batteries every three months as a preventative measure, but this has a high cost. Recently, the taxes on recycling batteries have increased dramatically. The CEO has asked the Data Science team to investigate creating a Predictive Maintenance system that can more accurately tell the maintenance staff how long a battery will last, rather than relying on a flat three-month cycle. \r\n",
34 | "\r\n",
35 | "The trucks have sensors that transmit data to a file location. The trips are also logged. In this Jupyter Notebook, you'll create, train and store a Machine Learning model using scikit-learn, so that it can be deployed to multiple hosts. "
36 | ],
37 | "metadata": {
38 | "azdata_cell_guid": "969bbd54-5f8e-49eb-b466-5e05633fa7be"
39 | }
40 | },
41 | {
42 | "cell_type": "code",
43 | "source": [
44 | "import pickle \r\n",
45 | "import pandas as pd\r\n",
46 | "import numpy as np\r\n",
47 | "import datetime as dt\r\n",
48 | "from sklearn.linear_model import LogisticRegression\r\n",
49 | "from sklearn.model_selection import train_test_split"
50 | ],
51 | "metadata": {
52 | "azdata_cell_guid": "03e7dc98-c577-4616-9708-e82908019d40"
53 | },
54 | "outputs": [
55 | {
56 | "output_type": "stream",
57 | "name": "stdout",
58 | "text": "Starting Spark application\n"
59 | },
60 | {
61 | "output_type": "display_data",
62 | "data": {
63 | "text/plain": "",
64 | "text/html": "\nID | YARN Application ID | Kind | State | Spark UI | Driver log | Current session? |
---|
2 | application_1569595626385_0003 | pyspark3 | idle | Link | Link | ✔ |
"
65 | },
66 | "metadata": {}
67 | },
68 | {
69 | "output_type": "stream",
70 | "name": "stdout",
71 | "text": "SparkSession available as 'spark'.\n"
72 | }
73 | ],
74 | "execution_count": 2
75 | },
76 | {
77 | "cell_type": "markdown",
78 | "source": [
79 | "First, download the sensor data from the location where it is transmitted from the trucks, and load it into a Pandas DataFrame."
80 | ],
81 | "metadata": {
82 | "azdata_cell_guid": "18f250c3-d1fd-40e1-a652-10f948c8b0ab"
83 | }
84 | },
85 | {
86 | "cell_type": "code",
87 | "source": [
88 | "df = pd.read_csv('https://sabwoody.blob.core.windows.net/backups/training-formatted.csv', header=0)\r\n",
89 | "\r\n",
90 | "df.dropna()\r\n",
91 | "print(df.shape)\r\n",
92 | "print(list(df.columns))"
93 | ],
94 | "metadata": {
95 | "azdata_cell_guid": "da351ace-6907-411b-a1ab-e3e8a1be8111"
96 | },
97 | "outputs": [
98 | {
99 | "output_type": "stream",
100 | "name": "stdout",
101 | "text": "(10000, 74)\n['Survival_In_Days', 'Province', 'Region', 'Trip_Length_Mean', 'Trip_Length_Sigma', 'Trips_Per_Day_Mean', 'Trips_Per_Day_Sigma', 'Battery_Rated_Cycles', 'Manufacture_Month', 'Manufacture_Year', 'Alternator_Efficiency', 'Car_Has_EcoStart', 'Twelve_hourly_temperature_history_for_last_31_days_before_death_last_recording_first', 'Sensor_Reading_1', 'Sensor_Reading_2', 'Sensor_Reading_3', 'Sensor_Reading_4', 'Sensor_Reading_5', 'Sensor_Reading_6', 'Sensor_Reading_7', 'Sensor_Reading_8', 'Sensor_Reading_9', 'Sensor_Reading_10', 'Sensor_Reading_11', 'Sensor_Reading_12', 'Sensor_Reading_13', 'Sensor_Reading_14', 'Sensor_Reading_15', 'Sensor_Reading_16', 'Sensor_Reading_17', 'Sensor_Reading_18', 'Sensor_Reading_19', 'Sensor_Reading_20', 'Sensor_Reading_21', 'Sensor_Reading_22', 'Sensor_Reading_23', 'Sensor_Reading_24', 'Sensor_Reading_25', 'Sensor_Reading_26', 'Sensor_Reading_27', 'Sensor_Reading_28', 'Sensor_Reading_29', 'Sensor_Reading_30', 'Sensor_Reading_31', 'Sensor_Reading_32', 'Sensor_Reading_33', 'Sensor_Reading_34', 'Sensor_Reading_35', 'Sensor_Reading_36', 'Sensor_Reading_37', 'Sensor_Reading_38', 'Sensor_Reading_39', 'Sensor_Reading_40', 'Sensor_Reading_41', 'Sensor_Reading_42', 'Sensor_Reading_43', 'Sensor_Reading_44', 'Sensor_Reading_45', 'Sensor_Reading_46', 'Sensor_Reading_47', 'Sensor_Reading_48', 'Sensor_Reading_49', 'Sensor_Reading_50', 'Sensor_Reading_51', 'Sensor_Reading_52', 'Sensor_Reading_53', 'Sensor_Reading_54', 'Sensor_Reading_55', 'Sensor_Reading_56', 'Sensor_Reading_57', 'Sensor_Reading_58', 'Sensor_Reading_59', 'Sensor_Reading_60', 'Sensor_Reading_61']"
102 | }
103 | ],
104 | "execution_count": 8
105 | },
106 | {
107 | "cell_type": "markdown",
108 | "source": [
109 | "After examining the data, the Data Science team selects certain columns that they believe are highly predictive of the battery life."
110 | ],
111 | "metadata": {
112 | "azdata_cell_guid": "12249759-7afa-402a-badc-9167ad70b5f1"
113 | }
114 | },
115 | {
116 | "cell_type": "code",
117 | "source": [
118 | "# Select the features used for predicting battery life\r\n",
119 | "x = df.iloc[:,1:74]\r\n",
120 | "x = x.iloc[:,np.r_[2:7, 9:73]]\r\n",
121 | "x = x.interpolate() \r\n",
122 | "\r\n",
123 | "# Select the labels only (the measured battery life) \r\n",
124 | "y = df.iloc[:,0].values.flatten()\r\n",
125 | "print('Interpolation Complete')"
126 | ],
127 | "metadata": {
128 | "azdata_cell_guid": "0806c0d5-7f0d-4528-8ec4-57c21989717f"
129 | },
130 | "outputs": [
131 | {
132 | "output_type": "stream",
133 | "name": "stdout",
134 | "text": "Interpolation Complete"
135 | }
136 | ],
137 | "execution_count": 9
138 | },
139 | {
140 | "cell_type": "code",
141 | "source": [
142 | "# Examine the features selected \r\n",
143 | "print(list(x.columns))"
144 | ],
145 | "metadata": {
146 | "azdata_cell_guid": "05de0ddd-ece4-4005-82a6-02a2cd2946bb"
147 | },
148 | "outputs": [
149 | {
150 | "output_type": "stream",
151 | "name": "stdout",
152 | "text": "['Trip_Length_Mean', 'Trip_Length_Sigma', 'Trips_Per_Day_Mean', 'Trips_Per_Day_Sigma', 'Battery_Rated_Cycles', 'Alternator_Efficiency', 'Car_Has_EcoStart', 'Twelve_hourly_temperature_history_for_last_31_days_before_death_last_recording_first', 'Sensor_Reading_1', 'Sensor_Reading_2', 'Sensor_Reading_3', 'Sensor_Reading_4', 'Sensor_Reading_5', 'Sensor_Reading_6', 'Sensor_Reading_7', 'Sensor_Reading_8', 'Sensor_Reading_9', 'Sensor_Reading_10', 'Sensor_Reading_11', 'Sensor_Reading_12', 'Sensor_Reading_13', 'Sensor_Reading_14', 'Sensor_Reading_15', 'Sensor_Reading_16', 'Sensor_Reading_17', 'Sensor_Reading_18', 'Sensor_Reading_19', 'Sensor_Reading_20', 'Sensor_Reading_21', 'Sensor_Reading_22', 'Sensor_Reading_23', 'Sensor_Reading_24', 'Sensor_Reading_25', 'Sensor_Reading_26', 'Sensor_Reading_27', 'Sensor_Reading_28', 'Sensor_Reading_29', 'Sensor_Reading_30', 'Sensor_Reading_31', 'Sensor_Reading_32', 'Sensor_Reading_33', 'Sensor_Reading_34', 'Sensor_Reading_35', 'Sensor_Reading_36', 'Sensor_Reading_37', 'Sensor_Reading_38', 'Sensor_Reading_39', 'Sensor_Reading_40', 'Sensor_Reading_41', 'Sensor_Reading_42', 'Sensor_Reading_43', 'Sensor_Reading_44', 'Sensor_Reading_45', 'Sensor_Reading_46', 'Sensor_Reading_47', 'Sensor_Reading_48', 'Sensor_Reading_49', 'Sensor_Reading_50', 'Sensor_Reading_51', 'Sensor_Reading_52', 'Sensor_Reading_53', 'Sensor_Reading_54', 'Sensor_Reading_55', 'Sensor_Reading_56', 'Sensor_Reading_57', 'Sensor_Reading_58', 'Sensor_Reading_59', 'Sensor_Reading_60', 'Sensor_Reading_61']"
153 | }
154 | ],
155 | "execution_count": 10
156 | },
157 | {
158 | "cell_type": "markdown",
159 | "source": [
160 | "The lead Data Scientist believes that a standard regression algorithm would produce the best predictions."
161 | ],
162 | "metadata": {
163 | "azdata_cell_guid": "88ff4ca6-d8ec-49ed-8e4e-915f1664527e"
164 | }
165 | },
166 | {
167 | "cell_type": "code",
168 | "source": [
169 | "# Train a regression model \r\n",
170 | "from sklearn.ensemble import GradientBoostingRegressor \r\n",
171 | "model = GradientBoostingRegressor() \r\n",
172 | "model.fit(x,y)"
173 | ],
174 | "metadata": {
175 | "azdata_cell_guid": "0d8162d2-7530-44a7-8318-79ee11ffa210"
176 | },
177 | "outputs": [
178 | {
179 | "output_type": "stream",
180 | "name": "stdout",
181 | "text": "GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,\n learning_rate=0.1, loss='ls', max_depth=3, max_features=None,\n max_leaf_nodes=None, min_impurity_decrease=0.0,\n min_impurity_split=None, min_samples_leaf=1,\n min_samples_split=2, min_weight_fraction_leaf=0.0,\n n_estimators=100, n_iter_no_change=None, presort='auto',\n random_state=None, subsample=1.0, tol=0.0001,\n validation_fraction=0.1, verbose=0, warm_start=False)"
182 | }
183 | ],
184 | "execution_count": 11
185 | },
186 | {
187 | "cell_type": "code",
188 | "source": [
189 | "# Try making a single prediction and observe the result \r\n",
190 | "model.predict(x.iloc[0:1]) "
191 | ],
192 | "metadata": {
193 | "azdata_cell_guid": "11818008-b450-4799-a779-a1b6de81093b"
194 | },
195 | "outputs": [
196 | {
197 | "output_type": "stream",
198 | "name": "stdout",
199 | "text": "array([1323.39791998])"
200 | }
201 | ],
202 | "execution_count": 12
203 | },
204 | {
205 | "cell_type": "markdown",
206 | "source": [
207 | "After the model is trained, perform testing from labeled data."
208 | ],
209 | "metadata": {
210 | "azdata_cell_guid": "bdff0323-079d-4559-861a-345993135749"
211 | }
212 | },
213 | {
214 | "cell_type": "code",
215 | "source": [
216 | "# access the test data by reading it into a Pandas DataFrame \r\n",
217 | "test_data = pd.read_csv('https://sabwoody.blob.core.windows.net/backups/fleet-formatted.csv', header=0)\r\n",
218 | "test_data.dropna()\r\n",
219 | "\r\n",
220 | "# prepare the test data (dropping unused columns) \r\n",
221 | "test_data = test_data.drop(columns=[\"Car_ID\", \"Battery_Age\"])\r\n",
222 | "test_data = test_data.iloc[:,np.r_[2:7, 9:73]]\r\n",
223 | "test_data.rename(columns={'Twelve_hourly_temperature_forecast_for_next_31_days_reversed': 'Twelve_hourly_temperature_history_for_last_31_days_before_death_last_recording_first'}, inplace=True) \r\n",
224 | "# make the battery life predictions for each of the vehicles in the test data \r\n",
225 | "battery_life_predictions = model.predict(test_data) \r\n",
226 | "# examine the prediction \r\n",
227 | "battery_life_predictions"
228 | ],
229 | "metadata": {
230 | "azdata_cell_guid": "25465418-a277-47d8-b258-1e7f9540ffdb"
231 | },
232 | "outputs": [
233 | {
234 | "output_type": "stream",
235 | "name": "stdout",
236 | "text": "array([1472.91111228, 1340.08897725, 1421.38601032, 1473.79033215,\n 1651.66584142, 1412.85617044, 1842.81351408, 1264.22762055,\n 1930.45602533, 1681.86345995])"
237 | }
238 | ],
239 | "execution_count": 13
240 | },
241 | {
242 | "cell_type": "code",
243 | "source": [
244 | "# prepare one data frame that includes predictions for each vehicle \r\n",
245 | "scored_data = test_data \r\n",
246 | "scored_data[\"Estimated_Battery_Life\"] = battery_life_predictions \r\n",
247 | "df_scored = spark.createDataFrame(scored_data) \r\n",
248 | "# Optionally, write out the scored data: \r\n",
249 | "# df_scored.coalesce(1).write.option(\"header\", \"true\").csv(\"/pdm\") "
250 | ],
251 | "metadata": {
252 | "azdata_cell_guid": "160ceaf9-81c1-4bb8-92ca-511a8b928cac"
253 | },
254 | "outputs": [],
255 | "execution_count": 16
256 | },
257 | {
258 | "cell_type": "markdown",
259 | "source": [
260 | "Once you are satisfied with the Model, you can save it out using the \"Pickle\" library for deployment to other systems."
261 | ],
262 | "metadata": {
263 | "azdata_cell_guid": "af10557a-dce4-42b4-8ba7-af1faaad461c"
264 | }
265 | },
266 | {
267 | "cell_type": "code",
268 | "source": [
269 | "pickle_file = open('/tmp/pdm.pkl', 'wb')\r\n",
270 | "pickle.dump(model, pickle_file)\r\n",
271 | "import os\r\n",
272 | "print(os.getcwd())\r\n",
273 | "os.listdir('//tmp/')"
274 | ],
275 | "metadata": {
276 | "azdata_cell_guid": "75c70761-9c37-4446-85f4-71c4d32f6493"
277 | },
278 | "outputs": [
279 | {
280 | "output_type": "stream",
281 | "name": "stdout",
282 | "text": "/tmp/nm-local-dir/usercache/root/appcache/application_1569595626385_0003/container_1569595626385_0003_01_000001\n['nm-local-dir', 'hsperfdata_root', 'hadoop-root-nodemanager.pid', 'hadoop-root-datanode.pid', 'jetty-0.0.0.0-8044-node-_-any-1399597149712545407.dir', 'jetty-localhost-43849-datanode-_-any-4367549175596043164.dir', 'pdm.pkl', 'tmpo7d6l6mt']"
283 | }
284 | ],
285 | "execution_count": 18
286 | },
287 | {
288 | "cell_type": "markdown",
289 | "source": [
290 | "**(Optional)**\r\n",
291 | "\r\n",
292 | "You could export this model and run it at the edge or in SQL Server directly. Here's an example of what that code could look like:\r\n",
293 | "\r\n",
294 | "\r\n",
295 | "\r\n",
296 | "DECLARE @query_string nvarchar(max) -- Query Truck Data\r\n",
297 | "SET @query_string='\r\n",
298 | "SELECT ['Trip_Length_Mean', 'Trip_Length_Sigma', 'Trips_Per_Day_Mean', 'Trips_Per_Day_Sigma', 'Battery_Rated_Cycles', 'Alternator_Efficiency', 'Car_Has_EcoStart', 'Twelve_hourly_temperature_history_for_last_31_days_before_death_last_recording_first', 'Sensor_Reading_1', 'Sensor_Reading_2', 'Sensor_Reading_3', 'Sensor_Reading_4', 'Sensor_Reading_5', 'Sensor_Reading_6', 'Sensor_Reading_7', 'Sensor_Reading_8', 'Sensor_Reading_9', 'Sensor_Reading_10', 'Sensor_Reading_11', 'Sensor_Reading_12', 'Sensor_Reading_13', 'Sensor_Reading_14', 'Sensor_Reading_15', 'Sensor_Reading_16', 'Sensor_Reading_17', 'Sensor_Reading_18', 'Sensor_Reading_19', 'Sensor_Reading_20', 'Sensor_Reading_21', 'Sensor_Reading_22', 'Sensor_Reading_23', 'Sensor_Reading_24', 'Sensor_Reading_25', 'Sensor_Reading_26', 'Sensor_Reading_27', 'Sensor_Reading_28', 'Sensor_Reading_29', 'Sensor_Reading_30', 'Sensor_Reading_31', 'Sensor_Reading_32', 'Sensor_Reading_33', 'Sensor_Reading_34', 'Sensor_Reading_35', 'Sensor_Reading_36', 'Sensor_Reading_37', 'Sensor_Reading_38', 'Sensor_Reading_39', 'Sensor_Reading_40', 'Sensor_Reading_41', 'Sensor_Reading_42', 'Sensor_Reading_43', 'Sensor_Reading_44', 'Sensor_Reading_45', 'Sensor_Reading_46', 'Sensor_Reading_47', 'Sensor_Reading_48', 'Sensor_Reading_49', 'Sensor_Reading_50', 'Sensor_Reading_51', 'Sensor_Reading_52', 'Sensor_Reading_53', 'Sensor_Reading_54', 'Sensor_Reading_55', 'Sensor_Reading_56', 'Sensor_Reading_57', 'Sensor_Reading_58', 'Sensor_Reading_59', 'Sensor_Reading_60', 'Sensor_Reading_61']\r\n",
299 | "FROM Truck_Sensor_Readings'\r\n",
300 | "EXEC [dbo].[PredictBattLife] 'pdm', @query_string;\r\n",
301 | "\r\n",
302 | "
\r\n",
303 | ""
304 | ],
305 | "metadata": {
306 | "azdata_cell_guid": "04647e35-4f0f-4e52-a89c-6d695809da14"
307 | }
308 | },
309 | {
310 | "cell_type": "markdown",
311 | "source": [
312 | "## Next Steps: Continue on to other workloads in SQL Server 2019\r\n",
313 | "\r\n",
314 | "Now you're ready to work with SQL Server 2019's other features - [you can learn more about those here](https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-overview?view=sqlallproducts-allversions)."
315 | ],
316 | "metadata": {
317 | "azdata_cell_guid": "c7122be0-61a8-4b8d-a023-14212908269d"
318 | }
319 | }
320 | ]
321 | }
--------------------------------------------------------------------------------
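
The notebook imports `train_test_split` but trains on the full dataset, and it pickles the model to `/tmp/pdm.pkl` for deployment. The sketch below is a hedged follow-up, not part of the workshop: it holds out a test split to get a rough error estimate before pickling, then reloads the pickle the way a downstream scoring host might. The random arrays are stand-ins for the `x` and `y` prepared in the notebook, so the sketch runs on its own.

# Sketch: evaluate the regressor on a held-out split, then round-trip it
# through pickle the way a scoring host might load it.
import pickle
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Placeholder data so the sketch is self-contained; replace with the
# notebook's feature matrix x and label vector y.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 69))
y = rng.normal(loc=1500, scale=200, size=1000)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

model = GradientBoostingRegressor()
model.fit(x_train, y_train)
print("MAE on held-out data:", mean_absolute_error(y_test, model.predict(x_test)))

# Persist and reload, as a deployment target would.
with open("/tmp/pdm.pkl", "wb") as f:
    pickle.dump(model, f)
with open("/tmp/pdm.pkl", "rb") as f:
    restored = pickle.load(f)
print("Sample prediction from the restored model:", restored.predict(x_test[:1]))
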
/SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc.ssmssln:
--------------------------------------------------------------------------------
1 |
2 | Microsoft Visual Studio Solution File, Format Version 12.00
3 | # SQL Server Management Studio Solution File, Format Version 18.00
4 | VisualStudioVersion = 15.0.27428.2015
5 | MinimumVisualStudioVersion = 10.0.40219.1
6 | Project("{4F2E2C19-372F-40D8-9FA7-9D2138C6997A}") = "SQL Server Scripts for bdc", "SQL Server Scripts for bdc\SQL Server Scripts for bdc.ssmssqlproj", "{3B44EB86-7B99-4EB1-ACAD-31AF5461F4E2}"
7 | EndProject
8 | Global
9 | GlobalSection(SolutionConfigurationPlatforms) = preSolution
10 | Default|Default = Default|Default
11 | EndGlobalSection
12 | GlobalSection(ProjectConfigurationPlatforms) = postSolution
13 | {3B44EB86-7B99-4EB1-ACAD-31AF5461F4E2}.Default|Default.ActiveCfg = Default
14 | {04FC7132-4830-4B67-905B-0279F580D3E8}.Default|Default.ActiveCfg = Default
15 | EndGlobalSection
16 | GlobalSection(SolutionProperties) = preSolution
17 | HideSolutionNode = FALSE
18 | EndGlobalSection
19 | GlobalSection(ExtensibilityGlobals) = postSolution
20 | SolutionGuid = {B0EEFBD1-1FE9-4FF0-95C7-81818B1BD6F8}
21 | EndGlobalSection
22 | EndGlobal
23 |
--------------------------------------------------------------------------------
/SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc/01 - Show Configuration.sql:
--------------------------------------------------------------------------------
1 | /*
2 | Note: To be run in conjunction with the SQL Server 2019 big data clusters course only.
3 |
4 | Show Instance Version */
5 | SELECT @@VERSION;
6 | GO
7 |
8 | /* General Configuration */
9 | USE master;
10 | GO
11 | EXEC sp_configure;
12 | GO
13 |
14 | /* Databases on this Instance */
15 | SELECT db.name AS 'Database Name'
16 | , Physical_Name AS 'Location on Disk'
17 | , Cast(Cast(Round(cast(mf.size as decimal) * 8.0/1024000.0,2) as decimal(18,2)) as nvarchar) 'Size (GB)'
18 | FROM sys.master_files mf
19 | INNER JOIN
20 | sys.databases db ON db.database_id = mf.database_id
21 | WHERE mf.type_desc = 'ROWS';
22 | GO
23 |
24 | SELECT * from sys.master_files
25 |
--------------------------------------------------------------------------------
/SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc/02 - Population Information from WWI.sql:
--------------------------------------------------------------------------------
1 | /* Get some general information about the data in the WWI OLTP system */
2 | USE WideWorldImporters;
3 | GO
4 |
5 | /* Show the Populations.
6 | Where do we have the most people?
7 | */
8 | SELECT TOP 10 CityName as 'City Name'
9 | , StateProvinceName as 'State or Province'
10 | , sp.LatestRecordedPopulation as 'Population'
11 | , CountryName
12 | FROM Application.Cities AS city
13 | JOIN Application.StateProvinces AS sp ON
14 | city.StateProvinceID = sp.StateProvinceID
15 | JOIN Application.Countries AS ctry ON
16 | sp.CountryID=ctry.CountryID
17 | ORDER BY Population DESC, CityName;
18 | GO
--------------------------------------------------------------------------------
/SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc/03 - Sales in WWI.sql:
--------------------------------------------------------------------------------
1 | /* Show Customer Sales in WWI OLTP */
2 | USE WideWorldImporters;
3 | GO
4 |
5 | SELECT TOP 10 s.CustomerID
6 | , s.CustomerName
7 | , sc.CustomerCategoryName
8 | , pp.FullName AS PrimaryContact
9 | , ap.FullName AS AlternateContact
10 | , s.PhoneNumber
11 | , s.FaxNumber
12 | , bg.BuyingGroupName
13 | , s.WebsiteURL
14 | , dm.DeliveryMethodName AS DeliveryMethod
15 | , c.CityName AS CityName
16 | , s.DeliveryLocation AS DeliveryLocation
17 | , s.DeliveryRun
18 | , s.RunPosition
19 | FROM Sales.Customers AS s
20 | LEFT OUTER JOIN Sales.CustomerCategories AS sc
21 | ON s.CustomerCategoryID = sc.CustomerCategoryID
22 | LEFT OUTER JOIN [Application].People AS pp
23 | ON s.PrimaryContactPersonID = pp.PersonID
24 | LEFT OUTER JOIN [Application].People AS ap
25 | ON s.AlternateContactPersonID = ap.PersonID
26 | LEFT OUTER JOIN Sales.BuyingGroups AS bg
27 | ON s.BuyingGroupID = bg.BuyingGroupID
28 | LEFT OUTER JOIN [Application].DeliveryMethods AS dm
29 | ON s.DeliveryMethodID = dm.DeliveryMethodID
30 | LEFT OUTER JOIN [Application].Cities AS c
31 | ON s.DeliveryCityID = c.CityID
32 | ORDER BY c.CityName
--------------------------------------------------------------------------------
/SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc/04 - Join to HDFS.sql:
--------------------------------------------------------------------------------
1 | /* Now Join Those to show customers we currently have in a SQL Server Database
2 | and the Category they belong to in the External Table */
3 | USE WideWorldImporters;
4 | GO
5 |
6 | SELECT TOP 10 a.FullName
7 | , b.CustomerSource
8 | FROM Application.People a
9 | INNER JOIN partner_customers_hdfs b ON a.FullName = b.CustomerName
10 | ORDER BY FullName ASC;
11 | GO
--------------------------------------------------------------------------------
/SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc/05 - Query from Data Pool.sql:
--------------------------------------------------------------------------------
1 | USE WideWorldImporters;
2 | GO
3 |
4 | SELECT count(*) FROM [dbo].[web_clickstream_clicks_data_pool]
5 | SELECT TOP 10 * FROM [dbo].[web_clickstream_clicks_data_pool]
--------------------------------------------------------------------------------
/SQL2019BDC/ssms/SQL Server Scripts for bdc/SQL Server Scripts for bdc/SQL Server Scripts for bdc.ssmssqlproj:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 | 2019-04-29T13:21:46.7659137-04:00
8 | SQL
9 | 104.44.142.150,31433
10 | sa
11 | SQL
12 |
13 | 30
14 | 0
15 | NotSpecified
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 | 8c91a03d-f9b4-46c0-a305-b5dcc79ff907:104.44.142.150,31433:False:sa
24 | 104.44.142.150,31433
25 | sa
26 | 01 - Show Configuration.sql
27 |
28 |
29 | 8c91a03d-f9b4-46c0-a305-b5dcc79ff907:104.44.142.150,31433:False:sa
30 | 104.44.142.150,31433
31 | sa
32 | 02 - Population Information from WWI.sql
33 |
34 |
35 | 8c91a03d-f9b4-46c0-a305-b5dcc79ff907:104.44.142.150,31433:False:sa
36 | 104.44.142.150,31433
37 | sa
38 | 03 - Sales in WWI.sql
39 |
40 |
41 | 8c91a03d-f9b4-46c0-a305-b5dcc79ff907:104.44.142.150,31433:False:sa
42 | 104.44.142.150,31433
43 | sa
44 | 04 - Join to HDFS.sql
45 |
46 |
47 | 8c91a03d-f9b4-46c0-a305-b5dcc79ff907:104.44.142.150,31433:False:sa
48 | 104.44.142.150,31433
49 | sa
50 | 05 - Query from Data Pool.sql
51 |
52 |
53 |
54 |
55 |
56 |
57 |
58 |
--------------------------------------------------------------------------------
/graphics/ADS-5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/ADS-5.png
--------------------------------------------------------------------------------
/graphics/KubernetesCluster.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/KubernetesCluster.png
--------------------------------------------------------------------------------
/graphics/WWI-001.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/WWI-001.png
--------------------------------------------------------------------------------
/graphics/WWI-002.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/WWI-002.png
--------------------------------------------------------------------------------
/graphics/WWI-003.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/WWI-003.png
--------------------------------------------------------------------------------
/graphics/WWI-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/WWI-logo.png
--------------------------------------------------------------------------------
/graphics/adf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/adf.png
--------------------------------------------------------------------------------
/graphics/ads-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/ads-1.png
--------------------------------------------------------------------------------
/graphics/ads-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/ads-2.png
--------------------------------------------------------------------------------
/graphics/ads-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/ads-3.png
--------------------------------------------------------------------------------
/graphics/ads-4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/ads-4.png
--------------------------------------------------------------------------------
/graphics/ads.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/ads.png
--------------------------------------------------------------------------------
/graphics/aks1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/aks1.png
--------------------------------------------------------------------------------
/graphics/aks2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/aks2.png
--------------------------------------------------------------------------------
/graphics/bdc-security-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdc-security-1.png
--------------------------------------------------------------------------------
/graphics/bdc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdc.png
--------------------------------------------------------------------------------
/graphics/bdcportal.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdcportal.png
--------------------------------------------------------------------------------
/graphics/bdcsolution1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdcsolution1.png
--------------------------------------------------------------------------------
/graphics/bdcsolution2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdcsolution2.png
--------------------------------------------------------------------------------
/graphics/bdcsolution3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdcsolution3.png
--------------------------------------------------------------------------------
/graphics/bdcsolution4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bdcsolution4.png
--------------------------------------------------------------------------------
/graphics/bookpencil.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bookpencil.png
--------------------------------------------------------------------------------
/graphics/building1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/building1.png
--------------------------------------------------------------------------------
/graphics/bulletlist.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/bulletlist.png
--------------------------------------------------------------------------------
/graphics/checkbox.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/checkbox.png
--------------------------------------------------------------------------------
/graphics/checkmark.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/checkmark.png
--------------------------------------------------------------------------------
/graphics/clipboardcheck.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/clipboardcheck.png
--------------------------------------------------------------------------------
/graphics/cloud1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/cloud1.png
--------------------------------------------------------------------------------
/graphics/datamart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/datamart.png
--------------------------------------------------------------------------------
/graphics/datamart1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/datamart1.png
--------------------------------------------------------------------------------
/graphics/datavirtualization.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/datavirtualization.png
--------------------------------------------------------------------------------
/graphics/datavirtualization1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/datavirtualization1.png
--------------------------------------------------------------------------------
/graphics/education1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/education1.png
--------------------------------------------------------------------------------
/graphics/factory.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/factory.png
--------------------------------------------------------------------------------
/graphics/geopin.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/geopin.png
--------------------------------------------------------------------------------
/graphics/grafana.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/grafana.png
--------------------------------------------------------------------------------
/graphics/hdfs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/hdfs.png
--------------------------------------------------------------------------------
/graphics/kibana.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/kibana.png
--------------------------------------------------------------------------------
/graphics/kubectl.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/kubectl.png
--------------------------------------------------------------------------------
/graphics/kubernetes1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/kubernetes1.png
--------------------------------------------------------------------------------
/graphics/listcheck.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/listcheck.png
--------------------------------------------------------------------------------
/graphics/microsoftlogo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/microsoftlogo.png
--------------------------------------------------------------------------------
/graphics/owl.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/owl.png
--------------------------------------------------------------------------------
/graphics/paperclip1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/paperclip1.png
--------------------------------------------------------------------------------
/graphics/pencil2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/pencil2.png
--------------------------------------------------------------------------------
/graphics/pinmap.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/pinmap.png
--------------------------------------------------------------------------------
/graphics/point1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/point1.png
--------------------------------------------------------------------------------
/graphics/solutiondiagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/solutiondiagram.png
--------------------------------------------------------------------------------
/graphics/spark1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/spark1.png
--------------------------------------------------------------------------------
/graphics/spark2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/spark2.png
--------------------------------------------------------------------------------
/graphics/spark3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/spark3.png
--------------------------------------------------------------------------------
/graphics/spark4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/spark4.png
--------------------------------------------------------------------------------
/graphics/sqlbdc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/sqlbdc.png
--------------------------------------------------------------------------------
/graphics/textbubble.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-bdc/e11e491f6753fca3b025cc7fa789ee026d8f02d0/graphics/textbubble.png
--------------------------------------------------------------------------------