├── LICENSES
│   ├── CODE_OF_CONDUCT.md
│   ├── LICENSE
│   ├── LICENSE-CODE
│   └── SECURITY.md
├── PythonForDataProfessionals
│   ├── 00 Pre-Requisites.md
│   ├── 01 Overview and Course Setup.md
│   ├── 02 Programming Basics.md
│   ├── 03 Working with Data.md
│   ├── 04 Environments and Deployment.md
│   ├── Python for Data Professionals.pyproj
│   ├── Python for Data Professionals_hdi_settings.json
│   ├── assets
│   │   ├── MLCheatSheet.png
│   │   ├── NoStarchPressPython.pdf
│   │   ├── NumpyPythonCheatSheet.pdf
│   │   ├── PandasCheatSheet.pdf
│   │   ├── Python3CheatSheet.pdf
│   │   └── UseCases.png
│   ├── code
│   │   ├── 01_OverviewAndCourseSetup.py
│   │   ├── 02_ProgrammingBasics.py
│   │   ├── 03_WorkingWithData.py
│   │   └── 04_EnvrionmentsAndDeployment.py
│   ├── data
│   │   └── CATelcoCustomerChurnTrainingSample.csv
│   ├── graphics
│   │   ├── AnalyticsAreas.png
│   │   ├── DataScience.png
│   │   ├── MLCapabilities.png
│   │   ├── MatPlotLib.png
│   │   ├── SmallBuck.png
│   │   ├── aml-logo.png
│   │   ├── brain.png
│   │   ├── check.png
│   │   ├── checkbox.png
│   │   ├── checkmark.jpg
│   │   ├── cortanalogo.png
│   │   ├── files.jpg
│   │   ├── ggplot.png
│   │   ├── keyboard.jpg
│   │   ├── microsoftlogo.png
│   │   ├── pin.jpg
│   │   ├── solutions-microsoft-logo-small.png
│   │   ├── tdsp.png
│   │   └── thinking.jpg
│   ├── html
│   │   └── 00 Pre-Requisites.html
│   └── notebooks
│       ├── .ipynb_checkpoints
│       │   ├── 00 Pre-Requisites-checkpoint.ipynb
│       │   ├── 01 Overview and Setup-checkpoint.ipynb
│       │   ├── 02 Programming Basics-checkpoint.ipynb
│       │   ├── 03 Working with Data-checkpoint.ipynb
│       │   └── 04 Environments and Deployment-checkpoint.ipynb
│       ├── 00 Pre-Requisites.ipynb
│       ├── 01 Overview and Setup.ipynb
│       ├── 02 Programming Basics.ipynb
│       ├── 03 Working with Data.ipynb
│       └── 04 Environments and Deployment.ipynb
├── README.md
├── SECURITY.md
└── graphics
    ├── AnalyticsAreas.png
    ├── DataScience.png
    ├── MLCapabilities.png
    ├── MatPlotLib.png
    ├── SmallBuck.png
    ├── aml-logo.png
    ├── brain.png
    ├── check.png
    ├── checkbox.png
    ├── checkmark.jpg
    ├── cortanalogo.png
    ├── files.jpg
    ├── ggplot.png
    ├── keyboard.jpg
    ├── microsoftlogo.png
    ├── pin.jpg
    ├── solutions-microsoft-logo-small.png
    ├── tdsp.png
    └── thinking.jpg
/LICENSES/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Microsoft Open Source Code of Conduct
2 |
3 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
4 |
5 | Resources:
6 |
7 | - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
8 | - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
9 | - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
10 |
--------------------------------------------------------------------------------
/LICENSES/LICENSE:
--------------------------------------------------------------------------------
1 | Attribution 4.0 International
2 |
3 | =======================================================================
4 |
5 | Creative Commons Corporation ("Creative Commons") is not a law firm and
6 | does not provide legal services or legal advice. Distribution of
7 | Creative Commons public licenses does not create a lawyer-client or
8 | other relationship. Creative Commons makes its licenses and related
9 | information available on an "as-is" basis. Creative Commons gives no
10 | warranties regarding its licenses, any material licensed under their
11 | terms and conditions, or any related information. Creative Commons
12 | disclaims all liability for damages resulting from their use to the
13 | fullest extent possible.
14 |
15 | Using Creative Commons Public Licenses
16 |
17 | Creative Commons public licenses provide a standard set of terms and
18 | conditions that creators and other rights holders may use to share
19 | original works of authorship and other material subject to copyright
20 | and certain other rights specified in the public license below. The
21 | following considerations are for informational purposes only, are not
22 | exhaustive, and do not form part of our licenses.
23 |
24 | Considerations for licensors: Our public licenses are
25 | intended for use by those authorized to give the public
26 | permission to use material in ways otherwise restricted by
27 | copyright and certain other rights. Our licenses are
28 | irrevocable. Licensors should read and understand the terms
29 | and conditions of the license they choose before applying it.
30 | Licensors should also secure all rights necessary before
31 | applying our licenses so that the public can reuse the
32 | material as expected. Licensors should clearly mark any
33 | material not subject to the license. This includes other CC-
34 | licensed material, or material used under an exception or
35 | limitation to copyright. More considerations for licensors:
36 | wiki.creativecommons.org/Considerations_for_licensors
37 |
38 | Considerations for the public: By using one of our public
39 | licenses, a licensor grants the public permission to use the
40 | licensed material under specified terms and conditions. If
41 | the licensor's permission is not necessary for any reason--for
42 | example, because of any applicable exception or limitation to
43 | copyright--then that use is not regulated by the license. Our
44 | licenses grant only permissions under copyright and certain
45 | other rights that a licensor has authority to grant. Use of
46 | the licensed material may still be restricted for other
47 | reasons, including because others have copyright or other
48 | rights in the material. A licensor may make special requests,
49 | such as asking that all changes be marked or described.
50 | Although not required by our licenses, you are encouraged to
51 | respect those requests where reasonable. More_considerations
52 | for the public:
53 | wiki.creativecommons.org/Considerations_for_licensees
54 |
55 | =======================================================================
56 |
57 | Creative Commons Attribution 4.0 International Public License
58 |
59 | By exercising the Licensed Rights (defined below), You accept and agree
60 | to be bound by the terms and conditions of this Creative Commons
61 | Attribution 4.0 International Public License ("Public License"). To the
62 | extent this Public License may be interpreted as a contract, You are
63 | granted the Licensed Rights in consideration of Your acceptance of
64 | these terms and conditions, and the Licensor grants You such rights in
65 | consideration of benefits the Licensor receives from making the
66 | Licensed Material available under these terms and conditions.
67 |
68 |
69 | Section 1 -- Definitions.
70 |
71 | a. Adapted Material means material subject to Copyright and Similar
72 | Rights that is derived from or based upon the Licensed Material
73 | and in which the Licensed Material is translated, altered,
74 | arranged, transformed, or otherwise modified in a manner requiring
75 | permission under the Copyright and Similar Rights held by the
76 | Licensor. For purposes of this Public License, where the Licensed
77 | Material is a musical work, performance, or sound recording,
78 | Adapted Material is always produced where the Licensed Material is
79 | synched in timed relation with a moving image.
80 |
81 | b. Adapter's License means the license You apply to Your Copyright
82 | and Similar Rights in Your contributions to Adapted Material in
83 | accordance with the terms and conditions of this Public License.
84 |
85 | c. Copyright and Similar Rights means copyright and/or similar rights
86 | closely related to copyright including, without limitation,
87 | performance, broadcast, sound recording, and Sui Generis Database
88 | Rights, without regard to how the rights are labeled or
89 | categorized. For purposes of this Public License, the rights
90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar
91 | Rights.
92 |
93 | d. Effective Technological Measures means those measures that, in the
94 | absence of proper authority, may not be circumvented under laws
95 | fulfilling obligations under Article 11 of the WIPO Copyright
96 | Treaty adopted on December 20, 1996, and/or similar international
97 | agreements.
98 |
99 | e. Exceptions and Limitations means fair use, fair dealing, and/or
100 | any other exception or limitation to Copyright and Similar Rights
101 | that applies to Your use of the Licensed Material.
102 |
103 | f. Licensed Material means the artistic or literary work, database,
104 | or other material to which the Licensor applied this Public
105 | License.
106 |
107 | g. Licensed Rights means the rights granted to You subject to the
108 | terms and conditions of this Public License, which are limited to
109 | all Copyright and Similar Rights that apply to Your use of the
110 | Licensed Material and that the Licensor has authority to license.
111 |
112 | h. Licensor means the individual(s) or entity(ies) granting rights
113 | under this Public License.
114 |
115 | i. Share means to provide material to the public by any means or
116 | process that requires permission under the Licensed Rights, such
117 | as reproduction, public display, public performance, distribution,
118 | dissemination, communication, or importation, and to make material
119 | available to the public including in ways that members of the
120 | public may access the material from a place and at a time
121 | individually chosen by them.
122 |
123 | j. Sui Generis Database Rights means rights other than copyright
124 | resulting from Directive 96/9/EC of the European Parliament and of
125 | the Council of 11 March 1996 on the legal protection of databases,
126 | as amended and/or succeeded, as well as other essentially
127 | equivalent rights anywhere in the world.
128 |
129 | k. You means the individual or entity exercising the Licensed Rights
130 | under this Public License. Your has a corresponding meaning.
131 |
132 |
133 | Section 2 -- Scope.
134 |
135 | a. License grant.
136 |
137 | 1. Subject to the terms and conditions of this Public License,
138 | the Licensor hereby grants You a worldwide, royalty-free,
139 | non-sublicensable, non-exclusive, irrevocable license to
140 | exercise the Licensed Rights in the Licensed Material to:
141 |
142 | a. reproduce and Share the Licensed Material, in whole or
143 | in part; and
144 |
145 | b. produce, reproduce, and Share Adapted Material.
146 |
147 | 2. Exceptions and Limitations. For the avoidance of doubt, where
148 | Exceptions and Limitations apply to Your use, this Public
149 | License does not apply, and You do not need to comply with
150 | its terms and conditions.
151 |
152 | 3. Term. The term of this Public License is specified in Section
153 | 6(a).
154 |
155 | 4. Media and formats; technical modifications allowed. The
156 | Licensor authorizes You to exercise the Licensed Rights in
157 | all media and formats whether now known or hereafter created,
158 | and to make technical modifications necessary to do so. The
159 | Licensor waives and/or agrees not to assert any right or
160 | authority to forbid You from making technical modifications
161 | necessary to exercise the Licensed Rights, including
162 | technical modifications necessary to circumvent Effective
163 | Technological Measures. For purposes of this Public License,
164 | simply making modifications authorized by this Section 2(a)
165 | (4) never produces Adapted Material.
166 |
167 | 5. Downstream recipients.
168 |
169 | a. Offer from the Licensor -- Licensed Material. Every
170 | recipient of the Licensed Material automatically
171 | receives an offer from the Licensor to exercise the
172 | Licensed Rights under the terms and conditions of this
173 | Public License.
174 |
175 | b. No downstream restrictions. You may not offer or impose
176 | any additional or different terms or conditions on, or
177 | apply any Effective Technological Measures to, the
178 | Licensed Material if doing so restricts exercise of the
179 | Licensed Rights by any recipient of the Licensed
180 | Material.
181 |
182 | 6. No endorsement. Nothing in this Public License constitutes or
183 | may be construed as permission to assert or imply that You
184 | are, or that Your use of the Licensed Material is, connected
185 | with, or sponsored, endorsed, or granted official status by,
186 | the Licensor or others designated to receive attribution as
187 | provided in Section 3(a)(1)(A)(i).
188 |
189 | b. Other rights.
190 |
191 | 1. Moral rights, such as the right of integrity, are not
192 | licensed under this Public License, nor are publicity,
193 | privacy, and/or other similar personality rights; however, to
194 | the extent possible, the Licensor waives and/or agrees not to
195 | assert any such rights held by the Licensor to the limited
196 | extent necessary to allow You to exercise the Licensed
197 | Rights, but not otherwise.
198 |
199 | 2. Patent and trademark rights are not licensed under this
200 | Public License.
201 |
202 | 3. To the extent possible, the Licensor waives any right to
203 | collect royalties from You for the exercise of the Licensed
204 | Rights, whether directly or through a collecting society
205 | under any voluntary or waivable statutory or compulsory
206 | licensing scheme. In all other cases the Licensor expressly
207 | reserves any right to collect such royalties.
208 |
209 |
210 | Section 3 -- License Conditions.
211 |
212 | Your exercise of the Licensed Rights is expressly made subject to the
213 | following conditions.
214 |
215 | a. Attribution.
216 |
217 | 1. If You Share the Licensed Material (including in modified
218 | form), You must:
219 |
220 | a. retain the following if it is supplied by the Licensor
221 | with the Licensed Material:
222 |
223 | i. identification of the creator(s) of the Licensed
224 | Material and any others designated to receive
225 | attribution, in any reasonable manner requested by
226 | the Licensor (including by pseudonym if
227 | designated);
228 |
229 | ii. a copyright notice;
230 |
231 | iii. a notice that refers to this Public License;
232 |
233 | iv. a notice that refers to the disclaimer of
234 | warranties;
235 |
236 | v. a URI or hyperlink to the Licensed Material to the
237 | extent reasonably practicable;
238 |
239 | b. indicate if You modified the Licensed Material and
240 | retain an indication of any previous modifications; and
241 |
242 | c. indicate the Licensed Material is licensed under this
243 | Public License, and include the text of, or the URI or
244 | hyperlink to, this Public License.
245 |
246 | 2. You may satisfy the conditions in Section 3(a)(1) in any
247 | reasonable manner based on the medium, means, and context in
248 | which You Share the Licensed Material. For example, it may be
249 | reasonable to satisfy the conditions by providing a URI or
250 | hyperlink to a resource that includes the required
251 | information.
252 |
253 | 3. If requested by the Licensor, You must remove any of the
254 | information required by Section 3(a)(1)(A) to the extent
255 | reasonably practicable.
256 |
257 | 4. If You Share Adapted Material You produce, the Adapter's
258 | License You apply must not prevent recipients of the Adapted
259 | Material from complying with this Public License.
260 |
261 |
262 | Section 4 -- Sui Generis Database Rights.
263 |
264 | Where the Licensed Rights include Sui Generis Database Rights that
265 | apply to Your use of the Licensed Material:
266 |
267 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right
268 | to extract, reuse, reproduce, and Share all or a substantial
269 | portion of the contents of the database;
270 |
271 | b. if You include all or a substantial portion of the database
272 | contents in a database in which You have Sui Generis Database
273 | Rights, then the database in which You have Sui Generis Database
274 | Rights (but not its individual contents) is Adapted Material; and
275 |
276 | c. You must comply with the conditions in Section 3(a) if You Share
277 | all or a substantial portion of the contents of the database.
278 |
279 | For the avoidance of doubt, this Section 4 supplements and does not
280 | replace Your obligations under this Public License where the Licensed
281 | Rights include other Copyright and Similar Rights.
282 |
283 |
284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability.
285 |
286 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
296 |
297 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
306 |
307 | c. The disclaimer of warranties and limitation of liability provided
308 | above shall be interpreted in a manner that, to the extent
309 | possible, most closely approximates an absolute disclaimer and
310 | waiver of all liability.
311 |
312 |
313 | Section 6 -- Term and Termination.
314 |
315 | a. This Public License applies for the term of the Copyright and
316 | Similar Rights licensed here. However, if You fail to comply with
317 | this Public License, then Your rights under this Public License
318 | terminate automatically.
319 |
320 | b. Where Your right to use the Licensed Material has terminated under
321 | Section 6(a), it reinstates:
322 |
323 | 1. automatically as of the date the violation is cured, provided
324 | it is cured within 30 days of Your discovery of the
325 | violation; or
326 |
327 | 2. upon express reinstatement by the Licensor.
328 |
329 | For the avoidance of doubt, this Section 6(b) does not affect any
330 | right the Licensor may have to seek remedies for Your violations
331 | of this Public License.
332 |
333 | c. For the avoidance of doubt, the Licensor may also offer the
334 | Licensed Material under separate terms or conditions or stop
335 | distributing the Licensed Material at any time; however, doing so
336 | will not terminate this Public License.
337 |
338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
339 | License.
340 |
341 |
342 | Section 7 -- Other Terms and Conditions.
343 |
344 | a. The Licensor shall not be bound by any additional or different
345 | terms or conditions communicated by You unless expressly agreed.
346 |
347 | b. Any arrangements, understandings, or agreements regarding the
348 | Licensed Material not stated herein are separate from and
349 | independent of the terms and conditions of this Public License.
350 |
351 |
352 | Section 8 -- Interpretation.
353 |
354 | a. For the avoidance of doubt, this Public License does not, and
355 | shall not be interpreted to, reduce, limit, restrict, or impose
356 | conditions on any use of the Licensed Material that could lawfully
357 | be made without permission under this Public License.
358 |
359 | b. To the extent possible, if any provision of this Public License is
360 | deemed unenforceable, it shall be automatically reformed to the
361 | minimum extent necessary to make it enforceable. If the provision
362 | cannot be reformed, it shall be severed from this Public License
363 | without affecting the enforceability of the remaining terms and
364 | conditions.
365 |
366 | c. No term or condition of this Public License will be waived and no
367 | failure to comply consented to unless expressly agreed to by the
368 | Licensor.
369 |
370 | d. Nothing in this Public License constitutes or may be interpreted
371 | as a limitation upon, or waiver of, any privileges and immunities
372 | that apply to the Licensor or You, including from the legal
373 | processes of any jurisdiction or authority.
374 |
375 |
376 | =======================================================================
377 |
378 | Creative Commons is not a party to its public
379 | licenses. Notwithstanding, Creative Commons may elect to apply one of
380 | its public licenses to material it publishes and in those instances
381 | will be considered the “Licensor.” The text of the Creative Commons
382 | public licenses is dedicated to the public domain under the CC0 Public
383 | Domain Dedication. Except for the limited purpose of indicating that
384 | material is shared under a Creative Commons public license or as
385 | otherwise permitted by the Creative Commons policies published at
386 | creativecommons.org/policies, Creative Commons does not authorize the
387 | use of the trademark "Creative Commons" or any other trademark or logo
388 | of Creative Commons without its prior written consent including,
389 | without limitation, in connection with any unauthorized modifications
390 | to any of its public licenses or any other arrangements,
391 | understandings, or agreements concerning use of licensed material. For
392 | the avoidance of doubt, this paragraph does not form part of the
393 | public licenses.
394 |
395 | Creative Commons may be contacted at creativecommons.org.
--------------------------------------------------------------------------------
/LICENSES/LICENSE-CODE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) Microsoft Corporation.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE
22 |
--------------------------------------------------------------------------------
/LICENSES/SECURITY.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | ## Security
4 |
5 | Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).
6 |
7 | If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below.
8 |
9 | ## Reporting Security Issues
10 |
11 | **Please do not report security vulnerabilities through public GitHub issues.**
12 |
13 | Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report).
14 |
15 | If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).
16 |
17 | You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc).
18 |
19 | Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
20 |
21 | * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
22 | * Full paths of source file(s) related to the manifestation of the issue
23 | * The location of the affected source code (tag/branch/commit or direct URL)
24 | * Any special configuration required to reproduce the issue
25 | * Step-by-step instructions to reproduce the issue
26 | * Proof-of-concept or exploit code (if possible)
27 | * Impact of the issue, including how an attacker might exploit the issue
28 |
29 | This information will help us triage your report more quickly.
30 |
31 | If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs.
32 |
33 | ## Preferred Languages
34 |
35 | We prefer all communications to be in English.
36 |
37 | ## Policy
38 |
39 | Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd).
40 |
41 |
--------------------------------------------------------------------------------
/PythonForDataProfessionals/00 Pre-Requisites.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Python for Data Professionals
4 |
5 | ## 00 Pre-Requisites
6 |
7 | The "Python for Data Professionals" course is taught using Microsoft Windows, SQL Server, and Visual Studio. You can, of course, use Python on many platforms, in other distributions, and with other tools, but this configuration keeps the instruction consistent during the course. Feel free to use other installations after you complete the course.
8 |
9 | *Note that all of the following activities must be completed prior to class - there will not be time to perform these operations during the course.*
10 |
11 | ## Activity 1: Set up the Windows Operating System
12 |
13 | You have three options for setting up Microsoft Windows to complete this course. You can use a local installation of Windows, a virtual machine on your local system, or a virtual machine hosted by a cloud provider such as Microsoft Azure. *(The third option is only for classrooms where you have reliable connections to the Internet.)*
14 |
15 | ### Option 1 - Local Installation
16 |
17 | - Install a recent version of Microsoft Windows. For this course, Windows 10 or any current version of Windows Server is acceptable.
18 | - Install all updates to the operating system.
19 |
20 | ### Option 2 - Install Windows on a Local Virtual Machine Environment
21 |
22 | - Using your local system, [navigate to this resource](https://developer.microsoft.com/en-us/windows/downloads/virtual-machines) and follow the instructions there.
23 |
24 | **NOTE: Set this up as close to the class date as reasonably possible so that the system does not expire during the course - these are free licenses, but they have a time limit.**
25 |
26 | - You can also use whatever Hypervisor you like for your system and install a legal, registered copy of Microsoft Windows.
27 |
28 | ### Option 3 - Use a Virtual Machine in a Cloud Provider
29 |
30 | - If you have access to the Internet, you can set up a [free Microsoft Azure Account](https://azure.microsoft.com/en-us/free/search/?&OCID=AID631184_SEM_bSHIQHtA&lnkd=Google_Azure_Brand&gclid=Cj0KCQjwpcLZBRCnARIsAMPBgF2myLWEk3Hllm2354GEs0rD1sDST_xcfkFGRdAE8toYZMalbQJ4M3YaAs9UEALw_wcB&dclid=CPDRgcv57tsCFVXE4Qodo-gLzg) and use a [Data Science Virtual Machine](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/provision-vm). Any size will do, and the free account provides enough resources for a single course. You will not need to install Anaconda, VSCode or SQL Server if you use this choice, as they are already installed for you.
31 | - Log in to the system and run [Windows Update](https://support.microsoft.com/en-us/help/4027667/windows-update-windows-10)
32 |
33 | ## Activity 2: Install SQL Server 2017 with ML Services
34 |
35 | - [Navigate to this resource](https://www.microsoft.com/en-us/sql-server/sql-server-downloads), select **Developer** from the lower part of the page, and install the **Developer Edition**. Select all components for installation.
36 |
37 | - Run Windows Update and select the ["Install updates for other products" option](https://www.lifewire.com/how-to-change-windows-update-settings-2625778). Apply the latest updates to the classroom system.
38 |
39 | ## Activity 3: Install Visual Studio with Machine Learning and Data Science workloads
40 |
41 | - On your classroom system, [install Visual Studio 2017](https://www.visualstudio.com/downloads/) - The free Community Edition is adequate for this course.
42 |
43 | - During the installation, select the "Data storage and processing" and "Data science and analytical applications" Workloads. *(NOTE: [In the Data Science Workload installation box, select ALL optional components on the Summary pane!](https://blogs.msdn.microsoft.com/visualstudio/2016/11/18/data-science-workloads-in-visual-studio-2017-rc/))*
44 |
45 | - Log in with a Live ID to Visual Studio, let the system load, and apply any updates.
46 |
47 | - After the updates complete, click the "R Tools" menu item and open the "Interactive R Window" option (this verifies that the Data Science Workloads add-ins for R and Python are working). Type the following in that panel to ensure the installation was successful:
48 |
49 | `x <- 10`
50 |
51 | `x`
52 |
53 | You should see the result **\[1\] 10** returned. If not, open the Visual Studio Installer and select the "Repair" option.
54 |
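Because the course itself uses Python, a comparable check can be run for the Python side. The following is a minimal sketch, assuming a Python 3 interpreter is reachable from an interactive Python window in Visual Studio or from a command prompt; the variable name simply mirrors the R check above and is only illustrative.

```python
import sys

# Mirror the R check: assign a value, then display it.
x = 10
print(x)

# Report which interpreter is running, to confirm that the
# expected installation is the one being used.
print(sys.executable)
print(sys.version_info.major)
```

If the first line of output is not `10`, or the major version is not `3`, the Python portion of the installation may need the same "Repair" step.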
55 | ## For Further Study
56 |
57 | - Platforms supported: https://www.python.org/download/other/
58 |
59 | - Installing Python: https://www.python.org/downloads/
60 |
61 | - Installing Python using Anaconda: https://www.infoworld.com/article/3267976/python/anaconda-cpython-pypy-and-more-know-your-python-distributions.html
62 |
63 | Next, continue to *01 Overview and Course Setup*
64 |
--------------------------------------------------------------------------------
/PythonForDataProfessionals/01 Overview and Course Setup.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Python for Data Professionals
4 |
5 | ## 01 Overview and Setup
6 |
7 | In this course you'll cover the basics of the Python language and environment from a Data Professional's perspective. While you will learn Python, you'll quickly cover topics that have a lot more depth available. In each section you'll get more references to go deeper, which you should follow up on. Also watch for links within the text - click on each one to explore that topic.
8 |
9 | Make sure you check out the **00 Pre-Requisites** page before you start. You'll need all of the items loaded there before you can proceed with the course.
10 |
11 | You'll cover these topics in the course:
12 |
13 |
14 |
15 |
16 | - Course Outline
17 | - 1 - Overview and Course Setup (This section)
18 | - 2 - Programming Basics
19 | - 3 Working with Data
20 | - 4 Deployment and Environments
21 |
22 |
23 |
24 |
25 |
Overview
26 |
27 | There are two main versions of Python - 2 and 3. So many programs were written for version 2 that it is still around, and version 3 was such an upgrade that programs for 2 don't always run in 3 and vice versa. For this course we'll do everything in version 3 - it's becoming the accepted standard for data professionals.
28 |
29 | You have a few ways of working with Python:
30 |
31 | - The Interactive Interpreter (Type `python` and the version number if it is in your path)
32 | - Writing code and running it in some graphical environment (Such as VSCode, Visual Studio, Spyder, PyCharm, IDLE, etc.)
33 | - Calling a `.py` script file from the `python` command
34 |
35 | When you're in command-mode (the interactive interpreter), Python feels more like a scripting shell: you type a statement and see the result right away, so an expression such as `2 + 2` shows its value without being wrapped in `print()`. Programming-mode looks like a standard programming language environment - you'll normally use that within an Integrated Development Environment (IDE).
36 |
37 |
Activity: Verify Your Installation and Configure Python
38 |
39 | Open the **01_OverviewAndCourseSetup.py** file and run the code you see there. The exercises will be marked out using comments:
40 |
41 |
42 | # TODO - Section Number
43 |
44 |
45 |
For Further Study
46 |
47 | - Version differences: https://wiki.python.org/moin/Python2orPython3
48 | - Development Environments (IDLE, tk, VSCode, PyCharm, Jupyter Notebooks), Documentation, and Training Resources: https://www.python.org/doc/
50 | - The Official Python Documentation Course: https://docs.python.org/3/tutorial/index.html
51 |
52 | Next, Continue to *02 Programming Basics*
--------------------------------------------------------------------------------
/PythonForDataProfessionals/02 Programming Basics.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Python for Data Professionals
4 |
5 | ## 02 Programming Basics
6 |
7 |
8 |
9 |
10 | - Course Outline
11 | - 1 - Overview and Course Setup
12 | - 2 - Programming Basics (This section)
13 | - 2.1 - Getting help
14 | - 2.2 Code Syntax and Structure
15 | - 2.3 Variables
16 | - 2.4 Operations and Functions
17 | - 3 Working with Data
18 | - 4 Deployment and Environments
19 |
20 |
21 |
22 |
23 | ## Programming Basics Overview
24 |
25 | From here on out, you'll focus on using Python in programming mode - you'll write code that you run from an IDE or a calling environment, not interactively from the command-line. As you work through this explanation, copy the code you see and run it to see the results. After you work through these copy-and-paste examples, you'll create your own code in the Activities that follow each section.
26 |
27 |
2.1 - Getting help
28 |
29 | The very first thing you should learn in any language is how to get help. You can [find the help documents on-line](https://docs.python.org/3/index.html), or simply type
30 |
31 | `help()`
32 |
33 | in your code. For help on a specific topic, put the topic in the parentheses:
34 |
35 | `help(str)`
36 |
37 | To see a list of topics, pass the word *topics* as a quoted string:
38 |
39 | `help('topics')`
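Here is a minimal sketch of a few ways to get at the same documentation from code. The choice of `str.upper` is only an illustration, and the variable names are made up for this example.

```python
# Ask for help on one specific method; the text is printed to the console.
help(str.upper)

# The same description is stored on the object as its "docstring".
doc = str.upper.__doc__
print(doc)

# dir() lists the names you could pass to help() for a given type.
string_methods = [name for name in dir(str) if not name.startswith("_")]
print(string_methods[:5])
```

Using `dir()` first and `help()` second is a handy habit when you know the type you have but not what it can do.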
40 |
41 |
2.2 Code Syntax and Structure
42 |
43 | Let's cover a few basics about how Python code is written. (For a full discussion, check out the [Style Guide for Python, called PEP 8](https://www.python.org/dev/peps/pep-0008/) ) Let's use the "Zen of Python" rules from Tim Peters for this course:
44 |
45 |
46 |
47 | Beautiful is better than ugly.
48 | Explicit is better than implicit.
49 | Simple is better than complex.
50 | Complex is better than complicated.
51 | Flat is better than nested.
52 | Sparse is better than dense.
53 | Readability counts.
54 | Special cases aren't special enough to break the rules.
55 | Although practicality beats purity.
56 | Errors should never pass silently.
57 | Unless explicitly silenced.
58 | In the face of ambiguity, refuse the temptation to guess.
59 | There should be one-- and preferably only one --obvious way to do it.
60 | Although that way may not be obvious at first unless you're Dutch.
61 | Now is better than never.
62 | Although never is often better than right now.
63 | If the implementation is hard to explain, it's a bad idea.
64 | If the implementation is easy to explain, it may be a good idea.
65 | Namespaces are one honking great idea -- let's do more of those!
66 | --Tim Peters
67 |
68 |
69 |
70 | In general, use standard coding practices - don't use keywords for variables, be consistent in your naming (camel-case, lower-case, etc.), comment your code clearly, understand the general syntax of your language, and follow the principles above. But the most important tip is to at least read PEP 8 and decide for yourself how well that fits into your Zen.
71 |
72 | There is one hard-and-fast rule for Python that you *do* need to be aware of: indentation. You **must** indent the body of classes, functions (or methods), loops, and conditions. You can use a tab or four spaces (spaces are the accepted way to do it) but in any case, you have to be consistent. If you use tabs, you always use tabs. If you use spaces, you have to use that throughout. It's best if you set your IDE to handle that for you, whichever way you go.
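To make the indentation rule concrete, here is a small sketch. The function name `describe_number` and the sample values are made up for illustration; the point is only how each nested block moves four spaces to the right.

```python
def describe_number(value):
    # The function body is indented four spaces under the "def" line.
    if value < 0:
        # The body of the "if" is indented four more spaces.
        return "negative"
    elif value == 0:
        return "zero"
    else:
        return "positive"


# The loop body is indented; the first unindented line ends the loop.
results = []
for number in [-5, 0, 7]:
    results.append(describe_number(number))

print(results)
```

If you remove the indentation from any of the nested lines, Python stops with an `IndentationError` before it runs anything.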
73 |
74 | Python code files have an extension of `.py`.
75 |
76 | Comments in Python start with the hash character: `#`. There are no block comments (and this makes us all sad) so each line you want to comment must have a `#` in front of it. Keep the lines short (80 characters or so) so that they don't run off a single-line display such as the command line.
77 |
78 |
2.3 Variables
79 |
80 | Variables stand in for replaceable values. Python is dynamically typed, meaning you can just declare a variable name and set it to a value at the same time, and Python works out the data type from the value you assign. You use an `=` sign to assign values, and `==` to compare things.
81 |
82 | Quotes \" or ticks \' are fine, just be consistent.
83 |
84 | `# There are some keywords to be aware of, but x and y are always good choices.`
85 |
86 | `x = "Buck" # I'm a string.`
87 |
88 | `type(x)`
89 |
90 | `y = 10 # I'm an integer.`
91 |
92 | `type(y)`
93 |
94 | To change the type of a variable, just assign it a different kind of value:
95 |
96 | `x = "Buck" # I'm a string.`
97 |
98 | `type(x)`
99 |
100 | `x = 10 # Now I'm an integer.`
101 |
102 | `type(x)`
103 |
104 | Or cast it by explicitly declaring the conversion:
105 |
106 | `x = "10"`
107 |
108 | `type(x)`
109 |
110 | `print(int(x))`
111 |
112 | To concatenate string values, use the `+` sign:
113 |
114 | `x = "Buck"`
115 |
116 | `y = " Woody"`
117 |
118 | `print(x + y)`
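The pieces above fit together like this. It's a short sketch with made-up variable names (`total`, `message`) showing a string converted to a number, used in arithmetic, and then converted back so it can be concatenated.

```python
x = "10"                 # a string that happens to contain digits
print(type(x))

y = int(x)               # explicit conversion to an integer
print(type(y))

total = y + 5            # now arithmetic works

# "+" only joins strings to strings, so convert the number back with str().
message = "Total: " + str(total)
print(message)
```

Leaving out `str()` in the last step raises a `TypeError`, because Python will not guess whether you meant addition or concatenation.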
119 |
120 |
2.4 Operations and Functions
121 |
122 | Python has the following operators:
123 |
124 | Arithmetic Operators
125 | Comparison (Relational) Operators
126 | Assignment Operators
127 | Logical Operators
128 | Bitwise Operators
129 | Membership Operators
130 | Identity Operators
131 |
132 | You have the standard operators and functions from most every language. Here are some of the tokens:
133 |
134 |
135 |
136 | != *= << ^
137 | " + <<= ^=
138 | """ += <= `
139 | % , <> __
140 | %= - ==
141 | & -= > b"
142 | &= . >= b'
143 | ' ... >> j
144 | ''' / >>= r"
145 | ( // @ r'
146 | ) //= J |'
147 | * /= [ |=
148 | ** : \ ~
149 | **= < ]
150 |
151 |
152 |
153 | Wait...that's it? That's all you're going to tell me? *(Hint: use what you've learned):*
154 |
155 | `help('symbols')`
156 |
157 | Walk through each of these operators carefully - you'll use them when you work with data in the next module.
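As a starting point for that walk-through, here is a quick sketch that touches each operator family from the list above. The values `a`, `b`, and `names` are arbitrary examples.

```python
a = 7
b = 2

# Arithmetic operators
print(a + b, a - b, a * b)
print(a / b)                  # true division always returns a float
print(a // b, a % b)          # floor division and remainder
print(a ** b)                 # exponent

# Comparison operators return True or False
print(a > b, a == b, a != b)

# Logical operators combine comparisons
print(a > 5 and b > 5)
print(a > 5 or b > 5)

# Assignment operators update a variable in place
total = a
total += b                    # same as total = total + b

# Bitwise operators work on the binary digits
print(a & b, a | b, a << 1)

# Membership and identity operators
names = ["Buck", "Jane"]
print("Buck" in names)
nothing = None
print(nothing is None)
```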
158 |
159 |
Activity - Programming basics
160 |
161 | Open the **02_ProgrammingBasics.py** file and run the code you see there. The exercises will be marked out using comments:
162 |
163 | `# - Section Number`
164 |
165 |
For Further Study
166 |
167 | - The PEP - https://www.python.org/dev/peps/pep-0008/
168 | - Introduction to the Python Coding Style - http://stackabuse.com/introduction-to-the-python-coding-style/
169 | - The Microsoft Tutorial and samples for Python - https://code.visualstudio.com/docs/languages/python
170 | - Coding requirements and standards - PEP - https://www.python.org/dev/peps/pep-0008/
171 | - Another free online self-paced course - https://www.w3schools.com/python/default.asp
172 |
173 | Next, Continue to *03 Working with Data*
--------------------------------------------------------------------------------
/PythonForDataProfessionals/03 Working with Data.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Python for Data Professionals
4 |
5 | ## 03 Working with Data
6 |
7 |
8 |
9 |
10 | - Course Outline
11 | - 1 - Overview and Course Setup
12 | - 2 - Programming Basics
13 | - 3 Working with Data (This section)
14 | - 3.1 Data Types
15 | - 3.2 Data Ingestion
16 | - 3.3 Data Inspection
17 | - 3.4 Graphing
18 | - 3.5 Machine Learning and AI
19 | - 4 Deployment and Environments
20 |
21 |
22 |
23 |
24 | Working with data is the main part of this course. This section will be quite a bit longer than what you have done so far, and the Activities will be harder. Remember to use the cheat-sheets and other references in the `./assets` course directory, because not everything you need to know will be in the course explanation. You'll need to dig a bit more and experiment, use the `help()` function, and do a bit of researching to figure out how to complete the Activities.
25 |
26 |
3.1 Data Types
27 |
28 | In most any language, after the Data Professional learns how to use help, they want to find out what data types the language supports, and how the language works with them. You covered the way Python works with data in the last module (under the topic *Operators*), so now you need to figure out the types of data Python can work with.
29 |
30 | Note that the data types you'll see next are the ones built-in to the Python language. Just like a Data Platform will often take a "primitive" data type and build on that with libraries, Python will do the same thing. You'll cover that in more depth in a moment.
31 |
32 | Python has 5 standard data type "families":
33 |
34 | - Numbers
35 | - Strings
36 | - Lists
37 | - Tuples
38 | - Dictionaries
39 |
40 |
Numbers
41 |
42 | Numbers contain the following types:
43 |
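44 | - int (signed integers of unlimited size, written in decimal, binary, octal or hex)
45 | - long (Python 2 only - in Python 3, `int` covers long integers as well)
46 | - float (real floating point numbers)
47 | - complex (numbers with a real and an imaginary part, such as `3 + 4j`)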
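Here's a brief sketch of the numeric types in Python 3. All of the values are arbitrary examples.

```python
whole = 42
big = 2 ** 100           # ints grow as large as they need to
from_hex = 0xFF          # an integer written as a hex literal
ratio = 1 / 3            # a float
z = complex(3, 4)        # the same value as 3 + 4j

print(type(whole), type(ratio), type(z))
print(big)
print(from_hex)
print(z.real, z.imag, abs(z))
```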
48 |
49 |
Strings
50 |
51 | Strings are sequences of Unicode characters within a single quote, double quote, or if you want to span more than one line, triple quotes. They are treated as an array of sorts, so if you do this:
52 |
53 | `myName = "Buck"`
54 |
55 | Then you can do this:
56 |
57 | `print(myName[0])`
58 |
59 | And you get back this:
60 |
61 | B
62 |
63 | Oh, and there are all kinds of formatting options you have with strings. [Check those out here](https://pyformat.info/)
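Here is a small sketch of indexing, slicing, and two common formatting styles. The variable names are made up for this example.

```python
my_name = "Buck Woody"

first_char = my_name[0]       # indexes start at 0
last_char = my_name[-1]       # negative indexes count from the end
first_word = my_name[0:4]     # a slice: start at 0, stop before 4

# Two common ways to put values into a string.
greeting = "Hello, {}!".format(first_word)
summary = f"{my_name} has {len(my_name)} characters"

print(first_char, last_char, first_word)
print(greeting)
print(summary)
```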
64 |
65 |
Lists
66 |
67 | Lists are arrays - and you're not limited to a single dimension. You define them by enclosing the values in square brackets.
68 |
69 | Here's a list:
70 |
71 | `myList = [0, 1, 2]`
72 |
73 | And now you can decorate that with data:
74 |
75 | `myList[0] = "One"`
76 |
77 | `myList[1] = "Two"`
78 |
79 | `myList[2] = "Three"`
80 |
81 | `print(myList)`
82 |
83 | `print(myList[2])`
84 |
85 | And so on. There are also ranges, loops, and methods you can use on lists - [more on that here](https://www.tutorialspoint.com/python/python_lists.htm)
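A few of those list features, sketched with made-up values: growing a list, slicing it, nesting lists for a second dimension, and looping.

```python
my_list = ["One", "Two", "Three"]

my_list.append("Four")            # lists can grow after they are created
first_two = my_list[0:2]          # slicing returns a new list

# A list of lists gives you a second dimension.
grid = [[1, 2, 3], [4, 5, 6]]
middle = grid[1][1]               # second row, second column

# Loop over a list, with the position of each item.
for position, item in enumerate(my_list):
    print(position, item)

# A list comprehension builds a new list from an existing one.
lengths = [len(item) for item in my_list]
print(lengths)
```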
86 |
87 |
Tuples
88 |
89 | Tuples are similar to Lists, but are immutable - you can't change, extend or shrink them after you create them. Think of them as a read-only SQL Table. You define a tuple with parentheses rather than square brackets.
90 |
91 | Use a Tuple when you want to "protect" the data structure so that no one changes the structure after you define it. You'll see some real-world examples throughout this course.
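Here's a short sketch of that protection at work. The `point` tuple is just an illustrative value.

```python
point = (3, 4)

# Reading works the same way as with a list.
x, y = point                  # "unpacking" assigns each element to a name
print(point[0], x, y)

# Writing does not: Python raises a TypeError.
try:
    point[0] = 10
except TypeError as error:
    message = str(error)
    print("Tuples cannot be changed:", message)
```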
92 |
93 |
Dictionaries
94 |
95 | Dictionaries are Key-Value Pair (KVP) data. You set these up with a curly-brace, the key, a colon, and then the value, like this:
96 |
97 | `myDict = {1: "Buck", 2: "Jane", 3: "Jim"}`
98 |
99 | Now you can work with them by the key. For instance, to show the value for key 1, it's simply:
100 |
101 | `myDict[1]`
102 |
103 | Lookups only work by key, so `myDict["Buck"]` raises a `KeyError`. To find the key for Buck, search the items instead:
104 |
105 | `[key for key, value in myDict.items() if value == "Buck"]`
106 |
107 | Dictionaries are used quite frequently in Python, so you should take some time to [read up on them here](https://docs.python.org/2/tutorial/datastructures.html#dictionaries)
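Here is a sketch of everyday dictionary work, reusing the same three names. The extra entry and the `name_to_key` reverse lookup are additions made up for this example.

```python
my_dict = {1: "Buck", 2: "Jane", 3: "Jim"}

my_dict[4] = "Ann"                        # add a new key-value pair
second = my_dict[2]                       # look up a value by its key
missing = my_dict.get(99, "not found")    # .get() avoids a KeyError

# Loop over keys and values together.
for key, value in my_dict.items():
    print(key, value)

# If you often need to go from value to key, build a reversed dictionary once.
name_to_key = {value: key for key, value in my_dict.items()}
print(name_to_key["Buck"])
```

The reversed dictionary only works cleanly when the values are unique, as they are here.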
108 |
109 |
Activity - Programming basics
110 |
111 | Open the **03_WorkingWithData.py** file and enter the code you find for section 3.1. The exercises will be marked out using comments:
112 |
113 | `# - 3.1 `
114 |
115 |
3.2 Side-track: Working with Libraries for Data
116 |
117 | Python includes most of the functions you need to read data from files, work with them in memory and so on in the base installation. However, there is a way to add to the functions you have for your code, using *Libraries*. Libraries are code someone else has written that you add in to your program from the start, using an `import` statement. You'll cover more information on working with Libraries (sometimes referred to as Modules or Packages, but more correctly Libraries) in a future lesson, but data "wrangling" (importing, manipulating and exporting) usually involves adding in at least one or two Libraries, so you'll cover that here.
118 |
119 |
NumPy
120 |
121 | *NOTE: You'll need to install both NumPy and Pandas before you can use them. You will cover that in a later lesson - your pre-requisites included this installation for now.*
122 |
123 | To work with numeric data, the first library you should become familiar with is *NumPy* (Numerical Python). The primary structure in NumPy is the *array*.
124 |
125 | To load the library, use the `import` statement with an optional "alias" of np:
126 |
127 | `import numpy as np`
128 |
129 | Now when you reference NumPy's methods and properties, you can use the shorter `np` label.
130 |
131 | It's simple enough to create and work with an array, now that you have the library loaded. This code creates a 2-dimensional NumPy array, and sets the values to integer:
132 |
133 | `x = np.array([(1,2,3), (4,5,6)], dtype = int)`
134 |
135 | The next important concept in NumPy is that the array is actually a set of pointers, involving four main components:
136 |
137 | - *data* : The memory address of the first byte in the array
138 |
139 | - *dtype* : The type of the elements in the array
140 |
141 | - *shape* : The layout of the array
142 |
143 | - *strides* : The number of bytes skipped in memory to go to the next element of the array
144 |
145 | Here are those properties in action:
146 |
147 | `print(x.data)`
148 |
149 | `print(x.dtype)`
150 |
151 | `print(x.shape)`
152 |
153 | `print(x.strides)`
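Here's a sketch of what those properties report for a small array. It assumes NumPy is installed, and it pins the element type to `np.int64` (a choice made for this example) so the byte counts come out the same on every platform.

```python
import numpy as np

# Two rows, three columns, eight bytes per element.
x = np.array([(1, 2, 3), (4, 5, 6)], dtype=np.int64)

print(x.dtype)      # the element type
print(x.shape)      # (rows, columns)
print(x.itemsize)   # bytes per element
print(x.strides)    # bytes to skip to reach the next row, then the next column
```

With three 8-byte elements per row, moving down one row skips 24 bytes and moving across one column skips 8, which is what `strides` shows.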
154 |
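155 | Now you can use the array, mostly by doing maths on it. Here are a few examples:
156 |
157 | Add, subtract, multiply and divide x and y, where `y` is a second array with the same shape as `x`, such as `y = np.array([(7,8,9), (10,11,12)], dtype = int)`: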
158 |
159 | `np.add(x,y)`
160 |
161 | `np.subtract(x,y)`
162 |
163 | `np.multiply(x,y)`
164 |
165 | `np.divide(x,y)`
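Here is a runnable sketch of those four operations, assuming NumPy is installed. The values in `y` are made up so the results are easy to check by eye.

```python
import numpy as np

x = np.array([(1, 2, 3), (4, 5, 6)], dtype=int)
y = np.array([(10, 20, 30), (40, 50, 60)], dtype=int)

# Each function works element by element on arrays of the same shape.
total = np.add(x, y)            # same result as x + y
difference = np.subtract(y, x)  # same result as y - x
product = np.multiply(x, y)     # same result as x * y
quotient = np.divide(y, x)      # true division, so the result holds floats

print(total)
print(difference)
print(product)
print(quotient)

# Arrays also carry summary methods.
print(x.sum(), x.mean(), x.max())
```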
166 |
167 | You can experiment with a few more NumPy operations on your own in the Activities that follow.
168 |
169 |
Activity - Programming with NumPy
170 |
171 | Open the **03_WorkingWithData.py** file and enter the code you find for section 3.1a. The exercises will be marked out using comments:
172 |
173 | `# - 3.1a`
174 |
175 |
Pandas
176 |
177 | The primary library you'll use in working with data in Python is *Pandas* (the *Python Data Analysis Library*). Pandas provides many methods and properties that you can work with for your data, and it also has other data structures that make it more efficient to work with data.
178 |
179 | Just as in NumPy, use the `import` statement to load the Pandas Library:
180 |
181 | `import pandas as pd`
182 |
183 | Once the library is in memory, you start using it by creating a *dataframe* - the primary object Pandas works with. A dataframe is a mixed-type structure that looks similar to a SQL Table, and is very efficient. You can assign almost any data to a dataframe - here's an example that creates a dataframe by reading a comma-separated file:
184 |
185 | `my_df = pd.read_csv('./data/data.csv')`
186 |
187 | This illustrates one way of ingesting data, and in a moment you'll see a few more. Pandas has a lot of data sources it can work with, from the Clipboard to various filetypes. Here's a short list:
188 |
189 | - Flat Files
190 |
191 | - Clipboard
192 |
193 | - Excel
194 |
195 | - JSON
196 |
197 | - HTML
198 |
199 | - HDFStore: PyTables (HDF5)
200 |
201 | - Feather
202 |
203 | - Parquet
204 |
205 | - SAS
206 |
207 | - SQL
208 |
209 | - Google BigQuery
210 |
211 | - STATA
212 |
213 | ...among others.
214 |
215 | Now with the dataframe (`my_df`) loaded, it's an object you can work with. If you just type the name of the dataframe, you'll get back the data in the "table".
216 |
217 | Pandas has a lot of functions that allow you to work with data after you've inspected it. To work with datasets like you would in an RDBMS, here are a few examples.
218 |
219 | Starting with an equivalent (kind of) to the SELECT statement in SQL, you can project a single column with the statement `my_df['col1']`, which comes back as a Pandas *Series*. To project several columns, pass a list of names, as in `my_df[['col1', 'col2']]`. These will come back as a new dataframe.
220 |
221 | If you want to use an ordinal position use `my_df.iloc[0]`.
222 |
223 | If you know the index you want, use `my_df.loc['index_one']`.
224 |
225 | If you want the whole row, use `my_df.iloc[0,:]` (for the first row).
226 |
227 | For a WHERE clause, use the comparison tokens you saw earlier. For instance, to get the months earlier than October, use `my_df[my_df['month'] < 10]`.
228 |
229 | For ORDER BY, use the sort_values function. This command sorts the dataframe by the column *col1* in ascending order: `my_df.sort_values('col1', ascending=True)`.
230 |
231 | Moving on to JOIN operations, you have the ability to use multiple kinds of joins - for instance, the statement `my_df.merge(my_df2, on='col1', how='inner')` joins the two dataframes `my_df` and `my_df2` on the column *col1* (which must exist in both dataframes).
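To see the SQL-style operations side by side, here is a sketch using two small dataframes. It assumes Pandas is installed; the `sales` and `regions` data are invented for illustration. The join uses `merge`, which matches rows on a column that both dataframes share.

```python
import pandas as pd

# Made-up sample data standing in for two tables.
sales = pd.DataFrame({
    "month": [8, 9, 10, 11],
    "region": ["East", "West", "East", "West"],
    "amount": [250.0, 100.0, 300.0, 175.0],
})
regions = pd.DataFrame({
    "region": ["East", "West"],
    "manager": ["Buck", "Jane"],
})

# SELECT month, amount FROM sales
projected = sales[["month", "amount"]]

# ... WHERE month < 10
before_october = sales[sales["month"] < 10]

# ... ORDER BY amount DESC
ordered = sales.sort_values("amount", ascending=False)

# ... INNER JOIN regions ON sales.region = regions.region
joined = sales.merge(regions, on="region", how="inner")

print(projected)
print(before_october)
print(ordered)
print(joined)
```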
232 |
233 | There's a lot more you can do with Pandas, including a lot of data cleaning operations that you'll use for Machine Learning and other Data Science tasks. You'll experiment with this in your Activities.
234 |
235 | Want to learn more? Check this reference: https://pandas.pydata.org/pandas-docs/stable/tutorials.html
236 |
237 |
Activity - Programming with Pandas
238 |
239 | Open the **03_WorkingWithData.py** file and enter the code you find for section 3.1b. The exercises will be marked out using comments:
240 |
241 | `# - 3.1b`
242 |
243 | (Note: Use the Cheat-sheets in the `./assets` directory in the exercise that follows)
244 |
245 |
3.3 Data Ingestion
246 |
247 | Python has many ways to read data in (*sometimes into memory, sometimes streaming as it reads it*) built right in to the standard libraries. Other Libraries, such as Pandas and NumPy, have their own way of reading in data.
248 |
249 | In any case, the data is assigned to a data family or *structure*, which you learned about earlier. Depending on which Library you are using, you'll pick a data structure that makes the most sense for how you want to work with it. For instance, Pandas uses a dataframe as the primary data structure it works with. This is why it's important to know the data types, so that you understand what structure you need to perform your desired operations.
250 |
251 |
Reading from Files
252 |
253 | Many times the data you are looking for is in storage, either locally or remotely. *File-source* based data is loosely defined as whatever data the operating system can reach natively.
254 |
255 | *NOTE:* This means that when you write your code, it's important to know where it will run. Python is an *interpreted* language, so the same code runs on whatever platform has an interpreter, but the file system underneath can differ. If you load data from a Windows file system, and the code gets deployed to a Linux system, you need to make sure the file paths are still valid.
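One way to keep file paths portable is the standard library's `pathlib` module. This is a sketch rather than the only approach, and the `data/mydata.csv` location is a hypothetical example.

```python
from pathlib import Path

# Build the path from parts instead of typing "\" or "/" yourself.
# (Hypothetical location: a "data" folder under the current directory.)
data_file = Path("data") / "mydata.csv"

print(data_file)            # the separator matches the current operating system
print(data_file.name)
print(data_file.suffix)

# Check before you read, so a missing file gives a clear message.
if data_file.exists():
    print("Found", data_file.resolve())
else:
    print("No file at", data_file)
```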
256 |
257 | You've already seen how to read data with Pandas. For the built-in Python library, you most often use the csv reader on comma-separated value data. To use it, import the `csv` module. From there, you can use a "with" block to process the file. This example opens a file, uses an if statement to process each line, and if the line contains "carrot", prints the ingredient, the type of carrot (shredded, sliced, etc.), and the amount for the recipe:
258 |
259 |
260 | import csv
261 | with open('mydata.csv', newline='') as csvfile:
262 |     reader = csv.DictReader(csvfile)
263 |     for row in reader:
264 |         if row['ingredient'] == 'carrot':
265 |             print(row['ingredient'], row['type'], row['amount'])
266 |
267 |
268 | (Note the indentation - very important!)
269 |
270 | The csv reader has a "dialect" modifier so that you can work with CSV files that are stored in a particular way - use the `help()` function to learn more.
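Here's a sketch of a dialect in use. It reads from an in-memory string instead of a real file so you can run it anywhere; the recipe rows and the dialect name `semicolon` are made up for this example.

```python
import csv
import io

# A stand-in for a file that uses semicolons instead of commas.
raw_text = (
    "ingredient;type;amount\n"
    "carrot;shredded;2 cups\n"
    "onion;diced;1 cup\n"
)

# Register a named dialect once, then reuse it for every reader.
csv.register_dialect("semicolon", delimiter=";")

reader = csv.DictReader(io.StringIO(raw_text), dialect="semicolon")
carrots = [row for row in reader if row["ingredient"] == "carrot"]

for row in carrots:
    print(row["ingredient"], row["type"], row["amount"])
```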
271 |
272 | Reference: https://realpython.com/python-csv/
273 |
274 |
Working with Data in Databases
275 |
276 | Python has Libraries available that allow you to connect to a Relational Database Management System (RDBMS). The `pyodbc` Library is one of the most widely used, and works well with Microsoft's SQL Server. You can read more about pyodbc and download it here: https://docs.microsoft.com/en-us/sql/connect/python/pyodbc/python-sql-driver-pyodbc?view=sql-server-2017
277 |
278 | Once you install it (more on installing Libraries later), you once again import it, and then set up your connection. You then use the connection to send a query, returning a dataset, or updating data if that's what you're going for. Here's an example:
279 |
280 |
281 | import pyodbc
282 |
283 | server = 'tcp:myserver.database.windows.net'
284 | # Some other example server values are
285 | # server = 'localhost\sqlexpress' for a named instance
286 | # server = 'myserver,port' to specify an alternate port
287 |
288 | database = 'mydb'
289 | username = 'myusername'
290 | password = 'mypassword'
291 |
292 | cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+
293 |                       password)
294 |
295 | cursor = cnxn.cursor()
296 |
297 | # Sample select query
298 | cursor.execute("SELECT @@version;")
299 | row = cursor.fetchone()
300 |
301 | while row:
302 |     # Python 3 needs parentheses around the print arguments
303 |     print(row[0])
304 |     row = cursor.fetchone()
305 |
306 | # Sample insert query
307 | # (triple quotes let the SQL statement span more than one line)
308 | cursor.execute("""INSERT SalesLT.Product (Name, ProductNumber, StandardCost, ListPrice, SellStartDate) OUTPUT INSERTED.ProductID
309 | VALUES ('SQL Server Express New 20', 'SQLEXPRESS New 20', 0, 0, CURRENT_TIMESTAMP )""")
310 |
311 | row = cursor.fetchone()
312 | while row:
313 |     print('Inserted Product key is ' + str(row[0]))
314 |     row = cursor.fetchone()
315 | cnxn.commit()  # pyodbc does not autocommit by default
316 |
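If you don't have a SQL Server handy, you can practice the same connect, cursor, execute, fetch pattern with `sqlite3` from the standard library. This sketch swaps in an in-memory SQLite database as a stand-in; the `product` table and its rows are made up. The `?` placeholders shown here are the same parameter style pyodbc uses.

```python
import sqlite3

# An in-memory SQLite database stands in for a real server.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

cursor.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, list_price REAL)"
)

# "?" placeholders keep the values out of the SQL text itself.
cursor.execute(
    "INSERT INTO product (name, list_price) VALUES (?, ?)", ("Widget", 9.99)
)
cursor.execute(
    "INSERT INTO product (name, list_price) VALUES (?, ?)", ("Gadget", 19.5)
)
connection.commit()

# Query with a parameter and read back all matching rows.
cursor.execute(
    "SELECT name, list_price FROM product WHERE list_price > ?", (10,)
)
rows = cursor.fetchall()
for name, list_price in rows:
    print(name, list_price)

connection.close()
```

Passing values as parameters rather than building the SQL string with `+` also protects you from SQL injection.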
317 |
Data in Other Sources
318 |
319 | Many other data sources, such as cloud databases and network streams, also have ways of connecting from Python. Even web pages can be used as data sources. One of the primary Libraries for working with web data is *Beautiful Soup*, [which you can find here](https://www.crummy.com/software/BeautifulSoup/). You normally need to connect to the web page first, so for that you use another import, using `requests`, or perhaps the standard library's `urllib` (`urllib2` exists only in Python 2).
320 |
321 | Here's an example of reading a web page and printing all the links it has:
322 |
323 |
324 | from bs4 import BeautifulSoup
325 | import requests
326 | html_doc = requests.get("http://coolwebpage.com")
327 | soup = BeautifulSoup(html_doc.text, 'html.parser')
328 | for link in soup.find_all('a'):
329 |     print(link.get('href'))
330 |
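To try link extraction without a network connection or any extra installs, here is a sketch that uses the standard library's `html.parser` in place of Beautiful Soup. The `LinkCollector` class and the sample page are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


# A small stand-in for a downloaded web page.
page = (
    '<html><body>'
    '<a href="https://www.python.org/">Python</a> '
    '<a href="/docs">Docs</a>'
    '</body></html>'
)

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

Beautiful Soup does the same job with less code and copes better with messy real-world HTML, which is why it's the usual choice.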
331 |
Activity - Data Ingestion
332 |
333 | Open the **03_WorkingWithData.py** file and enter the code you find for section 3.2. The exercises will be marked out using comments:
334 |
335 | `# - 3.2`
336 |
337 |
3.4 Data Inspection
338 |
339 | After the data is loaded into a structure, the first step in analytics is to examine the data. You've already seen how to display the data using Pandas, and it's one of the best libraries for data exploration as well.
340 |
341 | Analytics professionals often start with the basics of the statistical layout of the numeric data in a dataset. If you want to see the basic statistics of your data stored in a dataframe called *my_df*, type `my_df.describe()`.
342 |
343 | You'll also want to determine the amount of data you're working with. To do that, type `my_df.shape` to get the number of rows and columns in a dataframe.
344 |
345 | Typing `my_df.head(n)` gives you the first `n` rows of the data, or use `my_df.tail(n)` to get the last `n` rows.
346 |
347 | Another way to see the "shape" of the data is to use `my_df.info()` to see the index, datatypes and memory information for the dataframe.
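Here are those inspection calls on a tiny dataframe. It's a sketch that assumes Pandas is installed; the customer data is invented for illustration.

```python
import pandas as pd

# Made-up sample data.
my_df = pd.DataFrame({
    "customer": ["A", "B", "C", "D", "E"],
    "calls": [10, 20, 30, 40, 50],
    "churned": [False, False, True, False, True],
})

print(my_df.shape)        # (rows, columns)
print(my_df.head(2))      # first two rows
print(my_df.tail(2))      # last two rows

# describe() summarizes the numeric columns by default.
stats = my_df.describe()
print(stats)

# info() prints the index, column types, and memory use.
my_df.info()
```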
348 |
349 |
Activity - Data Inspection
350 |
351 | Open the **03_WorkingWithData.py** file and enter the code you find for section 3.3. The exercises will be marked out using comments:
352 |
353 | `# - 3.3`
354 |
355 |
3.5 Graphing
356 |
357 | Examining the data in tabular format won't give you all you need to evaluate and interpret it. It is very useful to display the data in a graphical format, and once again you'll turn to Libraries to do that. There are many Libraries for graphing data in Python, and more are written constantly. The primary Libraries you should be familiar with are MatPlotLib and ggplot.
358 |
359 |
Graphing Data with MatPlotLib
360 |
361 | MatPlotLib is quite old, but it's the most widely used graphical library for plotting in Python. It borrowed much of its design from a commercial industry standard called MATLAB. Many other Libraries are built on top of MatPlotLib or simply work alongside it.
362 |
363 | Take a look at an example of a histogram plot with MatPlotLib:
364 |
365 |
366 | import matplotlib
367 | from numpy.random import randn
368 | import matplotlib.pyplot as plt
369 | from matplotlib.ticker import FuncFormatter
370 |
371 | def to_percent(y, position):
372 |     # Ignore the passed in position. This has the effect of scaling the default
373 |     # tick locations.
374 |     s = str(100 * y)
375 |
376 |     # The percent symbol needs escaping in latex
377 |     if matplotlib.rcParams['text.usetex'] is True:
378 |         return s + r'$\%$'
379 |     else:
380 |         return s + '%'
381 |
382 | x = randn(5000)
383 |
384 | # Make a normalized histogram. It'll be multiplied by 100 later.
385 | plt.hist(x, bins=50, density=True)  # density replaces the older normed argument
386 |
387 | # Create the formatter using the function to_percent. This multiplies all the
388 | # default labels by 100, making them all percentages
389 | formatter = FuncFormatter(to_percent)
390 |
391 | # Set the formatter
392 | plt.gca().yaxis.set_major_formatter(formatter)
393 |
394 | plt.show()
395 |
396 |
397 | 
398 |
399 | Of course, MatPlotLib can do so much more. [Take a look at this reference from the documentation which goes deeper.](https://matplotlib.org/examples/index.html)
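Here is a more compact sketch of a percentage histogram, assuming Matplotlib and NumPy are installed. It uses the object-style interface and the built-in `PercentFormatter` rather than a hand-written formatter function, and it renders to memory with the `Agg` backend so it runs without a display. The seed and labels are arbitrary choices for this example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # draw off-screen; no window is opened
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter

rng = np.random.default_rng(seed=42)
x = rng.standard_normal(5000)

fig, ax = plt.subplots()

# Equal weights of 1/N make each bar the share of samples in that bin.
weights = np.ones_like(x) / len(x)
counts, bin_edges, _ = ax.hist(x, bins=50, weights=weights)

# Show the y axis as percentages (1.0 means 100%).
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))
ax.set_xlabel("Value")
ax.set_ylabel("Share of samples")

# Save the image to an in-memory buffer; pass a file name to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In an interactive session you would call `plt.show()` instead of saving to a buffer.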
400 |
401 |
Graphing with ggplot
402 |
403 | The ggplot library comes from the R language (where the newer version is called *ggplot2*). It follows the guidelines from the *Grammar of Graphics* reference work. The commands in ggplot layer the graphical components. You make a base graphic, and even after you create the chart you can add axes, a line, a trendline, coloring and more.
404 |
405 | Here's an example of a plot using the ggplot Library, with the mtcars sample dataset. Notice how it "builds" on the plot so that it's fairly easy to see how it represents each part:
406 |
407 | from ggplot import *
408 |
409 | p = ggplot(aes(x='mpg'), data=mtcars)
410 | p += geom_histogram()
411 | p += xlab("Miles per Gallon")
412 | p += ylab("# of Cars")
413 | p
414 |
415 |
416 | 
417 |
418 | [Check out the official documentation for many more examples.](https://github.com/yhat/ggpy/tree/master/docs)
419 |
420 |
Activity - Graphing
421 |
422 | Open the **03_WorkingWithData.py** file and enter the code you find for section 3.4. The exercises will be marked out using comments:
423 |
424 | `# - 3.4`
425 |
426 |
3.6 Altering Data
427 |
428 | Most data isn't "clean" by default. It's either in the wrong format, missing values, or isn't all structured the way you need it. For this type of work, there are two basic tools you should learn: Regular Expressions and once again, Pandas. You won't cover an exercise on data editing in this section; instead you'll see an example of that as part of a Machine Learning exercise.
429 |
430 | You can use Regular Expressions in Python to make a lot of your changes - you can read more about that here: https://docs.python.org/3/library/re.html
431 |
432 | But most of the time you'll be using Pandas to make those changes. You can read more about that here: https://tomaugspurger.github.io/modern-5-tidy.html
433 |
434 | And of course there are lots of other things to know about altering data. Read this resource for more: https://www.springboard.com/blog/data-wrangling/
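To give you a feel for both tools together, here is a small cleaning sketch. It assumes Pandas is installed; the `raw` data and the `digits_only` helper are made up. Filling missing values with the column mean is just one possible choice, not a recommendation for every dataset.

```python
import re

import pandas as pd

# Made-up messy data: mixed phone formats, a missing phone, a non-numeric spend.
raw = pd.DataFrame({
    "phone": ["(555) 123-4567", "555.987.6543", None],
    "spend": ["10.5", "n/a", "7"],
})


def digits_only(text):
    # \D matches any character that is not a digit.
    return re.sub(r"\D", "", text)


cleaned = raw.copy()

# Regular Expression: strip everything except digits from the phone numbers.
cleaned["phone"] = cleaned["phone"].fillna("").apply(digits_only)

# Pandas: turn text into numbers; anything unreadable becomes NaN...
cleaned["spend"] = pd.to_numeric(cleaned["spend"], errors="coerce")

# ...and then fill the gaps with the mean of the values that were readable.
cleaned["spend"] = cleaned["spend"].fillna(cleaned["spend"].mean())

print(cleaned)
```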
435 |
436 |
3.7 Machine Learning and AI
437 |
438 | A full course on Machine Learning (and one of its applications, Artificial Intelligence), is long and involved. Machine Learning involves evaluating data for *features* (columns) that can create *labels* (predictions or classifications). You do this by using a collection of historical data, and selecting the most predictive features and applying one or more algorithms to that data. You get back a *model* (which is kind of like a function) that you can send new data to for a prediction. This is a bit of an oversimplification of course, but it will serve you well as you work through this course. For a more comprehensive discussion on Data Science and Machine Learning with Python, check out this reference: https://notebooks.azure.com/jakevdp/libraries/PythonDataScienceHandbook
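The features, labels, model, prediction loop can be sketched in a few lines. This example assumes scikit-learn is installed and uses a decision tree purely as an illustration; the tiny "support calls versus churn" dataset is invented, and a real project would use far more data and hold some back for testing.

```python
from sklearn.tree import DecisionTreeClassifier

# Historical data (made up): one feature per customer, the number of
# support calls, and a label saying whether that customer churned (1) or not (0).
features = [[0], [1], [2], [3], [10], [11], [12], [13]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Apply an algorithm to the historical data to get back a trained model.
model = DecisionTreeClassifier(random_state=0)
model.fit(features, labels)

# Send new data to the model for a prediction.
predictions = model.predict([[1], [12]])
print(predictions)
```

The model here behaves like a function that maps "number of support calls" to "will churn or not", which is the oversimplified picture described above.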
439 |
440 | There are a few "families" of problems you can solve with a Machine Learning Solution:
441 |
442 |
443 |
444 |
445 |
446 | While it's tempting to start with the algorithms and the outputs, it's actually more important to understand the general process of a Data Science project. To do that, you can use the Team Data Science Process - in fact, you have been studying many of these steps already:
447 |
448 |
449 |
450 |
451 |
452 | Each of these phases has a specific set of steps you follow to complete them:
453 |
454 |
455 |
456 | Phase One - Business Understanding
457 |
458 | In the Business Understanding Phase the team determines the prediction or categorical work your organization wants to create. You'll also set up your project planning documents, locate your initial data source locations, and set up the environment you will use to create and operationalize your models. This phase involves a great deal of coordination among the team and the broader organization.
459 |
460 | Read the [Documentation Reference here](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/lifecycle-business-understanding)
461 |
462 |
463 |
464 | Phase Two - Data Acquisition and Understanding
465 |
466 | Read the [Documentation Reference here](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/lifecycle-data)
467 |
468 | In the Data Acquisition and Understanding phase of the TDSP, you ingest or access data from various locations to answer the questions the organization has asked. In most cases, this data will be in multiple locations. Once the data is ingested into the system, you’ll need to examine it to see what it holds. All data needs cleaning, so after the inspection phase, you’ll replace missing values, and add and change columns. You’ve already seen the Libraries you'll need to work with for Data Wrangling - Pandas being the most common in use.
469 |
470 |
471 | Phase Three - Modeling
472 |
473 | In this phase, you will create the experiment runs, perform feature engineering, and run experiments with various settings and parameters. After selecting the best performing run, you will create a trained model and save it for operationalization in the next phase. This modeling is done with yet another set of Python Libraries - the most common being scikit-learn and TensorFlow, among others. You'll see this in action in just a bit.
474 |
475 | Read the [Documentation Reference here](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/lifecycle-modeling)
476 |
477 |
478 | Phase Four - Deployment
479 |
480 | In this phase you will take the trained model and any other necessary assets and deploy them to a system that will respond to API requests.
481 |
482 | Read the [Documentation Reference here](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/lifecycle-deployment)
483 |
484 |
485 | Phase Five - Customer Acceptance
486 |
487 | The final phase involves testing the model predictions on real-world queries to ensure that it meets all requirements. In this phase you also document the project so that all parameters are well-known. Finally, a mechanism is created to re-train the model.
488 |
489 | Read the [Documentation Reference here](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/lifecycle-acceptance)
490 |
491 |
492 |
493 | As you can see, there are quite a few things to do to work with Python in a Data Science Machine Learning project. Rather than have you create an entire solution, there is one you can examine to see each phase. You'll do that next.
494 |
495 |
Activity - Machine Learning
496 |
497 | Now open the `/code/03_WorkingWithData.py` file and read the code-blocks you see there marked "Machine Learning".
498 |
499 | Don't worry too much about the math and the functions in the Machine Learning Libraries, just focus on the process. Then swing back around to the Data Science with Python reference for a deeper dive into this very large area.
500 |
501 | Want to see this in action? Check out this reference: https://tdsppython-buckwoodynotebooks.notebooks.azure.com/nb/notebooks/Instructor%20Notebook.ipynb
502 |
503 |
For Further Study
504 |
505 | - [Python Docs for Data Types](https://docs.python.org/2/tutorial/datastructures.html#)
506 |
507 | Next, Continue to *04 Environments and Deployment*
--------------------------------------------------------------------------------
/PythonForDataProfessionals/04 Environments and Deployment.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Python for Data Professionals
4 |
5 | ## 04 Environments and Deployment
6 |
7 |
8 |
9 |
10 | - Course Outline
11 | - 1 - Overview and Course Setup
12 | - 2 - Programming Basics
13 | - 3 Working with Data
14 | - 4 Deployment and Environments (This section)
15 | - 4.1 Conda
16 | - 4.2 Pickling
17 | - 4.3 SQL Server Machine Learning Services
18 |
19 |
20 |
21 |
22 | The main installation of Python - sometimes called "Core" or "base" - has a set of parameters it works with. Since it runs on many operating systems, these variables are set and altered in different ways. Here are the primary environment settings on the standard installation of Python:
23 |
24 | - PYTHONPATH - Adds directories to the search path the Python interpreter uses to locate the module files imported into a program.
25 | - PYTHONHOME - Changes the location of the standard Python libraries.
26 | - PYTHONSTARTUP - The path to an initialization file (often named `.pythonrc.py`) containing Python source code. It is executed every time you start the interpreter in interactive mode.
27 | - PYTHONCASEOK - If set, Python ignores case in "import" statements. This applies only to Windows and macOS.
28 |
29 | Python also records the configuration variables it was built and installed with. You can show all of them by importing the `sysconfig` standard library module and calling its `get_config_vars` function:
30 |
31 | `import sysconfig`
32 |
33 | `sysconfig.get_config_vars()`
34 |
35 | If you want to see just one variable, pass its name to `get_config_var` - remember, the full set is just a dictionary:
36 |
37 | `sysconfig.get_config_var('LIBDIR')`
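
As a small sketch of how the two kinds of settings differ, the following reads the environment variables listed above through `os.environ` and the build configuration through `sysconfig`. Which variables are actually set depends on your system, so the printed values will vary.

```python
import os
import sysconfig

# Environment variables are exposed as a dictionary-like object;
# .get() returns None instead of raising an error when a variable is not set.
for name in ("PYTHONPATH", "PYTHONHOME", "PYTHONSTARTUP", "PYTHONCASEOK"):
    print(name, "=", os.environ.get(name))

# get_config_vars() returns an ordinary dictionary of configuration values
config = sysconfig.get_config_vars()
print(type(config).__name__, "with", len(config), "entries")

# get_config_var() returns None for a name that does not exist on this platform
print(sysconfig.get_config_var("LIBDIR"))
```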
38 |
39 |
40 |
41 |
4.1 pip and Conda
42 |
43 | To install new packages, you can build the source code manually, but that's not the way it's most often done. Typically you use a "package manager", and the most popular is "pip". The pip program installs and configures most of the libraries you will need for the base installation of Python.
44 |
45 | You probably already have the pip program. However, to install pip, you can use the [cURL](https://curl.haxx.se/download.html) program to get it:
46 |
47 | `curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py`
48 |
49 | Then use Python to run the script to install it:
50 |
51 | `python get-pip.py`
52 |
53 | From there, you can query the packages you have with this command, from the command-line in your operating system:
54 |
55 | `pip list`
56 |
57 | You can install a package using this command:
58 |
59 | `pip install SomePackage # latest version`
60 |
61 | `pip install SomePackage==1.0.4 # specific version`
62 |
63 | `pip install 'SomePackage>=1.0.4' # minimum version`
64 |
65 | And you can remove a package with this command:
66 |
67 | `pip uninstall SomePackage`
68 |
69 | There is a lot more that you can do with pip, and you can find out the list here:
70 |
71 | `pip`
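
You can also check from inside Python whether a package is installed. The sketch below assumes Python 3.8 or later, where the standard library includes `importlib.metadata`; the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None
print(installed_version("pip"))
print(installed_version("SomePackage"))
```

This is handy at the top of a script when you want a clear message about a missing dependency instead of an import error later on.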
72 |
73 | A more robust package manager, which even installs a distribution of Python for you along with other tools, is [Conda](https://conda.io/docs/user-guide/getting-started.html). For this course, you have installed Python using Conda, which not only has a package manager, but also isolates environments for you. This means that you can create a "boundary" of variables, package directories, and more around a name you specify. You can then switch to that environment to create your code, and that code will always have a consistent set of variables and packages.
74 |
75 | To create a Conda environment, issue the following command:
76 |
77 | `conda create --name <environment-name>`
78 |
79 | For instance, this command creates a new environment called "bucktest" and installs the biology package called biopython:
80 |
81 | `conda create --name bucktest biopython`
82 |
83 | To see the environments, issue the following command:
84 |
85 | `conda info --envs`
86 |
87 | The one with the asterisk (*) is the one you are using now. To switch to another environment, issue the following command:
88 |
89 | `activate bucktest` (In Windows)
90 |
91 | `source activate bucktest` (Mac and Linux; recent versions of Conda also accept `conda activate bucktest` on every platform)
92 |
93 | And to see information about that environment, issue the following command:
94 |
95 | `conda list`
96 |
97 | or just `conda` to find out everything you can do with Conda.
98 |
99 | To install packages in that environment, use this command:
100 |
101 | `conda install biopython`
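
To confirm from inside Python which environment your code is running in, you can inspect the interpreter itself. This is a minimal sketch: `CONDA_DEFAULT_ENV` is the variable Conda sets when an environment is activated, and it will be `None` when no Conda environment is active. The function name `describe_environment` is invented for illustration.

```python
import os
import sys


def describe_environment():
    """Return the interpreter location and the active Conda environment name, if any."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```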
102 |
103 |
Activity - pip and Conda
104 |
105 | Now open the `/code/04_EnvironmentsAndDeployment.py` file and follow the instructions you see there for 4.1.
106 |
107 |
108 |
109 |
4.2 Pickling
110 |
111 | "Pickling" in Python means to serialize a Python object. Perhaps that isn't very helpful - what it really means is to take the output of whatever you did in Python and make it available again in another environment or program. It's a way of saving the "state" of a program so that it can be transferred and then re-loaded.
112 |
113 | It's best illustrated with some code:
114 |
115 | `import pickle`
116 |
117 | `a = ['1','2','3']`
118 |
119 | `PickleFileName = "picklefile"`
120 |
121 | `FileObject = open(PickleFileName,'wb')`
122 |
123 | `pickle.dump(a,FileObject)`
124 |
125 | `FileObject.close()`
126 |
127 | Now you can copy that file to a new computer, open Python, and work with it again as if you ran it there:
128 |
129 | `import pickle`
130 |
131 | `PickleFileName = "picklefile"`
132 |
133 | `FileObject = open(PickleFileName,'rb')`
134 |
135 | `b = pickle.load(FileObject) `
136 |
137 | `b`
138 |
139 | And now *a* equals *b*. Of course, your program would be much longer, most often a series of steps, which might for instance do a Machine Learning prediction.
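
The same round trip can be written with `with` blocks, which close the files for you even if an error occurs. This sketch writes to a temporary directory (an assumption made so that the example cleans up after itself) rather than copying the file to another computer.

```python
import os
import pickle
import tempfile

a = ['1', '2', '3']

# Write to a temporary directory so the example cleans up after itself
with tempfile.TemporaryDirectory() as folder:
    pickle_path = os.path.join(folder, "picklefile")

    # 'wb': pickle writes bytes, so the file must be opened in binary mode
    with open(pickle_path, "wb") as file_object:
        pickle.dump(a, file_object)

    # 'rb': read the bytes back and rebuild the object
    with open(pickle_path, "rb") as file_object:
        b = pickle.load(file_object)

print(a == b)  # the contents are equal
print(a is b)  # but b is a separate object
```

One caution: loading a pickle can run arbitrary code, so only unpickle files that come from a source you trust.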
140 |
141 | You can read a lot more about pickling here: https://wiki.python.org/moin/UsingPickle
142 |
143 |
Activity - Pickle
144 |
145 | Now open the `/code/04_EnvironmentsAndDeployment.py` file and follow the instructions you see there for step 4.2.
146 |
147 |
4.3 Docker and Flask
148 |
149 | Two other abstraction levels are useful to think about. You're probably familiar with Virtual Machines - which use software to emulate hardware. This lets you install a complete new "computer" in a computer's OS. One level up from that abstraction layer is a *Container*. Rather than emulating hardware, a Container shares the kernel of the host operating system (most often Linux) and packages only what is needed to operate a runtime - like Python. Docker is the most common tool for building and running Containers. This provides an even more consistent environment for your application, since it can also include settings and programs above the Python level.
150 |
151 | The *Flask* micro-framework for Python isn't technically an abstraction layer; it has more to do with serving your application up to a Web call. You'll often see Docker and Flask used together, so both are covered here for completeness. Once again, seeing some code is useful - this example comes from the Flask documentation site:
152 |
153 |
154 |
155 | from flask import Flask
156 | app = Flask(__name__)
157 |
158 | @app.route('/')
159 | def hello_world():
160 |     return 'Hello, World!'
161 |
162 |
163 |
164 | You can probably follow the layout of this code, but there are some specifics here. First, the code imports Flask itself. Next, it creates an instance of a Flask app, called "app" in this case. From there, the route is set to the base URL ('/') - just like the main page of a web site. And finally, a simple function returns the words "Hello, World!".
165 |
166 | So far, nothing is happening - the code is just on disk. However, if you save it in a file called `hello.py`, you can "deploy" the code on a running system with these commands (in Linux):
167 |
168 |
169 | $ export FLASK_APP=hello.py
170 | $ flask run
171 | * Running on http://127.0.0.1:5000/
172 |
173 |
174 | OK...so what? Well, in this case, you could open a Web Browser on that system and type in that URL - and you'll see "Hello World!" pop up on the screen. Of course, real applications are much more complicated, can take POST and GET operations, and much more. But this is a very convenient way to serve up your Python application without having to tell your users to install and run Python.
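
Flask applications follow the WSGI standard, which is how Python web servers talk to Python code. As a rough sketch of what a framework handles for you, the example below is a bare WSGI application written with only the standard library, so it runs without installing Flask. It is not Flask code, and the `call` helper is a hypothetical convenience for exercising the app without starting a server.

```python
import json


def app(environ, start_response):
    """A bare WSGI application: the kind of callable a framework builds for you."""
    if environ["REQUEST_METHOD"] == "GET" and environ["PATH_INFO"] == "/":
        status, body = "200 OK", {"message": "Hello, World!"}
    else:
        status, body = "404 Not Found", {"error": "unknown route"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def call(method, path):
    """Hypothetical helper that calls the app directly, without a web server."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"REQUEST_METHOD": method, "PATH_INFO": path}, start_response))
    return captured["status"], json.loads(body)


print(call("GET", "/"))
print(call("GET", "/missing"))
```

Compared with this, the `@app.route('/')` decorator in the Flask example replaces the manual checks on the request method and path, which is most of the appeal of a micro-framework.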
175 |
176 | Of course, there's a lot more to both of these topics - read the references below to learn more.
177 |
178 |
179 |
180 |
4.3 Operationalizing Python in SQL Server Machine Learning Services
181 |
182 | SQL Server (2017 and higher) has a mechanism to run Python code by calling it in a Stored Procedure, which can work with a Pickle file or run the code inline. This Python extension is part of the SQL Server Machine Learning Services add-on to the relational database engine. It adds a Python execution environment, an Anaconda distribution with the Python 3.5 runtime and interpreter, standard libraries and tools, and the Microsoft product libraries for Python: [revoscalepy](https://docs.microsoft.com/machine-learning-server/python-reference/revoscalepy/revoscalepy-package) for analytics at scale and [microsoftml](https://docs.microsoft.com/machine-learning-server/python-reference/microsoftml/microsoftml-package) for machine learning algorithms. Python runs in a separate process, side-by-side with SQL Server, so that it cannot interfere with SQL Server base processes or compromise database operations.
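
A pickled object does not have to live in a file. One pattern, assumed here for illustration rather than taken from this course, is to serialize a model to bytes so that it can be stored in a binary table column and handed to a script later. The sketch uses a plain dictionary named `model_parameters` as a stand-in for a trained model.

```python
import pickle

# Hypothetical stand-in for a trained model: a dictionary of parameters
model_parameters = {"threshold": 3, "feature": "numberofcomplaints"}

# dumps() returns bytes instead of writing to a file
model_bytes = pickle.dumps(model_parameters)
print(type(model_bytes).__name__, len(model_bytes) > 0)

# loads() rebuilds the object from those bytes
restored = pickle.loads(model_bytes)
print(restored == model_parameters)
```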
183 |
184 | When you run Python "inside" SQL Server, you must encapsulate the Python script inside a special stored procedure, [sp_execute_external_script](https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql?view=sql-server-ver15). Here's an example of Python code running in SQL Server using a Stored Procedure:
185 |
186 |
187 | EXECUTE sp_execute_external_script @language = N'Python'
188 | , @script = N'
189 | a = 1
190 | b = 2
191 | c = a/b
192 | d = a*b
193 | print(c, d)
194 | '
195 |
196 |
197 |
198 |
199 | After the script has been embedded in the stored procedure, any application that can make a stored procedure call can initiate execution of the Python code. From there, SQL Server manages code execution in this process:
200 |
201 | 1. A request for the Python runtime is indicated by the parameter @language='Python' passed to the stored procedure. SQL Server sends this request to the launchpad service. In Linux, SQL uses a launchpadd service to communicate with a separate launchpad process for each user. See the Extensibility architecture diagram for details.
202 | 2. The launchpad service starts the appropriate launcher; in this case, PythonLauncher.
203 | 3. PythonLauncher starts the external Python35 process.
204 | 4. BxlServer coordinates with the Python runtime to manage exchanges of data, and storage of working results.
205 | 5. SQL Satellite manages communications about related tasks and processes with SQL Server.
206 | 6. BxlServer uses SQL Satellite to communicate status and results to SQL Server.
207 | 7. SQL Server gets results and closes related tasks and processes.
208 |
209 | You can see that process here:
210 |
211 |
212 |
213 |
214 |
215 |
Activity - Run Python in a SQL Server Stored Procedure
216 |
217 | - Ensure you have [completed the pre-requisites and installed SQL Server Machine Learning Services](https://docs.microsoft.com/en-us/sql/machine-learning/install/sql-machine-learning-services-windows-install?view=sql-server-ver15).
218 | - [Open this reference and follow the steps you see there](https://docs.microsoft.com/en-us/sql/machine-learning/tutorials/quickstart-python-create-script?view=sql-server-ver15).
219 |
220 |
221 |
222 |
For Further Study
223 |
224 | - [You can learn more about Docker here](https://www.fullstackpython.com/docker.html)
225 | - [More on Flask](http://flask.pocoo.org/)
226 | - [Creating a simple Flask application](http://containertutorials.com/docker-compose/flask-simple-app.html)
227 | - [More on SQL Server Machine Learning Services is here](https://docs.microsoft.com/en-us/sql/machine-learning/what-is-sql-server-machine-learning?view=sql-server-ver15)
228 |
229 | Congratulations! You now know the basics of working with Python and Data. As you can see, there's a lot more to learn - so use your new knowledge to expand on what you have learned.
--------------------------------------------------------------------------------
/PythonForDataProfessionals/Python for Data Professionals.pyproj:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | Debug
5 | 2.0
6 | {b7bafee5-b1f9-4942-b965-117a23fe690d}
7 |
8 |
9 |
10 | .
11 | .
12 | {888888a0-9f3d-457c-b088-3a5042f75d52}
13 | Standard Python launcher
14 |
15 |
16 |
17 |
18 |
19 | 10.0
20 |
21 |
22 |
23 | Content
24 | 00 Pre-Requisites.md
25 |
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 |
39 |
40 |
41 |
42 |
43 |
44 |
45 |
46 |
47 |
48 |
49 |
50 |
51 |
52 |
53 |
54 |
55 |
56 |
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 |
65 |
--------------------------------------------------------------------------------
/PythonForDataProfessionals/Python for Data Professionals_hdi_settings.json:
--------------------------------------------------------------------------------
1 | // workspace configuration template of HDInsight extension
2 | {
3 | /* example:
4 | "script_to_cluster": [{
5 | "clusterName": "hdi_cluster_1",
6 | "filePath": "a.hql"
7 | },
8 | {
9 | "clusterName": "hdi_cluster_2",
10 | "filePath": "src/b.py"
11 | }]
12 | */
13 | "script_to_cluster": [{
14 |
15 | }],
16 | /* more details from: https://github.com/cloudera/livy
17 | examples:
18 | "livy_conf": {
19 | "driverMemory": "1G",
20 | "driverCores": 2,
21 | "executorMemory": "512M",
22 | "executorCores": 10,
23 | "numExecutors": 5
24 | }
25 | */
26 | "livy_conf": {
27 |
28 | },
29 | /* examples:
30 | "additional_conf": {
31 | azure_environment: AzureChina // Only Azure or AzureChina works here
32 | }
33 | */
34 |
35 | "additional_conf": {
36 |
37 | }
38 | }
--------------------------------------------------------------------------------
/PythonForDataProfessionals/assets/MLCheatSheet.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/assets/MLCheatSheet.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/assets/NoStarchPressPython.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/assets/NoStarchPressPython.pdf
--------------------------------------------------------------------------------
/PythonForDataProfessionals/assets/NumpyPythonCheatSheet.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/assets/NumpyPythonCheatSheet.pdf
--------------------------------------------------------------------------------
/PythonForDataProfessionals/assets/PandasCheatSheet.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/assets/PandasCheatSheet.pdf
--------------------------------------------------------------------------------
/PythonForDataProfessionals/assets/Python3CheatSheet.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/assets/Python3CheatSheet.pdf
--------------------------------------------------------------------------------
/PythonForDataProfessionals/assets/UseCases.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/assets/UseCases.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/code/01_OverviewAndCourseSetup.py:
--------------------------------------------------------------------------------
1 | # 01_OverviewAndCourseSetup.py
2 | # Purpose: Initial Course Setup and displaying versions
3 | # Author: Buck Woody
4 | # Credits and Sources: Inline
5 | # Last Updated: 27 June 2018
6 |
7 | # Check the Python Version and Information
8 | import platform
9 | python_version=platform.python_version()
10 | print(python_version)
11 |
12 | # - Fix this code so that it runs
13 |
14 | print "The Python Version is: " python_version
15 |
16 | # - Using "platform", what other information can you derive about this system?
17 |
18 | # EOF: 01_OverviewAndCourseSetup.py
--------------------------------------------------------------------------------
/PythonForDataProfessionals/code/02_ProgrammingBasics.py:
--------------------------------------------------------------------------------
1 | # 02_ProgrammingBasics.py
2 | # Purpose: General Programming exercises for Python
3 | # Author: Buck Woody
4 | # Credits and Sources: Inline
5 | # Last Updated: 27 June 2018
6 |
7 | # 2.1 Getting Help
8 | help()
9 | help(str)
10 |
11 | # - Write code to find help on help
12 |
13 | # 2.2 Code Syntax and Structure
14 |
15 | # - Python uses spaces to indicate code blocks. Fix the code below:
16 | x=10
17 | y=5
18 | if x > y:
19 | print(str(x) + " is greater than " + str(y))
20 |
21 | # - Arguments on first line are forbidden when not using vertical alignment. Fix this code:
22 | foo = long_function_name(var_one, var_two,
23 | var_three, var_four)
24 |
25 | # operators sit far away from their operands. Fix this code:
26 | income = (gross_wages +
27 | taxable_interest +
28 | (dividends - qualified_dividends) -
29 | ira_deduction -
30 | student_loan_interest)
31 |
32 | # - The import statement should use separate lines for each effort. You can fix the code below
33 | # using separate lines or by using the "from" statement:
34 | import sys, os
35 |
36 | # - The following code has extra spaces in the wrong places. Fix this code:
37 | i=i+1
38 | submitted +=1
39 | x = x * 2 - 1
40 | hypot2 = x * x + y * y
41 | c = (a + b) * (a - b)
42 |
43 | # 2.3 Variables
44 |
45 | # - Add a line below x=3 that changes the variable x from int to a string
46 | x=3
47 | type(x)
48 |
49 | # - Write code that prints the string "This class is awesome" using variables:
50 | x="is awesome"
51 | y="This Class"
52 |
53 | # 2.4 Operations and Functions
54 |
55 | # - Use some basic operators to write the following code:
56 | # Assign two variables
57 | # Add them
58 | # Subtract 20 from each, add those values together, save that to a new variable
59 | # Create a new string variable with the text "The result of my operations are: "
60 | # Print out a single string on the screen with the result of the variables
61 | # showing that result.
62 |
63 | # EOF: 02_ProgrammingBasics.py
--------------------------------------------------------------------------------
/PythonForDataProfessionals/code/03_WorkingWithData.py:
--------------------------------------------------------------------------------
1 | # 03_WorkingWithData.py
2 | # Purpose: Exercise files for Python for Data Professionals course, section 3
3 | # Author: Buck Woody
4 | # Credits and Sources: Inline
5 | # Last Updated: 02 July 2018
6 |
7 | # - 3.1 Data Types
8 |
9 | # Create a variable called MyName and set it to your name.
10 | # Print out the middle two letters of the variable:
11 |
12 | # Create a new variable of a 3-digit number. Print out the data type for the variable:
13 |
14 | # Change the previous variable to text. Print the data type for the variable:
15 |
16 | # Create a list structure with three numbers in it, add two of the numbers, print the result:
17 |
18 | # Create a Dictionary structure with three values using keys of 1, 2 and 3.
19 | # Query for the value of key 2:
20 |
21 | # - 3.1a NumPy Exercises
22 | # Create a NumPy 1-dimensional array consisting of three numbers.
23 | # Sum those numbers.
24 | # Add three more numbers as an additional dimension to the array.
25 | # Sum the two dimensions over the rows.
26 | # Sum the two dimensions over the columns:
27 |
28 |
29 | # - 3.1b Pandas Exercises
30 | # Use the Pandas library, and alias it as pd:
31 |
32 | # Show the first five values of long_series:
33 | long_series = pd.Series(np.random.randn(1000))
34 |
35 | # Read the file CATelcoCustomerChurnTrainingSample.csv from the ./data directory
36 | # into a Pandas Data Frame:
37 |
38 | # Explore the Data Frame you just created with Pandas:
39 |
40 | # - 3.2 Data Ingestion
41 | # Read customer data from the ./data/CATelcoCustomerChurnTrainingSample.csv
42 | # into a data frame called df using pandas:
43 |
44 | # Show the Data in the Data Frame:
45 |
46 | # - 3.3 Data Inspection
47 | # Ensure that you have 29 columns and 20,468 rows loaded
48 | print('There should be 20468 observations of 29 variables:')
49 |
50 | # Explore the df Dataframe, using at least a five-number statistical summary.
51 | # NOTE: Your exploration may be much different - you will show this data
52 | # using graphs in the next exercise.
53 |
54 | # Show the size and shape of the data:
55 |
56 | # Show the first and last 10 rows:
57 |
58 | # Show the dataframe structure:
59 |
60 | # Check for missing values:
61 | print('Missing values: ', '\n')
62 |
63 | # perform a simple statistical display:
64 | print('Dataframe Statistics: ', '\n')
65 |
66 | # - 3.4 Graphing
67 | # Using any graphical library or representation you like, create three separate graphs
68 | # that best illustrate the data layout of the dataframe you just created:
69 |
70 | # - 3.5 Machine Learning and AI
71 | # Review the following code, observing what it does.
72 |
73 | # 1 - Setup - Get everything up to date, and add any pips you want here
74 | # Import Libraries for the Customer Churn Prediction Labs - Change for other uses
75 |
76 | # Serializing output/input
77 | import pickle
78 |
79 | # Libraries for training and scoring
80 | from sklearn.naive_bayes import GaussianNB
81 | from sklearn.tree import DecisionTreeClassifier
82 | from sklearn.metrics import accuracy_score
83 | from sklearn.model_selection import train_test_split
84 | from sklearn.preprocessing import LabelEncoder
85 |
86 | # Data and Numeric Manipulation
87 | import pandas as pd
88 | import numpy as np
89 |
90 | # Working with files
91 | import csv
92 |
93 | #/ 1 - Setup
94 |
95 | #2 - Read data and verify
96 | # Read customer data from a single file
97 | df = pd.read_csv('./data/CATelcoCustomerChurnTrainingSample.csv')
98 |
99 | # Ensure that you have 29 columns and 20,468 rows loaded
100 | print('There should be 20468 observations of 29 variables:')
101 | print(df.shape, '\n')
102 |
103 | # Optional - Instead, read the data from source:
104 | # https://github.com/Azure/MachineLearningSamples-ChurnPrediction/blob/master/data/CATelcoCustomerChurnTrainingSample.csv
105 | #/ 2 - Read Data
106 |
107 | # 2.1 - Explore Data
108 | # Explore the df Dataframe, using at least a five-number statistical summary.
109 | # NOTE: Your exploration may be much different - experiment with graphics as well.
110 |
111 | # Show the size and shape of data:
112 | print('The size of the data is: %d rows and %d columns' % df.shape, '\n')
113 |
114 | # Show the first and last 10 rows
115 | print('First ten rows of the data: ')
116 | print(df.head(10), '\n')
117 | print('Last ten rows of the data: ')
118 | print(df.tail(10), '\n')
119 |
120 | # Show the dataframe structure:
121 | print('Dataframe Structure: ', '\n')
122 | print(df.info(), '\n')
123 |
124 | # Check for missing values:
125 | print('Missing values: ', '\n')
126 | print(df.apply(lambda x: sum(x.isnull()),axis=0), '\n')
127 |
128 | # perform a simple statistical display:
129 | print('Dataframe Statistics: ', '\n')
130 | print(df.describe(), '\n')
131 |
132 | #/ 2.1
133 |
134 | # 3.0 - Customer Churn Prediction Experiment
135 | # For completeness of this example, let's re-import our libraries
136 | import pickle
137 | import pandas as pd
138 | import numpy as np
139 | import csv
140 | from sklearn.naive_bayes import GaussianNB
141 | from sklearn.tree import DecisionTreeClassifier
142 | from sklearn.metrics import accuracy_score
143 | from sklearn.model_selection import train_test_split
144 | from sklearn.preprocessing import LabelEncoder
145 |
146 | # We'll re-load the data as "CustomerDataFrame"
147 | CustomerDataFrame = pd.read_csv('data/CATelcoCustomerChurnTrainingSample.csv')
148 |
149 | # Fill all NA values with 0:
150 | CustomerDataFrame = CustomerDataFrame.fillna(0)
151 |
152 | # Drop all duplicate observations:
153 | CustomerDataFrame = CustomerDataFrame.drop_duplicates()
154 |
155 | # We don't need the 'year' or 'month' variables
156 | CustomerDataFrame = CustomerDataFrame.drop('year', axis=1)
157 | CustomerDataFrame = CustomerDataFrame.drop('month', axis=1)
158 |
159 | # Implement One-Hot Encoding for this model (https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/)
160 | columns_to_encode = list(CustomerDataFrame.select_dtypes(include=['category','object']))
161 | dummies = pd.get_dummies(CustomerDataFrame[columns_to_encode]) #
162 |
163 | # Drop the original categorical columns:
164 | CustomerDataFrame = CustomerDataFrame.drop(columns_to_encode, axis=1) #
165 |
166 | # Re-join the dummies frame to the original data:
167 | CustomerDataFrame = CustomerDataFrame.join(dummies)
168 |
169 | # Show the new columns in the joined dataframe:
170 | print(CustomerDataFrame.columns, '\n')
171 |
172 | # Experiment using Naive Bayes:
173 | nb_model = GaussianNB()
174 | random_seed = 42
175 | split_ratio = .3
176 | train, test = train_test_split(CustomerDataFrame, random_state = random_seed, test_size = split_ratio)
177 |
178 | target = train['churn'].values
179 | train = train.drop('churn', axis=1)
180 | train = train.values
181 | nb_model.fit(train, target)
182 |
183 | expected = test['churn'].values
184 | test = test.drop('churn', axis=1)
185 | predicted = nb_model.predict(test)
186 |
187 | # Print out the Naive Bayes Classification Accuracy:
188 | print("Naive Bayes Classification Accuracy", accuracy_score(expected, predicted))
189 |
190 | # Experiment using Decision Trees:
191 | dt_model = DecisionTreeClassifier(min_samples_split=20, random_state=99)
192 | dt_model.fit(train, target)
193 | predicted = dt_model.predict(test)
194 |
195 | # Print out the Decision Tree Accuracy:
196 | print("Decision Tree Classification Accuracy", accuracy_score(expected, predicted))
197 |
198 | #/ 3.0
199 |
200 | # 4.0a - Create the Model File
201 | # serialize the best performing model on disk
202 | print ("Serialize the model to a model.pkl file in the root")
203 | ModelFile = open('./model.pkl', 'wb')
204 | pickle.dump(dt_model, ModelFile)
205 | ModelFile.close()
206 | #/ 4.0a
207 |
208 | # 4.0b - Operationalization: Scoring the calls to the model
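209 | # Prepare the web service definition before deploying
210 | # The model was saved with pickle, so load it with pickle as well
211 | # (sklearn.externals.joblib has been removed from recent scikit-learn releases)
212 |
213 | # load the model file
214 | with open('./model.pkl', 'rb') as ModelFile:
215 |     model = pickle.load(ModelFile)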
216 |
217 | # Import for handling the JSON file
218 | import json
219 | import pandas as pd
220 |
221 | # Set up a sample "call" from a client:
222 | input_df = "{\"callfailurerate\": 0, \"education\": \"Bachelor or equivalent\", \"usesinternetservice\": \"No\", \"gender\": \"Male\", \"unpaidbalance\": 19, \"occupation\": \"Technology Related Job\", \"year\": 2015, \"numberofcomplaints\": 0, \"avgcallduration\": 663, \"usesvoiceservice\": \"No\", \"annualincome\": 168147, \"totalminsusedinlastmonth\": 15, \"homeowner\": \"Yes\", \"age\": 12, \"maritalstatus\": \"Single\", \"month\": 1, \"calldroprate\": 0.06, \"percentagecalloutsidenetwork\": 0.82, \"penaltytoswitch\": 371, \"monthlybilledamount\": 71, \"churn\": 0, \"numdayscontractequipmentplanexpiring\": 96, \"totalcallduration\": 5971, \"callingnum\": 4251078442, \"state\": \"WA\", \"customerid\": 1, \"customersuspended\": \"Yes\", \"numberofmonthunpaid\": 7, \"noadditionallines\": \"\\\\N\"}"
223 |
224 | # Cleanup
225 | input_df_encoded = json.loads(input_df)
226 | input_df_encoded = pd.DataFrame([input_df_encoded], columns=input_df_encoded.keys())
227 | input_df_encoded = input_df_encoded.drop('year', axis=1)
228 | input_df_encoded = input_df_encoded.drop('month', axis=1)
229 | input_df_encoded = input_df_encoded.drop('churn', axis=1)
230 |
231 | # Pre-process scoring data consistent with training data
232 | columns_to_encode = ['customersuspended', 'education', 'gender', 'homeowner', 'maritalstatus', 'noadditionallines', 'occupation', 'state', 'usesinternetservice', 'usesvoiceservice']
233 | dummies = pd.get_dummies(input_df_encoded[columns_to_encode])
234 | input_df_encoded = input_df_encoded.join(dummies)
235 | input_df_encoded = input_df_encoded.drop(columns_to_encode, axis=1)
236 |
237 | columns_encoded = ['age', 'annualincome', 'calldroprate', 'callfailurerate', 'callingnum',
238 | 'customerid', 'monthlybilledamount', 'numberofcomplaints',
239 | 'numberofmonthunpaid', 'numdayscontractequipmentplanexpiring',
240 | 'penaltytoswitch', 'totalminsusedinlastmonth', 'unpaidbalance',
241 | 'percentagecalloutsidenetwork', 'totalcallduration', 'avgcallduration',
242 | 'customersuspended_No', 'customersuspended_Yes',
243 | 'education_Bachelor or equivalent', 'education_High School or below',
244 | 'education_Master or equivalent', 'education_PhD or equivalent',
245 | 'gender_Female', 'gender_Male', 'homeowner_No', 'homeowner_Yes',
246 | 'maritalstatus_Married', 'maritalstatus_Single', 'noadditionallines_\\N',
247 | 'occupation_Non-technology Related Job', 'occupation_Others',
248 | 'occupation_Technology Related Job', 'state_AK', 'state_AL', 'state_AR',
249 | 'state_AZ', 'state_CA', 'state_CO', 'state_CT', 'state_DE', 'state_FL',
250 | 'state_GA', 'state_HI', 'state_IA', 'state_ID', 'state_IL', 'state_IN',
251 | 'state_KS', 'state_KY', 'state_LA', 'state_MA', 'state_MD', 'state_ME',
252 | 'state_MI', 'state_MN', 'state_MO', 'state_MS', 'state_MT', 'state_NC',
253 | 'state_ND', 'state_NE', 'state_NH', 'state_NJ', 'state_NM', 'state_NV',
254 | 'state_NY', 'state_OH', 'state_OK', 'state_OR', 'state_PA', 'state_RI',
255 | 'state_SC', 'state_SD', 'state_TN', 'state_TX', 'state_UT', 'state_VA',
256 | 'state_VT', 'state_WA', 'state_WI', 'state_WV', 'state_WY',
257 | 'usesinternetservice_No', 'usesinternetservice_Yes',
258 | 'usesvoiceservice_No', 'usesvoiceservice_Yes']
259 |
260 | # A single record only produces dummy columns for the values it contains, so some
260 | # columns the model expects will be missing. Add those and fill them with 0's:
261 | for column_encoded in columns_encoded:
262 |     if column_encoded not in input_df_encoded.columns:
263 |         input_df_encoded[column_encoded] = 0
264 |
265 | # Return final prediction
266 | pred = model.predict(input_df_encoded)
267 |
268 | # (In production you would replace the print() statements here with some sort of JSON response)
269 | print('JSON sent to the prediction model:', '\n')
270 | print(input_df, '\n')
271 | print('For the JSON string sent from the client, the prediction is returned as more JSON (0 = No churn, 1 = Churn):', '\n')
272 | print(json.dumps(str(pred[0])))
273 |
274 | #/ 4.0b
275 |
276 | # EOF: 03_WorkingWithData.py
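
The scoring step above adds any missing dummy columns one at a time. A minimal sketch of the same idea using `DataFrame.reindex`, which also fixes the column order and drops columns the model never saw. The short `TRAINING_COLUMNS` list, the `encode_for_scoring` helper, and the sample payload are hypothetical stand-ins for illustration; they are not part of the course code.

```python
import json

import pandas as pd

# Hypothetical, shortened stand-in for the full columns_encoded list used in training.
TRAINING_COLUMNS = ["age", "gender_Female", "gender_Male", "state_CA", "state_WA"]


def encode_for_scoring(payload, categorical, training_columns):
    """One-hot encode a single JSON record and align it to the training columns."""
    record = pd.DataFrame([json.loads(payload)])
    # dtype=int keeps the dummy columns as 0/1 integers across pandas versions.
    encoded = pd.get_dummies(record, columns=categorical, dtype=int)
    # reindex adds missing columns (filled with 0), drops columns that are not
    # in training_columns, and puts everything in the training order.
    return encoded.reindex(columns=training_columns, fill_value=0)


# Hypothetical client payload with one unused field ("year").
sample = '{"age": 42, "gender": "Male", "state": "WA", "year": 2015}'
aligned = encode_for_scoring(sample, ["gender", "state"], TRAINING_COLUMNS)
print(aligned.to_dict(orient="records"))
```

Whether the column order matters depends on the model; scikit-learn style estimators generally expect the same columns in the same order they were trained on, so aligning explicitly is the safer assumption.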
--------------------------------------------------------------------------------
/PythonForDataProfessionals/code/04_EnvrionmentsAndDeployment.py:
--------------------------------------------------------------------------------
1 | # 04_EnvironmentsAndDeployment.py
2 | # Purpose: Environmental settings and configurations
3 | # Author: Buck Woody
4 | # Credits and Sources: Inline
5 | # Last Updated: 07 July 2018
6 |
7 | # - 4.1 Show the main environment variables in the current Python environment. Which directory has the libraries?
8 |
9 | # - What else can you find in the sysconfig library? How would you find that out?
10 |
11 | # - Using conda commands, what libraries are currently loaded?
12 | # How would you install a new one?
13 | # What environment are you using now?
14 |
15 | # - 4.2 Create a program that has three text variables. Combine these three into another variable.
16 | # Load the pickle library and save the result of the first program as a pkl file.
17 | # Close the first program, and create another one that opens and reads the pkl file.
18 | # Combine the final variable from the last program with a new text variable from this program.
19 |
20 | # EOF: 04_EnvironmentsAndDeployment.py
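
One possible shape for exercise 4.2, as a minimal sketch rather than the course's official answer. The helper names (`save_text`, `load_text`), the file name `combined_text.pkl`, and the sample strings are assumptions chosen for illustration; both "programs" are shown in one script for brevity.

```python
import pickle
import tempfile
from pathlib import Path


def save_text(path, text):
    """Serialize a Python object (here a string) to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(text, handle)


def load_text(path):
    """Read the object back from the pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# "First program": combine three text variables and save the result.
first, second, third = "Python ", "for Data ", "Professionals"
combined = first + second + third
pkl_path = Path(tempfile.gettempdir()) / "combined_text.pkl"  # hypothetical file name
save_text(pkl_path, combined)

# "Second program": read the pickle and append a new text variable.
restored = load_text(pkl_path)
final = restored + " is fun"
print(final)
```

Pickle files can execute code when they are loaded, so only unpickle files you created or otherwise trust.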
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/AnalyticsAreas.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/AnalyticsAreas.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/DataScience.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/DataScience.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/MLCapabilities.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/MLCapabilities.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/MatPlotLib.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/MatPlotLib.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/SmallBuck.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/SmallBuck.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/aml-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/aml-logo.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/brain.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/brain.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/check.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/check.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/checkbox.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/checkbox.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/checkmark.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/checkmark.jpg
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/cortanalogo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/cortanalogo.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/files.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/files.jpg
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/ggplot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/ggplot.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/keyboard.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/keyboard.jpg
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/microsoftlogo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/microsoftlogo.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/pin.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/pin.jpg
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/solutions-microsoft-logo-small.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/solutions-microsoft-logo-small.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/tdsp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/tdsp.png
--------------------------------------------------------------------------------
/PythonForDataProfessionals/graphics/thinking.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/PythonForDataProfessionals/graphics/thinking.jpg
--------------------------------------------------------------------------------
/PythonForDataProfessionals/notebooks/.ipynb_checkpoints/00 Pre-Requisites-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "collapsed": true
7 | },
8 | "source": [
9 | "\n",
10 | "\n",
11 | "# Python for Data Professionals\n",
12 | "\n",
13 | "## 00 Pre-Requisites\n",
14 | "\n",
15 | "This \"Python for Data Professionals\" course is taught using [Jupyter Notebooks](https://notebooks.azure.com/help). You'll be able to run the code samples you see by typing in the Python examples as described and clicking the \"Run\" button you see at the top of the screen. \n",
16 | "\n",
17 | "For the most part, there are no pre-requisites for this course using a Notebook. However, if you would like to learn this material on your own machine, you'll need Microsoft Windows, SQL Server, and Visual Studio. You can of course use the Python language on many platforms and in other distributions and with other tools, but using this configuration allows you to stay consistent for instruction during this course. Feel free to use other installations after you complete the course.\n",
18 | "\n",
19 | "Read over this section and then proceed to the next notebook."
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "### Activity 1: Set up the Windows Operating System\n",
27 | "\n",
28 | "You have three options for setting up Microsoft Windows to complete this course. You can use a Local installation of Windows, a Virtual Machine on your local system, or a Virtual Machine stored in a Cloud provider such as Microsoft Azure. *(The third option is only for classrooms where you have reliable connections to the Internet)*\n",
29 | "\n",
30 | "#### Option 1 - Local Installation\n",
31 | "\n",
32 | "- Install a recent version of Microsoft Windows. For this course, Windows 10 or any current version of Windows Server is acceptable.\n",
33 | "- Install all updates to the operating system.\n",
34 | "\n",
35 | "#### Option 2 - Install Windows on a Local Virtual Machine Environment\n",
36 | "\n",
37 | "- Using your local system, [navigate to this resource](https://developer.microsoft.com/en-us/windows/downloads/virtual-machines) and follow the instructions there.\n",
38 | "\n",
39 | "**NOTE: Wait as long as reasonably possible before downloading the virtual machine so that it does not expire during the course - these are free licenses, but they have a time limit**\n",
40 | "\n",
41 | "- You can also use whatever Hypervisor you like for your system and install a legal, registered copy of Microsoft Windows.\n",
42 | "\n",
43 | "#### Option 3 - Use a Virtual Machine in a Cloud Provider\n",
44 | "\n",
45 | "- If you have access to the Internet, you can set up a [free Microsoft Azure Account](https://azure.microsoft.com/en-us/free/search/?&OCID=AID631184_SEM_bSHIQHtA&lnkd=Google_Azure_Brand&gclid=Cj0KCQjwpcLZBRCnARIsAMPBgF2myLWEk3Hllm2354GEs0rD1sDST_xcfkFGRdAE8toYZMalbQJ4M3YaAs9UEALw_wcB&dclid=CPDRgcv57tsCFVXE4Qodo-gLzg) and use a [Data Science Virtual Machine](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/provision-vm). Any size will do, and the free account provides enough resources for a single course. You will not need to install Anaconda, VSCode or SQL Server if you use this choice, as they are already installed for you.\n",
46 | "- Log in to the system and run [Windows Update](https://support.microsoft.com/en-us/help/4027667/windows-update-windows-10)\n",
47 | "\n"
48 | ]
49 | },
50 | {
51 | "cell_type": "markdown",
52 | "metadata": {},
53 | "source": [
54 | "### Activity 2: Install SQL Server 2017 with ML Services\n",
55 | "\n",
56 | "- [Navigate to this resource](https://www.microsoft.com/en-us/sql-server/sql-server-downloads), Select **Developer** from the lower part of the page, and install the **Developer Edition**. Select all components for installation.\n",
57 | "\n",
58 | "- Run Windows Update and select the [\"Install updates for other products\" option](https://www.lifewire.com/how-to-change-windows-update-settings-2625778). Apply the latest updates to the classroom system.\n",
59 | "\n",
60 | "### Activity 3: Install Visual Studio with Machine Learning and Data Science workloads\n",
61 | "\n",
62 | "- On your classroom system, [install Visual Studio 2017](https://www.visualstudio.com/downloads/) - The free Community Edition is adequate for this course.\n",
63 | "\n",
64 | "- During the installation, select the \"Data storage and processing\" and \"Data science and analytical applications\" Workloads. *(NOTE: [In the Data Science Workload installation box, select ALL optional components on the Summary pane!](https://blogs.msdn.microsoft.com/visualstudio/2016/11/18/data-science-workloads-in-visual-studio-2017-rc/))*\n",
65 | "\n",
66 | "- Log in with a Live ID to Visual Studio, let the system load, and apply any updates.\n",
67 | "\n",
68 | "- After the updates complete, click the \"R Tools\" menu item and open the \"Interactive R Window\" option (This will verify that the Data Science Workloads add-ins are working, R and Python). Type the following in that panel to ensure the installation was successful:\n",
69 | "\n"
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": null,
75 | "metadata": {},
76 | "outputs": [],
77 | "source": [
78 | "x <- 10\n",
79 | "\n",
80 | "x\n"
81 | ]
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "metadata": {},
86 | "source": [
87 | "You should see the result **\[1\] 10** returned. If not, open the Visual Studio Installer and select the \"Repair\" option."
88 | ]
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
94 | "### For Further Study\n",
95 | "\n",
96 | "- Platforms supported: https://www.python.org/download/other/ \n",
97 | "\n",
98 | "- Installing Python: https://www.python.org/downloads/\n",
99 | "\n",
100 | "- Installing Python using Anaconda: https://www.infoworld.com/article/3267976/python/anaconda-cpython-pypy-and-more-know-your-python-distributions.html\n",
101 | "\n",
102 | "Next, Continue to *01 Overview and Course Setup*"
103 | ]
104 | }
105 | ],
106 | "metadata": {
107 | "kernelspec": {
108 | "display_name": "Python 3",
109 | "language": "python",
110 | "name": "python3"
111 | },
112 | "language_info": {
113 | "codemirror_mode": {
114 | "name": "ipython",
115 | "version": 3
116 | },
117 | "file_extension": ".py",
118 | "mimetype": "text/x-python",
119 | "name": "python",
120 | "nbconvert_exporter": "python",
121 | "pygments_lexer": "ipython3",
122 | "version": "3.6.5"
123 | }
124 | },
125 | "nbformat": 4,
126 | "nbformat_minor": 2
127 | }
128 |
--------------------------------------------------------------------------------
/PythonForDataProfessionals/notebooks/.ipynb_checkpoints/01 Overview and Setup-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "collapsed": true
7 | },
8 | "source": [
9 | "\n",
10 | "\n",
11 | "# Python for Data Professionals\n",
12 | "\n",
13 | "## 01 Overview and Setup\n",
14 | "\n",
15 | "In this course you'll cover the basics of the Python language and environment from a Data Professional's perspective. While you will learn Python, you'll quickly cover topics that have a lot more depth available. In each section you'll get more references to go deeper, which you should follow up on. Also watch for links within the text - click on each one to explore that topic.\n",
16 | "\n",
17 | "Make sure you check out the **00 Pre-Requisites** page before you start. You'll need all of the items loaded there before you can proceed with the course.\n",
18 | "\n",
19 | "You'll cover these topics in the course:\n",
20 | "\n",
21 | "\n",
22 | "\n",
23 | "\n",
24 | "- Course Outline\n",
25 | "- 1 - Overview and Course Setup (This section)\n",
26 | "- 2 - Programming Basics\n",
27 | "- 3 - Working with Data\n",
28 | "- 4 - Deployment and Environments\n",
29 | "\n",
30 | "\n",
31 | ""
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "metadata": {},
37 | "source": [
38 | "### Overview\n",
39 | "\n",
40 | "There are two main versions of Python - 2 and 3. So many programs were written for version 2 that it is still around, and version 3 was such an upgrade that programs for 2 don't always run in 3 and vice versa. For this course we'll do everything in version 3 - it's becoming the accepted standard for data professionals.\n",
41 | "\n",
42 | "You have a few ways of working with Python:\n",
43 | "\n",
44 | "- The Interactive Interpreter (Type `python` and the version number if it is in your path)\n",
45 | "- Writing code and running it in some graphical environment (Such as VSCode, Visual Studio, Spyder, PyCharm, IDLE, etc.)\n",
46 | "- Calling a `.py` script file from the `python` command \n",
47 | "\n",
48 | "When you're in command-mode, you'll see that the code looks more like a scripting language, meaning that some parentheses around functions might not be there. Programming-mode looks like a standard programming language environment - you'll normally use that within an Integrated Development Environment (IDE)."
49 | ]
50 | },
51 | {
52 | "cell_type": "markdown",
53 | "metadata": {},
54 | "source": [
55 | "### Activity: Verify Your Installation and Configure Python\n",
56 | "\n",
57 | "Open the **01_OverviewAndCourseSetup.py** file and run the code you see there. The exercises will be marked out using comments: \n",
58 | "\n",
59 | "\n",
60 | "`# TODO - Section Number`"
62 | ]
63 | },
64 | {
65 | "cell_type": "code",
66 | "execution_count": null,
67 | "metadata": {},
68 | "outputs": [],
69 | "source": [
70 | "# 01_OverviewAndCourseSetup.py\n",
71 | "# Purpose: Initial Course Setup and displaying versions\n",
72 | "# Author: Buck Woody\n",
73 | "# Credits and Sources: Inline\n",
74 | "# Last Updated: 27 June 2018\n",
75 | "\n",
76 | "# Check the Python Version and Information\n",
77 | "import platform\n",
78 | "python_version=platform.python_version()\n",
79 | "print(python_version)\n",
80 | "\n",
81 | "# - Fix this code so that it runs\n",
82 | "\n",
83 | "print \"The Python Version is: \" python_version\n",
84 | "\n",
85 | "# - Using \"platform\", what other information can you derive about this system?\n",
86 | "\n",
87 | "# EOF: 01_OverviewAndCourseSetup.py"
88 | ]
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
94 | "### For Further Study\n",
95 | "\n",
96 | "- Version differences: https://wiki.python.org/moin/Python2orPython3 \n",
97 | "- Development Environments (IDLE, tk, VSCode, PyCharm, Jupyter Notebooks), Documentation, and Training Resources: https://www.python.org/doc/\n",
98 | "- The Official Python Tutorial: https://docs.python.org/3/tutorial/index.html\n",
100 | "\n",
101 | "Next, Continue to *02 Programming Basics*"
102 | ]
103 | },
104 | {
105 | "cell_type": "code",
106 | "execution_count": null,
107 | "metadata": {},
108 | "outputs": [],
109 | "source": []
110 | }
111 | ],
112 | "metadata": {
113 | "kernelspec": {
114 | "display_name": "Python 3",
115 | "language": "python",
116 | "name": "python3"
117 | },
118 | "language_info": {
119 | "codemirror_mode": {
120 | "name": "ipython",
121 | "version": 3
122 | },
123 | "file_extension": ".py",
124 | "mimetype": "text/x-python",
125 | "name": "python",
126 | "nbconvert_exporter": "python",
127 | "pygments_lexer": "ipython3",
128 | "version": "3.6.5"
129 | }
130 | },
131 | "nbformat": 4,
132 | "nbformat_minor": 2
133 | }
134 |
--------------------------------------------------------------------------------
/PythonForDataProfessionals/notebooks/.ipynb_checkpoints/02 Programming Basics-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "collapsed": true
7 | },
8 | "source": [
9 | "\n",
10 | "\n",
11 | "# Python for Data Professionals\n",
12 | "\n",
13 | "## 02 Programming Basics\n",
14 | "\n",
15 | "\n",
16 | "\n",
17 | "\n",
18 | "- Course Outline\n",
19 | "- 1 - Overview and Course Setup\n",
20 | "- 2 - Programming Basics (This section)\n",
21 | "    - 2.1 - Getting help\n",
22 | "    - 2.2 - Code Syntax and Structure\n",
23 | "    - 2.3 - Variables\n",
24 | "    - 2.4 - Operations and Functions\n",
25 | "- 3 - Working with Data\n",
26 | "- 4 - Deployment and Environments\n",
27 | "\n",
28 | "\n",
29 | ""
30 | ]
31 | },
32 | {
33 | "cell_type": "markdown",
34 | "metadata": {},
35 | "source": [
36 | "## Programming Basics Overview\n",
37 | "\n",
38 | "From here on out, you'll focus on using Python in programming mode - you'll write code that you run from an IDE or a calling environment, not interactively from the command-line. As you work through this explanation, copy the code you see and run it to see the results. After you work through these copy-and-paste examples, you'll create your own code in the Activities that follow each section."
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "### 2.1 - Getting help\n",
46 | "\n",
47 | "The very first thing you should learn in any language is how to get help. You can [find the help documents online](https://docs.python.org/3/index.html), or simply type\n",
48 | " \n",
49 | "`help()`\n",
50 | " \n",
51 | "in your code. For help on a specific topic, put the topic in the parentheses:\n",
52 | " \n",
53 | " `help(str)`\n",
54 | "\n",
55 | " To see a list of topics, pass the word as a quoted string: \n",
56 | "\n",
57 | " `help('topics')`"
58 | ]
59 | },
60 | {
61 | "cell_type": "code",
62 | "execution_count": null,
63 | "metadata": {
64 | "collapsed": true
65 | },
66 | "outputs": [],
67 | "source": [
68 | "# Try it:"
69 | ]
70 | },
71 | {
72 | "cell_type": "markdown",
73 | "metadata": {},
74 | "source": [
75 | "### 2.2 - Code Syntax and Structure\n",
76 | "\n",
77 | "Let's cover a few basics about how Python code is written. (For a full discussion, check out the [Style Guide for Python, called PEP 8](https://www.python.org/dev/peps/pep-0008/) ) Let's use the \"Zen of Python\" rules from Tim Peters for this course:\n",
78 | "\n",
79 | "\n",
80 | "\n",
81 | " Beautiful is better than ugly.\n",
82 | " Explicit is better than implicit.\n",
83 | " Simple is better than complex.\n",
84 | " Complex is better than complicated.\n",
85 | " Flat is better than nested.\n",
86 | " Sparse is better than dense.\n",
87 | " Readability counts.\n",
88 | " Special cases aren't special enough to break the rules.\n",
89 | " Although practicality beats purity.\n",
90 | " Errors should never pass silently.\n",
91 | " Unless explicitly silenced.\n",
92 | " In the face of ambiguity, refuse the temptation to guess.\n",
93 | " There should be one-- and preferably only one --obvious way to do it.\n",
94 | " Although that way may not be obvious at first unless you're Dutch.\n",
95 | " Now is better than never.\n",
96 | " Although never is often better than right now.\n",
97 | " If the implementation is hard to explain, it's a bad idea.\n",
98 | " If the implementation is easy to explain, it may be a good idea.\n",
99 | " Namespaces are one honking great idea -- let's do more of those!\n",
100 | " --Tim Peters\n",
101 | "\n",
102 | "\n",
103 | "\n",
104 | "In general, use standard coding practices: don't use keywords for variable names, be consistent in your naming (camel-case, lower-case, etc.), comment your code clearly, understand the general syntax of the language, and follow the principles above. But the most important tip is to at least read PEP 8 and decide for yourself how well it fits into your Zen.\n",
105 | "\n",
106 | "There is one hard-and-fast rule for Python that you *do* need to be aware of: indentation. You **must** indent the code inside classes, functions (or methods), loops, and conditions. You can use a tab or four spaces (spaces are the accepted way to do it) but in any case, you have to be consistent. If you use tabs, you always use tabs. If you use spaces, you have to use that throughout. It's best if you set your IDE to handle that for you, whichever way you go.\n",
107 | "\n",
108 | "Python code files have an extension of `.py`. \n",
109 | "\n",
110 | "Comments in Python start with the hash-tag: `#`. There are no block comments (and this makes us all sad) so each line you want to comment must have a tag in front of that line. Keep the lines short (80 characters or so) so that they don't fall off a single-line display like at the command line."
111 | ]
112 | },
113 | {
114 | "cell_type": "markdown",
115 | "metadata": {},
116 | "source": [
117 | "### 2.3 - Variables\n",
118 | "\n",
119 | "Variables stand in for replaceable values. Python is dynamically typed, meaning you can just declare a variable name and set it to a value at the same time, and Python will infer the data type from the value. You use an `=` sign to assign values, and `==` to compare things.\n",
120 | "\n",
121 | "Quotes \\\" or ticks \\' are fine, just be consistent.\n",
122 | "\n",
123 | "`# There are some keywords to be aware of, but x and y are always good choices.`\n",
124 | "\n",
125 | "`x = \"Buck\" # I'm a string.`\n",
126 | "\n",
127 | "`type(x)`\n",
128 | "\n",
129 | "`y = 10 # I'm an integer.`\n",
130 | "\n",
131 | "`type(y)`\n",
132 | "\n",
133 | "To change the type of a value, just re-enter something else:\n",
134 | "\n",
135 | "`x = \"Buck\" # I'm a string.`\n",
136 | "\n",
137 | "`type(x)`\n",
138 | "\n",
139 | "`x = 10 # Now I'm an integer.`\n",
140 | "\n",
141 | "`type(x)`\n",
142 | "\n",
143 | "Or cast it by explicitly declaring the conversion:\n",
144 | "\n",
145 | "`x = \"10\"`\n",
146 | "\n",
147 | "`type(x)`\n",
148 | "\n",
149 | "`print(int(x))`\n",
150 | "\n",
151 | "To concatenate string values, use the `+` sign:\n",
152 | "\n",
153 | "`x = \"Buck\"`\n",
154 | "\n",
155 | "`y = \" Woody\"`\n",
156 | "\n",
157 | "`print(x + y)`"
158 | ]
159 | },
160 | {
161 | "cell_type": "code",
162 | "execution_count": null,
163 | "metadata": {},
164 | "outputs": [],
165 | "source": [
166 | "# Try it:\n"
167 | ]
168 | },
169 | {
170 | "cell_type": "markdown",
171 | "metadata": {},
172 | "source": [
173 | "### 2.4 - Operations and Functions\n",
174 | "\n",
175 | "Python has the following operators:\n",
176 | "\n",
177 | " Arithmetic Operators\n",
178 | " Comparison (Relational) Operators\n",
179 | " Assignment Operators\n",
180 | " Logical Operators\n",
181 | " Bitwise Operators\n",
182 | " Membership Operators\n",
183 | " Identity Operators\n",
184 | "\n",
185 | "You have the standard operators and functions from most every language. Here are some of the tokens:\n",
186 | "\n",
187 | "\n",
188 | "\n",
189 | " != *= << ^ \n",
190 | " \" + <<= ^= \n",
191 | " \"\"\" += <= `\n",
192 | " % , <> __\n",
193 | " %= - == \n",
194 | " & -= > b\" \n",
195 | " &= . >= b' \n",
196 | " ' ... >> j \n",
197 | " ''' / >>= r\" \n",
198 | " ( // @ r' \n",
199 | " ) //= J |'\n",
200 | " * /= [ |= \n",
201 | " ** : \\ ~ \n",
202 | " **= < ] \n",
203 | "\n",
204 | "\n",
205 | "\n",
206 | "Wait...that's it? That's all you're going to tell me? *(Hint: use what you've learned):*\n",
207 | "\n",
208 | "`help('symbols')`\n",
209 | "\n",
210 | "Walk through each of these operators carefully - you'll use them when you work with data in the next module.\n"
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": null,
216 | "metadata": {
217 | "collapsed": true
218 | },
219 | "outputs": [],
220 | "source": [
221 | "# Try it:"
222 | ]
223 | },
224 | {
225 | "cell_type": "markdown",
226 | "metadata": {},
227 | "source": [
228 | "### Activity - Programming basics\n",
229 | "\n",
230 | "Open the **02_ProgrammingBasics.py** file and run the code you see there. The exercises will be marked out using comments:\n",
231 | "\n",
232 | "`# - Section Number`"
233 | ]
234 | },
235 | {
236 | "cell_type": "code",
237 | "execution_count": null,
238 | "metadata": {},
239 | "outputs": [],
240 | "source": [
241 | "# 02_ProgrammingBasics.py\n",
242 | "# Purpose: General Programming exercises for Python \n",
243 | "# Author: Buck Woody\n",
244 | "# Credits and Sources: Inline\n",
245 | "# Last Updated: 27 June 2018\n",
246 | "\n",
247 | "# 2.1 Getting Help\n",
248 | "help()\n",
249 | "help(str)\n",
250 | "\n",
251 | "# - Write code to find help on help\n",
252 | "\n",
253 | "# 2.2 Code Syntax and Structure\n",
254 | "\n",
255 | "# - Python uses spaces to indicate code blocks. Fix the code below:\n",
256 | "x=10\n",
257 | "y=5\n",
258 | "if x > y:\n",
259 | "print(str(x) + \" is greater than \" + str(y))\n",
260 | "\n",
261 | "# - Arguments on first line are forbidden when not using vertical alignment. Fix this code:\n",
262 | "foo = long_function_name(var_one, var_two,\n",
263 | " var_three, var_four)\n",
264 | "\n",
265 | "# operators sit far away from their operands. Fix this code:\n",
266 | "income = (gross_wages +\n",
267 | " taxable_interest +\n",
268 | " (dividends - qualified_dividends) -\n",
269 | " ira_deduction -\n",
270 | " student_loan_interest)\n",
271 | "\n",
272 | "# - The import statement should use a separate line for each module. You can fix the code below \n",
273 | "# using separate lines or by using the \"from\" statement:\n",
274 | "import sys, os\n",
275 | "\n",
276 | "# - The following code has extra spaces in the wrong places. Fix this code:\n",
277 | "i=i+1\n",
278 | "submitted +=1\n",
279 | "x = x * 2 - 1\n",
280 | "hypot2 = x * x + y * y\n",
281 | "c = (a + b) * (a - b)\n",
282 | "\n",
283 | "# 2.3 Variables \n",
284 | "\n",
285 | "# - Add a line below x=3 that changes the variable x from int to a string\n",
286 | "x=3\n",
287 | "type(x)\n",
288 | "\n",
289 | "# - Write code that prints the string \"This class is awesome\" using variables:\n",
290 | "x=\"is awesome\"\n",
291 | "y=\"This Class\"\n",
292 | "\n",
293 | "# 2.4 Operations and Functions\n",
294 | "\n",
295 | "# - Use some basic operators to write the following code:\n",
296 | "# Assign two variables\n",
297 | "# Add them\n",
298 | "# Subtract 20 from each, add those values together, save that to a new variable\n",
299 | "# Create a new string variable with the text \"The result of my operations are: \"\n",
300 | "# Print out a single string on the screen with the result of the variables \n",
301 | "# showing that result. \n",
302 | "\n",
303 | "# EOF: 02_ProgrammingBasics.py"
304 | ]
305 | },
306 | {
307 | "cell_type": "markdown",
308 | "metadata": {},
309 | "source": [
310 | "## For Further Study\n",
311 | "\n",
312 | "- PEP 8, the Style Guide for Python Code - https://www.python.org/dev/peps/pep-0008/\n",
313 | "- Introduction to the Python Coding Style - http://stackabuse.com/introduction-to-the-python-coding-style/\n",
314 | "- The Microsoft Tutorial and samples for Python - https://code.visualstudio.com/docs/languages/python \n",
316 | "- Another free online self-paced course - https://www.w3schools.com/python/default.asp \n",
317 | "\n",
318 | "Next, Continue to *03 Working with Data*"
319 | ]
320 | },
321 | {
322 | "cell_type": "code",
323 | "execution_count": null,
324 | "metadata": {},
325 | "outputs": [],
326 | "source": []
327 | }
328 | ],
329 | "metadata": {
330 | "kernelspec": {
331 | "display_name": "Python 3",
332 | "language": "python",
333 | "name": "python3"
334 | },
335 | "language_info": {
336 | "codemirror_mode": {
337 | "name": "ipython",
338 | "version": 3
339 | },
340 | "file_extension": ".py",
341 | "mimetype": "text/x-python",
342 | "name": "python",
343 | "nbconvert_exporter": "python",
344 | "pygments_lexer": "ipython3",
345 | "version": "3.6.5"
346 | }
347 | },
348 | "nbformat": 4,
349 | "nbformat_minor": 2
350 | }
351 |
--------------------------------------------------------------------------------
/PythonForDataProfessionals/notebooks/.ipynb_checkpoints/04 Environments and Deployment-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "collapsed": true
7 | },
8 | "source": [
9 | "\n",
10 | "\n",
11 | "# Python for Data Professionals\n",
12 | "\n",
13 | "## 04 Environments and Deployment\n",
14 | "\n",
15 | "\n",
16 | "\n",
17 | "\n",
18 | " - Course Outline\n",
19 | " - 1 - Overview and Course Setup\n",
20 | " - 2 - Programming Basics\n",
21 | " - 3 Working with Data\n",
22 | " - 4 Deployment and Environments (This section)\n",
23 | "   - 4.1 Conda\n",
24 | "   - 4.2 Pickling\n",
25 | "\n",
26 | "\n",
27 | ""
28 | ]
29 | },
30 | {
31 | "cell_type": "markdown",
32 | "metadata": {},
33 | "source": [
34 | "The main installation of Python - sometimes called \"Core\" or \"base\" - has a set of parameters it works with. Since it runs on many operating systems, these variables are set and altered in different ways. Here are the primary environment settings on the standard installation of Python:\n",
35 | "\n",
36 | "- PYTHONPATH - Sets the location for the Python interpreter to locate the module files imported into a program.\n",
37 | "- PYTHONHOME - Changes the location of the standard Python libraries, giving an alternative module search path. \n",
38 | "- PYTHONSTARTUP - The path to an initialization file (for example `.pythonrc.py`) containing Python source code. It is executed every time you start the interpreter in interactive mode.\n",
39 | "- PYTHONCASEOK - For the Windows OS, find the first case-insensitive match in an \"import\" statement.\n",
40 | "\n",
41 | "You can show the configuration variables that Python was built with by importing the base configuration system library, and then calling this function:\n",
42 | "\n",
43 | "`import sysconfig`\n",
44 | "\n",
45 | "`sysconfig.get_config_vars()`\n",
46 | "\n",
47 | "If you want to see just one variable, ask for it by name (the full set is just a dictionary):\n",
48 | "\n",
49 | "`sysconfig.get_config_var('LIBDIR')`"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": 1,
55 | "metadata": {},
56 | "outputs": [],
57 | "source": [
58 | "# Try it:"
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": [
65 | "## 4.1 pip and Conda\n",
66 | "\n",
67 | "To install new packages, you can build the source code manually, but that's not the way it's most often done. Typically you use a \"package manager\", and the most popular is \"pip\". The pip program installs and configures most of the libraries you will need for the base installation of Python.\n",
68 | "\n",
69 | "You probably already have the pip program. However, to install pip, you can use the [cURL](https://curl.haxx.se/download.html) program to get it:\n",
70 | "\n",
71 | "`curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py`\n",
72 | "\n",
73 | "Then use Python to run the script to install it:\n",
74 | "\n",
75 | "`python get-pip.py`\n",
76 | "\n",
77 | "From there, you can query the packages you have with this command, from the command-line in your operating system:\n",
78 | "\n",
79 | "`pip list`\n",
80 | "\n",
81 | "You can install a package using this command:\n",
82 | "\n",
83 | "`pip install SomePackage # latest version`\n",
84 | "\n",
85 | "`pip install SomePackage==1.0.4 # specific version`\n",
86 | "\n",
87 | "`pip install 'SomePackage>=1.0.4' # minimum version`\n",
88 | "\n",
89 | "And you can remove a package with this command:\n",
90 | "\n",
91 | "`pip uninstall SomePackage`\n",
92 | "\n",
93 | "There is a lot more that you can do with pip, and you can find out the list here:\n",
94 | "\n",
95 | "`pip`\n",
96 | "\n",
97 | "A more robust package manager, which even installs a distribution of Python for you along with other tools, is [Conda](https://conda.io/docs/user-guide/getting-started.html). For this course, you have installed Python using Conda, which not only has a package manager, but also isolates environments for you. This means that you can create a \"boundary\" of variables, package directories, and more around a name you specify. You can then switch to that environment to create your code, and that code will always have a consistent set of variables and packages.\n",
98 | "\n",
99 | "To create a Conda environment, issue the following command:\n",
100 | "\n",
101 | "`conda create --name ENVIRONMENT_NAME`\n",
102 | "\n",
103 | "For instance, this command creates a new environment called \"bucktest\" and installs the biology package called biopython:\n",
104 | "\n",
105 | "`conda create --name bucktest biopython`\n",
106 | "\n",
107 | "To see the environments, issue the following command:\n",
108 | "\n",
109 | "`conda info --envs`\n",
110 | "\n",
111 | "The one with the asterisk (*) is the one you are using now. To switch to another environment, issue the following command:\n",
112 | "\n",
113 | "`activate bucktest` (In Windows)\n",
114 | "\n",
115 | "`source activate bucktest` (Mac and Linux; with Conda 4.4 or later, `conda activate bucktest` works on every platform)\n",
116 | "\n",
117 | "And to see information about that environment, issue the following command:\n",
118 | "\n",
119 | "`conda list`\n",
120 | "\n",
121 | "or just `conda` to find out everything you can do with Conda.\n",
122 | "\n",
123 | "To install packages in that environment, use this command:\n",
124 | "\n",
125 | "`conda install biopython`"
126 | ]
127 | },
128 | {
129 | "cell_type": "markdown",
130 | "metadata": {},
131 | "source": [
132 | "## Activity - pip and Conda\n",
133 | "\n",
134 | "Now open the `/code/04_EnvironmentsAndDeployment.py` file and follow the instructions you see there for 4.1."
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 2,
140 | "metadata": {},
141 | "outputs": [],
142 | "source": [
143 | "# 04_EnvironmentsAndDeployment.py\n",
144 | "# Purpose: Environmental settings and configurations\n",
145 | "# Author: Buck Woody\n",
146 | "# Credits and Sources: Inline\n",
147 | "# Last Updated: 07 July 2018\n",
148 | "\n",
149 | "# - 4.1 Show the main environment variables in the current Python environment. Which directory has the libraries?\n",
150 | "\n",
151 | "# - What else can you find in the sysconfig library? How would you find that out?\n",
152 | "\n",
153 | "# - Using conda commands, what libraries are currently loaded? \n",
154 | "# How would you install a new one? \n",
155 | "# What environment are you using now?"
156 | ]
157 | },
158 | {
159 | "cell_type": "markdown",
160 | "metadata": {},
161 | "source": [
162 | "## 4.2 Pickling\n",
163 | "\n",
164 | "\"Pickling\" in Python means to serialize a Python object. Perhaps that isn't very helpful - what it really means is to take the output of whatever you did in Python and make it available again in another environment or program. It's a way of saving the \"state\" of a program so that it can be transferred and then re-loaded.\n",
165 | "\n",
166 | "It's best illustrated with some code:\n",
167 | "\n",
168 | "`import pickle`\n",
169 | "\n",
170 | "`a = ['1','2','3']`\n",
171 | "\n",
172 | "`PickleFileName = \"picklefile\"`\n",
173 | "\n",
174 | "`FileObject = open(PickleFileName,'wb')`\n",
175 | "\n",
176 | "`pickle.dump(a,FileObject)`\n",
177 | "\n",
178 | "`FileObject.close()`\n",
179 | "\n",
180 | "Now you can copy that file to a new computer, open Python, and work with it again as if you ran it there:\n",
181 | "\n",
182 | "`import pickle`\n",
183 | "\n",
184 | "`PickleFileName = \"picklefile\"`\n",
185 | "\n",
186 | "`FileObject = open(PickleFileName,'rb')`\n",
187 | "\n",
188 | "`b = pickle.load(FileObject) ` \n",
189 | "\n",
190 | "`b`\n",
191 | "\n",
192 | "And now *a* equals *b*. Of course, your program would be much longer, most often a series of steps, which might for instance do a Machine Learning prediction. \n",
193 | "\n",
194 | "You can read a lot more about pickling here: https://wiki.python.org/moin/UsingPickle"
195 | ]
196 | },
197 | {
198 | "cell_type": "markdown",
199 | "metadata": {},
200 | "source": [
201 | "## Activity - Pickle\n",
202 | "\n",
203 | "Now open the `/code/04_EnvironmentsAndDeployment.py` file and follow the instructions you see there for step 4.2."
204 | ]
205 | },
206 | {
207 | "cell_type": "code",
208 | "execution_count": 4,
209 | "metadata": {},
210 | "outputs": [],
211 | "source": [
212 | "# - 4.2 Create a program that has three text variables. Combine these three into another variable. \n",
213 | "# Load the pickle library and save the results of the first program as a pkl file.\n",
214 | "# Close the first program, and create another one that opens and reads the pkl file.\n",
215 | "# Combine the final variable from the last program with a new text variable from this program. \n",
216 | "\n",
217 | "# EOF: 04_EnvironmentsAndDeployment.py"
218 | ]
219 | },
220 | {
221 | "cell_type": "markdown",
222 | "metadata": {},
223 | "source": [
224 | "## 4.3 Docker and Flask\n",
225 | "\n",
226 | "Two other abstraction levels are useful to think about. You're probably familiar with Virtual Machines - which use software to emulate hardware. This lets you install a complete new \"computer\" in a computer's OS. One level up from that abstraction layer is a *Container*. A Container doesn't emulate hardware or carry its own kernel; it packages just the files, settings, and libraries of an operating system (most often Linux) around a runtime - like Python - and shares the kernel of the host. This provides an even more consistent environment for your application, since it can also include settings and programs above the Python level. \n",
227 | "\n",
228 | "The *Flask* micro-framework for Python isn't technically an abstraction layer; it has more to do with serving your application up to a Web call. You'll often see Docker and Flask used together, so you'll cover it here for completeness. Once again, seeing some code is useful to understand - this example comes from the documentation site:\n",
229 | "\n",
230 | "\n",
231 | "\n",
232 | "from flask import Flask\n",
233 | "app = Flask(__name__)\n",
234 | "\n",
235 | "@app.route('/')\n",
236 | "def hello_world():\n",
237 | "    return 'Hello, World!'\n",
238 | "\n",
239 | "\n",
240 | "\n",
241 | "You can probably follow the layout of this code, but there are some specifics here. First, the code imports Flask itself. Next, the code creates an instance of a Flask app, called \"app\" in this case. From there, the route is set to the base URL call - just as in the main part of a web page. And finally, a simple function returns the words \"Hello, World!\".\n",
242 | "\n",
243 | "So far, nothing is happening - the code is just on disk. However, you can \"deploy\" the code on a system that is running with these commands (in Linux):\n",
244 | "\n",
245 | "\n",
246 | "$ export FLASK_APP=hello.py\n",
247 | "$ flask run\n",
248 | " * Running on http://127.0.0.1:5000/\n",
249 | "\n",
250 | "\n",
251 | "OK...so what? Well, in this case, you could open a Web Browser on that system and type in that URL - and you'll see \"Hello, World!\" pop up on the screen. Of course, real applications are much more complicated, can take POST and GET operations, and much more. But this is a very convenient way to serve up your Python application without having to tell your users to install and run Python.\n",
252 | "\n",
253 | "Of course, there's a lot more to both of these topics - read the references below to learn more."
254 | ]
255 | },
256 | {
257 | "cell_type": "markdown",
258 | "metadata": {},
259 | "source": [
260 | "## For Further Study\n",
261 | "\n",
262 | "- More on Docker: https://www.fullstackpython.com/docker.html\n",
263 | "- More on Flask: http://flask.pocoo.org/\n",
264 | "- Creating a simple Flask application: http://containertutorials.com/docker-compose/flask-simple-app.html \n",
265 | "\n",
266 | "Congratulations! You now know the basics of working with Python and Data. As you can see, there's a lot more to learn - so use your new knowledge to expand on what you have learned. "
267 | ]
268 | }
269 | ],
270 | "metadata": {
271 | "kernelspec": {
272 | "display_name": "Python 3",
273 | "language": "python",
274 | "name": "python3"
275 | },
276 | "language_info": {
277 | "codemirror_mode": {
278 | "name": "ipython",
279 | "version": 3
280 | },
281 | "file_extension": ".py",
282 | "mimetype": "text/x-python",
283 | "name": "python",
284 | "nbconvert_exporter": "python",
285 | "pygments_lexer": "ipython3",
286 | "version": "3.6.5"
287 | }
288 | },
289 | "nbformat": 4,
290 | "nbformat_minor": 2
291 | }
292 |
--------------------------------------------------------------------------------
/PythonForDataProfessionals/notebooks/00 Pre-Requisites.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "collapsed": true
7 | },
8 | "source": [
9 | "\n",
10 | "\n",
11 | "# Python for Data Professionals\n",
12 | "\n",
13 | "## 00 Pre-Requisites\n",
14 | "\n",
15 | "This \"Python for Data Professionals\" course is taught using [Jupyter Notebooks](https://notebooks.azure.com/help). You'll be able to run the code samples you see by typing in the Python examples as described and clicking the \"Run\" button you see at the top of the screen. \n",
16 | "\n",
17 | "For the most part, there are no pre-requisites for this course using a Notebook. However, if you would like to learn this material on your own machine, you'll need Microsoft Windows, SQL Server, and Visual Studio. You can of course use the Python language on many platforms and in other distributions and with other tools, but using this configuration allows you to stay consistent for instruction during this course. Feel free to use other installations after you complete the course.\n",
18 | "\n",
19 | "Read over this section and then proceed to the next notebook."
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "## Activity 1: Set up the Windows Operating System\n",
27 | "\n",
28 | "You have three options for setting up Microsoft Windows to complete this course. You can use a Local installation of Windows, a Virtual Machine on your local system, or a Virtual Machine stored in a Cloud provider such as Microsoft Azure. *(The third option is only for classrooms where you have reliable connections to the Internet)*\n",
29 | "\n",
30 | "### Option 1 - Local Installation\n",
31 | "\n",
32 | "- Install a recent version of Microsoft Windows. For this course, Windows 10, or any current version of Windows Server is acceptable.\n",
33 | "- Install all updates to the operating system.\n",
34 | "\n",
35 | "### Option 2 - Install Windows on a Local Virtual Machine Environment\n",
36 | "\n",
37 | "- Using your local system, [navigate to this resource](https://developer.microsoft.com/en-us/windows/downloads/virtual-machines) and follow the instructions there.\n",
38 | "\n",
39 | "**NOTE: Wait as long as reasonably possible before downloading to ensure that the system does not expire during the course - these are free licenses, but they have a time limit**\n",
40 | "\n",
41 | "- You can also use whatever Hypervisor you like for your system and install a legal, registered copy of Microsoft Windows.\n",
42 | "\n",
43 | "### Option 3 - Use a Virtual Machine in a Cloud Provider\n",
44 | "\n",
45 | "- If you have access to the Internet, you can set up a [free Microsoft Azure Account](https://azure.microsoft.com/en-us/free/search/?&OCID=AID631184_SEM_bSHIQHtA&lnkd=Google_Azure_Brand&gclid=Cj0KCQjwpcLZBRCnARIsAMPBgF2myLWEk3Hllm2354GEs0rD1sDST_xcfkFGRdAE8toYZMalbQJ4M3YaAs9UEALw_wcB&dclid=CPDRgcv57tsCFVXE4Qodo-gLzg) and use a [Data Science Virtual Machine](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/provision-vm). Any size will do, and the free account provides enough resources for a single course. You will not need to install Anaconda, VSCode or SQL Server if you use this choice, as they are already installed for you.\n",
46 | "- Log in to the system and run [Windows Update](https://support.microsoft.com/en-us/help/4027667/windows-update-windows-10)\n",
47 | "\n"
48 | ]
49 | },
50 | {
51 | "cell_type": "markdown",
52 | "metadata": {},
53 | "source": [
54 | "## Activity 2: Install SQL Server 2017 with ML Services\n",
55 | "\n",
56 | "- [Navigate to this resource](https://www.microsoft.com/en-us/sql-server/sql-server-downloads), Select **Developer** from the lower part of the page, and install the **Developer Edition**. Select all components for installation.\n",
57 | "\n",
58 | "- Run Windows Update and select the [\"Install updates for other products\" option](https://www.lifewire.com/how-to-change-windows-update-settings-2625778). Apply the latest updates to the classroom system.\n",
59 | "\n",
60 | "## Activity 3: Install Visual Studio with Machine Learning and Data Science workloads\n",
61 | "\n",
62 | "- On your classroom system, [install Visual Studio 2017](https://www.visualstudio.com/downloads/) - The free Community Edition is adequate for this course.\n",
63 | "\n",
64 | "- During the installation, select the \"Data storage and processing\" and \"Data science and analytical applications\" Workloads. *(NOTE: [In the Data Science Workload installation box, select ALL optional components on the Summary pane!](https://blogs.msdn.microsoft.com/visualstudio/2016/11/18/data-science-workloads-in-visual-studio-2017-rc/))*\n",
65 | "\n",
66 | "- Log in with a Live ID to Visual Studio, let the system load, and apply any updates.\n",
67 | "\n",
68 | "- After the updates complete, click the \"R Tools\" menu item and open the \"Interactive R Window\" option (This will verify that the Data Science Workloads add-ins are working, R and Python). Type the following in that panel to ensure the installation was successful:\n",
69 | "\n"
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": null,
75 | "metadata": {},
76 | "outputs": [],
77 | "source": [
78 | "x <- 10\n",
79 | "\n",
80 | "x\n"
81 | ]
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "metadata": {},
86 | "source": [
87 | "You should see the result **\\[1\\] 10** returned. If not, open the Visual Studio Installer and select the \"Repair\" option."
88 | ]
89 | },
90 | {
91 | "cell_type": "markdown",
92 | "metadata": {},
93 | "source": [
94 | "## For Further Study\n",
95 | "\n",
96 | "- Platforms supported: https://www.python.org/download/other/ \n",
97 | "\n",
98 | "- Installing Python: https://www.python.org/downloads/\n",
99 | "\n",
100 | "- Installing Python using Anaconda: https://www.infoworld.com/article/3267976/python/anaconda-cpython-pypy-and-more-know-your-python-distributions.html\n",
101 | "\n",
102 | "Next, Continue to *01 Overview and Course Setup*"
103 | ]
104 | }
105 | ],
106 | "metadata": {
107 | "kernelspec": {
108 | "display_name": "Python 3",
109 | "language": "python",
110 | "name": "python3"
111 | },
112 | "language_info": {
113 | "codemirror_mode": {
114 | "name": "ipython",
115 | "version": 3
116 | },
117 | "file_extension": ".py",
118 | "mimetype": "text/x-python",
119 | "name": "python",
120 | "nbconvert_exporter": "python",
121 | "pygments_lexer": "ipython3",
122 | "version": "3.6.5"
123 | }
124 | },
125 | "nbformat": 4,
126 | "nbformat_minor": 2
127 | }
128 |
--------------------------------------------------------------------------------
/PythonForDataProfessionals/notebooks/01 Overview and Setup.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "collapsed": true
7 | },
8 | "source": [
9 | "\n",
10 | "\n",
11 | "# Python for Data Professionals\n",
12 | "\n",
13 | "## 01 Overview and Setup\n",
14 | "\n",
15 | "In this course you'll cover the basics of the Python language and environment from a Data Professional's perspective. While you will learn Python, you'll quickly cover topics that have a lot more depth available. In each section you'll get more references to go deeper, which you should follow up on. Also watch for links within the text - click on each one to explore that topic.\n",
16 | "\n",
17 | "Make sure you check out the **00 Pre-Requisites** page before you start. You'll need all of the items loaded there before you can proceed with the course.\n",
18 | "\n",
19 | "You'll cover these topics in the course:\n",
20 | "\n",
21 | "\n",
22 | "\n",
23 | "\n",
24 | " - Course Outline\n",
25 | " - 1 - Overview and Course Setup (This section)\n",
26 | " - 2 - Programming Basics\n",
27 | " - 3 Working with Data\n",
28 | " - 4 Deployment and Environments\n",
29 | "\n",
30 | "\n",
31 | ""
32 | ]
33 | },
34 | {
35 | "cell_type": "markdown",
36 | "metadata": {},
37 | "source": [
38 | "## Overview\n",
39 | "\n",
40 | "There are two main versions of Python - 2 and 3. So many programs were written for version 2 that it is still around, and version 3 was such an upgrade that programs for 2 don't always run in 3 and vice versa. For this course we'll do everything in version 3 - it's becoming the accepted standard for data professionals.\n",
41 | "\n",
42 | "You have a few ways of working with Python:\n",
43 | "\n",
44 | "- The Interactive Interpreter (Type `python` and the version number if it is in your path)\n",
45 | "- Writing code and running it in some graphical environment (Such as VSCode, Visual Studio, Spyder, PyCharm, IDLE, etc.)\n",
46 | "- Calling a `.py` script file from the `python` command \n",
47 | "\n",
48 | "When you're in command-mode, you'll see that the code looks more like a scripting language, meaning that some parentheses around functions might not be there. Programming-mode looks like a standard programming language environment - you'll normally use that within an Integrated Development Environment (IDE)."
49 | ]
50 | },
51 | {
52 | "cell_type": "markdown",
53 | "metadata": {},
54 | "source": [
55 | "## Activity: Verify Your Installation and Configure Python\n",
56 | "\n",
57 | "Open the **01_OverviewAndCourseSetup.py** file and run the code you see there. The exercises will be marked out using comments: \n",
58 | "\n",
59 | "\n",
60 | "# TODO - Section Number\n",
61 | ""
62 | ]
63 | },
64 | {
65 | "cell_type": "code",
66 | "execution_count": 1,
67 | "metadata": {},
68 | "outputs": [
69 | {
70 | "ename": "SyntaxError",
71 | "evalue": "Missing parentheses in call to 'print'. Did you mean print(\"The Python Version is: \" python_version)? (, line 14)",
72 | "output_type": "error",
73 | "traceback": [
74 | "\u001b[1;36m File \u001b[1;32m\"\"\u001b[1;36m, line \u001b[1;32m14\u001b[0m\n\u001b[1;33m print \"The Python Version is: \" python_version\u001b[0m\n\u001b[1;37m ^\u001b[0m\n\u001b[1;31mSyntaxError\u001b[0m\u001b[1;31m:\u001b[0m Missing parentheses in call to 'print'. Did you mean print(\"The Python Version is: \" python_version)?\n"
75 | ]
76 | }
77 | ],
78 | "source": [
79 | "# 01_OverviewAndCourseSetup.py\n",
80 | "# Purpose: Initial Course Setup and displaying versions\n",
81 | "# Author: Buck Woody\n",
82 | "# Credits and Sources: Inline\n",
83 | "# Last Updated: 27 June 2018\n",
84 | "\n",
85 | "# Check the Python Version and Information\n",
86 | "import platform\n",
87 | "python_version=platform.python_version()\n",
88 | "print(python_version)\n",
89 | "\n",
90 | "# - Fix this code so that it runs\n",
91 | "\n",
92 | "print \"The Python Version is: \" python_version\n",
93 | "\n",
94 | "# - Using \"platform\", what other information can you derive about this system?\n",
95 | "\n",
96 | "# EOF: 01_OverviewAndCourseSetup.py"
97 | ]
98 | },
99 | {
100 | "cell_type": "markdown",
101 | "metadata": {},
102 | "source": [
103 | "
For Further Study
\n",
104 | "\n",
105 | "- Version differences: https://wiki.python.org/moin/Python2orPython3 \n",
106 | "- Development Environments (IDLE, tk, VSCode, PyCharm, Jupyter Notebooks), Documentation, and Training Resources: https://www.python.org/doc/\n",
108 | "- The Official Python Documentation Course: https://docs.python.org/3/tutorial/index.html\n",
109 | "\n",
110 | "Next, Continue to *02 Programming Basics*"
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": null,
116 | "metadata": {},
117 | "outputs": [],
118 | "source": []
119 | }
120 | ],
121 | "metadata": {
122 | "kernelspec": {
123 | "display_name": "Python 3",
124 | "language": "python",
125 | "name": "python3"
126 | },
127 | "language_info": {
128 | "codemirror_mode": {
129 | "name": "ipython",
130 | "version": 3
131 | },
132 | "file_extension": ".py",
133 | "mimetype": "text/x-python",
134 | "name": "python",
135 | "nbconvert_exporter": "python",
136 | "pygments_lexer": "ipython3",
137 | "version": "3.6.5"
138 | }
139 | },
140 | "nbformat": 4,
141 | "nbformat_minor": 2
142 | }
143 |
--------------------------------------------------------------------------------
/PythonForDataProfessionals/notebooks/02 Programming Basics.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "collapsed": true
7 | },
8 | "source": [
9 | "\n",
10 | "\n",
11 | "# Python for Data Professionals\n",
12 | "\n",
13 | "## 02 Programming Basics\n",
14 | "\n",
15 | "\n",
16 | "\n",
17 | "\n",
18 | " - Course Outline\n",
19 | " - 1 - Overview and Course Setup\n",
20 | " - 2 - Programming Basics (This section)\n",
21 | "   - 2.1 - Getting help\n",
22 | "   - 2.2 Code Syntax and Structure\n",
23 | "   - 2.3 Variables\n",
24 | "   - 2.4 Operations and Functions\n",
25 | " - 3 Working with Data\n",
26 | " - 4 Deployment and Environments\n",
27 | "\n",
28 | "\n",
29 | ""
30 | ]
31 | },
32 | {
33 | "cell_type": "markdown",
34 | "metadata": {},
35 | "source": [
36 | "## Programming Basics Overview\n",
37 | "\n",
38 | "From here on out, you'll focus on using Python in programming mode - you'll write code that you run from an IDE or a calling environment, not interactively from the command-line. As you work through this explanation, copy the code you see and run it to see the results. After you work through these copy-and-paste examples, you'll create your own code in the Activities that follow each section."
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "## 2.1 - Getting help\n",
46 | "\n",
47 | "The very first thing you should learn in any language is how to get help. You can [find the help documents on-line](https://docs.python.org/3/index.html), or simply type\n",
48 | " \n",
49 | "`help()`\n",
50 | " \n",
51 | "in your code. For help on a specific topic, put the topic in the parentheses:\n",
52 | " \n",
53 | " `help(str)`\n",
54 | "\n",
55 | " To see a list of topics, type \n",
56 | "\n",
57 | " `help('topics')`"
58 | ]
59 | },
60 | {
61 | "cell_type": "code",
62 | "execution_count": null,
63 | "metadata": {},
64 | "outputs": [],
65 | "source": [
66 | "# Try it:\n",
67 | "help('topics')"
68 | ]
69 | },
70 | {
71 | "cell_type": "markdown",
72 | "metadata": {},
73 | "source": [
74 | "## 2.2 Code Syntax and Structure\n",
75 | "\n",
76 | "Let's cover a few basics about how Python code is written. (For a full discussion, check out the [Style Guide for Python, called PEP 8](https://www.python.org/dev/peps/pep-0008/) ) Let's use the \"Zen of Python\" rules from Tim Peters for this course:\n",
77 | "\n",
78 | "\n",
79 | "\n",
80 | " Beautiful is better than ugly.\n",
81 | " Explicit is better than implicit.\n",
82 | " Simple is better than complex.\n",
83 | " Complex is better than complicated.\n",
84 | " Flat is better than nested.\n",
85 | " Sparse is better than dense.\n",
86 | " Readability counts.\n",
87 | " Special cases aren't special enough to break the rules.\n",
88 | " Although practicality beats purity.\n",
89 | " Errors should never pass silently.\n",
90 | " Unless explicitly silenced.\n",
91 | " In the face of ambiguity, refuse the temptation to guess.\n",
92 | " There should be one-- and preferably only one --obvious way to do it.\n",
93 | " Although that way may not be obvious at first unless you're Dutch.\n",
94 | " Now is better than never.\n",
95 | " Although never is often better than right now.\n",
96 | " If the implementation is hard to explain, it's a bad idea.\n",
97 | " If the implementation is easy to explain, it may be a good idea.\n",
98 | " Namespaces are one honking great idea -- let's do more of those!\n",
99 | " --Tim Peters\n",
100 | "\n",
101 | "\n",
102 | "\n",
103 | "In general, use standard coding practices - don't use keywords for variables, be consistent in your naming (camel-case, lower-case, etc.), comment your code clearly, understand the general syntax of your language, and follow the principles above. But the most important tip is to at least read PEP 8 and decide for yourself how well that fits into your Zen.\n",
104 | "\n",
105 | "There is one hard-and-fast rule for Python that you *do* need to be aware of: indentation. You **must** indent the body of your classes, functions (or methods), loops, and conditions. You can use a tab or four spaces (spaces are the accepted way to do it) but in any case, you have to be consistent. If you use tabs, you always use tabs. If you use spaces, you have to use that throughout. It's best if you set your IDE to handle that for you, whichever way you go.\n",
106 | "\n",
107 | "Python code files have an extension of `.py`. \n",
108 | "\n",
109 | "Comments in Python start with the hash-tag: `#`. There are no block comments (and this makes us all sad) so each line you want to comment must have a tag in front of that line. Keep the lines short (80 characters or so) so that they don't fall off a single-line display like at the command line."
110 | ]
111 | },
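The indentation and comment rules above are easier to see in a few lines of code. This is only a minimal sketch; the function name `describe` and its `limit` parameter are made up for illustration and are not part of the course files.

```python
# Each comment line needs its own hash character;
# there is no block-comment syntax to cover both lines.
def describe(value, limit=5):
    # The function body is indented four spaces.
    if value > limit:
        # The body of the "if" is indented one more level.
        return str(value) + " is greater than " + str(limit)
    # Back at the function level: this runs when the "if" is false.
    return str(value) + " is not greater than " + str(limit)


message = describe(10)
print(message)
```

If you remove the indentation from any of the `return` lines, Python stops with an `IndentationError` before it runs anything.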
112 | {
113 | "cell_type": "markdown",
114 | "metadata": {},
115 | "source": [
116 | "
2.3 Variables
\n",
117 | "\n",
118 | "Variables stand in for replaceable values. Python is dynamically typed, meaning you can declare a variable name and set it to a value at the same time, and Python works out the data type from the value you assign. You use an `=` sign to assign values, and `==` to compare things.\n",
119 | "\n",
120 | "Quotes \\\" or ticks \\' are fine, just be consistent.\n",
121 | "\n",
122 | "`# There are some keywords to be aware of, but x and y are always good choices.`\n",
123 | "\n",
124 | "`x = \"Buck\" # I'm a string.`\n",
125 | "\n",
126 | "`type(x)`\n",
127 | "\n",
128 | "`y = 10 # I'm an integer.`\n",
129 | "\n",
130 | "`type(y)`\n",
131 | "\n",
132 | "To change the type a variable holds, just assign it a different value:\n",
133 | "\n",
134 | "`x = \"Buck\" # I'm a string.`\n",
135 | "\n",
136 | "`type(x)`\n",
137 | "\n",
138 | "`x = 10 # Now I'm an integer.`\n",
139 | "\n",
140 | "`type(x)`\n",
141 | "\n",
142 | "Or cast it by explicitly declaring the conversion:\n",
143 | "\n",
144 | "`x = \"10\"`\n",
145 | "\n",
146 | "`type(x)`\n",
147 | "\n",
148 | "`print(int(x))`\n",
149 | "\n",
150 | "To concatenate string values, use the `+` sign:\n",
151 | "\n",
152 | "`x = \"Buck\"`\n",
153 | "\n",
154 | "`y = \" Woody\"`\n",
155 | "\n",
156 | "`print(x + y)`"
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": null,
162 | "metadata": {},
163 | "outputs": [],
164 | "source": [
165 | "# Try it:\n",
166 | "x = \"Buck\" # I'm a string.\n",
167 | "\n",
168 | "type(x)\n",
169 | "\n",
170 | "x = 10 # Now I'm an integer.\n",
171 | "\n",
172 | "type(x)"
173 | ]
174 | },
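The pieces of section 2.3 can be pulled together in one short script. This is a sketch rather than part of the course files; the names `text_number`, `as_int`, `total`, `full_name`, and `label` are chosen here only for illustration.

```python
x = "Buck"                       # x refers to a string
first_type = type(x).__name__

x = 10                           # the same name now refers to an integer
second_type = type(x).__name__

text_number = "10"               # digits inside quotes are still a string
as_int = int(text_number)        # explicit conversion (a cast) to an integer
total = as_int + 5               # arithmetic works after the cast

full_name = "Buck" + " " + "Woody"   # + concatenates strings
label = "Total: " + str(total)       # convert numbers with str() before joining

print(first_type, second_type)
print(full_name)
print(label)
```

Note that `"Total: " + total` without the `str()` call raises a `TypeError`, because Python does not silently convert between strings and numbers.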
175 | {
176 | "cell_type": "markdown",
177 | "metadata": {},
178 | "source": [
179 | "
2.4 Operations and Functions
\n",
180 | "\n",
181 | "Python has the following operators:\n",
182 | "\n",
183 | " Arithmetic Operators\n",
184 | " Comparison (Relational) Operators\n",
185 | " Assignment Operators\n",
186 | " Logical Operators\n",
187 | " Bitwise Operators\n",
188 | " Membership Operators\n",
189 | " Identity Operators\n",
190 | "\n",
191 | "You have the standard operators and functions found in almost every language. Here are some of the tokens:\n",
192 | "\n",
193 | "\n",
194 | "\n",
195 | " != *= << ^ \n",
196 | " \" + <<= ^= \n",
197 | " \"\"\" += <= `\n",
198 | " % , <> __\n",
199 | " %= - == \n",
200 | " & -= > b\" \n",
201 | " &= . >= b' \n",
202 | " ' ... >> j \n",
203 | " ''' / >>= r\" \n",
204 | " ( // @ r' \n",
205 | " ) //= J |'\n",
206 | " * /= [ |= \n",
207 | " ** : \\ ~ \n",
208 | " **= < ] \n",
209 | "\n",
210 | "
\n",
211 | "\n",
212 | "Wait...that's it? That's all you're going to tell me? *(Hint: use what you've learned):*\n",
213 | "\n",
214 | "`help('symbols')`\n",
215 | "\n",
216 | "Walk through each of these operators carefully - you'll use them when you work with data in the next module.\n"
217 | ]
218 | },
219 | {
220 | "cell_type": "code",
221 | "execution_count": null,
222 | "metadata": {
223 | "collapsed": true
224 | },
225 | "outputs": [],
226 | "source": [
227 | "# Try it:"
228 | ]
229 | },
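As a starting point for walking through the operator categories listed in section 2.4, here is one small example per category. It is a sketch only; the variable names and values are arbitrary.

```python
a = 7
b = 2

# Arithmetic operators
arithmetic = (a + b, a - b, a * b, a / b, a // b, a % b, a ** b)

# Comparison (relational) operators
comparison = (a == b, a != b, a > b, a <= b)

# Assignment operators
counter = 10
counter += 5     # same as counter = counter + 5
counter *= 2     # same as counter = counter * 2

# Logical operators
logical = (a > 5 and b > 5, a > 5 or b > 5, not a > 5)

# Bitwise operators (7 is 0b111, 2 is 0b010)
bitwise = (a & b, a | b, a ^ b, a << 1, a >> 1)

# Membership operators
names = ["Buck", "Woody"]
membership = ("Buck" in names, "Jane" not in names)

# Identity operators: "is" checks for the same object, not an equal value
other = names
identity = (other is names, other is not names)

print(arithmetic)
print(comparison, counter, logical)
print(bitwise, membership, identity)
```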
230 | {
231 | "cell_type": "markdown",
232 | "metadata": {},
233 | "source": [
234 | "
Activity - Programming basics
\n",
235 | "\n",
236 | "Open the **02_ProgrammingBasics.py** file and run the code you see there. The exercises will be marked out using comments:\n",
237 | "\n",
238 | "`# - Section Number`"
239 | ]
240 | },
241 | {
242 | "cell_type": "code",
243 | "execution_count": null,
244 | "metadata": {},
245 | "outputs": [],
246 | "source": [
247 | "# 02_ProgrammingBasics.py\n",
248 | "# Purpose: General Programming exercises for Python \n",
249 | "# Author: Buck Woody\n",
250 | "# Credits and Sources: Inline\n",
251 | "# Last Updated: 27 June 2018\n",
252 | "\n",
253 | "# 2.1 Getting Help\n",
254 | "help()\n",
255 | "help(str)\n",
256 | "\n",
257 | "# - Write code to find help on help\n",
258 | "\n",
259 | "# 2.2 Code Syntax and Structure\n",
260 | "\n",
261 | "# - Python uses spaces to indicate code blocks. Fix the code below:\n",
262 | "x=10\n",
263 | "y=5\n",
264 | "if x > y:\n",
265 | "print(str(x) + \" is greater than \" + str(y))\n",
266 | "\n",
267 | "# - Arguments on first line are forbidden when not using vertical alignment. Fix this code:\n",
268 | "foo = long_function_name(var_one, var_two,\n",
269 | " var_three, var_four)\n",
270 | "\n",
271 | "# operators sit far away from their operands. Fix this code:\n",
272 | "income = (gross_wages +\n",
273 | " taxable_interest +\n",
274 | " (dividends - qualified_dividends) -\n",
275 | " ira_deduction -\n",
276 | " student_loan_interest)\n",
277 | "\n",
278 | "# - The import statement should use a separate line for each module. You can fix the code below \n",
279 | "# using separate lines or by using the \"from\" statement:\n",
280 | "import sys, os\n",
281 | "\n",
282 | "# - The following code has inconsistent spacing around its operators. Fix this code:\n",
283 | "i=i+1\n",
284 | "submitted +=1\n",
285 | "x = x * 2 - 1\n",
286 | "hypot2 = x * x + y * y\n",
287 | "c = (a + b) * (a - b)\n",
288 | "\n",
289 | "# 2.3 Variables \n",
290 | "\n",
291 | "# - Add a line below x=3 that changes the variable x from int to a string\n",
292 | "x=3\n",
293 | "type(x)\n",
294 | "\n",
295 | "# - Write code that prints the string \"This class is awesome\" using variables:\n",
296 | "x=\"is awesome\"\n",
297 | "y=\"This Class\"\n",
298 | "\n",
299 | "# 2.4 Operations and Functions\n",
300 | "\n",
301 | "# - Use some basic operators to write the following code:\n",
302 | "# Assign two variables\n",
303 | "# Add them\n",
304 | "# Subtract 20 from each, add those values together, save that to a new variable\n",
305 | "# Create a new string variable with the text \"The result of my operations are: \"\n",
306 | "# Print out a single string on the screen with the result of the variables \n",
307 | "# showing that result. \n",
308 | "\n",
309 | "# EOF: 02_ProgrammingBasics.py"
310 | ]
311 | },
312 | {
313 | "cell_type": "markdown",
314 | "metadata": {},
315 | "source": [
316 | "
For Further Study
\n",
317 | "\n",
318 | "- PEP 8, the Python style guide (coding requirements and standards) - https://www.python.org/dev/peps/pep-0008/\n",
319 | "- Introduction to the Python Coding Style - http://stackabuse.com/introduction-to-the-python-coding-style/\n",
320 | "- The Microsoft Tutorial and samples for Python - https://code.visualstudio.com/docs/languages/python \n",
322 | "- Another free online self-paced course - https://www.w3schools.com/python/default.asp \n",
323 | "\n",
324 | "Next, Continue to *03 Working with Data*"
325 | ]
326 | },
327 | {
328 | "cell_type": "code",
329 | "execution_count": null,
330 | "metadata": {},
331 | "outputs": [],
332 | "source": []
333 | }
334 | ],
335 | "metadata": {
336 | "kernelspec": {
337 | "display_name": "Python 3",
338 | "language": "python",
339 | "name": "python3"
340 | },
341 | "language_info": {
342 | "codemirror_mode": {
343 | "name": "ipython",
344 | "version": 3
345 | },
346 | "file_extension": ".py",
347 | "mimetype": "text/x-python",
348 | "name": "python",
349 | "nbconvert_exporter": "python",
350 | "pygments_lexer": "ipython3",
351 | "version": "3.6.5"
352 | }
353 | },
354 | "nbformat": 4,
355 | "nbformat_minor": 2
356 | }
357 |
--------------------------------------------------------------------------------
/PythonForDataProfessionals/notebooks/04 Environments and Deployment.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "collapsed": true
7 | },
8 | "source": [
9 | "\n",
10 | "\n",
11 | "# Python for Data Professionals\n",
12 | "\n",
13 | "## 04 Environments and Deployment\n",
14 | "\n",
15 | "\n",
16 | "\n",
17 | "\n",
18 | " - Course Outline
\n",
19 | " - 1 - Overview and Course Setup
\n",
20 | " - 2 - Programming Basics
\n",
21 | " - 3 Working with Data
\n",
22 | " - 4 Deployment and Environments (This section)
\n",
23 | " - 4.1 Conda
\n",
24 | " - 4.2 Pickling
\n",
25 | "\n",
26 | "\n",
27 | ""
28 | ]
29 | },
30 | {
31 | "cell_type": "markdown",
32 | "metadata": {},
33 | "source": [
34 | "The main installation of Python - sometimes called \"Core\" or \"base\" - has a set of parameters it works with. Since it runs on many operating systems, these variables are set and altered in different ways. Here are the primary environment settings on the standard installation of Python:\n",
35 | "\n",
36 | "- PYTHONPATH - Adds directories to the search path that the Python interpreter uses to locate the module files imported into a program.\n",
37 | "- PYTHONHOME - Changes the location of the standard Python libraries, giving the interpreter an alternative module search path. \n",
38 | "- PYTHONSTARTUP - The path to an initialization file (for example `.pythonrc.py`) containing Python source code. It is executed every time you start the interpreter in interactive mode.\n",
39 | "- PYTHONCASEOK - For the Windows OS, find the first case-insensitive match in an \"import\" statement.\n",
40 | "\n",
41 | "You can show the configuration variables for your Python installation by importing the `sysconfig` library and calling its `get_config_vars()` function:\n",
42 | "\n",
43 | "`import sysconfig`\n",
44 | "\n",
45 | "`sysconfig.get_config_vars()`\n",
46 | "\n",
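47 | "If you want to see just one variable, remember, the result is just a dictionary, so you can ask for a single key by name (a key that your platform does not define returns `None`):\n",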
48 | "\n",
49 | "`sysconfig.get_config_var('LIBDIR')`"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": 1,
55 | "metadata": {},
56 | "outputs": [],
57 | "source": [
58 | "# Try it:"
59 | ]
60 | },
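Here is a short sketch of those `sysconfig` calls in one place. The key `prefix` and the path name `stdlib` are standard, but the values you see depend on your installation; `NOT_A_REAL_KEY` is a made-up name used only to show what happens when a key does not exist.

```python
import sysconfig

# get_config_vars() returns a dictionary of configuration variables.
config = sysconfig.get_config_vars()
print(type(config).__name__, "with", len(config), "entries")

# Look up a single variable by name.
prefix = sysconfig.get_config_var("prefix")
print("Installation prefix:", prefix)

# A name that is not defined on your platform simply returns None.
missing = sysconfig.get_config_var("NOT_A_REAL_KEY")
print("Missing key:", missing)

# get_path() answers "which directory has the libraries?" directly.
stdlib_dir = sysconfig.get_path("stdlib")
print("Standard library directory:", stdlib_dir)
```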
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": [
65 | "
4.1 pip and Conda
\n",
66 | "\n",
67 | "To install new packages, you can build the source code manually, but that's not the way it's most often done. Typically you use a \"package manager\", and the most popular is \"pip\". The pip program installs and configures most of the libraries you will need for the base installation of Python.\n",
68 | "\n",
69 | "You probably already have the pip program. However, to install pip, you can use the [cURL](https://curl.haxx.se/download.html) program to get it:\n",
70 | "\n",
71 | "`curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py`\n",
72 | "\n",
73 | "Then use Python to run the script to install it:\n",
74 | "\n",
75 | "`python get-pip.py`\n",
76 | "\n",
77 | "From there, you can query the packages you have with this command, from the command-line in your operating system:\n",
78 | "\n",
79 | "`pip list`\n",
80 | "\n",
81 | "You can install a package using this command:\n",
82 | "\n",
83 | "`pip install SomePackage # latest version`\n",
84 | "\n",
85 | "`pip install SomePackage==1.0.4 # specific version`\n",
86 | "\n",
87 | "`pip install 'SomePackage>=1.0.4' # minimum version`\n",
88 | "\n",
89 | "And you can remove a package with this command:\n",
90 | "\n",
91 | "`pip uninstall SomePackage`\n",
92 | "\n",
93 | "There is a lot more that you can do with pip. To see the full list of commands, run it with no arguments:\n",
94 | "\n",
95 | "`pip`\n",
96 | "\n",
97 | "A more robust package manager, which even installs a distribution of Python for you along with other tools, is [Conda](https://conda.io/docs/user-guide/getting-started.html). For this course, you have installed Python using Conda, which not only has a package manager, but also isolates environments for you. This means that you can create a \"boundary\" of variables, package directories, and more around a name you specify. You can then switch to that environment to create your code, and that code will always have a consistent set of variables and packages.\n",
98 | "\n",
99 | "To create a Conda environment, issue the following command, replacing ENVIRONMENT_NAME with a name you choose:\n",
100 | "\n",
101 | "`conda create --name ENVIRONMENT_NAME`\n",
102 | "\n",
103 | "For instance, this command creates a new environment called \"bucktest\" and installs the biology package called biopython:\n",
104 | "\n",
105 | "`conda create --name bucktest biopython`\n",
106 | "\n",
107 | "To see the environments, issue the following command:\n",
108 | "\n",
109 | "`conda info --envs`\n",
110 | "\n",
111 | "The one with the asterisk (*) is the one you are using now. To switch to another environment, issue the following command:\n",
112 | "\n",
113 | "`activate bucktest` (In Windows)\n",
114 | "\n",
115 | "`source activate bucktest` (Mac and Linux; with Conda 4.4 or later, `conda activate bucktest` works on every platform)\n",
116 | "\n",
117 | "And to see the packages installed in that environment, issue the following command:\n",
118 | "\n",
119 | "`conda list`\n",
120 | "\n",
121 | "or just `conda` to find out everything you can do with Conda.\n",
122 | "\n",
123 | "To install packages in that environment, use this command:\n",
124 | "\n",
125 | "`conda install biopython`"
126 | ]
127 | },
128 | {
129 | "cell_type": "markdown",
130 | "metadata": {},
131 | "source": [
132 | "
Activity - pip and Conda
\n",
133 | "\n",
134 | "Now open the `/code/04_EnvironmentsAndDeployment.py` file and follow the instructions you see there for 4.1."
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 2,
140 | "metadata": {},
141 | "outputs": [],
142 | "source": [
143 | "# 04_EnvironmentsAndDeployment.py\n",
144 | "# Purpose: Environmental settings and configurations\n",
145 | "# Author: Buck Woody\n",
146 | "# Credits and Sources: Inline\n",
147 | "# Last Updated: 07 July 2018\n",
148 | "\n",
149 | "# - 4.1 Show the main environment variables in the current Python environment. Which directory has the libraries?\n",
150 | "\n",
151 | "# - What else can you find in the sysconfig library? How would you find that out?\n",
152 | "\n",
153 | "# - Using conda commands, what libraries are currently loaded? \n",
154 | "# How would you install a new one? \n",
155 | "# What environment are you using now?"
156 | ]
157 | },
158 | {
159 | "cell_type": "markdown",
160 | "metadata": {},
161 | "source": [
162 | "
4.2 Pickling
\n",
163 | "\n",
164 | "\"Pickling\" in Python means to serialize a Python object. Perhaps that isn't very helpful - what it really means is to take the output of whatever you did in Python and make it available again in another environment or program. It's a way of saving the \"state\" of a program so that it can be transferred and then re-loaded.\n",
165 | "\n",
166 | "It's best illustrated with some code:\n",
167 | "\n",
168 | "`import pickle`\n",
169 | "\n",
170 | "`a = ['1','2','3']`\n",
171 | "\n",
172 | "`PickleFileName = \"picklefile\"`\n",
173 | "\n",
174 | "`FileObject = open(PickleFileName,'wb')`\n",
175 | "\n",
176 | "`pickle.dump(a,FileObject)`\n",
177 | "\n",
178 | "`FileObject.close()`\n",
179 | "\n",
180 | "Now you can copy that file to a new computer, open Python, and work with it again as if you ran it there:\n",
181 | "\n",
182 | "`import pickle`\n",
183 | "\n",
184 | "`PickleFileName = \"picklefile\"`\n",
185 | "\n",
186 | "`FileObject = open(PickleFileName,'rb') ` \n",
187 | "\n",
188 | "`b = pickle.load(FileObject) ` \n",
189 | "\n",
190 | "`b`\n",
191 | "\n",
192 | "And now *a* equals *b*. Of course, your program would be much longer, most often a series of steps, which might for instance do a Machine Learning prediction. \n",
193 | "\n",
194 | "You can read a lot more about pickling here: https://wiki.python.org/moin/UsingPickle"
195 | ]
196 | },
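The save and load steps above can be combined into one runnable sketch. It differs from the snippets in the text in two ways that are choices made here, not requirements: it uses `with` blocks so the files are closed automatically, and it writes to a temporary directory (the name `picklefile.pkl` is arbitrary) so that it does not leave a file in your working folder.

```python
import os
import pickle
import tempfile

a = ['1', '2', '3']

# Build a path in a fresh temporary directory.
pickle_path = os.path.join(tempfile.mkdtemp(), "picklefile.pkl")

# Serialize: pickle writes bytes, so open the file in binary mode ('wb').
with open(pickle_path, "wb") as file_object:
    pickle.dump(a, file_object)

# Deserialize: read it back in binary mode ('rb').
with open(pickle_path, "rb") as file_object:
    b = pickle.load(file_object)

print(b)
print(a == b)   # True: same contents, although b is a new object
```

Only load pickle files that come from a source you trust, because loading a pickle can run code stored in the file.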
197 | {
198 | "cell_type": "markdown",
199 | "metadata": {},
200 | "source": [
201 | "
Activity - Pickle
\n",
202 | "\n",
203 | "Now open the `/code/04_EnvironmentsAndDeployment.py` file and follow the instructions you see there for step 4.2."
204 | ]
205 | },
206 | {
207 | "cell_type": "code",
208 | "execution_count": 4,
209 | "metadata": {},
210 | "outputs": [],
211 | "source": [
212 | "# - 4.2 Create a program that has three text variables. Combine these three into another variable. \n",
213 | "# Load the pickle library and save the results of the first program as a pkl file.\n",
214 | "# Close the first program, and create another one that opens and reads the pkl file.\n",
215 | "# Combine the final variable from the last program with a new text variable from this program. \n",
216 | "\n",
217 | "# EOF: 04_EnvironmentsAndDeployment.py"
218 | ]
219 | },
220 | {
221 | "cell_type": "markdown",
222 | "metadata": {},
223 | "source": [
224 | "
4.3 Docker and Flask
\n",
225 | "\n",
226 | "Two other abstraction levels are useful to think about. You're probably familiar with Virtual Machines, which use software to emulate hardware. This lets you install a complete new \"computer\" in a computer's OS. One level up from that abstraction layer is a *Container*. A Container virtualizes the operating system rather than the hardware: it shares the kernel of the host (most often Linux) and packages only the files, libraries, and settings needed to operate a runtime - like Python. This provides an even more consistent environment for your application, since it can also include settings and programs above the Python level. \n",
227 | "\n",
228 | "The *Flask* micro-framework for Python isn't technically an abstraction layer, it has more to do with serving your application up to a Web call. You'll often see Docker and Flask used together, so you'll cover it here for completeness. Once again, seeing some code is useful to understand - this example comes from the documentation site:\n",
229 | "\n",
230 | "\n",
231 | "\n",
232 | "from flask import Flask\n",
233 | "app = Flask(__name__)\n",
234 | "\n",
235 | "@app.route('/')\n",
236 | "def hello_world():\n",
237 | " return 'Hello, World!'\n",
238 | "\n",
239 | "
\n",
240 | "\n",
241 | "You can probably follow the layout of this code, but there are some specifics here. First, the code imports Flask itself. Next, the code creates an instance of a Flask app, called \"app\" in this case. From there, the route is set to the base URL (`/`), the same address as the main page of a web site. And finally, a simple function returns the words \"Hello, World!\".\n",
242 | "\n",
243 | "So far, nothing is happening - the code is just on disk. However, if you save it as `hello.py`, you can run it on a system that has Flask installed with these commands (in Linux):\n",
244 | "\n",
245 | "\n",
246 | "$ export FLASK_APP=hello.py\n",
247 | "$ flask run\n",
248 | " * Running on http://127.0.0.1:5000/\n",
249 | "
\n",
250 | "\n",
251 | "OK...so what? Well, in this case, you could open a Web Browser on that system and type in that URL - and you'll see \"Hello, World!\" pop up on the screen. Of course, real applications are much more complicated, can take POST and GET operations, and much more. But this is a very convenient way to serve up your Python application without having to tell your users to install and run Python.\n",
252 | "\n",
253 | "Of course, there's a lot more to both of these topics - read the references below to learn more."
254 | ]
255 | },
256 | {
257 | "cell_type": "markdown",
258 | "metadata": {},
259 | "source": [
260 | "
For Further Study
\n",
261 | "\n",
262 | "- More on Docker: https://www.fullstackpython.com/docker.html\n",
263 | "- More on Flask: http://flask.pocoo.org/\n",
264 | "- Creating a simple Flask application: http://containertutorials.com/docker-compose/flask-simple-app.html \n",
265 | "\n",
266 | "Congratulations! You now know the basics of working with Python and Data. As you can see, there's a lot more to learn - so use your new knowledge to expand on what you have learned. "
267 | ]
268 | }
269 | ],
270 | "metadata": {
271 | "kernelspec": {
272 | "display_name": "Python 3",
273 | "language": "python",
274 | "name": "python3"
275 | },
276 | "language_info": {
277 | "codemirror_mode": {
278 | "name": "ipython",
279 | "version": 3
280 | },
281 | "file_extension": ".py",
282 | "mimetype": "text/x-python",
283 | "name": "python",
284 | "nbconvert_exporter": "python",
285 | "pygments_lexer": "ipython3",
286 | "version": "3.6.5"
287 | }
288 | },
289 | "nbformat": 4,
290 | "nbformat_minor": 2
291 | }
292 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | # Lab: Python Basics for Data Professionals
4 |
5 | #### A Microsoft Course from the SQL Server team
6 |
7 |
8 |
9 |
10 |
11 | - About this lab
12 | - Business Applications of this lab
13 | - Technologies used in this lab
14 | - Before Taking this lab
15 | - lab Details
16 | - Related labs
17 | - Lab Modules
18 | - Next Steps
19 |
20 |
21 |
22 |
23 |
24 | > NOTE: This course is in active re-development. [The course files are complete, and located here](https://github.com/microsoft/sqlworkshops-pythonfordatapros/tree/master/PythonForDataProfessionals), but this page is currently being worked on.
25 |
26 | Welcome to this Microsoft solutions lab on *Python Basics for the Data Professional*. In this lab, you'll learn basic Python structures, programming and data flow. You'll get resources to go much further in your learning journey, but this short lab will get you up and running quickly.
27 |
28 | The focus of this lab is to familiarize the database professional with the basics of Python, while implementing it in SQL Server Stored Procedures using SQL Server's Machine Learning Services. After this basic introduction, the professional can move on to more in-depth training in Python if desired.
29 |
30 | You'll start by setting up your system to work with Python, then move to understanding the course itself. From there, you will move through programming basics, working with data, and then on to understanding the concepts of Python environments and how to deploy Python code.
31 |
32 | This [github README.MD file](https://lab.github.com/githubtraining/introduction-to-github) explains how the workshop is laid out, what you will learn, and the technologies you will use in this solution. To download this Lab to your local computer, click the **Clone or Download** button you see at the top right side of this page. [More about that process is here](https://help.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository).
33 |
34 | You can view all of the [courses and other labs our team has created at this link - open in a new tab to find out more.](https://microsoft.github.io/sqllabs/)
35 |
36 |
37 |
38 |
Learning Objectives
39 |
40 | In this lab you'll learn:
41 | - How to set up a Python environment for SQL Server using Machine Learning Services
42 | - The Basics of programming in Python including code syntax, getting help, variables, operators, and functions
43 | - Working with data structures, and understanding popular data libraries
44 | - Data Ingestion and access
45 | - Machine Learning in Python
46 | - Environments and code deployment
47 |
48 |
49 |
50 | The goal of this lab is to familiarize the data professional with Python environments and programming.
51 |
52 | The concepts and skills taught in this lab form the starting points for:
53 |
54 | - Data professionals that wish to include Python code in their data access and programming
55 | - Security professionals who wish to understand how to implement secure Python coding practices
56 | - Anyone interested in learning more about programming with Python and databases
57 |
58 |
59 |
60 |
61 |
62 | Businesses require the ability to securely access their data for many workloads, including various programming languages. Python (along with the R language) has emerged as a powerful tool for data ingestion, processing and analysis. Previously, Python programmers accessed various databases and retrieved data over a network connection like any application, but this often meant pulling large amounts of data over a potentially insecure network to bring multiple copies to each developer to work with locally. The SQL Server Machine Learning Services feature allows Python code to run inside a Stored Procedure in SQL Server, which then accesses data directly. This also allows the Python developer to create code locally, and then send that code on to the Database Administrator for installation on the server - the developer never has to touch the production server or data.
63 |
64 | This course explains how to work with Python, and then how to operationalize the code on a SQL Server.
65 |
66 |
67 |
68 |
69 |
70 |
71 | The solution includes the following technologies - although you are not limited to these, they form the basis of the lab. At the end of the lab you will learn how to extrapolate these components into other solutions. You will cover these at an overview level, with references to much deeper training provided.
72 |
73 |
74 | Technology | Description |
75 | Python | An Open-Source, multiple paradigm coding language with extensible packages |
76 | Microsoft SQL Server | A complete data platform, including a Relational Database Management System (RDBMS), Data Pipeline, Business Intelligence, Graph Database Processing, and other constructs to work securely with multiple forms of data, including structured, semi-structured and unstructured. |
77 |
78 |
79 |
80 |
81 |
82 |
83 |
84 |
85 | You'll need a local system that you are able to install software on. The lab demonstrations use Microsoft Windows as an operating system and all examples use Windows for the lab. Optionally, you can use a Microsoft Azure Virtual Machine (VM) to install the software on and work with the solution.
86 |
87 | This lab expects that you understand data structures and working with SQL Server and computer networks. This lab does not expect you to have any prior data science knowledge, but a basic knowledge of programming and statistics is helpful.
88 |
89 | If you are new to these, here are a few references you can complete prior to class:
90 |
91 | - [Microsoft SQL Server](https://docs.microsoft.com/en-us/sql/relational-databases/database-engine-tutorials?view=sql-server-ver15)
92 | - [Microsoft Azure](https://docs.microsoft.com/en-us/learn/paths/azure-fundamentals/)
93 | - [Basic Programming](https://www.khanacademy.org/computing/computer-programming/programming/intro-to-programming/v/programming-intro)
94 |
95 |
Setup
96 |
97 | A full prerequisites document is located here. These instructions should be completed before the lab starts, since you will not have time to cover these in class. Remember to turn off any Virtual Machines from the Azure Portal when not taking the class so that you do not incur charges (shutting down the machine in the VM itself is not sufficient).
98 |
99 |
100 |
101 |
102 |
103 | This lab uses the Microsoft Windows operating system, although Linux is also supported once you have completed the exercises.
104 |
105 |
106 |
107 | Primary Audience: | Data Professionals tasked with implementing Big Data, Machine Learning and AI solutions |
108 | Secondary Audience: | Security Architects and Developers |
109 | Level: | 300 |
110 | Type: | In-Person |
111 | Length: | 8-9 hours |
112 |
113 |
114 |
115 |
116 |
117 |
118 |
119 | - This course is also available in a zero-install, online Jupyter Notebook format. [You can find that here](https://notebooks.azure.com/BuckWoodyNoteBooks/projects/PythonDataProfessional).
120 |
121 |
122 |
123 |
124 |
125 | This is a modular lab, and in each section, you'll learn concepts, technologies and processes to help you complete the solution.
126 |
127 |
128 |
129 | Module | Topics |
130 |
131 | 01 - Overview and Course Setup | In this Module you will cover an overview of the Python language and set up your system for the course. |
132 | 02 - Programming Basics | This Module covers the commands and procedures for getting help in Python, code syntax and structure, variables, and operators and functions. |
133 | 03 - Working with Data | In this Module you will learn more about data types, ingestion, inspection, and graphing, with a brief introduction to Data Science with Python. |
134 | 04 - Environments and Deployment | In this Module you will learn more about Python environments such as Conda, and how to deploy your code using the "pickle" library. |
135 |
136 |
137 |
138 |
139 |
140 |
Next Steps
141 |
142 | Next, Continue to 00 - Prerequisites
143 |
144 |
145 | # Contributing
146 |
147 | This project welcomes contributions and suggestions. Most contributions require you to agree to a
148 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
149 | the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
150 |
151 | When you submit a pull request, a CLA bot will automatically determine whether you need to provide
152 | a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
153 | provided by the bot. You will only need to do this once across all repos using our CLA.
154 |
155 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
156 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
157 | contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
158 |
159 | # Legal Notices
160 |
161 | ### License
162 | Microsoft and any contributors grant you a license to the Microsoft documentation and other content in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode), see [the LICENSE file](https://github.com/MicrosoftDocs/mslearn-tailspin-spacegame-web/blob/master/LICENSE), and grant you a license to any code in the repository under [the MIT License](https://opensource.org/licenses/MIT), see the [LICENSE-CODE file](https://github.com/MicrosoftDocs/mslearn-tailspin-spacegame-web/blob/master/LICENSE-CODE).
163 |
164 | Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation
165 | may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries.
166 | The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks.
167 | Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.
168 |
169 | Privacy information can be found at https://privacy.microsoft.com/en-us/
170 |
171 | Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
172 | or trademarks, whether by implication, estoppel or otherwise.
173 |
174 |
--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | ## Security
4 |
5 | Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).
6 |
7 | If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/opensource/security/definition), please report it to us as described below.
8 |
9 | ## Reporting Security Issues
10 |
11 | **Please do not report security vulnerabilities through public GitHub issues.**
12 |
13 | Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/opensource/security/create-report).
14 |
15 | If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey).
16 |
17 | You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc).
18 |
19 | Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
20 |
21 | * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
22 | * Full paths of source file(s) related to the manifestation of the issue
23 | * The location of the affected source code (tag/branch/commit or direct URL)
24 | * Any special configuration required to reproduce the issue
25 | * Step-by-step instructions to reproduce the issue
26 | * Proof-of-concept or exploit code (if possible)
27 | * Impact of the issue, including how an attacker might exploit the issue
28 |
29 | This information will help us triage your report more quickly.
30 |
31 | If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/opensource/security/bounty) page for more details about our active programs.
32 |
33 | ## Preferred Languages
34 |
35 | We prefer all communications to be in English.
36 |
37 | ## Policy
38 |
39 | Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/opensource/security/cvd).
40 |
41 |
42 |
--------------------------------------------------------------------------------
/graphics/AnalyticsAreas.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/AnalyticsAreas.png
--------------------------------------------------------------------------------
/graphics/DataScience.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/DataScience.png
--------------------------------------------------------------------------------
/graphics/MLCapabilities.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/MLCapabilities.png
--------------------------------------------------------------------------------
/graphics/MatPlotLib.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/MatPlotLib.png
--------------------------------------------------------------------------------
/graphics/SmallBuck.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/SmallBuck.png
--------------------------------------------------------------------------------
/graphics/aml-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/aml-logo.png
--------------------------------------------------------------------------------
/graphics/brain.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/brain.png
--------------------------------------------------------------------------------
/graphics/check.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/check.png
--------------------------------------------------------------------------------
/graphics/checkbox.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/checkbox.png
--------------------------------------------------------------------------------
/graphics/checkmark.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/checkmark.jpg
--------------------------------------------------------------------------------
/graphics/cortanalogo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/cortanalogo.png
--------------------------------------------------------------------------------
/graphics/files.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/files.jpg
--------------------------------------------------------------------------------
/graphics/ggplot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/ggplot.png
--------------------------------------------------------------------------------
/graphics/keyboard.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/keyboard.jpg
--------------------------------------------------------------------------------
/graphics/microsoftlogo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/microsoftlogo.png
--------------------------------------------------------------------------------
/graphics/pin.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/pin.jpg
--------------------------------------------------------------------------------
/graphics/solutions-microsoft-logo-small.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/solutions-microsoft-logo-small.png
--------------------------------------------------------------------------------
/graphics/tdsp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/tdsp.png
--------------------------------------------------------------------------------
/graphics/thinking.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/sqlworkshops-pythonfordatapros/e7a9fa3dadd492872812c32dc5c030359c3f3905/graphics/thinking.jpg
--------------------------------------------------------------------------------