├── .gitignore ├── LICENSE ├── README.md ├── check_env.py ├── exercises ├── calc_derivative │ ├── calc_derivative.py │ └── calc_derivative_solution.py ├── dow_selection │ ├── dow.csv │ ├── dow_selection.py │ └── dow_selection_solution.py ├── load_text │ ├── complex_data_file.txt │ ├── float_data.txt │ ├── float_data_with_header.txt │ ├── load_text.py │ └── load_text_solution.py ├── plotting │ ├── dc_metro.JPG │ ├── my_plots.png │ ├── plotting.py │ ├── plotting_bonus_solution.py │ ├── plotting_solution.py │ └── sample_plots.png ├── structured_array │ ├── short_logs.crv │ ├── structured_array.py │ └── structured_array_solution.py └── wind_statistics │ ├── wind.data │ ├── wind.desc │ ├── wind_statistics.py │ └── wind_statistics_solution.py └── slides.pdf /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | 5 | # C extensions 6 | *.so 7 | 8 | # Distribution / packaging 9 | .Python 10 | env/ 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | *.egg-info/ 23 | .installed.cfg 24 | *.egg 25 | 26 | # PyInstaller 27 | # Usually these files are written by a python script from a template 28 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
29 | *.manifest 30 | *.spec 31 | 32 | # Installer logs 33 | pip-log.txt 34 | pip-delete-this-directory.txt 35 | 36 | # Unit test / coverage reports 37 | htmlcov/ 38 | .tox/ 39 | .coverage 40 | .coverage.* 41 | .cache 42 | nosetests.xml 43 | coverage.xml 44 | *,cover 45 | 46 | # Translations 47 | *.mo 48 | *.pot 49 | 50 | # Django stuff: 51 | *.log 52 | 53 | # Sphinx documentation 54 | docs/_build/ 55 | 56 | # PyBuilder 57 | target/ 58 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | © 2001-2016, Enthought, Inc. 2 | All Rights Reserved. Use only permitted under license. Copying, sharing, redistributing or other unauthorized use strictly prohibited. 3 | All trademarks and registered trademarks are the property of their respective owners. 4 | Enthought, Inc. 5 | 200 W Cesar Chavez Suite 202 6 | Austin, TX 78701 7 | www.enthought.com 8 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # SciPy2016 tutorial: Introduction to NumPy 2 | 3 | This repository contains all the material needed by students registered for the 4 | Numpy tutorial of SciPy 2016 on Monday, July 11th 2016. 5 | 6 | For a smooth experience, you will need to make sure that you install or update 7 | your Python distribution and download the tutorial material _before_ the day 8 | of the tutorial as the Wi-Fi at the AT&T center can be flaky. 
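Once your distribution is set up, a short script along these lines can show at a glance whether the packages this tutorial relies on are recent enough. This is only a sketch: the helper names `version_tuple` and `check` are made up for illustration, and the minimum versions are copied from the package list further down in this README. The `check_env.py` script described below remains the intended way to test your installation.

```python
import importlib
import re

# Minimum versions, copied from the package list in this README.
MINIMUMS = {"numpy": (1, 10), "matplotlib": (1, 5), "IPython": (4, 0)}


def version_tuple(text):
    """Turn '1.11.0rc1' into (1, 11, 0), ignoring non-numeric suffixes."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check(name, minimum):
    """Return a one-line status string for the named package."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "%s: not installed" % name
    # Packages without a __version__ attribute are reported as too old.
    found = getattr(module, "__version__", "0")
    status = "OK" if version_tuple(found) >= minimum else "too old"
    return "%s %s: %s" % (name, found, status)


if __name__ == "__main__":
    for name, minimum in sorted(MINIMUMS.items()):
        print(check(name, minimum))
```

Comparing tuples of integers rather than raw strings avoids the trap where `"1.9"` sorts after `"1.10"`.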
9 | 10 | 11 | ## Python distribution and Packages needed 12 | 13 | If you don't already have a working Python distribution, by far the easiest 14 | way to get everything you need for this tutorial is to download Enthought 15 | Canopy ([https://store.enthought.com/](https://store.enthought.com/), 16 | the free version is sufficient), or Continuum's Anaconda 17 | ([http://continuum.io/downloads](http://continuum.io/downloads)). 18 | 19 | If you have the choice, I recommend using a Python 2.7 distribution, which 20 | is what I will be using and what my material has been tested with. If you have 21 | a Python 3.4+ version, you should be fine, though you might have to replace a 22 | print statement (`print a`) with the print function (`print(a)`) in some of the 23 | solution files. 24 | 25 | To be able to run the examples, demos and exercises, you must have the 26 | following packages installed: 27 | 28 | - numpy 1.10+ 29 | - matplotlib 1.5+ 30 | - ipython 4.0+ (for running, experimenting and doing exercises) 31 | - nose (only to test your distribution, see below) 32 | 33 | If you use Canopy, everything you need will be installed by default. If you 34 | use `conda`, you can create a new environment using the following command: 35 | 36 | $ conda create -n numpy-tutorial python=2 numpy matplotlib nose ipython 37 | 38 | To test your installation, please execute the `check_env.py` script. The 39 | output should look something like this: 40 | 41 | $ python check_env.py 42 | .... 43 | ---------------------------------------------------------------------- 44 | Ran 4 tests in 0.162 s 45 | 46 | OK 47 | 48 | 49 | ## Content needed 50 | 51 | This GitHub repository is all that is needed in terms of tutorial content. 
The simplest solution is to download the material using this link: 52 | 53 | https://github.com/enthought/Numpy-Tutorial-SciPyConf-2016/archive/master.zip 54 | 55 | If you're familiar with Git, you can also clone this repository with: 56 | 57 | $ git clone https://github.com/enthought/Numpy-Tutorial-SciPyConf-2016.git 58 | 59 | It will create a new folder named Numpy-Tutorial-SciPyConf-2016/ with all the 60 | content you will need: the slides I will go through (`slides.pdf`), and a folder 61 | of exercises. 62 | 63 | As you get closer to the day of the tutorial, it is highly recommended to 64 | update this repository, as I will be improving it this week. To update it, open 65 | a command prompt, move **into** the Numpy-Tutorial-SciPyConf-2016/ folder and run: 66 | 67 | $ git pull 68 | 69 | 70 | Questions? Problems? 71 | ==================== 72 | Questions? Problems? Don't wait, shoot me and the rest of the group an email on 73 | the tutorial mailing list: https://groups.google.com/forum/#!forum/scipy-2016-numpy-tutorial 74 | -------------------------------------------------------------------------------- /check_env.py: -------------------------------------------------------------------------------- 1 | """ Run this file to check your python installation. 
2 | """ 3 | from numpy.testing import assert_array_equal 4 | 5 | 6 | def test_import_numpy(): 7 | import numpy 8 | 9 | 10 | def test_numpy_version(): 11 | import numpy 12 | version_found = numpy.__version__.split(".") 13 | version_found = tuple(int(num) for num in version_found) 14 | assert version_found > (1, 8) 15 | 16 | 17 | def test_import_matplotlib(): 18 | from matplotlib.pyplot import plot 19 | 20 | 21 | def test_slicing(): 22 | from numpy import array 23 | x = array([[1, 2, 3], [4, 5, 6]]) 24 | assert_array_equal(x[:, ::2], array([[1, 3], [4, 6]])) 25 | 26 | 27 | if __name__ == "__main__": 28 | import nose 29 | nose.run(defaultTest=__name__) 30 | -------------------------------------------------------------------------------- /exercises/calc_derivative/calc_derivative.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. All Rights Reserved 2 | """ 3 | Calculate Derivative 4 | -------------------- 5 | 6 | Topics: NumPy array indexing and array math. 7 | 8 | Use array slicing and math operations to calculate the 9 | numerical derivative of ``sin`` from 0 to ``2*pi``. There is no 10 | need to use a 'for' loop for this. 11 | 12 | Plot the resulting values and compare to ``cos``. 13 | 14 | Bonus 15 | ~~~~~ 16 | 17 | Implement integration of the same function using Riemann sums or the 18 | trapezoidal rule. 19 | 20 | See :ref:`calc-derivative-solution`. 21 | """ 22 | from numpy import linspace, pi, sin, cos, cumsum 23 | from matplotlib.pyplot import plot, show, subplot, legend, title 24 | 25 | # calculate the sin() function on evenly spaced data. 26 | x = linspace(0,2*pi,101) 27 | y = sin(x) 28 | 29 | plot(x,y) 30 | show() 31 | -------------------------------------------------------------------------------- /exercises/calc_derivative/calc_derivative_solution.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. 
All Rights Reserved 2 | """ 3 | Topics: NumPy array indexing and array math. 4 | 5 | Use array slicing and math operations to calculate the 6 | numerical derivative of ``sin`` from 0 to ``2*pi``. There is no 7 | need to use a for loop for this. 8 | 9 | Plot the resulting values and compare to ``cos``. 10 | 11 | Bonus 12 | ~~~~~ 13 | 14 | Implement integration of the same function using Riemann sums or the 15 | trapezoidal rule. 16 | 17 | """ 18 | from numpy import linspace, pi, sin, cos, cumsum 19 | from matplotlib.pyplot import plot, show, subplot, legend, title 20 | 21 | # calculate the sin() function on evenly spaced data. 22 | x = linspace(0,2*pi,101) 23 | y = sin(x) 24 | 25 | # calculate the derivative dy/dx numerically. 26 | # First, calculate the distance between adjacent pairs of 27 | # x and y values. 28 | dy = y[1:]-y[:-1] 29 | dx = x[1:]-x[:-1] 30 | 31 | # Now divide to get "rise" over "run" for each interval. 32 | dy_dx = dy/dx 33 | 34 | # Assuming central differences, these derivative values are 35 | # centered in between our original sample points. 36 | centers_x = (x[1:]+x[:-1])/2.0 37 | 38 | # Plot our derivative calculation. It should match up 39 | # with the cos function since the derivative of sin is 40 | # cos. 41 | subplot(1,2,1) 42 | plot(centers_x, dy_dx,'rx', centers_x, cos(centers_x),'b-') 43 | title(r"$\rm{Derivative\ of}\ sin(x)$") 44 | 45 | # Trapezoidal rule integration. 
46 | avg_height = (y[1:]+y[:-1])/2.0 47 | int_sin = cumsum(dx * avg_height) 48 | 49 | # Plot our integration against -cos(x) - -cos(0) 50 | closed_form = -cos(x)+cos(0) 51 | subplot(1,2,2) 52 | plot(x[1:], int_sin,'rx', x, closed_form,'b-') 53 | legend(('numerical', 'actual')) 54 | title(r"$\int \, \sin(x) \, dx$") 55 | show() 56 | -------------------------------------------------------------------------------- /exercises/dow_selection/dow.csv: -------------------------------------------------------------------------------- 1 | 13261.82,13338.23,12969.42,13043.96,3452650000,13043.96 2 | 13044.12,13197.43,12968.44,13056.72,3429500000,13056.72 3 | 13046.56,13049.65,12740.51,12800.18,4166000000,12800.18 4 | 12801.15,12984.95,12640.44,12827.49,4221260000,12827.49 5 | 12820.9,12998.11,12511.03,12589.07,4705390000,12589.07 6 | 12590.21,12814.97,12431.53,12735.31,5351030000,12735.31 7 | 12733.11,12931.29,12632.15,12853.09,5170490000,12853.09 8 | 12850.74,12863.34,12495.91,12606.3,4495840000,12606.3 9 | 12613.78,12866.1,12596.95,12778.15,3682090000,12778.15 10 | 12777.5,12777.5,12425.92,12501.11,4601640000,12501.11 11 | 12476.81,12699.05,12294.48,12466.16,5440620000,12466.16 12 | 12467.05,12597.85,12089.38,12159.21,5303130000,12159.21 13 | 12159.94,12441.85,11953.71,12099.3,6004840000,12099.3 14 | 12092.72,12167.42,11508.74,11971.19,6544690000,11971.19 15 | 11969.08,12339.1,11530.12,12270.17,3241680000,12270.17 16 | 12272.69,12522.82,12114.83,12378.61,5735300000,12378.61 17 | 12391.7,12590.69,12103.61,12207.17,4882250000,12207.17 18 | 12205.71,12423.81,12061.42,12383.89,4100930000,12383.89 19 | 12385.19,12604.92,12262.29,12480.3,4232960000,12480.3 20 | 12480.14,12715.96,12311.55,12442.83,4742760000,12442.83 21 | 12438.28,12734.74,12197.09,12650.36,4970290000,12650.36 22 | 12638.17,12841.88,12510.05,12743.19,4650770000,12743.19 23 | 12743.11,12810.34,12557.61,12635.16,3495780000,12635.16 24 | 12631.85,12631.85,12234.97,12265.13,4315740000,12265.13 25 | 
12257.25,12436.33,12142.14,12200.1,4008120000,12200.1 26 | 12196.2,12366.99,12045,12247,4589160000,12247 27 | 12248.47,12330.97,12058.01,12182.13,3768490000,12182.13 28 | 12181.89,12332.76,12006.79,12240.01,3593140000,12240.01 29 | 12241.56,12524.12,12207.9,12373.41,4044640000,12373.41 30 | 12368.12,12627.76,12354.22,12552.24,3856420000,12552.24 31 | 12551.51,12611.26,12332.03,12376.98,3644760000,12376.98 32 | 12376.66,12441.2,12216.68,12348.21,3583300000,12348.21 33 | 12349.59,12571.11,12276.81,12337.22,3613550000,12337.22 34 | 12333.31,12489.29,12159.42,12427.26,3870520000,12427.26 35 | 12426.85,12545.79,12225.36,12284.3,3696660000,12284.3 36 | 12281.09,12429.05,12116.92,12381.02,3572660000,12381.02 37 | 12380.77,12612.47,12292.03,12570.22,3866350000,12570.22 38 | 12569.48,12771.14,12449.08,12684.92,4096060000,12684.92 39 | 12683.54,12815.59,12527.64,12694.28,3904700000,12694.28 40 | 12689.28,12713.99,12463.32,12582.18,3938580000,12582.18 41 | 12579.58,12579.58,12210.3,12266.39,4426730000,12266.39 42 | 12264.36,12344.71,12101.29,12258.9,4117570000,12258.9 43 | 12259.14,12291.22,11991.06,12213.8,4757180000,12213.8 44 | 12204.93,12392.74,12105.36,12254.99,4277710000,12254.99 45 | 12254.59,12267.86,12010.03,12040.39,4323460000,12040.39 46 | 12039.09,12131.33,11778.66,11893.69,4565410000,11893.69 47 | 11893.04,11993.75,11691.47,11740.15,4261240000,11740.15 48 | 11741.33,12205.98,11741.33,12156.81,5109080000,12156.81 49 | 12148.61,12360.58,12037.79,12110.24,4414280000,12110.24 50 | 12096.49,12242.29,11832.88,12145.74,5073360000,12145.74 51 | 12146.39,12249.86,11781.43,11951.09,5153780000,11951.09 52 | 11946.45,12119.69,11650.44,11972.25,5683010000,11972.25 53 | 11975.92,12411.63,11975.92,12392.66,5335630000,12392.66 54 | 12391.52,12525.19,12077.27,12099.66,1203830000,12099.66 55 | 12102.43,12434.34,12024.68,12361.32,6145220000,12361.32 56 | 12361.97,12687.61,12346.17,12548.64,4499000000,12548.64 57 | 12547.34,12639.82,12397.62,12532.6,4145120000,12532.6 58 | 
12531.79,12531.79,12309.62,12422.86,4055670000,12422.86 59 | 12421.88,12528.13,12264.76,12302.46,4037930000,12302.46 60 | 12303.92,12441.67,12164.22,12216.4,3686980000,12216.4 61 | 12215.92,12384.84,12095.18,12262.89,4188990000,12262.89 62 | 12266.64,12693.93,12266.64,12654.36,4745120000,12654.36 63 | 12651.67,12790.28,12488.22,12608.92,4320440000,12608.92 64 | 12604.69,12734.97,12455.04,12626.03,3920100000,12626.03 65 | 12626.35,12738.3,12489.4,12609.42,3703100000,12609.42 66 | 12612.59,12786.83,12550.22,12612.43,3747780000,12612.43 67 | 12602.66,12664.38,12440.55,12576.44,3602500000,12576.44 68 | 12574.65,12686.93,12416.53,12527.26,3556670000,12527.26 69 | 12526.78,12705.9,12447.96,12581.98,3686150000,12581.98 70 | 12579.78,12579.78,12280.89,12325.42,3723790000,12325.42 71 | 12324.77,12430.86,12208.42,12302.06,3565020000,12302.06 72 | 12303.6,12459.36,12223.97,12362.47,3581230000,12362.47 73 | 12371.51,12670.56,12371.51,12619.27,4260370000,12619.27 74 | 12617.4,12725.93,12472.71,12620.49,3713880000,12620.49 75 | 12626.76,12965.47,12626.76,12849.36,4222380000,12849.36 76 | 12850.91,12902.69,12666.08,12825.02,3420570000,12825.02 77 | 12825.02,12870.86,12604.53,12720.23,3821900000,12720.23 78 | 12721.45,12883.8,12627,12763.22,4103610000,12763.22 79 | 12764.68,12979.88,12651.51,12848.95,4461660000,12848.95 80 | 12848.38,12987.29,12703.7,12891.86,3891150000,12891.86 81 | 12890.76,13015.62,12791.55,12871.75,3607000000,12871.75 82 | 12870.37,12970.27,12737.82,12831.94,3815320000,12831.94 83 | 12831.45,13052.91,12746.45,12820.13,4508890000,12820.13 84 | 12818.34,13079.94,12721.94,13010,4448780000,13010 85 | 13012.53,13191.49,12931.35,13058.2,3953030000,13058.2 86 | 13056.57,13105.75,12896.5,12969.54,3410090000,12969.54 87 | 12968.89,13071.07,12817.53,13020.83,3924100000,13020.83 88 | 13010.82,13097.77,12756.14,12814.35,4075860000,12814.35 89 | 12814.84,12965.95,12727.56,12866.78,3827550000,12866.78 90 | 12860.68,12871.75,12648.09,12745.88,3518620000,12745.88 91 | 
12768.38,12903.33,12746.36,12876.05,3370630000,12876.05 92 | 12872.08,12957.65,12716.16,12832.18,4018590000,12832.18 93 | 12825.12,13037.44,12806.21,12898.38,3979370000,12898.38 94 | 12891.29,13028.16,12798.39,12992.66,3836480000,12992.66 95 | 12992.74,13069.52,12860.6,12986.8,3842590000,12986.8 96 | 12985.41,13170.97,12899.19,13028.16,3683970000,13028.16 97 | 13026.04,13026.04,12742.29,12828.68,3854320000,12828.68 98 | 12824.94,12926.71,12550.39,12601.19,4517990000,12601.19 99 | 12597.69,12743.68,12515.78,12625.62,3955960000,12625.62 100 | 12620.9,12637.43,12420.2,12479.63,3516380000,12479.63 101 | 12479.63,12626.84,12397.56,12548.35,3588860000,12548.35 102 | 12542.9,12693.77,12437.38,12594.03,3927240000,12594.03 103 | 12593.87,12760.21,12493.47,12646.22,3894440000,12646.22 104 | 12647.36,12750.84,12555.6,12638.32,3845630000,12638.32 105 | 12637.67,12645.4,12385.76,12503.82,3714320000,12503.82 106 | 12503.2,12620.98,12317.61,12402.85,4396380000,12402.85 107 | 12391.86,12540.37,12283.74,12390.48,4338640000,12390.48 108 | 12388.81,12652.81,12358.07,12604.45,4350790000,12604.45 109 | 12602.74,12602.74,12180.5,12209.81,4771660000,12209.81 110 | 12210.13,12406.36,12102.5,12280.32,4404570000,12280.32 111 | 12277.71,12425.98,12116.58,12289.76,4635070000,12289.76 112 | 12286.34,12317.2,12029.46,12083.77,4779980000,12083.77 113 | 12089.63,12337.72,12041.43,12141.58,4734240000,12141.58 114 | 12144.59,12376.72,12096.23,12307.35,4080420000,12307.35 115 | 12306.86,12381.44,12139.79,12269.08,3706940000,12269.08 116 | 12269.65,12378.67,12114.14,12160.3,3801960000,12160.3 117 | 12158.68,12212.33,11947.07,12029.06,4573570000,12029.06 118 | 12022.54,12188.31,11881.03,12063.09,4811670000,12063.09 119 | 12062.19,12078.23,11785.04,11842.69,5324900000,11842.69 120 | 11843.83,11986.94,11731.06,11842.36,4186370000,11842.36 121 | 11842.36,11962.37,11668.53,11807.43,4705050000,11807.43 122 | 11805.31,12008.7,11683.75,11811.83,4825640000,11811.83 123 | 
11808.57,11808.57,11431.92,11453.42,5231280000,11453.42 124 | 11452.85,11556.33,11248.48,11346.51,6208260000,11346.51 125 | 11345.7,11504.55,11226.34,11350.01,5032330000,11350.01 126 | 11344.64,11465.79,11106.65,11382.26,5846290000,11382.26 127 | 11382.34,11510.41,11180.58,11215.51,5276090000,11215.51 128 | 11297.33,11336.49,11158.02,11288.53,3247590000,11288.53 129 | 11289.19,11477.52,11094.44,11231.96,5265420000,11231.96 130 | 11225.03,11459.52,11101.19,11384.21,6034110000,11384.21 131 | 11381.93,11505.12,11115.61,11147.44,5181000000,11147.44 132 | 11148.01,11351.24,11006.01,11229.02,5840430000,11229.02 133 | 11226.17,11292.04,10908.64,11100.54,6742200000,11100.54 134 | 11103.64,11299.7,10972.63,11055.19,5434860000,11055.19 135 | 11050.8,11201.67,10731.96,10962.54,7363640000,10962.54 136 | 10961.89,11308.41,10831.61,11239.28,6738630400,11239.28 137 | 11238.39,11538.5,11118.46,11446.66,7365209600,11446.66 138 | 11436.56,11599.57,11290.5,11496.57,5653280000,11496.57 139 | 11495.02,11663.4,11339.02,11467.34,4630640000,11467.34 140 | 11457.9,11692.79,11273.32,11602.5,6180230000,11602.5 141 | 11603.39,11820.21,11410.02,11632.38,6705830000,11632.38 142 | 11630.34,11714.21,11288.79,11349.28,6127980000,11349.28 143 | 11341.14,11540.78,11252.47,11370.69,4672560000,11370.69 144 | 11369.47,11439.25,11094.76,11131.08,4282960000,11131.08 145 | 11133.44,11444.05,11086.13,11397.56,5414240000,11397.56 146 | 11397.56,11681.47,11328.68,11583.69,5631330000,11583.69 147 | 11577.99,11631.16,11317.69,11378.02,5346050000,11378.02 148 | 11379.89,11512.61,11205.41,11326.32,4684870000,11326.32 149 | 11326.32,11449.67,11144.59,11284.15,4562280000,11284.15 150 | 11286.02,11652.24,11286.02,11615.77,1219310000,11615.77 151 | 11603.64,11745.71,11454.64,11656.07,4873420000,11656.07 152 | 11655.42,11680.5,11355.63,11431.43,5319380000,11431.43 153 | 11432.09,11808.49,11344.23,11734.32,4966810000,11734.32 154 | 11729.67,11933.55,11580.19,11782.35,5067310000,11782.35 155 | 
11781.7,11830.39,11541.43,11642.47,4711290000,11642.47 156 | 11632.81,11689.05,11377.37,11532.96,4787600000,11532.96 157 | 11532.07,11744.33,11399.84,11615.93,4064000000,11615.93 158 | 11611.21,11776.41,11540.05,11659.9,4041820000,11659.9 159 | 11659.65,11744.49,11410.18,11479.39,3829290000,11479.39 160 | 11478.09,11501.45,11260.53,11348.55,4159760000,11348.55 161 | 11345.94,11511.06,11240.18,11417.43,4555030000,11417.43 162 | 11415.23,11501.29,11263.63,11430.21,4032590000,11430.21 163 | 11426.79,11684,11426.79,11628.06,3741070000,11628.06 164 | 11626.19,11626.19,11336.82,11386.25,3420600000,11386.25 165 | 11383.56,11483.62,11284.47,11412.87,3587570000,11412.87 166 | 11412.46,11575.14,11349.69,11502.51,3499610000,11502.51 167 | 11499.87,11756.46,11493.72,11715.18,3854280000,11715.18 168 | 11713.23,11730.49,11508.78,11543.55,3288120000,11543.55 169 | -------------------------------------------------------------------------------- /exercises/dow_selection/dow_selection.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. All Rights Reserved 2 | """ 3 | Dow Selection 4 | ------------- 5 | 6 | Topics: Boolean array operators, sum function, where function, plotting. 7 | 8 | The array 'dow' is a 2-D array with each row holding the 9 | daily performance of the Dow Jones Industrial Average from the 10 | beginning of 2008 (dates have been removed for exercise simplicity). 11 | The array has the following structure:: 12 | 13 | OPEN HIGH LOW CLOSE VOLUME ADJ_CLOSE 14 | 13261.82 13338.23 12969.42 13043.96 3452650000 13043.96 15 | 13044.12 13197.43 12968.44 13056.72 3429500000 13056.72 16 | 13046.56 13049.65 12740.51 12800.18 4166000000 12800.18 17 | 12801.15 12984.95 12640.44 12827.49 4221260000 12827.49 18 | 12820.9 12998.11 12511.03 12589.07 4705390000 12589.07 19 | 12590.21 12814.97 12431.53 12735.31 5351030000 12735.31 20 | 21 | 0. The data has been loaded from a .csv file for you. 22 | 1. 
Create a "mask" array that indicates which rows have a volume 23 | greater than 5.5 billion. 24 | 2. How many are there? (hint: use sum). 25 | 3. Find the index of every row (or day) where the volume is greater 26 | than 5.5 billion. hint: look at the where() command. 27 | 28 | Bonus 29 | ~~~~~ 30 | 31 | 1. Plot the adjusted close for *every* day in 2008. 32 | 2. Now over-plot this plot with a 'red dot' marker for every 33 | day where the volume was greater than 5.5 billion. 34 | 35 | See :ref:`dow-selection-solution`. 36 | """ 37 | 38 | from numpy import loadtxt, sum, where 39 | from matplotlib.pyplot import figure, plot, show 40 | 41 | # Constants that indicate what data is held in each column of 42 | # the 'dow' array. 43 | OPEN = 0 44 | HIGH = 1 45 | LOW = 2 46 | CLOSE = 3 47 | VOLUME = 4 48 | ADJ_CLOSE = 5 49 | 50 | # 0. The data has been loaded from a .csv file for you. 51 | 52 | # 'dow' is our NumPy array that we will manipulate. 53 | dow = loadtxt('dow.csv', delimiter=',') 54 | 55 | # 1. Create a "mask" array that indicates which rows have a volume 56 | # greater than 5.5 billion. 57 | 58 | 59 | # 2. How many are there? (hint: use sum). 60 | 61 | # 3. Find the index of every row (or day) where the volume is greater 62 | # than 5.5 billion. hint: look at the where() command. 63 | 64 | # BONUS: 65 | # a. Plot the adjusted close for EVERY day in 2008. 66 | # b. Now over-plot this plot with a 'red dot' marker for every 67 | # day where the volume was greater than 5.5 billion. 68 | -------------------------------------------------------------------------------- /exercises/dow_selection/dow_selection_solution.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. All Rights Reserved 2 | """ 3 | 4 | Topics: Boolean array operators, sum function, where function, plotting. 
5 | 6 | The array 'dow' is a 2-D array with each row holding the 7 | daily performance of the Dow Jones Industrial Average from the 8 | beginning of 2008 (dates have been removed for exercise simplicity). 9 | The array has the following structure:: 10 | 11 | OPEN HIGH LOW CLOSE VOLUME ADJ_CLOSE 12 | 13261.82 13338.23 12969.42 13043.96 3452650000 13043.96 13 | 13044.12 13197.43 12968.44 13056.72 3429500000 13056.72 14 | 13046.56 13049.65 12740.51 12800.18 4166000000 12800.18 15 | 12801.15 12984.95 12640.44 12827.49 4221260000 12827.49 16 | 12820.9 12998.11 12511.03 12589.07 4705390000 12589.07 17 | 12590.21 12814.97 12431.53 12735.31 5351030000 12735.31 18 | 19 | 0. The data has been loaded from a .csv file for you. 20 | 1. Create a "mask" array that indicates which rows have a volume 21 | greater than 5.5 billion. 22 | 2. How many are there? (hint: use sum). 23 | 3. Find the index of every row (or day) where the volume is greater 24 | than 5.5 billion. hint: look at the where() command. 25 | 26 | Bonus 27 | ~~~~~ 28 | 29 | 1. Plot the adjusted close for *every* day in 2008. 30 | 2. Now over-plot this plot with a 'red dot' marker for every 31 | day where the volume was greater than 5.5 billion. 32 | 33 | """ 34 | 35 | from numpy import loadtxt, sum, where 36 | from matplotlib.pyplot import figure, plot, show 37 | 38 | # Constants that indicate what data is held in each column of 39 | # the 'dow' array. 40 | OPEN = 0 41 | HIGH = 1 42 | LOW = 2 43 | CLOSE = 3 44 | VOLUME = 4 45 | ADJ_CLOSE = 5 46 | 47 | # 0. The data has been loaded from a csv file for you. 48 | 49 | # 'dow' is our NumPy array that we will manipulate. 50 | dow = loadtxt('dow.csv', delimiter=',') 51 | 52 | 53 | # 1. Create a "mask" array that indicates which rows have a volume 54 | # greater than 5.5 billion. 55 | high_volume_mask = dow[:, VOLUME] > 5.5e9 56 | 57 | # 2. How many are there? (hint: use sum). 
58 | high_volume_days = sum(high_volume_mask) 59 | print("The dow volume has been above 5.5 billion on" 60 | " %d days this year." % high_volume_days) 61 | 62 | # 3. Find the index of every row (or day) where the volume is greater 63 | # than 5.5 billion. hint: look at the where() command. 64 | high_vol_index = where(high_volume_mask)[0] 65 | 66 | # BONUS: 67 | # 1. Plot the adjusted close for EVERY day in 2008. 68 | # 2. Now over-plot this plot with a 'red dot' marker for every 69 | # day where the volume was greater than 5.5 billion. 70 | 71 | # Create a new plot. 72 | figure() 73 | 74 | # Plot the adjusted close for every day of the year as a blue line. 75 | # In the format string 'b-', 'b' means blue and '-' indicates a line. 76 | plot(dow[:, ADJ_CLOSE], 'b-') 77 | 78 | # Plot the days where the volume was high with red dots... 79 | plot(high_vol_index, dow[high_vol_index, ADJ_CLOSE], 'ro') 80 | 81 | # Scripts must call the plot "show" command to display the plot 82 | # to the screen. 83 | show() 84 | -------------------------------------------------------------------------------- /exercises/load_text/complex_data_file.txt: -------------------------------------------------------------------------------- 1 | -- THIS IS THE BEGINNING OF THE FILE -- 2 | % This is a more complex file to read! 3 | 4 | % Day, Month, Year, Useless Col, Avg Power 5 | 01, 01, 2000, ad766, 30 6 | 02, 01, 2000, t873, 41 7 | % we don't have Jan 03rd! 
8 | 04, 01, 2000, r441, 55 9 | 05, 01, 2000, s345, 78 10 | 06, 01, 2000, x273, 134 % that day was crazy 11 | 07, 01, 2000, x355, 42 12 | 13 | %-- THIS IS THE END OF THE FILE -- 14 | -------------------------------------------------------------------------------- /exercises/load_text/float_data.txt: -------------------------------------------------------------------------------- 1 | 1 2 3 4 2 | 5 6 7 8 -------------------------------------------------------------------------------- /exercises/load_text/float_data_with_header.txt: -------------------------------------------------------------------------------- 1 | c1 c2 c3 c4 2 | 1 2 3 4 3 | 5 6 7 8 -------------------------------------------------------------------------------- /exercises/load_text/load_text.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. All Rights Reserved 2 | """ 3 | Load Array from Text File 4 | ------------------------- 5 | 6 | 0. From the IPython prompt, type:: 7 | 8 | In [1]: loadtxt? 9 | 10 | to see the options on how to use the loadtxt command. 11 | 12 | 13 | 1. Use loadtxt to load in a 2D array of floating point values from 14 | 'float_data.txt'. The data in the file looks like:: 15 | 16 | 1 2 3 4 17 | 5 6 7 8 18 | 19 | The resulting data should be a 2x4 array of floating point values. 20 | 21 | 2. In the second example, the file 'float_data_with_header.txt' has 22 | strings as column names in the first row:: 23 | 24 | c1 c2 c3 c4 25 | 1 2 3 4 26 | 5 6 7 8 27 | 28 | Ignore these column names, and read the remainder of the data into 29 | a 2D array. 30 | 31 | Later on, we'll learn how to create a "structured array" using 32 | these column names to create fields within an array. 33 | 34 | Bonus 35 | ~~~~~ 36 | 37 | 3. A third example is more involved (the file is called 38 | 'complex_data_file.txt'). 
It contains comments in multiple 39 | locations, uses multiple formats, and includes a useless column to 40 | skip:: 41 | 42 | -- THIS IS THE BEGINNING OF THE FILE -- 43 | % This is a more complex file to read! 44 | 45 | % Day, Month, Year, Useless Col, Avg Power 46 | 01, 01, 2000, ad766, 30 47 | 02, 01, 2000, t873, 41 48 | % we don't have Jan 03rd! 49 | 04, 01, 2000, r441, 55 50 | 05, 01, 2000, s345, 78 51 | 06, 01, 2000, x273, 134 % that day was crazy 52 | 07, 01, 2000, x355, 42 53 | 54 | %-- THIS IS THE END OF THE FILE -- 55 | 56 | 57 | See :ref:`load-text-solution` 58 | """ 59 | 60 | from numpy import loadtxt 61 | -------------------------------------------------------------------------------- /exercises/load_text/load_text_solution.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. All Rights Reserved 2 | """ 3 | Load Array from Text File 4 | ------------------------- 5 | 6 | 0. From the IPython prompt, type:: 7 | 8 | In [1]: loadtxt? 9 | 10 | to see the options on how to use the loadtxt command. 11 | 12 | 13 | 1. Use loadtxt to load in a 2D array of floating point values from 14 | 'float_data.txt'. The data in the file looks like:: 15 | 16 | 1 2 3 4 17 | 5 6 7 8 18 | 19 | The resulting data should be a 2x4 array of floating point values. 20 | 21 | 2. In the second example, the file 'float_data_with_header.txt' has 22 | strings as column names in the first row:: 23 | 24 | c1 c2 c3 c4 25 | 1 2 3 4 26 | 5 6 7 8 27 | 28 | Ignore these column names, and read the remainder of the data into 29 | a 2D array. 30 | 31 | Later on, we'll learn how to create a "structured array" using 32 | these column names to create fields within an array. 33 | 34 | Bonus 35 | ~~~~~ 36 | 37 | 3. A third example is more involved. 
It contains comments in multiple 38 | locations, uses multiple formats, and includes a useless column to 39 | skip:: 40 | 41 | -- THIS IS THE BEGINNING OF THE FILE -- 42 | % This is a more complex file to read! 43 | 44 | % Day, Month, Year, Useless Col, Avg Power 45 | 01, 01, 2000, ad766, 30 46 | 02, 01, 2000, t873, 41 47 | % we don't have Jan 03rd! 48 | 04, 01, 2000, r441, 55 49 | 05, 01, 2000, s345, 78 50 | 06, 01, 2000, x273, 134 % that day was crazy 51 | 07, 01, 2000, x355, 42 52 | 53 | %-- THIS IS THE END OF THE FILE -- 54 | """ 55 | 56 | from numpy import loadtxt 57 | 58 | ############################################################################# 59 | # 1. Simple example loading a 2x4 array of floats from a file. 60 | ############################################################################# 61 | ary1 = loadtxt('float_data.txt') 62 | 63 | print('example 1:') 64 | print(ary1) 65 | 66 | 67 | ############################################################################# 68 | # 2. Same example, but skipping the first row of column headers 69 | ############################################################################# 70 | ary2 = loadtxt('float_data_with_header.txt', skiprows=1) 71 | 72 | print('example 2:') 73 | print(ary2) 74 | 75 | ############################################################################# 76 | # 3. 
More complex example with comments and columns to skip 77 | ############################################################################# 78 | ary3 = loadtxt("complex_data_file.txt", delimiter=",", comments="%", 79 | usecols=(0, 1, 2, 4), dtype=int, skiprows=1) 80 | 81 | print('example 3:') 82 | print(ary3) 83 | -------------------------------------------------------------------------------- /exercises/plotting/dc_metro.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/enthought/Numpy-Tutorial-SciPyConf-2016/8e9e8cbb57f8976a4572800fe808aedc74775a81/exercises/plotting/dc_metro.JPG -------------------------------------------------------------------------------- /exercises/plotting/my_plots.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/enthought/Numpy-Tutorial-SciPyConf-2016/8e9e8cbb57f8976a4572800fe808aedc74775a81/exercises/plotting/my_plots.png -------------------------------------------------------------------------------- /exercises/plotting/plotting.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. All Rights Reserved 2 | """ 3 | Plotting 4 | -------- 5 | 6 | In PyLab, create a plot display that looks like the following: 7 | 8 | .. image:: plotting/sample_plots.png 9 | 10 | `Photo credit: David Fettig 11 | `_ 12 | 13 | 14 | This is a 2x2 layout, with 3 slots occupied. 15 | 16 | 1. Sine function, with blue solid line; cosine with red '+' markers; the 17 | extents fit the plot exactly. Hint: see the axis() function for setting the 18 | extents. 19 | 2. Sine function, with gridlines, axis labels, and title; the extents fit the 20 | plot exactly. 21 | 3. Image with color map; the extents run from -10 to 10, rather than the 22 | default. 23 | 24 | Save the resulting plot image to a file. (Use a different file name, so you 25 | don't overwrite the sample.) 
26 | 27 | The color map in the example is 'winter'; use 'cm.' to list the available 28 | ones, and experiment to find one you like. 29 | 30 | Start with the following statements:: 31 | 32 | from matplotlib.pyplot import imread 33 | 34 | x = linspace(0, 2*pi, 101) 35 | s = sin(x) 36 | c = cos(x) 37 | 38 | img = imread('dc_metro.JPG') 39 | 40 | Tip: If you find that the label of one plot overlaps another plot, try adding 41 | a call to `tight_layout()` to your script. 42 | 43 | Bonus 44 | ~~~~~ 45 | 46 | 4. The `subplot()` function returns an axes object, which can be assigned to 47 | the `sharex` and `sharey` keyword arguments of another subplot() function 48 | call. E.g.:: 49 | 50 | ax1 = subplot(2,2,1) 51 | ... 52 | subplot(2,2,2, sharex=ax1, sharey=ax1) 53 | 54 | Make this modification to your script, and explore the consequences. 55 | Hint: try panning and zooming in the subplots. 56 | 57 | See :ref:`plotting-solution`. 58 | """ 59 | 60 | 61 | # The following imports are *not* needed in PyLab, but are needed in this file. 62 | from numpy import linspace, pi, sin, cos 63 | from matplotlib.pyplot import (plot, subplot, cm, imread, imshow, xlabel, 64 | ylabel, title, grid, axis, show, savefig, gcf, 65 | figure, close, tight_layout) 66 | 67 | x = linspace(0, 2 * pi, 101) 68 | s = sin(x) 69 | c = cos(x) 70 | 71 | img = imread('dc_metro.JPG') 72 | -------------------------------------------------------------------------------- /exercises/plotting/plotting_bonus_solution.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. All Rights Reserved 2 | """ 3 | Plotting 4 | -------- 5 | 6 | In PyLab, create a plot display that looks like the following: 7 | 8 | .. image:: plotting/sample_plots.png 9 | 10 | `Photo credit: David Fettig 11 | `_ 12 | 13 | 14 | This is a 2x2 layout, with 3 slots occupied. 15 | 16 | 1.
Sine function, with blue solid line; cosine with red '+' markers; the 17 | extents fit the plot exactly. Hint: see the axis() function for setting the 18 | extents. 19 | 2. Sine function, with gridlines, axis labels, and title; the extents fit the 20 | plot exactly. 21 | 3. Image with color map; the extents run from -10 to 10, rather than the 22 | default. 23 | 24 | Save the resulting plot image to a file. (Use a different file name, so you 25 | don't overwrite the sample.) 26 | 27 | The color map in the example is 'winter'; use 'cm.' to list the available 28 | ones, and experiment to find one you like. 29 | 30 | Start with the following statements:: 31 | 32 | from matplotlib.pyplot import imread 33 | 34 | x = linspace(0, 2*pi, 101) 35 | s = sin(x) 36 | c = cos(x) 37 | 38 | img = imread('dc_metro.JPG') 39 | 40 | Tip: If you find that the label of one plot overlaps another plot, try adding 41 | a call to `tight_layout()` to your script. 42 | 43 | Bonus 44 | ~~~~~ 45 | 46 | 4. The `subplot()` function returns an axes object, which can be assigned to 47 | the `sharex` and `sharey` keyword arguments of another subplot() function 48 | call. E.g.:: 49 | 50 | ax1 = subplot(2,2,1) 51 | ... 52 | subplot(2,2,2, sharex=ax1, sharey=ax1) 53 | 54 | Make this modification to your script, and explore the consequences. 55 | Hint: try panning and zooming in the subplots. 56 | 57 | """ 58 | 59 | 60 | # The following imports are *not* needed in PyLab, but are needed in this file.
61 | from numpy import linspace, pi, sin, cos 62 | from matplotlib.pyplot import (plot, subplot, cm, imread, imshow, xlabel, 63 | ylabel, title, grid, axis, show, savefig, gcf, 64 | figure, close, tight_layout) 65 | 66 | x = linspace(0, 2 * pi, 101) 67 | s = sin(x) 68 | c = cos(x) 69 | 70 | img = imread('dc_metro.JPG') 71 | 72 | close('all') 73 | # 2x2 layout, first plot: sin and cos 74 | ax1 = subplot(2, 2, 1) 75 | plot(x, s, 'b-', x, c, 'r+') 76 | axis('tight') 77 | 78 | # 2nd plot: gridlines, labels 79 | subplot(2, 2, 2, sharex=ax1, sharey=ax1) 80 | plot(x, s) 81 | grid() 82 | xlabel('radians') 83 | ylabel('amplitude') 84 | title('sin(x)') 85 | axis('tight') 86 | 87 | # 3rd plot, image 88 | subplot(2, 2, 3) 89 | imshow(img, extent=[-10, 10, -10, 10], cmap=cm.winter) 90 | 91 | tight_layout() 92 | 93 | # Save before show(): in a script, show() blocks and the figure may be gone afterwards. 94 | savefig('my_plots.png') 95 | 96 | show() 97 | -------------------------------------------------------------------------------- /exercises/plotting/plotting_solution.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. All Rights Reserved 2 | """ 3 | Plotting 4 | -------- 5 | 6 | In PyLab, create a plot display that looks like the following: 7 | 8 | .. image:: plotting/sample_plots.png 9 | 10 | `Photo credit: David Fettig 11 | `_ 12 | 13 | 14 | This is a 2x2 layout, with 3 slots occupied. 15 | 16 | 1. Sine function, with blue solid line; cosine with red '+' markers; the 17 | extents fit the plot exactly. Hint: see the axis() function for setting the 18 | extents. 19 | 2. Sine function, with gridlines, axis labels, and title; the extents fit the 20 | plot exactly. 21 | 3. Image with color map; the extents run from -10 to 10, rather than the 22 | default. 23 | 24 | Save the resulting plot image to a file. (Use a different file name, so you 25 | don't overwrite the sample.) 26 | 27 | The color map in the example is 'winter'; use 'cm.'
to list the available 28 | ones, and experiment to find one you like. 29 | 30 | Start with the following statements:: 31 | 32 | from matplotlib.pyplot import imread 33 | 34 | x = linspace(0, 2*pi, 101) 35 | s = sin(x) 36 | c = cos(x) 37 | 38 | img = imread('dc_metro.JPG') 39 | 40 | Tip: If you find that the label of one plot overlaps another plot, try adding 41 | a call to `tight_layout()` to your script. 42 | 43 | Bonus 44 | ~~~~~ 45 | 46 | 4. The `subplot()` function returns an axes object, which can be assigned to 47 | the `sharex` and `sharey` keyword arguments of another subplot() function 48 | call. E.g.:: 49 | 50 | ax1 = subplot(2,2,1) 51 | ... 52 | subplot(2,2,2, sharex=ax1, sharey=ax1) 53 | 54 | Make this modification to your script, and explore the consequences. 55 | Hint: try panning and zooming in the subplots. 56 | 57 | """ 58 | 59 | 60 | # The following imports are *not* needed in PyLab, but are needed in this file. 61 | from numpy import linspace, pi, sin, cos 62 | from matplotlib.pyplot import (plot, subplot, cm, imread, imshow, xlabel, 63 | ylabel, title, grid, axis, show, savefig, gcf, 64 | figure, close, tight_layout) 65 | 66 | x = linspace(0, 2*pi, 101) 67 | s = sin(x) 68 | c = cos(x) 69 | 70 | img = imread('dc_metro.JPG') 71 | 72 | close('all') 73 | # 2x2 layout, first plot: sin and cos 74 | subplot(2, 2, 1) 75 | plot(x, s, 'b-', x, c, 'r+') 76 | axis('tight') 77 | 78 | # 2nd plot: gridlines, labels 79 | subplot(2, 2, 2) 80 | plot(x, s) 81 | grid() 82 | xlabel('radians') 83 | ylabel('amplitude') 84 | title('sin(x)') 85 | axis('tight') 86 | 87 | # 3rd plot, image 88 | subplot(2, 2, 3) 89 | imshow(img, extent=[-10, 10, -10, 10], cmap=cm.winter) 90 | 91 | tight_layout() 92 | 93 | # Save before show(): in a script, show() blocks and the figure may be gone afterwards. 94 | savefig('my_plots.png') 95 | 96 | show() 97 | -------------------------------------------------------------------------------- /exercises/plotting/sample_plots.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/enthought/Numpy-Tutorial-SciPyConf-2016/8e9e8cbb57f8976a4572800fe808aedc74775a81/exercises/plotting/sample_plots.png -------------------------------------------------------------------------------- /exercises/structured_array/structured_array.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. All Rights Reserved 2 | """ 3 | Structured Array 4 | ---------------- 5 | 6 | In this exercise you will read columns of data into a structured array using 7 | loadtxt and combine that array to a regular array to analyze the data and learn 8 | how the pressure velocity evolves as a function of the shear velocity in sound 9 | waves in the Earth. 10 | 11 | 1. The data in 'short_logs.crv' has the following format:: 12 | 13 | DEPTH CALI S-SONIC ... 14 | 8744.5000 -999.2500 -999.2500 ... 15 | 8745.0000 -999.2500 -999.2500 ... 16 | 8745.5000 -999.2500 -999.2500 ... 17 | 18 | Here the first row defines a set of names for the columns 19 | of data in the file. Use these column names to define a 20 | dtype for a structured array that will have fields 'DEPTH', 21 | 'CALI', etc. Assume all the data is of the float64 data 22 | format. 23 | 24 | 2. Use the 'loadtxt' method from numpy to read the data from 25 | the file into a structured array with the dtype created 26 | in (1). Name this array 'logs' 27 | 28 | 3. The 'logs' array is nice for retrieving columns from the data. 29 | For example, logs['DEPTH'] returns the values from the DEPTH 30 | column of the data. For row-based or array-wide operations, 31 | it is more convenient to have a 2D view into the data, as if it 32 | is a simple 2D array of float64 values. 33 | 34 | Create a 2D array called 'logs_2d' using the view operation. 35 | Be sure the 2D array has the same number of columns as in the 36 | data file. 37 | 38 | 4. -999.25 is a "special" value in this data set. It is 39 | intended to represent missing data. 
Replace all of these 40 | values with NaNs. Is this easier with the 'logs' array 41 | or the 'logs_2d' array? 42 | 43 | 5. Create a mask for all the "complete" rows in the array. 44 | A complete row is one that doesn't have any NaN values measured 45 | in that row. 46 | 47 | HINT: The ``all`` function is also useful here. 48 | 49 | 6. Plot the VP vs VS logs for the "complete" rows. 50 | 51 | See :ref:`structured-array-solution`. 52 | """ 53 | from numpy import dtype, loadtxt, float64, NaN, isfinite, all 54 | from matplotlib.pyplot import plot, show, xlabel, ylabel 55 | 56 | # Open the file. 57 | log_file = open('short_logs.crv') 58 | 59 | # The first line is a header that has all the log names. 60 | header = log_file.readline() 61 | log_names = header.split() 62 | -------------------------------------------------------------------------------- /exercises/structured_array/structured_array_solution.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. All Rights Reserved 2 | """ 3 | Structured Array 4 | ---------------- 5 | 6 | In this exercise you will read columns of data into a structured array using 7 | loadtxt and combine that array to a regular array to analyze the data and learn 8 | how the pressure velocity evolves as a function of the shear velocity in sound 9 | waves in the Earth. 10 | 11 | 1. The data in 'short_logs.crv' has the following format:: 12 | 13 | DEPTH CALI S-SONIC ... 14 | 8744.5000 -999.2500 -999.2500 ... 15 | 8745.0000 -999.2500 -999.2500 ... 16 | 8745.5000 -999.2500 -999.2500 ... 17 | 18 | Here the first row defines a set of names for the columns 19 | of data in the file. Use these column names to define a 20 | dtype for a structured array that will have fields 'DEPTH', 21 | 'CALI', etc. Assume all the data is of the float64 data 22 | format. 23 | 24 | 2. 
Use the 'loadtxt' method from numpy to read the data from 25 | the file into a structured array with the dtype created 26 | in (1). Name this array 'logs' 27 | 28 | 3. The 'logs' array is nice for retrieving columns from the data. 29 | For example, logs['DEPTH'] returns the values from the DEPTH 30 | column of the data. For row-based or array-wide operations, 31 | it is more convenient to have a 2D view into the data, as if it 32 | is a simple 2D array of float64 values. 33 | 34 | Create a 2D array called 'logs_2d' using the view operation. 35 | Be sure the 2D array has the same number of columns as in the 36 | data file. 37 | 38 | 4. -999.25 is a "special" value in this data set. It is 39 | intended to represent missing data. Replace all of these 40 | values with NaNs. Is this easier with the 'logs' array 41 | or the 'logs_2d' array? 42 | 43 | 5. Create a mask for all the "complete" rows in the array. 44 | A complete row is one that doesn't have any NaN values measured 45 | in that row. 46 | 47 | HINT: The ``all`` function is also useful here. 48 | 49 | 6. Plot the VP vs VS logs for the "complete" rows. 50 | """ 51 | from numpy import dtype, loadtxt, float64, NaN, isfinite, all 52 | from matplotlib.pyplot import plot, show, xlabel, ylabel 53 | 54 | # Open the file. 55 | log_file = open('short_logs.crv') 56 | 57 | # 1. Create a dtype from the names in the file header. 58 | header = log_file.readline() 59 | log_names = header.split() 60 | 61 | # Construct the array "dtype" that describes the data. All fields are 8 byte 62 | # (64 bit) floating point. list() is needed because zip() is lazy in Python 3. 63 | fields = list(zip(log_names, ['f8']*len(log_names))) 64 | fields_dtype = dtype(fields) 65 | 66 | # 2. Use loadtxt to load the data into a structured array. 67 | logs = loadtxt(log_file, dtype=fields_dtype) 68 | 69 | # 3. Make a 2D, float64 view of the data. 70 | # The -1 value for the row shape means that numpy should 71 | # make this dimension whatever it needs to be so that 72 | # rows*cols = size for the array.
73 | values = logs.view(float64) 74 | values.shape = -1, len(log_names) 75 | 76 | # 4. Replace any values that are -999.25 with NaNs. 77 | values[values == -999.25] = NaN 78 | 79 | # 5. Make a mask for all the rows that don't have any missing values. 80 | # Pull out these samples from the logs array into a separate array. 81 | data_mask = all(isfinite(values), axis=-1) 82 | good_logs = logs[data_mask] 83 | 84 | 85 | # 6. Plot VP vs. VS for the "complete" rows. 86 | plot(good_logs['VS'], good_logs['VP'], 'o') 87 | xlabel('VS') 88 | ylabel('VP') 89 | show() 90 | -------------------------------------------------------------------------------- /exercises/wind_statistics/wind.desc: -------------------------------------------------------------------------------- 1 | wind daily average wind speeds for 1961-1978 at 12 synoptic meteorological 2 | stations in the Republic of Ireland (Haslett and Raftery 1989). 3 | 4 | These data were analyzed in detail in the following article: 5 | Haslett, J. and Raftery, A. E. (1989). Space-time Modelling with 6 | Long-memory Dependence: Assessing Ireland's Wind Power Resource 7 | (with Discussion). Applied Statistics 38, 1-50. 8 | 9 | Each line corresponds to one day of data in the following format: 10 | year, month, day, average wind speed at each of the stations in the order given 11 | in Fig.4 of Haslett and Raftery : 12 | RPT, VAL, ROS, KIL, SHA, BIR, DUB, CLA, MUL, CLO, BEL, MAL 13 | 14 | Fortran format : ( i2, 2i3, 12f6.2) 15 | 16 | The data are in knots, not in m/s. 17 | 18 | Permission granted for unlimited distribution. 19 | 20 | Please report all anomalies to fraley@stat.washington.edu 21 | 22 | Be aware that the dataset is 532494 bytes long (that's over half a 23 | megabyte). Please be sure you want the data before you request it.
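The record layout described above can be sketched in NumPy. This is an illustrative sketch rather than part of the original data description: the three sample rows are the ones quoted in the wind statistics exercise, the names `STATIONS` and `KNOTS_TO_MS` are made up here, and the conversion assumes the standard definition of the knot (1852 m per hour).

```python
import io
import numpy as np

# Station order as listed in the description above.
STATIONS = ["RPT", "VAL", "ROS", "KIL", "SHA", "BIR",
            "DUB", "CLA", "MUL", "CLO", "BEL", "MAL"]

# Assumption: 1 knot = 1852 m per hour.
KNOTS_TO_MS = 1852.0 / 3600.0

# Three sample records: year, month, day, then 12 station readings in knots.
sample = io.StringIO(
    "61 1 1 15.04 14.96 13.17 9.29 13.96 9.87 13.67 10.25 10.83 12.58 18.50 15.04\n"
    "61 1 2 14.71 16.88 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83\n"
    "61 1 3 18.50 16.88 12.33 10.13 11.17 6.17 11.25 8.04 8.50 7.67 12.75 12.71\n"
)

rows = np.loadtxt(sample)
dates = rows[:, :3].astype(int)   # year, month, day
knots = rows[:, 3:]               # one column per station

# Wind speed at DUB on each sample day, converted to m/s.
dub_ms = knots[:, STATIONS.index("DUB")] * KNOTS_TO_MS
print(dates)
print(dub_ms)
```

Looking a station up by name in `STATIONS` avoids hard-coding column numbers, which is easy to get wrong because the first three columns hold the date.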
24 | -------------------------------------------------------------------------------- /exercises/wind_statistics/wind_statistics.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. All Rights Reserved 2 | """ 3 | Wind Statistics 4 | ---------------- 5 | 6 | Topics: Using array methods over different axes, fancy indexing. 7 | 8 | 1. The data in 'wind.data' has the following format:: 9 | 10 | 61 1 1 15.04 14.96 13.17 9.29 13.96 9.87 13.67 10.25 10.83 12.58 18.50 15.04 11 | 61 1 2 14.71 16.88 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83 12 | 61 1 3 18.50 16.88 12.33 10.13 11.17 6.17 11.25 8.04 8.50 7.67 12.75 12.71 13 | 14 | The first three columns are year, month and day. The 15 | remaining 12 columns are average windspeeds in knots at 12 16 | locations in Ireland on that day. 17 | 18 | Use the 'loadtxt' function from numpy to read the data into 19 | an array. 20 | 21 | 2. Calculate the min, max and mean windspeeds and standard deviation of the 22 | windspeeds over all the locations and all the times (a single set of numbers 23 | for the entire dataset). 24 | 25 | 3. Calculate the min, max and mean windspeeds and standard deviations of the 26 | windspeeds at each location over all the days (a different set of numbers 27 | for each location) 28 | 29 | 4. Calculate the min, max and mean windspeed and standard deviations of the 30 | windspeeds across all the locations at each day (a different set of numbers 31 | for each day) 32 | 33 | 5. Find the location which has the greatest windspeed on each day (an integer 34 | column number for each day). 35 | 36 | 6. Find the year, month and day on which the greatest windspeed was recorded. 37 | 38 | 7. Find the average windspeed in January for each location. 39 | 40 | You should be able to perform all of these operations without using a for 41 | loop or other looping construct. 42 | 43 | Bonus 44 | ~~~~~ 45 | 46 | 1. 
Calculate the mean windspeed for each month in the dataset. Treat 47 | January 1961 and January 1962 as *different* months. (hint: first find a 48 | way to create an identifier unique for each month. The second step might 49 | require a for loop.) 50 | 51 | 2. Calculate the min, max and mean windspeeds and standard deviations of the 52 | windspeeds across all locations for each week (assume that the first week 53 | starts on January 1 1961) for the first 52 weeks. This can be done without 54 | any for loop. 55 | 56 | Bonus Bonus 57 | ~~~~~~~~~~~ 58 | 59 | Calculate the mean windspeed for each month without using a for loop. 60 | (Hint: look at `searchsorted` and `add.reduceat`.) 61 | 62 | Notes 63 | ~~~~~ 64 | 65 | These data were analyzed in detail in the following article: 66 | 67 | Haslett, J. and Raftery, A. E. (1989). Space-time Modelling with 68 | Long-memory Dependence: Assessing Ireland's Wind Power Resource 69 | (with Discussion). Applied Statistics 38, 1-50. 70 | 71 | 72 | See :ref:`wind-statistics-solution`. 73 | """ 74 | 75 | from numpy import loadtxt 76 | -------------------------------------------------------------------------------- /exercises/wind_statistics/wind_statistics_solution.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Enthought, Inc. All Rights Reserved 2 | """ 3 | Wind Statistics 4 | ---------------- 5 | 6 | Topics: Using array methods over different axes, fancy indexing. 7 | 8 | 1. The data in 'wind.data' has the following format:: 9 | 10 | 61 1 1 15.04 14.96 13.17 9.29 13.96 9.87 13.67 10.25 10.83 12.58 18.50 15.04 11 | 61 1 2 14.71 16.88 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83 12 | 61 1 3 18.50 16.88 12.33 10.13 11.17 6.17 11.25 8.04 8.50 7.67 12.75 12.71 13 | 14 | The first three columns are year, month and day. The 15 | remaining 12 columns are average windspeeds in knots at 12 16 | locations in Ireland on that day. 
17 | 18 | Use the 'loadtxt' function from numpy to read the data into 19 | an array. 20 | 21 | 2. Calculate the min, max and mean windspeeds and standard deviation of the 22 | windspeeds over all the locations and all the times (a single set of numbers 23 | for the entire dataset). 24 | 25 | 3. Calculate the min, max and mean windspeeds and standard deviations of the 26 | windspeeds at each location over all the days (a different set of numbers 27 | for each location) 28 | 29 | 4. Calculate the min, max and mean windspeed and standard deviations of the 30 | windspeeds across all the locations at each day (a different set of numbers 31 | for each day) 32 | 33 | 5. Find the location which has the greatest windspeed on each day (an integer 34 | column number for each day). 35 | 36 | 6. Find the year, month and day on which the greatest windspeed was recorded. 37 | 38 | 7. Find the average windspeed in January for each location. 39 | 40 | You should be able to perform all of these operations without using a for 41 | loop or other looping construct. 42 | 43 | Bonus 44 | ~~~~~ 45 | 46 | 1. Calculate the mean windspeed for each month in the dataset. Treat 47 | January 1961 and January 1962 as *different* months. 48 | 49 | 2. Calculate the min, max and mean windspeeds and standard deviations of the 50 | windspeeds across all locations for each week (assume that the first week 51 | starts on January 1 1961) for the first 52 weeks. 52 | 53 | Bonus Bonus 54 | ~~~~~~~~~~~ 55 | 56 | Calculate the mean windspeed for each month without using a for loop. 57 | (Hint: look at `searchsorted` and `add.reduceat`.) 58 | 59 | Notes 60 | ~~~~~ 61 | 62 | These data were analyzed in detail in the following article: 63 | 64 | Haslett, J. and Raftery, A. E. (1989). Space-time Modelling with 65 | Long-memory Dependence: Assessing Ireland's Wind Power Resource 66 | (with Discussion). Applied Statistics 38, 1-50. 
67 | 68 | """ 69 | from __future__ import print_function 70 | from numpy import loadtxt, arange, searchsorted, add, zeros 71 | 72 | wind_data = loadtxt('wind.data') 73 | 74 | data = wind_data[:, 3:] 75 | 76 | print('2. Statistics over all values') 77 | print(' min:', data.min()) 78 | print(' max:', data.max()) 79 | print(' mean:', data.mean()) 80 | print(' standard deviation:', data.std()) 81 | print() 82 | 83 | print('3. Statistics over all days at each location') 84 | print(' min:', data.min(axis=0)) 85 | print(' max:', data.max(axis=0)) 86 | print(' mean:', data.mean(axis=0)) 87 | print(' standard deviation:', data.std(axis=0)) 88 | print() 89 | 90 | print('4. Statistics over all locations for each day') 91 | print(' min:', data.min(axis=1)) 92 | print(' max:', data.max(axis=1)) 93 | print(' mean:', data.mean(axis=1)) 94 | print(' standard deviation:', data.std(axis=1)) 95 | print() 96 | 97 | print('5. Location of daily maximum') 98 | print(' daily max location:', data.argmax(axis=1)) 99 | print() 100 | 101 | daily_max = data.max(axis=1) 102 | max_row = daily_max.argmax() 103 | 104 | print('6. Day of maximum reading') 105 | print(' Year:', int(wind_data[max_row, 0])) 106 | print(' Month:', int(wind_data[max_row, 1])) 107 | print(' Day:', int(wind_data[max_row, 2])) 108 | print() 109 | 110 | january_indices = wind_data[:, 1] == 1 111 | january_data = data[january_indices] 112 | 113 | print('7.
Statistics for January') 114 | print(' mean:', january_data.mean(axis=0)) 115 | print() 116 | 117 | # Bonus 118 | 119 | # compute the integer month number for each day (used below as an array index) 120 | months = ((wind_data[:, 0] - 61) * 12 + wind_data[:, 1] - 1).astype(int) 121 | 122 | # get set of unique months 123 | month_values = set(months) 124 | 125 | # initialize an array to hold the result 126 | monthly_means = zeros(len(month_values)) 127 | 128 | for month in month_values: 129 | # find the rows that correspond to the current month 130 | day_indices = (months == month) 131 | 132 | # extract the data for the current month using fancy indexing 133 | month_data = data[day_indices] 134 | 135 | # find the mean 136 | monthly_means[month] = month_data.mean() 137 | 138 | # Note: experts might do this all in one 139 | # monthly_means[month] = data[months==month].mean() 140 | 141 | # In fact the whole for loop could reduce to the following one-liner 142 | # (which also needs `array` imported from numpy): 143 | # monthly_means = array([data[months==m].mean() for m in sorted(month_values)]) 144 | 145 | print("Bonus 1.") 146 | print(" mean:", monthly_means) 147 | print() 148 | 149 | # Bonus 2. 150 | # Extract the data for the first 52 weeks. Then reshape the array to put 151 | # on the same line 7 days' worth of data for all locations. Let NumPy 152 | # figure out the number of lines needed to do so 153 | weekly_data = data[:52 * 7].reshape(-1, 7 * 12) 154 | 155 | print('Bonus 2. Weekly statistics over all locations') 156 | print(' min:', weekly_data.min(axis=1)) 157 | print(' max:', weekly_data.max(axis=1)) 158 | print(' mean:', weekly_data.mean(axis=1)) 159 | print(' standard deviation:', weekly_data.std(axis=1)) 160 | print() 161 | 162 | # Bonus Bonus : this is really tricky...
163 | 164 | # compute the month number for each day in the dataset 165 | months = (wind_data[:, 0] - 61) * 12 + wind_data[:, 1] - 1 166 | 167 | # find the indices for the start of each month 168 | # this is a useful trick - we use range from 0 to the 169 | # number of months + 1 and searchsorted to find the insertion 170 | # points for each. 171 | month_indices = searchsorted(months, arange(months[-1] + 2)) 172 | 173 | # now use add.reduceat to get the sum at each location 174 | monthly_loc_totals = add.reduceat(data, month_indices[:-1]) 175 | 176 | # now use add to find the sum across all locations for each month 177 | monthly_totals = monthly_loc_totals.sum(axis=1) 178 | 179 | # now find total number of measurements for each month 180 | month_days = month_indices[1:] - month_indices[:-1] 181 | measurement_count = month_days * 12 182 | 183 | # compute the mean 184 | monthly_means = monthly_totals / measurement_count 185 | 186 | print("Bonus Bonus") 187 | print(" mean:", monthly_means) 188 | 189 | # Notes: this method relies on the fact that the months are contiguous in the 190 | # data set - the method used in the bonus section works for non-contiguous 191 | # days. 192 | -------------------------------------------------------------------------------- /slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/enthought/Numpy-Tutorial-SciPyConf-2016/8e9e8cbb57f8976a4572800fe808aedc74775a81/slides.pdf --------------------------------------------------------------------------------
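The `searchsorted` / `add.reduceat` approach used in the Bonus Bonus solution can be tried on a toy dataset. This is a minimal sketch, not part of the tutorial material: the `months` and `data` arrays below are made-up values (three months, two locations), chosen so the result is easy to check by hand.

```python
import numpy as np

# Hypothetical toy data: a sorted month number for each day, and
# readings at 2 locations per day (values invented for illustration).
months = np.array([0, 0, 1, 1, 1, 2])
data = np.array([[1.0, 3.0],
                 [2.0, 4.0],
                 [5.0, 5.0],
                 [6.0, 6.0],
                 [7.0, 7.0],
                 [8.0, 10.0]])

# Row index where each month starts, plus one past-the-end entry.
month_indices = np.searchsorted(months, np.arange(months[-1] + 2))

# Sum the rows of each month; each slice runs from one start index to the next.
monthly_loc_totals = np.add.reduceat(data, month_indices[:-1])

# Total over locations, divided by the number of readings in each month.
month_days = np.diff(month_indices)
monthly_means = monthly_loc_totals.sum(axis=1) / (month_days * data.shape[1])

print(month_indices)
print(monthly_means)
```

As in the solution, this only works when the rows of each month are contiguous and the month numbers are sorted, because `searchsorted` assumes sorted input.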