├── Pytorch
│   ├── README.md
│   └── docker_projects
│       ├── Dockerfile
│       ├── bin
│       │   ├── pytorch.py
│       │   └── __init__.py
│       └── pytorch.json
├── advanced_concepts
│   ├── AnswerFactory_module
│   │   ├── SHA.zip
│   │   ├── README.md
│   │   └── AnswerFactory_tutorial.md
│   ├── this_shp_will_clip_10400100245B7800.zip
│   ├── test-a-docker.ipynb
│   └── string-ports-tutorial.ipynb
├── workshop_prep
│   ├── gbdxtools_env.yml
│   ├── gbdxtools-test.ipynb
│   └── README.md
├── gbdxtools_module
│   ├── gbdxtools_env.yml
│   └── README.md
├── custom_task_module
│   ├── README.md
│   └── custom-task-tutorial.ipynb
├── gbdx_notebooks_module
│   └── README.md
└── README.md
/Pytorch/README.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/Pytorch/docker_projects/Dockerfile:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/Pytorch/docker_projects/bin/pytorch.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/Pytorch/docker_projects/pytorch.json:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/Pytorch/docker_projects/bin/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/advanced_concepts/AnswerFactory_module/SHA.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GeoBigData/gbdx-training/HEAD/advanced_concepts/AnswerFactory_module/SHA.zip
--------------------------------------------------------------------------------
/advanced_concepts/this_shp_will_clip_10400100245B7800.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/GeoBigData/gbdx-training/HEAD/advanced_concepts/this_shp_will_clip_10400100245B7800.zip
--------------------------------------------------------------------------------
/workshop_prep/gbdxtools_env.yml:
--------------------------------------------------------------------------------
1 | # environment.yml
2 | name: gbdxtools
 3 | channels:
 4 |   - conda-forge
 5 |   - defaults
 6 |   - digitalglobe
 7 | dependencies:
 8 |   - conda-forge::python=2.7
 9 |   - conda-forge::numpy
10 |   - conda-forge::rasterio=0.36.0
11 |   - conda-forge::gdal
12 |   - conda-forge::fiona
13 |   - conda-forge::pyproj
14 |   - conda-forge::ipython
15 |   - digitalglobe::gbdxtools
16 |
--------------------------------------------------------------------------------
/gbdxtools_module/gbdxtools_env.yml:
--------------------------------------------------------------------------------
1 | # environment.yml
2 | name: gbdxtools
 3 | channels:
 4 |   - conda-forge
 5 |   - defaults
 6 |   - digitalglobe
 7 | dependencies:
 8 |   - conda-forge::python=2.7
 9 |   - conda-forge::numpy
10 |   - conda-forge::rasterio=0.36.0
11 |   - conda-forge::gdal
12 |   - conda-forge::fiona
13 |   - conda-forge::pyproj
14 |   - conda-forge::ipython
15 |   - conda-forge::nb_conda
16 |   - digitalglobe::gbdxtools
17 |
--------------------------------------------------------------------------------
/advanced_concepts/AnswerFactory_module/README.md:
--------------------------------------------------------------------------------
1 | # intro
2 | AnswerFactory is a GBDX-powered web app that allows users to extract information from imagery in an easy-to-use interface. Create projects by selecting an area of interest, then add 'answers' to do things like locate and count objects, detect change, and extract features. You can view the output of your analysis within the web app, or download the output and view it in a GIS.
3 |
4 | # what you will learn
5 | In this tutorial, we will create a project that extracts aircraft features from imagery, then download the vector data and view it in a GIS.
6 |
7 | # to get started
8 | - From this repository, [download the zipped file SHA.zip](https://github.com/GeoBigData/gbdx-training/raw/master/advanced_concepts/AnswerFactory_module/SHA.zip)
9 | - Follow the AnswerFactory tutorial, [`AnswerFactory_tutorial.md`](./AnswerFactory_tutorial.md), which will walk you through the steps of creating a project in AnswerFactory
10 | - Optionally, watch the recorded [AnswerFactory tutorial](https://digitalglobe.wistia.com/medias/u39g09l8ee) and follow along
11 |
12 | ___
13 | We would love to hear your feedback. Feel free to email GBDX-support@digitalglobe.com with comments and suggestions.
14 |
--------------------------------------------------------------------------------
/custom_task_module/README.md:
--------------------------------------------------------------------------------
1 | # intro
2 | Users can, and are encouraged to, build their own custom analytic capabilities, called 'Tasks', to use within the Workflow system on GBDX.
3 |
4 | # what you will learn
5 | This tutorial will walk you through the multi-step process of registering a Task to GBDX, starting with sample code that clips a raster to a shapefile as our example Task.
6 |
7 | The tutorial describes the files required to register a Task. If you execute the provided code within the notebook, cell by cell in top-down order, those files will be written to your computer for you. You can also create those files yourself outside of the notebook if you prefer.
8 |
9 | # to get started
10 | __1. GBDX Notebooks__
11 |
12 | Become familiar with Tasks and Workflows through tutorials in the [GBDX Notebooks module](../gbdx_notebooks_module/README.md)
13 |
14 | __2. Install Anaconda and gbdxtools__
15 |
16 | Instructions are provided in the [gbdxtools module](../gbdxtools_module/README.md)
17 |
18 | __3. Install Docker__
19 |
20 | 1. Install Docker from https://www.docker.com/
21 | 2. Test your Docker installation by starting Docker and pasting in the following command, which should return your Docker version
22 | ```
23 | docker version
24 | ```
25 | 3. Sign up for a free Docker Hub account at https://hub.docker.com/
26 |
27 | __4. Download and start the Notebook tutorial:__
28 |
29 | 1. Download the file [custom-task-tutorial.ipynb](https://github.com/GeoBigData/gbdx-training/blob/master/custom_task_module/custom-task-tutorial.ipynb) from this repository.
30 |
31 | 2. Open a terminal/cmd window and run the command `jupyter notebook`. This will open the Jupyter Notebook interface in your browser [(documentation here)](https://jupyter.readthedocs.io/en/latest/running.html#running).
32 |
33 | 3. You can navigate your file system and open the downloaded custom-task-tutorial.ipynb notebook from the Jupyter Notebook interface.
34 |
35 | 4. The custom-task-tutorial.ipynb explains all of the steps required to register a custom task to GBDX, along with example code. You can execute the code within the notebook by using the keyboard shortcut SHIFT + ENTER, or select the play button in the toolbar. Follow the instructions in the notebook and optionally watch the [recording of this tutorial](https://digitalglobe.wistia.com/medias/8z9hj4g960).
36 |
37 | ___
38 | We would love to hear your feedback. Feel free to email GBDX-support@digitalglobe.com with comments and suggestions.
39 |
--------------------------------------------------------------------------------
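The README above explains that registering a Task requires a set of files, including a JSON task definition describing the Task's ports and Docker container. As a rough sketch of the shape such a definition takes (every value below is a placeholder assumption for illustration, not the tutorial's actual content -- follow the notebook for the real files):

```python
import json

# Hypothetical task definition -- name, image, command, and port names
# are placeholders, not the values the tutorial generates for you.
task_definition = {
    "name": "clip-raster-example",
    "description": "Clips a raster to a shapefile (illustrative only).",
    "properties": {"isPublic": False, "timeout": 7200},
    "inputPortDescriptors": [
        {"name": "data", "type": "directory", "required": True}
    ],
    "outputPortDescriptors": [
        {"name": "clipped", "type": "directory"}
    ],
    "containerDescriptors": [{
        "type": "DOCKER",
        "properties": {"image": "yourhubaccount/clip-raster-example"},
        "command": "python /bin/clip.py",
    }],
}

# The definition is plain JSON, so it round-trips through json.dumps.
print(json.dumps(task_definition, indent=2))
```

The key idea is that the ports declared here are how the Workflow system wires data in and out of your Docker container at run time.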
/workshop_prep/gbdxtools-test.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### This short Notebook is intended to demonstrate:\n",
8 | "\n",
9 | "- how to execute code in a cell\n",
10 | "- test your GBDX credentials and/or gbdxtools config file\n",
11 | "\n",
12 | "__1. Execute the code in the following cell by first selecting the cell, then pressing the play button in the toolbar above or using the keyboard shortcut SHIFT + ENTER. This block of code simply imports the Python library 'sys' and returns the path to your Python executable.__ "
13 | ]
14 | },
15 | {
16 | "cell_type": "code",
17 | "execution_count": null,
18 | "metadata": {},
19 | "outputs": [],
20 | "source": [
21 | "import sys\n",
22 | "sys.executable"
23 | ]
24 | },
25 | {
26 | "cell_type": "markdown",
27 | "metadata": {},
28 | "source": [
29 | "__2. Fill in your GBDX username, password, client ID, and client secret in the following cell. This information can be found under your Profile information at https://gbdx.geobigdata.io/profile. If you have a GBDX config file, you can uncomment and use the first two lines of code to authenticate into GBDX.__"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": null,
35 | "metadata": {
36 | "collapsed": true
37 | },
38 | "outputs": [],
39 | "source": [
40 | "# from gbdxtools import Interface\n",
41 | "# gbdx = Interface()\n",
42 | "\n",
43 | "import gbdxtools\n",
44 | "gbdx = gbdxtools.Interface(\n",
45 | " username='',\n",
46 | " password='',\n",
47 | " client_id='',\n",
48 | " client_secret='')"
49 | ]
50 | },
51 | {
52 | "cell_type": "markdown",
53 | "metadata": {},
54 | "source": [
55 | "__3. If you've entered your credentials correctly (and your config file is formatted correctly, if you are using one), executing the following block of code will return your GBDX S3 information.__ "
56 | ]
57 | },
58 | {
59 | "cell_type": "code",
60 | "execution_count": null,
61 | "metadata": {},
62 | "outputs": [],
63 | "source": [
64 | "gbdx.s3.info"
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "### If the above command returned your GBDX S3 info, you should be ready to take the GBDX tutorial. "
72 | ]
73 | }
74 | ],
75 | "metadata": {
76 | "anaconda-cloud": {},
77 | "kernelspec": {
78 | "display_name": "Python [conda env:gbdxtools]",
79 | "language": "python",
80 | "name": "conda-env-gbdxtools-py"
81 | },
82 | "language_info": {
83 | "codemirror_mode": {
84 | "name": "ipython",
85 | "version": 2
86 | },
87 | "file_extension": ".py",
88 | "mimetype": "text/x-python",
89 | "name": "python",
90 | "nbconvert_exporter": "python",
91 | "pygments_lexer": "ipython2",
92 | "version": "2.7.13"
93 | }
94 | },
95 | "nbformat": 4,
96 | "nbformat_minor": 1
97 | }
98 |
--------------------------------------------------------------------------------
/gbdx_notebooks_module/README.md:
--------------------------------------------------------------------------------
1 | ## intro
2 | GBDX Notebooks is a hosted Python environment that provides immediate access to analysis-ready imagery in a familiar Jupyter Notebook environment. This environment is preconfigured with data science and geoprocessing tools that make it easy to develop new analysis methods and run them at scale on GBDX.
3 |
4 | ## what you will learn
5 | The notebook tutorials listed below are designed to guide you from the basics of coding against imagery in the notebook through to deploying an algorithm and running it at scale on GBDX.
6 |
7 | ## to get started
8 | - sign into the [GBDX Notebooks Hub](https://notebooks.geobigdata.io) using your GBDX credentials
9 | - link to your Github account [(instructions here)](https://gbdxdocs.digitalglobe.com/docs/gbdx-notebooks-course#section-getting-started)
10 | - select a tutorial from below (they progress from very basic to advanced)
11 | - follow the link to the tutorial notebook in the GBDX Notebook Hub
12 | - click "Clone and Edit" to create your own private version of the notebook
13 | - follow the instructions in the notebook to complete the tutorial
14 |
15 | ## tutorials
16 |
17 | __1. [GBDX Notebook basics](https://notebooks.geobigdata.io/hub/notebooks/5b27f7db2c7831647a306e3c?tab=code)__ `python exercises`
18 |
19 | Learn how to navigate the notebook interface, write and execute code in the notebook, plot a graph, and start a blank notebook.
20 |
21 | __2. [Simple NDVI](https://notebooks.geobigdata.io/hub/notebooks/5b27f8262c7831647a306e3f?tab=code)__ `view image` `array operations` `NDVI`
22 |
23 | Learn how to fetch image pixel data and view it in the notebook, then run an easy vegetation analysis on the imagery.
24 |
25 | __3. [Algorithm prototyping](https://notebooks.geobigdata.io/hub/notebooks/5c006459f3acbc49e7a459f4?tab=code)__ `NDWI` `coastline extraction`
26 |
27 | In this Notebook, we walk through a simple methodology for extracting coastlines from 8-band multispectral imagery. This tutorial demonstrates how to link together several concepts from remote sensing, image science, and GIS to produce a complete geospatial analysis. The steps in the workflow include: (1) calculating a Normalized Difference Water Index; (2) thresholding the water index into a binary image; (3) cleaning up small features; and (4) converting the land/water boundaries into vector polylines representing coastlines.
28 |
29 | __4. [Algorithm scaling](https://notebooks.geobigdata.io/hub/notebooks/5b27f82b2c7831647a306e41?tab=code)__ `pip install` `edge cases` `image chips` `image affine` `visualization`
30 |
31 | In this Notebook, we will extend the simple methodology for extracting coastlines that we built in the previous tutorial. Our goal in this Notebook is to be able to run the same methodology over a much bigger geographic area. Specifically, we are going to show two different approaches to running the algorithm over an entire image rather than just one small part of that image, as we did last time.
32 |
33 | __5. [Deploy algorithm as a Task](https://notebooks.geobigdata.io/hub/notebooks/5b27f8532c7831647a306e42?tab=code)__ `task inputs` `task outputs` `task deploy`
34 |
35 | In this Notebook, we provide a walkthrough of how to deploy our coastline extraction algorithm as a GBDX Task, using some helpful tools built right into the GBDX Notebooks interface. Once we've done this, we can execute the same Task against multiple images all at the same time: each will be kicked off as a separate, parallel workflow, without being constrained to the computational limits of a single machine.
36 |
37 | __6. [Run algorithm as a Task on GBDX](https://notebooks.geobigdata.io/hub/notebooks/5b27f8282c7831647a306e40?tab=code)__ `workflows` `S3 output` `visualizing output`
38 |
39 | In the previous Notebook, we walked through how to deploy our coastline extraction algorithm as a GBDX Task, enabling us to run it on the GBDX platform instead of inside of our Notebook. In this final Notebook, we are going to use the GBDX Task we created to run coastline extraction over multiple images, in parallel, using GBDX Workflows. Using this approach, we'll be able to extract a highly detailed coastline for the entire island of Kauai, in less than 10 minutes.
40 |
41 | ## next steps
42 |
43 | There are several notebooks in the GBDX Notebooks Hub that demonstrate useful methods and tools; you will find them in the Discover section. Here is a short list of recommended notebooks.
44 |
45 | [Searching and Ordering Imagery](https://notebooks.geobigdata.io/hub/notebooks/5b27f7db2c7831647a306e3d?tab=code) `catalog search` `order imagery`
46 |
47 | [Workflow Basics](https://notebooks.geobigdata.io/hub/notebooks/5b27f7da2c7831647a306e3b?tab=code) `tasks` `workflows` `image preprocessing` `task inputs and outputs` `s3 output`
48 |
49 | [Imagery and Areas of Interest](https://notebooks.geobigdata.io/hub/notebooks/5a037c12f74cf64a53479964?tab=code) `index image with various geometries` `image geointerface`
50 |
51 | [Color Matching Imagery to Browse Imagery](https://notebooks.geobigdata.io/hub/notebooks/5a29c32256e0d252e24aa1f5?tab=code) `color balancing for light or dark imagery`
52 |
53 |
54 | ___
55 | ### Resources
56 |
57 | [__GBDX University__](https://gbdxdocs.digitalglobe.com/)
58 |
59 | [__GBDX Notebooks Hub__](https://notebooks.geobigdata.io)
60 |
61 | [__GBDXtools Documentation__](http://gbdxtools.readthedocs.io/en/latest/)
62 |
63 | [__S3 Browser__](http://s3browser.geobigdata.io/login.html)
64 |
65 | [__GBDX Stories__](http://gbdxstories.digitalglobe.com/)
66 | ___
67 | We would love to hear your feedback. Feel free to email GBDX-support@digitalglobe.com with questions, comments or suggestions.
68 |
69 |
--------------------------------------------------------------------------------
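Tutorial 3 in the README above walks through computing a Normalized Difference Water Index and thresholding it into a binary image before cleanup and vectorization. The toy sketch below illustrates just that index-and-threshold idea in plain Python; the pixel values are invented for demonstration, and a real implementation would operate on full NumPy image arrays:

```python
# Toy NDWI threshold sketch -- the band values below are made up,
# not taken from real DigitalGlobe imagery.
# NDWI = (green - nir) / (green + nir); water pixels score high.

def ndwi(green, nir):
    """Normalized Difference Water Index for a single pixel."""
    return (green - nir) / (green + nir)

# A tiny 2x3 "image" represented as separate green and NIR bands.
green_band = [[0.30, 0.28, 0.05], [0.31, 0.06, 0.04]]
nir_band   = [[0.05, 0.06, 0.40], [0.04, 0.42, 0.45]]

# Step 1: compute the index; Step 2: threshold into a binary water mask.
threshold = 0.3
water_mask = [
    [1 if ndwi(g, n) > threshold else 0 for g, n in zip(g_row, n_row)]
    for g_row, n_row in zip(green_band, nir_band)
]
print(water_mask)
```

In the real tutorial, steps 3 and 4 (cleaning small features and tracing the land/water boundary into polylines) then operate on this binary mask.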
/advanced_concepts/AnswerFactory_module/AnswerFactory_tutorial.md:
--------------------------------------------------------------------------------
1 | ### Create a project
2 | 1. SIGN INTO ANSWERFACTORY
3 | [Sign into AnswerFactory](https://vector.geobigdata.io/answer-factory/) using your GBDX credentials. This will be the email address and password you created for your GBDX account.
4 |
5 | 2. CREATE A NEW PROJECT
6 | Click on the "New Project" button at the top of the page. This will take you to the Create New Project homepage with the Project Information Panel to the left of the screen.
7 |
8 | 3. ADD AN AREA OF INTEREST (AOI)
9 | We have provided a shapefile over Shanghai Hongqiao International Airport for this tutorial. If you haven't done so already, please download the "SHA.zip" file from this GitHub repo.
Upload this shapefile to AnswerFactory by clicking on the "Upload Shapefile" button in the Project Panel and selecting the zipped file. The map will snap to the AOI defined by the shapefile, and you will see more configurable options in the Project Panel.
10 |
11 | 4. NAME YOUR PROJECT
12 | For this tutorial, give your project a unique name, such as "extract_aircraft" with your initials attached at the end. This name will be checked against existing names to prevent multiple projects from having the same name.
13 |
14 | 5. ADD THE 'EXTRACT AIRCRAFT' ANSWER TO YOUR PROJECT
15 | For this demo, you are going to run the AnswerFactory 'Extract Aircraft' recipe. This recipe runs an algorithm over an image to identify aircraft and generate vectors that indicate where the aircraft are. There are three subtypes of aircraft currently being detected: Airliner, Fighter, and Helicopter.
Click on the Add Answers button to bring up the modal for selecting answers. Click in the field to open the dropdown list of currently available answers. Choose the 'Extract Aircraft' selection at the top of the list, which will pop up in a panel on the right side of your screen.
16 |
17 | 6. SAVE PROJECT
18 | Click on the Create Project button at the bottom of the Project Panel. When prompted, confirm that you want to create the project.
At this point, the analysis has begun. You should see a message on the right side of the screen that states, "Currently running Extract Aircraft".
19 |
20 | ### Check results in the App
21 | Normally, it would take some time for the algorithm to process and return results. However, AnswerFactory won't process the same algorithm over the same image twice; it will instead query and return the vector output that already exists. This is what will happen when you run the 'Extract Aircraft' recipe over the AOI defined by the provided shapefile. Instead of waiting for AnswerFactory to process new results, you will receive an 'Answer Ready!' email within a couple of hours. You can follow the link within the email or follow the instructions outlined below to view your completed project.
22 |
23 | 1. OPEN YOUR PROJECT
24 | Click on the 'Open Project' button at the top of the screen. This will bring up the projects list on the left side of the page, where you can browse a list of public projects. To find your project, check the toggle for 'Show Only My Projects', then select your unique "extract_aircraft" project.
You will be presented with the base View Project page for your project, which contains the Project Information Panel to the left and the Answer List to the right. Note that the Project Details section contains all of the information found when you created the project, along with additional information about the project owner, the date and time of project creation, and the date and time that the project was last modified. The Answers Panel on the right shows all of the answers that have been run for the project.
25 |
26 | 2. VIEW AN ANSWER
27 | In the Answer Panel, click on the 'Extract Aircraft' text. A table will appear on the bottom half of the screen. The table will display a row for each AOI. The columns to the right of the Name column are labeled with the acquisition dates of the images that the answer was run against. For each of the acquisitions, you will find statistics stating the number of vectors found, the vector coverage percentage within the AOI, and the image coverage percentage within the AOI.
Select a cell under its acquisition date to load the vector results and source imagery used for the analysis. Hover over the blue dot icon in the upper right corner to see the map table of contents. Click the first layer, "Extract Aircraft" to toggle on/off the vector results. Clicking the layer that begins with a catalog ID (ex. 104001002979E100) will toggle on/off the source imagery used in the analysis.
28 |
29 | 3. DIG INTO AN ANSWER
30 | Double click on one of the cells to load a table of the individual vectors for that cell. Click the magnifying glass next to a row to pan/zoom to that specific vector.
31 |
32 | ### View results in a GIS
33 | You can also download the vector output from your analysis for use outside of the app. In this part of the tutorial, we are going to demonstrate how to download the output as GeoJSON and view it in a GIS. We'll show how to load the data into both QGIS and ArcMap.
34 |
35 | 1. DOWNLOAD VECTOR OUTPUT
36 | From the main table view, click on the "Download All Results" button at the bottom left, which will save the vector output in GeoJSON format to "openskynet-aircraft-detection.json".
37 |
38 | 2. VIEW OUTPUT IN QGIS
39 | From here, it is simple to view the vector output in QGIS. Open the QGIS application and start a new project. Locate the "openskynet-aircraft-detection.json" file in your Downloads folder, then simply drag and drop it into the QGIS main viewer window. The loaded data will display as polygons indicating where aircraft were found using the "Extract Aircraft" algorithm.
40 |
41 |
--------------------------------------------------------------------------------
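Before loading the downloaded "openskynet-aircraft-detection.json" file into a GIS as described above, you can sanity-check it with a few lines of Python, since GeoJSON is plain JSON. This is a sketch only: the feature properties shown are assumptions for illustration, not the recipe's actual output schema:

```python
import json

# A minimal stand-in for the downloaded GeoJSON -- the real file will
# contain many more features and different (recipe-specific) properties.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"item_type": "Airliner"},
         "geometry": {"type": "Point", "coordinates": [121.34, 31.19]}},
        {"type": "Feature",
         "properties": {"item_type": "Helicopter"},
         "geometry": {"type": "Point", "coordinates": [121.35, 31.20]}},
    ],
}

# Round-trip through a JSON string, standing in for reading the file
# with json.load(open("openskynet-aircraft-detection.json")).
data = json.loads(json.dumps(sample))
print(len(data["features"]), "features")
```

Counting features this way is a quick check that the download completed and parses before you drag the file into QGIS or ArcMap.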
/README.md:
--------------------------------------------------------------------------------
1 | # Quickstart
2 | Want to see what GBDX can do? Follow the links below to view examples of combining the power of immediate imagery access with additional data sources to derive information and insight. Although it took a lot of coding to create these analyses, you can simply scroll through each webpage to view not only the code but also the results in an interactive map.
3 |
4 | ### [Imagery to Insights with GBDX - Boat Traffic](https://notebooks.geobigdata.io/hub/notebooks/uoxerx07mx5eqkq3y381)
5 |
6 | This Notebook demonstrates how to quickly go from imagery to insight by combining a boat extraction analysis with weather data to gain meaningful insights about boat traffic near Point Piper, Australia.
7 |
8 |
9 | ### [Ecopia Building Footprints and GBDX](https://notebooks.geobigdata.io/hub/notebooks/64bvsjjzz486qjoyhlbs)
10 |
11 | DigitalGlobe has partnered with Ecopia to produce highly accurate, up-to-date 2D building footprints. In this Notebook, we
12 | demonstrate how to analyze those building footprints against the original source imagery to create an enhanced dataset. Combining these datasets with the power of GBDX gives us quick and powerful insight into new business use cases and research questions.
13 |
14 |
15 |
16 | # Intro
17 | DigitalGlobe's Geospatial Big Data Platform, or GBDX, provides customers with a fast and easy way to search, order, and process DigitalGlobe imagery. We provide several tools for doing big data analytics on our platform, and the ability to leverage your own capabilities against your data or ours. This guide is provided to help you discover the tools and resources you need to quickly and easily start developing on GBDX.
18 |
19 | # GBDX Overview
20 | In this presentation, we explain what GBDX is and why we built it, highlight solutions that partners are building on GBDX, and introduce important technical concepts. These technical concepts are covered in more detail in the hands-on tutorials.
21 |
22 | [GBDX Overview recorded presentation](https://digitalglobe.wistia.com/medias/kbqln5pwks)
23 |
24 | [GBDX Overview slides](https://docs.google.com/presentation/d/171n5JYn4CZoX0Hy7vEtPaiJbm-sTCicdSHUzInoE89w/edit?usp=sharing)
25 |
26 | # GBDX and Python: GBDX Notebooks
27 | The quickest, easiest way to get started on GBDX. Start coding Python against DigitalGlobe imagery in a hosted Jupyter Notebook environment. You don't need to install anything and there are easy tools for searching and loading imagery directly in the notebook.
28 |
29 | These tutorials cover the skills and concepts you need to start developing on GBDX with Python. They progress from learning the basics of coding in the Notebook to developing robust analysis methods and deploying them at scale on GBDX; they also cover foundational skills such as searching the Catalog and ordering imagery to GBDX.
30 |
31 | You'll find all the resources to get started on GBDX in the [GBDX Notebooks module](./gbdx_notebooks_module/README.md) in this repo.
32 |
33 | # GBDX and Python: Direct Access
34 |
35 | The tutorials and resources in this section are provided to help you transition from developing via GBDX Notebooks to developing in your local Python environment. This requires installing additional libraries and software, but also provides greater flexibility and integration with your existing analysis tools.
36 |
37 | ### Install gbdxtools
38 |
39 | gbdxtools is a Python library for interacting with the GBDX API. If you've worked on any of the GBDX Notebooks tutorials, then you've used gbdxtools.
40 |
41 | The instructions in this module will help you install gbdxtools in your local development environment, where it's easy to integrate your GBDX workflow with your existing analysis tools. Once you've installed gbdxtools, you can code against DigitalGlobe imagery in exactly the same way as you did in the GBDX Notebooks tutorials, using the same gbdxtools code and commands.
42 |
43 | Find these instructions in the [gbdxtools module](./gbdxtools_module/README.md) in this repo.
44 |
45 | ### Custom Task tutorial
46 | Once you're ready to turn your analysis methods into production-ready analysis tools, you can package your code and dependencies into a Docker image, then register it as a Task on GBDX. From there, it's simple to run that Task as many times as you want on as much imagery as you need.
47 |
48 | If you completed the tutorials from the GBDX Notebooks module, you've already registered a Task on GBDX and used it in a Workflow. In that environment, GBDX Notebooks automatically handled all of the Task registration steps for you.
49 |
50 | In this tutorial, we demonstrate how to manually register a Task to GBDX. We provide the code for an example Task, and walk you through the steps of Dockerizing that code, registering it as a Task, then running it in a Workflow on GBDX. This will allow you the greatest flexibility in creating your own, custom analysis tools and running them at scale.
51 |
52 | Find these instructions in the [Custom Task module](./custom_task_module/README.md) in this repo.
53 |
54 | ___
55 | ### Resources
56 |
57 | [__GBDX University__](https://gbdxdocs.digitalglobe.com/)
58 |
59 | [__GBDX Notebooks Hub__](https://notebooks.geobigdata.io)
60 |
61 | [__GBDXtools Documentation__](http://gbdxtools.readthedocs.io/en/latest/)
62 |
63 | [__S3 Browser__](http://s3browser.geobigdata.io/login.html)
64 |
65 | [__GBDX Stories__](http://gbdxstories.digitalglobe.com/)
66 | ___
67 | We would love to hear your feedback. Feel free to email GBDX-support@digitalglobe.com with questions, comments or suggestions.
68 |
--------------------------------------------------------------------------------
/gbdxtools_module/README.md:
--------------------------------------------------------------------------------
1 | ## intro
2 | gbdxtools is a Python package that provides easy access to the GBDX APIs, letting you search the DigitalGlobe catalog and order and process imagery. If you've completed any of the GBDX Notebooks tutorials, then you've seen gbdxtools in action. An important thing to know is that once you've installed gbdxtools locally, you can write and run analysis code in your local Python environment exactly as you would in GBDX Notebooks.
3 |
4 | ## to get started
5 |
6 | ### 1. Install Anaconda
7 |
8 | The first required step is to install Anaconda, an open source Python distribution that simplifies package management, dependencies, and environments (we recommend this step even if you already have a Python installation). This distribution includes Conda, which you will need in order to install gbdxtools using the provided conda environment file.
9 |
10 | __1.1__ Download and install the **full version** of [Anaconda](https://www.continuum.io/downloads)
11 |
12 | Requirements:
13 | - All users:
14 | - Python 2.7 version
15 | - *this should not interfere with other Python versions you may already have*
16 | - 64-Bit is preferred
17 | - Make sure to install the full version of Anaconda and not Miniconda to ensure all required dependencies are included
18 | - Windows users:
19 | - During the installation, please check the box that sets Path values when prompted
20 | - *please do this even though it is not the default setting*
21 |
22 | ### 2. Install gbdxtools
23 |
24 | We have provided a Conda environment file for easy installation of gbdxtools, along with its required dependencies, within a Conda virtual environment. Virtual environments keep the dependencies required by different projects in separate places so that they don't interfere with each other. Please use the following Conda commands to install gbdxtools.
25 |
26 | __Mac users__:
27 |
28 | __2.1__ Download the [gbdxtools_env.yml](../gbdxtools_module/gbdxtools_env.yml) file from this repo
29 |
30 | __2.2__ Open a terminal window
31 |
32 | __2.3__ Update conda
33 | ```
34 | conda update conda
35 | ```
36 | __2.4__ Create the conda environment
37 | ```
38 | conda env create -f /full/path/to/gbdxtools_env.yml
39 | ```
40 | __2.5__ Activate the environment
41 | ```
42 | source activate gbdxtools
43 | ```
44 | __2.6__ Try importing all the modules (if no errors are raised, everything is working right)
45 | ```
46 | python -c 'import rasterio; import fiona; import shapely; import gbdxtools'
47 | ```
48 |
49 | __Windows users__:
50 |
51 | __2.1__ Download the [gbdxtools_env.yml](../gbdxtools_module/gbdxtools_env.yml) file from this repo
52 |
53 | __2.2__ Start Anaconda Prompt
54 |
55 | __2.3__ Update conda
56 | ```
57 | conda update conda
58 | ```
59 | __2.4__ Create the conda environment
60 | ```
61 | conda env create -f C:\full\path\to\gbdxtools_env.yml
62 | ```
63 | __2.5__ Activate the environment
64 | ```
65 | activate gbdxtools
66 | ```
67 | __2.6__ Try importing all the modules (if no errors are raised, everything is working right)
68 | ```
69 | python -c "import rasterio; import fiona; import shapely; import gbdxtools"
70 | ```
71 |
72 | ### 3. Activate your GBDX account credentials
73 |
74 | __3.1__ Activate your account - once you've been assigned to your company's GBDX account (we will do this for you; please coordinate with your company's GBDX POC), you'll be sent an email from DigitalGlobe with instructions on how to activate your account. Open the message and click on "ACTIVATE YOUR ACCOUNT". This will pop up a window where you will be prompted to set a password.
75 |
76 | ### 4. Test your gbdxtools installation
77 |
78 | You can start Python from the gbdxtools environment, copy and paste the following code, and fill in your GBDX credentials. You will know that you've successfully installed gbdxtools if there's no error and it prints your GBDX S3 information.
79 |
80 | ```python
81 | import gbdxtools
82 | gbdx = gbdxtools.Interface(
83 | username='',
84 | password=''
85 | )
86 |
87 | gbdx.s3.info
88 | ```
89 |
90 | From here, you can code against DigitalGlobe imagery using gbdxtools just as you did in the GBDX Notebooks tutorials, using the exact same gbdxtools commands.
91 |
92 | ### 5. Shut it down
93 |
94 | When you are finished with this tutorial, shut down the Jupyter Notebook and kernel, and deactivate the Conda environment where you installed gbdxtools.
95 |
96 | __5.1__ Stop the Jupyter Notebook by using the keyboard shortcut `CONTROL + C`, then `Y` to confirm that you would like to shut down the notebook server.
97 |
98 | __Mac users__:
99 |
100 | __5.2__ Deactivate the virtual environment
101 | ```
102 | source deactivate
103 | ```
104 | __5.3__ (Optional) Remove the environment
105 | ```
106 | conda remove --name gbdxtools --all
107 | ```
108 |
109 | __Windows users__:
110 |
111 | __5.2__ Deactivate the virtual environment
112 | ```
113 | deactivate
114 | ```
115 | __5.3__ (Optional) Remove the environment
116 |
117 | ```
118 | conda remove --name gbdxtools --all
119 | ```
120 | ___
121 | __6. Optional .gbdx-config file:__
122 |
123 | It is recommended that you save your GBDX credentials in a config file, which is safer and more convenient than hard-coding your GBDX credentials into each script.
124 |
125 | __6.1__ Create a blank text file and copy and paste the following information:
126 | ```
127 | [gbdx]
128 | user_name =
129 | user_password =
130 | ```
131 |
132 | __6.2__ Fill in `user_name` and `user_password` with the username (email) and password associated with your GBDX account.
133 |
134 | __6.3__ Save this file in your user directory with the filename `.gbdx-config`
135 | - be sure to include the `.` at the beginning of the filename
136 | - the filename needs to be saved without an extension
137 |
138 | __Mac users__:
139 | Typing `cd ~` in the terminal will take you to your user directory; place your config file here.
140 |
141 | __Windows users__:
142 | Typing `echo %USERPROFILE%` will print the path of your user directory; place your config file there.
143 |
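On either platform, the directory these commands print is what Python calls the home directory; this stdlib sketch (an illustrative check, not part of gbdxtools) prints the full path where the config file should end up:

```python
import os

# Home directory: ~ on Mac/Linux, %USERPROFILE% on Windows
home = os.path.expanduser("~")

# The config filename starts with a dot and has no extension
config_path = os.path.join(home, ".gbdx-config")
print(config_path)
```
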
144 | __6.4__ Test your gbdxtools installation and config file:
145 | - Open a terminal/cmd window and type `ipython`
146 | - Copy and paste the following code, which will result in an error if it can't locate your config file or if the formatting is incorrect.
147 |
148 | ```python
149 | from gbdxtools import Interface
150 | gbdx = Interface()
151 | ```
152 | - Once you are done, quit Python by typing `exit()`
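If `Interface()` raises an error, a malformed config file is the usual cause. The `.gbdx-config` file uses INI syntax, so as an illustrative sanity check (separate from gbdxtools itself; Python 3 shown, where the module is named `configparser` rather than Python 2's `ConfigParser`) you can parse it with the standard library:

```python
import configparser
import os

path = os.path.join(os.path.expanduser("~"), ".gbdx-config")

parser = configparser.ConfigParser()
found = parser.read(path)  # returns the list of files successfully parsed

if not found:
    print("No readable .gbdx-config in the home directory")
elif not parser.has_section("gbdx"):
    print("File found, but the [gbdx] section header is missing")
else:
    print("Found keys:", parser.options("gbdx"))
```
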
153 |
154 |
155 | Config file troubleshooting:
156 |
157 | Windows: select the filename and try to delete the '.txt' extension. You may need to save it as `.gbdx-config.` (notice the trailing period). Also, deselect the 'hide extensions' setting on your computer so you can verify if you've successfully deleted the extension.
158 |
159 | Mac: select the config file, right-click and select 'Get Info' to verify that you've saved the file without an extension.
160 |
161 | ___
162 |
163 |
164 |
165 |
166 | We would love to hear your feedback. Feel free to email GBDX-support@digitalglobe.com with comments and suggestions.
167 |
--------------------------------------------------------------------------------
/advanced_concepts/test-a-docker.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## intro\n",
8 | "Docker is an integral piece to registering a Task on GBDX. In previous tutorials, we've demonstrated how to build a Docker image locally from a Dockerfile and then push that Docker image to Docker Hub to be used in the registration process. To test that the Task works as expected, we've simply used that Task in a Workflow, and then checked the status of the Workflow. If the Workflow succeeded, we could assume that we wrote and registered the Task successfully. \n",
9 | "\n",
10 | "If you'd like to debug the Task code outside of the Workflow system, it's easy enough to test the Task Docker locally. You can simply mount test data in your Docker container in a way that mimics the way GBDX mounts the input data to the Docker container when it's running your Task within a Workflow. You can then run the Task code in the local Docker container and debug from there. This is a faster way to debug your code, and it's one that doesn't use AWS resources. \n",
11 | "\n",
12 | "__Pre-requisites:__\n",
13 | "This tutorial will build off of the string-ports-tutorial.ipynb, where we built the 'Doughnut Task'. We'll re-use the same Task code and Dockerfile that we built during that tutorial.\n",
14 | "\n",
15 | "## 1. Get test data\n",
16 | "To actively participate in this tutorial, you will need to get your hands on some test data. Specifically, you will need a raster and a shapefile that will clip that raster. You can use your own test data, or we describe how to get test data below. We'll also need a ports.json file, but we'll write that as part of the tutorial. \n",
17 | "\n",
18 | "__raster:__ if you've completed the gbdxtools-tutorial.ipynb, you will likely have run the AOP task on the image with the Cat ID '10400100245B7800', and it will be stored in your customer S3 bucket under 'demo_output/aop_10400100245B7800/'. You can download this image directly from the [S3 browser](http://s3browser.geobigdata.io/login.html). \n",
19 | "\n",
20 | "__shapefile:__ download the shapefile 'this_shp_will_clip_10400100245B7800.zip' from the [github repo](https://github.com/GeoBigData/gbdx-training/tree/master/advanced_concepts) where you downloaded this notebook, and un-zip it. \n",
21 | "\n",
22 | "___\n",
23 | "## 2. Write a test ports.json file\n",
24 | "We want to mimic what GBDX will do when it spins up your Task, which is to mount input data and a ports.json file to the running Docker container. If we were to call the Doughnut Task in gbdxtools and specify the clip selection like so...\n",
25 | "\n",
26 | "```python\n",
27 | "doughnut_task = gbdx.Task(\"doughnut_clip\", input_raster='', input_shapefile='', clip_selection=\"doughnut\")\n",
28 | "```\n",
29 | "\n",
30 | "...GBDX will generate a ports.json file like so and mount it to the running Docker container:\n",
31 | "\n",
32 | "```json\n",
33 | "{\n",
34 | " \"clip_selection\":\"doughnut\"\n",
35 | "}\n",
36 | "```\n",
37 | "\n",
38 | "To test the Task code within the Task Docker, we can write a ports.json file with the same contents. "
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "#### 2.1 Navigate to the folder with your test data by filling in the full path to the data directory in the command below and executing this cell. "
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": null,
51 | "metadata": {
52 | "collapsed": true
53 | },
54 | "outputs": [],
55 | "source": [
56 | "cd "
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "metadata": {},
62 | "source": [
63 | "#### 2.2 Write a ports.json file and add the example json code just described. Or run the code in the following cell, which will write the file for you. "
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": null,
69 | "metadata": {
70 | "collapsed": true
71 | },
72 | "outputs": [],
73 | "source": [
74 | "%%writefile ports.json\n",
75 | "{\n",
76 | " \"clip_selection\":\"doughnut\"\n",
77 | "}"
78 | ]
79 | },
80 | {
81 | "cell_type": "markdown",
82 | "metadata": {},
83 | "source": [
84 | "___\n",
85 | "## 3. Mount test data to a Docker container\n",
86 | "In this next step, we will run a Docker container from the Docker image that we built in the previous string-ports.ipynb tutorial notebook. When we start the container, we'll add the Docker command to mount a data volume so that we can test the Task code within the container. \n",
87 | "\n",
88 | "#### 3.1 Start Docker and bring up a terminal window\n",
89 | "\n",
90 | "#### 3.2 Run the following Docker command to start a container with mounted test data, filling in the full path to the test raster, shapefile, and ports.json files saved on your computer, and the name of the doughnut docker image that you previously built\n",
91 | "\n",
92 | "Linux:\n",
93 | "\n",
94 | "```\n",
95 | "docker run -it --rm -v <raster directory>:/mnt/work/input/input_raster -v <shapefile directory>:/mnt/work/input/input_shapefile -v <ports.json filepath>:/mnt/work/input/ports.json <image name> bash\n",
96 | "```\n",
97 | "Windows:\n",
98 | "\n",
99 | "```\n",
100 | "docker run -it --rm -v c:\\<raster directory>:/mnt/work/input/input_raster -v c:\\<shapefile directory>:/mnt/work/input/input_shapefile -v c:\\<ports.json filepath>:/mnt/work/input/ports.json <image name> bash\n",
101 | "```\n",
102 | "\n",
103 | "For example, my Docker command looks like this:\n",
104 | "\n",
105 | "```\n",
106 | "docker run -it --rm -v /Users/elizabethgolden/Documents/test_data/input_raster:/mnt/work/input/input_raster -v /Users/elizabethgolden/Documents/test_data/input_shapefile:/mnt/work/input/input_shapefile -v /Users/elizabethgolden/Documents/test_data/ports.json:/mnt/work/input/ports.json gbdxtrainer/doughnut_docker bash\n",
107 | "```\n",
108 | "\n",
109 | "Here's a breakdown of the above Docker run command:\n",
110 | "\n",
111 | ">`docker` : the base command for the Docker CLI\n",
112 | ">\n",
113 | ">`run` : run the following commands in a new container\n",
114 | ">\n",
115 | ">`-it` : interactive mode\n",
116 | ">\n",
117 | ">`--rm` : removes container upon exit\n",
118 | ">\n",
119 | ">`-v` : bind mount a volume\n",
120 | ">\n",
121 | ">`<local path>` : the absolute filepath to the local directory where you have saved a test raster/shapefile/ports.json \n",
122 | ">\n",
123 | ">`/mnt/work/input/input_raster` : the directory where the test raster/shapefile/ports.json you specified will be copied to in the Docker container\n",
124 | ">\n",
125 | ">`<image name>` : the name of your tagged Docker image, which will likely be in the format of your Docker Hub username and the name of your repository on Docker Hub for this image\n",
126 | ">\n",
127 | ">`bash` : run container with bash prompt \n",
128 | "\n",
129 | "\n",
130 | "#### 3.3 You can now navigate within the Docker container to the mounted test data using the following command:\n",
131 | "\n",
132 | "```\n",
133 | "cd mnt/work/input\n",
134 | "```\n",
135 | "\n",
136 | "#### 3.4 You should see your test data when you list the contents of the input directory\n",
137 | "\n",
138 | "```\n",
139 | "ls\n",
140 | "```\n",
141 | "\n",
142 | "For example: \n",
143 | "\n",
144 | "```\n",
145 | "root@1539752ff9b6:/# cd mnt/work/input/\n",
146 | "root@1539752ff9b6:/mnt/work/input# ls\n",
147 | "input_raster input_shapefile ports.json\n",
148 | "```"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
155 | "___\n",
156 | "## 4. Test the Task code\n",
157 | "You're all set to test the code within the Docker container. \n",
158 | "\n",
159 | "#### 4.1 Navigate to the directory containing the doughnut_task.py script using the following command:\n",
160 | "\n",
161 | "```\n",
162 | "cd /my_scripts/\n",
163 | "```\n",
164 | "\n",
165 | "#### 4.2 Then execute the doughnut_task.py script\n",
166 | "\n",
167 | "```\n",
168 | "python doughnut_task.py\n",
169 | "```\n",
170 | "\n",
171 | "#### 4.3 When the script completes, navigate to the output directory to see the output of doughnut_task.py using the following command:\n",
172 | "\n",
173 | "```\n",
174 | "cd /mnt/work/output/\n",
175 | "```"
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "metadata": {},
181 | "source": [
182 | "## conclusion\n",
183 | "You've completed the basic steps for testing Task code within a running Docker container by mounting test data volumes. Testing your Task code in the Docker container simplifies the debugging process and is a recommended step before pushing the Docker to Docker Hub. "
184 | ]
185 | }
186 | ],
187 | "metadata": {
188 | "anaconda-cloud": {},
189 | "kernelspec": {
190 | "display_name": "Python [conda root]",
191 | "language": "python",
192 | "name": "conda-root-py"
193 | },
194 | "language_info": {
195 | "codemirror_mode": {
196 | "name": "ipython",
197 | "version": 2
198 | },
199 | "file_extension": ".py",
200 | "mimetype": "text/x-python",
201 | "name": "python",
202 | "nbconvert_exporter": "python",
203 | "pygments_lexer": "ipython2",
204 | "version": "2.7.12"
205 | }
206 | },
207 | "nbformat": 4,
208 | "nbformat_minor": 1
209 | }
210 |
--------------------------------------------------------------------------------
/workshop_prep/README.md:
--------------------------------------------------------------------------------
1 | ## intro
2 | Please follow the steps below to ensure that your system is set up and tested prior to an in-person GBDX training or workshop.
3 |
4 | By the end of this walkthrough, you should have:
5 | - GBDX credentials including your client ID and client secret
6 | - installed: Anaconda, gbdxtools, Docker
7 | - successfully run the provided test Jupyter Notebook with your GBDX credentials
8 | - signed up for a free Docker Hub account
9 | - pulled and pushed an example Docker image to Docker Hub
10 |
11 | ## to get started
12 | GBDXtools is a GBDX Python package that allows one to easily access GBDX APIs to search the DigitalGlobe catalog, and order and process imagery.
13 |
14 | The following steps walk you through an easy, proven way of installing gbdxtools and its dependencies on Mac, Windows or Linux.
15 |
16 | __1. Install Anaconda__
17 | The first required step is to install Anaconda, an open source Python distribution that simplifies package management, dependencies, and environments (we recommend this step even if you already have a Python installation). This distribution includes Conda, which you will need in order to install gbdxtools using the provided conda environment file, and Jupyter Notebook, which you'll need to run the provided Jupyter Notebooks that contain the GBDX tutorials.
18 |
19 | __1.1__ Download and install the **full version** of [Anaconda](https://www.continuum.io/downloads)
20 |
21 | Requirements:
22 | - All users:
23 | - Python 2.7 version
24 |     - *this should not interfere with other Python versions you may already have*
25 | - 64-Bit is preferred
26 | - Make sure to install the full version Anaconda and not Miniconda to ensure all required dependencies are included
27 | - Windows users:
28 | - During the installation, please check the box that sets Path values when prompted
29 | - *please do this even though it is not the default setting*
30 |
31 | __2. Install gbdxtools__
32 |
33 | We have provided a Conda environment file for easy installation of gbdxtools, along with its required dependencies, within a Conda virtual environment. Virtual environments keep the dependencies required by different projects in separate places, so that they don't interfere with each other. Please use the following Conda commands to install gbdxtools.
34 |
35 | __Mac users__:
36 |
37 | __2.1__ Download the [gbdxtools_env.yml](../workshop_prep/gbdxtools_env.yml) file from this repo
38 |
39 | __2.2__ Open a terminal window
40 |
41 | __2.3__ Update conda
42 | ```
43 | conda update conda
44 | ```
45 | __2.4__ Create the conda environment
46 | ```
47 | conda env create -f /full/path/to/gbdxtools_env.yml
48 | ```
49 | __2.5__ Activate the environment
50 | ```
51 | source activate gbdxtools
52 | ```
53 | __2.6__ Try importing all the modules (if no errors are raised, everything is working right)
54 | ```
55 | python -c "import rasterio; import fiona; import shapely; import gbdxtools"
56 | ```
57 |
58 | __Windows users__:
59 |
60 | __2.1__ Download the [gbdxtools_env.yml](../workshop_prep/gbdxtools_env.yml) file from this repo
61 |
62 | __2.2__ Open cmd.exe
63 |
64 | __2.3__ Update conda
65 | ```
66 | conda update conda
67 | ```
68 | __2.4__ Create the conda environment
69 | ```
70 | conda env create -f C:\full\path\to\gbdxtools_env.yml
71 | ```
72 | __2.5__ Activate the environment
73 | ```
74 | activate gbdxtools
75 | ```
76 | __2.6__ Try importing all the modules (if no errors are raised, everything is working right)
77 | ```
78 | python -c "import rasterio; import fiona; import shapely; import gbdxtools"
79 | ```
80 |
81 | __3. Activate your GBDX account credentials and locate your API key__
82 |
83 | __3.1__ Activate your account - once you've been assigned to your company's GBDX account (we will do this for you, please coordinate with your company's GBDX POC), you should receive an email from DigitalGlobe instructing you to activate your account. Open the message and click on "ACTIVATE YOUR ACCOUNT". This will pop up a window in your browser where you will be prompted to set a password.
84 |
85 | __3.2__ Sign in to the [GBDX Web App](https://gbdx.geobigdata.io) with your GBDX username and password.
86 |
87 | __3.3__ Find your profile - first click the user icon in the lower left corner, then your username.
88 |
89 | __3.4__ Look for the strings called “Client ID” and “Client Secret.” These are the GBDX credentials you'll need later to authenticate into GBDX using gbdxtools.
90 |
91 | __4. Start Jupyter Notebook and authenticate into GBDX__
92 |
93 | Jupyter Notebook is an open-source web application that makes it easy to create and share documents - called notebooks - that contain live code and explanatory text. When you start the app, it will launch a Python 'kernel' and the Notebook Dashboard from your browser. From here, you can open and close notebooks and manage running kernels [(documentation here)](https://jupyter.readthedocs.io/en/latest/running.html#running). Use the following instructions to start the provided gbdxtools test notebook.
94 |
95 | __4.1__ Download the file [gbdxtools-test.ipynb](../workshop_prep/gbdxtools-test.ipynb) from this repository.
96 |
97 | __4.2__ Go back to the terminal window where you have activated the gbdxtools environment and navigate to the directory where you saved gbdxtools-test.ipynb.
98 |
99 | __4.3__ Start the Jupyter Notebook app.
100 | ```
101 | jupyter notebook
102 | ```
103 | __4.4__ Click gbdxtools-test.ipynb to launch the notebook.
104 |
105 | __4.5__ Switch to the gbdxtools environment kernel:
106 | - select the "Kernel" dropdown menu from the toolbar
107 | - hover over "Change Kernel"
108 | - select Python [conda env:gbdxtools]
109 |
110 | __4.6__ Follow the directions that demonstrate how to run code in a notebook and test the GBDX credentials you retrieved in step 3.
111 |
112 | __Docker__
113 |
114 | Docker is a software containerization technology that makes it easy to share software in lightweight, portable packages. We use Docker technology to package up custom algorithms and run them against imagery in the cloud.
115 |
116 | __5. Sign up for a Docker Hub account__
117 |
118 | __5.1__ Sign up for a free Docker Hub account at https://hub.docker.com/
119 |
120 | __6. Install Docker and test__
121 |
122 | __6.1__ Install Docker from https://www.docker.com/ (some Windows users may need to install Docker Engine)
123 |
124 | __6.2__ Start the Docker app.
125 |
126 | __6.3__ Bring up a terminal/cmd window. The following Docker command will return your Docker version.
127 | ```
128 | docker --version
129 | ```
130 |
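If `docker --version` reports "command not found", the CLI may simply not be on your PATH; this small Python 3 sketch (illustrative only) checks that without invoking Docker:

```python
import shutil

# shutil.which returns the executable's path, or None if it is not on PATH
docker_path = shutil.which("docker")

if docker_path is None:
    print("docker CLI not found on PATH")
else:
    print("docker CLI found at", docker_path)
```
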
131 | __6.4__ Test pull a Docker image: Use the following command to pull the miniconda image from Docker Hub. Miniconda is a lightweight alternative to Anaconda that provides just the Conda package manager and its dependencies, in a Docker image.
132 | ```
133 | docker pull continuumio/miniconda
134 | ```
135 | __6.5__ Confirm that the image downloaded; you should see it listed when you use the following command.
136 | ```
137 | docker images
138 | ```
139 | __6.6__ Tag the image so that you can edit it and push it to your own repository. Replace `<your username>` with your Docker Hub username.
140 | ```
141 | docker tag continuumio/miniconda <your username>/miniconda-docker
142 | ```
143 | 
144 | __6.7__ Test push a Docker image: first, log in to Docker Hub using your Docker Hub username and password.
145 | ```
146 | docker login -u <your username> -p <your password>
147 | ```
148 | 
149 | __6.8__ Test push a Docker image: Use the following command to push your tagged image to your Docker Hub account. Remember to replace `<your username>` with your Docker Hub username. This may take a few minutes, but the image should then appear as a repository on Docker Hub under your account.
150 | ```
151 | docker push <your username>/miniconda-docker
152 | ```
153 |
154 | __7. Shut it all down__
155 |
156 | When you are finished with this tutorial, shut down the Jupyter Notebook and kernel, and deactivate the Conda environment where you installed gbdxtools.
157 |
158 | __7.1__ Stop the Jupyter Notebook by typing `CONTROL + C`, then `'Y'` to confirm that you would like to shut down the notebook server.
159 |
160 | __Mac users__:
161 |
162 | __7.2__ Deactivate the virtual environment
163 | ```
164 | source deactivate
165 | ```
166 | __7.3__ (Optional) Remove the environment
167 | ```
168 | conda remove --name gbdxtools --all
169 | ```
170 |
171 | __Windows users__:
172 |
173 | __7.2__ Deactivate the virtual environment
174 | ```
175 | deactivate gbdxtools
176 | ```
177 | __7.3__ (Optional) Remove the environment
178 |
179 | ```
180 | conda remove --name gbdxtools --all
181 | ```
182 | ___
183 |
184 | __8. Optional .gbdx-config file:__
185 |
186 | Save a gbdxtools config file to your home directory with your GBDX credentials. This will allow you to authenticate a gbdxtools session without needing to enter your credentials each time.
187 |
188 | __8.1__ You will need your GBDX credentials again (located within your profile information in the [GBDX Web App](https://gbdx.geobigdata.io/profile)).
189 |
190 | __8.2__ Create a blank text file and copy and paste the following information:
191 | ```
192 | [gbdx]
193 | auth_url = https://geobigdata.io/auth/v1/oauth/token/
194 | client_id =
195 | client_secret =
196 | user_name =
197 | user_password =
198 | ```
199 | __8.3__ Fill in `client_id` and `client_secret` with your GBDX credentials, and `user_name` and `user_password` with the username and password associated with your GBDX account.
200 |
201 | __8.4__ Save this file in your home directory with the filename `.gbdx-config`
202 | - be sure to include the `.` at the beginning of the filename
203 | - the filename needs to be saved without an extension
204 |
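Because Windows editors tend to append a hidden '.txt' extension, writing the file from Python is a reliable alternative. The sketch below (a hypothetical helper, not part of gbdxtools) builds the skeleton shown above and prints the destination path, with the actual write left commented out so you can fill in your credentials first:

```python
import os

# Skeleton matching the template above; fill in your own values
config_text = (
    "[gbdx]\n"
    "auth_url = https://geobigdata.io/auth/v1/oauth/token/\n"
    "client_id = \n"
    "client_secret = \n"
    "user_name = \n"
    "user_password = \n"
)

# Destination: home directory, filename with a leading dot and no extension
path = os.path.join(os.path.expanduser("~"), ".gbdx-config")
print(path)

# Uncomment to write the file once the values are filled in:
# with open(path, "w") as f:
#     f.write(config_text)
```
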
205 | __8.5__ Test your gbdxtools installation and config file:
206 | - Open a terminal/cmd window and type `python`
207 | - Copy and paste the following code, which will result in an error if it can't locate your config file or if the formatting is incorrect
208 |
209 | ```python
210 | from gbdxtools import Interface
211 | gbdx = Interface()
212 | ```
213 | - Once you are done, quit Python by typing `exit()`
214 |
215 |
216 | Config file troubleshooting:
217 |
218 | Windows: select the filename and try to delete the '.txt' extension. You may need to save it as `.gbdx-config.` (notice the trailing period). Also, deselect the 'hide extensions' setting on your computer so you can verify if you've successfully deleted the extension.
219 |
220 | Mac: select the config file, right-click and select 'Get Info' to verify that you've saved the file without an extension.
221 |
222 | ___
223 |
224 | We would love to hear your feedback. Feel free to email GBDX-support@digitalglobe.com with comments and suggestions.
225 |
--------------------------------------------------------------------------------
/advanced_concepts/string-ports-tutorial.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## intro\n",
8 | "In the custom-task-tutorial notebook, we demonstrated how to create a Task that takes file-based data as an input to the Task code. But what if the user needs to pass in a parameter when they use the tool? This is where you would use a 'string' port versus a 'directory' port. This tutorial will demonstrate how to write the Task code to read input from a string port, how to register a Task with string ports, and an example of how the string port is used when calling the task with gbdxtools. \n",
9 | "\n",
10 | "For this tutorial, it will be helpful to have completed the custom-task-tutorial notebook first, as this tutorial will build off of it. We are going to take the Task that we wrote in that tutorial, which clips a raster to a shapefile, and add additional functionality to it. We're going to add a parameter to the Task which allows the user to specify which portion of the raster will be removed, either the portion of the raster that falls outside the shapefile or that which falls within the shapefile. For the sake of clarity, and a little brevity, we're going to call this the Doughnut Task, in which a user can select between doughnut and doughnut-hole for the clip selection. If 'doughnut' is specified in the call to this Task within a Workflow, then the inner portion of the raster will be removed, and the outer portion will be retained. If 'doughnut-hole' is specified, then the outer portion of the raster will be removed, and the inner portion retained."
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": [
17 | "### string ports: aop task example\n",
18 | "In order to understand how string ports are handled in GBDX, let's review a common Task already on GBDX, the Advanced Image Preprocessor Task. This is the Task that orthorectifies the raw imagery that comes off of the satellites, and handles image preprocessing options such as atmospheric compensation and pan sharpening. \n",
19 | "\n",
20 | "To review, we call this Task in gbdxtools using its registered Task name, \"AOP_Strip_Processor\", and then set its required inputs. This Task has only one required input, \"data\", which is the S3 URL of the raw image to be processed. \n",
21 | "\n",
22 | "```python\n",
23 | "source_s3 = 's3://receiving-dgcs-tdgplatform-com/056244928010_01_003'\n",
24 | "aop_task = gbdx.Task('AOP_Strip_Processor', data=source_s3)\n",
25 | "```"
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": [
32 | "Although this Task only has one required input, it has several optional inputs that allow the user to specify exactly how the image should be processed ([documentation here](https://gbdxdocs.digitalglobe.com/docs/advanced-image-preprocessor)). The user can specify that the image be pan sharpened, have dynamic range adjustment applied, specify the projection for orthorectification, etc. \n",
33 | "\n",
34 | "Here's an example that explicitly specifies additional processing steps.\n",
35 | "\n",
36 | "```python\n",
37 | "aop_task = gbdx.Task('AOP_Strip_Processor', data=source_s3, enable_acomp=True, enable_pansharpen=False, enable_dra=False)\n",
38 | "```"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "Let's compare the types of data input here. For the input port '`data`', we're passing in file-based data - the raw image that is stored in the S3 directory we've specified. When the Task runs, the specified input data will be automatically mounted inside the Docker container, at `/mnt/work/input/data`.\n",
46 | "\n",
47 | "But for the parameters that are passed in via string ports, a ports.json file is automatically generated and mounted inside the Docker container, at `/mnt/work/input/ports.json`. The ports.json file contains the name/value pairs for each string port. Here's an example of the type of ports.json file that is automatically generated when our above AOP task is called. \n",
48 | "\n",
49 | "```json\n",
50 | "{\n",
51 | "\t\"enable_acomp\": true,\n",
52 | "\t\"enable_pansharpen\": false,\n",
53 | "\t\"enable_dra\": false\n",
54 | "}\n",
55 | "```"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {},
61 | "source": [
62 | "### string ports: doughnut task example\n",
63 | "Now, let's take this concept and apply it to our example Doughnut Task. We've already discussed how the code needs to point to `/mnt/work/input/` for its input data. Let's look at how we can point the Task code to parameter inputs read from the ports.json file. First, here's an example calling the Doughnut Task in gbdxtools:\n",
64 | "\n",
65 | "```python\n",
66 | "source_s3 = 's3://receiving-dgcs-tdgplatform-com/056244928010_01_003'\n",
67 | "doughnut_task = gbdx.Task('doughnut_clip', data_in=source_s3, clip_selection='doughnut')\n",
68 | "```\n",
69 | "\n",
70 | "When this Task runs with the Workflow system, the input image will automatically be mounted at `/mnt/work/input/data_in`, and a ports.json file will be automatically generated and mounted at `/mnt/work/input/ports.json`. This is what the ports.json file will contain:\n",
71 | "\n",
72 | "```json\n",
73 | "{\n",
74 | " \"clip_selection\":\"doughnut\"\n",
75 | "}\n",
76 | "```"
77 | ]
78 | },
79 | {
80 | "cell_type": "markdown",
81 | "metadata": {},
82 | "source": [
83 | "Now, let's modify the Task code to read its parameter input from the ports.json file. The first part of the Doughnut Task script will be the same as the Clip Raster Task script from the previous tutorial. \n",
84 | "\n",
85 | "> We need to import the raster and shapefile processing libraries, and pull the raster and shapefile from their associated input ports. \n",
86 | "\n",
87 | "```python\n",
88 | "# import libraries \n",
89 | "import fiona\n",
90 | "import rasterio\n",
91 | "import rasterio.mask\n",
92 | "import os\n",
93 | "import glob\n",
"import json\n",
94 | "\n",
95 | "# set the input ports path\n",
96 | "in_path = '/mnt/work/input'\n",
97 | "shape_path = in_path + '/input_shapefile'\n",
98 | "raster_path = in_path + '/input_raster'\n",
99 | "\n",
100 | "# search the input shapefile port for the first shapefile that we specify in the call to this task\n",
101 | "my_shape = glob.glob1(shape_path, '*.shp')[0]\n",
102 | "\n",
103 | "# search the input image port for the first geotiff that we specify in the call to this task\n",
104 | "my_raster = glob.glob1(raster_path, '*.tif')[0]\n",
105 | "```\n",
106 | "\n",
107 | "> Next, we open the ports.json file and load its contents \n",
108 | "\n",
109 | "```python\n",
110 | "with open('/mnt/work/input/ports.json') as portsfile:\n",
111 | " ports_js = json.load(portsfile)\n",
112 | "```\n",
113 | "\n",
114 | "> Assign the value of 'clip_selection' to the `clip_selection` variable\n",
115 | "\n",
116 | "```python\n",
117 | "clip_selection = ports_js['clip_selection']\n",
118 | "```\n",
119 | "\n",
120 | "> Then we write some simple logic that sets the default clip parameters, which is what we want if the user selects `doughnut-hole`, but then reverses the values if the user selects `doughnut`\n",
121 | "\n",
122 | "```python\n",
123 | "# set default clip parameters, reverse values if clip_selection is 'doughnut'\n",
124 | "invert_method = False\n",
125 | "crop_method = True\n",
126 | "\n",
127 | "if clip_selection == 'doughnut':\n",
128 | " invert_method = True\n",
129 | " crop_method = False\n",
130 | "```\n",
131 | "\n",
132 | "> The following code is identical to our original Clip Task code, wherein we define the output port and its filepath and navigate to the output directory, then use the Fiona library to grab the geometry from the shapefile. \n",
133 | "\n",
134 | "```python\n",
135 | "# define the name of the output data port\n",
136 | "out_path = '/mnt/work/output/data_out'\n",
137 | "\n",
138 | "# create the output data port\n",
139 | "if os.path.exists(out_path) == False:\n",
140 | " os.makedirs(out_path)\n",
141 | "\n",
142 | "# change directories to the output data port\n",
143 | "os.chdir(out_path)\n",
144 | "\n",
145 | "# open the input shapefile and get the polygon features for clipping\n",
146 | "with fiona.open(os.path.join(shape_path, my_shape), \"r\") as shapefile:\n",
147 | " features = [feature[\"geometry\"] for feature in shapefile]\n",
148 | "```\n",
149 | "\n",
150 | "> This next code block is similar to the Clip Task code that uses the Rasterio library to clip the raster and copy its metadata, but we add the '`invert`' parameter and pass in the '`invert_method`' we defined above, which is `True` if the user specifies '`doughnut`' and `False` if they specify '`doughnut-hole`'. If `invert_method` is True, `crop` must be False. \n",
151 | "\n",
152 | "```python\n",
153 | "# open the input image, clip the image with the shapefile and get the image metadata\n",
154 | "with rasterio.open(os.path.join(raster_path, my_raster)) as src:\n",
155 | " out_raster, out_transform = rasterio.mask.mask(src, features, crop=crop_method, invert=invert_method)\n",
156 | " out_meta = src.meta.copy()\n",
157 | "```\n",
158 | "\n",
159 | "> Finally, write out the metadata to the new raster, 'masked.tif', as we did before\n",
160 | "\n",
161 | "```python\n",
162 | "# write out the metadata to the raster\n",
163 | "out_meta.update({\"driver\": \"GTiff\",\n",
164 | " \"height\": out_raster.shape[1],\n",
165 | " \"width\": out_raster.shape[2],\n",
166 | " \"transform\": out_transform})\n",
167 | "\n",
168 | "# write out the output raster\n",
169 | "with rasterio.open(\"masked.tif\", \"w\", **out_meta) as dest:\n",
170 | " dest.write(out_raster)\n",
171 | "```"
172 | ]
173 | },
174 | {
175 | "cell_type": "markdown",
176 | "metadata": {},
177 | "source": [
178 | "___\n",
    179 | "Now that we understand a little more about how to use string ports, let's write and register the new Doughnut Task to the GBDX registry, following the same sequence of steps that we used in the Clip Task registration tutorial."
180 | ]
181 | },
182 | {
183 | "cell_type": "markdown",
184 | "metadata": {},
185 | "source": [
186 | "## 1. Inputs and outputs\n",
    187 | "We've already covered the Doughnut Task code. In the next few steps, we'll set up a directory, then write and save the Task code there. \n",
188 | "\n",
189 | "#### 1.1 Run the code in the following cell to create the 'doughnut_tutorial_files/docker_projects/bin' directory."
190 | ]
191 | },
192 | {
193 | "cell_type": "code",
194 | "execution_count": 1,
195 | "metadata": {
196 | "collapsed": false
197 | },
198 | "outputs": [],
199 | "source": [
200 | "import os\n",
201 | "if os.path.exists('doughnut_tutorial_files') == False:\n",
202 | " os.makedirs('doughnut_tutorial_files/docker_projects/bin')"
203 | ]
204 | },
205 | {
206 | "cell_type": "markdown",
207 | "metadata": {},
208 | "source": [
209 | "#### 1.2 Run the code in the following cell to navigate to the directory you just created."
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": 2,
215 | "metadata": {
216 | "collapsed": false
217 | },
218 | "outputs": [
219 | {
220 | "name": "stdout",
221 | "output_type": "stream",
222 | "text": [
223 | "/Users/elizabethgolden/Dropbox/DG/notebooks/doughnut_tutorial_files/docker_projects/bin\n"
224 | ]
225 | }
226 | ],
227 | "source": [
228 | "cd doughnut_tutorial_files/docker_projects/bin"
229 | ]
230 | },
231 | {
232 | "cell_type": "markdown",
233 | "metadata": {},
234 | "source": [
235 | "#### 1.3 Run the code in the following cell to write the code that we just reviewed to 'doughnut_task.py'."
236 | ]
237 | },
238 | {
239 | "cell_type": "code",
240 | "execution_count": 3,
241 | "metadata": {
242 | "collapsed": false
243 | },
244 | "outputs": [
245 | {
246 | "name": "stdout",
247 | "output_type": "stream",
248 | "text": [
249 | "Overwriting doughnut_task.py\n"
250 | ]
251 | }
252 | ],
253 | "source": [
254 | "%%writefile doughnut_task.py\n",
255 | "\n",
256 | "# import libraries\n",
257 | "import fiona\n",
258 | "import rasterio\n",
259 | "import rasterio.mask\n",
260 | "import json\n",
261 | "import os\n",
262 | "import glob\n",
263 | "\n",
264 | "# set the input ports path\n",
265 | "in_path = '/mnt/work/input'\n",
266 | "shape_path = in_path + '/input_shapefile'\n",
267 | "raster_path = in_path + '/input_raster'\n",
268 | "\n",
269 | "# search the input shapefile port for the first shapefile that we specify in the call to this task\n",
270 | "my_shape = glob.glob1(shape_path, '*.shp')[0]\n",
271 | "\n",
272 | "# search the input image port for the first geotiff that we specify in the call to this task\n",
273 | "my_raster = glob.glob1(raster_path, '*.tif')[0]\n",
274 | "\n",
275 | "# open and load the contents of ports.json\n",
276 | "with open('/mnt/work/input/ports.json') as portsfile:\n",
277 | " ports_js = json.load(portsfile)\n",
278 | "\n",
279 | "# assign the value from 'crop_selection'\n",
280 | "crop_select = ports_js['clip_selection']\n",
281 | "\n",
282 | "# set default clip parameters, reverse the values if clip_selection is 'doughnut'\n",
283 | "invert_method = False\n",
284 | "crop_method = True\n",
285 | "\n",
286 | "if crop_select == 'doughnut':\n",
287 | " invert_method = True\n",
288 | " crop_method = False\n",
289 | "\n",
290 | "# define the name of the output data port\n",
291 | "out_path = '/mnt/work/output/data_out'\n",
292 | "\n",
293 | "# create the output data port\n",
294 | "if os.path.exists(out_path) == False:\n",
295 | " os.makedirs(out_path)\n",
296 | "\n",
297 | "# change directories to the output data port\n",
298 | "os.chdir(out_path)\n",
299 | "\n",
300 | "# open the input shapefile and get the polygon features for clipping\n",
301 | "with fiona.open(os.path.join(shape_path, my_shape), \"r\") as shapefile:\n",
302 | " features = [feature[\"geometry\"] for feature in shapefile]\n",
303 | "\n",
304 | "# open the input image, clip the image with the shapefile and get the image metadata\n",
305 | "with rasterio.open(os.path.join(raster_path, my_raster)) as src:\n",
306 | " out_raster, out_transform = rasterio.mask.mask(src, features, crop=crop_method, invert=invert_method)\n",
307 | " out_meta = src.meta.copy()\n",
308 | "\n",
309 | "# write out the metadata to the raster\n",
310 | "out_meta.update({\"driver\": \"GTiff\",\n",
311 | " \"height\": out_raster.shape[1],\n",
312 | " \"width\": out_raster.shape[2],\n",
313 | " \"transform\": out_transform})\n",
314 | "\n",
315 | "# write out the output raster\n",
316 | "with rasterio.open(\"masked.tif\", \"w\", **out_meta) as dest:\n",
317 | " dest.write(out_raster)"
318 | ]
319 | },
320 | {
321 | "cell_type": "markdown",
322 | "metadata": {},
323 | "source": [
324 | "___\n",
325 | "## 2. Dockerfile\n",
    326 | "We're going to write a Dockerfile the same as before, only this time we're wrapping up the new Doughnut Task script. \n",
327 | "\n",
328 | "#### 2.1 Run the code in the following cell to navigate back one folder to /docker_projects."
329 | ]
330 | },
331 | {
332 | "cell_type": "code",
333 | "execution_count": 4,
334 | "metadata": {
335 | "collapsed": false
336 | },
337 | "outputs": [
338 | {
339 | "name": "stdout",
340 | "output_type": "stream",
341 | "text": [
342 | "/Users/elizabethgolden/Dropbox/DG/notebooks/doughnut_tutorial_files/docker_projects\n"
343 | ]
344 | }
345 | ],
346 | "source": [
347 | "cd .."
348 | ]
349 | },
350 | {
351 | "cell_type": "markdown",
352 | "metadata": {},
353 | "source": [
354 | "#### 2.2 Run the code in the following cell to write '`Dockerfile`', note that we're placing the new '`doughnut_task.py`' script here. "
355 | ]
356 | },
357 | {
358 | "cell_type": "code",
359 | "execution_count": 5,
360 | "metadata": {
361 | "collapsed": false
362 | },
363 | "outputs": [
364 | {
365 | "name": "stdout",
366 | "output_type": "stream",
367 | "text": [
368 | "Overwriting Dockerfile\n"
369 | ]
370 | }
371 | ],
372 | "source": [
373 | "%%writefile Dockerfile\n",
374 | "FROM continuumio/miniconda\n",
375 | "\n",
376 | "RUN conda install rasterio\n",
377 | "RUN conda install fiona\n",
378 | "\n",
379 | "RUN mkdir /my_scripts\n",
380 | "ADD ./bin /my_scripts\n",
381 | "CMD python /my_scripts/doughnut_task.py"
382 | ]
383 | },
384 | {
385 | "cell_type": "markdown",
386 | "metadata": {},
387 | "source": [
388 | "___\n",
389 | "## 3. Build your Docker\n",
390 | "Next, build the new Docker image as we did for the Clip Raster Task, but for the Doughnut Task. \n",
391 | "\n",
392 | "NOTE: AT THIS POINT IN THE TUTORIAL, WE'RE GOING TO LEAVE THE JUPYTER NOTEBOOK AND SWITCH TO DOCKER\n",
393 | "\n",
394 | "#### 3.1 Bring up a terminal window on your computer and start Docker.\n",
395 | "\n",
396 | "#### 3.2 Within the terminal window, navigate to the folder containing the Dockerfile, under 'doughnut_tutorial_files/docker_projects'.\n",
397 | "\n",
398 | "```\n",
399 | "cd /doughnut_tutorial_files/docker_projects\n",
400 | "```"
401 | ]
402 | },
403 | {
404 | "cell_type": "markdown",
405 | "metadata": {},
406 | "source": [
407 | "#### 3.3 Copy and paste the following Docker command to build a Docker from your Dockerfile, but __FIRST REPLACE 'gbdxtrainer' WITH YOUR DOCKER USERNAME__. \n",
408 | "\n",
409 | "```\n",
410 | "docker build -t gbdxtrainer/doughnut_docker .\n",
411 | "```"
412 | ]
413 | },
414 | {
415 | "cell_type": "markdown",
416 | "metadata": {},
417 | "source": [
418 | "## 4. Push your Docker to Docker Hub\n",
419 | "\n",
420 | "#### 4.1 While still within the terminal window, log in to Docker Hub using the following Docker command USING YOUR DOCKER HUB LOGIN CREDENTIALS. \n",
421 | "```\n",
422 | "docker login --username gbdxtrainer --password a_fake_password\n",
423 | "```"
424 | ]
425 | },
426 | {
427 | "cell_type": "markdown",
428 | "metadata": {},
429 | "source": [
    430 | "#### 4.2 Once logged in, use the following Docker command to push your Docker image to Docker Hub, FIRST CHANGING 'gbdxtrainer' TO YOUR DOCKER USERNAME. Note: this may take a few minutes.\n",
431 | "```\n",
432 | "docker push gbdxtrainer/doughnut_docker\n",
433 | "```\n",
434 | "___"
435 | ]
436 | },
437 | {
438 | "cell_type": "markdown",
439 | "metadata": {},
440 | "source": [
441 | "## 5. Add GBDX collaborators to your Docker Hub repository\n",
442 | "Your Docker repository on Docker Hub can be public or private, but certain GBDX collaborators must be added to the repository in order for the Platform to pull and run the Docker. \n",
443 | "\n",
444 | "#### 5.1 Log in to Docker Hub https://hub.docker.com/\n",
445 | "\n",
    446 | "You should now see the Docker image that you just pushed to Docker Hub, in its own repository of the same name. \n",
447 | "\n",
448 | "#### 5.2 Open the repository and select the 'Collaborators' tab. Under 'Username', enter each of the following as Collaborators to your repository. This is what will allow GBDX to pull and execute your Task. \n",
449 | "```\n",
450 | "tdgpbuild\n",
451 | "tdgpdeploy\n",
452 | "tdgplatform\n",
453 | "```\n",
454 | "___"
455 | ]
456 | },
457 | {
458 | "cell_type": "markdown",
459 | "metadata": {},
460 | "source": [
    461 | "## 6. Task definition \n",
462 | "NOTE: WE'RE BACK TO THE JUPYTER NOTEBOOK FOR THE REST OF THE TUTORIAL\n",
463 | "\n",
    464 | "We are going to write a Task definition schema as we did before, only this time we add the new string input port, as shown here. Note that the 'type' for this input port is 'string', not 'directory'. \n",
465 | "\n",
466 | "```json\n",
467 | "{\n",
468 | " \"required\": true,\n",
    469 | " \"description\": \"Which part of the raster to retain when clipped. Options are 'doughnut' and 'doughnut-hole'.\",\n",
470 | " \"name\": \"clip_selection\",\n",
471 | " \"type\": \"string\"\n",
472 | " }\n",
473 | "```\n",
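    474 | "\n",
    475 | "A malformed definition file is a common cause of registration failures, so it can be worth loading the finished file locally before registering. This is only a quick sketch - `check_definition` is a hypothetical helper, to be run after step 6.2 writes the file:\n",
    476 | "\n",
    477 | "```python\n",
    478 | "import json\n",
    479 | "\n",
    480 | "def check_definition(path):\n",
    481 | "    # json.load raises ValueError here if the JSON is malformed\n",
    482 | "    with open(path) as f:\n",
    483 | "        defn = json.load(f)\n",
    484 | "    # confirm the new string port made it into the definition\n",
    485 | "    assert any(p['type'] == 'string' for p in defn['inputPortDescriptors'])\n",
    486 | "    return defn\n",
    487 | "\n",
    488 | "# usage (after step 6.2):\n",
    489 | "# check_definition('doughnut-task-definition.json')\n",
    490 | "```\n",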
474 | "\n",
475 | "#### 6.1 Run the code in the following cell to navigate back one directory (back out of the '`/docker_projects`' directory to the '`/task_tutorial_files`' directory)."
476 | ]
477 | },
478 | {
479 | "cell_type": "code",
480 | "execution_count": 6,
481 | "metadata": {
482 | "collapsed": false
483 | },
484 | "outputs": [
485 | {
486 | "name": "stdout",
487 | "output_type": "stream",
488 | "text": [
489 | "/Users/elizabethgolden/Dropbox/DG/notebooks/doughnut_tutorial_files\n"
490 | ]
491 | }
492 | ],
493 | "source": [
494 | "cd .."
495 | ]
496 | },
497 | {
498 | "cell_type": "markdown",
499 | "metadata": {},
500 | "source": [
501 | "#### 6.2 MODIFY THE DOCKER IMAGE AND TASK NAME WITH YOURS, then run the code in the following cell to write the full JSON document that we just reviewed to doughnut-task-definition.json."
502 | ]
503 | },
504 | {
505 | "cell_type": "code",
506 | "execution_count": 7,
507 | "metadata": {
508 | "collapsed": false
509 | },
510 | "outputs": [
511 | {
512 | "name": "stdout",
513 | "output_type": "stream",
514 | "text": [
515 | "Overwriting doughnut-task-definition.json\n"
516 | ]
517 | }
518 | ],
519 | "source": [
520 | "%%writefile doughnut-task-definition.json\n",
521 | "{\n",
522 | " \"inputPortDescriptors\": [{\n",
523 | " \"required\": true,\n",
524 | " \"description\": \"Directory containing a raster.\",\n",
525 | " \"name\": \"input_raster\",\n",
526 | " \"type\": \"directory\"\n",
527 | " }, {\n",
528 | " \"required\": true,\n",
529 | " \"description\": \"Directory containing a shapefile\",\n",
530 | " \"name\": \"input_shapefile\",\n",
531 | " \"type\": \"directory\"\n",
532 | " }, {\n",
533 | " \"required\": true,\n",
    534 | " \"description\": \"Which part of the raster to retain when clipped. Options are 'doughnut' and 'doughnut-hole'.\",\n",
535 | " \"name\": \"clip_selection\",\n",
536 | " \"type\": \"string\"\n",
537 | " }],\n",
538 | " \"outputPortDescriptors\": [{\n",
539 | " \"required\": true,\n",
540 | " \"description\": \"A cropped tif.\",\n",
541 | " \"name\": \"data_out\",\n",
542 | " \"type\": \"directory\"\n",
543 | " }],\n",
544 | " \"containerDescriptors\": [{\n",
545 | " \"type\": \"DOCKER\",\n",
546 | " \"command\": \"\",\n",
547 | " \"properties\": {\n",
548 | " \"image\": \"gbdxtrainer/doughnut_docker:latest\"\n",
549 | " }\n",
550 | " }],\n",
551 | " \"description\": \"Clips a raster to shapefile, clip selection can be inverted.\",\n",
552 | " \"name\": \"doughnut_clip_gt\",\n",
553 | " \"version\": \"1.0.3\",\n",
554 | " \"properties\": {\n",
555 | " \"isPublic\": false,\n",
556 | " \"timeout\": 36000\n",
557 | " }\n",
558 | "}"
559 | ]
560 | },
561 | {
562 | "cell_type": "markdown",
563 | "metadata": {},
564 | "source": [
565 | "___\n",
566 | "## 7. Register Task\n",
567 | "All of the pieces are in place to register the new Doughnut Task to the Platform using gbdxtools. \n",
568 | "\n",
    569 | "#### 7.1 Fill in your GBDX username, password, client ID and client secret in the following cell. This information can be found under your Profile information at https://gbdx.geobigdata.io/profile. If you have a GBDX config file, you can uncomment and use the first two lines of code to authenticate into GBDX."
570 | ]
571 | },
572 | {
573 | "cell_type": "code",
574 | "execution_count": 8,
575 | "metadata": {
576 | "collapsed": true
577 | },
578 | "outputs": [],
579 | "source": [
580 | "# from gbdxtools import Interface\n",
581 | "# gbdx = Interface()\n",
582 | "\n",
583 | "import gbdxtools\n",
584 | "gbdx = gbdxtools.Interface(\n",
585 | " username='',\n",
586 | " password='',\n",
587 | " client_id='',\n",
588 | " client_secret='')"
589 | ]
590 | },
591 | {
592 | "cell_type": "markdown",
593 | "metadata": {},
594 | "source": [
595 | "#### 7.2 Run the code in the following cell to submit your Task to the Task registry."
596 | ]
597 | },
598 | {
599 | "cell_type": "code",
600 | "execution_count": 9,
601 | "metadata": {
602 | "collapsed": false
603 | },
604 | "outputs": [
605 | {
606 | "data": {
607 | "text/plain": [
608 | "u'doughnut_clip_gt:1.0.3 has been submitted for registration.'"
609 | ]
610 | },
611 | "execution_count": 9,
612 | "metadata": {},
613 | "output_type": "execute_result"
614 | }
615 | ],
616 | "source": [
617 | "gbdx.task_registry.register(json_filename = 'doughnut-task-definition.json')"
618 | ]
619 | },
620 | {
621 | "cell_type": "markdown",
622 | "metadata": {},
623 | "source": [
    624 | "#### 7.3 Wait a few minutes, then see if the Task registration has completed by running the code in the following cell to create an instance of your Task. FIRST REPLACE 'gt' IN THE TASK NAME WITH YOUR INITIALS. "
625 | ]
626 | },
627 | {
628 | "cell_type": "code",
629 | "execution_count": 10,
630 | "metadata": {
631 | "collapsed": false
632 | },
633 | "outputs": [],
634 | "source": [
635 | "doughnut_task = gbdx.Task(\"doughnut_clip_gt\")"
636 | ]
637 | },
638 | {
639 | "cell_type": "markdown",
640 | "metadata": {},
641 | "source": [
642 | "___\n",
643 | "## 8. Workflow\n",
    644 | "The last step is to test the Doughnut Task in a Workflow with gbdxtools. This Workflow is identical to the Clip Task Workflow from before, but uses the Doughnut Task's registered name and its clip selection parameter. \n",
645 | "\n",
646 | "#### 8.1 Run the code in the following cell to execute a Workflow using the new Doughnut Task. FIRST REPLACE 'gt' IN THE TASK NAME WITH YOUR INITIALS."
647 | ]
648 | },
649 | {
650 | "cell_type": "code",
651 | "execution_count": 11,
652 | "metadata": {
653 | "collapsed": false
654 | },
655 | "outputs": [
656 | {
657 | "name": "stdout",
658 | "output_type": "stream",
659 | "text": [
660 | "4629530075806175553\n"
661 | ]
662 | }
663 | ],
664 | "source": [
665 | "# define the S3 path for an image by passing in its Catalog ID \n",
666 | "source_s3 = gbdx.catalog.get_data_location(catalog_id='10400100245B7800')\n",
667 | "\n",
668 | "# define an input shapefile from S3\n",
669 | "shape_path = 's3://tutorial-files/this_shp_will_clip_10400100245B7800/'\n",
670 | "\n",
671 | "# define the 'AOP_Strip_Processor' \n",
672 | "aop_task = gbdx.Task('AOP_Strip_Processor', data=source_s3, enable_pansharpen=True)\n",
673 | "\n",
674 | "# define the 'gdal_cli' Task\n",
675 | "glue_task = gbdx.Task('gdal-cli', data=aop_task.outputs.data.value, execution_strategy='runonce',\n",
676 | " command=\"\"\"mv $indir/*/*.tif $outdir/\"\"\")\n",
677 | "\n",
678 | "# define the 'clip_raster' Task \n",
679 | "doughnut_task = gbdx.Task(\"doughnut_clip_gt\", input_raster=glue_task.outputs.data.value, input_shapefile=shape_path, clip_selection=\"doughnut\")\n",
680 | "\n",
681 | "# build a Workflow to run the 'clip_raster' Task\n",
682 | "workflow = gbdx.Workflow([aop_task, glue_task, doughnut_task])\n",
683 | "\n",
684 | "# specify where to save the output within your customer bucket\n",
685 | "workflow.savedata(doughnut_task.outputs.data_out, location='task_demo/doughnut')\n",
686 | "\n",
687 | "# kick off the Workflow and keep track of the Workflow ID\n",
688 | "workflow.execute()\n",
689 | "print workflow.id"
690 | ]
691 | },
692 | {
693 | "cell_type": "markdown",
694 | "metadata": {},
695 | "source": [
696 | "GBDX is now running your Workflow. While the Workflow is running, you can interact with the Workflow object and track its status. \n",
697 | "\n",
    698 | "#### 8.2 Run the code in the following cell to get the status of the Workflow. This call will return the status of whatever event is currently underway."
699 | ]
700 | },
701 | {
702 | "cell_type": "code",
703 | "execution_count": 15,
704 | "metadata": {
705 | "collapsed": false
706 | },
707 | "outputs": [
708 | {
709 | "data": {
710 | "text/plain": [
711 | "{u'event': u'succeeded', u'state': u'complete'}"
712 | ]
713 | },
714 | "execution_count": 15,
715 | "metadata": {},
716 | "output_type": "execute_result"
717 | }
718 | ],
719 | "source": [
720 | "workflow.status"
721 | ]
722 | },
723 | {
724 | "cell_type": "markdown",
725 | "metadata": {},
726 | "source": [
727 | "Once your Workflow has completed (and succeeded!), you will be able to see the output in your customer S3 bucket.\n",
728 | "\n",
729 | "NOTE: AT THIS POINT IN THE TUTORIAL, WE'RE GOING TO LEAVE THE JUPYTER NOTEBOOK AND SWITCH TO THE S3 BROWSER \n",
730 | "\n",
731 | "#### 8.3 Log into the S3 browser [http://s3browser.geobigdata.io](http://s3browser.geobigdata.io/login.html) using your GBDX credentials. \n",
732 | "\n",
733 | "#### 8.4 Navigate to 'task_demo/doughnut' to find the saved output of your Workflow. "
734 | ]
735 | },
736 | {
737 | "cell_type": "markdown",
738 | "metadata": {},
739 | "source": [
740 | "The end!"
741 | ]
742 | }
743 | ],
744 | "metadata": {
745 | "anaconda-cloud": {},
746 | "kernelspec": {
747 | "display_name": "Python [conda root]",
748 | "language": "python",
749 | "name": "conda-root-py"
750 | },
751 | "language_info": {
752 | "codemirror_mode": {
753 | "name": "ipython",
754 | "version": 2
755 | },
756 | "file_extension": ".py",
757 | "mimetype": "text/x-python",
758 | "name": "python",
759 | "nbconvert_exporter": "python",
760 | "pygments_lexer": "ipython2",
761 | "version": "2.7.12"
762 | }
763 | },
764 | "nbformat": 4,
765 | "nbformat_minor": 1
766 | }
767 |
--------------------------------------------------------------------------------
/custom_task_module/custom-task-tutorial.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Intro\n",
8 | "\n",
    9 | "Users can - and are encouraged to - build their own custom analytic capability, what we call a 'Task', to use on GBDX. This process starts with packaging up your code, along with the libraries and dependencies needed to run it, in a Docker image and pushing it to a Docker Hub repository. You can then register a definition of your Task - which includes the Docker reference, expected inputs and outputs, etc. - with GBDX. From there, it is simply a matter of asking the GBDX Workflow system to execute your Task, in combination with any other desired Tasks, typically via Postman or gbdxtools. The steps for converting your analytic capability to a Task that runs on GBDX are as follows:\n",
10 | "\n",
11 | "1. [MODIFY](#1.-Modify-code-inputs-and-outputs) the inputs and outputs of your code to align with expected Platform inputs and outputs\n",
12 | "2. [DOCKERFILE](#2.-Dockerfile) - write a set of instructions that will build your Docker\n",
13 | "3. [BUILD and RUN](#3.-Build-and-run-your-Docker) your Docker \n",
14 | "4. [PUSH DOCKER](#4.-Push-your-Docker-to-Docker-Hub) to Docker Hub \n",
15 | "5. [GBDX COLLABORATORS](#5.-Add-GBDX-collaborators-to-your-Docker-Hub-repository) - give GBDX access to run your Docker\n",
16 | "6. [TASK DEFINITION](#6.-Task-definitition) - write a JSON Task definition that describes and defines your Task\n",
17 | "7. [REGISTER TASK](#7.-Register-Task) to the Platform using your JSON Task definition\n",
18 | "8. [WORKFLOW](#8.-Workflow) - test your Task by executing within the Workflow system on the Platform\n",
19 | "9. [USING THE GPU](#8.-Workflow) - how to use GPUs in GBDX Tasks\n",
20 | "___"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {
26 | "slideshow": {
27 | "slide_type": "-"
28 | }
29 | },
30 | "source": [
31 | "### File and naming conventions within this tutorial\n",
32 | "Before starting, it's helpful to establish file names, port names, etc that you'll use throughout the tutorial. \n",
33 | "\n",
    34 | "FILE NAMES\n",
35 | "\n",
36 | "filename|description\n",
37 | "--------|-----------\n",
38 | "clip_raster_task.py|example analysis code that you'll Dockerize and put on GBDX as a Task\n",
39 | "Dockerfile|instructions to build a Docker that contains the Task code and its dependencies\n",
40 | "clip-raster-definition.json|defines the Task name, inputs, outputs, etc, and will be used to register the Task to GBDX\n",
41 | "gbdxtrainer/clip_raster_docker|Docker Hub username and repository name, __replace 'gbdxtrainer' with your username__\n",
42 | "clip_raster_gt|the name of your Task when you register it on GBDX, __replace 'gt' with your initials__\n",
43 | "\n",
44 | "PORT NAMES\n",
45 | "\n",
    46 | "Input and output ports are how GBDX passes data in and out of your Task when it's executed. The developer defines the port names. When you write the code that does the actual analysis (in whichever language you're writing it) - this is the Task - you'll point to these port names in the code for input and output data. When you execute the Task within a Workflow (using gbdxtools), you use these port names to specify the data that should be passed into and out of the Task/Docker. Here are the port names you'll use in this tutorial.\n",
47 | "\n",
48 | "port name within the Task code|what it is|in use with gbdxtools\n",
49 | "---------|----------|------\n",
    50 | "mnt/work/input/__input_shapefile__|input port name for a shapefile|clip_task = gbdx.Task('clip_raster_gt', __input_shapefile__='s3://.../a_shapefile', ...)\n",
    51 | "mnt/work/input/__input_raster__|input port name for a raster|clip_task = gbdx.Task('clip_raster_gt', __input_raster__='s3://.../a_raster', ...)\n",
    52 | "mnt/work/output/__data_out__|output port name for output data|next_task = gbdx.Task('another_task', input_data=clip_task.outputs.__data_out__.value)"
53 | ]
54 | },
55 | {
56 | "cell_type": "markdown",
57 | "metadata": {},
58 | "source": [
59 | "### Directory structure\n",
60 | "Let's also establish a directory structure that supports and simplifies the steps you'll take in this tutorial. By the end, your directory structure should look something like this:\n",
61 | " \n",
62 | " /\n",
63 | " └── Notebooks (or wherever you keep this IPython Notebook)\n",
    64 | " ├── custom_task_tutorial.ipynb (this notebook)\n",
65 | " └── task_tutorial_files/ \n",
66 | " ├── clip-raster-definition.json\n",
67 | " └── docker_projects/\n",
68 | " ├── Dockerfile\n",
69 | " └── bin/\n",
70 | " └── clip_raster_task.py\n",
71 | "\n",
72 | "___"
73 | ]
74 | },
75 | {
76 | "cell_type": "markdown",
77 | "metadata": {},
78 | "source": [
79 | "## 1. Modify code inputs and outputs\n",
80 | "\n",
81 | "The example Task you're going to put on GBDX during this tutorial simply takes a shapefile and raster as input, clips the raster to the shapefile, and writes out the clipped raster. We'll walk through what the code does, step by step. *The important thing to note is how we define inputs and outputs in this script to work on GBDX.*\n",
82 | "\n",
    83 | ">First, import the required libraries: 'fiona' and 'rasterio' for working with raster and vector data, 'os' and 'glob' for file management.\n",
84 | "\n",
85 | "```python\n",
86 | "import fiona\n",
87 | "import rasterio\n",
88 | "import rasterio.mask\n",
89 | "import os\n",
90 | "import glob\n",
91 | "```"
92 | ]
93 | },
94 | {
95 | "cell_type": "markdown",
96 | "metadata": {},
97 | "source": [
98 | ">Set the input port paths. When GBDX spins up a Task, it creates the directory 'mnt/work/input', fetches the required input from the S3 location specified in the call to this Task, and copies it to the 'mnt/work/input' directory. Your code will need to point to this path for its input data. \n",
99 | "\n",
100 | "```python\n",
101 | "in_path = '/mnt/work/input'\n",
102 | "shape_path = in_path + '/input_shapefile'\n",
103 | "raster_path = in_path + '/input_raster'\n",
104 | "```"
105 | ]
106 | },
107 | {
108 | "cell_type": "markdown",
109 | "metadata": {},
110 | "source": [
111 | ">Grab the shapefile and raster from the input port filepath that you just defined. \n",
112 | "\n",
113 | "```python\n",
114 | "my_shape = glob.glob1(shape_path, '*.shp')[0]\n",
115 | "my_raster = glob.glob1(raster_path, '*.tif')[0]\n",
116 | "```"
117 | ]
118 | },
119 | {
120 | "cell_type": "markdown",
121 | "metadata": {},
122 | "source": [
123 | ">Define the output data port to write out the cropped tif. Similar to the input port, there is a standard filepath convention you need to follow, '`/mnt/work/output`'. There is only one output for this Task, which we are calling '`/data_out`'.\n",
124 | "\n",
125 | "```python\n",
126 | "out_path = '/mnt/work/output/data_out'\n",
127 | "```"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | ">While the input path is created by GBDX during Task execution, you'll need to create the output path and data port in the code. Create the output path/port and navigate to this directory.\n",
135 | "\n",
136 | "```python\n",
137 | "if os.path.exists(out_path) == False:\n",
138 | " os.makedirs(out_path)\n",
139 | "os.chdir(out_path)\n",
140 | "```"
141 | ]
142 | },
143 | {
144 | "cell_type": "markdown",
145 | "metadata": {},
146 | "source": [
147 | ">Open the input shapefile and get the polygon features for clipping.\n",
148 | "\n",
149 | "```python\n",
150 | "with fiona.open(os.path.join(shape_path, my_shape), \"r\") as shapefile:\n",
151 | " features = [feature[\"geometry\"] for feature in shapefile]\n",
152 | "```"
153 | ]
154 | },
155 | {
156 | "cell_type": "markdown",
157 | "metadata": {},
158 | "source": [
159 | ">Open the input raster, clip the raster with the shapefile and get the raster metadata. \n",
160 | "\n",
161 | "```python\n",
162 | "with rasterio.open(os.path.join(raster_path, my_raster)) as src:\n",
163 | " out_raster, out_transform = rasterio.mask.mask(src, features, crop=True)\n",
164 | " out_meta = src.meta.copy()\n",
165 | "```"
166 | ]
167 | },
168 | {
169 | "cell_type": "markdown",
170 | "metadata": {},
171 | "source": [
172 | ">Write out the metadata to the cropped raster.\n",
173 | "\n",
174 | "```python\n",
175 | "out_meta.update({\"driver\": \"GTiff\",\n",
176 | " \"height\": out_raster.shape[1],\n",
177 | " \"width\": out_raster.shape[2],\n",
178 | " \"transform\": out_transform})\n",
179 | "```"
180 | ]
181 | },
182 | {
183 | "cell_type": "markdown",
184 | "metadata": {},
185 | "source": [
186 | ">Write out the output image\n",
187 | "\n",
188 | "```python\n",
189 | "with rasterio.open(\"masked.tif\", \"w\", **out_meta) as dest:\n",
190 | " dest.write(out_raster)\n",
191 | "```"
192 | ]
193 | },
194 | {
195 | "cell_type": "markdown",
196 | "metadata": {},
197 | "source": [
198 | ">Optionally, write out a status file at code completion to give the user more feedback.\n",
199 | "\n",
    200 | "```python\n",
    201 | "import json\n",
    202 | "\n",
201 | "status = {}\n",
202 | "status['status'] = 'Success'\n",
203 | "status['reason'] = \"===== Task successfully completed ======\"\n",
204 | "\n",
205 | "with open('/mnt/work/status.json', 'w') as statusfile:\n",
206 | " json.dump(status,statusfile)\n",
207 | "```"
208 | ]
209 | },
210 | {
211 | "cell_type": "markdown",
212 | "metadata": {},
213 | "source": [
214 | "Now that we've walked through what the example Task does and how to define its inputs and outputs in a way that GBDX recognizes, write this script to a working directory that we're going to call `/task_tutorial_files/docker_projects/bin`."
215 | ]
216 | },
217 | {
218 | "cell_type": "markdown",
219 | "metadata": {},
220 | "source": [
221 | "#### 1.1 Run the code in the following cell to create the 'task_tutorial_files/docker_projects/bin' directory."
222 | ]
223 | },
224 | {
225 | "cell_type": "code",
226 | "execution_count": null,
227 | "metadata": {},
228 | "outputs": [],
229 | "source": [
230 | "import os\n",
231 | "if os.path.exists('task_tutorial_files') == False:\n",
232 | " os.makedirs('task_tutorial_files/docker_projects/bin')"
233 | ]
234 | },
235 | {
236 | "cell_type": "markdown",
237 | "metadata": {},
238 | "source": [
239 | "#### 1.2 Run the code in the following cell to navigate to the directory you just created."
240 | ]
241 | },
242 | {
243 | "cell_type": "code",
244 | "execution_count": null,
245 | "metadata": {},
246 | "outputs": [],
247 | "source": [
248 | "cd task_tutorial_files/docker_projects/bin"
249 | ]
250 | },
251 | {
252 | "cell_type": "markdown",
253 | "metadata": {},
254 | "source": [
255 | "#### 1.3 Run the code in the following cell to write the code that we just reviewed to 'clip_raster_task.py'."
256 | ]
257 | },
258 | {
259 | "cell_type": "code",
260 | "execution_count": null,
261 | "metadata": {},
262 | "outputs": [],
263 | "source": [
264 | "%%writefile clip_raster_task.py\n",
265 | "\n",
266 | "import fiona\n",
267 | "import rasterio\n",
268 | "import rasterio.mask\n",
269 | "import os\n",
270 | "import glob\n",
271 | "\n",
272 | "# set the input ports path\n",
273 | "in_path = '/mnt/work/input'\n",
274 | "shape_path = in_path + '/input_shapefile'\n",
275 | "raster_path = in_path + '/input_raster'\n",
276 | "\n",
277 | "# search the input shapefile port for the first shapefile that we specify in the call to this task\n",
278 | "my_shape = glob.glob1(shape_path, '*.shp')[0]\n",
279 | "\n",
280 | "# search the input image port for the first geotiff that we specify in the call to this task\n",
281 | "my_raster = glob.glob1(raster_path, '*.tif')[0]\n",
282 | "\n",
283 | "# define the name of the output data port\n",
284 | "out_path = '/mnt/work/output/data_out'\n",
285 | "\n",
286 | "# create the output data port\n",
287 | "if not os.path.exists(out_path):\n",
288 | " os.makedirs(out_path)\n",
289 | "\n",
290 | "# change directories to the output data port\n",
291 | "os.chdir(out_path)\n",
292 | "\n",
293 | "# open the input shapefile and get the polygon features for clipping\n",
294 | "with fiona.open(os.path.join(shape_path, my_shape), \"r\") as shapefile:\n",
295 | " features = [feature[\"geometry\"] for feature in shapefile]\n",
296 | "\n",
297 | "# open the input image, clip the image with the shapefile and get the image metadata\n",
298 | "with rasterio.open(os.path.join(raster_path, my_raster)) as src:\n",
299 | " out_raster, out_transform = rasterio.mask.mask(src, features, crop=True)\n",
300 | " out_meta = src.meta.copy()\n",
301 | "\n",
302 | "# update the metadata to match the clipped raster\n",
303 | "out_meta.update({\"driver\": \"GTiff\",\n",
304 | " \"height\": out_raster.shape[1],\n",
305 | " \"width\": out_raster.shape[2],\n",
306 | " \"transform\": out_transform})\n",
307 | "\n",
308 | "# write out the output raster\n",
309 | "with rasterio.open(\"masked.tif\", \"w\", **out_meta) as dest:\n",
310 | " dest.write(out_raster)"
311 | ]
312 | },
313 | {
314 | "cell_type": "markdown",
315 | "metadata": {},
316 | "source": [
317 | "#### 1.4 Run the code in the following cell to check that the file clip_raster_task.py exists."
318 | ]
319 | },
320 | {
321 | "cell_type": "code",
322 | "execution_count": null,
323 | "metadata": {},
324 | "outputs": [],
325 | "source": [
326 | "ls"
327 | ]
328 | },
329 | {
330 | "cell_type": "markdown",
331 | "metadata": {},
332 | "source": [
333 | "___\n",
334 | "## 2. Dockerfile\n",
335 | "\n",
336 | "A Dockerfile is a set of instructions for packaging your Task code, along with the libraries and dependencies needed to run that code, into a lightweight, portable Docker image. \n",
337 | "\n",
338 | "Before writing the Dockerfile, let's walk through what it's building.\n",
339 | "\n",
340 | ">The first line of code in a Dockerfile typically pulls an image from Docker Hub that provides a base operating system, and this serves as the foundation for the rest of the build. The Docker community provides several base images that are pre-configured for common programming applications. The first line of code in our Dockerfile pulls the base image `continuumio/miniconda` - a lightweight Docker image configured for Python development.\n",
341 | "\n",
342 | "```\n",
343 | "FROM continuumio/miniconda\n",
344 | "```"
345 | ]
346 | },
347 | {
348 | "cell_type": "markdown",
349 | "metadata": {},
350 | "source": [
351 | ">The following two lines of code install the geoprocessing libraries we need to run the Task code.\n",
352 | "\n",
353 | "```\n",
354 | "RUN conda install rasterio\n",
355 | "RUN conda install fiona\n",
356 | "```"
357 | ]
358 | },
359 | {
360 | "cell_type": "markdown",
361 | "metadata": {},
362 | "source": [
363 | ">Create a directory inside the Docker called `/my_scripts`.\n",
364 | "\n",
365 | "```\n",
366 | "RUN mkdir /my_scripts\n",
367 | "```"
368 | ]
369 | },
370 | {
371 | "cell_type": "markdown",
372 | "metadata": {},
373 | "source": [
374 | ">Copy the contents of the local directory `./bin` into the Docker directory `/my_scripts`. (Remember, `./bin` is where you just wrote 'clip_raster_task.py')\n",
375 | "\n",
376 | "```\n",
377 | "ADD ./bin /my_scripts\n",
378 | "```\n",
379 | "\n",
380 | ">Finally, add a command that executes 'clip_raster_task.py' inside the container at runtime. \n",
381 | "\n",
382 | "```\n",
383 | "CMD python /my_scripts/clip_raster_task.py\n",
384 | "```"
385 | ]
386 | },
387 | {
388 | "cell_type": "markdown",
389 | "metadata": {},
390 | "source": [
391 | "You might be wondering, why are we adding the Task code to the Docker image, but not any data? Remember, we built the input ports with the path `/mnt/work/input`. Later on when you use your Task within a Workflow, you will specify the S3 location of the data you want to analyze, and GBDX will go fetch the data and plug it into the Docker via the input ports. \n",
392 | "\n",
393 | "Now that we've covered what a Dockerfile does, let's go ahead and write the Dockerfile. As a best practice, we like to keep the Dockerfile separate from the Task code.\n",
394 | "\n",
395 | "#### 2.1 Run the code in the following cell to navigate back one folder to /docker_projects."
396 | ]
397 | },
398 | {
399 | "cell_type": "code",
400 | "execution_count": null,
401 | "metadata": {},
402 | "outputs": [],
403 | "source": [
404 | "cd .."
405 | ]
406 | },
407 | {
408 | "cell_type": "markdown",
409 | "metadata": {},
410 | "source": [
411 | "#### 2.2 Run the code in the following cell to write the Docker instructions we just reviewed to 'Dockerfile' (no extension). "
412 | ]
413 | },
414 | {
415 | "cell_type": "code",
416 | "execution_count": null,
417 | "metadata": {},
418 | "outputs": [],
419 | "source": [
420 | "%%writefile Dockerfile\n",
421 | "FROM continuumio/miniconda\n",
422 | "\n",
423 | "RUN conda install rasterio\n",
424 | "RUN conda install fiona\n",
425 | "\n",
426 | "RUN mkdir /my_scripts\n",
427 | "ADD ./bin /my_scripts\n",
428 | "CMD python /my_scripts/clip_raster_task.py"
429 | ]
430 | },
431 | {
432 | "cell_type": "markdown",
433 | "metadata": {},
434 | "source": [
435 | "#### 2.3 Run the code in the following cell to check that the file wrote as expected."
436 | ]
437 | },
438 | {
439 | "cell_type": "code",
440 | "execution_count": null,
441 | "metadata": {},
442 | "outputs": [],
443 | "source": [
444 | "ls"
445 | ]
446 | },
447 | {
448 | "cell_type": "markdown",
449 | "metadata": {},
450 | "source": [
451 | "___\n",
452 | "## 3. Build and run your Docker\n",
453 | "NOTE: AT THIS POINT IN THE TUTORIAL, WE'RE GOING TO LEAVE THE JUPYTER NOTEBOOK AND SWITCH TO A TERMINAL TO WORK WITH DOCKER\n",
454 | "\n",
455 | "You've written the Dockerfile that contains instructions to build a Docker image; the next step is to actually build it. Docker needs to be installed on your computer to complete this section. \n",
456 | "\n",
457 | "#### 3.1 Bring up a terminal (Mac) or cmd (Windows) window on your computer, then copy and paste the following command to check that Docker is installed.\n",
458 | "\n",
459 | "```\n",
460 | "docker --version\n",
461 | "```"
462 | ]
463 | },
464 | {
465 | "cell_type": "markdown",
466 | "metadata": {},
467 | "source": [
468 | "You should receive an output similar to this: \n",
469 | "```\n",
470 | "Docker version 1.13.0, build 49bf474\n",
471 | "```"
472 | ]
473 | },
474 | {
475 | "cell_type": "markdown",
476 | "metadata": {},
477 | "source": [
478 | "#### 3.2 Within the terminal/cmd, navigate to the folder containing the Dockerfile you wrote in the previous section. It should be located somewhere such as 'user/your name/Notebooks/task_tutorial_files/docker_projects'. \n",
479 | "(\\*tip - you can type 'pwd' on a Mac ('cd' on Windows) to see your current directory, then use `cd ..` to navigate one folder back and `cd <directory>` to navigate into a directory) \n",
480 | "\n",
481 | "```\n",
482 | "cd task_tutorial_files/docker_projects\n",
483 | "```"
484 | ]
485 | },
486 | {
487 | "cell_type": "markdown",
488 | "metadata": {},
489 | "source": [
490 | "#### 3.3 Copy and paste the following Docker command to build a Docker from your Dockerfile, but __FIRST REPLACE 'gbdxtrainer' WITH YOUR DOCKER USERNAME AND 'gt' WITH YOUR INITIALS__. \n",
491 | "(\\*The `-t` option allows you to name the Docker image for easy reference. The `.` at the end of the command is so that it looks for your Dockerfile in the current working directory.) \n",
492 | "\n",
493 | "```\n",
494 | "docker build -t gbdxtrainer/clip_raster_docker_gt .\n",
495 | "```"
496 | ]
497 | },
498 | {
499 | "cell_type": "markdown",
500 | "metadata": {},
501 | "source": [
502 | "#### 3.4 Copy and paste the following Docker command to list the Docker image you just built.\n",
503 | "\n",
504 | "```\n",
505 | "docker images\n",
506 | "```"
507 | ]
508 | },
509 | {
510 | "cell_type": "markdown",
511 | "metadata": {},
512 | "source": [
513 | "The output should look something like this, but with your Docker username and initials:\n",
514 | "\n",
515 | "```\n",
516 | "REPOSITORY TAG IMAGE ID CREATED SIZE\n",
517 | "gbdxtrainer/clip_raster_docker_gt         latest              dfc953879205        2 minutes ago       1.58 GB\n",
518 | "```"
519 | ]
520 | },
521 | {
522 | "cell_type": "markdown",
523 | "metadata": {},
524 | "source": [
525 | "Now that you've built a Docker image, you can run a Docker container (a runtime instance of the image) and poke around inside of it. \n",
526 | "\n",
527 | "#### 3.5 Copy and paste the following Docker command, __BUT WITH YOUR USERNAME AND INITIALS__. \n",
528 | "(\\*The `-it` option allows you to run the container in interactive mode with the `bash` prompt running, and the `--rm` option removes the container once you're done poking around so that it's not taking up disk space.)\n",
529 | "\n",
530 | "```\n",
531 | "docker run -it --rm gbdxtrainer/clip_raster_docker_gt bash\n",
532 | "```"
533 | ]
534 | },
535 | {
536 | "cell_type": "markdown",
537 | "metadata": {},
538 | "source": [
539 | "It will be obvious if you're inside the container because your terminal/cmd prompt will look something like this:\n",
540 | "```\n",
541 | "root@b1b71e42372d:/# \n",
542 | "```"
543 | ]
544 | },
545 | {
546 | "cell_type": "markdown",
547 | "metadata": {},
548 | "source": [
549 | "#### 3.6 You are now at the root directory of your Docker container. Copy and paste the following command to list the directories within your Docker container. "
550 | ]
551 | },
552 | {
553 | "cell_type": "markdown",
554 | "metadata": {},
555 | "source": [
556 | "```\n",
557 | "ls\n",
558 | "```"
559 | ]
560 | },
561 | {
562 | "cell_type": "markdown",
563 | "metadata": {},
564 | "source": [
565 | "You should see something that looks like this:\n",
566 | "```\n",
567 | "bin boot dev\tetc home lib\tlib64 media mnt my_scripts opt proc root\trun sbin srv\t...\n",
568 | "```"
569 | ]
570 | },
571 | {
572 | "cell_type": "markdown",
573 | "metadata": {},
574 | "source": [
575 | "#### 3.7 Note, there is a 'my_scripts' directory. Copy and paste the following command to navigate into the 'my_scripts' directory.\n",
576 | "```\n",
577 | "cd my_scripts\n",
578 | "```\n",
579 | "#### 3.8 You are now inside the directory that you specified in the Dockerfile. Copy and paste the following command to see the `'clip_raster_task.py'` file that was copied there when the Docker image was built. \n",
580 | " \n",
581 | "```\n",
582 | "ls\n",
583 | "```"
584 | ]
585 | },
586 | {
587 | "cell_type": "markdown",
588 | "metadata": {},
589 | "source": [
590 | "If you ran this script now, it would fail because there is no input data. When GBDX executes your Task, it will copy input data from S3 and plug it into the appropriate input ports. In a future tutorial, we'll cover how to test the script with input data mounted locally. \n",
591 | "\n",
592 | "#### 3.9 Quit the container using the following command.\n",
593 | "```\n",
594 | "exit\n",
595 | "```\n",
596 | "___"
597 | ]
598 | },
599 | {
600 | "cell_type": "markdown",
601 | "metadata": {},
602 | "source": [
603 | "## 4. Push your Docker to Docker Hub\n",
604 | "At this point, the Docker image you just created only exists on your machine. For GBDX to access it, the Docker image needs to be available on Docker Hub.\n",
605 | "\n",
606 | "#### 4.1 While still within the terminal/cmd, log in to Docker Hub with the following Docker command, USING YOUR OWN DOCKER HUB LOGIN CREDENTIALS. \n",
607 | "```\n",
608 | "docker login --username gbdxtrainer --password a_fake_password\n",
609 | "```"
610 | ]
611 | },
612 | {
613 | "cell_type": "markdown",
614 | "metadata": {},
615 | "source": [
616 | "#### 4.2 Once logged in, use the following Docker command to push your Docker image to Docker Hub, FIRST CHANGING TO YOUR DOCKER USERNAME AND INITIALS. Note: this might take a few minutes.\n",
617 | "```\n",
618 | "docker push gbdxtrainer/clip_raster_docker_gt\n",
619 | "```\n",
620 | "___"
621 | ]
622 | },
623 | {
624 | "cell_type": "markdown",
625 | "metadata": {},
626 | "source": [
627 | "## 5. Add GBDX collaborators to your Docker Hub repository\n",
628 | "Your Docker repository on Docker Hub can be public or private, but certain GBDX collaborators must be added to the repository in order for the Platform to pull and run the Docker. \n",
629 | "\n",
630 | "#### 5.1 Log in to Docker Hub https://hub.docker.com/"
631 | ]
632 | },
633 | {
634 | "cell_type": "markdown",
635 | "metadata": {},
636 | "source": [
637 | "You should now see the Docker image that you just pushed to Docker Hub, in its own repository of the same name. \n",
638 | "\n",
639 | "#### 5.2 Open the repository and select the 'Collaborators' tab. Under 'Username', enter each of the following as Collaborators to your repository. This is what will allow GBDX to pull and execute your Task. \n",
640 | "```\n",
641 | "tdgpbuild\n",
642 | "tdgpdeploy\n",
643 | "tdgplatform\n",
644 | "```\n",
645 | "___"
646 | ]
647 | },
648 | {
649 | "cell_type": "markdown",
650 | "metadata": {},
651 | "source": [
652 | "## 6. Task definition \n",
653 | "NOTE: WE'RE BACK TO THE JUPYTER NOTEBOOK FOR THE REST OF THE TUTORIAL\n",
654 | "\n",
655 | "Tasks must be registered with the Task Registry before they can be used in a Workflow. In this next step, you'll write a JSON document that describes and defines your Task according to a standard schema, and later use this JSON document to submit the Task to the Task Registry. Let's first walk through the components of a Task definition schema. \n",
656 | "\n",
657 | ">Define the input ports with the name you gave them in the Task code, which were named *input_raster* and *input_shapefile*. Indicate if the input port must be specified for the Task to run, include a human readable description of the port, and specify the input port type. (Don't worry about this too much now, but the two input port types are 'string' and 'directory'. String ports are typically used to pass in parameters, directory ports are for file-based data.) Specify the raster and shapefile input ports as type 'directory'.\n",
658 | "\n",
659 | "```json\n",
660 | "{\n",
661 | " \"inputPortDescriptors\": [{\n",
662 | " \"required\": true,\n",
663 | " \"description\": \"Directory containing a raster.\",\n",
664 | " \"name\": \"input_raster\",\n",
665 | " \"type\": \"directory\"\n",
666 | " }, {\n",
667 | " \"required\": true,\n",
668 | " \"description\": \"Directory containing a shapefile\",\n",
669 | " \"name\": \"input_shapefile\",\n",
670 | " \"type\": \"directory\"\n",
671 | " }],\n",
672 | "```"
673 | ]
674 | },
675 | {
676 | "cell_type": "markdown",
677 | "metadata": {},
678 | "source": [
679 | ">Define the output port similarly, indicating if it's required, a description, the name you gave it *data_out*, and port type. \n",
680 | "\n",
681 | "```json\n",
682 | " \"outputPortDescriptors\": [{\n",
683 | " \"required\": true,\n",
684 | " \"description\": \"A cropped tif.\",\n",
685 | " \"name\": \"data_out\",\n",
686 | " \"type\": \"directory\"\n",
687 | " }],\n",
688 | "```"
689 | ]
690 | },
691 | {
692 | "cell_type": "markdown",
693 | "metadata": {},
694 | "source": [
695 | ">Then tell GBDX where to find your Dockerized code on Docker Hub. Specify the type of container, DOCKER, and the full name of the Docker image, which YOU NEED TO CHANGE TO YOUR DOCKER HUB USERNAME AND REPOSITORY NAME. Include `':latest'` to pull the latest version of your Docker image. \n",
696 | "\n",
697 | "```json\n",
698 | " \"containerDescriptors\": [{\n",
699 | " \"type\": \"DOCKER\",\n",
700 | " \"command\": \"\",\n",
701 | " \"properties\": {\n",
702 | " \"image\": \"gbdxtrainer/clip_raster_docker_gt:latest\"\n",
703 | " }\n",
704 | " }],\n",
705 | "```"
706 | ]
707 | },
708 | {
709 | "cell_type": "markdown",
710 | "metadata": {},
711 | "source": [
712 | ">Finally, include a description of the Task itself, the Task name as it will appear in the Task Registry, a version number, whether it will be a public or private Task, and the amount of time the Task is allowed to run before timing out. (Note: if you ever want to re-register a Task, you'll need to increment this version number.) \n",
713 | "\n",
714 | "```json\n",
715 | " \"description\": \"Clips a raster to shapefile.\",\n",
716 | " \"name\": \"clip_raster_gt\",\n",
717 | " \"version\": \"0.0.1\",\n",
718 | " \"taskOwnerEmail\": \"TEST@TEST.com\",\n",
719 | " \"properties\": {\n",
720 | " \"isPublic\": false,\n",
721 | " \"timeout\": 36000\n",
722 | " }\n",
723 | "}\n",
724 | "```\n",
725 | "\n",
726 | "#### 6.1 Now we can get around to actually writing and saving the Task definition. Run the code in the following cell to navigate back one directory (back out of the '/docker_projects' directory to the '/task_tutorial_files' directory)."
727 | ]
728 | },
729 | {
730 | "cell_type": "code",
731 | "execution_count": null,
732 | "metadata": {},
733 | "outputs": [],
734 | "source": [
735 | "cd .."
736 | ]
737 | },
738 | {
739 | "cell_type": "markdown",
740 | "metadata": {},
741 | "source": [
742 | "#### 6.2 MODIFY THE TASK NAME WITH YOUR INITIALS, then run the code in the following cell to write the full JSON document that we just reviewed to clip-raster-definition.json."
743 | ]
744 | },
745 | {
746 | "cell_type": "code",
747 | "execution_count": null,
748 | "metadata": {},
749 | "outputs": [],
750 | "source": [
751 | "%%writefile clip-raster-definition.json\n",
752 | "{\n",
753 | " \"inputPortDescriptors\": [{\n",
754 | " \"required\": true,\n",
755 | " \"description\": \"Directory containing a raster.\",\n",
756 | " \"name\": \"input_raster\",\n",
757 | " \"type\": \"directory\"\n",
758 | " }, {\n",
759 | " \"required\": true,\n",
760 | " \"description\": \"Directory containing a shapefile\",\n",
761 | " \"name\": \"input_shapefile\",\n",
762 | " \"type\": \"directory\"\n",
763 | " }],\n",
764 | " \"outputPortDescriptors\": [{\n",
765 | " \"required\": true,\n",
766 | " \"description\": \"A cropped tif.\",\n",
767 | " \"name\": \"data_out\",\n",
768 | " \"type\": \"directory\"\n",
769 | " }],\n",
770 | " \"containerDescriptors\": [{\n",
771 | " \"type\": \"DOCKER\",\n",
772 | " \"command\": \"\",\n",
773 | " \"properties\": {\n",
774 | " \"image\": \"gbdxtrainer/clip_raster_docker_gt:latest\"\n",
775 | " }\n",
776 | " }],\n",
777 | " \"description\": \"Clips a raster to shapefile.\",\n",
778 | " \"name\": \"clip_raster_gt\",\n",
779 | " \"version\": \"0.0.1\",\n",
780 | " \"taskOwnerEmail\": \"TEST@TEST.com\",\n",
781 | " \"properties\": {\n",
782 | " \"isPublic\": false,\n",
783 | " \"timeout\": 36000\n",
784 | " }\n",
785 | "}"
786 | ]
787 | },
788 | {
789 | "cell_type": "markdown",
790 | "metadata": {},
791 | "source": [
792 | "#### 6.3 Run the following cell to check that the file wrote as expected."
793 | ]
794 | },
795 | {
796 | "cell_type": "code",
797 | "execution_count": null,
798 | "metadata": {},
799 | "outputs": [],
800 | "source": [
801 | "ls"
802 | ]
803 | },
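{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before registering, it can save a round trip to sanity-check the definition file you just wrote. A minimal sketch using only the standard library; the required-keys list below is inferred from the fields used in this tutorial, not taken from the full GBDX schema:\n",
"\n",
"```python\n",
"import json\n",
"import os\n",
"\n",
"REQUIRED_KEYS = ['inputPortDescriptors', 'outputPortDescriptors',\n",
"                 'containerDescriptors', 'name', 'version']\n",
"\n",
"def check_definition(path):\n",
"    # return a list of required top-level keys missing from a Task definition file\n",
"    with open(path) as f:\n",
"        definition = json.load(f)\n",
"    return [key for key in REQUIRED_KEYS if key not in definition]\n",
"\n",
"if os.path.exists('clip-raster-definition.json'):\n",
"    print(check_definition('clip-raster-definition.json'))\n",
"```\n",
"\n",
"An empty list means every expected top-level key is present; a non-empty list names the keys to add before registering."
]
},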
804 | {
805 | "cell_type": "markdown",
806 | "metadata": {},
807 | "source": [
808 | "___\n",
809 | "## 7. Register Task\n",
810 | "All of the pieces are in place to register your Task to the Platform. To review, we \n",
811 | "- wrote a Task (some piece of analysis) using input and output ports\n",
812 | "- wrote a Dockerfile with instructions to build a Docker that contains the Task code and its dependencies \n",
813 | "- built a Docker from the Dockerfile\n",
814 | "- pushed the Docker to Docker Hub\n",
815 | "- Added GBDX collaborators to the Task's Docker Hub repository\n",
816 | "- wrote a Task definition \n",
817 | "\n",
818 | "The final step for putting a custom Task on GBDX is to submit the Task to the Task registry, which we can do with gbdxtools. \n",
819 | "\n",
820 | "#### 7.1 If you have a GBDX config file, run the first two lines of code in the following cell as-is to authenticate into GBDX. Otherwise, comment out those lines, then uncomment the remaining lines and fill in your GBDX username, password, client ID and client secret, which can be found under your Profile information at https://gbdx.geobigdata.io/profile."
821 | ]
822 | },
823 | {
824 | "cell_type": "code",
825 | "execution_count": null,
826 | "metadata": {},
827 | "outputs": [],
828 | "source": [
829 | "from gbdxtools import Interface\n",
830 | "gbdx = Interface()\n",
831 | "\n",
832 | "# import gbdxtools\n",
833 | "# gbdx = gbdxtools.Interface(\n",
834 | "# username='',\n",
835 | "# password='',\n",
836 | "# client_id='',\n",
837 | "# client_secret='')"
838 | ]
839 | },
840 | {
841 | "cell_type": "markdown",
842 | "metadata": {},
843 | "source": [
844 | "We point the following gbdxtools call to our Task definition, `clip-raster-definition.json`. \n",
845 | "\n",
846 | "#### 7.2 Run the code in the following cell to submit your Task to the Task registry."
847 | ]
848 | },
849 | {
850 | "cell_type": "code",
851 | "execution_count": null,
852 | "metadata": {},
853 | "outputs": [],
854 | "source": [
855 | "gbdx.task_registry.register(json_filename = 'clip-raster-definition.json')"
856 | ]
857 | },
858 | {
859 | "cell_type": "markdown",
860 | "metadata": {},
861 | "source": [
862 | "It might take a few minutes for your Task to show up in the registry. Once you've checked that your Task is successfully registered, you can use the gbdxtools Task API to interact with your newly created Task. \n",
863 | "\n",
864 | "#### 7.3 Wait a few minutes, then run the code in the following cell to create an instance of your Task. FIRST REPLACE 'gt' IN THE TASK NAME WITH YOUR INITIALS. "
865 | ]
866 | },
867 | {
868 | "cell_type": "code",
869 | "execution_count": null,
870 | "metadata": {},
871 | "outputs": [],
872 | "source": [
873 | "clip_task = gbdx.Task(\"clip_raster_gt\")"
874 | ]
875 | },
876 | {
877 | "cell_type": "markdown",
878 | "metadata": {},
879 | "source": [
880 | "#### 7.4 Run the code in the following cell to interact with the Task object. "
881 | ]
882 | },
883 | {
884 | "cell_type": "code",
885 | "execution_count": null,
886 | "metadata": {},
887 | "outputs": [],
888 | "source": [
889 | "clip_task.definition"
890 | ]
891 | },
892 | {
893 | "cell_type": "markdown",
894 | "metadata": {},
895 | "source": [
896 | "You should see the definition you entered when you registered the task. \n",
897 | "\n",
898 | "#### 7.5 Run the code in the following cell to see the input ports that you specified within the Task definition and the Task code itself. "
899 | ]
900 | },
901 | {
902 | "cell_type": "code",
903 | "execution_count": null,
904 | "metadata": {},
905 | "outputs": [],
906 | "source": [
907 | "clip_task.inputs"
908 | ]
909 | },
910 | {
911 | "cell_type": "markdown",
912 | "metadata": {},
913 | "source": [
914 | "#### 7.6 Run the code in the following cell to see the output ports you specified. "
915 | ]
916 | },
917 | {
918 | "cell_type": "code",
919 | "execution_count": null,
920 | "metadata": {},
921 | "outputs": [],
922 | "source": [
923 | "clip_task.outputs"
924 | ]
925 | },
926 | {
927 | "cell_type": "markdown",
928 | "metadata": {},
929 | "source": [
930 | "You can drill down further into the inputs and outputs of your Task.\n",
931 | "\n",
932 | "#### 7.7 Run the code in the following cell to see the input_shapefile port description, which should match what you entered in the Task definition."
933 | ]
934 | },
935 | {
936 | "cell_type": "code",
937 | "execution_count": null,
938 | "metadata": {},
939 | "outputs": [],
940 | "source": [
941 | "clip_task.inputs.input_shapefile"
942 | ]
943 | },
944 | {
945 | "cell_type": "markdown",
946 | "metadata": {},
947 | "source": [
948 | "#### 7.8 Run the code in the following cell to see the input_raster port description."
949 | ]
950 | },
951 | {
952 | "cell_type": "code",
953 | "execution_count": null,
954 | "metadata": {},
955 | "outputs": [],
956 | "source": [
957 | "clip_task.inputs.input_raster"
958 | ]
959 | },
960 | {
961 | "cell_type": "markdown",
962 | "metadata": {},
963 | "source": [
964 | "___\n",
965 | "## 8. Workflow\n",
966 | "Now that you've registered a Task to GBDX, the final step of this tutorial is to use your Task in a Workflow. Here's a potential Workflow that uses the Task you just created. You're going to use a DigitalGlobe image as the raster input, along with a shapefile we've created that aligns with that image and placed in a publicly accessible S3 bucket. \n",
967 | "\n",
968 | ">First initiate a gbdxtools session. \n",
969 | "\n",
970 | "```python\n",
971 | "from gbdxtools import Interface\n",
972 | "gbdx = Interface()\n",
973 | "```"
974 | ]
975 | },
976 | {
977 | "cell_type": "markdown",
978 | "metadata": {},
979 | "source": [
980 | "> Define the S3 path for an image by passing in its Catalog ID to the following method. \n",
981 | "\n",
982 | "```python\n",
983 | "source_s3 = gbdx.catalog.get_data_location(catalog_id='10400100245B7800')\n",
984 | "```"
985 | ]
986 | },
987 | {
988 | "cell_type": "markdown",
989 | "metadata": {},
990 | "source": [
991 | ">Next, define an input shapefile. Inputs and outputs to the Workflow have to come from somewhere on S3, so we've placed a shapefile that will clip the image in a publicly accessible S3 bucket. \n",
992 | "\n",
993 | "```python\n",
994 | "shape_path = 's3://gbdx-training/custom_task_tutorial/this_shp_will_clip_10400100245B7800/'\n",
995 | "``` "
996 | ]
997 | },
998 | {
999 | "cell_type": "markdown",
1000 | "metadata": {},
1001 | "source": [
1002 | ">So far in this script, you've signed into GBDX and defined the image and shapefile inputs. You can now start setting up the Tasks that you'll execute within a Workflow. Before you use the 'clip_raster' Task, you'll want to first pre-process the image. You can use the Advanced Image Processor Task for this, which orthorectifies raw imagery and offers other image pre-processing options. Documentation at https://gbdxdocs.digitalglobe.com/docs/advanced-image-preprocessor. \n",
1003 | "\n",
1004 | ">Create this Task, using its registered Task name, 'AOP_Strip_Processor', and the image you defined earlier as input to its input port, 'data'.\n",
1005 | "\n",
1006 | "```python\n",
1007 | "aop_task = gbdx.Task('AOP_Strip_Processor', data=source_s3)\n",
1008 | "```"
1009 | ]
1010 | },
1011 | {
1012 | "cell_type": "markdown",
1013 | "metadata": {},
1014 | "source": [
1015 | ">The next step would be to set up your 'clip_raster' Task, using the output from 'aop_task' as its raster input. However, 'aop_task' outputs several files in addition to the processed image, while your Task takes just one TIF as input. You can address this by setting up the 'gdal-cli' Task to extract just the TIF from the 'aop_task' output. \n",
1016 | "\n",
1017 | ">Define this Task, which we call the 'glue_task', using its registered Task name 'gdal-cli'. Set the aop_task output via its output port, 'data', to the glue_task input port, also called 'data'. \n",
1018 | "\n",
1019 | "```python\n",
1020 | "glue_task = gbdx.Task('gdal-cli', data=aop_task.outputs.data.value, execution_strategy='runonce',\n",
1021 | " command=\"\"\"mv $indir/*/*.tif $outdir/\"\"\")\n",
1022 | "```"
1023 | ]
1024 | },
1025 | {
1026 | "cell_type": "markdown",
1027 | "metadata": {},
1028 | "source": [
1029 | ">You can now set up your custom 'clip_raster' Task. Specify the Task with the name you used to register it (this will be with your initials, not 'gt'). Set the glue_task output via its output port, 'data', to the clip_task input port, which we named 'input_raster'. \n",
1030 | "\n",
1031 | "```python\n",
1032 | "clip_task = gbdx.Task(\"clip_raster_gt\", input_raster=glue_task.outputs.data.value, input_shapefile=shape_path)\n",
1033 | "```"
1034 | ]
1035 | },
1036 | {
1037 | "cell_type": "markdown",
1038 | "metadata": {},
1039 | "source": [
1040 | ">Now build a Workflow using the Workflow call and a list of the Tasks you defined above.\n",
1041 | "\n",
1042 | "```python\n",
1043 | "workflow = gbdx.Workflow([ aop_task, glue_task, clip_task ])\n",
1044 | "```"
1045 | ]
1046 | },
1047 | {
1048 | "cell_type": "markdown",
1049 | "metadata": {},
1050 | "source": [
1051 | ">The Workflow is ready to go, but before you execute it, you'll need to specify where GBDX should save the output data. Gbdxtools has a feature that automatically saves output generated under your GBDX account to your GBDX customer S3 bucket. \n",
1052 | "\n",
1053 | ">Specify that you want to save the output from the 'clip_raster' output port, which we named 'data_out', \n",
1054 | "and the directory you want to save it to within your customer S3 bucket. \n",
1055 | "\n",
1056 | "```python\n",
1057 | "workflow.savedata(clip_task.outputs.data_out, location='task_demo/aop_clip_raster')\n",
1058 | "```"
1059 | ]
1060 | },
1061 | {
1062 | "cell_type": "markdown",
1063 | "metadata": {},
1064 | "source": [
1065 | ">Execute the Workflow. This will kick off the series of Tasks that will pre-process the input image, select just the image from the pre-processing output, clip that image to a shapefile, and save the output to your customer bucket.\n",
1066 | "\n",
1067 | ">Also, it's a good idea to hold on to the Workflow ID. This will allow you to track the status of the Workflow, which could take several minutes to several hours depending on the kind of processing and size of the image strip. The Workflow ID will also come in handy if you need to debug a Task or Workflow later.\n",
1068 | "\n",
1069 | "```python\n",
1070 | "workflow.execute()\n",
1071 | "print(workflow.id)\n",
1072 | "```"
1073 | ]
1074 | },
1075 | {
1076 | "cell_type": "markdown",
1077 | "metadata": {},
1078 | "source": [
1079 | "#### 8.1 You are now ready to test your custom Task in a Workflow. Run the following line of code, which will execute the steps we outlined above. "
1080 | ]
1081 | },
1082 | {
1083 | "cell_type": "code",
1084 | "execution_count": null,
1085 | "metadata": {},
1086 | "outputs": [],
1087 | "source": [
1088 | "# define the S3 path for an image by passing in its Catalog ID \n",
1089 | "source_s3 = gbdx.catalog.get_data_location(catalog_id='10400100245B7800')\n",
1090 | "\n",
1091 | "# define an input shapefile from S3\n",
1092 | "shape_path = 's3://gbdx-training/custom_task_tutorial/this_shp_will_clip_10400100245B7800/'\n",
1093 | "\n",
1094 | "# define the 'AOP_Strip_Processor' \n",
1095 | "aop_task = gbdx.Task('AOP_Strip_Processor', data=source_s3, enable_pansharpen=True)\n",
1096 | "\n",
1097 | "# define the 'gdal_cli' Task\n",
1098 | "glue_task = gbdx.Task('gdal-cli', data=aop_task.outputs.data.value, execution_strategy='runonce',\n",
1099 | " command=\"\"\"mv $indir/*/*.tif $outdir/\"\"\")\n",
1100 | "\n",
1101 | "# define the 'clip_raster' Task \n",
1102 | "clip_task = gbdx.Task(\"clip_raster_gt\", input_raster=glue_task.outputs.data.value, input_shapefile=shape_path)\n",
1103 | "\n",
1104 | "# build a Workflow to run the 'clip_raster' Task\n",
1105 | "workflow = gbdx.Workflow([aop_task, glue_task, clip_task])\n",
1106 | "\n",
1107 | "# specify where to save the output within your customer bucket\n",
1108 | "workflow.savedata(clip_task.outputs.data_out, location='demo_output/clip_raster')\n",
1109 | "\n",
1110 | "# kick off the Workflow and keep track of the Workflow ID\n",
1111 | "workflow.execute()\n",
1112 | "print(workflow.id)"
1113 | ]
1114 | },
1115 | {
1116 | "cell_type": "markdown",
1117 | "metadata": {},
1118 | "source": [
1119 | "GBDX is now running your Workflow. While the Workflow is running, you can interact with the Workflow object and track its status. \n",
1120 | "\n",
1121 | "#### 8.2 Run the code in the following cell to get the status of the Workflow. This call returns the status of whatever event is currently underway."
1122 | ]
1123 | },
1124 | {
1125 | "cell_type": "code",
1126 | "execution_count": null,
1127 | "metadata": {},
1128 | "outputs": [],
1129 | "source": [
1130 | "workflow.status"
1131 | ]
1132 | },
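     | {
     | "cell_type": "markdown",
     | "metadata": {},
     | "source": [
     | "Rather than re-running the status cell by hand, the check can be polled in a loop. The helper below is a minimal sketch (not part of gbdxtools): it accepts any status callable, such as `lambda: workflow.status`, and assumes status dicts of the form {'state': ..., 'event': ...} as described in this tutorial.\n",
     | "\n",
     | "```python\n",
     | "import time\n",
     | "\n",
     | "# Sketch: poll a status callable until the workflow reaches the 'complete' state\n",
     | "def wait_until_complete(get_status, poll_seconds=30, max_polls=120):\n",
     | "    for _ in range(max_polls):\n",
     | "        status = get_status()\n",
     | "        if status.get('state') == 'complete':\n",
     | "            return status\n",
     | "        time.sleep(poll_seconds)\n",
     | "    raise RuntimeError('workflow did not complete within the polling window')\n",
     | "```"
     | ]
     | },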
1133 | {
1134 | "cell_type": "markdown",
1135 | "metadata": {},
1136 | "source": [
1137 | "You can also look at a list that contains each Task 'event' and its 'state'. Note there is also a 'task_id' associated with each Task. The Task ID is helpful if you want to debug a particular Task in the Workflow. \n",
1138 | "\n",
1139 | "#### 8.3 Run the code in the following cell to get a list of the Task events that have occurred so far in this Workflow."
1140 | ]
1141 | },
1142 | {
1143 | "cell_type": "code",
1144 | "execution_count": null,
1145 | "metadata": {},
1146 | "outputs": [],
1147 | "source": [
1148 | "workflow.events"
1149 | ]
1150 | },
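     | {
     | "cell_type": "markdown",
     | "metadata": {},
     | "source": [
     | "The events list can also be summarized programmatically. The sketch below is an illustrative helper (not a gbdxtools API) that assumes each event dict carries 'task', 'event', and 'state' keys, as described above, and that the list is in chronological order.\n",
     | "\n",
     | "```python\n",
     | "# Sketch: reduce a workflow.events-style list to the most recent event per task\n",
     | "def latest_event_per_task(events):\n",
     | "    latest = {}\n",
     | "    for e in events:  # chronological order assumed\n",
     | "        latest[e['task']] = (e['event'], e['state'])\n",
     | "    return latest\n",
     | "```"
     | ]
     | },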
1151 | {
1152 | "cell_type": "markdown",
1153 | "metadata": {},
1154 | "source": [
1155 | "You can also return just the information on whether the Workflow completed and whether it succeeded.\n",
1156 | "\n",
1157 | "#### 8.4 Run the code in the following two cells to see if your Workflow completed and if it succeeded. "
1158 | ]
1159 | },
1160 | {
1161 | "cell_type": "code",
1162 | "execution_count": null,
1163 | "metadata": {},
1164 | "outputs": [],
1165 | "source": [
1166 | "workflow.complete"
1167 | ]
1168 | },
1169 | {
1170 | "cell_type": "code",
1171 | "execution_count": null,
1172 | "metadata": {
1173 | "collapsed": true
1174 | },
1175 | "outputs": [],
1176 | "source": [
1177 | "workflow.succeeded"
1178 | ]
1179 | },
1180 | {
1181 | "cell_type": "markdown",
1182 | "metadata": {},
1183 | "source": [
1184 | "Once the Workflow has finished, you can run the following two cells to get the stdout and stderr from your Workflow. \n",
1185 | "\n",
1186 | "#### 8.5 Wait until the Workflow has completed, then run the code in the following two cells to get the stdout and stderr from your Workflow. "
1187 | ]
1188 | },
1189 | {
1190 | "cell_type": "code",
1191 | "execution_count": null,
1192 | "metadata": {},
1193 | "outputs": [],
1194 | "source": [
1195 | "workflow.stdout"
1196 | ]
1197 | },
1198 | {
1199 | "cell_type": "code",
1200 | "execution_count": null,
1201 | "metadata": {},
1202 | "outputs": [],
1203 | "source": [
1204 | "workflow.stderr"
1205 | ]
1206 | },
1207 | {
1208 | "cell_type": "markdown",
1209 | "metadata": {},
1210 | "source": [
1211 | "You can also get the stderr and stdout for a particular task, given its Task ID.\n",
1212 | "\n",
1213 | "#### 8.6 Run the code in the following cell to get a list of Task IDs from your Workflow. "
1214 | ]
1215 | },
1216 | {
1217 | "cell_type": "code",
1218 | "execution_count": null,
1219 | "metadata": {},
1220 | "outputs": [],
1221 | "source": [
1222 | "task_ids = workflow.task_ids"
1223 | ]
1224 | },
1225 | {
1226 | "cell_type": "markdown",
1227 | "metadata": {},
1228 | "source": [
1229 | "#### 8.7 Then run the code in the following two cells to get the stdout and stderr of a particular Task, using the Workflow ID and a Task ID. "
1230 | ]
1231 | },
1232 | {
1233 | "cell_type": "code",
1234 | "execution_count": null,
1235 | "metadata": {},
1236 | "outputs": [],
1237 | "source": [
1238 | "gbdx.workflow.get_stdout(workflow.id, workflow.task_ids[0])"
1239 | ]
1240 | },
1241 | {
1242 | "cell_type": "code",
1243 | "execution_count": null,
1244 | "metadata": {
1245 | "collapsed": true
1246 | },
1247 | "outputs": [],
1248 | "source": [
1249 | "gbdx.workflow.get_stderr(workflow.id, workflow.task_ids[0])"
1250 | ]
1251 | },
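     | {
     | "cell_type": "markdown",
     | "metadata": {},
     | "source": [
     | "To gather logs for every Task at once, the per-task calls above can be wrapped in a loop. The helper below is a hedged sketch: `fetch` stands in for either `gbdx.workflow.get_stdout` or `gbdx.workflow.get_stderr`.\n",
     | "\n",
     | "```python\n",
     | "# Sketch: map each task ID to its log text via the supplied fetch function\n",
     | "def collect_task_logs(workflow_id, task_ids, fetch):\n",
     | "    return {task_id: fetch(workflow_id, task_id) for task_id in task_ids}\n",
     | "```\n",
     | "\n",
     | "For example, `collect_task_logs(workflow.id, workflow.task_ids, gbdx.workflow.get_stderr)` would return a dict of stderr text keyed by Task ID."
     | ]
     | },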
1252 | {
1253 | "cell_type": "markdown",
1254 | "metadata": {},
1255 | "source": [
1256 | "You can always bring up information about a Workflow after the fact by using its Workflow ID in the following call. \n",
1257 | "\n",
1258 | "#### 8.8 Run the code in the following cell to retrieve information about your Workflow. REPLACE THE EXAMPLE WORKFLOW ID WITH YOUR OWN WORKFLOW ID. "
1259 | ]
1260 | },
1261 | {
1262 | "cell_type": "code",
1263 | "execution_count": null,
1264 | "metadata": {},
1265 | "outputs": [],
1266 | "source": [
1267 | "gbdx.workflow.get('4574506640983582982')"
1268 | ]
1269 | },
1270 | {
1271 | "cell_type": "markdown",
1272 | "metadata": {},
1273 | "source": [
1274 | "NOTE: AT THIS POINT IN THE TUTORIAL, WE'RE GOING TO LEAVE THE JUPYTER NOTEBOOK AND SWITCH TO THE S3 BROWSER\n",
1275 | "\n",
1276 | "Once your Workflow has completed (and succeeded!), you will be able to see the output in your customer S3 bucket. \n",
1277 | "\n",
1278 | "#### 8.9 Log into the S3 browser [http://s3browser.geobigdata.io](http://s3browser.geobigdata.io/login.html) using your GBDX credentials. \n",
1279 | "\n",
1280 | "#### 8.10 Navigate to 'demo_output/clip_raster' to see the saved output of your Workflow. "
1281 | ]
1282 | },
1283 | {
1284 | "cell_type": "markdown",
1285 | "metadata": {},
1286 | "source": [
1287 | "## 9. Using the GPU\n",
1288 | "Frequently, tasks need to leverage GPU resources for optimal performance. Here we will learn to create a GPU-capable container, test that container on an EC2 instance, and then deploy that container in a workflow for use within GBDX.\n",
1289 | "\n",
1290 | "\n",
1291 | "### 9.1 Set up a GPU Instance for Development\n",
1292 | "\n",
1293 | "Here we will create a Virtual Private Cloud (VPC) and deploy a GPU-enabled EC2 instance into that cloud for development.\n",
1294 | "\n",
1295 | "#### 9.1.1 Create a Virtual Private Cloud\n",
1296 | "\n",
1297 | "Open the AWS management console and navigate to the Virtual Private Cloud (VPC) Dashboard. Launch the VPC Wizard to begin configuring a VPC for your PyTorch-enabled instance. Select a VPC with a Single Public Subnet. Give your VPC and its subnet a name; in this tutorial, our VPC is named GBDX_Pytorch_VPC and our subnet is named GBDX_Pytorch_Subnet. If you have a preference for availability zone, configure that now. Click 'Create VPC'.\n",
1298 | "\n",
1299 | "We will next define a security group for this VPC that manages traffic between it and the outside world. To do this, go back to the VPC Dashboard, and under the Security subsection select Security Groups, then Create Security Group. Give this security group a name and description, and associate it with the VPC created previously. Create the security group.\n",
1300 | "\n",
1301 | "We will now define the rules governing this security group. To do this, go back to the Security Groups page and find the security group created above. Select Edit Rules and allow HTTP/HTTPS/SSH traffic.\n",
1302 | "\n",
1303 | "All set, the VPC is now configured.\n",
1304 | "\n",
1305 | "#### 9.1.2 Create AWS EC2 Instance\n",
1306 | "\n",
1307 | "Now that our VPC has been configured, it's time to create an EC2 instance that will run inside this cloud.\n",
1308 | "\n",
1309 | "Go back to the AWS management console, navigate to the EC2 Dashboard, and select Launch Instance. Then, select the newest version of Ubuntu as your base image. For instance type, select p2.xlarge, then Configure Instance Details.\n",
1310 | "\n",
1311 | "Here, under network, select the subnet created in previous steps, the GBDX_Pytorch_Subnet. Next, increase storage size to 800GB. Continue to Add Tags and tag this instance GBDX_Pytorch_EC2. Review and launch this instance.\n",
1312 | "\n",
1313 | "You'll be asked to save a .pem file that is used to securely access this instance, download that file and save it for later. Navigate to the EC2 Dashboard to ensure this instance is running.\n",
1314 | "\n",
1315 | "Next, we will assign a public IPv4 address to this instance running inside our VPC via an Elastic IP address. To do this, navigate back to the VPC console, then to the Elastic IPs pane. Select Allocate New Address, then Allocate. Go back to the Elastic IPs section of the VPC Dashboard, select your newly created Elastic IP address, and then, under Actions, select Associate Address. Add the Instance ID of the instance created previously.\n",
1316 | "\n",
1317 | "You have now created an EC2 instance that is accessible via an IPv4 address inside its own Virtual Private Cloud. Go ahead and sign in to continue configuring the instance.\n",
1318 | "\n",
1319 | "#### 9.1.3 GPU Instance setup\n",
1320 | "\n",
1321 | "All GBDX GPU workers use nvidia-docker to allow running containers to leverage their GPU devices. SSH into the EC2 instance created above and continue configuration.\n",
1322 | "\n",
1323 | "The steps below will install the requisite NVIDIA drivers and CUDA.\n",
1324 | "\n",
1325 | "```\n",
1326 | "sudo apt-get update && sudo apt-get -y upgrade\n",
1327 | "sudo apt-get clean\n",
1328 | "sudo apt install ubuntu-drivers-common\n",
1329 | "sudo ubuntu-drivers autoinstall\n",
1330 | "wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.44-1_amd64.deb\n",
1331 | "sudo dpkg -i cuda-repo-ubuntu1604_8.0.44-1_amd64.deb\n",
1332 | "sudo apt-get update\n",
1333 | "sudo apt-get install cuda\n",
1334 | "```\n",
1335 | "\n",
1336 | "Verify CUDA installation by running:\n",
1337 | "```\n",
1338 | "nvidia-smi\n",
1339 | "```\n",
1340 | "You should see the following output:\n",
1341 | "```\n",
     | "+-----------------------------------------------------------------------------+\n",
1342 | "| NVIDIA-SMI 375.66 Driver Version: 375.66 |\n",
1343 | "|-------------------------------+----------------------+----------------------+\n",
1344 | "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
1345 | "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
1346 | "|===============================+======================+======================|\n",
1347 | "| 0 GRID K520 Off | 0000:00:03.0 Off | N/A |\n",
1348 | "| N/A 36C P0 45W / 125W | 0MiB / 4036MiB | 0% Default |\n",
1349 | "+-------------------------------+----------------------+----------------------+\n",
1350 | "\n",
1351 | "+-----------------------------------------------------------------------------+\n",
1352 | "| Processes: GPU Memory |\n",
1353 | "| GPU PID Type Process name Usage |\n",
1354 | "|=============================================================================|\n",
1355 | "| No running processes found |\n",
1356 | "+-----------------------------------------------------------------------------+\n",
1357 | "```\n",
1358 | "\n",
1359 | "Next, install Docker on this EC2 instance.\n",
1360 | "\n",
1361 | "```\n",
1362 | "sudo apt-get install \\\n",
1363 | "\tapt-transport-https \\\n",
1364 | "\tca-certificates \\\n",
1365 | "\tcurl \\\n",
1366 | "\tsoftware-properties-common\n",
1367 | "\n",
1368 | "# Add the Docker GPG key\n",
1369 | "curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -\n",
1370 | "sudo add-apt-repository \\\n",
1371 | " \"deb [arch=amd64] https://download.docker.com/linux/ubuntu \\\n",
1372 | " $(lsb_release -cs) \\\n",
1373 | " stable\"\n",
1374 | "sudo apt-get update\n",
1375 | "sudo apt-get install docker-ce\n",
1376 | "sudo groupadd docker\n",
1377 | "sudo usermod -aG docker $USER\n",
1378 | "\n",
1379 | "# Reboot the instance\n",
1380 | "sudo reboot\n",
1381 | "```\n",
1382 | "\n",
1383 | "Verify docker installation\n",
1384 | "\n",
1385 | "```\n",
1386 | "docker run hello-world\n",
1387 | "```\n",
1388 | "\n",
1389 | "Finally, install nvidia-docker on this instance.\n",
1390 | "\n",
1391 | "```\n",
1392 | "# Install nvidia-docker and nvidia-docker-plugin\n",
1393 | "wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb\n",
1394 | "sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb\n",
1395 | "\n",
1396 | "# Test nvidia-smi\n",
1397 | "nvidia-docker run --rm nvidia/cuda nvidia-smi\n",
1398 | "\n",
1399 | "# You should see the following output:\n",
1400 | "+-----------------------------------------------------------------------------+\n",
1401 | "| NVIDIA-SMI 375.66 Driver Version: 375.66 |\n",
1402 | "|-------------------------------+----------------------+----------------------+\n",
1403 | "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
1404 | "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
1405 | "|===============================+======================+======================|\n",
1406 | "| 0 GRID K520 Off | 0000:00:03.0 Off | N/A |\n",
1407 | "| N/A 35C P8 18W / 125W | 0MiB / 4036MiB | 0% Default |\n",
1408 | "+-------------------------------+----------------------+----------------------+\n",
1409 | "\n",
1410 | "+-----------------------------------------------------------------------------+\n",
1411 | "| Processes: GPU Memory |\n",
1412 | "| GPU PID Type Process name Usage |\n",
1413 | "|=============================================================================|\n",
1414 | "| No running processes found |\n",
1415 | "+-----------------------------------------------------------------------------+\n",
1416 | "```\n",
1417 | "Your EC2 instance is now ready for GPU based model development.\n",
1418 | "\n",
1419 | "### 9.2 Building a GPU-Compatible Image\n",
1420 | "\n",
1421 | "Here we will see how to create a GPU-compatible image that can be used both on this EC2 instance and in a GBDX workflow. We will create a basic image that verifies it has access to GPU resources inside the container, using the Python library PyTorch.\n",
1422 | "\n",
1423 | "**Directory Structure**\n",
1424 | "\n",
1425 | "```\n",
1426 | " /\n",
1427 | " └── Notebooks (or wherever you keep this IPython Notebook)\n",
1428 | "        ├── custom_task_tutorial.ipynb (this notebook)\n",
1429 | " └── task_tutorial_files/ \n",
1430 | " ├── pytorch.json\n",
1431 | " └── gpu_docker_projects/\n",
1432 | " ├── Dockerfile\n",
1433 | " └── bin/\n",
1434 | " └── pytorch.py\n",
1435 | "```\n",
1436 | "#### 9.2.1 The Dockerfile\n",
1437 | "\n",
1438 | "This Dockerfile builds an image that has access to the GPU of the underlying infrastructure and needs to be designed with this in mind. Additionally, it uses the Python library PyTorch.\n",
1439 | "\n",
1440 | "```\n",
1441 | "# Only the last FROM in a Dockerfile takes effect, so earlier nvidia/cuda or\n",
     | "# miniconda stages would be discarded. The pytorch/pytorch base image already\n",
     | "# bundles CUDA, so a single base image is sufficient.\n",
1445 | "FROM pytorch/pytorch\n",
1448 | "\n",
1449 | "RUN mkdir /my_scripts\n",
1450 | "\n",
1451 | "ADD ./bin /my_scripts\n",
1452 | "\n",
1453 | "CMD python /my_scripts/pytorch.py\n",
1454 | "```\n",
1455 | "\n",
1456 | "#### 9.2.2 Python Script\n",
1457 | "\n",
1458 | "This script will execute inside our nvidia-docker container. It verifies that the container can access the GPU and that CUDA is configured correctly.\n",
1459 | "\n",
1460 | "```python\n",
1461 | "import torch\n",
1462 | "import os\n",
1463 | "\n",
1464 | "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
1465 | "\n",
1466 | "# Assuming that we are on a CUDA machine, this should print a CUDA device:\n",
1467 | "print(device)\n",
1468 | "\n",
1469 | "os.makedirs(\"/mnt/work/output/tmp\", exist_ok=True)\n",
1470 | "f = open(\"/mnt/work/output/tmp/test.txt\", \"w\")\n",
1471 | "f.write(str(device))\n",
1472 | "f.close()\n",
1473 | "```\n",
1474 | "\n",
1475 | "### 9.3 Testing the Docker Image\n",
1476 | "\n",
1477 | "Once both the Python script and Dockerfile are on our EC2 instance, we can test and validate before deploying into a workflow. Navigate to the directory containing the Dockerfile and build the image.\n",
1478 | "\n",
1479 | "```\n",
1480 | "sudo nvidia-docker build -t pytorch .\n",
1481 | "```\n",
1482 | "\n",
1483 | "Then, execute this built docker image.\n",
1484 | "\n",
1485 | "```\n",
1486 | "sudo nvidia-docker run -it --rm pytorch\n",
1487 | "# You should see the following output:\n",
1488 | "cuda:0\n",
1489 | "```\n",
1490 | "\n",
1491 | "Congrats! You've created a GBDX GPU-compatible Docker image that is ready to be used in GBDX workflows.\n",
1492 | "\n",
1493 | "### 9.4 Task Definition\n",
1494 | "\n",
1495 | "The task definition for GPU-based GBDX tasks is only minimally different from CPU-based tasks. You need to define a domain within the containerDescriptors object that includes a GPU. The available domains are listed here: https://github.com/TDG-Platform/operations/wiki/Worker-Spec-Chart\n",
1496 | "\n",
1497 | "An example containerDescriptors property would look like:\n",
1498 | "```\n",
1499 | "\"containerDescriptors\": [{\n",
1500 | "\t\"type\": \"DOCKER\",\n",
1501 | "\t\"command\": \"python /my_scripts/pytorch.py\",\n",
1502 | "\t\"properties\": {\n",
1503 | "\t \"image\": \"tdgp/pytorch_demo\",\n",
1504 | "\t \"domain\": \"nvidiap3\"\n",
1505 | "\t}\n",
1506 | "}]\n",
1507 | "```\n",
1508 | "\n",
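     | "Before registering, it can help to confirm the descriptor snippet parses as valid JSON. A minimal sanity-check sketch (the image and domain values are the examples above, not required values):\n",
     | "\n",
     | "```python\n",
     | "import json\n",
     | "\n",
     | "descriptor_json = '''\n",
     | "[{\n",
     | "    \"type\": \"DOCKER\",\n",
     | "    \"command\": \"python /my_scripts/pytorch.py\",\n",
     | "    \"properties\": {\"image\": \"tdgp/pytorch_demo\", \"domain\": \"nvidiap3\"}\n",
     | "}]\n",
     | "'''\n",
     | "\n",
     | "# Parse the snippet and confirm a GPU domain is present\n",
     | "descriptors = json.loads(descriptor_json)\n",
     | "print(descriptors[0]['properties']['domain'])\n",
     | "```\n",
     | "\n",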
1509 | "### 9.5 Registering and Executing the Task\n",
1510 | "\n",
1511 | "Instructions for registering and executing the task are similar to those found in sections 7 and 8."
1512 | ]
1513 | },
1514 | {
1515 | "cell_type": "markdown",
1516 | "metadata": {
1517 | "collapsed": true
1518 | },
1519 | "source": [
1520 | "Congratulations on completing this tutorial and for successfully putting a custom Task on GBDX! We challenge you to create a custom Task with your own analysis code, using this tutorial as a guide. "
1521 | ]
1522 | }
1523 | ],
1524 | "metadata": {
1525 | "anaconda-cloud": {},
1526 | "kernelspec": {
1527 | "display_name": "Python 3",
1528 | "language": "python",
1529 | "name": "python3"
1530 | },
1531 | "language_info": {
1532 | "codemirror_mode": {
1533 | "name": "ipython",
1534 | "version": 3
1535 | },
1536 | "file_extension": ".py",
1537 | "mimetype": "text/x-python",
1538 | "name": "python",
1539 | "nbconvert_exporter": "python",
1540 | "pygments_lexer": "ipython3",
1541 | "version": "3.7.1"
1542 | }
1543 | },
1544 | "nbformat": 4,
1545 | "nbformat_minor": 1
1546 | }
1547 |
--------------------------------------------------------------------------------