├── HandsOn.dbc
└── Readme.md

/HandsOn.dbc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tsmatz/azure-databricks-exercise/7d293b27a153ba99aa1538413fb8fe673adb1ce8/HandsOn.dbc
--------------------------------------------------------------------------------
/Readme.md:
--------------------------------------------------------------------------------
1 | # Azure Databricks Hands-on (Tutorials)
2 | 
3 | To run these exercises, follow the instructions for each exercise below.
4 | 
5 | 1. [Storage Settings](https://tsmatz.github.io/azure-databricks-exercise/exercise01-blob.html)
6 | 2. [Basics of PySpark, Spark Dataframe, and Spark Machine Learning](https://tsmatz.github.io/azure-databricks-exercise/exercise02-pyspark-dataframe.html)
7 | 3. [Spark Machine Learning Pipeline](https://tsmatz.github.io/azure-databricks-exercise/exercise03-sparkml-pipeline.html)
8 | 4. [Hyper-parameter Tuning](https://tsmatz.github.io/azure-databricks-exercise/exercise04-hyperparams-tuning.html)
9 | 5. [MLeap](https://tsmatz.github.io/azure-databricks-exercise/exercise05-mleap.html) (requires ML runtime)
10 | 6. [Spark PyTorch Distributor](https://tsmatz.github.io/azure-databricks-exercise/exercise06-dnn-distributor.html) (requires ML runtime)
11 | 7. [Structured Streaming (Basic)](https://tsmatz.github.io/azure-databricks-exercise/exercise07-structured-streaming.html)
12 | 8. [Structured Streaming with Azure Event Hubs or Kafka](https://tsmatz.github.io/azure-databricks-exercise/exercise08-streaming-eventhub.html)
13 | 9. [Delta Lake](https://tsmatz.github.io/azure-databricks-exercise/exercise09-databricks-delta.html)
14 | 10. [MLflow](https://tsmatz.github.io/azure-databricks-exercise/exercise10-mlflow.html) (requires ML runtime)
15 | 11. [Orchestration with Azure Data Services](https://tsmatz.github.io/azure-databricks-exercise/exercise11-orchestration.html)
16 | 12. [Delta Live Tables](https://tsmatz.github.io/azure-databricks-exercise/exercise12-dlt.html)
17 | 13. [Databricks SQL](https://tsmatz.github.io/azure-databricks-exercise/exercise13-sql.html)
18 | 
19 | ## Getting Started
20 | 
21 | - Create an Azure Databricks resource in [Microsoft Azure](https://portal.azure.com/).
22 |   When you create the resource, please select the Premium plan.
23 | - After the resource is created, launch the Databricks workspace UI by clicking "Launch Workspace".
24 | - Create a compute (cluster) in the Databricks UI. (Select the "Compute" menu and proceed to create one.)
25 |   Please select an ML runtime (not a standard runtime).
26 | - Clone this repository by running the following command. (Or download [HandsOn.dbc](https://github.com/tsmatz/azure-databricks-exercise/raw/master/HandsOn.dbc).)
27 |   ```git clone https://github.com/tsmatz/azure-databricks-exercise```
28 | - Import ```HandsOn.dbc``` into your Databricks workspace as follows.
29 |     - Select "Workspace" in the workspace UI.
30 |     - Go to your user folder, click your e-mail address (the arrow icon), and then select "Import".
31 |     - Select ```HandsOn.dbc```.
32 | - Open the imported notebooks and attach the compute (cluster) created above to every notebook. (Select the compute (cluster) at the top of each notebook.)
33 | - Please make sure to run "Exercise 01 : Storage Settings (Prepare)" before running the other notebooks.
34 | 
35 | > Note : You cannot use an Azure trial (free) subscription because of its limited quota. If you are on an Azure free subscription, please upgrade to pay-as-you-go. (The remaining free credit is kept even after you switch to pay-as-you-go.)
36 | 
37 | *Tsuyoshi Matsuzaki @ Microsoft*
--------------------------------------------------------------------------------