├── .gitignore ├── Dockerfile ├── LICENSE ├── README.md ├── datasets ├── JUPYTER_BASIC_CODE_CELL_EXECUTION.pcapng ├── apt29_evals_day1_manual_demo.zip └── flow_logs.csv └── docs ├── _config.yml ├── _toc.yml ├── community-events └── jupyterthon.md ├── community-projects └── threat-hunter-playbook.md ├── community-workshops ├── .DS_Store └── defcon_btv_2020 │ ├── .DS_Store │ ├── basic-concepts │ ├── .DS_Store │ ├── 01_Creating_A_Spark_DataFrame.ipynb │ ├── 02_Spark_SQL_View_From_Mordor_dataset.ipynb │ ├── 03_Filtering_Summarizing.ipynb │ ├── 04_Transforming.ipynb │ ├── 05_Correlating.ipynb │ ├── 06_Visualizing.ipynb │ └── intro.md │ ├── datasets │ ├── apt29_evals_day1_manual.zip │ ├── covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges.zip │ ├── covenant_sc_dcerpc_smb_svcctl_QueryServiceStatus.zip │ ├── covenant_sharpsc_dcerpc_smb_svcctl_EnumServiceStatusW.zip │ ├── covenant_sharpwmi_dcerpc_wmi_remotecreateinstance.zip │ └── empire_psinject.zip │ ├── intro.md │ └── use-cases │ ├── .DS_Store │ ├── 01_Data_Analysis_Process_Injection.ipynb │ ├── 02_Data_Analysis_DCSync_dcerpc.ipynb │ ├── 03_Data_Analysis_RemoteCreateInstance_dcerpc_wmi.ipynb │ └── intro.md ├── fundamentals ├── libraries │ ├── intro.md │ ├── numpy_arrays.ipynb │ └── pandas.ipynb └── programming │ ├── intro.md │ └── python.ipynb ├── getting-started ├── architecture.md ├── installation.md ├── installation_binderhub.md ├── installation_docker.md ├── ipython_vs_python.ipynb └── what_is_jupyter.md ├── images ├── Binderhub-Architecture.png ├── JUPYTER_ARCHITECTURE.png ├── JUPYTER_CLIENT_EXEC_REQUEST.png ├── JUPYTER_CLIENT_EXEC_STREAM.png ├── JUPYTER_CLIENT_NOTEBOOK.png ├── JUPYTER_DOCKER_GH_MINIMAL.png ├── JUPYTER_DOCKER_MINIMAL_RUN.png ├── JUPYTER_EXECUTE_REQUEST.png ├── JUPYTER_INSTALLATION_NOTEBOOK_SERVER.png ├── JUPYTER_IPYTHON.png ├── JUPYTER_KERNEL_LOCATION.png ├── JUPYTER_NOTEBOOK_BASIC_VIEW.png ├── JUPYTER_NOTEBOOK_SERVER.png ├── JUPYTER_NOTEBOOK_SERVER_RUN.png ├── ipython-jupyter-features.png ├── ipython-to-jupyter.png └── logo │ ├── download.svg │ ├── favicon.ico │ ├── jupyter.png │ ├── logo.png │ └── logo.psd ├── introduction.md └── use-cases ├── data-analysis ├── 01_sql_to_pandas_win_events_101.ipynb ├── 02_bloodhound_explore_kerberoastable_users.ipynb ├── 03_analyzing_rpc_methods_relationships_graphframes.ipynb └── intro.md ├── data-connectors ├── azure_sentinel.ipynb ├── elasticsearch.ipynb ├── intro.md └── splunk.ipynb └── data-visualizations ├── .ipynb_checkpoints └── 03_altair_bloodhound_results-checkpoint.ipynb ├── 01_data_viz_pandas.ipynb ├── 02_data_viz_bokeh.ipynb ├── 03_altair_bloodhound_results.ipynb └── intro.md /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | docs/notebooks/.DS_Store 3 | docs/notebooks/.DS_Store 4 | docs/use-cases/.DS_Store 5 | docs/notebooks/.DS_Store 6 | docs/use-cases/data-analysis/.DS_Store 7 | docs/community-projects/.DS_Store 8 | docs/.DS_Store 9 | docs/_build/ 10 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | # ThreatHunter Playbook script: Jupyter Environment Dockerfile 2 | # Author: Roberto Rodriguez (@Cyb3rWard0g) 3 | # License: GPL-3.0 4 | 5 | FROM cyb3rward0g/jupyter-pyspark:0.0.6 6 | LABEL maintainer="Roberto Rodriguez @Cyb3rWard0g" 7 | LABEL description="Dockerfile Infosec Jupyter Book Project." 
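# The base image already provides Jupyter and the pyspark3 kernel referenced below; the remaining layers create an unprivileged user, install the Python libraries used across the book, and copy the book content into the user's home directory.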
8 | 9 | ARG NB_USER 10 | ARG NB_UID 11 | ENV NB_USER jovyan 12 | ENV NB_UID 1000 13 | ENV HOME /home/${NB_USER} 14 | ENV PATH "$HOME/.local/bin:$PATH" 15 | 16 | USER root 17 | 18 | RUN adduser --disabled-password \ 19 | --gecos "Default user" \ 20 | --uid ${NB_UID} \ 21 | ${NB_USER} 22 | 23 | USER ${NB_USER} 24 | 25 | RUN python3 -m pip install openhunt==1.6.8 bokeh==2.1.1 seaborn==0.10.1 msticpy==0.8.2 --user 26 | 27 | COPY docs ${HOME}/docs 28 | 29 | USER root 30 | 31 | RUN chown ${NB_USER} /usr/local/share/jupyter/kernels/pyspark3/kernel.json \ 32 | && chown -R ${NB_USER}:${NB_USER} ${HOME} ${JUPYTER_DIR} 33 | 34 | WORKDIR ${HOME} 35 | 36 | USER ${NB_USER} -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Infosec Jupyter Book 2 | 3 | [![Open_Threat_Research Community](https://img.shields.io/badge/Open_Threat_Research-Community-brightgreen.svg)](https://twitter.com/OTR_Community) 4 | [![Open Source Love svg1](https://badges.frapsoft.com/os/v3/open-source.svg?v=103)](https://github.com/ellerbrock/open-source-badges/) 5 | 6 | The Infosec Community Definitive Guide to Jupyter Notebooks, created to empower researchers around the world to share, collaborate, and help others through interactive environments. This is a community-driven project that documents the installation of a Jupyter Notebook server and interesting use cases for security research. 7 | 8 | ## Goals: 9 | 10 | * Expedite the time it takes to start working with Jupyter Notebooks 11 | * Share use cases in different areas of InfoSec where notebooks can help 12 | * Aggregate and centralize information about Jupyter Notebooks for security researchers 13 | 14 | ## Contributing 15 | 16 | If you use Jupyter Notebooks for anything in InfoSec and would like to share it with the community and help others, feel free to open a PR. We would love to provide feedback and put it in the right place. 
17 | 18 | ## Authors 19 | 20 | * Roberto Rodriguez [@Cyb3rWard0g](https://twitter.com/Cyb3rWard0g) 21 | * Jose Luis Rodriguez [@Cyb3rPandaH](https://twitter.com/Cyb3rPandaH) -------------------------------------------------------------------------------- /datasets/JUPYTER_BASIC_CODE_CELL_EXECUTION.pcapng: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/datasets/JUPYTER_BASIC_CODE_CELL_EXECUTION.pcapng -------------------------------------------------------------------------------- /datasets/apt29_evals_day1_manual_demo.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/datasets/apt29_evals_day1_manual_demo.zip -------------------------------------------------------------------------------- /docs/_config.yml: -------------------------------------------------------------------------------- 1 | ####################################################################################### 2 | # Book settings 3 | title: "Infosec Jupyter Book" 4 | logo: images/logo/logo.png 5 | author: Roberto Rodriguez @Cyb3rWard0g, Jose Rodriguez (@Cyb3rPandaH) 6 | email: "" 7 | description: >- # this means to ignore newlines until "baseurl:" 8 | The Infosec Community Definitive Guide to Jupyter Notebooks 9 | execute: 10 | execute_notebooks: cache 11 | run_in_temp: true 12 | 13 | html: 14 | favicon: images/logo/favicon.ico 15 | home_page_in_navbar: false 16 | use_edit_page_button: true 17 | use_repository_button: true 18 | use_issues_button: true 19 | baseurl: https://infosecjupyterbook.com/ 20 | 21 | repository: 22 | url: https://github.com/OTRF/infosec-jupyter-book 23 | branch: master 24 | path_to_book: docs 25 | 26 | launch_buttons: 27 | notebook_interface: "classic" # The interface interactive links will activate ["classic", "jupyterlab"] 28 | binderhub_url: "https://mybinder.org" 29 | colab_url: "https://colab.research.google.com" 30 | thebe: true -------------------------------------------------------------------------------- /docs/_toc.yml: -------------------------------------------------------------------------------- 1 | - file: introduction 2 | - part: Getting Started 3 | chapters: 4 | - file: getting-started/what_is_jupyter 5 | sections: 6 | - file: getting-started/ipython_vs_python 7 | - file: getting-started/architecture 8 | - file: getting-started/installation 9 | sections: 10 | - file: getting-started/installation_docker 11 | - file: getting-started/installation_binderhub 12 | - part: Fundamentals 13 | chapters: 14 | - file: fundamentals/programming/intro 15 | sections: 16 | - file: fundamentals/programming/python 17 | - file: fundamentals/libraries/intro 18 | sections: 19 | - file: fundamentals/libraries/numpy_arrays 20 | - file: fundamentals/libraries/pandas 21 | - part: Use Cases 22 | chapters: 23 | - file: use-cases/data-analysis/intro 24 | sections: 25 | - file: use-cases/data-analysis/01_sql_to_pandas_win_events_101 26 | - file: use-cases/data-analysis/02_bloodhound_explore_kerberoastable_users 27 | - file: use-cases/data-analysis/03_analyzing_rpc_methods_relationships_graphframes 28 | - file: use-cases/data-connectors/intro 29 | sections: 30 | - file: use-cases/data-connectors/elasticsearch 31 | - file: use-cases/data-connectors/splunk 32 | - file: use-cases/data-connectors/azure_sentinel 33 | - file: 
use-cases/data-visualizations/intro 34 | sections: 35 | - file: use-cases/data-visualizations/01_data_viz_pandas 36 | - file: use-cases/data-visualizations/02_data_viz_bokeh 37 | - file: use-cases/data-visualizations/03_altair_bloodhound_results 38 | - part: Community Projects 39 | chapters: 40 | - file: community-projects/threat-hunter-playbook 41 | - part: Community Workshops 42 | chapters: 43 | - file: community-workshops/defcon_btv_2020/intro 44 | sections: 45 | - file: community-workshops/defcon_btv_2020/basic-concepts/intro 46 | sections: 47 | - file: community-workshops/defcon_btv_2020/basic-concepts/01_Creating_A_Spark_DataFrame 48 | - file: community-workshops/defcon_btv_2020/basic-concepts/02_Spark_SQL_View_From_Mordor_dataset 49 | - file: community-workshops/defcon_btv_2020/basic-concepts/03_Filtering_Summarizing 50 | - file: community-workshops/defcon_btv_2020/basic-concepts/04_Transforming 51 | - file: community-workshops/defcon_btv_2020/basic-concepts/05_Correlating 52 | - file: community-workshops/defcon_btv_2020/basic-concepts/06_Visualizing 53 | - file: community-workshops/defcon_btv_2020/use-cases/intro 54 | sections: 55 | - file: community-workshops/defcon_btv_2020/use-cases/01_Data_Analysis_Process_Injection 56 | - file: community-workshops/defcon_btv_2020/use-cases/02_Data_Analysis_DCSync_dcerpc 57 | - file: community-workshops/defcon_btv_2020/use-cases/03_Data_Analysis_RemoteCreateInstance_dcerpc_wmi 58 | - part: Community Events 59 | chapters: 60 | - file: community-events/jupyterthon -------------------------------------------------------------------------------- /docs/community-events/jupyterthon.md: -------------------------------------------------------------------------------- 1 | # Infosec Jupyterthon -------------------------------------------------------------------------------- /docs/community-projects/threat-hunter-playbook.md: -------------------------------------------------------------------------------- 1 | # Threat Hunter Playbook -------------------------------------------------------------------------------- /docs/community-workshops/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/community-workshops/.DS_Store -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/community-workshops/defcon_btv_2020/.DS_Store -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/basic-concepts/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/community-workshops/defcon_btv_2020/basic-concepts/.DS_Store -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/basic-concepts/01_Creating_A_Spark_DataFrame.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Creating a Spark Dataframe" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": 
{}, 13 | "source": [ 14 | "* **Author**: Jose Rodriguez (@Cyb3rPandah)\n", 15 | "* **Project**: Infosec Jupyter Book\n", 16 | "* **Public Organization**: [Open Threat Research](https://github.com/OTRF)\n", 17 | "* **License**: [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)\n", 18 | "* **Reference**: https://mordordatasets.com/introduction.html" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Importing Spark libraries" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 1, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "from pyspark.sql import SparkSession" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "## Creating Spark session" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 2, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "spark = SparkSession \\\n", 51 | " .builder \\\n", 52 | " .appName(\"Spark_example\") \\\n", 53 | " .config(\"spark.sql.caseSensitive\",\"True\") \\\n", 54 | " .getOrCreate()" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 3, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/html": [ 65 | "\n", 66 | "
<div>\n", 67 | "    <p><b>SparkSession - in-memory</b></p>\n", 68 | "    \n", 69 | "    <div>\n", 70 | "        <p><b>SparkContext</b></p>\n", 71 | "\n", 72 | "        <p>Spark UI</p>\n", 73 | "\n", 74 | "        <dl>\n", 75 | "          <dt>Version</dt>\n", 76 | "            <dd><code>v2.4.5</code></dd>\n", 77 | "          <dt>Master</dt>\n", 78 | "            <dd><code>local[*]</code></dd>\n", 79 | "          <dt>AppName</dt>\n", 80 | "            <dd><code>Spark_example</code></dd>\n", 81 | "        </dl>\n", 82 | "    </div>\n", 83 | "    \n", 84 | "</div>
\n", 85 | "        " 86 | ], 87 | "text/plain": [ 88 | "" 89 | ] 90 | }, 91 | "execution_count": 3, 92 | "metadata": {}, 93 | "output_type": "execute_result" 94 | } 95 | ], 96 | "source": [ 97 | "spark" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "## Creating a Spark Sample DataFrame" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "### Create sample data\n", 112 | "\n", 113 | "Security event logs" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 4, 119 | "metadata": {}, 120 | "outputs": [], 121 | "source": [ 122 | "eventLogs = [('Sysmon',1,'Process creation'),\n", 123 | " ('Sysmon',2,'A process changed a file creation time'),\n", 124 | " ('Sysmon',3,'Network connection'),\n", 125 | " ('Sysmon',4,'Sysmon service state changed'),\n", 126 | " ('Sysmon',5,'Process terminated'),\n", 127 | " ('Security',4688,'A process has been created'),\n", 128 | " ('Security',4697,'A service was installed in the system')]" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 5, 134 | "metadata": {}, 135 | "outputs": [ 136 | { 137 | "data": { 138 | "text/plain": [ 139 | "list" 140 | ] 141 | }, 142 | "execution_count": 5, 143 | "metadata": {}, 144 | "output_type": "execute_result" 145 | } 146 | ], 147 | "source": [ 148 | "type(eventLogs)" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "### Define dataframe schema" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 6, 161 | "metadata": {}, 162 | "outputs": [], 163 | "source": [ 164 | "from pyspark.sql.types import *" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 7, 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [ 173 | "schema = StructType([\n", 174 | " StructField(\"Channel\", StringType(), True),\n", 175 | " StructField(\"Event_Id\", IntegerType(), True),\n", 176 | " StructField(\"Description\", StringType(), True)])" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "### Create Spark dataframe" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 8, 189 | "metadata": {}, 190 | "outputs": [], 191 | "source": [ 192 | "eventLogsDf = spark.createDataFrame(eventLogs,schema)" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 9, 198 | "metadata": {}, 199 | "outputs": [ 200 | { 201 | "name": "stdout", 202 | "output_type": "stream", 203 | "text": [ 204 | "+--------+--------+--------------------------------------+\n", 205 | "|Channel |Event_Id|Description |\n", 206 | "+--------+--------+--------------------------------------+\n", 207 | "|Sysmon |1 |Process creation |\n", 208 | "|Sysmon |2 |A process changed a file creation time|\n", 209 | "|Sysmon |3 |Network connection |\n", 210 | "|Sysmon |4 |Sysmon service state changed |\n", 211 | "|Sysmon |5 |Process terminated |\n", 212 | "|Security|4688 |A process has been created |\n", 213 | "|Security|4697 |A service was installed in the system |\n", 214 | "+--------+--------+--------------------------------------+\n", 215 | "\n" 216 | ] 217 | } 218 | ], 219 | "source": [ 220 | "eventLogsDf.show(truncate = False)" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 10, 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "data": { 230 | "text/plain": [ 231 | "pyspark.sql.dataframe.DataFrame" 232 | ], 233 |
"execution_count": 10, 235 | "metadata": {}, 236 | "output_type": "execute_result" 237 | } 238 | ], 239 | "source": [ 240 | "type(eventLogsDf)" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "## Exposing Spark DataFrame as a SQL View" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 11, 253 | "metadata": {}, 254 | "outputs": [], 255 | "source": [ 256 | "eventLogsDf.createOrReplaceTempView('eventLogs')" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "## Testing a SQL-like Query" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "Filtering on **Sysmon** event logs" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": 12, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [ 279 | "sysmonEvents = spark.sql(\n", 280 | "'''\n", 281 | "SELECT *\n", 282 | "FROM eventLogs\n", 283 | "WHERE Channel = 'Sysmon'\n", 284 | "''')" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 13, 290 | "metadata": {}, 291 | "outputs": [ 292 | { 293 | "name": "stdout", 294 | "output_type": "stream", 295 | "text": [ 296 | "+-------+--------+--------------------------------------+\n", 297 | "|Channel|Event_Id|Description |\n", 298 | "+-------+--------+--------------------------------------+\n", 299 | "|Sysmon |1 |Process creation |\n", 300 | "|Sysmon |2 |A process changed a file creation time|\n", 301 | "|Sysmon |3 |Network connection |\n", 302 | "|Sysmon |4 |Sysmon service state changed |\n", 303 | "|Sysmon |5 |Process terminated |\n", 304 | "+-------+--------+--------------------------------------+\n", 305 | "\n" 306 | ] 307 | } 308 | ], 309 | "source": [ 310 | "sysmonEvents.show(truncate = False)" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "## Thank you! I hope you enjoyed it!" 
318 | ] 319 | } 320 | ], 321 | "metadata": { 322 | "kernelspec": { 323 | "display_name": "PySpark_Python3", 324 | "language": "python", 325 | "name": "pyspark3" 326 | }, 327 | "language_info": { 328 | "codemirror_mode": { 329 | "name": "ipython", 330 | "version": 3 331 | }, 332 | "file_extension": ".py", 333 | "mimetype": "text/x-python", 334 | "name": "python", 335 | "nbconvert_exporter": "python", 336 | "pygments_lexer": "ipython3", 337 | "version": "3.7.6" 338 | } 339 | }, 340 | "nbformat": 4, 341 | "nbformat_minor": 4 342 | } -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/basic-concepts/03_Filtering_Summarizing.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Data Analysis with Spark.SQL: Filtering & Summarizing\n", 8 | "* **Author**: Jose Rodriguez (@Cyb3rPandah)\n", 9 | "* **Project**: Infosec Jupyter Book\n", 10 | "* **Public Organization**: [Open Threat Research](https://github.com/OTRF)\n", 11 | "* **License**: [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)\n", 12 | "* **Reference**: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html" 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "metadata": {}, 18 | "source": [ 19 | "## Creating SQL view from Mordor APT29 dataset" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "### Create Spark session" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "from pyspark.sql import SparkSession\n", 36 | "\n", 37 | "spark = SparkSession \\\n", 38 | " .builder \\\n", 39 | " .appName(\"Spark_Data_Analysis\") \\\n", 40 | " .config(\"spark.sql.caseSensitive\",\"True\") \\\n", 41 | " .getOrCreate()" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "### Expose the dataframe as a SQL view" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 2, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "apt29Json = '../datasets/apt29_evals_day1_manual_2020-05-01225525.json'\n", 58 | "\n", 59 | "apt29Df = spark.read.json(apt29Json)\n", 60 | "\n", 61 | "apt29Df.createOrReplaceTempView('apt29')" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "## Filtering & Summarizing data" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "### Filter Sysmon event 8 (Create Remote Thread) data" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 3, 81 | "metadata": {}, 82 | "outputs": [ 83 | { 84 | "name": "stdout", 85 | "output_type": "stream", 86 | "text": [ 87 | "This dataframe has 95 records!!\n", 88 | "+---------------------------------------------------------+-------------------------------+-------------+\n", 89 | "|SourceImage |TargetImage |StartFunction|\n", 90 | "+---------------------------------------------------------+-------------------------------+-------------+\n", 91 | "|C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe|C:\\Windows\\System32\\lsass.exe |- |\n", 92 | "|C:\\Windows\\System32\\csrss.exe |C:\\Windows\\System32\\svchost.exe|CtrlRoutine |\n", 93 | "|C:\\Windows\\System32\\csrss.exe |C:\\Windows\\System32\\svchost.exe|CtrlRoutine 
|\n", 94 | "|C:\\Windows\\System32\\csrss.exe |C:\\Windows\\System32\\svchost.exe|CtrlRoutine |\n", 95 | "|C:\\Windows\\System32\\csrss.exe |C:\\Windows\\System32\\svchost.exe|CtrlRoutine |\n", 96 | "+---------------------------------------------------------+-------------------------------+-------------+\n", 97 | "only showing top 5 rows\n", 98 | "\n" 99 | ] 100 | } 101 | ], 102 | "source": [ 103 | "sysmon8 = spark.sql(\n", 104 | "'''\n", 105 | "SELECT SourceImage, TargetImage, StartFunction\n", 106 | "FROM apt29\n", 107 | "WHERE Channel = 'Microsoft-Windows-Sysmon/Operational' AND EventID = 8\n", 108 | "''')\n", 109 | "\n", 110 | "print('This dataframe has {} records!!'.format(sysmon8.count()))\n", 111 | "sysmon8.show(n = 5, truncate = False)" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "### Filter PowerShell processes within Sysmon event 8 (Create Remote Thread) data" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 4, 124 | "metadata": {}, 125 | "outputs": [ 126 | { 127 | "name": "stdout", 128 | "output_type": "stream", 129 | "text": [ 130 | "This dataframe has 1 records!!\n", 131 | "+---------------------------------------------------------+-----------------------------+-------------+\n", 132 | "|SourceImage |TargetImage |StartFunction|\n", 133 | "+---------------------------------------------------------+-----------------------------+-------------+\n", 134 | "|C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe|C:\\Windows\\System32\\lsass.exe|- |\n", 135 | "+---------------------------------------------------------+-----------------------------+-------------+\n", 136 | "\n" 137 | ] 138 | } 139 | ], 140 | "source": [ 141 | "sysmon8 = spark.sql(\n", 142 | "'''\n", 143 | "SELECT SourceImage, TargetImage, StartFunction\n", 144 | "FROM apt29\n", 145 | "WHERE Channel = 'Microsoft-Windows-Sysmon/Operational'\n", 146 | " AND EventID = 8\n", 147 | " AND SourceImage LIKE '%powershell.exe%'\n", 148 | "''')\n", 149 | "\n", 150 | "print('This dataframe has {} records!!'.format(sysmon8.count()))\n", 151 | "sysmon8.show(truncate = False)" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "## SUMMARIZING data" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "### Stack Count event logs by source of data and event id" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 5, 171 | "metadata": {}, 172 | "outputs": [ 173 | { 174 | "name": "stdout", 175 | "output_type": "stream", 176 | "text": [ 177 | "This dataframe has 203 records!!\n", 178 | "+----------------------------------------+-------+--------+\n", 179 | "|Channel |EventID|count(1)|\n", 180 | "+----------------------------------------+-------+--------+\n", 181 | "|Microsoft-Windows-Sysmon/Operational |12 |61151 |\n", 182 | "|Microsoft-Windows-Sysmon/Operational |10 |39283 |\n", 183 | "|Microsoft-Windows-Sysmon/Operational |7 |20259 |\n", 184 | "|Microsoft-Windows-Sysmon/Operational |13 |17541 |\n", 185 | "|Security |4658 |8561 |\n", 186 | "|Windows PowerShell |800 |5113 |\n", 187 | "|Microsoft-Windows-PowerShell/Operational|4103 |5080 |\n", 188 | "|Security |4690 |4269 |\n", 189 | "|Security |4656 |4260 |\n", 190 | "|Security |4663 |4197 |\n", 191 | "|Security |5156 |2679 |\n", 192 | "|security |5447 |2579 |\n", 193 | "|security |4658 |2412 |\n", 194 | "|Microsoft-Windows-Sysmon/Operational |11 |1649 |\n", 195 | "|Security 
|5158 |1465 |\n", 196 | "|security |4656 |1237 |\n", 197 | "|Microsoft-Windows-Sysmon/Operational |3 |1229 |\n", 198 | "|security |4690 |1202 |\n", 199 | "|security |4663 |1140 |\n", 200 | "|Security |4703 |902 |\n", 201 | "+----------------------------------------+-------+--------+\n", 202 | "only showing top 20 rows\n", 203 | "\n" 204 | ] 205 | } 206 | ], 207 | "source": [ 208 | "eventLogs = spark.sql(\n", 209 | "'''\n", 210 | "SELECT Channel, EventID, COUNT(*)\n", 211 | "FROM apt29\n", 212 | "GROUP BY Channel, EventID\n", 213 | "ORDER BY COUNT(*) DESC\n", 214 | "''')\n", 215 | "\n", 216 | "print('This dataframe has {} records!!'.format(eventLogs.count()))\n", 217 | "eventLogs.show(truncate = False)" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "### Filtering event logs groups with frequency less or equal to 500" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 6, 230 | "metadata": {}, 231 | "outputs": [ 232 | { 233 | "name": "stdout", 234 | "output_type": "stream", 235 | "text": [ 236 | "This dataframe has 180 records!!\n", 237 | "+----------------------------------------+-------+-----+\n", 238 | "|Channel |EventID|Count|\n", 239 | "+----------------------------------------+-------+-----+\n", 240 | "|security |5156 |484 |\n", 241 | "|Microsoft-Windows-Sysmon/Operational |1 |447 |\n", 242 | "|security |5158 |431 |\n", 243 | "|Microsoft-Windows-Sysmon/Operational |23 |422 |\n", 244 | "|Microsoft-Windows-PowerShell/Operational|4104 |414 |\n", 245 | "|security |4673 |409 |\n", 246 | "|Microsoft-Windows-Sysmon/Operational |5 |401 |\n", 247 | "|Microsoft-Windows-Sysmon/Operational |18 |362 |\n", 248 | "|security |5154 |362 |\n", 249 | "|security |4688 |279 |\n", 250 | "|Security |4689 |238 |\n", 251 | "|Security |4627 |234 |\n", 252 | "|Security |4624 |234 |\n", 253 | "|Security |4634 |233 |\n", 254 | "|Microsoft-Windows-Sysmon/Operational |2 |209 |\n", 255 | "|Security |4688 |181 |\n", 256 | "|security |4945 |176 |\n", 257 | "|security |4689 |160 |\n", 258 | "|Security |4672 |154 |\n", 259 | "|Windows PowerShell |600 |138 |\n", 260 | "+----------------------------------------+-------+-----+\n", 261 | "only showing top 20 rows\n", 262 | "\n" 263 | ] 264 | } 265 | ], 266 | "source": [ 267 | "eventLogsLess = spark.sql(\n", 268 | "'''\n", 269 | "SELECT Channel, EventID, COUNT(*) as Count\n", 270 | "FROM apt29\n", 271 | "GROUP BY Channel, EventID\n", 272 | "HAVING Count <= 500\n", 273 | "ORDER BY Count DESC\n", 274 | "''')\n", 275 | "\n", 276 | "print('This dataframe has {} records!!'.format(eventLogsLess.count()))\n", 277 | "eventLogsLess.show(truncate = False)" 278 | ] 279 | }, 280 | { 281 | "cell_type": "markdown", 282 | "metadata": {}, 283 | "source": [ 284 | "## Thank you! I hope you enjoyed it!" 
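, "\n", "\n", "For reference, the last GROUP BY / HAVING query can also be written with the DataFrame API instead of SQL. A sketch (assuming the `apt29Df` dataframe loaded at the top of this notebook; not an executed cell):\n", "\n", "```python\n", "from pyspark.sql import functions as F\n", "\n", "(apt29Df\n", "    .groupBy('Channel', 'EventID')       # GROUP BY Channel, EventID\n", "    .agg(F.count('*').alias('Count'))    # COUNT(*) as Count\n", "    .where(F.col('Count') <= 500)        # HAVING Count <= 500\n", "    .orderBy(F.col('Count').desc())      # ORDER BY Count DESC\n", "    .show(truncate = False))\n", "```"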
285 | ] 286 | } 287 | ], 288 | "metadata": { 289 | "kernelspec": { 290 | "display_name": "PySpark_Python3", 291 | "language": "python", 292 | "name": "pyspark3" 293 | }, 294 | "language_info": { 295 | "codemirror_mode": { 296 | "name": "ipython", 297 | "version": 3 298 | }, 299 | "file_extension": ".py", 300 | "mimetype": "text/x-python", 301 | "name": "python", 302 | "nbconvert_exporter": "python", 303 | "pygments_lexer": "ipython3", 304 | "version": "3.7.6" 305 | } 306 | }, 307 | "nbformat": 4, 308 | "nbformat_minor": 2 309 | } 310 | -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/basic-concepts/04_Transforming.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Data Analysis with Spark.SQL: Transforming\n", 8 | "* **Author**: Jose Rodriguez (@Cyb3rPandah)\n", 9 | "* **Project**: Infosec Jupyter Book\n", 10 | "* **Public Organization**: [Open Threat Research](https://github.com/OTRF)\n", 11 | "* **License**: [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)\n", 12 | "* **Reference**: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html" 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "metadata": {}, 18 | "source": [ 19 | "## Creating SQL view from Mordor APT29 dataset" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "### Create Spark session" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "from pyspark.sql import SparkSession\n", 36 | "\n", 37 | "spark = SparkSession \\\n", 38 | " .builder \\\n", 39 | " .appName(\"Spark_Data_Analysis\") \\\n", 40 | " .config(\"spark.sql.caseSensitive\",\"True\") \\\n", 41 | " .getOrCreate()" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "### Expose the dataframe as a SQL view" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 2, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "apt29Json = '../datasets/apt29_evals_day1_manual_2020-05-01225525.json'\n", 58 | "\n", 59 | "apt29Df = spark.read.json(apt29Json)\n", 60 | "\n", 61 | "apt29Df.createOrReplaceTempView('apt29')" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "## Transforming data with Spark Built-In functions" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "### Convert ProcessId (String) to Integer format" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 3, 81 | "metadata": { 82 | "scrolled": false 83 | }, 84 | "outputs": [ 85 | { 86 | "name": "stdout", 87 | "output_type": "stream", 88 | "text": [ 89 | "This dataframe has 447 records!!\n", 90 | "root\n", 91 | " |-- ProcessId: string (nullable = true)\n", 92 | " |-- IntegerProcessId: integer (nullable = true)\n", 93 | "\n", 94 | "+---------+----------------+\n", 95 | "|ProcessId|IntegerProcessId|\n", 96 | "+---------+----------------+\n", 97 | "|8524 |8524 |\n", 98 | "|5156 |5156 |\n", 99 | "|2772 |2772 |\n", 100 | "|5944 |5944 |\n", 101 | "|4152 |4152 |\n", 102 | "+---------+----------------+\n", 103 | "only showing top 5 rows\n", 104 | "\n" 105 | ] 106 | } 107 | ], 108 | "source": [ 109 | "IntegerProcessId = spark.sql(\n", 110 | "'''\n", 111 | "SELECT ProcessId, cast(ProcessId as Integer) as IntegerProcessId\n", 112 | "FROM apt29\n", 113 | "WHERE lower(Channel) LIKE '%sysmon%'\n", 114 | " AND EventID = 1\n", 115 | "''')\n", 116 | "\n", 117 | "print('This dataframe has {} records!!'.format(IntegerProcessId.count()))\n", 118 | "IntegerProcessId.printSchema()\n", 119 | "IntegerProcessId.show(n = 5, truncate = False)" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "### Convert ProcessId (Integer) to Hexadecimal format" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 4, 132 | "metadata": { 133 | "scrolled": true 134 | }, 135 | "outputs": [ 136 | { 137 | "name": "stdout", 138 | "output_type": "stream", 139 | "text": [ 140 | "This dataframe has 447 records!!\n", 141 | "root\n", 142 | " |-- ProcessId: string (nullable = true)\n", 143 | " |-- HexadecimalProcessId: string (nullable = true)\n", 144 | "\n", 145 | "+---------+--------------------+\n", 146 | "|ProcessId|HexadecimalProcessId|\n", 147 | "+---------+--------------------+\n", 148 | "|8524 |214C |\n", 149 | "|5156 |1424 |\n", 150 | "|2772 |AD4 |\n", 151 | "|5944 |1738 |\n", 152 | "|4152 |1038 |\n", 153 | "+---------+--------------------+\n", 154 | "only showing top 5 rows\n", 155 | "\n" 156 | ] 157 | } 158 | ], 159 | "source": [ 160 | "HexadecimalProcessId = spark.sql(\n", 161 | "'''\n", 162 | "SELECT ProcessId, hex(cast(ProcessId as Integer)) as HexadecimalProcessId\n", 163 | "FROM apt29\n", 164 | "WHERE lower(Channel) LIKE '%sysmon%'\n", 165 | " AND EventID = 1\n", 166 | "''')\n", 167 | "\n", 168 | "print('This dataframe has {} records!!'.format(HexadecimalProcessId.count()))\n", 169 | "HexadecimalProcessId.printSchema()\n", 170 | "HexadecimalProcessId.show(n = 5, truncate = False)" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "## Transforming data with Spark User Defined Functions (UDF)" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "### Calculate the number of characters of Command Line values in Sysmon 1 (Process Creation) events" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "* Define function" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": 5, 197 | "metadata": {}, 198 | "outputs": [], 199 | "source": [ 200 | "def LenCommand(value):\n", 201 | " Length = len(value)\n", 202 | " return Length" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "* Import **pyspark.sql.types**" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 6, 215 | "metadata": {}, 216 | "outputs": [], 217 | "source": [ 218 | "from pyspark.sql.types import *" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "* Register **UDF**" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 7, 231 | "metadata": {}, 232 | "outputs": [ 233 | { 234 | "data": { 235 | "text/plain": [ 236 | "" 237 | ] 238 | }, 239 | "execution_count": 7, 240 | "metadata": {}, 241 | "output_type": "execute_result" 242 | } 243 | ], 244 | "source": [ 245 | "spark.udf.register(\"LengthCommand\", LenCommand, IntegerType())" 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "* Use **UDF**" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 |
"execution_count": 8, 258 | "metadata": {}, 259 | "outputs": [ 260 | { 261 | "name": "stdout", 262 | "output_type": "stream", 263 | "text": [ 264 | "This dataframe has 447 records!!\n", 265 | "root\n", 266 | " |-- CommandLine: string (nullable = true)\n", 267 | " |-- LengthCommandLine: integer (nullable = true)\n", 268 | "\n", 269 | "+--------------------------------------------------------------------------------+-----------------+\n", 270 | "| CommandLine|LengthCommandLine|\n", 271 | "+--------------------------------------------------------------------------------+-----------------+\n", 272 | "| \"C:\\ProgramData\\victim\\‮cod.3aka3.scr\" /S| 43|\n", 273 | "|\\\\?\\C:\\windows\\system32\\conhost.exe --headless --width 80 --height 25 --signa...| 99|\n", 274 | "| \"C:\\windows\\system32\\cmd.exe\"| 29|\n", 275 | "| powershell| 10|\n", 276 | "|\"C:\\windows\\system32\\SearchProtocolHost.exe\" Global\\UsGthrFltPipeMssGthrPipe6...| 308|\n", 277 | "+--------------------------------------------------------------------------------+-----------------+\n", 278 | "only showing top 5 rows\n", 279 | "\n" 280 | ] 281 | } 282 | ], 283 | "source": [ 284 | "commandLine = spark.sql(\n", 285 | "'''\n", 286 | "SELECT CommandLine, LengthCommand(CommandLine) as LengthCommandLine\n", 287 | "FROM apt29\n", 288 | "WHERE Channel LIKE '%Sysmon%'\n", 289 | " AND EventID = 1\n", 290 | "''')\n", 291 | "\n", 292 | "print('This dataframe has {} records!!'.format(commandLine.count()))\n", 293 | "commandLine.printSchema()\n", 294 | "commandLine.show(n = 5, truncate = 80)" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "## Thank you! I hope you enjoyed it!" 302 | ] 303 | } 304 | ], 305 | "metadata": { 306 | "kernelspec": { 307 | "display_name": "PySpark_Python3", 308 | "language": "python", 309 | "name": "pyspark3" 310 | }, 311 | "language_info": { 312 | "codemirror_mode": { 313 | "name": "ipython", 314 | "version": 3 315 | }, 316 | "file_extension": ".py", 317 | "mimetype": "text/x-python", 318 | "name": "python", 319 | "nbconvert_exporter": "python", 320 | "pygments_lexer": "ipython3", 321 | "version": "3.7.6" 322 | } 323 | }, 324 | "nbformat": 4, 325 | "nbformat_minor": 2 326 | } 327 | -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/basic-concepts/05_Correlating.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Data Analysis with Spark.SQL: Correlating\n", 8 | "* **Author**: Jose Rodriguez (@Cyb3rPandah)\n", 9 | "* **Project**: Infosec Jupyter Book\n", 10 | "* **Public Organization**: [Open Threat Research](https://github.com/OTRF)\n", 11 | "* **License**: [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)\n", 12 | "* **Reference**: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html" 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "metadata": {}, 18 | "source": [ 19 | "## Creating SQL view from Mordor APT29 dataset" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "### Create Spark session" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "from pyspark.sql import SparkSession\n", 36 | "\n", 37 | "spark = SparkSession \\\n", 38 | " .builder 
\\\n", 39 | " .appName(\"Spark_Data_Analysis\") \\\n", 40 | " .config(\"spark.sql.caseSensitive\",\"True\") \\\n", 41 | " .getOrCreate()" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "### Expose the dataframe as a SQL view" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 2, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "apt29Json = '../datasets/apt29_evals_day1_manual_2020-05-01225525.json'\n", 58 | "\n", 59 | "apt29Df = spark.read.json(apt29Json)\n", 60 | "\n", 61 | "apt29Df.createOrReplaceTempView('apt29')" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "## Correlating data" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "### Get new Processes created by an Account that Logged On over the network" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 3, 81 | "metadata": { 82 | "scrolled": true 83 | }, 84 | "outputs": [ 85 | { 86 | "name": "stdout", 87 | "output_type": "stream", 88 | "text": [ 89 | "This dataframe has 1 records!!\n", 90 | "+---------------+--------------+-----------------------------------+-------------------------------+---------+\n", 91 | "|SubjectUserName|TargetUserName|NewProcessName |ParentProcessName |IpAddress|\n", 92 | "+---------------+--------------+-----------------------------------+-------------------------------+---------+\n", 93 | "|NASHUA$ |pbeesly |C:\\Windows\\System32\\wsmprovhost.exe|C:\\Windows\\System32\\svchost.exe|- |\n", 94 | "+---------------+--------------+-----------------------------------+-------------------------------+---------+\n", 95 | "\n" 96 | ] 97 | } 98 | ], 99 | "source": [ 100 | "lateralMovement = spark.sql(\n", 101 | "'''\n", 102 | "SELECT b.SubjectUserName, b.TargetUserName, b.NewProcessName, b.ParentProcessName, a.IpAddress\n", 103 | "FROM apt29 b\n", 104 | "INNER JOIN(\n", 105 | " SELECT TargetLogonId, LogonType, IpAddress\n", 106 | " FROM apt29\n", 107 | " WHERE lower(Channel) LIKE '%security%'\n", 108 | " AND EventID = 4624\n", 109 | " AND LogonType = 3\n", 110 | " )a\n", 111 | "ON a.TargetLogonId = b.TargetLogonId\n", 112 | "WHERE lower(b.Channel) LIKE '%security%'\n", 113 | " AND b.EventID = 4688\n", 114 | "''')\n", 115 | "\n", 116 | "print('This dataframe has {} records!!'.format(lateralMovement.count()))\n", 117 | "lateralMovement.show(truncate = False)" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "### Add context (Parent Process) to Network Connection events" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 4, 130 | "metadata": {}, 131 | "outputs": [ 132 | { 133 | "name": "stdout", 134 | "output_type": "stream", 135 | "text": [ 136 | "This dataframe has 1229 records!!\n", 137 | "+-------------------------+---------------+---------------+-----------------------+\n", 138 | "| Image| SourceIp| DestinationIp| ParentImage|\n", 139 | "+-------------------------+---------------+---------------+-----------------------+\n", 140 | "|C:\\Windows\\System32\\dn...| 10.0.0.4| 172.18.39.2| null|\n", 141 | "|C:\\Windows\\ADWS\\Micros...|0:0:0:0:0:0:0:1|0:0:0:0:0:0:0:1| null|\n", 142 | "|C:\\Windows\\System32\\ls...|0:0:0:0:0:0:0:1|0:0:0:0:0:0:0:1| null|\n", 143 | "|C:\\ProgramData\\victim\\...| 10.0.1.4| 192.168.0.5|C:\\Windows\\explorer.exe|\n", 144 | "|C:\\Windows\\System32\\sv...| 10.0.1.6| 10.0.0.4| null|\n", 145 | 
"+-------------------------+---------------+---------------+-----------------------+\n", 146 | "only showing top 5 rows\n", 147 | "\n" 148 | ] 149 | } 150 | ], 151 | "source": [ 152 | "parentProcess = spark.sql(\n", 153 | "'''\n", 154 | "SELECT b.Image, b.SourceIp, b.DestinationIp, a.ParentImage\n", 155 | "FROM apt29 b\n", 156 | "LEFT JOIN(\n", 157 | " SELECT ProcessGuid, ParentImage\n", 158 | " FROM apt29\n", 159 | " WHERE lower(Channel) LIKE '%sysmon%'\n", 160 | " AND EventID = 1\n", 161 | " )a\n", 162 | "ON a.ProcessGuid = b.ProcessGuid\n", 163 | "WHERE lower(b.Channel) LIKE '%sysmon%'\n", 164 | " AND b.EventID = 3\n", 165 | "''')\n", 166 | "\n", 167 | "print('This dataframe has {} records!!'.format(parentProcess.count()))\n", 168 | "parentProcess.show(n = 5, truncate = 25)" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "### Add context (Parent Process) to Processes that made a Network Connection and modified a Registry Value" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 5, 181 | "metadata": {}, 182 | "outputs": [ 183 | { 184 | "name": "stdout", 185 | "output_type": "stream", 186 | "text": [ 187 | "This dataframe has 3524 records!!\n", 188 | "-RECORD 0----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n", 189 | " ParentImage | C:\\Windows\\System32\\control.exe \n", 190 | " Image | C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe \n", 191 | " SourceIp | 10.0.1.4 \n", 192 | " DestinationIp | 192.168.0.5 \n", 193 | " TargetObject | HKLM\\System\\CurrentControlSet\\Services\\bam\\State\\UserSettings\\S-1-5-21-1830255721-3727074217-2423397540-1107\\\\Device\\HarddiskVolume2\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe \n", 194 | "only showing top 1 row\n", 195 | "\n" 196 | ] 197 | } 198 | ], 199 | "source": [ 200 | "modifyRegistry = spark.sql(\n", 201 | "'''\n", 202 | "SELECT d.ParentImage, c.Image, c.SourceIp, c.DestinationIp, c.TargetObject\n", 203 | "FROM apt29 d\n", 204 | "RIGHT JOIN(\n", 205 | " SELECT b.ProcessGuid, b.Image, b.SourceIp, b.DestinationIp, a.TargetObject\n", 206 | " FROM apt29 b\n", 207 | " INNER JOIN(\n", 208 | " SELECT ProcessGuid, TargetObject\n", 209 | " FROM apt29\n", 210 | " WHERE lower(Channel) LIKE '%sysmon%'\n", 211 | " AND EventID = 13\n", 212 | " )a\n", 213 | " ON b.ProcessGuid = a.ProcessGuid\n", 214 | " WHERE lower(b.Channel) LIKE '%sysmon%'\n", 215 | " AND b.EventID = 3\n", 216 | ")c\n", 217 | "ON d.ProcessGuid = c.ProcessGuid\n", 218 | "WHERE lower(d.Channel) LIKE '%sysmon%'\n", 219 | " AND d.EventID = 1\n", 220 | "''')\n", 221 | "\n", 222 | "print('This dataframe has {} records!!'.format(modifyRegistry.count()))\n", 223 | "modifyRegistry.show(n = 1, vertical = True,truncate = False)" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "## Thank you! I hope you enjoyed it!" 
231 | ] 232 | } 233 | ], 234 | "metadata": { 235 | "kernelspec": { 236 | "display_name": "PySpark_Python3", 237 | "language": "python", 238 | "name": "pyspark3" 239 | }, 240 | "language_info": { 241 | "codemirror_mode": { 242 | "name": "ipython", 243 | "version": 3 244 | }, 245 | "file_extension": ".py", 246 | "mimetype": "text/x-python", 247 | "name": "python", 248 | "nbconvert_exporter": "python", 249 | "pygments_lexer": "ipython3", 250 | "version": "3.7.6" 251 | } 252 | }, 253 | "nbformat": 4, 254 | "nbformat_minor": 2 255 | } 256 | -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/basic-concepts/intro.md: -------------------------------------------------------------------------------- 1 | # Basic Data Analysis Concepts -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/datasets/apt29_evals_day1_manual.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/community-workshops/defcon_btv_2020/datasets/apt29_evals_day1_manual.zip -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/datasets/covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/community-workshops/defcon_btv_2020/datasets/covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges.zip -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/datasets/covenant_sc_dcerpc_smb_svcctl_QueryServiceStatus.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/community-workshops/defcon_btv_2020/datasets/covenant_sc_dcerpc_smb_svcctl_QueryServiceStatus.zip -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/datasets/covenant_sharpsc_dcerpc_smb_svcctl_EnumServiceStatusW.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/community-workshops/defcon_btv_2020/datasets/covenant_sharpsc_dcerpc_smb_svcctl_EnumServiceStatusW.zip -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/datasets/covenant_sharpwmi_dcerpc_wmi_remotecreateinstance.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/community-workshops/defcon_btv_2020/datasets/covenant_sharpwmi_dcerpc_wmi_remotecreateinstance.zip -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/datasets/empire_psinject.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/community-workshops/defcon_btv_2020/datasets/empire_psinject.zip 
-------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/intro.md: -------------------------------------------------------------------------------- 1 | # Defcon BTV 2020 2 | 3 | ## Data Analysis for Detection Research Through Jupyter Notebooks 101 4 | From a detection research perspective, even after learning how to simulate a threat actor technique and generate some data in your lab environment, you might still struggle to know what to do with it. In some cases, you might need to filter, transform, correlate and visualize your data to come up with the right detection logic. In this workshop, we will walk you through a few basic data analysis techniques using open source and SIEM agnostic tools such as Jupyter Notebooks, which are not only used by large organizations but can also be deployed at home for free. 5 | 6 | ## Prerequisites 7 | 8 | * Basics of Python 9 | * A computer with Docker installed (optional). 10 | * If you are planning on deploying Jupyter in your own system, we will show you how to deploy it via Docker. It is not necessary since we are going to use BinderHub to interact with Jupyter Notebooks throughout the whole workshop. 11 | 12 | ## Outline 13 | 14 | * Introduction to Jupyter Notebooks (10 mins) 15 | * Deployment Options 16 | * Binder Project 17 | * Introduction to Apache Spark (5 mins) 18 | * Spark Engine 19 | * Spark SQL & DataFrames 20 | * Data Analysis Process 101 (10 mins) 21 | * We need data! (Mordor Project) (5 mins) 22 | * Download Datasets 23 | * Raw Data -> DataFrame 24 | * A few data analysis techniques (a short PySpark preview appears further down this page): (1 hour) 25 | * filter 26 | * transform 27 | * correlate 28 | * visualize 29 | 30 | ## Speaker: Jose Rodriguez 31 | 32 | Twitter Handle: **@Cyb3rPandaH** 33 | 34 | Jose is currently part of the ATT&CK team, where he is revamping the concept of data sources. He is also one of the founders of Open Threat Research (OTR) and author of open source projects such as Infosec Jupyter Book, Open Source Security Event Metadata (OSSEM), Mordor, and Openhunt. 35 | 36 | ## Speaker: Roberto Rodriguez 37 | 38 | Roberto Rodriguez is a threat researcher and security engineer at the Microsoft Threat Intelligence Center (MSTIC) R&D team. 39 | 40 | He is also the author of several open source projects, such as the Threat Hunter Playbook, Mordor, OSSEM, HELK and others, to aid the community development of techniques and tooling for threat research. He is also the founder of a new community movement to empower others in the InfoSec community named Open Threat Research.
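For a quick preview of the four techniques listed in the outline above, here is a minimal end-to-end PySpark sketch. It is illustrative only: the dataset file name and the column names are assumptions taken from the workshop notebooks, and the final plotting step assumes pandas and matplotlib are installed.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("Workshop_Preview").getOrCreate()
apt29 = spark.read.json("apt29_evals_day1_manual_2020-05-01225525.json")  # assumed file name

# 1. Filter: keep only Sysmon process creation events
procs = apt29.where((F.col("Channel") == "Microsoft-Windows-Sysmon/Operational") & (F.col("EventID") == 1))

# 2. Transform: derive a new column from an existing one
procs = procs.withColumn("CommandLength", F.length("CommandLine"))

# 3. Correlate: join process creations to their network connections on ProcessGuid
conns = apt29.where((F.col("Channel") == "Microsoft-Windows-Sysmon/Operational") & (F.col("EventID") == 3))
joined = procs.select("ProcessGuid", "Image").join(conns.select("ProcessGuid", "DestinationIp"), "ProcessGuid")

# 4. Visualize: summarize and plot the result
joined.groupBy("Image").count().toPandas().plot.barh(x="Image", y="count")
```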
41 | 42 | Blog at `https://medium.com/@Cyb3rWard0g` 43 | 44 | Twitter Handle: **@Cyb3rWard0g** 45 | 46 | Have Fun -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/use-cases/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/community-workshops/defcon_btv_2020/use-cases/.DS_Store -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/use-cases/01_Data_Analysis_Process_Injection.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Process Injection - CreateRemoteThread\n", 8 | "* **Author**: Jose Rodriguez (@Cyb3rPandah)\n", 9 | "* **Project**: Infosec Jupyter Book\n", 10 | "* **Public Organization**: [Open Threat Research](https://github.com/OTRF)\n", 11 | "* **License**: [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)\n", 12 | "* **Reference**: \n", 13 | " * https://spark.apache.org/docs/latest/api/python/pyspark.sql.html\n", 14 | " * https://docs.microsoft.com/en-us/windows/win32/procthread/process-security-and-access-rights" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "## Creating SQL view from Mordor Process Injection dataset" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "### Create Spark session" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 1, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "from pyspark.sql import SparkSession\n", 38 | "\n", 39 | "spark = SparkSession \\\n", 40 | " .builder \\\n", 41 | " .appName(\"Spark_Data_Analysis\") \\\n", 42 | " .config(\"spark.sql.caseSensitive\",\"True\") \\\n", 43 | " .getOrCreate()" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "### Unzip Mordor Dataset" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 2, 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "name": "stdout", 60 | "output_type": "stream", 61 | "text": [ 62 | "Archive: ../datasets/empire_psinject.zip\n", 63 | " inflating: ../datasets/empire_psinject_2020-08-07143205.json \n" 64 | ] 65 | } 66 | ], 67 | "source": [ 68 | "!
unzip -o ../datasets/empire_psinject.zip -d ../datasets/" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "### Expose the dataframe as a SQL view" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 3, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "processInjectionJson = '../datasets/empire_psinject_2020-08-07143205.json'\n", 85 | "\n", 86 | "processInjectionDf = spark.read.json(processInjectionJson)\n", 87 | "\n", 88 | "processInjectionDf.createOrReplaceTempView('processInjection')" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "## Filtering & Summarizing data" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "### Get most frequent Access Flags (Bitmask) of Processes accessing other Processes" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "* Create dataframe" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 4, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "name": "stdout", 119 | "output_type": "stream", 120 | "text": [ 121 | "This dataframe has 10 records!!\n", 122 | "+-------------+-----+\n", 123 | "|GrantedAccess|Count|\n", 124 | "+-------------+-----+\n", 125 | "| 0x1000| 463|\n", 126 | "| 0x3000| 83|\n", 127 | "| 0x40| 4|\n", 128 | "| 0x1fffff| 2|\n", 129 | "| 0x1400| 2|\n", 130 | "| 0x1410| 2|\n", 131 | "| 0x1478| 2|\n", 132 | "| 0x1f3fff| 1|\n", 133 | "| 0x100000| 1|\n", 134 | "| 0x101541| 1|\n", 135 | "+-------------+-----+\n", 136 | "\n" 137 | ] 138 | } 139 | ], 140 | "source": [ 141 | "processAccess = spark.sql(\n", 142 | "'''\n", 143 | "SELECT GrantedAccess, count(*) as Count\n", 144 | "FROM processInjection\n", 145 | "WHERE lower(Channel) LIKE '%sysmon%'\n", 146 | " AND EventID = 10\n", 147 | "GROUP BY GrantedAccess\n", 148 | "ORDER BY Count DESC\n", 149 | "''')\n", 150 | "\n", 151 | "print('This dataframe has {} records!!'.format(processAccess.count()))\n", 152 | "processAccess.show()" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "## Transforming data" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "### Create a Spark UDF to get the specific Access Rights related to every Bitmask" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "* Define a function" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 5, 179 | "metadata": {}, 180 | "outputs": [], 181 | "source": [ 182 | "def getSpecificAccessRights(bitmask):\n", 183 | " bitmask = int(bitmask,16)\n", 184 | " specificAccessRights = {'PROCESS_CREATE_PROCESS' : 0x0080,\n", 185 | " 'PROCESS_CREATE_THREAD' : 0x0002,\n", 186 | " 'PROCESS_DUP_HANDLE' : 0x0040,\n", 187 | " 'PROCESS_QUERY_INFORMATION' : 0x0400,\n", 188 | " 'PROCESS_QUERY_LIMITED_INFORMATION' : 0x1000,\n", 189 | " 'PROCESS_SET_INFORMATION' : 0x0200,\n", 190 | " 'PROCESS_SET_QUOTA' : 0x0100,\n", 191 | " 'PROCESS_SUSPEND_RESUME' : 0x0800,\n", 192 | " 'PROCESS_TERMINATE' : 0x0001,\n", 193 | " 'PROCESS_VM_OPERATION' : 0x0008,\n", 194 | " 'PROCESS_VM_READ' : 0x0010,\n", 195 | " 'PROCESS_VM_WRITE' : 0x0020,\n", 196 | " 'SYNCHRONIZE' : 0x00100000,\n", 197 | " 'PROCESS_SET_LIMITED_INFORMATION' : 0x2000}\n", 198 | " \n", 199 | " rights = [ ]\n", 200 | " \n", 201 | " for key,value in specificAccessRights.items():\n", 202 | "
if value & bitmask != 0:\n", 203 | " rights.append(key)\n", 204 | " \n", 205 | " return rights" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "* Register Spark UDF" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 6, 218 | "metadata": {}, 219 | "outputs": [ 220 | { 221 | "data": { 222 | "text/plain": [ 223 | "" 224 | ] 225 | }, 226 | "execution_count": 6, 227 | "metadata": {}, 228 | "output_type": "execute_result" 229 | } 230 | ], 231 | "source": [ 232 | "from pyspark.sql.types import *\n", 233 | "spark.udf.register(\"getAccessRights\", getSpecificAccessRights,ArrayType(StringType()))" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": {}, 239 | "source": [ 240 | "* Apply the Spark UDF" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 7, 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "name": "stdout", 250 | "output_type": "stream", 251 | "text": [ 252 | "This dataframe has 10 records!!\n", 253 | "+-------------+--------------------------------------------------------------------------------+-----+\n", 254 | "|GrantedAccess| RightsRequested|Count|\n", 255 | "+-------------+--------------------------------------------------------------------------------+-----+\n", 256 | "| 0x1000| [PROCESS_QUERY_LIMITED_INFORMATION]| 463|\n", 257 | "| 0x3000| [PROCESS_QUERY_LIMITED_INFORMATION, PROCESS_SET_LIMITED_INFORMATION]| 83|\n", 258 | "| 0x40| [PROCESS_DUP_HANDLE]| 4|\n", 259 | "| 0x1400| [PROCESS_QUERY_INFORMATION, PROCESS_QUERY_LIMITED_INFORMATION]| 2|\n", 260 | "| 0x1410| [PROCESS_QUERY_INFORMATION, PROCESS_QUERY_LIMITED_INFORMATION, PROCESS_VM_READ]| 2|\n", 261 | "| 0x1478|[PROCESS_DUP_HANDLE, PROCESS_QUERY_INFORMATION, PROCESS_QUERY_LIMITED_INFORMA...| 2|\n", 262 | "| 0x1fffff|[PROCESS_CREATE_PROCESS, PROCESS_CREATE_THREAD, PROCESS_DUP_HANDLE, PROCESS_Q...| 2|\n", 263 | "| 0x1f3fff|[PROCESS_CREATE_PROCESS, PROCESS_CREATE_THREAD, PROCESS_DUP_HANDLE, PROCESS_Q...| 1|\n", 264 | "| 0x100000| [SYNCHRONIZE]| 1|\n", 265 | "| 0x101541|[PROCESS_DUP_HANDLE, PROCESS_QUERY_INFORMATION, PROCESS_QUERY_LIMITED_INFORMA...| 1|\n", 266 | "+-------------+--------------------------------------------------------------------------------+-----+\n", 267 | "\n" 268 | ] 269 | } 270 | ], 271 | "source": [ 272 | "processAccessRights = spark.sql(\n", 273 | "'''\n", 274 | "SELECT GrantedAccess, getAccessRights(GrantedAccess) as RightsRequested, count(*) as Count\n", 275 | "FROM processInjection\n", 276 | "WHERE lower(Channel) LIKE '%sysmon%'\n", 277 | " AND EventID = 10\n", 278 | "GROUP BY GrantedAccess, RightsRequested\n", 279 | "ORDER BY Count DESC\n", 280 | "''')\n", 281 | "\n", 282 | "print('This dataframe has {} records!!'.format(processAccessRights.count()))\n", 283 | "processAccessRights.show(truncate = 80)" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "### Filter events that requested \"Creation of Thread\" rights" 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "* Filter **PROCESS_CREATE_THREAD (0x0002)**: Required to create a thread." 
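,
"\n",
"CreateRemoteThread-style injection requires this access right (together with rights such as PROCESS_VM_OPERATION and PROCESS_VM_WRITE) on the handle to the target process, which makes it a useful filter for hunting."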
298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": 8, 303 | "metadata": {}, 304 | "outputs": [ 305 | { 306 | "name": "stdout", 307 | "output_type": "stream", 308 | "text": [ 309 | "This dataframe has 3 records!!\n", 310 | "+-------------+---------------------------------------------------------+-------------------------------------+\n", 311 | "|GrantedAccess| SourceImage| TargetImage|\n", 312 | "+-------------+---------------------------------------------------------+-------------------------------------+\n", 313 | "| 0x1f3fff|C:\\windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe| C:\\windows\\system32\\notepad.exe|\n", 314 | "| 0x1fffff| C:\\windows\\system32\\svchost.exe|C:\\windows\\system32\\wbem\\wmiprvse.exe|\n", 315 | "| 0x1fffff| C:\\windows\\system32\\csrss.exe|C:\\windows\\system32\\wbem\\wmiprvse.exe|\n", 316 | "+-------------+---------------------------------------------------------+-------------------------------------+\n", 317 | "\n" 318 | ] 319 | } 320 | ], 321 | "source": [ 322 | "createThread = spark.sql(\n", 323 | "'''\n", 324 | "SELECT GrantedAccess, SourceImage, TargetImage\n", 325 | "FROM processInjection\n", 326 | "WHERE lower(Channel) LIKE '%sysmon%'\n", 327 | " AND EventID = 10\n", 328 | " AND array_contains(getAccessRights(GrantedAccess),'PROCESS_CREATE_THREAD')\n", 329 | "''')\n", 330 | "\n", 331 | "print('This dataframe has {} records!!'.format(createThread.count()))\n", 332 | "createThread.show(truncate = 80)" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": {}, 338 | "source": [ 339 | "## Correlating data" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "### Find Source Processes that used CreateRemoteThread APIs" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": 9, 352 | "metadata": {}, 353 | "outputs": [ 354 | { 355 | "name": "stdout", 356 | "output_type": "stream", 357 | "text": [ 358 | "This dataframe has 88 records!!\n", 359 | "+----------------------------------------+-------------------------------+-----------+\n", 360 | "| SourceImage| TargetImage|NewThreadId|\n", 361 | "+----------------------------------------+-------------------------------+-----------+\n", 362 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 3004|\n", 363 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 3756|\n", 364 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 2836|\n", 365 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 5764|\n", 366 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 8044|\n", 367 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 6168|\n", 368 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 8292|\n", 369 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 2976|\n", 370 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 1820|\n", 371 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 8252|\n", 372 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 4952|\n", 373 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 5436|\n", 374 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 9036|\n", 375 | 
"|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 6556|\n", 376 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 8468|\n", 377 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 8592|\n", 378 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 6628|\n", 379 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 2272|\n", 380 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 904|\n", 381 | "|C:\\windows\\System32\\WindowsPowerShell...|C:\\windows\\system32\\notepad.exe| 8816|\n", 382 | "+----------------------------------------+-------------------------------+-----------+\n", 383 | "only showing top 20 rows\n", 384 | "\n" 385 | ] 386 | } 387 | ], 388 | "source": [ 389 | "networkConnection = spark.sql(\n", 390 | "'''\n", 391 | "SELECT b. SourceImage, b.TargetImage, a.NewThreadId\n", 392 | "FROM processInjection b\n", 393 | "INNER JOIN(\n", 394 | " SELECT SourceProcessGuid, NewThreadId\n", 395 | " FROM processInjection\n", 396 | " WHERE lower(Channel) LIKE '%sysmon%'\n", 397 | " AND EventID = 8\n", 398 | ")a\n", 399 | "ON b.SourceProcessGUID = a.SourceProcessGuid\n", 400 | "WHERE lower(Channel) LIKE '%sysmon%'\n", 401 | " AND b.EventID = 10\n", 402 | " AND array_contains(getAccessRights(GrantedAccess),'PROCESS_CREATE_THREAD')\n", 403 | "''')\n", 404 | "\n", 405 | "print('This dataframe has {} records!!'.format(networkConnection.count()))\n", 406 | "networkConnection.show(truncate = 40)" 407 | ] 408 | }, 409 | { 410 | "cell_type": "markdown", 411 | "metadata": {}, 412 | "source": [ 413 | "### Find Target Processes that made Network Connections " 414 | ] 415 | }, 416 | { 417 | "cell_type": "code", 418 | "execution_count": 10, 419 | "metadata": {}, 420 | "outputs": [ 421 | { 422 | "name": "stdout", 423 | "output_type": "stream", 424 | "text": [ 425 | "This dataframe has 16 records!!\n", 426 | "+-------------------------------+-----------+-------------+\n", 427 | "| TargetImage| SourceIp|DestinationIp|\n", 428 | "+-------------------------------+-----------+-------------+\n", 429 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 430 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 431 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 432 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 433 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 434 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 435 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 436 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 437 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 438 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 439 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 440 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 441 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 442 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 443 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 444 | "|C:\\windows\\system32\\notepad.exe|172.18.39.5| 10.10.10.5|\n", 445 | "+-------------------------------+-----------+-------------+\n", 446 | "\n" 447 | ] 448 | } 449 | ], 450 | "source": [ 451 | "networkConnection = spark.sql(\n", 452 | "'''\n", 453 | "SELECT 
b.TargetImage, a.SourceIp, a.DestinationIp\n", 454 | "FROM processInjection b\n", 455 | "INNER JOIN (\n", 456 | "    SELECT ProcessGuid, SourceIp, DestinationIp\n", 457 | "    FROM processInjection\n", 458 | "    WHERE lower(Channel) LIKE '%sysmon%'\n", 459 | "        AND EventID = 3\n", 460 | ") a\n", 461 | "ON b.TargetProcessGUID = a.ProcessGuid\n", 462 | "WHERE lower(Channel) LIKE '%sysmon%'\n", 463 | "    AND b.EventID = 10\n", 464 | "    AND array_contains(getAccessRights(GrantedAccess),'PROCESS_CREATE_THREAD')\n", 465 | "''')\n", 466 | "\n", 467 | "print('This dataframe has {} records!!'.format(networkConnection.count()))\n", 468 | "networkConnection.show(truncate = 40)" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "## Thank you! I hope you enjoyed it!" 476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": null, 481 | "metadata": {}, 482 | "outputs": [], 483 | "source": [] 484 | } 485 | ], 486 | "metadata": { 487 | "kernelspec": { 488 | "display_name": "PySpark_Python3", 489 | "language": "python", 490 | "name": "pyspark3" 491 | }, 492 | "language_info": { 493 | "codemirror_mode": { 494 | "name": "ipython", 495 | "version": 3 496 | }, 497 | "file_extension": ".py", 498 | "mimetype": "text/x-python", 499 | "name": "python", 500 | "nbconvert_exporter": "python", 501 | "pygments_lexer": "ipython3", 502 | "version": "3.7.6" 503 | } 504 | }, 505 | "nbformat": 4, 506 | "nbformat_minor": 2 507 | } 508 | -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/use-cases/02_Data_Analysis_DCSync_dcerpc.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# DCSync - dcerpc\n", 8 | "* **Author**: Jose Rodriguez (@Cyb3rPandah)\n", 9 | "* **Project**: Infosec Jupyter Book\n", 10 | "* **Public Organization**: [Open Threat Research](https://github.com/OTRF)\n", 11 | "* **License**: [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)\n", 12 | "* **Reference**: \n", 13 | "    * https://spark.apache.org/docs/latest/api/python/pyspark.sql.html\n", 14 | "    * https://threathunterplaybook.com/notebooks/windows/06_credential_access/WIN-180815210510.html" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "## Creating SQL view from Mordor DCSync dataset" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "### Create Spark session" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 1, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "from pyspark.sql import SparkSession\n", 38 | "\n", 39 | "spark = SparkSession \\\n", 40 | "    .builder \\\n", 41 | "    .appName(\"Spark_Data_Analysis\") \\\n", 42 | "    .config(\"spark.sql.caseSensitive\",\"True\") \\\n", 43 | "    .getOrCreate()" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "### Unzip Mordor Dataset" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 2, 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "name": "stdout", 60 | "output_type": "stream", 61 | "text": [ 62 | "Archive: ../datasets/covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges.zip\r\n", 63 | " inflating: ../datasets/covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges_2020-08-05020926.json \r\n" 64 | ] 65 | } 66 | ], 67 
| "source": [ 68 | "! unzip -o ../datasets/covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges.zip -d ../datasets/" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "### Expose the dataframe as a SQL view" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 3, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "dcSyncJson = '../datasets/covenant_dcsync_dcerpc_drsuapi_DsGetNCChanges_2020-08-05020926.json'\n", 85 | "\n", 86 | "dcSyncDf = spark.read.json(dcSyncJson)\n", 87 | "\n", 88 | "dcSyncDf.createOrReplaceTempView('dcSync')" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "## Technical Description\n", 96 | "Active Directory replication is the process by which the changes that originate on one domain controller are automatically transferred to other domain controllers that store the same data.\n", 97 | "Active Directory data takes the form of objects that have properties, or attributes. Each object is an instance of an object class, and object classes and their respective attributes are defined in the Active Directory schema.\n", 98 | "The values of the attributes define the object, and a change to a value of an attribute must be transferred from the domain controller on which it occurs to every other domain controller that stores a replica of that object.\n", 99 | "An adversary can abuse this model and request information about a specific account via the replication request.\n", 100 | "This is done from an account with sufficient permissions (usually domain admin level) to perform that request.\n", 101 | "Usually the accounts performing replication operations in a domain are computer accounts (i.e dcaccount$).\n", 102 | "Therefore, it might be abnormal to see other non-dc-accounts doing it.\n", 103 | "\n", 104 | "The following access rights / permissions are needed for the replication request according to the domain functional level\n", 105 | "\n", 106 | "| Control access right symbol | Identifying GUID used in ACE |\n", 107 | "| :-----------------------------| :------------------------------|\n", 108 | "| DS-Replication-Get-Changes | 1131f6aa-9c07-11d1-f79f-00c04fc2dcd2 |\n", 109 | "| DS-Replication-Get-Changes-All | 1131f6ad-9c07-11d1-f79f-00c04fc2dcd2 |\n", 110 | "| DS-Replication-Get-Changes-In-Filtered-Set | 89e95b76-444d-4c62-991a-0facbeda640c |\n", 111 | "\n", 112 | "Additional reading\n", 113 | "* https://github.com/hunters-forge/ThreatHunter-Playbook/tree/master/docs/library/active_directory_replication.md" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "## Filtering & Summarizing data" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "### What Users used replication request rights" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": 4, 133 | "metadata": {}, 134 | "outputs": [ 135 | { 136 | "name": "stdout", 137 | "output_type": "stream", 138 | "text": [ 139 | "This dataframe has 3 records!!\n", 140 | "+-----------------------+-----------------------+---------------+--------------+\n", 141 | "|@timestamp |Hostname |SubjectUserName|SubjectLogonId|\n", 142 | "+-----------------------+-----------------------+---------------+--------------+\n", 143 | "|2020-08-05 02:10:03.798|MORDORDC.theshire.local|pgustavo |0x824909 |\n", 144 | "|2020-08-05 02:10:03.799|MORDORDC.theshire.local|pgustavo |0x824909 |\n", 145 | "|2020-08-05 
02:10:03.799|MORDORDC.theshire.local|pgustavo |0x824909 |\n", 146 | "+-----------------------+-----------------------+---------------+--------------+\n", 147 | "\n" 148 | ] 149 | } 150 | ], 151 | "source": [ 152 | "operationObject = spark.sql(\n", 153 | "'''\n", 154 | "SELECT `@timestamp`, Hostname, SubjectUserName, SubjectLogonId\n", 155 | "FROM dcSync\n", 156 | "WHERE Channel = \"Security\"\n", 157 | " AND EventID = 4662\n", 158 | " AND AccessMask = \"0x100\"\n", 159 | " AND (\n", 160 | " Properties LIKE \"%1131f6aa_9c07_11d1_f79f_00c04fc2dcd2%\"\n", 161 | " OR Properties LIKE \"%1131f6ad_9c07_11d1_f79f_00c04fc2dcd2%\"\n", 162 | " OR Properties LIKE \"%89e95b76_444d_4c62_991a_0facbeda640c%\"\n", 163 | " )\n", 164 | " AND NOT SubjectUserName LIKE \"%$\"\n", 165 | "''')\n", 166 | "\n", 167 | "print('This dataframe has {} records!!'.format(operationObject.count()))\n", 168 | "operationObject.show(truncate = False)" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "## Correlating data" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "### Get more information about the Endpoint that requested the replication" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 5, 188 | "metadata": {}, 189 | "outputs": [ 190 | { 191 | "name": "stdout", 192 | "output_type": "stream", 193 | "text": [ 194 | "This dataframe has 3 records!!\n", 195 | "+-----------------------+-----------------------+---------------+--------------+-----------+\n", 196 | "|@timestamp |Hostname |SubjectUserName|SubjectLogonId|IpAddress |\n", 197 | "+-----------------------+-----------------------+---------------+--------------+-----------+\n", 198 | "|2020-08-05 02:10:03.798|MORDORDC.theshire.local|pgustavo |0x824909 |172.18.39.5|\n", 199 | "|2020-08-05 02:10:03.799|MORDORDC.theshire.local|pgustavo |0x824909 |172.18.39.5|\n", 200 | "|2020-08-05 02:10:03.799|MORDORDC.theshire.local|pgustavo |0x824909 |172.18.39.5|\n", 201 | "+-----------------------+-----------------------+---------------+--------------+-----------+\n", 202 | "\n" 203 | ] 204 | } 205 | ], 206 | "source": [ 207 | "authentication = spark.sql(\n", 208 | " '''\n", 209 | "SELECT o.`@timestamp`, o.Hostname, o.SubjectUserName, o.SubjectLogonId, a.IpAddress\n", 210 | "FROM dcSync o\n", 211 | "INNER JOIN (\n", 212 | " SELECT Hostname,TargetUserName,TargetLogonId,IpAddress\n", 213 | " FROM dcSync\n", 214 | " WHERE lower(Channel) = \"security\"\n", 215 | " AND EventID = 4624\n", 216 | " AND LogonType = 3\n", 217 | " AND IpAddress IS NOT NULL\n", 218 | " AND NOT TargetUserName LIKE \"%$\"\n", 219 | " ) a\n", 220 | "ON o.SubjectLogonId = a.TargetLogonId\n", 221 | "WHERE lower(o.Channel) = \"security\"\n", 222 | " AND o.EventID = 4662\n", 223 | " AND o.AccessMask = \"0x100\"\n", 224 | " AND (\n", 225 | " o.Properties LIKE \"%1131f6aa_9c07_11d1_f79f_00c04fc2dcd2%\"\n", 226 | " OR o.Properties LIKE \"%1131f6ad_9c07_11d1_f79f_00c04fc2dcd2%\"\n", 227 | " OR o.Properties LIKE \"%89e95b76_444d_4c62_991a_0facbeda640c%\"\n", 228 | " )\n", 229 | " AND o.Hostname = a.Hostname\n", 230 | " AND NOT o.SubjectUserName LIKE \"%$\"\n", 231 | " '''\n", 232 | ")\n", 233 | "\n", 234 | "print('This dataframe has {} records!!'.format(authentication.count()))\n", 235 | "authentication.show(truncate = False)" 236 | ] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "metadata": {}, 241 | "source": [ 242 | "## Thank you! I hope you enjoyed it!" 
243 | ] 244 | } 245 | ], 246 | "metadata": { 247 | "kernelspec": { 248 | "display_name": "PySpark_Python3", 249 | "language": "python", 250 | "name": "pyspark3" 251 | }, 252 | "language_info": { 253 | "codemirror_mode": { 254 | "name": "ipython", 255 | "version": 3 256 | }, 257 | "file_extension": ".py", 258 | "mimetype": "text/x-python", 259 | "name": "python", 260 | "nbconvert_exporter": "python", 261 | "pygments_lexer": "ipython3", 262 | "version": "3.7.6" 263 | } 264 | }, 265 | "nbformat": 4, 266 | "nbformat_minor": 2 267 | } 268 | -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/use-cases/03_Data_Analysis_RemoteCreateInstance_dcerpc_wmi.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Remote Create Instance - dcerpc - wmi\n", 8 | "* **Author**: Jose Rodriguez (@Cyb3rPandah)\n", 9 | "* **Project**: Infosec Jupyter Book\n", 10 | "* **Public Organization**: [Open Threat Research](https://github.com/OTRF)\n", 11 | "* **License**: [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)\n", 12 | "* **Reference**: \n", 13 | " * https://spark.apache.org/docs/latest/api/python/pyspark.sql.html\n", 14 | " * https://threathunterplaybook.com/notebooks/windows/08_lateral_movement/WIN-190810201010.html" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "## Creating SQL view from Mordor Process Injection dataset" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "### Create Spark session" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 1, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "from pyspark.sql import SparkSession\n", 38 | "\n", 39 | "spark = SparkSession \\\n", 40 | " .builder \\\n", 41 | " .appName(\"Spark_Data_Analysis\") \\\n", 42 | " .config(\"spark.sql.caseSensitive\",\"True\") \\\n", 43 | " .getOrCreate()" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "### Unzip Mordor Dataset" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 2, 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "name": "stdout", 60 | "output_type": "stream", 61 | "text": [ 62 | "Archive: ../datasets/covenant_sharpwmi_dcerpc_wmi_remotecreateinstance.zip\r\n", 63 | " inflating: ../datasets/covenant_sharpwmi_dcerpc_wmi_remotecreateinstance_2020-08-06035621.json \r\n" 64 | ] 65 | } 66 | ], 67 | "source": [ 68 | "! 
unzip -o ../datasets/covenant_sharpwmi_dcerpc_wmi_remotecreateinstance.zip -d ../datasets/" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "### Expose the dataframe as a SQL view" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 3, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "wmiJson = '../datasets/covenant_sharpwmi_dcerpc_wmi_remotecreateinstance_2020-08-06035621.json'\n", 85 | "\n", 86 | "wmiDf = spark.read.json(wmiJson)\n", 87 | "\n", 88 | "wmiDf.createOrReplaceTempView('wmi')" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "## Technical Description\n", 96 | "WMI is the Microsoft implementation of the Web-Based Enterprise Management (WBEM) and Common Information Model (CIM).\n", 97 | "Both standards aim to provide an industry-agnostic means of collecting and transmitting information related to any managed component in an enterprise.\n", 98 | "An example of a managed component in WMI would be a running process, registry key, installed service, file information, etc.\n", 99 | "At a high level, Microsoft’s implementation of these standards can be summarized as follows:\n\n**Managed Components**: Managed components are represented as WMI objects — class instances representing highly structured operating system data. Microsoft provides a wealth of WMI objects that communicate information related to the operating system. E.g. Win32_Process, Win32_Service, AntiVirusProduct, Win32_StartupCommand, etc.\n", 100 | "\n", 101 | "One well-known lateral movement technique is performed via the WMI object — class Win32_Process and its method Create.\n", 102 | "This is because the Create method allows a user to create a process either locally or remotely.\n", 103 | "One thing to notice is that when the Create method is used on a remote system, the method is run under a host process named “Wmiprvse.exe”.\n", 104 | "\n", 105 | "The process WmiprvSE.exe is what spawns the process defined in the CommandLine parameter of the Create method. Therefore, the new process created remotely will have Wmiprvse.exe as a parent. WmiprvSE.exe is a DCOM server and it is spawned underneath the DCOM service host svchost.exe with the following parameters: C:\\WINDOWS\\system32\\svchost.exe -k DcomLaunch -p.\n", 106 | "From a logon session perspective, on the target, WmiprvSE.exe is spawned in a different logon session by the DCOM service host. However, whatever is executed by WmiprvSE.exe occurs in the new network (type 3) logon session created by the user that authenticated from the network.\n", 107 | "\n", 108 | "Additional Reading\n", 109 | "* https://github.com/hunters-forge/ThreatHunter-Playbook/tree/master/docs/library/logon_session.md" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "## Filtering data" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "### Look for wmiprvse.exe spawning processes that are part of non-system account sessions."
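,
"\n",
"Note: TargetLogonId 0x3e7 (999) is the well-known logon ID of the local SYSTEM account, so excluding it in the query below keeps only processes tied to non-system logon sessions."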
124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 4, 129 | "metadata": {}, 130 | "outputs": [ 131 | { 132 | "name": "stdout", 133 | "output_type": "stream", 134 | "text": [ 135 | "This dataframe has 1 records!!\n", 136 | "-RECORD 0--------------------------------------------------\n", 137 | " @timestamp | 2020-08-06 03:56:43.178 \n", 138 | " Hostname | WORKSTATION6.theshire.local \n", 139 | " SubjectUserName | WORKSTATION6$ \n", 140 | " TargetUserName | pgustavo \n", 141 | " NewProcessName | C:\\Windows\\System32\\GruntHTTP2.exe \n", 142 | " CommandLine | \"C:\\\\Windows\\\\System32\\\\GruntHTTP2.exe\" \n", 143 | "\n" 144 | ] 145 | } 146 | ], 147 | "source": [ 148 | "processCreation = spark.sql(\n", 149 | "'''\n", 150 | "SELECT `@timestamp`, Hostname, SubjectUserName, TargetUserName, NewProcessName, CommandLine\n", 151 | "FROM wmi\n", 152 | "WHERE lower(Channel) = \"security\"\n", 153 | "    AND EventID = 4688\n", 154 | "    AND lower(ParentProcessName) LIKE \"%wmiprvse.exe\"\n", 155 | "    AND NOT TargetLogonId = \"0x3e7\"\n", 156 | "''')\n", 157 | "\n", 158 | "print('This dataframe has {} records!!'.format(processCreation.count()))\n", 159 | "processCreation.show(vertical = True, truncate = False)" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "## Correlating data" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "### Look for non-system accounts leveraging WMI over the network to execute code" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 5, 179 | "metadata": {}, 180 | "outputs": [ 181 | { 182 | "name": "stdout", 183 | "output_type": "stream", 184 | "text": [ 185 | "This dataframe has 1 records!!\n", 186 | "-RECORD 0--------------------------------------------------\n", 187 | " @timestamp | 2020-08-06 03:56:43.178 \n", 188 | " Hostname | WORKSTATION6.theshire.local \n", 189 | " SubjectUserName | WORKSTATION6$ \n", 190 | " TargetUserName | pgustavo \n", 191 | " NewProcessName | C:\\Windows\\System32\\GruntHTTP2.exe \n", 192 | " CommandLine | \"C:\\\\Windows\\\\System32\\\\GruntHTTP2.exe\" \n", 193 | " IpAddress | 172.18.39.5 \n", 194 | "\n" 195 | ] 196 | } 197 | ], 198 | "source": [ 199 | "authenticationNetwork = spark.sql(\n", 200 | "'''\n", 201 | "SELECT o.`@timestamp`, o.Hostname, o.SubjectUserName, o.TargetUserName, o.NewProcessName, o.CommandLine, a.IpAddress\n", 202 | "FROM wmi o\n", 203 | "INNER JOIN (\n", 204 | "    SELECT Hostname,TargetUserName,TargetLogonId,IpAddress\n", 205 | "    FROM wmi\n", 206 | "    WHERE lower(Channel) = \"security\"\n", 207 | "        AND LogonType = 3\n", 208 | "        AND IpAddress IS NOT NULL\n", 209 | "        AND NOT TargetUserName LIKE \"%$\"\n", 210 | "    ) a\n", 211 | "ON o.TargetLogonId = a.TargetLogonId\n", 212 | "WHERE lower(o.Channel) = \"security\"\n", 213 | "    AND o.EventID = 4688\n", 214 | "    AND lower(o.ParentProcessName) LIKE \"%wmiprvse.exe\"\n", 215 | "    AND NOT o.TargetLogonId = \"0x3e7\"\n", 216 | "'''\n", 217 | ")\n", 218 | "\n", 219 | "print('This dataframe has {} records!!'.format(authenticationNetwork.count()))\n", 220 | "authenticationNetwork.show(vertical = True, truncate = False)" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": {}, 226 | "source": [ 227 | "## Thank you! I hope you enjoyed it!"
228 | ] 229 | } 230 | ], 231 | "metadata": { 232 | "kernelspec": { 233 | "display_name": "PySpark_Python3", 234 | "language": "python", 235 | "name": "pyspark3" 236 | }, 237 | "language_info": { 238 | "codemirror_mode": { 239 | "name": "ipython", 240 | "version": 3 241 | }, 242 | "file_extension": ".py", 243 | "mimetype": "text/x-python", 244 | "name": "python", 245 | "nbconvert_exporter": "python", 246 | "pygments_lexer": "ipython3", 247 | "version": "3.7.6" 248 | } 249 | }, 250 | "nbformat": 4, 251 | "nbformat_minor": 2 252 | } 253 | -------------------------------------------------------------------------------- /docs/community-workshops/defcon_btv_2020/use-cases/intro.md: -------------------------------------------------------------------------------- 1 | # Use Cases -------------------------------------------------------------------------------- /docs/fundamentals/libraries/intro.md: -------------------------------------------------------------------------------- 1 | # Libraries 2 | 3 | General libraries used by security analysts. -------------------------------------------------------------------------------- /docs/fundamentals/libraries/numpy_arrays.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Introduction to NumPy Arrays\n", 8 | "----------------------------------------------------------------------------\n", 9 | "## Goals:\n", 10 | "* Learn the basics of Python Numpy Arrays" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "**References:**\n", 18 | "* http://www.numpy.org/\n", 19 | "* https://docs.scipy.org/doc/numpy/user/quickstart.html\n", 20 | "* https://www.datacamp.com/community/tutorials/python-numpy-tutorial\n", 21 | "* https://blog.thedataincubator.com/2018/02/numpy-and-pandas/\n", 22 | "* https://medium.com/@ericvanrees/pandas-series-objects-and-numpy-arrays-15dfe05919d7\n", 23 | "* https://www.machinelearningplus.com/python/numpy-tutorial-part1-array-python-examples/\n", 24 | "* https://towardsdatascience.com/a-hitchhiker-guide-to-python-numpy-arrays-9358de570121\n", 25 | "* McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media. Kindle Edition" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "## What is NumPy?\n", 33 | "* NumPy is short for \"Numerical Python\" and it is a fundamental python package for scientific computing.\n", 34 | "* It uses a high-performance data structure known as the **n-dimensional array** or **ndarray**, a multi-dimensional array object, for efficient computation of arrays and matrices." 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "## What is an Array?\n", 42 | "* Python arrays are data structures that store data similar to a list, except the type of objects stored in them is constrained.\n", 43 | "* Elements of an array are all of the same type and indexed by a tuple of positive integers.\n", 44 | "* The python module array allows you to specify the type of array at object creation time by using a type code, which is a single character. 
You can read more about each type code here: https://docs.python.org/3/library/array.html?highlight=array#module-array " 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 1, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "import array" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 2, 59 | "metadata": {}, 60 | "outputs": [ 61 | { 62 | "data": { 63 | "text/plain": [ 64 | "array.array" 65 | ] 66 | }, 67 | "execution_count": 2, 68 | "metadata": {}, 69 | "output_type": "execute_result" 70 | } 71 | ], 72 | "source": [ 73 | "array_one = array.array('i',[1,2,3,4])\n", 74 | "type(array_one)" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 3, 80 | "metadata": {}, 81 | "outputs": [ 82 | { 83 | "data": { 84 | "text/plain": [ 85 | "int" 86 | ] 87 | }, 88 | "execution_count": 3, 89 | "metadata": {}, 90 | "output_type": "execute_result" 91 | } 92 | ], 93 | "source": [ 94 | "type(array_one[0])" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "## What is a NumPy N-Dimensional Array (ndarray)?\n", 102 | "* It is an efficient multidimensional array providing fast array-oriented arithmetic operations.\n", 103 | "* An ndarray, like any other array, is a container for homogeneous data (elements of the same type).\n", 104 | "* In NumPy, data in an ndarray is simply referred to as an array.\n", 105 | "* As with other container objects in Python, the contents of an ndarray can be accessed and modified by indexing or slicing operations.\n", 106 | "* For numerical data, NumPy arrays are more efficient for storing and manipulating data than the other built-in Python data structures. " 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 4, 112 | "metadata": {}, 113 | "outputs": [ 114 | { 115 | "data": { 116 | "text/plain": [ 117 | "'1.19.2'" 118 | ] 119 | }, 120 | "execution_count": 4, 121 | "metadata": {}, 122 | "output_type": "execute_result" 123 | } 124 | ], 125 | "source": [ 126 | "import numpy as np\n", 127 | "np.__version__" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": 5, 133 | "metadata": {}, 134 | "outputs": [], 135 | "source": [ 136 | "list_one = [1,2,3,4,5]" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 6, 142 | "metadata": {}, 143 | "outputs": [ 144 | { 145 | "data": { 146 | "text/plain": [ 147 | "numpy.ndarray" 148 | ] 149 | }, 150 | "execution_count": 6, 151 | "metadata": {}, 152 | "output_type": "execute_result" 153 | } 154 | ], 155 | "source": [ 156 | "numpy_array = np.array(list_one)\n", 157 | "type(numpy_array)" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 7, 163 | "metadata": {}, 164 | "outputs": [ 165 | { 166 | "data": { 167 | "text/plain": [ 168 | "array([1, 2, 3, 4, 5])" 169 | ] 170 | }, 171 | "execution_count": 7, 172 | "metadata": {}, 173 | "output_type": "execute_result" 174 | } 175 | ], 176 | "source": [ 177 | "numpy_array" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "## Advantages of NumPy Arrays" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "### Vectorized Operations\n", 192 | "* The key difference between an array and a list is that arrays are designed to handle vectorized operations while a Python list is not.\n", 193 | "* NumPy operations perform complex computations on entire arrays without the need for Python for loops.\n", 
194 | "* In other words, if you apply a function to an array, it is performed on every item in the array, rather than on the whole array object.\n", 195 | "* In a python list, you will have to perform a loop over the elements of the list." 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 8, 201 | "metadata": {}, 202 | "outputs": [ 203 | { 204 | "ename": "TypeError", 205 | "evalue": "can only concatenate list (not \"int\") to list", 206 | "output_type": "error", 207 | "traceback": [ 208 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 209 | "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", 210 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mlist_two\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;31m# The following will throw an error:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mlist_two\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 211 | "\u001b[0;31mTypeError\u001b[0m: can only concatenate list (not \"int\") to list" 212 | ] 213 | } 214 | ], 215 | "source": [ 216 | "list_two = [1,2,3,4,5]\n", 217 | "# The following will throw an error:\n", 218 | "list_two + 2" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "* Performing a loop to add **2** to every integer in the list" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 9, 231 | "metadata": {}, 232 | "outputs": [ 233 | { 234 | "data": { 235 | "text/plain": [ 236 | "[3, 4, 5, 6, 7]" 237 | ] 238 | }, 239 | "execution_count": 9, 240 | "metadata": {}, 241 | "output_type": "execute_result" 242 | } 243 | ], 244 | "source": [ 245 | "for index, item in enumerate(list_two):\n", 246 | " list_two[index] = item + 2\n", 247 | "list_two" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "* With a NumPy array, you can do the same simply by doing the following:" 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": 10, 260 | "metadata": {}, 261 | "outputs": [ 262 | { 263 | "data": { 264 | "text/plain": [ 265 | "array([1, 2, 3, 4, 5])" 266 | ] 267 | }, 268 | "execution_count": 10, 269 | "metadata": {}, 270 | "output_type": "execute_result" 271 | } 272 | ], 273 | "source": [ 274 | "numpy_array" 275 | ] 276 | }, 277 | { 278 | "cell_type": "code", 279 | "execution_count": 11, 280 | "metadata": {}, 281 | "outputs": [ 282 | { 283 | "data": { 284 | "text/plain": [ 285 | "array([3, 4, 5, 6, 7])" 286 | ] 287 | }, 288 | "execution_count": 11, 289 | "metadata": {}, 290 | "output_type": "execute_result" 291 | } 292 | ], 293 | "source": [ 294 | "numpy_array + 2" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "* Any arithmetic operations between equal-size arrays applies the operation element-wise: " 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": 12, 307 | "metadata": {}, 308 | "outputs": [], 309 | "source": [ 310 | "numpy_array_one 
= np.array([1,2])\n", 311 | "numpy_array_two = np.array([4,6])" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": 13, 317 | "metadata": {}, 318 | "outputs": [ 319 | { 320 | "data": { 321 | "text/plain": [ 322 | "array([5, 8])" 323 | ] 324 | }, 325 | "execution_count": 13, 326 | "metadata": {}, 327 | "output_type": "execute_result" 328 | } 329 | ], 330 | "source": [ 331 | "numpy_array_one + numpy_array_two" 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": 14, 337 | "metadata": {}, 338 | "outputs": [ 339 | { 340 | "data": { 341 | "text/plain": [ 342 | "array([False, False])" 343 | ] 344 | }, 345 | "execution_count": 14, 346 | "metadata": {}, 347 | "output_type": "execute_result" 348 | } 349 | ], 350 | "source": [ 351 | "numpy_array_one > numpy_array_two" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "### Memory\n", 359 | "* NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects.\n", 360 | "* NumPy arrays take significantly less memory than Python lists." 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": 15, 366 | "metadata": {}, 367 | "outputs": [], 368 | "source": [ 369 | "import numpy as np\n", 370 | "import sys" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": 16, 376 | "metadata": {}, 377 | "outputs": [ 378 | { 379 | "data": { 380 | "text/plain": [ 381 | "168" 382 | ] 383 | }, 384 | "execution_count": 16, 385 | "metadata": {}, 386 | "output_type": "execute_result" 387 | } 388 | ], 389 | "source": [ 390 | "python_list = [1,2,3,4,5,6]\n", 391 | "python_list_size = sys.getsizeof(1) * len(python_list)\n", 392 | "python_list_size" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": 17, 398 | "metadata": {}, 399 | "outputs": [ 400 | { 401 | "data": { 402 | "text/plain": [ 403 | "48" 404 | ] 405 | }, 406 | "execution_count": 17, 407 | "metadata": {}, 408 | "output_type": "execute_result" 409 | } 410 | ], 411 | "source": [ 412 | "python_numpy_array = np.array([1,2,3,4,5,6])\n", 413 | "python_numpy_array_size = python_numpy_array.itemsize * python_numpy_array.size\n", 414 | "python_numpy_array_size" 415 | ] 416 | }, 417 | { 418 | "cell_type": "markdown", 419 | "metadata": {}, 420 | "source": [ 421 | "## Basic Indexing and Slicing " 422 | ] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "metadata": {}, 427 | "source": [ 428 | "### One Dimensional Array\n", 429 | "* When it comes down to slicing and indexing, one-dimensional arrays are the same as Python lists" 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": 18, 435 | "metadata": {}, 436 | "outputs": [ 437 | { 438 | "data": { 439 | "text/plain": [ 440 | "array([1, 2, 3, 4, 5])" 441 | ] 442 | }, 443 | "execution_count": 18, 444 | "metadata": {}, 445 | "output_type": "execute_result" 446 | } 447 | ], 448 | "source": [ 449 | "numpy_array" 450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": 19, 455 | "metadata": {}, 456 | "outputs": [ 457 | { 458 | "data": { 459 | "text/plain": [ 460 | "2" 461 | ] 462 | }, 463 | "execution_count": 19, 464 | "metadata": {}, 465 | "output_type": "execute_result" 466 | } 467 | ], 468 | "source": [ 469 | "numpy_array[1]" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": 20, 475 | "metadata": {}, 476 | "outputs": [ 477 | { 478 | "data": { 479 | "text/plain": [ 480 | "array([2, 
3, 4])" 481 | ] 482 | }, 483 | "execution_count": 20, 484 | "metadata": {}, 485 | "output_type": "execute_result" 486 | } 487 | ], 488 | "source": [ 489 | "numpy_array[1:4]" 490 | ] 491 | }, 492 | { 493 | "cell_type": "markdown", 494 | "metadata": {}, 495 | "source": [ 496 | "* You can slice the array and pass it to a variable. Remember that variables just reference objects.\n", 497 | "* Any change that you make to the array slice will technically be made on the original array object. Once again, variables just reference objects." 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 21, 503 | "metadata": {}, 504 | "outputs": [ 505 | { 506 | "data": { 507 | "text/plain": [ 508 | "array([2, 3, 4])" 509 | ] 510 | }, 511 | "execution_count": 21, 512 | "metadata": {}, 513 | "output_type": "execute_result" 514 | } 515 | ], 516 | "source": [ 517 | "numpy_array_slice = numpy_array[1:4]\n", 518 | "numpy_array_slice" 519 | ] 520 | }, 521 | { 522 | "cell_type": "code", 523 | "execution_count": 22, 524 | "metadata": {}, 525 | "outputs": [ 526 | { 527 | "data": { 528 | "text/plain": [ 529 | "array([ 2, 10, 4])" 530 | ] 531 | }, 532 | "execution_count": 22, 533 | "metadata": {}, 534 | "output_type": "execute_result" 535 | } 536 | ], 537 | "source": [ 538 | "numpy_array_slice[1] = 10\n", 539 | "numpy_array_slice" 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": 23, 545 | "metadata": {}, 546 | "outputs": [ 547 | { 548 | "data": { 549 | "text/plain": [ 550 | "array([ 1, 2, 10, 4, 5])" 551 | ] 552 | }, 553 | "execution_count": 23, 554 | "metadata": {}, 555 | "output_type": "execute_result" 556 | } 557 | ], 558 | "source": [ 559 | "numpy_array" 560 | ] 561 | }, 562 | { 563 | "cell_type": "markdown", 564 | "metadata": {}, 565 | "source": [ 566 | "### Two-Dimensional Array\n", 567 | "* In a two-dimensional array, elements of the array are one-dimensional arrays " 568 | ] 569 | }, 570 | { 571 | "cell_type": "code", 572 | "execution_count": 24, 573 | "metadata": {}, 574 | "outputs": [], 575 | "source": [ 576 | "numpy_two_dimensional_array = np.array([[1,2,3],[4,5,6],[7,8,9]])" 577 | ] 578 | }, 579 | { 580 | "cell_type": "code", 581 | "execution_count": 25, 582 | "metadata": {}, 583 | "outputs": [ 584 | { 585 | "data": { 586 | "text/plain": [ 587 | "array([[1, 2, 3],\n", 588 | " [4, 5, 6],\n", 589 | " [7, 8, 9]])" 590 | ] 591 | }, 592 | "execution_count": 25, 593 | "metadata": {}, 594 | "output_type": "execute_result" 595 | } 596 | ], 597 | "source": [ 598 | "numpy_two_dimensional_array" 599 | ] 600 | }, 601 | { 602 | "cell_type": "code", 603 | "execution_count": 26, 604 | "metadata": {}, 605 | "outputs": [ 606 | { 607 | "data": { 608 | "text/plain": [ 609 | "array([4, 5, 6])" 610 | ] 611 | }, 612 | "execution_count": 26, 613 | "metadata": {}, 614 | "output_type": "execute_result" 615 | } 616 | ], 617 | "source": [ 618 | "numpy_two_dimensional_array[1]" 619 | ] 620 | }, 621 | { 622 | "cell_type": "markdown", 623 | "metadata": {}, 624 | "source": [ 625 | "* Instead of indexing into the one-dimensional arrays to access specific elements, you can just pass a second index value" 626 | ] 627 | }, 628 | { 629 | "cell_type": "code", 630 | "execution_count": 27, 631 | "metadata": {}, 632 | "outputs": [ 633 | { 634 | "data": { 635 | "text/plain": [ 636 | "6" 637 | ] 638 | }, 639 | "execution_count": 27, 640 | "metadata": {}, 641 | "output_type": "execute_result" 642 | } 643 | ], 644 | "source": [ 645 | "numpy_two_dimensional_array[1][2]" 646 | ] 647 | }, 648 | { 649 | 
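"cell_type": "markdown", "metadata": {}, "source": [ "* Equivalently, you can pass both indices in a single pair of brackets as a comma-separated tuple:" ] }, {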
"cell_type": "code", 650 | "execution_count": 28, 651 | "metadata": {}, 652 | "outputs": [ 653 | { 654 | "data": { 655 | "text/plain": [ 656 | "6" 657 | ] 658 | }, 659 | "execution_count": 28, 660 | "metadata": {}, 661 | "output_type": "execute_result" 662 | } 663 | ], 664 | "source": [ 665 | "numpy_two_dimensional_array[1,2]" 666 | ] 667 | }, 668 | { 669 | "cell_type": "markdown", 670 | "metadata": {}, 671 | "source": [ 672 | "* Slicing two-dimensional arrays is a little different than one-dimensional ones." 673 | ] 674 | }, 675 | { 676 | "cell_type": "code", 677 | "execution_count": 29, 678 | "metadata": {}, 679 | "outputs": [ 680 | { 681 | "data": { 682 | "text/plain": [ 683 | "array([[1, 2, 3],\n", 684 | " [4, 5, 6],\n", 685 | " [7, 8, 9]])" 686 | ] 687 | }, 688 | "execution_count": 29, 689 | "metadata": {}, 690 | "output_type": "execute_result" 691 | } 692 | ], 693 | "source": [ 694 | "numpy_two_dimensional_array" 695 | ] 696 | }, 697 | { 698 | "cell_type": "code", 699 | "execution_count": 30, 700 | "metadata": {}, 701 | "outputs": [ 702 | { 703 | "data": { 704 | "text/plain": [ 705 | "array([[1, 2, 3]])" 706 | ] 707 | }, 708 | "execution_count": 30, 709 | "metadata": {}, 710 | "output_type": "execute_result" 711 | } 712 | ], 713 | "source": [ 714 | "numpy_two_dimensional_array[:1]" 715 | ] 716 | }, 717 | { 718 | "cell_type": "code", 719 | "execution_count": 31, 720 | "metadata": {}, 721 | "outputs": [ 722 | { 723 | "data": { 724 | "text/plain": [ 725 | "array([[1, 2, 3],\n", 726 | " [4, 5, 6]])" 727 | ] 728 | }, 729 | "execution_count": 31, 730 | "metadata": {}, 731 | "output_type": "execute_result" 732 | } 733 | ], 734 | "source": [ 735 | "numpy_two_dimensional_array[:2]" 736 | ] 737 | }, 738 | { 739 | "cell_type": "code", 740 | "execution_count": 32, 741 | "metadata": {}, 742 | "outputs": [ 743 | { 744 | "data": { 745 | "text/plain": [ 746 | "array([[1, 2, 3],\n", 747 | " [4, 5, 6],\n", 748 | " [7, 8, 9]])" 749 | ] 750 | }, 751 | "execution_count": 32, 752 | "metadata": {}, 753 | "output_type": "execute_result" 754 | } 755 | ], 756 | "source": [ 757 | "numpy_two_dimensional_array[:3]" 758 | ] 759 | }, 760 | { 761 | "cell_type": "code", 762 | "execution_count": 33, 763 | "metadata": {}, 764 | "outputs": [ 765 | { 766 | "data": { 767 | "text/plain": [ 768 | "array([[2, 3],\n", 769 | " [5, 6]])" 770 | ] 771 | }, 772 | "execution_count": 33, 773 | "metadata": {}, 774 | "output_type": "execute_result" 775 | } 776 | ], 777 | "source": [ 778 | "numpy_two_dimensional_array[:2,1:]" 779 | ] 780 | }, 781 | { 782 | "cell_type": "code", 783 | "execution_count": 34, 784 | "metadata": {}, 785 | "outputs": [ 786 | { 787 | "data": { 788 | "text/plain": [ 789 | "array([[1],\n", 790 | " [4]])" 791 | ] 792 | }, 793 | "execution_count": 34, 794 | "metadata": {}, 795 | "output_type": "execute_result" 796 | } 797 | ], 798 | "source": [ 799 | "numpy_two_dimensional_array[:2,:1]" 800 | ] 801 | }, 802 | { 803 | "cell_type": "code", 804 | "execution_count": 35, 805 | "metadata": {}, 806 | "outputs": [ 807 | { 808 | "data": { 809 | "text/plain": [ 810 | "array([8, 9])" 811 | ] 812 | }, 813 | "execution_count": 35, 814 | "metadata": {}, 815 | "output_type": "execute_result" 816 | } 817 | ], 818 | "source": [ 819 | "numpy_two_dimensional_array[2][1:]" 820 | ] 821 | }, 822 | { 823 | "cell_type": "code", 824 | "execution_count": null, 825 | "metadata": {}, 826 | "outputs": [], 827 | "source": [] 828 | } 829 | ], 830 | "metadata": { 831 | "kernelspec": { 832 | "display_name": "Python 3", 833 | "language": 
"python", 834 | "name": "python3" 835 | }, 836 | "language_info": { 837 | "codemirror_mode": { 838 | "name": "ipython", 839 | "version": 3 840 | }, 841 | "file_extension": ".py", 842 | "mimetype": "text/x-python", 843 | "name": "python", 844 | "nbconvert_exporter": "python", 845 | "pygments_lexer": "ipython3", 846 | "version": "3.8.5" 847 | } 848 | }, 849 | "nbformat": 4, 850 | "nbformat_minor": 2 851 | } 852 | -------------------------------------------------------------------------------- /docs/fundamentals/programming/intro.md: -------------------------------------------------------------------------------- 1 | # Programming Languages 2 | 3 | Programming languages used by security analysts. -------------------------------------------------------------------------------- /docs/fundamentals/programming/python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Python 101\n", 8 | "----------------------------------------------------------------------------\n", 9 | "## Goals:\n", 10 | "* Learn basic Python operations\n", 11 | "* Understand differences in data structures\n", 12 | "* Get familiarized with conditional statements and loops\n", 13 | "* Learn to create custom functions and import python modules" 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "**Main Reference:** McKinney, Wes. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media. Kindle Edition\n", 21 | "\n", 22 | "## Indentation\n", 23 | "Python code is structured by indentation (tabs or spaces) instead of braces which is what other languages normally use. In addition, a colon (:) is used to define the start of an indented code block." 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": null, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "for x in list(range(5)):\n", 33 | " print(\"One number per loop..\")\n", 34 | " print(x)\n", 35 | " if x > 2:\n", 36 | " print(\"The number is greater than 2\")\n", 37 | " print(\"----------------------------\")" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "## Everything is an Object\n", 45 | "* Everything in Python is considered an object.\n", 46 | "* A string, a list, a function and even a number is an object.\n", 47 | "* For example, you can define a variable to reference a string and then access the methods available for the string object.\n", 48 | "* If you press the tab key after the variable name and period, you will see the methods available for it." 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "a = \"pedro\"" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": null, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [ 66 | "a.capitalize()" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "## Variables\n", 74 | "In Python, when you define/create a variable, you are basically creating a reference to an object (i.e string,list,etc). If you want to define/create a new variable from the original variable, you will be creating another reference to the original object rather than copying the contents of the first variable to the second one. 
" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": null, 80 | "metadata": {}, 81 | "outputs": [], 82 | "source": [ 83 | "a = [1,2,3]\n", 84 | "b = a\n", 85 | "b" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "Therefore, if you update the original variable (a), the new variable (b) will automatically reference the updated object." 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [ 101 | "a.append(4)\n", 102 | "b" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "A variable can have a short name (like x and y) or a more descriptive name (age, dog, owner).\n", 110 | "Rules for Python variables:\n", 111 | "* A variable name must start with a letter or the underscore character\n", 112 | "* A variable name cannot start with a number\n", 113 | "* A variable name can only contain alpha-numeric characters and underscores (A-z, 0-9, and _ )\n", 114 | "* Variable names are case-sensitive (age, Age and AGE are three different variables)\n", 115 | "\n", 116 | "Reference:https://www.w3schools.com/python/python_variables.asp" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": null, 122 | "metadata": {}, 123 | "outputs": [], 124 | "source": [ 125 | "dog_name = 'Pedro'\n", 126 | "age = 3\n", 127 | "is_vaccinated = True\n", 128 | "birth_year = 2015" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": null, 134 | "metadata": {}, 135 | "outputs": [], 136 | "source": [ 137 | "is_vaccinated" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "dog_name" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "## Data Types\n", 154 | "As any other object, you can get information about its type via the built-in function [type()](https://docs.python.org/3/library/functions.html#type)." 
155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [ 163 | "type(age)" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": null, 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [ 172 | "type(dog_name)" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": null, 178 | "metadata": {}, 179 | "outputs": [], 180 | "source": [ 181 | "type(is_vaccinated)" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "## Combining variables and operations" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": {}, 195 | "outputs": [], 196 | "source": [ 197 | "x = 4\n", 198 | "y = 10" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 | "metadata": {}, 205 | "outputs": [], 206 | "source": [ 207 | "x-y" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": null, 213 | "metadata": {}, 214 | "outputs": [], 215 | "source": [ 216 | "x*y" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": null, 222 | "metadata": {}, 223 | "outputs": [], 224 | "source": [ 225 | "y/x" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": {}, 232 | "outputs": [], 233 | "source": [ 234 | "y**x" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": null, 240 | "metadata": {}, 241 | "outputs": [], 242 | "source": [ 243 | "x>y" 244 | ] 245 | }, 246 | { 247 | "cell_type": "code", 248 | "execution_count": null, 249 | "metadata": {}, 250 | "outputs": [], 251 | "source": [ 252 | "x==y" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": null, 258 | "metadata": {}, 259 | "outputs": [], 260 | "source": [ 261 | "y>=x" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "## Binary Operators and Comparisons" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": null, 274 | "metadata": {}, 275 | "outputs": [], 276 | "source": [ 277 | "2+4" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": null, 283 | "metadata": {}, 284 | "outputs": [], 285 | "source": [ 286 | "5*6" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": null, 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [ 295 | "5>3" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": {}, 301 | "source": [ 302 | "## The Print Statement" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": null, 308 | "metadata": {}, 309 | "outputs": [], 310 | "source": [ 311 | "print(\"Hello Helk!\")" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "## Control Flows" 319 | ] 320 | }, 321 | { 322 | "cell_type": "markdown", 323 | "metadata": {}, 324 | "source": [ 325 | "References:\n", 326 | "* https://docs.python.org/3/tutorial/controlflow.html\n", 327 | "* https://docs.python.org/3/reference/compound_stmts.html#the-if-statement" 328 | ] 329 | }, 330 | { 331 | "cell_type": "markdown", 332 | "metadata": {}, 333 | "source": [ 334 | "### If, elif, else statements\n", 335 | "* The if statement is used for conditional execution.\n", 336 | "* It selects exactly one of the suites by evaluating the expressions one by one until one is found to be true; then that 
suite is executed. \n", 337 | "* If all expressions are false, the suite of the else clause, if present, is executed." 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "metadata": {}, 344 | "outputs": [], 345 | "source": [ 346 | "print(\"x = \" + str(x))\n", 347 | "print(\"y = \" + str(y))" 348 | ] 349 | }, 350 | { 351 | "cell_type": "code", 352 | "execution_count": null, 353 | "metadata": {}, 354 | "outputs": [], 355 | "source": [ 356 | "if x==y:\n", 357 | "    print('yes')\n", 358 | "else:\n", 359 | "    print('no')" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "* An if statement can be optionally followed by one or more elif blocks and a catch-all else block if all of the conditions are False: " 367 | ] 368 | }, 369 | { 370 | "cell_type": "code", 371 | "execution_count": null, 372 | "metadata": {}, 373 | "outputs": [], 374 | "source": [ 375 | "if x==y:\n", 376 | "    print('They are equal')\n", 377 | "elif x > y:\n", 378 | "    print(\"It is greater than\")\n", 379 | "else:\n", 380 | "    print(\"None of the conditionals were true\")" 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": {}, 386 | "source": [ 387 | "## Loops" 388 | ] 389 | }, 390 | { 391 | "cell_type": "markdown", 392 | "metadata": {}, 393 | "source": [ 394 | "### For\n", 395 | "The for statement is used to iterate over the elements of a sequence (such as a string, tuple or list) or other iterable object." 396 | ] 397 | }, 398 | { 399 | "cell_type": "code", 400 | "execution_count": null, 401 | "metadata": {}, 402 | "outputs": [], 403 | "source": [ 404 | "my_dog_list=['Pedro',3,True,2015]" 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": null, 410 | "metadata": {}, 411 | "outputs": [], 412 | "source": [ 413 | "for i in range(0,10):\n", 414 | "    print(i*10)" 415 | ] 416 | }, 417 | { 418 | "cell_type": "markdown", 419 | "metadata": {}, 420 | "source": [ 421 | "### While\n", 422 | "A while loop allows you to execute a block of code until a condition evaluates to false or the loop is ended with a break command."
423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": null, 428 | "metadata": {}, 429 | "outputs": [], 430 | "source": [ 431 | "i = 1\n", 432 | "while i <= 5:\n", 433 | "    print(i ** 2)\n", 434 | "    i += 1" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": null, 440 | "metadata": {}, 441 | "outputs": [], 442 | "source": [ 443 | "i = 1\n", 444 | "while i > 0:\n", 445 | "    if i > 5:\n", 446 | "        break\n", 447 | "    print(i ** 2)\n", 448 | "    i += 1" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "## Data structures" 456 | ] 457 | }, 458 | { 459 | "cell_type": "markdown", 460 | "metadata": {}, 461 | "source": [ 462 | "References:\n", 463 | "* https://docs.python.org/3/tutorial/datastructures.html\n", 464 | "* https://python.swaroopch.com/data_structures.html" 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "metadata": {}, 470 | "source": [ 471 | "### Lists\n", 472 | "* Lists are data structures that allow you to define an ordered collection of items.\n", 473 | "* Lists are constructed with square brackets, separating items with commas: [a, b, c].\n", 474 | "* Lists are mutable objects, which means that you can modify the values contained in them.\n", 475 | "* The elements of a list can be of different types (string, integer, etc.)" 476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": null, 481 | "metadata": {}, 482 | "outputs": [], 483 | "source": [ 484 | "my_dog_list=['Pedro',3,True,2015]" 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": null, 490 | "metadata": {}, 491 | "outputs": [], 492 | "source": [ 493 | "my_dog_list[0]" 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": null, 499 | "metadata": {}, 500 | "outputs": [], 501 | "source": [ 502 | "my_dog_list[2:4]" 503 | ] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": null, 508 | "metadata": {}, 509 | "outputs": [], 510 | "source": [ 511 | "print(\"My dog's name is \" + str(my_dog_list[0]) + \" and he is \" + str(my_dog_list[1]) + \" years old.\")" 512 | ] 513 | }, 514 | { 515 | "cell_type": "markdown", 516 | "metadata": {}, 517 | "source": [ 518 | "* The list data type has some more methods, and you can find them [here](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists).\n", 519 | "* One in particular is `list.append()`, which allows you to add an item to the end of the list. Equivalent to `a[len(a):] = [x]`.\n" 520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": null, 525 | "metadata": {}, 526 | "outputs": [], 527 | "source": [ 528 | "my_dog_list.append(\"tennis balls\")" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": null, 534 | "metadata": {}, 535 | "outputs": [], 536 | "source": [ 537 | "my_dog_list" 538 | ] 539 | }, 540 | { 541 | "cell_type": "markdown", 542 | "metadata": {}, 543 | "source": [ 544 | "* You can modify the list values too:" 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": null, 550 | "metadata": {}, 551 | "outputs": [], 552 | "source": [ 553 | "my_dog_list[1] = 4\n", 554 | "my_dog_list" 555 | ] 556 | }, 557 | { 558 | "cell_type": "markdown", 559 | "metadata": {}, 560 | "source": [ 561 | "### Dictionaries\n", 562 | "* Dictionaries are sometimes found in other languages as “associative memories” or “associative arrays”.
\n", 563 | "* Dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys.\n", 564 | "* It is best to think of a dictionary as a set of key: value pairs, with the requirement that the keys are unique (within one dictionary).\n", 565 | "* A pair of braces creates an empty dictionary: {}.\n", 566 | "* Remember that key-value pairs in a dictionary are not ordered in any manner. If you want a particular order, then you will have to sort them yourself before using it." 567 | ] 568 | }, 569 | { 570 | "cell_type": "code", 571 | "execution_count": null, 572 | "metadata": {}, 573 | "outputs": [], 574 | "source": [ 575 | "my_dog_dict={'name':'Pedro','age':3,'is_vaccinated':True,'birth_year':2015}" 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": null, 581 | "metadata": {}, 582 | "outputs": [], 583 | "source": [ 584 | "my_dog_dict" 585 | ] 586 | }, 587 | { 588 | "cell_type": "code", 589 | "execution_count": null, 590 | "metadata": {}, 591 | "outputs": [], 592 | "source": [ 593 | "my_dog_dict['age']" 594 | ] 595 | }, 596 | { 597 | "cell_type": "code", 598 | "execution_count": null, 599 | "metadata": {}, 600 | "outputs": [], 601 | "source": [ 602 | "my_dog_dict.keys()" 603 | ] 604 | }, 605 | { 606 | "cell_type": "code", 607 | "execution_count": null, 608 | "metadata": {}, 609 | "outputs": [], 610 | "source": [ 611 | "my_dog_dict.values()" 612 | ] 613 | }, 614 | { 615 | "cell_type": "markdown", 616 | "metadata": {}, 617 | "source": [ 618 | "### Tuples\n", 619 | "* A tuple consists of a number of values separated by commas\n", 620 | "* On output tuples are always enclosed in parentheses, so that nested tuples are interpreted correctly; they may be input with or without surrounding parentheses, although often parentheses are necessary anyway (if the tuple is part of a larger expression)." 621 | ] 622 | }, 623 | { 624 | "cell_type": "code", 625 | "execution_count": null, 626 | "metadata": {}, 627 | "outputs": [], 628 | "source": [ 629 | "my_dog_tuple=('Pedro',3,True,2015)" 630 | ] 631 | }, 632 | { 633 | "cell_type": "code", 634 | "execution_count": null, 635 | "metadata": {}, 636 | "outputs": [], 637 | "source": [ 638 | "my_dog_tuple" 639 | ] 640 | }, 641 | { 642 | "cell_type": "markdown", 643 | "metadata": {}, 644 | "source": [ 645 | "* Tuples are immutable, and usually contain a heterogeneous sequence of elements that are accessed via unpacking or indexing.\n", 646 | "* Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list." 647 | ] 648 | }, 649 | { 650 | "cell_type": "code", 651 | "execution_count": null, 652 | "metadata": {}, 653 | "outputs": [], 654 | "source": [ 655 | "my_dog_tuple[1]" 656 | ] 657 | }, 658 | { 659 | "cell_type": "markdown", 660 | "metadata": {}, 661 | "source": [ 662 | "## Slicing\n", 663 | "You can select sections of most sequence types by using slice notation, which in its basic form consists of\n", 664 | "start:stop passed to the indexing operator []" 665 | ] 666 | }, 667 | { 668 | "cell_type": "code", 669 | "execution_count": null, 670 | "metadata": {}, 671 | "outputs": [], 672 | "source": [ 673 | "seq = [ 7 , 2 , 3 , 7 , 5 , 6 , 0 , 1 ]\n", 674 | "seq [ 1 : 5 ] " 675 | ] 676 | }, 677 | { 678 | "cell_type": "markdown", 679 | "metadata": {}, 680 | "source": [ 681 | "## Functions\n", 682 | "Functions allow you to organize and reuse code blocks. If you repeat the same code across several conditions, you could make that code block a function and re-use it. 
Functions are declared with the **def** keyword and\n", 683 | "returned from with the **return** keyword: " 684 | ] 685 | }, 686 | { 687 | "cell_type": "code", 688 | "execution_count": null, 689 | "metadata": {}, 690 | "outputs": [], 691 | "source": [ 692 | "def square(n):\n", 693 | "    return n ** 2" 694 | ] 695 | }, 696 | { 697 | "cell_type": "code", 698 | "execution_count": null, 699 | "metadata": {}, 700 | "outputs": [], 701 | "source": [ 702 | "print(\"The square of 2 is \" + str(square(2)))" 703 | ] 704 | }, 705 | { 706 | "cell_type": "code", 707 | "execution_count": null, 708 | "metadata": {}, 709 | "outputs": [], 710 | "source": [ 711 | "number_list = [1,2,3,4,5]" 712 | ] 713 | }, 714 | { 715 | "cell_type": "code", 716 | "execution_count": null, 717 | "metadata": {}, 718 | "outputs": [], 719 | "source": [ 720 | "for number in number_list:\n", 721 | "    sn = square(number)\n", 722 | "    print(\"The square of \" + str(number) + \" is \" + str(sn))" 723 | ] 724 | }, 725 | { 726 | "cell_type": "markdown", 727 | "metadata": {}, 728 | "source": [ 729 | "## Modules" 730 | ] 731 | }, 732 | { 733 | "cell_type": "markdown", 734 | "metadata": {}, 735 | "source": [ 736 | "References:\n", 737 | "* https://docs.python.org/3/tutorial/modules.html#modules" 738 | ] 739 | }, 740 | { 741 | "cell_type": "markdown", 742 | "metadata": {}, 743 | "source": [ 744 | "* If you quit from the Python interpreter and enter it again, the definitions you have made (functions and variables) are lost.\n", 745 | "* Therefore, if you want to write a somewhat longer program, you are better off using a text editor to prepare the input for the interpreter and running it with that file as input instead.\n", 746 | "* Let's say we define two functions:" 747 | ] 748 | }, 749 | { 750 | "cell_type": "code", 751 | "execution_count": null, 752 | "metadata": {}, 753 | "outputs": [], 754 | "source": [ 755 | "def square(n):\n", 756 | "    return n ** 2\n", 757 | "\n", 758 | "def cube(n):\n", 759 | "    return n ** 3" 760 | ] 761 | }, 762 | { 763 | "cell_type": "markdown", 764 | "metadata": {}, 765 | "source": [ 766 | "* You can save the code above in a file named math_ops.py (avoid naming it math.py, since that would shadow Python's standard-library math module). I created the file for you already in the current folder.\n", 767 | "* All you have to do is import the **math_ops.py** file" 768 | ] 769 | }, 770 | { 771 | "cell_type": "code", 772 | "execution_count": null, 773 | "metadata": {}, 774 | "outputs": [], 775 | "source": [ 776 | "import math_ops" 777 | ] 778 | }, 779 | { 780 | "cell_type": "code", 781 | "execution_count": null, 782 | "metadata": {}, 783 | "outputs": [], 784 | "source": [ 785 | "for number in number_list:\n", 786 | "    sn = math_ops.square(number)\n", 787 | "    cn = math_ops.cube(number)\n", 788 | "    print(\"The square of \" + str(number) + \" is \" + str(sn))\n", 789 | "    print(\"The cube of \" + str(number) + \" is \" + str(cn))\n", 790 | "    print(\"-------------------------\")" 791 | ] 792 | }, 793 | { 794 | "cell_type": "markdown", 795 | "metadata": {}, 796 | "source": [ 797 | "* You can get a list of the currently available modules too" 798 | ] 799 | }, 800 | { 801 | "cell_type": "code", 802 | "execution_count": null, 803 | "metadata": {}, 804 | "outputs": [], 805 | "source": [ 806 | "help('modules')" 807 | ] 808 | }, 809 | { 810 | "cell_type": "markdown", 811 | "metadata": {}, 812 | "source": [ 813 | "* Let's import the **datetime** module."
814 | ] 815 | }, 816 | { 817 | "cell_type": "code", 818 | "execution_count": null, 819 | "metadata": {}, 820 | "outputs": [], 821 | "source": [ 822 | "import datetime" 823 | ] 824 | }, 825 | { 826 | "cell_type": "markdown", 827 | "metadata": {}, 828 | "source": [ 829 | "* Explore the available methods of the **datetime** module. You can do that by typing the module name, a period after it, and pressing the **tab** key, or by using the built-in function **dir()** as shown below:" 830 | ] 831 | }, 832 | { 833 | "cell_type": "code", 834 | "execution_count": null, 835 | "metadata": {}, 836 | "outputs": [], 837 | "source": [ 838 | "dir(datetime)" 839 | ] 840 | }, 841 | { 842 | "cell_type": "markdown", 843 | "metadata": {}, 844 | "source": [ 845 | "* You can also import a module with a custom name" 846 | ] 847 | }, 848 | { 849 | "cell_type": "code", 850 | "execution_count": 2, 851 | "metadata": {}, 852 | "outputs": [], 853 | "source": [ 854 | "import datetime as dt" 855 | ] 856 | }, 857 | { 858 | "cell_type": "code", 859 | "execution_count": 3, 860 | "metadata": {}, 861 | "outputs": [ 862 | { 863 | "data": { 864 | "text/plain": [ 865 | "['MAXYEAR',\n", 866 | " 'MINYEAR',\n", 867 | " '__builtins__',\n", 868 | " '__cached__',\n", 869 | " '__doc__',\n", 870 | " '__file__',\n", 871 | " '__loader__',\n", 872 | " '__name__',\n", 873 | " '__package__',\n", 874 | " '__spec__',\n", 875 | " 'date',\n", 876 | " 'datetime',\n", 877 | " 'datetime_CAPI',\n", 878 | " 'sys',\n", 879 | " 'time',\n", 880 | " 'timedelta',\n", 881 | " 'timezone',\n", 882 | " 'tzinfo']" 883 | ] 884 | }, 885 | "execution_count": 3, 886 | "metadata": {}, 887 | "output_type": "execute_result" 888 | } 889 | ], 890 | "source": [ 891 | "dir(dt)" 892 | ] 893 | }, 894 | { 895 | "cell_type": "code", 896 | "execution_count": null, 897 | "metadata": {}, 898 | "outputs": [], 899 | "source": [] 900 | } 901 | ], 902 | "metadata": { 903 | "kernelspec": { 904 | "display_name": "Python 3", 905 | "language": "python", 906 | "name": "python3" 907 | }, 908 | "language_info": { 909 | "codemirror_mode": { 910 | "name": "ipython", 911 | "version": 3 912 | }, 913 | "file_extension": ".py", 914 | "mimetype": "text/x-python", 915 | "name": "python", 916 | "nbconvert_exporter": "python", 917 | "pygments_lexer": "ipython3", 918 | "version": "3.7.3" 919 | } 920 | }, 921 | "nbformat": 4, 922 | "nbformat_minor": 2 923 | } 924 | -------------------------------------------------------------------------------- /docs/getting-started/architecture.md: -------------------------------------------------------------------------------- 1 | # Architecture 2 | 3 | Jupyter Notebooks work with what is called a two-process model based on a kernel-client infrastructure. This model applies a similar concept to the Read-Evaluate-Print Loop (REPL) programming environment that takes a single user’s inputs, evaluates them, and returns the result to the user. 4 | 5 | Based on the two-process model concept, we can explain the main components of Jupyter in the following way: 6 | 7 | ![](../images/JUPYTER_ARCHITECTURE.png) 8 | 9 | ## Jupyter Client 10 | 11 | * It allows a user to send code to the kernel in the form of a Qt console or a browser via notebook documents. 12 | * From a REPL perspective, the client does the read and print operations. 13 | * Notebooks are hosted by a Jupyter web server, which uses Tornado to serve HTTP requests.
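To see the client side of the two-process model in action outside of a browser, here is a minimal, hedged sketch using the `jupyter_client` library (it assumes a standard `python3` kernelspec is installed; error handling is omitted). The script plays the client role, while the kernel process does the evaluation:

```python
from jupyter_client import KernelManager

# Start a kernel process (the "evaluate" half of the model)
km = KernelManager(kernel_name="python3")
km.start_kernel()

# Create the client half and open its messaging channels (ZeroMQ underneath)
kc = km.client()
kc.start_channels()
kc.wait_for_ready(timeout=30)

# The client "reads" code from us and ships it to the kernel as an execute request
msg_id = kc.execute("1 + 1")
reply = kc.get_shell_msg(timeout=5)   # execute_reply arrives on the shell channel
print(reply["content"]["status"])     # 'ok' if the kernel evaluated the code

kc.stop_channels()
km.shutdown_kernel()
```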
14 | 15 | **Running Code** 16 | 17 | ![](../images/JUPYTER_CLIENT_NOTEBOOK.png) 18 | 19 | **Execution** 20 | 21 | ![](../images/JUPYTER_CLIENT_EXEC_REQUEST.png) 22 | 23 | ![](../images/JUPYTER_CLIENT_EXEC_STREAM.png) 24 | 25 | ## Jupyter Kernel 26 | 27 | * It receives the code sent by the client, executes it, and returns the results back to the client for display. A kernel process can have multiple clients communicating with it, which is why this model is also referred to as the decoupled two-process model. 28 | * From a REPL perspective, the kernel does the evaluate operation. 29 | * Kernels and clients communicate via an interactive computing protocol based on an asynchronous messaging library named ZeroMQ (low-level transport layer) and WebSockets (TCP-based). 30 | * This makes Jupyter a language-agnostic application (Julia, Python, R, etc.) 31 | 32 | A kernel identifies itself to IPython by creating a directory, the name of which is used as an identifier for the kernel. These may be created in a number of locations: 33 | 34 | ![](../images/JUPYTER_KERNEL_LOCATION.png) 35 | 36 | Mine is in the following location (macOS) `~/Library/Jupyter/kernels/python37664bite09a6f3cbf7b46ec803618408bcaece5`, where you will find files like these: 37 | 38 | 39 | ```bash 40 | kernel.json logo-32x32.png logo-64x64.png 41 | ``` 42 | 43 | 44 | Sample of a default Python kernel json file: 45 | 46 | ``` 47 | { 48 | "argv": [ 49 | "/usr/local/opt/python/bin/python3.7", 50 | "-m", 51 | "ipykernel_launcher", 52 | "-f", 53 | "{connection_file}" 54 | ], 55 | "display_name": "Python 3.7.6 64-bit", 56 | "language": "python", 57 | "env": {}, 58 | "metadata": { 59 | "interpreter": { 60 | "architecture": 3, 61 | "path": "/usr/local/opt/python/bin/python3.7", 62 | "version": { 63 | "options": { 64 | "loose": false, 65 | "includePrerelease": false 66 | }, 67 | "loose": false, 68 | "raw": "3.7.6-final", 69 | "major": 3, 70 | "minor": 7, 71 | "patch": 6, 72 | "prerelease": [ 73 | "final" 74 | ], 75 | "build": [], 76 | "version": "3.7.6-final" 77 | }, 78 | "sysPrefix": "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7", 79 | "fileHash": "1eaf1f22773a15c8adb7d37641dd5c88999c181add7c1d97a004dd33a2c824657b1b4cfb6ca19d4b7804b18eb454d5c34b3df1bdc53c2c93f4116419ce72d1a8", 80 | "type": "Unknown", 81 | "displayName": "Python 3.7.6 64-bit", 82 | "__store": true 83 | } 84 | } 85 | } 86 | ``` 87 | 88 | You can build your own. I built mine to run PySpark through my Jupyter Notebook as shown below: 89 | 90 | ``` 91 | { 92 | "display_name": "PySpark_Python3", 93 | "language": "python", 94 | "argv": [ 95 | "/opt/conda/bin/python3", 96 | "-m", 97 | "ipykernel_launcher", 98 | "-f", 99 | "{connection_file}" 100 | ], 101 | "env": { 102 | "SPARK_HOME": "/opt/jupyter/spark/", 103 | "PYTHONPATH": "/opt/jupyter/spark/python/:/opt/jupyter/spark/python/lib/py4j-0.10.9-src.zip:/opt/jupyter/spark/graphframes.zip", 104 | "PYSPARK_PYTHON": "/opt/conda/bin/python3" 105 | } 106 | } 107 | ``` 108 | 109 | ## Jupyter Notebook Document Format 110 | 111 | * Notebooks are automatically saved and stored on disk in the open source JavaScript Object Notation (JSON) format and with a `.ipynb` extension.
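Because a notebook document is just JSON, you can open one programmatically. A minimal, hedged sketch using the `nbformat` library (the file path below is only an example from this repository; any `.ipynb` works the same way):

```python
import nbformat

# Read a notebook file and normalize it to format version 4
nb = nbformat.read("docs/fundamentals/programming/python.ipynb", as_version=4)

print(nb.nbformat, nb.nbformat_minor)        # notebook format version, e.g. 4 2
print(nb.metadata.kernelspec.display_name)   # which kernel the notebook targets
for cell in nb.cells[:3]:                    # cells keep their type and source text
    print(cell.cell_type, repr(cell.source[:40]))
```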
-------------------------------------------------------------------------------- /docs/getting-started/installation.md: -------------------------------------------------------------------------------- 1 | # Installation 2 | 3 | I am sure you are anxious to install Jupyter and start exploring its capabilities, but first you have to decide if you want to install the Jupyter Notebook server directly on your system or host it on a virtual machine or in a Docker container. 4 | 5 | I believe it is important to give you the options so that you feel comfortable running the tool however you prefer. If you want to do a classic install directly on your system, follow the [official Jupyter Install documents](https://jupyter.org/install). I put together the following links and resources to make it easier: 6 | 7 | ## Manual Install 8 | 9 | **Prerequisite:** Python 10 | 11 | While Jupyter runs code in many programming languages, Python is a requirement (Python 3.3 or greater, or Python 2.7) for installing JupyterLab or the classic Jupyter Notebook. 12 | 13 | * [Windows Python Installer](https://www.python.org/downloads/windows/) 14 | * [Mac OS Python Installer](https://www.python.org/downloads/mac-osx/) 15 | 16 | ### Using Conda 17 | 18 | * [Windows Conda Installer](https://docs.conda.io/projects/conda/en/latest/user-guide/install/windows.html) 19 | * [Mac OS Conda Installer](https://docs.conda.io/projects/conda/en/latest/user-guide/install/macos.html) 20 | 21 | Once Conda is installed, you can install Jupyter Notebook with the following command (Bash): 22 | 23 | ```bash 24 | conda install -c conda-forge notebook 25 | ``` 26 | 27 | ### Using PIP 28 | 29 | You can install PIP with the following commands: 30 | 31 | ```bash 32 | curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py 33 | python get-pip.py 34 | ``` 35 | 36 | Once PIP is installed, you can run the following command to install a Jupyter Notebook server: 37 | 38 | ```bash 39 | pip install notebook 40 | ``` 41 | 42 | ## Running Jupyter Notebook Server 43 | 44 | Once Jupyter Notebook is installed, you can run it with the following command: 45 | 46 | ```bash 47 | jupyter notebook 48 | ``` 49 | 50 | 51 | You will get the server output showing on your terminal. You can press `CTRL+C` to stop the server. It will bind to `localhost` on port `8888` by default. 52 | 53 | ![](../images/JUPYTER_NOTEBOOK_SERVER_RUN.png) 54 | 55 | 56 | Either the Jupyter Notebook client interface will open automatically in your default browser, or you can just copy and paste the URL with the token shown in the output of the Jupyter Notebook server. 57 | 58 | 59 | ![](../images/JUPYTER_INSTALLATION_NOTEBOOK_SERVER.png) -------------------------------------------------------------------------------- /docs/getting-started/installation_binderhub.md: -------------------------------------------------------------------------------- 1 | # BinderHub 2 | 3 | Another way to interact with a Jupyter Notebook server is by leveraging the BinderHub public computing infrastructure and a Binder Repository (i.e. a Dockerfile). If you want to learn more about this, you can read the [MyBinder](https://mybinder.readthedocs.io/en/latest/index.html) and [BinderHub](https://binderhub.readthedocs.io/en/latest/overview.html) docs. 4 | 5 | 6 | * Released in May 2016 7 | * Updated to 2.0 in November 2019 8 | * The Binder Project is an open community that makes it possible to create shareable, interactive, reproducible environments.
9 | * The main technical product that the community creates is called BinderHub, and one deployment of a BinderHub exists at mybinder.org. 10 | * Who is it for?: 11 | * Researchers, educators, people analyzing data, and people trying to communicate that data analysis to others! 12 | 13 | BinderHub connects several services together to provide on-the-fly creation and registry of Docker images. It utilizes the following tools: 14 | 15 | * A cloud provider such as Google Cloud, Microsoft Azure, Amazon EC2, and others 16 | * Kubernetes to manage resources on the cloud 17 | * Helm to configure and control Kubernetes 18 | * Docker to use containers that standardize computing environments 19 | * A BinderHub UI that users can access to specify Git repos they want built 20 | * BinderHub to generate Docker images using the URL of a Git repository 21 | * A Docker registry (such as gcr.io) that hosts container images 22 | * JupyterHub to deploy temporary containers for users 23 | 24 | ![](../images/Binderhub-Architecture.png) 25 | 26 | This website was built via the [Jupyter Books](https://jupyterbook.org/intro.html) project, an open source project for building beautiful, publication-quality books and documents from computational material. Therefore, you can test the creation of a Jupyter Notebook server via Binderhub by running the server that can be used to host the content of this project. Click on the following badge: 27 | 28 | * [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/OTRF/infosec-jupyter-book/master) 29 | 30 | 31 | Right after that, you will see a Binder page preparing and launching the Jupyter Notebook server. 32 | 33 | That's it! Now you will be able to use the notebooks available in this project! The advantage of this service is that you can share your notebooks this way with others without deploying or installing a Jupyter notebook server. -------------------------------------------------------------------------------- /docs/getting-started/installation_docker.md: -------------------------------------------------------------------------------- 1 | # Docker 2 | 3 | I prefer to share a standardized and working environment via Docker images to focus more on the capabilities of the application rather than spend time troubleshooting the server installation. 4 | 5 | ## Pre-Requirements 6 | 7 | * [Docker Community Edition](https://docs.docker.com/install/linux/docker-ce/binaries/) 8 | 9 | You just have to install the community edition of Docker, so that you can pull and run ready-to-run Docker images containing Jupyter applications and interactive computing tools.
You can use a stack image to do any of the following (and more): 10 | 11 | * Start a personal Jupyter Notebook server in a local Docker container 12 | * Run JupyterLab servers for a team using JupyterHub 13 | * Write your own project Dockerfile 14 | 15 | **Community Docker Images** 16 | 17 | * [Jupyter Docker Stacks](https://github.com/jupyter/docker-stacks) 18 | * [Jupyter Docker Base Image](https://hub.docker.com/r/jupyter/base-notebook/) 19 | * [Hunters Forge Images](https://github.com/hunters-forge/notebooks-forge) 20 | 21 | 22 | ![](../images/JUPYTER_DOCKER_GH_MINIMAL.png) 23 | 24 | 25 | ## Downloading and Running a Jupyter Notebook Server 26 | 27 | ```bash 28 | docker run -p 8888:8888 jupyter/minimal-notebook:latest 29 | ``` 30 | 31 | 32 | ![](../images/JUPYTER_DOCKER_MINIMAL_RUN.png) 33 | 34 | 35 | ### Demo Video 36 | 37 | 38 | 39 | ## Using Open Threat Research (OTRF) Docker Images 40 | 41 | ### Running Latest Images 42 | 43 | You can simply download and run a docker image already created by the OTR Community. The Docker images are under the following account: [https://hub.docker.com/u/cyb3rward0g](https://hub.docker.com/u/cyb3rward0g). Look for the docker image names that start with `jupyter-`. If you want to download and run the `jupyter-base` image, you can do it with the following command: 44 | 45 | ```bash 46 | docker run -p 8888:8888 cyb3rward0g/jupyter-base:latest 47 | ``` 48 | 49 | ### Building Latest Images 50 | 51 | Clone the OTR notebooks-forge repository: 52 | 53 | ```bash 54 | git clone https://github.com/OTRF/notebooks-forge 55 | ``` 56 | 57 | Build & run the Docker image: 58 | 59 | ```bash 60 | cd notebooks-forge/docker/jupyter-base 61 | docker build -t jupyter-base . 62 | docker run -d -ti -p 8888:8888 --name jupyter-base jupyter-base 63 | ``` 64 | 65 | ### Get Notebook Server Link 66 | 67 | ```bash 68 | docker exec -i jupyter-base jupyter notebook list 69 | 70 | Currently running servers: 71 | http://0.0.0.0:8888/?token=bcd90816a041fa1f966829d1d46027e4524f40d97b96b8e0 :: /opt/jupyter/notebooks 72 | ``` 73 | 74 | ### Browse to Link 75 | 76 | ![](../images/JUPYTER_NOTEBOOK_SERVER.png) 77 | -------------------------------------------------------------------------------- /docs/getting-started/ipython_vs_python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# IPython vs Python\n", 8 | "**Reference:** https://ipython.readthedocs.io/en/stable/interactive/python-ipython-diff.html" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "metadata": {}, 14 | "source": [ 15 | "## Accessing help\n", 16 | "As IPython is mostly an interactive shell, the question mark is a simple shortcut to get help. A question mark alone will bring up the IPython help:" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 2, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "?" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "```python\n", 33 | "IPython -- An enhanced Interactive Python\n", 34 | "=========================================\n", 35 | "\n", 36 | "IPython offers a fully compatible replacement for the standard Python\n", 37 | "interpreter, with convenient shell features, special commands, command\n", 38 | "history mechanism and output results caching.\n", 39 | "\n", 40 | "At your system command line, type 'ipython -h' to see the command line\n", 41 | "options available.
This document only describes interactive features.\n", 42 | "\n", 43 | "GETTING HELP\n", 44 | "------------\n", 45 | "\n", 46 | "Within IPython you have various way to access help:\n", 47 | "\n", 48 | " ? -> Introduction and overview of IPython's features (this screen).\n", 49 | " object? -> Details about 'object'.\n", 50 | " object?? -> More detailed, verbose information about 'object'.\n", 51 | " %quickref -> Quick reference of all IPython specific syntax and magics.\n", 52 | " help -> Access Python's own help system.\n", 53 | "\n", 54 | "If you are in terminal IPython you can quit this screen by pressing `q`.\n", 55 | "\n", 56 | "\n", 57 | "MAIN FEATURES\n", 58 | "-------------\n", 59 | "\n", 60 | "* Access to the standard Python help with object docstrings and the Python\n", 61 | " manuals. Simply type 'help' (no quotes) to invoke it.\n", 62 | "\n", 63 | "* Magic commands: type %magic for information on the magic subsystem.\n", 64 | "\n", 65 | "* System command aliases, via the %alias command or the configuration file(s).\n", 66 | "\n", 67 | ":\n", 68 | "```" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "A single question mark before, or after an object available in current namespace will show help relative to this object:" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 3, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "import os" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 4, 90 | "metadata": {}, 91 | "outputs": [], 92 | "source": [ 93 | "os?" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "```python\n", 101 | "Type: module\n", 102 | "String form: \n", 103 | "File: /usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/os.py\n", 104 | "Docstring: \n", 105 | "OS routines for NT or Posix depending on what system we're on.\n", 106 | "\n", 107 | "This exports:\n", 108 | " - all functions from posix or nt, e.g. unlink, stat, etc.\n", 109 | " - os.path is either posixpath or ntpath\n", 110 | " - os.name is either 'posix' or 'nt'\n", 111 | " - os.curdir is a string representing the current directory (always '.')\n", 112 | " - os.pardir is a string representing the parent directory (always '..')\n", 113 | " - os.sep is the (or a most common) pathname separator ('/' or '\\\\')\n", 114 | " - os.extsep is the extension separator (always '.')\n", 115 | " - os.altsep is the alternate pathname separator (None or '/')\n", 116 | " - os.pathsep is the component separator used in $PATH etc\n", 117 | " - os.linesep is the line separator in text files ('\\r' or '\\n' or '\\r\\n')\n", 118 | " - os.defpath is the default search path for executables\n", 119 | " - os.devnull is the file path of the null device ('/dev/null', etc.)\n", 120 | "\n", 121 | "Programs that import and use 'os' stand a better chance of being\n", 122 | "portable between different platforms. Of course, they must then\n", 123 | "only use functions that are defined by all platforms (e.g., unlink\n", 124 | "and opendir), and leave all pathname manipulation to os.path\n", 125 | "(e.g., split and join).\n", 126 | "```" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "A double question mark will try to pull out more information about the object, and if possible display the python source code of this object." 
134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 5, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "def two_times(a):\n", 143 | "    b = a * 2\n", 144 | "    return b" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 6, 150 | "metadata": {}, 151 | "outputs": [], 152 | "source": [ 153 | "two_times??" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "```python\n", 161 | "Signature: two_times(a)\n", 162 | "Docstring: \n", 163 | "Source: \n", 164 | "def two_times(a):\n", 165 | "    b = a * 2\n", 166 | "    return b\n", 167 | "File: ~/Documents/GitHub/infosec-jupyter-book/docs/getting-started/\n", 168 | "Type: function\n", 169 | "```" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": 13, 175 | "metadata": {}, 176 | "outputs": [], 177 | "source": [ 178 | "print??" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "## Shell Assignment\n", 186 | "When doing interactive computing it is common to need to access the underlying shell. This is doable through the use of the exclamation mark `!` (or bang)." 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 7, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "name": "stdout", 196 | "output_type": "stream", 197 | "text": [ 198 | "Untitled.ipynb installation_binderhub.md what_is_jupyter.md\r\n", 199 | "architecture.md installation_docker.md\r\n", 200 | "installation.md ipython_python.md\r\n" 201 | ] 202 | } 203 | ], 204 | "source": [ 205 | "!ls" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 8, 211 | "metadata": {}, 212 | "outputs": [ 213 | { 214 | "name": "stdout", 215 | "output_type": "stream", 216 | "text": [ 217 | "/Users/cyb3rward0g/Documents/GitHub/infosec-jupyter-book/docs/getting-started\r\n" 218 | ] 219 | } 220 | ], 221 | "source": [ 222 | "!pwd" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "## Magics\n", 230 | "Magic functions often take a shell-like syntax, but under the hood they are Python functions. The syntax and assignment possibilities are similar to those of the bang `(!)` syntax, but with more flexibility and power. Magic functions start with a percent sign `(%)` for line magics or a double percent `(%%)` for cell magics."
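To make the `%` vs `%%` distinction concrete, here is a small, hedged example (a sketch written as its own cell, since `%%` cell magics must be the very first line of a cell):

```python
%%timeit
# Cell magic: %%timeit opens the cell and times the whole body.
# The line-magic form, e.g. `%timeit sum(range(1_000))`, times only
# the single statement on its own line.
total = 0
for i in range(1_000):
    total += i
```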
231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 12, 236 | "metadata": {}, 237 | "outputs": [ 238 | { 239 | "data": { 240 | "application/json": { 241 | "cell": { 242 | "!": "OSMagics", 243 | "HTML": "Other", 244 | "SVG": "Other", 245 | "bash": "Other", 246 | "capture": "ExecutionMagics", 247 | "debug": "ExecutionMagics", 248 | "file": "Other", 249 | "html": "DisplayMagics", 250 | "javascript": "DisplayMagics", 251 | "js": "DisplayMagics", 252 | "latex": "DisplayMagics", 253 | "markdown": "DisplayMagics", 254 | "perl": "Other", 255 | "prun": "ExecutionMagics", 256 | "pypy": "Other", 257 | "python": "Other", 258 | "python2": "Other", 259 | "python3": "Other", 260 | "ruby": "Other", 261 | "script": "ScriptMagics", 262 | "sh": "Other", 263 | "svg": "DisplayMagics", 264 | "sx": "OSMagics", 265 | "system": "OSMagics", 266 | "time": "ExecutionMagics", 267 | "timeit": "ExecutionMagics", 268 | "writefile": "OSMagics" 269 | }, 270 | "line": { 271 | "alias": "OSMagics", 272 | "alias_magic": "BasicMagics", 273 | "autoawait": "AsyncMagics", 274 | "autocall": "AutoMagics", 275 | "automagic": "AutoMagics", 276 | "autosave": "KernelMagics", 277 | "bookmark": "OSMagics", 278 | "cat": "Other", 279 | "cd": "OSMagics", 280 | "clear": "KernelMagics", 281 | "colors": "BasicMagics", 282 | "conda": "PackagingMagics", 283 | "config": "ConfigMagics", 284 | "connect_info": "KernelMagics", 285 | "cp": "Other", 286 | "debug": "ExecutionMagics", 287 | "dhist": "OSMagics", 288 | "dirs": "OSMagics", 289 | "doctest_mode": "BasicMagics", 290 | "ed": "Other", 291 | "edit": "KernelMagics", 292 | "env": "OSMagics", 293 | "gui": "BasicMagics", 294 | "hist": "Other", 295 | "history": "HistoryMagics", 296 | "killbgscripts": "ScriptMagics", 297 | "ldir": "Other", 298 | "less": "KernelMagics", 299 | "lf": "Other", 300 | "lk": "Other", 301 | "ll": "Other", 302 | "load": "CodeMagics", 303 | "load_ext": "ExtensionMagics", 304 | "loadpy": "CodeMagics", 305 | "logoff": "LoggingMagics", 306 | "logon": "LoggingMagics", 307 | "logstart": "LoggingMagics", 308 | "logstate": "LoggingMagics", 309 | "logstop": "LoggingMagics", 310 | "ls": "Other", 311 | "lsmagic": "BasicMagics", 312 | "lx": "Other", 313 | "macro": "ExecutionMagics", 314 | "magic": "BasicMagics", 315 | "man": "KernelMagics", 316 | "matplotlib": "PylabMagics", 317 | "mkdir": "Other", 318 | "more": "KernelMagics", 319 | "mv": "Other", 320 | "notebook": "BasicMagics", 321 | "page": "BasicMagics", 322 | "pastebin": "CodeMagics", 323 | "pdb": "ExecutionMagics", 324 | "pdef": "NamespaceMagics", 325 | "pdoc": "NamespaceMagics", 326 | "pfile": "NamespaceMagics", 327 | "pinfo": "NamespaceMagics", 328 | "pinfo2": "NamespaceMagics", 329 | "pip": "PackagingMagics", 330 | "popd": "OSMagics", 331 | "pprint": "BasicMagics", 332 | "precision": "BasicMagics", 333 | "prun": "ExecutionMagics", 334 | "psearch": "NamespaceMagics", 335 | "psource": "NamespaceMagics", 336 | "pushd": "OSMagics", 337 | "pwd": "OSMagics", 338 | "pycat": "OSMagics", 339 | "pylab": "PylabMagics", 340 | "qtconsole": "KernelMagics", 341 | "quickref": "BasicMagics", 342 | "recall": "HistoryMagics", 343 | "rehashx": "OSMagics", 344 | "reload_ext": "ExtensionMagics", 345 | "rep": "Other", 346 | "rerun": "HistoryMagics", 347 | "reset": "NamespaceMagics", 348 | "reset_selective": "NamespaceMagics", 349 | "rm": "Other", 350 | "rmdir": "Other", 351 | "run": "ExecutionMagics", 352 | "save": "CodeMagics", 353 | "sc": "OSMagics", 354 | "set_env": "OSMagics", 355 | "store": "StoreMagics", 356 | "sx": 
"OSMagics", 357 | "system": "OSMagics", 358 | "tb": "ExecutionMagics", 359 | "time": "ExecutionMagics", 360 | "timeit": "ExecutionMagics", 361 | "unalias": "OSMagics", 362 | "unload_ext": "ExtensionMagics", 363 | "who": "NamespaceMagics", 364 | "who_ls": "NamespaceMagics", 365 | "whos": "NamespaceMagics", 366 | "xdel": "NamespaceMagics", 367 | "xmode": "BasicMagics" 368 | } 369 | }, 370 | "text/plain": [ 371 | "Available line magics:\n", 372 | "%alias %alias_magic %autoawait %autocall %automagic %autosave %bookmark %cat %cd %clear %colors %conda %config %connect_info %cp %debug %dhist %dirs %doctest_mode %ed %edit %env %gui %hist %history %killbgscripts %ldir %less %lf %lk %ll %load %load_ext %loadpy %logoff %logon %logstart %logstate %logstop %ls %lsmagic %lx %macro %magic %man %matplotlib %mkdir %more %mv %notebook %page %pastebin %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %pip %popd %pprint %precision %prun %psearch %psource %pushd %pwd %pycat %pylab %qtconsole %quickref %recall %rehashx %reload_ext %rep %rerun %reset %reset_selective %rm %rmdir %run %save %sc %set_env %store %sx %system %tb %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode\n", 373 | "\n", 374 | "Available cell magics:\n", 375 | "%%! %%HTML %%SVG %%bash %%capture %%debug %%file %%html %%javascript %%js %%latex %%markdown %%perl %%prun %%pypy %%python %%python2 %%python3 %%ruby %%script %%sh %%svg %%sx %%system %%time %%timeit %%writefile\n", 376 | "\n", 377 | "Automagic is ON, % prefix IS NOT needed for line magics." 378 | ] 379 | }, 380 | "execution_count": 12, 381 | "metadata": {}, 382 | "output_type": "execute_result" 383 | } 384 | ], 385 | "source": [ 386 | "%lsmagic" 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": 17, 392 | "metadata": {}, 393 | "outputs": [ 394 | { 395 | "name": "stdout", 396 | "output_type": "stream", 397 | "text": [ 398 | "CPU times: user 3 µs, sys: 0 ns, total: 3 µs\n", 399 | "Wall time: 6.91 µs\n", 400 | "5 25\n", 401 | "6 36\n", 402 | "7 49\n" 403 | ] 404 | } 405 | ], 406 | "source": [ 407 | "%time\n", 408 | "for i in range(5, 8):\n", 409 | " print(i, i ** 2)" 410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": null, 415 | "metadata": {}, 416 | "outputs": [], 417 | "source": [] 418 | } 419 | ], 420 | "metadata": { 421 | "kernelspec": { 422 | "display_name": "Python 3", 423 | "language": "python", 424 | "name": "python3" 425 | }, 426 | "language_info": { 427 | "codemirror_mode": { 428 | "name": "ipython", 429 | "version": 3 430 | }, 431 | "file_extension": ".py", 432 | "mimetype": "text/x-python", 433 | "name": "python", 434 | "nbconvert_exporter": "python", 435 | "pygments_lexer": "ipython3", 436 | "version": "3.8.5" 437 | } 438 | }, 439 | "nbformat": 4, 440 | "nbformat_minor": 4 441 | } 442 | -------------------------------------------------------------------------------- /docs/getting-started/what_is_jupyter.md: -------------------------------------------------------------------------------- 1 | # What Is Jupyter? 2 | 3 | The Jupyter Notebook project is the evolution of the IPython Notebook library which was developed primarily to enhance the default python interactive console by enabling scientific operations and advanced data analytics capabilities via sharable web documents. 
4 | 5 | ## IPython 6 | 7 | * Released on Dec 10, 2001 by Fernando Perez while he was a graduate student at the University of Colorado 8 | * Release: https://mail.python.org/pipermail/python-list/2001-December/093408.html 9 | * IPython as we know it today grew out of the following three projects: 10 | * ipython by Fernando Pérez. 11 | * IPP by Janko Hauser. 12 | * LazyPython by Nathan Gray. 13 | 14 | ![](../images/JUPYTER_IPYTHON.png) 15 | 16 | ## Fernando Perez Inspiration 17 | 18 | * Research with open tools for access and collaboration 19 | * Validated in SciPy India 2010 - Workshop to include students from underprivileged colleges in rural India. 20 | * Scientific 21 | * Business of science is to understand nature 22 | * Science is about opening up the black boxes of nature 23 | * Community! 24 | * SciPy: Scientists collaborating and building better tools together! 25 | * Less competition and more collaboration! 26 | 27 | ## Then, in 2014... 28 | 29 | ![](../images/ipython-to-jupyter.png) 30 | 31 | The IPython team realized they had built a robust application that could be used with other programming languages (Julia, R, etc.). They then decided to create the Jupyter Project, making IPython the project that would handle anything Python-related, as shown in the following example slide presented by Fernando Perez during a presentation around the time the Jupyter Project was announced: 32 | 33 | ![](../images/ipython-jupyter-features.png) 34 | 35 | Nowadays, the Jupyter project not only supports Python but also over 40 programming languages such as R, Julia, Scala and PySpark. In fact, its name was originally derived from three programming languages: Julia, Python and R, which made it one of the first language-agnostic notebook applications, and it is now considered one of the most preferred environments for data scientists and engineers in the community to explore and analyze data. 36 | 37 | ## What is Jupyter Notebook? 38 | 39 | The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. The Jupyter notebook combines two components: 40 | 41 | * **A web application:** a browser-based tool for interactive authoring of documents which combine explanatory text, mathematics, computations and their rich media output. 42 | * **Notebook documents:** a representation of all content visible in the web application, including inputs and outputs of the computations, explanatory text, mathematics, images, and rich media representations of objects. 43 | 44 | Uses include: 45 | 46 | * Data cleaning and transformation 47 | * Statistical modeling 48 | * Data visualization 49 | * Machine learning, and much more 50 | 51 | ## What is a Notebook? 52 | 53 | Think of a notebook as a document that you can access via a web interface that allows you to save input (i.e. live code) and output (i.e. code execution results / evaluated code output) of interactive sessions as well as important notes needed to explain the methodology and steps taken to perform specific tasks (i.e. data analysis).
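As a minimal, hedged sketch of that idea, you can even build such a document programmatically with the `nbformat` library (the cell contents and the output file name below are only examples):

```python
import nbformat
from nbformat.v4 import new_notebook, new_markdown_cell, new_code_cell

# Build the smallest possible notebook "document": one note, one piece of code
nb = new_notebook()
nb.cells.append(new_markdown_cell("# Methodology notes go here"))
nb.cells.append(new_code_cell("print('Hello Helk!')"))

# What gets saved to disk is plain JSON with a .ipynb extension
nbformat.write(nb, "demo.ipynb")
```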
54 | 55 | ![](../images/JUPYTER_NOTEBOOK_BASIC_VIEW.png) 56 | 57 | ## References 58 | * https://ipython.readthedocs.io/en/stable/about/history.html 59 | * https://ipython.readthedocs.io/en/stable/interactive/python-ipython-diff.html 60 | * https://www.youtube.com/watch?v=xuNj5paMuow&list=PL055Epbe6d5aP6Ru42r7hk68GTSaclYgi 61 | * https://speakerdeck.com/fperez/project-jupyter?slide=5 62 | * https://jupyter.org/ -------------------------------------------------------------------------------- /docs/images/Binderhub-Architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/Binderhub-Architecture.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_ARCHITECTURE.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_ARCHITECTURE.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_CLIENT_EXEC_REQUEST.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_CLIENT_EXEC_REQUEST.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_CLIENT_EXEC_STREAM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_CLIENT_EXEC_STREAM.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_CLIENT_NOTEBOOK.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_CLIENT_NOTEBOOK.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_DOCKER_GH_MINIMAL.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_DOCKER_GH_MINIMAL.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_DOCKER_MINIMAL_RUN.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_DOCKER_MINIMAL_RUN.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_EXECUTE_REQUEST.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_EXECUTE_REQUEST.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_INSTALLATION_NOTEBOOK_SERVER.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_INSTALLATION_NOTEBOOK_SERVER.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_IPYTHON.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_IPYTHON.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_KERNEL_LOCATION.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_KERNEL_LOCATION.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_NOTEBOOK_BASIC_VIEW.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_NOTEBOOK_BASIC_VIEW.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_NOTEBOOK_SERVER.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_NOTEBOOK_SERVER.png -------------------------------------------------------------------------------- /docs/images/JUPYTER_NOTEBOOK_SERVER_RUN.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/JUPYTER_NOTEBOOK_SERVER_RUN.png -------------------------------------------------------------------------------- /docs/images/ipython-jupyter-features.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/ipython-jupyter-features.png -------------------------------------------------------------------------------- /docs/images/ipython-to-jupyter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/ipython-to-jupyter.png -------------------------------------------------------------------------------- /docs/images/logo/download.svg: -------------------------------------------------------------------------------- 1 | 2 | Group.svg 3 | Created using Figma 0.90 -------------------------------------------------------------------------------- /docs/images/logo/favicon.ico: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/logo/favicon.ico -------------------------------------------------------------------------------- /docs/images/logo/jupyter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/logo/jupyter.png -------------------------------------------------------------------------------- /docs/images/logo/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/logo/logo.png -------------------------------------------------------------------------------- /docs/images/logo/logo.psd: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OTRF/infosec-jupyter-book/61548b6e7a813f8ddaec0e700f6abd78e2c74c71/docs/images/logo/logo.psd -------------------------------------------------------------------------------- /docs/introduction.md: -------------------------------------------------------------------------------- 1 | # Infosec Jupyter Book 2 | 3 | [![Open_Threat_Research Community](https://img.shields.io/badge/Open_Threat_Research-Community-brightgreen.svg)](https://twitter.com/OTR_Community) 4 | [![Open Source Love svg1](https://badges.frapsoft.com/os/v3/open-source.svg?v=103)](https://github.com/ellerbrock/open-source-badges/) 5 | 6 | The Infosec Community Definitive Guide to Jupyter Notebooks to empower other researchers around the world to share, collaborate and help others through interactive environments. This is a community-driven project and contains documentation about the installation of a Jupyter Notebook server and interesting use cases for security research. 7 | 8 | ## Goals 9 | 10 | * Expedite the time it takes to start working with Jupyter Notebooks 11 | * Share use cases in different areas of InfoSec where notebooks can help 12 | * Aggregate and centralize information about Jupyter Notebooks for security researchers 13 | 14 | ## Contributing 15 | 16 | If you use Jupyter Notebooks for anything in InfoSec and would like to share it with the community and help others, feel free to open a PR. We would love to provide feedback and put it in the right place.
17 | 18 | ## Authors 19 | 20 | * Roberto Rodriguez [@Cyb3rWard0g](https://twitter.com/Cyb3rWard0g) 21 | * Jose Luis Rodriguez [@Cyb3rPandaH](https://twitter.com/Cyb3rPandaH) -------------------------------------------------------------------------------- /docs/use-cases/data-analysis/02_bloodhound_explore_kerberoastable_users.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Explore Kerberoastable Users with BloodHound\n", 8 | "----------------------------------------------\n", 9 | "* **Author**: Roberto Rodriguez (@Cyb3rWard0g)\n", 10 | "* **Project**: Infosec Jupyter Book\n", 11 | "* **Public Organization**: [Open Threat Research](https://github.com/OTRF)\n", 12 | "* **License**: [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)\n", 13 | "* **Reference**: https://youtu.be/fqYoOoghqdE?t=1218" 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "## Importing Libraries\n", 21 | "Pre-requisites:\n", 22 | "\n", 23 | "* pip install py2neo" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 1, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "from py2neo import Graph" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "## Count Users with Service Principal Name Set " 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "When SharpHound finds a user with a Service Principal Name set, it sets a property named `hasspn` on the User node to `True`. Therefore, if we want to count the number of users with that property set, we just need to query for users with `hasspn = True`." 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 2, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "g = Graph(\"bolt://206.189.85.93:7687\", auth=(\"neo4j\", \"BloodHound\"))" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 3, 61 | "metadata": {}, 62 | "outputs": [], 63 | "source": [ 64 | "users_hasspn_count = g.run(\"\"\"\n", 65 | "MATCH (u:User {hasspn:true})\n", 66 | "RETURN COUNT(u)\n", 67 | "\"\"\").to_data_frame()" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 4, 73 | "metadata": {}, 74 | "outputs": [ 75 | { 76 | "data": { 77 | "text/html": [ 78 | "
\n", 79 | "\n", 92 | "\n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | "
COUNT(u)
06
\n", 106 | "
" 107 | ], 108 | "text/plain": [ 109 | " COUNT(u)\n", 110 | "0 6" 111 | ] 112 | }, 113 | "execution_count": 4, 114 | "metadata": {}, 115 | "output_type": "execute_result" 116 | } 117 | ], 118 | "source": [ 119 | "users_hasspn_count" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 5, 125 | "metadata": {}, 126 | "outputs": [ 127 | { 128 | "data": { 129 | "text/html": [ 130 | "
\n", 131 | "\n", 144 | "\n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | "
u.name
0SQLSVC@TOKYO.JAPAN.LOCAL
1SCANSERVICE@TOKYO.JAPAN.LOCAL
2KRBTGT@JAPAN.LOCAL
3BACKUPLDAP@TOKYO.JAPAN.LOCAL
4KRBTGT@TOKYO.JAPAN.LOCAL
5KRBTGT@SINGAPORE.LOCAL
\n", 178 | "
" 179 | ], 180 | "text/plain": [ 181 | " u.name\n", 182 | "0 SQLSVC@TOKYO.JAPAN.LOCAL\n", 183 | "1 SCANSERVICE@TOKYO.JAPAN.LOCAL\n", 184 | "2 KRBTGT@JAPAN.LOCAL\n", 185 | "3 BACKUPLDAP@TOKYO.JAPAN.LOCAL\n", 186 | "4 KRBTGT@TOKYO.JAPAN.LOCAL\n", 187 | "5 KRBTGT@SINGAPORE.LOCAL" 188 | ] 189 | }, 190 | "execution_count": 5, 191 | "metadata": {}, 192 | "output_type": "execute_result" 193 | } 194 | ], 195 | "source": [ 196 | "g.run(\"\"\"\n", 197 | "MATCH (u:User {hasspn:true})\n", 198 | "RETURN u.name\n", 199 | "\"\"\").to_data_frame()" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "## Retrieve Kerberoastable Users with Path to DA " 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "We can limit our results and return only Kereberoastable users with paths to DA. We can find Kerberoastable users with a path to DA and also see the length of the path to see which one is the closest." 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 6, 219 | "metadata": {}, 220 | "outputs": [], 221 | "source": [ 222 | "krb_users_path_to_DA = g.run(\"\"\"\n", 223 | "MATCH (u:User {hasspn:true})\n", 224 | "MATCH (g:Group {name:'DOMAIN ADMINS@JAPAN.LOCAL'})\n", 225 | "MATCH p = shortestPath(\n", 226 | " (u)-[*1..]->(g)\n", 227 | ")\n", 228 | "RETURN u.name,LENGTH(p)\n", 229 | "ORDER BY LENGTH(p) ASC\n", 230 | "\"\"\").to_data_frame()" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 7, 236 | "metadata": {}, 237 | "outputs": [ 238 | { 239 | "data": { 240 | "text/html": [ 241 | "
\n", 242 | "\n", 255 | "\n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | "
u.nameLENGTH(p)
0SQLSVC@TOKYO.JAPAN.LOCAL3
1BACKUPLDAP@TOKYO.JAPAN.LOCAL5
\n", 276 | "
" 277 | ], 278 | "text/plain": [ 279 | " u.name LENGTH(p)\n", 280 | "0 SQLSVC@TOKYO.JAPAN.LOCAL 3\n", 281 | "1 BACKUPLDAP@TOKYO.JAPAN.LOCAL 5" 282 | ] 283 | }, 284 | "execution_count": 7, 285 | "metadata": {}, 286 | "output_type": "execute_result" 287 | } 288 | ], 289 | "source": [ 290 | "krb_users_path_to_DA" 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "## Return Most Privileged Kerberoastable users" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "What if we do not have kerberoastable users with a path to DA? We can still look for most privileged Kerberoastable users based on how many computers they have local admins rights on. " 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": 8, 310 | "metadata": {}, 311 | "outputs": [], 312 | "source": [ 313 | "privileged_kerberoastable_users = g.run(\"\"\"\n", 314 | "MATCH (u:User {hasspn:true})\n", 315 | "OPTIONAL MATCH (u)-[:AdminTo]->(c1:Computer)\n", 316 | "OPTIONAL MATCH (u)-[:MemberOf*1..]->(:Group)-[:AdminTo]->(c2:Computer)\n", 317 | "WITH u,COLLECT(c1) + COLLECT(c2) AS tempVar\n", 318 | "UNWIND tempVar AS comps\n", 319 | "RETURN u.name,COUNT(DISTINCT(comps))\n", 320 | "ORDER BY COUNT(DISTINCT(comps)) DESC\n", 321 | "\"\"\").to_data_frame()" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 9, 327 | "metadata": {}, 328 | "outputs": [ 329 | { 330 | "data": { 331 | "text/html": [ 332 | "
\n", 333 | "\n", 346 | "\n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | "
u.nameCOUNT(DISTINCT(comps))
0SQLSVC@TOKYO.JAPAN.LOCAL1
\n", 362 | "
" 363 | ], 364 | "text/plain": [ 365 | " u.name COUNT(DISTINCT(comps))\n", 366 | "0 SQLSVC@TOKYO.JAPAN.LOCAL 1" 367 | ] 368 | }, 369 | "execution_count": 9, 370 | "metadata": {}, 371 | "output_type": "execute_result" 372 | } 373 | ], 374 | "source": [ 375 | "privileged_kerberoastable_users" 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": null, 381 | "metadata": {}, 382 | "outputs": [], 383 | "source": [] 384 | } 385 | ], 386 | "metadata": { 387 | "kernelspec": { 388 | "display_name": "Python 3", 389 | "language": "python", 390 | "name": "python3" 391 | }, 392 | "language_info": { 393 | "codemirror_mode": { 394 | "name": "ipython", 395 | "version": 3 396 | }, 397 | "file_extension": ".py", 398 | "mimetype": "text/x-python", 399 | "name": "python", 400 | "nbconvert_exporter": "python", 401 | "pygments_lexer": "ipython3", 402 | "version": "3.7.6" 403 | } 404 | }, 405 | "nbformat": 4, 406 | "nbformat_minor": 4 407 | } 408 | -------------------------------------------------------------------------------- /docs/use-cases/data-analysis/03_analyzing_rpc_methods_relationships_graphframes.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Analyzing Windows RPC Methods & Other Functions Via GraphFrames" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "* **Author:** Roberto Rodriguez (@Cyb3rWard0g)\n", 15 | "* **Project:** Infosec Jupyter Book\n", 16 | "* **Public Organization:** Open Threat Research\n", 17 | "* **License:** Creative Commons Attribution-ShareAlike 4.0 International\n", 18 | "* **Reference:**" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Import Libraries" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 1, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "from pyspark.sql import SparkSession\n", 35 | "from pyspark.sql.functions import *\n", 36 | "from graphframes import *" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## Initialize Spark Session" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 2, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "spark = SparkSession \\\n", 53 | " .builder \\\n", 54 | " .appName(\"WinRPC\") \\\n", 55 | " .config(\"spark.sql.caseSensitive\",\"True\") \\\n", 56 | " .config(\"spark.driver.memory\", \"4g\") \\\n", 57 | " .getOrCreate()" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 3, 63 | "metadata": {}, 64 | "outputs": [ 65 | { 66 | "data": { 67 | "text/html": [ 68 | "\n", 69 | "
\n", 70 | "

SparkSession - in-memory

\n", 71 | " \n", 72 | "
\n", 73 | "

SparkContext

\n", 74 | "\n", 75 | "

Spark UI

\n", 76 | "\n", 77 | "
\n", 78 | "
Version
\n", 79 | "
v3.0.0
\n", 80 | "
Master
\n", 81 | "
local[*]
\n", 82 | "
AppName
\n", 83 | "
WinRPC
\n", 84 | "
\n", 85 | "
\n", 86 | " \n", 87 | "
\n", 88 | " " 89 | ], 90 | "text/plain": [ 91 | "" 92 | ] 93 | }, 94 | "execution_count": 3, 95 | "metadata": {}, 96 | "output_type": "execute_result" 97 | } 98 | ], 99 | "source": [ 100 | "spark" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "## Download and Decompress JSON File" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 4, 113 | "metadata": {}, 114 | "outputs": [ 115 | { 116 | "name": "stdout", 117 | "output_type": "stream", 118 | "text": [ 119 | "--2020-07-21 15:01:41-- https://github.com/Cyb3rWard0g/WinRpcFunctions/raw/master/win10_1909/AllRpcFuncMaps.zip\n", 120 | "Resolving github.com (github.com)... 140.82.113.3\n", 121 | "Connecting to github.com (github.com)|140.82.113.3|:443... connected.\n", 122 | "HTTP request sent, awaiting response... 302 Found\n", 123 | "Location: https://raw.githubusercontent.com/Cyb3rWard0g/WinRpcFunctions/master/win10_1909/AllRpcFuncMaps.zip [following]\n", 124 | "--2020-07-21 15:01:41-- https://raw.githubusercontent.com/Cyb3rWard0g/WinRpcFunctions/master/win10_1909/AllRpcFuncMaps.zip\n", 125 | "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n", 126 | "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n", 127 | "HTTP request sent, awaiting response... 200 OK\n", 128 | "Length: 26891116 (26M) [application/zip]\n", 129 | "Saving to: ‘AllRpcFuncMaps.zip’\n", 130 | "\n", 131 | "AllRpcFuncMaps.zip 100%[===================>] 25.64M 4.33MB/s in 6.1s \n", 132 | "\n", 133 | "2020-07-21 15:01:47 (4.22 MB/s) - ‘AllRpcFuncMaps.zip’ saved [26891116/26891116]\n", 134 | "\n" 135 | ] 136 | } 137 | ], 138 | "source": [ 139 | "! wget https://github.com/Cyb3rWard0g/WinRpcFunctions/raw/master/win10_1909/AllRpcFuncMaps.zip" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 5, 145 | "metadata": {}, 146 | "outputs": [ 147 | { 148 | "name": "stdout", 149 | "output_type": "stream", 150 | "text": [ 151 | "Archive: AllRpcFuncMaps.zip\n", 152 | " inflating: AllRpcFuncMaps.json \n" 153 | ] 154 | } 155 | ], 156 | "source": [ 157 | "! 
unzip AllRpcFuncMaps.zip" 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "## Read JSON File as Spark DataFrame" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 6, 170 | "metadata": {}, 171 | "outputs": [ 172 | { 173 | "name": "stdout", 174 | "output_type": "stream", 175 | "text": [ 176 | "CPU times: user 9.34 ms, sys: 5.12 ms, total: 14.5 ms\n", 177 | "Wall time: 1min 8s\n" 178 | ] 179 | } 180 | ], 181 | "source": [ 182 | "%%time\n", 183 | "df = spark.read.json('AllRpcFuncMaps.json')" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "## Create Temporary SQL View" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 7, 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "df.createOrReplaceTempView('RPCMaps')" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "## Create GraphFrame" 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": 8, 212 | "metadata": {}, 213 | "outputs": [], 214 | "source": [ 215 | "vertices = spark.sql(\n", 216 | "'''\n", 217 | "SELECT FunctionName AS id, FunctionType, Module\n", 218 | "FROM RPCMaps\n", 219 | "GROUP BY FunctionName, FunctionType, Module\n", 220 | "'''\n", 221 | ")" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 9, 227 | "metadata": {}, 228 | "outputs": [], 229 | "source": [ 230 | "edges = spark.sql(\n", 231 | "'''\n", 232 | "SELECT CalledBy AS src, FunctionName AS dst\n", 233 | "FROM RPCMaps\n", 234 | "'''\n", 235 | ").dropDuplicates()" 236 | ] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "execution_count": 10, 241 | "metadata": {}, 242 | "outputs": [], 243 | "source": [ 244 | "g = GraphFrame(vertices, edges)" 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": 11, 250 | "metadata": {}, 251 | "outputs": [ 252 | { 253 | "data": { 254 | "text/plain": [ 255 | "GraphFrame(v:[id: string, FunctionType: string ... 1 more field], e:[src: string, dst: string])" 256 | ] 257 | }, 258 | "execution_count": 11, 259 | "metadata": {}, 260 | "output_type": "execute_result" 261 | } 262 | ], 263 | "source": [ 264 | "g" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "## Motif Finding" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "Motif finding refers to searching for structural patterns in a graph.\n", 279 | "\n", 280 | "GraphFrame motif finding uses a simple Domain-Specific Language (DSL) for expressing structural queries. For example, graph.find(\"(a)-[e]->(b); (b)-[e2]->(a)\") will search for pairs of vertices a,b connected by edges in both directions. It will return a DataFrame of all such structures in the graph, with columns for each of the named elements (vertices or edges) in the motif" 281 | ] 282 | }, 283 | { 284 | "cell_type": "markdown", 285 | "metadata": {}, 286 | "source": [ 287 | "## Basic Motif Queries" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "What about a chain of 3 vertices where the first one is an RPC function and the last one is an external function named LoadLibraryExW?" 
295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": 12, 300 | "metadata": {}, 301 | "outputs": [], 302 | "source": [ 303 | "loadLibrary = g.find(\"(a)-[]->(b); (b)-[]->(c)\")\\\n", 304 | " .filter(\"a.FunctionType = 'RPCFunction'\")\\\n", 305 | " .filter(\"c.FunctionType = 'ExtFunction'\")\\\n", 306 | " .filter(\"c.id = 'LoadLibraryExW'\").dropDuplicates()" 307 | ] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": 13, 312 | "metadata": {}, 313 | "outputs": [ 314 | { 315 | "name": "stdout", 316 | "output_type": "stream", 317 | "text": [ 318 | "+---------------------------------------+----------------------------------------+----------+--------------+\n", 319 | "|Module |id |id |id |\n", 320 | "+---------------------------------------+----------------------------------------+----------+--------------+\n", 321 | "|c:/Windows/System32/appinfo.dll |RAiLaunchProcessWithIdentity |Open |LoadLibraryExW|\n", 322 | "|C:/Windows/System32/UserDataService.dll|UdmSvcImpl_GetContactRevisionEnum |Initialize|LoadLibraryExW|\n", 323 | "|c:/Windows/System32/lsm.dll |RpcWaitAsyncNotification |Initialize|LoadLibraryExW|\n", 324 | "|c:/Windows/System32/lsm.dll |RpcWaitAsyncNotification |Initialize|LoadLibraryExW|\n", 325 | "|C:/Windows/System32/PhoneService.dll |PhoneSvcImpl_PhoneRpcGetShouldMuteKeypad|Initialize|LoadLibraryExW|\n", 326 | "|C:/Windows/System32/UserDataService.dll|UdmSvcImpl_ToggleContactMaintenance |Initialize|LoadLibraryExW|\n", 327 | "|C:/Windows/System32/UserDataService.dll|UdmSvcImpl_EmptyEmailFolder |Initialize|LoadLibraryExW|\n", 328 | "|C:/Windows/System32/UserDataService.dll|UdmSvcImpl_EmptyEmailFolder |Initialize|LoadLibraryExW|\n", 329 | "|c:/Windows/System32/vpnike.dll |VpnikeCreateIDPayload |Initialize|LoadLibraryExW|\n", 330 | "|c:/Windows/System32/vpnike.dll |VpnikeCreateIDPayload |Initialize|LoadLibraryExW|\n", 331 | "+---------------------------------------+----------------------------------------+----------+--------------+\n", 332 | "only showing top 10 rows\n", 333 | "\n", 334 | "CPU times: user 6.63 ms, sys: 3.24 ms, total: 9.87 ms\n", 335 | "Wall time: 37.8 s\n" 336 | ] 337 | } 338 | ], 339 | "source": [ 340 | "%%time\n", 341 | "loadLibrary.select(\"a.Module\",\"a.id\",\"b.id\",\"c.id\").show(10,truncate=False)" 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": {}, 347 | "source": [ 348 | "What if we also filter our graph query by a specific module? What about Lsasrv.dll?" 
349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": 14, 354 | "metadata": {}, 355 | "outputs": [], 356 | "source": [ 357 | "loadLibrary = g.find(\"(a)-[]->(b); (b)-[]->(c)\")\\\n", 358 | " .filter(\"a.FunctionType = 'RPCFunction'\")\\\n", 359 | " .filter(\"lower(a.Module) LIKE '%lsasrv.dll'\")\\\n", 360 | " .filter(\"c.FunctionType = 'ExtFunction'\")\\\n", 361 | " .filter(\"c.id = 'LoadLibraryExW'\").dropDuplicates()" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 15, 367 | "metadata": {}, 368 | "outputs": [ 369 | { 370 | "name": "stdout", 371 | "output_type": "stream", 372 | "text": [ 373 | "+------------------------------+----------------------------------+-------------------------+--------------+\n", 374 | "|Module |id |id |id |\n", 375 | "+------------------------------+----------------------------------+-------------------------+--------------+\n", 376 | "|c:/Windows/System32/lsasrv.dll|DsRolerGetPrimaryDomainInformation|LsapDbOpenObject |LoadLibraryExW|\n", 377 | "|c:/Windows/System32/lsasrv.dll|LsarQueryTrustedDomainInfoByName |LsapLoadLsaDbExtensionDll|LoadLibraryExW|\n", 378 | "|c:/Windows/System32/lsasrv.dll|LsarOpenPolicy2 |LsapDbOpenObject |LoadLibraryExW|\n", 379 | "|c:/Windows/System32/lsasrv.dll|DsRolerGetPrimaryDomainInformation|LsapDbOpenObject |LoadLibraryExW|\n", 380 | "|c:/Windows/System32/lsasrv.dll|LsarCreateSecret |LsapDbDereferenceObject |LoadLibraryExW|\n", 381 | "|c:/Windows/System32/lsasrv.dll|LsarEnumerateAccountsWithUserRight|LsapDbDereferenceObject |LoadLibraryExW|\n", 382 | "|c:/Windows/System32/lsasrv.dll|LsarLookupSids |LsapLookupSids |LoadLibraryExW|\n", 383 | "|c:/Windows/System32/lsasrv.dll|LsarQueryTrustedDomainInfoByName |LsapDbOpenObject |LoadLibraryExW|\n", 384 | "|c:/Windows/System32/lsasrv.dll|LsarSetTrustedDomainInfoByName |LsapDbDereferenceObject |LoadLibraryExW|\n", 385 | "|c:/Windows/System32/lsasrv.dll|LsarOpenAccount |LsapLoadLsaDbExtensionDll|LoadLibraryExW|\n", 386 | "+------------------------------+----------------------------------+-------------------------+--------------+\n", 387 | "only showing top 10 rows\n", 388 | "\n", 389 | "CPU times: user 4.95 ms, sys: 2.65 ms, total: 7.6 ms\n", 390 | "Wall time: 23 s\n" 391 | ] 392 | } 393 | ], 394 | "source": [ 395 | "%%time\n", 396 | "loadLibrary.select(\"a.Module\",\"a.id\",\"b.id\",\"c.id\").show(10,truncate=False)" 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "## Breadth-first search (BFS)\n", 404 | "\n", 405 | "Breadth-first search (BFS) finds the shortest path(s) from one vertex (or a set of vertices) to another vertex (or a set of vertices). The beginning and end vertices are specified as Spark DataFrame expressions." 
406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": {}, 411 | "source": [ 412 | "### Shortest Path from an RPC Method to LoadLibraryExW" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 16, 418 | "metadata": {}, 419 | "outputs": [], 420 | "source": [ 421 | "loadLibraryBFS = g.bfs(\n", 422 | " fromExpr = \"FunctionType = 'RPCFunction'\",\n", 423 | " toExpr = \"id = 'LoadLibraryExW' and FunctionType = 'ExtFunction'\",\n", 424 | " maxPathLength = 3).dropDuplicates()" 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": 17, 430 | "metadata": {}, 431 | "outputs": [ 432 | { 433 | "name": "stdout", 434 | "output_type": "stream", 435 | "text": [ 436 | "+--------------------------------------+--------------------------------------------+\n", 437 | "|Module |e0 |\n", 438 | "+--------------------------------------+--------------------------------------------+\n", 439 | "|C:/Windows/System32/appmgmts.dll |[ARPRemoveApp, LoadLibraryExW] |\n", 440 | "|c:/Windows/System32/nlasvc.dll |[operator(), LoadLibraryExW] |\n", 441 | "|c:/Windows/System32/lsasrv.dll |[LsarQueryInformationPolicy, LoadLibraryExW]|\n", 442 | "|C:/Windows/System32/tellib.dll |[operator(), LoadLibraryExW] |\n", 443 | "|C:/Windows/System32/tellib.dll |[operator(), LoadLibraryExW] |\n", 444 | "|C:/Windows/System32/debugregsvc.dll |[s_MergeEtlFiles, LoadLibraryExW] |\n", 445 | "|c:/Windows/System32/samsrv.dll |[SamrCloseHandle, LoadLibraryExW] |\n", 446 | "|C:/Windows/System32/appmgmts.dll |[GetManagedApps, LoadLibraryExW] |\n", 447 | "|C:/Windows/System32/debugregsvc.dll |[s_MergeEtlFiles, LoadLibraryExW] |\n", 448 | "|C:/Windows/System32/WaaSMedicAgent.exe|[LoadPluginLibrary, LoadLibraryExW] |\n", 449 | "+--------------------------------------+--------------------------------------------+\n", 450 | "only showing top 10 rows\n", 451 | "\n", 452 | "CPU times: user 2.73 ms, sys: 1.58 ms, total: 4.31 ms\n", 453 | "Wall time: 13.5 s\n" 454 | ] 455 | } 456 | ], 457 | "source": [ 458 | "%%time\n", 459 | "loadLibraryBFS.select(\"from.Module\", \"e0\").show(10,truncate=False)" 460 | ] 461 | }, 462 | { 463 | "cell_type": "code", 464 | "execution_count": null, 465 | "metadata": {}, 466 | "outputs": [], 467 | "source": [] 468 | } 469 | ], 470 | "metadata": { 471 | "kernelspec": { 472 | "display_name": "PySpark_Python3", 473 | "language": "python", 474 | "name": "pyspark3" 475 | }, 476 | "language_info": { 477 | "codemirror_mode": { 478 | "name": "ipython", 479 | "version": 3 480 | }, 481 | "file_extension": ".py", 482 | "mimetype": "text/x-python", 483 | "name": "python", 484 | "nbconvert_exporter": "python", 485 | "pygments_lexer": "ipython3", 486 | "version": "3.7.6" 487 | } 488 | }, 489 | "nbformat": 4, 490 | "nbformat_minor": 4 491 | } 492 | -------------------------------------------------------------------------------- /docs/use-cases/data-analysis/intro.md: -------------------------------------------------------------------------------- 1 | # Data Analysis 2 | 3 | Notebooks shared by the community for data analysis. 
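4 | 
5 | As a minimal sketch of the pattern most of these notebooks follow, assuming a BloodHound/SharpHound dataset already loaded into Neo4j (the endpoint and credentials below are placeholders), a Cypher query can be pulled straight into a Pandas DataFrame with py2neo:
6 | 
7 | ```python
8 | from py2neo import Graph
9 | 
10 | # Hypothetical Neo4j endpoint and credentials; point these at your own BloodHound database
11 | g = Graph("bolt://localhost:7687", auth=("neo4j", "BloodHound"))
12 | 
13 | # Return every user that SharpHound flagged with a Service Principal Name
14 | df = g.run("MATCH (u:User {hasspn:true}) RETURN u.name").to_data_frame()
15 | print(df)
16 | ```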
--------------------------------------------------------------------------------
/docs/use-cases/data-connectors/elasticsearch.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Elasticsearch\n",
  8 |     "\n",
  9 |     "* **Author:** Roberto Rodriguez (@Cyb3rWard0g)\n",
 10 |     "* **Notes**: Download this notebook and use it to connect to your own Elasticsearch database. The BinderHub project might not allow direct connections to external entities on port 9200.\n",
 11 |     "* **References:**\n",
 12 |     "    * https://medium.com/threat-hunters-forge/jupyter-notebooks-from-sigma-rules-%EF%B8%8F-to-query-elasticsearch-31a74cc59b99\n",
 13 |     "    * https://github.com/target/huntlib"
 14 |    ]
 15 |   },
 16 |   {
 17 |    "cell_type": "markdown",
 18 |    "metadata": {},
 19 |    "source": [
 20 |     "## Using Elasticsearch DSL\n",
 21 |     "\n",
 22 |     "Pre-requisites:\n",
 23 |     "\n",
 24 |     "* pip install elasticsearch\n",
 25 |     "* pip install pandas\n",
 26 |     "* pip install elasticsearch-dsl"
 27 |    ]
 28 |   },
 29 |   {
 30 |    "cell_type": "markdown",
 31 |    "metadata": {},
 32 |    "source": [
 33 |     "### Import Libraries"
 34 |    ]
 35 |   },
 36 |   {
 37 |    "cell_type": "code",
 38 |    "execution_count": null,
 39 |    "metadata": {},
 40 |    "outputs": [],
 41 |    "source": [
 42 |     "from elasticsearch import Elasticsearch\n",
 43 |     "from elasticsearch_dsl import Search\n",
 44 |     "import pandas as pd"
 45 |    ]
 46 |   },
 47 |   {
 48 |    "cell_type": "markdown",
 49 |    "metadata": {},
 50 |    "source": [
 51 |     "### Initialize an Elasticsearch client\n",
 52 |     "\n",
 53 |     "Initialize an Elasticsearch client using a specific Elasticsearch URL. Next, you can pass the client to the Search object that will represent the search request."
 54 |    ]
 55 |   },
 56 |   {
 57 |    "cell_type": "code",
 58 |    "execution_count": null,
 59 |    "metadata": {},
 60 |    "outputs": [],
 61 |    "source": [
 62 |     "es = Elasticsearch(['http://:9200'])\n",
 63 |     "searchContext = Search(using=es, index='logs-*', doc_type='doc')"
 64 |    ]
 65 |   },
 66 |   {
 67 |    "cell_type": "markdown",
 68 |    "metadata": {},
 69 |    "source": [
 70 |     "### Set the Query Search Context\n",
 71 |     "\n",
 72 |     "In addition, we will need to use the query method to pass an Elasticsearch query_string. For example, what if I want to query event_id 1 events?"
 73 |    ]
 74 |   },
 75 |   {
 76 |    "cell_type": "code",
 77 |    "execution_count": null,
 78 |    "metadata": {},
 79 |    "outputs": [],
 80 |    "source": [
 81 |     "s = searchContext.query('query_string', query='event_id:1')"
 82 |    ]
 83 |   },
 84 |   {
 85 |    "cell_type": "markdown",
 86 |    "metadata": {},
 87 |    "source": [
 88 |     "### Run Query & Explore Response\n",
 89 |     "\n",
 90 |     "Finally, you can run the query and get the results back as a DataFrame."
 91 |    ]
 92 |   },
 93 |   {
 94 |    "cell_type": "code",
 95 |    "execution_count": null,
 96 |    "metadata": {},
 97 |    "outputs": [],
 98 |    "source": [
 99 |     "response = s.execute()\n",
100 |     "\n",
101 |     "if response.success():\n",
102 |     "    df = pd.DataFrame((d.to_dict() for d in s.scan()))\n",
103 |     "\n",
104 |     "df"
105 |    ]
106 |   },
107 |   {
108 |    "cell_type": "markdown",
109 |    "metadata": {},
110 |    "source": [
111 |     "## Using HuntLib (@DavidJBianco)\n",
112 |     "\n",
113 |     "Pre-requisites:\n",
114 |     "\n",
115 |     "* pip install huntlib"
116 |    ]
117 |   },
118 |   {
119 |    "cell_type": "markdown",
120 |    "metadata": {},
121 |    "source": [
122 |     "### Import Libraries"
123 |    ]
124 |   },
125 |   {
126 |    "cell_type": "code",
127 |    "execution_count": null,
128 |    "metadata": {},
129 |    "outputs": [],
130 |    "source": [
131 |     "from datetime import datetime, timedelta\n",
132 |     "\n",
133 |     "from huntlib.elastic import ElasticDF"
134 |    ]
135 |   },
136 |   {
137 |    "cell_type": "markdown",
138 |    "metadata": {},
139 |    "source": [
140 |     "### Create Connection\n",
141 |     "Create a plaintext connection to the Elastic server, with no authentication."
142 |    ]
143 |   },
144 |   {
145 |    "cell_type": "code",
146 |    "execution_count": null,
147 |    "metadata": {},
148 |    "outputs": [],
149 |    "source": [
150 |     "e = ElasticDF(\n",
151 |     "    url=\"http://localhost:9200\"\n",
152 |     ")"
153 |    ]
154 |   },
155 |   {
156 |    "cell_type": "markdown",
157 |    "metadata": {},
158 |    "source": [
159 |     "### Search ES\n",
160 |     "A more complex example, showing how to set the Elastic document type, use Python-style datetime objects to constrain the search to a certain time period, and a user-defined field against which to do the time comparisons. The result size will be limited to no more than 1500 entries."
161 |    ]
162 |   },
163 |   {
164 |    "cell_type": "code",
165 |    "execution_count": null,
166 |    "metadata": {},
167 |    "outputs": [],
168 |    "source": [
169 |     "df = e.search_df(\n",
170 |     "    lucene=\"item:5285 AND color:red\",\n",
171 |     "    index=\"myindex-*\",\n",
172 |     "    doctype=\"doc\", date_field=\"mydate\",\n",
173 |     "    start_time=datetime.now() - timedelta(days=8),\n",
174 |     "    end_time=datetime.now() - timedelta(days=6),\n",
175 |     "    limit=1500\n",
176 |     ")"
177 |    ]
178 |   }
179 |  ],
180 |  "metadata": {
181 |   "kernelspec": {
182 |    "display_name": "Python 3",
183 |    "language": "python",
184 |    "name": "python3"
185 |   },
186 |   "language_info": {
187 |    "codemirror_mode": {
188 |     "name": "ipython",
189 |     "version": 3
190 |    },
191 |    "file_extension": ".py",
192 |    "mimetype": "text/x-python",
193 |    "name": "python",
194 |    "nbconvert_exporter": "python",
195 |    "pygments_lexer": "ipython3",
196 |    "version": "3.7.3"
197 |   }
198 |  },
199 |  "nbformat": 4,
200 |  "nbformat_minor": 2
201 | }
--------------------------------------------------------------------------------
/docs/use-cases/data-connectors/intro.md:
--------------------------------------------------------------------------------
1 | # Data Connectors
2 | 
3 | Notebooks showing how to connect to data repositories where your security events are stored.
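4 | 
5 | A minimal sketch of what these connectors look like, using the huntlib package from the notebooks above; the Elasticsearch URL, index, and query are placeholders for your own environment:
6 | 
7 | ```python
8 | from datetime import datetime, timedelta
9 | 
10 | from huntlib.elastic import ElasticDF
11 | 
12 | # Hypothetical local server; replace with your own Elasticsearch URL
13 | e = ElasticDF(url="http://localhost:9200")
14 | 
15 | # Pull one day of matching events into a Pandas DataFrame
16 | df = e.search_df(
17 |     lucene="event_id:1",
18 |     index="logs-*",
19 |     start_time=datetime.now() - timedelta(days=1),
20 |     end_time=datetime.now()
21 | )
22 | ```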
--------------------------------------------------------------------------------
/docs/use-cases/data-connectors/splunk.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Splunk\n",
  8 |     "\n",
  9 |     "* **Author:** Roberto Rodriguez (@Cyb3rWard0g)\n",
 10 |     "* **Notes**: Download this notebook and use it to connect to your own Splunk instance. The BinderHub project might not allow direct connections to external entities on uncommon ports.\n",
 11 |     "* **References:**\n",
 12 |     "    * https://github.com/target/huntlib"
 13 |    ]
 14 |   },
 15 |   {
 16 |    "cell_type": "markdown",
 17 |    "metadata": {},
 18 |    "source": [
 19 |     "## Using HuntLib (@DavidJBianco)\n",
 20 |     "\n",
 21 |     "Pre-requisites:\n",
 22 |     "\n",
 23 |     "* pip install huntlib"
 24 |    ]
 25 |   },
 26 |   {
 27 |    "cell_type": "markdown",
 28 |    "metadata": {},
 29 |    "source": [
 30 |     "### Import Library"
 31 |    ]
 32 |   },
 33 |   {
 34 |    "cell_type": "code",
 35 |    "execution_count": null,
 36 |    "metadata": {},
 37 |    "outputs": [],
 38 |    "source": [
 39 |     "from huntlib.splunk import SplunkDF"
 40 |    ]
 41 |   },
 42 |   {
 43 |    "cell_type": "markdown",
 44 |    "metadata": {},
 45 |    "source": [
 46 |     "### Connect & Search"
 47 |    ]
 48 |   },
 49 |   {
 50 |    "cell_type": "code",
 51 |    "execution_count": null,
 52 |    "metadata": {},
 53 |    "outputs": [],
 54 |    "source": [
 55 |     "# Create the connection first; the host and credentials below are placeholders for your own Splunk server\n",
 56 |     "s = SplunkDF(host=\"splunk-server\", username=\"admin\", password=\"password\")\n",
 57 |     "\n",
 58 |     "df = s.search_df(\n",
 59 |     "    spl=\"search index=win_events EventCode=4688\",\n",
 60 |     "    start_time=\"-2d@d\",\n",
 61 |     "    end_time=\"@d\"\n",
 62 |     ")"
 63 |    ]
 64 |   },
 65 |   {
 66 |    "cell_type": "code",
 67 |    "execution_count": null,
 68 |    "metadata": {},
 69 |    "outputs": [],
 70 |    "source": []
 71 |   }
 72 |  ],
 73 |  "metadata": {
 74 |   "kernelspec": {
 75 |    "display_name": "Python 3",
 76 |    "language": "python",
 77 |    "name": "python3"
 78 |   },
 79 |   "language_info": {
 80 |    "codemirror_mode": {
 81 |     "name": "ipython",
 82 |     "version": 3
 83 |    },
 84 |    "file_extension": ".py",
 85 |    "mimetype": "text/x-python",
 86 |    "name": "python",
 87 |    "nbconvert_exporter": "python",
 88 |    "pygments_lexer": "ipython3",
 89 |    "version": "3.7.3"
 90 |   }
 91 |  },
 92 |  "nbformat": 4,
 93 |  "nbformat_minor": 2
 94 | }
--------------------------------------------------------------------------------
/docs/use-cases/data-visualizations/intro.md:
--------------------------------------------------------------------------------
1 | # Data Visualizations
2 | 
3 | Notebooks showing a range of methods for creating powerful and useful data visualizations.
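4 | 
5 | A minimal sketch of the simplest approach covered in these notebooks, Pandas' built-in plotting; the event counts below are made-up sample data:
6 | 
7 | ```python
8 | import pandas as pd
9 | import matplotlib.pyplot as plt
10 | 
11 | # Hypothetical sample data; in the notebooks this comes from real security event logs
12 | df = pd.DataFrame({"event_id": ["1", "3", "11"], "count": [120, 45, 8]})
13 | 
14 | # Pandas' built-in .plot() renders a bar chart straight from the DataFrame
15 | df.plot(kind="bar", x="event_id", y="count", legend=False, title="Events by ID")
16 | plt.show()
17 | ```
--------------------------------------------------------------------------------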