├── known_issues.md ├── deploy_snowflake.sh ├── Dockerfile ├── spark.ipynb ├── SQLAlchemy.ipynb ├── pyodbc.ipynb ├── Python.ipynb └── README.md /known_issues.md: -------------------------------------------------------------------------------- 1 | # Known issues / Change Log / Gotchas 2 | 3 | #### 2020-09-14 4 | - ZN: Tested the latest drivers as of this date and updated the Dockerfile. 5 | - ZN: Fixed line 26 to remove libpq-dev to prevent the following error: 6 | 7 | ```E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/p/postgresql-10/libpq5_10.12-0ubuntu0.18.04.1_amd64.deb 404 Not Found [IP: XXXXX 80] 8 | E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/p/postgresql-10/libpq-dev_10.12-0ubuntu0.18.04.1_amd64.deb 404 Not Found [IP: XXXXX 80] 9 | E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing? 10 | The command '/bin/sh -c apt-get install -y iodbc libiodbc2-dev libpq-dev libssl-dev' returned a non-zero code: 100 11 | ``` 12 | 13 | #### 2020-07-30: 14 | 15 | - ZN: Added a conda install of jupyterlab-plotly-extension to allow plotly visualizations when using JupyterLab mode. Note that this requires a restart of your web browser for the change to take effect. 16 | - ZN: Tested the latest drivers as of this date and updated the Dockerfile. 17 | 18 | 19 | #### 2020-04-20: 20 | 21 | - ZN: Fixed the Dockerfile to pull the pandas-optimized connector for Snowflake (snowflake-connector-python[pandas]). 22 | - ZN: Tested the latest drivers as of this date, including the latest Spark-optimized driver version 2.7, and updated the Dockerfile. 23 | 24 | #### 2020-02-14: 25 | 26 | - ZN: SnowSQL CLI version 1.2.4 has deployment issues, which will be fixed in 1.2.5. As a workaround, build the Docker image with 1.2.2; launching SnowSQL will auto-upgrade the client. 27 | - PG: To allow reads against GCP, add the following option to the Spark connector (a short PySpark sketch is included at the end of this log): 28 | ``` 29 | 'use_copy_unload':'false' 30 | ``` 31 | 32 | #### 2020-02-13: 33 | 34 | - ZN: The latest Jupyter Docker Stacks image, updated on 02/11, upgraded Spark to version 2.4.5, which breaks pyspark. Fixed the Dockerfile to pick up the last working version.
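Regarding the GCP read option noted above (2020-02-14): here is a minimal PySpark sketch of passing that option together with the usual connection options. The account, credentials, database, and table name are placeholders, and `spark` is assumed to be an existing SQLContext/SparkSession as in the bundled spark.ipynb.

```
# Minimal sketch (placeholder values): pass 'use_copy_unload':'false' with the other connector options
sfOptions = {'sfURL': 'xxxx.snowflakecomputing.com',
             'sfUser': 'xxxx',
             'sfPassword': 'xxxx',
             'sfDatabase': 'sales',
             'sfSchema': 'public',
             'sfWarehouse': 'xxxx',
             'use_copy_unload': 'false'}  # read via SELECT instead of COPY UNLOAD (needed for GCP accounts)

df = spark.read.format('net.snowflake.spark.snowflake') \
    .options(**sfOptions) \
    .option('dbtable', 'customer') \
    .load()
```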
35 | -------------------------------------------------------------------------------- /deploy_snowflake.sh: -------------------------------------------------------------------------------- 1 | # Script version v1.0 2 | # Author: Zohar Nissare-Houssen 3 | # E-mail: z.nissare-houssen@snowflake.com 4 | # 5 | # README: 6 | # - Please check the following URLs for the driver to pick up: 7 | # ODBC: https://sfc-repo.snowflakecomputing.com/odbc/linux/index.html 8 | # JDBC: https://repo1.maven.org/maven2/net/snowflake/snowflake-jdbc/ 9 | # Spark: https://repo1.maven.org/maven2/net/snowflake/spark-snowflake_2.11 10 | # Note: For Spark, the docker currently uses Spark 2.4 with Scala 2.11 11 | 12 | #!/bin/bash 13 | 14 | export odbc_version=${odbc_version:-2.19.16} 15 | export odbc_file=${odbc_file:-snowflake_linux_x8664_odbc-${odbc_version}.tgz} 16 | export jdbc_version=${jdbc_version:-3.9.2} 17 | export jdbc_file=${jdbc_file:-snowflake-jdbc-${jdbc_version}.jar} 18 | export scala_version=${scala_version:-2.11} 19 | export spark_version=${spark_version:-2.5.4-spark_2.4} 20 | export spark_file=${spark_file:-spark-snowflake_${scala_version}-${spark_version}.jar} 21 | export snowsql_version=${snowsql_version:-1.1.85} 22 | export bootstrap_version=`echo ${snowsql_version}|cut -c -3` 23 | export snowsql_file=${snowsql_file:-snowsql-${snowsql_version}-linux_x86_64.bash} 24 | cd / 25 | 26 | echo "Downloading odbc driver version" ${odbc_version} "..." 27 | curl -O https://sfc-repo.snowflakecomputing.com/odbc/linux/${odbc_version}/${odbc_file} 28 | 29 | echo "Downloading jdbc driver version" ${jdbc_version} "..." 30 | curl -O https://repo1.maven.org/maven2/net/snowflake/snowflake-jdbc/${jdbc_version}/${jdbc_file} 31 | 32 | echo "Downloading spark driver version" ${spark_version} "..." 33 | curl -O https://repo1.maven.org/maven2/net/snowflake/spark-snowflake_${scala_version}/${spark_version}/${spark_file} 34 | 35 | echo "Download SnowSQL client version" ${snowsql_version} "..." 
36 | curl -O https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/${bootstrap_version}/linux_x86_64/${snowsql_file} 37 | 38 | tar -xzvf ${odbc_file} 39 | ./snowflake_odbc/iodbc_setup.sh 40 | 41 | cp ${jdbc_file} /usr/local/spark/jars 42 | cp ${spark_file} /usr/local/spark/jars 43 | 44 | SNOWSQL_DEST=/usr/bin SNOWSQL_LOGIN_SHELL=/home/jovyan/.profile bash /${snowsql_file} 45 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | # - Please check the following URLs for the driver to pick up: 2 | # All drivers: https://docs.snowflake.net/manuals/release-notes/client-change-log.html#client-changes-by-version 3 | # ODBC: https://sfc-repo.snowflakecomputing.com/odbc/linux/index.html 4 | # JDBC: https://repo1.maven.org/maven2/net/snowflake/snowflake-jdbc/ 5 | # Spark: https://repo1.maven.org/maven2/net/snowflake/spark-snowflake_2.11 6 | # Note: For Spark, the docker currently uses Spark 2.4 with Scala 2.11 7 | # - Update line 29 with the correct levels to be deployed which executes deploy_snowflake.sh Script 8 | # 9 | # Questions: Zohar Nissare-Houssen - z.nissare-houssen@snowflake.com 10 | # 11 | 12 | #Start from the following core stack version 13 | FROM jupyter/all-spark-notebook:1c8073a927aa 14 | USER root 15 | RUN apt-get update && \ 16 | apt-get install -y apt-utils && \ 17 | apt-get install -y libssl-dev libffi-dev && \ 18 | apt-get install -y vim 19 | RUN sudo -u jovyan /opt/conda/bin/python -m pip install --upgrade pip 20 | RUN sudo -u jovyan /opt/conda/bin/python -m pip install --upgrade pyarrow 21 | RUN sudo -u jovyan /opt/conda/bin/python -m pip install --upgrade snowflake-connector-python[pandas] 22 | RUN sudo -u jovyan /opt/conda/bin/python -m pip install --upgrade snowflake-sqlalchemy 23 | RUN sudo -u jovyan /opt/conda/bin/python -m pip install --upgrade plotly 24 | RUN conda install pyodbc 25 | RUN conda install -c conda-forge jupyterlab-plotly-extension --yes 26 | RUN apt-get install -y iodbc libiodbc2-dev libssl-dev 27 | COPY ./deploy_snowflake.sh / 28 | RUN chmod +x /deploy_snowflake.sh 29 | RUN odbc_version=2.21.8 jdbc_version=3.12.10 spark_version=2.8.1-spark_2.4 snowsql_version=1.2.9 /deploy_snowflake.sh 30 | RUN mkdir /home/jovyan/samples 31 | COPY ./pyodbc.ipynb /home/jovyan/samples 32 | COPY ./Python.ipynb /home/jovyan/samples 33 | COPY ./spark.ipynb /home/jovyan/samples 34 | COPY ./SQLAlchemy.ipynb /home/jovyan/samples 35 | RUN chown -R jovyan:users /home/jovyan/samples 36 | RUN sudo -u jovyan /opt/conda/bin/jupyter trust /home/jovyan/samples/pyodbc.ipynb 37 | RUN sudo -u jovyan /opt/conda/bin/jupyter trust /home/jovyan/samples/Python.ipynb 38 | RUN sudo -u jovyan /opt/conda/bin/jupyter trust /home/jovyan/samples/spark.ipynb 39 | RUN sudo -u jovyan /opt/conda/bin/jupyter trust /home/jovyan/samples/SQLAlchemy.ipynb 40 | -------------------------------------------------------------------------------- /spark.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# part1\n", 10 | "from pyspark import SparkConf,SparkContext\n", 11 | "from pyspark.sql import SQLContext\n", 12 | "from pyspark.sql.types import*\n", 13 | "from pyspark.sql.functions import*\n", 14 | "from pyspark import SparkConf,SparkContext" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 2, 
20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "#sc.stop()\n", 24 | "sc = SparkContext(\"local\", \"Simple App\")\n", 25 | "spark = SQLContext(sc)\n", 26 | "spark_conf = SparkConf().setMaster('local').setAppName('DEMO40')\n", 27 | "spark._jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils.enablePushdownSession(spark._jvm.org.apache.spark.sql.SparkSession.builder().getOrCreate())\n", 28 | "sc._jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils.enablePushdownSession(sc._jvm.org.apache.spark.sql.SparkSession.builder().getOrCreate())\n" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 3, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "sfOptions={'sfURL':'xxxx.snowflakecomputing.com',\n", 38 | " 'sfUser':'xxxx',\n", 39 | " 'sfPassword': 'xxxx',\n", 40 | " 'sfDatabase':'sales',\n", 41 | " 'sfSchema':'public',\n", 42 | " 'sfRole':'xxxx',\n", 43 | " 'sfWarehouse':'xxxx'}\n", 44 | "\n", 45 | "sfSource='net.snowflake.spark.snowflake'" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 4, 51 | "metadata": {}, 52 | "outputs": [], 53 | "source": [ 54 | "df_nation=spark.read.format(sfSource) \\\n", 55 | " .options(**sfOptions) \\\n", 56 | " .option(\"dbtable\",\"nation\") \\\n", 57 | " .load()\n", 58 | "\n", 59 | "df_region=spark.read.format(sfSource) \\\n", 60 | " .options(**sfOptions) \\\n", 61 | " .option(\"dbtable\",\"region\") \\\n", 62 | " .load()\n", 63 | "\n", 64 | "df_cust=spark.read.format(sfSource) \\\n", 65 | " .options(**sfOptions) \\\n", 66 | " .option(\"dbtable\",\"customer\") \\\n", 67 | " .load()\n" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 5, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [ 76 | "df_loc = df_nation.join(df_region, df_nation['N_REGIONKEY'] == df_region['R_REGIONKEY']) \n" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 6, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "df_cl = df_loc.join(df_cust, df_loc['N_NATIONKEY'] == df_cust['C_NATIONKEY']) \\\n", 86 | " .filter(col('R_NAME') == 'AFRICA') \\\n", 87 | " .select('C_MKTSEGMENT') \\\n", 88 | " .groupBy('C_MKTSEGMENT').count()\n" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 7, 94 | "metadata": {}, 95 | "outputs": [ 96 | { 97 | "name": "stdout", 98 | "output_type": "stream", 99 | "text": [ 100 | "+------------+------+\n", 101 | "|C_MKTSEGMENT| count|\n", 102 | "+------------+------+\n", 103 | "| MACHINERY|601756|\n", 104 | "| AUTOMOBILE|602286|\n", 105 | "| FURNITURE|600983|\n", 106 | "| HOUSEHOLD|601355|\n", 107 | "| BUILDING|601798|\n", 108 | "+------------+------+\n", 109 | "\n", 110 | "CPU times: user 0 ns, sys: 10 ms, total: 10 ms\n", 111 | "Wall time: 10.5 s\n" 112 | ] 113 | } 114 | ], 115 | "source": [ 116 | "%%time\n", 117 | "df_cl.show()" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 8, 123 | "metadata": {}, 124 | "outputs": [ 125 | { 126 | "name": "stdout", 127 | "output_type": "stream", 128 | "text": [ 129 | "root\n", 130 | " |-- C_MKTSEGMENT: string (nullable = true)\n", 131 | " |-- count: long (nullable = false)\n", 132 | "\n" 133 | ] 134 | } 135 | ], 136 | "source": [ 137 | "df_cl.printSchema()" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 9, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "\n", 147 | "df_cl_single = df_loc.join(df_cust, df_loc['N_NATIONKEY'] == df_cust['C_NATIONKEY']) \\\n", 148 | " .filter(df_cust['C_CUSTKEY'] == 
'123456') \\\n", 149 | " .select('C_MKTSEGMENT') \\\n", 150 | " .groupBy('C_MKTSEGMENT').count()\n" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 10, 156 | "metadata": {}, 157 | "outputs": [ 158 | { 159 | "name": "stdout", 160 | "output_type": "stream", 161 | "text": [ 162 | "+------------+-----+\n", 163 | "|C_MKTSEGMENT|count|\n", 164 | "+------------+-----+\n", 165 | "| AUTOMOBILE| 1|\n", 166 | "+------------+-----+\n", 167 | "\n", 168 | "CPU times: user 0 ns, sys: 0 ns, total: 0 ns\n", 169 | "Wall time: 5.96 s\n" 170 | ] 171 | } 172 | ], 173 | "source": [ 174 | "%%time\n", 175 | "df_cl_single.show()" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "metadata": {}, 182 | "outputs": [], 183 | "source": [] 184 | } 185 | ], 186 | "metadata": { 187 | "kernelspec": { 188 | "display_name": "Python 3", 189 | "language": "python", 190 | "name": "python3" 191 | }, 192 | "language_info": { 193 | "codemirror_mode": { 194 | "name": "ipython", 195 | "version": 3 196 | }, 197 | "file_extension": ".py", 198 | "mimetype": "text/x-python", 199 | "name": "python", 200 | "nbconvert_exporter": "python", 201 | "pygments_lexer": "ipython3", 202 | "version": "3.7.3" 203 | } 204 | }, 205 | "nbformat": 4, 206 | "nbformat_minor": 2 207 | } 208 | -------------------------------------------------------------------------------- /SQLAlchemy.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# Import the appropriate packages & modules\n", 10 | "import snowflake.connector\n", 11 | "from snowflake.connector.converter_null import SnowflakeNoConverterToPython\n", 12 | "import pandas as pd \n", 13 | "from sqlalchemy import create_engine\n", 14 | "from snowflake.sqlalchemy import URL\n", 15 | "import sqlalchemy as sa" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 2, 21 | "metadata": {}, 22 | "outputs": [], 23 | "source": [ 24 | "# Set some variables for the account, user & Password\n", 25 | "# Modify this section to match your demo account\n", 26 | "# and create an 'engine' for the Snowflake connection\n", 27 | "ACCOUNT = 'xxxx'\n", 28 | "USER = 'xxxx'\n", 29 | "PASSWORD = 'xxxx'\n", 30 | "\n", 31 | "engine = create_engine(URL(\n", 32 | " account = ACCOUNT,\n", 33 | " user = USER,\n", 34 | " password = PASSWORD,\n", 35 | " database = 'sample',\n", 36 | " schema = 'public',\n", 37 | " warehouse = 'xxxx',\n", 38 | " role='xxxx',\n", 39 | "))\n", 40 | "\n", 41 | "sql = \"select * from sales.public.customer limit 1000\"" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 3, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "# Use Pandas dataframe method read_sql_query to execute SQL in SQL Alchemy \n", 51 | "#%%time\n", 52 | "df = pd.read_sql_query(sql, engine)" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 4, 58 | "metadata": {}, 59 | "outputs": [ 60 | { 61 | "name": "stdout", 62 | "output_type": "stream", 63 | "text": [ 64 | "\n", 65 | "RangeIndex: 1000 entries, 0 to 999\n", 66 | "Data columns (total 8 columns):\n", 67 | "c_custkey 1000 non-null int64\n", 68 | "c_name 1000 non-null object\n", 69 | "c_address 1000 non-null object\n", 70 | "c_nationkey 1000 non-null int64\n", 71 | "c_phone 1000 non-null object\n", 72 | "c_acctbal 1000 non-null float64\n", 73 | "c_mktsegment 1000 non-null object\n", 74 
| "c_comment 1000 non-null object\n", 75 | "dtypes: float64(1), int64(2), object(5)\n", 76 | "memory usage: 62.6+ KB\n" 77 | ] 78 | } 79 | ], 80 | "source": [ 81 | "df.info()" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 5, 87 | "metadata": {}, 88 | "outputs": [ 89 | { 90 | "data": { 91 | "text/plain": [ 92 | "c_custkey False\n", 93 | "c_name False\n", 94 | "c_address False\n", 95 | "c_nationkey False\n", 96 | "c_phone False\n", 97 | "c_acctbal False\n", 98 | "c_mktsegment False\n", 99 | "c_comment False\n", 100 | "dtype: bool" 101 | ] 102 | }, 103 | "execution_count": 5, 104 | "metadata": {}, 105 | "output_type": "execute_result" 106 | } 107 | ], 108 | "source": [ 109 | "pd.isnull(df).any()" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 6, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "data": { 119 | "text/html": [ 120 | "
\n", 121 | "\n", 134 | "\n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | "
c_custkey
c_mktsegment
AUTOMOBILE198
BUILDING195
FURNITURE187
HOUSEHOLD207
MACHINERY213
\n", 168 | "
" 169 | ], 170 | "text/plain": [ 171 | " c_custkey\n", 172 | "c_mktsegment \n", 173 | "AUTOMOBILE 198\n", 174 | "BUILDING 195\n", 175 | "FURNITURE 187\n", 176 | "HOUSEHOLD 207\n", 177 | "MACHINERY 213" 178 | ] 179 | }, 180 | "execution_count": 6, 181 | "metadata": {}, 182 | "output_type": "execute_result" 183 | } 184 | ], 185 | "source": [ 186 | "df.groupby('c_mktsegment')[['c_custkey']].count()" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": null, 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [] 195 | } 196 | ], 197 | "metadata": { 198 | "kernelspec": { 199 | "display_name": "Python 3", 200 | "language": "python", 201 | "name": "python3" 202 | }, 203 | "language_info": { 204 | "codemirror_mode": { 205 | "name": "ipython", 206 | "version": 3 207 | }, 208 | "file_extension": ".py", 209 | "mimetype": "text/x-python", 210 | "name": "python", 211 | "nbconvert_exporter": "python", 212 | "pygments_lexer": "ipython3", 213 | "version": "3.7.3" 214 | } 215 | }, 216 | "nbformat": 4, 217 | "nbformat_minor": 2 218 | } 219 | -------------------------------------------------------------------------------- /pyodbc.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# Import the necessary modules\n", 10 | "import pyodbc\n", 11 | "import pandas as pd\n" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 2, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "# Define the connection and specify a query to run\n", 21 | "# Change the values of server & PWD & warehouse to match your demo environment\n", 22 | "cn_str = '''Driver={SnowflakeDSIIDriver};\n", 23 | " server=XXXX.snowflakecomputing.com;\n", 24 | " database=sales;\n", 25 | " warehouse=xxxx;\n", 26 | " UID=xxxx;\n", 27 | " PWD=xxxx'''\n", 28 | "cn = pyodbc.connect(cn_str)\n", 29 | "cn.setencoding('utf-8')\n", 30 | "\n", 31 | "sql = \"select * from sales.public.customer limit 10000\"" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 3, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "# Run the SQL and store the results in the res object\n", 41 | "#%%time\n", 42 | "res = cn.execute(sql).fetchall()" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 4, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "# Run the SQL And store the results in the df object\n", 52 | "#%%time\n", 53 | "df = pd.read_sql(sql, cn)" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 5, 59 | "metadata": {}, 60 | "outputs": [ 61 | { 62 | "name": "stdout", 63 | "output_type": "stream", 64 | "text": [ 65 | "\n", 66 | "RangeIndex: 10000 entries, 0 to 9999\n", 67 | "Data columns (total 8 columns):\n", 68 | "C\u0000_\u0000C\u0000U\u0000S 10000 non-null float64\n", 69 | "C\u0000_\u0000N\u0000 10000 non-null object\n", 70 | "C\u0000_\u0000A\u0000D\u0000D 10000 non-null object\n", 71 | "C\u0000_\u0000N\u0000A\u0000T\u0000I 10000 non-null float64\n", 72 | "C\u0000_\u0000P\u0000H 10000 non-null object\n", 73 | "C\u0000_\u0000A\u0000C\u0000C 10000 non-null float64\n", 74 | "C\u0000_\u0000M\u0000K\u0000T\u0000S\u0000 10000 non-null object\n", 75 | "C\u0000_\u0000C\u0000O\u0000M 10000 non-null object\n", 76 | "dtypes: float64(3), object(5)\n", 77 | "memory usage: 625.1+ KB\n" 78 | ] 79 | } 80 | ], 81 | "source": [ 82 | "df.info()" 83 | ] 84 | }, 85 | { 86 | 
"cell_type": "code", 87 | "execution_count": 6, 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "data": { 92 | "text/plain": [ 93 | "C\u0000_\u0000C\u0000U\u0000S False\n", 94 | "C\u0000_\u0000N\u0000 False\n", 95 | "C\u0000_\u0000A\u0000D\u0000D False\n", 96 | "C\u0000_\u0000N\u0000A\u0000T\u0000I False\n", 97 | "C\u0000_\u0000P\u0000H False\n", 98 | "C\u0000_\u0000A\u0000C\u0000C False\n", 99 | "C\u0000_\u0000M\u0000K\u0000T\u0000S\u0000 False\n", 100 | "C\u0000_\u0000C\u0000O\u0000M False\n", 101 | "dtype: bool" 102 | ] 103 | }, 104 | "execution_count": 6, 105 | "metadata": {}, 106 | "output_type": "execute_result" 107 | } 108 | ], 109 | "source": [ 110 | "pd.isnull(df).any()" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 7, 116 | "metadata": {}, 117 | "outputs": [ 118 | { 119 | "data": { 120 | "text/html": [ 121 | "
\n", 122 | "\n", 135 | "\n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | "
C\u0000_\u0000C\u0000U\u0000SC\u0000_\u0000N\u0000A\u0000T\u0000IC\u0000_\u0000A\u0000C\u0000C
count1.000000e+0410000.0000010000.000000
mean4.630452e+0611.960704505.595198
std1.993393e+067.234793155.296896
min5.000010e+050.00000-999.940000
25%4.950709e+066.000001802.407500
50%5.351160e+0612.000004445.675000
75%5.800076e+0618.000007248.287500
max8.028928e+0624.000009999.590000
\n", 195 | "
" 196 | ], 197 | "text/plain": [ 198 | " C\u0000_\u0000C\u0000U\u0000S C\u0000_\u0000N\u0000A\u0000T\u0000I C\u0000_\u0000A\u0000C\u0000C\n", 199 | "count 1.000000e+04 10000.00000 10000.000000\n", 200 | "mean 4.630452e+06 11.96070 4505.595198\n", 201 | "std 1.993393e+06 7.23479 3155.296896\n", 202 | "min 5.000010e+05 0.00000 -999.940000\n", 203 | "25% 4.950709e+06 6.00000 1802.407500\n", 204 | "50% 5.351160e+06 12.00000 4445.675000\n", 205 | "75% 5.800076e+06 18.00000 7248.287500\n", 206 | "max 8.028928e+06 24.00000 9999.590000" 207 | ] 208 | }, 209 | "execution_count": 7, 210 | "metadata": {}, 211 | "output_type": "execute_result" 212 | } 213 | ], 214 | "source": [ 215 | "df.describe()" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [] 224 | } 225 | ], 226 | "metadata": { 227 | "kernelspec": { 228 | "display_name": "Python 3", 229 | "language": "python", 230 | "name": "python3" 231 | }, 232 | "language_info": { 233 | "codemirror_mode": { 234 | "name": "ipython", 235 | "version": 3 236 | }, 237 | "file_extension": ".py", 238 | "mimetype": "text/x-python", 239 | "name": "python", 240 | "nbconvert_exporter": "python", 241 | "pygments_lexer": "ipython3", 242 | "version": "3.7.3" 243 | } 244 | }, 245 | "nbformat": 4, 246 | "nbformat_minor": 2 247 | } 248 | -------------------------------------------------------------------------------- /Python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# Import the various modules required to make a simple Snowflake connection from Python\n", 10 | "import snowflake.connector\n", 11 | "from snowflake.connector.converter_null import SnowflakeNoConverterToPython\n", 12 | "import pandas as pd" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 2, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "# Modify this cell to include information about your demo account\n", 22 | "ACCOUNT = 'xxxx'\n", 23 | "USER = 'xxxx'\n", 24 | "PASSWORD = 'xxxx'\n", 25 | "\n", 26 | "con = snowflake.connector.connect(\n", 27 | " user=USER,\n", 28 | " password=PASSWORD,\n", 29 | " account=ACCOUNT\n", 30 | " ,converter_class=SnowflakeNoConverterToPython\n", 31 | ")" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 3, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "# Create a variable called sql and specify a query that it will store\n", 41 | "sql = \"select * from sales.public.customer limit 10000\"" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 4, 47 | "metadata": {}, 48 | "outputs": [ 49 | { 50 | "data": { 51 | "text/plain": [ 52 | "" 53 | ] 54 | }, 55 | "execution_count": 4, 56 | "metadata": {}, 57 | "output_type": "execute_result" 58 | } 59 | ], 60 | "source": [ 61 | "# Specify the virtual warehouse and role we want to use\n", 62 | "con.cursor().execute(\"USE WAREHOUSE xxxx\")\n", 63 | "con.cursor().execute(\"USE role xxxx\")" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 5, 69 | "metadata": {}, 70 | "outputs": [], 71 | "source": [ 72 | "# Execute the query using the Python connector\n", 73 | "#%%time\n", 74 | "res = con.cursor().execute(sql).fetchall()\n" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 6, 80 | "metadata": {}, 81 | "outputs": [ 82 | { 83 | "name": 
"stdout", 84 | "output_type": "stream", 85 | "text": [ 86 | "\n", 87 | "RangeIndex: 10000 entries, 0 to 9999\n", 88 | "Data columns (total 8 columns):\n", 89 | "C_CUSTKEY 10000 non-null object\n", 90 | "C_NAME 10000 non-null object\n", 91 | "C_ADDRESS 10000 non-null object\n", 92 | "C_NATIONKEY 10000 non-null object\n", 93 | "C_PHONE 10000 non-null object\n", 94 | "C_ACCTBAL 10000 non-null object\n", 95 | "C_MKTSEGMENT 10000 non-null object\n", 96 | "C_COMMENT 10000 non-null object\n", 97 | "dtypes: object(8)\n", 98 | "memory usage: 625.1+ KB\n" 99 | ] 100 | } 101 | ], 102 | "source": [ 103 | "# Run that same query, but this time use the read_sql method\n", 104 | "# in the Pandas data frame object\n", 105 | "#%%time\n", 106 | "df = pd.read_sql(sql, con)\n", 107 | "df.info()\n" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 7, 113 | "metadata": {}, 114 | "outputs": [ 115 | { 116 | "data": { 117 | "text/html": [ 118 | "
\n", 119 | "\n", 132 | "\n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | "
C_CUSTKEY
C_MKTSEGMENT
AUTOMOBILE2043
BUILDING1938
FURNITURE2060
HOUSEHOLD1989
MACHINERY1970
\n", 166 | "
" 167 | ], 168 | "text/plain": [ 169 | " C_CUSTKEY\n", 170 | "C_MKTSEGMENT \n", 171 | "AUTOMOBILE 2043\n", 172 | "BUILDING 1938\n", 173 | "FURNITURE 2060\n", 174 | "HOUSEHOLD 1989\n", 175 | "MACHINERY 1970" 176 | ] 177 | }, 178 | "execution_count": 7, 179 | "metadata": {}, 180 | "output_type": "execute_result" 181 | } 182 | ], 183 | "source": [ 184 | "# Get a count of distinct customers by market segment\n", 185 | "df.groupby('C_MKTSEGMENT')[['C_CUSTKEY']].count()" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 8, 191 | "metadata": {}, 192 | "outputs": [ 193 | { 194 | "data": { 195 | "text/plain": [ 196 | "C_CUSTKEY False\n", 197 | "C_NAME False\n", 198 | "C_ADDRESS False\n", 199 | "C_NATIONKEY False\n", 200 | "C_PHONE False\n", 201 | "C_ACCTBAL False\n", 202 | "C_MKTSEGMENT False\n", 203 | "C_COMMENT False\n", 204 | "dtype: bool" 205 | ] 206 | }, 207 | "execution_count": 8, 208 | "metadata": {}, 209 | "output_type": "execute_result" 210 | } 211 | ], 212 | "source": [ 213 | "# Check to see if any of the columns have null values\n", 214 | "pd.isnull(df).any()" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 9, 220 | "metadata": {}, 221 | "outputs": [ 222 | { 223 | "data": { 224 | "text/plain": [ 225 | "list" 226 | ] 227 | }, 228 | "execution_count": 9, 229 | "metadata": {}, 230 | "output_type": "execute_result" 231 | } 232 | ], 233 | "source": [ 234 | "type(res)" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 10, 240 | "metadata": {}, 241 | "outputs": [ 242 | { 243 | "name": "stdout", 244 | "output_type": "stream", 245 | "text": [ 246 | "('5050001', 'Customer#005050001', 'h2Q2lfB QpSuOt32ZDV7S8RsTKgedv4w9s9wa', '18', '28-680-716-8960', '4571.61', 'AUTOMOBILE', 'e thinly bold ideas. 
carefully final pinto beans cajole across')\n" 247 | ] 248 | } 249 | ], 250 | "source": [ 251 | "print (res[0])" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": 11, 257 | "metadata": {}, 258 | "outputs": [ 259 | { 260 | "name": "stdout", 261 | "output_type": "stream", 262 | "text": [ 263 | "AUTOMOBILE has occurred 1974 times\n", 264 | "BUILDING has occurred 1964 times\n", 265 | "MACHINERY has occurred 1989 times\n", 266 | "HOUSEHOLD has occurred 2025 times\n", 267 | "FURNITURE has occurred 2048 times\n" 268 | ] 269 | } 270 | ], 271 | "source": [ 272 | "unique_cust_key = []\n", 273 | "z = []\n", 274 | "for x in res:\n", 275 | "    z.append((x[0],x[6]))\n", 276 | "\n", 277 | "for x in z:\n", 278 | "    if x not in unique_cust_key:\n", 279 | "        unique_cust_key.append(x)\n", 280 | "        \n", 281 | "# initialize a null list \n", 282 | "unique_list = []\n", 283 | "\n", 284 | "# traverse for all elements \n", 285 | "for x in unique_cust_key:\n", 286 | "    # check if exists in unique_list or not \n", 287 | "    if x[1] not in unique_list:\n", 288 | "        unique_list.append(x[1])\n", 289 | "        \n", 290 | "def countX(lst, x):\n", 291 | "    count = 0\n", 292 | "    for y in lst:\n", 293 | "        if (y[1] == x):\n", 294 | "            count = count + 1\n", 295 | "    return count\n", 296 | "\n", 297 | "for a in unique_list:\n", 298 | "    print('{} has occurred {} times'.format(a, countX(unique_cust_key, a))) \n", 299 | "    \n" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": null, 305 | "metadata": {}, 306 | "outputs": [], 307 | "source": [] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": null, 312 | "metadata": {}, 313 | "outputs": [], 314 | "source": [] 315 | } 316 | ], 317 | "metadata": { 318 | "kernelspec": { 319 | "display_name": "Python 3", 320 | "language": "python", 321 | "name": "python3" 322 | }, 323 | "language_info": { 324 | "codemirror_mode": { 325 | "name": "ipython", 326 | "version": 3 327 | }, 328 | "file_extension": ".py", 329 | "mimetype": "text/x-python", 330 | "name": "python", 331 | "nbconvert_exporter": "python", 332 | "pygments_lexer": "ipython3", 333 | "version": "3.7.3" 334 | } 335 | }, 336 | "nbformat": 4, 337 | "nbformat_minor": 2 338 | } 339 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # THIS REPOSITORY IS DEPRECATED: PLEASE USE THE [SNOWTIRE_V2 REPO](https://github.com/zoharsan/snowtire_v2) 2 | 3 | # Introduction 4 | 5 | Snowtire is a Docker image that provides Snowflake users with a turnkey environment, pre-configured with the Snowflake drivers of the versions of your choice and a comprehensive data science environment including Jupyter Notebooks, Python, Spark, and R, so you can experiment with the various Snowflake connectors available: 6 | 7 | - ODBC 8 | - JDBC 9 | - Python Connector 10 | - Spark Connector 11 | - SnowSQL Client. 12 | 13 | The SQLAlchemy Python package is also installed as part of this Docker image. 14 | 15 | The base docker image is [Jupyter Docker Stacks](https://github.com/jupyter/docker-stacks). More specifically, the image used is [jupyter/all-spark-notebook](https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-all-spark-notebook), which provides a comprehensive Jupyter environment including R, SciPy, PySpark, and Scala. 16 | 17 | Please review the [licensing terms](https://raw.githubusercontent.com/jupyter/docker-stacks/master/LICENSE.md) of the above-mentioned project.
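For example, once the image is up and running (see the instructions below), the bundled Python connector can be exercised directly from a notebook cell. This is a minimal sketch with placeholder credentials; the sample notebooks copied into /home/jovyan/samples cover the same ground in more detail:

```
# Minimal connectivity check with the Snowflake Python connector.
# The 'xxxx' values are placeholders for your own account, user, and password.
import snowflake.connector

con = snowflake.connector.connect(account='xxxx', user='xxxx', password='xxxx')
print(con.cursor().execute("select current_version()").fetchone())
con.close()
```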
18 | 19 | **NOTE: Snowtire is not officially supported by Snowflake, and is provided as-is.** 20 | 21 | # Prerequisites 22 | 23 | - You need Git on your Mac or Windows machine. 24 | - You need to download and install [Docker Desktop for Mac](https://hub.docker.com/editions/community/docker-ce-desktop-mac) or [Docker Desktop for Windows](https://hub.docker.com/editions/community/docker-ce-desktop-windows). You may need to create an account on Docker to be able to download it. 25 | 26 | **NOTE FOR WINDOWS** 27 | 28 | On Windows, a common issue is Git's line-ending configuration, which adds default Windows CRLF line endings to the deploy_snowflake.sh script and causes it to fail. You can either configure [```core.autocrlf```](https://docs.github.com/en/free-pro-team@latest/github/using-git/configuring-git-to-handle-line-endings#refreshing-a-repository-after-changing-line-endings) to ```false```, or use an editor like Notepad++ to open the deploy_snowflake.sh file and save it in UNIX mode, which converts CRLF line endings into LF, before you build the snowtire docker image. 29 | 30 | 31 | # Instructions 32 | 33 | ## Download the repository 34 | 35 | Change directory to the location where you store your Docker images: 36 | 37 | ``` 38 | mkdir DockerImages 39 | cd DockerImages 40 | git clone https://github.com/zoharsan/snowtire.git 41 | cd snowtire 42 | ``` 43 | 44 | If you are just updating the repository to the latest version (always recommended before building a docker image), run the following command from within your local clone (under the snowtire directory): 45 | 46 | ``` 47 | git pull 48 | 49 | ``` 50 | 51 | ## Specify the driver levels 52 | 53 | First, check the latest clients available in the official [Snowflake documentation](https://docs.snowflake.net/manuals/release-notes/client-change-log.html#client-changes-by-version). 54 | 55 | Once you have chosen the versions, you can customize line 29 in the Dockerfile. For example: 56 | 57 | ``` 58 | RUN odbc_version=2.21.1 jdbc_version=3.12.3 spark_version=2.7.0-spark_2.4 snowsql_version=1.2.5 /deploy_snowflake.sh 59 | ``` 60 | 61 | **NOTE: SnowSQL CLI has the ability to [auto-upgrade](https://docs.snowflake.net/manuals/user-guide/snowsql-install-config.html#label-understanding-auto-upgrades) to the latest version available. So, you may not need to specify a higher version.** 62 | 63 | ## Build Snowtire docker image 64 | 65 | ``` 66 | docker build --pull -t snowtire . 67 | ``` 68 | You may get some warnings that are non-critical and/or expected. You can safely ignore them: 69 | ``` 70 | ... 71 | debconf: delaying package configuration, since apt-utils is not installed 72 | ... 73 | ==> WARNING: A newer version of conda exists. <== 74 | current version: 4.6.14 75 | latest version: 4.7.5 76 | 77 | Please update conda by running 78 | 79 | $ conda update -n base conda 80 | ... 81 | grep: /etc/odbcinst.ini: No such file or directory 82 | ... 83 | grep: /etc/odbc.ini: No such file or directory 84 | ``` 85 | 86 | You should see the following message at the very end: 87 | ``` 88 | Successfully tagged snowtire:latest 89 | ``` 90 | 91 | ## Running the image 92 | ``` 93 | docker run -p 8888:8888 --name spare-0 snowtire:latest 94 | ``` 95 | If port 8888 is already taken on your laptop and you want to use another port, you can simply change the port mapping.
For example, for port 9999, it would be: 96 | ``` 97 | docker run -p 9999:8888 --name spare-1 snowtire:latest 98 | ``` 99 | 100 | You should see a message like the following the very first time you bring up this image. Copy the token value from the URL: 101 | ``` 102 | [I 23:33:42.828 NotebookApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret 103 | [I 23:33:43.820 NotebookApp] JupyterLab extension loaded from /opt/conda/lib/python3.7/site-packages/jupyterlab 104 | [I 23:33:43.820 NotebookApp] JupyterLab application directory is /opt/conda/share/jupyter/lab 105 | [I 23:33:43.822 NotebookApp] Serving notebooks from local directory: /home/jovyan 106 | [I 23:33:43.822 NotebookApp] The Jupyter Notebook is running at: 107 | [I 23:33:43.822 NotebookApp] http://(a8e53cbad3a0 or 127.0.0.1):8888/?token=eb2222f1a8cd14046ecc5177d4b1b5965446e3c34b8f42ad 108 | [I 23:33:43.822 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation). 109 | [C 23:33:43.826 NotebookApp] 110 | 111 | To access the notebook, open this file in a browser: 112 | file:///home/jovyan/.local/share/jupyter/runtime/nbserver-17-open.html 113 | Or copy and paste one of these URLs: 114 | http://(a8e53cbad3a0 or 127.0.0.1):8888/?token=eb2222f1a8cd14046ecc5177d4b1b5965446e3c34b8f42ad 115 | ``` 116 | 117 | **Note:** If you are restarting the image and need to retrieve the token, you can do so as follows: 118 | 119 | - Retrieve its value from the docker logs: 120 | 121 | ``` 122 | docker logs spare-0 --tail 10 123 | ``` 124 | 125 | - Or open a bash session on the docker container: 126 | 127 | ``` 128 | docker exec -it spare-0 /bin/bash 129 | ``` 130 | 131 | Then run the following command: 132 | 133 | ``` 134 | jupyter notebook list 135 | ``` 136 | 137 | ## Accessing the image 138 | 139 | Open a web browser at: http://localhost:8888 140 | 141 | It will prompt you for a password or token. Enter the token from the previous message. 142 | 143 | ## Working with the image 144 | 145 | Snowtire comes with 4 small example Python notebooks that let you test various connectors, including ODBC, JDBC, and Spark. You will need to customize your Snowflake account name, your credentials (user/password), database name, and warehouse. 146 | 147 | You can always upload any demo notebook to the Jupyter environment from the main interface. See the Upload button at the top right: 148 | 149 | ![Image](https://github.com/zoharsan/snowflake-jupyter-extras/blob/master/Notebooks.png) 150 | 151 | These notebooks can work with the tpch_sf1 database, which is provided as a sample within any Snowflake environment. 152 | 153 | If you plan to develop new notebooks within the Docker environment, it is recommended to always keep a local copy of your work once you are done, so you do not lose it if the Docker container is discarded accidentally or otherwise corrupted. This can be done in the Jupyter menu: File->Download as. 154 | 155 | ### Stopping and starting the docker image 156 | 157 | Once finished, you can stop the image with the following command: 158 | ``` 159 | docker stop spare-0 160 | ``` 161 | If you want to resume work, you can start the image with the following command: 162 | ``` 163 | docker start spare-0 164 | ``` 165 | 166 | ### Additional handy commands 167 | 168 | - To delete the container. WARNING: If you do this, you will lose any notebook, and any work you have saved or done within the container.
169 | ``` 170 | docker rm spare-0 171 | ``` 172 | - To open a bash session on the docker container, which is useful for using the SnowSQL interface: 173 | ``` 174 | docker exec -it spare-0 /bin/bash 175 | ``` 176 | - To copy files into the docker container: 177 | ``` 178 | docker cp <file> spare-0:<path> 179 | Example: docker cp README.md spare-0:/ 180 | ``` 181 | - To list all docker containers available: 182 | ``` 183 | docker ps -a 184 | ``` 185 | - To list all docker images available: 186 | ``` 187 | docker image ls 188 | ``` 189 | - To delete a docker image: 190 | ``` 191 | docker image rm <image_id> 192 | ``` 193 | You can find out the image id in the previous list command. 194 | 195 | ### Known Issues & Troubleshooting 196 | 197 | --- 198 | #### Stability: Notebook hangs or crashes on large data sets #### 199 | 200 | Make sure you have enough memory allocated to your Docker workstation, at least 4 GB. On Mac: 201 | 202 | - Stop all your docker images (see the instructions above to stop/start docker images). 203 | - Click on the Docker icon on the top right-hand side of your Mac menu bar. 204 | - Select Preferences. 205 | - Select Resources. 206 | - Set CPUs to a minimum of 2. 207 | - Set Memory to 4 GB. 208 | - Click on Apply & Restart. 209 | 210 | --- 211 | #### Python Kernel Dying #### 212 | 213 | If the Python kernel dies while running a notebook and you want to troubleshoot the root cause, add these lines as the first cell of your notebook and execute it: 214 | ``` 215 | # Debugging 216 | 217 | import logging 218 | import os 219 | 220 | for logger_name in ['snowflake','botocore','azure']: 221 |     logger = logging.getLogger(logger_name) 222 |     logger.setLevel(logging.DEBUG) 223 |     ch = logging.FileHandler('python_connector.log') 224 |     ch.setLevel(logging.DEBUG) 225 |     ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s')) 226 |     logger.addHandler(ch) 227 | ``` 228 | This will generate a python_connector.log file in the directory where the notebook resides. Use the commands above to open a bash session in the container and examine the log. 229 | 230 | --- 231 | #### Building the Docker Image fails on Windows #### 232 | 233 | Building the docker image fails on Windows with the following errors: 234 | 235 | ``` 236 | Step 14/24 : RUN odbc_version=2.21.8 jdbc_version=3.12.10 spark_version=2.8.1-spark_2.4 snowsql_version=1.2.9 /deploy_snowflake.sh 237 | ---> Running in 0cfd230c3949 238 | : not foundwflake.sh: 11: /deploy_snowflake.sh: 239 | : not foundwflake.sh: 13: /deploy_snowflake.sh: 240 | /deploy_snowflake.sh: 24: cd: can't cd to / 241 | : not foundwflake.sh: 25: /deploy_snowflake.sh: 242 | ...loading odbc driver version 2.21.8 243 | curl: (3) URL using bad/illegal format or missing URL 244 | ...loading jdbc driver version 3.12.10 245 | : not foundwflake.sh: 28: /deploy_snowflake.sh: 246 | curl: (3) URL using bad/illegal format or missing URL 247 | ...loading spark driver version 2.8.1-spark_2.4 248 | : not foundwflake.sh: 31: /deploy_snowflake.sh: 249 | curl: (3) URL using bad/illegal format or missing URL 250 | ...load SnowSQL client version 1.2.9 251 | ... 252 | ``` 253 | This is caused by Windows CRLF line-ending characters added to the deploy_snowflake.sh script, which make it fail in the Linux Ubuntu container. Open deploy_snowflake.sh with an editor like Notepad++, save the file in UNIX mode (which converts CRLF to LF line endings), and rerun the docker build command.
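If a suitable editor is not at hand, the same conversion can be done from the command line before rebuilding. This is a sketch assuming a Unix-like shell (for example Git Bash or WSL) is available on the Windows machine:

```
# Strip the Windows CR characters from the script, then rebuild the image
sed -i 's/\r$//' deploy_snowflake.sh
docker build --pull -t snowtire .
```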
254 | 255 | 256 | --- 257 | #### Known Issues Log #### 258 | 259 | Please check the [known issues](known_issues.md) log for known issues with Snowtire. 260 | --------------------------------------------------------------------------------