├── known_issues.md ├── deploy_snowflake.sh ├── Dockerfile ├── spark.ipynb ├── SQLAlchemy.ipynb ├── pyodbc.ipynb ├── Python.ipynb └── README.md /known_issues.md: -------------------------------------------------------------------------------- 1 | # Known issues / Change Log / Gotchas 2 | 3 | #### 2020-09-14 4 | - ZN: Tested the latest drivers as of this date and updated the Dockerfile. 5 | - ZN: Fixed line 26 to remove libpq-dev to prevent the following error: 6 | 7 | ```E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/p/postgresql-10/libpq5_10.12-0ubuntu0.18.04.1_amd64.deb 404 Not Found [IP: XXXXX 80] 8 | E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/p/postgresql-10/libpq-dev_10.12-0ubuntu0.18.04.1_amd64.deb 404 Not Found [IP: XXXXX 80] 9 | E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing? 10 | The command '/bin/sh -c apt-get install -y iodbc libiodbc2-dev libpq-dev libssl-dev' returned a non-zero code: 100 11 | ``` 12 | 13 | #### 2020-07-30: 14 | 15 | - ZN: Added a conda install of jupyterlab-plotly-extension to allow plotly visualizations when using JupyterLab mode. Note that this requires a restart of your web browser for the change to take effect. 16 | - ZN: Tested the latest drivers as of this date and updated the Dockerfile. 17 | 18 | 19 | #### 2020-04-20: 20 | 21 | - ZN: Fixed the Dockerfile to pull the pandas-optimized connector for Snowflake (snowflake-connector-python[pandas]). 22 | - ZN: Tested the latest drivers as of this date, including the latest Spark-optimized driver version 2.7, and updated the Dockerfile. 23 | 24 | #### 2020-02-14: 25 | 26 | - ZN: SnowSQL CLI version 1.2.4 has deployment issues, which will be fixed in 1.2.5. As a workaround, build the Docker image with 1.2.2; launching SnowSQL will auto-upgrade the client. 27 | - PG: To allow reads against GCP, add the following option to the Spark connector (a short PySpark sketch is included at the end of this log): 28 | ``` 29 | 'use_copy_unload':'false' 30 | ``` 31 | 32 | #### 2020-02-13: 33 | 34 | - ZN: The latest Jupyter Docker Stacks image, updated on 02/11, upgraded Spark to version 2.4.5, which breaks pyspark. Fixed the Dockerfile to pick up the last working version.
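Regarding the GCP read option noted above (2020-02-14): here is a minimal PySpark sketch of passing that option together with the usual connection options. The account, credentials, database, and table name are placeholders, and `spark` is assumed to be an existing SQLContext/SparkSession as in the bundled spark.ipynb.

```
# Minimal sketch (placeholder values): pass 'use_copy_unload':'false' with the other connector options
sfOptions = {'sfURL': 'xxxx.snowflakecomputing.com',
             'sfUser': 'xxxx',
             'sfPassword': 'xxxx',
             'sfDatabase': 'sales',
             'sfSchema': 'public',
             'sfWarehouse': 'xxxx',
             'use_copy_unload': 'false'}  # read via SELECT instead of COPY UNLOAD (needed for GCP accounts)

df = spark.read.format('net.snowflake.spark.snowflake') \
    .options(**sfOptions) \
    .option('dbtable', 'customer') \
    .load()
```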
35 | -------------------------------------------------------------------------------- /deploy_snowflake.sh: -------------------------------------------------------------------------------- 1 | # Script version v1.0 2 | # Author: Zohar Nissare-Houssen 3 | # E-mail: z.nissare-houssen@snowflake.com 4 | # 5 | # README: 6 | # - Please check the following URLs for the driver to pick up: 7 | # ODBC: https://sfc-repo.snowflakecomputing.com/odbc/linux/index.html 8 | # JDBC: https://repo1.maven.org/maven2/net/snowflake/snowflake-jdbc/ 9 | # Spark: https://repo1.maven.org/maven2/net/snowflake/spark-snowflake_2.11 10 | # Note: For Spark, the docker currently uses Spark 2.4 with Scala 2.11 11 | 12 | #!/bin/bash 13 | 14 | export odbc_version=${odbc_version:-2.19.16} 15 | export odbc_file=${odbc_file:-snowflake_linux_x8664_odbc-${odbc_version}.tgz} 16 | export jdbc_version=${jdbc_version:-3.9.2} 17 | export jdbc_file=${jdbc_file:-snowflake-jdbc-${jdbc_version}.jar} 18 | export scala_version=${scala_version:-2.11} 19 | export spark_version=${spark_version:-2.5.4-spark_2.4} 20 | export spark_file=${spark_file:-spark-snowflake_${scala_version}-${spark_version}.jar} 21 | export snowsql_version=${snowsql_version:-1.1.85} 22 | export bootstrap_version=`echo ${snowsql_version}|cut -c -3` 23 | export snowsql_file=${snowsql_file:-snowsql-${snowsql_version}-linux_x86_64.bash} 24 | cd / 25 | 26 | echo "Downloading odbc driver version" ${odbc_version} "..." 27 | curl -O https://sfc-repo.snowflakecomputing.com/odbc/linux/${odbc_version}/${odbc_file} 28 | 29 | echo "Downloading jdbc driver version" ${jdbc_version} "..." 30 | curl -O https://repo1.maven.org/maven2/net/snowflake/snowflake-jdbc/${jdbc_version}/${jdbc_file} 31 | 32 | echo "Downloading spark driver version" ${spark_version} "..." 33 | curl -O https://repo1.maven.org/maven2/net/snowflake/spark-snowflake_${scala_version}/${spark_version}/${spark_file} 34 | 35 | echo "Download SnowSQL client version" ${snowsql_version} "..." 
36 | curl -O https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/${bootstrap_version}/linux_x86_64/${snowsql_file} 37 | 38 | tar -xzvf ${odbc_file} 39 | ./snowflake_odbc/iodbc_setup.sh 40 | 41 | cp ${jdbc_file} /usr/local/spark/jars 42 | cp ${spark_file} /usr/local/spark/jars 43 | 44 | SNOWSQL_DEST=/usr/bin SNOWSQL_LOGIN_SHELL=/home/jovyan/.profile bash /${snowsql_file} 45 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | # - Please check the following URLs for the driver to pick up: 2 | # All drivers: https://docs.snowflake.net/manuals/release-notes/client-change-log.html#client-changes-by-version 3 | # ODBC: https://sfc-repo.snowflakecomputing.com/odbc/linux/index.html 4 | # JDBC: https://repo1.maven.org/maven2/net/snowflake/snowflake-jdbc/ 5 | # Spark: https://repo1.maven.org/maven2/net/snowflake/spark-snowflake_2.11 6 | # Note: For Spark, the docker currently uses Spark 2.4 with Scala 2.11 7 | # - Update line 29 with the correct levels to be deployed which executes deploy_snowflake.sh Script 8 | # 9 | # Questions: Zohar Nissare-Houssen - z.nissare-houssen@snowflake.com 10 | # 11 | 12 | #Start from the following core stack version 13 | FROM jupyter/all-spark-notebook:1c8073a927aa 14 | USER root 15 | RUN apt-get update && \ 16 | apt-get install -y apt-utils && \ 17 | apt-get install -y libssl-dev libffi-dev && \ 18 | apt-get install -y vim 19 | RUN sudo -u jovyan /opt/conda/bin/python -m pip install --upgrade pip 20 | RUN sudo -u jovyan /opt/conda/bin/python -m pip install --upgrade pyarrow 21 | RUN sudo -u jovyan /opt/conda/bin/python -m pip install --upgrade snowflake-connector-python[pandas] 22 | RUN sudo -u jovyan /opt/conda/bin/python -m pip install --upgrade snowflake-sqlalchemy 23 | RUN sudo -u jovyan /opt/conda/bin/python -m pip install --upgrade plotly 24 | RUN conda install pyodbc 25 | RUN conda install -c conda-forge jupyterlab-plotly-extension --yes 26 | RUN apt-get install -y iodbc libiodbc2-dev libssl-dev 27 | COPY ./deploy_snowflake.sh / 28 | RUN chmod +x /deploy_snowflake.sh 29 | RUN odbc_version=2.21.8 jdbc_version=3.12.10 spark_version=2.8.1-spark_2.4 snowsql_version=1.2.9 /deploy_snowflake.sh 30 | RUN mkdir /home/jovyan/samples 31 | COPY ./pyodbc.ipynb /home/jovyan/samples 32 | COPY ./Python.ipynb /home/jovyan/samples 33 | COPY ./spark.ipynb /home/jovyan/samples 34 | COPY ./SQLAlchemy.ipynb /home/jovyan/samples 35 | RUN chown -R jovyan:users /home/jovyan/samples 36 | RUN sudo -u jovyan /opt/conda/bin/jupyter trust /home/jovyan/samples/pyodbc.ipynb 37 | RUN sudo -u jovyan /opt/conda/bin/jupyter trust /home/jovyan/samples/Python.ipynb 38 | RUN sudo -u jovyan /opt/conda/bin/jupyter trust /home/jovyan/samples/spark.ipynb 39 | RUN sudo -u jovyan /opt/conda/bin/jupyter trust /home/jovyan/samples/SQLAlchemy.ipynb 40 | -------------------------------------------------------------------------------- /spark.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# part1\n", 10 | "from pyspark import SparkConf,SparkContext\n", 11 | "from pyspark.sql import SQLContext\n", 12 | "from pyspark.sql.types import*\n", 13 | "from pyspark.sql.functions import*\n", 14 | "from pyspark import SparkConf,SparkContext" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 2, 
20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "#sc.stop()\n", 24 | "sc = SparkContext(\"local\", \"Simple App\")\n", 25 | "spark = SQLContext(sc)\n", 26 | "spark_conf = SparkConf().setMaster('local').setAppName('DEMO40')\n", 27 | "spark._jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils.enablePushdownSession(spark._jvm.org.apache.spark.sql.SparkSession.builder().getOrCreate())\n", 28 | "sc._jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils.enablePushdownSession(sc._jvm.org.apache.spark.sql.SparkSession.builder().getOrCreate())\n" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 3, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "sfOptions={'sfURL':'xxxx.snowflakecomputing.com',\n", 38 | " 'sfUser':'xxxx',\n", 39 | " 'sfPassword': 'xxxx',\n", 40 | " 'sfDatabase':'sales',\n", 41 | " 'sfSchema':'public',\n", 42 | " 'sfRole':'xxxx',\n", 43 | " 'sfWarehouse':'xxxx'}\n", 44 | "\n", 45 | "sfSource='net.snowflake.spark.snowflake'" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 4, 51 | "metadata": {}, 52 | "outputs": [], 53 | "source": [ 54 | "df_nation=spark.read.format(sfSource) \\\n", 55 | " .options(**sfOptions) \\\n", 56 | " .option(\"dbtable\",\"nation\") \\\n", 57 | " .load()\n", 58 | "\n", 59 | "df_region=spark.read.format(sfSource) \\\n", 60 | " .options(**sfOptions) \\\n", 61 | " .option(\"dbtable\",\"region\") \\\n", 62 | " .load()\n", 63 | "\n", 64 | "df_cust=spark.read.format(sfSource) \\\n", 65 | " .options(**sfOptions) \\\n", 66 | " .option(\"dbtable\",\"customer\") \\\n", 67 | " .load()\n" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 5, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [ 76 | "df_loc = df_nation.join(df_region, df_nation['N_REGIONKEY'] == df_region['R_REGIONKEY']) \n" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 6, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "df_cl = df_loc.join(df_cust, df_loc['N_NATIONKEY'] == df_cust['C_NATIONKEY']) \\\n", 86 | " .filter(col('R_NAME') == 'AFRICA') \\\n", 87 | " .select('C_MKTSEGMENT') \\\n", 88 | " .groupBy('C_MKTSEGMENT').count()\n" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 7, 94 | "metadata": {}, 95 | "outputs": [ 96 | { 97 | "name": "stdout", 98 | "output_type": "stream", 99 | "text": [ 100 | "+------------+------+\n", 101 | "|C_MKTSEGMENT| count|\n", 102 | "+------------+------+\n", 103 | "| MACHINERY|601756|\n", 104 | "| AUTOMOBILE|602286|\n", 105 | "| FURNITURE|600983|\n", 106 | "| HOUSEHOLD|601355|\n", 107 | "| BUILDING|601798|\n", 108 | "+------------+------+\n", 109 | "\n", 110 | "CPU times: user 0 ns, sys: 10 ms, total: 10 ms\n", 111 | "Wall time: 10.5 s\n" 112 | ] 113 | } 114 | ], 115 | "source": [ 116 | "%%time\n", 117 | "df_cl.show()" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 8, 123 | "metadata": {}, 124 | "outputs": [ 125 | { 126 | "name": "stdout", 127 | "output_type": "stream", 128 | "text": [ 129 | "root\n", 130 | " |-- C_MKTSEGMENT: string (nullable = true)\n", 131 | " |-- count: long (nullable = false)\n", 132 | "\n" 133 | ] 134 | } 135 | ], 136 | "source": [ 137 | "df_cl.printSchema()" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 9, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "\n", 147 | "df_cl_single = df_loc.join(df_cust, df_loc['N_NATIONKEY'] == df_cust['C_NATIONKEY']) \\\n", 148 | " .filter(df_cust['C_CUSTKEY'] == 
'123456') \\\n", 149 | " .select('C_MKTSEGMENT') \\\n", 150 | " .groupBy('C_MKTSEGMENT').count()\n" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 10, 156 | "metadata": {}, 157 | "outputs": [ 158 | { 159 | "name": "stdout", 160 | "output_type": "stream", 161 | "text": [ 162 | "+------------+-----+\n", 163 | "|C_MKTSEGMENT|count|\n", 164 | "+------------+-----+\n", 165 | "| AUTOMOBILE| 1|\n", 166 | "+------------+-----+\n", 167 | "\n", 168 | "CPU times: user 0 ns, sys: 0 ns, total: 0 ns\n", 169 | "Wall time: 5.96 s\n" 170 | ] 171 | } 172 | ], 173 | "source": [ 174 | "%%time\n", 175 | "df_cl_single.show()" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "metadata": {}, 182 | "outputs": [], 183 | "source": [] 184 | } 185 | ], 186 | "metadata": { 187 | "kernelspec": { 188 | "display_name": "Python 3", 189 | "language": "python", 190 | "name": "python3" 191 | }, 192 | "language_info": { 193 | "codemirror_mode": { 194 | "name": "ipython", 195 | "version": 3 196 | }, 197 | "file_extension": ".py", 198 | "mimetype": "text/x-python", 199 | "name": "python", 200 | "nbconvert_exporter": "python", 201 | "pygments_lexer": "ipython3", 202 | "version": "3.7.3" 203 | } 204 | }, 205 | "nbformat": 4, 206 | "nbformat_minor": 2 207 | } 208 | -------------------------------------------------------------------------------- /SQLAlchemy.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# Import the appropriate packages & modules\n", 10 | "import snowflake.connector\n", 11 | "from snowflake.connector.converter_null import SnowflakeNoConverterToPython\n", 12 | "import pandas as pd \n", 13 | "from sqlalchemy import create_engine\n", 14 | "from snowflake.sqlalchemy import URL\n", 15 | "import sqlalchemy as sa" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 2, 21 | "metadata": {}, 22 | "outputs": [], 23 | "source": [ 24 | "# Set some variables for the account, user & Password\n", 25 | "# Modify this section to match your demo account\n", 26 | "# and create an 'engine' for the Snowflake connection\n", 27 | "ACCOUNT = 'xxxx'\n", 28 | "USER = 'xxxx'\n", 29 | "PASSWORD = 'xxxx'\n", 30 | "\n", 31 | "engine = create_engine(URL(\n", 32 | " account = ACCOUNT,\n", 33 | " user = USER,\n", 34 | " password = PASSWORD,\n", 35 | " database = 'sample',\n", 36 | " schema = 'public',\n", 37 | " warehouse = 'xxxx',\n", 38 | " role='xxxx',\n", 39 | "))\n", 40 | "\n", 41 | "sql = \"select * from sales.public.customer limit 1000\"" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 3, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "# Use Pandas dataframe method read_sql_query to execute SQL in SQL Alchemy \n", 51 | "#%%time\n", 52 | "df = pd.read_sql_query(sql, engine)" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 4, 58 | "metadata": {}, 59 | "outputs": [ 60 | { 61 | "name": "stdout", 62 | "output_type": "stream", 63 | "text": [ 64 | "\n", 65 | "RangeIndex: 1000 entries, 0 to 999\n", 66 | "Data columns (total 8 columns):\n", 67 | "c_custkey 1000 non-null int64\n", 68 | "c_name 1000 non-null object\n", 69 | "c_address 1000 non-null object\n", 70 | "c_nationkey 1000 non-null int64\n", 71 | "c_phone 1000 non-null object\n", 72 | "c_acctbal 1000 non-null float64\n", 73 | "c_mktsegment 1000 non-null object\n", 74 
| "c_comment 1000 non-null object\n", 75 | "dtypes: float64(1), int64(2), object(5)\n", 76 | "memory usage: 62.6+ KB\n" 77 | ] 78 | } 79 | ], 80 | "source": [ 81 | "df.info()" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 5, 87 | "metadata": {}, 88 | "outputs": [ 89 | { 90 | "data": { 91 | "text/plain": [ 92 | "c_custkey False\n", 93 | "c_name False\n", 94 | "c_address False\n", 95 | "c_nationkey False\n", 96 | "c_phone False\n", 97 | "c_acctbal False\n", 98 | "c_mktsegment False\n", 99 | "c_comment False\n", 100 | "dtype: bool" 101 | ] 102 | }, 103 | "execution_count": 5, 104 | "metadata": {}, 105 | "output_type": "execute_result" 106 | } 107 | ], 108 | "source": [ 109 | "pd.isnull(df).any()" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 6, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "data": { 119 | "text/html": [ 120 | "
\n", 121 | "\n", 134 | "\n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | "
c_custkey
c_mktsegment
AUTOMOBILE198
BUILDING195
FURNITURE187
HOUSEHOLD207
MACHINERY213
\n", 168 | "
" 169 | ], 170 | "text/plain": [ 171 | " c_custkey\n", 172 | "c_mktsegment \n", 173 | "AUTOMOBILE 198\n", 174 | "BUILDING 195\n", 175 | "FURNITURE 187\n", 176 | "HOUSEHOLD 207\n", 177 | "MACHINERY 213" 178 | ] 179 | }, 180 | "execution_count": 6, 181 | "metadata": {}, 182 | "output_type": "execute_result" 183 | } 184 | ], 185 | "source": [ 186 | "df.groupby('c_mktsegment')[['c_custkey']].count()" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": null, 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [] 195 | } 196 | ], 197 | "metadata": { 198 | "kernelspec": { 199 | "display_name": "Python 3", 200 | "language": "python", 201 | "name": "python3" 202 | }, 203 | "language_info": { 204 | "codemirror_mode": { 205 | "name": "ipython", 206 | "version": 3 207 | }, 208 | "file_extension": ".py", 209 | "mimetype": "text/x-python", 210 | "name": "python", 211 | "nbconvert_exporter": "python", 212 | "pygments_lexer": "ipython3", 213 | "version": "3.7.3" 214 | } 215 | }, 216 | "nbformat": 4, 217 | "nbformat_minor": 2 218 | } 219 | -------------------------------------------------------------------------------- /pyodbc.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# Import the necessary modules\n", 10 | "import pyodbc\n", 11 | "import pandas as pd\n" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 2, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "# Define the connection and specify a query to run\n", 21 | "# Change the values of server & PWD & warehouse to match your demo environment\n", 22 | "cn_str = '''Driver={SnowflakeDSIIDriver};\n", 23 | " server=XXXX.snowflakecomputing.com;\n", 24 | " database=sales;\n", 25 | " warehouse=xxxx;\n", 26 | " UID=xxxx;\n", 27 | " PWD=xxxx'''\n", 28 | "cn = pyodbc.connect(cn_str)\n", 29 | "cn.setencoding('utf-8')\n", 30 | "\n", 31 | "sql = \"select * from sales.public.customer limit 10000\"" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 3, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "# Run the SQL and store the results in the res object\n", 41 | "#%%time\n", 42 | "res = cn.execute(sql).fetchall()" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 4, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "# Run the SQL And store the results in the df object\n", 52 | "#%%time\n", 53 | "df = pd.read_sql(sql, cn)" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 5, 59 | "metadata": {}, 60 | "outputs": [ 61 | { 62 | "name": "stdout", 63 | "output_type": "stream", 64 | "text": [ 65 | "\n", 66 | "RangeIndex: 10000 entries, 0 to 9999\n", 67 | "Data columns (total 8 columns):\n", 68 | "C\u0000_\u0000C\u0000U\u0000S 10000 non-null float64\n", 69 | "C\u0000_\u0000N\u0000 10000 non-null object\n", 70 | "C\u0000_\u0000A\u0000D\u0000D 10000 non-null object\n", 71 | "C\u0000_\u0000N\u0000A\u0000T\u0000I 10000 non-null float64\n", 72 | "C\u0000_\u0000P\u0000H 10000 non-null object\n", 73 | "C\u0000_\u0000A\u0000C\u0000C 10000 non-null float64\n", 74 | "C\u0000_\u0000M\u0000K\u0000T\u0000S\u0000 10000 non-null object\n", 75 | "C\u0000_\u0000C\u0000O\u0000M 10000 non-null object\n", 76 | "dtypes: float64(3), object(5)\n", 77 | "memory usage: 625.1+ KB\n" 78 | ] 79 | } 80 | ], 81 | "source": [ 82 | "df.info()" 83 | ] 84 | }, 85 | { 86 | 
"cell_type": "code", 87 | "execution_count": 6, 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "data": { 92 | "text/plain": [ 93 | "C\u0000_\u0000C\u0000U\u0000S False\n", 94 | "C\u0000_\u0000N\u0000 False\n", 95 | "C\u0000_\u0000A\u0000D\u0000D False\n", 96 | "C\u0000_\u0000N\u0000A\u0000T\u0000I False\n", 97 | "C\u0000_\u0000P\u0000H False\n", 98 | "C\u0000_\u0000A\u0000C\u0000C False\n", 99 | "C\u0000_\u0000M\u0000K\u0000T\u0000S\u0000 False\n", 100 | "C\u0000_\u0000C\u0000O\u0000M False\n", 101 | "dtype: bool" 102 | ] 103 | }, 104 | "execution_count": 6, 105 | "metadata": {}, 106 | "output_type": "execute_result" 107 | } 108 | ], 109 | "source": [ 110 | "pd.isnull(df).any()" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 7, 116 | "metadata": {}, 117 | "outputs": [ 118 | { 119 | "data": { 120 | "text/html": [ 121 | "
\n", 122 | "\n", 135 | "\n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | "
C\u0000_\u0000C\u0000U\u0000SC\u0000_\u0000N\u0000A\u0000T\u0000IC\u0000_\u0000A\u0000C\u0000C
count1.000000e+0410000.0000010000.000000
mean4.630452e+0611.960704505.595198
std1.993393e+067.234793155.296896
min5.000010e+050.00000-999.940000
25%4.950709e+066.000001802.407500
50%5.351160e+0612.000004445.675000
75%5.800076e+0618.000007248.287500
max8.028928e+0624.000009999.590000
\n", 195 | "
" 196 | ], 197 | "text/plain": [ 198 | " C\u0000_\u0000C\u0000U\u0000S C\u0000_\u0000N\u0000A\u0000T\u0000I C\u0000_\u0000A\u0000C\u0000C\n", 199 | "count 1.000000e+04 10000.00000 10000.000000\n", 200 | "mean 4.630452e+06 11.96070 4505.595198\n", 201 | "std 1.993393e+06 7.23479 3155.296896\n", 202 | "min 5.000010e+05 0.00000 -999.940000\n", 203 | "25% 4.950709e+06 6.00000 1802.407500\n", 204 | "50% 5.351160e+06 12.00000 4445.675000\n", 205 | "75% 5.800076e+06 18.00000 7248.287500\n", 206 | "max 8.028928e+06 24.00000 9999.590000" 207 | ] 208 | }, 209 | "execution_count": 7, 210 | "metadata": {}, 211 | "output_type": "execute_result" 212 | } 213 | ], 214 | "source": [ 215 | "df.describe()" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [] 224 | } 225 | ], 226 | "metadata": { 227 | "kernelspec": { 228 | "display_name": "Python 3", 229 | "language": "python", 230 | "name": "python3" 231 | }, 232 | "language_info": { 233 | "codemirror_mode": { 234 | "name": "ipython", 235 | "version": 3 236 | }, 237 | "file_extension": ".py", 238 | "mimetype": "text/x-python", 239 | "name": "python", 240 | "nbconvert_exporter": "python", 241 | "pygments_lexer": "ipython3", 242 | "version": "3.7.3" 243 | } 244 | }, 245 | "nbformat": 4, 246 | "nbformat_minor": 2 247 | } 248 | -------------------------------------------------------------------------------- /Python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# Import the various modules required to make a simple Snowflake connection from Python\n", 10 | "import snowflake.connector\n", 11 | "from snowflake.connector.converter_null import SnowflakeNoConverterToPython\n", 12 | "import pandas as pd" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 2, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "# Modify this cell to include information about your demo account\n", 22 | "ACCOUNT = 'xxxx'\n", 23 | "USER = 'xxxx'\n", 24 | "PASSWORD = 'xxxx'\n", 25 | "\n", 26 | "con = snowflake.connector.connect(\n", 27 | " user=USER,\n", 28 | " password=PASSWORD,\n", 29 | " account=ACCOUNT\n", 30 | " ,converter_class=SnowflakeNoConverterToPython\n", 31 | ")" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 3, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "# Create a variable called sql and specify a query that it will store\n", 41 | "sql = \"select * from sales.public.customer limit 10000\"" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 4, 47 | "metadata": {}, 48 | "outputs": [ 49 | { 50 | "data": { 51 | "text/plain": [ 52 | "" 53 | ] 54 | }, 55 | "execution_count": 4, 56 | "metadata": {}, 57 | "output_type": "execute_result" 58 | } 59 | ], 60 | "source": [ 61 | "# Specify the virtual warehouse and role we want to use\n", 62 | "con.cursor().execute(\"USE WAREHOUSE xxxx\")\n", 63 | "con.cursor().execute(\"USE role xxxx\")" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 5, 69 | "metadata": {}, 70 | "outputs": [], 71 | "source": [ 72 | "# Execute the query using the Python connector\n", 73 | "#%%time\n", 74 | "res = con.cursor().execute(sql).fetchall()\n" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 6, 80 | "metadata": {}, 81 | "outputs": [ 82 | { 83 | "name": 
"stdout", 84 | "output_type": "stream", 85 | "text": [ 86 | "\n", 87 | "RangeIndex: 10000 entries, 0 to 9999\n", 88 | "Data columns (total 8 columns):\n", 89 | "C_CUSTKEY 10000 non-null object\n", 90 | "C_NAME 10000 non-null object\n", 91 | "C_ADDRESS 10000 non-null object\n", 92 | "C_NATIONKEY 10000 non-null object\n", 93 | "C_PHONE 10000 non-null object\n", 94 | "C_ACCTBAL 10000 non-null object\n", 95 | "C_MKTSEGMENT 10000 non-null object\n", 96 | "C_COMMENT 10000 non-null object\n", 97 | "dtypes: object(8)\n", 98 | "memory usage: 625.1+ KB\n" 99 | ] 100 | } 101 | ], 102 | "source": [ 103 | "# Run that same query, but this time use the read_sql method\n", 104 | "# in the Pandas data frame object\n", 105 | "#%%time\n", 106 | "df = pd.read_sql(sql, con)\n", 107 | "df.info()\n" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 7, 113 | "metadata": {}, 114 | "outputs": [ 115 | { 116 | "data": { 117 | "text/html": [ 118 | "
\n", 119 | "\n", 132 | "\n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | "
C_CUSTKEY
C_MKTSEGMENT
AUTOMOBILE2043
BUILDING1938
FURNITURE2060
HOUSEHOLD1989
MACHINERY1970
\n", 166 | "
" 167 | ], 168 | "text/plain": [ 169 | " C_CUSTKEY\n", 170 | "C_MKTSEGMENT \n", 171 | "AUTOMOBILE 2043\n", 172 | "BUILDING 1938\n", 173 | "FURNITURE 2060\n", 174 | "HOUSEHOLD 1989\n", 175 | "MACHINERY 1970" 176 | ] 177 | }, 178 | "execution_count": 7, 179 | "metadata": {}, 180 | "output_type": "execute_result" 181 | } 182 | ], 183 | "source": [ 184 | "# Get a count of distinct customers by market segment\n", 185 | "df.groupby('C_MKTSEGMENT')[['C_CUSTKEY']].count()" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 8, 191 | "metadata": {}, 192 | "outputs": [ 193 | { 194 | "data": { 195 | "text/plain": [ 196 | "C_CUSTKEY False\n", 197 | "C_NAME False\n", 198 | "C_ADDRESS False\n", 199 | "C_NATIONKEY False\n", 200 | "C_PHONE False\n", 201 | "C_ACCTBAL False\n", 202 | "C_MKTSEGMENT False\n", 203 | "C_COMMENT False\n", 204 | "dtype: bool" 205 | ] 206 | }, 207 | "execution_count": 8, 208 | "metadata": {}, 209 | "output_type": "execute_result" 210 | } 211 | ], 212 | "source": [ 213 | "# Check to see if any of the columns have null values\n", 214 | "pd.isnull(df).any()" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 9, 220 | "metadata": {}, 221 | "outputs": [ 222 | { 223 | "data": { 224 | "text/plain": [ 225 | "list" 226 | ] 227 | }, 228 | "execution_count": 9, 229 | "metadata": {}, 230 | "output_type": "execute_result" 231 | } 232 | ], 233 | "source": [ 234 | "type(res)" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 10, 240 | "metadata": {}, 241 | "outputs": [ 242 | { 243 | "name": "stdout", 244 | "output_type": "stream", 245 | "text": [ 246 | "('5050001', 'Customer#005050001', 'h2Q2lfB QpSuOt32ZDV7S8RsTKgedv4w9s9wa', '18', '28-680-716-8960', '4571.61', 'AUTOMOBILE', 'e thinly bold ideas. 
carefully final pinto beans cajole across')\n" 247 | ] 248 | } 249 | ], 250 | "source": [ 251 | "print (res[0])" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": 11, 257 | "metadata": {}, 258 | "outputs": [ 259 | { 260 | "name": "stdout", 261 | "output_type": "stream", 262 | "text": [ 263 | "AUTOMOBILE has occurred 1974 times\n", 264 | "BUILDING has occurred 1964 times\n", 265 | "MACHINERY has occurred 1989 times\n", 266 | "HOUSEHOLD has occurred 2025 times\n", 267 | "FURNITURE has occurred 2048 times\n" 268 | ] 269 | } 270 | ], 271 | "source": [ 272 | "unique_cust_key = []\n", 273 | "z = []\n", 274 | "for x in res:\n", 275 | "    z.append((x[0],x[6]))\n", 276 | "\n", 277 | "for x in z:\n", 278 | "    if x not in unique_cust_key:\n", 279 | "        unique_cust_key.append(x)\n", 280 | "        \n", 281 | "# initialize a null list \n", 282 | "unique_list = []\n", 283 | "\n", 284 | "# traverse for all elements \n", 285 | "for x in unique_cust_key:\n", 286 | "    # check if exists in unique_list or not \n", 287 | "    if x[1] not in unique_list:\n", 288 | "        unique_list.append(x[1])\n", 289 | "        \n", 290 | "def countX(lst, x):\n", 291 | "    count = 0\n", 292 | "    for y in lst:\n", 293 | "        if (y[1] == x):\n", 294 | "            count = count + 1\n", 295 | "    return count\n", 296 | "\n", 297 | "for a in unique_list:\n", 298 | "    print('{} has occurred {} times'.format(a, countX(unique_cust_key, a))) \n", 299 | "    \n" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": null, 305 | "metadata": {}, 306 | "outputs": [], 307 | "source": [] 308 | }, 309 | { 310 | "cell_type": "code", 311 | "execution_count": null, 312 | "metadata": {}, 313 | "outputs": [], 314 | "source": [] 315 | } 316 | ], 317 | "metadata": { 318 | "kernelspec": { 319 | "display_name": "Python 3", 320 | "language": "python", 321 | "name": "python3" 322 | }, 323 | "language_info": { 324 | "codemirror_mode": { 325 | "name": "ipython", 326 | "version": 3 327 | }, 328 | "file_extension": ".py", 329 | "mimetype": "text/x-python", 330 | "name": "python", 331 | "nbconvert_exporter": "python", 332 | "pygments_lexer": "ipython3", 333 | "version": "3.7.3" 334 | } 335 | }, 336 | "nbformat": 4, 337 | "nbformat_minor": 2 338 | } 339 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # THIS REPOSITORY IS DEPRECATED: PLEASE USE THE [SNOWTIRE_V2 REPO](https://github.com/zoharsan/snowtire_v2) 2 | 3 | # Introduction 4 | 5 | Snowtire is a Docker image that provides Snowflake users with a turnkey environment, pre-configured with the Snowflake drivers of the versions of your choice and a comprehensive data science environment including Jupyter Notebooks, Python, Spark, and R, so you can experiment with the various Snowflake connectors available: 6 | 7 | - ODBC 8 | - JDBC 9 | - Python Connector 10 | - Spark Connector 11 | - SnowSQL Client. 12 | 13 | The SQLAlchemy Python package is also installed as part of this Docker image. 14 | 15 | The base docker image is [Jupyter Docker Stacks](https://github.com/jupyter/docker-stacks). More specifically, the image used is [jupyter/all-spark-notebook](https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-all-spark-notebook), which provides a comprehensive Jupyter environment including R, SciPy, PySpark, and Scala. 16 | 17 | Please review the [licensing terms](https://raw.githubusercontent.com/jupyter/docker-stacks/master/LICENSE.md) of the above-mentioned project.
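For example, once the image is up and running (see the instructions below), the bundled Python connector can be exercised directly from a notebook cell. This is a minimal sketch with placeholder credentials; the sample notebooks copied into /home/jovyan/samples cover the same ground in more detail:

```
# Minimal connectivity check with the Snowflake Python connector.
# The 'xxxx' values are placeholders for your own account, user, and password.
import snowflake.connector

con = snowflake.connector.connect(account='xxxx', user='xxxx', password='xxxx')
print(con.cursor().execute("select current_version()").fetchone())
con.close()
```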
18 | 19 | **NOTE: Snowtire is not officially supported by Snowflake, and is provided as-is.** 20 | 21 | # Prerequisites 22 | 23 | - You need Git on your Mac or Windows machine. 24 | - You need to download and install [Docker Desktop for Mac](https://hub.docker.com/editions/community/docker-ce-desktop-mac) or [Docker Desktop for Windows](https://hub.docker.com/editions/community/docker-ce-desktop-windows). You may need to create an account on Docker to be able to download it. 25 | 26 | **NOTE FOR WINDOWS** 27 | 28 | On Windows, a common issue is Git's line-ending configuration, which adds default Windows CRLF line endings to the deploy_snowflake.sh script and causes it to fail. You can either configure [```core.autocrlf```](https://docs.github.com/en/free-pro-team@latest/github/using-git/configuring-git-to-handle-line-endings#refreshing-a-repository-after-changing-line-endings) to ```false```, or use an editor like Notepad++ to open the deploy_snowflake.sh file and save it in UNIX mode, which converts CRLF line endings into LF, before you build the snowtire docker image. 29 | 30 | 31 | # Instructions 32 | 33 | ## Download the repository 34 | 35 | Change directory to the location where you store your Docker images: 36 | 37 | ``` 38 | mkdir DockerImages 39 | cd DockerImages 40 | git clone https://github.com/zoharsan/snowtire.git 41 | cd snowtire 42 | ``` 43 | 44 | If you are just updating the repository to the latest version (always recommended before building a docker image), run the following command from within your local clone (under the snowtire directory): 45 | 46 | ``` 47 | git pull 48 | 49 | ``` 50 | 51 | ## Specify the driver levels 52 | 53 | First, check the latest clients available in the official [Snowflake documentation](https://docs.snowflake.net/manuals/release-notes/client-change-log.html#client-changes-by-version). 54 | 55 | Once you have chosen the versions, you can customize line 29 in the Dockerfile. For example: 56 | 57 | ``` 58 | RUN odbc_version=2.21.1 jdbc_version=3.12.3 spark_version=2.7.0-spark_2.4 snowsql_version=1.2.5 /deploy_snowflake.sh 59 | ``` 60 | 61 | **NOTE: SnowSQL CLI has the ability to [auto-upgrade](https://docs.snowflake.net/manuals/user-guide/snowsql-install-config.html#label-understanding-auto-upgrades) to the latest version available. So, you may not need to specify a higher version.** 62 | 63 | ## Build Snowtire docker image 64 | 65 | ``` 66 | docker build --pull -t snowtire . 67 | ``` 68 | You may get some warnings that are non-critical and/or expected. You can safely ignore them: 69 | ``` 70 | ... 71 | debconf: delaying package configuration, since apt-utils is not installed 72 | ... 73 | ==> WARNING: A newer version of conda exists. <== 74 | current version: 4.6.14 75 | latest version: 4.7.5 76 | 77 | Please update conda by running 78 | 79 | $ conda update -n base conda 80 | ... 81 | grep: /etc/odbcinst.ini: No such file or directory 82 | ... 83 | grep: /etc/odbc.ini: No such file or directory 84 | ``` 85 | 86 | You should see the following message at the very end: 87 | ``` 88 | Successfully tagged snowtire:latest 89 | ``` 90 | 91 | ## Running the image 92 | ``` 93 | docker run -p 8888:8888 --name spare-0 snowtire:latest 94 | ``` 95 | If port 8888 is already taken on your laptop and you want to use another port, you can simply change the port mapping.
For example, for port 9999, it would be: 96 | ``` 97 | docker run -p 9999:8888 --name spare-1 snowtire:latest 98 | ``` 99 | 100 | You should see a message like the following the very first time you bring up this image. Copy the token value from the URL: 101 | ``` 102 | [I 23:33:42.828 NotebookApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret 103 | [I 23:33:43.820 NotebookApp] JupyterLab extension loaded from /opt/conda/lib/python3.7/site-packages/jupyterlab 104 | [I 23:33:43.820 NotebookApp] JupyterLab application directory is /opt/conda/share/jupyter/lab 105 | [I 23:33:43.822 NotebookApp] Serving notebooks from local directory: /home/jovyan 106 | [I 23:33:43.822 NotebookApp] The Jupyter Notebook is running at: 107 | [I 23:33:43.822 NotebookApp] http://(a8e53cbad3a0 or 127.0.0.1):8888/?token=eb2222f1a8cd14046ecc5177d4b1b5965446e3c34b8f42ad 108 | [I 23:33:43.822 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation). 109 | [C 23:33:43.826 NotebookApp] 110 | 111 | To access the notebook, open this file in a browser: 112 | file:///home/jovyan/.local/share/jupyter/runtime/nbserver-17-open.html 113 | Or copy and paste one of these URLs: 114 | http://(a8e53cbad3a0 or 127.0.0.1):8888/?token=eb2222f1a8cd14046ecc5177d4b1b5965446e3c34b8f42ad 115 | ``` 116 | 117 | **Note:** If you are restarting the image and need to retrieve the token, you can do so as follows: 118 | 119 | - Retrieve its value from the docker logs: 120 | 121 | ``` 122 | docker logs spare-0 --tail 10 123 | ``` 124 | 125 | - Or open a bash session on the docker container: 126 | 127 | ``` 128 | docker exec -it spare-0 /bin/bash 129 | ``` 130 | 131 | Then run the following command: 132 | 133 | ``` 134 | jupyter notebook list 135 | ``` 136 | 137 | ## Accessing the image 138 | 139 | Open a web browser at: http://localhost:8888 140 | 141 | It will prompt you for a password or token. Enter the token from the previous message. 142 | 143 | ## Working with the image 144 | 145 | Snowtire comes with 4 small example Python notebooks that let you test various connectors, including ODBC, JDBC, and Spark. You will need to customize your Snowflake account name, your credentials (user/password), database name, and warehouse. 146 | 147 | You can always upload any demo notebook to the Jupyter environment from the main interface. See the Upload button at the top right: 148 | 149 | ![Image](https://github.com/zoharsan/snowflake-jupyter-extras/blob/master/Notebooks.png) 150 | 151 | These notebooks can work with the tpch_sf1 database, which is provided as a sample within any Snowflake environment. 152 | 153 | If you plan to develop new notebooks within the Docker environment, it is recommended to always keep a local copy of your work once you are done, so you do not lose it if the Docker container is discarded accidentally or otherwise corrupted. This can be done in the Jupyter menu: File->Download as. 154 | 155 | ### Stopping and starting the docker image 156 | 157 | Once finished, you can stop the image with the following command: 158 | ``` 159 | docker stop spare-0 160 | ``` 161 | If you want to resume work, you can start the image with the following command: 162 | ``` 163 | docker start spare-0 164 | ``` 165 | 166 | ### Additional handy commands 167 | 168 | - To delete the container. WARNING: If you do this, you will lose any notebook, and any work you have saved or done within the container.
169 | ``` 170 | docker rm spare-0 171 | ``` 172 | - To open a bash session on the docker container, which is useful for using the SnowSQL interface: 173 | ``` 174 | docker exec -it spare-0 /bin/bash 175 | ``` 176 | - To copy files into the docker container: 177 | ``` 178 | docker cp <file> spare-0:<path> 179 | Example: docker cp README.md spare-0:/ 180 | ``` 181 | - To list all docker containers available: 182 | ``` 183 | docker ps -a 184 | ``` 185 | - To list all docker images available: 186 | ``` 187 | docker image ls 188 | ``` 189 | - To delete a docker image: 190 | ``` 191 | docker image rm <image_id> 192 | ``` 193 | You can find out the image id in the previous list command. 194 | 195 | ### Known Issues & Troubleshooting 196 | 197 | --- 198 | #### Stability: Notebook hangs or crashes on large data sets #### 199 | 200 | Make sure you have enough memory allocated to your Docker workstation, at least 4 GB. On Mac: 201 | 202 | - Stop all your docker images (see the instructions above to stop/start docker images). 203 | - Click on the Docker icon on the top right-hand side of your Mac menu bar. 204 | - Select Preferences. 205 | - Select Resources. 206 | - Set CPUs to a minimum of 2. 207 | - Set Memory to 4 GB. 208 | - Click on Apply & Restart. 209 | 210 | --- 211 | #### Python Kernel Dying #### 212 | 213 | If the Python kernel dies while running a notebook and you want to troubleshoot the root cause, add these lines as the first cell of your notebook and execute it: 214 | ``` 215 | # Debugging 216 | 217 | import logging 218 | import os 219 | 220 | for logger_name in ['snowflake','botocore','azure']: 221 |     logger = logging.getLogger(logger_name) 222 |     logger.setLevel(logging.DEBUG) 223 |     ch = logging.FileHandler('python_connector.log') 224 |     ch.setLevel(logging.DEBUG) 225 |     ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s')) 226 |     logger.addHandler(ch) 227 | ``` 228 | This will generate a python_connector.log file in the directory where the notebook resides. Use the commands above to open a bash session in the container and examine the log. 229 | 230 | --- 231 | #### Building the Docker Image fails on Windows #### 232 | 233 | Building the docker image fails on Windows with the following errors: 234 | 235 | ``` 236 | Step 14/24 : RUN odbc_version=2.21.8 jdbc_version=3.12.10 spark_version=2.8.1-spark_2.4 snowsql_version=1.2.9 /deploy_snowflake.sh 237 | ---> Running in 0cfd230c3949 238 | : not foundwflake.sh: 11: /deploy_snowflake.sh: 239 | : not foundwflake.sh: 13: /deploy_snowflake.sh: 240 | /deploy_snowflake.sh: 24: cd: can't cd to / 241 | : not foundwflake.sh: 25: /deploy_snowflake.sh: 242 | ...loading odbc driver version 2.21.8 243 | curl: (3) URL using bad/illegal format or missing URL 244 | ...loading jdbc driver version 3.12.10 245 | : not foundwflake.sh: 28: /deploy_snowflake.sh: 246 | curl: (3) URL using bad/illegal format or missing URL 247 | ...loading spark driver version 2.8.1-spark_2.4 248 | : not foundwflake.sh: 31: /deploy_snowflake.sh: 249 | curl: (3) URL using bad/illegal format or missing URL 250 | ...load SnowSQL client version 1.2.9 251 | ... 252 | ``` 253 | This is caused by Windows CRLF line-ending characters added to the deploy_snowflake.sh script, which make it fail in the Linux Ubuntu container. Open deploy_snowflake.sh with an editor like Notepad++, save the file in UNIX mode (which converts CRLF to LF line endings), and rerun the docker build command.
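If a suitable editor is not at hand, the same conversion can be done from the command line before rebuilding. This is a sketch assuming a Unix-like shell (for example Git Bash or WSL) is available on the Windows machine:

```
# Strip the Windows CR characters from the script, then rebuild the image
sed -i 's/\r$//' deploy_snowflake.sh
docker build --pull -t snowtire .
```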
254 | 255 | 256 | --- 257 | #### Known Issues Log #### 258 | 259 | Please check the [known issues](known_issues.md) log for known issues with Snowtire. 260 | --------------------------------------------------------------------------------