├── .gitignore ├── MANIFEST.in ├── README.md ├── dbt ├── __init__.py ├── adapters │ ├── __init__.py │ └── rockset │ │ ├── __init__.py │ │ ├── __version__.py │ │ ├── column.py │ │ ├── connections.py │ │ ├── impl.py │ │ ├── relation.py │ │ └── sample_profiles.yml └── include │ ├── __init__.py │ └── rockset │ ├── __init__.py │ ├── dbt_project.yml │ ├── macros │ ├── catalog.sql │ ├── current_ts.sql │ ├── macros.sql │ └── materializations │ │ ├── incremental.sql │ │ ├── query_lambda.sql │ │ ├── seed.sql │ │ ├── table.sql │ │ └── view.sql │ └── profile_template.yml ├── dev-requirements.txt ├── mypy.ini ├── pytest.ini ├── setup.py ├── test.env ├── tests ├── __init__.py ├── conftest.py └── functional │ └── adapter │ ├── test_basic.py │ ├── test_basic_overrides.py │ └── test_query_lambda.py └── tox.ini /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | *.pyo 3 | __pycache__/ 4 | .DS_Store 5 | build/ 6 | dbt_rockset.egg-info/ 7 | logs/** 8 | dist/ 9 | dbt/include/rockset/logs/ 10 | dbt/include/rockset/target/ 11 | venv/ 12 | pytestdebug.log 13 | .env 14 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | recursive-include dbt/include *.sql *.yml *.md 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # dbt-rockset 2 | 3 | The dbt-Rockset adapter brings real-time analytics to [dbt](https://www.getdbt.com/). Using the adapter, you can load data into [Rockset](https://rockset.com/) and create collections, by writing SQL SELECT statements in dbt. These collections can then be built on top of each other to support highly-complex data transformations with many dependency edges. 4 | 5 | The following subsections describe the adapter's installation procedure and support for dbt: 6 | 7 | * [Installation and Set up](#installation-and-set-up) 8 | * [Supported Materializations](#supported-materializations) 9 | * [Real-Time Streaming ELT Using dbt + Rockset](#real-time-streaming-elt-using-dbt--rockset) 10 | * [Persistent Materializations Using dbt + Rockset](#persistent-materializations-using-dbt--rockset) 11 | * [Testing, Formatting, & Caveats](#testing-formatting--caveats) 12 | 13 | See the following blogs for additional information: 14 | * [Real-Time Analytics with dbt + Rockset](https://rockset.com/blog/real-time-analytics-with-dbt-rockset/) 15 | * [Real-Time Data Transformations with dbt + Rockset](https://rockset.com/blog/real-time-data-transformations-dbt-rockset/). 16 | 17 | ## Installation and Set up 18 | 19 | The following subsections describe how to set up and use the adapter: 20 | 21 | * [Install the Plug-in](#install-the-plug-in) 22 | * [Configure your Profile](#configure-your-profile) 23 | 24 | See the [adapter's GitHub repo](https://github.com/rockset/dbt-rockset) for additional information. 25 | 26 | ### Install the Plug-in 27 | 28 | Open a command-line window and run the following command to install the adapter: 29 | 30 | ```bash 31 | pip3 install dbt-rockset 32 | ``` 33 | 34 | ### Configure your Profile 35 | 36 | Configure a [dbt profile](https://docs.getdbt.com/dbt-cli/configure-your-profile) similar to the example shown below, to connect with your Rockset account. Enter any workspace that you’d like your dbt collections to be created in, and any Rockset API key. 
The `database` field is required by dbt but is unused by Rockset. 37 | 38 | ``` 39 | rockset: 40 | outputs: 41 | dev: 42 | type: rockset 43 | threads: 1 44 | database: N/A 45 | workspace: 46 | api_key: 47 | api_server: # Optional, defaults to `api.usw2a1.rockset.com`, the API server for the Oregon (us-west-2) region. 48 | vi_rrn: # Optional, the Virtual Instance to use for INSERT INTO SELECT (IIS) queries 49 | run_async_iis: # Optional, false by default; whether to use async execution for IIS queries 50 | target: dev 51 | ``` 52 | 53 | Update your dbt project to use this Rockset profile by editing the `profile` setting in the ```dbt_project.yml``` file. 54 | 55 | ## Supported Materializations 56 | 57 | Type | Supported? | Details 58 | -----|------------|---------------- 59 | [Table](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/materializations#table) | YES | Creates a [Rockset collection](https://docs.rockset.com/collections/). 60 | [View](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/materializations#view) | YES | Creates a [Rockset view](https://rockset.com/docs/views/#gatsby-focus-wrapper). 61 | [Ephemeral](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/materializations#ephemeral) | YES | Creates a CTE. 62 | [Incremental](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/materializations#incremental) | YES | Creates a [Rockset collection](https://docs.rockset.com/collections/) if it doesn't exist, and writes to it. 63 | 64 | ### Query Lambda Configuration 65 | [Query Lambdas](https://docs.rockset.com/documentation/docs/query-lambdas) can be created and updated using dbt. 66 | To manage a Query Lambda with dbt, use the `query_lambda` materialization. 67 | For example: 68 | ``` 69 | {{ 70 | config( 71 | materialized='query_lambda', 72 | tags=['example_tag'], 73 | parameters=[ 74 | {'name': 'order_id', 'type': 'string', 'value': 'xyz'}, 75 | {'name': 'limit', 'type': 'int', 'value': '10'}, 76 | ] 77 | ) 78 | }} 79 | 80 | select * from {{ ref('orders') }} 81 | where order_id = :order_id 82 | order by order_time 83 | limit :limit 84 | ``` 85 | 86 | See the [tests](https://github.com/rockset/dbt-rockset/blob/master/tests/functional/adapter/test_query_lambda.py) for more example usages. 87 | 88 | >:warning: Query Lambdas cannot be referenced as a model in other dbt models, because they cannot be executed from dbt. 89 | 90 | ## Real-Time Streaming ELT Using dbt + Rockset 91 | 92 | As data is ingested, Rockset performs the following: 93 | * The data is automatically indexed in at least three different ways using Rockset’s [Converged Index™](https://rockset.com/blog/converged-indexing-the-secret-sauce-behind-rocksets-fast-queries/) technology. 94 | * Your write-time data transformations are performed. 95 | * The data is made available for queries within seconds. 96 | 97 | When you execute queries on that data, Rockset leverages those indexes to complete any read-time data transformations you define using dbt, with sub-second latency. 98 | 99 | ### Write-Time Data Transformations Using Rollups and Ingest Transformation 100 | 101 | Rockset can extract and load semi-structured data from multiple sources in real time. 102 | For high-velocity data (e.g. data streams), you can roll it up at write time. For example, when you have streaming data coming in from Kafka or Kinesis, 103 | you can create a Rockset collection for each data stream, and then set up [rollups](https://rockset.com/blog/how-rockset-enables-sql-based-rollups-for-streaming-data/) to perform transformations and aggregations on the data as it is written into Rockset. This can help to: 104 | 105 | * Reduce the size of large-scale data streams. 106 | * De-duplicate data. 107 | * Partition your data. 108 | 109 | Collections can also be created from other data sources, including: 110 | * Data lakes (e.g., S3 or GCS). 111 | * NoSQL databases (e.g., DynamoDB or MongoDB). 112 | * Relational databases (e.g., PostgreSQL or MySQL). 113 | 114 | You can then use Rockset's [ingest transformation](/ingest-transformation) to transform the data using SQL statements as it is written into Rockset. 115 |
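For illustration, a rollup is simply an aggregating ingest transformation that you configure on the Rockset collection itself (in the Rockset console or API, not through dbt). A minimal sketch, with hypothetical field names from the incoming stream, might look like this:

```sql
-- Rollup ingest transformation for a collection backed by a Kafka or Kinesis stream.
-- `campaign_id` and `bytes_sent` are illustrative fields on the incoming documents.
SELECT
    campaign_id,
    COUNT(*) AS event_count,
    SUM(bytes_sent) AS total_bytes
FROM _input
GROUP BY campaign_id
```

Each incoming document is folded into these aggregates at write time, so only the rolled-up rows are stored and indexed.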
116 | ### Read-Time Data Transformations Using Rockset Views 117 | 118 | The adapter lets you define data transformations as SQL statements in dbt, using view materializations that are applied at read time. 119 | 120 | To set this up: 121 | 122 | 1. Create a dbt model using SQL statements for each transformation you want to perform on your data. 123 | 2. Execute ```dbt run```. dbt will automatically create a Rockset View for each dbt model, which performs all the data transformations when queries are executed (see the example model below). 124 |
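For example, a read-time transformation is just an ordinary dbt model configured with the `view` materialization. A minimal sketch (the `orders` model and its columns are illustrative):

```sql
{{ config(materialized='view') }}

-- Aggregate raw orders into per-customer totals; evaluated at query time through a Rockset view.
select
    customer_id,
    count(*) as order_count,
    sum(amount) as total_spend
from {{ ref('orders') }}
group by customer_id
```

Running ```dbt run``` creates a Rockset view named after the model; every query against that view applies the transformation at read time.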
125 | If queries complete within your latency requirements, then you have achieved the gold standard of real-time data transformations: Real-Time Streaming ELT. 126 | 127 | Your data will be automatically kept up-to-date in real time and reflected in your queries. There is no need for periodic batch updates to “refresh” your data, and you will not need to execute ```dbt run``` again after the initial setup unless you want to change the data transformation logic itself (e.g. by adding or updating dbt models). 128 | 129 | ## Persistent Materializations Using dbt + Rockset 130 | 131 | If write-time transformations and views don't meet your application’s latency requirements (or your data transformations become too complex), you can persist your transformations as Rockset collections. 132 | 133 | Rockset requires synchronous queries to complete in under two minutes to cater to real-time use cases, which may affect you if your read-time transformations are too complicated. Persisting transformations requires a batch ELT workflow in which you manually execute ```dbt run``` each time you want to update your transformed data. You can run dbt frequently in micro-batches to keep the transformed data up-to-date in near real time. 134 | 135 | 136 | Persistent materializations are both faster to query and better at handling query concurrency, because they are materialized as collections in Rockset. Since the bulk of the data transformations have already been performed ahead of time, your queries complete significantly faster because the work needed at read time is minimized. 137 | 138 | There are two persistent materializations available in dbt: 139 | 140 | * [Incremental Models](#materializing-dbt-incremental-models-in-rockset) 141 | * [Table Models](#materializing-dbt-table-models-in-rockset) 142 | 143 | ### Materializing dbt Incremental Models in Rockset 144 | 145 | [Incremental Models](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/materializations#incremental) enable you to insert or update documents in a Rockset collection using only the data added since the last time dbt was run. This can significantly reduce build time, since Rockset only needs to transform the newly generated data rather than dropping, recreating, and transforming the entire data set. 146 | 147 | Depending on the complexity of your data transformations, incremental materializations may not always be a viable option to meet your transformation requirements. Incremental materializations are best suited for event or time-series data streamed directly into Rockset. To tell dbt which documents it should transform during an incremental run, provide SQL that filters for those documents using the ```is_incremental()``` macro in your dbt code. You can learn more about configuring incremental models in dbt [here](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models#how-do-i-use-the-incremental-materialization).
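As an illustration, an incremental model for event data might look like the following sketch. The `raw_events` model name is hypothetical, and filtering on Rockset's automatically generated `_event_time` field is just one common way to select newly arrived documents:

```sql
{{ config(materialized='incremental') }}

select *
from {{ ref('raw_events') }}
{% if is_incremental() %}
  -- Only transform documents added since the last dbt run.
  -- `_event_time` is a field Rockset attaches to every document at ingest time.
  where _event_time > (select max(_event_time) from {{ this }})
{% endif %}
```

On the first run (or with `--full-refresh`) the whole collection is built from scratch; on subsequent runs only the filtered documents are inserted into the existing collection.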
148 | 149 | ### Materializing dbt Table Models in Rockset 150 | 151 | A [Table Model](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/materializations#table) is a transformation that drops and recreates an entire Rockset collection every time ```dbt run``` is executed, updating that collection's transformed data with the most up-to-date source data. This is the simplest way to persist transformed data in Rockset, and it results in much faster queries since the transformations are completed prior to query time. 152 | 153 | However, Table Models can be slow to complete, since Rockset is not optimized for creating entirely new collections from scratch on the fly. This may significantly increase your data latency, as it may take several minutes for Rockset to provision resources for a new collection and then populate it with transformed data. 154 | 155 | ### Putting It All Together 156 | 157 | You can use Table Models and Incremental Models (in conjunction with Rockset views) to build a stack that meets the unique requirements of your data transformations. For example, you can use SQL-based rollups to: 158 | 159 | * Transform your streaming data at write time. 160 | * Transform the data further and persist it into Rockset collections via Incremental or Table Models. 161 | * Execute a sequence of view models at read time to transform your data again. 162 | 163 | ## Testing, Formatting, & Caveats 164 | 165 | ### Testing Changes 166 | 167 | Before landing a commit, ensure that your changes pass the tests. Your credentials should be set in the .env file in the repo root; see the test.env file for an example. Once the .env file is set, you can run the tests with the following command: 168 | ``` 169 | pytest -s tests/functional 170 | ``` 171 | 172 | ### Formatting 173 | 174 | Before landing a commit, format your changes using [black](https://github.com/psf/black): 175 | ``` 176 | # Install 177 | python -m pip install black 178 | # Usage (From repo root) 179 | python -m black . 180 | ``` 181 | 182 | ### Caveats 183 | 1. `unique_key` is not supported with incremental models, unless it is set to [_id](https://rockset.com/docs/special-fields/#the-_id-field), which acts as a natural `unique_key` in Rockset anyway. 184 | 2. The `table` materialization is slower in Rockset than in most other databases because of Rockset's architecture as a low-latency, real-time database: creating a new collection requires provisioning hot storage to index and serve fresh data, which takes about a minute. 185 | 3. Rockset queries have a two-minute timeout unless run asynchronously. You can extend this limit to 30 minutes by setting `run_async_iis` to true. However, if the query ends up in the queue because you have hit your org's Concurrent Query Execution Limit (CQEL), it must at least start executing within 2 minutes; otherwise, your IIS query will error. If the query leaves the queue and begins execution within 2 minutes, the normal 30-minute time limit applies. 186 | -------------------------------------------------------------------------------- /dbt/__init__.py: -------------------------------------------------------------------------------- 1 | __path__ = __import__("pkgutil").extend_path(__path__, __name__) 2 | -------------------------------------------------------------------------------- /dbt/adapters/__init__.py: -------------------------------------------------------------------------------- 1 | __path__ = __import__("pkgutil").extend_path(__path__, __name__) 2 | -------------------------------------------------------------------------------- /dbt/adapters/rockset/__init__.py: -------------------------------------------------------------------------------- 1 | from dbt.adapters.rockset.connections import RocksetConnectionManager 2 | from dbt.adapters.rockset.connections import RocksetCredentials 3 | from dbt.adapters.rockset.impl import RocksetAdapter 4 | from dbt.adapters.rockset.relation import RocksetRelation 5 | from dbt.adapters.rockset.column import RocksetColumn 6 | 7 | from dbt.adapters.base import AdapterPlugin 8 | from dbt.include import rockset 9 | 10 | 11 | Plugin = AdapterPlugin( 12 | adapter=RocksetAdapter, 13 | credentials=RocksetCredentials, 14 | include_path=rockset.PACKAGE_PATH, 15 | ) 16 | -------------------------------------------------------------------------------- /dbt/adapters/rockset/__version__.py: -------------------------------------------------------------------------------- 1 | version = "1.7.4" 2 | -------------------------------------------------------------------------------- /dbt/adapters/rockset/column.py: -------------------------------------------------------------------------------- 1 | from dataclasses import dataclass 2 | 3 | from dbt.adapters.base.column import Column 4 | from dbt.exceptions import DbtRuntimeError 5 | 6 | import rockset 7 | 8 | 9 | @dataclass 10 | class RocksetColumn(Column): 11 | def is_integer(self) -> bool: 12 | return self.dtype.lower() in ["int"] 13 | 14 | def is_numeric(self) -> bool: 15 | return self.dtype.lower() in ["int", "float"] 16 | 17 | def is_float(self): 18 | return self.dtype.lower() in ["float"] 19 | 20 | def is_string(self): 21 | return self.dtype.lower() in ["string"] 22 | 23 | def string_size(self) -> int: 24 | if not self.is_string(): 25 | raise DbtRuntimeError("Called string_size() on non-string field!") 26 | 27 | return rockset.Client.MAX_FIELD_VALUE_BYTES 28 | -------------------------------------------------------------------------------- /dbt/adapters/rockset/connections.py: -------------------------------------------------------------------------------- 1 | from contextlib import contextmanager 2 | from dataclasses import dataclass 3 | from dbt.adapters.base import Credentials 4 | from dbt.adapters.base import BaseConnectionManager 5 | from dbt.clients import agate_helper 6 | from dbt.contracts.connection import AdapterResponse, Connection 7 | from dbt.logger import GLOBAL_LOGGER as logger 8 | from dbt.exceptions import NotImplementedError, DbtValidationError 9 | 10 | import agate 11 | import dbt 12 | import rockset_sqlalchemy as sql
13 | from .__version__ import version as rs_version 14 | from typing import Any, Dict, List, Optional, Tuple, Union 15 | 16 | 17 | @dataclass 18 | class RocksetCredentials(Credentials): 19 | database: str = "db" 20 | vi_rrn: Optional[str] = None 21 | run_async_iis: Optional[bool] = False 22 | api_server: Optional[str] = "api.usw2a1.rockset.com" 23 | api_key: Optional[str] = None 24 | schema: Optional[str] = None 25 | 26 | @property 27 | def type(self): 28 | return "rockset" 29 | 30 | @property 31 | def unique_field(self): 32 | return self.api_key 33 | 34 | def _connection_keys(self): 35 | return ("api_key", "apiserver", "schema") 36 | 37 | _ALIASES = {"workspace": "schema"} 38 | 39 | 40 | class RocksetConnectionManager(BaseConnectionManager): 41 | TYPE = "rockset" 42 | 43 | @classmethod 44 | def open(cls, connection: Connection) -> Connection: 45 | if connection.state == "open": 46 | logger.debug("Connection is already open, skipping open.") 47 | return connection 48 | 49 | credentials = connection.credentials 50 | 51 | # Ensure the credentials have a valid apiserver before connecting to rockset 52 | if not ( 53 | credentials.api_server is not None 54 | and "api" in credentials.api_server 55 | and credentials.api_server.endswith("rockset.com") 56 | ): 57 | raise DbtValidationError( 58 | f"Invalid apiserver `{credentials.api_server}` specified in profile. Expecting a server of the form api..rockset.com" 59 | ) 60 | 61 | try: 62 | handle = sql.connect( 63 | api_server=credentials.api_server, api_key=credentials.api_key 64 | ) 65 | handle._client.api_client.user_agent = "dbt/" + rs_version 66 | 67 | connection.state = "open" 68 | connection.handle = handle 69 | return connection 70 | except Exception as e: 71 | connection.state = "fail" 72 | connection.handle = None 73 | raise dbt.exceptions.FailedToConnectException(e) 74 | 75 | @classmethod 76 | def get_status(cls, cursor) -> str: 77 | # Rockset cursors don't have a status_message 78 | return "OK" 79 | 80 | def cancel_open(self) -> Optional[List[str]]: 81 | raise NotImplementedError("`cancel_open` is not implemented for this adapter!") 82 | 83 | def begin(self) -> None: 84 | """Begin a transaction. (passable)""" 85 | raise NotImplementedError("`begin` is not implemented for this adapter!") 86 | 87 | def commit(self) -> None: 88 | """Commit a transaction. 
(passable)""" 89 | raise NotImplementedError("`commit` is not implemented for this adapter!") 90 | 91 | def clear_transaction(self) -> None: 92 | pass 93 | 94 | # auto_begin is ignored in Rockset, and only included for consistency 95 | def execute( 96 | self, 97 | sql: str, 98 | auto_begin: bool = False, 99 | fetch: bool = False, 100 | limit: Optional[int] = None, 101 | ) -> Tuple[Union[AdapterResponse, str], agate.Table]: 102 | sql = self._add_query_comment(sql) 103 | cursor = self.get_thread_connection().handle.cursor() 104 | 105 | if fetch: 106 | rows, field_names = self._sql_to_results(cursor, sql, limit) 107 | table = agate_helper.table_from_data_flat(rows, field_names) 108 | else: 109 | cursor.execute(sql) 110 | table = agate_helper.empty_table() 111 | 112 | return AdapterResponse(_message="OK"), table 113 | 114 | def _sql_to_results(self, cursor, sql, limit): 115 | cursor.execute(sql) 116 | field_names = self._description_to_field_names(cursor.description) 117 | json_results = [] 118 | if limit is None: 119 | rows = cursor.fetchall() 120 | else: 121 | rows = cursor.fetchmany(limit) 122 | for row in rows: 123 | json_results.append(self._row_to_json(row, field_names)) 124 | return json_results, field_names 125 | 126 | def _row_to_json(self, row, field_names): 127 | json_res = {} 128 | for i in range(len(row)): 129 | json_res[field_names[i]] = row[i] 130 | return json_res 131 | 132 | def _description_to_field_names(self, description): 133 | return [desc[0] for desc in description] 134 | 135 | @contextmanager 136 | def exception_handler(self, sql: str): 137 | try: 138 | yield 139 | except Exception as e: 140 | raise e 141 | 142 | @classmethod 143 | def get_response(cls, cursor) -> str: 144 | return "OK" 145 | -------------------------------------------------------------------------------- /dbt/adapters/rockset/impl.py: -------------------------------------------------------------------------------- 1 | from dbt.adapters.base import BaseAdapter, available, RelationType 2 | from dbt.adapters.sql import SQLAdapter 3 | from dbt.adapters.rockset.connections import RocksetConnectionManager 4 | from dbt.adapters.rockset.relation import RocksetQuotePolicy, RocksetRelation 5 | from dbt.contracts.graph.manifest import Manifest 6 | from dbt.adapters.rockset.column import RocksetColumn 7 | from dbt.logger import GLOBAL_LOGGER as logger 8 | from dbt.adapters.base import BaseRelation 9 | from dbt.exceptions import NotImplementedError 10 | from .__version__ import version as rs_version 11 | 12 | import agate 13 | import datetime 14 | import dbt 15 | import json 16 | import collections 17 | import requests 18 | import rockset 19 | import backoff 20 | from rockset.models import * 21 | from rockset import ApiException 22 | from decimal import Decimal 23 | from time import sleep 24 | from typing import List, Optional, Set 25 | 26 | 27 | OK = 200 28 | NOT_FOUND = 404 29 | ASYNC_OPTIONS = { 30 | "client_timeout_ms": 1000, # arbitrary 31 | "timeout_ms": 1800000, 32 | "max_initial_results": 1, 33 | } 34 | 35 | 36 | def fatal_code(e: ApiException): 37 | if e.status == 429: 38 | return False 39 | else: 40 | return 400 <= e.status < 500 41 | 42 | 43 | class RocksetAdapter(BaseAdapter): 44 | RELATION_TYPES = { 45 | "TABLE": RelationType.Table, 46 | } 47 | 48 | Relation = RocksetRelation 49 | Column = RocksetColumn 50 | ConnectionManager = RocksetConnectionManager 51 | 52 | @classmethod 53 | def convert_text_type(cls, agate_table: agate.Table, col_idx: int) -> str: 54 | return "string" 55 | 56 | @classmethod 57 | 
def convert_number_type(cls, agate_table: agate.Table, col_idx: int) -> str: 58 | decimals = agate_table.aggregate(agate.MaxPrecision(col_idx)) 59 | return "float" if decimals else "int" 60 | 61 | @classmethod 62 | def convert_boolean_type(cls, agate_table: agate.Table, col_idx: int) -> str: 63 | return "bool" 64 | 65 | @classmethod 66 | def convert_datetime_type(cls, agate_table: agate.Table, col_idx: int) -> str: 67 | return "datetime" 68 | 69 | @classmethod 70 | def convert_date_type(cls, agate_table: agate.Table, col_idx: int) -> str: 71 | return "date" 72 | 73 | @classmethod 74 | def convert_time_type(cls, agate_table: agate.Table, col_idx: int) -> str: 75 | return "time" 76 | 77 | @classmethod 78 | def is_cancelable(cls) -> bool: 79 | return False 80 | 81 | @classmethod 82 | def date_function(cls): 83 | return "CURRENT_TIMESTAMP()" 84 | 85 | def create_schema(self, relation: RocksetRelation) -> None: 86 | rs = self._rs_client() 87 | logger.debug('Creating workspace "{}"', relation.schema) 88 | # Must check if the workspace already exists... 89 | current_workspaces = {ws.name for ws in rs.Workspaces.list().data} 90 | if relation.schema in current_workspaces: 91 | return 92 | rs.Workspaces.create(name=relation.schema) 93 | # Wait for workspace creation to complete 94 | for _ in range(10): 95 | sleep(1) 96 | current_workspaces = {ws.name for ws in rs.Workspaces.list().data} 97 | if relation.schema in current_workspaces: 98 | return 99 | logger.info(f"Waiting for workspace {relation.schema} to be created") 100 | 101 | def drop_schema(self, relation: RocksetRelation) -> None: 102 | rs = self._rs_client() 103 | ws = relation.schema 104 | logger.info('Dropping workspace "{}"', ws) 105 | 106 | # Drop all views in the ws 107 | for view in self._list_views(ws): 108 | self._delete_view_recursively(ws, view) 109 | 110 | # Drop all aliases in the ws 111 | for alias in rs.Aliases.workspace_aliases(workspace=ws).data: 112 | self._delete_alias(ws, alias.name) 113 | # Drop all collections in the ws 114 | for collection in rs.Collections.workspace_collections(workspace=ws).data: 115 | self._delete_collection(ws, collection.name, wait_until_deleted=False) 116 | # Drop all QLs in the ws 117 | for ql in rs.QueryLambdas.list_query_lambdas_in_workspace(workspace=ws).data: 118 | self._delete_ql(ws, ql.name) 119 | 120 | try: 121 | # Wait until the ws has 0 collections. 
We do this so deletion of multiple collections 122 | # can happen in parallel 123 | while True: 124 | workspace = rs.Workspaces.get(workspace=ws).data 125 | if workspace.collection_count == 0: 126 | break 127 | logger.debug( 128 | f"Waiting for ws {ws} to have 0 collections, has {workspace.collection_count}" 129 | ) 130 | sleep(3) 131 | 132 | rs.Workspaces.delete(workspace=ws) 133 | sleep(4) # Wait for workspace to be deleted 134 | except Exception as e: 135 | logger.debug(f"Caught exception of type {e.__class__}") 136 | # Workspace does not exist 137 | if isinstance(e, rockset.exceptions.ApiException) and e.status == NOT_FOUND: 138 | pass 139 | else: # Unexpected error 140 | raise e 141 | 142 | # Required by BaseAdapter 143 | @available 144 | def list_schemas(self, database: str) -> List[str]: 145 | rs = self._rs_client() 146 | return [ws.name for ws in rs.Workspaces.list().data] 147 | 148 | # Relation/Collection related methods 149 | def truncate_relation(self, relation: RocksetRelation) -> None: 150 | raise NotImplementedError("`truncate` is not implemented for this adapter!") 151 | 152 | @available.parse(lambda *a, **k: "") 153 | def get_dummy_sql(self): 154 | return f""" 155 | /* Placeholder Query */ 156 | SELECT 3; 157 | """ 158 | 159 | @available.parse_list 160 | def drop_relation(self, relation: RocksetRelation) -> None: 161 | ws = relation.schema 162 | identifier = relation.identifier 163 | 164 | if self._does_view_exist(ws, identifier): 165 | self._delete_view_recursively(ws, identifier) 166 | elif self._does_collection_exist(ws, identifier): 167 | self._delete_collection(ws, identifier) 168 | else: 169 | raise dbt.exceptions.Exception( 170 | f"Tried to drop relation {ws}.{identifier} that does not exist!" 171 | ) 172 | 173 | def rename_relation( 174 | self, from_relation: RocksetRelation, to_relation: RocksetRelation 175 | ) -> None: 176 | raise NotImplementedError("`rename` is not implemented for this adapter!") 177 | 178 | @available.parse(lambda *a, **k: "") 179 | def get_collection(self, relation) -> RocksetRelation: 180 | ws = relation.schema 181 | cname = relation.identifier 182 | 183 | try: 184 | rs = self._rs_client() 185 | existing_collection = rs.Collections.get(collection=cname, workspace=ws) 186 | return self._rs_collection_to_relation(existing_collection) 187 | except Exception as e: 188 | if ( 189 | hasattr(e, "status") and e.status == NOT_FOUND 190 | ): # Collection does not exist 191 | return None 192 | else: # Unexpected error 193 | raise e 194 | 195 | # Required by BaseAdapter 196 | def list_relations_without_caching( 197 | self, schema_relation: RocksetRelation 198 | ) -> List[RocksetRelation]: 199 | # Due to the database matching issue, we can not implement relation caching, so 200 | # this is a simple pass-through to list_relations 201 | return self.list_relations(None, schema_relation.schema) 202 | 203 | # We override `list_relations` bc the base implementation uses a caching mechanism that does 204 | # not work for our purposes bc it relies on comparing relation.database, and database is not 205 | # a concept in Rockset 206 | def list_relations( 207 | self, database: Optional[str], schema: str 208 | ) -> List[RocksetRelation]: 209 | 210 | rs = self._rs_client() 211 | relations = [] 212 | 213 | collections = rs.Collections.workspace_collections(workspace=schema).data 214 | for collection in collections: 215 | relations.append(self._rs_collection_to_relation(collection)) 216 | return relations 217 | 218 | # Required by BaseAdapter 219 | def 
get_columns_in_relation(self, relation: RocksetRelation) -> List[RocksetColumn]: 220 | logger.debug(f"Getting columns in relation {relation.identifier}") 221 | sql = 'DESCRIBE "{}"."{}"'.format(relation.schema, relation.identifier) 222 | status, table = self.connections.execute(sql, fetch=True) 223 | 224 | columns = [] 225 | for row in table.rows: 226 | top_lvl_field = json.loads(row["field"]) 227 | if len(top_lvl_field) == 1: 228 | col = self.Column.create(top_lvl_field[0], row["type"]) 229 | columns.append(col) 230 | return columns 231 | 232 | def _get_types_in_relation(self, relation: RocksetRelation) -> List[RocksetColumn]: 233 | logger.debug(f"Getting columns in relation {relation.identifier}") 234 | if not (relation.schema and relation.identifier): 235 | return [] 236 | sql = 'DESCRIBE "{}"."{}"'.format(relation.schema, relation.identifier) 237 | status, table = self.connections.execute(sql, fetch=True) 238 | 239 | field_types = collections.defaultdict(list) 240 | for row in table.rows: 241 | r = json.loads(row["field"]) 242 | field_path = ".".join(r) 243 | field_types[field_path].append((row["occurrences"], row["type"])) 244 | 245 | # Sort types by their relative frequency 246 | return [ 247 | { 248 | "column_name": k, 249 | "column_type": "/".join([t[1] for t in sorted(v, reverse=True)]), 250 | } 251 | for k, v in field_types.items() 252 | ] 253 | 254 | def get_filtered_catalog( 255 | self, manifest: Manifest, relations: Optional[Set[BaseRelation]] = None 256 | ): 257 | catalogs: agate.Table 258 | if relations is None or len(relations) > 100: 259 | # Do it the traditional way. We get the full catalog. 260 | catalogs, exceptions = self.get_catalog(manifest) 261 | else: 262 | # Do it the new way. We try to save time by selecting information 263 | # only for the exact set of relations we are interested in. 264 | catalogs, exceptions = self.get_catalog_by_relations(manifest, relations) 265 | 266 | if relations and catalogs: 267 | relation_map = { 268 | ( 269 | r.schema.casefold() if r.schema else None, 270 | r.identifier.casefold() if r.identifier else None, 271 | ) 272 | for r in relations 273 | } 274 | 275 | def in_map(row: agate.Row): 276 | s = row["table_schema"] 277 | i = row["table_name"] 278 | s = s.casefold() if s is not None else None 279 | i = i.casefold() if i is not None else None 280 | return (s, i) in relation_map 281 | 282 | catalogs = catalogs.where(in_map) 283 | 284 | return catalogs, exceptions 285 | 286 | def get_catalog_by_relations( 287 | self, manifest: Manifest, relations: List[RocksetRelation] 288 | ): 289 | rs = self._rs_client() 290 | 291 | columns = [ 292 | "table_database", 293 | "table_name", 294 | "table_schema", 295 | "table_type", 296 | "stats:row_count:label", 297 | "stats:row_count:value", 298 | "stats:row_count:description", 299 | "stats:row_count:include", 300 | "stats:bytes:label", 301 | "stats:bytes:value", 302 | "stats:bytes:description", 303 | "stats:bytes:include", 304 | "column_type", 305 | "column_name", 306 | "column_index", 307 | ] 308 | catalog_rows = [] 309 | databases = {x.database for x in manifest.sources.values()} | { 310 | x.database for x in manifest.nodes.values() 311 | } 312 | # DB is not a thing in RS. 
Include all DBs to be filtered out in later stages of catalog generation 313 | for relation in relations: 314 | for collection in rs.Collections.workspace_collections( 315 | workspace=relation.schema 316 | ).data: 317 | rel = self.Relation.create( 318 | schema=collection.workspace, identifier=collection.name 319 | ) 320 | col_types = self._get_types_in_relation(rel) 321 | for i, c in enumerate(col_types): 322 | for db in databases: 323 | catalog_rows.append( 324 | [ 325 | db, 326 | collection.name, 327 | collection.workspace, 328 | "NoSQL", 329 | "Row Count", 330 | collection.stats["doc_count"], 331 | "Rows in table", 332 | True, 333 | "Bytes", 334 | collection.stats["bytes_inserted"], 335 | "Inserted bytes", 336 | True, 337 | c["column_type"], 338 | c["column_name"], 339 | i, 340 | ] 341 | ) 342 | catalog_table = agate.Table( 343 | rows=catalog_rows, 344 | column_names=columns, 345 | column_types=agate.TypeTester( 346 | force={"table_database": agate.Text(cast_nulls=False, null_values=[])} 347 | ), 348 | ) 349 | 350 | return catalog_table, [] 351 | 352 | # Rockset doesn't support DESCRIBE on views, so those are not included in the catalog information 353 | def get_catalog(self, manifest: Manifest) -> agate.Table: 354 | schemas = super()._get_cache_schemas(manifest) 355 | return self.get_catalog_by_relations(manifest, schemas) 356 | 357 | def expand_column_types( 358 | self, goal: RocksetRelation, current: RocksetRelation 359 | ) -> None: 360 | raise NotImplementedError( 361 | "`expand_column_types` is not implemented for this adapter!" 362 | ) 363 | 364 | def expand_target_column_types( 365 | self, from_relation: RocksetRelation, to_relation: RocksetRelation 366 | ) -> None: 367 | raise NotImplementedError( 368 | "`expand_target_column_types` is not implemented for this adapter!" 
369 | ) 370 | 371 | @classmethod 372 | def quote(cls, identifier: str) -> str: 373 | return "`{}`".format(identifier) 374 | 375 | ### 376 | # Special Rockset implementations 377 | ### 378 | 379 | @available.parse(lambda *a, **k: "") 380 | def do_debug(self, obj): 381 | import pdb 382 | 383 | pdb.set_trace() 384 | 385 | @available.parse(lambda *a, **k: "") 386 | def apply_snapshot(self, relation, sql): 387 | raise dbt.exceptions.Exception("Snapshots unsupported") 388 | 389 | # Query Lambda materialization 390 | @available.parse(lambda *a, **k: "") 391 | @backoff.on_exception(backoff.expo, ApiException, max_tries=20, giveup=fatal_code) 392 | def create_or_update_query_lambda(self, relation, sql, tags, parameters): 393 | ws = relation.schema 394 | name = relation.identifier 395 | # Ensure workspace exists 396 | self.create_schema(relation) 397 | 398 | if self._query_lambda_exists(ws, name): 399 | self._update_query_lambda(ws, name, sql, tags, parameters) 400 | else: 401 | self._create_query_lambda(ws, name, sql, tags, parameters) 402 | 403 | def _update_query_lambda(self, ws, name, sql, tags, parameters): 404 | rs = self._rs_client() 405 | query_params = [QueryParameter(**qp) for qp in parameters] 406 | try: 407 | api_response = rs.QueryLambdas.update_query_lambda( 408 | workspace=ws, 409 | description="Created via DBT", 410 | is_public=False, 411 | query_lambda=name, 412 | sql=QueryLambdaSql( 413 | default_parameters=query_params, 414 | query=sql, 415 | ), 416 | ) 417 | ql_version = api_response.data.version 418 | for t in tags: 419 | self._add_query_lambda_tag(ws, name, ql_version, t) 420 | except ApiException as e: 421 | logger.error(e) 422 | raise e 423 | 424 | def _create_query_lambda(self, ws, name, sql, tags, parameters): 425 | rs = self._rs_client() 426 | query_params = [QueryParameter(**qp) for qp in parameters] 427 | try: 428 | api_response = rs.QueryLambdas.create_query_lambda( 429 | workspace=ws, 430 | description="Created via DBT", 431 | is_public=False, 432 | name=name, 433 | sql=QueryLambdaSql( 434 | default_parameters=query_params, 435 | query=sql, 436 | ), 437 | ) 438 | ql_version = api_response.data.version 439 | for t in tags: 440 | self._add_query_lambda_tag(ws, name, ql_version, t) 441 | except ApiException as e: 442 | logger.error(e) 443 | raise e 444 | 445 | def _add_query_lambda_tag(self, ws, name, version, tag): 446 | rs = self._rs_client() 447 | api_response = rs.QueryLambdas.create_query_lambda_tag( 448 | workspace=ws, 449 | query_lambda=name, 450 | tag_name=tag, 451 | version=version, 452 | ) 453 | 454 | def _query_lambda_exists(self, ws, name): 455 | # Check if latest tag exists 456 | rs = self._rs_client() 457 | try: 458 | resp = rs.QueryLambdas.get_query_lambda_tag_version( 459 | workspace=ws, query_lambda=name, tag="latest" 460 | ) 461 | return True 462 | except ApiException as e: 463 | if e.status == 404: 464 | return False 465 | else: 466 | raise e 467 | 468 | # Table materialization 469 | @available.parse(lambda *a, **k: "") 470 | def create_table(self, relation, sql): 471 | ws = relation.schema 472 | cname = relation.identifier 473 | rs = self._rs_client() 474 | 475 | if self._does_collection_exist(ws, cname): 476 | self._delete_collection(ws, cname) 477 | 478 | if self._does_alias_exist(ws, cname): 479 | self._delete_alias(ws, cname) 480 | 481 | if self._does_view_exist(ws, cname): 482 | self._delete_view_recursively(ws, cname) 483 | 484 | logger.debug(f"Creating collection {ws}.{cname}") 485 | 486 | c = rs.Collections.create_s3_collection(name=cname, 
workspace=ws) 487 | self._wait_until_collection_ready(ws, cname) 488 | 489 | # Run an INSERT INTO statement and wait for it to be fully ingested 490 | relation = self._rs_collection_to_relation(c) 491 | iis_query_id = self._execute_iis_query_and_wait_for_docs(relation, sql) 492 | 493 | # Used by dbt seed 494 | @available.parse_none 495 | def load_dataframe( 496 | self, database, schema, table_name, agate_table, column_override 497 | ): 498 | ws = schema 499 | cname = table_name 500 | 501 | # Translate the agate table in json docs 502 | json_docs = [] 503 | for row in agate_table.rows: 504 | d = dict(row.dict()) 505 | for k, v in d.items(): 506 | d[k] = self._convert_agate_data_type(v) 507 | json_docs.append(d) 508 | 509 | # The check for a view should happen before this point 510 | if self._does_view_exist(ws, cname): 511 | raise dbt.exceptions.Exception(f"InternalError : View {ws}.{cname} exists") 512 | 513 | # Create the Rockset collection 514 | if not self._does_collection_exist(ws, cname): 515 | rs = self._rs_client() 516 | rs.Collections.create_s3_collection(name=cname, workspace=ws) 517 | self._wait_until_collection_ready(ws, cname) 518 | 519 | # Write the results to the collection and wait until the docs are ingested 520 | body = {"data": json_docs} 521 | write_api_endpoint = f"/v1/orgs/self/ws/{ws}/collections/{cname}/docs" 522 | resp = json.loads(self._send_rs_request("POST", write_api_endpoint, body).text) 523 | self._wait_until_past_commit_fence(ws, cname, resp["last_offset"]) 524 | 525 | def _convert_agate_data_type(self, v): 526 | if v is None: 527 | return None 528 | elif isinstance(v, str): 529 | return v 530 | elif isinstance(v, Decimal): 531 | return float(v) 532 | elif isinstance(v, datetime.datetime): 533 | return str(v) 534 | elif isinstance(v, datetime.date): 535 | return str(v) 536 | else: 537 | raise dbt.exceptions.Exception( 538 | f"Unknown data type {v.__class__} in seeded table" 539 | ) 540 | 541 | # View materialization 542 | # As of this comment, the rockset python sdk does not support views, so this is implemented 543 | # with the python requests library 544 | @available.parse(lambda *a, **k: "") 545 | def create_view(self, relation, sql): 546 | ws = relation.schema 547 | view = relation.identifier 548 | 549 | if not self._does_view_exist(ws, view): 550 | self._create_view(ws, view, sql) 551 | else: 552 | self._update_view(ws, view, sql) 553 | 554 | # If we wait until the view is synced, then we can be sure that any subsequent queries 555 | # of the view will use the new sql text 556 | self._wait_until_view_fully_synced(ws, view) 557 | 558 | # Sleep a few seconds to be extra sure that all caches are updated with the new view 559 | sleep(3) 560 | 561 | @available.parse(lambda *a, **k: "") 562 | def add_incremental_docs(self, relation, sql, unique_key): 563 | if unique_key and unique_key != "_id": 564 | raise NotImplementedError( 565 | "`unique_key` can only be set to `_id` with the Rockset adapter!" 
566 | ) 567 | 568 | ws = relation.schema 569 | cname = relation.identifier 570 | self._wait_until_collection_ready(ws, cname) 571 | 572 | # Run an INSERT INTO statement and wait for it to be fully ingested 573 | iis_query_id = self._execute_iis_query_and_wait_for_docs(relation, sql) 574 | 575 | ### 576 | # Internal Rockset helper methods 577 | ### 578 | 579 | def _rs_client(self): 580 | return self.connections.get_thread_connection().handle._client 581 | 582 | def _rs_api_key(self): 583 | return self.connections.get_thread_connection().credentials.api_key 584 | 585 | def _rs_vi_rrn(self): 586 | return self.connections.get_thread_connection().credentials.vi_rrn 587 | 588 | def _rs_run_async_iis(self): 589 | return self.connections.get_thread_connection().credentials.run_async_iis 590 | 591 | def _rs_api_server(self): 592 | return ( 593 | f"https://{self.connections.get_thread_connection().credentials.api_server}" 594 | ) 595 | 596 | def _rs_cursor(self): 597 | return self.connections.get_thread_connection().handle.cursor() 598 | 599 | def _execute_iis_query_and_wait_for_docs(self, relation, sql): 600 | query_id, num_docs_inserted = self._execute_iis_query(relation, sql) 601 | if num_docs_inserted > 0: 602 | self._wait_until_iis_query_processed( 603 | relation.schema, relation.identifier, query_id 604 | ) 605 | else: 606 | logger.info(f"Query {query_id} inserted 0 docs; no ingest to wait for.") 607 | 608 | # Execute a query not using the SQL cursor, but by hitting the REST api. This should be done 609 | # if you need the QueryResponse object returned 610 | # Returns: query_id (str), num_docs_inserted (int) 611 | def _execute_iis_query(self, relation, sql): 612 | iis_sql = f"INSERT INTO {relation} {sql}" 613 | logger.debug(f"Executing sql: {iis_sql}") 614 | 615 | if self._rs_vi_rrn(): 616 | endpoint = f"/v1/orgs/self/virtualinstances/{self._rs_vi_rrn()}/queries" 617 | else: 618 | endpoint = "/v1/orgs/self/queries" 619 | 620 | body = { 621 | "sql": {"query": iis_sql}, 622 | } 623 | if self._rs_run_async_iis(): 624 | body["async_options"] = ASYNC_OPTIONS 625 | 626 | resp = self._send_rs_request("POST", endpoint, body=body) 627 | if not 200 <= resp.status_code < 300: 628 | raise dbt.exceptions.Exception(resp.text) 629 | json_resp = json.loads(resp.text) 630 | query_id = json_resp["query_id"] 631 | 632 | # If using async queries and query did not complete, wait until query completes 633 | if json_resp["status"] == "QUEUED" or json_resp["status"] == "RUNNING": 634 | self._wait_until_async_query_completed(query_id) 635 | # Async IIS queries do not have results saved in s3 so just assume docs were inserted 636 | # If this behavior changes in the future, then get the results of the async IIS query 637 | # and get num_docs_inserted 638 | return query_id, 1 639 | 640 | assert ( 641 | len(json_resp["results"]) == 1 642 | ), f"IIS queries should return only one document but got {len(json_resp['results'])}" 643 | 644 | return query_id, json_resp["results"][0]["num_docs_inserted"] 645 | 646 | def _wait_until_collection_does_not_exist(self, cname, ws): 647 | while True: 648 | try: 649 | self._rs_client().Collections.get(collection=cname, workspace=ws) 650 | logger.debug(f"Waiting for collection {ws}.{cname} to be deleted...") 651 | sleep(3) 652 | except Exception as e: 653 | if ( 654 | hasattr(e, "status") and e.status == NOT_FOUND 655 | ): # Collection does not exist 656 | return 657 | raise e 658 | 659 | def _wait_until_view_does_not_exist(self, ws, view): 660 | while True: 661 | if 
self._does_view_exist(ws, view): 662 | logger.debug(f"Waiting for view {ws}.{view} to be deleted") 663 | sleep(3) 664 | else: 665 | break 666 | 667 | def _wait_until_collection_ready(self, ws, cname): 668 | max_wait_time_secs = 600 669 | sleep_secs = 3 670 | total_sleep_time = 0 671 | 672 | while total_sleep_time < max_wait_time_secs: 673 | if not self._does_collection_exist(ws, cname): 674 | logger.debug( 675 | f"Collection {ws}.{cname} does not exist. This is likely a transient consistency error." 676 | ) 677 | sleep(sleep_secs) 678 | total_sleep_time += sleep_secs 679 | continue 680 | 681 | c = self._rs_client().Collections.get(collection=cname, workspace=ws) 682 | if c.data["status"] == "READY": 683 | logger.debug(f"{ws}.{cname} is ready!") 684 | return 685 | else: 686 | logger.debug(f"Waiting for collection {ws}.{cname} to become ready...") 687 | sleep(sleep_secs) 688 | total_sleep_time += sleep_secs 689 | 690 | raise dbt.exceptions.Exception( 691 | f"Waited more than {max_wait_time_secs} secs for {ws}.{cname} to become ready. Something is wrong." 692 | ) 693 | 694 | def _rs_collection_to_relation(self, collection): 695 | 696 | if collection is None: 697 | return None 698 | 699 | if hasattr(collection, "data"): 700 | collection = collection.data # TODO: remove the need for this 701 | 702 | return self.Relation.create( 703 | schema=collection.workspace, 704 | identifier=collection.name, 705 | type="table", 706 | quote_policy=RocksetQuotePolicy(), 707 | ) 708 | 709 | def _wait_until_alias_deleted(self, ws, alias): 710 | while True: 711 | if self._does_alias_exist(ws, alias): 712 | logger.debug(f"Waiting for alias {ws}.{alias} to be deleted") 713 | sleep(3) 714 | else: 715 | break 716 | 717 | def _wait_until_collection_deleted(self, ws, cname): 718 | while True: 719 | if self._does_collection_exist(ws, cname): 720 | logger.debug(f"Waiting for collection {ws}.{cname} to be deleted") 721 | sleep(3) 722 | else: 723 | break 724 | 725 | def _delete_collection(self, ws, cname, wait_until_deleted=True): 726 | rs = self._rs_client() 727 | 728 | for ref_view in self._get_referencing_views(ws, cname): 729 | self._delete_view_recursively(ref_view[0], ref_view[1]) 730 | 731 | try: 732 | c = rs.Collections.delete(collection=cname, workspace=ws) 733 | 734 | if wait_until_deleted: 735 | self._wait_until_collection_deleted(ws, cname) 736 | except Exception as e: 737 | if hasattr(e, "status") and e.status != NOT_FOUND: 738 | raise e # Unexpected error 739 | 740 | @backoff.on_exception(backoff.expo, ApiException, max_tries=20, giveup=fatal_code) 741 | def _delete_ql(self, ws, ql): 742 | rs = self._rs_client() 743 | try: 744 | rs.QueryLambdas.delete_query_lambda(query_lambda=ql, workspace=ws) 745 | self._wait_until_ql_deleted(ws, ql) 746 | except Exception as e: 747 | if hasattr(e, "status") and e.status != NOT_FOUND: 748 | raise e # Unexpected error 749 | 750 | def _wait_until_ql_deleted(self, ws, ql): 751 | while True: 752 | if self._query_lambda_exists(ws, ql): 753 | logger.debug(f"Waiting for ql {ws}.{ql} to be deleted") 754 | sleep(3) 755 | else: 756 | break 757 | 758 | def _delete_alias(self, ws, alias): 759 | rs = self._rs_client() 760 | 761 | for ref_view in self._get_referencing_views(ws, alias): 762 | self._delete_view_recursively(ref_view[0], ref_view[1]) 763 | 764 | try: 765 | rs.Aliases.delete(alias=alias, workspace=ws) 766 | self._wait_until_alias_deleted(ws, alias) 767 | except Exception as e: 768 | if hasattr(e, "status") and e.status != NOT_FOUND: 769 | raise e # Unexpected error 770 | 771 
| def _wait_until_past_commit_fence(self, ws, cname, fence): 772 | endpoint = ( 773 | f"/v1/orgs/self/ws/{ws}/collections/{cname}/offsets/commit?fence={fence}" 774 | ) 775 | while True: 776 | resp = self._send_rs_request("GET", endpoint) 777 | resp_json = json.loads(resp.text) 778 | passed = resp_json["data"]["passed"] 779 | commit_offset = resp_json["offsets"]["commit"] 780 | if passed: 781 | logger.debug( 782 | f"Commit offset {commit_offset} is past given fence {fence}" 783 | ) 784 | break 785 | else: 786 | logger.debug( 787 | f"Waiting for commit offset to pass fence {fence}; it is currently {commit_offset}" 788 | ) 789 | sleep(3) 790 | 791 | def _wait_until_async_query_completed(self, query_id): 792 | endpoint = f"/v1/orgs/self/queries/{query_id}" 793 | while True: 794 | query_resp = self._send_rs_request("GET", endpoint) 795 | query_data = json.loads(query_resp.text) 796 | 797 | status = ( 798 | query_data.get("data").get("status") if "data" in query_data else None 799 | ) 800 | if status == "COMPLETED": 801 | return 802 | elif status == "ERROR" or status == "CANCELLED": 803 | raise Exception( 804 | f"IIS query did not complete successfully. Query data: {query_resp.text}" 805 | ) 806 | else: 807 | logger.debug(f"Insert Into Query not completed yet") 808 | sleep(3) 809 | 810 | def _wait_until_iis_fully_ingested(self, ws, cname, query_id): 811 | endpoint = f"/v1/orgs/self/queries/{query_id}" 812 | while True: 813 | query_resp = self._send_rs_request("GET", endpoint) 814 | query_data = json.loads(query_resp.text) 815 | 816 | last_offset = query_data.get("last_offset") 817 | if last_offset is not None: 818 | return last_offset 819 | else: 820 | logger.debug( 821 | f"Insert Into Query not yet finished processing; last offset not present" 822 | ) 823 | sleep(3) 824 | 825 | def _wait_until_iis_query_processed(self, ws, cname, query_id): 826 | last_offset = self._wait_until_iis_fully_ingested(ws, cname, query_id) 827 | self._wait_until_past_commit_fence(ws, cname, last_offset) 828 | 829 | def _send_rs_request(self, type, endpoint, body=None, check_success=True): 830 | url = self._rs_api_server() + endpoint 831 | version = "dbt/" + rs_version 832 | headers = { 833 | "authorization": f"apikey {self._rs_api_key()}", 834 | "user-agent": version, 835 | } 836 | 837 | if type == "GET": 838 | resp = requests.get(url, headers=headers) 839 | elif type == "POST": 840 | resp = requests.post(url, headers=headers, json=body) 841 | elif type == "DELETE": 842 | resp = requests.delete(url, headers=headers) 843 | else: 844 | raise Exception(f"Unimplemented request type {type}") 845 | 846 | code = resp.status_code 847 | if check_success and (code < 200 or code > 299): 848 | raise Exception(resp.text) 849 | return resp 850 | 851 | def _views_endpoint(self, ws): 852 | return f"/v1/orgs/self/ws/{ws}/views" 853 | 854 | def _list_views(self, ws): 855 | endpoint = self._views_endpoint(ws) 856 | resp_json = json.loads(self._send_rs_request("GET", endpoint).text) 857 | return [v["name"] for v in resp_json["data"]] 858 | 859 | def _does_view_exist(self, ws, view): 860 | endpoint = self._views_endpoint(ws) + f"/{view}" 861 | response = self._send_rs_request("GET", endpoint, check_success=False) 862 | if response.status_code == NOT_FOUND: 863 | return False 864 | elif response.status_code == OK: 865 | return True 866 | else: 867 | raise Exception( 868 | f"throwing from 332 with status_code {response.status_code} and text {response.text}" 869 | ) 870 | 871 | def _does_alias_exist(self, ws, alias): 872 | rs = 
self._rs_client() 873 | try: 874 | rs.Aliases.get(alias=alias, workspace=ws) 875 | return True 876 | except Exception as e: 877 | if e.status == NOT_FOUND: 878 | return False 879 | else: 880 | raise e 881 | 882 | def _does_collection_exist(self, ws, cname): 883 | rs = self._rs_client() 884 | try: 885 | rs.Collections.get(workspace=ws, collection=cname) 886 | return True 887 | except Exception as e: 888 | if e.status == NOT_FOUND: 889 | return False 890 | else: 891 | raise e 892 | 893 | def _create_view(self, ws, view, sql): 894 | # Check if alias or collection exist with same name 895 | rs = self._rs_client() 896 | if self._does_alias_exist(ws, view): 897 | self._delete_alias(ws, view) 898 | 899 | if self._does_collection_exist(ws, view): 900 | self._delete_collection(ws, view) 901 | 902 | endpoint = self._views_endpoint(ws) 903 | body = {"name": view, "query": sql, "description": "Created via dbt"} 904 | self._send_rs_request("POST", endpoint, body=body) 905 | 906 | # Delete the view and any views that depend on it (recursively) 907 | def _delete_view_recursively(self, ws, view): 908 | for ref_view in self._get_referencing_views(ws, view): 909 | self._delete_view_recursively(ref_view[0], ref_view[1]) 910 | 911 | endpoint = f"{self._views_endpoint(ws)}/{view}" 912 | del_resp = self._send_rs_request("DELETE", endpoint) 913 | if del_resp.status_code == NOT_FOUND: 914 | return 915 | elif del_resp.status_code != OK: 916 | raise Exception( 917 | f"throwing from 395 with code {del_resp.status_code} and text {del_resp.text}" 918 | ) 919 | 920 | self._wait_until_view_does_not_exist(ws, view) 921 | 922 | def _get_referencing_views(self, ws, view): 923 | view_path = f"{ws}.{view}" 924 | 925 | list_endpoint = f"{self._views_endpoint(ws)}" 926 | list_resp = self._send_rs_request("GET", list_endpoint) 927 | list_json = json.loads(list_resp.text) 928 | 929 | results = [] 930 | for view in list_json["data"]: 931 | for referenced_entity in view["entities"]: 932 | if referenced_entity == view_path: 933 | results.append((view["workspace"], view["name"])) 934 | return results 935 | 936 | def _update_view(self, ws, view, sql): 937 | endpoint = self._views_endpoint(ws) + f"/{view}" 938 | body = {"query": sql} 939 | self._send_rs_request("POST", endpoint, body=body) 940 | 941 | def _wait_until_view_fully_synced(self, ws, view): 942 | max_wait_time_secs = 600 943 | sleep_secs = 3 944 | total_sleep_time = 0 945 | 946 | endpoint = f"{self._views_endpoint(ws)}/{view}" 947 | while total_sleep_time < max_wait_time_secs: 948 | if not self._does_view_exist(ws, view): 949 | logger.debug( 950 | f"View {ws}.{view} does not exist. This is likely a transient consistency error." 951 | ) 952 | sleep(sleep_secs) 953 | total_sleep_time += sleep_secs 954 | continue 955 | 956 | resp = self._send_rs_request("GET", endpoint) 957 | view_json = json.loads(resp.text)["data"] 958 | state = view_json["state"] 959 | 960 | if state == "SYNCING": 961 | logger.debug(f"Waiting for view {ws}.{view} to be fully synced") 962 | sleep(sleep_secs) 963 | total_sleep_time += sleep_secs 964 | else: 965 | logger.debug(f"View {ws}.{view} is synced and ready to be queried") 966 | return 967 | 968 | raise dbt.exceptions.Exception( 969 | f"Waited more than {max_wait_time_secs} secs for view {ws}.{view} to become synced. Something is wrong." 
970 | ) 971 | 972 | def run_sql_for_tests(self, sql, fetch, conn): 973 | cursor = conn.handle.cursor() 974 | try: 975 | cursor.execute(sql) 976 | if fetch == "one": 977 | return cursor.fetchone() 978 | elif fetch == "all": 979 | return cursor.fetchall() 980 | except BaseException as e: 981 | logger.error(sql) 982 | logger.error(e) 983 | raise 984 | finally: 985 | conn.state = "close" 986 | 987 | # Overridden because Rockset generates columns not added during testing. 988 | def get_rows_different_sql( 989 | self, 990 | relation_a: RocksetRelation, 991 | relation_b: RocksetRelation, 992 | column_names: Optional[List[str]] = None, 993 | except_operator: str = "EXCEPT", 994 | ) -> str: 995 | 996 | names: List[str] 997 | # columns generated by Rockset 998 | skip_cmp_columns: Set[str] = {"_event_time", "_id", "_meta"} 999 | if column_names is None: 1000 | columns = self.get_columns_in_relation(relation_a) 1001 | names = sorted( 1002 | (self.quote(c.name) for c in columns if c.name not in skip_cmp_columns) 1003 | ) 1004 | else: 1005 | names = sorted( 1006 | (self.quote(n) for n in column_names if n not in skip_cmp_columns) 1007 | ) 1008 | columns_csv = ", ".join(names) 1009 | 1010 | sql = COLUMNS_EQUAL_SQL.format( 1011 | columns=columns_csv, 1012 | relation_a=str(relation_a), 1013 | relation_b=str(relation_b), 1014 | except_op=except_operator, 1015 | ) 1016 | 1017 | return sql 1018 | 1019 | 1020 | COLUMNS_EQUAL_SQL = """ 1021 | with diff_count as ( 1022 | SELECT 1023 | 1 as id, 1024 | COUNT(*) as num_missing FROM ( 1025 | (SELECT {columns} FROM {relation_a} {except_op} 1026 | SELECT {columns} FROM {relation_b}) 1027 | UNION ALL 1028 | (SELECT {columns} FROM {relation_b} {except_op} 1029 | SELECT {columns} FROM {relation_a}) 1030 | ) as a 1031 | ), table_a as ( 1032 | SELECT COUNT(*) as num_rows FROM {relation_a} 1033 | ), table_b as ( 1034 | SELECT COUNT(*) as num_rows FROM {relation_b} 1035 | ), row_count_diff as ( 1036 | select 1037 | 1 as id, 1038 | table_a.num_rows - table_b.num_rows as difference 1039 | from table_a, table_b 1040 | ) 1041 | select 1042 | row_count_diff.difference as row_count_difference, 1043 | diff_count.num_missing as num_mismatched 1044 | from row_count_diff 1045 | join diff_count using (id) 1046 | """.strip() 1047 | -------------------------------------------------------------------------------- /dbt/adapters/rockset/relation.py: -------------------------------------------------------------------------------- 1 | from dataclasses import dataclass, field 2 | from dbt.adapters.base.relation import BaseRelation, Policy 3 | from dbt.contracts.relation import RelationType 4 | 5 | import traceback 6 | from typing import List, Optional, Type 7 | 8 | 9 | @dataclass 10 | class RocksetQuotePolicy(Policy): 11 | database: bool = False 12 | schema: bool = True 13 | identifier: bool = True 14 | 15 | 16 | @dataclass 17 | class RocksetIncludePolicy(Policy): 18 | database: bool = False 19 | schema: bool = True 20 | identifier: bool = True 21 | 22 | 23 | @dataclass(frozen=True, eq=False, repr=False) 24 | class RocksetRelation(BaseRelation): 25 | quote_policy: RocksetQuotePolicy = field( 26 | default_factory=lambda: RocksetQuotePolicy() 27 | ) 28 | include_policy: RocksetIncludePolicy = field( 29 | default_factory=lambda: RocksetIncludePolicy() 30 | ) 31 | 32 | # We override this function with a very simple implementation. 
Database is not a concept 33 | # in Rockset, so we do not make such a comparison 34 | def matches( 35 | self, 36 | database: Optional[str] = None, 37 | schema: Optional[str] = None, 38 | identifier: Optional[str] = None, 39 | ) -> bool: 40 | return self.path.schema == schema and self.path.identifier == identifier 41 | -------------------------------------------------------------------------------- /dbt/adapters/rockset/sample_profiles.yml: -------------------------------------------------------------------------------- 1 | rockset: 2 | outputs: 3 | dev: 4 | type: rockset 5 | workspace: 6 | api_key: 7 | region: # us-west-2 by default 8 | target: dev -------------------------------------------------------------------------------- /dbt/include/__init__.py: -------------------------------------------------------------------------------- 1 | __path__ = __import__("pkgutil").extend_path(__path__, __name__) 2 | -------------------------------------------------------------------------------- /dbt/include/rockset/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | PACKAGE_PATH = os.path.dirname(__file__) 4 | -------------------------------------------------------------------------------- /dbt/include/rockset/dbt_project.yml: -------------------------------------------------------------------------------- 1 | 2 | name: dbt_rockset 3 | version: 1.0 4 | config-version: 2 5 | profile: rockset 6 | macro-paths: ["macros"] 7 | -------------------------------------------------------------------------------- /dbt/include/rockset/macros/catalog.sql: -------------------------------------------------------------------------------- 1 | {% macro rockset__get_catalog(information_schema, schemas) -%} 2 | {{adapter.get_catalog(information_schema, schemas)}} 3 | {% endmacro %} 4 | -------------------------------------------------------------------------------- /dbt/include/rockset/macros/current_ts.sql: -------------------------------------------------------------------------------- 1 | {% macro rockset__current_timestamp() -%} 2 | CURRENT_TIMESTAMP() 3 | {%- endmacro %} 4 | -------------------------------------------------------------------------------- /dbt/include/rockset/macros/macros.sql: -------------------------------------------------------------------------------- 1 | -- Rockset does not have a notion of database, so do not include it when resolving refs or sources 2 | -- It resolves as schema.identifier (i.e. 
workspace.collection), instead of database.schema.identifier 3 | {% macro rockset__ref(modelname) %} 4 | {{ builtins.ref(modelname).include(database=False).render() }} 5 | {% endmacro %} 6 | 7 | {% macro rockset__source(source_name, table_name) %} 8 | {{ builtins.source(source_name, table_name).include(database=False).render() }} 9 | {% endmacro %} 10 | -------------------------------------------------------------------------------- /dbt/include/rockset/macros/materializations/incremental.sql: -------------------------------------------------------------------------------- 1 | {% materialization incremental, adapter='rockset' -%} 2 | {%- set unique_key = config.get('unique_key') -%} 3 | {%- set full_refresh_mode = (should_full_refresh()) -%} 4 | {%- set target_relation = this %} 5 | {%- set existing_relation = adapter.get_collection(target_relation) %} 6 | 7 | {{ run_hooks(pre_hooks) }} 8 | 9 | {% if existing_relation is none %} 10 | {{ adapter.create_table(target_relation, sql) }} 11 | {% elif full_refresh_mode %} 12 | {{ adapter.drop_relation(existing_relation) }} 13 | {{ adapter.create_table(target_relation, sql) }} 14 | {% else %} 15 | {{ adapter.add_incremental_docs(target_relation, sql, unique_key) }} 16 | {% endif %} 17 | 18 | {#-- Rockset does not support CREATE TABLE sql. All logic to create / add docs to collections happens above --#} 19 | {%- call statement('main') -%} 20 | {{ adapter.get_dummy_sql() }} 21 | {% endcall %} 22 | 23 | {{ run_hooks(post_hooks) }} 24 | 25 | {% set target_relation = this.incorporate(type='table') %} 26 | {% do persist_docs(target_relation, model) %} 27 | {{ return({'relations': [target_relation]}) }} 28 | 29 | {%- endmaterialization %} -------------------------------------------------------------------------------- /dbt/include/rockset/macros/materializations/query_lambda.sql: -------------------------------------------------------------------------------- 1 | {% materialization query_lambda, adapter='rockset' -%} 2 | {% set target_relation = this.incorporate(type='view') %} 3 | {% set tags = config.get('tags', default=[]) %} 4 | {% set parameters = config.get('parameters',default=[]) %} 5 | {{ adapter.create_or_update_query_lambda(target_relation, sql, tags, parameters) }} 6 | 7 | {{ run_hooks(pre_hooks) }} 8 | 9 | {#-- All logic to create Query Lambdas happens in the adapter --#} 10 | {% call statement('main') -%} 11 | {{ adapter.get_dummy_sql() }} 12 | {%- endcall %} 13 | 14 | {{ run_hooks(post_hooks) }} 15 | 16 | {% do persist_docs(target_relation, model) %} 17 | 18 | {{ return({'relations':[target_relation]}) }} 19 | {%- endmaterialization %} -------------------------------------------------------------------------------- /dbt/include/rockset/macros/materializations/seed.sql: -------------------------------------------------------------------------------- 1 | 2 | {% materialization seed, adapter='rockset' %} 3 | 4 | {%- set identifier = model['alias'] -%} 5 | {%- set old_relation = adapter.get_relation(database=database, schema=schema, identifier=identifier) -%} 6 | {%- set full_refresh_mode = (should_full_refresh()) -%} 7 | 8 | {%- set exists_as_view = (old_relation is not none and old_relation.is_view) -%} 9 | {%- set exists_as_table = (old_relation is not none and old_relation.is_table) -%} 10 | 11 | {%- set agate_table = load_agate_table() -%} 12 | {%- do store_result('agate_table', response='OK', agate_table=agate_table) -%} 13 | 14 | {{ run_hooks(pre_hooks, inside_transaction=False) }} 15 | {{ run_hooks(pre_hooks, inside_transaction=True) 
}} 16 | 17 | {% if exists_as_view %} 18 | {{ exceptions.raise_compiler_error("Cannot seed to '{}', it is a view".format(old_relation)) }} 19 | {% elif exists_as_table and full_refresh_mode %} 20 | {{ adapter.drop_relation(old_relation) }} 21 | {% endif %} 22 | {% set sql = load_csv_rows(model, agate_table) %} 23 | 24 | {% call statement('main') -%} 25 | {{ adapter.get_dummy_sql() }} 26 | {%- endcall %} 27 | 28 | {% set target_relation = this.incorporate(type='table') %} 29 | {% do persist_docs(target_relation, model) %} 30 | 31 | {{ run_hooks(post_hooks, inside_transaction=True) }} 32 | {{ run_hooks(post_hooks, inside_transaction=False) }} 33 | 34 | {{ return({'relations': [target_relation]}) }} 35 | 36 | {% endmaterialization %} 37 | 38 | {% macro rockset__load_csv_rows(model, agate_table) %} 39 | {%- set column_override = model['config'].get('column_types', {}) -%} 40 | {{ adapter.load_dataframe(model['database'], model['schema'], model['alias'], 41 | agate_table, column_override) }} 42 | {% endmacro %} 43 | 44 | -------------------------------------------------------------------------------- /dbt/include/rockset/macros/materializations/table.sql: -------------------------------------------------------------------------------- 1 | {% materialization table, adapter='rockset' -%} 2 | {%- set identifier = model['alias'] -%} 3 | {%- set target_relation = api.Relation.create(database=database, schema=schema, identifier=identifier, type='table') -%} 4 | 5 | {{ run_hooks(pre_hooks) }} 6 | {{ adapter.create_table(target_relation, sql) }} 7 | 8 | {#-- Rockset does not support CREATE TABLE sql. All logic to create collections happens in adapter.create_table --#} 9 | {% call statement('main') -%} 10 | {{ adapter.get_dummy_sql() }} 11 | {%- endcall %} 12 | 13 | {{ run_hooks(post_hooks) }} 14 | 15 | {% do persist_docs(target_relation, model) %} 16 | 17 | {{ return({'relations': [target_relation]}) }} 18 | 19 | {% endmaterialization %} 20 | -------------------------------------------------------------------------------- /dbt/include/rockset/macros/materializations/view.sql: -------------------------------------------------------------------------------- 1 | {% materialization view, adapter='rockset' -%} 2 | {% set target_relation = this.incorporate(type='view') %} 3 | {{ adapter.create_view(target_relation, sql) }} 4 | 5 | {#-- Rockset does not support CREATE VIEW sql. All logic to create views happens in create_view --#} 6 | {% call statement('main') -%} 7 | {{ adapter.get_dummy_sql() }} 8 | {%- endcall %} 9 | 10 | {{ run_hooks(post_hooks) }} 11 | 12 | {% do persist_docs(target_relation, model) %} 13 | 14 | {{ return({'relations': [target_relation]}) }} 15 | {%- endmaterialization %} 16 | 17 | {% macro rockset__create_view_as(relation, sql) -%} 18 | {{ adapter.create_view(relation, sql) }} 19 | 20 | {#-- Rockset does not support CREATE VIEW sql. 
All logic to create views happens in create_view --#} 21 | {% call statement('main') -%} 22 | {{ adapter.get_dummy_sql() }} 23 | {%- endcall %} 24 | 25 | {{ run_hooks(post_hooks) }} 26 | 27 | {% do persist_docs(relation, model) %} 28 | 29 | {{ return(adapter.get_dummy_sql()) }} 30 | {%- endmacro %} 31 | 32 | -------------------------------------------------------------------------------- /dbt/include/rockset/profile_template.yml: -------------------------------------------------------------------------------- 1 | fixed: 2 | type: rockset 3 | prompts: 4 | api_key: 5 | hint: "Your apikey" 6 | hide_input: true 7 | api_server: 8 | default: api.usw2a1.rockset.com 9 | user: 10 | hint: "dev username" 11 | vi_rrn: 12 | hint: "Which Virtual Instance should operations be directed to" 13 | run_async_iis: 14 | hint: "Whether INSERT INTO queries should be run asynchronously. If your queries will exceed 2 minutes, you should enable this." 15 | threads: 16 | hint: "1 or more" 17 | type: "int" 18 | default: 1 19 | -------------------------------------------------------------------------------- /dev-requirements.txt: -------------------------------------------------------------------------------- 1 | agate==1.7.1 2 | ansicolors==1.1.8 3 | arrow==1.3.0 4 | asciitree==0.3.3 5 | asttokens==2.4.1 6 | attrs==23.1.0 7 | Babel==2.13.1 8 | backoff==2.2.1 9 | binaryornot==0.4.4 10 | certifi==2023.7.22 11 | cffi==1.16.0 12 | chardet==5.2.0 13 | charset-normalizer==3.3.2 14 | click==8.1.7 15 | colorama==0.4.5 16 | cookiecutter==2.4.0 17 | dbt-core==1.7.8 18 | dbt-extractor==0.5.1 19 | dbt-semantic-interfaces==0.4.3 20 | dbt-tests-adapter==1.7.8 21 | decorator==5.1.1 22 | executing==2.0.1 23 | future==0.18.3 24 | geojson==2.5.0 25 | greenlet==3.0.1 26 | hologram==0.0.16 27 | idna==3.4 28 | importlib-metadata==6.8.0 29 | iniconfig==2.0.0 30 | ipdb==0.13.13 31 | ipython==8.22.2 32 | isodate==0.6.1 33 | jedi==0.19.1 34 | Jinja2==3.1.2 35 | jsonschema==4.19.2 36 | jsonschema-specifications==2023.7.1 37 | leather==0.3.4 38 | Logbook==1.5.3 39 | markdown-it-py==3.0.0 40 | MarkupSafe==2.0.1 41 | mashumaro==3.12 42 | matplotlib-inline==0.1.6 43 | mdurl==0.1.2 44 | minimal-snowplow-tracker==0.0.2 45 | more-itertools==8.14.0 46 | msgpack==1.0.7 47 | networkx==2.8.8 48 | packaging==21.3 49 | parsedatetime==2.4 50 | parso==0.8.3 51 | pathspec==0.11.2 52 | pexpect==4.9.0 53 | pluggy==1.3.0 54 | prompt-toolkit==3.0.43 55 | protobuf==4.25.0 56 | ptyprocess==0.7.0 57 | pure-eval==0.2.2 58 | py==1.11.0 59 | pycparser==2.21 60 | pydantic==1.10.13 61 | Pygments==2.16.1 62 | pyparsing==3.1.1 63 | pyrsistent==0.20.0 64 | pytest==7.0.0 65 | pytest-dotenv==0.5.2 66 | python-dateutil==2.8.2 67 | python-dotenv==1.0.0 68 | python-slugify==8.0.1 69 | pytimeparse==1.1.8 70 | pytz==2023.3.post1 71 | PyYAML==6.0.1 72 | referencing==0.30.2 73 | requests==2.31.0 74 | rich==13.7.0 75 | rockset==2.1.0 76 | rockset-sqlalchemy==0.0.1 77 | rpds-py==0.10.6 78 | setuptools==69.1.1 79 | simple-term-menu==1.6.4 80 | six==1.16.0 81 | SQLAlchemy==1.4.50 82 | sqlparse==0.4.3 83 | stack-data==0.6.3 84 | text-unidecode==1.3 85 | toml==0.10.2 86 | tomli==2.0.1 87 | traitlets==5.14.2 88 | typed_ast==1.5.5 89 | types-python-dateutil==2.8.19.14 90 | typing_extensions==4.8.0 91 | urllib3==1.26.18 92 | wcwidth==0.2.13 93 | Werkzeug==2.1.2 94 | zipp==3.17.0 95 | -------------------------------------------------------------------------------- /mypy.ini: -------------------------------------------------------------------------------- 1 | [mypy] 2 | mypy_path = 
./third-party-stubs 3 | namespace_packages = True 4 | -------------------------------------------------------------------------------- /pytest.ini: -------------------------------------------------------------------------------- 1 | [pytest] 2 | filterwarnings = 3 | ignore:.*'soft_unicode' has been renamed to 'soft_str'*:DeprecationWarning 4 | ignore:unclosed file .*:ResourceWarning 5 | env_files = 6 | test.env 7 | testpaths = 8 | tests/functional 9 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | from setuptools import find_namespace_packages, setup 3 | 4 | package_name = "dbt-rockset" 5 | # make sure this always matches dbt/adapters/{adapter}/__version__.py 6 | package_version = "1.7.4" 7 | description = """The Rockset adapter plugin for dbt""" 8 | 9 | setup( 10 | name=package_name, 11 | version=package_version, 12 | description=description, 13 | long_description=description, 14 | author="Steven Baldwin", 15 | author_email="sbaldwin@rockset.com", 16 | url="https://github.com/rockset/dbt-rockset", 17 | packages=find_namespace_packages(include=["dbt", "dbt.*"]), 18 | include_package_data=True, 19 | install_requires=["dbt-core~=1.7.0", "rockset_sqlalchemy~=0.0.1", "backoff==2.*"], 20 | ) 21 | -------------------------------------------------------------------------------- /test.env: -------------------------------------------------------------------------------- 1 | # API_KEY=Y9iY...Fdwu 2 | # API_SERVER=api.usw2a1.rockset.com 3 | # VI_RRN=rrn:vi:usw2a1:8533f1e5-554e-4ccd-a557-a9fe333fbf2f 4 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rockset/dbt-rockset/6c3721a7a38bef00e27dba6ac8b2738188d74aa0/tests/__init__.py -------------------------------------------------------------------------------- /tests/conftest.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | import os 3 | from dotenv import load_dotenv, find_dotenv 4 | 5 | # Import the standard functional fixtures as a plugin 6 | # Note: fixtures with session scope need to be local 7 | pytest_plugins = ["dbt.tests.fixtures.project"] 8 | 9 | # The profile dictionary, used to write out profiles.yml 10 | # dbt will supply a unique schema per test, so we do not specify 'schema' here 11 | 12 | 13 | @pytest.fixture(scope="class") 14 | def dbt_profile_target(): 15 | return { 16 | "type": "rockset", 17 | "threads": 1, 18 | "database": "db", 19 | "api_key": os.getenv("API_KEY"), 20 | "api_server": os.getenv("API_SERVER"), 21 | "vi_rrn": os.getenv("VI_RRN"), 22 | "run_async_iis": os.getenv("USE_ASYNC", "false").lower() == "true",  # defaults to false when USE_ASYNC is unset 23 | } 24 | 25 | 26 | @pytest.fixture(scope="session", autouse=True) 27 | def load_env(): 28 | env_file = find_dotenv(".env") 29 | load_dotenv(env_file) 30 | -------------------------------------------------------------------------------- /tests/functional/adapter/test_basic.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | from dbt.tests.adapter.basic.test_singular_tests import BaseSingularTests 4 | from dbt.tests.adapter.basic.test_singular_tests_ephemeral import ( 5 | BaseSingularTestsEphemeral, 6 | ) 7 | from dbt.tests.adapter.basic.test_empty import BaseEmpty 8 | from 
dbt.tests.adapter.basic.test_incremental import BaseIncremental 9 | from dbt.tests.adapter.basic.test_generic_tests import BaseGenericTests 10 | from dbt.tests.adapter.basic.test_snapshot_check_cols import BaseSnapshotCheckCols 11 | from dbt.tests.adapter.basic.test_snapshot_timestamp import BaseSnapshotTimestamp 12 | from dbt.tests.adapter.utils.test_timestamps import BaseCurrentTimestamps 13 | from dbt.tests.util import run_dbt, check_relations_equal, relation_from_name 14 | 15 | 16 | @pytest.mark.skip() 17 | # Rockset always used timestamps with zone information included 18 | class TestCurrentTimestampsRockset(BaseCurrentTimestamps): 19 | pass 20 | 21 | 22 | class TestSingularTestsRockset(BaseSingularTests): 23 | pass 24 | 25 | 26 | class TestSingularTestsEphemeralRockset(BaseSingularTestsEphemeral): 27 | pass 28 | 29 | 30 | class TestEmptyRockset(BaseEmpty): 31 | pass 32 | 33 | 34 | class TestIncrementalRockset(BaseIncremental): 35 | pass 36 | 37 | 38 | class TestGenericTestsRockset(BaseGenericTests): 39 | pass 40 | 41 | 42 | # Snapshots unsupported 43 | @pytest.mark.skip() 44 | class TestSnapshotCheckColsRockset(BaseSnapshotCheckCols): 45 | pass 46 | 47 | 48 | # Snapshots unsupported 49 | @pytest.mark.skip() 50 | class TestSnapshotTimestampRockset(BaseSnapshotTimestamp): 51 | pass 52 | -------------------------------------------------------------------------------- /tests/functional/adapter/test_basic_overrides.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | import os 3 | from dbt.tests.adapter.basic.test_base import BaseSimpleMaterializations 4 | from dbt.tests.adapter.basic.test_adapter_methods import ( 5 | BaseAdapterMethod, 6 | models__expected_sql, 7 | models__upstream_sql, 8 | models__model_sql, 9 | ) 10 | from dbt.tests.adapter.basic.test_ephemeral import BaseEphemeral 11 | from dbt.tests.util import ( 12 | run_dbt, 13 | check_relations_equal, 14 | relation_from_name, 15 | check_result_nodes_by_name, 16 | get_manifest, 17 | check_relation_types, 18 | ) 19 | 20 | 21 | class TestBaseAdapterMethodRockset(BaseAdapterMethod): 22 | @pytest.fixture(scope="class") 23 | def tests(self): 24 | # RS cannot get columns of a relation during compilation 25 | return {} 26 | 27 | @pytest.fixture(scope="class") 28 | def models(self): 29 | return { 30 | "upstream.sql": models__upstream_sql, 31 | "expected.sql": models__expected_sql, 32 | # RS cannot get columns of a view, must materialize the model 33 | "model.sql": """ 34 | {{ config(materialized="table") }} 35 | """ 36 | + models__model_sql, 37 | } 38 | 39 | pass 40 | 41 | 42 | class TestEphemeralRockset(BaseEphemeral): 43 | # RS cannot generate catalog entries from views, requires an override 44 | def test_ephemeral(self, project): 45 | # seed command 46 | results = run_dbt(["seed"]) 47 | assert len(results) == 1 48 | check_result_nodes_by_name(results, ["base"]) 49 | 50 | # run command 51 | results = run_dbt(["run"]) 52 | assert len(results) == 2 53 | check_result_nodes_by_name(results, ["view_model", "table_model"]) 54 | 55 | # base table rowcount 56 | relation = relation_from_name(project.adapter, "base") 57 | result = project.run_sql( 58 | f"select count(*) as num_rows from {relation}", fetch="one" 59 | ) 60 | assert result[0] == 10 61 | 62 | # relations equal 63 | check_relations_equal(project.adapter, ["base", "view_model", "table_model"]) 64 | 65 | # catalog node count 66 | catalog = run_dbt(["docs", "generate"]) 67 | catalog_path = os.path.join(project.project_root, "target", 
"catalog.json") 68 | assert os.path.exists(catalog_path) 69 | # views are not in the catalog 70 | assert len(catalog.nodes) == 2 71 | assert len(catalog.sources) == 1 72 | 73 | # manifest (not in original) 74 | manifest = get_manifest(project.project_root) 75 | assert len(manifest.nodes) == 4 76 | assert len(manifest.sources) == 1 77 | 78 | pass 79 | 80 | 81 | class TestSimpleMaterializationsRockset(BaseSimpleMaterializations): 82 | # RS cannot generate catalog entries from views, requires an override 83 | def test_base(self, project): 84 | 85 | # seed command 86 | results = run_dbt(["seed"]) 87 | # seed result length 88 | assert len(results) == 1 89 | 90 | # run command 91 | results = run_dbt() 92 | # run result length 93 | assert len(results) == 3 94 | 95 | # names exist in result nodes 96 | check_result_nodes_by_name(results, ["view_model", "table_model", "swappable"]) 97 | 98 | # check relation types 99 | expected = { 100 | "base": "table", 101 | "view_model": "view", 102 | "table_model": "table", 103 | "swappable": "table", 104 | } 105 | check_relation_types(project.adapter, expected) 106 | 107 | # base table rowcount 108 | relation = relation_from_name(project.adapter, "base") 109 | result = project.run_sql( 110 | f"select count(*) as num_rows from {relation}", fetch="one" 111 | ) 112 | assert result[0] == 10 113 | 114 | # relations_equal 115 | check_relations_equal( 116 | project.adapter, ["base", "view_model", "table_model", "swappable"] 117 | ) 118 | 119 | # check relations in catalog 120 | catalog = run_dbt(["docs", "generate"]) 121 | # views aren't in the catalog 122 | assert len(catalog.nodes) == 3 123 | assert len(catalog.sources) == 1 124 | 125 | # run_dbt changing materialized_var to view 126 | # required for BigQuery 127 | if project.test_config.get("require_full_refresh", False): 128 | results = run_dbt( 129 | [ 130 | "run", 131 | "--full-refresh", 132 | "-m", 133 | "swappable", 134 | "--vars", 135 | "materialized_var: view", 136 | ] 137 | ) 138 | else: 139 | results = run_dbt( 140 | ["run", "-m", "swappable", "--vars", "materialized_var: view"] 141 | ) 142 | assert len(results) == 1 143 | 144 | # check relation types, swappable is view 145 | expected = { 146 | "base": "table", 147 | "view_model": "view", 148 | "table_model": "table", 149 | "swappable": "view", 150 | } 151 | check_relation_types(project.adapter, expected) 152 | 153 | # run_dbt changing materialized_var to incremental 154 | results = run_dbt( 155 | ["run", "-m", "swappable", "--vars", "materialized_var: incremental"] 156 | ) 157 | assert len(results) == 1 158 | 159 | # check relation types, swappable is table 160 | expected = { 161 | "base": "table", 162 | "view_model": "view", 163 | "table_model": "table", 164 | "swappable": "table", 165 | } 166 | check_relation_types(project.adapter, expected) 167 | 168 | pass 169 | -------------------------------------------------------------------------------- /tests/functional/adapter/test_query_lambda.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | import os 3 | import random, time 4 | from rockset import * 5 | from rockset.models import * 6 | from dbt.tests.adapter.basic.test_base import BaseSimpleMaterializations 7 | from dbt.tests.adapter.basic.test_adapter_methods import ( 8 | BaseAdapterMethod, 9 | models__expected_sql, 10 | models__upstream_sql, 11 | models__model_sql, 12 | ) 13 | from dbt.tests.adapter.basic.test_ephemeral import BaseEphemeral 14 | from dbt.tests.util import ( 15 | run_dbt, 16 | 
check_relations_equal, 17 | relation_from_name, 18 | check_result_nodes_by_name, 19 | get_manifest, 20 | check_relation_types, 21 | ) 22 | from dbt.tests.fixtures.project import write_project_files 23 | from dbt.tests.util import run_dbt, check_relations_equal 24 | 25 | models__base_sql = """ 26 | select 1 as num 27 | UNION 28 | select 2 as num 29 | """ 30 | 31 | 32 | # Create some tags 33 | tags = [f"rand_tag_{i}_" + str(random.randint(0, 10**4)) for i in range(3)] 34 | 35 | models__ql_sql = ( 36 | """ 37 | {{ config( 38 | materialized="query_lambda", 39 | """ 40 | + f"tags={tags}," 41 | + """ 42 | parameters=[ 43 | {'name': 'mul', 'value': '7', 'type': 'int' }, 44 | {'name': 'exclude', 'value': '1', 'type': 'int' }, 45 | ], 46 | )}} 47 | select num * :mul as result 48 | from {{ ref('base') }} 49 | where num <= :exclude 50 | """ 51 | ) 52 | 53 | ql_name = "dbt_ql" 54 | 55 | 56 | NUM_QLS = 35 57 | 58 | 59 | class TestQueryLambdaRateLimitingRockset(BaseAdapterMethod): 60 | # Creates many QLs to ensure they eventually get created even when encountering rate limiting 61 | @pytest.fixture(scope="class") 62 | def models(self): 63 | models = { 64 | "base.sql": models__base_sql, 65 | } 66 | 67 | for i in range(NUM_QLS): 68 | ql_model = ( 69 | """ 70 | {{ config( 71 | materialized="query_lambda", 72 | """ 73 | + f'tags=["{str(i)}"],' 74 | + """ 75 | parameters=[ 76 | {'name': 'mul', 'value': '7', 'type': 'int' }, 77 | {'name': 'exclude', 'value': '1', 'type': 'int' }, 78 | ], 79 | )}} 80 | select num * :mul as result 81 | from {{ ref('base') }} 82 | where num <= :exclude 83 | """ 84 | ) 85 | models[f"{str(i)}_ql.sql"] = ql_model 86 | return models 87 | 88 | def test_adapter_methods(self, project, equal_tables): 89 | run_dbt(["compile"]) # trigger any compile-time issues 90 | result = run_dbt() 91 | workspace = result.results[0].node.schema 92 | rs = RocksetClient(api_key=os.getenv("API_KEY"), host=os.getenv("API_SERVER")) 93 | # Ensure all the QLs are visible 94 | for i in range(NUM_QLS): 95 | ql_name = f"{str(i)}_ql" 96 | ql_tag = str(i) 97 | ql_retrieved = False 98 | for retry in range(5): 99 | try: 100 | resp = rs.QueryLambdas.get_query_lambda_tag_version( 101 | query_lambda=ql_name, workspace=workspace, tag=ql_tag 102 | ) 103 | ql_retrieved = True 104 | break 105 | except ApiException as e: 106 | print("Couldn't get ql " + ql_name + " : " + str(e)) 107 | time.sleep(retry) 108 | assert ql_retrieved, f"{ql_name} was not visible" 109 | 110 | 111 | class TestQueryLambdaCreationRockset(BaseAdapterMethod): 112 | @pytest.fixture(scope="class") 113 | def models(self): 114 | return { 115 | "base.sql": models__base_sql, 116 | f"{ql_name}.sql": models__ql_sql, 117 | } 118 | 119 | def test_adapter_methods(self, project, equal_tables): 120 | run_dbt(["compile"]) # trigger any compile-time issues 121 | result = run_dbt() 122 | workspace = result.results[0].node.schema 123 | rs = RocksetClient(api_key=os.getenv("API_KEY"), host=os.getenv("API_SERVER")) 124 | resp = rs.QueryLambdas.get_query_lambda_tag_version( 125 | query_lambda=ql_name, workspace=workspace, tag=tags[0] 126 | ) 127 | ql_version = resp.data.version.version 128 | tag_resp = rs.QueryLambdas.list_query_lambda_tags( 129 | query_lambda=ql_name, workspace=workspace 130 | ) 131 | returned_tags = {x.tag_name for x in tag_resp.data} 132 | assert returned_tags == (set(tags) | {"latest"}), "Expected tags must match" 133 | 134 | exec_resp = rs.QueryLambdas.execute_query_lambda( 135 | query_lambda=ql_name, version=ql_version, workspace=workspace 136 | ) 
137 | assert exec_resp.results == [{"result": 7}], "QL result must match" 138 | 139 | 140 | class TestQueryLambdaUpdatesRockset(BaseAdapterMethod): 141 | @pytest.fixture(scope="class") 142 | def models(self): 143 | return { 144 | "base.sql": models__base_sql, 145 | f"{ql_name}.sql": models__ql_sql, 146 | } 147 | 148 | def test_adapter_methods(self, project, equal_tables): 149 | result = run_dbt(["compile"]) # trigger any compile-time issues 150 | workspace = result.results[0].node.schema 151 | # Create a QL up front, so the dbt run has to update it rather than create a new one 152 | rs = RocksetClient(api_key=os.getenv("API_KEY"), host=os.getenv("API_SERVER")) 153 | # Wait for workspace to be created from dbt compile 154 | for _ in range(10): 155 | time.sleep(1) 156 | if workspace in {ws.name for ws in rs.Workspaces.list().data}: 157 | break 158 | resp = rs.QueryLambdas.create_query_lambda( 159 | name=ql_name, 160 | workspace=workspace, 161 | sql=QueryLambdaSql(query="SELECT 1", default_parameters=[]), 162 | ) 163 | result = run_dbt() 164 | resp = rs.QueryLambdas.get_query_lambda_tag_version( 165 | query_lambda=ql_name, workspace=workspace, tag=tags[1] 166 | ) 167 | ql_version = resp.data.version.version 168 | tag_resp = rs.QueryLambdas.list_query_lambda_tags( 169 | query_lambda=ql_name, workspace=workspace 170 | ) 171 | returned_tags = {x.tag_name for x in tag_resp.data} 172 | assert returned_tags == (set(tags) | {"latest"}), "Expected tags must match" 173 | 174 | exec_resp = rs.QueryLambdas.execute_query_lambda( 175 | query_lambda=ql_name, version=ql_version, workspace=workspace 176 | ) 177 | assert exec_resp.results == [{"result": 7}], "QL result must match" 178 | ql_versions = rs.QueryLambdas.list_query_lambda_versions( 179 | query_lambda=ql_name, workspace=workspace 180 | ) 181 | assert len(ql_versions.data) == 2, "Two versions should have been created" 182 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [tox] 2 | skipsdist = True 3 | envlist = py37,py38,py39,py310,py311 4 | 5 | [testenv:{unit,py37,py38,py39,py310,py311,py}] 6 | description = unit testing 7 | skip_install = true 8 | passenv = 9 | DBT_* 10 | PYTEST_ADDOPTS 11 | commands = {envpython} -m pytest {posargs} tests/unit 12 | deps = 13 | -rdev-requirements.txt 14 | -e. 15 | 16 | [testenv:{integration,py37,py38,py39,py310,py311,py}-{rockset}] 17 | description = adapter plugin integration testing 18 | skip_install = true 19 | passenv = 20 | DBT_* 21 | ROCKSET_TEST_* 22 | PYTEST_ADDOPTS 23 | commands = 24 | rockset: {envpython} -m pytest {posargs} -m profile_rockset tests/integration 25 | rockset: {envpython} -m pytest {posargs} tests/functional 26 | deps = 27 | -rdev-requirements.txt 28 | -e. 29 | --------------------------------------------------------------------------------
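For reference, a minimal sketch of how the functional test suite above might be exercised locally, based on `pytest.ini`, `tests/conftest.py`, and `test.env`. The exact workflow is an assumption rather than a documented procedure, and the exported values are placeholders:

```bash
# Install the adapter in editable mode along with the pinned dev dependencies (assumed workflow)
pip3 install -r dev-requirements.txt -e .

# tests/conftest.py reads these variables; test.env only ships commented-out samples
export API_KEY=<your Rockset API key>      # placeholder
export API_SERVER=api.usw2a1.rockset.com   # default api_server, per profile_template.yml
export USE_ASYNC=false                     # controls run_async_iis in the test profile
# export VI_RRN=<virtual instance RRN>     # optional, only if targeting a specific VI

# pytest.ini points testpaths at tests/functional
python -m pytest tests/functional
```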