├── LICENSE ├── README.md ├── array_type └── array_type.md ├── bytecode_interpreter └── bytecode_interpreter.md ├── codegen_cache └── codegen_cache.md ├── postgres_parser └── postgres_parser.md ├── ssl └── ssl.md ├── template.md └── zone_maps └── zone_maps.md /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# peloton-design

Before you work on a major feature, create a pull request for your design doc in this repo.

Create a folder for each design doc, with the doc itself as a .md file and any resources (a diagram you include in the .md file, for example) in the folder. If you use drawing software, please include the source file so others can modify it later.
--------------------------------------------------------------------------------
/array_type/array_type.md:
--------------------------------------------------------------------------------
# Array Type

## Overview
Array type is a type for `Value`, meaning that a `Value` can represent a variable-length array of a built-in or user-defined base type, enum type, or composite type. We add array type to replace hacks that convert an array to VARCHAR, and further to support queries containing arrays.

## Scope
The modifications that replace the hacks are mainly based on `Type` and `Value`; supporting queries also involves `expression`/`parser`/`planner`/`traffic_cop`. The corresponding type ids need to be added.

We create an `ArrayType` class inherited from the `Type` class, incorporate array type into the constructors and destructor of `Value`, and update `ValueFactory`.

The implementation can be broken down into the following:

### ArrayType Class ###
* Provides functions to express, access and manipulate arrays
* Overloads virtual comparison functions
* Defines how to serialize or deserialize arrays into or from a memory pool

### Supporting Queries ###
* *expression*: add `ArrayExpression`, inherited from `AbstractExpression`, to wrap a `Value` of array type.
* *parser*: add functions that take in a Postgres ArrayExpr primnode and transform it into a Peloton `AbstractExpression`
* *planner*: add a branch in the insert plan for the array expression type
* *traffic_cop*: add selections for array field_types

## Architectural Design
Arrays are backed by vectors, and we store a pointer to the backing vector inside `Value`.

`manage_data` indicates whether we retain ownership of the vector when passing it in to create a value. We should set it to `true` when we want to keep ownership, which means we are responsible for making sure no memory leak happens ourselves. For example, we can declare a vector variable, in which case the memory is allocated on the stack and deallocated automatically; otherwise we have to delete the vector manually after allocating it with `new`. On the other hand, it should only be `false` when the vector is allocated on the heap and we do not want to be responsible for deallocating it once the value it belongs to is created. The rationale for the latter option is performance: only pointers to vectors are passed through the whole routine, rather than actual values, and the vector's memory is released when the value is destructed.

`SerializeTo` in `ArrayType` serializes an array value into a given storage space. The first 4 bytes hold the total number of elements in the array, and the remaining bytes hold the elements. The element type can be determined from the type id, an attribute of `Value` that is accessible during serialization and deserialization, so we do not have to store it in the space. `DeserializeFrom` performs the reverse: it deserializes a value of the given type from a given storage space (a toy sketch of this layout appears below).

## Design Rationale
The goal of this design is to fit the array type implementation smoothly into the existing type module.

We adopt the single-type-id scheme for array types to avoid function overloading and routine changes; that is to say, we have a separate type id for every primitive-type array, while all of them share the same `ArrayType` class.
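As an illustration of the layout described above (a 4-byte element count followed by the raw elements), here is a minimal self-contained sketch. It hard-codes `int32_t` elements and a plain byte buffer for brevity; the real `ArrayType` code dispatches on the `Value`'s type id instead, so function names and signatures here are illustrative assumptions, not the actual Peloton API.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Writes: [4-byte element count][elements...]. The element type is not
// stored; it is recovered from the Value's type id during deserialization.
void SerializeIntArray(const std::vector<int32_t> &array, char *storage) {
  int32_t count = static_cast<int32_t>(array.size());
  std::memcpy(storage, &count, sizeof(count));
  std::memcpy(storage + sizeof(count), array.data(),
              count * sizeof(int32_t));
}

// Reads the same layout back in reverse.
std::vector<int32_t> DeserializeIntArray(const char *storage) {
  int32_t count;
  std::memcpy(&count, storage, sizeof(count));
  std::vector<int32_t> array(count);
  std::memcpy(array.data(), storage + sizeof(count), count * sizeof(int32_t));
  return array;
}
```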
## Testing Plan
### Serialization and Deserialization ###
`SerializeTest` serializes array values and then deserializes them to check that they remain the same.

### Element Retrieval ###
`GetElementTest` creates vectors of different types and inserts *n* elements into each vector. Each vector is used to create a value, and each element retrieved from the value should be equal to the matching element in the vector.

### Comparison Functions ###
`CompareTest` contains various comparison function tests.

### Client Queries ###
`QueryTest` tests queries coming in from SQL that contain arrays.

## Trade-offs and Potential Problems

## Future Work
Array types are fully supported internally. For queries from clients, however, support currently exists only for the traditional interpreter; additional work is required for codegen.

We can support multidimensional arrays by adding a `TypeId::ARRAYARRAY` type id, where the owning `Value` contains a vector of `Value` objects. For example, to represent a two-dimensional array of integers, each `Value` in the vector can have a `TypeId::INTEGERARRAY` type id and contain a vector of integers. If we want to represent a three-dimensional array, each `Value` in the vector can have a `TypeId::ARRAYARRAY` type id again, and so on.

The following functions need to be overloaded if we want queries involving arrays to benefit from the optimizer:
```
virtual bool operator==(const AbstractExpression &rhs) const;
virtual bool operator!=(const AbstractExpression &rhs) const {
  return !(*this == rhs);
}
virtual hash_t Hash() const;
virtual bool ExactlyEquals(const AbstractExpression &other) const;
virtual hash_t HashForExactMatch() const;
```
--------------------------------------------------------------------------------
/bytecode_interpreter/bytecode_interpreter.md:
--------------------------------------------------------------------------------
# Bytecode Interpreter

## Overview

The Bytecode Interpreter makes it possible to interpret the LLVM IR generated by codegen instead of compiling it.

## Scope
* All files are in the namespace `peloton::codegen::interpreter`
* `codegen::Query` can decide whether to compile and run the query or to interpret it.

## Architectural Design

The Bytecode Interpreter consists of three components (BytecodeFunction, BytecodeBuilder and BytecodeInterpreter), which are explained in detail below.

### 1. BytecodeFunction
Holds all the information required to interpret a function, and is completely independent of the CodeContext it was created from (except for tracing information in Debug mode). Once created, it can be executed many times. However, it is not supposed to be saved, but rather to be created on-the-fly before execution.
It contains:

* The bytecode itself
* Information needed to create the activation record:
  * Number of value slots to create
  * Number of function arguments (to check that the given number is correct)
  * Which constants to initialize
* Call contexts for external function calls
* Other Bytecode Functions for internal function calls
* In Debug mode: tracing information

All references in the Bytecode Function are made with indexes:

* Values are referenced with indexes into the value slots
* Instructions are referenced with indexes into the instruction stream

#### Bytecode Internals

All bytecode instructions are defined in `bytecode_definitions.def`. Most instructions are typed and get automatically expanded to their type instances. Typed instances are named with a `_` type suffix, e.g. `add_i8`. Type expansion is supported for integer types, floating point types and both.

Most instructions follow the pattern of having a 2-byte opcode and several 2-byte arguments. Instructions usually fit in a single instruction slot; some instructions, however, require two or more instruction slots.

*There is no documentation about the structure of all bytecode instructions. To find out about the arguments of a bytecode instruction, find the corresponding translate function in the BytecodeBuilder and the handler in the BytecodeInterpreter.*

The 2-byte arguments in the bytecode instructions can be one of the following:

1. A value slot index (most arguments in most instructions)
2. An instruction slot index (for branch instructions)
3. An immediate value (only for alloca, gep_offset and extractvalue)

Value slot zero is a dummy slot. It is used for unused values that have to be put somewhere. This way we avoid handling the cases where we do not want to use a value. A sketch of a possible encoding and dispatch loop follows.
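To make the format above concrete, here is a small self-contained sketch of how such fixed-size instructions might be encoded and dispatched. The slot size, opcode names and handler bodies are illustrative assumptions, not the actual Peloton definitions.

```cpp
#include <cstdint>
#include <vector>

// Assumed encoding: a 2-byte opcode followed by 2-byte arguments,
// packed into one fixed-size instruction slot (8 bytes assumed here).
enum class Opcode : uint16_t { add_i8, branch, ret };

struct Instruction {
  Opcode op;
  uint16_t args[3];  // value slot indexes, an instruction slot index,
                     // or an immediate, depending on the opcode
};
static_assert(sizeof(Instruction) == 8, "fits one instruction slot");

// Toy dispatch loop over value slots, in the spirit of the interpreter.
uint64_t Execute(const std::vector<Instruction> &code,
                 std::vector<uint64_t> &slots) {
  size_t ip = 0;  // instruction slot index, not a raw machine pointer
  for (;;) {
    const Instruction &inst = code[ip];
    switch (inst.op) {
      case Opcode::add_i8:  // a typed instance expanded from "add"
        slots[inst.args[0]] = static_cast<uint8_t>(
            slots[inst.args[1]] + slots[inst.args[2]]);
        ++ip;
        break;
      case Opcode::branch:
        ip = inst.args[0];  // argument is an instruction slot index
        break;
      case Opcode::ret:
        return slots[inst.args[0]];
    }
  }
}
```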
### 2. BytecodeBuilder
Creates a BytecodeFunction object from an LLVM function inside a CodeContext, which can then be handed to the BytecodeInterpreter.

`interpreter::BytecodeFunction bytecode = interpreter::BytecodeBuilder::CreateBytecodeFunction(code_context_, query_funcs_.plan_func);`

If the LLVM function cannot be translated (because of missing features/types), a `NotSupportedException` is thrown.

The BytecodeBuilder works in 4 steps:

1. Analysis
2. Register Allocation
3. Translation
4. Finalization

Internally, the BytecodeBuilder assigns indexes to LLVM values and instructions, which LLVM itself only accesses through raw pointers. These indexes are used to access values and instructions in a contiguous way, to merge several LLVM values, and to compute liveness. They stay inside the BytecodeBuilder and do not end up in the BytecodeFunction; they are completely independent of the indexes in the BytecodeFunction!

#### 1. Analysis
Analyzes the LLVM function and gathers additional information, but does not create bytecode yet.

* Determines liveness of all LLVM values (definition and last usage)
  * Linear scan algorithm using reverse post order traversal of basic blocks
  * The scheduling of the basic blocks is determined by the reverse post order traversal to make this work
* Merges LLVM values that are equivalent, e.g. when LLVM instructions translate to NOPs
* Merges LLVM constants that have the same value

#### 2. Register Allocation
Global register allocation: maps every LLVM value in the LLVM function to a value slot (register). (Naive register allocation can be turned on for debugging purposes.)

#### 3. Translation
Translates every LLVM instruction into a bytecode instruction (if it is not a NOP) and places it in the bytecode stream.

As the translation is done in one pass and the instructions can have different sizes, the destination instruction index for branch instructions is not known when the branch instruction is created. Therefore these relocations are saved and applied at the end of the pass, when all indexes are known.

At the end of each basic block, mov instructions are created to resolve the Phis referencing that basic block.

Because of the Phi swap problem (lost copy), it can happen that additional value slots are needed during translation that have not been mapped by the register allocation. The number of additional temporary value slots is tracked and added to the overall number of value slots during finalization.

#### 4. Finalization
Calculates the overall number of required value slots (including the temporary ones added in the translation pass) and prepares the data structures in the Bytecode Function for creating the activation record.

### 3. BytecodeInterpreter
Takes a Bytecode Function and executes the function it was created for by interpreting the bytecode.

`return_value = interpreter::BytecodeInterpreter::ExecuteFunction(context, {args, ...});`

For every invocation of a function, a new activation record is created, so recursive calls are possible.

## Debugging
For debugging, the contents of a Bytecode Function can be dumped to a file by calling `bytecode.DumpContents()`.

At log level LOG_TRACE the interpreter will log every single executed instruction and every value assignment. This produces a lot of output! The tracing will become more detailed in future updates.

## Testing Plan
Most methods in the Bytecode Interpreter and Builder are hard to test in isolation, as they require a lot of context. The test cases therefore mainly cover edge cases in translation and execution.

## Trade-offs and Potential Problems
Known limitations:
* LLVM vector types are not supported
* LLVM values are restricted to a maximum of 8 bytes
* A workaround for overflow intrinsics is available

*Every query currently created by our test cases, except OrderBy, is supported.*

## Future Work
* Add functors to allow OrderBy execution
* Add call wrappers for the 20% most-called external functions to avoid libffi overhead.
* Compile queries in the background and switch to native execution once compilation finishes.
* Create an LLVM-like execution trace and/or a debugger interface

## Glossary
RPO
: Reverse Post Order - a way to traverse the control flow graph, needed for linear-scan register allocation

IP
: Instruction Pointer - refers to the interpreter IP (not the actual IP of the processor)
--------------------------------------------------------------------------------
/codegen_cache/codegen_cache.md:
--------------------------------------------------------------------------------
# Codegen Cache and Parameterization

## Overview
This feature improves the performance of query compilation in Peloton's codegen execution engine by caching compiled queries and reusing them for subsequent user requests. Constants and parameters are parameterized, so that queries with a _similar_ plan also get the benefit of caching.

## Scope
The query cache feature is based on plan/expression/schema/table comparisons. These classes provide hash functions and equality (and inequality) operators so that the query cache can compare plans using a map. The plans also provide a function to retrieve parameter values out of them.

The parameterization feature requires extracting parameter information from the plans and setting up a parameter value storage. The codegen translators retrieve the values from this storage at runtime.

In brief, it contains modifications to the following modules:

### Plan Comparison ###
* *catalog* provides hash/equality checks on `Schema`
* *storage* provides hash/equality checks on `DataTable`
* *expression* provides hash/equality checks on expressions such as `AbstractExpression`, `CaseExpression`, `ConstantValueExpression`, `ParameterValueExpression` and `TupleValueExpression`
* *planner* provides hash/equality checks on plans such as `AggregatePlan`, `DeletePlan`, `HashJoinPlan`, `HashPlan`, `InsertPlan`, `OrderByPlan`, `ProjectInfo`, `ProjectionPlan`, `SeqScanPlan` and `UpdatePlan`

### Parameter Retrieval ###
* *planner* provides functions that retrieve parameter values/information
* *executor* retrieves parameter information and hands it over to codegen
* *codegen* builds up a value cache for the runtime to read the values from at execution

### Query Cache and Execution ###
* *execution* checks the query cache before executing a query
* *codegen* provides the query cache and the parameter value cache. It also adds a parameter value translator and changes the expression translators to retrieve the cached parameter values

### Refactoring INSERT ###
* *planner* modifies `InsertPlan` to store values in a vector, rather than in a `storage::Tuple`, when values are provided directly from SQL
* *execution* modifies the legacy `InsertExecutor` to receive a vector of values from the planner, without removing the old way of receiving a `storage::Tuple`
* *codegen* is also modified to receive a vector of values, so that the values can be parameterized

## Architectural Design
The overall architecture of Peloton and the entire flow of query execution remain the same as before, but execution now bypasses the compilation stage when an _equal_ query has been executed before. An equal query here is defined as a query with the same plan, but possibly with different constant values and/or different parameter values.
The decision whether the cached queries are used is made in `executor::PlanExecutor` before a codegen query execution.

`QueryCache` keeps all the compiled queries in a hash table, which is global in Peloton. A search for an equal query is performed via `Hash()` and `operator==`, which are provided by the plan being compared (a sketch of this plan-keyed map appears at the end of this section). Once the equality comparison succeeds, `PlanExecutor` obtains the previously compiled query, a `Query` object, from the cache, and this query object is executed through its `Execute()` function. The `Query` object contains a set of compiled functions that can be executed inside LLVM.

`QueryParameters` contains all the parameter information and the constant values, extracted from the provided plan, as well as the parameter values from the original parameter value vector in Peloton. This also happens before the codegen execution engine is involved.

`ParameterCache` stores the values from `QueryParameters`, which get indexed so that the codegen translators can retrieve the actual values by index. The structure is statically determined at compile time, and dynamically changing its size is not possible in the current design. In other words, the number of parameters is fixed at compile time. This is one reason why we currently have a separate codegen executable for `Insert` for each bulk-insert tuple count.

In addition, `InsertPlan` is refactored not to build a `storage::Tuple` when the query comes in from SQL. It stores the values in a vector, and these values get parameterized.
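As a rough illustration of how such a plan-keyed cache can be organized, here is a minimal sketch. `Plan` and `CompiledQuery` are stand-ins for the actual Peloton classes (the plan nodes and `codegen::Query`), and the dummy bodies are placeholders; only the `Hash()`/`operator==` wiring into the map is the point.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>

// Stand-in for a plan: provides deep hashing and structural equality.
struct Plan {
  size_t Hash() const { return 0; /* combine hashes of all plan nodes */ }
  bool operator==(const Plan &) const { return true; /* deep comparison */ }
};

// Stand-in for codegen::Query: holds the compiled LLVM functions.
struct CompiledQuery {};

struct PlanPtrHash {
  size_t operator()(const std::shared_ptr<Plan> &p) const { return p->Hash(); }
};
struct PlanPtrEqual {
  bool operator()(const std::shared_ptr<Plan> &a,
                  const std::shared_ptr<Plan> &b) const {
    return *a == *b;  // structural equality, not pointer identity
  }
};

// One global cache per Peloton instance, keyed by the plan itself.
using QueryCacheMap =
    std::unordered_map<std::shared_ptr<Plan>, std::unique_ptr<CompiledQuery>,
                       PlanPtrHash, PlanPtrEqual>;
```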
## Design Rationale and Limitation
The goal of this design is to fit a query cache implementation smoothly into the existing Peloton architecture. The logic for plan comparison is implemented in the plans themselves, so that it can be updated whenever the plans change.

The cache does not rewrite a plan to improve cache hit rates; e.g., 2+2 is different from 4 from the perspective of `QueryCache`. It also does not re-order expressions to find a match in the cache. We consider these the role of the optimizer.

## Testing Plan
### Plan Comparisons ###
All the codegen-supported queries are tested in `test/planner/planner_equality_test`. Not all permutations are tested, but most of the basic ones and many complex ones are covered.

### Parameter Retrieval ###
`test/codegen/parameterizaton_test` contains various parameterization tests.

### Query Cache and Execution ###
`test/codegen/query_cache_test` contains query cache tests, including execution.

### Refactoring INSERT ###
`test/sql/insert_sql_test` contains most of the INSERT-related SQL tests. `test/codegen/insert_translator` and `test/executor/insert_test` test the codegen and the old execution engine, respectively, after the refactoring.

## Trade-offs and Potential Problems
The query cache capacity is unbounded unless it is explicitly set from outside. In other words, the query cache itself provides an API to resize the cache, but no admin feature is implemented on top of it.

## Future Work
We could pre-populate the cache with some basic and essential queries when a table is created. One obvious such query is an INSERT, since a data table is useless if there is no tuple in it.

The query cache is global in the current implementation; that is, there is one query cache per Peloton instance. A query cache could instead be instantiated per table, so that it is naturally destroyed when the table is destroyed, and so the cache can be managed at a finer granularity, e.g. size, turning it on/off, etc.

The query cache could also be persisted to storage and loaded back into the memory cache when Peloton reboots. Checkpointing the cache to storage at regular intervals in the background would suffice.

## Glossary
* Query cache: a cache of query objects containing compiled code for codegen execution; this is different from a physical plan cache.
--------------------------------------------------------------------------------
/postgres_parser/postgres_parser.md:
--------------------------------------------------------------------------------
# Postgres Parser

## Overview
The Postgres Parser component serves as the parsing module for Peloton. Before Postgres Parser, the system ran with a customized parser that did not have good enough coverage of the necessary types of queries and had some flaws. Postgres Parser was created to provide reliable parsing for standard PostgreSQL queries so that the team can run a range of benchmarks on the system. Since Postgres Parser is not a parser of our own but a wrapper around Postgres' parser, it is only a temporary solution; if we want anything more than what PostgreSQL supports, we will need a new or additional parser module.

## How It Works
### Pipeline ###
* Get Postgres' parse tree by feeding the input query to the *pg_query_parse* function.
* Pass the Postgres parse tree, which is a `List` pointer, to the *ListTransform* function.
* *ListTransform* traverses the list of parse nodes and calls *NodeTransform* on each one of them.
* *NodeTransform* then calls the corresponding transform function for the type of parse node.

### Structure ###
Every transform function shares the same basic logic: traverse each field of the generated Postgres parse tree node and fill in the corresponding field of the Peloton parse tree. If any subsequent transform is necessary (e.g. the transform of a sub-select in a select query), the transform function calls the corresponding transform function and uses its result to fill in the corresponding field.

Note that each transform function is only aware of its own sub-tree of the parse tree and knows nothing about the rest of the parse tree. This design accommodates the recursive nature of SQL parse trees and makes all transform functions reusable. For example, a function responsible for transforming a constant value parse node can be used by the transform functions of both select queries and insert queries, and it is not, and should not be, aware of which type of query it is working on. A toy sketch of this dispatch follows.
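The following is a deliberately simplified model of that dispatch. The node tags, types and signatures only loosely mirror the real libpg_query and Peloton names; it is a sketch of the pattern, not the actual PostgresParser code.

```cpp
#include <stdexcept>

// Toy stand-ins for the libpg_query node tags and Peloton statement types.
enum NodeTag { T_SelectStmt, T_InsertStmt };
struct Node { NodeTag type; };
struct SQLStatement { virtual ~SQLStatement() = default; };

// One transform per node type; the raw pointers mirror the current code.
SQLStatement *SelectTransform(Node *) { return new SQLStatement; }
SQLStatement *InsertTransform(Node *) { return new SQLStatement; }

// Dispatch on the node tag; unsupported node types surface as exceptions.
SQLStatement *NodeTransform(Node *node) {
  switch (node->type) {
    case T_SelectStmt: return SelectTransform(node);
    case T_InsertStmt: return InsertTransform(node);
    default:
      throw std::runtime_error("NotImplementedException: unsupported node");
  }
}
```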
### Exception Handling and Memory Management ###
The current parser module only covers the features that our system has encountered so far; there will certainly be features, query types, or parse nodes not supported by the current version. The module is designed to throw a NotImplementedException in such cases, so that the user or developer is made aware. On the other hand, when an input query itself has problems, the parser module should be able to identify that and throw an exception.

With the above being said, an exception can happen at any point of the transform, so proper exception handling is very important. There are two situations where the current version of the parser module may raise exceptions. One is when the input query has grammar problems; in this case the *libpg_query* library automatically marks the "error" field in the generated Postgres parse tree, and our module takes it from there and throws an exception with the corresponding error message. The other is when some transform function encounters a type of parse node or a feature that is not supported yet and throws a NotImplementedException. However, this can result in memory leaks if the exceptions are not handled properly.

The solution to this has two phases. In the first phase, we add memory management code along the calling paths so that no memory leaks happen when exceptions occur; following the idea of defensive programming, every call to a transform function should be wrapped in a try block, with the necessary cleanup performed when an exception is caught. In the second phase, we migrate the module to smart pointers so that we can rely on the automatic memory management provided by the standard library; this requires a big change to the current code and therefore takes longer to complete, but the principle is simple: change the raw pointers to smart pointers. A sketch of this second phase is shown below.
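Reusing the toy types from the dispatch sketch above, the phase-two direction might look like this; the names are hypothetical, and the point is only that a partially built statement is released automatically if a child transform throws.

```cpp
#include <memory>
#include <vector>

struct Expression {};
struct SelectStatement : SQLStatement {
  std::vector<std::unique_ptr<Expression>> select_list;
};

// May throw NotImplementedException in the real code.
std::unique_ptr<Expression> ExpressionTransform(Node *) {
  return std::make_unique<Expression>();
}

// If ExpressionTransform throws partway through, `result` and all
// expressions collected so far are freed automatically by unique_ptr.
std::unique_ptr<SQLStatement> SelectTransformSafe(Node *node) {
  auto result = std::make_unique<SelectStatement>();
  result->select_list.push_back(ExpressionTransform(node));
  return result;
}
```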
## Development Guidelines
### Adding Support for Extra Node Types in Existing Transform Functions ###
One common type of addition to the parser module arises when some queries introduce a new type of child parse node to a kind of parse tree that already has a corresponding transform function in our system, and the new child type also has an existing transform function. This is the easiest case: we simply add a new case to the switch logic in the caller's transform function to call the proper existing transform function for the new child type. If the child in turn encounters new types of children, follow this same guideline.

A good example is that we already supported transforms for TypeCast nodes, and we also had support for ResTarget nodes; however, we were not aware of the case where a TypeCast can be a target. As we already had transform functions for both TypeCast and ResTarget, we could simply add this case to the transform function of ResTarget and let it call TypeCast's transform function when a target is of TypeCast type.

### Adding Support for a New Type of Parse Node ###
For this case you should first find the corresponding definition for the parse node type in *third_party/libpg_query/src/postgres/include/nodes/parsenodes.h* and copy the necessary definitions to *src/include/parser/parsenodes.h*. Notice that if the new type of parse node depends on some other types, you should copy those as well.

Then you should add a new function in *postgresparser.h/postgresparser.cpp* to support this type of parse node. The general naming convention for transform functions is the parse node type name followed by "Transform". For example, the transform function for TypeCast is TypeCastTransform. The transform function usually takes in a pointer to the Postgres parse node type and returns a pointer to the corresponding type of Peloton parse node. For example, SelectTransform, the transform function for SelectStmt nodes, returns a pointer to SelectStatement, which is Peloton's own type for select queries. (Note that SelectStmt's transform is not SelectStmtTransform, because SelectTransform is shorter and easier to understand; for all other statement types, likewise drop the "Stmt" part from the transform function's name.)

### Adding Support for New Types of Statements/Expressions ###
This is similar to adding support for a new type of parse node, but notice that sometimes adding support for a new type of statement also requires additional statement types or a re-organization of the statement or expression systems. Please discuss with the whole team should such needs arise.

### Style Guides for Transform Functions ###
The most important local variable in every transform function is its final result, a pointer to some kind of Peloton parse node. To keep things consistent and easy to understand, please use "result" as the name of this local variable.

To keep the interfaces clean and easy to understand, every transform function should explicitly take the corresponding parse node type as its input. For example, ConstTransform is the transform function for *A_Const* parse nodes, so it takes a pointer to an *A_Const* parse node as its argument.


## Future Work
Work that should be done soon is the refactoring of all CREATE-related statements. There should be a base class (CreateStatement) and various derived classes (CreateTableStatement, CreateIndexStatement, CreateSchemaStatement, etc.).

This module should still be around for a good amount of time so that we have reliable parsing while we complete our own parse node system. The main framework is pretty much done for the time being; most of the future work on this module will be adding support for newly encountered types of queries and parse nodes.

On the other hand, Peloton will eventually need its own parsing module, as there may be a need to parse queries that are not legal PostgreSQL; this will require a substantial amount of time and effort. Before we totally get rid of the Postgres Parser module, we can have a two-level design: the first level is still the Postgres Parser, while the second level is a customized parser only for the new needs. This way we can implement customized grammars in the second level while still having full support for normal SQL queries in the first level.
--------------------------------------------------------------------------------
/ssl/ssl.md:
--------------------------------------------------------------------------------
# SSL Connection

## Overview
SSL connections provide secure communication between the server and the client.

## Scope
This feature is in the Network Layer, mainly implemented in Network Manager, Network Connection and Protocol Handler.

## Introduction
### SSL Handshake ###
The SSL or TLS handshake enables the SSL or TLS client and server to establish the secret keys with which they communicate. In the process, they agree on the protocol version and cryptographic algorithms, perform certificate authentication, and use asymmetric encryption techniques to generate a shared secret key.
### SSL Read (non-blocking) ###
SSL data needs to be decrypted and verified before it is delivered to the application. The entire SSL record is needed for decryption and verification, so even if the application wants to read only one byte, OpenSSL needs to receive from the connection the entire SSL record containing that byte. After OpenSSL decrypts and verifies the record, it places it in its internal SSL buffer and returns the amount of data requested to the application. The SSL buffer is maintained by OpenSSL, and the operating system does not know the status of that buffer. Thus, if all the data has been read from the network buffer into the SSL buffer, select() will not report the socket as readable even though decrypted data may still be waiting in the SSL buffer. OpenSSL provides the function SSL_pending() to indicate whether there is unread data left in this buffer.

The error queue needs to be cleared before checking the error code of the current operation. Some error codes are explained here:
* *SSL_ERROR_WANT_WRITE* indicates that OpenSSL is performing a rehandshake and needs to write during the rehandshake. Call SSL_read() again when the socket becomes writable.
* *SSL_ERROR_WANT_READ* indicates that the network buffer is empty and the call would have blocked if we had set the network socket to be blocking.

### SSL Write (non-blocking) ###
Similar issues exist for SSL Write as for Read. If OpenSSL wants to write a record but there is not enough space in the network buffer, it raises an SSL_ERROR_WANT_WRITE error. It sends out the part of the data that fits in the network buffer and needs the application to call SSL_write() again with the same application buffer to send out the rest of the SSL packet. OpenSSL automatically remembers where the buffer write pointer was and only writes the data after the write pointer.

* *SSL_ERROR_WANT_WRITE* indicates that we have unflushed data in the SSL buffer. We need to call SSL_write() again with the same application buffer.
* *SSL_ERROR_WANT_READ* indicates that OpenSSL is performing a rehandshake and needs to read during the rehandshake. Call SSL_write() again when the socket becomes readable.

A sketch of this retry pattern is shown below.
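Here is a minimal sketch of the non-blocking retry pattern just described, using standard OpenSSL calls. Error handling is condensed, and the real NetworkConnection code additionally drives the network state machine.

```cpp
#include <openssl/err.h>
#include <openssl/ssl.h>

// Returns bytes read, 0 if the caller should retry once the socket is
// ready again (WANT_READ/WANT_WRITE), or -1 on a fatal error.
int SslReadSome(SSL *ssl, void *buf, int len) {
  ERR_clear_error();  // clear the error queue before checking error codes
  int n = SSL_read(ssl, buf, len);
  if (n > 0) {
    // Decrypted data may remain buffered inside OpenSSL even though the
    // kernel buffer is empty, so also check SSL_pending() before select().
    return n;
  }
  switch (SSL_get_error(ssl, n)) {
    case SSL_ERROR_WANT_READ:   // wait until the socket is readable
    case SSL_ERROR_WANT_WRITE:  // rehandshake: wait until it is writable
      return 0;
    default:
      return -1;
  }
}
```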
### SSL Rehandshake ###
Automatic rehandshake support is built into SSL_read() and SSL_write().

### SSL Authentication ###
The server can set the level of certificate-based client authentication, i.e. whether to attempt or to enforce certificate-based client authentication. This is done via SSL_CTX_set_verify().

### SSL Session ###
OpenSSL uses a session object to store the session information that defines a set of security parameters; a session can be shared by multiple SSL connections. Session objects need to be associated with SSL connections or an SSL_CTX using a session ID context.

## Architectural Design
The SSL component is added to the existing Network Layer as follows:

### NetworkManager ###
Maintains a global SSL context which loads the server certificate, the private key file, and the settings for client authentication, server session caching, callback registration for multithreaded environments, etc. for all connections. The SSL context is initialized when Peloton starts up.

### NetworkConnection ###
1. For the startup packet, if its version code indicates the SSL protocol, check whether the server supports SSL and prepare for the SSL handshake.
2. Perform SSL reads and writes in FillReadBuffer() and FlushWriteBuffer() for SSL connections.
3. For SSL connections, the state transitions in the StateMachine manually check whether a rehandshake is in progress and whether there is unread data in the SSL buffer.

### ProtocolHandler ###
Processes the SSL request packet at the beginning of an SSL connection.

### Certificates and Keys ###
Certificates and keys for the root, server and client are copied into the `/peloton/data` directory, following the PostgreSQL design. Certificates and keys are created for Peloton's default SSL setting and for testing. `/peloton/data/intermediate_openssl.cnf` and `/peloton/data/openssl.cnf` are Peloton's default configuration files used in the certificate creation process. Peloton has two bash scripts, `/peloton/script/installation/create_certificates.sh` and `/peloton/script/installation/create_certificates_test.sh`. create_certificates.sh creates the default required certificates in `/peloton/data`, while create_certificates_test.sh creates a complete directory in `/peloton/data`, including intermediate files, for testing.

### Multithread Support ###
Some global data structures (the error queue, the global SSL_CTX, etc.) are not thread-safe in OpenSSL. OpenSSL uses mutexes to protect these structures and requires applications to provide two extra callbacks (CRYPTO_set_id_callback and CRYPTO_set_locking_callback). Peloton uses static locks, which are initialized in NetworkManager.

There is also a version issue: for OpenSSL versions greater than 1.1.0, the application no longer needs the extra callbacks above. On Linux, APT currently ships a version lower than 1.1.0, and even new versions of Mac OS X ship an older OpenSSL. Newer interfaces like `CRYPTO_THREADID_set_callback` are therefore not available, so we still use the deprecated `CRYPTO_set_id_callback` API.

## Testing Plan
### PostgreSQL ###
PostgreSQL provides protection in different modes. In a PostgreSQL terminal, we can set the sslmode parameter to e.g. `verify-full` or `verify-ca` and provide the system with a root certificate to verify against. If our SSL component works, "SSL connection (protocol: TLSv1, cipher: AES256-SHA, bits: 256, compression: off)" appears in the terminal. We wrote a unit test to check this.

### JDBC ###
We added `/peloton/script/testing/jdbc/src/SSLTest.java`. In that test file we set some properties for JDBC to support SSL.

We also wrote the bash script `/peloton/script/testing/jdbc/load_rootcrt_ssl.sh` to import the root certificate into the JRE for testing.

## Trade-offs and Potential Problems

## Future Work

## Reference
https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_7.1.0/com.ibm.mq.doc/sy10660_.htm
http://www.linuxjournal.com/article/5487?page=0,0
https://linux.die.net/man/3/ssl_accept
https://www.pgcon.org/2014/schedule/attachments/330_postgres-for-the-wire.pdf
http://h41379.www4.hpe.com/doc/83final/ba554_90007/ch04s02.html
https://www.quora.com/Digital-Certificates-What-is-the-difference-between-usecase-for-pem-pfx-fp-cer-crt-etc-files-and-how-can-they-be-converted-to-one-another
https://www.postgresql.org/docs/9.1/static/libpq-ssl.html
--------------------------------------------------------------------------------
/template.md:
--------------------------------------------------------------------------------
# Component Name

## Overview

>What motivates this to be implemented? What will this component achieve?
## Scope
>Which parts of the system will this feature rely on or modify? Write down specifics so people involved can review the design doc

## Glossary (Optional)

>If you are introducing new concepts or giving unintuitive names to components, write them down here.

## Architectural Design
>Explain the input and output of the component, describe interactions and breakdown the smaller components if any. Include diagrams if appropriate.

## Design Rationale
>Explain the goals of this design and how the design achieves these goals. Present alternatives considered and document why they are not chosen.

## Testing Plan
>How should the component be tested?

## Trade-offs and Potential Problems
>Write down any conscious trade-off you made that can be problematic in the future, or any problems discovered during the design process that remain unaddressed (technical debts).

## Future Work
>Write down future work to fix known problems or otherwise improve the component.

--------------------------------------------------------------------------------
/zone_maps/zone_maps.md:
--------------------------------------------------------------------------------
# Zone Maps

## Overview

Zone Maps are implemented to improve scan times for predicates that may be highly selective. Zone Maps store metadata about each column in a tile group, such as the minimum and maximum values. This metadata can be used to skip a tile group based on the predicate value. Thus, for highly selective predicate values, we may end up skipping a large number of tile groups, improving our scan times tremendously.

## Scope
The implementation of Zone Maps in Peloton can be broken down into the following:
1. **Zone Map Manager**: Any part of the system (e.g. Brain, Table Scan) should use the APIs provided by this manager. This manager/indirection provides cleaner APIs, hiding the dirty work behind it. Using these APIs one can perform:
   - Creation of Zone Maps for a table / tile group.
   - Deletion of the Zone Map for a tile group.
   - Reading of Zone Maps and comparing them against predicates.
   - *Any other feature that goes into the zone map later should be added in the manager itself*
2. **Zone Map Catalog**: Peloton stores the zone maps of the entire system in a single catalog table. Each row of this catalog has the following columns:
   - database_id
   - table_id
   - tile_group_id
   - column_id
   - minimum
   - maximum
   - type
   - *In the absence of array types, we currently store minimum and maximum as VARCHAR, and hence we store the type in order to deserialize them.*
3. **Additional Wiring / Callbacks / Proxies**: Primarily in place for the codegen engine to consume the Zone Maps. While iterating over the tile groups, the decision to skip/scan a tile group is made through a simple call. This adds just one line of code in `table.cpp` while iterating over the tile groups to scan. *Sidenote: If at some point it is decided to not use Zone Maps, you can just delete this one line, but this would break my heart.*

## Architectural Design
The overall flow of Peloton is mostly unchanged except for the following:
1. In `table_scan_translator.cpp` we check whether the predicate is zone-mappable. If so, we parse the predicates to get an array of predicates in which each element is a struct with the following fields:
   - column_id
   - comparison operator id
   - predicate value
2. In `table.cpp` we invoke the Zone Map Manager while iterating over the tile groups to return true/false; true indicates that we need to scan the tile group, and vice versa. The API that returns this decision (`ComparePredicateAgainstZoneMap`) takes the predicate array and compares it with the metadata present in the Zone Map Catalog (see the sketch after this list).
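A toy sketch of that skip decision follows, under the simplifying assumptions of a single numeric column and three comparison operators; the real API works on Peloton values and the catalog-backed min/max, so the types here are illustrative only.

```cpp
// Per-column metadata as stored in the Zone Map Catalog (toy version).
struct ColumnZoneMap {
  double min;
  double max;
};

enum class CmpOp { Eq, Lt, Gt };

// One parsed predicate: column id, comparison operator, constant value.
struct PredicateInfo {
  int column_id;
  CmpOp op;
  double value;
};

// Returns true if the tile group must be scanned, false if it can be skipped.
bool ComparePredicateAgainstZoneMap(const PredicateInfo &p,
                                    const ColumnZoneMap &zm) {
  switch (p.op) {
    case CmpOp::Eq: return p.value >= zm.min && p.value <= zm.max;
    case CmpOp::Lt: return zm.min < p.value;  // some value may satisfy col < v
    case CmpOp::Gt: return zm.max > p.value;  // some value may satisfy col > v
  }
  return true;  // unknown operator: scan to be safe
}
```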
## Design Rationale
The Zone Maps are stored in the catalog to make them transactional. An earlier design that stored Zone Maps on the heap, with each tile group header holding a pointer to its Zone Map, was abandoned because it did not provide transactional semantics. With Zone Maps in the catalog, we get the transactional capabilities for free.

## Testing Plan
1. Basic Zone Map contents test.
   - Checks whether the min and max for all tile groups and columns are accurate.
2. Transaction Level GC Manager test - for immutability.
   - Creates a table and makes some tile groups immutable.
   - Checks that deleted slots in immutable tile groups are not recycled.
3. Comparing different predicates against zone maps.
   - Single-column predicates (col_a = x, col_a < x, col_a > x)
   - Single-column conjunction predicate (col_a > x and col_a < y)
   - Multi-column conjunction predicate (col_a > x and col_a < y and col_b > p and col_b < q)
4. End-to-end tests with Zone Maps.
   - Conjunction and simple predicate tests, just like the table scan translator tests, but now with Zone Maps created, checking for correctness.

## Trade-offs and Potential Problems
By moving the storage of zone maps into the catalog, we trade off some performance for transactional semantics. Some potential problems are also discussed in Future Work. Storage in the catalog in the absence of array types also incurs the overhead of serializing and deserializing the min and max between VARCHAR and their original types.

## Future Work
1. Replacing the BWTree index lookup with a hash map lookup to provide O(1) lookups and avoid unnecessary copying of keys for the index lookup.
2. Storing all column metadata (min/max) in a single catalog row once array types are implemented. This saves multiple trips to the catalog for one tile group when we have predicates on multiple columns.
3. This should be verified first: if we can invoke the Zone Map Manager once before iterating over the tile groups and ask it for the list of tile groups that need to be scanned, it might perform better than invoking the Zone Map Manager's API for each tile group during iteration. The benefit would come from storing the metadata (min, max) of all tile groups and columns in a single row; in one trip to the catalog we could then get the metadata for all tile groups, compare it against the predicate, and return the list of tile groups to scan.

--------------------------------------------------------------------------------