├── .github
│   └── workflows
│       └── conventional-commits.yml
├── .gitignore
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── bin
│   ├── check-component-branch-rebase.sh
│   └── start-build-env.sh
├── build-env-python
│   ├── Dockerfile
│   ├── README.md
│   └── docker-entrypoint.sh
├── build-env
│   ├── Dockerfile
│   ├── README.md
│   └── docker-entrypoint.sh
├── static
│   ├── tdp_logo.png
│   ├── tdp_logo_cmjn.svg
│   ├── tdp_logo_cmjn_white.svg
│   └── tdp_logo_white.png
└── ui
    └── README.md

/.github/workflows/conventional-commits.yml:
--------------------------------------------------------------------------------
 1 | name: Conventional Commits
 2 | on:
 3 |   pull_request:
 4 |     types: [opened, reopened, synchronize]
 5 | jobs:
 6 |   check-conventional-commits:
 7 |     runs-on: ubuntu-latest
 8 |     steps:
 9 |       - uses: actions/checkout@v2
10 |       - uses: webiny/action-conventional-commits@v1.0.3
11 | 
12 | 
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .*
2 | !.github
3 | !.gitignore
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | 
2 | # Contributing to TDP
3 | 
4 | TDP is an open source project hosted on [GitHub](https://github.com/TOSIT-IO/TDP) and administered by the [TOSIT association](https://tosit.fr/). It is released under the [Apache License 2.0](https://github.com/TOSIT-IO/TDP/blob/main/LICENSE).
5 | 
6 | Learn more about how to [contribute to TDP](https://www.trunkdataplatform.io/en/contribute/project/contributing) on our website.
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | 
2 |                                  Apache License
3 |                            Version 2.0, January 2004
4 |                         http://www.apache.org/licenses/
5 | 
6 |    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7 | 
8 |    1. Definitions.
9 | 
10 |       "License" shall mean the terms and conditions for use, reproduction,
11 |       and distribution as defined by Sections 1 through 9 of this document.
12 | 
13 |       "Licensor" shall mean the copyright owner or entity authorized by
14 |       the copyright owner that is granting the License.
15 | 
16 |       "Legal Entity" shall mean the union of the acting entity and all
17 |       other entities that control, are controlled by, or are under common
18 |       control with that entity. For the purposes of this definition,
19 |       "control" means (i) the power, direct or indirect, to cause the
20 |       direction or management of such entity, whether by contract or
21 |       otherwise, or (ii) ownership of fifty percent (50%) or more of the
22 |       outstanding shares, or (iii) beneficial ownership of such entity.
23 | 
24 |       "You" (or "Your") shall mean an individual or Legal Entity
25 |       exercising permissions granted by this License.
26 | 
27 |       "Source" form shall mean the preferred form for making modifications,
28 |       including but not limited to software source code, documentation
29 |       source, and configuration files.
30 | 
31 |       "Object" form shall mean any form resulting from mechanical
32 |       transformation or translation of a Source form, including but
33 |       not limited to compiled object code, generated documentation,
34 |       and conversions to other media types.
35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. 
You are solely responsible for determining the
151 |       appropriateness of using or redistributing the Work and assume any
152 |       risks associated with Your exercise of permissions under this License.
153 | 
154 |    8. Limitation of Liability. In no event and under no legal theory,
155 |       whether in tort (including negligence), contract, or otherwise,
156 |       unless required by applicable law (such as deliberate and grossly
157 |       negligent acts) or agreed to in writing, shall any Contributor be
158 |       liable to You for damages, including any direct, indirect, special,
159 |       incidental, or consequential damages of any character arising as a
160 |       result of this License or out of the use or inability to use the
161 |       Work (including but not limited to damages for loss of goodwill,
162 |       work stoppage, computer failure or malfunction, or any and all
163 |       other commercial damages or losses), even if such Contributor
164 |       has been advised of the possibility of such damages.
165 | 
166 |    9. Accepting Warranty or Additional Liability. While redistributing
167 |       the Work or Derivative Works thereof, You may choose to offer,
168 |       and charge a fee for, acceptance of support, warranty, indemnity,
169 |       or other liability obligations and/or rights consistent with this
170 |       License. However, in accepting such obligations, You may act only
171 |       on Your own behalf and on Your sole responsibility, not on behalf
172 |       of any other Contributor, and only if You agree to indemnify,
173 |       defend, and hold each Contributor harmless for any liability
174 |       incurred by, or claims asserted against, such Contributor by reason
175 |       of your accepting any such warranty or additional liability.
176 | 
177 |    END OF TERMS AND CONDITIONS
178 | 
179 |    APPENDIX: How to apply the Apache License to your work.
180 | 
181 |       To apply the Apache License to your work, attach the following
182 |       boilerplate notice, with the fields enclosed by brackets "[]"
183 |       replaced with your own identifying information. (Don't include
184 |       the brackets!) The text should be enclosed in the appropriate
185 |       comment syntax for the file format. We also recommend that a
186 |       file or class name and description of purpose be included on the
187 |       same "printed page" as the copyright notice for easier
188 |       identification within third-party archives.
189 | 
190 |    Copyright 2022 tosit.io
191 | 
192 |    Licensed under the Apache License, Version 2.0 (the "License");
193 |    you may not use this file except in compliance with the License.
194 |    You may obtain a copy of the License at
195 | 
196 |        http://www.apache.org/licenses/LICENSE-2.0
197 | 
198 |    Unless required by applicable law or agreed to in writing, software
199 |    distributed under the License is distributed on an "AS IS" BASIS,
200 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201 |    See the License for the specific language governing permissions and
202 |    limitations under the License.
203 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Trunk Data Platform
2 | 
3 | 
4 | 
5 | Trunk Data Platform is a free and open source Hadoop distribution.
6 | 
7 | This distribution is built by EDF (the French electricity provider) and DGFIP (the tax office of the French Ministry of Finance), through an association called TOSIT (The Open Source I Trust).
8 | 
9 | TDP is built from the source code of Apache projects.
10 | 
11 | ## TDP repositories
12 | 
13 | The TDP project is composed of multiple repositories:
14 | - [tdp-collection](https://github.com/TOSIT-IO/tdp-collection): main Ansible collection to deploy TDP core components.
15 | - [tdp-collection-extras](https://github.com/TOSIT-IO/tdp-collection-extras): Ansible collection to deploy extra components that are not part of TDP core.
16 | - [tdp-collection-prerequisites](https://github.com/TOSIT-IO/tdp-collection-prerequisites): Ansible collection to deploy the prerequisites of a TDP installation (e.g. KDC, PostgreSQL, etc.).
17 | - [tdp-lib](https://github.com/TOSIT-IO/tdp-lib): Python library to configure, manage, and deploy TDP.
18 | - [tdp-server](https://github.com/TOSIT-IO/tdp-server): REST API for tdp-lib orchestration.
19 | - [tdp-ui](https://github.com/TOSIT-IO/tdp-ui): web UI for TDP cluster deployment and configuration; uses tdp-server.
20 | - [tdp-getting-started](https://github.com/TOSIT-IO/tdp-getting-started): a ready-to-deploy TDP virtual environment based on Vagrant, showcasing how to use every component of TDP.
21 | 
22 | Each TDP component also has its own repository.
23 | 
24 | ## Component versions of the Trunk Data Platform TDP-1.0 release
25 | 
26 | ### TDP Core
27 | 
28 | The following table shows the core components of TDP, the Apache branch they are based on, and the TDP branch that serves as the base for our releases.
29 | 
30 | | Component | Version | Apache Git branch | TDP Git Branch | TDP commits |
31 | | --------------------------- | ---------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
32 | | Apache ZooKeeper | 3.4.6 | release-3.4.6 | XXX | X.X.X |
33 | | Apache Hadoop | 3.1.1-0.0 | [rel/release-3.1.1](https://github.com/apache/hadoop/commits/branch-3.1.1) | [branch-3.1.1-TDP](https://github.com/TOSIT-IO/hadoop/commits/branch-3.1.1-TDP) | [compare](https://github.com/TOSIT-IO/hadoop/compare/branch-3.1.1...branch-3.1.1-TDP) |
34 | | Apache Hive | 3.1.3-1.0 | [branch-3.1](https://github.com/apache/hive/commits/branch-3.1) | [branch-3.1-TDP](https://github.com/TOSIT-IO/hive/commits/branch-3.1-TDP) | [compare](https://github.com/TOSIT-IO/hive/compare/branch-3.1...branch-3.1-TDP) |
35 | | Apache Hive 2 (for Spark 3) | 2.3.9-1.0 | [branch-2.3](https://github.com/apache/hive/commits/branch-2.3) | [branch-2.3-TDP](https://github.com/TOSIT-IO/hive/commits/branch-2.3-TDP) | [compare](https://github.com/TOSIT-IO/hive/compare/branch-2.3...branch-2.3-TDP) |
36 | | Apache Hive 1 (for Spark 2) | 1.2.3-1.0 | [branch-1.2](https://github.com/apache/hive/commits/branch-1.2) | [branch-1.2-TDP](https://github.com/TOSIT-IO/hive/commits/branch-1.2-TDP) | [compare](https://github.com/TOSIT-IO/hive/compare/branch-1.2...branch-1.2-TDP) |
37 | | Apache Tez | 0.9.1-1.0 | [branch-0.9.1](https://github.com/apache/tez/commits/branch-0.9.1) | [branch-0.9.1-TDP](https://github.com/TOSIT-IO/tez/commits/branch-0.9.1-TDP) | [compare](https://github.com/TOSIT-IO/tez/compare/branch-0.9.1...branch-0.9.1-TDP) |
38 | | Apache Spark | 2.3.4-1.0 | [branch-2.3](https://github.com/apache/spark/commits/branch-2.3) | [branch-2.3-TDP](https://github.com/TOSIT-IO/spark/commits/branch-2.3-TDP) | [compare](https://github.com/TOSIT-IO/spark/compare/branch-2.3...branch-2.3-TDP) |
39 | | Apache Spark 3 | 3.2.2-0.0 | 
[branch-3.2](https://github.com/apache/spark/commits/branch-3.2) | [branch-3.2-TDP](https://github.com/TOSIT-IO/spark/commits/branch-3.2-TDP) | [compare](https://github.com/TOSIT-IO/spark/compare/branch-3.2...branch-3.2-TDP) |
40 | | Apache Ranger | 2.0.0-1.0 | [ranger-2.0](https://github.com/apache/ranger/commits/ranger-2.0) | [ranger-2.0-TDP](https://github.com/TOSIT-IO/ranger/commits/ranger-2.0-TDP) | [compare](https://github.com/TOSIT-IO/ranger/compare/ranger-2.0...ranger-2.0-TDP) |
41 | | Apache Solr (for Ranger) | 7.7.3 | releases/lucene-solr/7.7.3 | XXX | X.X.X |
42 | | Apache HBase | 2.1.10-1.0 | [branch-2.1](https://github.com/apache/hbase/commits/branch-2.1) | [branch-2.1-TDP](https://github.com/TOSIT-IO/hbase/commits/branch-2.1-TDP) | [compare](https://github.com/TOSIT-IO/hbase/compare/branch-2.1...branch-2.1-TDP) |
43 | | Apache Phoenix | 5.1.3-1.0 | [5.1](https://github.com/apache/phoenix/commits/5.1) | [5.1.3-TDP](https://github.com/TOSIT-IO/phoenix/commits/5.1.3-TDP) | [compare](https://github.com/TOSIT-IO/phoenix/compare/5.1...5.1.3-TDP) |
44 | | Apache Phoenix Query Server | 6.0.0-0.0 | [6.0.0](https://github.com/apache/phoenix-queryserver/commits/6.0.0) | [6.0.0-TDP](https://github.com/TOSIT-IO/phoenix-queryserver/commits/6.0.0-TDP) | [compare](https://github.com/TOSIT-IO/phoenix-queryserver/compare/6.0.0...6.0.0-TDP) |
45 | | Apache Knox | 1.6.1-0.0 | [v1.6.1](https://github.com/apache/knox/commits/v1.6.1) | [v1.6.1-TDP](https://github.com/TOSIT-IO/knox/commits/v1.6.1-TDP) | [compare](https://github.com/TOSIT-IO/knox/compare/v1.6.1...v1.6.1-TDP) |
46 | | Apache HBase Connectors | 1.0.0-0.0 | [rel/1.0.0](https://github.com/apache/hbase-connectors/commits/rel/1.0.0) | [branch-2.3.4-1.0.0-TDP](https://github.com/TOSIT-IO/hbase-connectors/commits/branch-2.3.4-1.0.0-TDP) | [compare](https://github.com/TOSIT-IO/hbase-connectors/compare/1.0.0...branch-2.3.4-1.0.0-TDP) |
47 | | Apache HBase Operator tools | 1.1.0-0.0 | [rel/1.1.0](https://github.com/apache/hbase-operator-tools/commits/rel/1.1.0) | [branch-1.1.0-TDP](https://github.com/TOSIT-IO/hbase-operator-tools/commits/branch-1.1.0-TDP) | [compare](https://github.com/TOSIT-IO/hbase-operator-tools/compare/branch-1.1.0...branch-1.1.0-TDP) |
48 | 
49 | Versions are approximately based on the [HDP 3.1.5 release](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/release-notes/content/hdp_relnotes.html).
50 | 
51 | **Note**: For some projects, the Apache Foundation maintains a branch of the component onto which fixes and features are backported. We use these branches whenever they are maintained and compatible.
52 | 
53 | ### TDP Extras
54 | 
55 | "TDP Extras" carries projects that cannot be integrated into "TDP Core". Different reasons can keep a project outside of the core:
56 | 
57 | - The project is not judged a key component of the Hadoop ecosystem. This is the case for Airflow.
58 | - The project is not active enough. This is the case for Livy, which has not been updated in 2 years.
59 | - The project has incompatibilities with the releases of other "TDP Core" projects. This is the case for Kafka 2.X, which relies on ZooKeeper 3.5.X (and cannot use the ZooKeeper 3.4.6 of "TDP Core").
60 | 
61 | | Component | Version | Apache Git branch | TDP Git Branch | TDP commits |
62 | | ---------------------------------- | ------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------- |
63 | | Apache ZooKeeper 3.5.9 (for Kafka) | 3.5.9 | release-3.5.9 | XXX | X.X.X |
64 | | Apache Kafka | 2.8.2 | [2.8](https://github.com/TOSIT-IO/kafka/tree/2.8) | [2.8-TDP](https://github.com/TOSIT-IO/kafka/tree/2.8-TDP) | [compare](https://github.com/TOSIT-IO/kafka/compare/2.8...2.8-TDP) |
65 | | Apache Livy | 0.8.0 | [master](https://github.com/TOSIT-IO/incubator-livy/tree/master) | [branch-0.8.0-TDP](https://github.com/TOSIT-IO/incubator-livy/tree/branch-0.8.0-TDP) | [compare](https://github.com/TOSIT-IO/incubator-livy/compare/master...branch-0.8.0-TDP) |
66 | | Apache Airflow | 2.2.2 | 2.2.2 | XXX | X.X.X |
67 | 
68 | **Note:** A project can graduate from "TDP Extras" to "TDP Core" if enough people support it and/or if it is made compatible with all the other projects of the stack.
69 | 
70 | ## Tested operating systems (OS)
71 | 
72 | Only bare-metal and virtual machine deployments are tested. Container-based OSes may work but are not guaranteed.
73 | 
74 | - CentOS 7
75 | - Rocky Linux 8
76 | 
77 | Red Hat-like OSes may work but are not guaranteed.
78 | 
79 | ## TDP component releases
80 | 
81 | Every initial TDP release is built from a reference branch of the Apache Git repository according to the above tables. The main change from the original branches is the version declaration in the pom.xml files.
82 | 
83 | ## Building / testing environment
84 | 
85 | The builds and unit tests of each component's Maven Java project can be run in Kubernetes pods, which are scheduled by a Jenkins installation that also runs on Kubernetes.
86 | Kubernetes pod scheduling allows for **truly** reproducible and isolated builds. Jenkins' strong integration with the Java ecosystem is a perfect match to build the components of the distribution.
87 | 
88 | ### Build order
89 | 
90 | - hadoop
91 | - tez
92 | - hive1
93 | - spark
94 | - hive2
95 | - spark3
96 | - hive
97 | - hbase
98 | - ranger
99 | - phoenix
100 | - phoenix-queryserver
101 | - knox
102 | - hbase-spark-connector
103 | - hbase-operator-tools
104 | 
105 | ### Kubernetes
106 | 
107 | Kubernetes was installed on Ubuntu 20.04 virtual machines with [kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/).
108 | 
109 | **Note:** It is strongly recommended to deploy a Storage Class in order to have persistence on the Kubernetes cluster (useful for Jenkins, among others). In our case, we are using [Rook](https://rook.io/) on physical drives attached to the Kubernetes cluster's VMs.
110 | 
111 | ### Jenkins
112 | 
113 | Jenkins triggers the builds, which follow the same process for every component of the stack (a sketch follows below):
114 | 
115 | - Git clone the sources
116 | - Build the project
117 | - Run the unit tests
118 | - Publish the artifacts to a remote repository
119 | 
120 | Jenkins was installed on the Kubernetes cluster with the official [jenkinsci Helm chart](https://github.com/jenkinsci/helm-charts).
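
To make the process concrete, here is a minimal sketch of that per-component flow for Hadoop, as it could be run inside the `tdp-builder` container described below. The exact Maven goals and the Nexus URL are assumptions for illustration; the real Jenkins jobs may differ.

```bash
# Sketch only: the Maven flags and the Nexus repository URL are illustrative
# assumptions, not the exact Jenkins configuration.
set -euo pipefail

# 1. Git clone the sources from the TDP release branch
git clone --branch branch-3.1.1-TDP https://github.com/TOSIT-IO/hadoop.git
cd hadoop

# The main change from the Apache branch is the version declared in the
# pom.xml files; such a change could be applied with:
#   mvn versions:set -DnewVersion=3.1.1-0.0

# 2. and 3. Build the project and run the unit tests
mvn clean install

# 4. Publish the artifacts to a remote repository (the Nexus described below;
#    repository id and URL are hypothetical)
mvn deploy -DskipTests \
  -DaltDeploymentRepository=tdp-releases::default::https://nexus.example.com/repository/tdp-releases
```
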
121 | 
122 | ### Nexus / Docker registry
123 | 
124 | The building environment needs multiple registries:
125 | 
126 | - Maven, to host the compiled jars
127 | - Docker, to host the images that we use to build the projects
128 | - File registry, to host the .tar.gz files with the binaries and jars of every compiled project.
129 | 
130 | Nexus Repository OSS can assume all three roles and is free and open source.
131 | 
132 | Nexus OSS was installed on the Kubernetes cluster with the [Helm chart](https://github.com/Oteemo/charts/tree/master/charts/sonatype-nexus) provided by [Oteemo](https://github.com/Oteemo).
133 | 
134 | ## Local build environment
135 | 
136 | It is possible to run a local environment for building / small-scale testing.
137 | 
138 | Prerequisite:
139 | 
140 | - Docker installed and available to your local user
141 | 
142 | You can start a local build environment with the `bin/start-build-env.sh` script.
143 | 
144 | **Note:** See `build-env/README.md` for details.
145 | 
146 | To build TDP component binaries, attach to the running `tdp-builder` container and `git clone` the TDP component repository into it. Each TDP component's `tdp/README.md` has custom instructions to launch the build process.
147 | Assign a directory path to the `TDP_HOME` variable in `bin/start-build-env.sh` to control the local path of the built TDP binaries.
--------------------------------------------------------------------------------
/bin/check-component-branch-rebase.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | 
3 | # The script checks whether the branches ending with '-fix' have been rebased on their corresponding '-basic' branch in each component's repository.
4 | 
5 | # Go to the folder which contains all these repositories and execute this script.
6 | 
7 | # List of repositories
8 | repositories=(
9 |   "hadoop"
10 |   "tez"
11 |   "spark-hive"
12 |   "spark"
13 |   "hive"
14 |   "hbase"
15 |   "ranger"
16 |   "phoenix"
17 |   "phoenix-queryserver"
18 |   "knox"
19 |   "hbase-connectors"
20 |   "hbase-operator-tools"
21 | )
22 | 
23 | # Function to check branches
24 | check_branches() {
25 |   cd "$1" || return
26 |   echo "Checking repository: $(basename "$1")"
27 | 
28 |   # Fetch latest changes from remote
29 |   git fetch origin
30 | 
31 |   # Initialize an empty array
32 |   basic_branches=()
33 | 
34 |   # Read each line of the command output into the array
35 |   while IFS= read -r branch; do
36 |     # Append each branch to the array
37 |     basic_branches+=("$branch")
38 |   done < <(git branch -r --list 'origin/*-basic')
39 | 
40 |   # Check if the array is empty
41 |   if [ ${#basic_branches[@]} -eq 0 ]; then
42 |     echo "No branches matching the pattern 'origin/*-basic' found."
43 |   else
44 |     # Loop through each branch in the array
45 |     for ((i=0; i<${#basic_branches[@]}; i++)); do
46 |       # Replace "-basic" with "-fix" for each branch
47 |       fix_branch=${basic_branches[i]/-basic/-fix}
48 |       echo "Basic branch: ${basic_branches[i]}"
49 |       # Check if the fix branch contains the basic branch
50 |       if git merge-base --is-ancestor "${basic_branches[i]}" "$fix_branch"; then
51 |         echo "Branch -fix is based on -basic branch"
52 |       else
53 |         echo "Error: Branch -fix is not based on -basic branch"
54 |       fi
55 |     done
56 |   fi
57 |   cd ..
58 | }
59 | 
60 | # Loop through each repository
61 | for repo in "${repositories[@]}"; do
62 |   check_branches "$repo"
63 |   echo ""
64 | done
65 | 
--------------------------------------------------------------------------------
/bin/start-build-env.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | 
3 | SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
4 | DOCKER_DIR=$(cd "$SCRIPT_DIR/../build-env" && pwd)
5 | 
6 | docker build -t tdp-builder "$DOCKER_DIR"
7 | 
8 | USER_NAME=${SUDO_USER:=$USER}
9 | USER_ID=$(id -u "${USER_NAME}")
10 | GROUP_ID=$(id -g "${USER_NAME}")
11 | TDP_HOME="${TDP_HOME:=$(pwd)}"
12 | 
13 | docker run --rm=true -t -i \
14 |   -v "${TDP_HOME}:/tdp" \
15 |   -w "/tdp" \
16 |   -v "${HOME}/.m2:/home/builder/.m2${V_OPTS:-}" \
17 |   -e "BUILDER_UID=${USER_ID}" \
18 |   -e "BUILDER_GID=${GROUP_ID}" \
19 |   --ulimit nofile=500000:500000 \
20 |   tdp-builder
21 | 
--------------------------------------------------------------------------------
/build-env-python/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM quay.io/pypa/manylinux2014_x86_64:2024.09.05-1
2 | 
3 | # Group packages and repos
4 | RUN yum groups mark install "Development Tools"
5 | RUN yum groups mark convert "Development Tools"
6 | RUN yum group install -y "Development Tools"
7 | RUN yum install -y epel-release
8 | 
9 | # Base packages + specific ones (Rust, Kerberos, DB, etc.)
10 | RUN yum update -y && yum install -y \
11 |     cmake \
12 |     curl \
13 |     cyrus-sasl-devel \
14 |     cyrus-sasl-gssapi \
15 |     gcc \
16 |     git \
17 |     gmp-devel \
18 |     krb5-devel \
19 |     libffi-devel \
20 |     libtidy-devel \
21 |     libxml2-devel \
22 |     libxslt-devel \
23 |     mariadb-devel \
24 |     maven \
25 |     openldap-devel \
26 |     openssl-devel \
27 |     postgresql-devel \
28 |     rsync \
29 |     rust \
30 |     rust-toolset-7 \
31 |     sqlite-devel \
32 |     sudo \
33 |     swig \
34 |     wget \
35 |     zlib-devel
36 | 
37 | # Miscellaneous configuration
38 | RUN ln -s /usr/local/bin/python3.6 /usr/local/bin/python3
39 | RUN mkdir /usr/local/share/jupyter && chmod 777 /usr/local/share/jupyter
40 | RUN mkdir /opt/_internal/cpython-3.6.15/etc && chmod 777 /opt/_internal/cpython-3.6.15/etc
41 | RUN mkdir -p /hue/build
42 | 
43 | # Pip update
44 | RUN /usr/local/bin/python3.6 -m pip install --upgrade pip
45 | RUN /usr/local/bin/python3.6 -m pip install --upgrade wheel
46 | RUN /usr/local/bin/python3.6 -m pip install --upgrade setuptools
47 | 
48 | # NVM & NodeJS
49 | RUN mkdir /usr/local/nvm
50 | WORKDIR /usr/local/nvm
51 | RUN curl https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh --output install.sh && chmod u+x install.sh
52 | RUN export NVM_DIR=/usr/local/nvm && ./install.sh
53 | RUN source /usr/local/nvm/nvm.sh && source /usr/local/nvm/bash_completion && nvm install lts/gallium
54 | 
55 | # GoSu
56 | ENV GOSU_VERSION=1.11
57 | RUN gpg --keyserver hkps://keys.openpgp.org --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4 \
58 |     && curl -o /usr/local/bin/gosu -SL "https://github.com/tianon/gosu/releases/download/${GOSU_VERSION}/gosu-amd64" \
59 |     && curl -o /usr/local/bin/gosu.asc -SL "https://github.com/tianon/gosu/releases/download/${GOSU_VERSION}/gosu-amd64.asc" \
60 |     && gpg --verify /usr/local/bin/gosu.asc \
61 |     && rm /usr/local/bin/gosu.asc \
62 |     && rm -r /root/.gnupg/ \
63 |     && chmod +x /usr/local/bin/gosu \
64 |     # Verify that the binary works
65 |     && gosu nobody true
66 | 
67 | # The entrypoint creates the builder user at runtime
68 | COPY docker-entrypoint.sh /usr/local/bin/
69 | ENTRYPOINT 
["/usr/local/bin/docker-entrypoint.sh"]
70 | 
--------------------------------------------------------------------------------
/build-env-python/README.md:
--------------------------------------------------------------------------------
1 | # Build Environment
2 | 
3 | Building components written in Python has different requirements from building the ones written in Java. Strictly speaking, they are not really compiled; they are nevertheless packaged as a Python wheel or a tar.gz file.
4 | 
5 | The image used is a manylinux2014 image, originally based on CentOS 7 and compatible with many Linux OSes. It also has several different Python versions pre-installed.
6 | 
7 | ## Build the image
8 | 
9 | The image can be built and tagged with:
10 | 
11 | ```bash
12 | docker build . -t tdp-builder-python
13 | ```
14 | 
15 | ## Start the container
16 | 
17 | Contrary to the `tdp-builder` container, where components are compiled with Maven and the jar files land in the `.m2` cache, there is no such shared cache here; volumes, working directories, and even users therefore differ for each component.
18 | 
19 | Check the documentation of the concerned component for the command to start the container.
20 | 
--------------------------------------------------------------------------------
/build-env-python/docker-entrypoint.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | set -eo pipefail
3 | 
4 | if [[ -n "$USER_UID" ]] && [[ -n "$USER_GID" ]]; then
5 |   # Create group and user only if they don't exist
6 |   [[ ! $(getent group builder) ]] && groupadd -r --gid "$USER_GID" builder
7 |   if [[ ! $(getent passwd builder) ]]; then
8 |     useradd --create-home --home-dir /home/builder --uid "$USER_UID" --gid "$USER_GID" --system --shell /bin/bash builder
9 |     usermod -aG wheel builder
10 |     mkdir -p /home/builder
11 |     chown builder:builder /home/builder
12 |     gosu builder cp -r /etc/skel/. /home/builder
13 |   fi
14 |   # Avoid changing dir if a work dir is specified
15 |   [[ "$PWD" == "/root" ]] && cd /home/builder
16 |   if [[ $# -eq 0 ]]; then
17 |     exec gosu builder /bin/bash
18 |   else
19 |     exec gosu builder "$@"
20 |   fi
21 | fi
22 | 
23 | exec "$@"
24 | 
--------------------------------------------------------------------------------
/build-env/Dockerfile:
--------------------------------------------------------------------------------
1 | 
2 | # Licensed to the Apache Software Foundation (ASF) under one
3 | # or more contributor license agreements. See the NOTICE file
4 | # distributed with this work for additional information
5 | # regarding copyright ownership. The ASF licenses this file
6 | # to you under the Apache License, Version 2.0 (the
7 | # "License"); you may not use this file except in compliance
8 | # with the License. You may obtain a copy of the License at
9 | #
10 | #     http://www.apache.org/licenses/LICENSE-2.0
11 | #
12 | # Unless required by applicable law or agreed to in writing, software
13 | # distributed under the License is distributed on an "AS IS" BASIS,
14 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15 | # See the License for the specific language governing permissions and
16 | # limitations under the License.
17 | 
18 | # Dockerfile for installing the necessary dependencies for building Hadoop.
19 | # See BUILDING.txt.
20 | 21 | FROM ubuntu:focal 22 | ARG OS_IDENTIFIER=ubuntu-2004 23 | 24 | WORKDIR /root 25 | 26 | SHELL ["/bin/bash", "-o", "pipefail", "-c"] 27 | 28 | ##### 29 | # Disable suggests/recommends 30 | ##### 31 | RUN echo APT::Install-Recommends "0"\; > /etc/apt/apt.conf.d/10disableextras 32 | RUN echo APT::Install-Suggests "0"\; >> /etc/apt/apt.conf.d/10disableextras 33 | 34 | ENV DEBIAN_FRONTEND noninteractive 35 | ENV DEBCONF_TERSE true 36 | 37 | # hadolint ignore=DL3008 38 | RUN apt-get -q update \ 39 | && apt-get -q install -y --no-install-recommends \ 40 | ant \ 41 | apt-utils \ 42 | autoconf \ 43 | automake \ 44 | bats \ 45 | build-essential \ 46 | bzip2 \ 47 | clang \ 48 | cmake \ 49 | curl \ 50 | doxygen \ 51 | fuse \ 52 | g++ \ 53 | gcc \ 54 | git \ 55 | gnupg-agent \ 56 | gosu \ 57 | libbcprov-java \ 58 | libbz2-dev \ 59 | libcurl4-openssl-dev \ 60 | libfuse-dev \ 61 | libkrb5-dev \ 62 | libprotobuf-dev \ 63 | libprotoc-dev \ 64 | libsasl2-dev \ 65 | libsnappy-dev \ 66 | libssl-dev \ 67 | libtool \ 68 | libzstd-dev \ 69 | locales \ 70 | make \ 71 | openjdk-8-jdk \ 72 | pinentry-curses \ 73 | pkg-config \ 74 | python3 \ 75 | python3-pip \ 76 | python3-pkg-resources \ 77 | python3-setuptools \ 78 | python3-venv \ 79 | python3-wheel \ 80 | rsync \ 81 | shellcheck \ 82 | software-properties-common \ 83 | sudo \ 84 | unzip \ 85 | valgrind \ 86 | vim \ 87 | wget \ 88 | zlib1g-dev \ 89 | && apt-get clean \ 90 | && rm -rf /var/lib/apt/lists/* 91 | 92 | RUN locale-gen en_US.UTF-8 93 | ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en' LC_ALL='en_US.UTF-8' 94 | ENV PYTHONIOENCODING=utf-8 95 | 96 | # Install Maven 3.9.9 97 | RUN mkdir /opt/maven \ 98 | && curl -L https://dlcdn.apache.org/maven/maven-3/3.9.9/binaries/apache-maven-3.9.9-bin.zip -o apache-maven-3.9.9-bin.zip \ 99 | && unzip apache-maven-3.9.9-bin.zip \ 100 | && mv apache-maven-3.9.9 /opt/maven \ 101 | && rm apache-maven-3.9.9-bin.zip 102 | 103 | ENV MAVEN_HOME=/opt/maven/apache-maven-3.9.9 104 | ENV PATH=$PATH:$MAVEN_HOME/bin 105 | 106 | ###### 107 | # R version available in apt repos is not compatible when building spark R 108 | # Install custom R version 109 | ###### 110 | ARG R_VERSION=4.2.3 111 | 112 | RUN wget https://cdn.posit.co/r/${OS_IDENTIFIER}/pkgs/r-${R_VERSION}_1_amd64.deb && \ 113 | apt-get update -qq && \ 114 | DEBIAN_FRONTEND=noninteractive apt-get install -f -y ./r-${R_VERSION}_1_amd64.deb && \ 115 | ln -s /opt/R/${R_VERSION}/bin/R /usr/bin/R && \ 116 | ln -s /opt/R/${R_VERSION}/bin/Rscript /usr/bin/Rscript && \ 117 | ln -s /opt/R/${R_VERSION}/lib/R /usr/lib/R && \ 118 | rm r-${R_VERSION}_1_amd64.deb && \ 119 | rm -rf /var/lib/apt/lists/* 120 | 121 | ###### 122 | # Set env vars required to build Hadoop 123 | ###### 124 | # JAVA_HOME must be set in Maven >= 3.5.0 (MNG-6003) 125 | ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64 126 | 127 | 128 | ####### 129 | # Install Boost 1.86 130 | ####### 131 | 132 | # hadolint ignore=DL3003 133 | RUN mkdir -p /opt/boost-library \ 134 | && curl -L https://sourceforge.net/projects/boost/files/boost/1.86.0/boost_1_86_0.tar.bz2/download >boost_1_86_0.tar.bz2 \ 135 | && mv boost_1_86_0.tar.bz2 /opt/boost-library \ 136 | && cd /opt/boost-library \ 137 | && tar --bzip2 -xf boost_1_86_0.tar.bz2 \ 138 | && cd /opt/boost-library/boost_1_86_0 \ 139 | && ./bootstrap.sh --prefix=/usr/ \ 140 | && ./b2 --without-python install \ 141 | && cd /root \ 142 | && rm -rf /opt/boost-library 143 | 144 | ####### 145 | # Install SpotBugs 4.2.2 146 | ####### 147 | RUN mkdir -p /opt/spotbugs \ 148 | && curl -L 
-s -S https://github.com/spotbugs/spotbugs/releases/download/4.2.2/spotbugs-4.2.2.tgz \
149 |     -o /opt/spotbugs.tgz \
150 |     && tar xzf /opt/spotbugs.tgz --strip-components 1 -C /opt/spotbugs \
151 |     && chmod +x /opt/spotbugs/bin/*
152 | ENV SPOTBUGS_HOME /opt/spotbugs
153 | 
154 | ######
155 | # Install Google Protobuf 3.21.12
156 | ######
157 | # hadolint ignore=DL3003
158 | RUN mkdir -p /opt/protobuf-src \
159 |     && curl -L -s -S \
160 |       https://github.com/protocolbuffers/protobuf/archive/refs/tags/v3.21.12.tar.gz \
161 |       -o /opt/protobuf.tar.gz \
162 |     && tar xzf /opt/protobuf.tar.gz --strip-components 1 -C /opt/protobuf-src \
163 |     && cd /opt/protobuf-src/ \
164 |     && ./autogen.sh \
165 |     && ./configure --prefix=/opt/protobuf \
166 |     && make "-j$(nproc)" \
167 |     && make install \
168 |     && cd /root \
169 |     && rm -rf /opt/protobuf-src
170 | ENV PROTOBUF_HOME /opt/protobuf
171 | ENV PATH "${PATH}:/opt/protobuf/bin"
172 | 
173 | 
174 | # Use a venv because of the new Python system protection (PEP 668)
175 | ENV VENV_PATH=/opt/venv
176 | RUN python3 -m venv $VENV_PATH
177 | ENV PATH "$VENV_PATH/bin:$PATH"
178 | 
179 | ####
180 | # Upgrade pip3
181 | ####
182 | RUN python3 -m pip install --upgrade pip setuptools wheel
183 | 
184 | ####
185 | # Install pandas and pyarrow for Spark 3
186 | # venv-pack for the jupyterhub venv
187 | ####
188 | RUN $VENV_PATH/bin/pip3 install numpy==1.24.4 \
189 |     pandas==2.0.3 \
190 |     pyarrow==14.0.2 \
191 |     venv-pack==0.2.0
192 | 
193 | # Install pylint and python-dateutil
194 | RUN $VENV_PATH/bin/pip3 install pylint==2.6.0 python-dateutil==2.8.2
195 | 
196 | ####
197 | # Install Yarn 1.21.1 for the web UI framework
198 | ####
199 | RUN curl -s -S https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add - \
200 |     && echo 'deb https://dl.yarnpkg.com/debian/ stable main' > /etc/apt/sources.list.d/yarn.list \
201 |     && apt-get -q update \
202 |     && apt-get install -y --no-install-recommends yarn=1.21.1-1 \
203 |     && apt-get clean \
204 |     && rm -rf /var/lib/apt/lists/*
205 | 
206 | ####
207 | # Install hadolint
208 | ####
209 | RUN curl -L -s -S \
210 |       https://github.com/hadolint/hadolint/releases/download/v1.11.1/hadolint-Linux-x86_64 \
211 |       -o /bin/hadolint \
212 |     && chmod a+rx /bin/hadolint \
213 |     && shasum -a 512 /bin/hadolint | \
214 |       awk '$1!="734e37c1f6619cbbd86b9b249e69c9af8ee1ea87a2b1ff71dccda412e9dac35e63425225a95d71572091a3f0a11e9a04c2fc25d9e91b840530c26af32b9891ca" {exit(1)}'
215 | 
216 | ######
217 | # Intel ISA-L 2.29.0
218 | ######
219 | # hadolint ignore=DL3003,DL3008
220 | RUN mkdir -p /opt/isa-l-src \
221 |     && apt-get -q update \
222 |     && apt-get install -y --no-install-recommends automake yasm \
223 |     && apt-get clean \
224 |     && curl -L -s -S \
225 |       https://github.com/intel/isa-l/archive/v2.29.0.tar.gz \
226 |       -o /opt/isa-l.tar.gz \
227 |     && tar xzf /opt/isa-l.tar.gz --strip-components 1 -C /opt/isa-l-src \
228 |     && cd /opt/isa-l-src \
229 |     && ./autogen.sh \
230 |     && ./configure \
231 |     && make "-j$(nproc)" \
232 |     && make install \
233 |     && cd /root \
234 |     && rm -rf /opt/isa-l-src
235 | 
236 | ###
237 | # Avoid out of memory errors in builds
238 | ###
239 | ENV MAVEN_OPTS -Xms256m -Xmx3072m
240 | 
241 | # Skip gpg verification when downloading Yetus via yetus-wrapper
242 | ENV HADOOP_SKIP_YETUS_VERIFICATION true
243 | 
244 | ###
245 | # Everything past this point is either not needed for testing or breaks Yetus.
246 | # So tell Yetus not to read the rest of the file:
247 | # YETUS CUT HERE
248 | ###
249 | 
250 | # Hugo static website generator for the new Hadoop site
251 | RUN curl -L -o hugo.deb https://github.com/gohugoio/hugo/releases/download/v0.58.3/hugo_0.58.3_Linux-64bit.deb \
252 |     && dpkg --install hugo.deb \
253 |     && rm hugo.deb
254 | 
255 | # Install Gradle 8.1.1
256 | RUN mkdir -p /opt/gradle \
257 |     && curl -L https://services.gradle.org/distributions/gradle-8.1.1-bin.zip -o gradle-8.1.1-bin.zip \
258 |     && unzip gradle-8.1.1-bin.zip \
259 |     && mv gradle-8.1.1 /opt/gradle \
260 |     && rm gradle-8.1.1-bin.zip
261 | 
262 | ENV GRADLE_HOME=/opt/gradle/gradle-8.1.1
263 | ENV PATH=$PATH:$GRADLE_HOME/bin
264 | 
265 | COPY docker-entrypoint.sh /usr/local/bin/
266 | ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
267 | 
--------------------------------------------------------------------------------
/build-env/README.md:
--------------------------------------------------------------------------------
1 | # Build Environment
2 | 
3 | Building Hadoop and other big data ecosystem projects can be quite complex. Having an "official" build container is a good addition to any open source project: it helps new developers on their journey to a first contribution, and it helps maintainers reproduce issues more easily by providing a controlled and reproducible environment.
4 | 
5 | The Docker image in `Dockerfile` is based on the official [image](https://raw.githubusercontent.com/apache/hadoop/trunk/dev-support/docker/Dockerfile) provided by Apache. This image has all the prerequisites to build all the components of TDP.
6 | 
7 | ## Build the image
8 | 
9 | The image can be built and tagged with:
10 | 
11 | ```bash
12 | docker build . -t tdp-builder
13 | ```
14 | 
15 | This image contains an entrypoint that creates a `builder` user on the fly if you specify `BUILDER_UID` and `BUILDER_GID` as environment variables. This allows the `builder` user to have the same `uid` and `gid` as the host user. If these variables are not defined, the `builder` user is not created.
16 | 
17 | The container needs to start as root to create the `builder` user, so do not run tdp-builder with `docker run --user ...`; instead, use the variables above. The entrypoint uses `gosu` to exec commands as the `builder` user.
18 | 
19 | ## Start the container
20 | 
21 | The container should be started with:
22 | 
23 | ```bash
24 | docker run --rm=true -t -i \
25 |   -v "${TDP_HOME:-${PWD}}:/tdp" \
26 |   -w "/tdp" \
27 |   -v "${HOME}/.m2:/home/builder/.m2${V_OPTS:-}" \
28 |   -e "BUILDER_UID=$(id -u)" \
29 |   -e "BUILDER_GID=$(id -g)" \
30 |   --ulimit nofile=500000:500000 \
31 |   tdp-builder
32 | ```
33 | 
34 | The important parameters are:
35 | - `~/.m2` should be mounted to keep the compiled jars outside the container and to reuse your local Maven cache for faster builds
36 | - `BUILDER_UID` and `BUILDER_GID` should be defined so the `builder` user is created and the build does not run as root
37 | - `--ulimit nofile=500000:500000` is helpful to run the tests (some are resource intensive and break easily with a low ulimit)
38 | - `TDP_HOME` is where the TDP repositories (hadoop, hive, hbase, etc.) are cloned
39 | 
40 | ## Start script
41 | 
42 | All these steps can be run with `./bin/start-build-env.sh`.
43 | 
44 | **Note:** By default, the current directory is mounted into the build container. You can change the mounted directory to where the TDP source repositories live by running `export TDP_HOME="/path/to/tdp"` before running the `start-build-env.sh` script.
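
For example, assuming the TDP sources should live under `~/tdp` (an arbitrary path chosen for illustration):

```bash
# Clone a component (Hadoop here) under TDP_HOME, then start the build container.
export TDP_HOME="$HOME/tdp"
mkdir -p "$TDP_HOME"
git clone --branch branch-3.1.1-TDP https://github.com/TOSIT-IO/hadoop.git "$TDP_HOME/hadoop"
./bin/start-build-env.sh
# Inside the container, /tdp/hadoop is then available; follow the component's
# tdp/README.md for the component-specific build instructions.
```
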
45 | 
--------------------------------------------------------------------------------
/build-env/docker-entrypoint.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | set -eo pipefail
3 | 
4 | if [[ -n "$BUILDER_UID" ]] && [[ -n "$BUILDER_GID" ]]; then
5 |   # Create group and user only if they don't exist
6 |   [[ ! $(getent group "$BUILDER_GID") ]] && groupadd --gid "$BUILDER_GID" --system builder
7 |   if [[ ! $(getent passwd "$BUILDER_UID") ]]; then
8 |     useradd --uid "$BUILDER_UID" --system --gid "$BUILDER_GID" --home-dir /home/builder builder
9 |     # Avoid useradd warning if home dir already exists by making home dir ourselves.
10 |     # Home dir can exist if mounted via "docker run -v ...:/home/builder/...".
11 |     mkdir -p /home/builder
12 |     chown builder:builder /home/builder
13 |     gosu builder cp -r /etc/skel/. /home/builder
14 |   fi
15 |   # Avoid changing dir if a work dir is specified
16 |   [[ "$PWD" == "/root" ]] && cd /home/builder
17 |   if [[ $# -eq 0 ]]; then
18 |     exec gosu "$BUILDER_UID" /bin/bash
19 |   else
20 |     exec gosu "$BUILDER_UID" "$@"
21 |   fi
22 | fi
23 | 
24 | exec "$@"
25 | 
--------------------------------------------------------------------------------
/static/tdp_logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TOSIT-IO/TDP/9020cc0e7d814bf67d05609fe25283f1fd28a09d/static/tdp_logo.png
--------------------------------------------------------------------------------
/static/tdp_logo_cmjn.svg:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | 
77 | 
--------------------------------------------------------------------------------
/static/tdp_logo_cmjn_white.svg:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | 
77 | 
--------------------------------------------------------------------------------
/static/tdp_logo_white.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TOSIT-IO/TDP/9020cc0e7d814bf67d05609fe25283f1fd28a09d/static/tdp_logo_white.png
--------------------------------------------------------------------------------
/ui/README.md:
--------------------------------------------------------------------------------
1 | # Trunk Data Platform UI
2 | 
3 | The purpose of TDP is to be used in production environments.\
4 | For that, we need a nice and effective UI; here it is.
5 | 
6 | ## API
7 | 
8 | The API is written in Python and uses Flask. All output is returned in the UI. It uses OpenAPI. The API controls SERVICES and COMPONENTS. Using the API is and must forever be an option! This means all Ansible roles and playbooks must remain 100% functional if you manipulate them manually.
9 | 
10 | Here are the key functions of the API:
11 | - Deploy a service or component
12 | - List the services or components
13 | - Delete a service or component
14 | - Start a service or component
15 | - Stop a service or component
16 | - CRUD configurations
17 | - Rolling restart
18 | - CRUD config groups
19 | 
20 | An instance of this API will only manage one cluster.
21 | 
22 | REACT or VUE.js must be chosen.
23 | 
24 | ## SUPERVISION
25 | 
26 | In V0, there will be no history.\
27 | Supervision must not use any kind of agent.
28 | 
29 | LEVEL 0 : Uses NC or Systemctl\
30 | LEVEL 1 : Uses CURL or RPC or JMX\
31 | LEVEL 2 : Service Check\
32 | LEVEL 3 : Auto-resolves all issues :)
33 | 
34 | All the previous features are implemented for services and components in the API
35 | 
36 | ## STOP / START / ROLLING
37 | 
38 | Prerequisites : All the following things are supposed to be set up and in a functional state:
39 | - KDC
40 | - LDAP
41 | - RDBMS
42 | - SSSD
43 | - SSL
44 | - Hadoop cluster must have already been deployed
45 | 
46 | Here are the service dependencies
47 | 
48 | 
49 |                                ZK
50 |                                |
51 |                                |
52 |                           Ranger Admin
53 |                                |
54 |                                |
55 |                                JN
56 |                                |
57 |                                |
58 |                               ZKFC
59 |                                |
60 |                                |
61 |     ___________________________ NN + DN ____________________________
62 |    |                           |                                    |
63 |    |                           |                                    |
64 | SPARK HS               YNM + YRM + YATS                       HBASE MASTER
65 |    |                           |                                    |
66 |    |                           |                             HBASE RS + REST
67 |    |                       HS2 + HSM                                |
68 |    |                           |                               PHOENIX QS
69 |    |___________________________|____________________________________|
70 |                              OOZIE
71 |                                |
72 |                                |
73 |                              KNOX
74 | 
75 | 
76 | ## CONFIGURATIONS MANAGEMENT
77 | 
78 | Playbook orchestration, versioning\
79 | LEVEL 0 : Hadoop cluster must have already been deployed\
80 | LEVEL 1 : No existing Hadoop cluster
81 | 
82 | ### GIT REPO for OPS, VARS, TOPOLOGIES
83 | 
84 | The Git repo is synchronized; the API server is the only instance that can push to the remote\
85 | In Git, only vars are stored, not XML files\
86 | The rule is : 1 modification = 1 commit\
87 | We use an RDBMS instance in order to store the history of deployment actions\
88 | When we deploy (a sketch follows at the end of this file):
89 | - 0 : We choose the commit
90 | - 1 : We do the deployment action
91 | - 2 : We write the action in the RDBMS
92 | 
93 | If a rollback is needed -> git revert (1 and only 1 commit)
94 | 
95 | The configuration management scope is the whole cluster (1 and only 1 ops repo)
96 | 
97 | ### GIT REPO for Ansible Collections
98 | 
99 | .
100 | 
101 | ## METROLOGY
102 | 
103 | ### UI
104 | UI = GRAFANA\
105 | Backend = Prometheus\
106 | Is there any kind of feature for time-frame merging in Prometheus?\
107 | If not, let's have a look at a solution with GRAPHITE + an agent
108 | 
109 | ### Logs
110 | We log services like this
111 | 
112 |     Log file
113 |        |
114 |        |
115 |     Fluentd
116 |        |
117 |        |
118 | Elasticsearch
119 | 
--------------------------------------------------------------------------------
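
The "1 modification = 1 commit" deploy/rollback loop described under "GIT REPO for OPS, VARS, TOPOLOGIES" in `ui/README.md` above could look like the following minimal sketch; the ops repository path, inventory, playbook name, and deployments table are hypothetical placeholders:

```bash
# Minimal sketch of the flow from ui/README.md; paths, playbook name, and the
# deployments table schema are hypothetical.
set -euo pipefail

commit="$1"                                     # 0 : we choose the commit
cd /srv/tdp-ops
git checkout "$commit"
ansible-playbook -i topology.ini deploy.yml     # 1 : we do the deployment action
psql tdp_history -c \
  "INSERT INTO deployments (commit_sha, deployed_at) VALUES ('${commit}', now());"  # 2 : we write the action in the RDBMS

# Rollback: git revert of exactly one commit, then redeploy
#   git revert --no-edit "$commit" && ansible-playbook -i topology.ini deploy.yml
```
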