├── .gitignore ├── iep-012 ├── images │ ├── azure.png │ ├── environment_namespace.png │ ├── azure.xml │ └── environment_namespace.xml └── README.adoc ├── iep-004 ├── images │ ├── enable_azure_custom_logs.png │ └── get_azure_OMS_credentials.png └── README.adoc ├── template.adoc ├── README.adoc ├── iep-006 └── README.adoc ├── iep-009 └── README.adoc ├── iep-007 └── README.adoc ├── iep-002 └── README.adoc ├── iep-008 └── README.adoc ├── iep-001 └── README.adoc ├── iep-005 └── README.adoc ├── iep-003 └── README.adoc └── LICENSE /.gitignore: -------------------------------------------------------------------------------- 1 | *.html 2 | *.sw* 3 | -------------------------------------------------------------------------------- /iep-012/images/azure.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jenkins-infra/iep/master/iep-012/images/azure.png -------------------------------------------------------------------------------- /iep-012/images/environment_namespace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jenkins-infra/iep/master/iep-012/images/environment_namespace.png -------------------------------------------------------------------------------- /iep-004/images/enable_azure_custom_logs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jenkins-infra/iep/master/iep-004/images/enable_azure_custom_logs.png -------------------------------------------------------------------------------- /iep-004/images/get_azure_OMS_credentials.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jenkins-infra/iep/master/iep-004/images/get_azure_OMS_credentials.png -------------------------------------------------------------------------------- /template.adoc: 
-------------------------------------------------------------------------------- 1 | ifdef::env-github[] 2 | :tip-caption: :bulb: 3 | :note-caption: :information_source: 4 | :important-caption: :heavy_exclamation_mark: 5 | :caution-caption: :fire: 6 | :warning-caption: :warning: 7 | endif::[] 8 | 9 | = IEP-X: The title goes here 10 | 11 | :toc: 12 | 13 | .Metadata 14 | [cols="2"] 15 | |=== 16 | | IEP 17 | | X 18 | 19 | | Title 20 | | The title goes here 21 | 22 | | Author 23 | | The author goes here 24 | 25 | | Status 26 | | :speech_balloon: In-process 27 | 28 | | Type 29 | | [Process|Service|Architecture] 30 | 31 | | Created 32 | | 2016-10-25 33 | |=== 34 | 35 | 36 | 37 | == Abstract 38 | 39 | == Specification 40 | 41 | == Motivation 42 | 43 | == Rationale 44 | 45 | == Costs 46 | 47 | == Reference implementation 48 | 49 | -------------------------------------------------------------------------------- /README.adoc: -------------------------------------------------------------------------------- 1 | ifdef::env-github[] 2 | :tip-caption: :bulb: 3 | :note-caption: :information_source: 4 | :important-caption: :heavy_exclamation_mark: 5 | :caution-caption: :fire: 6 | :warning-caption: :warning: 7 | endif::[] 8 | 9 | 10 | = Infrastructure Enhancement Proposals 11 | 12 | 13 | This repository is for tracking enhancement proposals to the 14 | link:https://jenkins.io[Jenkins project's] 15 | infrastructure. The goal of this repository is to collect and codify proposals 16 | in a transparent and historically reviewable fashion. It can be thought of 17 | similar to efforts such as the 18 | link:http://www.ietf.org/rfc.html[IETF's RFC process] 19 | or the Python project's 20 | link:https://www.python.org/dev/peps/[PEPs]. 21 | 22 | 23 | = Creating a proposal 24 | 25 | 26 | . 
Prepare an AsciiDoc file outlining your proposal, using requirement verbiage following 27 | link:http://www.faqs.org/rfcs/rfc2119.html[RFC 2119] 28 | and the directions from 29 | link:https://github.com/jenkins-infra/iep/tree/master/iep-001[IEP-001]. 30 | . Copy `template.adoc` into the new `iep-xxx/` folder as `README.adoc`, 31 | numbering the directory sequentially from the previous IEP. 32 | . Write proposal 33 | . Open a pull request. 34 | -------------------------------------------------------------------------------- /iep-012/images/azure.xml: -------------------------------------------------------------------------------- 1 | 5Vlbc5s6EP41foyHiwH7sXYu5yGn45l0pu2TRwYZVMusK4Rj99dXAomLAZfUJO3JyYthWd32+3b1SRnZi93xgaF99C8EmI4sIziO7NuRZZkTyxU/0nLKLd5klhtCRgLlVBqeyA+sjIaypiTASc2RA1BO9nWjD3GMfV6zIcbgue62AVofdY9C3DA8+Yg2rZ9JwKPcOrW80v4PJmGkRzZdtb4d0s5qJUmEAniumOy7kb1gADx/2h0XmMrg6bjk7e47vhYTYzjmvRo4ah78pBeHA7FW9QqMRxBCjOhdaZ0zSOMAyx4M8RbxHRWPpnjER8K/VJ6/Spexo96WmJEd5piphmKK7PQlcxER0gbVxikMzWbBB4mheI0hxrnlnlCqvn/DnJ8UZ1DKQZjKZTwC7NUEE85gixdAgWULtw3DdReL4otGVmAy30DMK57y716EdJ4HT0asM/7KlEDKfOVlK8oiFmLl5RTYi6TBINbLTsKFYYo4OdR7R4q9YeFXAiweFMbteKuhD4imqtOR5VKulpiljF6l+z2VLJyXkamY3FD+3sXBHkjME92JGD7vJ//e4FZBdwlUgJKojUV1YBTGFK0xXUJCOIFYmH0RYEmJ+QEzTkRqPp45rIFz2FUcPlASyg9cMmCO1FvRj5jaXs5ydwxl0RrvEh/hsU8hDcboR8rwigIKVmtEUexjttpgxIVVRk5wr0Yjb3brFquXw+PjZXo0gdcNZqpOnHQlmebvz2XZcZRLVKk42nYNVbwWqjThzGLGsiJbAtqRXiou1XgpdCVpVMqak2a+iabTW68NtQpt5OhLxIU9ziyWIXtCyT7fATbkKOc2BCqWV0dFR7sCSgFcFZXCeA0s0wYsS0YOiGNhXDIIxM9HzJ+BbQfMvi6KD52V50mYJd74QBhPEV3Fal11vrTU5wZfWA5CP/C9yylZT8gG8nYb8Aq03uVcjb6UxbUc+saq8+5mWu8BNptEbCjnZCrW0ItfpvsaiqBUAV/1Nn5ZEWTJWxUE3jvRA15TDzhNPeBeqQeypiIs6FRx0Ht1F78cp0aviXVGpby/3yaW9xrEerHUdGq8+l+RyrT+BKtMfdpTtJqZg9LKGVTRfmJpwvF2+t4VrTgZc0RioWMTzA7Ev1rHXt407frG5dm9VKztXi+X3EHp8fT0+M6JscWn1QGleYRejxCmcaajmozQd0JDn2tMa1BKPCwf3jklEg4suxDrJMTlnWtYohhur9oxsQdgymRQpiRCQsiJ43UEsO29w/wBkXQmvbtEUl8R1EWPisjyKUoS4n+KSFzTWmVGVJVSt6rqoZVaLuRMfYlcEUv2RVreGGPTsVSF+t1j3bDnNrOFrP+5C4BfFqb
8RiAkPErXKx8CfIFfvcrP5asepy5c7Le7frOHPS69NLd75uZfdg5qyW19Z9Ijtwe/bddjD7SBLEQbRtYizOwNzid/WcqnMQmEB9kQHGSJv4L1t/ze+e3y35zN+qmPl5NHvJb/9svrf/nPU/vuJw== -------------------------------------------------------------------------------- /iep-012/images/environment_namespace.xml: -------------------------------------------------------------------------------- 1 | 7Vxdc5s6EP01ebwZMEY4j7Hb3q/0TibuTNtHBRSjBiOPkJO4v/6uQMJggU0MJk5LJjMxQgjpnCPtarXxhTNbvvzJ8Sr8zAISXYys4OXC+XAxGtnjEYI/smSTlXjjq6xgwWmgKm0L5vQnUYWWKl3TgCSlioKxSNBVudBncUx8USrDnLPncrUHFpXfusILYhTMfRyZpV9pIMKsdDLytuV/EboI9ZttpMa3xLqyGkkS4oA9F4qcjxfOjDMmsk/LlxmJJHgal+y5TzV3845xEosmD0xUP55wtFaDu/YF46p3YqOHDB1dyY/rZZRVcKZPhAsKoNzgexLdsoQKymKocs+EYEuoEMkbU+w/Ljhbx8GMRfI5aM15SH8KbVxHdCGfFWwFpaFYRnBhw0e2FhGNySxn0oLCtDkSqKtEcPZICq1b6U9+R1ME4E4faBTpmjGLYURTrF4dkQdof6rwgH6Rl1pQ7Zwq0DhhSyL4BqroB9zsCaVu21O6eN5qxVG4hwWZIFWGlToXecNbAuGD4rCGT9vg06CSxMG1nARw5Uc4SahfhhxGyTffFLjpxXd5cenKyxcqvul68Dm74+z8qJu3hFMYAeGVtGX9IoEx1XZghr6zNfdJWbAC8wVR1ZwaOgp4uxV46zJOIizoU7kbVSSoN9wyCh3csm2V2EbODo1Z79VDxam40w4qt+NOdtrJRmy0A1TiTaHaSlZI6rs7Kb9mZJUWCPiQNbhVWw5oMwGOOhBgG5HtU++pBOheDQI8GwG6hgBDIaRjcC1fMvoUru8vA+Y/En7pg5naFWdBhycxMz6IS4q1C0Mz9lAJS0djWxCe7Y7qldfK1KADQHOyYpc/SPxI4+QPn14yvnjXaKOx9XZoX5nr6r/re8JjWPoyyK0vfJ0IEjxOEgPm5JkuI5wi88BiMVd3JJx+SKPgBm/A0YKSRIC7pq+mIeP0J9THmiO4zYXyx8GJr4R++9BcNqZtP0ngsVsNtr1T9Bm/lCre4EQ7ez6LIrxK6H3aZfngEhYiGk+Vm9nSIzSUorH6VBha0WXNXXarG1nZO7LSLRRUhVCF9XCcSXtZOaa/+B9ekmSFpb2TqpoTf82p2AyaqtNUrYJuaKwROSgqQ5eNlJWtCqa0tJRQMynZLupASleHPT+5EZQbvJV22uYgNBovDs0mtZHH97ohy8SibhfwWix23aejsJg0xaLViGvcTt1S2TmwHQMA17IrALDG7QEwrdXf8QKmYWPTZP8yy0ijiZwJZu8OIrcTnZPlGGT9k3ltBlkwBGGshQrcJl5YdbgnyRaBmzQA82G8LblTA5VFDJ59iFLDG9IgIHG6Qgusp4iEWm09oKvuFH4Br5nc+LnQ9Rlc29tr+JXVuZixGEaDaUoEAWqfiaR3GuAkTNd+0zVtROirF+ZKekcdrMtjg907cMybUVs0c78wvx0wqv3+Phg1d7h3ZEET3crvOWE74NCb9MehZ3A4J/yJgtc9GMjXGEj9RNVe/GQG0/TutMFMt0wyDJJHQX50a0rVKcX7mZdtDal3NobU3N+khtSknHdtX9836R3Q3Kd11eH0KvOac12McnZret8d1x3Q26fhtc2I1y0LBqt7jNV1kBmrPJ3Vtc2gwrBPbWte7VdHzU43M804xLBTP
caWHuC0V2NaFX0Y9qodsNinzayIOBjsmUkdX0IqcWyVWZTeiQN5bFKaAXuzNnT0vZi2odaWYtbG4ZDrKZI2dDR+N7uiaZaGfn8e1W+WpXFEJoVWTpFkgF27QqDvkC1YjKOP29IdE3Qwqcc9oIofRIiNcp/wWjA5V/P33jA5w5WyatTXiXjGpnhqjl4aq6Lx1PPenoSG4Dbl6kgSXJME2+qLhUlfLJwD0pMKpGsip90jffU7IX1VgbTTE9LacAwri474lGgY90SDa4YVdzJw7kgEvmPjg4EhAed8EnBsnQ7dSwYOMiOXhmhOmYFTlwb9Fhk4yAzz1WDRZsSoxgE5gwwcZEbLhhScfVMZHf4vktPFNtGQg9N9bBPtX5z7jG2iIQunm9jmAU77jG2iIQ/n2NjmARb7jG0i899YhkycY+zkm2TiIDOPakjFOZk9RedjT80UrCEZpzsju5/oXo1sVdbVkI3T0vzu57fXPFgzZjGk4xxpehun4+gJ3Io4M8Ay7FhbJ7vWxJPewMJ65o7VIPaMzvy19ItxfVRx6I9qMhGHU3/Nu2vS/Ouf+lfKp+LYH9UkjXR+LOSdQfJF/6dzlTRUHPyjvtIvvN7SL84Cas+E2qvxObqHureTf6347eX3fAL0TUMVCxVZAdm32rRgocasTNzL8peVIXsnKtOhZXmfSTT79HIkwRUJNpOaLVlLgj1vdDKC5XY7/6LCrPr26x6dj/8D -------------------------------------------------------------------------------- /iep-006/README.adoc: -------------------------------------------------------------------------------- 1 | ifdef::env-github[] 2 | :tip-caption: :bulb: 3 | :note-caption: :information_source: 4 | :important-caption: :heavy_exclamation_mark: 5 | :caution-caption: :fire: 6 | :warning-caption: :warning: 7 | endif::[] 8 | 9 | = IEP-006: Jenkins.io on Azure 10 | 11 | :toc: 12 | 13 | .Metadata 14 | [cols="2"] 15 | |=== 16 | | IEP 17 | | 006 18 | 19 | | Title 20 | | Hosting Jenkins.io on Azure 21 | 22 | | Author 23 | | link:https://github.com/olblak[Olblak] 24 | 25 | | Status 26 | | :speech_balloon: In-process 27 | 28 | | Type 29 | | [Architecture] 30 | 31 | | Created 32 | | 2017-06-26 33 | |=== 34 | 35 | 36 | 37 | == Abstract 38 | 39 | Currently the main Jenkins website (jenkins.io) runs on a single server. + 40 | This basic architecture has his limitations in terms of availability and scalability and could be easily improved by using cloud services. 41 | 42 | The goal of this document is to design the best solution for a jenkins.io hosted on Azure. 43 | 44 | == Specification 45 | 46 | Following elements must be taken into accounts. 
47 | 48 | * Jenkins.io only contains static files 49 | * The endpoint requires HTTP/HTTPS support 50 | * Several URL redirections are used, but only one endpoint (https://jenkins.io) 51 | * www.jenkins.io is updated every 30 minutes 52 | 53 | === Static website 54 | 55 | 'www.jenkins.io' only contains static files, which means that using a blob storage for storing 56 | files and azure-cli for updating the website can be a solution to improve reliability. 57 | 58 | Advantages of this approach are: 59 | 60 | * No server to administer 61 | * Geo Redundancy 62 | * Easy file updates 63 | * Good SLA (99.99%) 64 | 65 | 66 | === URL 67 | 68 | The main website endpoint is 'https://www.jenkins.io' and everything else should be a permanent redirection to it. 69 | 70 | - 'jenkins-ci.org' 71 | - 'jenkins.io' 72 | - 'www.jenkins-ci.org' 73 | - HTTP traffic 74 | 75 | === Design 76 | 77 | We deploy the website on a blob storage (Azure File Storage) and we use web servers deployed on the Kubernetes cluster to process 78 | HTTP/HTTPS requests, SSL certificates, ... 79 | 80 | There are two ways to implement this solution: 81 | 82 | 1. The blob storage is mounted inside the docker container and the container works like a classic web server. 83 | 2. Files in the blob storage are accessed through an HTTP request, and the container works as a proxy. 84 | 85 | At this moment, there is no technical reason to prefer one of those two implementations over the other, therefore I suggest going with 86 | the blob storage mounted inside the container, as we are already doing it for other applications. 87 | 88 | Main advantages of this design are: 89 | 90 | * We already apply this design for other applications (azure repo proxy, pluginsite). 91 | * We can use Letsencrypt certificates for HTTPS 92 | * We can update the website without having to rebuild the docker image 93 | * We have full control over redirection/proxy rules. 
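To illustrate the redirection rules that the web servers in this design have to apply (see the URL section), here is a minimal Python sketch. It is only a model of the intended behaviour, not the actual web server configuration: the function name `redirect_target` and the constants are hypothetical, and preserving the request path on redirect is an assumption that this document does not state.

```python
from typing import Optional

# Canonical endpoint, taken from the URL section of this document.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.jenkins.io"


def redirect_target(scheme: str, host: str, path: str = "/") -> Optional[str]:
    """Return the URL for a permanent (301) redirect, or None when the
    request already targets the canonical endpoint.

    Hypothetical helper: it only models the rule "everything else is
    permanently redirected to https://www.jenkins.io". Keeping the
    request path is an assumption.
    """
    if scheme.lower() == CANONICAL_SCHEME and host.lower() == CANONICAL_HOST:
        return None  # already canonical: serve the static file directly
    if not path.startswith("/"):
        path = "/" + path
    return f"{CANONICAL_SCHEME}://{CANONICAL_HOST}{path}"
```

With this rule, plain HTTP traffic and the legacy hosts ('jenkins-ci.org', 'jenkins.io', 'www.jenkins-ci.org') all end up on the same HTTPS endpoint, which is the kind of control a standalone blob storage cannot provide.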
94 | 95 | == Motivation 96 | The motivations here are essentially to increase reliability, availability and scalability. 97 | 98 | == Rationale 99 | There are many different ways to host a static website, and I briefly look at some of them here. 100 | 101 | === Dedicated Server 102 | Running an Apache/Nginx web server on a dedicated server. + 103 | That's what we are doing at the moment. 104 | 105 | Pros: 106 | 107 | * We already have the configuration management code to manage it. 108 | * It's easy to control the cost as we pay for the dedicated server/virtual machine on a monthly basis. 109 | * It's easy to do continuous delivery as we only have to upload files to the correct directory. 110 | 111 | Cons: 112 | 113 | * It requires maintenance, server updates, ... 114 | * The dedicated server is a single point of failure. 115 | 116 | === Standalone Docker 117 | The docker image contains everything: the web server and the website. + 118 | That means that the docker image is self-sufficient. 119 | This docker image can be deployed on the k8s cluster. + 120 | 121 | Pros: 122 | 123 | * All docker pros 124 | 125 | Cons: 126 | 127 | * It adds a lot of complexity to the Continuous Delivery process as the website is updated every 30 min. 128 | * A lot of docker images will be built per day (up to 48), which means a lot of layers, space used, ... 129 | * We'll have to manage a large number of docker image tags, ... 130 | 131 | 132 | === Standalone Blob Storage 133 | As 'jenkins.io' is only composed of static files, we can use a blob storage to store 134 | files and azure-cli to update website content. 135 | 136 | Pros: 137 | 138 | * No server to administer 139 | * Geo Redundancy 140 | * Easy file updates 141 | * Good SLA (99.99%) 142 | 143 | Cons: 144 | 145 | * Very basic web server 146 | ** Does not support HTTPS. 
147 | ** Does not support URL redirection such as 'jenkins-ci.org' -> 'jenkins.io' 148 | ** URLs follow this format 'http:////' 149 | 150 | === Blob Storage + CDN 151 | 152 | Add a CDN in front of a blob storage 153 | 154 | Pros: 155 | 156 | * Provides HTTPS 157 | * Increases performance 158 | * Same advantages as standalone blob storage 159 | 160 | Cons: 161 | 162 | * Costs 163 | * Does not support domain redirection 164 | 165 | == Costs 166 | 167 | At this moment, it's hard to evaluate the price of this change as Azure blob storage pricing depends on several factors such as the number of requests or the amount of data stored. 168 | 169 | == Reference implementation 170 | 171 | This implementation works exactly like https://github.com/jenkins-infra/jenkins-infra/tree/staging/dist/profile/templates/kubernetes/resources/repo_proxy[Repo-proxy], except that instead of getting contents from Artifactory and caching the results on the blob storage, we get contents from a blob storage. 172 | -------------------------------------------------------------------------------- /iep-009/README.adoc: -------------------------------------------------------------------------------- 1 | ifdef::env-github[] 2 | :tip-caption: :bulb: 3 | :note-caption: :information_source: 4 | :important-caption: :heavy_exclamation_mark: 5 | :caution-caption: :fire: 6 | :warning-caption: :warning: 7 | endif::[] 8 | 9 | = IEP-9: Incremental builds Maven repository 10 | 11 | :toc: 12 | 13 | .Metadata 14 | [cols="2"] 15 | |=== 16 | | IEP 17 | | 9 18 | 19 | | Title 20 | | Incremental builds Maven repository 21 | 22 | | Author 23 | | link:https://github.com/rtyler[R Tyler Croy] 24 | 25 | | Status 26 | | :speech_balloon: In-process 27 | 28 | | Type 29 | | Architecture 30 | 31 | | Created 32 | | 2018-03-19 33 | 34 | | Discussions-To 35 | | link:http://lists.jenkins-ci.org/pipermail/jenkins-infra/2018-March/001417.html[infra@lists.jenkins-ci.org] 36 | 37 | |=== 38 | 39 | 40 | == Abstract 41 | 42 | In order to support more 
rapid continuous delivery models, such as that 43 | described by 44 | link:https://github.com/jenkinsci/jep/tree/master/jep/300[Jenkins Essentials], 45 | Jenkins core and plugin builds must be deployed into a Maven repository much 46 | more incrementally rather than waiting for a developer to manually deploy a 47 | release to the existing `releases` footnote:[https://repo.jenkins-ci.org/releases/] 48 | repository. 49 | 50 | 51 | == Specification 52 | 53 | For a more incremental developer workflow, rather than attempting to make sense 54 | of potentially thousands of `SNAPSHOT` versions of artifacts in a repository, 55 | this document proposes a specific and tightly controlled Maven repository for 56 | such builds. 57 | 58 | This would mean Artifactory would have an `incrementals` repository *just* for 59 | these kinds of pre-release builds. For example, rather than 60 | `git-client-2.4.0-SNAPSHOT.jar` the repository would contain 61 | `git-client-2.4.0-a3dbf.jar`, assuming `a3dbf` is the Git short-commit for the 62 | build. 63 | 64 | === Garbage Collection 65 | 66 | Artifacts in the `incrementals` repository must be garbage collected to the 67 | most recent 5 artifacts **or** the most recent 30 days of artifacts. For 68 | example, if a plugin (`io.jenkins.plugins.retrocean`) has no new commits in a 69 | 30 day period, the `io/jenkins/plugins/retrocean/` directory should have no 70 | more than the five most recent artifacts stored. More active plugins should 71 | have no more than 30 days worth of artifacts stored. 72 | 73 | 74 | [NOTE] 75 | ==== 76 | Plugin and core tooling should use a consistent format for defining incremental 77 | built versions, such as: `-..-.jar`. The 78 | specific format is not required for this document. 
79 | ==== 80 | 81 | 82 | == Motivation 83 | 84 | By adding this new, somewhat ephemeral, Maven repository, we can support newer 85 | development workflows in the Jenkins project without adversely affecting the 86 | existing "mainstream" development workflow of Jenkins plugins. 87 | 88 | As mentioned briefly in <<costs>>, the overhead is sufficiently low that 89 | adding a new Maven repository, even if it's experimental and eventually 90 | abandoned, is worth the exploration. 91 | 92 | == Rationale 93 | 94 | A Maven repository hosted in Artifactory, alongside the `releases` and a number 95 | of other repositories, results in the simplest to deploy, and simplest to 96 | consume "bucket of artifacts" we currently have at our disposal. 97 | 98 | In order to avoid over-reliance on these "incremental builds", the artifacts 99 | themselves should be expected to be deleted after the specified "garbage 100 | collection" period. 101 | 102 | 103 | === Alternate Approaches 104 | 105 | ==== Azure Storage Container 106 | 107 | One alternative approach which was briefly discussed in person with 108 | link:https://github.com/carlossg[Carlos Sanchez] was to drop these 109 | "incremental build" artifacts directly into an Azure storage container. 110 | 111 | This approach was rejected as it would require additional non-native tooling to 112 | be used in the development workflow. By adopting a Maven repository, existing 113 | tools for grabbing Maven dependencies can be utilized by developers wishing to 114 | incorporate "incremental builds" into their build and test workflows. 115 | 116 | 117 | ==== Existing Maven Repository 118 | 119 | Re-using the existing `releases` Maven repository was explicitly _not_ 120 | considered as an alternative approach as the Jenkins infrastructure team has 121 | had multiple performance issues 122 | footnote:[http://lists.jenkins-ci.org/pipermail/jenkins-infra/2017-December/001349.html] 123 | with that repository over the past couple of months. 
Most notably, the indexing 124 | times in the repository have varied between 10 minutes and over 60 minutes due 125 | to opaque server-side issues. Adding more load, and many 126 | more artifacts, to the repository would severely affect the indexing time and 127 | adversely affect the ability for Jenkins users to receive new plugin updates in 128 | the Update Center footnote:[https://updates.jenkins.io]. 129 | 130 | 131 | [[costs]] 132 | == Costs 133 | 134 | As Artifactory is a hosted service provided by link:https://jfrog.com[JFrog], 135 | it is not expected that any additional financial cost will be involved in this 136 | proposal. 137 | 138 | Additionally, since this document proposes an additional repository, it is not 139 | expected to incur any substantial runtime/performance cost on the existing 140 | Update Center and `releases` repository indexing process. 141 | 142 | 143 | == Reference implementation 144 | 145 | Since we have no staging implementation of Artifactory, there is no reference 146 | implementation; acceptance of this IEP document would result in a new 147 | repository being created on 148 | link:https://repo.jenkins-ci.org[repo.jenkins-ci.org]. 
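Although there is no reference implementation, the retention rule from the Garbage Collection section can be illustrated with a short Python sketch. It assumes one possible reading of that rule: an artifact is kept if it is among the five most recent *or* was deployed within the last 30 days. The function name `artifacts_to_keep` and the `(name, deployed)` tuples are hypothetical, and this is not how Artifactory itself would be configured.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

KEEP_LATEST = 5  # "most recent 5 artifacts"
KEEP_DAYS = 30   # "most recent 30 days of artifacts"


def artifacts_to_keep(artifacts: List[Tuple[str, datetime]], now: datetime) -> List[str]:
    """Return the names of artifacts that survive garbage collection,
    newest first.

    Assumed reading of the rule: keep an artifact if it is among the
    KEEP_LATEST newest ones or was deployed within the last KEEP_DAYS days.
    """
    newest_first = sorted(artifacts, key=lambda a: a[1], reverse=True)
    cutoff = now - timedelta(days=KEEP_DAYS)
    return [
        name
        for index, (name, deployed) in enumerate(newest_first)
        if index < KEEP_LATEST or deployed >= cutoff
    ]
```

Under this reading, a plugin with no commits in the last 30 days keeps only its five newest artifacts, while a more active plugin keeps everything deployed in the last 30 days.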
149 | -------------------------------------------------------------------------------- /iep-007/README.adoc: -------------------------------------------------------------------------------- 1 | ifdef::env-github[] 2 | :tip-caption: :bulb: 3 | :note-caption: :information_source: 4 | :important-caption: :heavy_exclamation_mark: 5 | :caution-caption: :fire: 6 | :warning-caption: :warning: 7 | endif::[] 8 | 9 | = IEP-007: Kubernetes cluster upgrade 10 | 11 | :toc: 12 | 13 | .Metadata 14 | [cols="2"] 15 | |=== 16 | | IEP 17 | | 007 18 | 19 | | Title 20 | | Kubernetes cluster upgrade 21 | 22 | | Author 23 | | link:https://github.com/olblak[Olblak] 24 | 25 | | Status 26 | | :speech_balloon: In-process 27 | 28 | | Type 29 | | [Process] 30 | 31 | | Created 32 | | 2016-07-04 33 | |=== 34 | 35 | 36 | 37 | == Abstract 38 | 39 | As the Kubernetes project frequently releases new versions, it is worthwhile to benefit from the new features and bugfixes introduced by those versions. 40 | Therefore we need an easy way to upgrade the Kubernetes cluster on a regular basis. 41 | 42 | 43 | == Specification 44 | 45 | Currently there are two main strategies to "upgrade" a cluster. + 46 | 47 | . Upgrading an existing cluster. 48 | . Migrating to a second cluster. 49 | 50 | As Azure does not (yet) provide any tools to upgrade existing clusters, we have to upgrade them manually. + 51 | It appears to be easier and safer to deploy a new cluster and re-deploy all resources on the new one. 52 | 53 | As long as the new cluster stays in the same region as the previous one, we can use the same blob storages and attach them to the new cluster. + 54 | The only important element that we lose when we migrate to a new cluster is the cluster public IP. + 55 | This means that we need to update 'nginx.azure.jenkins.io' with a new public IP. 56 | 57 | === Migration Process 58 | 59 | IMPORTANT: The old cluster must be kept until the new one is ready to serve requests. 
60 | 61 | ==== Step 1: Backup 62 | 63 | Ensure secrets containing Letsencrypt certificates are exported. (Requires https://github.com/jenkins-infra/jenkins-infra/pull/819[#PR819]) + 64 | A cron job should periodically export the Letsencrypt certificates into `~/backup/$(CLUSTER)/secret.$(APPLICATION)-tls.yaml` 65 | 66 | .Export a secret 67 | ---- 68 | .bin/kubectl get secret $(APPLICATION)-tls --export=true --kubeconfig .kube/$(CLUSTER).conf -o yaml > ~/backup/$(CLUSTER)/secret.$(APPLICATION)-tls.yaml 69 | ---- 70 | 71 | ==== Step 2: Deploy the new cluster 72 | 73 | Add a second k8s resource in github.com/jenkins-infra/azure, named 'pea' (Requires https://github.com/jenkins-infra/iep/pull/11[#PR11]) 74 | 75 | ==== Step 3: Configure the new cluster 76 | 77 | * Update the following hieraconfig variables with the new k8s cluster information (Requires a PR on jenkins-infra/jenkins-infra) 78 | 79 | ---- 80 | profile::kubernetes::params::clusters: 81 | - server: https://clusterexample1.eastus.cloudapp.azure.com 82 | username: clusterexample1-admin 83 | clustername: clusterexample1 84 | certificate_authority_data: ... 85 | client_certificate_data: ... 86 | client_key_data: ... 87 | ---- 88 | 89 | * Run puppet agent 90 | * Get the new public IP (Manual operation) 91 | ---- 92 | kubectl get service nginx --namespace nginx-ingress 93 | ---- 94 | * Restore backed up secrets containing Letsencrypt certificates on the new cluster (Manual operation) 95 | ---- 96 | .bin/kubectl apply -f ~/backup/$(OLD_CLUSTER)/secret.*-tls.yaml --kubeconfig .kube/$(CLUSTER).conf 97 | ---- 98 | * Validate the HTTPS endpoints (Manual operation) 99 | ---- 100 | curl -I https://jenkins.io --resolve "jenkins.io:443:" 101 | curl -I https://plugins.jenkins.io --resolve "plugins.jenkins.io:443:" 102 | curl -I https://repo.azure.jenkins.io --resolve "repo.azure.jenkins.io:443:" 103 | ... 
104 | ---- 105 | 106 | ==== Step 4: Update DNS Record 107 | 108 | Update nginx.azure.jenkins.io with the new public IP (Requires a PR on jenkins-infra/jenkins-infra) 109 | 110 | [NOTE] 111 | During the DNS update, requests will be sent either to the new cluster or to the old cluster. 112 | Users shouldn't detect any differences. 113 | 114 | ==== Step 5: Remove the old cluster 115 | Remove k8s.tf from jenkins-infra/azure (Requires a PR on jenkins-infra/azure) 116 | 117 | [NOTE] 118 | It may be safer not to automate this step, and just delete the correct storage account through the Azure portal. 119 | 120 | 121 | ==== Conclusion 122 | With this scenario, we shouldn't have any downtime, as HTTP/HTTPS requests will get almost the same response (depending on the service) whether they reach the old or the new cluster. 123 | 124 | 125 | == Motivation 126 | 127 | As testing environments have short lives, we create them to validate deployments and then trash them, so they often use the latest version available from Azure. 128 | This means that we may not detect issues when those versions are not aligned with production. 129 | We also would like to benefit from bugfixes and new features. 130 | It's easier to follow the Kubernetes documentation if we use a version close to the upstream version. 131 | 132 | 133 | == Rationale 134 | 135 | There isn't any tool to easily upgrade a Kubernetes cluster on Azure, so I tried to do it manually. 136 | 137 | I spent only one day trying this upgrade, but it wasn't trivial. 138 | I applied the following steps manually: 139 | 140 | * Update kubelet && kubectl 141 | * Update Kubernetes options according to the version (some options were deprecated and others were new) 142 | * Reproduce the production cluster in order to validate the migration. 143 | * Restart each node (master & client) after the upgrade 144 | 145 | Even after those operations I faced weird issues, so I decided to stop there and concluded that cluster migration was an easier and safer process. 
146 | 147 | There are several open issues regarding the upgrade procedure, so I suppose that it may be a possible alternative in the future. 148 | 149 | * https://github.com/Azure/ACS/issues/5[Azure/acs] 150 | * https://github.com/Azure/acs-engine/issues/464[Azure/acs-engine] 151 | 152 | == Costs 153 | 154 | It will cost a second cluster during the migration, but once everything is switched to the new cluster, the previous one can be decommissioned. 155 | 156 | 157 | == Reference implementation 158 | 159 | As of right now there is no reference implementation of the Kubernetes cluster upgrade. 160 | -------------------------------------------------------------------------------- /iep-002/README.adoc: -------------------------------------------------------------------------------- 1 | ifdef::env-github[] 2 | :tip-caption: :bulb: 3 | :note-caption: :information_source: 4 | :important-caption: :heavy_exclamation_mark: 5 | :caution-caption: :fire: 6 | :warning-caption: :warning: 7 | endif::[] 8 | 9 | = IEP-2: Azure Virtual Networks for Cluster Segregation 10 | 11 | :toc: 12 | 13 | .Metadata 14 | [cols="2"] 15 | |=== 16 | | IEP 17 | | 2 18 | 19 | | Title 20 | | Azure Virtual Networks for Cluster Segregation 21 | 22 | | Author 23 | | link:https://github.com/rtyler[R. Tyler Croy] 24 | 25 | | Status 26 | | :scroll: Complete 27 | 28 | | Type 29 | | Architecture 30 | 31 | | Created 32 | | 2016-11-16 33 | |=== 34 | 35 | 36 | == Abstract 37 | 38 | Currently the only network connecting various infrastructure services is the 39 | public internet. Regardless of the security level of a service, if other 40 | services must connect to it they must do so over the public internet. This is 41 | not an explicit design decision, but rather a consequence of the organic, 42 | cross-datacenter manner in which infrastructure grew. 
43 | 44 | 45 | == Specification 46 | 47 | As part of the 48 | link:https://wiki.jenkins-ci.org/display/JENKINS/Azure+Migration+Project+Plan[Azure migration] 49 | the Jenkins project now has access to "modern" software-defined networking 50 | tools which can allow designing and implementing better network topologies than 51 | "just the public internet." 52 | 53 | This can be accomplished with the Azure 54 | link:https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-overview[Virtual Networks] 55 | feature, similar to the AWS VPC feature, which allows definition of software 56 | defined networks into which Azure resources can be deployed (e.g. VMs, SQL 57 | Server DBs, etc). 58 | 59 | This proposal is for the creation of *three* Virtual Networks for the Jenkins 60 | project infrastructure: 61 | 62 | 63 | . *Public Production* 64 | . *Private Production* 65 | . *Development* 66 | 67 | At some point in the future, there might be more networks necessary, but at 68 | this point this seems sufficient to bootstrap the project infrastructure on 69 | Azure in a reasonably sane fashion. 70 | 71 | === Public Production 72 | 73 | The "Public Production" Virtual Network would contain end-user (developer or 74 | Jenkins user) facing services such as, but not limited to: 75 | 76 | * link:https://jenkins.io[jenkins.io] - Primary (static) website 77 | * link:https://ci.jenkins.io[ci.jenkins.io] - Jenkins-on-Jenkins cluster 78 | * link:https://accounts.jenkins.io[accounts.jenkins.io] - Account app 79 | * JIRA and Confluence 80 | 81 | Services in this network should be appropriately protected with 82 | Network Security Groups, only allowing the necessary application ports to be 83 | serviced, but should be otherwise considered "public." 84 | 85 | 86 | === Private Production 87 | 88 | The "Private Production" network is for services which are internal, or highly 89 | sensitive, within the Jenkins project's infrastructure. 
These are services 90 | which include, but are not limited to: 91 | 92 | * Puppet Master - the holder of all secrets for provisioning Public and Private 93 | Production-level services 94 | * "trusted.ci" - a behind-the-scenes Jenkins cluster with release/signing keys 95 | 96 | This network would also utilize Network Security Groups and will be peered with 97 | the Public Production network via 98 | link:https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-peering-overview[Virtual Network Peering] 99 | which allows the two networks to route between each other via the Azure network 100 | backbone using private IP addresses. This peering is required to manage 101 | services via Puppet in the Public Production network. 102 | 103 | 104 | ==== Contributor Access 105 | 106 | The Private Production network will be locked down and inaccessible from the 107 | public internet. There are however some contributors, such as board members and 108 | team leads, who will need access to services within the Private Production 109 | network. 110 | 111 | These contributors will need to be granted access via a 112 | link:https://docs.microsoft.com/en-us/azure/vpn-gateway/vpn-gateway-about-vpngateways#point-to-site[VPN Gateway] 113 | in Azure. This creates some minor additional cost and management overhead 114 | per-user, but the security provided by the Private Production network enables 115 | projects such as fully automated core releases. 116 | 117 | 118 | === Development 119 | 120 | The "Development" network is somewhat of a "catch-all" for services which are 121 | not yet ready for production usage, testing, and demonstration environments. 122 | Services which live in this environment *will not* have access to the Puppet 123 | Master and therefore will not be capable of being fully provisioned in the 124 | same manner as "production" services. 
125 | 126 | 127 | NOTE: At some point in the future, a staging Puppet master may be provisioned in this 128 | network but that is outside the scope of this document. 129 | 130 | 131 | 132 | --- 133 | 134 | [source] 135 | ---- 136 | +---------------------+ 137 | | | 138 | +---------------> | Public Production <-------+ 139 | | | | | 140 | | +---------------------+ VNet Peering 141 | | | 142 | | +-------------v--------+ 143 | +-------------+ | | 144 | The Internet ---------> + VPN Gateway |-| Private Production | 145 | +-------------+ | | 146 | | +----------------------+ 147 | | 148 | | +----------------+ 149 | | | | 150 | +---------------> | Development | 151 | | | 152 | +----------------+ 153 | ---- 154 | 155 | 156 | == Motivation 157 | 158 | 159 | Structuring the Azure-based infrastructure across the three proposed Virtual 160 | Networks will create an additional level of service balkanization which we 161 | currently (pre-Azure) are unable to provide. Per our possible infrastructure 162 | compromise earlier this year 163 | footnote:[https://jenkins.io/blog/2016/04/22/possible-infra-compromise/], 164 | the infrastructure should be *more* balkanized whenever possible to reduce 165 | the impact, or remove the possibility, of incursions into Jenkins project 166 | infrastructure. 167 | 168 | 169 | == Rationale 170 | 171 | 172 | The three Virtual Networks proposed represent a "minimum" structure to get 173 | infrastructure provisioned safely into Azure from a network perspective. 174 | 175 | A completely flat network topology was briefly considered, and does make 176 | management very easy, but leaves us with little network-based protection 177 | against unknown vulnerabilities in some of the non-end-user-facing services. 178 | Additionally, it is a requirement for at least "trusted.ci" to exist off the 179 | public internet as it contains signing keys and other highly sensitive secrets. 
180 | 181 | Trusting Network Security Groups alone may inadvertently leave open holes in 182 | our infrastructure, whereas implementing a fundamental layer two 183 | footnote:[https://en.wikipedia.org/wiki/OSI_model#Layer_2:_Data_Link_Layer] 184 | separation ensures misconfigurations and/or accidents don't leave sensitive 185 | services in the Jenkins project infrastructure exposed. 186 | 187 | 188 | 189 | == Costs 190 | 191 | The cost of maintenance/implementation of these networks cannot be estimated at 192 | this point in time. 193 | 194 | The monetary cost only becomes a factor when routing traffic between two 195 | networks, which would be: 196 | 197 | [cols=2] 198 | |=== 199 | | Inbound data transfer 200 | | $0.01 per GB 201 | 202 | | Outbound data transfer 203 | | $0.01 per GB 204 | |=== 205 | 206 | link:https://azure.microsoft.com/en-us/pricing/details/virtual-network/[source] 207 | 208 | 209 | == Reference implementation 210 | 211 | As of right now there is no reference implementation of the various Virtual 212 | Networks. 213 | -------------------------------------------------------------------------------- /iep-008/README.adoc: -------------------------------------------------------------------------------- 1 | ifdef::env-github[] 2 | :tip-caption: :bulb: 3 | :note-caption: :information_source: 4 | :important-caption: :heavy_exclamation_mark: 5 | :caution-caption: :fire: 6 | :warning-caption: :warning: 7 | endif::[] 8 | 9 | = IEP-008: Ldap on Kubernetes 10 | 11 | :toc: 12 | 13 | .Metadata 14 | [cols="2"] 15 | |=== 16 | | IEP 17 | | 8 18 | 19 | | Title 20 | | Ldap on Kubernetes 21 | 22 | | Author 23 | | link:https://github.com/olblak[Olivier Vernin] 24 | 25 | | Status 26 | | :speech_balloon: In-process 27 | 28 | | Type 29 | | Architecture 30 | 31 | | Created 32 | | 2018-01-31 33 | |=== 34 | 35 | 36 | 37 | == Abstract 38 | 39 | As part of the migration to Azure, the Ldap server must be moved.
40 | We can take this opportunity to containerize this service and move it onto a container orchestrator such as the kubernetes cluster. 41 | This architecture change must be made carefully, as ldap is a stateful application whose database can be corrupted, leading to data loss. 42 | 43 | == Specification 44 | 45 | This IEP document is about running ldap on Kubernetes on Azure. 46 | Currently the ldap server is running on a bare metal machine and is configured by puppet. 47 | The objective here is to dockerize this application in order to run it on a kubernetes cluster. 48 | 49 | Because ldap is a stateful application, we must take the following aspects into consideration when deploying the ldap server in production. 50 | 51 | 52 | === Access 53 | Ldap must be accessible from inside and outside the kubernetes cluster. 54 | In order to reach ldap from outside the cluster, we need a fixed public IP and must only allow connections from whitelisted IP addresses. 55 | This can be easily done with the kubernetes resource 'service'. 56 | 57 | [cols="1a,2a", options="header"] 58 | .Access 59 | |=== 60 | |Inside 61 | |Outside 62 | | 63 | * https://accounts.jenkins.io/[Accountapp] 64 | | 65 | * https://repo.jenkins-ci.org/webapp/#/home[Artifactory] 66 | * https://ci.jenkins.io[ci.jenkins.io] 67 | * https://wiki.jenkins.io/[Confluence] 68 | * https://issues.jenkins-ci.org[Jira] 69 | * puppet master 70 | * spambot 71 | * trusted.ci 72 | |=== 73 | 74 | 75 | === Backup/Restore 76 | The procedure to back up and restore the database should be easy; two scripts are provided by the same docker image that runs ldap. + 77 | Backups must be stored on an Azure File Storage in order to simplify their access from various locations.
 + 78 | Backup names must follow the format 'YYYYmmddHHMM'. 79 | 80 | Backup and restore operations must be performed in the following situations: 81 | 82 | [cols="1a,2a", options="header"] 83 | .Backup/Restore 84 | |=== 85 | |Backup 86 | |Restore 87 | | * On a daily basis 88 | * When the application is stopping 89 | * On demand 90 | | * On demand 91 | |=== 92 | 93 | === Certificate 94 | https://issues.jenkins-ci.org/browse/INFRA-1151[INFRA-1151] 95 | 96 | Let's Encrypt can be configured with two different methods: 97 | 98 | *HTTP-01* 99 | 100 | HTTP-01 configuration needs an Ingress resource that listens on port 443, but this ingress resource cannot be configured to also listen on ports 389/636. 101 | This means that we need a service listening on ports 389/636; unfortunately, this service doesn't handle Let's Encrypt certificate requests. 102 | Therefore we would need both resource types, and they can't be configured with the same public IP. 103 | So this method doesn't work. 104 | 105 | 106 | *DNS-01* 107 | 108 | DNS-01 configuration only works with Google/AWS/Cloudflare. 109 | 110 | *Conclusion* 111 | I didn't find an easy way to use a Let's Encrypt certificate from kube-lego(deprecated)/cert-manager so 112 | we have to go with a manually requested ssl certificate. 113 | 114 | === Data 115 | The ldap database must be stored on stable storage that can be easily mounted/unmounted. 116 | Currently there is no perfect solution, as each option has advantages and disadvantages. 117 | 118 | ==== Dedicated Azure Disk storage 119 | ReadWriteOnce 120 | [cols="1a,2a", options="header"] 121 | .Dedicated Azure Disk Storage 122 | |=== 123 | |+ 124 | |- 125 | | 126 | * Persistent data across kubernetes clusters, as we only have one container running at a time. 127 | * We only have to restore a backup once. 128 | | 129 | * Complicates cluster upgrades, https://github.com/jenkins-infra/iep/tree/master/iep-007[iep-007], 130 | as traffic will only be redirected to the new server once the old one is deleted.
131 | * It means downtime when we upgrade the container as we must delete the old container before starting the new one. 132 | |=== 133 | 134 | ==== Dynamic Azure Disk Storage 135 | ReadWriteOnce 136 | 137 | [cols="1a,2a", options="header"] 138 | .Dynamic Azure Disk Storage 139 | |=== 140 | |+ 141 | |- 142 | | 143 | * Persistent data tied to a cluster's life cycle 144 | * Simplifies cluster migration: a new cluster can be started even if the old cluster is still running 145 | | 146 | * We must restore a backup on each new kubernetes cluster deployment 147 | * While migrating the cluster, we must be sure to put the old cluster in read-only mode. 148 | | 149 | |=== 150 | 151 | ==== Azure File storage 152 | ReadWriteMany + 153 | After running some tests, I noticed bad behaviors while running openldap on a CIFS partition, 154 | such as 'permission denied' issues even if the blob storage was mounted as an ldap user, 155 | or a database restore that hangs forever. 156 | In the end, I decided not to invest further time in this solution. 157 | 158 | ==== Conclusion 159 | Considering that it only takes 5 seconds to back up or restore an ldap database, using a dynamic azure disk storage sounds reasonable. 160 | 161 | === Kubernetes Design 162 | .Kubernetes Schema 163 | [source] 164 | ....
165 | +----------------------------------------------------------------------------------------------+ 166 | | Namespace: Ldap | 167 | +----------------------------------------------------------------------------------------------+ 168 | | | 169 | | +----------------------------------+ +----------------------------------+ | 170 | +---------------+ | Statefulset: Ldap | | PersistentVolume: ldap-backup | | 171 | |Service: Ldap | +----------------------------------+ +----------------------------------+ | 172 | +---------------+ | +---------------------------+ | | * Terraform Lifecycle | | 173 | | * Ldap (389) | | | POD: ldap-0 | | | * ReadWriteMany | | 174 | | * Ldaps (636) | | +---------------------------+ |<--------------------------------------+ | 175 | +---------------+ | | +----------------------+ | | +----------------------------------+ | 176 | | | | | | Container: Slapd | | | | PersistentVolume: ldap-data | | 177 | | | | | +----------------------+ | | +----------------------------------+ | 178 | | | | | | * Ldap server | | | | * ClusterLife cycle | | 179 | | +--------->| | +----------------------+ | |<---+ * ReadWriteOnce | | 180 | | | | | | +----------------------------------+ | 181 | | | | +----------------------+ | | | 182 | | | | | Container: Crond | | |<---+----------------------------------+ | 183 | | | | +----------------------+ | | | Secret: Ldap | | 184 | | | | | * Backup Task | | | +----------------------------------+ | 185 | | | | +----------------------+ | | | * SSL certificate | | 186 | | | | | | | * Blob storage credentials | | 187 | | | +---------------------------+ | | * Ldap credentials | | 188 | | +----------------------------------+ +----------------------------------+ | 189 | +----------------------------------------------------------------------------------------------+ 190 | .... 191 | 192 | == Motivation 193 | The motivation here is to benefit from both Kubernetes and Azure services advantages. 
 + 194 | link:https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/[What is Kubernetes?] 195 | 196 | == Rationale 197 | == Costs 198 | In addition to the Kubernetes cluster that we are already paying for, we'll need the following services: 199 | 200 | * Public IP 201 | * LoadBalancer 202 | * Azure file storage account for backup 203 | * Disk Storage account for Data 204 | * Ssl certificate `Ldap.jenkins.io` 205 | 206 | == Reference implementation 207 | * https://github.com/jenkins-infra/ldap[Docker Container] 208 | * https://github.com/jenkins-infra/jenkins-infra/pull/943[Jenkins-infra PR#943] 209 | * https://github.com/jenkins-infra/azure/pull/45[Azure PR#45] 210 | * https://issues.jenkins-ci.org/browse/INFRA-1131[JIRA Issue] 211 | -------------------------------------------------------------------------------- /iep-001/README.adoc: -------------------------------------------------------------------------------- 1 | ifdef::env-github[] 2 | :tip-caption: :bulb: 3 | :note-caption: :information_source: 4 | :important-caption: :heavy_exclamation_mark: 5 | :caution-caption: :fire: 6 | :warning-caption: :warning: 7 | endif::[] 8 | 9 | = IEP-1: Infra Enhancement Proposal Format 10 | 11 | :toc: 12 | 13 | .Metadata 14 | [cols="2"] 15 | |=== 16 | | IEP 17 | | 1 18 | 19 | | Title 20 | | Infra Enhancement Proposal Format 21 | 22 | | Author 23 | | link:https://github.com/rtyler[R Tyler Croy] 24 | 25 | | Status 26 | | :scroll: Complete 27 | 28 | | Type 29 | | Process 30 | 31 | | Created 32 | | 2016-10-25 33 | |=== 34 | 35 | 36 | 37 | == What is an IEP? 38 | 39 | IEP ("eep!") stands for Infrastructure Enhancement Proposal. An IEP is a design 40 | document providing information to the Jenkins community, or describing an 41 | addition/change to the Jenkins project's infrastructure. The IEP should provide a 42 | concise technical specification of the addition/change and a rationale for the 43 | addition/change.
44 | 45 | We intend IEPs to be the primary mechanism for major infrastructure proposals, 46 | collecting feedback on the proposals, and for documenting the design decisions 47 | that have gone into the infrastructure over time. The IEP author is responsible 48 | for building consensus among the infrastructure team members, informing the 49 | users of the infrastructure and documenting dissenting opinions. 50 | 51 | Because the IEPs are maintained as text files in a versioned repository, their 52 | revision history is the historical record of the feature proposal 53 | footnoteref:[ieprepo, The source repository for IEPs can be found on link:https://github.com/jenkins-infra/iep[GitHub]]. 54 | 55 | 56 | == IEP Types 57 | 58 | 59 | . **Architectural change** -- an architecture change describes a new way of 60 | delivering a service or services for the Jenkins project infrastructure. 61 | 62 | . **Services change** -- this IEP would describe a new service being added to 63 | the infrastructure or changing how an existing service is deployed. 64 | 65 | . **Process change** -- describes a change to an existing process of the Jenkins 66 | project, or the introduction of a new process, to improve the efficiency, 67 | transparency or some other quality of the operation of project infrastructure. 68 | 69 | 70 | 71 | 72 | == IEP Workflow 73 | 74 | 75 | === Start with an idea for infrastructure 76 | 77 | The IEP process begins with a new idea for Jenkins project infrastructure. It 78 | is highly recommended that a single IEP contains a single key proposal or new 79 | idea. Small changes often don't need a full-fledged IEP document and can simply 80 | be contributed as Puppet or other code to the infrastructure workflow. 81 | 82 | Each IEP must have a champion -- someone who writes the IEP using the style and 83 | format described below, shepherds the discussions in the appropriate forums, 84 | and attempts to build community consensus around the idea. The IEP champion 85 | (a.k.a.
Author) should first attempt to ascertain whether the idea is 86 | IEP-worthy by discussing it on the 87 | link:mailto:infra@lists.jenkins-ci.org[infra mailing list]. 88 | 89 | Once the champion has asked the infra community whether an idea has any 90 | chance of acceptance, a draft IEP should be authored and linked to the mailing 91 | list. This gives the author the chance to flesh out the draft and its ideas 92 | before taking the time to formally submit the proposal. 93 | 94 | 95 | 96 | === Submitting an IEP 97 | 98 | Following a discussion on the infra mailing list, the proposal should be 99 | submitted as a draft IEP via a GitHub pull request to the IEP repository 100 | footnoteref:[ieprepo]. 101 | 102 | The standard IEP workflow is: 103 | 104 | * You, the IEP author, fork the IEP repository footnoteref:[ieprepo], and 105 | create a directory named `iep-xxx` that contains a `README.adoc` which is the 106 | IEP document. Replace `xxx` with an incremental IEP number (e.g. `002`). 107 | * Push this to your GitHub fork and submit a pull request. 108 | * The IEP editors review your PR for structure, formatting, and other errors. 109 | Once the IEP editors have reviewed the basic formatting, they will attach the 110 | "draft" label to the pull request. 111 | 112 | IEP authors are responsible for collecting community feedback on an IEP 113 | before submitting it for review. However, wherever possible, long 114 | open-ended discussions on public mailing lists should be avoided. 115 | 116 | Strategies to keep the discussions efficient include: setting up a 117 | separate mailing list for the topic, having the IEP author accept 118 | private comments in the early design phases, setting up a wiki page, etc. 119 | IEP authors should use their discretion here. 120 | 121 | The IEP author is responsible for updating the pull request with appropriate 122 | changes based on these discussions.
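
The directory-creation step of the workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `next_iep_number` and `scaffold_iep` are hypothetical, the number is derived from the highest `iep-NNN` directory in the local checkout (the process itself only asks for an incremental number), and the new `README.adoc` is seeded from the repository's `template.adoc`.

```python
import re
import shutil
from pathlib import Path

# Matches proposal directories such as "iep-007"; the three-digit form follows
# the `iep-xxx` convention described above.
IEP_DIR = re.compile(r"^iep-(\d{3})$")


def next_iep_number(repo_root: Path) -> int:
    """Return one more than the highest iep-NNN directory in the checkout."""
    numbers = [
        int(match.group(1))
        for entry in repo_root.iterdir()
        if entry.is_dir() and (match := IEP_DIR.match(entry.name))
    ]
    return max(numbers, default=0) + 1


def scaffold_iep(repo_root: Path) -> Path:
    """Create iep-NNN/README.adoc from template.adoc and return its path."""
    target_dir = repo_root / f"iep-{next_iep_number(repo_root):03d}"
    target_dir.mkdir()  # fails loudly if the directory already exists
    readme = target_dir / "README.adoc"
    shutil.copyfile(repo_root / "template.adoc", readme)
    return readme
```

Running `scaffold_iep(Path("."))` from the root of a fork would leave a new directory ready to be committed and pushed for the pull request; the metadata table in the copied template still has to be filled in by hand.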
123 | 124 | 125 | === IEP Maintenance 126 | 127 | In general, IEPs won't change over time but will rather be superseded by follow-up 128 | or new proposals. Once a proposal has been accepted and implemented, the IEP 129 | should stand the test of time as a reference document for future infrastructure 130 | contributors. 131 | 132 | Process IEPs may be updated over time to reflect changes to development 133 | practices and other details. The precise process followed in these cases will 134 | depend on the nature and purpose of the IEP being updated. 135 | 136 | 137 | 138 | === Creating a successful IEP 139 | 140 | Each IEP should have the following parts: 141 | 142 | . **Preamble** -- an AsciiDoc table containing meta-data about the IEP including 143 | the number, a short descriptive title (limited to a maximum of 50 characters), 144 | the names, and optionally the contact info for each author, etc. 145 | . **Abstract** -- a short (200 word) description of the technical issue being 146 | addressed. 147 | . **Specification** -- The technical specification should describe the 148 | requirements, architecture and design of the proposal. 149 | . **Motivation** -- The motivation is critical for IEPs that want to add new 150 | services to the Jenkins project's infrastructure. It should clearly explain 151 | what benefit to the project the service brings, identifying the problems it 152 | solves or the features it offers to the users or developers of the Jenkins 153 | project. IEP submissions without sufficient motivation may be rejected 154 | outright. 155 | . **Rationale** -- The rationale fleshes out the specification by 156 | describing what motivated the design and why particular design 157 | decisions were made. It should describe alternate designs that 158 | were considered and related work, e.g. how the feature is supported 159 | in other languages.
160 | The rationale should provide evidence of consensus within the 161 | community and discuss important objections or concerns raised 162 | during discussion. 163 | . **Costs** -- The costs of making this change - these can be financial costs 164 | associated with new hosting, and/or organisational costs that will be borne by 165 | Jenkins users/teams as part of this change. 166 | . **Reference Implementation** -- The reference implementation must be 167 | completed for new services or process changes. This could be as succinct as 168 | the code repository containing a new application, or an example project 169 | demonstrating a new process change. 170 | While there is merit to the approach of reaching consensus on the 171 | specification and rationale before writing code, the principle of "rough 172 | consensus and running code" is still useful when it comes to resolving 173 | discussions about a service's architecture or implementation details. 174 | 175 | 176 | === IEP Formats and Templates 177 | 178 | IEPs are UTF-8 encoded text files using the 179 | link:http://asciidoctor.org[AsciiDoc] 180 | format. 181 | AsciiDoc allows for rich markup that is still quite easy to 182 | read, but also results in good-looking and functional HTML. 183 | 184 | 185 | ==== IEP Header Preamble 186 | 187 | Each IEP must begin with an AsciiDoc table containing meta-data relevant to the 188 | IEP. 189 | 190 | [source,asciidoc] 191 | ---- 192 | .Metadata 193 | [cols="2"] 194 | |=== 195 | | IEP 196 | | 1 197 | 198 | | Title 199 | | Infra Enhancement Proposal Format 200 | 201 | | Author 202 | | link:https://github.com/rtyler[R Tyler Croy] 203 | 204 | | Status 205 | | :speech_balloon: In-process 206 | 207 | | Type 208 | | Process 209 | 210 | | Created 211 | | 2016-10-25 212 | |=== 213 | ---- 214 | 215 | 216 | . **IEP** -- Proposal number, use a monotonically increasing number, starting 217 | from the latest merged IEP document 218 | .
**Title** -- Brief title explaining the proposal in fewer than 50 characters 219 | . **Author** -- Author/champion of the IEP, in essence, the individual 220 | responsible for seeing the IEP through the process. 221 | . **Status** -- In-process, Accepted, or Rejected. IEPs should be authored with 222 | an In-process status. 223 | . **Type** -- Describes the type of IEP: Architectural, Service, Process 224 | . **Created** -- Date (`%Y-%m-%d`) when the document was first created. 225 | 226 | 227 | ===== Additional Files 228 | 229 | IEPs may include additional files such as diagrams and code snippets. Such 230 | files should be added into the `iep-xxx/` directory with self-explanatory file 231 | names. 232 | 233 | 234 | === IEP Rejection 235 | 236 | If an IEP is rejected, the pull request for the IEP should still be merged with 237 | additional information added to the header of the document explaining the 238 | decision making process and why the proposal was rejected. 239 | 240 | This should help in the future when decisions must be revisited or reviewed as 241 | tools, technologies and needs of the project change. 242 | -------------------------------------------------------------------------------- /iep-005/README.adoc: -------------------------------------------------------------------------------- 1 | ifdef::env-github[] 2 | :tip-caption: :bulb: 3 | :note-caption: :information_source: 4 | :important-caption: :heavy_exclamation_mark: 5 | :caution-caption: :fire: 6 | :warning-caption: :warning: 7 | endif::[] 8 | 9 | = IEP-5: Systematic collection of project events 10 | 11 | :toc: 12 | 13 | .Metadata 14 | [cols="2"] 15 | |=== 16 | 17 | | IEP 18 | | 5 19 | 20 | | Title 21 | | Systematic collection of project events 22 | 23 | | Author 24 | | link:https://github.com/rtyler[R.
Tyler Croy] 25 | 26 | | Status 27 | | :fire: Abandoned 28 | 29 | | Type 30 | | Service 31 | 32 | | Created 33 | | 2016-12-28 34 | |=== 35 | 36 | 37 | 38 | == Abstract 39 | 40 | Considering the size and velocity of the Jenkins project, it can be difficult 41 | to systematically determine the overall health through manual checks, scraping 42 | of various APIs, or through other non-automated means. Additionally, without 43 | any project-owned corpus of data, it is burdensome to answer questions 44 | such as: 45 | 46 | * What's the level of development activity in core and in class A/B/C plugins? How is it changing over time? 47 | * Who are the seasoned contributors? 48 | * Who are the new contributors that we can reach out to and help? 49 | * What's the typical journey of a plugin developer? 50 | 51 | == Specification 52 | 53 | The primary source of project events is currently GitHub, which provides 54 | organization-wide webhooks. This specification focuses primarily on these 55 | events but can be readily extended when new sources of information arise. 56 | 57 | === Technologies Introduced 58 | 59 | This specification introduces a few new technologies which are currently not 60 | part of the Jenkins project infrastructure: 61 | 62 | * link:https://azure.microsoft.com/en-us/services/functions/[Azure Functions] 63 | * link:https://azure.microsoft.com/en-us/services/event-hubs/[Azure EventHub] 64 | * link:https://azure.microsoft.com/en-us/services/documentdb/[Azure DocumentDB] 65 | 66 | 67 | The motivation for selecting each of these pieces of technology is discussed 68 | further in the Rationale section.
69 | 70 | 71 | 72 | .Component Diagram 73 | [source] 74 | ---- 75 | 76 | +--------------------+ +----------------------------+ +--------------+ 77 | | GitHub (jenkinsci) +----webhook------>| github-event-queue (App) +--enqueue-->| EventHub | 78 | +--------------------+ +----->+----------------+-----------+ | (all events) | 79 | +------------------------+ | | +--------------+ 80 | | GitHub (jenkins-infra) +-------+ | 81 | +------------------------+ | 82 | +----append----+ 83 | | 84 | +--------------v--------+ 85 | | DocumentDB (github) | 86 | +-----------------------+ 87 | ---- 88 | 89 | 90 | === Data Format 91 | 92 | The following describes the JSON format to be expected by applications either 93 | querying DocumentDB or consuming from an EventHub. 94 | 95 | .JSON Event/Document Format 96 | [source,json] 97 | ---- 98 | { 99 | "source" : "github", // <1> 100 | "type" : "pull_request", // <2> 101 | "event" : {}, // <3> 102 | "received" : "2017-01-05T21:56:04.522Z" // <4> 103 | } 104 | ---- 105 | <1> Original source of the event, `"github"` indicates the event originated as a GitHub webhook. 106 | <2> Type of event, in the case of the `github` source, this will be one of the link:https://developer.github.com/webhooks/#events[webhook event types]. 107 | <3> The actual JSON payload of the event. 108 | <4> The ISO-8601 timestamp indicating when the event was received by the Azure Functions app. 109 | 110 | This table should be updated when new sources and types are added: 111 | 112 | .Event Sources 113 | |=== 114 | | Identifier | Description | Types 115 | 116 | | `github` 117 | | A GitHub webhook payload 118 | | One of the link:https://developer.github.com/webhooks/#events[webhook event types]. 119 | 120 | |=== 121 | 122 | 123 | === Storage Capacity 124 | 125 | The pricing (see the Costs section) for Azure DocumentDB storage is based only on what is _actually_ 126 | consumed rather than what is provisioned. Therefore each DocumentDB provisioned 127 | for events should be at minimum *25GB*.
128 | 129 | Time-to-live:: Azure DocumentDB supports a document or collection-level time-to-live (TTL), which 130 | can automatically purge documents after the TTL expires. Until we better 131 | understand the amount of event data the project wishes to store, this will 132 | remain *off*. 133 | Consistency:: Until a compelling reason to change the consistency level for the 134 | DocumentDB resources arises, the default of *session* consistency should be used. 135 | 136 | 137 | === Monitoring 138 | 139 | The monitoring facilities 140 | link:https://docs.microsoft.com/en-us/azure/azure-functions/functions-monitoring[built into Azure Functions] 141 | don't integrate with any of the existing monitoring tools in use by the Jenkins 142 | project. Azure Functions can, however, output diagnostic logs and web server 143 | logs into an Azure storage account. This is not scoped in this document because 144 | it is not yet clear whether these logs are necessary and worth the added cost 145 | of storing them. 146 | 147 | Azure DocumentDB does not have a supported integration with DataDog. 148 | 149 | 150 | Until there is more support in DataDog or Azure for better monitoring, these 151 | services will *not* be automatically monitored. 152 | 153 | 154 | 155 | == Motivation 156 | 157 | By provisioning relatively simple webhook receivers which not only archive 158 | event data into a live-queryable datastore (DocumentDB), but also publish those 159 | events on an Azure EventHub, the Jenkins project will have an easy-to-access 160 | data store of project events. Additionally, the events enqueued into EventHub 161 | can act as an input for future services (for example, a project health 162 | dashboard) without requiring additional infrastructure to be provisioned.
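
To make the receiver's job concrete, the envelope from the Data Format section can be sketched as a small Python function. This is an illustration only, not the reference implementation: the function name `wrap_event` is hypothetical, and the sketch assumes the receiver stamps the event with its own UTC clock at the moment of receipt.

```python
import json
from datetime import datetime, timezone


def wrap_event(source: str, event_type: str, payload: dict) -> dict:
    """Wrap a raw webhook payload in the four-field envelope from Data Format."""
    received = datetime.now(timezone.utc)
    return {
        "source": source,      # e.g. "github"
        "type": event_type,    # e.g. "pull_request"
        "event": payload,      # the untouched JSON payload of the webhook
        # ISO-8601 in UTC with millisecond precision and a trailing "Z",
        # matching the shape of "2017-01-05T21:56:04.522Z"
        "received": received.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }


if __name__ == "__main__":
    document = wrap_event("github", "pull_request", {"action": "opened"})
    print(json.dumps(document, indent=2))
```

Because the same dictionary would be both appended to DocumentDB and enqueued on the EventHub, consumers of either path see an identical structure.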
163 | 164 | == Rationale 165 | 166 | === Technologies Introduced 167 | 168 | An underlying rationale for using all of the Azure-specific technologies 169 | referenced below is that it is cheaper, easier, and faster to use 170 | platform-level services provided by Azure rather than implementing and hosting 171 | the features of each technology ourselves. 172 | 173 | ==== Azure Functions 174 | 175 | Implementing a GitHub webhook receiver in Azure Functions is sufficiently trivial 176 | that it is one of 177 | link:https://docs.microsoft.com/en-us/azure/azure-functions/functions-create-a-web-hook-or-api-function[their "Get Started" examples], 178 | and as such Azure Functions has explicit support and enhancements for receiving 179 | webhook payloads from GitHub. 180 | 181 | Additionally, Azure Functions represent a "slice" of computation which is 182 | suitable for the purpose of receiving a JSON payload, processing it, and 183 | storing it for later. This is as opposed to implementing a new web application 184 | specifically for this purpose, which would need either its own virtual machine 185 | or container infrastructure in order to execute. 186 | 187 | 188 | ==== Azure EventHub 189 | 190 | The use of Azure EventHub in the architecture described above is more 191 | future-proofing than a strong requirement to solve the problem at hand. The 192 | assumption being that additional services in the future will wish to consume 193 | some or all of the events received by the deployed Azure Functions app. 194 | 195 | The most practical means of providing this service internally is through a 196 | pub/sub mechanism which EventHubs provide in Azure. EventHubs also provide the 197 | added benefit of automatically expiring old messages along with many other 198 | valuable queueing features such as consumer groups and partitions.
199 | 200 | 201 | ==== Azure DocumentDB 202 | 203 | GitHub webhook event payloads are constructed as JSON, and it is expected that 204 | any subsequent events will be consumed as JSON. As such, a document-oriented 205 | database ("NoSQL") is preferred in order to avoid time-consuming schema 206 | updates. 207 | 208 | 209 | == Costs 210 | 211 | The pricing 212 | link:https://azure.microsoft.com/en-us/pricing/details/functions/[for Azure Functions] 213 | by itself is already confusing and without an existing Functions app 214 | consuming the `jenkinsci` events it's difficult to evaluate what the runtime 215 | cost would be. That said, 1 million monthly executions are provided for free by 216 | Azure, meaning the Function app itself will cost nothing or very little. 217 | 218 | 219 | The noticeable cost of this proposal will come from the 220 | link:https://azure.microsoft.com/en-us/pricing/details/event-hubs/[EventHub] 221 | and 222 | link:https://azure.microsoft.com/en-us/pricing/details/documentdb/[DocumentDB] 223 | storage and transit rates. 224 | 225 | 226 | === DocumentDB 227 | 228 | The storage rate, in East US, is *$0.25 per GB / month*. The throughput rate, in East US, is *$0.008/hr* per hundred 229 | link:https://docs.microsoft.com/en-us/azure/documentdb/documentdb-manage#request-units-and-database-operations[Request Units per second]. 230 | 231 | Assuming each DocumentDB instance is provisioned with 5GB of storage, the 232 | annual storage cost will be roughly *$15*. Though this is likely to go up as 233 | more data is stored over time. 234 | 235 | The throughput rate's annual cost is difficult to ascertain without real-world 236 | usage, but is expected to remain under *$100* barring dramatic shifts in 237 | expected usage. 238 | 239 | === EventHub 240 | 241 | The cost per million ingress events in East US is a paltry $0.028, so not worth 242 | discussing. 
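
As a rough check on the figures quoted in this Costs section, the arithmetic can be written out in Python. The rates are the East US rates stated above; the inputs used in the example (a single block of 100 Request Units per second, ten million ingress events, a 365-day year) are assumptions chosen for illustration, not measurements.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; assumes a non-leap year

STORAGE_RATE_PER_GB_MONTH = 0.25  # USD, DocumentDB storage
THROUGHPUT_RATE_PER_HOUR = 0.008  # USD per 100 Request Units per second
INGRESS_RATE_PER_MILLION = 0.028  # USD per million EventHub ingress events


def annual_storage_cost(gigabytes: float) -> float:
    """Yearly DocumentDB storage cost for a given number of stored gigabytes."""
    return round(gigabytes * STORAGE_RATE_PER_GB_MONTH * 12, 2)


def annual_throughput_cost(ru_per_second: int) -> float:
    """Yearly DocumentDB throughput cost, billed per block of 100 RU/s."""
    return round(ru_per_second / 100 * THROUGHPUT_RATE_PER_HOUR * HOURS_PER_YEAR, 2)


def ingress_cost(events: int) -> float:
    """EventHub ingress cost for a given number of events."""
    return round(events / 1_000_000 * INGRESS_RATE_PER_MILLION, 2)


if __name__ == "__main__":
    print(annual_storage_cost(5))        # 15.0, the "roughly $15" storage estimate
    print(annual_throughput_cost(100))   # 70.08, under the $100 expectation
    print(ingress_cost(10_000_000))      # 0.28
```

Under these assumptions a single 100 RU/s block stays below the $100 expectation, while a second block would push the throughput cost past it.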
243 | 244 | The throughput unit cost (1MB ingress, 2MB egress) comes to an annual cost of 245 | *$133.92*. 246 | 247 | == Reference implementation 248 | 249 | The reference implementation of the Azure Functions app can be found in the 250 | link:https://github.com/jenkins-infra/analytics-functions[jenkins-infra/analytics-functions] 251 | repository. 252 | 253 | The Terraform for actually provisioning the Azure Functions app in Azure can be 254 | found in 255 | link:https://github.com/jenkins-infra/azure/pull/12[this pull request]. 256 | 257 | -------------------------------------------------------------------------------- /iep-003/README.adoc: -------------------------------------------------------------------------------- 1 | ifdef::env-github[] 2 | :tip-caption: :bulb: 3 | :note-caption: :information_source: 4 | :important-caption: :heavy_exclamation_mark: 5 | :caution-caption: :fire: 6 | :warning-caption: :warning: 7 | endif::[] 8 | 9 | = IEP-3: Terraform for describing infrastructure as code 10 | 11 | :toc: 12 | 13 | .Metadata 14 | [cols="2"] 15 | |=== 16 | | IEP 17 | | 3 18 | 19 | | Title 20 | | Terraform for describing infrastructure as code 21 | 22 | | Author 23 | | link:https://github.com/rtyler[R. Tyler Croy] 24 | 25 | | Status 26 | | :speech_balloon: In-process 27 | 28 | | Type 29 | | Process 30 | 31 | | Created 32 | | 2016-11-16 33 | |=== 34 | 35 | 36 | 37 | == Abstract 38 | 39 | The migration to Azure means all infrastructure is (technically) an API call 40 | away. This means that more of our infrastructure can be managed via automation 41 | rather than, as was previously the case for some of it, via support 42 | tickets. In order to develop, modify, and deploy Azure-based infrastructure, the 43 | Jenkins project should define all infrastructure services and 44 | resources programmatically.
45 | 46 | == Specification 47 | 48 | 49 | link:http://terraform.io[Terraform] 50 | is a tool which provides Azure-specific abstractions 51 | footnote:[https://www.terraform.io/docs/providers/azurerm/index.html] 52 | for defining Azure resources programmatically, in addition to support for 53 | persisting the state of which resources have or have not yet been created. 54 | 55 | This document is not intended to explain all the features of Terraform, but 56 | rather describe how it should be used within the Jenkins project 57 | infrastructure. 58 | 59 | 60 | === Defining Infrastructure 61 | 62 | As a proof-of-concept, an existing Azure Storage account was modeled 63 | footnote:[https://github.com/jenkins-infra/azure/blob/7f3032ab2d2ef411d74d3a81097fbcd575850a34/plans/releases-storage.tf] 64 | with Terraform, e.g.: 65 | 66 | .releases-storage.tf 67 | [source] 68 | ---- 69 | resource "azurerm_resource_group" "releases" { 70 | name = "${var.prefix}jenkinsinfra-releases" 71 | location = "East US 2" 72 | } 73 | resource "azurerm_storage_account" "releases" { 74 | name = "${var.prefix}jenkinsreleases" 75 | resource_group_name = "${azurerm_resource_group.releases.name}" 76 | location = "East US 2" 77 | account_type = "Standard_GRS" 78 | } 79 | resource "azurerm_storage_container" "war" { 80 | name = "war" 81 | resource_group_name = "${azurerm_resource_group.releases.name}" 82 | storage_account_name = "${azurerm_storage_account.releases.name}" 83 | container_access_type = "container" 84 | } 85 | ---- 86 | 87 | Logical clusters or groups of resources should be defined in this manner, using 88 | a single `.tf` file in the `plans/` directory in the repository 89 | footnoteref:[azurerepo, https://github.com/jenkins-infra/azure]. 90 | This makes the Terraform plan corresponding to a specific part of 91 | the infrastructure easy to find, easier to review, and easy to test in 92 | isolation from the other resources. 93 | 94 | 95 | *All identifiers must be prefixed*.
96 | 97 | Azure contains a number of *global* identifier namespaces which can cause 98 | conflicts between two different contributors, or two different environments, 99 | when defining infrastructure. For example, if DevA defines a resource group 100 | named "jenkins", DevB cannot also define a resource group named "jenkins". 101 | _Some_ identifiers are subscription specific, but in order to avoid potential 102 | conflicts, all identifiers in Terraform resources *must* use the `prefix` 103 | variable: 104 | 105 | .example.tf 106 | [source] 107 | ---- 108 | resource "azurerm_resource_group" "example" { 109 | name = "${var.prefix}jenkinsinfra-examples" # <1> 110 | location = "East US 2" 111 | } 112 | ---- 113 | <1> Referencing `var.prefix` pulls in an environment/developer-specific defined prefix 114 | 115 | 116 | === Storing State 117 | 118 | Terraform generates state which allows the tool to act in an idempotent 119 | fashion. That is to say, without a `.tfstate` file of some form, Terraform may 120 | create redundant infrastructure in Azure. 121 | 122 | This state *is* an important part of what makes Terraform a useful tool for the 123 | Jenkins project (see <>). This state contains access keys, and other 124 | semi-confidential information which would not be safe to check into the 125 | repository 126 | footnoteref:[azurerepo]. 127 | 128 | 129 | To reap the benefits of Terraform state without needing a local filesystem or 130 | `.tfstate` file checked into the repository 131 | footnoteref:[azurerepo] 132 | the proposal is to store _production_ Terraform state in Azure blob storage 133 | footnoteref:[blobstore, https://azure.microsoft.com/en-us/services/storage/blobs/] 134 | using Terraform's built-in 135 | link:https://www.terraform.io/docs/state/remote/index.html[Remote State] 136 | functionality. 
137 | 138 | For production state, this would be configured as follows: 139 | 140 | .prodstate.tf 141 | [source] 142 | ---- 143 | data "terraform_remote_state" "prod_tfstate" { 144 | backend = "azure" 145 | config { 146 | storage_account_name = "jenkinsinfra-tfstate" 147 | container_name = "production" 148 | key = "terraform.tfstate" 149 | } 150 | } 151 | ---- 152 | 153 | 154 | ==== Requirements 155 | 156 | Storing Terraform state in Azure blob storage imposes two requirements on the 157 | infrastructure code: 158 | 159 | . A separate "bootstrap" set of Terraform plans exists to define the storage 160 | containers necessary to store/access production state. 161 | . `terraform apply` commands are run in a consistent fashion, in order to 162 | ensure that the appropriate production state is being referenced during the 163 | execution of the Terraform plans. 164 | 165 | 166 | The first requirement is readily addressed with a separate directory structure 167 | and some tooling in the repository 168 | footnoteref:[azurerepo]. 169 | 170 | The second requirement is addressed with the use of a "proper" Jenkins-based 171 | delivery pipeline for the Terraform plans. This would entail a Jenkins 172 | environment which has the appropriate credentials for provisioning 173 | infrastructure in the production Azure environment, e.g.: 174 | 175 | [source, groovy] 176 | ---- 177 | node('azurerm') { 178 | checkout scm 179 | 180 | stage('Validate') { 181 | sh 'terraform validate plans/*.tf' 182 | } 183 | stage('Plan') { 184 | sh 'terraform plan plans' 185 | } 186 | stage('Apply') { 187 | input 'Do the plans look good?' 188 | sh 'terraform apply plans' 189 | } 190 | } 191 | ---- 192 | 193 | 194 | This approach provides a single point of deployment for Terraform plans which 195 | can then be inspected or otherwise interacted with by the entirety of the 196 | Jenkins project infrastructure team instead of relying on individual 197 | contributors' laptops.
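
The prefixing rule from "Defining Infrastructure" lends itself to a small pre-flight check that could run before the `Plan` stage of such a pipeline. The following is only a sketch in Python and is not part of the reference implementation: the function names and the `deva` prefix are hypothetical, and validating names up front is an assumption made for illustration. It relies on the Azure rule that storage account names must be 3 to 24 characters long and contain only lowercase letters and digits.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
STORAGE_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def prefixed(prefix: str, base: str) -> str:
    """Mirror the prefix interpolation used for names in the Terraform plans."""
    return f"{prefix}{base}"


def check_storage_account_name(prefix: str, base: str) -> str:
    """Return the prefixed name, or raise ValueError if Azure would reject it."""
    name = prefixed(prefix, base)
    if not STORAGE_ACCOUNT_NAME.fullmatch(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return name


if __name__ == "__main__":
    # "jenkinsreleases" is the storage account name from releases-storage.tf;
    # "deva" is a made-up developer prefix (hypothetical).
    print(check_storage_account_name("deva", "jenkinsreleases"))
```

With a check like this, a developer prefix that makes a name too long or introduces a disallowed character would fail early rather than partway through `terraform apply`.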
198 | 199 | 200 | == Motivation 201 | 202 | By using Terraform to describe *all* infrastructure there will be no "hidden" 203 | infrastructure which is only known to a select few, unlike the current 204 | situation where one or two people might be aware of _where_ certain resources are 205 | located or how they relate to others. 206 | 207 | 208 | Defining all infrastructure resources in Terraform also lowers the bar for new 209 | infrastructure contributions, not only by making the actual infrastructure 210 | topologies open source, but by allowing practically anybody to provision 211 | infrastructure which resembles Jenkins project infrastructure. Currently there 212 | is no way to provision a "dev version of the Jenkins project infrastructure", 213 | and this would be feasible with Terraform plans describing the project's 214 | infrastructure. 215 | 216 | 217 | == Rationale 218 | 219 | The benefits of describing infrastructure as code should be self-evident. 220 | Before considering the rationale for choosing Terraform, first consider some 221 | other options: 222 | 223 | 224 | === Azure Resource Manager (ARM) templates 225 | 226 | ARM templates 227 | footnoteref:[arm, https://azure.microsoft.com/en-us/documentation/articles/resource-group-authoring-templates/] 228 | are conceptually interchangeable with AWS CloudFormation templates: templates 229 | defined in JSON to describe cloud resources. 230 | 231 | *Pros* 232 | 233 | * Supported for practically all resources on Azure 234 | * Relatively simple to use for the basic use-cases 235 | 236 | 237 | *Cons* 238 | 239 | * No state and therefore 240 | * Not idempotent 241 | * Entirely foreign to many within the operations community. 242 | * Would require an external source of template parameters in order to function.
243 | In essence, ARM allows template parameters but use of parameterized templates 244 | would mean Jenkins project infrastructure automation would require these to 245 | be defined externally to the ARM template to model multiple 246 | (local-dev vs. production) environments. 247 | 248 | === Puppet-defined resources 249 | 250 | See also: 251 | link:https://github.com/puppetlabs/puppetlabs-azure[puppetlabs-azure module] 252 | 253 | *Pros* 254 | 255 | * Jenkins infrastructure already has large amounts of Puppet code 256 | implemented, and a well-defined workflow for modifying, testing, and 257 | deploying Puppet code. 258 | * Puppet's graph approach supports idempotency 259 | 260 | *Cons* 261 | 262 | * Puppet must have an "execution context" which for most Puppet catalogues 263 | means a `node` (machine) which the catalogue is being executed against. In 264 | order to provision Azure resources a "deployment" node would need to exist 265 | whose sole job would be to provision Azure resources from Puppet. Basically, 266 | one cannot run Puppet on "Azure" to provision an Azure Load Balancer (for 267 | example). 268 | * The puppetlabs-azure module is not a very common approach, which means the 269 | tooling will lag behind "native" (i.e. supported by Microsoft) toolchains 270 | such as ARM templates 271 | footnoteref:[arm]. 272 | 273 | === Terraform 274 | 275 | :+1: 276 | 277 | Terraform is a reasonably popular and well understood tool, which enjoys 278 | contributions from Microsoft for its Azure support. 279 | 280 | *Pros* 281 | 282 | * Stateful and therefore 283 | * Supports idempotent operations 284 | * Widely used in the "modern operations" community; while the specific 285 | resources might not be familiar to newcomers, the tool itself would be. 286 | * Variable substitution and separation of state files allow development 287 | clusters to be created entirely separate from production while still 288 | resembling production infrastructure.
289 | 290 | 291 | *Cons* 292 | 293 | * "Yet another DSL" to learn in order to effectively contribute to the Jenkins 294 | project infrastructure 295 | * Doesn't support all resources defined by Azure, which might dictate the use 296 | of the 297 | link:https://www.terraform.io/docs/providers/azurerm/r/template_deployment.html[azurerm_template_deployment] 298 | resource in Terraform, and still needing to write ARM templates 299 | footnoteref:[arm]. 300 | 301 | 302 | 303 | 304 | 305 | == Costs 306 | 307 | There are no additional financial costs associated with using Terraform. There 308 | is a learning curve associated with Terraform but it's safe to assume that 309 | there's a learning curve with all things Azure for the infrastructure 310 | contributors at this point in time. 311 | 312 | == Reference implementation 313 | 314 | This 315 | link:https://github.com/jenkins-infra/azure/blob/7f3032ab2d2ef411d74d3a81097fbcd575850a34/plans/releases-storage.tf[plan] 316 | is a reference implementation of the only Azure resources provisioned to date. 317 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. 
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {yyyy} {name of copyright owner} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /iep-012/README.adoc: -------------------------------------------------------------------------------- 1 | ifdef::env-github[] 2 | :tip-caption: :bulb: 3 | :note-caption: :information_source: 4 | :important-caption: :heavy_exclamation_mark: 5 | :caution-caption: :fire: 6 | :warning-caption: :warning: 7 | endif::[] 8 | 9 | = IEP-12: Jenkins Release Management 10 | 11 | :toc: 12 | 13 | .Metadata 14 | [cols="2"] 15 | |=== 16 | | IEP 17 | | 12 18 | 19 | | Title 20 | | Jenkins Release Management 21 | 22 | | Author 23 | | link:https://github.com/olblak[Olivier Vernin] 24 | 25 | | Status 26 | | :speech_balloon: In-process 27 | 28 | | Type 29 | | [Architecture] 30 | 31 | | Created 32 | | 2018-07-25 33 | |=== 34 | 35 | 36 | == Abstract 37 | As part of continuously improving the Jenkins infrastructure project, it is now time to address how Jenkins is released and distributed.
38 | Currently, new versions can only be signed and published by link:https://github.com/kohsuke[Kohsuke], which puts a lot of responsibility on his shoulders and prevents the project from releasing new versions when needed. 39 | The purpose of this document is to design a secure and clear way to allow trusted people to trigger a new release. 40 | 41 | == Specification 42 | As the Jenkins project is built and driven by its community, we want to be as transparent as possible in the way we build, test and release new versions, and allow everybody to understand the process and provide reviews or audits. The release process must therefore be automated as much as possible and well-defined from git repositories. 43 | 44 | Building, testing, and releasing a new Jenkins version covers different topics which need to be addressed in this document. 45 | 46 | * Environment Provisioning 47 | * Environment Configuration 48 | * Environment Workflow 49 | 50 | === Provisioning 51 | The first element to agree on is where to run those environments and how to provision them. 52 | Today, the Jenkins Infrastructure project mainly uses an Azure account provisioned from Terraform code located link:https://github.com/jenkins-infra/azure[here]. This repository makes it easy for everybody to participate, whether by auditing, improving, or simply learning. So we just have to decide on the services that we need there. 53 | 54 | Before going further, we must clarify that while we try to be as open as possible, we still need some elements to be well hardened, to prevent malicious people from abusing our systems or stealing private information like keys. To that end, we need two levels of access: one infrastructure reachable by all contributors, to give quick feedback on test results, and another one with limited access from the internet, for security contributors or the release officer.
55 | 56 | As we are already running Jenkins master and agent in Docker containers with great success, we want to go further and use a container orchestration tool to deploy and configure our build environment. Kubernetes is an excellent fit and is therefore selected to orchestrate our workload. We'll use the Azure Kubernetes service. 57 | 58 | Another important element is where the GPG key and the SSL certificate are stored. 59 | Those two keys are used by maven release to generate signed artifacts. 60 | They are what ensures that end users can trust our artifacts and therefore must be extremely well protected. 61 | 62 | There are two approaches, each with its own benefits. 63 | 64 | 65 | The first and most convenient solution is to store every secret encrypted in a git repository and use a tool like link:https://github.com/mozilla/sops[sops] to decrypt them when needed. This solution makes it easy for everybody to audit secret modifications. The build environment only requires a GPG key to decrypt those secrets. This means that we don't need any additional infrastructure other than a Kubernetes cluster. 66 | 67 | 68 | The second solution would be to deploy the GPG key and the SSL certificate somewhere else that is not publicly available and restrict access to specific networks. 69 | 70 | image:images/azure.png[azure] 71 | 72 | The latter solution opens the door to a more secure but also more complex infrastructure, and complicates the reproducibility of our build environment, which is not necessarily needed for non-critical environments. 73 | 74 | Both can be used depending on the environment. 75 | 76 | 77 | This link:https://github.com/jenkins-infra/azure/pull/75[PR] partially implements this design. 78 | 79 | === Configuration 80 | Now that our infrastructure provisioning is defined, we can look at how we'll configure it and at the benefits of using Kubernetes for such a build environment.
81 | 82 | Before going further, it is worth recalling some key features provided by Kubernetes that influence our design. First, every Kubernetes cluster has an internal DNS in which a service is reachable as "service-name" from inside its namespace or as "service-name.namespace" across namespaces. This internal DNS makes it easy to spin up multiple environments using the same internal endpoints. 83 | Another powerful feature is that we can map an internal endpoint to an external one. 84 | This makes the environment totally agnostic of where the services are located. 85 | 86 | Let's come back to our environment configuration. Kubernetes in association with Helm makes it easy to deploy and configure all kinds of services required to build, test and publish, wherever the cluster is located, whether on Azure or even Minikube. 87 | Most of the services that we need can already be deployed with Helm charts. 88 | 89 | The big picture is to create one environment chart, one to rule them all. 90 | This main chart is responsible for deploying Jenkins credentials as Kubernetes secrets, setting a custom Jenkins image with specific configuration, and finally pulling in chart dependencies to deploy third-party services (Artifactory, Maven registry, ...). 91 | 92 | 93 | For example, let's imagine two different environments, security and release. 94 | The security environment generates short-lived artifacts and so is configured to deploy every service needed to share those artifacts (maven, docker, war, etc.). This makes it easier to control access and avoids polluting the official repository. 95 | But in the case of the release environment, it doesn't make sense to deploy a maven repository behind maven.release.jenkins.io as we already have link:https://repo.jenkins-ci.org[repo.jenkins-ci.org]. 96 | So we just map http://repo to https://repo.jenkins-ci.org.
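
The mapping between the shared internal endpoint and its per-environment counterpart can be pictured as a simple lookup. This is a sketch only: in a real cluster the mapping is handled by Kubernetes itself, and the dictionary and function names here are hypothetical. The URLs are the repository endpoints listed for each environment in this document's Endpoint table.

```python
# The internal endpoint name is identical in every environment; only the
# endpoint it corresponds to differs. URLs are taken from this document.
INTERNAL_REPO = "http://repo"

REPO_BY_ENVIRONMENT = {
    "weekly": "http://repo.weekly.jenkins.io",
    "security": "https://repo.cert.jenkins.io",
    "release": "https://repo.jenkins-ci.org",
}


def resolve_repo(environment: str) -> str:
    """Return the repository endpoint that http://repo stands for."""
    try:
        return REPO_BY_ENVIRONMENT[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None


if __name__ == "__main__":
    for env in REPO_BY_ENVIRONMENT:
        print(f"{env}: {INTERNAL_REPO} -> {resolve_repo(env)}")
```

Because build scripts only ever refer to the internal name, the same script can run unchanged in every environment.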
97 | 98 | .Environment 99 | image:images/environment_namespace.png[] 100 | 101 | Using Kubernetes as a CI/CD environment is not only about how we deploy and configure Jenkins, but also about how we deploy and configure all the third-party services that we need. 102 | 103 | This repository uses Jenkins-x to configure an environment: https://github.com/olblak/jenkins-release-environment. Jenkins-x is an opinionated way of configuring such an environment on Kubernetes. 104 | 105 | The following table displays the different endpoints that we can use for each environment; the internal endpoints are the same for every environment. 106 | 107 | [cols="h,3*", options="header"] 108 | .Endpoint 109 | |=== 110 | | | Repo | Jenkins | Registry 111 | | Internal | http://repo | http://ci | http://registry 112 | | Weekly | http://repo.weekly.jenkins.io | https://ci.weekly.jenkins.io | http://registry.weekly.jenkins.io 113 | | Security | https://repo.cert.jenkins.io | https://ci.cert.jenkins.io | https://ci.registry.jenkins.io 114 | | Release | https://repo.jenkins-ci.org | https://ci.release.jenkins.io | https://hub.docker.com 115 | |=== 116 | 117 | === Workflow 118 | It's now time to cover the last topic, which is how we build, test and release a specific version for an environment. Jenkins is used as the conductor by means of a Jenkinsfile and will execute a succession of actions defined link:https://github.com/olblak/jenkins/blob/master/Jenkinsfile.release[here]. 119 | In order to simplify the maintenance of this script, most of the logic has been moved to a shell link:https://github.com/olblak/jenkins/blob/master/scripts/buildJenkins.bash[script] that accepts various arguments. 120 | 121 | All endpoints are hard-coded as Kubernetes internal endpoints such as "repo", "jenkins", etc., and any other configuration is done via environment variables configured at the environment level.
122 | So every environment uses the same Jenkinsfile, but the output changes depending on the environment configuration and the git repository. 123 | 124 | Maven with the link:http://maven.apache.org/maven-release/maven-release-plugin/[release plugin] is used to build Jenkins (link:https://wiki.jenkins.io/pages/viewpage.action?pageId=3309681[why?]). 125 | 126 | So in order to successfully build the application, we must: 127 | 128 | **** 129 | . Retrieve the GPG key. 130 | . Retrieve the private/public SSL certificate. 131 | . Retrieve the password to unlock the GPG key and the certificate from Azure Key Vault. 132 | . Prepare the release with Maven `mvn release:prepare` 133 | . Perform the release with Maven `mvn release:perform` 134 | . Upload the WAR file. 135 | **** 136 | 137 | ==== Authorization 138 | Every environment maintainer is responsible for deciding whether their environment is publicly accessible or must be deployed in a private network. They can then decide which mechanism (certificate, SSO) is necessary, either at the ingress level or inside the environment. 139 | 140 | ==== Artifacts 141 | 142 | The Jenkins project provides signed packages for Debian, Red Hat, SUSE, macOS and Windows. 143 | 144 | Because building a new Jenkins version can take quite a lot of time, the link:https://github.com/jenkinsci/packaging[packaging] process must be decoupled from the release process. 145 | This allows a Jenkins administrator to build and publish a WAR file in advance and only generate and publish distribution packages when needed. 146 | It also reduces the blast radius of an error happening in a Jenkins job once the WAR file is built and published. 147 | 148 | Packaging a new version means the following steps: 149 | **** 150 | . Retrieve the latest git tag from link:https://github.com/jenkinsci/jenkins[jenkinsci/jenkins]. 151 | . Download the WAR package. 152 | . Build one package per distribution if it wasn't already published. 153 | . Publish one package per distribution.
154 | **** 155 | 156 | ==== Release 157 | We identify three different release types, as explained https://jenkins.io/download/[here]: 158 | 159 | . link:https://jenkins.io/download/lts/[LTS]: an LTS release is chosen every 12 weeks from the stream of regular releases as the stable release for that time period. https://repo.jenkins-ci.org/releases/org/jenkins-ci/main/jenkins-war/[Download] 160 | . link:http://mirrors.jenkins.io/war/[Weekly]: a weekly release aims to deliver bug fixes and features to users and plugin developers. 161 | . Security Release: regularly, the security officer needs to build a "private" version from jenkinsci-cert/jenkins to do some testing or to share with other security contributors; once ready, the successful version is merged into link:https://github.com/jenkinsci/jenkins[jenkinsci/jenkins]. 162 | 163 | Deprecated releases 164 | 165 | . link:http://mirrors.jenkins.io/war-stable-rc/[LTS-RC]: Represented the future stable version. 166 | . link:http://mirrors.jenkins.io/war-rc/[Weekly-RC]: Represented the future weekly release. 167 | 168 | With the current design, one release corresponds to one environment. 169 | 170 | ==== Credentials 171 | In order to release and publish new releases, we need several credentials. 172 | 173 | A GPG key is used to sign WAR files and must be stored in encrypted Azure blob storage. The password used to decrypt the GPG key will be stored in an Azure Key Vault. 174 | 175 | An SSL certificate is required to sign 'jar' files and will be stored directly in the Azure Key Vault, including the password to decrypt the certificate. The password can also be configured at the Jenkins instance level if we consider that it is better, from a security point of view, not to store both the certificate key and the password in the same place. 176 | 177 | An SSH key with push permission on the link:https://github.com/jenkinsci/jenkins[jenkinsci/jenkins] repository is needed.
178 | 179 | An Azure storage account key is needed to publish some distribution packages to the link:https://github.com/jenkins-infra/azure/blob/master/plans/releases-storage.tf[Azure Blob Storage]. 180 | 181 | == Rationale 182 | 183 | 184 | == Costs 185 | Obviously the major cost is related to the Kubernetes cluster. 186 | 187 | == Reference implementation 188 | * link:http://lists.jenkins-ci.org/pipermail/jenkins-infra/2018-June/001448.html[Mail Thread] 189 | * link:https://support.cloudbees.com/hc/en-us/articles/222838288-ssh-credentials-management-with-jenkins[SSH credentials configuration] 190 | * link:https://batmat.net/2017/01/30/do-not-run-your-tests-in-continuous-integration-with-the-root-user/[Do not run your tests with root user] 191 | * link:https://github.com/jenkinsci/jenkins/blob/master/BUILDING.TXT[Build Jenkins] 192 | * link:https://github.com/olblak/jenkins/blob/master/Jenkinsfile.release[Release Jenkinsfile Prototype] 193 | * link:https://gist.github.com/kohsuke/3319b65432ab40793eadc297e2456b79[Release Script] 194 | * link:https://github.com/jenkinsci-cert/jenkins/wiki[Security Wiki] 195 | * link:https://jenkins.io/download/lts/[Release LTS] 196 | * link:https://github.com/jenkins-infra/runbooks/blob/master/security/flow.pdf[Flow] 197 | * https://wiki.jenkins.io/display/SECURITY/SECURITY+issues+in+plugins 198 | 199 | -------------------------------------------------------------------------------- /iep-004/README.adoc: -------------------------------------------------------------------------------- 1 | ifdef::env-github[] 2 | :tip-caption: :bulb: 3 | :note-caption: :information_source: 4 | :important-caption: :heavy_exclamation_mark: 5 | :caution-caption: :fire: 6 | :warning-caption: :warning: 7 | endif::[] 8 | 9 | = IEP-4: Kubernetes for hosting project applications 10 | 11 | :toc: 12 | :hide-uri-scheme: 13 | :sect-anchors: 14 | 15 | .Metadata 16 | [cols="2"] 17 | |=== 18 | | IEP 19 | | 4 20 | 21 | | Title 22 | | Kubernetes for hosting project 
applications 23 | 24 | | Author 25 | | link:https://github.com/rtyler[R. Tyler Croy], link:https://github.com/olblak[Olivier Vernin] 26 | 27 | | Status 28 | | :speech_balloon: In-process 29 | 30 | | Type 31 | | Architecture 32 | 33 | | Created 34 | | 2016-12-15 35 | |=== 36 | 37 | 38 | == Abstract 39 | 40 | The Jenkins project hosts a number of applications, using different technology 41 | stacks, with 42 | link:https://en.wikipedia.org/wiki/Docker_%28software%29[Docker] 43 | containers as the packaging and runtime environment. While the merits of using Docker 44 | containers to run these applications are not the subject of this IEP document, 45 | the hosting and deployment of these containers are. In essence, some tooling for 46 | safely hosting, deploying, and monitoring containers within the Jenkins 47 | infrastructure is necessary in order to support the applications the project 48 | requires. 49 | 50 | == Specification 51 | 52 | To support the aforementioned containerized applications, 53 | link:http://kubernetes.io[Kubernetes] 54 | (also referred to as "k8s") will be deployed using the 55 | link:https://azure.microsoft.com/en-us/services/container-service/[Azure Container Service] 56 | (ACS). This specification will detail the *initial* Kubernetes cluster 57 | architecture but is not prescriptive on how the cluster should grow/shrink as 58 | requirements change for the applications hosted. 59 | 60 | This specification only outlines the architecture for the "Public Production" 61 | footnoteref:[iep2,https://github.com/jenkins-infra/iep/tree/master/iep-002] 62 | Kubernetes cluster. It is expected that the project will run multiple 63 | Kubernetes clusters, following this architecture, depending on the access 64 | control requirements for each discrete Kubernetes cluster. 65 | 66 | At a high level, Kubernetes is a master/agent architecture which would live in a 67 | *single region*.
For the purposes of Jenkins infrastructure, the production 68 | Kubernetes clusters would be located in the "East US" Azure region. 69 | footnoteref:[regions,https://azure.microsoft.com/en-us/regions/] 70 | 71 | The virtual machines are all 72 | link:https://azure.microsoft.com/en-us/pricing/details/virtual-machines/series/#d-series[D-series] 73 | instances in order to provide a good balance of cost and performance. For 74 | more, see <> below. 75 | 76 | 77 | *Kubernetes master* 78 | 79 | A cluster would run a single D3v2 instance as the master node. 80 | 81 | 82 | *Kubernetes agents* 83 | 84 | A cluster would run agents using a 85 | link:https://azure.microsoft.com/en-us/services/virtual-machine-scale-sets/[Scale Set] 86 | of D2v2 instances, with a minimum of three agents running at all times. 87 | 88 | 89 | 90 | [NOTE] 91 | ==== 92 | Following 93 | link:https://github.com/jenkins-infra/iep/tree/master/iep-003[IEP-3] 94 | the <> uses Terraform to describe the infrastructure 95 | necessary to support this specification. 96 | ==== 97 | 98 | 99 | === Monitoring 100 | 101 | As the Jenkins project already uses 102 | link:http://datadoghq.com[Datadog] 103 | for monitoring our production infrastructure, the Kubernetes cluster must 104 | integrate appropriately. 105 | 106 | This will be accomplished using the 107 | link:http://docs.datadoghq.com/integrations/kubernetes/[Datadog/Kubernetes integration] 108 | maintained by Datadog themselves. 109 | 110 | The integration will provide metrics for both: 111 | 112 | * Kubernetes cluster health/status 113 | * Health/status of containers running atop the cluster 114 | 115 | The metrics required are not specified here under the assumption that all 116 | metrics appropriate for ensuring stable service levels of project 117 | infrastructure will be collected.
118 | 119 | === Logging 120 | 121 | 122 | Centralized logging for applications hosted within Kubernetes will be provided 123 | by a combination of containers in the cluster running 124 | link:https://en.wikipedia.org/wiki/Fluentd[Fluentd] 125 | and the Microsoft 126 | link:http://www.microsoft.com/en-us/cloud-platform/operations-management-suite[Operations Management Suite] 127 | (OMS). 128 | 129 | The Fluentd container will route logs based on rules defined per log type. 130 | 131 | As a first iteration, we identify two log types, archive and stream. 132 | They are explained below. 133 | 134 | 135 | ==== Type 136 | ===== Stream 137 | 'Stream' means that logs are sent directly to Log Analytics, 138 | where they will be available for a short time period (7 days). 139 | After that, they will be permanently deleted. 140 | 141 | Reasons why we consider logs as 'stream' are: 142 | 143 | * Costs: we don't want to pay for storing useless logs 144 | * Debugging: we may have to analyze an application's behaviour 145 | 146 | In order to retrieve log information, we'll need access to the Log Analytics dashboard. 147 | 148 | *! This is the default behaviour for all log types* 149 | 150 | ===== Archive 151 | 'Archive' means that we want to access the logs over a long time period. 152 | We store them on Azure Blob Storage (or a shared disk). 153 | Those logs will be kept in Azure containers for an undetermined period. 154 | In order to retrieve them, we'll have to request compressed archives from an admin.
155 | 156 | Reasons why we may consider logs as 'archive' are: 157 | 158 | * Need for long-term access 159 | * Want to back up important information 160 | 161 | Logging works as follows: 162 | 163 | * Docker containers write logs to JSON files located in /var/log/containers/ on each Kubernetes agent 164 | * Each Kubernetes agent runs one Fluentd container (as a DaemonSet) that reads logs from /var/log/containers 165 | and applies some 'rules' 166 | 167 | 168 | .Data flow for k8s logs 169 | [source] 170 | .... 171 | +--------------------------------------------------------------+ 172 | | K8s Agent: | 173 | | +------------+ | 174 | | |Container_A | | 175 | | | | | 176 | | Agent Filesystem: +---------+--+ | 177 | | +--------------------+ | Azure LogAnalytics | 185 | | |Fluentd +-------------------------------/ | +--------------------+ 186 | | |Container +-------------------------------\ | +--------------------+ 187 | | +----------+ apply_rule_0_archive_logs_to --------------------->| Azure Blob Storage | 188 | | | +--------------------+ 189 | +--------------------------------------------------------------+ 190 | .... 191 | 192 | In order to know which workflow needs to be applied, 193 | we use Kubernetes labels. 194 | 195 | By convention we use the label 'logtype'. 196 | 197 | If logtype == 'archive', we apply the 'archive' workflow. 198 | Otherwise we apply the 'stream' workflow. 199 | 200 | pros: 201 | 202 | * We don't have to modify the default logging configuration. 203 | * We don't have to rebuild the Docker image when we change the log type. 204 | * We don't have to restart the Docker container when we modify the log type. 205 | * Easy to handle from the Fluentd configuration. 206 | 207 | cons: 208 | 209 | * We can't have different log types within an application 210 | 211 | A Docker image that implements this workflow can be found in Olivier Vernin's 212 | link:https://github.com/olblak/fluentd-k8s-azure[fluentd-k8s-azure] 213 | repository.
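The 'logtype' label convention described in the Logging section can be illustrated with a minimal Python sketch. This is not the Fluentd configuration shipped in the fluentd-k8s-azure image; the function name `select_workflow` and the representation of pod labels as a plain dictionary are assumptions made only for this example.

```python
def select_workflow(labels):
    """Return the log workflow for a pod, given its Kubernetes labels.

    Hypothetical helper for illustration: `labels` is assumed to be a
    plain dict of label name -> value (as found in a pod's metadata).
    """
    # Only an explicit logtype == 'archive' selects the archive workflow.
    # A missing label, or any other value, falls back to 'stream',
    # which the proposal defines as the default behaviour.
    if labels.get("logtype") == "archive":
        return "archive"
    return "stream"


if __name__ == "__main__":
    print(select_workflow({"app": "accountapp", "logtype": "archive"}))
    print(select_workflow({"app": "plugin-site"}))
```

Because the decision depends only on a label, switching an application from one workflow to the other amounts to changing that label, which matches the "no rebuild, no restart" advantages listed in the section.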
214 | 215 | 216 | === Deployment/Orchestration 217 | 218 | Having decided to use a Kubernetes infrastructure, + 219 | we still need processes and tools to automate and test Kubernetes deployments. 220 | 221 | Kubernetes uses: + 222 | 223 | - YAML files to define the infrastructure state 224 | - The kubectl CLI to interact with the Kubernetes cluster (CRUD operations) 225 | 226 | Because using kubectl to deploy YAML files is idempotent, we can easily script it. 227 | 228 | We investigated two approaches to automate k8s deployment: 229 | 230 | 1. Using Jenkins + scripting to apply configurations 231 | 2. Using Puppet to apply configurations 232 | 233 | ==== Jenkins 234 | Each time we need to apply modifications, we just have to create/update the YAML configuration files 235 | and let Jenkins deploy them. + 236 | This is fairly easy: because kubectl is idempotent, we just have to follow this process. + 237 | 238 | .Jenkins 239 | +-------------+ +----------------+ 240 | | | | Github | 241 | | Contributor +-------------------------> | jenkins-infra | 242 | | | Commit | | 243 | +-------------+ code +--------+-------+ 244 | | 245 | | Trigger 246 | +-----------------+ | Test&Deploy 247 | | K8s cluster | v 248 | | +-----+ +-----+ | +--------+-------+ 249 | | |Node | |Node | | | Jenkins | 250 | | | 1 | | 2 | <-----------------------| Jenkinsfile | 251 | | +-----+ +-----+ | Apply configurations +----------------+ 252 | +-----------------+ 253 | 254 | But: 255 | 256 | - How do we share/publish secret information, credentials, ...? + 257 | A solution would be to encrypt secrets with a password or GPG keys before pushing them 258 | to the git repository. 259 | Jenkins will have to decrypt them before deploying them on the Kubernetes cluster. 260 | - How do we handle resource ordering? + 261 | We can use naming conventions to be sure that resource 00-secret.yaml will be deployed before 01-daemonset.yaml 262 | - In some cases, complex logic needs to be applied to achieve correct deployments.
+ 263 | This can be done through scripting (bash, Python, ...). + 264 | Ex: Updating k8s secrets does not update the secrets already used by running container applications, 265 | which means that each time we update secrets, we also have to take care of the pods using them. 266 | 267 | The problems explained above are common concerns that configuration management tools try to solve. 268 | 269 | ==== Puppet 270 | We may also use Puppet to template and apply Kubernetes configuration files. 271 | Main advantages: 272 | - We already have a Puppet environment configured and working correctly 273 | - We already have a good testing process with Puppet: linting, rspec, ... 274 | - We already have a deployment workflow: feature branch -> staging -> production 275 | - We can use Hiera to store secrets 276 | - We can use Puppet to define complex scenarios 277 | 278 | .Puppet 279 | 280 | +-------------+ +----------------+ 281 | | | | Github | 282 | | Contributor +--------------------> | jenkins-infra | 283 | | | Commit | | 284 | +-------------+ code +--------+-------+ 285 | | 286 | | Trigger 287 | +-----------------+ | Test 288 | | K8s cluster | | 289 | | +-----+ +-----+ | +--------v-------+ 290 | | |Node | |Node | | | Jenkins | 291 | | | 1 | | 2 | | | Jenkinsfile | 292 | | | | | | | +--------+-------+ 293 | | +-----+ +-----+ | | 294 | +--------+--------+ | Merge 295 | ^ | Production 296 | | | 297 | | +--------v-------+ 298 | | | Puppet | 299 | +---------------------------+ Master | 300 | Apply configurations +----------------+ 301 | 302 | ==== Conclusion 303 | We agreed to use Puppet to deploy Kubernetes configurations. 304 | If needed, we are still able to use another solution. 305 | 306 | == Motivation 307 | 308 | The motivation for centralizing container hosting is fairly 309 | self-evident. Consistency of management, deployment, logging, monitoring, and 310 | runtime environment will be a major time-saver for volunteers participating in 311 | the Jenkins project.
312 | 313 | Additionally, consolidation on a well understood and supported tool 314 | (Kubernetes) allows the infrastructure team to spend less time operating the 315 | underlying hosting platform. 316 | 317 | 318 | == Rationale 319 | 320 | As mentioned in the <>, the Jenkins project runs containerized 321 | applications, the merits of which are outside the scope of this document. 322 | Thus this document outlines an approach for managing numerous containers in 323 | Azure. 324 | 325 | There is a fundamental assumption being made in using Azure Container Service, 326 | that is: it's cheaper/easier/faster to use a "turn-key" solution for building 327 | and running a container orchestrator (e.g. Kubernetes) than it would be to 328 | build out such a cluster ourselves using virtual machines and Puppet (for 329 | example). 330 | 331 | With this assumption, the options provided by ACS are: Kubernetes, Docker 332 | Swarm, or DC/OS. 333 | 334 | The selection of Kubernetes largely rests on two criteria: 335 | 336 | . Kubernetes is supported in some form by two of the three major cloud vendors 337 | (Microsoft, Google), which indicates project maturity and long-term support but 338 | also flexibility for the Jenkins project to migrate to alternative cloud 339 | vendors if the need were to arise. 340 | . Developer preference: we prefer Kubernetes and the tooling it provides over the alternatives. 341 | 342 | === Docker Swarm 343 | 344 | Docker Swarm is the leading option behind Kubernetes, but the open source 345 | "swarm mode" functionality is not supported by Azure Container Service, nor is 346 | Docker Swarm supported by any vendor other than Microsoft at this point. 347 | 348 | The focus from Docker, Inc. seems to be more on products such as 349 | link:https://www.docker.com/products/docker-datacenter[Docker Datacenter] 350 | long-term, which makes choosing Docker Swarm on ACS seem risky.
351 | 352 | === DC/OS 353 | 354 | Similar to Docker Swarm on ACS, there is no mainstream support for DC/OS on 355 | other cloud providers, which suggests either immaturity in the project or lack 356 | of long-term commitment by platform vendors to support it. 357 | 358 | Additionally, at this point in time, the authors of this document do not know 359 | anybody committed to running production workloads on DC/OS (we're certain they 360 | exist however). 361 | 362 | == Costs 363 | 364 | [quote, https://azure.microsoft.com/en-us/pricing/details/container-service/] 365 | ____ 366 | ACS is a free service that clusters Virtual Machines (VMs) into a container 367 | service. You only pay for the VMs and associated storage and networking 368 | resources consumed. 369 | ____ 370 | 371 | 372 | Assuming a single minimally scaled cluster with a single master and three 373 | agents, the annual cost of the Kubernetes cluster itself would be: *$6,403.56* (one D3v2 master at $2,566.68 plus three D2v2 agents at $1,278.96 each). 374 | As the number of agents increases, the cost will increase per-agent 375 | instance. 376 | 377 | 378 | .Costs 379 | |=== 380 | | Instance | Annual Cost (East US) 381 | 382 | | D2v2 383 | | $1,278.96 384 | 385 | | D3v2 386 | | $2,566.68 387 | |=== 388 | 389 | 390 | [[reference-implementation]] 391 | == Reference Implementation 392 | 393 | 394 | The current reference implementation is authored by 395 | link:https://github.com/olblak[Olivier Vernin] 396 | in 397 | link:https://github.com/jenkins-infra/azure/pull/5[pull request #5] 398 | to the 399 | link:https://github.com/jenkins-infra/azure[azure] 400 | repository. 401 | --------------------------------------------------------------------------------