├── .gitignore
├── iep-012
├── images
│ ├── azure.png
│ ├── environment_namespace.png
│ ├── azure.xml
│ └── environment_namespace.xml
└── README.adoc
├── iep-004
├── images
│ ├── enable_azure_custom_logs.png
│ └── get_azure_OMS_credentials.png
└── README.adoc
├── template.adoc
├── README.adoc
├── iep-006
└── README.adoc
├── iep-009
└── README.adoc
├── iep-007
└── README.adoc
├── iep-002
└── README.adoc
├── iep-008
└── README.adoc
├── iep-001
└── README.adoc
├── iep-005
└── README.adoc
├── iep-003
└── README.adoc
└── LICENSE
/.gitignore:
--------------------------------------------------------------------------------
1 | *.html
2 | *.sw*
3 |
--------------------------------------------------------------------------------
/iep-012/images/azure.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jenkins-infra/iep/master/iep-012/images/azure.png
--------------------------------------------------------------------------------
/iep-012/images/environment_namespace.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jenkins-infra/iep/master/iep-012/images/environment_namespace.png
--------------------------------------------------------------------------------
/iep-004/images/enable_azure_custom_logs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jenkins-infra/iep/master/iep-004/images/enable_azure_custom_logs.png
--------------------------------------------------------------------------------
/iep-004/images/get_azure_OMS_credentials.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jenkins-infra/iep/master/iep-004/images/get_azure_OMS_credentials.png
--------------------------------------------------------------------------------
/template.adoc:
--------------------------------------------------------------------------------
1 | ifdef::env-github[]
2 | :tip-caption: :bulb:
3 | :note-caption: :information_source:
4 | :important-caption: :heavy_exclamation_mark:
5 | :caution-caption: :fire:
6 | :warning-caption: :warning:
7 | endif::[]
8 |
9 | = IEP-X: The title goes here
10 |
11 | :toc:
12 |
13 | .Metadata
14 | [cols="2"]
15 | |===
16 | | IEP
17 | | X
18 |
19 | | Title
20 | | The title goes here
21 |
22 | | Author
23 | | The author goes here
24 |
25 | | Status
26 | | :speech_balloon: In-process
27 |
28 | | Type
29 | | [Process|Service|Architecture]
30 |
31 | | Created
32 | | 2016-10-25
33 | |===
34 |
35 |
36 |
37 | == Abstract
38 |
39 | == Specification
40 |
41 | == Motivation
42 |
43 | == Rationale
44 |
45 | == Costs
46 |
47 | == Reference implementation
48 |
49 |
--------------------------------------------------------------------------------
/README.adoc:
--------------------------------------------------------------------------------
1 | ifdef::env-github[]
2 | :tip-caption: :bulb:
3 | :note-caption: :information_source:
4 | :important-caption: :heavy_exclamation_mark:
5 | :caution-caption: :fire:
6 | :warning-caption: :warning:
7 | endif::[]
8 |
9 |
10 | = Infrastructure Enhancement Proposals
11 |
12 |
13 | This repository is for tracking enhancement proposals to the
14 | link:https://jenkins.io[Jenkins project's]
15 | infrastructure. The goal of this repository is to collect and codify proposals
16 | in a transparent and historically reviewable fashion. It can be thought of as
17 | similar to efforts such as the
18 | link:http://www.ietf.org/rfc.html[IETF's RFC process]
19 | or the Python project's
20 | link:https://www.python.org/dev/peps/[PEPs].
21 |
22 |
23 | = Creating a proposal
24 |
25 |
26 | . Prepare an AsciiDoc file outlining your proposal, using requirement verbiage following
27 | link:http://www.faqs.org/rfcs/rfc2119.html[RFC 2119]
28 | and the directions from
29 | link:https://github.com/jenkins-infra/iep/tree/master/iep-001[IEP-001].
30 | . Copy `template.adoc` into the new `iep-xxx/` folder as `README.adoc`,
31 | numbering the directory sequentially from the previous IEP.
32 | . Write the proposal.
33 | . Open a pull request.
34 |
--------------------------------------------------------------------------------
/iep-012/images/azure.xml:
--------------------------------------------------------------------------------
1 | 5Vlbc5s6EP41foyHiwH7sXYu5yGn45l0pu2TRwYZVMusK4Rj99dXAomLAZfUJO3JyYthWd32+3b1SRnZi93xgaF99C8EmI4sIziO7NuRZZkTyxU/0nLKLd5klhtCRgLlVBqeyA+sjIaypiTASc2RA1BO9nWjD3GMfV6zIcbgue62AVofdY9C3DA8+Yg2rZ9JwKPcOrW80v4PJmGkRzZdtb4d0s5qJUmEAniumOy7kb1gADx/2h0XmMrg6bjk7e47vhYTYzjmvRo4ah78pBeHA7FW9QqMRxBCjOhdaZ0zSOMAyx4M8RbxHRWPpnjER8K/VJ6/Spexo96WmJEd5piphmKK7PQlcxER0gbVxikMzWbBB4mheI0hxrnlnlCqvn/DnJ8UZ1DKQZjKZTwC7NUEE85gixdAgWULtw3DdReL4otGVmAy30DMK57y716EdJ4HT0asM/7KlEDKfOVlK8oiFmLl5RTYi6TBINbLTsKFYYo4OdR7R4q9YeFXAiweFMbteKuhD4imqtOR5VKulpiljF6l+z2VLJyXkamY3FD+3sXBHkjME92JGD7vJ//e4FZBdwlUgJKojUV1YBTGFK0xXUJCOIFYmH0RYEmJ+QEzTkRqPp45rIFz2FUcPlASyg9cMmCO1FvRj5jaXs5ydwxl0RrvEh/hsU8hDcboR8rwigIKVmtEUexjttpgxIVVRk5wr0Yjb3brFquXw+PjZXo0gdcNZqpOnHQlmebvz2XZcZRLVKk42nYNVbwWqjThzGLGsiJbAtqRXiou1XgpdCVpVMqak2a+iabTW68NtQpt5OhLxIU9ziyWIXtCyT7fATbkKOc2BCqWV0dFR7sCSgFcFZXCeA0s0wYsS0YOiGNhXDIIxM9HzJ+BbQfMvi6KD52V50mYJd74QBhPEV3Fal11vrTU5wZfWA5CP/C9yylZT8gG8nYb8Aq03uVcjb6UxbUc+saq8+5mWu8BNptEbCjnZCrW0ItfpvsaiqBUAV/1Nn5ZEWTJWxUE3jvRA15TDzhNPeBeqQeypiIs6FRx0Ht1F78cp0aviXVGpby/3yaW9xrEerHUdGq8+l+RyrT+BKtMfdpTtJqZg9LKGVTRfmJpwvF2+t4VrTgZc0RioWMTzA7Ev1rHXt407frG5dm9VKztXi+X3EHp8fT0+M6JscWn1QGleYRejxCmcaajmozQd0JDn2tMa1BKPCwf3jklEg4suxDrJMTlnWtYohhur9oxsQdgymRQpiRCQsiJ43UEsO29w/wBkXQmvbtEUl8R1EWPisjyKUoS4n+KSFzTWmVGVJVSt6rqoZVaLuRMfYlcEUv2RVreGGPTsVSF+t1j3bDnNrOFrP+5C4BfFqb8RiAkPErXKx8CfIFfvcrP5asepy5c7Le7frOHPS69NLd75uZfdg5qyW19Z9Ijtwe/bddjD7SBLEQbRtYizOwNzid/WcqnMQmEB9kQHGSJv4L1t/ze+e3y35zN+qmPl5NHvJb/9svrf/nPU/vuJw==
--------------------------------------------------------------------------------
/iep-012/images/environment_namespace.xml:
--------------------------------------------------------------------------------
1 | 7Vxdc5s6EP01ebwZMEY4j7Hb3q/0TibuTNtHBRSjBiOPkJO4v/6uQMJggU0MJk5LJjMxQgjpnCPtarXxhTNbvvzJ8Sr8zAISXYys4OXC+XAxGtnjEYI/smSTlXjjq6xgwWmgKm0L5vQnUYWWKl3TgCSlioKxSNBVudBncUx8USrDnLPncrUHFpXfusILYhTMfRyZpV9pIMKsdDLytuV/EboI9ZttpMa3xLqyGkkS4oA9F4qcjxfOjDMmsk/LlxmJJHgal+y5TzV3845xEosmD0xUP55wtFaDu/YF46p3YqOHDB1dyY/rZZRVcKZPhAsKoNzgexLdsoQKymKocs+EYEuoEMkbU+w/Ljhbx8GMRfI5aM15SH8KbVxHdCGfFWwFpaFYRnBhw0e2FhGNySxn0oLCtDkSqKtEcPZICq1b6U9+R1ME4E4faBTpmjGLYURTrF4dkQdof6rwgH6Rl1pQ7Zwq0DhhSyL4BqroB9zsCaVu21O6eN5qxVG4hwWZIFWGlToXecNbAuGD4rCGT9vg06CSxMG1nARw5Uc4SahfhhxGyTffFLjpxXd5cenKyxcqvul68Dm74+z8qJu3hFMYAeGVtGX9IoEx1XZghr6zNfdJWbAC8wVR1ZwaOgp4uxV46zJOIizoU7kbVSSoN9wyCh3csm2V2EbODo1Z79VDxam40w4qt+NOdtrJRmy0A1TiTaHaSlZI6rs7Kb9mZJUWCPiQNbhVWw5oMwGOOhBgG5HtU++pBOheDQI8GwG6hgBDIaRjcC1fMvoUru8vA+Y/En7pg5naFWdBhycxMz6IS4q1C0Mz9lAJS0djWxCe7Y7qldfK1KADQHOyYpc/SPxI4+QPn14yvnjXaKOx9XZoX5nr6r/re8JjWPoyyK0vfJ0IEjxOEgPm5JkuI5wi88BiMVd3JJx+SKPgBm/A0YKSRIC7pq+mIeP0J9THmiO4zYXyx8GJr4R++9BcNqZtP0ngsVsNtr1T9Bm/lCre4EQ7ez6LIrxK6H3aZfngEhYiGk+Vm9nSIzSUorH6VBha0WXNXXarG1nZO7LSLRRUhVCF9XCcSXtZOaa/+B9ekmSFpb2TqpoTf82p2AyaqtNUrYJuaKwROSgqQ5eNlJWtCqa0tJRQMynZLupASleHPT+5EZQbvJV22uYgNBovDs0mtZHH97ohy8SibhfwWix23aejsJg0xaLViGvcTt1S2TmwHQMA17IrALDG7QEwrdXf8QKmYWPTZP8yy0ijiZwJZu8OIrcTnZPlGGT9k3ltBlkwBGGshQrcJl5YdbgnyRaBmzQA82G8LblTA5VFDJ59iFLDG9IgIHG6Qgusp4iEWm09oKvuFH4Br5nc+LnQ9Rlc29tr+JXVuZixGEaDaUoEAWqfiaR3GuAkTNd+0zVtROirF+ZKekcdrMtjg907cMybUVs0c78wvx0wqv3+Phg1d7h3ZEET3crvOWE74NCb9MehZ3A4J/yJgtc9GMjXGEj9RNVe/GQG0/TutMFMt0wyDJJHQX50a0rVKcX7mZdtDal3NobU3N+khtSknHdtX9836R3Q3Kd11eH0KvOac12McnZret8d1x3Q26fhtc2I1y0LBqt7jNV1kBmrPJ3Vtc2gwrBPbWte7VdHzU43M804xLBTPcaWHuC0V2NaFX0Y9qodsNinzayIOBjsmUkdX0IqcWyVWZTeiQN5bFKaAXuzNnT0vZi2odaWYtbG4ZDrKZI2dDR+N7uiaZaGfn8e1W+WpXFEJoVWTpFkgF27QqDvkC1YjKOP29IdE3Qwqcc9oIofRIiNcp/wWjA5V/P33jA5w5WyatTXiXjGpnhqjl4aq6Lx1PPenoSG4Dbl6kgSXJME2+qLhUlfLJwD0pMKpGsip90jffU7IX1VgbTTE9LacAwri474lGgY90SDa4YVdzJw7kgEvmPjg4EhAed8EnBsnQ7dSwYOMiOXhmhOmYFTlwb9Fhk4yAzz1WDRZsSoxgE5gwwcZEbLhhScfVMZ
Hf4vktPFNtGQg9N9bBPtX5z7jG2iIQunm9jmAU77jG2iIQ/n2NjmARb7jG0i899YhkycY+zkm2TiIDOPakjFOZk9RedjT80UrCEZpzsju5/oXo1sVdbVkI3T0vzu57fXPFgzZjGk4xxpehun4+gJ3Io4M8Ay7FhbJ7vWxJPewMJ65o7VIPaMzvy19ItxfVRx6I9qMhGHU3/Nu2vS/Ouf+lfKp+LYH9UkjXR+LOSdQfJF/6dzlTRUHPyjvtIvvN7SL84Cas+E2qvxObqHureTf6347eX3fAL0TUMVCxVZAdm32rRgocasTNzL8peVIXsnKtOhZXmfSTT79HIkwRUJNpOaLVlLgj1vdDKC5XY7/6LCrPr26x6dj/8D
--------------------------------------------------------------------------------
/iep-006/README.adoc:
--------------------------------------------------------------------------------
1 | ifdef::env-github[]
2 | :tip-caption: :bulb:
3 | :note-caption: :information_source:
4 | :important-caption: :heavy_exclamation_mark:
5 | :caution-caption: :fire:
6 | :warning-caption: :warning:
7 | endif::[]
8 |
9 | = IEP-006: Jenkins.io on Azure
10 |
11 | :toc:
12 |
13 | .Metadata
14 | [cols="2"]
15 | |===
16 | | IEP
17 | | 006
18 |
19 | | Title
20 | | Hosting Jenkins.io on Azure
21 |
22 | | Author
23 | | link:https://github.com/olblak[Olblak]
24 |
25 | | Status
26 | | :speech_balloon: In-process
27 |
28 | | Type
29 | | [Architecture]
30 |
31 | | Created
32 | | 2017-06-26
33 | |===
34 |
35 |
36 |
37 | == Abstract
38 |
39 | Currently the main Jenkins website (jenkins.io) runs on a single server. +
40 | This basic architecture has limitations in terms of availability and scalability, and could easily be improved by using cloud services.
41 |
42 | The goal of this document is to design the best solution for hosting jenkins.io on Azure.
43 |
44 | == Specification
45 |
46 | The following elements must be taken into account:
47 |
48 | * Jenkins.io only contains static files
49 | * The endpoint requires HTTP/HTTPS support
50 | * Several URL redirections are used, but there is only one endpoint (https://jenkins.io)
51 | * www.jenkins.io is updated every 30 minutes
52 |
53 | === Static website
54 |
55 | 'www.jenkins.io' only contains static files, which means that using blob storage for storing
56 | files and azure-cli for updating the website can be a solution to improve reliability.
57 |
58 | The advantages of this approach are:
59 |
60 | * No server to administer
61 | * Geo-redundancy
62 | * Easy file updates
63 | * Good SLA (99.99%)
64 |
65 |
66 | === URL
67 |
68 | The main website endpoint is 'https://www.jenkins.io', and everything else should permanently redirect to it:
69 |
70 | - 'jenkins-ci.org'
71 | - 'jenkins.io'
72 | - 'www.jenkins-ci.org'
73 | - HTTP traffic
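The redirection rule above can be sketched in a few lines of Python. This is only an illustration of the intended behaviour, not the web server configuration actually deployed: the function name `redirect_target` and the `LEGACY_HOSTS` set are hypothetical names introduced for this example, and the host list is taken from the bullets above.

```python
from urllib.parse import urlsplit, urlunsplit

# Canonical endpoint named in this section.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.jenkins.io"

# Hosts listed above that should permanently redirect (hypothetical constant name).
LEGACY_HOSTS = {"jenkins-ci.org", "www.jenkins-ci.org", "jenkins.io"}


def redirect_target(url):
    """Return the URL a permanent (301) redirect should point to,
    or None when the request already uses the canonical endpoint."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None
    if host != CANONICAL_HOST and host not in LEGACY_HOSTS:
        raise ValueError(f"unexpected host: {host!r}")
    # Keep path, query and fragment; only scheme and host change.
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, parts.fragment)
    )
```

In this sketch plain HTTP traffic to the canonical host is also redirected, which covers the "HTTP traffic" bullet.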
74 |
75 | === Design
76 |
77 | We deploy the website on blob storage (Azure File Storage) and use web servers deployed on the Kubernetes cluster to handle
78 | HTTP/HTTPS requests, SSL certificates, and so on.
79 |
80 | There are two ways to implement this solution:
81 |
82 | 1. The blob storage is mounted inside the Docker container, and the container works like a classic web server.
83 | 2. Files in the blob storage are accessed through an HTTP request, and the container works as a proxy.
84 |
85 | At this moment, there is no technical reason to prefer one of these two implementations over the other, therefore I suggest going with
86 | the blob storage mounted inside the container, as we are already doing this for other applications.
87 |
88 | The main advantages of this design are:
89 |
90 | * We already apply this design for other applications (azure repo proxy, pluginsite).
91 | * We can use Let's Encrypt certificates for HTTPS.
92 | * We can update the website without having to rebuild the Docker image.
93 | * We have full control over redirection/proxy rules.
94 |
95 | == Motivation
96 | The motivation here is essentially to increase reliability, availability, and scalability.
97 |
98 | == Rationale
99 | There are many different ways to host a static website; I briefly look at some of them here.
100 |
101 | === Dedicated Server
102 | Running an Apache/Nginx web server on a dedicated server. +
103 | That's what we are doing at the moment.
104 |
105 | Pros:
106 |
107 | * We already have the configuration management code to manage it.
108 | * It's easy to control the cost, as we pay for the dedicated server/virtual machine on a monthly basis.
109 | * It's easy to do continuous delivery, as we only have to upload files to the correct directory.
110 |
111 | Cons:
112 |
113 | * It requires maintenance, server updates, ...
114 | * The dedicated server is a single point of failure.
115 |
116 | === Standalone Docker
117 | The Docker image contains everything: the web server and the website. +
118 | That means that the docker image is self-sufficient.
119 | This docker image can be deployed on the k8s cluster. +
120 |
121 | Pros:
122 |
123 | * All docker pros
124 |
125 | Cons:
126 |
127 | * It adds a lot of complexity to the continuous delivery process, as the website is updated every 30 minutes.
128 | * Many Docker images will be built per day (up to 48), which means many layers, more disk space used, ...
129 | * We'll have to manage a large number of Docker image tags, ...
130 |
131 |
132 | === Standalone Blob Storage
133 | As 'jenkins.io' is only composed of static files, we can use a blob storage to store
134 | files and azure-cli to update website content.
135 |
136 | Pros:
137 |
138 | * No server to administer
139 | * Geo-redundancy
140 | * Easy file updates
141 | * Good SLA (99.99%)
142 |
143 | Cons:
144 |
145 | * Very basic web server
146 | ** Does not support HTTPS.
147 | ** Does not support URL redirection such as 'jenkins-ci.org' -> 'jenkins.io'
148 | ** URLs follow this format: 'http:////'
149 |
150 | === Blob Storage + CDN
151 |
152 | Add a CDN in front of the blob storage.
153 |
154 | Pros:
155 |
156 | * Provides HTTPS
157 | * Increases performance
158 | * Same advantages as standalone blob storage
159 |
160 | Cons:
161 |
162 | * Costs
163 | * Does not support domain redirection
164 |
165 | == Costs
166 |
167 | At this moment, it's hard to evaluate the price of this change, as Azure blob storage pricing depends on several factors such as the number of requests and the amount of data stored.
168 |
169 | == Reference implementation
170 |
171 | This implementation works exactly like https://github.com/jenkins-infra/jenkins-infra/tree/staging/dist/profile/templates/kubernetes/resources/repo_proxy[Repo-proxy], except that instead of getting contents from Artifactory and caching the results on the blob storage, we get contents from a blob storage.
172 |
--------------------------------------------------------------------------------
/iep-009/README.adoc:
--------------------------------------------------------------------------------
1 | ifdef::env-github[]
2 | :tip-caption: :bulb:
3 | :note-caption: :information_source:
4 | :important-caption: :heavy_exclamation_mark:
5 | :caution-caption: :fire:
6 | :warning-caption: :warning:
7 | endif::[]
8 |
9 | = IEP-9: Incremental builds Maven repository
10 |
11 | :toc:
12 |
13 | .Metadata
14 | [cols="2"]
15 | |===
16 | | IEP
17 | | 9
18 |
19 | | Title
20 | | Incremental builds Maven repository
21 |
22 | | Author
23 | | link:https://github.com/rtyler[R Tyler Croy]
24 |
25 | | Status
26 | | :speech_balloon: In-process
27 |
28 | | Type
29 | | Architecture
30 |
31 | | Created
32 | | 2018-03-19
33 |
34 | | Discussions-To
35 | | link:http://lists.jenkins-ci.org/pipermail/jenkins-infra/2018-March/001417.html[infra@lists.jenkins-ci.org]
36 |
37 | |===
38 |
39 |
40 | == Abstract
41 |
42 | In order to support more rapid continuous delivery models, such as that
43 | described by
44 | link:https://github.com/jenkinsci/jep/tree/master/jep/300[Jenkins Essentials],
45 | Jenkins core and plugin builds must be deployed into a Maven repository much
46 | more incrementally rather than waiting for a developer to manually deploy a
47 | release to the existing `releases` footnote:[https://repo.jenkins-ci.org/releases/]
48 | repository.
49 |
50 |
51 | == Specification
52 |
53 | For a more incremental developer workflow, rather than attempting to make sense
54 | of potentially thousands of `SNAPSHOT` versions of artifacts in a repository,
55 | this document proposes a specific and tightly controlled Maven repository for
56 | such builds.
57 |
58 | This would mean Artifactory would have an `incrementals` repository *just* for
59 | these kinds of pre-release builds. For example, rather than
60 | `git-client-2.4.0-SNAPSHOT.jar` the repository would contain
61 | `git-client-2.4.0-a3dbf.jar`, assuming `a3dbf` is the Git short-commit for the
62 | build.
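As a rough illustration of the naming example above, the file name could be derived as follows. This is a sketch under assumptions: the helper name `incremental_jar_name` and the five-character short commit length are hypothetical (the length is chosen only to match the `a3dbf` example), and this document does not mandate a specific format.

```python
def incremental_jar_name(artifact_id, base_version, commit_sha, short_length=5):
    """Build an incremental artifact file name from a Git commit hash.

    short_length=5 is an assumption matching the `a3dbf` example;
    real tooling may well use a different abbreviation length.
    """
    if len(commit_sha) < short_length:
        raise ValueError("commit hash is shorter than the requested short length")
    return f"{artifact_id}-{base_version}-{commit_sha[:short_length]}.jar"
```

For example, `incremental_jar_name("git-client", "2.4.0", "a3dbf91c0e")` yields the file name used in the paragraph above.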
63 |
64 | === Garbage Collection
65 |
66 | Artifacts in the `incrementals` repository must be garbage collected to the
67 | most recent 5 artifacts **or** the most recent 30 days of artifacts. For
68 | example, if a plugin (`io.jenkins.plugins.retrocean`) has no new commits in a
69 | 30 day period, the `io/jenkins/plugins/retrocean/` directory should have no
70 | more than the five most recent artifacts stored. More active plugins should
71 | have no more than 30 days worth of artifacts stored.
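One possible reading of this retention rule, sketched in Python: an artifact is kept if it is among the five most recent for its directory *or* was deployed within the last 30 days, and everything else is eligible for deletion. The function name `artifacts_to_delete` and the `(name, deployed_at)` tuple shape are assumptions made for this example; the sketch does not call any Artifactory API.

```python
from datetime import datetime, timedelta


def artifacts_to_delete(artifacts, now, keep_count=5, keep_days=30):
    """Return the names of artifacts that fall outside the retention rule.

    artifacts: list of (name, deployed_at) tuples for ONE artifact directory
               (hypothetical input shape, e.g. io/jenkins/plugins/retrocean/).
    An artifact survives if it is one of the `keep_count` newest
    or was deployed within the last `keep_days` days.
    """
    cutoff = now - timedelta(days=keep_days)
    newest_first = sorted(artifacts, key=lambda item: item[1], reverse=True)
    return [
        name
        for index, (name, deployed_at) in enumerate(newest_first)
        if index >= keep_count and deployed_at < cutoff
    ]
```

With this interpretation, a quiet plugin converges to its five newest artifacts, while an active plugin keeps everything from the last 30 days.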
72 |
73 |
74 | [NOTE]
75 | ====
76 | Plugin and core tooling should use a consistent format for defining incremental
77 | built versions, such as: `-..-.jar`. The
78 | specific format is not required for this document.
79 | ====
80 |
81 |
82 | == Motivation
83 |
84 | By adding this new, somewhat ephemeral, Maven repository, we can support newer
85 | development workflows in the Jenkins project without adversely affecting the
86 | existing "mainstream" development workflow of Jenkins plugins.
87 |
88 | As mentioned briefly in <<costs>>, the overhead is sufficiently low that
89 | adding a new Maven repository, even if it's experimental and eventually
90 | abandoned, is worth the exploration.
91 |
92 | == Rationale
93 |
94 | A Maven repository hosted in Artifactory, alongside the `releases` and a number
95 | of other repositories, results in the simplest to deploy, and simplest to
96 | consume "bucket of artifacts" we currently have at our disposal.
97 |
98 | In order to avoid over-reliance on these "incremental builds", the artifacts
99 | themselves should be expected to be deleted after the specified "garbage
100 | collection" period.
101 |
102 |
103 | === Alternate Approaches
104 |
105 | ==== Azure Storage Container
106 |
107 | One alternative approach which was briefly discussed in person with
108 | link:https://github.com/carlossg[Carlos Sanchez] was to drop these
109 | "incremental build" artifacts directly into an Azure storage container.
110 |
111 | This approach was rejected as it would require additional non-native tooling to
112 | be used in the development workflow. By adopting a Maven repository, existing
113 | tools for grabbing Maven dependencies can be utilized by developers wishing to
114 | incorporate "incremental builds" into their build and test workflows.
115 |
116 |
117 | ==== Existing Maven Repository
118 |
119 | Re-using the existing `releases` Maven repository was explicitly _not_
120 | considered as an alternative approach as the Jenkins infrastructure team has
121 | had multiple performance issues
122 | footnote:[http://lists.jenkins-ci.org/pipermail/jenkins-infra/2017-December/001349.html]
123 | with that repository over the past couple of months. Most notably, the indexing
124 | times in the repository have varied between 10 minutes and over 60 minutes due
125 | to opaque server-side issues. The consequence of adding more load, and many
126 | more artifacts, to the repository would severely affect the indexing time and
127 | adversely affect the ability for Jenkins users to receive new plugin updates in
128 | the Update Center footnote:[https://updates.jenkins.io].
129 |
130 |
131 | [[costs]]
132 | == Costs
133 |
134 | As Artifactory is a hosted service provided by link:https://jfrog.com[JFrog],
135 | it is not expected that any additional financial cost will be involved in this
136 | proposal.
137 |
138 | Additionally, since this document proposes an additional repository, it is not
139 | expected to incur any substantial runtime/performance cost on the existing
140 | Update Center and `releases` repository indexing process.
141 |
142 |
143 | == Reference implementation
144 |
145 | Since we have no staging implementation of Artifactory, there is no reference
146 | implementation. Acceptance of this IEP document would result in a new
147 | repository being created on
148 | link:https://repo.jenkins-ci.org[repo.jenkins-ci.org].
149 |
--------------------------------------------------------------------------------
/iep-007/README.adoc:
--------------------------------------------------------------------------------
1 | ifdef::env-github[]
2 | :tip-caption: :bulb:
3 | :note-caption: :information_source:
4 | :important-caption: :heavy_exclamation_mark:
5 | :caution-caption: :fire:
6 | :warning-caption: :warning:
7 | endif::[]
8 |
9 | = IEP-007: Kubernetes cluster upgrade
10 |
11 | :toc:
12 |
13 | .Metadata
14 | [cols="2"]
15 | |===
16 | | IEP
17 | | 007
18 |
19 | | Title
20 | | Kubernetes cluster upgrade
21 |
22 | | Author
23 | | link:https://github.com/olblak[Olblak]
24 |
25 | | Status
26 | | :speech_balloon: In-process
27 |
28 | | Type
29 | | [Process]
30 |
31 | | Created
32 | | 2016-07-04
33 | |===
34 |
35 |
36 |
37 | == Abstract
38 |
39 | As the Kubernetes project frequently releases new versions, we want to benefit from the new features and bug fixes they introduce.
40 | Therefore we need an easy way to upgrade the Kubernetes cluster on a regular basis.
41 |
42 |
43 | == Specification
44 |
45 | Currently there are two main strategies to "upgrade" a cluster. +
46 |
47 | . Upgrading an existing cluster.
48 | . Migrating to a second cluster.
49 |
50 | As Azure does not (yet) provide any tools to upgrade existing clusters, we have to upgrade them manually. +
51 | It appears to be easier and safer to deploy a new cluster and re-deploy all resources on the new one.
52 |
53 | As long as the new cluster stays in the same region as the previous one, we can use the same blob storages and attach them to the new cluster. +
54 | The only important element that we lose when we migrate to a new cluster is the cluster public IP. +
55 | This means that we need to update 'nginx.azure.jenkins.io' with the new public IP.
56 |
57 | === Migration Process
58 |
59 | IMPORTANT: The old cluster must be kept until the new one is ready to serve requests.
60 |
61 | ==== Step 1: Backup
62 |
63 | Ensure secrets containing Let's Encrypt certificates are exported. (Requires https://github.com/jenkins-infra/jenkins-infra/pull/819[#PR819]) +
64 | A cron job should periodically export the Let's Encrypt certificates into `~/backup/$(CLUSTER)/secret.$(APPLICATION)-tls.yaml`
65 |
66 | .Export a secret
67 | ----
68 | .bin/kubectl get secret $(APPLICATION)-tls --export=true --kubeconfig .kube/$(CLUSTER).conf -o yaml > ~/backup/$(CLUSTER)/secret.$(APPLICATION)-tls.yaml
69 | ----
70 |
71 | ==== Step 2: Deploy the new cluster
72 |
73 | Add a second k8s resource in github.com/jenkins-infra/azure, named 'pea' (Requires https://github.com/jenkins-infra/iep/pull/11[#PR11])
74 |
75 | ==== Step 3: Configure the new cluster
76 |
77 | * Update the following hieraconfig variables with the new k8s cluster information (Requires a PR on jenkins-infra/jenkins-infra)
78 |
79 | ----
80 | profile::kubernetes::params::clusters:
81 |   - server: https://clusterexample1.eastus.cloudapp.azure.com
82 |     username: clusterexample1-admin
83 |     clustername: clusterexample1
84 |     certificate_authority_data: ...
85 |     client_certificate_data: ...
86 |     client_key_data: ...
87 | ----
88 |
89 | * Run the Puppet agent
90 | * Get the new public IP (Manual operation)
91 | ----
92 | kubectl get service nginx --namespace nginx-ingress
93 | ----
94 | * Restore the backed-up secrets containing Let's Encrypt certificates on the new cluster (Manual operation)
95 | ----
96 | .bin/kubectl apply -f ~/backup/$(OLD_CLUSTER)/secret.*-tls.yaml --kubeconfig .kube/$(CLUSTER).conf
97 | ----
98 | * Validate HTTPS endpoint (Manual operation)
99 | ----
100 | curl -I https://jenkins.io --resolve "jenkins.io:443:"
101 | curl -I https://plugins.jenkins.io --resolve "plugins.jenkins.io:443:"
102 | curl -I https://repo.azure.jenkins.io --resolve "repo.azure.jenkins.io:443:"
103 | ...
104 | ----
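The validation commands above all follow the same pattern, so they can be generated for a list of hostnames. The following Python sketch is an illustration only: the helper name `curl_resolve_commands` is hypothetical, and the address used in the usage note is a documentation-range placeholder, not the real cluster IP.

```python
import shlex


def curl_resolve_commands(hostnames, public_ip, port=443):
    """Build one `curl -I ... --resolve host:port:address` command per hostname.

    `--resolve` makes curl connect to `public_ip` instead of the address
    currently published in DNS, so the new cluster can be checked
    before the DNS record is updated.
    """
    commands = []
    for host in hostnames:
        resolve = f"{host}:{port}:{public_ip}"
        commands.append(f"curl -I https://{host} --resolve {shlex.quote(resolve)}")
    return commands
```

For instance, `curl_resolve_commands(["jenkins.io", "plugins.jenkins.io"], "203.0.113.10")` returns two command strings, where `203.0.113.10` stands in for the public IP obtained from the `kubectl get service` step.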
105 |
106 | ==== Step 4: Update DNS Record
107 |
108 | Update nginx.azure.jenkins.io with the new public IP (Requires a PR on jenkins-infra/jenkins-infra)
109 |
110 | [NOTE]
111 | During the DNS update, requests will be sent either to the new cluster or to the old cluster.
112 | Users shouldn't detect any difference.
113 |
114 | ==== Step 5: Remove the old cluster
115 | Remove k8s.tf from jenkins-infra/azure (Requires a PR on jenkins-infra/azure)
116 |
117 | [NOTE]
118 | It may be safer not to automate this step, and to just delete the correct storage account through the Azure portal.
119 |
120 |
121 | ==== Conclusion
122 | With this scenario, we shouldn't have any downtime, as HTTP/HTTPS requests will get almost the same response (depending on the service) whether we reach the old or the new cluster.
123 |
124 |
125 | == Motivation
126 |
127 | As testing environments have short lives (we create them to validate deployments, then we trash them), they often use the latest version available from Azure.
128 | This means that we may not detect issues when those versions are not aligned with production.
129 | We would also like to benefit from bug fixes and new features.
130 | It's easier to follow the Kubernetes documentation if we use a version close to the upstream version.
131 |
132 |
133 | == Rationale
134 |
135 | There isn't any tool to easily upgrade a Kubernetes cluster on Azure, so I tried to do it manually.
136 |
137 | I only spent one day trying this upgrade, but it wasn't trivial.
138 | I applied the following steps manually:
139 |
140 | * Update kubelet && kubectl
141 | * Update Kubernetes options according to the version (some options were deprecated and others were new)
142 | * Reproduce the production cluster in order to validate the migration.
143 | * Restart each node (master & client) after the upgrade
144 |
145 | Even after those operations I faced weird issues, so I decided to stop there and concluded that cluster migration was an easier and safer process.
146 |
147 | There are several open issues regarding the upgrade procedure, so I suppose that it may be a possible alternative in the future.
148 |
149 | * https://github.com/Azure/ACS/issues/5[Azure/acs]
150 | * https://github.com/Azure/acs-engine/issues/464[Azure/acs-engine]
151 |
152 | == Costs
153 |
154 | It will cost a second cluster during the migration, but once everything is switched to the new cluster, the previous one can be decommissioned.
155 |
156 |
157 | == Reference implementation
158 |
159 | As of right now, there is no reference implementation of a Kubernetes cluster upgrade.
160 |
--------------------------------------------------------------------------------
/iep-002/README.adoc:
--------------------------------------------------------------------------------
1 | ifdef::env-github[]
2 | :tip-caption: :bulb:
3 | :note-caption: :information_source:
4 | :important-caption: :heavy_exclamation_mark:
5 | :caution-caption: :fire:
6 | :warning-caption: :warning:
7 | endif::[]
8 |
9 | = IEP-2: Azure Virtual Networks for Cluster Segregation
10 |
11 | :toc:
12 |
13 | .Metadata
14 | [cols="2"]
15 | |===
16 | | IEP
17 | | 2
18 |
19 | | Title
20 | | Azure Virtual Networks for Cluster Segregation
21 |
22 | | Author
23 | | link:https://github.com/rtyler[R. Tyler Croy]
24 |
25 | | Status
26 | | :scroll: Complete
27 |
28 | | Type
29 | | Architecture
30 |
31 | | Created
32 | | 2016-11-16
33 | |===
34 |
35 |
36 | == Abstract
37 |
38 | Currently the only network connecting various infrastructure services is the
39 | public internet. Regardless of the security level of a service, if other
40 | services must connect to it they must do so over the public internet. This is
41 | not an explicit design decision, but rather a consequence of the organic,
42 | cross-datacenter manner in which infrastructure grew.
43 |
44 |
45 | == Specification
46 |
47 | As part of the
48 | link:https://wiki.jenkins-ci.org/display/JENKINS/Azure+Migration+Project+Plan[Azure migration]
49 | the Jenkins project now has access to "modern" software-defined networking
50 | tools which can allow designing and implementing better network topologies than
51 | "just the public internet."
52 |
53 | This can be accomplished with the Azure
54 | link:https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-overview[Virtual Networks]
55 | feature, similar to the AWS VPC feature, which allows definition of software
56 | defined networks into which Azure resources can be deployed (e.g. VMs, SQL
57 | Server DBs, etc).
58 |
59 | This proposal is for the creation of *three* Virtual Networks for the Jenkins
60 | project infrastructure:
61 |
62 |
63 | . *Public Production*
64 | . *Private Production*
65 | . *Development*
66 |
67 | At some point in the future, there might be more networks necessary, but at
68 | this point this seems sufficient to bootstrap the project infrastructure on
69 | Azure in a reasonably sane fashion.
70 |
71 | === Public Production
72 |
73 | The "Public Production" Virtual Network would contain end-user (developer or
74 | Jenkins user) facing services such as, but not limited to:
75 |
76 | * link:https://jenkins.io[jenkins.io] - Primary (static) website
77 | * link:https://ci.jenkins.io[ci.jenkins.io] - Jenkins-on-Jenkins cluster
78 | * link:https://accounts.jenkins.io[accounts.jenkins.io] - Account app
79 | * JIRA and Confluence
80 |
81 | Services in this network should be appropriately protected with
82 | Network Security Groups, only allowing the necessary application ports to be
83 | serviced, but should be otherwise considered "public."
84 |
85 |
86 | === Private Production
87 |
88 | The "Private Production" network is for services which are internal, or highly
89 | sensitive, within the Jenkins project's infrastructure. These are services
90 | which include, but are not limited to:
91 |
92 | * Puppet Master - the holder of all secrets for provisioning Public and Private
93 | Production-level services
94 | * "trusted.ci" - a behind-the-scenes Jenkins cluster with release/signing keys
95 |
96 | This network would also utilize Network Security Groups and will be peered with
97 | the Public Production network via
98 | link:https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-peering-overview[Virtual Network Peering]
99 | which allows the two networks to route between each other via the Azure network
100 | backbone using private IP addresses. This peering is required to manage
101 | services via Puppet in the Public Production network.
102 |
103 |
104 | ==== Contributor Access
105 |
106 | The Private Production network will be locked down and inaccessible from the
107 | public internet. There are however some contributors, such as board members and
108 | team leads, who will need access to services within the Private Production
109 | network.
110 |
111 | These contributors will need to be granted access via a
112 | link:https://docs.microsoft.com/en-us/azure/vpn-gateway/vpn-gateway-about-vpngateways#point-to-site[VPN Gateway]
113 | in Azure. This creates some minor additional cost and management overhead
114 | per-user, but the security provided by the Private Production network enables
115 | projects such as fully automated core releases.
116 |
117 |
118 | === Development
119 |
120 | The "Development" network is somewhat of a "catch-all" for services which are
121 | not yet ready for production usage, as well as testing and demonstration environments.
122 | Services which live in this environment *will not* have access to the Puppet
123 | Master and therefore will not be capable of being fully provisioned in the
124 | same manner as "production" services.
125 |
126 |
127 | NOTE: At some point in the future, a staging Puppet master may be provisioned in this
128 | network but that is outside the scope of this document.
129 |
130 |
131 |
132 | ---
133 |
134 | [source]
135 | ----
136 | +---------------------+
137 | | |
138 | +---------------> | Public Production <-------+
139 | | | | |
140 | | +---------------------+ VNet Peering
141 | | |
142 | | +-------------v--------+
143 | +-------------+ | |
144 | The Internet ---------> + VPN Gateway |-| Private Production |
145 | +-------------+ | |
146 | | +----------------------+
147 | |
148 | | +----------------+
149 | | | |
150 | +---------------> | Development |
151 | | |
152 | +----------------+
153 | ----
154 |
155 |
156 | == Motivation
157 |
158 |
159 | Structuring the Azure-based infrastructure across the three proposed Virtual
160 | Networks will create an additional level of service balkanization which we
161 | are currently (pre-Azure) unable to provide. Per our possible infrastructure
162 | compromise earlier this year
163 | footnote:[https://jenkins.io/blog/2016/04/22/possible-infra-compromise/],
164 | the infrastructure should be *more* balkanized whenever possible to reduce
165 | the impact, or remove the possibility, of incursions into Jenkins project
166 | infrastructure.
167 |
168 |
169 | == Rationale
170 |
171 |
172 | The three Virtual Networks proposed represent a "minimum" structure to get
173 | infrastructure provisioned safely into Azure from a network perspective.
174 |
175 | A completely flat network topology was briefly considered, and does make
176 | management very easy, but leaves us with little network-based protection
177 | against unknown vulnerabilities in some of the non-end-user-facing services.
178 | Additionally, it is a requirement for at least "trusted.ci" to exist off the
179 | public internet as it contains signing keys and other highly sensitive secrets.
180 |
181 | Trusting Network Security Groups alone may inadvertently leave open holes in
182 | our infrastructure, whereas implementing a fundamental layer two
183 | footnote:[https://en.wikipedia.org/wiki/OSI_model#Layer_2:_Data_Link_Layer]
184 | separation ensures misconfigurations and/or accidents don't leave sensitive
185 | services in the Jenkins project infrastructure exposed.
186 |
187 |
188 |
189 | == Costs
190 |
191 | The cost of maintenance/implementation of these networks cannot be estimated at
192 | this point in time.
193 |
194 | The monetary cost only plays a factor when routing traffic between two
195 | networks, which would be:
196 |
197 | [cols=2]
198 | |===
199 | | Inbound data transfer
200 | | $0.01 per GB
201 |
202 | | Outbound data transfer
203 | | $0.01 per GB
204 | |===
205 |
206 | link:https://azure.microsoft.com/en-us/pricing/details/virtual-network/[source]
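
As a rough illustration of how the rates above add up, the following minimal Python sketch estimates the monthly cost of traffic routed between two peered networks. It assumes each gigabyte crossing the peering is billed once as outbound on the sending network and once as inbound on the receiving network. The function name and the sample volume are hypothetical and are not part of this proposal.

```python
# Rates taken from the table above (USD per GB).
INBOUND_RATE_PER_GB = 0.01
OUTBOUND_RATE_PER_GB = 0.01


def monthly_peering_cost(gigabytes_transferred: float) -> float:
    """Estimate the cost of moving data across a VNet peering.

    Assumption: every GB is charged at the outbound rate on one side
    and at the inbound rate on the other side.
    """
    per_gb = INBOUND_RATE_PER_GB + OUTBOUND_RATE_PER_GB
    return round(gigabytes_transferred * per_gb, 2)


if __name__ == "__main__":
    # Hypothetical volume: 100 GB of Puppet and management traffic per month.
    print(f"${monthly_peering_cost(100):.2f} per month")
```

Under these assumptions the peering cost stays negligible unless hundreds of gigabytes move between the networks each month.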
207 |
208 |
209 | == Reference implementation
210 |
211 | As of right now there is no reference implementation of the various Virtual
212 | Networks.
213 |
--------------------------------------------------------------------------------
/iep-008/README.adoc:
--------------------------------------------------------------------------------
1 | ifdef::env-github[]
2 | :tip-caption: :bulb:
3 | :note-caption: :information_source:
4 | :important-caption: :heavy_exclamation_mark:
5 | :caution-caption: :fire:
6 | :warning-caption: :warning:
7 | endif::[]
8 |
9 | = IEP-008: Ldap on Kubernetes
10 |
11 | :toc:
12 |
13 | .Metadata
14 | [cols="2"]
15 | |===
16 | | IEP
17 | | 8
18 |
19 | | Title
20 | | Ldap on Kubernetes
21 |
22 | | Author
23 | | link:https://github.com/olblak[Olivier Vernin]
24 |
25 | | Status
26 | | :speech_balloon: In-process
27 |
28 | | Type
29 | | Architecture
30 |
31 | | Created
32 | | 2018-01-31
33 | |===
34 |
35 |
36 |
37 | == Abstract
38 |
39 | As part of the migration to Azure, the Ldap server must be moved.
40 | We can take this opportunity to containerize the service and move it onto a container orchestrator such as our Kubernetes cluster.
41 | This architecture change must be done carefully, as this is a stateful application whose database can be corrupted or lose data.
42 |
43 | == Specification
44 |
45 | This IEP document is about running ldap on Kubernetes on Azure.
46 | Currently the ldap server runs on a bare metal machine and is configured by Puppet.
47 | The objective here is to dockerize this application in order to run it on a Kubernetes cluster.
48 |
49 | As this is a stateful application, we must take the following aspects into consideration when deploying the ldap server in production.
50 |
51 |
52 | === Access
53 | Ldap must be accessible from inside and outside the kubernetes cluster.
54 | In order to reach ldap from the public internet, we need a fixed public IP and must only allow connections from whitelisted IPs.
55 | This can be easily done with the Kubernetes 'service' resource.
56 |
57 | [cols="1a,2a", options="header"]
58 | .Access
59 | |===
60 | |Inside
61 | |Outside
62 | |
63 | * https://accounts.jenkins.io/[Accountapp]
64 | |
65 | * https://repo.jenkins-ci.org/webapp/#/home[Artifactory]
66 | * https://ci.jenkins.io[ci.jenkins.io]
67 | * https://wiki.jenkins.io/[Confluence]
68 | * https://issues.jenkins-ci.org[Jira]
69 | * puppet master
70 | * spambot
71 | * trusted.ci
72 | |===
73 |
74 |
75 | === Backup/Restore
76 | The procedure to backup/restore the database should be easy; two scripts are provided by the same docker image that runs ldap. +
77 | Backups must be stored on Azure File Storage in order to simplify access to them from various locations. +
78 | Backup names must respect the format 'YYYYmmddHHMM'.
79 |
80 | Backup and restore operations must be done in the following situations:
81 |
82 | [cols="1a,2a", options="header"]
83 | .Backup/Restore
84 | |===
85 | |Backup
86 | |Restore
87 | | * On a daily basis
88 | * When the application is stopping
89 | * On demand
90 | | * On demand
91 | |===
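
To make the naming rule concrete, here is a minimal Python sketch of how a backup name in the 'YYYYmmddHHMM' format could be produced and how the most recent backup could be selected from a list of names. It is only an illustration: the function names are hypothetical and are not taken from the backup/restore scripts shipped in the docker image.

```python
from datetime import datetime

# strftime/strptime equivalent of the 'YYYYmmddHHMM' format
BACKUP_NAME_FORMAT = "%Y%m%d%H%M"


def backup_name(moment: datetime) -> str:
    """Return the backup name for the given point in time."""
    return moment.strftime(BACKUP_NAME_FORMAT)


def parse_backup_name(name: str) -> datetime:
    """Parse a backup name; raises ValueError if the format is not respected."""
    return datetime.strptime(name, BACKUP_NAME_FORMAT)


def latest_backup(names):
    """Pick the most recent backup, for example before an on-demand restore."""
    return max(names, key=parse_backup_name)


if __name__ == "__main__":
    print(backup_name(datetime(2018, 1, 31, 9, 5)))
    print(latest_backup(["201712312359", "201801310905"]))
```

Because the format is fixed-width and ordered from year down to minute, sorting the names as plain strings gives the same order as sorting by date.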
92 |
93 | === Certificate
94 | https://issues.jenkins-ci.org/browse/INFRA-1151[INFRA-1151]
95 |
96 | Letsencrypt can be configured with two different methods:
97 |
98 | *HTTP-01*
99 |
100 | HTTP-01 configuration needs an Ingress resource that listens on port 443, but this Ingress resource cannot be configured to also listen on ports 389/636.
101 | This means we need a service to listen on ports 389/636; unfortunately, this service doesn't handle Letsencrypt certificate requests.
102 | Therefore we would need both resource types, and they can't be configured with the same public IP.
103 | So this method doesn't work.
104 |
105 |
106 | *DNS-01*
107 |
108 | DNS-01 configuration only works with Google/AWS/Cloudflare.
109 |
110 | *Conclusion*
111 | I didn't find an easy way to use a Letsencrypt certificate from kube-lego (deprecated)/cert-manager, so
112 | we have to go with a manually requested SSL certificate.
113 |
114 | === Data
115 | The ldap database must be stored on stable storage that can be easily mounted/unmounted.
116 | Currently there is no perfect solution, as each option has advantages and disadvantages.
117 |
118 | ==== Dedicated Azure Disk storage
119 | ReadWriteOnce
120 | [cols="1a,2a", options="header"]
121 | .Dedicated Azure Disk Storage
122 | |===
123 | |+
124 | |-
125 | |
126 | * Persistent data across kubernetes clusters, as we only have one container running at a time.
127 | * We only have to restore a backup once.
128 | |
129 | * Complicates cluster upgrades (https://github.com/jenkins-infra/iep/tree/master/iep-007[iep-007]):
130 | traffic will be redirected to the new server once the old one is deleted.
131 | * It means downtime when we upgrade the container, as we must delete the old container before starting the new one.
132 | |===
133 |
134 | ==== Dynamic Azure Disk Storage
135 | ReadWriteOnce
136 |
137 | [cols="1a,2a", options="header"]
138 | .Dynamic Azure Disk Storage
139 | |===
140 | |+
141 | |-
142 | |
143 | * Persistent data associated with a cluster life cycle
144 | * Simplifies cluster migration; a new cluster can be started even if the old cluster is still running
145 | |
146 | * We must restore a backup on each new kubernetes cluster deployment
147 | * While migrating the cluster, we must be sure to put the old cluster in read only mode.
148 | |
149 | |===
150 |
151 | ==== Azure File storage
152 | ReadWriteMany +
153 | After running some tests, I noticed bad behavior when running openldap on a CIFS partition,
154 | such as 'permission denied' errors even when the blob storage was mounted as the ldap user,
155 | or database restores that hang forever.
156 | In the end, I decided not to invest further time in this solution.
157 |
158 | ==== Conclusion
159 | Considering that it only takes 5 seconds to backup/restore an ldap database, using a dynamic Azure disk storage sounds reasonable.
160 |
161 | === Kubernetes Design
162 | .Kubernetes Schema
163 | [source]
164 | ....
165 | +----------------------------------------------------------------------------------------------+
166 | | Namespace: Ldap |
167 | +----------------------------------------------------------------------------------------------+
168 | | |
169 | | +----------------------------------+ +----------------------------------+ |
170 | +---------------+ | Statefulset: Ldap | | PersistentVolume: ldap-backup | |
171 | |Service: Ldap | +----------------------------------+ +----------------------------------+ |
172 | +---------------+ | +---------------------------+ | | * Terraform Lifecycle | |
173 | | * Ldap (389) | | | POD: ldap-0 | | | * ReadWriteMany | |
174 | | * Ldaps (636) | | +---------------------------+ |<--------------------------------------+ |
175 | +---------------+ | | +----------------------+ | | +----------------------------------+ |
176 | | | | | | Container: Slapd | | | | PersistentVolume: ldap-data | |
177 | | | | | +----------------------+ | | +----------------------------------+ |
178 | | | | | | * Ldap server | | | | * ClusterLife cycle | |
179 | | +--------->| | +----------------------+ | |<---+ * ReadWriteOnce | |
180 | | | | | | +----------------------------------+ |
181 | | | | +----------------------+ | | |
182 | | | | | Container: Crond | | |<---+----------------------------------+ |
183 | | | | +----------------------+ | | | Secret: Ldap | |
184 | | | | | * Backup Task | | | +----------------------------------+ |
185 | | | | +----------------------+ | | | * SSL certificate | |
186 | | | | | | | * Blob storage credentials | |
187 | | | +---------------------------+ | | * Ldap credentials | |
188 | | +----------------------------------+ +----------------------------------+ |
189 | +----------------------------------------------------------------------------------------------+
190 | ....
191 |
192 | == Motivation
193 | The motivation here is to benefit from both Kubernetes and Azure services advantages. +
194 | link:https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/[What is Kubernetes?]
195 |
196 | == Rationale
197 | == Costs
198 | In addition to the Kubernetes cluster that we are already paying for, we'll need the following services:
199 |
200 | * Public IP
201 | * LoadBalancer
202 | * Azure file storage account for backup
203 | * Disk Storage account for Data
204 | * Ssl certificate `Ldap.jenkins.io`
205 |
206 | == Reference implementation
207 | * https://github.com/jenkins-infra/ldap[Docker Container]
208 | * https://github.com/jenkins-infra/jenkins-infra/pull/943[Jenkins-infra PR#943]
209 | * https://github.com/jenkins-infra/azure/pull/45[Azure PR#45]
210 | * https://issues.jenkins-ci.org/browse/INFRA-1131[JIRA Issue]
211 |
--------------------------------------------------------------------------------
/iep-001/README.adoc:
--------------------------------------------------------------------------------
1 | ifdef::env-github[]
2 | :tip-caption: :bulb:
3 | :note-caption: :information_source:
4 | :important-caption: :heavy_exclamation_mark:
5 | :caution-caption: :fire:
6 | :warning-caption: :warning:
7 | endif::[]
8 |
9 | = IEP-1: Infra Enhancement Proposal Format
10 |
11 | :toc:
12 |
13 | .Metadata
14 | [cols="2"]
15 | |===
16 | | IEP
17 | | 1
18 |
19 | | Title
20 | | Infra Enhancement Proposal Format
21 |
22 | | Author
23 | | link:https://github.com/rtyler[R Tyler Croy]
24 |
25 | | Status
26 | | :scroll: Complete
27 |
28 | | Type
29 | | Process
30 |
31 | | Created
32 | | 2016-10-25
33 | |===
34 |
35 |
36 |
37 | == What is an IEP?
38 |
39 | IEP ("eep!") stands for Infrastructure Enhancement Proposal. An IEP is a design
40 | document providing information to the Jenkins community, or describing an
41 | addition/change to the Jenkins project's infrastructure. The IEP should provide a
42 | concise technical specification of the addition/change and a rationale for the
43 | addition/change.
44 |
45 | We intend IEPs to be the primary mechanism for major infrastructure proposals,
46 | collecting feedback on the proposals, and for documenting the design decisions
47 | that have gone into the infrastructure over time. The IEP author is responsible
48 | for building consensus among the infrastructure team members, informing the
49 | users of the infrastructure and documenting dissenting opinions.
50 |
51 | Because the IEPs are maintained as text files in a versioned repository, their
52 | revision history is the historical record of the feature proposal
53 | footnoteref:[ieprepo, The source repository for IEPs can be found on link:https://github.com/jenkins-infra/iep[GitHub]]
54 |
55 |
56 | == IEP Types
57 |
58 |
59 | . **Architectural change** -- an architecture change describes a new way of
60 | delivering a service or services for the Jenkins project infrastructure.
61 |
62 | . **Services change** -- this IEP would describe a new service being added to
63 | the infrastructure or changing how an existing service is deployed
64 |
65 | . **Process change** -- describes a change to an existing process for the Jenkins
66 | project, or the introduction of a new process, to improve the efficiency,
67 | transparency or some other quality of the operation of project infrastructure.
68 |
69 |
70 |
71 |
72 | == IEP Workflow
73 |
74 |
75 | === Start with an idea for infrastructure
76 |
77 | The IEP process begins with a new idea for Jenkins project infrastructure. It
78 | is highly recommended that a single IEP contains a single key proposal or new
79 | idea. Small changes often don't need a full-fledged IEP document and can simply
80 | be contributed as Puppet or other code to the infrastructure workflow.
81 |
82 | Each IEP must have a champion -- someone who writes the IEP using the style and
83 | format described below, shepherds the discussions in the appropriate forums,
84 | and attempts to build community consensus around the idea. The IEP champion
85 | (a.k.a. Author) should first attempt to ascertain whether the idea is
86 | IEP-worthy by discussing it first on the
87 | link:mailto:infra@lists.jenkins-ci.org[infra mailing list].
88 |
89 | Once the champion has asked the infra community whether an idea has any
90 | chance of acceptance, a draft IEP should be authored and linked to the mailing
91 | list. This gives the author the chance to flesh out the draft and its ideas
92 | before taking the time to formally submit the proposal.
93 |
94 |
95 |
96 | === Submitting an IEP
97 |
98 | Following a discussion on the infra mailing list, the proposal should be
99 | submitted as a draft IEP via a GitHub pull request to the IEP repository
100 | footnoteref:[ieprepo]
101 |
102 | The standard IEP workflow is:
103 |
104 | * You, the IEP author, fork the IEP repository footnoteref:[ieprepo], and
105 | create a directory named `iep-xxx` that contains a `README.adoc` which is the
106 | IEP document. Replace `xxx` with an incremental IEP number (e.g. `002`)
107 | * Push this to your GitHub fork and submit a pull request.
108 | * The IEP editors review your PR for structure, formatting, and other errors.
109 | Once the IEP editors have reviewed the basic formatting, they will attach the
110 | "draft" label to the pull request.
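
As an illustration of the naming step above, the next `iep-xxx` directory name can be derived from the directories already present in the repository. The Python sketch below is a hypothetical helper, not a tool that exists in the IEP repository; it assumes directory names always use a three-digit, zero-padded number.

```python
import re

# Matches directory names such as "iep-001" or "iep-012"
IEP_DIR_PATTERN = re.compile(r"^iep-(\d{3})$")


def next_iep_directory(existing_names):
    """Return the directory name for the next IEP.

    Entries that do not look like an IEP directory (README.adoc, images, ...)
    are ignored. With no existing IEPs the result is "iep-001".
    """
    numbers = []
    for name in existing_names:
        match = IEP_DIR_PATTERN.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return f"iep-{max(numbers, default=0) + 1:03d}"


if __name__ == "__main__":
    print(next_iep_directory(["iep-001", "iep-012", "README.adoc", "template.adoc"]))
```

In practice the author would pass the top-level entries of their fork, for example the result of `os.listdir(".")`, and create `README.adoc` inside the returned directory.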
111 |
112 | IEP authors are responsible for collecting community feedback on an IEP
113 | before submitting it for review. However, wherever possible, long
114 | open-ended discussions on public mailing lists should be avoided.
115 |
116 | Strategies to keep the discussions efficient include: setting up a
117 | separate mailing list for the topic, having the IEP author accept
118 | private comments in the early design phases, setting up a wiki page, etc.
119 | IEP authors should use their discretion here.
120 |
121 | The IEP author is responsible for updating the pull request with appropriate
122 | changes based on these discussions.
123 |
124 |
125 | === IEP Maintenance
126 |
127 | In general, IEPs won't change over time but rather be superseded by follow-up
128 | or new proposals. Once a proposal has been accepted and implemented, the IEP
129 | should stand the test of time as a reference document for future infrastructure
130 | contributors.
131 |
132 | Process IEPs may be updated over time to reflect changes to development
133 | practices and other details. The precise process followed in these cases will
134 | depend on the nature and purpose of the IEP being updated.
135 |
136 |
137 |
138 | === Creating a successful IEP
139 |
140 | Each IEP should have the following parts:
141 |
142 | . **Preamble** -- an AsciiDoc table containing meta-data about the IEP including
143 | the number, a short descriptive title (limited to a maximum of 50 characters),
144 | the names, and optionally the contact info for each author, etc.
145 | . **Abstract** -- a short (200 word) description of the technical issue being
146 | addressed.
147 | . **Specification** -- The technical specification should describe the
148 | requirements, architecture and design of the proposal.
149 | . **Motivation** -- The motivation is critical for IEPs that want to add new
150 | services to the Jenkins project's infrastructure. It should clearly explain
151 | what benefit to the project the service brings, identifying the problems it
152 | solves or the features it offers to the users or developers of the Jenkins
153 | project. IEP submissions without sufficient motivation may be rejected
154 | outright.
155 | . **Rationale** -- The rationale fleshes out the specification by
156 | describing what motivated the design and why particular design
157 | decisions were made. It should describe alternate designs that
158 | were considered and related work, e.g. how the feature is supported
159 | in other languages.
160 | The rationale should provide evidence of consensus within the
161 | community and discuss important objections or concerns raised
162 | during discussion.
163 | . **Costs** -- The costs of making this change - these can be financial costs
164 | associated with new hosting, and/or organisational costs that will be borne by
165 | Jenkins users/teams as part of this change.
166 | . **Reference Implementation** -- The reference implementation must be
167 | completed for new services or process changes. This could be as succinct as
168 | the code repository containing a new application, or an example project
169 | demonstrating a new process change.
170 | While there is merit to the approach of reaching consensus on the
171 | specification and rationale before writing code, the principle of "rough
172 | consensus and running code" is still useful when it comes to resolving
173 | discussions about a service's architecture or implementation details.
174 |
175 |
176 | === IEP Formats and Templates
177 |
178 | IEPs are UTF-8 encoded text files using the
179 | link:http://asciidoctor.org[AsciiDoc]
180 | format.
181 | AsciiDoc allows for rich markup that is still quite easy to
182 | read, but also results in good-looking and functional HTML.
183 |
184 |
185 | ==== IEP Header Preamble
186 |
187 | Each IEP must begin with an AsciiDoc table containing meta-data relevant to the
188 | IEP.
189 |
190 | [source,asciidoc]
191 | ----
192 | .Metadata
193 | [cols="2"]
194 | |===
195 | | IEP
196 | | 1
197 |
198 | | Title
199 | | Infra Enhancement Proposal Format
200 |
201 | | Author
202 | | link:https://github.com/rtyler[R Tyler Croy]
203 |
204 | | Status
205 | | :speech_balloon: In-process
206 |
207 | | Type
208 | | Process
209 |
210 | | Created
211 | | 2016-10-25
212 | |===
213 | ----
214 |
215 |
216 | . **IEP** -- Proposal number, use a monotonically increasing number, starting
217 | from the latest merged IEP document
218 | . **Title** -- Brief title explaining the proposal in fewer than 50 characters
219 | . **Author** -- Author/champion of the IEP, in essence, the individual
220 | responsible for seeing the IEP through the process.
221 | . **Status** -- In-process, Accepted, or Rejected. IEPs should be authored with
222 | an In-process status.
223 | . **Type** -- Describes the type of IEP: Architectural, Service, Process
224 | . **Created** -- Date (`%Y-%m-%d`) when the document was first created.
225 |
226 |
227 | ===== Additional Files
228 |
229 | IEPs may include additional files such as diagrams and code snippets. Such
230 | files should be added into the `iep-xxx/` directory with self-explanatory file
231 | names.
232 |
233 |
234 | === IEP Rejection
235 |
236 | If an IEP is rejected, the pull request for the IEP should still be merged with
237 | additional information added to the header of the document explaining the
238 | decision making process and why the proposal was rejected.
239 |
240 | This should help in the future when decisions must be revisited or reviewed as
241 | tools, technologies and needs of the project change.
242 |
--------------------------------------------------------------------------------
/iep-005/README.adoc:
--------------------------------------------------------------------------------
1 | ifdef::env-github[]
2 | :tip-caption: :bulb:
3 | :note-caption: :information_source:
4 | :important-caption: :heavy_exclamation_mark:
5 | :caution-caption: :fire:
6 | :warning-caption: :warning:
7 | endif::[]
8 |
9 | = IEP-5: Systematic collection of project events
10 |
11 | :toc:
12 |
13 | .Metadata
14 | [cols="2"]
15 | |===
16 |
17 | | IEP
18 | | 5
19 |
20 | | Title
21 | | Systematic collection of project events
22 |
23 | | Author
24 | | link:https://github.com/rtyler[R. Tyler Croy]
25 |
26 | | Status
27 | | :fire: Abandoned
28 |
29 | | Type
30 | | Service
31 |
32 | | Created
33 | | 2016-12-28
34 | |===
35 |
36 |
37 |
38 | == Abstract
39 |
40 | Considering the size and velocity of the Jenkins project, it can be difficult
41 | to systematically determine the overall health through manual checks, scraping
42 | of various APIs, or through other non-automated means. Additionally, without
43 | any project-owned corpus of data, it is burdensome to answer questions
44 | such as:
45 |
46 | * What's the level of development activities in core, class A/B/C plugins? How are they changing over time?
47 | * Who are the seasoned contributors?
48 | * Who are the new contributors that we can reach out to and help?
49 | * What's the typical journey of a plugin developer?
50 |
51 | == Specification
52 |
53 | The primary source of project events is currently GitHub, which provides
54 | organization-wide webhooks. This specification focuses primarily on these
55 | events but can be readily extended when new sources of information arise.
56 |
57 | === Technologies Introduced
58 |
59 | This specification introduces a few new technologies which are currently not
60 | part of the Jenkins project infrastructure:
61 |
62 | * link:https://azure.microsoft.com/en-us/services/functions/[Azure Functions]
63 | * link:https://azure.microsoft.com/en-us/services/event-hubs/[Azure EventHub]
64 | * link:https://azure.microsoft.com/en-us/services/documentdb/[Azure DocumentDB]
65 |
66 |
67 | The motivations for selecting each of these pieces of technology are discussed
68 | more in the <> section.
69 |
70 |
71 |
72 | .Component Diagram
73 | [source]
74 | ----
75 |
76 | +--------------------+ +----------------------------+ +--------------+
77 | | GitHub (jenkinsci) +----webhook------>| github-event-queue (App) +--enqueue-->| EventHub |
78 | +--------------------+ +----->+----------------+-----------+ | (all events) |
79 | +------------------------+ | | +--------------+
80 | | GitHub (jenkins-infra) +-------+ |
81 | +------------------------+ |
82 | +----append----+
83 | |
84 | +--------------v--------+
85 | | DocumentDB (github) |
86 | +-----------------------+
87 | ----
88 |
89 |
90 | === Data Format
91 |
92 | The following describes the JSON format to be expected by applications querying
93 | either DocumentDB or consuming from an EventHub.
94 |
95 | .JSON Event/Document Format
96 | [source,json]
97 | ----
98 | {
99 | "source" : "github", // <1>
100 | "type" : "pull_request", // <2>
101 | "event" : {}, // <3>
102 | "received" : "2017-01-05T21:56:04.522Z" // <4>
103 | }
104 | ----
105 | <1> Original source of the event, `"github"` indicates the event originated as a GitHub webhook.
106 | <2> Type of event, in the case of the `github` source, this will be one of the link:https://developer.github.com/webhooks/#events[webhook event types].
107 | <3> The actual JSON payload of the event.
108 | <4> The ISO-8601 timestamp indicating when the event was received by the Azure Functions app
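
To illustrate the envelope described above, the following Python sketch wraps an incoming payload in the documented fields. It is a hypothetical example for readers and is not the reference implementation of the Azure Functions app; the function name `wrap_event` and the sample payload are assumptions.

```python
import json
from datetime import datetime, timezone


def wrap_event(source, event_type, payload, received=None):
    """Build the event document stored in DocumentDB and published to EventHub.

    `received` defaults to the current time and is always serialized as an
    ISO-8601 UTC timestamp with millisecond precision.
    """
    received = received or datetime.now(timezone.utc)
    timestamp = (
        received.astimezone(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )
    return {
        "source": source,       # for example "github"
        "type": event_type,     # for example "pull_request"
        "event": payload,       # the original webhook JSON payload
        "received": timestamp,
    }


if __name__ == "__main__":
    document = wrap_event(
        "github",
        "pull_request",
        {"action": "opened"},  # hypothetical, heavily shortened payload
        received=datetime(2017, 1, 5, 21, 56, 4, 522000, tzinfo=timezone.utc),
    )
    print(json.dumps(document, indent=2))
```

Keeping the original payload untouched under `event` means consumers can rely on the GitHub webhook documentation for its structure, while `source` and `type` allow filtering without parsing the payload.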
109 |
110 | This table should be updated when new sources and types are added:
111 |
112 | .Event Sources
113 | |===
114 | | Identifier | Description | Types
115 |
116 | | `github`
117 | | A GitHub webhook payload
118 | | One of the link:https://developer.github.com/webhooks/#events[webhook event types].
119 |
120 | |===
121 |
122 |
123 | === Storage Capacity
124 |
125 | The <> for Azure DocumentDB storage is only based on what is _actually_
126 | consumed rather than what is provisioned. Therefore each DocumentDB provisioned
127 | for events should be at minimum *25GB*.
128 |
129 | Time-to-live:: Azure DocumentDB supports a document or collection-level time-to-live (TTL), which
130 | can automatically purge documents after the TTL expires. Until we better
131 | understand the amount of events data the project wishes to store, this will
132 | remain *off*.
133 | Consistency:: Until there is a compelling reason to change the consistency level for the
134 | DocumentDB resources, the default of *session* consistency should be used.
135 |
136 |
137 | === Monitoring
138 |
139 | The monitoring facilities
140 | link:https://docs.microsoft.com/en-us/azure/azure-functions/functions-monitoring[built into Azure Functions]
141 | don't integrate with any of the existing monitoring tools in use by the Jenkins
142 | project. Azure Functions can, however, output diagnostic logs and web server
143 | logs into an Azure storage account. This is not scoped in this document because
144 | it is not yet clear whether these logs are necessary and worth the added cost
145 | of storing them.
146 |
147 | Azure DocumentDB does not have a supported integration with DataDog.
148 |
149 |
150 | Until there is more support in DataDog or Azure for better monitoring, these
151 | services will *not* be automatically monitored.
152 |
153 |
154 |
155 | == Motivation
156 |
157 | By provisioning relatively simple webhook receivers which not only archive
158 | events data into a live-queryable datastore (DocumentDB), but also publish those
159 | events on an Azure EventHub, the Jenkins project will have an easy-to-access
160 | data store of project events. Additionally, the events enqueued into EventHub
161 | can act as an input for future services (for example, a project health
162 | dashboard) without requiring additional infrastructure to be provisioned.
163 |
164 | == Rationale
165 |
166 | === Technologies Introduced
167 |
168 | An underlying rationale for using all of the Azure-specific technologies
169 | referenced below is that it is cheaper, easier, and faster to use
170 | platform-level services provided by Azure rather than implementing and hosting
171 | the features of each technology itself.
172 |
173 | ==== Azure Functions
174 |
175 | Implementing a GitHub webhook receiver in Azure Functions is sufficiently trivial
176 | that it is one of
177 | link:https://docs.microsoft.com/en-us/azure/azure-functions/functions-create-a-web-hook-or-api-function[their "Get Started" examples],
178 | and as such Azure Functions has explicit support and enhancements for receiving
179 | webhook payloads from GitHub.
180 |
181 | Additionally, Azure Functions represent a "slice" of computation which is
182 | suitable for the purpose of receiving a JSON payload, processing it, and
183 | storing it for later. This as opposed to implementing a new web application
184 | specifically for this purpose which would need either its own virtual machine
185 | or container infrastructure in order to execute.
186 |
187 |
188 | ==== Azure EventHub
189 |
190 | The use of Azure EventHub in the architecture described above is more
191 | future-proofing than a strong requirement to solve the problem at hand. The
192 | assumption being that additional services in the future will wish to consume
193 | some or all of the events received by the deployed Azure Functions app.
194 |
195 | The most practical means of providing this service internally is through a
196 | pub/sub mechanism which EventHubs provide in Azure. EventHubs also provide the
197 | added benefit of automatically expiring old messages along with many other
198 | valuable queueing features such as consumer groups and partitions.
199 |
200 |
201 | ==== Azure DocumentDB
202 |
203 | GitHub webhook event payloads are constructed as JSON, and it is expected that
204 | any subsequent events will be consumed as JSON. As such, a document-oriented
205 | database ("NoSQL") is preferred in order to avoid time-consuming schema
206 | updates.
207 |
208 |
209 | == Costs
210 |
211 | The pricing
212 | link:https://azure.microsoft.com/en-us/pricing/details/functions/[for Azure Functions]
213 | by itself is already confusing and without an existing Functions app
214 | consuming the `jenkinsci` events it's difficult to evaluate what the runtime
215 | cost would be. That said, 1 million monthly executions are provided for free by
216 | Azure, meaning the Function app itself will cost nothing or very little.
217 |
218 |
219 | The noticeable cost of this proposal will come from the
220 | link:https://azure.microsoft.com/en-us/pricing/details/event-hubs/[EventHub]
221 | and
222 | link:https://azure.microsoft.com/en-us/pricing/details/documentdb/[DocumentDB]
223 | storage and transit rates.
224 |
225 |
226 | === DocumentDB
227 |
228 | The storage rate, in East US, is *$0.25 per GB / month*. The throughput rate, in East US, is *$0.008/hr* per hundred
229 | link:https://docs.microsoft.com/en-us/azure/documentdb/documentdb-manage#request-units-and-database-operations[Request Units per second].
230 |
231 | Assuming each DocumentDB instance is provisioned with 5GB of storage, the
232 | annual storage cost will be roughly *$15*. Though this is likely to go up as
233 | more data is stored over time.
234 |
235 | The throughput rate's annual cost is difficult to ascertain without real-world
236 | usage, but is expected to remain under *$100* barring dramatic shifts in
237 | expected usage.
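
The arithmetic behind these estimates can be sketched as follows. This Python snippet only uses the East US rates quoted above; the assumption of 8760 billable hours per year and the sample throughput of 100 Request Units per second are illustrative and not part of the proposal.

```python
STORAGE_RATE_PER_GB_MONTH = 0.25      # USD, East US, from the text above
THROUGHPUT_RATE_PER_100_RU_HOUR = 0.008  # USD per hour per 100 RU/s
HOURS_PER_YEAR = 24 * 365             # assumption: billed around the clock


def annual_storage_cost(provisioned_gb: float) -> float:
    """Yearly storage cost for one DocumentDB instance."""
    return round(provisioned_gb * STORAGE_RATE_PER_GB_MONTH * 12, 2)


def annual_throughput_cost(request_units_per_second: float) -> float:
    """Yearly throughput cost, billed per hundred Request Units per second."""
    hundreds = request_units_per_second / 100
    return round(hundreds * THROUGHPUT_RATE_PER_100_RU_HOUR * HOURS_PER_YEAR, 2)


if __name__ == "__main__":
    print(annual_storage_cost(5))        # the 5GB case described above
    print(annual_throughput_cost(100))   # hypothetical 100 RU/s
```

With 5GB of storage the sketch reproduces the *$15* figure, and a single hundred Request Units per second comes to about $70 per year, which is consistent with the expectation of staying under *$100*.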
238 |
239 | === EventHub
240 |
241 | The cost per million ingress events in East US is a paltry $0.028, so not worth
242 | discussing.
243 |
244 | The throughput unit cost (1MB ingress, 2MB egress) comes to an annual cost of
245 | *$133.92*.
246 |
247 | == Reference implementation
248 |
249 | The reference implementation of the Azure Functions app can be found in the
250 | link:https://github.com/jenkins-infra/analytics-functions[jenkins-infra/analytics-functions]
251 | repository.
252 |
253 | The Terraform for actually provisioning the Azure Functions app in Azure can be
254 | found in
255 | link:https://github.com/jenkins-infra/azure/pull/12[this pull request].
256 |
257 |
--------------------------------------------------------------------------------
/iep-003/README.adoc:
--------------------------------------------------------------------------------
1 | ifdef::env-github[]
2 | :tip-caption: :bulb:
3 | :note-caption: :information_source:
4 | :important-caption: :heavy_exclamation_mark:
5 | :caution-caption: :fire:
6 | :warning-caption: :warning:
7 | endif::[]
8 |
9 | = IEP-3: Terraform for describing infrastructure as code
10 |
11 | :toc:
12 |
13 | .Metadata
14 | [cols="2"]
15 | |===
16 | | IEP
17 | | 3
18 |
19 | | Title
20 | | Terraform for describing infrastructure as code
21 |
22 | | Author
23 | | link:https://github.com/rtyler[R. Tyler Croy]
24 |
25 | | Status
26 | | :speech_balloon: In-process
27 |
28 | | Type
29 | | Process
30 |
31 | | Created
32 | | 2016-11-16
33 | |===
34 |
35 |
36 |
37 | == Abstract
38 |
39 | The migration to Azure means all infrastructure is (technically) an API call
40 | away. This means that more of our infrastructure can be managed via automation,
41 | whereas previously it was, in some cases, managed via support
42 | tickets. In order to develop, modify, and deploy Azure-based infrastructure, the
43 | Jenkins project infrastructure should define all infrastructure services and
44 | resources programmatically.
45 |
46 | == Specification
47 |
48 |
49 | link:http://terraform.io[Terraform]
50 | is a tool which provides Azure-specific abstractions
51 | footnote:[https://www.terraform.io/docs/providers/azurerm/index.html]
52 | for defining Azure resources programmatically, in addition to support for
53 | persisting the state of which resources have or have not yet been created.
54 |
55 | This document is not intended to explain all the features of Terraform, but
56 | rather describe how it should be used within the Jenkins project
57 | infrastructure.
58 |
59 |
60 | === Defining Infrastructure
61 |
62 | As a proof-of-concept, an existing Azure Storage account was modeled
63 | footnote:[https://github.com/jenkins-infra/azure/blob/7f3032ab2d2ef411d74d3a81097fbcd575850a34/plans/releases-storage.tf]
64 | with Terraform, e.g.:
65 |
66 | .releases-storage.tf
67 | [source]
68 | ----
69 | resource "azurerm_resource_group" "releases" {
70 |     name     = "${var.prefix}jenkinsinfra-releases"
71 |     location = "East US 2"
72 | }
73 | resource "azurerm_storage_account" "releases" {
74 |     name                = "${var.prefix}jenkinsreleases"
75 |     resource_group_name = "${azurerm_resource_group.releases.name}"
76 |     location            = "East US 2"
77 |     account_type        = "Standard_GRS"
78 | }
79 | resource "azurerm_storage_container" "war" {
80 |     name                  = "war"
81 |     resource_group_name   = "${azurerm_resource_group.releases.name}"
82 |     storage_account_name  = "${azurerm_storage_account.releases.name}"
83 |     container_access_type = "container"
84 | }
85 | ----
86 |
87 | Logical clusters or groups of resources should be defined in this manner, using
88 | a single `.tf` file in the `plans/` directory in the repository
89 | footnoteref:[azurerepo, https://github.com/jenkins-infra/azure].
90 | This makes the Terraform plan corresponding to a specific part of
91 | the infrastructure easy to find, easy to review, and easy to test in
92 | isolation from the other resources.
93 |
94 |
95 | *All identifiers must be prefixed*.
96 |
97 | Azure contains a number of *global* identifier namespaces which can cause
98 | conflicts between two different contributors, or two different environments,
99 | when defining infrastructure. For example, if DevA defines a resource group
100 | named "jenkins", DevB cannot also define a resource group named "jenkins".
101 | _Some_ identifiers are subscription-specific, but in order to avoid potential
102 | conflicts, all identifiers in Terraform resources *must* use the `prefix`
103 | variable:
104 |
105 | .example.tf
106 | [source]
107 | ----
108 | resource "azurerm_resource_group" "example" {
109 |     name     = "${var.prefix}jenkinsinfra-examples" # <1>
110 |     location = "East US 2"
111 | }
112 | ----
113 | <1> Referencing `var.prefix` pulls in an environment- or developer-specific prefix
114 |
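
One way a contributor might supply the `prefix` variable is sketched below. The `make_prefix` and `plan_with_prefix` helpers are hypothetical and not part of the repository. The sketch relies on Terraform's `-var` command-line flag and assumes the `plans` directory layout described above. The 8-character cap is an arbitrary choice that leaves room within Azure's 24-character limit on storage account names, which may only contain lowercase letters and digits.

```shell
# Derive a short prefix from a name: lowercase it, keep only letters and
# digits, and cap the length at 8 characters (hypothetical helper).
make_prefix() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9\n' | cut -c1-8
}

# Run `terraform plan` against the plans/ directory with a per-developer prefix.
# Defined but not invoked here, since it requires Terraform and Azure credentials.
plan_with_prefix() {
  terraform plan -var "prefix=$(make_prefix "$1")" plans
}

make_prefix "Dev-Alice_01"   # prints devalice
```

A contributor would then call something like `plan_with_prefix "$USER"` so that their resources do not collide with production or with another developer's.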
115 |
116 | === Storing State
117 |
118 | Terraform generates state which allows the tool to act in an idempotent
119 | fashion. That is to say, without a `.tfstate` file of some form, Terraform may
120 | create redundant infrastructure in Azure.
121 |
122 | This state *is* an important part of what makes Terraform a useful tool for the
123 | Jenkins project (see the Rationale section). It also contains access keys and other
124 | semi-confidential information which would not be safe to check into the
125 | repository
126 | footnoteref:[azurerepo].
127 |
128 |
129 | To reap the benefits of Terraform state without needing a local filesystem or
130 | `.tfstate` file checked into the repository
131 | footnoteref:[azurerepo]
132 | the proposal is to store _production_ Terraform state in Azure blob storage
133 | footnoteref:[blobstore, https://azure.microsoft.com/en-us/services/storage/blobs/]
134 | using Terraform's built-in
135 | link:https://www.terraform.io/docs/state/remote/index.html[Remote State]
136 | functionality.
137 |
138 | For production state, this would be configured as follows:
139 |
140 | .prodstate.tf
141 | [source]
142 | ----
143 | data "terraform_remote_state" "prod_tfstate" {
144 |     backend = "azure"
145 |     config {
146 |         storage_account_name = "jenkinsinfra-tfstate"
147 |         container_name       = "production"
148 |         key                  = "terraform.tfstate"
149 |     }
150 | }
151 | ----
152 |
153 |
154 | ==== Requirements
155 |
156 | Storing Terraform state in Azure blob storage imposes two requirements on the
157 | infrastructure code:
158 |
159 | . A separate "bootstrap" set of Terraform plans must exist to define the storage
160 | containers necessary to store/access production state.
161 | . `terraform apply` statements must be run in a consistent fashion, in order to
162 | ensure that the appropriate production state is being referenced during the
163 | execution of the Terraform plans.
164 |
165 |
166 | The first requirement is readily addressed with a separate directory structure
167 | and some tooling in the repository
168 | footnoteref:[azurerepo].
169 |
170 | The second requirement is addressed with the use of a "proper" Jenkins-based
171 | delivery pipeline for the Terraform plans. This would entail a Jenkins
172 | environment which has the appropriate credentials for provisioning
173 | infrastructure in the production Azure environment, e.g.:
174 |
175 | [source, groovy]
176 | ----
177 | node('azurerm') {
178 |     checkout scm
179 |
180 |     stage('Validate') {
181 |         sh 'terraform validate plans/*.tf'
182 |     }
183 |     stage('Plan') {
184 |         sh 'terraform plan plans'
185 |     }
186 |     stage('Apply') {
187 |         input 'Do the plans look good?'
188 |         sh 'terraform apply plans'
189 |     }
190 | }
191 | ----
192 |
193 |
194 | This approach provides a single point of deployment for Terraform plans which
195 | can then be inspected or otherwise interacted with by the entirety of the
196 | Jenkins project infrastructure team, instead of relying on individual
197 | contributors' laptops.
198 |
199 |
200 | == Motivation
201 |
202 | By using Terraform to describe *all* infrastructure, there will be no "hidden"
203 | infrastructure which is only known to a select few. This contrasts with the current
204 | situation, where one or two people might be aware of _where_ certain resources are
205 | located or how they relate to others.
206 |
207 |
208 | Defining all infrastructure resources in Terraform also lowers the bar for new
209 | infrastructure contributions, not only by making the actual infrastructure
210 | topologies open source, but also by allowing practically anybody to provision
211 | infrastructure which resembles Jenkins project infrastructure. Currently there
212 | is no way to provision a "dev version of the Jenkins project infrastructure",
213 | but this would be feasible with Terraform plans describing the project's
214 | infrastructure.
215 |
216 |
217 | == Rationale
218 |
219 | The benefits of describing infrastructure as code should be self-evident.
220 | Before considering the rationale for choosing Terraform, first consider some
221 | other options:
222 |
223 |
224 | === Azure Resource Manager (ARM) templates
225 |
226 | ARM templates
227 | footnoteref:[arm, https://azure.microsoft.com/en-us/documentation/articles/resource-group-authoring-templates/]
228 | are conceptually interchangeable with AWS CloudFormation templates: templates
229 | defined in JSON to describe cloud resources.
230 |
231 | *Pros*
232 |
233 | * Supported for practically all resources on Azure
234 | * Relatively simple to use for the basic use-cases
235 |
236 |
237 | *Cons*
238 |
239 | * No state and therefore
240 | ** Not idempotent
241 | * Entirely foreign to many within the operations community.
242 | * Would require an external source of template parameters in order to function.
243 | In essence, ARM allows template parameters but use of parameterized templates
244 | would mean Jenkins project infrastructure automation would require these to
245 | be defined externally to the ARM template to model multiple
246 | (local-dev vs. production) environments
247 |
248 | === Puppet-defined resources
249 |
250 | See also:
251 | link:https://github.com/puppetlabs/puppetlabs-azure[puppetlabs-azure module]
252 |
253 | *Pros*
254 |
255 | * Jenkins infrastructure already has large amounts of Puppet code
256 | implemented, and a well-defined workflow for modifying, testing, and
257 | deploying Puppet code.
258 | * Puppet's graph approach supports idempotency
259 |
260 | *Cons*
261 |
262 | * Puppet must have an "execution context" which for most Puppet catalogues
263 | means a `node` (machine) which the catalogue is being executed against. In
264 | order to provision Azure resources a "deployment" node would need to exist
265 | whose sole job would be to provision Azure resources from Puppet. Basically,
266 | one cannot run Puppet on "Azure" to provision an Azure Load Balancer (for
267 | example).
268 | * The puppetlabs-azure module is not a very common approach which means the
269 | tooling will lag behind "native" (i.e. supported by Microsoft) toolchains
270 | such as ARM templates
271 | footnoteref:[arm].
272 |
273 | === Terraform
274 |
275 | :+1:
276 |
277 | Terraform is a reasonably popular and well understood tool, which enjoys
278 | contributions from Microsoft for its Azure support.
279 |
280 | *Pros*
281 |
282 | * Stateful and therefore
283 | * Supports idempotent operations
284 | * Widely used in the "modern operations" community; while the specific
285 | resources might not be familiar to newcomers, the tool itself would be.
286 | * Variable substitution and separation of state files allow development
287 | clusters to be created entirely separate from production while still
288 | resembling production infrastructure.
289 |
290 |
291 | *Cons*
292 |
293 | * "Yet another DSL" to learn in order to effectively contribute to the Jenkins
294 | project infrastructure
295 | * Doesn't support all resources defined by Azure, which might dictate the use
296 | of the
297 | link:https://www.terraform.io/docs/providers/azurerm/r/template_deployment.html[azurerm_template_deployment]
298 | resource in Terraform, which would still require writing ARM templates
299 | footnoteref:[arm].
300 |
301 |
302 |
303 |
304 |
305 | == Costs
306 |
307 | There are no additional financial costs associated with using Terraform. There
308 | is a learning curve associated with Terraform but it's safe to assume that
309 | there's a learning curve with all things Azure for the infrastructure
310 | contributors at this point in time.
311 |
312 | == Reference implementation
313 |
314 | This
315 | link:https://github.com/jenkins-infra/azure/blob/7f3032ab2d2ef411d74d3a81097fbcd575850a34/plans/releases-storage.tf[plan]
316 | is a reference implementation of the only Azure resources provisioned to date.
317 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "{}"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright {yyyy} {name of copyright owner}
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/iep-012/README.adoc:
--------------------------------------------------------------------------------
1 | ifdef::env-github[]
2 | :tip-caption: :bulb:
3 | :note-caption: :information_source:
4 | :important-caption: :heavy_exclamation_mark:
5 | :caution-caption: :fire:
6 | :warning-caption: :warning:
7 | endif::[]
8 |
9 | = IEP-12: Jenkins Release Management
10 |
11 | :toc:
12 |
13 | .Metadata
14 | [cols="2"]
15 | |===
16 | | IEP
17 | | 12
18 |
19 | | Title
20 | | Jenkins Release Management
21 |
22 | | Author
23 | | link:https://github.com/olblak[Olivier Vernin]
24 |
25 | | Status
26 | | :speech_balloon: In-process
27 |
28 | | Type
29 | | [Architecture]
30 |
31 | | Created
32 | | 2018-07-25
33 | |===
34 |
35 |
36 | == Abstract
37 | As part of continuously improving the Jenkins infrastructure project, it is now time to address how Jenkins is released and distributed.
38 | Currently, new versions can only be signed and published by link:https://github.com/kohsuke[Kohsuke], which places a lot of responsibility on his shoulders and prevents the project from releasing new versions when needed.
39 | The purpose of this document is to design a secure and clear way to allow trusted people to trigger a new release.
40 |
41 | == Specification
42 | As the Jenkins project is built and driven by its community, we want to be as transparent as possible in the way we build, test, and release new versions, and allow everybody to understand the process and provide reviews or audits. The release process must therefore be automated as much as possible and well-defined from git repositories.
43 |
44 | Building, testing, and releasing a new Jenkins version covers different topics which need to be addressed in this document.
45 |
46 | * Environment Provisioning
47 | * Environment Configuration
48 | * Environment Workflow
49 |
50 | === Provisioning
51 | The first element to agree on is where to run those environments and how to provision them.
52 | Today, the Jenkins infrastructure project mainly uses an Azure account provisioned from Terraform code located link:https://github.com/jenkins-infra/azure[here]. This repository makes it easy for everybody to participate, whether by auditing, improving, or simply learning. So we just have to decide which services we need there.
53 |
54 | Before going further, we must clarify that while we try to be as open as possible, we still need some elements to be well hardened, to prevent malicious people from abusing our systems or stealing private information such as keys. To avoid this situation, we need two levels of security access: one infrastructure reachable by all contributors for quick feedback on test results, and another with limited access from the internet for security contributors or the release officer.
55 |
56 | As we are already running the Jenkins master and agents in Docker containers with great success, we want to go further and use a container orchestration tool to deploy and configure our build environment. Kubernetes is an awesome tool and is therefore selected to orchestrate our workload. We'll use the Azure Kubernetes Service.
57 |
58 | Another important element is where the GPG key and the SSL certificate are stored.
59 | Those two keys are used during the Maven release to generate signed artifacts.
60 | They are what ensures that end users can trust our artifacts, and they must therefore be extremely well protected.
61 |
62 | There are two approaches, both of which have benefits.
63 |
64 |
65 | The first and most convenient solution is to store every secret encrypted in a git repository and use a tool like link:https://github.com/mozilla/sops[sops] to decrypt them when needed. This solution makes it easy for everybody to audit secret modifications. The build environment only requires a GPG key to decrypt those secrets. This means that we don't need any additional infrastructure other than a Kubernetes cluster.
66 |
67 |
68 | The second solution would be to deploy the GPG key and the SSL certificate somewhere else that is not publicly available, and restrict access to specific networks.
69 |
70 | image:images/azure.png[azure]
71 |
72 | The latter solution opens the door to a more secure but also more complex infrastructure, and it complicates the reproducibility of our build environment, which is not necessarily needed for non-critical environments.
73 |
74 | Both can be used depending on the environment.
75 |
76 |
77 | This link:https://github.com/jenkins-infra/azure/pull/75[PR] partially implements this design.
78 |
79 | === Configuration
80 | Now that our infrastructure provisioning is defined, we can look at how we'll configure it and at the benefits of using Kubernetes for such a build environment.
81 |
82 | Before going further, I want to recall some key features provided by Kubernetes that influence our design. First, every Kubernetes cluster has an internal DNS where a service is reachable as "service-name" inside its namespace or as "service-name.namespace" across namespaces. This internal DNS makes it easy to spin up multiple environments using the same internal endpoints.
83 | Another powerful feature is that we can map an internal endpoint to an external one.
84 | This makes the environment totally agnostic of where the services are located.
85 |
86 | Let's come back to our environment configuration. Kubernetes in association with Helm makes it easy to deploy and configure all kinds of services required to build, test, and publish in a very convenient way, wherever the cluster is located, whether on Azure or even Minikube.
87 | Most of the services that we need can already be deployed with Helm charts.
88 |
89 | The big picture is to create one environment chart, one to rule them all.
90 | This main chart is responsible for deploying Jenkins credentials as Kubernetes secrets, setting a custom Jenkins image with specific configuration, and finally pulling in chart dependencies to deploy third-party services (Artifactory, Maven registry, ...).
91 |
92 |
93 | For example, let's imagine two different environments, security and release.
94 | The security environment generates short-lived artifacts and so is configured to deploy every service needed to share those artifacts (maven, docker, war, etc.). This makes it easier to control access, and we don't pollute the official repository.
95 | But in the case of the release environment, it doesn't make sense to deploy a Maven repository behind maven.release.jenkins.io as we already have link:https://repo.jenkins-ci.org[repo.jenkins-ci.org].
96 | So we just map http://repo to https://repo.jenkins-ci.org.
97 |
98 | .Environment
99 | image:images/environment_namespace.png[]
100 |
101 | Using Kubernetes as a CI/CD environment is not only about how we deploy and configure Jenkins but also about how we deploy and configure all the third-party services that we need.
102 |
103 | This repository uses Jenkins X to configure an environment: https://github.com/olblak/jenkins-release-environment. Jenkins X is an opinionated way of configuring such an environment on Kubernetes.
104 |
105 | The following table displays the different endpoints that we can use for each environment; the internal endpoints are the same for every environment.
106 |
107 | [cols="h,3*", options="header"]
108 | .Endpoint
109 | |===
110 | | | Repo | Jenkins | Registry
111 | | Internal | http://repo | http://ci | http://registry
112 | | Weekly | http://repo.weekly.jenkins.io | https://ci.weekly.jenkins.io | http://registry.weekly.jenkins.io
113 | | Security | https://repo.cert.jenkins.io | https://ci.cert.jenkins.io | https://ci.registry.jenkins.io
114 | | Release | https://repo.jenkins-ci.org | https://ci.release.jenkins.io | https://hub.docker.com
115 | |===
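
The mapping in the table can also be expressed as a small lookup, sketched here in shell. The function name `external_endpoint` is hypothetical; the URLs are copied from the table above, and the lowercase environment and service keys are illustrative.

```shell
# Print the external endpoint for an environment ($1) and a service ($2).
# Returns non-zero for combinations that are not in the table.
external_endpoint() {
  case "$1:$2" in
    weekly:repo)       echo "http://repo.weekly.jenkins.io" ;;
    weekly:ci)         echo "https://ci.weekly.jenkins.io" ;;
    weekly:registry)   echo "http://registry.weekly.jenkins.io" ;;
    security:repo)     echo "https://repo.cert.jenkins.io" ;;
    security:ci)       echo "https://ci.cert.jenkins.io" ;;
    security:registry) echo "https://ci.registry.jenkins.io" ;;
    release:repo)      echo "https://repo.jenkins-ci.org" ;;
    release:ci)        echo "https://ci.release.jenkins.io" ;;
    release:registry)  echo "https://hub.docker.com" ;;
    *) echo "unknown environment/service: $1/$2" >&2; return 1 ;;
  esac
}

external_endpoint release repo   # prints https://repo.jenkins-ci.org
```

Inside a cluster, jobs would keep using `http://repo`, `http://ci`, and `http://registry`; only the environment configuration decides which external endpoint sits behind each name.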
116 |
117 | === Workflow
118 | It's now time to cover the last topic, which is how we build, test, and release a specific version for an environment. Jenkins is used as the conductor by means of a Jenkinsfile and will execute a succession of actions defined link:https://github.com/olblak/jenkins/blob/master/Jenkinsfile.release[here].
119 | In order to simplify the maintainability of this script, most of the logic has been moved to a shell link:https://github.com/olblak/jenkins/blob/master/scripts/buildJenkins.bash[script] that accepts various arguments.
120 |
121 | All endpoints are hard-coded as Kubernetes internal endpoints such as "repo" or "jenkins". Any other configuration is done via environment variables configured at the environment level.
122 | So every environment uses the same Jenkinsfile, but the output changes depending on the environment configuration and the git repository.
123 |
124 | Maven with the link:http://maven.apache.org/maven-release/maven-release-plugin/[release plugin] is used to build Jenkins (link:https://wiki.jenkins.io/pages/viewpage.action?pageId=3309681[why?]).
125 |
126 | So in order to successfully build the application, we must:
127 |
128 | ****
129 | . Retrieve the GPG key.
130 | . Retrieve the private/public SSL certificate.
131 | . Retrieve the passwords that unlock the GPG key and the certificate from Azure Key Vault.
132 | . Prepare the release with Maven ```mvn release:prepare```
133 | . Perform the release with Maven ```mvn release:perform```
134 | . Upload the WAR file.
135 | ****
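
The ordering of the steps above matters: the credentials must be in place before Maven runs. A minimal sketch of that sequence follows, assuming a dry-run helper that is purely illustrative; `RELEASE_STEPS` and `run_release` are hypothetical names, and the steps without a command are placeholders, not the logic of the actual release script.

```python
import subprocess

# Ordered release steps. Only the two Maven commands come from the text above;
# steps with a None command are placeholders for logic not shown here.
RELEASE_STEPS = [
    ("retrieve GPG key", None),
    ("retrieve SSL certificate", None),
    ("retrieve passwords from Azure Key Vault", None),
    ("prepare release", ["mvn", "release:prepare"]),
    ("perform release", ["mvn", "release:perform"]),
    ("upload WAR file", None),
]


def run_release(steps=RELEASE_STEPS, dry_run=True):
    """Walk the steps in order and return the names of the steps visited.

    With dry_run=False, steps that carry a command are executed and a
    failing command raises CalledProcessError, aborting the remaining steps.
    """
    visited = []
    for name, command in steps:
        if command is not None and not dry_run:
            subprocess.run(command, check=True)
        visited.append(name)
    return visited
```

Calling `run_release()` in dry-run mode only lists the steps in order, which is a cheap way to check the sequence before wiring in real commands.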
136 |
137 | ==== Authorization
138 | Every environment maintainer is responsible for deciding whether their environment is publicly accessible or must be deployed in a private network. They can then decide which mechanism (certificate, SSO) is necessary, either at the ingress level or inside the environment.
139 |
140 | ==== Artifacts
141 |
142 | The Jenkins project provides signed packages for Debian, Red Hat, SUSE, macOS and Windows.
143 |
144 | Because building a new Jenkins version can take quite a lot of time, the link:https://github.com/jenkinsci/packaging[packaging] process must be decoupled from the release process.
145 | This allows a Jenkins administrator to build and publish a WAR file in advance and only generate and publish distribution packages when needed.
146 | It also reduces the blast radius of an error happening in a Jenkins job once the WAR file is built and published.
147 |
148 | Packaging a new version involves the following steps:
149 | ****
150 | . Retrieve the latest git tag from link:https://github.com/jenkinsci/jenkins[jenkinsci/jenkins].
151 | . Download the WAR package.
152 | . Build one package per distribution if it wasn't already published.
153 | . Publish one package per distribution.
154 | ****
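
The "build only if it wasn't already published" rule above can be sketched as a simple filter. This is an illustration under stated assumptions: the distribution identifiers and the `published` set are hypothetical, and how the published state is actually discovered is not described in this document.

```python
# Distribution names taken from the list of signed packages above;
# the lowercase identifiers are illustrative only.
DISTRIBUTIONS = ["debian", "redhat", "suse", "macosx", "windows"]


def packages_to_build(version, published):
    """Return the distributions that still need a package for `version`.

    `published` is a set of (distribution, version) pairs that have
    already been published.
    """
    return [d for d in DISTRIBUTIONS if (d, version) not in published]
```

Because already-published pairs are skipped, re-running the packaging job for the same version does not rebuild or republish anything.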
155 |
156 | ==== Release
157 | We identify three different release types, as explained link:https://jenkins.io/download/[here]:
158 |
159 | . link:https://jenkins.io/download/lts/[LTS]: An LTS release is chosen every 12 weeks from the stream of regular releases as the stable release for that time period. https://repo.jenkins-ci.org/releases/org/jenkins-ci/main/jenkins-war/[Download]
160 | . link:http://mirrors.jenkins.io/war/[Weekly]: A weekly release aims to deliver bug fixes and features to users and plugin developers.
161 | . Security Release: The security officer regularly needs to build a "private" version from jenkinsci-cert/jenkins to do some testing or to share with other security contributors, and, once it is ready, merge the successful version into link:https://github.com/jenkinsci/jenkins[jenkinsci/jenkins].
162 |
163 | Deprecated releases:
164 |
165 | . link:http://mirrors.jenkins.io/war-stable-rc/[LTS-RC]: Represented the future stable version.
166 | . link:http://mirrors.jenkins.io/war-rc/[Weekly-RC]: Represented the future weekly release.
167 |
168 | With the current design, one release corresponds to one environment.
169 |
170 | ==== Credentials
171 | In order to release and publish new releases, we need several credentials.
172 |
173 | A GPG key is used to sign WAR files and must be stored on an encrypted Azure Blob Storage. The password used to decrypt the GPG key will be stored in an Azure Key Vault.
174 |
175 | An SSL certificate is required to sign JAR files and will be stored directly in the Azure Key Vault, including the password to decrypt the certificate. The password can also be configured at the Jenkins instance level if we consider that it's better, from a security point of view, not to store both the certificate key and the password in the same place.
176 |
177 | An SSH key with push permission on the link:https://github.com/jenkinsci/jenkins[jenkinsci/jenkins] repository is needed.
178 |
179 | An Azure storage account key is needed to publish some distribution packages to the link:https://github.com/jenkins-infra/azure/blob/master/plans/releases-storage.tf[Azure Blob Storage].
180 |
181 | == Rationale
182 |
183 |
184 | == Costs
185 | Obviously the major cost is related to the Kubernetes cluster.
186 |
187 | == Reference implementation
188 | * link:http://lists.jenkins-ci.org/pipermail/jenkins-infra/2018-June/001448.html[Mail Thread]
189 | * link:https://support.cloudbees.com/hc/en-us/articles/222838288-ssh-credentials-management-with-jenkins[SSH credentials configuration]
190 | * link:https://batmat.net/2017/01/30/do-not-run-your-tests-in-continuous-integration-with-the-root-user/[Do not run your tests with root user]
191 | * link:https://github.com/jenkinsci/jenkins/blob/master/BUILDING.TXT[Build Jenkins]
192 | * link:https://github.com/olblak/jenkins/blob/master/Jenkinsfile.release[Release Jenkinsfile Prototype]
193 | * link:https://gist.github.com/kohsuke/3319b65432ab40793eadc297e2456b79[Release Script]
194 | * link:https://github.com/jenkinsci-cert/jenkins/wiki[Security Wiki]
195 | * link:https://jenkins.io/download/lts/[Release LTS]
196 | * link:https://github.com/jenkins-infra/runbooks/blob/master/security/flow.pdf[Flow]
197 | * https://wiki.jenkins.io/display/SECURITY/SECURITY+issues+in+plugins
198 |
199 |
--------------------------------------------------------------------------------
/iep-004/README.adoc:
--------------------------------------------------------------------------------
1 | ifdef::env-github[]
2 | :tip-caption: :bulb:
3 | :note-caption: :information_source:
4 | :important-caption: :heavy_exclamation_mark:
5 | :caution-caption: :fire:
6 | :warning-caption: :warning:
7 | endif::[]
8 |
9 | = IEP-4: Kubernetes for hosting project applications
10 |
11 | :toc:
12 | :hide-uri-scheme:
13 | :sect-anchors:
14 |
15 | .Metadata
16 | [cols="2"]
17 | |===
18 | | IEP
19 | | 4
20 |
21 | | Title
22 | | Kubernetes for hosting project applications
23 |
24 | | Author
25 | | link:https://github.com/rtyler[R. Tyler Croy], link:https://github.com/olblak[Olivier Vernin]
26 |
27 | | Status
28 | | :speech_balloon: In-process
29 |
30 | | Type
31 | | Architecture
32 |
33 | | Created
34 | | 2016-12-15
35 | |===
36 |
37 |
38 | == Abstract
39 |
40 | The Jenkins project hosts a number of applications, using different technology
41 | stacks, with
42 | link:https://en.wikipedia.org/wiki/Docker_%28software%29[Docker]
43 | containers as the packaging and runtime environment. While the merits of using Docker
44 | containers to run these applications are not the subject of this IEP document,
45 | the hosting and deployment of these containers is. In essence, some tooling for
46 | safely hosting, deploying, and monitoring containers within the Jenkins
47 | infrastructure is necessary in order to support the applications the project
48 | requires.
49 |
50 | == Specification
51 |
52 | To support the aforementioned containerized applications,
53 | link:http://kubernetes.io[Kubernetes]
54 | (also referred to as "k8s") will be deployed using the
55 | link:https://azure.microsoft.com/en-us/services/container-service/[Azure Container Service]
56 | (ACS). This specification will detail the *initial* Kubernetes cluster
57 | architecture but is not prescriptive on how the cluster should grow/shrink as
58 | requirements change for the applications hosted.
59 |
60 | This specification only outlines the architecture for the "Public Production"
61 | footnoteref:[iep2,https://github.com/jenkins-infra/iep/tree/master/iep-002]
62 | Kubernetes cluster. It is expected that the project will run multiple
63 | Kubernetes clusters, following this architecture, depending on the access
64 | control requirements for each discrete Kubernetes cluster.
65 |
66 | At a high level Kubernetes is a master/agent architecture which would live in a
67 | *single region*. For the purposes of Jenkins infrastructure, the production
68 | Kubernetes clusters would be located in the "East US" Azure region.
69 | footnoteref:[regions,https://azure.microsoft.com/en-us/regions/]
70 |
71 | The virtual machines are all
72 | link:https://azure.microsoft.com/en-us/pricing/details/virtual-machines/series/#d-series[D-series]
73 | instances in order to provide an ideal balance of cost and performance. For
74 | more, see <<Costs>> below.
75 |
76 |
77 | *Kubernetes master*
78 |
79 | A cluster would run a single D3v2 instance as the master node.
80 |
81 |
82 | *Kubernetes agents*
83 |
84 | A cluster would run agents using a
85 | link:https://azure.microsoft.com/en-us/services/virtual-machine-scale-sets/[Scale Set]
86 | of D2v2 instances, with a minimum of three agents running at all times.
87 |
88 |
89 |
90 | [NOTE]
91 | ====
92 | Following
93 | link:https://github.com/jenkins-infra/iep/tree/master/iep-003[IEP-3]
94 | the <<reference-implementation>> uses Terraform to describe the infrastructure
95 | necessary to support this specification.
96 | ====
97 |
98 |
99 | === Monitoring
100 |
101 | As the Jenkins project already uses
102 | link:http://datadoghq.com[Datadog]
103 | for monitoring our production infrastructure, the Kubernetes cluster must
104 | integrate appropriately.
105 |
106 | This will be accomplished using the
107 | link:http://docs.datadoghq.com/integrations/kubernetes/[Datadog/Kubernetes integration]
108 | maintained by Datadog themselves.
109 |
110 | The integration will provide metrics for both:
111 |
112 | * Kubernetes cluster health/status
113 | * Container health/status running atop the cluster
114 |
115 | The metrics required are not specified here under the assumption that all
116 | metrics appropriate for ensuring stable service levels of project
117 | infrastructure will be collected.
118 |
119 | === Logging
120 |
121 |
122 | Centralized logging for applications hosted within Kubernetes will be provided
123 | by a combination of containers in the cluster running
124 | link:https://en.wikipedia.org/wiki/Fluentd[Fluentd]
125 | and the Microsoft
126 | link:http://www.microsoft.com/en-us/cloud-platform/operations-management-suite[Operations Management Suite]
127 | (OMS).
128 |
129 | The Fluentd container will route logs based on rules defined per log type.
130 |
131 | As a first iteration, we identify two log types: archive and stream.
132 | They are explained below.
133 |
134 |
135 | ==== Type
136 | ===== Stream
137 | 'Stream' means that logs are sent directly to Log Analytics,
138 | where they remain available for a short time period (7 days).
139 | After that, they are permanently deleted.
140 |
141 | Reasons why we consider logs as 'stream' are:
142 |
143 | * Costs: we don't want to pay to store useless logs
144 | * Debugging: we may have to analyze an application's behaviour
145 |
146 | In order to retrieve log information, we'll need access to the Log Analytics dashboard.
147 |
148 | *! This is the default behaviour for all log types*
149 |
150 | ===== Archive
151 | 'Archive' means that we want to access the logs over a long time period.
152 | We store them on Azure Blob Storage (or a shared disk).
153 | Those logs will be kept in Azure containers for an undetermined period.
154 | In order to retrieve them, we'll have to request compressed archives from an admin.
155 |
156 | Reasons why we may consider logs as 'archive' are:
157 |
158 | * Need for long-term access
159 | * Need to back up important information
160 |
161 | Logging works as follows:
162 |
163 | * Docker containers write logs to JSON files located in /var/log/containers/ on each Kubernetes agent
164 | * Each Kubernetes agent runs one Fluentd container (as a DaemonSet) that reads logs from /var/log/containers
165 | and applies some 'rules'
166 |
167 |
168 | .Data flow for k8s logs
169 | [source]
170 | ....
171 | +--------------------------------------------------------------+
172 | | K8s Agent: |
173 | | +------------+ |
174 | | |Container_A | |
175 | | | | |
176 | | Agent Filesystem: +---------+--+ |
177 | | +--------------------+ | Azure LogAnalytics |
185 | | |Fluentd +-------------------------------/ | +--------------------+
186 | | |Container +-------------------------------\ | +--------------------+
187 | | +----------+ apply_rule_0_archive_logs_to --------------------->| Azure Blob Storage |
188 | | | +--------------------+
189 | +--------------------------------------------------------------+
190 | ....
191 |
192 | To determine which workflow needs to be applied,
193 | we use Kubernetes labels.
194 |
195 | By convention, we use the label 'logtype'.
196 |
197 | If logtype == 'archive', we apply the 'archive' workflow.
198 | Otherwise, we apply the 'stream' workflow.
199 |
200 | pros:
201 |
202 | * We don't have to modify default logging configuration.
203 | * We don't have to rebuild docker image when we change log type.
204 | * We don't have to restart docker container when we modify log type.
205 | * Easy to handle from fluentd configuration.
206 |
207 | cons:
208 |
209 | * We can't have different log types within an application
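
The label convention described above can be sketched as a small selection function. This only illustrates the rule; the real routing is done in the Fluentd configuration, and the function name `select_workflow` is hypothetical.

```python
def select_workflow(labels):
    """Pick the log workflow from a pod's Kubernetes labels.

    Only the exact label value 'archive' selects the archive workflow;
    a missing label or any other value falls back to 'stream'.
    """
    if labels.get("logtype") == "archive":
        return "archive"
    return "stream"
```

Because 'stream' is the fallback, an application that sets no label at all still has its logs collected, which matches the default behaviour stated earlier.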
210 |
211 | A Docker image that implements this workflow can be found in Olivier Vernin's
212 | link:https://github.com/olblak/fluentd-k8s-azure[fluentd-k8s-azure]
213 | repository.
214 |
215 |
216 | === Deployment/Orchestration
217 |
218 | Having decided to use a Kubernetes infrastructure, +
219 | we still need processes and tools to automate and test Kubernetes deployments.
220 |
221 | Kubernetes uses: +
222 |
223 | - YAML files to define the infrastructure state
224 | - The kubectl CLI to interact with the Kubernetes cluster (CRUD operations)
225 |
226 | Because using kubectl to deploy YAML files is idempotent, we can easily script it.
227 |
228 | We investigated two approaches to automate k8s deployment:
229 |
230 | 1. Using Jenkins + scripting to apply configurations
231 | 2. Using Puppet to apply configurations
232 |
233 | ==== Jenkins
234 | Each time we need to apply modifications, we just have to create/update yaml configurations files
235 | and let jenkins deploy it. +
236 | Fairly easy as kubectl is idempotent so we just have to follow this process. +
237 |
238 | .Jenkins
239 | +-------------+ +----------------+
240 | | | | Github |
241 | | Contributor +-------------------------> | jenkins-infra |
242 | | | Commit | |
243 | +-------------+ code +--------+-------+
244 | |
245 | | Trigger
246 | +-----------------+ | Test&Deploy
247 | | K8s cluster | v
248 | | +-----+ +-----+ | +--------+-------+
249 | | |Node | |Node | | | Jenkins |
250 | | | 1 | | 2 | <-----------------------| Jenkinsfile |
251 | | +-----+ +-----+ | Apply configurations +----------------+
252 | +-----------------+
253 |
254 | But:
255 |
256 | - How do we share/publish secret information, credentials, ...? +
257 | A solution would be to encrypt secrets with a password or GPG keys before pushing them
258 | to the git repository.
259 | Jenkins will have to decrypt them before deploying them on the Kubernetes cluster.
260 | - How do we handle resource ordering? +
261 | We can use naming conventions to be sure that resource 00-secret.yaml will be deployed before 01-daemonset.yaml.
262 | - In some cases, complex logic needs to be applied to achieve correct deployments. +
263 | This can be done through scripting (Bash, Python, ...). +
264 | Ex: Updating a k8s secret does not update the secret already in use by running containers,
265 | which means that each time we update secrets, we also have to take care of the pods using them.
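
The naming convention for resource ordering mentioned above can be sketched as follows. This is an illustration, not the project's deployment script: the helper names are hypothetical, and it assumes zero-padded numeric prefixes so that a plain lexicographic sort gives the intended order.

```python
def deployment_order(filenames):
    """Order resource files by name, so 00-secret.yaml precedes 01-daemonset.yaml.

    Prefixes must be zero-padded: without padding, '10-x.yaml' would sort
    before '9-x.yaml'.
    """
    return sorted(filenames)


def apply_commands(filenames):
    """Build one `kubectl apply -f <file>` command per file, in deployment order."""
    return [["kubectl", "apply", "-f", name] for name in deployment_order(filenames)]
```

For example, `apply_commands(["01-daemonset.yaml", "00-secret.yaml"])` yields the command for `00-secret.yaml` first.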
266 |
267 | Problems explained above are common concerns that configuration management tools try to solve.
268 |
269 | ==== Puppet
270 | We may also use Puppet to template and apply Kubernetes configuration files.
271 | Main advantages:
272 | - We already have a Puppet environment configured and working correctly
273 | - We already have a good testing process with Puppet: linting, rspec, ...
274 | - We already have a deployment workflow: feature branch -> staging -> production
275 | - We can use Hiera to store secrets
276 | - We can use Puppet to define complex scenarios
277 |
278 | .Puppet
279 |
280 | +-------------+ +----------------+
281 | | | | Github |
282 | | Contributor +--------------------> | jenkins-infra |
283 | | | Commit | |
284 | +-------------+ code +--------+-------+
285 | |
286 | | Trigger
287 | +-----------------+ | Test
288 | | K8s cluster | |
289 | | +-----+ +-----+ | +--------v-------+
290 | | |Node | |Node | | | Jenkins |
291 | | | 1 | | 2 | | | Jenkinsfile |
292 | | | | | | | +--------+-------+
293 | | +-----+ +-----+ | |
294 | +--------+--------+ | Merge
295 | ^ | Production
296 | | |
297 | | +--------v-------+
298 | | | Puppet |
299 | +---------------------------+ Master |
300 | Apply configurations +----------------+
301 |
302 | ==== Conclusion
303 | We agreed to use Puppet to deploy Kubernetes configurations.
304 | If needed, we are still able to switch to another solution.
305 |
306 | == Motivation
307 |
308 | The motivation for centralizing container hosting is fairly
309 | self-evident. Consistency of management, deployment, logging, monitoring, and
310 | runtime environment will be a major time-saver for volunteers participating in
311 | the Jenkins project.
312 |
313 | Additionally, consolidation on a well understood and supported tool
314 | (Kubernetes) allows the infrastructure team to spend less time operating the
315 | underlying hosting platform.
316 |
317 |
318 | == Rationale
319 |
320 | As mentioned in the <<Abstract>>, the Jenkins project runs containerized
321 | applications, the merits of which are outside the scope of this document.
322 | Thus this document outlines an approach for managing numerous containers in
323 | Azure.
324 |
325 | There is a fundamental assumption being made in using Azure Container Service,
326 | that is: it's cheaper/easier/faster to use a "turn-key" solution for building
327 | and running a container orchestrator (e.g. Kubernetes) than it would be to
328 | build out such a cluster ourselves using virtual machines and Puppet (for
329 | example).
330 |
331 | With this assumption, the options provided by ACS are: Kubernetes, Docker
332 | Swarm, or DC/OS.
333 |
334 | The selection for Kubernetes largely rests on two criteria:
335 |
336 | . Kubernetes is supported in some form by two of the three major cloud vendors
337 | (Microsoft, Google), which indicates project maturity and long-term support but
338 | also gives the Jenkins project the flexibility to migrate to alternative cloud
339 | vendors if the need were to arise.
340 | . Developer preference: we prefer Kubernetes and the tooling it provides over the alternatives.
341 |
342 | === Docker Swarm
343 |
344 | Docker Swarm is the leading option behind Kubernetes, but the open source
345 | "swarm mode" functionality is not supported by Azure Container Service, nor is
346 | Docker Swarm supported by any vendor other than Microsoft at this point.
347 |
348 | The focus from Docker, Inc. seems to be more on products such as
349 | link:https://www.docker.com/products/docker-datacenter[Docker Datacenter]
350 | long-term, which makes choosing Docker Swarm on ACS seem risky.
351 |
352 | === DC/OS
353 |
354 | Similar to Docker Swarm on ACS, there is no mainstream support for DC/OS on
355 | other cloud providers which suggests either immaturity in the project or lack
356 | of long-term commitment by platform vendors to support it.
357 |
358 | Additionally, at this point in time, the authors of this document do not know
359 | anybody committed to running production workloads on DC/OS (we're certain they
360 | exist however).
361 |
362 | == Costs
363 |
364 | [quote, https://azure.microsoft.com/en-us/pricing/details/container-service/]
365 | ____
366 | ACS is a free service that clusters Virtual Machines (VMs) into a container
367 | service. You only pay for the VMs and associated storage and networking
368 | resources consumed.
369 | ____
370 |
371 |
372 | Assuming a single minimally scaled cluster with a single master and three
373 | agents, the annual cost of the Kubernetes cluster itself would be: *$6,403.56*
374 | (one D3v2 master plus three D2v2 agents). As the number of agents increases,
375 | the cost increases by the price of one D2v2 instance per agent.
376 |
377 |
378 | .Costs
379 | |===
380 | | Instance | Annual Cost (East US)
381 |
382 | | D2v2
383 | | $1278.96
384 |
385 | | D3v2
386 | | $2566.68
387 | |===
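
The per-instance prices in the table can be combined into a cluster total. The sketch below assumes the specification's layout of D3v2 masters and D2v2 agents; the function name is hypothetical, and `Decimal` is used to avoid floating-point rounding on currency values.

```python
from decimal import Decimal

# Annual prices (East US) copied from the table above.
ANNUAL_COST = {
    "D2v2": Decimal("1278.96"),  # agent instance
    "D3v2": Decimal("2566.68"),  # master instance
}


def cluster_annual_cost(agents, masters=1):
    """Annual VM cost of a cluster with D3v2 masters and D2v2 agents."""
    return masters * ANNUAL_COST["D3v2"] + agents * ANNUAL_COST["D2v2"]
```

With these prices, `cluster_annual_cost(3)` gives `Decimal('6403.56')` for one master and three agents, and each additional agent adds $1,278.96 per year. Storage and networking costs are not included.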
388 |
389 |
390 | [[reference-implementation]]
391 | == Reference Implementation
392 |
393 |
394 | The current reference implementation is authored by
395 | link:https://github.com/olblak[Olivier Vernin]
396 | in
397 | link:https://github.com/jenkins-infra/azure/pull/5[pull request #5]
398 | to the
399 | link:https://github.com/jenkins-infra/azure[azure]
400 | repository.
401 |
--------------------------------------------------------------------------------