├── LICENSE
├── README.md
├── TODO
├── authorization_and_authentication.md
├── backing_queue.md
├── basic_publish.md
├── channels.md
├── credit_flow.md
├── deliver_to_queues.md
├── exchange_decorators.md
├── interceptors.md
├── internal_events.md
├── mandatory_message_handling.md
├── metrics_and_management_plugin.md
├── mirroring.md
├── networking_and_connections.md
├── publisher_confirms.md
├── queue_decorators.md
├── queues_and_message_store.md
├── rabbit_boot_process.md
├── transactions_in_exchange_modules.md
├── uninterupted_cluster_upgrade.md
└── variable_queue.md

/LICENSE:
--------------------------------------------------------------------------------
1 | THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS 2 | CREATIVE COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS 3 | PROTECTED BY COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE 4 | WORK OTHER THAN AS AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS 5 | PROHIBITED. 6 | 7 | BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND 8 | AGREE TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS 9 | LICENSE MAY BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU 10 | THE RIGHTS CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH 11 | TERMS AND CONDITIONS. 12 | 13 | 1. Definitions 14 | 15 | "Adaptation" means a work based upon the Work, or upon the Work and 16 | other pre-existing works, such as a translation, adaptation, 17 | derivative work, arrangement of music or other alterations of a 18 | literary or artistic work, or phonogram or performance and includes 19 | cinematographic adaptations or any other form in which the Work may be 20 | recast, transformed, or adapted including in any form recognizably 21 | derived from the original, except that a work that constitutes a 22 | Collection will not be considered an Adaptation for the purpose of 23 | this License. For the avoidance of doubt, where the Work is a musical 24 | work, performance or phonogram, the synchronization of the Work in 25 | timed-relation with a moving image ("synching") will be considered an 26 | Adaptation for the purpose of this License. "Collection" means a 27 | collection of literary or artistic works, such as encyclopedias and 28 | anthologies, or performances, phonograms or broadcasts, or other works 29 | or subject matter other than works listed in Section 1(f) below, 30 | which, by reason of the selection and arrangement of their contents, 31 | constitute intellectual creations, in which the Work is included in 32 | its entirety in unmodified form along with one or more other 33 | contributions, each constituting separate and independent works in 34 | themselves, which together are assembled into a collective whole. A 35 | work that constitutes a Collection will not be considered an 36 | Adaptation (as defined below) for the purposes of this License. 37 | "Creative Commons Compatible License" means a license that is listed 38 | at https://creativecommons.org/compatiblelicenses that has been 39 | approved by Creative Commons as being essentially equivalent to this 40 | License, including, at a minimum, because that license: (i) contains 41 | terms that have the same purpose, meaning and effect as the License 42 | Elements of this License; and, (ii) explicitly permits the relicensing 43 | of adaptations of works made available under that license under this 44 | License or a Creative Commons jurisdiction license with the same 45 | License Elements as this License.
"Distribute" means to make 46 | available to the public the original and copies of the Work or 47 | Adaptation, as appropriate, through sale or other transfer of 48 | ownership. "License Elements" means the following high-level license 49 | attributes as selected by Licensor and indicated in the title of this 50 | License: Attribution, ShareAlike. "Licensor" means the individual, 51 | individuals, entity or entities that offer(s) the Work under the terms 52 | of this License. "Original Author" means, in the case of a literary 53 | or artistic work, the individual, individuals, entity or entities who 54 | created the Work or if no individual or entity can be identified, the 55 | publisher; and in addition (i) in the case of a performance the 56 | actors, singers, musicians, dancers, and other persons who act, sing, 57 | deliver, declaim, play in, interpret or otherwise perform literary or 58 | artistic works or expressions of folklore; (ii) in the case of a 59 | phonogram the producer being the person or legal entity who first 60 | fixes the sounds of a performance or other sounds; and, (iii) in the 61 | case of broadcasts, the organization that transmits the broadcast. 62 | "Work" means the literary and/or artistic work offered under the terms 63 | of this License including without limitation any production in the 64 | literary, scientific and artistic domain, whatever may be the mode or 65 | form of its expression including digital form, such as a book, 66 | pamphlet and other writing; a lecture, address, sermon or other work 67 | of the same nature; a dramatic or dramatico-musical work; a 68 | choreographic work or entertainment in dumb show; a musical 69 | composition with or without words; a cinematographic work to which are 70 | assimilated works expressed by a process analogous to cinematography; 71 | a work of drawing, painting, architecture, sculpture, engraving or 72 | lithography; a photographic work to which are assimilated works 73 | expressed by a process analogous to photography; a work of applied 74 | art; an illustration, map, plan, sketch or three-dimensional work 75 | relative to geography, topography, architecture or science; a 76 | performance; a broadcast; a phonogram; a compilation of data to the 77 | extent it is protected as a copyrightable work; or a work performed by 78 | a variety or circus performer to the extent it is not otherwise 79 | considered a literary or artistic work. "You" means an individual or 80 | entity exercising rights under this License who has not previously 81 | violated the terms of this License with respect to the Work, or who 82 | has received express permission from the Licensor to exercise rights 83 | under this License despite a previous violation. "Publicly Perform" 84 | means to perform public recitations of the Work and to communicate to 85 | the public those public recitations, by any means or process, 86 | including by wire or wireless means or public digital performances; to 87 | make available to the public Works in such a way that members of the 88 | public may access these Works from a place and at a place individually 89 | chosen by them; to perform the Work to the public by any means or 90 | process and the communication to the public of the performances of the 91 | Work, including by public digital performance; to broadcast and 92 | rebroadcast the Work by any means including signs, sounds or images. 
93 | "Reproduce" means to make copies of the Work by any means including 94 | without limitation by sound or visual recordings and the right of 95 | fixation and reproducing fixations of the Work, including storage of a 96 | protected performance or phonogram in digital form or other electronic 97 | medium. 2. Fair Dealing Rights. Nothing in this License is intended 98 | to reduce, limit, or restrict any uses free from copyright or rights 99 | arising from limitations or exceptions that are provided for in 100 | connection with the copyright protection under copyright law or other 101 | applicable laws. 102 | 103 | 3. License Grant. Subject to the terms and conditions of this License, 104 | Licensor hereby grants You a worldwide, royalty-free, non-exclusive, 105 | perpetual (for the duration of the applicable copyright) license to 106 | exercise the rights in the Work as stated below: 107 | 108 | to Reproduce the Work, to incorporate the Work into one or more 109 | Collections, and to Reproduce the Work as incorporated in the 110 | Collections; to create and Reproduce Adaptations provided that any 111 | such Adaptation, including any translation in any medium, takes 112 | reasonable steps to clearly label, demarcate or otherwise identify 113 | that changes were made to the original Work. For example, a 114 | translation could be marked "The original work was translated from 115 | English to Spanish," or a modification could indicate "The original 116 | work has been modified."; to Distribute and Publicly Perform the Work 117 | including as incorporated in Collections; and, to Distribute and 118 | Publicly Perform Adaptations. For the avoidance of doubt: 119 | 120 | Non-waivable Compulsory License Schemes. In those jurisdictions in 121 | which the right to collect royalties through any statutory or 122 | compulsory licensing scheme cannot be waived, the Licensor reserves 123 | the exclusive right to collect such royalties for any exercise by You 124 | of the rights granted under this License; Waivable Compulsory License 125 | Schemes. In those jurisdictions in which the right to collect 126 | royalties through any statutory or compulsory licensing scheme can be 127 | waived, the Licensor waives the exclusive right to collect such 128 | royalties for any exercise by You of the rights granted under this 129 | License; and, Voluntary License Schemes. The Licensor waives the right 130 | to collect royalties, whether individually or, in the event that the 131 | Licensor is a member of a collecting society that administers 132 | voluntary licensing schemes, via that society, from any exercise by 133 | You of the rights granted under this License. The above rights may be 134 | exercised in all media and formats whether now known or hereafter 135 | devised. The above rights include the right to make such modifications 136 | as are technically necessary to exercise the rights in other media and 137 | formats. Subject to Section 8(f), all rights not expressly granted by 138 | Licensor are hereby reserved. 139 | 140 | 4. Restrictions. The license granted in Section 3 above is expressly made subject to and limited by the following restrictions: 141 | 142 | You may Distribute or Publicly Perform the Work only under the terms 143 | of this License. You must include a copy of, or the Uniform Resource 144 | Identifier (URI) for, this License with every copy of the Work You 145 | Distribute or Publicly Perform. 
You may not offer or impose any terms 146 | on the Work that restrict the terms of this License or the ability of 147 | the recipient of the Work to exercise the rights granted to that 148 | recipient under the terms of the License. You may not sublicense the 149 | Work. You must keep intact all notices that refer to this License and 150 | to the disclaimer of warranties with every copy of the Work You 151 | Distribute or Publicly Perform. When You Distribute or Publicly 152 | Perform the Work, You may not impose any effective technological 153 | measures on the Work that restrict the ability of a recipient of the 154 | Work from You to exercise the rights granted to that recipient under 155 | the terms of the License. This Section 4(a) applies to the Work as 156 | incorporated in a Collection, but this does not require the Collection 157 | apart from the Work itself to be made subject to the terms of this 158 | License. If You create a Collection, upon notice from any Licensor You 159 | must, to the extent practicable, remove from the Collection any credit 160 | as required by Section 4(c), as requested. If You create an 161 | Adaptation, upon notice from any Licensor You must, to the extent 162 | practicable, remove from the Adaptation any credit as required by 163 | Section 4(c), as requested. You may Distribute or Publicly Perform an 164 | Adaptation only under the terms of: (i) this License; (ii) a later 165 | version of this License with the same License Elements as this 166 | License; (iii) a Creative Commons jurisdiction license (either this or 167 | a later license version) that contains the same License Elements as 168 | this License (e.g., Attribution-ShareAlike 3.0 US)); (iv) a Creative 169 | Commons Compatible License. If you license the Adaptation under one of 170 | the licenses mentioned in (iv), you must comply with the terms of that 171 | license. If you license the Adaptation under the terms of any of the 172 | licenses mentioned in (i), (ii) or (iii) (the "Applicable License"), 173 | you must comply with the terms of the Applicable License generally and 174 | the following provisions: (I) You must include a copy of, or the URI 175 | for, the Applicable License with every copy of each Adaptation You 176 | Distribute or Publicly Perform; (II) You may not offer or impose any 177 | terms on the Adaptation that restrict the terms of the Applicable 178 | License or the ability of the recipient of the Adaptation to exercise 179 | the rights granted to that recipient under the terms of the Applicable 180 | License; (III) You must keep intact all notices that refer to the 181 | Applicable License and to the disclaimer of warranties with every copy 182 | of the Work as included in the Adaptation You Distribute or Publicly 183 | Perform; (IV) when You Distribute or Publicly Perform the Adaptation, 184 | You may not impose any effective technological measures on the 185 | Adaptation that restrict the ability of a recipient of the Adaptation 186 | from You to exercise the rights granted to that recipient under the 187 | terms of the Applicable License. This Section 4(b) applies to the 188 | Adaptation as incorporated in a Collection, but this does not require 189 | the Collection apart from the Adaptation itself to be made subject to 190 | the terms of the Applicable License. 
If You Distribute, or Publicly 191 | Perform the Work or any Adaptations or Collections, You must, unless a 192 | request has been made pursuant to Section 4(a), keep intact all 193 | copyright notices for the Work and provide, reasonable to the medium 194 | or means You are utilizing: (i) the name of the Original Author (or 195 | pseudonym, if applicable) if supplied, and/or if the Original Author 196 | and/or Licensor designate another party or parties (e.g., a sponsor 197 | institute, publishing entity, journal) for attribution ("Attribution 198 | Parties") in Licensor's copyright notice, terms of service or by other 199 | reasonable means, the name of such party or parties; (ii) the title of 200 | the Work if supplied; (iii) to the extent reasonably practicable, the 201 | URI, if any, that Licensor specifies to be associated with the Work, 202 | unless such URI does not refer to the copyright notice or licensing 203 | information for the Work; and (iv) , consistent with Ssection 3(b), in 204 | the case of an Adaptation, a credit identifying the use of the Work in 205 | the Adaptation (e.g., "French translation of the Work by Original 206 | Author," or "Screenplay based on original Work by Original 207 | Author"). The credit required by this Section 4(c) may be implemented 208 | in any reasonable manner; provided, however, that in the case of a 209 | Adaptation or Collection, at a minimum such credit will appear, if a 210 | credit for all contributing authors of the Adaptation or Collection 211 | appears, then as part of these credits and in a manner at least as 212 | prominent as the credits for the other contributing authors. For the 213 | avoidance of doubt, You may only use the credit required by this 214 | Section for the purpose of attribution in the manner set out above 215 | and, by exercising Your rights under this License, You may not 216 | implicitly or explicitly assert or imply any connection with, 217 | sponsorship or endorsement by the Original Author, Licensor and/or 218 | Attribution Parties, as appropriate, of You or Your use of the Work, 219 | without the separate, express prior written permission of the Original 220 | Author, Licensor and/or Attribution Parties. Except as otherwise 221 | agreed in writing by the Licensor or as may be otherwise permitted by 222 | applicable law, if You Reproduce, Distribute or Publicly Perform the 223 | Work either by itself or as part of any Adaptations or Collections, 224 | You must not distort, mutilate, modify or take other derogatory action 225 | in relation to the Work which would be prejudicial to the Original 226 | Author's honor or reputation. Licensor agrees that in those 227 | jurisdictions (e.g. Japan), in which any exercise of the right granted 228 | in Section 3(b) of this License (the right to make Adaptations) would 229 | be deemed to be a distortion, mutilation, modification or other 230 | derogatory action prejudicial to the Original Author's honor and 231 | reputation, the Licensor will waive or not assert, as appropriate, 232 | this Section, to the fullest extent permitted by the applicable 233 | national law, to enable You to reasonably exercise Your right under 234 | Section 3(b) of this License (right to make Adaptations) but not 235 | otherwise. 236 | 237 | 5. 
Representations, Warranties and Disclaimer 238 | 239 | UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING, 240 | LICENSOR OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR 241 | WARRANTIES OF ANY KIND CONCERNING THE WORK, EXPRESS, IMPLIED, 242 | STATUTORY OR OTHERWISE, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF 243 | TITLE, MERCHANTIBILITY, FITNESS FOR A PARTICULAR PURPOSE, 244 | NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, 245 | OR THE PRESENCE OF ABSENCE OF ERRORS, WHETHER OR NOT 246 | DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED 247 | WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU. 248 | 249 | 6. Limitation on Liability. EXCEPT TO THE EXTENT REQUIRED BY 250 | APPLICABLE LAW, IN NO EVENT WILL LICENSOR BE LIABLE TO YOU ON ANY 251 | LEGAL THEORY FOR ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR 252 | EXEMPLARY DAMAGES ARISING OUT OF THIS LICENSE OR THE USE OF THE WORK, 253 | EVEN IF LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 254 | 255 | 7. Termination 256 | 257 | This License and the rights granted hereunder will terminate 258 | automatically upon any breach by You of the terms of this 259 | License. Individuals or entities who have received Adaptations or 260 | Collections from You under this License, however, will not have their 261 | licenses terminated provided such individuals or entities remain in 262 | full compliance with those licenses. Sections 1, 2, 5, 6, 7, and 8 263 | will survive any termination of this License. Subject to the above 264 | terms and conditions, the license granted here is perpetual (for the 265 | duration of the applicable copyright in the Work). Notwithstanding the 266 | above, Licensor reserves the right to release the Work under different 267 | license terms or to stop distributing the Work at any time; provided, 268 | however that any such election will not serve to withdraw this License 269 | (or any other license that has been, or is required to be, granted 270 | under the terms of this License), and this License will continue in 271 | full force and effect unless terminated as stated above. 272 | 273 | 8. Miscellaneous 274 | 275 | Each time You Distribute or Publicly Perform the Work or a Collection, 276 | the Licensor offers to the recipient a license to the Work on the same 277 | terms and conditions as the license granted to You under this License. 278 | Each time You Distribute or Publicly Perform an Adaptation, Licensor 279 | offers to the recipient a license to the original Work on the same 280 | terms and conditions as the license granted to You under this License. 281 | If any provision of this License is invalid or unenforceable under 282 | applicable law, it shall not affect the validity or enforceability of 283 | the remainder of the terms of this License, and without further action 284 | by the parties to this agreement, such provision shall be reformed to 285 | the minimum extent necessary to make such provision valid and 286 | enforceable. No term or provision of this License shall be deemed 287 | waived and no breach consented to unless such waiver or consent shall 288 | be in writing and signed by the party to be charged with such waiver 289 | or consent. This License constitutes the entire agreement between the 290 | parties with respect to the Work licensed here. There are no 291 | understandings, agreements or representations with respect to the Work 292 | not specified here. 
Licensor shall not be bound by any additional 293 | provisions that may appear in any communication from You. This License 294 | may not be modified without the mutual written agreement of the 295 | Licensor and You. The rights granted under, and the subject matter 296 | referenced, in this License were drafted utilizing the terminology of 297 | the Berne Convention for the Protection of Literary and Artistic Works 298 | (as amended on September 28, 1979), the Rome Convention of 1961, the 299 | WIPO Copyright Treaty of 1996, the WIPO Performances and Phonograms 300 | Treaty of 1996 and the Universal Copyright Convention (as revised on 301 | July 24, 1971). These rights and subject matter take effect in the 302 | relevant jurisdiction in which the License terms are sought to be 303 | enforced according to the corresponding provisions of the 304 | implementation of those treaty provisions in the applicable national 305 | law. If the standard suite of rights granted under applicable 306 | copyright law includes additional rights not granted under this 307 | License, such additional rights are deemed to be included in the 308 | License; this License is not intended to restrict the license of any 309 | rights under applicable law. 310 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# RabbitMQ Internals #

This project aims to explain how RabbitMQ works internally. The goal
is to make it easier for newcomers to contribute to the project, and
at the same time to have a common repository of knowledge shared
across the project's contributors.

## Purpose ##

Most interesting modules in RabbitMQ projects have documentation
essays, sometimes quite extensive, at the top. The aim here is not to
duplicate what's there, but to provide the highest-level overview as
to the overall architecture.

## Guides ##

To understand how RabbitMQ's internals work, it is best to follow the
logic of how a message progresses through RabbitMQ as it is handled by
the broker; otherwise, you would end up navigating through many guides
without a clear context of what's going on, or without knowing what to
read next. Therefore we have prepared the following guides to help you
understand how RabbitMQ works:

### Basic Publish Guide ###

Here we follow the life of a message from the moment it is received
from the network until it has been routed by the exchanges. We take a
look at the various processing steps that happen to a message right
until it is delivered to one or perhaps many queues.

[Basic Publish](./basic_publish.md)

### Deliver To Queues Guide ###

After the message has been routed, the broker needs to deliver that
message to the respective queues. Not only does the message have to be
sent to queues; mandatory messages and publisher confirms also need to
be taken into account. The queue also needs to try to deliver the
message to prospective consumers; otherwise the message ends up
queued.

[Deliver To Queues](./deliver_to_queues.md)

### Queues and Message Store

Provides an overview of the Erlang processes that back queues
and how they interact with the message store, message index and so on.

[Queues and Message Store](./queues_and_message_store.md)

### Variable Queue Guide ###

Ultimately, messages end up queued in the
[backing queue](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/rabbit_backing_queue.erl). From
here they can be retrieved, acked, purged, and so on. The most common
implementation of the backing queue behaviour is the
`rabbit_variable_queue`
[module](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/rabbit_variable_queue.erl),
explained in the following guide:

[Variable Queue](./variable_queue.md)

### Mandatory Messages and Publisher Confirm Guides ###

As explained in the [Deliver To Queues](./deliver_to_queues.md) guide,
a channel has to handle messages published as mandatory and also take
care of publisher confirms. These processes are explained in the
following guides:

- [Mandatory Message Handling](./mandatory_message_handling.md)
- [Publisher Confirms](./publisher_confirms.md)

### Authentication and Authorization ###

As explained in the [Basic Publish](./basic_publish.md) guide, there
are rules that determine whether a message can be accepted by the
broker from a certain publisher. This is explained in the following
guide:

[Authorization and Authentication Backends](./authorization_and_authentication.md)

### Internal Event Subsystem

In some cases components in a running node communicate via events.
Some events are consumed by other nodes.

[Internal Events](./internal_events.md)

### Management Plugin ###

An architectural overview of the v3.6.7+ version of the management plugin.

[Metrics and Management Plugin](./metrics_and_management_plugin.md)

## Maturity and Completeness

These guides are not complete, haven't been edited, and are a work in
progress in general.

So if you find yourself wanting more detail, check the code first!

## License

(c) Pivotal Software Inc, 2015-2016

Released under the
[Creative Commons Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/)
license.

--------------------------------------------------------------------------------
/TODO:
--------------------------------------------------------------------------------
Overall architecture
- which things are processes, how do messages flow, how do plugins work,
  what are the major subsystems?

Profiling / debugging
- fprof and/or dbg

Shovel / federation
- History / rationales, how they work, Erlang client, future directions

--------------------------------------------------------------------------------
/authorization_and_authentication.md:
--------------------------------------------------------------------------------
# Authorization and Authentication Backends

This document describes the authentication and authorization machinery that
implements [access control](https://www.rabbitmq.com/access-control.html).

Authentication backends should not be confused with authentication mechanisms,
which are defined in some protocols supported by RabbitMQ.
For AMQP 0-9-1 authentication mechanisms, see the [documentation](https://www.rabbitmq.com/authentication.html).

## Definitions

Authentication and authorization are often confused or used interchangeably. That's
wrong and RabbitMQ separates the two. For the sake of simplicity, we'll define
authentication as "identifying who the user is" and authorization as
"determining what the user is and isn't allowed to do."


## Authentication Mechanisms

AMQP 0-9-1 supports multiple authentication **mechanisms**. The mechanisms decide how a
client connection authenticates, for example, what should be considered a
set of credentials.

In practice, in 99% of cases only two mechanisms are used:

* `PLAIN` (a set of credentials such as username and password)
* `EXTERNAL`, which assumes authentication happens out of band (not performed
  by RabbitMQ authN backends), usually [using x509 (TLS) certificates](https://github.com/rabbitmq/rabbitmq-server/tree/master/deps/rabbitmq_auth_mechanism_ssl).
  This mechanism ignores client-provided credentials and relies on TLS [peer certificate chain
  verification](https://tools.ietf.org/html/rfc6818).

When a client connection reaches the [authentication stage](https://github.com/rabbitmq/rabbitmq-server/blob/v3.7.2/src/rabbit_reader.erl#L1304), a mechanism requested by the client
and supported by the server is selected. The mechanism module then checks whether it can
be applied to a connection (e.g. the TLS-based mechanism will reject non-TLS connections).

An authentication mechanism is a module that implements the [rabbit_auth_mechanism](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit_common/src/rabbit_auth_mechanism.erl) behaviour, which includes
3 functions:

* `init/1`: self-explanatory
* `should_offer/1`: if this mechanism is enabled, should it be offered for a given socket?
* `handle_response/2`: the core authentication logic of the mechanism

The `PLAIN` mechanism extracts client credentials and passes them to
a chain of authentication and authorization backends.


## Authentication and Authorization Backends

Authentication (authN) and authorization (authZ) backend(s) use
client-provided credentials to decide whether the client passes
authentication and should be granted access to the target virtual
host.

The above sentence implies that the `PLAIN` (or similar)
authentication mechanism is used and has already validated the presence of
client credentials.

Authentication and authorization backends form a chain of
responsibility: a set of backends is applied to the same set of client
credentials and as soon as one of them reports success, the entire
operation is considered to be successful.

Authentication and authorization backends can be provided by
plugins. They are modules that must implement the following
behaviours:

* `rabbit_authn_backend` for authentication ("authn") backends
* `rabbit_authz_backend` for authorization ("authz") backends

It is possible to implement both in a single module.
For example the `internal`, `ldap` and `http` backends do so.

It is possible to use multiple backends for authn or authz. Then the first
positive result returned by a backend in the chain is considered to be final.
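
As a conceptual sketch of that chain: the following is hypothetical code, not the
actual `rabbit_access_control` implementation. `try_backends/3` is made up for
illustration and uses the `user_login_authentication/2` callback described in the
next section:

``` erlang
%% Hypothetical sketch: try each backend in order and stop at the first
%% positive result. A refusal moves on to the next backend in the chain,
%% while an unexpected error aborts the whole operation.
try_backends([], Username, _AuthProps) ->
    {refused, "no backend accepted the credentials for ~s", [Username]};
try_backends([Backend | Rest], Username, AuthProps) ->
    case Backend:user_login_authentication(Username, AuthProps) of
        {ok, AuthUser}       -> {ok, AuthUser};
        {refused, _Fmt, _As} -> try_backends(Rest, Username, AuthProps);
        {error, _} = Err     -> Err
    end.
```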

### AuthN Backend

The `rabbit_authn_backend` behaviour defines the authentication process with a single function:

``` erlang
user_login_authentication(UserName, AuthProps) -> {ok, #auth_user{}} | {refused, Format, Args} | {error, Reason}
```

Where `UserName` is the name of the user who is trying to authenticate and
`AuthProps` is an authentication context (a proplist): e.g. it can be `[]` for x509 certificate-based
authentication or `[{password, Password}]` for a password-based one.

This function returns

* `{ok, #auth_user{}}` in case of successful authentication. The `#auth_user{}` record is then passed
  on to other modules, associated with the connection, etc.
* `{refused, Format, Args}` when user authentication fails. `Format` and `Args` are meant to be used
  with `io:format/2` and similar functions.
* `{error, Reason}` when an unexpected error occurs.

### AuthZ Backend

The `rabbit_authz_backend` behaviour defines functions that authorize access
to RabbitMQ resources, such as a `vhost`, `exchange`, `queue` or `topic`.

It contains the following functions:

``` erlang
% whether the user is allowed to access the broker at all.
user_login_authorization(UserName) -> {ok, Impl} | {ok, Impl, Tags} | {refused, Format, Args} | {error, Reason}.
% whether the user has access to a specific vhost.
check_vhost_access(#auth_user{}, Vhost, Context) -> boolean() | {error, Reason}.
% whether the user has access to a specific resource
check_resource_access(#auth_user{}, #resource{}, Permission, Context) -> boolean() | {error, Reason}.
% whether the user has access to a specific topic
check_topic_access(#auth_user{}, #resource{}, Permission, Context) -> boolean() | {error, Reason}.
% whether the backend supports state or credential expiration
state_can_expire() -> boolean()
% optional: update backend state (e.g. a new JWT token)
update_state(#auth_user{}, NewState) -> {ok, #auth_user{}} | {refused, Fmt, Args} | {error, Reason}.
```

Where

* `UserName`, `Format`, `Args`: see above.
* `Impl` is the internal state of the authorization backend. It will vary between backends and can be thought of
  as the backend's `State`.
* `Tags` is a list of user tags. Those are used by features such as policies, plugins such as management, and so on. Tags can be an empty list.
* `Vhost` is self-explanatory
* `Permission` is currently one of `configure`, `read`, or `write`
* `Context` is a map with additional information (like the peer address or routing key, or protocol-specific information like the MQTT client ID). It is slightly different for each of the three check functions.

The `#auth_user{}` record represents a user whenever we need to
check access to vhosts and resources.

This record has the following structure:
`#auth_user{ username :: binary(), impl :: any(), tags :: [any()] }`,

where `impl` is internal backend state and `tags` is a list of user tags (see above).

`impl` can be used to check resource access by querying an external data source or by performing
a check solely on the provided state (local data).

`#resource{ virtual_host :: binary(), kind :: queue|exchange|topic, name :: binary() }`
represents a resource (a queue, exchange, or topic) access to which is restricted.
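
To make the two behaviours concrete, here is a minimal, hypothetical backend
module that implements both in one place. The module name is made up and the
logic (accept every user, allow everything) is deliberately trivial; a real
backend would consult a data store in `user_login_authentication/2` and in the
three check functions:

``` erlang
-module(my_auth_backend).
-behaviour(rabbit_authn_backend).
-behaviour(rabbit_authz_backend).

-include_lib("rabbit_common/include/rabbit.hrl").

-export([user_login_authentication/2, user_login_authorization/1,
         check_vhost_access/3, check_resource_access/4,
         check_topic_access/4, state_can_expire/0]).

%% AuthN: accept any user. A real backend would verify the password
%% found in AuthProps against some data store.
user_login_authentication(Username, _AuthProps) ->
    {ok, #auth_user{username = Username, impl = none, tags = []}}.

%% AuthZ entry point: reuse the authN logic to produce Impl and Tags.
user_login_authorization(Username) ->
    case user_login_authentication(Username, []) of
        {ok, #auth_user{impl = Impl, tags = Tags}} -> {ok, Impl, Tags};
        Else                                       -> Else
    end.

%% Permissive checks: a real backend would inspect the vhost, the
%% #resource{} record, the requested Permission and the Context.
check_vhost_access(#auth_user{}, _Vhost, _Context)                -> true.
check_resource_access(#auth_user{}, #resource{}, _Perm, _Context) -> true.
check_topic_access(#auth_user{}, #resource{}, _Perm, _Context)    -> true.

%% Nothing in this backend's state ever expires.
state_can_expire() -> false.
```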

### Configuring Backends

Backends are configured the usual way and can have multiple "syntaxes"
(recognised forms):

``` erlang
% To enable a single backend:
{rabbit, [{auth_backends, [my_auth_backend]}]}.
% To check several backends: if one refuses, check the next.
{rabbit, [{auth_backends, [my_auth_backend, my_other_auth_backend]}]}.
% To use different modules as the AuthN and AuthZ backends
{rabbit, [{auth_backends, [{my_authn_backend, my_authz_backend}]}]}.
% You can still fall back to another backend when using different modules
{rabbit, [{auth_backends, [{my_authn_backend, my_authz_backend}, my_other_auth_backend]}]}.
```

If a backend is defined by a tuple,
the first element will be used as the `AuthN` module and the second as the `AuthZ` one.
If it is defined by an atom, it will be used for both `AuthN` and `AuthZ`.

When a backend is defined by a list, the server will use the modules in the chain in order
until one of them returns a positive result or the list is exhausted (the Chain of Responsibility
pattern in object-oriented parlance).

If authentication is successful then the `AuthZ` backend from the same tuple ("chain element")
will be used for authorization checks later.

### Example Backends

* `rabbit_auth_backend_dummy`: a dummy no-op backend, only used as the most trivial example
* `rabbit_auth_backend_internal`: internal data store backend. See https://www.rabbitmq.com/access-control.html for more info
* `rabbit_auth_backend_ldap`: provides `AuthN` and `AuthZ` backends in a single module, backed by LDAP
* `rabbit_auth_backend_http`: provides `AuthN` and `AuthZ` backends in a single module, backed by an HTTP service
* `rabbit_auth_backend_amqp`: provides `AuthN` and `AuthZ` backends in a single module, backed by an AMQP 0-9-1 service that uses request/response ("RPC")
* `rabbit_auth_backend_oauth2`: provides `AuthN` and `AuthZ` backends in a single module that uses OAuth 2.0 (JWT) tokens. It is not specific to but was developed against [Cloud Foundry UAA](https://github.com/cloudfoundry/uaa).

### Permission caching

The results of the most recent permission checks are cached by each
channel. There is one cache for resource permissions and another for
topic permissions. When a `Resource/Topic + Context + Permission`
triplet is successfully authorized, it is added to the corresponding
cache. When the cache is full (currently max 12 entries in each), the
oldest entry is rotated out. The caches are cleared when the channel
process is hibernated. They are also cleared every `channel_tick_interval`
if any enabled backend supports state expiration.

To give an extreme and unrealistic example: when only the internal
backend is enabled, a client that only publishes to the same resource
at high frequency can continue to do so indefinitely even after its
permission for that resource is revoked (because the channel cache is
never cleared and never rotates).

Separately there is also a [caching authN/authZ
backend](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbitmq_auth_backend_cache/README.md)
that provides a TTL-based caching layer for another backend. It is
useful for reducing network traffic and latency for backends that use
external services (like LDAP or HTTP).
Its time-based expiration
allows better control over when entries are evicted from the cache.

--------------------------------------------------------------------------------
/backing_queue.md:
--------------------------------------------------------------------------------
# Backing Queue #

RabbitMQ supports pluggable backing queues via modules implementing the
`rabbit_backing_queue` behaviour.

The backing queue `init/3` callback expects an `async_callback()`
parameter, which is a `fun` that takes the backing queue
state and returns a new state. Keep reading to understand what all
this callback mumbo-jumbo means.

**TL;DR:** due to the two problems explained below, this callback
takes care of executing certain functions in the context of a
particular Erlang process.

## Process Dictionary Problem ##

Understanding how this callback works is vital since the persistence
layer of the backing queue makes heavy use of the _process dictionary_
and of `self()` to track who opened which file handle. What
this means is that even though the backing queue behaviour callbacks
appear to be referentially transparent, they are not. Behind
the scenes, some of the backing queue behaviour callbacks will `put`/`get`
values to/from the process dictionary, but if one of said callbacks is
executed in a different process context, those values won't be
found in the process dictionary, and havoc ensues.

## File Handle Cache Problem ##

The same applies to the `file_handle_cache`, which tracks who owns which
file handle by calling `self()` inside its function implementations
instead of, for example, expecting a `Pid` as a parameter. The call to
`self()` again violates referential transparency: the function's
behaviour now depends on the process context in which it is
called. This means that closing file handles must be done from the
same caller that issued the file open.
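
Here is a toy illustration of the process dictionary problem (the function and
key names are made up; this is not RabbitMQ code): the very same closure sees
different data depending on which process runs it, because `put/2` writes to
the dictionary of the calling process:

```erlang
%% Hypothetical demo: the closure finds the value in the process that
%% called put/2, and 'undefined' in any other process.
pd_demo() ->
    put(fd_count, 1),
    Lookup = fun () -> get(fd_count) end,
    1 = Lookup(),                                  %% same process: 1
    Self = self(),
    spawn(fun () -> Self ! {elsewhere, Lookup()} end),
    receive {elsewhere, Value} -> Value end.       %% other process: undefined
```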

## How Things Work ##

The function `rabbit_amqqueue_process:bq_init/3` takes care of
initializing the backing queue implementation, whether it is
`rabbit_variable_queue`, `rabbit_mirror_queue_master`,
`rabbit_priority_queue` or your own backing queue behaviour
implementation.

The async callback passed into `BQ:init` is defined as:

```erlang
fun (Mod, Fun) ->
        rabbit_amqqueue:run_backing_queue(Self, Mod, Fun)
end
```

This `fun` takes a module argument, which is usually an atom
referring to the backing queue module being used, for example
`rabbit_variable_queue` or `rabbit_mirror_queue_master`. The second
argument expected by this callback is a `fun` that will be passed
along to `rabbit_amqqueue:run_backing_queue/3`. Now let's see what
`rabbit_amqqueue:run_backing_queue` does.

### rabbit_amqqueue:run_backing_queue ###

The function body is like this:

```erlang
run_backing_queue(QPid, Mod, Fun) ->
    gen_server2:cast(QPid, {run_backing_queue, Mod, Fun}).
```

It sends a `{run_backing_queue, Mod, Fun}` message to whatever process
was provided as `QPid`. **This is important**, since that process'
context is the one that will get its process dictionary modified
indirectly, and at the same time it will own the file handles when they are
opened by the _msg\_store_, for example.

Back in `rabbit_amqqueue_process` we see that this module has a
callback for the message mentioned above:

```erlang
handle_cast({run_backing_queue, Mod, Fun},
            State = #q{backing_queue = BQ, backing_queue_state = BQS}) ->
    noreply(State#q{backing_queue_state = BQ:invoke(Mod, Fun, BQS)});
```

This function takes care of extracting the current `backing_queue`
module and `backing_queue_state` from its own process state, and then
calling `BQ:invoke(Mod, Fun, BQS)`.

This is what `BQ:invoke/3` does:

```erlang
invoke(?MODULE, Fun, State) -> Fun(?MODULE, State);
invoke(      _,   _, State) -> State.
```

Invoke's implementation is pretty simple: if the `Mod` argument
provided to it matches the current module, in this example
`rabbit_variable_queue`, then `Fun` will be executed with
`rabbit_variable_queue` as the first parameter and the backing queue
`State` as the second argument. To reiterate, what's important to
understand is that `Fun` will be executed in the context of whatever
process `QPid` was referring to above. In the case we are analyzing,
this is the `rabbit_amqqueue_process` pid.

## What Fun ##

Now let's try to find out what `Fun` actually is. To get to this we
need to see how `rabbit_variable_queue` is initialized.

`rabbit_variable_queue:init/6` will call into
`msg_store_client_init/3`, passing our initial callback as the third
parameter (`msg_store_client_init/3` then expands into
`msg_store_client_init/4`). Let's refresh what that callback was:

```erlang
fun (Mod, Fun) ->
        rabbit_amqqueue:run_backing_queue(Self, Mod, Fun)
end
```

That callback will now be wrapped into yet another `fun` like this:

```erlang
fun () -> Callback(?MODULE, CloseFDsFun) end
```

To see that in context:

```erlang
msg_store_client_init(MsgStore, Ref, MsgOnDiskFun, Callback) ->
    CloseFDsFun = msg_store_close_fds_fun(MsgStore =:= ?PERSISTENT_MSG_STORE),
    rabbit_msg_store:client_init(MsgStore, Ref, MsgOnDiskFun,
                                 fun () -> Callback(?MODULE, CloseFDsFun) end).
```

So now we have a clue of what the `Fun` passed into our callback might
be. It is whatever `msg_store_close_fds_fun` returned as
`CloseFDsFun`. Let's check:

```erlang
msg_store_close_fds_fun(IsPersistent) ->
    fun (?MODULE, State = #vqstate { msg_store_clients = MSCState }) ->
        {ok, MSCState1} = msg_store_close_fds(MSCState, IsPersistent),
        State #vqstate { msg_store_clients = MSCState1 }
    end.
```

We get a `fun` that will only be executed if the `Mod` argument
matches, in this case `rabbit_variable_queue`. That `fun` takes as
its second argument our `rabbit_variable_queue` state.

In `msg_store_client_init/4` above we said that our initial callback
gets wrapped like this:

```erlang
fun () -> Callback(?MODULE, CloseFDsFun) end
```

This means that inside the msg_store, at various places, that `fun` closure
gets called without arguments, which in turn calls our callback with
the `CloseFDsFun`. We end up with something like what's below after
some expansions:

```erlang
fun (rabbit_variable_queue, Fun) ->
        rabbit_amqqueue:run_backing_queue(QPid, rabbit_variable_queue,
            fun (?MODULE, State = #vqstate { msg_store_clients = MSCState }) ->
                {ok, MSCState1} = msg_store_close_fds(MSCState, IsPersistent),
                State #vqstate { msg_store_clients = MSCState1 }
            end)
end
```

So our `rabbit_amqqueue_process` will ask the backing queue module
to invoke that expanded fun in the context of the
`rabbit_amqqueue_process` Pid:

```erlang
handle_cast({run_backing_queue, Mod, Fun},
            State = #q{backing_queue = BQ, backing_queue_state = BQS}) ->
    noreply(State#q{backing_queue_state = BQ:invoke(Mod, Fun, BQS)});
```

This very same technique is used in `rabbit_variable_queue:init/3` to
set up the functions that will write messages to disk (see
`rabbit_variable_queue:msgs_written_to_disk/3`) and the ones that will
write the message indexes to disk (see
`rabbit_variable_queue:msg_indices_written_to_disk/2`).

## It's About Context ##

From all these layers of indirection, what's important to understand
is that the `Pid` passed into `rabbit_amqqueue:run_backing_queue/3`
determines the context in which all the functions implementing message
persistence will be run. Unless your `rabbit_backing_queue` behaviour
implementation is just a proxy like that of `rabbit_priority_queue`,
you must take that `Pid` context into account, since it will hold the
file handle references, and its process dictionary will be the one where
the `file_handle_cache` stores its information.

If you want a second example of what we outlined above, take a look at
`rabbit_mirror_queue_slave:bq_init/3`, where the Pid provided to
`run_backing_queue/3` is the _slave_ Pid. The slave
process implements its own `handle_cast({run_backing_queue, Mod,
Fun}, State)` function clause, in which funs from
`rabbit_variable_queue` like `msg_store_close_fds_fun`,
`msgs_written_to_disk`, `msg_indices_written_to_disk` and
`msgs_and_indices_written_to_disk` will be run.

--------------------------------------------------------------------------------
/basic_publish.md:
--------------------------------------------------------------------------------
# Publishing Messages into RabbitMQ #

One of the best ways to cover the various parts of RabbitMQ's
architecture is to see what happens when a message gets published into
the broker. In this document we are going to visit the different
subsystems a message crosses inside the broker. Let's start with
`rabbit_reader`.

The `rabbit_reader` process is the one that takes care of reading data
from the network and forwarding it to the respective channel
process.
Messages get into the channel when the reader calls the
function `rabbit_channel:do_flow/3`. This function calls the
`credit_flow` module to record that a message was received from the
reader, so that the reader can eventually be throttled down in case the
publisher is sending messages faster than the broker can handle them
at a particular time. Read more about Credit Flow
[here](./credit_flow.md). More information about the Reader process
can be found in the
[Networking and Connections guide](./networking_and_connections.md#rabbit_reader).

## Arriving into the Channel Process ##

Once Credit Flow is accounted for, the `do_flow/3` function will
issue an asynchronous `gen_server:cast/2` into the channel process,
passing in this Erlang message: `{method, Method, Content,
flow}`. There we have the AMQP `Method`, the method `Content`, and
the atom `flow` indicating to the channel that credit flow is in use.

When the cast reaches the `handle_cast/2` function inside the channel
module, we are finally inside the channel process memory and execution
path. If `flow` was in use, as is the case here, the channel will
issue a `credit_flow:ack/1` to the reader process. Then the AMQP
method that's being processed will be passed to the
[Interceptors](./interceptors.md) defined for the channel, in case
there are any. After the Interceptors are done processing the AMQP
method, the channel process will continue processing the method;
in our case the function `handle_method/3` will be called, with a
`basic.publish` record.

## Inside basic.publish ##

`basic.publish` works by receiving an AMQP message, an exchange and a
routing key; it will use the exchange to route the message to one
or more queues, based on the routing key. Let's see how that's
accomplished.

The first thing the function does is to check the size of the message,
since RabbitMQ has an upper limit of 2GB for messages.

Then the function needs to build the resource record for the
exchange. Exchanges and queues are represented internally by a
resource record that keeps track of the name and the vhost where the
exchange or queue was declared. The record's type declaration looks like
this:

```erlang
#resource{virtual_host :: VirtualHost,
          kind :: Kind,
          name :: Name}
```

So if a message was published to the default vhost to an exchange
called `"my_exchange"`, we will end up with the following record:

```erlang
#resource{virtual_host = <<"/">>,
          kind = exchange,
          name = <<"my_exchange">>}
```

Resources like that one are used everywhere in RabbitMQ, so it's a
good idea to study their parts in the
[rabbit_types](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit_common/src/rabbit_types.erl)
module where these declarations are defined.

Once we have the exchange record, `basic.publish` will use it to see
if the user publishing the message has write permissions for this
particular exchange by calling the function
`check_write_permitted/2`.
Read more about the different kinds of
permissions here:
[access-control](https://www.rabbitmq.com/access-control.html)

If the user does have permission to publish messages to this exchange,
the channel will query the Mnesia database to find out if
the exchange actually exists: the function
`rabbit_exchange:lookup_or_die/1` is called in order to retrieve the
actual exchange record from the database, and if the exchange is not
found, a channel error is raised by `lookup_or_die/1`. Keep in mind that
the exchange resource we mentioned above is one thing; the exchange
record stored in Mnesia is quite another. The latter
holds much more information about the actual exchange, such as its
type (direct, fanout, topic, etc.). Here's the exchange
record definition from `rabbit.hrl`:

```erlang
%% fields described as 'transient' here are cleared when writing to
%% rabbit_durable_
-record(exchange, {
  name, type, durable, auto_delete, internal, arguments, %% immutable
  scratches,    %% durable, explicitly updated via update_scratch/3
  policy,       %% durable, implicitly updated when policy changes
  decorators}). %% transient, recalculated in store/1 (i.e. recovery)
```

Then we need to check that the record returned by Mnesia is not an
internal exchange, otherwise an error will be raised and the publish
will fail.

The next thing to do is to validate the user id provided with the
basic publish, if any. If provided, this user id has to be validated
against the user that created the channel where the message is being
published. More details
[here](https://www.rabbitmq.com/validated-user-id.html)

Then we need to validate that the message expiration header provided by
the user, if any, is correct. More info about the Per-Message TTL
[here](https://www.rabbitmq.com/ttl.html#per-message-ttl)

Then it's time to check if the message was published as _Mandatory_ or
if the channel is in _Transaction_ or _Confirm Mode_. If this is the
case, the `publish_seqno` field in the channel state will be
incremented to account for the new publish that's being handled. This
Message Sequence Number will later be used to reply back to the
publisher in case the message was Mandatory and/or the channel was in
[Confirm Mode](https://www.rabbitmq.com/confirms.html). See also the
document [Delivering Messages to Queues](./deliver_to_queues.md).

After all these steps have been completed, it's time to route the AMQP
message, but in order to do that we first need to wrap the message
into a `#basic_message` record, and then pass it to the exchange and
queues as a `#delivery{}` record:

```erlang
-record(basic_message,
        {exchange_name,     %% The exchange where the message was received
         routing_keys = [], %% Routing keys used during publish
         content,           %% The message content
         id,                %% A `rabbit_guid:gen()` generated id
         is_persistent}).   %% Whether the message was published as persistent

-record(delivery,
        {mandatory,  %% Whether the message was published as mandatory
         confirm,    %% Whether the message needs confirming
         sender,     %% The pid of the process that created the delivery
         message,    %% The #basic_message record
         msg_seq_no, %% Msg Sequence Number from the channel publish_seqno field
         flow}).     %% Should flow control be used for this delivery
```

## Message Routing ##

The `#delivery` we just created in the previous step is now passed to
the exchange via the function `rabbit_exchange:route/2`. If the
exchange name used during `basic.publish` is the empty string
`<<"">>`, then the `default` exchange is assumed, and `route/2`
will just return the queue name associated with the routing key, per the
AMQP spec. If that's not the case, the delivery will be processed
first by the [exchange decorators](./exchange_decorators.md) that are
configured for the exchange that's handling the routing. The decorators
will send back a list of _destinations_. At this point the delivery will
finally reach the exchange, where the routing algorithm implemented by
the exchange will take place. This process will return a new list of
_destinations_, which will be merged and deduplicated with the list
returned earlier by the decorators. At this point, all the destinations
proposed by the
[Exchange To Exchange](https://www.rabbitmq.com/e2e.html) bindings are
also included in the list of destinations that will be returned to the
channel.

## Processing Routing Results ##

Now the channel has a list of queues to which it should deliver the
message. Before doing that, we need to see if the channel is in
transaction mode. If that's the case, the `#delivery` and the
list of queues are enqueued for later, until the transaction is
committed. Keep in mind that transaction support in RabbitMQ is a
very simple form of
[message batching](https://www.rabbitmq.com/semantics.html). If the
channel is not in transaction mode, the message will be delivered
to the queues returned by the routing function.

## Summary ##

We saw in this guide that messages arrive via the network into the
`rabbit_reader` process. This process forwards commands to
`rabbit_channel` processes, which take care of processing the various
AMQP methods. In this case, we looked at what happens when a message
is published into RabbitMQ. Once credit flow has been acked back to
the reader process, it's time to take care of handling the
message. First it will go to the interceptors, which might modify or
augment the AMQP method received from the _reader_. Then the channel
must make sure the message complies with the size limits set on the
broker side. Once that's done, we need to see if the user has
permission to publish messages to the selected exchange. If that's fine
and the `user_id` and `expiration` headers of the message are
validated, then it's time to route the message. The exchange that
handles the message will return a list of queues to which the
message must be delivered. At this point we are done with the
message and the channel is ready to keep processing commands.
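
As a recap of the routing step described above, here is a conceptual sketch.
This is not the actual `rabbit_exchange` code: `type_module/1` is a made-up
helper (the real server looks the exchange type up via a registry), and the
shape of the `decorators` field is simplified to a plain list of modules:

```erlang
%% Simplified sketch of rabbit_exchange:route/2-style logic: collect the
%% destinations proposed by the decorators, add the ones computed by the
%% exchange type's routing algorithm, then deduplicate the result.
route(#exchange{decorators = Decorators} = X, Delivery) ->
    DecoratorDests = lists:append(
                       [Decorator:route(X, Delivery) || Decorator <- Decorators]),
    TypeDests      = (type_module(X)):route(X, Delivery),
    lists:usort(DecoratorDests ++ TypeDests).
```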
200 | 201 | Now we can continue with the next guide and see what happens when 202 | messages are delivered to queues: 203 | [Delivering Messages to Queues](./deliver_to_queues.md) 204 | -------------------------------------------------------------------------------- /channels.md: -------------------------------------------------------------------------------- 1 | # Channels 2 | 3 | This guide provides an overview of the AMQP 0-9-1 channel implementation. 4 | Before you start, please take a look at the [Networking and Connections](./networking_and_connections.md) guide. 5 | -------------------------------------------------------------------------------- /credit_flow.md: -------------------------------------------------------------------------------- 1 | # Credit Flow # 2 | 3 | In order to prevent fast publishers from overflowing the broker with 4 | more messages than it can handle at any particular moment, RabbitMQ 5 | implements an internal mechanism called credit flow that will be used 6 | by the various systems inside RabbitMQ to throttle down publishers, 7 | while allowing the message consumers to catch up. In this guide we 8 | are going to see how credit flow works, and what we can do to tune its 9 | configuration for optimal behaviour. 10 | 11 | Since version 3.5.5, RabbitMQ includes a couple of new configuration 12 | values that let users fiddle with the internal credit flow 13 | settings. Understanding how these work according to your particular 14 | workload can help you get the most out of RabbitMQ in terms of 15 | performance, but beware: increasing these values just to see what 16 | happens can have adverse effects on how RabbitMQ is able to respond to 17 | message bursts, affecting the internal strategies that RabbitMQ has in 18 | order to deal with memory pressure. Handle with care. 19 | 20 | To understand the new credit flow settings, we first need to understand 21 | how the internals of RabbitMQ work with regard to message publishing 22 | and paging messages to disk. Let’s first see how message publishing 23 | works in RabbitMQ. 24 | 25 | ## Message Publishing ## 26 | 27 | To see how credit_flow and its settings affect publishing, let’s see 28 | how internal messages flow in RabbitMQ. Keep in mind that RabbitMQ is 29 | implemented in Erlang, where processes communicate by sending messages 30 | to each other. 31 | 32 | Whenever a RabbitMQ instance is running, there are probably hundreds 33 | of Erlang processes exchanging messages to communicate with each 34 | other. We have for example a reader process that reads AMQP frames 35 | from the network. Those frames are transformed into AMQP commands that 36 | are forwarded to the AMQP channel process. If this channel is handling 37 | a publish, it needs to ask a particular exchange for the list of 38 | queues where this message should end up going, which means the channel 39 | will deliver the message to each of those queues. Finally, if the AMQP 40 | message needs to be persisted, the msg_store process will receive it 41 | and write it to disk. So whenever we publish an AMQP message to 42 | RabbitMQ we have the following Erlang message flow[1]: 43 | 44 | ``` 45 | reader -> channel -> queue process -> message store. 46 | ``` 47 | 48 | In order to prevent any of those processes from overflowing the next 49 | one down the chain, we have a credit flow mechanism in place. Each 50 | process initially grants a certain amount of credit to the process that 51 | is sending it messages.
Once a process is able to handle N of 52 | those messages, it will grant more credit to the process that sent 53 | them. Under default credit flow settings (`credit_flow_default_credit` 54 | under `rabbitmq.config`) these values are 200 messages of initial 55 | credit, and after 50 messages processed by the receiving process, the 56 | process that sent the messages will be granted 50 more credits. 57 | 58 | Say we are publishing messages to RabbitMQ; this means the reader will 59 | be sending one Erlang message to the channel process per AMQP 60 | basic.publish received. Each of those messages will consume one of 61 | these credits from the channel. Once the channel is able to process 50 62 | of those messages, it will grant more credit to the reader. So far so 63 | good. 64 | 65 | In turn the channel will send the message to the queue process that 66 | matched the message routing rules. This will consume one credit from 67 | the credit granted by the queue process to the channel. After the 68 | queue process manages to handle 50 deliveries, it will grant 50 more 69 | credits to the channel. 70 | 71 | Finally, if a message is deemed to be persistent (it’s persistent and 72 | published to a durable queue), it will be sent to the message store; 73 | this will also consume credits from the ones granted by 74 | the message store to the queue process. In this case the initial 75 | values are different and handled by the `msg_store_credit_disc_bound` 76 | setting: 2000 messages of initial credit and 500 more credits after 77 | 500 messages are processed by the message store. 78 | 79 | So we know how internal messages flow inside RabbitMQ and when credit 80 | is granted to a process that’s above it in the message stream. The tricky 81 | part comes when credit stops being granted between processes. Under normal 82 | conditions a channel will process 50 messages from the reader, and 83 | then grant the reader 50 more credits, but keep in mind that a channel 84 | is not just handling publishes: it’s also sending messages to 85 | consumers, routing messages to queues and so on. 86 | 87 | What happens if the reader is sending messages to the channel at a 88 | higher speed than what the channel is able to process? If we reach this 89 | situation, then the channel will block the reader process, which will 90 | result in producers being throttled down by RabbitMQ. Under default 91 | settings, the reader will be blocked once it has sent 200 messages to the 92 | channel while the channel has not been able to process at least 50 of them, 93 | which is what is required to grant credit back to the reader. 94 | 95 | Again, under normal conditions, once the channel manages to go through 96 | the message backlog, it will grant more credit to the reader, but 97 | there’s a catch. What if the channel process is being blocked by the 98 | queue process, for similar reasons? Then the new credit that was 99 | supposed to go to the reader process will be deferred. The reader 100 | process will remain blocked. 101 | 102 | Once the queue process manages to go through the deliveries backlog 103 | from the channel, it will grant more credit to the channel, unblocking 104 | it, which will result in the channel granting more credit to the 105 | reader, unblocking it. Once again, that’s under normal conditions, 106 | but, you guessed it, what if the message store is blocking the queue 107 | process? Then credit to the channel will be deferred, so the channel will 108 | remain blocked, deferring credit to the reader and leaving the reader 109 | blocked.
At some point, the message store will grant credit to the 110 | queue process, which will grant credit back to the channel, and then 111 | the channel will finally grant credit to the reader and unblock the 112 | reader: 113 | 114 | ``` 115 | reader <--[grant]-- channel <--[grant]-- queue process <--[grant]-- message store 116 | ``` 117 | 118 | Having one channel and one queue process makes things easier to 119 | understand, but it might not reflect reality. It’s common for RabbitMQ 120 | users to have more than one channel publishing messages on the same 121 | connection. Even more common is to have one message being routed to 122 | more than one queue. What happens with the credit flow scheme we’ve 123 | just explained is that if one of those queues blocks the channel, then 124 | the reader will be blocked as well. 125 | 126 | The problem is that from a reader standpoint, when we read a frame 127 | from the network, we don’t even know which channel it belongs 128 | to. Keep in mind that channels are a logical concept on top of AMQP 129 | connections. So even if a new AMQP command will end up in a channel 130 | that is not blocking the reader, the reader has no way of knowing 131 | it. Note that we only block publishing connections; consumer 132 | connections are unaffected, since we want consumers to drain messages 133 | from queues. This is a good reason why it might be better to have 134 | connections dedicated to publishing messages, and connections 135 | dedicated to consuming only. 136 | 137 | In a similar fashion, whenever a channel is processing message 138 | publishes, it doesn’t know where messages will end up going until it 139 | performs routing. So a channel might be receiving a message that 140 | should end up in a queue that is not blocking the channel. Since at 141 | ingress time we don’t know any of this, the credit flow strategy 142 | in place is to block the reader until processes down the chain are 143 | able to handle new messages. 144 | 145 | One of the new settings introduced in RabbitMQ 3.5.5 is the ability to 146 | modify the values for `credit_flow_default_credit`. This setting takes 147 | a tuple of the form `{InitialCredit, 148 | MoreCreditAfter}`. `InitialCredit` is set to 200 by default, and 149 | `MoreCreditAfter` is set to 50. Depending on your particular workload, 150 | you need to decide if it’s worth bumping those values. Let’s see the 151 | message flow scheme again: 152 | 153 | ``` 154 | reader -> channel -> queue process -> message store. 155 | ``` 156 | 157 | Bumping the values for `{InitialCredit, MoreCreditAfter}` will mean 158 | that at any point in that chain we could end up with more messages 159 | than the broker can handle at that particular point 160 | in time. More messages means more RAM usage. The same can be said 161 | about `msg_store_credit_disc_bound`, but keep in mind that there’s 162 | only one message store[2] per RabbitMQ instance, and there can be many 163 | channels sending messages to the same queue process. So while a queue 164 | process has a value of 2000 as InitialCredit from the message store, 165 | that queue can be ingesting many times that value from different 166 | channel/connection sources. So 200 credits as the initial 167 | `credit_flow_default_credit` value could be seen as too conservative, 168 | but you need to determine whether, according to your workload, that’s still 169 | good enough or not.
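

As a back-of-the-envelope illustration of the `{InitialCredit, MoreCreditAfter}` bookkeeping described above, here is a toy sketch. This is not the real `credit_flow` module; it only models the counting scheme with the default `{200, 50}` values:

```erlang
%% Toy model of the {InitialCredit, MoreCreditAfter} scheme: the sender
%% spends one credit per message and blocks at zero; the receiver grants
%% 50 credits back after every 50 messages it manages to process.
-module(toy_credit).
-export([initial_credit/0, send/1, received/1]).

-define(INITIAL_CREDIT, 200).
-define(MORE_CREDIT_AFTER, 50).

initial_credit() -> ?INITIAL_CREDIT.

%% Sender side: one credit per message sent; at zero the sender must
%% block until the receiver grants more credit.
send(0)      -> blocked;
send(Credit) -> {ok, Credit - 1}.

%% Receiver side: count processed messages, granting credit back to the
%% sender after every ?MORE_CREDIT_AFTER of them.
received(Seen) when Seen + 1 >= ?MORE_CREDIT_AFTER ->
    {grant, ?MORE_CREDIT_AFTER, 0};
received(Seen) ->
    {no_grant, Seen + 1}.
```

The blocking chains described above fall out of this scheme: a process that cannot get its own messages processed downstream never reaches the point where it would grant credit upstream.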
170 | 171 | ## Message Paging ## 172 | 173 | Let’s take a look at how RabbitMQ queues store messages. When a 174 | message enters the queue, the queue needs to determine if the message 175 | should be persisted or not. If the message has to be persisted, then 176 | RabbitMQ will do so right away[3]. Now even if a message was persisted 177 | to disk, this doesn’t mean the message got removed from RAM, since 178 | RabbitMQ keeps a cache of messages in RAM for fast access when 179 | delivering messages to consumers. Whenever we are talking about paging 180 | messages out to disk, we are talking about what RabbitMQ does when it 181 | has to send messages from this cache to the file system. 182 | 183 | When RabbitMQ decides it needs to page messages to disk, it will call 184 | the function `reduce_memory_use` on the internal queue implementation in 185 | order to send messages to the file system. Messages are going to be 186 | paged out in batches; how big those batches are depends on the current 187 | memory pressure status. It basically works like this: 188 | 189 | The function `reduce_memory_use` will receive a number called target 190 | ram count which tells RabbitMQ that it should try to page out messages 191 | until only that many remain in RAM. Keep in mind that whether messages 192 | are persistent or not, they are still kept in RAM for fast delivery to 193 | consumers. Only when memory pressure kicks in are messages in 194 | memory paged out to disk. Quoting from our code comments: “The 195 | question of whether a message is in RAM and whether it is persistent 196 | are orthogonal”. 197 | 198 | The messages that are accounted for during this chunk 199 | calculation are those that are in RAM (in the aforementioned 200 | cache), plus the pending acks that are kept in RAM (i.e. 201 | messages that were delivered to consumers and are pending 202 | acknowledgment). If we have 20000 messages in RAM (cache + pending 203 | acks) and the target ram count is set to 8000, we will have to page 204 | out 12000 messages. This means paging will receive a quota of 12000 205 | messages. Each message paged out to disk will consume one unit from 206 | that quota, whether it’s a pending ack or an actual message from the 207 | cache. 208 | 209 | Once we know how many messages need to be paged out, we need to decide 210 | where we should page them from first: pending acks, or the message 211 | cache. If pending acks are growing faster than the message cache, i.e. 212 | more messages are being delivered to consumers than are being 213 | ingested, the algorithm will try to page out pending acks 214 | first, and then try to push messages from the cache to the file 215 | system. If the cache is growing faster than pending acks, then 216 | messages from the cache will be pushed out first. 217 | 218 | The catch here is that paging messages from pending acks (or the cache 219 | if that comes first) might result in the first part of the process 220 | consuming all the quota of messages that need to be pushed to disk. So 221 | if pending acks push 12000 acks to disk, as in our example, this 222 | means we won’t page out messages from the cache, and vice versa. 223 | 224 | This first part of the paging process sent a certain amount of 225 | messages to disk (between acks and messages paged from the cache). The messages 226 | that were paged out just had their contents paged out, but their 227 | position in the queue is still in RAM.
Now the queue needs to decide 228 | if this extra information that’s kept in RAM needs to be paged out as 229 | well, to further reduce memory usage. Here is where 230 | `msg_store_io_batch_size` finally comes into play (coupled with 231 | `msg_store_credit_disc_bound` as well). Let’s try to understand how 232 | they work. 233 | 234 | The settings for `msg_store_credit_disc_bound` affect how internal 235 | credit flow is handled when sending messages to disk. The 236 | `rabbitmq_msg_store` module implements a database that takes care of 237 | persisting messages to disk. Some details about the whys of this 238 | implementation can be found here: RabbitMQ, backing stores, databases 239 | and disks. 240 | 241 | The message store has a credit system for each of the clients that 242 | send writes to it. Every RabbitMQ queue would be a read/write client 243 | for this store. This credit mechanism prevents a 244 | particular writer from overflowing its inbox with messages. Assuming 245 | current default values, when a writer starts talking to the message 246 | store, it receives an initial credit of 2000 messages, and it will 247 | receive more credit once 500 messages are processed. When is this 248 | credit consumed then? Credit is consumed whenever we write to the 249 | message store, but that doesn’t happen for every message. The plot 250 | thickens. 251 | 252 | Since version 3.5.0 it’s possible to embed small messages into the 253 | queue index, instead of having to reach the message store for 254 | that. Messages that are smaller than a configurable setting (currently 255 | 4096 bytes) will go to the queue index when persisted, so those 256 | messages won’t consume this credit. Now, let’s see what happens with 257 | messages that do need to go to the message store. 258 | 259 | Whenever we publish a message that’s determined to be persistent 260 | (persistent messages published to a durable queue), that message 261 | will consume one of these credits. If a message has to be paged out to 262 | disk from the cache mentioned above, it will also consume one 263 | credit. So if during message paging we consume more credits than those 264 | currently available for our queue, the first half of the paging 265 | process might stop, since there’s no point in sending writes to the 266 | message store when it won’t accept them. This means that from the 267 | initial quota of 12000 that we would have had to page out, we only 268 | managed to process 2000 of them (assuming all of them need to go to 269 | the message store). 270 | 271 | So we managed to page out 2000 messages, but we still keep their 272 | positions in the queue in RAM. Now the paging process will determine if 273 | it needs to page out any of these message positions to disk as 274 | well. RabbitMQ will calculate how many of them can stay in RAM, and 275 | then it will try to page out the remainder to disk. For this 276 | second paging to happen, the number of messages that have to be paged 277 | to disk must be greater than `msg_store_io_batch_size`. The bigger 278 | this number is, the more message positions RabbitMQ will keep in RAM, 279 | so again, depending on your particular workload, you need to tune this 280 | parameter as well. 281 | 282 | Another thing we improved significantly in 3.5.5 is the performance of 283 | paging queue index contents to disk. If your messages are generally 284 | smaller than `queue_index_embed_msgs_below`, then you’ll see the 285 | benefit of these changes.
These changes also affect how message 286 | positions are paged out to disk, so you should see improvements in 287 | this area as well. So while having a low `msg_store_io_batch_size` 288 | might mean the queue index will have more work paging to disk, keep in 289 | mind this process has been optimized. 290 | 291 | ## Queue Mirroring ## 292 | 293 | To keep the descriptions above a bit simpler, we avoided bringing 294 | queue mirroring into the picture. Credit flow also affects mirroring 295 | from a channel's point of view. When a channel delivers AMQP messages to 296 | queues, it sends the message to each mirror, consuming one credit from 297 | each mirror process. If any of the mirrors is slow to process the 298 | message, then that particular mirror might be responsible for the 299 | channel being blocked. If the channel is being blocked by a mirror, 300 | and that queue mirror gets partitioned from the network, then the 301 | channel will be unblocked only after RabbitMQ detects the mirror's 302 | death. 303 | 304 | Credit flow also takes part when synchronising mirrored queues, but 305 | this is something you shouldn’t care too much about, mostly because 306 | there’s nothing you can do about it, since mirror synchronisation is 307 | handled entirely by RabbitMQ. 308 | 309 | ## Footnotes ## 310 | 311 | 1. A message can be delivered to more than one queue process. 312 | 2. There are two message stores, one for transient messages and one for persistent messages. 313 | 3. RabbitMQ will call fsync every 200 ms. 314 | -------------------------------------------------------------------------------- /deliver_to_queues.md: -------------------------------------------------------------------------------- 1 | # Deliver To Queues # 2 | 3 | In this document we will be going over quite a few RabbitMQ modules, 4 | since a message crosses all of these once it enters the broker: 5 | 6 | ``` 7 | rabbit_reader -> rabbit_channel -> rabbit_amqqueue -> delegate -> rabbit_amqqueue_process -> rabbit_backing_queue 8 | ``` 9 | 10 | Let's see this process in more detail. 11 | 12 | The process of delivering messages to queues starts during 13 | `basic.publish`, right after the channel receives the result from 14 | calling `rabbit_exchange:route/2`. 15 | 16 | First we need to look up the list of `#amqqueue` records based on the 17 | destinations obtained from `route/2`. These records will be passed to 18 | the function `rabbit_amqqueue:deliver/2`, where they will be used to 19 | obtain the _pids_ of the queue processes where the message is going to 20 | be delivered. Once the master and slave pids have been obtained, 21 | the message can start making its way to a queue process, 22 | which consists of two parts: accounting for credit flow, and casting 23 | the message into the queue process. 24 | 25 | If the message delivery arrived with `flow = true`, then `credit_flow` 26 | must be accounted for this message: one credit for each master pid 27 | where the message should arrive, plus one credit for each slave pid 28 | that receives the message. 29 | 30 | Then the message delivery will be sent to master pids and slave pids, 31 | via the `delegate` framework. The Erlang message will have this shape: 32 | 33 | ```erlang 34 | {deliver, %% message tag 35 | Delivery, %% The Delivery record 36 | SlaveWhenPublished} %% The Pid that received the message, was it a 37 | %% slave when the delivery was published?
This is 38 | %% used in case of slave promotion 39 | ``` 40 | 41 | You can learn more about the delegate framework 42 | [here](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit_common/src/delegate.erl#L10). 43 | 44 | ## AMQQueue Process Message Handling ## 45 | 46 | At this point the message delivery will finally arrive at the queue 47 | process, implemented as a `gen_server2` callback inside the 48 | `rabbit_amqqueue_process` module. The message from the delegate 49 | framework will be received by the `handle_cast/2` callback. This 50 | callback will ack the `credit_flow` issued above, and it will 51 | monitor the message sender. The message sender is usually the 52 | `rabbit_channel` process that received the publish. This pid is tracked using 53 | the 54 | [pmon module](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit_common/src/pmon.erl). The 55 | state is kept as part of the `senders` field in the gen_server state 56 | record. Once the message sender is accounted for, the delivery is 57 | passed to the function `deliver_or_enqueue/3`. That is where the 58 | message will either be sent to a consumer or enqueued into the backing 59 | queue. 60 | 61 | ### Mandatory Message Handling ### 62 | 63 | The first thing `deliver_or_enqueue/3` does is to account for the 64 | mandatory flag of the delivery. If the message was published as 65 | mandatory, then at this point the queue process will consider the 66 | message as routed to the queue. To that effect, the queue process will 67 | cast the message `{mandatory_received, MsgSeqNo}` to the channel pid 68 | that received the delivery. The channel process will proceed to 69 | forget the message, since from the point of view of mandatory message 70 | handling, there isn't anything left to do for that particular 71 | delivery. 72 | 73 | Take a look at the 74 | [mandatory message handling guide](./mandatory_message_handling.md) for 75 | more info. 76 | 77 | ### Message Confirm Handling ### 78 | 79 | When handling confirms we need to take two things into account: whether the 80 | queue is durable, and whether the message was published as persistent. If that's 81 | the case, then the queue process will keep track of the `MsgId` in 82 | order to later confirm the message back to the channel that received 83 | it from a producer. To achieve that, the queue process keeps track of 84 | a dictionary in the process state, using the `msg_id_to_channel` record 85 | field to hold it. As the name of the field implies, this dictionary 86 | maps _msg ids_ to _channels_. When a message is finally persisted to 87 | disk by the backing queue, the BQ will notify the queue process, 88 | which will send the confirm back to the channel using the 89 | `msg_id_to_channel` dictionary just mentioned. 90 | 91 | If the queue was non-durable, or the message was published as 92 | transient, then the queue process will proceed to issue a confirm back 93 | to the channel that sent the message in. 94 | 95 | The function `rabbit_misc:confirm_to_sender/2` is the one taking care 96 | of sending confirms back to channels. 97 | 98 | Take a look at the 99 | [publisher confirm handling guide](./publisher_confirms.md) for more info. 100 | 101 | ### Check for Message Duplicates ### 102 | 103 | The next step is to check if the message has been seen by the queue 104 | before. If the backing queue responds that the message is a duplicate, 105 | then processing stops right here, since there's nothing left to do 106 | for this delivery, so `deliver_or_enqueue/3` simply returns.
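

Putting the steps above together, the top of `deliver_or_enqueue/3` can be pictured roughly as follows. The helper names here are hypothetical and the real `rabbit_amqqueue_process` code threads quite a bit more state; this is only a schematic of the order of operations described in this guide:

```erlang
%% Schematic only: hypothetical helper names, not the actual code.
deliver_or_enqueue(Delivery = #delivery{message = Message}, Delivered, State) ->
    %% 1. mandatory handling: tell the channel the message was routed
    State1 = maybe_notify_mandatory(Delivery, State),
    %% 2. confirm handling: track the MsgId or confirm immediately
    State2 = maybe_track_confirm(Delivery, State1),
    %% 3. duplicate check: a duplicate leaves nothing else to do
    case is_duplicate(Message, State2) of
        {true, State3}  -> State3;
        {false, State3} ->
            %% 4. try a consumer first, enqueue otherwise (see below)
            attempt_delivery_or_enqueue(Delivery, Delivered, State3)
    end.
```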
107 | 108 | ### Attempt to Deliver the Message to a Consumer ### 109 | 110 | To try to send the message delivery to a consumer, the function 111 | `attempt_delivery/4` is called. This function will in turn call 112 | `rabbit_queue_consumers:delivery/3` which takes a `FetchFun`, the 113 | `QueueName`, and the `Consumers State` for this particular queue. The 114 | Fetch Fun will return the message that will be delivered to the 115 | consumer (if a consumer is available). This function deals with 116 | message acknowledgment from the point of view of the queue. If the 117 | consumer is in `ackmode = true`, then the message will be 118 | `publish_delivered` into the backing queue; otherwise the message will 119 | be discarded. 120 | 121 | Discarding a message involves confirming the message, in case that's 122 | required for this particular delivery, and telling the backing queue 123 | to discard it as well. 124 | 125 | Once the queue has attempted to deliver the message straight to a 126 | consumer, it will call the function `maybe_notify_decorators/2`, which 127 | takes care of telling the queue decorators that the consumer state 128 | might have changed. See the [queue decorators](./queue_decorators.md) 129 | guide for more information on how decorators work. 130 | 131 | `attempt_delivery/4` will then return to the 132 | `deliver_or_enqueue/3` function, telling it whether the message was 133 | `delivered` or is still `undelivered`. If the message was 134 | delivered to a consumer, then there's nothing else to do, and 135 | `deliver_or_enqueue/3` will simply return. Otherwise there's still 136 | more to do. 137 | 138 | ### Handling Undelivered Messages ### 139 | 140 | When handling undelivered messages, there's a special case that can 141 | be considered an optimization. If the queue has a 142 | [TTL](https://www.rabbitmq.com/ttl.html) of 0, and no 143 | [DLX](https://www.rabbitmq.com/dlx.html) has been set up, then there 144 | is no point in queueing this message, so it can be discarded in the 145 | same way as explained above. 146 | 147 | If a message cannot be discarded, then it has to be enqueued, so the 148 | queue process will `publish` the message into the backing queue. After 149 | the message has been published, we need to enforce the various 150 | policies that might apply to this queue, like `max-length` for 151 | example. This means we need to see if the queue head has to be 152 | dropped. Once that's enforced, we also have to check if we need 153 | to drop expired messages. Both these functions work in conjunction 154 | with the DLX feature mentioned above. At this point 155 | `deliver_or_enqueue/3` returns. 156 | 157 | ## Bookkeeping ## 158 | 159 | Even though we are done with the delivery once it has been handled by the 160 | respective queue processes where it was sent, we still need to perform 161 | some bookkeeping on the channel side. The `rabbit_amqqueue:deliver/2` 162 | function will return a list of `QPids` that received the 163 | message. This list of pids will now be used for bookkeeping. 164 | 165 | ### Queue Monitoring ### 166 | 167 | The first thing to do is to monitor the queue pids to which the 168 | message was delivered. This is done, among other things, to account for 169 | credit flow in case the queue goes down. We don't want to block the 170 | channel forever if a queue that's blocking it is actually down.
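

In plain Erlang terms, this monitoring follows the usual monitor/`'DOWN'` pattern. A minimal sketch of its shape (the helper and record field names here are approximations, not the exact channel code):

```erlang
%% Monitor every queue pid the message was delivered to, using pmon.
monitor_delivering_queues(QPids, State = #ch{queue_monitors = QMons}) ->
    State#ch{queue_monitors = pmon:monitor_all(QPids, QMons)}.

%% When a monitored queue dies, release any credit_flow state attributed
%% to it and let mandatory/confirm bookkeeping react to the crash.
handle_info({'DOWN', _MRef, process, QPid, Reason}, State) ->
    credit_flow:peer_down(QPid),
    {noreply, handle_publishing_queue_down(QPid, Reason, State)}.
```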
171 | 172 | Take a look at the `handle_info` channel callback for the case when a 173 | `DOWN` message is 174 | [received](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/rabbit_channel.erl#L818). 175 | 176 | ### Process Mandatory Messages ### 177 | 178 | Here, if the message wasn't delivered to any queue, it's time to 179 | issue a `basic.return` back to the publisher that sent it. If the 180 | message was delivered to queues, then those `QPids` will be kept in 181 | a dictionary for later processing. 182 | 183 | As explained above, once the queue process receives the message 184 | delivery, it will take care of updating the `mandatory` 185 | dictionary on the channel's state. 186 | 187 | ### Process Confirms ### 188 | 189 | As with mandatory messages, if the message wasn't routed to 190 | any queue, then it's time to record the message as confirmed. If the 191 | message was delivered to some queues, then it will be tracked as 192 | unconfirmed until the queue updates the message status. 193 | 194 | ### Stats Update ### 195 | 196 | The final step for the channel is to account for stats, so it will 197 | update the exchange stats, indicating that a message has been routed, 198 | and then it will also update the queue stats, to indicate that a 199 | message was delivered to this or that queue. 200 | 201 | ## Summary ## 202 | 203 | Delivering a message to a RabbitMQ queue is quite an involved process, 204 | and we didn't even touch on queue mirroring! The main things to 205 | account for when handling a delivery are mandatory messages and 206 | message confirms. Both have to be handled accordingly, and the whole 207 | process is coordinated between the channel process and the queue 208 | process that receives the message. Other than that, the queue needs to 209 | see if the message can be delivered to a consumer or if it has to be 210 | enqueued for later. Once this is handled, the queue needs to enforce 211 | the various policies that can be applied to it, like TTLs, or 212 | max-lengths. 213 | 214 | To understand what happens once a message arrives at a queue, take a 215 | look at the [variable queue](./variable_queue.md) guide. 216 | -------------------------------------------------------------------------------- /exchange_decorators.md: -------------------------------------------------------------------------------- 1 | # Exchange Decorators # 2 | 3 | Exchange decorators are modules implementing a behaviour that lets 4 | you extend existing exchanges. For example, you might want to perform 5 | some actions only when the exchange is created, or deleted, but leave 6 | the whole routing logic to the underlying exchange. 7 | 8 | Decorators are usually associated with exchanges via policies. 9 | 10 | See the `active_for/1` 11 | [callback](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/rabbit_exchange_decorator.erl#L70) 12 | to understand which functions on the exchange would be decorated. 13 | 14 | Take a look at the 15 | [Sharding Plugin](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbitmq_sharding/src/rabbit_sharding_exchange_decorator.erl) 16 | and the 17 | [Federation Plugin](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbitmq_federation/src/rabbit_federation_exchange.erl) 18 | to see how exchange decorators are implemented.
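

As a rough skeleton, a lifecycle-only decorator could look like the following. The callback names and arities are taken from the `rabbit_exchange_decorator` behaviour as of the 3.x series; treat the exact set as indicative and check the behaviour module linked above for the authoritative one:

```erlang
%% A minimal no-op exchange decorator sketch.
-module(my_noop_decorator).
-behaviour(rabbit_exchange_decorator).

-export([description/0, serialise_events/1, create/2, delete/3,
         policy_changed/2, add_binding/3, remove_bindings/3,
         route/2, active_for/1]).

description() -> [{description, <<"no-op exchange decorator">>}].

serialise_events(_X) -> false.

%% Lifecycle hooks: a real decorator could start or stop helper
%% processes here when the exchange is created or deleted.
create(_Tx, _X)                   -> ok.
delete(_Tx, _X, _Bs)              -> ok.
policy_changed(_OldX, _NewX)      -> ok.
add_binding(_Serial, _X, _B)      -> ok.
remove_bindings(_Serial, _X, _Bs) -> ok.

%% Extra destinations contributed by the decorator; none here.
route(_X, _Delivery) -> [].

%% 'noroute' means: decorate the lifecycle callbacks but leave the
%% routing logic entirely to the underlying exchange.
active_for(_X) -> noroute.
```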
19 | -------------------------------------------------------------------------------- /interceptors.md: -------------------------------------------------------------------------------- 1 | # Interceptors # 2 | 3 | Interceptors are modules implementing a behaviour that allows plugin 4 | authors to intercept and modify AMQP methods before they are handled 5 | by the channel process. They were originally created for the 6 | development of the 7 | [Sharding Plugin](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbitmq_sharding/README.extra.md#intercepted-channel-behaviour) 8 | to facilitate mapping queue names as specified by users vs. the actual 9 | names used by sharded queues. Another plugin using interceptors is the 10 | [Message Timestamp Plugin](https://github.com/rabbitmq/rabbitmq-message-timestamp) 11 | which injects timestamps into message properties during 12 | `basic.publish`. 13 | 14 | An interceptor must implement the `rabbit_channel_interceptor` 15 | behaviour. The most important callback is `intercept/3`, where an 16 | interceptor will be provided with the original AMQP method record that 17 | the channel should process, the AMQP method content, if any, and the 18 | interceptor state (see 19 | [init/1](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/rabbit_channel_interceptor.erl#L36)). This 20 | callback should take the AMQP method that was passed to it, and the 21 | content, and modify them accordingly. For example, the Sharding Plugin 22 | will receive a `basic.consume` method for a sharded queue called 23 | `my_queue`, and it will map that name to the appropriate shard for the 24 | client that issued the `basic.consume` command, for example: 25 | `sharding: my_queue - 1`. This means that if the channel received the 26 | following record: 27 | 28 | ```erlang 29 | #'basic.consume'{queue = <<"my_queue">>} 30 | ``` 31 | 32 | Then the interceptor will pass back the following transformed method: 33 | 34 | ```erlang 35 | #'basic.consume'{queue = <<"sharding: my_queue - 1">>} 36 | ``` 37 | 38 | This process is transparent to the user and to the channel 39 | code. There's no need for RabbitMQ core developers to add extra 40 | functionality to the `rabbit_channel` in order to support sharding, 41 | since the interceptors take care of that. For example, if we need to 42 | inject a timestamp into each message that crosses the broker, instead 43 | of modifying the `rabbit_channel` code to do that, we can just create 44 | a new interceptor for the `basic.publish` method, and inject the 45 | desired timestamp there. 46 | 47 | Interceptors can do more than just modify AMQP methods; they can 48 | also forbid access to them. A good example is again the Sharding 49 | Plugin. If we have a sharded queue called `my_queue`, then it won't 50 | make much sense to allow users to declare queues with that name, so 51 | the sharding interceptor also intercepts the `queue.declare` method, 52 | but in this case, if the queue name provided matches that of a sharded 53 | queue, then a channel error is produced. 54 | 55 | Keep in mind that while we can enable several interceptors, only one 56 | interceptor can intercept a particular AMQP method; otherwise we would 57 | need to define interceptor priorities, plus a way to merge the 58 | results of their invocations.
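

To make this concrete, here is a rough sketch of a timestamping interceptor in the spirit of the Message Timestamp Plugin mentioned above. The callback set follows the `rabbit_channel_interceptor` behaviour; the module name and description are made up, and the actual content manipulation is elided since it depends on content codec helpers:

```erlang
%% Sketch of a basic.publish interceptor; see the behaviour module for
%% the authoritative callback signatures.
-module(my_timestamp_interceptor).
-behaviour(rabbit_channel_interceptor).

-export([description/0, intercept/3, applies_to/0, init/1]).

init(_Ch) -> undefined.

description() -> [{description, <<"adds a timestamp to published messages">>}].

%% Only basic.publish is intercepted; remember that only one enabled
%% interceptor may claim a given AMQP method.
applies_to() -> ['basic.publish'].

intercept(#'basic.publish'{} = Method, Content, _State) ->
    %% A real implementation would decode the content properties here,
    %% set the timestamp field, and re-encode them before returning.
    {Method, Content}.
```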
59 | 60 | ## Enabling Interceptors ## 61 | 62 | To enable interceptors, they have to be registered with the 63 | [rabbit_registry](./rabbit_registry.md), via a 64 | [boot step](./boot_steps.md): 65 | 66 | ```erlang 67 | -rabbit_boot_step({?MODULE, 68 | [{description, "sharding interceptor"}, 69 | {mfa, {rabbit_registry, register, 70 | [channel_interceptor, 71 | <<"sharding interceptor">>, ?MODULE]}}, 72 | {cleanup, {rabbit_registry, unregister, 73 | [channel_interceptor, 74 | <<"sharding interceptor">>]}}, 75 | {requires, rabbit_registry}, 76 | {enables, recovery}]}). 77 | ``` 78 | 79 | Once the interceptor is registered, only new channels will use it to 80 | intercept AMQP methods. Channels that were already running won't load 81 | the interceptor. In a similar fashion, if a plugin that provides 82 | interceptors is disabled, only new channels will stop using its 83 | interceptors; channels that were already running will keep using them. 84 | -------------------------------------------------------------------------------- /internal_events.md: -------------------------------------------------------------------------------- 1 | # Internal Events 2 | 3 | This document describes a mechanism RabbitMQ components use to notify 4 | each other of events. Note that this mechanism is not used to transfer 5 | messages between components or nodes. It is also entirely transient. 6 | 7 | ## Overview 8 | 9 | Client connections, channels, queues, consumers, and other parts of the system 10 | naturally generate events. Other parts of the system can be interested 11 | in observing those events. RabbitMQ has a very minimalistic mechanism that 12 | is used for internal event notifications, both within a single node and 13 | across nodes in a cluster. 14 | 15 | For example, when a policy is modified, RabbitMQ needs to apply it 16 | to matching queues and notify the queues that no longer match. 17 | These events are irrelevant to clients and have no relation to messaging 18 | protocols. 19 | 20 | ## Events, Metrics, Stats 21 | 22 | Perhaps the heaviest user of this notification subsystem, known 23 | as `rabbit_event`, is the [management plugin](./metrics_and_management_plugin.md). 24 | The management plugin's metrics are collected in a variety of ways but often 25 | transferred over the internal event subsystem. 26 | 27 | For example, when a connection is accepted, authenticated and access 28 | to the target virtual host is authorised, it will emit an event of type 29 | `connection_created`. When a connection is closed or fails for any reason, 30 | a `connection_closed` event is emitted. Events from connections, channels, queues, consumers, 31 | and so on are processed and stored as metrics 32 | to be later served over HTTP to the management plugin UI application. 33 | 34 | 35 | ## rabbit_event 36 | 37 | Both internal event publishers and consumers interact with the notification 38 | subsystem using a single module, `rabbit_event`. Publishers typically 39 | use the `rabbit_event:notify/2` function; consumers register 40 | [gen_event](https://learnyousomeerlang.com/event-handlers) event handlers. 41 | 42 | Every event is an instance of the `#event` record. 43 | An event has a name (e.g. `connection_created` or `queue_deleted`), a timestamp and a 44 | dictionary-like data structure (a proplist) for the payload. 45 | 46 | The mechanism is very minimalistic: every event handler receives a copy 47 | of every event and ignores those that are irrelevant to it.
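

For instance, a component emits an event with `rabbit_event:notify(connection_created, Props)`, where `Props` is a proplist, and a consumer installs a `gen_event` handler that pattern matches on the event types it cares about. A minimal sketch of such a handler follows; the `#event{}` record comes from `rabbit.hrl` and the include path is indicative:

```erlang
%% A gen_event handler that reacts to one event type and ignores the rest.
-module(my_event_handler).
-behaviour(gen_event).

-include_lib("rabbit_common/include/rabbit.hrl"). %% defines #event{}

-export([init/1, handle_event/2, handle_call/2, handle_info/2,
         terminate/2, code_change/3]).

init([]) -> {ok, no_state}.

handle_event(#event{type = connection_created, props = Props}, State) ->
    io:format("connection created: ~p~n", [Props]),
    {ok, State};
handle_event(_Event, State) ->
    %% every handler sees every event; irrelevant ones are ignored
    {ok, State}.

handle_call(_Request, State)        -> {ok, not_understood, State}.
handle_info(_Info, State)           -> {ok, State}.
terminate(_Arg, _State)             -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.
```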
48 | 49 | 50 | ## Acting User Details 51 | 52 | Starting with RabbitMQ 3.7.0, internal components try to associate an 53 | acting user with each emitted event, where possible. For example, 54 | if a channel is opened on a connection, the acting user is the user 55 | of that connection. 56 | 57 | In some cases there is no acting user or it cannot be known, for example, 58 | when a connection is forcefully closed via CLI tools. In such cases 59 | dummy usernames are used, e.g. `rmq-internal` or `rmq-cli`. 60 | 61 | 62 | ## rabbitmq-event-exchange Plugin 63 | 64 | [rabbitmq-event-exchange](https://github.com/rabbitmq/rabbitmq-server/tree/master/deps/rabbitmq_event_exchange) is a plugin that consumes internal events 65 | and re-publishes them to a topic exchange, thus exposing the events 66 | to clients (applications). 67 | 68 | This can be used by monitoring and audit systems. 69 | -------------------------------------------------------------------------------- /mandatory_message_handling.md: -------------------------------------------------------------------------------- 1 | # Mandatory Message Handling # 2 | 3 | When we publish a message with the mandatory flag on, the broker 4 | must notify the publisher, via the `basic.return` AMQP command, if 5 | the message is not routed to any queue. In this guide we will 6 | see how the channel handles mandatory messages. 7 | 8 | ## Tracking Mandatory Messages ## 9 | 10 | Note: The implementation was changed in 11 | [3.8.0](https://github.com/rabbitmq/rabbitmq-server/pull/1831), 12 | removing the dtree structure described below. 13 | 14 | Mandatory messages are tracked in the `mandatory` field of the 15 | channel's state record. Messages are tracked using our own 16 | [dtree](https://github.com/rabbitmq/rabbitmq-server/blob/v3.7.28/src/dtree.erl) 17 | data structure. As explained in that module's documentation, entries in 18 | the _dual-index tree_ are stored using a primary key, a set of 19 | secondary keys, and a value. In the case of tracking mandatory 20 | messages we have: 21 | 22 | - primary key: the `MsgSeqNo` assigned to the message by the channel 23 | - secondary keys: the list of queue pids where the message was routed 24 | - value: the actual message 25 | 26 | Keep in mind that delivering the message to queues is an asynchronous 27 | operation; this means that in the `deliver_to_queues/2` function 28 | inside the channel, we just know to which queues the message was 29 | routed, but this doesn't mean it has arrived there. Therefore, only 30 | when a queue has accepted the message can we forget about it and 31 | consider it as routed. 32 | 33 | ## Forgetting About Mandatory Messages ## 34 | 35 | Once a queue has received a message (in other words, the message was 36 | successfully routed to the queue), the queue process will cast the 37 | following message back to the channel pid: 38 | 39 | ``` 40 | {mandatory_received, MsgSeqNo} 41 | ``` 42 | 43 | The channel will then proceed to use the `MsgSeqNo` to forget about 44 | the mandatory message it was tracking, by deleting it from the 45 | `mandatory` dtree in the channel state. 46 | 47 | Keep in mind that mandatory messages only require that a return is 48 | sent in case the message is unroutable, so it's safe to forget about 49 | it once the message has been routed to a queue. 50 | 51 | ## Sending Returns ## 52 | 53 | If a mandatory message cannot be routed to any queue, then we need to 54 | send `basic.return`s back to the publisher.
This is done in two 55 | different places, responding to different situations. 56 | 57 | The first one is the obvious one: if the message is not routed to any 58 | queue, then we can safely send a `basic.return` back. Take a look at 59 | the function `rabbit_channel:process_routing_mandatory/5` for more details. 60 | 61 | The other situation arises when a queue to which we have just published 62 | messages crashes before it is able to receive the message. As 63 | explained in the [deliver to queues](./deliver_to_queues.md) guide, we 64 | monitor the QPids where the message was delivered. If the monitor 65 | reports that the queue has crashed, then we will send a `basic.return` 66 | for all the messages that were delivered to the queues that 67 | crashed. Take a look at the function 68 | `rabbit_channel:handle_publishing_queue_down/3` for more information. 69 | 70 | ## Related guides ## 71 | 72 | - [basic publish](./basic_publish.md) 73 | - [deliver to queues](./deliver_to_queues.md) 74 | -------------------------------------------------------------------------------- /metrics_and_management_plugin.md: -------------------------------------------------------------------------------- 1 | # Metrics and Management Plugin Architecture (3.6.7+) 2 | 3 | This document describes key implementation aspects of the [RabbitMQ management plugin](https://www.rabbitmq.com/management.html) 4 | starting with version 3.6.7. Earlier versions of the plugin had a substantially different 5 | architecture. 6 | 7 | ## Overview 8 | 9 | Since 3.6.7 the management plugin has been re-designed to spread the memory 10 | used for statistics across the entire RabbitMQ cluster instead of aggregating 11 | it all in a single node. Doing this isn't free. There is a trade-off: higher 12 | metric latency and more processing in exchange for memory stability. 13 | 14 | 15 | ## Components 16 | 17 | There are four main components: 18 | 19 | * [Internal event notifications](./internal_events.md) 20 | * Core metrics 21 | * rabbitmq-management-agent 22 | * rabbitmq-management 23 | 24 | 25 | 26 | ## Core metrics 27 | 28 | Core metrics are implemented in the RabbitMQ server itself, consisting of 29 | a set of ETS tables storing either counters or proplists containing details 30 | or metrics of various entities. The schema of each table is documented in 31 | [rabbit_core_metrics.hrl](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit_common/include/rabbit_core_metrics.hrl) 32 | in `rabbitmq-common`. 33 | 34 | These are mostly counters that are incremented in real time as message interactions occur 35 | in queues, channels, exchanges, etc. 36 | 37 | This replaces the previous approach of emitting events containing metrics 38 | at regular intervals. `created` and `deleted` events are still emitted; 39 | however, `stats` events have been removed. 40 | 41 | Because no unbounded queues are involved, this approach should have a fixed 42 | memory overhead in relation to the number of active entities in the system. 43 | 44 | 45 | 46 | ## Management Agent 47 | 48 | [rabbitmq-management-agent](https://github.com/rabbitmq/rabbitmq-server/tree/master/deps/rabbitmq_management_agent) is responsible for turning core metrics into 49 | data structures suitable for `rabbitmq-management` consumption. This is 50 | done on a per-node basis. There is no inter-node communication involved. 51 | 52 | The management agent runs a set of metrics collector processes. There is one 53 | process per core metrics table.
Each collector periodically reads its associated 54 | core metrics table and performs some table-specific processing which produces 55 | new data points to be inserted into the management metrics tables (defined in 56 | [rabbitmq_mgmt_metrics.hrl](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbitmq_management_agent/include/rabbit_mgmt_metrics.hrl)). 57 | The collection interval is determined by the smallest configured retention interval. 58 | 59 | In addition to the collector processes there is a garbage collection event 60 | handler that handles the `delete` events emitted by the various processes to ensure 61 | stats are completely cleared up. To make this efficient there is also a set of 62 | index tables (see `rabbitmq_mgmt_metrics.hrl`) that allow the gc process to 63 | remove all stats for a particular entity. 64 | 65 | The management agent plugin also hosts the `rabbitmq_mgmt_external_stats` process 66 | that periodically updates the core metrics tables with node-specific stats 67 | (such as free disk space or file descriptors available, data and log directory locations, et cetera). 68 | Arguably this should be moved to the core at some point. 69 | 70 | It is worth noting that the latency of metric processing is now related to the retention 71 | interval and is typically higher than in the previous version. To put it differently, it can 72 | take longer for the stats DB to have up-to-date information after a particular event occurs. 73 | This has no effect on the user, but test suites that use the HTTP API would often 74 | [need adapting](https://github.com/michaelklishin/rabbit-hole/blob/master/bin/ci/before_build.sh#L11). 75 | 76 | 77 | ### exometer_slide 78 | 79 | The [exometer_slide](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbitmq_management_agent/src/exometer_slide.erl) 80 | module is a key part of the management stats processing. 81 | It allows us to reasonably efficiently store a sliding window of incoming metrics 82 | and also perform various processing on this window. It was extracted from the 83 | [exometer_core](https://github.com/Feuerlabs/exometer_core) project but has 84 | since been heavily modified to fit our specific needs. 85 | 86 | One notable addition is the "incremental" slide type that is used to aggregate 87 | data from multiple sources. A typical example would be vhost message rates. 88 | 89 | 90 | ## HTTP API 91 | 92 | The `rabbitmq-management` plugin is now mostly a fairly thin HTTP API layer. 93 | 94 | It also handles the distributed querying and stats merging logic. When a stats 95 | request comes in, the plugin contacts each node in parallel for a set of "raw" 96 | stats (typically `exometer_slide` instances). It uses the [delegate](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit_common/src/delegate.erl) 97 | module for this and has its own `delegate` supervision tree to avoid affecting 98 | the one used for core rabbit delegations. Once stats for 99 | each node have been collected, it merges the data and then processes it 100 | (for example, turning sliding window data points into rates) for API 101 | consumption. Most of this logic is implemented in the `rabbit_mgmt_db` module. 102 | 103 | This distributed querying/merging is arguably the most complex part of the stats 104 | system. 105 | 106 | ### Distributed Querying and Aggregation of Complex Stats 107 | 108 | Because the information returned by the HTTP API is fairly heavily augmented (e.g.
109 | a request for a queue would also contain channel details) we often have to 110 | perform multiple distributed queries in response to a stats request. 111 | For example, to get the channel details for a queue we first have to fetch the 112 | queue stats, inspect the consumers attached to that queue, then query for the 113 | channel details based on the consumer channel. 114 | 115 | There are also inefficiencies when listing entities whose number could 116 | be unbounded (queues, channels, exchanges and connections). 117 | As management allows sorting on almost any stat, including rates, we always 118 | need to fetch _all_ entity stats from each node, merge, sort, then typically 119 | return a smaller page of items to the API. For systems with lots of such 120 | entities this can become very inefficient, as potentially large amounts of data 121 | need to travel between nodes for each request. Therefore all requests that can 122 | return large numbers of entities go through an adaptive cache process that adjusts 123 | its cache expiry time based on how long it took to fetch all that data. This 124 | should provide some degree of protection against excessive entity listings. It 125 | would be prudent to reduce the frequency of these queries if at all possible 126 | in heavily loaded systems. 127 | -------------------------------------------------------------------------------- /mirroring.md: -------------------------------------------------------------------------------- 1 | The essay at the top of rabbit_mirror_queue_coordinator has quite a 2 | decent overview of how mirroring works. In order to avoid repetition, 3 | this will aim to be an even higher-level summary. 4 | 5 | How mirroring works 6 | ------------------- 7 | 8 | In very quick terms: the master is a rabbit_backing_queue (BQ) 9 | implementation "between" the rabbit_amqqueue_process and 10 | rabbit_variable_queue (VQ), while the slave is a full process 11 | implementing most of a queue (again in terms of VQ/BQ). They 12 | communicate via GM. Since the master can't receive messages in its own 13 | right, there is also an associated coordinator process. 14 | 15 | See rabbit_mirror_queue_coordinator for much more. 16 | 17 | How mirroring is controlled 18 | --------------------------- 19 | 20 | Policies call into queues to tell them their policy has changed, 21 | which calls into rabbit_mirror_queue_misc to update mirrors. Each 22 | mirroring mode is an implementation of the behaviour 23 | rabbit_mirror_queue_mode - rmq_misc selects the appropriate rmq_mode, 24 | asks it which nodes should have slaves, and starts and stops slaves as 25 | appropriate. 26 | 27 | Eager synchronisation 28 | --------------------- 29 | 30 | The master and all slaves need to come to a halt while synchronising: 31 | we assume that handling publishes, deliveries or acks while 32 | synchronisation is ongoing is too hard. Therefore although the master 33 | and slaves are gen_servers, they essentially go into 34 | manually-implemented selective "receive" loops while syncing, only 35 | responding to a small set of messages and letting others back up - 36 | flow control will typically stop publishers from publishing too 37 | much. While syncing, the processes do respond to info requests and 38 | emit info messages periodically, so that rabbitmqctl and management do 39 | not become unresponsive and outdated respectively, but otherwise they 40 | are dead to the world.
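

The manually-implemented selective "receive" mentioned above is a standard Erlang pattern: a loop whose `receive` clauses match only the sync-protocol messages (plus info requests), leaving everything else queued up in the mailbox. A schematic sketch with made-up message and function names, not the actual sync code:

```erlang
%% While syncing, only a small set of messages is handled; all others
%% back up in the mailbox until the loop exits (flow control throttles
%% publishers in the meantime).
sync_loop(State) ->
    receive
        {sync_msg, Msg} ->
            sync_loop(handle_sync_msg(Msg, State));
        {info, From} ->
            From ! {info_reply, info(State)},
            sync_loop(State);
        {sync_complete, Result} ->
            {done, Result, State}
    end.
```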
41 | 42 | Because of this, we need to take care not to interfere with the state 43 | of the master too much - leaving it with a different flow control 44 | state or set of monitors than it entered the sync process with would 45 | lead to subtle bugs. The master therefore spawns a local "syncer" 46 | process which handles communication with the slaves to sync them. 47 | 48 | See rabbit_mirror_queue_sync for more details on how exactly the sync 49 | protocol works. 50 | 51 | GM 52 | -- 53 | 54 | The gm.erl module contains an essay on how GM works. Unfortunately 55 | it's not easy to understand; a property which it shares with GM in 56 | general. 57 | 58 | The overall principle is fairly clear: GM processes form a ring, 59 | around which messages are broadcast. Each message goes round twice, 60 | once to publish and once to acknowledge. New members enter the ring at 61 | any point; if a member dies the ring needs to heal and ensure that 62 | messages which might have been lost are sent again. 63 | 64 | The last part is tricky. Much of the complexity in GM is around 65 | knowing which members have what knowledge, and how to bring members up 66 | to speed if one fails. 67 | 68 | Additionally, the fact that ring information travels through two 69 | routes (around the ring itself, and through Mnesia) makes it even 70 | harder to reason about. 71 | 72 | Finally, GM is not at all designed to cope with partial network 73 | partitions: if A is partitioned from B, then B can remove it from 74 | Mnesia, and that information can leak back to A via C. We currently 75 | don't handle this situation well; this is the biggest unsolved 76 | problem in RabbitMQ. 77 | 78 | It might be worth replacing GM altogether with something new. 79 | 80 | If modifying it, be very conservative about even small changes. 81 | -------------------------------------------------------------------------------- /networking_and_connections.md: -------------------------------------------------------------------------------- 1 | # Networking and Connections 2 | 3 | This guide provides an overview of how networking (TCP socket acceptors, listeners) and connections 4 | (for multiple protocols) are implemented in RabbitMQ. 5 | 6 | ## Boot Step(s) 7 | 8 | When RabbitMQ starts, it executes a directed graph of boot steps, which depend on each other. 9 | One of the steps is responsible for starting the networking-related machinery: 10 | 11 | ``` erlang 12 | -rabbit_boot_step({networking, 13 | [{mfa, {rabbit_networking, boot, []}}, 14 | {requires, log_relay}]}). 15 | ``` 16 | 17 | As you can see above, the function that kicks things off is `rabbit_networking:boot/0`. 18 | 19 | ### Listener Tracking 20 | 21 | Before we dive into `rabbit_networking:boot/0`, it should be explained 22 | that listeners are tracked using a Mnesia table, `rabbit_listener`. 23 | 24 | The purpose of tracking listeners is two-fold: 25 | 26 | * to make it possible to stop active listeners during shutdown 27 | * to make it possible to list them e.g. in the management UI 28 | 29 | The table is updated by `rabbit_networking:tcp_listener_started/3` and 30 | `rabbit_networking:tcp_listener_stopped/3`. 31 | 32 | ### Distribution Listener 33 | 34 | Erlang distribution TCP listener is also tracked: `rabbit_networking:boot/0` 35 | uses node name and `erl_epmd:port_please/2` to determine distribution port. 
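

For example, something along these lines can be used to ask the local `epmd` for the distribution port of a node whose short name is `rabbit` (the hostname is just an illustration; `erl_epmd:port_please/2` returns `{port, Port, Version}` or `noport`):

```erlang
%% Ask epmd on the given host for the distribution listener port of
%% the node registered under the name "rabbit".
case erl_epmd:port_please("rabbit", "localhost") of
    {port, Port, _Version} -> {ok, Port};
    noport                 -> {error, not_registered}
end.
```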
36 | 37 | ### Messaging Protocol Listeners 38 | 39 | Every supported protocol usually has one or two listeners: 40 | 41 | * Plain ("X over TCP") 42 | * TLS ("X over TLS") 43 | 44 | Listeners are collected from config file sections of RabbitMQ core 45 | and plugins that provide protocol support (e.g. STOMP). 46 | 47 | `rabbit_networking:boot_tcp/0` and `rabbit_networking:boot_ssl/0` start plain TCP and 48 | TLS listeners, respectively. 49 | 50 | 51 | ## Listener Process Tree 52 | 53 | RabbitMQ as of 3.6.0 uses [Ranch](https://github.com/ninenines/ranch) in [embedded mode](https://github.com/ninenines/ranch/blob/master/doc/src/guide/embedded.asciidoc) 54 | to accept TCP connections. 55 | 56 | A listener is represented by two processes, which are 57 | started under the `tcp_listener_sup` supervisor: 58 | 59 | * `tcp_listener` 60 | * `ranch_listener_sup` 61 | 62 | The former handles listener tracking (see above); the latter is 63 | a [Ranch listener](https://github.com/ninenines/ranch/blob/master/doc/src/guide/listeners.asciidoc) process. 64 | 65 | `tcp_listener_sup` itself is a child of `rabbit_sup`, the top-level 66 | RabbitMQ supervisor. 67 | 68 | Every listener has one or more acceptors (under `ranch_acceptors_sup`) 69 | and a supervisor for accepted client connections (under `ranch_conns_sup`). 70 | 71 | `ranch_conns_sup` supervises client connections, which will be covered in more 72 | detail in the following section. 73 | 74 | 75 | ## AMQP 0-9-1 Connection Process Tree 76 | 77 | Every AMQP 0-9-1 connection has a supervisor, `rabbit_connection_sup`, which is placed under 78 | `ranch_conns_sup` in the process tree. It supervises two processes: 79 | 80 | * `rabbit_reader`: an important module in the protocol, see below 81 | * `rabbit_connection_helper_sup`: supervises helper processes 82 | 83 | So the hierarchy of processes looks like this: 84 | 85 | ``` org-mode 86 | * rabbit_connection_sup 87 | ** rabbit_reader 88 | ** rabbit_connection_helper_sup 89 | *** rabbit_channel_sup_sup 90 | *** heartbeat_receiver 91 | *** heartbeat_sender 92 | *** rabbit_queue_collector 93 | ``` 94 | 95 | ### rabbit_reader 96 | 97 | `rabbit_reader` is one of the key modules in the AMQP 0-9-1 implementation. 98 | 99 | This is a `gen_server`-like process that handles binary data parsing, 100 | authentication and the connection negotiation state machine, and keeps track 101 | of channel processes. 102 | 103 | This module also handles protocol "hand-offs", such as the one to the AMQP 1.0 reader 104 | when an AMQP 1.0 client connects (despite being a completely different protocol, 105 | it uses the same port as AMQP 0-9-1). 106 | 107 | ### Auxiliary processes 108 | 109 | Every connection has several auxiliary processes supervised by 110 | `rabbit_connection_helper_sup`: 111 | 112 | * `rabbit_channel_sup_sup` 113 | * `heartbeat_receiver` (module: `rabbit_heartbeat`) 114 | * `heartbeat_sender` (module: `rabbit_heartbeat`) 115 | * `rabbit_queue_collector` 116 | 117 | #### rabbit_channel_sup_sup 118 | 119 | Top-level supervisor for channels. Every channel is represented by 120 | a group of processes under a supervisor (`rabbit_channel_sup`). 121 | 122 | #### Heartbeat Processes 123 | 124 | The heartbeat implementation uses two processes, one for sending heartbeats 125 | and another for handling client heartbeats. 126 | 127 | See `rabbit_heartbeat:start_heartbeat_sender/4` and `rabbit_heartbeat:stop_heartbeat_sender/4`.
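

As an illustration only (this is not how `rabbit_heartbeat` is structured), a heartbeat sender boils down to a timer loop that emits a heartbeat frame whenever no other traffic went out during the interval. The AMQP 0-9-1 heartbeat frame is a fixed 8-byte sequence: frame type 8, channel 0, zero-length payload, frame-end octet 206:

```erlang
%% Toy heartbeat sender loop; an 'activity' message from the writer
%% signals that some other frame was already sent, so this beat is
%% skipped. Not the actual rabbit_heartbeat implementation.
heartbeat_sender(Sock, IntervalMillis) ->
    receive
        activity -> ok
    after IntervalMillis ->
        ok = gen_tcp:send(Sock, <<8, 0:16, 0:32, 206>>)
    end,
    heartbeat_sender(Sock, IntervalMillis).
```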
128 |
129 | #### Queue Collector
130 |
131 | The queue collector keeps track of exclusive queues, whose lifecycle
132 | is tied to that of their declaring connection. Whenever a connection
133 | is closed or lost, all exclusive queues that belonged to it must be
134 | cleaned up.
135 |
136 | `rabbit_queue_collector` is a `gen_server` which handles the above.
137 |
138 |
139 | ## STOMP Connection Process Tree
140 |
141 | For STOMP, the TCP listener and Ranch supervision tree are similar to
142 | those of AMQP 0-9-1 (see above) except that the `tcp_listener_sup` supervisor
143 | is under `rabbit_stomp_sup`.
144 |
145 | Every STOMP client connection has a supervisor, `rabbit_stomp_client_sup`, which
146 | supervises two processes:
147 |
148 | * `rabbit_stomp_reader`
149 | * `rabbit_stomp_heartbeat_sup`
150 |
151 | `rabbit_stomp_reader` is similar to `rabbit_reader` but also
152 | handles parsed protocol commands (this structure is new in 3.6.0 and
153 | matches the one used by the MQTT plugin).
154 |
155 | Finally, `rabbit_stomp_heartbeat_sup` supervises heartbeat delivery,
156 | reusing `rabbit_heartbeat`.
157 |
158 |
159 | ## MQTT Connection Process Tree
160 |
161 | The MQTT TCP listener and Ranch supervision tree are effectively identical to
162 | those of STOMP (see above).
163 |
164 | Every MQTT client connection has a supervisor, `rabbit_mqtt_connection_sup`,
165 | which supervises two processes:
166 |
167 | * `rabbit_mqtt_reader` combines protocol parsing, state machine, and command handling
168 | * `rabbit_mqtt_keepalive_sup` that handles MQTT keep-alives (heartbeats),
169 | reusing `rabbit_heartbeat`
170 |
--------------------------------------------------------------------------------
/publisher_confirms.md:
--------------------------------------------------------------------------------
1 | # Publisher Confirms #
2 |
3 | Publisher Confirms are a way to tell publishers that messages have
4 | been accepted by the broker and that the broker now takes full
5 | responsibility for the message (i.e. it was written to disk if it was
6 | persistent, replicated if mirroring was enabled, and so on). Take a
7 | look at the
8 | [publisher confirms](https://www.rabbitmq.com/confirms.html)
9 | documentation in order to understand the feature.
10 |
11 | ## Tracking Confirms ##
12 |
13 | Note: the implementation was slightly changed in
14 | [3.8.0](https://github.com/rabbitmq/rabbitmq-server/pull/1893),
15 | replacing the dtree structure described below.
16 |
17 | Confirms work a bit differently from mandatory messages and
18 | `basic.return`. In the case of mandatory messages we only need to send
19 | a `basic.return` if the message can't be routed, but for publisher
20 | confirms we need to send an `ack` if the message was accepted by the
21 | broker and a `nack` if that's not the case. This means we need two
22 | fields in the channel's state record in order to track this
23 | information.
24 |
25 | The first one is a
26 | [dtree](https://github.com/rabbitmq/rabbitmq-server/blob/v3.7.28/src/dtree.erl)
27 | stored in the field `unconfirmed`, which keeps track of the `MsgSeqNo`
28 | associated with the QPids to which the message was delivered and
29 | the Exchange Name used to publish the message. As explained in the
30 | _dtree_ documentation, entries on the _dual-index tree_ are stored
31 | using a primary key, a set of secondary keys, and a value.
In the case
32 | of tracking unconfirmed messages we have:
33 |
34 | - primary key: the `MsgSeqNo` assigned to the message by the channel
35 | - secondary keys: the list of queue pids where the message was routed
36 | - value: the exchange where the message was published
37 |
38 | The second field used when dealing with confirms is called `confirmed`,
39 | which keeps a list of the messages that were delivered to queues and
40 | for which the queues have taken responsibility. The only thing left to
41 | do with these messages is for the channel to send the `acks` back to
42 | the publisher. This list tracks pairs of `{MsgSeqNo, XName}`.
43 |
44 | ## Marking Messages as Confirmed ##
45 |
46 | Once a queue has dealt with a message (for example persisted it to
47 | disk, in the case of persistent messages), it will send confirms
48 | back to the channel. This is done by the QPid by casting the following
49 | message back to the channel:
50 |
51 | ```erlang
52 | {confirm, MsgSeqNos, QPid}
53 | ```
54 |
55 | The channel will deal with this message in the proper `handle_cast/2`
56 | callback, where it will remove the `MsgSeqNos` from the `unconfirmed`
57 | dtree and then call `record_confirms/2`, passing in those
58 | `MsgSeqNos` and the related exchange names that were obtained from the
59 | dtree. Keep in mind that at this point the function `record_confirms/2`
60 | is only adding the messages to the `confirmed` list. The channel
61 | still needs to send the `acks` back to the publisher.
62 |
63 | Another place where confirms are recorded is in the function
64 | `process_routing_confirm/5`. If the message wasn't routed to any
65 | queue, then it will be immediately marked as confirmed by being added
66 | to the `confirmed` list; otherwise it is tracked in the
67 | `unconfirmed` dtree until a queue confirms the message.
68 |
69 | Finally, as explained in the
70 | [deliver to queues](./deliver_to_queues.md) guide, we monitor the
71 | QPids where the message was delivered. If the monitor reports that the
72 | queue has crashed the function `handle_publishing_queue_down/3` will
73 | be called. In the case of confirms there are two cases: if the queue
74 | had an abnormal exit, then `nacks` will be sent for messages that were
75 | routed to that particular queue that just went down. If the queue had
76 | a normal exit, then messages that went to that queue will be marked as
77 | confirmed.
78 |
79 | ## Sending Out Confirms ##
80 |
81 | Confirms or `acks` are sent back to the publisher whenever the channel
82 | processes a request. For example, once it has dealt with an AMQP method
83 | like `basic.publish`, the channel will then send confirms out, if any.
84 |
85 | For more details, take a look at the functions `reply/2`, `noreply/3`
86 | and `next_state/1` inside the `rabbit_channel` module.
87 |
88 | ## Sending Out Nacks ##
89 |
90 | Nacks are sent to publishers when a queue that should have handled the
91 | message has exited with an abnormal reason. Check the function
92 | `handle_publishing_queue_down/3` for more information.
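To tie the above together, here is a minimal sketch of how the `unconfirmed` dtree is driven (illustrative only; the real logic lives in `rabbit_channel`, and `dtree`'s exact API should be checked against the module):

```erlang
-module(confirm_tracking_sketch).
-export([on_publish/4, on_confirm/3]).

%% Publish in confirm mode: track the message under its MsgSeqNo (primary
%% key), the pids of the queues it was routed to (secondary keys) and the
%% exchange name (value).
on_publish(MsgSeqNo, QPids, XName, Unconfirmed) ->
    dtree:insert(MsgSeqNo, QPids, XName, Unconfirmed).

%% A queue casts {confirm, MsgSeqNos, QPid}: remove that queue from the
%% matching entries; entries left with no queues are fully confirmed and
%% come back as {MsgSeqNo, XName} pairs, ready for record_confirms/2.
on_confirm(MsgSeqNos, QPid, Unconfirmed) ->
    dtree:take(MsgSeqNos, QPid, Unconfirmed).
```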
93 |
94 | ## Related guides and documentation ##
95 |
96 | - [publisher confirms](https://www.rabbitmq.com/confirms.html)
97 | - [basic publish](./basic_publish.md)
98 | - [deliver to queues](./deliver_to_queues.md)
99 |
--------------------------------------------------------------------------------
/queue_decorators.md:
--------------------------------------------------------------------------------
1 | # Queue Decorators #
2 |
3 | Queue decorators are modules, implementing a behaviour, that let
4 | you extend queues. A decorator has a set of callbacks that are
5 | called from the queue process in response to events that might
6 | happen at the queue, like when messages are delivered to consumers,
7 | which might cause the list of active consumers to be updated.
8 |
9 | They were added to the broker as a way to handle
10 | [consumer priorities](https://www.rabbitmq.com/consumer-priority.html),
11 | and are used by the federation plugin to know when to move messages across
12 | [federated queues](https://www.rabbitmq.com/federated-queues.html).
13 |
14 | Decorators need to implement the `rabbit_queue_decorator`
15 | [behaviour](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/rabbit_queue_decorator.erl)
16 | and are usually associated with queues via policies.
17 |
18 | A queue decorator can receive notifications of the following events:
19 |
20 | - Queue Startup
21 | - Queue Shutdown
22 | - Consumer State Changed (active consumers)
23 | - Queue Policy Changed
--------------------------------------------------------------------------------
/queues_and_message_store.md:
--------------------------------------------------------------------------------
1 | This file attempts to document the overall structure of a queue, and
2 | how persistence works.
3 |
4 | Each queue is a [gen_server2 Erlang process](https://learnyousomeerlang.com/clients-and-servers). The usual pattern of the API and
5 | implementation being in one file is not applied; `rabbit_amqqueue` is
6 | the API (a module) and `rabbit_amqqueue_process` is the implementation (a `gen_server2`).
7 |
8 | Startup
9 | -------
10 |
11 | The queue's supervisor initially starts the process as
12 | rabbit_prequeue. This is a gen_server which determines whether the
13 | process is an HA slave or a regular queue or master (see HA
14 | documentation), and whether it is starting afresh or needs to
15 | recover. This then uses the gen_server2 "become" mechanism to become
16 | the correct sort of process - for this document we'll deal with
17 | rabbit_amqqueue_process for regular queues.
18 |
19 | The queue process decides for itself what it should be (rather than
20 | having some library function that starts different types of processes)
21 | so that it can do the right thing if it crashes and is restarted by
22 | the supervisor - it might have been started as a master but need to
23 | restart as a slave after crashing, for example. Or vice-versa.
24 |
25 | Sub-modules
26 | -----------
27 |
28 | The queue process probably has the most code running in it of any
29 | process; the rabbit_amqqueue_process has had various subsystems broken
30 | out of it into separate modules over the years. The biggest such
31 | break-out is the queue implementation API, described by the
32 | rabbit_backing_queue behaviour.
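To give a flavour of that API surface, here is an illustrative slice of the behaviour (signatures simplified and types loosened; arities are from 3.6.x-era code and change between versions - the behaviour module itself is authoritative):

```
%% An illustrative slice of the rabbit_backing_queue behaviour (BQ).
%% The state is whatever the implementation wants, e.g. VQ's #vqstate{}.
-callback init(Queue :: term(), Recovery :: term(),
               AsyncCallback :: fun()) -> State :: term().
-callback publish(Msg :: term(), MsgProps :: term(),
                  IsDelivered :: boolean(), ChPid :: pid(),
                  Flow :: term(), State :: term()) -> term().
-callback fetch(AckRequired :: boolean(), State :: term()) ->
    {FetchResult :: term(), State1 :: term()}.
-callback ack(AckTags :: [term()], State :: term()) ->
    {MsgIds :: [term()], State1 :: term()}.
-callback len(State :: term()) -> non_neg_integer().
-callback is_empty(State :: term()) -> boolean().
-callback purge(State :: term()) -> {Count :: non_neg_integer(), term()}.
```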
33 |
34 | The aim of the code within rabbit_amqqueue_process is therefore mainly
35 | to take the abstract queue implementation and make it support AMQPish
36 | features, by handling consumers, implementing features like TTL and max
37 | length in terms of lower level APIs, and coordinating everything.
38 |
39 | Recently all the consumer-handling code was moved into
40 | rabbit_queue_consumers.
41 |
42 | rabbit_backing_queue
43 | --------------------
44 |
45 | The behaviour rabbit_backing_queue (BQ) implements a Rabbit-ish queue
46 | with persistence and so on. The module rabbit_variable_queue (VQ) is
47 | the major implementation of this behaviour.
48 |
49 | This split was introduced with the "new" persister in 2.0.0. At the
50 | time this was done so the old persister could be offered as a backup
51 | (rabbit_invariable_queue) if serious bugs were found in the new
52 | implementation. rabbit_invariable_queue is long gone but the mechanism
53 | to configure an alternate module is still there. At various times
54 | there have been proposals to provide alternate queue implementations
55 | (using Mnesia, SQL etc) but this never came to anything. (One
56 | rationale for optional e.g. SQL-based queues is that they would make
57 | queue-browsing, atomic transactions and so on trivial, at the cost of
58 | performance.)
59 |
60 | The BQ behaviour had a secondary use that has turned out to be
61 | important - it provides an API where we can insert a proxy to modify
62 | how the queue behaves by intercepting calls and deferring to
63 | VQ. Currently there are two such proxies: rabbit_mirror_queue_master
64 | (see HA documentation) and rabbit_priority_queue (which implements
65 | priority queues by providing one BQ implemented in terms of several
66 | BQs).
67 |
68 | rabbit_variable_queue
69 | ---------------------
70 |
71 | So this is the meat of the queue implementation. This implements a
72 | queue in terms of various sub-queues, with various degrees of
73 | paged-out-ness.
74 |
75 | publish -> [q1 -> q2 -> delta -> q3 -> q4] -> consumer
76 |
77 | q1 and q4 contain "alpha" messages, meaning messages are entirely
78 | within RAM. q2 and q3 contain "beta" and "gamma" messages, meaning
79 | they have metadata in RAM (message ID, position etc) and contents on
80 | disk. Finally, delta messages are on disk only. Many of the subqueues
81 | can be empty so that messages do not need to pass through all states
82 | if the queue is short.
83 |
84 | The essay at the top of rabbit_variable_queue goes into a great deal
85 | more detail on this.
86 |
87 | Most of the complexity of VQ deals with moving messages between the
88 | various queues in an optimised way. The actual persistence is handled
89 | by rabbit_queue_index (QI) and rabbit_msg_store.
90 |
91 | rabbit_queue_index
92 | ------------------
93 |
94 | QI contains metadata that needs to be held per queue even if one
95 | message is published to multiple queues - publication records with a
96 | small amount of metadata, and delivery / acknowledgement records. In
97 | 3.5.0 the QI was extended to directly handle persistence of tiny
98 | messages to improve performance by reducing the number of I/O ops we
99 | do. The QI exists as "segment" files containing a log of the actions
100 | which have taken place for an ordered segment (i.e. part) of the
101 | queue, and an out-of-order journal which we write to any time anything
102 | happens.
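To make the segment layout concrete: each segment file covers a fixed, contiguous range of sequence ids, so locating the records for a queue position is simple arithmetic. A sketch, assuming the 16384-entries-per-segment layout the QI has historically used:

```
%% Map a queue position (SeqId) to its segment file number and to the
%% entry's position relative to the start of that segment.
-define(SEGMENT_ENTRY_COUNT, 16384).

seq_id_to_seg_and_rel_seq_id(SeqId) ->
    {SeqId div ?SEGMENT_ENTRY_COUNT, SeqId rem ?SEGMENT_ENTRY_COUNT}.
```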
103 |
104 | Note that everything up to this point is within the main queue
105 | process.
106 |
107 | rabbit_msg_store
108 | ----------------
109 |
110 | #### The following note applies to versions prior to 3.7
111 |
112 | -----------------
113 |
114 | There are also two msg_store processes per broker - one for transient
115 | messages and one for persistent ones (the transient one can be deleted at startup).
116 |
117 | -----------------
118 |
119 | Since version 3.7, message stores are organised per vhost; see
120 | [per-vhost message store](#per-vhost-message-store) below.
121 |
122 | -----------------
123 |
124 | The msg_store is a disk-based reference-counting key-value store,
125 | storing messages in log-structured files. Again, see its module for
126 | more details.
127 |
128 | If one message is published to multiple queues, they will all submit
129 | it to the message store, and the store will detect the non-first
130 | requests to record the message and just increment the reference count.
131 |
132 | The message store is designed to allow clients (i.e. queues) to read
133 | from store files directly without calling into the message store
134 | process. Only writes go via the process. There are a number of shared
135 | ETS tables to coordinate what's going on.
136 |
137 | We go to some effort to avoid unnecessary work. For example, the
138 | message store maintains a table of "flying writes" - writes which have
139 | been submitted by queues but not yet actioned. If a request to delete
140 | a message is enqueued before the message is actually written, the
141 | write is cancelled.
142 |
143 | The message store needs an index, from message-id to {file, offset,
144 | etc}. This is also pluggable. The default index is implemented in ETS
145 | and so each message has an in-memory cost.
146 |
147 | The message store index also contains reference counters for messages
148 | and serves as a synchronization point between queues, the message store process
149 | and the GC process. The message store inserts new entries into the index and updates
150 | reference counters, the GC process updates file locations and removes entries
151 | using `delete_object`, and queue processes only read entries.
152 |
153 | Reference-counter updates, file location updates and deletes from the index
154 | should be atomic.
155 |
156 | Message store logic assumes that lookup operations for non-existent message
157 | locations (if a message is not yet written to a file) are cheap.
158 |
159 | See the [message store index behaviour module](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit_common/src/rabbit_msg_store_index.erl) for more details.
160 |
161 | The message store also needs to be garbage collected. There's an extra
162 | process for GC (so that GC can lock some files and the message store
163 | can concurrently serve from the rest). Within the message store, "GC"
164 | boils down to combining together two files, both of which are known to
165 | have over 50% messages where the ref count has gone to 0. See the
166 | `rabbit_msg_store_gc` module for more details on how that works.
167 |
168 |
169 | Per-vhost message store
170 | ------------------------
171 |
172 | *Per-vhost message store was introduced in version 3.7*
173 |
174 | ### Process structure
175 |
176 | Since version 3.7, queue and message store processes are grouped in
177 | per-vhost supervision trees.
178 |
179 | The goal here is to isolate processes managing data (like queues and message stores)
180 | on different vhosts from each other.
181 | So when there is an issue in one vhost, others can function without interruption.
182 | Vhosts that experienced errors can restart and recover their data or stay "down"
183 | for some time until an operator intervenes and fixes the error.
184 |
185 | The data directories are also isolated per-vhost. Each vhost has its own data
186 | directory with all the queues and message stores in it.
187 |
188 | The supervision tree for two vhosts and two queues per vhost would look like:
189 |
190 | ```
191 |
192 | rabbit_sup
193 | |
194 | |
195 | --- ...
196 | |
197 | |
198 | --- rabbit_vhost_sup_sup
199 | |
200 | |
201 | --- - supervision tree for vhost_1
202 | | |
203 | | |
204 | | ---
205 | | |
206 | | |
207 | | ---
208 | | |
209 | | |
210 | | ---
211 | | |
212 | | |
213 | | --- - persistent message store for vhost_1
214 | | |
215 | | |
216 | | --- - transient message store for vhost_1
217 | | |
218 | | |
219 | | --- - supervisor to contain queues for vhost_1
220 | | |
221 | | |
222 | | --- - vhost_1/queue_1 supervisor
223 | | | |
224 | | | |
225 | | | - vhost_1/queue_1 process
226 | | |
227 | | |
228 | | --- - vhost_1/queue_2 supervisor
229 | | |
230 | | |
231 | | - vhost_1/queue_2 process
232 | |
233 | |
234 | --- - supervision tree for vhost_2
235 | |
236 | |
237 | ---
238 | |
239 | |
240 | ---
241 | |
242 | |
243 | ---
244 | |
245 | |
246 | --- - persistent message store for vhost_2
247 | |
248 | |
249 | --- - transient message store for vhost_2
250 | |
251 | |
252 | --- - supervisor to contain queues for vhost_2
253 | |
254 | |
255 | --- - vhost_2/queue_1 supervisor
256 | | |
257 | | |
258 | | - vhost_2/queue_1 process
259 | |
260 | |
261 | --- - vhost_2/queue_2 supervisor
262 | |
263 | |
264 | - vhost_2/queue_2 process
265 |
266 | ```
267 | The process names in the tree represent their controlling modules; these processes are not registered.
268 |
269 | As you can see, each vhost has its own pair of message stores and all the vhost
270 | queue processes are grouped in the vhost queues supervisor (`rabbit_amqqueue_sup_sup`).
271 |
272 | #### Recovery
273 |
274 | If a queue process fails, it can be restored without impacting other queues.
275 |
276 | If a message store fails, the entire vhost message store will be restarted,
277 | including both message stores and all the vhost queues.
278 | This is because of callback-based publish acknowledgements: if a message store
279 | restarts while queue processes keep going, some messages can never
280 | be acknowledged.
281 |
282 | The vhost restart process follows the same recovery steps as a node start.
283 |
284 | #### More about vhost processes and modules
285 |
286 | ##### rabbit_vhost_sup_sup
287 | --------------------------
288 |
289 | A `simple_one_for_one` supervisor. Serves as a container for vhosts.
290 | Has an API for starting and stopping vhost supervisors, retrieving a vhost supervisor
291 | by name, and checking if a vhost is alive.
292 |
293 | Also manages an ETS table, containing an index of vhost processes.
294 |
295 | The module is aware of the `vhost_restart_strategy` setting, which controls whether a single
296 | vhost failure and inability to restart should take down the entire node.
297 |
298 | If the `rabbit_vhost_sup_sup` supervisor crashes, the node will be shut down.
299 |
300 |
301 | ##### rabbit_vhost_sup_wrapper
302 | ------------------------------
303 |
304 | An intermediate supervisor to control vhost restarts.
305 | It allows several restarts (3 in 5 minutes).
306 | 3 restarts - to handle failures in both message stores;
307 | 5 minutes - so that if there is a data corruption error, there is enough time to hit
308 | the error during recovery, so the supervisor will not retry recoveries forever.
309 |
310 | After the maximum number of restarts it gives up with a `shutdown` message, which is interpreted
311 | by the `rabbit_vhost_sup_sup` supervisor according to the configured `vhost_restart_strategy`.
312 |
313 | The wrapper makes sure that `rabbit_vhost_sup` is started before the recovery process
314 | and is empty, because the recovery process will dynamically add children to `rabbit_vhost_sup`.
315 |
316 | Should this process fail, the vhost will not be restarted. If an exit signal is
317 | not `normal` or `shutdown`, the `rabbit_vhost_sup_sup` process will crash,
318 | which will take down the node.
319 |
320 |
321 | ##### rabbit_vhost_process
322 | --------------------------
323 |
324 | An entity process for a vhost. It manages the vhost recovery process on start and
325 | notifies that the vhost is down on terminate.
326 |
327 | The aliveness status of this process is used to check that the vhost is "alive".
328 |
329 | This process will also terminate the vhost supervision tree if the vhost is deleted
330 | from the database.
331 |
332 |
333 | ##### rabbit_vhost_sup
334 | ----------------------
335 |
336 | A container supervisor for a vhost's data store processes, such as message stores,
337 | queues and recovery terms.
338 |
339 | The restart strategy is `one_for_all`, which will restart the vhost should any
340 | message store process fail. This will restart all the vhost queues.
341 |
342 | Should this process crash, the vhost will be restarted (up to 3 times in 5 minutes)
343 | using the recovery process.
344 |
345 | ### Data storage
346 |
347 | Each vhost's data is stored in a separate directory.
348 | The directory name for a vhost is `<mnesia_dir>/msg_stores/vhosts/<vhost_hash>`,
349 | where `<mnesia_dir>` is the configured RabbitMQ data directory (the `RABBITMQ_MNESIA_DIR` variable)
350 | and `<vhost_hash>` is a hash of the vhost name. The hash is used to comply with
351 | file name restrictions.
352 |
353 | A vhost name hash can be generated using the `rabbit_vhost:dir/1` function.
354 |
355 | A vhost directory path can be generated using the `rabbit_vhost:msg_store_dir_path/1` function.
356 |
357 | Each vhost directory contains all its message store and queue directories.
358 |
359 | Example directory structure of a message store (with one vhost for simplicity):
360 |
361 | ```
362 | mnesia_dir
363 | |
364 | |
365 | --- ...
366 | |
367 | |
368 | --- msg_stores
369 | |
370 | |
371 | --- vhosts
372 | |
373 | |
374 | --- <vhost name hash>
375 | |
376 | |
377 | --- .vhost - a file, containing the vhost name
378 | |
379 | |
380 | --- recovery.dets
381 | |
382 | |
383 | --- msg_store_persistent - persistent message store
384 | | |
385 | | |
386 | | --- ... - the message store data files
387 | |
388 | |
389 | --- msg_store_transient - transient message store
390 | | |
391 | | |
392 | | --- ...
393 | |
394 | |
395 | --- queues
396 | |
397 | |
398 | --- <queue name hash 1>
399 | | |
400 | | |
401 | | --- .queue_name - a file, containing the vhost and the queue name
402 | | |
403 | | |
404 | | --- ... - the queue data files
405 | |
406 | |
407 | --- <queue name hash 2>
408 | |
409 | |
410 | --- .queue_name
411 | |
412 | |
413 | --- ...
414 | ```
415 |
416 | Each vhost directory contains a `.vhost` file with the name of the vhost. The file
417 | can be used for troubleshooting when the RabbitMQ node cannot be used to
418 | generate the vhost directory name.
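The hash is cheap to compute by hand, which helps with the troubleshooting scenario above. A sketch, assuming the md5-to-base-36 encoding used by `rabbit_vhost:dir/1` (verify against the module before relying on it):

```
%% Derive a filesystem-safe directory name for a vhost: hash the name
%% and render the 128-bit digest in base 36.
vhost_dir_name(VHost) when is_binary(VHost) ->
    <<Num:128>> = erlang:md5(VHost),
    lists:flatten(io_lib:format("~.36B", [Num])).
```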
419 |
420 | Each vhost has its own recovery DETS table.
421 |
422 | Queue directory names are also generated using a hash function.
423 |
424 | Each queue directory contains a `.queue_name` file with the queue and the vhost names.
425 |
--------------------------------------------------------------------------------
/rabbit_boot_process.md:
--------------------------------------------------------------------------------
1 | Original: https://github.com/videlalvaro/rabbit-internals/blob/master/rabbit_boot_process.md
2 |
3 | ## RabbitMQ Boot Process ##
4 |
5 | RabbitMQ is designed as an Erlang/OTP application which means that during startup it will be initialized as such. The function `rabbit:start/2` will be called, which lives in the file `rabbit.erl` where the [application behaviour](http://erlang.org/doc/apps/kernel/application.html#Module:start-2) is implemented.
6 |
7 | When RabbitMQ starts running it goes through a series of what are called __boot steps__ that take care of initializing all the core components of the broker in a specific order. The whole boot step concept is –as far as I can tell– something unique to RabbitMQ. The idea behind it is that each subsystem that forms part of RabbitMQ as a whole will declare which other systems it depends on and, if it's successfully started, which other systems it will enable. For example, there's no point in accepting client connections if the layer that routes messages to queues is not enabled.
8 |
9 | The implementation is very elegant: it relies on adding custom attributes to Erlang modules that declare how to start a boot step, which boot steps it depends on and which boot steps it enables. Here's an example:
10 |
11 |     -rabbit_boot_step({recovery,
12 |                        [{description, "exchange, queue and binding recovery"},
13 |                         {mfa, {rabbit, recover, []}},
14 |                         {requires, empty_db_check},
15 |                         {enables, routing_ready}]}).
16 |
17 | Here the step name is `recovery`, which, as the description says, manages _"exchange, queue and binding recovery"_. It requires the `empty_db_check` boot step and enables the `routing_ready` boot step. As you can see, there's an `mfa` argument which specifies the `Module`, `Function` and `Arguments` to call in order to start this boot step.
18 |
19 | So far this seems very simple and can even make us doubt the usefulness of such an approach: why is there a need for boot steps at all? Why isn't there just a call to functions one after the other and that's it? Well, it is not that simple.
20 |
21 | Boot steps can be separated into groups. A group of boot steps will enable a certain other group. For example `routing_ready` is actually enabled by many other boot steps, not just `recovery`. One such step is `empty_db_check`, which ensures that Mnesia, Erlang's built-in distributed database, has the default data, like the default `guest` user for example. We can also see that the `recovery` boot step depends on `empty_db_check`, so this logic takes care of running them in an order that satisfies their interdependencies.
22 |
23 | There are boot steps that neither enable nor require others to be run. They are used to signal that a group of boot steps has happened as a whole, so the next group can start running. For example, we have the external infrastructure step:
24 |
25 |     {external_infrastructure,
26 |      [{description,"external infrastructure ready"}]}
27 |
28 | As you can see, it lacks the `requires` and the `enables` properties.
But since many steps declare that they enable it, `external_infrastructure` won't be run until those steps are run. Also, many steps that come after in the chain require `external_infrastructure` to have run before, so they won't be started either until it has been processed.
29 |
30 | But the story doesn't end here. RabbitMQ can be extended with plugins that add new exchanges or authentication methods to the broker. Taking the exchanges as an example, each exchange type is registered into the broker via the `rabbit_registry` module, which means the `rabbit_registry` has to be started __before__ we can register a plugin. If we want to add new exchanges we don't have to worry about when they will be started by the broker, nor do we have to care about managing the functional dependencies of our exchange. We just add a `-rabbit_boot_step` declaration to our exchange module where we say that our custom exchange depends on `rabbit_registry`, et voilà, the exchange will be ready to use.
31 |
32 | There's more to it too. In the same way your custom exchange can add its own boot steps to hook into the server boot process, you can add extra boot steps that perform some work in between RabbitMQ's predefined boot steps. Keep in mind that you have to know what you are doing if you are going to plug into the RabbitMQ boot process.
33 |
34 | Now, if you have been doing some Erlang programming you may be wondering at this point how this even works at all. Erlang modules can have attributes, like the list of exported functions, or the declaration of which behaviour is implemented by the module, but there's no mention anywhere in the Erlang documentation of `boot_steps` and of course there's nothing about `-rabbit_boot_steps`. How do they work then?
35 |
36 | When the broker is starting it builds a list of all the modules defined in the loaded applications. Once the list of modules is ready it's scanned for attributes called `rabbit_boot_steps`. If there are any, they are added to a new list. This list is further processed and converted into a [directed acyclic graph](http://en.wikipedia.org/wiki/Directed_acyclic_graph) which is used to maintain an order between the boot steps, that is, the boot steps are ordered according to their dependencies. Here, I think, is where the elegance of this solution lies: add declarations to modules in the form of custom module attributes, scan for them and do something smart with the information. This speaks to the flexibility of Erlang as a language.
37 |
38 | ## Individual boot steps in detail ##
39 |
40 | Here's a graphic that shows the boot steps and their interconnections. An arrow from boot step __A__ to boot step __B__ means that __A__ enables __B__. A line with no arrows on both ends from __A__ to __B__ means that __A__ is required by __B__. You can open the image file in a separate window to see it [full size](http://github.com/videlalvaro/rabbit-internals/raw/master/images/boot_steps.png).
41 |
42 | ![demo](http://github.com/videlalvaro/rabbit-internals/raw/master/images/boot_steps.png "Rabbit Boot Steps")
43 |
44 | As we can see there, the boot steps are grouped. Everything starts at the `pre_boot` step, continues at the `external_infrastructure` step and so on. Between `pre_boot` and `external_infrastructure` other steps occur that contribute to enabling `external_infrastructure`. Now let's give a brief description of what happens in each of them.
45 |
46 | ### pre_boot ###
47 |
48 | The `pre_boot` step signals the start of the boot process.
After it happens RabbitMQ will start processing the other boot steps like `file_handle_cache`.
49 |
50 | ### external_infrastructure ###
51 |
52 | The `file_handle_cache` is used to manage file handles to synchronize reads and writes to them. See `file_handle_cache.erl` for an in-depth explanation of its purpose.
53 |
54 | The next step that starts is the `worker_pool`. The worker pool process manages a pool of up to `N` workers, where `N` is the return value of `erlang:system_info(schedulers)`. It's used to parallelize function calls across the pool.
55 |
56 | Then the turn goes to the `database` step. This one is used to prepare the [Mnesia](http://www.erlang.org/doc/man/mnesia.html) database which is used by RabbitMQ to track exchange metadata, users, vhosts, bindings, etc.
57 |
58 | The `codec_correctness_check` is used to ensure that the AMQP binary generator is working properly, that is, that it will generate the right protocol frames.
59 |
60 | Once all the previous steps have run, the `external_infrastructure` step will be processed, signaling to the boot process that it can continue with the following steps.
61 |
62 | ### kernel_ready ###
63 |
64 | Once the external infrastructure is ready RabbitMQ will proceed with booting its own kernel. The first step will be the `rabbit_registry`, which keeps a registry of plugins and their modules. For example it maps authentication mechanisms to modules with the actual implementation. The same thing is done from _exchange type_ to _exchange type implementation_. This means that if a message is published to an exchange of type _direct_, the registry will be responsible for telling the broker where the routing logic for the direct exchange resides, returning the module name.
65 |
66 | After the `rabbit_registry` is ready, it's time to start the authentication modules. RabbitMQ will go through each of them, starting them and making them available. Some steps here are `rabbit_auth_mechanism_amqplain`, `rabbit_auth_mechanism_plain` and so on. If there's a plugin implementing an authentication mechanism, then it will be started at this point.
67 |
68 | The next step is `rabbit_event`, which handles event notification for statistics collection. For example, when a new channel is created, a notification like `rabbit_event:notify(channel_created, infos(?CREATION_EVENT_KEYS, State))` is fired.
69 |
70 | Then it is time for `rabbit_log` to start, which manages the logging inside RabbitMQ. This process will delegate logging calls to the native error_logger module.
71 |
72 | The same procedure used to enable the authentication mechanisms is now repeated for the exchanges. Steps like `rabbit_exchange_type_direct` or `rabbit_exchange_type_fanout` are executed here. If you installed plugins with custom exchange types, they will be registered at this point.
73 |
74 | Now it is time to run the `kernel_ready` step in order to continue initializing the core of RabbitMQ.
75 |
76 | ### core_initialized ###
77 |
78 | The first step of this group is the `rabbit_alarm`, which starts the memory alarm handler. It will perform alarm management for different events that may happen during the broker's life. For example, if memory use is about to surpass the `memory_high_watermark` setting, this module will fire an event.
79 |
80 | Next is the `rabbit_node_monitor`, which notifies other nodes in the cluster about its own node's presence. It also takes care of dealing with the situation of another node dying.
81 |
82 | Then it is the turn of the `delegate_sup` step.
This supervisor will start a pool of children that will be used to parallelize calls to processes. For example, when routing messages, the delegates take care of sending the messages to each of the queues that ought to receive the message.
83 |
84 | The next step to be started is the `guid_generator`, which, as its name implies, is used as a _Globally Unique Identifier Server_. This process is called, for example, when the server needs to generate random queue names, consumer tags, etc.
85 |
86 | Next on the list is the `rabbit_memory_monitor`, which monitors queue memory usage. It will take care of flushing messages to disk when a queue reaches a certain level of memory use.
87 |
88 | Finally the `core_initialized` step will be run and the boot step process will continue with the routing infrastructure.
89 |
90 | ### routing_ready ###
91 |
92 | At this stage RabbitMQ will start to fill up the Mnesia tables with information regarding the exchanges, routing and bindings. In order to do so, first the step `empty_db_check` is run. This step will check that the database has the required information inside; otherwise it will insert it. At this point the default `guest` user will be created.
93 |
94 | Once the database is properly set up, the `recovery` step is run. This step will recover the bindings between queues and exchanges. This is the point where the actual queue processes are started.
95 |
96 | After the queues are running, the boot steps that involve the mirrored queues will be called. Once the mirrored queues are ready the `routing_ready` step will take place and the boot step procedure will continue.
97 |
98 | ### log_relay ###
99 |
100 | Before RabbitMQ is ready to start accepting clients it is time to start the `rabbit_error_logger`, which is done during the `log_relay` boot step; from here the `networking` step will be ready to run.
101 |
102 | ### networking ###
103 |
104 | The `networking` step will start all the supervisors that are concerned with the different listeners specified in the application configuration. A `tcp_listener_sup` will be started for each interface/port combination on which RabbitMQ is listening. The SSL listeners will be started and the broker will be ready to accept TCP connections.
105 |
106 | ### direct_client ###
107 |
108 | RabbitMQ is nearly done with the boot process. The `direct_client` step is used to start the supervisor tree that takes care of accepting _direct client connections_. The direct client is used for AMQP connections that use the Erlang distribution protocol. Once this is finished it is time to proceed to the final step.
109 |
110 | ### notify_cluster ###
111 |
112 | At this point RabbitMQ is ready to start munching messages. The only thing that remains to do is to notify other nodes in the cluster of its own presence. That is accomplished via the `notify_cluster` step.
113 |
114 | ## Summary ##
115 |
116 | If you have read this far you can see that starting an application like RabbitMQ is not an easy task. Thanks to the __boot steps__ technique the process can be managed in such a way that the interdependencies between processes can be satisfied without sacrificing sanity. What's even more impressive is that this technique can be used to extend the broker in a way that goes beyond what the original developers planned for the server.
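As a footnote, the attribute scanning described earlier is small enough to sketch (a rough illustration, not the actual `rabbit` module code):

    %% Collect every -rabbit_boot_step attribute from the modules of all
    %% loaded applications; each attribute value is a list of
    %% {StepName, Properties} tuples, ready to be turned into a graph.
    boot_steps() ->
        [Step || {App, _Desc, _Vsn} <- application:loaded_applications(),
                 {ok, Mods}         <- [application:get_key(App, modules)],
                 Mod                <- Mods,
                 {rabbit_boot_step, Steps} <- Mod:module_info(attributes),
                 Step               <- Steps].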
117 |
--------------------------------------------------------------------------------
/transactions_in_exchange_modules.md:
--------------------------------------------------------------------------------
1 | # What are those transactions inside the exchange callback modules? #
2 |
3 | Many callbacks inside the `rabbit_exchange_type` behaviour expect a
4 | `tx()` parameter which is defined as follows:
5 |
6 | ```erlang
7 | -type(tx() :: 'transaction' | 'none').
8 | ```
9 |
10 | Then, for example, `create` is defined like:
11 |
12 | ```erlang
13 | %% called after declaration and recovery
14 | -callback create(tx(), rabbit_types:exchange()) -> 'ok'.
15 | ```
16 |
17 | The question is, what's the purpose of that transaction parameter?
18 |
19 | This is related to how RabbitMQ runs Mnesia transactions for its
20 | internal bookkeeping:
21 |
22 | [rabbit_misc:execute_mnesia_transaction/2](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit_common/src/rabbit_misc.erl#L586)
23 |
24 | As you can see in that code, there's a PrePostCommitFun which is
25 | called in the Mnesia transaction context, and again after the transaction
26 | has run.
27 |
28 | So here, for example: in
29 | [rabbit_exchange:declare/7](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/rabbit_exchange.erl#L143)
30 | the create callback from the exchange is called inside a Mnesia
31 | transaction, and outside of it afterwards.
32 |
33 | You can see this in action, and understand its usefulness, when
34 | considering an exchange like the topic exchange, which keeps track of
35 | its own data structures:
36 |
37 | [rabbit_exchange_type_topic:delete/3](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/rabbit_exchange_type_topic.erl#L49)
38 | [rabbit_exchange_type_topic:add_binding/3](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/rabbit_exchange_type_topic.erl#L59)
39 | [rabbit_exchange_type_topic:remove_bindings/3](https://github.com/rabbitmq/rabbitmq-server/blob/master/deps/rabbit/src/rabbit_exchange_type_topic.erl#L64)
40 |
41 | Deleting the exchange, adding or removing bindings, are all done
42 | inside a Mnesia transaction for consistency reasons.
--------------------------------------------------------------------------------
/uninterupted_cluster_upgrade.md:
--------------------------------------------------------------------------------
1 | # Uninterrupted cluster upgrade between minor releases
2 |
3 | ## Status as of November 2016
4 |
5 | Currently, when you want to upgrade RabbitMQ from a minor version to
6 | the next minor version (e.g. 3.5.x to 3.6.x), you need to shut down the
7 | entire cluster. A RabbitMQ cluster with a mix of multiple minor versions
8 | is unsupported: that's when we introduce incompatible changes, in
9 | particular to the Mnesia schema.
10 |
11 | The same rules apply for upgrades between major versions.
12 |
13 | On rare occasions, like between 3.6.5 and 3.6.6, we need to introduce a
14 | breaking change and again, you must shut down the entire cluster to
15 | upgrade.
16 |
17 | ## Plan to fix the situation
18 |
19 | ### Scope
20 |
21 | This project targets the following goals:
22 |
23 | 1. Being able to run any minor versions from the same major branch mixed
24 | in the same cluster.
25 | 2. Being able to gradually upgrade all nodes of a cluster to a later
26 | minor version and still benefit from the new features and bugfixes at
27 | the end of the process.
28 |
29 | The first item is a prerequisite to the second item.
30 |
31 | This project does not try to make upgrades between major versions
32 | possible without terminating a cluster: thus, breaking changes requiring
33 | a cluster shutdown may happen even once this project is complete.
34 |
35 | ### Compatibility between brokers
36 |
37 | #### Elements to consider
38 |
39 | To be able to run different versions of RabbitMQ inside a single
40 | cluster, all nodes must understand and emit data in a common format.
41 |
42 | Here are the elements shared or exchanged by nodes:
43 |
44 | * record definitions; they are the building blocks of inter-process
45 | communication and of the Mnesia schema below;
46 | * the Mnesia schema;
47 | * messages exchanged between nodes;
48 | * the plugins' ABI.
49 |
50 | Today, we can't achieve compatibility because when we need to e.g.
51 | expand a record, we add a new field, which makes it incompatible with
52 | a previous version of that record. A plugin must even be recompiled
53 | against the new record definition to work again.
54 |
55 | Therefore, records and general messages must be redesigned to be
56 | extensible without breaking compatibility.
57 |
58 | #### Extensible and backward-compatible records
59 |
60 | We need to rethink how our records are designed to:
61 |
62 | * allow modifications of a record without breaking a table schema
63 | (if there is one associated);
64 | * permit the use of old and new records inside old and new code.
65 |
66 | We can use an Erlang map inside a record to allow extensibility,
67 | yet retain backward compatibility: new keys would hold new
68 | information, while old keys could still exist. And using maps
69 | still allows pattern matching.
70 |
71 | Here is an example with the `#amqqueue` record from the 3.6.x branch,
72 | focused on the lists of queue slaves:
73 |
74 | ```erlang
75 | #amqqueue{
76 | name,
77 | slaves = [],
78 | sync_slaves = []
79 | }.
80 | ```
81 |
82 | In 3.7.x, we add another list related to queue slaves:
83 |
84 | ```erlang
85 | #amqqueue{
86 | name,
87 | slaves = [],
88 | sync_slaves = [],
89 | slaves_pending_shutdown = []
90 | }.
91 | ```
92 |
93 | And we already know we may need yet another list to track slaves pending
94 | startup.
95 |
96 | Instead, we could have used an extensible record in the 3.6.x branch of the form:
97 |
98 | ```erlang
99 | #amqqueue{
100 | name,
101 | features = #{
102 | slaves_list => #{ % The value could be anything: a record, a map, ...
103 | slaves => [],
104 | sync_slaves => []
105 | }
106 | }
107 | }.
108 | ```
109 |
110 | And in 3.7.x, it could have been extended like this:
111 |
112 | ```erlang
113 | #amqqueue{
114 | name,
115 | features = #{
116 | slaves_list => #{
117 | slaves => [],
118 | sync_slaves => [],
119 | slaves_pending_shutdown => []
120 | }
121 | % 'slaves_list' could exist, if the record was converted for
122 | % instance, but would have a lower precedence.
123 | }
124 | }.
125 | ```
126 |
127 | This particular change fits the existing map so we don't need to
128 | introduce a new map. However, the new code must support the absence of
129 | the `slaves_pending_shutdown` field and the old code must not trip up on
130 | this unknown field.
131 |
132 | If, in 3.7.x, the representation of slaves had required a complete
133 | revamp, a new map entry could have been introduced:
134 |
135 | ```erlang
136 | #amqqueue{
137 | name,
138 | features = #{
139 | slaves_list_v2 => #{
140 | rabbit@host1 => #{
141 | pid => Pid,
142 | synced => true,
143 | state => ready
144 | },
145 | rabbit@host2 => #{
146 | pid => Pid,
147 | synced => false,
148 | state => pending_startup
149 | }
150 | }
151 | % 'slaves_list' could exist, if the record was converted for
152 | % instance, but would have a lower precedence.
153 | }
154 | }.
155 | ```
156 |
157 | In 3.6.x, the code would look for `slaves_list` in the `features` map.
158 | In 3.7.x, the code would look for `slaves_list_v2` and fall back on
159 | `slaves_list` if the former key is missing (meaning it's an older
160 | record):
161 |
162 | ```erlang
163 | do_things(#amqqueue{features = #{slaves_list_v2 := Slaves}}) ->
164 |     % Do things with the new format of slaves list.
165 |     really_do_things(Slaves);
166 | do_things(#amqqueue{features = #{slaves_list := Slaves}}) ->
167 |     % Convert the old format of slaves list and do things with it.
168 |     Slaves1 = % ...
169 |     really_do_things(Slaves1).
170 | ```
171 |
172 | In the end, no matter the complexity of the change, in this case the
173 | record is unchanged and thus the Mnesia table schema remains the same.
174 |
175 | #### Feature flags
176 |
177 | Because we want to have different versions of RabbitMQ in the same
178 | cluster, new nodes must not produce new records while older nodes are
179 | still around.
180 |
181 | Just looking at the version of running nodes is not enough either
182 | because there could be stopped nodes. Furthermore, if new code is
183 | backported to an older release for whatever reason, a new record could
184 | be supported by a non-contiguous set of versions.
185 |
186 | Other projects such as ZFS resolve that by using *feature flags*. A
187 | given version of ZFS code has support for a certain list of features and
188 | a filesystem has a list of features enabled. A ZFS implementation can
189 | look at the features enabled on a particular filesystem:
190 |
191 | * If a feature supported by the implementation is disabled, it continues
192 | to use the old format when writing data.
193 | * If a feature enabled on the filesystem is not supported by the
194 | implementation, it refuses to mount the filesystem.
195 |
196 | The user is responsible for enabling features when he is sure he won't
197 | have to mount a filesystem with an older implementation.
198 |
199 | We can use the same principle with RabbitMQ. Each version comes with a
200 | list of supported "features" and the list of enabled features is stored
201 | in Mnesia.
202 |
203 | When a new node starts, it looks at the enabled features. If it doesn't
204 | support one of them, it refuses to boot. If it supports more features
205 | than enabled, it makes sure to never produce data which would rely on
206 | disabled features because other nodes might not support that or the user
207 | may want to roll back to an older version.
208 |
209 | The user can enable new features when the cluster is ready. All nodes
210 | must support the new feature to allow it to be enabled. If that is not
211 | the case, the feature is not enabled.
212 |
213 | Once new features are enabled, nodes can produce newer data. At this
214 | point, it means old and new data are in flight (e.g. in queues in
215 | memory). That's why the code must still support both formats/messages.
216 |
217 | If we take the same `#amqqueue` record example:
218 |
219 | * RabbitMQ 3.6.x would have the following feature flags:
220 |
221 | ```erlang
222 | SupportedFeatures = [
223 | amqqueue_slave_list
224 | ].
225 | ```
226 |
227 | * RabbitMQ 3.7.x would have the following feature flags:
228 |
229 | ```erlang
230 | SupportedFeatures = [
231 | amqqueue_slave_list,
232 | amqqueue_slave_list_v2
233 | ].
234 | ```
235 |
236 | * Initially, only the `amqqueue_slave_list` feature would be enabled,
237 | while the cluster is running RabbitMQ 3.6.x.
238 |
239 | After RabbitMQ is upgraded from 3.6.x to 3.7.0, it would not produce
240 | `#amqqueue` records with the `slaves_list_v2` map entry yet. It would
241 | continue to use the old `slaves_list` map entry. However, once the
242 | feature is enabled, it would produce the new entry, while still keeping
243 | support for the old entry which might still be in flight.
244 |
245 | #### When to get rid of old data support
246 |
247 | The RabbitMQ code must keep old code around for the entire life of the
248 | major branch.
249 |
250 | In the next major branch, we may remove old code because mixing major
251 | versions remains unsupported.
252 |
253 | However, all features must remain in the list of supported features,
254 | even if the code to handle them disappeared. This is because enabled
255 | features in an existing cluster must be present in the list of supported
256 | features.
257 |
258 | #### Note about the performance
259 |
260 | Checking feature flags in Mnesia for every operation would be
261 | expensive. A possible solution is to cache the information in the
262 | process, either in the state or the dictionary. Then, once an operator
263 | decides to enable features, processes could be notified so they refresh
264 | their cached list of enabled features.
265 |
266 | ### Upgrading a cluster
267 |
268 | With backward-compatible code, extensible records and feature flags in
269 | place, upgrading a cluster would consist of:
270 |
271 | 1. Installing the new version of RabbitMQ on all nodes. This means:
272 |
273 | * stop the broker
274 | * install the new version
275 | * restart the broker
276 |
277 | At this point, no new feature flag is enabled. The user benefits only from
278 | the changes which do not depend on new features. He can decide to
279 | roll back to a previous version of RabbitMQ.
280 |
281 | 2. Once all nodes are running the latest code, the user can enable new
282 | features.
283 |
284 | He can decide to enable all of them at once or just one. This is useful
285 | if he hits a problem fixed by a particular feature and wants to verify
286 | the issue is actually solved. This can be useful to us too during
287 | development.
288 |
289 | Brokers produce and exchange new records, while still handling old
290 | ones. Rollback is not possible anymore.
291 |
292 | ### Working with breaking changes
293 |
294 | #### From a developer point of view
295 |
296 | When working on the core of RabbitMQ or a plugin, whether it is a tier-1
297 | plugin or not, a developer must be rigorous about changing shared and
298 | exchanged data formats. When he wants to introduce a breaking change, he
299 | will have to:
300 |
301 | * add and document a new feature flag;
302 | * update the code so old and new formats are both handled.
303 |
304 | The code snippets above give an overview of that.
305 |
306 | Specifically for plugins, they may come with a list of feature flags
307 | they require to run.
This could be in addition to or instead of the
308 | RabbitMQ version check.
309 |
310 | > **TODO:** The implementation details remain to be designed.
311 |
312 | #### From an operator point of view
313 |
314 | An operator will need commands to manage feature flags during a cluster
315 | upgrade:
316 |
317 | * list feature flags;
318 | * get information about a feature flag;
319 | * enable one, many or all feature flags.
320 |
321 | When listing feature flags or querying information about them, the
322 | following elements will be of interest:
323 | * the name of the feature;
324 | * a description of the change;
325 | * whether the feature is enabled;
326 | * (optional) a list of RabbitMQ versions where the feature flag was
327 | introduced.
328 |
329 | > **TODO:** The implementation details remain to be designed.
330 |
331 | ## Future out-of-scope ideas
332 |
333 | ### Downgrading a node
334 |
335 | Downgrading a node means that features not supported by the targeted old
336 | version must be disabled. Disabling a feature means that:
337 |
338 | * Nodes need to produce old records again.
339 | * New records still in flight need to be converted back to their old
340 | version.
341 |
342 | The latter point would need code to find in-flight data and convert it.
343 | Once this is done, the old version of RabbitMQ can be deployed exactly
344 | like the new one was installed. Thus the order of operations is simply
345 | reversed.
346 |
347 | Obviously the difficulty is in the "find and convert data" part. This
348 | would be impossible for certain features.
349 |
--------------------------------------------------------------------------------
/variable_queue.md:
--------------------------------------------------------------------------------
1 | # Variable Queue #
2 |
3 | ## Publishing messages ##
4 |
5 | When a message is published to the queue, the first thing we have to
6 | do is to determine if the message is persistent. We track this
7 | information in the `msg_status` record:
8 |
9 | ```erlang
10 | -record(msg_status,
11 | { seq_id,
12 | msg_id,
13 | msg,
14 | is_persistent,
15 | is_delivered,
16 | msg_in_store,
17 | index_on_disk,
18 | persist_to,
19 | msg_props
20 | }).
21 | ```
22 |
23 | Message statuses are kept in a record where the `is_persistent` field
24 | is set to `true` if the queue is durable and the message was published
25 | as persistent:
26 |
27 | ```erlang
28 | is_persistent = IsDurable andalso IsPersistent
29 | ```
30 |
31 | If it was determined that the message needs persistence, then it will
32 | be immediately written to disk, either to the message store or the
33 | queue index, depending on the message size (see
34 | `queue_index_embed_msgs_below`).
35 |
36 | Internally the `variable_queue` keeps messages in four `queue` data
37 | structures. They are a variation of Erlang's _queue_ module, with
38 | some extensions that allow getting the queue length in constant
39 | time. These four queues are identified in the variable queue state as
40 | `q1`, `q2`, `q3` and `q4`. The need for these four queues becomes
41 | apparent once disk paging is taken into account.
42 |
43 | `q4` keeps track of the oldest messages, that is, those at the front
44 | of the queue, those that will be delivered first to consumers.
45 |
46 | `q3` only has messages when there has been some disk paging due to
47 | memory pressure, or if we have a queue that has recovered contents
48 | from disk, due to a broker restart for instance.
This means that
49 | messages that once were in `q4` only, now have had their content
50 | pushed to disk, and their references are now kept in `q3`. So when a
51 | message arrives at the variable queue, we need to determine if the
52 | message needs to be inserted at the back of `q4`, or somewhere else.
53 |
54 | If `q3` is empty, this means we haven't paged queue contents to disk,
55 | so the messages at the front of the queue are still in `q4`, and the
56 | last message arriving to the queue is still in `q4` as well. So new
57 | messages can be inserted at the back of `q4`. Now, if `q3` has
58 | messages in it, this means at some point we have paged to disk, so
59 | some messages that were at the rear of `q4` are in `q3` now. This
60 | means a new message _can't_ be inserted into `q4`, otherwise we will
61 | lose message ordering; therefore, if `q3` has messages, new messages
62 | go into `q1`.
63 |
64 | ```erlang
65 | case ?QUEUE:is_empty(Q3) of
66 |     false -> State1 #vqstate { q1 = ?QUEUE:in(m(MsgStatus1), Q1) };
67 |     true  -> State1 #vqstate { q4 = ?QUEUE:in(m(MsgStatus1), Q4) }
68 | end,
69 | ```
70 |
71 | ## Fetching Messages ##
72 |
73 | Messages are fetched by calling `queue_out/1`, which retrieves
74 | messages from `q4` or, if `q4` is empty, from
75 | `q3`.
76 |
77 | For `q3` to have messages, it means that at some point messages were
78 | paged to disk due to memory pressure which led to
79 | `push_alphas_to_betas/2` being called. Another way for `q3` to get
80 | messages is when we load messages from disk into memory, for example
81 | when `maybe_deltas_to_betas/1` is called; this can happen when we are
82 | recovering queue contents during queue initialization, or also when we
83 | try to load more messages from disk so they can be delivered to
84 | clients.
85 |
86 | When there are no more messages in `q4`, `fetch_from_q3/1` is called
87 | trying to obtain messages from `q3`. If `q3` is empty, then the queue
88 | must be empty. Remember that if `q3` wasn't empty, then new messages
89 | arriving into the queue were put into `q1`.
90 |
91 | If `q3` wasn't empty, but the message fetched was the last one there,
92 | we must see if we need to load more messages from disk, or if we need
93 | to migrate `q1` messages into `q4`.
94 |
95 | So let's say we fetched the last message from `q3` and we know there
96 | are no more messages on disk (delta count = 0). If there were new
97 | publishes while `q3` had messages, those messages are in `q1`, so we
98 | need to move messages from `q1` into `q4`. Why? Remember that during
99 | publishing messages are queued into `q4` when `q3` is empty, otherwise
100 | they go into `q1`. Imagine that we were publishing messages into `q4`,
101 | then at some point we had to `push_alphas_to_betas`, which means some
102 | `q4` messages were moved into `q3`. Now that `q3` has some messages,
103 | _new_ messages are put into `q1`, but from the point of view of an
104 | external user of the backing queue, `q3` messages come before
105 | those in `q1`, i.e. they are at the front of the queue. So when there
106 | are no more messages in `q3`, we can start consuming those in `q1`,
107 | but since `queue_out/1` only fetches messages from `q4`, we move `q1`
108 | contents there.
109 |
110 | Now let's say there were remaining messages on disk: instead of moving
111 | messages from `q1` into `q4`, we have to load messages from disk into
112 | `q3`. This is accomplished by calling `maybe_deltas_to_betas/1`.
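The dispatch just described is compact enough to sketch (simplified from the real `rabbit_variable_queue` code; bookkeeping such as length counters is omitted):

```erlang
%% Try the in-RAM head of the queue (q4) first; fall back to q3, which
%% may in turn load more messages in from disk.
queue_out(State = #vqstate { q4 = Q4 }) ->
    case ?QUEUE:out(Q4) of
        {empty, _Q4} ->
            fetch_from_q3(State);
        {{value, MsgStatus}, Q4a} ->
            {{value, MsgStatus}, State #vqstate { q4 = Q4a }}
    end.
```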
`maybe_deltas_to_betas/1` reads the index and then, depending on what
it finds there, loads messages at the rear of `q3`. If there are no
more messages on disk, then the messages that are in `q2` are moved to
the rear of `q3`. When we look into message paging we will see why we
move `q2` into `q3` only when there are no more paged messages, and
why `q2` messages go to the back of `q3`. For now keep in mind that
all this message shuffling is done to ensure message ordering from an
external observer's point of view. `q2` only has messages if `q1` had
messages before, and `q1` pushes messages to `q2` only when some
messages have been paged before, so whatever is on disk comes before
whatever is in `q2`.

Remember, messages in `q1` are recently published messages that went
there because `q3` had messages, so those are the last ones we should
deliver.

## Paging messages to disk ##

Disk paging starts when the function `reduce_memory_use/1` is
called. This function calls `push_alphas_to_betas/2` to start sending
messages to disk. _Alpha_ messages are those where both the contents
and the queue position are held in RAM, and we are trying to convert
them to _betas_, i.e. messages where we still keep the position of the
message in RAM, but send the contents to disk.

When paging to disk we first try to page those messages that are going
to be delivered later, those that from a client's point of view are at
the rear of the queue, so we start with `q1`'s contents. If there are
messages on disk (because we have paged out queue contents already, or
because we didn't load all the queue contents into memory), then
messages are moved from `q1` into `q2`, otherwise they go into
`q3`. Then we move messages from `q4` into `q3`.

Keep in mind that we move messages based on a _quota_ that's
calculated by taking into account how many messages are in RAM versus
how many messages `set_ram_duration_target/2` decided need to be paged
out to disk. If after moving messages from _alpha_ to _beta_ this
quota hasn't been consumed entirely, we then have to push messages
from _beta_ to _delta_. _Delta_ messages are those whose content and
position are only held on disk.

To page to disk those messages whose position in the queue is still in
RAM we call `push_betas_to_deltas/2`. We first page messages from `q3`
to disk, and then we page messages from `q2`, but there's a
catch. Keep in mind that we might not want to page every single
message out to disk. `q3` holds messages that are closer to the front
of the queue than those in `q2`, so `q3` messages are paged in reverse
order, that is, those messages at the rear of `q3` are sent to disk
first, with the idea that if the quota of messages that need paging is
reached, then we keep in RAM the messages that will soon be sent to
clients.
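Putting the two phases together, the paging cascade could be sketched
like this (hypothetical code; `paging_quota/1` is a stand-in for
however the quota is derived from the RAM counts and the
`set_ram_duration_target/2` decision):

```erlang
%% Spend the quota on alpha->beta conversions first; any quota left
%% over is then spent pushing betas (q2/q3) out to delta.
reduce_memory_use(State0) ->
    Quota0 = paging_quota(State0),    %% assumed helper, not the real API
    {Quota1, State1} = push_alphas_to_betas(Quota0, State0),
    case Quota1 > 0 of
        true  -> push_betas_to_deltas(Quota1, State1);
        false -> State1
    end.
```

The cascade mirrors the description above: message contents are paged
out first, and queue positions follow only if the quota still isn't
consumed.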
## Example ##

**publish msgs `[1, 2, 3, 4, 5, 6, 7, 8, 9]`**:

```
Q4: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Q3: []

Q2: []

Q1: []

Delta: []
```

**publish msgs `[10]`**:

```
Q4: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Q3: []

Q2: []

Q1: []

Delta: []
```

**push_alphas_to_betas**:

```
Q4: [1, 2, 3, 4, 5, 6, 7]

Q3: [8, 9, 10]

Q2: []

Q1: []

Delta: []
```

**publish msgs `[11, 12, 13, 14, 15]`**:

```
Q4: [1, 2, 3, 4, 5, 6, 7]

Q3: [8, 9, 10]

Q2: []

Q1: [11, 12, 13, 14, 15]

Delta: []
```

**push_alphas_to_betas**:

```
Q4: [1, 2, 3, 4]

Q3: [5, 6, 7, 8, 9, 10, 11, 12, 13]

Q2: []

Q1: [14, 15]

Delta: []
```

**push_betas_to_deltas**:

```
Q4: [1, 2, 3, 4]

Q3: [5, 6, 7, 8, 9]

Q2: []

Q1: [14, 15]

Delta: [10, 11, 12, 13]
```

**publish msgs `[16, 17, 18, 19, 20]`**:

```
Q4: [1, 2, 3, 4]

Q3: [5, 6, 7, 8, 9]

Q2: []

Q1: [14, 15, 16, 17, 18, 19, 20]

Delta: [10, 11, 12, 13]
```

**push_alphas_to_betas**:

```
Q4: [1]

Q3: [2, 3, 4, 5, 6, 7, 8, 9]

Q2: [14, 15, 16, 17]

Q1: [18, 19, 20]

Delta: [10, 11, 12, 13]
```

**fetch 3 messages**:

```
Q4: []

Q3: [4, 5, 6, 7, 8, 9]

Q2: [14, 15, 16, 17]

Q1: [18, 19, 20]

Delta: [10, 11, 12, 13]
```

**fetch 5 messages**:

```
Q4: []

Q3: [9]

Q2: [14, 15, 16, 17]

Q1: [18, 19, 20]

Delta: [10, 11, 12, 13]
```

**fetch 1 message**:

```
Q4: []

Q3: []

Q2: [14, 15, 16, 17]

Q1: [18, 19, 20]

Delta: [10, 11, 12, 13]
```

**Q3 became empty, but we have msgs on disk/delta, so we call
maybe_deltas_to_betas to load messages from delta into Q3**:

```
Q4: []

Q3: [10, 11, 12, 13]

Q2: [14, 15, 16, 17]

Q1: [18, 19, 20]

Delta: []
```

**maybe_deltas_to_betas saw that there are no more messages on disk,
so it joins Q3 with Q2, with Q2's messages going to the rear of Q3**:

```
Q4: []

Q3: [10, 11, 12, 13, 14, 15, 16, 17]

Q2: []

Q1: [18, 19, 20]

Delta: []
```

**publish msgs `[21, 22, 23, 24, 25]`**:

```
Q4: []

Q3: [10, 11, 12, 13, 14, 15, 16, 17]

Q2: []

Q1: [18, 19, 20, 21, 22, 23, 24, 25]

Delta: []
```

**fetch 8 messages**:

```
Q4: []

Q3: []

Q2: []

Q1: [18, 19, 20, 21, 22, 23, 24, 25]

Delta: []
```

**Q3 is empty and Delta is empty as well, so it's time to move Q1's
messages into Q4**:

```
Q4: [18, 19, 20, 21, 22, 23, 24, 25]

Q3: []

Q2: []

Q1: []

Delta: []
```

**fetch 1 message**:
```
Q4: [19, 20, 21, 22, 23, 24, 25]

Q3: []

Q2: []

Q1: []

Delta: []
```

and so on.
--------------------------------------------------------------------------------