├── Makefile
├── readme.md
├── LICENSE
└── tcp_nanqinlang.c
/Makefile:
--------------------------------------------------------------------------------
1 | obj-m := tcp_nanqinlang.o
2 |
3 | all:
4 | 	$(MAKE) -C /lib/modules/`uname -r`/build M=`pwd` modules CC=/usr/bin/gcc-6
5 |
6 | clean:
7 | 	$(MAKE) -C /lib/modules/`uname -r`/build M=`pwd` clean
8 |
9 | install:
10 | 	install tcp_nanqinlang.ko /lib/modules/`uname -r`/kernel/net/ipv4
11 | 	insmod /lib/modules/`uname -r`/kernel/net/ipv4/tcp_nanqinlang.ko
12 | 	depmod -a
13 |
14 | uninstall:
15 | 	rm /lib/modules/`uname -r`/kernel/net/ipv4/tcp_nanqinlang.ko
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
1 | # tcp_nanqinlang
2 |
3 | [](https://github.com/tcp-nanqinlang/tested)
7 |
8 | `super-powered-testing branch` !
9 |
10 | As the branch name suggests, `this repo is just for testing`; please do not use it in important environments.
11 |
12 | ## manual
13 | ### requirements
14 | The BBR source file only supports `Ubuntu kernel v4.9.3-v4.12.x`.
15 |
16 | The Makefile uses `gcc-6`; you can change it to `gcc-4.9`, etc.
17 |
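The kernel requirement above can be checked before building. The following is a minimal sketch, not part of this repo: the function name `kernel_supported` is hypothetical, and it assumes the release string has the usual `major.minor.patch-suffix` shape reported by `uname -r`.

```bash
#!/usr/bin/env bash
# Sketch: check whether a kernel release string falls in the range the
# readme states (v4.9.3 through v4.12.x).
# kernel_supported is a hypothetical helper, not something this repo ships.
kernel_supported() {
    local release="${1:-$(uname -r)}"
    local version="${release%%-*}"   # "4.10.0-42-generic" -> "4.10.0"
    local major minor patch
    IFS=. read -r major minor patch _ <<< "$version"
    minor="${minor:-0}"
    patch="${patch:-0}"
    # Reject anything that is not purely numeric.
    case "$major$minor$patch" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$major" -eq 4 ] || return 1
    [ "$minor" -ge 9 ] && [ "$minor" -le 12 ] || return 1
    # Within 4.9.x, only 4.9.3 and later qualify.
    if [ "$minor" -eq 9 ] && [ "$patch" -lt 3 ]; then
        return 1
    fi
    return 0
}

# With no argument, the running kernel is checked.
if kernel_supported; then
    echo "kernel $(uname -r) is within v4.9.3-v4.12.x"
else
    echo "kernel $(uname -r) is outside v4.9.3-v4.12.x"
fi
```

This only compares version numbers; it does not verify that the kernel is an Ubuntu build or that matching headers are installed under `/lib/modules/$(uname -r)/build`.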
18 | ### usage
19 | This repo provides a source file and a Makefile.
20 |
21 | Once your environment meets the requirements above, run the following:
22 | ```bash
23 | make
24 | make install
25 | ```
26 |
27 | If you do not have such an environment yet, set one up first.
28 | See: https://sometimesnaive.org/article/38
29 |
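After `make install`, it may help to confirm that the kernel actually lists the new algorithm. The sketch below is an illustration only. It assumes the module registers itself under the name `nanqinlang` (confirm the `.name` field in `tcp_nanqinlang.c`), and `cc_available` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: test whether a congestion control algorithm is listed as available.
# ASSUMPTION: the module registers itself as "nanqinlang"; check the source
# file before relying on that name. cc_available is a hypothetical helper.
cc_available() {
    local name="$1"
    # Second argument (optional): a space-separated list to search.
    # When omitted, the list is read from the running kernel via sysctl.
    local list="${2-$(sysctl -n net.ipv4.tcp_available_congestion_control 2>/dev/null)}"
    local algo
    for algo in $list; do
        [ "$algo" = "$name" ] && return 0
    done
    return 1
}

if cc_available nanqinlang; then
    echo "nanqinlang is available; to select it (as root):"
    echo "  sysctl -w net.ipv4.tcp_congestion_control=nanqinlang"
else
    echo "nanqinlang is not listed; check 'lsmod | grep tcp_nanqinlang' and 'dmesg'"
fi
```

The script only prints the `sysctl -w` command rather than running it, so switching the system's congestion control stays a deliberate manual step.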
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 3, 29 June 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc.
5 | Everyone is permitted to copy and distribute verbatim copies
6 | of this license document, but changing it is not allowed.
7 |
8 | Preamble
9 |
10 | The GNU General Public License is a free, copyleft license for
11 | software and other kinds of works.
12 |
13 | The licenses for most software and other practical works are designed
14 | to take away your freedom to share and change the works. By contrast,
15 | the GNU General Public License is intended to guarantee your freedom to
16 | share and change all versions of a program--to make sure it remains free
17 | software for all its users. We, the Free Software Foundation, use the
18 | GNU General Public License for most of our software; it applies also to
19 | any other work released this way by its authors. You can apply it to
20 | your programs, too.
21 |
22 | When we speak of free software, we are referring to freedom, not
23 | price. Our General Public Licenses are designed to make sure that you
24 | have the freedom to distribute copies of free software (and charge for
25 | them if you wish), that you receive source code or can get it if you
26 | want it, that you can change the software or use pieces of it in new
27 | free programs, and that you know you can do these things.
28 |
29 | To protect your rights, we need to prevent others from denying you
30 | these rights or asking you to surrender the rights. Therefore, you have
31 | certain responsibilities if you distribute copies of the software, or if
32 | you modify it: responsibilities to respect the freedom of others.
33 |
34 | For example, if you distribute copies of such a program, whether
35 | gratis or for a fee, you must pass on to the recipients the same
36 | freedoms that you received. You must make sure that they, too, receive
37 | or can get the source code. And you must show them these terms so they
38 | know their rights.
39 |
40 | Developers that use the GNU GPL protect your rights with two steps:
41 | (1) assert copyright on the software, and (2) offer you this License
42 | giving you legal permission to copy, distribute and/or modify it.
43 |
44 | For the developers' and authors' protection, the GPL clearly explains
45 | that there is no warranty for this free software. For both users' and
46 | authors' sake, the GPL requires that modified versions be marked as
47 | changed, so that their problems will not be attributed erroneously to
48 | authors of previous versions.
49 |
50 | Some devices are designed to deny users access to install or run
51 | modified versions of the software inside them, although the manufacturer
52 | can do so. This is fundamentally incompatible with the aim of
53 | protecting users' freedom to change the software. The systematic
54 | pattern of such abuse occurs in the area of products for individuals to
55 | use, which is precisely where it is most unacceptable. Therefore, we
56 | have designed this version of the GPL to prohibit the practice for those
57 | products. If such problems arise substantially in other domains, we
58 | stand ready to extend this provision to those domains in future versions
59 | of the GPL, as needed to protect the freedom of users.
60 |
61 | Finally, every program is threatened constantly by software patents.
62 | States should not allow patents to restrict development and use of
63 | software on general-purpose computers, but in those that do, we wish to
64 | avoid the special danger that patents applied to a free program could
65 | make it effectively proprietary. To prevent this, the GPL assures that
66 | patents cannot be used to render the program non-free.
67 |
68 | The precise terms and conditions for copying, distribution and
69 | modification follow.
70 |
71 | TERMS AND CONDITIONS
72 |
73 | 0. Definitions.
74 |
75 | "This License" refers to version 3 of the GNU General Public License.
76 |
77 | "Copyright" also means copyright-like laws that apply to other kinds of
78 | works, such as semiconductor masks.
79 |
80 | "The Program" refers to any copyrightable work licensed under this
81 | License. Each licensee is addressed as "you". "Licensees" and
82 | "recipients" may be individuals or organizations.
83 |
84 | To "modify" a work means to copy from or adapt all or part of the work
85 | in a fashion requiring copyright permission, other than the making of an
86 | exact copy. The resulting work is called a "modified version" of the
87 | earlier work or a work "based on" the earlier work.
88 |
89 | A "covered work" means either the unmodified Program or a work based
90 | on the Program.
91 |
92 | To "propagate" a work means to do anything with it that, without
93 | permission, would make you directly or secondarily liable for
94 | infringement under applicable copyright law, except executing it on a
95 | computer or modifying a private copy. Propagation includes copying,
96 | distribution (with or without modification), making available to the
97 | public, and in some countries other activities as well.
98 |
99 | To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies. Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 |
103 | An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License. If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 |
112 | 1. Source Code.
113 |
114 | The "source code" for a work means the preferred form of the work
115 | for making modifications to it. "Object code" means any non-source
116 | form of a work.
117 |
118 | A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 |
123 | The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form. A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 |
134 | The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities. However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work. For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 |
147 | The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 |
151 | The Corresponding Source for a work in source code form is that
152 | same work.
153 |
154 | 2. Basic Permissions.
155 |
156 | All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met. This License explicitly affirms your unlimited
159 | permission to run the unmodified Program. The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work. This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 |
164 | You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force. You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright. Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 |
175 | Conveying under any other circumstances is permitted solely under
176 | the conditions stated below. Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 |
179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 |
181 | No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 |
187 | When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 |
195 | 4. Conveying Verbatim Copies.
196 |
197 | You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 |
205 | You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 |
208 | 5. Conveying Modified Source Versions.
209 |
210 | You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 |
214 | a) The work must carry prominent notices stating that you modified
215 | it, and giving a relevant date.
216 |
217 | b) The work must carry prominent notices stating that it is
218 | released under this License and any conditions added under section
219 | 7. This requirement modifies the requirement in section 4 to
220 | "keep intact all notices".
221 |
222 | c) You must license the entire work, as a whole, under this
223 | License to anyone who comes into possession of a copy. This
224 | License will therefore apply, along with any applicable section 7
225 | additional terms, to the whole of the work, and all its parts,
226 | regardless of how they are packaged. This License gives no
227 | permission to license the work in any other way, but it does not
228 | invalidate such permission if you have separately received it.
229 |
230 | d) If the work has interactive user interfaces, each must display
231 | Appropriate Legal Notices; however, if the Program has interactive
232 | interfaces that do not display Appropriate Legal Notices, your
233 | work need not make them do so.
234 |
235 | A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit. Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 |
245 | 6. Conveying Non-Source Forms.
246 |
247 | You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 |
252 | a) Convey the object code in, or embodied in, a physical product
253 | (including a physical distribution medium), accompanied by the
254 | Corresponding Source fixed on a durable physical medium
255 | customarily used for software interchange.
256 |
257 | b) Convey the object code in, or embodied in, a physical product
258 | (including a physical distribution medium), accompanied by a
259 | written offer, valid for at least three years and valid for as
260 | long as you offer spare parts or customer support for that product
261 | model, to give anyone who possesses the object code either (1) a
262 | copy of the Corresponding Source for all the software in the
263 | product that is covered by this License, on a durable physical
264 | medium customarily used for software interchange, for a price no
265 | more than your reasonable cost of physically performing this
266 | conveying of source, or (2) access to copy the
267 | Corresponding Source from a network server at no charge.
268 |
269 | c) Convey individual copies of the object code with a copy of the
270 | written offer to provide the Corresponding Source. This
271 | alternative is allowed only occasionally and noncommercially, and
272 | only if you received the object code with such an offer, in accord
273 | with subsection 6b.
274 |
275 | d) Convey the object code by offering access from a designated
276 | place (gratis or for a charge), and offer equivalent access to the
277 | Corresponding Source in the same way through the same place at no
278 | further charge. You need not require recipients to copy the
279 | Corresponding Source along with the object code. If the place to
280 | copy the object code is a network server, the Corresponding Source
281 | may be on a different server (operated by you or a third party)
282 | that supports equivalent copying facilities, provided you maintain
283 | clear directions next to the object code saying where to find the
284 | Corresponding Source. Regardless of what server hosts the
285 | Corresponding Source, you remain obligated to ensure that it is
286 | available for as long as needed to satisfy these requirements.
287 |
288 | e) Convey the object code using peer-to-peer transmission, provided
289 | you inform other peers where the object code and Corresponding
290 | Source of the work are being offered to the general public at no
291 | charge under subsection 6d.
292 |
293 | A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 |
297 | A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling. In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage. For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product. A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 |
310 | "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source. The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 |
318 | If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information. But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 |
329 | The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed. Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 |
337 | Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 |
343 | 7. Additional Terms.
344 |
345 | "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law. If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 |
354 | When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it. (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.) You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 |
361 | Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 |
365 | a) Disclaiming warranty or limiting liability differently from the
366 | terms of sections 15 and 16 of this License; or
367 |
368 | b) Requiring preservation of specified reasonable legal notices or
369 | author attributions in that material or in the Appropriate Legal
370 | Notices displayed by works containing it; or
371 |
372 | c) Prohibiting misrepresentation of the origin of that material, or
373 | requiring that modified versions of such material be marked in
374 | reasonable ways as different from the original version; or
375 |
376 | d) Limiting the use for publicity purposes of names of licensors or
377 | authors of the material; or
378 |
379 | e) Declining to grant rights under trademark law for use of some
380 | trade names, trademarks, or service marks; or
381 |
382 | f) Requiring indemnification of licensors and authors of that
383 | material by anyone who conveys the material (or modified versions of
384 | it) with contractual assumptions of liability to the recipient, for
385 | any liability that these contractual assumptions directly impose on
386 | those licensors and authors.
387 |
388 | All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10. If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term. If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 |
398 | If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 |
403 | Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 |
407 | 8. Termination.
408 |
409 | You may not propagate or modify a covered work except as expressly
410 | provided under this License. Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 |
415 | However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 |
422 | Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 |
429 | Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License. If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 |
435 | 9. Acceptance Not Required for Having Copies.
436 |
437 | You are not required to accept this License in order to receive or
438 | run a copy of the Program. Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance. However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work. These actions infringe copyright if you do
443 | not accept this License. Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 |
446 | 10. Automatic Licensing of Downstream Recipients.
447 |
448 | Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License. You are not responsible
451 | for enforcing compliance by third parties with this License.
452 |
453 | An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations. If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 |
463 | You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License. For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 |
471 | 11. Patents.
472 |
473 | A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based. The
475 | work thus licensed is called the contributor's "contributor version".
476 |
477 | A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version. For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 |
487 | Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 |
492 | In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement). To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 |
499 | If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients. "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 |
513 | If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 |
521 | A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License. You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 |
536 | Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 |
540 | 12. No Surrender of Others' Freedom.
541 |
542 | If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License. If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all. For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 |
552 | 13. Use with the GNU Affero General Public License.
553 |
554 | Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work. The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 |
563 | 14. Revised Versions of this License.
564 |
565 | The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time. Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 |
570 | Each version is given a distinguishing version number. If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation. If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 |
579 | If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 |
584 | Later license versions may give you additional or different
585 | permissions. However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 |
589 | 15. Disclaimer of Warranty.
590 |
591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 |
600 | 16. Limitation of Liability.
601 |
602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 |
612 | 17. Interpretation of Sections 15 and 16.
613 |
614 | If the disclaimer of warranty and limitation of liability provided
615 | above cannot be given local legal effect according to their terms,
616 | reviewing courts shall apply local law that most closely approximates
617 | an absolute waiver of all civil liability in connection with the
618 | Program, unless a warranty or assumption of liability accompanies a
619 | copy of the Program in return for a fee.
620 |
621 | END OF TERMS AND CONDITIONS
622 |
623 | How to Apply These Terms to Your New Programs
624 |
625 | If you develop a new program, and you want it to be of the greatest
626 | possible use to the public, the best way to achieve this is to make it
627 | free software which everyone can redistribute and change under these terms.
628 |
629 | To do so, attach the following notices to the program. It is safest
630 | to attach them to the start of each source file to most effectively
631 | state the exclusion of warranty; and each file should have at least
632 | the "copyright" line and a pointer to where the full notice is found.
633 |
634 |
635 | Copyright (C) <year> <name of author>
636 |
637 | This program is free software: you can redistribute it and/or modify
638 | it under the terms of the GNU General Public License as published by
639 | the Free Software Foundation, either version 3 of the License, or
640 | (at your option) any later version.
641 |
642 | This program is distributed in the hope that it will be useful,
643 | but WITHOUT ANY WARRANTY; without even the implied warranty of
644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
645 | GNU General Public License for more details.
646 |
647 | You should have received a copy of the GNU General Public License
648 | along with this program. If not, see <https://www.gnu.org/licenses/>.
649 |
650 | Also add information on how to contact you by electronic and paper mail.
651 |
652 | If the program does terminal interaction, make it output a short
653 | notice like this when it starts in an interactive mode:
654 |
655 | <program> Copyright (C) <year> <name of author>
656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657 | This is free software, and you are welcome to redistribute it
658 | under certain conditions; type `show c' for details.
659 |
660 | The hypothetical commands `show w' and `show c' should show the appropriate
661 | parts of the General Public License. Of course, your program's commands
662 | might be different; for a GUI interface, you would use an "about box".
663 |
664 | You should also get your employer (if you work as a programmer) or school,
665 | if any, to sign a "copyright disclaimer" for the program, if necessary.
666 | For more information on this, and how to apply and follow the GNU GPL, see
667 | <https://www.gnu.org/licenses/>.
668 |
669 | The GNU General Public License does not permit incorporating your program
670 | into proprietary programs. If your program is a subroutine library, you
671 | may consider it more useful to permit linking proprietary applications with
672 | the library. If this is what you want to do, use the GNU Lesser General
673 | Public License instead of this License. But first, please read
674 | <https://www.gnu.org/licenses/why-not-lgpl.html>.
675 |
--------------------------------------------------------------------------------
/tcp_nanqinlang.c:
--------------------------------------------------------------------------------
1 | /* tcp_nanqinlang
2 |
3 | * Debian
4 |
5 | * kernel v4.9.3-v4.12.x
6 |
7 | * New BBR Congestion Control
8 |
9 | * Modified by (C) 2017-2018 nanqinlang
10 |
11 | * A very aggressive branch, intended for testing only!
12 |
13 | *******************************************************************************
14 | * Bottleneck Bandwidth and RTT (BBR) congestion control
15 | *
16 | * BBR congestion control computes the sending rate based on the delivery
17 | * rate (throughput) estimated from ACKs. In a nutshell:
18 | *
19 | * On each ACK, update our model of the network path:
20 | * bottleneck_bandwidth = windowed_max(delivered / elapsed, 15 round trips)
21 | * min_rtt = windowed_min(rtt, 7 seconds)
22 | * pacing_rate = pacing_gain * bottleneck_bandwidth
23 | * cwnd = max(cwnd_gain * bottleneck_bandwidth * min_rtt, 4)
24 | *
25 | * The core algorithm does not react directly to packet losses or delays,
26 | * although BBR may adjust the size of next send per ACK when loss is
27 | * observed, or adjust the sending rate if it estimates there is a
28 | * traffic policer, in order to keep the drop rate reasonable.
29 | *
30 | * Here is a state transition diagram for BBR:
31 | *
32 | * |
33 | * V
34 | * +---> STARTUP ----+
35 | * | | |
36 | * | V |
37 | * | DRAIN ----+
38 | * | | |
39 | * | V |
40 | * +---> PROBE_BW ----+
41 | * | ^ | |
42 | * | | | |
43 | * | +----+ |
44 | * | |
45 | * +---- PROBE_RTT <--+
46 | *
47 | * A BBR flow starts in STARTUP, and ramps up its sending rate quickly.
48 | * When it estimates the pipe is full, it enters DRAIN to drain the queue.
49 | * In steady state a BBR flow only uses PROBE_BW and PROBE_RTT.
50 | * A long-lived BBR flow spends the vast majority of its time remaining
51 | * (repeatedly) in PROBE_BW, fully probing and utilizing the pipe's bandwidth
52 | * in a fair manner, with a small, bounded queue. *If* a flow has been
53 | * continuously sending for the entire min_rtt window, and hasn't seen an RTT
54 | * sample that matches or decreases its min_rtt estimate for 7 seconds, then
55 | * it briefly enters PROBE_RTT to cut inflight to a minimum value to re-probe
56 | * the path's two-way propagation delay (min_rtt). When exiting PROBE_RTT, if
57 | * we estimated that we reached the full bw of the pipe then we enter PROBE_BW;
58 | * otherwise we enter STARTUP to try to fill the pipe.
59 | *
60 | * BBR is described in detail in:
61 | * "BBR: Congestion-Based Congestion Control",
62 | * Neal Cardwell, Yuchung Cheng, C. Stephen Gunn, Soheil Hassas Yeganeh,
63 | * Van Jacobson. ACM Queue, Vol. 14 No. 5, September-October 2016.
64 | *
65 | * There is a public e-mail list for discussing BBR development and testing:
66 | * https://groups.google.com/forum/#!forum/bbr-dev
67 | *
68 | * NOTE: BBR *must* be used with the fq qdisc ("man tc-fq") with pacing enabled,
69 | * since pacing is integral to the BBR design and implementation.
70 | * BBR without pacing would not function properly, and may incur unnecessary
71 | * high packet loss rates.
72 | */
73 |
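74 | #include <linux/module.h>
75 | #include <net/tcp.h>
76 | #include <linux/inet_diag.h>
77 | #include <linux/inet.h>
78 | #include <linux/random.h>
79 | #include <linux/win_minmax.h>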
80 |
81 | /* Scale factor for rate in pkt/uSec unit to avoid truncation in bandwidth
82 | * estimation. The rate unit ~= (1500 bytes / 1 usec / 2^24) ~= 715 bps.
83 | * This handles bandwidths from 0.06pps (715bps) to 256Mpps (3Tbps) in a u32.
84 | * Since the minimum window is >=4 packets, the lower bound isn't
85 | * an issue. The upper bound isn't an issue with existing technologies.
86 | */
87 | #define BW_SCALE 24
88 | #define BW_UNIT (1 << BW_SCALE)
89 |
90 | #define BBR_SCALE 8 /* scaling factor for fractions in BBR (e.g. gains) */
91 | #define BBR_UNIT (1 << BBR_SCALE)
92 |
93 | #define CYCLE_LEN 8 /* number of phases in a pacing gain cycle */
94 |
95 |
96 | // **************************************************************************
97 | // The main implementation follows
98 | // **************************************************************************
99 |
100 |
101 | /* BBR has the following modes for deciding how fast to send: */
102 | // the four operating modes
103 | enum bbr_mode {
104 | BBR_STARTUP, /* ramp up sending rate rapidly to fill pipe */
105 | BBR_DRAIN, /* drain any queue created during startup */
106 | BBR_PROBE_BW, /* discover, share bw: pace around estimated bw */
107 | BBR_PROBE_RTT, /* cut cwnd to min to probe min_rtt */
108 | };
109 |
110 |
111 | /* BBR congestion control block */
112 | // per-connection state, stored mostly as u32 values and bitfields
113 | struct bbr {
114 | u32 min_rtt_us; /* min RTT in min_rtt_win_sec window */
115 | u32 min_rtt_stamp; /* timestamp of min_rtt_us */
116 | u32 probe_rtt_done_stamp; /* end time for BBR_PROBE_RTT mode */
117 | struct minmax bw; /* Max recent delivery rate in pkts/uS << 24 */
118 | u32 rtt_cnt; /* count of packet-timed rounds elapsed */
119 | u32 next_rtt_delivered; /* scb->tx.delivered at end of round */
120 | struct skb_mstamp cycle_mstamp; /* time of this cycle phase start */
121 | u32 mode:3, /* current bbr_mode in state machine */
122 | prev_ca_state:3, /* CA state on previous ACK */
123 | packet_conservation:1, /* use packet conservation? */
124 | restore_cwnd:1, /* decided to revert cwnd to old value */
125 | round_start:1, /* start of packet-timed tx->ack round? */
126 | tso_segs_goal:7, /* segments we want in each skb we send */
127 | idle_restart:1, /* restarting after idle? */
128 | probe_rtt_round_done:1, /* a BBR_PROBE_RTT round at 4 pkts? */
129 | unused:5,
130 | lt_is_sampling:1, /* taking long-term ("LT") samples now? */
131 | lt_rtt_cnt:7, /* round trips in long-term interval */
132 | lt_use_bw:1; /* use lt_bw as our bw estimate? */
133 | u32 lt_bw; /* LT est delivery rate in pkts/uS << 24 */
134 | u32 lt_last_delivered; /* LT intvl start: tp->delivered */
135 | u32 lt_last_stamp; /* LT intvl start: tp->delivered_mstamp */
136 | u32 lt_last_lost; /* LT intvl start: tp->lost */
137 | u32 pacing_gain:10, /* current gain for setting pacing rate */
138 | cwnd_gain:10, /* current gain for setting cwnd */
139 | full_bw_cnt:3, /* number of rounds without large bw gains */
140 | cycle_idx:3, /* current index in pacing_gain cycle array */
141 | unused_b:5;
142 | u32 prior_cwnd; /* prior cwnd upon entering loss recovery */
143 | u32 full_bw; /* recent bw, to estimate if pipe is full */
144 | };
145 |
146 |
147 | /* Window length of bw filter (in rounds): */
148 | static const int bbr_bw_rtts = CYCLE_LEN + 7;
149 | /* Window length of min_rtt filter (in sec): */
150 | static const u32 bbr_min_rtt_win_sec = 7;
151 | /* Minimum time (in ms) spent at bbr_cwnd_min_target in BBR_PROBE_RTT mode: */
152 | static const u32 bbr_probe_rtt_mode_ms = 70;
153 | /* Skip TSO below the following bandwidth (bits/sec): */
154 | static const int bbr_min_tso_rate = 1024000;
155 |
156 | /* Upstream BBR uses a high_gain of 2/ln(2) (~2.885), the smallest pacing gain
157 | * that will allow a smoothly increasing pacing rate that will double each RTT
158 | * and send the same number of packets per RTT that an un-paced, slow-starting
159 | * Reno or CUBIC flow would. This branch uses a more aggressive gain of ~3.25:
160 | */
161 | static const int bbr_high_gain = BBR_UNIT * 3250 / 1000 + 1;
162 | /* The pacing gain of 1/high_gain in BBR_DRAIN is calculated to typically drain
163 | * the queue created in BBR_STARTUP in a single round:
164 | */
165 | static const int bbr_drain_gain = BBR_UNIT * 1000 / 3250;
166 | /* The gain for deriving steady-state cwnd tolerates delayed/stretched ACKs: */
167 | static const int bbr_cwnd_gain = BBR_UNIT * 2;
168 | /* The pacing_gain values for the PROBE_BW gain cycle, to discover/share bw: */
169 | static const int bbr_pacing_gain[] = {
170 | // Gains for the steady-state mode BBR_PROBE_BW, where a flow spends most of its time.
171 | // There are 8 pacing gains, one per phase of the cycle (CYCLE_LEN).
172 | BBR_UNIT * 8 / 4, /* probe for more available bw */
173 | BBR_UNIT * 3 / 4, /* drain queue and/or yield bw to other flows */
174 | BBR_UNIT * 7 / 4, BBR_UNIT * 7 / 4, BBR_UNIT * 7 / 4, /* pace at 1.75*bw, */
175 | BBR_UNIT * 8 / 4, BBR_UNIT * 8 / 4, BBR_UNIT * 8 / 4 /* then at 2.0*bw, which may build queue */
176 | };
177 |
178 | /* Randomize the starting gain cycling phase over N phases: */
179 | static const u32 bbr_cycle_rand = 7;
180 |
181 | /* Try to keep at least this many packets in flight, if things go smoothly. For
182 | * smooth functioning, a sliding window protocol ACKing every other packet
183 | * needs at least 4 packets in flight:
184 | */
185 | // keep at least 4 packets in flight while probing for the minimum RTT
186 | static const u32 bbr_cwnd_min_target = 4;
187 |
188 | /* To estimate if BBR_STARTUP mode (i.e. high_gain) has filled pipe... */
189 | /* If bw has increased significantly (2x), there may be more bw available: */
190 | static const u32 bbr_full_bw_thresh = BBR_UNIT * 8 / 4;
191 | /* But after 3 rounds w/o significant bw growth, estimate pipe is full: */
192 | static const u32 bbr_full_bw_cnt = 3;
193 |
194 | /* "long-term" ("LT") bandwidth estimator parameters... */
195 | /* The minimum number of rounds in an LT bw sampling interval: */
196 | static const u32 bbr_lt_intvl_min_rtts = 4;
197 | /* If lost/delivered ratio > ~23% (60/256), interval is "lossy" and we may be policed: */
198 | static const u32 bbr_lt_loss_thresh = 60;
199 | /* If 2 intervals have a bw ratio <= 1/4, their bw is "consistent": */
200 | static const u32 bbr_lt_bw_ratio = BBR_UNIT / 4;
201 | /* If 2 intervals have a bw diff <= 8 Kbit/sec (1000 bytes/sec) their bw is "consistent": */
202 | static const u32 bbr_lt_bw_diff = 4000 / 4;
203 | /* If we estimate we're policed, use lt_bw for this many round trips: */
204 | static const u32 bbr_lt_bw_max_rtts = 40;
205 |
206 | /* Do we estimate that STARTUP filled the pipe? */
207 | static bool bbr_full_bw_reached(const struct sock *sk)
208 | {
209 | const struct bbr *bbr = inet_csk_ca(sk);
210 |
211 | return bbr->full_bw_cnt >= bbr_full_bw_cnt;
212 | }
213 |
214 | /* Return the windowed max recent bandwidth sample, in pkts/uS << BW_SCALE. */
215 | static u32 bbr_max_bw(const struct sock *sk)
216 | {
217 | struct bbr *bbr = inet_csk_ca(sk);
218 |
219 | return minmax_get(&bbr->bw);
220 | }
221 |
222 | /* Return the estimated bandwidth of the path, in pkts/uS << BW_SCALE. */
223 | static u32 bbr_bw(const struct sock *sk)
224 | {
225 | struct bbr *bbr = inet_csk_ca(sk);
226 |
227 | return bbr->lt_use_bw ? bbr->lt_bw : bbr_max_bw(sk);
228 | }
229 |
230 | /* Return rate in bytes per second, optionally with a gain.
231 | * The order here is chosen carefully to avoid overflow of u64. This should
232 | * work for input rates of up to 2.9Tbit/sec and gain of 2.89x.
233 | */
234 | static u64 bbr_rate_bytes_per_sec(struct sock *sk, u64 rate, int gain)
235 | {
236 | rate *= tcp_mss_to_mtu(sk, tcp_sk(sk)->mss_cache);
237 | rate *= gain;
238 | rate >>= BBR_SCALE;
239 | rate *= USEC_PER_SEC;
240 | return rate >> BW_SCALE;
241 | }
242 |
243 | /* Pace using current bw estimate and a gain factor. In order to help drive the
244 | * network toward lower queues while maintaining high utilization and low
245 | * latency, the average pacing rate aims to be slightly (~1%) lower than the
246 | * estimated bandwidth. This is an important aspect of the design. In this
247 | * implementation this slightly lower pacing rate is achieved implicitly by not
248 | * including link-layer headers in the packet size used for the pacing rate.
249 | */
250 | static void bbr_set_pacing_rate(struct sock *sk, u32 bw, int gain)
251 | {
252 | struct bbr *bbr = inet_csk_ca(sk);
253 | u64 rate = bw;
254 |
255 | rate = bbr_rate_bytes_per_sec(sk, rate, gain);
256 | rate = min_t(u64, rate, sk->sk_max_pacing_rate);
257 | if (bbr->mode != BBR_STARTUP || rate > sk->sk_pacing_rate)
258 | sk->sk_pacing_rate = rate;
259 | }
260 |
261 | /* Return count of segments we want in the skbs we send, or 0 for default. */
262 | static u32 bbr_tso_segs_goal(struct sock *sk)
263 | {
264 | struct bbr *bbr = inet_csk_ca(sk);
265 |
266 | return bbr->tso_segs_goal;
267 | }
268 |
269 | static void bbr_set_tso_segs_goal(struct sock *sk)
270 | {
271 | struct tcp_sock *tp = tcp_sk(sk);
272 | struct bbr *bbr = inet_csk_ca(sk);
273 | u32 min_segs;
274 |
275 | min_segs = sk->sk_pacing_rate < (bbr_min_tso_rate >> 3) ? 1 : 2;
276 | bbr->tso_segs_goal = min(tcp_tso_autosize(sk, tp->mss_cache, min_segs),
277 | 0x7FU);
278 | }
279 |
280 | /* Save "last known good" cwnd so we can restore it after losses or PROBE_RTT */
281 | static void bbr_save_cwnd(struct sock *sk)
282 | {
283 | struct tcp_sock *tp = tcp_sk(sk);
284 | struct bbr *bbr = inet_csk_ca(sk);
285 |
286 | if (bbr->prev_ca_state < TCP_CA_Recovery && bbr->mode != BBR_PROBE_RTT)
287 | bbr->prior_cwnd = tp->snd_cwnd; /* this cwnd is good enough */
288 | else /* loss recovery or BBR_PROBE_RTT have temporarily cut cwnd */
289 | bbr->prior_cwnd = max(bbr->prior_cwnd, tp->snd_cwnd);
290 | }
291 |
292 | static void bbr_cwnd_event(struct sock *sk, enum tcp_ca_event event)
293 | {
294 | struct tcp_sock *tp = tcp_sk(sk);
295 | struct bbr *bbr = inet_csk_ca(sk);
296 |
297 | if (event == CA_EVENT_TX_START && tp->app_limited) {
298 | bbr->idle_restart = 1;
299 | /* Avoid pointless buffer overflows: pace at est. bw if we don't
300 | * need more speed (we're restarting from idle and app-limited).
301 | */
302 | if (bbr->mode == BBR_PROBE_BW)
303 | bbr_set_pacing_rate(sk, bbr_bw(sk), BBR_UNIT);
304 | }
305 | }
306 |
307 | /* Find target cwnd. Right-size the cwnd based on min RTT and the
308 | * estimated bottleneck bandwidth:
309 | *
310 | * cwnd = bw * min_rtt * gain = BDP * gain
311 | *
312 | * The key factor, gain, controls the amount of queue. While a small gain
313 | * builds a smaller queue, it becomes more vulnerable to noise in RTT
314 | * measurements (e.g., delayed ACKs or other ACK compression effects). This
315 | * noise may cause BBR to under-estimate the rate.
316 | *
317 | * To achieve full performance in high-speed paths, we budget enough cwnd to
318 | * fit full-sized skbs in-flight on both end hosts to fully utilize the path:
319 | * - one skb in sending host Qdisc,
320 | * - one skb in sending host TSO/GSO engine
321 | * - one skb being received by receiver host LRO/GRO/delayed-ACK engine
322 | * Don't worry, at low rates (bbr_min_tso_rate) this won't bloat cwnd because
323 | * in such cases tso_segs_goal is 1. The minimum cwnd is 4 packets,
324 | * which allows 2 outstanding 2-packet sequences, to try to keep pipe
325 | * full even with ACK-every-other-packet delayed ACKs.
326 | */
327 | static u32 bbr_target_cwnd(struct sock *sk, u32 bw, int gain)
328 | {
329 | struct bbr *bbr = inet_csk_ca(sk);
330 | u32 cwnd;
331 | u64 w;
332 |
333 | /* If we've never had a valid RTT sample, cap cwnd at the initial
334 | * default. This should only happen when the connection is not using TCP
335 | * timestamps and has retransmitted all of the SYN/SYNACK/data packets
336 | * ACKed so far. In this case, an RTO can cut cwnd to 1, in which
337 | * case we need to slow-start up toward something safe: TCP_INIT_CWND.
338 | */
339 | if (unlikely(bbr->min_rtt_us == ~0U)) /* no valid RTT samples yet? */
340 | return TCP_INIT_CWND; /* be safe: cap at default initial cwnd*/
341 |
342 | w = (u64)bw * bbr->min_rtt_us;
343 |
344 | /* Apply a gain to the given value, then remove the BW_SCALE shift. */
345 | cwnd = (((w * gain) >> BBR_SCALE) + BW_UNIT - 1) / BW_UNIT;
346 |
347 | /* Allow enough full-sized skbs in flight to utilize end systems. */
348 | cwnd += 3 * bbr->tso_segs_goal;
349 |
350 | /* Reduce delayed ACKs by rounding up cwnd to the next even number. */
351 | cwnd = (cwnd + 1) & ~1U;
352 |
353 | return cwnd;
354 | }
355 |
356 | /* An optimization in BBR to reduce losses: On the first round of recovery, we
357 | * follow the packet conservation principle: send P packets per P packets acked.
358 | * After that, we slow-start and send at most 2*P packets per P packets acked.
359 | * After recovery finishes, or upon undo, we restore the cwnd we had when
360 | * recovery started (capped by the target cwnd based on estimated BDP).
361 | *
362 | * TODO(ycheng/ncardwell): implement a rate-based approach.
363 | */
364 | static bool bbr_set_cwnd_to_recover_or_restore(
365 | struct sock *sk, const struct rate_sample *rs, u32 acked, u32 *new_cwnd)
366 | {
367 | struct tcp_sock *tp = tcp_sk(sk);
368 | struct bbr *bbr = inet_csk_ca(sk);
369 | u8 prev_state = bbr->prev_ca_state, state = inet_csk(sk)->icsk_ca_state;
370 | u32 cwnd = tp->snd_cwnd;
371 |
372 | /* An ACK for P pkts should release at most 2*P packets. We do this
373 | * in two steps. First, here we deduct the number of lost packets.
374 | * Then, in bbr_set_cwnd() we slow start up toward the target cwnd.
375 | */
376 | if (rs->losses > 0)
377 | cwnd = max_t(s32, cwnd - rs->losses, 1);
378 |
379 | if (state == TCP_CA_Recovery && prev_state != TCP_CA_Recovery) {
380 | /* Starting 1st round of Recovery, so do packet conservation. */
381 | bbr->packet_conservation = 1;
382 | bbr->next_rtt_delivered = tp->delivered; /* start round now */
383 | /* Cut unused cwnd from app behavior, TSQ, or TSO deferral: */
384 | cwnd = tcp_packets_in_flight(tp) + acked;
385 | } else if (prev_state >= TCP_CA_Recovery && state < TCP_CA_Recovery) {
386 | /* Exiting loss recovery; restore cwnd saved before recovery. */
387 | bbr->restore_cwnd = 1;
388 | bbr->packet_conservation = 0;
389 | }
390 | bbr->prev_ca_state = state;
391 |
392 | if (bbr->restore_cwnd) {
393 | /* Restore cwnd after exiting loss recovery or PROBE_RTT. */
394 | cwnd = max(cwnd, bbr->prior_cwnd);
395 | bbr->restore_cwnd = 0;
396 | }
397 |
398 | if (bbr->packet_conservation) {
399 | *new_cwnd = max(cwnd, tcp_packets_in_flight(tp) + acked);
400 | return true; /* yes, using packet conservation */
401 | }
402 | *new_cwnd = cwnd;
403 | return false;
404 | }
405 |
406 | /* Slow-start up toward target cwnd (if bw estimate is growing, or packet loss
407 | * has drawn us down below target), or snap down to target if we're above it.
408 | */
409 | static void bbr_set_cwnd(struct sock *sk, const struct rate_sample *rs,
410 | u32 acked, u32 bw, int gain)
411 | {
412 | struct tcp_sock *tp = tcp_sk(sk);
413 | struct bbr *bbr = inet_csk_ca(sk);
414 | u32 cwnd = 0, target_cwnd = 0;
415 |
416 | if (!acked)
417 | return;
418 |
419 | if (bbr_set_cwnd_to_recover_or_restore(sk, rs, acked, &cwnd))
420 | goto done;
421 |
422 | /* If we're below target cwnd, slow start cwnd toward target cwnd. */
423 | target_cwnd = bbr_target_cwnd(sk, bw, gain);
424 | if (bbr_full_bw_reached(sk)) /* only cut cwnd if we filled the pipe */
425 | cwnd = min(cwnd + acked, target_cwnd);
426 | else if (cwnd < target_cwnd || tp->delivered < TCP_INIT_CWND)
427 | cwnd = cwnd + acked;
428 | cwnd = max(cwnd, bbr_cwnd_min_target);
429 |
430 | done:
431 | tp->snd_cwnd = min(cwnd, tp->snd_cwnd_clamp); /* apply global cap */
432 | if (bbr->mode == BBR_PROBE_RTT) /* drain queue, refresh min_rtt */
433 | tp->snd_cwnd = max(tp->snd_cwnd >> 1, bbr_cwnd_min_target);
434 | }
435 |
436 | /* End cycle phase if it's time and/or we hit the phase's in-flight target. */
437 | static bool bbr_is_next_cycle_phase(struct sock *sk,
438 | const struct rate_sample *rs)
439 | {
440 | struct tcp_sock *tp = tcp_sk(sk);
441 | struct bbr *bbr = inet_csk_ca(sk);
442 | bool is_full_length =
443 | skb_mstamp_us_delta(&tp->delivered_mstamp, &bbr->cycle_mstamp) >
444 | bbr->min_rtt_us;
445 | u32 inflight, bw;
446 |
447 | /* The pacing_gain of 1.0 paces at the estimated bw to try to fully
448 | * use the pipe without increasing the queue.
449 | */
450 | if (bbr->pacing_gain == BBR_UNIT)
451 | return is_full_length; /* just use wall clock time */
452 |
453 | inflight = rs->prior_in_flight; /* what was in-flight before ACK? */
454 | bw = bbr_max_bw(sk);
455 |
456 | /* A pacing_gain > 1.0 probes for bw by trying to raise inflight to at
457 | * least pacing_gain*BDP; this may take more than min_rtt if min_rtt is
458 | * small (e.g. on a LAN). We do not persist if packets are lost, since
459 | * a path with small buffers may not hold that much.
460 | */
461 | if (bbr->pacing_gain > BBR_UNIT)
462 | return is_full_length &&
463 | (rs->losses || /* perhaps pacing_gain*BDP won't fit */
464 | inflight >= bbr_target_cwnd(sk, bw, bbr->pacing_gain));
465 |
466 | /* A pacing_gain < 1.0 tries to drain extra queue we added if bw
467 | * probing didn't find more bw. If inflight falls to match BDP then we
468 | * estimate queue is drained; persisting would underutilize the pipe.
469 | */
470 | return is_full_length ||
471 | inflight <= bbr_target_cwnd(sk, bw, BBR_UNIT);
472 | }
473 |
474 | static void bbr_advance_cycle_phase(struct sock *sk)
475 | {
476 | struct tcp_sock *tp = tcp_sk(sk);
477 | struct bbr *bbr = inet_csk_ca(sk);
478 |
479 | bbr->cycle_idx = (bbr->cycle_idx + 1) & (CYCLE_LEN - 1);
480 | bbr->cycle_mstamp = tp->delivered_mstamp;
481 | bbr->pacing_gain = bbr_pacing_gain[bbr->cycle_idx];
482 | }
483 |
484 | /* Gain cycling: cycle pacing gain to converge to fair share of available bw. */
485 | static void bbr_update_cycle_phase(struct sock *sk,
486 | const struct rate_sample *rs)
487 | {
488 | struct bbr *bbr = inet_csk_ca(sk);
489 |
490 | if ((bbr->mode == BBR_PROBE_BW) && !bbr->lt_use_bw &&
491 | bbr_is_next_cycle_phase(sk, rs))
492 | bbr_advance_cycle_phase(sk);
493 | }
494 |
495 | static void bbr_reset_startup_mode(struct sock *sk)
496 | {
497 | struct bbr *bbr = inet_csk_ca(sk);
498 |
499 | bbr->mode = BBR_STARTUP;
500 | bbr->pacing_gain = bbr_high_gain;
501 | bbr->cwnd_gain = bbr_high_gain;
502 | }
503 |
504 | static void bbr_reset_probe_bw_mode(struct sock *sk)
505 | {
506 | struct bbr *bbr = inet_csk_ca(sk);
507 |
508 | bbr->mode = BBR_PROBE_BW;
509 | bbr->pacing_gain = BBR_UNIT;
510 | bbr->cwnd_gain = bbr_cwnd_gain;
511 | bbr->cycle_idx = CYCLE_LEN - 1 - prandom_u32_max(bbr_cycle_rand);
512 | bbr_advance_cycle_phase(sk); /* flip to next phase of gain cycle */
513 | }
514 |
515 | static void bbr_reset_mode(struct sock *sk)
516 | {
517 | if (!bbr_full_bw_reached(sk))
518 | bbr_reset_startup_mode(sk);
519 | else
520 | bbr_reset_probe_bw_mode(sk);
521 | }
522 |
523 | /* Start a new long-term sampling interval. */
524 | static void bbr_reset_lt_bw_sampling_interval(struct sock *sk)
525 | {
526 | struct tcp_sock *tp = tcp_sk(sk);
527 | struct bbr *bbr = inet_csk_ca(sk);
528 |
529 | bbr->lt_last_stamp = tp->delivered_mstamp.stamp_jiffies;
530 | bbr->lt_last_delivered = tp->delivered;
531 | bbr->lt_last_lost = tp->lost;
532 | bbr->lt_rtt_cnt = 0;
533 | }
534 |
535 | /* Completely reset long-term bandwidth sampling. */
536 | static void bbr_reset_lt_bw_sampling(struct sock *sk)
537 | {
538 | struct bbr *bbr = inet_csk_ca(sk);
539 |
540 | bbr->lt_bw = 0;
541 | bbr->lt_use_bw = 0;
542 | bbr->lt_is_sampling = false;
543 | bbr_reset_lt_bw_sampling_interval(sk);
544 | }
545 |
546 | /* Long-term bw sampling interval is done. Estimate whether we're policed. */
547 | static void bbr_lt_bw_interval_done(struct sock *sk, u32 bw)
548 | {
549 | struct bbr *bbr = inet_csk_ca(sk);
550 | u32 diff;
551 |
552 | if (bbr->lt_bw) { /* do we have bw from a previous interval? */
553 | /* Is new bw close to the lt_bw from the previous interval? */
554 | diff = abs(bw - bbr->lt_bw);
555 | if ((diff * BBR_UNIT <= bbr_lt_bw_ratio * bbr->lt_bw) ||
556 | (bbr_rate_bytes_per_sec(sk, diff, BBR_UNIT) <=
557 | bbr_lt_bw_diff)) {
558 | /* All criteria are met; estimate we're policed. */
559 | bbr->lt_bw = (bw + bbr->lt_bw) >> 1; /* avg 2 intvls */
560 | bbr->lt_use_bw = 1;
561 | bbr->pacing_gain = BBR_UNIT; /* try to avoid drops */
562 | bbr->lt_rtt_cnt = 0;
563 | return;
564 | }
565 | }
566 | bbr->lt_bw = bw;
567 | bbr_reset_lt_bw_sampling_interval(sk);
568 | }
569 |
570 | /* Token-bucket traffic policers are common (see "An Internet-Wide Analysis of
571 | * Traffic Policing", SIGCOMM 2016). BBR detects token-bucket policers and
572 | * explicitly models their policed rate, to reduce unnecessary losses. We
573 | * estimate that we're policed if we see 2 consecutive sampling intervals with
574 | * consistent throughput and high packet loss. If we think we're being policed,
575 | * set lt_bw to the "long-term" average delivery rate from those 2 intervals.
576 | */
577 | static void bbr_lt_bw_sampling(struct sock *sk, const struct rate_sample *rs)
578 | {
579 | struct tcp_sock *tp = tcp_sk(sk);
580 | struct bbr *bbr = inet_csk_ca(sk);
581 | u32 lost, delivered;
582 | u64 bw;
583 | s32 t;
584 |
585 | if (bbr->lt_use_bw) { /* already using long-term rate, lt_bw? */
586 | if (bbr->mode == BBR_PROBE_BW && bbr->round_start &&
587 | ++bbr->lt_rtt_cnt >= bbr_lt_bw_max_rtts) {
588 | bbr_reset_lt_bw_sampling(sk); /* stop using lt_bw */
589 | bbr_reset_probe_bw_mode(sk); /* restart gain cycling */
590 | }
591 | return;
592 | }
593 |
594 | /* Wait for the first loss before sampling, to let the policer exhaust
595 | * its tokens and estimate the steady-state rate allowed by the policer.
596 | * Starting samples earlier includes bursts that over-estimate the bw.
597 | */
598 | if (!bbr->lt_is_sampling) {
599 | if (!rs->losses)
600 | return;
601 | bbr_reset_lt_bw_sampling_interval(sk);
602 | bbr->lt_is_sampling = true;
603 | }
604 |
605 | /* To avoid underestimates, reset sampling if we run out of data. */
606 | if (rs->is_app_limited) {
607 | bbr_reset_lt_bw_sampling(sk);
608 | return;
609 | }
610 |
611 | if (bbr->round_start)
612 | bbr->lt_rtt_cnt++; /* count round trips in this interval */
613 | if (bbr->lt_rtt_cnt < bbr_lt_intvl_min_rtts)
614 | return; /* sampling interval needs to be longer */
615 | if (bbr->lt_rtt_cnt > 4 * bbr_lt_intvl_min_rtts) {
616 | bbr_reset_lt_bw_sampling(sk); /* interval is too long */
617 | return;
618 | }
619 |
620 | /* End sampling interval when a packet is lost, so we estimate the
621 | * policer tokens were exhausted. Stopping the sampling before the
622 | * tokens are exhausted under-estimates the policed rate.
623 | */
624 | if (!rs->losses)
625 | return;
626 |
627 | /* Calculate packets lost and delivered in sampling interval. */
628 | lost = tp->lost - bbr->lt_last_lost;
629 | delivered = tp->delivered - bbr->lt_last_delivered;
630 | /* Is loss rate (lost/delivered) >= lt_loss_thresh? If not, wait. */
631 | if (!delivered || (lost << BBR_SCALE) < bbr_lt_loss_thresh * delivered)
632 | return;
633 |
634 | /* Find average delivery rate in this sampling interval. */
635 | t = (s32)(tp->delivered_mstamp.stamp_jiffies - bbr->lt_last_stamp);
636 | if (t < 1)
637 | return; /* interval is less than one jiffy, so wait */
638 | t = jiffies_to_usecs(t);
639 | /* Interval long enough for jiffies_to_usecs() to return a bogus 0? */
640 | if (t < 1) {
641 | bbr_reset_lt_bw_sampling(sk); /* interval too long; reset */
642 | return;
643 | }
644 | bw = (u64)delivered * BW_UNIT;
645 | do_div(bw, t);
646 | bbr_lt_bw_interval_done(sk, bw);
647 | }
648 |
649 | /* Estimate the bandwidth based on how fast packets are delivered */
650 | static void bbr_update_bw(struct sock *sk, const struct rate_sample *rs)
651 | {
652 | struct tcp_sock *tp = tcp_sk(sk);
653 | struct bbr *bbr = inet_csk_ca(sk);
654 | u64 bw;
655 |
656 | bbr->round_start = 0;
657 | if (rs->delivered < 0 || rs->interval_us <= 0)
658 | return; /* Not a valid observation */
659 |
660 | /* See if we've reached the next RTT */
661 | if (!before(rs->prior_delivered, bbr->next_rtt_delivered)) {
662 | bbr->next_rtt_delivered = tp->delivered;
663 | bbr->rtt_cnt++;
664 | bbr->round_start = 1;
665 | bbr->packet_conservation = 0;
666 | }
667 |
668 | bbr_lt_bw_sampling(sk, rs);
669 |
670 | /* Divide delivered by the interval to find a (lower bound) bottleneck
671 | * bandwidth sample. Delivered is in packets and interval_us in uS and
672 | * ratio will be <<1 for most connections. So delivered is first scaled.
673 | */
674 | bw = (u64)rs->delivered * BW_UNIT;
675 | do_div(bw, rs->interval_us);
676 |
677 | /* If this sample is application-limited, it is likely to have a very
678 | * low delivered count that represents application behavior rather than
679 | * the available network rate. Such a sample could drag down estimated
680 | * bw, causing needless slow-down. Thus, to continue to send at the
681 | * last measured network rate, we filter out app-limited samples unless
682 | * they describe the path bw at least as well as our bw model.
683 | *
684 | * So the goal during app-limited phase is to proceed with the best
685 | * network rate no matter how long. We automatically leave this
686 | * phase when app writes faster than the network can deliver :)
687 | */
688 | if (!rs->is_app_limited || bw >= bbr_max_bw(sk)) {
689 | /* Incorporate new sample into our max bw filter. */
690 | minmax_running_max(&bbr->bw, bbr_bw_rtts, bbr->rtt_cnt, bw);
691 | }
692 | }
693 |
694 | /* Estimate when the pipe is full, using the change in delivery rate: BBR
695 | * estimates that STARTUP filled the pipe if the estimated bw hasn't changed by
696 | * at least bbr_full_bw_thresh (25%) after bbr_full_bw_cnt (3) non-app-limited
697 | * rounds. Why 3 rounds: 1: rwin autotuning grows the rwin, 2: we fill the
698 | * higher rwin, 3: we get higher delivery rate samples. Or transient
699 | * cross-traffic or radio noise can go away. CUBIC Hystart shares a similar
700 | * design goal, but uses delay and inter-ACK spacing instead of bandwidth.
701 | */
702 | static void bbr_check_full_bw_reached(struct sock *sk,
703 | const struct rate_sample *rs)
704 | {
705 | struct bbr *bbr = inet_csk_ca(sk);
706 | u32 bw_thresh;
707 |
708 | if (bbr_full_bw_reached(sk) || !bbr->round_start || rs->is_app_limited)
709 | return;
710 |
711 | bw_thresh = (u64)bbr->full_bw * bbr_full_bw_thresh >> BBR_SCALE;
712 | if (bbr_max_bw(sk) >= bw_thresh) {
713 | bbr->full_bw = bbr_max_bw(sk);
714 | bbr->full_bw_cnt = 0;
715 | return;
716 | }
717 | ++bbr->full_bw_cnt;
718 | }
719 |
720 | /* If pipe is probably full, drain the queue and then enter steady-state. */
721 | static void bbr_check_drain(struct sock *sk, const struct rate_sample *rs)
722 | {
723 | struct bbr *bbr = inet_csk_ca(sk);
724 |
725 | if (bbr->mode == BBR_STARTUP && bbr_full_bw_reached(sk)) {
726 | bbr->mode = BBR_DRAIN; /* drain queue we created */
727 | bbr->pacing_gain = bbr_drain_gain; /* pace slow to drain */
728 | bbr->cwnd_gain = bbr_high_gain; /* maintain cwnd */
729 | } /* fall through to check if in-flight is already small: */
730 | if (bbr->mode == BBR_DRAIN &&
731 | tcp_packets_in_flight(tcp_sk(sk)) <=
732 | bbr_target_cwnd(sk, bbr_max_bw(sk), BBR_UNIT))
733 | bbr_reset_probe_bw_mode(sk); /* we estimate queue is drained */
734 | }
735 |
736 | /* The goal of PROBE_RTT mode is to have BBR flows cooperatively and
737 | * periodically drain the bottleneck queue, to converge to measure the true
738 | * min_rtt (unloaded propagation delay). This allows the flows to keep queues
739 | * small (reducing queuing delay and packet loss) and achieve fairness among
740 | * BBR flows.
741 | *
742 | * The min_rtt filter window is 10 seconds. When the min_rtt estimate expires,
743 | * we enter PROBE_RTT mode and cap the cwnd at bbr_cwnd_min_target=4 packets.
744 | * After at least bbr_probe_rtt_mode_ms=200ms and at least one packet-timed
745 | * round trip elapsed with that flight size <= 4, we leave PROBE_RTT mode and
746 | * re-enter the previous mode. BBR uses 200ms to approximately bound the
747 | * performance penalty of PROBE_RTT's cwnd capping to roughly 2% (200ms/10s).
748 | *
749 | * Note that flows need only pay 2% if they are busy sending over the last 10
750 | * seconds. Interactive applications (e.g., Web, RPCs, video chunks) often have
751 | * natural silences or low-rate periods within 10 seconds where the rate is low
752 | * enough for long enough to drain its queue in the bottleneck. We pick up
753 | * these min RTT measurements opportunistically with our min_rtt filter. :-)
754 | */
755 | static void bbr_update_min_rtt(struct sock *sk, const struct rate_sample *rs)
756 | {
757 | struct tcp_sock *tp = tcp_sk(sk);
758 | struct bbr *bbr = inet_csk_ca(sk);
759 | bool filter_expired;
760 |
761 | /* Track min RTT seen in the min_rtt_win_sec filter window: */
762 | /* Note: per the definition of bbr_min_rtt_win_sec earlier in this file, the window here is 5 seconds. */
763 | filter_expired = after(tcp_time_stamp,
764 | bbr->min_rtt_stamp + bbr_min_rtt_win_sec * HZ);
765 | if (rs->rtt_us >= 0 &&
766 | (rs->rtt_us <= bbr->min_rtt_us || filter_expired)) {
767 | bbr->min_rtt_us = rs->rtt_us;
768 | bbr->min_rtt_stamp = tcp_time_stamp;
769 | }
770 |
771 | if (bbr_probe_rtt_mode_ms > 0 && filter_expired &&
772 | !bbr->idle_restart && bbr->mode != BBR_PROBE_RTT) {
773 | bbr->mode = BBR_PROBE_RTT; /* dip, drain queue */
774 | bbr->pacing_gain = BBR_UNIT;
775 | bbr->cwnd_gain = BBR_UNIT;
776 | bbr_save_cwnd(sk); /* note cwnd so we can restore it */
777 | bbr->probe_rtt_done_stamp = 0;
778 | }
779 |
780 | if (bbr->mode == BBR_PROBE_RTT) {
781 | /* Ignore low rate samples during this mode. */
782 | tp->app_limited =
783 | (tp->delivered + tcp_packets_in_flight(tp)) ? : 1;
784 | /* Maintain min packets in flight for max(bbr_probe_rtt_mode_ms / 2, 1 round). */
785 | if (!bbr->probe_rtt_done_stamp &&
786 | tcp_packets_in_flight(tp) <= bbr_cwnd_min_target) {
787 | bbr->probe_rtt_done_stamp = tcp_time_stamp +
788 | msecs_to_jiffies(bbr_probe_rtt_mode_ms >> 1);
789 | bbr->probe_rtt_round_done = 0;
790 | bbr->next_rtt_delivered = tp->delivered;
791 | } else if (bbr->probe_rtt_done_stamp) {
792 | if (bbr->round_start)
793 | bbr->probe_rtt_round_done = 1;
794 | if (bbr->probe_rtt_round_done &&
795 | after(tcp_time_stamp, bbr->probe_rtt_done_stamp)) {
796 | bbr->min_rtt_stamp = tcp_time_stamp;
797 | bbr->restore_cwnd = 1; /* snap to prior_cwnd */
798 | bbr_reset_mode(sk);
799 | }
800 | }
801 | }
802 | bbr->idle_restart = 0;
803 | }
804 |
805 | static void bbr_update_model(struct sock *sk, const struct rate_sample *rs)
806 | {
807 | bbr_update_bw(sk, rs);
808 | bbr_update_cycle_phase(sk, rs);
809 | bbr_check_full_bw_reached(sk, rs);
810 | bbr_check_drain(sk, rs);
811 | bbr_update_min_rtt(sk, rs);
812 | }
813 |
814 | static void bbr_main(struct sock *sk, const struct rate_sample *rs)
815 | {
816 | struct bbr *bbr = inet_csk_ca(sk);
817 | u32 bw;
818 |
819 | bbr_update_model(sk, rs);
820 |
821 | bw = bbr_bw(sk);
822 | bbr_set_pacing_rate(sk, bw, bbr->pacing_gain);
823 | bbr_set_tso_segs_goal(sk);
824 | bbr_set_cwnd(sk, rs, rs->acked_sacked, bw, bbr->cwnd_gain);
825 | }
826 |
827 | static void bbr_init(struct sock *sk)
828 | {
829 | struct tcp_sock *tp = tcp_sk(sk);
830 | struct bbr *bbr = inet_csk_ca(sk);
831 | u64 bw;
832 |
833 | bbr->prior_cwnd = 0;
834 | bbr->tso_segs_goal = 0; /* default segs per skb until first ACK */
835 | bbr->rtt_cnt = 0;
836 | bbr->next_rtt_delivered = 0;
837 | bbr->prev_ca_state = TCP_CA_Open;
838 | bbr->packet_conservation = 0;
839 |
840 | bbr->probe_rtt_done_stamp = 0;
841 | bbr->probe_rtt_round_done = 0;
842 | bbr->min_rtt_us = tcp_min_rtt(tp);
843 | bbr->min_rtt_stamp = tcp_time_stamp;
844 |
845 | minmax_reset(&bbr->bw, bbr->rtt_cnt, 0); /* init max bw to 0 */
846 |
847 | /* Initialize pacing rate to: high_gain * init_cwnd / RTT. */
848 | bw = (u64)tp->snd_cwnd * BW_UNIT;
849 | do_div(bw, (tp->srtt_us >> 3) ? : USEC_PER_MSEC);
850 | sk->sk_pacing_rate = 0; /* force an update of sk_pacing_rate */
851 | bbr_set_pacing_rate(sk, bw, bbr_high_gain);
852 |
853 | bbr->restore_cwnd = 0;
854 | bbr->round_start = 0;
855 | bbr->idle_restart = 0;
856 | bbr->full_bw = 0;
857 | bbr->full_bw_cnt = 0;
858 | bbr->cycle_mstamp.v64 = 0;
859 | bbr->cycle_idx = 0;
860 | bbr_reset_lt_bw_sampling(sk);
861 | bbr_reset_startup_mode(sk);
862 | }
863 |
864 | static u32 bbr_sndbuf_expand(struct sock *sk)
865 | {
866 | /* Provision 3 * cwnd since BBR may slow-start even during recovery. */
867 | return 3;
868 | }
869 |
870 | /* In theory BBR does not need to undo the cwnd since it does not
871 | * always reduce cwnd on losses (see bbr_main()). Keep it for now.
872 | */
873 | static u32 bbr_undo_cwnd(struct sock *sk)
874 | {
875 | return tcp_sk(sk)->snd_cwnd;
876 | }
877 |
878 | /* Entering loss recovery, so save cwnd for when we exit or undo recovery. */
879 | static u32 bbr_ssthresh(struct sock *sk)
880 | {
881 | bbr_save_cwnd(sk);
882 | return TCP_INFINITE_SSTHRESH; /* BBR does not use ssthresh */
883 | }
884 |
885 | static size_t bbr_get_info(struct sock *sk, u32 ext, int *attr, union tcp_cc_info *info)
886 | {
887 | if (ext & (1 << (INET_DIAG_BBRINFO - 1)) ||
888 | ext & (1 << (INET_DIAG_VEGASINFO - 1))) {
889 | struct tcp_sock *tp = tcp_sk(sk);
890 | struct bbr *bbr = inet_csk_ca(sk);
891 | u64 bw = bbr_bw(sk);
892 | bw = bw * tp->mss_cache * USEC_PER_SEC >> BW_SCALE;
893 | memset(&info->bbr, 0, sizeof(info->bbr));
894 | info->bbr.bbr_bw_lo = (u32)bw;
895 | info->bbr.bbr_bw_hi = (u32)(bw >> 32);
896 | info->bbr.bbr_min_rtt = bbr->min_rtt_us;
897 | info->bbr.bbr_pacing_gain = bbr->pacing_gain;
898 | info->bbr.bbr_cwnd_gain = bbr->cwnd_gain;
899 | *attr = INET_DIAG_BBRINFO;
900 | return sizeof(info->bbr);
901 | }
902 | return 0;
903 | }
904 |
905 | static void bbr_set_state(struct sock *sk, u8 new_state)
906 | {
907 | struct bbr *bbr = inet_csk_ca(sk);
908 |
909 | if (new_state == TCP_CA_Loss) {
910 | struct rate_sample rs = { .losses = 1 };
911 |
912 | bbr->prev_ca_state = TCP_CA_Loss;
913 | bbr->full_bw = 0;
914 | bbr->round_start = 1; /* treat RTO like end of a round */
915 | bbr_lt_bw_sampling(sk, &rs);
916 | }
917 | }
918 |
919 | static struct tcp_congestion_ops tcp_bbr_cong_ops __read_mostly = {
920 | .flags = TCP_CONG_NON_RESTRICTED,
921 | .name = "nanqinlang",
922 | .owner = THIS_MODULE,
923 | .init = bbr_init,
924 | .cong_control = bbr_main,
925 | .sndbuf_expand = bbr_sndbuf_expand,
926 | .undo_cwnd = bbr_undo_cwnd,
927 | .cwnd_event = bbr_cwnd_event,
928 | .ssthresh = bbr_ssthresh,
929 | .tso_segs_goal = bbr_tso_segs_goal,
930 | .get_info = bbr_get_info,
931 | .set_state = bbr_set_state,
932 | };
933 |
934 | static int __init bbr_register(void)
935 | {
936 | BUILD_BUG_ON(sizeof(struct bbr) > ICSK_CA_PRIV_SIZE);
937 | return tcp_register_congestion_control(&tcp_bbr_cong_ops);
938 | }
939 |
940 | static void __exit bbr_unregister(void)
941 | {
942 | tcp_unregister_congestion_control(&tcp_bbr_cong_ops);
943 | }
944 |
945 | module_init(bbr_register);
946 | module_exit(bbr_unregister);
947 |
948 | MODULE_AUTHOR("Van Jacobson");
949 | MODULE_AUTHOR("Neal Cardwell");
950 | MODULE_AUTHOR("Yuchung Cheng");
951 | MODULE_AUTHOR("Soheil Hassas Yeganeh");
952 | MODULE_LICENSE("Dual BSD/GPL");
953 | MODULE_DESCRIPTION("TCP BBR (Bottleneck Bandwidth and RTT)");
954 | MODULE_AUTHOR("Nanqinlang");
955 |
--------------------------------------------------------------------------------