├── Makefile
├── readme.md
├── LICENSE
└── tcp_nanqinlang.c

/Makefile:
--------------------------------------------------------------------------------
obj-m := tcp_nanqinlang.o

.PHONY: all clean install uninstall

# Build the module against the running kernel's build tree, using gcc-6.
all:
	$(MAKE) -C /lib/modules/`uname -r`/build M=`pwd` modules CC=/usr/bin/gcc-6

clean:
	$(MAKE) -C /lib/modules/`uname -r`/build M=`pwd` clean

# Copy the module into the kernel's ipv4 directory, load it, and refresh module dependencies.
install:
	install tcp_nanqinlang.ko /lib/modules/`uname -r`/kernel/net/ipv4
	insmod /lib/modules/`uname -r`/kernel/net/ipv4/tcp_nanqinlang.ko
	depmod -a

uninstall:
	rm /lib/modules/`uname -r`/kernel/net/ipv4/tcp_nanqinlang.ko

--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
# tcp_nanqinlang

[![build](https://github.com/nanqinlang/SVG/blob/master/build%20passing.svg)](https://github.com/tcp-nanqinlang/tested)
[![language](https://github.com/nanqinlang/SVG/blob/master/language-c-blue.svg)](https://github.com/tcp-nanqinlang/tested)
[![author](https://github.com/nanqinlang/SVG/blob/master/author-nanqinlang-lightgrey.svg)](https://github.com/tcp-nanqinlang/tested)
[![license](https://github.com/nanqinlang/SVG/blob/master/license-GPLv3-orange.svg)](https://github.com/tcp-nanqinlang/tested)

`super-powered-testing branch` !

As the branch name suggests, `this repo is just for testing`; please do not use it in important environments.

## manual
### requirements
The bbr source file only supports `Ubuntu kernel v4.9.3-v4.12.x`.

The Makefile uses `gcc-6`; you can change it to gcc-4.9, etc.

### usage
This repo provides a source file and a Makefile.

Once your environment meets the requirements above, run the following:
```bash
make
make install
```

If you do not have such an environment yet, set one up first.
28 | via: https://sometimesnaive.org/article/38 29 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | Preamble 9 | 10 | The GNU General Public License is a free, copyleft license for 11 | software and other kinds of works. 12 | 13 | The licenses for most software and other practical works are designed 14 | to take away your freedom to share and change the works. By contrast, 15 | the GNU General Public License is intended to guarantee your freedom to 16 | share and change all versions of a program--to make sure it remains free 17 | software for all its users. We, the Free Software Foundation, use the 18 | GNU General Public License for most of our software; it applies also to 19 | any other work released this way by its authors. You can apply it to 20 | your programs, too. 21 | 22 | When we speak of free software, we are referring to freedom, not 23 | price. Our General Public Licenses are designed to make sure that you 24 | have the freedom to distribute copies of free software (and charge for 25 | them if you wish), that you receive source code or can get it if you 26 | want it, that you can change the software or use pieces of it in new 27 | free programs, and that you know you can do these things. 28 | 29 | To protect your rights, we need to prevent others from denying you 30 | these rights or asking you to surrender the rights. Therefore, you have 31 | certain responsibilities if you distribute copies of the software, or if 32 | you modify it: responsibilities to respect the freedom of others. 
33 | 34 | For example, if you distribute copies of such a program, whether 35 | gratis or for a fee, you must pass on to the recipients the same 36 | freedoms that you received. You must make sure that they, too, receive 37 | or can get the source code. And you must show them these terms so they 38 | know their rights. 39 | 40 | Developers that use the GNU GPL protect your rights with two steps: 41 | (1) assert copyright on the software, and (2) offer you this License 42 | giving you legal permission to copy, distribute and/or modify it. 43 | 44 | For the developers' and authors' protection, the GPL clearly explains 45 | that there is no warranty for this free software. For both users' and 46 | authors' sake, the GPL requires that modified versions be marked as 47 | changed, so that their problems will not be attributed erroneously to 48 | authors of previous versions. 49 | 50 | Some devices are designed to deny users access to install or run 51 | modified versions of the software inside them, although the manufacturer 52 | can do so. This is fundamentally incompatible with the aim of 53 | protecting users' freedom to change the software. The systematic 54 | pattern of such abuse occurs in the area of products for individuals to 55 | use, which is precisely where it is most unacceptable. Therefore, we 56 | have designed this version of the GPL to prohibit the practice for those 57 | products. If such problems arise substantially in other domains, we 58 | stand ready to extend this provision to those domains in future versions 59 | of the GPL, as needed to protect the freedom of users. 60 | 61 | Finally, every program is threatened constantly by software patents. 62 | States should not allow patents to restrict development and use of 63 | software on general-purpose computers, but in those that do, we wish to 64 | avoid the special danger that patents applied to a free program could 65 | make it effectively proprietary. 
To prevent this, the GPL assures that 66 | patents cannot be used to render the program non-free. 67 | 68 | The precise terms and conditions for copying, distribution and 69 | modification follow. 70 | 71 | TERMS AND CONDITIONS 72 | 73 | 0. Definitions. 74 | 75 | "This License" refers to version 3 of the GNU General Public License. 76 | 77 | "Copyright" also means copyright-like laws that apply to other kinds of 78 | works, such as semiconductor masks. 79 | 80 | "The Program" refers to any copyrightable work licensed under this 81 | License. Each licensee is addressed as "you". "Licensees" and 82 | "recipients" may be individuals or organizations. 83 | 84 | To "modify" a work means to copy from or adapt all or part of the work 85 | in a fashion requiring copyright permission, other than the making of an 86 | exact copy. The resulting work is called a "modified version" of the 87 | earlier work or a work "based on" the earlier work. 88 | 89 | A "covered work" means either the unmodified Program or a work based 90 | on the Program. 91 | 92 | To "propagate" a work means to do anything with it that, without 93 | permission, would make you directly or secondarily liable for 94 | infringement under applicable copyright law, except executing it on a 95 | computer or modifying a private copy. Propagation includes copying, 96 | distribution (with or without modification), making available to the 97 | public, and in some countries other activities as well. 98 | 99 | To "convey" a work means any kind of propagation that enables other 100 | parties to make or receive copies. Mere interaction with a user through 101 | a computer network, with no transfer of a copy, is not conveying. 
102 | 103 | An interactive user interface displays "Appropriate Legal Notices" 104 | to the extent that it includes a convenient and prominently visible 105 | feature that (1) displays an appropriate copyright notice, and (2) 106 | tells the user that there is no warranty for the work (except to the 107 | extent that warranties are provided), that licensees may convey the 108 | work under this License, and how to view a copy of this License. If 109 | the interface presents a list of user commands or options, such as a 110 | menu, a prominent item in the list meets this criterion. 111 | 112 | 1. Source Code. 113 | 114 | The "source code" for a work means the preferred form of the work 115 | for making modifications to it. "Object code" means any non-source 116 | form of a work. 117 | 118 | A "Standard Interface" means an interface that either is an official 119 | standard defined by a recognized standards body, or, in the case of 120 | interfaces specified for a particular programming language, one that 121 | is widely used among developers working in that language. 122 | 123 | The "System Libraries" of an executable work include anything, other 124 | than the work as a whole, that (a) is included in the normal form of 125 | packaging a Major Component, but which is not part of that Major 126 | Component, and (b) serves only to enable use of the work with that 127 | Major Component, or to implement a Standard Interface for which an 128 | implementation is available to the public in source code form. A 129 | "Major Component", in this context, means a major essential component 130 | (kernel, window system, and so on) of the specific operating system 131 | (if any) on which the executable work runs, or a compiler used to 132 | produce the work, or an object code interpreter used to run it. 
133 | 134 | The "Corresponding Source" for a work in object code form means all 135 | the source code needed to generate, install, and (for an executable 136 | work) run the object code and to modify the work, including scripts to 137 | control those activities. However, it does not include the work's 138 | System Libraries, or general-purpose tools or generally available free 139 | programs which are used unmodified in performing those activities but 140 | which are not part of the work. For example, Corresponding Source 141 | includes interface definition files associated with source files for 142 | the work, and the source code for shared libraries and dynamically 143 | linked subprograms that the work is specifically designed to require, 144 | such as by intimate data communication or control flow between those 145 | subprograms and other parts of the work. 146 | 147 | The Corresponding Source need not include anything that users 148 | can regenerate automatically from other parts of the Corresponding 149 | Source. 150 | 151 | The Corresponding Source for a work in source code form is that 152 | same work. 153 | 154 | 2. Basic Permissions. 155 | 156 | All rights granted under this License are granted for the term of 157 | copyright on the Program, and are irrevocable provided the stated 158 | conditions are met. This License explicitly affirms your unlimited 159 | permission to run the unmodified Program. The output from running a 160 | covered work is covered by this License only if the output, given its 161 | content, constitutes a covered work. This License acknowledges your 162 | rights of fair use or other equivalent, as provided by copyright law. 163 | 164 | You may make, run and propagate covered works that you do not 165 | convey, without conditions so long as your license otherwise remains 166 | in force. 
You may convey covered works to others for the sole purpose 167 | of having them make modifications exclusively for you, or provide you 168 | with facilities for running those works, provided that you comply with 169 | the terms of this License in conveying all material for which you do 170 | not control copyright. Those thus making or running the covered works 171 | for you must do so exclusively on your behalf, under your direction 172 | and control, on terms that prohibit them from making any copies of 173 | your copyrighted material outside their relationship with you. 174 | 175 | Conveying under any other circumstances is permitted solely under 176 | the conditions stated below. Sublicensing is not allowed; section 10 177 | makes it unnecessary. 178 | 179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 180 | 181 | No covered work shall be deemed part of an effective technological 182 | measure under any applicable law fulfilling obligations under article 183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or 184 | similar laws prohibiting or restricting circumvention of such 185 | measures. 186 | 187 | When you convey a covered work, you waive any legal power to forbid 188 | circumvention of technological measures to the extent such circumvention 189 | is effected by exercising rights under this License with respect to 190 | the covered work, and you disclaim any intention to limit operation or 191 | modification of the work as a means of enforcing, against the work's 192 | users, your or third parties' legal rights to forbid circumvention of 193 | technological measures. 194 | 195 | 4. Conveying Verbatim Copies. 
196 | 197 | You may convey verbatim copies of the Program's source code as you 198 | receive it, in any medium, provided that you conspicuously and 199 | appropriately publish on each copy an appropriate copyright notice; 200 | keep intact all notices stating that this License and any 201 | non-permissive terms added in accord with section 7 apply to the code; 202 | keep intact all notices of the absence of any warranty; and give all 203 | recipients a copy of this License along with the Program. 204 | 205 | You may charge any price or no price for each copy that you convey, 206 | and you may offer support or warranty protection for a fee. 207 | 208 | 5. Conveying Modified Source Versions. 209 | 210 | You may convey a work based on the Program, or the modifications to 211 | produce it from the Program, in the form of source code under the 212 | terms of section 4, provided that you also meet all of these conditions: 213 | 214 | a) The work must carry prominent notices stating that you modified 215 | it, and giving a relevant date. 216 | 217 | b) The work must carry prominent notices stating that it is 218 | released under this License and any conditions added under section 219 | 7. This requirement modifies the requirement in section 4 to 220 | "keep intact all notices". 221 | 222 | c) You must license the entire work, as a whole, under this 223 | License to anyone who comes into possession of a copy. This 224 | License will therefore apply, along with any applicable section 7 225 | additional terms, to the whole of the work, and all its parts, 226 | regardless of how they are packaged. This License gives no 227 | permission to license the work in any other way, but it does not 228 | invalidate such permission if you have separately received it. 
229 | 230 | d) If the work has interactive user interfaces, each must display 231 | Appropriate Legal Notices; however, if the Program has interactive 232 | interfaces that do not display Appropriate Legal Notices, your 233 | work need not make them do so. 234 | 235 | A compilation of a covered work with other separate and independent 236 | works, which are not by their nature extensions of the covered work, 237 | and which are not combined with it such as to form a larger program, 238 | in or on a volume of a storage or distribution medium, is called an 239 | "aggregate" if the compilation and its resulting copyright are not 240 | used to limit the access or legal rights of the compilation's users 241 | beyond what the individual works permit. Inclusion of a covered work 242 | in an aggregate does not cause this License to apply to the other 243 | parts of the aggregate. 244 | 245 | 6. Conveying Non-Source Forms. 246 | 247 | You may convey a covered work in object code form under the terms 248 | of sections 4 and 5, provided that you also convey the 249 | machine-readable Corresponding Source under the terms of this License, 250 | in one of these ways: 251 | 252 | a) Convey the object code in, or embodied in, a physical product 253 | (including a physical distribution medium), accompanied by the 254 | Corresponding Source fixed on a durable physical medium 255 | customarily used for software interchange. 
256 | 257 | b) Convey the object code in, or embodied in, a physical product 258 | (including a physical distribution medium), accompanied by a 259 | written offer, valid for at least three years and valid for as 260 | long as you offer spare parts or customer support for that product 261 | model, to give anyone who possesses the object code either (1) a 262 | copy of the Corresponding Source for all the software in the 263 | product that is covered by this License, on a durable physical 264 | medium customarily used for software interchange, for a price no 265 | more than your reasonable cost of physically performing this 266 | conveying of source, or (2) access to copy the 267 | Corresponding Source from a network server at no charge. 268 | 269 | c) Convey individual copies of the object code with a copy of the 270 | written offer to provide the Corresponding Source. This 271 | alternative is allowed only occasionally and noncommercially, and 272 | only if you received the object code with such an offer, in accord 273 | with subsection 6b. 274 | 275 | d) Convey the object code by offering access from a designated 276 | place (gratis or for a charge), and offer equivalent access to the 277 | Corresponding Source in the same way through the same place at no 278 | further charge. You need not require recipients to copy the 279 | Corresponding Source along with the object code. If the place to 280 | copy the object code is a network server, the Corresponding Source 281 | may be on a different server (operated by you or a third party) 282 | that supports equivalent copying facilities, provided you maintain 283 | clear directions next to the object code saying where to find the 284 | Corresponding Source. Regardless of what server hosts the 285 | Corresponding Source, you remain obligated to ensure that it is 286 | available for as long as needed to satisfy these requirements. 
287 | 288 | e) Convey the object code using peer-to-peer transmission, provided 289 | you inform other peers where the object code and Corresponding 290 | Source of the work are being offered to the general public at no 291 | charge under subsection 6d. 292 | 293 | A separable portion of the object code, whose source code is excluded 294 | from the Corresponding Source as a System Library, need not be 295 | included in conveying the object code work. 296 | 297 | A "User Product" is either (1) a "consumer product", which means any 298 | tangible personal property which is normally used for personal, family, 299 | or household purposes, or (2) anything designed or sold for incorporation 300 | into a dwelling. In determining whether a product is a consumer product, 301 | doubtful cases shall be resolved in favor of coverage. For a particular 302 | product received by a particular user, "normally used" refers to a 303 | typical or common use of that class of product, regardless of the status 304 | of the particular user or of the way in which the particular user 305 | actually uses, or expects or is expected to use, the product. A product 306 | is a consumer product regardless of whether the product has substantial 307 | commercial, industrial or non-consumer uses, unless such uses represent 308 | the only significant mode of use of the product. 309 | 310 | "Installation Information" for a User Product means any methods, 311 | procedures, authorization keys, or other information required to install 312 | and execute modified versions of a covered work in that User Product from 313 | a modified version of its Corresponding Source. The information must 314 | suffice to ensure that the continued functioning of the modified object 315 | code is in no case prevented or interfered with solely because 316 | modification has been made. 
317 | 318 | If you convey an object code work under this section in, or with, or 319 | specifically for use in, a User Product, and the conveying occurs as 320 | part of a transaction in which the right of possession and use of the 321 | User Product is transferred to the recipient in perpetuity or for a 322 | fixed term (regardless of how the transaction is characterized), the 323 | Corresponding Source conveyed under this section must be accompanied 324 | by the Installation Information. But this requirement does not apply 325 | if neither you nor any third party retains the ability to install 326 | modified object code on the User Product (for example, the work has 327 | been installed in ROM). 328 | 329 | The requirement to provide Installation Information does not include a 330 | requirement to continue to provide support service, warranty, or updates 331 | for a work that has been modified or installed by the recipient, or for 332 | the User Product in which it has been modified or installed. Access to a 333 | network may be denied when the modification itself materially and 334 | adversely affects the operation of the network or violates the rules and 335 | protocols for communication across the network. 336 | 337 | Corresponding Source conveyed, and Installation Information provided, 338 | in accord with this section must be in a format that is publicly 339 | documented (and with an implementation available to the public in 340 | source code form), and must require no special password or key for 341 | unpacking, reading or copying. 342 | 343 | 7. Additional Terms. 344 | 345 | "Additional permissions" are terms that supplement the terms of this 346 | License by making exceptions from one or more of its conditions. 347 | Additional permissions that are applicable to the entire Program shall 348 | be treated as though they were included in this License, to the extent 349 | that they are valid under applicable law. 
If additional permissions 350 | apply only to part of the Program, that part may be used separately 351 | under those permissions, but the entire Program remains governed by 352 | this License without regard to the additional permissions. 353 | 354 | When you convey a copy of a covered work, you may at your option 355 | remove any additional permissions from that copy, or from any part of 356 | it. (Additional permissions may be written to require their own 357 | removal in certain cases when you modify the work.) You may place 358 | additional permissions on material, added by you to a covered work, 359 | for which you have or can give appropriate copyright permission. 360 | 361 | Notwithstanding any other provision of this License, for material you 362 | add to a covered work, you may (if authorized by the copyright holders of 363 | that material) supplement the terms of this License with terms: 364 | 365 | a) Disclaiming warranty or limiting liability differently from the 366 | terms of sections 15 and 16 of this License; or 367 | 368 | b) Requiring preservation of specified reasonable legal notices or 369 | author attributions in that material or in the Appropriate Legal 370 | Notices displayed by works containing it; or 371 | 372 | c) Prohibiting misrepresentation of the origin of that material, or 373 | requiring that modified versions of such material be marked in 374 | reasonable ways as different from the original version; or 375 | 376 | d) Limiting the use for publicity purposes of names of licensors or 377 | authors of the material; or 378 | 379 | e) Declining to grant rights under trademark law for use of some 380 | trade names, trademarks, or service marks; or 381 | 382 | f) Requiring indemnification of licensors and authors of that 383 | material by anyone who conveys the material (or modified versions of 384 | it) with contractual assumptions of liability to the recipient, for 385 | any liability that these contractual assumptions directly impose on 
386 | those licensors and authors. 387 | 388 | All other non-permissive additional terms are considered "further 389 | restrictions" within the meaning of section 10. If the Program as you 390 | received it, or any part of it, contains a notice stating that it is 391 | governed by this License along with a term that is a further 392 | restriction, you may remove that term. If a license document contains 393 | a further restriction but permits relicensing or conveying under this 394 | License, you may add to a covered work material governed by the terms 395 | of that license document, provided that the further restriction does 396 | not survive such relicensing or conveying. 397 | 398 | If you add terms to a covered work in accord with this section, you 399 | must place, in the relevant source files, a statement of the 400 | additional terms that apply to those files, or a notice indicating 401 | where to find the applicable terms. 402 | 403 | Additional terms, permissive or non-permissive, may be stated in the 404 | form of a separately written license, or stated as exceptions; 405 | the above requirements apply either way. 406 | 407 | 8. Termination. 408 | 409 | You may not propagate or modify a covered work except as expressly 410 | provided under this License. Any attempt otherwise to propagate or 411 | modify it is void, and will automatically terminate your rights under 412 | this License (including any patent licenses granted under the third 413 | paragraph of section 11). 414 | 415 | However, if you cease all violation of this License, then your 416 | license from a particular copyright holder is reinstated (a) 417 | provisionally, unless and until the copyright holder explicitly and 418 | finally terminates your license, and (b) permanently, if the copyright 419 | holder fails to notify you of the violation by some reasonable means 420 | prior to 60 days after the cessation. 
421 | 422 | Moreover, your license from a particular copyright holder is 423 | reinstated permanently if the copyright holder notifies you of the 424 | violation by some reasonable means, this is the first time you have 425 | received notice of violation of this License (for any work) from that 426 | copyright holder, and you cure the violation prior to 30 days after 427 | your receipt of the notice. 428 | 429 | Termination of your rights under this section does not terminate the 430 | licenses of parties who have received copies or rights from you under 431 | this License. If your rights have been terminated and not permanently 432 | reinstated, you do not qualify to receive new licenses for the same 433 | material under section 10. 434 | 435 | 9. Acceptance Not Required for Having Copies. 436 | 437 | You are not required to accept this License in order to receive or 438 | run a copy of the Program. Ancillary propagation of a covered work 439 | occurring solely as a consequence of using peer-to-peer transmission 440 | to receive a copy likewise does not require acceptance. However, 441 | nothing other than this License grants you permission to propagate or 442 | modify any covered work. These actions infringe copyright if you do 443 | not accept this License. Therefore, by modifying or propagating a 444 | covered work, you indicate your acceptance of this License to do so. 445 | 446 | 10. Automatic Licensing of Downstream Recipients. 447 | 448 | Each time you convey a covered work, the recipient automatically 449 | receives a license from the original licensors, to run, modify and 450 | propagate that work, subject to this License. You are not responsible 451 | for enforcing compliance by third parties with this License. 452 | 453 | An "entity transaction" is a transaction transferring control of an 454 | organization, or substantially all assets of one, or subdividing an 455 | organization, or merging organizations. 
If propagation of a covered 456 | work results from an entity transaction, each party to that 457 | transaction who receives a copy of the work also receives whatever 458 | licenses to the work the party's predecessor in interest had or could 459 | give under the previous paragraph, plus a right to possession of the 460 | Corresponding Source of the work from the predecessor in interest, if 461 | the predecessor has it or can get it with reasonable efforts. 462 | 463 | You may not impose any further restrictions on the exercise of the 464 | rights granted or affirmed under this License. For example, you may 465 | not impose a license fee, royalty, or other charge for exercise of 466 | rights granted under this License, and you may not initiate litigation 467 | (including a cross-claim or counterclaim in a lawsuit) alleging that 468 | any patent claim is infringed by making, using, selling, offering for 469 | sale, or importing the Program or any portion of it. 470 | 471 | 11. Patents. 472 | 473 | A "contributor" is a copyright holder who authorizes use under this 474 | License of the Program or a work on which the Program is based. The 475 | work thus licensed is called the contributor's "contributor version". 476 | 477 | A contributor's "essential patent claims" are all patent claims 478 | owned or controlled by the contributor, whether already acquired or 479 | hereafter acquired, that would be infringed by some manner, permitted 480 | by this License, of making, using, or selling its contributor version, 481 | but do not include claims that would be infringed only as a 482 | consequence of further modification of the contributor version. For 483 | purposes of this definition, "control" includes the right to grant 484 | patent sublicenses in a manner consistent with the requirements of 485 | this License. 
486 | 487 | Each contributor grants you a non-exclusive, worldwide, royalty-free 488 | patent license under the contributor's essential patent claims, to 489 | make, use, sell, offer for sale, import and otherwise run, modify and 490 | propagate the contents of its contributor version. 491 | 492 | In the following three paragraphs, a "patent license" is any express 493 | agreement or commitment, however denominated, not to enforce a patent 494 | (such as an express permission to practice a patent or covenant not to 495 | sue for patent infringement). To "grant" such a patent license to a 496 | party means to make such an agreement or commitment not to enforce a 497 | patent against the party. 498 | 499 | If you convey a covered work, knowingly relying on a patent license, 500 | and the Corresponding Source of the work is not available for anyone 501 | to copy, free of charge and under the terms of this License, through a 502 | publicly available network server or other readily accessible means, 503 | then you must either (1) cause the Corresponding Source to be so 504 | available, or (2) arrange to deprive yourself of the benefit of the 505 | patent license for this particular work, or (3) arrange, in a manner 506 | consistent with the requirements of this License, to extend the patent 507 | license to downstream recipients. "Knowingly relying" means you have 508 | actual knowledge that, but for the patent license, your conveying the 509 | covered work in a country, or your recipient's use of the covered work 510 | in a country, would infringe one or more identifiable patents in that 511 | country that you have reason to believe are valid. 
512 | 513 | If, pursuant to or in connection with a single transaction or 514 | arrangement, you convey, or propagate by procuring conveyance of, a 515 | covered work, and grant a patent license to some of the parties 516 | receiving the covered work authorizing them to use, propagate, modify 517 | or convey a specific copy of the covered work, then the patent license 518 | you grant is automatically extended to all recipients of the covered 519 | work and works based on it. 520 | 521 | A patent license is "discriminatory" if it does not include within 522 | the scope of its coverage, prohibits the exercise of, or is 523 | conditioned on the non-exercise of one or more of the rights that are 524 | specifically granted under this License. You may not convey a covered 525 | work if you are a party to an arrangement with a third party that is 526 | in the business of distributing software, under which you make payment 527 | to the third party based on the extent of your activity of conveying 528 | the work, and under which the third party grants, to any of the 529 | parties who would receive the covered work from you, a discriminatory 530 | patent license (a) in connection with copies of the covered work 531 | conveyed by you (or copies made from those copies), or (b) primarily 532 | for and in connection with specific products or compilations that 533 | contain the covered work, unless you entered into that arrangement, 534 | or that patent license was granted, prior to 28 March 2007. 535 | 536 | Nothing in this License shall be construed as excluding or limiting 537 | any implied license or other defenses to infringement that may 538 | otherwise be available to you under applicable patent law. 539 | 540 | 12. No Surrender of Others' Freedom. 541 | 542 | If conditions are imposed on you (whether by court order, agreement or 543 | otherwise) that contradict the conditions of this License, they do not 544 | excuse you from the conditions of this License. 
If you cannot convey a 545 | covered work so as to satisfy simultaneously your obligations under this 546 | License and any other pertinent obligations, then as a consequence you may 547 | not convey it at all. For example, if you agree to terms that obligate you 548 | to collect a royalty for further conveying from those to whom you convey 549 | the Program, the only way you could satisfy both those terms and this 550 | License would be to refrain entirely from conveying the Program. 551 | 552 | 13. Use with the GNU Affero General Public License. 553 | 554 | Notwithstanding any other provision of this License, you have 555 | permission to link or combine any covered work with a work licensed 556 | under version 3 of the GNU Affero General Public License into a single 557 | combined work, and to convey the resulting work. The terms of this 558 | License will continue to apply to the part which is the covered work, 559 | but the special requirements of the GNU Affero General Public License, 560 | section 13, concerning interaction through a network will apply to the 561 | combination as such. 562 | 563 | 14. Revised Versions of this License. 564 | 565 | The Free Software Foundation may publish revised and/or new versions of 566 | the GNU General Public License from time to time. Such new versions will 567 | be similar in spirit to the present version, but may differ in detail to 568 | address new problems or concerns. 569 | 570 | Each version is given a distinguishing version number. If the 571 | Program specifies that a certain numbered version of the GNU General 572 | Public License "or any later version" applies to it, you have the 573 | option of following the terms and conditions either of that numbered 574 | version or of any later version published by the Free Software 575 | Foundation. If the Program does not specify a version number of the 576 | GNU General Public License, you may choose any version ever published 577 | by the Free Software Foundation. 
578 | 579 | If the Program specifies that a proxy can decide which future 580 | versions of the GNU General Public License can be used, that proxy's 581 | public statement of acceptance of a version permanently authorizes you 582 | to choose that version for the Program. 583 | 584 | Later license versions may give you additional or different 585 | permissions. However, no additional obligations are imposed on any 586 | author or copyright holder as a result of your choosing to follow a 587 | later version. 588 | 589 | 15. Disclaimer of Warranty. 590 | 591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY 594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, 595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM 597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF 598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 599 | 600 | 16. Limitation of Liability. 601 | 602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS 604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE 606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF 607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), 609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 610 | SUCH DAMAGES. 611 | 612 | 17. Interpretation of Sections 15 and 16. 
613 | 614 | If the disclaimer of warranty and limitation of liability provided 615 | above cannot be given local legal effect according to their terms, 616 | reviewing courts shall apply local law that most closely approximates 617 | an absolute waiver of all civil liability in connection with the 618 | Program, unless a warranty or assumption of liability accompanies a 619 | copy of the Program in return for a fee. 620 | 621 | END OF TERMS AND CONDITIONS 622 | 623 | How to Apply These Terms to Your New Programs 624 | 625 | If you develop a new program, and you want it to be of the greatest 626 | possible use to the public, the best way to achieve this is to make it 627 | free software which everyone can redistribute and change under these terms. 628 | 629 | To do so, attach the following notices to the program. It is safest 630 | to attach them to the start of each source file to most effectively 631 | state the exclusion of warranty; and each file should have at least 632 | the "copyright" line and a pointer to where the full notice is found. 633 | 634 | 635 | Copyright (C) 636 | 637 | This program is free software: you can redistribute it and/or modify 638 | it under the terms of the GNU General Public License as published by 639 | the Free Software Foundation, either version 3 of the License, or 640 | (at your option) any later version. 641 | 642 | This program is distributed in the hope that it will be useful, 643 | but WITHOUT ANY WARRANTY; without even the implied warranty of 644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 645 | GNU General Public License for more details. 646 | 647 | You should have received a copy of the GNU General Public License 648 | along with this program. If not, see . 649 | 650 | Also add information on how to contact you by electronic and paper mail. 
651 | 652 | If the program does terminal interaction, make it output a short 653 | notice like this when it starts in an interactive mode: 654 | 655 | Copyright (C) 656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 657 | This is free software, and you are welcome to redistribute it 658 | under certain conditions; type `show c' for details. 659 | 660 | The hypothetical commands `show w' and `show c' should show the appropriate 661 | parts of the General Public License. Of course, your program's commands 662 | might be different; for a GUI interface, you would use an "about box". 663 | 664 | You should also get your employer (if you work as a programmer) or school, 665 | if any, to sign a "copyright disclaimer" for the program, if necessary. 666 | For more information on this, and how to apply and follow the GNU GPL, see 667 | . 668 | 669 | The GNU General Public License does not permit incorporating your program 670 | into proprietary programs. If your program is a subroutine library, you 671 | may consider it more useful to permit linking proprietary applications with 672 | the library. If this is what you want to do, use the GNU Lesser General 673 | Public License instead of this License. But first, please read 674 | . 675 | -------------------------------------------------------------------------------- /tcp_nanqinlang.c: -------------------------------------------------------------------------------- 1 | /* tcp_nanqinlang 2 | 3 | * Debian 4 | 5 | * kernel v4.9.3-v4.12.x 6 | 7 | × New BBR Congestion Control 8 | 9 | * Modified by (C) 2017-2018 nanqinlang 10 | 11 | * A super violent branch, and just for test ! 12 | 13 | ******************************************************************************* 14 | * Bottleneck Bandwidth and RTT (BBR) congestion control 15 | * 16 | * BBR congestion control computes the sending rate based on the delivery 17 | * rate (throughput) estimated from ACKs. 
In a nutshell: 18 | * 19 | * On each ACK, update our model of the network path: 20 | * bottleneck_bandwidth = windowed_max(delivered / elapsed, 15 round trips) 21 | * min_rtt = windowed_min(rtt, 7 seconds) 22 | * pacing_rate = pacing_gain * bottleneck_bandwidth 23 | * cwnd = max(cwnd_gain * bottleneck_bandwidth * min_rtt, 4) 24 | * 25 | * The core algorithm does not react directly to packet losses or delays, 26 | * although BBR may adjust the size of next send per ACK when loss is 27 | * observed, or adjust the sending rate if it estimates there is a 28 | * traffic policer, in order to keep the drop rate reasonable. 29 | * 30 | * Here is a state transition diagram for BBR: 31 | * 32 | * | 33 | * V 34 | * +---> STARTUP ----+ 35 | * | | | 36 | * | V | 37 | * | DRAIN ----+ 38 | * | | | 39 | * | V | 40 | * +---> PROBE_BW ----+ 41 | * | ^ | | 42 | * | | | | 43 | * | +----+ | 44 | * | | 45 | * +---- PROBE_RTT <--+ 46 | * 47 | * A BBR flow starts in STARTUP, and ramps up its sending rate quickly. 48 | * When it estimates the pipe is full, it enters DRAIN to drain the queue. 49 | * In steady state a BBR flow only uses PROBE_BW and PROBE_RTT. 50 | * A long-lived BBR flow spends the vast majority of its time remaining 51 | * (repeatedly) in PROBE_BW, fully probing and utilizing the pipe's bandwidth 52 | * in a fair manner, with a small, bounded queue. *If* a flow has been 53 | * continuously sending for the entire min_rtt window, and hasn't seen an RTT 54 | * sample that matches or decreases its min_rtt estimate for 7 seconds, then 55 | * it briefly enters PROBE_RTT to cut inflight to a minimum value to re-probe 56 | * the path's two-way propagation delay (min_rtt). When exiting PROBE_RTT, if 57 | * we estimated that we reached the full bw of the pipe then we enter PROBE_BW; 58 | * otherwise we enter STARTUP to try to fill the pipe. 59 | * 60 | * BBR is described in detail in: 61 | * "BBR: Congestion-Based Congestion Control", 62 | * Neal Cardwell, Yuchung Cheng, C.
Stephen Gunn, Soheil Hassas Yeganeh, 63 | * Van Jacobson. ACM Queue, Vol. 14 No. 5, September-October 2016. 64 | * 65 | * There is a public e-mail list for discussing BBR development and testing: 66 | * https://groups.google.com/forum/#!forum/bbr-dev 67 | * 68 | * NOTE: BBR *must* be used with the fq qdisc ("man tc-fq") with pacing enabled, 69 | * since pacing is integral to the BBR design and implementation. 70 | * BBR without pacing would not function properly, and may incur unnecessary 71 | * high packet loss rates. 72 | */ 73 | 74 | #include <linux/module.h> 75 | #include <net/tcp.h> 76 | #include <linux/inet_diag.h> 77 | #include <linux/inet.h> 78 | #include <linux/random.h> 79 | #include <linux/win_minmax.h> 80 | 81 | /* Scale factor for rate in pkt/uSec unit to avoid truncation in bandwidth 82 | * estimation. The rate unit ~= (1500 bytes / 1 usec / 2^24) ~= 715 bps. 83 | * This handles bandwidths from 0.06pps (715bps) to 256Mpps (3Tbps) in a u32. 84 | * Since the minimum window is >=4 packets, the lower bound isn't 85 | * an issue. The upper bound isn't an issue with existing technologies. 86 | */ 87 | #define BW_SCALE 24 88 | #define BW_UNIT (1 << BW_SCALE) 89 | 90 | #define BBR_SCALE 8 /* scaling factor for fractions in BBR (e.g.
gains) */ 91 | #define BBR_UNIT (1 << BBR_SCALE) 92 | 93 | #define CYCLE_LEN 8 /* number of phases in a pacing gain cycle */ 94 | 95 | 96 | // ************************************************************************** 97 | // the main implementation follows 98 | // ************************************************************************** 99 | 100 | 101 | /* BBR has the following modes for deciding how fast to send: */ 102 | // the four operating modes 103 | enum bbr_mode { 104 | BBR_STARTUP, /* ramp up sending rate rapidly to fill pipe */ 105 | BBR_DRAIN, /* drain any queue created during startup */ 106 | BBR_PROBE_BW, /* discover, share bw: pace around estimated bw */ 107 | BBR_PROBE_RTT, /* cut cwnd to min to probe min_rtt */ 108 | }; 109 | 110 | 111 | /* BBR congestion control block */ 112 | // per-connection state, packed into u32 fields and bitfields 113 | struct bbr { 114 | u32 min_rtt_us; /* min RTT in min_rtt_win_sec window */ 115 | u32 min_rtt_stamp; /* timestamp of min_rtt_us */ 116 | u32 probe_rtt_done_stamp; /* end time for BBR_PROBE_RTT mode */ 117 | struct minmax bw; /* Max recent delivery rate in pkts/uS << 24 */ 118 | u32 rtt_cnt; /* count of packet-timed rounds elapsed */ 119 | u32 next_rtt_delivered; /* scb->tx.delivered at end of round */ 120 | struct skb_mstamp cycle_mstamp; /* time of this cycle phase start */ 121 | u32 mode:3, /* current bbr_mode in state machine */ 122 | prev_ca_state:3, /* CA state on previous ACK */ 123 | packet_conservation:1, /* use packet conservation? */ 124 | restore_cwnd:1, /* decided to revert cwnd to old value */ 125 | round_start:1, /* start of packet-timed tx->ack round? */ 126 | tso_segs_goal:7, /* segments we want in each skb we send */ 127 | idle_restart:1, /* restarting after idle? */ 128 | probe_rtt_round_done:1, /* a BBR_PROBE_RTT round at 4 pkts? */ 129 | unused:5, 130 | lt_is_sampling:1, /* taking long-term ("LT") samples now?
*/ 131 | lt_rtt_cnt:7, /* round trips in long-term interval */ 132 | lt_use_bw:1; /* use lt_bw as our bw estimate? */ 133 | u32 lt_bw; /* LT est delivery rate in pkts/uS << 24 */ 134 | u32 lt_last_delivered; /* LT intvl start: tp->delivered */ 135 | u32 lt_last_stamp; /* LT intvl start: tp->delivered_mstamp */ 136 | u32 lt_last_lost; /* LT intvl start: tp->lost */ 137 | u32 pacing_gain:10, /* current gain for setting pacing rate */ 138 | cwnd_gain:10, /* current gain for setting cwnd */ 139 | full_bw_cnt:3, /* number of rounds without large bw gains */ 140 | cycle_idx:3, /* current index in pacing_gain cycle array */ 141 | unused_b:5; 142 | u32 prior_cwnd; /* prior cwnd upon entering loss recovery */ 143 | u32 full_bw; /* recent bw, to estimate if pipe is full */ 144 | }; 145 | 146 | 147 | /* Window length of bw filter (in rounds): */ 148 | static const int bbr_bw_rtts = CYCLE_LEN + 7; 149 | /* Window length of min_rtt filter (in sec): */ 150 | static const u32 bbr_min_rtt_win_sec = 7; 151 | /* Minimum time (in ms) spent at bbr_cwnd_min_target in BBR_PROBE_RTT mode: */ 152 | static const u32 bbr_probe_rtt_mode_ms = 70; 153 | /* Skip TSO below the following bandwidth (bits/sec): */ 154 | static const int bbr_min_tso_rate = 1024000; 155 | 156 | /* Upstream BBR uses a high_gain of 2/ln(2) (~2.885), the smallest pacing gain 157 | * that lets a smoothly increasing pacing rate double each RTT and send the 158 | * same number of packets per RTT as an un-paced, slow-starting Reno or CUBIC 159 | * flow. This branch raises high_gain to 3.25 to ramp up more aggressively: 160 | */ 161 | static const int bbr_high_gain = BBR_UNIT * 3250 / 1000 + 1; 162 | /* The pacing gain of 1/high_gain in BBR_DRAIN is calculated to typically drain 163 | * the queue created in BBR_STARTUP in a single round: 164 | */ 165 | static const int bbr_drain_gain = BBR_UNIT * 1000 / 3250; 166 | /* The gain for deriving steady-state cwnd tolerates delayed/stretched ACKs: */ 167 | static const int bbr_cwnd_gain = BBR_UNIT * 2; 168 | /* The
pacing_gain values for the PROBE_BW gain cycle, to discover/share bw: */ 169 | static const int bbr_pacing_gain[] = { 170 | // gains used in the steady-state BBR_PROBE_BW mode, 171 | // one per phase of the CYCLE_LEN (8) phase cycle 172 | BBR_UNIT * 8 / 4, /* probe for more available bw at 2.0*bw */ 173 | BBR_UNIT * 3 / 4, /* drain queue and/or yield bw to other flows */ 174 | BBR_UNIT * 7 / 4, BBR_UNIT * 7 / 4, BBR_UNIT * 7 / 4, /* pace at 1.75*bw (upstream cruises at 1.0*bw), */ 175 | BBR_UNIT * 8 / 4, BBR_UNIT * 8 / 4, BBR_UNIT * 8 / 4 /* then at 2.0*bw for the remaining phases */ 176 | }; 177 | 178 | /* Randomize the starting gain cycling phase over N phases: */ 179 | static const u32 bbr_cycle_rand = 7; 180 | 181 | /* Try to keep at least this many packets in flight, if things go smoothly. For 182 | * smooth functioning, a sliding window protocol ACKing every other packet 183 | * needs at least 4 packets in flight: 184 | */ 185 | // keep at least 4 packets in flight while probing for the minimum RTT 186 | static const u32 bbr_cwnd_min_target = 4; 187 | 188 | /* To estimate if BBR_STARTUP mode (i.e. high_gain) has filled pipe... */ 189 | /* If bw has increased significantly (2.0x in this branch), there may be more bw available: */ 190 | static const u32 bbr_full_bw_thresh = BBR_UNIT * 8 / 4; 191 | /* But after 3 rounds w/o significant bw growth, estimate pipe is full: */ 192 | static const u32 bbr_full_bw_cnt = 3; 193 | 194 | /* "long-term" ("LT") bandwidth estimator parameters...
*/ 195 | /* The minimum number of rounds in an LT bw sampling interval: */ 196 | static const u32 bbr_lt_intvl_min_rtts = 4; 197 | /* If lost/delivered ratio >= 60/256 (~23%), interval is "lossy" and we may be policed: */ 198 | static const u32 bbr_lt_loss_thresh = 60; 199 | /* If 2 intervals have a bw ratio <= 1/4, their bw is "consistent": */ 200 | static const u32 bbr_lt_bw_ratio = BBR_UNIT / 4; 201 | /* If 2 intervals have a bw diff <= 8 Kbit/sec (1000 bytes/sec) their bw is "consistent": */ 202 | static const u32 bbr_lt_bw_diff = 4000 / 4; 203 | /* If we estimate we're policed, use lt_bw for this many round trips: */ 204 | static const u32 bbr_lt_bw_max_rtts = 40; 205 | 206 | /* Do we estimate that STARTUP filled the pipe? */ 207 | static bool bbr_full_bw_reached(const struct sock *sk) 208 | { 209 | const struct bbr *bbr = inet_csk_ca(sk); 210 | 211 | return bbr->full_bw_cnt >= bbr_full_bw_cnt; 212 | } 213 | 214 | /* Return the windowed max recent bandwidth sample, in pkts/uS << BW_SCALE. */ 215 | static u32 bbr_max_bw(const struct sock *sk) 216 | { 217 | struct bbr *bbr = inet_csk_ca(sk); 218 | 219 | return minmax_get(&bbr->bw); 220 | } 221 | 222 | /* Return the estimated bandwidth of the path, in pkts/uS << BW_SCALE. */ 223 | static u32 bbr_bw(const struct sock *sk) 224 | { 225 | struct bbr *bbr = inet_csk_ca(sk); 226 | 227 | return bbr->lt_use_bw ? bbr->lt_bw : bbr_max_bw(sk); 228 | } 229 | 230 | /* Return rate in bytes per second, optionally with a gain. 231 | * The order here is chosen carefully to avoid overflow of u64. This should 232 | * work for input rates of up to 2.9Tbit/sec and gain of 2.89x. 233 | */ 234 | static u64 bbr_rate_bytes_per_sec(struct sock *sk, u64 rate, int gain) 235 | { 236 | rate *= tcp_mss_to_mtu(sk, tcp_sk(sk)->mss_cache); 237 | rate *= gain; 238 | rate >>= BBR_SCALE; 239 | rate *= USEC_PER_SEC; 240 | return rate >> BW_SCALE; 241 | } 242 | 243 | /* Pace using current bw estimate and a gain factor.
In order to help drive the 244 | * network toward lower queues while maintaining high utilization and low 245 | * latency, the average pacing rate aims to be slightly (~1%) lower than the 246 | * estimated bandwidth. This is an important aspect of the design. In this 247 | * implementation this slightly lower pacing rate is achieved implicitly by not 248 | * including link-layer headers in the packet size used for the pacing rate. 249 | */ 250 | static void bbr_set_pacing_rate(struct sock *sk, u32 bw, int gain) 251 | { 252 | struct bbr *bbr = inet_csk_ca(sk); 253 | u64 rate = bw; 254 | 255 | rate = bbr_rate_bytes_per_sec(sk, rate, gain); 256 | rate = min_t(u64, rate, sk->sk_max_pacing_rate); 257 | if (bbr->mode != BBR_STARTUP || rate > sk->sk_pacing_rate) 258 | sk->sk_pacing_rate = rate; 259 | } 260 | 261 | /* Return count of segments we want in the skbs we send, or 0 for default. */ 262 | static u32 bbr_tso_segs_goal(struct sock *sk) 263 | { 264 | struct bbr *bbr = inet_csk_ca(sk); 265 | 266 | return bbr->tso_segs_goal; 267 | } 268 | 269 | static void bbr_set_tso_segs_goal(struct sock *sk) 270 | { 271 | struct tcp_sock *tp = tcp_sk(sk); 272 | struct bbr *bbr = inet_csk_ca(sk); 273 | u32 min_segs; 274 | 275 | min_segs = sk->sk_pacing_rate < (bbr_min_tso_rate >> 3) ? 
1 : 2; 276 | bbr->tso_segs_goal = min(tcp_tso_autosize(sk, tp->mss_cache, min_segs), 277 | 0x7FU); 278 | } 279 | 280 | /* Save "last known good" cwnd so we can restore it after losses or PROBE_RTT */ 281 | static void bbr_save_cwnd(struct sock *sk) 282 | { 283 | struct tcp_sock *tp = tcp_sk(sk); 284 | struct bbr *bbr = inet_csk_ca(sk); 285 | 286 | if (bbr->prev_ca_state < TCP_CA_Recovery && bbr->mode != BBR_PROBE_RTT) 287 | bbr->prior_cwnd = tp->snd_cwnd; /* this cwnd is good enough */ 288 | else /* loss recovery or BBR_PROBE_RTT have temporarily cut cwnd */ 289 | bbr->prior_cwnd = max(bbr->prior_cwnd, tp->snd_cwnd); 290 | } 291 | 292 | static void bbr_cwnd_event(struct sock *sk, enum tcp_ca_event event) 293 | { 294 | struct tcp_sock *tp = tcp_sk(sk); 295 | struct bbr *bbr = inet_csk_ca(sk); 296 | 297 | if (event == CA_EVENT_TX_START && tp->app_limited) { 298 | bbr->idle_restart = 1; 299 | /* Avoid pointless buffer overflows: pace at est. bw if we don't 300 | * need more speed (we're restarting from idle and app-limited). 301 | */ 302 | if (bbr->mode == BBR_PROBE_BW) 303 | bbr_set_pacing_rate(sk, bbr_bw(sk), BBR_UNIT); 304 | } 305 | } 306 | 307 | /* Find target cwnd. Right-size the cwnd based on min RTT and the 308 | * estimated bottleneck bandwidth: 309 | * 310 | * cwnd = bw * min_rtt * gain = BDP * gain 311 | * 312 | * The key factor, gain, controls the amount of queue. While a small gain 313 | * builds a smaller queue, it becomes more vulnerable to noise in RTT 314 | * measurements (e.g., delayed ACKs or other ACK compression effects). This 315 | * noise may cause BBR to under-estimate the rate. 
316 | * 317 | * To achieve full performance in high-speed paths, we budget enough cwnd to 318 | * fit full-sized skbs in-flight on both end hosts to fully utilize the path: 319 | * - one skb in sending host Qdisc, 320 | * - one skb in sending host TSO/GSO engine 321 | * - one skb being received by receiver host LRO/GRO/delayed-ACK engine 322 | * Don't worry, at low rates (bbr_min_tso_rate) this won't bloat cwnd because 323 | * in such cases tso_segs_goal is 1. The minimum cwnd is 4 packets, 324 | * which allows 2 outstanding 2-packet sequences, to try to keep pipe 325 | * full even with ACK-every-other-packet delayed ACKs. 326 | */ 327 | static u32 bbr_target_cwnd(struct sock *sk, u32 bw, int gain) 328 | { 329 | struct bbr *bbr = inet_csk_ca(sk); 330 | u32 cwnd; 331 | u64 w; 332 | 333 | /* If we've never had a valid RTT sample, cap cwnd at the initial 334 | * default. This should only happen when the connection is not using TCP 335 | * timestamps and has retransmitted all of the SYN/SYNACK/data packets 336 | * ACKed so far. In this case, an RTO can cut cwnd to 1, in which 337 | * case we need to slow-start up toward something safe: TCP_INIT_CWND. 338 | */ 339 | if (unlikely(bbr->min_rtt_us == ~0U)) /* no valid RTT samples yet? */ 340 | return TCP_INIT_CWND; /* be safe: cap at default initial cwnd*/ 341 | 342 | w = (u64)bw * bbr->min_rtt_us; 343 | 344 | /* Apply a gain to the given value, then remove the BW_SCALE shift. */ 345 | cwnd = (((w * gain) >> BBR_SCALE) + BW_UNIT - 1) / BW_UNIT; 346 | 347 | /* Allow enough full-sized skbs in flight to utilize end systems. */ 348 | cwnd += 3 * bbr->tso_segs_goal; 349 | 350 | /* Reduce delayed ACKs by rounding up cwnd to the next even number. */ 351 | cwnd = (cwnd + 1) & ~1U; 352 | 353 | return cwnd; 354 | } 355 | 356 | /* An optimization in BBR to reduce losses: On the first round of recovery, we 357 | * follow the packet conservation principle: send P packets per P packets acked. 
358 | * After that, we slow-start and send at most 2*P packets per P packets acked. 359 | * After recovery finishes, or upon undo, we restore the cwnd we had when 360 | * recovery started (capped by the target cwnd based on estimated BDP). 361 | * 362 | * TODO(ycheng/ncardwell): implement a rate-based approach. 363 | */ 364 | static bool bbr_set_cwnd_to_recover_or_restore( 365 | struct sock *sk, const struct rate_sample *rs, u32 acked, u32 *new_cwnd) 366 | { 367 | struct tcp_sock *tp = tcp_sk(sk); 368 | struct bbr *bbr = inet_csk_ca(sk); 369 | u8 prev_state = bbr->prev_ca_state, state = inet_csk(sk)->icsk_ca_state; 370 | u32 cwnd = tp->snd_cwnd; 371 | 372 | /* An ACK for P pkts should release at most 2*P packets. We do this 373 | * in two steps. First, here we deduct the number of lost packets. 374 | * Then, in bbr_set_cwnd() we slow start up toward the target cwnd. 375 | */ 376 | if (rs->losses > 0) 377 | cwnd = max_t(s32, cwnd - rs->losses, 1); 378 | 379 | if (state == TCP_CA_Recovery && prev_state != TCP_CA_Recovery) { 380 | /* Starting 1st round of Recovery, so do packet conservation. */ 381 | bbr->packet_conservation = 1; 382 | bbr->next_rtt_delivered = tp->delivered; /* start round now */ 383 | /* Cut unused cwnd from app behavior, TSQ, or TSO deferral: */ 384 | cwnd = tcp_packets_in_flight(tp) + acked; 385 | } else if (prev_state >= TCP_CA_Recovery && state < TCP_CA_Recovery) { 386 | /* Exiting loss recovery; restore cwnd saved before recovery. */ 387 | bbr->restore_cwnd = 1; 388 | bbr->packet_conservation = 0; 389 | } 390 | bbr->prev_ca_state = state; 391 | 392 | if (bbr->restore_cwnd) { 393 | /* Restore cwnd after exiting loss recovery or PROBE_RTT. 
*/ 394 | cwnd = max(cwnd, bbr->prior_cwnd); 395 | bbr->restore_cwnd = 0; 396 | } 397 | 398 | if (bbr->packet_conservation) { 399 | *new_cwnd = max(cwnd, tcp_packets_in_flight(tp) + acked); 400 | return true; /* yes, using packet conservation */ 401 | } 402 | *new_cwnd = cwnd; 403 | return false; 404 | } 405 | 406 | /* Slow-start up toward target cwnd (if bw estimate is growing, or packet loss 407 | * has drawn us down below target), or snap down to target if we're above it. 408 | */ 409 | static void bbr_set_cwnd(struct sock *sk, const struct rate_sample *rs, 410 | u32 acked, u32 bw, int gain) 411 | { 412 | struct tcp_sock *tp = tcp_sk(sk); 413 | struct bbr *bbr = inet_csk_ca(sk); 414 | u32 cwnd = 0, target_cwnd = 0; 415 | 416 | if (!acked) 417 | return; 418 | 419 | if (bbr_set_cwnd_to_recover_or_restore(sk, rs, acked, &cwnd)) 420 | goto done; 421 | 422 | /* If we're below target cwnd, slow start cwnd toward target cwnd. */ 423 | target_cwnd = bbr_target_cwnd(sk, bw, gain); 424 | if (bbr_full_bw_reached(sk)) /* only cut cwnd if we filled the pipe */ 425 | cwnd = min(cwnd + acked, target_cwnd); 426 | else if (cwnd < target_cwnd || tp->delivered < TCP_INIT_CWND) 427 | cwnd = cwnd + acked; 428 | cwnd = max(cwnd, bbr_cwnd_min_target); 429 | 430 | done: 431 | tp->snd_cwnd = min(cwnd, tp->snd_cwnd_clamp); /* apply global cap */ 432 | if (bbr->mode == BBR_PROBE_RTT) /* drain queue, refresh min_rtt */ 433 | tp->snd_cwnd = max(tp->snd_cwnd >> 1, bbr_cwnd_min_target); 434 | } 435 | 436 | /* End cycle phase if it's time and/or we hit the phase's in-flight target. 
*/ 437 | static bool bbr_is_next_cycle_phase(struct sock *sk, 438 | const struct rate_sample *rs) 439 | { 440 | struct tcp_sock *tp = tcp_sk(sk); 441 | struct bbr *bbr = inet_csk_ca(sk); 442 | bool is_full_length = 443 | skb_mstamp_us_delta(&tp->delivered_mstamp, &bbr->cycle_mstamp) > 444 | bbr->min_rtt_us; 445 | u32 inflight, bw; 446 | 447 | /* The pacing_gain of 1.0 paces at the estimated bw to try to fully 448 | * use the pipe without increasing the queue. 449 | */ 450 | if (bbr->pacing_gain == BBR_UNIT) 451 | return is_full_length; /* just use wall clock time */ 452 | 453 | inflight = rs->prior_in_flight; /* what was in-flight before ACK? */ 454 | bw = bbr_max_bw(sk); 455 | 456 | /* A pacing_gain > 1.0 probes for bw by trying to raise inflight to at 457 | * least pacing_gain*BDP; this may take more than min_rtt if min_rtt is 458 | * small (e.g. on a LAN). We do not persist if packets are lost, since 459 | * a path with small buffers may not hold that much. 460 | */ 461 | if (bbr->pacing_gain > BBR_UNIT) 462 | return is_full_length && 463 | (rs->losses || /* perhaps pacing_gain*BDP won't fit */ 464 | inflight >= bbr_target_cwnd(sk, bw, bbr->pacing_gain)); 465 | 466 | /* A pacing_gain < 1.0 tries to drain extra queue we added if bw 467 | * probing didn't find more bw. If inflight falls to match BDP then we 468 | * estimate queue is drained; persisting would underutilize the pipe. 469 | */ 470 | return is_full_length || 471 | inflight <= bbr_target_cwnd(sk, bw, BBR_UNIT); 472 | } 473 | 474 | static void bbr_advance_cycle_phase(struct sock *sk) 475 | { 476 | struct tcp_sock *tp = tcp_sk(sk); 477 | struct bbr *bbr = inet_csk_ca(sk); 478 | 479 | bbr->cycle_idx = (bbr->cycle_idx + 1) & (CYCLE_LEN - 1); 480 | bbr->cycle_mstamp = tp->delivered_mstamp; 481 | bbr->pacing_gain = bbr_pacing_gain[bbr->cycle_idx]; 482 | } 483 | 484 | /* Gain cycling: cycle pacing gain to converge to fair share of available bw. 
*/ 485 | static void bbr_update_cycle_phase(struct sock *sk, 486 | const struct rate_sample *rs) 487 | { 488 | struct bbr *bbr = inet_csk_ca(sk); 489 | 490 | if ((bbr->mode == BBR_PROBE_BW) && !bbr->lt_use_bw && 491 | bbr_is_next_cycle_phase(sk, rs)) 492 | bbr_advance_cycle_phase(sk); 493 | } 494 | 495 | static void bbr_reset_startup_mode(struct sock *sk) 496 | { 497 | struct bbr *bbr = inet_csk_ca(sk); 498 | 499 | bbr->mode = BBR_STARTUP; 500 | bbr->pacing_gain = bbr_high_gain; 501 | bbr->cwnd_gain = bbr_high_gain; 502 | } 503 | 504 | static void bbr_reset_probe_bw_mode(struct sock *sk) 505 | { 506 | struct bbr *bbr = inet_csk_ca(sk); 507 | 508 | bbr->mode = BBR_PROBE_BW; 509 | bbr->pacing_gain = BBR_UNIT; 510 | bbr->cwnd_gain = bbr_cwnd_gain; 511 | bbr->cycle_idx = CYCLE_LEN - 1 - prandom_u32_max(bbr_cycle_rand); 512 | bbr_advance_cycle_phase(sk); /* flip to next phase of gain cycle */ 513 | } 514 | 515 | static void bbr_reset_mode(struct sock *sk) 516 | { 517 | if (!bbr_full_bw_reached(sk)) 518 | bbr_reset_startup_mode(sk); 519 | else 520 | bbr_reset_probe_bw_mode(sk); 521 | } 522 | 523 | /* Start a new long-term sampling interval. */ 524 | static void bbr_reset_lt_bw_sampling_interval(struct sock *sk) 525 | { 526 | struct tcp_sock *tp = tcp_sk(sk); 527 | struct bbr *bbr = inet_csk_ca(sk); 528 | 529 | bbr->lt_last_stamp = tp->delivered_mstamp.stamp_jiffies; 530 | bbr->lt_last_delivered = tp->delivered; 531 | bbr->lt_last_lost = tp->lost; 532 | bbr->lt_rtt_cnt = 0; 533 | } 534 | 535 | /* Completely reset long-term bandwidth sampling. */ 536 | static void bbr_reset_lt_bw_sampling(struct sock *sk) 537 | { 538 | struct bbr *bbr = inet_csk_ca(sk); 539 | 540 | bbr->lt_bw = 0; 541 | bbr->lt_use_bw = 0; 542 | bbr->lt_is_sampling = false; 543 | bbr_reset_lt_bw_sampling_interval(sk); 544 | } 545 | 546 | /* Long-term bw sampling interval is done. Estimate whether we're policed. 
*/ 547 | static void bbr_lt_bw_interval_done(struct sock *sk, u32 bw) 548 | { 549 | struct bbr *bbr = inet_csk_ca(sk); 550 | u32 diff; 551 | 552 | if (bbr->lt_bw) { /* do we have bw from a previous interval? */ 553 | /* Is new bw close to the lt_bw from the previous interval? */ 554 | diff = abs(bw - bbr->lt_bw); 555 | if ((diff * BBR_UNIT <= bbr_lt_bw_ratio * bbr->lt_bw) || 556 | (bbr_rate_bytes_per_sec(sk, diff, BBR_UNIT) <= 557 | bbr_lt_bw_diff)) { 558 | /* All criteria are met; estimate we're policed. */ 559 | bbr->lt_bw = (bw + bbr->lt_bw) >> 1; /* avg 2 intvls */ 560 | bbr->lt_use_bw = 1; 561 | bbr->pacing_gain = BBR_UNIT; /* try to avoid drops */ 562 | bbr->lt_rtt_cnt = 0; 563 | return; 564 | } 565 | } 566 | bbr->lt_bw = bw; 567 | bbr_reset_lt_bw_sampling_interval(sk); 568 | } 569 | 570 | /* Token-bucket traffic policers are common (see "An Internet-Wide Analysis of 571 | * Traffic Policing", SIGCOMM 2016). BBR detects token-bucket policers and 572 | * explicitly models their policed rate, to reduce unnecessary losses. We 573 | * estimate that we're policed if we see 2 consecutive sampling intervals with 574 | * consistent throughput and high packet loss. If we think we're being policed, 575 | * set lt_bw to the "long-term" average delivery rate from those 2 intervals. 576 | */ 577 | static void bbr_lt_bw_sampling(struct sock *sk, const struct rate_sample *rs) 578 | { 579 | struct tcp_sock *tp = tcp_sk(sk); 580 | struct bbr *bbr = inet_csk_ca(sk); 581 | u32 lost, delivered; 582 | u64 bw; 583 | s32 t; 584 | 585 | if (bbr->lt_use_bw) { /* already using long-term rate, lt_bw? 
*/ 586 | if (bbr->mode == BBR_PROBE_BW && bbr->round_start && 587 | ++bbr->lt_rtt_cnt >= bbr_lt_bw_max_rtts) { 588 | bbr_reset_lt_bw_sampling(sk); /* stop using lt_bw */ 589 | bbr_reset_probe_bw_mode(sk); /* restart gain cycling */ 590 | } 591 | return; 592 | } 593 | 594 | /* Wait for the first loss before sampling, to let the policer exhaust 595 | * its tokens and estimate the steady-state rate allowed by the policer. 596 | * Starting samples earlier includes bursts that over-estimate the bw. 597 | */ 598 | if (!bbr->lt_is_sampling) { 599 | if (!rs->losses) 600 | return; 601 | bbr_reset_lt_bw_sampling_interval(sk); 602 | bbr->lt_is_sampling = true; 603 | } 604 | 605 | /* To avoid underestimates, reset sampling if we run out of data. */ 606 | if (rs->is_app_limited) { 607 | bbr_reset_lt_bw_sampling(sk); 608 | return; 609 | } 610 | 611 | if (bbr->round_start) 612 | bbr->lt_rtt_cnt++; /* count round trips in this interval */ 613 | if (bbr->lt_rtt_cnt < bbr_lt_intvl_min_rtts) 614 | return; /* sampling interval needs to be longer */ 615 | if (bbr->lt_rtt_cnt > 4 * bbr_lt_intvl_min_rtts) { 616 | bbr_reset_lt_bw_sampling(sk); /* interval is too long */ 617 | return; 618 | } 619 | 620 | /* End sampling interval when a packet is lost, so we estimate the 621 | * policer tokens were exhausted. Stopping the sampling before the 622 | * tokens are exhausted under-estimates the policed rate. 623 | */ 624 | if (!rs->losses) 625 | return; 626 | 627 | /* Calculate packets lost and delivered in sampling interval. */ 628 | lost = tp->lost - bbr->lt_last_lost; 629 | delivered = tp->delivered - bbr->lt_last_delivered; 630 | /* Is loss rate (lost/delivered) >= lt_loss_thresh? If not, wait. */ 631 | if (!delivered || (lost << BBR_SCALE) < bbr_lt_loss_thresh * delivered) 632 | return; 633 | 634 | /* Find average delivery rate in this sampling interval. 
*/ 635 | t = (s32)(tp->delivered_mstamp.stamp_jiffies - bbr->lt_last_stamp); 636 | if (t < 1) 637 | return; /* interval is less than one jiffy, so wait */ 638 | t = jiffies_to_usecs(t); 639 | /* Interval long enough for jiffies_to_usecs() to return a bogus 0? */ 640 | if (t < 1) { 641 | bbr_reset_lt_bw_sampling(sk); /* interval too long; reset */ 642 | return; 643 | } 644 | bw = (u64)delivered * BW_UNIT; 645 | do_div(bw, t); 646 | bbr_lt_bw_interval_done(sk, bw); 647 | } 648 | 649 | /* Estimate the bandwidth based on how fast packets are delivered */ 650 | static void bbr_update_bw(struct sock *sk, const struct rate_sample *rs) 651 | { 652 | struct tcp_sock *tp = tcp_sk(sk); 653 | struct bbr *bbr = inet_csk_ca(sk); 654 | u64 bw; 655 | 656 | bbr->round_start = 0; 657 | if (rs->delivered < 0 || rs->interval_us <= 0) 658 | return; /* Not a valid observation */ 659 | 660 | /* See if we've reached the next RTT */ 661 | if (!before(rs->prior_delivered, bbr->next_rtt_delivered)) { 662 | bbr->next_rtt_delivered = tp->delivered; 663 | bbr->rtt_cnt++; 664 | bbr->round_start = 1; 665 | bbr->packet_conservation = 0; 666 | } 667 | 668 | bbr_lt_bw_sampling(sk, rs); 669 | 670 | /* Divide delivered by the interval to find a (lower bound) bottleneck 671 | * bandwidth sample. Delivered is in packets and interval_us in uS and 672 | * ratio will be <<1 for most connections. So delivered is first scaled. 673 | */ 674 | bw = (u64)rs->delivered * BW_UNIT; 675 | do_div(bw, rs->interval_us); 676 | 677 | /* If this sample is application-limited, it is likely to have a very 678 | * low delivered count that represents application behavior rather than 679 | * the available network rate. Such a sample could drag down estimated 680 | * bw, causing needless slow-down. Thus, to continue to send at the 681 | * last measured network rate, we filter out app-limited samples unless 682 | * they describe the path bw at least as well as our bw model. 
683 | * 684 | * So the goal during app-limited phase is to proceed with the best 685 | * network rate no matter how long. We automatically leave this 686 | * phase when app writes faster than the network can deliver :) 687 | */ 688 | if (!rs->is_app_limited || bw >= bbr_max_bw(sk)) { 689 | /* Incorporate new sample into our max bw filter. */ 690 | minmax_running_max(&bbr->bw, bbr_bw_rtts, bbr->rtt_cnt, bw); 691 | } 692 | } 693 | 694 | /* Estimate when the pipe is full, using the change in delivery rate: BBR 695 | * estimates that STARTUP filled the pipe if the estimated bw hasn't changed by 696 | * at least bbr_full_bw_thresh (25%) after bbr_full_bw_cnt (3) non-app-limited 697 | * rounds. Why 3 rounds: 1: rwin autotuning grows the rwin, 2: we fill the 698 | * higher rwin, 3: we get higher delivery rate samples. Or transient 699 | * cross-traffic or radio noise can go away. CUBIC Hystart shares a similar 700 | * design goal, but uses delay and inter-ACK spacing instead of bandwidth. 701 | */ 702 | static void bbr_check_full_bw_reached(struct sock *sk, 703 | const struct rate_sample *rs) 704 | { 705 | struct bbr *bbr = inet_csk_ca(sk); 706 | u32 bw_thresh; 707 | 708 | if (bbr_full_bw_reached(sk) || !bbr->round_start || rs->is_app_limited) 709 | return; 710 | 711 | bw_thresh = (u64)bbr->full_bw * bbr_full_bw_thresh >> BBR_SCALE; 712 | if (bbr_max_bw(sk) >= bw_thresh) { 713 | bbr->full_bw = bbr_max_bw(sk); 714 | bbr->full_bw_cnt = 0; 715 | return; 716 | } 717 | ++bbr->full_bw_cnt; 718 | } 719 | 720 | /* If pipe is probably full, drain the queue and then enter steady-state. 
*/ 721 | static void bbr_check_drain(struct sock *sk, const struct rate_sample *rs) 722 | { 723 | struct bbr *bbr = inet_csk_ca(sk); 724 | 725 | if (bbr->mode == BBR_STARTUP && bbr_full_bw_reached(sk)) { 726 | bbr->mode = BBR_DRAIN; /* drain queue we created */ 727 | bbr->pacing_gain = bbr_drain_gain; /* pace slow to drain */ 728 | bbr->cwnd_gain = bbr_high_gain; /* maintain cwnd */ 729 | } /* fall through to check if in-flight is already small: */ 730 | if (bbr->mode == BBR_DRAIN && 731 | tcp_packets_in_flight(tcp_sk(sk)) <= 732 | bbr_target_cwnd(sk, bbr_max_bw(sk), BBR_UNIT)) 733 | bbr_reset_probe_bw_mode(sk); /* we estimate queue is drained */ 734 | } 735 | 736 | /* The goal of PROBE_RTT mode is to have BBR flows cooperatively and 737 | * periodically drain the bottleneck queue, to converge to measure the true 738 | * min_rtt (unloaded propagation delay). This allows the flows to keep queues 739 | * small (reducing queuing delay and packet loss) and achieve fairness among 740 | * BBR flows. 741 | * 742 | * The min_rtt filter window is 10 seconds. When the min_rtt estimate expires, 743 | * we enter PROBE_RTT mode and cap the cwnd at bbr_cwnd_min_target=4 packets. 744 | * After at least bbr_probe_rtt_mode_ms=200ms and at least one packet-timed 745 | * round trip elapsed with that flight size <= 4, we leave PROBE_RTT mode and 746 | * re-enter the previous mode. BBR uses 200ms to approximately bound the 747 | * performance penalty of PROBE_RTT's cwnd capping to roughly 2% (200ms/10s). 748 | * 749 | * Note that flows need only pay 2% if they are busy sending over the last 10 750 | * seconds. Interactive applications (e.g., Web, RPCs, video chunks) often have 751 | * natural silences or low-rate periods within 10 seconds where the rate is low 752 | * enough for long enough to drain its queue in the bottleneck. We pick up 753 | * these min RTT measurements opportunistically with our min_rtt filter. 
:-) 754 | */ 755 | static void bbr_update_min_rtt(struct sock *sk, const struct rate_sample *rs) 756 | { 757 | struct tcp_sock *tp = tcp_sk(sk); 758 | struct bbr *bbr = inet_csk_ca(sk); 759 | bool filter_expired; 760 | 761 | /* Track min RTT seen in the min_rtt_win_sec filter window: */ 762 | /* Per the constants defined earlier in this file, bbr_min_rtt_win_sec is 5 seconds here. */ 763 | filter_expired = after(tcp_time_stamp, 764 | bbr->min_rtt_stamp + bbr_min_rtt_win_sec * HZ); 765 | if (rs->rtt_us >= 0 && 766 | (rs->rtt_us <= bbr->min_rtt_us || filter_expired)) { 767 | bbr->min_rtt_us = rs->rtt_us; 768 | bbr->min_rtt_stamp = tcp_time_stamp; 769 | } 770 | 771 | if (bbr_probe_rtt_mode_ms > 0 && filter_expired && 772 | !bbr->idle_restart && bbr->mode != BBR_PROBE_RTT) { 773 | bbr->mode = BBR_PROBE_RTT; /* dip, drain queue */ 774 | bbr->pacing_gain = BBR_UNIT; 775 | bbr->cwnd_gain = BBR_UNIT; 776 | bbr_save_cwnd(sk); /* note cwnd so we can restore it */ 777 | bbr->probe_rtt_done_stamp = 0; 778 | } 779 | 780 | if (bbr->mode == BBR_PROBE_RTT) { 781 | /* Ignore low rate samples during this mode. */ 782 | tp->app_limited = 783 | (tp->delivered + tcp_packets_in_flight(tp)) ? : 1; 784 | /* Maintain min packets in flight for max(200 ms, 1 round).
*/ 785 | if (!bbr->probe_rtt_done_stamp && 786 | tcp_packets_in_flight(tp) <= bbr_cwnd_min_target) { 787 | bbr->probe_rtt_done_stamp = tcp_time_stamp + 788 | msecs_to_jiffies(bbr_probe_rtt_mode_ms >> 1); 789 | bbr->probe_rtt_round_done = 0; 790 | bbr->next_rtt_delivered = tp->delivered; 791 | } else if (bbr->probe_rtt_done_stamp) { 792 | if (bbr->round_start) 793 | bbr->probe_rtt_round_done = 1; 794 | if (bbr->probe_rtt_round_done && 795 | after(tcp_time_stamp, bbr->probe_rtt_done_stamp)) { 796 | bbr->min_rtt_stamp = tcp_time_stamp; 797 | bbr->restore_cwnd = 1; /* snap to prior_cwnd */ 798 | bbr_reset_mode(sk); 799 | } 800 | } 801 | } 802 | bbr->idle_restart = 0; 803 | } 804 | 805 | static void bbr_update_model(struct sock *sk, const struct rate_sample *rs) 806 | { 807 | bbr_update_bw(sk, rs); 808 | bbr_update_cycle_phase(sk, rs); 809 | bbr_check_full_bw_reached(sk, rs); 810 | bbr_check_drain(sk, rs); 811 | bbr_update_min_rtt(sk, rs); 812 | } 813 | 814 | static void bbr_main(struct sock *sk, const struct rate_sample *rs) 815 | { 816 | struct bbr *bbr = inet_csk_ca(sk); 817 | u32 bw; 818 | 819 | bbr_update_model(sk, rs); 820 | 821 | bw = bbr_bw(sk); 822 | bbr_set_pacing_rate(sk, bw, bbr->pacing_gain); 823 | bbr_set_tso_segs_goal(sk); 824 | bbr_set_cwnd(sk, rs, rs->acked_sacked, bw, bbr->cwnd_gain); 825 | } 826 | 827 | static void bbr_init(struct sock *sk) 828 | { 829 | struct tcp_sock *tp = tcp_sk(sk); 830 | struct bbr *bbr = inet_csk_ca(sk); 831 | u64 bw; 832 | 833 | bbr->prior_cwnd = 0; 834 | bbr->tso_segs_goal = 0; /* default segs per skb until first ACK */ 835 | bbr->rtt_cnt = 0; 836 | bbr->next_rtt_delivered = 0; 837 | bbr->prev_ca_state = TCP_CA_Open; 838 | bbr->packet_conservation = 0; 839 | 840 | bbr->probe_rtt_done_stamp = 0; 841 | bbr->probe_rtt_round_done = 0; 842 | bbr->min_rtt_us = tcp_min_rtt(tp); 843 | bbr->min_rtt_stamp = tcp_time_stamp; 844 | 845 | minmax_reset(&bbr->bw, bbr->rtt_cnt, 0); /* init max bw to 0 */ 846 | 847 | /* Initialize pacing rate 
to: high_gain * init_cwnd / RTT. */ 848 | bw = (u64)tp->snd_cwnd * BW_UNIT; 849 | do_div(bw, (tp->srtt_us >> 3) ? : USEC_PER_MSEC); 850 | sk->sk_pacing_rate = 0; /* force an update of sk_pacing_rate */ 851 | bbr_set_pacing_rate(sk, bw, bbr_high_gain); 852 | 853 | bbr->restore_cwnd = 0; 854 | bbr->round_start = 0; 855 | bbr->idle_restart = 0; 856 | bbr->full_bw = 0; 857 | bbr->full_bw_cnt = 0; 858 | bbr->cycle_mstamp.v64 = 0; 859 | bbr->cycle_idx = 0; 860 | bbr_reset_lt_bw_sampling(sk); 861 | bbr_reset_startup_mode(sk); 862 | } 863 | 864 | static u32 bbr_sndbuf_expand(struct sock *sk) 865 | { 866 | /* Provision 3 * cwnd since BBR may slow-start even during recovery. */ 867 | return 3; 868 | } 869 | 870 | /* In theory BBR does not need to undo the cwnd since it does not 871 | * always reduce cwnd on losses (see bbr_main()). Keep it for now. 872 | */ 873 | static u32 bbr_undo_cwnd(struct sock *sk) 874 | { 875 | return tcp_sk(sk)->snd_cwnd; 876 | } 877 | 878 | /* Entering loss recovery, so save cwnd for when we exit or undo recovery. 
*/ 879 | static u32 bbr_ssthresh(struct sock *sk) 880 | { 881 | bbr_save_cwnd(sk); 882 | return TCP_INFINITE_SSTHRESH; /* BBR does not use ssthresh */ 883 | } 884 | 885 | static size_t bbr_get_info(struct sock *sk, u32 ext, int *attr, union tcp_cc_info *info) 886 | { 887 | if (ext & (1 << (INET_DIAG_BBRINFO - 1)) || 888 | ext & (1 << (INET_DIAG_VEGASINFO - 1))) { 889 | struct tcp_sock *tp = tcp_sk(sk); 890 | struct bbr *bbr = inet_csk_ca(sk); 891 | u64 bw = bbr_bw(sk); 892 | bw = bw * tp->mss_cache * USEC_PER_SEC >> BW_SCALE; 893 | memset(&info->bbr, 0, sizeof(info->bbr)); 894 | info->bbr.bbr_bw_lo = (u32)bw; 895 | info->bbr.bbr_bw_hi = (u32)(bw >> 32); 896 | info->bbr.bbr_min_rtt = bbr->min_rtt_us; 897 | info->bbr.bbr_pacing_gain = bbr->pacing_gain; 898 | info->bbr.bbr_cwnd_gain = bbr->cwnd_gain; 899 | *attr = INET_DIAG_BBRINFO; 900 | return sizeof(info->bbr); 901 | } 902 | return 0; 903 | } 904 | 905 | static void bbr_set_state(struct sock *sk, u8 new_state) 906 | { 907 | struct bbr *bbr = inet_csk_ca(sk); 908 | 909 | if (new_state == TCP_CA_Loss) { 910 | struct rate_sample rs = { .losses = 1 }; 911 | 912 | bbr->prev_ca_state = TCP_CA_Loss; 913 | bbr->full_bw = 0; 914 | bbr->round_start = 1; /* treat RTO like end of a round */ 915 | bbr_lt_bw_sampling(sk, &rs); 916 | } 917 | } 918 | 919 | static struct tcp_congestion_ops tcp_bbr_cong_ops __read_mostly = { 920 | .flags = TCP_CONG_NON_RESTRICTED, 921 | .name = "nanqinlang", 922 | .owner = THIS_MODULE, 923 | .init = bbr_init, 924 | .cong_control = bbr_main, 925 | .sndbuf_expand = bbr_sndbuf_expand, 926 | .undo_cwnd = bbr_undo_cwnd, 927 | .cwnd_event = bbr_cwnd_event, 928 | .ssthresh = bbr_ssthresh, 929 | .tso_segs_goal = bbr_tso_segs_goal, 930 | .get_info = bbr_get_info, 931 | .set_state = bbr_set_state, 932 | }; 933 | 934 | static int __init bbr_register(void) 935 | { 936 | BUILD_BUG_ON(sizeof(struct bbr) > ICSK_CA_PRIV_SIZE); 937 | return tcp_register_congestion_control(&tcp_bbr_cong_ops); 938 | } 939 | 940 | 
static void __exit bbr_unregister(void) 941 | { 942 | tcp_unregister_congestion_control(&tcp_bbr_cong_ops); 943 | } 944 | 945 | module_init(bbr_register); 946 | module_exit(bbr_unregister); 947 | 948 | MODULE_AUTHOR("Van Jacobson "); 949 | MODULE_AUTHOR("Neal Cardwell "); 950 | MODULE_AUTHOR("Yuchung Cheng "); 951 | MODULE_AUTHOR("Soheil Hassas Yeganeh "); 952 | MODULE_LICENSE("Dual BSD/GPL"); 953 | MODULE_DESCRIPTION("TCP BBR (Bottleneck Bandwidth and RTT)"); 954 | MODULE_AUTHOR("Nanqinlang "); 955 | --------------------------------------------------------------------------------
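The fixed-point arithmetic in `bbr_update_bw()`, `bbr_check_full_bw_reached()` and `bbr_get_info()` can be tried outside the kernel. The block below is a minimal user-space sketch, not part of the module. It assumes `BW_SCALE = 24` and `BBR_SCALE = 8`, the values upstream Linux `tcp_bbr.c` uses; the module's own definitions are not shown in this section and may differ. The helper names `bw_sample`, `bw_to_bytes_per_sec` and `full_bw_grew` are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Assumed values mirroring upstream tcp_bbr.c; check the module's own defines. */
#define BW_SCALE 24
#define BW_UNIT (1 << BW_SCALE)
#define BBR_SCALE 8
#define BBR_UNIT (1 << BBR_SCALE)
#define USEC_PER_SEC 1000000ULL

/* Hypothetical helper: delivery rate in (packets per usec) << BW_SCALE.
 * This is the scaling bbr_update_bw() applies before dividing by the
 * interval, so that rates well below 1 packet/usec keep their precision.
 */
static uint64_t bw_sample(uint32_t delivered_pkts, uint32_t interval_us)
{
	return ((uint64_t)delivered_pkts * BW_UNIT) / interval_us;
}

/* Hypothetical helper: convert a scaled bw to bytes per second, following
 * the expression in bbr_get_info(): bw * mss * USEC_PER_SEC >> BW_SCALE.
 */
static uint64_t bw_to_bytes_per_sec(uint64_t bw, uint32_t mss)
{
	return (bw * mss * USEC_PER_SEC) >> BW_SCALE;
}

/* Hypothetical helper: the growth test from bbr_check_full_bw_reached().
 * thresh is a gain in BBR_SCALE fixed point, e.g. BBR_UNIT * 5 / 4 for
 * 1.25x. Returns 1 if max_bw still reaches the threshold (pipe not yet
 * considered full), 0 otherwise.
 */
static int full_bw_grew(uint32_t max_bw, uint32_t full_bw, uint32_t thresh)
{
	uint32_t bw_thresh = (uint32_t)(((uint64_t)full_bw * thresh) >> BBR_SCALE);

	return max_bw >= bw_thresh;
}
```

For example, with `full_bw = 1000` and `thresh = BBR_UNIT * 5 / 4` (320), the threshold works out to `1000 * 320 >> 8 = 1250`, so a `max_bw` of 1250 counts as growth and 1249 does not. Because the division in `bw_sample()` truncates, `bw_to_bytes_per_sec()` returns a value slightly below the exact rate.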