├── LICENSE
├── README
└── v3.10
    ├── 0001-net-tcp-TCP-with-Forward-Error-Correction-Common.patch
    ├── 0002-net-tcp-TCP-with-Forward-Error-Correction-Receiver.patch
    └── 0003-net-tcp-TCP-with-Forward-Error-Correction-Sender.patch
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 2, June 1991
3 |
4 | Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
5 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
6 | Everyone is permitted to copy and distribute verbatim copies
7 | of this license document, but changing it is not allowed.
8 |
9 | Preamble
10 |
11 | The licenses for most software are designed to take away your
12 | freedom to share and change it. By contrast, the GNU General Public
13 | License is intended to guarantee your freedom to share and change free
14 | software--to make sure the software is free for all its users. This
15 | General Public License applies to most of the Free Software
16 | Foundation's software and to any other program whose authors commit to
17 | using it. (Some other Free Software Foundation software is covered by
18 | the GNU Lesser General Public License instead.) You can apply it to
19 | your programs, too.
20 |
21 | When we speak of free software, we are referring to freedom, not
22 | price. Our General Public Licenses are designed to make sure that you
23 | have the freedom to distribute copies of free software (and charge for
24 | this service if you wish), that you receive source code or can get it
25 | if you want it, that you can change the software or use pieces of it
26 | in new free programs; and that you know you can do these things.
27 |
28 | To protect your rights, we need to make restrictions that forbid
29 | anyone to deny you these rights or to ask you to surrender the rights.
30 | These restrictions translate to certain responsibilities for you if you
31 | distribute copies of the software, or if you modify it.
32 |
33 | For example, if you distribute copies of such a program, whether
34 | gratis or for a fee, you must give the recipients all the rights that
35 | you have. You must make sure that they, too, receive or can get the
36 | source code. And you must show them these terms so they know their
37 | rights.
38 |
39 | We protect your rights with two steps: (1) copyright the software, and
40 | (2) offer you this license which gives you legal permission to copy,
41 | distribute and/or modify the software.
42 |
43 | Also, for each author's protection and ours, we want to make certain
44 | that everyone understands that there is no warranty for this free
45 | software. If the software is modified by someone else and passed on, we
46 | want its recipients to know that what they have is not the original, so
47 | that any problems introduced by others will not reflect on the original
48 | authors' reputations.
49 |
50 | Finally, any free program is threatened constantly by software
51 | patents. We wish to avoid the danger that redistributors of a free
52 | program will individually obtain patent licenses, in effect making the
53 | program proprietary. To prevent this, we have made it clear that any
54 | patent must be licensed for everyone's free use or not licensed at all.
55 |
56 | The precise terms and conditions for copying, distribution and
57 | modification follow.
58 |
59 | GNU GENERAL PUBLIC LICENSE
60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
61 |
62 | 0. This License applies to any program or other work which contains
63 | a notice placed by the copyright holder saying it may be distributed
64 | under the terms of this General Public License. The "Program", below,
65 | refers to any such program or work, and a "work based on the Program"
66 | means either the Program or any derivative work under copyright law:
67 | that is to say, a work containing the Program or a portion of it,
68 | either verbatim or with modifications and/or translated into another
69 | language. (Hereinafter, translation is included without limitation in
70 | the term "modification".) Each licensee is addressed as "you".
71 |
72 | Activities other than copying, distribution and modification are not
73 | covered by this License; they are outside its scope. The act of
74 | running the Program is not restricted, and the output from the Program
75 | is covered only if its contents constitute a work based on the
76 | Program (independent of having been made by running the Program).
77 | Whether that is true depends on what the Program does.
78 |
79 | 1. You may copy and distribute verbatim copies of the Program's
80 | source code as you receive it, in any medium, provided that you
81 | conspicuously and appropriately publish on each copy an appropriate
82 | copyright notice and disclaimer of warranty; keep intact all the
83 | notices that refer to this License and to the absence of any warranty;
84 | and give any other recipients of the Program a copy of this License
85 | along with the Program.
86 |
87 | You may charge a fee for the physical act of transferring a copy, and
88 | you may at your option offer warranty protection in exchange for a fee.
89 |
90 | 2. You may modify your copy or copies of the Program or any portion
91 | of it, thus forming a work based on the Program, and copy and
92 | distribute such modifications or work under the terms of Section 1
93 | above, provided that you also meet all of these conditions:
94 |
95 | a) You must cause the modified files to carry prominent notices
96 | stating that you changed the files and the date of any change.
97 |
98 | b) You must cause any work that you distribute or publish, that in
99 | whole or in part contains or is derived from the Program or any
100 | part thereof, to be licensed as a whole at no charge to all third
101 | parties under the terms of this License.
102 |
103 | c) If the modified program normally reads commands interactively
104 | when run, you must cause it, when started running for such
105 | interactive use in the most ordinary way, to print or display an
106 | announcement including an appropriate copyright notice and a
107 | notice that there is no warranty (or else, saying that you provide
108 | a warranty) and that users may redistribute the program under
109 | these conditions, and telling the user how to view a copy of this
110 | License. (Exception: if the Program itself is interactive but
111 | does not normally print such an announcement, your work based on
112 | the Program is not required to print an announcement.)
113 |
114 | These requirements apply to the modified work as a whole. If
115 | identifiable sections of that work are not derived from the Program,
116 | and can be reasonably considered independent and separate works in
117 | themselves, then this License, and its terms, do not apply to those
118 | sections when you distribute them as separate works. But when you
119 | distribute the same sections as part of a whole which is a work based
120 | on the Program, the distribution of the whole must be on the terms of
121 | this License, whose permissions for other licensees extend to the
122 | entire whole, and thus to each and every part regardless of who wrote it.
123 |
124 | Thus, it is not the intent of this section to claim rights or contest
125 | your rights to work written entirely by you; rather, the intent is to
126 | exercise the right to control the distribution of derivative or
127 | collective works based on the Program.
128 |
129 | In addition, mere aggregation of another work not based on the Program
130 | with the Program (or with a work based on the Program) on a volume of
131 | a storage or distribution medium does not bring the other work under
132 | the scope of this License.
133 |
134 | 3. You may copy and distribute the Program (or a work based on it,
135 | under Section 2) in object code or executable form under the terms of
136 | Sections 1 and 2 above provided that you also do one of the following:
137 |
138 | a) Accompany it with the complete corresponding machine-readable
139 | source code, which must be distributed under the terms of Sections
140 | 1 and 2 above on a medium customarily used for software interchange; or,
141 |
142 | b) Accompany it with a written offer, valid for at least three
143 | years, to give any third party, for a charge no more than your
144 | cost of physically performing source distribution, a complete
145 | machine-readable copy of the corresponding source code, to be
146 | distributed under the terms of Sections 1 and 2 above on a medium
147 | customarily used for software interchange; or,
148 |
149 | c) Accompany it with the information you received as to the offer
150 | to distribute corresponding source code. (This alternative is
151 | allowed only for noncommercial distribution and only if you
152 | received the program in object code or executable form with such
153 | an offer, in accord with Subsection b above.)
154 |
155 | The source code for a work means the preferred form of the work for
156 | making modifications to it. For an executable work, complete source
157 | code means all the source code for all modules it contains, plus any
158 | associated interface definition files, plus the scripts used to
159 | control compilation and installation of the executable. However, as a
160 | special exception, the source code distributed need not include
161 | anything that is normally distributed (in either source or binary
162 | form) with the major components (compiler, kernel, and so on) of the
163 | operating system on which the executable runs, unless that component
164 | itself accompanies the executable.
165 |
166 | If distribution of executable or object code is made by offering
167 | access to copy from a designated place, then offering equivalent
168 | access to copy the source code from the same place counts as
169 | distribution of the source code, even though third parties are not
170 | compelled to copy the source along with the object code.
171 |
172 | 4. You may not copy, modify, sublicense, or distribute the Program
173 | except as expressly provided under this License. Any attempt
174 | otherwise to copy, modify, sublicense or distribute the Program is
175 | void, and will automatically terminate your rights under this License.
176 | However, parties who have received copies, or rights, from you under
177 | this License will not have their licenses terminated so long as such
178 | parties remain in full compliance.
179 |
180 | 5. You are not required to accept this License, since you have not
181 | signed it. However, nothing else grants you permission to modify or
182 | distribute the Program or its derivative works. These actions are
183 | prohibited by law if you do not accept this License. Therefore, by
184 | modifying or distributing the Program (or any work based on the
185 | Program), you indicate your acceptance of this License to do so, and
186 | all its terms and conditions for copying, distributing or modifying
187 | the Program or works based on it.
188 |
189 | 6. Each time you redistribute the Program (or any work based on the
190 | Program), the recipient automatically receives a license from the
191 | original licensor to copy, distribute or modify the Program subject to
192 | these terms and conditions. You may not impose any further
193 | restrictions on the recipients' exercise of the rights granted herein.
194 | You are not responsible for enforcing compliance by third parties to
195 | this License.
196 |
197 | 7. If, as a consequence of a court judgment or allegation of patent
198 | infringement or for any other reason (not limited to patent issues),
199 | conditions are imposed on you (whether by court order, agreement or
200 | otherwise) that contradict the conditions of this License, they do not
201 | excuse you from the conditions of this License. If you cannot
202 | distribute so as to satisfy simultaneously your obligations under this
203 | License and any other pertinent obligations, then as a consequence you
204 | may not distribute the Program at all. For example, if a patent
205 | license would not permit royalty-free redistribution of the Program by
206 | all those who receive copies directly or indirectly through you, then
207 | the only way you could satisfy both it and this License would be to
208 | refrain entirely from distribution of the Program.
209 |
210 | If any portion of this section is held invalid or unenforceable under
211 | any particular circumstance, the balance of the section is intended to
212 | apply and the section as a whole is intended to apply in other
213 | circumstances.
214 |
215 | It is not the purpose of this section to induce you to infringe any
216 | patents or other property right claims or to contest validity of any
217 | such claims; this section has the sole purpose of protecting the
218 | integrity of the free software distribution system, which is
219 | implemented by public license practices. Many people have made
220 | generous contributions to the wide range of software distributed
221 | through that system in reliance on consistent application of that
222 | system; it is up to the author/donor to decide if he or she is willing
223 | to distribute software through any other system and a licensee cannot
224 | impose that choice.
225 |
226 | This section is intended to make thoroughly clear what is believed to
227 | be a consequence of the rest of this License.
228 |
229 | 8. If the distribution and/or use of the Program is restricted in
230 | certain countries either by patents or by copyrighted interfaces, the
231 | original copyright holder who places the Program under this License
232 | may add an explicit geographical distribution limitation excluding
233 | those countries, so that distribution is permitted only in or among
234 | countries not thus excluded. In such case, this License incorporates
235 | the limitation as if written in the body of this License.
236 |
237 | 9. The Free Software Foundation may publish revised and/or new versions
238 | of the General Public License from time to time. Such new versions will
239 | be similar in spirit to the present version, but may differ in detail to
240 | address new problems or concerns.
241 |
242 | Each version is given a distinguishing version number. If the Program
243 | specifies a version number of this License which applies to it and "any
244 | later version", you have the option of following the terms and conditions
245 | either of that version or of any later version published by the Free
246 | Software Foundation. If the Program does not specify a version number of
247 | this License, you may choose any version ever published by the Free Software
248 | Foundation.
249 |
250 | 10. If you wish to incorporate parts of the Program into other free
251 | programs whose distribution conditions are different, write to the author
252 | to ask for permission. For software which is copyrighted by the Free
253 | Software Foundation, write to the Free Software Foundation; we sometimes
254 | make exceptions for this. Our decision will be guided by the two goals
255 | of preserving the free status of all derivatives of our free software and
256 | of promoting the sharing and reuse of software generally.
257 |
258 | NO WARRANTY
259 |
260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
268 | REPAIR OR CORRECTION.
269 |
270 | 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
278 | POSSIBILITY OF SUCH DAMAGES.
279 |
280 | END OF TERMS AND CONDITIONS
281 |
282 | How to Apply These Terms to Your New Programs
283 |
284 | If you develop a new program, and you want it to be of the greatest
285 | possible use to the public, the best way to achieve this is to make it
286 | free software which everyone can redistribute and change under these terms.
287 |
288 | To do so, attach the following notices to the program. It is safest
289 | to attach them to the start of each source file to most effectively
290 | convey the exclusion of warranty; and each file should have at least
291 | the "copyright" line and a pointer to where the full notice is found.
292 |
293 | {description}
294 | Copyright (C) {year} {fullname}
295 |
296 | This program is free software; you can redistribute it and/or modify
297 | it under the terms of the GNU General Public License as published by
298 | the Free Software Foundation; either version 2 of the License, or
299 | (at your option) any later version.
300 |
301 | This program is distributed in the hope that it will be useful,
302 | but WITHOUT ANY WARRANTY; without even the implied warranty of
303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
304 | GNU General Public License for more details.
305 |
306 | You should have received a copy of the GNU General Public License along
307 | with this program; if not, write to the Free Software Foundation, Inc.,
308 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
309 |
310 | Also add information on how to contact you by electronic and paper mail.
311 |
312 | If the program is interactive, make it output a short notice like this
313 | when it starts in an interactive mode:
314 |
315 | Gnomovision version 69, Copyright (C) year name of author
316 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
317 | This is free software, and you are welcome to redistribute it
318 | under certain conditions; type `show c' for details.
319 |
320 | The hypothetical commands `show w' and `show c' should show the appropriate
321 | parts of the General Public License. Of course, the commands you use may
322 | be called something other than `show w' and `show c'; they could even be
323 | mouse-clicks or menu items--whatever suits your program.
324 |
325 | You should also get your employer (if you work as a programmer) or your
326 | school, if any, to sign a "copyright disclaimer" for the program, if
327 | necessary. Here is a sample; alter the names:
328 |
329 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program
330 | `Gnomovision' (which makes passes at compilers) written by James Hacker.
331 |
332 | {signature of Ty Coon}, 1 April 1989
333 | Ty Coon, President of Vice
334 |
335 | This General Public License does not permit incorporating your program into
336 | proprietary programs. If your program is a subroutine library, you may
337 | consider it more useful to permit linking proprietary applications with the
338 | library. If this is what you want to do, use the GNU Lesser General
339 | Public License instead of this License.
340 |
--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
1 | This repository hosts modifications to the Linux kernel to enable forward error
2 | correction (FEC) in TCP. The technique is described as the "Corrective"
3 | approach in our SIGCOMM 2013 publication titled "Reducing Web Latency: The
4 | Virtue of Gentle Aggression".
5 |
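As a rough illustration of the idea, and not the kernel code itself, the sketch below XORs a set of equal-length segments into one parity segment and rebuilds a single lost segment from the parity and the surviving segments. It mirrors the XOR-based encoding types named in the Common patch description. `SEG_LEN`, `xor_encode`, and `xor_recover` are hypothetical names chosen for this example; the patches operate on MSS-length segments.

```c
#include <stddef.h>
#include <string.h>

#define SEG_LEN 8  /* hypothetical segment size; the patches use MSS-length segments */

/* XORs n consecutive SEG_LEN-byte segments from segs into parity. */
static void xor_encode(const unsigned char *segs, size_t n,
                       unsigned char *parity)
{
    memset(parity, 0, SEG_LEN);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < SEG_LEN; j++)
            parity[j] ^= segs[i * SEG_LEN + j];
}

/* Rebuilds segment number `lost` from the parity and all other segments.
 * This only works if exactly one of the n segments is missing. */
static void xor_recover(const unsigned char *segs, size_t n, size_t lost,
                        const unsigned char *parity, unsigned char *out)
{
    memcpy(out, parity, SEG_LEN);
    for (size_t i = 0; i < n; i++) {
        if (i == lost)
            continue;  /* skip the missing segment */
        for (size_t j = 0; j < SEG_LEN; j++)
            out[j] ^= segs[i * SEG_LEN + j];
    }
}
```

A caller would run `xor_encode` over the outgoing segments, send the parity alongside them, and call `xor_recover` on the receiving side once it knows which single segment is missing.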
6 | The modifications were originally developed for the Linux kernel version 2.6.34
7 | and have since been rebased to version 3.10 - though without extensive testing!
8 |
9 | We invite you to experiment with the patches and welcome your feedback. We are
10 | also happy to merge bugfixes and rebases onto newer kernel versions that you
11 | contribute to the existing patch set.
12 |
13 | WARNING: Since the modifications have not been tested extensively in the current
14 | kernel version, we advise you to execute tests in an isolated environment with
15 | an option for a recovery from kernel panics, etc.
16 |
17 |
18 | ### Patch components
19 |
20 | The changes are grouped into three patches building on top of each other:
21 | Common, Receiver, and Sender. For a detailed description of the parts
22 | implemented by each patch please check the description at the top of each patch
23 | file.
24 |
25 |
26 | ### Installation
27 |
28 | To get started fetch the Linux kernel version used as the base for the patch set
29 | (here version 3.10):
30 |
31 | $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
32 | $ cd linux
33 | $ git checkout tags/v3.10
34 |
35 | Next, check out the patches (or download them directly). Then apply them in the
36 | right order (Common, then Receiver, then Sender):
37 |
38 | $ git apply /0001-net-tcp-TCP-with-Forward-Error-Correction-Common.patch
40 | $ git apply /0002-net-tcp-TCP-with-Forward-Error-Correction-Receiver.patch
42 | $ git apply /0003-net-tcp-TCP-with-Forward-Error-Correction-Sender.patch
44 |
45 | All three `apply` calls do NOT produce any console output if applied correctly.
46 |
47 | NOTE: If you want to apply the patches to a different kernel version, keep in
48 | mind that the apply step can fail if some changes conflict with the changed
49 | code base. A possible solution is to apply a patch partially (using the
50 | `--reject` flag when running `git apply`) and then manually resolve the
51 | conflicts stored in the `.rej` files.
52 |
53 | Finally, you need to compile and install the modified kernel. Many tutorials
54 | already cover this process, so we do not replicate the instructions
55 | here.
56 |
57 | The FEC feature is turned off by default in an environment running with a
58 | modified kernel. To enable it run:
59 |
60 | $ sysctl net.ipv4.tcp_fec=1
61 |
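To check the setting from a program, the following minimal sketch reads the value back through procfs. It assumes the usual mapping of the sysctl name `net.ipv4.tcp_fec` to `/proc/sys/net/ipv4/tcp_fec`. The helper names `read_int_sysctl` and `tcp_fec_enabled` are hypothetical and not part of the patch set.

```c
#include <stdio.h>

/* Assumed procfs path for the net.ipv4.tcp_fec sysctl. */
#define TCP_FEC_SYSCTL_PATH "/proc/sys/net/ipv4/tcp_fec"

/* Reads an integer value from the given path.
 * Returns 0 on success, -1 if the file is missing or cannot be parsed. */
static int read_int_sysctl(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Returns 1 if the sysctl is set to 1, 0 if it holds another value,
 * and -1 if it is absent (for example on an unpatched kernel). */
static int tcp_fec_enabled(void)
{
    int v;

    if (read_int_sysctl(TCP_FEC_SYSCTL_PATH, &v) != 0)
        return -1;
    return v == 1;
}
```

On a kernel without the patches the file does not exist, so `tcp_fec_enabled()` reports -1 instead of failing.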
62 |
63 | ### Testing
64 |
65 | We developed a set of packetdrill test routines to check the proper
66 | functionality of the FEC engine. We will publish them in this repository soon.
67 |
68 | If you have further questions, feel free to contact any of the contributors.
69 |
--------------------------------------------------------------------------------
/v3.10/0001-net-tcp-TCP-with-Forward-Error-Correction-Common.patch:
--------------------------------------------------------------------------------
1 | From 1e117e7a24bddda7d53afe8da640ac4433f6450c Mon Sep 17 00:00:00 2001
2 | From: Tobias Flach
3 | Date: Mon, 25 Aug 2014 16:44:23 -0700
4 | Subject: [PATCH] net-tcp: TCP with Forward Error Correction (Common)
5 |
6 | Implementation of the common part of forward error correction for server and
7 | client in TCP.
8 |
9 | Implemented components:
10 | * New sysctl to enable FEC (set to 1)
11 | * Additional socket buffer pointer for a struct storing FEC control parameters
12 | * FEC option encoding and decoding
13 | * Negotiation during connection setup:
14 | - The client requests FEC usage by adding an FEC option to the SYN
15 | packet. That is, option kind EXP (0xFE), two magic bytes to
16 | identify the FEC option (0xDC60), and one extra byte to identify
17 | the encoding type
18 | - Currently supported encoding types are:
19 | TCP_FEC_TYPE_XOR_ALL 1 XORs every MSS length segment
20 | TCP_FEC_TYPE_XOR_SKIP_1 2 XORs every other MSS length segment
21 | - If the server supports FEC, it copies the option over to the
22 | SYN/ACK packet.
23 | - Following a successful negotiation every packet carries an FEC
24 | option. Regular data packets and regular acknowledgements
25 | carry a short FEC option with a 1-byte value encoding various
26 | flags. Encoded packets carry the option with a 4-byte value,
27 | encoding the flags and the encoding range. Acknowledgements
28 | after a failed recovery carry the option with a 4-byte value,
29 | encoding the flags and the loss range.
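The negotiation option described above can be sketched in user-space C as follows. This is a hedged illustration, not the code from this patch: the option kind (0xFE), the magic bytes (0xDC60), and the one-byte encoding type come from the description, while the length value of 5, the trailing NOP padding to 8 bytes, and the helper names `fec_build_syn_option` and `fec_parse_syn_option` are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EXP       0xFE    /* experimental option kind */
#define TCPOPT_NOP       0x01    /* no-operation option used as padding */
#define TCPOPT_FEC_MAGIC 0xDC60  /* magic bytes identifying the FEC option */
#define FEC_NEG_LEN      5       /* assumed: kind + length + 2 magic + type */
#define FEC_NEG_ALIGNED  8       /* option padded to a 4-byte multiple */

/* Writes the SYN negotiation option for the given encoding type. */
static void fec_build_syn_option(uint8_t type, uint8_t out[FEC_NEG_ALIGNED])
{
    out[0] = TCPOPT_EXP;
    out[1] = FEC_NEG_LEN;
    out[2] = TCPOPT_FEC_MAGIC >> 8;
    out[3] = TCPOPT_FEC_MAGIC & 0xFF;
    out[4] = type;
    out[5] = out[6] = out[7] = TCPOPT_NOP;  /* assumed padding placement */
}

/* Extracts the encoding type from a negotiation option.
 * Returns 0 on success, -1 if the bytes are not an FEC negotiation option. */
static int fec_parse_syn_option(const uint8_t *opt, size_t len, uint8_t *type)
{
    if (len < FEC_NEG_LEN || opt[0] != TCPOPT_EXP || opt[1] != FEC_NEG_LEN)
        return -1;
    if (((opt[2] << 8) | opt[3]) != TCPOPT_FEC_MAGIC)
        return -1;
    *type = opt[4];
    return 0;
}
```

In this sketch a server that supports FEC would parse the option from the SYN and copy the same bytes into the SYN/ACK, as the description states.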
30 | ---
31 | include/linux/skbuff.h | 4 +-
32 | include/linux/tcp.h | 38 ++++++++++++++
33 | include/net/request_sock.h | 2 +
34 | include/net/tcp.h | 10 ++++
35 | include/net/tcp_fec.h | 53 +++++++++++++++++++
36 | include/uapi/linux/tcp.h | 1 +
37 | net/ipv4/Makefile | 2 +-
38 | net/ipv4/sysctl_net_ipv4.c | 9 ++++
39 | net/ipv4/tcp.c | 10 ++++
40 | net/ipv4/tcp_fec.c | 124 +++++++++++++++++++++++++++++++++++++++++++++
41 | net/ipv4/tcp_input.c | 29 +++++++++++
42 | net/ipv4/tcp_ipv4.c | 3 ++
43 | net/ipv4/tcp_minisocks.c | 7 +++
44 | net/ipv4/tcp_output.c | 45 ++++++++++++++++
45 | net/ipv6/tcp_ipv6.c | 2 +
46 | 15 files changed, 337 insertions(+), 2 deletions(-)
47 | create mode 100644 include/net/tcp_fec.h
48 | create mode 100644 net/ipv4/tcp_fec.c
49 |
50 | diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
51 | index dec1748..b652f1c 100644
52 | --- a/include/linux/skbuff.h
53 | +++ b/include/linux/skbuff.h
54 | @@ -418,8 +418,10 @@ struct sk_buff {
55 | * layer. Please put your private variables there. If you
56 | * want to keep them across layers you have to do a skb_clone()
57 | * first. This is owned by whoever has the skb queued ATM.
58 | + *
59 | + * Increased the CB to hold pointer to an FEC structure.
60 | */
61 | - char cb[48] __aligned(8);
62 | + char cb[56] __aligned(8);
63 |
64 | unsigned long _skb_refdst;
65 | #ifdef CONFIG_XFRM
66 | diff --git a/include/linux/tcp.h b/include/linux/tcp.h
67 | index 5adbc33..b5f23d8 100644
68 | --- a/include/linux/tcp.h
69 | +++ b/include/linux/tcp.h
70 | @@ -77,6 +77,24 @@ struct tcp_sack_block {
71 | #define TCP_FACK_ENABLED (1 << 1) /*1 = FACK is enabled locally*/
72 | #define TCP_DSACK_SEEN (1 << 2) /*1 = DSACK was received from peer*/
73 |
74 | +/* Flags transmitted in the first FEC option byte after magic bytes
75 | + * (except if option is used for negotiation) */
76 | +#define TCP_FEC_RECOVERY_CWR 0x80 /* Recovery triggered CWR */
77 | +#define TCP_FEC_RECOVERY_SUCCESSFUL 0x40 /* Local recovery done */
78 | +#define TCP_FEC_RECOVERY_FAILED 0x20 /* Local recovery failed */
79 | +#define TCP_FEC_ENCODED 0x10 /* Packet is FEC-encoded */
80 | +
81 | +struct tcp_fec {
82 | + u8 type; /* Requested FEC type (negotiation only,
83 | + * see net/tcp_fec.h for type defs) */
84 | + u32 enc_seq; /* Sequence number of first encoded byte */
85 | + u32 enc_len; /* Encoding length */
86 | + u32 lost_seq; /* Sequence number of first lost byte */
87 | + u32 lost_len; /* Loss length */
88 | + u8 flags; /* See flag definitions above */
89 | + bool saw_fec; /* FEC option was retrieved from packet */
90 | +};
91 | +
92 | struct tcp_options_received {
93 | /* PAWS/RTTM data */
94 | long ts_recent_stamp;/* Time we stored ts_recent (for aging) */
95 | @@ -93,12 +111,14 @@ struct tcp_options_received {
96 | u8 num_sacks; /* Number of SACK blocks */
97 | u16 user_mss; /* mss requested by user in ioctl */
98 | u16 mss_clamp; /* Maximal mss, negotiated at connection setup */
99 | + struct tcp_fec fec; /* FEC-related parameters */
100 | };
101 |
102 | static inline void tcp_clear_options(struct tcp_options_received *rx_opt)
103 | {
104 | rx_opt->tstamp_ok = rx_opt->sack_ok = 0;
105 | rx_opt->wscale_ok = rx_opt->snd_wscale = 0;
106 | + memset(&(rx_opt->fec), 0, sizeof(struct tcp_fec));
107 | }
108 |
109 | /* This is the max number of SACKS that we'll generate and process. It's safe
110 | @@ -321,6 +341,24 @@ struct tcp_sock {
111 | * socket. Used to retransmit SYNACKs etc.
112 | */
113 | struct request_sock *fastopen_rsk;
114 | +
115 | +/* TCP FEC parameters
116 | + * type - negotiated FEC type to be used
117 | + * next_seq - next sequence which was not FEC-encoded before
118 | + * lost_len - bytes after rcv_nxt considered lost
119 | + * flags - see TCP_FEC_* flag definitions above
120 | + * bytes_rcv_queue - number of bytes stored in queued SKBs
121 | + * rcv_queue - copies from the socket's receive queue kept for
122 | + * FEC recovery
123 | + */
124 | + struct {
125 | + u8 type;
126 | + u32 next_seq;
127 | + u32 lost_len;
128 | + u8 flags;
129 | + u32 bytes_rcv_queue;
130 | + struct sk_buff_head rcv_queue;
131 | + } fec;
132 | };
133 |
134 | enum tsq_flags {
135 | diff --git a/include/net/request_sock.h b/include/net/request_sock.h
136 | index 59795e4..06705c2 100644
137 | --- a/include/net/request_sock.h
138 | +++ b/include/net/request_sock.h
139 | @@ -62,6 +62,8 @@ struct request_sock {
140 | struct sock *sk;
141 | u32 secid;
142 | u32 peer_secid;
143 | + u8 fec_type; /* Encoding type (see
144 | + * net/tcp_fec.h) */
145 | };
146 |
147 | static inline struct request_sock *reqsk_alloc(const struct request_sock_ops *ops)
148 | diff --git a/include/net/tcp.h b/include/net/tcp.h
149 | index 5bba80f..9a949a9 100644
150 | --- a/include/net/tcp.h
151 | +++ b/include/net/tcp.h
152 | @@ -184,6 +184,7 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
153 | * experimental options. See draft-ietf-tcpm-experimental-options-00.txt
154 | */
155 | #define TCPOPT_FASTOPEN_MAGIC 0xF989
156 | +#define TCPOPT_FEC_MAGIC 0xDC60
157 |
158 | /*
159 | * TCP option lengths
160 | @@ -199,6 +200,7 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
161 | #define TCPOLEN_COOKIE_PAIR 3 /* Cookie pair header extension */
162 | #define TCPOLEN_COOKIE_MIN (TCPOLEN_COOKIE_BASE+TCP_COOKIE_MIN)
163 | #define TCPOLEN_COOKIE_MAX (TCPOLEN_COOKIE_BASE+TCP_COOKIE_MAX)
164 | +#define TCPOLEN_EXP_FEC_BASE 4
165 |
166 | /* But this is what stacks really send out. */
167 | #define TCPOLEN_TSTAMP_ALIGNED 12
168 | @@ -209,6 +211,7 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
169 | #define TCPOLEN_SACK_PERBLOCK 8
170 | #define TCPOLEN_MD5SIG_ALIGNED 20
171 | #define TCPOLEN_MSS_ALIGNED 4
172 | +#define TCPOLEN_EXP_FEC_NEGOTIATION_ALIGNED 8
173 |
174 | /* Flags in tp->nonagle */
175 | #define TCP_NAGLE_OFF 1 /* Nagle's algo is disabled */
176 | @@ -240,6 +243,10 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
177 | * cookie/data not present. (For testing purpose!)
178 | */
179 | #define TFO_SERVER_ALWAYS 0x1000
180 | +/* Maximum number of in-order bytes kept in the receiver's buffer for FEC
181 | + * recoveries. The sender will never send more than this in a single FEC
182 | + * packet. */
183 | +#define FEC_RCV_QUEUE_LIMIT 16000
184 |
185 | extern struct inet_timewait_death_row tcp_death_row;
186 |
187 | @@ -287,6 +294,7 @@ extern int sysctl_tcp_thin_dupack;
188 | extern int sysctl_tcp_early_retrans;
189 | extern int sysctl_tcp_limit_output_bytes;
190 | extern int sysctl_tcp_challenge_ack_limit;
191 | +extern int sysctl_tcp_fec;
192 |
193 | extern atomic_long_t tcp_memory_allocated;
194 | extern struct percpu_counter tcp_sockets_allocated;
195 | @@ -713,6 +721,7 @@ struct tcp_skb_cb {
196 | __u8 ip_dsfield; /* IPv4 tos or IPv6 dsfield */
197 | /* 1 byte hole */
198 | __u32 ack_seq; /* Sequence number ACK'd */
199 | + struct tcp_fec *fec; /* FEC parameters */
200 | };
201 |
202 | #define TCP_SKB_CB(__skb) ((struct tcp_skb_cb *)&((__skb)->cb[0]))
203 | @@ -1093,6 +1102,7 @@ static inline void tcp_openreq_init(struct request_sock *req,
204 | ireq->ecn_ok = 0;
205 | ireq->rmt_port = tcp_hdr(skb)->source;
206 | ireq->loc_port = tcp_hdr(skb)->dest;
207 | + req->fec_type = rx_opt->fec.type;
208 | }
209 |
210 | /* Compute time elapsed between SYNACK and the ACK completing 3WHS */
211 | diff --git a/include/net/tcp_fec.h b/include/net/tcp_fec.h
212 | new file mode 100644
213 | index 0000000..ba219d1
214 | --- /dev/null
215 | +++ b/include/net/tcp_fec.h
216 | @@ -0,0 +1,53 @@
217 | +#ifndef _TCP_FEC_H
218 | +#define _TCP_FEC_H
219 | +
220 | +#include
221 | +#include
222 | +
223 | +/* FEC-encoding types (8 bits, internal) */
224 | +#define TCP_FEC_TYPE_NONE 0 /* FEC disabled */
225 | +#define TCP_FEC_TYPE_XOR_ALL 1 /* XOR every MSS length segment */
226 | +#define TCP_FEC_TYPE_XOR_SKIP_1 2 /* XOR every other MSS length
227 | + * segment */
228 | +
229 | +#define TCP_FEC_NUM_TYPES 3
230 | +
231 | +/*
232 | + * Returns true if FEC is enabled for the socket
233 | + */
234 | +static inline bool tcp_fec_is_enabled(const struct tcp_sock *tp)
235 | +{
236 | + return unlikely(tp->fec.type > 0);
237 | +}
238 | +
239 | +/*
240 | + * Returns true if the current packet in the buffer is FEC-encoded
241 | + */
242 | +static inline bool tcp_fec_is_encoded(const struct tcp_sock *tp)
243 | +{
244 | + return unlikely((tp->rx_opt.fec.flags & TCP_FEC_ENCODED) &&
245 | + (tp->rx_opt.fec.saw_fec));
246 | +}
247 | +
248 | +/*
249 | + * Decodes FEC parameters and stores them in the FEC struct
250 | + * @seq - sequence number of the packet
251 | + * @ack_seq - ACKed sequence number
252 | + * @is_syn - true, if option was attached to a packet with a SYN flag
253 | + * @ptr - points to the first byte of the FEC option after kind, length,
254 | + * and possible magic bytes
255 | + * @len - option length (without kind, length, magic bytes)
256 | + */
257 | +int tcp_fec_decode_option(struct tcp_fec *fec, u32 seq, u32 ack_seq,
258 | + bool is_syn, const unsigned char *ptr,
259 | + unsigned int len);
260 | +
261 | +/*
262 | + * Encodes FEC parameters to wire format
263 | + * Pointer points to the first byte of the FEC option after kind, length,
264 | + * and possible magic bytes (pointer will be moved to first unoccupied byte)
265 | + */
266 | +int tcp_fec_encode_option(struct tcp_sock *tp, struct tcp_fec *fec,
267 | + __be32 **ptr);
268 | +
269 | +#endif
270 | diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
271 | index 8d776eb..15e3aba 100644
272 | --- a/include/uapi/linux/tcp.h
273 | +++ b/include/uapi/linux/tcp.h
274 | @@ -111,6 +111,7 @@ enum {
275 | #define TCP_REPAIR_OPTIONS 22
276 | #define TCP_FASTOPEN 23 /* Enable FastOpen on listeners */
277 | #define TCP_TIMESTAMP 24
278 | +#define TCP_FEC 25 /* Forward error correction */
279 |
280 | struct tcp_repair_opt {
281 | __u32 opt_code;
282 | diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
283 | index 089cb9f..7ec6035 100644
284 | --- a/net/ipv4/Makefile
285 | +++ b/net/ipv4/Makefile
286 | @@ -6,7 +6,7 @@ obj-y := route.o inetpeer.o protocol.o \
287 | ip_input.o ip_fragment.o ip_forward.o ip_options.o \
288 | ip_output.o ip_sockglue.o inet_hashtables.o \
289 | inet_timewait_sock.o inet_connection_sock.o \
290 | - tcp.o tcp_input.o tcp_output.o tcp_timer.o tcp_ipv4.o \
291 | + tcp.o tcp_fec.o tcp_input.o tcp_output.o tcp_timer.o tcp_ipv4.o \
292 | tcp_minisocks.o tcp_cong.o tcp_metrics.o tcp_fastopen.o \
293 | datagram.o raw.o udp.o udplite.o \
294 | arp.o icmp.o devinet.o af_inet.o igmp.o \
295 | diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
296 | index fa2f63f..42ea051 100644
297 | --- a/net/ipv4/sysctl_net_ipv4.c
298 | +++ b/net/ipv4/sysctl_net_ipv4.c
299 | @@ -771,6 +771,15 @@ static struct ctl_table ipv4_table[] = {
300 | .proc_handler = proc_dointvec_minmax,
301 | .extra1 = &one
302 | },
303 | + {
304 | + .procname = "tcp_fec",
305 | + .data = &sysctl_tcp_fec,
306 | + .maxlen = sizeof(int),
307 | + .mode = 0644,
308 | + .proc_handler = proc_dointvec_minmax,
309 | + .extra1 = &zero,
310 | + .extra2 = &one,
311 | + },
312 | { }
313 | };
314 |
315 | diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
316 | index ab450c0..a243f86 100644
317 | --- a/net/ipv4/tcp.c
318 | +++ b/net/ipv4/tcp.c
319 | @@ -276,6 +276,7 @@
320 | #include
321 | #include
322 | #include
323 | +#include
324 |
325 | #include
326 | #include
327 | @@ -2624,6 +2625,12 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
328 | else
329 | tp->tsoffset = val - tcp_time_stamp;
330 | break;
331 | + case TCP_FEC:
332 | + if (sysctl_tcp_fec && val >= 0 && val < TCP_FEC_NUM_TYPES)
333 | + tp->fec.type = val;
334 | + else
335 | + err = -EINVAL;
336 | + break;
337 | default:
338 | err = -ENOPROTOOPT;
339 | break;
340 | @@ -2840,6 +2847,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
341 | case TCP_TIMESTAMP:
342 | val = tcp_time_stamp + tp->tsoffset;
343 | break;
344 | + case TCP_FEC:
345 | + val = tp->fec.type;
346 | + break;
347 | default:
348 | return -ENOPROTOOPT;
349 | }
350 | diff --git a/net/ipv4/tcp_fec.c b/net/ipv4/tcp_fec.c
351 | new file mode 100644
352 | index 0000000..97a48ce
353 | --- /dev/null
354 | +++ b/net/ipv4/tcp_fec.c
355 | @@ -0,0 +1,124 @@
356 | +#include
357 | +
358 | +/* Decodes FEC parameters and stores them in the FEC struct
359 | + * @seq - sequence number of the packet
360 | + * @ack_seq - ACKed sequence number
361 | + * @is_syn - true, if option was attached to a packet with a SYN flag
362 | + * @ptr - points to the first byte of the FEC option after kind, length,
363 | + * and possible magic bytes
364 | + * @len - option length (without kind, length, magic bytes)
365 | + */
366 | +int tcp_fec_decode_option(struct tcp_fec *fec, u32 seq, u32 ack_seq,
367 | + bool is_syn, const unsigned char *ptr,
368 | + unsigned int len)
369 | +{
370 | + /* reset / initialize option values which should be evaluated
371 | + * with EVERY incoming packet
372 | + */
373 | + fec->flags = 0;
374 | + fec->saw_fec = 1;
375 | +
376 | + if (len == 1) {
377 | + /* Short option */
378 | + u8 val = *((u8 *) ptr);
379 | + if (is_syn) {
380 | + /* Negotiation */
381 | + fec->type = val;
382 | + } else {
383 | + /* Regular packet */
384 | + fec->flags = val;
385 | + }
386 | +
387 | + return 0;
388 | + }
389 | +
390 | + if (len == 4) {
391 | + /* Long option */
392 | + u32 val = get_unaligned_be32(ptr);
393 | + fec->flags = val >> 24;
394 | +
395 | + if (fec->flags & TCP_FEC_ENCODED) {
396 | + fec->enc_seq = seq;
397 | + fec->enc_len = val & 0xFFFFFF;
398 | + } else if (fec->flags & TCP_FEC_RECOVERY_FAILED) {
399 | + fec->lost_seq = ack_seq;
400 | + fec->lost_len = val & 0xFFFFFF;
401 | + } else {
402 | + return -EINVAL;
403 | + }
404 | +
405 | + return 0;
406 | + }
407 | +
408 | + /* Invalid option length */
409 | + return -EINVAL;
410 | +}
411 | +
412 | +/* Encodes FEC parameters to wire format
413 | + * @ptr - Encoded option is written to this memory location (and the pointer
414 | + * is advanced to the next unoccupied byte, 4-byte aligned)
415 | + * Returns the length of the encoded option (including alignment)
416 | + */
417 | +int tcp_fec_encode_option(struct tcp_sock *tp, struct tcp_fec *fec,
418 | + __be32 **ptr)
419 | +{
420 | + int len;
421 | +
422 | + fec->flags |= tp->fec.flags;
423 | + fec->lost_len = tp->fec.lost_len;
424 | + tp->fec.flags &= ~TCP_FEC_RECOVERY_CWR;
425 | + tp->fec.flags &= ~TCP_FEC_RECOVERY_FAILED;
426 | +
427 | + /* Encode fixed option part (option kind, length, and magic bytes) */
428 | + if (fec->flags & (TCP_FEC_ENCODED | TCP_FEC_RECOVERY_FAILED))
429 | + len = 4 + TCPOLEN_EXP_FEC_BASE; /* Long option */
430 | + else
431 | + len = 1 + TCPOLEN_EXP_FEC_BASE; /* Short option */
432 | +
433 | + **ptr = htonl((TCPOPT_EXP << 24) | (len << 16) | TCPOPT_FEC_MAGIC);
434 | + (*ptr)++;
435 | +
436 | + if ((fec->flags & TCP_FEC_ENCODED) &&
437 | + (fec->flags & TCP_FEC_RECOVERY_FAILED)) {
438 | + /* TODO Special case: need to separate loss indication
439 | + * from encoding or make option 12 bytes long
440 | + * This can only happen if a node receives and sends FEC
441 | + * data
442 | + */
443 | + fec->flags &= ~TCP_FEC_RECOVERY_FAILED;
444 | + }
445 | +
446 | + if (fec->flags & TCP_FEC_ENCODED) {
447 | + /* FEC-encoded packets carry:
448 | + * flags (8 bits), encoded length (24 bits)
449 | + */
450 | + **ptr = htonl((fec->flags << 24) |
451 | + (fec->enc_len));
452 | + (*ptr)++;
453 | + return 8;
454 | + } else if (fec->flags & TCP_FEC_RECOVERY_FAILED) {
455 | + /* Packets with failed recovery indication carry:
456 | + * flags (8 bits), lost length (24 bits)
457 | + */
458 | + **ptr = htonl((fec->flags << 24) |
459 | + (fec->lost_len));
460 | + (*ptr)++;
461 | + return 8;
462 | + } else if (fec->type) {
463 | + /* Negotiation packets carry: FEC type (8 bits), three NOPs */
464 | + **ptr = htonl((fec->type << 24) |
465 | + (TCPOPT_NOP << 16) |
466 | + (TCPOPT_NOP << 8) |
467 | + TCPOPT_NOP);
468 | + (*ptr)++;
469 | + return 8;
470 | + } else {
471 | + /* All other packets carry: flags (8 bits), three NOPs */
472 | + **ptr = htonl((fec->flags << 24) |
473 | + (TCPOPT_NOP << 16) |
474 | + (TCPOPT_NOP << 8) |
475 | + TCPOPT_NOP);
476 | + (*ptr)++;
477 | + return 8;
478 | + }
479 | +}
480 | diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
481 | index 9c62257..3260498 100644
482 | --- a/net/ipv4/tcp_input.c
483 | +++ b/net/ipv4/tcp_input.c
484 | @@ -70,6 +70,7 @@
485 | #include
486 | #include
487 | #include
488 | +#include
489 | #include
490 | #include
491 | #include
492 | @@ -3564,6 +3565,20 @@ void tcp_parse_options(const struct sk_buff *skb,
493 | break;
494 | #endif
495 | case TCPOPT_EXP:
496 | + /* TCP FEC option shares code 254 using a
497 | + * 16 bit magic number.
498 | + */
499 | + if (sysctl_tcp_fec &&
500 | + opsize >= TCPOLEN_EXP_FEC_BASE &&
501 | + get_unaligned_be16(ptr) == TCPOPT_FEC_MAGIC) {
502 | + tcp_fec_decode_option(&(opt_rx->fec),
503 | + ntohl(th->seq),
504 | + ntohl(th->ack_seq), th->syn,
505 | + ptr + 2,
506 | + opsize - TCPOLEN_EXP_FEC_BASE);
507 | + break;
508 | + }
509 | +
510 | /* Fast Open option shares code 254 using a
511 | * 16 bits magic number. It's valid only in
512 | * SYN or SYN-ACK with an even size.
513 | @@ -5093,6 +5108,7 @@ int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
514 | */
515 |
516 | tp->rx_opt.saw_tstamp = 0;
517 | + tp->rx_opt.fec.saw_fec = 0;
518 |
519 | /* pred_flags is 0xS?10 << 16 + snd_wnd
520 | * if header_prediction is to be made
521 | @@ -5463,6 +5479,15 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
522 | if (tcp_is_sack(tp) && sysctl_tcp_fack)
523 | tcp_enable_fack(tp);
524 |
525 | + /*
526 | + * FEC negotiation
527 | + * Disable FEC if both ends do not agree on the FEC type used
528 | + */
529 | + if (tp->fec.type != tp->rx_opt.fec.type) {
530 | + tp->fec.type = 0;
531 | + tp->rx_opt.fec.type = 0;
532 | + }
533 | +
534 | tcp_mtup_init(sk);
535 | tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
536 | tcp_initialize_rcv_mss(sk);
537 | @@ -5740,6 +5765,10 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
538 |
539 | tcp_initialize_rcv_mss(sk);
540 | tcp_fast_path_on(tp);
541 | +
542 | + /* SYN requested FEC usage */
543 | + if (tp->rx_opt.fec.type > 0)
544 | + tp->fec.type = tp->rx_opt.fec.type;
545 | } else {
546 | return 1;
547 | }
548 | diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
549 | index 7999fc5..04e3bf0 100644
550 | --- a/net/ipv4/tcp_ipv4.c
551 | +++ b/net/ipv4/tcp_ipv4.c
552 | @@ -74,6 +74,7 @@
553 | #include
554 | #include
555 | #include
556 | +#include
557 | #include
558 |
559 | #include
560 | @@ -213,6 +214,8 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
561 |
562 | tp->rx_opt.mss_clamp = TCP_MSS_DEFAULT;
563 |
564 | + memset(&(tp->rx_opt.fec), 0, sizeof(struct tcp_fec));
565 | +
566 | /* Socket identity is still unknown (sport may be zero).
567 | * However we set state to SYN-SENT and not releasing socket
568 | * lock select source port, enter ourselves into the hash tables and
569 | diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
570 | index 0f01788..1d0bf2f 100644
571 | --- a/net/ipv4/tcp_minisocks.c
572 | +++ b/net/ipv4/tcp_minisocks.c
573 | @@ -483,6 +483,13 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
574 | newtp->fastopen_rsk = NULL;
575 | newtp->syn_data_acked = 0;
576 |
577 | + /* TCP FEC option */
578 | + newtp->rx_opt.fec.type = sysctl_tcp_fec ? req->fec_type : 0;
579 | + newtp->fec.type = newtp->fec.flags = 0;
580 | + newtp->fec.next_seq = newtp->snd_nxt;
581 | + newtp->fec.bytes_rcv_queue = 0;
582 | + skb_queue_head_init(&newtp->fec.rcv_queue);
583 | +
584 | TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_PASSIVEOPENS);
585 | }
586 | return newsk;
587 | diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
588 | index ec335fa..00daf84 100644
589 | --- a/net/ipv4/tcp_output.c
590 | +++ b/net/ipv4/tcp_output.c
591 | @@ -37,6 +37,7 @@
592 | #define pr_fmt(fmt) "TCP: " fmt
593 |
594 | #include
595 | +#include
596 |
597 | #include
598 | #include
599 | @@ -65,6 +66,8 @@ int sysctl_tcp_base_mss __read_mostly = TCP_BASE_MSS;
600 | /* By default, RFC2861 behavior. */
601 | int sysctl_tcp_slow_start_after_idle __read_mostly = 1;
602 |
603 | +int sysctl_tcp_fec __read_mostly;
604 | +
605 | static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
606 | int push_one, gfp_t gfp);
607 |
608 | @@ -381,6 +384,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
609 | #define OPTION_MD5 (1 << 2)
610 | #define OPTION_WSCALE (1 << 3)
611 | #define OPTION_FAST_OPEN_COOKIE (1 << 8)
612 | +#define OPTION_FEC (1 << 9)
613 |
614 | struct tcp_out_options {
615 | u16 options; /* bit field of OPTION_* */
616 | @@ -391,6 +395,7 @@ struct tcp_out_options {
617 | __u8 *hash_location; /* temporary pointer, overloaded */
618 | __u32 tsval, tsecr; /* need to include OPTION_TS */
619 | struct tcp_fastopen_cookie *fastopen_cookie; /* Fast open cookie */
620 | + struct tcp_fec fec; /* FEC parameters */
621 | };
622 |
623 | /* Write previously computed TCP options to the packet.
624 | @@ -490,6 +495,9 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
625 | }
626 | ptr += (foc->len + 3) >> 2;
627 | }
628 | +
629 | + if (unlikely(OPTION_FEC & options))
630 | + tcp_fec_encode_option(tp, &(opts->fec), &ptr);
631 | }
632 |
633 | /* Compute TCP options for SYN packets. This is not the final
634 | @@ -553,6 +561,14 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
635 | }
636 | }
637 |
638 | + /* Prepare for FEC negotiation if requested */
639 | + if (unlikely(tcp_fec_is_enabled(tp)) &&
640 | + remaining >= TCPOLEN_EXP_FEC_NEGOTIATION_ALIGNED) {
641 | + opts->options |= OPTION_FEC;
642 | + opts->fec.type = tp->fec.type;
643 | + remaining -= TCPOLEN_EXP_FEC_NEGOTIATION_ALIGNED;
644 | + }
645 | +
646 | return MAX_TCP_OPTION_SPACE - remaining;
647 | }
648 |
649 | @@ -614,6 +630,16 @@ static unsigned int tcp_synack_options(struct sock *sk,
650 | }
651 | }
652 |
653 | + /* Handle request for FEC support from other side
654 | + * (respond with same FEC option if FEC is locally supported)
655 | + */
656 | + if (sysctl_tcp_fec && unlikely(req->fec_type) &&
657 | + remaining >= TCPOLEN_EXP_FEC_NEGOTIATION_ALIGNED) {
658 | + opts->options |= OPTION_FEC;
659 | + opts->fec.type = req->fec_type;
660 | + remaining -= TCPOLEN_EXP_FEC_NEGOTIATION_ALIGNED;
661 | + }
662 | +
663 | return MAX_TCP_OPTION_SPACE - remaining;
664 | }
665 |
666 | @@ -657,6 +683,19 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
667 | opts->num_sack_blocks * TCPOLEN_SACK_PERBLOCK;
668 | }
669 |
670 | + /* Prepare option if connection has FEC enabled */
671 | + if (tcp_fec_is_enabled(tp)) {
672 | + opts->options |= OPTION_FEC;
673 | + if (tcb && tcb->fec)
674 | + opts->fec = *(tcb->fec);
675 | +
676 | + /* regardless of packet type we need 4 more bytes
677 | + * including alignment
678 | + */
679 | + size += 4;
680 | + size += TCPOLEN_EXP_FEC_BASE;
681 | + }
682 | +
683 | return size;
684 | }
685 |
686 | @@ -2956,6 +2995,12 @@ int tcp_connect(struct sock *sk)
687 | */
688 | tp->snd_nxt = tp->write_seq;
689 | tp->pushed_seq = tp->write_seq;
690 | +
691 | + /* Initialize FEC members */
692 | + tp->fec.next_seq = tp->snd_nxt;
693 | + tp->fec.bytes_rcv_queue = 0;
694 | + skb_queue_head_init(&tp->fec.rcv_queue);
695 | +
696 | TCP_INC_STATS(sock_net(sk), TCP_MIB_ACTIVEOPENS);
697 |
698 | /* Timer for repeating the SYN until an answer. */
699 | diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
700 | index 0a17ed9..dc2d12a 100644
701 | --- a/net/ipv6/tcp_ipv6.c
702 | +++ b/net/ipv6/tcp_ipv6.c
703 | @@ -288,6 +288,8 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
704 |
705 | tp->rx_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr);
706 |
707 | + memset(&(tp->rx_opt.fec), 0, sizeof(struct tcp_fec));
708 | +
709 | inet->inet_dport = usin->sin6_port;
710 |
711 | tcp_set_state(sk, TCP_SYN_SENT);
712 | --
713 | 2.1.0.rc2.206.gedb03e5
714 |
715 |
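Note: the long-option layout handled by tcp_fec_decode_option() and tcp_fec_encode_option() in this patch (8 flag bits followed by a 24-bit length, packed into one 32-bit word) can be sketched in user space as below. This is a minimal illustration, not the kernel code. The names fec_pack_long(), fec_unpack_flags(), fec_unpack_len() and the value of SKETCH_FEC_ENCODED are assumptions made up for this example (the real flag values are defined in a header outside this excerpt). Unlike the patch, the sketch masks the length to 24 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag value, for illustration only. */
#define SKETCH_FEC_ENCODED 0x80u

/* Combine flags (upper 8 bits) and length (lower 24 bits) into one
 * host-order word; the patch applies htonl() to such a word.
 * Masking the length is an addition of this sketch.
 */
static inline uint32_t fec_pack_long(uint8_t flags, uint32_t len)
{
	return ((uint32_t)flags << 24) | (len & 0xFFFFFFu);
}

/* Extract the flags byte, as the decoder does with val >> 24. */
static inline uint8_t fec_unpack_flags(uint32_t word)
{
	return (uint8_t)(word >> 24);
}

/* Extract the 24-bit length, as the decoder does with val & 0xFFFFFF. */
static inline uint32_t fec_unpack_len(uint32_t word)
{
	return word & 0xFFFFFFu;
}

/* Round-trip check: what is packed must come back unchanged. */
static inline void fec_option_selfcheck(void)
{
	uint32_t word = fec_pack_long(SKETCH_FEC_ENCODED, 16000);

	assert(fec_unpack_flags(word) == SKETCH_FEC_ENCODED);
	assert(fec_unpack_len(word) == 16000);
}
```

Because the length field is only 24 bits wide, a single option can describe at most 0xFFFFFF bytes, which is far above the 16000-byte FEC_RCV_QUEUE_LIMIT defined in this patch.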
--------------------------------------------------------------------------------
/v3.10/0002-net-tcp-TCP-with-Forward-Error-Correction-Receiver.patch:
--------------------------------------------------------------------------------
1 | From 26339b95439aa587066938965f033e68a0d6f6c5 Mon Sep 17 00:00:00 2001
2 | From: Tobias Flach
3 | Date: Mon, 25 Aug 2014 16:46:16 -0700
4 | Subject: [PATCH] net-tcp: TCP with Forward Error Correction (Receiver)
5 |
6 | Implementation of the receiver part of forward error correction in TCP.
7 |
8 | Implemented components:
9 | * Detection of an FEC packet:
10 | - FEC-encoded packets have the ENCODED flag set in the FEC flags
11 | byte. If a packet does not carry an FEC option at all, the
12 | packet is discarded.
13 | * Payload recovery (decoding):
14 | - The receiver keeps up to (currently) 16000 in-order bytes in the
15 | buffer for possible recoveries. Data is not duplicated, instead
16 | extra references are kept for the in-order SKBs to avoid them
17 | being freed once they are consumed by higher layers.
18 | - Once an FEC packet is received, the receiver tries to recover
19 | any encoded data which was not received yet by reversing the
20 | encoding steps.
21 | - If a byte block can be reconstructed, an SKB is allocated and
22 | a TCP header attached, before the new packet is forwarded to
23 | regular reception routines.
24 | * Acknowledgements:
25 | - On successful recovery, every outgoing packet has the FEC flag
26 | RECOVERY_SUCCESS enabled. On reception of this flag, the
27 | receiver reduces the congestion window (similarly to ECN)
28 | and sets the FEC flag RECOVERY_CWR in the next outgoing packet.
29 | The RECOVERY_SUCCESS flag is transmitted for every packet until
30 | RECOVERY_ACK has been received (similarly to sending ECE until
31 | CWR is received in ECN).
32 | - On failed recovery, an extra acknowledgement for rcv_nxt is
33 | generated. The FEC flag RECOVERY_FAIL is enabled and the remaining
34 | 24 bits in the option encode the number of bytes after ack_seq
35 | which are considered loss. The receiver marks this byte
36 | range as lost and can start retransmissions.
37 | ---
38 | include/net/tcp_fec.h | 27 ++
39 | net/ipv4/tcp_fec.c | 732 +++++++++++++++++++++++++++++++++++++++++++++++
40 | net/ipv4/tcp_input.c | 40 ++-
41 | net/ipv4/tcp_minisocks.c | 2 +
42 | 4 files changed, 798 insertions(+), 3 deletions(-)
43 |
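Note: the decoding step described in the commit message above (recovering missing data by reversing the XOR encoding) can be illustrated with the simplified user-space sketch below. It assumes equally sized blocks and exactly one lost block. The patch itself additionally handles partial blocks, skipped blocks (XOR_SKIP_1), and data taken from the out-of-order queue. All function names here are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XOR len bytes of src into dst. */
static inline void xor_into(unsigned char *dst, const unsigned char *src,
			    size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/* Sender side: parity = block[0] ^ block[1] ^ ... ^ block[n-1].
 * blocks holds nblocks consecutive blocks of block_len bytes each.
 */
static inline void xor_encode(unsigned char *parity,
			      const unsigned char *blocks,
			      size_t nblocks, size_t block_len)
{
	size_t b;

	memset(parity, 0, block_len);
	for (b = 0; b < nblocks; b++)
		xor_into(parity, blocks + b * block_len, block_len);
}

/* Receiver side: start from the parity payload and XOR in every block
 * that did arrive; what remains is the single missing block.
 */
static inline void xor_recover(unsigned char *out,
			       const unsigned char *parity,
			       const unsigned char *blocks,
			       size_t nblocks, size_t block_len, size_t lost)
{
	size_t b;

	memcpy(out, parity, block_len);
	for (b = 0; b < nblocks; b++)
		if (b != lost)
			xor_into(out, blocks + b * block_len, block_len);
}

/* Encode three 4-byte blocks, drop the middle one, and recover it. */
static inline void xor_selfcheck(void)
{
	const unsigned char blocks[12] = "abcdefghijkl";
	unsigned char parity[4], out[4];

	xor_encode(parity, blocks, 3, 4);
	xor_recover(out, parity, blocks, 3, 4, 1);
	assert(memcmp(out, blocks + 4, 4) == 0);
}
```

If two or more blocks covered by the same parity payload are missing, the XOR no longer isolates a single block. This corresponds to the failed-recovery case, in which the receiver reports the loss range instead.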
44 | diff --git a/include/net/tcp_fec.h b/include/net/tcp_fec.h
45 | index ba219d1..1660e58 100644
46 | --- a/include/net/tcp_fec.h
47 | +++ b/include/net/tcp_fec.h
48 | @@ -50,4 +50,31 @@ int tcp_fec_decode_option(struct tcp_fec *fec, u32 seq, u32 ack_seq,
49 | int tcp_fec_encode_option(struct tcp_sock *tp, struct tcp_fec *fec,
50 | __be32 **ptr);
51 |
52 | +/*
53 | + * Processes the current packet in the buffer (treated as FEC packet)
54 | + */
55 | +int tcp_fec_process(struct sock *sk, struct sk_buff *skb);
56 | +
57 | +/*
58 | + * Checks the received options for loss indicators and acts upon them.
59 | + * In particular, the function handles window reduction requests and processes
60 | + * tail loss indicators.
61 | + * Returns: 1, if the ACK contains a loss indicator; 0, otherwise
62 | + */
63 | +int tcp_fec_check_ack(struct sock *sk, u32 ack_seq);
64 | +
65 | +/*
66 | + * Since data in the socket's receive queue can get consumed by other parties
67 | + * we need to keep extra references to these SKBs until they are no longer
68 | + * required for possible future recoveries.
69 | + * @skb - buffer which is moved to the receive queue
70 | + */
71 | +int tcp_fec_update_queue(struct sock *sk, struct sk_buff *skb);
72 | +
73 | +/*
74 | + * Disables FEC for this connection (includes clearing references
75 | + * to buffers in receive queue)
76 | + */
77 | +void tcp_fec_disable(struct sock *sk);
78 | +
79 | #endif
80 | diff --git a/net/ipv4/tcp_fec.c b/net/ipv4/tcp_fec.c
81 | index 97a48ce..3a8bd6d 100644
82 | --- a/net/ipv4/tcp_fec.c
83 | +++ b/net/ipv4/tcp_fec.c
84 | @@ -1,5 +1,30 @@
85 | #include
86 |
87 | +/* Codes for incoming FEC packet processing */
88 | +#define FEC_NO_LOSS 1
89 | +#define FEC_LOSS_UNRECOVERED 2
90 | +#define FEC_LOSS_RECOVERED 3
91 | +
92 | +/* Receiver routines */
93 | +static int tcp_fec_process_xor(struct sock *sk, const struct sk_buff *skb,
94 | + unsigned int block_skip);
95 | +static int tcp_fec_recover(struct sock *sk, const struct sk_buff *skb,
96 | + unsigned char *data, u32 seq, int len);
97 | +static void tcp_fec_send_ack(struct sock *sk, const struct sk_buff *skb,
98 | + int recovery_status);
99 | +static void tcp_fec_reduce_window(struct sock *sk);
100 | +static void tcp_fec_mark_skbs_lost(struct sock *sk);
101 | +static bool tcp_fec_update_decoded_option(struct sk_buff *skb);
102 | +static struct sk_buff *tcp_fec_make_decoded_pkt(struct sock *sk,
103 | + const struct sk_buff *skb, unsigned char *dec_data,
104 | + u32 seq, unsigned int len);
105 | +
106 | +/* Buffer access routine */
107 | +static unsigned int tcp_fec_get_next_block(struct sock *sk,
108 | + struct sk_buff **skb, struct sk_buff_head *queue,
109 | + u32 seq, unsigned int block_len,
110 | + unsigned char *block);
111 | +
112 | /* Decodes FEC parameters and stores them in the FEC struct
113 | * @seq - sequence number of the packet
114 | * @ack_seq - ACKed sequence number
115 | @@ -122,3 +147,710 @@ int tcp_fec_encode_option(struct tcp_sock *tp, struct tcp_fec *fec,
116 | return 8;
117 | }
118 | }
119 | +
120 | +/* Processes the current packet in the buffer, treated as an FEC packet
121 | + * (assumes that options were already processed)
122 | + */
123 | +int tcp_fec_process(struct sock *sk, struct sk_buff *skb)
124 | +{
125 | + struct tcp_sock *tp;
126 | + struct tcphdr *th;
127 | + int recovery_status, err;
128 | + u32 end_seq;
129 | +
130 | + tp = tcp_sk(sk);
131 | + th = tcp_hdr(skb);
132 | + recovery_status = 0;
133 | +
134 | + /* drop packet if packet is not encoded */
135 | + if (!(tp->rx_opt.fec.flags & TCP_FEC_ENCODED))
136 | + return -1;
137 | +
138 | + /* check if all encoded packets were already received */
139 | + end_seq = tp->rx_opt.fec.enc_seq + tp->rx_opt.fec.enc_len;
140 | + if (!after(end_seq, tp->rcv_nxt)) {
141 | + tcp_fec_send_ack(sk, skb, FEC_NO_LOSS);
142 | + return 0;
143 | + }
144 | +
145 | + /* linearize the SKB (for easier payload access) */
146 | + err = skb_linearize(skb);
147 | + if (err)
148 | + return err;
149 | +
150 | + /* data recovery */
151 | + switch (tp->fec.type) {
152 | + case TCP_FEC_TYPE_NONE:
153 | + return -1;
154 | + case TCP_FEC_TYPE_XOR_ALL:
155 | + recovery_status = tcp_fec_process_xor(sk, skb, 0);
156 | + break;
157 | + case TCP_FEC_TYPE_XOR_SKIP_1:
158 | + recovery_status = tcp_fec_process_xor(sk, skb, 1);
159 | + break;
160 | + }
161 | +
162 | + /* TODO error handling; -ENOMEM, etc. - disable FEC? */
163 | + if (recovery_status < 0)
164 | + return recovery_status;
165 | +
166 | + /* Send an explicit ACK if recovery failed */
167 | + if (recovery_status == FEC_LOSS_UNRECOVERED)
168 | + tcp_fec_send_ack(sk, skb, recovery_status);
169 | +
170 | + return 0;
171 | +}
172 | +
173 | +/* Checks the received options for loss indicators and acts upon them.
174 | + * In particular, the function handles recovery flags (indicators for
175 | + * successful and failed recoveries, tail losses)
176 | + * Returns: 1, if ACK contains a loss indicator
177 | + */
178 | +int tcp_fec_check_ack(struct sock *sk, u32 ack_seq)
179 | +{
180 | + struct tcp_sock *tp;
181 | +
182 | + tp = tcp_sk(sk);
183 | +
184 | + /* Clear local recovery indication (and ECN CWR demand)
185 | + * if it was ACKED by the other node
186 | + */
187 | + if (tp->rx_opt.fec.flags & TCP_FEC_RECOVERY_CWR) {
188 | + tp->fec.flags &= ~TCP_FEC_RECOVERY_SUCCESSFUL;
189 | + tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
190 | + }
191 | +
192 | + /* Check for tail loss indicators
193 | + * This happens when FEC was unable to recover the lost data and
194 | + * thus only sends an ACK with the loss range back. Everything not
195 | + * ACKed/SACKed by now is considered lost.
196 | + */
197 | + if (tp->rx_opt.fec.flags & TCP_FEC_RECOVERY_FAILED) {
198 | + tcp_fec_mark_skbs_lost(sk);
199 | + return 1;
200 | + }
201 | +
202 | + /* Check if the remote endpoint successfully recovered data,
203 | + * if so we trigger a window reduction
204 | + */
205 | + if (tp->rx_opt.fec.flags & TCP_FEC_RECOVERY_SUCCESSFUL) {
206 | + /* Ignore flag if window was already reduced for the current
207 | + * loss episode or if previous reduction was not signaled
208 | + * yet (no outgoing packets)
209 | + */
210 | + if (after(ack_seq, tp->high_seq) &&
211 | + !(tp->fec.flags & TCP_FEC_RECOVERY_CWR)) {
212 | + tcp_fec_reduce_window(sk);
213 | + tp->fec.flags |= TCP_FEC_RECOVERY_CWR;
214 | + }
215 | +
216 | + return 1;
217 | + }
218 | +
219 | + return 0;
220 | +}
221 | +
222 | +/* Since data in the socket's receive queue can get consumed by other parties
223 | + * we need to clone these SKBs until they are no longer required for possible
224 | + * future recoveries. This function is called after the TCP header has been
225 | + * removed from the SKB already. All parameters required for recovery are
226 | + * stored in the SKB's control buffer.
227 | + * @skb - buffer which is moved to the receive queue
228 | + */
229 | +int tcp_fec_update_queue(struct sock *sk, struct sk_buff *skb)
230 | +{
231 | + struct tcp_sock *tp;
232 | + struct sk_buff *cskb;
233 | + u32 data_len;
234 | + int extra_bytes, err;
235 | + tp = tcp_sk(sk);
236 | +
237 | + /* clone the SKB and add it to the FEC receive queue
238 | + * (a simple extra reference to the SKB is not sufficient since
239 | + * SKBs can only be queued on one list at a time)
240 | + */
241 | + cskb = skb_clone(skb, GFP_ATOMIC);
242 | + if (cskb == NULL)
243 | + return -ENOMEM;
244 | +
245 | + /* linearize the SKB (for easier payload access) */
246 | + err = skb_linearize(cskb);
247 | + data_len = skb->len;
248 | +
249 | + /* free the clone on error or if there is no payload to keep;
250 | + * err is 0 in the latter case */
251 | + if (err || !data_len) {
252 | + kfree_skb(cskb);
253 | + return err;
254 | + }
255 | +
256 | + skb_queue_tail(&tp->fec.rcv_queue, cskb);
257 | + tp->fec.bytes_rcv_queue += data_len;
258 | +
259 | + /* check if we can dereference old SKBs (as long as we have enough
260 | + * data for future recoveries)
261 | + */
262 | + extra_bytes = tp->fec.bytes_rcv_queue - FEC_RCV_QUEUE_LIMIT;
263 | + while (extra_bytes > 0) {
264 | + cskb = skb_peek(&tp->fec.rcv_queue);
265 | + if (cskb == NULL)
266 | + return -EINVAL;
267 | +
268 | + data_len = TCP_SKB_CB(cskb)->end_seq - TCP_SKB_CB(cskb)->seq;
269 | + if (data_len > extra_bytes) {
270 | + break;
271 | + } else {
272 | + extra_bytes -= data_len;
273 | + tp->fec.bytes_rcv_queue -= data_len;
274 | + skb_unlink(cskb, &tp->fec.rcv_queue);
275 | + kfree_skb(cskb);
276 | + }
277 | + }
278 | +
279 | + return 0;
280 | +}
281 | +
282 | +/* Disables FEC for this connection (includes clearing references
283 | + * to buffers in receive queue)
284 | + */
285 | +void tcp_fec_disable(struct sock *sk)
286 | +{
287 | + struct tcp_sock *tp = tcp_sk(sk);
288 | +
289 | + if (!tcp_fec_is_enabled(tp))
290 | + return;
291 | +
292 | + tp->fec.type = 0;
293 | + tp->fec.bytes_rcv_queue = 0;
294 | + skb_queue_purge(&tp->fec.rcv_queue);
295 | +}
296 | +
297 | +/* Processes the current packet in the buffer, treated as an FEC packet
298 | + * with XOR-encoded payload (assumes that options were already processed)
299 | + * Returns: negative code, if an error occurred;
300 | + * positive code, otherwise (recovery status)
301 | + * @block_skip - Number of unencoded blocks between two encoded blocks
302 | + */
303 | +static int tcp_fec_process_xor(struct sock *sk, const struct sk_buff *skb,
304 | + unsigned int block_skip)
305 | +{
306 | + struct sk_buff *pskb;
307 | + struct tcp_sock *tp;
308 | + struct tcphdr *th;
309 | + u32 next_seq, end_seq, rec_seq;
310 | + unsigned char *data, *block;
311 | + unsigned int i, offset, data_len, block_len, rec_len;
312 | + bool seen_loss;
313 | + int ret;
314 | +
315 | + pskb = NULL;
316 | + tp = tcp_sk(sk);
317 | + th = tcp_hdr(skb);
318 | + next_seq = tp->rx_opt.fec.enc_seq;
319 | + end_seq = next_seq + tp->rx_opt.fec.enc_len;
320 | + block_len = skb->len - tcp_hdrlen(skb);
321 | + seen_loss = false;
322 | + offset = 0;
323 | +
324 | + /* memory allocation for decoding / recovered SKB data */
325 | + data = kmalloc(2 * block_len, GFP_ATOMIC);
326 | + if (data == NULL)
327 | + return -ENOMEM;
328 | +
329 | + block = data + block_len;
330 | +
331 | + /* copy FEC payload (skip TCP header) */
332 | + memcpy(data, skb->data + tcp_hdrlen(skb), block_len);
333 | +
334 | + /* process in-sequence data */
335 | + while ((data_len = tcp_fec_get_next_block(sk, &pskb,
336 | + &tp->fec.rcv_queue, next_seq,
337 | + min(block_len, end_seq - next_seq),
338 | + block))) {
339 | + next_seq += data_len;
340 | +
341 | + /* XOR with existing payload */
342 | + for (i = 0; i < data_len; i++)
343 | + data[i] ^= block[i];
344 | +
345 | + /* we could not read a whole MSS block, which means we
346 | + * reached the end of the queue or end of range which the
347 | + * FEC packet covers
348 | + */
349 | + if (data_len < block_len)
350 | + break;
351 | +
352 | + /* skip unencoded blocks if there is more data encoded */
353 | + if (end_seq - next_seq > 0)
354 | + next_seq += block_len * block_skip;
355 | + }
356 | +
357 | + /* check if all encoded bytes were already received */
358 | + if (next_seq == end_seq) {
359 | + kfree(data);
360 | + return FEC_NO_LOSS;
361 | + }
362 | +
363 | + /* we always recover one whole MSS block (otherwise slicing
364 | + * would introduce a lot of additional complexity here) and cut
365 | + * out already received sequences later
366 | + */
367 | + rec_seq = next_seq;
368 | + rec_len = min(block_len, end_seq - rec_seq);
369 | + offset = data_len;
370 | + if ((rec_seq + rec_len) == end_seq)
371 | + goto recover;
372 | +
373 | + next_seq += block_len * (block_skip + 1);
374 | + pskb = NULL;
375 | +
376 | + /* read a possibly partial (smaller than MSS) block to fill up the
377 | + * previously unfilled block and achieve alignment again
378 | + */
379 | + data_len = tcp_fec_get_next_block(sk, &pskb, &tp->out_of_order_queue,
380 | + next_seq, block_len - offset, block);
381 | +
382 | + next_seq += data_len;
383 | +
384 | + /* check if we could not read as much data as requested */
385 | + if ((next_seq != end_seq) && (data_len < (block_len - offset)))
386 | + goto clean;
387 | +
388 | + /* XOR with existing payload */
389 | + for (i = 0; i < data_len; i++)
390 | + data[i+offset] ^= block[i];
391 | +
392 | + /* skip unencoded blocks if there is more data encoded */
393 | + if (end_seq - next_seq > 0)
394 | + next_seq += block_len * block_skip;
395 | +
396 | + /* read all necessary blocks to finish decoding */
397 | + while ((data_len = tcp_fec_get_next_block(sk, &pskb,
398 | + &tp->out_of_order_queue, next_seq,
399 | + min(block_len, end_seq - next_seq),
400 | + block))) {
401 | + next_seq += data_len;
402 | +
403 | + /* XOR with existing payload */
404 | + for (i = 0; i < data_len; i++)
405 | + data[i] ^= block[i];
406 | +
407 | + /* we could not read a whole MSS block, which means we reached
408 | + * the end of the queue or end of range which the FEC packet
409 | + * covers
410 | + */
411 | + if (data_len < block_len)
412 | + break;
413 | +
414 | + /* skip unencoded blocks if there is more data encoded */
415 | + if (end_seq - next_seq > 0)
416 | + next_seq += block_len * block_skip;
417 | + }
418 | +
419 | + /* check if additional losses were observed (cannot recover) */
420 | + if (next_seq != end_seq)
421 | + goto clean;
422 | +
423 | +recover:
424 | + /* create and process recovered packets */
425 | + for (i = 0; i < rec_len; i++)
426 | + block[i] = data[(offset + i) % block_len];
427 | +
428 | + if (block_skip && ((block_len - offset) < rec_len)) {
429 | + /* recover non-consecutive sequence ranges (only when
430 | + * slicing is used)
431 | + */
432 | + u32 second_seq;
433 | + unsigned int second_seq_len, first_seq_len;
434 | +
435 | + first_seq_len = block_len - offset;
436 | + second_seq = rec_seq + first_seq_len + block_len * block_skip;
437 | + second_seq_len = rec_len - first_seq_len;
438 | +
439 | + ret = tcp_fec_recover(sk, skb, block, rec_seq, first_seq_len);
440 | + if (ret >= 0) {
441 | + int second_ret = tcp_fec_recover(sk, skb,
442 | + block + first_seq_len,
443 | + second_seq, second_seq_len);
444 | + if (second_ret < 0 || !ret)
445 | + ret = second_ret;
446 | + }
447 | + } else {
448 | + ret = tcp_fec_recover(sk, skb, block, rec_seq, rec_len);
449 | + }
450 | +
451 | + kfree(data);
452 | + return ret ? ret : FEC_LOSS_RECOVERED;
453 | +
454 | +clean:
455 | + kfree(data);
456 | + return FEC_LOSS_UNRECOVERED;
457 | +}
458 | +
459 | +/* Create a recovered packet and forward it to the reception routine */
460 | +static int tcp_fec_recover(struct sock *sk, const struct sk_buff *skb,
461 | + unsigned char *data, u32 seq, int len)
462 | +{
463 | + struct sk_buff *rskb;
464 | + struct tcp_sock *tp;
465 | +
466 | + tp = tcp_sk(sk);
467 | +
468 | + /* We will notify the remote node that recovery was successful */
469 | + tp->fec.flags |= TCP_FEC_RECOVERY_SUCCESSFUL;
470 | +
471 | + /* Check if we received some tail of the recovered sequence already
472 | + * by looking at the current SACK blocks (we don't want to recover
473 | + * more data than necessary to prevent DSACKS)
474 | + */
475 | + if (tcp_is_sack(tp)) {
476 | + int i;
477 | + for (i = 0; i < tp->rx_opt.num_sacks; i++) {
478 | + if (before(tp->selective_acks[i].start_seq,
479 | + seq + len) &&
480 | + !before(tp->selective_acks[i].end_seq,
481 | + seq + len)) {
482 | + len = tp->selective_acks[i].start_seq - seq;
483 | + break;
484 | + }
485 | + }
486 | + }
487 | +
488 | + /* We might have prematurely asked for a recovery in the case where the
489 | + * whole recovery sequence is already covered by SACKs
490 | + */
491 | + if (len <= 0)
492 | + return FEC_NO_LOSS;
493 | +
494 | + /* Create decoded packet and forward to reception routine */
495 | + rskb = tcp_fec_make_decoded_pkt(sk, skb, data, seq, len);
496 | + if (rskb == NULL)
497 | + return -EINVAL;
498 | +
499 | + return tcp_rcv_established(sk, rskb, tcp_hdr(rskb), rskb->len);
500 | +}
501 | +
502 | +/* Sends an ACK for the FEC packet and encodes any congestion
503 | + * and/or recovery information
504 | + */
505 | +static void tcp_fec_send_ack(struct sock *sk, const struct sk_buff *skb,
506 | + int recovery_status)
507 | +{
508 | + struct tcp_sock *tp;
509 | + u32 end_seq;
510 | +
511 | + tp = tcp_sk(sk);
512 | +
513 | + /* Right now we only need an outgoing ACK if FEC recovery failed;
514 | + * in all other cases ACKs are implicitly generated
515 | + */
516 | + switch (recovery_status) {
517 | + case FEC_LOSS_UNRECOVERED:
518 | + end_seq = tp->rx_opt.fec.enc_seq + tp->rx_opt.fec.enc_len;
519 | + tp->fec.flags |= TCP_FEC_RECOVERY_FAILED;
520 | + tp->fec.lost_len = end_seq - tp->rcv_nxt;
521 | + tcp_send_ack(sk);
522 | + break;
523 | + }
524 | +}
525 | +
526 | +/* Reduces the congestion window (similar to completed fast recovery)
527 | + * If the node is already in recovery mode, undo is disabled to enforce
528 | + * the window reduction upon completion
529 | + */
530 | +static void tcp_fec_reduce_window(struct sock *sk)
531 | +{
532 | + struct tcp_sock *tp;
533 | + const struct inet_connection_sock *icsk;
534 | +
535 | + tp = tcp_sk(sk);
536 | + icsk = inet_csk(sk);
537 | +
538 | + if (icsk->icsk_ca_state < TCP_CA_CWR) {
539 | + tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
540 | + if (tp->snd_ssthresh < TCP_INFINITE_SSTHRESH) {
541 | + tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_ssthresh);
542 | + tp->snd_cwnd_stamp = tcp_time_stamp;
543 | + }
544 | +
545 | + /* Any future window reduction requests are ignored until
546 | + * snd_nxt is ACKed
547 | + */
548 | + tp->high_seq = tp->snd_nxt;
549 | + tp->undo_marker = 0;
550 | + } else {
551 | + /* Socket is in some congestion mode and we only need to make
552 | + * sure that window reduction is executed when recovery
553 | + * is finished
554 | + */
555 | + tp->undo_marker = 0;
556 | + }
557 | +}
558 | +
559 | +/* The incoming ACK indicates a failed recovery.
560 | + * Mark all unacked SKBs in the loss range as lost.
561 | + * TODO With interleaved coding, we have the additional constraint
562 | + * that the SKBs in the loss range also must have been encoded by the
563 | + * triggering FEC packet, and for that we need to keep some info
564 | + * about FEC packets on the sender side
565 | + */
566 | +static void tcp_fec_mark_skbs_lost(struct sock *sk)
567 | +{
568 | + struct tcp_sock *tp;
569 | + struct sk_buff *skb;
570 | + u32 start_seq, end_seq;
571 | +
572 | + tp = tcp_sk(sk);
573 | + skb = tp->lost_skb_hint ? tp->lost_skb_hint : tcp_write_queue_head(sk);
574 | +
575 | + /* All SKBs falling completely in the range are marked */
576 | + start_seq = tp->rx_opt.fec.lost_seq;
577 | + end_seq = tp->rx_opt.fec.lost_seq + tp->rx_opt.fec.lost_len;
578 | +
579 | + tcp_for_write_queue_from(skb, sk) {
580 | + if (skb == tcp_send_head(sk))
581 | + break;
582 | +
583 | + /* Past loss range */
584 | + if (!before(TCP_SKB_CB(skb)->seq, end_seq))
585 | + break;
586 | +
587 | + /* SKB not (fully) within range */
588 | + if (before(TCP_SKB_CB(skb)->seq, start_seq) ||
589 | + after(TCP_SKB_CB(skb)->end_seq, end_seq))
590 | + continue;
591 | +
592 | + /* SKB already marked */
593 | + if (TCP_SKB_CB(skb)->sacked & (TCPCB_LOST|TCPCB_SACKED_ACKED))
594 | + continue;
595 | +
596 | + /* Verify retransmit hint before marking
597 | + * (see tcp_verify_retransmit_hint(),
598 | + * copied since method defined static in tcp_input.c)
599 | + */
600 | + if ((tp->retransmit_skb_hint == NULL) ||
601 | + before(TCP_SKB_CB(skb)->seq,
602 | + TCP_SKB_CB(tp->retransmit_skb_hint)->seq))
603 | + tp->retransmit_skb_hint = skb;
604 | +
605 | + if (!tp->lost_out ||
606 | + after(TCP_SKB_CB(skb)->end_seq, tp->retransmit_high))
607 | + tp->retransmit_high = TCP_SKB_CB(skb)->end_seq;
608 | +
609 | + /* Mark SKB as lost (see tcp_skb_mark_lost()) */
610 | + tp->lost_out += tcp_skb_pcount(skb);
611 | + TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
612 | + }
613 | +
614 | + tcp_verify_left_out(tp);
615 | +}
616 | +
617 | +/* Searches for the FEC option in the packet header and replaces
618 | + * the long option with a short one padded by NOPs.
619 | + * This is done to convert the option used by an encoded packet
620 | + * to the option used by a recovered packet.
621 | + */
622 | +static bool tcp_fec_update_decoded_option(struct sk_buff *skb)
623 | +{
624 | + struct tcphdr *th;
625 | + unsigned char *ptr;
626 | + int length;
627 | +
628 | + th = tcp_hdr(skb);
629 | + ptr = (unsigned char *) (th + 1);
630 | + length = (th->doff * 4) - sizeof(struct tcphdr);
631 | +
632 | + while (length > 0) {
633 | + int opcode = *ptr++;
634 | + int opsize;
635 | +
636 | + switch (opcode) {
637 | + case TCPOPT_EOL:
638 | + return 0;
639 | + case TCPOPT_NOP:
640 | + length--;
641 | + continue;
642 | + default:
643 | + opsize = *ptr++;
644 | + if (opsize < 2 || opsize > length)
645 | + return 0;
646 | +
647 | + if (opcode == TCPOPT_EXP &&
648 | + get_unaligned_be16(ptr) == TCPOPT_FEC_MAGIC) {
649 | + /* Update FEC option:
650 | + * 1. Convert long option into short option
651 | + * 2. Clear ENCODED flag (keep other flags)
652 | + * 3. Replace option value (long option) by NOPs
653 | + */
654 | + u32 *fec_opt_start = (u32 *) (ptr - 2);
655 | + *fec_opt_start = htonl((
656 | + get_unaligned_be32(fec_opt_start) &
657 | + 0xFF00FFFF) | 0x00050000);
658 | + *(fec_opt_start + 1) = htonl((
659 | + get_unaligned_be32(fec_opt_start + 1) &
660 | + 0xEF000000) | 0x00010101);
661 | +
662 | + return 1;
663 | + }
664 | +
665 | + ptr += opsize - 2;
666 | + length -= opsize;
667 | + }
668 | + }
669 | +
670 | + return 0;
671 | +}
672 | +
673 | +/* Allocates an SKB for data we want to forward to reception routines
674 | + * (recovered data) by making a copy of the FEC SKB and replacing the data
675 | + * part; all other segments (options, etc.) are preserved
676 | + */
677 | +static struct sk_buff *tcp_fec_make_decoded_pkt(struct sock *sk,
678 | + const struct sk_buff *skb,
679 | + unsigned char *dec_data,
680 | + u32 seq, unsigned int len)
681 | +{
682 | + struct tcp_sock *tp;
683 | + struct sk_buff *nskb;
684 | +
685 | + tp = tcp_sk(sk);
686 | + nskb = skb_copy(skb, GFP_ATOMIC);
687 | + if (nskb == NULL)
688 | + return NULL;
689 | +
690 | + /* Update FEC option for the new packet */
691 | + if (!tcp_fec_update_decoded_option(nskb)) {
692 | + /* TODO Do we need this catch? Technically we don't reach this
693 | + * method if there is no FEC option in the header. */
694 | + kfree_skb(nskb);
695 | + return NULL;
696 | + }
697 | +
698 | + /* check if we received some tail of the recovered sequence already
699 | + * by looking at the current SACK blocks (we don't want to recover
700 | + * more data than necessary to prevent DSACKS)
701 | + */
702 | + if (tcp_is_sack(tp)) {
703 | + int i;
704 | + for (i = 0; i < tp->rx_opt.num_sacks; i++) {
705 | + if (before(tp->selective_acks[i].start_seq,
706 | + seq + len) &&
707 | + !before(tp->selective_acks[i].end_seq,
708 | + seq + len)) {
709 | + len = tp->selective_acks[i].start_seq - seq;
710 | + break;
711 | + }
712 | + }
713 | + }
714 | +
715 | + /* trim data section to fit recovered sequence if necessary */
716 | + if (len < (TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq))
717 | + skb_trim(nskb, len + tcp_hdrlen(nskb));
718 | +
719 | + /* fix the sequence numbers */
720 | + tcp_hdr(nskb)->seq = htonl(seq);
721 | + tcp_hdr(nskb)->ack_seq = htonl(tp->snd_una);
722 | + TCP_SKB_CB(nskb)->seq = seq;
723 | + TCP_SKB_CB(nskb)->end_seq = seq + len;
724 | +
725 | + /* replace SKB payload with recovered data */
726 | + memcpy(nskb->data + tcp_hdrlen(nskb), dec_data, len);
727 | +
728 | + /* packets used for recovery had their checksums checked already */
729 | + nskb->ip_summed = CHECKSUM_UNNECESSARY;
730 | +
731 | + return nskb;
732 | +}
733 | +
734 | +/* Gets the next byte block from an SKB queue (any SKB which is touched
735 | + * in this procedure will be linearized to simplify payload access)
736 | + * @skb - Points to SKB from which previous block was extracted (useful
737 | + * for successive calls to this function, which avoids moving through
738 | + * the whole queue again)
739 | + * @queue - SKB queue to read from (SKB has to point to an element on this
740 | + * queue)
741 | + * @seq - Sequence number of first byte in the block
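742 | + * @block_len - Maximum number of bytes to copy into the block
743 | + * @block - Destination buffer (must hold at least block_len bytes)
744 | + *
745 | + * Returns the number of bytes written to the block memory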
746 | + */
747 | +static unsigned int tcp_fec_get_next_block(struct sock *sk,
748 | + struct sk_buff **skb,
749 | + struct sk_buff_head *queue, u32 seq,
750 | + unsigned int block_len, unsigned char *block)
751 | +{
752 | + unsigned int cur_len, offset, num_bytes;
753 | + int err;
754 | + u32 end_seq;
755 | +
756 | + cur_len = 0;
757 | +
758 | + /* Start at the head of the queue if the caller did not provide
759 | + * an SKB from a previous call
760 | + */
761 | + if (*skb == NULL) {
762 | + *skb = skb_peek(queue);
763 | + if (*skb == NULL)
764 | + return 0;
765 | + }
766 | +
767 | + /* move to SKB which stores the next sequence to encode */
768 | + while (*skb) {
769 | + /* If we observe an RST/SYN, we stop here to avoid
770 | + * handling corner cases
771 | + */
772 | + if (TCP_SKB_CB(*skb)->tcp_flags &
773 | + (TCPHDR_RST |
774 | + TCPHDR_SYN))
775 | + return 0;
776 | + if (!before(seq, TCP_SKB_CB(*skb)->seq) &&
777 | + before(seq, TCP_SKB_CB(*skb)->end_seq))
778 | + break;
779 | + if (*skb == skb_peek_tail(queue)) {
780 | + *skb = NULL;
781 | + break;
782 | + }
783 | +
784 | + *skb = skb_queue_next(queue, *skb);
785 | + }
786 | +
787 | + if (*skb == NULL)
788 | + return 0;
789 | +
790 | + /* copy bytes from SKBs (connected sequences) */
791 | + while (*skb && (cur_len < block_len)) {
792 | + err = skb_linearize(*skb);
793 | + if (err)
794 | + return 0; /* unsigned return type: report no data on error */
795 | +
796 | + /* Deal with the end seq number being incremented by
797 | + * one if the FIN flag is set (we don't want to encode this)
798 | + */
799 | + end_seq = TCP_SKB_CB(*skb)->end_seq;
800 | + if (TCP_SKB_CB(*skb)->tcp_flags & TCPHDR_FIN)
801 | + end_seq--;
802 | +
803 | + if (!before(seq, TCP_SKB_CB(*skb)->seq) && before(seq, end_seq)) {
804 | + /* Copy data depending on:
805 | + * - remaining space in the block
806 | + * - remaining data in the SKB
807 | + */
808 | + offset = seq - TCP_SKB_CB(*skb)->seq;
809 | + num_bytes = min(block_len - cur_len,
810 | + end_seq - seq);
811 | +
812 | + memcpy(block + cur_len, (*skb)->data + offset,
813 | + num_bytes);
814 | + cur_len += num_bytes;
815 | + seq += num_bytes;
816 | + }
817 | +
818 | + if (*skb == skb_peek_tail(queue) || cur_len >= block_len)
819 | + break;
820 | +
821 | + *skb = skb_queue_next(queue, *skb);
822 | + }
823 | +
824 | + return cur_len;
825 | +}
826 | diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
827 | index 3260498..4d17c5f 100644
828 | --- a/net/ipv4/tcp_input.c
829 | +++ b/net/ipv4/tcp_input.c
830 | @@ -107,6 +107,7 @@ int sysctl_tcp_early_retrans __read_mostly = 3;
831 | #define FLAG_SYN_ACKED 0x10 /* This ACK acknowledged SYN. */
832 | #define FLAG_DATA_SACKED 0x20 /* New SACK. */
833 | #define FLAG_ECE 0x40 /* ECE in this ACK */
834 | +#define FLAG_FEC_CWR_REQUESTED 0x80 /* cwnd reduction requested */
835 | #define FLAG_SLOWPATH 0x100 /* Do not skip RFC checks for window update.*/
836 | #define FLAG_ORIG_SACK_ACKED 0x200 /* Never retransmitted data are (s)acked */
837 | #define FLAG_SND_UNA_ADVANCED 0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */
838 | @@ -116,8 +117,9 @@ int sysctl_tcp_early_retrans __read_mostly = 3;
839 |
840 | #define FLAG_ACKED (FLAG_DATA_ACKED|FLAG_SYN_ACKED)
841 | #define FLAG_NOT_DUP (FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED)
842 | -#define FLAG_CA_ALERT (FLAG_DATA_SACKED|FLAG_ECE)
843 | +#define FLAG_CA_ALERT (FLAG_DATA_SACKED|FLAG_ECE|FLAG_FEC_CWR_REQUESTED)
844 | #define FLAG_FORWARD_PROGRESS (FLAG_ACKED|FLAG_DATA_SACKED)
845 | +#define FLAG_CONGESTION (FLAG_ECE|FLAG_FEC_CWR_REQUESTED)
846 |
847 | #define TCP_REMNANT (TCP_FLAG_FIN|TCP_FLAG_URG|TCP_FLAG_SYN|TCP_FLAG_PSH)
848 | #define TCP_HP_BITS (~(TCP_RESERVED_BITS|TCP_FLAG_PSH))
849 | @@ -2536,7 +2538,8 @@ void tcp_enter_cwr(struct sock *sk, const int set_ssthresh)
850 | struct tcp_sock *tp = tcp_sk(sk);
851 |
852 | tp->prior_ssthresh = 0;
853 | - if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR) {
854 | + if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR &&
855 | + after(tp->snd_una, tp->high_seq)) {
856 | tp->undo_marker = 0;
857 | tcp_init_cwnd_reduction(sk, set_ssthresh);
858 | tcp_set_ca_state(sk, TCP_CA_CWR);
859 | @@ -3195,7 +3198,7 @@ static inline bool tcp_ack_is_dubious(const struct sock *sk, const int flag)
860 | static inline bool tcp_may_raise_cwnd(const struct sock *sk, const int flag)
861 | {
862 | const struct tcp_sock *tp = tcp_sk(sk);
863 | - return (!(flag & FLAG_ECE) || tp->snd_cwnd < tp->snd_ssthresh) &&
864 | + return (!(flag & FLAG_CONGESTION) || tp->snd_cwnd < tp->snd_ssthresh) &&
865 | !tcp_in_cwnd_reduction(sk);
866 | }
867 |
868 | @@ -3363,6 +3366,10 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
869 | if (after(ack, prior_snd_una))
870 | flag |= FLAG_SND_UNA_ADVANCED;
871 |
872 | + /* Check if FEC expects and executes a window reduction */
873 | + if (tcp_fec_is_enabled(tp) && tcp_fec_check_ack(sk, ack))
874 | + flag |= FLAG_FEC_CWR_REQUESTED;
875 | +
876 | prior_fackets = tp->fackets_out;
877 | prior_in_flight = tcp_packets_in_flight(tp);
878 |
879 | @@ -4059,6 +4066,9 @@ static void tcp_ofo_queue(struct sock *sk)
880 | tp->rcv_nxt, TCP_SKB_CB(skb)->seq,
881 | TCP_SKB_CB(skb)->end_seq);
882 |
883 | + if (tcp_fec_is_enabled(tp))
884 | + tcp_fec_update_queue(sk, skb);
885 | +
886 | __skb_unlink(skb, &tp->out_of_order_queue);
887 | __skb_queue_tail(&sk->sk_receive_queue, skb);
888 | tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
889 | @@ -4335,6 +4345,9 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
890 | goto out_of_window;
891 |
892 | /* Ok. In sequence. In window. */
893 | + if (tcp_fec_is_enabled(tp))
894 | + tcp_fec_update_queue(sk, skb);
895 | +
896 | if (tp->ucopy.task == current &&
897 | tp->copied_seq == tp->rcv_nxt && tp->ucopy.len &&
898 | sock_owned_by_user(sk) && !tp->urg_data) {
899 | @@ -4653,6 +4666,12 @@ static int tcp_prune_queue(struct sock *sk)
900 | tp->copied_seq, tp->rcv_nxt);
901 | sk_mem_reclaim(sk);
902 |
903 | + /* Disable FEC if it was enabled to prevent keeping data
904 | + * in the receive queue longer than necessary
905 | + */
906 | + if (tcp_fec_is_enabled(tp))
907 | + tcp_fec_disable(sk);
908 | +
909 | if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf)
910 | return 0;
911 |
912 | @@ -5010,6 +5029,21 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
913 | /* Reset is accepted even if it did not pass PAWS. */
914 | }
915 |
916 | + /* Special processing if FEC is enabled */
917 | + if (tcp_fec_is_enabled(tp)) {
918 | + if (tcp_fec_is_encoded(tp)) {
919 | + tcp_fec_process(sk, skb);
920 | + goto discard;
921 | + } else if (!tp->rx_opt.fec.saw_fec && th->ack &&
922 | + sk->sk_state == TCP_LAST_ACK) {
923 | + /* TODO Sometimes the FEC option is not appended to the
924 | + * FIN-ACK packet; socket options cleared?
925 | + */
926 | + tcp_ack(sk, skb, FLAG_SLOWPATH);
927 | + goto discard;
928 | + }
929 | + }
930 | +
931 | /* Step 1: check sequence number */
932 | if (!tcp_sequence(tp, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq)) {
933 | /* RFC793, page 37: "In all states except SYN-SENT, all reset
934 | diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
935 | index 1d0bf2f..acfc144 100644
936 | --- a/net/ipv4/tcp_minisocks.c
937 | +++ b/net/ipv4/tcp_minisocks.c
938 | @@ -483,6 +483,8 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
939 | newtp->fastopen_rsk = NULL;
940 | newtp->syn_data_acked = 0;
941 |
942 | + newtp->high_seq = newtp->snd_nxt;
943 | +
944 | /* TCP FEC option */
945 | newtp->rx_opt.fec.type = sysctl_tcp_fec ? req->fec_type : 0;
946 | newtp->fec.type = newtp->fec.flags = 0;
947 | --
948 | 2.1.0.rc2.206.gedb03e5
949 |
950 |
--------------------------------------------------------------------------------
/v3.10/0003-net-tcp-TCP-with-Forward-Error-Correction-Sender.patch:
--------------------------------------------------------------------------------
1 | From 928d69eec0343e39ecb3560e095b5c16d8d9977a Mon Sep 17 00:00:00 2001
2 | From: Tobias Flach
3 | Date: Mon, 25 Aug 2014 16:47:33 -0700
4 | Subject: [PATCH] net-tcp: TCP with Forward Error Correction (Sender)
5 |
6 | Implementation of the sender part of forward error correction in TCP.
7 |
8 | Implemented components:
9 | * FEC payload construction and transmission (encoding):
10 | - The FEC mechanism is invoked 1/4 RTT after a transmission
11 | (can be a GSO/TSO packet).
12 | - The encoding scheme is negotiated during connection setup. In
13 | the case of the basic XOR, it XORs all byte blocks (ignoring packet
14 | boundaries, but using the current MSS as the block size) which
15 | were already transmitted but never FEC-encoded before.
16 | Depending on the specified maximum number of bytes per FEC payload
17 | (see FEC_RCV_QUEUE_LIMIT), it is possible that multiple FEC packets are
18 | generated in this step.
19 | - Currently, the FEC option carries the length of the sequence
20 | range used for encoding (that is, sequence number of the last
21 | encoded byte minus the sequence number of the first encoded byte).
22 | This is sufficient to determine the length of
23 | all encoded blocks on the receiver side (all blocks are MSS
24 | bytes large, except for the last one).
25 | * sysctl_tcp_fec extension to toggle FEC transmit during loss episodes:
26 | - Valid values are:
27 | 0 FEC is disabled
28 | 1 FEC is enabled except for loss episodes
29 | 2 FEC is enabled including for loss episodes
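
Illustration of the basic XOR scheme described above (a minimal user-space
sketch, not code from this patch): xor_encode() and xor_recover() are
hypothetical helpers, and the sketch assumes no block skipping and a lost
block that starts on a block boundary. The patch itself also handles
unaligned loss ranges and interleaved (skip) coding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: XOR every block of a byte range into one parity
 * buffer of block_len bytes. All blocks are block_len bytes long except
 * possibly the last one, which behaves as if it were zero-padded.
 */
static void xor_encode(const unsigned char *range, size_t range_len,
		       size_t block_len, unsigned char *parity)
{
	size_t i;

	assert(block_len > 0);
	memset(parity, 0, block_len);
	for (i = 0; i < range_len; i++)
		parity[i % block_len] ^= range[i];
}

/* Hypothetical helper: rebuild the single block starting at lost_off
 * (assumed to be a multiple of block_len). The bytes stored in the lost
 * block of `range` are ignored. `out` must hold block_len bytes.
 * Returns the number of recovered bytes.
 */
static size_t xor_recover(const unsigned char *range, size_t range_len,
			  size_t block_len, size_t lost_off,
			  const unsigned char *parity, unsigned char *out)
{
	size_t lost_len, i;

	assert(block_len > 0);
	assert(lost_off < range_len && lost_off % block_len == 0);

	/* The last block of the range may be shorter than block_len */
	lost_len = range_len - lost_off;
	if (lost_len > block_len)
		lost_len = block_len;

	/* Parity XOR all received blocks leaves only the lost block */
	memcpy(out, parity, block_len);
	for (i = 0; i < range_len; i++) {
		if (i >= lost_off && i < lost_off + lost_len)
			continue;
		out[i % block_len] ^= range[i];
	}

	return lost_len;
}
```

Only the range length needs to be signalled for this to work: the receiver
derives every block boundary from the block size (MSS) and the range length,
which matches the option layout described above.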
30 | ---
31 | include/net/inet_connection_sock.h | 4 +-
32 | include/net/tcp_fec.h | 26 +++
33 | net/ipv4/inet_diag.c | 3 +-
34 | net/ipv4/sysctl_net_ipv4.c | 3 +-
35 | net/ipv4/tcp_fec.c | 396 +++++++++++++++++++++++++++++++++++++
36 | net/ipv4/tcp_input.c | 6 +
37 | net/ipv4/tcp_ipv4.c | 3 +-
38 | net/ipv4/tcp_output.c | 5 +-
39 | net/ipv4/tcp_timer.c | 14 +-
40 | 9 files changed, 454 insertions(+), 6 deletions(-)
41 |
42 | diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
43 | index de2c785..d13c597 100644
44 | --- a/include/net/inet_connection_sock.h
45 | +++ b/include/net/inet_connection_sock.h
46 | @@ -135,6 +135,7 @@ struct inet_connection_sock {
47 | #define ICSK_TIME_PROBE0 3 /* Zero window probe timer */
48 | #define ICSK_TIME_EARLY_RETRANS 4 /* Early retransmit timer */
49 | #define ICSK_TIME_LOSS_PROBE 5 /* Tail loss probe timer */
50 | +#define ICSK_TIME_FEC 6 /* FEC delayed send timer */
51 |
52 | static inline struct inet_connection_sock *inet_csk(const struct sock *sk)
53 | {
54 | @@ -225,7 +226,8 @@ static inline void inet_csk_reset_xmit_timer(struct sock *sk, const int what,
55 | }
56 |
57 | if (what == ICSK_TIME_RETRANS || what == ICSK_TIME_PROBE0 ||
58 | - what == ICSK_TIME_EARLY_RETRANS || what == ICSK_TIME_LOSS_PROBE) {
59 | + what == ICSK_TIME_EARLY_RETRANS || what == ICSK_TIME_LOSS_PROBE ||
60 | + what == ICSK_TIME_FEC) {
61 | icsk->icsk_pending = what;
62 | icsk->icsk_timeout = jiffies + when;
63 | sk_reset_timer(sk, &icsk->icsk_retransmit_timer, icsk->icsk_timeout);
64 | diff --git a/include/net/tcp_fec.h b/include/net/tcp_fec.h
65 | index 1660e58..38f2c40 100644
66 | --- a/include/net/tcp_fec.h
67 | +++ b/include/net/tcp_fec.h
68 | @@ -12,6 +12,9 @@
69 |
70 | #define TCP_FEC_NUM_TYPES 3
71 |
72 | +/* Delay transmission of FEC packets (delay defined in tcp_fec_arm_timer()) */
73 | +#define TCP_FEC_DELAYED_SEND 1
74 | +
75 | /*
76 | * Returns true if FEC is enabled for the socket
77 | */
78 | @@ -77,4 +80,27 @@ int tcp_fec_update_queue(struct sock *sk, struct sk_buff *skb);
79 | */
80 | void tcp_fec_disable(struct sock *sk);
81 |
82 | +/* Arms the timer for a delayed FEC transmission if there is
83 | + * no earlier timeout defined (e.g. retransmission timeout)
84 | + */
85 | +void tcp_fec_arm_timer(struct sock *sk);
86 | +
87 | +/* The FEC timer fired. Force an FEC transmission for the
88 | + * last unencoded burst. Rearm the RTO timer (which was switched
89 | + * out when setting the FEC timer). Set a new FEC timer if there
90 | + * is pending unencoded data.
91 | + */
92 | +void tcp_fec_timer(struct sock *sk);
93 | +
94 | +/* If FEC packet transmissions are delayed, set a timer
95 | + * (if not already set), otherwise invoke the FEC mechanism
96 | + * immediately
97 | + */
98 | +int tcp_fec_invoke(struct sock *sk);
99 | +
100 | +/* Invokes the FEC mechanism set for the connection;
101 | + * creates and sends out FEC packets
102 | + */
103 | +int tcp_fec_invoke_nodelay(struct sock *sk);
104 | +
105 | #endif
106 | diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
107 | index 5f64875..e151283 100644
108 | --- a/net/ipv4/inet_diag.c
109 | +++ b/net/ipv4/inet_diag.c
110 | @@ -160,7 +160,8 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
111 |
112 | if (icsk->icsk_pending == ICSK_TIME_RETRANS ||
113 | icsk->icsk_pending == ICSK_TIME_EARLY_RETRANS ||
114 | - icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
115 | + icsk->icsk_pending == ICSK_TIME_LOSS_PROBE ||
116 | + icsk->icsk_pending == ICSK_TIME_FEC) {
117 | r->idiag_timer = 1;
118 | r->idiag_retrans = icsk->icsk_retransmits;
119 | r->idiag_expires = EXPIRES_IN_MS(icsk->icsk_timeout);
120 | diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
121 | index 42ea051..389900f 100644
122 | --- a/net/ipv4/sysctl_net_ipv4.c
123 | +++ b/net/ipv4/sysctl_net_ipv4.c
124 | @@ -28,6 +28,7 @@
125 |
126 | static int zero;
127 | static int one = 1;
128 | +static int two = 2;
129 | static int four = 4;
130 | static int tcp_retr1_max = 255;
131 | static int ip_local_port_range_min[] = { 1, 1 };
132 | @@ -778,7 +779,7 @@ static struct ctl_table ipv4_table[] = {
133 | .mode = 0644,
134 | .proc_handler = proc_dointvec,
135 | .extra1 = &zero,
136 | - .extra2 = &one,
137 | + .extra2 = &two,
138 | },
139 | { }
140 | };
141 | diff --git a/net/ipv4/tcp_fec.c b/net/ipv4/tcp_fec.c
142 | index 3a8bd6d..7f04e49 100644
143 | --- a/net/ipv4/tcp_fec.c
144 | +++ b/net/ipv4/tcp_fec.c
145 | @@ -19,12 +19,30 @@ static struct sk_buff *tcp_fec_make_decoded_pkt(struct sock *sk,
146 | const struct sk_buff *skb, unsigned char *dec_data,
147 | u32 seq, unsigned int len);
148 |
149 | +/* Sender routines */
150 | +static int tcp_fec_create(struct sock *sk, struct sk_buff_head *list);
151 | +static int tcp_fec_create_xor(struct sock *sk, struct sk_buff_head *list,
152 | + unsigned int first_seq, unsigned int block_len,
153 | + unsigned int block_skip,
154 | + unsigned int max_encoded_per_pkt);
155 | +static struct sk_buff *tcp_fec_make_encoded_pkt(struct sock *sk,
156 | + struct tcp_fec *fec, unsigned char *enc_data,
157 | + u32 seq);
158 | +static int tcp_fec_xmit_all(struct sock *sk, struct sk_buff_head *list);
159 | +static int tcp_fec_xmit(struct sock *sk, struct sk_buff *skb);
160 | +
161 | /* Buffer access routine */
162 | static unsigned int tcp_fec_get_next_block(struct sock *sk,
163 | struct sk_buff **skb, struct sk_buff_head *queue,
164 | u32 seq, unsigned int block_len,
165 | unsigned char *block);
166 |
167 | +/* Have to define this signature here since the actual function was static
168 | + * and tcp_output.c has no corresponding header file
169 | + */
170 | +extern int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
171 | + gfp_t gfp_mask);
172 | +
173 | /* Decodes FEC parameters and stores them in the FEC struct
174 | * @seq - sequence number of the packet
175 | * @ack_seq - ACKed sequence number
176 | @@ -854,3 +872,381 @@ static unsigned int tcp_fec_get_next_block(struct sock *sk,
177 |
178 | return cur_len;
179 | }
180 | +
181 | +/* Arms the timer for a delayed FEC transmission if there is
182 | + * no earlier timeout defined (e.g. retransmission timeout)
183 | + */
184 | +void tcp_fec_arm_timer(struct sock *sk)
185 | +{
186 | + struct inet_connection_sock *icsk;
187 | + struct tcp_sock *tp;
188 | + u32 delta, timeout, rtt;
189 | +
190 | + icsk = inet_csk(sk);
191 | + tp = tcp_sk(sk);
192 | +
193 | + /* Only arm a timer if connection is established */
194 | + if (sk->sk_state != TCP_ESTABLISHED)
195 | + return;
196 | +
197 | + /* Forward next sequence to be encoded if unencoded data was acked */
198 | + if (after(tp->snd_una, tp->fec.next_seq))
199 | + tp->fec.next_seq = tp->snd_una;
200 | +
201 | + /* Don't arm the timer if there is no unencoded data left */
202 | + if (!before(tp->fec.next_seq, tp->snd_nxt))
203 | + return;
204 | +
205 | + /* TODO handle other timers which might be armed;
206 | + * EARLY_RETRANS? LOSS_PROBE?
207 | + */
208 | +
209 | + /* Compute timeout (currently 0.25 * RTT) */
210 | + rtt = tp->srtt >> 3;
211 | + timeout = rtt >> 2;
212 | +
213 | + /* Compute delay between transmission of original packet and this call
214 | + * (difference is subtracted from timeout value)
215 | + */
216 | + delta = 0;
217 | + if (delta > timeout) {
218 | + tcp_fec_invoke_nodelay(sk);
219 | + return;
220 | + } else if (delta > 0) {
221 | + timeout -= delta;
222 | + }
223 | +
224 | + /* Do not replace a timeout occurring earlier */
225 | + if (time_after_eq(jiffies + timeout, icsk->icsk_timeout))
226 | + return;
227 | +
228 | + inet_csk_reset_xmit_timer(sk, ICSK_TIME_FEC, timeout, TCP_RTO_MAX);
229 | +}
230 | +
231 | +/* The FEC timer fired. Force an FEC transmission for the
232 | + * last unencoded burst. Rearm the RTO timer (which was switched
233 | + * out when setting the FEC timer). Set a new FEC timer if there
234 | + * is pending unencoded data.
235 | + */
236 | +void tcp_fec_timer(struct sock *sk)
237 | +{
238 | + struct inet_connection_sock *icsk;
239 | + struct tcp_sock *tp;
240 | +
241 | + icsk = inet_csk(sk);
242 | + tp = tcp_sk(sk);
243 | +
244 | + tcp_fec_invoke_nodelay(sk);
245 | +
246 | + icsk->icsk_pending = 0;
247 | + tcp_rearm_rto(sk);
248 | +
249 | + tcp_fec_arm_timer(sk);
250 | +}
251 | +
252 | +/* If FEC packet transmissions are delayed, set a timer
253 | + * (if not already set), otherwise invoke the FEC mechanism
254 | + * immediately
255 | + */
256 | +int tcp_fec_invoke(struct sock *sk)
257 | +{
258 | + struct inet_connection_sock *icsk;
259 | + struct tcp_sock *tp;
260 | +
261 | + icsk = inet_csk(sk);
262 | + tp = tcp_sk(sk);
263 | +
264 | +#ifndef TCP_FEC_DELAYED_SEND
265 | + return tcp_fec_invoke_nodelay(sk);
266 | +#else
267 | + /* Set the timer for sending an FEC packet if no FEC
268 | + * timer is active yet
269 | + */
270 | + if (icsk->icsk_pending != ICSK_TIME_FEC)
271 | + tcp_fec_arm_timer(sk);
272 | +#endif
273 | +
274 | + return 0;
275 | +}
276 | +
277 | +/* Invokes the FEC mechanism set for the connection;
278 | + * Creates and sends out FEC packets
279 | + */
280 | +int tcp_fec_invoke_nodelay(struct sock *sk)
281 | +{
282 | + int err;
283 | + struct sk_buff_head *list;
284 | + struct sk_buff *skb;
285 | + struct tcp_fec *fec;
286 | +
287 | + list = kmalloc(sizeof(struct sk_buff_head), GFP_ATOMIC);
288 | + if (list == NULL)
289 | + return -ENOMEM;
290 | +
291 | + skb_queue_head_init(list);
292 | + err = tcp_fec_create(sk, list);
293 | + if (err)
294 | + goto clean;
295 | +
296 | + err = tcp_fec_xmit_all(sk, list);
297 | + if (err)
298 | + goto clean;
299 | +
300 | +clean:
301 | + /* Purge all SKBs (purge FEC structs first) */
302 | + skb = (struct sk_buff *) list;
303 | + while (!skb_queue_is_last(list, skb)) {
304 | + skb = skb_queue_next(list, skb);
305 | + fec = TCP_SKB_CB(skb)->fec;
306 | + if (fec != NULL) {
307 | + kfree(fec);
308 | + TCP_SKB_CB(skb)->fec = NULL;
309 | + }
310 | + }
311 | +
312 | + skb_queue_purge(list);
313 | + kfree(list);
314 | +
315 | + /* TODO error handling; -ENOMEM, etc. - disable FEC? */
316 | +
317 | + return err;
318 | +}
319 | +
320 | +/* Creates one or more FEC packets (can depend on the FEC type used)
321 | + * and puts them in a queue
322 | + * @list: queue head
323 | + */
324 | +static int tcp_fec_create(struct sock *sk, struct sk_buff_head *list)
325 | +{
326 | + struct tcp_sock *tp;
327 | + unsigned int first_seq, block_len;
328 | + int err;
329 | +
330 | + tp = tcp_sk(sk);
331 | +
332 | + /* Update the pointer to the first byte to be encoded next
333 | + * (this only matters when a packet was ACKed before it was
334 | + * encoded)
335 | + */
336 | + if (after(tp->snd_una, tp->fec.next_seq))
337 | + tp->fec.next_seq = tp->snd_una;
338 | +
339 | + first_seq = tp->fec.next_seq;
340 | + block_len = tcp_current_mss(sk);
341 | +
342 | + switch (tp->fec.type) {
343 | + case TCP_FEC_TYPE_NONE:
344 | + return 0;
345 | + case TCP_FEC_TYPE_XOR_ALL:
346 | + return tcp_fec_create_xor(sk, list, first_seq,
347 | + block_len, 0,
348 | + FEC_RCV_QUEUE_LIMIT - block_len);
349 | + case TCP_FEC_TYPE_XOR_SKIP_1:
350 | + err = tcp_fec_create_xor(sk, list, first_seq, block_len, 1,
351 | + FEC_RCV_QUEUE_LIMIT - block_len);
352 | + if (err)
353 | + return err;
354 | +
355 | + return tcp_fec_create_xor(sk, list, first_seq + block_len,
356 | + block_len, 1,
357 | + FEC_RCV_QUEUE_LIMIT - block_len);
358 | + }
359 | +
360 | + return 0;
361 | +}
362 | +
363 | +/* Creates FEC packet(s) using XOR encoding
364 | + * (allocates memory for the FEC structs)
365 | + * @first_seq - Sequence number of first byte to be encoded
366 | + * @block_len - Block length (typically MSS)
367 | + * @block_skip - Number of unencoded blocks between two encoded blocks
368 | + * @max_encoded_per_pkt - maximum number of blocks encoded per packet
369 | + * (0, if unlimited)
370 | + */
371 | +static int tcp_fec_create_xor(struct sock *sk, struct sk_buff_head *list,
372 | + unsigned int first_seq, unsigned int block_len,
373 | + unsigned int block_skip,
374 | + unsigned int max_encoded_per_pkt)
375 | +{
376 | + struct tcp_sock *tp;
377 | + struct sk_buff *skb, *fskb;
378 | + struct tcp_fec *fec;
379 | + unsigned int c_encoded; /* Number of currently encoded blocks
380 | + not yet added to an FEC packet */
381 | + unsigned int next_seq; /* Next byte to encode */
382 | + unsigned int i;
383 | + unsigned char *data, *block;
384 | + u16 data_len;
385 | +
386 | + tp = tcp_sk(sk);
387 | + skb = NULL;
388 | + c_encoded = 0;
389 | + next_seq = first_seq;
390 | +
391 | + /* memory allocation
392 | + * data - used temporarily to obtain byte blocks and store the payload
393 | + *   (is freed before returning; we need two blocks here to store
394 | + *   the previously XORed data that has not been added to an FEC
395 | + *   packet yet, and the new to-be XORed data extracted from one
396 | + *   or more existing buffers)
397 | + *
398 | + * fec - used to store the FEC parameters
399 | + *   (is freed after the corresponding packet is forwarded to the
400 | + *   transmission routine)
401 | + */
402 | + data = kmalloc(2 * block_len, GFP_ATOMIC);
403 | + if (data == NULL)
404 | + return -ENOMEM;
405 | +
406 | + fec = kmalloc(sizeof(struct tcp_fec), GFP_ATOMIC);
407 | + if (fec == NULL) {
408 | + kfree(data);
409 | + return -ENOMEM;
410 | + }
411 | +
412 | + memset(data, 0, 2 * block_len);
413 | + memset(fec, 0, sizeof(struct tcp_fec));
414 | +
415 | + block = data + block_len;
416 | +
417 | + /* encode data blocks
418 | + * XXX atomicity check?
419 | + */
420 | + fec->enc_seq = next_seq;
421 | + while ((data_len = tcp_fec_get_next_block(sk, &skb,
422 | + &sk->sk_write_queue, next_seq,
423 | + min(block_len, tp->snd_nxt - next_seq),
424 | + block))) {
425 | + /* Check if we reached the encoding limit; then create packet
426 | + * with current payload and add it to the queue
427 | + */
428 | + if (max_encoded_per_pkt > 0 &&
429 | + c_encoded >= max_encoded_per_pkt) {
430 | + fskb = tcp_fec_make_encoded_pkt(sk, fec, data,
431 | + block_len);
432 | + if (fskb == NULL) {
433 | + kfree(data);
434 | + kfree(fec);
435 | + return -ENOMEM;
436 | + }
437 | +
438 | + skb_queue_tail(list, fskb);
439 | + memset(data, 0, block_len);
440 | + c_encoded = 0;
441 | +
442 | + /* memory allocation for the FEC struct of the next
443 | + * packet
444 | + */
445 | + fec = kmalloc(sizeof(struct tcp_fec), GFP_ATOMIC);
446 | + if (fec == NULL) {
447 | + kfree(data);
448 | + return -ENOMEM;
449 | + }
450 | +
451 | + memset(fec, 0, sizeof(struct tcp_fec));
452 | + fec->enc_seq = next_seq;
453 | + }
454 | +
455 | + next_seq += data_len;
456 | + fec->enc_len = next_seq - fec->enc_seq;
457 | +
458 | + /* encode block into existing payload (XOR) */
459 | + for (i = 0; i < data_len; i++)
460 | + data[i] ^= block[i];
461 | +
462 | + c_encoded++;
463 | +
464 | + /* skip over blocks which are not requested for encoding */
465 | + next_seq += block_len * block_skip;
466 | + }
467 | +
468 | + /* create final packet if some data was selected for encoding */
469 | + if (c_encoded > 0) {
470 | + fskb = tcp_fec_make_encoded_pkt(sk, fec, data, block_len);
471 | + if (fskb == NULL) {
472 | + kfree(data);
473 | + kfree(fec);
474 | + return -ENOMEM;
475 | + }
476 | +
477 | + skb_queue_tail(list, fskb);
478 | + } else {
479 | + kfree(fec);
480 | + }
481 | +
482 | + tp->fec.next_seq = next_seq;
483 | + kfree(data);
484 | +
485 | + return 0;
486 | +}
487 | +
488 | +/* Allocates an SKB for data we want to send and assigns
489 | + * the necessary options and fields
490 | + */
491 | +static struct sk_buff *tcp_fec_make_encoded_pkt(struct sock *sk,
492 | + struct tcp_fec *fec,
493 | + unsigned char *enc_data,
494 | + unsigned int len)
495 | +{
496 | + struct sk_buff *skb;
497 | + unsigned char *data;
498 | +
499 | + /* Sized as in tcp_make_synack(); the extra 15 bytes are probably slack */
500 | + len = min(len, fec->enc_len);
501 | + skb = alloc_skb(MAX_TCP_HEADER + 15 + len, GFP_ATOMIC);
502 | + if (skb == NULL)
503 | + return NULL;
504 | +
505 | + /* Reserve space for headers */
506 | + skb_reserve(skb, MAX_TCP_HEADER);
507 | +
508 | + /* Specify sequence number and FEC struct address in control buffer */
509 | + fec->flags |= TCP_FEC_ENCODED;
510 | + TCP_SKB_CB(skb)->seq = fec->enc_seq;
511 | + TCP_SKB_CB(skb)->fec = fec;
512 | +
513 | + /* Enable ACK flag (required for all data packets) */
514 | + TCP_SKB_CB(skb)->tcp_flags = TCPHDR_ACK;
515 | +
516 | + /* Set GSO parameters */
517 | + skb_shinfo(skb)->gso_segs = 1;
518 | + skb_shinfo(skb)->gso_size = 0;
519 | + skb_shinfo(skb)->gso_type = 0;
520 | +
521 | + /* Append payload to SKB */
522 | + data = skb_put(skb, len);
523 | + memcpy(data, enc_data, len);
524 | +
525 | + skb->ip_summed = CHECKSUM_PARTIAL;
526 | +
527 | + return skb;
528 | +}
529 | +
530 | +/* Transmit all FEC packets in a list */
531 | +static int tcp_fec_xmit_all(struct sock *sk, struct sk_buff_head *list)
532 | +{
533 | + struct sk_buff *skb;
534 | + int err;
535 | +
536 | + if (list == NULL || skb_queue_empty(list))
537 | + return 0;
538 | +
539 | + skb = (struct sk_buff *) list;
540 | + while (!skb_queue_is_last(list, skb)) {
541 | + skb = skb_queue_next(list, skb);
542 | + err = tcp_fec_xmit(sk, skb);
543 | + if (err)
544 | + return err;
545 | + }
546 | +
547 | + return 0;
548 | +}
549 | +
550 | +/* Transmits an FEC packet */
551 | +static int tcp_fec_xmit(struct sock *sk, struct sk_buff *skb)
552 | +{
553 | + /* TODO timers? FEC packets are never retransmitted, but FEC should be
554 | + * deactivated if we never get any FEC ACKs back
555 | + */
556 | + return tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
557 | +}
558 | diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
559 | index 4d17c5f..e1b70bf 100644
560 | --- a/net/ipv4/tcp_input.c
561 | +++ b/net/ipv4/tcp_input.c
562 | @@ -2938,6 +2938,12 @@ void tcp_rearm_rto(struct sock *sk)
563 | if (tp->fastopen_rsk)
564 | return;
565 |
566 | + /* Don't rearm the timer if an FEC timer is active.
567 | + * The FEC handler will rearm the timer once the event is handled.
568 | + */
569 | + if (icsk->icsk_pending == ICSK_TIME_FEC)
570 | + return;
571 | +
572 | if (!tp->packets_out) {
573 | inet_csk_clear_xmit_timer(sk, ICSK_TIME_RETRANS);
574 | } else {
575 | diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
576 | index 04e3bf0..d97fb04 100644
577 | --- a/net/ipv4/tcp_ipv4.c
578 | +++ b/net/ipv4/tcp_ipv4.c
579 | @@ -2667,7 +2667,8 @@ static void get_tcp4_sock(struct sock *sk, struct seq_file *f, int i, int *len)
580 |
581 | if (icsk->icsk_pending == ICSK_TIME_RETRANS ||
582 | icsk->icsk_pending == ICSK_TIME_EARLY_RETRANS ||
583 | - icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
584 | + icsk->icsk_pending == ICSK_TIME_LOSS_PROBE ||
585 | + icsk->icsk_pending == ICSK_TIME_FEC) {
586 | timer_active = 1;
587 | timer_expires = icsk->icsk_timeout;
588 | } else if (icsk->icsk_pending == ICSK_TIME_PROBE0) {
589 | diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
590 | index 00daf84..cba34c0 100644
591 | --- a/net/ipv4/tcp_output.c
592 | +++ b/net/ipv4/tcp_output.c
593 | @@ -864,7 +864,7 @@ void tcp_wfree(struct sk_buff *skb)
594 | * We are working here with either a clone of the original
595 | * SKB, or a fresh unique copy made by the retransmit engine.
596 | */
597 | -static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
598 | +int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
599 | gfp_t gfp_mask)
600 | {
601 | const struct inet_connection_sock *icsk = inet_csk(sk);
602 | @@ -1936,6 +1936,9 @@ repair:
603 | break;
604 | }
605 |
606 | + if (tcp_fec_is_enabled(tp))
607 | + tcp_fec_invoke(sk);
608 | +
609 | if (likely(sent_pkts)) {
610 | if (tcp_in_cwnd_reduction(sk))
611 | tp->prr_out += sent_pkts;
612 | diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
613 | index 4b85e6f..b2808d4 100644
614 | --- a/net/ipv4/tcp_timer.c
615 | +++ b/net/ipv4/tcp_timer.c
616 | @@ -21,6 +21,7 @@
617 | #include
618 | #include
619 | #include
620 | +#include
621 |
622 | int sysctl_tcp_syn_retries __read_mostly = TCP_SYN_RETRIES;
623 | int sysctl_tcp_synack_retries __read_mostly = TCP_SYNACK_RETRIES;
624 | @@ -472,7 +473,15 @@ out_reset_timer:
625 | if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1, 0, 0))
626 | __sk_dst_reset(sk);
627 |
628 | -out:;
629 | +out:
630 | + /* FEC switches out the RTO timer if a delayed FEC transmission is
631 | + * due before the RTO expires. The RTO timer is switched back in
632 | + * once the FEC timer has fired.
633 | + * FEC transmissions during a loss episode require that
634 | + * sysctl_tcp_fec is >= 2.
635 | + */
636 | + if (tcp_fec_is_enabled(tp) && sysctl_tcp_fec >= 2)
637 | + tcp_fec_arm_timer(sk);
638 | }
639 |
640 | void tcp_write_timer_handler(struct sock *sk)
641 | @@ -497,6 +506,9 @@ void tcp_write_timer_handler(struct sock *sk)
642 | case ICSK_TIME_LOSS_PROBE:
643 | tcp_send_loss_probe(sk);
644 | break;
645 | + case ICSK_TIME_FEC:
646 | + tcp_fec_timer(sk);
647 | + break;
648 | case ICSK_TIME_RETRANS:
649 | icsk->icsk_pending = 0;
650 | tcp_retransmit_timer(sk);
651 | --
652 | 2.1.0.rc2.206.gedb03e5
653 |
654 |
--------------------------------------------------------------------------------
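
Note on the XOR encoding used by tcp_fec_create_xor() above: the parity
payload is the byte-wise XOR of the encoded blocks, so a receiver that
holds the parity and all but one of those blocks can rebuild the missing
one. The following user-space sketch illustrates only that property. It is
not part of the patch; the helper names xor_into, xor_encode and
xor_recover are hypothetical, and it assumes equally sized blocks stored
back to back, which the kernel code does not require.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in the patch): XOR `len` bytes of `block`
 * into the running parity buffer, mirroring the inner loop of
 * tcp_fec_create_xor().
 */
static void xor_into(unsigned char *parity, const unsigned char *block,
		     size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		parity[i] ^= block[i];
}

/* Hypothetical helper: build one parity block over `n` data blocks of
 * `block_len` bytes each, stored contiguously in `blocks`.
 * `parity` must hold `block_len` bytes.
 */
static void xor_encode(unsigned char *parity, const unsigned char *blocks,
		       size_t n, size_t block_len)
{
	size_t b;

	memset(parity, 0, block_len);
	for (b = 0; b < n; b++)
		xor_into(parity, blocks + b * block_len, block_len);
}

/* Hypothetical helper: rebuild the block at index `lost` from the parity
 * block and the n - 1 surviving blocks. The bytes stored at the lost
 * block's position in `blocks` are never read.
 */
static void xor_recover(unsigned char *out, const unsigned char *parity,
			const unsigned char *blocks, size_t n,
			size_t block_len, size_t lost)
{
	size_t b;

	memcpy(out, parity, block_len);
	for (b = 0; b < n; b++)
		if (b != lost)
			xor_into(out, blocks + b * block_len, block_len);
}
```

A single parity block can repair at most one lost block per encoded group;
TCP_FEC_TYPE_XOR_SKIP_1 appears to address this by interleaving two groups
so that two adjacent losses fall into different parity packets.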