├── .gitignore └── pipelines.md /.gitignore: -------------------------------------------------------------------------------- 1 | ## Ignore Visual Studio temporary files, build results, and 2 | ## files generated by popular Visual Studio add-ons. 3 | ## 4 | ## Get latest from https://github.com/github/gitignore/blob/master/VisualStudio.gitignore 5 | 6 | # User-specific files 7 | *.suo 8 | *.user 9 | *.userosscache 10 | *.sln.docstates 11 | 12 | # User-specific files (MonoDevelop/Xamarin Studio) 13 | *.userprefs 14 | 15 | # Build results 16 | [Dd]ebug/ 17 | [Dd]ebugPublic/ 18 | [Rr]elease/ 19 | [Rr]eleases/ 20 | x64/ 21 | x86/ 22 | bld/ 23 | [Bb]in/ 24 | [Oo]bj/ 25 | [Ll]og/ 26 | 27 | # Visual Studio 2015/2017 cache/options directory 28 | .vs/ 29 | # Uncomment if you have tasks that create the project's static files in wwwroot 30 | #wwwroot/ 31 | 32 | # Visual Studio 2017 auto generated files 33 | Generated\ Files/ 34 | 35 | # MSTest test Results 36 | [Tt]est[Rr]esult*/ 37 | [Bb]uild[Ll]og.* 38 | 39 | # NUNIT 40 | *.VisualState.xml 41 | TestResult.xml 42 | 43 | # Build Results of an ATL Project 44 | [Dd]ebugPS/ 45 | [Rr]eleasePS/ 46 | dlldata.c 47 | 48 | # Benchmark Results 49 | BenchmarkDotNet.Artifacts/ 50 | 51 | # .NET Core 52 | project.lock.json 53 | project.fragment.lock.json 54 | artifacts/ 55 | **/Properties/launchSettings.json 56 | 57 | # StyleCop 58 | StyleCopReport.xml 59 | 60 | # Files built by Visual Studio 61 | *_i.c 62 | *_p.c 63 | *_i.h 64 | *.ilk 65 | *.meta 66 | *.obj 67 | *.iobj 68 | *.pch 69 | *.pdb 70 | *.ipdb 71 | *.pgc 72 | *.pgd 73 | *.rsp 74 | *.sbr 75 | *.tlb 76 | *.tli 77 | *.tlh 78 | *.tmp 79 | *.tmp_proj 80 | *.log 81 | *.vspscc 82 | *.vssscc 83 | .builds 84 | *.pidb 85 | *.svclog 86 | *.scc 87 | 88 | # Chutzpah Test files 89 | _Chutzpah* 90 | 91 | # Visual C++ cache files 92 | ipch/ 93 | *.aps 94 | *.ncb 95 | *.opendb 96 | *.opensdf 97 | *.sdf 98 | *.cachefile 99 | *.VC.db 100 | *.VC.VC.opendb 101 | 102 | # Visual Studio profiler 103 | 
*.psess 104 | *.vsp 105 | *.vspx 106 | *.sap 107 | 108 | # Visual Studio Trace Files 109 | *.e2e 110 | 111 | # TFS 2012 Local Workspace 112 | $tf/ 113 | 114 | # Guidance Automation Toolkit 115 | *.gpState 116 | 117 | # ReSharper is a .NET coding add-in 118 | _ReSharper*/ 119 | *.[Rr]e[Ss]harper 120 | *.DotSettings.user 121 | 122 | # JustCode is a .NET coding add-in 123 | .JustCode 124 | 125 | # TeamCity is a build add-in 126 | _TeamCity* 127 | 128 | # DotCover is a Code Coverage Tool 129 | *.dotCover 130 | 131 | # AxoCover is a Code Coverage Tool 132 | .axoCover/* 133 | !.axoCover/settings.json 134 | 135 | # Visual Studio code coverage results 136 | *.coverage 137 | *.coveragexml 138 | 139 | # NCrunch 140 | _NCrunch_* 141 | .*crunch*.local.xml 142 | nCrunchTemp_* 143 | 144 | # MightyMoose 145 | *.mm.* 146 | AutoTest.Net/ 147 | 148 | # Web workbench (sass) 149 | .sass-cache/ 150 | 151 | # Installshield output folder 152 | [Ee]xpress/ 153 | 154 | # DocProject is a documentation generator add-in 155 | DocProject/buildhelp/ 156 | DocProject/Help/*.HxT 157 | DocProject/Help/*.HxC 158 | DocProject/Help/*.hhc 159 | DocProject/Help/*.hhk 160 | DocProject/Help/*.hhp 161 | DocProject/Help/Html2 162 | DocProject/Help/html 163 | 164 | # Click-Once directory 165 | publish/ 166 | 167 | # Publish Web Output 168 | *.[Pp]ublish.xml 169 | *.azurePubxml 170 | # Note: Comment the next line if you want to checkin your web deploy settings, 171 | # but database connection strings (with potential passwords) will be unencrypted 172 | *.pubxml 173 | *.publishproj 174 | 175 | # Microsoft Azure Web App publish settings. 
Comment the next line if you want to 176 | # checkin your Azure Web App publish settings, but sensitive information contained 177 | # in these scripts will be unencrypted 178 | PublishScripts/ 179 | 180 | # NuGet Packages 181 | *.nupkg 182 | # The packages folder can be ignored because of Package Restore 183 | **/[Pp]ackages/* 184 | # except build/, which is used as an MSBuild target. 185 | !**/[Pp]ackages/build/ 186 | # Uncomment if necessary however generally it will be regenerated when needed 187 | #!**/[Pp]ackages/repositories.config 188 | # NuGet v3's project.json files produces more ignorable files 189 | *.nuget.props 190 | *.nuget.targets 191 | 192 | # Microsoft Azure Build Output 193 | csx/ 194 | *.build.csdef 195 | 196 | # Microsoft Azure Emulator 197 | ecf/ 198 | rcf/ 199 | 200 | # Windows Store app package directories and files 201 | AppPackages/ 202 | BundleArtifacts/ 203 | Package.StoreAssociation.xml 204 | _pkginfo.txt 205 | *.appx 206 | 207 | # Visual Studio cache files 208 | # files ending in .cache can be ignored 209 | *.[Cc]ache 210 | # but keep track of directories ending in .cache 211 | !*.[Cc]ache/ 212 | 213 | # Others 214 | ClientBin/ 215 | ~$* 216 | *~ 217 | *.dbmdl 218 | *.dbproj.schemaview 219 | *.jfm 220 | *.pfx 221 | *.publishsettings 222 | orleans.codegen.cs 223 | 224 | # Including strong name files can present a security risk 225 | # (https://github.com/github/gitignore/pull/2483#issue-259490424) 226 | #*.snk 227 | 228 | # Since there are multiple workflows, uncomment next line to ignore bower_components 229 | # (https://github.com/github/gitignore/pull/1529#issuecomment-104372622) 230 | #bower_components/ 231 | 232 | # RIA/Silverlight projects 233 | Generated_Code/ 234 | 235 | # Backup & report files from converting an old project file 236 | # to a newer Visual Studio version. 
Backup files are not needed, 237 | # because we have git ;-) 238 | _UpgradeReport_Files/ 239 | Backup*/ 240 | UpgradeLog*.XML 241 | UpgradeLog*.htm 242 | ServiceFabricBackup/ 243 | *.rptproj.bak 244 | 245 | # SQL Server files 246 | *.mdf 247 | *.ldf 248 | *.ndf 249 | 250 | # Business Intelligence projects 251 | *.rdl.data 252 | *.bim.layout 253 | *.bim_*.settings 254 | *.rptproj.rsuser 255 | 256 | # Microsoft Fakes 257 | FakesAssemblies/ 258 | 259 | # GhostDoc plugin setting file 260 | *.GhostDoc.xml 261 | 262 | # Node.js Tools for Visual Studio 263 | .ntvs_analysis.dat 264 | node_modules/ 265 | 266 | # Visual Studio 6 build log 267 | *.plg 268 | 269 | # Visual Studio 6 workspace options file 270 | *.opt 271 | 272 | # Visual Studio 6 auto-generated workspace file (contains which files were open etc.) 273 | *.vbw 274 | 275 | # Visual Studio LightSwitch build output 276 | **/*.HTMLClient/GeneratedArtifacts 277 | **/*.DesktopClient/GeneratedArtifacts 278 | **/*.DesktopClient/ModelManifest.xml 279 | **/*.Server/GeneratedArtifacts 280 | **/*.Server/ModelManifest.xml 281 | _Pvt_Extensions 282 | 283 | # Paket dependency manager 284 | .paket/paket.exe 285 | paket-files/ 286 | 287 | # FAKE - F# Make 288 | .fake/ 289 | 290 | # JetBrains Rider 291 | .idea/ 292 | *.sln.iml 293 | 294 | # CodeRush 295 | .cr/ 296 | 297 | # Python Tools for Visual Studio (PTVS) 298 | __pycache__/ 299 | *.pyc 300 | 301 | # Cake - Uncomment if you are using it 302 | # tools/** 303 | # !tools/packages.config 304 | 305 | # Tabs Studio 306 | *.tss 307 | 308 | # Telerik's JustMock configuration file 309 | *.jmconfig 310 | 311 | # BizTalk build output 312 | *.btp.cs 313 | *.btm.cs 314 | *.odx.cs 315 | *.xsd.cs 316 | 317 | # OpenCover UI analysis results 318 | OpenCover/ 319 | 320 | # Azure Stream Analytics local run output 321 | ASALocalRun/ 322 | 323 | # MSBuild Binary and Structured Log 324 | *.binlog 325 | 326 | # NVidia Nsight GPU debugger configuration file 327 | *.nvuser 328 | 329 | # MFractors 
(Xamarin productivity tool) working folder 330 | .mfractor/ 331 | -------------------------------------------------------------------------------- /pipelines.md: -------------------------------------------------------------------------------- 1 | # System.IO.Pipelines: High performance IO in .NET 2 | 3 | [System.IO.Pipelines](https://www.nuget.org/packages/System.IO.Pipelines/) is a new library that is designed to make it easier to do high performance IO in .NET. It's a library targeting .NET Standard that works on all .NET implementations. 4 | 5 | Pipelines was born from the work the .NET Core team did to make Kestrel one of the [fastest web servers in the industry](https://www.techempower.com/benchmarks/#section=data-r16&hw=ph&test=plaintext). What started as an implementation detail inside of Kestrel progressed into a re-usable API that shipped in .NET Core 2.1 as a first-class BCL API (System.IO.Pipelines) available for all .NET developers. 6 | 7 | ## What problem does it solve? 8 | 9 | Correctly parsing data from a stream or socket is dominated by boilerplate code and has many corner cases, leading to complex code that is difficult to maintain. 10 | Achieving high performance and correctness while also dealing with this complexity is difficult. Pipelines aims to solve this complexity. 11 | 12 | ## What extra complexity exists today? 13 | 14 | Let's start with a simple problem. We want to write a TCP server that receives line-delimited messages (delimited by `'\n'`) from a client. 15 | 16 | ### TCP Server with NetworkStream 17 | 18 | *DISCLAIMER: As with all performance-sensitive work, each of the scenarios should be measured within the context of your application. 
The overhead of the various techniques mentioned may not be necessary depending on the scale your networking applications need to handle.* 19 | 20 | The typical code you would write in .NET before pipelines looks something like this: 21 | 22 | ```C# 23 | async Task ProcessLinesAsync(NetworkStream stream) 24 | { 25 | var buffer = new byte[1024]; 26 | await stream.ReadAsync(buffer, 0, buffer.Length); 27 | 28 | // Process a single line from the buffer 29 | ProcessLine(buffer); 30 | } 31 | ``` 32 | 33 | This code might work when testing locally, but it has several errors: 34 | - The entire message (end of line) may not have been received in a single call to `ReadAsync`. 35 | - It's ignoring the result of `stream.ReadAsync()`, which returns how much data was actually read into the buffer. 36 | - It doesn't handle the case where multiple lines come back in a single `ReadAsync` call. 37 | 38 | These are some of the common pitfalls when reading streaming data. To account for them we need to make a few changes: 39 | - We need to buffer the incoming data until we have found a new line. 
40 | - We need to parse *all* of the lines returned in the buffer. 41 | 42 | ```C# 43 | async Task ProcessLinesAsync(NetworkStream stream) 44 | { 45 | var buffer = new byte[1024]; 46 | var bytesBuffered = 0; 47 | var bytesConsumed = 0; 48 | 49 | while (true) 50 | { 51 | var bytesRead = await stream.ReadAsync(buffer, bytesBuffered, buffer.Length - bytesBuffered); 52 | if (bytesRead == 0) 53 | { 54 | // EOF 55 | break; 56 | } 57 | // Keep track of the number of buffered bytes 58 | bytesBuffered += bytesRead; 59 | 60 | var linePosition = -1; 61 | 62 | do 63 | { 64 | // Look for an EOL in the buffered data 65 | linePosition = Array.IndexOf(buffer, (byte)'\n', bytesConsumed, bytesBuffered - bytesConsumed); 66 | 67 | if (linePosition >= 0) 68 | { 69 | // Calculate the length of the line based on the offset 70 | var lineLength = linePosition - bytesConsumed; 71 | 72 | // Process the line 73 | ProcessLine(buffer, bytesConsumed, lineLength); 74 | 75 | // Move bytesConsumed to skip past the line we processed (including the \n) 76 | bytesConsumed += lineLength + 1; 77 | } 78 | } 79 | while (linePosition >= 0); 80 | } 81 | } 82 | ``` 83 | 84 | Once again, this might work in local testing, but it's possible that a line is bigger than 1KiB (1024 bytes). We need to resize the input buffer until we have found a new line. 85 | 86 | Also, we're allocating buffers on the heap as longer lines are processed. We can improve this by using `ArrayPool<byte>` to avoid repeated buffer allocations as we parse longer lines from the client. 
87 | 88 | ```C# 89 | async Task ProcessLinesAsync(NetworkStream stream) 90 | { 91 | byte[] buffer = ArrayPool<byte>.Shared.Rent(1024); 92 | var bytesBuffered = 0; 93 | var bytesConsumed = 0; 94 | 95 | while (true) 96 | { 97 | // Calculate the number of bytes remaining in the buffer 98 | var bytesRemaining = buffer.Length - bytesBuffered; 99 | 100 | if (bytesRemaining == 0) 101 | { 102 | // Double the buffer size and copy the previously buffered data into the new buffer 103 | var newBuffer = ArrayPool<byte>.Shared.Rent(buffer.Length * 2); 104 | Buffer.BlockCopy(buffer, 0, newBuffer, 0, buffer.Length); 105 | // Return the old buffer to the pool 106 | ArrayPool<byte>.Shared.Return(buffer); 107 | buffer = newBuffer; 108 | bytesRemaining = buffer.Length - bytesBuffered; 109 | } 110 | 111 | var bytesRead = await stream.ReadAsync(buffer, bytesBuffered, bytesRemaining); 112 | if (bytesRead == 0) 113 | { 114 | // EOF 115 | break; 116 | } 117 | 118 | // Keep track of the number of buffered bytes 119 | bytesBuffered += bytesRead; 120 | var linePosition = -1; 121 | 122 | do 123 | { 124 | // Look for an EOL in the buffered data 125 | linePosition = Array.IndexOf(buffer, (byte)'\n', bytesConsumed, bytesBuffered - bytesConsumed); 126 | 127 | if (linePosition >= 0) 128 | { 129 | // Calculate the length of the line based on the offset 130 | var lineLength = linePosition - bytesConsumed; 131 | 132 | // Process the line 133 | ProcessLine(buffer, bytesConsumed, lineLength); 134 | 135 | // Move bytesConsumed to skip past the line we processed (including the \n) 136 | bytesConsumed += lineLength + 1; 137 | } 138 | } 139 | while (linePosition >= 0); 140 | } 141 | } 142 | ``` 143 | 144 | This code works, but now we're resizing the buffer, which results in more buffer copies. It also uses more memory, as the logic doesn't shrink the buffer after lines are processed. To avoid this, we can store a list of buffers instead of resizing each time we cross the 1KiB buffer size. 
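One detail of the rent/return pattern used above is worth calling out: `ArrayPool<byte>.Shared.Rent` returns an array of *at least* the requested size, often larger, and a rented array should be returned to the pool once nothing references it. A minimal, standalone sketch of just that pattern:

```C#
using System;
using System.Buffers;

class ArrayPoolExample
{
    static void Main()
    {
        // Rent returns an array of at least the requested size (often larger)
        byte[] buffer = ArrayPool<byte>.Shared.Rent(1024);
        Console.WriteLine(buffer.Length >= 1024); // True

        try
        {
            // ... fill and parse the buffer ...
        }
        finally
        {
            // Return the array so later Rent calls can reuse it
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

Returning a buffer while something can still read or write it is a use-after-free style bug, which is exactly the hazard the later examples have to manage carefully.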
144 | 145 | Also, we don't allocate a new buffer until the existing one is completely full. This means we can end up passing smaller and smaller buffers to `ReadAsync`, which will result in more calls into the operating system. 146 | 147 | To mitigate this, we'll allocate a new buffer when there's less than 512 bytes remaining in the existing buffer: 148 | 149 | ```C# 150 | public class BufferSegment 151 | { 152 | public byte[] Buffer { get; set; } 153 | public int Count { get; set; } 154 | 155 | public int Remaining => Buffer.Length - Count; 156 | } 157 | 158 | async Task ProcessLinesAsync(NetworkStream stream) 159 | { 160 | const int minimumBufferSize = 512; 161 | 162 | var segments = new List<BufferSegment>(); 163 | var bytesConsumed = 0; 164 | var bytesConsumedBufferIndex = 0; 165 | var segment = new BufferSegment { Buffer = ArrayPool<byte>.Shared.Rent(1024) }; 166 | 167 | segments.Add(segment); 168 | 169 | while (true) 170 | { 171 | // Check the amount of space remaining in the current segment 172 | if (segment.Remaining < minimumBufferSize) 173 | { 174 | // Allocate a new segment 175 | segment = new BufferSegment { Buffer = ArrayPool<byte>.Shared.Rent(1024) }; 176 | segments.Add(segment); 177 | } 178 | 179 | var bytesRead = await stream.ReadAsync(segment.Buffer, segment.Count, segment.Remaining); 180 | if (bytesRead == 0) 181 | { 182 | break; 183 | } 184 | 185 | // Keep track of the number of buffered bytes 186 | segment.Count += bytesRead; 187 | 188 | while (true) 189 | { 190 | // Look for an EOL in the list of segments 191 | var (segmentIndex, segmentOffset) = IndexOf(segments, (byte)'\n', bytesConsumedBufferIndex, bytesConsumed); 192 | 193 | if (segmentIndex >= 0) 194 | { 195 | // Process the line 196 | ProcessLine(segments, segmentIndex, segmentOffset); 197 | 198 | bytesConsumedBufferIndex = segmentIndex; 199 | bytesConsumed = segmentOffset + 1; 200 | } 201 | else 202 | { 203 | break; 204 | } 205 | } 206 | 207 | // Drop fully consumed segments from the list so we don't look at them again 208 | for (var i = bytesConsumedBufferIndex - 1; i >= 0; --i) 209 | { 210 | var consumedSegment = segments[i]; 211 | // Segments before the one containing the last EOL are fully consumed 212 | ArrayPool<byte>.Shared.Return(consumedSegment.Buffer); 213 | segments.RemoveAt(i); 214 | } 215 | 216 | // The segment with unconsumed data is now at the front of the list 217 | bytesConsumedBufferIndex = 0; 218 | } 219 | } 220 | 221 | (int segmentIndex, int segmentOffset) IndexOf(List<BufferSegment> segments, byte value, int startBufferIndex, int startSegmentOffset) 222 | { 223 | var first = true; 224 | for (var i = startBufferIndex; i < segments.Count; ++i) 225 | { 226 | var segment = segments[i]; 227 | // Start from the correct offset 228 | var offset = first ? startSegmentOffset : 0; 229 | var index = Array.IndexOf(segment.Buffer, value, offset, segment.Count - offset); 230 | 231 | if (index >= 0) 232 | { 233 | // Return the buffer index and the index within that segment where the EOL was found 234 | return (i, index); 235 | } 236 | 237 | first = false; 238 | } 239 | return (-1, -1); 240 | } 241 | ``` 242 | 243 | This code just got *much* more complicated. We're keeping track of the filled buffers as we look for the delimiter. To do this, we're using a `List<BufferSegment>` to represent the buffered data while looking for the newline delimiter. As a result, `ProcessLine` and `IndexOf` now accept a `List<BufferSegment>` instead of a `byte[]`, an `offset`, and a `count`. Our parsing logic now needs to handle one or more buffer segments. 244 | 245 | Our server now handles partial messages, and it uses pooled memory to reduce overall memory consumption, but there are still a couple more changes we need to make: 246 | 247 | 1. The `byte[]` buffers we're renting from the `ArrayPool<byte>` are just regular managed arrays. This means whenever we do a `ReadAsync` or `WriteAsync`, those buffers get pinned for the lifetime of the asynchronous operation (in order to interop with the native IO APIs on the operating system). 
This has performance implications for the garbage collector, since pinned memory cannot be moved, which can lead to heap fragmentation. Depending on how long the async operations are pending, the pool implementation may need to change. 248 | 2. The throughput can be optimized by decoupling the reading and processing logic. This creates a batching effect that lets the parsing logic consume larger chunks of buffers, instead of reading more data only after parsing a single line. This introduces some additional complexity: 249 | - We need two loops that run independently of each other: one that reads from the `Socket` and one that parses the buffers. 250 | - We need a way to signal the parsing logic when data becomes available. 251 | - We need to decide what happens if the loop reading from the `Socket` is "too fast". We need a way to throttle the reading loop if the parsing logic can't keep up. This is commonly referred to as "flow control" or "back pressure". 252 | - We need to make sure things are thread-safe. We're now sharing a set of buffers between the reading loop and the parsing loop, and those run independently on different threads. 253 | - The memory management logic is now spread across two different pieces of code: the code that rents from the buffer pool reads from the socket, and the code that returns buffers to the pool is the parsing logic. 254 | - We need to be extremely careful with how we return buffers after the parsing logic is done with them. If we're not careful, it's possible that we return a buffer that's still being written to by the `Socket` reading logic. 255 | 256 | The complexity has gone through the roof (and we haven't even covered all of the cases). High-performance networking usually means writing very complex code in order to eke out more performance from the system. 
257 | 258 | *The goal of `System.IO.Pipelines` is to make writing this type of code easier.* 259 | 260 | ### TCP server with System.IO.Pipelines 261 | 262 | Let's take a look at what this example looks like with `System.IO.Pipelines`: 263 | 264 | ```C# 265 | async Task ProcessLinesAsync(Socket socket) 266 | { 267 | var pipe = new Pipe(); 268 | Task writing = FillPipeAsync(socket, pipe.Writer); 269 | Task reading = ReadPipeAsync(pipe.Reader); 270 | 271 | await Task.WhenAll(reading, writing); 272 | } 273 | 274 | async Task FillPipeAsync(Socket socket, PipeWriter writer) 275 | { 276 | const int minimumBufferSize = 512; 277 | 278 | while (true) 279 | { 280 | // Allocate at least 512 bytes from the PipeWriter 281 | Memory<byte> memory = writer.GetMemory(minimumBufferSize); 282 | try 283 | { 284 | int bytesRead = await socket.ReceiveAsync(memory, SocketFlags.None); 285 | if (bytesRead == 0) 286 | { 287 | break; 288 | } 289 | // Tell the PipeWriter how much was read from the Socket 290 | writer.Advance(bytesRead); 291 | } 292 | catch (Exception ex) 293 | { 294 | LogError(ex); 295 | break; 296 | } 297 | 298 | // Make the data available to the PipeReader 299 | FlushResult result = await writer.FlushAsync(); 300 | 301 | if (result.IsCompleted) 302 | { 303 | break; 304 | } 305 | } 306 | 307 | // Tell the PipeReader that there's no more data coming 308 | writer.Complete(); 309 | } 310 | 311 | async Task ReadPipeAsync(PipeReader reader) 312 | { 313 | while (true) 314 | { 315 | ReadResult result = await reader.ReadAsync(); 316 | 317 | ReadOnlySequence<byte> buffer = result.Buffer; 318 | SequencePosition? 
position = null; 319 | 320 | do 321 | { 322 | // Look for an EOL in the buffer 323 | position = buffer.PositionOf((byte)'\n'); 324 | 325 | if (position != null) 326 | { 327 | // Process the line 328 | ProcessLine(buffer.Slice(0, position.Value)); 329 | 330 | // Skip past the line and the \n character 331 | buffer = buffer.Slice(buffer.GetPosition(1, position.Value)); 332 | } 333 | } 334 | while (position != null); 335 | 336 | // Tell the PipeReader how much of the buffer we have consumed 337 | reader.AdvanceTo(buffer.Start, buffer.End); 338 | 339 | // Stop reading if there's no more data coming 340 | if (result.IsCompleted) 341 | { 342 | break; 343 | } 344 | } 345 | 346 | // Mark the PipeReader as complete 347 | reader.Complete(); 348 | } 349 | ``` 350 | 351 | The pipelines version of our line reader has two loops: 352 | - `FillPipeAsync` reads from the `Socket` and writes into the `PipeWriter`. 353 | - `ReadPipeAsync` reads from the `PipeReader` and parses incoming lines. 354 | 355 | Unlike the original examples, there are no explicit buffers allocated anywhere. This is one of pipelines' core features. All buffer management is delegated to the `PipeReader`/`PipeWriter` implementations. 356 | 357 | **This makes it easier for consuming code to focus solely on the business logic instead of complex buffer management.** 358 | 359 | In the first loop, we first call `PipeWriter.GetMemory(int)` to get some memory from the underlying writer; then we call `PipeWriter.Advance(int)` to tell the `PipeWriter` how much data we actually wrote to the buffer. We then call `PipeWriter.FlushAsync()` to make the data available to the `PipeReader`. 360 | 361 | In the second loop, we're consuming the buffers written by the `PipeWriter`, which ultimately come from the `Socket`. 
When the call to `PipeReader.ReadAsync()` returns, we get a `ReadResult`, which contains two important pieces of information: the data that was read, in the form of a `ReadOnlySequence<byte>`, and a bool, `IsCompleted`, that lets the reader know whether the writer is done writing (EOF). After finding the end-of-line (EOL) delimiter and parsing the line, we slice the buffer to skip what we've already processed, and then we call `PipeReader.AdvanceTo` to tell the `PipeReader` how much data we have consumed. 362 | 363 | At the end of each of the loops, we complete both the reader and the writer. This lets the underlying `Pipe` release all of the memory it allocated. 364 | 365 | ## System.IO.Pipelines 366 | 367 | ### Partial Reads 368 | 369 | Besides handling the memory management, the other core pipelines feature is the ability to peek at data in the `Pipe` without actually consuming it. 370 | 371 | `PipeReader` has two core APIs, `ReadAsync` and `AdvanceTo`. `ReadAsync` gets the data in the `Pipe`; `AdvanceTo` tells the `PipeReader` that these buffers are no longer required by the reader, so they can be discarded (for example, returned to the underlying buffer pool). 372 | 373 | Here's an example of an HTTP parser that reads partial buffers of data in the `Pipe` until a valid start line is received. 374 | 375 | ![image](https://user-images.githubusercontent.com/95136/42349904-1a6e3484-8063-11e8-8ac2-7f8e636b4a23.png) 376 | 377 | ### ReadOnlySequence\<T\> 378 | 379 | The `Pipe` implementation stores a linked list of buffers that get passed between the `PipeWriter` and `PipeReader`. `PipeReader.ReadAsync` exposes a `ReadOnlySequence<T>`, a new BCL type that represents a view over one or more segments of `ReadOnlyMemory<T>`, similar to how `Span<T>` and `Memory<T>` provide a view over arrays and strings. 
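To make the type concrete, here's a small standalone sketch of slicing a `ReadOnlySequence<byte>` with `PositionOf`, `Slice`, and `GetPosition`, the same calls the `ReadPipeAsync` loop above relies on. For simplicity the sequence here is built over a single array; a `Pipe` would typically hand back a multi-segment sequence:

```C#
using System;
using System.Buffers;
using System.Text;

class SequenceExample
{
    static void Main()
    {
        // A single-segment sequence over a plain array
        var bytes = Encoding.ASCII.GetBytes("hello\nworld\n");
        var buffer = new ReadOnlySequence<byte>(bytes);

        // Find the delimiter without copying any data
        SequencePosition? position = buffer.PositionOf((byte)'\n');

        if (position != null)
        {
            // Slice up to (but not including) the '\n'
            ReadOnlySequence<byte> line = buffer.Slice(0, position.Value);
            Console.WriteLine(Encoding.ASCII.GetString(line.ToArray())); // hello

            // GetPosition(1, position) points just past the delimiter
            buffer = buffer.Slice(buffer.GetPosition(1, position.Value));
            Console.WriteLine(buffer.Length); // 6, the bytes of "world\n"
        }
    }
}
```

The same code works unchanged when the sequence spans multiple segments, which is what makes `SequencePosition`-based slicing useful for parsing data straight out of a `Pipe`.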
380 | 381 | ![image](https://user-images.githubusercontent.com/95136/42292592-74a4028e-7f88-11e8-85f7-a6b2f925769d.png) 382 | 383 | The `Pipe` internally maintains pointers to where the reader and writer are in the overall set of allocated data and updates them as data is written or read. The `SequencePosition` represents a single point in the linked list of buffers and can be used to efficiently slice the `ReadOnlySequence<T>`. 384 | 385 | Since the `ReadOnlySequence<T>` can support one or more segments, it's typical for high-performance processing logic to split fast and slow paths based on single or multiple segments. 386 | 387 | For example, here's a routine that converts an ASCII `ReadOnlySequence<byte>` into a `string`: 388 | 389 | ```C# 390 | string GetAsciiString(ReadOnlySequence<byte> buffer) 391 | { 392 | if (buffer.IsSingleSegment) 393 | { 394 | return Encoding.ASCII.GetString(buffer.First.Span); 395 | } 396 | 397 | return string.Create((int)buffer.Length, buffer, (span, sequence) => 398 | { 399 | foreach (var segment in sequence) 400 | { 401 | Encoding.ASCII.GetChars(segment.Span, span); 402 | 403 | span = span.Slice(segment.Length); 404 | } 405 | }); 406 | } 407 | ``` 408 | 409 | ### Back pressure and flow control 410 | 411 | In a perfect world, reading and parsing work as a team: the reading thread consumes the data from the network and puts it in buffers, while the parsing thread constructs the appropriate data structures. Normally, parsing takes more time than just copying blocks of data from the network, so the reading thread can easily overwhelm the parsing thread. The reading thread will then have to either slow down or allocate more memory to store the data for the parsing thread. For optimal performance, there is a balance between frequent pauses and allocating more memory. 
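In pipelines this balance doesn't have to be hand-rolled: the pause and resume thresholds are passed in when the `Pipe` is constructed. A minimal sketch (the byte counts here are arbitrary illustrations, not recommended values):

```C#
using System.IO.Pipelines;

class PipeOptionsExample
{
    static void Main()
    {
        // Pause the writer once 64 KiB is buffered, and un-pause it
        // only after the reader has drained the backlog below 32 KiB
        var pipe = new Pipe(new PipeOptions(
            pauseWriterThreshold: 64 * 1024,
            resumeWriterThreshold: 32 * 1024));

        // pipe.Writer and pipe.Reader are then used exactly as in
        // FillPipeAsync and ReadPipeAsync above
    }
}
```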
412 | 413 | To solve this problem, the pipe has two settings to control the flow of data: the `PauseWriterThreshold` and the `ResumeWriterThreshold`. The `PauseWriterThreshold` determines how much data can be buffered before calls to `PipeWriter.FlushAsync` pause. The `ResumeWriterThreshold` controls how much the reader has to consume before writing can resume. 414 | 415 | ![image](https://user-images.githubusercontent.com/95136/42291183-0114a0f2-7f7f-11e8-983f-5332b7585a09.png) 416 | 417 | `PipeWriter.FlushAsync` "blocks" when the amount of data in the `Pipe` crosses `PauseWriterThreshold` and "unblocks" when it drops below `ResumeWriterThreshold`. Two values are used to prevent thrashing around the limit. 418 | 419 | ### Scheduling IO 420 | 421 | Usually when using async/await, continuations are called either on thread pool threads or on the current `SynchronizationContext`. 422 | 423 | When doing IO, it's very important to have fine-grained control over where that IO is performed so that one can take advantage of CPU caches more effectively, which is critical for high-performance applications like web servers. Pipelines exposes a `PipeScheduler` that determines where asynchronous callbacks run. This gives the caller fine-grained control over exactly which threads are used for IO. 424 | 425 | An example of this in practice is the Kestrel Libuv transport, where IO callbacks run on dedicated event loop threads. 426 | 427 | ### Other benefits of the `PipeReader` pattern: 428 | - Some underlying systems support a "bufferless wait"; that is, a buffer never needs to be allocated until there's actually data available in the underlying system. For example, on Linux with epoll, it's possible to wait until data is ready before actually supplying a buffer to do the read. This means that having a large number of connections waiting for data doesn't immediately require reserving a huge amount of memory. 
429 | - The default `Pipe` makes it easy to write unit tests against networking code, because the parsing logic is separated from the networking code, so unit tests run the parsing logic against in-memory buffers rather than consuming directly from the network. It also makes it easy to test those hard-to-test patterns where partial data is sent. ASP.NET Core uses this to test various aspects of Kestrel's HTTP parser. 430 | - Systems that allow exposing the underlying OS buffers (like the Registered IO APIs on Windows) to user code are a natural fit for pipelines, since buffers are always provided by the `PipeReader` implementation. 431 | 432 | ### Other Related types 433 | 434 | As part of making System.IO.Pipelines, we also added a number of new primitive BCL types: 435 | - [MemoryPool\<T\>](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.memorypool-1?view=netcore-2.1), [IMemoryOwner\<T\>](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.imemoryowner-1?view=netcore-2.1), [MemoryManager\<T\>](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.memorymanager-1?view=netcore-2.1) - .NET Core 1.0 added [ArrayPool\<T\>](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.arraypool-1?view=netcore-2.1), and in .NET Core 2.1 we now have a more general abstraction for a pool that works over any `Memory<T>`. This provides an extensibility point that lets you plug in more advanced allocation strategies as well as control how buffers are managed (e.g. to provide pre-pinned buffers instead of purely managed arrays). 436 | - [IBufferWriter\<T\>](https://docs.microsoft.com/en-us/dotnet/api/system.buffers.ibufferwriter-1?view=netcore-2.1) - Represents a sink for writing synchronous buffered data. 
(`PipeWriter` implements this) 437 | - [IValueTaskSource\<T\>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.sources.ivaluetasksource-1?view=netcore-2.1) - [ValueTask\<T\>](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.valuetask-1?view=netcore-2.1) has existed since .NET Core 1.1 but has gained some superpowers in .NET Core 2.1 to allow allocation-free awaitable async operations. See https://github.com/dotnet/corefx/issues/27445 for more details. 438 | 439 | ## How do I use Pipelines? 440 | 441 | The APIs exist in the [System.IO.Pipelines](https://www.nuget.org/packages/System.IO.Pipelines/) NuGet package. 442 | 443 | Here's an example of a .NET Core 2.1 server application that uses pipelines to handle line-based messages (our example above): https://github.com/davidfowl/TcpEcho. It should run with `dotnet run` (or by running it in Visual Studio). It listens to a socket on port 8087 and writes out received messages to the console. You can use a client like netcat or PuTTY to make a connection to 8087 and send line-based messages to see it working. 444 | 445 | Today Pipelines powers Kestrel and SignalR, and we hope to see it at the center of many networking libraries and components from the .NET community. 446 | --------------------------------------------------------------------------------