├── .gitignore
└── README.md

/.gitignore:
--------------------------------------------------------------------------------
.git-backup
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
How to C in 2016
================

Reuse: [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)

* Original document: [How to C in 2016] -- [Matt Stancliff]
* Corrections from: [Keith S. Thompson]
* Merged by: [Bryan Elliott]

_This is a draft Matt wrote in early 2015 and never got around to publishing. He
published the mostly unpolished version because it wasn't doing anybody any good
sitting in his drafts folder. The simplest change was updating year 2015 to 2016
at publication time._

_[Keith Thompson][Keith S. Thompson] provided a nice set of
corrections and alternative opinions at
[howto-c-response]._

_[Bryan Elliott] merged Keith's comments into Matt's text, using the response's
git repository, and [Dom Christie]'s [to-markdown] converter to get text to
start with, followed by some post-processing with [pandoc]._

*Feel free to submit fixes/improvements/complaints as necessary.
-[Matt]*

*[Adrián Arroyo Calle] provides a Spanish
translation at
[¿Cómo programar en C (en 2016)?]*

*Now on to the article...*

Caveat
------

This page of generic overview details isn't about cross-architecture
intricacies of C, but best practices with respect to the language. External
knowledge and experience are expected of you if you're to fully use any examples
provided. It almost goes without saying: you should know your target platform
if you intend to develop for it.

Writing C
---------

> The first rule of C is don't write C if you can avoid it.

- [Jan-Erik Rediger]

C provides a lot of power, speed, flexibility, and lifetime to code; however, C
also gives you, as the 1995 spiritual predecessor to this document eponymously
informs you, [Enough Rope to Shoot Yourself In the Foot]. Whether or not you
use C is a broad topic, largely out of scope for this document. Whatever your
opinion on the topic, if you must write your software using C, you should
follow modern rules - whenever "modern" happens to be.

C has been around since the [early 1970s]. People have "learned C" at various
points during its evolution, but knowledge usually gets stuck after learning, so
everybody has a different set of things they believe about C based on the
year(s) they first started learning.

It's important to not remain stuck in your "things I learned in the 80s/90s"
mindset of C development.

This page assumes you are on a modern platform conforming to modern standards
and you have no excessive legacy compatibility requirements. We shouldn't be
globally tied to ancient standards just because some companies refuse to
upgrade 20-year-old systems.

Preflight
---------

Standard c99 (c99 means "C Standard from 1999"; c11 means "C Standard from
2011", so 11 > 99).

* clang, default
    * clang uses an extended version of C11 by default (`GNU C11 mode`), so no
      extra options are needed for modern features. _If you want standard
      C11, you need to specify `-std=c11`; if you want standard C99, use
      `-std=c99`._ clang typically compiles your source files faster than gcc.
* gcc requires you to specify `-std=c99` or `-std=c11`
    * gcc builds source files slower than clang, but *sometimes* generates
      faster code. Performance comparisons and regression testing are
      important.
    * gcc-5 defaults to `GNU C11 mode` (same as clang).
    * clang and gcc support the [gcc-specific extensions] this mode enables;
      however, many other compilers do not. If you need wide portability (that
      is, _exactly_ C11 or C99), you should still specify `-std=c11` or
      `-std=c99`.

Optimizations

* `-O2`, `-O3`
    * generally you want `-O2`, but sometimes you want `-O3`. Test under both
      levels (and across compilers), then keep the best-performing binaries.
* `-Os`
    * `-Os` (optimize for size) helps if your concern is cache efficiency
      (which it should be)
* For more information on optimization, see [GCC's optimize options]

Warnings

* `-Wall -Wextra -pedantic`
    * [newer compiler versions] have `-Wpedantic`, but they still accept the
      ancient `-pedantic` as well for wider backwards compatibility.
* during testing you should add `-Werror` and `-Wshadow` on all your platforms
    * it can be tricky deploying production source using `-Werror` because
      different platforms and compilers and libraries can emit different
      warnings. You probably don't want to kill a user's entire build just
      because their version of GCC on a platform you've never seen complains
      in new and wondrous ways.
* extra fancy options include `-Wstrict-overflow -fno-strict-aliasing`
    * Either specify `-fno-strict-aliasing` or be sure to only access objects
      as the type they have at creation. Since so much existing C code
      aliases across types, using `-fno-strict-aliasing` is a much safer bet
      if you don't control the entire underlying source tree.
* as of this writing, Clang reports some valid syntax as a warning, so you
  should add `-Wno-missing-field-initializers`
    * GCC fixed this unnecessary warning after GCC 4.7.0

Building

* Compilation units
    * The most common way of building C projects is to decompose every source
      file into an object file, then link all the objects together at the end.
      This procedure works great for incremental development, but it is
      suboptimal for performance and optimization. Your compiler can't detect
      potential optimization across file boundaries this way.
* LTO - Link Time Optimization
    * LTO fixes the "source analysis and optimization across compilation units
      problem" by annotating object files with intermediate representation so
      source-aware optimizations can be carried out across compilation units
      at link time (this slows down the linking process noticeably, but
      `make -j` helps).
    * [clang LTO] ([guide])
    * [gcc LTO]
    * As of 2016, clang and gcc releases support LTO by just adding `-flto` to
      your command line options during object compilation and final
      library/program linking.
    * LTO still needs some babysitting, though. Sometimes, if your program has
      code not used directly but used by additional libraries, LTO can evict
      functions or code because it detects, globally when linking, that some
      code is unused/unreachable and doesn't *need* to be included in the final
      linked result.

Arch

* `-march=native`
    * gives the compiler permission to use your CPU's full feature set
    * again, performance testing and regression testing (then comparing the
      results across multiple compilers and/or compiler versions) are
      important to make sure any enabled optimizations don't have adverse
      side effects.
* `-msse2` and `-msse4.2` may be useful if you need to target
  not-your-build-machine features.

Writing code
------------

### Types

If you find yourself typing `char` or `short` or `long` or `unsigned`
into new code, you should question the purpose of the variable.

`int` is going to be the most "natural" integer type for the current platform,
which may or may not be what you want. If you want signed integers that are
reasonably fast and are at least 16 bits, there's nothing wrong with using
`int`. `<stdint.h>` also offers `int_least16_t` (the smallest signed type of at
least 16 bits) and `int_fast16_t` (the fastest one). Both are more verbose than
they need to be, and neither is necessarily the same type as `int`.

To ensure consistency in non-storage data types for modern programs, you should
`#include <stdint.h>` and then use its _standard_ types.

The common standard types are:

* `int8_t`, `int16_t`, `int32_t`, `int64_t` - signed integers
* `uint8_t`, `uint16_t`, `uint32_t`, `uint64_t` - unsigned integers
* `float` - 32-bit minimum floating point
* `double` - 64-bit minimum floating point

Developers routinely abuse `char` to mean "byte" even when they are doing
unsigned byte manipulations. A `char` is guaranteed to occupy exactly one byte.
However, a byte is only guaranteed to be _at least_ 8 bits; it is not guaranteed
to be _exactly_ 8 bits. That said, POSIX requires that `CHAR_BIT == 8`.

If you want bytes, use `unsigned char`. If you want octets, use `uint8_t`. If
`CHAR_BIT > 8`, `uint8_t` won't exist, and your code won't compile (which is
probably what you want).

Use plain `char` when a pre-existing API requires it (e.g. `strncat`, printf'ing
`"%s"`, ...) or when you're initializing a read-only string (e.g.
`const char *hello = "hello";`). The C type of string literals (`"hello"`) is
`char []`.

In C11 we have native Unicode support, and the type of UTF-8 string literals is
still an array of `char`, even for multibyte sequences like
`const char *abcgrr = u8"abc😬";`. Keep in mind, though, that `strlen` still
reports the number of _bytes_ in a `char[]`, not the number of codepoints. If
you need strong UTF-8 support for things like parsing and text processing, it's
recommended to use [libutf8].

#### Signedness

At no point should you be typing the word `unsigned` into your code. We can now
write code without the ugly C convention of multi-word types that impair
readability as well as usage. Who wants to type `unsigned long long int` when
you can type `uint64_t`? The `<stdint.h>` types are more *explicit*, more
*exact* in meaning, convey *intentions* better, and are more *compact* for
typographic *usage* and *readability*.

But, you may say, "I need to cast pointers to `long` for dirty pointer math!"

You may say that. But you are wrong.

The correct types for pointer math are `uintptr_t`, defined in `<stdint.h>`,
and the also useful `ptrdiff_t`, defined in [stddef.h].

Instead of:

    long diff = (long)ptrOld - (long)ptrNew;

Use one of the following. If both pointers point into the same array, plain
pointer subtraction already yields a `ptrdiff_t`, counted in elements:

    ptrdiff_t diff = ptrOld - ptrNew;

If you genuinely need integer arithmetic on raw addresses, use:

    uintptr_t diff = (uintptr_t)ptrOld - (uintptr_t)ptrNew;

Also:

    printf("%p is unaligned by %" PRIuPTR " bytes.\n", (void *)somePtr,
           ((uintptr_t)somePtr & (sizeof(void *) - 1)));

#### System-Dependent Types

You continue arguing, "on a 32 bit platform I want 32 bit longs and on a 64 bit
platform I want 64 bit longs!"

If we skip over the line of thinking where you are *deliberately* introducing
difficult-to-reason-about code by using two different sizes depending on
platform, you still don't want to use `long` for system-dependent types.

In these situations, you should use `intptr_t`, defined in `<stdint.h>`. It is
a signed integer type wide enough to hold a converted `void *`, which on common
platforms matches the native word size.

On 32-bit platforms, `intptr_t` is normally `int32_t`.

On 64-bit platforms, `intptr_t` is normally `int64_t`.

`intptr_t` also comes in a `uintptr_t` flavor. It's possible that an
implementation that cannot convert `void *` to an integer type without loss of
information will not define `uintptr_t` (or `intptr_t`). However, such
implementations are rare, perhaps nonexistent.

For holding pointer offsets, we have the aptly named `ptrdiff_t`, which is the
proper type for storing values of subtracted pointers.

#### Maximum Value Holders

Do you need an integer type capable of holding any integer usable on
your system?

People tend to use the largest known type in this case, such as casting smaller
unsigned types to `uint64_t`. There is, however, a more technically correct way
to guarantee that one type can hold any other integer value.

The safest container for any integer is `intmax_t` (also `uintmax_t`). You can
assign or cast any signed integer to `intmax_t` with no loss of precision, and
you can assign or cast any unsigned integer to `uintmax_t` with no loss of
precision.

#### That Other Type

The most widely used system-dependent type is `size_t`, provided
by [stddef.h].

`size_t` is defined as "the unsigned integral type of the result of the
sizeof operator", which also means it's capable of holding the largest memory
offset within an object.

In practical use, `size_t` is the type of the value the `sizeof` operator
produces.

In either case: `size_t` is *practically* the same width as `uintptr_t` on all
mainstream modern platforms. On a 32-bit platform, `size_t` is normally
`uint32_t`; on a 64-bit platform, it is normally `uint64_t`.

On modern desktops, `size_t` can represent any offset within your program.
This is not always true, however, on:

* legacy systems (e.g., older x86 systems that exposed addressing with "near"
  and "far" pointers)
* possible future systems that violate the assumption of pointer consistency
* some embedded systems

On POSIX, there is also `ssize_t`, which is a signed `size_t` used as the return
type of library functions that return `-1` on error. On Windows systems, most
POSIX-style functions that would return `ssize_t` return `int` instead. You
should do something like this in your program's common types header:

    /* _POSIX_VERSION is defined by <unistd.h> on POSIX systems */
    #ifndef _POSIX_VERSION
    # ifdef _WIN32
    typedef int ssize_t;
    # endif
    #endif

So, should you use `size_t` for arbitrary system-dependent sizes in your own
function parameters? Technically, `size_t` is the type of the result of
`sizeof`, so any function parameter representing a number of bytes can
reasonably be a `size_t`.

Other uses include: `size_t` is the type of the argument to `malloc`, and
`ssize_t` is the return type of `read()` and `write()` (except on Windows, where
`ssize_t` doesn't exist and the return values are just `int`).

#### Printing Types

You should avoid casting types during printing, opting instead to use proper
type specifiers.

These include, but are not limited to:

* `size_t` - `%zu`
* `ssize_t` - `%zd`
* `ptrdiff_t` - `%td`
* raw pointer value - `%p` (modern C libraries typically print it in hex;
  cast your pointer to `(void *)` first)
* 64-bit types should be printed using `PRIu64` (unsigned) and `PRId64` (signed)
    * on some platforms a 64-bit value is a `long` and on others it's a
      `long long`
    * it is actually impossible to specify a correct cross-platform format
      string without these format macros because the types change out from
      under you (and remember, casting values before printing is not safe or
      logical).
* `intptr_t` - `"%" PRIdPTR`
* `uintptr_t` - `"%" PRIuPTR`
* `intmax_t` - `"%" PRIdMAX`
* `uintmax_t` - `"%" PRIuMAX`

One note about the `PRI*` formatting specifiers: they are *macros*, defined in
`<inttypes.h>`, and they expand to the proper printf type specifiers on a
platform-specific basis. This means you can't do:

    printf("Local number: %PRIdPTR\n\n", someIntPtr);

but instead, because they are macros, you do:

    printf("Local number: %" PRIdPTR "\n\n", someIntPtr);

Notice you put the `%` *inside* the format string literal within your code, but
the type specifier is *outside* your format string literal. This works because
the macro expands to a string literal. Adjacent string literals are then
concatenated into one combined literal during translation, after macro
expansion.

### C99 allows variable declarations anywhere

So, do NOT do this:

    void test(uint8_t input) {
        uint32_t b;

        if (input > 3) {
            return;
        }

        b = input;
    }

Do THIS instead:

    void test(uint8_t input) {
        if (input > 3) {
            return;
        }

        uint32_t b = input;
    }

Caveat: if you have tight loops, test the placement of your initializers.
Sometimes scattered declarations can cause unexpected slowdowns. For regular
non-fast-path code (which is most of everything in the world), it's best to be
as clear as possible, and defining types next to your initializations is a big
readability improvement.

### C99 allows `for` loops to declare counters inline

So, do NOT do this:

    uint32_t i;

    for (i = 0; i < 10; i++)

Do THIS instead:

    for (uint32_t i = 0; i < 10; i++)

One exception: if you need to retain your counter value after the loop exits,
obviously don't declare your counter scoped to the loop itself.

### Most modern compilers support `#pragma once`

`#pragma once` tells the compiler to only include your header once, so you _do
not_ need three lines of header guards anymore. The pragma is not part of the C
standard, but it is widely supported by compilers across all major platforms.
One notable exception is Oracle's Solaris Studio C/C++.

Symlinks and hardlinks can cause the same file to be found under different
names, which can confuse `#pragma once`. Moreover, a compiler's `#pragma once`
handling may incorrectly treat two _different_ files as the same file if it
uses heuristics to compare linked files (such as size or modification time).
In cases like this, you'll want to define your own guard symbols using
`#ifndef`/`#define`.

Otherwise, instead of this:

    #ifndef PROJECT_HEADERNAME
    #define PROJECT_HEADERNAME
    .
    .
    .
    #endif /* PROJECT_HEADERNAME */

you have the option of doing this instead:

    #pragma once

which is, in our opinion, much cleaner code.

For more details, see the list of supported compilers at [pragma once].

### C allows static initialization of auto-allocated arrays

So, do NOT do this:

    uint32_t numbers[64];
    memset(numbers, 0, sizeof(numbers));

Do THIS instead:

    uint32_t numbers[64] = {0};

### C allows static initialization of auto-allocated structs

So, do NOT do this:

    struct thing {
        uint64_t index;
        uint32_t counter;
    };

    struct thing localThing;

    void initThing(void) {
        memset(&localThing, 0, sizeof(localThing));
    }

Do THIS instead:

    struct thing {
        uint64_t index;
        uint32_t counter;
    };

    struct thing localThing = {0};

**NOTE**: While there's normally no reason to care about padding bytes, in the
event you do, it's important to know that the `{0}` method is not guaranteed to
zero them out. For example, on a typical 64-bit platform, `struct thing` will
have 4 bytes of padding after `counter`. The struct's size is rounded up to a
multiple of its alignment, which is 8 bytes because of the `uint64_t` member.
If you need to zero out an entire struct *including* unused padding, use
`memset(&localThing, 0, sizeof(localThing))`. In that case
`sizeof(localThing) == 16`, even though the addressable contents are only
`8 + 4 = 12` bytes.

If you need to re-initialize already allocated structs, declare a global
zero-struct for later assignment:

    struct thing {
        uint64_t index;
        uint32_t counter;
    };

    static const struct thing localThingNull = {0};
    .
    .
    .
    struct thing localThing = {.counter = 3};
    .
    .
    .
    localThing = localThingNull;

If you are lucky enough to be in a C99 (or newer) environment, you can use
compound literals instead of keeping a global "zero struct" around (also see,
from 2001, [The New C: Compound Literals]).

Compound literals allow you to directly assign from anonymous structs:

    localThing = (struct thing){0};

### C99 added variable length arrays (C11 made them optional)

So, do NOT do this:

    uintmax_t arrayLength = strtoumax(argv[1], NULL, 10);
    void **array = malloc(sizeof(*array) * arrayLength);

    /* remember to free(array) when you're done using it */

Do THIS instead:

    uintmax_t arrayLength = strtoumax(argv[1], NULL, 10);
    void *array[arrayLength];

    /* no need to free array */

**IMPORTANT CAVEAT:** variable length arrays are (usually) stack allocated just
like regular arrays. If you wouldn't create a 3 million element regular array
statically, don't attempt to create a 3 million element array at runtime using
this syntax. These are not scalable Python/Ruby auto-growing lists. If you
specify a runtime array length and the length is too big for your stack, your
program will do awful things (crashes, security issues). Variable length arrays
are convenient for small, single-purpose situations, but should not be relied
on at scale in production software. If sometimes you need a 3 element array and
other times a 3 million element array, definitely do not use the variable
length array capability.

It's good to be aware of the VLA syntax in case you encounter it live (or want
it for quick one-off testing), but it can almost be considered a
[dangerous anti-pattern]. You can crash your programs fairly simply by
forgetting element size bounds checks, or by forgetting you are on a strange
target platform with no free stack space.

NOTE: You must be certain `arrayLength` is a reasonable size in this situation
(i.e. less than a few KB; sometimes your stack will max out at 4 KB on weird
platforms).
You can't stack allocate *huge* arrays (millions of entries), but
if you know you have a limited count, it's much easier to use [C99 VLA]
capabilities rather than manually requesting heap memory from malloc.

DOUBLE NOTE: there is no user input checking above, so the user can easily kill
your program by allocating a giant VLA. [Some people] go as far as to call VLAs
an anti-pattern, but if you keep your bounds tight, it can be a tiny win in
certain situations.

### C99 allows annotating non-overlapping pointer parameters

See the [restrict keyword] (compilers often also accept the spelling
`__restrict`).

### Parameter Types

If a function accepts **arbitrary** input data and a length to process, don't
restrict the type of the parameter.

So, do NOT do this:

    void processAddBytesOverflow(uint8_t *bytes, size_t len) {
        for (size_t i = 0; i < len; i++) {
            bytes[0] += bytes[i];
        }
    }

Do THIS instead:

    void processAddBytesOverflow(void *input, size_t len) {
        uint8_t *bytes = (uint8_t *)input;

        for (size_t i = 0; i < len; i++) {
            bytes[0] += bytes[i];
        }
    }

The input types to your functions describe the *interface* to your code, not
what your code is doing with the parameters. The interface to the code above
means "accept a byte array and a length", so you don't want to restrict your
callers to only `uint8_t` byte streams. Maybe your users even want to pass in
old-style `char *` values or something else unexpected.

By declaring your input type as `void *`, then re-assigning or re-casting to the
actual type you want inside your function, you save the users of your function
from having to think about abstractions *inside* your own library.

Some readers have pointed out alignment problems with analogues to this example.
Accessing a sequence of bytes, as we do here, is always safe; accessing wider
types might not be. For a different write-up dealing with cross-platform
alignment issues, see [Unaligned Memory Access].

### Return Parameter Types

C99 gives us the power of `<stdbool.h>`, which defines `true` as `1` and `false`
as `0`.

A widespread convention within POSIX systems is for a return value >= 0 to
indicate success, and < 0 to indicate one of a number of failure codes. `0` is
often used for success, since typically there's only one way for a function to
succeed, but multiple paths to failure. It's important to follow this
convention when adding new functions to such an interface.

If you do this, and you don't need to report a positive value on success, you
may want to define an enum that gives some description of the return value.
This helps readability both upstream and downstream:

    enum My_Status_Code {
        error_io = -2,
        error_sz = -1,
        ok = 0
    };

    /* ... */

    /* response holds an enum My_Status_Code value */
    switch (response) {
        case error_io:
            // report IO error
            break;
        case error_sz:
            // report size error
            break;
        case ok:
            // it worked.
            break;
    }

If your function should either succeed or fail and there's no detail necessary
in how it does so, you should return `true` or `false`.

A function may mutate an input parameter to the extent that the parameter is
invalidated. In that case, instead of returning the altered pointer, your entire
API should force double pointers as parameters anywhere an input can be
invalidated. Coding with "for some calls, the return value invalidates the
input" is too error-prone for mass usage.

So, do NOT do this:

    void *growthOptional(void *grow, size_t currentLen, size_t newLen) {
        if (newLen > currentLen) {
            void *newGrow = realloc(grow, newLen);
            if (newGrow) {
                /* resize success */
                grow = newGrow;
            } else {
                /* resize failed, free existing and signal failure through NULL */
                free(grow);
                grow = NULL;
            }
        }

        return grow;
    }

Do THIS instead:

    /* Return value:
     *  - 'true' if newLen > currentLen and attempted to grow
     *    - 'true' does not signify success here, the success is still in '*_grow'
     *  - 'false' if newLen <= currentLen */
    bool growthOptional(void **_grow, size_t currentLen, size_t newLen) {
        void *grow = *_grow;
        if (newLen > currentLen) {
            void *newGrow = realloc(grow, newLen);
            if (newGrow) {
                /* resize success */
                *_grow = newGrow;
                return true;
            }

            /* resize failure */
            free(grow);
            *_grow = NULL;

            /* for this function,
             * 'true' doesn't mean success, it means 'attempted grow' */
            return true;
        }

        return false;
    }

Or, even better, do THIS instead:

    typedef enum growthResult {
        GROWTH_RESULT_SUCCESS = 1,
        GROWTH_RESULT_FAILURE_GROW_NOT_NECESSARY,
        GROWTH_RESULT_FAILURE_ALLOCATION_FAILED
    } growthResult;

    growthResult growthOptional(void **_grow, size_t currentLen, size_t newLen) {
        void *grow = *_grow;
        if (newLen > currentLen) {
            void *newGrow = realloc(grow, newLen);
            if (newGrow) {
                /* resize success */
                *_grow = newGrow;
                return GROWTH_RESULT_SUCCESS;
            }

            /* resize failure, don't remove data because we can signal error */
            return GROWTH_RESULT_FAILURE_ALLOCATION_FAILED;
        }

        return GROWTH_RESULT_FAILURE_GROW_NOT_NECESSARY;
    }

### Formatting

Coding
style is simultaneously very important and utterly worthless.

If your project has a 50-page coding style guideline, nobody will help you.
But if your code isn't readable, nobody will *want* to help you.

The solution here is to **always** use an automated code formatter.

The only usable C formatter as of 2016 is [clang-format]. clang-format has the
best defaults of any automatic C formatter and is still actively developed.

Here's my preferred script to run clang-format with good parameters:

    #!/usr/bin/env bash

    clang-format -style="{BasedOnStyle: llvm, IndentWidth: 4, AllowShortFunctionsOnASingleLine: None, KeepEmptyLinesAtTheStartOfBlocks: false}" "$@"

Then call it as (assuming you named the script `cleanup-format`):

    matt@foo:~/repos/badcode% cleanup-format -i *.{c,h,cc,cpp,hpp,cxx}

The `-i` option overwrites existing files in place with formatting changes,
instead of printing the formatted result to standard output. No backup files
are created.

If you have many files, you can recursively process an entire source tree
in parallel:

    #!/usr/bin/env bash

    # note: our cleanup-tidy script only takes one file at a time, but we can
    # run it in parallel against disjoint collections at once.
    find . \( -name \*.c -or -name \*.cpp -or -name \*.cc \) | xargs -n1 -P4 cleanup-tidy

    # clang-format accepts multiple files during one run, but let's limit it to 12
    # here so we (hopefully) avoid excessive memory usage.
    find . \( -name \*.c -or -name \*.cpp -or -name \*.cc -or -name \*.h \) | xargs -n12 -P4 cleanup-format -i

Now, there's a new `cleanup-tidy` script there. The contents of `cleanup-tidy` are:

    #!/usr/bin/env bash

    clang-tidy \
        -fix \
        -fix-errors \
        -header-filter='.*' \
        --checks=readability-braces-around-statements,misc-macro-parentheses \
        "$1" \
        -- -I.

[clang-tidy] is a policy-driven code refactoring tool. The options above enable
two fixups:

* `readability-braces-around-statements` - force all `if`/`while`/`for`
  statement bodies to be enclosed in braces
    * It's an accident of history for C to have "brace optional" single
      statements after loop constructs and conditionals. It is *inexcusable*
      to write modern code without braces enforced on every loop and every
      conditional. Trying to argue "but, the compiler accepts it!" has
      *nothing* to do with the readability, maintainability,
      understandability, or skimmability of code. You aren't programming to
      please your compiler, you are programming to please future people who
      have to maintain your current brain state years after everybody has
      forgotten why anything exists in the first place.
* `misc-macro-parentheses` - automatically add parens around all parameters
  used in macro bodies

`clang-tidy` is great when it works, but for some complex code bases it can get
stuck. Also, `clang-tidy` doesn't *format*, so you need to run `clang-format`
after you tidy to align new braces and reflow macros.

Remember, however, that there is an important, overriding rule to code
formatting in any situation:

* **Follow the conventions of the project you're working on.**

### Readability

*the writing seems to start slowing down here...*

#### Comments

logical self-contained portions of code file

#### File Structure

Try to limit files to a max of 1,000 lines (1,500 lines in really bad cases).
If your tests are in-line with your source file (for testing static functions,
etc.), adjust as necessary.

### misc thoughts

#### Allocation

You should usually use `calloc`.
For most allocations, there is no performance penalty for getting zero'd memory.

That said, `calloc` *does* have a performance impact for **huge** allocations,
and on some embedded targets, legacy hardware, etc., but in no case is it
slower than a `malloc`/`memset` pair.

Additionally, zeroing memory often means that buggy code (yes, your code is
buggy. So's mine.) will have consistent behavior; but, by definition, it will
not have correct behavior. Consistently incorrect behavior can be more
difficult to track down. If you're trying to program defensively, you might
consider initializing allocated memory to some value that's known to be
*in*valid rather than one that might be valid.

If you don't like the function prototype of
`calloc(object count, size per object)`, you can wrap it with
`#define mycalloc(N) calloc(1, N)`, though this may not always be the best
thing to do.

One advantage of calling `calloc()` directly, without a wrapper, is that,
unlike `malloc()`, `calloc()` can check for integer overflow, because it
receives both factors and multiplies them itself to obtain your final
allocation size. If you are only allocating tiny things, wrapping `calloc()`
is fine. If you are allocating potentially unbounded streams of data, you may
want to retain the regular `calloc(element count, size of each element)`
calling convention.

However, `calloc` allocations remove valgrind's ability to warn you about
unintentional reads or copies of uninitialized memory, since allocations get
initialized to `0` automatically.

No advice can be universal, but trying to give *exactly perfect* generic
recommendations (especially with regards to memory allocation) would end up
reading like a book of language specifications.
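The following is a minimal sketch that makes the overflow argument concrete. It compares the `mycalloc(N)` wrapper with a direct call to `calloc()`. The count is chosen so that `count * size` wraps around in `size_t` arithmetic. The sketch assumes a C library whose `calloc()` rejects an overflowing product (glibc and musl both do). `huge_count`, `alloc_via_wrapper()`, `alloc_direct()`, and `malloc_array()` are hypothetical names for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* The single-argument wrapper described above. The caller computes
 * the product, so any overflow has already happened before calloc()
 * ever sees the value. */
#define mycalloc(N) calloc(1, N)

/* (SIZE_MAX / 2 + 2) * 2 exceeds SIZE_MAX and wraps around to 2. */
static const size_t huge_count = SIZE_MAX / 2 + 2;

/* Via the wrapper: calloc() is asked for just 2 bytes, which will
 * almost certainly succeed. */
void *alloc_via_wrapper(void)
{
    return mycalloc(huge_count * 2);
}

/* Via calloc() directly: calloc() sees both operands and (assuming a
 * library that checks, such as glibc or musl) returns NULL. */
void *alloc_direct(void)
{
    return calloc(huge_count, 2);
}

/* The same check written by hand, for code that must use malloc(). */
void *malloc_array(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size) {
        return NULL; /* count * size would overflow size_t */
    }
    return malloc(count * size);
}
```

The wrapper case is the dangerous one. It returns a valid pointer to a 2-byte buffer that the caller believes is enormous, so the first large write overruns the heap. `malloc_array()` shows the check you would have to write yourself to get the same protection from `malloc()`.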

For references on how `calloc()` gives you clean memory for free, see these
nice writeups:

* [Benchmarking fun with calloc() and zero pages (2007)]
* [Copy-on-write in virtual memory management]

All that said, we maintain that the best practice is to always use `calloc()`
for most common scenarios of 2016.

Side Note: The pre-zero'd memory delivered to you by `calloc()` is a one-shot
deal. If you `realloc()` your `calloc()` allocation, the memory added by
`realloc()` is *not* zero'd. The grown portion holds whatever indeterminate
contents the allocator hands back. If you need zero'd memory after a
`realloc()`, you must manually `memset()` the grown portion of your
allocation.

#### Avoid memset

Never `memset(ptr, 0, len)` when you can statically initialize a structure (or
array) to zero (or reset it back to zero by assigning from an in-line compound
literal or by assigning from a global zero'd out structure; see above).

However, `memset()` is your only choice if you need to zero out a struct
including its padding bytes. `{0}` is not guaranteed to zero the padding
bytes between or after the named members.

#### Comments

Comments are useful for documenting the functionality of your code, but they
also serve another important function.

This document describes what we consider to be best practices for C code;
however, there are _always_ exceptions. When you need to deviate from
standards, it's important (for other developers, and for the "you" of next
week/month/year) to put a comment in your code explaining why this was
required.

Generally speaking, comments should not be used to hide code from the compiler,
at least not within a source repository.
Old code is already preserved in the source repository, so commented-out code
only serves as a distraction. It's better to just delete commented-out code
before committing.

Learn More
----------

Also see [Fixed width integer types (since C99)]

Also see Apple's [Making Code 64-Bit Clean]

Also see the [sizes of C types across architectures]. Unless you keep that
entire table in your head for every line of code you write, you should use
explicitly defined integer widths and never use char/short/int/long built-in
storage types.

Also see [size\_t and ptrdiff\_t]

Also see [Secure Coding]. If you really want to write everything perfectly,
simply memorize their thousand simple examples.

Also see [Modern C] by Jens Gustedt at Inria.

### Closing

Writing correct code at scale is essentially impossible. We have multiple
operating systems, runtimes, libraries, and hardware platforms to worry about
without even considering things like random bit flips in RAM or our block
devices lying to us with unknown probability.

The best we can do is write simple, understandable code with as few
indirections and as little undocumented magic as possible.

-[Matt] — [@mattsta] — [☁mattsta]

### Attributions

This made the Twitter and HN rounds, so many people helpfully pointed out flaws
or biased thoughts I'm promulgating here.

First up, Jeremy Faller and [Sos Sosowski] and Martin Heistermann and a few
other people were kind enough to point out my `memset()` example was broken and
provided the proper fix.

Martin Heistermann also pointed out the `localThing = localThingNull` example
was broken.

The opening quote about not writing C if you can avoid it is from the wise
internet sage [@badboy\_].

[Remi Gacogne] pointed out I forgot `-Wextra`.

[Levi Pearson] pointed out gcc-5 defaults to gnu11 instead of gnu89, and also
clarified the default clang mode.

[Christopher] pointed out the `-O2` vs `-O3` section could use a little more
clarification.

[Chad Miller] pointed out I was being lazy in the clang-format script params.

[Many] people also pointed out the `calloc()` advice isn't *always* a good idea
if you have extreme circumstances or non-standard hardware (examples of bad
ideas: huge allocations, allocations on embedded jiggers, allocations on
30-year-old hardware, etc.).

Charles Randolph pointed out I misspelled the word "Building."

Sven Neuhaus pointed out kindly I also do not posess the ability to spell
"initialization" or "initializers." (and also pointed out I misspelled
"initialization" wrong the first time here as well)

[Colm MacCárthaigh] pointed out I forgot to mention `#pragma once`.

[Jeffrey Yasskin] pointed out we should kill strict aliasing too (mainly a gcc
optimization).

Jeffrey Yasskin also provided better wording around the
`-fno-strict-aliasing` section.

[Chris Palmer] and a few others pointed out calloc-vs-malloc parameter
advantages and the overall drawback of writing a wrapper for `calloc()`, because
`calloc()` provides a more secure interface than `malloc()` in the first place.

Damien Sorresso pointed out we should remind people `realloc()` doesn't zero
out grown memory after an initial zero'd `calloc()` request.

Pat Pogson pointed out I was unable to spell the word "declare" correctly
as well.

[@TopShibe] pointed out the stack-allocated initialization example was wrong
because the examples I gave were global variables.
Updated wording to just mean "auto-allocated" things, be it stack or data
sections.

[Jonathan Grynspan][dangerous anti-pattern] suggested harsher wording around
the VLA example because VLAs **are** dangerous when used incorrectly.

David O'Mahony kindly pointed out I can't spell "specify" either.

Dr. David Alan Gilbert pointed out `ssize_t` is a POSIXism and Windows either
doesn't have it or defines `ssize_t` as an *unsigned* quantity, which obviously
introduces all kinds of fun behavior when your type is signed on POSIX
platforms and unsigned on Windows.

Chris Ridd suggested we explicitly mention C99 is C from 1999 and C11 is C from
2011 because otherwise it looks strange having 11 be newer than 99.

Chris Ridd also noticed the `clang-format` example used unclear naming
conventions and suggested better consistency across examples.

[Anthony Le Goff] pointed us to a book-length treatment of many modern C ideas
called [Modern C].

Stuart Popejoy pointed out my inaccurate spelling of deliberately was truly
inaccurate.

jack rosen pointed out my usage of the word 'exists' does not mean 'exits' as I
intended.

Jo Booth pointed out I like to spell compatibility as compatability, which
seems more logical, but English commonality disagrees.

Stephen Anderson decoded my aberrant spelling of 'stil' back into 'still.'

Richard Weinberger pointed out struct initialization with `{0}` doesn't zero
out padding bytes, so sending a `{0}` struct over the wire can leak unintended
bytes on under-specified structs.

[@JayBhukhanwala] pointed out the function comment in [Return Parameter Types]
was inaccurate because I didn't update the comment when the code changed (story
of our lives, right?).

Lorenzo pointed out we should explicitly provide a warning concerning potential
cross-platform alignment issues in section [Parameter Types].

[Paolo G. Giarrusso] re-clarified the alignment warning I previously added to
be more correct regarding the examples given.

Fabian Klötzl provided the valid struct compound literal assignment example,
since it's perfectly valid syntax I just hadn't run across before.

Omkar Ekbote provided a very thorough walkthrough of typos and consistency
problems here, including that I couldn't spell "platform," "actually,"
"defining," "experience," "simultaneously," and "readability," and also noted
some other unclear wordings.

Carlo Bellettini fixed my aberrant spelling of the word aberrant.

[Keith S. Thompson] provided many technical corrections in his great article
[how-to-c-response].

Many people on Reddit went apeshit because this article originally had
`#import` somewhere by mistake. Sorry, crazy people, but this started out as an
unedited and unreviewed year-old draft when originally pushed live. The error
has since been remedied.

Some people also pointed out the static initialization example uses globals
which are always initialized to zero by default anyway (and that they aren't
even initialized, they are statically allocated). This is a poor choice of
example on my part, but the concepts still stand for typical usage within
function scopes. The examples were meant to be any generic "code snippet" and
not necessarily top-level globals.

A few people seem to have read this as an "I hate C" page, but it isn't.
C is dangerous in the wrong hands (not enough testing, not enough experience
when widely deployed). So, paradoxically, the only two kinds of people who
should be writing C are novice hobbyists (code failure causes no problems; it's
just a toy) and people who are willing to test their asses off before shipping
to production (code failure causes life or financial loss; it's not just a
toy). There's not much room for "casual observer C development." For the rest
of the world, that's why we have Erlang.

Many people have also mentioned their own pet issues or issues beyond the
scope of this article (including new C11-only features, like [George
Makrydakis] reminding us about C11 generic abilities).

Perhaps another article about "Practical C" will show up to cover testing,
profiling, performance tracing, optional-but-useful warning levels, etc.

[How to C in 2016]: https://matt.sh/howto-c
[Matt Stancliff]: http://matt.sh
[Keith S. Thompson]: https://github.com/Keith-S-Thompson
[Bryan Elliott]: https://github.com/Fordi
[howto-c-response]: https://github.com/Keith-S-Thompson/how-to-c-response/blob/master/README.md
[Dom Christie]: https://github.com/domchristie
[to-markdown]: https://github.com/domchristie/to-markdown
[pandoc]: http://pandoc.org
[Matt]: mailto:matt@matt.sh
[Adrián Arroyo Calle]: https://github.com/AdrianArroyoCalle
[¿Cómo programar en C (en 2016)?]: https://adrianarroyocalle.github.io/blog/2016/01/10/como-programar-en-c-en-2016/
[Jan-Erik Rediger]: http://fnordig.de/
[Enough Rope to Shoot Yourself In the Foot]: http://www.goodreads.com/book/show/103892.Enough_Rope_to_Shoot_Yourself_in_the_Foot
[early 1970s]: https://www.bell-labs.com/usr/dmr/www/chist.html
[gcc-specific extensions]: https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
[GCC's optimize options]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
[libutf8]: http://www.haible.de/bruno/packages-libutf8.html
[newer compiler versions]: https://twitter.com/oliviergay/status/685389448142565376
[clang LTO]: http://llvm.org/docs/LinkTimeOptimization.html
[guide]: http://llvm.org/docs/GoldPlugin.html
[gcc LTO]: https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html
[stddef.h]: http://pubs.opengroup.org/onlinepubs/7908799/xsh/stddef.h.html
[pragma once]: https://en.wikipedia.org/wiki/Pragma_once
[The New C: Compound Literals]: http://www.drdobbs.com/the-new-c-compound-literals/184401404
[dangerous anti-pattern]: https://twitter.com/grynspan/status/685509158024691712
[C99 VLA]: https://en.wikipedia.org/wiki/Variable-length_array
[Some people]: https://twitter.com/comex/status/685423016981966848
[restrict keyword]: https://en.wikipedia.org/wiki/Restrict
[Unaligned Memory Access]: https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt
[clang-format]: http://clang.llvm.org/docs/ClangFormat.html
[clang-tidy]: http://clang.llvm.org/extra/clang-tidy/
[Benchmarking fun with calloc() and zero pages (2007)]: https://blogs.fau.de/hager/archives/825
[Copy-on-write in virtual memory management]: https://en.wikipedia.org/wiki/Copy-on-write#Copy-on-write_in_virtual_memory_management
[Fixed width integer types (since C99)]: http://en.cppreference.com/w/c/types/integer
[Making Code 64-Bit Clean]: https://developer.apple.com/library/mac/documentation/Darwin/Conceptual/64bitPorting/MakingCode64-BitClean/MakingCode64-BitClean.html
[sizes of C types across architectures]: https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=4374
[size\_t and ptrdiff\_t]: http://www.viva64.com/en/a/0050/
[Secure Coding]: https://www.securecoding.cert.org/confluence/display/c/SEI+CERT+C+Coding+Standard
[Modern C]: http://icube-icps.unistra.fr/img_auth.php/d/db/ModernC.pdf
[@mattsta]: https://twitter.com/mattsta
[☁mattsta]: https://github.com/mattsta
[Sos Sosowski]: https://twitter.com/Sosowski/status/685431663501926400
[Remi Gacogne]: https://twitter.com/rgacogne/status/685390620723154944
[Levi Pearson]: https://twitter.com/pineal_servo/status/685393454487056384
[Christopher]: https://twitter.com/shrydar/status/685375992114757632
[Chad Miller]: https://twitter.com/chadmiller/status/685469896914919424
[Many]: https://twitter.com/lordcyphar/status/685444198481412096
[Colm MacCárthaigh]: https://twitter.com/colmmacc/status/685493166988906497
[Jeffrey Yasskin]: https://twitter.com/jyasskin/status/685493531515826176
[Chris Palmer]: https://twitter.com/fugueish/status/685503534230458369
[@TopShibe]: https://twitter.com/TopShibe/status/685505183762223105
[Anthony Le Goff]: https://twitter.com/Ideo_logiq/status/685384708188930048
[@JayBhukhanwala]: https://twitter.com/JayBhukhanwala
[Return Parameter Types]: #_return-parameter-types
[Parameter Types]: #_parameter-types
[Paolo G. Giarrusso]: https://twitter.com/Blaisorblade/status/686042231917178881
[how-to-c-response]: https://github.com/Keith-S-Thompson/how-to-c-response
[George Makrydakis]: https://twitter.com/irrequietus/status/685407732464226306