├── .gitignore
└── README.md
/.gitignore:
--------------------------------------------------------------------------------
1 | .git-backup
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | How to C in 2016
2 | ================
3 |
4 | Reuse: [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)
5 |
6 | * Original document: [How to C in 2016] -- [Matt Stancliff]
7 | * Corrections from: [Keith S. Thompson]
8 | * Merged by: [Bryan Elliott]
9 |
10 | _This is a draft Matt wrote in early 2015 and never got around to publishing. He
11 | published the mostly unpolished version because it wasn't doing anybody any good
12 | sitting in his drafts folder. The simplest change was updating year 2015 to 2016
13 | at publication time._
14 |
15 | _[Keith Thompson][Keith S. Thompson] provided a nice set of
16 | corrections and alternative opinions at
17 | [howto-c-response]._
18 |
19 | _[Bryan Elliott] merged Keith's comments into Matt's text, using the response's
20 | git repository, and [Dom Christie]'s [to-markdown] converter to get text to
21 | start with, followed by some post-processing with [pandoc]._
22 |
23 | *Feel free to submit fixes/improvements/complaints as necessary.
24 | -[Matt]*
25 |
26 | *[Adrián Arroyo Calle] provides a Spanish
27 | translation at
28 | [¿Cómo programar en C (en 2016)?]*
29 |
30 | *Now on to the article...*
31 |
32 | Caveat
33 | ------
34 |
35 | This page of generic overview details isn't about cross-architecture
36 | intricacies of C, but best-practices with respect to the language. External
37 | knowledge and experience is expected of you if you're to fully use any examples
38 | provided. It almost goes without saying: You should know your target platform
39 | if you intend to develop for it.
40 |
41 | Writing C
42 | ---------
43 |
44 | > The first rule of C is don't write C if you can avoid it.
45 |
46 | - [Jan-Erik Rediger]
47 |
48 | C provides a lot of power, speed, flexibility, and longevity to code. However, C
49 | also gives you, as the title of this document's 1995 spiritual predecessor puts
50 | it, [Enough Rope to Shoot Yourself In the Foot]. Whether or not you
51 | use C is a broad topic, largely out of scope for this document. Whatever your
52 | opinion on the topic, if you must write your software using C, you should
53 | follow modern rules - whenever "modern" happens to be.
54 |
55 | C has been around since the [early 1970s]. People have "learned C" at various
56 | points during its evolution, but knowledge usually gets stuck after learning, so
57 | everybody has a different set of things they believe about C based on the
58 | year(s) they first started learning.
59 |
60 | It's important to not remain stuck in your "things I learned in the 80s/90s"
61 | mindset of C development.
62 |
63 | This page assumes you are on a modern platform conforming to modern standards
64 | and you have no excessive legacy compatibility requirements. We shouldn't be
65 | globally tied to ancient standards just because some companies refuse to
66 | upgrade 20 year old systems.
67 |
68 | Preflight
69 | ---------
70 |
71 | Standard c99 (c99 means "C Standard from 1999"; c11 means "C Standard from
72 | 2011", so 11 > 99).
73 |
74 | * clang, default
75 | * clang uses an extended version of C11 by default (`GNU C11 mode`), so no
76 | extra options are needed for modern features. _If you want standard
77 | C11, you need to specify `-std=c11`; if you want standard C99, use
78 | `-std=c99`._ clang compiles your source files faster than gcc.
79 | * gcc requires you specify `-std=c99` or `-std=c11`
80 | * gcc builds source files slower than clang, but *sometimes* generates
81 | faster code. Performance comparisons and regression testing are
82 | important.
83 | * gcc-5 defaults to `GNU C11 mode` (same as clang).
84 | * clang and gcc support the [gcc-specific extensions] this enables,
85 | however, many compilers do not. If you need wide portability - that is,
86 | _exactly_ c11 or c99 - you should still specify `-std=c11`
87 | or `-std=c99`.
88 |
89 | Optimizations
90 |
91 | * -O2, -O3
92 | * generally you want `-O2`, but sometimes you want `-O3`. Test under both
93 | levels (and across compilers) then keep the best performing binaries.
94 | * -Os
95 | * `-Os` (optimize for size) helps if your concern is cache efficiency
96 | (which it should be)
97 | * For more information on optimization, see [GCC's optimize options]
98 |
99 | Warnings
100 |
101 | * `-Wall -Wextra -pedantic`
102 | * [newer compiler versions] have `-Wpedantic`, but they still accept the
103 | ancient `-pedantic` as well for wider backwards compatibility.
104 | * during testing you should add `-Werror` and `-Wshadow` on all your platforms
105 | * it can be tricky deploying production source using `-Werror` because
106 | different platforms and compilers and libraries can emit different
107 | warnings. You probably don't want to kill a user's entire build just
108 | because their version of GCC on a platform you've never seen complains
109 | in new and wondrous ways.
110 | * extra fancy options include `-Wstrict-overflow -fno-strict-aliasing`
111 | * Either specify `-fno-strict-aliasing` or be sure to only access objects
112 | as the type they have at creation. Since so much existing C code
113 | aliases across types, using `-fno-strict-aliasing` is a much safer bet
114 | if you don't control the entire underlying source tree.
115 | * as of now, Clang reports some valid syntax as a warning, so you should add
116 | `-Wno-missing-field-initializers`
117 | * GCC fixed this unnecessary warning after GCC 4.7.0
118 |
119 | Building
120 |
121 | * Compilation units
122 | * The most common way of building C projects is to decompose every source
123 | file into an object file then link all the objects together at the end.
124 | This procedure works great for incremental development, but it is
125 | suboptimal for performance and optimization. Your compiler can't detect
126 | potential optimization across file boundaries this way.
127 | * LTO — Link Time Optimization
128 | * LTO fixes the "source analysis and optimization across compilation units
129 | problem" by annotating object files with intermediate representation so
130 | source-aware optimizations can be carried out across compilation units
131 | at link time (this slows down the linking process noticeably, but `make
132 | -j` helps).
133 | * [clang LTO] ([guide])
134 | * [gcc LTO]
135 | * As of 2016, clang and
136 | gcc releases support LTO by just adding `-flto` to your command line
137 | options during object compilation and final library/program linking.
138 | * `LTO` still needs some babysitting though. Sometimes, if your program has
139 | code not used directly but used by additional libraries, LTO can evict
140 | functions or code because it detects, globally when linking, some code
141 | is unused/unreachable and doesn't *need* to be included in the final
142 | linked result.
143 |
144 | Arch
145 |
146 | * `-march=native`
147 | * give the compiler permission to use your CPU's full feature set
148 | * again, performance testing and regression testing (then comparing
149 | the results across multiple compilers and/or compiler versions) are
150 | important to make sure any enabled optimizations don't
151 | have adverse side effects.
152 | * `-msse2` and `-msse4.2` may be useful if you need to target
153 | not-your-build-machine features.
154 |
155 | Writing code
156 | ------------
157 |
158 | ### Types
159 |
160 | If you find yourself typing `char` or `short` or `long` or `unsigned`
161 | into new code, you should question the purpose of the variable.
162 |
163 | `int` is going to be the most "natural" integer type for the current platform -
164 | which may or may not be what you want. If you want signed integers that are
165 | reasonably fast and are at least 16 bits, there's nothing wrong with using
166 | `int`. `<stdint.h>`'s `int_least16_t` is usually the same type - they have the
167 | same requirements, at least - but is more verbose than it needs to be.
168 |
169 | To ensure consistency in non-storage data types for modern programs, you should
170 | `#include <stdint.h>` then use _standard_ types.
171 |
172 | The common standard types are:
173 |
174 | * `int8_t`, `int16_t`, `int32_t`, `int64_t` — signed integers
175 | * `uint8_t`, `uint16_t`, `uint32_t`, `uint64_t` — unsigned integers
176 | * `float` - 32-bit minimum floating point
177 | * `double` - 64-bit minimum floating point
178 |
179 | Developers routinely abuse `char` to mean "byte" even when they are doing
180 | unsigned byte manipulations - however, while `char` is guaranteed to mean a
181 | byte, a byte is only guaranteed to be _at least_ 8 bits; it is not guaranteed to be
182 | _only_ 8 bits. That said, POSIX requires that CHAR_BIT == 8.
183 |
184 | If you want bytes, use unsigned char. If you want octets, use uint8_t. If
185 | CHAR_BIT > 8, uint8_t won't exist, and your code won't compile (which is
186 | probably what you want).
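
As a small illustration of the octet advice above, here is a sketch that refuses to build when bytes are not 8 bits wide and then works purely in `uint8_t`. The function name `xor_checksum` is made up for this example and is not part of any library.

```c
#include <assert.h> /* static_assert (C11) */
#include <limits.h> /* CHAR_BIT */
#include <stddef.h> /* size_t */
#include <stdint.h> /* uint8_t */

/* Fail the build early, with a readable message, on exotic platforms. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Hypothetical helper: XOR every octet of `data` together. */
uint8_t xor_checksum(const uint8_t *data, size_t len) {
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc ^= data[i];
    }
    return acc;
}
```

On a platform where `uint8_t` does not exist, the `#include`d type is simply missing and the function fails to compile, which is the behavior the paragraph above asks for.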
187 |
188 | Using `char` is fine if a pre-existing API requires it (e.g. `strncat`, printf'ing "%s", ...) or
189 | if you're initializing a read-only string (e.g. `const char *hello = "hello";`),
190 | because the C type of string literals (`"hello"`) is `char []`.
191 |
192 | In C11 we have native unicode support, and the type of UTF-8 string literals is
193 | still `char []` even for multibyte sequences like `const char *abcgrr =
194 | u8"abc😬";` - but keep in mind, strlen still reports the number of _bytes_ in a
195 | char[], not the number of codepoints. If you need strong UTF-8 support for
196 | things like parsing and text processing, it's recommended to use [libutf8].
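
To make the bytes-versus-codepoints distinction concrete, here is a minimal sketch that counts code points by skipping UTF-8 continuation bytes. It assumes the input is already valid UTF-8 and does no validation; `utf8_codepoint_count` is a hypothetical name, not a libutf8 function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count code points in a NUL-terminated string that is
 * assumed to be valid UTF-8. Continuation bytes have the bit pattern
 * 10xxxxxx; every other byte starts a new code point. */
size_t utf8_codepoint_count(const char *s) {
    assert(s != NULL);

    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((uint8_t)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For `"abc😬"` (the emoji is the four bytes `F0 9F 98 AC`), `strlen` reports 7 while this helper reports 4.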
197 |
198 | #### Signedness
199 |
200 | At no point should you be typing the word `unsigned` into your code. We can now
201 | write code without the ugly C convention of multi-word types that impair
202 | readability as well as usage. Who wants to type `unsigned long long int` when
203 | you can type `uint64_t`? The `<stdint.h>` types are more *explicit*, more
204 | *exact* in meaning, convey *intentions* better, and are more *compact* for
205 | typographic *usage* and *readability*.
206 |
207 | But, you may say, "I need to cast pointers to `long` for dirty pointer math!"
208 |
209 | You may say that. But you are wrong.
210 |
211 | The correct type for pointer math is `uintptr_t` defined in `<stdint.h>`, while
212 | the also useful `ptrdiff_t` is defined in [stddef.h].
213 |
214 | Instead of:
215 |
216 | long diff = (long)ptrOld - (long)ptrNew;
217 |
218 | Use:
219 |
220 | ptrdiff_t diff = (uintptr_t)ptrOld - (uintptr_t)ptrNew;
221 |
222 | Also:
223 |
224 | printf("%p is unaligned by %" PRIuPTR " bytes.\n", (void *)somePtr, ((uintptr_t)somePtr & (sizeof(void *) - 1)));
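
The same ideas can be wrapped in small helpers. This is a sketch under two assumptions: the integer value of a pointer reflects its address (true on mainstream flat-memory platforms, but implementation-defined in general), and `alignment` is a power of two. Both function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h> /* size_t, ptrdiff_t */
#include <stdint.h> /* uintptr_t, uint32_t */

/* Hypothetical helper: how many bytes `p` sits past the previous multiple of
 * `alignment`. `alignment` must be a non-zero power of two. */
size_t misalignment(const void *p, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size_t)((uintptr_t)p & (uintptr_t)(alignment - 1));
}

/* Hypothetical helper: number of elements between two pointers into the
 * same array. Plain pointer subtraction already yields a ptrdiff_t. */
ptrdiff_t element_distance(const uint32_t *from, const uint32_t *to) {
    return to - from;
}
```

When both pointers point into the same array, subtracting them directly (as in `element_distance`) avoids the integer casts entirely.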
225 |
226 | #### System-Dependent Types
227 |
228 | You continue arguing, "on a 32 bit platform I want 32 bit longs and on a 64 bit
229 | platform I want 64 bit longs!"
230 |
231 | If we skip over the line of thinking where you are *deliberately* introducing
232 | difficult to reason about code by using two different sizes depending on
233 | platform, you still don't want to use `long` for system-dependent types.
234 |
235 | In these situations, you should use `intptr_t`, defined in `<stdint.h>` - a
236 | signed integer type wide enough to hold a pointer on your current platform.
237 |
238 | On 32-bit platforms, `intptr_t` is normally `int32_t`.
239 |
240 | On 64-bit platforms, `intptr_t` is normally `int64_t`.
241 |
242 | `intptr_t` also comes in a `uintptr_t` flavor. It's possible that an
243 | implementation that cannot convert `void*` to an integer type without loss of
244 | information will not define `uintptr_t` - however, such implementations are rare,
245 | perhaps nonexistent.
246 |
247 | For holding pointer offsets, we have the aptly named `ptrdiff_t` which is the
248 | proper type for storing values of subtracted pointers.
249 |
250 | #### Maximum Value Holders
251 |
252 | Do you need an integer type capable of holding any integer usable on
253 | your system?
254 |
255 | People tend to use the largest known type in this case, such as casting smaller
256 | unsigned types to `uint64_t`, but there's a more technically correct way to
257 | guarantee any value can hold any other value.
258 |
259 | The safest container for any integer is `intmax_t` (also `uintmax_t`). You can
260 | assign or cast any signed integer to `intmax_t` with no loss of precision, and
261 | you can assign or cast any unsigned integer to `uintmax_t` with no loss of
262 | precision.
263 |
264 | #### That Other Type
265 |
266 | The most widely used system-dependent type is `size_t` and is provided
267 | by [stddef.h].
268 |
269 | `size_t` is basically defined as "the unsigned integral type of the result of the
270 | sizeof operator" which also means it's capable of holding the largest memory
271 | offset within an object.
272 |
273 | In practical use, `size_t` is the return type of the `sizeof` operator.
274 |
275 | In either case: `size_t` is *practically* defined to be the same as `uintptr_t`
276 | on all modern platforms, so on a 32-bit platform `size_t` is normally
277 | `uint32_t` and on a 64-bit platform `size_t` is normally `uint64_t`.
278 |
279 | On modern desktops, `size_t` can represent any offset within your program -
280 | however, in legacy systems (e.g., older x86 systems that exposed addressing with
281 | "near" and "far" pointers), possible future systems that violate the assumption
282 | of pointer consistency, as well as some embedded systems, this is not always
283 | true.
284 |
285 | On POSIX, there is also `ssize_t` which is a signed `size_t` used as the return
286 | value from library functions that return `-1` on error. On Windows systems,
287 | most POSIX functions that return `ssize_t` return `int` instead; you should do
288 | something like this in your program's common types header:
289 |
290 |     #ifndef _POSIX_VERSION
291 |     # ifdef _WIN32
292 |     typedef int ssize_t;
293 |     # endif
294 |     #endif
295 |
296 | So, should you use `size_t` for arbitrary system-dependent sizes in your own
297 | function parameters? Technically, `size_t` is the return type of `sizeof`, so
298 | any function parameter that represents a size in bytes can reasonably
299 | be a `size_t`.
300 |
301 | Other uses include: `size_t` is the type of the argument to malloc, and
302 | `ssize_t` is the return type of `read()` and `write()` (except on Windows where
303 | `ssize_t` doesn't exist and the return values are just `int`).
304 |
305 | #### Printing Types
306 |
307 | You should avoid casting types during printing, opting instead to use proper
308 | type specifiers.
309 |
310 | These include, but are not limited to:
311 |
312 | * `size_t` - `%zu`
313 | * `ssize_t` - `%zd`
314 | * `ptrdiff_t` - `%td`
315 | * raw pointer value - `%p` (prints hex in modern compilers; cast your pointer
316 | to `(void *)` first)
317 | * 64-bit types should be printed using `PRIu64` (unsigned) and `PRId64` (signed)
318 | * on some platforms a 64-bit value is a `long` and on others it's a
319 | `long long`
320 | * it is actually impossible to specify a correct cross-platform format
321 | string without these format macros because the types change out from
322 | under you (and remember, casting values before printing is not safe or
323 | logical).
324 | * `intptr_t` — `"%" PRIdPTR`
325 | * `uintptr_t` — `"%" PRIuPTR`
326 | * `intmax_t` — `"%" PRIdMAX`
327 | * `uintmax_t` — `"%" PRIuMAX`
328 |
329 | One note about the `PRI*` formatting specifiers: they are *macros* and the
330 | macros expand to proper printf type specifiers on a platform-specific basis.
331 | This means you can't do:
332 |
333 | printf("Local number: %PRIdPTR\n\n", someIntPtr);
334 |
335 | but instead, because they are macros, you do:
336 |
337 | printf("Local number: %" PRIdPTR "\n\n", someIntPtr);
338 |
339 | Notice you put the `%` *inside* the format string literal within your code, but
340 | the type specifier is *outside* your format string literal. This is because
341 | all adjacent string literals get concatenated by the compiler into one final
342 | combined string literal.
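
Putting several of the specifiers above together, here is a sketch that formats a `uint64_t`, a `size_t`, and a `ptrdiff_t` with no casts in the format arguments. `format_stats` and its parameter names are made up for this example.

```c
#include <assert.h>
#include <inttypes.h> /* PRIu64; also pulls in <stdint.h> */
#include <stddef.h>   /* size_t, ptrdiff_t */
#include <stdio.h>    /* snprintf */

/* Hypothetical helper: write "id=... count=... delta=..." into `out`.
 * Returns what snprintf returns: the length the full text needs, not
 * counting the terminating NUL. */
int format_stats(char *out, size_t outLen, uint64_t id, size_t count,
                 ptrdiff_t delta) {
    assert(out != NULL && outLen > 0);
    return snprintf(out, outLen, "id=%" PRIu64 " count=%zu delta=%td", id,
                    count, delta);
}
```

Because each argument already has the type its specifier expects, the same source compiles to a correct format string whether `uint64_t` is a `long` or a `long long` underneath.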
343 |
344 | ### C99 allows variable declarations anywhere
345 |
346 | So, do NOT do this:
347 |
348 | void test(uint8_t input) {
349 | uint32_t b;
350 |
351 | if (input > 3) {
352 | return;
353 | }
354 |
355 | b = input;
356 | }
357 |
358 | do THIS instead:
359 |
360 | void test(uint8_t input) {
361 | if (input > 3) {
362 | return;
363 | }
364 |
365 | uint32_t b = input;
366 | }
367 |
368 | Caveat: if you have tight loops, test the placement of your initializers.
369 | Sometimes scattered declarations can cause unexpected slowdowns. For regular
370 | non-fast-path code (which is most of everything in the world), it's best to be
371 | as clear as possible, and defining types next to your initializations is a big
372 | readability improvement.
373 |
374 | ### C99 allows `for` loops to declare counters inline
375 |
376 | So, do NOT do this:
377 |
378 | uint32_t i;
379 |
380 | for (i = 0; i < 10; i++)
381 |
382 | Do THIS instead:
383 |
384 | for (uint32_t i = 0; i < 10; i++)
385 |
386 | One exception: if you need to retain your counter value after the loop exits,
387 | obviously don't declare your counter scoped to the loop itself.
388 |
389 | ### Most modern compilers support `#pragma once`
390 |
391 | `#pragma once` tells the compiler to only include your header once and you _do
392 | not_ need three lines of header guards anymore. This pragma is widely supported
393 | across most compilers across all platforms. One notable exception is Oracle's
394 | Solaris Studio C/C++.
395 |
396 | Symlinks and hardlinks can cause the same file to be found under different
397 | names, which can confuse `#pragma once`. Moreover, `#pragma once` may
398 | incorrectly treat two _different_ files as the same file, if the compiler is
399 | using heuristics to compare linked files (such as size or modification time).
400 | In cases like this, you'll want to define your own symbols using ifndef/define.
401 |
402 | Otherwise, instead of this:
403 |
404 | #ifndef PROJECT_HEADERNAME
405 | #define PROJECT_HEADERNAME
406 | .
407 | .
408 | .
409 | #endif /* PROJECT_HEADERNAME */
410 |
411 | You have the option of doing this instead:
412 |
413 | #pragma once
414 |
415 | Which is, in our opinion, much cleaner code.
416 |
417 | For more details, see list of supported compilers at [pragma once].
418 |
419 | ### C allows static initialization of auto-allocated arrays
420 |
421 | So, do NOT do this:
422 |
423 | uint32_t numbers[64];
424 | memset(numbers, 0, sizeof(numbers));
425 |
426 | Do THIS instead:
427 |
428 | uint32_t numbers[64] = {0};
429 |
430 | ### C allows static initialization of auto-allocated structs
431 |
432 | So, do NOT do this:
433 |
434 | struct thing {
435 | uint64_t index;
436 | uint32_t counter;
437 | };
438 |
439 | struct thing localThing;
440 |
441 | void initThing(void) {
442 | memset(&localThing, 0, sizeof(localThing));
443 | }
444 |
445 | Do THIS instead:
446 |
447 | struct thing {
448 | uint64_t index;
449 | uint32_t counter;
450 | };
451 |
452 | struct thing localThing = {0};
453 |
454 | **NOTE**: While there's normally no reason to care about padding bytes, in the
455 | event you do, it's important to know that the `{0}` method is not guaranteed to
456 | zero them out. For example, on a typical 64-bit platform, `struct thing` will have
457 | 4 bytes of padding after `counter` because structs are padded to a multiple of
458 | their strictest member alignment. If you need to zero out an entire struct *including*
459 | unused padding, use `memset(&localThing, 0, sizeof(localThing))` because
460 | `sizeof(localThing) == 16 bytes` even though the addressable contents is only
461 | `8 + 4 = 12 bytes`.
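
If you want to see the padding rather than take it on faith, `offsetof` and `sizeof` can compute it. The sketch below uses the same `struct thing` as above; the helper names are hypothetical, and the 4-byte figure is only what typical 64-bit ABIs produce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h> /* offsetof, size_t */
#include <stdint.h>
#include <string.h> /* memset */

struct thing {
    uint64_t index;
    uint32_t counter;
};

/* Bytes of padding after the last member: 4 on typical 64-bit ABIs, but
 * possibly 0 where uint64_t only needs 4-byte alignment. */
size_t thing_trailing_padding(void) {
    return sizeof(struct thing) -
           (offsetof(struct thing, counter) + sizeof(uint32_t));
}

/* Zero every byte of the struct, padding included. */
void thing_clear_all_bytes(struct thing *t) {
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* True when every byte in [p, p + len) is zero. */
bool all_bytes_zero(const void *p, size_t len) {
    const unsigned char *bytes = p;
    for (size_t i = 0; i < len; i++) {
        if (bytes[i] != 0) {
            return false;
        }
    }
    return true;
}
```

Zeroing the padding matters mostly when the raw bytes leave your program, for example when hashing a struct, comparing with `memcmp`, or writing it to disk.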
462 |
463 | If you need to re-initialize already allocated structs, declare a global
464 | zero-struct for later assignment:
465 |
466 | struct thing {
467 | uint64_t index;
468 | uint32_t counter;
469 | };
470 |
471 | static const struct thing localThingNull = {0};
472 | .
473 | .
474 | .
475 | struct thing localThing = {.counter = 3};
476 | .
477 | .
478 | .
479 | localThing = localThingNull;
480 |
481 | If you are lucky enough to be in a C99 (or newer) environment, you can use
482 | compound literals instead of keeping a global "zero struct" around (also see,
483 | from 2001, [The New C: Compound Literals]).
484 |
485 | Compound literals allow you to directly assign from anonymous structs:
486 |
487 | localThing = (struct thing){0};
488 |
489 | ### C99 added variable length arrays (C11 made them optional)
490 |
491 | So, do NOT do this:
492 |
493 |     uintmax_t arrayLength = strtoumax(argv[1], NULL, 10);
494 |     void **array;
495 |
496 |     array = malloc(sizeof(*array) * arrayLength);
497 |
498 |     /* remember to free(array) when you're done using it */
499 |
500 | Do THIS instead:
501 |
502 | uintmax_t arrayLength = strtoumax(argv[1], NULL, 10);
503 | void *array[arrayLength];
504 |
505 | /* no need to free array */
506 |
507 | **IMPORTANT CAVEAT:** variable length arrays are (usually) stack allocated just
508 | like regular arrays. If you wouldn't create a 3 million element regular array
509 | statically, don't attempt to create a 3 million element array at runtime using
510 | this syntax. These are not scalable python/ruby auto-growing lists. If you
511 | specify a runtime array length and the length is too big for your stack, your
512 | program will do awful things (crashes, security issues). Variable Length Arrays
513 | are convenient for small, single-purpose situations, but should not be relied
514 | on at scale in production software. If sometimes you need a 3 element array and
515 | other times a 3 million element array, definitely do not use the variable
516 | length array capability.
517 |
518 | It's good to be aware of the VLA syntax in case you encounter it live (or want
519 | it for quick one-off testing), but it can almost be considered a [dangerous
520 | anti-pattern] since you can crash your programs fairly simply by forgetting
521 | element size bounds checks or by forgetting you are on a strange target
522 | platform with no free stack space.
523 |
524 | NOTE: You must be certain `arrayLength` is a reasonable size in this situation.
525 | (i.e. less than a few KB; sometimes your stack will max out at 4 KB on weird
526 | platforms). You can't stack allocate *huge* arrays (millions of entries), but
527 | if you know you have a limited count, it's much easier to use [C99 VLA]
528 | capabilities rather than manually requesting heap memory from malloc.
529 |
530 | DOUBLE NOTE: there is no user input checking above, so the user can easily kill
531 | your program by allocating a giant VLA. [Some people] go as far as to call VLAs an
532 | anti-pattern, but if you keep your bounds tight, it can be a tiny win in
533 | certain situations.
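
One way to "keep your bounds tight" is to use a VLA only below a fixed limit and fall back to the heap above it. The sketch below assumes a compiler that supports VLAs (C11 made them optional); the limit `MAX_STACK_ELEMS` and the function names are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Arbitrary, hypothetical bound: 256 * 8 bytes = 2 KB of stack at most. */
#define MAX_STACK_ELEMS 256

/* Fill scratch[i] with i*i and return the sum of all entries. */
static uint64_t fill_and_sum(uint64_t *scratch, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        scratch[i] = (uint64_t)i * i;
        total += scratch[i];
    }
    return total;
}

/* Sum of squares of 0..n-1 using a scratch buffer: a VLA when n is small,
 * heap memory otherwise. Returns false only if allocation fails. */
bool sum_of_squares(size_t n, uint64_t *result) {
    assert(result != NULL);

    if (n == 0) { /* a zero-length VLA is undefined behavior */
        *result = 0;
        return true;
    }

    if (n <= MAX_STACK_ELEMS) {
        uint64_t scratch[n]; /* bounded VLA: at most 2 KB */
        *result = fill_and_sum(scratch, n);
        return true;
    }

    uint64_t *scratch = calloc(n, sizeof(*scratch));
    if (!scratch) {
        return false;
    }
    *result = fill_and_sum(scratch, n);
    free(scratch);
    return true;
}
```

The caller never needs to know which path was taken, and an oversized `n` degrades into an ordinary allocation failure instead of a stack overflow.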
534 |
535 | ### C99 allows annotating non-overlapping pointer parameters
536 |
537 | See the [restrict keyword] (often `__restrict`)
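
As a minimal sketch of what the annotation looks like in practice: `restrict` is a promise from the caller that the arrays do not overlap, which lets the compiler vectorize the loop without reloading values after every store. `add_arrays` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] + b[i]. The caller promises that `dst`
 * does not overlap `a` or `b`; breaking that promise is undefined behavior. */
void add_arrays(size_t n, int32_t *restrict dst, const int32_t *restrict a,
                const int32_t *restrict b) {
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

The compiler does not check the promise, so document it in the function's contract.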
538 |
539 | ### Parameter Types
540 |
541 | If a function accepts **arbitrary** input data and a length to process, don't
542 | restrict the type of the parameter.
543 |
544 | So, do NOT do this:
545 |
546 |     void processAddBytesOverflow(uint8_t *bytes, size_t len) {
547 |         for (size_t i = 0; i < len; i++) {
548 |             bytes[0] += bytes[i];
549 |         }
550 |     }
551 |
552 | Do THIS instead:
553 |
554 |     void processAddBytesOverflow(void *input, size_t len) {
555 |         uint8_t *bytes = (uint8_t*) input;
556 |
557 |         for (size_t i = 0; i < len; i++) {
558 |             bytes[0] += bytes[i];
559 |         }
560 |     }
561 |
562 | The input types to your functions describe the *interface* to your code, not
563 | what your code is doing with the parameters. The interface to the code above
564 | means "accept a byte array and a length", so you don't want to restrict your
565 | callers to only uint8\_t byte streams. Maybe your users even want to pass in
566 | old-style `char *` values or something else unexpected.
567 |
568 | By declaring your input type as `void *` then re-assigning or re-casting to the
569 | actual type you want inside your function, you save the users of your function
570 | from having to think about abstractions *inside* your own library.
571 |
572 | Some readers have pointed out alignment problems with analogues to this example:
573 | while accessing a sequence of bytes, as we do here, is always safe, accessing
574 | wider types might not be; for a different write up dealing with cross-platform
575 | alignment issues, see [Unaligned Memory Access].
576 |
577 | ### Return Parameter Types
578 |
579 | C99 gives us the power of `<stdbool.h>` which defines `true` to `1` and `false`
580 | to `0`.
581 |
582 | A widespread convention within POSIX systems is for return value >=0 for
583 | success, and <0 for one of a number of failure codes. `0` is often used for
584 | success, since typically there's only one way for a function to succeed, but
585 | multiple paths to failure. It's important to follow this convention when
586 | adding new functions to such an interface.
587 |
588 | If you do this, and you don't need to report a positive for success, you may
589 | want to define an enum that gives some description of the return value, for the
590 | sake of readability, both up and downstream:
591 |
592 |     enum My_Status_Code {
593 |         error_io = -2,
594 |         error_sz = -1,
595 |         ok = 0
596 |     };
597 |
598 |     /* ... */
599 |
600 |     switch (response) {
601 |     case error_io:
602 |         // report IO error
603 |         break;
604 |     case error_sz:
605 |         // report size error
606 |         break;
607 |     case ok:
608 |         // it worked.
609 |         break;
610 |     }
611 |
612 | If your function should either succeed or fail and there's no detail necessary
613 | in how it does so, you should return `true` or `false`.
614 |
615 | If a function mutates an input parameter to the extent the parameter is
616 | invalidated, instead of returning the altered pointer, your entire API should
617 | force double pointers as parameters anywhere an input can be invalidated.
618 | Coding with "for some calls, the return value invalidates the input" is too
619 | error prone for mass usage.
620 |
621 | So, do NOT do this:
622 |
623 | void *growthOptional(void *grow, size_t currentLen, size_t newLen) {
624 | if (newLen > currentLen) {
625 | void *newGrow = realloc(grow, newLen);
626 | if (newGrow) {
627 | /* resize success */
628 | grow = newGrow;
629 | } else {
630 | /* resize failed, free existing and signal failure through NULL */
631 | free(grow);
632 | grow = NULL;
633 | }
634 | }
635 |
636 | return grow;
637 | }
638 |
639 | Do THIS instead:
640 |
641 | /* Return value:
642 | * - 'true' if newLen > currentLen and attempted to grow
643 | * - 'true' does not signify success here, the success is still in '*_grow'
644 | * - 'false' if newLen <= currentLen */
645 | bool growthOptional(void **_grow, size_t currentLen, size_t newLen) {
646 | void *grow = *_grow;
647 | if (newLen > currentLen) {
648 | void *newGrow = realloc(grow, newLen);
649 | if (newGrow) {
650 | /* resize success */
651 | *_grow = newGrow;
652 | return true;
653 | }
654 |
655 | /* resize failure */
656 | free(grow);
657 | *_grow = NULL;
658 |
659 | /* for this function,
660 | * 'true' doesn't mean success, it means 'attempted grow' */
661 | return true;
662 | }
663 |
664 | return false;
665 | }
666 |
667 | Or, even better, Do THIS instead:
668 |
669 | typedef enum growthResult {
670 | GROWTH_RESULT_SUCCESS = 1,
671 | GROWTH_RESULT_FAILURE_GROW_NOT_NECESSARY,
672 | GROWTH_RESULT_FAILURE_ALLOCATION_FAILED
673 | } growthResult;
674 |
675 | growthResult growthOptional(void **_grow, size_t currentLen, size_t newLen) {
676 | void *grow = *_grow;
677 | if (newLen > currentLen) {
678 | void *newGrow = realloc(grow, newLen);
679 | if (newGrow) {
680 | /* resize success */
681 | *_grow = newGrow;
682 | return GROWTH_RESULT_SUCCESS;
683 | }
684 |
685 | /* resize failure, don't remove data because we can signal error */
686 | return GROWTH_RESULT_FAILURE_ALLOCATION_FAILED;
687 | }
688 |
689 | return GROWTH_RESULT_FAILURE_GROW_NOT_NECESSARY;
690 | }
691 |
692 | ### Formatting
693 |
694 | Coding style is simultaneously very important and utterly worthless.
695 |
696 | If your project has a 50 page coding style guideline, nobody will help you.
697 | But, if your code isn't readable, nobody will *want* to help you.
698 |
699 | The solution here is to **always** use an automated code formatter.
700 |
701 | The only usable C formatter as of 2016 is [clang-format]. clang-format has the
702 | best defaults of any automatic C formatter and is still actively developed.
703 |
704 | Here's my preferred script to run clang-format with good parameters:
705 |
706 | #!/usr/bin/env bash
707 |
708 | clang-format -style="{BasedOnStyle: llvm, IndentWidth: 4, AllowShortFunctionsOnASingleLine: None, KeepEmptyLinesAtTheStartOfBlocks: false}" "$@"
709 |
710 | Then call it as (assuming you named the script `cleanup-format`):
711 |
712 | matt@foo:~/repos/badcode% cleanup-format -i *.{c,h,cc,cpp,hpp,cxx}
713 |
714 | The `-i` option overwrites existing files in place with formatting changes
715 | instead of writing to new files or creating backup files.
716 |
717 | If you have many files, you can recursively process an entire source tree
718 | in parallel:
719 |
720 | #!/usr/bin/env bash
721 |
722 | # note: clang-tidy only accepts one file at a time, but we can run it
723 | # in parallel against disjoint collections at once.
724 | find . \( -name \*.c -or -name \*.cpp -or -name \*.cc \) |xargs -n1 -P4 cleanup-tidy
725 |
726 | # clang-format accepts multiple files during one run, but let's limit it to 12
727 | # here so we (hopefully) avoid excessive memory usage.
728 | find . \( -name \*.c -or -name \*.cpp -or -name \*.cc -or -name \*.h \) |xargs -n12 -P4 cleanup-format -i
729 |
730 | Now, there's a new cleanup-tidy script there. The contents of `cleanup-tidy` are:
731 |
732 |     #!/usr/bin/env bash
733 |
734 |     clang-tidy \
735 |         -fix \
736 |         -fix-errors \
737 |         -header-filter='.*' \
738 |         --checks=readability-braces-around-statements,misc-macro-parentheses \
739 |         "$1" \
740 |         -- -I.
741 |
742 | [clang-tidy] is a policy-driven code refactoring tool. The options above enable
743 | two fixups:
744 |
745 | * `readability-braces-around-statements` — force all `if`/`while`/`for`
746 | statement bodies to be enclosed in braces
747 | * It's an accident of history for C to have "brace optional" single
748 | statements after loop constructs and conditionals. It is *inexcusable*
749 | to write modern code without braces enforced on every loop and every
750 | conditional. Trying to argue "but, the compiler accepts it!" has
751 | *nothing* to do with the readability, maintainability,
752 | understandability, or skimability of code. You aren't programming to
753 | please your compiler, you are programming to please future people who
754 | have to maintain your current brain state years after everybody has
755 | forgotten why anything exists in the first place.
756 | * `misc-macro-parentheses` — automatically add parens around all parameters
757 | used in macro bodies
758 |
759 | `clang-tidy` is great when it works, but for some complex code bases it can get
760 | stuck. Also, `clang-tidy` doesn't *format*, so you need to run `clang-format`
761 | after you tidy to align new braces and reflow macros.
762 |
763 | Remember, however, that there is an important, overriding rule to code formatting
764 | in any situation:
765 |
766 | * **Follow the conventions of the project you're working on.**
767 |
768 | ### Readability
769 |
770 | *the writing seems to start slowing down here...*
771 |
772 | #### Comments
773 |
774 | logical self-contained portions of code file
775 |
776 | #### File Structure
777 |
778 | Try to limit files to a max of 1,000 lines (1,500 lines in really bad cases).
779 | If your tests are in-line with your source file (for testing static functions,
780 | etc), adjust as necessary.
781 |
782 | ### misc thoughts
783 |
784 | #### Allocation
785 |
786 | You should usually use `calloc`. For most allocations, there is no performance
787 | penalty for getting zero'd memory.
788 |
789 | That said, `calloc` *does* have a performance impact for **huge** allocations,
790 | and on some embedded targets, legacy hardware, etc - but in no case is it slower
791 | than a `malloc/memset` call.
792 |
793 | Additionally, zeroing memory often means that buggy code (yes, your code is
794 | buggy. So's mine.) will have consistent behavior; but, by definition, it will
795 | not have correct behavior. Consistently incorrect behavior can be more
796 | difficult to track down. If you're trying to program defensively, you might
797 | consider initializing allocated memory to some value that's known to be
798 | _in_valid rather than one that might be valid.
799 |
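One way to follow the "known invalid value" suggestion above is a small wrapper around `malloc`. This is only a sketch: `malloc_poisoned` and `POISON_BYTE` are hypothetical names, and the right poison value depends on what counts as invalid for your data.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical poison byte; pick a value that is invalid for your data. */
#define POISON_BYTE 0xA5

/* Allocate `size` bytes and fill them with POISON_BYTE instead of zero,
 * so reads of never-written memory stand out while debugging. */
static void *malloc_poisoned(size_t size) {
    void *ptr = malloc(size);
    if (ptr != NULL) {
        memset(ptr, POISON_BYTE, size);
    }
    return ptr;
}
```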
800 | If you don't like the function prototype of `calloc(object count, size per
801 | object)` you can wrap it with `#define mycalloc(N) calloc(1, N)`, though this
802 | may not always be the best thing to do.
803 |
804 | One advantage of using `calloc()` directly without a wrapper is, unlike
805 | `malloc()`, `calloc()` can check for integer overflow because it multiplies its
806 | arguments together to obtain your final allocation size. If you are only
807 | allocating tiny things, wrapping `calloc()` is fine. If you are allocating
808 | potentially unbounded streams of data, you may want to retain the regular
809 | `calloc(element count, size of each element)` calling convention.
810 |
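To make the overflow point concrete, the sketch below spells out the kind of check a two-argument allocator can perform and a hand-written `malloc(count * size)` cannot. The helper names are hypothetical; a conforming `calloc()` is expected to fail on its own when the product overflows, so the explicit check here only makes that behavior visible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True when count * size cannot be represented in a size_t. */
static bool array_size_overflows(size_t count, size_t size) {
    return size != 0 && count > SIZE_MAX / size;
}

/* Hypothetical helper: allocate a zeroed array, refusing sizes that
 * would wrap around if multiplied by hand. */
static void *alloc_array(size_t count, size_t size) {
    if (array_size_overflows(count, size)) {
        return NULL;
    }
    return calloc(count, size);
}
```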
811 | However, `calloc` allocations remove valgrind's ability to warn you about
812 | unintentional reads or copies of uninitialized memory since allocations get
813 | initialized to `0` automatically.
814 |
815 | No advice can be universal, but trying to give *exactly perfect* generic
816 | recommendations (especially with regard to memory allocation) would end up
817 | reading like a book of language specifications.
818 |
819 | For references on how `calloc()` gives you clean memory for free, see these
820 | nice writeups:
821 |
822 | * [Benchmarking fun with calloc() and zero pages (2007)]
823 | * [Copy-on-write in virtual memory management]
824 |
825 | All that said, we maintain that the best practice is to always use `calloc()`
826 | for most common scenarios of 2016.
827 |
828 | Side Note: The pre-zero'd memory delivered to you by `calloc()` is a one-shot
829 | deal. If you `realloc()` your `calloc()` allocation, the grown memory extended
830 | by realloc is *not* new zero'd out memory. Your grown allocation is filled with
831 | whatever regular uninitialized contents your kernel provides. If you need
832 | zero'd memory after a realloc, you must manually `memset()` the extent of your
833 | grown allocation.
834 |
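A minimal sketch of that manual step, assuming the caller tracks the old size itself (the C library does not expose it portably). `realloc_zeroed` is a hypothetical helper name, not a standard function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resize an allocation and zero any newly added
 * bytes. `new_size` should be nonzero. On failure returns NULL and the
 * original `ptr` is left untouched and still owned by the caller. */
static void *realloc_zeroed(void *ptr, size_t old_size, size_t new_size) {
    void *grown = realloc(ptr, new_size);
    if (grown == NULL) {
        return NULL;
    }
    if (new_size > old_size) {
        /* Only the extent past the old size is uninitialized. */
        memset((char *)grown + old_size, 0, new_size - old_size);
    }
    return grown;
}
```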
835 | #### Avoid memset
836 |
837 | Never `memset(ptr, 0, len)` when you can statically initialize a structure (or
838 | array) to zero (or reset it back to zero by assigning from an in-line compound
839 | literal or by assigning from a global zero'd out structure; see above).
840 |
841 | Though, `memset()` is your only choice if you need to zero out a struct
842 | including its padding bytes (because `{0}` only sets defined fields, not
843 | undefined offsets filled by padding).
844 |
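The two cases above can be sketched side by side. `struct thing` and both function names are hypothetical and exist only for illustration; whether padding actually follows `flag` depends on the platform's alignment rules.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical struct for illustration only. */
struct thing {
    uint8_t flag;     /* padding bytes are likely to follow this field */
    uint64_t counter;
};

/* Reset every named field by assigning from a compound literal. */
static void thing_reset(struct thing *t) {
    *t = (struct thing){0};
}

/* Zero the whole object representation, padding included, for example
 * before writing the raw struct to a file or socket. */
static void thing_wipe(struct thing *t) {
    memset(t, 0, sizeof(*t));
}
```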
845 | #### Comments
846 |
847 | Comments are useful for documenting the functionality of your code, but they
848 | also serve another important function.
849 |
850 | This document describes what we consider to be best practices for C code, but
851 | there are _always_ exceptions. When you need to deviate from
852 | standards, it's important - for other developers, and for the "you" of next
853 | week/month/year - to put a comment in your code explaining why this was
854 | required.
855 |
856 | Generally speaking, comments should not be used to hide code from the compiler,
857 | at least not from within a source repository. Old code is already preserved
858 | in the source repository, so commented-out code only serves as a distraction.
859 | It's better to just delete commented-out code before committing.
860 |
861 | Learn More
862 | ----------
863 |
864 | Also see [Fixed width integer types (since C99)]
865 |
866 | Also see Apple's [Making Code 64-Bit Clean]
867 |
868 | Also see the [sizes of C types across architectures] — unless you keep that
869 | entire table in your head for every line of code you write, you should use
870 | explicitly defined integer widths and never use char/short/int/long built-in
871 | storage types.
872 |
873 | Also see [size\_t and ptrdiff\_t]
874 |
875 | Also see [Secure Coding]. If you really want to write everything perfectly,
876 | simply memorize their thousand simple examples.
877 |
878 | Also see [Modern C] by Jens Gustedt at Inria.
879 |
880 | ### Closing
881 |
882 | Writing correct code at scale is essentially impossible. We have multiple
883 | operating systems, runtimes, libraries, and hardware platforms to worry about
884 | without even considering things like random bit flips in RAM or our block
885 | devices lying to us with unknown probability.
886 |
887 | The best we can do is write simple, understandable code with as few
888 | indirections and as little undocumented magic as possible.
889 |
890 | -[Matt] — [@mattsta] — [☁mattsta]
891 |
892 | ### Attributions
893 |
894 | This made the twitter and HN rounds, so many people helpfully pointed out flaws
895 | or biased thoughts I'm promulgating here.
896 |
897 | First up, Jeremy Faller and [Sos Sosowski] and Martin Heistermann and a few
898 | other people were kind enough to point out my `memset()` example was broken and
899 | provided the proper fix.
900 |
901 | Martin Heistermann also pointed out the `localThing = localThingNull` example
902 | was broken.
903 |
904 | The opening quote about not writing C if you can avoid it is from the wise
905 | internet sage [@badboy\_].
906 |
907 | [Remi Gacogne] pointed out I forgot `-Wextra`.
908 |
909 | [Levi Pearson] pointed out gcc-5 defaults to gnu11 instead of c89 as well as
910 | clarifying the default clang mode.
911 |
912 | [Christopher] pointed out the `-O2` vs `-O3` section could use a little more
913 | clarification.
914 |
915 | [Chad Miller] pointed out I was being lazy in the clang-format script params.
916 |
917 | [Many] people also pointed out the `calloc()` advice isn't *always* a good idea
918 | if you have extreme circumstances or non-standard hardware (examples of bad
919 | ideas: huge allocations, allocations on embedded jiggers, allocations on 30
920 | year old hardware, etc).
921 |
922 | Charles Randolph pointed out I misspelled the word "Building."
923 |
924 | Sven Neuhaus kindly pointed out I also do not possess the ability to spell
925 | "initialization" or "initializers." (and also pointed out I spelled
926 | "initialization" wrong the first time here as well)
927 |
928 | [Colm MacCárthaigh] pointed out I forgot to mention `#pragma once`.
929 |
930 | [Jeffrey Yasskin] pointed out we should kill strict aliasing too (mainly a gcc
931 | optimization).
932 |
933 | Jeffrey Yasskin also provided better wording around the
934 | `-fno-strict-aliasing` section.
935 |
936 | [Chris Palmer] and a few others pointed out calloc-vs-malloc parameter
937 | advantages and the overall drawback of writing a wrapper for `calloc()` because
938 | `calloc()` provides a more secure interface than `malloc()` in the first place.
939 |
940 | Damien Sorresso pointed out we should remind people `realloc()` doesn't zero
941 | out grown memory after an initial zero'd `calloc()` request.
942 |
943 | Pat Pogson pointed out I was unable to spell the word "declare" correctly
944 | as well.
945 |
946 | [@TopShibe] pointed out the stack-allocated initialization example was wrong
947 | because the examples I gave were global variables. Updated wording to just mean
948 | "auto-allocated" things, be it stack or data sections.
949 |
950 | [Jonathan Grynspan][dangerous anti-pattern] suggested harsher wording around
951 | the VLA example because they **are** dangerous when used incorrectly.
952 |
953 | David O'Mahony kindly pointed out I can't spell "specify" either.
954 |
955 | Dr. David Alan Gilbert pointed out `ssize_t` is a POSIXism and Windows either
956 | doesn't have it or defines `ssize_t` as an *unsigned* quantity which obviously
957 | introduces all kinds of fun behavior when your type is signed on POSIX
958 | platforms and unsigned on Windows.
959 |
960 | Chris Ridd suggested we explicitly mention C99 is C from 1999 and C11 is C from
961 | 2011 because otherwise it looks strange having 11 be newer than 99.
962 |
963 | Chris Ridd also noticed the `clang-format` example used unclear naming
964 | conventions and suggested better consistency across examples.
965 |
966 | [Anthony Le Goff] pointed us to a book-length treatment of many modern C ideas
967 | called [Modern C].
968 |
969 | Stuart Popejoy pointed out my inaccurate spelling of deliberately was truly
970 | inaccurate.
971 |
972 | jack rosen pointed out my usage of the word 'exists' does not mean 'exits' as I
973 | intended.
974 |
975 | Jo Booth pointed out I like to spell compatibility as compatability, which
976 | seems more logical, but English commonality disagrees.
977 |
978 | Stephen Anderson decoded my aberrant spelling of 'stil' back into 'still.'
979 |
980 | Richard Weinberger pointed out struct initialization with `{0}` doesn't zero
981 | out padding bytes, so sending a `{0}` struct over the wire can leak unintended
982 | bytes on under-specified structs.
983 |
984 | [@JayBhukhanwala] pointed out the function comment in [Return Parameter Types]
985 | was inaccurate because I didn't update the comment when the code changed (story
986 | of our lives, right?).
987 |
988 | Lorenzo pointed out we should explicitly provide a warning concerning potential
989 | cross-platform alignment issues in section [Parameter Types].
990 |
991 | [Paolo G. Giarrusso] re-clarified the alignment warning I previously added to
992 | be more correct regarding the examples given.
993 |
994 | Fabian Klötzl provided the valid struct compound literal assignment example
995 | since it's perfectly valid syntax I just hadn't run across before.
996 |
997 | Omkar Ekbote provided a very thorough walkthrough of typos and consistency
998 | problems here including that I couldn't spell "platform," "actually,"
999 | "defining," "experience," "simultaneously," "readability," as well as noted
1000 | some other unclear wordings.
1001 |
1002 | Carlo Bellettini fixed my aberrant spelling of the word aberrant.
1003 |
1004 | [Keith S Thompson][Keith S. Thompson] provided many technical corrections in
1005 | his great article [how-to-c-response].
1006 |
1007 | Many people on reddit went apeshit because this article originally had
1008 | `#import` somewhere by mistake. Sorry, crazy people, but this started out as an
1009 | unedited and unreviewed year old draft when originally pushed live. The error
1010 | has since been remedied.
1011 |
1012 | Some people also pointed out the static initialization example uses globals
1013 | which are always initialized to zero by default anyway (and that they aren't
1014 | even initialized, they are statically allocated). This is a poor choice of
1015 | example on my part, but the concepts still stand for typical usage within
1016 | function scopes. The examples were meant to be any generic "code snippet" and
1017 | not necessarily top level globals.
1018 |
1019 | A few people seem to have read this as an "I hate C" page, but it isn't. C is
1020 | dangerous in the wrong hands (not enough testing, not enough experience when
1021 | widely deployed), so paradoxically the only two kinds of C developers should be
1022 | novice hobbyists (code failure causes no problems, it's just a toy) and people
1023 | who are willing to test their asses off (code failure causes life or financial
1024 | loss, it's not just a toy) while writing C code for production usage.
1025 | There's not much room for "casual observer C development." For the rest of the
1026 | world, that's why we have Erlang.
1027 |
1028 | Many people have also mentioned their own pet issues as well or issues beyond
1029 | the scope of this article (including new C11 only features like [George
1030 | Makrydakis] reminding us about C11 generic abilities).
1031 |
1032 | Perhaps another article about "Practical C" will show up to cover testing,
1033 | profiling, performance tracing, optional-but-useful warning levels, etc.
1034 |
1035 | [How to C in 2016]: https://matt.sh/howto-c
1036 | [Matt Stancliff]: http://matt.sh
1037 | [Keith S. Thompson]: https://github.com/Keith-S-Thompson
1038 | [Bryan Elliott]: https://github.com/Fordi
1039 | [howto-c-response]: https://github.com/Keith-S-Thompson/how-to-c-response/blob/master/README.md
1040 | [Dom Christie]: https://github.com/domchristie
1041 | [to-markdown]: https://github.com/domchristie/to-markdown
1042 | [pandoc]: http://pandoc.org
1043 | [Matt]: mailto:matt@matt.sh
1044 | [Adrián Arroyo Calle]: https://github.com/AdrianArroyoCalle
1045 | [¿Cómo programar en C (en 2016)?]: https://adrianarroyocalle.github.io/blog/2016/01/10/como-programar-en-c-en-2016/
1046 | [Jan-Erik Rediger]: http://fnordig.de/
1047 | [Enough Rope to Shoot Yourself In the Foot]: http://www.goodreads.com/book/show/103892.Enough_Rope_to_Shoot_Yourself_in_the_Foot
1048 | [early 1970s]: https://www.bell-labs.com/usr/dmr/www/chist.html
1049 | [gcc-specific extensions]: https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
1050 | [GCC's optimize options]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
1051 | [libutf8]: http://www.haible.de/bruno/packages-libutf8.html
1052 | [newer compiler versions]: https://twitter.com/oliviergay/status/685389448142565376
1053 | [clang LTO]: http://llvm.org/docs/LinkTimeOptimization.html
1054 | [guide]: http://llvm.org/docs/GoldPlugin.html
1055 | [gcc LTO]: https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html
1056 | [stddef.h]: http://pubs.opengroup.org/onlinepubs/7908799/xsh/stddef.h.html
1057 | [pragma once]: https://en.wikipedia.org/wiki/Pragma_once
1058 | [The New C: Compound Literals]: http://www.drdobbs.com/the-new-c-compound-literals/184401404
1059 | [dangerous anti-pattern]: https://twitter.com/grynspan/status/685509158024691712
1060 | [C99 VLA]: https://en.wikipedia.org/wiki/Variable-length_array
1061 | [Some people]: https://twitter.com/comex/status/685423016981966848
1062 | [restrict keyword]: https://en.wikipedia.org/wiki/Restrict
1063 | [Unaligned Memory Access]: https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt
1064 | [clang-format]: http://clang.llvm.org/docs/ClangFormat.html
1065 | [clang-tidy]: http://clang.llvm.org/extra/clang-tidy/
1066 | [Benchmarking fun with calloc() and zero pages (2007)]: https://blogs.fau.de/hager/archives/825
1067 | [Copy-on-write in virtual memory management]: https://en.wikipedia.org/wiki/Copy-on-write#Copy-on-write_in_virtual_memory_management
1068 | [Fixed width integer types (since C99)]: http://en.cppreference.com/w/c/types/integer
1069 | [Making Code 64-Bit Clean]: https://developer.apple.com/library/mac/documentation/Darwin/Conceptual/64bitPorting/MakingCode64-BitClean/MakingCode64-BitClean.html
1070 | [sizes of C types across architectures]: https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=4374
1071 | [size\_t and ptrdiff\_t]: http://www.viva64.com/en/a/0050/
1072 | [Secure Coding]: https://www.securecoding.cert.org/confluence/display/c/SEI+CERT+C+Coding+Standard
1073 | [Modern C]: http://icube-icps.unistra.fr/img_auth.php/d/db/ModernC.pdf
1074 | [@mattsta]: https://twitter.com/mattsta
1075 | [☁mattsta]: https://github.com/mattsta
1076 | [Sos Sosowski]: https://twitter.com/Sosowski/status/685431663501926400
1077 | [Remi Gacogne]: https://twitter.com/rgacogne/status/685390620723154944
1078 | [Levi Pearson]: https://twitter.com/pineal_servo/status/685393454487056384
1079 | [Christopher]: https://twitter.com/shrydar/status/685375992114757632
1080 | [Chad Miller]: https://twitter.com/chadmiller/status/685469896914919424
1081 | [Many]: https://twitter.com/lordcyphar/status/685444198481412096
1082 | [Colm MacCárthaigh]: https://twitter.com/colmmacc/status/685493166988906497
1083 | [Jeffrey Yasskin]: https://twitter.com/jyasskin/status/685493531515826176
1084 | [Chris Palmer]: https://twitter.com/fugueish/status/685503534230458369
1085 | [@TopShibe]: https://twitter.com/TopShibe/status/685505183762223105
1086 | [Anthony Le Goff]: https://twitter.com/Ideo_logiq/status/685384708188930048
1087 | [@JayBhukhanwala]: https://twitter.com/JayBhukhanwala
1088 | [Return Parameter Types]: #_return-parameter-types
1089 | [Parameter Types]: #_parameter-types
1090 | [Paolo G. Giarrusso]: https://twitter.com/Blaisorblade/status/686042231917178881
1091 | [how-to-c-response]: https://github.com/Keith-S-Thompson/how-to-c-response
1092 | [George Makrydakis]: https://twitter.com/irrequietus/status/685407732464226306
1093 |
--------------------------------------------------------------------------------