This repository is a collection of neat C & C++ trivia and oddities.

### Table of contents:
- [Both languages](#both-languages)
  - ["Special operators"](#special-operators)
  - [Bugs and Implementation Quirks](#bugs-and-implementation-quirks)
- [C++](#c)
  - [Bugs and Implementation Quirks](#bugs-and-implementation-quirks-1)
- [C](#c-1)
  - [Bugs and Implementation Quirks](#bugs-and-implementation-quirks-2)
- [Talks](#talks)

## Both languages

- `0` is technically tokenized as an octal literal.
- Array access is commutative: `arr[i]` and `i[arr]` are equivalent. This is because array access is
  defined as a direct translation to `*(arr + i)` when `arr` is an lvalue with array type.
- `sizeof(0)["abcd"]` is `1`.
- C and C++ grammar allows prototypes in declaration lists: `int a, foo(), * bar(), main();`.
- `https://www.google.com` is a valid line of C/C++ code, but you're limited to one occurrence of
  each protocol per function.
- Operator precedence and associativity are *not* the same as order of evaluation. The following examples
  are undefined or unspecified behavior in C and some versions of C++:
  ```cpp
  void foo(int i, int* arr) {
      i = i++; // UB in C or before C++17
      i = i++ + ++i; // UB
      arr[i] = i++; // UB in C or before C++17
      bar(puts("a"), puts("b")); // clang spits out a b, gcc spits out b a
  }
  ```
  [https://en.cppreference.com/w/cpp/language/eval_order](https://en.cppreference.com/w/cpp/language/eval_order)
- Unknown attributes are ignored without causing an error (since C++17 and C23).
  This allows all sorts of attribute nonsense (and all of these can of course be applied to variables too):
  ```cpp
  [[std::vector]] void foo() {} // Yes, even in C
  [[code::blocks]] void foo() {}
  [[]] void foo() {}
  [[,]] void foo() {}
  [[]][[]][[]][[]][[]] void foo() {}
  [[typedef ::long]] void foo() {}
  [[
      #include "/proc/cpuinfo"
  ]] void foo() {}
  // C++ only:
  [[foo...]] void foo() {}
  [[using std:]] void foo() {}
  ```
- Attributes may appear almost anywhere in a declaration:
  ```cpp
  [[foo]] int [[bar]] baz [[biz]] () [[buz]];
  [[foo]] constexpr [[bar]] int [[baz]] biz [[buz]] () [[boz]];
  // ^ second one is gcc and msvc only, decl-specifier-seq technically prevents an attribute here
  ```
- The operand of the `sizeof` operator cannot be a C-style cast. `sizeof (int)*p` is parsed as
  `(sizeof(int)) * p` rather than `sizeof((int)*p)`.
- Precedence is ignored in the conditional operator between `?` and `:`:
  `c ? a = 1, y = 2 : foo();` is parsed as `c ? (a = 1, y = 2) : foo();`.
- `llU` is a valid (non-user-defined) integer suffix
- `(void)` cast
  ```c
  void foo(int x) {
      (void)x; // useful for suppressing unused parameter warnings
      // C++ only: (will be a warning with -Wpedantic)
      return (void)"You can also return anything from a void function";
  }
  ```
- You cannot augment a typedef (or `using` alias) with `unsigned`:
  ```c
  typedef long long ll;
  void foo(unsigned ll) {} // unsigned implies unsigned int, ll here is a parameter name
  ```
- `typedef` is a storage class specifier and can appear before, after, or in the middle of a type in a declaration
  ```c
  unsigned typedef int u32;
  ```
- Preprocessor directives can be empty:
  ```c
  #include <stdio.h>
  #
  #
  int main() {
  #
      // ...
  }
  ```
- Switch statement bodies are allowed to be any single statement (not just compound statements), like other control flow structures:
  ```cpp
  switch(x) case 1: case 2: puts("foo");
  ```
- Case labels do not need to be in the top-level statement sequence
  ```cpp
  int x = 2;
  int i = 0;
  switch(x) {
      default:
      if(foo()) {
          while(i++ < 5) {
              case 2:
              puts("lol");
          }
      }
  }
  ```
- `"a" + 1 == ""` can technically evaluate to `true`. As can `"a" == "a\0\0"`.
- C and C++ support a set of
  [digraph tokens and trigraphs](https://en.wikipedia.org/wiki/Digraphs_and_trigraphs_(programming)#C) and [alternative tokens](https://en.wikipedia.org/wiki/C_alternative_tokens) to
  accommodate certain [archaic character sets](https://en.wikipedia.org/wiki/ISO/IEC_646) which rendered some ASCII characters differently. `not` and `not_eq` also exist because some
  [EBCDIC character sets](https://en.wikipedia.org/wiki/EBCDIC) didn't have a character that rendered as an exclamation mark.
  Trigraphs were removed from C++ in C++17 and from C in C23,
  because they were replaced before tokenization, which caused some surprising behavior:
  ```cpp
  puts("??("); // when trigraphs are supported, this outputs [ instead of ??(
  puts("<:"); // outputs <:, there is no way to use digraphs in character constants or string literals
  ```
- ISO C forbids conversion between function and object pointers, and ISO C++ allows implementations to forbid such conversions:
  ```cpp
  void (*func_ptr)() = dlsym(mylib, "func"); // gcc and clang yield a warning in pedantic mode
  ```
  However, taking the address of the function pointer, casting it to `void**`, and then dereferencing that pointer (usually) works without warnings:
  ```cpp
  void (*func_ptr)();
  *(void**)&func_ptr = dlsym(mylib, "func");
  ```
  Though this trick gets around the warning, the behavior is undefined due to strict aliasing, so it may not work.
- It's possible to declare multiple functions at once and use typedefs / `using` aliases for signatures:
  ```cpp
  // declares void foo(int); void* bar(float);
  void foo(int), * bar(float);
  // declares void foo(); void bar();
  typedef void fn(); // or using fn = void();
  fn foo, bar;
  ```

### "Special operators"
- ["`-->` operator"](https://stackoverflow.com/q/1642028/15675011), really just a combination of two operators
  ```cpp
  int x = 10;
  while (x --> 0) { // x goes to 0
      printf("%d ", x);
  }
  ```
- ["Tadpole operator"](https://devblogs.microsoft.com/oldnewthing/20150525-00/?p=45044):

  Syntax | Meaning | Mnemonic
  -------|---------|---------
  `-~y`  | `y + 1` | Tadpole swimming toward a value makes it bigger
  `~-y`  | `y - 1` | Tadpole swimming away from a value makes it smaller
- "Unset operator": `x &~ mask` unsets `mask` bits in `x`
- Boolean identity: `!-!b`

### Bugs and Implementation Quirks
- Clang / LLVM can end up doing arithmetic on bit widths that are not a multiple of 8 in its internal
  representation (even without the use of `_ExtInt` or `_BitInt`). For example,
  [this code](https://godbolt.org/z/v49P6W38r) results in 33-bit arithmetic
  as a result of the optimizer identifying the loop induction.

## C++

- The size of an empty struct is `1`. This is because the C++ memory model guarantees disjoint
  storage (and thus disjoint addresses) for all distinct objects.
  [https://eel.is/c++draft/basic.memobj#intro.object-9.sentence-2](https://eel.is/c++draft/basic.memobj#intro.object-9.sentence-2)
- All types must be deduced the same in an `auto` declarator list. I.e. `auto x = 1, y = 1.5;` is
  not allowed.
- What would be idiomatic uses of `malloc` in C are UB in C++ prior to C++20, [more details here](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html#idiomatic-c-code-as-c)
  ```cpp
  struct S { int x; };
  S* s = (S*)malloc(sizeof(S));
  s->x = 1; // an object S hasn't been created and its lifetime hasn't started, placement new is required to make this well-defined
  ```
- C++ supports a set of alternative tokens such as `and`, `or`, `bitand`, `compl`, etc. which are
  equivalent to their primary counterparts. Truly equivalent:
  ```cpp
  struct S {
      S() = default;
      S(const S bitand) = delete;
      S(S and) = delete;
      compl S() = default;
  };
  void foo() {
      alignas(S) unsigned char b[sizeof(S)];
      new (b) S();
      std::launder((S*)b)->compl S();
  }
  ```
- Vexing parse:
  ```cpp
  // Vexing parse: This isn't a variable, it's a function declaration
  T foo();
  // Most vexing parse: This is still a function declaration (taking a T(*)())
  T foo(T());
  // "More vexing parse":
  T foo(T((()))); // This is also a function declaration taking a T(*)()
  T foo(T (((a)))); // this is a function declaration taking a T
  // This is a variable definition
  T foo((T()));
  ```
- C++ structs can have stray semicolons:
  ```cpp
  struct S { ;;;;; };
  ```
- [Function try-blocks](https://en.cppreference.com/w/cpp/language/function-try-block) are a
  convenient way to wrap an entire function body with exception handlers and the only way to catch
  exceptions in member initializer lists:
  ```cpp
  template<typename T> struct S {
      T t;
      S(const T& t) try : t(t) {
          ...
      } catch(...) {
          ...
      }
  };
  ```
- `noexcept` is both a specifier and operator
  ```cpp
  void foo() noexcept(noexcept(noexcept(true))) {}
  ```
- `throw()` is the same as `noexcept` in C++17.
- You can write `extern "C++"` as well as `extern "C"`. These are the only two standard linkage
  languages, but others can be defined by the implementation. Give us `extern "Python"` and
  `extern "Java"`!
- A declaration can have arbitrarily many linkage language specifiers:
  ```cpp
  extern "C" extern "C++" extern "C" extern "C++" void foo(int) {}
  ```
  The innermost specification is used. [https://eel.is/c++draft/dcl.link#5.sentence-2](https://eel.is/c++draft/dcl.link#5.sentence-2)
- The language grammar allows `for`-style `init-statement`s in `switch` and `if` statements since
  C++17:
  ```cpp
  switch(int x = foo(); t[x]) { ... }
  if(auto [a, b, c] = foo(); c) { ... }
  // ranged for allows an init-statement too since C++20 (just no iteration-expression)
  for(auto [vec, map] = foo.bar(); const auto& item : vec) { ... }
  ```
- `while` loops do not support `init-statement`s because that would make them
  [just another for loop](https://stackoverflow.com/a/59986173/15675011).
- A `condition` may be a declaration. This allows up to two declarations per `switch`, `if`, or
  `for` statement:
  ```cpp
  if(int x = foo()) { ... } // intended use
  if(auto [a, b, c] = foo(); auto x = bar(a, b, c)) { ... }
  for(auto [a, b, c] = foo(); int x = baz(); c++) { ... }
  ```
- While an `init-statement` may make array or structured binding declarations, `condition`
  declarations may not. I.e. these are not valid:
  ```cpp
  if(auto [a, b, c] = foo()) { }
  if(int arr[] = {1, 2, 3, 4}) { }
  ```
- The following are valid C++ statements:
  ```cpp
  if(; true) { ... } // empty init-statement
  if(false; true) { ... }
  if(auto main() -> int; true) { ... }
  if(class foobar; true) { ... }
  if(typedef int i32; true) { ... }
  if(using A = B; true) { ... } // Since C++23
  for(struct { int a = 0, b = 100; } s; s.a < s.b; s.a++, s.b--) { ... }
  ```
- We cannot, however, do any of the following:
  ```cpp
  if(static_assert(true); true) { ... }
  if(using namespace std; true) { ... }
  if(extern "C" int puts(const char*); true) { puts("hello world"); }
  if(friend void operator<<(); true) { ... } // syntactically valid, not semantically valid
  ```
- `goto` is disallowed in `constexpr` functions
- `static` storage local variables are not permitted in constexpr functions until C++23
- Structured bindings can't be used in constexpr declarations
- The following is a valid "hello world" implementation
  ```cpp
  auto& hello_world = std::cout<<"Hello World"<< […]
  ```
- […] `std::vector<::std::string>` is lexed
  correctly and not as `std::vector[:std::string>`:
  > Otherwise, if the next three characters are <:: and the subsequent character is neither : nor >,
  > the < is treated as a preprocessing token by itself and not as the first character of the
  > alternative token <:

  [https://eel.is/c++draft/lex.pptoken#3.2](https://eel.is/c++draft/lex.pptoken#3.2)
- `std::numeric_limits<T>::max` and related functions are functions because there was originally
  concern that some values may not be available at compile time. E.g.
  `std::numeric_limits<T>::min`, which was dependent on rounding mode. These functions are
  `constexpr` since C++11 but at that point it was too late to change them from functions.
- The original proposed syntax for lambdas looked like `<>(int x) : [y] (x + y)`
  (what's now `[y](int x) { return x + y; }`). `<&>(x) ( x * y )` or
  `<&>(x) -> int { return x * y; }` would have been the syntax for `[&](auto x) { return x * y; }`.
  Also, in the original proposal there was no `mutable` keyword for lambdas. Instead the call operator
  was always const and captures were always marked mutable. Initial proposal papers:
  [N1958](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1958.pdf),
  [N1968](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1968.pdf),
  [N2329 (N1968 rev 1)](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2329.pdf).
- `std::string('0', '0')` is a string of 48 `'0'`'s, `std::string{'0', '0'}` is the string `"00"`
- James Bond was [added to the C++ standard in C++17](https://github.com/cplusplus/draft/commit/703d892264af814a64140b17ffe2bf6ae9274dde)
- The C++ standard contains a small poem:
  > When writing a specialization, be careful about its location; or to make it compile will be such a trial as to
  > kindle its self-immolation.

  https://eel.is/c++draft/temp.spec#temp.expl.spec-8
- CV qualifiers don't apply to objects until [their construction is complete](https://eel.is/c++draft/class.ctor.general#5.sentence-2),
  and relatedly there are no cv-qualified constructors
- Array elements, and objects in general, are always destroyed in reverse order of construction. Standard quote for [arrays](https://eel.is/c++draft/class.dtor#14.sentence-5)
- A lambda's `operator()` is automatically `constexpr` if it meets the requirements for a constexpr function [https://eel.is/c++draft/expr.prim.lambda.closure#5.sentence-6](https://eel.is/c++draft/expr.prim.lambda.closure#5.sentence-6)

### Bugs and Implementation Quirks
- `decltype(std)` is an `int` in gcc (prior to version 14). Bug reports:
  [#1](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100482),
  [#2](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101213).
- Prior to gcc 10, `decltype(decltype(decltype))` could be used to generate [exponential error
  messages](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92105).
- `typedef int i = 0;` segfaults msvc
- This compiles and links in gcc (prior to version 15, or when compiling outside of pedantic mode)
  ```cpp
  namespace foobar {
      extern "C" int main() {
          puts("Hello world!");
      }
  }
  ```
- Compilers can't agree on which is correct; both are rejected by gcc:
  ```cpp
  extern extern "C" extern "C++" int x; // accepted by clang (with warning)
  extern "C" extern "C++" extern int x; // accepted by clang (with warning) and msvc (no warning)
  ```
  GCC is correct. The second is more correct due to `linkage-specification`s, but it's disallowed to
  specify a storage class in a `linkage-specification`
  [https://eel.is/c++draft/dcl.link#8.sentence-2](https://eel.is/c++draft/dcl.link#8.sentence-2).
- Duplicate `[[gnu::constructor]]`s are ignored, but the attribute is still allowed on `main`, so hello world
  prints twice here.
  ```cpp
  [[gnu::constructor]] [[gnu::constructor]] int main() {
      puts("Hello, World!");
  }
  ```

## C

- Source code of [the very first C compiler](https://github.com/mortdeus/legacy-cc).
- An empty struct is UB in C. Standard quote: 6.7.2.1.8 (C11-C23).
- A significant subset of possible identifiers are reserved in C. These include identifiers which
  begin with `is`, `to`, `str`, or `mem` followed by a lowercase letter in the global scope. It's
  undefined to declare/define one of these reserved identifiers in the global scope. So, the
  following program may 1) print 1, 2) wipe your hard drive, 3) summon cthulhu, 4) other. All
  behaviors are equally correct.
  ```c
  #include <stdio.h>
  int iseven(int n) {
      return n % 2 == 0;
  }
  int main() {
      printf("%d", iseven(2));
  }
  ```
- Expressions in parameter declarations are evaluated by gcc/clang. Due to sequencing this prints
  the numbers 10 down to 1:
  ```c
  #include <stdio.h>
  int first = 0;
  int main(int, char**);
  int main(int a, char *b[(first++ > 8) ? 1 : (main(0, 0) || 1)]) {
      printf("%d\n", first--);
  }
  ```
- Similarly this is a valid "hello world" program in C
  ```c
  int main(int a, char *b[puts("Hello World") || 1]) {}
  ```
- `auto` is a keyword in C. Since C23 it can be used to deduce the type in a declaration similar to C++, but with more restrictions.
  Before C23, its only standard use was to redundantly specify that a declaration had automatic storage duration.
- `extern const void x;` is a valid declaration in C for the same reason `extern struct S s;` is valid: `void` is
  an incomplete type
  - This is not valid in C++ because incomplete types in general are not allowed in extern declarations; only incomplete
    class types are explicitly permitted in [[dcl.stc]/7](https://eel.is/c++draft/dcl.stc#7)
- The following is valid C:
  ```c
  signed _Noreturn const long volatile long static _Atomic inline f(void);
  ```

### Bugs and Implementation Quirks
- gcc allows labels before a declaration, or at the end of a compound statement before C23 without warnings:
  ```c
  switch(x) { case 1: }
  switch(x) { default: int y; }
  // clang generates warnings for both, gcc generates warnings only when using pedantic
  ```
- This compiles [without error](https://godbolt.org/z/471Eh7sGc) in TCC
  ```c
  static inline int foo(void) {
      [[[[[[[[{{(}));
  }
  int main(void) {
      return _Generic(1, int:0, float:((}}]]]);
  }
  ```

## Talks

Some talks about C++ oddities:
- [Fun with (user-defined) attributes](https://youtu.be/Pt6oeIpzue4)
- [Can I has grammar?](https://youtu.be/tsG95Y-C14k)
- [C++ WAT](https://youtu.be/rNNnPrMHsAA)
- [Non-conforming C++: the Secrets the Committee Is Hiding From You](https://youtu.be/IAdLwUXRUvg)
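
Going back to the "Both languages" and "Special operators" sections: several of those claims are easy to check by machine. The sketch below is one possible way to do that, not part of the original collection. All function names (`subscript_commutes`, `goes_to_zero`, `tadpole_inc`, `tadpole_dec`, `unset_bits`, `boolean_identity`, `check_trivia`) are hypothetical helpers invented for this illustration, and the tadpole identities assume two's complement integers and a `y` that does not overflow.

```cpp
#include <cassert>
#include <cstddef>

// arr[i] means *(arr + i), and addition commutes, so i[arr] names the same element.
inline bool subscript_commutes(const int* arr, std::size_t i) {
    return &arr[i] == &i[arr];
}

// sizeof(0)["abcd"] parses as sizeof((0)["abcd"]), i.e. the size of one char.
static_assert(sizeof(0)["abcd"] == 1, "sizeof applies to the whole subscript expression");

// "x --> 0" is (x--) > 0; this returns how many times the loop body runs.
inline int goes_to_zero(int x) {
    int iterations = 0;
    while (x --> 0) {
        ++iterations;
    }
    return iterations;
}

// Tadpoles: with two's complement, ~y == -y - 1, so -~y == y + 1 and ~-y == y - 1.
constexpr int tadpole_inc(int y) { return -~y; }
constexpr int tadpole_dec(int y) { return ~-y; }

// "Unset operator": clears in x every bit that is set in mask.
constexpr unsigned unset_bits(unsigned x, unsigned mask) { return x &~ mask; }

// !b is 0 or 1, negating it gives 0 or -1, and the outer ! maps that back to b.
constexpr bool boolean_identity(bool b) { return !-!b; }

// Runs every check once; assert aborts the program if a claim does not hold.
inline void check_trivia() {
    const int arr[] = {10, 20, 30};
    assert(subscript_commutes(arr, 2));
    assert(goes_to_zero(10) == 10);
    assert(tadpole_inc(41) == 42 && tadpole_dec(43) == 42);
    assert(unset_bits(0xFu, 0x5u) == 0xAu);
    assert(boolean_identity(true) && !boolean_identity(false));
}
```

Calling `check_trivia()` from `main` exercises all of the helpers. The `static_assert` is checked at compile time, so a compiler that parsed `sizeof(0)["abcd"]` differently would reject the file outright.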