├── .gitignore ├── source ├── Standardizing C++.pdf ├── index.rst ├── D2151r0.bs ├── DCanadian.bs ├── P0528r0.cc ├── D1501R0.bs ├── _templates │ └── layout.html ├── bikeshed.bs ├── P1205R0.bs ├── N4509.cc ├── P0152.cc ├── N4509.rst ├── P0908r0.bs ├── P0152R1.rst ├── P0152R0.rst ├── Math.signbit.bs ├── P0476r0.bs ├── p1102r0.bs ├── P0476r1.bs ├── P0502r0.bs ├── P0418r1.bs ├── P0418r2.bs ├── p1119r0.bs ├── N4523.rst ├── P1018R19.bs ├── P0528r1.bs ├── P0528r2.bs ├── P1018r6.bs ├── P1225R0.bs ├── P0154R0.rst ├── P0154R1.rst ├── P0476r2.bs ├── conf.py ├── p0528r3.bs ├── N4522.rst └── P1018r5.bs ├── .travis.yml ├── linkcheck.sh ├── deploy.sh ├── README.md └── Makefile /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | *.html 3 | build/ 4 | -------------------------------------------------------------------------------- /source/Standardizing C++.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jfbastien/papers/HEAD/source/Standardizing C++.pdf -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | sudo: false 2 | # This repository doesn't contain Python code, but it uses Python tooling. 
3 | language: python 4 | python: 5 | - "2.7" 6 | install: 7 | - pip install sphinx pygments lxml setuptools --upgrade 8 | - git clone https://github.com/tabatkins/bikeshed.git 9 | - pip install --editable $PWD/bikeshed 10 | - bikeshed update 11 | script: 12 | - make html 13 | - ./linkcheck.sh 14 | notifications: 15 | email: false 16 | -------------------------------------------------------------------------------- /source/index.rst: -------------------------------------------------------------------------------- 1 | C++ standards committee papers by JF Bastien 2 | ============================================ 3 | 4 | Here are a few papers that I've written for the C++ standards committee. This 5 | list isn't comprehensive and currently only contains the papers which I've moved 6 | to github_. 7 | 8 | .. _github: https://github.com/jfbastien/papers 9 | 10 | .. toctree:: 11 | :maxdepth: 1 12 | 13 | N4455 14 | P0152R1 15 | P0154R1 16 | P0153R0 17 | P0193R1 18 | 2016-02 19 | 20 | Previous revisions of the above papers: 21 | 22 | .. toctree:: 23 | :maxdepth: 1 24 | 25 | N4509 26 | P0152R0 27 | N4523 28 | P0154R0 29 | N4522 30 | P0193R0 31 | -------------------------------------------------------------------------------- /linkcheck.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -u 4 | set -x 5 | 6 | out=./build/linkcheck/output.txt 7 | rm -rf $out 8 | 9 | # Ignore linkcheck failures: new documents point to their own github location 10 | # which doesn't exist yet. 11 | make linkcheck 12 | 13 | if [ ! -f $out ]; then 14 | echo "Cannot find $out" 15 | exit 1 16 | fi 17 | 18 | # Manually check failures, discarding self-point failures. The others matter. 19 | # 20 | # The output.txt format is: 21 | # filename.rst:LINE: [broken] https://example.com/path/to/filename.rst: HTTP Error 404: Not Found 22 | grep -v "^\([^:]*\):.*\1" $out | grep "\[broken\]" 23 | if [ $? 
-eq 0 ]; then 24 | cat $out 25 | exit 1 26 | fi 27 | 28 | exit 0 29 | -------------------------------------------------------------------------------- /deploy.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | 3 | set -e 4 | set -u 5 | 6 | # Deploy generated html pages to github.io. 7 | 8 | BUILD=./build/ 9 | HTML=$BUILD/html 10 | DIR=$BUILD/jfbastien.github.io 11 | PAPERS=$DIR/papers 12 | CLONE=git@github.com:jfbastien/jfbastien.github.io.git 13 | 14 | HASH=$(git rev-parse HEAD) 15 | SUBJECT=$(git log -n1 --pretty=format:%s) 16 | 17 | # Hacky reuse of git's require_clean_work_tree. 18 | OPTIONS_SPEC= 19 | LONG_USAGE= 20 | USAGE= 21 | NONGIT_OK= 22 | SUBDIRECTORY_OK= 23 | source $(git --exec-path)/git-sh-setup "" 24 | require_clean_work_tree deploy "Please commit or stash changes." 25 | 26 | # Copy generated html files to the github.io repo. 27 | rm -rf $DIR 28 | mkdir $DIR 29 | git clone $CLONE $DIR 30 | find $HTML/*.html -maxdepth 1 -type f \ 31 | \( -iname "*.html" ! -iname "genindex.html" ! -iname "search.html" \) | \ 32 | xargs -I{} cp {} $PAPERS/ 33 | 34 | # Commit the changes, and deploy them. 35 | pushd $PAPERS 36 | git status 37 | git add "*.html" 38 | git commit -m "Update '$SUBJECT' 39 | 40 | Hash: $HASH" 41 | git push origin master 42 | popd 43 | -------------------------------------------------------------------------------- /source/D2151r0.bs: -------------------------------------------------------------------------------- 1 |
 2 | Title: Language Evolution Issue List
 3 | Shortname: P2151
 4 | Revision: 0
 5 | Audience: EWG
 6 | Status: D
 7 | Group: WG21
 8 | URL: http://wg21.link/P2151r0
 9 | !Source: github.com/jfbastien/papers/blob/master/source/P2151r0.bs
10 | Editor: JF Bastien, Apple, jfbastien@apple.com
11 | Date: 2020-04-10
12 | Markup Shorthands: markdown yes
13 | Toggle Diffs: no
14 | No abstract: true
15 | 
16 | 17 | The purpose of this document is to record the status of issues which have come before the Evolution Working Group (EWG) of the INCITS PL22.16 and ISO WG21 C++ Standards Committee. Issues represent potential defects in the C++ Standard. Issues against Core Language, Library, and Library Evolution are tracked separately. 18 | 19 | EWG issues were previously tracked by [[N4539]]. 20 | 21 | This document contains: 22 | 23 | * Evolution issues which are actively being considered by the Evolution Working Group, i.e., issues which have a status of New, Open, Ready, or Review. 24 | * Evolution issues which have been closed since the document was last updated. 25 | 26 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # C++ Standard Committee Papers 2 | 3 | Build status: [![Build Status](https://travis-ci.org/jfbastien/papers.svg?branch=master)](https://travis-ci.org/jfbastien/papers) 4 | 5 | Official C++ Standard Committee papers are available from [the C++ mailings][]. 6 | 7 | More information on the C++ Standard Committee is available on 8 | [the Committee site][]. 9 | 10 | I've written a few of these papers and co-authored a few others. 11 | I initially wrote them using reStructuredText, but have now moved to 12 | [bikeshed](https://github.com/tabatkins/bikeshed). Papers in this repository are 13 | final and published when numbered `N` or `P`, and are drafts when numbered 14 | `D`. This is an ISO thing: I can't revise already-published `N` or `P` 15 | papers. The paper revision (the `R` part in `P` numbered papers) has to be 16 | incremented, and a new paper published. 17 | 18 | New paper numbers are obtained through the Committee's Vice-Chair. The Committee's 19 | website details [how to submit proposals][]. 
20 | 21 | [the Committee site]: https://isocpp.org/std/the-committee 22 | [the C++ mailings]: http://open-std.org/jtc1/sc22/wg21/docs/papers/ 23 | [how to submit proposals]: https://isocpp.org/std/submit-a-proposal 24 | -------------------------------------------------------------------------------- /source/DCanadian.bs: -------------------------------------------------------------------------------- 1 |
 2 | Title: Canadian friends are not friends
 3 | Shortname: D????
 4 | Revision: 0
 5 | Audience: EWG
 6 | Status: D
 7 | Group: WG21
 8 | URL: http://wg21.link/P????
 9 | !Source: github.com/jfbastien/papers/blob/master/source/DCanadian.bs
10 | Editor: JF Bastien, Woven by Toyota, cxx@jfbastien.com
11 | Editor: Bruno Cardoso Lopes, Meta, bruno.cardoso@gmail.com
12 | Editor: Michael Spencer, Apple, bigcheesegs@gmail.com
13 | Date: 2023-06-13
14 | Markup Shorthands: markdown yes
15 | Toggle Diffs: no
16 | No abstract: true
17 | 
18 | 19 | This paper addresses [[CWG1699]]. 20 | 21 | ``` 22 | import Canadian; // Contains `export class Canadian { class buddy {}; friend struct friendly; };` 23 | 24 | class c { 25 | class n {}; 26 | friend struct friendly; 27 | }; 28 | 29 | void g() { // #2 30 | // 'n' accessible here? 31 | } 32 | 33 | struct friendly { 34 | friend class c::n; // #1 35 | friend void g(); // #2 36 | friend void h(); // #3 37 | friend void f() { c::n(); } // #4 (EDG/MSVC Reject, Clang/GCC Accept) 38 | friend class Canadian::buddy; 39 | friend void ohCanada() { // #5 40 | // Canadian::buddy accessible here? 41 | } 42 | }; 43 | 44 | void h() { // #3 45 | // 'n' accessible here? 46 | } 47 | ``` 48 | -------------------------------------------------------------------------------- /source/P0528r0.cc: -------------------------------------------------------------------------------- 1 | #include <atomic> 2 | #include <cstdio> 3 | #include <cstring> 4 | #include <new> 5 | #include <type_traits> 6 | 7 | struct Padded { 8 | char c = 0xFF; 9 | // Padding here. 10 | int i = 0xFEEDFACE; 11 | Padded() = default; 12 | }; 13 | typedef std::atomic<Padded> Atomic; 14 | typedef std::aligned_storage<sizeof(Atomic), alignof(Atomic)>::type Storage; 15 | 16 | void peek(const char* what, void *into) { 17 | printf("%16s %08x %08x\n", what, *(int*)into, *(1 + (int*)into)); 18 | } 19 | 20 | Storage* create() { 21 | auto* storage = new Storage(); 22 | std::memset(storage, 0xBA, sizeof(Storage)); 23 | asm volatile("":::"memory"); 24 | peek("storage", storage); 25 | return storage; 26 | } 27 | 28 | Atomic* change(Storage* storage) { 29 | // As if we used an allocator which reuses memory. 30 | auto* atomic = new(storage) Atomic; 31 | peek("atomic placed", atomic); 32 | std::atomic_init(atomic, Padded()); // Which bits go in? 33 | peek("atomic init", atomic); 34 | return atomic; 35 | } 36 | 37 | Padded infloop_maybe(Atomic* atomic) { 38 | Padded desired; // Padding unknown. 39 | Padded expected; // Could be different. 
40 | peek("desired before", &desired); 41 | peek("expected before", &expected); 42 | peek("atomic before", atomic); 43 | while ( 44 | !atomic->compare_exchange_strong( 45 | expected, 46 | desired // Padding bits added and removed here ˙ ͜ʟ˙ 47 | )); 48 | peek("expected after", &expected); 49 | peek("atomic after", atomic); 50 | return expected; // Maybe changed here as well. 51 | } 52 | 53 | int main() { 54 | auto* storage = create(); 55 | auto* atomic = change(storage); 56 | Padded p = infloop_maybe(atomic); 57 | peek("main", &p); 58 | return 0; 59 | } 60 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | # Makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | PAPER = 8 | BUILDDIR = build 9 | 10 | # User-friendly check for sphinx-build 11 | ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) 12 | $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) 13 | endif 14 | 15 | # Internal variables. 
16 | PAPEROPT_a4 = -D latex_paper_size=a4 17 | PAPEROPT_letter = -D latex_paper_size=letter 18 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source 19 | 20 | .PHONY: help clean html linkcheck deploy 21 | 22 | help: 23 | @echo "Please use \`make <target>' where <target> is one of" 24 | @echo " html to make standalone HTML files" 25 | @echo " linkcheck to check all external links for integrity" 26 | @echo " deploy to deploy to github.io" 27 | 28 | clean: 29 | rm -rf $(BUILDDIR)/* 30 | 31 | html: 32 | echo "Building sphinx sources" 33 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html 34 | bikeshed update 35 | find ./source/ -name "*.bs" -type f | xargs -I{} -t -n1 bikeshed spec {} 36 | mv ./source/*.html $(BUILDDIR)/html/ 37 | @echo 38 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 39 | 40 | linkcheck: 41 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck 42 | @echo 43 | @echo "Link check complete; look for any errors in the above output " \ 44 | "or in $(BUILDDIR)/linkcheck/output.txt." 45 | 46 | deploy: clean html linkcheck 47 | ./deploy.sh 48 | -------------------------------------------------------------------------------- /source/D1501R0.bs: -------------------------------------------------------------------------------- 1 | 19 | 20 | We’ve gathered input from a variety of folks involved in audio at Apple, and 21 | here is our joint, considered position regarding the `std::audio` proposal in 22 | [[P1386R0]]. 23 | 24 | Audio is important to the Apple ecosystem. The type system and determinism of 25 | C++ lend themselves well to the audio software domain. In the proposal we like the 26 | formalization of data types and algorithms that are common in the audio domain. 27 | However, we are concerned about the audio device interfaces and requiring C++ 28 | systems to have a specific implementation. 
29 | 30 | Creating a good interface between software and audio hardware is something that 31 | on the surface seems straightforward, but on a practical system is challenging 32 | to implement correctly. This area has typically been fairly platform-specific or 33 | handled by specialist libraries, and may not be immediately amenable to 34 | standardization. We think it’s best not to standardize audio hardware I/O. 35 | 36 | Instead of attempting to standardize the interface and mechanism of audio 37 | hardware, providing a common representation of audio data could be an area of 38 | exploration that is suited to the language. 39 | -------------------------------------------------------------------------------- /source/_templates/layout.html: -------------------------------------------------------------------------------- 1 | {# 2 | Single-page template. 3 | #} 4 | {%- block doctype -%} 5 | 6 | {%- endblock %} 7 | {%- set titlesuffix = "" %} 8 | 9 | 10 | 11 | 12 | 13 | 14 | 55 | {%- block htmltitle %} 56 | {{ title|striptags|e }}{{ titlesuffix }} 57 | {%- endblock %} 58 | {%- block extrahead %} {% endblock %} 59 | 60 | 61 | {%- block header %}{% endblock %} 62 | {%- block content %} 63 | {%- block document %} 64 |
65 | {% block body %} {% endblock %} 66 |
67 | {%- endblock %} 68 | {%- endblock %} 69 | 70 | 71 | -------------------------------------------------------------------------------- /source/bikeshed.bs: -------------------------------------------------------------------------------- 1 | 16 | 17 | Coloring the shed {#colour} 18 | ================= 19 | 20 | Thoughtful standards people put significant effort into writing their 21 | papers. Often, too much of that effort goes into style or 22 | format instead of content. This meta-paper is ironically all 23 | style and no C++ content. It proposes that you stop formatting and start using 24 | bikeshed. 25 | 26 | While we're at it, we'll also propose that you use a public version control 27 | service such as github to make it easier for 28 | reviewers to see how a paper evolved, both while in draft state as well as from 29 | one revision to another. Final papers are meant to be consumed as-is, but your 30 | paper collaborators, editors, or future-self will thank you when performing 31 | archaeology to untangle the inevitable nonsensical part of your final paper. 32 | 33 | To do {#todo} 34 | ===== 35 | 36 | https://github.com/tabatkins/bikeshed/blob/master/docs/quick-start.md 37 | 38 | 1. Basics 39 | - What does the final paper look like? 40 | - What does the source look like? (see section 4.) 41 | - Who uses it? 42 | - Takes care of the boilerplate 43 | 2. Convenience 44 | - Webpages work everywhere 45 | - Readable offline, no downloads 46 | - Unicode Just Works™ (even the EDG wiki now supports it) 47 | 3. Good practice 48 | - github for diffs: easier to track changes 49 | - github integration: auto-generation, etc 50 | 4. markdown + HTML escape hatch 51 | - https://github.com/tabatkins/bikeshed/blob/master/docs/markup.md 52 | - Railroad diagrams 53 | - Code, and syntax highlight 54 | - Toggle diff 55 | 5. Link to other papers 56 | 6. 
Getting started 57 | - Installing https://github.com/tabatkins/bikeshed/blob/master/docs/install.md 58 | -------------------------------------------------------------------------------- /source/P1205R0.bs: -------------------------------------------------------------------------------- 1 | 16 | 17 | Issues {#issues} 18 | ====== 19 | 20 | The C++ Coroutine TS [[N4736]] has issues 31 and 32 listed in [[P0664R5]]: 21 | 22 | > **31.** Add a note warning about thread switching near await and/or `coroutine_handle` wording. 23 | > 24 | > Add a note warning about thread switching near await and/or `coroutine_handle` wording 25 | > 26 | > **32.** Add a normative text making it UB to migrate coroutines between certain kind of execution agents. 27 | > 28 | > Add a normative text making it UB to migrate coroutines between certain kind of execution agents. Clarify that migrating between `std::thread`s is OK. But migrating between CPU and GPU is UB. 29 | 30 | Discussion {#discuss} 31 | ========== 32 | 33 | Using `co_await`, one can teleport a suspended execution between execution agents: 34 | 35 | 36 | thread::id get_an_id() { 37 | 38 | // here: acquire a lock, read thread_local 39 | 40 | co_yield std::this_thread::get_id(); //< one result 41 | 42 | // UB: release the lock, reuse the same thread_local 43 | 44 | co_return std::this_thread::get_id(); //< different result 45 | } 46 | 47 | 48 | We say "teleport" here because the code that relocates the coroutine is outside 49 | the coroutine, in a possibly unrelated part of the program. This teleportation 50 | can take your coroutine to many interesting places, for example: 51 | 52 | 1. the thread that runs `main` 53 | 2. threads from `std::thread` / `std::async` 54 | 3. elemental functions of `std::par`, `std::par_unseq`, `std::unseq` algorithms 55 | 4. global / `thread_local` constructors (see note) 56 | 5. global / `thread_local` / `static` destructors (see note) 57 | 6. functions registered with `at_exit` / `quick_exit` 58 | 7. 
signal handlers 59 | 8. future `fibers_context` of [[P0876R3]] 60 | 61 | Note that it is presently implementation-defined whether many of these functions 62 | run in a specific thread, a single thread, or in many unspecified threads—see 63 | [[CWG2046]]. 64 | 65 | Proposed Resolution {#resolution} 66 | =================== 67 | 68 | After [[N4736]] [**dcl.fct.def.coroutine**] ❡6: 69 | 70 |
71 | 72 | A suspended coroutine can be resumed to continue execution by invoking a 73 | resumption member function of an object of type `coroutine_handle<P>` 74 | associated with this instance of the coroutine. The function that invoked a 75 | resumption member function is called *resumer*. Invoking a resumption member 76 | function for a coroutine that is not suspended results in undefined behavior. 77 | 78 |
79 | 80 | Add ❡7: 81 | 82 |
83 | 84 | 85 | Resuming a coroutine on an execution agent other than the one it was suspended 86 | on has implementation-defined behavior unless both are instances of 87 | `std::thread`. [*Note*: a coroutine that is moved this way should avoid the use 88 | of `thread_local` or `mutex` objects. — *End note*.] 89 | 90 | 91 |
92 | -------------------------------------------------------------------------------- /source/N4509.cc: -------------------------------------------------------------------------------- 1 | #include <atomic> 2 | #include <iostream> 3 | 4 | namespace std { 5 | 6 | namespace detail { 7 | // It is implementation-defined what this returns, as long as: 8 | // 9 | // if (std::atomic<T>::is_always_lock_free) 10 | // assert(std::atomic<T>().is_lock_free()); 11 | // 12 | // An implementation may therefore have more variable template 13 | // specializations than the ones shown below. 14 | template<typename T> static constexpr bool is_always_lock_free = false; 15 | 16 | // Implementations must match the C ATOMIC_*_LOCK_FREE macro values. 17 | template<> static constexpr bool is_always_lock_free<bool> = 2 == ATOMIC_BOOL_LOCK_FREE; 18 | template<> static constexpr bool is_always_lock_free<char> = 2 == ATOMIC_CHAR_LOCK_FREE; 19 | template<> static constexpr bool is_always_lock_free<signed char> = 2 == ATOMIC_CHAR_LOCK_FREE; 20 | template<> static constexpr bool is_always_lock_free<unsigned char> = 2 == ATOMIC_CHAR_LOCK_FREE; 21 | template<> static constexpr bool is_always_lock_free<char16_t> = 2 == ATOMIC_CHAR16_T_LOCK_FREE; 22 | template<> static constexpr bool is_always_lock_free<char32_t> = 2 == ATOMIC_CHAR32_T_LOCK_FREE; 23 | template<> static constexpr bool is_always_lock_free<wchar_t> = 2 == ATOMIC_WCHAR_T_LOCK_FREE; 24 | template<> static constexpr bool is_always_lock_free<short> = 2 == ATOMIC_SHORT_LOCK_FREE; 25 | template<> static constexpr bool is_always_lock_free<unsigned short> = 2 == ATOMIC_SHORT_LOCK_FREE; 26 | template<> static constexpr bool is_always_lock_free<int> = 2 == ATOMIC_INT_LOCK_FREE; 27 | template<> static constexpr bool is_always_lock_free<unsigned int> = 2 == ATOMIC_INT_LOCK_FREE; 28 | template<> static constexpr bool is_always_lock_free<long> = 2 == ATOMIC_LONG_LOCK_FREE; 29 | template<> static constexpr bool is_always_lock_free<unsigned long> = 2 == ATOMIC_LONG_LOCK_FREE; 30 | template<> static constexpr bool is_always_lock_free<long long> = 2 == ATOMIC_LLONG_LOCK_FREE; 31 | template<> static constexpr bool 
is_always_lock_free<unsigned long long> = 2 == ATOMIC_LLONG_LOCK_FREE; 32 | template<typename T> static constexpr bool is_always_lock_free<T*> = 2 == ATOMIC_POINTER_LOCK_FREE; 33 | template<> static constexpr bool is_always_lock_free<std::nullptr_t> = 2 == ATOMIC_POINTER_LOCK_FREE; 34 | 35 | // The macros do not support float, double, long double, but C++ does 36 | // support atomics of these types. An implementation shall ensure that these 37 | // types, as well as user-defined types, guarantee the above invariant that 38 | // is_always_lock_free implies is_lock_free for the same type. 39 | } 40 | 41 | template<typename T> 42 | struct atomic_n4509 { 43 | // ... 44 | static constexpr bool is_always_lock_free = detail::is_always_lock_free<T>; 45 | // ... 46 | }; 47 | 48 | } 49 | 50 | template<typename T> using atomic = std::atomic_n4509<T>; 51 | 52 | int main() { 53 | std::cout << 54 | "bool\t" << atomic<bool>::is_always_lock_free << '\n' << 55 | "char\t" << atomic<char>::is_always_lock_free << '\n' << 56 | "signed char\t" << atomic<signed char>::is_always_lock_free << '\n' << 57 | "unsigned char\t" << atomic<unsigned char>::is_always_lock_free << '\n' << 58 | "char16_t\t" << atomic<char16_t>::is_always_lock_free << '\n' << 59 | "char32_t\t" << atomic<char32_t>::is_always_lock_free << '\n' << 60 | "wchar_t\t" << atomic<wchar_t>::is_always_lock_free << '\n' << 61 | "short\t" << atomic<short>::is_always_lock_free << '\n' << 62 | "unsigned short\t" << atomic<unsigned short>::is_always_lock_free << '\n' << 63 | "int\t" << atomic<int>::is_always_lock_free << '\n' << 64 | "unsigned int\t" << atomic<unsigned int>::is_always_lock_free << '\n' << 65 | "long\t" << atomic<long>::is_always_lock_free << '\n' << 66 | "unsigned long\t" << atomic<unsigned long>::is_always_lock_free << '\n' << 67 | "long long\t" << atomic<long long>::is_always_lock_free << '\n' << 68 | "unsigned long long\t" << atomic<unsigned long long>::is_always_lock_free << '\n' << 69 | "void*\t" << atomic<void*>::is_always_lock_free << '\n' << 70 | "std::nullptr_t\t" << atomic<std::nullptr_t>::is_always_lock_free << '\n'; 71 | 72 | return 0; 73 | } 74 | -------------------------------------------------------------------------------- /source/P0152.cc: 
-------------------------------------------------------------------------------- 1 | #include <atomic> 2 | #include <iostream> 3 | 4 | namespace std { 5 | 6 | namespace detail { 7 | // It is implementation-defined what this returns, as long as: 8 | // 9 | // if (std::atomic<T>::is_always_lock_free) 10 | // assert(std::atomic<T>().is_lock_free()); 11 | // 12 | // An implementation may therefore have more variable template 13 | // specializations than the ones shown below. 14 | template<typename T> static constexpr bool is_always_lock_free = false; 15 | 16 | // Implementations must match the C ATOMIC_*_LOCK_FREE macro values. 17 | template<> static constexpr bool is_always_lock_free<bool> = 2 == ATOMIC_BOOL_LOCK_FREE; 18 | template<> static constexpr bool is_always_lock_free<char> = 2 == ATOMIC_CHAR_LOCK_FREE; 19 | template<> static constexpr bool is_always_lock_free<signed char> = 2 == ATOMIC_CHAR_LOCK_FREE; 20 | template<> static constexpr bool is_always_lock_free<unsigned char> = 2 == ATOMIC_CHAR_LOCK_FREE; 21 | template<> static constexpr bool is_always_lock_free<char16_t> = 2 == ATOMIC_CHAR16_T_LOCK_FREE; 22 | template<> static constexpr bool is_always_lock_free<char32_t> = 2 == ATOMIC_CHAR32_T_LOCK_FREE; 23 | template<> static constexpr bool is_always_lock_free<wchar_t> = 2 == ATOMIC_WCHAR_T_LOCK_FREE; 24 | template<> static constexpr bool is_always_lock_free<short> = 2 == ATOMIC_SHORT_LOCK_FREE; 25 | template<> static constexpr bool is_always_lock_free<unsigned short> = 2 == ATOMIC_SHORT_LOCK_FREE; 26 | template<> static constexpr bool is_always_lock_free<int> = 2 == ATOMIC_INT_LOCK_FREE; 27 | template<> static constexpr bool is_always_lock_free<unsigned int> = 2 == ATOMIC_INT_LOCK_FREE; 28 | template<> static constexpr bool is_always_lock_free<long> = 2 == ATOMIC_LONG_LOCK_FREE; 29 | template<> static constexpr bool is_always_lock_free<unsigned long> = 2 == ATOMIC_LONG_LOCK_FREE; 30 | template<> static constexpr bool is_always_lock_free<long long> = 2 == ATOMIC_LLONG_LOCK_FREE; 31 | template<> static constexpr bool is_always_lock_free<unsigned long long> = 2 == ATOMIC_LLONG_LOCK_FREE; 32 | template<typename T> static constexpr bool is_always_lock_free<T*> = 2 == 
ATOMIC_POINTER_LOCK_FREE; 33 | template<> static constexpr bool is_always_lock_free<std::nullptr_t> = 2 == ATOMIC_POINTER_LOCK_FREE; 34 | 35 | // The macros do not support float, double, long double, but C++ does 36 | // support atomics of these types. An implementation shall ensure that these 37 | // types, as well as user-defined types, guarantee the above invariant that 38 | // is_always_lock_free implies is_lock_free for the same type. 39 | } 40 | 41 | template<typename T> 42 | struct atomic_n4509 { 43 | // ... 44 | static constexpr bool is_always_lock_free = detail::is_always_lock_free<T>; 45 | // ... 46 | }; 47 | 48 | } 49 | 50 | template<typename T> using atomic = std::atomic_n4509<T>; 51 | 52 | int main() { 53 | std::cout << 54 | "bool\t" << atomic<bool>::is_always_lock_free << '\n' << 55 | "char\t" << atomic<char>::is_always_lock_free << '\n' << 56 | "signed char\t" << atomic<signed char>::is_always_lock_free << '\n' << 57 | "unsigned char\t" << atomic<unsigned char>::is_always_lock_free << '\n' << 58 | "char16_t\t" << atomic<char16_t>::is_always_lock_free << '\n' << 59 | "char32_t\t" << atomic<char32_t>::is_always_lock_free << '\n' << 60 | "wchar_t\t" << atomic<wchar_t>::is_always_lock_free << '\n' << 61 | "short\t" << atomic<short>::is_always_lock_free << '\n' << 62 | "unsigned short\t" << atomic<unsigned short>::is_always_lock_free << '\n' << 63 | "int\t" << atomic<int>::is_always_lock_free << '\n' << 64 | "unsigned int\t" << atomic<unsigned int>::is_always_lock_free << '\n' << 65 | "long\t" << atomic<long>::is_always_lock_free << '\n' << 66 | "unsigned long\t" << atomic<unsigned long>::is_always_lock_free << '\n' << 67 | "long long\t" << atomic<long long>::is_always_lock_free << '\n' << 68 | "unsigned long long\t" << atomic<unsigned long long>::is_always_lock_free << '\n' << 69 | "void*\t" << atomic<void*>::is_always_lock_free << '\n' << 70 | "std::nullptr_t\t" << atomic<std::nullptr_t>::is_always_lock_free << '\n'; 71 | 72 | return 0; 73 | } 74 | -------------------------------------------------------------------------------- /source/N4509.rst: -------------------------------------------------------------------------------- 1 | ================================================== 
2 | N4509 ``constexpr atomic<T>::is_always_lock_free`` 3 | ================================================== 4 | 5 | :Author: Olivier Giroux 6 | :Contact: ogiroux@nvidia.com 7 | :Author: JF Bastien 8 | :Contact: jfb@google.com 9 | :Author: Jeff Snyder 10 | :Contact: jeff-isocpp@caffeinated.me.uk 11 | :Date: 2015-05-05 12 | :URL: https://github.com/jfbastien/papers/blob/master/source/N4509.rst 13 | :Source: https://github.com/jfbastien/papers/blob/master/source/N4509.cc 14 | 15 | The current design for ``std::atomic<T>`` affords implementations the critical 16 | freedom to revert to critical sections when hardware support for atomic 17 | operations does not meet the size or semantic requirements for the associated 18 | type ``T``. This: 19 | 20 | * Preserves C++ support on aging hardware. 21 | * Supports developers who don't target a specific architecture e.g. with the 22 | ``-march=xxx`` flag. 23 | * Improves the portability of abstract representations for C++ programs, 24 | e.g. when compiling C++ code to execute portably within a web browser. 25 | 26 | The Standard also ensures that developers can be informed of the 27 | implementation's lock-freedom guarantees, by using the ``is_lock_free()`` member 28 | and free-functions. This is important because programmers may want to select 29 | algorithm implementations, or even select algorithms, based on this 30 | knowledge. Developers are equally likely to do so for correctness and 31 | performance reasons. 32 | 33 | **The software design shipped in C++11 and C++14 is, however, somewhat sandbagged.** 34 | 35 | There is poor support for static determination of lock-freedom guarantees. 36 | 37 | At the present time the Standard has limited support in this domain: the 38 | ``ATOMIC_..._LOCK_FREE`` macros that return ``2``, ``1`` or ``0`` if the 39 | corresponding atomic type is *always* lock-free, sometimes lock-free or never 40 | lock-free, respectively. 
These macros are little more than a consolation prize 41 | because they do not work with an arbitrary type ``T`` (as the C++ native 42 | ``std::atomic<T>`` library intends) and they leave adaptation for generic 43 | programming entirely up to the developer. 44 | 45 | This leads to the present, counter-intuitive state of the art whereby 46 | non-traditional uses of C++ have better support than high-performance 47 | computing. We aim to make the smallest possible change that improves the 48 | situation for HPC while leaving all other uses untouched. 49 | 50 | We propose a ``static constexpr`` complement of ``is_lock_free()`` that is 51 | suitable for use with SFINAE and ``static_assert``. 52 | 53 | ----------------- 54 | Proposed addition 55 | ----------------- 56 | 57 | Under 29.5 Atomic types [**atomics.types.generic**]: 58 | 59 | .. code-block:: c++ 60 | 61 | namespace std { 62 | template <class T> struct atomic { 63 | static constexpr bool is_always_lock_free = /* implementation-defined */; 64 | // Omitting all other members for brevity. 65 | }; 66 | template <> struct atomic<integral> { 67 | static constexpr bool is_always_lock_free = /* implementation-defined */; 68 | // Omitting all other members for brevity. 69 | }; 70 | template <class T> struct atomic<T*> { 71 | static constexpr bool is_always_lock_free = /* implementation-defined */; 72 | // Omitting all other members for brevity. 73 | }; 74 | } 75 | 76 | After paragraph 2: 77 | 78 | The ``static`` data member ``is_always_lock_free`` is true if the atomic type's 79 | operations are always lock-free, and false otherwise. The value of 80 | ``is_always_lock_free`` shall be consistent with the value of the corresponding 81 | ``ATOMIC_..._LOCK_FREE`` macro, if defined. 82 | 83 | Under 29.6.5 Requirements for operations on atomic types 84 | [**atomics.types.operations.req**], in paragraph 7: 85 | 86 | The return value of the ``is_lock_free`` member function shall be consistent 87 | with the value of ``is_always_lock_free`` for the same type. 
88 | 89 | [*Example:* the following should never fail, 90 | 91 | .. code-block:: c++ 92 | 93 | if (atomic<T>::is_always_lock_free) 94 | assert(atomic<T>().is_lock_free()); 95 | 96 | — *end example*] 97 | 98 | The ``__cpp_lib_atomic_is_always_lock_free`` feature test macro should be added. 99 | 100 | ------------------- 101 | Additional material 102 | ------------------- 103 | 104 | We did not provide the ``atomic_is_always_lock_free`` C-style free functions 105 | (which the ``is_lock_free`` functions have) because these require a 106 | pointer. This makes the free functions significantly less useful as compile-time 107 | ``constexpr``. 108 | 109 | We show a sample implementation: 110 | 111 | .. literalinclude:: N4509.cc 112 | :language: c++ 113 | :lines: 4-48 114 | -------------------------------------------------------------------------------- /source/P0908r0.bs: -------------------------------------------------------------------------------- 1 | 13 | 14 | The `offsetof` macro, inherited from C and applicable to standard-layout 15 | classes (and, conditionally, other classes) in C++, calculates the layout 16 | offset of a member within a class. `offsetof` is useful for calculating an 17 | object pointer given a pointer to one of its members: 18 | 19 | 20 | 21 | struct link { 22 | ... 23 | }; 24 | 25 | struct container { 26 | link l; 27 | }; 28 | 29 | container* container_from_link(link* x) { 30 | // x is known to be the .l part of some container 31 | uintptr_t x_address = reinterpret_cast&lt;uintptr_t&gt;(x); 32 | size_t l_offset = offsetof(container, l); 33 | return reinterpret_cast&lt;container*&gt;(x_address - l_offset); 34 | } 35 | 36 | 37 | 38 | This pattern is used in several implementations of intrusive containers, such 39 | as Linux kernel linked lists (`struct list_head`). 40 | 41 | Unfortunately, although `offsetof` works for some unusual 42 | member-designators, it does not work for pointers to members. 
This won’t 43 | compile: 44 | 45 | 46 | 47 | template <typename Container, typename Link, Link (Container::* member)> 48 | Container* generic_container_from_link(Link* x) { 49 | uintptr_t x_address = reinterpret_cast<uintptr_t>(x); 50 | size_t link_offset = offsetof(Container, member); // error! 51 | return reinterpret_cast<Container*>(x_address - link_offset); 52 | } 53 | 54 | 55 | 56 | Programmers currently compute pointer-to-member offsets using `nullptr` casts 57 | (i.e., the incorrect folk implementation of `offsetof`, which invokes 58 | undefined behavior), or by jumping through other hoops: 59 | 60 | 61 | 62 | template <typename Container, typename Link, Link (Container::* member)> 63 | Container* generic_container_from_link(Link* x) { 64 | ... 65 | alignas(Container) char container_space[sizeof(Container)] = {}; 66 | Container* fake_container = reinterpret_cast<Container*>(container_space); 67 | size_t link_offset = reinterpret_cast<uintptr_t>(&(fake_container->*member)) 68 | - reinterpret_cast<uintptr_t>(fake_container); 69 | ... 70 | } 71 | 72 | 73 | 74 | `offsetof` with pointer-to-member member-designators should simply work. 75 | Modern compilers implement `offsetof` using an extension (`__builtin_offsetof` 76 | in GCC and LLVM), so implementation need not require library changes. To avoid 77 | ambiguity, we propose this syntax: 78 | 79 | 80 | 81 | size_t link_offset = offsetof(Container, .*member); 82 | 83 | 84 | 85 | 86 | Questions {#qq} 87 | ========= 88 | 89 | Must a pointer-to-member expression in an `offsetof` member-designator be a 90 | constant expression (such as a template argument)? 
The C standard requires 91 | that “the expression `&(t.member-designator)` evaluates to an address 92 | constant,” which might make this code illegal: 93 | 94 | 95 | 96 | struct container { 97 | char array[200]; 98 | }; 99 | 100 | int index = /* dynamic value */; 101 | size_t offset = offsetof(container, array[index]); // questionable 102 | 103 | 104 | 105 | But since several current compilers accept dynamic array indexes, the proposed 106 | wording allows any pointer to member. 107 | 108 | 109 | Proposed Wording {#word} 110 | ================ 111 | 112 | In Sizes, alignments, and offsets [**support.types.layout**], modify the first 113 | sentence of ❡1 as follows: 114 | 115 |
 116 | 117 | The macro `offsetof(type, member-designator)` has the same semantics as the 118 | corresponding macro in the C standard library header `<stddef.h>`, but accepts 119 | a restricted set of `type` arguments and a superset of 120 | `member-designator` arguments in this International Standard. 121 | 122 |
123 | 124 | Add this paragraph after ❡1: 125 | 126 |
127 | 128 | An `offsetof` `member-designator` may contain pointer-to-member 129 | expressions as well as `member-designators` acceptable in C. A 130 | `member-designator` may begin with a prefix `.` or `.*` operator (e.g., 131 | `offsetof(type, .member_name)` or `offsetof(type, .*pointer_to_member)`). If 132 | the prefix operator is omitted, `.` is assumed. 133 | 134 |
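For comparison with the wording above, here is a minimal, self-contained sketch of what already works today with a plain member name. The members `next` and `payload` are hypothetical, added only to give the structs some content, and the round trip through `uintptr_t` relies on the usual implementation-defined mapping between pointers and integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical intrusive link; only its address matters here.
struct link {
    link* next;
};

// Hypothetical standard-layout container embedding the link.
struct container {
    int payload;
    link l;
};

// Recover the enclosing container from a pointer to its `l` member.
// Assumes x really points at the `l` subobject of some container.
container* container_from_link(link* x) {
    std::uintptr_t x_address = reinterpret_cast<std::uintptr_t>(x);
    std::size_t l_offset = offsetof(container, l);
    return reinterpret_cast<container*>(x_address - l_offset);
}
```

Under the proposed syntax, a generic version would presumably replace `offsetof(container, l)` with `offsetof(Container, .*member)`. Current compilers do not accept that form, so it is not shown as runnable code.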
135 | 136 | 137 | Example online discussions of the issue {#disc} 138 | ======================================= 139 | 140 | * [LLVMdev] Evaluation of offsetof() macro 141 | * Working around offsetof limitations in C++ 142 | -------------------------------------------------------------------------------- /source/P0152R1.rst: -------------------------------------------------------------------------------- 1 | ==================================================== 2 | P0152R1 ``constexpr atomic::is_always_lock_free`` 3 | ==================================================== 4 | 5 | :Author: Olivier Giroux 6 | :Contact: ogiroux@nvidia.com 7 | :Author: JF Bastien 8 | :Contact: jfb@google.com 9 | :Author: Jeff Snyder 10 | :Contact: jeff-isocpp@caffeinated.me.uk 11 | :Date: 2016-03-02 12 | :Previous: http://wg21.link/N4509 13 | :Previous: http://wg21.link/P0152R0 14 | :URL: https://github.com/jfbastien/papers/blob/master/source/P0152R1.rst 15 | :Source: https://github.com/jfbastien/papers/blob/master/source/P0152.cc 16 | 17 | The current design for ``std::atomic`` affords implementations the critical 18 | freedom to revert to critical sections when hardware support for atomic 19 | operations does not meet the size or semantic requirements for the associated 20 | type ``T``. This: 21 | 22 | * Preserves C++ support on aging hardware. 23 | * Supports developers who don't target a specific architecture e.g. with the 24 | ``-march=xxx`` flag. 25 | * Improves the portability of abstract representations for C++ programs, 26 | e.g. when compiling C++ code to execute portably within a web browser. 27 | 28 | The Standard also ensures that developers can be informed of the 29 | implementation's lock-freedom guarantees, by using the ``is_lock_free()`` member 30 | and free-functions. This is important because programmers may want to select 31 | algorithm implementations, or even select algorithms, based on this 32 | knowledge. 
Developers are equally likely to do so for correctness and 33 | performance reasons. 34 | 35 | **The software design shipped in C++11 and C++14 is, however, somewhat sandbagged.** 36 | 37 | There is poor support for static determination of lock-freedom guarantees. 38 | 39 | At the present time the Standard has limited support in this domain: the 40 | ``ATOMIC_..._LOCK_FREE`` macros that return ``2``, ``1`` or ``0`` if the 41 | corresponding atomic type is *always* lock-free, sometimes lock-free or never 42 | lock-free, respectively. These macros are little more than a consolation prize 43 | because they do not work with an arbitrary type ``T`` (as the C++ native 44 | ``std::atomic`` library intends) and they leave adaptation for generic 45 | programming entirely up to the developer. 46 | 47 | This leads to the present, counter-intuitive state of the art whereby 48 | non-traditional uses of C++ have better support than high-performance 49 | computing. We aim to make the smallest possible change that improves the 50 | situation for HPC while leaving all other uses untouched. 51 | 52 | We propose a ``static constexpr`` complement of ``is_lock_free()`` that is 53 | suitable for use with SFINAE and ``static_assert``. 54 | 55 | ----------------- 56 | Proposed addition 57 | ----------------- 58 | 59 | Under 29.5 Atomic types [**atomics.types.generic**]: 60 | 61 | .. code-block:: c++ 62 | 63 | namespace std { 64 | template <class T> struct atomic { 65 | static constexpr bool is_always_lock_free = implementation-defined; 66 | // Omitting all other members for brevity. 67 | }; 68 | template <> struct atomic<integral> { 69 | static constexpr bool is_always_lock_free = implementation-defined; 70 | // Omitting all other members for brevity. 71 | }; 72 | template <class T> struct atomic<T*> { 73 | static constexpr bool is_always_lock_free = implementation-defined; 74 | // Omitting all other members for brevity.
75 | }; 76 | } 77 | 78 | Under 29.6.5 Requirements for operations on atomic types 79 | [**atomics.types.operations.req**], between paragraphs 6 and 7: 80 | 81 | .. code-block:: c++ 82 | 83 | static constexpr bool is_always_lock_free = implementation-defined; 84 | 85 | The ``static`` data member ``is_always_lock_free`` is true if the atomic type's 86 | operations are always lock-free, and false otherwise. 87 | 88 | [*Note:* The value of ``is_always_lock_free`` is consistent with the value of 89 | the corresponding ``ATOMIC_..._LOCK_FREE`` macro, if defined. — *end note*] 90 | 91 | Under 29.6.5 Requirements for operations on atomic types 92 | [**atomics.types.operations.req**], in paragraph 7: 93 | 94 | [*Note:* The return value of the ``is_lock_free`` member function is consistent 95 | with the value of ``is_always_lock_free`` for the same type. — *end note*] 96 | 97 | The ``__cpp_lib_atomic_is_always_lock_free`` feature test macro should be added. 98 | 99 | ------------------- 100 | Additional material 101 | ------------------- 102 | 103 | We did not provide the ``atomic_is_always_lock_free`` C-style free functions 104 | (which the ``is_lock_free`` functions have) because these require a 105 | pointer. This makes the free functions significantly less useful as compile-time 106 | ``constexpr``. 107 | 108 | We show a sample implementation: 109 | 110 | .. 
literalinclude:: P0152.cc 111 | :language: c++ 112 | :lines: 4-48 113 | -------------------------------------------------------------------------------- /source/P0152R0.rst: -------------------------------------------------------------------------------- 1 | ==================================================== 2 | P0152R0 ``constexpr atomic::is_always_lock_free`` 3 | ==================================================== 4 | 5 | :Author: Olivier Giroux 6 | :Contact: ogiroux@nvidia.com 7 | :Author: JF Bastien 8 | :Contact: jfb@google.com 9 | :Author: Jeff Snyder 10 | :Contact: jeff-isocpp@caffeinated.me.uk 11 | :Date: 2015-10-21 12 | :Previous: http://wg21.link/N4509 13 | :URL: https://github.com/jfbastien/papers/blob/master/source/P0152R0.rst 14 | :Source: https://github.com/jfbastien/papers/blob/master/source/P0152.cc 15 | 16 | The current design for ``std::atomic`` affords implementations the critical 17 | freedom to revert to critical sections when hardware support for atomic 18 | operations does not meet the size or semantic requirements for the associated 19 | type ``T``. This: 20 | 21 | * Preserves C++ support on aging hardware. 22 | * Supports developers who don't target a specific architecture e.g. with the 23 | ``-march=xxx`` flag. 24 | * Improves the portability of abstract representations for C++ programs, 25 | e.g. when compiling C++ code to execute portably within a web browser. 26 | 27 | The Standard also ensures that developers can be informed of the 28 | implementation's lock-freedom guarantees, by using the ``is_lock_free()`` member 29 | and free-functions. This is important because programmers may want to select 30 | algorithm implementations, or even select algorithms, based on this 31 | knowledge. Developers are equally likely to do so for correctness and 32 | performance reasons. 
 33 | 34 | **The software design shipped in C++11 and C++14 is, however, somewhat sandbagged.** 35 | 36 | There is poor support for static determination of lock-freedom guarantees. 37 | 38 | At the present time the Standard has limited support in this domain: the 39 | ``ATOMIC_..._LOCK_FREE`` macros that return ``2``, ``1`` or ``0`` if the 40 | corresponding atomic type is *always* lock-free, sometimes lock-free or never 41 | lock-free, respectively. These macros are little more than a consolation prize 42 | because they do not work with an arbitrary type ``T`` (as the C++ native 43 | ``std::atomic`` library intends) and they leave adaptation for generic 44 | programming entirely up to the developer. 45 | 46 | This leads to the present, counter-intuitive state of the art whereby 47 | non-traditional uses of C++ have better support than high-performance 48 | computing. We aim to make the smallest possible change that improves the 49 | situation for HPC while leaving all other uses untouched. 50 | 51 | We propose a ``static constexpr`` complement of ``is_lock_free()`` that is 52 | suitable for use with SFINAE and ``static_assert``. 53 | 54 | ----------------- 55 | Proposed addition 56 | ----------------- 57 | 58 | Under 29.5 Atomic types [**atomics.types.generic**]: 59 | 60 | .. code-block:: c++ 61 | 62 | namespace std { 63 | template <class T> struct atomic { 64 | static constexpr bool is_always_lock_free = implementation-defined; 65 | // Omitting all other members for brevity. 66 | }; 67 | template <> struct atomic<integral> { 68 | static constexpr bool is_always_lock_free = implementation-defined; 69 | // Omitting all other members for brevity. 70 | }; 71 | template <class T> struct atomic<T*> { 72 | static constexpr bool is_always_lock_free = implementation-defined; 73 | // Omitting all other members for brevity. 74 | }; 75 | } 76 | 77 | Under 29.6.5 Requirements for operations on atomic types 78 | [**atomics.types.operations.req**], between paragraphs 6 and 7: 79 | 80 | ..
code-block:: c++ 81 | 82 | static constexpr bool is_always_lock_free = implementation-defined; 83 | 84 | The ``static`` data member ``is_always_lock_free`` is true if the atomic type's 85 | operations are always lock-free, and false otherwise. The value of 86 | ``is_always_lock_free`` shall be consistent with the value of the corresponding 87 | ``ATOMIC_..._LOCK_FREE`` macro, if defined. 88 | 89 | Under 29.6.5 Requirements for operations on atomic types 90 | [**atomics.types.operations.req**], in paragraph 7: 91 | 92 | The return value of the ``is_lock_free`` member function shall be consistent 93 | with the value of ``is_always_lock_free`` for the same type. 94 | 95 | [*Example:* The following should never fail 96 | 97 | .. code-block:: c++ 98 | 99 | if (atomic<T>::is_always_lock_free) 100 | assert(atomic<T>().is_lock_free()); 101 | 102 | — *end example*] 103 | 104 | The ``__cpp_lib_atomic_is_always_lock_free`` feature test macro should be added. 105 | 106 | ------------------- 107 | Additional material 108 | ------------------- 109 | 110 | We did not provide the ``atomic_is_always_lock_free`` C-style free functions 111 | (which the ``is_lock_free`` functions have) because these require a 112 | pointer. This makes the free functions significantly less useful as compile-time 113 | ``constexpr``. 114 | 115 | We show a sample implementation: 116 | 117 | .. literalinclude:: P0152.cc 118 | :language: c++ 119 | :lines: 4-48 120 | -------------------------------------------------------------------------------- /source/Math.signbit.bs: -------------------------------------------------------------------------------- 1 | 14 | 15 | IEEE 754 has a precise meaning for *sign bit*. JavaScript's `Math.sign` falls 16 | short on `-0.0` and `+0.0`. This is a shortcoming of a "batteries included" 17 | approach to language design.
 18 | 19 | Correctly obtaining the sign bit of a Number in JavaScript is somewhat 20 | unintuitive: the naïve `x < 0` approach fails if `x` is `-0.0` because `0.0` and 21 | `-0.0` compare equal to each other. 22 | 23 | One can instead rely on division by zero returning one of `-Infinity` or 24 | `+Infinity`: `1.0 / x < 0`. This now has the interesting caveat that `1.0 / x` is 25 | `NaN` if `x` was `NaN`, so the comparison yields `false` regardless of the sign bit. It's also highly counter-intuitive. 26 | 27 | JavaScript aficionados will know that `Object.is(-0, x)` will return `true` when 28 | `x` is `-0` but not when it's `0`. This is surprising for developers who are 29 | more numerics-oriented than object-—dare I say prototype-?—oriented. These 30 | developers just want the sign bit, IEEE 754 has a very precise definition of 31 | what the sign bit is, and why can't JavaScript just give them the sign bit? 32 | 33 | This issue [has been discussed previously](https://esdiscuss.org/topic/math-sign-vs-0) 34 | but was never addressed. We believe that this proposal can fix this 35 | oft-encountered problem once and for all. 36 | 37 | 38 | Revision History {#rev} 39 | ================ 40 | 41 | * Presented at the [2017-01](https://github.com/tc39/agendas/blob/master/2017/01.md) TC39 meeting and moved to Stage 1. 42 | 43 | 44 | Background {#bg} 45 | ========== 46 | 47 | IEEE 754 {#ieee754} 48 | -------- 49 | 50 | [[IEEE754]] section 5.5.1 defines *sign bit operations*. These operations are 51 | quiet-computational operations which only affect the sign bit of the arithmetic 52 | format. The operations treat floating-point numbers and NaNs alike, and signal 53 | no exception. As defined, they may propagate non-canonical encodings. 54 | 55 | The following operations are defined: 56 | 57 | * `copy` 58 | * `negate` 59 | * `abs` 60 | 61 | C / C++ {#cpp} 62 | ------- 63 | 64 | [[C]] and [[Cpp]] define `signbit` in `<math.h>` and `<cmath>` respectively. It 65 | returns a nonzero `int` value if and only if the sign of its argument value is 66 | negative.
The `signbit` macro reports the sign of all values, including 67 | infinities, zeros, and NaNs. 68 | 69 | Go {#go} 70 | --- 71 | 72 | [[Go]]'s math package defines `Signbit` as `true` if `x` is negative or negative 73 | zero. While the specification is silent on NaN, 74 | [the implementation](https://golang.org/src/math/signbit.go) clearly extracts the 75 | sign bit regardless of NaN-ness. 76 | 77 | `Math.sign(x)` {#sign} 78 | ----------- 79 | 80 | JavaScript provides `Math.sign` which is specified as follows: 81 | 82 |
83 | 84 | Returns the sign of the x, indicating whether x is positive, negative or zero. 85 | 86 | * If `x` is `NaN`, the result is `NaN`. 87 | * If `x` is `-0`, the result is `-0`. 88 | * If `x` is `+0`, the result is `+0`. 89 | * If `x` is negative and not `-0`, the result is `-1`. 90 | * If `x` is positive and not `+0`, the result is `+1`. 91 | 92 |
 93 | 94 | This falls short when dealing with `-0` and `+0` since these values both compare 95 | equal. 96 | 97 | 98 | Proposal {#proposal} 99 | ======== 100 | 101 | Given existing precedent as well as common hardware support, we propose adding 102 | `Math.signbit` to JavaScript as follows. 103 | 104 | `Math.signbit(x)` {#spec} 105 | ----------------- 106 | 107 | Returns whether the sign bit of `x` is set. 108 | 109 | 1. If `x` is `NaN`, the result is `false`. 110 | 1. If `x` is `-0`, the result is `true`. 111 | 1. If `x` is negative, the result is `true`. 112 | 1. Otherwise, the result is `false`. 113 | 114 | Note: The "Function Properties of the Math Object" section already states: 115 | "Each of the following `Math` object functions applies the `ToNumber` abstract 116 | operation to each of its arguments." 117 | 118 | Alternatives {#alts} 119 | ------------ 120 | 121 | This proposal makes decisions which TC39 may want to consider modifying: 122 | 123 | * Coercion through `ToNumber`. 124 | * The return type is Boolean. 125 | * NaN is equivalent to a positive number. 126 | 127 | 128 |
129 | {
130 |     "IEEE754": {
131 |         "href": "https://standards.ieee.org/findstds/standard/754-2008.html",
132 |         "title": "IEEE 754-2008",
133 |         "publisher": "IEEE Computer Society"
134 |     },
135 |     "C": {
136 |         "href": "http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf",
137 |         "title": "Programming Languages — C",
138 |         "publisher": "ISO/IEC JTC1 SC22 WG14"
139 |     },
140 |     "Cpp": {
141 |         "href": "http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3797.pdf",
142 |         "title": "Programming Languages — C++",
143 |         "publisher": "ISO/IEC JTC1 SC22 WG21"
144 |     },
145 |     "Go": {
146 |         "href": "https://golang.org/pkg/math/",
147 |         "title": "The Go Programming Language — Package math"
148 |     }
149 | }
150 | 
 151 | -------------------------------------------------------------------------------- /source/P0476r0.bs: -------------------------------------------------------------------------------- 1 | 16 | 17 | Background {#bg} 18 | ========== 19 | 20 | Low-level code often seeks to interpret objects of one type as another: keep the 21 | same bits, but obtain an object of a different type. Doing so correctly is 22 | error-prone: using `reinterpret_cast` or `union` runs afoul of type-aliasing 23 | rules yet these are the intuitive solutions developers mistakenly turn to. 24 | 25 | Attuned developers use `aligned_storage` with `memcpy`, avoiding alignment 26 | pitfalls and allowing them to bit-cast non-default-constructible types. 27 | 28 | This facility inevitably ends up being used incorrectly on pointer types, so we 29 | propose using appropriate concepts to prevent misuse. As our sample 30 | implementation demonstrates, we could as well use `static_assert` or template 31 | SFINAE, but the timing of this library feature will likely coincide with 32 | the standardization of concepts. 33 | 34 | Furthermore, it is currently impossible to implement a `constexpr` bit-cast 35 | function, as `memcpy` itself isn't `constexpr`. Marking our proposed function as 36 | `constexpr` doesn't require or prevent `memcpy` from becoming `constexpr`. This 37 | leaves implementations free to use their own internal solution (e.g. LLVM has a `bitcast` 39 | opcode). 40 | 41 | We propose to standardize this oft-used idiom, and avoid the pitfalls once and 42 | for all. 43 | 44 | Proposed Wording {#word} 45 | ================ 46 | 47 | Below, substitute the `�` character with a number the editor finds appropriate 48 | for the sub-section. 49 | 50 | Synopsis {#syn} 51 | -------- 52 | 53 | Under 20.2 Header `<utility>` synopsis [**utility**]: 54 | 55 | 56 | namespace std { 57 | // ...
 58 | 59 | // 20.2.� bit-casting: 60 | template&lt;typename To, typename From&gt; 61 | requires 62 | sizeof(To) == sizeof(From) && 63 | is_trivially_copyable_v&lt;To&gt; && 64 | is_trivially_copyable_v&lt;From&gt; && 65 | is_standard_layout_v&lt;To&gt; && 66 | is_standard_layout_v&lt;From&gt; && 67 | !(is_pointer_v&lt;From&gt; && 68 | is_pointer_v&lt;To&gt;) && 69 | !(is_member_pointer_v&lt;From&gt; && 70 | is_member_pointer_v&lt;To&gt;) && 71 | !(is_member_object_pointer_v&lt;From&gt; && 72 | is_member_object_pointer_v&lt;To&gt;) && 73 | !(is_member_function_pointer_v&lt;From&gt; && 74 | is_member_function_pointer_v&lt;To&gt;) 75 | constexpr To bit_cast(const From& from) noexcept; 76 | 77 | // ... 78 | } 79 | 80 | 81 | Details {#det} 82 | ------- 83 | 84 | Under 20.2.`�` Bit-casting [**utility.bitcast**]: 85 | 86 | 87 | template&lt;typename To, typename From&gt; 88 | requires 89 | sizeof(To) == sizeof(From) && 90 | is_trivially_copyable_v&lt;To&gt; && 91 | is_trivially_copyable_v&lt;From&gt; && 92 | is_standard_layout_v&lt;To&gt; && 93 | is_standard_layout_v&lt;From&gt; && 94 | !(is_pointer_v&lt;From&gt; && 95 | is_pointer_v&lt;To&gt;) && 96 | !(is_member_pointer_v&lt;From&gt; && 97 | is_member_pointer_v&lt;To&gt;) && 98 | !(is_member_object_pointer_v&lt;From&gt; && 99 | is_member_object_pointer_v&lt;To&gt;) && 100 | !(is_member_function_pointer_v&lt;From&gt; && 101 | is_member_function_pointer_v&lt;To&gt;) 102 | constexpr To bit_cast(const From& from) noexcept; 103 | 104 | 105 | 1. Requires: `sizeof(To) == sizeof(From)`, 106 | `is_trivially_copyable_v<To>` is `true`, 107 | `is_trivially_copyable_v<From>` is `true`, 108 | `is_standard_layout_v<To>` is `true`, 109 | `is_standard_layout_v<From>` is `true`, 110 | `is_pointer_v<From> && is_pointer_v<To>` is `false`, 111 | `is_member_pointer_v<From> && is_member_pointer_v<To>` is `false`, 112 | `is_member_object_pointer_v<From> && is_member_object_pointer_v<To>` is `false`, 113 | `is_member_function_pointer_v<From> && is_member_function_pointer_v<To>` is `false`. 114 | 115 | 2. Returns: an object of type `To` whose object representation is equal 116 | to the object representation of `From`.
If multiple object 117 | representations could represent the value 118 | representation of `From`, then it is unspecified which `To` 119 | value is returned. If no value representation corresponds 120 | to `To`'s object representation then the returned value is 121 | unspecified. 122 | 123 | Feature testing {#test} 124 | --------------- 125 | 126 | The `__cpp_lib_bit_cast` feature test macro should be added. 127 | 128 | Appendix {#appendix} 129 | ======== 130 | 131 | The Standard's [**basic.types**] section explicitly blesses `memcpy`: 132 | 133 |
134 | 135 | For any trivially copyable type `T`, if two pointers to `T` point to distinct 136 | `T` objects `obj1` and `obj2`, where neither `obj1` nor `obj2` is a base-class 137 | subobject, if the *underlying bytes* (1.7) making up `obj1` are copied into 138 | `obj2`, `obj2` shall subsequently hold the same value as `obj1`. 139 | 140 | [*Example:* 141 | ``` 142 | T* t1p; 143 | T* t2p; 144 | // provided that t2p points to an initialized object ... 145 | std::memcpy(t1p, t2p, sizeof(T)); 146 | // at this point, every subobject of trivially copyable type in *t1p contains 147 | // the same value as the corresponding subobject in *t2p 148 | ``` 149 | — *end example*] 150 | 151 |
152 | 153 | Whereas section [class.union] says: 154 | 155 |
156 | 157 | In a union, at most one of the non-static data members can be 158 | active at any time, that is, the value of at most one of the 159 | non-static data members can be stored in a union at any time. 160 | 161 |
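The `memcpy` idiom blessed by [**basic.types**] above can be sketched as follows. This is an illustration only, not the paper's sample implementation: the name `bit_cast_sketch` is hypothetical, the checks use `static_assert` rather than concepts, and for brevity the sketch assumes `To` is default-constructible (the `aligned_storage` approach mentioned in the Background lifts that restriction).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical name; a non-constexpr sketch of the bit-cast idiom.
template <typename To, typename From>
To bit_cast_sketch(const From& from) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value,
                  "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value,
                  "From must be trivially copyable");
    static_assert(std::is_default_constructible<To>::value,
                  "simplifying assumption of this sketch");
    To to;
    // Copy the underlying bytes; this is the operation [basic.types] permits.
    std::memcpy(&to, &from, sizeof(To));
    return to;
}
```

On a platform where `float` is an IEEE 754 binary32 value, `bit_cast_sketch<std::uint32_t>(1.0f)` yields `0x3F800000`. Because `memcpy` is not `constexpr`, this sketch cannot be used in constant expressions, which is the limitation the Background section describes.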
 162 | 163 | Acknowledgement {#ack} 164 | =============== 165 | 166 | Thanks to Saam Barati, Jeffrey Yasskin, and Sam Benzaquen for their early review 167 | and suggested improvements. 168 | -------------------------------------------------------------------------------- /source/p1102r0.bs: -------------------------------------------------------------------------------- 1 | 16 | 17 | Introduction and motivation {#intro} 18 | =========================== 19 | 20 | Currently, C++ lambdas with no parameters do not require a parameter declaration 21 | clause. The specification even contains this language in [**expr.prim.lambda**] 22 | section 8.4.5 ❡4: 23 | 24 | > If a lambda-expression does not include a lambda-declarator, it is as if the 25 | > lambda-declarator were `()`. 26 | 27 | This allows us to omit the unused `()` in simple lambdas such as this: 28 | 29 | 30 | std::string s1 = "abc"; 31 | auto withParen = [s1 = std::move(s1)] () { 32 | std::cout << s1 << '\n'; 33 | }; 34 | 35 | std::string s2 = "abc"; 36 | auto noSean = [s2 = std::move(s2)] { // Note no syntax error. 37 | std::cout << s2 << '\n'; 38 | }; 39 | 40 | 41 | These particular lambdas have ownership of the strings, so they ought to be able 42 | to mutate them, but `s1` and `s2` are const (because the function call operator is 43 | declared `const` by default) so we need to add the `mutable` keyword: 44 | 45 | 46 | std::string s1 = "abc"; 47 | auto withParen = [s1 = std::move(s1)] () mutable { 48 | s1 += "d"; 49 | std::cout << s1 << '\n'; 50 | }; 51 | 52 | std::string s2 = "abc"; 53 | auto noSean = [s2 = std::move(s2)] mutable { // Currently a syntax error. 54 | s2 += "d"; 55 | std::cout << s2 << '\n'; 56 | }; 57 | 58 | 59 | Confusingly, the current Standard requires the empty parens when using the 60 | `mutable` keyword. This rule is unintuitive, causes common syntax errors, and 61 | clutters our code.
When compiling with clang, we even get a syntax error that 62 | indicates the compiler knows exactly what is going on: 63 | 64 | 65 | example.cpp:11:54: error: lambda requires '()' before 'mutable' 66 | auto noSean = [s2 = std::move(s2)] mutable { // Currently a syntax error. 67 | ^ 68 | () 69 | 1 error generated. 70 | 71 | 72 | This proposal would make these parentheses unnecessary like they were before we 73 | added `mutable`. This will apply to: 74 | 75 | * lambda template parameters 76 | * `constexpr` 77 | * `mutable` 78 | * Exception specifications and `noexcept` 79 | * attributes 80 | * trailing return types 81 | * `requires` 82 | 83 | EWG discussed this change as [[EWG135]] 84 | in [Lenexa](http://wiki.edg.com/bin/view/Wg21lenexa/EWGIssuesResolutionMinutes) 85 | and voted 15 to 1 on forwarding to core. It became [[CWG2121]], discussed 86 | in 87 | [Kona](http://wiki.edg.com/bin/view/Wg21kona2015/CoreWorkingGroup#CWG_2121_More_flexible_lambda_sy) and 88 | needed someone to volunteer wording. 89 | 90 | This paper was discussed on the EWG reflector in June, Nina Ranns provided 91 | feedback, and EWG chair agreed that the paper should move to CWG directly given 92 | previous polls. 93 | 94 | 95 | Impact {#impact} 96 | ====== 97 | 98 | This change will not break existing code. 99 | 100 | 101 | Wording {#word} 102 | ======= 103 | 104 | Modify Lambda expressions [**expr.prim.lambda**] as follows: 105 | 106 |
107 | 108 | 113 | 114 | lambda-expression :
115 | lambda-introducer lambda-declarator requires-clauseopt compound-statement
116 | lambda-introducer < template-parameter-list > requires-clauseopt compound-statement
117 | lambda-introducer < template-parameter-list > requires-clauseopt
118 | lambda-declarator requires-clauseopt compound-statement
119 | lambda-introducer :
120 | [ lambda-captureopt ]
121 | lambda-declarator :
122 | ( parameter-declaration-clause )opt decl-specifier-seqopt
123 | noexcept-specifieropt attribute-specifier-seqopt trailing-return-typeopt
124 |
125 | 126 |
127 | 128 | Modify ❡4: 129 | 130 |
 131 | 132 | If a <del>*lambda-expression*</del><ins>*lambda-declarator*</ins> does not 133 | include a <del>*lambda-declarator*</del><ins>`(` *parameter-declaration-clause* 134 | `)`</ins>, it is as if the <del>*lambda-declarator*</del><ins>`(` 135 | *parameter-declaration-clause* `)`</ins> were `()`. The lambda return type is 136 | `auto`, which is replaced by the type specified by the *trailing-return-type* if 137 | provided and/or deduced from `return` statements as described in 10.1.7.4. 138 | 139 |
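As a small illustration of the rule in ❡4 as it stands before this change, the sketch below compiles under the existing grammar. The helper names `append_d` and `forty_two` are hypothetical and exist only to make the example self-contained.

```cpp
#include <cassert>
#include <string>
#include <utility>

// The init-capture owns the string, so mutating it is reasonable.
// The `()` is spelled out because `mutable` follows it; the existing
// grammar requires the parentheses here.
std::string append_d(std::string s) {
    auto appender = [s = std::move(s)]() mutable {
        s += "d";
        return s;
    };
    return appender();
}

// With nothing after the capture list, the declarator can be omitted
// and the lambda behaves as if `()` had been written.
int forty_two() {
    auto f = [] { return 42; };
    return f();
}
```

With the proposed wording, the first lambda could presumably be written as `[s = std::move(s)] mutable { ... }` with the same meaning.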
140 | 141 | Keep Closure types [**expr.prim.lambda.closure**] ❡3 as-is: 142 | 143 |
144 | 145 | The return type and function parameters of the function call operator template 146 | are derived from the *lambda-expression*'s *trailing-return-type* and 147 | *parameter-declaration-clause* by replacing each occurrence of `auto` in the 148 | *decl-specifier*s of the *parameter-declaration-clause* with the name of the 149 | corresponding invented *template-parameter*. The *requires-clause* of the 150 | function call operator template is the *requires-clause* immediately following 151 | `<` *template-parameter-list* `>`, if any. The trailing *requires-clause* of 152 | the function call operator or operator template is the *requires-clause* 153 | following the *lambda-declarator*, if any. 154 | 155 |
 156 | 157 | Note: The first sentence can remain as-is because the modification to 158 | [**expr.prim.lambda**] ❡4 creates an empty *parameter-declaration-clause* if 159 | none is provided. Similarly, the second and third sentences bind the 160 | *requires-clause* unambiguously. 161 | -------------------------------------------------------------------------------- /source/P0476r1.bs: -------------------------------------------------------------------------------- 1 | 16 | 17 | 18 | This paper is a revision of [[P0476r0]], addressing LEWG comments from the 2016 19 | Issaquah meeting. See [[#rev]] for details. 20 | 21 | 22 | Background {#bg} 23 | ========== 24 | 25 | Low-level code often seeks to interpret objects of one type as another: keep the 26 | same bits, but obtain an object of a different type. Doing so correctly is 27 | error-prone: using `reinterpret_cast` or `union` runs afoul of type-aliasing 28 | rules yet these are the intuitive solutions developers mistakenly turn to. 29 | 30 | Attuned developers use `aligned_storage` with `memcpy`, avoiding alignment 31 | pitfalls and allowing them to bit-cast non-default-constructible types. 32 | 33 | This proposal uses appropriate concepts to prevent misuse. As the sample 34 | implementation demonstrates, we could as well use `static_assert` or template 35 | SFINAE, but the timing of this library feature will likely coincide with 36 | the standardization of concepts. 37 | 38 | Furthermore, it is currently impossible to implement a `constexpr` bit-cast 39 | function, as `memcpy` itself isn't `constexpr`. Marking the proposed function as 40 | `constexpr` doesn't require or prevent `memcpy` from becoming `constexpr`. This 41 | leaves implementations free to use their own internal solution (e.g. LLVM has a `bitcast` 43 | opcode). 44 | 45 | We should standardize this oft-used idiom, and avoid the pitfalls once and for 46 | all.
 47 | 48 | 49 | Proposed Wording {#word} 50 | ================ 51 | 52 | Below, substitute the `�` character with a number the editor finds appropriate 53 | for the sub-section. 54 | 55 | Synopsis {#syn} 56 | -------- 57 | 58 | Under 20.2 Header `<utility>` synopsis [**utility**]: 59 | 60 | 61 | namespace std { 62 | // ... 63 | 64 | // 20.2.� bit-casting: 65 | template&lt;typename To, typename From&gt; 66 | requires 67 | sizeof(To) == sizeof(From) && 68 | is_trivially_copyable_v&lt;To&gt; && 69 | is_trivially_copyable_v&lt;From&gt; 70 | constexpr To bit_cast(const From& from) noexcept; 71 | 72 | // ... 73 | } 74 | 75 | 76 | Details {#det} 77 | ------- 78 | 79 | Under 20.2.`�` Bit-casting [**utility.bitcast**]: 80 | 81 | 82 | template&lt;typename To, typename From&gt; 83 | requires 84 | sizeof(To) == sizeof(From) && 85 | is_trivially_copyable_v&lt;To&gt; && 86 | is_trivially_copyable_v&lt;From&gt; 87 | constexpr To bit_cast(const From& from) noexcept; 88 | 89 | 90 | 1. Requires: `sizeof(To) == sizeof(From)`, 91 | `is_trivially_copyable_v<To>` is `true`, 92 | `is_trivially_copyable_v<From>` is `true`. 93 | 94 | 2. Returns: an object of type `To` whose object representation is equal 95 | to the object representation of `From`. If multiple object 96 | representations could represent the value 97 | representation of `From`, then it is unspecified which `To` 98 | value is returned. If no value representation corresponds 99 | to `To`'s object representation then the returned value is 100 | unspecified. 101 | 102 | Feature testing {#test} 103 | --------------- 104 | 105 | The `__cpp_lib_bit_cast` feature test macro should be added. 106 | 107 | Appendix {#appendix} 108 | ======== 109 | 110 | The Standard's [**basic.types**] section explicitly blesses `memcpy`: 111 | 112 |
113 | 114 | For any trivially copyable type `T`, if two pointers to `T` point to distinct 115 | `T` objects `obj1` and `obj2`, where neither `obj1` nor `obj2` is a base-class 116 | subobject, if the *underlying bytes* (1.7) making up `obj1` are copied into 117 | `obj2`, `obj2` shall subsequently hold the same value as `obj1`. 118 | 119 | [*Example:* 120 | ``` 121 | T* t1p; 122 | T* t2p; 123 | // provided that t2p points to an initialized object ... 124 | std::memcpy(t1p, t2p, sizeof(T)); 125 | // at this point, every subobject of trivially copyable type in *t1p contains 126 | // the same value as the corresponding subobject in *t2p 127 | ``` 128 | — *end example*] 129 | 130 |
131 | 132 | Whereas section [class.union] says: 133 | 134 |
135 | 136 | In a union, at most one of the non-static data members can be 137 | active at any time, that is, the value of at most one of the 138 | non-static data members can be stored in a union at any time. 139 | 140 |
141 | 142 | 143 | Revision History {#rev} 144 | ================ 145 | 146 | r0 ➡ r1 {#r0r1} 147 | -------- 148 | 149 | The paper was reviewed by LEWG at the 2016 Issaquah meeting: 150 | 151 | * Remove the standard layout requirement: trivially copyable suffices for the `memcpy` requirement. 152 | * We discussed removing `constexpr`, but there was no consensus either way. There was some suggestion that it’ll be hard for implementers, but there's also some desire (by the same implementers) to have those features available in order to support things like `constexpr` instances of `std::variant`. 153 | * The pointer-forbidding logic was removed. It was initially there to help developers when a better tool is available, but it's easily worked around (e.g. with a `struct` containing a pointer). Note that this doesn't prevent `constexpr` versions of `bit_cast`: the implementation is allowed to error out on `bit_cast` of a pointer. 154 | * There was some discussion about the use of concepts, but it seems like mostly an LWG issue and we're reasonably sure that concepts will land before this or in a compatible vehicle. 155 | 156 | Straw polls: 157 | 158 | * Do we want to see [[P0476r0]] again? Unanimous consent. 159 | * `bit_cast` should allow pointer types in `To` and `From`. **SF F N A SA** 4 5 4 2 1 160 | * `bit_cast` should be `constexpr`? **SF F N A SA** 4 3 7 2 3 161 | 162 | 163 | Acknowledgement {#ack} 164 | =============== 165 | 166 | Thanks to Saam Barati, Jeffrey Yasskin, and Sam Benzaquen for their early review 167 | and suggested improvements. 168 | -------------------------------------------------------------------------------- /source/P0502r0.bs: -------------------------------------------------------------------------------- 1 | 18 | 19 | Background {#bg} 20 | ========== 21 | 22 | The Standard was simplified in [[P0394r4]]: exceptions leaving parallel algorithms lead to `std::terminate()` being called.
This matches the behavior of exceptions leaving `main()` as well as `std::thread()`. 23 | 24 | The following National Body comments from [[P0488R0]] were discussed in SG1 at Issaquah, along with [[p0451r0]]: 25 | 26 | * US 15, US 167: Don't `terminate()` when a parallel algorithm exits via uncaught exception and either re-add `exception_list`, add `noexcept` policies + re-add `exception_list`, make it UB or throw an unspecified exception (revert [[P0394r4]]). 27 | * US 17, US 169: Don't `terminate()` when a parallel algorithm exits via uncaught exception and re-add `exception_list` (revert [[P0394r4]]). 28 | * US 16, US 168: Clarify which exception is thrown when a parallel algorithm exits via uncaught exception. 29 | * US 170: Add a customization point for `ExecutionPolicy`s which defines their exception handling behavior (don't re-add `exception_list`). 30 | * CA 17: Preserve the `terminate()`-on-uncaught-exception behavior in the parallel algorithms (keep [[P0394r4]]). 31 | 32 | Straw Polls {#straw} 33 | ----------- 34 | 35 | The following straw polls were taken: 36 | 37 | **Straw Poll A:** In 25.2.4 ❡2, have uncaught exception behavior be defined by `ExecutionPolicy`. In 20.19 define the behavior for the three standard policies in C++17 (`seq`, `par`, `par_unseq`) as `terminate()`. 38 | 39 | 40 | 41 | 42 |
**SF** **F** **N** **A** **SA**
Many 7 1 1 0
43 | 44 | ⟹ Consensus to write a paper for this before the end of the week. Bryce, JF, and Carter will write it. 45 | 46 | **Straw Poll B:** Do we want to rename the policies to reflect the fact that they call `terminate()` instead of throwing exceptions. 47 | 48 | 49 | 50 | 51 |
**SF** **F** **N** **A** **SA**
1 7 9 6 7
52 | 53 | ⟹ No consensus for change. 54 | 55 | **Straw Poll C:** Beyond the changes from the first straw poll, additional changes are required. 56 | 57 | 58 | 59 | 60 |
**SF****F****N****A****SA**
2010116
61 | 62 | ⟹ No consensus for change. 63 | 64 | Action {#boom} 65 | ------ 66 | 67 | This paper follows the guidance from *straw poll A*: there is no behavior change, but the behavior is specified to allow future execution policies which exhibit different behavior. 68 | 69 | 70 | Proposed Wording {#word} 71 | ================ 72 | 73 | Apply the following edits to section 15.5.1 ❡1 note, bullet 1.13: 74 | 75 |
76 | 77 | 15.5.1 The `std::terminate()` function [**except.terminate**] 78 | 79 | 1. In some situations exception handling must be abandoned for less subtle error handling techniques. [ *Note:* These situations are: 80 | 81 | […] 82 | 83 | (1.13) — for parallel algorithms whose `ExecutionPolicy` specifies such behavior (20.19.4, 20.19.5, 20.19.6), when execution of an element access function (25.2.1) of a parallel algorithm exits via an exception (25.2.4), or 84 | 85 | […] 86 | 87 | *— end note* ] 88 | 89 |
90 | 91 | Apply the following edits to section 20.19: 92 | 93 |
94 | 95 | 20.19.4 Sequential execution policy [**execpol.seq**] 96 | 97 | class execution::sequenced_policy { unspecified }; 98 | 99 | 1. The class `execution::sequenced_policy` is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and require that a parallel algorithm’s execution may not be parallelized. 100 | 2. During the execution of a parallel algorithm with the `execution::sequenced_policy` policy, if the invocation of an element access function exits via an uncaught exception, `terminate()` shall be called. 101 | 102 | 20.19.5 Parallel execution policy [**execpol.par**] 103 | 104 | class execution::parallel_policy { unspecified }; 105 | 106 | 1. The class `execution::parallel_policy` is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm’s execution may be parallelized. 107 | 2. During the execution of a parallel algorithm with the `execution::parallel_policy` policy, if the invocation of an element access function exits via an uncaught exception, `terminate()` shall be called. 108 | 109 | 20.19.6 Parallel+Vector execution policy [**execpol.vec**] 110 | 111 | class execution::parallel_unsequenced_policy { unspecified }; 112 | 113 | 1. The class `execution::parallel_unsequenced_policy` is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm’s execution may be parallelized and vectorized. 114 | 2. During the execution of a parallel algorithm with the `execution::parallel_unsequenced_policy` policy, if the invocation of an element access function exits via an uncaught exception, `terminate()` shall be called. 115 | 116 |
117 | 118 | Apply the following edits to section 25.2.4 [**algorithms.parallel.exceptions**] ❡2: 119 | 120 |
121 | 122 | During the execution of a parallel algorithm, if the invocation of an element access function exits via an uncaught exception, <ins>the behavior is determined by the `ExecutionPolicy`.</ins><del>`terminate()` is called.</del> 123 | 124 |
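Under this wording the three standard policies still call `terminate()` when an element access function exits via an exception, so code that wants to observe a failure has to handle it before it escapes. The following is a minimal sketch of one way to do that, not something this paper proposes: `first_error`, `guarded`, and `demo_sum` are hypothetical names. The demonstration uses the `std::for_each` overload without an execution policy so that it doesn't depend on `<execution>` support. Because the helper locks a mutex, it would only be suitable for `seq` and `par`, not `par_unseq`.

```cpp
#include <algorithm>
#include <cassert>
#include <exception>
#include <mutex>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: remembers the first exception captured by any invocation.
class first_error {
    std::mutex m_;
    std::exception_ptr e_;
public:
    void capture(std::exception_ptr e) {
        std::lock_guard<std::mutex> lock(m_);
        if (!e_) e_ = std::move(e);
    }
    bool failed() {
        std::lock_guard<std::mutex> lock(m_);
        return static_cast<bool>(e_);
    }
    void rethrow_if_failed() {
        std::exception_ptr e;
        {
            std::lock_guard<std::mutex> lock(m_);
            e = e_;
        }
        if (e) std::rethrow_exception(e);
    }
};

// Hypothetical wrapper: returns a noexcept callable, so no exception can
// leave the element access function and trigger terminate().
template <typename F>
auto guarded(first_error& err, F f) {
    return [&err, f](auto&& x) mutable noexcept {
        try {
            f(std::forward<decltype(x)>(x));
        } catch (...) {
            err.capture(std::current_exception());
        }
    };
}

// Sums the non-negative elements; returns false if any element was rejected.
inline bool demo_sum(const std::vector<int>& v, int& sum) {
    first_error err;
    sum = 0;
    std::for_each(v.begin(), v.end(), guarded(err, [&sum](int x) {
        if (x < 0) throw std::invalid_argument("negative element");
        sum += x;
    }));
    return !err.failed();
}
```

After the algorithm returns, the caller can inspect `failed()` or call `rethrow_if_failed()` on the calling thread, which recovers ordinary exception handling outside the algorithm.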
125 | 126 | 127 | Acknowledgement {#ack} 128 | =============== 129 | 130 | Thank you to all SG1 participants: David Sankel, Alisdair Meredith, Hartmut Kaiser, Pablo Halpern, Jared Hoberock, Michael Wong, Pete Becker. Special thanks to the scribe Paul McKenney. 131 | -------------------------------------------------------------------------------- /source/P0418r1.bs: -------------------------------------------------------------------------------- 1 | 17 | 18 | Background {#bg} 19 | ========== 20 | 21 | [[LWG2445]] was discussed and resolved by SG1 in Urbana. 22 | 23 | LWG issue #2445 {#issue} 24 | --------------- 25 | 26 |
27 | 28 | The definitions of compare and exchange in [util.smartptr.shared.atomic] ¶32 29 | and [atomics.types.operations.req] ¶21 state: 30 | 31 |
32 | 33 | Requires: The failure argument shall not be `memory_order_release` nor 34 | `memory_order_acq_rel`. The failure argument shall be no stronger than the 35 | success argument. 36 | 37 |
38 | 39 | The term "stronger" isn't defined by the standard. 40 | 41 | It is hinted at by [atomics.types.operations.req] ¶22: 42 | 43 |
44 | 45 | When only one `memory_order` argument is supplied, the value of `success` is 46 | `order`, and the value of `failure` is `order` except that a value of 47 | `memory_order_acq_rel` shall be replaced by the value `memory_order_acquire` 48 | and a value of `memory_order_release` shall be replaced by the value 49 | `memory_order_relaxed`. 50 | 51 |
52 | 53 | Should the standard define a partial ordering for memory orders, where consume 54 | and acquire are incomparable with release? 55 | 56 |
57 | 58 | Proposed SG1 resolution from Urbana {#old-res} 59 | ----------------------------------- 60 | 61 | Add the following note: 62 | 63 |
64 | 65 | [Note: Memory orders have the following relative strengths implied by their 66 | definitions: 67 | 68 |
 69 |     T: relaxed
 70 |     Choice:
 71 |         T: release
 72 |         Sequence:
 73 |             T: consume
 74 |             T: acquire
 75 |     T: acq_rel
 76 |     T: seq_cst
 77 | 
78 | 79 | —end note] 80 | 81 |
82 | 83 | Further issue {#moar} 84 | ------------- 85 | 86 | Nonetheless: 87 | 88 | * The resolution isn't on the LWG tracker. 89 | * The proposed note was never moved to the draft Standard. 90 | 91 | Furthermore, the resolution which SG1 came to in Urbana resolves what "stronger" 92 | means by specifying a lattice, but isn't clear on what "The failure argument 93 | shall be no stronger than the success argument" means given the lattice. 94 | 95 | There is no relationship, "stronger" or otherwise, between release and 96 | consume/acquire. The current wording says "shall be no stronger" which isn't the 97 | same as "shall not be stronger" in this context. Is that on purpose? At a 98 | minimum it's not clear and should be clarified. 99 | 100 | Should the following be valid: 101 | 102 | ``` 103 | compare_exchange_strong(x, y, z, memory_order_release, memory_order_acquire); 104 | ``` 105 | 106 | Or does the code need to be: 107 | 108 | ``` 109 | compare_exchange_strong(x, y, z, memory_order_acq_rel, memory_order_acquire); 110 | ``` 111 | 112 | Similar questions can be asked for `memory_order_consume` ordering on `failure`. 113 | 114 | Is there even a point in restricting `success`/`failure` orderings? On 115 | architectures with load-linked/store-conditional instructions the load and store 116 | are distinct instructions which can each have their own memory ordering (with 117 | appropriate leading/trailing fences if required), whereas architectures with 118 | compare-and-exchange already have a limited set of instructions to choose 119 | from. The current limitation (assuming [[LWG2445]] is resolved) only seems to 120 | restrict compilers on load-linked/store-conditional architectures.
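For concreteness, the call shapes above can be placed in a complete function. This sketch uses the `memory_order_acq_rel`/`memory_order_acquire` pairing, which satisfies the existing wording under any reading of "stronger". The name `try_claim` and the slot protocol are hypothetical, chosen only to give the two orderings something to do.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical protocol: a slot holds 0 when free, otherwise its owner's id.
// Returns true if `id` claimed the slot; either way `seen` ends up holding
// the current owner.
inline bool try_claim(std::atomic<int>& slot, int id, int& seen) {
    int expected = 0;
    const bool ok = slot.compare_exchange_strong(
        expected, id,
        std::memory_order_acq_rel,   // success: atomic read-modify-write
        std::memory_order_acquire);  // failure: atomic load only
    // On failure, compare_exchange_strong wrote the observed value to `expected`.
    seen = ok ? id : expected;
    return ok;
}
```

Replacing the `success` ordering with `memory_order_release` gives the first form questioned above; whether that is permitted is exactly what the current wording leaves unclear.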
121 | 122 | The following code could be valid if the stored data didn't need to be published 123 | nor ordered, whereas any retry needs to read additional data: 124 | 125 | ``` 126 | compare_exchange_strong(x, y, z, memory_order_relaxed, memory_order_acquire); 127 | ``` 128 | 129 | Even if, for lack of a clever instruction, architectures cannot take advantage of 130 | such code, compilers are able to optimize atomics in all sorts of clever ways as 131 | discussed in [[N4455]]. 132 | 133 | Updated proposal {#new-res} 134 | ================ 135 | 136 | This paper proposes removing the "stronger" restrictions between 137 | compare-exchange's `success` and `failure` ordering, and doesn't add a lattice 138 | to order atomic orderings. The only remaining restriction is that 139 | `memory_order_release` and `memory_order_acq_rel` for `failure` are still 140 | disallowed: a failed compare-exchange doesn't store, so the current model is 141 | not sensible with these orderings. 142 | 143 | There have been discussions about `memory_order_release` loads, e.g. for 144 | seqlock. Such potential changes are left up to future papers. 145 | 146 | Modify [util.smartptr.shared.atomic] ¶32 as follows: 147 | 148 |
149 | 150 | Requires: The failure argument shall not be `memory_order_release` nor 151 | `memory_order_acq_rel`. <del>The failure argument shall be no stronger than 152 | the success argument.</del> 153 | 154 |
155 | 156 | Modify [atomics.types.operations.req] ¶21 as follows: 157 | 158 |
159 | 160 | Requires: The failure argument shall not be `memory_order_release` nor 161 | `memory_order_acq_rel`. <del>The failure argument shall be no stronger than 162 | the success argument.</del> 163 | 164 |
165 | 166 | Leave [atomics.types.operations.req] ¶22 as-is: 167 | 168 |
169 | 170 | Effects: Atomically, compares the contents of the memory pointed to by 171 | `object` or by `this` for equality with that in `expected`, and if `true`, 172 | replaces the contents of the memory pointed to by `object` or by `this` with 173 | that in `desired`, and if `false`, updates the contents of the memory in 174 | `expected` with the contents of the memory pointed to by `object` or by 175 | `this`. Further, if the comparison is `true`, memory is affected according to 176 | the value of `success`, and if the comparison is `false`, memory is affected 177 | according to the value of `failure`. 178 | 179 | When only one `memory_order` argument is supplied, the value of `success` is 180 | `order`, and the value of `failure` is `order` except that a value of 181 | `memory_order_acq_rel` shall be replaced by the value `memory_order_acquire` 182 | and a value of `memory_order_release` shall be replaced by the value 183 | `memory_order_relaxed`. 184 | 185 | If the operation returns `true`, these operations are atomic read-modify-write 186 | operations (1.10). Otherwise, these operations are atomic load operations. 187 | 188 |
189 | 190 | Acknowledgement {#ack} 191 | =============== 192 | 193 | Thanks to John McCall for pointing out that the proposed resolution was still 194 | insufficient, and for providing ample feedback. 195 | -------------------------------------------------------------------------------- /source/P0418r2.bs: -------------------------------------------------------------------------------- 1 | 17 | 18 | Background {#bg} 19 | ========== 20 | 21 | [[LWG2445]] was discussed and resolved by SG1 in Urbana. 22 | 23 | This revision updates [[P0418r1]] with accurate wording for 24 | [util.smartptr.shared.atomic] ¶32, to be deleted from [[N4606]]. 25 | 26 | LWG issue #2445 {#issue} 27 | --------------- 28 | 29 |
30 | 31 | The definitions of compare and exchange in [util.smartptr.shared.atomic] 32 | ¶32 and [atomics.types.operations.req] ¶21 state: 33 | 34 |
35 | 36 | Requires: The failure argument shall not be `memory_order_release` nor 37 | `memory_order_acq_rel`. The failure argument shall be no stronger than the 38 | success argument. 39 | 40 |
41 | 42 | The term "stronger" isn't defined by the standard. 43 | 44 | It is hinted at by [atomics.types.operations.req] ¶22: 45 | 46 |
47 | 48 | When only one `memory_order` argument is supplied, the value of `success` is 49 | `order`, and the value of `failure` is `order` except that a value of 50 | `memory_order_acq_rel` shall be replaced by the value `memory_order_acquire` 51 | and a value of `memory_order_release` shall be replaced by the value 52 | `memory_order_relaxed`. 53 | 54 |
55 | 56 | Should the standard define a partial ordering for memory orders, where consume 57 | and acquire are incomparable with release? 58 | 59 |
60 | 61 | Proposed SG1 resolution from Urbana {#old-res} 62 | ----------------------------------- 63 | 64 | Add the following note: 65 | 66 |
67 | 68 | [ *Note:* Memory orders have the following relative strengths implied by their 69 | definitions: 70 | 71 |
 72 |     T: relaxed
 73 |     Choice:
 74 |         T: release
 75 |         Sequence:
 76 |             T: consume
 77 |             T: acquire
 78 |     T: acq_rel
 79 |     T: seq_cst
 80 | 
81 | 82 | *—end note* ] 83 | 84 |
85 | 86 | Further issue {#moar} 87 | ------------- 88 | 89 | Nonetheless: 90 | 91 | * The resolution isn't on the LWG tracker. 92 | * The proposed note was never moved to the draft Standard. 93 | 94 | Furthermore, the resolution which SG1 came to in Urbana resolves what "stronger" 95 | means by specifying a lattice, but isn't clear on what "The failure argument 96 | shall be no stronger than the success argument" means given the lattice. 97 | 98 | There is no relationship, "stronger" or otherwise, between release and 99 | consume/acquire. The current wording says "shall be no stronger" which isn't the 100 | same as "shall not be stronger" in this context. Is that on purpose? At a 101 | minimum it's not clear and should be clarified. 102 | 103 | Should the following be valid: 104 | 105 | ``` 106 | compare_exchange_strong(x, y, z, memory_order_release, memory_order_acquire); 107 | ``` 108 | 109 | Or does the code need to be: 110 | 111 | ``` 112 | compare_exchange_strong(x, y, z, memory_order_acq_rel, memory_order_acquire); 113 | ``` 114 | 115 | Similar questions can be asked for `memory_order_consume` ordering on `failure`. 116 | 117 | Is there even a point in restricting `success`/`failure` orderings? On 118 | architectures with load-linked/store-conditional instructions the load and store 119 | are distinct instructions which can each have their own memory ordering (with 120 | appropriate leading/trailing fences if required), whereas architectures with 121 | compare-and-exchange already have a limited set of instructions to choose 122 | from. The current limitation (assuming [[LWG2445]] is resolved) only seems to 123 | restrict compilers on load-linked/store-conditional architectures.
124 | 125 | The following code could be valid if the stored data didn't need to be published 126 | nor ordered, whereas any retry needs to read additional data: 127 | 128 | ``` 129 | compare_exchange_strong(x, y, z, memory_order_relaxed, memory_order_acquire); 130 | ``` 131 | 132 | Even if, for lack of a clever instruction, architectures cannot take advantage of 133 | such code, compilers are able to optimize atomics in all sorts of clever ways as 134 | discussed in [[N4455]]. 135 | 136 | Updated proposal {#new-res} 137 | ================ 138 | 139 | This paper proposes removing the "stronger" restrictions between 140 | compare-exchange's `success` and `failure` ordering, and doesn't add a lattice 141 | to order atomic orderings. The only remaining restriction is that 142 | `memory_order_release` and `memory_order_acq_rel` for `failure` are still 143 | disallowed: a failed compare-exchange doesn't store, so the current model is 144 | not sensible with these orderings. 145 | 146 | There have been discussions about `memory_order_release` loads, e.g. for 147 | seqlock. Such potential changes are left up to future papers. 148 | 149 | Modify [util.smartptr.shared.atomic] ¶32 as follows: 150 | 151 |
152 | 153 | Requires: The failure argument shall not be 154 | `memory_order_release`<del>,</del> <ins>nor</ins> `memory_order_acq_rel`<del>, 155 | or stronger than success</del>. 156 | 157 |
158 | 159 | Modify [atomics.types.operations.req] ¶21 as follows: 160 | 161 |
162 | 163 | Requires: The failure argument shall not be `memory_order_release` nor 164 | `memory_order_acq_rel`. <del>The failure argument shall be no stronger than 165 | the success argument.</del> 166 | 167 |
168 | 169 | Leave [atomics.types.operations.req] ¶22 as-is: 170 | 171 |
172 | 173 | Effects: Atomically, compares the contents of the memory pointed to by 174 | `object` or by `this` for equality with that in `expected`, and if `true`, 175 | replaces the contents of the memory pointed to by `object` or by `this` with 176 | that in `desired`, and if `false`, updates the contents of the memory in 177 | `expected` with the contents of the memory pointed to by `object` or by 178 | `this`. Further, if the comparison is `true`, memory is affected according to 179 | the value of `success`, and if the comparison is `false`, memory is affected 180 | according to the value of `failure`. 181 | 182 | When only one `memory_order` argument is supplied, the value of `success` is 183 | `order`, and the value of `failure` is `order` except that a value of 184 | `memory_order_acq_rel` shall be replaced by the value `memory_order_acquire` 185 | and a value of `memory_order_release` shall be replaced by the value 186 | `memory_order_relaxed`. 187 | 188 | If the operation returns `true`, these operations are atomic read-modify-write 189 | operations (1.10). Otherwise, these operations are atomic load operations. 190 | 191 |
192 | 193 | Acknowledgement {#ack} 194 | =============== 195 | 196 | Thanks to John McCall for pointing out that the proposed resolution was still 197 | insufficient, and for providing ample feedback. 198 | -------------------------------------------------------------------------------- /source/p1119r0.bs: -------------------------------------------------------------------------------- 1 | 20 | 21 | Wording {#word} 22 | ======= 23 | 24 | [[P0154R1]] introduced `constexpr std::hardware_{constructive,destructive}_interference_size` to C++17: 25 | 26 | Header `<new>` synopsis [**new.syn**]: 27 | 28 |
29 | 30 | 31 | 32 | namespace std { 33 | // ... 34 | // 21.6.5, hardware interference size 35 | inline constexpr size_t hardware_destructive_interference_size = implementation-defined; 36 | inline constexpr size_t hardware_constructive_interference_size = implementation-defined; 37 | // ... 38 | } 39 | 40 | 41 | 42 |
43 | 44 | Hardware interference size [**hardware.interference**]: 45 | 46 |
47 | 48 | inline constexpr size_t hardware_destructive_interference_size = implementation-defined; 49 | 50 | This number is the minimum recommended offset between two concurrently-accessed 51 | objects to avoid additional performance degradation due to contention introduced 52 | by the implementation. It shall be at least `alignof(max_align_t)`. 53 | 54 | [ *Example*: 55 | 56 | 57 | struct keep_apart { 58 | alignas(hardware_destructive_interference_size) atomic<int> cat; 59 | alignas(hardware_destructive_interference_size) atomic<int> dog; 60 | }; 61 | 62 | 63 | — *end example* ] 64 | 65 | inline constexpr size_t hardware_constructive_interference_size = implementation-defined; 66 | 67 | This number is the maximum recommended size of contiguous memory occupied by 68 | two objects accessed with temporal locality by concurrent threads. It shall be 69 | at least `alignof(max_align_t)`. 70 | 71 | [ *Example*: 72 | 73 | 74 | struct together { 75 | atomic<int> dog; 76 | int puppy; 77 | }; 78 | struct kennel { 79 | // Other data members... 80 | alignas(sizeof(together)) together pack; 81 | // Other data members... 82 | }; 83 | static_assert(sizeof(together) <= hardware_constructive_interference_size); 84 | 85 | 86 | — *end example* ] 87 | 88 |
89 | 90 | Discussions {#discussions} 91 | =========== 92 | 93 | The paper was discussed in: 94 | 95 | * [SG1 Kona](http://wiki.edg.com/bin/view/Wg21kona2015/N4523) 96 | * [LEWG Kona](http://wiki.edg.com/bin/view/Wg21kona2015/P0154) 97 | * [LEWG Jacksonville](http://wiki.edg.com/bin/view/Wg21jacksonville/P0154) 98 | * [LWG Jacksonville](http://wiki.edg.com/bin/view/Wg21jacksonville/D0154R1) 99 | 100 | ABI issues were considered in these discussions, and the committee decided that 101 | having these values was worth the potential pain points. ABI issues can arise as 102 | follows: 103 | 104 | 1. A developer asks the compiler to generate code for multiple targets of the 105 | same ISA, and these targets prefer different interference sizes. 106 | 1. A developer indicates that code should be generated for a heterogeneous system 107 | (such as CPU and GPU) whose components prefer different interference sizes. 108 | 1. A developer uses different compilers, and links the result together. 109 | 110 | A further ABI issue was added by [[P0607r0]] by making the variables `inline`: 111 | in case 1. above the interference size values differ between translation units, 112 | which is a problem if they are used in an ODR-relevant context. That paper noted: 113 | 114 |
115 | 116 | [*Drafting notes*: The removal of the explicit `static` specifier for the 117 | namespace-scope constants `hardware_destructive_interference_size` and 118 | `hardware_constructive_interference_size` is still required because adding 119 | `inline` alone would still not solve the ODR violation problem here. 120 | — *end drafting notes*] 121 | 122 |
123 | 124 | This change indeed fixes the ODR issue where two translation units translated 125 | with the same interference size values may violate ODR when used with e.g. 126 | `std::max`. It however introduces a new ODR issue for case 1. above. 127 | 128 | Richard Smith and Tim Song propose changing the definition to: 129 | 130 | 131 | static constexpr const std::size_t& hardware_destructive_interference_size = implementation-defined; 132 | static constexpr const std::size_t& hardware_constructive_interference_size = implementation-defined; 133 | 134 | 135 | We propose a discussion and poll on this topic. 136 | 137 | 138 | Pushback {#push} 139 | ======== 140 | 141 | The maintainers of clang and GCC 142 | have 143 | [discussed an implementation strategy](http://lists.llvm.org/pipermail/cfe-dev/2018-May/058073.html), 144 | but received pushback based on the above ABI issues. The messaging from the 145 | committee wasn't clear that ABI issues were discussed and the proposal accepted 146 | despite these issues. Some implementors worry that this type of ABI problem is 147 | difficult or impossible to warn about. 148 | 149 | Some implementors are worried that they have the following choices when 150 | implementing, and are unsure which approach to take: 151 | 152 | 1. Pick a value once for each ABI and cast it in stone forever, even if 153 | microarchitectural revisions cause the values to change. 154 | 1. Change the value between microarchitectures, even though that's an ABI 155 | break. 156 | 1. Something else. 157 | 158 | The authors believe that the ABI issues are acceptable because: 159 | 160 | * As demonstrated in the original paper, developers already write code like 161 | this, using macros. Any ABI issue that exists with this proposal already 162 | existed before the proposal. 163 | * Many uses of these values have no ABI breakage potential because they only 164 | target one variant of one ISA. 165 | * The usecase for these values is to lay out datastructures.
These 166 | datastructures shouldn't be shared across translation units which follow 167 | different ABIs. 168 | * Similar ABI issues already exist with `max_align_t` and `intmax_t`. 169 | * Implementations can offer compiler flags which specifically control ABI. For 170 | example, `-mcpu` could keep the ABI stable, but `-mcpu-abi` would change it. 171 | 172 | Polls {#polls} 173 | ===== 174 | 175 | We propose the following poll for SG1: 176 | 177 | > The committee understands the ABI issues with `std::hardware_{constructive,destructive}_interference_size`, yet chooses to standardize these values nonetheless. 178 | 179 | The committee could also consider adding a note to point out ABI issues with 180 | these values. This would be a novel note, since ABI isn't discussed in the 181 | Standard. 182 | 183 | We propose the following poll for SG1, LEWG, and LWG: 184 | 185 | > Both ODR issues should be addressed, the type should therefore be changed to `static constexpr const std::size_t&`. 186 | 187 | Not all authors of this paper are in favor of this direction, but all agree the 188 | discussion is worth having. 
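The ABI concern discussed in this paper can be made concrete with a short sketch: the size and alignment of a type laid out with these values depend on the interference size, so translation units that see different values disagree about its layout. The constant `destructive_size` and its fallback of 64 are assumptions made for illustration; the feature-test macro guards against implementations that don't provide the values.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

#if defined(__cpp_lib_hardware_interference_size)
constexpr std::size_t destructive_size =
    std::hardware_destructive_interference_size;
#else
// Assumed fallback for this sketch only; not a value the Standard provides.
constexpr std::size_t destructive_size = 64;
#endif

struct keep_apart {
    alignas(destructive_size) std::atomic<int> cat;
    alignas(destructive_size) std::atomic<int> dog;
};

// Both properties below change whenever destructive_size changes, which is
// why the value is part of the ABI of any interface that exposes keep_apart.
static_assert(alignof(keep_apart) == destructive_size,
              "alignment follows the interference size");
static_assert(sizeof(keep_apart) >= 2 * destructive_size,
              "each member occupies its own interference-sized region");
```

If one translation unit were compiled with a value of 64 and another with 128, the two would compute different offsets for `dog`, which is the kind of mismatch case 1 of the discussion describes.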
189 | -------------------------------------------------------------------------------- /source/N4523.rst: -------------------------------------------------------------------------------- 1 | =================================================================== 2 | N4523 ``constexpr std::thread::hardware_{true,false}_sharing_size`` 3 | =================================================================== 4 | 5 | :Author: JF Bastien 6 | :Contact: jfb@google.com 7 | :Author: Olivier Giroux 8 | :Contact: ogiroux@nvidia.com 9 | :Date: 2015-05-21 10 | :URL: https://github.com/jfbastien/papers/blob/master/source/N4523.rst 11 | 12 | --------- 13 | Rationale 14 | --------- 15 | 16 | Starting with C++11, the library includes 17 | ``std::thread::hardware_concurrency()`` to provide an implementation quantity 18 | useful in the design of control structures in multi-threaded programs: the 19 | extent of threads that do not interfere (to the first-order). Established 20 | practice throughout the industry also relies on a second implementation 21 | quantity, used instead in the design of data structures in the same programs. 22 | This quantity is the granularity of memory that does not interfere (to the 23 | first-order), commonly referred to as the *cache-line size*. 24 | 25 | Uses of *cache-line size* fall into two broad categories: 26 | 27 | * Avoiding false-sharing between objects with temporally disjoint runtime access 28 | patterns from different threads, e.g. producer-consumer queues. 29 | * Promoting true-sharing between objects which have temporally local runtime 30 | access patterns, e.g. the ``barrier`` example, as illustrated in N4522_. 31 | 32 | .. _N4522: http://wg21.link/N4522 33 | 34 | The most significant issue with this useful implementation quantity is the 35 | questionable portability of the methods used in current practice to determine 36 | its value, despite their pervasiveness and popularity as a group.
In the 37 | appendix_ we review several different compile-time and run-time methods. The 38 | portability problem with most of these methods is that they expose a 39 | micro-architectural detail without accounting for the intent of the implementors 40 | (such as we are) over the life of the ISA or ABI. 41 | 42 | We aim to contribute a modest invention for this cause, abstractions for this 43 | quantity that can be conservatively defined for given purposes by 44 | implementations: 45 | 46 | * *False-sharing size*: a number that's suitable as an offset between two 47 | objects to likely avoid false-sharing due to different runtime access patterns 48 | from different threads. 49 | * *True-sharing size*: a number that's suitable as a limit on two objects' 50 | combined memory footprint size and base alignment to likely promote 51 | true-sharing between them. 52 | 53 | In both cases these values are provided on a quality of implementation basis, 54 | purely as hints that are likely to improve performance. These are ideal portable 55 | values to use with the ``alignas()`` keyword, for which there currently exists 56 | nearly no standard-supported portable uses. 57 | 58 | ----------------- 59 | Proposed addition 60 | ----------------- 61 | 62 | We propose adding the following to the standard: 63 | 64 | Under 30.3.1 Class ``thread`` [**thread.thread.class**]: 65 | 66 | .. code-block:: c++ 67 | 68 | namespace std { 69 | class thread { 70 | // ... 71 | public: 72 | static constexpr size_t hardware_false_sharing_size = /* implementation-defined */; 73 | static constexpr size_t hardware_true_sharing_size = /* implementation-defined */; 74 | // ... 
75 | }; 76 | } 77 | 78 | Under 30.3.1.6 ``thread`` static members [**thread.thread.static**]: 79 | 80 | ``constexpr size_t hardware_false_sharing_size = /* implementation-defined */;`` 81 | 82 | This number is the minimum recommended offset between two concurrently-accessed 83 | objects to avoid additional performance degradation due to contention introduced 84 | by the implementation. 85 | 86 | [*Example:* 87 | 88 | .. code-block:: c++ 89 | 90 | struct apart { 91 | alignas(hardware_false_sharing_size) atomic<int> flag1, flag2; 92 | }; 93 | 94 | — *end example*] 95 | 96 | ``constexpr size_t hardware_true_sharing_size = /* implementation-defined */;`` 97 | 98 | This number is the minimum recommended alignment and maximum recommended size of 99 | contiguous memory occupied by two objects accessed with temporal locality by 100 | concurrent threads. 101 | 102 | [*Example:* 103 | 104 | .. code-block:: c++ 105 | 106 | struct alignas(hardware_true_sharing_size) colocated { 107 | atomic<int> flag; 108 | int tinydata; 109 | }; 110 | static_assert(sizeof(colocated) <= hardware_true_sharing_size); 111 | 112 | — *end example*] 113 | 114 | The ``__cpp_lib_thread_hardware_sharing_size`` feature test macro should be 115 | added. 116 | 117 | .. _appendix: 118 | 119 | -------- 120 | Appendix 121 | -------- 122 | 123 | Compile-time *cache-line size* 124 | ============================== 125 | 126 | We informatively list a few ways in which the L1 *cache-line size* is obtained 127 | in different open-source projects at compile-time. 128 | 129 | The Linux kernel defines the ``__cacheline_aligned`` macro which is configured 130 | for each architecture through ``L1_CACHE_BYTES``. On some architectures this 131 | value is determined through the configure-time option 132 | ``CONFIG_<ARCH>_L1_CACHE_SHIFT``, and on others the value of ``L1_CACHE_SHIFT`` 133 | is hard-coded in the architecture's ``include/asm/cache.h`` header.
134 | 135 | Many open-source projects from Google contain a ``base/port.h`` header which 136 | defines the ``CACHELINE_ALIGNED`` macro based on an explicit list of 137 | architecture detection macros. These header files have often diverged. A token 138 | example from the autofdo_ project is: 139 | 140 | .. _autofdo: https://github.com/google/autofdo/blob/master/base/port.h 141 | 142 | .. code-block:: c++ 143 | 144 | // Cache line alignment 145 | #if defined(__i386__) || defined(__x86_64__) 146 | #define CACHELINE_SIZE 64 147 | #elif defined(__powerpc64__) 148 | // TODO(dougkwan) This is the L1 D-cache line size of our Power7 machines. 149 | // Need to check if this is appropriate for other PowerPC64 systems. 150 | #define CACHELINE_SIZE 128 151 | #elif defined(__arm__) 152 | // Cache line sizes for ARM: These values are not strictly correct since 153 | // cache line sizes depend on implementations, not architectures. There 154 | // are even implementations with cache line sizes configurable at boot 155 | // time. 156 | #if defined(__ARM_ARCH_5T__) 157 | #define CACHELINE_SIZE 32 158 | #elif defined(__ARM_ARCH_7A__) 159 | #define CACHELINE_SIZE 64 160 | #endif 161 | #endif 162 | 163 | #ifndef CACHELINE_SIZE 164 | // A reasonable default guess. Note that overestimates tend to waste more 165 | // space, while underestimates tend to waste more time. 166 | #define CACHELINE_SIZE 64 167 | #endif 168 | 169 | #define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE_SIZE))) 170 | 171 | Runtime *cache-line size* 172 | ========================= 173 | 174 | We informatively list a few ways in which the L1 *cache-line size* can be 175 | obtained on different operating systems and architectures at runtime. 176 | 177 | On OSX one would use: 178 | 179 | .. code-block:: c++ 180 | 181 | sysctlbyname("hw.cachelinesize", &cacheline_size, &sizeof_cacheline_size, 0, 0) 182 | 183 | On Windows one would use: 184 | 185 | .. 
code-block:: c++ 186 | 187 | GetLogicalProcessorInformation(&buf[0], &sizeof_buf); 188 | for (i = 0; i != sizeof_buf / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); ++i) 189 | if (buf[i].Relationship == RelationCache && buf[i].Cache.Level == 1) 190 | cacheline_size = buf[i].Cache.LineSize; 191 | 192 | On Linux one would either use: 193 | 194 | .. code-block:: c++ 195 | 196 | p = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r"); 197 | fscanf(p, "%d", &cacheline_size); 198 | 199 | or: 200 | 201 | .. code-block:: c++ 202 | 203 | sysconf(_SC_LEVEL1_DCACHE_LINESIZE); 204 | 205 | On x86 one would use the ``CPUID`` instruction with ``EAX = 80000005h``, which 206 | leaves the result in ``ECX``, which needs further work to extract. 207 | 208 | On ARM one would use ``mrs %[ctr], ctr_el0``, which needs further work to 209 | extract. 210 | -------------------------------------------------------------------------------- /source/P1018R19.bs: -------------------------------------------------------------------------------- 1 | 17 | 18 | 21 | 22 | Executive summary {#summary} 23 | ================= 24 | 25 | The Evolution Working Group did not meet in person between the February 2020 meeting in Prague and the November 2022 meeting in Kona. You will find EWG's pandemic activities in [[P1018r18]]. 26 | 27 | This paper summarizes all of the work that was performed at the November 2022 Kona meeting. 28 | 29 | Work Performed {#work} 30 | ============== 31 | 32 | This meeting was the first towards finalizing C++23; see [[P1000r4]] for the full schedule. In the ISO process, we received a variety of comments from different National Bodies. The full list is tracked as GitHub issues. EWG received 33 National Body comments. Of those, 16 were closed as duplicates, and 17 were reviewed with the following outcomes: 33 | 34 | 40 | 41 | Separately from finalizing C++23, we’ve continued early work towards C++26 and later.
We also track outstanding proposals in GitHub, including those for EWG which are ready to review. EWG and its incubator EWGI started the week with 83 papers to review (some not for the first time), so EWG had to prioritize using a variety of criteria such as the C++ Direction Group’s recommendations in [[P2000r4]]. During the week, EWG forwarded the following papers to CWG for C++26: 42 |
    43 |
  • [[P1061R0]] Structured Bindings can introduce a Pack
  • 44 |
  • [[P2361R0]] Unevaluated string literals
  • 45 |
  • [[P2014R0]] aligned allocation of coroutine frames
  • 46 |
  • [[P0609R1]] Attributes for Structured Bindings
  • 47 |
  • [[P2558R0]] Add @, $, and ` to the basic character set
  • 48 |
  • [[P2621R0]] UB? In my Lexer?
  • 49 |
  • [[P2686R0]] Updated wording and implementation experience for P1481 (constexpr structured bindings)
  • 50 |
  • [[P1967R0]] #embed - a simple, scannable preprocessor-based resource acquisition method
  • 51 |
  • [[P2593R0]] Allowing static_assert(false): To be forwarded after the next meeting unless a better proposal comes up
  • 52 |
53 | This doesn’t mean that they will all be in C++26; they are only tentatively on track for it. 54 | 55 | The following papers were reviewed and forwarded to LEWG, the library evolution group, meaning that EWG either saw no need for language input, provided language input to the library group, or requested library input to further the language work: 56 |
    57 |
  • [[P2641R0]] Checking if a union alternative is active
  • 58 |
  • [[P2546R0]] Debugging Support
  • 59 |
  • [[P0876R5]] fiber_context - fibers without scheduler
  • 60 |
  • [[P2141R0]] Aggregates are named tuples
  • 61 |
62 | 63 | The following papers were reviewed and encouraged to come back with an update: 64 |
    65 |
  • [[P0901R2]] Size feedback in operator new
  • 66 |
  • [[P2677R0]] Reconsidering concepts in-place syntax
  • 67 |
  • Pattern Matching:
  • 68 |
      69 |
    • [[P2211R0]] Exhaustiveness Checking for Pattern Matching
    • 70 |
    • [[P2169R0]] A Nice Placeholder With No Name
    • 71 |
    • [[P2392R2]] Pattern matching using is and as
    • 72 |
    • [[P2688R0]] Pattern Matching Discussion for Kona 2022
    • 73 |
    • [[P2561R1]] An error propagation operator
    • 74 |
    • [[P2656R0]] C++ Ecosystem International Standard
    • 75 |
    76 |
  • Pointer Provenance
  • 77 |
      78 |
    • [[P2188R0]] Zap the Zap: Pointers should just be bags of bits
    • 79 |
    • P2434R0 (not yet published) Nondeterministic pointer provenance
    • 80 |
    81 |
  • [[P2547R0]] Language support for customisable functions
  • 82 |
  • [[P2632R0]] A plan for better template meta programming facilities in C++26
  • 83 |
  • [[P2671R0]] Syntax choices for generalized pack declaration and usage
  • 84 |
85 | 86 | The following papers were reviewed and had no consensus for further work: 87 |
    88 |
  • [[P2669R0]] Deprecate changing kind of names in class template specializations
  • 89 |
  • [[P2174R0]] Compound Literals
  • 90 |
  • [[P2381R0]] Pattern Matching with Exception Handling
  • 91 |
92 | 93 | CWG asked for EWG feedback on: 94 |
    95 |
  • [[CWG2463]] Conditions for trivially copyable classes, the conclusion was that a paper was needed to address the issue
  • 96 |
97 | 98 | The committee also tracks defects through various groups. EWG issues were tracked in [[P1018r18]], and will shortly move to GitHub. This week we reviewed EWG issues as follows: 99 |
    100 |
  • 2 Marked Resolved
  • 101 |
  • 1 Marked as “Needs a Paper”
  • 102 |
  • 17 Closed as “Not A Defect”
  • 103 |
104 | 105 | EWG hosted an evening session on “the future of C++”. The results will be available in a few weeks, once the committee has discussed them internally based on feedback from the survey sent to attendees. The session was well attended, with 100+ participants and much frank discussion. 106 | 107 | A session on [[P2676r0]] The Val object model was held so that C++ committee members could learn about the work David Abrahams is doing at Adobe on the Val language. We separately heard from Herb Sutter on CppFront. We also had good engagement from a few folks who have worked on the Carbon programming language. As this is the C++ committee, we also often talk about languages such as Rust, Circle, Zig and others. 108 | 109 |
110 | {
111 |     "P1018r18": {
112 |         "href": "https://wg21.link/p1018r18",
113 |         "title": "C++ Language Evolution status - pandemic edition - 2022/08-2022/011",
114 |         "authors": ["JF Bastien"],
115 |         "date": "2022-10-24"
116 |     }
117 | }
118 | 
119 | -------------------------------------------------------------------------------- /source/P0528r1.bs: -------------------------------------------------------------------------------- 1 | 16 | 17 | This issue has been discussed by the authors at every recent Standards meeting, 18 | yet a full solution has been elusive despite helpful proposals. We believe that 19 | this proposal can fix this oft-encountered problem once and for all. 20 | 21 | [[P0528r0]] details extensive background on this problem (not repeated here), 22 | and proposed standardizing a trait, `has_padding_bits`, and using it on 23 | `compare_exchange_*`. This paper applies EWG guidance and simply adds a 24 | note. 25 | 26 | 27 | Edit History {#edit} 28 | ============ 29 | 30 | r0 → r1 {#r0r1} 31 | ------- 32 | 33 | In Albuquerque, EWG voted to make the padding bits of `atomic` and the incoming 34 | value of `T` have a consistent value for the purposes of read/modify/write 35 | atomic operations. 36 | 37 | Purposefully not addressed in this paper: 38 | 39 | * `union` with padding bits 40 | * Types with trap representations 41 | 42 | Proposed Wording {#word} 43 | ================ 44 | 45 | In Operations on atomic types [**atomics.types.operations**], insert a new 46 | paragraph after the note in ❡1: 47 | 48 |
49 | 50 | [*Note:* Many operations are volatile-qualified. The "volatile as device 51 | register" semantics have not changed in the standard. This qualification means 52 | that volatility is preserved when applying these operations to volatile objects. 53 | It does not mean that operations on non-volatile objects become volatile. —*end 54 | note*] 55 | 56 | 57 | 58 | Atomic operations, both through `atomic` and free-functions, can be performed 59 | on types `T` which contain bits that never participate in the object's 60 | representation. In such cases an implementation shall ensure that 61 | initialization, assignment, store, exchange, and read-modify-write operations 62 | replace bits which never participate in the object's representation with an 63 | implementation-defined value. A compatible implementation-defined value shall be 64 | used for compare-and-exchange operations' copy of the `expected` value. 65 | 66 | As a consequence, the following code is guaranteed to avoid spurious failure: 67 | 68 | 69 | 70 | struct padded { 71 | char c = 0x42; 72 | // Padding here. 73 | unsigned i = 0xC0DEFEFE; 74 | }; 75 | atomic<padded> pad = ATOMIC_VAR_INIT({}); 76 | 77 | bool success() { 78 | padded expected, desired { 0, 0 }; 79 | return pad.compare_exchange_strong(expected, desired); 80 | } 81 | 82 | 83 | 84 | [*Note:* 85 | 86 | Types which contain bits that sometimes participate in the object's 87 | representation, such as a `union` containing a type with padding bits and a 88 | type without, may always fail compare-and-exchange when these bits are not 89 | participating in the object's representation because they have an 90 | indeterminate value. Such a program is ill-formed, no diagnostic required. 91 | 92 | —*end note*] 93 | 94 | 95 | 96 |
97 | 98 | Edit ❡17 and onwards as follows: 99 | 100 |
101 | 102 | *Requires:* The `failure` argument shall not be `memory_order::release` nor 103 | `memory_order::acq_rel`. 104 | 105 | *Effects:* Retrieves the value in `expected`. Bits in the retrieved value 106 | which never participate in the object's representation are set to a value 107 | compatible to that previously stored in the atomic object. It then 108 | atomically compares the contents of the memory pointed to by `this` for equality 109 | with that previously retrieved from `expected`, and if true, replaces the 110 | contents of the memory pointed to by `this` with that in `desired`. If and only 111 | if the comparison is true, memory is affected according to the value of 112 | `success`, and if the comparison is false, memory is affected according to the 113 | value of `failure`. When only one `memory_order` argument is supplied, the value 114 | of `success` is `order`, and the value of `failure` is `order` except that a 115 | value of `memory_order::acq_rel` shall be replaced by the value 116 | `memory_order::acquire` and a value of `memory_order::release` shall be replaced 117 | by the value `memory_order::relaxed`. If and only if the comparison is false 118 | then, after the atomic operation, the contents of the memory in `expected` are 119 | replaced by the value read from the memory pointed to by `this` during the 120 | atomic comparison. If the operation returns `true`, these operations are atomic 121 | read-modify-write operations on the memory pointed to by `this`. Otherwise, 122 | these operations are atomic load operations on that memory. 123 | 124 | *Returns:* The result of the comparison. 
125 | 126 | [*Note:* 127 | 128 | For example, the effect of `compare_exchange_strong` is 129 | 130 | 131 | 132 | if (memcmp(this, &expected, sizeof(*this)) == 0) 133 | memcpy(this, &desired, sizeof(*this)); 134 | else 135 | memcpy(expected, this, sizeof(*this)); 136 | 137 | 138 | 139 | —*end note*] 140 | 141 | [*Example:* 142 | 143 | The expected use of the compare-and-exchange operations is as follows. The 144 | compare-and-exchange operations will update `expected` when another iteration 145 | of the loop is needed. 146 | 147 | 148 | 149 | expected = current.load(); 150 | do { 151 | desired = function(expected); 152 | } while (!current.compare_exchange_weak(expected, desired)); 153 | 154 | 155 | 156 | —*end example*] 157 | 158 | [*Example:* 159 | 160 | Because the expected value is updated only on failure, code releasing the 161 | memory containing the `expected` value on success will work. E.g. list head 162 | insertion will act atomically and would not introduce a data race in the 163 | following code: 164 | 165 | 166 | 167 | do { 168 | p->next = head; // make new list node point to the current head 169 | } while (!head.compare_exchange_weak(p->next, p)); // try to insert 170 | 171 | 172 | 173 | —*end example*] 174 | 175 | Implementations should ensure that weak compare-and-exchange operations do not 176 | consistently return `false` unless either the atomic object has value different 177 | from `expected` or there are concurrent modifications to the atomic object. 178 | 179 | 180 | *Remarks:* A weak compare-and-exchange operation may fail spuriously. That is, 181 | even when the contents of memory referred to by `expected` and `this` are equal, 182 | it may return `false` and store back to `expected` the same memory contents that 183 | were originally there. 184 | 185 | [*Note:* 186 | 187 | This spurious failure enables implementation of compare-and-exchange on a 188 | broader class of machines, e.g., load-locked store-conditional machines. 
A 189 | consequence of spurious failure is that nearly all uses of weak 190 | compare-and-exchange will be in a loop. When a compare-and-exchange is in a 191 | loop, the weak version will yield better performance on some platforms. When a 192 | weak compare-and-exchange would require a loop and a strong one would not, the 193 | strong one is preferable. 194 | 195 | —*end note*] 196 | 197 | [*Note:* 198 | 199 | The `memcpy` and `memcmp` semantics of the compare-and-exchange operations may 200 | result in failed comparisons for values that compare equal with `operator==` 201 | if the underlying type has padding bits which sometimes participate in 202 | the object's representation, trap bits, or alternate representations of 203 | the same value other than those caused by padding bits which never 204 | participate in the object's representation. 205 | 206 | —*end note*] 207 | 208 |
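As a rough illustration of the padding discussed above (not part of the proposed wording), the following sketch uses the C++17 trait `std::has_unique_object_representations` to flag types whose object representation contains bits outside the value representation. The helper `may_have_padding` and the struct `dense` are hypothetical names introduced here, and the trait can also report `false` for reasons other than padding (for example floating-point members), so this is a conservative check under those assumptions rather than the `has_padding_bits` trait that [[P0528r0]] proposed.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// Mirrors the `padded` example above: most ABIs insert padding after `c`.
struct padded {
    char c = 0x42;
    unsigned i = 0xC0DEFEFE;
};

// Hypothetical counterpart whose members leave no room for padding.
struct dense {
    unsigned a = 0;
    unsigned b = 0;
};

// Hypothetical helper: conservatively reports whether a bytewise comparison
// of two T objects could disagree with a comparison of their values.
template <typename T>
constexpr bool may_have_padding() {
    static_assert(std::is_trivially_copyable_v<T>,
                  "std::atomic<T> requires a trivially copyable T");
    return !std::has_unique_object_representations_v<T>;
}

// Compare-and-exchange on a type without padding: bytewise and value
// comparison agree, so a strong exchange against the stored value succeeds.
inline bool exchange_dense() {
    std::atomic<dense> d{dense{}};
    dense expected{};     // a = 0, b = 0, same as the stored value
    dense desired{1, 2};
    const bool ok = d.compare_exchange_strong(expected, desired);
    assert(!ok || d.load().a == 1);  // on success the stored value is `desired`
    return ok;
}
```

A type for which `may_have_padding` returns `true` is one where the guarantee described in this paper matters; for types such as `dense` the `memcmp`-style description is already sufficient.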
209 | -------------------------------------------------------------------------------- /source/P0528r2.bs: -------------------------------------------------------------------------------- 1 | 16 | 17 | This issue has been discussed by the authors at every recent Standards meeting, 18 | yet a full solution has been elusive despite helpful proposals. We believe that 19 | this proposal can fix this oft-encountered problem once and for all. 20 | 21 | [[P0528r0]] details extensive background on this problem (not repeated here), 22 | and proposed standardizing a trait, `has_padding_bits`, and using it on 23 | `compare_exchange_*`. [[P0528r1]] applied EWG guidance and simply added 24 | wording directing implementations to ensure that the desired behavior occurs. At 25 | SG1's request this paper follows EWG's guidance but uses different wording. 26 | 27 | 28 | Edit History {#edit} 29 | ============ 30 | 31 | r1 → r2 {#r1r2} 32 | ------- 33 | 34 | In Jacksonville, SG1 supported the paper but suggested a different way to 35 | approach the wording than the one EWG proposed in Albuquerque: don't talk about 36 | contents of the memory, but rather discuss the value representation to describe 37 | compare-and-exchange. This paper follows SG1's guidance and offers different 38 | wording, with the intent that the semantics be equivalent. EWG reviewed the 39 | updated wording and voted to support it and forward it to Core. 40 | 41 | r0 → r1 {#r0r1} 42 | ------- 43 | 44 | In Albuquerque, EWG voted to make the padding bits of `atomic` and the incoming 45 | value of `T` have a consistent value for the purposes of read/modify/write 46 | atomic operations. 47 | 48 | Purposefully not addressed in this paper: 49 | 50 | * `union` with padding bits 51 | * Types with trap representations 52 | 53 | Proposed Wording {#word} 54 | ================ 55 | 56 | Edit ❡17 and onwards as follows: 57 | 58 |
59 | 60 | *Requires:* The `failure` argument shall not be `memory_order::release` nor 61 | `memory_order::acq_rel`. 62 | 63 | *Effects:* Retrieves the value in `expected`. It then atomically compares 64 | the <del>contents of the memory pointed to by `this`</del><ins>value representation 65 | of `*this`</ins> for equality with that previously retrieved from `expected`, 66 | and if true, replaces the <del>contents of the memory pointed to 67 | by `this`</del><ins>value representation of `*this`</ins> with that in `desired`. If 68 | and only if the comparison is true, memory is affected according to the value of 69 | `success`, and if the comparison is false, memory is affected according to the 70 | value of `failure`. When only one `memory_order` argument is supplied, the value 71 | of `success` is `order`, and the value of `failure` is `order` except that a 72 | value of `memory_order::acq_rel` shall be replaced by the value 73 | `memory_order::acquire` and a value of `memory_order::release` shall be replaced 74 | by the value `memory_order::relaxed`. If and only if the comparison is false 75 | then, after the atomic operation, <del>the contents of the memory</del><ins>the 76 | value representation</ins> in `expected` are replaced by the value 77 | representation read from the memory pointed to by `this` during the atomic 78 | comparison. If the operation returns `true`, these operations are atomic 79 | read-modify-write operations on the memory pointed to by `this`. Otherwise, 80 | these operations are atomic load operations on that memory. 81 | 82 | *Returns:* The result of the comparison. 83 | 84 | [*Note:* 85 | 86 | For example, the effect of `compare_exchange_strong` on objects without padding bits is 87 | 88 | 89 | 90 | if (memcmp(this, &expected, sizeof(*this)) == 0) 91 | memcpy(this, &desired, sizeof(*this)); 92 | else 93 | memcpy(expected, this, sizeof(*this)); 94 | 95 | 96 | 97 | —*end note*] 98 | 99 | [*Example:* 100 | 101 | The expected use of the compare-and-exchange operations is as follows.
The 102 | compare-and-exchange operations will update `expected` when another iteration 103 | of the loop is needed. 104 | 105 | 106 | 107 | expected = current.load(); 108 | do { 109 | desired = function(expected); 110 | } while (!current.compare_exchange_weak(expected, desired)); 111 | 112 | 113 | 114 | —*end example*] 115 | 116 | [*Example:* 117 | 118 | Because the expected value is updated only on failure, code releasing the 119 | memory containing the `expected` value on success will work. E.g. list head 120 | insertion will act atomically and would not introduce a data race in the 121 | following code: 122 | 123 | 124 | 125 | do { 126 | p->next = head; // make new list node point to the current head 127 | } while (!head.compare_exchange_weak(p->next, p)); // try to insert 128 | 129 | 130 | 131 | —*end example*] 132 | 133 | Implementations should ensure that weak compare-and-exchange operations do not 134 | consistently return `false` unless either the atomic object has value different 135 | from `expected` or there are concurrent modifications to the atomic object. 136 | 137 | 138 | *Remarks:* A weak compare-and-exchange operation may fail spuriously. That is, 139 | even when the contents of memory referred to by `expected` and `this` are equal, 140 | it may return `false` and store back to `expected` the same memory contents that 141 | were originally there. 142 | 143 | [*Note:* 144 | 145 | This spurious failure enables implementation of compare-and-exchange on a 146 | broader class of machines, e.g., load-locked store-conditional machines. A 147 | consequence of spurious failure is that nearly all uses of weak 148 | compare-and-exchange will be in a loop. When a compare-and-exchange is in a 149 | loop, the weak version will yield better performance on some platforms. When a 150 | weak compare-and-exchange would require a loop and a strong one would not, the 151 | strong one is preferable. 
152 | 153 | —*end note*] 154 | 155 | [*Note:* 156 | 157 | The `memcpy` and `memcmp` semantics of the compare-and-exchange operations 158 | may result in failed comparisons for values that compare equal with 159 | `operator==` if the underlying type has padding bits which sometimes 160 | participate in the object's representation, trap bits, or 161 | alternate representations of the same value other than those caused by 162 | padding bits which never participate in the object's representation. 163 | Notably, on implementations conforming to ISO/IEC/IEEE 60559, floating-point 164 | `-0.0` and `+0.0` will not compare equal with `memcmp` but will compare equal 165 | with `operator==`, and NaNs with the same payload will compare equal with 166 | `memcmp` but will not compare equal with `operator==`. 167 | 168 | —*end note*] 169 | 170 | 171 | 172 | [*Note:* 173 | 174 | Compare-and-exchange acts on an object's value representation, ensuring that 175 | padding bits which never participate in the object's representation are ignored. 176 | 177 | As a consequence, the following code is guaranteed to avoid spurious failure: 178 | 179 | 180 | 181 | struct padded { 182 | char clank = 0x42; 183 | // Padding here. 184 | unsigned biff = 0xC0DEFEFE; 185 | }; 186 | atomic<padded> pad = ATOMIC_VAR_INIT({}); 187 | 188 | bool zap() { 189 | padded expected, desired { 0, 0 }; 190 | return pad.compare_exchange_strong(expected, desired); 191 | } 192 | 193 | 194 | 195 | —*end note*] 196 | 197 | [*Note:* 198 | 199 | Types which contain bits that sometimes participate in the object's 200 | representation, such as a `union` containing a type with padding bits and a 201 | type without, may always fail compare-and-exchange when these bits are not 202 | participating in the object's representation because they have an 203 | indeterminate value. 204 | 205 | —*end note*] 206 | 207 | 208 | 209 |
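The floating-point caveat in the note above can be checked directly. The sketch below assumes an ISO/IEC/IEEE 60559 `double` (enforced with a `static_assert`) and uses a hypothetical helper, `same_bytes`, to stand in for the `memcmp`-style comparison; neither the helper nor the two checking functions are part of the proposed wording.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes ISO/IEC/IEEE 60559 doubles");

// Hypothetical helper: bytewise comparison, as memcmp would perform it.
inline bool same_bytes(double a, double b) {
    return std::memcmp(&a, &b, sizeof(double)) == 0;
}

// +0.0 and -0.0 are equal under operator== but differ in the sign bit.
inline bool zeros_equal_by_value_only() {
    const double pz = 0.0;
    const double nz = -0.0;
    return (pz == nz) && !same_bytes(pz, nz);
}

// A NaN is bytewise identical to itself yet never compares equal to itself.
inline bool nan_equal_by_bytes_only() {
    const double qnan = std::numeric_limits<double>::quiet_NaN();
    return same_bytes(qnan, qnan) && !(qnan == qnan);
}
```

Under these assumptions, a compare-and-exchange on `atomic<double>` whose `expected` holds `-0.0` should be expected to fail against a stored `+0.0`, after which `expected` holds the stored representation and a retry can succeed.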
210 | -------------------------------------------------------------------------------- /source/P1018r6.bs: -------------------------------------------------------------------------------- 1 | 17 | 18 | Executive summary {#summary} 19 | ================= 20 | 21 | * Finalize ballot resolution for C++20, to address National Body comments in [[N4844]]. 22 | * Start work on features for C++23 and later. 23 | * Joint session with LEWG on ABI, based on P1863R1. 24 | 25 | 26 | Papers of note {#note} 27 | ============== 28 | 29 | * P1000R4 C++ IS schedule 30 | * P0592R4 To boldly suggest an overall plan for C++23 31 | * P1999R0 Process: 2×-🇨🇿 evolutionary material via a Tentatively Ready status 32 | * P2118R0 Documenting Core Undefined or Unspecified Behavior 33 | 34 | 35 | Tentatively ready papers {#tentative} 36 | ======================== 37 | 38 | Following our process in P1999, here are the papers that EWG considers tentatively ready for CWG. We'll take a brief look at the next meeting, and if nothing particular concerns anyone, send them to CWG. 39 | 40 | * P1847R2 Make declaration order layout mandated 41 | * P2025R0 Guaranteed copy elision for named return objects 42 | * P1949R2 C++ Identifier Syntax using Unicode Standard Annex 31 43 | 44 | You can follow this list on GitHub. 45 | 46 | 47 | ABI discussion {#abi} 48 | ============== 49 | 50 | We held a joint session with LEWG to discuss ABI, based on P1863R1. The outcome of the discussion was as follows: 51 | 52 | * To the best of our ability, we should promise users that we won’t break ABI, ever
Wasn't contended: we disagree with this statement and might break ABI in the future. 53 | * From now on, we should consider incremental ABI for every C++ release
Received extremely positive support, with a small minority disagreeing strongly. 54 | * We should consider a big ABI break for C++23
Was extremely contended, with a few more people in favor than against. This was insufficient to call consensus. 55 | * We should consider a big ABI break for C++SOMETHING
Was positive enough to call consensus, but still had a quite substantial opposition including many disagreeing strongly. Were we to do a big ABI break we would need to work very hard on consensus building. Indeed, the number of people disagreeing strongly on a poll for a concrete change would block consensus. 56 | * When we are unable to resolve a conflict between performance and ABI compatibility, we should prioritize performance
Was still more positive, but also had a quite substantial opposition including many disagreeing strongly. Again, we should consider performance over ABI but work extremely hard towards consensus building when doing so. 57 | 58 | 59 | National body comments {#nb} 60 | ====================== 61 | 62 | * P2003R0 Fixing Internal and External Linkage Entities in Header Units #740 63 | * P2014R0 Proposed resolution for US061/US062 - aligned allocation of coroutine frames #750 64 | * P1884R0 Private Module Partition: An Inconsistent Boundary #729 65 | * P2100R0 Keep unhandled_exception of a promise type mandatory - a response to US062 and FR066 66 | * P2104R0 GB046 Allow caching of evaluations of concept specializations #45 67 | 68 | 69 | C++23 discussions {#cpp23} 70 | ================= 71 | 72 | We discussed a few papers which could make it to C++23: 73 | 74 | * P2085R0 Consistent defaulted comparisons 75 | * P0592R4 To boldly suggest an overall plan for C++23 76 | * P1999R0 Process proposal: double-check evolutionary material via a Tentatively Ready status 77 | * P1468R3 Fixed-layout floating-point type aliases 78 | * P1467R3 Extended floating-point types 79 | * P1371R2 Pattern Matching 80 | * P1000R4 C++ IS schedule 81 | * P1726R2 Pointer lifetime-end zap 82 | * P2092R0 Disambiguating Nested-Requirements 83 | * P1040R5 std::embed 84 | * P1677R2 Cancellation is not an Error 85 | * P1401R2 Narrowing contextual conversions to bool 86 | * P0876R10 fiber_context - fibers without scheduler 87 | * P0847R4 Deducing this 88 | * P2082R1 Fixing CTAD for aggregates 89 | * P1774R3 Portable assumptions 90 | * P2118R0 Documenting Core Undefined or Unspecified Behavior 91 | * P0849R2 auto(x): decay-copy in the language 92 | * P2036R0 Changing scope for lambda trailing-return-type 93 | * P2071R0 Named universal character escapes 94 | * P1900R0 Concepts-Adjacent Problems 95 | * P1847R2 Make declaration order layout mandated 96 | * P1393R0 A General Property Customization Mechanism 97 | 
* P2026R0 A Constituent Study Group for Safety-Critical Applications 98 | * P1938R0 if consteval 99 | * P1955R0 Top Level Is Constant Evaluated 100 | * P2041R0 Deleting variable templates 101 | * P0870R2 A proposal for a type trait to detect narrowing conversions 102 | * P2025R0 Guaranteed copy elision for named return objects 103 | * P2013R0 Freestanding Language: Optional ::operator new 104 | * P1949R2 C++ Identifier Syntax using Unicode Standard Annex 31 105 | 106 | The following papers were scheduled for discussion, but authors requested to delay until the next meeting: 107 | 108 | * P1967R1 #embed - a simple, scannable preprocessor-based resource acquisition method 109 | * P1046R2 Automatically Generate More Operators 110 | * P2049R0 Constraint refinement for special-cased functions 111 | 112 | The following papers were scheduled for discussion, but were seen in SG7 Reflection who decided to table them for now: 113 | 114 | * P1733R0 User-friendly and Evolution-friendly Reflection: A Compromise 115 | * P2089R0 Function parameter constraints are fragile 116 | 117 | 118 | Near-future EWG plans {#future} 119 | ===================== 120 | 121 | We will continue to work on C++23, prioritizing according to P0592. 122 | -------------------------------------------------------------------------------- /source/P1225R0.bs: -------------------------------------------------------------------------------- 1 | 15 | 16 | Abstract {#abs} 17 | ======== 18 | 19 | I’ve gathered input from a variety of folks involved in graphics at Apple, and here is our joint, considered, position regarding the 2D Graphics proposal. 20 | 21 | We’re worried that the 2D Graphics proposal in [[P0267R8]] might be detrimental to developers, students, and users of devices which contain C++ code. Graphics are important to the Apple ecosystem, and we can see them as an important part of C++. However, we don’t think P0267R8 meets the quality bar for acceptance into C++. 
We want to see the reference implementation prove orthogonality, extensibility, and performance across a handful of platforms. 22 | 23 | 24 | Design {#design} 25 | ====== 26 | 27 | Were we to design a 2D Graphics API, we’d do the following: 28 | 29 | 1. Multiple output devices: Memory buffer, Window, SVG, PDF, etc. 30 | 31 | 1. Memory buffer must be directly usable by graphics API 32 | 1. Support types such as `fp16` [[P0303R0]] 33 | 1. Alpha channel support 34 | 35 | 1. Anti-aliasing should come for free where supported 36 | 1. Text 37 | 1. Consistent, DPI-independent, output 38 | 1. Hardware support where available 39 | 1. Reasonable performance 40 | 1. Reasonable power consumption 41 | 1. Color spaces and gamma support 42 | 1. Possibility to build an interactive model with animation on top of the API 43 | 44 | From the current proposal we like: 45 | 46 | 1. 2D Matrix is 3×3, so homogeneous, presented as 2×3 in the API 47 | 1. Decouples display points from actual points 48 | 1. Vector graphics 49 | 1. Compositing properly handled 50 | 51 | Science and teaching {#st} 52 | ==================== 53 | 54 | We’ve heard the following reasons for including 2D Graphics in C++: 55 | 56 | 1. Teaching 57 | 1. Scientific plot generation 58 | 59 | We think putting pixels on the screen is great, but we want to do so responsibly. 60 | 61 | Both for science and teaching, we appreciate what’s available through solutions such as Matlab / matplotlib / R / D3.js. These solutions are powerful and match the performance of the language they complement. For C++ we’d expect a solution which is able to deliver performance which at least approaches that of modern graphics frameworks, and surpasses that of Matlab / Python / R / JavaScript. 62 | 63 | As a teaching tool, the current proposal teaches fairly low-level capabilities (i.e. complex things are hard to create) and is missing critical functionality.
We fear it will hinder students by teaching them to start everything from scratch, and by not teaching them a few key details. 64 | 65 | As a plotting tool it’s clearly falling short because it can’t label any axis (cf. Tufte). Even if text were supported, the sample libraries for Matlab, Python, R, and JavaScript are much easier to draw plots with. The 2D Graphics proposal is neither capable nor convenient in that regard. 66 | 67 | As a broad generalization, students currently learn data visualization (beyond what Excel + CSV files can do) in Matlab or Python if they do science, in R if they do math, and in JavaScript if they do anything else. We urge Committee members to at least try some of these, for example scatterplot, histogram, wordtree. These aren’t teaching toys and are used, for example, by the New York Times. There’s value in teaching students to pull themselves up from the language’s bootstraps; we therefore think the type of API in the current 2D Graphics library is useful. However, we want to know (i.e. we want to see it prototyped) that higher-level capabilities are also something that can be implemented. We think higher-level capabilities are more useful for teaching, yet we understand that C++ might want to offer lower-level primitives first. 68 | 69 | Abstraction Level {#level} 70 | ================= 71 | 72 | When we say the current proposal is too low-level, here are things we’d like to see at least prototyped to know that the proposal can grow into a powerful high-level library: 73 | 74 | * Obtain a window object 75 | * Load / transform / draw asset files 76 | * Complex raster image support (including swizzled surfaces, compression, 2D form clipping, used as texture fill) 77 | * New user-implemented rasterization primitives (such as ellipses or NURBS curve) 78 | * Stacking geometric transforms before drawing (can this be done already?) 79 | * Scissoring / clipping 80 | * Handle user input 81 | * Text support (glyph rasterization (e.g.
FreeType), text Shaping (e.g. HarfBuzz), string Rendering (e.g. Pango)), or something platform specific (e.g. CoreText on Apple Platforms) 82 | * Complex line drawing (e.g. dashed lines, along a path) 83 | * Can all of the offered primitives be implemented directly on hardware using shaders? 84 | 85 | In other words, we understand that a proposal might want to start small and grow more features over time. We want to know that this growth is possible, and that features can be composed into higher-level primitives. 86 | 87 | Missing Details {#missing} 88 | =============== 89 | 90 | When we say the current proposal has key details we find missing, here are what we want to see in an initial version: 91 | 92 | * It’s unclear that buffering is implementable, and that’s critical to a high-performance implementation. We’d like to see it implemented. We want to see a deferred mode implementation, not just immediate mode. 93 | * Support modern color spaces and gamma. 94 | * DPI independence is needed. 95 | * Display points seem to address individual pixels in the image. We’d like to be able to address at finer granularity (MSAA samples, typographer points, pica). 96 | * We’re not convinced that animation can be supported efficiently (i.e. update a single matrix in the stack of transforms). 97 | * The current proposal doesn’t specify which image format can be loaded, yet the reference implementation has PNG, JPEG, TIFF. This lack of specification makes portability difficult. 98 | * We want to see an implementation generate PDF, SVG, raster output, as well as output in an OS window. This should be doable portably with zero code change. 99 | 100 | C++ Aesthetics {#cpp} 101 | ============== 102 | 103 | Aesthetically, this lacks the feel of a C++ standard library. In particular: 104 | 105 | * The dual error handling mechanism, while reminiscent of filesystem, is quaint in the STL. 106 | * Most APIs seem to be function-oriented and have a C API feel to them. 
107 | * We’re surprised that we don’t have iterators / ranges for e.g. a path. We’d expect STL algorithms to work on such primitives. 108 | * We’d like to see linear algebra, trigonometry, and matrix math standardized separately. 109 | 110 | Conclusion {#conc} 111 | ========== 112 | 113 | We want to offer developers a graphics solution which allows usage of the full capabilities of the hardware we ship, without wasting battery life. Were we to ship the 2D Graphics proposal, we’d be putting our and C++’s good name on an API. We want to be sure it doesn’t do a disservice to developers and users. 114 | 115 | We’re surprised and worried that the reference implementation on Mac requires X11 and MacPorts. We want to see an implementation that re-uses platform primitives on more than Linux. What was the experience with CoreGraphics? 116 | 117 | The windows + SVG proposal in [[P1062R0]] isn’t terrible. Obtaining a window seems like a simple step forward. SVG has some upsides and a few downsides, but overall we’re positive on it. We like that the proposal leans on existing standards. 118 | 119 | Web view from [[P1108R0]] is trivial to support if specified well, but we don’t think it does what graphics enthusiasts want to do. It might be an interesting proposal, but we think it stands separately from 2D Graphics.
120 | -------------------------------------------------------------------------------- /source/P0154R0.rst: -------------------------------------------------------------------------------- 1 | ================================================================================ 2 | P0154R0 ``constexpr std::hardware_{constructive,destructive}_interference_size`` 3 | ================================================================================ 4 | 5 | :Author: JF Bastien 6 | :Contact: jfb@google.com 7 | :Author: Olivier Giroux 8 | :Contact: ogiroux@nvidia.com 9 | :Date: 2015-10-24 10 | :Previous: http://wg21.link/N4523 11 | :URL: https://github.com/jfbastien/papers/blob/master/source/P0154R0.rst 12 | 13 | --------- 14 | Rationale 15 | --------- 16 | 17 | Starting with C++11, the library includes 18 | ``std::thread::hardware_concurrency()`` to provide an implementation quantity 19 | useful in the design of control structures in multi-threaded programs: the 20 | extent of threads that do not interfere (to the first-order). Established 21 | practice throughout the industry also relies on a second implementation 22 | quantity, used instead in the design of data structures in the same programs. 23 | This quantity is the granularity of memory that does not interfere (to the 24 | first-order), commonly referred to as the *cache-line size*. 25 | 26 | Uses of *cache-line size* fall into two broad categories: 27 | 28 | * Avoiding destructive interference (false-sharing) between objects with 29 | temporally disjoint runtime access patterns from different 30 | threads. e.g. Producer-consumer queues. 31 | * Promoting constructive interference (true-sharing) between objects which have 32 | temporally local runtime access patterns. e.g. The ``barrier`` example, as 33 | illustrated in P0153R0_. 34 | 35 | .. 
_P0153R0: http://wg21.link/P0153R0 36 | 37 | The most significant issue with this useful implementation quantity is the 38 | questionable portability of the methods used in current practice to determine 39 | its value, despite their pervasiveness and popularity as a group. In the 40 | appendix_ we review several different compile-time and run-time methods. The 41 | portability problem with most of these methods is that they expose a 42 | micro-architectural detail without accounting for the intent of the implementors 43 | (such as we are) over the life of the ISA or ABI. 44 | 45 | We aim to contribute a modest invention for this cause, abstractions for this 46 | quantity that can be conservatively defined for given purposes by 47 | implementations: 48 | 49 | * *Destructive interference size*: a number that's suitable as an offset between 50 | two objects to likely avoid false-sharing due to different runtime access 51 | patterns from different threads. 52 | * *Constructive interference size*: a number that's suitable as a limit on two 53 | objects' combined memory footprint size and base alignment to likely promote 54 | true-sharing between them. 55 | 56 | In both cases these values are provided on a quality of implementation basis, 57 | purely as hints that are likely to improve performance. These are ideal portable 58 | values to use with the ``alignas()`` keyword, for which there currently exist 59 | nearly no standard-supported portable uses. 60 | 61 | ----------------- 62 | Proposed addition 63 | ----------------- 64 | 65 | Below, substitute the `�` character with a number the editor finds appropriate 66 | for the sub-section. We propose adding the following to the standard: 67 | 68 | Under 20.7.2 Header ``<memory>`` synopsis [**memory.syn**]: 69 | 70 | .. code-block:: c++ 71 | 72 | namespace std { 73 | // ...
74 | // 20.7.� Hardware interference size 75 | static constexpr size_t hardware_destructive_interference_size = implementation-defined; 76 | static constexpr size_t hardware_constructive_interference_size = implementation-defined; 77 | // ... 78 | } 79 | 80 | Under 20.7.� Hardware interference size [**hardware.interference**]: 81 | 82 | ``constexpr size_t hardware_destructive_interference_size = implementation-defined;`` 83 | 84 | This number is the minimum recommended offset between two concurrently-accessed 85 | objects to avoid additional performance degradation due to contention introduced 86 | by the implementation. It shall be a valid alignment value for any type. 87 | 88 | [*Example:* 89 | 90 | .. code-block:: c++ 91 | 92 | struct apart { 93 | alignas(hardware_destructive_interference_size) atomic flag1, flag2; 94 | }; 95 | 96 | — *end example*] 97 | 98 | ``constexpr size_t hardware_constructive_interference_size = implementation-defined;`` 99 | 100 | This number is the minimum recommended alignment of contiguous memory occupied 101 | by two objects accessed with temporal locality by concurrent threads. It shall 102 | be a valid alignment value for any type. 103 | 104 | [*Note:* This number is also the maximum recommended size of contiguous memory 105 | occupied by two objects accessed in this manner. — *end note*] 106 | 107 | [*Example:* 108 | 109 | .. code-block:: c++ 110 | 111 | alignas(hardware_constructive_interference_size) struct colocated { 112 | atomic flag; 113 | int tinydata; 114 | }; 115 | static_assert(sizeof(colocated) <= hardware_constructive_interference_size); 116 | 117 | — *end example*] 118 | 119 | The ``__cpp_lib_thread_hardware_interference_size`` feature test macro should be 120 | added. 121 | 122 | .. 
_appendix: 123 | 124 | -------- 125 | Appendix 126 | -------- 127 | 128 | Compile-time *cache-line size* 129 | ============================== 130 | 131 | We informatively list a few ways in which the L1 *cache-line size* is obtained 132 | in different open-source projects at compile-time. 133 | 134 | The Linux kernel defines the ``__cacheline_aligned`` macro which is configured 135 | for each architecture through ``L1_CACHE_BYTES``. On some architectures this 136 | value is determined through the configure-time option 137 | ``CONFIG__L1_CACHE_SHIFT``, and on others the value of ``L1_CACHE_SHIFT`` 138 | is hard-coded in the architecture's ``include/asm/cache.h`` header. 139 | 140 | Many open-source projects from Google contain a ``base/port.h`` header which 141 | defines the ``CACHELINE_ALIGNED`` macro based on an explicit list of 142 | architecture detection macros. These header files have often diverged. A token 143 | example from the autofdo_ project is: 144 | 145 | .. _autofdo: https://github.com/google/autofdo/blob/master/base/port.h 146 | 147 | .. code-block:: c++ 148 | 149 | // Cache line alignment 150 | #if defined(__i386__) || defined(__x86_64__) 151 | #define CACHELINE_SIZE 64 152 | #elif defined(__powerpc64__) 153 | // TODO(dougkwan) This is the L1 D-cache line size of our Power7 machines. 154 | // Need to check if this is appropriate for other PowerPC64 systems. 155 | #define CACHELINE_SIZE 128 156 | #elif defined(__arm__) 157 | // Cache line sizes for ARM: These values are not strictly correct since 158 | // cache line sizes depend on implementations, not architectures. There 159 | // are even implementations with cache line sizes configurable at boot 160 | // time. 161 | #if defined(__ARM_ARCH_5T__) 162 | #define CACHELINE_SIZE 32 163 | #elif defined(__ARM_ARCH_7A__) 164 | #define CACHELINE_SIZE 64 165 | #endif 166 | #endif 167 | 168 | #ifndef CACHELINE_SIZE 169 | // A reasonable default guess. 
Note that overestimates tend to waste more 170 | // space, while underestimates tend to waste more time. 171 | #define CACHELINE_SIZE 64 172 | #endif 173 | 174 | #define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE_SIZE))) 175 | 176 | Runtime *cache-line size* 177 | ========================= 178 | 179 | We informatively list a few ways in which the L1 *cache-line size* can be 180 | obtained on different operating systems and architectures at runtime. Libraries 181 | such as hwloc_ perform these queries, and could also be added to the standard as 182 | a separate proposal. 183 | 184 | .. _hwloc: http://www.open-mpi.org/projects/hwloc/ 185 | 186 | On OSX one would use: 187 | 188 | .. code-block:: c++ 189 | 190 | sysctlbyname("hw.cachelinesize", &cacheline_size, &sizeof_cacheline_size, 0, 0) 191 | 192 | On Windows one would use: 193 | 194 | .. code-block:: c++ 195 | 196 | GetLogicalProcessorInformation(&buf[0], &sizeof_buf); 197 | for (i = 0; i != sizeof_buf / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); ++i) { 198 | if (buf[i].Relationship == RelationCache && buf[i].Cache.Level == 1) 199 | cacheline_size = buf[i].Cache.LineSize; 200 | 201 | On Linux one would either use: 202 | 203 | .. code-block:: c++ 204 | 205 | p = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r"); 206 | fscanf(p, "%d", &cacheline_size); 207 | 208 | or: 209 | 210 | .. code-block:: c++ 211 | 212 | sysconf(_SC_LEVEL1_DCACHE_LINESIZE); 213 | 214 | On x86 one would use the ``CPUID`` Instruction with ``EAX = 80000005h``, which 215 | leaves the result in ``ECX``, which needs further work to extract. 216 | 217 | On ARM one would use ``mrs %[ctr], ctr_el0``, which needs further work to 218 | extract. 
219 | -------------------------------------------------------------------------------- /source/P0154R1.rst: -------------------------------------------------------------------------------- 1 | ================================================================================ 2 | P0154R1 ``constexpr std::hardware_{constructive,destructive}_interference_size`` 3 | ================================================================================ 4 | 5 | :Author: JF Bastien 6 | :Contact: jfb@google.com 7 | :Author: Olivier Giroux 8 | :Contact: ogiroux@nvidia.com 9 | :Date: 2016-03-03 10 | :Previous: http://wg21.link/N4523 11 | :Previous: http://wg21.link/P0154R0 12 | :URL: https://github.com/jfbastien/papers/blob/master/source/P0154R1.rst 13 | 14 | --------- 15 | Rationale 16 | --------- 17 | 18 | Starting with C++11, the library includes 19 | ``std::thread::hardware_concurrency()`` to provide an implementation quantity 20 | useful in the design of control structures in multi-threaded programs: the 21 | extent of threads that do not interfere (to the first-order). Established 22 | practice throughout the industry also relies on a second implementation 23 | quantity, used instead in the design of data structures in the same programs. 24 | This quantity is the granularity of memory that does not interfere (to the 25 | first-order), commonly referred to as the *cache-line size*. 26 | 27 | Uses of *cache-line size* fall into two broad categories: 28 | 29 | * Avoiding destructive interference (false-sharing) between objects with 30 | temporally disjoint runtime access patterns from different 31 | threads. e.g. Producer-consumer queues. 32 | * Promoting constructive interference (true-sharing) between objects which have 33 | temporally local runtime access patterns. e.g. The ``barrier`` example, as 34 | illustrated in P0153R0_. 35 | 36 | .. 
_P0153R0: http://wg21.link/P0153R0 37 | 38 | The most significant issue with this useful implementation quantity is the 39 | questionable portability of the methods used in current practice to determine 40 | its value, despite their pervasiveness and popularity as a group. In the 41 | appendix_ we review several different compile-time and run-time methods. The 42 | portability problem with most of these methods is that they expose a 43 | micro-architectural detail without accounting for the intent of the implementors 44 | (such as we are) over the life of the ISA or ABI. 45 | 46 | We aim to contribute a modest invention for this cause, abstractions for this 47 | quantity that can be conservatively defined for given purposes by 48 | implementations: 49 | 50 | * *Destructive interference size*: a number that's suitable as an offset between 51 | two objects to likely avoid false-sharing due to different runtime access 52 | patterns from different threads. 53 | * *Constructive interference size*: a number that's suitable as a limit on two 54 | objects' combined memory footprint size and base alignment to likely promote 55 | true-sharing between them. 56 | 57 | In both cases these values are provided on a quality of implementation basis, 58 | purely as hints that are likely to improve performance. These are ideal portable 59 | values to use with the ``alignas()`` keyword, for which there currently exist 60 | nearly no standard-supported portable uses. 61 | 62 | ----------------- 63 | Proposed addition 64 | ----------------- 65 | 66 | Below, substitute the `�` character with a number the editor finds appropriate 67 | for the sub-section. We propose adding the following to the standard: 68 | 69 | Under 18.6 Header ``<new>`` synopsis [**support.dynamic**]: 70 | 71 | .. code-block:: c++ 72 | 73 | namespace std { 74 | // ...
75 | // 18.6.� Hardware interference size 76 | static constexpr size_t hardware_destructive_interference_size = implementation-defined; 77 | static constexpr size_t hardware_constructive_interference_size = implementation-defined; 78 | // ... 79 | } 80 | 81 | Under 18.6.� Hardware interference size [**hardware.interference**]: 82 | 83 | ``constexpr size_t hardware_destructive_interference_size = implementation-defined;`` 84 | 85 | This number is the minimum recommended offset between two concurrently-accessed 86 | objects to avoid additional performance degradation due to contention introduced 87 | by the implementation. It shall be at least ``alignof(max_align_t)``. 88 | 89 | [*Example:* 90 | 91 | .. code-block:: c++ 92 | 93 | struct keep_apart { 94 | alignas(hardware_destructive_interference_size) atomic cat; 95 | alignas(hardware_destructive_interference_size) atomic dog; 96 | }; 97 | 98 | — *end example*] 99 | 100 | ``constexpr size_t hardware_constructive_interference_size = implementation-defined;`` 101 | 102 | This number is the maximum recommended size of contiguous memory occupied by two 103 | objects accessed with temporal locality by concurrent threads. It shall be at 104 | least ``alignof(max_align_t)``. 105 | 106 | [*Example:* 107 | 108 | .. code-block:: c++ 109 | 110 | struct together { 111 | atomic dog; 112 | int puppy; 113 | }; 114 | struct kennel { 115 | // Other data members... 116 | alignas(sizeof(together)) together pack; 117 | // Other data members... 118 | }; 119 | static_assert(sizeof(together) <= hardware_constructive_interference_size); 120 | 121 | — *end example*] 122 | 123 | The ``__cpp_lib_thread_hardware_interference_size`` feature test macro should be 124 | added. 125 | 126 | .. 
_appendix: 127 | 128 | -------- 129 | Appendix 130 | -------- 131 | 132 | Compile-time *cache-line size* 133 | ============================== 134 | 135 | We informatively list a few ways in which the L1 *cache-line size* is obtained 136 | in different open-source projects at compile-time. 137 | 138 | The Linux kernel defines the ``__cacheline_aligned`` macro which is configured 139 | for each architecture through ``L1_CACHE_BYTES``. On some architectures this 140 | value is determined through the configure-time option 141 | ``CONFIG__L1_CACHE_SHIFT``, and on others the value of ``L1_CACHE_SHIFT`` 142 | is hard-coded in the architecture's ``include/asm/cache.h`` header. 143 | 144 | Many open-source projects from Google contain a ``base/port.h`` header which 145 | defines the ``CACHELINE_ALIGNED`` macro based on an explicit list of 146 | architecture detection macros. These header files have often diverged. A token 147 | example from the autofdo_ project is: 148 | 149 | .. _autofdo: https://github.com/google/autofdo/blob/master/base/port.h 150 | 151 | .. code-block:: c++ 152 | 153 | // Cache line alignment 154 | #if defined(__i386__) || defined(__x86_64__) 155 | #define CACHELINE_SIZE 64 156 | #elif defined(__powerpc64__) 157 | // TODO(dougkwan) This is the L1 D-cache line size of our Power7 machines. 158 | // Need to check if this is appropriate for other PowerPC64 systems. 159 | #define CACHELINE_SIZE 128 160 | #elif defined(__arm__) 161 | // Cache line sizes for ARM: These values are not strictly correct since 162 | // cache line sizes depend on implementations, not architectures. There 163 | // are even implementations with cache line sizes configurable at boot 164 | // time. 165 | #if defined(__ARM_ARCH_5T__) 166 | #define CACHELINE_SIZE 32 167 | #elif defined(__ARM_ARCH_7A__) 168 | #define CACHELINE_SIZE 64 169 | #endif 170 | #endif 171 | 172 | #ifndef CACHELINE_SIZE 173 | // A reasonable default guess. 
Note that overestimates tend to waste more 174 | // space, while underestimates tend to waste more time. 175 | #define CACHELINE_SIZE 64 176 | #endif 177 | 178 | #define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE_SIZE))) 179 | 180 | Runtime *cache-line size* 181 | ========================= 182 | 183 | We informatively list a few ways in which the L1 *cache-line size* can be 184 | obtained on different operating systems and architectures at runtime. Libraries 185 | such as hwloc_ perform these queries, and could also be added to the standard as 186 | a separate proposal. 187 | 188 | .. _hwloc: http://www.open-mpi.org/projects/hwloc/ 189 | 190 | On OSX one would use: 191 | 192 | .. code-block:: c++ 193 | 194 | sysctlbyname("hw.cachelinesize", &cacheline_size, &sizeof_cacheline_size, 0, 0) 195 | 196 | On Windows one would use: 197 | 198 | .. code-block:: c++ 199 | 200 | GetLogicalProcessorInformation(&buf[0], &sizeof_buf); 201 | for (i = 0; i != sizeof_buf / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); ++i) { 202 | if (buf[i].Relationship == RelationCache && buf[i].Cache.Level == 1) 203 | cacheline_size = buf[i].Cache.LineSize; 204 | 205 | On Linux one would either use: 206 | 207 | .. code-block:: c++ 208 | 209 | p = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r"); 210 | fscanf(p, "%d", &cacheline_size); 211 | 212 | or: 213 | 214 | .. code-block:: c++ 215 | 216 | sysconf(_SC_LEVEL1_DCACHE_LINESIZE); 217 | 218 | On x86 one would use the ``CPUID`` Instruction with ``EAX = 80000005h``, which 219 | leaves the result in ``ECX``, which needs further work to extract. 220 | 221 | On ARM one would use ``mrs %[ctr], ctr_el0``, which needs further work to 222 | extract. 
223 | -------------------------------------------------------------------------------- /source/P0476r2.bs: -------------------------------------------------------------------------------- 1 | 16 | 17 | 18 | This paper is a revision of [[P0476r1]], addressing LEWG comments from the 2017 19 | Toronto meeting as well as comments from LEWG and LWG from the 2017 Albuquerque 20 | meeting. See [[#rev]] for details. 21 | 22 | 23 | Background {#bg} 24 | ========== 25 | 26 | Low-level code often seeks to interpret objects of one type as another: keep the 27 | same bits, but obtain an object of a different type. Doing so correctly is 28 | error-prone: using `reinterpret_cast` or `union` runs afoul of type-aliasing 29 | rules yet these are the intuitive solutions developers mistakenly turn to. 30 | 31 | Attuned developers use `aligned_storage` with `memcpy`, avoiding alignment 32 | pitfalls and allowing them to bit-cast non-default-constructible types. 33 | 34 | This proposal uses appropriate concepts to prevent misuse. As the sample 35 | implementation demonstrates we could as well use `static_assert` or template 36 | SFINAE, but the timing of this library feature will likely coincide with 37 | concept's standardization. 38 | 39 | Furthermore, it is currently impossible to implement a `constexpr` bit-cast 40 | function, as `memcpy` itself isn't `constexpr`. Marking the proposed function as 41 | `constexpr` doesn't require or prevent `memcpy` from becoming `constexpr`, but 42 | requires compiler support. This leaves implementations free to use their own 43 | internal solution (e.g. LLVM has a `bitcast` 45 | opcode). 46 | 47 | We should standardize this oft-used idiom, and avoid the pitfalls once and for 48 | all. 49 | 50 | 51 | Proposed Wording {#word} 52 | ================ 53 | 54 | Below, substitute the `�` character with a number or name the editor finds 55 | appropriate for the sub-section. 
56 | 57 | In 20.5.1.2 [**headers**] add the header `<bit>` to: 58 | 59 | * Table 16 — C++ library headers 60 | * Table 19 — C++ headers for freestanding implementations 61 | 62 | In the numerics section, add the following: 63 | 64 | 65 | 29.� Bit manipulation library [**bit**] {#bit} 66 | --------------------------------------- 67 | 68 | 29.�.1 General [**bit.general**] {#bitgen} 69 | -------------------------------- 70 | 71 | The header `<bit>` provides components to access, manipulate and process both 72 | individual bits and bit sequences. 73 | 74 | 29.�.2 Header `<bit>` synopsis [**bit.syn**] {#bitsyn} 75 | -------------------------------------------- 76 | 77 | 78 | namespace std { 79 | 80 | // 29.�.3 bit_cast 81 | template<typename To, typename From> 82 | constexpr To bit_cast(const From& from) noexcept; 83 | 84 | } 85 | 86 | 87 | 29.�.3 Function template `bit_cast` [**bit.cast**] {#bitcast} 88 | -------------------------------------------------- 89 | 90 | 91 | template<typename To, typename From> 92 | constexpr To bit_cast(const From& from) noexcept; 93 | 94 | 95 |
    96 |
  1. *Remarks*: 97 | 98 | This function shall not participate in overload resolution unless: 99 |
      100 |
    • `sizeof(To) == sizeof(From)` is `true`;
    • 101 |
    • `is_trivially_copyable_v<To>` is `true`; and
    • 102 |
    • `is_trivially_copyable_v<From>` is `true`.
    • 103 |
    104 | 105 | This function shall be `constexpr` if and only if `To`, `From`, and the types 106 | of all subobjects of `To` and `From` are types `T` such that: 107 | 108 |
      109 |
    • `is_union_v<T>` is `false`;
    • 110 |
    • `is_pointer_v<T>` is `false`;
    • 111 |
    • `is_member_pointer_v<T>` is `false`;
    • 112 |
    • `is_volatile_v<T>` is `false`; and
    • 113 |
    • `T` has no non-static data members of reference type.
    • 114 |
    115 |
  2. 116 |
  3. *Returns*: 117 | 118 | An object of type `To`. Each bit of the value representation of the result 119 | is equal to the corresponding bit in the object representation of 120 | `from`. Padding bits of the `To` object are unspecified. If there is no 121 | value of type `To` corresponding to the value representation produced, the 122 | behavior is undefined. If there are multiple such values, which value is 123 | produced is unspecified. 124 | 125 |
  4. 126 |
127 |
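
As an informal illustration only (not proposed wording), the Remarks and Returns above can be approximated in library code with `memcpy`. The sketch below is a minimal, non-`constexpr` approximation: the namespace `sketch` and the helper `float_bits` are hypothetical names introduced here, and the requirement that `To` be default-constructible is a simplification of this sketch, not something the wording asks for.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace sketch {  // hypothetical namespace, to avoid any confusion with std

// Participates in overload resolution only when the three conditions of the
// Remarks hold. It is not constexpr, because std::memcpy cannot be used in a
// constant expression; a real implementation needs compiler support for that.
template <typename To, typename From,
          typename = std::enable_if_t<sizeof(To) == sizeof(From) &&
                                      std::is_trivially_copyable<To>::value &&
                                      std::is_trivially_copyable<From>::value>>
To bit_cast(const From& from) noexcept {
    // Simplifying assumption of this sketch: To can be default-constructed.
    static_assert(std::is_default_constructible<To>::value,
                  "this sketch requires a default-constructible To");
    To to;
    // Copy the object representation of `from` into `to`.
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

}  // namespace sketch

// Hypothetical usage helper: the bit pattern of a float.
// Assumes float is a 32-bit type, otherwise bit_cast is not viable here.
inline std::uint32_t float_bits(float f) noexcept {
    return sketch::bit_cast<std::uint32_t>(f);
}
```

On a platform with 32-bit IEEE-754 `float`, `float_bits(1.0f)` yields `0x3F800000`. Lifting the default-constructible restriction is what the `aligned_storage` idiom mentioned in the Background is for.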
128 | 129 | Feature testing {#test} 130 | --------------- 131 | 132 | The `__cpp_lib_bit_cast` feature test macro should be added. 133 | 134 | Appendix {#appendix} 135 | ======== 136 | 137 | The Standard's [**basic.types**] section explicitly blesses `memcpy`: 138 | 139 |
140 | 141 | For any trivially copyable type `T`, if two pointers to `T` point to distinct 142 | `T` objects `obj1` and `obj2`, where neither `obj1` nor `obj2` is a base-class 143 | subobject, if the *underlying bytes* (1.7) making up `obj1` are copied into 144 | `obj2`, `obj2` shall subsequently hold the same value as `obj1`. 145 | 146 | [*Example:* 147 | ``` 148 | T* t1p; 149 | T* t2p; 150 | // provided that t2p points to an initialized object ... 151 | std::memcpy(t1p, t2p, sizeof(T)); 152 | // at this point, every subobject of trivially copyable type in *t1p contains 153 | // the same value as the corresponding subobject in *t2p 154 | ``` 155 | — *end example*] 156 | 157 |
158 | 159 | Whereas section [**class.union**] says: 160 | 161 |
162 | 163 | In a union, at most one of the non-static data members can be 164 | active at any time, that is, the value of at most one of the 165 | non-static data members can be stored in a union at any time. 166 | 167 |
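
To make the contrast concrete, here is a small sketch of the copy that [**basic.types**] blesses: the underlying bytes of one trivially copyable object are copied into a distinct object of the same type. The names `Sample` and `copy_object_bytes` are hypothetical and exist only for this illustration. Reading a union member other than the active one is the pattern the [**class.union**] text above does not support, so it is deliberately not shown as code.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type used only for this illustration.
struct Sample {
    int id;
    float weight;
};

// Copies the underlying bytes of obj1 into obj2. The two references must
// name distinct objects, since std::memcpy requires non-overlapping ranges.
template <typename T>
void copy_object_bytes(T& obj2, const T& obj1) noexcept {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copy is only blessed for trivially copyable types");
    std::memcpy(&obj2, &obj1, sizeof(T));
}
```

After `copy_object_bytes(b, a)` on two distinct `Sample` objects, `b` holds the same value as `a`, which is the guarantee the quoted paragraph gives.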
168 | 169 | 170 | Revision History {#rev} 171 | ================ 172 | 173 | r1 ➡ r2 {#r1r2} 174 | -------- 175 | 176 | The paper was reviewed by LEWG at the 2017 Toronto meeting and feedback was 177 | provided. In the 2017 Albuquerque meeting LEWG provided feedback regarding usage 178 | of concepts while discussing [[P0802r0]], and EWG reviewed the paper: 179 | 180 | * Use "shall not participate in overload resolution" wording instead of a 181 | requires clause. 182 | * The author was asked to explore naming. LEWG took a poll in Albuquerque and 183 | voted to keep `bit_cast`. 184 | * There was strong sentiment that this facility should be available in 185 | freestanding implementations. LEWG is changing its guidance regarding 186 | freestanding header granularity, but until guidance is actually changed it 187 | was decided that a currently freestanding header should be used. LEWG took a 188 | poll in Albuquerque, and the new `` header was chosen instead of 189 | ``. 190 | * Call out that `constexpr` requires compiler support. 191 | * Make `constexpr` conditional, similar to variant's [variant.ctor] wording, 192 | based on an EWG straw poll in Albuquerque. 193 | * LWG review made the `constexpr` remark recursive, and tuned the return 194 | wording, asking CWG to review the changes. 195 | * LWG review requested that this paper also add the `` header, and let 196 | the editor resolve races if multiple papers add the header concurrently. 197 | * CWG substantially tuned the wording. 198 | 199 | r0 ➡ r1 {#r0r1} 200 | -------- 201 | 202 | The paper was reviewed by LEWG at the 2016 Issaquah meeting: 203 | 204 | * Remove the standard layout requirement—trivially copyable suffices for the `memcpy` requirement. 205 | * We discussed removing `constexpr`, but there was no consent either way. 
There was some suggestion that it’ll be hard for implementers, but there's also some desire (by the same implementers) to have those features available in order to support things like `constexpr` instances of `std::variant`. 206 | * The pointer-forbidding logic was removed. It was initially there to help developers when a better tool is available, but it's easily worked around (e.g. with a `struct` containing a pointer). Note that this doesn't prevent `constexpr` versions of `bit_cast`: the implementation is allowed to error out on `bit_cast` of pointer. 207 | * Some discussion about concepts-usage, but it seems like mostly an LWG issue and we're reasonably sure that concepts will land before this or in a compatible vehicle. 208 | 209 | Straw polls: 210 | 211 | * Do we want to see [[P0476r0]] again? unanimous consent. 212 | * `bit_cast` should allow pointer types in `To` and `From`. **SF F N A SA** 4 5 4 2 1 213 | * `bit_cast` should be `constexpr`? **SF F N A SA** 4 3 7 2 3 214 | 215 | 216 | Acknowledgement {#ack} 217 | =============== 218 | 219 | Thanks to Saam Barati, Jeffrey Yasskin, and Sam Benzaquen for their early review 220 | and suggested improvements. 221 | -------------------------------------------------------------------------------- /source/conf.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Papers documentation build configuration file, created by 4 | # sphinx-quickstart on Sun Mar 22 16:26:35 2015. 5 | # 6 | # This file is execfile()d with the current directory set to its 7 | # containing dir. 8 | # 9 | # Note that not all possible configuration values are present in this 10 | # autogenerated file. 11 | # 12 | # All configuration values have a default; values that are commented out 13 | # serve to show the default. 
14 | 15 | import sys 16 | import os 17 | 18 | # If extensions (or modules to document with autodoc) are in another directory, 19 | # add these directories to sys.path here. If the directory is relative to the 20 | # documentation root, use os.path.abspath to make it absolute, like shown here. 21 | #sys.path.insert(0, os.path.abspath('.')) 22 | 23 | # -- General configuration ------------------------------------------------ 24 | 25 | # If your documentation needs a minimal Sphinx version, state it here. 26 | #needs_sphinx = '1.0' 27 | 28 | # Add any Sphinx extension module names here, as strings. They can be 29 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom 30 | # ones. 31 | extensions = [ 32 | 'sphinx.ext.todo', 33 | ] 34 | 35 | # Add any paths that contain templates here, relative to this directory. 36 | templates_path = ['_templates'] 37 | 38 | # The suffix of source filenames. 39 | source_suffix = '.rst' 40 | 41 | # The encoding of source files. 42 | #source_encoding = 'utf-8-sig' 43 | 44 | # The master toctree document. 45 | master_doc = 'index' 46 | 47 | # General information about the project. 48 | project = u'Papers' 49 | copyright = u'2015, JF Bastien' 50 | 51 | # The version info for the project you're documenting, acts as replacement for 52 | # |version| and |release|, also used in various other places throughout the 53 | # built documents. 54 | # 55 | # The short X.Y version. 56 | version = '1.0' 57 | # The full version, including alpha/beta/rc tags. 58 | release = '1.0' 59 | 60 | # The language for content autogenerated by Sphinx. Refer to documentation 61 | # for a list of supported languages. 62 | #language = None 63 | 64 | # There are two options for replacing |today|: either, you set today to some 65 | # non-false value, then it is used: 66 | #today = '' 67 | # Else, today_fmt is used as the format for a strftime call. 
68 | #today_fmt = '%B %d, %Y' 69 | 70 | # List of patterns, relative to source directory, that match files and 71 | # directories to ignore when looking for source files. 72 | exclude_patterns = [] 73 | 74 | # The reST default role (used for this markup: `text`) to use for all 75 | # documents. 76 | #default_role = None 77 | 78 | # If true, '()' will be appended to :func: etc. cross-reference text. 79 | #add_function_parentheses = True 80 | 81 | # If true, the current module name will be prepended to all description 82 | # unit titles (such as .. function::). 83 | #add_module_names = True 84 | 85 | # If true, sectionauthor and moduleauthor directives will be shown in the 86 | # output. They are ignored by default. 87 | #show_authors = False 88 | 89 | # The name of the Pygments (syntax highlighting) style to use. 90 | pygments_style = 'sphinx' 91 | 92 | # A list of ignored prefixes for module index sorting. 93 | #modindex_common_prefix = [] 94 | 95 | # If true, keep warnings as "system message" paragraphs in the built documents. 96 | #keep_warnings = False 97 | 98 | 99 | # -- Options for HTML output ---------------------------------------------- 100 | 101 | # The theme to use for HTML and HTML Help pages. See the documentation for 102 | # a list of builtin themes. 103 | html_theme = 'basic' 104 | 105 | # Theme options are theme-specific and customize the look and feel of a theme 106 | # further. For a list of options available for each theme, see the 107 | # documentation. 108 | #html_theme_options = {} 109 | 110 | # Add any paths that contain custom themes here, relative to this directory. 111 | html_theme_path = ['_templates/'] 112 | 113 | # The name for this set of Sphinx documents. If None, it defaults to 114 | # " v documentation". 115 | html_title = '' 116 | 117 | # A shorter title for the navigation bar. Default is the same as html_title. 
118 | #html_short_title = None 119 | 120 | # The name of an image file (relative to this directory) to place at the top 121 | # of the sidebar. 122 | #html_logo = None 123 | 124 | # The name of an image file (within the static path) to use as favicon of the 125 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 126 | # pixels large. 127 | #html_favicon = None 128 | 129 | # Add any paths that contain custom static files (such as style sheets) here, 130 | # relative to this directory. They are copied after the builtin static files, 131 | # so a file named "default.css" will overwrite the builtin "default.css". 132 | html_static_path = ['_static'] 133 | 134 | # Add any extra paths that contain custom files (such as robots.txt or 135 | # .htaccess) here, relative to this directory. These files are copied 136 | # directly to the root of the documentation. 137 | #html_extra_path = [] 138 | 139 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, 140 | # using the given strftime format. 141 | #html_last_updated_fmt = '%b %d, %Y' 142 | 143 | # If true, SmartyPants will be used to convert quotes and dashes to 144 | # typographically correct entities. 145 | #html_use_smartypants = True 146 | 147 | # Custom sidebar templates, maps document names to template names. 148 | #html_sidebars = {} 149 | 150 | # Additional templates that should be rendered to pages, maps page names to 151 | # template names. 152 | #html_additional_pages = {} 153 | 154 | # If false, no module index is generated. 155 | #html_domain_indices = True 156 | 157 | # If false, no index is generated. 158 | #html_use_index = True 159 | 160 | # If true, the index is split into individual pages for each letter. 161 | #html_split_index = False 162 | 163 | # If true, links to the reST sources are added to the pages. 164 | #html_show_sourcelink = True 165 | 166 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. 
167 | #html_show_sphinx = True 168 | 169 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. 170 | #html_show_copyright = True 171 | 172 | # If true, an OpenSearch description file will be output, and all pages will 173 | # contain a tag referring to it. The value of this option must be the 174 | # base URL from which the finished HTML is served. 175 | #html_use_opensearch = '' 176 | 177 | # This is the file name suffix for HTML files (e.g. ".xhtml"). 178 | #html_file_suffix = None 179 | 180 | # Output file base name for HTML help builder. 181 | htmlhelp_basename = 'Papersdoc' 182 | 183 | 184 | # -- Options for LaTeX output --------------------------------------------- 185 | 186 | latex_elements = { 187 | # The paper size ('letterpaper' or 'a4paper'). 188 | #'papersize': 'letterpaper', 189 | 190 | # The font size ('10pt', '11pt' or '12pt'). 191 | #'pointsize': '10pt', 192 | 193 | # Additional stuff for the LaTeX preamble. 194 | #'preamble': '', 195 | } 196 | 197 | # Grouping the document tree into LaTeX files. List of tuples 198 | # (source start file, target name, title, 199 | # author, documentclass [howto, manual, or own class]). 200 | latex_documents = [ 201 | ('index', 'Papers.tex', u'Papers Documentation', 202 | u'JF Bastien', 'manual'), 203 | ] 204 | 205 | # The name of an image file (relative to this directory) to place at the top of 206 | # the title page. 207 | #latex_logo = None 208 | 209 | # For "manual" documents, if this is true, then toplevel headings are parts, 210 | # not chapters. 211 | #latex_use_parts = False 212 | 213 | # If true, show page references after internal links. 214 | #latex_show_pagerefs = False 215 | 216 | # If true, show URL addresses after external links. 217 | #latex_show_urls = False 218 | 219 | # Documents to append as an appendix to all manuals. 220 | #latex_appendices = [] 221 | 222 | # If false, no module index is generated. 
223 | #latex_domain_indices = True 224 | 225 | 226 | # -- Options for manual page output --------------------------------------- 227 | 228 | # One entry per manual page. List of tuples 229 | # (source start file, name, description, authors, manual section). 230 | man_pages = [ 231 | ('index', 'papers', u'Papers Documentation', 232 | [u'JF Bastien'], 1) 233 | ] 234 | 235 | # If true, show URL addresses after external links. 236 | #man_show_urls = False 237 | 238 | 239 | # -- Options for Texinfo output ------------------------------------------- 240 | 241 | # Grouping the document tree into Texinfo files. List of tuples 242 | # (source start file, target name, title, author, 243 | # dir menu entry, description, category) 244 | texinfo_documents = [ 245 | ('index', 'Papers', u'Papers Documentation', 246 | u'JF Bastien', 'Papers', 'One line description of project.', 247 | 'Miscellaneous'), 248 | ] 249 | 250 | # Documents to append as an appendix to all manuals. 251 | #texinfo_appendices = [] 252 | 253 | # If false, no module index is generated. 254 | #texinfo_domain_indices = True 255 | 256 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 257 | #texinfo_show_urls = 'footnote' 258 | 259 | # If true, do not generate a @detailmenu in the "Top" node's menu. 260 | #texinfo_no_detailmenu = False 261 | -------------------------------------------------------------------------------- /source/p0528r3.bs: -------------------------------------------------------------------------------- 1 | 16 | 17 | This issue has been discussed by the authors at every recent Standards meeting, 18 | yet a full solution has been elusive despite helpful proposals. We believe that 19 | this proposal can fix this oft-encountered problem once and for all. 20 | 21 | [[P0528r0]] details extensive background on this problem (not repeated here), 22 | and proposed standardizing a trait, `has_padding_bits`, and using it on 23 | `compare_and_exchange_*`. 
[[P0528r1]] applied EWG guidance and simply added 24 | wording directing implementations to ensure that the desired behavior occurs. At 25 | SG1's request this paper follows EWG's guidance but uses different wording. 26 | 27 | 28 | Edit History {#edit} 29 | ============ 30 | 31 | r2 → r3 {#r2r3} 32 | ------- 33 | 34 | In Rapperswil, CWG suggested various wording updates to the paper. 35 | 36 | 37 | r1 → r2 {#r1r2} 38 | ------- 39 | 40 | In Jacksonville, SG1 supported the paper but suggested a different approach to 41 | the wording than the one EWG proposed in Albuquerque: don't talk about 42 | contents of the memory, but rather discuss the value representation to describe 43 | compare-and-exchange. This paper follows SG1's guidance and offers different 44 | wording, with the intent that the semantics be equivalent. EWG reviewed the 45 | updated wording and voted to support it and forward it to Core. 46 | 47 | r0 → r1 {#r0r1} 48 | ------- 49 | 50 | In Albuquerque, EWG voted to make the padding bits of `atomic` and the incoming 51 | value of `T` have a consistent value for the purposes of read/modify/write 52 | atomic operations. 53 | 54 | Purposefully not addressed in this paper: 55 | 56 | * `union` with padding bits 57 | * Types with trap representations 58 | 59 | Proposed Wording {#word} 60 | ================ 61 | 62 | In Operations on `atomic` types [**atomics.types.operations**], edit ❡17 and 63 | onwards as follows: 64 | 65 | 
66 | 67 |
 68 | 
 69 | bool compare_exchange_weak(T& expected, T desired,
 70 |                            memory_order success, memory_order failure) volatile noexcept;
 71 | bool compare_exchange_weak(T& expected, T desired,
 72 |                            memory_order success, memory_order failure) noexcept;
 73 | bool compare_exchange_strong(T& expected, T desired,
 74 |                              memory_order success, memory_order failure) volatile noexcept;
 75 | bool compare_exchange_strong(T& expected, T desired,
 76 |                              memory_order success, memory_order failure) noexcept;
 77 | bool compare_exchange_weak(T& expected, T desired,
 78 |                            memory_order order = memory_order::seq_cst) volatile noexcept;
 79 | bool compare_exchange_weak(T& expected, T desired,
 80 |                            memory_order order = memory_order::seq_cst) noexcept;
 81 | bool compare_exchange_strong(T& expected, T desired,
 82 |                              memory_order order = memory_order::seq_cst) volatile noexcept;
 83 | bool compare_exchange_strong(T& expected, T desired,
 84 |                              memory_order order = memory_order::seq_cst) noexcept;
 85 | 
 86 | 
87 | 88 |
89 | 90 | ❡17: 91 | 92 |
93 | 94 | *Requires:* The `failure` argument shall not be `memory_order::release` nor 95 | `memory_order::acq_rel`. 96 | 97 |
98 | 99 | ❡18: 100 | 101 |
102 | 103 | *Effects:* Retrieves the value in `expected`. It then atomically compares 104 | the <del>contents of the memory</del><ins>value representation of the value</ins> 105 | pointed to by `this` for equality with that previously retrieved from 106 | `expected`, and if true, replaces the <del>contents of the memory</del><ins>value</ins> 107 | pointed to by `this` with that in 108 | `desired`. If and only if the comparison is true, memory is affected according 109 | to the value of `success`, and if the comparison is false, memory is affected 110 | according to the value of `failure`. When only one `memory_order` argument is 111 | supplied, the value of `success` is `order`, and the value of `failure` is 112 | `order` except that a value of `memory_order::acq_rel` shall be replaced by the 113 | value `memory_order::acquire` and a value of `memory_order::release` shall be 114 | replaced by the value `memory_order::relaxed`. If and only if the comparison is 115 | false then, after the atomic operation, the <del>contents of the 116 | memory</del><ins>value</ins> in `expected` <del>are</del><ins>is</ins> 117 | replaced by the value read from the memory pointed to 118 | by `this` during the atomic comparison. If the operation returns `true`, these 119 | operations are atomic read-modify-write operations on the memory pointed to by 120 | `this`. Otherwise, these operations are atomic load operations on that memory. 121 | 122 | 
123 | 124 | ❡19: 125 | 126 |
127 | 128 | *Returns:* The result of the comparison. 129 | 130 |
131 | 132 | ❡20: 133 | 134 |
135 | 136 | [*Note:* 137 | 138 | For example, the effect of `compare_exchange_strong` on objects without padding bits is 139 | 140 | 141 | 142 | if (memcmp(this, &expected, sizeof(*this)) == 0) 143 | memcpy(this, &desired, sizeof(*this)); 144 | else 145 | memcpy(expected, this, sizeof(*this)); 146 | 147 | 148 | 149 | —*end note*] 150 | 151 | [*Example:* 152 | 153 | The expected use of the compare-and-exchange operations is as follows. The 154 | compare-and-exchange operations will update `expected` when another iteration 155 | of the loop is needed. 156 | 157 | 158 | 159 | expected = current.load(); 160 | do { 161 | desired = function(expected); 162 | } while (!current.compare_exchange_weak(expected, desired)); 163 | 164 | 165 | 166 | —*end example*] 167 | 168 | [*Example:* 169 | 170 | Because the expected value is updated only on failure, code releasing the 171 | memory containing the `expected` value on success will work. E.g. list head 172 | insertion will act atomically and would not introduce a data race in the 173 | following code: 174 | 175 | 176 | 177 | do { 178 | p->next = head; // make new list node point to the current head 179 | } while (!head.compare_exchange_weak(p->next, p)); // try to insert 180 | 181 | 182 | 183 | —*end example*] 184 | 185 |
186 | 187 | ❡21: 188 | 189 |
190 | 191 | Implementations should ensure that weak compare-and-exchange operations do not 192 | consistently return `false` unless either the atomic object has value different 193 | from `expected` or there are concurrent modifications to the atomic object. 194 | 195 |
196 | 197 | ❡22: 198 | 199 |
200 | 201 | *Remarks:* A weak compare-and-exchange operation may fail spuriously. That is, 202 | even when the contents of memory referred to by `expected` and `this` are equal, 203 | it may return `false` and store back to `expected` the same memory contents that 204 | were originally there. 205 | 206 | [*Note:* 207 | 208 | This spurious failure enables implementation of compare-and-exchange on a 209 | broader class of machines, e.g., load-locked store-conditional machines. A 210 | consequence of spurious failure is that nearly all uses of weak 211 | compare-and-exchange will be in a loop. When a compare-and-exchange is in a 212 | loop, the weak version will yield better performance on some platforms. When a 213 | weak compare-and-exchange would require a loop and a strong one would not, the 214 | strong one is preferable. 215 | 216 | —*end note*] 217 | 218 |
219 | 220 | ❡23: 221 | 222 |
223 | 224 | [*Note:* 225 | 226 | <ins>Under cases where the</ins> <del>The</del> `memcpy` and `memcmp` 227 | semantics of the compare-and-exchange operations <ins>apply, the outcome might 228 | be</ins> <del>may result in</del> failed comparisons for values that compare 229 | equal with `operator==` if the underlying type has padding bits, trap bits, or 230 | alternate representations of the same value. Notably, on implementations 231 | conforming to ISO/IEC/IEEE 60559, floating-point `-0.0` and `+0.0` will not 232 | compare equal with `memcmp` but will compare equal with `operator==`, and NaNs 233 | with the same payload will compare equal with `memcmp` but will not compare 234 | equal with `operator==`. 235 | 236 | —*end note*] 237 | 238 | 239 | 240 | [*Note:* 241 | 242 | Because compare-and-exchange acts on an object’s value representation, padding 243 | bits that never participate in the object’s value representation are ignored. 244 | 245 | As a consequence, the following code is guaranteed to avoid spurious failure: 246 | 247 | 248 | 249 | struct padded { 250 | char clank = 0x42; 251 | // Padding here. 252 | unsigned biff = 0xC0DEFEFE; 253 | }; 254 | atomic<padded> pad = ATOMIC_VAR_INIT({}); 255 | 256 | bool zap() { 257 | padded expected, desired { 0, 0 }; 258 | return pad.compare_exchange_strong(expected, desired); 259 | } 260 | 261 | 262 | 263 | —*end note*] 264 | 265 | [*Note:* 266 | 267 | For a union with bits that participate in the value representation of some 268 | members but not others, compare-and-exchange might always fail. This is because 269 | such padding bits have an indeterminate value when they do not participate in 270 | the value representation of the active member. 
271 | 272 | As a consequence, the following code is not guaranteed to ever succeed: 273 | 274 | 275 | 276 | union pony { 277 | double celestia = 0.; 278 | short luna; // padded 279 | }; 280 | atomic<pony> princesses = ATOMIC_VAR_INIT({}); 281 | 282 | bool party(pony desired) { 283 | pony expected; 284 | return princesses.compare_exchange_strong(expected, desired); 285 | } 286 | 287 | 288 | 289 | —*end note*] 290 | 291 | 292 | 293 |
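
The difference between comparing bytes and comparing with `operator==`, which the notes above rely on, can be observed directly. The following is a minimal sketch, not part of the proposed wording. It assumes an implementation conforming to ISO/IEC/IEEE 60559, and the helper names `same_bytes` and `cas_expecting_negative_zero` are hypothetical, introduced only for illustration.

```cpp
#include <atomic>
#include <cmath>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "sketch assumes ISO/IEC/IEEE 60559 doubles");

// Hypothetical helper: true when the two doubles have identical bytes.
bool same_bytes(double a, double b) {
  return std::memcmp(&a, &b, sizeof(double)) == 0;
}

// Hypothetical helper: tries to exchange an atomic holding +0.0 while
// expecting -0.0. Returns the result of compare_exchange_strong; on failure
// `expected` is overwritten with the value that was stored.
bool cas_expecting_negative_zero(double& expected) {
  std::atomic<double> value{0.0};
  expected = -0.0;
  return value.compare_exchange_strong(expected, 1.0);
}
```

With these helpers, `0.0 == -0.0` holds while `same_bytes(0.0, -0.0)` does not, and a quiet NaN is byte-identical to a copy of itself even though it never compares equal with `operator==`. The exchange in `cas_expecting_negative_zero` fails for the same reason: the two zeros have different value representations, so `expected` is updated to `+0.0`.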
294 | -------------------------------------------------------------------------------- /source/N4522.rst: -------------------------------------------------------------------------------- 1 | ============================================== 2 | N4522 ``std::atomic_object_fence(mo, T&&...)`` 3 | ============================================== 4 | 5 | :Author: Olivier Giroux 6 | :Contact: ogiroux@nvidia.com 7 | :Author: JF Bastien 8 | :Contact: jfb@google.com 9 | :Date: 2015-05-21 10 | :URL: https://github.com/jfbastien/papers/blob/master/source/N4522.rst 11 | 12 | --------- 13 | Rationale 14 | --------- 15 | 16 | Fences allow programmers to express a conservative approximation to the precise 17 | pair-wise relations of operations required to be ordered in the happens-before 18 | relation. This is conservative because fences use the sequenced-before relation 19 | to select vast extents of the program into the happens-before relation. 20 | 21 | This conservatism is commonly desired because it is difficult to reason about 22 | operations hidden behind layers of abstraction in C++ programs. An unfortunate 23 | consequence of this is that precise expression of ordering is not possible in 24 | C++ currently, which makes it easy to over-constrain the order of operations 25 | internal to synchronization primitives that comprise multiple atomic objects. 26 | This constrains the ability of implementations (compiler and hardware) to 27 | reorder, ignore, or assume the absence of operations that are not relevant or 28 | not visible. 29 | 30 | In existing practice, the ``flush`` primitive of OpenMP is more expressive than 31 | the fences of C++ in at least this one sense: it can optionally restrict the 32 | ordering of operations to a developer-specified set of memory locations. This is 33 | enough to exactly express the required pair-wise ordering for short lock-free 34 | algorithms. 
This capability isn't only relevant to OpenMP and would be further 35 | enhanced if it were integrated with the other facets of the more modern C++ 36 | memory model. 37 | 38 | An example use-case for this capability is a likely implementation strategy for 39 | N4392_'s ``std::barrier`` object. This algorithm makes ordered modifications on 40 | the atomic sub-objects of a larger non-atomic synchronization object, but the 41 | internal modifications need only be ordered with respect to each other, not all 42 | surrounding objects (they are ordered separately). 43 | 44 | .. _N4392: http://wg21.link/N4392 45 | 46 | In one example implementation, ``std::barrier`` is coded as follows: 47 | 48 | .. code-block:: c++ 49 | 50 | struct barrier { 51 | // Some member functions elided. 52 | void arrive_and_wait() { 53 | int const myepoch = epoch.load(memory_order_relaxed); 54 | int const result = arrived.fetch_add(1, memory_order_acq_rel) + 1; 55 | if (result == expected) { 56 | expected = nexpected.load(memory_order_relaxed); 57 | arrived.store(0, memory_order_relaxed); 58 | // Only need to order {expected, arrived} -> {epoch}. 59 | epoch.store(myepoch + 1, memory_order_release); 60 | } 61 | else 62 | while (epoch.load(memory_order_acquire) == myepoch) 63 | ; 64 | } 65 | private: 66 | int expected; 67 | atomic<int> arrived, nexpected, epoch; 68 | }; 69 | 70 | The release operation on the epoch atomic is likely to require the compiler to 71 | insert a fence that has an effect that goes beyond the intended constraint, 72 | which is to order only the operations on the barrier object. Since the barrier 73 | object is likely to be smaller than a cache line and the library's 74 | implementation can control its alignment using ``alignas``, it would be 75 | possible to compile this program without a fence in this location on 76 | architectures that are cache-line coherent. 
77 | 78 | To concisely express the bound on the set of memory operations whose order is 79 | constrained, we propose to accompany ``std::atomic_thread_fence`` with an 80 | ``object`` variant which takes a reference to the object(s) to be ordered by 81 | the fence. 82 | 83 | ----------------- 84 | Proposed addition 85 | ----------------- 86 | 87 | Under 29.2 Header ``<atomic>`` synopsis [**atomics.syn**]: 88 | 89 | .. code-block:: c++ 90 | 91 | namespace std { 92 | // 29.8, fences 93 | // ... 94 | template<class... T> 95 | void atomic_object_fence(memory_order, T&&... objects) noexcept; 96 | } 97 | 98 | Under 29.8 Fences [**atomics.fences**], after the current 99 | ``atomic_thread_fence`` paragraph: 100 | 101 | ``template<class... T> void atomic_object_fence(memory_order, T&&... objects) noexcept;`` 102 | 103 | *Effect*: Equivalent to ``atomic_thread_fence(order)`` except that operations on 104 | objects other than those in the variadic template arguments and their 105 | sub-objects are *un-sequenced* with the fence. 106 | 107 | *Note*: The compiler may omit fences entirely depending on alignment 108 | information, may generate a dynamic test leading to a fence for under-aligned 109 | objects, or may emit the same fence an ``atomic_thread_fence`` would. 110 | 111 | The ``__cpp_lib_atomic_object_fence`` feature test macro should be added. 112 | 113 | ---------------------- 114 | Example implementation 115 | ---------------------- 116 | 117 | A trivial, yet conforming implementation may implement the new fence in terms of 118 | the existing ``std::atomic_thread_fence`` using the same memory order: 119 | 120 | .. code-block:: c++ 121 | 122 | template<class... T> 123 | void atomic_object_fence(std::memory_order order, T &&...) 
noexcept { 124 | std::atomic_thread_fence(order); 125 | } 126 | 127 | A more advanced implementation can overload this for the single-object case 128 | on architectures (or micro-architectures) that have cache coherency with a known 129 | line size, even if it is conservatively approximated: 130 | 131 | .. code-block:: c++ 132 | 133 | #define __CACHELINE_SIZE // Secret (micro-)architectural value. 134 | template 135 | std::enable_if_t::value && 136 | __CACHELINE_SIZE - alignof(T) % __CACHELINE_SIZE >= sizeof(T)> 137 | atomic_object_fence(std::memory_order, T &&object) noexcept { 138 | asm volatile("" : "+m"(object) : "m"(object)); // Code motion barrier. 139 | } 140 | 141 | To extend this for multiple objects, an implementation for the same architecture may 142 | emit a run-time check that the total footprint of all the objects fits in the span of 143 | a single cache line. This check may commonly be eliminated as dead code, for example 144 | when the objects are references from a common base pointer. 145 | 146 | The above ``std::barrier`` example's inner-code can use the new overload as follows: 147 | 148 | .. code-block:: c++ 149 | 150 | if (result == expected) { 151 | expected = nexpected.load(memory_order_relaxed); 152 | arrived.store(0, memory_order_relaxed); 153 | atomic_object_fence(memory_order_release, *this); 154 | epoch.store(myepoch + 1, memory_order_relaxed); 155 | } 156 | 157 | It is equivalently valid to list the individual members of ``barrier`` instead of 158 | ``*this``. Both forms are equivalent. 159 | 160 | Less trivial implementations of ``std::atomic_object_fence`` can enable more 161 | optimizations for new hardware and portable program representations. 162 | 163 | ----------------- 164 | Relation to N4523 165 | ----------------- 166 | 167 | In N4523_ we propose to formalize the notions of false-sharing and true-sharing 168 | as perceived by the implementation in relation to the placement of objects in 169 | memory. 
In the expository implementation of the previous section we also showed 170 | how a cache-line coherent architecture or micro-architecture can elide fences 171 | that only bisect relations between objects that are in the same cache line, if 172 | provable at compile-time. These notions interact in a virtuous way because 173 | N4523's abstraction enables reasoning about likely cache behavior that 174 | implementations can optimize for. 175 | 176 | .. _N4523: http://wg21.link/N4523 177 | 178 | The example application of ``std::atomic_object_fence`` to the ``std::barrier`` 179 | object is improved by combining these notions as follows: 180 | 181 | .. code-block:: c++ 182 | 183 | struct alignas(std::thread::hardware_true_sharing_size) // N4523 184 | barrier { 185 | // Some member functions elided. 186 | void arrive_and_wait() { 187 | int const myepoch = epoch.load(memory_order_relaxed); 188 | int const result = arrived.fetch_add(1, memory_order_acq_rel) + 1; 189 | if (result == expected) { 190 | expected = nexpected.load(memory_order_relaxed); 191 | arrived.store(0, memory_order_relaxed); 192 | atomic_object_fence(memory_order_release, *this); // N4522 193 | epoch.store(myepoch + 1, memory_order_relaxed); 194 | } 195 | else 196 | while (epoch.load(memory_order_acquire) == myepoch) 197 | ; 198 | } 199 | private: 200 | int expected; 201 | atomic<int> arrived, nexpected, epoch; 202 | }; 203 | 204 | By aligning the barrier object to the true-sharing granularity, it is 205 | significantly more likely that the implementation will be able to elide the 206 | fence if the architecture or micro-architecture has cache-line coherency. Of 207 | course an implementation of the Standard is free to ensure this by other means; 208 | we provide this example as exposition for what developer programs might do. 
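
As a rough illustration of how the barrier behaves with today's facilities, the sketch below pairs the trivial fallback from the Example implementation section with a simplified barrier. It is an assumption-laden sketch rather than the proposed library code: the `sketch` namespace, the constructor, and the `run_rounds` helper are hypothetical, and the participant count is fixed at construction (no ``nexpected``) so that the non-atomic ``expected`` member is never written while other threads read it.

```cpp
#include <atomic>
#include <thread>
#include <vector>

namespace sketch {  // Hypothetical namespace, not part of the proposal.

// Trivial fallback: an object fence implemented as a full thread fence.
template <class... T>
void atomic_object_fence(std::memory_order order, T&&...) noexcept {
  std::atomic_thread_fence(order);
}

class barrier {
public:
  explicit barrier(int count) : expected(count) {}

  void arrive_and_wait() {
    int const myepoch = epoch.load(std::memory_order_relaxed);
    int const result = arrived.fetch_add(1, std::memory_order_acq_rel) + 1;
    if (result == expected) {
      // Last arriver: reset the count, then publish the new epoch.
      arrived.store(0, std::memory_order_relaxed);
      atomic_object_fence(std::memory_order_release, *this);
      epoch.store(myepoch + 1, std::memory_order_relaxed);
    } else {
      while (epoch.load(std::memory_order_acquire) == myepoch)
        std::this_thread::yield();
    }
  }

private:
  int const expected;  // Simplification: fixed for the barrier's lifetime.
  std::atomic<int> arrived{0}, epoch{0};
};

}  // namespace sketch

// Hypothetical driver: every thread bumps a counter, waits at the barrier,
// and checks that all increments of the current round are visible.
bool run_rounds(int threads, int rounds) {
  sketch::barrier b(threads);
  std::atomic<int> counter{0};
  std::atomic<bool> ok{true};
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&] {
      for (int r = 0; r < rounds; ++r) {
        counter.fetch_add(1, std::memory_order_relaxed);
        b.arrive_and_wait();
        if (counter.load(std::memory_order_relaxed) < threads * (r + 1))
          ok.store(false, std::memory_order_relaxed);
      }
    });
  }
  for (auto& th : pool) th.join();
  return ok.load();
}
```

A call such as ``run_rounds(4, 50)`` exercises the barrier across several epochs. An implementation providing a real ``atomic_object_fence`` could replace the fallback without touching the barrier itself, which is the point of passing ``*this`` to the fence.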
209 | 210 | -------------------- 211 | Memory model example 212 | -------------------- 213 | 214 | =========================== =========================== 215 | T0 T1 216 | =========================== =========================== 217 | ``0: w = 1;`` ``4: while(!a.load(rlx));`` 218 | ``1: x = 1;`` ``5: objfence(acq, a, x);`` 219 | ``2: objfence(rel, a, x);`` ``6: assert(x);`` 220 | ``3: a.store(1,rlx);`` ``7: assert(w);`` 221 | =========================== =========================== 222 | 223 | The semantics of fences mean that: 224 | 225 | ``2`` synchronizes-with ``5`` because [**29.8¶2**]: 226 | A. ``2`` is sequenced-before ``3``, 227 | B. ``3`` inter-thread happens-before ``4``, and 228 | C. ``4`` is sequenced-before ``5``. 229 | 230 | ``1`` happens-before ``6`` because [**1.10¶13-14**]: 231 | A. ``1`` is sequenced-before ``2``, 232 | B. ``2`` synchronizes-with ``5``, and 233 | C. ``5`` is sequenced-before ``6``. 234 | 235 | Therefore the program is well-defined (so far) and the ``assert(x)`` of ``6`` 236 | does not fire. 237 | 238 | However, the *un-sequenced* semantics of the object fence also mean that: 239 | 240 | ``0`` conflicts with ``7`` because [**1.10¶23**]: 241 | A. ``0`` is a store to ``w``, ``7`` is a load of ``w`` and they are not both 242 | atomic, and 243 | B. ``0`` is not sequenced-before ``2`` and ``5`` is not sequenced-before 244 | ``7``. 245 | 246 | Therefore the ``assert(w)`` of ``7`` makes the program undefined due to a 247 | data-race. 248 | 249 | -------------------------------------------------------------------------------- /source/P1018r5.bs: -------------------------------------------------------------------------------- 1 | 17 | 18 | Executive summary {#summary} 19 | ================= 20 | 21 | Most time was spent in ballot resolution for C++20, to address National Body comments in [[N4844]]. 22 | 23 | 24 | Work highlights {#high} 25 | =============== 26 | 27 | Language Evolution received roughly 100 National Body comments. 
We did at least one round of discussion on all of these comments. 28 | 29 | * Concepts: allow requires clauses on non-template friend functions of class templates. 30 | * Coroutines: most comments rejected, a few sent away to write a paper. 31 | * Undefined Behavior: deferred addressing all comments to C++23. 32 | * Feature test macros: comments were addressed. 33 | * Modules: many comments, including fixing issues around header units. 34 | * Changed how non-type template parameters work: allow types with all public members, all of which can themselves be used as NTTPs. This allows array members, reference members, pointers and references to subobjects, floating-point, and unions. 35 | * Began discussing some papers targeted at C++23. 36 | 37 | 38 | National Body comment details {#nb-details} 39 | ============================= 40 | 41 | Miscellaneous NB comments: 42 | 43 | 55 | 56 | using enum: 57 | 58 | 62 | 63 | Non-type template parameters: 64 | 65 | 71 | 72 | Concepts: 73 | 74 | 84 | 85 | Coroutines: 86 | 87 | 104 | 105 | Undefined Behavior: 106 | 107 | 111 | 112 | Modules: seen by SG2, mostly accepted their recommendations. 113 | 114 | Unicode: 115 | 116 | 119 | 120 | Feature test macros: 121 | 122 | All addressed by [[P1902r1]]. 123 | 124 | 129 | 130 | 131 | C++23 discussions {#cpp32} 132 | ================= 133 | 134 | We started discussing a few papers which could make it to C++23. 135 | 136 | * Floating-point types from [[P1467r2]] and [[P1468r2]] received strong support. 137 | * [[P1105r1]] freestanding: there's ongoing interest in better supporting freestanding targets, and we gave direction to the author. 138 | * [[P1371r1]] Pattern matching: moving along, but the authors need help with implementation / usage experience if we want this to make C++23. 139 | * [[P1040r4]] `std::embed`: was seen by this group and others, and received confusing feedback, though most people agree there's something useful to be had here. 
140 | * [[P1219r2]] Homogeneous variadic function parameters: did not receive sufficient support to move forward. 141 | * [[P1097r2]] Named character escapes: received feedback, will see again. 142 | * [[P1895r0]] tag_invoke: A general pattern for supporting customisable functions: the general feeling was that there were some concerns with a library-only solution to the problem. Several interested parties are planning on working with the paper authors to try to come up with a language feature that addresses it. 143 | * [[P1676r0]] C++ Exception Optimizations. An experiment: informative discussion. 144 | * [[P1365r0]] Using Coroutine TS with zero dynamic allocations: informative discussion. 145 | * [[P1046r1]] Automatically Generate More Operators: received feedback, fairly positive. 146 | * [[P1908r0]] Reserving Attribute Names for Future Use: accepted, sent to CWG. 147 | * [[P0876r9]] `fiber_context` - fibers without scheduler: targets a TS. Gave feedback, will see again. 148 | * [[P1061r1]] Structured Bindings can introduce a Pack: approve of general direction. 149 | * [[P1839r1]] Accessing Object Representations: approve of general direction. 150 | 151 | 152 | Near-future EWG plans {#future} 153 | ===================== 154 | 155 | There will still be some ballot resolution work in Prague, to address comments which we discussed but didn't resolve in Belfast. There will be no further ballot resolution after Prague. 156 | 157 | Ballot resolution will likely take a small portion of our time. Once that is done, Language Evolution will switch into full C++23 mode, likely following the plans outlined in [[P0592r3]]. These plans were discussed in multiple groups and received strong support. 158 | --------------------------------------------------------------------------------