├── .gitignore
├── source
    ├── Standardizing C++.pdf
    ├── index.rst
    ├── D2151r0.bs
    ├── DCanadian.bs
    ├── P0528r0.cc
    ├── D1501R0.bs
    ├── _templates
    │   └── layout.html
    ├── bikeshed.bs
    ├── P1205R0.bs
    ├── N4509.cc
    ├── P0152.cc
    ├── N4509.rst
    ├── P0908r0.bs
    ├── P0152R1.rst
    ├── P0152R0.rst
    ├── Math.signbit.bs
    ├── P0476r0.bs
    ├── p1102r0.bs
    ├── P0476r1.bs
    ├── P0502r0.bs
    ├── P0418r1.bs
    ├── P0418r2.bs
    ├── p1119r0.bs
    ├── N4523.rst
    ├── P1018R19.bs
    ├── P0528r1.bs
    ├── P0528r2.bs
    ├── P1018r6.bs
    ├── P1225R0.bs
    ├── P0154R0.rst
    ├── P0154R1.rst
    ├── P0476r2.bs
    ├── conf.py
    ├── p0528r3.bs
    ├── N4522.rst
    └── P1018r5.bs
├── .travis.yml
├── linkcheck.sh
├── deploy.sh
├── README.md
└── Makefile


/.gitignore:
--------------------------------------------------------------------------------
1 | *~
2 | *.html
3 | build/
4 | 


--------------------------------------------------------------------------------
/source/Standardizing C++.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jfbastien/papers/HEAD/source/Standardizing C++.pdf


--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
 1 | sudo: false
 2 | # This repository doesn't contain Python code, but it uses Python tooling.
 3 | language: python
 4 | python:
 5 |   - "2.7"
 6 | install:
 7 |   - pip install sphinx pygments lxml setuptools --upgrade
 8 |   - git clone https://github.com/tabatkins/bikeshed.git
 9 |   - pip install --editable $PWD/bikeshed
10 |   - bikeshed update
11 | script:
12 |   - make html
13 |   - ./linkcheck.sh
14 | notifications:
15 |   email: false
16 | 


--------------------------------------------------------------------------------
/source/index.rst:
--------------------------------------------------------------------------------
 1 | C++ standards committee papers by JF Bastien
 2 | ============================================
 3 | 
 4 | Here are a few papers that I've written for the C++ standards committee. This
 5 | list isn't comprehensive and currently only contains the papers which I've moved
 6 | to github_.
 7 | 
  8 | .. _github: https://github.com/jfbastien/papers
 9 | 
10 | .. toctree::
11 |   :maxdepth: 1
12 | 
13 |   N4455
14 |   P0152R1
15 |   P0154R1
16 |   P0153R0
17 |   P0193R1
18 |   2016-02
19 | 
20 | Previous revisions of the above papers:
21 | 
22 | .. toctree::
23 |   :maxdepth: 1
24 | 
25 |   N4509
26 |   P0152R0
27 |   N4523
28 |   P0154R0
29 |   N4522
30 |   P0193R0
31 | 


--------------------------------------------------------------------------------
/linkcheck.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | set -u
 4 | set -x
 5 | 
 6 | out=./build/linkcheck/output.txt
 7 | rm -rf $out
 8 | 
 9 | # Ignore linkcheck failures: new documents point to their own github location
10 | # which doesn't exist yet.
11 | make linkcheck
12 | 
13 | if [ ! -f $out ]; then
14 |   echo "Cannot find $out"
15 |   exit 1
16 | fi
17 | 
18 | # Manually check failures, discarding self-point failures. The others matter.
19 | #
20 | # The output.txt format is:
21 | # filename.rst:LINE: [broken] https://example.com/path/to/filename.rst: HTTP Error 404: Not Found
22 | grep -v "^\([^:]*\):.*\1" $out | grep "\[broken\]"
23 | if [ $? -eq 0 ]; then
24 |   cat $out
25 |   exit 1
26 | fi
27 | 
28 | exit 0
29 | 


--------------------------------------------------------------------------------
/deploy.sh:
--------------------------------------------------------------------------------
 1 | #! /bin/bash
 2 | 
 3 | set -e
 4 | set -u
 5 | 
 6 | # Deploy generated html pages to github.io.
 7 | 
 8 | BUILD=./build/
 9 | HTML=$BUILD/html
10 | DIR=$BUILD/jfbastien.github.io
11 | PAPERS=$DIR/papers
12 | CLONE=git@github.com:jfbastien/jfbastien.github.io.git
13 | 
14 | HASH=$(git rev-parse HEAD)
15 | SUBJECT=$(git log -n1 --pretty=format:%s)
16 | 
17 | # Hacky reuse of git's require_clean_work_tree.
18 | OPTIONS_SPEC=
19 | LONG_USAGE=
20 | USAGE=
21 | NONGIT_OK=
22 | SUBDIRECTORY_OK=
23 | source $(git --exec-path)/git-sh-setup ""
24 | require_clean_work_tree deploy "Please commit or stash changes."
25 | 
26 | # Copy generated html files to the github.io repo.
27 | rm -rf $DIR
28 | mkdir $DIR
29 | git clone $CLONE $DIR
30 | find $HTML/*.html -maxdepth 1 -type f \
31 |   \( -iname "*.html" ! -iname "genindex.html" ! -iname "search.html" \) | \
32 |   xargs -I{} cp {} $PAPERS/
33 | 
34 | # Commit the changes, and deploy them.
35 | pushd $PAPERS
36 | git status
37 | git add "*.html"
38 | git commit -m "Update '$SUBJECT'
39 | 
40 | Hash: $HASH"
41 | git push origin master
42 | popd
43 | 


--------------------------------------------------------------------------------
/source/D2151r0.bs:
--------------------------------------------------------------------------------
 1 | <pre class='metadata'>
 2 | Title: Language Evolution Issue List
 3 | Shortname: P2151
 4 | Revision: 0
 5 | Audience: EWG
 6 | Status: D
 7 | Group: WG21
 8 | URL: http://wg21.link/P2151r0
 9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P2151r0.bs">github.com/jfbastien/papers/blob/master/source/P2151r0.bs</a>
10 | Editor: JF Bastien, Apple, jfbastien@apple.com
11 | Date: 2020-04-10
12 | Markup Shorthands: markdown yes
13 | Toggle Diffs: no
14 | No abstract: true
15 | </pre>
16 | 
17 | The purpose of this document is to record the status of issues which have come before the Evolution Working Group (EWG) of the INCITS PL22.16 and ISO WG21 C++ Standards Committee. Issues represent potential defects in the C++ Standard. Issues against Core Language, Library, and Library Evolution are tracked separately.
18 | 
19 | EWG issues were previously tracked by [[N4539]].
20 | 
21 | This document contains:
22 | 
23 | * Evolution issues which are actively being considered by the Evolution Working Group, i.e., issues which have a status of New, Open, Ready, or Review.
 24 | * Evolution issues which have been closed since the document was last updated.
25 | 
26 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # C++ Standard Committee Papers
 2 | 
 3 | Build status: [![Build Status](https://travis-ci.org/jfbastien/papers.svg?branch=master)](https://travis-ci.org/jfbastien/papers)
 4 | 
 5 | Official C++ Standard Committee papers are available from [the C++ mailings][].
 6 | 
 7 | More information on the C++ Standard Committee is available on
 8 | [the Committee site][].
 9 | 
10 | I've written a few of these papers and co-authored a few others.
11 | I initially wrote them using reStructuredText, but have now moved to
12 | [bikeshed](https://github.com/tabatkins/bikeshed). Papers in this repository are
13 | final and published when numbered `N` or `P`, and are drafts when numbered
14 | `D`. This is an ISO thing: I can't revise already-published `N` or `P`
15 | papers. The paper revision (the `R` part in `P` numbered papers) has to be
16 | incremented, and a new paper published.
17 | 
18 | New paper numbers are obtained through the Committee's Vice-Chair. The Committee's
19 | website details [how to submit proposals][].
20 | 
21 |   [the Committee site]: https://isocpp.org/std/the-committee
22 |   [the C++ mailings]: http://open-std.org/jtc1/sc22/wg21/docs/papers/
23 |   [how to submit proposals]: https://isocpp.org/std/submit-a-proposal
24 | 


--------------------------------------------------------------------------------
/source/DCanadian.bs:
--------------------------------------------------------------------------------
 1 | <pre class='metadata'>
 2 | Title: Canadian friends are not friends
 3 | Shortname: D????
 4 | Revision: 0
 5 | Audience: EWG
 6 | Status: D
 7 | Group: WG21
 8 | URL: http://wg21.link/P????
 9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/DCanadian.bs">github.com/jfbastien/papers/blob/master/source/DCanadian.bs</a>
10 | Editor: JF Bastien, Woven by Toyota, cxx@jfbastien.com
11 | Editor: Bruno Cardoso Lopes, Meta, bruno.cardoso@gmail.com
12 | Editor: Michael Spencer, Apple, bigcheesegs@gmail.com
13 | Date: 2023-06-13
14 | Markup Shorthands: markdown yes
15 | Toggle Diffs: no
16 | No abstract: true
17 | </pre>
18 | 
19 | This paper addresses [[CWG1699]].
20 | 
21 | ```
22 | import Canadian; // Contains `export class Canadian { class buddy {}; friend struct friendly; };`
23 | 
24 | class c {
25 |     class n {};
26 |     friend struct friendly;
27 | };
28 | 
29 | void g() { // #2
30 |   // 'n' accessible here? 
31 | }
32 | 
33 | struct friendly {
34 |     friend class c::n;          // #1
35 |     friend void g();            // #2
36 |     friend void h();            // #3
37 |     friend void f() { c::n(); } // #4 (EDG/MSVC Reject, Clang/GCC Accept)
38 |     friend class Canadian::buddy;
39 |     friend void ohCanada() {    // #5
40 |         // Canadian::buddy accessible here?
41 |     }
42 | };
43 | 
44 | void h() { // #3
45 |   // 'n' accessible here? 
46 | }
47 | ```
48 | 


--------------------------------------------------------------------------------
/source/P0528r0.cc:
--------------------------------------------------------------------------------
 1 | #include <atomic>
 2 | #include <cstring>
 3 | #include <new>
 4 | #include <stdio.h>
 5 | #include <type_traits>
 6 | 
 7 | struct Padded {
 8 |   char c = 0xFF;
 9 |   // Padding here.
10 |   int i = 0xFEEDFACE;
11 |   Padded() = default;
12 | };
13 | typedef std::atomic<Padded> Atomic;
14 | typedef std::aligned_storage<sizeof(Atomic)>::type Storage;
15 | 
16 | void peek(const char* what, void *into) {
17 |   printf("%16s %08x %08x\n", what, *(int*)into, *(1 + (int*)into));
18 | }
19 | 
20 | Storage* create() {
21 |   auto* storage = new Storage();
22 |   std::memset(storage, 0xBA, sizeof(Storage));
23 |   asm volatile("":::"memory");
24 |   peek("storage", storage);
25 |   return storage;
26 | }
27 | 
28 | Atomic* change(Storage* storage) {
29 |   // As if we used an allocator which reuses memory.
30 |   auto* atomic = new(storage) Atomic;
31 |   peek("atomic placed", atomic);
32 |   std::atomic_init(atomic, Padded()); // Which bits go in?
33 |   peek("atomic init", atomic);
34 |   return atomic;
35 | }
36 | 
37 | Padded infloop_maybe(Atomic* atomic) {
38 |   Padded desired;  // Padding unknown.
39 |   Padded expected; // Could be different.
40 |   peek("desired before", &desired);
41 |   peek("expected before", &expected);
42 |   peek("atomic before", atomic);
43 |   while (
44 |     !atomic->compare_exchange_strong(
45 |       expected,
46 |       desired // Padding bits added and removed here ˙ ͜ʟ˙
47 |   ));
48 |   peek("expected after", &expected);
49 |   peek("atomic after", atomic);
50 |   return expected; // Maybe changed here as well.
51 | }
52 | 
53 | int main() {
54 |   auto* storage = create();
55 |   auto* atomic = change(storage);
56 |   Padded p = infloop_maybe(atomic);
57 |   peek("main", &p);
58 |   return 0;
59 | }
60 | 


--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
 1 | # Makefile for Sphinx documentation
 2 | #
 3 | 
 4 | # You can set these variables from the command line.
 5 | SPHINXOPTS    =
 6 | SPHINXBUILD   = sphinx-build
 7 | PAPER         =
 8 | BUILDDIR      = build
 9 | 
10 | # User-friendly check for sphinx-build
11 | ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
12 | $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
13 | endif
14 | 
15 | # Internal variables.
16 | PAPEROPT_a4     = -D latex_paper_size=a4
17 | PAPEROPT_letter = -D latex_paper_size=letter
18 | ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source
19 | 
20 | .PHONY: help clean html linkcheck deploy
21 | 
22 | help:
23 | 	@echo "Please use \`make <target>' where <target> is one of"
24 | 	@echo "  html       to make standalone HTML files"
25 | 	@echo "  linkcheck  to check all external links for integrity"
26 | 	@echo "  deploy     to deploy to github.io"
27 | 
28 | clean:
29 | 	rm -rf $(BUILDDIR)/*
30 | 
31 | html:
32 | 	echo "Building sphinx sources"
33 | 	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
34 | 	bikeshed update
35 | 	find ./source/ -name "*.bs" -type f | xargs -I{} -t -n1 bikeshed spec {}
36 | 	mv ./source/*.html $(BUILDDIR)/html/
37 | 	@echo
38 | 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
39 | 
40 | linkcheck:
41 | 	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
42 | 	@echo
43 | 	@echo "Link check complete; look for any errors in the above output " \
44 | 	      "or in $(BUILDDIR)/linkcheck/output.txt."
45 | 
46 | deploy: clean html linkcheck
47 | 	./deploy.sh
48 | 


--------------------------------------------------------------------------------
/source/D1501R0.bs:
--------------------------------------------------------------------------------
 1 | <pre class='metadata'>
 2 | Title: Feedback on <code>std::audio</code>
 3 | Shortname: D1501
 4 | Revision: 0
 5 | !Draft Revision: 0
 6 | Audience: SG13
 7 | Status: D
 8 | Group: WG21
 9 | URL: http://wg21.link/D1501R0
10 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/D1501R0.bs">github.com/jfbastien/papers/blob/master/source/D1501R0.bs</a>
11 | Editor: Richard Powell, Apple, richardp@apple.com
12 | Editor: Sophia Poirier, Apple, spoirier@apple.com
13 | Editor: Dan Klingler, Apple, dklingler@apple.com
14 | Editor: Tony Guetta, Apple, aguetta@apple.com
15 | No abstract: true
16 | Date: 2019-02-21
17 | Markup Shorthands: markdown yes
18 | </pre>
19 | 
20 | We’ve gathered input from a variety of folks involved in audio at Apple, and
21 | here is our joint, considered position regarding the `std::audio` proposal in
22 | [[P1386R0]].
23 | 
 24 | Audio is important to the Apple ecosystem. The type system and determinism of
 25 | C++ lend themselves well to the audio software domain. In the proposal we like the
26 | formalization of data types and algorithms that are common in the audio domain.
27 | However, we are concerned about the audio device interfaces and requiring C++
28 | systems to have a specific implementation.
29 | 
30 | Creating a good interface between software and audio hardware is something that
31 | on the surface seems straightforward, but on a practical system is challenging
32 | to implement correctly. This area has typically been fairly platform-specific or
33 | handled by specialist libraries, and may not be immediately amenable to
34 | standardization. We think it’s best not to standardize audio hardware I/O.
35 | 
36 | Instead of attempting to standardize the interface and mechanism of audio
37 | hardware, providing a common representation of audio data could be an area of
38 | exploration that is suited to the language.
39 | 


--------------------------------------------------------------------------------
/source/_templates/layout.html:
--------------------------------------------------------------------------------
 1 | {#
 2 |     Single-page template.
 3 | #}
 4 | {%- block doctype -%}
 5 | <!DOCTYPE html>
 6 | {%- endblock %}
 7 | {%- set titlesuffix = "" %}
 8 | <html>
 9 |   <head>
10 |     <meta charset="{{ encoding }}">
11 |     <meta name="viewport" content="width=device-width, initial-scale=1">
12 |     <link href='https://fonts.googleapis.com/css?family=Roboto' rel='stylesheet' type='text/css'>
13 |     <link href='https://fonts.googleapis.com/css?family=Inconsolata:bold' rel='stylesheet' type='text/css'>
14 |     <style>
15 |       body {
16 |         font-family: 'Roboto', sans-serif;
17 |       }
18 |       .body {
19 |         margin: 0 auto;
20 |         max-width: 80em;
21 |       }
22 |       a {
23 |         color: #455A64;
24 |       }
25 |       h1, h2, h3, h4, h5, h6 {
26 |         color: #37474F;
27 |       }
28 |       h1 a, h2 a, h3 a, h4 a, h5 a, h6 a {
29 |         padding-left: 1em;
30 |         padding-right: 3em;
31 |         text-decoration: none;
32 |         opacity: 0;
33 |       }
34 |       a.headerlink:hover {
35 |         opacity: 1;
36 |       }
37 |       .field-name {
38 |         text-align: right;
39 |         padding-right: 1em;
40 |       }
41 |       tt, .highlight {
42 |         color: #263238;
43 |         background-color: #ECEFF1;
44 |         font-family: 'Inconsolata', monospace;
45 |         font-weight: bold;
46 |       }
47 |       tt {
48 |         padding: 0em 0.5em;
49 |       }
50 |       .highlight {
51 |         margin: 0em 1em;
52 |         padding: 0.1em 1em;
53 |       }
54 |     </style>
55 |     {%- block htmltitle %}
56 |     <title>{{ title|striptags|e }}{{ titlesuffix }}</title>
57 |     {%- endblock %}
58 | {%- block extrahead %} {% endblock %}
59 |   </head>
60 |   <body>
61 | {%- block header %}{% endblock %}
62 | {%- block content %}
63 |   {%- block document %}
64 |     <div class="body">
65 | {% block body %} {% endblock %}
66 |     </div>
67 |   {%- endblock %}
68 | {%- endblock %}
69 |   </body>
70 | </html>
71 | 


--------------------------------------------------------------------------------
/source/bikeshed.bs:
--------------------------------------------------------------------------------
 1 | <pre class='metadata'>
 2 | Title: Shedding the bikeshed: C++ papers should focus on content, not style
 3 | Shortname: D0???
 4 | Revision: 1
 5 | Audience: all
 6 | Status: D
 7 | Group: WG21
 8 | URL: http://wg21.link/p0????
 9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/bikeshed.bs">github.com/jfbastien/papers/blob/master/source/bikeshed.bs</a>
10 | Editor: JF Bastien, cxx@jfbastien.com
11 | Abstract: Writing a C++ standards committee paper can be as easy as riding a bicycle 🚲
12 | Date: 2016-08-09
13 | Markup Shorthands: markdown yes
14 | Toggle Diffs: yes
15 | </pre>
16 | 
17 | Coloring the shed {#colour}
18 | =================
19 | 
20 | Thoughtful standards people put significant effort into writing their
21 | papers. Often, too much of that effort goes into <em>style</em> or
22 | <em>format</em> instead of <em>content</em>. This meta-paper is ironically all
23 | style and no C++ content. It proposes that you stop formatting and start using
24 | <a href="https://github.com/tabatkins/bikeshed">bikeshed</a>.
25 | 
26 | While we're at it, we'll also propose that you use a public version control
27 | service such as <a href="https://github.com">github</a> to make it easier for
28 | reviewers to see how a paper evolved, both while in draft state as well as from
29 | one revision to another. Final papers are meant to be consumed as-is, but your
30 | paper collaborators, editors, or future-self will thank you when performing
31 | archaeology to untangle the inevitable nonsensical part of your final paper.
32 | 
33 | To do {#todo}
34 | =====
35 | 
36 | https://github.com/tabatkins/bikeshed/blob/master/docs/quick-start.md
37 | 
38 | 1. Basics
39 |     - What does the final paper look like?
40 |     - What does the source look like? (see section 4.)
41 |     - Who uses it?
42 |     - Takes care of the boilerplate
43 | 2. Convenience
44 |     - Webpages work everywhere
45 |     - Readable offline, no downloads
46 |     - Unicode Just Works™ (even the EDG wiki now supports it)
47 | 3. Good practice
48 |     - github for diffs: easier to track changes
49 |     - github integration: auto-generation, etc
50 | 4. markdown + HTML escape hatch
51 |     - https://github.com/tabatkins/bikeshed/blob/master/docs/markup.md
52 |     - Railroad diagrams
53 |     - Code, and syntax highlight
54 |     - Toggle diff
55 | 5. Link to other papers
56 | 6. Getting started
57 |     - Installing https://github.com/tabatkins/bikeshed/blob/master/docs/install.md
58 | 


--------------------------------------------------------------------------------
/source/P1205R0.bs:
--------------------------------------------------------------------------------
 1 | <pre class='metadata'>
 2 | Title: Teleportation via <code>co_await</code>
 3 | Shortname: P1205
 4 | Revision: 0
 5 | Audience: SG1, CWG
 6 | Status: P
 7 | Group: WG21
 8 | URL: http://wg21.link/P1205R0
 9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P1205R0.bs">github.com/jfbastien/papers/blob/master/source/P1205R0.bs</a>
10 | Editor: Olivier Giroux, NVIDIA, ogiroux@nvidia.com
11 | Editor: JF Bastien, Apple, jfbastien@apple.com
12 | No abstract: true
13 | Date: 2018-09-28
14 | Markup Shorthands: markdown yes
15 | </pre>
16 | 
17 | Issues {#issues}
18 | ======
19 | 
20 | The C++ Coroutine TS [[N4736]] has issues 31 and 32 listed in [[P0664R5]]:
21 | 
22 | > **31.** Add a note warning about thread switching near await and/or `coroutine_handle` wording.
23 | >
24 | > Add a note warning about thread switching near await and/or `coroutine_handle` wording
25 | >
26 | > **32.** Add a normative text making it UB to migrate coroutines between certain kind of execution agents. 
27 | >
28 | > Add a normative text making it UB to migrate coroutines between certain kind of execution agents. Clarify that migrating between `std::thread`s is OK. But migrating between CPU and GPU is UB.
29 | 
30 | Discussion {#discuss}
31 | ==========
32 | 
33 | Using `co_await`, one can teleport a suspended execution between execution agents:
34 | 
35 | <xmp>
36 | thread::id get_an_id() {
37 | 
38 |   // here: acquire a lock, read thread_local
39 |     
40 |   co_yield std::this_thread::get_id(); //< one result
41 | 
42 |   // UB: release the lock, reuse the same thread_local
43 |     
44 |   co_return std::this_thread::get_id(); //< different result
45 | }
46 | </xmp>
47 | 
48 | We say "teleport" here because the code that relocates the coroutine is outside
49 | the coroutine, in a possibly unrelated part of the program. This teleportation
50 | can take your coroutine to many interesting places, for example:
51 | 
52 | 1. the thread that runs `main`
53 | 2. threads from `std::thread` / `std::async`
54 | 3. elemental functions of `std::par`, `std::par_unseq`, `std::unseq` algorithms
55 | 4. global / `thread_local` constructors (see note)
56 | 5. global / `thread_local` / `static` destructors (see note)
 57 | 6. functions registered with `atexit` / `at_quick_exit`
58 | 7. signal handlers
 59 | 8. future `fiber_context` of [[P0876R3]]
60 | 
61 | Note that it is presently implementation-defined whether many of these functions
62 | run in a specific thread, a single thread, or in many unspecified threads—see
63 | [[CWG2046]].
64 | 
65 | Proposed Resolution {#resolution}
66 | ===================
67 | 
68 | After [[N4736]] [**dcl.fct.def.coroutine**] ❡6:
69 | 
70 | <blockquote>
71 | 
72 |   A suspended coroutine can be resumed to continue execution by invoking a
73 |   resumption member function of an object of type `coroutine_handle&lt;P&gt;`
74 |   associated with this instance of the coroutine. The function that invoked a
75 |   resumption member function is called *resumer*. Invoking a resumption member
76 |   function for a coroutine that is not suspended results in undefined behavior.
77 | 
78 | </blockquote>
79 | 
80 | Add ❡7:
81 | 
82 | <blockquote>
83 | <ins>
84 | 
85 |   Resuming a coroutine on an execution agent other than the one it was suspended
86 |   on has implementation-defined behavior unless both are instances of
87 |   `std::thread`. [*Note*: a coroutine that is moved this way should avoid the use
88 |   of `thread_local` or `mutex` objects. — *End note*.]
89 | 
90 | </ins>
91 | </blockquote>
92 | 


--------------------------------------------------------------------------------
/source/N4509.cc:
--------------------------------------------------------------------------------
 1 | #include <atomic>
 2 | #include <iostream>
 3 | 
 4 | namespace std {
 5 | 
 6 |   namespace detail {
 7 |     // It is implementation-defined what this returns, as long as:
 8 |     //
 9 |     // if (std::atomic<T>::is_always_lock_free)
 10 |     //   assert(std::atomic<T>().is_lock_free());
11 |     //
12 |     // An implementation may therefore have more variable template
13 |     // specializations than the ones shown below.
14 |     template<typename T> static constexpr bool is_always_lock_free = false;
15 | 
16 |     // Implementations must match the C ATOMIC_*_LOCK_FREE macro values.
17 |     template<> static constexpr bool is_always_lock_free<bool> = 2 == ATOMIC_BOOL_LOCK_FREE;
18 |     template<> static constexpr bool is_always_lock_free<char> = 2 == ATOMIC_CHAR_LOCK_FREE;
19 |     template<> static constexpr bool is_always_lock_free<signed char> = 2 == ATOMIC_CHAR_LOCK_FREE;
20 |     template<> static constexpr bool is_always_lock_free<unsigned char> = 2 == ATOMIC_CHAR_LOCK_FREE;
21 |     template<> static constexpr bool is_always_lock_free<char16_t> = 2 == ATOMIC_CHAR16_T_LOCK_FREE;
22 |     template<> static constexpr bool is_always_lock_free<char32_t> = 2 == ATOMIC_CHAR32_T_LOCK_FREE;
23 |     template<> static constexpr bool is_always_lock_free<wchar_t> = 2 == ATOMIC_WCHAR_T_LOCK_FREE;
24 |     template<> static constexpr bool is_always_lock_free<short> = 2 == ATOMIC_SHORT_LOCK_FREE;
25 |     template<> static constexpr bool is_always_lock_free<unsigned short> = 2 == ATOMIC_SHORT_LOCK_FREE;
26 |     template<> static constexpr bool is_always_lock_free<int> = 2 == ATOMIC_INT_LOCK_FREE;
27 |     template<> static constexpr bool is_always_lock_free<unsigned int> = 2 == ATOMIC_INT_LOCK_FREE;
28 |     template<> static constexpr bool is_always_lock_free<long> = 2 == ATOMIC_LONG_LOCK_FREE;
29 |     template<> static constexpr bool is_always_lock_free<unsigned long> = 2 == ATOMIC_LONG_LOCK_FREE;
30 |     template<> static constexpr bool is_always_lock_free<long long> = 2 == ATOMIC_LLONG_LOCK_FREE;
31 |     template<> static constexpr bool is_always_lock_free<unsigned long long> = 2 == ATOMIC_LLONG_LOCK_FREE;
32 |     template<typename T> static constexpr bool is_always_lock_free<T*> = 2 == ATOMIC_POINTER_LOCK_FREE;
33 |     template<> static constexpr bool is_always_lock_free<std::nullptr_t> = 2 == ATOMIC_POINTER_LOCK_FREE;
34 | 
35 |     // The macros do not support float, double, long double, but C++ does
36 |     // support atomics of these types. An implementation shall ensure that these
37 |     // types, as well as user-defined types, guarantee the above invariant that
38 |     // is_always_lock_free implies is_lock_free for the same type.
39 |   }
40 | 
41 |   template<typename T>
42 |   struct atomic_n4509 {
43 |     // ...
44 |     static constexpr bool is_always_lock_free = detail::is_always_lock_free<T>;
45 |     // ...
46 |   };
47 | 
48 | }
49 | 
50 | template<typename T> using atomic = std::atomic_n4509<T>;
51 | 
52 | int main() {
53 |   std::cout <<
54 |     "bool\t" << atomic<bool>::is_always_lock_free << '\n' <<
55 |     "char\t" << atomic<char>::is_always_lock_free << '\n' <<
56 |     "signed char\t" << atomic<signed char>::is_always_lock_free << '\n' <<
57 |     "unsigned char\t" << atomic<unsigned char>::is_always_lock_free << '\n' <<
58 |     "char16_t\t" << atomic<char16_t>::is_always_lock_free << '\n' <<
59 |     "char32_t\t" << atomic<char32_t>::is_always_lock_free << '\n' <<
60 |     "wchar_t\t" << atomic<wchar_t>::is_always_lock_free << '\n' <<
61 |     "short\t" << atomic<short>::is_always_lock_free << '\n' <<
62 |     "unsigned short\t" << atomic<unsigned short>::is_always_lock_free << '\n' <<
63 |     "int\t" << atomic<int>::is_always_lock_free << '\n' <<
64 |     "unsigned int\t" << atomic<unsigned int>::is_always_lock_free << '\n' <<
65 |     "long\t" << atomic<long>::is_always_lock_free << '\n' <<
66 |     "unsigned long\t" << atomic<unsigned long>::is_always_lock_free << '\n' <<
67 |     "long long\t" << atomic<long long>::is_always_lock_free << '\n' <<
68 |     "unsigned long long\t" << atomic<unsigned long long>::is_always_lock_free << '\n' <<
69 |     "void*\t" << atomic<void*>::is_always_lock_free << '\n' <<
70 |     "std::nullptr_t\t" << atomic<std::nullptr_t>::is_always_lock_free << '\n';
71 | 
72 |   return 0;
73 | }
74 | 


--------------------------------------------------------------------------------
/source/P0152.cc:
--------------------------------------------------------------------------------
 1 | #include <atomic>
 2 | #include <iostream>
 3 | 
 4 | namespace std {
 5 | 
 6 |   namespace detail {
 7 |     // It is implementation-defined what this returns, as long as:
 8 |     //
 9 |     // if (std::atomic<T>::is_always_lock_free)
 10 |     //   assert(std::atomic<T>().is_lock_free());
11 |     //
12 |     // An implementation may therefore have more variable template
13 |     // specializations than the ones shown below.
14 |     template<typename T> static constexpr bool is_always_lock_free = false;
15 | 
16 |     // Implementations must match the C ATOMIC_*_LOCK_FREE macro values.
17 |     template<> static constexpr bool is_always_lock_free<bool> = 2 == ATOMIC_BOOL_LOCK_FREE;
18 |     template<> static constexpr bool is_always_lock_free<char> = 2 == ATOMIC_CHAR_LOCK_FREE;
19 |     template<> static constexpr bool is_always_lock_free<signed char> = 2 == ATOMIC_CHAR_LOCK_FREE;
20 |     template<> static constexpr bool is_always_lock_free<unsigned char> = 2 == ATOMIC_CHAR_LOCK_FREE;
21 |     template<> static constexpr bool is_always_lock_free<char16_t> = 2 == ATOMIC_CHAR16_T_LOCK_FREE;
22 |     template<> static constexpr bool is_always_lock_free<char32_t> = 2 == ATOMIC_CHAR32_T_LOCK_FREE;
23 |     template<> static constexpr bool is_always_lock_free<wchar_t> = 2 == ATOMIC_WCHAR_T_LOCK_FREE;
24 |     template<> static constexpr bool is_always_lock_free<short> = 2 == ATOMIC_SHORT_LOCK_FREE;
25 |     template<> static constexpr bool is_always_lock_free<unsigned short> = 2 == ATOMIC_SHORT_LOCK_FREE;
26 |     template<> static constexpr bool is_always_lock_free<int> = 2 == ATOMIC_INT_LOCK_FREE;
27 |     template<> static constexpr bool is_always_lock_free<unsigned int> = 2 == ATOMIC_INT_LOCK_FREE;
28 |     template<> static constexpr bool is_always_lock_free<long> = 2 == ATOMIC_LONG_LOCK_FREE;
29 |     template<> static constexpr bool is_always_lock_free<unsigned long> = 2 == ATOMIC_LONG_LOCK_FREE;
30 |     template<> static constexpr bool is_always_lock_free<long long> = 2 == ATOMIC_LLONG_LOCK_FREE;
31 |     template<> static constexpr bool is_always_lock_free<unsigned long long> = 2 == ATOMIC_LLONG_LOCK_FREE;
32 |     template<typename T> static constexpr bool is_always_lock_free<T*> = 2 == ATOMIC_POINTER_LOCK_FREE;
33 |     template<> static constexpr bool is_always_lock_free<std::nullptr_t> = 2 == ATOMIC_POINTER_LOCK_FREE;
34 | 
35 |     // The macros do not support float, double, long double, but C++ does
36 |     // support atomics of these types. An implementation shall ensure that these
37 |     // types, as well as user-defined types, guarantee the above invariant that
38 |     // is_always_lock_free implies is_lock_free for the same type.
39 |   }
40 | 
41 |   template<typename T>
42 |   struct atomic_n4509 {
43 |     // ...
44 |     static constexpr bool is_always_lock_free = detail::is_always_lock_free<T>;
45 |     // ...
46 |   };
47 | 
48 | }
49 | 
50 | template<typename T> using atomic = std::atomic_n4509<T>;
51 | 
52 | int main() {
53 |   std::cout <<
54 |     "bool\t" << atomic<bool>::is_always_lock_free << '\n' <<
55 |     "char\t" << atomic<char>::is_always_lock_free << '\n' <<
56 |     "signed char\t" << atomic<signed char>::is_always_lock_free << '\n' <<
57 |     "unsigned char\t" << atomic<unsigned char>::is_always_lock_free << '\n' <<
58 |     "char16_t\t" << atomic<char16_t>::is_always_lock_free << '\n' <<
59 |     "char32_t\t" << atomic<char32_t>::is_always_lock_free << '\n' <<
60 |     "wchar_t\t" << atomic<wchar_t>::is_always_lock_free << '\n' <<
61 |     "short\t" << atomic<short>::is_always_lock_free << '\n' <<
62 |     "unsigned short\t" << atomic<unsigned short>::is_always_lock_free << '\n' <<
63 |     "int\t" << atomic<int>::is_always_lock_free << '\n' <<
64 |     "unsigned int\t" << atomic<unsigned int>::is_always_lock_free << '\n' <<
65 |     "long\t" << atomic<long>::is_always_lock_free << '\n' <<
66 |     "unsigned long\t" << atomic<unsigned long>::is_always_lock_free << '\n' <<
67 |     "long long\t" << atomic<long long>::is_always_lock_free << '\n' <<
68 |     "unsigned long long\t" << atomic<unsigned long long>::is_always_lock_free << '\n' <<
69 |     "void*\t" << atomic<void*>::is_always_lock_free << '\n' <<
70 |     "std::nullptr_t\t" << atomic<std::nullptr_t>::is_always_lock_free << '\n';
71 | 
72 |   return 0;
73 | }
74 | 


--------------------------------------------------------------------------------
/source/N4509.rst:
--------------------------------------------------------------------------------
  1 | ==================================================
  2 | N4509 ``constexpr atomic<T>::is_always_lock_free``
  3 | ==================================================
  4 | 
  5 | :Author: Olivier Giroux
  6 | :Contact: ogiroux@nvidia.com
  7 | :Author: JF Bastien
  8 | :Contact: jfb@google.com
  9 | :Author: Jeff Snyder
 10 | :Contact: jeff-isocpp@caffeinated.me.uk
 11 | :Date: 2015-05-05
 12 | :URL: https://github.com/jfbastien/papers/blob/master/source/N4509.rst
 13 | :Source: https://github.com/jfbastien/papers/blob/master/source/N4509.cc
 14 | 
 15 | The current design for ``std::atomic<T>`` affords implementations the critical
 16 | freedom to revert to critical sections when hardware support for atomic
 17 | operations does not meet the size or semantic requirements for the associated
 18 | type ``T``. This:
 19 | 
 20 | * Preserves C++ support on aging hardware.
 21 | * Supports developers who don't target a specific architecture e.g. with the
 22 |   ``-march=xxx`` flag.
 23 | * Improves the portability of abstract representations for C++ programs,
 24 |   e.g. when compiling C++ code to execute portably within a web browser.
 25 | 
 26 | The Standard also ensures that developers can be informed of the
 27 | implementation's lock-freedom guarantees, by using the ``is_lock_free()`` member
 28 | and free-functions. This is important because programmers may want to select
 29 | algorithm implementations, or even select algorithms, based on this
 30 | knowledge. Developers are equally likely to do so for correctness and
 31 | performance reasons.
 32 | 
 33 | **The software design shipped in C++11 and C++14 is, however, somewhat sandbagged.**
 34 | 
 35 | There is poor support for static determination of lock-freedom guarantees.
 36 | 
 37 | At the present time the Standard has limited support in this domain: the
 38 | ``ATOMIC_..._LOCK_FREE`` macros that return ``2``, ``1`` or ``0`` if the
 39 | corresponding atomic type is *always* lock-free, sometimes lock-free or never
 40 | lock-free, respectively. These macros are little more than a consolation prize
 41 | because they do not work with an arbitrary type ``T`` (as the C++ native
 42 | ``std::atomic<T>`` library intends) and they leave adaptation for generic
 43 | programming entirely up to the developer.
 44 | 
 45 | This leads to the present, counter-intuitive state of the art whereby
 46 | non-traditional uses of C++ have better support than high-performance
 47 | computing. We aim to make the smallest possible change that improves the
 48 | situation for HPC while leaving all other uses untouched.
 49 | 
 50 | We propose a ``static constexpr`` complement of ``is_lock_free()`` that is
 51 | suitable for use with SFINAE and ``static_assert``.
 52 | 
 53 | -----------------
 54 | Proposed addition
 55 | -----------------
 56 | 
 57 | Under 29.5 Atomic types [**atomics.types.generic**]:
 58 | 
 59 | .. code-block:: c++
 60 | 
 61 |   namespace std {
 62 |     template <class T> struct atomic {
 63 |       static constexpr bool is_always_lock_free = /* implementation-defined */;
 64 |       // Omitting all other members for brevity.
 65 |     };
 66 |     template <> struct atomic<integral> {
 67 |       static constexpr bool is_always_lock_free = /* implementation-defined */;
 68 |       // Omitting all other members for brevity.
 69 |     };
 70 |     template <class T> struct atomic<T*> {
 71 |       static constexpr bool is_always_lock_free = /* implementation-defined */;
 72 |       // Omitting all other members for brevity.
 73 |     };
 74 |   }
 75 | 
 76 | After paragraph 2:
 77 | 
 78 | The ``static`` data member ``is_always_lock_free`` is true if the atomic type's
 79 | operations are always lock-free, and false otherwise. The value of
 80 | ``is_always_lock_free`` shall be consistent with the value of the corresponding
 81 | ``ATOMIC_..._LOCK_FREE`` macro, if defined.
 82 | 
 83 | Under 29.6.5 Requirements for operations on atomic types
 84 | [**atomics.types.operations.req**], in paragraph 7:
 85 | 
 86 | The return value of the ``is_lock_free`` member function shall be consistent
 87 | with the value of ``is_always_lock_free`` for the same type.
 88 | 
 89 | [*Example:* the following should never fail,
 90 | 
 91 | .. code-block:: c++
 92 | 
 93 |   if (atomic<T>::is_always_lock_free)
 94 |     assert(atomic<T>().is_lock_free());
 95 | 
 96 | — *end example*]
 97 | 
 98 | The ``__cpp_lib_atomic_is_always_lock_free`` feature test macro should be added.
 99 | 
100 | -------------------
101 | Additional material
102 | -------------------
103 | 
104 | We did not provide the ``atomic_is_always_lock_free`` C-style free functions
105 | (which the ``is_lock_free`` functions have) because these require a
106 | pointer. This makes the free functions significantly less useful as compile-time
107 | ``constexpr``.
108 | 
109 | We show a sample implementation:
110 | 
111 | .. literalinclude:: N4509.cc
112 |    :language: c++
113 |    :lines: 4-48
114 | 


--------------------------------------------------------------------------------
/source/P0908r0.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: Offsetof for Pointers to Members
  3 | Shortname: P0908
  4 | Revision: 0
  5 | Audience: EWG
  6 | Status: P
  7 | Group: WG21
  8 | Editor: Eddie Kohler, Harvard, kohler@seas.harvard.edu
  9 | URL: https://wg21.link/P0908r0
 10 | Abstract: The offsetof macro should support pointers to members.
 11 | Markup Shorthands: markdown yes
 12 | </pre>
 13 | 
 14 | The `offsetof` macro, inherited from C and applicable to standard-layout
 15 | classes (and, conditionally, other classes) in C++, calculates the layout
 16 | offset of a member within a class. `offsetof` is useful for calculating an
 17 | object pointer given a pointer to one of its members:
 18 | 
 19 | <xmp>
 20 | 
 21 | struct link {
 22 |   ...
 23 | };
 24 | 
 25 | struct container {
 26 |   link l;
 27 | };
 28 | 
 29 | container* container_from_link(link* x) {
 30 |   // x is known to be the .l part of some container
 31 |   uintptr_t x_address = reinterpret_cast<uintptr_t>(x);
 32 |   size_t l_offset = offsetof(container, l);
 33 |   return reinterpret_cast<container*>(x_address - l_offset);
 34 | }
 35 | 
 36 | </xmp>
 37 | 
 38 | This pattern is used in several implementations of intrusive containers, such
 39 | as Linux kernel linked lists (`struct list_head`).
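
As a minimal, self-contained sketch of the pattern above: the member names `next` and `payload` are hypothetical, added only so the example compiles, and the round trip through `uintptr_t` relies on the implementation-defined but commonly supported pointer/integer mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical intrusive-list hook; its member is illustrative only.
struct link {
  link* next = nullptr;
};

// Hypothetical standard-layout container holding the hook as a non-first
// member, so the computed offset is non-zero.
struct container {
  int payload = 0;
  link l;
};

// Recover the enclosing container from a pointer to its `l` member.
// Precondition: x really points at the `l` subobject of a live container.
inline container* container_from_link(link* x) {
  std::uintptr_t x_address = reinterpret_cast<std::uintptr_t>(x);
  std::size_t l_offset = offsetof(container, l);
  return reinterpret_cast<container*>(x_address - l_offset);
}
```

Given a `container c`, calling `container_from_link(&c.l)` is expected to yield `&c` on mainstream implementations.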
 40 | 
 41 | Unfortunately, although `offsetof` works for some unusual
 42 | member-designators, it does not work for pointers to members. This won’t
 43 | compile:
 44 | 
 45 | <xmp>
 46 | 
 47 | template <typename Container, typename Link, Link (Container::* member)>
 48 | Container* generic_container_from_link(Link* x) {
 49 |   uintptr_t x_address = reinterpret_cast<uintptr_t>(x);
 50 |   size_t link_offset = offsetof(Container, member); // error!
 51 |   return reinterpret_cast<Container*>(x_address - link_offset);
 52 | }
 53 | 
 54 | </xmp>
 55 | 
 56 | Programmers currently compute pointer-to-member offsets using `nullptr` casts
 57 | (i.e., the incorrect folk implementation of `offsetof`, which invokes
 58 | undefined behavior), or by jumping through other hoops:
 59 | 
 60 | <xmp>
 61 | 
 62 | template <typename Container, typename Link, Link (Container::* member)>
 63 | Container* generic_container_from_link(Link* x) {
 64 |   ...
 65 |   alignas(Container) char container_space[sizeof(Container)] = {};
 66 |   Container* fake_container = reinterpret_cast<Container*>(container_space);
 67 |   size_t link_offset = reinterpret_cast<uintptr_t>(&(fake_container->*member))
 68 |       - reinterpret_cast<uintptr_t>(fake_container);
 69 |   ...
 70 | }
 71 | 
 72 | </xmp>
 73 | 
 74 | `offsetof` with pointer-to-member member-designators should simply work.
 75 | Modern compilers implement `offsetof` using an extension (`__builtin_offsetof`
 76 | in GCC and LLVM), so implementation need not require library changes. To avoid
 77 | ambiguity, we propose this syntax:
 78 | 
 79 | <xmp>
 80 | 
 81 | size_t link_offset = offsetof(Container, .*member);
 82 | 
 83 | </xmp>
 84 | 
 85 | 
 86 | Questions {#qq}
 87 | =========
 88 | 
 89 | Must a pointer-to-member expression in an `offsetof` member-designator be a
 90 | constant expression (such as a template argument)? The C standard requires
 91 | that “the expression `&(t.member-designator)` evaluates to an address
 92 | constant,” which might make this code illegal:
 93 | 
 94 | <xmp>
 95 | 
 96 | struct container {
 97 |   char array[200];
 98 | };
 99 | 
100 | int index = /* dynamic value */;
101 | size_t offset = offsetof(container, array[index]);  // questionable
102 | 
103 | </xmp>
104 | 
105 | But since several current compilers accept dynamic array indexes, the proposed
106 | wording allows any pointer to member.
107 | 
108 | 
109 | Proposed Wording {#word}
110 | ================
111 | 
112 | In Sizes, alignments, and offsets [**support.types.layout**], modify the first
113 | sentence of ❡1 as follows:
114 | 
115 | <blockquote>
116 | 
117 | The macro `offsetof(type, member-designator)` has the same semantics as the
118 | corresponding macro in the C standard library header `<stddef.h>`, but accepts
119 | a restricted set of `type` arguments <ins> and a superset of
120 | `member-designator` arguments </ins> in this International Standard.
121 | 
122 | </blockquote>
123 | 
124 | Add this paragraph after ❡1:
125 | 
126 | <blockquote>
127 | 
128 | <ins> An `offsetof` `member-designator` may contain pointer-to-member
129 | expressions as well as `member-designators` acceptable in C. A
130 | `member-designator` may begin with a prefix `.` or `.*` operator (e.g.,
131 | `offsetof(type, .member_name)` or `offsetof(type, .*pointer_to_member)`). If
132 | the prefix operator is omitted, `.` is assumed. </ins>
133 | 
134 | </blockquote>
135 | 
136 | 
137 | Example online discussions of the issue {#disc}
138 | =======================================
139 | 
140 | * <a href="https://groups.google.com/forum/#!topic/llvm-dev/l78RQ9zJR64">[LLVMdev] Evaluation of offsetof() macro</a>
141 | * <a href="https://gist.github.com/graphitemaster/494f21190bb2c63c5516">Working around offsetof limitations in C++</a>
142 | 


--------------------------------------------------------------------------------
/source/P0152R1.rst:
--------------------------------------------------------------------------------
  1 | ====================================================
  2 | P0152R1 ``constexpr atomic<T>::is_always_lock_free``
  3 | ====================================================
  4 | 
  5 | :Author: Olivier Giroux
  6 | :Contact: ogiroux@nvidia.com
  7 | :Author: JF Bastien
  8 | :Contact: jfb@google.com
  9 | :Author: Jeff Snyder
 10 | :Contact: jeff-isocpp@caffeinated.me.uk
 11 | :Date: 2016-03-02
 12 | :Previous: http://wg21.link/N4509
 13 | :Previous: http://wg21.link/P0152R0
 14 | :URL: https://github.com/jfbastien/papers/blob/master/source/P0152R1.rst
 15 | :Source: https://github.com/jfbastien/papers/blob/master/source/P0152.cc
 16 | 
 17 | The current design for ``std::atomic<T>`` affords implementations the critical
 18 | freedom to revert to critical sections when hardware support for atomic
 19 | operations does not meet the size or semantic requirements for the associated
 20 | type ``T``. This:
 21 | 
 22 | * Preserves C++ support on aging hardware.
 23 | * Supports developers who don't target a specific architecture e.g. with the
 24 |   ``-march=xxx`` flag.
 25 | * Improves the portability of abstract representations for C++ programs,
 26 |   e.g. when compiling C++ code to execute portably within a web browser.
 27 | 
 28 | The Standard also ensures that developers can be informed of the
 29 | implementation's lock-freedom guarantees, by using the ``is_lock_free()`` member
 30 | and free-functions. This is important because programmers may want to select
 31 | algorithm implementations, or even select algorithms, based on this
 32 | knowledge. Developers are equally likely to do so for correctness and
 33 | performance reasons.
 34 | 
 35 | **The software design shipped in C++11 and C++14 is, however, somewhat sandbagged.**
 36 | 
 37 | There is poor support for static determination of lock-freedom guarantees.
 38 | 
 39 | At the present time the Standard has limited support in this domain: the
 40 | ``ATOMIC_..._LOCK_FREE`` macros that return ``2``, ``1`` or ``0`` if the
 41 | corresponding atomic type is *always* lock-free, sometimes lock-free or never
 42 | lock-free, respectively. These macros are little more than a consolation prize
 43 | because they do not work with an arbitrary type ``T`` (as the C++ native
 44 | ``std::atomic<T>`` library intends) and they leave adaptation for generic
 45 | programming entirely up to the developer.
 46 | 
 47 | This leads to the present, counter-intuitive state of the art whereby
 48 | non-traditional uses of C++ have better support than high-performance
 49 | computing. We aim to make the smallest possible change that improves the
 50 | situation for HPC while leaving all other uses untouched.
 51 | 
 52 | We propose a ``static constexpr`` complement of ``is_lock_free()`` that is
 53 | suitable for use with SFINAE and ``static_assert``.
 54 | 
 55 | -----------------
 56 | Proposed addition
 57 | -----------------
 58 | 
 59 | Under 29.5 Atomic types [**atomics.types.generic**]:
 60 | 
 61 | .. code-block:: c++
 62 | 
 63 |   namespace std {
 64 |     template <class T> struct atomic {
 65 |       static constexpr bool is_always_lock_free = implementation-defined;
 66 |       // Omitting all other members for brevity.
 67 |     };
 68 |     template <> struct atomic<integral> {
 69 |       static constexpr bool is_always_lock_free = implementation-defined;
 70 |       // Omitting all other members for brevity.
 71 |     };
 72 |     template <class T> struct atomic<T*> {
 73 |       static constexpr bool is_always_lock_free = implementation-defined;
 74 |       // Omitting all other members for brevity.
 75 |     };
 76 |   }
 77 | 
 78 | Under 29.6.5 Requirements for operations on atomic types
 79 | [**atomics.types.operations.req**], between paragraphs 6 and 7:
 80 | 
 81 | .. code-block:: c++
 82 | 
 83 |   static constexpr bool is_always_lock_free = implementation-defined;
 84 | 
 85 | The ``static`` data member ``is_always_lock_free`` is true if the atomic type's
 86 | operations are always lock-free, and false otherwise.
 87 | 
 88 | [*Note:* The value of ``is_always_lock_free`` is consistent with the value of
 89 | the corresponding ``ATOMIC_..._LOCK_FREE`` macro, if defined. — *end note*]
 90 | 
 91 | Under 29.6.5 Requirements for operations on atomic types
 92 | [**atomics.types.operations.req**], in paragraph 7:
 93 | 
 94 | [*Note:* The return value of the ``is_lock_free`` member function is consistent
 95 | with the value of ``is_always_lock_free`` for the same type. — *end note*]
 96 | 
 97 | The ``__cpp_lib_atomic_is_always_lock_free`` feature test macro should be added.
 98 | 
 99 | -------------------
100 | Additional material
101 | -------------------
102 | 
103 | We did not provide the ``atomic_is_always_lock_free`` C-style free functions
104 | (which the ``is_lock_free`` functions have) because these require a
105 | pointer. This makes the free functions significantly less useful as compile-time
106 | ``constexpr``.
107 | 
108 | We show a sample implementation:
109 | 
110 | .. literalinclude:: P0152.cc
111 |    :language: c++
112 |    :lines: 4-48
113 | 


--------------------------------------------------------------------------------
/source/P0152R0.rst:
--------------------------------------------------------------------------------
  1 | ====================================================
  2 | P0152R0 ``constexpr atomic<T>::is_always_lock_free``
  3 | ====================================================
  4 | 
  5 | :Author: Olivier Giroux
  6 | :Contact: ogiroux@nvidia.com
  7 | :Author: JF Bastien
  8 | :Contact: jfb@google.com
  9 | :Author: Jeff Snyder
 10 | :Contact: jeff-isocpp@caffeinated.me.uk
 11 | :Date: 2015-10-21
 12 | :Previous: http://wg21.link/N4509
 13 | :URL: https://github.com/jfbastien/papers/blob/master/source/P0152R0.rst
 14 | :Source: https://github.com/jfbastien/papers/blob/master/source/P0152.cc
 15 | 
 16 | The current design for ``std::atomic<T>`` affords implementations the critical
 17 | freedom to revert to critical sections when hardware support for atomic
 18 | operations does not meet the size or semantic requirements for the associated
 19 | type ``T``. This:
 20 | 
 21 | * Preserves C++ support on aging hardware.
 22 | * Supports developers who don't target a specific architecture e.g. with the
 23 |   ``-march=xxx`` flag.
 24 | * Improves the portability of abstract representations for C++ programs,
 25 |   e.g. when compiling C++ code to execute portably within a web browser.
 26 | 
 27 | The Standard also ensures that developers can be informed of the
 28 | implementation's lock-freedom guarantees, by using the ``is_lock_free()`` member
 29 | and free-functions. This is important because programmers may want to select
 30 | algorithm implementations, or even select algorithms, based on this
 31 | knowledge. Developers are equally likely to do so for correctness and
 32 | performance reasons.
 33 | 
 34 | **The software design shipped in C++11 and C++14 is, however, somewhat sandbagged.**
 35 | 
 36 | There is poor support for static determination of lock-freedom guarantees.
 37 | 
 38 | At the present time the Standard has limited support in this domain: the
 39 | ``ATOMIC_..._LOCK_FREE`` macros that return ``2``, ``1`` or ``0`` if the
 40 | corresponding atomic type is *always* lock-free, sometimes lock-free or never
 41 | lock-free, respectively. These macros are little more than a consolation prize
 42 | because they do not work with an arbitrary type ``T`` (as the C++ native
 43 | ``std::atomic<T>`` library intends) and they leave adaptation for generic
 44 | programming entirely up to the developer.
 45 | 
 46 | This leads to the present, counter-intuitive state of the art whereby
 47 | non-traditional uses of C++ have better support than high-performance
 48 | computing. We aim to make the smallest possible change that improves the
 49 | situation for HPC while leaving all other uses untouched.
 50 | 
 51 | We propose a ``static constexpr`` complement of ``is_lock_free()`` that is
 52 | suitable for use with SFINAE and ``static_assert``.
 53 | 
 54 | -----------------
 55 | Proposed addition
 56 | -----------------
 57 | 
 58 | Under 29.5 Atomic types [**atomics.types.generic**]:
 59 | 
 60 | .. code-block:: c++
 61 | 
 62 |   namespace std {
 63 |     template <class T> struct atomic {
 64 |       static constexpr bool is_always_lock_free = implementation-defined;
 65 |       // Omitting all other members for brevity.
 66 |     };
 67 |     template <> struct atomic<integral> {
 68 |       static constexpr bool is_always_lock_free = implementation-defined;
 69 |       // Omitting all other members for brevity.
 70 |     };
 71 |     template <class T> struct atomic<T*> {
 72 |       static constexpr bool is_always_lock_free = implementation-defined;
 73 |       // Omitting all other members for brevity.
 74 |     };
 75 |   }
 76 | 
 77 | Under 29.6.5 Requirements for operations on atomic types
 78 | [**atomics.types.operations.req**], between paragraphs 6 and 7:
 79 | 
 80 | .. code-block:: c++
 81 | 
 82 |   static constexpr bool is_always_lock_free = implementation-defined;
 83 | 
 84 | The ``static`` data member ``is_always_lock_free`` is true if the atomic type's
 85 | operations are always lock-free, and false otherwise. The value of
 86 | ``is_always_lock_free`` shall be consistent with the value of the corresponding
 87 | ``ATOMIC_..._LOCK_FREE`` macro, if defined.
 88 | 
 89 | Under 29.6.5 Requirements for operations on atomic types
 90 | [**atomics.types.operations.req**], in paragraph 7:
 91 | 
 92 | The return value of the ``is_lock_free`` member function shall be consistent
 93 | with the value of ``is_always_lock_free`` for the same type.
 94 | 
 95 | [*Example:* The following should never fail
 96 | 
 97 | .. code-block:: c++
 98 | 
 99 |   if (atomic<T>::is_always_lock_free)
100 |     assert(atomic<T>().is_lock_free());
101 | 
102 | — *end example*]
103 | 
104 | The ``__cpp_lib_atomic_is_always_lock_free`` feature test macro should be added.
105 | 
106 | -------------------
107 | Additional material
108 | -------------------
109 | 
110 | We did not provide the ``atomic_is_always_lock_free`` C-style free functions
111 | (which the ``is_lock_free`` functions have) because these require a
112 | pointer. This makes the free functions significantly less useful as compile-time
113 | ``constexpr``.
114 | 
115 | We show a sample implementation:
116 | 
117 | .. literalinclude:: P0152.cc
118 |    :language: c++
119 |    :lines: 4-48
120 | 


--------------------------------------------------------------------------------
/source/Math.signbit.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: Math.signbit
  3 | Shortname: Math.signbit
  4 | Revision: 0
  5 | Status: Stage1
  6 | Group: TC39
  7 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/Math.signbit.bs">github.com/jfbastien/papers/blob/master/source/Math.signbit.bs</a>
  8 | Editor: JF Bastien, Apple, jfbastien@apple.com
  9 | ED:
 10 | Abstract:
 11 | Date: 2017-01-26
 12 | Markup Shorthands: markdown yes
 13 | </pre>
 14 | 
 15 | IEEE 754 has a precise meaning for *sign bit*. JavaScript's `Math.sign` falls
 16 | short on `-0.0` and `+0.0`. This is a shortcoming of a "batteries included"
 17 | approach to language design.
 18 | 
 19 | Correctly obtaining the sign bit of a Number in JavaScript is somewhat
 20 | unintuitive: the naïve `x < 0` approach fails if `x` is `-0.0` because `0.0` and
 21 | `-0.0` compare equal to each other.
 22 | 
 23 | One can instead rely on division by zero returning one of `-Infinity` or
 24 | `+Infinity`: `1.0 / x < 0`. This now has the interesting caveat of returning
 25 | `NaN` if `x` was `NaN`. It's also highly counter-intuitive.
 26 | 
 27 | JavaScript aficionados will know that `Object.is(-0, x)` will return `true` when
 28 | `x` is `-0` but not when it's `0`. This is surprising for developers who are
 29 | more numerics-oriented than object-—dare I say prototype-?—oriented. These
 30 | developers just want the sign bit, IEEE 754 has a very precise definition of
 31 | what the sign bit is, and why can't JavaScript just give them the sign bit?
 32 | 
 33 | This issue [has been discussed previously](https://esdiscuss.org/topic/math-sign-vs-0)
 34 | but was never addressed. We believe that this proposal can fix this
 35 | oft-encountered problem once and for all.
 36 | 
 37 | 
 38 | Revision History {#rev}
 39 | ================
 40 | 
 41 | * Presented at the [2017-01](https://github.com/tc39/agendas/blob/master/2017/01.md) TC39 meeting and moved to Stage 1.
 42 | 
 43 | 
 44 | Background {#bg}
 45 | ==========
 46 | 
 47 | IEEE 754 {#ieee754}
 48 | --------
 49 | 
 50 | [[IEEE754]] section 5.5.1 defines *sign bit operations*. These operations are
 51 | quiet-computational operations which only affect the sign bit of the arithmetic
 52 | format. The operations treat floating-point numbers and NaNs alike, and signal
 53 | no exception. As defined, they may propagate non-canonical encodings.
 54 | 
 55 | The following operations are defined:
 56 | 
 57 | * `copy`
 58 | * `negate`
 59 | * `abs`
 60 | 
 61 | C / C++ {#cpp}
 62 | -------
 63 | 
 64 | [[C]] and [[Cpp]] define `signbit` in `<math.h>` and `<cmath>` respectively. It
 65 | returns a nonzero `int` value if and only if the sign of its argument value is
 66 | negative. The `signbit` macro reports the sign of all values, including
 67 | infinities, zeros, and NaNs.
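
The difference between comparing against zero and reading the sign bit can be sketched in C++ as follows. The helper name `sign_hidden_from_comparison` is hypothetical, and the example assumes IEEE 754 `double` arithmetic without value-changing floating-point optimizations.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: true when `x < 0` misses a sign that std::signbit
// still reports. Under IEEE 754 this is expected for negative zero (and for
// NaNs whose sign bit is set).
inline bool sign_hidden_from_comparison(double x) {
  return !(x < 0.0) && std::signbit(x);
}
```

For ordinary negative numbers both tests agree, so the helper returns `false`; it singles out exactly the values that the naïve comparison gets wrong.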
 68 | 
 69 | Go {#go}
 70 | ---
 71 | 
 72 | [[Go]]'s math package defines `Signbit` as `true` if `x` is negative or negative
 73 | zero. While the specification is silent on NaN,
 74 | [the implementation](https://golang.org/src/math/signbit.go) clearly extracts the
 75 | sign bit regardless of NaN-ness.
 76 | 
 77 | `Math.sign(x)` {#sign}
 78 | -----------
 79 | 
 80 | JavaScript provides `Math.sign` which is specified as follows:
 81 | 
 82 | <blockquote>
 83 | 
 84 |   Returns the sign of the x, indicating whether x is positive, negative or zero.
 85 | 
 86 |   * If `x` is `NaN`, the result is `NaN`.
 87 |   * If `x` is `-0`, the result is `-0`.
 88 |   * If `x` is `+0`, the result is `+0`.
 89 |   * If `x` is negative and not `-0`, the result is `-1`.
 90 |   * If `x` is positive and not `+0`, the result is `+1`.
 91 | 
 92 | </blockquote>
 93 | 
 94 | This falls short when dealing with `-0` and `+0` since these values both compare
 95 | equal.
 96 | 
 97 | 
 98 | Proposal {#proposal}
 99 | ========
100 | 
101 | Given existing precedent as well as common hardware support, we propose adding
102 | `Math.signbit` to JavaScript as follows.
103 | 
104 | `Math.signbit(x)` {#spec}
105 | -----------------
106 | 
107 | Returns whether the sign bit of `x` is set.
108 | 
109 | 1. If `x` is `NaN`, the result is `false`.
110 | 1. If `x` is `-0`, the result is `true`.
111 | 1. If `x` is negative, the result is `true`.
112 | 1. Otherwise, the result is `false`.
113 | 
114 |   Note: The "Function Properties of the Math Object" section already states:
115 |   "Each of the following `Math` object functions applies the `ToNumber` abstract
116 |   operation to each of its arguments."
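
The steps above can be modelled in C++ as a rough sketch. The function name `math_signbit_model` is hypothetical, the `ToNumber` coercion is assumed to have already produced a `double`, and IEEE 754 arithmetic is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical C++ model of the proposed Math.signbit steps, applied to a
// value that has already been through ToNumber (modelled here as double).
inline bool math_signbit_model(double x) {
  if (std::isnan(x)) return false;  // step 1: NaN yields false
  return std::signbit(x);           // steps 2-4: true for -0 and negatives
}
```

Unlike the C and C++ `signbit`, this model deliberately ignores the sign bit of a NaN, matching step 1 of the proposal.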
117 | 
118 | Alternatives {#alts}
119 | ------------
120 | 
121 | This proposal makes decisions which TC39 may want to consider modifying:
122 | 
123 | * Coercion via `ToNumber`.
124 | * The return type is Boolean.
125 | * NaN is equivalent to a positive number.
126 | 
127 | 
128 | <pre class=biblio>
129 | {
130 |     "IEEE754": {
131 |         "href": "https://standards.ieee.org/findstds/standard/754-2008.html",
132 |         "title": "IEEE 754-2008",
133 |         "publisher": "IEEE Computer Society"
134 |     },
135 |     "C": {
136 |         "href": "http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf",
137 |         "title": "Programming Languages — C",
138 |         "publisher": "ISO/IEC JTC1 SC22 WG14"
139 |     },
140 |     "Cpp": {
141 |         "href": "http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3797.pdf",
142 |         "title": "Programming Languages — C++",
143 |         "publisher": "ISO/IEC JTC1 SC22 WG21"
144 |     },
145 |     "Go": {
146 |         "href": "https://golang.org/pkg/math/",
147 |         "title": "The Go Programming Language — Package math"
148 |     }
149 | }
150 | </pre>
151 | 


--------------------------------------------------------------------------------
/source/P0476r0.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: Bit-casting object representations
  3 | Shortname: P0476
  4 | Revision: 0
  5 | Audience: LEWG, LWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/P0476r0
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P0476r0.bs">github.com/jfbastien/papers/blob/master/source/P0476r0.bs</a>
 10 | !Implementation: <a href="https://github.com/jfbastien/bit_cast/">github.com/jfbastien/bit_cast/</a>
 11 | Editor: JF Bastien, Apple, jfbastien@apple.com
 12 | Abstract: Obtaining equivalent object representations The Right Way™.
 13 | Date: 2016-10-16
 14 | Markup Shorthands: markdown yes
 15 | </pre>
 16 | 
 17 | Background {#bg}
 18 | ==========
 19 | 
 20 | Low-level code often seeks to interpret objects of one type as another: keep the
 21 | same bits, but obtain an object of a different type. Doing so correctly is
 22 | error-prone: using `reinterpret_cast` or `union` runs afoul of type-aliasing
 23 | rules yet these are the intuitive solutions developers mistakenly turn to.
 24 | 
 25 | Attuned developers use `aligned_storage` with `memcpy`, avoiding alignment
 26 | pitfalls and allowing them to bit-cast non-default-constructible types.
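
A minimal sketch of that idiom, assuming C++11 or later, is shown here. The name `bit_cast_sketch` is hypothetical; this is an illustration of the `aligned_storage` plus `memcpy` approach, not the wording proposed below, and it cannot be `constexpr`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper illustrating the aligned_storage + memcpy idiom.
template <typename To, typename From>
To bit_cast_sketch(const From& from) noexcept {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable<To>::value,
                "To must be trivially copyable");
  static_assert(std::is_trivially_copyable<From>::value,
                "From must be trivially copyable");
  // Raw, suitably aligned storage: To does not need a default constructor.
  // (std::aligned_storage is deprecated as of C++23.)
  typename std::aligned_storage<sizeof(To), alignof(To)>::type storage;
  std::memcpy(&storage, &from, sizeof(To));
  return reinterpret_cast<To&>(storage);
}
```

On an implementation where `float` is a 32-bit IEEE 754 type, `bit_cast_sketch<std::uint32_t>(1.0f)` would be expected to produce `0x3f800000`.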
 27 | 
 28 | This facility inevitably ends up being used incorrectly on pointer types, so we
 29 | propose using appropriate concepts to prevent misuse. As our sample
 30 | implementation demonstrates, we could as well use `static_assert` or template
 31 | SFINAE, but the timing of this library feature will likely coincide with the
 32 | standardization of concepts.
 33 | 
 34 | Furthermore, it is currently impossible to implement a `constexpr` bit-cast
 35 | function, as `memcpy` itself isn't `constexpr`. Marking our proposed function as
 36 | `constexpr` doesn't require or prevent `memcpy` from becoming `constexpr`. This
 37 | leaves implementations free to use their own internal solution (e.g. LLVM has <a
 38 | href="http://llvm.org/docs/LangRef.html#bitcast-to-instruction">a `bitcast`
 39 | opcode</a>).
 40 | 
 41 | We propose to standardize this oft-used idiom, and avoid the pitfalls once and
 42 | for all.
 43 | 
 44 | Proposed Wording {#word}
 45 | ================
 46 | 
 47 | Below, substitute the `�` character with a number the editor finds appropriate
 48 | for the sub-section.
 49 | 
 50 | Synopsis {#syn}
 51 | --------
 52 | 
 53 | Under 20.2 Header `<utility>` synopsis [**utility**]:
 54 | 
 55 | <xmp>
 56 | namespace std {
 57 |   // ...
 58 |   
 59 |   // 20.2.� bit-casting:
 60 |   template<typename To, typename From>
 61 |   requires
 62 |     sizeof(To) == sizeof(From) &&
 63 |     is_trivially_copyable_v<To> &&
 64 |     is_trivially_copyable_v<From> &&
 65 |     is_standard_layout_v<To> &&
 66 |     is_standard_layout_v<From> &&
 67 |     !(is_pointer_v<From> &&
 68 |       is_pointer_v<To>) &&
 69 |     !(is_member_pointer_v<From> &&
 70 |       is_member_pointer_v<To>) &&
 71 |     !(is_member_object_pointer_v<From> &&
 72 |       is_member_object_pointer_v<To>) &&
 73 |     !(is_member_function_pointer_v<From> &&
 74 |       is_member_function_pointer_v<To>)
 75 |   constexpr To bit_cast(const From& from) noexcept;
 76 |   
 77 |   // ...
 78 | }
 79 | </xmp>
 80 | 
 81 | Details {#det}
 82 | -------
 83 | 
 84 | Under 20.2.`�` Bit-casting [**utility.bitcast**]:
 85 | 
 86 | <xmp>
 87 |   template<typename To, typename From>
 88 |   requires
 89 |     sizeof(To) == sizeof(From) &&
 90 |     is_trivially_copyable_v<To> &&
 91 |     is_trivially_copyable_v<From> &&
 92 |     is_standard_layout_v<To> &&
 93 |     is_standard_layout_v<From> &&
 94 |     !(is_pointer_v<From> &&
 95 |       is_pointer_v<To>) &&
 96 |     !(is_member_pointer_v<From> &&
 97 |       is_member_pointer_v<To>) &&
 98 |     !(is_member_object_pointer_v<From> &&
 99 |       is_member_object_pointer_v<To>) &&
100 |     !(is_member_function_pointer_v<From> &&
101 |       is_member_function_pointer_v<To>)
102 |   constexpr To bit_cast(const From& from) noexcept;
103 | </xmp>
104 | 
105 | 1. Requires: `sizeof(To) == sizeof(From)`,
106 |              `is_trivially_copyable_v<To>` is `true`,
107 |              `is_trivially_copyable_v<From>` is `true`,
108 |              `is_standard_layout_v<To>` is `true`,
109 |              `is_standard_layout_v<From>` is `true`,
110 |              `is_pointer_v<To> && is_pointer_v<From>` is `false`,
111 |              `is_member_pointer_v<To> && is_member_pointer_v<From>` is `false`,
112 |              `is_member_object_pointer_v<To> && is_member_object_pointer_v<From>` is `false`,
113 |              `is_member_function_pointer_v<To> && is_member_function_pointer_v<From>` is `false`.
114 | 
115 | 2. Returns: an object of type `To` whose <em>object representation</em> is equal
116 |             to the object representation of `From`. If multiple <em>object
117 |             representations</em> could represent the <em>value
118 |             representation</em> of `From`, then it is unspecified which `To`
119 |             value is returned. If no <em>value representation</em> corresponds
120 |             to `To`'s <em>object representation</em> then the returned value is
121 |             unspecified.
122 | 
123 | Feature testing {#test}
124 | ---------------
125 | 
126 | The `__cpp_lib_bit_cast` feature test macro should be added.
127 | 
128 | Appendix {#appendix}
129 | ========
130 | 
131 | The Standard's [**basic.types**] section explicitly blesses `memcpy`:
132 | 
133 | <blockquote>
134 | 
135 |   For any trivially copyable type `T`, if two pointers to `T` point to distinct
136 |   `T` objects `obj1` and `obj2`, where neither `obj1` nor `obj2` is a base-class
137 |   subobject, if the *underlying bytes* (1.7) making up `obj1` are copied into
138 |   `obj2`, `obj2` shall subsequently hold the same value as `obj1`.
139 | 
140 |   [*Example:*
141 | ```
142 |     T* t1p;
143 |     T* t2p;
144 |     // provided that t2p points to an initialized object ...
145 |     std::memcpy(t1p, t2p, sizeof(T));
146 |     // at this point, every subobject of trivially copyable type in *t1p contains
147 |     // the same value as the corresponding subobject in *t2p
148 | ```
149 |   — *end example*]
150 | 
151 | </blockquote>
152 | 
153 | Whereas section [class.union] says:
154 | 
155 | <blockquote>
156 | 
157 |   In a union, at most one of the non-static data members can be
158 |   active at any time, that is, the value of at most one of the
159 |   non-static data members can be stored in a union at any time.
160 | 
161 | </blockquote>
162 | 
163 | Acknowledgement {#ack}
164 | ===============
165 | 
166 | Thanks to Saam Barati, Jeffrey Yasskin, and Sam Benzaquen for their early review
167 | and suggested improvements.
168 | 


--------------------------------------------------------------------------------
/source/p1102r0.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: Down with ()!
  3 | Shortname: P1102
  4 | Revision: 0
  5 | Audience: CWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/P1102R0
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/p1102r0.bs">https://github.com/jfbastien/papers/blob/master/source/p1102r0.bs</a>
 10 | Editor: Alex Christensen, Apple, achristensen@apple.com
 11 | Editor: JF Bastien, Apple, jfbastien@apple.com
 12 | Abstract: A proposal for removing unnecessary ()'s from C++ lambdas.
 13 | Date: 2018-06-20
 14 | Markup Shorthands: markdown yes
 15 | </pre>
 16 | 
 17 | Introduction and motivation {#intro}
 18 | ===========================
 19 | 
 20 | Currently, C++ lambdas with no parameters do not require a parameter declaration
 21 | clause. The specification even contains this language in [**expr.prim.lambda**]
 22 | section 8.4.5 ❡4:
 23 | 
 24 | > If a lambda-expression does not include a lambda-declarator, it is as if the
 25 | > lambda-declarator were `()`.
 26 | 
 27 | This allows us to omit the unused `()` in simple lambdas such as this:
 28 | 
 29 | <xmp>
 30 | std::string s1 = "abc";
 31 | auto withParen = [s1 = std::move(s1)] () {
 32 | 	std::cout << s1 << '\n'; 
 33 | };
 34 | 
 35 | std::string s2 = "abc";
 36 | auto noSean = [s2 = std::move(s2)] { // Note no syntax error.
 37 | 	std::cout << s2 << '\n'; 
 38 | };
 39 | </xmp>
 40 | 
 41 | These particular lambdas have ownership of the strings, so they ought to be able
 42 | to mutate them, but `s1` and `s2` are const (because the closure type's function
 43 | call operator is declared `const` by default) so we need to add the `mutable` keyword:
 44 | 
 45 | <xmp>
 46 | std::string s1 = "abc";
 47 | auto withParen = [s1 = std::move(s1)] () mutable {
 48 | 	s1 += "d";
 49 | 	std::cout << s1 << '\n'; 
 50 | };
 51 | 
 52 | std::string s2 = "abc";
 53 | auto noSean = [s2 = std::move(s2)] mutable { // Currently a syntax error.
 54 | 	s2 += "d";
 55 | 	std::cout << s2 << '\n'; 
 56 | };
 57 | </xmp>
 58 | 
 59 | Confusingly, the current Standard requires the empty parens when using the
 60 | `mutable` keyword. This rule is unintuitive, causes common syntax errors, and
 61 | clutters our code. When compiling with clang, we even get a syntax error that
 62 | indicates the compiler knows exactly what is going on:
 63 | 
 64 | <xmp>
 65 | example.cpp:11:54: error: lambda requires '()' before 'mutable'
 66 | auto noSean = [s2 = std::move(s2)] mutable { // Currently a syntax error.
 67 |                                    ^
 68 |                                    () 
 69 | 1 error generated.
 70 | </xmp>
 71 | 
 72 | This proposal would make these parentheses unnecessary like they were before we
 73 | added `mutable`. This will apply to:
 74 | 
 75 |   * lambda template parameters
 76 |   * `constexpr`
 77 |   * `mutable`
 78 |   * Exception specifications and `noexcept`
 79 |   * attributes
 80 |   * trailing return types
 81 |   * `requires`
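
For reference, here is how some of these forms must be spelled under the current rules, with the empty `()` that this proposal would make optional. This is only a sketch: the function name `lambda_forms_today` is ours, C++17 is assumed, and the proposed spellings appear only in comments because they are syntax errors today.

```cpp
#include <string>
#include <utility>

// Returns the string built by the `mutable` lambda; the other lambdas are
// checked at compile time.
std::string lambda_forms_today() {
    std::string s = "abc";

    // Proposed: [s = std::move(s)] mutable { ... }
    auto append = [s = std::move(s)]() mutable { s += "d"; return s; };

    // Proposed: [] noexcept { return 42; }
    constexpr auto answer = []() noexcept { return 42; };
    static_assert(noexcept(answer()), "call operator is noexcept");

    // Proposed: [] -> long { return 1; }
    constexpr auto widen = []() -> long { return 1; };
    static_assert(widen() == 1L, "trailing return type applies");

    // Proposed: [] constexpr { return 7 * 7; }
    constexpr auto square = []() constexpr { return 7 * 7; };
    static_assert(square() == 49, "usable in constant expressions");

    return append();
}
```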
 82 | 
 83 | EWG discussed this change as [[EWG135]]
 84 | in [Lenexa](http://wiki.edg.com/bin/view/Wg21lenexa/EWGIssuesResolutionMinutes)
 85 | and voted 15 to 1 on forwarding to core. It became [[CWG2121]], discussed
 86 | in
 87 | [Kona](http://wiki.edg.com/bin/view/Wg21kona2015/CoreWorkingGroup#CWG_2121_More_flexible_lambda_sy) and
 88 | needed someone to volunteer wording.
 89 | 
 90 | This paper was discussed on the EWG reflector in June, Nina Ranns provided
 91 | feedback, and the EWG chair agreed that the paper should move to CWG directly given
 92 | previous polls.
 93 | 
 94 | 
 95 | Impact {#impact}
 96 | ======
 97 | 
 98 | This change will not break existing code.
 99 | 
100 | 
101 | Wording {#word}
102 | =======
103 | 
104 | Modify Lambda expressions [**expr.prim.lambda**] as follows:
105 | 
106 | <blockquote>
107 | 
108 |     <style>
109 |     indent1 { padding-left: 4em; }
110 |     indent2 { padding-left: 8em; }
111 |     indent3 { padding-left: 12em; }
112 |     </style>
113 |     <i>
114 |     <indent1>lambda-expression :<br/></indent1>
115 |         <indent2>lambda-introducer lambda-declarator requires-clause<sub>opt</sub> compound-statement<br/></indent2>
116 |         <indent2><del>lambda-introducer < template-parameter-list > requires-clause<sub>opt</sub> compound-statement</del><br/></indent2>
117 |         <indent2>lambda-introducer < template-parameter-list > requires-clause<sub>opt</sub> <br/></indent2>
118 |             <indent3>lambda-declarator requires-clause<sub>opt</sub> compound-statement<br/></indent3>
119 |     <indent1>lambda-introducer :<br/></indent1>
120 |         <indent2>[ lambda-capture<sub>opt</sub> ]<br/></indent2>
121 |         <indent1>lambda-declarator :<br/></indent1>
122 |         <indent2>( parameter-declaration-clause )<sub><ins>opt</ins></sub> decl-specifier-seq<sub>opt</sub> <br/></indent2>
123 |             <indent3>noexcept-specifier<sub>opt</sub> attribute-specifier-seq<sub>opt</sub> trailing-return-type<sub>opt</sub><br/></indent3>
124 |     </i>
125 | 
126 | </blockquote>
127 | 
128 | Modify ❡4:
129 | 
130 | <blockquote>
131 | 
132 | If a <del>*lambda-expression*</del><ins>*lambda-declarator*</ins> does not
133 | include <del>a *lambda-declarator*</del><ins>`(` *parameter-declaration-clause*
134 | `)`</ins>, it is as if the <del>*lambda-declarator*</del><ins>`(`
135 | *parameter-declaration-clause* `)`</ins> were `()`. The lambda return type is
136 | `auto`, which is replaced by the type specified by the *trailing-return-type* if
137 | provided and/or deduced from `return` statements as described in 10.1.7.4.
138 | 
139 | </blockquote>
140 | 
141 | Keep Closure types [**expr.prim.lambda.closure**] ❡3 as-is:
142 | 
143 | <blockquote>
144 | 
145 |   The return type and function parameters of the function call operator template
146 |   are derived from the *lambda-expression*'s *trailing-return-type* and
147 |   *parameter-declaration-clause* by replacing each occurrence of `auto` in the
148 |   *decl-specifier*s of the *parameter-declaration-clause* with the name of the
149 |   corresponding invented *template-parameter*. The *requires-clause* of the
150 |   function call operator template is the *requires-clause* immediately following
151 |   `<` *template-parameter-list* `>`, if any. The trailing *requires-clause* of
152 |   the function call operator or operator template is the *requires-clause*
153 |   following the *lambda-declarator*, if any.
154 | 
155 | </blockquote>
156 | 
157 |   Note: The first sentence can remain as-is because the modification to
158 |   [**expr.prim.lambda**] ❡4 creates an empty *parameter-declaration-clause* if
159 |   none is provided. Similarly, the second and third sentences bind the
160 |   *requires-clause* unambiguously.
161 | 


--------------------------------------------------------------------------------
/source/P0476r1.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: Bit-casting object representations
  3 | Shortname: P0476
  4 | Revision: 1
  5 | Audience: LEWG, LWG, CWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/P0476r1
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P0476r1.bs">github.com/jfbastien/papers/blob/master/source/P0476r1.bs</a>
 10 | !Implementation: <a href="https://github.com/jfbastien/bit_cast/">github.com/jfbastien/bit_cast/</a>
 11 | Editor: JF Bastien, Apple, jfbastien@apple.com
 12 | Abstract: Obtaining equivalent object representations The Right Way™.
 13 | Date: 2016-11-11
 14 | Markup Shorthands: markdown yes
 15 | </pre>
 16 | 
 17 | 
 18 | This paper is a revision of [[P0476r0]], addressing LEWG comments from the 2016
 19 | Issaquah meeting. See [[#rev]] for details.
 20 | 
 21 | 
 22 | Background {#bg}
 23 | ==========
 24 | 
 25 | Low-level code often seeks to interpret objects of one type as another: keep the
 26 | same bits, but obtain an object of a different type. Doing so correctly is
 27 | error-prone: using `reinterpret_cast` or `union` runs afoul of type-aliasing
 28 | rules yet these are the intuitive solutions developers mistakenly turn to.
 29 | 
 30 | Attuned developers use `aligned_storage` with `memcpy`, avoiding alignment
 31 | pitfalls and allowing them to bit-cast non-default-constructible types.
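
A minimal sketch of the `memcpy` idiom follows. It is deliberately simplified: the name `bit_cast_sketch` is hypothetical, the constraints are expressed with `static_assert`, and `To` is assumed to be default constructible, which the `aligned_storage` approach described above does not require. It is also not `constexpr`, for the reason discussed below.

```cpp
#include <cstring>
#include <type_traits>

// Simplified illustration only; not the sample implementation.
template <typename To, typename From>
To bit_cast_sketch(const From& from) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value,
                  "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value,
                  "From must be trivially copyable");
    static_assert(std::is_default_constructible<To>::value,
                  "simplification made by this sketch");
    To to;
    // Copy the underlying bytes; no type-aliasing rule is violated.
    std::memcpy(&to, &from, sizeof(To));
    return to;
}
```

Assuming a 32-bit `float`, `bit_cast_sketch<std::uint32_t>(1.5f)` yields the bits of the `float`, and casting that result back with `bit_cast_sketch<float>` recovers `1.5f`.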
 32 | 
 33 | This proposal uses appropriate concepts to prevent misuse. As the sample
 34 | implementation demonstrates, we could as well use `static_assert` or template
 35 | SFINAE, but the timing of this library feature will likely coincide with
 36 | the standardization of concepts.
 37 | 
 38 | Furthermore, it is currently impossible to implement a `constexpr` bit-cast
 39 | function, as `memcpy` itself isn't `constexpr`. Marking the proposed function as
 40 | `constexpr` doesn't require or prevent `memcpy` from becoming `constexpr`. This
 41 | leaves implementations free to use their own internal solution (e.g. LLVM has <a
 42 | href="http://llvm.org/docs/LangRef.html#bitcast-to-instruction">a `bitcast`
 43 | opcode</a>).
 44 | 
 45 | We should standardize this oft-used idiom, and avoid the pitfalls once and for
 46 | all.
 47 | 
 48 | 
 49 | Proposed Wording {#word}
 50 | ================
 51 | 
 52 | Below, substitute the `�` character with a number the editor finds appropriate
 53 | for the sub-section.
 54 | 
 55 | Synopsis {#syn}
 56 | --------
 57 | 
 58 | Under 20.2 Header `<utility>` synopsis [**utility**]:
 59 | 
 60 | <xmp>
 61 | namespace std {
 62 |   // ...
 63 |   
 64 |   // 20.2.� bit-casting:
 65 |   template<typename To, typename From>
 66 |   requires
 67 |     sizeof(To) == sizeof(From) &&
 68 |     is_trivially_copyable_v<To> &&
 69 |     is_trivially_copyable_v<From>
 70 |   constexpr To bit_cast(const From& from) noexcept;
 71 |   
 72 |   // ...
 73 | }
 74 | </xmp>
 75 | 
 76 | Details {#det}
 77 | -------
 78 | 
 79 | Under 20.2.`�` Bit-casting [**utility.bitcast**]:
 80 | 
 81 | <xmp>
 82 |   template<typename To, typename From>
 83 |   requires
 84 |     sizeof(To) == sizeof(From) &&
 85 |     is_trivially_copyable_v<To> &&
 86 |     is_trivially_copyable_v<From>
 87 |   constexpr To bit_cast(const From& from) noexcept;
 88 | </xmp>
 89 | 
 90 | 1. Requires: `sizeof(To) == sizeof(From)`,
 91 |              `is_trivially_copyable_v<To>` is `true`,
 92 |              `is_trivially_copyable_v<From>` is `true`.
 93 | 
 94 | 2. Returns: an object of type `To` whose <em>object representation</em> is equal
 95 |             to the object representation of `From`. If multiple <em>object
 96 |             representations</em> could represent the <em>value
 97 |             representation</em> of `From`, then it is unspecified which `To`
 98 |             value is returned. If no <em>value representation</em> corresponds
 99 |             to `To`'s <em>object representation</em> then the returned value is
100 |             unspecified.
101 | 
102 | Feature testing {#test}
103 | ---------------
104 | 
105 | The `__cpp_lib_bit_cast` feature test macro should be added.
106 | 
107 | Appendix {#appendix}
108 | ========
109 | 
110 | The Standard's [**basic.types**] section explicitly blesses `memcpy`:
111 | 
112 | <blockquote>
113 | 
114 |   For any trivially copyable type `T`, if two pointers to `T` point to distinct
115 |   `T` objects `obj1` and `obj2`, where neither `obj1` nor `obj2` is a base-class
116 |   subobject, if the *underlying bytes* (1.7) making up `obj1` are copied into
117 |   `obj2`, `obj2` shall subsequently hold the same value as `obj1`.
118 | 
119 |   [*Example:*
120 | ```
121 |     T* t1p;
122 |     T* t2p;
123 |     // provided that t2p points to an initialized object ...
124 |     std::memcpy(t1p, t2p, sizeof(T));
125 |     // at this point, every subobject of trivially copyable type in *t1p contains
126 |     // the same value as the corresponding subobject in *t2p
127 | ```
128 |   — *end example*]
129 | 
130 | </blockquote>
131 | 
132 | Whereas section [class.union] says:
133 | 
134 | <blockquote>
135 | 
136 |   In a union, at most one of the non-static data members can be
137 |   active at any time, that is, the value of at most one of the
138 |   non-static data members can be stored in a union at any time.
139 | 
140 | </blockquote>
141 | 
142 | 
143 | Revision History {#rev}
144 | ================
145 | 
146 | r0 ➡ r1 {#r0r1}
147 | --------
148 | 
149 | The paper was reviewed by LEWG at the 2016 Issaquah meeting:
150 | 
151 | * Remove the standard layout requirement—trivially copyable suffices for the `memcpy` requirement.
152 | * We discussed removing `constexpr`, but there was no consensus either way. There was some suggestion that it’ll be hard for implementers, but there's also some desire (by the same implementers) to have those features available in order to support things like `constexpr` instances of `std::variant`.
153 | * The pointer-forbidding logic was removed. It was initially there to help developers when a better tool is available, but it's easily worked around (e.g. with a `struct` containing a pointer). Note that this doesn't prevent `constexpr` versions of `bit_cast`: the implementation is allowed to error out on `bit_cast` of pointer.
154 | * Some discussion about concepts-usage, but it seems like mostly an LWG issue and we're reasonably sure that concepts will land before this or in a compatible vehicle.
155 | 
156 | Straw polls:
157 | 
158 | * Do we want to see [[P0476r0]] again? unanimous consent.
159 | * `bit_cast` should allow pointer types in `To` and `From`. **SF F N A SA** 4 5 4 2 1
160 | * `bit_cast` should be `constexpr`? **SF F N A SA** 4 3 7 2 3
161 | 
162 | 
163 | Acknowledgement {#ack}
164 | ===============
165 | 
166 | Thanks to Saam Barati, Jeffrey Yasskin, and Sam Benzaquen for their early review
167 | and suggested improvements.
168 | 


--------------------------------------------------------------------------------
/source/P0502r0.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: Throwing out of a parallel algorithm terminates—but how?
  3 | Shortname: P0502
  4 | Revision: 0
  5 | Audience: SG1, LWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/P0502r0
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P0502r0.bs">github.com/jfbastien/papers/blob/master/source/P0502r0.bs</a>
 10 | Editor: JF Bastien, Apple, jfbastien@apple.com
 11 | Editor: Bryce Adelstein Lelbach, Lawrence Berkeley National Laboratory, balelbach@lbl.gov
 12 | Editor: H. Carter Edwards, Sandia National Laboratory, hcedwar@sandia.gov
 13 | Abstract: The Committee approves of terminating if exceptions leave parallel algorithms, but where to mandate termination should be updated.
 14 | Date: 2016-11-09
 15 | Markup Shorthands: markdown yes
 16 | Toggle Diffs: yes
 17 | </pre>
 18 | 
 19 | Background {#bg}
 20 | ==========
 21 | 
 22 | The Standard was simplified in [[P0394r4]]: exceptions leaving parallel algorithms lead to `std::terminate()` being called. This matches the behavior of exceptions leaving `main()` as well as `std::thread()`.
 23 | 
 24 | The following National Body comments from [[P0488R0]] were discussed in SG1 at Issaquah, along with [[p0451r0]]:
 25 | 
 26 | * US 15, US 167: Don't `terminate()` when a parallel algorithm exits via uncaught exception and either re-add `exception_list`, add `noexcept` policies + re-add `exception_list`, make it UB or throw an unspecified exception (revert [[P0394r4]]).
 27 | * US 17, US 169: Don't `terminate()` when a parallel algorithm exits via uncaught exception and re-add `exception_list` (revert [[P0394r4]]).
 28 | * US 16, US 168: Clarify which exception is thrown when a parallel algorithm exits via uncaught exception.
 29 | * US 170: Add a customization point for `ExecutionPolicy`s which defines their exception handling behavior (don't re-add `exception_list`).
 30 | * CA 17: Preserve the `terminate()`-on-uncaught-exception behavior in the parallel algorithms (keep [[P0394r4]]).
 31 | 
 32 | Straw Polls {#straw}
 33 | -----------
 34 | 
 35 | The following straw polls were taken:
 36 | 
 37 | **Straw Poll A:** In 25.2.4 ❡2, have uncaught exception behavior be defined by `ExecutionPolicy`. In 20.19 define the behavior for the three standard policies in C++17 (`seq`, `par`, `par_unseq`) as `terminate()`.
 38 | 
 39 | <table class="def">
 40 | <tr><th>**SF**</th><th>**F**</th><th>**N**</th><th>**A**</th><th>**SA**</th></tr>
 41 | <tr><th>Many</th><th>7</th><th>1</th><th>1</th><th>0</th></tr>
 42 | </table>
 43 | 
 44 | ⟹ Consensus to write a paper for this before the end of the week. Bryce, JF, and Carter will write it.
 45 | 
 46 | **Straw Poll B:** Do we want to rename the policies to reflect the fact that they call `terminate()` instead of throwing exceptions?
 47 | 
 48 | <table class="def">
 49 | <tr><th>**SF**</th><th>**F**</th><th>**N**</th><th>**A**</th><th>**SA**</th></tr>
 50 | <tr><th>1</th><th>7</th><th>9</th><th>6</th><th>7</th></tr>
 51 | </table>
 52 | 
 53 | ⟹ No consensus for change.
 54 | 
 55 | **Straw Poll C:** Beyond the changes from the first straw poll, additional changes are required.
 56 | 
 57 | <table class="def">
 58 | <tr><th>**SF**</th><th>**F**</th><th>**N**</th><th>**A**</th><th>**SA**</th></tr>
 59 | <tr><th>2</th><th>0</th><th>10</th><th>11</th><th>6</th></tr>
 60 | </table>
 61 | 
 62 | ⟹ No consensus for change.
 63 | 
 64 | Action {#boom}
 65 | ------
 66 | 
 67 | This paper follows the guidance from *straw poll A*: there is no behavior change, but the behavior is specified to allow future execution policies which exhibit different behavior.
 68 | 
 69 | 
 70 | Proposed Wording {#word}
 71 | ================
 72 | 
 73 | Apply the following edits to section 15.5.1 ❡1 note, bullet 1.13:
 74 | 
 75 | <blockquote>
 76 | 
 77 |   15.5.1 The `std::terminate()` function [**except.terminate**]
 78 | 
 79 |   1. In some situations exception handling must be abandoned for less subtle error handling techniques. [ *Note:* These situations are:
 80 | 
 81 |   […]
 82 | 
 83 |   (1.13) — <ins>for parallel algorithms whose `ExecutionPolicy` specify such behavior (20.19.4, 20.19.5, 20.19.6), </ins>when execution of an element access function (25.2.1) of a parallel algorithm exits via an exception (25.2.4), or
 84 | 
 85 |   […]
 86 | 
 87 |   *— end note* ]
 88 | 
 89 | </blockquote>
 90 | 
 91 | Apply the following edits to section 20.19:
 92 | 
 93 | <blockquote>
 94 | 
 95 |   20.19.4 Sequential execution policy [**execpol.seq**]
 96 | 
 97 |   <xmp>class execution::sequenced_policy { unspecified };</xmp>
 98 | 
 99 |   1. The class `execution::sequenced_policy` is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and require that a parallel algorithm’s execution may not be parallelized.
100 |   2. <ins>During the execution of a parallel algorithm with the `execution::sequenced_policy` policy, if the invocation of an element access function exits via an uncaught exception, `terminate()` shall be called.</ins>
101 | 
102 |   20.19.5 Parallel execution policy [**execpol.par**]
103 | 
104 |   <xmp>class execution::parallel_policy { unspecified };</xmp>
105 | 
106 |   1. The class `execution::parallel_policy` is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm’s execution may be parallelized.
107 |   2. <ins>During the execution of a parallel algorithm with the `execution::parallel_policy` policy, if the invocation of an element access function exits via an uncaught exception, `terminate()` shall be called.</ins>
108 | 
109 |   20.19.6 Parallel+Vector execution policy [**execpol.vec**]
110 | 
111 |   <xmp>class execution::parallel_unsequenced_policy { unspecified };</xmp>
112 | 
113 |   1. The class `execution::parallel_unsequenced_policy` is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm’s execution may be parallelized and vectorized.
114 |   2. <ins>During the execution of a parallel algorithm with the `execution::parallel_unsequenced_policy` policy, if the invocation of an element access function exits via an uncaught exception, `terminate()` shall be called.</ins>
115 | 
116 | </blockquote>
117 | 
118 | Apply the following edits to section 25.2.4 [**algorithms.parallel.exceptions**] ❡2:
119 | 
120 | <blockquote>
121 | 
122 |   During the execution of a parallel algorithm, if the invocation of an element access function exits via an uncaught exception, <ins>the behavior is determined by the `ExecutionPolicy`.</ins><del>`terminate()` is called.</del>
123 | 
124 | </blockquote>
125 | 
126 | 
127 | Acknowledgement {#ack}
128 | ===============
129 | 
130 | Thank you to all SG1 participants: David Sankel, Alisdair Meredith, Hartmut Kaiser, Pablo Halpern, Jared Hoberock, Michael Wong, Pete Becker. Special thanks to the scribe Paul McKenney.
131 | 


--------------------------------------------------------------------------------
/source/P0418r1.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: Fail or succeed: there is no atomic lattice
  3 | Shortname: P0418
  4 | Revision: 1
  5 | Audience: SG1, LWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/p0418r1
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P0418r1.bs">github.com/jfbastien/papers/blob/master/source/P0418r1.bs</a>
 10 | Editor: JF Bastien, Google, cxx@jfbastien.com
 11 | Editor: Hans Boehm, Google, hboehm@google.com
 12 | Abstract: Try to resolve [[LWG2445]].
 13 | Date: 2016-08-02
 14 | Markup Shorthands: markdown yes
 15 | Toggle Diffs: yes
 16 | </pre>
 17 | 
 18 | Background {#bg}
 19 | ==========
 20 | 
 21 | [[LWG2445]] was discussed and resolved by SG1 in Urbana.
 22 | 
 23 | LWG issue #2445 {#issue}
 24 | ---------------
 25 | 
 26 | <blockquote>
 27 | 
 28 |   The definitions of compare and exchange in [util.smartptr.shared.atomic] ¶32
 29 |   and [atomics.types.operations.req] ¶21 state:
 30 | 
 31 |   <blockquote>
 32 | 
 33 |     Requires: The failure argument shall not be `memory_order_release` nor
 34 |     `memory_order_acq_rel`. The failure argument shall be no stronger than the
 35 |     success argument.
 36 | 
 37 |   </blockquote>
 38 | 
 39 |   The term "stronger" isn't defined by the standard.
 40 | 
 41 |   It is hinted at by [atomics.types.operations.req] ¶22:
 42 | 
 43 |   <blockquote>
 44 | 
 45 |     When only one `memory_order` argument is supplied, the value of `success` is
 46 |     `order`, and the value of `failure` is `order` except that a value of
 47 |     `memory_order_acq_rel` shall be replaced by the value `memory_order_acquire`
 48 |     and a value of `memory_order_release` shall be replaced by the value
 49 |     `memory_order_relaxed`.
 50 | 
 51 |   </blockquote>
 52 | 
 53 |   Should the standard define a partial ordering for memory orders, where consume
 54 |   and acquire are incomparable with release?
 55 | 
 56 | </blockquote>
 57 | 
 58 | Proposed SG1 resolution from Urbana {#old-res}
 59 | -----------------------------------
 60 | 
 61 | Add the following note:
 62 | 
 63 | <blockquote><ins>
 64 | 
 65 |   [Note: Memory orders have the following relative strengths implied by their
 66 |   definitions:
 67 | 
 68 | <pre class="railroad-diagram">
 69 |     T: relaxed
 70 |     Choice:
 71 |         T: release
 72 |         Sequence:
 73 |             T: consume
 74 |             T: acquire
 75 |     T: acq_rel
 76 |     T: seq_cst
 77 | </pre>
 78 | 
 79 | —end note]
 80 | 
 81 | </ins></blockquote>
 82 | 
 83 | Further issue {#moar}
 84 | -------------
 85 | 
 86 | Nonetheless:
 87 | 
 88 | * The resolution isn't on the LWG tracker.
 89 | * The proposed note was never moved to the draft Standard.
 90 | 
 91 | Furthermore, the resolution which SG1 came to in Urbana resolves what "stronger"
 92 | means by specifying a lattice, but isn't clear on what "The failure argument
 93 | shall be no stronger than the success argument" means given the lattice.
 94 | 
 95 | There is no relationship, "stronger" or otherwise, between release and
 96 | consume/acquire. The current wording says "shall be no stronger" which isn't the
 97 | same as "shall not be stronger" in this context. Is that on purpose? At a
 98 | minimum it's not clear and should be clarified.
 99 | 
100 | Should the following be valid:
101 | 
102 | ```
103 |   compare_exchange_strong(x, y, z, memory_order_release, memory_order_acquire);
104 | ```
105 | 
106 | Or does the code need to be:
107 | 
108 | ```
109 |   compare_exchange_strong(x, y, z, memory_order_acq_rel, memory_order_acquire);
110 | ```
111 | 
112 | Similar questions can be asked for `memory_order_consume` ordering on `failure`.
113 | 
114 | Is there even a point in restricting `success`/`failure` orderings? On
115 | architectures with load-linked/store-conditional instructions the load and store
116 | are distinct instructions which can each have their own memory ordering (with
117 | appropriate leading/trailing fences if required), whereas architectures with
118 | compare-and-exchange already have a limited set of instructions to choose
119 | from. The current limitation (assuming [[LWG2445]] is resolved) only seems to
120 | restrict compilers on load-linked/store-conditional architectures.
121 | 
122 | The following code could be valid if the stored data didn't need to be published
123 | nor ordered, whereas any retry needs to read additional data:
124 | 
125 | ```
126 |   compare_exchange_strong(x, y, z, memory_order_relaxed, memory_order_acquire);
127 | ```
128 | 
129 | Even if—for lack of a clever instruction—architectures cannot take advantage of
130 | such code, compilers are able to optimize atomics in all sorts of clever ways as
131 | discussed in [[N4455]].
132 | 
133 | Updated proposal {#new-res}
134 | ================
135 | 
136 | This paper proposes removing the "stronger" restrictions between
137 | compare-exchange's `success` and `failure` ordering, and doesn't add a lattice
138 | to order atomic orderings. The only remaining restriction is that
139 | `memory_order_release` and `memory_order_acq_rel` for `failure` are still
140 | disallowed: a failed compare-exchange doesn't store, so the current model is
141 | not sensible with these orderings.
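
Under this proposal the remaining requirement depends on `failure` alone. As a sketch, it could be written as the following predicate, where `is_valid_failure_order` is a hypothetical helper and not proposed wording:

```cpp
#include <atomic>

// Hypothetical helper: true when `failure` is permitted as the failure
// ordering of a compare-exchange under the updated proposal. The success
// ordering no longer plays any role in the check.
constexpr bool is_valid_failure_order(std::memory_order failure) noexcept {
    return failure != std::memory_order_release &&
           failure != std::memory_order_acq_rel;
}

// Both (release, acquire) and (relaxed, acquire) from the examples above
// pass, since only the failure ordering is inspected.
static_assert(is_valid_failure_order(std::memory_order_acquire), "");
static_assert(!is_valid_failure_order(std::memory_order_release), "");
```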
142 | 
143 | There have been discussions about `memory_order_release` loads, e.g. for
144 | seqlock. Such potential changes are left up to future papers.
145 | 
146 | Modify [util.smartptr.shared.atomic] ¶32 as follows:
147 | 
148 | <blockquote>
149 | 
150 |   Requires: The failure argument shall not be `memory_order_release` nor
151 |   `memory_order_acq_rel`.<del> The failure argument shall be no stronger than
152 |   the success argument.</del>
153 | 
154 | </blockquote>
155 | 
156 | Modify [atomics.types.operations.req] ¶21 as follows:
157 | 
158 | <blockquote>
159 | 
160 |   Requires: The failure argument shall not be `memory_order_release` nor
161 |   `memory_order_acq_rel`.<del> The failure argument shall be no stronger than
162 |   the success argument.</del>
163 | 
164 | </blockquote>
165 | 
166 | Leave [atomics.types.operations.req] ¶22 as-is:
167 | 
168 | <blockquote>
169 | 
170 |   Effects: Atomically, compares the contents of the memory pointed to by
171 |   `object` or by `this` for equality with that in `expected`, and if `true`,
172 |   replaces the contents of the memory pointed to by `object` or by `this` with
173 |   that in `desired`, and if `false`, updates the contents of the memory in
174 |   `expected` with the contents of the memory pointed to by `object` or by
175 |   `this`. Further, if the comparison is `true`, memory is affected according to
176 |   the value of `success`, and if the comparison is `false`, memory is affected
177 |   according to the value of `failure`.
178 | 
179 |   When only one `memory_order` argument is supplied, the value of `success` is
180 |   `order`, and the value of `failure` is `order` except that a value of
181 |   `memory_order_acq_rel` shall be replaced by the value `memory_order_acquire`
182 |   and a value of `memory_order_release` shall be replaced by the value
183 |   `memory_order_relaxed`.
184 | 
185 |   If the operation returns `true`, these operations are atomic read-modify-write
186 |   operations (1.10). Otherwise, these operations are atomic load operations.
187 | 
188 | </blockquote>
189 | 
190 | Acknowledgement {#ack}
191 | ===============
192 | 
193 | Thanks to John McCall for pointing out that the proposed resolution was still
194 | insufficient, and for providing ample feedback.
195 | 


--------------------------------------------------------------------------------
/source/P0418r2.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: Fail or succeed: there is no atomic lattice
  3 | Shortname: P0418
  4 | Revision: 2
  5 | Audience: SG1, LWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/p0418r2
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P0418r2.bs">github.com/jfbastien/papers/blob/master/source/P0418r2.bs</a>
 10 | Editor: JF Bastien, Apple, jfbastien@apple.com
 11 | Editor: Hans Boehm, Google, hboehm@google.com
 12 | Abstract: Try to resolve [[LWG2445]].
 13 | Date: 2016-11-09
 14 | Markup Shorthands: markdown yes
 15 | Toggle Diffs: yes
 16 | </pre>
 17 | 
 18 | Background {#bg}
 19 | ==========
 20 | 
 21 | [[LWG2445]] was discussed and resolved by SG1 in Urbana.
 22 | 
 23 | This revision updates [[P0418r1]] with accurate wording for
 24 | [util.smartptr.shared.atomic] ¶32, to be deleted from [[N4606]].
 25 | 
 26 | LWG issue #2445 {#issue}
 27 | ---------------
 28 | 
 29 | <blockquote>
 30 | 
 31 |   The definitions of compare and exchange in [util.smartptr.shared.atomic]
 32 |   ¶32 and [atomics.types.operations.req] ¶21 state:
 33 | 
 34 |   <blockquote>
 35 | 
 36 |     Requires: The failure argument shall not be `memory_order_release` nor
 37 |     `memory_order_acq_rel`. The failure argument shall be no stronger than the
 38 |     success argument.
 39 | 
 40 |   </blockquote>
 41 | 
 42 |   The term "stronger" isn't defined by the standard.
 43 | 
 44 |   It is hinted at by [atomics.types.operations.req] ¶22:
 45 | 
 46 |   <blockquote>
 47 | 
 48 |     When only one `memory_order` argument is supplied, the value of `success` is
 49 |     `order`, and the value of `failure` is `order` except that a value of
 50 |     `memory_order_acq_rel` shall be replaced by the value `memory_order_acquire`
 51 |     and a value of `memory_order_release` shall be replaced by the value
 52 |     `memory_order_relaxed`.
 53 | 
 54 |   </blockquote>
 55 | 
 56 |   Should the standard define a partial ordering for memory orders, where consume
 57 |   and acquire are incomparable with release?
 58 | 
 59 | </blockquote>
 60 | 
 61 | Proposed SG1 resolution from Urbana {#old-res}
 62 | -----------------------------------
 63 | 
 64 | Add the following note:
 65 | 
 66 | <blockquote><ins>
 67 | 
 68 |   [ *Note:* Memory orders have the following relative strengths implied by their
 69 |   definitions:
 70 | 
 71 | <pre class="railroad-diagram">
 72 |     T: relaxed
 73 |     Choice:
 74 |         T: release
 75 |         Sequence:
 76 |             T: consume
 77 |             T: acquire
 78 |     T: acq_rel
 79 |     T: seq_cst
 80 | </pre>
 81 | 
 82 | *—end note* ]
 83 | 
 84 | </ins></blockquote>
 85 | 
 86 | Further issue {#moar}
 87 | -------------
 88 | 
 89 | Nonetheless:
 90 | 
 91 | * The resolution isn't on the LWG tracker.
 92 | * The proposed note was never moved to the draft Standard.
 93 | 
 94 | Furthermore, the resolution which SG1 came to in Urbana resolves what "stronger"
 95 | means by specifying a lattice, but isn't clear on what "The failure argument
 96 | shall be no stronger than the success argument" means given the lattice.
 97 | 
 98 | There is no relationship, "stronger" or otherwise, between release and
 99 | consume/acquire. The current wording says "shall be no stronger" which isn't the
100 | same as "shall not be stronger" in this context. Is that on purpose? At a
101 | minimum it's not clear and should be clarified.
102 | 
103 | Should the following be valid:
104 | 
105 | ```
106 |   compare_exchange_strong(x, y, z, memory_order_release, memory_order_acquire);
107 | ```
108 | 
109 | Or does the code need to be:
110 | 
111 | ```
112 |   compare_exchange_strong(x, y, z, memory_order_acq_rel, memory_order_acquire);
113 | ```
114 | 
115 | Similar questions can be asked for `memory_order_consume` ordering on `failure`.
116 | 
117 | Is there even a point in restricting `success`/`failure` orderings? On
118 | architectures with load-linked/store-conditional instructions the load and store
119 | are distinct instructions which can each have their own memory ordering (with
120 | appropriate leading/trailing fences if required), whereas architectures with
121 | compare-and-exchange already have a limited set of instructions to choose
122 | from. The current limitation (assuming [[LWG2445]] is resolved) only seems to
123 | restrict compilers on load-linked/store-conditional architectures.
124 | 
125 | The following code could be valid if the stored data didn't need to be published
126 | nor ordered, whereas any retry needs to read additional data:
127 | 
128 | ```
129 |   compare_exchange_strong(x, y, z, memory_order_relaxed, memory_order_acquire);
130 | ```
131 | 
132 | Even if—for lack of clever instruction—architectures cannot take advantage of
133 | such code, compilers are able to optimize atomics in all sorts of clever ways as
134 | discussed in [[N4455]].
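
As a minimal sketch of the relaxed-success, acquire-failure case discussed above, the snippet below wraps the member form of compare-exchange. The helper name `try_claim` and its parameters are hypothetical and only serve the illustration; the orderings are the ones from the example above.

```cpp
#include <atomic>

// Hypothetical helper (not part of any library): tries to move `slot` from
// `expected` to `desired`.
// - On success the store needs no publication, hence memory_order_relaxed.
// - On failure the caller goes on to read additional data, hence
//   memory_order_acquire. `expected` is updated with the value that was seen.
inline bool try_claim(std::atomic<int>& slot, int& expected, int desired) {
  return slot.compare_exchange_strong(expected, desired,
                                      std::memory_order_relaxed,
                                      std::memory_order_acquire);
}
```

Under the wording proposed below, this call is well-formed even though the `failure` ordering is not weaker than the `success` ordering.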
135 | 
136 | Updated proposal {#new-res}
137 | ================
138 | 
139 | This paper proposes removing the "stronger" restrictions between
140 | compare-exchange's `success` and `failure` ordering, and doesn't add a lattice
141 | to order atomic orderings. The only remaining restriction is that
142 | `memory_order_release` and `memory_order_acq_rel` for `failure` are still
143 | disallowed: a failed compare-exchange doesn't store, so the current model is
144 | not sensible with these orderings.
145 | 
146 | There have been discussions about `memory_order_release` loads, e.g. for
147 | seqlock. Such potential changes are left up to future papers.
148 | 
149 | Modify [util.smartptr.shared.atomic] ¶32 as follows:
150 | 
151 | <blockquote>
152 | 
153 |   Requires: <ins>The </ins>failure <ins>argument </ins>shall not be
154 |   `memory_order_release`<del>,</del><ins> nor</ins> `memory_order_acq_rel`<del>,
155 |   or stronger than success</del>.
156 | 
157 | </blockquote>
158 | 
159 | Modify [atomics.types.operations.req] ¶21 as follows:
160 | 
161 | <blockquote>
162 | 
163 |   Requires: The failure argument shall not be `memory_order_release` nor
164 |   `memory_order_acq_rel`.<del> The failure argument shall be no stronger than
165 |   the success argument.</del>
166 | 
167 | </blockquote>
168 | 
169 | Leave [atomics.types.operations.req] ¶22 as-is:
170 | 
171 | <blockquote>
172 | 
173 |   Effects: Atomically, compares the contents of the memory pointed to by
174 |   `object` or by `this` for equality with that in `expected`, and if `true`,
175 |   replaces the contents of the memory pointed to by `object` or by `this` with
176 |   that in `desired`, and if `false`, updates the contents of the memory in
177 |   `expected` with the contents of the memory pointed to by `object` or by
178 |   `this`. Further, if the comparison is `true`, memory is affected according to
179 |   the value of `success`, and if the comparison is `false`, memory is affected
180 |   according to the value of `failure`.
181 | 
182 |   When only one `memory_order` argument is supplied, the value of `success` is
183 |   `order`, and the value of `failure` is `order` except that a value of
184 |   `memory_order_acq_rel` shall be replaced by the value `memory_order_acquire`
185 |   and a value of `memory_order_release` shall be replaced by the value
186 |   `memory_order_relaxed`.
187 | 
188 |   If the operation returns `true`, these operations are atomic read-modify-write
189 |   operations (1.10). Otherwise, these operations are atomic load operations.
190 | 
191 | </blockquote>
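
The single-argument rule quoted above can be sketched as a small mapping from `order` to the implied `failure` ordering. The function name `implied_failure_order` is hypothetical and is not a standard library facility; it only restates the replacement rule in code.

```cpp
#include <atomic>

// Hypothetical helper: returns the `failure` ordering implied when only one
// memory_order argument is supplied to compare-exchange.
// - acq_rel is replaced by acquire
// - release is replaced by relaxed
// - every other ordering is used unchanged
constexpr std::memory_order implied_failure_order(std::memory_order order) {
  return order == std::memory_order_acq_rel   ? std::memory_order_acquire
         : order == std::memory_order_release ? std::memory_order_relaxed
                                              : order;
}

static_assert(implied_failure_order(std::memory_order_acq_rel) ==
                  std::memory_order_acquire,
              "acq_rel maps to acquire");
static_assert(implied_failure_order(std::memory_order_release) ==
                  std::memory_order_relaxed,
              "release maps to relaxed");
```

The mapping never yields `memory_order_release` or `memory_order_acq_rel`, which is consistent with the one restriction on `failure` that this paper keeps.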
192 | 
193 | Acknowledgement {#ack}
194 | ===============
195 | 
196 | Thanks to John McCall for pointing out that the proposed resolution was still
197 | insufficient, and for providing ample feedback.
198 | 


--------------------------------------------------------------------------------
/source/p1119r0.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: ABI for std::hardware_{constructive,destructive}_interference_size
  3 | Shortname: P1119
  4 | Revision: 0
  5 | Audience: SG1, LEWG, LWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/d1119r0
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/p1119r0.bs">github.com/jfbastien/papers/blob/master/source/p1119r0.bs</a>
 10 | Editor: JF Bastien, Apple, jfbastien@apple.com
 11 | Editor: Olivier Giroux, NVIDIA, ogiroux@nvidia.com
 12 | Editor: Jonathan Wakely, RedHat, cxx@kayari.org
 13 | Editor: Hal Finkel, Argonne National Laboratory, hfinkel@anl.gov
 14 | Editor: Thomas Rodgers, RedHat, trodgers@redhat.com
 15 | Editor: Matthias Kretz, GSI, m.kretz@gsi.de
 16 | Abstract: std::hardware_{constructive,destructive}_interference_size exposes potential ABI issues, and that's OK. This position paper clarifies the committee's position.
 17 | Date: 2018-06-22
 18 | Markup Shorthands: markdown yes
 19 | </pre>
 20 | 
 21 | Wording {#word}
 22 | =======
 23 | 
 24 | [[P0154R1]] introduced `constexpr std::hardware_{constructive,destructive}_interference_size` to C++17:
 25 | 
 26 | Header `<new>` synopsis [**new.syn**]:
 27 | 
 28 | <blockquote>
 29 | 
 30 | <xmp>
 31 | 
 32 | namespace std {
 33 |   // ...
 34 |   // 21.6.5, hardware interference size
 35 |   inline constexpr size_t hardware_destructive_interference_size = implementation-defined;
 36 |   inline constexpr size_t hardware_constructive_interference_size = implementation-defined;
 37 |   // ...
 38 | }
 39 | 
 40 | </xmp>
 41 | 
 42 | </blockquote>
 43 | 
 44 | Hardware interference size [**hardware.interference**]:
 45 | 
 46 | <blockquote>
 47 | 
 48 | <xmp>inline constexpr size_t hardware_destructive_interference_size = implementation-defined;</xmp>
 49 | 
 50 |   This number is the minimum recommended offset between two concurrently-accessed
 51 |   objects to avoid additional performance degradation due to contention introduced
 52 |   by the implementation. It shall be at least `alignof(max_align_t)`.
 53 | 
 54 | [ *Example*:
 55 | 
 56 | <xmp>
 57 |   struct keep_apart {
 58 |     alignas(hardware_destructive_interference_size) atomic<int> cat;
 59 |     alignas(hardware_destructive_interference_size) atomic<int> dog;
 60 |   };
 61 | </xmp>
 62 | 
 63 | — *end example* ]
 64 | 
 65 | <xmp>inline constexpr size_t hardware_constructive_interference_size = implementation-defined;</xmp>
 66 | 
 67 |   This number is the maximum recommended size of contiguous memory occupied by
 68 |   two objects accessed with temporal locality by concurrent threads. It shall be
 69 |   at least `alignof(max_align_t)`.
 70 | 
 71 | [ *Example*:
 72 | 
 73 | <xmp>
 74 |   struct together {
 75 |     atomic<int> dog;
 76 |     int puppy;
 77 |   };
 78 |   struct kennel {
 79 |   // Other data members...
 80 |      alignas(sizeof(together)) together pack;
 81 |   // Other data members...
 82 |   };
 83 |   static_assert(sizeof(together) <= hardware_constructive_interference_size);
 84 | </xmp>
 85 | 
 86 | — *end example* ]
 87 | 
 88 | </blockquote>
 89 | 
 90 | Discussions {#discussions}
 91 | ===========
 92 | 
 93 | The paper was discussed in:
 94 | 
 95 |  * [SG1 Kona](http://wiki.edg.com/bin/view/Wg21kona2015/N4523)
 96 |  * [LEWG Kona](http://wiki.edg.com/bin/view/Wg21kona2015/P0154)
 97 |  * [LEWG Jacksonville](http://wiki.edg.com/bin/view/Wg21jacksonville/P0154)
 98 |  * [LWG Jacksonville](http://wiki.edg.com/bin/view/Wg21jacksonville/D0154R1)
 99 | 
100 | ABI issues were considered in these discussions, and the committee decided that
101 | having these values was worth the potential pain points. ABI issues can arise as
102 | follows:
103 | 
104 |   1. A developer asks the compiler to generate code for multiple targets of the
105 |      same ISA, and these targets prefer different interference sizes.
106 |   1. A developer indicates that code should be generated for a heterogeneous system
107 |      (such as CPU and GPU), which prefer different interference sizes.
108 |   1. A developer uses different compilers, and links the result together.
109 | 
110 | A further ABI issue was added by [[P0607r0]] by making the variables `inline`:
111 | in case 1. above the interference size values differ between translation units,
112 | which is a problem if they are used in an ODR-relevant context. That paper noted:
113 | 
114 | <blockquote>
115 | 
116 |   [*Drafting notes*: The removal of the explicit `static` specifier for the
117 |   namespace-scope constants `hardware_destructive_interference_size` and
118 |   `hardware_constructive_interference_size` is still required because adding
119 |   `inline` alone would still not solve the ODR violation problem here.
120 |   — *end drafting notes*]
121 | 
122 | </blockquote>
123 | 
124 | This change indeed fixes the ODR issue where two translation units translated
125 | with the same interference size values may violate ODR when used with e.g.
126 | `std::max`. It however introduces a new ODR issue for case 1. above.
127 | 
128 | Richard Smith and Tim Song propose changing the definition to:
129 | 
130 | <xmp>
131 | static constexpr const std::size_t& hardware_destructive_interference_size = implementation-defined;
132 | static constexpr const std::size_t& hardware_constructive_interference_size = implementation-defined;
133 | </xmp>
134 | 
135 | We propose a discussion and poll on this topic.
136 | 
137 | 
138 | Pushback {#push}
139 | ========
140 | 
141 | The maintainers of clang and GCC
142 | have
143 | [discussed an implementation strategy](http://lists.llvm.org/pipermail/cfe-dev/2018-May/058073.html),
144 | but received pushback based on the above ABI issues. The messaging from the
145 | committee wasn't clear that ABI issues were discussed and the proposal accepted
146 | despite these issues. This type of ABI problem is difficult or impossible to
147 | warn about, which worries some implementors.
148 | 
149 | They see the following choices when implementing, and are unsure which
150 | approach to take:
151 | 
152 |   1. Pick a value once for each ABI and cast it in stone forever, even if
153 |      microarchitectural revisions cause the values to change.
154 |   1. Change the value between microarchitectures, even though that's an ABI
155 |      break.
156 |   1. Something else.
157 | 
158 | The authors believe that the ABI issues are acceptable because:
159 | 
160 |   * As demonstrated in the original paper, developers already write code like
161 |     this, using macros. Any ABI issue that exists with this proposal already
162 |     existed before the proposal.
163 |   * Many uses of these values have no ABI breakage potential because they only
164 |     target one variant of one ISA.
165 |   * The usecase for these values is to lay out datastructures. These
166 |     datastructures shouldn't be shared across translation units which follow
167 |     different ABIs.
168 |   * Similar ABI issues already exist with `max_align_t` and `intmax_t`.
169 |   * Implementations can offer compiler flags which specifically control ABI. For
170 |     example, `-mcpu` could keep the ABI stable, but `-mcpu-abi` would change it.
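
To make the first bullet concrete, here is a hedged sketch of how existing macro-style code might adopt the standard value while keeping a fallback. The name `destructive_size` and the fallback of 64 are assumptions chosen for illustration; `__cpp_lib_hardware_interference_size` is the library feature-test macro for these constants.

```cpp
#include <atomic>
#include <cstddef>
#include <new>

// Prefer the standard constant when the library provides it. The fallback
// value of 64 is an assumption for illustration, not a portable fact.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t destructive_size =
    std::hardware_destructive_interference_size;
#else
constexpr std::size_t destructive_size = 64;
#endif

// Same layout idea as the `keep_apart` example in the wording above.
struct keep_apart {
  alignas(destructive_size) std::atomic<int> cat;
  alignas(destructive_size) std::atomic<int> dog;
};

// The layout of `keep_apart` depends on the chosen value, which is exactly
// where the ABI concern comes from.
static_assert(alignof(keep_apart) >= destructive_size,
              "struct is aligned to the interference size");
static_assert(sizeof(keep_apart) >= 2 * destructive_size,
              "members are at least one interference size apart");
```

If two translation units see different values, `sizeof(keep_apart)` differs between them, so a type like this should not cross a boundary between such units.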
171 | 
172 | Polls {#polls}
173 | =====
174 | 
175 | We propose the following poll for SG1:
176 | 
177 | > The committee understands the ABI issues with `std::hardware_{constructive,destructive}_interference_size`, yet chooses to standardize these values nonetheless.
178 | 
179 | The committee could also consider adding a note to point out ABI issues with
180 | these values. This would be a novel note, since ABI isn't discussed in the
181 | Standard.
182 | 
183 | We propose the following poll for SG1, LEWG, and LWG:
184 | 
185 | > Both ODR issues should be addressed, the type should therefore be changed to `static constexpr const std::size_t&`.
186 | 
187 | Not all authors of this paper are in favor of this direction, but all agree the
188 | discussion is worth having.
189 | 


--------------------------------------------------------------------------------
/source/N4523.rst:
--------------------------------------------------------------------------------
  1 | ===================================================================
  2 | N4523 ``constexpr std::thread::hardware_{true,false}_sharing_size``
  3 | ===================================================================
  4 | 
  5 | :Author: JF Bastien
  6 | :Contact: jfb@google.com
  7 | :Author: Olivier Giroux
  8 | :Contact: ogiroux@nvidia.com
  9 | :Date: 2015-05-21
 10 | :URL: https://github.com/jfbastien/papers/blob/master/source/N4523.rst
 11 | 
 12 | ---------
 13 | Rationale
 14 | ---------
 15 | 
 16 | Starting with C++11, the library includes
 17 | ``std::thread::hardware_concurrency()`` to provide an implementation quantity
 18 | useful in the design of control structures in multi-threaded programs: the
 19 | extent of threads that do not interfere (to the first-order). Established
 20 | practice throughout the industry also relies on a second implementation
 21 | quantity, used instead in the design of data structures in the same programs.
 22 | This quantity is the granularity of memory that does not interfere (to the
 23 | first-order), commonly referred to as the *cache-line size*.
 24 | 
 25 | Uses of *cache-line size* fall into two broad categories:
 26 | 
 27 | * Avoiding false-sharing between objects with temporally disjoint runtime access
 28 |   patterns from different threads. e.g. Producer-consumer queues.
 29 | * Promoting true-sharing between objects which have temporally local runtime
 30 |   access patterns. e.g. The ``barrier`` example, as illustrated in N4522_.
 31 | 
 32 | .. _N4522: http://wg21.link/N4522
 33 | 
 34 | The most significant issue with this useful implementation quantity is the
 35 | questionable portability of the methods used in current practice to determine
 36 | its value, despite their pervasiveness and popularity as a group. In the
 37 | appendix_ we review several different compile-time and run-time methods. The
 38 | portability problem with most of these methods is that they expose a
 39 | micro-architectural detail without accounting for the intent of the implementors
 40 | (such as we are) over the life of the ISA or ABI.
 41 | 
 42 | We aim to contribute a modest invention for this cause, abstractions for this
 43 | quantity that can be conservatively defined for given purposes by
 44 | implementations:
 45 | 
 46 | * *False-sharing size*: a number that's suitable as an offset between two
 47 |   objects to likely avoid false-sharing due to different runtime access patterns
 48 |   from different threads.
 49 | * *True-sharing size*: a number that's suitable as a limit on two objects'
 50 |   combined memory footprint size and base alignment to likely promote
 51 |   true-sharing between them.
 52 | 
 53 | In both cases these values are provided on a quality of implementation basis,
 54 | purely as hints that are likely to improve performance. These are ideal portable
 55 | values to use with the ``alignas()`` keyword, for which there currently exists
 56 | nearly no standard-supported portable uses.
 57 | 
 58 | -----------------
 59 | Proposed addition
 60 | -----------------
 61 | 
 62 | We propose adding the following to the standard:
 63 | 
 64 | Under 30.3.1 Class ``thread`` [**thread.thread.class**]:
 65 | 
 66 | .. code-block:: c++
 67 | 
 68 |   namespace std {
 69 |     class thread {
 70 |       // ...
 71 |     public:
 72 |       static constexpr size_t hardware_false_sharing_size = /* implementation-defined */;
 73 |       static constexpr size_t hardware_true_sharing_size = /* implementation-defined */;
 74 |       // ...
 75 |     };
 76 |   }
 77 | 
 78 | Under 30.3.1.6 ``thread`` static members [**thread.thread.static**]:
 79 | 
 80 | ``constexpr size_t hardware_false_sharing_size = /* implementation-defined */;``
 81 | 
 82 | This number is the minimum recommended offset between two concurrently-accessed
 83 | objects to avoid additional performance degradation due to contention introduced
 84 | by the implementation.
 85 | 
 86 | [*Example:*
 87 | 
 88 | .. code-block:: c++
 89 | 
 90 |   struct apart {
 91 |     alignas(hardware_false_sharing_size) atomic<int> flag1, flag2;
 92 |   };
 93 | 
 94 | — *end example*]
 95 | 
 96 | ``constexpr size_t hardware_true_sharing_size = /* implementation-defined */;``
 97 | 
 98 | This number is the minimum recommended alignment and maximum recommended size of
 99 | contiguous memory occupied by two objects accessed with temporal locality by
100 | concurrent threads.
101 | 
102 | [*Example:*
103 | 
104 | .. code-block:: c++
105 | 
106 |   struct alignas(hardware_true_sharing_size) colocated {
107 |     atomic<int> flag;
108 |     int tinydata;
109 |   };
110 |   static_assert(sizeof(colocated) <= hardware_true_sharing_size);
111 | 
112 | — *end example*]
113 | 
114 | The ``__cpp_lib_thread_hardware_sharing_size`` feature test macro should be
115 | added.
116 | 
117 | .. _appendix:
118 | 
119 | --------
120 | Appendix
121 | --------
122 | 
123 | Compile-time *cache-line size*
124 | ==============================
125 | 
126 | We informatively list a few ways in which the L1 *cache-line size* is obtained
127 | in different open-source projects at compile-time.
128 | 
129 | The Linux kernel defines the ``__cacheline_aligned`` macro which is configured
130 | for each architecture through ``L1_CACHE_BYTES``. On some architectures this
131 | value is determined through the configure-time option
132 | ``CONFIG_<ARCH>_L1_CACHE_SHIFT``, and on others the value of ``L1_CACHE_SHIFT``
133 | is hard-coded in the architecture's ``include/asm/cache.h`` header.
134 | 
135 | Many open-source projects from Google contain a ``base/port.h`` header which
136 | defines the ``CACHELINE_ALIGNED`` macro based on an explicit list of
137 | architecture detection macros. These header files have often diverged. A token
138 | example from the autofdo_ project is:
139 | 
140 | .. _autofdo: https://github.com/google/autofdo/blob/master/base/port.h
141 | 
142 | .. code-block:: c++
143 | 
144 |   // Cache line alignment
145 |   #if defined(__i386__) || defined(__x86_64__)
146 |   #define CACHELINE_SIZE 64
147 |   #elif defined(__powerpc64__)
148 |   // TODO(dougkwan) This is the L1 D-cache line size of our Power7 machines.
149 |   // Need to check if this is appropriate for other PowerPC64 systems.
150 |   #define CACHELINE_SIZE 128
151 |   #elif defined(__arm__)
152 |   // Cache line sizes for ARM: These values are not strictly correct since
153 |   // cache line sizes depend on implementations, not architectures.  There
154 |   // are even implementations with cache line sizes configurable at boot
155 |   // time.
156 |   #if defined(__ARM_ARCH_5T__)
157 |   #define CACHELINE_SIZE 32
158 |   #elif defined(__ARM_ARCH_7A__)
159 |   #define CACHELINE_SIZE 64
160 |   #endif
161 |   #endif
162 | 
163 |   #ifndef CACHELINE_SIZE
164 |   // A reasonable default guess.  Note that overestimates tend to waste more
165 |   // space, while underestimates tend to waste more time.
166 |   #define CACHELINE_SIZE 64
167 |   #endif
168 | 
169 |   #define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE_SIZE)))
170 | 
171 | Runtime *cache-line size*
172 | =========================
173 | 
174 | We informatively list a few ways in which the L1 *cache-line size* can be
175 | obtained on different operating systems and architectures at runtime.
176 | 
177 | On OSX one would use:
178 | 
179 | .. code-block:: c++
180 | 
181 |   sysctlbyname("hw.cachelinesize", &cacheline_size, &sizeof_cacheline_size, 0, 0);
182 | 
183 | On Windows one would use:
184 | 
185 | .. code-block:: c++
186 | 
187 |   GetLogicalProcessorInformation(&buf[0], &sizeof_buf);
188 |   for (i = 0; i != sizeof_buf / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); ++i)
189 |     if (buf[i].Relationship == RelationCache && buf[i].Cache.Level == 1)
190 |       cacheline_size = buf[i].Cache.LineSize;
191 | 
192 | On Linux one would either use:
193 | 
194 | .. code-block:: c++
195 | 
196 |   p = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
197 |   fscanf(p, "%d", &cacheline_size);
198 | 
199 | or:
200 | 
201 | .. code-block:: c++
202 | 
203 |   sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
204 | 
205 | On x86 one would use the ``CPUID`` instruction with ``EAX = 80000005h``, which
206 | leaves the result in ``ECX``, which needs further work to extract.
207 | 
208 | On ARM one would use ``mrs %[ctr], ctr_el0``, which needs further work to
209 | extract.
210 | 


--------------------------------------------------------------------------------
/source/P1018R19.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: C++ Language Evolution status
  3 | Shortname: P1018
  4 | Revision: 19
  5 | Audience: WG21, EWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/P1018r19
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P1018r19.bs">github.com/jfbastien/papers/blob/master/source/P1018r19.bs</a>
 10 | Editor: JF Bastien, Woven Planet, cxx@jfbastien.com
 11 | Date: 2022-11-15
 12 | Markup Shorthands: markdown yes
 13 | Toggle Diffs: no
 14 | No abstract: false
 15 | Abstract: This paper is a collection of items that the C++ Language Evolution group has worked on in the latest meeting, their status, and plans for the future.
 16 | </pre>
 17 | 
 18 | <style>
 19 | table, th, td { border: 2px solid grey; }
 20 | </style>
 21 | 
 22 | Executive summary {#summary}
 23 | =================
 24 | 
 25 | The Evolution Working Group did not meet in person between the February 2020 meeting in Prague and the November 2022 meeting in Kona. You will find EWG's pandemic activities in [[P1018r18]].
 26 | 
 27 | This paper summarizes all of the work that was performed in the November 2022 Kona meeting.
 28 | 
 29 | Work Performed {#work}
 30 | ==============
 31 | 
 32 | This meeting was the first towards finalizing C++23, see [[P1000r4]] for the full schedule. In the ISO process, we received a variety of comments from different National Bodies. The full list is tracked as <a href="https://github.com/cplusplus/nbballot/issues?q=is%3Aissue+milestone%3A%22CD+C%2B%2B23%22+">GitHub issues</a>. EWG received 33 National Body comments. Of those, 16 were closed as duplicates, and 17 were reviewed with the following outcomes:
 33 | 
 34 | <ul>
 35 |   <li>5 Rejected: <a href="https://github.com/cplusplus/nbballot/issues/429">FR-027-006</a>, <a href="https://github.com/cplusplus/nbballot/issues/493">US 21-053</a>, <a href="https://github.com/cplusplus/nbballot/issues/485">US 14-043</a>, <a href="https://github.com/cplusplus/nbballot/issues/486">US 12-041</a>, <a href="https://github.com/cplusplus/nbballot/issues/425">FR-023-007</a></li>
 36 |   <li>9 Accepted &amp; Forwarded a resolution to CWG: <a href="https://github.com/cplusplus/nbballot/issues/428">FR-026-018</a>, <a href="https://github.com/cplusplus/nbballot/issues/431">GB-059</a>, <a href="https://github.com/cplusplus/nbballot/issues/451">GB-051</a>, <a href="https://github.com/cplusplus/nbballot/issues/488">US 16-045</a>, <a href="https://github.com/cplusplus/nbballot/issues/467">DE-046</a>, <a href="https://github.com/cplusplus/nbballot/issues/460">CA-065</a>, <a href="https://github.com/cplusplus/nbballot/issues/443">GB-048</a>, <a href="https://github.com/cplusplus/nbballot/issues/430">GB-055</a>, <a href="https://github.com/cplusplus/nbballot/issues/471">DE-038</a></li>
 37 |   <li>1 Forwarded to LEWG with EWG Blessing: <a href="https://github.com/cplusplus/nbballot/issues/457">GB-089</a></li>
 38 |   <li>2 Needs to come back to EWG (Will see in Telecons/next meeting): <a href="https://github.com/cplusplus/nbballot/issues/427">FR-025-017</a>, <a href="https://github.com/cplusplus/nbballot/issues/480">US 8-036</a></li>
 39 | </ul>
 40 | 
 41 | Separately from finalizing C++23, we’ve continued early work towards C++26 and later. We track outstanding proposals in GitHub as well; here are <a href="https://github.com/cplusplus/papers/issues?q=is%3Aissue+is%3Aopen+label%3Aewg+-label%3Aewgi+-label%3Aneeds-revision+-label%3Alewg+-label%3Alewgi+-label%3ATentativelyReady+-label%3AEWG-vote-on-me+sort%3Aupdated-desc+-label%3ASG16+">the ones for EWG which are ready to review</a>. EWG and its incubator EWGI started the week with 83 papers to review (some not for the first time), so EWG had to prioritize using a variety of criteria such as the C++ Direction Group’s recommendations in [[P2000r4]]. During the week, EWG forwarded the following papers to CWG for C++26:
 42 | <ul>
 43 |   <li>[[P1061R0]] Structured Bindings can introduce a Pack</li>
 44 |   <li>[[P2361R0]] Unevaluated string literals</li>
 45 |   <li>[[P2014R0]] aligned allocation of coroutine frames</li>
 46 |   <li>[[P0609R1]] Attributes for Structured Bindings</li>
 47 |   <li>[[P2558R0]] Add @, $, and ` to the basic character set</li>
 48 |   <li>[[P2621R0]] UB? In my Lexer?</li>
 49 |   <li>[[P2686R0]] Updated wording and implementation experience for P1481 (constexpr structured bindings)</li>
 50 |   <li>[[P1967R0]] #embed - a simple, scannable preprocessor-based resource acquisition method</li>
 51 |   <li>[[P2593R0]] Allowing static_assert(false): To be forwarded after the next meeting unless a better proposal comes up</li>
 52 | </ul>
 53 | This doesn’t mean that they will all be in C++26; they are only tentatively on track to be in C++26.
 54 | 
 55 | The following papers were reviewed and forwarded to LEWG, the library evolution group, meaning that either EWG sees no need for language input, or provided language input to the library group, or requests library input to further the language work:
 56 | <ul>
 57 |   <li>[[P2641R0]] Checking if a union alternative is active</li>
 58 |   <li>[[P2546R0]] Debugging Support</li>
 59 |   <li>[[P0876R5]] fiber_context - fibers without scheduler</li>
 60 |   <li>[[P2141R0]] Aggregates are named tuples</li>
 61 | </ul>
 62 | 
 63 | The following papers were reviewed and encouraged to come back with an update:
 64 | <ul>
 65 |   <li>[[P0901R2]] Size feedback in operator new</li>
 66 |   <li>[[P2677R0]] Reconsidering concepts in-place syntax</li>
 67 |   <li>Pattern Matching:</li>
 68 |   <ul>
 69 |     <li>[[P2211R0]] Exhaustiveness Checking for Pattern Matching</li>
 70 |     <li>[[P2169R0]] A Nice Placeholder With No Name</li>
 71 |     <li>[[P2392R2]] Pattern matching using is and as</li>
 72 |     <li>[[P2688R0]] Pattern Matching Discussion for Kona 2022</li>
 73 |     <li>[[P2561R1]] An error propagation operator</li>
 74 |     <li>[[P2656R0]] C++ Ecosystem International Standard</li>
 75 |   </ul>
 76 |   <li>Pointer Provenance</li>
 77 |   <ul>
 78 |     <li>[[P2188R0]] Zap the Zap: Pointers should just be bags of bits</li>
 79 |     <li>P2434R0 (not yet published) Nondeterministic pointer provenance</li>
 80 |   </ul>
 81 |   <li>[[P2547R0]] Language support for customisable functions</li>
 82 |   <li>[[P2632R0]] A plan for better template meta programming facilities in C++26</li>
 83 |   <li>[[P2671R0]] Syntax choices for generalized pack declaration and usage</li>
 84 | </ul>
 85 | 
 86 | The following papers were reviewed and had no consensus for further work:
 87 | <ul>
 88 |   <li>[[P2669R0]] Deprecate changing kind of names in class template specializations</li>
 89 |   <li>[[P2174R0]] Compound Literals</li>
 90 |   <li>[[P2381R0]] Pattern Matching with Exception Handling</li>
 91 | </ul>
 92 | 
 93 | CWG asked for EWG feedback on:
 94 | <ul>
 95 |   <li>[[CWG2463]] Conditions for trivially copyable classes, the conclusion was that a paper was needed to address the issue</li>
 96 | </ul>
 97 | 
 98 | The committee also tracks defects through various groups. EWG issues were tracked in [[P1018r18]], and will shortly move to GitHub. This week we reviewed EWG issues as follows:
 99 | <ul>
100 |   <li>2 Marked Resolved</li>
101 |   <li>1 Marked as “Needs a Paper”</li>
102 |   <li>17 Closed as “Not A Defect”</li>
103 | </ul>
104 | 
105 | EWG hosted an evening session on “the future of C++”. The results will be shared in a few weeks, once the committee has discussed them internally based on the survey feedback from attendees. It was well attended with 100+ participants, and much frank discussion.
106 | 
107 | A session on [[P2676r0]] The Val object model was held, so that C++ committee members could learn about the work David Abrahams is doing at Adobe on the Val language. We separately heard from <a href="https://www.youtube.com/watch?v=ELeZAKCN4tY">Herb Sutter on CppFront</a>. We also had good engagement from a few folks who have worked on the <a href="https://github.com/carbon-language/carbon-lang">Carbon programming language</a>. As this is the C++ committee, we also often talk about languages such as Rust, Circle, Zig and others.
108 | 
109 | <pre class=biblio>
110 | {
111 |     "P1018r18": {
112 |         "href": "https://wg21.link/p1018r18",
113 |         "title": "C++ Language Evolution status - pandemic edition - 2022/08-2022/011",
114 |         "authors": ["JF Bastien"],
115 |         "date": "2022-10-24"
116 |     }
117 | }
118 | </pre>
119 | 


--------------------------------------------------------------------------------
/source/P0528r1.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: The Curious Case of Padding Bits, Featuring Atomic Compare-and-Exchange
  3 | Shortname: P0528
  4 | Revision: 1
  5 | Audience: SG1, EWG, CWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/P0528r1
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P0528r1.bs">github.com/jfbastien/papers/blob/master/source/P0528r1.bs</a>
 10 | Editor: JF Bastien, Apple, jfbastien@apple.com
 11 | Editor: Michael Spencer, Sony Playstation, bigcheesegs@gmail.com
 12 | Abstract: Compare-and-exchange on a struct with padding bits should Just Work.
 13 | Date: 2018-02-11
 14 | Markup Shorthands: markdown yes
 15 | </pre>
 16 | 
 17 | This issue has been discussed by the authors at every recent Standards meeting,
 18 | yet a full solution has been elusive despite helpful proposals. We believe that
 19 | this proposal can fix this oft-encountered problem once and for all.
 20 | 
 21 | [[P0528r0]] details extensive background on this problem (not repeated here),
 22 | and proposed standardizing a trait, `has_padding_bits`, and using it on
 23 | `compare_exchange_*`. This paper applies EWG guidance and simply adds a
 24 | note.
 25 | 
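The underlying problem can be reproduced without any atomics. The following is a minimal sketch, assuming a typical ABI in which `unsigned` is more strictly aligned than `char`. The names `make_image`, `has_padding`, and `bytewise_mismatch_for_equal_values` are hypothetical and are not part of this proposal. The sketch builds two byte images of the same `padded` value that differ only in their padding bytes, which is the situation in which a bytewise compare-and-exchange fails even though the values are equal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct padded {
  char c;
  // Padding is typically inserted here so that `i` is suitably aligned.
  unsigned i;
};

// True when the type has bytes that are not covered by any member.
constexpr bool has_padding = sizeof(padded) > sizeof(char) + sizeof(unsigned);

// Fills every byte of `image` (padding included) with `fill`, then writes
// the members of `value` at their offsets.
void make_image(unsigned char (&image)[sizeof(padded)], const padded& value,
                unsigned char fill) {
  std::memset(image, fill, sizeof(padded));
  std::memcpy(image + offsetof(padded, c), &value.c, sizeof(value.c));
  std::memcpy(image + offsetof(padded, i), &value.i, sizeof(value.i));
}

// Returns true when two images of the same value compare unequal bytewise.
bool bytewise_mismatch_for_equal_values() {
  const padded value{0x42, 0xC0DEFEFEu};
  unsigned char a[sizeof(padded)];
  unsigned char b[sizeof(padded)];
  make_image(a, value, 0x00);  // padding bytes are 0x00
  make_image(b, value, 0xFF);  // padding bytes are 0xFF
  return std::memcmp(a, b, sizeof(padded)) != 0;
}
```

On an implementation where `padded` has no padding the two images are identical, so the mismatch appears exactly when `has_padding` is true.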
 26 | 
 27 | Edit History {#edit}
 28 | ============
 29 | 
 30 | r0 → r1 {#r0r1}
 31 | -------
 32 | 
 33 | In Albuquerque, EWG voted to make the padding bits of `atomic` and the incoming
 34 | value of `T` have a consistent value for the purposes of read/modify/write
 35 | atomic operations.
 36 | 
 37 | Purposefully not addressed in this paper:
 38 | 
 39 |   * `union` with padding bits
 40 |   * Types with trap representations
 41 | 
 42 | Proposed Wording {#word}
 43 | ================
 44 | 
 45 | In Operations on atomic types [**atomics.types.operations**], insert a new
 46 | paragraph after the note in ❡1:
 47 | 
 48 | <blockquote>
 49 | 
 50 | [*Note:* Many operations are volatile-qualified. The "volatile as device
 51 | register" semantics have not changed in the standard. This qualification means
 52 | that volatility is preserved when applying these operations to volatile objects.
 53 | It does not mean that operations on non-volatile objects become volatile. —*end
 54 | note*]
 55 | 
 56 | <ins>
 57 | 
 58 | Atomic operations, both through `atomic<T>` and free-functions, can be performed
 59 | on types `T` which contain bits that never participate in the object's
 60 | representation. In such cases an implementation shall ensure that
 61 | initialization, assignment, store, exchange, and read-modify-write operations
 62 | replace bits which never participate in the object's representation with an
 63 | implementation-defined value. A compatible implementation-defined value shall be
 64 | used for compare-and-exchange operations' copy of the `expected` value.
 65 | 
 66 | As a consequence, the following code is guaranteed to avoid spurious failure:
 67 | 
 68 | <xmp>
 69 | 
 70 | struct padded {
 71 |   char c = 0x42;
 72 |   // Padding here.
 73 |   unsigned i = 0xC0DEFEFE;
 74 | };
 75 | atomic<padded> pad = ATOMIC_VAR_INIT({});
 76 | 
 77 | bool success() {
 78 |   padded expected, desired { 0, 0 };
 79 |   return pad.compare_exchange_strong(expected, desired);
 80 | }
 81 | 
 82 | </xmp>
 83 | 
 84 | [*Note:*
 85 | 
 86 |   Types which contain bits that sometimes participate in the object's
 87 |   representation, such as a `union` containing a type with padding bits and a
 88 |   type without, may always fail compare-and-exchange when these bits are not
 89 |   participating in the object's representation because they have an
 90 |   indeterminate value. Such a program is ill-formed, no diagnostic required.
 91 | 
 92 | —*end note*]
 93 | 
 94 | </ins>
 95 | 
 96 | </blockquote>
 97 | 
 98 | Edit ❡17 and onwards as follows:
 99 | 
100 | <blockquote>
101 | 
102 | *Requires:* The `failure` argument shall not be `memory_order::release` nor
103 | `memory_order::acq_rel`.
104 | 
105 | *Effects:* Retrieves the value in `expected`. <ins>Bits in the retrieved value
106 | which never participate in the object's representation are set to a value
107 | compatible to that previously stored in the atomic object.</ins> It then
108 | atomically compares the contents of the memory pointed to by `this` for equality
109 | with that previously retrieved from `expected`, and if true, replaces the
110 | contents of the memory pointed to by `this` with that in `desired`. If and only
111 | if the comparison is true, memory is affected according to the value of
112 | `success`, and if the comparison is false, memory is affected according to the
113 | value of `failure`. When only one `memory_order` argument is supplied, the value
114 | of `success` is `order`, and the value of `failure` is `order` except that a
115 | value of `memory_order::acq_rel` shall be replaced by the value
116 | `memory_order::acquire` and a value of `memory_order::release` shall be replaced
117 | by the value `memory_order::relaxed`. If and only if the comparison is false
118 | then, after the atomic operation, the contents of the memory in `expected` are
119 | replaced by the value read from the memory pointed to by `this` during the
120 | atomic comparison. If the operation returns `true`, these operations are atomic
121 | read-modify-write operations on the memory pointed to by `this`. Otherwise,
122 | these operations are atomic load operations on that memory.
123 | 
124 | *Returns:* The result of the comparison.
125 | 
126 | [*Note:*
127 | 
128 |   For example, the effect of `compare_exchange_strong` is
129 |   
130 |   <xmp>
131 |   
132 |     if (memcmp(this, &expected, sizeof(*this)) == 0)
133 |       memcpy(this, &desired, sizeof(*this));
134 |     else
135 |        memcpy(expected, this, sizeof(*this));
136 | 
137 |   </xmp>
138 | 
139 | —*end note*]
140 | 
141 | [*Example:*
142 | 
143 |   The expected use of the compare-and-exchange operations is as follows. The
144 |   compare-and-exchange operations will update `expected` when another iteration
145 |   of the loop is needed.
146 |   
147 |   <xmp>
148 | 
149 |     expected = current.load();
150 |     do {
151 |       desired = function(expected);
152 |     } while (!current.compare_exchange_weak(expected, desired));
153 | 
154 |   </xmp>
155 |   
156 | —*end example*]
157 |   
158 | [*Example:*
159 | 
160 |   Because the expected value is updated only on failure, code releasing the
161 |   memory containing the `expected` value on success will work. E.g. list head
162 |   insertion will act atomically and would not introduce a data race in the
163 |   following code:
164 |   
165 |   <xmp>
166 | 
167 |     do {
168 |       p->next = head; // make new list node point to the current head
169 |     } while (!head.compare_exchange_weak(p->next, p)); // try to insert
170 | 
171 |   </xmp>
172 |   
173 | —*end example*]
174 | 
175 | Implementations should ensure that weak compare-and-exchange operations do not
176 | consistently return `false` unless either the atomic object has value different
177 | from `expected` or there are concurrent modifications to the atomic object.
178 | 
179 | 
180 | *Remarks:* A weak compare-and-exchange operation may fail spuriously. That is,
181 | even when the contents of memory referred to by `expected` and `this` are equal,
182 | it may return `false` and store back to `expected` the same memory contents that
183 | were originally there.
184 | 
185 | [*Note:*
186 | 
187 |   This spurious failure enables implementation of compare-and-exchange on a
188 |   broader class of machines, e.g., load-locked store-conditional machines. A
189 |   consequence of spurious failure is that nearly all uses of weak
190 |   compare-and-exchange will be in a loop. When a compare-and-exchange is in a
191 |   loop, the weak version will yield better performance on some platforms. When a
192 |   weak compare-and-exchange would require a loop and a strong one would not, the
193 |   strong one is preferable.
194 | 
195 | —*end note*]
196 | 
197 | [*Note:*
198 | 
199 |   The `memcpy` and `memcmp` semantics of the compare-and-exchange operations may
200 |   result in failed comparisons for values that compare equal with `operator==`
201 |   if the underlying type has padding bits<ins> which sometimes participate in
202 |   the object's representation</ins>, trap bits, or alternate representations of
203 |   the same value<ins> other than those caused by padding bits which never
204 |   participate in the object's representation</ins>.
205 | 
206 | —*end note*]
207 | 
208 | </blockquote>
209 | 
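For readers who want to run the loop from the first example above, here is a minimal sketch wrapped in a function. The name `fetch_update` is hypothetical and does not exist in the standard library; the sketch only relies on `std::atomic<T>::load` and `compare_exchange_weak` with their default memory orderings.

```cpp
#include <atomic>
#include <cassert>

// Atomically replaces the value of `current` with `function(old value)` and
// returns the old value.
template <typename T, typename F>
T fetch_update(std::atomic<T>& current, F function) {
  T expected = current.load();
  T desired;
  do {
    desired = function(expected);
    // On failure (spurious or not), compare_exchange_weak stores the value
    // it observed into `expected`, so the next iteration recomputes `desired`.
  } while (!current.compare_exchange_weak(expected, desired));
  // On success `expected` is left untouched, so it still holds the old value.
  return expected;
}
```

Because `expected` is only updated on failure, the value returned is the one that was replaced.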


--------------------------------------------------------------------------------
/source/P0528r2.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: The Curious Case of Padding Bits, Featuring Atomic Compare-and-Exchange
  3 | Shortname: P0528
  4 | Revision: 2
  5 | Audience: CWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/P0528r2
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P0528r2.bs">github.com/jfbastien/papers/blob/master/source/P0528r2.bs</a>
 10 | Editor: JF Bastien, Apple, jfbastien@apple.com
 11 | Editor: Michael Spencer, Sony Playstation, bigcheesegs@gmail.com
 12 | Abstract: Compare-and-exchange on a struct with padding bits should Just Work.
 13 | Date: 2018-03-16
 14 | Markup Shorthands: markdown yes
 15 | </pre>
 16 | 
 17 | This issue has been discussed by the authors at every recent Standards meeting,
 18 | yet a full solution has been elusive despite helpful proposals. We believe that
 19 | this proposal can fix this oft-encountered problem once and for all.
 20 | 
 21 | [[P0528r0]] details extensive background on this problem (not repeated here),
 22 | and proposed standardizing a trait, `has_padding_bits`, and using it on
 23 | `compare_exchange_*`. [[P0528r1]] applied EWG guidance and simply added
 24 | wording directing implementations to ensure that the desired behavior occur. At
 25 | SG1's request this paper follows EWG's guidance but uses different wording.
 26 | 
 27 | 
 28 | Edit History {#edit}
 29 | ============
 30 | 
 31 | r1 → r2 {#r1r2}
 32 | -------
 33 | 
 34 | In Jacksonville, SG1 supported the paper but suggested an alternate way to
 35 | approach the wording than the one EWG proposed in Albuquerque: don't talk about
 36 | contents of the memory, but rather discuss the value representation to describe
 37 | compare-and-exchange. This paper follows SG1's guidance and offers different
 38 | wording, with the intent that the semantics be equivalent. EWG reviewed the
 39 | updated wording and voted to support it and forward it to Core.
 40 | 
 41 | r0 → r1 {#r0r1}
 42 | -------
 43 | 
 44 | In Albuquerque, EWG voted to make the padding bits of `atomic` and the incoming
 45 | value of `T` have a consistent value for the purposes of read/modify/write
 46 | atomic operations.
 47 | 
 48 | Purposefully not addressed in this paper:
 49 | 
 50 |   * `union` with padding bits
 51 |   * Types with trap representations
 52 | 
 53 | Proposed Wording {#word}
 54 | ================
 55 | 
 56 | Edit ❡17 and onwards as follows:
 57 | 
 58 | <blockquote>
 59 | 
 60 | *Requires:* The `failure` argument shall not be `memory_order::release` nor
 61 | `memory_order::acq_rel`.
 62 | 
 63 | *Effects:* Retrieves the value in `expected`. It then atomically compares
 64 | the <del>contents of the memory pointed to by `this`</del><ins>value representation
 65 | of `*this`</ins> for equality with that previously retrieved from `expected`,
 66 | and if true, replaces the <del>contents of the memory pointed to
 67 | by `this`</del><ins>value representation of `*this`</ins> with that in `desired`.  If
 68 | and only if the comparison is true, memory is affected according to the value of
 69 | `success`, and if the comparison is false, memory is affected according to the
 70 | value of `failure`. When only one `memory_order` argument is supplied, the value
 71 | of `success` is `order`, and the value of `failure` is `order` except that a
 72 | value of `memory_order::acq_rel` shall be replaced by the value
 73 | `memory_order::acquire` and a value of `memory_order::release` shall be replaced
 74 | by the value `memory_order::relaxed`. If and only if the comparison is false
 75 | then, after the atomic operation, the <del>contents of the memory</del><ins>
 76 | value representation</ins> in `expected` are replaced by the value<ins>
 77 | representation</ins> read from the memory pointed to by `this` during the atomic
 78 | comparison. If the operation returns `true`, these operations are atomic
 79 | read-modify-write operations on the memory pointed to by `this`. Otherwise,
 80 | these operations are atomic load operations on that memory.
 81 | 
 82 | *Returns:* The result of the comparison.
 83 | 
 84 | [*Note:*
 85 | 
 86 |   For example, the effect of `compare_exchange_strong` <ins>on objects without padding bits </ins>is
 87 |   
 88 |   <xmp>
 89 |   
 90 |     if (memcmp(this, &expected, sizeof(*this)) == 0)
 91 |       memcpy(this, &desired, sizeof(*this));
 92 |     else
 93 |        memcpy(expected, this, sizeof(*this));
 94 | 
 95 |   </xmp>
 96 | 
 97 | —*end note*]
 98 | 
 99 | [*Example:*
100 | 
101 |   The expected use of the compare-and-exchange operations is as follows. The
102 |   compare-and-exchange operations will update `expected` when another iteration
103 |   of the loop is needed.
104 |   
105 |   <xmp>
106 | 
107 |     expected = current.load();
108 |     do {
109 |       desired = function(expected);
110 |     } while (!current.compare_exchange_weak(expected, desired));
111 | 
112 |   </xmp>
113 |   
114 | —*end example*]
115 |   
116 | [*Example:*
117 | 
118 |   Because the expected value is updated only on failure, code releasing the
119 |   memory containing the `expected` value on success will work. E.g. list head
120 |   insertion will act atomically and would not introduce a data race in the
121 |   following code:
122 |   
123 |   <xmp>
124 | 
125 |     do {
126 |       p->next = head; // make new list node point to the current head
127 |     } while (!head.compare_exchange_weak(p->next, p)); // try to insert
128 | 
129 |   </xmp>
130 |   
131 | —*end example*]
132 | 
133 | Implementations should ensure that weak compare-and-exchange operations do not
134 | consistently return `false` unless either the atomic object has value different
135 | from `expected` or there are concurrent modifications to the atomic object.
136 | 
137 | 
138 | *Remarks:* A weak compare-and-exchange operation may fail spuriously. That is,
139 | even when the contents of memory referred to by `expected` and `this` are equal,
140 | it may return `false` and store back to `expected` the same memory contents that
141 | were originally there.
142 | 
143 | [*Note:*
144 | 
145 |   This spurious failure enables implementation of compare-and-exchange on a
146 |   broader class of machines, e.g., load-locked store-conditional machines. A
147 |   consequence of spurious failure is that nearly all uses of weak
148 |   compare-and-exchange will be in a loop. When a compare-and-exchange is in a
149 |   loop, the weak version will yield better performance on some platforms. When a
150 |   weak compare-and-exchange would require a loop and a strong one would not, the
151 |   strong one is preferable.
152 | 
153 | —*end note*]
154 | 
155 | [*Note:*
156 | 
157 |   The `memcpy` and `memcmp` semantics of the compare-and-exchange operations
158 |   may result in failed comparisons for values that compare equal with
159 |   `operator==` if the underlying type has padding bits<ins> which sometimes
160 |   participate in the object's representation</ins>, trap bits, or
161 |   alternate representations of the same value<ins> other than those caused by
162 |   padding bits which never participate in the object's representation</ins>.
163 |   Notably, on implementations conforming to ISO/IEC/IEEE 60559, floating-point
164 |   `-0.0` and `+0.0` will not compare equal with `memcmp` but will compare equal
165 |   with `operator==`, and NaNs with the same payload will compare equal with
166 |   `memcmp` but will not compare equal with `operator==`.
167 | 
168 | —*end note*]
169 | 
170 | <ins>
171 | 
172 | [*Note:*
173 | 
174 |   Compare-and-exchange acts on an object's value representation, ensuring that
175 |   padding bits which never participate in the object's representation are ignored.
176 | 
177 |   As a consequence, the following code is guaranteed to avoid spurious failure:
178 | 
179 |   <xmp>
180 | 
181 |   struct padded {
182 |     char clank = 0x42;
183 |     // Padding here.
184 |     unsigned biff = 0xC0DEFEFE;
185 |   };
186 |   atomic<padded> pad = ATOMIC_VAR_INIT({});
187 | 
188 |   bool zap() {
189 |     padded expected, desired { 0, 0 };
190 |     return pad.compare_exchange_strong(expected, desired);
191 |   }
192 | 
193 |   </xmp>
194 | 
195 | —*end note*]
196 | 
197 | [*Note:*
198 | 
199 |   Types which contain bits that sometimes participate in the object's
200 |   representation, such as a `union` containing a type with padding bits and a
201 |   type without, may always fail compare-and-exchange when these bits are not
202 |   participating in the object's representation because they have an
203 |   indeterminate value.
204 | 
205 | —*end note*]
206 | 
207 | </ins>
208 | 
209 | </blockquote>
210 | 
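The floating-point remark in the note on `memcpy` and `memcmp` semantics can be checked directly. This is a small sketch, assuming `double` follows ISO/IEC/IEEE 60559 (enforced by the `static_assert`) and that the code is not compiled with options that relax floating-point semantics. The helper names `same_bytes` and `same_value` are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes ISO/IEC/IEEE 60559 doubles");

// Bytewise comparison, as memcmp would perform it.
bool same_bytes(double a, double b) {
  return std::memcmp(&a, &b, sizeof(double)) == 0;
}

// Value comparison, as operator== performs it.
bool same_value(double a, double b) {
  return a == b;
}
```

With these helpers, `-0.0` and `+0.0` are equal by value but not bytewise, while a quiet NaN is bytewise equal to itself but never equal by value.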


--------------------------------------------------------------------------------
/source/P1018r6.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: Language Evolution status after Prague 2020
  3 | Shortname: P1018
  4 | Revision: 6
  5 | Audience: WG21, EWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/P1018r6
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P1018r6.bs">github.com/jfbastien/papers/blob/master/source/P1018r6.bs</a>
 10 | Editor: JF Bastien, Apple, jfbastien@apple.com
 11 | Date: 2020-02-29
 12 | Markup Shorthands: markdown yes
 13 | Toggle Diffs: no
 14 | No abstract: false
 15 | Abstract: This paper is a collection of items that language Evolution has worked on in the latest C++ meeting, their status, and plans for the future.
 16 | </pre>
 17 | 
 18 | Executive summary {#summary}
 19 | =================
 20 | 
 21 | * Finalize ballot resolution for C++20, to address National Body comments in [[N4844]].
 22 | * Start work on features for C++23 and later.
 23 | * Joint session with LEWG on ABI, based on <a href="https://wg21.link/P1863R1">P1863R1</a>.
 24 | 
 25 | 
 26 | Papers of note {#note}
 27 | =============
 28 | 
 29 | * <a href="https://wg21.link/P1000R4">P1000R4</a> C++ IS schedule
 30 | * <a href="https://wg21.link/P0592R4">P0592R4</a> To boldly suggest an overall plan for C++23
 31 | * <a href="https://wg21.link/P1999R0">P1999R0</a> Process: 2×-🇨🇿 evolutionary material via a Tentatively Ready status
 32 | * <a href="https://wg21.link/P2118R0">P2118R0</a> Documenting Core Undefined or Unspecified Behavior
 33 | 
 34 | 
 35 | Tentatively ready papers {#tentative}
 36 | ========================
 37 | 
 38 | Following our process in <a href="https://wg21.link/P1999">P1999</a>, here are the papers that EWG considers tentatively ready for CWG. We'll take a brief look at the next meeting, and if nothing particular concerns anyone, send them to CWG.
 39 | 
 40 | * <a href="https://wg21.link/P1847R2">P1847R2</a> Make declaration order layout mandated 
 41 | * <a href="https://wg21.link/P2025R0">P2025R0</a> Guaranteed copy elision for named return objects
 42 | * <a href="https://wg21.link/P1949R2">P1949R2</a> C++ Identifier Syntax using Unicode Standard Annex 31
 43 | 
 44 | You can <a href="https://github.com/cplusplus/papers/labels/TentativelyReady">follow this list on GitHub</a>.
 45 | 
 46 | 
 47 | ABI discussion {#abi}
 48 | ==============
 49 | 
 50 | We held a joint session with LEWG to discuss ABI, based on <a href="https://wg21.link/P1863R1">P1863R1</a>. The outcome of the discussion was as follows:
 51 | 
 52 | * <strong>To the best of our ability, we should promise users that we won’t break ABI, ever</strong><br>Wasn't contended: we disagree with this statement and might break ABI in the future.
 53 | * <strong>From now on, we should consider incremental ABI for every C++ release</strong><br>Received extremely positive support, with a small minority disagreeing strongly.
 54 | * <strong>We should consider a big ABI break for C++23</strong><br>Was extremely contended, with a few more people in favor than against. This was insufficient to call consensus.
 55 | * <strong>We should consider a big ABI break for C++SOMETHING</strong><br>Was positive enough to call consensus, but still had a quite substantial opposition including many disagreeing strongly. Were we to do a big ABI break we would need to work very hard on consensus building. Indeed, the number of people disagreeing strongly on a poll for a concrete change would block consensus.
 56 | * <strong>When we are unable to resolve a conflict between performance and ABI compatibility, we should prioritize performance</strong><br>Was still more positive, but also had a quite substantial opposition including many disagreeing strongly. Again, we should consider performance over ABI but work extremely hard towards consensus building when doing so.
 57 | 
 58 | 
 59 | National body comments {#nb}
 60 | ======================
 61 | 
 62 | * <a href="https://wg21.link/P2003R0">P2003R0</a> Fixing Internal and External Linkage Entities in Header Units <a href="https://wg21.link/P2003/github">#740</a>
 63 | * <a href="https://wg21.link/P2014R0">P2014R0</a> Proposed resolution for US061/US062 - aligned allocation of coroutine frames <a href="https://wg21.link/P2014/github">#750</a>
 64 | * <a href="https://wg21.link/P1884R0">P1884R0</a> Private Module Partition: An Inconsistent Boundary <a href="https://wg21.link/P1884/github">#729</a>
 65 | * <a href="https://wg21.link/P2100R0">P2100R0</a> Keep unhandled_exception of a promise type mandatory - a response to US062 and FR066
 66 | * <a href="https://wg21.link/P2104R0">P2104R0</a> GB046 Allow caching of evaluations of concept specializations <a href="https://github.com/cplusplus/nbballot/issues/45">#45</a>
 67 | 
 68 | 
 69 | C++23 discussions {#cpp23}
 70 | =================
 71 | 
 72 | We discussed a few papers which could make it to C++23:
 73 | 
 74 | * <a href="https://wg21.link/P2085R0">P2085R0</a> Consistent defaulted comparisons
 75 | * <a href="https://wg21.link/P0592R4">P0592R4</a> To boldly suggest an overall plan for C++23
 76 | * <a href="https://wg21.link/P1999R0">P1999R0</a> Process proposal: double-check evolutionary material via a Tentatively Ready status
 77 | * <a href="https://wg21.link/P1468R3">P1468R3</a> Fixed-layout floating-point type aliases
 78 | * <a href="https://wg21.link/P1467R3">P1467R3</a> Extended floating-point types
 79 | * <a href="https://wg21.link/P1371R2">P1371R2</a> Pattern Matching
 80 | * <a href="https://wg21.link/P1000R4">P1000R4</a> C++ IS schedule
 81 | * <a href="https://wg21.link/P1726R2">P1726R2</a> Pointer lifetime-end zap
 82 | * <a href="https://wg21.link/P2092R0">P2092R0</a> Disambiguating Nested-Requirements
 83 | * <a href="https://wg21.link/P1040R5">P1040R5</a> std::embed
 84 | * <a href="https://wg21.link/P1677R2">P1677R2</a> Cancellation is not an Error
 85 | * <a href="https://wg21.link/P1401R2">P1401R2</a> Narrowing contextual conversions to bool
 86 | * <a href="https://wg21.link/P0876R10">P0876R10</a> fiber_context - fibers without scheduler
 87 | * <a href="https://wg21.link/P0847R4">P0847R4</a> Deducing this
 88 | * <a href="https://wg21.link/P2082R1">P2082R1</a> Fixing CTAD for aggregates
 89 | * <a href="https://wg21.link/P1774R3">P1774R3</a> Portable assumptions
 90 | * <a href="https://wg21.link/P2118R0">P2118R0</a> Documenting Core Undefined or Unspecified Behavior
 91 | * <a href="https://wg21.link/P0849R2">P0849R2</a> auto(x): decay-copy in the language
 92 | * <a href="https://wg21.link/P2036R0">P2036R0</a> Changing scope for lambda trailing-return-type
 93 | * <a href="https://wg21.link/P2071R0">P2071R0</a> Named universal character escapes
 94 | * <a href="https://wg21.link/P1900R0">P1900R0</a> Concepts-Adjacent Problems
 95 | * <a href="https://wg21.link/P1847R2">P1847R2</a> Make declaration order layout mandated
 96 | * <a href="https://wg21.link/P1393R0">P1393R0</a> A General Property Customization Mechanism
 97 | * <a href="https://wg21.link/P2026R0">P2026R0</a> A Constituent Study Group for Safety-Critical Applications
 98 | * <a href="https://wg21.link/P1938R0">P1938R0</a> if consteval
 99 | * <a href="https://wg21.link/P1955R0">P1955R0</a> Top Level Is Constant Evaluated
100 | * <a href="https://wg21.link/P2041R0">P2041R0</a> Deleting variable templates
101 | * <a href="https://wg21.link/P0870R2">P0870R2</a> A proposal for a type trait to detect narrowing conversions
102 | * <a href="https://wg21.link/P2025R0">P2025R0</a> Guaranteed copy elision for named return objects
103 | * <a href="https://wg21.link/P2013R0">P2013R0</a> Freestanding Language: Optional ::operator new
104 | * <a href="https://wg21.link/P1949R2">P1949R2</a> C++ Identifier Syntax using Unicode Standard Annex 31
105 | 
106 | The following papers were scheduled for discussion, but authors requested to delay until the next meeting:
107 | 
108 | * <a href="https://wg21.link/P1967R1">P1967R1</a> #embed - a simple, scannable preprocessor-based resource acquisition method
109 | * <a href="https://wg21.link/P1046R2">P1046R2</a> Automatically Generate More Operators
110 | * <a href="https://wg21.link/P2049R0">P2049R0</a> Constraint refinement for special-cased functions
111 | 
112 | The following papers were scheduled for discussion, but were seen in SG7 Reflection who decided to table them for now:
113 | 
114 | * <a href="https://wg21.link/P1733R0">P1733R0</a> User-friendly and Evolution-friendly Reflection: A Compromise
115 | * <a href="https://wg21.link/P2089R0">P2089R0</a> Function parameter constraints are fragile
116 | 
117 | 
118 | Near-future EWG plans {#future}
119 | =====================
120 | 
121 | We will continue to work on C++23, prioritizing according to <a href="https://wg21.link/P0592">P0592</a>.
122 | 


--------------------------------------------------------------------------------
/source/P1225R0.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: Feedback on 2D Graphics
  3 | Shortname: P1225
  4 | Revision: 0
  5 | Audience: LEWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/P1225R0
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P1225R0.bs">github.com/jfbastien/papers/blob/master/source/P1225R0.bs</a>
 10 | Editor: JF Bastien, Apple, jfbastien@apple.com
 11 | No abstract: true
 12 | Date: 2018-10-02
 13 | Markup Shorthands: markdown yes
 14 | </pre>
 15 | 
 16 | Abstract {#abs}
 17 | ========
 18 | 
 19 | I’ve gathered input from a variety of folks involved in graphics at Apple, and here is our joint, considered, position regarding the 2D Graphics proposal.
 20 | 
 21 | We’re worried that the 2D Graphics proposal in [[P0267R8]] might be detrimental to developers, students, and users of devices which contain C++ code. Graphics are important to the Apple ecosystem, and we can see them as an important part of C++. However, we don’t think P0267R8 meets the quality bar for acceptance into C++. We want to see the reference implementation prove orthogonality, extensibility, and performance across a handful of platforms.
 22 | 
 23 | 
 24 | Design {#design}
 25 | ======
 26 | 
 27 | Were we to design a 2D Graphics API, we’d do the following:
 28 | 
 29 | 1. Multiple output devices: Memory buffer, Window, SVG, PDF, etc.
 30 | 
 31 |     1. Memory buffer must be directly usable by graphics API
 32 |     1. Support types such as `fp16` [[P0303R0]]
 33 |     1. Alpha channel support
 34 | 
 35 | 1. Anti-aliasing should come for free where supported
 36 | 1. Text
 37 | 1. Consistent, DPI-independent, output
 38 | 1. Hardware support where available 
 39 | 1. Reasonable performance
 40 | 1. Reasonable power consumption
 41 | 1. Color spaces and gamma support
 42 | 1. Possibility to build an interactive model with animation on top of the API
 43 | 
 44 | From the current proposal we like:
 45 | 
 46 | 1. 2D Matrix is 3×3, so homogeneous, presented as 2×3 in the API
 47 | 1. Decouples display points from actual points
 48 | 1. Vector graphics
 49 | 1. Compositing properly handled
 50 | 
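To illustrate the first item in the list above, here is a minimal sketch of a 2D affine transform that is conceptually a 3×3 homogeneous matrix but stores only its top 2×3 part, since the bottom row is always `0 0 1`. The names `point`, `affine2d`, `apply`, and `then` are hypothetical and are not taken from P0267R8; the member layout is an assumption made for this example. Composition with `then(first, second)` computes the product `second * first`, which is one way to stack transforms before drawing.

```cpp
#include <cassert>

struct point {
  double x, y;
};

// Top two rows of the homogeneous matrix
//   | a  c  tx |
//   | b  d  ty |
//   | 0  0  1  |
struct affine2d {
  double a, b, c, d, tx, ty;
};

// Transforms the point (x, y, 1) and drops the homogeneous coordinate.
point apply(const affine2d& m, point p) {
  return {m.a * p.x + m.c * p.y + m.tx,
          m.b * p.x + m.d * p.y + m.ty};
}

// Returns the transform equivalent to applying `first`, then `second`.
affine2d then(const affine2d& first, const affine2d& second) {
  return {second.a * first.a + second.c * first.b,
          second.b * first.a + second.d * first.b,
          second.a * first.c + second.c * first.d,
          second.b * first.c + second.d * first.d,
          second.a * first.tx + second.c * first.ty + second.tx,
          second.b * first.tx + second.d * first.ty + second.ty};
}
```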
 51 | Science and teaching {#st}
 52 | ====================
 53 | 
 54 | We’ve heard the following reasons for including 2D Graphics in C++:
 55 | 
 56 | 1. Teaching
 57 | 1. Scientific plot generation
 58 | 
 59 | We think putting pixels on the screen is great, but we want to do so responsibly.
 60 | 
 61 | Both for science and teaching, we appreciate what’s available through solutions such as <a href="https://www.mathworks.com/help/matlab/ref/plot.html">Matlab</a> / <a href="https://matplotlib.org/users/pyplot_tutorial.html">matplotlib</a> / <a href="https://www.statmethods.net/graphs/line.html">R</a> / <a href="https://d3js.org/">D3.js</a>. These solutions are powerful and match the performance of the language they complement. For C++ we’d expect a solution that delivers performance which at least approaches that of modern graphics frameworks, and surpasses that of Matlab / Python / R / JavaScript.
 62 | 
 63 | As a teaching tool, the current proposal teaches fairly low-level capabilities (i.e. complex things are hard to create) and is missing critical functionality. We fear it will hinder students by teaching them to start everything from scratch, and by not teaching them a few key details.
 64 | 
 65 | As a plotting tool it’s clearly falling short because it can’t label any axis (cf. <a href="https://www.edwardtufte.com/tufte/books_vdqi">Tufte</a>). Even if text were supported, the sample libraries for Matlab, Python, R, and JavaScript are much easier to draw plots with. The 2D Graphics proposal is neither capable nor convenient in that regard.
 66 | 
 67 | As a broad generalization, students currently learn data visualization (beyond what Excel + CSV files can do) in Matlab or Python if they do science, in R if they do math, and in JavaScript if they do anything else. We urge Committee members to at least try some of these, for example <a href="https://beta.observablehq.com/@mbostock/d3-scatterplot-matrix">scatterplot</a>, <a href="https://beta.observablehq.com/@mbostock/d3-histogram">histogram</a>, <a href="https://www.jasondavies.com/wordtree/">wordtree</a>. These aren’t teaching toys and are used, for example, by the <a href="https://archive.nytimes.com/www.nytimes.com/interactive/2012/10/15/us/politics/swing-history.html">New York Times</a>. There’s value in teaching students to pull themselves up by the language’s bootstraps; we therefore think the type of API in the current 2D Graphics library is useful. However, we want to know—i.e. we want to see it prototyped—that higher-level capabilities are also something that can be implemented. We think higher-level capabilities are more useful for teaching, yet we understand that C++ might want to offer lower-level primitives first.
 68 | 
 69 | Abstraction Level {#level}
 70 | =================
 71 | 
 72 | When we say the current proposal is too low-level, here are things we’d like to see at least prototyped to know that the proposal can grow into a powerful high-level library:
 73 | 
 74 | * Obtain a window object
 75 | * Load / transform / draw asset files
 76 | * Complex raster image support (including swizzled surfaces, compression, 2D form clipping, used as texture fill)
 77 | * New user-implemented rasterization primitives (such as ellipses or NURBS curve)
 78 | * Stacking geometric transforms before drawing (can this be done already?)
 79 | * Scissoring / clipping
 80 | * Handle user input
 81 | * Text support: glyph rasterization (e.g. FreeType), text shaping (e.g. HarfBuzz), and string rendering (e.g. Pango), or something platform-specific (e.g. CoreText on Apple platforms)
 82 | * Complex line drawing (e.g. dashed lines, along a path)
 83 | * Can all of the offered primitives be implemented directly on hardware using shaders?
 84 | 
 85 | In other words, we understand that a proposal might want to start small and grow more features over time. We want to know that this growth is possible, and that features can be composed into higher-level primitives.
 86 | 
 87 | Missing Details {#missing}
 88 | ===============
 89 | 
 90 | When we say the current proposal is missing key details, here is what we want to see in an initial version:
 91 | 
 92 | * It’s unclear that buffering is implementable, and that’s critical to a high-performance implementation. We’d like to see it implemented. We want to see a deferred mode implementation, not just immediate mode.
 93 | * Support modern color spaces and gamma.
 94 | * DPI independence is needed.
 95 | * Display points seem to address individual pixels in the image. We’d like to be able to address at finer granularity (MSAA samples, typographer points, pica).
 96 | * We’re not convinced that animation can be supported efficiently (i.e. update a single matrix in the stack of transforms).
 97 | * The current proposal doesn’t specify which image formats can be loaded, yet the reference implementation supports PNG, JPEG, and TIFF. This lack of specification makes portability difficult.
 98 | * We want to see an implementation generate PDF, SVG, raster output, as well as output in an OS window. This should be doable portably with zero code change.
 99 | 
100 | C++ Aesthetics {#cpp}
101 | ==============
102 | 
103 | Aesthetically, this lacks the feel of a C++ standard library. In particular:
104 | 
105 | * The dual error handling mechanism, while reminiscent of filesystem, is quaint in the STL.
106 | * Most APIs seem to be function-oriented and have a C API feel to them.
107 | * We’re surprised that we don’t have iterators / ranges for e.g. a path. We’d expect STL algorithms to work on such primitives.
108 | * We’d like to see linear algebra, trigonometry, and matrix math standardized separately.
109 | 
110 | Conclusion {#conc}
111 | ==========
112 | 
113 | We want to offer developers a graphics solution which allows usage of the full capabilities of the hardware we ship, without wasting battery life. Were we to ship the 2D Graphics proposal, we’d be putting our and C++’s good name on an API. We want to be sure it doesn’t do a disservice to developers and users.
114 | 
115 | We’re surprised and worried that the reference implementation on Mac requires X11 and MacPorts. We want to see an implementation that re-uses platform primitives on more than Linux. What was the experience with <a href="https://github.com/mikekazakov/P0267_cg">CoreGraphics</a>?
116 | 
117 | The windows + SVG proposal in [[P1062R0]] isn’t terrible. Obtaining a window seems like a simple step forward. SVG has some upsides and a few downsides, but overall we’re positive on it. We like that the proposal leans on existing standards.
118 | 
119 | Web view from [[P1108R0]] is trivial to support if specified well, but we don’t think it does what graphics enthusiasts want to do. It might be an interesting proposal, but we think it stands separately from 2D Graphics.
120 | 


--------------------------------------------------------------------------------
/source/P0154R0.rst:
--------------------------------------------------------------------------------
  1 | ================================================================================
  2 | P0154R0 ``constexpr std::hardware_{constructive,destructive}_interference_size``
  3 | ================================================================================
  4 | 
  5 | :Author: JF Bastien
  6 | :Contact: jfb@google.com
  7 | :Author: Olivier Giroux
  8 | :Contact: ogiroux@nvidia.com
  9 | :Date: 2015-10-24
 10 | :Previous: http://wg21.link/N4523
 11 | :URL: https://github.com/jfbastien/papers/blob/master/source/P0154R0.rst
 12 | 
 13 | ---------
 14 | Rationale
 15 | ---------
 16 | 
 17 | Starting with C++11, the library includes
 18 | ``std::thread::hardware_concurrency()`` to provide an implementation quantity
 19 | useful in the design of control structures in multi-threaded programs: the
 20 | extent of threads that do not interfere (to the first-order). Established
 21 | practice throughout the industry also relies on a second implementation
 22 | quantity, used instead in the design of data structures in the same programs.
 23 | This quantity is the granularity of memory that does not interfere (to the
 24 | first-order), commonly referred to as the *cache-line size*.
 25 | 
 26 | Uses of *cache-line size* fall into two broad categories:
 27 | 
 28 | * Avoiding destructive interference (false-sharing) between objects with
 29 |   temporally disjoint runtime access patterns from different
 30 |   threads. e.g. Producer-consumer queues.
 31 | * Promoting constructive interference (true-sharing) between objects which have
 32 |   temporally local runtime access patterns. e.g. The ``barrier`` example, as
 33 |   illustrated in P0153R0_.
 34 | 
 35 | .. _P0153R0: http://wg21.link/P0153R0
 36 | 
 37 | The most significant issue with this useful implementation quantity is the
 38 | questionable portability of the methods used in current practice to determine
 39 | its value, despite their pervasiveness and popularity as a group. In the
 40 | appendix_ we review several different compile-time and run-time methods. The
 41 | portability problem with most of these methods is that they expose a
 42 | micro-architectural detail without accounting for the intent of the implementors
 43 | (such as we are) over the life of the ISA or ABI.
 44 | 
 45 | We aim to contribute a modest invention for this cause, abstractions for this
 46 | quantity that can be conservatively defined for given purposes by
 47 | implementations:
 48 | 
 49 | * *Destructive interference size*: a number that's suitable as an offset between
 50 |   two objects to likely avoid false-sharing due to different runtime access
 51 |   patterns from different threads.
 52 | * *Constructive interference size*: a number that's suitable as a limit on two
 53 |   objects' combined memory footprint size and base alignment to likely promote
 54 |   true-sharing between them.
 55 | 
 56 | In both cases these values are provided on a quality of implementation basis,
 57 | purely as hints that are likely to improve performance. These are ideal portable
 58 | values to use with the ``alignas()`` keyword, for which there currently exist
 59 | nearly no standard-supported portable uses.
 60 | 
 61 | -----------------
 62 | Proposed addition
 63 | -----------------
 64 | 
 65 | Below, substitute the `�` character with a number the editor finds appropriate
 66 | for the sub-section. We propose adding the following to the standard:
 67 | 
 68 | Under 20.7.2 Header ``<memory>`` synopsis [**memory.syn**]:
 69 | 
 70 | .. code-block:: c++
 71 | 
 72 |   namespace std {
 73 |     // ...
 74 |     // 20.7.� Hardware interference size
 75 |     static constexpr size_t hardware_destructive_interference_size = implementation-defined;
 76 |     static constexpr size_t hardware_constructive_interference_size = implementation-defined;
 77 |     // ...
 78 |   }
 79 | 
 80 | Under 20.7.� Hardware interference size [**hardware.interference**]:
 81 | 
 82 | ``constexpr size_t hardware_destructive_interference_size = implementation-defined;``
 83 | 
 84 | This number is the minimum recommended offset between two concurrently-accessed
 85 | objects to avoid additional performance degradation due to contention introduced
 86 | by the implementation. It shall be a valid alignment value for any type.
 87 | 
 88 | [*Example:*
 89 | 
 90 | .. code-block:: c++
 91 | 
 92 |   struct apart {
 93 |     alignas(hardware_destructive_interference_size) atomic<int> flag1, flag2;
 94 |   };
 95 | 
 96 | — *end example*]
 97 | 
 98 | ``constexpr size_t hardware_constructive_interference_size = implementation-defined;``
 99 | 
100 | This number is the minimum recommended alignment of contiguous memory occupied
101 | by two objects accessed with temporal locality by concurrent threads. It shall
102 | be a valid alignment value for any type.
103 | 
104 | [*Note:* This number is also the maximum recommended size of contiguous memory
105 | occupied by two objects accessed in this manner. — *end note*]
106 | 
107 | [*Example:*
108 | 
109 | .. code-block:: c++
110 | 
111 |   struct alignas(hardware_constructive_interference_size) colocated {
112 |     atomic<int> flag;
113 |     int tinydata;
114 |   };
115 |   static_assert(sizeof(colocated) <= hardware_constructive_interference_size);
116 | 
117 | — *end example*]
118 | 
119 | The ``__cpp_lib_thread_hardware_interference_size`` feature test macro should be
120 | added.
121 | 
122 | .. _appendix:
123 | 
124 | --------
125 | Appendix
126 | --------
127 | 
128 | Compile-time *cache-line size*
129 | ==============================
130 | 
131 | We informatively list a few ways in which the L1 *cache-line size* is obtained
132 | in different open-source projects at compile-time.
133 | 
134 | The Linux kernel defines the ``__cacheline_aligned`` macro which is configured
135 | for each architecture through ``L1_CACHE_BYTES``. On some architectures this
136 | value is determined through the configure-time option
137 | ``CONFIG_<ARCH>_L1_CACHE_SHIFT``, and on others the value of ``L1_CACHE_SHIFT``
138 | is hard-coded in the architecture's ``include/asm/cache.h`` header.
139 | 
140 | Many open-source projects from Google contain a ``base/port.h`` header which
141 | defines the ``CACHELINE_ALIGNED`` macro based on an explicit list of
142 | architecture detection macros. These header files have often diverged. A token
143 | example from the autofdo_ project is:
144 | 
145 | .. _autofdo: https://github.com/google/autofdo/blob/master/base/port.h
146 | 
147 | .. code-block:: c++
148 | 
149 |   // Cache line alignment
150 |   #if defined(__i386__) || defined(__x86_64__)
151 |   #define CACHELINE_SIZE 64
152 |   #elif defined(__powerpc64__)
153 |   // TODO(dougkwan) This is the L1 D-cache line size of our Power7 machines.
154 |   // Need to check if this is appropriate for other PowerPC64 systems.
155 |   #define CACHELINE_SIZE 128
156 |   #elif defined(__arm__)
157 |   // Cache line sizes for ARM: These values are not strictly correct since
158 |   // cache line sizes depend on implementations, not architectures.  There
159 |   // are even implementations with cache line sizes configurable at boot
160 |   // time.
161 |   #if defined(__ARM_ARCH_5T__)
162 |   #define CACHELINE_SIZE 32
163 |   #elif defined(__ARM_ARCH_7A__)
164 |   #define CACHELINE_SIZE 64
165 |   #endif
166 |   #endif
167 | 
168 |   #ifndef CACHELINE_SIZE
169 |   // A reasonable default guess.  Note that overestimates tend to waste more
170 |   // space, while underestimates tend to waste more time.
171 |   #define CACHELINE_SIZE 64
172 |   #endif
173 | 
174 |   #define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE_SIZE)))
175 | 
176 | Runtime *cache-line size*
177 | =========================
178 | 
179 | We informatively list a few ways in which the L1 *cache-line size* can be
180 | obtained on different operating systems and architectures at runtime. Libraries
181 | such as hwloc_ perform these queries, and could also be added to the standard as
182 | a separate proposal.
183 | 
184 | .. _hwloc: http://www.open-mpi.org/projects/hwloc/
185 | 
186 | On OSX one would use:
187 | 
188 | .. code-block:: c++
189 | 
190 |   sysctlbyname("hw.cachelinesize", &cacheline_size, &sizeof_cacheline_size, 0, 0);
191 | 
192 | On Windows one would use:
193 | 
194 | .. code-block:: c++
195 | 
196 |   GetLogicalProcessorInformation(&buf[0], &sizeof_buf);
197 |   for (i = 0; i != sizeof_buf / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); ++i)
198 |     if (buf[i].Relationship == RelationCache && buf[i].Cache.Level == 1)
199 |       cacheline_size = buf[i].Cache.LineSize;
200 | 
201 | On Linux one would either use:
202 | 
203 | .. code-block:: c++
204 | 
205 |   p = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
206 |   fscanf(p, "%d", &cacheline_size);
207 | 
208 | or:
209 | 
210 | .. code-block:: c++
211 | 
212 |   sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
213 | 
214 | On x86 one would use the ``CPUID`` instruction with ``EAX = 80000005h``, which
215 | leaves the result in ``ECX``, which needs further work to extract.
216 | 
217 | On ARM one would use ``mrs %[ctr], ctr_el0``, which needs further work to
218 | extract.
219 | 


--------------------------------------------------------------------------------
/source/P0154R1.rst:
--------------------------------------------------------------------------------
  1 | ================================================================================
  2 | P0154R1 ``constexpr std::hardware_{constructive,destructive}_interference_size``
  3 | ================================================================================
  4 | 
  5 | :Author: JF Bastien
  6 | :Contact: jfb@google.com
  7 | :Author: Olivier Giroux
  8 | :Contact: ogiroux@nvidia.com
  9 | :Date: 2016-03-03
 10 | :Previous: http://wg21.link/N4523
 11 | :Previous: http://wg21.link/P0154R0
 12 | :URL: https://github.com/jfbastien/papers/blob/master/source/P0154R1.rst
 13 | 
 14 | ---------
 15 | Rationale
 16 | ---------
 17 | 
 18 | Starting with C++11, the library includes
 19 | ``std::thread::hardware_concurrency()`` to provide an implementation quantity
 20 | useful in the design of control structures in multi-threaded programs: the
 21 | extent of threads that do not interfere (to the first-order). Established
 22 | practice throughout the industry also relies on a second implementation
 23 | quantity, used instead in the design of data structures in the same programs.
 24 | This quantity is the granularity of memory that does not interfere (to the
 25 | first-order), commonly referred to as the *cache-line size*.
 26 | 
 27 | Uses of *cache-line size* fall into two broad categories:
 28 | 
 29 | * Avoiding destructive interference (false-sharing) between objects with
 30 |   temporally disjoint runtime access patterns from different
 31 |   threads. e.g. Producer-consumer queues.
 32 | * Promoting constructive interference (true-sharing) between objects which have
 33 |   temporally local runtime access patterns. e.g. The ``barrier`` example, as
 34 |   illustrated in P0153R0_.
 35 | 
 36 | .. _P0153R0: http://wg21.link/P0153R0
 37 | 
 38 | The most significant issue with this useful implementation quantity is the
 39 | questionable portability of the methods used in current practice to determine
 40 | its value, despite their pervasiveness and popularity as a group. In the
 41 | appendix_ we review several different compile-time and run-time methods. The
 42 | portability problem with most of these methods is that they expose a
 43 | micro-architectural detail without accounting for the intent of the implementors
 44 | (such as we are) over the life of the ISA or ABI.
 45 | 
 46 | We aim to contribute a modest invention for this cause, abstractions for this
 47 | quantity that can be conservatively defined for given purposes by
 48 | implementations:
 49 | 
 50 | * *Destructive interference size*: a number that's suitable as an offset between
 51 |   two objects to likely avoid false-sharing due to different runtime access
 52 |   patterns from different threads.
 53 | * *Constructive interference size*: a number that's suitable as a limit on two
 54 |   objects' combined memory footprint size and base alignment to likely promote
 55 |   true-sharing between them.
 56 | 
 57 | In both cases these values are provided on a quality of implementation basis,
 58 | purely as hints that are likely to improve performance. These are ideal portable
 59 | values to use with the ``alignas()`` keyword, for which there currently exist
 60 | nearly no standard-supported portable uses.
 61 | 
 62 | -----------------
 63 | Proposed addition
 64 | -----------------
 65 | 
 66 | Below, substitute the `�` character with a number the editor finds appropriate
 67 | for the sub-section. We propose adding the following to the standard:
 68 | 
 69 | Under 18.6 Header ``<new>`` synopsis [**support.dynamic**]:
 70 | 
 71 | .. code-block:: c++
 72 | 
 73 |   namespace std {
 74 |     // ...
 75 |     // 18.6.� Hardware interference size
 76 |     static constexpr size_t hardware_destructive_interference_size = implementation-defined;
 77 |     static constexpr size_t hardware_constructive_interference_size = implementation-defined;
 78 |     // ...
 79 |   }
 80 | 
 81 | Under 18.6.� Hardware interference size [**hardware.interference**]:
 82 | 
 83 | ``constexpr size_t hardware_destructive_interference_size = implementation-defined;``
 84 | 
 85 | This number is the minimum recommended offset between two concurrently-accessed
 86 | objects to avoid additional performance degradation due to contention introduced
 87 | by the implementation. It shall be at least ``alignof(max_align_t)``.
 88 | 
 89 | [*Example:*
 90 | 
 91 | .. code-block:: c++
 92 | 
 93 |   struct keep_apart {
 94 |     alignas(hardware_destructive_interference_size) atomic<int> cat;
 95 |     alignas(hardware_destructive_interference_size) atomic<int> dog;
 96 |   };
 97 | 
 98 | — *end example*]
 99 | 
100 | ``constexpr size_t hardware_constructive_interference_size = implementation-defined;``
101 | 
102 | This number is the maximum recommended size of contiguous memory occupied by two
103 | objects accessed with temporal locality by concurrent threads. It shall be at
104 | least ``alignof(max_align_t)``.
105 | 
106 | [*Example:*
107 | 
108 | .. code-block:: c++
109 | 
110 |   struct together {
111 |     atomic<int> dog;
112 |     int puppy;
113 |   };
114 |   struct kennel {
115 |     // Other data members...
116 |     alignas(sizeof(together)) together pack;
117 |     // Other data members...
118 |   };
119 |   static_assert(sizeof(together) <= hardware_constructive_interference_size);
120 | 
121 | — *end example*]
122 | 
123 | The ``__cpp_lib_thread_hardware_interference_size`` feature test macro should be
124 | added.
125 | 
126 | .. _appendix:
127 | 
128 | --------
129 | Appendix
130 | --------
131 | 
132 | Compile-time *cache-line size*
133 | ==============================
134 | 
135 | We informatively list a few ways in which the L1 *cache-line size* is obtained
136 | in different open-source projects at compile-time.
137 | 
138 | The Linux kernel defines the ``__cacheline_aligned`` macro which is configured
139 | for each architecture through ``L1_CACHE_BYTES``. On some architectures this
140 | value is determined through the configure-time option
141 | ``CONFIG_<ARCH>_L1_CACHE_SHIFT``, and on others the value of ``L1_CACHE_SHIFT``
142 | is hard-coded in the architecture's ``include/asm/cache.h`` header.
143 | 
144 | Many open-source projects from Google contain a ``base/port.h`` header which
145 | defines the ``CACHELINE_ALIGNED`` macro based on an explicit list of
146 | architecture detection macros. These header files have often diverged. A token
147 | example from the autofdo_ project is:
148 | 
149 | .. _autofdo: https://github.com/google/autofdo/blob/master/base/port.h
150 | 
151 | .. code-block:: c++
152 | 
153 |   // Cache line alignment
154 |   #if defined(__i386__) || defined(__x86_64__)
155 |   #define CACHELINE_SIZE 64
156 |   #elif defined(__powerpc64__)
157 |   // TODO(dougkwan) This is the L1 D-cache line size of our Power7 machines.
158 |   // Need to check if this is appropriate for other PowerPC64 systems.
159 |   #define CACHELINE_SIZE 128
160 |   #elif defined(__arm__)
161 |   // Cache line sizes for ARM: These values are not strictly correct since
162 |   // cache line sizes depend on implementations, not architectures.  There
163 |   // are even implementations with cache line sizes configurable at boot
164 |   // time.
165 |   #if defined(__ARM_ARCH_5T__)
166 |   #define CACHELINE_SIZE 32
167 |   #elif defined(__ARM_ARCH_7A__)
168 |   #define CACHELINE_SIZE 64
169 |   #endif
170 |   #endif
171 | 
172 |   #ifndef CACHELINE_SIZE
173 |   // A reasonable default guess.  Note that overestimates tend to waste more
174 |   // space, while underestimates tend to waste more time.
175 |   #define CACHELINE_SIZE 64
176 |   #endif
177 | 
178 |   #define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE_SIZE)))
179 | 
180 | Runtime *cache-line size*
181 | =========================
182 | 
183 | We informatively list a few ways in which the L1 *cache-line size* can be
184 | obtained on different operating systems and architectures at runtime. Libraries
185 | such as hwloc_ perform these queries, and could also be added to the standard as
186 | a separate proposal.
187 | 
188 | .. _hwloc: http://www.open-mpi.org/projects/hwloc/
189 | 
190 | On OSX one would use:
191 | 
192 | .. code-block:: c++
193 | 
194 |   sysctlbyname("hw.cachelinesize", &cacheline_size, &sizeof_cacheline_size, 0, 0);
195 | 
196 | On Windows one would use:
197 | 
198 | .. code-block:: c++
199 | 
200 |   GetLogicalProcessorInformation(&buf[0], &sizeof_buf);
201 |   for (i = 0; i != sizeof_buf / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); ++i)
202 |     if (buf[i].Relationship == RelationCache && buf[i].Cache.Level == 1)
203 |       cacheline_size = buf[i].Cache.LineSize;
204 | 
205 | On Linux one would either use:
206 | 
207 | .. code-block:: c++
208 | 
209 |   p = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
210 |   fscanf(p, "%d", &cacheline_size);
211 | 
212 | or:
213 | 
214 | .. code-block:: c++
215 | 
216 |   sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
217 | 
218 | On x86 one would use the ``CPUID`` instruction with ``EAX = 80000005h``, which
219 | leaves the result in ``ECX``, which needs further work to extract.
220 | 
221 | On ARM one would use ``mrs %[ctr], ctr_el0``, which needs further work to
222 | extract.
223 | 


--------------------------------------------------------------------------------
/source/P0476r2.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: Bit-casting object representations
  3 | Shortname: P0476
  4 | Revision: 2
  5 | Audience: LWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/P0476r2
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P0476r2.bs">github.com/jfbastien/papers/blob/master/source/P0476r2.bs</a>
 10 | !Implementation: <a href="https://github.com/jfbastien/bit_cast/">github.com/jfbastien/bit_cast/</a>
 11 | Editor: JF Bastien, Apple, jfbastien@apple.com
 12 | Abstract: Obtaining equivalent object representations The Right Way™.
 13 | Date: 2017-11-10
 14 | Markup Shorthands: markdown yes
 15 | </pre>
 16 | 
 17 | 
 18 | This paper is a revision of [[P0476r1]], addressing LEWG comments from the 2017
 19 | Toronto meeting as well as comments from LEWG and LWG from the 2017 Albuquerque
 20 | meeting. See [[#rev]] for details.
 21 | 
 22 | 
 23 | Background {#bg}
 24 | ==========
 25 | 
 26 | Low-level code often seeks to interpret objects of one type as another: keep the
 27 | same bits, but obtain an object of a different type. Doing so correctly is
 28 | error-prone: using `reinterpret_cast` or `union` runs afoul of type-aliasing
 29 | rules yet these are the intuitive solutions developers mistakenly turn to.
 30 | 
 31 | Attuned developers use `aligned_storage` with `memcpy`, avoiding alignment
 32 | pitfalls and allowing them to bit-cast non-default-constructible types.
 33 | 
 34 | This proposal uses appropriate concepts to prevent misuse. As the sample
 35 | implementation demonstrates, we could just as well use `static_assert` or template
 36 | SFINAE, but the timing of this library feature will likely coincide with the
 37 | standardization of concepts.
 38 | 
 39 | Furthermore, it is currently impossible to implement a `constexpr` bit-cast
 40 | function, as `memcpy` itself isn't `constexpr`. Marking the proposed function as
 41 | `constexpr` doesn't require or prevent `memcpy` from becoming `constexpr`, but
 42 | requires compiler support. This leaves implementations free to use their own
 43 | internal solution (e.g. LLVM has <a
 44 | href="http://llvm.org/docs/LangRef.html#bitcast-to-instruction">a `bitcast`
 45 | opcode</a>).
 46 | 
 47 | We should standardize this oft-used idiom, and avoid the pitfalls once and for
 48 | all.
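
As a rough illustration of the `memcpy` idiom described above, here is a minimal sketch. It is neither the proposed wording nor the reference implementation. The name `bit_cast_sketch` is hypothetical, and the sketch assumes `To` is default-constructible, a simplification that the `aligned_storage` approach avoids. It is also not `constexpr`, since `memcpy` isn't.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper, named to avoid clashing with the proposed std::bit_cast.
// Simplifying assumption: To must be default-constructible, which the
// aligned_storage-based idiom does not require.
template <typename To, typename From>
typename std::enable_if<sizeof(To) == sizeof(From) &&
                            std::is_trivially_copyable<To>::value &&
                            std::is_trivially_copyable<From>::value &&
                            std::is_default_constructible<To>::value,
                        To>::type
bit_cast_sketch(const From& from) noexcept {
  To to;
  // Copy the object representation of `from` into `to`, byte for byte.
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// Example use: obtain the bits of a float.
// Assumes float is a 32-bit IEEE-754 binary32 value on the target platform.
inline std::uint32_t float_bits(float f) noexcept {
  return bit_cast_sketch<std::uint32_t>(f);
}
```

Under the binary32 assumption, `float_bits(1.0f)` yields `0x3f800000`. A call with mismatched sizes, such as `bit_cast_sketch<std::uint64_t>(1.0f)`, is removed from overload resolution and fails to compile.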
 49 | 
 50 | 
 51 | Proposed Wording {#word}
 52 | ================
 53 | 
 54 | Below, substitute the `�` character with a number or name the editor finds
 55 | appropriate for the sub-section.
 56 | 
 57 | In 20.5.1.2 [**headers**] add the header `<bit>` to:
 58 | 
 59 |   * Table 16 — C++ library headers
 60 |   * Table 19 — C++ headers for freestanding implementations
 61 | 
 62 | In the numerics section, add the following:
 63 | 
 64 | <ins>
 65 | 29.� Bit manipulation library [**bit**] {#bit}
 66 | ---------------------------------------
 67 | 
 68 | 29.�.1 General [**bit.general**] {#bitgen}
 69 | --------------------------------
 70 | 
 71 | The header `<bit>` provides components to access, manipulate and process both
 72 | individual bits and bit sequences.
 73 | 
 74 | 29.�.2 Header `<bit>` synopsis [**bit.syn**] {#bitsyn}
 75 | --------------------------------------------
 76 | 
 77 | <xmp>
 78 | namespace std {
 79 |   
 80 |   // 29.�.3 bit_cast
 81 |   template<typename To, typename From>
 82 |   constexpr To bit_cast(const From& from) noexcept;
 83 |   
 84 | }
 85 | </xmp>
 86 | 
 87 | 29.�.3 Function template `bit_cast` [**bit.cast**] {#bitcast}
 88 | --------------------------------------------------
 89 | 
 90 | <xmp>
 91 |   template<typename To, typename From>
 92 |   constexpr To bit_cast(const From& from) noexcept;
 93 | </xmp>
 94 | 
 95 | <ol>
 96 | <li>*Remarks*:
 97 | 
 98 |   This function shall not participate in overload resolution unless:
 99 |   <ul>
100 |     <li>`sizeof(To) == sizeof(From)` is `true`;</li>
101 |     <li>`is_trivially_copyable_v<To>` is `true`; and</li>
102 |     <li>`is_trivially_copyable_v<From>` is `true`.</li>
103 |   </ul>
104 | 
105 |   This function shall be `constexpr` if and only if `To`, `From`, and the types
106 |   of all subobjects of `To` and `From` are types `T` such that:
107 |   
108 |   <ul>
109 |     <li>`is_union_v<T>` is `false`;</li>
110 |     <li>`is_pointer_v<T>` is `false`;</li>
111 |     <li>`is_member_pointer_v<T>` is `false`;</li>
112 |     <li>`is_volatile_v<T>` is `false`; and</li>
113 |     <li>`T` has no non-static data members of reference type.</li>
114 |   </ul>
115 | </li>
116 | <li>*Returns*:
117 | 
118 |     An object of type `To`. Each bit of the value representation of the result
119 |     is equal to the corresponding bit in the object representation of
120 |     `from`. Padding bits of the `To` object are unspecified. If there is no
121 |     value of type `To` corresponding to the value representation produced, the
122 |     behavior is undefined. If there are multiple such values, which value is
123 |     produced is unspecified.
124 | 
125 | </li>
126 | </ol>
127 | </ins>
128 | 
129 | Feature testing {#test}
130 | ---------------
131 | 
132 | The `__cpp_lib_bit_cast` feature test macro should be added.
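
User code might consume such a macro roughly as follows. This is a sketch under stated assumptions: that the macro is made visible by including `<bit>`, and that a `memcpy` fallback is acceptable where the facility is absent. The helper name `double_bits` is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Include <bit> only where the toolchain provides it (assumption: the macro,
// when present, is defined by that header).
#if defined(__has_include)
#  if __has_include(<bit>)
#    include <bit>
#  endif
#endif

static_assert(sizeof(std::uint64_t) == sizeof(double),
              "sketch assumes a 64-bit double");

// Hypothetical helper: returns the object representation of a double.
inline std::uint64_t double_bits(double d) noexcept {
#if defined(__cpp_lib_bit_cast)
  // The library facility is available: use it directly.
  return std::bit_cast<std::uint64_t>(d);
#else
  // Fallback: the memcpy idiom this paper seeks to replace.
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof(bits));
  return bits;
#endif
}
```

Either branch returns the same bits, so callers are unaffected by which one is compiled.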
133 | 
134 | Appendix {#appendix}
135 | ========
136 | 
137 | The Standard's [**basic.types**] section explicitly blesses `memcpy`:
138 | 
139 | <blockquote>
140 | 
141 |   For any trivially copyable type `T`, if two pointers to `T` point to distinct
142 |   `T` objects `obj1` and `obj2`, where neither `obj1` nor `obj2` is a base-class
143 |   subobject, if the *underlying bytes* (1.7) making up `obj1` are copied into
144 |   `obj2`, `obj2` shall subsequently hold the same value as `obj1`.
145 | 
146 |   [*Example:*
147 | ```
148 |     T* t1p;
149 |     T* t2p;
150 |     // provided that t2p points to an initialized object ...
151 |     std::memcpy(t1p, t2p, sizeof(T));
152 |     // at this point, every subobject of trivially copyable type in *t1p contains
153 |     // the same value as the corresponding subobject in *t2p
154 | ```
155 |   — *end example*]
156 | 
157 | </blockquote>
158 | 
159 | Whereas section [**class.union**] says:
160 | 
161 | <blockquote>
162 | 
163 |   In a union, at most one of the non-static data members can be
164 |   active at any time, that is, the value of at most one of the
165 |   non-static data members can be stored in a union at any time.
166 | 
167 | </blockquote>
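
The contrast between the two quoted passages can be sketched as follows. The types `sample` and `int_or_float` and both function names are hypothetical, chosen only for illustration.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type standing in for T in [basic.types].
struct sample {
  int id;
  double weight;
};
static_assert(std::is_trivially_copyable<sample>::value,
              "memcpy is only blessed for trivially copyable types");

// Copying the underlying bytes of obj1 into a distinct obj2 of the same type
// leaves obj2 holding the same value as obj1.
inline bool copy_preserves_value(const sample& obj1) noexcept {
  sample obj2{};
  std::memcpy(&obj2, &obj1, sizeof(sample));
  return obj2.id == obj1.id && obj2.weight == obj1.weight;
}

// Hypothetical union: at most one member is active at any time.
union int_or_float {
  int i;
  float f;
};

inline int read_active_member() noexcept {
  int_or_float u;
  u.f = 1.0f;  // f is now the active member
  u.i = 42;    // i becomes the active member; f no longer holds a value
  return u.i;  // reading the active member is fine; reading u.f here is not
}
```

The first function relies only on the [**basic.types**] guarantee. The second reads only the active member, which is all that [**class.union**] allows.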
168 | 
169 | 
170 | Revision History {#rev}
171 | ================
172 | 
173 | r1 ➡ r2 {#r1r2}
174 | --------
175 | 
176 | The paper was reviewed by LEWG at the 2017 Toronto meeting and feedback was
177 | provided. In the 2017 Albuquerque meeting LEWG provided feedback regarding usage
178 | of concepts while discussing [[P0802r0]], and EWG reviewed the paper:
179 | 
180 |   * Use "shall not participate in overload resolution" wording instead of a
181 |     requires clause.
182 |   * The author was asked to explore naming. LEWG took a poll in Albuquerque and
183 |     voted to keep `bit_cast`.
184 |   * There was strong sentiment that this facility should be available in
185 |     freestanding implementations. LEWG is changing its guidance regarding
186 |     freestanding header granularity, but until guidance is actually changed it
187 |     was decided that a currently freestanding header should be used. LEWG took a
188 |     poll in Albuquerque, and the new `<bit>` header was chosen instead of
189 |     `<cstddef>`.
190 |   * Call out that `constexpr` requires compiler support.
191 |   * Make `constexpr` conditional, similar to variant's [variant.ctor] wording,
192 |     based on an EWG straw poll in Albuquerque.
193 |   * LWG review made the `constexpr` remark recursive, and tuned the return
194 |     wording, asking CWG to review the changes.
195 |   * LWG review requested that this paper also add the `<bit>` header, and let
196 |     the editor resolve races if multiple papers add the header concurrently.
197 |   * CWG substantially tuned the wording.
198 | 
199 | r0 ➡ r1 {#r0r1}
200 | --------
201 | 
202 | The paper was reviewed by LEWG at the 2016 Issaquah meeting:
203 | 
204 | * Remove the standard layout requirement—trivially copyable suffices for the `memcpy` requirement.
205 | * We discussed removing `constexpr`, but there was no consensus either way. There was some suggestion that it’ll be hard for implementers, but there's also some desire (by the same implementers) to have those features available in order to support things like `constexpr` instances of `std::variant`.
206 | * The pointer-forbidding logic was removed. It was initially there to help developers when a better tool is available, but it's easily worked around (e.g. with a `struct` containing a pointer). Note that this doesn't prevent `constexpr` versions of `bit_cast`: the implementation is allowed to error out on `bit_cast` of pointer.
207 | * Some discussion about concepts-usage, but it seems like mostly an LWG issue and we're reasonably sure that concepts will land before this or in a compatible vehicle.
208 | 
209 | Straw polls:
210 | 
211 | * Do we want to see [[P0476r0]] again? unanimous consent.
212 | * `bit_cast` should allow pointer types in `To` and `From`. **SF F N A SA** 4 5 4 2 1
213 | * `bit_cast` should be `constexpr`? **SF F N A SA** 4 3 7 2 3
214 | 
215 | 
216 | Acknowledgement {#ack}
217 | ===============
218 | 
219 | Thanks to Saam Barati, Jeffrey Yasskin, and Sam Benzaquen for their early review
220 | and suggested improvements.
221 | 


--------------------------------------------------------------------------------
/source/conf.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | #
  3 | # Papers documentation build configuration file, created by
  4 | # sphinx-quickstart on Sun Mar 22 16:26:35 2015.
  5 | #
  6 | # This file is execfile()d with the current directory set to its
  7 | # containing dir.
  8 | #
  9 | # Note that not all possible configuration values are present in this
 10 | # autogenerated file.
 11 | #
 12 | # All configuration values have a default; values that are commented out
 13 | # serve to show the default.
 14 | 
 15 | import sys
 16 | import os
 17 | 
 18 | # If extensions (or modules to document with autodoc) are in another directory,
 19 | # add these directories to sys.path here. If the directory is relative to the
 20 | # documentation root, use os.path.abspath to make it absolute, like shown here.
 21 | #sys.path.insert(0, os.path.abspath('.'))
 22 | 
 23 | # -- General configuration ------------------------------------------------
 24 | 
 25 | # If your documentation needs a minimal Sphinx version, state it here.
 26 | #needs_sphinx = '1.0'
 27 | 
 28 | # Add any Sphinx extension module names here, as strings. They can be
 29 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 30 | # ones.
 31 | extensions = [
 32 |     'sphinx.ext.todo',
 33 | ]
 34 | 
 35 | # Add any paths that contain templates here, relative to this directory.
 36 | templates_path = ['_templates']
 37 | 
 38 | # The suffix of source filenames.
 39 | source_suffix = '.rst'
 40 | 
 41 | # The encoding of source files.
 42 | #source_encoding = 'utf-8-sig'
 43 | 
 44 | # The master toctree document.
 45 | master_doc = 'index'
 46 | 
 47 | # General information about the project.
 48 | project = u'Papers'
 49 | copyright = u'2015, JF Bastien'
 50 | 
 51 | # The version info for the project you're documenting, acts as replacement for
 52 | # |version| and |release|, also used in various other places throughout the
 53 | # built documents.
 54 | #
 55 | # The short X.Y version.
 56 | version = '1.0'
 57 | # The full version, including alpha/beta/rc tags.
 58 | release = '1.0'
 59 | 
 60 | # The language for content autogenerated by Sphinx. Refer to documentation
 61 | # for a list of supported languages.
 62 | #language = None
 63 | 
 64 | # There are two options for replacing |today|: either, you set today to some
 65 | # non-false value, then it is used:
 66 | #today = ''
 67 | # Else, today_fmt is used as the format for a strftime call.
 68 | #today_fmt = '%B %d, %Y'
 69 | 
 70 | # List of patterns, relative to source directory, that match files and
 71 | # directories to ignore when looking for source files.
 72 | exclude_patterns = []
 73 | 
 74 | # The reST default role (used for this markup: `text`) to use for all
 75 | # documents.
 76 | #default_role = None
 77 | 
 78 | # If true, '()' will be appended to :func: etc. cross-reference text.
 79 | #add_function_parentheses = True
 80 | 
 81 | # If true, the current module name will be prepended to all description
 82 | # unit titles (such as .. function::).
 83 | #add_module_names = True
 84 | 
 85 | # If true, sectionauthor and moduleauthor directives will be shown in the
 86 | # output. They are ignored by default.
 87 | #show_authors = False
 88 | 
 89 | # The name of the Pygments (syntax highlighting) style to use.
 90 | pygments_style = 'sphinx'
 91 | 
 92 | # A list of ignored prefixes for module index sorting.
 93 | #modindex_common_prefix = []
 94 | 
 95 | # If true, keep warnings as "system message" paragraphs in the built documents.
 96 | #keep_warnings = False
 97 | 
 98 | 
 99 | # -- Options for HTML output ----------------------------------------------
100 | 
101 | # The theme to use for HTML and HTML Help pages.  See the documentation for
102 | # a list of builtin themes.
103 | html_theme = 'basic'
104 | 
105 | # Theme options are theme-specific and customize the look and feel of a theme
106 | # further.  For a list of options available for each theme, see the
107 | # documentation.
108 | #html_theme_options = {}
109 | 
110 | # Add any paths that contain custom themes here, relative to this directory.
111 | html_theme_path = ['_templates/']
112 | 
113 | # The name for this set of Sphinx documents.  If None, it defaults to
114 | # "<project> v<release> documentation".
115 | html_title = ''
116 | 
117 | # A shorter title for the navigation bar.  Default is the same as html_title.
118 | #html_short_title = None
119 | 
120 | # The name of an image file (relative to this directory) to place at the top
121 | # of the sidebar.
122 | #html_logo = None
123 | 
124 | # The name of an image file (within the static path) to use as favicon of the
125 | # docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
126 | # pixels large.
127 | #html_favicon = None
128 | 
129 | # Add any paths that contain custom static files (such as style sheets) here,
130 | # relative to this directory. They are copied after the builtin static files,
131 | # so a file named "default.css" will overwrite the builtin "default.css".
132 | html_static_path = ['_static']
133 | 
134 | # Add any extra paths that contain custom files (such as robots.txt or
135 | # .htaccess) here, relative to this directory. These files are copied
136 | # directly to the root of the documentation.
137 | #html_extra_path = []
138 | 
139 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
140 | # using the given strftime format.
141 | #html_last_updated_fmt = '%b %d, %Y'
142 | 
143 | # If true, SmartyPants will be used to convert quotes and dashes to
144 | # typographically correct entities.
145 | #html_use_smartypants = True
146 | 
147 | # Custom sidebar templates, maps document names to template names.
148 | #html_sidebars = {}
149 | 
150 | # Additional templates that should be rendered to pages, maps page names to
151 | # template names.
152 | #html_additional_pages = {}
153 | 
154 | # If false, no module index is generated.
155 | #html_domain_indices = True
156 | 
157 | # If false, no index is generated.
158 | #html_use_index = True
159 | 
160 | # If true, the index is split into individual pages for each letter.
161 | #html_split_index = False
162 | 
163 | # If true, links to the reST sources are added to the pages.
164 | #html_show_sourcelink = True
165 | 
166 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
167 | #html_show_sphinx = True
168 | 
169 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
170 | #html_show_copyright = True
171 | 
172 | # If true, an OpenSearch description file will be output, and all pages will
173 | # contain a <link> tag referring to it.  The value of this option must be the
174 | # base URL from which the finished HTML is served.
175 | #html_use_opensearch = ''
176 | 
177 | # This is the file name suffix for HTML files (e.g. ".xhtml").
178 | #html_file_suffix = None
179 | 
180 | # Output file base name for HTML help builder.
181 | htmlhelp_basename = 'Papersdoc'
182 | 
183 | 
184 | # -- Options for LaTeX output ---------------------------------------------
185 | 
186 | latex_elements = {
187 | # The paper size ('letterpaper' or 'a4paper').
188 | #'papersize': 'letterpaper',
189 | 
190 | # The font size ('10pt', '11pt' or '12pt').
191 | #'pointsize': '10pt',
192 | 
193 | # Additional stuff for the LaTeX preamble.
194 | #'preamble': '',
195 | }
196 | 
197 | # Grouping the document tree into LaTeX files. List of tuples
198 | # (source start file, target name, title,
199 | #  author, documentclass [howto, manual, or own class]).
200 | latex_documents = [
201 |   ('index', 'Papers.tex', u'Papers Documentation',
202 |    u'JF Bastien', 'manual'),
203 | ]
204 | 
205 | # The name of an image file (relative to this directory) to place at the top of
206 | # the title page.
207 | #latex_logo = None
208 | 
209 | # For "manual" documents, if this is true, then toplevel headings are parts,
210 | # not chapters.
211 | #latex_use_parts = False
212 | 
213 | # If true, show page references after internal links.
214 | #latex_show_pagerefs = False
215 | 
216 | # If true, show URL addresses after external links.
217 | #latex_show_urls = False
218 | 
219 | # Documents to append as an appendix to all manuals.
220 | #latex_appendices = []
221 | 
222 | # If false, no module index is generated.
223 | #latex_domain_indices = True
224 | 
225 | 
226 | # -- Options for manual page output ---------------------------------------
227 | 
228 | # One entry per manual page. List of tuples
229 | # (source start file, name, description, authors, manual section).
230 | man_pages = [
231 |     ('index', 'papers', u'Papers Documentation',
232 |      [u'JF Bastien'], 1)
233 | ]
234 | 
235 | # If true, show URL addresses after external links.
236 | #man_show_urls = False
237 | 
238 | 
239 | # -- Options for Texinfo output -------------------------------------------
240 | 
241 | # Grouping the document tree into Texinfo files. List of tuples
242 | # (source start file, target name, title, author,
243 | #  dir menu entry, description, category)
244 | texinfo_documents = [
245 |   ('index', 'Papers', u'Papers Documentation',
246 |    u'JF Bastien', 'Papers', 'One line description of project.',
247 |    'Miscellaneous'),
248 | ]
249 | 
250 | # Documents to append as an appendix to all manuals.
251 | #texinfo_appendices = []
252 | 
253 | # If false, no module index is generated.
254 | #texinfo_domain_indices = True
255 | 
256 | # How to display URL addresses: 'footnote', 'no', or 'inline'.
257 | #texinfo_show_urls = 'footnote'
258 | 
259 | # If true, do not generate a @detailmenu in the "Top" node's menu.
260 | #texinfo_no_detailmenu = False
261 | 


--------------------------------------------------------------------------------
/source/p0528r3.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: The Curious Case of Padding Bits, Featuring Atomic Compare-and-Exchange
  3 | Shortname: P0528
  4 | Revision: 3
  5 | Audience: CWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/P0528r3
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P0528r3.bs">github.com/jfbastien/papers/blob/master/source/P0528r3.bs</a>
 10 | Editor: JF Bastien, Apple, jfbastien@apple.com
 11 | Editor: Michael Spencer, Sony Playstation, bigcheesegs@gmail.com
 12 | Abstract: Compare-and-exchange on a struct with padding bits should Just Work.
 13 | Date: 2018-06-07
 14 | Markup Shorthands: markdown yes
 15 | </pre>
 16 | 
 17 | This issue has been discussed by the authors at every recent Standards meeting,
 18 | yet a full solution has been elusive despite helpful proposals. We believe that
 19 | this proposal can fix this oft-encountered problem once and for all.
 20 | 
 21 | [[P0528r0]] details extensive background on this problem (not repeated here),
 22 | and proposed standardizing a trait, `has_padding_bits`, and using it on
 23 | `compare_and_exchange_*`. [[P0528r1]] applied EWG guidance and simply added
 24 | wording directing implementations to ensure that the desired behavior occur. At
 25 | SG1's request this paper follows EWG's guidance but uses different wording.
 26 | 
 27 | 
 28 | Edit History {#edit}
 29 | ============
 30 | 
 31 | r2 → r3 {#r2r3}
 32 | -------
 33 | 
 34 | In Rapperswil, CWG suggested various wording updates to the paper.
 35 | 
 36 | 
 37 | r1 → r2 {#r1r2}
 38 | -------
 39 | 
 40 | In Jacksonville, SG1 supported the paper but suggested an alternate way to
 41 | approach the wording than the one EWG proposed in Albuquerque: don't talk about
 42 | contents of the memory, but rather discuss the value representation to describe
 43 | compare-and-exchange. This paper follows SG1's guidance and offers different
 44 | wording, with the intent that the semantics be equivalent. EWG reviewed the
 45 | updated wording and voted to support it and forward to Core.
 46 | 
 47 | r0 → r1 {#r0r1}
 48 | -------
 49 | 
 50 | In Albuquerque, EWG voted to make the padding bits of `atomic` and the incoming
 51 | value of `T` have a consistent value for the purposes of read/modify/write
 52 | atomic operations.
 53 | 
 54 | Purposefully not addressed in this paper:
 55 | 
 56 |   * `union` with padding bits
 57 |   * Types with trap representations
 58 | 
 59 | Proposed Wording {#word}
 60 | ================
 61 | 
 62 | In Operations on `atomic` types [**atomics.types.operations**], edit ❡17 and
 63 | onwards as follows:
 64 | 
 65 | <blockquote>
 66 | 
 67 | <pre>
 68 | 
 69 | bool compare_exchange_weak(T& expected, T desired,
 70 |                            memory_order success, memory_order failure) volatile noexcept;
 71 | bool compare_exchange_weak(T& expected, T desired,
 72 |                            memory_order success, memory_order failure) noexcept;
 73 | bool compare_exchange_strong(T& expected, T desired,
 74 |                              memory_order success, memory_order failure) volatile noexcept;
 75 | bool compare_exchange_strong(T& expected, T desired,
 76 |                              memory_order success, memory_order failure) noexcept;
 77 | bool compare_exchange_weak(T& expected, T desired,
 78 |                            memory_order order = memory_order::seq_cst) volatile noexcept;
 79 | bool compare_exchange_weak(T& expected, T desired,
 80 |                            memory_order order = memory_order::seq_cst) noexcept;
 81 | bool compare_exchange_strong(T& expected, T desired,
 82 |                              memory_order order = memory_order::seq_cst) volatile noexcept;
 83 | bool compare_exchange_strong(T& expected, T desired,
 84 |                              memory_order order = memory_order::seq_cst) noexcept;
 85 | 
 86 | </pre>
 87 | 
 88 | </blockquote>
 89 | 
 90 | ❡17:
 91 | 
 92 | <blockquote>
 93 | 
 94 | *Requires:* The `failure` argument shall not be `memory_order::release` nor
 95 | `memory_order::acq_rel`.
 96 | 
 97 | </blockquote>
 98 | 
 99 | ❡18:
100 | 
101 | <blockquote>
102 | 
103 | *Effects:* Retrieves the value in `expected`. It then atomically compares
104 | the <del>contents of the memory</del><ins>value representation of the value</ins>
105 | pointed to by `this` for equality with that previously retrieved from
106 | `expected`, and if true, replaces the <del>contents of the memory</del><ins>value</ins>
107 | pointed to by `this` with that in
108 | `desired`. If and only if the comparison is true, memory is affected according
109 | to the value of `success`, and if the comparison is false, memory is affected
110 | according to the value of `failure`. When only one `memory_order` argument is
111 | supplied, the value of `success` is `order`, and the value of `failure` is
112 | `order` except that a value of `memory_order::acq_rel` shall be replaced by the
113 | value `memory_order::acquire` and a value of `memory_order::release` shall be
114 | replaced by the value `memory_order::relaxed`. If and only if the comparison is
115 | false then, after the atomic operation, the <del>contents of the
116 | memory</del><ins>value</ins> in `expected` <del>are</del><ins>is</ins>
117 | replaced by the value<del> read from the memory </del> pointed to
118 | by `this` during the atomic comparison. If the operation returns `true`, these
119 | operations are atomic read-modify-write operations on the memory pointed to by
120 | `this`. Otherwise, these operations are atomic load operations on that memory.
121 | 
122 | </blockquote>
123 | 
124 | ❡19:
125 | 
126 | <blockquote>
127 | 
128 | *Returns:* The result of the comparison.
129 | 
130 | </blockquote>
131 | 
132 | ❡20:
133 | 
134 | <blockquote>
135 | 
136 | [*Note:*
137 | 
138 |   For example, the effect of `compare_exchange_strong` <ins>on objects without padding bits </ins>is
139 |   
140 |   <xmp>
141 |   
142 |     if (memcmp(this, &expected, sizeof(*this)) == 0)
143 |       memcpy(this, &desired, sizeof(*this));
144 |     else
145 |        memcpy(expected, this, sizeof(*this));
146 | 
147 |   </xmp>
148 | 
149 | —*end note*]
150 | 
151 | [*Example:*
152 | 
153 |   The expected use of the compare-and-exchange operations is as follows. The
154 |   compare-and-exchange operations will update `expected` when another iteration
155 |   of the loop is needed.
156 |   
157 |   <xmp>
158 | 
159 |     expected = current.load();
160 |     do {
161 |       desired = function(expected);
162 |     } while (!current.compare_exchange_weak(expected, desired));
163 | 
164 |   </xmp>
165 |   
166 | —*end example*]
167 |   
168 | [*Example:*
169 | 
170 |   Because the expected value is updated only on failure, code releasing the
171 |   memory containing the `expected` value on success will work. E.g. list head
172 |   insertion will act atomically and would not introduce a data race in the
173 |   following code:
174 |   
175 |   <xmp>
176 | 
177 |     do {
178 |       p->next = head; // make new list node point to the current head
179 |     } while (!head.compare_exchange_weak(p->next, p)); // try to insert
180 | 
181 |   </xmp>
182 |   
183 | —*end example*]
184 | 
185 | </blockquote>
186 | 
187 | ❡21:
188 | 
189 | <blockquote>
190 | 
191 | Implementations should ensure that weak compare-and-exchange operations do not
192 | consistently return `false` unless either the atomic object has value different
193 | from `expected` or there are concurrent modifications to the atomic object.
194 | 
195 | </blockquote>
196 | 
197 | ❡22:
198 | 
199 | <blockquote>
200 | 
201 | *Remarks:* A weak compare-and-exchange operation may fail spuriously. That is,
202 | even when the contents of memory referred to by `expected` and `this` are equal,
203 | it may return `false` and store back to `expected` the same memory contents that
204 | were originally there.
205 | 
206 | [*Note:*
207 | 
208 |   This spurious failure enables implementation of compare-and-exchange on a
209 |   broader class of machines, e.g., load-locked store-conditional machines. A
210 |   consequence of spurious failure is that nearly all uses of weak
211 |   compare-and-exchange will be in a loop. When a compare-and-exchange is in a
212 |   loop, the weak version will yield better performance on some platforms. When a
213 |   weak compare-and-exchange would require a loop and a strong one would not, the
214 |   strong one is preferable.
215 | 
216 | —*end note*]
217 | 
218 | </blockquote>
219 | 
220 | ❡23:
221 | 
222 | <blockquote>
223 | 
224 | [*Note:*
225 | 
226 |   <ins>Under cases where the </ins><del>The</del> `memcpy` and `memcmp`
227 |   semantics of the compare-and-exchange operations <ins>apply, the outcome might
228 |   be</ins><del> may result in</del> failed comparisons for values that compare
229 |   equal with `operator==` if the underlying type has <del>padding bits, </del>trap bits<del>,</del> or
230 |   alternate representations of the same value. Notably, on implementations
231 |   conforming to ISO/IEC/IEEE 60559, floating-point `-0.0` and `+0.0` will not
232 |   compare equal with `memcmp` but will compare equal with `operator==`, and NaNs
233 |   with the same payload will compare equal with `memcmp` but will not compare
234 |   equal with `operator==`.
235 | 
236 | —*end note*]
237 | 
238 | <ins>
239 | 
240 | [*Note:*
241 | 
242 |   Because compare-and-exchange acts on an object’s value representation, padding
243 |   bits that never participate in the object’s value representation are ignored.
244 | 
245 |   As a consequence, the following code is guaranteed to avoid spurious failure:
246 | 
247 |   <xmp>
248 | 
249 |   struct padded {
250 |     char clank = 0x42;
251 |     // Padding here.
252 |     unsigned biff = 0xC0DEFEFE;
253 |   };
254 |   atomic<padded> pad = ATOMIC_VAR_INIT({});
255 | 
256 |   bool zap() {
257 |     padded expected, desired { 0, 0 };
258 |     return pad.compare_exchange_strong(expected, desired);
259 |   }
260 | 
261 |   </xmp>
262 | 
263 | —*end note*]
264 | 
265 | [*Note:*
266 | 
267 |   For a union with bits that participate in the value representation of some
268 |   members but not others, compare-and-exchange might always fail. This is because
269 |   such padding bits have an indeterminate value when they do not participate in
270 |   the value representation of the active member.
271 | 
272 |   As a consequence, the following code is not guaranteed to ever succeed:
273 |   
274 |   <xmp>
275 | 
276 |   union pony {
277 |     double celestia = 0.;
278 |     short luna; // padded
279 |   };
280 |   atomic<pony> princesses = ATOMIC_VAR_INIT({});
281 | 
282 |   bool party(pony desired) {
283 |     pony expected;
284 |     return princesses.compare_exchange_strong(expected, desired);
285 |   }
286 | 
287 |   </xmp>
288 | 
289 | —*end note*]
290 | 
291 | </ins>
292 | 
293 | </blockquote>
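
The `memcmp` versus `operator==` distinction drawn in the floating-point note can be checked directly. The following is a minimal sketch and not part of the proposed wording: `same_bytes` and `ieee_cases_hold` are hypothetical names introduced for illustration, and the sketch assumes that `double` conforms to ISO/IEC/IEEE 60559.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// Hypothetical helper: compares object representations, as memcmp would.
template <class T>
bool same_bytes(const T& a, const T& b) {
  return std::memcmp(&a, &b, sizeof(T)) == 0;
}

// Hypothetical check of the two cases named in the note.
// Assumption: double is an ISO/IEC/IEEE 60559 type.
inline bool ieee_cases_hold() {
  static_assert(std::numeric_limits<double>::is_iec559,
                "sketch assumes an ISO/IEC/IEEE 60559 double");
  const double pos_zero = 0.0;
  const double neg_zero = -0.0;
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const double same_nan = nan;  // Copy with the same payload.

  // Signed zeros: equal with operator==, different bytes.
  const bool zeros = (pos_zero == neg_zero) && !same_bytes(pos_zero, neg_zero);
  // NaNs with the same payload: same bytes, unequal with operator==.
  const bool nans = !(nan == same_nan) && same_bytes(nan, same_nan);
  return zeros && nans;
}
```

A compare-and-exchange that follows `memcmp` semantics would therefore treat `-0.0` and `+0.0` as different, and two NaNs with the same payload as equal.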
294 | 
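
On implementations that compare whole object representations, code that wants the value-based behavior described in the padding note has to retry by hand. The following is a minimal sketch of such a workaround, not part of the proposed wording: `Padded`, `same_value` and `replace_if_equal` are hypothetical names, and the sketch assumes that a failed compare-and-exchange writes the full observed representation, padding included, back into `expected`.

```cpp
#include <atomic>
#include <cassert>

// Illustrative type: most ABIs insert padding between tag and value.
struct Padded {
  char tag;
  unsigned value;
};

// Hypothetical member-wise comparison that ignores padding.
inline bool same_value(const Padded& a, const Padded& b) {
  return a.tag == b.tag && a.value == b.value;
}

// Hypothetical helper: replaces the stored value with desired if it equals
// wanted member-wise. Assumption: on failure, expected receives the observed
// object representation, so the retry compares matching padding.
inline bool replace_if_equal(std::atomic<Padded>& target, const Padded& wanted,
                             const Padded& desired) {
  Padded expected = wanted;
  while (!target.compare_exchange_strong(expected, desired)) {
    // expected now holds what was observed in target.
    if (!same_value(expected, wanted))
      return false;  // The stored value really is different.
    // Only padding differed (or another thread stored an equal value): retry.
  }
  return true;
}
```

With the wording proposed in this paper the loop is unnecessary for structs like `Padded`, because a single `compare_exchange_strong` call already ignores padding bits that never participate in the value representation.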


--------------------------------------------------------------------------------
/source/N4522.rst:
--------------------------------------------------------------------------------
  1 | ==============================================
  2 | N4522 ``std::atomic_object_fence(mo, T&&...)``
  3 | ==============================================
  4 | 
  5 | :Author: Olivier Giroux
  6 | :Contact: ogiroux@nvidia.com
  7 | :Author: JF Bastien
  8 | :Contact: jfb@google.com
  9 | :Date: 2015-05-21
 10 | :URL: https://github.com/jfbastien/papers/blob/master/source/N4522.rst
 11 | 
 12 | ---------
 13 | Rationale
 14 | ---------
 15 | 
 16 | Fences allow programmers to express a conservative approximation to the precise
 17 | pair-wise relations of operations required to be ordered in the happens-before
 18 | relation. This is conservative because fences use the sequenced-before relation
 19 | to select vast extents of the program into the happens-before relation.
 20 | 
 21 | This conservatism is commonly desired because it is difficult to reason about
 22 | operations hidden behind layers of abstraction in C++ programs. An unfortunate
 23 | consequence of this is that precise expression of ordering is not possible in
 24 | C++ currently, which makes it easy to over-constrain the order of operations
 25 | internal to synchronization primitives that comprise multiple atomic objects.
 26 | This constrains the ability of implementations (compiler and hardware) to
 27 | reorder, ignore, or assume the absence of operations that are not relevant or
 28 | not visible.
 29 | 
 30 | In existing practice, the ``flush`` primitive of OpenMP is more expressive than
 31 | the fences of C++ in at least this one sense: it can optionally restrict the
 32 | ordering of operations to a developer-specified set of memory locations. This is
 33 | enough to exactly express the required pair-wise ordering for short lock-free
 34 | algorithms. This capability isn't only relevant to OpenMP and would be further
 35 | enhanced if it were integrated with the other facets of the more modern C++
 36 | memory model.
 37 | 
 38 | An example use-case for this capability is a likely implementation strategy for
 39 | N4392_'s ``std::barrier`` object. This algorithm makes ordered modifications on
 40 | the atomic sub-objects of a larger non-atomic synchronization object, but the
 41 | internal modifications need only be ordered with respect to each other, not all
 42 | surrounding objects (they are ordered separately).
 43 | 
 44 | .. _N4392: http://wg21.link/N4392
 45 | 
 46 | In one example implementation, ``std::barrier`` is coded as follows:
 47 | 
 48 | .. code-block:: c++
 49 | 
 50 |   struct barrier {
 51 |       // Some member functions elided.
 52 |       void arrive_and_wait() {
 53 |           int const myepoch = epoch.load(memory_order_relaxed);
 54 |           int const result = arrived.fetch_add(1, memory_order_acq_rel) + 1;
 55 |           if (result == expected) {
 56 |               expected = nexpected.load(memory_order_relaxed);
 57 |               arrived.store(0, memory_order_relaxed);
 58 |               // Only need to order {expected, arrived} -> {epoch}.
 59 |               epoch.store(myepoch + 1, memory_order_release);
 60 |           }
 61 |           else
 62 |               while (epoch.load(memory_order_acquire) == myepoch)
 63 |                   ;
 64 |       }
 65 |   private:
 66 |       int expected;
 67 |       atomic<int> arrived, nexpected, epoch;
 68 |   };
 69 | 
 70 | The release operation on the epoch atomic is likely to require the compiler to
 71 | insert a fence that has an effect that goes beyond the intended constraint,
 72 | which is to order only the operations on the barrier object. Since the barrier
 73 | object is likely to be smaller than a cache line and the library's
 74 | implementation can control its alignment using ``alignas``, it would be
 75 | possible to compile this program without a fence in this location on
 76 | architectures that are cache-line coherent. 
 77 | 
 78 | To concisely express the bound on the set of memory operations whose order is 
 79 | constrained, we propose to accompany ``std::atomic_thread_fence`` with an 
 80 | ``object`` variant which takes a reference to the object(s) to be ordered by 
 81 | the fence.
 82 | 
 83 | -----------------
 84 | Proposed addition
 85 | -----------------
 86 | 
 87 | Under 29.2 Header ``<atomic>`` synopsis [**atomics.syn**]:
 88 | 
 89 | .. code-block:: c++
 90 | 
 91 |   namespace std {
 92 |      // 29.8, fences
 93 |      // ...
 94 |      template<class... T>
 95 |      void atomic_object_fence(memory_order, T&&... objects) noexcept;
 96 |    }
 97 | 
 98 | Under 29.8 Fences [**atomics.fences**], after the current
 99 | ``atomic_thread_fence`` paragraph:
100 | 
101 | ``template<class... T> void atomic_object_fence(memory_order order, T&&... objects) noexcept;``
102 | 
103 | *Effect*: Equivalent to ``atomic_thread_fence(order)`` except that operations on
104 | objects other than those in the variadic template arguments and their
105 | sub-objects are *un-sequenced* with the fence.
106 | 
107 | *Note*: The compiler may omit fences entirely depending on alignment
108 | information, may generate a dynamic test leading to a fence for under-aligned
109 | objects, or may emit the same fence an ``atomic_thread_fence`` would.
110 | 
111 | The ``__cpp_lib_atomic_object_fence`` feature test macro should be added.
112 | 
113 | ----------------------
114 | Example implementation
115 | ----------------------
116 | 
117 | A trivial, yet conforming implementation may implement the new fence in terms of
118 | the existing ``std::atomic_thread_fence`` using the same memory order:
119 | 
120 | .. code-block:: c++
121 | 
122 |      template<class... T>
123 |      void atomic_object_fence(std::memory_order order, T &&...) noexcept {
124 |        std::atomic_thread_fence(order);
125 |      }
126 | 
127 | A more advanced implementation can overload this for the single-object case
128 | on architectures (or micro-architectures) that have cache coherency with a known 
129 | line size, even if it is conservatively approximated:
130 | 
131 | .. code-block:: c++
132 | 
133 |      #define __CACHELINE_SIZE // Secret (micro-)architectural value.
134 |      template <class T>
135 |      std::enable_if_t<std::is_standard_layout<T>::value &&
136 |                       __CACHELINE_SIZE - alignof(T) % __CACHELINE_SIZE >= sizeof(T)>
137 |      atomic_object_fence(std::memory_order, T &&object) noexcept {
138 |        asm volatile("" : "+m"(object) : "m"(object));  // Code motion barrier.
139 |      }
140 | 
141 | To extend this for multiple objects, an implementation for the same architecture may 
142 | emit a run-time check that the total footprint of all the objects fits in the span of 
143 | a single cache line.  This check may commonly be eliminated as dead code, for example
144 | when the objects are references from a common base pointer.
145 | 
146 | The above ``std::barrier`` example's inner-code can use the new overload as follows:
147 | 
148 | .. code-block:: c++
149 | 
150 |           if (result == expected) {
151 |               expected = nexpected.load(memory_order_relaxed);
152 |               arrived.store(0, memory_order_relaxed);
153 |               atomic_object_fence(memory_order_release, *this);
154 |               epoch.store(myepoch + 1, memory_order_relaxed);
155 |           }
156 | 
157 | It is equally valid to list the individual members of ``barrier`` instead of
158 | ``*this``; the two forms are equivalent.
159 | 
160 | Less trivial implementations of ``std::atomic_object_fence`` can enable more 
161 | optimizations for new hardware and portable program representations.
162 | 
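The pieces above can be combined into one compilable unit. The following is a
minimal sketch under stated assumptions, not a reference implementation: the
namespace ``sketch``, the constructor and the ``current_epoch`` accessor are
hypothetical additions made for illustration, and ``atomic_object_fence`` is the
trivial implementation shown earlier, since the proposed function is not part
of any shipping standard library.

.. code-block:: c++

  #include <atomic>
  #include <cassert>

  namespace sketch {  // Hypothetical namespace, used to avoid touching std.

  // Trivial stand-in for the proposed fence: ignores the objects and falls
  // back to a full thread fence with the same memory order.
  template <class... T>
  void atomic_object_fence(std::memory_order order, T&&...) noexcept {
    std::atomic_thread_fence(order);
  }

  class barrier {
  public:
    // Hypothetical constructor: the paper elides how the counts are set.
    explicit barrier(int participants)
        : expected(participants), arrived(0), nexpected(participants), epoch(0) {}

    void arrive_and_wait() {
      int const myepoch = epoch.load(std::memory_order_relaxed);
      int const result = arrived.fetch_add(1, std::memory_order_acq_rel) + 1;
      if (result == expected) {
        expected = nexpected.load(std::memory_order_relaxed);
        arrived.store(0, std::memory_order_relaxed);
        // Orders only the operations on this barrier before the epoch store.
        atomic_object_fence(std::memory_order_release, *this);
        epoch.store(myepoch + 1, std::memory_order_relaxed);
      } else {
        while (epoch.load(std::memory_order_acquire) == myepoch)
          ;  // Spin until the last arriver publishes the next epoch.
      }
    }

    // Hypothetical accessor, added only so the sketch can be observed.
    int current_epoch() const { return epoch.load(std::memory_order_acquire); }

  private:
    int expected;
    std::atomic<int> arrived, nexpected, epoch;
  };

  }  // namespace sketch

With a single participant, each call to ``arrive_and_wait`` takes the
last-arriver path, so the sketch can be exercised without spawning threads.
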
163 | -----------------
164 | Relation to N4523
165 | -----------------
166 | 
167 | In N4523_ we propose to formalize the notions of false-sharing and true-sharing
168 | as perceived by the implementation in relation to the placement of objects in
169 | memory. In the expository implementation of the previous section we also showed
170 | how a cache-line coherent architecture or micro-architecture can elide fences
171 | that only bisect relations between objects that are in the same cache line, if
172 | provable at compile-time. These notions interact in a virtuous way because
173 | N4523's abstraction enables reasoning about likely cache behavior that
174 | implementations can optimize for.
175 | 
176 | .. _N4523: http://wg21.link/N4523
177 | 
178 | The example application of ``std::atomic_object_fence`` to the ``std::barrier``
179 | object is improved by combining these notions as follows:
180 | 
181 | .. code-block:: c++
182 | 
183 |   alignas(std::thread::hardware_true_sharing_size) // N4523
184 |   struct barrier {
185 |       // Some member functions elided.
186 |       void arrive_and_wait() {
187 |           int const myepoch = epoch.load(memory_order_relaxed);
188 |           int const result = arrived.fetch_add(1, memory_order_acq_rel) + 1;
189 |           if (result == expected) {
190 |               expected = nexpected.load(memory_order_relaxed);
191 |               arrived.store(0, memory_order_relaxed);
192 |               atomic_object_fence(memory_order_release, *this); // N4522
193 |               epoch.store(myepoch + 1, memory_order_relaxed);
194 |           }
195 |           else
196 |               while (epoch.load(memory_order_acquire) == myepoch)
197 |                   ;
198 |       }
199 |   private:
200 |       int expected;
201 |       atomic<int> arrived, nexpected, epoch;
202 |   };
203 | 
204 | By aligning the barrier object to the true-sharing granularity, it is
205 | significantly more likely that the implementation will be able to elide the
206 | fence if the architecture or micro-architecture has cache-line coherency. Of
207 | course, an implementation of the Standard is free to ensure this by other means;
208 | we provide this example as exposition for what developer programs might do.
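
The alignment half of this example can be tried with standard C++17, which
provides ``std::hardware_constructive_interference_size`` in ``<new>`` in place
of the ``std::thread::hardware_true_sharing_size`` name used above. The sketch
below is illustrative only: ``true_sharing_size``, ``aligned_barrier`` and
``within_one_line`` are hypothetical names, and the 64-byte fallback is an
assumption for library implementations that do not define the constant.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t true_sharing_size =
    std::hardware_constructive_interference_size;
#else
// Assumption: 64 bytes when the library does not provide the constant.
inline constexpr std::size_t true_sharing_size = 64;
#endif

// Same data members as the barrier above, aligned to the sharing granularity.
struct alignas(true_sharing_size) aligned_barrier {
    int expected;
    std::atomic<int> arrived, nexpected, epoch;
};

static_assert(alignof(aligned_barrier) == true_sharing_size,
              "object starts on a sharing-granularity boundary");
static_assert(sizeof(aligned_barrier) <= true_sharing_size,
              "all members fit inside one sharing granule");

// Returns true when [p, p + size) does not straddle a granule boundary.
inline bool within_one_line(const void* p, std::size_t size) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    return addr / true_sharing_size == (addr + size - 1) / true_sharing_size;
}
```

The two ``static_assert`` checks are the compile-time facts an implementation
could rely on to prove that every member of the object shares a cache line.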
209 | 
210 | --------------------
211 | Memory model example
212 | --------------------
213 | 
214 | =========================== ===========================
215 | T0                          T1
216 | =========================== ===========================
217 | ``0: w = 1;``               ``4: while(!a.load(rlx));``
218 | ``1: x = 1;``               ``5: objfence(acq, a, x);``
219 | ``2: objfence(rel, a, x);`` ``6: assert(x);``
220 | ``3: a.store(1,rlx);``      ``7: assert(w);``
221 | =========================== ===========================
222 | 
223 | The semantics of fences mean that:
224 | 
225 | ``2`` synchronizes-with ``5`` because [**29.8¶2**]:
226 |   A. ``2`` is sequenced-before ``3``,
227 |   B. ``3`` inter-thread happens-before ``4``, and
228 |   C. ``4`` is sequenced-before ``5``.
229 | 
230 | ``1`` happens-before ``6`` because [**1.10¶13-14**]:
231 |   A. ``1`` is sequenced-before ``2``,
232 |   B. ``2`` synchronizes-with ``5``, and
233 |   C. ``5`` is sequenced-before ``6``.
234 | 
235 | Therefore the program is well-defined (so far) and the ``assert(x)`` of ``6``
236 | does not fire.
237 | 
238 | However, the *un-sequenced* semantics of the object fence also mean that:
239 | 
240 | ``0`` conflicts with ``7`` because [**1.10¶23**]:
241 |   A. ``0`` is a store to ``w``, ``7`` is a load of ``w`` and they are not both
242 |      atomic, and
243 |   B. ``0`` is not sequenced-before ``2`` and ``5`` is not sequenced-before
244 |      ``7``.
245 | 
246 | Therefore the ``assert(w)`` of ``7`` makes the program undefined due to a
247 | data-race.
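
For contrast, the same two threads can be written against the existing
``std::atomic_thread_fence``, which is not the object fence proposed here. A
thread fence orders every access sequenced before it, so in this variant the
reads corresponding to ``6`` and ``7`` are both race-free. ``litmus_result`` and
``run_litmus`` are hypothetical names used only for this sketch.

```cpp
#include <atomic>
#include <thread>

struct litmus_result {
    int x;
    int w;
};

// Runs T0 and T1 from the table once, using thread fences instead of
// object fences, and returns the values T1 observed for x and w.
inline litmus_result run_litmus() {
    int w = 0, x = 0;
    std::atomic<int> a{0};
    litmus_result seen{0, 0};

    std::thread t0([&] {
        w = 1;                                                // 0
        x = 1;                                                // 1
        std::atomic_thread_fence(std::memory_order_release);  // 2
        a.store(1, std::memory_order_relaxed);                // 3
    });
    std::thread t1([&] {
        while (!a.load(std::memory_order_relaxed)) {}         // 4
        std::atomic_thread_fence(std::memory_order_acquire);  // 5
        seen.x = x;                                           // 6
        seen.w = w;                                           // 7
    });

    t0.join();
    t1.join();
    return seen;
}
```

Replacing the two thread fences with object fences over ``a`` and ``x`` is what
removes the ordering on ``w`` and produces the data race described above.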
248 | 
249 | 


--------------------------------------------------------------------------------
/source/P1018r5.bs:
--------------------------------------------------------------------------------
  1 | <pre class='metadata'>
  2 | Title: Language Evolution status after Belfast 2019
  3 | Shortname: P1018
  4 | Revision: 5
  5 | Audience: WG21, EWG
  6 | Status: P
  7 | Group: WG21
  8 | URL: http://wg21.link/P1018r5
  9 | !Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P1018r5.bs">github.com/jfbastien/papers/blob/master/source/P1018r5.bs</a>
 10 | Editor: JF Bastien, Apple, jfbastien@apple.com
 11 | Date: 2020-01-04
 12 | Markup Shorthands: markdown yes
 13 | Toggle Diffs: no
 14 | No abstract: false
 15 | Abstract: This paper is a collection of items that language Evolution has worked on in the latest C++ meeting, their status, and plans for the future.
 16 | </pre>
 17 | 
 18 | Executive summary {#summary}
 19 | =================
 20 | 
 21 | Most time was spent in ballot resolution for C++20, to address National Body comments in [[N4844]].
 22 | 
 23 | 
 24 | Work highlights {#high}
 25 | ===============
 26 | 
 27 | Language Evolution received roughly 100 National Body comments. We did at least one round of discussion on all of these comments.
 28 | 
 29 | * Concepts: allow requires clauses on non-template friend functions of class templates.
 30 | * Coroutines: most comments rejected, a few sent away to write a paper.
 31 | * Undefined Behavior: deferred addressing all comments to C++23.
 32 | * Feature test macros: comments were addressed.
 33 | * Modules: many comments, including fixing issues around header units.
 34 | * Changed how non-type template parameters work: allow types with all public members, all of which can themselves be used as NTTPs. This allows array members, reference members, pointers and references to subobjects, floating-point, and unions.
 35 | * Began discussing some papers targeted at C++23.
 36 | 
 37 | 
 38 | National Body comment details {#nb-details}
 39 | =============================
 40 | 
 41 | Miscellaneous NB comments:
 42 | 
 43 | <ul>
 44 | <li>Late-CH01 — <a href="https://github.com/cplusplus/nbballot/issues/375">11.3.5 [class.copy.assgn] p2,5 Defaulted copy and move assignment should have ref-qualifier</a> rejected
 45 | <li>FR222 — <a href="https://github.com/cplusplus/nbballot/issues/219">20.15.10 Replace std::is_constant_evaluated with "if consteval" P1938</a> rejected
 46 | <li>US129 — <a href="https://github.com/cplusplus/nbballot/issues/128">15.1 Rename __has_cpp_attribute to __has_attribute</a> rejected
 47 | <li>US056 — <a href="https://github.com/cplusplus/nbballot/issues/55">09.03 [dcl.init] Revert P0960 (parenthesized initialization of aggregates)</a> accepted
 48 | <li>US055 — <a href="https://github.com/cplusplus/nbballot/issues/54">09.02.3.5 [dcl.fct].18 Parameter with placeholder-type-specifier and default argument is valid but useless</a> rejected
 49 | <li>GB051 — <a href="https://github.com/cplusplus/nbballot/issues/50">08.05.4 Range-based for-loop should use ranges::begin/end</a> rejected
 50 | <li>US040 — <a href="https://github.com/cplusplus/nbballot/issues/39">06.06.2 [intro.object] Adopt implicit object creation P0593</a> accepted
 51 | <li>RU011 — <a href="https://github.com/cplusplus/nbballot/issues/11">[dcl.fct.def.general].8 Make <code>__func__</code> usable in constant expressions</a> rejected
 52 | <li>RU007 — <a href="https://github.com/cplusplus/nbballot/issues/7">[basic.life].8.3 Relax pointer value / aliasing rules</a> accepted
 53 | <li>US212 — <a href="https://github.com/cplusplus/nbballot/issues/209">20.07.3.1 [variant.ctor] Surprising variant construction LWG3228</a> forwarded by CWG, accepted as proposed by CWG
 54 | </ul>
 55 | 
 56 | using enum:
 57 | 
 58 | <ul>
 59 | <li>US043 — <a href="https://github.com/cplusplus/nbballot/issues/42">06.08 [basic.def].2.17 "using enum" feature is too subtle</a> rejected
 60 | <li>US070 — <a href="https://github.com/cplusplus/nbballot/issues/69">09.06.2 [enum.udecl] Keep the "using enum" language feature in C++20</a> not relevant given above rejection
 61 | </ul>
 62 | 
 63 | Non-type template parameters:
 64 | 
 65 | <ul>
 66 | <li>US114 — <a href="https://github.com/cplusplus/nbballot/issues/113">13.05 p1.5 Class types as non-type template arguments</a> accepted
 67 | <li>US102 — <a href="https://github.com/cplusplus/nbballot/issues/101">13.1 p4.1 Allow non-type template parameters of floating-point type P1714</a> with US114
 68 | <li>US092 — <a href="https://github.com/cplusplus/nbballot/issues/91">11.10.01 [class.compare.default] p04.2.1 Array members should have strong structural equality</a> with US114
 69 | <li>US091 — <a href="https://github.com/cplusplus/nbballot/issues/90">11.10.01 p04.1 Strong structural equality for enums</a> rejected
 70 | </ul>
 71 | 
 72 | Concepts:
 73 | 
 74 | <ul>
 75 | <li>CA378 — <a href="https://github.com/cplusplus/nbballot/issues/374">Remove constrained non-template functions</a> accepted
 76 | <li>US115 — <a href="https://github.com/cplusplus/nbballot/issues/114">13.6.4 [temp.friend] Hidden non-template friends need a requires-clause</a> accepted
 77 | <li>US111 — <a href="https://github.com/cplusplus/nbballot/issues/110">13.04.3 p1 Constraint normalization should also normalize negation</a> accepted
 78 | <li>PL103 — <a href="https://github.com/cplusplus/nbballot/issues/102">13.01 [temp.param] Elaborate syntax for constrained type template parameters</a> rejected
 79 | <li>US098 — <a href="https://github.com/cplusplus/nbballot/issues/97">13 p6 `Concept<X>` as type-constraint vs. id-expression</a> rejected
 80 | <li>US058 — <a href="https://github.com/cplusplus/nbballot/issues/57">09.04.1 Validity of bodies of non-templated functions with unsatisfied constraints</a> accepted CWG with CA378 accepted
 81 | <li>GB046 — <a href="https://github.com/cplusplus/nbballot/issues/45">07.05.4 Allow caching of evaluations of concept specializations</a> will see in Prague
 82 | <li>RU012 — <a href="https://github.com/cplusplus/nbballot/issues/12">[expr.prim.req].6 Incomplete types in requires-expressions should be ill-formed</a> rejected
 83 | </ul>
 84 | 
 85 | Coroutines:
 86 | 
 87 | <ul>
 88 | <li>US370 — <a href="https://github.com/cplusplus/nbballot/issues/366">Remove coroutines, revert P0912</a> rejected
 89 | <li>BG049 — <a href="https://github.com/cplusplus/nbballot/issues/48">7.06.2.3 p3.7 Remove await_suspend() that returns void/bool</a> rejected
 90 | <li>BG369 — <a href="https://github.com/cplusplus/nbballot/issues/365">07.06.2.3 p6 Do not use await_suspend returning void in example</a> rejected
 91 | <li>FR068 — <a href="https://github.com/cplusplus/nbballot/issues/67">09.05.4 Rename promise_type::final_suspend</a> rejected
 92 | <li>FR067 — <a href="https://github.com/cplusplus/nbballot/issues/66">09.05.4 Reduce number of coroutines customization points P1477</a> rejected
 93 | <li>US062 — <a href="https://github.com/cplusplus/nbballot/issues/61">09.04.4 [dcl.fct.def.coroutine].5 Make unhandled_exception in promise types optional</a> will see in Prague
 94 | <li>FR066 — <a href="https://github.com/cplusplus/nbballot/issues/65">09.05.4 Make unhandled_exception in promise types optional</a> with US062
 95 | <li>US061 — <a href="https://github.com/cplusplus/nbballot/issues/60">09.04.4 p10 Coroutine allocation should consider std::align_val_t</a> will see in Prague
 96 | <li>BG059 — <a href="https://github.com/cplusplus/nbballot/issues/58">09.04.4 Consistent naming for get_return_object, initial_suspend, final_suspend</a> rejected
 97 | <li>BG060 — <a href="https://github.com/cplusplus/nbballot/issues/59">09.04.4 Fix names of Promise functions</a> rejected
 98 | <li>BG048 — <a href="https://github.com/cplusplus/nbballot/issues/47">07.06.2.3 Rename await_suspend and await_resume</a> rejected
 99 | <li>US071 — <a href="https://github.com/cplusplus/nbballot/issues/70">09.09.4 [dcl.fct.def.coroutine] p1 Rename coroutine keywords away from co_ P1485</a> rejected
100 | <li>FR179 — <a href="https://github.com/cplusplus/nbballot/issues/177">7.12.03.2 Remove coroutine_handle::from_/address</a> rejected
101 | <li>FR001 — <a href="https://github.com/cplusplus/nbballot/issues/1">Allow separation of coroutine_handle and suspend points</a> rejected
102 | <li>BG180 — <a href="https://github.com/cplusplus/nbballot/issues/178">17.12.05 Fix example using await_suspend</a> rejected
103 | </ul>
104 | 
105 | Undefined Behavior:
106 | 
107 | <ul>
108 | <li>US368, US149, US148, US145, US144, US143, US142, US141, US131, US130, US027, US024 — Undefined Behavior in the preprocessor rejected
109 | <li>RU008 — <a href="https://github.com/cplusplus/nbballot/issues/8">[class.cdtor].2 Clarify value of *this during construction</a> rejected
110 | </ul>
111 | 
112 | Modules: seen by SG2, mostly accepted their recommendations.
113 | 
114 | Unicode:
115 | 
116 | <ul>
117 | <li>NL029 — <a href="https://github.com/cplusplus/nbballot/issues/28">05.10 [tab:lex.name.allowed] Disallow zero-width and control characters</a> rejected
118 | </ul>
119 | 
120 | Feature test macros:
121 | 
122 | All addressed by [[P1902r1]].
123 | 
124 | <ul>
125 | <li>US150 — <a href="https://github.com/cplusplus/nbballot/issues/149">15.10 [cpp.predefined] Add feature-test macro for "familiar template syntax for generic lambdas" P1902</a> accepted
126 | <li>GB147 — <a href="https://github.com/cplusplus/nbballot/issues/146">15.10 Add a feature-test macro for "consteval" P1902</a> accepted
127 | <li>GB146 — <a href="https://github.com/cplusplus/nbballot/issues/145">15.10 Add a feature-test macro for concepts P1902</a> accepted
128 | </ul>
129 | 
130 | 
131 | C++23 discussions {#cpp32}
132 | =================
133 | 
134 | We started discussing a few papers which could make it to C++23.
135 | 
136 | * Floating-point types from [[P1467r2]] and [[P1468r2]] received strong support.
137 | * [[P1105r1]] freestanding: there's ongoing interest in better supporting freestanding targets, and we gave direction to the author.
138 | * [[P1371r1]] Pattern matching: moving along, but the authors need help with implementation / usage experience if we want this to make C++23.
139 | * [[P1040r4]] `std::embed`: was seen by this group and others, and received confusing feedback, though many people agree there's something useful to be had here.
140 | * [[P1219r2]] Homogeneous variadic function parameters: did not receive sufficient support to move forward.
141 | * [[P1097r2]] Named character escapes: received feedback, will see again.
142 | * [[P1895r0]] tag_invoke: A general pattern for supporting customisable functions: the general feeling was that there were some concerns with a library-only solution to the problem. Several interested parties are planning on working with the paper authors to try to come up with a language feature instead.
143 | * [[P1676r0]] C++ Exception Optimizations. An experiment: informative discussion.
144 | * [[P1365r0]] Using Coroutine TS with zero dynamic allocations: informative discussion.
145 | * [[P1046r1]] Automatically Generate More Operators: received feedback, fairly positive.
146 | * [[P1908r0]] Reserving Attribute Names for Future Use: accepted, sent to CWG.
147 | * [[P0876r9]] `fiber_context` - fibers without scheduler: targets a TS. Gave feedback, will see again.
148 | * [[P1061r1]] Structured Bindings can introduce a Pack: approve of general direction.
149 | * [[P1839r1]] Accessing Object Representations: approve of general direction.
150 | 
151 | 
152 | Near-future EWG plans {#future}
153 | =====================
154 | 
155 | There will still be some ballot resolution work in Prague, to address comments which we discussed but haven't resolved in Belfast. There will be no further ballot resolution after Prague.
156 | 
157 | Ballot resolution will likely take a small portion of our time. Once that is done, Language Evolution will switch into full C++23 mode, likely following the plans outlined in [[P0592r3]]. These plans were discussed in multiple groups and received strong support.
158 | 


--------------------------------------------------------------------------------