16 |
17 | The purpose of this document is to record the status of issues which have come before the Evolution Working Group (EWG) of the INCITS PL22.16 and ISO WG21 C++ Standards Committee. Issues represent potential defects in the C++ Standard. Issues against Core Language, Library, and Library Evolution are tracked separately.
18 |
19 | EWG issues were previously tracked by [[N4539]].
20 |
21 | This document contains:
22 |
23 | * Evolution issues which are actively being considered by the Evolution Working Group, i.e., issues which have a status of New, Open, Ready, or Review.
24 | * Evolution issues which have been closed since the document was last updated.
25 |
26 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # C++ Standard Committee Papers
2 |
3 | Build status: [Travis CI](https://travis-ci.org/jfbastien/papers)
4 |
5 | Official C++ Standard Committee papers are available from [the C++ mailings][].
6 |
7 | More information on the C++ Standard Committee is available on
8 | [the Committee site][].
9 |
10 | I've written a few of these papers and co-authored a few others.
11 | I initially wrote them using reStructuredText, but have now moved to
12 | [bikeshed](https://github.com/tabatkins/bikeshed). Papers in this repository are
13 | final and published when numbered `N` or `P`, and are drafts when numbered
14 | `D`. This is an ISO thing: I can't revise already-published `N` or `P`
15 | papers. The paper revision (the `R` part in `P` numbered papers) has to be
16 | incremented, and a new paper published.
17 |
18 | New paper numbers are obtained through the Committee's Vice-Chair. The Committee's
19 | website details [how to submit proposals][].
20 |
21 | [the Committee site]: https://isocpp.org/std/the-committee
22 | [the C++ mailings]: http://open-std.org/jtc1/sc22/wg21/docs/papers/
23 | [how to submit proposals]: https://isocpp.org/std/submit-a-proposal
24 |
--------------------------------------------------------------------------------
/source/DCanadian.bs:
--------------------------------------------------------------------------------
1 |
2 | Title: Canadian friends are not friends
3 | Shortname: D????
4 | Revision: 0
5 | Audience: EWG
6 | Status: D
7 | Group: WG21
8 | URL: http://wg21.link/P????
9 | !Source: github.com/jfbastien/papers/blob/master/source/DCanadian.bs
10 | Editor: JF Bastien, Woven by Toyota, cxx@jfbastien.com
11 | Editor: Bruno Cardoso Lopes, Meta, bruno.cardoso@gmail.com
12 | Editor: Michael Spencer, Apple, bigcheesegs@gmail.com
13 | Date: 2023-06-13
14 | Markup Shorthands: markdown yes
15 | Toggle Diffs: no
16 | No abstract: true
17 |
18 |
19 | This paper addresses [[CWG1699]].
20 |
21 | ```
22 | import Canadian; // Contains `export class Canadian { class buddy {}; friend struct friendly; };`
23 |
24 | class c {
25 | class n {};
26 | friend struct friendly;
27 | };
28 |
29 | void g() { // #2
30 | // 'n' accessible here?
31 | }
32 |
33 | struct friendly {
34 | friend class c::n; // #1
35 | friend void g(); // #2
36 | friend void h(); // #3
37 | friend void f() { c::n(); } // #4 (EDG/MSVC Reject, Clang/GCC Accept)
38 | friend class Canadian::buddy;
39 | friend void ohCanada() { // #5
40 | // Canadian::buddy accessible here?
41 | }
42 | };
43 |
44 | void h() { // #3
45 | // 'n' accessible here?
46 | }
47 | ```
48 |
--------------------------------------------------------------------------------
/source/P0528r0.cc:
--------------------------------------------------------------------------------
1 | #include <atomic>
2 | #include <cstdio>
3 | #include <cstring>
4 | #include <new>
5 | #include <type_traits>
6 |
7 | struct Padded {
8 | char c = 0xFF;
9 | // Padding here.
10 | int i = 0xFEEDFACE;
11 | Padded() = default;
12 | };
13 | typedef std::atomic<Padded> Atomic;
14 | typedef std::aligned_storage<sizeof(Atomic), alignof(Atomic)>::type Storage;
15 |
16 | void peek(const char* what, void *into) {
17 | printf("%16s %08x %08x\n", what, *(int*)into, *(1 + (int*)into));
18 | }
19 |
20 | Storage* create() {
21 | auto* storage = new Storage();
22 | std::memset(storage, 0xBA, sizeof(Storage));
23 | asm volatile("":::"memory");
24 | peek("storage", storage);
25 | return storage;
26 | }
27 |
28 | Atomic* change(Storage* storage) {
29 | // As if we used an allocator which reuses memory.
30 | auto* atomic = new(storage) Atomic;
31 | peek("atomic placed", atomic);
32 | std::atomic_init(atomic, Padded()); // Which bits go in?
33 | peek("atomic init", atomic);
34 | return atomic;
35 | }
36 |
37 | Padded infloop_maybe(Atomic* atomic) {
38 | Padded desired; // Padding unknown.
39 | Padded expected; // Could be different.
40 | peek("desired before", &desired);
41 | peek("expected before", &expected);
42 | peek("atomic before", atomic);
43 | while (
44 | !atomic->compare_exchange_strong(
45 | expected,
46 | desired // Padding bits added and removed here ˙ ͜ʟ˙
47 | ));
48 | peek("expected after", &expected);
49 | peek("atomic after", atomic);
50 | return expected; // Maybe changed here as well.
51 | }
52 |
53 | int main() {
54 | auto* storage = create();
55 | auto* atomic = change(storage);
56 | Padded p = infloop_maybe(atomic);
57 | peek("main", &p);
58 | return 0;
59 | }
60 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | # Makefile for Sphinx documentation
2 | #
3 |
4 | # You can set these variables from the command line.
5 | SPHINXOPTS =
6 | SPHINXBUILD = sphinx-build
7 | PAPER =
8 | BUILDDIR = build
9 |
10 | # User-friendly check for sphinx-build
11 | ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
12 | $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
13 | endif
14 |
15 | # Internal variables.
16 | PAPEROPT_a4 = -D latex_paper_size=a4
17 | PAPEROPT_letter = -D latex_paper_size=letter
18 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source
19 |
20 | .PHONY: help clean html linkcheck deploy
21 |
22 | help:
23 | @echo "Please use \`make <target>' where <target> is one of"
24 | @echo " html to make standalone HTML files"
25 | @echo " linkcheck to check all external links for integrity"
26 | @echo " deploy to deploy to github.io"
27 |
28 | clean:
29 | rm -rf $(BUILDDIR)/*
30 |
31 | html:
32 | echo "Building sphinx sources"
33 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
34 | bikeshed update
35 | find ./source/ -name "*.bs" -type f | xargs -I{} -t -n1 bikeshed spec {}
36 | mv ./source/*.html $(BUILDDIR)/html/
37 | @echo
38 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
39 |
40 | linkcheck:
41 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
42 | @echo
43 | @echo "Link check complete; look for any errors in the above output " \
44 | "or in $(BUILDDIR)/linkcheck/output.txt."
45 |
46 | deploy: clean html linkcheck
47 | ./deploy.sh
48 |
--------------------------------------------------------------------------------
/source/D1501R0.bs:
--------------------------------------------------------------------------------
1 |
19 |
20 | We’ve gathered input from a variety of folks involved in audio at Apple, and
21 | here is our joint, considered position regarding the `std::audio` proposal in
22 | [[P1386R0]].
23 |
24 | Audio is important to the Apple ecosystem. The type system and determinism of
25 | C++ lend themselves well to the audio software domain. In the proposal, we like the
26 | formalization of data types and algorithms that are common in the audio domain.
27 | However, we are concerned about the audio device interfaces, and about requiring C++
28 | systems to have a specific implementation.
29 |
30 | Creating a good interface between software and audio hardware is something that
31 | on the surface seems straightforward, but on a practical system is challenging
32 | to implement correctly. This area has typically been fairly platform-specific or
33 | handled by specialist libraries, and may not be immediately amenable to
34 | standardization. We think it’s best not to standardize audio hardware I/O.
35 |
36 | Instead of attempting to standardize the interface and mechanism of audio
37 | hardware, providing a common representation of audio data could be an area of
38 | exploration that is suited to the language.
39 |
--------------------------------------------------------------------------------
/source/_templates/layout.html:
--------------------------------------------------------------------------------
1 | {#
2 | Single-page template.
3 | #}
4 | {%- block doctype -%}
5 |
6 | {%- endblock %}
7 | {%- set titlesuffix = "" %}
8 |
9 |
10 |
11 |
12 |
13 |
14 |
55 | {%- block htmltitle %}
56 | {{ title|striptags|e }}{{ titlesuffix }}
57 | {%- endblock %}
58 | {%- block extrahead %} {% endblock %}
59 |
60 |
61 | {%- block header %}{% endblock %}
62 | {%- block content %}
63 | {%- block document %}
64 |
2 | Title: Shedding the bikeshed: C++ papers should focus on content, not style
3 | Shortname: D0???
4 | Revision: 1
5 | Audience: all
6 | Status: D
7 | Group: WG21
8 | URL: http://wg21.link/p0????
9 | !Source: github.com/jfbastien/papers/blob/master/source/bikeshed.bs
10 | Editor: JF Bastien, cxx@jfbastien.com
11 | Abstract: Writing a C++ standards committee paper can be as easy as riding a bicycle 🚲
12 | Date: 2016-08-09
13 | Markup Shorthands: markdown yes
14 | Toggle Diffs: yes
15 |
16 |
17 | Coloring the shed {#colour}
18 | =================
19 |
20 | Thoughtful standards people put significant effort into writing their
21 | papers. Often, too much of that effort goes into style or
22 | format instead of content. This meta-paper is ironically all
23 | style and no C++ content. It proposes that you stop formatting and start using
24 | bikeshed.
25 |
26 | While we're at it, we'll also propose that you use a public version control
27 | service such as GitHub to make it easier for
28 | reviewers to see how a paper evolved, both while in draft state as well as from
29 | one revision to another. Final papers are meant to be consumed as-is, but your
30 | paper collaborators, editors, or future-self will thank you when performing
31 | archaeology to untangle the inevitable nonsensical part of your final paper.
32 |
33 | To do {#todo}
34 | =====
35 |
36 | https://github.com/tabatkins/bikeshed/blob/master/docs/quick-start.md
37 |
38 | 1. Basics
39 | - What does the final paper look like?
40 | - What does the source look like? (see section 4.)
41 | - Who uses it?
42 | - Takes care of the boilerplate
43 | 2. Convenience
44 | - Webpages work everywhere
45 | - Readable offline, no downloads
46 | - Unicode Just Works™ (even the EDG wiki now supports it)
47 | 3. Good practice
48 | - GitHub for diffs: easier to track changes
49 | - GitHub integration: auto-generation, etc.
50 | 4. markdown + HTML escape hatch
51 | - https://github.com/tabatkins/bikeshed/blob/master/docs/markup.md
52 | - Railroad diagrams
53 | - Code and syntax highlighting
54 | - Toggle diff
55 | 5. Link to other papers
56 | 6. Getting started
57 | - Installing https://github.com/tabatkins/bikeshed/blob/master/docs/install.md
58 |
--------------------------------------------------------------------------------
/source/P1205R0.bs:
--------------------------------------------------------------------------------
1 |
16 |
17 | Issues {#issues}
18 | ======
19 |
20 | The C++ Coroutine TS [[N4736]] has issues 31 and 32 listed in [[P0664R5]]:
21 |
22 | > **31.** Add a note warning about thread switching near await and/or `coroutine_handle` wording.
23 | >
24 | > Add a note warning about thread switching near await and/or `coroutine_handle` wording
25 | >
26 | > **32.** Add a normative text making it UB to migrate coroutines between certain kind of execution agents.
27 | >
28 | > Add a normative text making it UB to migrate coroutines between certain kind of execution agents. Clarify that migrating between `std::thread`s is OK. But migrating between CPU and GPU is UB.
29 |
30 | Discussion {#discuss}
31 | ==========
32 |
33 | Using `co_await`, one can teleport a suspended execution between execution agents:
34 |
35 |
36 | thread::id get_an_id() {
37 |
38 | // here: acquire a lock, read thread_local
39 |
40 | co_yield std::this_thread::get_id(); //< one result
41 |
42 | // UB: release the lock, reuse the same thread_local
43 |
44 | co_return std::this_thread::get_id(); //< different result
45 | }
46 |
47 |
48 | We say "teleport" here because the code that relocates the coroutine is outside
49 | the coroutine, in a possibly unrelated part of the program. This teleportation
50 | can take your coroutine to many interesting places, for example:
51 |
52 | 1. the thread that runs `main`
53 | 2. threads from `std::thread` / `std::async`
54 | 3. element access functions of algorithms run with `std::execution::par`, `std::execution::par_unseq`, or `std::execution::unseq`
55 | 4. global / `thread_local` constructors (see note)
56 | 5. global / `thread_local` / `static` destructors (see note)
57 | 6. functions registered with `std::atexit` / `std::at_quick_exit`
58 | 7. signal handlers
59 | 8. future `fibers_context` of [[P0876R3]]
60 |
61 | Note that it is presently implementation-defined whether many of these functions
62 | run in a specific thread, a single thread, or in many unspecified threads—see
63 | [[CWG2046]].
64 |
65 | Proposed Resolution {#resolution}
66 | ===================
67 |
68 | After [[N4736]] [**dcl.fct.def.coroutine**] ❡6:
69 |
70 |
71 |
72 | A suspended coroutine can be resumed to continue execution by invoking a
73 | resumption member function of an object of type `coroutine_handle<P>`
74 | associated with this instance of the coroutine. The function that invoked a
75 | resumption member function is called *resumer*. Invoking a resumption member
76 | function for a coroutine that is not suspended results in undefined behavior.
77 |
78 |
79 |
80 | Add ❡7:
81 |
82 |
83 |
84 |
85 | Resuming a coroutine on an execution agent other than the one it was suspended
86 | on has implementation-defined behavior unless both are instances of
87 | `std::thread`. [*Note*: a coroutine that is moved this way should avoid the use
88 | of `thread_local` or `mutex` objects. — *End note*.]
89 |
90 |
91 |
92 |
--------------------------------------------------------------------------------
/source/N4509.cc:
--------------------------------------------------------------------------------
1 | #include <atomic>
2 | #include <iostream>
3 |
4 | namespace std {
5 |
6 | namespace detail {
7 | // It is implementation-defined what this returns, as long as:
8 | //
9 | // if (std::atomic<T>::is_always_lock_free)
10 | // assert(std::atomic<T>().is_lock_free());
11 | //
12 | // An implementation may therefore have more variable template
13 | // specializations than the ones shown below.
14 | template <typename T> static constexpr bool is_always_lock_free = false;
15 |
16 | // Implementations must match the C ATOMIC_*_LOCK_FREE macro values.
17 | template<> constexpr bool is_always_lock_free<bool> = 2 == ATOMIC_BOOL_LOCK_FREE;
18 | template<> constexpr bool is_always_lock_free<char> = 2 == ATOMIC_CHAR_LOCK_FREE;
19 | template<> constexpr bool is_always_lock_free<signed char> = 2 == ATOMIC_CHAR_LOCK_FREE;
20 | template<> constexpr bool is_always_lock_free<unsigned char> = 2 == ATOMIC_CHAR_LOCK_FREE;
21 | template<> constexpr bool is_always_lock_free<char16_t> = 2 == ATOMIC_CHAR16_T_LOCK_FREE;
22 | template<> constexpr bool is_always_lock_free<char32_t> = 2 == ATOMIC_CHAR32_T_LOCK_FREE;
23 | template<> constexpr bool is_always_lock_free<wchar_t> = 2 == ATOMIC_WCHAR_T_LOCK_FREE;
24 | template<> constexpr bool is_always_lock_free<short> = 2 == ATOMIC_SHORT_LOCK_FREE;
25 | template<> constexpr bool is_always_lock_free<unsigned short> = 2 == ATOMIC_SHORT_LOCK_FREE;
26 | template<> constexpr bool is_always_lock_free<int> = 2 == ATOMIC_INT_LOCK_FREE;
27 | template<> constexpr bool is_always_lock_free<unsigned int> = 2 == ATOMIC_INT_LOCK_FREE;
28 | template<> constexpr bool is_always_lock_free<long> = 2 == ATOMIC_LONG_LOCK_FREE;
29 | template<> constexpr bool is_always_lock_free<unsigned long> = 2 == ATOMIC_LONG_LOCK_FREE;
30 | template<> constexpr bool is_always_lock_free<long long> = 2 == ATOMIC_LLONG_LOCK_FREE;
31 | template<> constexpr bool is_always_lock_free<unsigned long long> = 2 == ATOMIC_LLONG_LOCK_FREE;
32 | template <typename T> constexpr bool is_always_lock_free<T*> = 2 == ATOMIC_POINTER_LOCK_FREE;
33 | template<> constexpr bool is_always_lock_free<std::nullptr_t> = 2 == ATOMIC_POINTER_LOCK_FREE;
34 |
35 | // The macros do not support float, double, long double, but C++ does
36 | // support atomics of these types. An implementation shall ensure that these
37 | // types, as well as user-defined types, guarantee the above invariant that
38 | // is_always_lock_free implies is_lock_free for the same type.
39 | }
40 |
41 | template <typename T>
42 | struct atomic_n4509 {
43 | // ...
44 | static constexpr bool is_always_lock_free = detail::is_always_lock_free<T>;
45 | // ...
46 | };
47 |
48 | }
49 |
50 | template <typename T> using atomic = std::atomic_n4509<T>;
51 |
52 | int main() {
53 | std::cout <<
54 | "bool\t" << atomic<bool>::is_always_lock_free << '\n' <<
55 | "char\t" << atomic<char>::is_always_lock_free << '\n' <<
56 | "signed char\t" << atomic<signed char>::is_always_lock_free << '\n' <<
57 | "unsigned char\t" << atomic<unsigned char>::is_always_lock_free << '\n' <<
58 | "char16_t\t" << atomic<char16_t>::is_always_lock_free << '\n' <<
59 | "char32_t\t" << atomic<char32_t>::is_always_lock_free << '\n' <<
60 | "wchar_t\t" << atomic<wchar_t>::is_always_lock_free << '\n' <<
61 | "short\t" << atomic<short>::is_always_lock_free << '\n' <<
62 | "unsigned short\t" << atomic<unsigned short>::is_always_lock_free << '\n' <<
63 | "int\t" << atomic<int>::is_always_lock_free << '\n' <<
64 | "unsigned int\t" << atomic<unsigned int>::is_always_lock_free << '\n' <<
65 | "long\t" << atomic<long>::is_always_lock_free << '\n' <<
66 | "unsigned long\t" << atomic<unsigned long>::is_always_lock_free << '\n' <<
67 | "long long\t" << atomic<long long>::is_always_lock_free << '\n' <<
68 | "unsigned long long\t" << atomic<unsigned long long>::is_always_lock_free << '\n' <<
69 | "void*\t" << atomic<void*>::is_always_lock_free << '\n' <<
70 | "std::nullptr_t\t" << atomic<std::nullptr_t>::is_always_lock_free << '\n';
71 |
72 | return 0;
73 | }
74 |
--------------------------------------------------------------------------------
/source/P0152.cc:
--------------------------------------------------------------------------------
1 | #include <atomic>
2 | #include <iostream>
3 |
4 | namespace std {
5 |
6 | namespace detail {
7 | // It is implementation-defined what this returns, as long as:
8 | //
9 | // if (std::atomic<T>::is_always_lock_free)
10 | // assert(std::atomic<T>().is_lock_free());
11 | //
12 | // An implementation may therefore have more variable template
13 | // specializations than the ones shown below.
14 | template <typename T> static constexpr bool is_always_lock_free = false;
15 |
16 | // Implementations must match the C ATOMIC_*_LOCK_FREE macro values.
17 | template<> constexpr bool is_always_lock_free<bool> = 2 == ATOMIC_BOOL_LOCK_FREE;
18 | template<> constexpr bool is_always_lock_free<char> = 2 == ATOMIC_CHAR_LOCK_FREE;
19 | template<> constexpr bool is_always_lock_free<signed char> = 2 == ATOMIC_CHAR_LOCK_FREE;
20 | template<> constexpr bool is_always_lock_free<unsigned char> = 2 == ATOMIC_CHAR_LOCK_FREE;
21 | template<> constexpr bool is_always_lock_free<char16_t> = 2 == ATOMIC_CHAR16_T_LOCK_FREE;
22 | template<> constexpr bool is_always_lock_free<char32_t> = 2 == ATOMIC_CHAR32_T_LOCK_FREE;
23 | template<> constexpr bool is_always_lock_free<wchar_t> = 2 == ATOMIC_WCHAR_T_LOCK_FREE;
24 | template<> constexpr bool is_always_lock_free<short> = 2 == ATOMIC_SHORT_LOCK_FREE;
25 | template<> constexpr bool is_always_lock_free<unsigned short> = 2 == ATOMIC_SHORT_LOCK_FREE;
26 | template<> constexpr bool is_always_lock_free<int> = 2 == ATOMIC_INT_LOCK_FREE;
27 | template<> constexpr bool is_always_lock_free<unsigned int> = 2 == ATOMIC_INT_LOCK_FREE;
28 | template<> constexpr bool is_always_lock_free<long> = 2 == ATOMIC_LONG_LOCK_FREE;
29 | template<> constexpr bool is_always_lock_free<unsigned long> = 2 == ATOMIC_LONG_LOCK_FREE;
30 | template<> constexpr bool is_always_lock_free<long long> = 2 == ATOMIC_LLONG_LOCK_FREE;
31 | template<> constexpr bool is_always_lock_free<unsigned long long> = 2 == ATOMIC_LLONG_LOCK_FREE;
32 | template <typename T> constexpr bool is_always_lock_free<T*> = 2 == ATOMIC_POINTER_LOCK_FREE;
33 | template<> constexpr bool is_always_lock_free<std::nullptr_t> = 2 == ATOMIC_POINTER_LOCK_FREE;
34 |
35 | // The macros do not support float, double, long double, but C++ does
36 | // support atomics of these types. An implementation shall ensure that these
37 | // types, as well as user-defined types, guarantee the above invariant that
38 | // is_always_lock_free implies is_lock_free for the same type.
39 | }
40 |
41 | template <typename T>
42 | struct atomic_n4509 {
43 | // ...
44 | static constexpr bool is_always_lock_free = detail::is_always_lock_free<T>;
45 | // ...
46 | };
47 |
48 | }
49 |
50 | template <typename T> using atomic = std::atomic_n4509<T>;
51 |
52 | int main() {
53 | std::cout <<
54 | "bool\t" << atomic<bool>::is_always_lock_free << '\n' <<
55 | "char\t" << atomic<char>::is_always_lock_free << '\n' <<
56 | "signed char\t" << atomic<signed char>::is_always_lock_free << '\n' <<
57 | "unsigned char\t" << atomic<unsigned char>::is_always_lock_free << '\n' <<
58 | "char16_t\t" << atomic<char16_t>::is_always_lock_free << '\n' <<
59 | "char32_t\t" << atomic<char32_t>::is_always_lock_free << '\n' <<
60 | "wchar_t\t" << atomic<wchar_t>::is_always_lock_free << '\n' <<
61 | "short\t" << atomic<short>::is_always_lock_free << '\n' <<
62 | "unsigned short\t" << atomic<unsigned short>::is_always_lock_free << '\n' <<
63 | "int\t" << atomic<int>::is_always_lock_free << '\n' <<
64 | "unsigned int\t" << atomic<unsigned int>::is_always_lock_free << '\n' <<
65 | "long\t" << atomic<long>::is_always_lock_free << '\n' <<
66 | "unsigned long\t" << atomic<unsigned long>::is_always_lock_free << '\n' <<
67 | "long long\t" << atomic<long long>::is_always_lock_free << '\n' <<
68 | "unsigned long long\t" << atomic<unsigned long long>::is_always_lock_free << '\n' <<
69 | "void*\t" << atomic<void*>::is_always_lock_free << '\n' <<
70 | "std::nullptr_t\t" << atomic<std::nullptr_t>::is_always_lock_free << '\n';
71 |
72 | return 0;
73 | }
74 |
--------------------------------------------------------------------------------
/source/N4509.rst:
--------------------------------------------------------------------------------
1 | ==================================================
2 | N4509 ``constexpr atomic<T>::is_always_lock_free``
3 | ==================================================
4 |
5 | :Author: Olivier Giroux
6 | :Contact: ogiroux@nvidia.com
7 | :Author: JF Bastien
8 | :Contact: jfb@google.com
9 | :Author: Jeff Snyder
10 | :Contact: jeff-isocpp@caffeinated.me.uk
11 | :Date: 2015-05-05
12 | :URL: https://github.com/jfbastien/papers/blob/master/source/N4509.rst
13 | :Source: https://github.com/jfbastien/papers/blob/master/source/N4509.cc
14 |
15 | The current design for ``std::atomic`` affords implementations the critical
16 | freedom to revert to critical sections when hardware support for atomic
17 | operations does not meet the size or semantic requirements for the associated
18 | type ``T``. This:
19 |
20 | * Preserves C++ support on aging hardware.
21 | * Supports developers who don't target a specific architecture e.g. with the
22 | ``-march=xxx`` flag.
23 | * Improves the portability of abstract representations for C++ programs,
24 | e.g. when compiling C++ code to execute portably within a web browser.
25 |
26 | The Standard also ensures that developers can be informed of the
27 | implementation's lock-freedom guarantees, by using the ``is_lock_free()`` member
28 | and free-functions. This is important because programmers may want to select
29 | algorithm implementations, or even select algorithms, based on this
30 | knowledge. Developers are equally likely to do so for correctness and
31 | performance reasons.
32 |
33 | **The software design shipped in C++11 and C++14 is, however, somewhat sandbagged.**
34 |
35 | There is poor support for static determination of lock-freedom guarantees.
36 |
37 | At the present time the Standard has limited support in this domain: the
38 | ``ATOMIC_..._LOCK_FREE`` macros, which expand to ``2``, ``1`` or ``0`` if the
39 | corresponding atomic type is *always* lock-free, sometimes lock-free or never
40 | lock-free, respectively. These macros are little more than a consolation prize
41 | because they do not work with an arbitrary type ``T`` (as the C++ native
42 | ``std::atomic`` library intends) and they leave adaptation for generic
43 | programming entirely up to the developer.
44 |
45 | This leads to the present, counter-intuitive state of the art whereby
46 | non-traditional uses of C++ have better support than high-performance
47 | computing. We aim to make the smallest possible change that improves the
48 | situation for HPC while leaving all other uses untouched.
49 |
50 | We propose a ``static constexpr`` complement of ``is_lock_free()`` that is
51 | suitable for use with SFINAE and ``static_assert``.
52 |
53 | -----------------
54 | Proposed addition
55 | -----------------
56 |
57 | Under 29.5 Atomic types [**atomics.types.generic**]:
58 |
59 | .. code-block:: c++
60 |
61 | namespace std {
62 | template <class T> struct atomic {
63 | static constexpr bool is_always_lock_free = /* implementation-defined */;
64 | // Omitting all other members for brevity.
65 | };
66 | template <> struct atomic<integral> {
67 | static constexpr bool is_always_lock_free = /* implementation-defined */;
68 | // Omitting all other members for brevity.
69 | };
70 | template <class T> struct atomic<T*> {
71 | static constexpr bool is_always_lock_free = /* implementation-defined */;
72 | // Omitting all other members for brevity.
73 | };
74 | }
75 |
76 | After paragraph 2:
77 |
78 | The ``static`` data member ``is_always_lock_free`` is true if the atomic type's
79 | operations are always lock-free, and false otherwise. The value of
80 | ``is_always_lock_free`` shall be consistent with the value of the corresponding
81 | ``ATOMIC_..._LOCK_FREE`` macro, if defined.
82 |
83 | Under 29.6.5 Requirements for operations on atomic types
84 | [**atomics.types.operations.req**], in paragraph 7:
85 |
86 | The return value of the ``is_lock_free`` member function shall be consistent
87 | with the value of ``is_always_lock_free`` for the same type.
88 |
89 | [*Example:* the following should never fail,
90 |
91 | .. code-block:: c++
92 |
93 | if (atomic<T>::is_always_lock_free)
94 | assert(atomic<T>().is_lock_free());
95 |
96 | — *end example*]
97 |
98 | The ``__cpp_lib_atomic_is_always_lock_free`` feature test macro should be added.
99 |
100 | -------------------
101 | Additional material
102 | -------------------
103 |
104 | We did not provide the ``atomic_is_always_lock_free`` C-style free functions
105 | (which the ``is_lock_free`` functions have) because these require a
106 | pointer. This makes the free functions significantly less useful as compile-time
107 | ``constexpr``.
108 |
109 | We show a sample implementation:
110 |
111 | .. literalinclude:: N4509.cc
112 | :language: c++
113 | :lines: 4-48
114 |
--------------------------------------------------------------------------------
/source/P0908r0.bs:
--------------------------------------------------------------------------------
1 |
2 | Title: Offsetof for Pointers to Members
3 | Shortname: P0908
4 | Revision: 0
5 | Audience: EWG
6 | Status: P
7 | Group: WG21
8 | Editor: Eddie Kohler, Harvard, kohler@seas.harvard.edu
9 | URL: https://wg21.link/P0908r0
10 | Abstract: The offsetof macro should support pointers to members.
11 | Markup Shorthands: markdown yes
12 |
13 |
14 | The `offsetof` macro, inherited from C and applicable to standard-layout
15 | classes (and, conditionally, other classes) in C++, calculates the layout
16 | offset of a member within a class. `offsetof` is useful for calculating an
17 | object pointer given a pointer to one of its members:
18 |
19 |
20 |
21 | struct link {
22 | ...
23 | };
24 |
25 | struct container {
26 | link l;
27 | };
28 |
29 | container* container_from_link(link* x) {
30 | // x is known to be the .l part of some container
31 | uintptr_t x_address = reinterpret_cast<uintptr_t>(x);
32 | size_t l_offset = offsetof(container, l);
33 | return reinterpret_cast<container*>(x_address - l_offset);
34 | }
35 |
36 |
37 |
38 | This pattern is used in several implementations of intrusive containers, such
39 | as Linux kernel linked lists (`struct list_head`).
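As a minimal illustration of that pattern, the sketch below threads a hypothetical `intrusive_link` through a hypothetical standard-layout `task_item`, then walks the list using only the links and recovers each enclosing object with `offsetof`. The names are illustrative and not taken from the Linux kernel; like the code above, the sketch relies on the implementation-defined round trip between pointers and `uintptr_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical intrusive node, in the spirit of the kernel's struct list_head.
struct intrusive_link {
  intrusive_link* next = nullptr;
};

// Hypothetical element type that embeds its link as a member.
struct task_item {
  int id;
  intrusive_link l;
};
static_assert(std::is_standard_layout<task_item>::value,
              "offsetof is only guaranteed for standard-layout classes");

// Recover the enclosing task_item from a pointer to its `l` member.
task_item* item_from_link(intrusive_link* x) {
  std::uintptr_t x_address = reinterpret_cast<std::uintptr_t>(x);
  std::size_t l_offset = offsetof(task_item, l);
  return reinterpret_cast<task_item*>(x_address - l_offset);
}

// Walk the list through the links only, summing the ids of their owners.
int sum_ids(intrusive_link* head) {
  int total = 0;
  for (intrusive_link* p = head; p != nullptr; p = p->next)
    total += item_from_link(p)->id;
  return total;
}

// Usage: three items chained through their embedded links.
int intrusive_list_demo() {
  task_item a{1, {}}, b{2, {}}, c{3, {}};
  a.l.next = &b.l;
  b.l.next = &c.l;
  assert(item_from_link(&b.l) == &b);
  return sum_ids(&a.l);  // 1 + 2 + 3
}
```

Because `item_from_link` hard-codes `task_item` and `l`, a separate helper is needed for every container and member pair, which is exactly the duplication a pointer-to-member form of `offsetof` would remove.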
40 |
41 | Unfortunately, although `offsetof` works for some unusual
42 | member-designators, it does not work for pointers to members. This won’t
43 | compile:
44 |
45 |
46 |
47 | template <typename Container, typename Link, Link Container::* member>
48 | Container* generic_container_from_link(Link* x) {
49 | uintptr_t x_address = reinterpret_cast<uintptr_t>(x);
50 | size_t link_offset = offsetof(Container, member); // error!
51 | return reinterpret_cast<Container*>(x_address - link_offset);
52 | }
53 |
54 |
55 |
56 | Programmers currently compute pointer-to-member offsets using `nullptr` casts
57 | (i.e., the incorrect folk implementation of `offsetof`, which invokes
58 | undefined behavior), or by jumping through other hoops:
59 |
60 |
61 |
62 | template <typename Container, typename Link, Link Container::* member>
63 | Container* generic_container_from_link(Link* x) {
64 | ...
65 | alignas(Container) char container_space[sizeof(Container)] = {};
66 | Container* fake_container = reinterpret_cast<Container*>(container_space);
67 | size_t link_offset = reinterpret_cast<char*>(&(fake_container->*member))
68 | - reinterpret_cast<char*>(fake_container);
69 | ...
70 | }
71 |
72 |
73 |
74 | `offsetof` with pointer-to-member member-designators should simply work.
75 | Modern compilers implement `offsetof` using an extension (`__builtin_offsetof`
76 | in GCC and Clang), so supporting pointers to members need not require library changes. To avoid
77 | ambiguity, we propose this syntax:
78 |
79 |
80 |
81 | size_t link_offset = offsetof(Container, .*member);
82 |
83 |
84 |
85 |
86 | Questions {#qq}
87 | =========
88 |
89 | Must a pointer-to-member expression in an `offsetof` member-designator be a
90 | constant expression (such as a template argument)? The C standard requires
91 | that “the expression `&(t.member-designator)` evaluates to an address
92 | constant,” which might make this code illegal:
93 |
94 |
95 |
96 | struct container {
97 | char array[200];
98 | };
99 |
100 | int index = /* dynamic value */;
101 | size_t offset = offsetof(container, array[index]); // questionable
102 |
103 |
104 |
105 | But since several current compilers accept dynamic array indexes, the proposed
106 | wording allows any pointer to member.
107 |
108 |
109 | Proposed Wording {#word}
110 | ================
111 |
112 | In Sizes, alignments, and offsets [**support.types.layout**], modify the first
113 | sentence of ❡1 as follows:
114 |
115 |
116 |
117 | The macro `offsetof(type, member-designator)` has the same semantics as the
118 | corresponding macro in the C standard library header `<stddef.h>`, but accepts
119 | a restricted set of `type` arguments and a superset of
120 | `member-designator` arguments in this International Standard.
121 |
122 |
123 |
124 | Add this paragraph after ❡1:
125 |
126 |
127 |
128 | An `offsetof` `member-designator` may contain pointer-to-member
129 | expressions as well as `member-designators` acceptable in C. A
130 | `member-designator` may begin with a prefix `.` or `.*` operator (e.g.,
131 | `offsetof(type, .member_name)` or `offsetof(type, .*pointer_to_member)`). If
132 | the prefix operator is omitted, `.` is assumed.
133 |
134 |
135 |
136 |
137 | Example online discussions of the issue {#disc}
138 | =======================================
139 |
140 | * [LLVMdev] Evaluation of offsetof() macro
141 | * Working around offsetof limitations in C++
142 |
--------------------------------------------------------------------------------
/source/P0152R1.rst:
--------------------------------------------------------------------------------
1 | ====================================================
2 | P0152R1 ``constexpr atomic<T>::is_always_lock_free``
3 | ====================================================
4 |
5 | :Author: Olivier Giroux
6 | :Contact: ogiroux@nvidia.com
7 | :Author: JF Bastien
8 | :Contact: jfb@google.com
9 | :Author: Jeff Snyder
10 | :Contact: jeff-isocpp@caffeinated.me.uk
11 | :Date: 2016-03-02
12 | :Previous: http://wg21.link/N4509
13 | :Previous: http://wg21.link/P0152R0
14 | :URL: https://github.com/jfbastien/papers/blob/master/source/P0152R1.rst
15 | :Source: https://github.com/jfbastien/papers/blob/master/source/P0152.cc
16 |
17 | The current design for ``std::atomic`` affords implementations the critical
18 | freedom to revert to critical sections when hardware support for atomic
19 | operations does not meet the size or semantic requirements for the associated
20 | type ``T``. This:
21 |
22 | * Preserves C++ support on aging hardware.
23 | * Supports developers who don't target a specific architecture (e.g. with the
24 | ``-march=xxx`` flag).
25 | * Improves the portability of abstract representations for C++ programs,
26 | e.g. when compiling C++ code to execute portably within a web browser.
27 |
28 | The Standard also ensures that developers can be informed of the
29 | implementation's lock-freedom guarantees by using the ``is_lock_free()`` member
30 | and free functions. This is important because programmers may want to select
31 | algorithm implementations, or even select algorithms, based on this
32 | knowledge. Developers are equally likely to do so for correctness and
33 | performance reasons.
34 |
35 | **The software design shipped in C++11 and C++14 is, however, somewhat sandbagged.**
36 |
37 | There is poor support for static determination of lock-freedom guarantees.
38 |
39 | At the present time the Standard has limited support in this domain: the
40 | ``ATOMIC_..._LOCK_FREE`` macros, which expand to ``2``, ``1`` or ``0`` if the
41 | corresponding atomic type is *always* lock-free, sometimes lock-free or never
42 | lock-free, respectively. These macros are little more than a consolation prize
43 | because they do not work with an arbitrary type ``T`` (as the C++ native
44 | ``std::atomic`` library intends) and they leave adaptation for generic
45 | programming entirely up to the developer.
46 |
47 | This leads to the present, counter-intuitive state of the art whereby
48 | non-traditional uses of C++ have better support than high-performance
49 | computing. We aim to make the smallest possible change that improves the
50 | situation for HPC while leaving all other uses untouched.
51 |
52 | We propose a ``static constexpr`` complement of ``is_lock_free()`` that is
53 | suitable for use with SFINAE and ``static_assert``.
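This sketch is for illustration only and is not proposed wording. It uses the new member in both ways the paragraph above mentions. SFINAE picks a lock-free or mutex-based implementation, and ``static_assert`` states a hard requirement. The names ``counter`` and ``require_always_lock_free`` are hypothetical. Compiling it assumes a C++17 standard library, where ``std::atomic<T>::is_always_lock_free`` was ultimately adopted.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <type_traits>

// Hypothetical counter: the primary template is the mutex-based fallback.
template <class T, class Enable = void>
struct counter {
  static constexpr bool lock_free = false;
  void add(T delta) {
    std::lock_guard<std::mutex> guard(mutex_);
    value_ += delta;
  }
  T load() {
    std::lock_guard<std::mutex> guard(mutex_);
    return value_;
  }
 private:
  std::mutex mutex_;
  T value_{};
};

// Partial specialization selected via SFINAE when std::atomic<T> is
// lock-free for every object of that type.
template <class T>
struct counter<T, typename std::enable_if<std::atomic<T>::is_always_lock_free>::type> {
  static constexpr bool lock_free = true;
  void add(T delta) { value_.fetch_add(delta); }
  T load() const { return value_.load(); }
 private:
  std::atomic<T> value_{T{}};
};

// A hard requirement: fails to compile only if instantiated with a T whose
// atomic operations are not always lock-free.
template <class T>
void require_always_lock_free() {
  static_assert(std::atomic<T>::is_always_lock_free,
                "this code path needs lock-free atomics");
}
```

Either way, the decision is made at compile time without creating an atomic object.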
54 |
55 | -----------------
56 | Proposed addition
57 | -----------------
58 |
59 | Under 29.5 Atomic types [**atomics.types.generic**]:
60 |
61 | .. code-block:: c++
62 |
63 | namespace std {
64 | template <class T> struct atomic {
65 | static constexpr bool is_always_lock_free = implementation-defined;
66 | // Omitting all other members for brevity.
67 | };
68 | template <> struct atomic<integral> {
69 | static constexpr bool is_always_lock_free = implementation-defined;
70 | // Omitting all other members for brevity.
71 | };
72 | template <class T> struct atomic<T*> {
73 | static constexpr bool is_always_lock_free = implementation-defined;
74 | // Omitting all other members for brevity.
75 | };
76 | }
77 |
78 | Under 29.6.5 Requirements for operations on atomic types
79 | [**atomics.types.operations.req**], between paragraphs 6 and 7:
80 |
81 | .. code-block:: c++
82 |
83 | static constexpr bool is_always_lock_free = implementation-defined;
84 |
85 | The ``static`` data member ``is_always_lock_free`` is true if the atomic type's
86 | operations are always lock-free, and false otherwise.
87 |
88 | [*Note:* The value of ``is_always_lock_free`` is consistent with the value of
89 | the corresponding ``ATOMIC_..._LOCK_FREE`` macro, if defined. — *end note*]
90 |
91 | Under 29.6.5 Requirements for operations on atomic types
92 | [**atomics.types.operations.req**], in paragraph 7:
93 |
94 | [*Note:* The return value of the ``is_lock_free`` member function is consistent
95 | with the value of ``is_always_lock_free`` for the same type. — *end note*]
96 |
97 | The ``__cpp_lib_atomic_is_always_lock_free`` feature test macro should be added.
98 |
99 | -------------------
100 | Additional material
101 | -------------------
102 |
103 | We did not provide ``atomic_is_always_lock_free`` C-style free functions
104 | (analogous to the existing ``atomic_is_lock_free`` free functions) because these
105 | would require a pointer to an atomic object. This makes such free functions
106 | significantly less useful as compile-time ``constexpr`` queries.
107 |
108 | We show a sample implementation:
109 |
110 | .. literalinclude:: P0152.cc
111 | :language: c++
112 | :lines: 4-48
113 |
--------------------------------------------------------------------------------
/source/P0152R0.rst:
--------------------------------------------------------------------------------
1 | =======================================================
2 | P0152R0 ``constexpr atomic<T>::is_always_lock_free``
3 | =======================================================
4 |
5 | :Author: Olivier Giroux
6 | :Contact: ogiroux@nvidia.com
7 | :Author: JF Bastien
8 | :Contact: jfb@google.com
9 | :Author: Jeff Snyder
10 | :Contact: jeff-isocpp@caffeinated.me.uk
11 | :Date: 2015-10-21
12 | :Previous: http://wg21.link/N4509
13 | :URL: https://github.com/jfbastien/papers/blob/master/source/P0152R0.rst
14 | :Source: https://github.com/jfbastien/papers/blob/master/source/P0152.cc
15 |
16 | The current design for ``std::atomic`` affords implementations the critical
17 | freedom to revert to critical sections when hardware support for atomic
18 | operations does not meet the size or semantic requirements for the associated
19 | type ``T``. This:
20 |
21 | * Preserves C++ support on aging hardware.
22 | * Supports developers who don't target a specific architecture (e.g. with the
23 | ``-march=xxx`` flag).
24 | * Improves the portability of abstract representations for C++ programs,
25 | e.g. when compiling C++ code to execute portably within a web browser.
26 |
27 | The Standard also ensures that developers can be informed of the
28 | implementation's lock-freedom guarantees by using the ``is_lock_free()`` member
29 | and free functions. This is important because programmers may want to select
30 | algorithm implementations, or even select algorithms, based on this
31 | knowledge. Developers are equally likely to do so for correctness and
32 | performance reasons.
33 |
34 | **The software design shipped in C++11 and C++14 is, however, somewhat sandbagged.**
35 |
36 | There is poor support for static determination of lock-freedom guarantees.
37 |
38 | At the present time the Standard has limited support in this domain: the
39 | ``ATOMIC_..._LOCK_FREE`` macros, which expand to ``2``, ``1`` or ``0`` if the
40 | corresponding atomic type is *always* lock-free, sometimes lock-free or never
41 | lock-free, respectively. These macros are little more than a consolation prize
42 | because they do not work with an arbitrary type ``T`` (as the C++ native
43 | ``std::atomic`` library intends) and they leave adaptation for generic
44 | programming entirely up to the developer.
45 |
46 | This leads to the present, counter-intuitive state of the art whereby
47 | non-traditional uses of C++ have better support than high-performance
48 | computing. We aim to make the smallest possible change that improves the
49 | situation for HPC while leaving all other uses untouched.
50 |
51 | We propose a ``static constexpr`` complement of ``is_lock_free()`` that is
52 | suitable for use with SFINAE and ``static_assert``.
53 |
54 | -----------------
55 | Proposed addition
56 | -----------------
57 |
58 | Under 29.5 Atomic types [**atomics.types.generic**]:
59 |
60 | .. code-block:: c++
61 |
62 | namespace std {
63 | template <class T> struct atomic {
64 | static constexpr bool is_always_lock_free = implementation-defined;
65 | // Omitting all other members for brevity.
66 | };
67 | template <> struct atomic<integral> {
68 | static constexpr bool is_always_lock_free = implementation-defined;
69 | // Omitting all other members for brevity.
70 | };
71 | template <class T> struct atomic<T*> {
72 | static constexpr bool is_always_lock_free = implementation-defined;
73 | // Omitting all other members for brevity.
74 | };
75 | }
76 |
77 | Under 29.6.5 Requirements for operations on atomic types
78 | [**atomics.types.operations.req**], between paragraphs 6 and 7:
79 |
80 | .. code-block:: c++
81 |
82 | static constexpr bool is_always_lock_free = implementation-defined;
83 |
84 | The ``static`` data member ``is_always_lock_free`` is true if the atomic type's
85 | operations are always lock-free, and false otherwise. The value of
86 | ``is_always_lock_free`` shall be consistent with the value of the corresponding
87 | ``ATOMIC_..._LOCK_FREE`` macro, if defined.
88 |
89 | Under 29.6.5 Requirements for operations on atomic types
90 | [**atomics.types.operations.req**], in paragraph 7:
91 |
92 | The return value of the ``is_lock_free`` member function shall be consistent
93 | with the value of ``is_always_lock_free`` for the same type.
94 |
95 | [*Example:* The following should never fail
96 |
97 | .. code-block:: c++
98 |
99 | if (atomic<T>::is_always_lock_free)
100 | assert(atomic<T>().is_lock_free());
101 |
102 | — *end example*]
103 |
104 | The ``__cpp_lib_atomic_is_always_lock_free`` feature test macro should be added.
105 |
106 | -------------------
107 | Additional material
108 | -------------------
109 |
110 | We did not provide ``atomic_is_always_lock_free`` C-style free functions
111 | (analogous to the existing ``atomic_is_lock_free`` free functions) because these
112 | would require a pointer to an atomic object. This makes such free functions
113 | significantly less useful as compile-time ``constexpr`` queries.
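The existing free function ``std::atomic_is_lock_free`` takes a pointer to an object and is not ``constexpr``. A static data member, by contrast, can appear in a constant expression. This is a minimal sketch, assuming a C++17 library. The names ``runtime_query``, ``always_lock_free_long`` and ``scratch`` are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// The free function needs an existing object and is evaluated at run time.
bool runtime_query(const std::atomic<long>& a) {
  return std::atomic_is_lock_free(&a);
}

// The static data member needs no object and is a constant expression...
constexpr bool always_lock_free_long = std::atomic<long>::is_always_lock_free;

// ...so it can drive compile-time decisions, such as an array bound.
char scratch[always_lock_free_long ? 1 : 2];
```

The run-time answer must agree with the compile-time one whenever the latter is ``true``, as the example in the proposed wording requires.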
114 |
115 | We show a sample implementation:
116 |
117 | .. literalinclude:: P0152.cc
118 | :language: c++
119 | :lines: 4-48
120 |
--------------------------------------------------------------------------------
/source/Math.signbit.bs:
--------------------------------------------------------------------------------
1 |
14 |
15 | IEEE 754 has a precise meaning for *sign bit*. JavaScript's `Math.sign` falls
16 | short on `-0.0` and `+0.0`. This is a shortcoming of a "batteries included"
17 | approach to language design.
18 |
19 | Correctly obtaining the sign bit of a Number in JavaScript is somewhat
20 | unintuitive: the naïve `x < 0` approach fails if `x` is `-0.0` because `0.0` and
21 | `-0.0` compare equal to each other.
22 |
23 | One can instead rely on division by zero returning one of `-Infinity` or
24 | `+Infinity`: `1.0 / x < 0`. This has the interesting caveat that `1.0 / x` is
25 | `NaN` when `x` is `NaN`, so the comparison yields `false` whatever the NaN's sign bit. It's also highly counter-intuitive.
26 |
27 | JavaScript aficionados will know that `Object.is(-0, x)` returns `true` when
28 | `x` is `-0` but not when it's `0`. This is surprising for developers who are
29 | more numerics-oriented than object-—dare I say prototype-?—oriented. These
30 | developers just want the sign bit: IEEE 754 has a very precise definition of
31 | what the sign bit is, so why can't JavaScript just give them the sign bit?
32 |
33 | This issue [has been discussed previously](https://esdiscuss.org/topic/math-sign-vs-0)
34 | but was never addressed. We believe that this proposal can fix this
35 | oft-encountered problem once and for all.
36 |
37 |
38 | Revision History {#rev}
39 | ================
40 |
41 | * Presented at the [2017-01](https://github.com/tc39/agendas/blob/master/2017/01.md) TC39 meeting and moved to Stage 1.
42 |
43 |
44 | Background {#bg}
45 | ==========
46 |
47 | IEEE 754 {#ieee754}
48 | --------
49 |
50 | [[IEEE754]] section 5.5.1 defines *sign bit operations*. These operations are
51 | quiet-computational operations which only affect the sign bit of the arithmetic
52 | format. The operations treat floating-point numbers and NaNs alike, and signal
53 | no exception. As defined, they may propagate non-canonical encodings.
54 |
55 | The following operations are defined:
56 |
57 | * `copy`
58 | * `negate`
59 | * `abs`
60 |
61 | C / C++ {#cpp}
62 | -------
63 |
64 | [[C]] and [[Cpp]] define `signbit` in `<math.h>` and `<cmath>` respectively. It
65 | returns a nonzero `int` value if and only if the sign of its argument value is
66 | negative. The `signbit` macro reports the sign of all values, including
67 | infinities, zeros, and NaNs.
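The pitfalls described in the introduction can be reproduced in C++ and compared with `std::signbit`. This is a small sketch, and the three helper names are hypothetical. The reciprocal trick assumes IEEE 754 division semantics. Division by zero is formally undefined in ISO C++, but IEEE 754 implementations produce a signed infinity.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers comparing three ways to ask "is x negative?".
bool less_than_zero(double x) { return x < 0; }              // false for -0.0
bool reciprocal_negative(double x) { return 1.0 / x < 0; }   // 1.0 / -0.0 is -inf; false for any NaN
bool sign_bit_set(double x) { return std::signbit(x); }      // reads the sign bit, NaNs included
```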
68 |
69 | Go {#go}
70 | ---
71 |
72 | [[Go]]'s math package defines `Signbit` as `true` if `x` is negative or negative
73 | zero. While the specification is silent on NaN,
74 | [the implementation](https://golang.org/src/math/signbit.go) clearly extracts the
75 | sign bit regardless of NaN-ness.
76 |
77 | `Math.sign(x)` {#sign}
78 | -----------
79 |
80 | JavaScript provides `Math.sign` which is specified as follows:
81 |
82 |
83 |
84 | Returns the sign of the x, indicating whether x is positive, negative or zero.
85 |
86 | * If `x` is `NaN`, the result is `NaN`.
87 | * If `x` is `-0`, the result is `-0`.
88 | * If `x` is `+0`, the result is `+0`.
89 | * If `x` is negative and not `-0`, the result is `-1`.
90 | * If `x` is positive and not `+0`, the result is `+1`.
91 |
92 |
93 |
94 | This falls short when dealing with `-0` and `+0` since these values both compare
95 | equal.
96 |
97 |
98 | Proposal {#proposal}
99 | ========
100 |
101 | Given existing precedent as well as common hardware support, we propose adding
102 | `Math.signbit` to JavaScript as follows.
103 |
104 | `Math.signbit(x)` {#spec}
105 | -----------------
106 |
107 | Returns whether the sign bit of `x` is set.
108 |
109 | 1. If `x` is `NaN`, the result is `false`.
110 | 1. If `x` is `-0`, the result is `true`.
111 | 1. If `x` is negative, the result is `true`.
112 | 1. Otherwise, the result is `false`.
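The steps above can be modeled in C++ to contrast them with `std::signbit`, which also reports the sign bit of NaNs. This is a hypothetical model for illustration only. `proposed_math_signbit` is not an existing API, and the `ToNumber` coercion is not modeled.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of the proposed Math.signbit(x) semantics.
bool proposed_math_signbit(double x) {
  if (std::isnan(x)) return false;  // step 1: NaN yields false, whatever its sign bit
  return std::signbit(x);           // steps 2-4: true for -0 and negative values
}
```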
113 |
114 | Note: The "Function Properties of the Math Object" section already states:
115 | "Each of the following `Math` object functions applies the `ToNumber` abstract
116 | operation to each of its arguments."
117 |
118 | Alternatives {#alts}
119 | ------------
120 |
121 | This proposal makes decisions which TC39 may want to consider modifying:
122 |
123 | * Coercion via `ToNumber`.
124 | * The return type is Boolean.
125 | * NaN is equivalent to a positive number.
126 |
127 |
128 |
16 |
17 | Background {#bg}
18 | ==========
19 |
20 | Low-level code often seeks to interpret objects of one type as another: keep the
21 | same bits, but obtain an object of a different type. Doing so correctly is
22 | error-prone: using `reinterpret_cast` or `union` runs afoul of type-aliasing
23 | rules, yet these are the intuitive solutions developers mistakenly turn to.
24 |
25 | Attuned developers use `aligned_storage` with `memcpy`, avoiding alignment
26 | pitfalls and allowing them to bit-cast non-default-constructible types.
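This is a sketch of the `memcpy` idiom described above, with `static_assert` standing in for the constraints proposed in the wording. The name `bit_cast_sketch` and the `Meters` type are hypothetical. An `alignas` byte buffer plays the role of `aligned_storage`, so `To` need not be default-constructible. Reading a `To` back out of raw bytes through `std::launder` is an assumption based on later (C++17/C++20) object-lifetime rules, not something this paper specifies. The function is not `constexpr`, since `memcpy` isn't.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical sketch of the memcpy idiom; not the proposed std::bit_cast.
template <class To, class From>
To bit_cast_sketch(const From& from) noexcept {
  static_assert(sizeof(To) == sizeof(From), "To and From must be the same size");
  static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
  static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
  // Suitably aligned raw storage: no To object has to be default-constructed.
  alignas(To) unsigned char storage[sizeof(To)];
  std::memcpy(storage, &from, sizeof(To));  // copy the object representation
  return *std::launder(reinterpret_cast<To*>(storage));
}

// A trivially copyable type with no default constructor.
struct Meters {
  explicit Meters(std::uint32_t v) : value(v) {}
  std::uint32_t value;
};
static_assert(!std::is_default_constructible<Meters>::value, "");
```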
27 |
28 | This facility inevitably ends up being used incorrectly on pointer types; we
29 | propose using appropriate concepts to prevent misuse. As our sample
30 | implementation demonstrates, we could equally use `static_assert` or template
31 | SFINAE, but the timing of this library feature will likely coincide with
32 | the standardization of concepts.
33 |
34 | Furthermore, it is currently impossible to implement a `constexpr` bit-cast
35 | function, as `memcpy` itself isn't `constexpr`. Marking our proposed function as
36 | `constexpr` doesn't require or prevent `memcpy` from becoming `constexpr`. This
37 | leaves implementations free to use their own internal solution (e.g. LLVM has a `bitcast`
39 | opcode).
40 |
41 | We propose to standardize this oft-used idiom, and avoid the pitfalls once and
42 | for all.
43 |
44 | Proposed Wording {#word}
45 | ================
46 |
47 | Below, substitute the `�` character with a number the editor finds appropriate
48 | for the sub-section.
49 |
50 | Synopsis {#syn}
51 | --------
52 |
53 | Under 20.2 Header `<utility>` synopsis [**utility**]:
54 |
55 |
56 | namespace std {
57 | // ...
58 |
59 | // 20.2.� bit-casting:
60 | template <typename To, typename From>
61 | requires
62 | sizeof(To) == sizeof(From) &&
63 | is_trivially_copyable_v<To> &&
64 | is_trivially_copyable_v<From> &&
65 | is_standard_layout_v<To> &&
66 | is_standard_layout_v<From> &&
67 | !(is_pointer_v<To> &&
68 | is_pointer_v<From>) &&
69 | !(is_member_pointer_v<To> &&
70 | is_member_pointer_v<From>) &&
71 | !(is_member_object_pointer_v<To> &&
72 | is_member_object_pointer_v<From>) &&
73 | !(is_member_function_pointer_v<To> &&
74 | is_member_function_pointer_v<From>)
75 | constexpr To bit_cast(const From& from) noexcept;
76 |
77 | // ...
78 | }
79 |
80 |
81 | Details {#det}
82 | -------
83 |
84 | Under 20.2.`�` Bit-casting [**utility.bitcast**]:
85 |
86 |
87 | template <typename To, typename From>
88 | requires
89 | sizeof(To) == sizeof(From) &&
90 | is_trivially_copyable_v<To> &&
91 | is_trivially_copyable_v<From> &&
92 | is_standard_layout_v<To> &&
93 | is_standard_layout_v<From> &&
94 | !(is_pointer_v<To> &&
95 | is_pointer_v<From>) &&
96 | !(is_member_pointer_v<To> &&
97 | is_member_pointer_v<From>) &&
98 | !(is_member_object_pointer_v<To> &&
99 | is_member_object_pointer_v<From>) &&
100 | !(is_member_function_pointer_v<To> &&
101 | is_member_function_pointer_v<From>)
102 | constexpr To bit_cast(const From& from) noexcept;
103 |
104 |
105 | 1. Requires: `sizeof(To) == sizeof(From)`,
106 | `is_trivially_copyable_v<To>` is `true`,
107 | `is_trivially_copyable_v<From>` is `true`,
108 | `is_standard_layout_v<To>` is `true`,
109 | `is_standard_layout_v<From>` is `true`,
110 | `is_pointer_v<To> && is_pointer_v<From>` is `false`,
111 | `is_member_pointer_v<To> && is_member_pointer_v<From>` is `false`,
112 | `is_member_object_pointer_v<To> && is_member_object_pointer_v<From>` is `false`,
113 | `is_member_function_pointer_v<To> && is_member_function_pointer_v<From>` is `false`.
114 |
115 | 2. Returns: an object of type `To` whose object representation is equal
116 | to the object representation of `From`. If multiple object
117 | representations could represent the value
118 | representation of `From`, then it is unspecified which `To`
119 | value is returned. If no value representation corresponds
120 | to `To`'s object representation then the returned value is
121 | unspecified.
122 |
123 | Feature testing {#test}
124 | ---------------
125 |
126 | The `__cpp_lib_bit_cast` feature test macro should be added.
127 |
128 | Appendix {#appendix}
129 | ========
130 |
131 | The Standard's [**basic.types**] section explicitly blesses `memcpy`:
132 |
133 |
134 |
135 | For any trivially copyable type `T`, if two pointers to `T` point to distinct
136 | `T` objects `obj1` and `obj2`, where neither `obj1` nor `obj2` is a base-class
137 | subobject, if the *underlying bytes* (1.7) making up `obj1` are copied into
138 | `obj2`, `obj2` shall subsequently hold the same value as `obj1`.
139 |
140 | [*Example:*
141 | ```
142 | T* t1p;
143 | T* t2p;
144 | // provided that t2p points to an initialized object ...
145 | std::memcpy(t1p, t2p, sizeof(T));
146 | // at this point, every subobject of trivially copyable type in *t1p contains
147 | // the same value as the corresponding subobject in *t2p
148 | ```
149 | — *end example*]
150 |
151 |
156 |
157 | In a union, at most one of the non-static data members can be
158 | active at any time, that is, the value of at most one of the
159 | non-static data members can be stored in a union at any time.
160 |
161 |
162 |
163 | Acknowledgement {#ack}
164 | ===============
165 |
166 | Thanks to Saam Barati, Jeffrey Yasskin, and Sam Benzaquen for their early review
167 | and suggested improvements.
168 |
--------------------------------------------------------------------------------
/source/p1102r0.bs:
--------------------------------------------------------------------------------
1 |
2 | Title: Down with ()!
3 | Shortname: P1102
4 | Revision: 0
5 | Audience: CWG
6 | Status: P
7 | Group: WG21
8 | URL: http://wg21.link/P1102R0
9 | !Source: https://github.com/jfbastien/papers/blob/master/source/p1102r0.bs
10 | Editor: Alex Christensen, Apple, achristensen@apple.com
11 | Editor: JF Bastien, Apple, jfbastien@apple.com
12 | Abstract: A proposal for removing unnecessary ()'s from C++ lambdas.
13 | Date: 2018-06-20
14 | Markup Shorthands: markdown yes
15 |
16 |
17 | Introduction and motivation {#intro}
18 | ===========================
19 |
20 | Currently, C++ lambdas with no parameters do not require a parameter declaration
21 | clause. The specification even contains this language in [**expr.prim.lambda**]
22 | section 8.4.5 ❡4:
23 |
24 | > If a lambda-expression does not include a lambda-declarator, it is as if the
25 | > lambda-declarator were `()`.
26 |
27 | This allows us to omit the unused `()` in simple lambdas such as this:
28 |
29 |
30 | std::string s1 = "abc";
31 | auto withParen = [s1 = std::move(s1)] () {
32 | std::cout << s1 << '\n';
33 | };
34 |
35 | std::string s2 = "abc";
36 | auto noSean = [s2 = std::move(s2)] { // Note no syntax error.
37 | std::cout << s2 << '\n';
38 | };
39 |
40 |
41 | These particular lambdas have ownership of the strings, so they ought to be able
42 | to mutate them, but `s1` and `s2` are const (because the lambda's `operator()` is
43 | declared `const` by default) so we need to add the `mutable` keyword:
44 |
45 |
46 | std::string s1 = "abc";
47 | auto withParen = [s1 = std::move(s1)] () mutable {
48 | s1 += "d";
49 | std::cout << s1 << '\n';
50 | };
51 |
52 | std::string s2 = "abc";
53 | auto noSean = [s2 = std::move(s2)] mutable { // Currently a syntax error.
54 | s2 += "d";
55 | std::cout << s2 << '\n';
56 | };
57 |
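A simplified model of the closure types the compiler generates shows why the captures are `const`. This is an illustrative approximation only. Real closure types are unnamed, and `WithParenClosure` and `MutableClosure` are hypothetical names. The sketch returns the string instead of printing it so the results are easy to check.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Roughly what [s1 = std::move(s1)] () { ... } becomes: the function call
// operator is const, so the captured s1 cannot be modified inside it.
struct WithParenClosure {
  std::string s1;
  std::string operator()() const { return s1; }
};

// Roughly what [s1 = std::move(s1)] () mutable { ... } becomes: mutable
// removes the const, so the body may change its captured copy.
struct MutableClosure {
  std::string s1;
  std::string operator()() {
    s1 += "d";
    return s1;
  }
};
```

Because the captured copy lives in the closure object, changes made by a `mutable` lambda persist across calls.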
58 |
59 | Confusingly, the current Standard requires the empty parens when using the
60 | `mutable` keyword. This rule is unintuitive, causes common syntax errors, and
61 | clutters our code. When compiling with clang, we even get a syntax error that
62 | indicates the compiler knows exactly what is going on:
63 |
64 |
65 | example.cpp:11:54: error: lambda requires '()' before 'mutable'
66 | auto noSean = [s2 = std::move(s2)] mutable { // Currently a syntax error.
67 | ^
68 | ()
69 | 1 error generated.
70 |
71 |
72 | This proposal would make these parentheses unnecessary, as they were before we
73 | added `mutable`. This will apply to:
74 |
75 | * lambda template parameters
76 | * `constexpr`
77 | * `mutable`
78 | * Exception specifications and `noexcept`
79 | * attributes
80 | * trailing return types
81 | * `requires`
82 |
83 | EWG discussed this change as [[EWG135]]
84 | in [Lenexa](http://wiki.edg.com/bin/view/Wg21lenexa/EWGIssuesResolutionMinutes)
85 | and voted 15 to 1 on forwarding to core. It became [[CWG2121]], discussed
86 | in
87 | [Kona](http://wiki.edg.com/bin/view/Wg21kona2015/CoreWorkingGroup#CWG_2121_More_flexible_lambda_sy) and
88 | needed someone to volunteer wording.
89 |
90 | This paper was discussed on the EWG reflector in June; Nina Ranns provided
91 | feedback, and the EWG chair agreed that the paper should move to CWG directly given
92 | previous polls.
93 |
94 |
95 | Impact {#impact}
96 | ======
97 |
98 | This change will not break existing code.
99 |
100 |
101 | Wording {#word}
102 | =======
103 |
104 | Modify Lambda expressions [**expr.prim.lambda**] as follows:
105 |
106 |
131 |
132 | If a *lambda-expression**lambda-declarator* does not
133 | include a *lambda-declarator*`(` *parameter-declaration-clause*
134 | `)`, it is as if the *lambda-declarator*`(`
135 | *parameter-declaration-clause* `)` were `()`. The lambda return type is
136 | `auto`, which is replaced by the type specified by the *trailing-return-type* if
137 | provided and/or deduced from `return` statements as described in 10.1.7.4.
138 |
139 |
144 |
145 | The return type and function parameters of the function call operator template
146 | are derived from the *lambda-expression*'s *trailing-return-type* and
147 | *parameter-declaration-clause* by replacing each occurrence of `auto` in the
148 | *decl-specifier*s of the *parameter-declaration-clause* with the name of the
149 | corresponding invented *template-parameter*. The *requires-clause* of the
150 | function call operator template is the *requires-clause* immediately following
151 | `<` *template-parameter-list* `>`, if any. The trailing *requires-clause* of
152 | the function call operator or operator template is the *requires-clause*
153 | following the *lambda-declarator*, if any.
154 |
155 |
156 |
157 | Note: The first sentence can remain as-is because the modification to
158 | [**expr.prim.lambda**] ❡4 creates an empty *parameter-declaration-clause* if
159 | none is provided. Similarly, the second and third sentences bind the
160 | *requires-clause* unambiguously.
161 |
--------------------------------------------------------------------------------
/source/P0476r1.bs:
--------------------------------------------------------------------------------
1 |
16 |
17 |
18 | This paper is a revision of [[P0476r0]], addressing LEWG comments from the 2016
19 | Issaquah meeting. See [[#rev]] for details.
20 |
21 |
22 | Background {#bg}
23 | ==========
24 |
25 | Low-level code often seeks to interpret objects of one type as another: keep the
26 | same bits, but obtain an object of a different type. Doing so correctly is
27 | error-prone: using `reinterpret_cast` or `union` runs afoul of type-aliasing
28 | rules, yet these are the intuitive solutions developers mistakenly turn to.
29 |
30 | Attuned developers use `aligned_storage` with `memcpy`, avoiding alignment
31 | pitfalls and allowing them to bit-cast non-default-constructible types.
32 |
33 | This proposal uses appropriate concepts to prevent misuse. As the sample
34 | implementation demonstrates, we could equally use `static_assert` or template
35 | SFINAE, but the timing of this library feature will likely coincide with
36 | the standardization of concepts.
37 |
38 | Furthermore, it is currently impossible to implement a `constexpr` bit-cast
39 | function, as `memcpy` itself isn't `constexpr`. Marking the proposed function as
40 | `constexpr` doesn't require or prevent `memcpy` from becoming `constexpr`. This
41 | leaves implementations free to use their own internal solution (e.g. LLVM has a `bitcast`
43 | opcode).
44 |
45 | We should standardize this oft-used idiom, and avoid the pitfalls once and for
46 | all.
47 |
48 |
49 | Proposed Wording {#word}
50 | ================
51 |
52 | Below, substitute the `�` character with a number the editor finds appropriate
53 | for the sub-section.
54 |
55 | Synopsis {#syn}
56 | --------
57 |
58 | Under 20.2 Header `<utility>` synopsis [**utility**]:
59 |
60 |
61 | namespace std {
62 | // ...
63 |
64 | // 20.2.� bit-casting:
65 | template <typename To, typename From>
66 | requires
67 | sizeof(To) == sizeof(From) &&
68 | is_trivially_copyable_v<To> &&
69 | is_trivially_copyable_v<From>
70 | constexpr To bit_cast(const From& from) noexcept;
71 |
72 | // ...
73 | }
74 |
75 |
76 | Details {#det}
77 | -------
78 |
79 | Under 20.2.`�` Bit-casting [**utility.bitcast**]:
80 |
81 |
82 | template <typename To, typename From>
83 | requires
84 | sizeof(To) == sizeof(From) &&
85 | is_trivially_copyable_v<To> &&
86 | is_trivially_copyable_v<From>
87 | constexpr To bit_cast(const From& from) noexcept;
88 |
89 |
90 | 1. Requires: `sizeof(To) == sizeof(From)`,
91 | `is_trivially_copyable_v<To>` is `true`,
92 | `is_trivially_copyable_v<From>` is `true`.
93 |
94 | 2. Returns: an object of type `To` whose object representation is equal
95 | to the object representation of `From`. If multiple object
96 | representations could represent the value
97 | representation of `From`, then it is unspecified which `To`
98 | value is returned. If no value representation corresponds
99 | to `To`'s object representation then the returned value is
100 | unspecified.
101 |
102 | Feature testing {#test}
103 | ---------------
104 |
105 | The `__cpp_lib_bit_cast` feature test macro should be added.
106 |
107 | Appendix {#appendix}
108 | ========
109 |
110 | The Standard's [**basic.types**] section explicitly blesses `memcpy`:
111 |
112 |
113 |
114 | For any trivially copyable type `T`, if two pointers to `T` point to distinct
115 | `T` objects `obj1` and `obj2`, where neither `obj1` nor `obj2` is a base-class
116 | subobject, if the *underlying bytes* (1.7) making up `obj1` are copied into
117 | `obj2`, `obj2` shall subsequently hold the same value as `obj1`.
118 |
119 | [*Example:*
120 | ```
121 | T* t1p;
122 | T* t2p;
123 | // provided that t2p points to an initialized object ...
124 | std::memcpy(t1p, t2p, sizeof(T));
125 | // at this point, every subobject of trivially copyable type in *t1p contains
126 | // the same value as the corresponding subobject in *t2p
127 | ```
128 | — *end example*]
129 |
130 |
135 |
136 | In a union, at most one of the non-static data members can be
137 | active at any time, that is, the value of at most one of the
138 | non-static data members can be stored in a union at any time.
139 |
140 |
141 |
142 |
143 | Revision History {#rev}
144 | ================
145 |
146 | r0 ➡ r1 {#r0r1}
147 | --------
148 |
149 | The paper was reviewed by LEWG at the 2016 Issaquah meeting:
150 |
151 | * Remove the standard layout requirement—trivially copyable suffices for the `memcpy` requirement.
152 | * We discussed removing `constexpr`, but there was no consent either way. There was some suggestion that it’ll be hard for implementers, but there's also some desire (by the same implementers) to have those features available in order to support things like `constexpr` instances of `std::variant`.
153 | * The pointer-forbidding logic was removed. It was initially there to help developers when a better tool is available, but it's easily worked around (e.g. with a `struct` containing a pointer). Note that this doesn't prevent `constexpr` versions of `bit_cast`: the implementation is allowed to error out on `bit_cast` of pointer.
154 | * Some discussion about concepts-usage, but it seems like mostly an LWG issue and we're reasonably sure that concepts will land before this or in a compatible vehicle.
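The pointer workaround mentioned in the review notes is easy to demonstrate. A `struct` wrapping a pointer is trivially copyable but is not itself a pointer type, so a trait such as `is_pointer_v` would not reject it. This is a minimal sketch of the `memcpy` idiom. `wrapped_ptr` and `copy_bits` are hypothetical names. It assumes `std::uintptr_t` exists and is as wide as a pointer, which holds on mainstream platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// A pointer hidden inside a struct.
struct wrapped_ptr {
  int* p;
};
static_assert(std::is_trivially_copyable<wrapped_ptr>::value, "");
static_assert(!std::is_pointer<wrapped_ptr>::value, "");
static_assert(sizeof(wrapped_ptr) == sizeof(std::uintptr_t),
              "assumes a pointer fits exactly in uintptr_t");

// Hypothetical bit-cast helper constrained only on size and trivial copyability.
template <class To, class From>
To copy_bits(const From& from) noexcept {
  static_assert(sizeof(To) == sizeof(From), "To and From must be the same size");
  static_assert(std::is_trivially_copyable<To>::value &&
                    std::is_trivially_copyable<From>::value,
                "both types must be trivially copyable");
  To to{};
  std::memcpy(&to, &from, sizeof(To));  // copy the object representation
  return to;
}
```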
155 |
156 | Straw polls:
157 |
158 | * Do we want to see [[P0476r0]] again? unanimous consent.
159 | * `bit_cast` should allow pointer types in `To` and `From`. **SF F N A SA** 4 5 4 2 1
160 | * `bit_cast` should be `constexpr`? **SF F N A SA** 4 3 7 2 3
161 |
162 |
163 | Acknowledgement {#ack}
164 | ===============
165 |
166 | Thanks to Saam Barati, Jeffrey Yasskin, and Sam Benzaquen for their early review
167 | and suggested improvements.
168 |
--------------------------------------------------------------------------------
/source/P0502r0.bs:
--------------------------------------------------------------------------------
1 |
2 | Title: Throwing out of a parallel algorithm terminates—but how?
3 | Shortname: P0502
4 | Revision: 0
5 | Audience: SG1, LWG
6 | Status: P
7 | Group: WG21
8 | URL: http://wg21.link/P0502r0
9 | !Source: github.com/jfbastien/papers/blob/master/source/P0502r0.bs
10 | Editor: JF Bastien, Apple, jfbastien@apple.com
11 | Editor: Bryce Adelstein Lelbach, Lawrence Berkeley National Laboratory, balelbach@lbl.gov
12 | Editor: H. Carter Edwards, Sandia National Laboratory, hcedwar@sandia.gov
13 | Abstract: The Committee approves of terminating if exceptions leave parallel algorithms, but where to mandate termination should be updated.
14 | Date: 2016-11-09
15 | Markup Shorthands: markdown yes
16 | Toggle Diffs: yes
17 |
18 |
19 | Background {#bg}
20 | ==========
21 |
22 | The Standard was simplified in [[P0394r4]]: exceptions leaving parallel algorithms lead to `std::terminate()` being called. This matches the behavior of exceptions escaping `main()` or the initial function of a `std::thread`.
23 |
24 | The following National Body comments from [[P0488R0]] were discussed in SG1 at Issaquah, along with [[p0451r0]]:
25 |
26 | * US 15, US 167: Don't `terminate()` when a parallel algorithm exits via uncaught exception and either re-add `exception_list`, add `noexcept` policies + re-add `exception_list`, make it UB or throw an unspecified exception (revert [[P0394r4]]).
27 | * US 17, US 169: Don't `terminate()` when a parallel algorithm exits via uncaught exception and re-add `exception_list` (revert [[P0394r4]]).
28 | * US 16, US 168: Clarify which exception is thrown when a parallel algorithm exits via uncaught exception.
29 | * US 170: Add a customization point for `ExecutionPolicy`s which defines their exception handling behavior (don't re-add `exception_list`).
30 | * CA 17: Preserve the `terminate()`-on-uncaught-exception behavior in the parallel algorithms (keep [[P0394r4]]).
31 |
32 | Straw Polls {#straw}
33 | -----------
34 |
35 | The following straw polls were taken:
36 |
37 | **Straw Poll A:** In 25.2.4 ❡2, have uncaught exception behavior be defined by `ExecutionPolicy`. In 20.19 define the behavior for the three standard policies in C++17 (`seq`, `par`, `par_unseq`) as `terminate()`.
38 |
39 |
40 |
| SF   | F | N | A | SA |
|------|---|---|---|----|
| Many | 7 | 1 | 1 | 0  |
41 |
42 |
43 |
44 | ⟹ Consensus to write a paper for this before the end of the week. Bryce, JF, and Carter will write it.
45 |
46 | **Straw Poll B:** Do we want to rename the policies to reflect the fact that they call `terminate()` instead of throwing exceptions?
47 |
48 |
49 |
| SF | F | N | A | SA |
|----|---|---|---|----|
| 1  | 7 | 9 | 6 | 7  |
50 |
51 |
52 |
53 | ⟹ No consensus for change.
54 |
55 | **Straw Poll C:** Beyond the changes from the first straw poll, additional changes are required.
56 |
57 |
58 |
| SF | F | N  | A  | SA |
|----|---|----|----|----|
| 2  | 0 | 10 | 11 | 6  |
59 |
60 |
61 |
62 | ⟹ No consensus for change.
63 |
64 | Action {#boom}
65 | ------
66 |
67 | This paper follows the guidance from *straw poll A*: there is no behavior change, but the behavior is specified to allow future execution policies which exhibit different behavior.
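The following is a minimal sketch of what letting the `ExecutionPolicy` determine the behavior could allow. The policy types, the `on_uncaught_exception` hook, and `sketch_for_each` are hypothetical names invented for illustration. They are not proposed wording and do not exist in the Standard. The loop also runs sequentially, standing in for a parallel algorithm.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical policy types: neither they nor the on_uncaught_exception hook
// exist in the Standard. They only model "the behavior is determined by the
// ExecutionPolicy".
struct terminating_policy {
    // Models seq, par, and par_unseq: an escaping exception ends the program.
    [[noreturn]] static void on_uncaught_exception(std::exception_ptr) noexcept {
        std::terminate();
    }
};

struct rethrowing_policy {
    // Models a possible future policy that propagates the exception instead.
    [[noreturn]] static void on_uncaught_exception(std::exception_ptr e) {
        std::rethrow_exception(e);
    }
};

// Hypothetical, sequential stand-in for a parallel for_each: each element
// access function runs in turn, and an escaping exception goes to the policy.
template <class Policy, class It, class F>
void sketch_for_each(Policy, It first, It last, F f) {
    for (; first != last; ++first) {
        try {
            f(*first);
        } catch (...) {
            Policy::on_uncaught_exception(std::current_exception());
        }
    }
}

// Usage: only a policy that opts into propagation lets the caller catch.
inline bool rethrowing_policy_propagates() {
    std::vector<int> v{1, 2, 3};
    try {
        sketch_for_each(rethrowing_policy{}, v.begin(), v.end(), [](int x) {
            if (x == 2) throw std::runtime_error("bad element");
        });
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Standard policies keep today's `terminate()` behavior. A future policy could choose otherwise without changing the algorithm's wording.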
68 |
69 |
70 | Proposed Wording {#word}
71 | ================
72 |
73 | Apply the following edits to section 15.5.1 ❡1 note, bullet 1.13:
74 |
75 |
76 |
77 | 15.5.1 The `std::terminate()` function [**except.terminate**]
78 |
79 | 1. In some situations exception handling must be abandoned for less subtle error handling techniques. [ *Note:* These situations are:
80 |
81 | […]
82 |
83 | (1.13) — for parallel algorithms whose `ExecutionPolicy` specify such behavior (20.19.4, 20.19.5, 20.19.6), when execution of an element access function (25.2.1) of a parallel algorithm exits via an exception (25.2.4), or
84 |
85 | […]
86 |
87 | *— end note* ]
88 |
89 |
90 |
91 | Apply the following edits to section 20.19:
92 |
93 |
94 |
95 | 20.19.4 Sequential execution policy [**execpol.seq**]
96 |
97 | class execution::sequenced_policy { unspecified };
98 |
99 | 1. The class `execution::sequenced_policy` is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and require that a parallel algorithm’s execution may not be parallelized.
100 | 2. During the execution of a parallel algorithm with the `execution::sequenced_policy` policy, if the invocation of an element access function exits via an uncaught exception, `terminate()` shall be called.
101 |
102 | 20.19.5 Parallel execution policy [**execpol.par**]
103 |
104 | class execution::parallel_policy { unspecified };
105 |
106 | 1. The class `execution::parallel_policy` is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm’s execution may be parallelized.
107 | 2. During the execution of a parallel algorithm with the `execution::parallel_policy` policy, if the invocation of an element access function exits via an uncaught exception, `terminate()` shall be called.
108 |
109 | 20.19.6 Parallel+Vector execution policy [**execpol.vec**]
110 |
111 | class execution::parallel_unsequenced_policy { unspecified };
112 |
113 | 1. The class `execution::parallel_unsequenced_policy` is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm’s execution may be parallelized and vectorized.
114 | 2. During the execution of a parallel algorithm with the `execution::parallel_unsequenced_policy` policy, if the invocation of an element access function exits via an uncaught exception, `terminate()` shall be called.
115 |
116 |
117 |
118 | Apply the following edits to section 25.2.4 [**algorithms.parallel.exceptions**] ❡2:
119 |
120 |
121 |
122 | During the execution of a parallel algorithm, if the invocation of an element access function exits via an uncaught exception, <ins>the behavior is determined by the `ExecutionPolicy`.</ins><del>`terminate()` is called.</del>
123 |
124 |
125 |
126 |
127 | Acknowledgement {#ack}
128 | ===============
129 |
130 | Thank you to all SG1 participants: David Sankel, Alisdair Meredith, Hartmut Kaiser, Pablo Halpern, Jared Hoberock, Michael Wong, Pete Becker. Special thanks to the scribe Paul McKenney.
131 |
--------------------------------------------------------------------------------
/source/P0418r1.bs:
--------------------------------------------------------------------------------
1 |
17 |
18 | Background {#bg}
19 | ==========
20 |
21 | [[LWG2445]] was discussed and resolved by SG1 in Urbana.
22 |
23 | LWG issue #2445 {#issue}
24 | ---------------
25 |
26 |
27 |
28 | The definitions of compare and exchange in [util.smartptr.shared.atomic] ¶32
29 | and [atomics.types.operations.req] ¶21 state:
30 |
31 |
32 |
33 | Requires: The failure argument shall not be `memory_order_release` nor
34 | `memory_order_acq_rel`. The failure argument shall be no stronger than the
35 | success argument.
36 |
37 |
38 |
39 | The term "stronger" isn't defined by the standard.
40 |
41 | It is hinted at by [atomics.types.operations.req] ¶22:
42 |
43 |
44 |
45 | When only one `memory_order` argument is supplied, the value of `success` is
46 | `order`, and the value of `failure` is `order` except that a value of
47 | `memory_order_acq_rel` shall be replaced by the value `memory_order_acquire`
48 | and a value of `memory_order_release` shall be replaced by the value
49 | `memory_order_relaxed`.
50 |
51 |
52 |
53 | Should the standard define a partial ordering for memory orders, where consume
54 | and acquire are incomparable with release?
55 |
56 |
57 |
58 | Proposed SG1 resolution from Urbana {#old-res}
59 | -----------------------------------
60 |
61 | Add the following note:
62 |
63 |
64 |
65 | [Note: Memory orders have the following relative strengths implied by their
66 | definitions:
67 |
68 |
82 |
83 | Further issue {#moar}
84 | -------------
85 |
86 | Nonetheless:
87 |
88 | * The resolution isn't on the LWG tracker.
89 | * The proposed note was never moved to the draft Standard.
90 |
91 | Furthermore, the resolution which SG1 came to in Urbana resolves what "stronger"
92 | means by specifying a lattice, but isn't clear on what "The failure argument
93 | shall be no stronger than the success argument" means given the lattice.
94 |
95 | There is no relationship, "stronger" or otherwise, between release and
96 | consume/acquire. The current wording says "shall be no stronger" which isn't the
97 | same as "shall not be stronger" in this context. Is that on purpose? At a
98 | minimum it's not clear and should be clarified.
99 |
100 | Should the following be valid:
101 |
102 | ```
103 | compare_exchange_strong(x, y, z, memory_order_release, memory_order_acquire);
104 | ```
105 |
106 | Or does the code need to be:
107 |
108 | ```
109 | compare_exchange_strong(x, y, z, memory_order_acq_rel, memory_order_acquire);
110 | ```
111 |
112 | Similar questions can be asked for `memory_order_consume` ordering on `failure`.
113 |
114 | Is there even a point in restricting `success`/`failure` orderings? On
115 | architectures with load-linked/store-conditional instructions the load and store
116 | are distinct instructions which can each have their own memory ordering (with
117 | appropriate leading/trailing fences if required), whereas architectures with
118 | compare-and-exchange already have a limited set of instructions to choose
119 | from. The current limitation (assuming [[LWG2445]] is resolved) only seems to
120 | restrict compilers on load-linked/store-conditional architectures.
121 |
122 | The following code could be valid if the stored data didn't need to be published
123 | or ordered, but any retry needs to read additional data:
124 |
125 | ```
126 | compare_exchange_strong(x, y, z, memory_order_relaxed, memory_order_acquire);
127 | ```
128 |
129 | Even if, for lack of clever instructions, architectures cannot take advantage of
130 | such code, compilers are able to optimize atomics in all sorts of clever ways, as
131 | discussed in [[N4455]].
132 |
133 | Updated proposal {#new-res}
134 | ================
135 |
136 | This paper proposes removing the "stronger" restrictions between
137 | compare-exchange's `success` and `failure` ordering, and doesn't add a lattice
138 | to order atomic orderings. The only remaining restriction is that
139 | `memory_order_release` and `memory_order_acq_rel` for `failure` are still
140 | disallowed: a failed compare-exchange doesn't store, so the current model
141 | gives these orderings no sensible meaning.
142 |
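Concretely, the relaxed-success/acquire-failure pairing from the "Further issue" section becomes valid under this proposal, and C++17 adopted this relaxation. The sketch below uses a hypothetical `claim` helper. Some older compilers and standard libraries may still diagnose a `failure` order stronger than the `success` order.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper. Success: relaxed, because the stored value publishes
// nothing else. Failure: acquire, so a retry can read data published by
// whichever thread won. The old "no stronger than success" rule rejected this.
inline bool claim(std::atomic<int>& owner, int& expected, int me) {
    return owner.compare_exchange_strong(expected, me,
                                         std::memory_order_relaxed,
                                         std::memory_order_acquire);
}

// Still disallowed for `failure`, since a failed compare-exchange does not
// store: memory_order_release and memory_order_acq_rel.
```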
143 | There have been discussions about `memory_order_release` loads, e.g. for
144 | seqlocks. Such potential changes are left to future papers.
145 |
146 | Modify [util.smartptr.shared.atomic] ¶32 as follows:
147 |
148 |
149 |
150 | Requires: The failure argument shall not be `memory_order_release` nor
151 | `memory_order_acq_rel`. The failure argument shall be no stronger than
152 | the success argument.
153 |
154 |
159 |
160 | Requires: The failure argument shall not be `memory_order_release` nor
161 | `memory_order_acq_rel`. The failure argument shall be no stronger than
162 | the success argument.
163 |
164 |
169 |
170 | Effects: Atomically, compares the contents of the memory pointed to by
171 | `object` or by `this` for equality with that in `expected`, and if `true`,
172 | replaces the contents of the memory pointed to by `object` or by `this` with
173 | that in `desired`, and if `false`, updates the contents of the memory in
174 | `expected` with the contents of the memory pointed to by `object` or by
175 | `this`. Further, if the comparison is `true`, memory is affected according to
176 | the value of `success`, and if the comparison is `false`, memory is affected
177 | according to the value of `failure`.
178 |
179 | When only one `memory_order` argument is supplied, the value of `success` is
180 | `order`, and the value of `failure` is `order` except that a value of
181 | `memory_order_acq_rel` shall be replaced by the value `memory_order_acquire`
182 | and a value of `memory_order_release` shall be replaced by the value
183 | `memory_order_relaxed`.
184 |
185 | If the operation returns `true`, these operations are atomic read-modify-write
186 | operations (1.10). Otherwise, these operations are atomic load operations.
187 |
188 |
189 |
190 | Acknowledgement {#ack}
191 | ===============
192 |
193 | Thanks to John McCall for pointing out that the proposed resolution was still
194 | insufficient, and for providing ample feedback.
195 |
--------------------------------------------------------------------------------
/source/P0418r2.bs:
--------------------------------------------------------------------------------
1 |
17 |
18 | Background {#bg}
19 | ==========
20 |
21 | [[LWG2445]] was discussed and resolved by SG1 in Urbana.
22 |
23 | This revision updates [[P0418r1]] with accurate wording for
24 | [util.smartptr.shared.atomic] ¶32, to be deleted from [[N4606]].
25 |
26 | LWG issue #2445 {#issue}
27 | ---------------
28 |
29 |
30 |
31 | The definitions of compare and exchange in [util.smartptr.shared.atomic]
32 | ¶32 and [atomics.types.operations.req] ¶21 state:
33 |
34 |
35 |
36 | Requires: The failure argument shall not be `memory_order_release` nor
37 | `memory_order_acq_rel`. The failure argument shall be no stronger than the
38 | success argument.
39 |
40 |
41 |
42 | The term "stronger" isn't defined by the standard.
43 |
44 | It is hinted at by [atomics.types.operations.req] ¶22:
45 |
46 |
47 |
48 | When only one `memory_order` argument is supplied, the value of `success` is
49 | `order`, and the value of `failure` is `order` except that a value of
50 | `memory_order_acq_rel` shall be replaced by the value `memory_order_acquire`
51 | and a value of `memory_order_release` shall be replaced by the value
52 | `memory_order_relaxed`.
53 |
54 |
55 |
56 | Should the standard define a partial ordering for memory orders, where consume
57 | and acquire are incomparable with release?
58 |
59 |
60 |
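The ¶22 rule quoted in the issue can be read as a small function. It maps the single `order` argument to a (`success`, `failure`) pair. In the sketch below, `derived_orders` and `orders_from_single` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names; this only restates the quoted ¶22 mapping.
struct derived_orders {
    std::memory_order success;
    std::memory_order failure;
};

constexpr derived_orders orders_from_single(std::memory_order order) {
    // A failed compare-exchange only loads, so the mapping replaces acq_rel
    // with acquire and release with relaxed; other orders pass through.
    std::memory_order failure = order;
    if (order == std::memory_order_acq_rel) failure = std::memory_order_acquire;
    if (order == std::memory_order_release) failure = std::memory_order_relaxed;
    return {order, failure};
}

static_assert(orders_from_single(std::memory_order_acq_rel).failure == std::memory_order_acquire);
static_assert(orders_from_single(std::memory_order_release).failure == std::memory_order_relaxed);
static_assert(orders_from_single(std::memory_order_seq_cst).failure == std::memory_order_seq_cst);
```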
61 | Proposed SG1 resolution from Urbana {#old-res}
62 | -----------------------------------
63 |
64 | Add the following note:
65 |
66 |
67 |
68 | [ *Note:* Memory orders have the following relative strengths implied by their
69 | definitions:
70 |
71 |
85 |
86 | Further issue {#moar}
87 | -------------
88 |
89 | Nonetheless:
90 |
91 | * The resolution isn't on the LWG tracker.
92 | * The proposed note was never moved to the draft Standard.
93 |
94 | Furthermore, the resolution which SG1 came to in Urbana resolves what "stronger"
95 | means by specifying a lattice, but isn't clear on what "The failure argument
96 | shall be no stronger than the success argument" means given the lattice.
97 |
98 | There is no relationship, "stronger" or otherwise, between release and
99 | consume/acquire. The current wording says "shall be no stronger" which isn't the
100 | same as "shall not be stronger" in this context. Is that on purpose? At a
101 | minimum it's not clear and should be clarified.
102 |
103 | Should the following be valid:
104 |
105 | ```
106 | compare_exchange_strong(x, y, z, memory_order_release, memory_order_acquire);
107 | ```
108 |
109 | Or does the code need to be:
110 |
111 | ```
112 | compare_exchange_strong(x, y, z, memory_order_acq_rel, memory_order_acquire);
113 | ```
114 |
115 | Similar questions can be asked for `memory_order_consume` ordering on `failure`.
116 |
117 | Is there even a point in restricting `success`/`failure` orderings? On
118 | architectures with load-linked/store-conditional instructions the load and store
119 | are distinct instructions which can each have their own memory ordering (with
120 | appropriate leading/trailing fences if required), whereas architectures with
121 | compare-and-exchange already have a limited set of instructions to choose
122 | from. The current limitation (assuming [[LWG2445]] is resolved) only seems to
123 | restrict compilers on load-linked/store-conditional architectures.
124 |
125 | The following code could be valid if the stored data didn't need to be published
126 | or ordered, but any retry needs to read additional data:
127 |
128 | ```
129 | compare_exchange_strong(x, y, z, memory_order_relaxed, memory_order_acquire);
130 | ```
131 |
132 | Even if, for lack of clever instructions, architectures cannot take advantage of
133 | such code, compilers are able to optimize atomics in all sorts of clever ways, as
134 | discussed in [[N4455]].
135 |
136 | Updated proposal {#new-res}
137 | ================
138 |
139 | This paper proposes removing the "stronger" restrictions between
140 | compare-exchange's `success` and `failure` ordering, and doesn't add a lattice
141 | to order atomic orderings. The only remaining restriction is that
142 | `memory_order_release` and `memory_order_acq_rel` for `failure` are still
143 | disallowed: a failed compare-exchange doesn't store, so the current model
144 | gives these orderings no sensible meaning.
145 |
146 | There have been discussions about `memory_order_release` loads, e.g. for
147 | seqlocks. Such potential changes are left to future papers.
148 |
149 | Modify [util.smartptr.shared.atomic] ¶32 as follows:
150 |
151 |
152 |
153 | Requires: The failure argument shall not be
154 | `memory_order_release`, nor `memory_order_acq_rel`,
155 | or stronger than success.
156 |
157 |
162 |
163 | Requires: The failure argument shall not be `memory_order_release` nor
164 | `memory_order_acq_rel`. The failure argument shall be no stronger than
165 | the success argument.
166 |
167 |
172 |
173 | Effects: Atomically, compares the contents of the memory pointed to by
174 | `object` or by `this` for equality with that in `expected`, and if `true`,
175 | replaces the contents of the memory pointed to by `object` or by `this` with
176 | that in `desired`, and if `false`, updates the contents of the memory in
177 | `expected` with the contents of the memory pointed to by `object` or by
178 | `this`. Further, if the comparison is `true`, memory is affected according to
179 | the value of `success`, and if the comparison is `false`, memory is affected
180 | according to the value of `failure`.
181 |
182 | When only one `memory_order` argument is supplied, the value of `success` is
183 | `order`, and the value of `failure` is `order` except that a value of
184 | `memory_order_acq_rel` shall be replaced by the value `memory_order_acquire`
185 | and a value of `memory_order_release` shall be replaced by the value
186 | `memory_order_relaxed`.
187 |
188 | If the operation returns `true`, these operations are atomic read-modify-write
189 | operations (1.10). Otherwise, these operations are atomic load operations.
190 |
191 |
192 |
193 | Acknowledgement {#ack}
194 | ===============
195 |
196 | Thanks to John McCall for pointing out that the proposed resolution was still
197 | insufficient, and for providing ample feedback.
198 |
--------------------------------------------------------------------------------
/source/p1119r0.bs:
--------------------------------------------------------------------------------
1 |
2 | Title: ABI for std::hardware_{constructive,destructive}_interference_size
3 | Shortname: P1119
4 | Revision: 0
5 | Audience: SG1, LEWG, LWG
6 | Status: P
7 | Group: WG21
8 | URL: http://wg21.link/d1119r0
9 | !Source: github.com/jfbastien/papers/blob/master/source/p1119r0.bs
10 | Editor: JF Bastien, Apple, jfbastien@apple.com
11 | Editor: Olivier Giroux, NVIDIA, ogiroux@nvidia.com
12 | Editor: Jonathan Wakely, RedHat, cxx@kayari.org
13 | Editor: Hal Finkel, Argonne National Laboratory, hfinkel@anl.gov
14 | Editor: Thomas Rodgers, RedHat, trodgers@redhat.com
15 | Editor: Matthias Kretz, GSI, m.kretz@gsi.de
16 | Abstract: std::hardware_{constructive,destructive}_interference_size exposes potential ABI issues, and that's OK. This position paper clarifies the committee's position.
17 | Date: 2018-06-22
18 | Markup Shorthands: markdown yes
19 |
47 |
48 | inline constexpr size_t hardware_destructive_interference_size = implementation-defined;
49 |
50 | This number is the minimum recommended offset between two concurrently-accessed
51 | objects to avoid additional performance degradation due to contention introduced
52 | by the implementation. It shall be at least `alignof(max_align_t)`.
53 |
54 | [ *Example*:
55 |
56 |
57 | struct keep_apart {
58 |   alignas(hardware_destructive_interference_size) atomic<int> cat;
59 |   alignas(hardware_destructive_interference_size) atomic<int> dog;
60 | };
61 |
62 |
63 | — *end example* ]
64 |
65 | inline constexpr size_t hardware_constructive_interference_size = implementation-defined;
66 |
67 | This number is the maximum recommended size of contiguous memory occupied by
68 | two objects accessed with temporal locality by concurrent threads. It shall be
69 | at least `alignof(max_align_t)`.
70 |
71 | [ *Example*:
72 |
73 |
74 | struct together {
75 |   atomic<int> dog;
76 |   int puppy;
77 | };
78 | struct kennel {
79 |   // Other data members...
80 |   alignas(sizeof(together)) together pack;
81 |   // Other data members...
82 | };
83 | static_assert(sizeof(together) <= hardware_constructive_interference_size);
84 |
85 |
86 | — *end example* ]
87 |
88 |
89 |
90 | Discussions {#discussions}
91 | ===========
92 |
93 | The paper was discussed in:
94 |
95 | * [SG1 Kona](http://wiki.edg.com/bin/view/Wg21kona2015/N4523)
96 | * [LEWG Kona](http://wiki.edg.com/bin/view/Wg21kona2015/P0154)
97 | * [LEWG Jacksonville](http://wiki.edg.com/bin/view/Wg21jacksonville/P0154)
98 | * [LWG Jacksonville](http://wiki.edg.com/bin/view/Wg21jacksonville/D0154R1)
99 |
100 | ABI issues were considered in these discussions, and the committee decided that
101 | having these values was worth the potential pain points. ABI issues can arise as
102 | follows:
103 |
104 | 1. A developer asks the compiler to generate code for multiple targets of the
105 | same ISA, and these targets prefer different interference sizes.
106 | 1. A developer indicates that code should be generated for a heterogeneous system
107 | (such as CPU and GPU) whose components prefer different interference sizes.
108 | 1. A developer uses different compilers and links the results together.
109 |
110 | A further ABI issue was introduced by [[P0607r0]] by making the variables `inline`:
111 | in case 1 above, the interference size values differ between translation units,
112 | which is a problem if they are used in an ODR-relevant context. That paper noted:
113 |
114 |
115 |
116 | [*Drafting notes*: The removal of the explicit `static` specifier for the
117 | namespace-scope constants `hardware_destructive_interference_size` and
118 | `hardware_constructive_interference_size` is still required because adding
119 | `inline` alone would still not solve the ODR violation problem here.
120 | — *end drafting notes*]
121 |
122 |
123 |
124 | This change indeed fixes the ODR issue where two translation units compiled
125 | with the same interference size values may violate the ODR when the values are used with, e.g.,
126 | `std::max`, which binds its arguments to `const` references. It does, however, introduce a new ODR issue for case 1 above.
127 |
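The odr-use point can be made concrete. The sketch below contrasts two kinds of namespace-scope `constexpr` variable. A non-`inline` one has internal linkage. An `inline` one is a single, external-linkage object. The namespace and function names are hypothetical, and 64 is an assumed value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Imagine this is a header included from two translation units.

namespace internal_linkage {
// Non-inline constexpr at namespace scope is implicitly const, hence has
// internal linkage: every translation unit gets its own object.
constexpr std::size_t interference_size = 64;
}

namespace one_definition {
// `inline` gives a single object shared by all translation units.
inline constexpr std::size_t interference_size = 64;
}

// std::max takes its arguments by const reference, so this call odr-uses the
// variable. This inline function would then refer to a different object in
// each translation unit, which violates the ODR.
inline std::size_t padded_size_risky(std::size_t n) {
    return std::max(n, internal_linkage::interference_size);
}

// Every translation unit refers to the same object here. If translation units
// are built with *different* values, though, the inline variable itself has
// conflicting definitions: the new issue for case 1.
inline std::size_t padded_size(std::size_t n) {
    return std::max(n, one_definition::interference_size);
}
```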
128 | Richard Smith and Tim Song propose changing the definition to:
129 |
130 |
131 | static constexpr const std::size_t& hardware_destructive_interference_size = implementation-defined;
132 | static constexpr const std::size_t& hardware_constructive_interference_size = implementation-defined;
133 |
134 |
135 | We propose a discussion and poll on this topic.
136 |
137 |
138 | Pushback {#push}
139 | ========
140 |
141 | The maintainers of clang and GCC
142 | have
143 | [discussed an implementation strategy](http://lists.llvm.org/pipermail/cfe-dev/2018-May/058073.html),
144 | but received pushback based on the above ABI issues. The committee's messaging
145 | didn't make clear that ABI issues were discussed and that the proposal was accepted
146 | despite them. Because this type of ABI problem is difficult or impossible to
147 | warn about, some implementors are worried.
148 |
149 | Some implementors are worried that they have the following choices when
150 | implementing, and are unsure which approach to take:
151 |
152 | 1. Pick a value once for each ABI and cast it in stone forever, even if
153 | microarchitectural revisions cause the values to change.
154 | 1. Change the value between microarchitectures, even though that's an ABI
155 | break?
156 | 1. Something else.
157 |
158 | The authors believe that the ABI issues are acceptable because:
159 |
160 | * As demonstrated in the original paper, developers already write code like
161 | this, using macros. Any ABI issues that exist with this proposal already
162 | existed before the proposal.
163 | * Many uses of these values have no ABI breakage potential because they only
164 | target one variant of one ISA.
165 | * The use case for these values is to lay out data structures. These
166 | data structures shouldn't be shared across translation units which follow
167 | different ABIs.
168 | * Similar ABI issues already exist with `max_align_t` and `intmax_t`.
169 | * Implementations can offer compiler flags which specifically control ABI. For
170 | example, `-mcpu` could keep the ABI stable, but `-mcpu-abi` would change it.
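The macro-based practice mentioned in the first bullet looks roughly like the sketch below. It uses the Standard values when the library defines the `__cpp_lib_hardware_interference_size` feature-test macro. Otherwise it falls back to an assumed 64 bytes. The names `destructive_size` and `counters` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>  // declares the interference sizes when the library provides them

// Prefer the Standard value; otherwise fall back to an assumed 64 bytes.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t destructive_size = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t destructive_size = 64;  // assumption, not portable
#endif

// Two counters written by different threads, kept apart to avoid false
// sharing. This struct's layout depends on the value chosen above, which is
// exactly the ABI exposure discussed in this paper.
struct counters {
    alignas(destructive_size) std::atomic<int> produced{0};
    alignas(destructive_size) std::atomic<int> consumed{0};
};

static_assert(alignof(counters) >= destructive_size);
static_assert(sizeof(counters) >= 2 * destructive_size);
```

Code like this already bakes an implementation quantity into data-structure layout today. The Standard names only make that dependency explicit.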
171 |
172 | Polls {#polls}
173 | =====
174 |
175 | We propose the following poll for SG1:
176 |
177 | > The committee understands the ABI issues with `std::hardware_{constructive,destructive}_interference_size`, yet chooses to standardize these values nonetheless.
178 |
179 | The committee could also consider adding a note to point out ABI issues with
180 | these values. This would be a novel note, since ABI isn't discussed in the
181 | Standard.
182 |
183 | We propose the following poll for SG1, LEWG, and LWG:
184 |
185 | > Both ODR issues should be addressed, the type should therefore be changed to `static constexpr const std::size_t&`.
186 |
187 | Not all authors of this paper are in favor of this direction, but all agree the
188 | discussion is worth having.
189 |
--------------------------------------------------------------------------------
/source/N4523.rst:
--------------------------------------------------------------------------------
1 | ===================================================================
2 | N4523 ``constexpr std::thread::hardware_{true,false}_sharing_size``
3 | ===================================================================
4 |
5 | :Author: JF Bastien
6 | :Contact: jfb@google.com
7 | :Author: Olivier Giroux
8 | :Contact: ogiroux@nvidia.com
9 | :Date: 2015-05-21
10 | :URL: https://github.com/jfbastien/papers/blob/master/source/N4523.rst
11 |
12 | ---------
13 | Rationale
14 | ---------
15 |
16 | Starting with C++11, the library includes
17 | ``std::thread::hardware_concurrency()`` to provide an implementation quantity
18 | useful in the design of control structures in multi-threaded programs: the
19 | extent of threads that do not interfere (to the first-order). Established
20 | practice throughout the industry also relies on a second implementation
21 | quantity, used instead in the design of data structures in the same programs.
22 | This quantity is the granularity of memory that does not interfere (to the
23 | first-order), commonly referred to as the *cache-line size*.
24 |
25 | Uses of *cache-line size* fall into two broad categories:
26 |
27 | * Avoiding false-sharing between objects with temporally disjoint runtime access
28 | patterns from different threads, e.g. producer-consumer queues.
29 | * Promoting true-sharing between objects which have temporally local runtime
30 | access patterns, e.g. the ``barrier`` example, as illustrated in N4522_.
31 |
32 | .. _N4522: http://wg21.link/N4522
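As an illustration of the first category, a single-producer, single-consumer queue can keep its two indices from false-sharing. The sketch below assumes a 64-byte false-sharing size. That size is held in the hypothetical ``assumed_false_sharing_size``, standing in for the quantity proposed here. ``spsc_queue`` is likewise a hypothetical name.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Assumed false-sharing size; the proposed implementation quantity would
// replace this hard-coded value.
constexpr std::size_t assumed_false_sharing_size = 64;

// Hypothetical single-producer, single-consumer ring buffer. The producer
// writes `tail_`, the consumer writes `head_`; aligning each index to its own
// region keeps one thread's writes from invalidating the other's cache line.
template <class T, std::size_t N>
class spsc_queue {
    alignas(assumed_false_sharing_size) std::atomic<std::size_t> head_{0};
    alignas(assumed_false_sharing_size) std::atomic<std::size_t> tail_{0};
    std::array<T, N> slots_{};

public:
    bool push(const T& value) {  // producer thread only
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail - head_.load(std::memory_order_acquire) == N) return false;  // full
        slots_[tail % N] = value;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    std::optional<T> pop() {  // consumer thread only
        std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T value = slots_[head % N];
        head_.store(head + 1, std::memory_order_release);  // free the slot
        return value;
    }
};
```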
33 |
34 | The most significant issue with this useful implementation quantity is the
35 | questionable portability of the methods used in current practice to determine
36 | its value, despite their pervasiveness and popularity as a group. In the
37 | appendix_ we review several different compile-time and run-time methods. The
38 | portability problem with most of these methods is that they expose a
39 | micro-architectural detail without accounting for the intent of the implementors
40 | (such as we are) over the life of the ISA or ABI.
41 |
42 | We aim to contribute a modest invention for this cause: abstractions for this
43 | quantity that can be conservatively defined for given purposes by
44 | implementations:
45 |
46 | * *False-sharing size*: a number that's suitable as an offset between two
47 | objects to likely avoid false-sharing due to different runtime access patterns
48 | from different threads.
49 | * *True-sharing size*: a number that's suitable as a limit on two objects'
50 | combined memory footprint size and base alignment to likely promote
51 | true-sharing between them.
52 |
53 | In both cases these values are provided on a quality of implementation basis,
54 | purely as hints that are likely to improve performance. These are ideal portable
55 | values to use with the ``alignas()`` keyword, for which there are currently
56 | nearly no standard-supported portable uses.
57 |
58 | -----------------
59 | Proposed addition
60 | -----------------
61 |
62 | We propose adding the following to the standard:
63 |
64 | Under 30.3.1 Class ``thread`` [**thread.thread.class**]:
65 |
66 | .. code-block:: c++
67 |
68 | namespace std {
69 | class thread {
70 | // ...
71 | public:
72 | static constexpr size_t hardware_false_sharing_size = /* implementation-defined */;
73 | static constexpr size_t hardware_true_sharing_size = /* implementation-defined */;
74 | // ...
75 | };
76 | }
77 |
78 | Under 30.3.1.6 ``thread`` static members [**thread.thread.static**]:
79 |
80 | ``constexpr size_t hardware_false_sharing_size = /* implementation-defined */;``
81 |
82 | This number is the minimum recommended offset between two concurrently-accessed
83 | objects to avoid additional performance degradation due to contention introduced
84 | by the implementation.
85 |
86 | [*Example:*
87 |
88 | .. code-block:: c++
89 |
90 | struct apart {
91 |   alignas(hardware_false_sharing_size) atomic<int> flag1, flag2;
92 | };
93 |
94 | — *end example*]
95 |
96 | ``constexpr size_t hardware_true_sharing_size = /* implementation-defined */;``
97 |
98 | This number is the minimum recommended alignment and maximum recommended size of
99 | contiguous memory occupied by two objects accessed with temporal locality by
100 | concurrent threads.
101 |
102 | [*Example:*
103 |
104 | .. code-block:: c++
105 |
106 | struct alignas(hardware_true_sharing_size) colocated {
107 |   atomic<int> flag;
108 |   int tinydata;
109 | };
110 | static_assert(sizeof(colocated) <= hardware_true_sharing_size);
111 |
112 | — *end example*]
113 |
114 | The ``__cpp_lib_thread_hardware_sharing_size`` feature test macro should be
115 | added.
116 |
117 | .. _appendix:
118 |
119 | --------
120 | Appendix
121 | --------
122 |
123 | Compile-time *cache-line size*
124 | ==============================
125 |
126 | We informatively list a few ways in which the L1 *cache-line size* is obtained
127 | in different open-source projects at compile-time.
128 |
129 | The Linux kernel defines the ``__cacheline_aligned`` macro which is configured
130 | for each architecture through ``L1_CACHE_BYTES``. On some architectures this
131 | value is determined through the configure-time option
132 | ``CONFIG__L1_CACHE_SHIFT``, and on others the value of ``L1_CACHE_SHIFT``
133 | is hard-coded in the architecture's ``include/asm/cache.h`` header.
134 |
135 | Many open-source projects from Google contain a ``base/port.h`` header which
136 | defines the ``CACHELINE_ALIGNED`` macro based on an explicit list of
137 | architecture detection macros. These header files have often diverged. A token
138 | example from the autofdo_ project is:
139 |
140 | .. _autofdo: https://github.com/google/autofdo/blob/master/base/port.h
141 |
142 | .. code-block:: c++
143 |
144 | // Cache line alignment
145 | #if defined(__i386__) || defined(__x86_64__)
146 | #define CACHELINE_SIZE 64
147 | #elif defined(__powerpc64__)
148 | // TODO(dougkwan) This is the L1 D-cache line size of our Power7 machines.
149 | // Need to check if this is appropriate for other PowerPC64 systems.
150 | #define CACHELINE_SIZE 128
151 | #elif defined(__arm__)
152 | // Cache line sizes for ARM: These values are not strictly correct since
153 | // cache line sizes depend on implementations, not architectures. There
154 | // are even implementations with cache line sizes configurable at boot
155 | // time.
156 | #if defined(__ARM_ARCH_5T__)
157 | #define CACHELINE_SIZE 32
158 | #elif defined(__ARM_ARCH_7A__)
159 | #define CACHELINE_SIZE 64
160 | #endif
161 | #endif
162 |
163 | #ifndef CACHELINE_SIZE
164 | // A reasonable default guess. Note that overestimates tend to waste more
165 | // space, while underestimates tend to waste more time.
166 | #define CACHELINE_SIZE 64
167 | #endif
168 |
169 | #define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE_SIZE)))
170 |
171 | Runtime *cache-line size*
172 | =========================
173 |
174 | We informatively list a few ways in which the L1 *cache-line size* can be
175 | obtained on different operating systems and architectures at runtime.
176 |
177 | On OSX one would use:
178 |
179 | .. code-block:: c++
180 |
181 | sysctlbyname("hw.cachelinesize", &cacheline_size, &sizeof_cacheline_size, 0, 0);
182 |
183 | On Windows one would use:
184 |
185 | .. code-block:: c++
186 |
187 | GetLogicalProcessorInformation(&buf[0], &sizeof_buf);
188 | for (i = 0; i != sizeof_buf / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); ++i)
189 |   if (buf[i].Relationship == RelationCache && buf[i].Cache.Level == 1)
190 |     cacheline_size = buf[i].Cache.LineSize;
191 |
192 | On Linux one would either use:
193 |
194 | .. code-block:: c++
195 |
196 |     p = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
197 |     fscanf(p, "%d", &cacheline_size);
198 |
199 | or:
200 |
201 | .. code-block:: c++
202 |
203 |     cacheline_size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
204 |
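The two Linux approaches can be combined into a single query. The following is a hedged sketch, not a definitive implementation:

* ``l1_dcache_line_size`` is a hypothetical name.
* ``_SC_LEVEL1_DCACHE_LINESIZE`` is a glibc extension, so its use is guarded with ``#ifdef``.
* A non-positive result from either source is treated as "unknown".
* The final fallback is the 64-byte default guess from the compile-time snippet.

```cpp
#include <cstdio>
#include <unistd.h>

// Hypothetical helper: returns the L1 data cache-line size in bytes on Linux.
// Falls back to 64 (the compile-time default guess) when neither sysconf nor
// sysfs reports a usable value.
long l1_dcache_line_size() {
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
  // glibc extension; may return 0 or -1 when the value is unknown.
  long from_sysconf = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
  if (from_sysconf > 0) return from_sysconf;
#endif
  // Same sysfs file as the first Linux snippet; index0 is assumed to be
  // the L1 data cache, as in that snippet.
  if (std::FILE* f = std::fopen(
          "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size",
          "r")) {
    long from_sysfs = 0;
    int matched = std::fscanf(f, "%ld", &from_sysfs);
    std::fclose(f);
    if (matched == 1 && from_sysfs > 0) return from_sysfs;
  }
  return 64;  // reasonable default guess
}
```

Containers and virtual machines sometimes report 0 through ``sysconf`` or hide ``/sys``, which is why the sketch keeps a last-resort default.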
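205 | On x86 one would use the ``CPUID`` instruction with ``EAX = 80000005h``, which
206 | leaves the result in ``ECX``, which needs further work to extract: on AMD processors the L1 data *cache-line size* in bytes is held in bits 7:0, while Intel documents this leaf as reserved.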
207 |
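The "further work" for the AMD layout amounts to masking the low byte. This is a minimal sketch that assumes the ``ECX`` value has already been obtained, for example through a compiler intrinsic, which is not shown here. The ``ECX`` value in the example is hypothetical. It encodes a 32 KiB, 8-way L1 data cache with one line per tag and 64-byte lines, with fields listed from bit 31 down to bit 0.

```cpp
#include <cstdint>

// Extracts the L1 data cache-line size in bytes from the ECX value returned
// by CPUID leaf 0x80000005, assuming AMD's register layout (bits 7:0 hold the
// line size). On processors that treat the leaf as reserved, ECX is 0, so the
// result is 0, meaning "not reported".
constexpr std::uint32_t amd_l1d_line_size_from_ecx(std::uint32_t ecx) {
  return ecx & 0xFFu;
}

// Hypothetical ECX: 0x20 (32 KiB) | 0x08 (8-way) | 0x01 (lines/tag) | 0x40.
static_assert(amd_l1d_line_size_from_ecx(0x20080140u) == 64,
              "bits 7:0 encode a 64-byte line");
```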
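208 | On ARM one would use ``mrs %[ctr], ctr_el0`` (an AArch64 register), which needs further work to
209 | extract: the ``DminLine`` field, bits 19:16, holds the base-2 logarithm of the smallest data *cache-line size* measured in 4-byte words.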
210 |
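Decoding ``DminLine`` can be sketched as follows. The decoder is portable. The register read is compiled only on AArch64 and assumes that the operating system permits user-space reads of ``CTR_EL0``, which Linux does. The example ``CTR_EL0`` value is hypothetical.

```cpp
#include <cstdint>

// Smallest data cache-line size in bytes, decoded from a CTR_EL0 value.
// DminLine (bits 19:16) is log2 of the line size in 4-byte words.
constexpr std::uint32_t dcache_line_size_from_ctr_el0(std::uint64_t ctr) {
  return 4u << ((ctr >> 16) & 0xFu);
}

#if defined(__aarch64__)
// Reads CTR_EL0 directly; assumes EL0 access is allowed by the OS.
inline std::uint64_t read_ctr_el0() {
  std::uint64_t ctr;
  __asm__ __volatile__("mrs %0, ctr_el0" : "=r"(ctr));
  return ctr;
}
#endif

// Hypothetical CTR_EL0 with DminLine = 4: 4 bytes << 4 = 64-byte lines.
static_assert(dcache_line_size_from_ctr_el0(0x84448004u) == 64, "DminLine");
```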
--------------------------------------------------------------------------------
/source/P1018R19.bs:
--------------------------------------------------------------------------------
1 |
2 | Title: C++ Language Evolution status
3 | Shortname: P1018
4 | Revision: 19
5 | Audience: WG21, EWG
6 | Status: P
7 | Group: WG21
8 | URL: http://wg21.link/P1018r19
9 | !Source: github.com/jfbastien/papers/blob/master/source/P1018r19.bs
10 | Editor: JF Bastien, Woven Planet, cxx@jfbastien.com
11 | Date: 2022-11-15
12 | Markup Shorthands: markdown yes
13 | Toggle Diffs: no
14 | No abstract: false
15 | Abstract: This paper is a collection of items that the C++ Language Evolution group has worked on in the latest meeting, their status, and plans for the future.
16 |
17 |
18 |
21 |
22 | Executive summary {#summary}
23 | =================
24 |
25 | The Evolution Working Group did not meet in person between the February 2020 meeting in Prague and the November 2022 meeting in Kona. You will find EWG's pandemic activities in [[P1018r18]].
26 |
27 | This paper summarizes all of the work that was performed in the November 2022 Kona meeting.
28 |
29 | Work Performed {#work}
30 | ==============
31 |
32 | This meeting was the first towards finalizing C++23, see [[P1000r4]] for the full schedule. In the ISO process, we received a variety of comments from different National Bodies. The full list is tracked as GitHub issues. EWG received 33 National Body comments. Of those, 16 were closed as duplicates, and 17 were reviewed with the following outcomes:
33 |
34 |
2 need to come back to EWG (to be seen in telecons or at the next meeting): FR-025-017, US 8-036
39 |
40 |
41 | Separately from finalizing C++23, we’ve continued early work towards C++26 and later. We track outstanding proposals in GitHub as well; here are the ones for EWG which are ready to review. EWG and its incubator EWGI started the week with 83 papers to review (some not for the first time), so EWG had to prioritize using a variety of criteria, such as the C++ Direction Group’s recommendations in [[P2000r4]]. During the week, EWG forwarded the following papers to CWG for C++26:
42 |
43 |
[[P1061R0]] Structured Bindings can introduce a Pack
44 |
[[P2361R0]] Unevaluated string literals
45 |
[[P2014R0]] aligned allocation of coroutine frames
46 |
[[P0609R1]] Attributes for Structured Bindings
47 |
[[P2558R0]] Add @, $, and ` to the basic character set
48 |
[[P2621R0]] UB? In my Lexer?
49 |
[[P2686R0]] Updated wording and implementation experience for P1481 (constexpr structured bindings)
50 |
[[P1967R0]] #embed - a simple, scannable preprocessor-based resource acquisition method
51 |
[[P2593R0]] Allowing static_assert(false): To be forwarded after the next meeting unless a better proposal comes up
52 |
53 | This doesn’t mean that they will all be in C++26, they are only tentatively on track to be in C++26.
54 |
55 | The following papers were reviewed and forwarded to LEWG, the library evolution group, meaning that either EWG sees no need for language input, or provided language input to the library group, or requests library input to further the language work:
56 |
57 |
[[P2641R0]] Checking if a union alternative is active
58 |
[[P2546R0]] Debugging Support
59 |
[[P0876R5]] fiber_context - fibers without scheduler
60 |
[[P2141R0]] Aggregates are named tuples
61 |
62 |
63 | The following papers were reviewed and encouraged to come back with an update:
64 |
[[P2547R0]] Language support for customisable functions
82 |
[[P2632R0]] A plan for better template meta programming facilities in C++26
83 |
[[P2671R0]] Syntax choices for generalized pack declaration and usage
84 |
85 |
86 | The following papers were reviewed and had no consensus for further work:
87 |
88 |
[[P2669R0]] Deprecate changing kind of names in class template specializations
89 |
[[P2174R0]] Compound Literals
90 |
[[P2381R0]] Pattern Matching with Exception Handling
91 |
92 |
93 | CWG asked for EWG feedback on:
94 |
95 |
[[CWG2463]] Conditions for trivially copyable classes, the conclusion was that a paper was needed to address the issue
96 |
97 |
98 | The committee also tracks defects through various groups. EWG issues were tracked in [[P1018r18]], and will shortly move to GitHub. This week we reviewed EWG issues as follows:
99 |
100 |
2 Marked Resolved
101 |
1 Marked as “Needs a Paper”
102 |
17 Closed as “Not A Defect”
103 |
104 |
105 | EWG hosted an evening session on “the future of C++”. It was well attended, with 100+ participants and much frank discussion. The results will be shared in a few weeks, once the committee has discussed them internally, based on feedback from the survey that was sent to attendees.
106 |
107 | A session on [[P2676r0]], The Val object model, was held so that C++ committee members could learn about the work David Abrahams is doing at Adobe on the Val language. We separately heard from Herb Sutter on CppFront. We also had good engagement from a few folks who have worked on the Carbon programming language. As this is the C++ committee, we also often talk about languages such as Rust, Circle, Zig and others.
108 |
109 |
2 | Title: The Curious Case of Padding Bits, Featuring Atomic Compare-and-Exchange
3 | Shortname: P0528
4 | Revision: 1
5 | Audience: SG1, EWG, CWG
6 | Status: P
7 | Group: WG21
8 | URL: http://wg21.link/P0528r1
9 | !Source: github.com/jfbastien/papers/blob/master/source/P0528r1.bs
10 | Editor: JF Bastien, Apple, jfbastien@apple.com
11 | Editor: Michael Spencer, Sony Playstation, bigcheesegs@gmail.com
12 | Abstract: Compare-and-exchange on a struct with padding bits should Just Work.
13 | Date: 2018-02-11
14 | Markup Shorthands: markdown yes
15 |
16 |
17 | This issue has been discussed by the authors at every recent Standards meeting,
18 | yet a full solution has been elusive despite helpful proposals. We believe that
19 | this proposal can fix this oft-encountered problem once and for all.
20 |
21 | [[P0528r0]] details extensive background on this problem (not repeated here),
22 | and proposed standardizing a trait, `has_padding_bits`, and using it on
23 | `compare_and_exchange_*`. This paper applies EWG guidance and simply adds a
24 | note.
25 |
26 |
27 | Edit History {#edit}
28 | ============
29 |
30 | r0 → r1 {#r0r1}
31 | -------
32 |
33 | In Albuquerque, EWG voted to make the padding bits of `atomic` and the incoming
34 | value of `T` have a consistent value for the purposes of read/modify/write
35 | atomic operations.
36 |
37 | Purposefully not addressed in this paper:
38 |
39 | * `union` with padding bits
40 | * Types with trap representations
41 |
42 | Proposed Wording {#word}
43 | ================
44 |
45 | In Operations on atomic types [**atomics.types.operations**], insert a new
46 | paragraph after the note in ❡1:
47 |
48 |
49 |
50 | [*Note:* Many operations are volatile-qualified. The "volatile as device
51 | register" semantics have not changed in the standard. This qualification means
52 | that volatility is preserved when applying these operations to volatile objects.
53 | It does not mean that operations on non-volatile objects become volatile. —*end
54 | note*]
55 |
56 |
57 |
58 | Atomic operations, both through `atomic` and free functions, can be performed
59 | on types `T` which contain bits that never participate in the object's
60 | representation. In such cases an implementation shall ensure that
61 | initialization, assignment, store, exchange, and read-modify-write operations
62 | replace bits which never participate in the object's representation with an
63 | implementation-defined value. A compatible implementation-defined value shall be
64 | used for compare-and-exchange operations' copy of the `expected` value.
65 |
66 | As a consequence, the following code is guaranteed to avoid spurious failure:
67 |
68 |
69 |
70 | struct padded {
71 |   char c = 0x42;
72 |   // Padding here.
73 |   unsigned i = 0xC0DEFEFE;
74 | };
75 | atomic<padded> pad = ATOMIC_VAR_INIT({});
76 |
77 | bool success() {
78 |   padded expected, desired { 0, 0 };
79 |   return pad.compare_exchange_strong(expected, desired);
80 | }
81 |
82 |
83 |
84 | [*Note:*
85 |
86 | Types which contain bits that sometimes participate in the object's
87 | representation, such as a `union` containing a type with padding bits and a
88 | type without, may always fail compare-and-exchange when these bits are not
89 | participating in the object's representation because they have an
90 | indeterminate value. Such a program is ill-formed, no diagnostic required.
91 |
92 | —*end note*]
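This wording matters because compare-and-exchange is described in terms of `memcmp`/`memcpy`. The sketch below is a minimal illustration of the underlying problem. It assumes a common ABI on which `padded` has three padding bytes after `c`; the standard does not guarantee this layout. Under that assumption, two objects whose members are all equal can still differ byte-wise. The function name is hypothetical, and the struct omits the example's default member initializers so that the padding bytes can be seeded with `memset`.

```cpp
#include <cstring>

// Same member layout as `padded` in the example above, without default
// member initializers. On common ABIs, padding follows `c` so that `i` is
// 4-byte aligned.
struct padded {
  char c;
  unsigned i;
};

// Returns true when two objects with equal members nonetheless have
// different object representations. In that case, a memcmp-based
// compare-and-exchange could fail even though every member matches.
bool members_equal_but_bytes_differ() {
  padded a, b;
  std::memset(&a, 0x00, sizeof a);  // padding bytes of `a` start as 0x00
  std::memset(&b, 0xFF, sizeof b);  // padding bytes of `b` start as 0xFF
  a.c = b.c = 0x42;
  a.i = b.i = 0xC0DEFEFEu;
  bool members_equal = a.c == b.c && a.i == b.i;
  return members_equal && std::memcmp(&a, &b, sizeof a) != 0;
}
```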
93 |
94 |
95 |
96 |
97 |
98 | Edit ❡17 and onwards as follows:
99 |
100 |
101 |
102 | *Requires:* The `failure` argument shall not be `memory_order::release` nor
103 | `memory_order::acq_rel`.
104 |
105 | *Effects:* Retrieves the value in `expected`. Bits in the retrieved value
106 | which never participate in the object's representation are set to a value
107 | compatible with that previously stored in the atomic object. It then
108 | atomically compares the contents of the memory pointed to by `this` for equality
109 | with that previously retrieved from `expected`, and if true, replaces the
110 | contents of the memory pointed to by `this` with that in `desired`. If and only
111 | if the comparison is true, memory is affected according to the value of
112 | `success`, and if the comparison is false, memory is affected according to the
113 | value of `failure`. When only one `memory_order` argument is supplied, the value
114 | of `success` is `order`, and the value of `failure` is `order` except that a
115 | value of `memory_order::acq_rel` shall be replaced by the value
116 | `memory_order::acquire` and a value of `memory_order::release` shall be replaced
117 | by the value `memory_order::relaxed`. If and only if the comparison is false
118 | then, after the atomic operation, the contents of the memory in `expected` are
119 | replaced by the value read from the memory pointed to by `this` during the
120 | atomic comparison. If the operation returns `true`, these operations are atomic
121 | read-modify-write operations on the memory pointed to by `this`. Otherwise,
122 | these operations are atomic load operations on that memory.
123 |
124 | *Returns:* The result of the comparison.
125 |
126 | [*Note:*
127 |
128 | For example, the effect of `compare_exchange_strong` is
129 |
130 |
131 |
132 | if (memcmp(this, &expected, sizeof(*this)) == 0)
133 |   memcpy(this, &desired, sizeof(*this));
134 | else
135 |   memcpy(&expected, this, sizeof(*this));
136 |
137 |
138 |
139 | —*end note*]
140 |
141 | [*Example:*
142 |
143 | The expected use of the compare-and-exchange operations is as follows. The
144 | compare-and-exchange operations will update `expected` when another iteration
145 | of the loop is needed.
146 |
147 |
148 |
149 | expected = current.load();
150 | do {
151 |   desired = function(expected);
152 | } while (!current.compare_exchange_weak(expected, desired));
153 |
154 |
155 |
156 | —*end example*]
157 |
158 | [*Example:*
159 |
160 | Because the expected value is updated only on failure, code releasing the
161 | memory containing the `expected` value on success will work. E.g. list head
162 | insertion will act atomically and would not introduce a data race in the
163 | following code:
164 |
165 |
166 |
167 | do {
168 |   p->next = head; // make new list node point to the current head
169 | } while (!head.compare_exchange_weak(p->next, p)); // try to insert
170 |
171 |
172 |
173 | —*end example*]
174 |
175 | Implementations should ensure that weak compare-and-exchange operations do not
176 | consistently return `false` unless either the atomic object has value different
177 | from `expected` or there are concurrent modifications to the atomic object.
178 |
179 |
180 | *Remarks:* A weak compare-and-exchange operation may fail spuriously. That is,
181 | even when the contents of memory referred to by `expected` and `this` are equal,
182 | it may return `false` and store back to `expected` the same memory contents that
183 | were originally there.
184 |
185 | [*Note:*
186 |
187 | This spurious failure enables implementation of compare-and-exchange on a
188 | broader class of machines, e.g., load-locked store-conditional machines. A
189 | consequence of spurious failure is that nearly all uses of weak
190 | compare-and-exchange will be in a loop. When a compare-and-exchange is in a
191 | loop, the weak version will yield better performance on some platforms. When a
192 | weak compare-and-exchange would require a loop and a strong one would not, the
193 | strong one is preferable.
194 |
195 | —*end note*]
196 |
197 | [*Note:*
198 |
199 | The `memcpy` and `memcmp` semantics of the compare-and-exchange operations may
200 | result in failed comparisons for values that compare equal with `operator==`
201 | if the underlying type has padding bits which sometimes participate in
202 | the object's representation, trap bits, or alternate representations of
203 | the same value other than those caused by padding bits which never
204 | participate in the object's representation.
205 |
206 | —*end note*]
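The two usage examples in the wording above can be turned into compilable code as follows. This is a hedged sketch: `atomic_update`, `node`, and `push_front` are hypothetical names, and the caller-supplied `function` stands in for the placeholder of the same name in the first example. Both rely on the property the wording describes: on failure, the compare-and-exchange writes the value it observed back into `expected`, so the loop only needs to retry.

```cpp
#include <atomic>

// The retry loop from the first example. `function` computes the new value
// from the one last observed; it may be called more than once.
template <class T, class Function>
T atomic_update(std::atomic<T>& current, Function function) {
  T expected = current.load();
  for (;;) {
    T desired = function(expected);
    if (current.compare_exchange_weak(expected, desired))
      return desired;
    // `expected` now holds the value observed by the failed attempt.
  }
}

// The list head insertion from the second example.
struct node {
  int value;
  node* next;
};

void push_front(std::atomic<node*>& head, node* p) {
  p->next = head.load();  // make the new node point to the current head
  while (!head.compare_exchange_weak(p->next, p)) {
    // On failure, p->next was updated to the head seen by that attempt.
  }
}
```

Both sketches use the default `memory_order::seq_cst`, matching the examples. A real list implementation might choose weaker orderings, but that choice is outside what the wording illustrates.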
207 |
208 |
2 | Title: The Curious Case of Padding Bits, Featuring Atomic Compare-and-Exchange
3 | Shortname: P0528
4 | Revision: 2
5 | Audience: CWG
6 | Status: P
7 | Group: WG21
8 | URL: http://wg21.link/P0528r2
9 | !Source: github.com/jfbastien/papers/blob/master/source/P0528r2.bs
10 | Editor: JF Bastien, Apple, jfbastien@apple.com
11 | Editor: Michael Spencer, Sony Playstation, bigcheesegs@gmail.com
12 | Abstract: Compare-and-exchange on a struct with padding bits should Just Work.
13 | Date: 2018-03-16
14 | Markup Shorthands: markdown yes
15 |
16 |
17 | This issue has been discussed by the authors at every recent Standards meeting,
18 | yet a full solution has been elusive despite helpful proposals. We believe that
19 | this proposal can fix this oft-encountered problem once and for all.
20 |
21 | [[P0528r0]] details extensive background on this problem (not repeated here),
22 | and proposed standardizing a trait, `has_padding_bits`, and using it on
23 | `compare_and_exchange_*`. [[P0528r1]] applied EWG guidance and simply added
24 | wording directing implementations to ensure that the desired behavior occurs. At
25 | SG1's request this paper follows EWG's guidance but uses different wording.
26 |
27 |
28 | Edit History {#edit}
29 | ============
30 |
31 | r1 → r2 {#r1r2}
32 | -------
33 |
34 | In Jacksonville, SG1 supported the paper but suggested an alternate way to
35 | approach the wording than the one EWG proposed in Albuquerque: don't talk about
36 | contents of the memory, but rather discuss the value representation to describe
37 | compare-and-exchange. This paper follows SG1's guidance and offers different
38 | wording, with the intent that the semantics be equivalent. EWG reviewed the
39 | updated wording and voted to support it and forward to Core.
40 |
41 | r0 → r1 {#r0r1}
42 | -------
43 |
44 | In Albuquerque, EWG voted to make the padding bits of `atomic` and the incoming
45 | value of `T` have a consistent value for the purposes of read/modify/write
46 | atomic operations.
47 |
48 | Purposefully not addressed in this paper:
49 |
50 | * `union` with padding bits
51 | * Types with trap representations
52 |
53 | Proposed Wording {#word}
54 | ================
55 |
56 | Edit ❡17 and onwards as follows:
57 |
58 |
59 |
60 | *Requires:* The `failure` argument shall not be `memory_order::release` nor
61 | `memory_order::acq_rel`.
62 |
63 | *Effects:* Retrieves the value in `expected`. It then atomically compares
64 | the value representation of `*this` for equality with that previously
65 | retrieved from `expected`,
66 | and if true, replaces the value representation
67 | of `*this` with that in `desired`. If
68 | and only if the comparison is true, memory is affected according to the value of
69 | `success`, and if the comparison is false, memory is affected according to the
70 | value of `failure`. When only one `memory_order` argument is supplied, the value
71 | of `success` is `order`, and the value of `failure` is `order` except that a
72 | value of `memory_order::acq_rel` shall be replaced by the value
73 | `memory_order::acquire` and a value of `memory_order::release` shall be replaced
74 | by the value `memory_order::relaxed`. If and only if the comparison is false
75 | then, after the atomic operation, the
76 | value representation in `expected` is replaced by the value
77 | representation read from the memory pointed to by `this` during the atomic
78 | comparison. If the operation returns `true`, these operations are atomic
79 | read-modify-write operations on the memory pointed to by `this`. Otherwise,
80 | these operations are atomic load operations on that memory.
81 |
82 | *Returns:* The result of the comparison.
83 |
84 | [*Note:*
85 |
86 | For example, the effect of `compare_exchange_strong` on objects without padding bits is
87 |
88 |
89 |
90 | if (memcmp(this, &expected, sizeof(*this)) == 0)
91 |   memcpy(this, &desired, sizeof(*this));
92 | else
93 |   memcpy(&expected, this, sizeof(*this));
94 |
95 |
96 |
97 | —*end note*]
98 |
99 | [*Example:*
100 |
101 | The expected use of the compare-and-exchange operations is as follows. The
102 | compare-and-exchange operations will update `expected` when another iteration
103 | of the loop is needed.
104 |
105 |
106 |
107 | expected = current.load();
108 | do {
109 |   desired = function(expected);
110 | } while (!current.compare_exchange_weak(expected, desired));
111 |
112 |
113 |
114 | —*end example*]
115 |
116 | [*Example:*
117 |
118 | Because the expected value is updated only on failure, code releasing the
119 | memory containing the `expected` value on success will work. E.g. list head
120 | insertion will act atomically and would not introduce a data race in the
121 | following code:
122 |
123 |
124 |
125 | do {
126 |   p->next = head; // make new list node point to the current head
127 | } while (!head.compare_exchange_weak(p->next, p)); // try to insert
128 |
129 |
130 |
131 | —*end example*]
132 |
133 | Implementations should ensure that weak compare-and-exchange operations do not
134 | consistently return `false` unless either the atomic object has value different
135 | from `expected` or there are concurrent modifications to the atomic object.
136 |
137 |
138 | *Remarks:* A weak compare-and-exchange operation may fail spuriously. That is,
139 | even when the contents of memory referred to by `expected` and `this` are equal,
140 | it may return `false` and store back to `expected` the same memory contents that
141 | were originally there.
142 |
143 | [*Note:*
144 |
145 | This spurious failure enables implementation of compare-and-exchange on a
146 | broader class of machines, e.g., load-locked store-conditional machines. A
147 | consequence of spurious failure is that nearly all uses of weak
148 | compare-and-exchange will be in a loop. When a compare-and-exchange is in a
149 | loop, the weak version will yield better performance on some platforms. When a
150 | weak compare-and-exchange would require a loop and a strong one would not, the
151 | strong one is preferable.
152 |
153 | —*end note*]
154 |
155 | [*Note:*
156 |
157 | The `memcpy` and `memcmp` semantics of the compare-and-exchange operations
158 | may result in failed comparisons for values that compare equal with
159 | `operator==` if the underlying type has padding bits which sometimes
160 | participate in the object's representation, trap bits, or
161 | alternate representations of the same value other than those caused by
162 | padding bits which never participate in the object's representation.
163 | Notably, on implementations conforming to ISO/IEC/IEEE 60559, floating-point
164 | `-0.0` and `+0.0` will not compare equal with `memcmp` but will compare equal
165 | with `operator==`, and NaNs with the same payload will compare equal with
166 | `memcmp` but will not compare equal with `operator==`.
167 |
168 | —*end note*]
169 |
170 |
171 |
172 | [*Note:*
173 |
174 | Compare-and-exchange acts on an object's value representation, ensuring that
175 | padding bits which never participate in the object's representation are ignored.
176 |
177 | As a consequence, the following code is guaranteed to avoid spurious failure:
178 |
179 |
180 |
181 | struct padded {
182 |   char clank = 0x42;
183 |   // Padding here.
184 |   unsigned biff = 0xC0DEFEFE;
185 | };
186 | atomic<padded> pad = ATOMIC_VAR_INIT({});
187 |
188 | bool zap() {
189 |   padded expected, desired { 0, 0 };
190 |   return pad.compare_exchange_strong(expected, desired);
191 | }
192 |
193 |
194 |
195 | —*end note*]
196 |
197 | [*Note:*
198 |
199 | Types which contain bits that sometimes participate in the object's
200 | representation, such as a `union` containing a type with padding bits and a
201 | type without, may always fail compare-and-exchange when these bits are not
202 | participating in the object's representation because they have an
203 | indeterminate value.
204 |
205 | —*end note*]
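The floating-point case in the notes above can be checked directly. This minimal sketch assumes ISO/IEC/IEEE 60559 `double`, which it verifies with `std::numeric_limits<double>::is_iec559`. `same_representation` is a hypothetical helper that performs the byte-wise comparison implied by the `memcmp` semantics.

```cpp
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes ISO/IEC/IEEE 60559 doubles");

// Byte-wise comparison, as in the memcmp semantics of compare-and-exchange.
bool same_representation(double a, double b) {
  return std::memcmp(&a, &b, sizeof(double)) == 0;
}
```

With this helper, `0.0 == -0.0` holds while `same_representation(0.0, -0.0)` does not, because the sign bit differs. Conversely, a NaN never compares equal to itself with `operator==`, yet it has the same object representation as a copy of itself.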
206 |
207 |
208 |
209 |
2 | Title: Language Evolution status after Prague 2020
3 | Shortname: P1018
4 | Revision: 6
5 | Audience: WG21, EWG
6 | Status: P
7 | Group: WG21
8 | URL: http://wg21.link/P1018r6
9 | !Source: github.com/jfbastien/papers/blob/master/source/P1018r6.bs
10 | Editor: JF Bastien, Apple, jfbastien@apple.com
11 | Date: 2020-02-29
12 | Markup Shorthands: markdown yes
13 | Toggle Diffs: no
14 | No abstract: false
15 | Abstract: This paper is a collection of items that language Evolution has worked on in the latest C++ meeting, their status, and plans for the future.
16 |
17 |
18 | Executive summary {#summary}
19 | =================
20 |
21 | * Finalize ballot resolution for C++20, to address National Body comments in [[N4844]].
22 | * Start work on features for C++23 and later.
23 | * Joint session with LEWG on ABI, based on P1863R1.
24 |
25 |
26 | Paper of note {#note}
27 | =============
28 |
29 | * P1000R4 C++ IS schedule
30 | * P0592R4 To boldly suggest an overall plan for C++23
31 | * P1999R0 Process: 2×-🇨🇿 evolutionary material via a Tentatively Ready status
32 | * P2118R0 Documenting Core Undefined or Unspecified Behavior
33 |
34 |
35 | Tentatively ready papers {#tentative}
36 | ========================
37 |
38 | Following our process in P1999, here are the papers that EWG considers tentatively ready for CWG. We'll take a brief look at the next meeting, and if nothing particular concerns anyone, send them to CWG.
39 |
40 | * P1847R2 Make declaration order layout mandated
41 | * P2025R0 Guaranteed copy elision for named return objects
42 | * P1949R2 C++ Identifier Syntax using Unicode Standard Annex 31
43 |
44 | You can follow this list on GitHub.
45 |
46 |
47 | ABI discussion {#abi}
48 | ==============
49 |
50 | We held a joint session with LEWG to discuss ABI, based on P1863R1. The outcome of the discussion was as follows:
51 |
52 | * *To the best of our ability, we should promise users that we won’t break ABI, ever.* Wasn't contended: we disagree with this statement and might break ABI in the future.
53 | * *From now on, we should consider incremental ABI for every C++ release.* Received extremely positive support, with a small minority disagreeing strongly.
54 | * *We should consider a big ABI break for C++23.* Was extremely contended, with a few more people in favor than against. This was insufficient to call consensus.
55 | * *We should consider a big ABI break for C++SOMETHING.* Was positive enough to call consensus, but still had quite substantial opposition, including many disagreeing strongly. Were we to do a big ABI break, we would need to work very hard on consensus building. Indeed, the number of people disagreeing strongly on a poll for a concrete change would block consensus.
56 | * *When we are unable to resolve a conflict between performance and ABI compatibility, we should prioritize performance.* Was still more positive, but also had quite substantial opposition, including many disagreeing strongly. Again, we should consider performance over ABI, but work extremely hard towards consensus building when doing so.
57 |
58 |
59 | National body comments {#nb}
60 | ======================
61 |
62 | * P2003R0 Fixing Internal and External Linkage Entities in Header Units #740
63 | * P2014R0 Proposed resolution for US061/US062 - aligned allocation of coroutine frames #750
64 | * P1884R0 Private Module Partition: An Inconsistent Boundary #729
65 | * P2100R0 Keep unhandled_exception of a promise type mandatory - a response to US062 and FR066
66 | * P2104R0 GB046 Allow caching of evaluations of concept specializations #45
67 |
68 |
69 | C++23 discussions {#cpp23}
70 | =================
71 |
72 | We discussed a few papers which could make it to C++23:
73 |
74 | * P2085R0 Consistent defaulted comparisons
75 | * P0592R4 To boldly suggest an overall plan for C++23
76 | * P1999R0 Process proposal: double-check evolutionary material via a Tentatively Ready status
77 | * P1468R3 Fixed-layout floating-point type aliases
78 | * P1467R3 Extended floating-point types
79 | * P1371R2 Pattern Matching
80 | * P1000R4 C++ IS schedule
81 | * P1726R2 Pointer lifetime-end zap
82 | * P2092R0 Disambiguating Nested-Requirements
83 | * P1040R5 std::embed
84 | * P1677R2 Cancellation is not an Error
85 | * P1401R2 Narrowing contextual conversions to bool
86 | * P0876R10 fiber_context - fibers without scheduler
87 | * P0847R4 Deducing this
88 | * P2082R1 Fixing CTAD for aggregates
89 | * P1774R3 Portable assumptions
90 | * P2118R0 Documenting Core Undefined or Unspecified Behavior
91 | * P0849R2 auto(x): decay-copy in the language
92 | * P2036R0 Changing scope for lambda trailing-return-type
93 | * P2071R0 Named universal character escapes
94 | * P1900R0 Concepts-Adjacent Problems
95 | * P1847R2 Make declaration order layout mandated
96 | * P1393R0 A General Property Customization Mechanism
97 | * P2026R0 A Constituent Study Group for Safety-Critical Applications
98 | * P1938R0 if consteval
99 | * P1955R0 Top Level Is Constant Evaluated
100 | * P2041R0 Deleting variable templates
101 | * P0870R2 A proposal for a type trait to detect narrowing conversions
102 | * P2025R0 Guaranteed copy elision for named return objects
103 | * P2013R0 Freestanding Language: Optional ::operator new
104 | * P1949R2 C++ Identifier Syntax using Unicode Standard Annex 31
105 |
106 | The following papers were scheduled for discussion, but authors requested to delay until the next meeting:
107 |
108 | * P1967R1 #embed - a simple, scannable preprocessor-based resource acquisition method
109 | * P1046R2 Automatically Generate More Operators
110 | * P2049R0 Constraint refinement for special-cased functions
111 |
112 | The following papers were scheduled for discussion, but were seen in SG7 (Reflection), which decided to table them for now:
113 |
114 | * P1733R0 User-friendly and Evolution-friendly Reflection: A Compromise
115 | * P2089R0 Function parameter constraints are fragile
116 |
117 |
118 | Near-future EWG plans {#future}
119 | =====================
120 |
121 | We will continue to work on C++23, prioritizing according to P0592.
122 |
--------------------------------------------------------------------------------
/source/P1225R0.bs:
--------------------------------------------------------------------------------
1 |
15 |
16 | Abstract {#abs}
17 | ========
18 |
19 | I’ve gathered input from a variety of folks involved in graphics at Apple, and here is our joint, considered position regarding the 2D Graphics proposal.
20 |
21 | We’re worried that the 2D Graphics proposal in [[P0267R8]] might be detrimental to developers, students, and users of devices which contain C++ code. Graphics are important to the Apple ecosystem, and we can see them as an important part of C++. However, we don’t think P0267R8 meets the quality bar for acceptance into C++. We want to see the reference implementation prove orthogonality, extensibility, and performance across a handful of platforms.
22 |
23 |
24 | Design {#design}
25 | ======
26 |
27 | Were we to design a 2D Graphics API, we’d do the following:
28 |
29 | 1. Multiple output devices: Memory buffer, Window, SVG, PDF, etc.
30 |
31 |     1. Memory buffer must be directly usable by graphics API
32 |     1. Support types such as `fp16` [[P0303R0]]
33 |     1. Alpha channel support
34 |
35 | 1. Anti-aliasing should come for free where supported
36 | 1. Text
37 | 1. Consistent, DPI-independent, output
38 | 1. Hardware support where available
39 | 1. Reasonable performance
40 | 1. Reasonable power consumption
41 | 1. Color spaces and gamma support
42 | 1. Possibility to build an interactive model with animation on top of the API
43 |
44 | From the current proposal we like:
45 |
46 | 1. 2D Matrix is 3×3, so homogeneous, presented as 2×3 in the API
47 | 1. Decouples display points from actual points
48 | 1. Vector graphics
49 | 1. Compositing properly handled
50 |
51 | Science and teaching {#st}
52 | ====================
53 |
54 | We’ve heard the following reasons for including 2D Graphics in C++:
55 |
56 | 1. Teaching
57 | 1. Scientific plot generation
58 |
59 | We think putting pixels on the screen is great, but we want to do so responsibly.
60 |
61 | Both for science and teaching, we appreciate what’s available through solutions such as Matlab / matplotlib / R / D3.js. These solutions are powerful and match the performance of the language they complement. For C++ we’d expect a solution which is able to deliver performance which at least approaches that of modern graphics frameworks, and surpasses that of Matlab / Python / R / JavaScript.
62 |
63 | As a teaching tool, the current proposal teaches fairly low-level capabilities (i.e. complex things are hard to create) and is missing critical functionality. We fear it will hinder students by teaching them to start everything from scratch, and by not teaching them a few key details.
64 |
65 | As a plotting tool it’s clearly falling short because it can’t label any axis (cf. Tufte). Even if text were supported, the sample libraries for Matlab, Python, R, and JavaScript make it much easier to draw plots. The 2D Graphics proposal is neither capable nor convenient in that regard.
66 |
67 | As a broad generalization, students currently learn data visualization (beyond what Excel + CSV files can do) in Matlab or Python if they do science, in R if they do math, and in JavaScript if they do anything else. We urge Committee members to at least try some of these, for example scatterplot, histogram, wordtree. These aren’t teaching toys and are used, for example, by the New York Times. There’s value in teaching students to pull themselves up from the language’s bootstraps; we therefore think the type of API in the current 2D Graphics library is useful. However, we want to know (i.e. we want to see it prototyped) that higher-level capabilities are also something that can be implemented. We think higher-level capabilities are more useful for teaching, yet we understand that C++ might want to offer lower-level primitives first.
68 |
69 | Abstraction Level {#level}
70 | =================
71 |
72 | When we say the current proposal is too low-level, here are things we’d like to see at least prototyped to know that the proposal can grow into a powerful high-level library:
73 |
74 | * Obtain a window object
75 | * Load / transform / draw asset files
76 | * Complex raster image support (including swizzled surfaces, compression, 2D form clipping, used as texture fill)
77 | * New user-implemented rasterization primitives (such as ellipses or NURBS curve)
78 | * Stacking geometric transforms before drawing (can this be done already?)
79 | * Scissoring / clipping
80 | * Handle user input
81 | * Text support (glyph rasterization (e.g. FreeType), text shaping (e.g. HarfBuzz), string rendering (e.g. Pango)), or something platform-specific (e.g. CoreText on Apple platforms)
82 | * Complex line drawing (e.g. dashed lines, along a path)
83 | * Can all of the offered primitives be implemented directly on hardware using shaders?
84 |
85 | In other words, we understand that a proposal might want to start small and grow more features over time. We want to know that this growth is possible, and that features can be composed into higher-level primitives.
86 |
87 | Missing Details {#missing}
88 | ===============
89 |
90 | When we say the current proposal is missing key details, here is what we want to see in an initial version:
91 |
92 | * It’s unclear that buffering is implementable, and that’s critical to a high-performance implementation. We’d like to see it implemented. We want to see a deferred mode implementation, not just immediate mode.
93 | * Support modern color spaces and gamma.
94 | * DPI independence is needed.
95 | * Display points seem to address individual pixels in the image. We’d like to be able to address at finer granularity (MSAA samples, typographic points, picas).
96 | * We’re not convinced that animation can be supported efficiently (i.e. update a single matrix in the stack of transforms).
97 | * The current proposal doesn’t specify which image formats can be loaded, yet the reference implementation supports PNG, JPEG, and TIFF. This lack of specification makes portability difficult.
98 | * We want to see an implementation generate PDF, SVG, and raster output, as well as output in an OS window. This should be doable portably with zero code changes.
99 |
100 | C++ Aesthetics {#cpp}
101 | ==============
102 |
103 | Aesthetically, this lacks the feel of a C++ standard library. In particular:
104 |
105 | * The dual error handling mechanism, while reminiscent of filesystem, feels quaint in the STL.
106 | * Most APIs seem to be function-oriented and have a C API feel to them.
107 | * We’re surprised that we don’t have iterators / ranges for e.g. a path. We’d expect STL algorithms to work on such primitives.
108 | * We’d like to see linear algebra, trigonometry, and matrix math standardized separately.
109 |
110 | Conclusion {#conc}
111 | ==========
112 |
113 | We want to offer developers a graphics solution which allows usage of the full capabilities of the hardware we ship, without wasting battery life. Were we to ship the 2D Graphics proposal, we’d be putting our and C++’s good name on an API. We want to be sure it doesn’t do developers and users a disservice.
114 |
115 | We’re surprised and worried that the reference implementation on Mac requires X11 and MacPorts. We want to see an implementation that re-uses platform primitives on more than Linux. What was the experience with CoreGraphics?
116 |
117 | The windows + SVG proposal in [[P1062R0]] isn’t terrible. Obtaining a window seems like a simple step forward. SVG has some upsides and a few downsides, but overall we’re positive on it. We like that the proposal leans on existing standards.
118 |
119 | Web view from [[P1108R0]] is trivial to support if specified well, but we don’t think it does what graphics enthusiasts want to do. It might be an interesting proposal, but we think it stands separately from 2D Graphics.
120 |
--------------------------------------------------------------------------------
/source/P0154R0.rst:
--------------------------------------------------------------------------------
1 | ================================================================================
2 | P0154R0 ``constexpr std::hardware_{constructive,destructive}_interference_size``
3 | ================================================================================
4 |
5 | :Author: JF Bastien
6 | :Contact: jfb@google.com
7 | :Author: Olivier Giroux
8 | :Contact: ogiroux@nvidia.com
9 | :Date: 2015-10-24
10 | :Previous: http://wg21.link/N4523
11 | :URL: https://github.com/jfbastien/papers/blob/master/source/P0154R0.rst
12 |
13 | ---------
14 | Rationale
15 | ---------
16 |
17 | Starting with C++11, the library includes
18 | ``std::thread::hardware_concurrency()`` to provide an implementation quantity
19 | useful in the design of control structures in multi-threaded programs: the
20 | extent of threads that do not interfere (to the first-order). Established
21 | practice throughout the industry also relies on a second implementation
22 | quantity, used instead in the design of data structures in the same programs.
23 | This quantity is the granularity of memory that does not interfere (to the
24 | first-order), commonly referred to as the *cache-line size*.
25 |
26 | Uses of *cache-line size* fall into two broad categories:
27 |
28 | * Avoiding destructive interference (false-sharing) between objects with
29 | temporally disjoint runtime access patterns from different
30 | threads, e.g. producer-consumer queues.
31 | * Promoting constructive interference (true-sharing) between objects which have
32 | temporally local runtime access patterns, e.g. the ``barrier`` example, as
33 | illustrated in P0153R0_.
34 |
35 | .. _P0153R0: http://wg21.link/P0153R0
36 |
37 | The most significant issue with this useful implementation quantity is the
38 | questionable portability of the methods used in current practice to determine
39 | its value, despite their pervasiveness and popularity as a group. In the
40 | appendix_ we review several different compile-time and run-time methods. The
41 | portability problem with most of these methods is that they expose a
42 | micro-architectural detail without accounting for the intent of the implementors
43 | (such as we are) over the life of the ISA or ABI.
44 |
45 | We aim to contribute a modest invention to this end: abstractions for this
46 | quantity that implementations can conservatively define for given
47 | purposes:
48 |
49 | * *Destructive interference size*: a number that's suitable as an offset between
50 | two objects to likely avoid false-sharing due to different runtime access
51 | patterns from different threads.
52 | * *Constructive interference size*: a number that's suitable as a limit on two
53 | objects' combined memory footprint size and base alignment to likely promote
54 | true-sharing between them.
55 |
56 | In both cases these values are provided on a quality of implementation basis,
57 | purely as hints that are likely to improve performance. These are ideal portable
58 | values to use with the ``alignas()`` keyword, for which there are currently
59 | almost no standard-supported portable uses.
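For reference, here is roughly what the two uses above look like in today's code, which hard-codes a guess. This is a minimal sketch assuming a 64-byte line; ``kAssumedLine``, ``spsc_indices`` and ``hot_pair`` are hypothetical names, not part of this proposal. Producer and consumer indices are pushed onto separate assumed lines (avoiding destructive interference), while a flag and its payload are kept within one assumed line (promoting constructive interference).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: a 64-byte line. This is the kind of hard-coded,
// micro-architecture-specific guess the proposal aims to replace.
constexpr std::size_t kAssumedLine = 64;

// Destructive interference: the producer writes `head`, the consumer writes
// `tail`. Giving each its own assumed line is meant to avoid false sharing.
struct spsc_indices {
  alignas(kAssumedLine) std::atomic<std::size_t> head{0};
  alignas(kAssumedLine) std::atomic<std::size_t> tail{0};
};

// Constructive interference: a flag and the data it guards are read
// together, so keep them within one assumed line.
struct alignas(kAssumedLine) hot_pair {
  std::atomic<bool> ready{false};
  int payload = 0;
};

static_assert(sizeof(hot_pair) <= kAssumedLine, "pair fits in one assumed line");
```

The weakness is the constant itself: nothing ties 64 to the target, which is the portability problem described above.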
60 |
61 | -----------------
62 | Proposed addition
63 | -----------------
64 |
65 | Below, substitute the `�` character with a number the editor finds appropriate
66 | for the sub-section. We propose adding the following to the standard:
67 |
68 | Under 20.7.2 Header ``<memory>`` synopsis [**memory.syn**]:
69 |
70 | .. code-block:: c++
71 |
72 | namespace std {
73 | // ...
74 | // 20.7.� Hardware interference size
75 | static constexpr size_t hardware_destructive_interference_size = implementation-defined;
76 | static constexpr size_t hardware_constructive_interference_size = implementation-defined;
77 | // ...
78 | }
79 |
80 | Under 20.7.� Hardware interference size [**hardware.interference**]:
81 |
82 | ``constexpr size_t hardware_destructive_interference_size = implementation-defined;``
83 |
84 | This number is the minimum recommended offset between two concurrently-accessed
85 | objects to avoid additional performance degradation due to contention introduced
86 | by the implementation. It shall be a valid alignment value for any type.
87 |
88 | [*Example:*
89 |
90 | .. code-block:: c++
91 |
92 | struct apart {
93 | alignas(hardware_destructive_interference_size) atomic<int> flag1, flag2;
94 | };
95 |
96 | — *end example*]
97 |
98 | ``constexpr size_t hardware_constructive_interference_size = implementation-defined;``
99 |
100 | This number is the minimum recommended alignment of contiguous memory occupied
101 | by two objects accessed with temporal locality by concurrent threads. It shall
102 | be a valid alignment value for any type.
103 |
104 | [*Note:* This number is also the maximum recommended size of contiguous memory
105 | occupied by two objects accessed in this manner. — *end note*]
106 |
107 | [*Example:*
108 |
109 | .. code-block:: c++
110 |
111 | struct alignas(hardware_constructive_interference_size) colocated {
112 | atomic<int> flag;
113 | int tinydata;
114 | };
115 | static_assert(sizeof(colocated) <= hardware_constructive_interference_size);
116 |
117 | — *end example*]
118 |
119 | The ``__cpp_lib_thread_hardware_interference_size`` feature test macro should be
120 | added.
121 |
122 | .. _appendix:
123 |
124 | --------
125 | Appendix
126 | --------
127 |
128 | Compile-time *cache-line size*
129 | ==============================
130 |
131 | We informatively list a few ways in which the L1 *cache-line size* is obtained
132 | in different open-source projects at compile-time.
133 |
134 | The Linux kernel defines the ``__cacheline_aligned`` macro which is configured
135 | for each architecture through ``L1_CACHE_BYTES``. On some architectures this
136 | value is determined through the configure-time option
137 | ``CONFIG_<ARCH>_L1_CACHE_SHIFT``, and on others the value of ``L1_CACHE_SHIFT``
138 | is hard-coded in the architecture's ``include/asm/cache.h`` header.
139 |
140 | Many open-source projects from Google contain a ``base/port.h`` header which
141 | defines the ``CACHELINE_ALIGNED`` macro based on an explicit list of
142 | architecture detection macros. These header files have often diverged. A token
143 | example from the autofdo_ project is:
144 |
145 | .. _autofdo: https://github.com/google/autofdo/blob/master/base/port.h
146 |
147 | .. code-block:: c++
148 |
149 | // Cache line alignment
150 | #if defined(__i386__) || defined(__x86_64__)
151 | #define CACHELINE_SIZE 64
152 | #elif defined(__powerpc64__)
153 | // TODO(dougkwan) This is the L1 D-cache line size of our Power7 machines.
154 | // Need to check if this is appropriate for other PowerPC64 systems.
155 | #define CACHELINE_SIZE 128
156 | #elif defined(__arm__)
157 | // Cache line sizes for ARM: These values are not strictly correct since
158 | // cache line sizes depend on implementations, not architectures. There
159 | // are even implementations with cache line sizes configurable at boot
160 | // time.
161 | #if defined(__ARM_ARCH_5T__)
162 | #define CACHELINE_SIZE 32
163 | #elif defined(__ARM_ARCH_7A__)
164 | #define CACHELINE_SIZE 64
165 | #endif
166 | #endif
167 |
168 | #ifndef CACHELINE_SIZE
169 | // A reasonable default guess. Note that overestimates tend to waste more
170 | // space, while underestimates tend to waste more time.
171 | #define CACHELINE_SIZE 64
172 | #endif
173 |
174 | #define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE_SIZE)))
175 |
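The same detection can be written in standard C++ rather than with the GNU ``__attribute__((aligned))`` extension. This sketch keeps the autofdo values, including the 64-byte default, so the values remain guesses tied to the target rather than values the implementation vouches for; ``kCachelineSize`` and ``padded_counter`` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Same architecture detection as the autofdo macros, as a typed constant.
constexpr std::size_t kCachelineSize =
#if defined(__i386__) || defined(__x86_64__)
    64;
#elif defined(__powerpc64__)
    128;
#elif defined(__arm__) && defined(__ARM_ARCH_5T__)
    32;
#else
    64;  // default guess, as in the original header
#endif

// Standard alignas replaces the CACHELINE_ALIGNED macro.
struct alignas(kCachelineSize) padded_counter {
  long value = 0;
};

static_assert(alignof(padded_counter) == kCachelineSize, "aligned to the guess");
static_assert(sizeof(padded_counter) % kCachelineSize == 0, "padded to whole lines");
```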
176 | Runtime *cache-line size*
177 | =========================
178 |
179 | We informatively list a few ways in which the L1 *cache-line size* can be
180 | obtained on different operating systems and architectures at runtime. Libraries
181 | such as hwloc_ perform these queries, and could also be added to the standard as
182 | a separate proposal.
183 |
184 | .. _hwloc: http://www.open-mpi.org/projects/hwloc/
185 |
186 | On OSX one would use:
187 |
188 | .. code-block:: c++
189 |
190 | sysctlbyname("hw.cachelinesize", &cacheline_size, &sizeof_cacheline_size, 0, 0);
191 |
192 | On Windows one would use:
193 |
194 | .. code-block:: c++
195 |
196 | GetLogicalProcessorInformation(&buf[0], &sizeof_buf);
197 | for (i = 0; i != sizeof_buf / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); ++i) {
198 | if (buf[i].Relationship == RelationCache && buf[i].Cache.Level == 1) cacheline_size = buf[i].Cache.LineSize;
199 | }
200 |
201 | On Linux one would either use:
202 |
203 | .. code-block:: c++
204 |
205 | FILE* p = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
206 | if (p) { fscanf(p, "%d", &cacheline_size); fclose(p); }
207 |
208 | or:
209 |
210 | .. code-block:: c++
211 |
212 | cacheline_size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
213 |
214 | On x86 one would use the ``CPUID`` instruction with ``EAX = 80000005h``, which
215 | leaves the result in ``ECX``, which needs further work to extract.
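A hedged sketch of that extra work, assuming GCC or Clang (which ship ``<cpuid.h>`` with ``__get_cpuid``): the L1 data cache line size is reported in bits 7:0 of ``ECX``. This leaf is documented by AMD; Intel processors treat it as reserved and report zero, so a robust query would also consult other leaves. ``l1d_line_from_ecx`` and ``x86_l1d_line_size`` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>  // GCC/Clang header providing __get_cpuid
#endif

// Bits 7:0 of ECX from leaf 80000005h hold the L1 data cache line size in bytes.
constexpr std::uint32_t l1d_line_from_ecx(std::uint32_t ecx) { return ecx & 0xFFu; }

// Returns 0 when the leaf is unavailable, reserved (reported as zero), or
// when not compiling for x86.
inline std::uint32_t x86_l1d_line_size() {
#if defined(__i386__) || defined(__x86_64__)
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (__get_cpuid(0x80000005u, &eax, &ebx, &ecx, &edx))
    return l1d_line_from_ecx(ecx);
#endif
  return 0;
}
```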
216 |
217 | On ARM one would use ``mrs %[ctr], ctr_el0``, which needs further work to
218 | extract.
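On AArch64 the extra work is decoding the ``DminLine`` field (bits 19:16) of ``CTR_EL0``, which holds log2 of the number of 4-byte words in the smallest data cache line. A hedged sketch, assuming GCC or Clang inline assembly and an OS that permits (or emulates) user-mode reads of ``CTR_EL0``, as Linux does; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// DminLine (bits 19:16) is log2 of the smallest data cache line in 4-byte words.
constexpr std::uint32_t dminline_bytes(std::uint64_t ctr) {
  return 4u << ((ctr >> 16) & 0xFu);
}

// Returns 0 when not compiled for AArch64.
inline std::uint32_t aarch64_min_dcache_line() {
#if defined(__aarch64__)
  std::uint64_t ctr = 0;
  asm volatile("mrs %0, ctr_el0" : "=r"(ctr));
  return dminline_bytes(ctr);
#else
  return 0;
#endif
}
```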
219 |
--------------------------------------------------------------------------------
/source/P0154R1.rst:
--------------------------------------------------------------------------------
1 | ================================================================================
2 | P0154R1 ``constexpr std::hardware_{constructive,destructive}_interference_size``
3 | ================================================================================
4 |
5 | :Author: JF Bastien
6 | :Contact: jfb@google.com
7 | :Author: Olivier Giroux
8 | :Contact: ogiroux@nvidia.com
9 | :Date: 2016-03-03
10 | :Previous: http://wg21.link/N4523
11 | :Previous: http://wg21.link/P0154R0
12 | :URL: https://github.com/jfbastien/papers/blob/master/source/P0154R1.rst
13 |
14 | ---------
15 | Rationale
16 | ---------
17 |
18 | Starting with C++11, the library includes
19 | ``std::thread::hardware_concurrency()`` to provide an implementation quantity
20 | useful in the design of control structures in multi-threaded programs: the
21 | extent of threads that do not interfere (to the first-order). Established
22 | practice throughout the industry also relies on a second implementation
23 | quantity, used instead in the design of data structures in the same programs.
24 | This quantity is the granularity of memory that does not interfere (to the
25 | first-order), commonly referred to as the *cache-line size*.
26 |
27 | Uses of *cache-line size* fall into two broad categories:
28 |
29 | * Avoiding destructive interference (false-sharing) between objects with
30 | temporally disjoint runtime access patterns from different
31 | threads, e.g. producer-consumer queues.
32 | * Promoting constructive interference (true-sharing) between objects which have
33 | temporally local runtime access patterns, e.g. the ``barrier`` example, as
34 | illustrated in P0153R0_.
35 |
36 | .. _P0153R0: http://wg21.link/P0153R0
37 |
38 | The most significant issue with this useful implementation quantity is the
39 | questionable portability of the methods used in current practice to determine
40 | its value, despite their pervasiveness and popularity as a group. In the
41 | appendix_ we review several different compile-time and run-time methods. The
42 | portability problem with most of these methods is that they expose a
43 | micro-architectural detail without accounting for the intent of the implementors
44 | (such as we are) over the life of the ISA or ABI.
45 |
46 | We aim to contribute a modest invention to this end: abstractions for this
47 | quantity that implementations can conservatively define for given
48 | purposes:
49 |
50 | * *Destructive interference size*: a number that's suitable as an offset between
51 | two objects to likely avoid false-sharing due to different runtime access
52 | patterns from different threads.
53 | * *Constructive interference size*: a number that's suitable as a limit on two
54 | objects' combined memory footprint size and base alignment to likely promote
55 | true-sharing between them.
56 |
57 | In both cases these values are provided on a quality of implementation basis,
58 | purely as hints that are likely to improve performance. These are ideal portable
59 | values to use with the ``alignas()`` keyword, for which there are currently
60 | almost no standard-supported portable uses.
61 |
62 | -----------------
63 | Proposed addition
64 | -----------------
65 |
66 | Below, substitute the `�` character with a number the editor finds appropriate
67 | for the sub-section. We propose adding the following to the standard:
68 |
69 | Under 18.6 Header ``<new>`` synopsis [**support.dynamic**]:
70 |
71 | .. code-block:: c++
72 |
73 | namespace std {
74 | // ...
75 | // 18.6.� Hardware interference size
76 | static constexpr size_t hardware_destructive_interference_size = implementation-defined;
77 | static constexpr size_t hardware_constructive_interference_size = implementation-defined;
78 | // ...
79 | }
80 |
81 | Under 18.6.� Hardware interference size [**hardware.interference**]:
82 |
83 | ``constexpr size_t hardware_destructive_interference_size = implementation-defined;``
84 |
85 | This number is the minimum recommended offset between two concurrently-accessed
86 | objects to avoid additional performance degradation due to contention introduced
87 | by the implementation. It shall be at least ``alignof(max_align_t)``.
88 |
89 | [*Example:*
90 |
91 | .. code-block:: c++
92 |
93 | struct keep_apart {
94 | alignas(hardware_destructive_interference_size) atomic<int> cat;
95 | alignas(hardware_destructive_interference_size) atomic<int> dog;
96 | };
97 |
98 | — *end example*]
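This revision requires each value to be at least ``alignof(max_align_t)``. A hedged sketch that checks this property and the ``keep_apart`` layout, assuming a library that provides the constant (detected through C++17's ``__cpp_lib_hardware_interference_size`` macro) and an assumed 64-byte fallback otherwise; ``interference`` and ``cat_dog_distance`` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t interference = std::hardware_destructive_interference_size;
#else
constexpr std::size_t interference = 64;  // assumed fallback
#endif

// The minimum guaranteed by the wording above.
static_assert(interference >= alignof(std::max_align_t),
              "at least alignof(max_align_t)");

struct keep_apart {
  alignas(interference) std::atomic<int> cat{0};
  alignas(interference) std::atomic<int> dog{0};
};

// Distance in bytes between the two members of a keep_apart object.
inline std::size_t cat_dog_distance(const keep_apart& k) {
  return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(&k.dog) -
                                  reinterpret_cast<std::uintptr_t>(&k.cat));
}
```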
99 |
100 | ``constexpr size_t hardware_constructive_interference_size = implementation-defined;``
101 |
102 | This number is the maximum recommended size of contiguous memory occupied by two
103 | objects accessed with temporal locality by concurrent threads. It shall be at
104 | least ``alignof(max_align_t)``.
105 |
106 | [*Example:*
107 |
108 | .. code-block:: c++
109 |
110 | struct together {
111 | atomic<int> dog;
112 | int puppy;
113 | };
114 | struct kennel {
115 | // Other data members...
116 | alignas(sizeof(together)) together pack;
117 | // Other data members...
118 | };
119 | static_assert(sizeof(together) <= hardware_constructive_interference_size);
120 |
121 | — *end example*]
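The ``kennel`` example aligns ``pack`` to ``sizeof(together)``. Aligning an object to its own power-of-two size, no larger than the line, keeps it from straddling a line boundary; but this is only a valid alignment when the size is a power of two (8 bytes on common platforms, for a lock-free ``atomic<int>`` plus an ``int``). A hedged, self-contained version that states this assumption explicitly and falls back to an assumed 64 bytes when the library lacks the constant; ``constructive``, ``other_before`` and ``other_after`` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t constructive = std::hardware_constructive_interference_size;
#else
constexpr std::size_t constructive = 64;  // assumed fallback
#endif

struct together {
  std::atomic<int> dog{0};
  int puppy = 0;
};

// alignas requires a power of two; this holds for the usual 8-byte layout.
static_assert((sizeof(together) & (sizeof(together) - 1)) == 0,
              "sizeof(together) must be a power of two to use as alignment");
static_assert(sizeof(together) <= constructive, "fits the constructive size");

struct kennel {
  char other_before = 0;  // stands in for "other data members"
  alignas(sizeof(together)) together pack;
  char other_after = 0;
};
```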
122 |
123 | The ``__cpp_lib_thread_hardware_interference_size`` feature test macro should be
124 | added.
125 |
126 | .. _appendix:
127 |
128 | --------
129 | Appendix
130 | --------
131 |
132 | Compile-time *cache-line size*
133 | ==============================
134 |
135 | We informatively list a few ways in which the L1 *cache-line size* is obtained
136 | in different open-source projects at compile-time.
137 |
138 | The Linux kernel defines the ``__cacheline_aligned`` macro which is configured
139 | for each architecture through ``L1_CACHE_BYTES``. On some architectures this
140 | value is determined through the configure-time option
141 | ``CONFIG_<ARCH>_L1_CACHE_SHIFT``, and on others the value of ``L1_CACHE_SHIFT``
142 | is hard-coded in the architecture's ``include/asm/cache.h`` header.
143 |
144 | Many open-source projects from Google contain a ``base/port.h`` header which
145 | defines the ``CACHELINE_ALIGNED`` macro based on an explicit list of
146 | architecture detection macros. These header files have often diverged. A token
147 | example from the autofdo_ project is:
148 |
149 | .. _autofdo: https://github.com/google/autofdo/blob/master/base/port.h
150 |
151 | .. code-block:: c++
152 |
153 | // Cache line alignment
154 | #if defined(__i386__) || defined(__x86_64__)
155 | #define CACHELINE_SIZE 64
156 | #elif defined(__powerpc64__)
157 | // TODO(dougkwan) This is the L1 D-cache line size of our Power7 machines.
158 | // Need to check if this is appropriate for other PowerPC64 systems.
159 | #define CACHELINE_SIZE 128
160 | #elif defined(__arm__)
161 | // Cache line sizes for ARM: These values are not strictly correct since
162 | // cache line sizes depend on implementations, not architectures. There
163 | // are even implementations with cache line sizes configurable at boot
164 | // time.
165 | #if defined(__ARM_ARCH_5T__)
166 | #define CACHELINE_SIZE 32
167 | #elif defined(__ARM_ARCH_7A__)
168 | #define CACHELINE_SIZE 64
169 | #endif
170 | #endif
171 |
172 | #ifndef CACHELINE_SIZE
173 | // A reasonable default guess. Note that overestimates tend to waste more
174 | // space, while underestimates tend to waste more time.
175 | #define CACHELINE_SIZE 64
176 | #endif
177 |
178 | #define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE_SIZE)))
179 |
180 | Runtime *cache-line size*
181 | =========================
182 |
183 | We informatively list a few ways in which the L1 *cache-line size* can be
184 | obtained on different operating systems and architectures at runtime. Libraries
185 | such as hwloc_ perform these queries, and could also be added to the standard as
186 | a separate proposal.
187 |
188 | .. _hwloc: http://www.open-mpi.org/projects/hwloc/
189 |
190 | On OSX one would use:
191 |
192 | .. code-block:: c++
193 |
194 | sysctlbyname("hw.cachelinesize", &cacheline_size, &sizeof_cacheline_size, 0, 0);
195 |
196 | On Windows one would use:
197 |
198 | .. code-block:: c++
199 |
200 | GetLogicalProcessorInformation(&buf[0], &sizeof_buf);
201 | for (i = 0; i != sizeof_buf / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION); ++i) {
202 | if (buf[i].Relationship == RelationCache && buf[i].Cache.Level == 1) cacheline_size = buf[i].Cache.LineSize;
203 | }
204 |
205 | On Linux one would either use:
206 |
207 | .. code-block:: c++
208 |
209 | FILE* p = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
210 | if (p) { fscanf(p, "%d", &cacheline_size); fclose(p); }
211 |
212 | or:
213 |
214 | .. code-block:: c++
215 |
216 | cacheline_size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
217 |
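The two Linux approaches can be combined into one hedged sketch: it tries glibc's ``sysconf(_SC_LEVEL1_DCACHE_LINESIZE)`` first, then the sysfs file, and returns 0 when neither yields a value (for example in some containers, or with C libraries that lack that ``sysconf`` name). As in the snippet above, ``index0`` is assumed to describe the L1 data cache, which is common but not guaranteed; ``l1_dcache_line_size`` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#if defined(__linux__)
#include <unistd.h>
#endif

// Returns the L1 data cache line size in bytes, or 0 if it cannot be determined.
inline long l1_dcache_line_size() {
#if defined(__linux__)
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
  long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
  if (n > 0) return n;
#endif
  // Fallback: sysfs, assuming index0 is the L1 data cache.
  if (std::FILE* f = std::fopen(
          "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r")) {
    long m = 0;
    if (std::fscanf(f, "%ld", &m) != 1) m = 0;
    std::fclose(f);
    if (m > 0) return m;
  }
#endif
  return 0;
}
```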
218 | On x86 one would use the ``CPUID`` instruction with ``EAX = 80000005h``, which
219 | leaves the result in ``ECX``, which needs further work to extract.
220 |
221 | On ARM one would use ``mrs %[ctr], ctr_el0``, which needs further work to
222 | extract.
223 |
--------------------------------------------------------------------------------
/source/P0476r2.bs:
--------------------------------------------------------------------------------
1 |
16 |
17 |
18 | This paper is a revision of [[P0476r1]], addressing LEWG comments from the 2017
19 | Toronto meeting as well as comments from LEWG and LWG from the 2017 Albuquerque
20 | meeting. See [[#rev]] for details.
21 |
22 |
23 | Background {#bg}
24 | ==========
25 |
26 | Low-level code often seeks to interpret objects of one type as another: keep the
27 | same bits, but obtain an object of a different type. Doing so correctly is
28 | error-prone: using `reinterpret_cast` or `union` runs afoul of type-aliasing
29 | rules, yet these are the intuitive solutions developers mistakenly turn to.
30 |
31 | Attuned developers use `aligned_storage` with `memcpy`, avoiding alignment
32 | pitfalls and allowing them to bit-cast non-default-constructible types.
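A minimal sketch of that idiom, written as a C++11/14 function template; `bit_cast_via_memcpy` and `Meters` are hypothetical names, and the test values assume IEEE-754 binary32 `float`. The `aligned_storage` buffer provides suitably aligned raw bytes, so `To` need not be default-constructible; reading the result through `reinterpret_cast` relies on `memcpy` implicitly creating the `To` object, which C++20 formally blesses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

template <typename To, typename From>
To bit_cast_via_memcpy(const From& from) noexcept {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
  static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
  // Raw, suitably aligned bytes: no To constructor is invoked.
  typename std::aligned_storage<sizeof(To), alignof(To)>::type storage;
  std::memcpy(&storage, &from, sizeof(To));
  return reinterpret_cast<To&>(storage);
}

// Hypothetical trivially copyable type with no default constructor.
struct Meters {
  explicit Meters(std::uint32_t v) : value(v) {}
  std::uint32_t value;
};
```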
33 |
34 | This proposal uses appropriate concepts to prevent misuse. As the sample
35 | implementation demonstrates, we could equally use `static_assert` or template
36 | SFINAE, but the timing of this library feature will likely coincide with
37 | concepts' standardization.
38 |
39 | Furthermore, it is currently impossible to implement a `constexpr` bit-cast
40 | function, as `memcpy` itself isn't `constexpr`. Marking the proposed function as
41 | `constexpr` doesn't require or prevent `memcpy` from becoming `constexpr`, but
42 | requires compiler support. This leaves implementations free to use their own
43 | internal solution (e.g. LLVM has a `bitcast`
45 | opcode).
46 |
47 | We should standardize this oft-used idiom, and avoid the pitfalls once and for
48 | all.
49 |
50 |
51 | Proposed Wording {#word}
52 | ================
53 |
54 | Below, substitute the `�` character with a number or name the editor finds
55 | appropriate for the sub-section.
56 |
57 | In 20.5.1.2 [**headers**] add the header `<bit>` to:
58 |
59 | * Table 16 — C++ library headers
60 | * Table 19 — C++ headers for freestanding implementations
61 |
62 | In the numerics section, add the following:
63 |
64 |
65 | 29.� Bit manipulation library [**bit**] {#bit}
66 | ---------------------------------------
67 |
68 | 29.�.1 General [**bit.general**] {#bitgen}
69 | --------------------------------
70 |
71 | The header `<bit>` provides components to access, manipulate and process both
72 | individual bits and bit sequences.
73 |
74 | 29.�.2 Header `<bit>` synopsis [**bit.syn**] {#bitsyn}
75 | --------------------------------------------
76 |
77 |
78 | namespace std {
79 |
80 | // 29.�.3 bit_cast
81 | template<typename To, typename From>
82 | constexpr To bit_cast(const From& from) noexcept;
83 |
84 | }
85 |
86 |
87 | 29.�.3 Function template `bit_cast` [**bit.cast**] {#bitcast}
88 | --------------------------------------------------
89 |
90 |
91 | template<typename To, typename From>
92 | constexpr To bit_cast(const From& from) noexcept;
93 |
94 |
95 |
96 |
*Remarks*:
97 |
98 | This function shall not participate in overload resolution unless:
99 |
100 |
`sizeof(To) == sizeof(From)` is `true`;
101 |
`is_trivially_copyable_v<To>` is `true`; and
102 |
`is_trivially_copyable_v<From>` is `true`.
103 |
104 |
105 | This function shall be `constexpr` if and only if `To`, `From`, and the types
106 | of all subobjects of `To` and `From` are types `T` such that:
107 |
108 |
109 |
`is_union_v<T>` is `false`;
110 |
`is_pointer_v<T>` is `false`;
111 |
`is_member_pointer_v<T>` is `false`;
112 |
`is_volatile_v<T>` is `false`; and
113 |
`T` has no non-static data members of reference type.
114 |
115 |
116 |
*Returns*:
117 |
118 | An object of type `To`. Each bit of the value representation of the result
119 | is equal to the corresponding bit in the object representation of
120 | `from`. Padding bits of the `To` object are unspecified. If there is no
121 | value of type `To` corresponding to the value representation produced, the
122 | behavior is undefined. If there are multiple such values, which value is
123 | produced is unspecified.
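A hedged, pre-concepts sketch of the Remarks: an `enable_if` default template argument makes substitution fail when a constraint is not met, so the template drops out of overload resolution instead of erroring inside its body. `constrained_bit_cast` and `can_bit_cast` are hypothetical names, this sketch is not `constexpr`, and the test values assume IEEE-754 binary32 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <utility>

template <typename To, typename From,
          typename = typename std::enable_if<
              sizeof(To) == sizeof(From) &&
              std::is_trivially_copyable<To>::value &&
              std::is_trivially_copyable<From>::value>::type>
To constrained_bit_cast(const From& from) noexcept {
  typename std::aligned_storage<sizeof(To), alignof(To)>::type storage;
  std::memcpy(&storage, &from, sizeof(To));
  return reinterpret_cast<To&>(storage);
}

// Detects whether constrained_bit_cast<To>(from) is a viable call.
template <typename To, typename From, typename = void>
struct can_bit_cast : std::false_type {};

template <typename To, typename From>
struct can_bit_cast<To, From,
                    decltype(void(constrained_bit_cast<To>(
                        std::declval<const From&>())))>
    : std::true_type {};
```

Because a failed constraint removes the candidate rather than producing a hard error, traits such as `can_bit_cast` can query viability, which a `static_assert` in the body would not allow.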
124 |
125 |
126 |
127 |
128 |
129 | Feature testing {#test}
130 | ---------------
131 |
132 | The `__cpp_lib_bit_cast` feature test macro should be added.
133 |
134 | Appendix {#appendix}
135 | ========
136 |
137 | The Standard's [**basic.types**] section explicitly blesses `memcpy`:
138 |
139 |
140 |
141 | For any trivially copyable type `T`, if two pointers to `T` point to distinct
142 | `T` objects `obj1` and `obj2`, where neither `obj1` nor `obj2` is a base-class
143 | subobject, if the *underlying bytes* (1.7) making up `obj1` are copied into
144 | `obj2`, `obj2` shall subsequently hold the same value as `obj1`.
145 |
146 | [*Example:*
147 | ```
148 | T* t1p;
149 | T* t2p;
150 | // provided that t2p points to an initialized object ...
151 | std::memcpy(t1p, t2p, sizeof(T));
152 | // at this point, every subobject of trivially copyable type in *t1p contains
153 | // the same value as the corresponding subobject in *t2p
154 | ```
155 | — *end example*]
156 |
157 |
162 |
163 | In a union, at most one of the non-static data members can be
164 | active at any time, that is, the value of at most one of the
165 | non-static data members can be stored in a union at any time.
166 |
167 |
168 |
169 |
170 | Revision History {#rev}
171 | ================
172 |
173 | r1 ➡ r2 {#r1r2}
174 | --------
175 |
176 | The paper was reviewed by LEWG at the 2017 Toronto meeting and feedback was
177 | provided. In the 2017 Albuquerque meeting LEWG provided feedback regarding usage
178 | of concepts while discussing [[P0802r0]], and EWG reviewed the paper:
179 |
180 | * Use "shall not participate in overload resolution" wording instead of a
181 | requires clause.
182 | * The author was asked to explore naming. LEWG took a poll in Albuquerque and
183 | voted to keep `bit_cast`.
184 | * There was strong sentiment that this facility should be available in
185 | freestanding implementations. LEWG is changing its guidance regarding
186 | freestanding header granularity, but until guidance is actually changed it
187 | was decided that a currently freestanding header should be used. LEWG took a
188 | poll in Albuquerque, and the new `<bit>` header was chosen instead of
189 | ``.
190 | * Call out that `constexpr` requires compiler support.
191 | * Make `constexpr` conditional, similar to variant's [variant.ctor] wording,
192 | based on an EWG straw poll in Albuquerque.
193 | * LWG review made the `constexpr` remark recursive, and tuned the return
194 | wording, asking CWG to review the changes.
195 | * LWG review requested that this paper also add the `<bit>` header, and let
196 | the editor resolve races if multiple papers add the header concurrently.
197 | * CWG substantially tuned the wording.
198 |
199 | r0 ➡ r1 {#r0r1}
200 | --------
201 |
202 | The paper was reviewed by LEWG at the 2016 Issaquah meeting:
203 |
204 | * Remove the standard layout requirement—trivially copyable suffices for the `memcpy` requirement.
205 | * We discussed removing `constexpr`, but there was no consent either way. There was some suggestion that it’ll be hard for implementers, but there's also some desire (by the same implementers) to have those features available in order to support things like `constexpr` instances of `std::variant`.
206 | * The pointer-forbidding logic was removed. It was initially there to help developers when a better tool is available, but it's easily worked around (e.g. with a `struct` containing a pointer). Note that this doesn't prevent `constexpr` versions of `bit_cast`: the implementation is allowed to error out on `bit_cast` of pointer.
207 | * Some discussion about concepts-usage, but it seems like mostly an LWG issue and we're reasonably sure that concepts will land before this or in a compatible vehicle.
208 |
209 | Straw polls:
210 |
211 | * Do we want to see [[P0476r0]] again? unanimous consent.
212 | * `bit_cast` should allow pointer types in `To` and `From`. **SF F N A SA** 4 5 4 2 1
213 | * `bit_cast` should be `constexpr`? **SF F N A SA** 4 3 7 2 3
214 |
215 |
216 | Acknowledgement {#ack}
217 | ===============
218 |
219 | Thanks to Saam Barati, Jeffrey Yasskin, and Sam Benzaquen for their early review
220 | and suggested improvements.
221 |
--------------------------------------------------------------------------------
/source/conf.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # Papers documentation build configuration file, created by
4 | # sphinx-quickstart on Sun Mar 22 16:26:35 2015.
5 | #
6 | # This file is execfile()d with the current directory set to its
7 | # containing dir.
8 | #
9 | # Note that not all possible configuration values are present in this
10 | # autogenerated file.
11 | #
12 | # All configuration values have a default; values that are commented out
13 | # serve to show the default.
14 |
15 | import sys
16 | import os
17 |
18 | # If extensions (or modules to document with autodoc) are in another directory,
19 | # add these directories to sys.path here. If the directory is relative to the
20 | # documentation root, use os.path.abspath to make it absolute, like shown here.
21 | #sys.path.insert(0, os.path.abspath('.'))
22 |
23 | # -- General configuration ------------------------------------------------
24 |
25 | # If your documentation needs a minimal Sphinx version, state it here.
26 | #needs_sphinx = '1.0'
27 |
28 | # Add any Sphinx extension module names here, as strings. They can be
29 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
30 | # ones.
31 | extensions = [
32 | 'sphinx.ext.todo',
33 | ]
34 |
35 | # Add any paths that contain templates here, relative to this directory.
36 | templates_path = ['_templates']
37 |
38 | # The suffix of source filenames.
39 | source_suffix = '.rst'
40 |
41 | # The encoding of source files.
42 | #source_encoding = 'utf-8-sig'
43 |
44 | # The master toctree document.
45 | master_doc = 'index'
46 |
47 | # General information about the project.
48 | project = u'Papers'
49 | copyright = u'2015, JF Bastien'
50 |
51 | # The version info for the project you're documenting, acts as replacement for
52 | # |version| and |release|, also used in various other places throughout the
53 | # built documents.
54 | #
55 | # The short X.Y version.
56 | version = '1.0'
57 | # The full version, including alpha/beta/rc tags.
58 | release = '1.0'
59 |
60 | # The language for content autogenerated by Sphinx. Refer to documentation
61 | # for a list of supported languages.
62 | #language = None
63 |
64 | # There are two options for replacing |today|: either, you set today to some
65 | # non-false value, then it is used:
66 | #today = ''
67 | # Else, today_fmt is used as the format for a strftime call.
68 | #today_fmt = '%B %d, %Y'
69 |
70 | # List of patterns, relative to source directory, that match files and
71 | # directories to ignore when looking for source files.
72 | exclude_patterns = []
73 |
74 | # The reST default role (used for this markup: `text`) to use for all
75 | # documents.
76 | #default_role = None
77 |
78 | # If true, '()' will be appended to :func: etc. cross-reference text.
79 | #add_function_parentheses = True
80 |
81 | # If true, the current module name will be prepended to all description
82 | # unit titles (such as .. function::).
83 | #add_module_names = True
84 |
85 | # If true, sectionauthor and moduleauthor directives will be shown in the
86 | # output. They are ignored by default.
87 | #show_authors = False
88 |
89 | # The name of the Pygments (syntax highlighting) style to use.
90 | pygments_style = 'sphinx'
91 |
92 | # A list of ignored prefixes for module index sorting.
93 | #modindex_common_prefix = []
94 |
95 | # If true, keep warnings as "system message" paragraphs in the built documents.
96 | #keep_warnings = False
97 |
98 |
99 | # -- Options for HTML output ----------------------------------------------
100 |
101 | # The theme to use for HTML and HTML Help pages. See the documentation for
102 | # a list of builtin themes.
103 | html_theme = 'basic'
104 |
105 | # Theme options are theme-specific and customize the look and feel of a theme
106 | # further. For a list of options available for each theme, see the
107 | # documentation.
108 | #html_theme_options = {}
109 |
110 | # Add any paths that contain custom themes here, relative to this directory.
111 | html_theme_path = ['_templates/']
112 |
113 | # The name for this set of Sphinx documents. If None, it defaults to
114 | # "<project> v<release> documentation".
115 | html_title = ''
116 |
117 | # A shorter title for the navigation bar. Default is the same as html_title.
118 | #html_short_title = None
119 |
120 | # The name of an image file (relative to this directory) to place at the top
121 | # of the sidebar.
122 | #html_logo = None
123 |
124 | # The name of an image file (within the static path) to use as favicon of the
125 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
126 | # pixels large.
127 | #html_favicon = None
128 |
129 | # Add any paths that contain custom static files (such as style sheets) here,
130 | # relative to this directory. They are copied after the builtin static files,
131 | # so a file named "default.css" will overwrite the builtin "default.css".
132 | html_static_path = ['_static']
133 |
134 | # Add any extra paths that contain custom files (such as robots.txt or
135 | # .htaccess) here, relative to this directory. These files are copied
136 | # directly to the root of the documentation.
137 | #html_extra_path = []
138 |
139 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
140 | # using the given strftime format.
141 | #html_last_updated_fmt = '%b %d, %Y'
142 |
143 | # If true, SmartyPants will be used to convert quotes and dashes to
144 | # typographically correct entities.
145 | #html_use_smartypants = True
146 |
147 | # Custom sidebar templates, maps document names to template names.
148 | #html_sidebars = {}
149 |
150 | # Additional templates that should be rendered to pages, maps page names to
151 | # template names.
152 | #html_additional_pages = {}
153 |
154 | # If false, no module index is generated.
155 | #html_domain_indices = True
156 |
157 | # If false, no index is generated.
158 | #html_use_index = True
159 |
160 | # If true, the index is split into individual pages for each letter.
161 | #html_split_index = False
162 |
163 | # If true, links to the reST sources are added to the pages.
164 | #html_show_sourcelink = True
165 |
166 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
167 | #html_show_sphinx = True
168 |
169 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
170 | #html_show_copyright = True
171 |
172 | # If true, an OpenSearch description file will be output, and all pages will
173 | # contain a tag referring to it. The value of this option must be the
174 | # base URL from which the finished HTML is served.
175 | #html_use_opensearch = ''
176 |
177 | # This is the file name suffix for HTML files (e.g. ".xhtml").
178 | #html_file_suffix = None
179 |
180 | # Output file base name for HTML help builder.
181 | htmlhelp_basename = 'Papersdoc'
182 |
183 |
184 | # -- Options for LaTeX output ---------------------------------------------
185 |
186 | latex_elements = {
187 | # The paper size ('letterpaper' or 'a4paper').
188 | #'papersize': 'letterpaper',
189 |
190 | # The font size ('10pt', '11pt' or '12pt').
191 | #'pointsize': '10pt',
192 |
193 | # Additional stuff for the LaTeX preamble.
194 | #'preamble': '',
195 | }
196 |
197 | # Grouping the document tree into LaTeX files. List of tuples
198 | # (source start file, target name, title,
199 | # author, documentclass [howto, manual, or own class]).
200 | latex_documents = [
201 | ('index', 'Papers.tex', u'Papers Documentation',
202 | u'JF Bastien', 'manual'),
203 | ]
204 |
205 | # The name of an image file (relative to this directory) to place at the top of
206 | # the title page.
207 | #latex_logo = None
208 |
209 | # For "manual" documents, if this is true, then toplevel headings are parts,
210 | # not chapters.
211 | #latex_use_parts = False
212 |
213 | # If true, show page references after internal links.
214 | #latex_show_pagerefs = False
215 |
216 | # If true, show URL addresses after external links.
217 | #latex_show_urls = False
218 |
219 | # Documents to append as an appendix to all manuals.
220 | #latex_appendices = []
221 |
222 | # If false, no module index is generated.
223 | #latex_domain_indices = True
224 |
225 |
226 | # -- Options for manual page output ---------------------------------------
227 |
228 | # One entry per manual page. List of tuples
229 | # (source start file, name, description, authors, manual section).
230 | man_pages = [
231 | ('index', 'papers', u'Papers Documentation',
232 | [u'JF Bastien'], 1)
233 | ]
234 |
235 | # If true, show URL addresses after external links.
236 | #man_show_urls = False
237 |
238 |
239 | # -- Options for Texinfo output -------------------------------------------
240 |
241 | # Grouping the document tree into Texinfo files. List of tuples
242 | # (source start file, target name, title, author,
243 | # dir menu entry, description, category)
244 | texinfo_documents = [
245 | ('index', 'Papers', u'Papers Documentation',
246 | u'JF Bastien', 'Papers', 'One line description of project.',
247 | 'Miscellaneous'),
248 | ]
249 |
250 | # Documents to append as an appendix to all manuals.
251 | #texinfo_appendices = []
252 |
253 | # If false, no module index is generated.
254 | #texinfo_domain_indices = True
255 |
256 | # How to display URL addresses: 'footnote', 'no', or 'inline'.
257 | #texinfo_show_urls = 'footnote'
258 |
259 | # If true, do not generate a @detailmenu in the "Top" node's menu.
260 | #texinfo_no_detailmenu = False
261 |
--------------------------------------------------------------------------------
/source/p0528r3.bs:
--------------------------------------------------------------------------------
1 |
2 | Title: The Curious Case of Padding Bits, Featuring Atomic Compare-and-Exchange
3 | Shortname: P0528
4 | Revision: 3
5 | Audience: CWG
6 | Status: P
7 | Group: WG21
8 | URL: http://wg21.link/P0528r3
9 | !Source: github.com/jfbastien/papers/blob/master/source/P0528r3.bs
10 | Editor: JF Bastien, Apple, jfbastien@apple.com
11 | Editor: Michael Spencer, Sony Playstation, bigcheesegs@gmail.com
12 | Abstract: Compare-and-exchange on a struct with padding bits should Just Work.
13 | Date: 2018-06-07
14 | Markup Shorthands: markdown yes
15 |
16 |
17 | This issue has been discussed by the authors at every recent Standards meeting,
18 | yet a full solution has been elusive despite helpful proposals. We believe that
19 | this proposal can fix this oft-encountered problem once and for all.
20 |
21 | [[P0528r0]] details extensive background on this problem (not repeated here),
22 | and proposed standardizing a trait, `has_padding_bits`, and using it on
23 | `compare_and_exchange_*`. [[P0528r1]] applied EWG guidance and simply added
24 | wording directing implementations to ensure that the desired behavior occurs. At
25 | SG1's request this paper follows EWG's guidance but uses different wording.
26 |
27 |
28 | Edit History {#edit}
29 | ============
30 |
31 | r2 → r3 {#r2r3}
32 | -------
33 |
34 | In Rapperswil, CWG suggested various wording updates to the paper.
35 |
36 |
37 | r1 → r2 {#r1r2}
38 | -------
39 |
40 | In Jacksonville, SG1 supported the paper but suggested an alternate way to
41 | approach the wording than the one EWG proposed in Albuquerque: don't talk about
42 | contents of the memory, but rather discuss the value representation to describe
43 | compare-and-exchange. This paper follows SG1's guidance and offers different
44 | wording, with the intent that the semantics be equivalent. EWG reviewed the
45 | updated wording and voted to support it and forward it to Core.
46 |
47 | r0 → r1 {#r0r1}
48 | -------
49 |
50 | In Albuquerque, EWG voted to make the padding bits of `atomic<T>` and the incoming
51 | value of `T` have a consistent value for the purposes of read/modify/write
52 | atomic operations.
53 |
54 | Purposefully not addressed in this paper:
55 |
56 | * `union` with padding bits
57 | * Types with trap representations
58 |
59 | Proposed Wording {#word}
60 | ================
61 |
62 | In Operations on `atomic` types [**atomics.types.operations**], edit ❡17 and
63 | onwards as follows:
64 |
65 |
93 |
94 | *Requires:* The `failure` argument shall not be `memory_order::release` nor
95 | `memory_order::acq_rel`.
96 |
97 |
98 |
99 | ❡18:
100 |
101 |
102 |
103 | *Effects:* Retrieves the value in `expected`. It then atomically compares
104 | the <del>contents of the memory</del><ins>value representation of the value</ins>
105 | pointed to by `this` for equality with that previously retrieved from
106 | `expected`, and if true, replaces the <del>contents of the memory</del><ins>value</ins>
107 | pointed to by `this` with that in
108 | `desired`. If and only if the comparison is true, memory is affected according
109 | to the value of `success`, and if the comparison is false, memory is affected
110 | according to the value of `failure`. When only one `memory_order` argument is
111 | supplied, the value of `success` is `order`, and the value of `failure` is
112 | `order` except that a value of `memory_order::acq_rel` shall be replaced by the
113 | value `memory_order::acquire` and a value of `memory_order::release` shall be
114 | replaced by the value `memory_order::relaxed`. If and only if the comparison is
115 | false then, after the atomic operation, the <del>contents of the
116 | memory</del><ins>value</ins> in `expected` <del>are</del><ins>is</ins>
117 | replaced by the value read from the memory pointed to
118 | by `this` during the atomic comparison. If the operation returns `true`, these
119 | operations are atomic read-modify-write operations on the memory pointed to by
120 | `this`. Otherwise, these operations are atomic load operations on that memory.
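The following sketch (not part of the proposed wording) exercises the *Effects* above on `std::atomic<int>`, a type without padding bits: on success the atomic object takes `desired` and `expected` is left unchanged; on failure `expected` receives the value read from the atomic object. The names `try_cas` and `cas_outcome` are hypothetical and used only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical result type, for illustration only.
struct cas_outcome {
  bool succeeded;      // return value of compare_exchange_strong
  int expected_after;  // contents of `expected` after the call
  int object_after;    // value held by the atomic object after the call
};

// Runs one strong compare-and-exchange on a fresh std::atomic<int>.
cas_outcome try_cas(int initial, int expected, int desired) {
  std::atomic<int> object{initial};
  bool const ok = object.compare_exchange_strong(expected, desired);
  return {ok, expected, object.load()};
}
```

For example, `try_cas(5, 7, 9)` fails and leaves both `expected_after` and `object_after` equal to `5`, while `try_cas(5, 5, 9)` succeeds and leaves `object_after` equal to `9`.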
121 |
122 |
123 |
124 | ❡19:
125 |
126 |
127 |
128 | *Returns:* The result of the comparison.
129 |
130 |
131 |
132 | ❡20:
133 |
134 |
135 |
136 | [*Note:*
137 |
138 | For example, the effect of `compare_exchange_strong` on objects without padding bits is
139 |
140 |
141 |
142 | if (memcmp(this, &expected, sizeof(*this)) == 0)
143 | memcpy(this, &desired, sizeof(*this));
144 | else
145 | memcpy(&expected, this, sizeof(*this));
146 |
147 |
148 |
149 | —*end note*]
150 |
151 | [*Example:*
152 |
153 | The expected use of the compare-and-exchange operations is as follows. The
154 | compare-and-exchange operations will update `expected` when another iteration
155 | of the loop is needed.
156 |
157 |
158 |
159 | expected = current.load();
160 | do {
161 | desired = function(expected);
162 | } while (!current.compare_exchange_weak(expected, desired));
163 |
164 |
165 |
166 | —*end example*]
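A self-contained version of the loop above, assuming that `function` multiplies by a constant factor. The helper `fetch_multiply` is hypothetical (it is not a `std::atomic` member); it returns the value it replaced, which is what `expected` holds once the loop exits.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically multiplies `current` by `factor` using the
// compare-and-exchange loop pattern and returns the previous value.
int fetch_multiply(std::atomic<int>& current, int factor) {
  int expected = current.load();
  int desired;
  do {
    // Recomputed on every iteration: a failed compare_exchange_weak has
    // refreshed `expected` with the value currently held by `current`.
    desired = expected * factor;
  } while (!current.compare_exchange_weak(expected, desired));
  return expected;  // unchanged on success, so it is the replaced value
}
```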
167 |
168 | [*Example:*
169 |
170 | Because the expected value is updated only on failure, code releasing the
171 | memory containing the `expected` value on success will work. E.g. list head
172 | insertion will act atomically and would not introduce a data race in the
173 | following code:
174 |
175 |
176 |
177 | do {
178 | p->next = head; // make new list node point to the current head
179 | } while (!head.compare_exchange_weak(p->next, p)); // try to insert
180 |
181 |
182 |
183 | —*end example*]
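A minimal runnable sketch of the list-head insertion, using a hypothetical `node` type and hypothetical helpers `push` and `count`. Unlike the example above it loads `head` once before the loop: on failure `compare_exchange_weak` already stores the current head into `p->next`, so reloading inside the loop is unnecessary.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical singly linked list node.
struct node {
  int value;
  node* next;
};

// Inserts `p` at the head of the list. On failure, compare_exchange_weak
// writes the current head into p->next, so the next attempt is up to date.
void push(std::atomic<node*>& head, node* p) {
  p->next = head.load();
  while (!head.compare_exchange_weak(p->next, p)) {
    // Retry with the refreshed p->next (also covers spurious failure).
  }
}

// Counts the nodes reachable from `n`.
int count(node const* n) {
  int c = 0;
  for (; n != nullptr; n = n->next) ++c;
  return c;
}
```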
184 |
185 |
186 |
187 | ❡21:
188 |
189 |
190 |
191 | Implementations should ensure that weak compare-and-exchange operations do not
192 | consistently return `false` unless either the atomic object has value different
193 | from `expected` or there are concurrent modifications to the atomic object.
194 |
195 |
196 |
197 | ❡22:
198 |
199 |
200 |
201 | *Remarks:* A weak compare-and-exchange operation may fail spuriously. That is,
202 | even when the contents of memory referred to by `expected` and `this` are equal,
203 | it may return `false` and store back to `expected` the same memory contents that
204 | were originally there.
205 |
206 | [*Note:*
207 |
208 | This spurious failure enables implementation of compare-and-exchange on a
209 | broader class of machines, e.g., load-locked store-conditional machines. A
210 | consequence of spurious failure is that nearly all uses of weak
211 | compare-and-exchange will be in a loop. When a compare-and-exchange is in a
212 | loop, the weak version will yield better performance on some platforms. When a
213 | weak compare-and-exchange would require a loop and a strong one would not, the
214 | strong one is preferable.
215 |
216 | —*end note*]
217 |
218 |
219 |
220 | ❡23:
221 |
222 |
223 |
224 | [*Note:*
225 |
226 | <del>Under cases where the</del><ins>The</ins> `memcpy` and `memcmp`
227 | semantics of the compare-and-exchange operations <del>apply, the outcome might
228 | be</del><ins>may result in</ins> failed comparisons for values that compare
229 | equal with `operator==` if the underlying type has padding bits, trap bits, or
230 | alternate representations of the same value. Notably, on implementations
231 | conforming to ISO/IEC/IEEE 60559, floating-point `-0.0` and `+0.0` will not
232 | compare equal with `memcmp` but will compare equal with `operator==`, and NaNs
233 | with the same payload will compare equal with `memcmp` but will not compare
234 | equal with `operator==`.
235 |
236 | —*end note*]
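The floating-point cases from this note can be checked directly. This sketch assumes that `double` conforms to ISO/IEC/IEEE 60559 (checked with `std::numeric_limits<double>::is_iec559`) and compares two values both with `operator==` and with `std::memcmp`, the byte-wise comparison named in the note; `compare_both` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes ISO/IEC/IEEE 60559 doubles");

struct comparison {
  bool by_value;  // result of operator==
  bool by_bytes;  // result of comparing object representations with memcmp
};

// Hypothetical helper comparing two doubles both ways.
comparison compare_both(double a, double b) {
  return {a == b, std::memcmp(&a, &b, sizeof a) == 0};
}
```

With these assumptions, `-0.0` and `+0.0` compare equal by value but differ in their sign bit, while a NaN compared with a copy of itself has identical bytes yet is unequal under `operator==`.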
237 |
238 |
239 |
240 | [*Note:*
241 |
242 | Because compare-and-exchange acts on an object’s value representation, padding
243 | bits that never participate in the object’s value representation are ignored.
244 |
245 | As a consequence, the following code is guaranteed to avoid spurious failure:
246 |
247 |
248 |
249 | struct padded {
250 | char clank = 0x42;
251 | // Padding here.
252 | unsigned biff = 0xC0DEFEFE;
253 | };
254 | atomic<padded> pad = ATOMIC_VAR_INIT({});
255 |
256 | bool zap() {
257 | padded expected, desired { 0, 0 };
258 | return pad.compare_exchange_strong(expected, desired);
259 | }
260 |
261 |
262 |
263 | —*end note*]
264 |
265 | [*Note:*
266 |
267 | For a union with bits that participate in the value representation of some
268 | members but not others, compare-and-exchange might always fail. This is because
269 | such padding bits have an indeterminate value when they do not participate in
270 | the value representation of the active member.
271 |
272 | As a consequence, the following code is not guaranteed to ever succeed:
273 |
274 |
275 |
276 | union pony {
277 | double celestia = 0.;
278 | short luna; // padded
279 | };
280 | atomic<pony> princesses = ATOMIC_VAR_INIT({});
281 |
282 | bool party(pony desired) {
283 | pony expected;
284 | return princesses.compare_exchange_strong(expected, desired);
285 | }
286 |
287 |
288 |
289 | —*end note*]
290 |
291 |
292 |
293 |
294 |
--------------------------------------------------------------------------------
/source/N4522.rst:
--------------------------------------------------------------------------------
1 | ==============================================
2 | N4522 ``std::atomic_object_fence(mo, T&&...)``
3 | ==============================================
4 |
5 | :Author: Olivier Giroux
6 | :Contact: ogiroux@nvidia.com
7 | :Author: JF Bastien
8 | :Contact: jfb@google.com
9 | :Date: 2015-05-21
10 | :URL: https://github.com/jfbastien/papers/blob/master/source/N4522.rst
11 |
12 | ---------
13 | Rationale
14 | ---------
15 |
16 | Fences allow programmers to express a conservative approximation to the precise
17 | pair-wise relations of operations required to be ordered in the happens-before
18 | relation. This is conservative because fences use the sequenced-before relation
19 | to select vast extents of the program into the happens-before relation.
20 |
21 | This conservatism is commonly desired because it is difficult to reason about
22 | operations hidden behind layers of abstraction in C++ programs. An unfortunate
23 | consequence of this is that precise expression of ordering is not possible in
24 | C++ currently, which makes it easy to over-constrain the order of operations
25 | internal to synchronization primitives that comprise multiple atomic objects.
26 | This constrains the ability of implementations (compiler and hardware) to
27 | reorder, ignore, or assume the absence of operations that are not relevant or
28 | not visible.
29 |
30 | In existing practice, the ``flush`` primitive of OpenMP is more expressive than
31 | the fences of C++ in at least this one sense: it can optionally restrict the
32 | ordering of operations to a developer-specified set of memory locations. This is
33 | enough to exactly express the required pair-wise ordering for short lock-free
34 | algorithms. This capability isn't only relevant to OpenMP and would be further
35 | enhanced if it were integrated with the other facets of the more modern C++
36 | memory model.
37 |
38 | An example use-case for this capability is a likely implementation strategy for
39 | N4392_'s ``std::barrier`` object. This algorithm makes ordered modifications on
40 | the atomic sub-objects of a larger non-atomic synchronization object, but the
41 | internal modifications need only be ordered with respect to each other, not all
42 | surrounding objects (they are ordered separately).
43 |
44 | .. _N4392: http://wg21.link/N4392
45 |
46 | In one example implementation, ``std::barrier`` is coded as follows:
47 |
48 | .. code-block:: c++
49 |
50 | struct barrier {
51 | // Some member functions elided.
52 | void arrive_and_wait() {
53 | int const myepoch = epoch.load(memory_order_relaxed);
54 | int const result = arrived.fetch_add(1, memory_order_acq_rel) + 1;
55 | if (result == expected) {
56 | expected = nexpected.load(memory_order_relaxed);
57 | arrived.store(0, memory_order_relaxed);
58 | // Only need to order {expected, arrived} -> {epoch}.
59 | epoch.store(myepoch + 1, memory_order_release);
60 | }
61 | else
62 | while (epoch.load(memory_order_acquire) == myepoch)
63 | ;
64 | }
65 | private:
66 | int expected;
67 | atomic<int> arrived, nexpected, epoch;
68 | };
69 |
70 | The release operation on the epoch atomic is likely to require the compiler to
71 | insert a fence that has an effect that goes beyond the intended constraint,
72 | which is to order only the operations on the barrier object. Since the barrier
73 | object is likely to be smaller than a cache line and the library's
74 | implementation can control its alignment using ``alignas``, it would be
75 | possible to compile this program without a fence in this location on
76 | architectures that are cache-line coherent.
77 |
78 | To concisely express the bound on the set of memory operations whose order is
79 | constrained, we propose to accompany ``std::atomic_thread_fence`` with an
80 | ``object`` variant which takes a reference to the object(s) to be ordered by
81 | the fence.
82 |
83 | -----------------
84 | Proposed addition
85 | -----------------
86 |
87 | Under 29.2 Header ``<atomic>`` synopsis [**atomics.syn**]:
88 |
89 | .. code-block:: c++
90 |
91 | namespace std {
92 | // 29.8, fences
93 | // ...
94 | template <class... T>
95 | void atomic_object_fence(memory_order, T&&... objects) noexcept;
96 | }
97 |
98 | Under 29.8 Fences [**atomics.fences**], after the current
99 | ``atomic_thread_fence`` paragraph:
100 |
101 | ``template <class... T> void atomic_object_fence(memory_order order, T&&... objects) noexcept;``
102 |
103 | *Effects*: Equivalent to ``atomic_thread_fence(order)`` except that operations on
104 | objects other than the arguments in ``objects`` and their sub-objects are
105 | *un-sequenced* with the fence.
106 |
107 | *Note*: The compiler may omit fences entirely depending on alignment
108 | information, may generate a dynamic test leading to a fence for under-aligned
109 | objects, or may emit the same fence that ``atomic_thread_fence`` would.
110 |
111 | The ``__cpp_lib_atomic_object_fence`` feature test macro should be added.
112 |
113 | ----------------------
114 | Example implementation
115 | ----------------------
116 |
117 | A trivial, yet conforming implementation may implement the new fence in terms of
118 | the existing ``std::atomic_thread_fence`` using the same memory order:
119 |
120 | .. code-block:: c++
121 |
122 | template <class... T>
123 | void atomic_object_fence(std::memory_order order, T &&...) noexcept {
124 | std::atomic_thread_fence(order);
125 | }
126 |
127 | A more advanced implementation can overload this for the single-object case
128 | on architectures (or micro-architectures) that have cache coherency with a known
129 | line size, even if it is conservatively approximated:
130 |
131 | .. code-block:: c++
132 |
133 | #define __CACHELINE_SIZE // Secret (micro-)architectural value.
134 | template <class T>
135 | std::enable_if_t::value &&
136 | __CACHELINE_SIZE - alignof(T) % __CACHELINE_SIZE >= sizeof(T)>
137 | atomic_object_fence(std::memory_order, T &&object) noexcept {
138 | asm volatile("" : "+m"(object) : "m"(object)); // Code motion barrier.
139 | }
140 |
141 | To extend this for multiple objects, an implementation for the same architecture may
142 | emit a run-time check that the total footprint of all the objects fits in the span of
143 | a single cache line. This check may commonly be eliminated as dead code, for example
144 | when the objects are referenced from a common base pointer.
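A sketch of such a run-time check, under the assumption of a fixed 64-byte cache line (``kAssumedCacheLineSize`` is a hypothetical stand-in for the secret architectural value). The hypothetical helper ``fits_in_one_cache_line`` reports whether every byte of every object falls within one line; an implementation could fall back to ``std::atomic_thread_fence`` when it returns ``false``.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <memory>

// Hypothetical stand-in for the "secret (micro-)architectural value".
constexpr std::uintptr_t kAssumedCacheLineSize = 64;

// Returns true when every byte of every argument lies in the same
// (assumed) cache line.
template <class... T>
bool fits_in_one_cache_line(T const&... objects) {
  static_assert(sizeof...(T) > 0, "at least one object is required");
  // Address of the first byte of each object.
  std::uintptr_t const starts[] = {
      reinterpret_cast<std::uintptr_t>(std::addressof(objects))...};
  // Address one past the last byte of each object.
  std::uintptr_t const ends[] = {
      reinterpret_cast<std::uintptr_t>(std::addressof(objects)) + sizeof(T)...};
  std::uintptr_t const first = *std::min_element(std::begin(starts), std::end(starts));
  std::uintptr_t const last = *std::max_element(std::begin(ends), std::end(ends)) - 1;
  // Same line iff the lowest and highest bytes share a line index.
  return first / kAssumedCacheLineSize == last / kAssumedCacheLineSize;
}
```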
145 |
146 | The inner code of the ``std::barrier`` example above can use the new overload as follows:
147 |
148 | .. code-block:: c++
149 |
150 | if (result == expected) {
151 | expected = nexpected.load(memory_order_relaxed);
152 | arrived.store(0, memory_order_relaxed);
153 | atomic_object_fence(memory_order_release, *this);
154 | epoch.store(myepoch + 1, memory_order_relaxed);
155 | }
156 |
157 | It would be equally valid to list the individual members of ``barrier`` instead
158 | of ``*this``.
159 |
160 | Less trivial implementations of ``std::atomic_object_fence`` can enable more
161 | optimizations for new hardware and portable program representations.
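To try the conservative path end to end, the sketch below combines the trivial implementation with the ``std::barrier`` example. Because ``std::atomic_object_fence`` was only proposed, the fence is given the hypothetical name ``hypothetical_atomic_object_fence``. The barrier (``sketch_barrier``, also hypothetical) is simplified to a fixed participant count, so ``expected`` is never written and ``nexpected`` is dropped.

```cpp
#include <atomic>
#include <cassert>

// Trivial fallback from the example implementation, under a hypothetical
// name: ignore the objects and emit an ordinary thread fence.
template <class... T>
void hypothetical_atomic_object_fence(std::memory_order order, T&&...) noexcept {
  std::atomic_thread_fence(order);
}

// Simplified barrier: fixed participant count, so `expected` is read-only.
class sketch_barrier {
 public:
  explicit sketch_barrier(int participants) : expected(participants) {}

  void arrive_and_wait() {
    int const myepoch = epoch.load(std::memory_order_relaxed);
    int const result = arrived.fetch_add(1, std::memory_order_acq_rel) + 1;
    if (result == expected) {
      arrived.store(0, std::memory_order_relaxed);
      // Release fence followed by a relaxed store: pairs with the acquire
      // load below, as a release store to `epoch` would.
      hypothetical_atomic_object_fence(std::memory_order_release, *this);
      epoch.store(myepoch + 1, std::memory_order_relaxed);
    } else {
      while (epoch.load(std::memory_order_acquire) == myepoch) {
        // Spin until the last arriving thread advances the epoch.
      }
    }
  }

  int completed_phases() const { return epoch.load(std::memory_order_relaxed); }

 private:
  int const expected;
  std::atomic<int> arrived{0};
  std::atomic<int> epoch{0};
};
```

With a single participant every call completes a phase, so two calls to ``arrive_and_wait`` leave ``completed_phases()`` equal to ``2``.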
162 |
163 | -----------------
164 | Relation to N4523
165 | -----------------
166 |
167 | In N4523_ we propose to formalize the notions of false-sharing and true-sharing
168 | as perceived by the implementation in relation to the placement of objects in
169 | memory. In the expository implementation of the previous section we also showed
170 | how a cache-line coherent architecture or micro-architecture can elide fences
171 | that only bisect relations between objects that are in the same cache line, if
172 | provable at compile-time. These notions interact in a virtuous way because
173 | N4523's abstraction enables reasoning about likely cache behavior that
174 | implementations can optimize for.
175 |
176 | .. _N4523: http://wg21.link/N4523
177 |
178 | The example application of ``std::atomic_object_fence`` to the ``std::barrier``
179 | object is improved by combining these notions as follows:
180 |
181 | .. code-block:: c++
182 |
183 | struct alignas(std::thread::hardware_true_sharing_size) barrier { // N4523
185 | // Some member functions elided.
186 | void arrive_and_wait() {
187 | int const myepoch = epoch.load(memory_order_relaxed);
188 | int const result = arrived.fetch_add(1, memory_order_acq_rel) + 1;
189 | if (result == expected) {
190 | expected = nexpected.load(memory_order_relaxed);
191 | arrived.store(0, memory_order_relaxed);
192 | atomic_object_fence(memory_order_release, *this); // N4522
193 | epoch.store(myepoch + 1, memory_order_relaxed);
194 | }
195 | else
196 | while (epoch.load(memory_order_acquire) == myepoch)
197 | ;
198 | }
199 | private:
200 | int expected;
201 | atomic<int> arrived, nexpected, epoch;
202 | };
203 |
204 | By aligning the barrier object to the true-sharing granularity, it is
205 | significantly more likely that the implementation will be able to elide the
206 | fence if the architecture or micro-architecture has cache-line coherency. Of
207 | course, an implementation of the Standard is free to ensure this by other means;
208 | we provide this example as exposition of what developer programs might do.
209 |
210 | --------------------
211 | Memory model example
212 | --------------------
213 |
214 | =========================== ===========================
215 | T0                          T1
216 | =========================== ===========================
217 | ``0: w = 1;``               ``4: while(!a.load(rlx));``
218 | ``1: x = 1;``               ``5: objfence(acq, a, x);``
219 | ``2: objfence(rel, a, x);`` ``6: assert(x);``
220 | ``3: a.store(1,rlx);``      ``7: assert(w);``
221 | =========================== ===========================
222 |
223 | The semantics of fences mean that:
224 |
225 | ``2`` synchronizes-with ``5`` because [**29.8¶2**]:
226 | A. ``2`` is sequenced-before ``3``,
227 | B. ``3`` inter-thread happens-before ``4``, and
228 | C. ``4`` is sequenced-before ``5``.
229 |
230 | ``1`` happens-before ``6`` because [**1.10¶13-14**]:
231 | A. ``1`` is sequenced-before ``2``,
232 | B. ``2`` synchronizes-with ``5``, and
233 | C. ``5`` is sequenced-before ``6``.
234 |
235 | Therefore the program is well-defined (so far) and the ``assert(x)`` of ``6``
236 | does not fire.
237 |
238 | However, the *un-sequenced* semantics of the object fence also mean that:
239 |
240 | ``0`` conflicts with ``7`` because [**1.10¶23**]:
241 | A. ``0`` is a store to ``w``, ``7`` is a load of ``w`` and they are not both
242 | atomic, and
243 | B. ``0`` is not sequenced-before ``2`` and ``5`` is not sequenced-before
244 | ``7``.
245 |
246 | Therefore the ``assert(w)`` of ``7`` makes the program undefined due to a
247 | data-race.
248 |
249 |
--------------------------------------------------------------------------------
/source/P1018r5.bs:
--------------------------------------------------------------------------------
1 |
2 | Title: Language Evolution status after Belfast 2019
3 | Shortname: P1018
4 | Revision: 5
5 | Audience: WG21, EWG
6 | Status: P
7 | Group: WG21
8 | URL: http://wg21.link/P1018r5
9 | !Source: github.com/jfbastien/papers/blob/master/source/P1018r5.bs
10 | Editor: JF Bastien, Apple, jfbastien@apple.com
11 | Date: 2020-01-04
12 | Markup Shorthands: markdown yes
13 | Toggle Diffs: no
14 | No abstract: false
15 | Abstract: This paper is a collection of items that language Evolution has worked on in the latest C++ meeting, their status, and plans for the future.
16 |
17 |
18 | Executive summary {#summary}
19 | =================
20 |
21 | Most time was spent in ballot resolution for C++20, to address National Body comments in [[N4844]].
22 |
23 |
24 | Work highlights {#high}
25 | ===============
26 |
27 | Language Evolution received roughly 100 National Body comments. We did at least one round of discussion on all of these comments.
28 |
29 | * Concepts: allow requires clauses on non-template friend functions of class templates.
30 | * Coroutines: most comments rejected, a few sent away to write a paper.
31 | * Undefined Behavior: deferred addressing all comments to C++23.
32 | * Feature test macros: comments were addressed.
33 | * Modules: many comments, including fixing issues around header units.
34 | * Changed how non-type template parameters work: allow types with all public members, all of which can themselves be used as NTTPs. This allows array members, reference members, pointers and references to subobjects, floating-point, and unions.
35 | * Began discussing some papers targeted at C++23.
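As an illustration of the NTTP change, the sketch below uses a class type with public members as a template parameter. It assumes a compiler implementing C++20 class-type NTTPs, detected through the `__cpp_nontype_template_args` feature-test macro (value `201911L`); `point`, `manhattan`, and `SKETCH_HAS_CLASS_NTTP` are hypothetical names.

```cpp
#include <cassert>

// Guarded by the feature-test macro so the sketch still compiles in
// pre-C++20 language modes (where the example is simply skipped).
#if defined(__cpp_nontype_template_args) && __cpp_nontype_template_args >= 201911L
#define SKETCH_HAS_CLASS_NTTP 1

// All members are public and of structural type, so `point` may be an NTTP.
struct point {
  int x;
  int y;
};

// `P` is a template parameter of class type, usable in constant expressions.
template <point P>
constexpr int manhattan() {
  return (P.x < 0 ? -P.x : P.x) + (P.y < 0 ? -P.y : P.y);
}

static_assert(manhattan<point{3, -4}>() == 7, "3 + 4 == 7");
#else
#define SKETCH_HAS_CLASS_NTTP 0
#endif
```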
36 |
37 |
38 | National Body comment details {#nb-details}
39 | =============================
40 |
41 | Miscellaneous NB comments:
42 |
43 |
129 |
130 |
131 | C++23 discussions {#cpp32}
132 | =================
133 |
134 | We started discussing a few papers which could make it to C++23.
135 |
136 | * Floating-point types from [[P1467r2]] and [[P1468r2]] received strong support.
137 | * [[P1105r1]] freestanding: there's ongoing interest in better supporting freestanding targets, and we gave direction to the author.
138 | * [[P1371r1]] Pattern matching: moving along, but the authors need help with implementation / usage experience if we want this to make C++23.
139 | * [[P1040r4]] `std::embed`: was seen by this group and others, and received confusing feedback, though many people agree there's something useful to be had here.
140 | * [[P1219r2]] Homogeneous variadic function parameters: did not receive sufficient support to move forward.
141 | * [[P1097r2]] Named character escapes: received feedback, will see again.
142 | * [[P1895r0]] tag_invoke: A general pattern for supporting customisable functions: the general feeling was that a library-only solution to the problem raises some concerns. Several interested parties are planning to work with the paper authors on a language feature instead.
143 | * [[P1676r0]] C++ Exception Optimizations. An experiment: informative discussion.
144 | * [[P1365r0]] Using Coroutine TS with zero dynamic allocations: informative discussion.
145 | * [[P1046r1]] Automatically Generate More Operators: received feedback, fairly positive.
146 | * [[P1908r0]] Reserving Attribute Names for Future Use: accepted, sent to CWG.
147 | * [[P0876r9]] `fiber_context` - fibers without scheduler: targets a TS. Gave feedback, will see again.
148 | * [[P1061r1]] Structured Bindings can introduce a Pack: approved of the general direction.
149 | * [[P1839r1]] Accessing Object Representations: approved of the general direction.
150 |
151 |
152 | Near-future EWG plans {#future}
153 | =====================
154 |
155 | There will still be some ballot resolution work in Prague, to address comments which we discussed in Belfast but did not resolve. There will be no further ballot resolution after Prague.
156 |
157 | Ballot resolution will likely take a small portion of our time. Once that is done, Language Evolution will switch into full C++23 mode, likely following the plans outlined in [[P0592r3]]. These plans were discussed in multiple groups and received strong support.
158 |
--------------------------------------------------------------------------------