├── .gitignore
├── README.md
├── build.bat
├── build.sh
├── build_clang.sh
├── build_clang_omp.sh
├── build_omp.sh
├── include
│   ├── atomic_wait
│   ├── barrier
│   ├── latch
│   └── semaphore
├── lib
│   └── source.cpp
├── sample.cpp
└── sample.hpp
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | .vscode/*
 2 | sample
 3 | sample.exe
 4 | *.obj
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Sample implementation of C++20 synchronization facilities
 2 |
 3 | This repository contains a sample implementation of these http://wg21.link/p1135 facilities:
 4 |
 5 | * `atomic_wait` / `_notify` free functions (not the member functions)
 6 | * `counting_` / `binary_semaphore`
 7 | * `latch` and `barrier`
 8 |
 9 | ## How do I build the sample?
10 |
11 | This project is self-contained.
12 |
13 | ```
14 | git clone https://github.com/ogiroux/atomic_wait
15 | cd atomic_wait && ./build.sh
16 | ```
17 |
18 | ## What platforms are supported?
19 |
20 | Linux, Mac and Windows.
21 |
22 | ## How are `atomic_wait` / `_notify` composed?
23 |
24 | The implementation has a variety of strategies that it selects by platform:
25 | * Contention state table. Optimizes futex usage, or holds CVs, unless `-D__NO_TABLE`.
26 | * Futex. Supported on Linux and Windows, unless `-D__NO_FUTEX`. Requires a table on Linux.
27 | * Condition variables. Supported on Linux and Mac, unless `-D__NO_CONDVAR`. Requires a table.
28 | * Timed back-off. Supported on everything, unless `-D__NO_SLEEP`.
29 | * Spinlock. Supported on everything, only used as a last resort unless `-D__NO_IDENT`.
30 |
31 | These strategies are selected for each platform, in the order written, based on what's disabled with the macros:
32 | * Linux: futex + table -> CVs + table -> timed backoff -> spin.
33 | * Mac: CVs + table -> timed backoff -> spin.
34 | * Windows: futex -> timed backoff -> spin.
35 | * CUDA: timed backoff -> spin.
36 | * Unidentified platform: spin.
37 |
38 | ## How do `counting_` / `binary_semaphore` work?
39 |
40 | The implementation has these specializations:
41 |
42 | * The fully general case, for `counting_semaphore` instantiated for huge numbers. This is implemented in terms of `atomic`, `atomic_wait` / `_notify`. This path is always enabled.
43 | * The constrained case, the default range, for numbers supported by the underlying platform semaphore (typically a `long`). This is implemented in terms of POSIX, Dispatch and Win32 semaphores, with the optimizations below. Disable this path with `-D__NO_SEM`.
44 | * The case of a unit range, such as the alias `binary_semaphore`. This is specialized only when platform semaphores are disabled. This path uses `atomic`, `atomic_wait` / `_notify`.
45 |
46 | Platform semaphores get (because they need) some additional optimizations, in up to two orthogonal directions (a sketch of the front-buffering idea follows this list):
47 |
48 | * Front buffering: an `atomic` object models the semaphore's conceptual count (incl. negative values). Operations on the platform semaphore are avoided as long as the modeled count stays positive. This is enabled by default on all platforms, but can be disabled with `-D__NO_SEM_FRONT`.
49 | * Back buffering: when the platform semaphore does not natively support the `release( count )` operation, an `atomic` object distributes the `release(1)` cooperatively among all threads waiting on the semaphore, as in a binary tree. This is used by default on Linux and Mac OS X, and can be disabled with `-D__NO_SEM_BACK`.
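
A rough sketch of the front-buffering idea, not the code in `include/semaphore`: the names `slow_semaphore` and `front_buffered_semaphore` are made up for illustration, and a mutex/condvar pair stands in for the platform semaphore (POSIX, Dispatch or Win32 in the real implementation).

```
#include <algorithm>
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>

class slow_semaphore {                   // stand-in for the expensive platform semaphore
    std::mutex m_;
    std::condition_variable cv_;
    std::ptrdiff_t n_ = 0;
public:
    void post() { { std::lock_guard<std::mutex> l(m_); ++n_; } cv_.notify_one(); }
    void wait() { std::unique_lock<std::mutex> l(m_); cv_.wait(l, [&]{ return n_ > 0; }); --n_; }
};

class front_buffered_semaphore {
    std::atomic<std::ptrdiff_t> count_;  // conceptual count; goes negative when threads queue up
    slow_semaphore sem_;                 // only touched when there are (or will be) waiters
public:
    explicit front_buffered_semaphore(std::ptrdiff_t initial) : count_(initial) {}
    void release(std::ptrdiff_t update = 1) {
        auto const old = count_.fetch_add(update, std::memory_order_release);
        // A negative old count means that many acquirers took (or are taking) the
        // slow path; wake only as many of them as this release actually covers.
        if (old < 0) {
            auto wake = std::min(update, -old);
            for (; wake > 0; --wake)
                sem_.post();
        }
    }
    void acquire() {
        // Fast path: while the modeled count stays positive, the platform
        // semaphore is never touched at all.
        if (count_.fetch_sub(1, std::memory_order_acquire) > 0)
            return;
        sem_.wait();                     // slow path: park on the platform semaphore
    }
};
```

With `-D__NO_SEM_FRONT` this fast path disappears and every `acquire()` / `release()` goes straight to the platform semaphore.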
50 | 51 | ## What about `latch` and `barrier`? 52 | 53 | At the moment, they are only implemented in terms of `atomic` operations. These aren't ready for review. 54 | -------------------------------------------------------------------------------- /build.bat: -------------------------------------------------------------------------------- 1 | cl /EHsc /Iinclude /std:c++17 /O2 sample.cpp lib/source.cpp synchronization.lib /Fe:sample.exe 2 | -------------------------------------------------------------------------------- /build.sh: -------------------------------------------------------------------------------- 1 | g++ -Iinclude -std=c++17 -O2 sample.cpp lib/source.cpp -lstdc++ -lpthread -lm -o sample 2 | -------------------------------------------------------------------------------- /build_clang.sh: -------------------------------------------------------------------------------- 1 | clang -Iinclude -std=c++17 -O2 sample.cpp lib/source.cpp -lstdc++ -lpthread -lm -o sample 2 | -------------------------------------------------------------------------------- /build_clang_omp.sh: -------------------------------------------------------------------------------- 1 | clang -fopenmp=libomp -L../llvm-project/build/lib/ -Iinclude -std=c++17 -O2 sample.cpp lib/source.cpp -lstdc++ -lpthread -lm -o sample 2 | -------------------------------------------------------------------------------- /build_omp.sh: -------------------------------------------------------------------------------- 1 | g++ -fopenmp -Iinclude -std=c++17 -O2 sample.cpp lib/source.cpp -lstdc++ -lpthread -lm -o sample 2 | -------------------------------------------------------------------------------- /include/atomic_wait: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | 23 | */ 24 | 25 | /* 26 | 27 | This file introduces std::atomic_wait, atomic_notify_one, atomic_notify_all. 28 | 29 | It has these strategies implemented: 30 | * Contention table. Used to optimize futex notify, or to hold CVs. Disable with __NO_TABLE. 31 | * Futex. Supported on Linux and Windows. For performance requires a table on Linux. Disable with __NO_FUTEX. 32 | * Condition variables. Supported on Linux and Mac. Requires table to function. Disable with __NO_CONDVAR. 33 | * Timed back-off. Supported on everything. Disable with __NO_SLEEP. 34 | * Spinlock. Supported on everything. 
Force with __NO_IDENT. Note: performance is too terrible to use. 35 | 36 | You can also compare to pure spinning at algorithm level with __NO_WAIT. 37 | 38 | The strategy is chosen this way, by platform: 39 | * Linux: default to futex (with table), fallback to futex (no table) -> CVs -> timed backoff -> spin. 40 | * Mac: default to CVs (table), fallback to timed backoff -> spin. 41 | * Windows: default to futex (no table), fallback to timed backoff -> spin. 42 | * CUDA: default to timed backoff, fallback to spin. (This is not all checked in in this tree.) 43 | * Unidentified platform: default to spin. 44 | 45 | */ 46 | 47 | //#define __NO_TABLE 48 | //#define __NO_FUTEX 49 | //#define __NO_CONDVAR 50 | //#define __NO_SLEEP 51 | //#define __NO_IDENT 52 | 53 | // To benchmark against spinning 54 | //#define __NO_SPIN 55 | //#define __NO_WAIT 56 | 57 | #ifndef __ATOMIC_WAIT_INCLUDED 58 | #define __ATOMIC_WAIT_INCLUDED 59 | 60 | #include 61 | #include 62 | #include 63 | #include 64 | 65 | #if defined(__NO_IDENT) 66 | 67 | #include 68 | #include 69 | 70 | #define __ABI 71 | #define __YIELD() std::this_thread::yield() 72 | #define __SLEEP(x) std::this_thread::sleep_for(std::chrono::microseconds(x)) 73 | #define __YIELD_PROCESSOR() 74 | 75 | #else 76 | 77 | #if defined(__CUSTD__) 78 | #define __NO_FUTEX 79 | #define __NO_CONDVAR 80 | #ifndef __CUDACC__ 81 | #define __host__ 82 | #define __device__ 83 | #endif 84 | #define __ABI __host__ __device__ 85 | #else 86 | #define __ABI 87 | #endif 88 | 89 | #if defined(__APPLE__) || defined(__linux__) 90 | 91 | #include 92 | #include 93 | #define __YIELD() sched_yield() 94 | #define __SLEEP(x) usleep(x) 95 | 96 | #if defined(__aarch64__) 97 | # define __YIELD_PROCESSOR() asm volatile ("yield" ::: "memory") 98 | #elif defined(__x86_64__) 99 | # define __YIELD_PROCESSOR() asm volatile ("pause" ::: "memory") 100 | #elif defined (__powerpc__) 101 | # define __YIELD_PROCESSOR() asm volatile ("or 27,27,27" ::: "memory") 102 | #endif 103 | #endif 104 | 105 | #if defined(__linux__) && !defined(__NO_FUTEX) 106 | 107 | #if !defined(__NO_TABLE) 108 | #define __TABLE 109 | #endif 110 | 111 | #include 112 | #include 113 | #include 114 | #include 115 | 116 | #define __FUTEX 117 | #define __FUTEX_TIMED 118 | #define __type_used_directly(_T) (std::is_same::type>::type, __futex_preferred_t>::value) 120 | using __futex_preferred_t = std::int32_t; 121 | template ::type = 1> 122 | void __do_direct_wait(_Tp const* ptr, _Tp val, void const* timeout) { 123 | syscall(SYS_futex, ptr, FUTEX_WAIT_PRIVATE, val, timeout, 0, 0); 124 | } 125 | template ::type = 1> 126 | void __do_direct_wake(_Tp const* ptr, bool all) { 127 | syscall(SYS_futex, ptr, FUTEX_WAKE_PRIVATE, all ? 
INT_MAX : 1, 0, 0, 0); 128 | } 129 | 130 | #elif defined(_WIN32) && !defined(__CUSTD__) 131 | 132 | #define __NO_CONDVAR 133 | #define __NO_TABLE 134 | 135 | #include 136 | #define __YIELD() Sleep(0) 137 | #define __SLEEP(x) Sleep(x) 138 | #define __YIELD_PROCESSOR() YieldProcessor() 139 | 140 | #include 141 | template 142 | auto __atomic_load_n(_Tp const* a, int) -> typename std::remove_reference::type { 143 | auto const t = *a; 144 | _ReadWriteBarrier(); 145 | return t; 146 | } 147 | #define __builtin_expect(e, v) (e) 148 | 149 | #if defined(_WIN32_WINNT) && (_WIN32_WINNT >= _WIN32_WINNT_WIN8) && !defined(__NO_FUTEX) 150 | 151 | #define __FUTEX 152 | #define __type_used_directly(_T) (sizeof(_T) <= 8) 153 | using __futex_preferred_t = std::int64_t; 154 | template ::type = 1> 155 | void __do_direct_wait(_Tp const* ptr, _Tp val, void const*) { 156 | WaitOnAddress((PVOID)ptr, (PVOID)&val, sizeof(_Tp), INFINITE); 157 | } 158 | template ::type = 1> 159 | void __do_direct_wake(_Tp const* ptr, bool all) { 160 | if (all) 161 | WakeByAddressAll((PVOID)ptr); 162 | else 163 | WakeByAddressSingle((PVOID)ptr); 164 | } 165 | 166 | #endif 167 | #endif // _WIN32 168 | 169 | #if !defined(__FUTEX) && !defined(__NO_CONDVAR) 170 | 171 | #if defined(__NO_TABLE) 172 | #warning "Condvars always generate a table (ignoring __NO_TABLE)." 173 | #endif 174 | #include 175 | #define __CONDVAR 176 | #define __TABLE 177 | #endif 178 | 179 | #endif // __NO_IDENT 180 | 181 | #ifdef __TABLE 182 | struct alignas(64) contended_t { 183 | #if defined(__FUTEX) 184 | int waiters = 0; 185 | __futex_preferred_t version = 0; 186 | #elif defined(__CONDVAR) 187 | int credit = 0; 188 | pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; 189 | pthread_cond_t condvar = PTHREAD_COND_INITIALIZER; 190 | #else 191 | #error "" 192 | #endif 193 | }; 194 | contended_t * __contention(volatile void const * p); 195 | #else 196 | template 197 | __ABI void __cxx_atomic_try_wait_slow_fallback(_Tp const* ptr, _Tp val, int order) { 198 | #ifndef __NO_SLEEP 199 | long history = 10; 200 | do { 201 | __SLEEP(history >> 2); 202 | history += history >> 2; 203 | if (history > (1 << 10)) 204 | history = 1 << 10; 205 | } while (__atomic_load_n(ptr, order) == val); 206 | #else 207 | __YIELD(); 208 | #endif 209 | } 210 | #endif // __TABLE 211 | 212 | #if defined(__CONDVAR) 213 | 214 | template 215 | void __cxx_atomic_notify_all(volatile _Tp const* ptr) { 216 | auto * const c = __contention(ptr); 217 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 218 | if(__builtin_expect(0 == __atomic_load_n(&c->credit, __ATOMIC_RELAXED), 1)) 219 | return; 220 | if(0 != __atomic_exchange_n(&c->credit, 0, __ATOMIC_RELAXED)) { 221 | pthread_mutex_lock(&c->mutex); 222 | pthread_mutex_unlock(&c->mutex); 223 | pthread_cond_broadcast(&c->condvar); 224 | } 225 | } 226 | template 227 | void __cxx_atomic_notify_one(volatile _Tp const* ptr) { 228 | __cxx_atomic_notify_all(ptr); 229 | } 230 | template 231 | void __cxx_atomic_try_wait_slow(volatile _Tp const* ptr, _Tp const val, int order) { 232 | auto * const c = __contention(ptr); 233 | pthread_mutex_lock(&c->mutex); 234 | __atomic_store_n(&c->credit, 1, __ATOMIC_RELAXED); 235 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 236 | if (val == __atomic_load_n(ptr, order)) 237 | pthread_cond_wait(&c->condvar, &c->mutex); 238 | pthread_mutex_unlock(&c->mutex); 239 | } 240 | 241 | #elif defined(__FUTEX) 242 | 243 | template ::type = 1> 244 | void __cxx_atomic_notify_all(_Tp const* ptr) { 245 | #if defined(__TABLE) 246 | auto * const c = 
__contention(ptr); 247 | __atomic_fetch_add(&c->version, 1, __ATOMIC_RELAXED); 248 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 249 | if (0 != __atomic_exchange_n(&c->waiters, 0, __ATOMIC_RELAXED)) 250 | __do_direct_wake(&c->version, true); 251 | #endif 252 | } 253 | template ::type = 1> 254 | void __cxx_atomic_notify_one(_Tp const* ptr) { 255 | __cxx_atomic_notify_all(ptr); 256 | } 257 | template ::type = 1> 258 | void __cxx_atomic_try_wait_slow(_Tp const* ptr, _Tp const val, int order) { 259 | #if defined(__TABLE) 260 | auto * const c = __contention(ptr); 261 | __atomic_store_n(&c->waiters, 1, __ATOMIC_RELAXED); 262 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 263 | auto const version = __atomic_load_n(&c->version, __ATOMIC_RELAXED); 264 | if (__builtin_expect(val != __atomic_load_n(ptr, order), 1)) 265 | return; 266 | #ifdef __FUTEX_TIMED 267 | constexpr timespec timeout = { 2, 0 }; // Hedge on rare 'int version' aliasing. 268 | __do_direct_wait(&c->version, version, &timeout); 269 | #else 270 | __do_direct_wait(&c->version, version, nullptr); 271 | #endif 272 | #else 273 | __cxx_atomic_try_wait_slow_fallback(ptr, val, order); 274 | #endif // __TABLE 275 | } 276 | 277 | template ::type = 1> 278 | void __cxx_atomic_try_wait_slow(_Tp const* ptr, _Tp val, int order) { 279 | #ifdef __TABLE 280 | auto * const c = __contention(ptr); 281 | __atomic_fetch_add(&c->waiters, 1, __ATOMIC_RELAXED); 282 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 283 | #endif 284 | __do_direct_wait(ptr, val, nullptr); 285 | #ifdef __TABLE 286 | __atomic_fetch_sub(&c->waiters, 1, __ATOMIC_RELAXED); 287 | #endif 288 | } 289 | template ::type = 1> 290 | void __cxx_atomic_notify_all(_Tp const* ptr) { 291 | #ifdef __TABLE 292 | auto * const c = __contention(ptr); 293 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 294 | if (0 != __atomic_load_n(&c->waiters, __ATOMIC_RELAXED)) 295 | #endif 296 | __do_direct_wake(ptr, true); 297 | } 298 | template ::type = 1> 299 | void __cxx_atomic_notify_one(_Tp const* ptr) { 300 | #ifdef __TABLE 301 | auto * const c = __contention(ptr); 302 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 303 | if (0 != __atomic_load_n(&c->waiters, __ATOMIC_RELAXED)) 304 | #endif 305 | __do_direct_wake(ptr, false); 306 | } 307 | 308 | #else // __FUTEX || __CONDVAR 309 | 310 | template 311 | __ABI void __cxx_atomic_try_wait_slow(_Tp const* ptr, _Tp val, int order) { 312 | __cxx_atomic_try_wait_slow_fallback(ptr, val, order); 313 | } 314 | template 315 | __ABI void __cxx_atomic_notify_one(_Tp const* ptr) { } 316 | template 317 | __ABI void __cxx_atomic_notify_all(_Tp const* ptr) { } 318 | 319 | #endif // __FUTEX || __CONDVAR 320 | 321 | template 322 | __ABI void __cxx_atomic_wait(_Tp const* ptr, _Tp const val, int order) { 323 | #ifndef __NO_SPIN 324 | if(__builtin_expect(__atomic_load_n(ptr, order) != val,1)) 325 | return; 326 | for(int i = 0; i < 16; ++i) { 327 | if(__atomic_load_n(ptr, order) != val) 328 | return; 329 | if(i < 12) 330 | __YIELD_PROCESSOR(); 331 | else 332 | __YIELD(); 333 | } 334 | #endif 335 | while(val == __atomic_load_n(ptr, order)) 336 | #ifndef __NO_WAIT 337 | __cxx_atomic_try_wait_slow(ptr, val, order) 338 | #endif 339 | ; 340 | } 341 | 342 | #include 343 | 344 | namespace std { 345 | 346 | template 347 | __ABI void atomic_wait_explicit(atomic<_Tp> const* a, _Tv val, std::memory_order order) { 348 | __cxx_atomic_wait((const _Tp*)a, (_Tp)val, (int)order); 349 | } 350 | template 351 | __ABI void atomic_wait(atomic<_Tp> const* a, _Tv val) { 352 | atomic_wait_explicit(a, val, 
std::memory_order_seq_cst); 353 | } 354 | template 355 | __ABI void atomic_notify_one(atomic<_Tp> const* a) { 356 | __cxx_atomic_notify_one((const _Tp*)a); 357 | } 358 | template 359 | __ABI void atomic_notify_all(atomic<_Tp> const* a) { 360 | __cxx_atomic_notify_all((const _Tp*)a); 361 | } 362 | } 363 | 364 | #endif //__ATOMIC_WAIT_INCLUDED 365 | -------------------------------------------------------------------------------- /include/barrier: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | 23 | */ 24 | 25 | #include 26 | #include 27 | #include 28 | #include "atomic_wait" 29 | 30 | //#define __BARRIER_NO_BUTTERFLY 31 | //#define __BARRIER_NO_WAIT 32 | //#define __BARRIER_NO_SPECIALIZATION 33 | 34 | struct EmptyCompletionF { 35 | inline void operator()() noexcept { } 36 | }; 37 | 38 | #ifndef __BARRIER_NO_BUTTERFLY 39 | 40 | extern thread_local size_t __barrier_favorite_hash; 41 | 42 | template 43 | class barrier { 44 | 45 | static constexpr size_t __max_steps = CHAR_BIT * sizeof(ptrdiff_t) - 1; 46 | 47 | using __phase_t = uint8_t; 48 | 49 | struct alignas(64) __state_t { 50 | std::atomic<__phase_t> v[__max_steps + 1] = {0}; 51 | }; 52 | 53 | alignas(64) ptrdiff_t expected; 54 | ptrdiff_t expected_steps; 55 | std::atomic expected_adjustment; 56 | std::atomic<__phase_t> phase; 57 | alignas(64) std::vector<__state_t> state; 58 | CompletionF completion; 59 | 60 | static constexpr __phase_t __next_phase(__phase_t old) { 61 | return (old + 1) & 3; 62 | } 63 | inline bool __try_get_id(size_t id, __phase_t old_phase) { 64 | return state[id].v[__max_steps].compare_exchange_strong(old_phase, 65 | __next_phase(old_phase), 66 | std::memory_order_relaxed); 67 | } 68 | inline size_t __get_id(__phase_t const old_phase, ptrdiff_t count) { 69 | for(auto i = 0; i < count; ++i) { 70 | auto id = __barrier_favorite_hash + i; 71 | if(id > count) 72 | id %= count; 73 | if(__builtin_expect(__try_get_id(id, old_phase), 1)) { 74 | if(__barrier_favorite_hash != id) 75 | __barrier_favorite_hash = id; 76 | return id; 77 | } 78 | } 79 | return ~0ull; 80 | } 81 | static constexpr uint32_t __log2_floor(ptrdiff_t count) { 82 | return count <= 1 ? 0 : 1 + __log2_floor(count >> 1); 83 | } 84 | static constexpr uint32_t __log2_ceil(ptrdiff_t count) { 85 | auto const t = __log2_floor(count); 86 | return count == (1 << t) ? 
t : t + 1; 87 | } 88 | 89 | public: 90 | using arrival_token = std::tuple&, __phase_t>; 91 | 92 | barrier(ptrdiff_t expected, CompletionF completion = CompletionF()) 93 | : expected(expected), expected_steps(__log2_ceil(expected)), 94 | expected_adjustment(0), phase(0), 95 | state(expected), completion(completion) { 96 | assert(expected >= 0); 97 | } 98 | 99 | ~barrier() = default; 100 | 101 | barrier(barrier const&) = delete; 102 | barrier& operator=(barrier const&) = delete; 103 | 104 | [[nodiscard]] inline arrival_token arrive(ptrdiff_t update = 1) { 105 | 106 | size_t id = 0; // assume, for now 107 | auto const old_phase = phase.load(std::memory_order_relaxed); 108 | auto const count = expected; 109 | assert(count > 0); 110 | auto const steps = expected_steps; 111 | if(0 != steps) { 112 | id = __get_id(old_phase, count); 113 | assert(id != ~0ull); 114 | for(uint32_t k = 0;k < steps; ++k) { 115 | auto const index = steps - k - 1; 116 | state[(id + (1 << k)) % count].v[index].store(__next_phase(old_phase), std::memory_order_release); 117 | while(state[id].v[index].load(std::memory_order_acquire) == old_phase) 118 | ; 119 | } 120 | } 121 | if(0 == id) { 122 | completion(); 123 | expected += expected_adjustment.load(std::memory_order_relaxed); 124 | expected_steps = __log2_ceil(expected); 125 | expected_adjustment.store(0, std::memory_order_relaxed); 126 | phase.store(__next_phase(old_phase), std::memory_order_release); 127 | } 128 | return std::tie(phase, old_phase); 129 | } 130 | inline void wait(arrival_token&& token) const { 131 | auto const& current_phase = std::get<0>(token); 132 | auto const old_phase = std::get<1>(token); 133 | if(__builtin_expect(old_phase != current_phase.load(std::memory_order_acquire),1)) 134 | return; 135 | #ifndef __BARRIER_NO_WAIT 136 | using __clock = std::conditional::type; 139 | auto const start = __clock::now(); 140 | #endif 141 | while (old_phase == current_phase.load(std::memory_order_acquire)) { 142 | #ifndef __BARRIER_NO_WAIT 143 | auto const elapsed = std::chrono::duration_cast(__clock::now() - start); 144 | auto const step = std::min(elapsed / 4 + std::chrono::nanoseconds(100), 145 | std::chrono::nanoseconds(1500)); 146 | if(step > std::chrono::nanoseconds(1000)) 147 | std::this_thread::sleep_for(step); 148 | else if(step > std::chrono::nanoseconds(500)) 149 | #endif 150 | std::this_thread::yield(); 151 | } 152 | } 153 | inline void arrive_and_wait() { 154 | wait(arrive()); 155 | } 156 | inline void arrive_and_drop() { 157 | expected_adjustment.fetch_sub(1, std::memory_order_relaxed); 158 | (void)arrive(); 159 | } 160 | }; 161 | 162 | #else 163 | 164 | template 165 | class barrier { 166 | 167 | alignas(64) std::atomic phase; 168 | std::atomic expected, arrived; 169 | CompletionF completion; 170 | public: 171 | using arrival_token = bool; 172 | 173 | barrier(ptrdiff_t expected, CompletionF completion = CompletionF()) 174 | : phase(false), expected(expected), arrived(expected), completion(completion) { 175 | } 176 | 177 | ~barrier() = default; 178 | 179 | barrier(barrier const&) = delete; 180 | barrier& operator=(barrier const&) = delete; 181 | 182 | [[nodiscard]] arrival_token arrive(ptrdiff_t update = 1) { 183 | auto const old_phase = phase.load(std::memory_order_relaxed); 184 | auto const result = arrived.fetch_sub(update, std::memory_order_acq_rel) - update; 185 | assert(result >= 0); 186 | auto const new_expected = expected.load(std::memory_order_relaxed); 187 | if(0 == result) { 188 | completion(); 189 | arrived.store(new_expected, 
std::memory_order_relaxed); 190 | phase.store(!old_phase, std::memory_order_release); 191 | #ifndef __BARRIER_NO_WAIT 192 | atomic_notify_all(&phase); 193 | #endif 194 | } 195 | return old_phase; 196 | } 197 | void wait(arrival_token&& old_phase) const { 198 | #ifndef __BARRIER_NO_WAIT 199 | atomic_wait_explicit(&phase, old_phase, std::memory_order_acquire); 200 | #else 201 | while(old_phase == phase.load(std::memory_order_acquire)) 202 | ; 203 | #endif 204 | } 205 | void arrive_and_wait() { 206 | wait(arrive()); 207 | } 208 | void arrive_and_drop() { 209 | expected.fetch_sub(1, std::memory_order_relaxed); 210 | (void)arrive(); 211 | } 212 | }; 213 | 214 | #ifndef __BARRIER_NO_SPECIALIZATION 215 | 216 | template< > 217 | class barrier { 218 | 219 | static constexpr uint64_t expected_unit = 1ull; 220 | static constexpr uint64_t arrived_unit = 1ull << 32; 221 | static constexpr uint64_t expected_mask = arrived_unit - 1; 222 | static constexpr uint64_t phase_bit = 1ull << 63; 223 | static constexpr uint64_t arrived_mask = (phase_bit - 1) & ~expected_mask; 224 | 225 | alignas(64) std::atomic phase_arrived_expected; 226 | 227 | static inline constexpr uint64_t __init(ptrdiff_t count) noexcept { 228 | uint64_t const comp = (1u << 31) - count; 229 | return (comp << 32) | comp; 230 | } 231 | 232 | public: 233 | using arrival_token = uint64_t; 234 | 235 | barrier(ptrdiff_t count, EmptyCompletionF = EmptyCompletionF()) 236 | : phase_arrived_expected(__init(count)) { 237 | } 238 | 239 | ~barrier() = default; 240 | 241 | barrier(barrier const&) = delete; 242 | barrier& operator=(barrier const&) = delete; 243 | 244 | [[nodiscard]] inline arrival_token arrive(ptrdiff_t update = 1) { 245 | 246 | auto const old = phase_arrived_expected.fetch_add(arrived_unit, std::memory_order_acq_rel); 247 | if((old ^ (old + arrived_unit)) & phase_bit) { 248 | phase_arrived_expected.fetch_add((old & expected_mask) << 32, std::memory_order_relaxed); 249 | #ifndef __BARRIER_NO_WAIT 250 | atomic_notify_all(&phase_arrived_expected); 251 | #endif 252 | } 253 | return old & phase_bit; 254 | } 255 | inline void wait(arrival_token&& phase) const { 256 | 257 | while(1) { 258 | uint64_t const current = phase_arrived_expected.load(std::memory_order_acquire); 259 | if((current & phase_bit) != phase) 260 | return; 261 | #ifndef __BARRIER_NO_WAIT 262 | atomic_wait_explicit(&phase_arrived_expected, current, std::memory_order_relaxed); 263 | #endif 264 | } 265 | } 266 | inline void arrive_and_wait() { 267 | wait(arrive()); 268 | } 269 | inline void arrive_and_drop() { 270 | phase_arrived_expected.fetch_add(expected_unit, std::memory_order_relaxed); 271 | (void)arrive(); 272 | } 273 | }; 274 | 275 | #endif //__BARRIER_NO_SPECIALIZATION 276 | 277 | #endif //__BARRIER_NO_BUTTERFLY 278 | -------------------------------------------------------------------------------- /include/latch: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be 
included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | 23 | */ 24 | 25 | #include "atomic_wait" 26 | 27 | namespace std { 28 | 29 | class latch { 30 | public: 31 | constexpr explicit latch(ptrdiff_t expected) : counter(expected) { } 32 | ~latch() = default; 33 | 34 | latch(const latch&) = delete; 35 | latch& operator=(const latch&) = delete; 36 | 37 | inline void count_down(ptrdiff_t update = 1) { 38 | assert(update > 0); 39 | auto const old = counter.fetch_sub(update, std::memory_order_release); 40 | assert(old >= update); 41 | #ifndef __NO_WAIT 42 | if(old == update) 43 | atomic_notify_all(&counter); 44 | #endif 45 | } 46 | inline bool try_wait() const noexcept { 47 | return counter.load(std::memory_order_acquire) == 0; 48 | } 49 | inline void wait() const { 50 | while(1) { 51 | auto const current = counter.load(std::memory_order_acquire); 52 | if(current == 0) 53 | return; 54 | #ifndef __NO_WAIT 55 | atomic_wait_explicit(&counter, current, std::memory_order_relaxed) 56 | #endif 57 | ; 58 | } 59 | } 60 | inline void arrive_and_wait(ptrdiff_t update = 1) { 61 | count_down(update); 62 | wait(); 63 | } 64 | 65 | private: 66 | std::atomic counter; 67 | }; 68 | 69 | } 70 | -------------------------------------------------------------------------------- /include/semaphore: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 
22 | 23 | */ 24 | 25 | #include 26 | #include 27 | #include "atomic_wait" 28 | 29 | //#define __NO_SEM 30 | //#define __NO_SEM_BACK 31 | //#define __NO_SEM_FRONT 32 | //#define __NO_SEM_POLL 33 | 34 | #if defined(__APPLE__) || defined(__linux__) 35 | #define __semaphore_no_inline __attribute__ ((noinline)) 36 | #elif defined(_WIN32) 37 | #define __semaphore_no_inline __declspec(noinline) 38 | #include 39 | #undef min 40 | #undef max 41 | #else 42 | #define __semaphore_no_inline 43 | #define __NO_SEM 44 | #endif 45 | 46 | #ifndef __NO_SEM 47 | 48 | #if defined(__APPLE__) 49 | 50 | #include 51 | 52 | #define __SEM_POST_ONE 53 | static constexpr ptrdiff_t __semaphore_max = std::numeric_limits::max(); 54 | using __semaphore_sem_t = dispatch_semaphore_t; 55 | 56 | inline bool __semaphore_sem_init(__semaphore_sem_t &sem, int init) { 57 | return (sem = dispatch_semaphore_create(init)) != NULL; 58 | } 59 | inline bool __semaphore_sem_destroy(__semaphore_sem_t &sem) { 60 | assert(sem != NULL); 61 | dispatch_release(sem); 62 | return true; 63 | } 64 | inline bool __semaphore_sem_post(__semaphore_sem_t &sem, int inc) { 65 | assert(inc == 1); 66 | dispatch_semaphore_signal(sem); 67 | return true; 68 | } 69 | inline bool __semaphore_sem_wait(__semaphore_sem_t &sem) { 70 | return dispatch_semaphore_wait(sem, DISPATCH_TIME_FOREVER) == 0; 71 | } 72 | template 73 | inline bool __semaphore_sem_wait_timed(__semaphore_sem_t &sem, std::chrono::duration const &delta) { 74 | return dispatch_semaphore_wait(sem, dispatch_time(DISPATCH_TIME_NOW, std::chrono::duration_cast(delta).count())) == 0; 75 | } 76 | 77 | #endif //__APPLE__ 78 | 79 | #if defined(__linux__) 80 | 81 | #include 82 | #include 83 | #include 84 | 85 | #ifndef __NO_SEM_POLL 86 | #define __NO_SEM_POLL 87 | #endif 88 | #define __SEM_POST_ONE 89 | static constexpr ptrdiff_t __semaphore_max = SEM_VALUE_MAX; 90 | using __semaphore_sem_t = sem_t; 91 | 92 | inline bool __semaphore_sem_init(__semaphore_sem_t &sem, int init) { 93 | return sem_init(&sem, 0, init) == 0; 94 | } 95 | inline bool __semaphore_sem_destroy(__semaphore_sem_t &sem) { 96 | return sem_destroy(&sem) == 0; 97 | } 98 | inline bool __semaphore_sem_post(__semaphore_sem_t &sem, int inc) { 99 | assert(inc == 1); 100 | return sem_post(&sem) == 0; 101 | } 102 | inline bool __semaphore_sem_wait(__semaphore_sem_t &sem) { 103 | return sem_wait(&sem) == 0; 104 | } 105 | template 106 | inline bool __semaphore_sem_wait_timed(__semaphore_sem_t &sem, std::chrono::duration const &delta) { 107 | struct timespec ts; 108 | ts.tv_sec = static_cast(std::chrono::duration_cast(delta).count()); 109 | ts.tv_nsec = static_cast(std::chrono::duration_cast(delta).count()); 110 | return sem_timedwait(&sem, &ts) == 0; 111 | } 112 | 113 | #endif //__linux__ 114 | 115 | #if defined(_WIN32) 116 | 117 | #define __NO_SEM_BACK 118 | 119 | static constexpr ptrdiff_t __semaphore_max = std::numeric_limits::max(); 120 | using __semaphore_sem_t = HANDLE; 121 | 122 | inline bool __semaphore_sem_init(__semaphore_sem_t &sem, int init) { 123 | bool const ret = (sem = CreateSemaphore(NULL, init, INT_MAX, NULL)) != NULL; 124 | assert(ret); 125 | return ret; 126 | } 127 | inline bool __semaphore_sem_destroy(__semaphore_sem_t &sem) { 128 | assert(sem != NULL); 129 | return CloseHandle(sem) == TRUE; 130 | } 131 | inline bool __semaphore_sem_post(__semaphore_sem_t &sem, int inc) { 132 | assert(sem != NULL); 133 | assert(inc > 0); 134 | return ReleaseSemaphore(sem, inc, NULL) == TRUE; 135 | } 136 | inline bool 
__semaphore_sem_wait(__semaphore_sem_t &sem) { 137 | assert(sem != NULL); 138 | return WaitForSingleObject(sem, INFINITE) == WAIT_OBJECT_0; 139 | } 140 | template 141 | inline bool __semaphore_sem_wait_timed(__semaphore_sem_t &sem, std::chrono::duration const &delta) { 142 | assert(sem != NULL); 143 | return WaitForSingleObject(sem, (DWORD)std::chrono::duration_cast(delta).count()) == WAIT_OBJECT_0; 144 | } 145 | 146 | #endif // _WIN32 147 | 148 | #endif 149 | 150 | namespace std { 151 | 152 | class __atomic_semaphore_base { 153 | 154 | __semaphore_no_inline inline bool __fetch_sub_if_slow(ptrdiff_t old) { 155 | while (old != 0) { 156 | if (count.compare_exchange_weak(old, old - 1, std::memory_order_acquire, std::memory_order_relaxed)) 157 | return true; 158 | } 159 | return false; 160 | } 161 | inline bool __fetch_sub_if() { 162 | 163 | ptrdiff_t old = count.load(std::memory_order_acquire); 164 | if (old == 0) 165 | return false; 166 | if(count.compare_exchange_weak(old, old - 1, std::memory_order_acquire, std::memory_order_relaxed)) 167 | return true; 168 | return __fetch_sub_if_slow(old); // fail only if not available 169 | } 170 | __semaphore_no_inline inline void __wait_slow() { 171 | while (1) { 172 | ptrdiff_t const old = count.load(std::memory_order_acquire); 173 | if(old != 0) 174 | break; 175 | atomic_wait_explicit(&count, old, std::memory_order_relaxed); 176 | } 177 | } 178 | __semaphore_no_inline inline bool __acquire_slow_timed(std::chrono::nanoseconds const& rel_time) { 179 | 180 | using __clock = std::conditional::type; 183 | 184 | auto const start = __clock::now(); 185 | while (1) { 186 | ptrdiff_t const old = count.load(std::memory_order_acquire); 187 | if(old != 0 && __fetch_sub_if_slow(old)) 188 | return true; 189 | auto const elapsed = std::chrono::duration_cast(__clock::now() - start); 190 | auto const delta = rel_time - elapsed; 191 | if(delta <= std::chrono::nanoseconds(0)) 192 | return false; 193 | auto const sleep = std::min((elapsed.count() >> 2) + 100, delta.count()); 194 | std::this_thread::sleep_for(std::chrono::nanoseconds(sleep)); 195 | } 196 | } 197 | std::atomic count; 198 | 199 | public: 200 | static constexpr ptrdiff_t max() noexcept { 201 | return std::numeric_limits::max(); 202 | } 203 | 204 | __atomic_semaphore_base(ptrdiff_t count) : count(count) { } 205 | 206 | ~__atomic_semaphore_base() = default; 207 | 208 | __atomic_semaphore_base(__atomic_semaphore_base const&) = delete; 209 | __atomic_semaphore_base& operator=(__atomic_semaphore_base const&) = delete; 210 | 211 | inline void release(ptrdiff_t update = 1) { 212 | count.fetch_add(update, std::memory_order_release); 213 | if(update > 1) 214 | atomic_notify_all(&count); 215 | else 216 | atomic_notify_one(&count); 217 | } 218 | inline void acquire() { 219 | while (!try_acquire()) 220 | __wait_slow(); 221 | } 222 | 223 | inline bool try_acquire() noexcept { 224 | return __fetch_sub_if(); 225 | } 226 | template 227 | inline bool try_acquire_until(std::chrono::time_point const& abs_time) { 228 | 229 | if (try_acquire()) 230 | return true; 231 | else 232 | return __acquire_slow_timed(abs_time - Clock::now()); 233 | } 234 | template 235 | inline bool try_acquire_for(std::chrono::duration const& rel_time) { 236 | 237 | if (try_acquire()) 238 | return true; 239 | else 240 | return __acquire_slow_timed(rel_time); 241 | } 242 | }; 243 | 244 | #ifndef __NO_SEM 245 | 246 | class __semaphore_base { 247 | 248 | inline bool __backfill(bool success) { 249 | #ifndef __NO_SEM_BACK 250 | if(success) { 251 | auto const 
back_amount = __backbuffer.fetch_sub(2, std::memory_order_acquire); 252 | bool const post_one = back_amount > 0; 253 | bool const post_two = back_amount > 1; 254 | auto const success = (!post_one || __semaphore_sem_post(__semaphore, 1)) && 255 | (!post_two || __semaphore_sem_post(__semaphore, 1)); 256 | assert(success); 257 | if(!post_one || !post_two) 258 | __backbuffer.fetch_add(!post_one ? 2 : 1, std::memory_order_relaxed); 259 | } 260 | #endif 261 | return success; 262 | } 263 | inline bool __try_acquire_fast() { 264 | #ifndef __NO_SEM_FRONT 265 | #ifndef __NO_SEM_POLL 266 | ptrdiff_t old = __frontbuffer.load(std::memory_order_relaxed); 267 | if(!(old >> 32)) { 268 | using __clock = std::conditional::type; 271 | auto const start = __clock::now(); 272 | old = __frontbuffer.load(std::memory_order_relaxed); 273 | while(!(old >> 32)) { 274 | auto const elapsed = std::chrono::duration_cast(__clock::now() - start); 275 | if(elapsed > std::chrono::microseconds(5)) 276 | break; 277 | std::this_thread::sleep_for((elapsed + std::chrono::nanoseconds(100)) / 4); 278 | } 279 | } 280 | #else 281 | // boldly assume the semaphore is free with a count of 1, just because 282 | ptrdiff_t old = 1ll << 32; 283 | #endif 284 | // always steal if you can 285 | while(old >> 32) 286 | if(__frontbuffer.compare_exchange_weak(old, old - (1ll << 32), std::memory_order_acquire)) 287 | return true; 288 | // record we're waiting 289 | old = __frontbuffer.fetch_add(1ll, std::memory_order_release); 290 | // ALWAYS steal if you can! 291 | while(old >> 32) 292 | if(__frontbuffer.compare_exchange_weak(old, old - (1ll << 32), std::memory_order_acquire)) 293 | break; 294 | // not going to wait after all 295 | if(old >> 32) 296 | return __try_done(true); 297 | #endif 298 | // the wait has begun... 
299 | return false; 300 | } 301 | inline bool __try_done(bool success) { 302 | #ifndef __NO_SEM_FRONT 303 | // record we're NOT waiting 304 | __frontbuffer.fetch_sub(1ll, std::memory_order_release); 305 | #endif 306 | return __backfill(success); 307 | } 308 | __semaphore_no_inline inline void __release_slow(ptrdiff_t post_amount) { 309 | #ifdef __SEM_POST_ONE 310 | #ifndef __NO_SEM_BACK 311 | bool const post_one = post_amount > 0; 312 | bool const post_two = post_amount > 1; 313 | if(post_amount > 2) 314 | __backbuffer.fetch_add(post_amount - 2, std::memory_order_acq_rel); 315 | auto const success = (!post_one || __semaphore_sem_post(__semaphore, 1)) && 316 | (!post_two || __semaphore_sem_post(__semaphore, 1)); 317 | assert(success); 318 | #else 319 | for(; post_amount; --post_amount) { 320 | auto const success = __semaphore_sem_post(__semaphore, 1); 321 | assert(success); 322 | } 323 | #endif 324 | #else 325 | auto const success = __semaphore_sem_post(__semaphore, post_amount); 326 | assert(success); 327 | #endif 328 | } 329 | __semaphore_sem_t __semaphore; 330 | #ifndef __NO_SEM_FRONT 331 | std::atomic __frontbuffer; 332 | #endif 333 | #ifndef __NO_SEM_BACK 334 | std::atomic __backbuffer; 335 | #endif 336 | 337 | public: 338 | static constexpr ptrdiff_t max() noexcept { 339 | return __semaphore_max; 340 | } 341 | 342 | __semaphore_base(ptrdiff_t count = 0) : __semaphore() 343 | #ifndef __NO_SEM_FRONT 344 | , __frontbuffer(count << 32) 345 | #endif 346 | #ifndef __NO_SEM_BACK 347 | , __backbuffer(0) 348 | #endif 349 | { 350 | assert(count <= max()); 351 | auto const success = 352 | #ifndef __NO_SEM_FRONT 353 | __semaphore_sem_init(__semaphore, 0); 354 | #else 355 | __semaphore_sem_init(__semaphore, count); 356 | #endif 357 | assert(success); 358 | } 359 | ~__semaphore_base() { 360 | #ifndef __NO_SEM_FRONT 361 | assert(0 == (__frontbuffer.load(std::memory_order_relaxed) & ~0u)); 362 | #endif 363 | auto const success = __semaphore_sem_destroy(__semaphore); 364 | assert(success); 365 | } 366 | 367 | __semaphore_base(const __semaphore_base&) = delete; 368 | __semaphore_base& operator=(const __semaphore_base&) = delete; 369 | 370 | inline void release(ptrdiff_t update = 1) { 371 | #ifndef __NO_SEM_FRONT 372 | // boldly assume the semaphore is taken but uncontended 373 | ptrdiff_t old = 0; 374 | // try to fast-release as long as it's uncontended 375 | while(0 == (old & ~0ul)) 376 | if(__frontbuffer.compare_exchange_weak(old, old + (update << 32), std::memory_order_acq_rel)) 377 | return; 378 | #endif 379 | // slow-release it is 380 | __release_slow(update); 381 | } 382 | inline void acquire() { 383 | if(!__try_acquire_fast()) 384 | __try_done(__semaphore_sem_wait(__semaphore)); 385 | } 386 | inline bool try_acquire() noexcept { 387 | return try_acquire_for(std::chrono::nanoseconds(0)); 388 | } 389 | template 390 | bool try_acquire_until(std::chrono::time_point const& abs_time) { 391 | auto const current = std::max(Clock::now(), abs_time); 392 | return try_acquire_for(std::chrono::duration_cast(abs_time - current)); 393 | } 394 | template 395 | bool try_acquire_for(std::chrono::duration const& rel_time) { 396 | return __try_acquire_fast() || 397 | __try_done(__semaphore_sem_wait_timed(__semaphore, rel_time)); 398 | } 399 | }; 400 | 401 | #endif //__NO_SEM 402 | 403 | template 404 | using semaphore_base = 405 | #ifndef __NO_SEM 406 | typename std::conditional::type 409 | #else 410 | __atomic_semaphore_base 411 | #endif 412 | ; 413 | 414 | template 415 | class counting_semaphore : public 
semaphore_base { 416 | static_assert(least_max_value <= semaphore_base::max(), ""); 417 | 418 | public: 419 | counting_semaphore(ptrdiff_t count = 0) : semaphore_base(count) { } 420 | ~counting_semaphore() = default; 421 | 422 | counting_semaphore(const counting_semaphore&) = delete; 423 | counting_semaphore& operator=(const counting_semaphore&) = delete; 424 | }; 425 | 426 | #ifdef __NO_SEM 427 | 428 | class __binary_semaphore_base { 429 | 430 | __semaphore_no_inline inline bool __acquire_slow_timed(std::chrono::nanoseconds const& rel_time) { 431 | 432 | using __clock = std::conditional::type; 435 | 436 | auto const start = __clock::now(); 437 | while (!try_acquire()) { 438 | auto const elapsed = std::chrono::duration_cast(__clock::now() - start); 439 | auto const delta = rel_time - elapsed; 440 | if(delta <= std::chrono::nanoseconds(0)) 441 | return false; 442 | auto const sleep = std::min((elapsed.count() >> 2) + 100, delta.count()); 443 | std::this_thread::sleep_for(std::chrono::nanoseconds(sleep)); 444 | } 445 | return true; 446 | } 447 | std::atomic available; 448 | 449 | public: 450 | static constexpr ptrdiff_t max() noexcept { return 1; } 451 | 452 | __binary_semaphore_base(ptrdiff_t available) : available(available) { } 453 | 454 | ~__binary_semaphore_base() = default; 455 | 456 | __binary_semaphore_base(__binary_semaphore_base const&) = delete; 457 | __binary_semaphore_base& operator=(__binary_semaphore_base const&) = delete; 458 | 459 | inline void release(ptrdiff_t update = 1) { 460 | available.store(1, std::memory_order_release); 461 | atomic_notify_one(&available); 462 | } 463 | inline void acquire() { 464 | while (!__builtin_expect(try_acquire(), 1)) 465 | atomic_wait_explicit(&available, 0, std::memory_order_relaxed); 466 | } 467 | 468 | inline bool try_acquire() noexcept { 469 | return 1 == available.exchange(0, std::memory_order_acquire); 470 | } 471 | template 472 | bool try_acquire_until(std::chrono::time_point const& abs_time) { 473 | 474 | if (__builtin_expect(try_acquire(), 1)) 475 | return true; 476 | else 477 | return __acquire_slow_timed(abs_time - Clock::now()); 478 | } 479 | template 480 | bool try_acquire_for(std::chrono::duration const& rel_time) { 481 | 482 | if (__builtin_expect(try_acquire(), 1)) 483 | return true; 484 | else 485 | return __acquire_slow_timed(rel_time); 486 | } 487 | }; 488 | 489 | template<> 490 | class counting_semaphore<1> : public __binary_semaphore_base { 491 | public: 492 | counting_semaphore(ptrdiff_t count = 0) : __binary_semaphore_base(count) { } 493 | ~counting_semaphore() = default; 494 | 495 | counting_semaphore(const counting_semaphore&) = delete; 496 | counting_semaphore& operator=(const counting_semaphore&) = delete; 497 | }; 498 | 499 | #endif // __NO_SEM 500 | 501 | using binary_semaphore = counting_semaphore<1>; 502 | 503 | } 504 | -------------------------------------------------------------------------------- /lib/source.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | 
The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | 23 | */ 24 | 25 | #include 26 | #include 27 | #include 28 | 29 | #ifdef __TABLE 30 | 31 | contended_t contention[256]; 32 | 33 | contended_t * __contention(volatile void const * p) { 34 | return contention + ((uintptr_t)p & 255); 35 | } 36 | 37 | #endif //__TABLE 38 | 39 | thread_local size_t __barrier_favorite_hash = 40 | std::hash()(std::this_thread::get_id()); 41 | -------------------------------------------------------------------------------- /sample.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 
22 | 23 | */ 24 | 25 | // WAIT / NOTIFY 26 | //#define __NO_TABLE 27 | //#define __NO_FUTEX 28 | //#define __NO_CONDVAR 29 | //#define __NO_SLEEP 30 | //#define __NO_IDENT 31 | // To benchmark against spinning 32 | //#define __NO_SPIN 33 | //#define __NO_WAIT 34 | 35 | // SEMAPHORE 36 | //#define __NO_SEM 37 | //#define __NO_SEM_BACK 38 | //#define __NO_SEM_FRONT 39 | //#define __NO_SEM_POLL 40 | 41 | #include 42 | #include 43 | #include 44 | #include 45 | #include 46 | #include 47 | #include 48 | #include 49 | #include 50 | #include 51 | 52 | #include "sample.hpp" 53 | 54 | static constexpr int sections = 1 << 20; 55 | 56 | using sum_mean_dev_t = std::tuple; 57 | 58 | template 59 | sum_mean_dev_t sum_mean_dev(V && v) { 60 | assert(!v.empty()); 61 | auto const sum = std::accumulate(v.begin(), v.end(), 0); 62 | auto const mean = sum / v.size(); 63 | auto const sq_diff_sum = std::accumulate(v.begin(), v.end(), 0.0, [=](auto left, auto right) -> auto { 64 | return left + (right - mean) * (right - mean); 65 | }); 66 | auto const variance = sq_diff_sum / v.size(); 67 | auto const stddev = std::sqrt(variance); 68 | return std::tie(sum, mean, stddev); 69 | } 70 | 71 | template 72 | sum_mean_dev_t test_body(int threads, F && f) { 73 | 74 | std::vector progress(threads, 0); 75 | std::vector ts(threads); 76 | for (int i = 0; i < threads; ++i) 77 | ts[i] = std::thread([&, i]() { 78 | progress[i] = f(sections / threads); 79 | }); 80 | 81 | for (auto& t : ts) 82 | t.join(); 83 | 84 | return sum_mean_dev(progress); 85 | } 86 | 87 | template 88 | sum_mean_dev_t test_omp_body(int threads, F && f) { 89 | #ifdef _OPENMP 90 | std::vector progress(threads, 0); 91 | #pragma omp parallel for num_threads(threads) 92 | for (int i = 0; i < threads; ++i) 93 | progress[i] = f(sections / threads); 94 | return sum_mean_dev(progress); 95 | #else 96 | assert(0); // build with -fopenmp 97 | return sum_mean_dev_t(); 98 | #endif 99 | } 100 | 101 | template 102 | void test(std::string const& name, int threads, F && f, std::atomic& keep_going, bool use_omp = false) { 103 | 104 | std::thread test_helper([&]() { 105 | std::this_thread::sleep_for(std::chrono::seconds(2)); 106 | keep_going.store(false, std::memory_order_relaxed); 107 | }); 108 | 109 | auto const t1 = std::chrono::steady_clock::now(); 110 | auto const smd = use_omp ? test_omp_body(threads, f) 111 | : test_body(threads, f); 112 | auto const t2 = std::chrono::steady_clock::now(); 113 | 114 | test_helper.join(); 115 | 116 | double const d = double(std::chrono::duration_cast(t2 - t1).count()); 117 | std::cout << std::setprecision(2) << std::fixed; 118 | std::cout << name << " : " << d / std::get<0>(smd) << "ns per step, fairness metric = " 119 | << 100 * (1.0 - std::min(1.0, std::get<2>(smd) / std::get<1>(smd))) << "%." 
120 | << std::endl; 121 | } 122 | 123 | template 124 | void test_loop(F && f) { 125 | static int const max = std::thread::hardware_concurrency(); 126 | static std::vector> const counts = 127 | { { 1, "single-threaded" }, 128 | { max >> 5, "3% occupancy" }, 129 | { max >> 4, "6% occupancy" }, 130 | { max >> 3, "12% occupancy" }, 131 | { max >> 2, "25% occupancy" }, 132 | { max >> 1, "50% occupancy" }, 133 | { max, "100% occupancy" }, 134 | //#if !defined(__NO_SPIN) || !defined(__NO_WAIT) 135 | // { max * 2, "200% occupancy" } 136 | //#endif 137 | }; 138 | std::set done{0}; 139 | for(auto const& c : counts) { 140 | if(done.find(c.first) != done.end()) 141 | continue; 142 | f(c); 143 | done.insert(c.first); 144 | } 145 | } 146 | 147 | template 148 | void test_mutex(std::string const& name, bool use_omp = false) { 149 | test_loop([&](auto c) { 150 | M m; 151 | std::atomic keep_going(true); 152 | auto f = [&](int n) -> int { 153 | int i = 0; 154 | while(keep_going.load(std::memory_order_relaxed)) { 155 | m.lock(); 156 | ++i; 157 | m.unlock(); 158 | } 159 | return i; 160 | }; 161 | test(name + ": " + c.second, c.first, f, keep_going); 162 | }); 163 | }; 164 | 165 | template 166 | void test_barrier(std::string const& name, bool use_omp = false) { 167 | 168 | test_loop([&](auto c) { 169 | B b(c.first); 170 | std::atomic keep_going(true); // unused here 171 | auto f = [&](int n) -> int { 172 | for (int i = 0; i < n; ++i) 173 | b.arrive_and_wait(); 174 | return n; 175 | }; 176 | test(name + ": " + c.second, c.first, f, keep_going, use_omp); 177 | }); 178 | }; 179 | 180 | int main() { 181 | 182 | int const max = std::thread::hardware_concurrency(); 183 | std::cout << "System has " << max << " hardware threads." << std::endl; 184 | 185 | #ifndef __NO_MUTEX 186 | test_mutex("Semlock"); 187 | test_mutex("Spinlock"); 188 | test_mutex("Ticket"); 189 | #endif 190 | 191 | #ifndef __NO_BARRIER 192 | test_barrier>("Barrier"); 193 | #endif 194 | 195 | #ifdef _OPENMP 196 | struct omp_barrier { 197 | omp_barrier(ptrdiff_t) { } 198 | void arrive_and_wait() { 199 | #pragma omp barrier 200 | } 201 | }; 202 | test_barrier("OMP", true); 203 | #endif 204 | /* 205 | #if defined(_POSIX_THREADS) && !defined(__APPLE__) 206 | struct posix_barrier { 207 | posix_barrier(ptrdiff_t count) { 208 | pthread_barrier_init(&pb, nullptr, count); 209 | } 210 | ~posix_barrier() { 211 | pthread_barrier_destroy(&pb); 212 | } 213 | void arrive_and_wait() { 214 | pthread_barrier_wait(&pb); 215 | } 216 | pthread_barrier_t pb; 217 | }; 218 | test_barrier("Pthread"); 219 | #endif 220 | */ 221 | return 0; 222 | } 223 | -------------------------------------------------------------------------------- /sample.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | 23 | */ 24 | 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | 32 | struct mutex { 33 | void lock() noexcept { 34 | while (1 == l.exchange(1, std::memory_order_acquire)) 35 | #ifndef __NO_WAIT 36 | atomic_wait_explicit(&l, 1, std::memory_order_relaxed) 37 | #endif 38 | ; 39 | } 40 | void unlock() noexcept { 41 | l.store(0, std::memory_order_release); 42 | #ifndef __NO_WAIT 43 | atomic_notify_one(&l); 44 | #endif 45 | } 46 | std::atomic l = ATOMIC_VAR_INIT(0); 47 | }; 48 | 49 | struct ticket_mutex { 50 | void lock() noexcept { 51 | auto const my = in.fetch_add(1, std::memory_order_acquire); 52 | while(1) { 53 | auto const now = out.load(std::memory_order_acquire); 54 | if(now == my) 55 | return; 56 | #ifndef __NO_WAIT 57 | atomic_wait_explicit(&out, now, std::memory_order_relaxed); 58 | #endif 59 | } 60 | } 61 | void unlock() noexcept { 62 | out.fetch_add(1, std::memory_order_release); 63 | #ifndef __NO_WAIT 64 | atomic_notify_all(&out); 65 | #endif 66 | } 67 | alignas(64) std::atomic in = ATOMIC_VAR_INIT(0); 68 | alignas(64) std::atomic out = ATOMIC_VAR_INIT(0); 69 | }; 70 | 71 | struct sem_mutex { 72 | void lock() noexcept { 73 | c.acquire(); 74 | } 75 | void unlock() noexcept { 76 | c.release(); 77 | } 78 | std::binary_semaphore c = 1; 79 | }; 80 | --------------------------------------------------------------------------------
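
For reference, a minimal usage sketch of the headers above. This file is not part of the repository; the file name `demo.cpp` and the build line are assumptions modeled on `build.sh`. Note that `latch` and the semaphores live in namespace `std`, while `barrier` is declared at global scope in `include/barrier`.

```
// Build (assumed, mirroring build.sh):
//   g++ -Iinclude -std=c++17 -O2 demo.cpp lib/source.cpp -lpthread -o demo
#include <atomic>
#include <iostream>
#include <thread>

#include "atomic_wait"
#include "latch"
#include "semaphore"

int main() {
    std::atomic<int> flag(0);
    std::binary_semaphore gate(0);       // counting_semaphore<1>, initially unavailable
    std::latch done(2);

    std::thread consumer([&] {
        std::atomic_wait(&flag, 0);      // block while flag still holds 0
        std::cout << "flag is now " << flag.load() << std::endl;
        gate.acquire();                  // wait for the producer's release()
        done.count_down();
    });

    std::thread producer([&] {
        flag.store(1, std::memory_order_release);
        std::atomic_notify_one(&flag);   // wake the waiter on &flag
        gate.release();                  // make one unit available
        done.count_down();
    });

    done.wait();                         // returns once both count_down() calls have happened
    producer.join();
    consumer.join();
    return 0;
}
```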