├── .gitignore
├── README.md
├── build.bat
├── build.sh
├── build_clang.sh
├── build_clang_omp.sh
├── build_omp.sh
├── include
│   ├── atomic_wait
│   ├── barrier
│   ├── latch
│   └── semaphore
├── lib
│   └── source.cpp
├── sample.cpp
└── sample.hpp
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | .vscode/*
 2 | sample
 3 | sample.exe
 4 | *.obj
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Sample implementation of C++20 synchronization facilities
 2 |
 3 | This repository contains a sample implementation of these http://wg21.link/p1135 facilities:
 4 |
 5 | * `atomic_wait` / `_notify` free functions (not the member functions)
 6 | * `counting_` / `binary_semaphore`
 7 | * `latch` and `barrier`
 8 |
 9 | ## How do I build the sample?
10 |
11 | This project is self-contained.
12 |
13 | ```
14 | git clone https://github.com/ogiroux/atomic_wait
15 | cd atomic_wait && ./build.sh
16 | ```
17 |
18 | ## What platforms are supported?
19 |
20 | Linux, Mac and Windows.
21 |
22 | ## How are `atomic_wait` / `_notify` composed?
23 |
24 | The implementation has a variety of strategies that it selects by platform:
25 | * Contention state table. Optimizes futex usage, or holds CVs, unless `-D__NO_TABLE`.
26 | * Futex. Supported on Linux and Windows, unless `-D__NO_FUTEX`. Requires a table on Linux.
27 | * Condition variables. Supported on Linux and Mac, unless `-D__NO_CONDVAR`. Requires a table.
28 | * Timed back-off. Supported on everything, unless `-D__NO_SLEEP`.
29 | * Spinlock. Supported on everything, only used as a last resort unless `-D__NO_IDENT`.
30 |
31 | These strategies are selected for each platform, in the order written, based on what's disabled with the macros:
32 | * Linux: futex + table -> CVs + table -> timed backoff -> spin.
33 | * Mac: CVs + table -> timed backoff -> spin.
34 | * Windows: futex -> timed backoff -> spin.
35 | * CUDA: timed backoff -> spin.
36 | * Unidentified platform: spin.
37 |
38 | ## How do `counting_` / `binary_semaphore` work?
39 |
40 | The implementation has these specializations:
41 |
42 | * The fully general case, for `counting_semaphore` instantiated for huge numbers. This is implemented in terms of `atomic`, `atomic_wait` / `_notify`. This path is always enabled.
43 | * The constrained case, the default range, for numbers supported by the underlying platform semaphore (typically a `long`). This is implemented in terms of POSIX, Dispatch and Win32 semaphores, with the optimizations below. Disable this path with `-D__NO_SEM`.
44 | * The case of a unit range, such as the alias `binary_semaphore`. This is specialized only when platform semaphores are disabled. This path uses `atomic`, `atomic_wait` / `_notify`.
45 |
46 | Platform semaphores get (because they need) some additional optimizations, in up to two orthogonal directions (a sketch of the front-buffering idea follows this list):
47 |
48 | * Front buffering: an `atomic` object models the semaphore's conceptual count (incl. negative values). Operations on the platform semaphore are avoided as long as the modeled count stays positive. This is enabled by default on all platforms, but can be disabled with `-D__NO_SEM_FRONT`.
49 | * Back buffering: when the platform semaphore does not natively support the `release( count )` operation, an `atomic` object distributes the `release(1)` cooperatively among all threads waiting on the semaphore, as in a binary tree. This is used by default on Linux and Mac OS X, and can be disabled with `-D__NO_SEM_BACK`.
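
A rough sketch of the front-buffering idea, not the code in `include/semaphore`: the names `slow_semaphore` and `front_buffered_semaphore` are made up for illustration, and a mutex/condvar pair stands in for the platform semaphore (POSIX, Dispatch or Win32 in the real implementation).

```
#include <algorithm>
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>

class slow_semaphore {                   // stand-in for the expensive platform semaphore
    std::mutex m_;
    std::condition_variable cv_;
    std::ptrdiff_t n_ = 0;
public:
    void post() { { std::lock_guard<std::mutex> l(m_); ++n_; } cv_.notify_one(); }
    void wait() { std::unique_lock<std::mutex> l(m_); cv_.wait(l, [&]{ return n_ > 0; }); --n_; }
};

class front_buffered_semaphore {
    std::atomic<std::ptrdiff_t> count_;  // conceptual count; goes negative when threads queue up
    slow_semaphore sem_;                 // only touched when there are (or will be) waiters
public:
    explicit front_buffered_semaphore(std::ptrdiff_t initial) : count_(initial) {}
    void release(std::ptrdiff_t update = 1) {
        auto const old = count_.fetch_add(update, std::memory_order_release);
        // A negative old count means that many acquirers took (or are taking) the
        // slow path; wake only as many of them as this release actually covers.
        if (old < 0) {
            auto wake = std::min(update, -old);
            for (; wake > 0; --wake)
                sem_.post();
        }
    }
    void acquire() {
        // Fast path: while the modeled count stays positive, the platform
        // semaphore is never touched at all.
        if (count_.fetch_sub(1, std::memory_order_acquire) > 0)
            return;
        sem_.wait();                     // slow path: park on the platform semaphore
    }
};
```

With `-D__NO_SEM_FRONT` this fast path disappears and every `acquire()` / `release()` goes straight to the platform semaphore.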
50 | 51 | ## What about `latch` and `barrier`? 52 | 53 | At the moment, they are only implemented in terms of `atomic` operations. These aren't ready for review. 54 | -------------------------------------------------------------------------------- /build.bat: -------------------------------------------------------------------------------- 1 | cl /EHsc /Iinclude /std:c++17 /O2 sample.cpp lib/source.cpp synchronization.lib /Fe:sample.exe 2 | -------------------------------------------------------------------------------- /build.sh: -------------------------------------------------------------------------------- 1 | g++ -Iinclude -std=c++17 -O2 sample.cpp lib/source.cpp -lstdc++ -lpthread -lm -o sample 2 | -------------------------------------------------------------------------------- /build_clang.sh: -------------------------------------------------------------------------------- 1 | clang -Iinclude -std=c++17 -O2 sample.cpp lib/source.cpp -lstdc++ -lpthread -lm -o sample 2 | -------------------------------------------------------------------------------- /build_clang_omp.sh: -------------------------------------------------------------------------------- 1 | clang -fopenmp=libomp -L../llvm-project/build/lib/ -Iinclude -std=c++17 -O2 sample.cpp lib/source.cpp -lstdc++ -lpthread -lm -o sample 2 | -------------------------------------------------------------------------------- /build_omp.sh: -------------------------------------------------------------------------------- 1 | g++ -fopenmp -Iinclude -std=c++17 -O2 sample.cpp lib/source.cpp -lstdc++ -lpthread -lm -o sample 2 | -------------------------------------------------------------------------------- /include/atomic_wait: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | 23 | */ 24 | 25 | /* 26 | 27 | This file introduces std::atomic_wait, atomic_notify_one, atomic_notify_all. 28 | 29 | It has these strategies implemented: 30 | * Contention table. Used to optimize futex notify, or to hold CVs. Disable with __NO_TABLE. 31 | * Futex. Supported on Linux and Windows. For performance requires a table on Linux. Disable with __NO_FUTEX. 32 | * Condition variables. Supported on Linux and Mac. Requires table to function. Disable with __NO_CONDVAR. 33 | * Timed back-off. Supported on everything. Disable with __NO_SLEEP. 34 | * Spinlock. Supported on everything. 
Force with __NO_IDENT. Note: performance is too terrible to use. 35 | 36 | You can also compare to pure spinning at algorithm level with __NO_WAIT. 37 | 38 | The strategy is chosen this way, by platform: 39 | * Linux: default to futex (with table), fallback to futex (no table) -> CVs -> timed backoff -> spin. 40 | * Mac: default to CVs (table), fallback to timed backoff -> spin. 41 | * Windows: default to futex (no table), fallback to timed backoff -> spin. 42 | * CUDA: default to timed backoff, fallback to spin. (This is not all checked in in this tree.) 43 | * Unidentified platform: default to spin. 44 | 45 | */ 46 | 47 | //#define __NO_TABLE 48 | //#define __NO_FUTEX 49 | //#define __NO_CONDVAR 50 | //#define __NO_SLEEP 51 | //#define __NO_IDENT 52 | 53 | // To benchmark against spinning 54 | //#define __NO_SPIN 55 | //#define __NO_WAIT 56 | 57 | #ifndef __ATOMIC_WAIT_INCLUDED 58 | #define __ATOMIC_WAIT_INCLUDED 59 | 60 | #include 61 | #include 62 | #include 63 | #include 64 | 65 | #if defined(__NO_IDENT) 66 | 67 | #include 68 | #include 69 | 70 | #define __ABI 71 | #define __YIELD() std::this_thread::yield() 72 | #define __SLEEP(x) std::this_thread::sleep_for(std::chrono::microseconds(x)) 73 | #define __YIELD_PROCESSOR() 74 | 75 | #else 76 | 77 | #if defined(__CUSTD__) 78 | #define __NO_FUTEX 79 | #define __NO_CONDVAR 80 | #ifndef __CUDACC__ 81 | #define __host__ 82 | #define __device__ 83 | #endif 84 | #define __ABI __host__ __device__ 85 | #else 86 | #define __ABI 87 | #endif 88 | 89 | #if defined(__APPLE__) || defined(__linux__) 90 | 91 | #include 92 | #include 93 | #define __YIELD() sched_yield() 94 | #define __SLEEP(x) usleep(x) 95 | 96 | #if defined(__aarch64__) 97 | # define __YIELD_PROCESSOR() asm volatile ("yield" ::: "memory") 98 | #elif defined(__x86_64__) 99 | # define __YIELD_PROCESSOR() asm volatile ("pause" ::: "memory") 100 | #elif defined (__powerpc__) 101 | # define __YIELD_PROCESSOR() asm volatile ("or 27,27,27" ::: "memory") 102 | #endif 103 | #endif 104 | 105 | #if defined(__linux__) && !defined(__NO_FUTEX) 106 | 107 | #if !defined(__NO_TABLE) 108 | #define __TABLE 109 | #endif 110 | 111 | #include 112 | #include 113 | #include 114 | #include 115 | 116 | #define __FUTEX 117 | #define __FUTEX_TIMED 118 | #define __type_used_directly(_T) (std::is_same::type>::type, __futex_preferred_t>::value) 120 | using __futex_preferred_t = std::int32_t; 121 | template ::type = 1> 122 | void __do_direct_wait(_Tp const* ptr, _Tp val, void const* timeout) { 123 | syscall(SYS_futex, ptr, FUTEX_WAIT_PRIVATE, val, timeout, 0, 0); 124 | } 125 | template ::type = 1> 126 | void __do_direct_wake(_Tp const* ptr, bool all) { 127 | syscall(SYS_futex, ptr, FUTEX_WAKE_PRIVATE, all ? 
INT_MAX : 1, 0, 0, 0); 128 | } 129 | 130 | #elif defined(_WIN32) && !defined(__CUSTD__) 131 | 132 | #define __NO_CONDVAR 133 | #define __NO_TABLE 134 | 135 | #include 136 | #define __YIELD() Sleep(0) 137 | #define __SLEEP(x) Sleep(x) 138 | #define __YIELD_PROCESSOR() YieldProcessor() 139 | 140 | #include 141 | template 142 | auto __atomic_load_n(_Tp const* a, int) -> typename std::remove_reference::type { 143 | auto const t = *a; 144 | _ReadWriteBarrier(); 145 | return t; 146 | } 147 | #define __builtin_expect(e, v) (e) 148 | 149 | #if defined(_WIN32_WINNT) && (_WIN32_WINNT >= _WIN32_WINNT_WIN8) && !defined(__NO_FUTEX) 150 | 151 | #define __FUTEX 152 | #define __type_used_directly(_T) (sizeof(_T) <= 8) 153 | using __futex_preferred_t = std::int64_t; 154 | template ::type = 1> 155 | void __do_direct_wait(_Tp const* ptr, _Tp val, void const*) { 156 | WaitOnAddress((PVOID)ptr, (PVOID)&val, sizeof(_Tp), INFINITE); 157 | } 158 | template ::type = 1> 159 | void __do_direct_wake(_Tp const* ptr, bool all) { 160 | if (all) 161 | WakeByAddressAll((PVOID)ptr); 162 | else 163 | WakeByAddressSingle((PVOID)ptr); 164 | } 165 | 166 | #endif 167 | #endif // _WIN32 168 | 169 | #if !defined(__FUTEX) && !defined(__NO_CONDVAR) 170 | 171 | #if defined(__NO_TABLE) 172 | #warning "Condvars always generate a table (ignoring __NO_TABLE)." 173 | #endif 174 | #include 175 | #define __CONDVAR 176 | #define __TABLE 177 | #endif 178 | 179 | #endif // __NO_IDENT 180 | 181 | #ifdef __TABLE 182 | struct alignas(64) contended_t { 183 | #if defined(__FUTEX) 184 | int waiters = 0; 185 | __futex_preferred_t version = 0; 186 | #elif defined(__CONDVAR) 187 | int credit = 0; 188 | pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; 189 | pthread_cond_t condvar = PTHREAD_COND_INITIALIZER; 190 | #else 191 | #error "" 192 | #endif 193 | }; 194 | contended_t * __contention(volatile void const * p); 195 | #else 196 | template 197 | __ABI void __cxx_atomic_try_wait_slow_fallback(_Tp const* ptr, _Tp val, int order) { 198 | #ifndef __NO_SLEEP 199 | long history = 10; 200 | do { 201 | __SLEEP(history >> 2); 202 | history += history >> 2; 203 | if (history > (1 << 10)) 204 | history = 1 << 10; 205 | } while (__atomic_load_n(ptr, order) == val); 206 | #else 207 | __YIELD(); 208 | #endif 209 | } 210 | #endif // __TABLE 211 | 212 | #if defined(__CONDVAR) 213 | 214 | template 215 | void __cxx_atomic_notify_all(volatile _Tp const* ptr) { 216 | auto * const c = __contention(ptr); 217 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 218 | if(__builtin_expect(0 == __atomic_load_n(&c->credit, __ATOMIC_RELAXED), 1)) 219 | return; 220 | if(0 != __atomic_exchange_n(&c->credit, 0, __ATOMIC_RELAXED)) { 221 | pthread_mutex_lock(&c->mutex); 222 | pthread_mutex_unlock(&c->mutex); 223 | pthread_cond_broadcast(&c->condvar); 224 | } 225 | } 226 | template 227 | void __cxx_atomic_notify_one(volatile _Tp const* ptr) { 228 | __cxx_atomic_notify_all(ptr); 229 | } 230 | template 231 | void __cxx_atomic_try_wait_slow(volatile _Tp const* ptr, _Tp const val, int order) { 232 | auto * const c = __contention(ptr); 233 | pthread_mutex_lock(&c->mutex); 234 | __atomic_store_n(&c->credit, 1, __ATOMIC_RELAXED); 235 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 236 | if (val == __atomic_load_n(ptr, order)) 237 | pthread_cond_wait(&c->condvar, &c->mutex); 238 | pthread_mutex_unlock(&c->mutex); 239 | } 240 | 241 | #elif defined(__FUTEX) 242 | 243 | template ::type = 1> 244 | void __cxx_atomic_notify_all(_Tp const* ptr) { 245 | #if defined(__TABLE) 246 | auto * const c = 
__contention(ptr); 247 | __atomic_fetch_add(&c->version, 1, __ATOMIC_RELAXED); 248 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 249 | if (0 != __atomic_exchange_n(&c->waiters, 0, __ATOMIC_RELAXED)) 250 | __do_direct_wake(&c->version, true); 251 | #endif 252 | } 253 | template ::type = 1> 254 | void __cxx_atomic_notify_one(_Tp const* ptr) { 255 | __cxx_atomic_notify_all(ptr); 256 | } 257 | template ::type = 1> 258 | void __cxx_atomic_try_wait_slow(_Tp const* ptr, _Tp const val, int order) { 259 | #if defined(__TABLE) 260 | auto * const c = __contention(ptr); 261 | __atomic_store_n(&c->waiters, 1, __ATOMIC_RELAXED); 262 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 263 | auto const version = __atomic_load_n(&c->version, __ATOMIC_RELAXED); 264 | if (__builtin_expect(val != __atomic_load_n(ptr, order), 1)) 265 | return; 266 | #ifdef __FUTEX_TIMED 267 | constexpr timespec timeout = { 2, 0 }; // Hedge on rare 'int version' aliasing. 268 | __do_direct_wait(&c->version, version, &timeout); 269 | #else 270 | __do_direct_wait(&c->version, version, nullptr); 271 | #endif 272 | #else 273 | __cxx_atomic_try_wait_slow_fallback(ptr, val, order); 274 | #endif // __TABLE 275 | } 276 | 277 | template ::type = 1> 278 | void __cxx_atomic_try_wait_slow(_Tp const* ptr, _Tp val, int order) { 279 | #ifdef __TABLE 280 | auto * const c = __contention(ptr); 281 | __atomic_fetch_add(&c->waiters, 1, __ATOMIC_RELAXED); 282 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 283 | #endif 284 | __do_direct_wait(ptr, val, nullptr); 285 | #ifdef __TABLE 286 | __atomic_fetch_sub(&c->waiters, 1, __ATOMIC_RELAXED); 287 | #endif 288 | } 289 | template ::type = 1> 290 | void __cxx_atomic_notify_all(_Tp const* ptr) { 291 | #ifdef __TABLE 292 | auto * const c = __contention(ptr); 293 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 294 | if (0 != __atomic_load_n(&c->waiters, __ATOMIC_RELAXED)) 295 | #endif 296 | __do_direct_wake(ptr, true); 297 | } 298 | template ::type = 1> 299 | void __cxx_atomic_notify_one(_Tp const* ptr) { 300 | #ifdef __TABLE 301 | auto * const c = __contention(ptr); 302 | __atomic_thread_fence(__ATOMIC_SEQ_CST); 303 | if (0 != __atomic_load_n(&c->waiters, __ATOMIC_RELAXED)) 304 | #endif 305 | __do_direct_wake(ptr, false); 306 | } 307 | 308 | #else // __FUTEX || __CONDVAR 309 | 310 | template 311 | __ABI void __cxx_atomic_try_wait_slow(_Tp const* ptr, _Tp val, int order) { 312 | __cxx_atomic_try_wait_slow_fallback(ptr, val, order); 313 | } 314 | template 315 | __ABI void __cxx_atomic_notify_one(_Tp const* ptr) { } 316 | template 317 | __ABI void __cxx_atomic_notify_all(_Tp const* ptr) { } 318 | 319 | #endif // __FUTEX || __CONDVAR 320 | 321 | template 322 | __ABI void __cxx_atomic_wait(_Tp const* ptr, _Tp const val, int order) { 323 | #ifndef __NO_SPIN 324 | if(__builtin_expect(__atomic_load_n(ptr, order) != val,1)) 325 | return; 326 | for(int i = 0; i < 16; ++i) { 327 | if(__atomic_load_n(ptr, order) != val) 328 | return; 329 | if(i < 12) 330 | __YIELD_PROCESSOR(); 331 | else 332 | __YIELD(); 333 | } 334 | #endif 335 | while(val == __atomic_load_n(ptr, order)) 336 | #ifndef __NO_WAIT 337 | __cxx_atomic_try_wait_slow(ptr, val, order) 338 | #endif 339 | ; 340 | } 341 | 342 | #include 343 | 344 | namespace std { 345 | 346 | template 347 | __ABI void atomic_wait_explicit(atomic<_Tp> const* a, _Tv val, std::memory_order order) { 348 | __cxx_atomic_wait((const _Tp*)a, (_Tp)val, (int)order); 349 | } 350 | template 351 | __ABI void atomic_wait(atomic<_Tp> const* a, _Tv val) { 352 | atomic_wait_explicit(a, val, 
std::memory_order_seq_cst); 353 | } 354 | template 355 | __ABI void atomic_notify_one(atomic<_Tp> const* a) { 356 | __cxx_atomic_notify_one((const _Tp*)a); 357 | } 358 | template 359 | __ABI void atomic_notify_all(atomic<_Tp> const* a) { 360 | __cxx_atomic_notify_all((const _Tp*)a); 361 | } 362 | } 363 | 364 | #endif //__ATOMIC_WAIT_INCLUDED 365 | -------------------------------------------------------------------------------- /include/barrier: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | 23 | */ 24 | 25 | #include 26 | #include 27 | #include 28 | #include "atomic_wait" 29 | 30 | //#define __BARRIER_NO_BUTTERFLY 31 | //#define __BARRIER_NO_WAIT 32 | //#define __BARRIER_NO_SPECIALIZATION 33 | 34 | struct EmptyCompletionF { 35 | inline void operator()() noexcept { } 36 | }; 37 | 38 | #ifndef __BARRIER_NO_BUTTERFLY 39 | 40 | extern thread_local size_t __barrier_favorite_hash; 41 | 42 | template 43 | class barrier { 44 | 45 | static constexpr size_t __max_steps = CHAR_BIT * sizeof(ptrdiff_t) - 1; 46 | 47 | using __phase_t = uint8_t; 48 | 49 | struct alignas(64) __state_t { 50 | std::atomic<__phase_t> v[__max_steps + 1] = {0}; 51 | }; 52 | 53 | alignas(64) ptrdiff_t expected; 54 | ptrdiff_t expected_steps; 55 | std::atomic expected_adjustment; 56 | std::atomic<__phase_t> phase; 57 | alignas(64) std::vector<__state_t> state; 58 | CompletionF completion; 59 | 60 | static constexpr __phase_t __next_phase(__phase_t old) { 61 | return (old + 1) & 3; 62 | } 63 | inline bool __try_get_id(size_t id, __phase_t old_phase) { 64 | return state[id].v[__max_steps].compare_exchange_strong(old_phase, 65 | __next_phase(old_phase), 66 | std::memory_order_relaxed); 67 | } 68 | inline size_t __get_id(__phase_t const old_phase, ptrdiff_t count) { 69 | for(auto i = 0; i < count; ++i) { 70 | auto id = __barrier_favorite_hash + i; 71 | if(id > count) 72 | id %= count; 73 | if(__builtin_expect(__try_get_id(id, old_phase), 1)) { 74 | if(__barrier_favorite_hash != id) 75 | __barrier_favorite_hash = id; 76 | return id; 77 | } 78 | } 79 | return ~0ull; 80 | } 81 | static constexpr uint32_t __log2_floor(ptrdiff_t count) { 82 | return count <= 1 ? 0 : 1 + __log2_floor(count >> 1); 83 | } 84 | static constexpr uint32_t __log2_ceil(ptrdiff_t count) { 85 | auto const t = __log2_floor(count); 86 | return count == (1 << t) ? 
t : t + 1; 87 | } 88 | 89 | public: 90 | using arrival_token = std::tuple&, __phase_t>; 91 | 92 | barrier(ptrdiff_t expected, CompletionF completion = CompletionF()) 93 | : expected(expected), expected_steps(__log2_ceil(expected)), 94 | expected_adjustment(0), phase(0), 95 | state(expected), completion(completion) { 96 | assert(expected >= 0); 97 | } 98 | 99 | ~barrier() = default; 100 | 101 | barrier(barrier const&) = delete; 102 | barrier& operator=(barrier const&) = delete; 103 | 104 | [[nodiscard]] inline arrival_token arrive(ptrdiff_t update = 1) { 105 | 106 | size_t id = 0; // assume, for now 107 | auto const old_phase = phase.load(std::memory_order_relaxed); 108 | auto const count = expected; 109 | assert(count > 0); 110 | auto const steps = expected_steps; 111 | if(0 != steps) { 112 | id = __get_id(old_phase, count); 113 | assert(id != ~0ull); 114 | for(uint32_t k = 0;k < steps; ++k) { 115 | auto const index = steps - k - 1; 116 | state[(id + (1 << k)) % count].v[index].store(__next_phase(old_phase), std::memory_order_release); 117 | while(state[id].v[index].load(std::memory_order_acquire) == old_phase) 118 | ; 119 | } 120 | } 121 | if(0 == id) { 122 | completion(); 123 | expected += expected_adjustment.load(std::memory_order_relaxed); 124 | expected_steps = __log2_ceil(expected); 125 | expected_adjustment.store(0, std::memory_order_relaxed); 126 | phase.store(__next_phase(old_phase), std::memory_order_release); 127 | } 128 | return std::tie(phase, old_phase); 129 | } 130 | inline void wait(arrival_token&& token) const { 131 | auto const& current_phase = std::get<0>(token); 132 | auto const old_phase = std::get<1>(token); 133 | if(__builtin_expect(old_phase != current_phase.load(std::memory_order_acquire),1)) 134 | return; 135 | #ifndef __BARRIER_NO_WAIT 136 | using __clock = std::conditional::type; 139 | auto const start = __clock::now(); 140 | #endif 141 | while (old_phase == current_phase.load(std::memory_order_acquire)) { 142 | #ifndef __BARRIER_NO_WAIT 143 | auto const elapsed = std::chrono::duration_cast(__clock::now() - start); 144 | auto const step = std::min(elapsed / 4 + std::chrono::nanoseconds(100), 145 | std::chrono::nanoseconds(1500)); 146 | if(step > std::chrono::nanoseconds(1000)) 147 | std::this_thread::sleep_for(step); 148 | else if(step > std::chrono::nanoseconds(500)) 149 | #endif 150 | std::this_thread::yield(); 151 | } 152 | } 153 | inline void arrive_and_wait() { 154 | wait(arrive()); 155 | } 156 | inline void arrive_and_drop() { 157 | expected_adjustment.fetch_sub(1, std::memory_order_relaxed); 158 | (void)arrive(); 159 | } 160 | }; 161 | 162 | #else 163 | 164 | template 165 | class barrier { 166 | 167 | alignas(64) std::atomic phase; 168 | std::atomic expected, arrived; 169 | CompletionF completion; 170 | public: 171 | using arrival_token = bool; 172 | 173 | barrier(ptrdiff_t expected, CompletionF completion = CompletionF()) 174 | : phase(false), expected(expected), arrived(expected), completion(completion) { 175 | } 176 | 177 | ~barrier() = default; 178 | 179 | barrier(barrier const&) = delete; 180 | barrier& operator=(barrier const&) = delete; 181 | 182 | [[nodiscard]] arrival_token arrive(ptrdiff_t update = 1) { 183 | auto const old_phase = phase.load(std::memory_order_relaxed); 184 | auto const result = arrived.fetch_sub(update, std::memory_order_acq_rel) - update; 185 | assert(result >= 0); 186 | auto const new_expected = expected.load(std::memory_order_relaxed); 187 | if(0 == result) { 188 | completion(); 189 | arrived.store(new_expected, 
std::memory_order_relaxed); 190 | phase.store(!old_phase, std::memory_order_release); 191 | #ifndef __BARRIER_NO_WAIT 192 | atomic_notify_all(&phase); 193 | #endif 194 | } 195 | return old_phase; 196 | } 197 | void wait(arrival_token&& old_phase) const { 198 | #ifndef __BARRIER_NO_WAIT 199 | atomic_wait_explicit(&phase, old_phase, std::memory_order_acquire); 200 | #else 201 | while(old_phase == phase.load(std::memory_order_acquire)) 202 | ; 203 | #endif 204 | } 205 | void arrive_and_wait() { 206 | wait(arrive()); 207 | } 208 | void arrive_and_drop() { 209 | expected.fetch_sub(1, std::memory_order_relaxed); 210 | (void)arrive(); 211 | } 212 | }; 213 | 214 | #ifndef __BARRIER_NO_SPECIALIZATION 215 | 216 | template< > 217 | class barrier { 218 | 219 | static constexpr uint64_t expected_unit = 1ull; 220 | static constexpr uint64_t arrived_unit = 1ull << 32; 221 | static constexpr uint64_t expected_mask = arrived_unit - 1; 222 | static constexpr uint64_t phase_bit = 1ull << 63; 223 | static constexpr uint64_t arrived_mask = (phase_bit - 1) & ~expected_mask; 224 | 225 | alignas(64) std::atomic phase_arrived_expected; 226 | 227 | static inline constexpr uint64_t __init(ptrdiff_t count) noexcept { 228 | uint64_t const comp = (1u << 31) - count; 229 | return (comp << 32) | comp; 230 | } 231 | 232 | public: 233 | using arrival_token = uint64_t; 234 | 235 | barrier(ptrdiff_t count, EmptyCompletionF = EmptyCompletionF()) 236 | : phase_arrived_expected(__init(count)) { 237 | } 238 | 239 | ~barrier() = default; 240 | 241 | barrier(barrier const&) = delete; 242 | barrier& operator=(barrier const&) = delete; 243 | 244 | [[nodiscard]] inline arrival_token arrive(ptrdiff_t update = 1) { 245 | 246 | auto const old = phase_arrived_expected.fetch_add(arrived_unit, std::memory_order_acq_rel); 247 | if((old ^ (old + arrived_unit)) & phase_bit) { 248 | phase_arrived_expected.fetch_add((old & expected_mask) << 32, std::memory_order_relaxed); 249 | #ifndef __BARRIER_NO_WAIT 250 | atomic_notify_all(&phase_arrived_expected); 251 | #endif 252 | } 253 | return old & phase_bit; 254 | } 255 | inline void wait(arrival_token&& phase) const { 256 | 257 | while(1) { 258 | uint64_t const current = phase_arrived_expected.load(std::memory_order_acquire); 259 | if((current & phase_bit) != phase) 260 | return; 261 | #ifndef __BARRIER_NO_WAIT 262 | atomic_wait_explicit(&phase_arrived_expected, current, std::memory_order_relaxed); 263 | #endif 264 | } 265 | } 266 | inline void arrive_and_wait() { 267 | wait(arrive()); 268 | } 269 | inline void arrive_and_drop() { 270 | phase_arrived_expected.fetch_add(expected_unit, std::memory_order_relaxed); 271 | (void)arrive(); 272 | } 273 | }; 274 | 275 | #endif //__BARRIER_NO_SPECIALIZATION 276 | 277 | #endif //__BARRIER_NO_BUTTERFLY 278 | -------------------------------------------------------------------------------- /include/latch: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be 
included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | 23 | */ 24 | 25 | #include "atomic_wait" 26 | 27 | namespace std { 28 | 29 | class latch { 30 | public: 31 | constexpr explicit latch(ptrdiff_t expected) : counter(expected) { } 32 | ~latch() = default; 33 | 34 | latch(const latch&) = delete; 35 | latch& operator=(const latch&) = delete; 36 | 37 | inline void count_down(ptrdiff_t update = 1) { 38 | assert(update > 0); 39 | auto const old = counter.fetch_sub(update, std::memory_order_release); 40 | assert(old >= update); 41 | #ifndef __NO_WAIT 42 | if(old == update) 43 | atomic_notify_all(&counter); 44 | #endif 45 | } 46 | inline bool try_wait() const noexcept { 47 | return counter.load(std::memory_order_acquire) == 0; 48 | } 49 | inline void wait() const { 50 | while(1) { 51 | auto const current = counter.load(std::memory_order_acquire); 52 | if(current == 0) 53 | return; 54 | #ifndef __NO_WAIT 55 | atomic_wait_explicit(&counter, current, std::memory_order_relaxed) 56 | #endif 57 | ; 58 | } 59 | } 60 | inline void arrive_and_wait(ptrdiff_t update = 1) { 61 | count_down(update); 62 | wait(); 63 | } 64 | 65 | private: 66 | std::atomic counter; 67 | }; 68 | 69 | } 70 | -------------------------------------------------------------------------------- /include/semaphore: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 
22 | 23 | */ 24 | 25 | #include 26 | #include 27 | #include "atomic_wait" 28 | 29 | //#define __NO_SEM 30 | //#define __NO_SEM_BACK 31 | //#define __NO_SEM_FRONT 32 | //#define __NO_SEM_POLL 33 | 34 | #if defined(__APPLE__) || defined(__linux__) 35 | #define __semaphore_no_inline __attribute__ ((noinline)) 36 | #elif defined(_WIN32) 37 | #define __semaphore_no_inline __declspec(noinline) 38 | #include 39 | #undef min 40 | #undef max 41 | #else 42 | #define __semaphore_no_inline 43 | #define __NO_SEM 44 | #endif 45 | 46 | #ifndef __NO_SEM 47 | 48 | #if defined(__APPLE__) 49 | 50 | #include 51 | 52 | #define __SEM_POST_ONE 53 | static constexpr ptrdiff_t __semaphore_max = std::numeric_limits::max(); 54 | using __semaphore_sem_t = dispatch_semaphore_t; 55 | 56 | inline bool __semaphore_sem_init(__semaphore_sem_t &sem, int init) { 57 | return (sem = dispatch_semaphore_create(init)) != NULL; 58 | } 59 | inline bool __semaphore_sem_destroy(__semaphore_sem_t &sem) { 60 | assert(sem != NULL); 61 | dispatch_release(sem); 62 | return true; 63 | } 64 | inline bool __semaphore_sem_post(__semaphore_sem_t &sem, int inc) { 65 | assert(inc == 1); 66 | dispatch_semaphore_signal(sem); 67 | return true; 68 | } 69 | inline bool __semaphore_sem_wait(__semaphore_sem_t &sem) { 70 | return dispatch_semaphore_wait(sem, DISPATCH_TIME_FOREVER) == 0; 71 | } 72 | template 73 | inline bool __semaphore_sem_wait_timed(__semaphore_sem_t &sem, std::chrono::duration const &delta) { 74 | return dispatch_semaphore_wait(sem, dispatch_time(DISPATCH_TIME_NOW, std::chrono::duration_cast(delta).count())) == 0; 75 | } 76 | 77 | #endif //__APPLE__ 78 | 79 | #if defined(__linux__) 80 | 81 | #include 82 | #include 83 | #include 84 | 85 | #ifndef __NO_SEM_POLL 86 | #define __NO_SEM_POLL 87 | #endif 88 | #define __SEM_POST_ONE 89 | static constexpr ptrdiff_t __semaphore_max = SEM_VALUE_MAX; 90 | using __semaphore_sem_t = sem_t; 91 | 92 | inline bool __semaphore_sem_init(__semaphore_sem_t &sem, int init) { 93 | return sem_init(&sem, 0, init) == 0; 94 | } 95 | inline bool __semaphore_sem_destroy(__semaphore_sem_t &sem) { 96 | return sem_destroy(&sem) == 0; 97 | } 98 | inline bool __semaphore_sem_post(__semaphore_sem_t &sem, int inc) { 99 | assert(inc == 1); 100 | return sem_post(&sem) == 0; 101 | } 102 | inline bool __semaphore_sem_wait(__semaphore_sem_t &sem) { 103 | return sem_wait(&sem) == 0; 104 | } 105 | template 106 | inline bool __semaphore_sem_wait_timed(__semaphore_sem_t &sem, std::chrono::duration const &delta) { 107 | struct timespec ts; 108 | ts.tv_sec = static_cast(std::chrono::duration_cast(delta).count()); 109 | ts.tv_nsec = static_cast(std::chrono::duration_cast(delta).count()); 110 | return sem_timedwait(&sem, &ts) == 0; 111 | } 112 | 113 | #endif //__linux__ 114 | 115 | #if defined(_WIN32) 116 | 117 | #define __NO_SEM_BACK 118 | 119 | static constexpr ptrdiff_t __semaphore_max = std::numeric_limits::max(); 120 | using __semaphore_sem_t = HANDLE; 121 | 122 | inline bool __semaphore_sem_init(__semaphore_sem_t &sem, int init) { 123 | bool const ret = (sem = CreateSemaphore(NULL, init, INT_MAX, NULL)) != NULL; 124 | assert(ret); 125 | return ret; 126 | } 127 | inline bool __semaphore_sem_destroy(__semaphore_sem_t &sem) { 128 | assert(sem != NULL); 129 | return CloseHandle(sem) == TRUE; 130 | } 131 | inline bool __semaphore_sem_post(__semaphore_sem_t &sem, int inc) { 132 | assert(sem != NULL); 133 | assert(inc > 0); 134 | return ReleaseSemaphore(sem, inc, NULL) == TRUE; 135 | } 136 | inline bool 
__semaphore_sem_wait(__semaphore_sem_t &sem) { 137 | assert(sem != NULL); 138 | return WaitForSingleObject(sem, INFINITE) == WAIT_OBJECT_0; 139 | } 140 | template 141 | inline bool __semaphore_sem_wait_timed(__semaphore_sem_t &sem, std::chrono::duration const &delta) { 142 | assert(sem != NULL); 143 | return WaitForSingleObject(sem, (DWORD)std::chrono::duration_cast(delta).count()) == WAIT_OBJECT_0; 144 | } 145 | 146 | #endif // _WIN32 147 | 148 | #endif 149 | 150 | namespace std { 151 | 152 | class __atomic_semaphore_base { 153 | 154 | __semaphore_no_inline inline bool __fetch_sub_if_slow(ptrdiff_t old) { 155 | while (old != 0) { 156 | if (count.compare_exchange_weak(old, old - 1, std::memory_order_acquire, std::memory_order_relaxed)) 157 | return true; 158 | } 159 | return false; 160 | } 161 | inline bool __fetch_sub_if() { 162 | 163 | ptrdiff_t old = count.load(std::memory_order_acquire); 164 | if (old == 0) 165 | return false; 166 | if(count.compare_exchange_weak(old, old - 1, std::memory_order_acquire, std::memory_order_relaxed)) 167 | return true; 168 | return __fetch_sub_if_slow(old); // fail only if not available 169 | } 170 | __semaphore_no_inline inline void __wait_slow() { 171 | while (1) { 172 | ptrdiff_t const old = count.load(std::memory_order_acquire); 173 | if(old != 0) 174 | break; 175 | atomic_wait_explicit(&count, old, std::memory_order_relaxed); 176 | } 177 | } 178 | __semaphore_no_inline inline bool __acquire_slow_timed(std::chrono::nanoseconds const& rel_time) { 179 | 180 | using __clock = std::conditional::type; 183 | 184 | auto const start = __clock::now(); 185 | while (1) { 186 | ptrdiff_t const old = count.load(std::memory_order_acquire); 187 | if(old != 0 && __fetch_sub_if_slow(old)) 188 | return true; 189 | auto const elapsed = std::chrono::duration_cast(__clock::now() - start); 190 | auto const delta = rel_time - elapsed; 191 | if(delta <= std::chrono::nanoseconds(0)) 192 | return false; 193 | auto const sleep = std::min((elapsed.count() >> 2) + 100, delta.count()); 194 | std::this_thread::sleep_for(std::chrono::nanoseconds(sleep)); 195 | } 196 | } 197 | std::atomic count; 198 | 199 | public: 200 | static constexpr ptrdiff_t max() noexcept { 201 | return std::numeric_limits::max(); 202 | } 203 | 204 | __atomic_semaphore_base(ptrdiff_t count) : count(count) { } 205 | 206 | ~__atomic_semaphore_base() = default; 207 | 208 | __atomic_semaphore_base(__atomic_semaphore_base const&) = delete; 209 | __atomic_semaphore_base& operator=(__atomic_semaphore_base const&) = delete; 210 | 211 | inline void release(ptrdiff_t update = 1) { 212 | count.fetch_add(update, std::memory_order_release); 213 | if(update > 1) 214 | atomic_notify_all(&count); 215 | else 216 | atomic_notify_one(&count); 217 | } 218 | inline void acquire() { 219 | while (!try_acquire()) 220 | __wait_slow(); 221 | } 222 | 223 | inline bool try_acquire() noexcept { 224 | return __fetch_sub_if(); 225 | } 226 | template 227 | inline bool try_acquire_until(std::chrono::time_point const& abs_time) { 228 | 229 | if (try_acquire()) 230 | return true; 231 | else 232 | return __acquire_slow_timed(abs_time - Clock::now()); 233 | } 234 | template 235 | inline bool try_acquire_for(std::chrono::duration const& rel_time) { 236 | 237 | if (try_acquire()) 238 | return true; 239 | else 240 | return __acquire_slow_timed(rel_time); 241 | } 242 | }; 243 | 244 | #ifndef __NO_SEM 245 | 246 | class __semaphore_base { 247 | 248 | inline bool __backfill(bool success) { 249 | #ifndef __NO_SEM_BACK 250 | if(success) { 251 | auto const 
back_amount = __backbuffer.fetch_sub(2, std::memory_order_acquire); 252 | bool const post_one = back_amount > 0; 253 | bool const post_two = back_amount > 1; 254 | auto const success = (!post_one || __semaphore_sem_post(__semaphore, 1)) && 255 | (!post_two || __semaphore_sem_post(__semaphore, 1)); 256 | assert(success); 257 | if(!post_one || !post_two) 258 | __backbuffer.fetch_add(!post_one ? 2 : 1, std::memory_order_relaxed); 259 | } 260 | #endif 261 | return success; 262 | } 263 | inline bool __try_acquire_fast() { 264 | #ifndef __NO_SEM_FRONT 265 | #ifndef __NO_SEM_POLL 266 | ptrdiff_t old = __frontbuffer.load(std::memory_order_relaxed); 267 | if(!(old >> 32)) { 268 | using __clock = std::conditional::type; 271 | auto const start = __clock::now(); 272 | old = __frontbuffer.load(std::memory_order_relaxed); 273 | while(!(old >> 32)) { 274 | auto const elapsed = std::chrono::duration_cast(__clock::now() - start); 275 | if(elapsed > std::chrono::microseconds(5)) 276 | break; 277 | std::this_thread::sleep_for((elapsed + std::chrono::nanoseconds(100)) / 4); 278 | } 279 | } 280 | #else 281 | // boldly assume the semaphore is free with a count of 1, just because 282 | ptrdiff_t old = 1ll << 32; 283 | #endif 284 | // always steal if you can 285 | while(old >> 32) 286 | if(__frontbuffer.compare_exchange_weak(old, old - (1ll << 32), std::memory_order_acquire)) 287 | return true; 288 | // record we're waiting 289 | old = __frontbuffer.fetch_add(1ll, std::memory_order_release); 290 | // ALWAYS steal if you can! 291 | while(old >> 32) 292 | if(__frontbuffer.compare_exchange_weak(old, old - (1ll << 32), std::memory_order_acquire)) 293 | break; 294 | // not going to wait after all 295 | if(old >> 32) 296 | return __try_done(true); 297 | #endif 298 | // the wait has begun... 
299 | return false; 300 | } 301 | inline bool __try_done(bool success) { 302 | #ifndef __NO_SEM_FRONT 303 | // record we're NOT waiting 304 | __frontbuffer.fetch_sub(1ll, std::memory_order_release); 305 | #endif 306 | return __backfill(success); 307 | } 308 | __semaphore_no_inline inline void __release_slow(ptrdiff_t post_amount) { 309 | #ifdef __SEM_POST_ONE 310 | #ifndef __NO_SEM_BACK 311 | bool const post_one = post_amount > 0; 312 | bool const post_two = post_amount > 1; 313 | if(post_amount > 2) 314 | __backbuffer.fetch_add(post_amount - 2, std::memory_order_acq_rel); 315 | auto const success = (!post_one || __semaphore_sem_post(__semaphore, 1)) && 316 | (!post_two || __semaphore_sem_post(__semaphore, 1)); 317 | assert(success); 318 | #else 319 | for(; post_amount; --post_amount) { 320 | auto const success = __semaphore_sem_post(__semaphore, 1); 321 | assert(success); 322 | } 323 | #endif 324 | #else 325 | auto const success = __semaphore_sem_post(__semaphore, post_amount); 326 | assert(success); 327 | #endif 328 | } 329 | __semaphore_sem_t __semaphore; 330 | #ifndef __NO_SEM_FRONT 331 | std::atomic __frontbuffer; 332 | #endif 333 | #ifndef __NO_SEM_BACK 334 | std::atomic __backbuffer; 335 | #endif 336 | 337 | public: 338 | static constexpr ptrdiff_t max() noexcept { 339 | return __semaphore_max; 340 | } 341 | 342 | __semaphore_base(ptrdiff_t count = 0) : __semaphore() 343 | #ifndef __NO_SEM_FRONT 344 | , __frontbuffer(count << 32) 345 | #endif 346 | #ifndef __NO_SEM_BACK 347 | , __backbuffer(0) 348 | #endif 349 | { 350 | assert(count <= max()); 351 | auto const success = 352 | #ifndef __NO_SEM_FRONT 353 | __semaphore_sem_init(__semaphore, 0); 354 | #else 355 | __semaphore_sem_init(__semaphore, count); 356 | #endif 357 | assert(success); 358 | } 359 | ~__semaphore_base() { 360 | #ifndef __NO_SEM_FRONT 361 | assert(0 == (__frontbuffer.load(std::memory_order_relaxed) & ~0u)); 362 | #endif 363 | auto const success = __semaphore_sem_destroy(__semaphore); 364 | assert(success); 365 | } 366 | 367 | __semaphore_base(const __semaphore_base&) = delete; 368 | __semaphore_base& operator=(const __semaphore_base&) = delete; 369 | 370 | inline void release(ptrdiff_t update = 1) { 371 | #ifndef __NO_SEM_FRONT 372 | // boldly assume the semaphore is taken but uncontended 373 | ptrdiff_t old = 0; 374 | // try to fast-release as long as it's uncontended 375 | while(0 == (old & ~0ul)) 376 | if(__frontbuffer.compare_exchange_weak(old, old + (update << 32), std::memory_order_acq_rel)) 377 | return; 378 | #endif 379 | // slow-release it is 380 | __release_slow(update); 381 | } 382 | inline void acquire() { 383 | if(!__try_acquire_fast()) 384 | __try_done(__semaphore_sem_wait(__semaphore)); 385 | } 386 | inline bool try_acquire() noexcept { 387 | return try_acquire_for(std::chrono::nanoseconds(0)); 388 | } 389 | template 390 | bool try_acquire_until(std::chrono::time_point const& abs_time) { 391 | auto const current = std::max(Clock::now(), abs_time); 392 | return try_acquire_for(std::chrono::duration_cast(abs_time - current)); 393 | } 394 | template 395 | bool try_acquire_for(std::chrono::duration const& rel_time) { 396 | return __try_acquire_fast() || 397 | __try_done(__semaphore_sem_wait_timed(__semaphore, rel_time)); 398 | } 399 | }; 400 | 401 | #endif //__NO_SEM 402 | 403 | template 404 | using semaphore_base = 405 | #ifndef __NO_SEM 406 | typename std::conditional::type 409 | #else 410 | __atomic_semaphore_base 411 | #endif 412 | ; 413 | 414 | template 415 | class counting_semaphore : public 
semaphore_base { 416 | static_assert(least_max_value <= semaphore_base::max(), ""); 417 | 418 | public: 419 | counting_semaphore(ptrdiff_t count = 0) : semaphore_base(count) { } 420 | ~counting_semaphore() = default; 421 | 422 | counting_semaphore(const counting_semaphore&) = delete; 423 | counting_semaphore& operator=(const counting_semaphore&) = delete; 424 | }; 425 | 426 | #ifdef __NO_SEM 427 | 428 | class __binary_semaphore_base { 429 | 430 | __semaphore_no_inline inline bool __acquire_slow_timed(std::chrono::nanoseconds const& rel_time) { 431 | 432 | using __clock = std::conditional::type; 435 | 436 | auto const start = __clock::now(); 437 | while (!try_acquire()) { 438 | auto const elapsed = std::chrono::duration_cast(__clock::now() - start); 439 | auto const delta = rel_time - elapsed; 440 | if(delta <= std::chrono::nanoseconds(0)) 441 | return false; 442 | auto const sleep = std::min((elapsed.count() >> 2) + 100, delta.count()); 443 | std::this_thread::sleep_for(std::chrono::nanoseconds(sleep)); 444 | } 445 | return true; 446 | } 447 | std::atomic available; 448 | 449 | public: 450 | static constexpr ptrdiff_t max() noexcept { return 1; } 451 | 452 | __binary_semaphore_base(ptrdiff_t available) : available(available) { } 453 | 454 | ~__binary_semaphore_base() = default; 455 | 456 | __binary_semaphore_base(__binary_semaphore_base const&) = delete; 457 | __binary_semaphore_base& operator=(__binary_semaphore_base const&) = delete; 458 | 459 | inline void release(ptrdiff_t update = 1) { 460 | available.store(1, std::memory_order_release); 461 | atomic_notify_one(&available); 462 | } 463 | inline void acquire() { 464 | while (!__builtin_expect(try_acquire(), 1)) 465 | atomic_wait_explicit(&available, 0, std::memory_order_relaxed); 466 | } 467 | 468 | inline bool try_acquire() noexcept { 469 | return 1 == available.exchange(0, std::memory_order_acquire); 470 | } 471 | template 472 | bool try_acquire_until(std::chrono::time_point const& abs_time) { 473 | 474 | if (__builtin_expect(try_acquire(), 1)) 475 | return true; 476 | else 477 | return __acquire_slow_timed(abs_time - Clock::now()); 478 | } 479 | template 480 | bool try_acquire_for(std::chrono::duration const& rel_time) { 481 | 482 | if (__builtin_expect(try_acquire(), 1)) 483 | return true; 484 | else 485 | return __acquire_slow_timed(rel_time); 486 | } 487 | }; 488 | 489 | template<> 490 | class counting_semaphore<1> : public __binary_semaphore_base { 491 | public: 492 | counting_semaphore(ptrdiff_t count = 0) : __binary_semaphore_base(count) { } 493 | ~counting_semaphore() = default; 494 | 495 | counting_semaphore(const counting_semaphore&) = delete; 496 | counting_semaphore& operator=(const counting_semaphore&) = delete; 497 | }; 498 | 499 | #endif // __NO_SEM 500 | 501 | using binary_semaphore = counting_semaphore<1>; 502 | 503 | } 504 | -------------------------------------------------------------------------------- /lib/source.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | 
The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | 23 | */ 24 | 25 | #include 26 | #include 27 | #include 28 | 29 | #ifdef __TABLE 30 | 31 | contended_t contention[256]; 32 | 33 | contended_t * __contention(volatile void const * p) { 34 | return contention + ((uintptr_t)p & 255); 35 | } 36 | 37 | #endif //__TABLE 38 | 39 | thread_local size_t __barrier_favorite_hash = 40 | std::hash()(std::this_thread::get_id()); 41 | -------------------------------------------------------------------------------- /sample.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 
22 | 23 | */ 24 | 25 | // WAIT / NOTIFY 26 | //#define __NO_TABLE 27 | //#define __NO_FUTEX 28 | //#define __NO_CONDVAR 29 | //#define __NO_SLEEP 30 | //#define __NO_IDENT 31 | // To benchmark against spinning 32 | //#define __NO_SPIN 33 | //#define __NO_WAIT 34 | 35 | // SEMAPHORE 36 | //#define __NO_SEM 37 | //#define __NO_SEM_BACK 38 | //#define __NO_SEM_FRONT 39 | //#define __NO_SEM_POLL 40 | 41 | #include 42 | #include 43 | #include 44 | #include 45 | #include 46 | #include 47 | #include 48 | #include 49 | #include 50 | #include 51 | 52 | #include "sample.hpp" 53 | 54 | static constexpr int sections = 1 << 20; 55 | 56 | using sum_mean_dev_t = std::tuple; 57 | 58 | template 59 | sum_mean_dev_t sum_mean_dev(V && v) { 60 | assert(!v.empty()); 61 | auto const sum = std::accumulate(v.begin(), v.end(), 0); 62 | auto const mean = sum / v.size(); 63 | auto const sq_diff_sum = std::accumulate(v.begin(), v.end(), 0.0, [=](auto left, auto right) -> auto { 64 | return left + (right - mean) * (right - mean); 65 | }); 66 | auto const variance = sq_diff_sum / v.size(); 67 | auto const stddev = std::sqrt(variance); 68 | return std::tie(sum, mean, stddev); 69 | } 70 | 71 | template 72 | sum_mean_dev_t test_body(int threads, F && f) { 73 | 74 | std::vector progress(threads, 0); 75 | std::vector ts(threads); 76 | for (int i = 0; i < threads; ++i) 77 | ts[i] = std::thread([&, i]() { 78 | progress[i] = f(sections / threads); 79 | }); 80 | 81 | for (auto& t : ts) 82 | t.join(); 83 | 84 | return sum_mean_dev(progress); 85 | } 86 | 87 | template 88 | sum_mean_dev_t test_omp_body(int threads, F && f) { 89 | #ifdef _OPENMP 90 | std::vector progress(threads, 0); 91 | #pragma omp parallel for num_threads(threads) 92 | for (int i = 0; i < threads; ++i) 93 | progress[i] = f(sections / threads); 94 | return sum_mean_dev(progress); 95 | #else 96 | assert(0); // build with -fopenmp 97 | return sum_mean_dev_t(); 98 | #endif 99 | } 100 | 101 | template 102 | void test(std::string const& name, int threads, F && f, std::atomic& keep_going, bool use_omp = false) { 103 | 104 | std::thread test_helper([&]() { 105 | std::this_thread::sleep_for(std::chrono::seconds(2)); 106 | keep_going.store(false, std::memory_order_relaxed); 107 | }); 108 | 109 | auto const t1 = std::chrono::steady_clock::now(); 110 | auto const smd = use_omp ? test_omp_body(threads, f) 111 | : test_body(threads, f); 112 | auto const t2 = std::chrono::steady_clock::now(); 113 | 114 | test_helper.join(); 115 | 116 | double const d = double(std::chrono::duration_cast(t2 - t1).count()); 117 | std::cout << std::setprecision(2) << std::fixed; 118 | std::cout << name << " : " << d / std::get<0>(smd) << "ns per step, fairness metric = " 119 | << 100 * (1.0 - std::min(1.0, std::get<2>(smd) / std::get<1>(smd))) << "%." 
120 | << std::endl; 121 | } 122 | 123 | template 124 | void test_loop(F && f) { 125 | static int const max = std::thread::hardware_concurrency(); 126 | static std::vector> const counts = 127 | { { 1, "single-threaded" }, 128 | { max >> 5, "3% occupancy" }, 129 | { max >> 4, "6% occupancy" }, 130 | { max >> 3, "12% occupancy" }, 131 | { max >> 2, "25% occupancy" }, 132 | { max >> 1, "50% occupancy" }, 133 | { max, "100% occupancy" }, 134 | //#if !defined(__NO_SPIN) || !defined(__NO_WAIT) 135 | // { max * 2, "200% occupancy" } 136 | //#endif 137 | }; 138 | std::set done{0}; 139 | for(auto const& c : counts) { 140 | if(done.find(c.first) != done.end()) 141 | continue; 142 | f(c); 143 | done.insert(c.first); 144 | } 145 | } 146 | 147 | template 148 | void test_mutex(std::string const& name, bool use_omp = false) { 149 | test_loop([&](auto c) { 150 | M m; 151 | std::atomic keep_going(true); 152 | auto f = [&](int n) -> int { 153 | int i = 0; 154 | while(keep_going.load(std::memory_order_relaxed)) { 155 | m.lock(); 156 | ++i; 157 | m.unlock(); 158 | } 159 | return i; 160 | }; 161 | test(name + ": " + c.second, c.first, f, keep_going); 162 | }); 163 | }; 164 | 165 | template 166 | void test_barrier(std::string const& name, bool use_omp = false) { 167 | 168 | test_loop([&](auto c) { 169 | B b(c.first); 170 | std::atomic keep_going(true); // unused here 171 | auto f = [&](int n) -> int { 172 | for (int i = 0; i < n; ++i) 173 | b.arrive_and_wait(); 174 | return n; 175 | }; 176 | test(name + ": " + c.second, c.first, f, keep_going, use_omp); 177 | }); 178 | }; 179 | 180 | int main() { 181 | 182 | int const max = std::thread::hardware_concurrency(); 183 | std::cout << "System has " << max << " hardware threads." << std::endl; 184 | 185 | #ifndef __NO_MUTEX 186 | test_mutex("Semlock"); 187 | test_mutex("Spinlock"); 188 | test_mutex("Ticket"); 189 | #endif 190 | 191 | #ifndef __NO_BARRIER 192 | test_barrier>("Barrier"); 193 | #endif 194 | 195 | #ifdef _OPENMP 196 | struct omp_barrier { 197 | omp_barrier(ptrdiff_t) { } 198 | void arrive_and_wait() { 199 | #pragma omp barrier 200 | } 201 | }; 202 | test_barrier("OMP", true); 203 | #endif 204 | /* 205 | #if defined(_POSIX_THREADS) && !defined(__APPLE__) 206 | struct posix_barrier { 207 | posix_barrier(ptrdiff_t count) { 208 | pthread_barrier_init(&pb, nullptr, count); 209 | } 210 | ~posix_barrier() { 211 | pthread_barrier_destroy(&pb); 212 | } 213 | void arrive_and_wait() { 214 | pthread_barrier_wait(&pb); 215 | } 216 | pthread_barrier_t pb; 217 | }; 218 | test_barrier("Pthread"); 219 | #endif 220 | */ 221 | return 0; 222 | } 223 | -------------------------------------------------------------------------------- /sample.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | Copyright (c) 2019, NVIDIA Corporation 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | 23 | */ 24 | 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | 32 | struct mutex { 33 | void lock() noexcept { 34 | while (1 == l.exchange(1, std::memory_order_acquire)) 35 | #ifndef __NO_WAIT 36 | atomic_wait_explicit(&l, 1, std::memory_order_relaxed) 37 | #endif 38 | ; 39 | } 40 | void unlock() noexcept { 41 | l.store(0, std::memory_order_release); 42 | #ifndef __NO_WAIT 43 | atomic_notify_one(&l); 44 | #endif 45 | } 46 | std::atomic l = ATOMIC_VAR_INIT(0); 47 | }; 48 | 49 | struct ticket_mutex { 50 | void lock() noexcept { 51 | auto const my = in.fetch_add(1, std::memory_order_acquire); 52 | while(1) { 53 | auto const now = out.load(std::memory_order_acquire); 54 | if(now == my) 55 | return; 56 | #ifndef __NO_WAIT 57 | atomic_wait_explicit(&out, now, std::memory_order_relaxed); 58 | #endif 59 | } 60 | } 61 | void unlock() noexcept { 62 | out.fetch_add(1, std::memory_order_release); 63 | #ifndef __NO_WAIT 64 | atomic_notify_all(&out); 65 | #endif 66 | } 67 | alignas(64) std::atomic in = ATOMIC_VAR_INIT(0); 68 | alignas(64) std::atomic out = ATOMIC_VAR_INIT(0); 69 | }; 70 | 71 | struct sem_mutex { 72 | void lock() noexcept { 73 | c.acquire(); 74 | } 75 | void unlock() noexcept { 76 | c.release(); 77 | } 78 | std::binary_semaphore c = 1; 79 | }; 80 | --------------------------------------------------------------------------------
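
For reference, a minimal usage sketch of the headers above. This file is not part of the repository; the file name `demo.cpp` and the build line are assumptions modeled on `build.sh`. Note that `latch` and the semaphores live in namespace `std`, while `barrier` is declared at global scope in `include/barrier`.

```
// Build (assumed, mirroring build.sh):
//   g++ -Iinclude -std=c++17 -O2 demo.cpp lib/source.cpp -lpthread -o demo
#include <atomic>
#include <iostream>
#include <thread>

#include "atomic_wait"
#include "latch"
#include "semaphore"

int main() {
    std::atomic<int> flag(0);
    std::binary_semaphore gate(0);       // counting_semaphore<1>, initially unavailable
    std::latch done(2);

    std::thread consumer([&] {
        std::atomic_wait(&flag, 0);      // block while flag still holds 0
        std::cout << "flag is now " << flag.load() << std::endl;
        gate.acquire();                  // wait for the producer's release()
        done.count_down();
    });

    std::thread producer([&] {
        flag.store(1, std::memory_order_release);
        std::atomic_notify_one(&flag);   // wake the waiter on &flag
        gate.release();                  // make one unit available
        done.count_down();
    });

    done.wait();                         // returns once both count_down() calls have happened
    producer.join();
    consumer.join();
    return 0;
}
```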