├── .github └── dependabot.yml ├── .gitignore ├── .travis.yml ├── CHANGELOG.md ├── Cargo.toml ├── README.md ├── benches └── membarrier.rs ├── bors.toml ├── src └── lib.rs └── tests └── membarrier.rs /.github/dependabot.yml: -------------------------------------------------------------------------------- 1 | version: 2 2 | updates: 3 | - package-ecosystem: cargo 4 | directory: "/" 5 | schedule: 6 | interval: daily 7 | time: "20:00" 8 | open-pull-requests-limit: 10 9 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | /target/ 3 | **/*.rs.bk 4 | Cargo.lock 5 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: rust 2 | 3 | cache: cargo 4 | 5 | branches: 6 | only: 7 | - master 8 | - staging 9 | - trying 10 | 11 | matrix: 12 | fast_finish: true 13 | 14 | include: 15 | # Linux 16 | - rust: stable 17 | os: linux 18 | - rust: beta 19 | os: linux 20 | - rust: nightly 21 | os: linux 22 | # OS X 23 | - rust: stable 24 | os: osx 25 | - rust: beta 26 | os: osx 27 | - rust: nightly 28 | os: osx 29 | # Windows MSVC 30 | - rust: stable-x86_64-pc-windows-msvc 31 | os: windows 32 | - rust: beta-x86_64-pc-windows-msvc 33 | os: windows 34 | - rust: nightly-x86_64-pc-windows-msvc 35 | os: windows 36 | # Windows GNU 37 | - rust: stable-x86_64-pc-windows-gnu 38 | os: windows 39 | - rust: beta-x86_64-pc-windows-gnu 40 | os: windows 41 | - rust: nightly-x86_64-pc-windows-gnu 42 | os: windows 43 | 44 | script: 45 | - cargo test 46 | - cargo test --release 47 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | All notable changes to this project will be documented in this file. 3 | 4 | The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) 5 | and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). 6 | 7 | ## [Unreleased] 8 | 9 | ## 0.2.3 - 2023-03-22 10 | ### Changed 11 | - Improve Windows support. 12 | 13 | ## 0.2.0 - 2018-11-14 14 | ### Added 15 | - Add support for `no_std`. 16 | - Add Support for Windows. 17 | - Add test for Windows and OS X. 18 | - Add benchmark. 19 | - Implement a fallback for old Linux systems. 20 | 21 | ### Changed 22 | - Change the API in a backward-incompatible manner. 23 | 24 | ### Removed 25 | - Remove the memory barrier normal path. Use `fence(Ordering::SeqCst)` instead. 26 | 27 | ## 0.1.0 - 2018-03-29 28 | ### Added 29 | - First version of membarrier-rs. 
30 | 
31 | [Unreleased]: https://github.com/jeehoonkang/membarrier-rs/compare/v0.1.0...HEAD
32 | 
--------------------------------------------------------------------------------
/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "membarrier"
3 | version = "0.2.3"
4 | authors = ["Jeehoon Kang "]
5 | license = "MIT/Apache-2.0"
6 | readme = "README.md"
7 | repository = "https://github.com/jeehoonkang/membarrier-rs"
8 | homepage = "https://github.com/jeehoonkang/membarrier-rs"
9 | documentation = "https://docs.rs/membarrier"
10 | description = "Process-wide memory barrier"
11 | keywords = ["memory-barrier", "barrier", "sys_membarrier", "rcu"]
12 | categories = ["memory-management", "concurrency", "os", "no-std"]
13 | 
14 | [dependencies]
15 | cfg-if = "1.0"
16 | lazy_static = "1.4"
17 | libc = "0.2"
18 | windows-sys = { version = "0.48.0", features = ["Win32_System_Threading"] }
19 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Process-wide memory barrier
2 | 
3 | [![Build Status](https://travis-ci.org/jeehoonkang/membarrier-rs.svg?branch=master)](https://travis-ci.org/jeehoonkang/membarrier-rs)
4 | [![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg)](https://github.com/jeehoonkang/membarrier-rs)
5 | [![Cargo](https://img.shields.io/crates/v/membarrier.svg)](https://crates.io/crates/membarrier)
6 | [![Documentation](https://docs.rs/membarrier/badge.svg)](https://docs.rs/membarrier)
7 | 
8 | A memory barrier is one of the strongest synchronization primitives in modern relaxed-memory
9 | concurrency. Under relaxed memory, two threads may have different viewpoints on the underlying
10 | memory system, e.g. thread T1 may have observed a value V at location X, while T2 does not know
11 | of X=V at all. This discrepancy is one of the main reasons why concurrent programming is hard.
12 | A memory barrier synchronizes threads in such a way that after the barriers, all threads share
13 | the same viewpoint on the underlying memory system.
14 | 
15 | Unfortunately, memory barriers are not cheap. Modern computer systems usually provide a
16 | designated memory barrier instruction, e.g. `MFENCE` on x86 and `DMB SY` on ARM, which may
17 | take more than 100 cycles. Such an instruction may be tolerable for several use cases, e.g.
18 | context switching of a few threads, or synchronizing events that happen only once in the
19 | lifetime of a long process. However, sometimes a memory barrier is needed on a fast path,
20 | where it significantly degrades performance.
21 | 
22 | In order to reduce this synchronization cost, Linux and Windows provide a *process-wide memory
23 | barrier*, which basically performs a memory barrier for every thread in the process. Given that
24 | it's even slower than an ordinary memory barrier instruction, what's the benefit? In exchange for
25 | one expensive process-wide barrier, the other threads may be exempted from issuing any memory
26 | barrier instruction at all! In other words, a process-wide memory barrier lets you optimize the
27 | fast path at the cost of a slower slow path.
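To make the trade-off concrete, here is a minimal sketch of the intended usage pattern; the flags
and the two functions are illustrative, not part of the crate's API. The frequently executed side
pairs its store and load with the cheap `membarrier::light()`, while the rarely executed side uses
the expensive `membarrier::heavy()`. Per the semantics described below, because one of the two
barriers is heavy, at least one side is guaranteed to observe the other's store, which is the same
guarantee two `SeqCst` fences would give, but with nearly all of the cost moved to the slow path.

```rust
// Illustrative sketch: the flags and functions below are not part of the membarrier API.
use std::sync::atomic::{AtomicBool, Ordering};

static FAST_FLAG: AtomicBool = AtomicBool::new(false);
static SLOW_FLAG: AtomicBool = AtomicBool::new(false);

/// Fast path, executed frequently: publish our flag, then check the other side.
fn fast_path() -> bool {
    FAST_FLAG.store(true, Ordering::Relaxed);
    membarrier::light(); // usually just a compiler fence
    SLOW_FLAG.load(Ordering::Relaxed)
}

/// Slow path, executed rarely: publish our flag, then check the other side.
fn slow_path() -> bool {
    SLOW_FLAG.store(true, Ordering::Relaxed);
    membarrier::heavy(); // process-wide barrier; this side pays the cost
    FAST_FLAG.load(Ordering::Relaxed)
}
```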
28 | 
29 | For process-wide memory barrier, Linux recently introduced the `sys_membarrier()` system call; on
30 | older Linux kernels, the `mprotect()` system call with appropriate arguments is known to provide
31 | process-wide memory barrier semantics. Windows provides the `FlushProcessWriteBuffers()` API.
32 | 
33 | ## Usage
34 | 
35 | Use this crate as follows:
36 | 
37 | ```rust
38 | extern crate membarrier;
39 | use std::sync::atomic::{fence, Ordering};
40 | 
41 | membarrier::light(); // light-weight barrier
42 | membarrier::heavy(); // heavy-weight barrier
43 | fence(Ordering::SeqCst); // normal barrier
44 | ```
45 | 
46 | ## Semantics
47 | 
48 | Formally, there are three kinds of memory barrier: the light one (`membarrier::light()`), the heavy
49 | one (`membarrier::heavy()`), and the normal one (`fence(Ordering::SeqCst)`). In an execution of a
50 | program, there is a total order over all instances of memory barrier. If thread A issues barrier X
51 | and thread B issues barrier Y and X is ordered before Y, then A's knowledge of the underlying memory
52 | system at the time of X is transferred to B after Y, provided that:
53 | 
54 | - Either A's or B's barrier is heavy; or
55 | - Both A's and B's barriers are normal.
56 | 
57 | ## Reference
58 | 
59 | For more information, see the [Linux `man` page for
60 | `membarrier`](http://man7.org/linux/man-pages/man2/membarrier.2.html).
61 | 
--------------------------------------------------------------------------------
/benches/membarrier.rs:
--------------------------------------------------------------------------------
1 | #![feature(test)]
2 | 
3 | extern crate test;
4 | extern crate membarrier;
5 | 
6 | use test::Bencher;
7 | use std::sync::atomic::{fence, Ordering};
8 | 
9 | #[bench]
10 | fn light(b: &mut Bencher) {
11 |     b.iter(|| {
12 |         membarrier::light();
13 |     });
14 | }
15 | 
16 | #[bench]
17 | fn normal(b: &mut Bencher) {
18 |     b.iter(|| {
19 |         fence(Ordering::SeqCst);
20 |     });
21 | }
22 | 
23 | #[bench]
24 | fn heavy(b: &mut Bencher) {
25 |     b.iter(|| {
26 |         membarrier::heavy();
27 |     });
28 | }
29 | 
--------------------------------------------------------------------------------
/bors.toml:
--------------------------------------------------------------------------------
1 | status = ["continuous-integration/travis-ci/push"]
2 | 
--------------------------------------------------------------------------------
/src/lib.rs:
--------------------------------------------------------------------------------
1 | //! Process-wide memory barrier.
2 | //!
3 | //! A memory barrier is one of the strongest synchronization primitives in modern relaxed-memory
4 | //! concurrency. Under relaxed memory, two threads may have different viewpoints on the underlying
5 | //! memory system, e.g. thread T1 may have observed a value V at location X, while T2 does not
6 | //! know of X=V at all. This discrepancy is one of the main reasons why concurrent programming is
7 | //! hard. A memory barrier synchronizes threads in such a way that after the barriers, all threads
8 | //! share the same viewpoint on the underlying memory system.
9 | //!
10 | //! Unfortunately, memory barriers are not cheap. Modern computer systems usually provide a
11 | //! designated memory barrier instruction, e.g. `MFENCE` on x86 and `DMB SY` on ARM, which may
12 | //! take more than 100 cycles. Such an instruction may be tolerable for several use cases, e.g.
13 | //! context switching of a few threads, or synchronizing events that happen only once in the
14 | //! lifetime of a long process. However, sometimes a memory barrier is needed on a fast path,
15 | //! which significantly degrades performance.
16 | //!
17 | //! In order to reduce this synchronization cost, Linux and Windows provide a *process-wide memory
18 | //! barrier*, which basically performs a memory barrier for every thread in the process. Given that
19 | //! it's even slower than an ordinary memory barrier instruction, what's the benefit? In exchange
20 | //! for one expensive process-wide barrier, the other threads may be exempted from issuing any
21 | //! memory barrier instruction at all! In other words, a process-wide memory barrier lets you
22 | //! optimize the fast path at the cost of a slower slow path.
23 | //!
24 | //! This crate provides an abstraction of process-wide memory barrier over different operating
25 | //! systems and hardware. It is implemented as follows. For recent Linux systems, we use the
26 | //! `sys_membarrier()` system call; for older Linux systems without support for
27 | //! `sys_membarrier()`, we fall back to the `mprotect()` system call, which is known to provide
28 | //! process-wide memory barrier semantics. For Windows, we use the `FlushProcessWriteBuffers()`
29 | //! API. On all other systems, we fall back to the normal `SeqCst` fence for both the fast and slow
30 | //! paths.
31 | //!
32 | //!
33 | //! # Usage
34 | //!
35 | //! Use this crate as follows:
36 | //!
37 | //! ```
38 | //! extern crate membarrier;
39 | //! use std::sync::atomic::{fence, Ordering};
40 | //!
41 | //! membarrier::light(); // light-weight barrier
42 | //! membarrier::heavy(); // heavy-weight barrier
43 | //! fence(Ordering::SeqCst); // normal barrier
44 | //! ```
45 | //!
46 | //! # Semantics
47 | //!
48 | //! Formally, there are three kinds of memory barrier: the light one (`membarrier::light()`), the
49 | //! heavy one (`membarrier::heavy()`), and the normal one (`fence(Ordering::SeqCst)`). In an
50 | //! execution of a program, there is a total order over all instances of memory barrier. If thread A
51 | //! issues barrier X and thread B issues barrier Y and X is ordered before Y, then A's knowledge of
52 | //! the underlying memory system at the time of X is transferred to B after Y, provided that:
53 | //!
54 | //! - Either A's or B's barrier is heavy; or
55 | //! - Both A's and B's barriers are normal.
56 | //!
57 | //! # Reference
58 | //!
59 | //! For more information, see the [Linux `man` page for
60 | //! `membarrier`](http://man7.org/linux/man-pages/man2/membarrier.2.html).
61 | 
62 | #![warn(missing_docs, missing_debug_implementations)]
63 | #![no_std]
64 | 
65 | #[macro_use]
66 | extern crate cfg_if;
67 | #[allow(unused_imports)]
68 | #[macro_use]
69 | extern crate lazy_static;
70 | extern crate libc;
71 | extern crate windows_sys;
72 | 
73 | #[allow(unused_macros)]
74 | macro_rules! fatal_assert {
75 |     ($cond:expr) => {
76 |         if !$cond {
77 |             #[allow(unused_unsafe)]
78 |             unsafe {
79 |                 libc::abort();
80 |             }
81 |         }
82 |     };
83 | }
84 | 
85 | cfg_if! {
86 |     if #[cfg(all(target_os = "linux"))] {
87 |         pub use linux::*;
88 |     } else if #[cfg(target_os = "windows")] {
89 |         pub use windows::*;
90 |     } else {
91 |         pub use default::*;
92 |     }
93 | }
94 | 
95 | #[allow(dead_code)]
96 | mod default {
97 |     use core::sync::atomic::{fence, Ordering};
98 | 
99 |     /// Issues a light memory barrier for fast path.
100 |     ///
101 |     /// It just issues the normal memory barrier instruction.
102 | #[inline] 103 | pub fn light() { 104 | fence(Ordering::SeqCst); 105 | } 106 | 107 | /// Issues a heavy memory barrier for slow path. 108 | /// 109 | /// It just issues the normal memory barrier instruction. 110 | #[inline] 111 | pub fn heavy() { 112 | fence(Ordering::SeqCst); 113 | } 114 | } 115 | 116 | #[cfg(target_os = "linux")] 117 | mod linux { 118 | use core::sync::atomic; 119 | 120 | /// A choice between three strategies for process-wide barrier on Linux. 121 | #[derive(Clone, Copy, PartialEq, Eq)] 122 | enum Strategy { 123 | /// Use the `membarrier` system call. 124 | Membarrier, 125 | /// Use the `mprotect`-based trick. 126 | Mprotect, 127 | /// Use `SeqCst` fences. 128 | Fallback, 129 | } 130 | 131 | lazy_static! { 132 | /// The right strategy to use on the current machine. 133 | static ref STRATEGY: Strategy = { 134 | if membarrier::is_supported() { 135 | Strategy::Membarrier 136 | } else if mprotect::is_supported() { 137 | Strategy::Mprotect 138 | } else { 139 | Strategy::Fallback 140 | } 141 | }; 142 | } 143 | 144 | mod membarrier { 145 | /// Commands for the membarrier system call. 146 | /// 147 | /// # Caveat 148 | /// 149 | /// We're defining it here because, unfortunately, the `libc` crate currently doesn't 150 | /// expose `membarrier_cmd` for us. You can find the numbers in the [Linux source 151 | /// code](https://github.com/torvalds/linux/blob/master/include/uapi/linux/membarrier.h). 152 | /// 153 | /// This enum should really be `#[repr(libc::c_int)]`, but Rust currently doesn't allow it. 154 | #[repr(i32)] 155 | #[allow(dead_code, non_camel_case_types)] 156 | enum membarrier_cmd { 157 | MEMBARRIER_CMD_QUERY = 0, 158 | MEMBARRIER_CMD_GLOBAL = (1 << 0), 159 | MEMBARRIER_CMD_GLOBAL_EXPEDITED = (1 << 1), 160 | MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED = (1 << 2), 161 | MEMBARRIER_CMD_PRIVATE_EXPEDITED = (1 << 3), 162 | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED = (1 << 4), 163 | MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE = (1 << 5), 164 | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE = (1 << 6), 165 | } 166 | 167 | /// Call the `sys_membarrier` system call. 168 | #[inline] 169 | fn sys_membarrier(cmd: membarrier_cmd) -> libc::c_long { 170 | unsafe { libc::syscall(libc::SYS_membarrier, cmd as libc::c_int, 0 as libc::c_int) } 171 | } 172 | 173 | /// Returns `true` if the `sys_membarrier` call is available. 174 | pub fn is_supported() -> bool { 175 | // Queries which membarrier commands are supported. Checks if private expedited 176 | // membarrier is supported. 177 | let ret = sys_membarrier(membarrier_cmd::MEMBARRIER_CMD_QUERY); 178 | if ret < 0 179 | || ret & membarrier_cmd::MEMBARRIER_CMD_PRIVATE_EXPEDITED as libc::c_long == 0 180 | || ret & membarrier_cmd::MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED as libc::c_long 181 | == 0 182 | { 183 | return false; 184 | } 185 | 186 | // Registers the current process as a user of private expedited membarrier. 187 | if sys_membarrier(membarrier_cmd::MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED) < 0 { 188 | return false; 189 | } 190 | 191 | true 192 | } 193 | 194 | /// Executes a heavy `sys_membarrier`-based barrier. 
195 |         #[inline]
196 |         pub fn barrier() {
197 |             fatal_assert!(sys_membarrier(membarrier_cmd::MEMBARRIER_CMD_PRIVATE_EXPEDITED) >= 0);
198 |         }
199 |     }
200 | 
201 |     mod mprotect {
202 |         use core::{cell::UnsafeCell, mem::MaybeUninit, ptr, sync::atomic};
203 |         use libc;
204 | 
205 |         struct Barrier {
206 |             lock: UnsafeCell<libc::pthread_mutex_t>,
207 |             page: u64,
208 |             page_size: libc::size_t,
209 |         }
210 | 
211 |         unsafe impl Sync for Barrier {}
212 | 
213 |         impl Barrier {
214 |             /// Issues a process-wide barrier by changing access protections of a single mmap-ed
215 |             /// page. This method is not as fast as the `sys_membarrier()` call, but works very
216 |             /// similarly.
217 |             #[inline]
218 |             fn barrier(&self) {
219 |                 let page = self.page as *mut libc::c_void;
220 | 
221 |                 unsafe {
222 |                     // Lock the mutex.
223 |                     fatal_assert!(libc::pthread_mutex_lock(self.lock.get()) == 0);
224 | 
225 |                     // Set the page access protections to read + write.
226 |                     fatal_assert!(
227 |                         libc::mprotect(page, self.page_size, libc::PROT_READ | libc::PROT_WRITE,)
228 |                             == 0
229 |                     );
230 | 
231 |                     // Ensure that the page is dirty before we change the protection so that we
232 |                     // prevent the OS from skipping the global TLB flush.
233 |                     let atomic_usize = &*(page as *const atomic::AtomicUsize);
234 |                     atomic_usize.fetch_add(1, atomic::Ordering::SeqCst);
235 | 
236 |                     // Set the page access protections to none.
237 |                     //
238 |                     // Changing a page protection from read + write to none causes the OS to issue
239 |                     // an interrupt to flush TLBs on all processors. This also results in flushing
240 |                     // the processor buffers.
241 |                     fatal_assert!(libc::mprotect(page, self.page_size, libc::PROT_NONE) == 0);
242 | 
243 |                     // Unlock the mutex.
244 |                     fatal_assert!(libc::pthread_mutex_unlock(self.lock.get()) == 0);
245 |                 }
246 |             }
247 |         }
248 | 
249 |         lazy_static! {
250 |             /// An alternative solution to `sys_membarrier` that works on older Linux kernels and
251 |             /// x86/x86-64 systems.
252 |             static ref BARRIER: Barrier = {
253 |                 unsafe {
254 |                     // Find out the page size on the current system.
255 |                     let page_size = libc::sysconf(libc::_SC_PAGESIZE);
256 |                     fatal_assert!(page_size > 0);
257 |                     let page_size = page_size as libc::size_t;
258 | 
259 |                     // Create a dummy page.
260 |                     let page = libc::mmap(
261 |                         ptr::null_mut(),
262 |                         page_size,
263 |                         libc::PROT_NONE,
264 |                         libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
265 |                         -1 as libc::c_int,
266 |                         0 as libc::off_t,
267 |                     );
268 |                     fatal_assert!(page != libc::MAP_FAILED);
269 |                     fatal_assert!(page as libc::size_t % page_size == 0);
270 | 
271 |                     // Locking the page ensures that it stays in memory during the two mprotect
272 |                     // calls in `Barrier::barrier()`. If the page was unmapped between those calls,
273 |                     // they would not have the expected effect of generating an IPI.
274 |                     libc::mlock(page, page_size as libc::size_t);
275 | 
276 |                     // Initialize the mutex.
277 |                     let lock = UnsafeCell::new(libc::PTHREAD_MUTEX_INITIALIZER);
278 |                     let mut attr = MaybeUninit::<libc::pthread_mutexattr_t>::uninit();
279 |                     fatal_assert!(libc::pthread_mutexattr_init(attr.as_mut_ptr()) == 0);
280 |                     let mut attr = attr.assume_init();
281 |                     fatal_assert!(
282 |                         libc::pthread_mutexattr_settype(&mut attr, libc::PTHREAD_MUTEX_NORMAL) == 0
283 |                     );
284 |                     fatal_assert!(libc::pthread_mutex_init(lock.get(), &attr) == 0);
285 |                     fatal_assert!(libc::pthread_mutexattr_destroy(&mut attr) == 0);
286 | 
287 |                     let page = page as u64;
288 | 
289 |                     Barrier { lock, page, page_size }
290 |                 }
291 |             };
292 |         }
293 | 
294 |         /// Returns `true` if the `mprotect`-based trick is supported.
295 |         pub fn is_supported() -> bool {
296 |             cfg!(target_arch = "x86") || cfg!(target_arch = "x86_64")
297 |         }
298 | 
299 |         /// Executes a heavy `mprotect`-based barrier.
300 |         #[inline]
301 |         pub fn barrier() {
302 |             BARRIER.barrier();
303 |         }
304 |     }
305 | 
306 |     /// Issues a light memory barrier for fast path.
307 |     ///
308 |     /// It issues a compiler fence, which disallows compiler optimizations across itself and incurs
309 |     /// basically no run-time cost. Under the fallback strategy, it issues a full `SeqCst` fence instead.
310 |     #[inline]
311 |     #[allow(dead_code)]
312 |     pub fn light() {
313 |         use self::Strategy::*;
314 |         match *STRATEGY {
315 |             Membarrier | Mprotect => atomic::compiler_fence(atomic::Ordering::SeqCst),
316 |             Fallback => atomic::fence(atomic::Ordering::SeqCst),
317 |         }
318 |     }
319 | 
320 |     /// Issues a heavy memory barrier for slow path.
321 |     ///
322 |     /// It issues a private expedited membarrier using the `sys_membarrier()` system call, if supported;
323 |     /// otherwise, it falls back to the `mprotect()`-based process-wide barrier or, failing that, to a `SeqCst` fence.
324 |     #[inline]
325 |     #[allow(dead_code)]
326 |     pub fn heavy() {
327 |         use self::Strategy::*;
328 |         match *STRATEGY {
329 |             Membarrier => membarrier::barrier(),
330 |             Mprotect => mprotect::barrier(),
331 |             Fallback => atomic::fence(atomic::Ordering::SeqCst),
332 |         }
333 |     }
334 | }
335 | 
336 | #[cfg(target_os = "windows")]
337 | mod windows {
338 |     use core::sync::atomic;
339 |     use windows_sys;
340 | 
341 |     /// Issues a light memory barrier for fast path.
342 |     ///
343 |     /// It issues a compiler fence, which disallows compiler optimizations across itself.
344 |     #[inline]
345 |     pub fn light() {
346 |         atomic::compiler_fence(atomic::Ordering::SeqCst);
347 |     }
348 | 
349 |     /// Issues a heavy memory barrier for slow path.
350 |     ///
351 |     /// It invokes the `FlushProcessWriteBuffers()` Windows API function.
352 |     #[inline]
353 |     pub fn heavy() {
354 |         unsafe {
355 |             windows_sys::Win32::System::Threading::FlushProcessWriteBuffers();
356 |         }
357 |     }
358 | }
359 | 
--------------------------------------------------------------------------------
/tests/membarrier.rs:
--------------------------------------------------------------------------------
1 | #![no_std]
2 | 
3 | extern crate membarrier;
4 | 
5 | use core::sync::atomic::{fence, Ordering};
6 | 
7 | #[test]
8 | fn fences() {
9 |     membarrier::light(); // light-weight barrier
10 |     fence(Ordering::SeqCst); // normal barrier
11 |     membarrier::heavy(); // heavy-weight barrier
12 | }
13 | 
--------------------------------------------------------------------------------