├── LICENSE
├── include
    └── stopwatch.h
└── README.md

/LICENSE:
--------------------------------------------------------------------------------
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to

--------------------------------------------------------------------------------
/include/stopwatch.h:
--------------------------------------------------------------------------------
#ifndef STOPWATCH_H_
#define STOPWATCH_H_

#include <algorithm>
#include <array>
#include <chrono>
#include <cstdint>
#include <ratio>

namespace stopwatch {
// An implementation of the 'TrivialClock' concept using the rdtscp instruction.
struct rdtscp_clock {
  using rep = std::uint64_t;
  using period = std::ratio<1>;
  using duration = std::chrono::duration<rep, period>;
  using time_point = std::chrono::time_point<rdtscp_clock>;

  static auto now() noexcept -> time_point {
    std::uint32_t hi, lo;
    // rdtscp also writes IA32_TSC_AUX into ECX, so ECX is listed as clobbered.
    __asm__ __volatile__("rdtscp" : "=d"(hi), "=a"(lo) : : "ecx");
    return time_point(duration((static_cast<std::uint64_t>(hi) << 32) | lo));
  }
};

// A timer using the specified clock.
template <typename Clock = rdtscp_clock>
struct timer {
  using time_point = typename Clock::time_point;
  using duration = typename Clock::duration;

  // Expires `duration` after the moment of construction.
  timer(const duration duration) noexcept : expiry(Clock::now() + duration) {}
  // Expires at the given point in time.
  timer(const time_point expiry) noexcept : expiry(expiry) {}

  bool done(time_point now = Clock::now()) const noexcept {
    return now >= expiry;
  }

  auto remaining(time_point now = Clock::now()) const noexcept -> duration {
    return expiry - now;
  }

  const time_point expiry;
};

template <typename Clock = rdtscp_clock>
constexpr auto make_timer(typename Clock::duration duration) -> timer<Clock> {
  return timer<Clock>(duration);
}

// Times how long it takes a function to execute using the specified clock.
template <typename Clock = rdtscp_clock, typename Func>
auto time(Func&& function) -> typename Clock::duration {
  const auto start = Clock::now();
  function();
  return Clock::now() - start;
}

// Samples the given function N times using the specified clock.
// The returned samples are sorted in ascending order.
template <std::size_t N, typename Clock = rdtscp_clock, typename Func>
auto sample(Func&& function) -> std::array<typename Clock::duration, N> {
  std::array<typename Clock::duration, N> samples;

  for (std::size_t i = 0u; i < N; ++i) {
    samples[i] = time<Clock>(function);
  }

  std::sort(samples.begin(), samples.end());
  return samples;
}
} /* namespace stopwatch */

#endif // STOPWATCH_H_

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# ⏱️ stopwatch
Single-header C++11 RDTSCP clock and timing utilities released into the public domain.

# why
While developing games, I have wanted the following features which are not provided by `std::chrono`:
1. triggering events after a certain amount of time
2. timing function calls in a high precision manner

# requirements
1. The `RDTSCP` instruction and a compiler which supports C++11 or higher.
2. Your processor must be an [Intel Nehalem (2008)](https://en.wikipedia.org/wiki/Nehalem_(microarchitecture)) or newer processor _or_ a processor with an invariant TSC.

If you do not meet these requirements, you can easily remove the `RDTSCP` code from the library and enjoy the other features. The relevant sections of [The Intel Software Developer Manuals](http://www.intel.com/Assets/en_US/PDF/manual/253668.pdf) are at the bottom of this page.

# usage
## timer
```c++
#include "stopwatch.h"
#include <chrono>
#include <iostream>
#include <thread>

int main() {
  // A std::chrono clock is passed explicitly here: the default rdtscp_clock
  // counts cycles, so a duration in seconds would not mean wall-clock seconds.
  const auto timer = stopwatch::make_timer<std::chrono::steady_clock>(
      std::chrono::seconds(10));
  while (!timer.done()) {
    std::cout << std::chrono::duration_cast<std::chrono::seconds>(
                     timer.remaining())
                     .count()
              << " seconds remain."
              << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  std::cout << "10 seconds have elapsed" << std::endl;
}
```

## timing one function call
```c++
#include "stopwatch.h"
#include <iostream>

int main() {
  const auto cycles = stopwatch::time([] {
    for (std::size_t i = 0; i < 10; ++i) {
      std::cout << i << std::endl;
    }
  });

  std::cout << "To print out 10 numbers, it took " << cycles.count()
            << " cycles." << std::endl;
}
```

## sampling multiple calls to a function
Taking the median number of cycles for inserting 10000 items into the beginning of a container. `sample` returns the measurements sorted, so index 49 of 100 samples is the (lower) median.
```c++
#include "stopwatch.h"
#include <deque>
#include <iostream>
#include <vector>

int main() {
  const auto deque_samples = stopwatch::sample<100>([] {
    std::deque<std::size_t> deque;
    for (std::size_t i = 0; i < 10000; ++i) {
      deque.insert(deque.begin(), i);
    }
  });

  const auto vector_samples = stopwatch::sample<100>([] {
    std::vector<std::size_t> vector;
    for (std::size_t i = 0; i < 10000; ++i) {
      vector.insert(vector.begin(), i);
    }
  });

  std::cout << "median for deque: " << deque_samples[49].count() << std::endl;
  std::cout << "median for vector: " << vector_samples[49].count() << std::endl;
}
```

Output on my MacBook Pro (2016):
```
median for deque: 487760
median for vector: 7595754
```

# using another clock
Using another clock is as simple as passing the clock in as a template argument. An example using `std::chrono::system_clock` in place of `stopwatch::rdtscp_clock` for the `timing one function call` example:
```c++
const auto cycles = stopwatch::time<std::chrono::system_clock>([] {
  for (std::size_t i = 0; i < 10; ++i) {
    std::cout << i << std::endl;
  }
});
```
`stopwatch::time([] { ... })` became `stopwatch::time<std::chrono::system_clock>([] { ... })`. That's it! The returned value is then a `std::chrono::system_clock::duration` rather than a cycle count.
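
The same substitution can be sketched without `stopwatch.h` at all. The snippet below is only an illustration: `time_with` and `timed_sleep_ms` are hypothetical names that are not part of the library, and `time_with` simply restates the shape of `stopwatch::time` so the example builds on its own. `std::chrono::steady_clock` is used here as an assumption, because a monotonic clock is usually the safer choice for measuring intervals.

```c++
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for stopwatch::time, restated here so the snippet
// builds without stopwatch.h. The clock is a template parameter.
template <typename Clock, typename Func>
auto time_with(Func&& function) -> typename Clock::duration {
  const auto start = Clock::now();
  function();
  return Clock::now() - start;
}

// Hypothetical helper: sleeps for `sleep_ms` milliseconds and returns the
// elapsed time, measured with std::chrono::steady_clock, in whole
// milliseconds.
inline long long timed_sleep_ms(int sleep_ms) {
  assert(sleep_ms >= 0);
  const auto elapsed = time_with<std::chrono::steady_clock>([sleep_ms] {
    std::this_thread::sleep_for(std::chrono::milliseconds(sleep_ms));
  });
  return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed)
      .count();
}
```

Any type with a static `now()` and a nested `duration` can be dropped in as `Clock`; only the template argument changes. Calling `timed_sleep_ms(20)` should report a value of roughly 20, depending on scheduler granularity.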

# contributing
Contributions of any variety are greatly appreciated. All code is passed through `clang-format` using the Google style.

## [The Intel Software Developer Manuals](http://www.intel.com/Assets/en_US/PDF/manual/253668.pdf)
### Section 16.12.1
> The time stamp counter in newer processors may support an enhancement, referred
to as invariant TSC. Processor’s support for invariant TSC is indicated by
CPUID.80000007H:EDX[8].
The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is
the architectural behavior moving forward. On processors with invariant TSC
support, the OS may use the TSC for wall clock timer services (instead of ACPI or
HPET timers). TSC reads are much more efficient and do not incur the overhead
associated with a ring transition or access to a platform resource.

### Section 16.12.2
> Processors based on Intel microarchitecture code name Nehalem provide an auxiliary
TSC register, IA32_TSC_AUX that is designed to be used in conjunction with
IA32_TSC. IA32_TSC_AUX provides a 32-bit field that is initialized by privileged software
with a signature value (for example, a logical processor ID).

> The primary usage of IA32_TSC_AUX in conjunction with IA32_TSC is to allow software
to read the 64-bit time stamp in IA32_TSC and signature value in
IA32_TSC_AUX with the instruction RDTSCP in an atomic operation. RDTSCP returns
the 64-bit time stamp in EDX:EAX and the 32-bit TSC_AUX signature value in ECX.
The atomicity of RDTSCP ensures that no context switch can occur between the reads
of the TSC and TSC_AUX values.

--------------------------------------------------------------------------------
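
The CPUID check quoted in Section 16.12.1 can be sketched as follows. This is a rough illustration rather than part of `stopwatch.h`: the `example` namespace and all three function names are hypothetical, the `<cpuid.h>` header and its `__get_cpuid` helper are assumed to be available (they ship with GCC and Clang), and the TSC frequency passed to `cycles_to_seconds` is assumed to be known to the caller, since the library itself does not measure it.

```c++
#include <cassert>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang helper header; an assumption about the toolchain.
#endif

namespace example {  // Hypothetical namespace, not part of stopwatch.h.

// Returns true when bit 8 of EDX (as returned by CPUID leaf 80000007H) is set.
inline bool invariant_tsc_bit_set(std::uint32_t edx) noexcept {
  return ((edx >> 8) & 1u) != 0u;
}

// Queries CPUID leaf 80000007H and reports whether the invariant TSC bit is
// set. Returns false on non-x86 targets or when the leaf is unavailable.
inline bool has_invariant_tsc() noexcept {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx) == 0) {
    return false;  // Leaf not supported by this processor.
  }
  return invariant_tsc_bit_set(edx);
#else
  return false;
#endif
}

// Converts a cycle count to seconds. `tsc_hz` is a caller-supplied TSC
// frequency (an assumption: it has to be calibrated or looked up elsewhere).
// The conversion is only meaningful when the TSC runs at a constant rate.
inline double cycles_to_seconds(std::uint64_t cycles, double tsc_hz) {
  assert(tsc_hz > 0.0);
  return static_cast<double>(cycles) / tsc_hz;
}

}  // namespace example
```

A program could call `example::has_invariant_tsc()` once at startup and fall back to a `std::chrono` clock when it returns false, since cycle counts from a non-invariant TSC do not map to a fixed amount of time.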