├── .gitignore ├── CLAUDE.md ├── LICENSE ├── Makefile ├── README.md ├── examples ├── Makefile ├── assert.c ├── baseline.c ├── baseline_multi.c ├── basic.c ├── bm_seed.cpp ├── custom.c ├── indirect.cpp ├── jemalloc.cpp ├── l1d_miss.cpp ├── raw.c ├── storms.cpp └── suspend.c ├── include └── b63 │ ├── b63.h │ ├── benchmark.h │ ├── counter.h │ ├── counter_list.h │ ├── counters │ ├── cycles.h │ ├── jemalloc.h │ ├── osx_kperf.h │ ├── perf_events.h │ ├── perf_events_map.h │ └── time.h │ ├── printer.h │ ├── register.h │ ├── run.h │ ├── suite.h │ └── utils │ ├── section_ptr_list.h │ ├── stats.h │ ├── string.h │ ├── timer.h │ └── ttable.h └── ref ├── api.md ├── architecture.md ├── counters.md ├── examples.md ├── overview.md └── usage.md /.gitignore: -------------------------------------------------------------------------------- 1 | # vim 2 | *.swp 3 | 4 | # build folder 5 | _build/ 6 | examples/_build/ 7 | -------------------------------------------------------------------------------- /CLAUDE.md: -------------------------------------------------------------------------------- 1 | # B63 Project Reference 2 | 3 | This is a reference for the B63 benchmarking library. 
Here are the documentation files: 4 | 5 | - [Overview](/ref/overview.md) - High-level overview of the B63 library and its features 6 | - [Usage Guide](/ref/usage.md) - How to use B63 for benchmarking your code 7 | - [Architecture](/ref/architecture.md) - Core components and design of the library 8 | - [Counters](/ref/counters.md) - Details on the counter system for measurements 9 | - [API Reference](/ref/api.md) - Complete API documentation 10 | - [Examples](/ref/examples.md) - Example code and usage patterns 11 | 12 | ## Key Files 13 | 14 | - `/include/b63/b63.h` - Main include file 15 | - `/include/b63/benchmark.h` - Core benchmark structures 16 | - `/include/b63/counter.h` - Counter interface 17 | - `/include/b63/run.h` - Benchmark execution 18 | 19 | ## Building Examples 20 | 21 | ```bash 22 | cd examples 23 | make 24 | ``` 25 | 26 | ## Running a Benchmark 27 | 28 | ```bash 29 | ./examples/basic -i -c time,cycles -e 10 -t 5.0 -s 42 30 | ``` 31 | 32 | Options: 33 | - `-i`: Interactive mode 34 | - `-c`: Specify counters 35 | - `-e`: Number of epochs 36 | - `-t`: Time limit per benchmark 37 | - `-s`: Random seed -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | 3 | Version 2.0, January 2004 4 | 5 | http://www.apache.org/licenses/ 6 | 7 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 8 | 9 | 1. Definitions. 10 | 11 | "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. 
For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. 16 | 17 | "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. 18 | 19 | "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. 20 | 21 | "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. 22 | 23 | "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). 24 | 25 | "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. 26 | 27 | "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. 
For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." 28 | 29 | "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 30 | 31 | 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 32 | 33 | 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. 
If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 34 | 35 | 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: 36 | 37 | You must give any other recipients of the Work or Derivative Works a copy of this License; and 38 | You must cause any modified files to carry prominent notices stating that You changed the files; and 39 | You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and 40 | If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. 
You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. 41 | 42 | You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 43 | 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 44 | 45 | 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 46 | 47 | 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 48 | 49 | 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 50 | 51 | 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. 
52 | 53 | END OF TERMS AND CONDITIONS 54 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | ifeq ($(PREFIX),) 2 | PREFIX := /usr/local 3 | endif 4 | 5 | install: 6 | install -d $(PREFIX)/include/b63/counters 7 | install -d $(PREFIX)/include/b63/utils 8 | install -m 644 include/b63/*.h $(PREFIX)/include/b63/ 9 | install -m 644 include/b63/counters/*.h $(PREFIX)/include/b63/counters/ 10 | install -m 644 include/b63/utils/*.h $(PREFIX)/include/b63/utils/ 11 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # B63 2 | 3 | Light-weight micro-benchmarking tool for C. 4 | 5 | ## Motivation 6 | Why was it built, given that quite a few already exist? 7 | - quick and easy benchmarking for C, not C++ only; 8 | - benchmarking custom counters, rather than time/cycles only, specifically: 9 | - CPU Performance Monitoring Unit counters, for example number of cache misses or branch mispredictions; 10 | - jemalloc memory allocations; 11 | - custom measurements, like number of hash collisions. 12 | 13 | ## Examples 14 | The easiest way to get a sense of how it could be used is to look at and 15 | run benchmarks from examples/ folder. The library is header-only, so examples only need to include: 16 | - b63.h header; 17 | - individual counter headers, if needed. 18 | 19 | This is how benchmarking time, CPU cycles and cache misses might look on Linux: 20 | 21 | ```cpp 22 | #include "../include/b63/b63.h" 23 | #include "../include/b63/counters/perf_events.h" 24 | #include <algorithm> 25 | #include <cstdint> 26 | #include <cstdlib> 27 | #include <ctime> 28 | #include <numeric> 29 | #include <vector> 30 | 31 | const size_t kSize = (1 << 16); 32 | const size_t kMask = kSize - 1; 33 | 34 | /* 35 | * B63_BASELINE defines a 'baseline' function to benchmark. 
36 | * In this definition, 'sequential' is the benchmark name, 37 | * and 'n' is the parameter the function needs to use as 38 | * 'how many iterations to run'. It is important to have this parameter 39 | * to be able to adjust the run time dynamically 40 | */ 41 | B63_BASELINE(sequential, n) { 42 | std::vector<uint32_t> v; 43 | 44 | /* 45 | * Anything within 'B63_SUSPEND' will not be counted 46 | * towards benchmark score. 47 | */ 48 | B63_SUSPEND { 49 | v.resize(kSize); 50 | std::iota(v.begin(), v.end(), 0); 51 | } 52 | int32_t res = 0; 53 | for (size_t i = 0; i < n; i++) { 54 | for (size_t j = 0; j < kSize; j++) { 55 | res += v[j]; 56 | } 57 | } 58 | /* this is to prevent compiler from optimizing res out */ 59 | B63_KEEP(res); 60 | } 61 | 62 | /* 63 | * This is another benchmark, which will be compared to baseline 64 | */ 65 | B63_BENCHMARK(random, n) { 66 | std::vector<uint32_t> v; 67 | B63_SUSPEND { 68 | /* b63_seed is passed implicitly to every benchmark */ 69 | std::srand(b63_seed); 70 | v.resize(kSize); 71 | std::generate(v.begin(), v.end(), std::rand); 72 | } 73 | int32_t res = 0; 74 | for (size_t i = 0; i < n; i++) { 75 | for (size_t j = 0; j < kSize; j++) { 76 | res += v[v[j] & kMask]; 77 | } 78 | } 79 | B63_KEEP(res); 80 | } 81 | 82 | int main(int argc, char **argv) { 83 | srand(time(0)); 84 | /* 85 | * This call starts benchmarking. 86 | * Comma-separated list of counters to measure is passed explicitly here, 87 | * but one can provide command-line flag -c to override. 88 | * In this case, we are measuring 3 counters: 89 | * * lpe:cycles - CPU cycles spent in benchmark outside of B63_SUSPEND, as measured with Linux perf_events; 90 | * * lpe:L1-dcache-load-misses - CPU L1 Data cache misses during benchmark run outside of B63_SUSPEND; 91 | * * time - wall time outside of B63_SUSPEND. 
92 | */ 93 | B63_RUN_WITH("time,lpe:cycles,lpe:L1-dcache-load-misses", argc, argv); 94 | return 0; 95 | } 96 | ``` 97 | 98 | Build and run: 99 | 100 | This is the output of the sample run: 101 | ``` 102 | $ g++ -O3 bm_seed.cpp -o bm 103 | $ ./bm -i # i for interactive mode 104 | sequential time : 52858.855 105 | random time :148667.365 (+181.253% *) 106 | sequential lpe:cycles : 132055.030 107 | random lpe:cycles :372451.514 (+182.043% *) 108 | sequential lpe:L1-dcache-load-misses: 4969.704 109 | random lpe:L1-dcache-load-misses:80874.886 (+1527.358% *) 110 | ``` 111 | Currently B63 repeats the run for every counter to reduce side-effects of measurement, but this might change in the future. 112 | The way to read the results: for benchmark 'sequential', which is the baseline version, we spent 52 milliseconds per iteration; 113 | For the 'random' version, we see a clear increase in time and an equivalent increase in CPU cycles (+181%), and a much more prominent increase in L1 data cache misses (+1527%). The asterisk means that the 99% confidence interval for the difference between benchmark and baseline does not contain 0; thus, you can be 99% confident that the result is directionally correct. 114 | 115 | Extra examples can be found in examples/ folder: 116 | 1. Measuring time / iteration ([examples/basic.c](examples/basic.c)); 117 | 2. Suspending tracking ([examples/suspend.c](examples/suspend.c)); 118 | 3. Comparing implementations with baseline ([examples/baseline.c](examples/baseline.c)); 119 | 4. Using custom counter, number of function calls in this case ([examples/custom.c](examples/custom.c)); 120 | 5. Using cache miss counter from linux perf_events ([examples/l1d_miss.cpp](examples/l1d_miss.cpp)); 121 | 6. Using raw counter from linux perf_events ([examples/raw.c](examples/raw.c)); 122 | 7. Measuring jemalloc allocation stats ([examples/jemalloc.cpp](examples/jemalloc.cpp)); 123 | 8. 
Utilizing seed to keep benchmark results reproducible ([examples/bm_seed.cpp](examples/bm_seed.cpp)); 124 | 9. Multiple comparisons, including A/A test: ([examples/baseline_multi.c](examples/baseline_multi.c)). 125 | 126 | ## Comparison and baselines 127 | Within the benchmark suite, there's a way to define 'baseline', and compare all other benchmarks against it. When comparing, 99% confidence interval is computed using differences between individual epochs. 128 | 129 | ## Output Modes 130 | Two output modes are supported: 131 | - plaintext mode (default), which produces output suitable for scripting/parsing, printing out each epoch individually to leave an option for more advanced data studies. 132 | ``` 133 | $ ./_build/bm_baseline 134 | basic,time,16777215,233781738 135 | basic,time,16777215,228961470 136 | basic,time,16777215,230559174 137 | basic,time,16777215,228707363 138 | basic,time,16777215,228769396 139 | basic_half,time,33554431,227525646 140 | basic_half,time,33554431,228749848 141 | basic_half,time,33554431,228985440 142 | basic_half,time,33554431,228123909 143 | basic_half,time,33554431,228560855 144 | ``` 145 | - interactive mode turned on with -i flag. There isn't much interactivity really, but the output is formatted and colored for human consumption, rather than other tool consumption. 146 | ``` 147 | $ ./_build/bm_baseline -i 148 | basic time : 13.597 149 | basic_half time : 6.787 (-50.083% *) 150 | ``` 151 | 152 | ## Configuration 153 | 154 | ### CLI Flags 155 | Following CLI flags are supported: 156 | - -i if provided, interactive output mode will be used; 157 | - -c counter1[,counter2,counter3,...] -- override default counters for all benchmarks; 158 | - -e epochs_count -- override how many epochs to run the benchmark for; 159 | - -t timelimit_per_benchmark - time limit in seconds for how long to run the benchmark; includes time benchmark is suspended. 160 | - -d delimiter to use for plaintext. Comma is default. 161 | - -s seed. 
Optional, needed for reproducibility and A/B testing across binaries, for example, different versions of code or different hardware. If not provided, seed will be generated. 162 | 163 | ### Configuration in code 164 | It's possible to configure the counters to run within the code itself, by using B63_RUN_WITH("list,of,counters", argc, argv); 165 | 166 | ## Counters 167 | In addition to measuring time, B63 allows defining and using custom counters, for example CPU perf events. Some counters are already built and provided in counters/ folder, but the framework is flexible and makes it easy to define new ones. 168 | 169 | For now, the following counters are implemented: 170 | 1) time - most basic counter, measures time in microseconds. [Linux, FreeBSD, MacOS] 171 | 2) jemalloc - measures bytes allocated by jemalloc. [Linux, FreeBSD, MacOS] 172 | 3) perf_events - measures custom CPU counters, like cache misses, branch mispredictions, etc. [Linux only, 2.6.31+] 173 | 174 | ### Notes for building custom counters: 175 | Counters are expected to be additive and monotonic; 176 | Implementation of the counting and suspension lives in [include/b63/run.h](include/b63/run.h); [examples/custom.c](examples/custom.c) is a simple case of custom counter definition. All counters shipped with the library can be used as examples, as they do not rely on anything internal from b63. 177 | 178 | Counter header files should be included from the benchmark c/cpp file directly; only the default timer counter is included from 179 | b63 itself. It is done to avoid having an insane amount of ifdefs in the code and complicated build rules, as counters have to be gated by compiler/os/libraries installed and used. 180 | When benchmarks are configured to run with multiple counters, each benchmark is re-run for each counter. 
This is an easy way to deal with measurement side effects, but has obvious disadvantages: 181 | - benchmark needs to run longer; 182 | - in cases when the variance between benchmark runs is high, results might look confusing. 183 | 184 | The suspension is an important case to understand and interpret correctly. To illustrate this, let's look at the following example [benchmark](examples/suspend.c): 185 | 186 | ``` 187 | $ ./_build/bm_suspend 188 | with_suspend,time,8388607,117749190 189 | with_suspend,time,8388607,117033209 190 | with_suspend,time,8388607,114440936 191 | with_suspend,time,8388607,114655889 192 | with_suspend,time,8388607,114215822 193 | basic,time,16777215,228015817 194 | basic,time,16777215,230814726 195 | basic,time,16777215,227958139 196 | basic,time,16777215,228723995 197 | basic,time,16777215,229286180 198 | $ ./_build/bm_suspend -i 199 | with_suspend time : 13.672 200 | basic time : 13.528 201 | ``` 202 | 203 | In interactive mode, the rate of events per iteration is reported, while in plaintext mode the number of iterations and the number of events are printed out directly. The time limit for running the benchmark takes time spent in suspension into account, to make run time predictable. Thus, the way to interpret the output is: 'with_suspend' is equivalent to 'basic' in non-suspended time, thus the time/iteration is very close. However, the suspended activity takes a while, so we had to run fewer iterations overall. 204 | 205 | ### Existing counters: 206 | #### Linux perf_events ("lpe:...") 207 | The acronym/prefix used is 'lpe'. 208 | This family of counters uses the perf_events interface, the same one the Linux perf tool uses. It allows counting performance events either by predefined names for popular counters (cycles, cache-misses, branches, page-faults) or custom CPU-specific raw codes in r format (for example, r04a1). This makes answering questions like 'how many cache misses will different versions of the code have?' 
or 209 | 'how different execution ports on CPU are used across several implementations of the algorithm?' much easier compared to building separate binaries, running them with perf tool (or equivalent), drilling down to the function in question, etc. 210 | 211 | Example usage: 212 | ``` 213 | $ ./bm_raw -c lpe:cycles,lpe:r04a1 214 | ``` 215 | 216 | #### Jemalloc thread allocations ("jemalloc_thread_allocated") 217 | This counter tracks the number of bytes allocated by jemalloc in the calling thread. Example usage: 218 | ``` 219 | $ ./bm_jemalloc -c jemalloc_thread_allocated 220 | ``` 221 | 222 | #### Time ("time") 223 | Default counter, counts microseconds. 224 | 225 | #### OS X kperf-based counters 226 | The prefix is kperf. Currently only measures main thread. For a list of events supported, check https://github.com/okuvshynov/b63/blob/master/include/b63/counters/osx_kperf.h#L67-L75 227 | 228 | ## Dependencies and compatibility 229 | 230 | B63 requires the following C compiler attributes to be available: 231 | - \_\_attribute\_\_((cleanup)) 232 | - \_\_attribute\_\_((used)) 233 | - \_\_attribute\_\_((section)) 234 | 235 | Reasonably recent GCC and Clang have them, but I'm not sure which versions started supporting them. 236 | 237 | Individual counters can have specific requirements. For example, Linux perf_events, not surprisingly, 238 | will only work on Linux; the jemalloc counter will only work/make sense if memory allocation is done via jemalloc. 239 | 240 | ### Tested On 241 | 1. MacBook Pro 2019 242 | - OS: MacOS 10.14.6 (x86_64-apple-darwin18.7.0) 243 | - CPU: Intel(R) Core(TM) i5-8257U 244 | - Compiler: clang-1001.0.46.4 (Apple LLVM) 245 | 2. MacBook Pro 2009 246 | - OS: Ubuntu 18.04.3 (Kernel: 4.15.0-58) 247 | - CPU: Intel(R) Core(TM) 2 Duo P8700 248 | - Compiler: GCC 7.4.0 249 | 3. Raspberry Pi 250 | - OS: Raspbian GNU/Linux 9 (Kernel: 4.14.71-v7+) 251 | - CPU: ARMv7 Processor rev 4 (v7l) 252 | - Compiler: GCC 6.3.0 253 | 4. 
[VM] FreeBSD 254 | - OS: FreeBSD 12.0 255 | - Compiler: FreeBSD clang 6.0.1 256 | 5. MacMini 2007 257 | - OS: Ubuntu 11.10 (Kernel: 3.0.0-13-generic) 258 | - CPU: Intel(R) Core(TM)2 CPU T5600 259 | - Compiler: GCC 4.6.1 260 | - Caveats: 261 | - requires -lrt flag, as POSIX realtime extensions are not (yet) in libc. 262 | - ref-cycles event from linux perf_events is not supported. 263 | 264 | ## Internals 265 | The library consists of a core part responsible for running the benchmarks, and pluggable counters. The library is header-only, thus, there isn't much encapsulation going on. Every global symbol is prefixed with b63\_. 266 | 267 | Main internal data structures are: 268 | 1) b63_benchmark. Each function defined with a 'B63_BENCHMARK' or 'B63_BASELINE' macro corresponds to one benchmark instance. 269 | 2) b63_suite. Set of all benchmarks defined in the translation unit. 270 | 3) b63_ctype. Counter Type. Defines a type/family of a counter, for example, 'linux_perf_event' or 'jemalloc' 271 | 4) b63_counter. Instance of a counter, which has to be of one of the defined counter types. 272 | 5) b63_counter_list. Set of all counters to run benchmarks for. 273 | 6) b63_run. Individual benchmark execution. 274 | 275 | ## Next steps: 276 | - a convenient way to measure outliers. For example, as hash maps usually have amortized O(1) cost for lookup, what does p99 lookup time look like for some lookup distribution? What can be done to improve it? 277 | - support CPU perf counter sources beyond Linux perf_events, for example [Intel's PCM](https://github.com/opcm/pcm) and [BSD pmcstat](https://www.freebsd.org/cgi/man.cgi?query=pmcstat). 278 | - GPU perf counters (at least for Nvidia). 279 | - [low-pri] disk access and network. 
280 | -------------------------------------------------------------------------------- /examples/Makefile: -------------------------------------------------------------------------------- 1 | # 2 | # Copyright 2019 Oleksandr Kuvshynov 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | bm_basic: 17 | mkdir -p _build/ 18 | cc -Wall -Wno-unused-function basic.c -I. -O3 -o _build/bm_basic -std=c99 -lm 19 | ./_build/bm_basic -i 20 | 21 | bm_assert: 22 | mkdir -p _build/ 23 | cc -Wall -Wno-unused-function assert.c -I. -O3 -o _build/bm_assert -std=c99 -lm 24 | ./_build/bm_assert -i 25 | 26 | bm_suspend: 27 | mkdir -p _build/ 28 | cc -Wall -Wno-unused-function suspend.c -I. -O3 -o _build/bm_suspend -std=c99 -lm 29 | ./_build/bm_suspend -i 30 | 31 | bm_baseline: 32 | mkdir -p _build/ 33 | cc -Wall -Wno-unused-function baseline.c -I. -O3 -o _build/bm_baseline -std=c99 -lm 34 | ./_build/bm_baseline -i 35 | 36 | bm_baseline_multi: 37 | mkdir -p _build/ 38 | cc -Wall -Wno-unused-function baseline_multi.c -I. -O3 -o _build/bm_baseline_multi -std=c99 -lm 39 | ./_build/bm_baseline_multi -i 40 | 41 | bm_custom: 42 | mkdir -p _build/ 43 | cc -Wall -Wno-unused-function custom.c -I. -O3 -o _build/bm_custom -std=c99 -lm 44 | ./_build/bm_custom -i 45 | 46 | bm_l1d_miss: 47 | mkdir -p _build/ 48 | c++ -Wall -Wno-unused-function l1d_miss.cpp -I. 
-O3 -o _build/bm_l1d_miss -std=c++17 49 | ./_build/bm_l1d_miss -i -c lpe:L1-dcache-load-misses,time 50 | 51 | bm_storms: 52 | mkdir -p _build/ 53 | clang++ -Wall -Wno-unused-function storms.cpp -I. -O3 -o _build/storms -std=c++17 54 | sudo ./_build/storms -i -c time,kperf:cycles,kperf:instructions 55 | 56 | bm_l1d_miss_osx: 57 | mkdir -p _build/ 58 | c++ -Wall -Wno-unused-function l1d_miss.cpp -I. -O3 -o _build/bm_l1d_miss -std=c++17 59 | sudo ./_build/bm_l1d_miss -i -c kperf:L1-dcache-load-misses,time,kperf:cycles 60 | 61 | bm_raw: 62 | mkdir -p _build/ 63 | cc -Wall -Wno-unused-function raw.c -I. -O3 -o _build/bm_raw -std=c99 -lm 64 | ./_build/bm_raw -c lpe:r04A1,lpe:r10A1 -i 65 | 66 | bm_jemalloc_non_bsd: 67 | mkdir -p _build/ 68 | c++ jemalloc.cpp -O3 -o _build/bm_jemalloc -I`jemalloc-config --includedir` -L`jemalloc-config --libdir` -Wl,-rpath,`jemalloc-config --libdir` -ljemalloc `jemalloc-config --libs` 69 | ./_build/bm_jemalloc -i 70 | 71 | bm_jemalloc_bsd: 72 | mkdir -p _build/ 73 | c++ jemalloc.cpp -O3 -o _build/bm_jemalloc 74 | ./_build/bm_jemalloc -i 75 | 76 | bm_indirect: 77 | mkdir -p _build/ 78 | c++ -Wall -Wno-unused-function indirect.cpp -I. -O3 -o _build/indirect -std=c++17 79 | ./_build/indirect -i -c time 80 | 81 | 82 | clean: 83 | rm -rf _build/ 84 | 85 | format: 86 | find . -iname '*.h' -o -iname '*.c' -o -iname '*.cpp' | xargs clang-format -i 87 | -------------------------------------------------------------------------------- /examples/assert.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 
6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #include "../include/b63/b63.h" 18 | 19 | #include "math.h" 20 | 21 | B63_BENCHMARK(basic_w_assert_fail, n) { 22 | int64_t res = 0LL; 23 | for (int64_t i = 0; i < n; i++) { 24 | res += i; 25 | } 26 | B63_ASSERT(res == n * n); 27 | } 28 | 29 | B63_BASELINE(basic_w_assert_b, n) { 30 | double res = 0; 31 | for (int64_t i = 0; i < n; i++) { 32 | for (int64_t x = 1; x < 1234567; x++) { 33 | /* ln(2) series */ 34 | res += (x % 2 == 1 ? 1.0 : -1.0) / x; 35 | } 36 | } 37 | B63_ASSERT(fabs(res - log(2.0) * n) <= 1e-4); 38 | } 39 | 40 | B63_BENCHMARK(basic_w_assert, n) { 41 | double res = 0; 42 | for (int64_t i = 0; i < n; i++) { 43 | for (int64_t x = 1; x < 1234567; x++) { 44 | /* ln(2) series */ 45 | res += (x % 2 == 1 ? 1.0 : -1.0) / x; 46 | } 47 | } 48 | B63_ASSERT(fabs(res - log(2.0) * n) <= 1e-4); 49 | } 50 | 51 | int main(int argc, char **argv) { 52 | B63_RUN(argc, argv); 53 | return 0; 54 | } 55 | -------------------------------------------------------------------------------- /examples/baseline.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 
6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #include "../include/b63/b63.h" 18 | 19 | B63_BASELINE(basic, n) { 20 | int i = 0, res = 0; 21 | for (i = 0; i < n; i++) { 22 | res += rand(); 23 | } 24 | B63_KEEP(res); 25 | } 26 | 27 | B63_BENCHMARK(basic_half, n) { 28 | int i = 0, res = 0; 29 | for (i = 0; i < n; i += 2) { 30 | res += rand(); 31 | } 32 | B63_KEEP(res); 33 | } 34 | 35 | int main(int argc, char **argv) { 36 | B63_RUN(argc, argv); 37 | return 0; 38 | } 39 | -------------------------------------------------------------------------------- /examples/baseline_multi.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | */ 16 | 17 | #include "../include/b63/b63.h" 18 | 19 | B63_BASELINE(basic, n) { 20 | int i = 0, res = 0; 21 | for (i = 0; i < n; i++) { 22 | res += rand(); 23 | } 24 | B63_KEEP(res); 25 | } 26 | 27 | B63_BENCHMARK(basic_half, n) { 28 | int i = 0, res = 0; 29 | for (i = 0; i < n; i += 2) { 30 | res += rand(); 31 | } 32 | B63_KEEP(res); 33 | } 34 | 35 | B63_BENCHMARK(basic_twice, n) { 36 | int i = 0, res = 0; 37 | for (i = 0; i < n * 2; i++) { 38 | res += rand(); 39 | } 40 | B63_KEEP(res); 41 | } 42 | 43 | B63_BENCHMARK(basic_5x_more, n) { 44 | int i = 0, res = 0; 45 | for (i = 0; i < 5 * n; i++) { 46 | res += rand(); 47 | } 48 | B63_KEEP(res); 49 | } 50 | 51 | B63_BENCHMARK(basic_5x_less, n) { 52 | int i = 0, res = 0; 53 | for (i = 0; i < n; i += 5) { 54 | res += rand(); 55 | } 56 | B63_KEEP(res); 57 | } 58 | 59 | B63_BENCHMARK(basic_20_percent_more, n) { 60 | int i = 0, res = 0; 61 | for (i = 0; i < 6 * n; i += 5) { 62 | res += rand(); 63 | } 64 | B63_KEEP(res); 65 | } 66 | 67 | B63_BENCHMARK(basic_20_percent_less, n) { 68 | int i = 0, res = 0; 69 | for (i = 0; i < 4 * n; i += 5) { 70 | res += rand(); 71 | } 72 | B63_KEEP(res); 73 | } 74 | 75 | B63_BENCHMARK(basic_10_percent_more, n) { 76 | int i = 0, res = 0; 77 | for (i = 0; i < 11 * n; i += 10) { 78 | res += rand(); 79 | } 80 | B63_KEEP(res); 81 | } 82 | 83 | B63_BENCHMARK(basic_10_percent_less, n) { 84 | int i = 0, res = 0; 85 | for (i = 0; i < 9 * n; i += 10) { 86 | res += rand(); 87 | } 88 | B63_KEEP(res); 89 | } 90 | 91 | B63_BENCHMARK(basic_same, n) { 92 | int i = 0, res = 0; 93 | for (i = 0; i < n; i++) { 94 | res += rand(); 95 | } 96 | B63_KEEP(res); 97 | } 98 | 99 | int main(int argc, char **argv) { 100 | B63_RUN(argc, argv); 101 | return 0; 102 | } 103 | -------------------------------------------------------------------------------- /examples/basic.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * 
Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #include "../include/b63/b63.h" 18 | 19 | B63_BENCHMARK(basic, n) { 20 | int i = 0, res = 0; 21 | for (i = 0; i < n; i++) { 22 | res += rand(); 23 | } 24 | B63_KEEP(res); 25 | } 26 | 27 | int main(int argc, char **argv) { 28 | B63_RUN(argc, argv); 29 | return 0; 30 | } 31 | -------------------------------------------------------------------------------- /examples/bm_seed.cpp: -------------------------------------------------------------------------------- 1 | #include "../include/b63/b63.h" 2 | #include "../include/b63/counters/perf_events.h" 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | 10 | const size_t kSize = (1 << 16); 11 | const size_t kMask = kSize - 1; 12 | 13 | /* 14 | * B63_BASELINE defines a 'baseline' function to benchmark. 15 | * In this definition, 'sequential' is benchmark name, 16 | * and 'n' is the parameter the function needs to use as 17 | * 'how many iterations to run'. It is important to have this parameter 18 | * to be able to adjust the run time dynamically 19 | */ 20 | B63_BASELINE(sequential, n) { 21 | std::vector v; 22 | 23 | /* 24 | * Anything within 'B63_SUSPEND' will not be counted 25 | * towards benchmark score. 
26 | */ 27 | B63_SUSPEND { 28 | v.resize(kSize); 29 | std::iota(v.begin(), v.end(), 0); 30 | } 31 | int32_t res = 0; 32 | for (size_t i = 0; i < n; i++) { 33 | for (size_t j = 0; j < kSize; j++) { 34 | res += v[j]; 35 | } 36 | } 37 | /* this is to prevent compiler from optimizing res out */ 38 | B63_KEEP(res); 39 | } 40 | 41 | /* 42 | * This is another benchmark, which will be compared to baseline 43 | */ 44 | B63_BENCHMARK(random, n) { 45 | std::vector v; 46 | B63_SUSPEND { 47 | /* b63_seed is passed implicitly to every benchmark */ 48 | std::srand(b63_seed); 49 | v.resize(kSize); 50 | std::generate(v.begin(), v.end(), std::rand); 51 | } 52 | int32_t res = 0; 53 | for (size_t i = 0; i < n; i++) { 54 | for (size_t j = 0; j < kSize; j++) { 55 | res += v[v[j] & kMask]; 56 | } 57 | } 58 | B63_KEEP(res); 59 | } 60 | 61 | int main(int argc, char **argv) { 62 | /* 63 | * This call starts benchmarking. 64 | * Comma-separated list of counters to measure is passed explicitly here, 65 | * but one can provide command-line flag -c to override. 66 | * In this case, we are measuring 3 counters: 67 | * * lpe:cycles - CPU cycles spent in benchmark outside of B63_SUSPEND, as 68 | * measured with Linux perf_events; 69 | * * lpe:L1-dcache-load-misses - CPU L1 Data cache misses during benchmark 70 | * run outside of B63_SUSPEND; 71 | * * time - wall time outside of B63_SUSPEND. 72 | */ 73 | B63_RUN_WITH("time,lpe:cycles,lpe:L1-dcache-load-misses", argc, argv); 74 | return 0; 75 | } 76 | -------------------------------------------------------------------------------- /examples/custom.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 
6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #include "../include/b63/b63.h" 18 | 19 | #include 20 | #include 21 | #include 22 | 23 | int n = 0; 24 | int64_t callcount = 0LL; 25 | 26 | /* in this example custom counter is used. */ 27 | B63_COUNTER(calls) { return callcount; } 28 | 29 | static int f() { 30 | callcount++; 31 | return n++; 32 | } 33 | 34 | B63_BASELINE(call_normal, n) { 35 | int i = 0, res = 0; 36 | for (i = 0; i < n; i++) { 37 | res += f(); 38 | } 39 | B63_KEEP(res); 40 | } 41 | 42 | B63_BENCHMARK(call_twice, n) { 43 | int i = 0, res = 0; 44 | for (i = 0; i < n; i++) { 45 | res += (f() + f()); 46 | } 47 | B63_KEEP(res); 48 | } 49 | 50 | int main(int argc, char **argv) { 51 | B63_RUN_WITH("calls", argc, argv); 52 | return 0; 53 | } 54 | -------------------------------------------------------------------------------- /examples/indirect.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #include "../include/b63/b63.h" 18 | 19 | #include 20 | #include 21 | #include 22 | #include 23 | #include 24 | #include 25 | 26 | const size_t kSize = (1 << 10); 27 | 28 | B63_BASELINE(direct, n) { 29 | std::vector v; 30 | B63_SUSPEND { 31 | v.resize(kSize); 32 | std::iota(v.begin(), v.end(), 0); 33 | } 34 | int32_t res = 0; 35 | for (size_t i = 0; i < n; i++) { 36 | for (size_t j = 0; j < kSize; j++) { 37 | res += v[j]; 38 | } 39 | } 40 | 41 | B63_KEEP(res); 42 | } 43 | 44 | B63_BENCHMARK(indirect, n) { 45 | std::vector v; 46 | B63_SUSPEND { 47 | v.resize(kSize); 48 | std::iota(v.begin(), v.end(), 0); 49 | } 50 | 51 | int32_t res = 0; 52 | for (size_t i = 0; i < n; i++) { 53 | for (size_t j = 0; j < kSize; j++) { 54 | res += v[v[j]]; 55 | } 56 | } 57 | B63_KEEP(res); 58 | } 59 | 60 | int main(int argc, char **argv) { 61 | srand(time(0)); 62 | B63_RUN_WITH("time", argc, argv); 63 | return 0; 64 | } 65 | -------------------------------------------------------------------------------- /examples/jemalloc.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | */ 16 | 17 | #include "../include/b63/counters/jemalloc.h" 18 | #include "../include/b63/b63.h" 19 | 20 | #include 21 | #include 22 | #include 23 | #include 24 | #include 25 | #include 26 | 27 | const size_t kSize = (1 << 16); 28 | 29 | B63_BASELINE(allocate, n) { 30 | std::vector v; 31 | v.resize(kSize * n); 32 | std::iota(v.begin(), v.end(), 0); 33 | int32_t res = accumulate(v.begin(), v.end(), 0); 34 | B63_KEEP(res); 35 | } 36 | 37 | B63_BENCHMARK(allocate_more, n) { 38 | std::vector v; 39 | v.resize((kSize + 1) * n); 40 | std::iota(v.begin(), v.end(), 0); 41 | int32_t res = accumulate(v.begin(), v.end(), 0); 42 | B63_KEEP(res); 43 | } 44 | 45 | int main(int argc, char **argv) { 46 | srand(time(0)); 47 | B63_RUN_WITH("jemalloc_thread_allocated", argc, argv); 48 | return 0; 49 | } 50 | -------------------------------------------------------------------------------- /examples/l1d_miss.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | */ 16 | 17 | #include "../include/b63/b63.h" 18 | #include "../include/b63/counters/perf_events.h" 19 | #include "../include/b63/counters/osx_kperf.h" 20 | 21 | #include 22 | #include 23 | #include 24 | #include 25 | #include 26 | #include 27 | 28 | const size_t kSize = (1 << 16); 29 | const size_t kMask = kSize - 1; 30 | 31 | B63_BASELINE(sequential, n) { 32 | std::vector v; 33 | B63_SUSPEND { 34 | v.resize(kSize); 35 | std::iota(v.begin(), v.end(), 0); 36 | } 37 | int32_t res = 0; 38 | for (size_t i = 0; i < n; i++) { 39 | for (size_t j = 0; j < kSize; j++) { 40 | res += v[v[j] & kMask]; 41 | } 42 | } 43 | 44 | B63_KEEP(res); 45 | } 46 | 47 | B63_BENCHMARK(random, n) { 48 | std::vector v; 49 | B63_SUSPEND { 50 | v.resize(kSize); 51 | std::generate(v.begin(), v.end(), std::rand); 52 | } 53 | 54 | int32_t res = 0; 55 | for (size_t i = 0; i < n; i++) { 56 | for (size_t j = 0; j < kSize; j++) { 57 | res += v[v[j] & kMask]; 58 | } 59 | } 60 | B63_KEEP(res); 61 | } 62 | 63 | int main(int argc, char **argv) { 64 | srand(time(0)); 65 | B63_RUN_WITH("time", argc, argv); 66 | return 0; 67 | } 68 | -------------------------------------------------------------------------------- /examples/raw.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | */ 16 | 17 | #include "../include/b63/b63.h" 18 | #include "../include/b63/counters/perf_events.h" 19 | 20 | #include <stdint.h> 21 | #include <stdlib.h> 22 | #include <time.h> 23 | 24 | /* 25 | * This example illustrates using raw events from the processor PMU. 26 | * In Linux perf_events that's achieved by passing a combination 27 | * of 'mask' and 'event'. Those are CPU-specific, so this example 28 | * might need to be run differently or not work at all on different CPUs. 29 | * Still, it provides a decent illustration of how to use them (and why). 30 | * 31 | * In b63, -c lpe:r<mask><event> needs to be passed as command-line argument 32 | * in order to read counter value. 33 | * For example, on old Core2 Intel processor from 2009: 34 | * 35 | * mask = 0x01 && event = 0xA1 corresponds to 'uops executed on port 0' 36 | * mask = 0x02 && event = 0xA1 corresponds to 'uops executed on port 1' 37 | * mask = 0x04 && event = 0xA1 corresponds to 'uops executed on port 2' 38 | * mask = 0x08 && event = 0xA1 corresponds to 'uops executed on port 3' 39 | * mask = 0x10 && event = 0xA1 corresponds to 'uops executed on port 4' 40 | * mask = 0x20 && event = 0xA1 corresponds to 'uops executed on port 5' 41 | * 42 | * memory load is executed on port 2 and memory (data) store on port 4.
43 | * 44 | * Thus, running this benchmark produces result like this: 45 | * 46 | $ examples/_build/bm_raw -c lpe:r04A1 -i 47 | Running 2 benchmarks 48 | [DONE] many_writes : 32.571429 events per iteration 49 | [DONE] many_reads : 285749.285714 events per iteration 50 | 51 | $ examples/_build/bm_raw -c lpe:r10A1 -i 52 | Running 2 benchmarks 53 | [DONE] many_writes : 500017.428571 events per iteration 54 | [DONE] many_reads : 10.000000 events per iteration 55 | 56 | * 57 | * For more information on 'which instructions run where' and 'which codes 58 | correspond to which events' please refer to: 59 | * - Intel Optimization manual: 60 | https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-optimization-reference-manual 61 | * - Agner Fog's manual, specifically 'microarchitecture' volume: 62 | https://www.agner.org/optimize/#manuals 63 | * 64 | */ 65 | const int32_t kSize = 500000; 66 | const int32_t kLookups = 50000; 67 | 68 | B63_BENCHMARK(many_reads, n) { 69 | int32_t *v = NULL; 70 | int32_t i = 0, res = 0, j; 71 | B63_SUSPEND { 72 | v = malloc(kSize * sizeof(int32_t)); 73 | for (i = 0; i < kSize; i++) { 74 | v[i] = rand(); 75 | } 76 | } 77 | for (j = 0; j < n; j++) 78 | for (i = 0; i < kLookups; i++) { 79 | res += v[i]; 80 | } 81 | B63_KEEP(res); 82 | } 83 | 84 | B63_BENCHMARK(many_writes, n) { 85 | int32_t *v = NULL; 86 | int32_t i = 0, res = 0, j; 87 | B63_SUSPEND { 88 | v = malloc(kSize * sizeof(int32_t)); 89 | for (i = 0; i < kSize; i++) { 90 | v[i] = rand(); 91 | } 92 | } 93 | for (j = 0; j < n; j++) 94 | for (i = 0; i < kLookups; i++) { 95 | v[i] = res; 96 | res += i + j; 97 | } 98 | B63_KEEP(res); 99 | } 100 | 101 | int main(int argc, char **argv) { 102 | srand(time(0)); 103 | B63_RUN(argc, argv); 104 | return 0; 105 | } 106 | -------------------------------------------------------------------------------- /examples/storms.cpp: -------------------------------------------------------------------------------- 1 | #include 
"../include/b63/b63.h" 2 | #include "../include/b63/counters/osx_kperf.h" 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | 14 | struct A { 15 | A* next; 16 | int64_t payload; 17 | }; 18 | 19 | constexpr bool is_pow2(int a) { 20 | return a && ((a & (a - 1)) == 0); 21 | } 22 | 23 | struct Test { 24 | Test(int64_t len, int64_t rep) : rep_(rep * len) { 25 | l.resize(len); 26 | for (int64_t i = 0; i < len; i++) { 27 | l[i].payload = i; 28 | } 29 | std::random_device rd; 30 | std::mt19937 g(rd()); 31 | std::shuffle(l.begin(), l.end(), g); 32 | 33 | for (int64_t i = 0; i + 1 < len; i++) { 34 | l[i].next = &l[i+1]; 35 | } 36 | l[len - 1].next = &l[0]; 37 | 38 | std::sort(l.begin(), l.end(), [](const A& a, const A& b){ 39 | return a.payload < b.payload; 40 | }); 41 | } 42 | 43 | template <int unroll> 44 | int64_t run() { 45 | static_assert(is_pow2(unroll), "unroll factor must be power of 2"); 46 | A* curr = &l[0]; 47 | int64_t res = 0; 48 | #pragma clang loop unroll_count(unroll) 49 | for (int64_t r = 0; r < rep_; r++) { 50 | curr = curr->next; 51 | if (curr->payload % 3 == 0) { 52 | res += curr->payload; 53 | } 54 | } 55 | return res; 56 | } 57 | 58 | template <int unroll> 59 | int64_t run2() { 60 | static_assert(is_pow2(unroll), "unroll factor must be power of 2"); 61 | A* curr = &l[0]; 62 | int64_t res = 0; 63 | #pragma clang loop unroll_count(unroll) 64 | for (int64_t r = 0; r < rep_; r++) { 65 | curr = curr->next; 66 | if (curr->payload % 3 == 0) { 67 | res += curr->payload; 68 | } 69 | if (curr->payload % 7 == 0) { 70 | res += curr->payload; 71 | } 72 | if (curr->payload % 11 == 0) { 73 | res += curr->payload; 74 | } 75 | } 76 | return res; 77 | } 78 | 79 | private: 80 | std::vector<A> l; 81 | int64_t rep_; 82 | }; 83 | 84 | const size_t kSize = (1 << 10); 85 | 86 | #define BM_UNROLLED(name, qos, unroll) \ 87 | B63_BENCHMARK(name##_u##unroll, n) { \ 88 | pthread_set_qos_class_self_np(qos, 0); \ 89 | Test* t; \ 90 |
B63_SUSPEND { \ 91 | t = new Test(kSize, n); \ 92 | } \ 93 | int64_t res = t->run2<unroll>(); \ 94 | B63_KEEP(res); \ 95 | B63_SUSPEND { \ 96 | delete t; \ 97 | } \ 98 | } 99 | 100 | #define FIRESTORM_UNROLLED(unroll) \ 101 | BM_UNROLLED(firestorm, QOS_CLASS_USER_INTERACTIVE, unroll) 102 | #define ICESTORM_UNROLLED(unroll) \ 103 | BM_UNROLLED(icestorm, QOS_CLASS_BACKGROUND, unroll) 104 | 105 | FIRESTORM_UNROLLED(1) 106 | FIRESTORM_UNROLLED(2) 107 | FIRESTORM_UNROLLED(4) 108 | FIRESTORM_UNROLLED(8) 109 | FIRESTORM_UNROLLED(16) 110 | FIRESTORM_UNROLLED(32) 111 | FIRESTORM_UNROLLED(64) 112 | FIRESTORM_UNROLLED(128) 113 | 114 | ICESTORM_UNROLLED(1) 115 | ICESTORM_UNROLLED(2) 116 | ICESTORM_UNROLLED(4) 117 | ICESTORM_UNROLLED(8) 118 | ICESTORM_UNROLLED(16) 119 | ICESTORM_UNROLLED(32) 120 | ICESTORM_UNROLLED(64) 121 | ICESTORM_UNROLLED(128) 122 | 123 | int main(int argc, char **argv) { 124 | srand(time(0)); 125 | B63_RUN_WITH("time", argc, argv); 126 | return 0; 127 | } 128 | -------------------------------------------------------------------------------- /examples/suspend.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License.
15 | */ 16 | 17 | #include "../include/b63/b63.h" 18 | 19 | #include 20 | #include 21 | #include 22 | 23 | B63_BENCHMARK(basic, n) { 24 | int i = 0, res = 0; 25 | for (i = 0; i < n; i++) { 26 | res += rand(); 27 | } 28 | B63_KEEP(res); 29 | } 30 | 31 | B63_BENCHMARK(with_suspend, n) { 32 | int i = 0, res = 0; 33 | for (i = 0; i < n; i++) { 34 | res += rand(); 35 | } 36 | /* This block will not be counted */ 37 | B63_SUSPEND { 38 | for (i = 0; i < n; i++) { 39 | res += rand(); 40 | } 41 | } 42 | B63_KEEP(res); 43 | } 44 | 45 | int main(int argc, char **argv) { 46 | srand(time(0)); 47 | B63_RUN(argc, argv); 48 | return 0; 49 | } 50 | -------------------------------------------------------------------------------- /include/b63/b63.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | */ 16 | 17 | #ifndef _B63_H_ 18 | #define _B63_H_ 19 | 20 | /* Topline include */ 21 | 22 | /* for syscall */ 23 | #ifdef __linux__ 24 | #ifndef _GNU_SOURCE 25 | #define _GNU_SOURCE 26 | #endif 27 | #endif 28 | 29 | /* for CLOCK_MONOTONIC */ 30 | #ifndef __APPLE__ 31 | #define _POSIX_C_SOURCE 200809L 32 | #endif 33 | 34 | #include "benchmark.h" 35 | #include "register.h" 36 | #include "run.h" 37 | 38 | #endif 39 | -------------------------------------------------------------------------------- /include/b63/benchmark.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #ifndef _B63_DATA_H_ 18 | #define _B63_DATA_H_ 19 | 20 | #include "counter_list.h" 21 | 22 | struct b63_epoch; 23 | struct b63_benchmark; 24 | struct b63_sink; 25 | struct b63_suite; 26 | struct b63_counter; 27 | 28 | /* 29 | * Epoch is a unit of benchmark execution which might consist 30 | * of multiple iterations. For each benchmark,counter pair there 31 | * could be several epochs configured to run. 32 | * Result of each epoch will be used as individual measurement 33 | * in confidence interval computation. 
34 | */ 35 | typedef struct b63_epoch { 36 | struct b63_benchmark *benchmark; 37 | struct b63_counter *counter; 38 | int64_t iterations; 39 | int64_t events; 40 | int8_t suspension_done; 41 | int8_t fail; 42 | } b63_epoch; 43 | 44 | /* 45 | * benchmarked function template, it needs to support 'run n iterations' and 46 | * seed for any generation. 47 | */ 48 | typedef void (*b63_target_fn)(struct b63_epoch *, uint64_t, int64_t); 49 | 50 | /* 51 | * This struct represents individual benchmark. 52 | */ 53 | typedef struct b63_benchmark { 54 | /* benchmark name */ 55 | const char *name; 56 | /* function to benchmark */ 57 | const b63_target_fn run; 58 | /* is this benchmark a baseline? */ 59 | const int8_t is_baseline; 60 | /* [weak] pointer to suite config */ 61 | struct b63_suite *suite; 62 | /* [weak] pointer to current results. Used ONLY for baseline */ 63 | b63_epoch *results; 64 | /* if any run has failed; */ 65 | int8_t failed; 66 | } b63_benchmark; 67 | 68 | #endif 69 | -------------------------------------------------------------------------------- /include/b63/counter.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | */ 16 | 17 | #ifndef _B63_COUNTER_H_ 18 | #define _B63_COUNTER_H_ 19 | 20 | #include <stdint.h> 21 | #include <stdio.h> 22 | #include <stdlib.h> 23 | #include <string.h> 24 | 25 | #include "utils/section_ptr_list.h" 26 | #include "utils/string.h" 27 | 28 | /* 29 | * This is an 'interface' for a counter. 30 | * Each counter needs to implement a 'read' function, with optional state 31 | * passed. State will be created initially from string-based config. 32 | * If the counter is stateless, the default factory method can be used. 33 | * 34 | * Cleanup is an optional function with 'destructor-like' role. For trivial 35 | * state, like 'initial integer value' where no resources were acquired, it can 36 | * be NULL. In cases like linux perf_events, where the file descriptor should 37 | * be closed, it should be set up to do that job. 38 | * 39 | * Example of 'stateless' counter: time. State is maintained externally, 40 | * nothing needs to be stored in the counter itself. 41 | * 42 | * Example of stateful counter: linux perf_events. During initialization 43 | * file descriptor is opened, and then used to read counter value. That 44 | * file descriptor will be a part of state. 45 | */ 46 | 47 | typedef int64_t (*b63_counter_read_fn)(void *impl); 48 | typedef void (*b63_counter_cleanup_fn)(void *impl); 49 | typedef void (*b63_counter_activate_fn)(void *impl); 50 | 51 | /* 52 | * returns 0 if construction fails. Note that 'NULL' implementation 53 | * is valid for stateless counters, so we have to do error-handling 54 | * differently 55 | */ 56 | typedef int8_t (*b63_counter_factory_fn)(const char *config, void **impl); 57 | /* Default counter factory for stateless counters. */ 58 | int8_t b63_counter_factory_fn_default(const char *config, void **impl) { 59 | return 1; 60 | } 61 | 62 | /* 63 | * This represents 'type' or 'family' of a counter. For example, 64 | * 'lpe' will have a single type of counter, but might have multiple 65 | * instances for different events.
66 | */ 67 | typedef struct b63_ctype { 68 | b63_counter_read_fn read; 69 | b63_counter_factory_fn factory; 70 | b63_counter_cleanup_fn cleanup; 71 | b63_counter_activate_fn activate; 72 | const char *prefix; 73 | } b63_ctype; 74 | 75 | /* 76 | * This is an 'instance' of a counter, with specific configuration. 77 | * For example, 'lpe:cycles' and 'lpe:branch-misses' would be two 78 | * separate instances of same counter type/family. 79 | */ 80 | typedef struct b63_counter { 81 | b63_ctype *type; 82 | char *name; 83 | void *impl; 84 | } b63_counter; 85 | 86 | /* Pointers to registered counter types will be stored here */ 87 | B63_LIST_DECLARE(b63_ctype); 88 | 89 | /* Counter registration */ 90 | #define B63_COUNTER_REG(name, f, c, a) \ 91 | static int64_t b63_counter_read_##name(void *); \ 92 | static b63_ctype b63_ctype_##name = { \ 93 | .read = b63_counter_read_##name, \ 94 | .factory = f, \ 95 | .cleanup = c, \ 96 | .activate = a, \ 97 | .prefix = #name, \ 98 | }; \ 99 | B63_LIST_ADD(b63_ctype, name, &b63_ctype_##name); \ 100 | static int64_t b63_counter_read_##name(void *impl) 101 | 102 | /* Default registration for stateless counter */ 103 | #define B63_COUNTER_REG_0(name) \ 104 | B63_COUNTER_REG(name, b63_counter_factory_fn_default, NULL, NULL) 105 | 106 | /* Default registration for counter with trivial state. */ 107 | #define B63_COUNTER_REG_1(name, f) B63_COUNTER_REG(name, f, NULL, NULL) 108 | 109 | #define B63_COUNTER_REG_2(name, f, c) B63_COUNTER_REG(name, f, c, NULL) 110 | 111 | /* macro 'overloading', so we can just use B63_COUNTER */ 112 | #define B63_COUNTER_GET_REG(_1, _2, _3, _4, IMPL_NAME, ...) IMPL_NAME 113 | #define B63_COUNTER(...) \ 114 | B63_COUNTER_GET_REG(__VA_ARGS__, B63_COUNTER_REG, B63_COUNTER_REG_2, B63_COUNTER_REG_1, \ 115 | B63_COUNTER_REG_0) \ 116 | (__VA_ARGS__) 117 | 118 | /* deallocates name and impl if provided. 
*/ 119 | void b63_counter_cleanup(b63_counter *c) { 120 | if (c->impl != NULL) { 121 | /* things like 'close file descriptor' happen here */ 122 | if (c->type->cleanup != NULL) { 123 | c->type->cleanup(c->impl); 124 | } 125 | free(c->impl); 126 | c->impl = NULL; 127 | } 128 | free(c->name); 129 | c->name = NULL; 130 | } 131 | 132 | /* 133 | * initializes counter with config string passed as range from 'b' to 'e'. 134 | * returns 0 (= false) if no prefix matches. 135 | */ 136 | int8_t b63_counter_init(b63_counter *counter, const char *b, const char *e) { 137 | B63_LIST_FOR_EACH(b63_ctype, type) { 138 | if (b63_range_starts_with(b, e, (*type)->prefix)) { 139 | counter->type = *type; 140 | counter->name = (char *)malloc(e - b + 1); 141 | if (counter->name == NULL) { 142 | /* malloc failed */ 143 | fprintf(stderr, "memory allocation failed\n"); 144 | return 0; 145 | } 146 | memcpy(counter->name, b, e - b); 147 | counter->name[e - b] = '\0'; 148 | if (!(*type)->factory(counter->name, &counter->impl)) { 149 | /* implementation construction fails */ 150 | fprintf(stderr, "counter implementation construction fails for %s\n", 151 | counter->name); 152 | free(counter->name); 153 | return 0; 154 | } 155 | return 1; 156 | } 157 | } 158 | return 0; 159 | } 160 | 161 | #endif 162 | -------------------------------------------------------------------------------- /include/b63/counter_list.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 
6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #ifndef _B63_COUNTER_LIST_H_ 18 | #define _B63_COUNTER_LIST_H_ 19 | 20 | #include "counter.h" 21 | #include "utils/string.h" 22 | 23 | /* 24 | * b63_counter_list represents list of counters. 25 | * The only way it should be constructed 26 | * is directly from comma-separated configuration string. 27 | * Operations it supports: 28 | * - initialize from a config string 29 | * - iterate 30 | * - cleanup 31 | */ 32 | 33 | typedef struct b63_counter_list { 34 | b63_counter *data; 35 | size_t size; 36 | char *conf; 37 | } b63_counter_list; 38 | 39 | /* 40 | * Initialize counter list from config in the format 41 | * counter1[,counter2,counter3,...] 42 | * In case one of the counters is unable to initialize, it is skipped 43 | * and warning is printed out. 
44 | */ 45 | static void b63_counter_list_init(b63_counter_list *counters, 46 | const char *conf) { 47 | const char sep = ','; 48 | /* empty config is not a valid one, so at least one should be there */ 49 | counters->size = 1 + b63_str_count(conf, sep); 50 | 51 | /* allocate counters */ 52 | counters->data = (b63_counter *)malloc(counters->size * sizeof(b63_counter)); 53 | if (counters->data == NULL) { 54 | /* allocation failed; unable to proceed */ 55 | fprintf(stderr, "memory allocation failed for counter list.\n"); 56 | exit(EXIT_FAILURE); 57 | } 58 | for (size_t i = 0; i < counters->size; i++) { 59 | counters->data[i].impl = NULL; 60 | counters->data[i].name = NULL; 61 | } 62 | 63 | const char *b = conf, *e; 64 | for (size_t i = 0; i < counters->size; i++) { 65 | e = strchr(b, sep); 66 | if (e == NULL) { /* will happen on last iteration */ 67 | e = b + strlen(b); 68 | } 69 | 70 | if (!b63_counter_init(&counters->data[i], b, e)) { 71 | fprintf(stderr, "counter not initialized: %.*s\n", (int)(e - b), b); 72 | i--; 73 | counters->size--; 74 | } 75 | /* 76 | * Extra 1 is for separator. 77 | * b might point to an invalid location on last iteration, 78 | * but it won't be dereferenced. 
79 | */ 80 | b = e + 1; 81 | } 82 | } 83 | 84 | /* does counter-specific cleanup + destroys the list */ 85 | static void b63_counter_list_cleanup(b63_counter_list *counters) { 86 | for (size_t i = 0; i < counters->size; i++) { 87 | b63_counter_cleanup(&counters->data[i]); 88 | } 89 | free(counters->data); 90 | counters->data = NULL; 91 | } 92 | 93 | /* iterates over counters in the list */ 94 | #define B63_FOR_EACH_COUNTER(list, pc) \ 95 | for (b63_counter *pc = list.data; pc < list.data + list.size; pc++) 96 | 97 | #endif 98 | -------------------------------------------------------------------------------- /include/b63/counters/cycles.h: -------------------------------------------------------------------------------- 1 | #ifndef _B63_COUNTERS_CYCLES_H_ 2 | #define _B63_COUNTERS_CYCLES_H_ 3 | 4 | #include 5 | 6 | #ifdef __APPLE__ 7 | #ifdef __ARM64_ARCH_8__ 8 | 9 | /* 10 | * based on the test on icestorm core, this looks proportional 11 | * to time, not to cycles. 12 | */ 13 | B63_COUNTER(cycles) { 14 | uint64_t res; 15 | __asm__ volatile("mrs %0, cntvct_el0" : "=r" (res)); 16 | return res; 17 | } 18 | 19 | #endif 20 | #endif 21 | 22 | #endif 23 | -------------------------------------------------------------------------------- /include/b63/counters/jemalloc.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #ifndef _B63_COUNTERS_JEMALLOC_H_ 18 | #define _B63_COUNTERS_JEMALLOC_H_ 19 | 20 | #include "../counter.h" 21 | #include 22 | #ifdef __FreeBSD__ 23 | #include 24 | #else 25 | #include 26 | #endif 27 | 28 | /* 29 | * Defines a counter for memory allocated. Only allocations made by 30 | * the calling thread are counted. 31 | */ 32 | B63_COUNTER(jemalloc_thread_allocated) { 33 | uint64_t res; 34 | size_t len = sizeof(uint64_t); 35 | if (0 == mallctl("thread.allocated", &res, &len, NULL, 0)) { 36 | return res; 37 | } 38 | fprintf(stderr, "Unable to get stats from jemalloc\n"); 39 | return 0; 40 | } 41 | 42 | #endif 43 | -------------------------------------------------------------------------------- /include/b63/counters/osx_kperf.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2021 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | /* 18 | * Based on the work by 19 | * D. 
Lemire, Duc Tri Nguen and Dougall Johnson 20 | * References: 21 | * - https://dougallj.github.io/applecpu/firestorm.html 22 | * - https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2021/03/24 23 | */ 24 | 25 | #ifndef _B63_COUNTERS_OSX_KPERF_H_ 26 | #define _B63_COUNTERS_OSX_KPERF_H_ 27 | 28 | #ifdef __APPLE__ 29 | 30 | #include 31 | #include 32 | #include 33 | #include 34 | 35 | #include "../counter.h" 36 | 37 | #define B63_KPERF_FUNCTIONS \ 38 | F(uint32_t, kpc_get_counter_count, uint32_t) \ 39 | F(uint32_t, kpc_get_config_count, uint32_t) \ 40 | F(int, kpc_force_all_ctrs_set, int) \ 41 | F(int, kpc_set_counting, uint32_t) \ 42 | F(int, kpc_set_thread_counting, uint32_t) \ 43 | F(int, kpc_set_config, uint32_t, void *) \ 44 | F(int, kpc_get_thread_counters, int, unsigned int, void *) 45 | 46 | #define F(ret, name, ...) \ 47 | typedef ret name##proc(__VA_ARGS__); \ 48 | static name##proc *name; 49 | B63_KPERF_FUNCTIONS 50 | #undef F 51 | 52 | #define KPC_CLASS_FIXED (0) 53 | #define KPC_CLASS_CONFIGURABLE (1) 54 | #define KPC_CLASS_FIXED_MASK (1u << KPC_CLASS_FIXED) 55 | #define KPC_CLASS_CONFIGURABLE_MASK (1u << KPC_CLASS_CONFIGURABLE) 56 | #define KPC_MASK (KPC_CLASS_CONFIGURABLE_MASK | KPC_CLASS_FIXED_MASK) 57 | 58 | /* 59 | * For now this supports only M1 CPU. 
60 | */ 61 | #define B63_KPERF_COUNTER_SIZE 10 62 | #define B63_KPERF_CONFIG_SIZE 8 63 | 64 | struct b63_kperf_counter_event_map { 65 | const char *event_name; 66 | uint64_t config; 67 | uint64_t config_index; 68 | uint64_t counter_index; 69 | } b63_kperf_counter_events_flat_map[] = { 70 | {"cycles", 0x02, 0, 0}, 71 | {"instructions", 0x8c, 0, 1}, 72 | {"branches", 0x8d, 3, 5}, 73 | {"L1-dcache-load-misses", 0xbf, 3, 5}, 74 | {"L1-dcache-store-misses", 0xc0, 3, 5}, 75 | {"dTLB-load-misses", 0xc1, 3, 5}, 76 | {"branch-misses", 0xcb, 3, 5}, 77 | {"L1-icache-load-misses", 0xd3, 3, 5}, 78 | {"iTLB-load-misses", 0xd4, 3, 5}, 79 | }; 80 | 81 | typedef struct b63_counter_kperf { 82 | uint64_t counters[B63_KPERF_COUNTER_SIZE]; 83 | uint64_t config[B63_KPERF_CONFIG_SIZE]; 84 | uint64_t counter_index; 85 | } b63_counter_kperf; 86 | 87 | static uint64_t b63_kperf_pick_event(const char* event_name, b63_counter_kperf* kperf) { 88 | /* 89 | * Not every position in 'configurable' section works with every counter. 90 | * We can configure cycles to any position, but, for example, l1d misses 91 | * are not working with positions 0-2. Given that at the moment we record 92 | * one counter at a time anyway, we can always use pos 3 for everything. 
93 | * 94 | * For cycles and instructions, though, we will use fixed counters. 95 | */ 96 | for (size_t i = 0; i < sizeof(b63_kperf_counter_events_flat_map) / 97 | sizeof(struct b63_kperf_counter_event_map); 98 | ++i) { 99 | if (strcmp(b63_kperf_counter_events_flat_map[i].event_name, event_name) == 0) { 100 | const uint64_t CFGWORD_EL0A64EN_MASK = 0x20000; 101 | kperf->config[b63_kperf_counter_events_flat_map[i].config_index] = b63_kperf_counter_events_flat_map[i].config | CFGWORD_EL0A64EN_MASK; 102 | kperf->counter_index = b63_kperf_counter_events_flat_map[i].counter_index; 103 | 104 | return 1; 105 | } 106 | } 107 | return 0; 108 | } 109 | 110 | 111 | static int8_t b63_counter_kperf_create(const char* conf, void **impl) { 112 | void *kperf = dlopen( 113 | "/System/Library/PrivateFrameworks/kperf.framework/Versions/A/kperf", 114 | RTLD_LAZY); 115 | if (!kperf) { 116 | fprintf(stderr, "kperf library not loaded\n"); 117 | return 0; 118 | } 119 | #define F(ret, name, ...) \ 120 | name = (name##proc *)(dlsym(kperf, #name)); \ 121 | if (!name) { \ 122 | fprintf(stderr, "%s = %p\n", #name, (void *)name); \ 123 | return 0; \ 124 | } 125 | B63_KPERF_FUNCTIONS 126 | #undef F 127 | 128 | if (kpc_get_counter_count(KPC_MASK) != B63_KPERF_COUNTER_SIZE) { 129 | fprintf(stderr, "wrong fixed counters count\n"); 130 | return 0; 131 | } 132 | 133 | if (kpc_get_config_count(KPC_MASK) != B63_KPERF_CONFIG_SIZE) { 134 | fprintf(stderr, "wrong fixed config count\n"); 135 | return 0; 136 | } 137 | 138 | b63_counter_kperf *res = (b63_counter_kperf *)calloc(1, sizeof(b63_counter_kperf)); /* zeroed: unused config slots must not hold garbage */ 139 | if (res == NULL) { 140 | fprintf(stderr, "memory allocation failed for kperf counter\n"); 141 | return 0; 142 | } 143 | 144 | const char *event_name = conf + strlen("kperf:"); 145 | uint64_t found = b63_kperf_pick_event(event_name, res); 146 | if (found == 0) { 147 | fprintf(stderr, "event %s not found\n", conf); 148 | free(res); 149 | return 0; 150 | } 151 | 152 | *impl = res; 153 | return 1; 154 | } 155
| 156 | static void b63_counter_kperf_cleanup(void* impl) { 157 | if (impl != NULL) { 158 | /* 159 | * TODO: unload the dl library? 160 | * This should be done on counter-family level though. 161 | */ 162 | } 163 | } 164 | 165 | /* 166 | * activate function will be called right before the measurements for 167 | * this counter. It is needed to support multiple counters from kperf set, 168 | * for example, both l1d misses and dtlb misses. Apple's kperf doesn't seem to 169 | * multiplex/maintain internal state, thus we can only configure 170 | * a constant set of counters to measure. 171 | */ 172 | static void b63_counter_kperf_activate(void* impl) { 173 | if (impl != NULL) { 174 | b63_counter_kperf *kperf = (b63_counter_kperf *)impl; 175 | if (kpc_set_config(KPC_MASK, kperf->config)) { 176 | fprintf(stderr, "kpc_set_config failed\n"); 177 | return; 178 | } 179 | 180 | if (kpc_force_all_ctrs_set(1)) { 181 | fprintf(stderr, "kpc_force_all_ctrs_set failed\n"); 182 | return; 183 | } 184 | 185 | if (kpc_set_counting(KPC_MASK)) { 186 | fprintf(stderr, "kpc_set_counting failed\n"); 187 | return; 188 | } 189 | 190 | if (kpc_set_thread_counting(KPC_MASK)) { 191 | fprintf(stderr, "kpc_set_thread_counting failed\n"); 192 | return; 193 | } 194 | } 195 | } 196 | 197 | /* impl is 'passed' implicitly */ 198 | B63_COUNTER(kperf, b63_counter_kperf_create, b63_counter_kperf_cleanup, b63_counter_kperf_activate) { 199 | b63_counter_kperf *kperf = (b63_counter_kperf *)impl; 200 | if (kpc_get_thread_counters(0, B63_KPERF_COUNTER_SIZE, kperf->counters)) { 201 | fprintf(stderr, "kpc_get_thread_counters failed\n"); 202 | } 203 | return kperf->counters[kperf->counter_index]; 204 | } 205 | 206 | #endif 207 | #endif 208 | -------------------------------------------------------------------------------- /include/b63/counters/perf_events.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the 
Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #ifndef _B63_COUNTERS_PERF_EVENTS_H_ 18 | #define _B63_COUNTERS_PERF_EVENTS_H_ 19 | 20 | #ifdef __linux__ 21 | 22 | #include 23 | #include 24 | #include 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | 32 | #include "../counter.h" 33 | #include "perf_events_map.h" 34 | 35 | /* 36 | * Linux perf_events-based counters. 37 | * 'lpe' is used as an acronym for linux perf_events. 38 | */ 39 | 40 | /* 'state' of a counter */ 41 | typedef struct b63_counter_lpe { 42 | int32_t fd; 43 | } b63_counter_lpe; 44 | 45 | /* 46 | * open file descriptor to read counter. 
See 47 | * http://man7.org/linux/man-pages/man2/perf_event_open.2.html for the list 48 | */ 49 | static int32_t b63_counter_lpe_open(uint32_t type, uint64_t config) { 50 | struct perf_event_attr pe; 51 | 52 | memset(&pe, 0, sizeof(struct perf_event_attr)); 53 | pe.type = type; 54 | pe.size = sizeof(struct perf_event_attr); 55 | pe.config = config; 56 | pe.disabled = 1; 57 | pe.exclude_kernel = 1; 58 | pe.exclude_hv = 1; 59 | 60 | /* Counting for current process, any cpu: pid == 0, cpu == -1 */ 61 | int32_t fd = syscall(__NR_perf_event_open, &pe, 0, -1, -1, 0); 62 | if (fd == -1) { 63 | fprintf(stderr, "perf event open error\n"); 64 | return -1; 65 | } 66 | /* no need to reset really, as we care about difference, not value */ 67 | ioctl(fd, PERF_EVENT_IOC_RESET, 0); 68 | ioctl(fd, PERF_EVENT_IOC_ENABLE, 0); 69 | return fd; 70 | } 71 | 72 | /* 73 | * Looks up the type/config combination by event_name. 74 | * Names match those of the perf tool (`perf list`), for consistency. 75 | * Returns -1 on failure. 76 | */ 77 | static int32_t b63_counter_lpe_init(const char *event_name) { 78 | /* first, check list of predefined events */ 79 | for (size_t i = 0; i < sizeof(b63_counter_events_flat_map) / 80 | sizeof(struct b63_counter_event_map); 81 | ++i) { 82 | if (strcmp(b63_counter_events_flat_map[i].event_name, event_name) == 0) { 83 | return b63_counter_lpe_open(b63_counter_events_flat_map[i].type, 84 | b63_counter_events_flat_map[i].config); 85 | } 86 | } 87 | /* now, try a raw event: 'r' followed by a hex code, for example r01a1 */ 88 | if (strlen(event_name) > 1) { 89 | if (event_name[0] == 'r') { 90 | uint64_t conf = strtoull(event_name + 1, NULL, 16); 91 | return b63_counter_lpe_open(PERF_TYPE_RAW, conf); 92 | } 93 | } 94 | fprintf(stderr, "linux perf_events: unable to find event %s\n", event_name); 95 | return -1; 96 | } 97 | 98 | /* 99 | * conf comes in format 'lpe:cycles' 100 | * Thus, we pick the suffix and try to find needed options. 
101 | * Return: 102 | * 0 if failure, 103 | * 1 if success 104 | */ 105 | static int8_t b63_counter_lpe_create(const char *conf, void **impl) { 106 | b63_counter_lpe *lpe = (b63_counter_lpe *)malloc(sizeof(b63_counter_lpe)); 107 | if (lpe == NULL) { 108 | fprintf(stderr, "memory allocation failed for lpe counter\n"); 109 | return 0; 110 | } 111 | 112 | const char *event_name = conf + strlen("lpe:"); 113 | lpe->fd = b63_counter_lpe_init(event_name); 114 | if (lpe->fd == -1) { 115 | /* failure: need to free resources */ 116 | free(lpe); 117 | return 0; 118 | } 119 | *impl = lpe; 120 | return 1; 121 | } 122 | 123 | /* Close file descriptor */ 124 | static void b63_counter_lpe_cleanup(void *impl) { 125 | if (impl != NULL) { 126 | b63_counter_lpe *lpe_impl = (b63_counter_lpe *)impl; 127 | close(lpe_impl->fd); 128 | } 129 | } 130 | 131 | /* impl is 'passed' implicitly */ 132 | B63_COUNTER(lpe, b63_counter_lpe_create, b63_counter_lpe_cleanup) { 133 | b63_counter_lpe *lpe_impl = (b63_counter_lpe *)impl; 134 | int64_t res; 135 | if (read(lpe_impl->fd, &res, sizeof(int64_t)) != sizeof(int64_t)) { 136 | fprintf(stderr, "read from perf_events fd failed\n"); 137 | return 0; 138 | } 139 | return res; 140 | } 141 | 142 | #endif /* __linux__ */ 143 | #endif 144 | -------------------------------------------------------------------------------- /include/b63/counters/perf_events_map.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #ifndef _B63_COUNTERS_PERF_EVENTS_MAP_H_ 18 | #define _B63_COUNTERS_PERF_EVENTS_MAP_H_ 19 | 20 | #ifdef __linux__ 21 | 22 | #include 23 | #include 24 | 25 | /* 26 | * Map from user-friendly strings to constants perf_events can understand. 27 | */ 28 | struct b63_counter_event_map { 29 | const char *event_name; 30 | uint32_t type; 31 | uint64_t config; 32 | } b63_counter_events_flat_map[] = { 33 | /* HW events */ 34 | {"branches", PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_INSTRUCTIONS}, 35 | {"branch-misses", PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES}, 36 | {"bus-cycles", PERF_TYPE_HARDWARE, PERF_COUNT_HW_BUS_CYCLES}, 37 | {"cache-misses", PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES}, 38 | {"cache-references", PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_REFERENCES}, 39 | {"cycles", PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES}, 40 | {"instructions", PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS}, 41 | {"ref-cycles", PERF_TYPE_HARDWARE, PERF_COUNT_HW_REF_CPU_CYCLES}, 42 | 43 | /* SW events */ 44 | {"context-switches", PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CONTEXT_SWITCHES}, 45 | {"cs", PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CONTEXT_SWITCHES}, 46 | {"major-faults", PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MAJ}, 47 | {"minor-faults", PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MIN}, 48 | {"page-faults", PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS}, 49 | 50 | /* Cache events */ 51 | {"L1-dcache-load-misses", PERF_TYPE_HW_CACHE, 52 | (PERF_COUNT_HW_CACHE_L1D | (PERF_COUNT_HW_CACHE_OP_READ << 8) | 53 | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))}, 54 | {"L1-dcache-loads", PERF_TYPE_HW_CACHE, 55 | (PERF_COUNT_HW_CACHE_L1D | (PERF_COUNT_HW_CACHE_OP_READ << 8) | 56 | (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16))}, 57 | {"L1-dcache-prefetches", PERF_TYPE_HW_CACHE, 58 | (PERF_COUNT_HW_CACHE_L1D | (PERF_COUNT_HW_CACHE_OP_PREFETCH << 8) | 59 | 
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16))}, 60 | {"L1-dcache-store-misses", PERF_TYPE_HW_CACHE, 61 | (PERF_COUNT_HW_CACHE_L1D | (PERF_COUNT_HW_CACHE_OP_WRITE << 8) | 62 | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))}, 63 | {"L1-dcache-stores", PERF_TYPE_HW_CACHE, 64 | (PERF_COUNT_HW_CACHE_L1D | (PERF_COUNT_HW_CACHE_OP_WRITE << 8) | 65 | (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16))}, 66 | {"L1-icache-load-misses", PERF_TYPE_HW_CACHE, 67 | (PERF_COUNT_HW_CACHE_L1I | (PERF_COUNT_HW_CACHE_OP_READ << 8) | 68 | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))}, 69 | {"L1-icache-loads", PERF_TYPE_HW_CACHE, 70 | (PERF_COUNT_HW_CACHE_L1I | (PERF_COUNT_HW_CACHE_OP_READ << 8) | 71 | (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16))}, 72 | 73 | {"LLC-load-misses", PERF_TYPE_HW_CACHE, 74 | (PERF_COUNT_HW_CACHE_LL | (PERF_COUNT_HW_CACHE_OP_READ << 8) | 75 | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))}, 76 | {"LLC-loads", PERF_TYPE_HW_CACHE, 77 | (PERF_COUNT_HW_CACHE_LL | (PERF_COUNT_HW_CACHE_OP_READ << 8) | 78 | (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16))}, 79 | {"LLC-store-misses", PERF_TYPE_HW_CACHE, 80 | (PERF_COUNT_HW_CACHE_LL | (PERF_COUNT_HW_CACHE_OP_WRITE << 8) | 81 | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))}, 82 | {"LLC-stores", PERF_TYPE_HW_CACHE, 83 | (PERF_COUNT_HW_CACHE_LL | (PERF_COUNT_HW_CACHE_OP_WRITE << 8) | 84 | (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16))}, 85 | 86 | {"branch-load-misses", PERF_TYPE_HW_CACHE, 87 | (PERF_COUNT_HW_CACHE_BPU | (PERF_COUNT_HW_CACHE_OP_READ << 8) | 88 | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))}, 89 | {"branch-loads", PERF_TYPE_HW_CACHE, 90 | (PERF_COUNT_HW_CACHE_BPU | (PERF_COUNT_HW_CACHE_OP_READ << 8) | 91 | (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16))}, 92 | 93 | {"iTLB-load-misses", PERF_TYPE_HW_CACHE, 94 | (PERF_COUNT_HW_CACHE_ITLB | (PERF_COUNT_HW_CACHE_OP_READ << 8) | 95 | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))}, 96 | {"iTLB-loads", PERF_TYPE_HW_CACHE, 97 | (PERF_COUNT_HW_CACHE_ITLB | (PERF_COUNT_HW_CACHE_OP_READ << 8) | 98 | 
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16))}, 99 | 100 | {"dTLB-load-misses", PERF_TYPE_HW_CACHE, 101 | (PERF_COUNT_HW_CACHE_DTLB | (PERF_COUNT_HW_CACHE_OP_READ << 8) | 102 | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))}, 103 | {"dTLB-loads", PERF_TYPE_HW_CACHE, 104 | (PERF_COUNT_HW_CACHE_DTLB | (PERF_COUNT_HW_CACHE_OP_READ << 8) | 105 | (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16))}, 106 | {"dTLB-store-misses", PERF_TYPE_HW_CACHE, 107 | (PERF_COUNT_HW_CACHE_DTLB | (PERF_COUNT_HW_CACHE_OP_WRITE << 8) | 108 | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))}, 109 | {"dTLB-stores", PERF_TYPE_HW_CACHE, 110 | (PERF_COUNT_HW_CACHE_DTLB | (PERF_COUNT_HW_CACHE_OP_WRITE << 8) | 111 | (PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16))}}; 112 | 113 | #endif /* __linux__ */ 114 | 115 | #endif 116 | -------------------------------------------------------------------------------- /include/b63/counters/time.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | */ 16 | 17 | #ifndef _B63_COUNTERS_TIME_H_ 18 | #define _B63_COUNTERS_TIME_H_ 19 | #ifdef NO_GET_TIME_SUPPORTED 20 | #include 21 | #else 22 | #include 23 | #endif 24 | #include "../counter.h" 25 | 26 | /* default counter returning time in nanoseconds */ 27 | B63_COUNTER(time) { 28 | #ifdef NO_GET_TIME_SUPPORTED 29 | struct timeval tv; 30 | 31 | if (gettimeofday (&tv, NULL) == 0) 32 | return (uint64_t) (tv.tv_sec * 1000000 + tv.tv_usec) * 1000; 33 | else 34 | return 0; 35 | #else 36 | struct timespec t; 37 | clock_gettime(CLOCK_MONOTONIC, &t); 38 | int64_t res = 1000000000LL * (int64_t)t.tv_sec + t.tv_nsec; 39 | return res; 40 | #endif 41 | } 42 | 43 | #endif 44 | -------------------------------------------------------------------------------- /include/b63/printer.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | */ 16 | 17 | #ifndef _B63_PRINTER_ 18 | #define _B63_PRINTER_ 19 | 20 | #include "benchmark.h" 21 | #include "suite.h" 22 | #include "utils/stats.h" 23 | 24 | #include 25 | #include 26 | 27 | const char *B63_CLR_RED = "\033[0;31m"; 28 | const char *B63_CLR_GREEN = "\033[0;32m"; 29 | const char *B63_CLR_RESET = "\033[0m"; 30 | 31 | static void b63_print_individual(b63_benchmark *bm, const char *counter, 32 | b63_stats *tt) { 33 | if (bm->suite->printer_config.plaintext != 0) { 34 | return; 35 | } 36 | if (bm->failed) { 37 | printf("%s%-30s%-20s: assertion fail%s\n", B63_CLR_RED, 38 | bm->name, counter, B63_CLR_RESET); 39 | return; 40 | } 41 | printf("%-30s%-20s: %6.3lf\n", bm->name, counter, tt->sum_test / tt->n); 42 | } 43 | 44 | static void b63_print_comparison(b63_benchmark *bm, const char *counter, 45 | b63_stats *tt) { 46 | if (bm->suite->printer_config.plaintext != 0) { 47 | return; 48 | } 49 | if (bm->failed) { 50 | printf("%s%-30s%-20s: assertion fail%s\n", B63_CLR_RED, 51 | bm->name, counter, B63_CLR_RESET); 52 | return; 53 | } 54 | double d = b63_stats_diff(tt); 55 | double percentage_diff = b63_stats_percentage_diff(tt); 56 | double interval99 = b63_stats_99_interval(tt); 57 | double a = d - interval99; 58 | double b = d + interval99; 59 | const char *c = B63_CLR_RESET; 60 | char confident = ' '; 61 | /* confidence interval does not contain 0 */ 62 | if (a * b > 0) { 63 | c = d < 0 ? 
B63_CLR_GREEN : B63_CLR_RED; 64 | confident = '*'; 65 | } 66 | printf("%-30s%-20s: %s%6.3lf (%+6.3lf%% %c)%s\n", bm->name, counter, c, 67 | tt->sum_test / tt->n, percentage_diff, confident, B63_CLR_RESET); 68 | 69 | fflush(stdout); 70 | } 71 | 72 | static void b63_print_done(b63_epoch *r) { 73 | /* plaintext output */ 74 | if (r->benchmark->suite->printer_config.plaintext != 0) { 75 | char d = r->benchmark->suite->printer_config.delimiter; 76 | printf("%s%c%s%c%" PRId64 "%c%" PRId64 "%c%lf\n", r->benchmark->name, d, 77 | r->counter->name, d, r->iterations, d, r->events, d, 78 | 1.0 * r->events / r->iterations); 79 | fflush(stdout); 80 | } 81 | } 82 | 83 | #endif 84 | -------------------------------------------------------------------------------- /include/b63/register.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | */ 16 | 17 | #ifndef _B63_REGISTER_H_ 18 | #define _B63_REGISTER_H_ 19 | 20 | #include 21 | 22 | #include "benchmark.h" 23 | #include "utils/section_ptr_list.h" 24 | 25 | /* 26 | * Registration consists of several parts: 27 | * - declare the function which we will 'measure'; 28 | * - create a b63_benchmark struct to hold benchmark-specific data; 29 | * - put a pointer to that struct in separate section, so we can iterate on 30 | * all 'registered' benchmarks later; 31 | * - start the definition of the function. 32 | */ 33 | 34 | B63_LIST_DECLARE(b63_benchmark) 35 | 36 | #define B63_BENCHMARK_IMPL(bname, baseline, iters) \ 37 | void b63_run_##bname(b63_epoch *, uint64_t, int64_t); \ 38 | static b63_benchmark b63_b_##bname = { \ 39 | .name = #bname, \ 40 | .run = b63_run_##bname, \ 41 | .is_baseline = baseline, \ 42 | .failed = 0, \ 43 | }; \ 44 | B63_LIST_ADD(b63_benchmark, bname, &b63_b_##bname); \ 45 | void b63_run_##bname(b63_epoch *b63run, uint64_t iters, int64_t b63_seed) 46 | 47 | /* 48 | * Marking benchmark as a 'baseline' would make other benchmarks to be compared 49 | * against it. There could be only one baseline. 50 | */ 51 | #define B63_BASELINE(name, iters) B63_BENCHMARK_IMPL(name, 1, iters) 52 | #define B63_BENCHMARK(name, iters) B63_BENCHMARK_IMPL(name, 0, iters) 53 | 54 | #endif 55 | -------------------------------------------------------------------------------- /include/b63/run.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 
6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #ifndef _B63_RUN_H_ 18 | #define _B63_RUN_H_ 19 | 20 | #include 21 | #include 22 | #include 23 | 24 | #include "benchmark.h" 25 | #include "printer.h" 26 | #include "suite.h" 27 | #include "utils/section_ptr_list.h" 28 | #include "utils/stats.h" 29 | #include "utils/timer.h" 30 | 31 | #include "counters/time.h" 32 | 33 | static void b63_epoch_run(b63_epoch *e, int64_t seed) { 34 | b63_benchmark *b = e->benchmark; 35 | const int64_t timelimit_ms = 36 | 1000LL * b->suite->timelimit_s / b->suite->epochs; 37 | b63_counter *counter = e->counter; 38 | 39 | const int64_t max_iterations_per_epoch = (1LL << 31LL); 40 | e->events = 0LL; 41 | e->iterations = 0LL; 42 | e->suspension_done = 0; 43 | e->fail = 0; 44 | 45 | int64_t started, done; 46 | /* 47 | * For each epoch run as many iterations as fit within the time budget 48 | */ 49 | int64_t started_ms = b63_now_ms(); 50 | for (int64_t n = 1; e->iterations < max_iterations_per_epoch; n *= 2) { 51 | 52 | /* Here the 'measured' function is called */ 53 | started = counter->type->read(counter->impl); 54 | b->run(e, n, seed); 55 | done = counter->type->read(counter->impl); 56 | 57 | e->events += (done - started); 58 | e->iterations += n; 59 | 60 | /* ran out of time */ 61 | if ((b63_now_ms() - started_ms) > timelimit_ms) { 62 | break; 63 | } 64 | } 65 | b63_print_done(e); 66 | } 67 | 68 | /* 69 | * Runs benchmark b while measuring counter c 70 | */ 71 | static void b63_benchmark_run(b63_benchmark *b, b63_counter *c, 72 | b63_epoch *results) { 73 | int64_t result_index = 
0LL; 74 | b63_suite *suite = b->suite; 75 | b63_stats tt; 76 | b63_stats_init(&tt); 77 | 78 | int64_t next_seed = suite->seed; 79 | b63_epoch *baseline_result = NULL; 80 | if (suite->baseline != NULL && b->is_baseline == 0) { 81 | baseline_result = suite->baseline->results; 82 | } 83 | 84 | for (int64_t e = 0; e < suite->epochs; e++) { 85 | b63_epoch *r = &(results[result_index++]); 86 | r->benchmark = b; 87 | r->counter = c; 88 | b63_epoch_run(r, next_seed); 89 | if (r->fail) { 90 | b->failed = 1; 91 | break; 92 | } 93 | next_seed = 134775853LL * next_seed + 1; 94 | 95 | double baseline_rate = 0.0; 96 | if (baseline_result != NULL) { 97 | baseline_rate = 98 | 1.0 * baseline_result->events / baseline_result->iterations; 99 | baseline_result++; 100 | } 101 | b63_stats_add(1.0 * r->events / r->iterations, baseline_rate, &tt); 102 | } 103 | if (baseline_result != NULL) { 104 | b63_print_comparison(b, c->name, &tt); 105 | } else { 106 | b63_print_individual(b, c->name, &tt); 107 | } 108 | } 109 | 110 | static void b63_suite_run(b63_suite *suite) { 111 | B63_LIST_FOR_EACH(b63_benchmark, b) { 112 | (*b)->suite = suite; 113 | /* set baseline */ 114 | if ((*b)->is_baseline) { 115 | if (suite->baseline != NULL) { 116 | fprintf(stderr, "two or more baselines defined.\n"); 117 | exit(EXIT_FAILURE); 118 | } 119 | suite->baseline = *b; 120 | } 121 | } 122 | 123 | b63_epoch *results = (b63_epoch *)malloc(suite->epochs * sizeof(b63_epoch)); 124 | b63_epoch *baseline_results = NULL; 125 | if (suite->baseline != NULL) { 126 | baseline_results = (b63_epoch *)malloc(suite->epochs * sizeof(b63_epoch)); 127 | suite->baseline->results = baseline_results; 128 | } 129 | 130 | B63_FOR_EACH_COUNTER(suite->counter_list, counter) { 131 | if (counter->type->activate != NULL) { 132 | counter->type->activate(counter->impl); 133 | } 134 | if (suite->baseline != NULL) { 135 | b63_benchmark_run(suite->baseline, counter, baseline_results); 136 | } 137 | B63_LIST_FOR_EACH(b63_benchmark, b) { 138 | if 
((*b)->is_baseline) { 139 | continue; 140 | } 141 | b63_benchmark_run(*b, counter, results); 142 | } 143 | } 144 | 145 | free(results); 146 | if (baseline_results != NULL) { 147 | free(baseline_results); 148 | } 149 | } 150 | 151 | /* Reads config, creates suite and executes it */ 152 | static void b63_go(int argc, char **argv, const char *default_counter) { 153 | b63_suite suite; 154 | 155 | b63_suite_init(&suite, argc, argv); 156 | 157 | if (suite.counter_list.size == 0) { 158 | b63_counter_list_init(&suite.counter_list, default_counter); 159 | } 160 | 161 | b63_suite_run(&suite); 162 | 163 | b63_counter_list_cleanup(&suite.counter_list); 164 | } 165 | 166 | /* 167 | * Runs all registered benchmarks with counter. If '-c' is provided, it 168 | * overrides the setting in the code. 169 | * default_counter is a string literal, to allow 170 | * configuration to be passed easily, for example "lpe:cycles". 171 | */ 172 | #define B63_RUN_WITH(default_counter, argc, argv) \ 173 | b63_go(argc, argv, default_counter); 174 | 175 | /* shortcut for 'default' timer-based counter. */ 176 | #define B63_RUN(argc, argv) B63_RUN_WITH("time", argc, argv) 177 | 178 | /* 179 | * B63_KEEP implements a way to prevent compiler from optimizing out the 180 | * variable. Example: 181 | * 182 | * int res = 0; 183 | * ... some computations ... 184 | * B63_KEEP(res); 185 | */ 186 | #define B63_KEEP(v) __asm__ __volatile__("" ::"m"(v)) 187 | 188 | /* 189 | * B63_SUSPEND allows to 'exclude' the counted events (or time) 190 | * from part of the code. 
It is more convenient way to do set up, for example: 191 | * 192 | * B63_BENCHMARK(sort_benchmark, n) { 193 | * std::vector a; 194 | * B63_SUSPEND { // this block will be excluded 195 | * a.resize(n); 196 | * std::generate(a.begin(), a.end(), gen); // some random generator 197 | * } 198 | * std::sort(a.begin(), a.end()); 199 | * } 200 | */ 201 | 202 | typedef struct b63_suspension { 203 | int64_t start; 204 | b63_epoch *run; 205 | } b63_suspension; 206 | 207 | /* 208 | * This is a callback to execute when suspend context gets out of scope. 209 | * Total number of events during 'suspension loop' is subtracted from 210 | * event counter. 211 | */ 212 | static void b63_suspension_done(b63_suspension *s) { 213 | s->run->events -= 214 | (s->run->counter->type->read(s->run->counter->impl) - s->start); 215 | } 216 | 217 | /* 218 | * Use __attribute__((cleanup)) to trigger the measurement when sctx goes out of 219 | * scope. This allows to compute 'how many events happened during suspension'. 220 | */ 221 | #define B63_SUSPEND \ 222 | b63run->suspension_done = 0; \ 223 | for (b63_suspension b63s __attribute__((unused, cleanup(b63_suspension_done))) = \ 224 | { \ 225 | .start = b63run->counter->type->read(b63run->counter->impl), \ 226 | .run = b63run, \ 227 | }; \ 228 | b63run->suspension_done == 0; b63run->suspension_done = 1) 229 | 230 | #endif 231 | 232 | /* 233 | * Sanity check testing. 
This is not supposed to be a replacement for unit-testing, 234 | * but it's important to be able to make sure benchmark produced expected result 235 | * if applicable 236 | */ 237 | #define B63_ASSERT(c) if (!(c)) { b63run->fail = 1; } 238 | 239 | -------------------------------------------------------------------------------- /include/b63/suite.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #ifndef _B63_CONFIG_H_ 18 | #define _B63_CONFIG_H_ 19 | 20 | #include 21 | #include 22 | #include 23 | 24 | #include "benchmark.h" 25 | 26 | /* Configuration for printing the results. */ 27 | typedef struct b63_printer_config { 28 | uint8_t plaintext; 29 | char delimiter; 30 | } b63_printer_config; 31 | 32 | /* This is execution state for whole benchmark suite. */ 33 | typedef struct b63_suite { 34 | int64_t epochs, timelimit_s; 35 | b63_benchmark *baseline; /* NULL if no baseline in the suite */ 36 | b63_printer_config printer_config; 37 | 38 | /* full list of counters to run */ 39 | b63_counter_list counter_list; 40 | 41 | /* global seed for whole suite */ 42 | int64_t seed; 43 | } b63_suite; 44 | 45 | /* 46 | * Reads and updates suite from CLI arguments. 
47 | * Might allocate counter, the responsibility to free is on caller 48 | * 49 | * Examples of what should be parsed: 50 | * 51 | * bm_sqlite -t 30 -e 5 -c disk_read 52 | * bm_sqlite -t 30 -e 5 -c disk_write 53 | * bm_hashmap -t 5 -e 10 -c lpe:L1-dcache-loads 54 | * bm_hashmap -t 5 -e 10 -c time 55 | * bm_hashmap 56 | * bm_custom -c calls 57 | * bm_decision_tree -t 10 -e 5 -i -c lpe:branch-misses 58 | */ 59 | 60 | static void b63_suite_init(b63_suite *suite, int argc, char **argv) { 61 | int c; 62 | 63 | /* default values */ 64 | suite->timelimit_s = 1; 65 | suite->epochs = 30; 66 | suite->counter_list.size = 0; 67 | suite->counter_list.data = NULL; 68 | suite->baseline = NULL; 69 | suite->printer_config.plaintext = 1; 70 | suite->printer_config.delimiter = ','; 71 | 72 | while ((c = getopt(argc, argv, "it:e:c:d:s:")) != -1) { 73 | switch (c) { 74 | case 'i': 75 | suite->printer_config.plaintext = 0; 76 | break; 77 | case 'd': 78 | /* TODO: rename to delimiter */ 79 | suite->printer_config.delimiter = optarg[0]; 80 | break; 81 | case 's': 82 | suite->seed = strtoll(optarg, NULL, 10); 83 | break; 84 | case 't': 85 | suite->timelimit_s = atoi(optarg); 86 | break; 87 | case 'e': 88 | suite->epochs = atoi(optarg); 89 | if (!(suite->epochs > 0)) { 90 | fprintf(stderr, "epochs count must be > 0\n"); 91 | exit(EXIT_FAILURE); 92 | } 93 | break; 94 | case 'c': 95 | b63_counter_list_init(&suite->counter_list, optarg); 96 | if (suite->counter_list.size == 0) { 97 | fprintf(stderr, "counter_list unable to init: %s\n", optarg); 98 | exit(EXIT_FAILURE); 99 | } 100 | break; /* from switch */ 101 | } 102 | } 103 | } 104 | 105 | #endif 106 | -------------------------------------------------------------------------------- /include/b63/utils/section_ptr_list.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not
use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #ifndef _B63_UTILS_B63_LIST_H_ 18 | #define _B63_UTILS_B63_LIST_H_ 19 | 20 | /* 21 | * This is a 'list of pointers, filled in at compile (link) time. 22 | * It is stored in separate named data section; 23 | * see register.h and counters/counter.h for usage example. 24 | * TODO: using __attribute__((constructor)) could be more portable. 25 | * We use gcc extensions anyway. 26 | */ 27 | 28 | #ifdef __APPLE__ 29 | 30 | #define B63_LIST_DECLARE(typ) \ 31 | extern typ *__start_##typ __asm("section$start$__DATA$__" #typ); \ 32 | extern typ *__stop_##typ __asm("section$end$__DATA$__" #typ); 33 | 34 | #define B63_LIST_ADD(typ, id, ptr) \ 35 | static typ *_##typ_##id __attribute__((used, section("__DATA,__" #typ))) = \ 36 | ptr; 37 | #else 38 | 39 | #define B63_LIST_DECLARE(typ) \ 40 | extern typ *__start_##typ; \ 41 | extern typ *__stop_##typ; 42 | 43 | #define B63_LIST_ADD(typ, id, ptr) \ 44 | static typ *_##typ_##id __attribute__((used, section(#typ))) = ptr; 45 | #endif 46 | 47 | #define B63_LIST_FOR_EACH(typ, pp) \ 48 | for (typ **pp = &__start_##typ; pp < &__stop_##typ; ++pp) 49 | 50 | #define B63_LIST_SIZE(typ) (&__stop_##typ - &__start_##typ) 51 | 52 | #endif 53 | -------------------------------------------------------------------------------- /include/b63/utils/stats.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 
(the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #ifndef _B63_UTILS_STATS_H_ 18 | #define _B63_UTILS_STATS_H_ 19 | 20 | #include 21 | #include 22 | 23 | #include "ttable.h" 24 | 25 | /* 26 | * This struct represents data needed to perform incremental 27 | * computation of difference between two measurements, its mean, variance, 28 | * paired t-test. 29 | */ 30 | typedef struct b63_stats { 31 | int64_t n; 32 | double sum_base, sum_test; 33 | double sum; 34 | double sum_squared; 35 | } b63_stats; 36 | 37 | /* 38 | * Initialize with 'empty' values. 39 | */ 40 | void b63_stats_init(b63_stats *stats) { 41 | stats->n = 0LL; 42 | stats->sum = 0.0; 43 | stats->sum_base = 0.0; 44 | stats->sum_test = 0.0; 45 | stats->sum_squared = 0.0; 46 | } 47 | 48 | /* 49 | * Add a new value. 50 | */ 51 | void b63_stats_add(double test, double base, b63_stats *stats) { 52 | stats->sum_base += base; 53 | stats->sum_test += test; 54 | double diff = test - base; 55 | stats->sum += diff; 56 | stats->sum_squared += diff * diff; 57 | stats->n++; 58 | } 59 | 60 | /* 61 | * Average difference. 
62 | */ 63 | double b63_stats_diff(b63_stats *stats) { return stats->sum / stats->n; } 64 | 65 | /* 66 | * Average difference in % 67 | */ 68 | double b63_stats_percentage_diff(b63_stats *stats) { 69 | return 100.0 * stats->sum_test / stats->sum_base - 100.0; 70 | } 71 | 72 | /* compute 99% confidence interval */ 73 | double b63_stats_99_interval(b63_stats *stats) { 74 | double val = b63_ttable_0005[stats->n < b63_ttable_0005_size 75 | ? stats->n 76 | : b63_ttable_0005_size - 1]; 77 | double n = stats->n; 78 | double s = stats->sum; 79 | double ssq = stats->sum_squared; 80 | return sqrt((ssq - s * s / n) / (n * (n - 1))) * val; 81 | } 82 | 83 | /* 84 | * compute paired t-test values. Unused at the moment. 85 | */ 86 | double b63_stats_ttest(b63_stats *stats) { 87 | if (stats->n == 0) { 88 | return 0.0; 89 | } 90 | double n = stats->n; 91 | double s = stats->sum; 92 | double ssq = stats->sum_squared; 93 | 94 | return sqrtl((n * s * s - s * s) / (n * ssq - s * s)); 95 | } 96 | 97 | #endif 98 | -------------------------------------------------------------------------------- /include/b63/utils/string.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License.
15 | */ 16 | 17 | #ifndef _B63_UTILS_STRING_H_ 18 | #define _B63_UTILS_STRING_H_ 19 | 20 | #include 21 | #include 22 | 23 | /* returns 1 if string range [b, e) starts with prefix */ 24 | int8_t b63_range_starts_with(const char *b, const char *e, const char *prefix) { 25 | size_t lstr = e - b; 26 | size_t lprefix = strlen(prefix); 27 | return lstr < lprefix ? 0 : (memcmp(prefix, b, lprefix) == 0); 28 | } 29 | 30 | /* count how many times c is contained in s */ 31 | size_t b63_str_count(const char *s, char c) { 32 | if (s == NULL) { 33 | return 0; 34 | } 35 | size_t res = 0; 36 | for (; *s; s++) { 37 | if (*s == c) { 38 | res++; 39 | } 40 | } 41 | return res; 42 | } 43 | 44 | #endif 45 | -------------------------------------------------------------------------------- /include/b63/utils/timer.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 15 | */ 16 | 17 | #ifndef _B63_UTILS_TIME_H_ 18 | #define _B63_UTILS_TIME_H_ 19 | #ifdef NO_GET_TIME_SUPPORTED 20 | #include 21 | #else 22 | #include 23 | #endif 24 | 25 | /* 26 | * This is used NOT for benchmarking itself, but to track 27 | * time passed and control the number of iterations. 
28 | * See src/run.h -> b63_run_benchmark implementation for usage; 29 | */ 30 | int64_t b63_now_ms() { 31 | #ifdef NO_GET_TIME_SUPPORTED 32 | struct timeval tv; 33 | 34 | if (gettimeofday(&tv, NULL) == 0) 35 | return (uint64_t) (tv.tv_sec * 1000000 + tv.tv_usec) / 1000; 36 | else 37 | return 0; 38 | #else 39 | struct timespec t; 40 | clock_gettime(CLOCK_MONOTONIC, &t); 41 | int64_t res = 1000 * t.tv_sec + t.tv_nsec / 1000000; 42 | return res; 43 | #endif 44 | } 45 | 46 | #endif 47 | -------------------------------------------------------------------------------- /include/b63/utils/ttable.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright 2019 Oleksandr Kuvshynov 3 | * 4 | * Licensed under the Apache License, Version 2.0 (the "License"); 5 | * you may not use this file except in compliance with the License. 6 | * You may obtain a copy of the License at 7 | * 8 | * http://www.apache.org/licenses/LICENSE-2.0 9 | * 10 | * Unless required by applicable law or agreed to in writing, software 11 | * distributed under the License is distributed on an "AS IS" BASIS, 12 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | * See the License for the specific language governing permissions and 14 | * limitations under the License. 
15 | */ 16 | 17 | #ifndef _B63_UTILS_TTABLE_H_ 18 | #define _B63_UTILS_TTABLE_H_ 19 | 20 | #define b63_ttable_0005_size 500 21 | static double b63_ttable_0005[b63_ttable_0005_size] = { 22 | 0.0, 63.65674, 9.92484, 5.84091, 4.60409, 4.03214, 3.70743, 3.49948, 23 | 3.35539, 3.24984, 3.16927, 3.10581, 3.05454, 3.01228, 2.97684, 2.94671, 24 | 2.92078, 2.89823, 2.87844, 2.86093, 2.84534, 2.83136, 2.81876, 2.80734, 25 | 2.79694, 2.78744, 2.77871, 2.77068, 2.76326, 2.75639, 2.75000, 2.74404, 26 | 2.73848, 2.73328, 2.72839, 2.72381, 2.71948, 2.71541, 2.71156, 2.70791, 27 | 2.70446, 2.70118, 2.69807, 2.69510, 2.69228, 2.68959, 2.68701, 2.68456, 28 | 2.68220, 2.67995, 2.67779, 2.67572, 2.67373, 2.67182, 2.66998, 2.66822, 29 | 2.66651, 2.66487, 2.66329, 2.66176, 2.66028, 2.65886, 2.65748, 2.65615, 30 | 2.65485, 2.65360, 2.65239, 2.65122, 2.65008, 2.64898, 2.64790, 2.64686, 31 | 2.64585, 2.64487, 2.64391, 2.64298, 2.64208, 2.64120, 2.64034, 2.63950, 32 | 2.63869, 2.63790, 2.63712, 2.63637, 2.63563, 2.63491, 2.63421, 2.63353, 33 | 2.63286, 2.63220, 2.63157, 2.63094, 2.63033, 2.62973, 2.62915, 2.62858, 34 | 2.62802, 2.62747, 2.62693, 2.62641, 2.62589, 2.62539, 2.62489, 2.62441, 35 | 2.62393, 2.62347, 2.62301, 2.62256, 2.62212, 2.62169, 2.62126, 2.62085, 36 | 2.62044, 2.62004, 2.61964, 2.61926, 2.61888, 2.61850, 2.61814, 2.61778, 37 | 2.61742, 2.61707, 2.61673, 2.61639, 2.61606, 2.61573, 2.61541, 2.61510, 38 | 2.61478, 2.61448, 2.61418, 2.61388, 2.61359, 2.61330, 2.61302, 2.61274, 39 | 2.61246, 2.61219, 2.61193, 2.61166, 2.61140, 2.61115, 2.61090, 2.61065, 40 | 2.61040, 2.61016, 2.60992, 2.60969, 2.60946, 2.60923, 2.60900, 2.60878, 41 | 2.60856, 2.60834, 2.60813, 2.60792, 2.60771, 2.60751, 2.60730, 2.60710, 42 | 2.60691, 2.60671, 2.60652, 2.60633, 2.60614, 2.60595, 2.60577, 2.60559, 43 | 2.60541, 2.60523, 2.60506, 2.60489, 2.60471, 2.60455, 2.60438, 2.60421, 44 | 2.60405, 2.60389, 2.60373, 2.60357, 2.60342, 2.60326, 2.60311, 2.60296, 45 | 2.60281, 2.60267, 2.60252, 2.60238, 
2.60223, 2.60209, 2.60195, 2.60181, 46 | 2.60168, 2.60154, 2.60141, 2.60128, 2.60115, 2.60102, 2.60089, 2.60076, 47 | 2.60063, 2.60051, 2.60039, 2.60026, 2.60014, 2.60002, 2.59991, 2.59979, 48 | 2.59967, 2.59956, 2.59944, 2.59933, 2.59922, 2.59911, 2.59900, 2.59889, 49 | 2.59878, 2.59868, 2.59857, 2.59846, 2.59836, 2.59826, 2.59816, 2.59806, 50 | 2.59796, 2.59786, 2.59776, 2.59766, 2.59756, 2.59747, 2.59737, 2.59728, 51 | 2.59719, 2.59709, 2.59700, 2.59691, 2.59682, 2.59673, 2.59664, 2.59656, 52 | 2.59647, 2.59638, 2.59630, 2.59621, 2.59613, 2.59604, 2.59596, 2.59588, 53 | 2.59580, 2.59572, 2.59564, 2.59556, 2.59548, 2.59540, 2.59532, 2.59525, 54 | 2.59517, 2.59509, 2.59502, 2.59494, 2.59487, 2.59480, 2.59472, 2.59465, 55 | 2.59458, 2.59451, 2.59444, 2.59437, 2.59430, 2.59423, 2.59416, 2.59409, 56 | 2.59402, 2.59396, 2.59389, 2.59383, 2.59376, 2.59369, 2.59363, 2.59357, 57 | 2.59350, 2.59344, 2.59338, 2.59331, 2.59325, 2.59319, 2.59313, 2.59307, 58 | 2.59301, 2.59295, 2.59289, 2.59283, 2.59277, 2.59271, 2.59265, 2.59260, 59 | 2.59254, 2.59248, 2.59243, 2.59237, 2.59232, 2.59226, 2.59221, 2.59215, 60 | 2.59210, 2.59204, 2.59199, 2.59194, 2.59189, 2.59183, 2.59178, 2.59173, 61 | 2.59168, 2.59163, 2.59158, 2.59153, 2.59148, 2.59143, 2.59138, 2.59133, 62 | 2.59128, 2.59123, 2.59118, 2.59114, 2.59109, 2.59104, 2.59099, 2.59095, 63 | 2.59090, 2.59086, 2.59081, 2.59076, 2.59072, 2.59067, 2.59063, 2.59058, 64 | 2.59054, 2.59050, 2.59045, 2.59041, 2.59037, 2.59032, 2.59028, 2.59024, 65 | 2.59020, 2.59015, 2.59011, 2.59007, 2.59003, 2.58999, 2.58995, 2.58991, 66 | 2.58987, 2.58983, 2.58979, 2.58975, 2.58971, 2.58967, 2.58963, 2.58959, 67 | 2.58955, 2.58952, 2.58948, 2.58944, 2.58940, 2.58937, 2.58933, 2.58929, 68 | 2.58925, 2.58922, 2.58918, 2.58915, 2.58911, 2.58907, 2.58904, 2.58900, 69 | 2.58897, 2.58893, 2.58890, 2.58886, 2.58883, 2.58879, 2.58876, 2.58873, 70 | 2.58869, 2.58866, 2.58863, 2.58859, 2.58856, 2.58853, 2.58849, 2.58846, 71 | 2.58843, 2.58840, 2.58836, 
2.58833, 2.58830, 2.58827, 2.58824, 2.58821, 72 | 2.58818, 2.58815, 2.58811, 2.58808, 2.58805, 2.58802, 2.58799, 2.58796, 73 | 2.58793, 2.58790, 2.58787, 2.58784, 2.58781, 2.58779, 2.58776, 2.58773, 74 | 2.58770, 2.58767, 2.58764, 2.58761, 2.58759, 2.58756, 2.58753, 2.58750, 75 | 2.58747, 2.58745, 2.58742, 2.58739, 2.58736, 2.58734, 2.58731, 2.58728, 76 | 2.58726, 2.58723, 2.58720, 2.58718, 2.58715, 2.58713, 2.58710, 2.58707, 77 | 2.58705, 2.58702, 2.58700, 2.58697, 2.58695, 2.58692, 2.58690, 2.58687, 78 | 2.58685, 2.58682, 2.58680, 2.58677, 2.58675, 2.58673, 2.58670, 2.58668, 79 | 2.58665, 2.58663, 2.58661, 2.58658, 2.58656, 2.58654, 2.58651, 2.58649, 80 | 2.58647, 2.58644, 2.58642, 2.58640, 2.58638, 2.58635, 2.58633, 2.58631, 81 | 2.58629, 2.58626, 2.58624, 2.58622, 2.58620, 2.58618, 2.58615, 2.58613, 82 | 2.58611, 2.58609, 2.58607, 2.58605, 2.58603, 2.58600, 2.58598, 2.58596, 83 | 2.58594, 2.58592, 2.58590, 2.58588, 2.58586, 2.58584, 2.58582, 2.58580, 84 | 2.58578, 2.58576, 2.58574, 2.58572}; 85 | 86 | #endif 87 | -------------------------------------------------------------------------------- /ref/api.md: -------------------------------------------------------------------------------- 1 | # B63 API Reference 2 | 3 | ## Core Macros 4 | 5 | ### Benchmark Definition 6 | 7 | ```c 8 | B63_BENCHMARK(name, n_param) 9 | ``` 10 | - Defines a benchmark function with name `name` 11 | - `n_param` is the parameter for iteration count 12 | - Must contain code to be measured 13 | 14 | ```c 15 | B63_BASELINE(name, n_param) 16 | ``` 17 | - Similar to `B63_BENCHMARK` but marks this as the baseline for comparison 18 | - Other benchmarks will be compared against this baseline 19 | 20 | ### Counter Definition 21 | 22 | ```c 23 | B63_COUNTER(name) 24 | ``` 25 | - Defines a custom counter with name `name` 26 | - Must return an `int64_t` value representing the counter's current value 27 | 28 | ### Benchmark Control 29 | 30 | ```c 31 | B63_KEEP(value) 32 | ``` 33 | - Prevents compiler 
from optimizing away calculations 34 | - Should be used on the result of benchmarked operations 35 | 36 | ```c 37 | B63_SUSPEND { ... } 38 | ``` 39 | - Temporarily suspends measurement 40 | - Useful for excluding setup/teardown operations from benchmarks 41 | 42 | ### Benchmark Execution 43 | 44 | ```c 45 | B63_RUN(argc, argv) 46 | ``` 47 | - Runs all registered benchmarks with default counters 48 | - Processes command-line arguments from `argc` and `argv` 49 | 50 | ```c 51 | B63_RUN_WITH(counters, argc, argv) 52 | ``` 53 | - Runs all registered benchmarks with specified counters 54 | - `counters` is a comma-separated string of counter names 55 | 56 | ## Data Structures 57 | 58 | ### Benchmark Structure 59 | 60 | ```c 61 | typedef struct { 62 | // Function pointer to benchmark implementation 63 | void (*fn)(int64_t iterations, void *opaque); 64 | // Benchmark name 65 | const char *name; 66 | // User data 67 | void *udata; 68 | } b63_benchmark; 69 | ``` 70 | 71 | ### Counter Structure 72 | 73 | ```c 74 | typedef struct { 75 | // Initialize counter 76 | void (*init)(b63_counter *c); 77 | // Clean up counter resources 78 | void (*clean)(b63_counter *c); 79 | // Read current counter value 80 | int64_t (*read)(void); 81 | // User data for counter 82 | void *udata; 83 | // Counter name 84 | const char *name; 85 | } b63_counter; 86 | ``` 87 | 88 | ### Suite Structure 89 | 90 | ```c 91 | typedef struct { 92 | // List of benchmarks 93 | b63_benchmark_list benchmarks; 94 | // List of counters 95 | b63_counter_list counters; 96 | // Printer for results 97 | b63_printer printer; 98 | // Configuration 99 | b63_config config; 100 | } b63_suite; 101 | ``` 102 | 103 | ### Configuration Structure 104 | 105 | ```c 106 | typedef struct { 107 | // Whether to run in interactive mode 108 | int interactive; 109 | // Number of epochs to run 110 | int epoch_count; 111 | // Time limit for benchmarks (seconds) 112 | double time_limit; 113 | // Output delimiter 114 | const char *delimiter; 
115 | // Random seed 116 | unsigned int seed; 117 | } b63_config; 118 | ``` 119 | 120 | ## Functions 121 | 122 | ### Suite Management 123 | 124 | ```c 125 | // Initialize suite with default values, then apply command-line arguments 126 | static void b63_suite_init(b63_suite *suite, int argc, char **argv); 127 | 128 | // Read configuration, create the suite and execute it 129 | static void b63_go(int argc, char **argv, const char *default_counter); 130 | 131 | // Register a benchmark with the suite 132 | void b63_suite_register(b63_suite *suite, b63_benchmark *bm); 133 | 134 | // Register a counter with the suite 135 | void b63_suite_add_counter(b63_suite *suite, b63_counter *counter); 136 | 137 | // Run all benchmarks in the suite 138 | static void b63_suite_run(b63_suite *suite); 139 | ``` 140 | 141 | ### Counter Management 142 | 143 | ```c 144 | // Initialize a counter list 145 | void b63_counter_list_init(b63_counter_list *list); 146 | 147 | // Add a counter to a list 148 | void b63_counter_list_add(b63_counter_list *list, b63_counter *counter); 149 | 150 | // Find a counter by name 151 | b63_counter *b63_counter_list_find(b63_counter_list *list, const char *name); 152 | 153 | // Clean up a counter list 154 | void b63_counter_list_clean(b63_counter_list *list); 155 | ``` 156 | 157 | ### Utility Functions 158 | 159 | ```c 160 | // Get current time in milliseconds 161 | int64_t b63_now_ms(); 162 | 163 | // Accumulate one paired (test, baseline) measurement 164 | void b63_stats_add(double test, double base, b63_stats *stats); 165 | 166 | // Half-width of the 99% confidence interval of the mean difference 167 | double b63_stats_99_interval(b63_stats *stats); 168 | ``` -------------------------------------------------------------------------------- /ref/architecture.md: -------------------------------------------------------------------------------- 1 | # B63 Architecture Reference 2 | 3 | ## Core Components 4 | 5 | B63 is organized around several key components that work together to provide benchmarking functionality: 6 | 7 | ### 1.
Benchmark Definition 8 | 9 | Benchmarks are defined using macros that create functions with a standardized signature: 10 | 11 | - `B63_BENCHMARK`: Defines a benchmark to be measured 12 | - `B63_BASELINE`: Defines a baseline benchmark for comparison 13 | 14 | ### 2. Suite Management 15 | 16 | The benchmark suite handles configuration, initialization, and execution: 17 | 18 | - Suite initialization processes command line arguments 19 | - Counter registration and management 20 | - Benchmark execution and result collection 21 | - Output formatting and display 22 | 23 | ### 3. Counter System 24 | 25 | Counters provide the metrics for measurement: 26 | 27 | - Built-in counters (time, cycles) 28 | - Platform-specific counters (perf_events, jemalloc) 29 | - Custom counters for specialized metrics 30 | 31 | ### 4. Execution Flow 32 | 33 | 1. Initialize suite configuration from command line 34 | 2. Register counters specified in configuration 35 | 3. Execute each benchmark for the specified number of epochs 36 | 4. Collect and process measurements 37 | 5. 
Format and display results 38 | 39 | ## Key Data Structures 40 | 41 | ### Benchmark Structure (`b63_benchmark`) 42 | 43 | Represents a single benchmark function: 44 | 45 | - Function pointer to the benchmark implementation 46 | - Name for identification 47 | - Configuration parameters 48 | 49 | ### Epoch Structure (`b63_epoch`) 50 | 51 | Represents a single measurement unit: 52 | 53 | - Iteration count 54 | - Counter values before and after execution 55 | 56 | ### Counter Structure (`b63_counter`) 57 | 58 | Represents a specific performance metric: 59 | 60 | - Function pointers for initialization, cleanup, and measurement 61 | - Name for identification 62 | - Configuration parameters 63 | 64 | ### Suite Structure (`b63_suite`) 65 | 66 | Represents the entire benchmark suite: 67 | 68 | - List of registered benchmarks 69 | - List of registered counters 70 | - Configuration parameters 71 | - Output printer configuration 72 | 73 | ## File Organization 74 | 75 | The B63 library is organized into several header files, each with specific responsibilities: 76 | 77 | ### Core Headers 78 | 79 | - `b63.h`: Main include file that brings in all necessary components 80 | - `benchmark.h`: Core benchmark definition and structures 81 | - `suite.h`: Suite configuration and management 82 | - `run.h`: Execution and control flow for benchmarks 83 | 84 | ### Counter Implementation 85 | 86 | - `counter.h`: Counter interface definition 87 | - `counter_list.h`: Management of counter collections 88 | - `counters/*.h`: Individual counter implementations: 89 | - `cycles.h`: CPU cycle counting 90 | - `time.h`: Wall clock time measurement 91 | - `jemalloc.h`: Memory allocation tracking 92 | - `perf_events.h`: Linux perf events integration 93 | 94 | ### Utilities 95 | 96 | - `utils/`: Helper functionality: 97 | - `section_ptr_list.h`: Pointer list management 98 | - `stats.h`: Statistical calculations 99 | - `string.h`: String manipulation 100 | - `timer.h`: Timing utilities 101 | - `ttable.h`: 
Critical values of Student's t-distribution, used for confidence intervals -------------------------------------------------------------------------------- /ref/counters.md: -------------------------------------------------------------------------------- 1 | # B63 Counters Reference 2 | 3 | Counters are the core measurement mechanism in B63. They provide a flexible way to measure different aspects of code performance. 4 | 5 | ## Counter Interface 6 | 7 | All counters in B63 implement a common interface defined in `counter.h`: 8 | 9 | ```c 10 | typedef struct { 11 | void (*init)(b63_counter *c); // Initialize counter 12 | void (*clean)(b63_counter *c); // Clean up counter resources 13 | int64_t (*read)(void); // Read current counter value 14 | void *udata; // User data for counter 15 | const char *name; // Counter name 16 | } b63_counter; 17 | ``` 18 | 19 | ## Built-in Counters 20 | 21 | ### Time Counter (`time.h`) 22 | 23 | Measures wall-clock time in nanoseconds: 24 | 25 | - Uses platform-specific high-resolution timing (clock_gettime, mach_absolute_time, etc.)
26 | - Provides baseline performance measurement for all platforms 27 | 28 | ### CPU Cycles Counter (`cycles.h`) 29 | 30 | Measures CPU cycles consumed: 31 | 32 | - Architecture-specific implementation (ARM64, x86) 33 | - Provides lower-level performance measurement than wall-clock time 34 | - More consistent for CPU-bound operations 35 | 36 | ### Performance Events (`perf_events.h`) 37 | 38 | Linux-specific performance counters using the perf_events subsystem: 39 | 40 | - CPU cache misses 41 | - Branch mispredictions 42 | - Instructions executed 43 | - And many other hardware performance counters 44 | 45 | ### Memory Allocation (`jemalloc.h`) 46 | 47 | Tracks memory allocation when using jemalloc memory allocator: 48 | 49 | - Thread-specific allocation tracking 50 | - Measures bytes allocated/deallocated 51 | - Requires linking against jemalloc with appropriate flags 52 | 53 | ## Custom Counters 54 | 55 | You can define custom counters using the `B63_COUNTER` macro: 56 | 57 | ```c 58 | int64_t my_metric = 0; 59 | 60 | B63_COUNTER(my_counter) { 61 | return my_metric; 62 | } 63 | ``` 64 | 65 | Custom counters can measure application-specific metrics: 66 | 67 | - Function call counts 68 | - Business-specific operations 69 | - Cache hit ratios 70 | - Resource utilization 71 | 72 | ## Counter Selection 73 | 74 | Counters can be specified at runtime using the `-c` flag: 75 | 76 | ``` 77 | ./benchmark -c time,cycles,my_counter 78 | ``` 79 | 80 | Or programmatically with `B63_RUN_WITH`: 81 | 82 | ```c 83 | int main(int argc, char **argv) { 84 | B63_RUN_WITH("time,cycles,my_counter", argc, argv); 85 | return 0; 86 | } 87 | ``` 88 | 89 | ## Platform-Specific Counters 90 | 91 | B63 conditionally compiles certain counters based on the target platform: 92 | 93 | - **Linux**: perf_events counters for detailed hardware metrics 94 | - **macOS**: kperf-based counters for Apple hardware 95 | - **All Platforms**: Time and custom counters work everywhere 
-------------------------------------------------------------------------------- /ref/examples.md: -------------------------------------------------------------------------------- 1 | # B63 Examples Reference 2 | 3 | The B63 project includes several example programs that demonstrate different features and usage patterns. Here's an overview of the key examples: 4 | 5 | ## Basic Usage (`basic.c`) 6 | 7 | Demonstrates the simplest usage of B63: 8 | 9 | ```c 10 | #include "../include/b63/b63.h" 11 | 12 | B63_BENCHMARK(basic, n) { 13 | int i = 0, res = 0; 14 | for (i = 0; i < n; i++) { 15 | res += rand(); 16 | } 17 | B63_KEEP(res); 18 | } 19 | 20 | int main(int argc, char **argv) { 21 | B63_RUN(argc, argv); 22 | return 0; 23 | } 24 | ``` 25 | 26 | This example: 27 | - Defines a single benchmark 28 | - Measures the time it takes to call `rand()` multiple times 29 | - Uses `B63_KEEP` to prevent the compiler from optimizing away the results 30 | 31 | ## Custom Counters (`custom.c`) 32 | 33 | Shows how to define and use a custom counter: 34 | 35 | ```c 36 | #include "../include/b63/b63.h" 37 | 38 | int64_t callcount = 0LL; 39 | 40 | // Define custom counter 41 | B63_COUNTER(calls) { return callcount; } 42 | 43 | static int f() { 44 | callcount++; 45 | return /* some value */; 46 | } 47 | 48 | B63_BASELINE(call_normal, n) { 49 | int i = 0, res = 0; 50 | for (i = 0; i < n; i++) { 51 | res += f(); 52 | } 53 | B63_KEEP(res); 54 | } 55 | 56 | B63_BENCHMARK(call_twice, n) { 57 | int i = 0, res = 0; 58 | for (i = 0; i < n; i++) { 59 | res += (f() + f()); 60 | } 61 | B63_KEEP(res); 62 | } 63 | 64 | int main(int argc, char **argv) { 65 | B63_RUN_WITH("calls", argc, argv); 66 | return 0; 67 | } 68 | ``` 69 | 70 | This example: 71 | - Defines a custom counter that tracks function calls 72 | - Compares baseline (single call) with an alternative implementation (double call) 73 | - Uses `B63_RUN_WITH` to specify which counter to use 74 | 75 | ## Baseline Comparison (`baseline.c`) 76 |
77 | Demonstrates comparing different implementations against a baseline: 78 | 79 | ```c 80 | B63_BASELINE(some_baseline, n) { 81 | // Baseline implementation 82 | } 83 | 84 | B63_BENCHMARK(another_implementation, n) { 85 | // Alternative implementation 86 | } 87 | 88 | B63_BENCHMARK(yet_another_implementation, n) { 89 | // Another alternative implementation 90 | } 91 | ``` 92 | 93 | ## Suspending Measurement (`suspend.c`) 94 | 95 | Shows how to exclude setup/teardown code from measurement: 96 | 97 | ```c 98 | B63_BENCHMARK(with_suspend, n) { 99 | // Setup (not measured) 100 | B63_SUSPEND { 101 | // Setup code here 102 | } 103 | 104 | // Measured section 105 | for (int i = 0; i < n; i++) { 106 | // Code to measure 107 | } 108 | 109 | // Teardown (not measured) 110 | B63_SUSPEND { 111 | // Cleanup code here 112 | } 113 | } 114 | ``` 115 | 116 | ## Additional Examples 117 | 118 | - **assert.c**: Demonstrates using assertions in benchmarks 119 | - **baseline_multi.c**: Multiple baseline comparisons 120 | - **jemalloc.cpp**: Memory allocation tracking with jemalloc 121 | - **l1d_miss.cpp**: Measuring cache misses 122 | - **raw.c**: Low-level benchmark implementation 123 | - **storms.cpp**: Complex benchmarking scenario 124 | 125 | ## Building and Running Examples 126 | 127 | The examples directory includes a Makefile for building the examples: 128 | 129 | ```bash 130 | # Navigate to the examples directory 131 | cd examples 132 | 133 | # Build all examples 134 | make 135 | 136 | # Run a specific example 137 | ./basic 138 | ./custom 139 | ./suspend 140 | 141 | # Run with options 142 | ./basic -i -c time,cycles -e 10 143 | ``` -------------------------------------------------------------------------------- /ref/overview.md: -------------------------------------------------------------------------------- 1 | # B63 - Lightweight Micro-Benchmarking Library 2 | 3 | B63 is a lightweight micro-benchmarking tool for the C programming language. 
It provides a simple yet powerful framework for measuring and comparing the performance of code snippets with minimal overhead. 4 | 5 | ## Key Features 6 | 7 | - **C-Focused:** Designed specifically for C, whereas many other benchmarking tools are limited to C++ 8 | - **Custom Counters:** Measure various performance metrics beyond just time 9 | - CPU cycles 10 | - CPU Performance Monitoring Unit counters 11 | - Cache misses, branch mispredictions 12 | - jemalloc memory allocation tracking 13 | - **Reproducibility:** Support for seed-based reproducible results 14 | - **Multiple Output Modes:** 15 | - Plain text (suitable for scripting and data processing) 16 | - Interactive mode (for human-readable analysis) 17 | 18 | ## Core Concepts 19 | 20 | ### Benchmarks and Baselines 21 | 22 | B63 organizes benchmarks around the concept of measuring how performance changes relative to a baseline. This is particularly useful for: 23 | 24 | - Comparing different implementations of the same algorithm 25 | - Measuring the performance impact of code changes 26 | - Testing optimization effectiveness 27 | 28 | ### Counters 29 | 30 | The library provides a flexible counter system that allows measuring different metrics: 31 | 32 | - **Time:** Wall-clock time in nanoseconds 33 | - **Cycles:** CPU cycle count (architecture-dependent) 34 | - **Memory:** When using jemalloc, track memory allocation 35 | - **Custom:** Easily create your own counters for application-specific metrics 36 | 37 | ### Runtime Configuration 38 | 39 | B63 offers configuration through command-line arguments: 40 | 41 | - `-i`: Interactive mode for human-readable output 42 | - `-c`: Override default counters 43 | - `-e`: Set epoch count for each benchmark 44 | - `-t`: Set time limit per benchmark 45 | - `-d`: Set delimiter for plaintext output 46 | - `-s`: Set seed for reproducibility -------------------------------------------------------------------------------- /ref/usage.md: 
-------------------------------------------------------------------------------- 1 | # B63 Usage Guide 2 | 3 | ## Basic Usage 4 | 5 | Here's a simple example of how to use B63 for benchmarking: 6 | 7 | ```c 8 | #include "b63/b63.h" 9 | 10 | B63_BENCHMARK(my_benchmark, n) { 11 | int i = 0, res = 0; 12 | for (i = 0; i < n; i++) { 13 | res += rand(); // Operation to benchmark 14 | } 15 | B63_KEEP(res); // Prevent compiler optimization of results 16 | } 17 | 18 | int main(int argc, char **argv) { 19 | B63_RUN(argc, argv); 20 | return 0; 21 | } 22 | ``` 23 | 24 | ## Comparing Implementations (Baseline Pattern) 25 | 26 | To compare different implementations, use the baseline pattern: 27 | 28 | ```c 29 | #include "b63/b63.h" 30 | 31 | // Define baseline implementation 32 | B63_BASELINE(baseline_implementation, n) { 33 | int i = 0, res = 0; 34 | for (i = 0; i < n; i++) { 35 | res += method1(); 36 | } 37 | B63_KEEP(res); 38 | } 39 | 40 | // Define alternative implementation to compare against baseline 41 | B63_BENCHMARK(alternative_implementation, n) { 42 | int i = 0, res = 0; 43 | for (i = 0; i < n; i++) { 44 | res += method2(); 45 | } 46 | B63_KEEP(res); 47 | } 48 | 49 | int main(int argc, char **argv) { 50 | B63_RUN(argc, argv); 51 | return 0; 52 | } 53 | ``` 54 | 55 | ## Custom Counters 56 | 57 | You can define custom counters to measure specific metrics: 58 | 59 | ```c 60 | #include "b63/b63.h" 61 | 62 | int64_t my_metric_count = 0; 63 | 64 | // Define custom counter 65 | B63_COUNTER(my_metric) { 66 | return my_metric_count; 67 | } 68 | 69 | B63_BENCHMARK(benchmark_with_custom_counter, n) { 70 | int i = 0, res = 0; 71 | for (i = 0; i < n; i++) { 72 | // Perform operation and update custom metric 73 | res += operation(); 74 | my_metric_count++; 75 | } 76 | B63_KEEP(res); 77 | } 78 | 79 | int main(int argc, char **argv) { 80 | // Run with custom counter 81 | B63_RUN_WITH("my_metric", argc, argv); 82 | return 0; 83 | } 84 | ``` 85 | 86 | ## Preventing Compiler 
Optimization 87 | 88 | Use `B63_KEEP` to prevent the compiler from optimizing away operations that don't otherwise affect the program: 89 | 90 | ```c 91 | B63_BENCHMARK(example, n) { 92 | int i = 0, res = 0; 93 | for (i = 0; i < n; i++) { 94 | res += expensive_calculation(); 95 | } 96 | // Prevent compiler from eliminating the loop 97 | B63_KEEP(res); 98 | } 99 | ``` 100 | 101 | ## Suspending Measurement 102 | 103 | Use `B63_SUSPEND` to temporarily suspend measurement during setup or teardown operations: 104 | 105 | ```c 106 | B63_BENCHMARK(with_suspend, n) { 107 | // Setup (not measured) 108 | B63_SUSPEND { 109 | setup_data_structures(); 110 | } 111 | 112 | // Measured section 113 | for (int i = 0; i < n; i++) { 114 | perform_operation(); 115 | } 116 | 117 | // Teardown (not measured) 118 | B63_SUSPEND { 119 | cleanup_data_structures(); 120 | } 121 | } 122 | ``` 123 | 124 | ## Command Line Options 125 | 126 | When running your benchmark executable, you can use these options: 127 | 128 | ``` 129 | ./benchmark -i -c time,cycles -e 10 -t 5.0 -s 42 130 | ``` 131 | 132 | - `-i`: Enable interactive mode 133 | - `-c time,cycles`: Use only time and cycles counters 134 | - `-e 10`: Run 10 epochs for each benchmark 135 | - `-t 5.0`: Limit each benchmark to 5 seconds 136 | - `-s 42`: Use seed 42 for reproducibility --------------------------------------------------------------------------------