├── .gitignore ├── Makefile ├── README.md ├── assets └── README ├── clock └── clock.h ├── common.h ├── cycle-timer.c ├── cycle-timer.h ├── exact-int └── exact-int.h ├── hedley.h ├── huge-alloc.c ├── huge-alloc.h ├── jevents ├── Makefile ├── README.md ├── cache.c ├── cpustr.c ├── event-rmap.c ├── examples │ ├── Makefile │ ├── addr.c │ ├── cpu.c │ ├── cpu.h │ ├── hist.cc │ ├── hist.h │ ├── jestat.c │ ├── rtest.c │ ├── rtest2.c │ └── rtest3.c ├── interrupts.c ├── interrupts.h ├── jevents-internal.h ├── jevents.c ├── jevents.h ├── jsession.h ├── jsmn.c ├── jsmn.h ├── json.c ├── json.h ├── libjevents.spec ├── listevents.c ├── measure.c ├── measure.h ├── perf-iter.c ├── perf-iter.h ├── perf_event_open.c ├── rawevent.c ├── rdpmc.c ├── rdpmc.h ├── resolve.c ├── session.c ├── showevent.c ├── tester └── util.h ├── main.c ├── opt-control.h ├── page-info.c ├── page-info.h ├── pcg_basic.c.h ├── pcg_basic.h ├── perf-timer.c ├── perf-timer.h ├── random-writes.c └── scripts ├── all.sh ├── common.sh ├── interleaved-1.sh ├── pdutil.py ├── plot-csv.py ├── plot.py ├── plot1.sh ├── plot2.sh ├── plot3.sh ├── plot4.sh ├── prefetch.sh ├── rwrite-1-vs-2.sh ├── rwrite2-unrolled.sh ├── rwrite2-vs-sfence.sh └── rwrite2-vs-sfenceC.sh /.gitignore: -------------------------------------------------------------------------------- 1 | # by default exclude anything in jevents without an extension since it's 2 | # probably a binary file 3 | /jevents/** 4 | !/jevents/**/ 5 | !/jevents/**/*.* 6 | 7 | /.* 8 | !.gitignore 9 | *.o 10 | *.a 11 | bench 12 | *.log 13 | __pycache__ 14 | /tmp/* 15 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | override CFLAGS += -Wall -Wextra -O2 -g -march=haswell -std=gnu11 -Wno-unused-parameter 2 | 3 | # uncomment to use the fast gold linker 4 | # LDFLAGS = -fuse-ld=gold 5 | 6 | SRCS := $(wildcard *.c) 7 | OBJECTS := $(patsubst %.c,%.o,$(SRCS)) 8
| HEADERS := $(wildcard *.h *.hpp) 9 | LDFLAGS := -lm 10 | 11 | JE_LIB := jevents/libjevents.a 12 | JE_SRC := $(wildcard jevents/*.c jevents/*.h) 13 | 14 | bench: $(OBJECTS) $(JE_LIB) 15 | $(CC) $(CFLAGS) $(CPPFLAGS) $^ $(LDFLAGS) -o $@ $(JE_LIB) 16 | 17 | %.o: %.c $(HEADERS) 18 | $(CC) $(CPPFLAGS) $(CFLAGS) -c -o $@ $< 19 | 20 | $(JE_LIB): $(JE_SRC) 21 | cd jevents && $(MAKE) MAKEFLAGS= 22 | 23 | clean: 24 | rm -f bench 25 | rm -f *.o 26 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Various benchmarks, mostly for streams of interleaved stores, in support of the blog post [What has your microcode done for you lately?](https://travisdowns.github.io/blog/2019/03/19/random-writes-and-microcode-oh-my.html). 2 | 3 | ## Building 4 | 5 | Currently it only works on Linux, but I am interested in porting it to Windows. It should be enough to run make: 6 | 7 | make 8 | 9 | ## Usage 10 | 11 | Run with no arguments for usage info, as follows: 12 | 13 | ``` 14 | Must provide 1 or 3 arguments 15 | 16 | Usage: 17 | bench TEST_NAME 18 | 19 | TEST_NAME is one of: 20 | 21 | interleaved 22 | basic interleaved stores (1 fixed 1 variable) 23 | interleaved-pf-fixed 24 | interleaved with fixed region prefetch 25 | interleaved-pf-var 26 | interleaved with variable region prefetch 27 | interleaved-pf-both 28 | interleaved with both region prefetch 29 | interleaved-u2 30 | interleaved unrolled by 2x 31 | interleaved-u4 32 | interleaved unrolled by 4x 33 | interleaved-sfenceA 34 | interleaved with 1 sfence 35 | interleaved-sfenceB 36 | interleaved with 1 sfence 37 | interleaved-sfenceC 38 | interleaved with 2 sfences 39 | wrandom1 40 | single region random stores 41 | wrandom1-unroll 42 | wrandom1 but unrolled and fast/cheaty RNG 43 | wlinear1 44 | linear 64B stride writes over one stream 45 | wlinearHL 46 | linear with lfence 47 | wlinearHS 48 | linear
with sfence 49 | wlinear1-sfence 50 | linear with sfence 51 | rlinear1 52 | linear 64B stride reads over one region 53 | lcg 54 | raw LCG test 55 | pcg 56 | raw PCG test 57 | ``` 58 | 59 | At a minimum you need to provide the test name from the list above. 60 | 61 | If you provide only the test name, default starting and stopping sizes for the region are used (4 KiB to 512 KiB): 62 | 63 | ./bench interleaved 64 | 65 | Otherwise, you can provide your starting and stopping points as the 2nd and 3rd arguments, in KiB. The plots in the blog post all use 1 to 100,000 KiB as follows: 66 | 67 | ./bench interleaved 1 100000 68 | 69 | Values are rounded up to the next power of two, so 100,000 becomes 131,072. 70 | 71 | ## Plots 72 | 73 | The `/scripts` directory contains a bunch of `.sh` scripts that I use to generate the various plots. In particular, to generate _all_ plots it should be as simple as running the `all.sh` script: 74 | 75 | SUFFIX=new scripts/all.sh 76 | 77 | The `SUFFIX` here is appended to each plot name and indicates whether an old or new microcode version was used (the exact microcode revision is also automatically added to the plot title). The output appears in the `/assets` directory. 78 | 79 | You can also run any of the individual plots that `all.sh` creates directly, e.g., `scripts/rwrite-1-vs-2.sh` will generate the first plot from the post. By default, these individual scripts will pop up an interactive window with the plot, but you can write to a file by setting the `OUTFILE` environment variable. There are a variety of other variables you can set too, for example `STOP=1000 scripts/rwrite-1-vs-2.sh` will change the stopping point to 1000 KiB, which results in much faster plot generation. You can take a peek at `all.sh` for examples of other variables. 80 | 81 | ## CPU Counters 82 | 83 | We support recording various Intel performance counters using pmu-tools' jevents library.
To record events, set them in the `CPU_COUNTERS` environment variable, using the _short name_ as shown in the supported events table: 84 | 85 | | Full Name | Short Name | 86 | | ------------------------- | ----------- | 87 | | cpu_clk_unhalted.thread_p | CYCLES | 88 | | hw_interrupts.received | INTERRUPTS | 89 | | l2_rqsts.references | L2_RQSTS.ALL | 90 | | l2_rqsts.all_rfo | L2.RFO_ALL | 91 | | l2_rqsts.rfo_miss | L2.RFO_MISS | 92 | | l2_rqsts.miss | L2.ALL_MISS | 93 | | l2_rqsts.all_pf | L2.ALL_PF | 94 | | llc.ref | LLC.REFS | 95 | | llc.miss | LLC.MISS | 96 | | mem_inst_retired.all_stores | ALL_STORES | 97 | -------------------------------------------------------------------------------- /assets/README: -------------------------------------------------------------------------------- 1 | Assets go here (generally created by the scripts in /scripts). 2 | -------------------------------------------------------------------------------- /common.h: -------------------------------------------------------------------------------- 1 | #ifndef COMMON_H_ 2 | #define COMMON_H_ 3 | 4 | #include <stddef.h> 5 | 6 | typedef int (store_function)(size_t iters, char* a1, size_t size1, char *a2, size_t size2); 7 | 8 | store_function write_random_single; 9 | store_function write_random_singleu; 10 | store_function write_linear; 11 | store_function write_linearHL; 12 | store_function write_linearHS; 13 | store_function write_linear_sfence; 14 | store_function writes_inter; 15 | store_function writes_inter_pf_fixed; 16 | store_function writes_inter_pf_var; 17 | store_function writes_inter_pf_both; 18 | store_function writes_inter_u2; 19 | store_function writes_inter_u4; 20 | store_function writes_inter_sfenceA; 21 | store_function writes_inter_sfenceB; 22 | store_function writes_inter_sfenceC; 23 | store_function read_linear; 24 | store_function random_read2; 25 | store_function random_lcg; 26 | store_function random_pcg; 27 | 28 | #endif 29 |
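The `store_function` typedef in `common.h` above is the signature every benchmark kernel shares. As an illustration only, a minimal conforming kernel might look like the sketch below. This is a hypothetical function (`write_linear_sketch` is not one of the kernels declared above), and it assumes `size1` is a power of two, consistent with the README's note that sizes are rounded up to powers of two; the return-value convention is also an assumption.

```c
#include <stddef.h>

/*
 * Hypothetical store_function-compatible kernel, for illustration only:
 * writes one byte per 64B cache line of the first region, wrapping with
 * a mask (size1 is assumed to be a power of two). The second region is
 * unused, as in the single-region tests.
 */
int write_linear_sketch(size_t iters, char *a1, size_t size1, char *a2, size_t size2) {
    (void)a2; (void)size2;                    /* single-region kernel */
    size_t offset = 0;
    for (size_t i = 0; i < iters; i++) {
        a1[offset] = (char)i;                 /* one store per iteration */
        offset = (offset + 64) & (size1 - 1); /* 64B stride, power-of-two wrap */
    }
    return 0; /* return value convention assumed here */
}
```

A kernel like this would be timed by calling it with a large `iters` over a region sized between the start and stop points, which is why the signature takes the iteration count and region sizes rather than doing any timing itself.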
-------------------------------------------------------------------------------- /cycle-timer.c: -------------------------------------------------------------------------------- 1 | /* 2 | * cycle-timer.c 3 | * 4 | * Implementation for cycle-timer.h 5 | */ 6 | 7 | #include "clock/clock.h" 8 | 9 | #include "cycle-timer.h" 10 | #include "hedley.h" 11 | 12 | #include <stdio.h> 13 | #include <stdlib.h> 14 | 15 | 16 | const size_t ITERS = 10000; 17 | const size_t TRIES = 11; 18 | const size_t WARMUP = 1000; 19 | 20 | volatile size_t sink; 21 | /** 22 | * Calibration loop that relies on store throughput being exactly 1 per cycle 23 | * on all modern x86 chips, and the loop overhead running totally in parallel. 24 | */ 25 | HEDLEY_NEVER_INLINE 26 | __attribute__((aligned(32))) 27 | void store_calibration(size_t iters) { 28 | do { 29 | sink = iters; 30 | } while (--iters > 0); 31 | } 32 | 33 | int intcompare(const void *l_, const void *r_) { 34 | int64_t l = *(const int64_t *)l_; 35 | int64_t r = *(const int64_t *)r_; 36 | return (l > r) - (l < r); 37 | } 38 | 39 | /* 40 | * Calculate the frequency of the CPU based on timing a tight loop that we expect to 41 | * take one iteration per cycle. 42 | * 43 | * ITERS is the base number of iterations to use: the calibration routine is actually 44 | * run twice, once with ITERS iterations and once with 2*ITERS, and a delta is used to 45 | * remove measurement overhead.
46 | */ 47 | HEDLEY_NEVER_INLINE 48 | static double get_ghz(bool print) { 49 | int64_t results[TRIES]; 50 | 51 | for (size_t w = 0; w < WARMUP + 1; w++) { 52 | for (size_t r = 0; r < TRIES; r++) { 53 | cl_timepoint t0 = cl_now(); 54 | store_calibration(ITERS); 55 | cl_timepoint t1 = cl_now(); 56 | store_calibration(ITERS * 2); 57 | cl_timepoint t2 = cl_now(); 58 | results[r] = cl_delta(t1, t2).nanos - cl_delta(t0, t1).nanos; 59 | } 60 | } 61 | 62 | // return the median value 63 | qsort(results, TRIES, sizeof(results[0]), intcompare); 64 | double ghz = ((double)ITERS / results[TRIES/2]); 65 | if (print) fprintf(stderr, "Estimated CPU speed: %5.2f GHz\n", ghz); 66 | return ghz; 67 | } 68 | 69 | static bool is_init = false; 70 | double ghz; 71 | 72 | void cl_init(bool print) { 73 | if (HEDLEY_UNLIKELY(!is_init)) { 74 | ghz = get_ghz(print); 75 | is_init = true; 76 | } 77 | }; 78 | 79 | cl_timepoint cl_now() { 80 | struct PsnipClockTimespec spec; 81 | if (psnip_clock_monotonic_get_time(&spec)) { 82 | return (cl_timepoint){0}; 83 | } else { 84 | return (cl_timepoint){spec.seconds * 1000000000ll + spec.nanoseconds}; 85 | } 86 | } 87 | 88 | /* 89 | * Take an interval value and convert it to cycles based on the 90 | * detected frequency of this host. 91 | */ 92 | double cl_to_cycles(cl_interval interval) { 93 | cl_init(false); 94 | return interval.nanos * ghz; 95 | } 96 | -------------------------------------------------------------------------------- /cycle-timer.h: -------------------------------------------------------------------------------- 1 | /* 2 | * cycle-timer.h 3 | * 4 | * A timer that returns results in CPU cycles in addition to nanoseconds. 5 | * It measures cycles indirectly by measuring the wall-time, and then converting 6 | * that to a cycle count based on a calibration loop performed once at startup. 
7 | */ 8 | 9 | #ifndef CYCLE_TIMER_H_ 10 | #define CYCLE_TIMER_H_ 11 | 12 | #include <stdint.h> 13 | #include <stdbool.h> 14 | 15 | /** 16 | * A point in time, or an interval when subtracted. You should probably 17 | * treat this as an opaque struct, in case I change the implementation 18 | * someday. 19 | */ 20 | struct cl_timepoint_ { 21 | int64_t nanos; 22 | }; 23 | typedef struct cl_timepoint_ cl_timepoint; 24 | 25 | /** 26 | * An interval created by subtracting two points in time, measured 27 | * in nanoseconds. 28 | */ 29 | struct cl_interval_ { 30 | int64_t nanos; 31 | }; 32 | typedef struct cl_interval_ cl_interval; 33 | 34 | /* return the current moment in time as a cl_timepoint */ 35 | cl_timepoint cl_now(); 36 | 37 | /* 38 | * Return the interval between timepoints first and second. 39 | * This value is positive iff second occurs after first. 40 | */ 41 | static inline cl_interval cl_delta(cl_timepoint first, cl_timepoint second) { 42 | return (cl_interval){second.nanos - first.nanos}; 43 | } 44 | 45 | /* 46 | * Take an interval value and convert it to cycles based on the 47 | * detected frequency of this host. 48 | */ 49 | double cl_to_cycles(cl_interval interval); 50 | 51 | /* 52 | * Initialize the cycle timer infrastructure. Mostly this just means calculating 53 | * the cycles-to-nanoseconds ratio (i.e., the CPU frequency). You never *need* to 54 | * call this function: if you haven't called it, initialization happens 55 | * automatically when necessary (usually lazily, when cl_to_cycles is first used), 56 | * but it may be lengthy, so this method is offered so that the user can trigger 57 | * it at a time of their choosing (and elect whether to 58 | * print out diagnostic information about the calibration). 59 | * 60 | * If you pass true for print, diagnostic information like the detected CPU 61 | * frequency is printed to stderr.
62 | */ 63 | void cl_init(bool print); 64 | 65 | 66 | 67 | #endif /* CYCLE_TIMER_H_ */ 68 | -------------------------------------------------------------------------------- /exact-int/exact-int.h: -------------------------------------------------------------------------------- 1 | /* Exact-width integer types 2 | * Portable Snippets - https://github.com/nemequ/portable-snippets 3 | * Created by Evan Nemerson 4 | * 5 | * To the extent possible under law, the authors have waived all 6 | * copyright and related or neighboring rights to this code. For 7 | * details, see the Creative Commons Zero 1.0 Universal license at 8 | * https://creativecommons.org/publicdomain/zero/1.0/ 9 | * 10 | * This header tries to define psnip_(u)int(8|16|32|64)_t to 11 | * appropriate types given your system. For most systems this means 12 | * including <stdint.h> and adding a few preprocessor definitions. 13 | * 14 | * If you prefer, you can define any necessary types yourself. 15 | * Snippets in this repository which rely on these types will not 16 | * attempt to include this header if you have already defined the 17 | * types it uses.
18 | */ 19 | 20 | #if !defined(PSNIP_EXACT_INT_H) 21 | # define PSNIP_EXACT_INT_H 22 | # if !defined(PSNIP_EXACT_INT_HAVE_STDINT) 23 | # if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) 24 | # define PSNIP_EXACT_INT_HAVE_STDINT 25 | # elif defined(__has_include) 26 | # if __has_include(<stdint.h>) 27 | # define PSNIP_EXACT_INT_HAVE_STDINT 28 | # endif 29 | # elif \ 30 | defined(HAVE_STDINT_H) || \ 31 | defined(_STDINT_H_INCLUDED) || \ 32 | defined(_STDINT_H) || \ 33 | defined(_STDINT_H_) 34 | # define PSNIP_EXACT_INT_HAVE_STDINT 35 | # elif \ 36 | (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))) || \ 37 | (defined(_MSC_VER) && (_MSC_VER >= 1600)) || \ 38 | (defined(__SUNPRO_C) && (__SUNPRO_C >= 0x570)) || \ 39 | (defined(__WATCOMC__) && (__WATCOMC__ >= 1250)) 40 | # define PSNIP_EXACT_INT_HAVE_STDINT 41 | # endif 42 | # endif 43 | 44 | # if \ 45 | defined(__INT8_TYPE__) && defined(__INT16_TYPE__) && defined(__INT32_TYPE__) && defined(__INT64_TYPE__) && \ 46 | defined(__UINT8_TYPE__) && defined(__UINT16_TYPE__) && defined(__UINT32_TYPE__) && defined(__UINT64_TYPE__) 47 | # define psnip_int8_t __INT8_TYPE__ 48 | # define psnip_int16_t __INT16_TYPE__ 49 | # define psnip_int32_t __INT32_TYPE__ 50 | # define psnip_int64_t __INT64_TYPE__ 51 | # define psnip_uint8_t __UINT8_TYPE__ 52 | # define psnip_uint16_t __UINT16_TYPE__ 53 | # define psnip_uint32_t __UINT32_TYPE__ 54 | # define psnip_uint64_t __UINT64_TYPE__ 55 | # elif defined(PSNIP_EXACT_INT_HAVE_STDINT) 56 | # include <stdint.h> 57 | # if !defined(psnip_int8_t) 58 | # define psnip_int8_t int8_t 59 | # endif 60 | # if !defined(psnip_uint8_t) 61 | # define psnip_uint8_t uint8_t 62 | # endif 63 | # if !defined(psnip_int16_t) 64 | # define psnip_int16_t int16_t 65 | # endif 66 | # if !defined(psnip_uint16_t) 67 | # define psnip_uint16_t uint16_t 68 | # endif 69 | # if !defined(psnip_int32_t) 70 | # define psnip_int32_t int32_t 71 | # endif 72 | # if !defined(psnip_uint32_t) 73 | # define
psnip_uint32_t uint32_t 74 | # endif 75 | # if !defined(psnip_int64_t) 76 | # define psnip_int64_t int64_t 77 | # endif 78 | # if !defined(psnip_uint64_t) 79 | # define psnip_uint64_t uint64_t 80 | # endif 81 | # elif defined(_MSC_VER) 82 | # if !defined(psnip_int8_t) 83 | # define psnip_int8_t __int8 84 | # endif 85 | # if !defined(psnip_uint8_t) 86 | # define psnip_uint8_t unsigned __int8 87 | # endif 88 | # if !defined(psnip_int16_t) 89 | # define psnip_int16_t __int16 90 | # endif 91 | # if !defined(psnip_uint16_t) 92 | # define psnip_uint16_t unsigned __int16 93 | # endif 94 | # if !defined(psnip_int32_t) 95 | # define psnip_int32_t __int32 96 | # endif 97 | # if !defined(psnip_uint32_t) 98 | # define psnip_uint32_t unsigned __int32 99 | # endif 100 | # if !defined(psnip_int64_t) 101 | # define psnip_int64_t __int64 102 | # endif 103 | # if !defined(psnip_uint64_t) 104 | # define psnip_uint64_t unsigned __int64 105 | # endif 106 | # else 107 | # include <limits.h> 108 | # if !defined(psnip_int8_t) 109 | # if defined(CHAR_MIN) && defined(CHAR_MAX) && (CHAR_MIN == (-127-1)) && (CHAR_MAX == 127) 110 | # define psnip_int8_t char 111 | # elif defined(SHRT_MIN) && defined(SHRT_MAX) && (SHRT_MIN == (-127-1)) && (SHRT_MAX == 127) 112 | # define psnip_int8_t short 113 | # elif defined(INT_MIN) && defined(INT_MAX) && (INT_MIN == (-127-1)) && (INT_MAX == 127) 114 | # define psnip_int8_t int 115 | # elif defined(LONG_MIN) && defined(LONG_MAX) && (LONG_MIN == (-127-1)) && (LONG_MAX == 127) 116 | # define psnip_int8_t long 117 | # elif defined(LLONG_MIN) && defined(LLONG_MAX) && (LLONG_MIN == (-127-1)) && (LLONG_MAX == 127) 118 | # define psnip_int8_t long long 119 | # else 120 | # error Unable to locate 8-bit signed integer type.
121 | # endif 122 | # endif 123 | # if !defined(psnip_uint8_t) 124 | # if defined(UCHAR_MAX) && (UCHAR_MAX == 255) 125 | # define psnip_uint8_t unsigned char 126 | # elif defined(USHRT_MAX) && (USHRT_MAX == 255) 127 | # define psnip_uint8_t unsigned short 128 | # elif defined(UINT_MAX) && (UINT_MAX == 255) 129 | # define psnip_uint8_t unsigned int 130 | # elif defined(ULONG_MAX) && (ULONG_MAX == 255) 131 | # define psnip_uint8_t unsigned long 132 | # elif defined(ULLONG_MAX) && (ULLONG_MAX == 255) 133 | # define psnip_uint8_t unsigned long long 134 | # else 135 | # error Unable to locate 8-bit unsigned integer type. 136 | # endif 137 | # endif 138 | # if !defined(psnip_int16_t) 139 | # if defined(CHAR_MIN) && defined(CHAR_MAX) && (CHAR_MIN == (-32767-1)) && (CHAR_MAX == 32767) 140 | # define psnip_int16_t char 141 | # elif defined(SHRT_MIN) && defined(SHRT_MAX) && (SHRT_MIN == (-32767-1)) && (SHRT_MAX == 32767) 142 | # define psnip_int16_t short 143 | # elif defined(INT_MIN) && defined(INT_MAX) && (INT_MIN == (-32767-1)) && (INT_MAX == 32767) 144 | # define psnip_int16_t int 145 | # elif defined(LONG_MIN) && defined(LONG_MAX) && (LONG_MIN == (-32767-1)) && (LONG_MAX == 32767) 146 | # define psnip_int16_t long 147 | # elif defined(LLONG_MIN) && defined(LLONG_MAX) && (LLONG_MIN == (-32767-1)) && (LLONG_MAX == 32767) 148 | # define psnip_int16_t long long 149 | # else 150 | # error Unable to locate 16-bit signed integer type. 
151 | # endif 152 | # endif 153 | # if !defined(psnip_uint16_t) 154 | # if defined(UCHAR_MAX) && (UCHAR_MAX == 65535) 155 | # define psnip_uint16_t unsigned char 156 | # elif defined(USHRT_MAX) && (USHRT_MAX == 65535) 157 | # define psnip_uint16_t unsigned short 158 | # elif defined(UINT_MAX) && (UINT_MAX == 65535) 159 | # define psnip_uint16_t unsigned int 160 | # elif defined(ULONG_MAX) && (ULONG_MAX == 65535) 161 | # define psnip_uint16_t unsigned long 162 | # elif defined(ULLONG_MAX) && (ULLONG_MAX == 65535) 163 | # define psnip_uint16_t unsigned long long 164 | # else 165 | # error Unable to locate 16-bit unsigned integer type. 166 | # endif 167 | # endif 168 | # if !defined(psnip_int32_t) 169 | # if defined(CHAR_MIN) && defined(CHAR_MAX) && (CHAR_MIN == (-2147483647-1)) && (CHAR_MAX == 2147483647) 170 | # define psnip_int32_t char 171 | # elif defined(SHRT_MIN) && defined(SHRT_MAX) && (SHRT_MIN == (-2147483647-1)) && (SHRT_MAX == 2147483647) 172 | # define psnip_int32_t short 173 | # elif defined(INT_MIN) && defined(INT_MAX) && (INT_MIN == (-2147483647-1)) && (INT_MAX == 2147483647) 174 | # define psnip_int32_t int 175 | # elif defined(LONG_MIN) && defined(LONG_MAX) && (LONG_MIN == (-2147483647-1)) && (LONG_MAX == 2147483647) 176 | # define psnip_int32_t long 177 | # elif defined(LLONG_MIN) && defined(LLONG_MAX) && (LLONG_MIN == (-2147483647-1)) && (LLONG_MAX == 2147483647) 178 | # define psnip_int32_t long long 179 | # else 180 | # error Unable to locate 32-bit signed integer type. 
181 | # endif 182 | # endif 183 | # if !defined(psnip_uint32_t) 184 | # if defined(UCHAR_MAX) && (UCHAR_MAX == 4294967295) 185 | # define psnip_uint32_t unsigned char 186 | # elif defined(USHRT_MAX) && (USHRT_MAX == 4294967295) 187 | # define psnip_uint32_t unsigned short 188 | # elif defined(UINT_MAX) && (UINT_MAX == 4294967295) 189 | # define psnip_uint32_t unsigned int 190 | # elif defined(ULONG_MAX) && (ULONG_MAX == 4294967295) 191 | # define psnip_uint32_t unsigned long 192 | # elif defined(ULLONG_MAX) && (ULLONG_MAX == 4294967295) 193 | # define psnip_uint32_t unsigned long long 194 | # else 195 | # error Unable to locate 32-bit unsigned integer type. 196 | # endif 197 | # endif 198 | # if !defined(psnip_int64_t) 199 | # if defined(CHAR_MIN) && defined(CHAR_MAX) && (CHAR_MIN == (-9223372036854775807LL-1)) && (CHAR_MAX == 9223372036854775807LL) 200 | # define psnip_int64_t char 201 | # elif defined(SHRT_MIN) && defined(SHRT_MAX) && (SHRT_MIN == (-9223372036854775807LL-1)) && (SHRT_MAX == 9223372036854775807LL) 202 | # define psnip_int64_t short 203 | # elif defined(INT_MIN) && defined(INT_MAX) && (INT_MIN == (-9223372036854775807LL-1)) && (INT_MAX == 9223372036854775807LL) 204 | # define psnip_int64_t int 205 | # elif defined(LONG_MIN) && defined(LONG_MAX) && (LONG_MIN == (-9223372036854775807LL-1)) && (LONG_MAX == 9223372036854775807LL) 206 | # define psnip_int64_t long 207 | # elif defined(LLONG_MIN) && defined(LLONG_MAX) && (LLONG_MIN == (-9223372036854775807LL-1)) && (LLONG_MAX == 9223372036854775807LL) 208 | # define psnip_int64_t long long 209 | # else 210 | # error Unable to locate 64-bit signed integer type. 
211 | # endif 212 | # endif 213 | # if !defined(psnip_uint64_t) 214 | # if defined(UCHAR_MAX) && (UCHAR_MAX == 18446744073709551615ULL) 215 | # define psnip_uint64_t unsigned char 216 | # elif defined(USHRT_MAX) && (USHRT_MAX == 18446744073709551615ULL) 217 | # define psnip_uint64_t unsigned short 218 | # elif defined(UINT_MAX) && (UINT_MAX == 18446744073709551615ULL) 219 | # define psnip_uint64_t unsigned int 220 | # elif defined(ULONG_MAX) && (ULONG_MAX == 18446744073709551615ULL) 221 | # define psnip_uint64_t unsigned long 222 | # elif defined(ULLONG_MAX) && (ULLONG_MAX == 18446744073709551615ULL) 223 | # define psnip_uint64_t unsigned long long 224 | # else 225 | # error Unable to locate 64-bit unsigned integer type. 226 | # endif 227 | # endif 228 | # endif 229 | #endif 230 | -------------------------------------------------------------------------------- /huge-alloc.c: -------------------------------------------------------------------------------- 1 | /* 2 | * huge-alloc.cpp 3 | */ 4 | 5 | #include "huge-alloc.h" 6 | 7 | #include 8 | #include 9 | #include 10 | 11 | #include 12 | #include 13 | 14 | #include "page-info.h" 15 | 16 | #define HUGE_PAGE_SIZE ((size_t)(2u * 1024u * 1024u)) 17 | #define HUGE_PAGE_MASK ((size_t)-HUGE_PAGE_SIZE) 18 | 19 | /* allocate size bytes of storage in a hugepage */ 20 | void *huge_alloc(size_t user_size, bool print) { 21 | if (user_size > MAX_HUGE_ALLOC) { 22 | fprintf(stderr, "request exceeds MAX_HUGE_ALLOC in %s, check your math\n", __func__); 23 | return 0; 24 | } 25 | 26 | // we request size + 2 * HUGE_PAGE_SIZE so we'll always have at least one huge page boundary in the allocation 27 | size_t mmap_size = user_size + 2 * HUGE_PAGE_SIZE; 28 | 29 | char *mmap_p = (char *)mmap(0, mmap_size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); 30 | 31 | if (mmap_p == MAP_FAILED) { 32 | fprintf(stderr, "MMAP failed in %s\n", __func__); 33 | return 0; 34 | } 35 | 36 | // align up to a hugepage boundary 37 | char 
*aligned_p = (char *)(((uintptr_t)mmap_p + HUGE_PAGE_SIZE) & HUGE_PAGE_MASK); 38 | 39 | madvise(aligned_p, user_size + HUGE_PAGE_SIZE, MADV_HUGEPAGE); 40 | 41 | // touch the memory so we can get stats on it 42 | memset(aligned_p, 0xCC, user_size); 43 | 44 | page_info_array info = get_info_for_range(aligned_p, aligned_p + user_size); 45 | flag_count fcount = get_flag_count(info, KPF_THP); 46 | if (print) { 47 | if (user_size > 0 && fcount.pages_available == 0) { 48 | fprintf(stderr, "failed to get any huge page info - probably you need to run as root\n"); 49 | } else { 50 | fprintf(stderr, "hugepage ratio %4.3f (available %4.3f) for allocation of size %zu\n", 51 | (double)fcount.pages_set/fcount.pages_available, 52 | (double)fcount.pages_available/fcount.pages_total, 53 | user_size); 54 | } 55 | } 56 | 57 | return aligned_p; 58 | } 59 | -------------------------------------------------------------------------------- /huge-alloc.h: -------------------------------------------------------------------------------- 1 | /* 2 | * huge-alloc.h 3 | * 4 | * Inefficient allocator that allows allocating memory regions backed by THP pages.
5 | */ 6 | 7 | #ifndef HUGE_ALLOC_HPP_ 8 | #define HUGE_ALLOC_HPP_ 9 | 10 | #include <stddef.h> 11 | #include <stdbool.h> 12 | 13 | // 1152921504606846976 bytes should be enough for everyone 14 | #define MAX_HUGE_ALLOC (1ULL << 60) 15 | 16 | #ifdef __cplusplus 17 | extern "C" { 18 | #endif 19 | 20 | /* allocate size bytes of storage in a hugepage */ 21 | void *huge_alloc(size_t size, bool print); 22 | 23 | /* free the pointer pointed to by p */ 24 | void huge_free(void *p); 25 | 26 | #ifdef __cplusplus 27 | } 28 | #endif 29 | 30 | 31 | #endif /* HUGE_ALLOC_HPP_ */ 32 | -------------------------------------------------------------------------------- /jevents/Makefile: -------------------------------------------------------------------------------- 1 | .PHONY = all clean-examples all-examples install clean html man 2 | PREFIX=$(DESTDIR)/usr/local 3 | LIB=$(PREFIX)/lib64 4 | BIN=$(PREFIX)/bin 5 | INCLUDE=$(PREFIX)/include 6 | CFLAGS := -g -fPIC -Wall -O2 -Wno-unused-result 7 | OBJ := json.o jsmn.o jevents.o resolve.o cache.o cpustr.o rawevent.o \ 8 | perf-iter.o interrupts.o rdpmc.o measure.o perf_event_open.o \ 9 | session.o 10 | KDOC = /usr/src/linux/scripts/kernel-doc 11 | 12 | all: libjevents.a showevent listevents event-rmap all-examples 13 | 14 | clean-examples: 15 | make -C examples clean 16 | 17 | all-examples: libjevents.a 18 | make -C examples 19 | 20 | install: libjevents.a listevents showevent event-rmap 21 | install -d ${BIN} 22 | install -d ${LIB} 23 | install -d ${INCLUDE} 24 | install -m 755 listevents showevent event-rmap ${BIN} 25 | install -m 644 libjevents.a ${LIB} 26 | install -m 644 rdpmc.h jevents.h measure.h perf-iter.h jsession.h ${INCLUDE} 27 | # xxx install man page 28 | 29 | libjevents.a: ${OBJ} 30 | rm -f libjevents.a 31 | ar q libjevents.a $^ 32 | ranlib libjevents.a 33 | 34 | clean: clean-examples 35 | rm -f ${OBJ} libjevents.a resolve showevent listfiles jevents.html rmap event-rmap.o event-rmap \ 36 | listevents resolve-test showevent.o listevents.o 37 | 38
| resolve: resolve.c 39 | $(CC) $(CFLAGS) -DTEST=1 -o $@ $^ 40 | 41 | showevent: showevent.o libjevents.a 42 | 43 | listevents: listevents.o libjevents.a 44 | 45 | event-rmap: event-rmap.o libjevents.a 46 | 47 | DOCFILES := cache.c jevents.c cpustr.c rawevent.c interrupts.c measure.c rdpmc.c \ 48 | session.c 49 | 50 | html: jevents.html 51 | 52 | man: jeventstmp.man 53 | perl -ne 's/Kernel Hacker.s Manual/jevents/; open(F,">" . $$1 . ".man") if /^\.TH "(.*?)"/; print F $$_' jevents.man 54 | 55 | jeventstmp.man: $(DOCFILES) 56 | ${KDOC} -man ${DOCFILES} > $@ 57 | 58 | jevents.html: $(DOCFILES) 59 | ${KDOC} -html ${DOCFILES} > $@ 60 | -------------------------------------------------------------------------------- /jevents/README.md: -------------------------------------------------------------------------------- 1 | # jevents 2 | 3 | jevents is a C library that makes it easier to access the Linux kernel perf interface from C programs. 4 | It also includes some examples that use the library. 5 | 6 | ## Features 7 | 8 | * Resolving symbolic event names using downloaded event files 9 | * Reading performance counters from ring 3 in C programs 10 | * Handling the perf ring buffer (for example to read memory addresses) 11 | 12 | For more details see the [API reference](http://halobates.de/jevents.html) 13 | 14 | ## Building 15 | 16 | cd jevents 17 | make 18 | sudo make install 19 | 20 | ## Downloading event lists 21 | 22 | Before using event lists they need to be downloaded. Use the pmu-tools 23 | event_download.py script for this. 24 | 25 | % event_download.py 26 | 27 | ## Examples 28 | 29 | * listevents: List all named perf and JSON events 30 | * showevent: Convert JSON name or perf alias to perf format and test with perf 31 | * event-rmap: Map low level perf event to named high-level event 32 | * addr: Profile a loadable test kernel with address profiling 33 | * jstat: Simple perf stat like tool with JSON event resolution.
34 | 35 | ## Initialization/Multithreading 36 | 37 | Functions accessing the JSON event data load the JSON file lazily when first 38 | used. This might result in data races when multiple threads call jevent 39 | functions. In such cases the event list can be loaded from the main thread by 40 | `read_events(NULL);`. 41 | 42 | ## self profiling 43 | 44 | Reading performance counters directly in the program without entering 45 | the kernel. 46 | 47 | This is very simplified, for a real benchmark you almost certainly 48 | want some warmup, multiple iterations, possibly context switch 49 | filtering and some filler code to avoid cache effects. 50 | 51 | ```C 52 | #include "rdpmc.h" 53 | 54 | struct rdpmc_ctx ctx; 55 | unsigned long long start, end; 56 | 57 | if (rdpmc_open(PERF_COUNT_HW_CPU_CYCLES, &ctx) < 0) ... error ... 58 | start = rdpmc_read(&ctx); 59 | ... your workload ... 60 | end = rdpmc_read(&ctx); 61 | ``` 62 | 63 | /sys/devices/cpu/rdpmc must be 1. 64 | 65 | http://halobates.de/modern-pmus-yokohama.pdf provides some 66 | additional general information on cycle counting. The techniques used 67 | with simple-pmu described there can be used with jevents too. 68 | 69 | ## Resolving named events 70 | 71 | Resolving named events to a perf event and set up reading from the perf ring buffer. 72 | 73 | First run event_download.py to download a current event list for your CPU. 74 | 75 | ```C 76 | #include "jevents.h" 77 | #include "perf-iter.h" 78 | #include 79 | #include 80 | #include 81 | 82 | struct perf_event_attr attr; 83 | if (resolve_event("cpu_clk_thread_unhalted.ref_xclk", &attr) < 0) { 84 | ... error ... 
85 | } 86 | 87 | /* You can change attr, see the perf_event_open man page for details */ 88 | 89 | ``` 90 | -------------------------------------------------------------------------------- /jevents/cache.c: -------------------------------------------------------------------------------- 1 | /* Caching layer to resolve events without re-reading them */ 2 | 3 | /* 4 | * Copyright (c) 2014, Intel Corporation 5 | * Author: Andi Kleen 6 | * All rights reserved. 7 | * 8 | * Redistribution and use in source and binary forms, with or without 9 | * modification, are permitted provided that the following conditions are met: 10 | * 11 | * 1. Redistributions of source code must retain the above copyright notice, 12 | * this list of conditions and the following disclaimer. 13 | * 14 | * 2. Redistributions in binary form must reproduce the above copyright 15 | * notice, this list of conditions and the following disclaimer in the 16 | * documentation and/or other materials provided with the distribution. 17 | * 18 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 19 | * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 20 | * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 21 | * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 22 | * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 23 | * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 25 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 26 | * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 27 | * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 28 | * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED 29 | * OF THE POSSIBILITY OF SUCH DAMAGE.
30 | */ 31 | 32 | #define _GNU_SOURCE 1 33 | #include "jevents.h" 34 | #include <stdio.h> 35 | #include <stdlib.h> 36 | #include <string.h> 37 | #include <ctype.h> 38 | #include <stdbool.h> 39 | #include <errno.h> 40 | #include <linux/perf_event.h> 41 | 42 | /** 43 | * DOC: Resolve named Intel performance events to perf 44 | * 45 | * This library resolves named Intel performance counter events 46 | * (for example INST_RETIRED.ANY) 47 | * and turns them into perf_event_attr attributes. It also 48 | * supports listing all events and resolving numeric events back to names. 49 | * 50 | * The standard workflow is for the user to call "event_download.py" 51 | * to download the current list, and then 52 | * these functions can resolve or walk names. Alternatively 53 | * a JSON event file from https://download.01.org/perfmon 54 | * can be specified through the EVENTMAP= environment variable. 55 | */ 56 | 57 | struct event { 58 | struct event *next; 59 | char *name; 60 | char *desc; 61 | char *event; 62 | char *pmu; 63 | }; 64 | 65 | #define HASHSZ 37 66 | 67 | static struct event *eventlist[HASHSZ]; 68 | static bool eventlist_init; 69 | 70 | /* Weinberg's identifier hash */ 71 | static unsigned hashfn(const char *s) 72 | { 73 | unsigned h = 0; 74 | while (*s) { 75 | int c = tolower(*s); 76 | s++; 77 | h = h * 67 + (c - 113); 78 | } 79 | return h % HASHSZ; 80 | } 81 | 82 | static int collect_events(void *data, char *name, char *event, char *desc, 83 | char *pmu) 84 | { 85 | unsigned h = hashfn(name); 86 | struct event *e = malloc(sizeof(struct event)); 87 | if (!e) 88 | exit(ENOMEM); 89 | e->next = eventlist[h]; 90 | eventlist[h] = e; 91 | e->name = strdup(name); 92 | e->desc = strdup(desc); 93 | e->event = strdup(event); 94 | e->pmu = strdup(pmu); 95 | return 0; 96 | } 97 | 98 | static void free_events(void) 99 | { 100 | struct event *e, *next; 101 | int i; 102 | for (i = 0; i < HASHSZ; i++) { 103 | for (e = eventlist[i]; e; e = next) { 104 | next = e->next; 105 | free(e->name); 106 | free(e->desc); 107 | free(e->event); 108 | free(e->pmu); 109
| free(e); 110 | } 111 | eventlist[i] = NULL; 112 | } 113 | eventlist_init = false; 114 | } 115 | 116 | /** 117 | * read_events - Read JSON performance counter event list 118 | * @fn: File name to read. NULL to choose the default location. 119 | * 120 | * Read the JSON event list fn. The other functions in the library 121 | * automatically read the default event list for the current CPU, 122 | * but calling this explicitly is useful to choose a specific one. 123 | * 124 | * This function is not thread safe and should not be called 125 | * from multiple threads in parallel. However, once it has been called, 126 | * all other functions are thread-safe. So for multi-threaded 127 | * use the main thread should call it once before starting other threads. 128 | * 129 | * Return: -1 on failure, otherwise 0. 130 | */ 131 | int read_events(const char *fn) 132 | { 133 | if (eventlist_init) { 134 | // treat subsequent read_events calls after the first as replacing the 135 | // event list 136 | free_events(); 137 | } 138 | eventlist_init = true; 139 | /* ??? free on error */ 140 | return json_events(fn, collect_events, NULL); 141 | } 142 | 143 | static struct fixed { 144 | char *name; 145 | char *event; 146 | } fixed[] = { 147 | { "inst_retired.any", "event=0xc0" }, 148 | { "cpu_clk_unhalted.thread", "event=0x3c" }, 149 | { "cpu_clk_unhalted.thread_any", "event=0x3c,any=1" }, 150 | {}, 151 | }; 152 | 153 | /* 154 | * Handle different fixed counter encodings between JSON and perf. 155 | */ 156 | static char *real_event(char *name, char *event) 157 | { 158 | int i; 159 | for (i = 0; fixed[i].name; i++) 160 | if (!strcasecmp(name, fixed[i].name)) 161 | return fixed[i].event; 162 | return event; 163 | } 164 | 165 | /** 166 | * resolve_event - Resolve named performance counter event 167 | * @name: Name of performance counter event (case-insensitive) 168 | * @attr: perf_event_attr to initialize with name. 169 | * 170 | * The attr structure is cleared initially.
171 | * The user typically has to set up attr->sample_type/read_format 172 | * _after_ this call. 173 | * Note this function is only thread-safe when read_events() has 174 | * been called first single-threaded. 175 | * Return: -1 on failure, otherwise 0. 176 | */ 177 | 178 | int resolve_event(const char *name, struct perf_event_attr *attr) 179 | { 180 | struct event *e; 181 | char *buf; 182 | int ret; 183 | unsigned h = hashfn(name); 184 | 185 | if (!eventlist_init) { 186 | if (read_events(NULL) < 0) 187 | return -1; 188 | } 189 | for (e = eventlist[h]; e; e = e->next) { 190 | if (!strcasecmp(e->name, name)) { 191 | char *event = real_event(e->name, e->event); 192 | asprintf(&buf, "%s/%s/", e->pmu, event); 193 | ret = jevent_name_to_attr(buf, attr); 194 | free(buf); 195 | return ret; 196 | } 197 | } 198 | /* Try a perf style event */ 199 | if (jevent_name_to_attr(name, attr) == 0) 200 | return 0; 201 | asprintf(&buf, "cpu/%s/", name); 202 | ret = jevent_name_to_attr(buf, attr); 203 | free(buf); 204 | if (ret == 0) 205 | return ret; 206 | return -1; 207 | } 208 | 209 | /** 210 | * walk_events - Walk all the available performance counter events 211 | * @func: Callback to call on each event. 212 | * @data: Abstract data pointer to pass to callback. 213 | * 214 | * The callback gets passed the data argument, the name of the 215 | * event, the translated event in perf form (cpu/.../) and a 216 | * description of the event. 217 | * 218 | * Return: -1 on failure, otherwise 0. 
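 *
 * Example (illustrative sketch; the callback name is hypothetical):
 *
 *	static int print_event(void *data, char *name, char *event, char *desc)
 *	{
 *		printf("%-40s %s\n", name, event);
 *		return 0;
 *	}
 *
 *	walk_events(print_event, NULL);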
219 | */ 220 | 221 | int walk_events(int (*func)(void *data, char *name, char *event, char *desc), 222 | void *data) 223 | { 224 | struct event *e; 225 | if (!eventlist_init) { 226 | if (read_events(NULL) < 0) 227 | return -1; 228 | } 229 | int i; 230 | for (i = 0; i < HASHSZ; i++) { 231 | for (e = eventlist[i]; e; e = e->next) { 232 | char *buf; 233 | asprintf(&buf, "%s/%s/", e->pmu, e->event); 234 | int ret = func(data, e->name, buf, e->desc); 235 | free(buf); 236 | if (ret) 237 | return ret; 238 | } 239 | } 240 | return 0; 241 | } 242 | 243 | /** 244 | * rmap_event - Map numeric event back to name and description. 245 | * @target: Event code to match (umask + event). 246 | * @name: Put pointer to event name into this. No need to free. 247 | * @desc: Put pointer to description into this. No need to free. Can be NULL. 248 | * 249 | * Offcore matrix events are not fully supported. 250 | * Ignores bits other than umask/event for now, so some events using cmask,inv 251 | * may be misidentified. May be slow. 252 | * Return: -1 on failure, otherwise 0. 
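 *
 * Example (illustrative; 0xc0 is the inst_retired.any event code from the
 * fixed counter table above):
 *
 *	char *name, *desc;
 *	if (rmap_event(0xc0, &name, &desc) == 0)
 *		printf("%s: %s\n", name, desc);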
253 | */ 254 | 255 | int rmap_event(unsigned target, char **name, char **desc) 256 | { 257 | struct event *e; 258 | if (!eventlist_init) { 259 | if (read_events(NULL) < 0) 260 | return -1; 261 | } 262 | int i; 263 | for (i = 0; i < HASHSZ; i++) { 264 | for (e = eventlist[i]; e; e = e->next) { 265 | // XXX should cache the numeric value 266 | char *s; 267 | unsigned event = 0, umask = 0; 268 | s = strstr(e->event, "event="); 269 | if (s) 270 | sscanf(s, "event=%x", &event); 271 | s = strstr(e->event, "umask="); 272 | if (s) 273 | sscanf(s, "umask=%x", &umask); 274 | if ((event | (umask << 8)) == (target & 0xffff)) { 275 | *name = e->name; 276 | if (desc) 277 | *desc = e->desc; 278 | return 0; 279 | } 280 | } 281 | } 282 | return -1; 283 | 284 | } 285 | -------------------------------------------------------------------------------- /jevents/cpustr.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2014, Intel Corporation 3 | * Author: Andi Kleen 4 | * All rights reserved. 5 | * 6 | * Redistribution and use in source and binary forms, with or without 7 | * modification, are permitted provided that the following conditions are met: 8 | * 9 | * 1. Redistributions of source code must retain the above copyright notice, 10 | * this list of conditions and the following disclaimer. 11 | * 12 | * 2. Redistributions in binary form must reproduce the above copyright 13 | * notice, this list of conditions and the following disclaimer in the 14 | * documentation and/or other materials provided with the distribution. 15 | * 16 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 17 | * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 18 | * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 19 | * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE 20 | * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 21 | * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 22 | * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 23 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 24 | * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 25 | * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 26 | * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED 27 | * OF THE POSSIBILITY OF SUCH DAMAGE. 28 | */ 29 | 30 | #define _GNU_SOURCE 1 31 | #include <stdio.h> 32 | #include <stdlib.h> 33 | #include "jevents.h" 34 | 35 | /** 36 | * get_cpu_str - Return string describing the current CPU or NULL. 37 | * Needs to be freed by caller. 38 | * 39 | * Used to store JSON event lists in the cache directory. 40 | */ 41 | char *get_cpu_str(void) 42 | { 43 | return get_cpu_str_type("-core", NULL); 44 | } 45 | 46 | /** 47 | * get_cpu_str_type - Return string describing the current CPU for type or NULL. 48 | * @type: "-core" or "-uncore" 49 | * @idstr_step: if non-NULL, the id string including the stepping is written here. 50 | * Both result and idstr_step (if non-NULL) need to be freed by 51 | * caller.
52 | */ 53 | char *get_cpu_str_type(char *type, char **idstr_step) 54 | { 55 | char *line = NULL; 56 | size_t llen = 0; 57 | int found = 0, n; 58 | char vendor[30]; 59 | int model = 0, fam = 0, step = 0; 60 | char *res = NULL; 61 | FILE *f = fopen("/proc/cpuinfo", "r"); 62 | 63 | if (!f) 64 | return NULL; 65 | while (getline(&line, &llen, f) > 0) { 66 | if (sscanf(line, "vendor_id : %29s", vendor) == 1) 67 | found++; 68 | else if (sscanf(line, "model : %d", &model) == 1) 69 | found++; 70 | else if (sscanf(line, "cpu family : %d", &fam) == 1) 71 | found++; 72 | else if (sscanf(line, "stepping : %d", &step) == 1) 73 | found++; 74 | if (found == 4) { 75 | if (idstr_step) 76 | asprintf(idstr_step, "%s-%d-%X-%X%s", vendor, fam, 77 | model, step, type); 78 | n = asprintf(&res, "%s-%d-%X%s", vendor, fam, model, 79 | type); 80 | if (n < 0) 81 | res = NULL; 82 | break; 83 | } 84 | } 85 | free(line); 86 | fclose(f); 87 | return res; 88 | } 89 | -------------------------------------------------------------------------------- /jevents/event-rmap.c: -------------------------------------------------------------------------------- 1 | #include "jevents.h" 2 | #include <stdio.h> 3 | #include <stdlib.h> 4 | 5 | int main(int ac, char **av) 6 | { 7 | while (*++av) { 8 | unsigned event = strtoul(*av, NULL, 0); 9 | char *name, *desc; 10 | if (rmap_event(event, &name, &desc) == 0) 11 | printf("%x: %s : %s\n", event, name, desc); 12 | else 13 | printf("%x not found\n", event); 14 | } 15 | return 0; 16 | } 17 | -------------------------------------------------------------------------------- /jevents/examples/Makefile: -------------------------------------------------------------------------------- 1 | # build jevents first 2 | CFLAGS := -g -Wall -O2 -I .. -Wno-unused-result 3 | CXXFLAGS := -g -Wall -O2 -fPIC 4 | LDFLAGS := -L ..
5 | LDLIBS = -ljevents 6 | 7 | all: addr rtest rtest2 rtest3 jestat 8 | 9 | # no deps on the includes 10 | 11 | ADDR_OBJ := addr.o hist.o cpu.o 12 | 13 | addr: ${ADDR_OBJ} ../libjevents.a 14 | 15 | addr: LDLIBS += -lstdc++ -ldl 16 | 17 | rtest2: LDLIBS += -lm 18 | 19 | rtest: rtest.o ../libjevents.a 20 | 21 | rtest2: rtest2.o ../libjevents.a 22 | 23 | rtest3: rtest3.o ../libjevents.a 24 | 25 | jestat: jestat.o ../libjevents.a 26 | 27 | clean: 28 | rm -f addr ${ADDR_OBJ} jestat jestat.o 29 | rm -f rtest3 rtest3.o rtest2 rtest2.o rtest rtest.o 30 | -------------------------------------------------------------------------------- /jevents/examples/addr.c: -------------------------------------------------------------------------------- 1 | /* 2 | * perf address sampling self profiling demo. 3 | * Requires a 3.10+ kernel with PERF_SAMPLE_ADDR support and a supported Intel CPU. 4 | * 5 | * Copyright (c) 2013 Intel Corporation 6 | * Author: Andi Kleen 7 | * 8 | * Redistribution and use in source and binary forms, with or without 9 | * modification, are permitted provided that: (1) source code distributions 10 | * retain the above copyright notice and this paragraph in its entirety, (2) 11 | * distributions including binary code include the above copyright notice and 12 | * this paragraph in its entirety in the documentation or other materials 13 | * provided with the distribution 14 | * 15 | * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED 16 | * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF 17 | * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. 
18 | */ 19 | #include <stdio.h> 20 | #include <stdlib.h> 21 | #include <string.h> 22 | #include <stdbool.h> 23 | #include <stdint.h> 24 | #include <dlfcn.h> 25 | #include <linux/perf_event.h> 26 | 27 | 28 | 29 | 30 | #include "hist.h" 31 | #include "perf-iter.h" 32 | #include "util.h" 33 | #include "cpu.h" 34 | 35 | /* 2^n size of event ring buffer (in pages) */ 36 | #define BUF_SIZE_SHIFT 8 37 | 38 | #define SIZE ((100*MB)/sizeof(float)) 39 | 40 | float *x, *y; 41 | 42 | void simple_test_init(void) 43 | { 44 | y = calloc(SIZE, sizeof(float)); 45 | x = calloc(SIZE, sizeof(float)); 46 | int i; 47 | for (i = 0; i < SIZE; i++) { 48 | y[i] = 1.0; 49 | x[i] = 2.0; 50 | } 51 | 52 | printf("test area %p-%p, %p-%p\n", x, x+SIZE, y, y+SIZE); 53 | } 54 | 55 | void simple_test_load(void) 56 | { 57 | int i; 58 | int j; 59 | for (j = 0; j < 20; j++) { 60 | for (i = 0; i < SIZE; i++) { 61 | y[i] = y[i] * x[i]; 62 | mb(); /* Don't optimize the loop away */ 63 | } 64 | } 65 | } 66 | 67 | void (*test_init)(void) = simple_test_init; 68 | void (*test_load)(void) = simple_test_load; 69 | 70 | void gen_hist(char *name, struct perf_fd *pfd) 71 | { 72 | struct perf_iter iter; 73 | struct hist *h = init_hist(); 74 | 75 | perf_iter_init(&iter, pfd); 76 | int samples = 0, others = 0, throttled = 0, skipped = 0; 77 | u64 lost = 0; 78 | while (!perf_iter_finished(&iter)) { 79 | char buffer[64]; 80 | struct perf_event_header *hdr = perf_buffer_read(&iter, buffer, 64); 81 | 82 | if (!hdr) { 83 | skipped++; 84 | continue; 85 | } 86 | 87 | if (hdr->type != PERF_RECORD_SAMPLE) { 88 | if (hdr->type == PERF_RECORD_THROTTLE) 89 | throttled++; 90 | else if (hdr->type == PERF_RECORD_LOST) 91 | lost += perf_hdr_payload(hdr)[1]; 92 | else 93 | others++; 94 | continue; 95 | } 96 | samples++; 97 | if (hdr->size != 16) { 98 | printf("unexpected sample size %d\n", hdr->size); 99 | continue; 100 | } 101 | 102 | u64 val = perf_hdr_payload(hdr)[0]; 103 | /* Filter out kernel samples, which can happen due to OOO skid */ 104 | if ((long long)val < 0) 105 | continue; 106
| hist_add(h, val); 107 | } 108 | perf_iter_continue(&iter); 109 | 110 | printf("%s: %d samples, %d others, %llu lost, %d throttled, %d skipped\n", 111 | name, 112 | samples, 113 | others, 114 | lost, 115 | throttled, 116 | skipped); 117 | hist_print(h, 0.001); 118 | free_hist(h); 119 | } 120 | 121 | int main(int ac, char **av) 122 | { 123 | bool cycles_only = false; 124 | 125 | /* Set up perf for loads */ 126 | struct perf_event_attr attr = { 127 | .type = PERF_TYPE_RAW, 128 | .size = PERF_ATTR_SIZE_VER0, 129 | .sample_type = PERF_SAMPLE_ADDR, 130 | .sample_period = 10000, /* Period */ 131 | .exclude_kernel = 1, 132 | .precise_ip = 1, /* Enable PEBS */ 133 | .config1 = 3, /* Load Latency threshold */ 134 | .config = mem_loads_event(), /* Event */ 135 | .disabled = 1, 136 | }; 137 | 138 | if (attr.config == -1) { 139 | printf("Unknown CPU model\n"); 140 | exit(1); 141 | } 142 | 143 | if (av[1] && !strcmp(av[1], "cycles")) { 144 | attr.sample_type = PERF_SAMPLE_IP; 145 | attr.precise_ip = 0; 146 | attr.config = 0x3c; 147 | cycles_only = true; 148 | av--; 149 | } 150 | 151 | if (av[1]) { 152 | void *test_obj; 153 | test_obj = dlopen(av[1], RTLD_NOW); 154 | if (!test_obj) { 155 | fprintf(stderr, "Cannot load %s: %s\n", av[1], dlerror()); 156 | exit(1); 157 | } 158 | test_init = dlsym(test_obj, "test_init"); 159 | test_load = dlsym(test_obj, "test_load"); 160 | if (!test_init || !test_load) { 161 | fprintf(stderr, "%s missing test_init or test_load symbols: %s\n", 162 | av[1], dlerror()); 163 | exit(1); 164 | } 165 | } 166 | 167 | struct perf_fd loads, stores; 168 | if (perf_fd_open(&loads, &attr, BUF_SIZE_SHIFT) < 0) 169 | err("perf event init loads"); 170 | printf("loads event %llx\n", attr.config); 171 | 172 | bool have_stores = false; 173 | if (0 && !cycles_only) { 174 | attr.config = mem_stores_event(); 175 | attr.config1 = 0; 176 | if (perf_fd_open(&stores, &attr, BUF_SIZE_SHIFT) < 0) 177 | err("perf event init stores"); 178 | printf("stores event %llx\n", 
attr.config); 179 | have_stores = true; 180 | } 181 | 182 | test_init(); 183 | 184 | /* Run measurement */ 185 | 186 | if (perf_enable(&loads) < 0) 187 | err("PERF_EVENT_IOC_ENABLE"); 188 | if (0) 189 | perf_enable(&stores); 190 | 191 | test_load(); 192 | 193 | if (perf_disable(&loads) < 0) 194 | err("PERF_EVENT_IOC_DISABLE"); 195 | if (0) 196 | perf_disable(&stores); 197 | 198 | gen_hist("loads", &loads); 199 | perf_fd_close(&loads); 200 | if (have_stores) { 201 | gen_hist("stores", &stores); 202 | perf_fd_close(&stores); 203 | } 204 | 205 | return 0; 206 | } 207 | -------------------------------------------------------------------------------- /jevents/examples/cpu.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2013 Intel Corporation 3 | * Author: Andi Kleen 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that: (1) source code distributions 7 | * retain the above copyright notice and this paragraph in its entirety, (2) 8 | * distributions including binary code include the above copyright notice and 9 | * this paragraph in its entirety in the documentation or other materials 10 | * provided with the distribution 11 | * 12 | * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED 13 | * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF 14 | * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. 
15 | */ 16 | 17 | /* CPU detection and event tables */ 18 | #include <stdio.h> 19 | #include <stdlib.h> 20 | #include <stdbool.h> 21 | #include <cpuid.h> 22 | 23 | #include "cpu.h" 24 | 25 | struct cpu_events { 26 | int *models; 27 | unsigned mem_stores; 28 | unsigned mem_loads; 29 | }; 30 | 31 | #define MEM_LOADS_SNB 0x1cd /* MEM_TRANS_RETIRED.LOAD_LATENCY */ 32 | #define MEM_STORES_SNB 0x2cd /* MEM_TRANS_RETIRED.PRECISE_STORES */ 33 | static int snb_models[] = { 42, 45, 58, 62, 0 }; 34 | 35 | #define MEM_LOADS_HSW MEM_LOADS_SNB 36 | #define MEM_STORES_HSW 0x82d0 37 | static int hsw_models[] = { 60, 70, 71, 63, 61, 0 }; 38 | 39 | /* Nehalem and Westmere */ 40 | #define MEM_LOADS_NHM 0x100b /* MEM_INST_RETIRED.LOAD_LATENCY */ 41 | #define MEM_STORES_NHM -1 /* not supported */ 42 | 43 | static int nhm_models[] = { 26, 30, 46, 37, 44, 47, 0 }; 44 | 45 | struct cpu_events events[] = { 46 | { snb_models, MEM_STORES_SNB, MEM_LOADS_SNB }, 47 | { nhm_models, MEM_STORES_NHM, MEM_LOADS_NHM }, 48 | { hsw_models, MEM_STORES_HSW, MEM_LOADS_HSW }, 49 | {} 50 | }; 51 | 52 | static unsigned get_cpu_model(void) 53 | { 54 | unsigned sig; 55 | if (__get_cpuid_max(0, &sig) >= 1 && sig == *(int *)"Genu") { 56 | unsigned a, b, c, d; 57 | __cpuid(1, a, b, c, d); 58 | unsigned family = (a >> 8) & 0xf; 59 | if (family == 6) 60 | return ((a >> 4) & 0xf) + (((a >> 16) & 0xf) << 4); 61 | } 62 | return 0; 63 | } 64 | 65 | static bool match_cpu_model(int mod, int *models) 66 | { 67 | int i; 68 | for (i = 0; models[i]; i++) 69 | if (models[i] == mod) 70 | return true; 71 | return false; 72 | } 73 | 74 | /** 75 | * mem_loads_event - Return precise mem loads event for current CPU. 76 | * This is an event which supports load address monitoring. 77 | * Return: raw event, can be put into perf_event_attr->config. 78 | * -1 on error.
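 *
 * Example (illustrative):
 *	struct perf_event_attr attr = { .type = PERF_TYPE_RAW };
 *	attr.config = mem_loads_event();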
79 | */ 80 | 81 | unsigned mem_loads_event(void) 82 | { 83 | int mod = get_cpu_model(); 84 | int i; 85 | for (i = 0; events[i].models; i++) 86 | if (match_cpu_model(mod, events[i].models)) 87 | return events[i].mem_loads; 88 | return -1; 89 | } 90 | 91 | /** 92 | * mem_stores_event - Return precise mem stores event for current CPU. 93 | * This is an event which supports store address monitoring. 94 | * Return: raw event, can be put into perf_event_attr->config. 95 | * -1 on error. 96 | */ 97 | unsigned mem_stores_event(void) 98 | { 99 | int mod = get_cpu_model(); 100 | int i; 101 | for (i = 0; events[i].models; i++) 102 | if (match_cpu_model(mod, events[i].models)) 103 | return events[i].mem_stores; 104 | return -1; 105 | } 106 | -------------------------------------------------------------------------------- /jevents/examples/cpu.h: -------------------------------------------------------------------------------- 1 | unsigned mem_loads_event(void); 2 | unsigned mem_stores_event(void); 3 | -------------------------------------------------------------------------------- /jevents/examples/hist.cc: -------------------------------------------------------------------------------- 1 | // STL based histogram 2 | #include <map> 3 | #include <queue> 4 | #include <stdio.h> 5 | #include <stdint.h> 6 | #include "hist.h" 7 | 8 | using namespace std; 9 | 10 | extern "C" { 11 | 12 | typedef map<uint64_t, uint64_t> hist_type; 13 | 14 | struct hist { 15 | hist_type hist; 16 | uint64_t total; 17 | }; 18 | 19 | hist *init_hist() 20 | { 21 | struct hist *h = new hist; 22 | h->total = 0; 23 | return h; 24 | } 25 | 26 | void hist_add(hist *h, uint64_t val) 27 | { 28 | h->hist[val]++; 29 | h->total++; 30 | } 31 | 32 | void hist_print(hist *h, double min_percent) 33 | { 34 | unsigned long long below_thresh = 0; 35 | typedef pair<uint64_t, uint64_t> val_pair; 36 | priority_queue<val_pair> q; 37 | 38 | for (hist_type::iterator it = h->hist.begin(); it != h->hist.end(); it++) { 39 | double percent = (double)(it->second) / (double)h->total; 40 | if (percent >= min_percent) { 41
| val_pair p(it->second, it->first); 42 | q.push(p); 43 | } else 44 | below_thresh += it->second; 45 | } 46 | printf("%11s %16s %16s\n", "PERCENT", "ADDR", "SAMPLES"); 47 | while (!q.empty()) { 48 | val_pair p = q.top(); 49 | printf("%10.2f%% %16llx %16llu\n", 50 | (p.first / (double)h->total) * 100.0, 51 | (unsigned long long)p.second, 52 | (unsigned long long)p.first); 53 | q.pop(); 54 | } 55 | printf("%llu below threshold\n", below_thresh); 56 | } 57 | 58 | void free_hist(hist *h) 59 | { 60 | delete h; 61 | } 62 | 63 | } 64 | -------------------------------------------------------------------------------- /jevents/examples/hist.h: -------------------------------------------------------------------------------- 1 | 2 | #ifdef __cplusplus 3 | extern "C" { 4 | #endif 5 | 6 | #include <stdint.h> 7 | 8 | struct hist; 9 | 10 | struct hist *init_hist(void); 11 | void hist_add(struct hist *h, uint64_t); 12 | void hist_print(struct hist *h, double min_percent); 13 | void free_hist(struct hist *); 14 | 15 | #ifdef __cplusplus 16 | } 17 | #endif 18 | -------------------------------------------------------------------------------- /jevents/examples/jestat.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2015, Intel Corporation 3 | * Author: Andi Kleen 4 | * All rights reserved. 5 | * 6 | * Redistribution and use in source and binary forms, with or without 7 | * modification, are permitted provided that the following conditions are met: 8 | * 9 | * 1. Redistributions of source code must retain the above copyright notice, 10 | * this list of conditions and the following disclaimer. 11 | * 12 | * 2. Redistributions in binary form must reproduce the above copyright 13 | * notice, this list of conditions and the following disclaimer in the 14 | * documentation and/or other materials provided with the distribution.
15 | * 16 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 17 | * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 18 | * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 19 | * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 20 | * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 21 | * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 22 | * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 23 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 24 | * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 25 | * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 26 | * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED 27 | * OF THE POSSIBILITY OF SUCH DAMAGE. 28 | */ 29 | 30 | /* Poor man's perf stat using jevents */ 31 | /* jstat [-a] [-p pid] [-e events] program */ 32 | /* Supports named events if downloaded first (w/ event_download.py) */ 33 | /* Run listevents to show the available events */ 34 | 35 | #include <stdio.h> 36 | #include <stdlib.h> 37 | #include <stdint.h> 38 | #include <stdbool.h> 39 | #include <getopt.h> 40 | #include <locale.h> 41 | #include <signal.h> 42 | #include <unistd.h> 43 | #include <sys/types.h> 44 | #include <sys/wait.h> 45 | #include "jevents.h" 46 | #include "jsession.h" 47 | 48 | #define err(x) perror(x), exit(1) 49 | #define PAIR(x) x, sizeof(x) - 1 50 | 51 | void print_data(struct eventlist *el) 52 | { 53 | struct event *e; 54 | int i; 55 | 56 | for (e = el->eventlist; e; e = e->next) { 57 | uint64_t v = 0; 58 | for (i = 0; i < el->num_cpus; i++) 59 | v += event_scaled_value(e, i); 60 | printf("%-30s %'10lu\n", e->event, v); 61 | } 62 | } 63 | 64 | static struct option opts[] = { 65 | { "all-cpus", no_argument, 0, 'a' }, 66 | { "events", required_argument, 0, 'e'}, 67 | {}, 68 | }; 69 | 70 | void usage(void) 71 | { 72 | fprintf(stderr, "Usage: jstat [-a] [-e events] program\n" 73 | "--all -a Measure global system\n" 74 | "-e --events list Comma-separated list of events
to measure. Use {} for groups\n" 75 | "Run event_download.py once first to use symbolic events\n"); 76 | exit(1); 77 | } 78 | 79 | void sigint(int sig) {} 80 | 81 | int main(int ac, char **av) 82 | { 83 | char *events = "instructions,cpu-cycles,cache-misses,cache-references"; 84 | int opt; 85 | int child_pipe[2]; 86 | struct eventlist *el; 87 | bool measure_all = false; 88 | int measure_pid = -1; 89 | int child_pid; 90 | 91 | setlocale(LC_NUMERIC, ""); 92 | el = alloc_eventlist(); 93 | 94 | while ((opt = getopt_long(ac, av, "ae:p:", opts, NULL)) != -1) { 95 | switch (opt) { 96 | case 'e': 97 | if (parse_events(el, optarg) < 0) 98 | exit(1); 99 | events = NULL; 100 | break; 101 | case 'a': 102 | measure_all = true; 103 | break; 104 | default: 105 | usage(); 106 | } 107 | } 108 | if (av[optind] == NULL && !measure_all) { 109 | fprintf(stderr, "Specify command or -a\n"); 110 | exit(1); 111 | } 112 | if (events && parse_events(el, events) < 0) 113 | exit(1); 114 | pipe(child_pipe); 115 | signal(SIGCHLD, SIG_IGN); 116 | child_pid = measure_pid = fork(); 117 | if (measure_pid < 0) 118 | err("fork"); 119 | if (measure_pid == 0) { 120 | char buf; 121 | /* Wait for events to be set up */ 122 | read(child_pipe[0], &buf, 1); 123 | if (av[optind] == NULL) { 124 | pause(); 125 | _exit(0); 126 | } 127 | execvp(av[optind], av + optind); 128 | write(2, PAIR("Cannot execute program\n")); 129 | _exit(1); 130 | } 131 | if (setup_events(el, measure_all, measure_pid) < 0) 132 | exit(1); 133 | signal(SIGINT, sigint); 134 | if (child_pid >= 0) { 135 | write(child_pipe[1], "x", 1); 136 | waitpid(measure_pid, NULL, 0); 137 | } else { 138 | pause(); 139 | } 140 | read_all_events(el); 141 | print_data(el); 142 | return 0; 143 | } 144 | -------------------------------------------------------------------------------- /jevents/examples/rtest.c: -------------------------------------------------------------------------------- 1 | /* Demonstrate self profiling for context switches */ 2 | #include 
3 | #include <stdlib.h> 4 | #include <sys/time.h> 5 | #include "rdpmc.h" 6 | 7 | #define HW_INTERRUPTS 0x1cb 8 | 9 | typedef unsigned long long u64; 10 | 11 | u64 get_time(void) 12 | { 13 | struct timeval tv; 14 | gettimeofday(&tv, NULL); 15 | return (u64)tv.tv_sec * 1000000 + tv.tv_usec; 16 | } 17 | 18 | int main(int ac, char **av) 19 | { 20 | int i; 21 | int cswitch = 0; 22 | struct rdpmc_ctx ctx; 23 | int iter = 10000; 24 | 25 | if (av[1]) 26 | iter = atoi(av[1]); 27 | 28 | if (rdpmc_open(HW_INTERRUPTS, &ctx) < 0) 29 | exit(1); 30 | 31 | u64 t0 = get_time(); 32 | u64 prev = rdpmc_read(&ctx); 33 | for (i = 0; i < iter; i++) { 34 | u64 n = rdpmc_read(&ctx); 35 | if (n != prev) { 36 | cswitch++; 37 | prev = n; 38 | } 39 | } 40 | 41 | u64 t1 = get_time(); 42 | 43 | printf("%d interrupts, %llu usec duration\n", cswitch, t1-t0); 44 | 45 | rdpmc_close(&ctx); 46 | return 0; 47 | } 48 | -------------------------------------------------------------------------------- /jevents/examples/rtest2.c: -------------------------------------------------------------------------------- 1 | /* Measure a thousand sins */ 2 | #include <stdio.h> 3 | #include <stdlib.h> 4 | #include <math.h> 5 | #include <linux/perf_event.h> 6 | #include "interrupts.h" 7 | #include "rdpmc.h" 8 | 9 | /* Requires an Intel Sandy or Ivy Bridge CPU for the interrupt test. 10 | On others it may loop forever, unless you disable the interrupt test.
11 | This is not a realistic test of real performance because it's too 12 | predictable for cache and branch predictors; 13 | see http://halobates.de/blog/p/227 */ 14 | 15 | #define ITER 1000 16 | typedef unsigned long long u64; 17 | 18 | volatile double var = 10.0; 19 | volatile double var2; 20 | 21 | int main(void) 22 | { 23 | struct rdpmc_ctx ctx; 24 | int warmup = 0; 25 | 26 | if (rdpmc_open(PERF_COUNT_HW_CPU_CYCLES, &ctx) < 0) 27 | exit(1); 28 | interrupts_init(); 29 | for (;;) { 30 | int i; 31 | u64 start_int; 32 | u64 a, b; 33 | 34 | start_int = get_interrupts(); 35 | a = rdpmc_read(&ctx); 36 | for (i = 0; i < ITER; i++) 37 | var2 += sin(var); 38 | b = rdpmc_read(&ctx); 39 | if (get_interrupts() == start_int && warmup > 0) { 40 | printf("%u sin() took %llu cycles avg\n", ITER, (b-a)/ITER); 41 | break; 42 | } 43 | warmup++; 44 | } 45 | interrupts_exit(); 46 | rdpmc_close(&ctx); 47 | return 0; 48 | } 49 | -------------------------------------------------------------------------------- /jevents/examples/rtest3.c: -------------------------------------------------------------------------------- 1 | 2 | #include <stdio.h> 3 | #include <stdlib.h> 4 | #include <signal.h> 5 | #include <sys/time.h> 6 | #include "rdpmc.h" 7 | 8 | typedef unsigned long long u64; 9 | typedef long long s64; 10 | 11 | u64 get_time(void) 12 | { 13 | struct timeval tv; 14 | gettimeofday(&tv, NULL); 15 | return (u64)tv.tv_sec * 1000000 + tv.tv_usec; 16 | } 17 | 18 | volatile int interrupted; 19 | 20 | void stop(int sig) 21 | { 22 | interrupted = 1; 23 | } 24 | 25 | int main(int ac, char **av) 26 | { 27 | int i; 28 | struct rdpmc_ctx ctx; 29 | int thresh = 10000; 30 | 31 | if (av[1]) 32 | thresh = atoi(av[1]); 33 | 34 | if (rdpmc_open(0, &ctx) < 0) 35 | exit(1); 36 | 37 | signal(SIGINT, stop); 38 | 39 | printf("Press Ctrl-C to stop\n"); 40 | 41 | u64 prev = rdpmc_read(&ctx); 42 | 43 | i = 0; 44 | while (!interrupted) { 45 | u64 next = rdpmc_read(&ctx); 46 | s64 delta = next - prev; 47 | 48 | if (delta > thresh) 49 | printf("%d: %lld\n",
i, delta); 50 | 51 | prev = next; 52 | i++; 53 | } 54 | 55 | rdpmc_close(&ctx); 56 | return 0; 57 | } 58 | -------------------------------------------------------------------------------- /jevents/interrupts.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2012,2013 Intel Corporation 3 | * Author: Andi Kleen 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that: (1) source code distributions 7 | * retain the above copyright notice and this paragraph in its entirety, (2) 8 | * distributions including binary code include the above copyright notice and 9 | * this paragraph in its entirety in the documentation or other materials 10 | * provided with the distribution 11 | * 12 | * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED 13 | * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF 14 | * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. 15 | */ 16 | 17 | /** DOC: Account for interrupts on Intel Core/Xeon systems 18 | * 19 | * This is useful for micro benchmarks to filter out measurement 20 | * samples that are disturbed by a context switch caused by OS 21 | * noise. 22 | * 23 | * Requires a Linux 3.3+ kernel 24 | */ 25 | #include "rdpmc.h" 26 | #include "interrupts.h" 27 | 28 | /* Intel Sandy Bridge */ 29 | #define HW_INTERRUPTS 0x1cb 30 | 31 | static __thread int int_ok = -1; 32 | static __thread struct rdpmc_ctx int_ctx; 33 | 34 | /** 35 | * interrupts_init - Initialize interrupt counter per thread 36 | * 37 | * Must be called for each application thread. 38 | */ 39 | void interrupts_init(void) 40 | { 41 | int_ok = rdpmc_open(HW_INTERRUPTS, &int_ctx); 42 | } 43 | 44 | /** 45 | * interrupts_exit - Free interrupt counter per thread. 46 | * 47 | * Must be called for each application thread. 
48 | */ 49 | void interrupts_exit(void) 50 | { 51 | if (int_ok >= 0) 52 | rdpmc_close(&int_ctx); 53 | } 54 | 55 | /** 56 | * get_interrupts - get current interrupt counter. 57 | * 58 | * Get the current hardware interrupt count. If the number changed 59 | * during a measurement period, some sort of context switch occurred. 60 | * The sample for that period should be discarded. 61 | * This returns absolute numbers. 62 | */ 63 | unsigned long long get_interrupts(void) 64 | { 65 | if (int_ok >= 0) 66 | return rdpmc_read(&int_ctx); 67 | return 0; 68 | } 69 | -------------------------------------------------------------------------------- /jevents/interrupts.h: -------------------------------------------------------------------------------- 1 | 2 | /* 3 | * Copyright (c) 2012,2013 Intel Corporation 4 | * Author: Andi Kleen 5 | * 6 | * Redistribution and use in source and binary forms, with or without 7 | * modification, are permitted provided that: (1) source code distributions 8 | * retain the above copyright notice and this paragraph in its entirety, (2) 9 | * distributions including binary code include the above copyright notice and 10 | * this paragraph in its entirety in the documentation or other materials 11 | * provided with the distribution 12 | * 13 | * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED 14 | * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF 15 | * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
16 | */ 17 | 18 | #ifndef INTERRUPTS_H 19 | #define INTERRUPTS_H 1 20 | 21 | #ifdef __cplusplus 22 | extern "C" { 23 | #endif 24 | 25 | void interrupts_init(void); 26 | void interrupts_exit(void); 27 | unsigned long long get_interrupts(void); 28 | 29 | #ifdef __cplusplus 30 | } 31 | #endif 32 | 33 | #endif 34 | -------------------------------------------------------------------------------- /jevents/jevents-internal.h: -------------------------------------------------------------------------------- 1 | /* 2 | * jevents-internal.h 3 | * 4 | * Things that jevents internal implementation should call, but that you probably 5 | * shouldn't mess with. 6 | */ 7 | 8 | #ifndef JEVENTS_INTERNAL_H_ 9 | #define JEVENTS_INTERNAL_H_ 10 | 11 | 12 | void set_last_error(const char *format, ...); 13 | 14 | 15 | #endif /* JEVENTS_INTERNAL_H_ */ 16 | -------------------------------------------------------------------------------- /jevents/jevents.c: -------------------------------------------------------------------------------- 1 | /* Parse event JSON files */ 2 | 3 | /* 4 | * Copyright (c) 2014, Intel Corporation 5 | * All rights reserved. 6 | * 7 | * Redistribution and use in source and binary forms, with or without 8 | * modification, are permitted provided that the following conditions are met: 9 | * 10 | * 1. Redistributions of source code must retain the above copyright notice, 11 | * this list of conditions and the following disclaimer. 12 | * 13 | * 2. Redistributions in binary form must reproduce the above copyright 14 | * notice, this list of conditions and the following disclaimer in the 15 | * documentation and/or other materials provided with the distribution. 16 | * 17 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 18 | * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 19 | * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 20 | * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE 21 | * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 22 | * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 23 | * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 24 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 25 | * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 26 | * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 27 | * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED 28 | * OF THE POSSIBILITY OF SUCH DAMAGE. 29 | */ 30 | 31 | #define _GNU_SOURCE 1 32 | #include 33 | #include 34 | #include 35 | #include 36 | #include 37 | #include 38 | #include 39 | #include 40 | #include "jsmn.h" 41 | #include "json.h" 42 | #include "jevents.h" 43 | #include "jevents-internal.h" 44 | 45 | static const char *json_default_name(char *type) 46 | { 47 | char *cache = NULL; 48 | char *idstr_step = NULL; 49 | char *idstr = get_cpu_str_type(type, &idstr_step); 50 | char *res = NULL; 51 | char *home = NULL; 52 | char *emap; 53 | 54 | emap = getenv("EVENTMAP"); 55 | if (emap) { 56 | if (access(emap, R_OK) == 0) 57 | return emap; 58 | free(idstr); 59 | if (asprintf(&idstr, "%s%s", emap, type) < 0) 60 | goto out; 61 | } 62 | 63 | cache = getenv("XDG_CACHE_HOME"); 64 | if (!cache) { 65 | home = getenv("HOME"); 66 | if (!home || asprintf(&cache, "%s/.cache", home) < 0) 67 | goto out; 68 | } 69 | 70 | if (access(cache, R_OK) == -1) { 71 | set_last_error("Unable to open user .cache directory at %s", cache); 72 | goto out; 73 | } 74 | out: 75 | free(cache); 76 | free(idstr); 77 | free(idstr_step); 78 | return res; 79 | } 80 | 81 | static void addfield(char *map, char **dst, const char *sep, 82 | const char *a, jsmntok_t *bt) 83 | { 84 | unsigned len = strlen(a) + 1 + strlen(sep); 85 | int olen = *dst ? strlen(*dst) : 0; 86 | int blen = bt ? 
json_len(bt) : 0; 87 | 88 | *dst = realloc(*dst, len + olen + blen); 89 | if (!*dst) 90 | exit(ENOMEM); 91 | if (!olen) 92 | *(*dst) = 0; 93 | else 94 | strcat(*dst, sep); 95 | strcat(*dst, a); 96 | if (bt) 97 | strncat(*dst, map + bt->start, blen); 98 | } 99 | 100 | static void fixname(char *s) 101 | { 102 | for (; *s; s++) 103 | *s = tolower(*s); 104 | } 105 | 106 | static void fixdesc(char *s) 107 | { 108 | char *e; 109 | 110 | if (!s) 111 | return; 112 | 113 | e = s + strlen(s); 114 | 115 | /* Remove trailing dots that look ugly in perf list */ 116 | --e; 117 | while (e >= s && isspace(*e)) 118 | --e; 119 | if (e >= s && *e == '.') 120 | *e = 0; 121 | } 122 | 123 | static struct map { 124 | const char *json; 125 | const char *perf; 126 | } unit_to_pmu[] = { 127 | { "CBO", "cbox" }, 128 | { "QPI LL", "qpi" }, 129 | { "SBO", "sbox" }, 130 | { "IMPH-U", "cbox" }, 131 | /* FIXME: need to convert event/umask like ocperf for ncu */ 132 | { "NCU", "cbox" }, 133 | {} 134 | }; 135 | 136 | static const char *field_to_perf(struct map *table, char *map, jsmntok_t *val) 137 | { 138 | int i; 139 | 140 | for (i = 0; table[i].json; i++) { 141 | if (json_streq(map, val, table[i].json)) 142 | return table[i].perf; 143 | } 144 | return NULL; 145 | } 146 | 147 | #define EXPECT(e, t, m) do { if (!(e)) { \ 148 | jsmntok_t *loc = (t); \ 149 | if (!(t)->start && (t) > tokens) \ 150 | loc = (t) - 1; \ 151 | fprintf(stderr, "%s:%d: " m ", got %s\n", fn, \ 152 | json_line(map, loc), \ 153 | json_name(t)); \ 154 | goto out_free; \ 155 | } } while (0) 156 | 157 | static struct msrmap { 158 | const char *num; 159 | const char *pname; 160 | } msrmap[] = { 161 | { "0x3F6", "ldlat=" }, 162 | { "0x1A6", "offcore_rsp=" }, 163 | { "0x1A7", "offcore_rsp=" }, 164 | { "0x3F7", "frontend="}, 165 | { NULL, NULL } 166 | }; 167 | 168 | static struct field { 169 | const char *field; 170 | const char *kernel; 171 | } fields[] = { 172 | { "UMask", "umask=" }, 173 | { "CounterMask", "cmask=" }, 174 | { "Invert",
"inv=" }, 175 | { "AnyThread", "any=" }, 176 | { "EdgeDetect", "edge=" }, 177 | { "SampleAfterValue", "period=" }, 178 | { NULL, NULL } 179 | }; 180 | 181 | static void cut_comma(char *map, jsmntok_t *newval) 182 | { 183 | int i; 184 | 185 | /* Cut off everything after comma */ 186 | for (i = newval->start; i < newval->end; i++) { 187 | if (map[i] == ',') 188 | newval->end = i; 189 | } 190 | } 191 | 192 | static int match_field(char *map, jsmntok_t *field, int nz, 193 | char **event, jsmntok_t *val) 194 | { 195 | struct field *f; 196 | jsmntok_t newval = *val; 197 | 198 | for (f = fields; f->field; f++) 199 | if (json_streq(map, field, f->field) && nz) { 200 | if (json_streq(map, val, "0x00") || 201 | json_streq(map, val, "0x0")) 202 | return 1; 203 | cut_comma(map, &newval); 204 | addfield(map, event, ",", f->kernel, &newval); 205 | return 1; 206 | } 207 | return 0; 208 | } 209 | 210 | static struct msrmap *lookup_msr(char *map, jsmntok_t *val) 211 | { 212 | jsmntok_t newval = *val; 213 | static bool warned = false; 214 | int i; 215 | 216 | cut_comma(map, &newval); 217 | for (i = 0; msrmap[i].num; i++) 218 | if (json_streq(map, &newval, msrmap[i].num)) 219 | return &msrmap[i]; 220 | if (!warned) { 221 | warned = true; 222 | fprintf(stderr, "Unknown MSR in event file %.*s\n", 223 | json_len(val), map + val->start); 224 | } 225 | return NULL; 226 | } 227 | 228 | /** 229 | * json_events - Read JSON event file from disk and call event callback. 230 | * @fn: File name to read or NULL for default. 231 | * @func: Callback to call for each event 232 | * @data: Abstract pointer to pass to func. 233 | * 234 | * The callback gets the data pointer, the event name, the event 235 | * in perf format and a description passed. 236 | * 237 | * Call func with each event in the json file 238 | * Return: -1 on failure, otherwise 0. 
239 | */ 240 | int json_events(const char *fn, 241 | int (*func)(void *data, char *name, char *event, char *desc, char *pmu), 242 | void *data) 243 | { 244 | int err = -EIO; 245 | size_t size; 246 | jsmntok_t *tokens, *tok; 247 | int i, j, len; 248 | char *map = NULL; 249 | char buf[128]; 250 | const char *orig_fn = fn; 251 | 252 | if (!fn) { 253 | fn = json_default_name("-core"); 254 | if (!fn) 255 | return JEV_NO_PMU_EVENTS_FILE; 256 | if (access(fn, R_OK) == -1) { 257 | set_last_error("Unable to open CPU events file at %s - unsupported CPU or need to download new events", fn); 258 | return JEV_NO_PMU_EVENTS_FILE; 259 | } 260 | } 261 | 262 | tokens = parse_json(fn, &map, &size, &len); 263 | if (!tokens) 264 | return -EIO; 265 | EXPECT(tokens->type == JSMN_ARRAY, tokens, "expected top level array"); 266 | tok = tokens + 1; 267 | for (i = 0; i < tokens->size; i++) { 268 | char *event = NULL, *desc = NULL, *name = NULL; 269 | char *pmu = NULL; 270 | char *filter = NULL; 271 | unsigned long long eventcode = 0; 272 | struct msrmap *msr = NULL; 273 | jsmntok_t *msrval = NULL; 274 | jsmntok_t *precise = NULL; 275 | jsmntok_t *obj = tok++; 276 | 277 | EXPECT(obj->type == JSMN_OBJECT, obj, "expected object"); 278 | for (j = 0; j < obj->size; j += 2) { 279 | jsmntok_t *field, *val; 280 | int nz; 281 | 282 | field = tok + j; 283 | EXPECT(field->type == JSMN_STRING, tok + j, 284 | "Expected field name"); 285 | val = tok + j + 1; 286 | EXPECT(val->type == JSMN_STRING, tok + j + 1, 287 | "Expected string value"); 288 | 289 | nz = !json_streq(map, val, "0") && !json_streq(map, val, "0x00"); 290 | if (match_field(map, field, nz, &event, val)) { 291 | /* ok */ 292 | } else if (json_streq(map, field, "EventCode")) { 293 | char *code = NULL; 294 | addfield(map, &code, "", "", val); 295 | eventcode |= strtoul(code, NULL, 0); 296 | free(code); 297 | } else if (json_streq(map, field, "ExtSel")) { 298 | char *code = NULL; 299 | addfield(map, &code, "", "", val); 300 | eventcode |= 
strtoul(code, NULL, 0) << 21; 301 | free(code); 302 | } else if (json_streq(map, field, "EventName")) { 303 | addfield(map, &name, "", "", val); 304 | } else if (json_streq(map, field, "BriefDescription")) { 305 | addfield(map, &desc, "", "", val); 306 | fixdesc(desc); 307 | } else if (json_streq(map, field, "PEBS") && nz && desc && 308 | !strstr(desc, "(Precise Event)")) { 309 | precise = val; 310 | } else if (json_streq(map, field, "MSRIndex") && nz) { 311 | msr = lookup_msr(map, val); 312 | } else if (json_streq(map, field, "MSRValue")) { 313 | msrval = val; 314 | } else if (json_streq(map, field, "Errata") && 315 | !json_streq(map, val, "null")) { 316 | addfield(map, &desc, ". ", 317 | " Spec update: ", val); 318 | } else if (json_streq(map, field, "Data_LA") && nz) { 319 | addfield(map, &desc, ". ", 320 | " Supports address when precise", 321 | NULL); 322 | } else if (json_streq(map, field, "Unit")) { 323 | const char *ppmu; 324 | char *s; 325 | 326 | ppmu = field_to_perf(unit_to_pmu, map, val); 327 | if (ppmu) { 328 | pmu = strdup(ppmu); 329 | } else { 330 | addfield(map, &pmu, "", "", val); 331 | for (s = pmu; *s; s++) 332 | *s = tolower(*s); 333 | } 334 | addfield(map, &desc, ". 
", "Unit: ", NULL); 335 | addfield(map, &desc, "", pmu, NULL); 336 | } else if (json_streq(map, field, "Filter") && 337 | !json_streq(map, val, "na")) { 338 | addfield(map, &filter, "", "", val); 339 | } 340 | /* ignore unknown fields */ 341 | } 342 | if (precise) { 343 | if (json_streq(map, precise, "2")) 344 | addfield(map, &desc, " ", "(Must be precise)", 345 | NULL); 346 | else 347 | addfield(map, &desc, " ", 348 | "(Precise event)", NULL); 349 | } 350 | snprintf(buf, sizeof buf, "event=%#llx", eventcode); 351 | addfield(map, &event, ",", buf, NULL); 352 | if (filter) 353 | addfield(map, &event, ",", filter, NULL); 354 | if (msr != NULL) 355 | addfield(map, &event, ",", msr->pname, msrval); 356 | if (!pmu) 357 | pmu = strdup("cpu"); 358 | err = -EIO; 359 | if (name && event) { 360 | fixname(name); 361 | err = func(data, name, event, desc, pmu); 362 | } 363 | free(event); 364 | free(desc); 365 | free(name); 366 | free(pmu); 367 | free(filter); 368 | if (err) 369 | break; 370 | tok += j; 371 | } 372 | EXPECT(tok - tokens == len, tok, "unexpected objects at end"); 373 | err = 0; 374 | out_free: 375 | if (map) 376 | free_json(map, size, tokens); 377 | if (!orig_fn && !err) { 378 | fn = json_default_name("-uncore"); 379 | err = json_events(fn, func, data); 380 | /* Ignore open error */ 381 | if (err == -EIO) 382 | return 0; 383 | } 384 | if (!orig_fn) 385 | free((char *)fn); 386 | return err; 387 | } 388 | 389 | const char* jevent_error_to_string(int error_code) { 390 | switch (error_code) { 391 | case 0: return "Success"; 392 | case JEV_GENERIC_ERROR: return "Unspecified error"; 393 | case JEV_NO_PMU_EVENTS_FILE: return "Cannot find the appropriate CPU-specific event file"; 394 | } 395 | return "Unknown error"; 396 | } 397 | 398 | // TODO: make this threadsafe 399 | const char *last_error; 400 | bool last_error_freeable; 401 | 402 | void set_last_error(const char *format, ...) 
{ 403 | if (last_error_freeable) 404 | free((char *)last_error); 405 | va_list fmt_args; 406 | va_start(fmt_args, format); 407 | char *le; 408 | if (vasprintf(&le, format, fmt_args) < 0) { 409 | last_error = (char *)format; // reasonable backup if vasprintf failed for some reason 410 | last_error_freeable = false; 411 | } else { 412 | last_error = le; 413 | last_error_freeable = true; 414 | } 415 | va_end(fmt_args); 416 | } 417 | 418 | const char *jevent_get_error_details() { 419 | return last_error ? last_error : "No details available."; 420 | } 421 | 422 | 423 | -------------------------------------------------------------------------------- /jevents/jevents.h: -------------------------------------------------------------------------------- 1 | #ifndef JEVENTS_H 2 | #define JEVENTS_H 1 3 | 4 | #include 5 | #include 6 | 7 | #ifdef __cplusplus 8 | extern "C" { 9 | #endif 10 | 11 | int json_events(const char *fn, 12 | int (*func)(void *data, char *name, char *event, char *desc, 13 | char *pmu), 14 | void *data); 15 | char *get_cpu_str(void); 16 | char *get_cpu_str_type(char *type, char **idstr_step); 17 | 18 | struct perf_event_attr; 19 | 20 | int jevent_name_to_attr(const char *str, struct perf_event_attr *attr); 21 | int resolve_event(const char *name, struct perf_event_attr *attr); 22 | int read_events(const char *fn); 23 | int walk_events(int (*func)(void *data, char *name, char *event, char *desc), 24 | void *data); 25 | int walk_perf_events(int (*func)(void *data, char *name, char *event, char *desc), 26 | void *data); 27 | char *format_raw_event(struct perf_event_attr *attr, char *name); 28 | int rmap_event(unsigned event, char **name, char **desc); 29 | 30 | int perf_event_open(struct perf_event_attr *attr, pid_t pid, 31 | int cpu, int group_fd, unsigned long flags); 32 | char *resolve_pmu(int type); 33 | bool jevent_pmu_uncore(const char *str); 34 | 35 | #ifdef __cplusplus 36 | } 37 | #endif 38 | 39 | enum jevents_error { 40 | JEV_GENERIC_ERROR = -1, 41 | 
JEV_NO_PMU_EVENTS_FILE = -2, 42 | 43 | }; 44 | 45 | /* 46 | * Returns a string describing the given error_code. Any code in the jevents_error 47 | * enum is supported, in addition to 0, which returns "Success". Any other code 48 | * returns "Unknown error". 49 | */ 50 | const char* jevent_error_to_string(int error_code); 51 | 52 | /* 53 | * When a function returns an error code, this function may return additional details. 54 | */ 55 | const char* jevent_get_error_details(); 56 | 57 | #endif 58 | -------------------------------------------------------------------------------- /jevents/jsession.h: -------------------------------------------------------------------------------- 1 | #ifndef JSESSION_H 2 | #define JSESSION_H 1 3 | 4 | #include 5 | #include 6 | 7 | #ifdef __cplusplus 8 | extern "C" { 9 | #endif 10 | 11 | struct event { 12 | struct event *next; 13 | struct perf_event_attr attr; 14 | char *event; 15 | bool end_group, group_leader; 16 | bool uncore; 17 | struct efd { 18 | int fd; 19 | uint64_t val[3]; 20 | } efd[0]; /* num_cpus */ 21 | }; 22 | 23 | struct eventlist { 24 | struct event *eventlist; 25 | struct event *eventlist_last; 26 | int num_cpus; 27 | }; 28 | 29 | int parse_events(struct eventlist *el, char *events); 30 | int setup_events(struct eventlist *el, bool measure_all, int measure_pid); 31 | int setup_event(struct event *e, int cpu, struct event *leader, bool measure_all, int measure_pid); 32 | int read_event(struct event *e, int cpu); 33 | int read_all_events(struct eventlist *el); 34 | struct eventlist *alloc_eventlist(void); 35 | uint64_t event_scaled_value(struct event *e, int cpu); 36 | 37 | #ifdef __cplusplus 38 | } 39 | #endif 40 | 41 | #endif 42 | -------------------------------------------------------------------------------- /jevents/jsmn.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2010 Serge A. 
Zaitsev 3 | * 4 | * Permission is hereby granted, free of charge, to any person obtaining a copy 5 | * of this software and associated documentation files (the "Software"), to deal 6 | * in the Software without restriction, including without limitation the rights 7 | * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 8 | * copies of the Software, and to permit persons to whom the Software is 9 | * furnished to do so, subject to the following conditions: 10 | * 11 | * The above copyright notice and this permission notice shall be included in 12 | * all copies or substantial portions of the Software. 13 | * 14 | * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 15 | * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 16 | * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 17 | * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 18 | * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 19 | * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 20 | * THE SOFTWARE. 21 | * 22 | * Slightly modified by AK to not assume 0 terminated input. 23 | */ 24 | 25 | #include 26 | #include "jsmn.h" 27 | 28 | /* 29 | * Allocates a fresh unused token from the token pool. 30 | */ 31 | static jsmntok_t *jsmn_alloc_token(jsmn_parser *parser, 32 | jsmntok_t *tokens, size_t num_tokens) 33 | { 34 | jsmntok_t *tok; 35 | 36 | if ((unsigned)parser->toknext >= num_tokens) 37 | return NULL; 38 | tok = &tokens[parser->toknext++]; 39 | tok->start = tok->end = -1; 40 | tok->size = 0; 41 | return tok; 42 | } 43 | 44 | /* 45 | * Fills token type and boundaries.
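The allocation half of this scheme (jsmn_alloc_token above) can be sketched standalone: tokens come from a caller-supplied fixed pool, and allocation simply fails once the pool is exhausted, so the parser never allocates mid-parse. The names here (struct pool, pool_alloc) are illustrative, not part of jsmn.

```c
#include <stddef.h>

// Illustrative fixed-pool allocator mirroring jsmn_alloc_token: NULL on
// exhaustion (the parser then reports JSMN_ERROR_NOMEM), and fresh tokens
// start with -1 boundaries meaning "not yet filled in".
struct tok { int start, end, size; };

struct pool {
    struct tok *tokens;   // caller-supplied backing array
    size_t num_tokens;
    size_t next;          // index of the next unused token
};

struct tok *pool_alloc(struct pool *p)
{
    if (p->next >= p->num_tokens)
        return NULL;
    struct tok *t = &p->tokens[p->next++];
    t->start = t->end = -1;
    t->size = 0;
    return t;
}
```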
46 | */ 47 | static void jsmn_fill_token(jsmntok_t *token, jsmntype_t type, 48 | int start, int end) 49 | { 50 | token->type = type; 51 | token->start = start; 52 | token->end = end; 53 | token->size = 0; 54 | } 55 | 56 | /* 57 | * Fills next available token with JSON primitive. 58 | */ 59 | static jsmnerr_t jsmn_parse_primitive(jsmn_parser *parser, const char *js, 60 | size_t len, 61 | jsmntok_t *tokens, size_t num_tokens) 62 | { 63 | jsmntok_t *token; 64 | int start; 65 | 66 | start = parser->pos; 67 | 68 | for (; parser->pos < len; parser->pos++) { 69 | switch (js[parser->pos]) { 70 | #ifndef JSMN_STRICT 71 | /* 72 | * In strict mode primitive must be followed by "," 73 | * or "}" or "]" 74 | */ 75 | case ':': 76 | #endif 77 | case '\t': 78 | case '\r': 79 | case '\n': 80 | case ' ': 81 | case ',': 82 | case ']': 83 | case '}': 84 | goto found; 85 | default: 86 | break; 87 | } 88 | if (js[parser->pos] < 32 || js[parser->pos] >= 127) { 89 | parser->pos = start; 90 | return JSMN_ERROR_INVAL; 91 | } 92 | } 93 | #ifdef JSMN_STRICT 94 | /* 95 | * In strict mode primitive must be followed by a 96 | * comma/object/array. 97 | */ 98 | parser->pos = start; 99 | return JSMN_ERROR_PART; 100 | #endif 101 | 102 | found: 103 | token = jsmn_alloc_token(parser, tokens, num_tokens); 104 | if (token == NULL) { 105 | parser->pos = start; 106 | return JSMN_ERROR_NOMEM; 107 | } 108 | jsmn_fill_token(token, JSMN_PRIMITIVE, start, parser->pos); 109 | parser->pos--; 110 | return JSMN_SUCCESS; 111 | } 112 | 113 | /* 114 | * Fills next token with JSON string. 
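The scan that jsmn_parse_string performs can be sketched in isolation (scan_quoted is a hypothetical helper, not part of jsmn): advance from the opening quote, skip the character after any backslash, and report failure if the closing quote never arrives, which is what jsmn maps to JSMN_ERROR_PART.

```c
#include <stddef.h>

// Sketch of the quote-delimited scan in jsmn_parse_string: starting at the
// opening quote, advance until the closing quote, skipping the character
// after any backslash. Returns the index of the closing quote, or -1 if
// the string is unterminated.
long scan_quoted(const char *js, size_t len, size_t pos)
{
    for (pos++; pos < len; pos++) {   // pos++ skips the opening quote
        if (js[pos] == '"')
            return (long)pos;
        if (js[pos] == '\\')
            pos++;                    // escaped character: not a delimiter
    }
    return -1;
}
```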
115 | */ 116 | static jsmnerr_t jsmn_parse_string(jsmn_parser *parser, const char *js, 117 | size_t len, 118 | jsmntok_t *tokens, size_t num_tokens) 119 | { 120 | jsmntok_t *token; 121 | int start = parser->pos; 122 | 123 | parser->pos++; 124 | 125 | /* Skip starting quote */ 126 | for (; parser->pos < len; parser->pos++) { 127 | char c = js[parser->pos]; 128 | 129 | /* Quote: end of string */ 130 | if (c == '\"') { 131 | token = jsmn_alloc_token(parser, tokens, num_tokens); 132 | if (token == NULL) { 133 | parser->pos = start; 134 | return JSMN_ERROR_NOMEM; 135 | } 136 | jsmn_fill_token(token, JSMN_STRING, start+1, 137 | parser->pos); 138 | return JSMN_SUCCESS; 139 | } 140 | 141 | /* Backslash: Quoted symbol expected */ 142 | if (c == '\\') { 143 | parser->pos++; 144 | switch (js[parser->pos]) { 145 | /* Allowed escaped symbols */ 146 | case '\"': 147 | case '/': 148 | case '\\': 149 | case 'b': 150 | case 'f': 151 | case 'r': 152 | case 'n': 153 | case 't': 154 | break; 155 | /* Allows escaped symbol \uXXXX */ 156 | case 'u': 157 | /* TODO */ 158 | break; 159 | /* Unexpected symbol */ 160 | default: 161 | parser->pos = start; 162 | return JSMN_ERROR_INVAL; 163 | } 164 | } 165 | } 166 | parser->pos = start; 167 | return JSMN_ERROR_PART; 168 | } 169 | 170 | /* 171 | * Parse JSON string and fill tokens. 172 | */ 173 | jsmnerr_t jsmn_parse(jsmn_parser *parser, const char *js, size_t len, 174 | jsmntok_t *tokens, 175 | unsigned int num_tokens) 176 | { 177 | jsmnerr_t r; 178 | int i; 179 | jsmntok_t *token; 180 | 181 | for (; parser->pos < len; parser->pos++) { 182 | char c; 183 | jsmntype_t type; 184 | 185 | c = js[parser->pos]; 186 | switch (c) { 187 | case '{': 188 | case '[': 189 | token = jsmn_alloc_token(parser, tokens, num_tokens); 190 | if (token == NULL) 191 | return JSMN_ERROR_NOMEM; 192 | if (parser->toksuper != -1) 193 | tokens[parser->toksuper].size++; 194 | token->type = (c == '{' ? 
JSMN_OBJECT : JSMN_ARRAY); 195 | token->start = parser->pos; 196 | parser->toksuper = parser->toknext - 1; 197 | break; 198 | case '}': 199 | case ']': 200 | type = (c == '}' ? JSMN_OBJECT : JSMN_ARRAY); 201 | for (i = parser->toknext - 1; i >= 0; i--) { 202 | token = &tokens[i]; 203 | if (token->start != -1 && token->end == -1) { 204 | if (token->type != type) 205 | return JSMN_ERROR_INVAL; 206 | parser->toksuper = -1; 207 | token->end = parser->pos + 1; 208 | break; 209 | } 210 | } 211 | /* Error if unmatched closing bracket */ 212 | if (i == -1) 213 | return JSMN_ERROR_INVAL; 214 | for (; i >= 0; i--) { 215 | token = &tokens[i]; 216 | if (token->start != -1 && token->end == -1) { 217 | parser->toksuper = i; 218 | break; 219 | } 220 | } 221 | break; 222 | case '\"': 223 | r = jsmn_parse_string(parser, js, len, tokens, 224 | num_tokens); 225 | if (r < 0) 226 | return r; 227 | if (parser->toksuper != -1) 228 | tokens[parser->toksuper].size++; 229 | break; 230 | case '\t': 231 | case '\r': 232 | case '\n': 233 | case ':': 234 | case ',': 235 | case ' ': 236 | break; 237 | #ifdef JSMN_STRICT 238 | /* 239 | * In strict mode primitives are: 240 | * numbers and booleans. 241 | */ 242 | case '-': 243 | case '0': 244 | case '1': 245 | case '2': 246 | case '3': 247 | case '4': 248 | case '5': 249 | case '6': 250 | case '7': 251 | case '8': 252 | case '9': 253 | case 't': 254 | case 'f': 255 | case 'n': 256 | #else 257 | /* 258 | * In non-strict mode every unquoted value 259 | * is a primitive. 
260 | */ 261 | default: 262 | #endif 263 | r = jsmn_parse_primitive(parser, js, len, tokens, 264 | num_tokens); 265 | if (r < 0) 266 | return r; 267 | if (parser->toksuper != -1) 268 | tokens[parser->toksuper].size++; 269 | break; 270 | 271 | #ifdef JSMN_STRICT 272 | /* Unexpected char in strict mode */ 273 | default: 274 | return JSMN_ERROR_INVAL; 275 | #endif 276 | } 277 | } 278 | 279 | for (i = parser->toknext - 1; i >= 0; i--) { 280 | /* Unmatched opened object or array */ 281 | if (tokens[i].start != -1 && tokens[i].end == -1) 282 | return JSMN_ERROR_PART; 283 | } 284 | 285 | return JSMN_SUCCESS; 286 | } 287 | 288 | /* 289 | * Creates a new parser over a given buffer with an array of tokens 290 | * available. 291 | */ 292 | void jsmn_init(jsmn_parser *parser) 293 | { 294 | parser->pos = 0; 295 | parser->toknext = 0; 296 | parser->toksuper = -1; 297 | } 298 | -------------------------------------------------------------------------------- /jevents/jsmn.h: -------------------------------------------------------------------------------- 1 | #ifndef __JSMN_H_ 2 | #define __JSMN_H_ 3 | 4 | #ifdef __cplusplus 5 | extern "C" { 6 | #endif 7 | 8 | /* 9 | * JSON type identifier. Basic types are: 10 | * o Object 11 | * o Array 12 | * o String 13 | * o Other primitive: number, boolean (true/false) or null 14 | */ 15 | typedef enum { 16 | JSMN_PRIMITIVE = 0, 17 | JSMN_OBJECT = 1, 18 | JSMN_ARRAY = 2, 19 | JSMN_STRING = 3 20 | } jsmntype_t; 21 | 22 | typedef enum { 23 | /* Not enough tokens were provided */ 24 | JSMN_ERROR_NOMEM = -1, 25 | /* Invalid character inside JSON string */ 26 | JSMN_ERROR_INVAL = -2, 27 | /* The string is not a full JSON packet, more bytes expected */ 28 | JSMN_ERROR_PART = -3, 29 | /* Everything was fine */ 30 | JSMN_SUCCESS = 0 31 | } jsmnerr_t; 32 | 33 | /* 34 | * JSON token description. 35 | * @param type type (object, array, string etc.)
36 | * @param start start position in JSON data string 37 | * @param end end position in JSON data string 38 | */ 39 | typedef struct { 40 | jsmntype_t type; 41 | int start; 42 | int end; 43 | int size; 44 | } jsmntok_t; 45 | 46 | /* 47 | * JSON parser. Contains an array of token blocks available. Also stores 48 | * the string being parsed now and current position in that string 49 | */ 50 | typedef struct { 51 | unsigned int pos; /* offset in the JSON string */ 52 | int toknext; /* next token to allocate */ 53 | int toksuper; /* superior token node, e.g. parent object or array */ 54 | } jsmn_parser; 55 | 56 | /* 57 | * Create JSON parser over an array of tokens 58 | */ 59 | void jsmn_init(jsmn_parser *parser); 60 | 61 | /* 62 | * Run JSON parser. It parses a JSON data string into an array of tokens, 63 | * each describing a single JSON object. 64 | */ 65 | jsmnerr_t jsmn_parse(jsmn_parser *parser, const char *js, 66 | size_t len, 67 | jsmntok_t *tokens, unsigned int num_tokens); 68 | 69 | #ifdef __cplusplus 70 | } 71 | #endif 72 | 73 | #endif /* __JSMN_H_ */ 74 | -------------------------------------------------------------------------------- /jevents/json.c: -------------------------------------------------------------------------------- 1 | /* Parse JSON files using the JSMN parser. */ 2 | 3 | /* 4 | * Copyright (c) 2014, Intel Corporation 5 | * All rights reserved. 6 | * 7 | * Redistribution and use in source and binary forms, with or without 8 | * modification, are permitted provided that the following conditions are met: 9 | * 10 | * 1. Redistributions of source code must retain the above copyright notice, 11 | * this list of conditions and the following disclaimer. 12 | * 13 | * 2. Redistributions in binary form must reproduce the above copyright 14 | * notice, this list of conditions and the following disclaimer in the 15 | * documentation and/or other materials provided with the distribution.
16 | * 17 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 18 | * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 19 | * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 20 | * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 21 | * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 22 | * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 23 | * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 24 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 25 | * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 26 | * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 27 | * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED 28 | * OF THE POSSIBILITY OF SUCH DAMAGE. 29 | */ 30 | 31 | #include 32 | #include 33 | #include 34 | #include 35 | #include 36 | #include 37 | #include 38 | #include 39 | #include "jsmn.h" 40 | #include "json.h" 41 | #include "jevents-internal.h" 42 | 43 | static char *mapfile(const char *fn, size_t *size) 44 | { 45 | struct stat st; 46 | char *map = NULL; 47 | int err; 48 | int fd = open(fn, O_RDONLY); 49 | 50 | if (fd < 0) 51 | return NULL; 52 | err = fstat(fd, &st); 53 | if (err < 0) 54 | goto out; 55 | *size = st.st_size; 56 | map = mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); 57 | if (map == (char *)MAP_FAILED) 58 | map = NULL; 59 | out: 60 | close(fd); 61 | return map; 62 | } 63 | 64 | static void unmapfile(char *map, size_t size) 65 | { 66 | munmap(map, size); 67 | } 68 | 69 | /* 70 | * Parse json file using jsmn. Return array of tokens, 71 | * and mapped file. Caller needs to free array. 
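The mapfile()/unmapfile() pair that backs parse_json can be sketched standalone (map_whole_file is an illustrative name, not part of jevents). The notable choice is PROT_WRITE combined with MAP_PRIVATE: the parser gets a writable copy-on-write view, so it could poke at the bytes without modifying the file on disk.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map an entire file, mirroring mapfile() in json.c. MAP_PRIVATE plus
// PROT_WRITE gives a writable copy-on-write view; closing the fd right
// away is safe because the mapping keeps its own reference.
char *map_whole_file(const char *fn, size_t *size)
{
    struct stat st;
    char *map = NULL;
    int fd = open(fn, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) == 0) {
        *size = st.st_size;
        map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (map == (char *)MAP_FAILED)
            map = NULL;
    }
    close(fd);
    return map;
}
```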
72 | */ 73 | jsmntok_t *parse_json(const char *fn, char **map, size_t *size, int *len) 74 | { 75 | jsmn_parser parser; 76 | jsmntok_t *tokens; 77 | jsmnerr_t res; 78 | unsigned sz; 79 | 80 | *map = mapfile(fn, size); 81 | if (!*map) { 82 | set_last_error("Failed to open event file %s", fn); 83 | return NULL; 84 | } 85 | /* Heuristic */ 86 | sz = *size * 16; 87 | tokens = malloc(sz); 88 | if (!tokens) { 89 | set_last_error("malloc failed"); 90 | goto error; 91 | } 92 | jsmn_init(&parser); 93 | res = jsmn_parse(&parser, *map, *size, tokens, 94 | sz / sizeof(jsmntok_t)); 95 | if (res != JSMN_SUCCESS) { 96 | set_last_error("Failed while parsing event file %s, error: %d", fn, res); 97 | fprintf(stderr, "%s: json error %d\n", fn, res); 98 | goto error_free; 99 | } 100 | if (len) 101 | *len = parser.toknext; 102 | return tokens; 103 | error_free: 104 | free(tokens); 105 | error: 106 | unmapfile(*map, *size); 107 | return NULL; 108 | } 109 | 110 | void free_json(char *map, size_t size, jsmntok_t *tokens) 111 | { 112 | free(tokens); 113 | unmapfile(map, size); 114 | } 115 | 116 | static int countchar(char *map, char c, int end) 117 | { 118 | int i; 119 | int count = 0; 120 | for (i = 0; i < end; i++) 121 | if (map[i] == c) 122 | count++; 123 | return count; 124 | } 125 | 126 | /* Return line number of a jsmn token */ 127 | int json_line(char *map, jsmntok_t *t) 128 | { 129 | return countchar(map, '\n', t->start) + 1; 130 | } 131 | 132 | static const char *jsmn_types[] = { 133 | [JSMN_PRIMITIVE] = "primitive", 134 | [JSMN_ARRAY] = "array", 135 | [JSMN_OBJECT] = "object", 136 | [JSMN_STRING] = "string" 137 | }; 138 | 139 | #define LOOKUP(a, i) ((i) < (sizeof(a)/sizeof(*(a))) ? ((a)[i]) : "?") 140 | 141 | /* Return type name of a jsmn token */ 142 | const char *json_name(jsmntok_t *t) 143 | { 144 | return LOOKUP(jsmn_types, t->type); 145 | } 146 | 147 | int json_len(jsmntok_t *t) 148 | { 149 | return t->end - t->start; 150 | } 151 | 152 | /* Is string t equal to s? 
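The trick in json_streq below is that a jsmn token is just a [start, end) slice of the mapped file, so equality is a length check plus a case-insensitive bounded compare, with no copying and no NUL terminator. A standalone sketch (slice_streq is an illustrative name):

```c
#include <string.h>
#include <strings.h>

// Compare a [start, end) slice of a buffer against a NUL-terminated string,
// case-insensitively, the way json_streq compares a jsmn token.
int slice_streq(const char *map, int start, int end, const char *s)
{
    size_t len = (size_t)(end - start);
    return len == strlen(s) && strncasecmp(map + start, s, len) == 0;
}
```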
*/ 153 | int json_streq(char *map, jsmntok_t *t, const char *s) 154 | { 155 | unsigned len = t->end - t->start; 156 | return len == strlen(s) && !strncasecmp(map + t->start, s, len); 157 | } 158 | -------------------------------------------------------------------------------- /jevents/json.h: -------------------------------------------------------------------------------- 1 | #ifndef JSON_H 2 | #define JSON_H 1 3 | 4 | #include "jsmn.h" 5 | 6 | #ifdef __cplusplus 7 | extern "C" { 8 | #endif 9 | 10 | jsmntok_t *parse_json(const char *fn, char **map, size_t *size, int *len); 11 | void free_json(char *map, size_t size, jsmntok_t *tokens); 12 | int json_line(char *map, jsmntok_t *t); 13 | const char *json_name(jsmntok_t *t); 14 | int json_streq(char *map, jsmntok_t *t, const char *s); 15 | int json_len(jsmntok_t *t); 16 | 17 | #ifdef __cplusplus 18 | } 19 | #endif 20 | 21 | #endif 22 | -------------------------------------------------------------------------------- /jevents/libjevents.spec: -------------------------------------------------------------------------------- 1 | Name: libjevents 2 | Version: 1 3 | Release: 1%{?dist} 4 | Summary: libjevents shared library from pmu-tools 5 | 6 | License: BSD 7 | URL: https://github.com/andikleen/pmu-tools/jevents 8 | # git clone https://github.com/andikleen/pmu-tools.git pmu-tools 9 | # cd pmu-tools && tar czf jevents.tar.gz jevents/ 10 | Source0: jevents.tar.gz 11 | 12 | %description 13 | jevents library from pmu-tools. 
14 | 15 | %prep 16 | %setup -q -n jevents 17 | 18 | 19 | %build 20 | %make_build PREFIX=%{buildroot}/usr 21 | 22 | %install 23 | %make_install PREFIX=%{buildroot}/usr 24 | 25 | %files 26 | /usr/bin/event-rmap 27 | /usr/bin/listevents 28 | /usr/bin/showevent 29 | /usr/include/* 30 | /usr/lib64/libjevents.a 31 | 32 | %changelog 33 | 34 | * Sat Mar 3 2018 Pablo Llopis 1-1 35 | - Initial specfile version 36 | -------------------------------------------------------------------------------- /jevents/listevents.c: -------------------------------------------------------------------------------- 1 | /* List all events */ 2 | /* -v print descriptions */ 3 | /* pattern print only events matching shell pattern */ 4 | #include <stdio.h> 5 | #include <stdlib.h> 6 | #include <string.h> 7 | #include <fnmatch.h> 8 | #include <assert.h> 9 | #include "jevents.h" 10 | 11 | int verbose = 0; 12 | 13 | struct event { 14 | char *name; 15 | char *event; 16 | char *desc; 17 | }; 18 | 19 | struct walk_data { 20 | int count; 21 | int ind; 22 | char *match; 23 | struct event *events; 24 | }; 25 | 26 | static int count_event(void *data, char *name, char *event, char *desc) 27 | { 28 | struct walk_data *wd = data; 29 | if (wd->match && fnmatch(wd->match, name, 0)) 30 | return 0; 31 | wd->count++; 32 | return 0; 33 | } 34 | 35 | static int store_event(void *data, char *name, char *event, char *desc) 36 | { 37 | struct walk_data *wd = data; 38 | 39 | if (wd->match && fnmatch(wd->match, name, 0)) 40 | return 0; 41 | assert(wd->ind < wd->count); 42 | struct event *e = &wd->events[wd->ind++]; 43 | e->name = strdup(name); 44 | e->event = strdup(event); 45 | e->desc = strdup(desc); 46 | return 0; 47 | } 48 | 49 | static int cmp_events(const void *ap, const void *bp) 50 | { 51 | const struct event *a = ap; 52 | const struct event *b = bp; 53 | return strcmp(a->name, b->name); 54 | } 55 | 56 | int main(int ac, char **av) 57 | { 58 | if (av[1] && !strcmp(av[1], "-v")) { 59 | av++; 60 | verbose = 1; 61 | } 62 | 63 | read_events(NULL); 64 | struct walk_data wd 
= { .match = av[1] }; 65 | walk_events(count_event, &wd); 66 | walk_perf_events(count_event, &wd); 67 | wd.events = calloc(wd.count, sizeof(struct event)); 68 | walk_events(store_event, &wd); 69 | walk_perf_events(store_event, &wd); 70 | qsort(wd.events, wd.count, sizeof(struct event), cmp_events); 71 | int i; 72 | for (i = 0; i < wd.count; i++) { 73 | struct event *e = &wd.events[i]; 74 | printf("%-40s ", e->name); 75 | printf("%s\n", e->event); 76 | if (verbose && e->desc[0]) 77 | printf("\t%s\n", e->desc); /* XXX word wrap */ 78 | } 79 | return 0; 80 | } 81 | -------------------------------------------------------------------------------- /jevents/measure.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2012,2013 Intel Corporation 3 | * Author: Andi Kleen 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that: (1) source code distributions 7 | * retain the above copyright notice and this paragraph in its entirety, (2) 8 | * distributions including binary code include the above copyright notice and 9 | * this paragraph in its entirety in the documentation or other materials 10 | * provided with the distribution 11 | * 12 | * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED 13 | * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF 14 | * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. 15 | */ 16 | 17 | #include <stdio.h> 18 | #include <stdlib.h> 19 | #include <string.h> 20 | #include <pthread.h> 21 | #include "measure.h" 22 | #include "rdpmc.h" 23 | 24 | /** 25 | * DOC: Measuring of predefined counter groups in a process 26 | * 27 | * Higher level interface to measure CPU performance counters in process 28 | * context. The program calls the appropriate functions around the 29 | * code that should be measured, in each individual thread. 
30 | * 31 | * The data is accumulated globally and printed 32 | */ 33 | 34 | struct res { 35 | struct res *next; 36 | unsigned long long start[N_COUNTER]; 37 | unsigned long long count[N_COUNTER]; 38 | char *name; 39 | struct measure *measure; 40 | }; 41 | 42 | static struct res *all_res; 43 | static pthread_mutex_t all_res_lock = PTHREAD_MUTEX_INITIALIZER; 44 | static __thread struct rdpmc_ctx ctx[N_COUNTER]; 45 | static __thread struct res *cur_res; 46 | 47 | static struct res *alloc_res(char *name, struct measure *measure) 48 | { 49 | struct res *r = calloc(sizeof(struct res), 1); 50 | if (!name) 51 | name = ""; 52 | r->name = strdup(name); 53 | r->measure = measure; 54 | pthread_mutex_lock(&all_res_lock); 55 | r->next = all_res; 56 | all_res = r; 57 | pthread_mutex_unlock(&all_res_lock); 58 | return r; 59 | } 60 | 61 | /** 62 | * measure_group_init - Initialize a measurement group 63 | * @g: measurement group (usually predefined) 64 | * @name: name of measurements or NULL 65 | * 66 | * Initialize a measurement group and allocate the counters. 67 | * All measurements with the same name are printed together (so multiple 68 | * names can be used to measure different parts of the program) 69 | * Exits when the counters cannot be allocated. 70 | * Has to be freed in the same thread with measure_group_finish() 71 | * Only one measurement group per thread can be active at a time. 
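 *
 * A typical call sequence, as a sketch (the array "g" and the name
 * "hot loop" are placeholders for whatever predefined struct measure
 * group and label the program uses):
 *
 *	measure_group_init(g, "hot loop");
 *	measure_group_start();
 *	... code to measure ...
 *	measure_group_stop();
 *	measure_group_finish();
 *	measure_print_all(stdout);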
72 | */ 73 | void measure_group_init(struct measure *g, char *name) 74 | { 75 | struct res *r = alloc_res(name, g); 76 | cur_res = r; 77 | 78 | int i; 79 | struct rdpmc_ctx *leader = NULL; 80 | for (i = 0; i < N_COUNTER; i++) { 81 | struct perf_event_attr attr = { 82 | .type = PERF_TYPE_RAW, 83 | .size = sizeof(struct perf_event_attr), 84 | .config = g[i].counter, 85 | .sample_type = PERF_SAMPLE_READ, 86 | .exclude_kernel = 1, 87 | }; 88 | if (rdpmc_open_attr(&attr, &ctx[i], leader) < 0) 89 | exit(1); 90 | if (!leader) 91 | leader = &ctx[i]; 92 | } 93 | } 94 | 95 | /** 96 | * measure_group_start - Start measuring in a measurement group. 97 | * 98 | * Start a measurement period for the current group in this thread. 99 | * Multiple measurement periods are accumulated. 100 | */ 101 | void measure_group_start(void) 102 | { 103 | int i; 104 | for (i = 0; i < N_COUNTER; i++) 105 | cur_res->start[i] = rdpmc_read(&ctx[i]); 106 | } 107 | 108 | /** 109 | * measure_group_stop - Stop measuring a measurement group 110 | * 111 | * Stop the measurement for the current measurement group. 
112 | */ 113 | void measure_group_stop(void) 114 | { 115 | unsigned long long end[N_COUNTER]; 116 | int i; 117 | for (i = 0; i < N_COUNTER; i++) 118 | end[i] = rdpmc_read(&ctx[i]); 119 | for (i = 0; i < N_COUNTER; i++) 120 | cur_res->count[i] += end[i] - cur_res->start[i]; 121 | } 122 | 123 | /** 124 | * measure_group_finish - Free the counter resources of a group 125 | * 126 | * Has to be called in the thread that executed measure_group_init() 127 | */ 128 | void measure_group_finish(void) 129 | { 130 | cur_res = NULL; 131 | int i; 132 | for (i = 0; i < N_COUNTER; i++) 133 | rdpmc_close(&ctx[i]); 134 | } 135 | 136 | static int cmp_res(const void *a, const void *b) 137 | { 138 | struct res **ra = (struct res **)a; 139 | struct res **rb = (struct res **)b; 140 | return strcmp((*ra)->name, (*rb)->name); 141 | } 142 | 143 | static struct res **sort_results(int *lenp) 144 | { 145 | struct res *r; 146 | int len = 0; 147 | for (r = all_res; r; r = r->next) 148 | len++; 149 | struct res **sr = malloc(len * sizeof(struct res *)); 150 | int j = 0; 151 | for (r = all_res; r; r = r->next) 152 | sr[j++] = r; 153 | qsort(sr, len, sizeof(struct res *), cmp_res); 154 | *lenp = len; 155 | return sr; 156 | } 157 | 158 | static void print_counters(FILE *fh, struct measure *m, 159 | unsigned long long total[N_COUNTER]) 160 | { 161 | int i; 162 | for (i = 0; i < N_COUNTER; i++) { 163 | if (m[i].name == NULL) 164 | continue; 165 | if (m[i].func) 166 | total[i] = m[i].func(m, total, i); 167 | fprintf(fh, "%20s\t%8llu ", m[i].name, total[i]); 168 | if (m[i].ratio_to >= 0) 169 | fprintf(fh, "(%.2f%%)", 170 | 100.0 * ((double)total[i] / total[m[i].ratio_to])); 171 | fputc('\n', fh); 172 | } 173 | } 174 | 175 | /** 176 | * measure_print_all - Print the accumulated data for all measurement groups 177 | * @fh: stdio file descriptor to output data 178 | */ 179 | void measure_print_all(FILE *fh) 180 | { 181 | unsigned long long total[N_COUNTER]; 182 | int len; 183 | struct res **sr = 
sort_results(&len); 184 | int i, j; 185 | 186 | for (j = 0; j < len; j++) { 187 | if (j == 0 || strcmp(sr[j - 1]->name, sr[j]->name)) { 188 | if (j > 0) { 189 | fprintf(fh, "%s:\n", sr[j - 1]->name); 190 | print_counters(fh, sr[j - 1]->measure, total); 191 | } 192 | memset(total, 0, sizeof(unsigned long long) * N_COUNTER); 193 | } 194 | for (i = 0; i < N_COUNTER; i++) 195 | total[i] += sr[j]->count[i]; 196 | } 197 | if (len > 0) { 198 | fprintf(fh, "%s:\n", sr[len - 1]->name); 199 | print_counters(fh, sr[len - 1]->measure, total); 200 | } 201 | free(sr); 202 | } 203 | 204 | /** 205 | * measure_free_all - Free the accumulated data from past measurements 206 | */ 207 | void measure_free_all(void) 208 | { 209 | struct res *r, *next; 210 | for (r = all_res; r; r = next) { 211 | next = r->next; 212 | free(r->name); 213 | free(r); 214 | } 215 | all_res = NULL; 216 | } 217 | -------------------------------------------------------------------------------- /jevents/measure.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2012,2013 Intel Corporation 3 | * Author: Andi Kleen 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that: (1) source code distributions 7 | * retain the above copyright notice and this paragraph in its entirety, (2) 8 | * distributions including binary code include the above copyright notice and 9 | * this paragraph in its entirety in the documentation or other materials 10 | * provided with the distribution 11 | * 12 | * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED 13 | * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF 14 | * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. 
15 | */ 16 | 17 | 18 | #ifndef MEASURE_H 19 | #define MEASURE_H 1 20 | 21 | #include <stdio.h> 22 | 23 | #ifdef __cplusplus 24 | extern "C" { 25 | #endif 26 | 27 | #define N_COUNTER 4 28 | 29 | struct measure { 30 | char *name; 31 | unsigned long long counter; 32 | int ratio_to; /* or -1 */ 33 | unsigned long long (*func)(struct measure *m, 34 | unsigned long long total[N_COUNTER], int i); 35 | }; 36 | 37 | #ifdef EVENT_MACROS 38 | #define ETO(x,y) { #x, x, y } 39 | #define ETO0(x) ETO(x, 0) 40 | #define E(x) { #x, x, -1 } 41 | #define EFUNC(x,y, f) { #x, x, y, f } 42 | #endif 43 | 44 | void measure_group_init(struct measure *g, char *name); 45 | void measure_group_start(void); 46 | void measure_group_stop(void); 47 | void measure_group_finish(void); 48 | void measure_print_all(FILE *fh); 49 | void measure_free_all(void); 50 | 51 | #ifdef __cplusplus 52 | } 53 | #endif 54 | 55 | #endif 56 | -------------------------------------------------------------------------------- /jevents/perf-iter.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2013 Intel Corporation 3 | * Author: Andi Kleen 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that: (1) source code distributions 7 | * retain the above copyright notice and this paragraph in its entirety, (2) 8 | * distributions including binary code include the above copyright notice and 9 | * this paragraph in its entirety in the documentation or other materials 10 | * provided with the distribution 11 | * 12 | * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED 13 | * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF 14 | * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. 15 | */ 16 | 17 | /** 18 | * DOC: A simple perf library to manage the perf ring buffer 19 | * 20 | * This library provides a simple wrapping layer for the perf 21 | * mmap ring buffer. 
This allows a user program to access perf 22 | * events with zero copying. 23 | */ 24 | 25 | #include <unistd.h> 26 | #include <string.h> 27 | #include <strings.h> 28 | #include <sys/mman.h> 29 | #include <sys/ioctl.h> 30 | #include "jevents.h" 31 | 32 | #include "util.h" 33 | #include "perf-iter.h" 34 | 35 | /** 36 | * perf_iter_init - Initialize iterator for perf ring buffer 37 | * @iter: Iterator to initialize. 38 | * @pfd: perf_fd from perf_fd_open() to use with the iterator. 39 | * 40 | * Needs to be called first to start walking a perf buffer. 41 | */ 42 | 43 | void perf_iter_init(struct perf_iter *iter, struct perf_fd *pfd) 44 | { 45 | int pagesize = sysconf(_SC_PAGESIZE); 46 | int page_shift = ffs(pagesize) - 1; 47 | 48 | iter->mpage = pfd->mpage; 49 | iter->bufsize = (1ULL << (pfd->buf_size_shift + page_shift)); 50 | iter->ring_buffer_mask = iter->bufsize - 1; 51 | iter->cur = iter->mpage->data_tail & iter->ring_buffer_mask; 52 | /* Kernel only changes head */ 53 | iter->raw_head = iter->mpage->data_head; 54 | iter->avail = iter->raw_head - iter->mpage->data_tail; 55 | iter->head = iter->raw_head & iter->ring_buffer_mask; 56 | mb(); 57 | iter->data = (char *)(iter->mpage) + pagesize; 58 | } 59 | 60 | /** 61 | * perf_buffer_read - Access data in perf ring iterator. 62 | * @iter: Iterator to copy data from 63 | * @buffer: Temporary buffer to use for wrapped events 64 | * @bufsize: Size of buffer 65 | * 66 | * Return the next available perf_event_header in the ring buffer. 67 | * This normally does zero copy, but for wrapped events 68 | * they are copied into the temporary buffer supplied and a 69 | * pointer into that is returned. 70 | * 71 | * Return: NULL when nothing available, otherwise perf_event_header. 
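 *
 * A typical drain loop, as a sketch (pfd stands for a previously
 * opened struct perf_fd):
 *
 *	struct perf_iter iter;
 *	char buf[256];
 *	perf_iter_init(&iter, &pfd);
 *	while (!perf_iter_finished(&iter)) {
 *		struct perf_event_header *hdr =
 *			perf_buffer_read(&iter, buf, sizeof buf);
 *		if (!hdr)
 *			break;
 *		... process hdr ...
 *	}
 *	perf_iter_continue(&iter);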
72 | */ 73 | 74 | struct perf_event_header *perf_buffer_read(struct perf_iter *iter, void *buffer, int bufsize) 75 | { 76 | struct perf_event_header *hdr = (struct perf_event_header *)(iter->data + iter->cur); 77 | u64 left = iter->bufsize - iter->cur; 78 | 79 | if (left >= sizeof(hdr->size) && hdr->size <= left) { 80 | iter->cur += hdr->size; 81 | iter->avail -= hdr->size; 82 | /* Copy less fast path */ 83 | return hdr; 84 | } else { 85 | /* 86 | * Buffer wraps. This case is untested in this example. 87 | * Assumes hdr->size is always continuous by itself. 88 | */ 89 | if (left) { 90 | if (hdr->size > bufsize) 91 | return NULL; 92 | memcpy(buffer, hdr, left); 93 | } else { 94 | hdr = (struct perf_event_header *)iter->data; 95 | if (hdr->size > bufsize) 96 | return NULL; 97 | } 98 | memcpy(buffer + left, iter->data, hdr->size - left); 99 | iter->cur = hdr->size - left; 100 | iter->avail -= hdr->size; 101 | return buffer; 102 | } 103 | } 104 | 105 | /** 106 | * perf_iter_continue - Allow the kernel to log over our data. 107 | * @iter: Iterator. 108 | * Tell the kernel we are finished with the data and it can 109 | * continue logging. 110 | */ 111 | 112 | void perf_iter_continue(struct perf_iter *iter) 113 | { 114 | iter->mpage->data_tail = iter->raw_head; 115 | mb(); 116 | } 117 | 118 | static unsigned perf_mmap_size(int buf_size_shift) 119 | { 120 | return ((1U << buf_size_shift) + 1) * sysconf(_SC_PAGESIZE); 121 | } 122 | 123 | /** 124 | * perf_fd_open - Open a perf event with ring buffer for the current thread 125 | * @p: perf_fd to initialize 126 | * @attr: perf event attribute to use 127 | * @buf_size_shift: log2 of buffer size. 128 | * Return: -1 on error, otherwise 0. 
129 | */ 130 | int perf_fd_open(struct perf_fd *p, struct perf_event_attr *attr, int buf_size_shift) 131 | { 132 | return perf_fd_open_other(p, attr, buf_size_shift, 0, -1); 133 | } 134 | 135 | /** 136 | * perf_fd_open_other - Open a perf event with ring buffer for other thread or cpu 137 | * @p: perf_fd to initialize 138 | * @attr: perf event attribute to use 139 | * @buf_size_shift: log2 of buffer size. 140 | * @pid: pid/tid to trace, or 0 for current, or -1 for any 141 | * @cpu: cpu to trace, or -1 for any. 142 | * Return: -1 on error, otherwise 0. 143 | */ 144 | int perf_fd_open_other(struct perf_fd *p, struct perf_event_attr *attr, int buf_size_shift, 145 | int pid, int cpu) 146 | { 147 | p->pfd = perf_event_open(attr, pid, cpu, -1, 0); 148 | if (p->pfd < 0) 149 | return -1; 150 | 151 | struct perf_event_mmap_page *mpage; 152 | mpage = mmap(NULL, perf_mmap_size(buf_size_shift), 153 | PROT_READ|PROT_WRITE, MAP_SHARED, 154 | p->pfd, 0); 155 | if (mpage == MAP_FAILED) { 156 | close(p->pfd); 157 | return -1; 158 | } 159 | p->mpage = mpage; 160 | p->buf_size_shift = buf_size_shift; 161 | return 0; 162 | } 163 | 164 | /** 165 | * perf_fd_close - Close perf_fd 166 | * @p: pfd to close. 167 | */ 168 | 169 | void perf_fd_close(struct perf_fd *p) 170 | { 171 | munmap(p->mpage, perf_mmap_size(p->buf_size_shift)); 172 | close(p->pfd); 173 | p->mpage = NULL; 174 | } 175 | 176 | /** 177 | * perf_enable - Start perf collection on pfd 178 | * @p: perf fd 179 | * Return: -1 for error, otherwise 0. 180 | */ 181 | 182 | int perf_enable(struct perf_fd *p) 183 | { 184 | return ioctl(p->pfd, PERF_EVENT_IOC_ENABLE, 0); 185 | } 186 | 187 | /** 188 | * perf_disable - Stop perf collection on pfd 189 | * @p: perf fd 190 | * Return: -1 for error, otherwise 0. 
191 | */ 192 | int perf_disable(struct perf_fd *p) 193 | { 194 | return ioctl(p->pfd, PERF_EVENT_IOC_DISABLE, 0); 195 | } 196 | -------------------------------------------------------------------------------- /jevents/perf-iter.h: -------------------------------------------------------------------------------- 1 | #ifndef _PERF_ITER_H 2 | #define _PERF_ITER_H 1 3 | 4 | #include 5 | 6 | #ifdef __cplusplus 7 | extern "C" { 8 | #endif 9 | 10 | struct perf_event_mmap_page; 11 | struct perf_event_header; 12 | 13 | /* Iterator for perf ring buffer */ 14 | 15 | struct perf_iter { 16 | uint64_t ring_buffer_mask; 17 | uint64_t head, cur, raw_head, bufsize; 18 | int64_t avail; 19 | char *data; 20 | struct perf_event_mmap_page *mpage; 21 | }; 22 | 23 | struct perf_fd { 24 | int pfd; 25 | struct perf_event_mmap_page *mpage; 26 | int buf_size_shift; 27 | }; 28 | 29 | int perf_fd_open(struct perf_fd *p, struct perf_event_attr *attr, int buf_size_shift); 30 | int perf_fd_open_other(struct perf_fd *p, struct perf_event_attr *attr, int buf_size_shift, 31 | int pid, int cpu); 32 | void perf_fd_close(struct perf_fd *p); 33 | void perf_iter_continue(struct perf_iter *iter); 34 | struct perf_event_header *perf_buffer_read(struct perf_iter *iter, void *buffer, int bufsize); 35 | void perf_iter_init(struct perf_iter *iter, struct perf_fd *pfd); 36 | int perf_enable(struct perf_fd *p); 37 | int perf_disable(struct perf_fd *p); 38 | 39 | static inline int perf_iter_finished(struct perf_iter *iter) 40 | { 41 | return iter->avail <= 0; 42 | } 43 | 44 | static inline uint64_t *perf_hdr_payload(struct perf_event_header *hdr) 45 | { 46 | return (uint64_t *)(hdr + 1); 47 | } 48 | 49 | #ifdef __cplusplus 50 | } 51 | #endif 52 | 53 | #endif 54 | -------------------------------------------------------------------------------- /jevents/perf_event_open.c: -------------------------------------------------------------------------------- 1 | /* Until glibc provides a proper stub ... 
*/ 2 | #include <linux/perf_event.h> 3 | #include <unistd.h> 4 | #include <sys/syscall.h> 5 | 6 | /* If someone else has a better one we use that */ 7 | 8 | __attribute__((weak)) 9 | int perf_event_open(struct perf_event_attr *attr, pid_t pid, 10 | int cpu, int group_fd, unsigned long flags) 11 | { 12 | return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags); 13 | } 14 | -------------------------------------------------------------------------------- /jevents/rawevent.c: -------------------------------------------------------------------------------- 1 | /* Output raw events in perf form. */ 2 | /* 3 | * Copyright (c) 2014, Intel Corporation 4 | * Author: Andi Kleen 5 | * All rights reserved. 6 | * 7 | * Redistribution and use in source and binary forms, with or without 8 | * modification, are permitted provided that the following conditions are met: 9 | * 10 | * 1. Redistributions of source code must retain the above copyright notice, 11 | * this list of conditions and the following disclaimer. 12 | * 13 | * 2. Redistributions in binary form must reproduce the above copyright 14 | * notice, this list of conditions and the following disclaimer in the 15 | * documentation and/or other materials provided with the distribution. 16 | * 17 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 18 | * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 19 | * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 20 | * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE 21 | * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 22 | * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 23 | * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 24 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 25 | * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 26 | * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 27 | * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED 28 | * OF THE POSSIBILITY OF SUCH DAMAGE. 29 | */ 30 | 31 | #include 32 | #include 33 | #include 34 | #include 35 | #include "jevents.h" 36 | 37 | #define BUFS 1024 38 | 39 | /** 40 | * format_raw_event - Format a resolved event for perf's command line tool 41 | * @attr: Previously resolved perf_event_attr. 42 | * @name: Name to add to the event or NULL. 43 | * Return a string of the formatted event. The caller must free string. 44 | */ 45 | 46 | char *format_raw_event(struct perf_event_attr *attr, char *name) 47 | { 48 | char buf[BUFS]; 49 | int off = 0; 50 | char *pmu; 51 | 52 | pmu = resolve_pmu(attr->type); 53 | if (!pmu) 54 | return NULL; 55 | off = snprintf(buf, BUFS, "%s/config=%#llx", pmu, attr->config); 56 | free(pmu); 57 | if (attr->config1) 58 | off += sprintf(buf + off, ",config1=%#llx", attr->config1); 59 | if (attr->config2) 60 | off += sprintf(buf + off, ",config2=%#llx", attr->config2); 61 | if (name) 62 | off += snprintf(buf + off, BUFS - off, ",name=%s", name); 63 | off += snprintf(buf + off, BUFS - off, "/"); 64 | return strdup(buf); 65 | } 66 | -------------------------------------------------------------------------------- /jevents/rdpmc.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2012,2013 Intel Corporation 3 | * Author: Andi Kleen 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that: 
(1) source code distributions 7 | * retain the above copyright notice and this paragraph in its entirety, (2) 8 | * distributions including binary code include the above copyright notice and 9 | * this paragraph in its entirety in the documentation or other materials 10 | * provided with the distribution 11 | * 12 | * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED 13 | * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF 14 | * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. 15 | */ 16 | 17 | /* Ring 3 RDPMC support */ 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include 23 | #include 24 | #include 25 | #include "jevents.h" 26 | 27 | #if defined(__ICC) || defined(__INTEL_COMPILER) 28 | #include "immintrin.h" 29 | #endif 30 | 31 | /** 32 | * DOC: Ring 3 counting for CPU performance counters 33 | * 34 | * This library allows accessing CPU performance counters from ring 3 35 | * using the perf_events subsystem. This is useful to measure specific 36 | * parts of programs (e.g. excluding initialization code) 37 | * 38 | * Requires a Linux 3.3+ kernel 39 | */ 40 | 41 | #include "rdpmc.h" 42 | 43 | typedef unsigned long long u64; 44 | 45 | #define rmb() asm volatile("" ::: "memory") 46 | 47 | /** 48 | * rdpmc_open - initialize a simple ring 3 readable performance counter 49 | * @counter: Raw event descriptor (UUEE UU unit mask EE event) 50 | * @ctx: Pointer to struct &rdpmc_ctx that is initialized 51 | * 52 | * The counter will be set up to count CPU events excluding the kernel. 53 | * Must be called for each thread using the counter. 54 | * The caller must make sure counter is suitable for the running CPU. 55 | * Only works in 3.3+ kernels. 56 | * Must be closed with rdpmc_close() 57 | */ 58 | 59 | int rdpmc_open(unsigned counter, struct rdpmc_ctx *ctx) 60 | { 61 | struct perf_event_attr attr = { 62 | .type = counter > 10 ? 
PERF_TYPE_RAW : PERF_TYPE_HARDWARE, 63 | .size = PERF_ATTR_SIZE_VER0, 64 | .config = counter, 65 | .sample_type = PERF_SAMPLE_READ, 66 | .exclude_kernel = 1, 67 | }; 68 | return rdpmc_open_attr(&attr, ctx, NULL); 69 | } 70 | 71 | /** 72 | * rdpmc_open_attr - initialize a raw ring 3 readable performance counter 73 | * @attr: perf struct %perf_event_attr for the counter 74 | * @ctx: Pointer to struct %rdpmc_ctx that is initialized. 75 | * @leader_ctx: context of group leader or NULL 76 | * 77 | * This allows more flexible setup with a custom &perf_event_attr. 78 | * For simple uses rdpmc_open() should be used instead. 79 | * Must be called for each thread using the counter. 80 | * Must be closed with rdpmc_close() 81 | */ 82 | int rdpmc_open_attr(struct perf_event_attr *attr, struct rdpmc_ctx *ctx, 83 | struct rdpmc_ctx *leader_ctx) 84 | { 85 | ctx->fd = perf_event_open(attr, 0, -1, 86 | leader_ctx ? leader_ctx->fd : -1, 0); 87 | if (ctx->fd < 0) { 88 | perror("perf_event_open"); 89 | return -1; 90 | } 91 | ctx->buf = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ, MAP_SHARED, ctx->fd, 0); 92 | if (ctx->buf == MAP_FAILED) { 93 | close(ctx->fd); 94 | perror("mmap on perf fd"); 95 | return -1; 96 | } 97 | return 0; 98 | } 99 | 100 | /** 101 | * rdpmc_close - free a ring 3 readable performance counter 102 | * @ctx: Pointer to &rdpmc_ctx context. 103 | * 104 | * Must be called by each thread for each context it initialized. 105 | */ 106 | void rdpmc_close(struct rdpmc_ctx *ctx) 107 | { 108 | close(ctx->fd); 109 | munmap(ctx->buf, sysconf(_SC_PAGESIZE)); 110 | } 111 | 112 | /** 113 | * rdpmc_read - read a ring 3 readable performance counter 114 | * @ctx: Pointer to initialized &rdpmc_ctx structure. 115 | * 116 | * Read the current value of a running performance counter. 117 | * This should only be called from the same thread/process as opened 118 | * the context. For new threads please create a new context. 
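 *
 * Typical use, as a sketch (counter 0 is PERF_COUNT_HW_CPU_CYCLES,
 * which rdpmc_open() maps to PERF_TYPE_HARDWARE):
 *
 *	struct rdpmc_ctx ctx;
 *	if (rdpmc_open(0, &ctx) == 0) {
 *		unsigned long long start = rdpmc_read(&ctx);
 *		... code to measure ...
 *		unsigned long long cycles = rdpmc_read(&ctx) - start;
 *		rdpmc_close(&ctx);
 *	}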
119 | */ 120 | unsigned long long rdpmc_read(struct rdpmc_ctx *ctx) 121 | { 122 | u64 val; 123 | unsigned seq; 124 | u64 offset; 125 | typeof (ctx->buf) buf = ctx->buf; 126 | unsigned index; 127 | 128 | do { 129 | seq = buf->lock; 130 | rmb(); 131 | index = buf->index; 132 | offset = buf->offset; 133 | if (index == 0) /* rdpmc not allowed */ 134 | return offset; 135 | #if defined(__ICC) || defined(__INTEL_COMPILER) 136 | val = _rdpmc(index - 1); 137 | #else 138 | val = __builtin_ia32_rdpmc(index - 1); 139 | #endif 140 | rmb(); 141 | } while (buf->lock != seq); 142 | return val + offset; 143 | } 144 | 145 | -------------------------------------------------------------------------------- /jevents/rdpmc.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2012,2013 Intel Corporation 3 | * Author: Andi Kleen 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that: (1) source code distributions 7 | * retain the above copyright notice and this paragraph in its entirety, (2) 8 | * distributions including binary code include the above copyright notice and 9 | * this paragraph in its entirety in the documentation or other materials 10 | * provided with the distribution 11 | * 12 | * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED 13 | * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF 14 | * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. 
15 | */ 16 | 17 | #ifndef RDPMC_H 18 | #define RDPMC_H 1 19 | 20 | #include <linux/perf_event.h> 21 | 22 | #ifdef __cplusplus 23 | extern "C" { 24 | #endif 25 | 26 | struct rdpmc_ctx { 27 | int fd; 28 | struct perf_event_mmap_page *buf; 29 | }; 30 | 31 | int rdpmc_open(unsigned counter, struct rdpmc_ctx *ctx); 32 | int rdpmc_open_attr(struct perf_event_attr *attr, struct rdpmc_ctx *ctx, 33 | struct rdpmc_ctx *leader_ctx); 34 | void rdpmc_close(struct rdpmc_ctx *ctx); 35 | unsigned long long rdpmc_read(struct rdpmc_ctx *ctx); 36 | 37 | #ifdef __cplusplus 38 | } 39 | #endif 40 | 41 | #endif 42 | -------------------------------------------------------------------------------- /jevents/resolve.c: -------------------------------------------------------------------------------- 1 | /* Resolve perf style event descriptions to attr */ 2 | /* 3 | * Copyright (c) 2014, Intel Corporation 4 | * Author: Andi Kleen 5 | * All rights reserved. 6 | * 7 | * Redistribution and use in source and binary forms, with or without 8 | * modification, are permitted provided that the following conditions are met: 9 | * 10 | * 1. Redistributions of source code must retain the above copyright notice, 11 | * this list of conditions and the following disclaimer. 12 | * 13 | * 2. Redistributions in binary form must reproduce the above copyright 14 | * notice, this list of conditions and the following disclaimer in the 15 | * documentation and/or other materials provided with the distribution. 16 | * 17 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 18 | * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 19 | * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 20 | * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE 21 | * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 22 | * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 23 | * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 24 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 25 | * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 26 | * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 27 | * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED 28 | * OF THE POSSIBILITY OF SUCH DAMAGE. 29 | */ 30 | 31 | #define _GNU_SOURCE 1 32 | #include "jevents.h" 33 | #include 34 | #include 35 | #include 36 | #include 37 | #include 38 | #include 39 | #include 40 | #include 41 | #include 42 | #include 43 | 44 | #ifndef PERF_ATTR_SIZE_VER1 45 | #define PERF_ATTR_SIZE_VER1 72 46 | #endif 47 | 48 | #define MAXFILE 4096 49 | 50 | static int read_file(char **val, const char *fmt, ...) 51 | { 52 | char *fn; 53 | va_list ap; 54 | int fd; 55 | int ret = -1; 56 | int len; 57 | 58 | *val = malloc(MAXFILE); 59 | va_start(ap, fmt); 60 | vasprintf(&fn, fmt, ap); 61 | va_end(ap); 62 | fd = open(fn, O_RDONLY); 63 | free(fn); 64 | if (fd >= 0) { 65 | if ((len = read(fd, *val, MAXFILE - 1)) > 0) { 66 | ret = 0; 67 | (*val)[len] = 0; 68 | } 69 | close(fd); 70 | } 71 | if (ret < 0) { 72 | free(*val); 73 | *val = NULL; 74 | } 75 | return ret; 76 | } 77 | 78 | #define BITS(x) ((x) == 64 ? 
-1ULL : (1ULL << (x)) - 1) 79 | 80 | static bool try_parse(char *format, char *fmt, __u64 val, __u64 *config) 81 | { 82 | int start, end; 83 | int n = sscanf(format, fmt, &start, &end); 84 | if (n == 1) 85 | end = start + 1; 86 | if (n == 0) 87 | return false; 88 | *config |= (val & BITS(end - start + 1)) << start; 89 | return true; 90 | } 91 | 92 | static int read_qual(const char *qual, struct perf_event_attr *attr, 93 | const char *str) 94 | { 95 | while (*qual) { 96 | switch (*qual) { 97 | case 'p': 98 | attr->precise_ip++; 99 | break; 100 | case 'k': 101 | attr->exclude_user = 1; 102 | break; 103 | case 'u': 104 | attr->exclude_kernel = 1; 105 | break; 106 | case 'h': 107 | attr->exclude_guest = 1; 108 | break; 109 | /* XXX more */ 110 | default: 111 | fprintf(stderr, "Unknown modifier %c at end for %s\n", *qual, str); 112 | return -1; 113 | } 114 | qual++; 115 | } 116 | return 0; 117 | } 118 | 119 | static bool special_attr(char *name, int val, struct perf_event_attr *attr) 120 | { 121 | if (!strcmp(name, "period")) { 122 | attr->sample_period = val; 123 | return true; 124 | } 125 | if (!strcmp(name, "freq")) { 126 | attr->sample_freq = val; 127 | attr->freq = 1; 128 | return true; 129 | } 130 | if (!strcmp(name, "config")) { 131 | attr->config = val; 132 | return true; 133 | } 134 | if (!strcmp(name, "config1")) { 135 | attr->config1 = val; 136 | return true; 137 | } 138 | if (!strcmp(name, "config2")) { 139 | attr->config2 = val; 140 | return true; 141 | } 142 | if (!strcmp(name, "name")) { 143 | // we accept the name attribute, but don't have anywhere to put it inside 144 | // perf_event_attr, so we just drop it but at least avoid an unhandled attr error 145 | return true; 146 | } 147 | return false; 148 | } 149 | 150 | static int parse_terms(char *pmu, char *config, struct perf_event_attr *attr, int recur) 151 | { 152 | char *format = NULL; 153 | char *term; 154 | 155 | char *newl = strchr(config, '\n'); 156 | if (newl) 157 | *newl = 0; 158 | 159 | while
((term = strsep(&config, ",")) != NULL) { 160 | char name[30]; 161 | int n; 162 | unsigned long long val = 1; 163 | 164 | n = sscanf(term, "%30[^=]=%lli", name, &val); 165 | if (n < 1) 166 | break; 167 | if (special_attr(name, val, attr)) 168 | continue; 169 | free(format); 170 | if (read_file(&format, "/sys/devices/%s/format/%s", pmu, name) < 0) { 171 | char *alias = NULL; 172 | 173 | if (recur == 0 && 174 | read_file(&alias, "/sys/devices/%s/events/%s", pmu, name) == 0) { 175 | if (parse_terms(pmu, alias, attr, 1) < 0) { 176 | free(alias); 177 | fprintf(stderr, "Cannot parse kernel event alias %s for %s\n", name, 178 | term); 179 | break; 180 | } 181 | free(alias); 182 | continue; 183 | } 184 | fprintf(stderr, "Cannot parse qualifier %s for %s\n", name, term); 185 | break; 186 | } 187 | bool ok = try_parse(format, "config:%d-%d", val, &attr->config) || 188 | try_parse(format, "config:%d", val, &attr->config) || 189 | try_parse(format, "config1:%d-%d", val, &attr->config1) || 190 | try_parse(format, "config1:%d", val, &attr->config1); 191 | bool ok2 = try_parse(format, "config2:%d-%d", val, &attr->config2) || 192 | try_parse(format, "config2:%d", val, &attr->config2); 193 | if (!ok && !ok2) { 194 | fprintf(stderr, "Cannot parse kernel format %s: %s for %s\n", 195 | name, format, term); 196 | break; 197 | } 198 | if (ok2) 199 | attr->size = PERF_ATTR_SIZE_VER1; 200 | } 201 | free(format); 202 | if (term) 203 | return -1; 204 | return 0; 205 | } 206 | 207 | static int try_pmu_type(char **type, char *fmt, char *pmu) 208 | { 209 | char newpmu[30]; 210 | snprintf(newpmu, 30, fmt, pmu); 211 | int ret = read_file(type, "/sys/devices/%s/type", newpmu); 212 | if (ret >= 0) 213 | strcpy(pmu, newpmu); 214 | return ret; 215 | } 216 | 217 | /** 218 | * jevent_pmu_uncore - Is perf event string for an uncore PMU. 219 | * @pmu: perf pmu 220 | * Return true if yes, false if not or unparseable. 
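The format parsing above (try_parse() plus the BITS() mask) places a term's value into the config bit range named by a sysfs format line such as "config:0-7". The sketch below is a hypothetical stand-alone re-implementation of just that bit placement, for illustration; pack_field is not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* Place `val` into bits [start, end] of `config`, mirroring the
 * try_parse()/BITS() logic above. Hypothetical demo helper. */
static uint64_t pack_field(uint64_t config, uint64_t val, int start, int end)
{
    int width = end - start + 1;
    /* avoid undefined behavior of shifting by 64 */
    uint64_t mask = (width == 64) ? ~0ULL : (1ULL << width) - 1;
    return config | ((val & mask) << start);
}
```

For example, a PMU event described as event=0xd0,umask=0x2 with formats "config:0-7" and "config:8-15" packs to config = 0x2d0.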
221 | */ 222 | bool jevent_pmu_uncore(const char *str) 223 | { 224 | char *cpumask; 225 | int cpus; 226 | char pmu[30]; 227 | 228 | if (!strchr(str, '/')) 229 | return false; 230 | if (sscanf(str, "%30[^/]", pmu) < 1) 231 | return false; 232 | int ret = read_file(&cpumask, "/sys/devices/%s/cpumask", pmu); 233 | if (ret < 0) 234 | return false; 235 | bool isuncore = sscanf(cpumask, "%d", &cpus) == 1 && cpus == 0; 236 | free(cpumask); 237 | return isuncore; 238 | } 239 | 240 | /** 241 | * jevent_name_to_attr - Resolve perf style event to perf_attr 242 | * @str: perf style event (e.g. cpu/event=1/) 243 | * @attr: perf_attr to fill in. 244 | * 245 | * Resolve perf new style event descriptor to perf ATTR. User must initialize 246 | * attr->sample_type and attr->read_format as needed after this call, 247 | * and possibly other fields. Returns 0 when succeeded. 248 | */ 249 | int jevent_name_to_attr(const char *str, struct perf_event_attr *attr) 250 | { 251 | char pmu[30], config[200]; 252 | int qual_off = -1; 253 | 254 | memset(attr, 0, sizeof(struct perf_event_attr)); 255 | attr->size = PERF_ATTR_SIZE_VER0; 256 | attr->type = PERF_TYPE_RAW; 257 | 258 | if (sscanf(str, "r%llx%n", &attr->config, &qual_off) == 1) { 259 | assert(qual_off != -1); 260 | if (str[qual_off] == 0) 261 | return 0; 262 | if (str[qual_off] == ':' && read_qual(str + qual_off, attr, str) == 0) 263 | return 0; 264 | return -1; 265 | } 266 | if (sscanf(str, "%30[^/]/%200[^/]/%n", pmu, config, &qual_off) < 2) 267 | return -1; 268 | char *type = NULL; 269 | /* FIXME need interface for multiple outputs and try more instances */ 270 | if (try_pmu_type(&type, "%s", pmu) < 0 && 271 | try_pmu_type(&type, "uncore_%s", pmu) < 0 && 272 | try_pmu_type(&type, "uncore_%s_0", pmu) < 0 && 273 | try_pmu_type(&type, "uncore_%s_1", pmu) < 0) 274 | return -1; 275 | attr->type = atoi(type); 276 | free(type); 277 | if (parse_terms(pmu, config, attr, 0) < 0) 278 | return -1; 279 | if (qual_off != -1 && read_qual(str + 
qual_off, attr, str) < 0) 280 | return -1; 281 | return 0; 282 | } 283 | 284 | /** 285 | * walk_perf_events - walk all kernel supplied perf events 286 | * @func: Callback function to call for each event. 287 | * @data: data pointer to pass to func. 288 | */ 289 | int walk_perf_events(int (*func)(void *data, char *name, char *event, char *desc), 290 | void *data) 291 | { 292 | int ret = 0; 293 | glob_t g; 294 | if (glob("/sys/devices/*/events/*", 0, NULL, &g) != 0) 295 | return -1; 296 | int i; 297 | for (i = 0; i < g.gl_pathc; i++) { 298 | char pmu[32], event[32]; 299 | 300 | if (sscanf(g.gl_pathv[i], "/sys/devices/%30[^/]/events/%30s", 301 | pmu, event) != 2) { 302 | fprintf(stderr, "No match on %s\n", g.gl_pathv[i]); 303 | continue; 304 | } 305 | if (strchr(event, '.')) 306 | continue; 307 | 308 | 309 | char *val; 310 | if (read_file(&val, g.gl_pathv[i])) { 311 | fprintf(stderr, "Cannot read %s\n", g.gl_pathv[i]); 312 | continue; 313 | } 314 | char *s; 315 | for (s = val; *s; s++) { 316 | if (*s == '\n') 317 | *s = 0; 318 | } 319 | char *val2; 320 | asprintf(&val2, "%s/%s/", pmu, val); 321 | free(val); 322 | 323 | char *buf; 324 | asprintf(&buf, "%s/%s/", pmu, event); 325 | ret = func(data, buf, val2, ""); 326 | free(val2); 327 | free(buf); 328 | if (ret) 329 | break; 330 | } 331 | globfree(&g); 332 | return ret; 333 | } 334 | 335 | /* Should cache pmus. Caller must free return value. 
*/ 336 | char *resolve_pmu(int type) 337 | { 338 | glob_t g; 339 | if (glob("/sys/devices/*/type", 0, NULL, &g)) 340 | return NULL; 341 | int i; 342 | char *pmun = NULL; 343 | for (i = 0; i < g.gl_pathc; i++) { 344 | char pmu[30]; 345 | if (sscanf(g.gl_pathv[i], "/sys/devices/%30[^/]/type", pmu) != 1) 346 | continue; 347 | char *numbuf; 348 | int num; 349 | if (read_file(&numbuf, g.gl_pathv[i]) < 0 || 350 | sscanf(numbuf, "%d", &num) != 1) 351 | break; 352 | if (num == type) { 353 | pmun = strdup(pmu); 354 | break; 355 | } 356 | } 357 | globfree(&g); 358 | return pmun; 359 | } 360 | 361 | #ifdef TEST 362 | #include "jevents.h" 363 | int main(int ac, char **av) 364 | { 365 | struct perf_event_attr attr = { 0 }; 366 | int ret = 1; 367 | 368 | if (!av[1]) { 369 | printf("Usage: ... perf-event-to-parse\n"); 370 | exit(1); 371 | } 372 | while (*++av) { 373 | if (jevent_name_to_attr(*av, &attr) < 0) 374 | printf("cannot parse %s\n", *av); 375 | printf("config %llx config1 %llx\n", attr.config, attr.config1); 376 | int fd; 377 | if ((fd = perf_event_open(&attr, 0, -1, -1, 0)) < 0) 378 | perror("perf_event_open"); 379 | else 380 | ret = 0; 381 | close(fd); 382 | } 383 | return ret; 384 | } 385 | #endif 386 | -------------------------------------------------------------------------------- /jevents/session.c: -------------------------------------------------------------------------------- 1 | /* Simple session layer for multiple perf events. */ 2 | /* 3 | * Copyright (c) 2015, Intel Corporation 4 | * Author: Andi Kleen 5 | * All rights reserved. 6 | * 7 | * Redistribution and use in source and binary forms, with or without 8 | * modification, are permitted provided that the following conditions are met: 9 | * 10 | * 1. Redistributions of source code must retain the above copyright notice, 11 | * this list of conditions and the following disclaimer. 12 | * 13 | * 2. 
Redistributions in binary form must reproduce the above copyright 14 | * notice, this list of conditions and the following disclaimer in the 15 | * documentation and/or other materials provided with the distribution. 16 | * 17 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 18 | * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 19 | * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 20 | * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 21 | * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 22 | * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 23 | * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 24 | * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 25 | * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 26 | * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 27 | * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED 28 | * OF THE POSSIBILITY OF SUCH DAMAGE. 29 | */ 30 | 31 | #include 32 | #include 33 | #include 34 | #include 35 | #include 36 | #include 37 | #include 38 | #include 39 | #include 40 | #include "jevents.h" 41 | #include "jsession.h" 42 | 43 | /** 44 | * alloc_eventlist - Alloc a list of events. 
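The event list below stores per-CPU file descriptors in a tail array allocated together with the event itself (see new_event() further down, which callocs sizeof(struct event) plus one struct efd per CPU). A minimal sketch of that single-allocation pattern, with toy stand-in types:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for struct efd / struct event; illustration only. */
struct efd_demo { int fd; };
struct event_demo {
    struct event_demo *next;
    struct efd_demo efd[];   /* C99 flexible array member */
};

/* One calloc covers the header plus the per-CPU tail, zero-initialized,
 * as new_event() does for the real types. */
static struct event_demo *alloc_event_demo(int num_cpus)
{
    return calloc(1, sizeof(struct event_demo) +
                     sizeof(struct efd_demo) * num_cpus);
}
```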
45 | */ 46 | 47 | struct eventlist *alloc_eventlist(void) 48 | { 49 | struct eventlist *el = calloc(sizeof(struct eventlist), 1); 50 | if (!el) 51 | return NULL; 52 | el->num_cpus = sysconf(_SC_NPROCESSORS_CONF); 53 | return el; 54 | } 55 | 56 | static struct event *new_event(struct eventlist *el, char *s) 57 | { 58 | struct event *e = calloc(sizeof(struct event) + 59 | sizeof(struct efd) * el->num_cpus, 1); 60 | e->next = NULL; 61 | if (!el->eventlist) 62 | el->eventlist = e; 63 | if (el->eventlist_last) 64 | el->eventlist_last->next = e; 65 | el->eventlist_last = e; 66 | e->event = strdup(s); 67 | return e; 68 | } 69 | 70 | /** 71 | * parse_events - parse a perf style string with events 72 | * @el: List of events allocated earlier 73 | * @events: Comma separated lists of events. {} style groups are legal. 74 | * 75 | * JSON events are supported, if the event lists are downloaded first. 76 | */ 77 | int parse_events(struct eventlist *el, char *events) 78 | { 79 | char *s, *tmp; 80 | 81 | events = strdup(events); 82 | if (! 
events) return -1; 83 | for (s = strtok_r(events, ",", &tmp); 84 | s; 85 | s = strtok_r(NULL, ",", &tmp)) { 86 | bool group_leader = false, end_group = false; 87 | int len; 88 | 89 | if (s[0] == '{') { 90 | s++; 91 | group_leader = true; 92 | } else if (len = strlen(s), len > 0 && s[len - 1] == '}') { 93 | s[len - 1] = 0; 94 | end_group = true; 95 | } 96 | 97 | struct event *e = new_event(el, s); 98 | e->uncore = jevent_pmu_uncore(s); 99 | e->group_leader = group_leader; 100 | e->end_group = end_group; 101 | if (resolve_event(s, &e->attr) < 0) { 102 | fprintf(stderr, "Cannot resolve %s\n", e->event); 103 | return -1; 104 | } 105 | } 106 | free(events); 107 | return 0; 108 | } 109 | 110 | static bool cpu_online(int i) 111 | { 112 | bool ret = false; 113 | char fn[100]; 114 | sprintf(fn, "/sys/devices/system/cpu/cpu%d/online", i); 115 | int fd = open(fn, O_RDONLY); 116 | if (fd >= 0) { 117 | char buf[128]; 118 | int n = read(fd, buf, 128); 119 | if (n > 0 && !strncmp(buf, "1", 1)) 120 | ret = true; 121 | close(fd); 122 | } 123 | return ret; 124 | } 125 | 126 | /** 127 | * setup_event - Create perf descriptor for a single event. 128 | * @e: Event to measure. 129 | * @cpu: CPU to measure. 130 | * @leader: Leader event to define a group. 131 | * @measure_all: If true measure all processes (may need root) 132 | * @measure_pid: If not -1 measure specific process. 133 | * 134 | * This is a low level function. Normally setup_events() should be used. 135 | * Return -1 on failure. 136 | */ 137 | 138 | int setup_event(struct event *e, int cpu, struct event *leader, 139 | bool measure_all, int measure_pid) 140 | { 141 | e->attr.inherit = 1; 142 | if (!measure_all) { 143 | e->attr.disabled = 1; 144 | e->attr.enable_on_exec = 1; 145 | } 146 | e->attr.read_format |= PERF_FORMAT_TOTAL_TIME_ENABLED | 147 | PERF_FORMAT_TOTAL_TIME_RUNNING; 148 | 149 | e->efd[cpu].fd = perf_event_open(&e->attr, 150 | measure_all ? -1 : measure_pid, 151 | cpu, 152 | leader ? 
leader->efd[cpu].fd : -1, 153 | 0); 154 | 155 | if (e->efd[cpu].fd < 0) { 156 | /* Handle offline CPU */ 157 | if (errno == EINVAL && !cpu_online(cpu)) 158 | return 0; 159 | 160 | fprintf(stderr, "Cannot open perf event for %s/%d: %s\n", 161 | e->event, cpu, strerror(errno)); 162 | return -1; 163 | } 164 | return 0; 165 | } 166 | 167 | /** 168 | * setup_events - Set up perf events for an event list. 169 | * @el: List of events, allocated and parsed earlier. 170 | * @measure_all: If true measure all of the system (may need root) 171 | * @measure_pid: If not -1 measure pid. 172 | * 173 | * Return -1 on failure, otherwise 0. 174 | */ 175 | 176 | int setup_events(struct eventlist *el, bool measure_all, int measure_pid) 177 | { 178 | struct event *e, *leader = NULL; 179 | int i; 180 | int err = 0; 181 | int ret; 182 | 183 | for (e = el->eventlist; e; e = e->next) { 184 | if (e->uncore) { 185 | /* XXX for every socket. for now just 0. */ 186 | ret = setup_event(e, 0, leader, measure_all, measure_pid); 187 | if (ret < 0) { 188 | err = ret; 189 | continue; 190 | } 191 | for (i = 1; i < el->num_cpus; i++) 192 | e->efd[i].fd = -1; 193 | } else { 194 | for (i = 0; i < el->num_cpus; i++) { 195 | ret = setup_event(e, i, leader, 196 | measure_all, 197 | measure_pid); 198 | if (ret < 0) { 199 | err = ret; 200 | continue; 201 | } 202 | } 203 | } 204 | if (e->group_leader) 205 | leader = e; 206 | if (e->end_group) 207 | leader = NULL; 208 | } 209 | return err; 210 | } 211 | 212 | /** 213 | * read_event - Read the value of a single event for one CPU. 214 | * @e: event to read 215 | * @cpu: cpu number to read 216 | * Returns -1 on failure, otherwise 0. 217 | * The value read can be retrieved later with event_scaled_value.
218 | */ 219 | 220 | int read_event(struct event *e, int cpu) 221 | { 222 | int n = read(e->efd[cpu].fd, &e->efd[cpu].val, 3 * 8); 223 | if (n < 0) { 224 | fprintf(stderr, "Error reading from %s/%d: %s\n", 225 | e->event, cpu, strerror(errno)); 226 | return -1; 227 | } 228 | return 0; 229 | } 230 | 231 | /** 232 | * read_all_events - Read the value of all events on all CPUs. 233 | * @el: eventlist. Must be allocated, parsed, set up earlier. 234 | * Returns -1 on failure, otherwise 0. 235 | */ 236 | 237 | int read_all_events(struct eventlist *el) 238 | { 239 | struct event *e; 240 | int i; 241 | 242 | for (e = el->eventlist; e; e = e->next) { 243 | if (e->uncore) { 244 | /* XXX all sockets */ 245 | if (e->efd[0].fd < 0) 246 | continue; 247 | if (read_event(e, 0) < 0) 248 | return -1; 249 | } 250 | for (i = 0; i < el->num_cpus; i++) { 251 | if (e->efd[i].fd < 0) 252 | continue; 253 | if (read_event(e, i) < 0) 254 | return -1; 255 | } 256 | } 257 | return 0; 258 | } 259 | 260 | /** 261 | * event_scaled_value - Retrieve a read value for a cpu 262 | * @e: Event 263 | * @cpu: CPU number 264 | * Return scaled value read earlier.
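Because read_event() requests PERF_FORMAT_TOTAL_TIME_ENABLED|RUNNING, the 3*8-byte read yields val[0]=raw count, val[1]=time enabled, val[2]=time running, and event_scaled_value() below extrapolates a multiplexed count by enabled/running. A minimal sketch of that scaling, with a demo name rather than the library function:

```c
#include <assert.h>
#include <stdint.h>

/* Extrapolate a multiplexed counter: if the event ran for only part of
 * the enabled time, scale the raw count by enabled/running, as
 * event_scaled_value() does. Demo helper, illustration only. */
static uint64_t scale_demo(const uint64_t val[3])
{
    if (val[1] != val[2] && val[2])
        return (uint64_t)(val[0] * (double)val[1] / (double)val[2]);
    return val[0];
}
```

If the event was scheduled in for only half the enabled time, the raw count is doubled; if it ran the whole time, the raw count is returned unchanged.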
265 | */ 266 | uint64_t event_scaled_value(struct event *e, int cpu) 267 | { 268 | uint64_t *val = e->efd[cpu].val; 269 | if (val[1] != val[2] && val[2]) 270 | return val[0] * (double)val[1] / (double)val[2]; 271 | return val[0]; 272 | } 273 | -------------------------------------------------------------------------------- /jevents/showevent.c: -------------------------------------------------------------------------------- 1 | /* Resolve perf event descriptions with symbolic names to raw perf descriptions */ 2 | #include "jevents.h" 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | 9 | int main(int ac, char **av) 10 | { 11 | int test = 0; 12 | int ret = 0; 13 | 14 | while (*++av) { 15 | if (!strcmp(*av, "--test")) { 16 | test = 1; 17 | continue; 18 | } 19 | 20 | struct perf_event_attr attr; 21 | if (resolve_event(*av, &attr) < 0) { 22 | fprintf(stderr, "Cannot resolve %s\n", *av); 23 | ret = 1; 24 | continue; 25 | } 26 | char *ev = format_raw_event(&attr, *av); 27 | printf("%s\n", ev); 28 | free(ev); 29 | if (test) { 30 | if (perf_event_open(&attr, 0, -1, -1, 0) < 0) 31 | perror("perf_event_open"); 32 | } 33 | } 34 | return ret; 35 | } 36 | -------------------------------------------------------------------------------- /jevents/tester: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # tests for jevents 3 | # may need to run event_download.py first to get the event list for this cpu 4 | set -e 5 | set -x 6 | 7 | failed() { 8 | echo FAILED 9 | } 10 | trap failed ERR 0 11 | 12 | PATH=.:./examples:$PATH 13 | 14 | 15 | $V listevents > l 16 | [ $(wc -l < l) -gt 50 ] 17 | grep -q offcore_response l 18 | 19 | if grep -q br_misp_retired.taken l ; then 20 | E=br_misp_retired.taken 21 | elif grep -q br_misp_retired.near_taken l ; then 22 | E=br_misp_retired.near_taken 23 | else 24 | E=instructions 25 | fi 26 | 27 | $V jestat true 28 | $V jestat -e cpu-cycles,cpu_clk_unhalted.ref_tsc,$E true 29 | $V
jestat -e "{cpu-cycles,cpu_clk_unhalted.ref_tsc},{$E,cache-references}" -a sleep 1 30 | $V jestat -a sleep 1 31 | 32 | # test all events 33 | LEN=$(wc -l l | awk ' { print $1 }') 34 | INC=20 35 | for ((i = 1; i <= LEN; i += INC)) ; do 36 | # skip i915/vcs-* which often returns ENODEV for no good reason 37 | $V jestat $(nl l | 38 | egrep -v 'i915/vcs' | 39 | awk -v v=$i -v inc=$INC '$1 >= v && $1 <= v+inc { print "-e " $2 } ') -a true 40 | done 41 | 42 | $V showevent $E 43 | 44 | $V event-rmap $E 45 | 46 | $V examples/addr 47 | examples/rtest 48 | examples/rtest2 49 | 50 | trap "" ERR 0 51 | 52 | echo SUCCEEDED 53 | 54 | 55 | -------------------------------------------------------------------------------- /jevents/util.h: -------------------------------------------------------------------------------- 1 | #ifdef __cplusplus 2 | extern "C" { 3 | #endif 4 | 5 | #define err(x) perror(x), exit(1) 6 | #define mb() asm volatile("" ::: "memory") 7 | #define MB (1024*1024) 8 | typedef unsigned long long u64; 9 | typedef long long s64; 10 | 11 | #ifdef __cplusplus 12 | } 13 | #endif 14 | -------------------------------------------------------------------------------- /main.c: -------------------------------------------------------------------------------- 1 | #define _GNU_SOURCE 2 | 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | 9 | #include 10 | #include 11 | #include 12 | 13 | #include "common.h" 14 | 15 | #include "cycle-timer.h" 16 | #include "huge-alloc.h" 17 | #include "perf-timer.h" 18 | 19 | #include 20 | #include 21 | 22 | #include 23 | 24 | bool verbose; 25 | bool summary; 26 | 27 | int getenv_int(const char *var, int def) { 28 | const char *val = getenv(var); 29 | return val ? 
atoi(val) : def; 30 | } 31 | 32 | bool getenv_bool(const char *var) { 33 | const char *val = getenv(var); 34 | return val && strcmp(val, "1") == 0; 35 | } 36 | 37 | 38 | typedef struct { 39 | const char *name; 40 | store_function *f; 41 | const char *desc; 42 | // expected number of L1 and L2 hits (L2 hits implying L1 misses) 43 | int l1_hits; 44 | int l2_hits; 45 | // a multiplier applied to the number of "kernel loops" (accesses the test is 46 | // requested to perform), useful for tests that have significant startup overhead 47 | int loop_mul; 48 | } test_description; 49 | 50 | 51 | 52 | 53 | // /---------- l1_hits 54 | // | /------- l2_hits 55 | // | | /--- loop_mul 56 | const test_description all_funcs[] = { // v v v 57 | { "interleaved" , writes_inter , "basic interleaved stores (1 fixed 1 variable)" , 1, 1, 1}, 58 | { "interleaved-pf-fixed" , writes_inter_pf_fixed , "interleaved with fixed region prefetch" , 1, 1, 1}, 59 | { "interleaved-pf-var" , writes_inter_pf_var , "interleaved with variable region prefetch" , 1, 1, 1}, 60 | { "interleaved-pf-both" , writes_inter_pf_both , "interleaved with both region prefetch" , 1, 1, 1}, 61 | { "interleaved-u2" , writes_inter_u2 , "interleaved unrolled by 2x" , 1, 1, 1}, 62 | { "interleaved-u4" , writes_inter_u4 , "interleaved unrolled by 4x" , 1, 1, 1}, 63 | { "interleaved-sfenceA" , writes_inter_sfenceA , "interleaved with 1 sfence" , 1, 1, 1}, 64 | { "interleaved-sfenceB" , writes_inter_sfenceB , "interleaved with 1 sfence" , 1, 1, 1}, 65 | { "interleaved-sfenceC" , writes_inter_sfenceC , "interleaved with 2 sfences" , 1, 1, 1}, 66 | { "wrandom1" , write_random_single , "single region random stores " , 1, 1, 1}, 67 | { "wrandom1-unroll" , write_random_singleu , "wrandom1 but unrolled and fast/cheaty RNG" , 1, 1, 1}, 68 | { "wlinear1" , write_linear , "linear 64B stride writes over one stream" , 1, 1, 1}, 69 | { "wlinearHL" , write_linearHL , "linear with lfence" , 1, 1, 1}, 70 | { "wlinearHS" , write_linearHS ,
"linear with sfence" , 1, 1, 1}, 71 | { "wlinear1-sfence" , write_linear_sfence , "linear with sfence" , 1, 1, 1}, 72 | { "rlinear1" , read_linear , "linear 64B stride reads over one region" , 1, 1, 1}, 73 | { "lcg" , random_lcg , "raw LCG test" , 1, 1, 1}, 74 | { "pcg" , random_pcg , "raw PCG test" , 1, 1, 1}, 75 | {} // sentinel 76 | }; 77 | 78 | 79 | void flushmem(void *p, size_t size) { 80 | for (size_t i = 0; i < size; i += 64) { 81 | _mm_clflush((char *)p + i); 82 | } 83 | _mm_mfence(); 84 | } 85 | 86 | char *alloc(size_t size) { 87 | // static volatile int zero; 88 | size_t grossed_up = size * 2 + 1000; 89 | char *p = huge_alloc(grossed_up, !summary); 90 | memset(p, 0x13, grossed_up); 91 | 92 | flushmem(p, grossed_up); 93 | 94 | return p; 95 | } 96 | 97 | void pinToCpu(int cpu) { 98 | cpu_set_t set; 99 | CPU_ZERO(&set); 100 | CPU_SET(cpu, &set); 101 | if (sched_setaffinity(0, sizeof(set), &set)) { 102 | assert("pinning failed" && false); 103 | } 104 | } 105 | 106 | 107 | void usageError() { 108 | fprintf(stderr, 109 | "Usage:\n" 110 | "\tbench TEST_NAME\n" 111 | "\n TEST_NAME is one of:\n\n" 112 | ); 113 | 114 | for (const test_description* desc = all_funcs; desc->name; desc++) { 115 | printf(" %s\n\t%s\n", desc->name, desc->desc); 116 | } 117 | exit(EXIT_FAILURE); 118 | } 119 | 120 | /* dump the tests in a single space-separated string, perhaps convenient so you can do something like: 121 | for test in $(DUMPTESTS=1 ./bench); do ./bench $test; done 122 | to run all tests.
*/ 123 | void dump_tests() { 124 | for (const test_description* desc = all_funcs; desc->name; desc++) { 125 | printf("%s ", desc->name); 126 | } 127 | } 128 | 129 | size_t parse_kib(const char* strval) { 130 | int val = atoi(strval); 131 | if (val <= 0) { 132 | fprintf(stderr, "bad start/stop value: '%s'\n", strval); 133 | usageError(); 134 | } 135 | size_t ret = 1; 136 | while (ret < (size_t)val) { 137 | ret *= 2; 138 | } 139 | return ret; 140 | } 141 | 142 | /* return the alignment of the given pointer 143 | i.e., the largest power of two that divides it */ 144 | size_t get_alignment(void *p) { 145 | return (size_t)((1UL << __builtin_ctzl((uintptr_t)p))); 146 | } 147 | 148 | void update_min_counts(bool first, event_counts* min, event_counts cur) { 149 | for (size_t i = 0; i < MAX_EVENTS; i++) { 150 | min->counts[i] = (first || cur.counts[i] < min->counts[i] ? cur.counts[i] : min->counts[i]); 151 | } 152 | } 153 | 154 | void print_count_deltas(event_counts delta, uint64_t divisor, const char *format, const char *delim) { 155 | for (size_t i=0; i < num_counters(); i++) { 156 | if (i) printf("%s", delim); 157 | printf(format, (double)delta.counts[i] / divisor); 158 | } 159 | } 160 | 161 | int main(int argc, char** argv) { 162 | 163 | summary = getenv_bool("SUMMARY"); 164 | verbose = !summary; // && getenv_bool("W_VERBOSE"); 165 | bool dump_tests_flag = getenv_bool("DUMPTESTS"); 166 | bool allow_alias = getenv_bool("ALLOW_ALIAS"); 167 | bool plot = getenv_bool("PLOT"); // output the data in csv format suitable for plotting 168 | bool do_list_events = getenv_bool("LIST_EVENTS"); // list the events and quit 169 | 170 | int array1_kib = getenv_int("ARRAY1_SIZE", 128); 171 | int pincpu = getenv_int("PINCPU", 1); 172 | int iter_base = getenv_int("ITERBASE", 100); 173 | 174 | const char* counter_string = getenv("CPU_COUNTERS"); 175 | bool use_counters = counter_string != 0; 176 | 177 | if (dump_tests_flag) { 178 | dump_tests(); 179 | exit(EXIT_SUCCESS); 180 | } 181 | 182 | 
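The helpers above are small enough to sketch in isolation: parse_kib() rounds the requested KiB value up to a power of two, and get_alignment() reports the largest power of two dividing a pointer via the count of trailing zero bits. The demo names below are hypothetical; the logic mirrors the functions above (GCC/Clang builtin assumed, as in the original).

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Round a positive value up to the next power of two, as parse_kib()
 * does after atoi. Demo helper. */
static size_t round_up_pow2_demo(size_t val)
{
    size_t ret = 1;
    while (ret < val)
        ret *= 2;
    return ret;
}

/* Largest power of two dividing p, i.e. its alignment, as
 * get_alignment() computes with __builtin_ctzl. Demo helper;
 * p must be nonzero. */
static size_t alignment_demo(uintptr_t p)
{
    return (size_t)1 << __builtin_ctzl(p);
}
```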
if (do_list_events) { 183 | list_events(); 184 | exit(EXIT_SUCCESS); 185 | } 186 | 187 | if (argc != 2 && argc != 4) { 188 | fprintf(stderr, "Must provide 1 or 3 arguments\n\n"); 189 | usageError(); 190 | } 191 | 192 | size_t start_kib = (argc == 4 ? parse_kib(argv[2]) : 4); 193 | size_t stop_kib = (argc == 4 ? parse_kib(argv[3]) : 512); 194 | 195 | const char* fname = argc >= 2 ? argv[1] : all_funcs[0].name; // first func is the default 196 | const test_description *test = 0; 197 | for (const test_description* desc = all_funcs; desc->name; desc++) { 198 | if (strcmp(desc->name, fname) == 0) { 199 | test = desc; 200 | break; 201 | } 202 | } 203 | 204 | if (!test) { 205 | fprintf(stderr, "Bad test name: %s\n", fname); 206 | usageError(); 207 | } 208 | 209 | pinToCpu(pincpu); 210 | 211 | cl_init(!summary); 212 | 213 | // run the whole test repeat_count times, each of which calls the test function iters times, 214 | // and each test function should loop kernel_loops times (with 1 or 2 stores), or equivalent with unrolling 215 | unsigned repeat_count = 10; 216 | size_t iters = iter_base; 217 | size_t output1_size = array1_kib * 1024; // in bytes 218 | 219 | char *output1 = alloc(output1_size * 2); 220 | // adjust the second array by a page to avoid aliasing (not 4K aliasing though), unless allow_alias is set 221 | char *output2 = alloc(stop_kib * 1024 * 2) + (allow_alias ? 
0 : 4096); 222 | 223 | if (!summary) { 224 | fprintf(stderr, "Running test %s : %s\n", test->name, test->desc); 225 | } 226 | 227 | if (verbose) { 228 | fprintf(stderr, "pinned to cpu : %10d\n", pincpu); 229 | fprintf(stderr, "current cpu : %10d\n", sched_getcpu()); 230 | fprintf(stderr, "array1 size : %10zu KiB\n", output1_size / 1024); 231 | fprintf(stderr, "array1 align : %10zu\n", get_alignment(output1)); 232 | fprintf(stderr, "array2 start : %10zu KiB\n", start_kib); 233 | fprintf(stderr, "array2 stop : %10zu KiB\n", stop_kib); 234 | fprintf(stderr, "array2 align : %10zu\n", get_alignment(output2)); 235 | } 236 | 237 | if (use_counters) { 238 | setup_counters(!plot, counter_string); 239 | } 240 | 241 | if (!summary) { 242 | fprintf(stderr, "Starting main loop after %zu ms\n", (size_t)clock() * 1000u / CLOCKS_PER_SEC); 243 | } 244 | 245 | if (plot) { 246 | printf("array2 KiB,cycles/iter"); 247 | } else { 248 | printf("array2 KiB | iter | cyc/iter"); 249 | } 250 | print_counter_headings(plot ? ",%s" : " | %12s"); 251 | printf("\n"); 252 | 253 | for (size_t array2_kib = start_kib; array2_kib <= stop_kib; array2_kib *= 2) { 254 | size_t array2_size = array2_kib * 1024; 255 | size_t max_size = output1_size > array2_size ? 
output1_size : array2_size; 256 | size_t kernel_loops = 2 * max_size * test->loop_mul / 64; 257 | double total_iters = (double)kernel_loops * iters; 258 | double min_cycles = UINT64_MAX, max_cycles = 0; 259 | event_counts min_counts = {}; 260 | for (unsigned repeat = 0; repeat < repeat_count; repeat++) { 261 | event_counts counts_before = read_counters(); 262 | cl_timepoint start = cl_now(); 263 | for (int c = iters; c-- > 0;) { 264 | test->f(kernel_loops, output1, output1_size, output2, array2_size); 265 | _mm_lfence(); // prevent inter-iteration overlap 266 | } 267 | cl_timepoint end = cl_now(); 268 | event_counts counts_after = read_counters(); 269 | cl_interval delta_nanos = cl_delta(start, end); 270 | double cycles = cl_to_cycles(delta_nanos); 271 | event_counts counts_delta = calc_delta(counts_before, counts_after); 272 | if (!plot) { 273 | printf("%10zu %6d %10.2f ", array2_kib, repeat, cycles / total_iters); 274 | print_count_deltas(counts_delta, total_iters, " %12.2f", " "); 275 | printf("\n"); 276 | } 277 | min_cycles = fmin(min_cycles, cycles); 278 | max_cycles = fmax(max_cycles, cycles); 279 | update_min_counts(repeat == 0, &min_counts, counts_delta); 280 | fflush(stdout); 281 | } 282 | if (plot) { 283 | printf("%zu,%.4f", array2_kib, min_cycles / total_iters); 284 | print_count_deltas(min_counts, total_iters, ",%.4f", ""); 285 | printf("\n"); 286 | } else { 287 | const char *fmt = "%10zu %6s %10.2f "; 288 | printf(fmt, array2_kib, "min", min_cycles / total_iters); 289 | print_count_deltas(min_counts, total_iters, " %12.2f", " "); 290 | printf("\n"); 291 | printf(fmt, array2_kib, "max", max_cycles / total_iters); 292 | printf("\n----------------------------------------\n"); 293 | } 294 | } 295 | 296 | 297 | } 298 | 299 | 300 | 301 | -------------------------------------------------------------------------------- /opt-control.h: -------------------------------------------------------------------------------- 1 | 2 | /* 3 | * "sinking" a value instructs 
the compiler to calculate it, i.e., 4 | * makes the compiler believe that the value is necessary and hence 5 | * must be calculated. The actual sink implementation is empty and 6 | * so usually leaves no trace in the generated code except that the 7 | * value will be calculated. 8 | */ 9 | static inline void sink(int x) { 10 | __asm__ volatile ("" :: "r"(x) :); 11 | } 12 | 13 | /* 14 | * Similar to sink except that it sinks the content pointed to 15 | * by the pointer, so the compiler will materialize in memory 16 | * anything pointed to by the pointer. 17 | */ 18 | static inline void sink_ptr(void *p) { 19 | __asm__ volatile ("" :: "r"(p) : "memory"); 20 | } 21 | -------------------------------------------------------------------------------- /page-info.c: -------------------------------------------------------------------------------- 1 | /* 2 | * smaps.c 3 | * 4 | * Created on: Jan 31, 2017 5 | * Author: tdowns 6 | */ 7 | 8 | #include "page-info.h" 9 | 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | 23 | 24 | #define PM_PFRAME_MASK ((1ULL << 55) - 1) 25 | #define PM_SOFT_DIRTY (1ULL << 55) 26 | #define PM_MMAP_EXCLUSIVE (1ULL << 56) 27 | #define PM_FILE (1ULL << 61) 28 | #define PM_SWAP (1ULL << 62) 29 | #define PM_PRESENT (1ULL << 63) 30 | 31 | 32 | /** bundles a flag with its description */ 33 | typedef struct { 34 | int flag_num; 35 | char const *name; 36 | bool show_default; 37 | } flag; 38 | 39 | #define FLAG_SHOW(name) { KPF_ ## name, # name, true }, 40 | #define FLAG_HIDE(name) { KPF_ ## name, # name, false }, 41 | 42 | const flag kpageflag_defs[] = { 43 | FLAG_SHOW(LOCKED ) 44 | FLAG_HIDE(ERROR ) 45 | FLAG_HIDE(REFERENCED ) 46 | FLAG_HIDE(UPTODATE ) 47 | FLAG_HIDE(DIRTY ) 48 | FLAG_HIDE(LRU ) 49 | FLAG_SHOW(ACTIVE ) 50 | FLAG_SHOW(SLAB ) 51 | FLAG_HIDE(WRITEBACK ) 52 | FLAG_HIDE(RECLAIM ) 53 | FLAG_SHOW(BUDDY ) 54 | FLAG_SHOW(MMAP
) 55 | FLAG_SHOW(ANON ) 56 | FLAG_SHOW(SWAPCACHE ) 57 | FLAG_SHOW(SWAPBACKED ) 58 | FLAG_SHOW(COMPOUND_HEAD) 59 | FLAG_SHOW(COMPOUND_TAIL) 60 | FLAG_SHOW(HUGE ) 61 | FLAG_SHOW(UNEVICTABLE ) 62 | FLAG_SHOW(HWPOISON ) 63 | FLAG_SHOW(NOPAGE ) 64 | FLAG_SHOW(KSM ) 65 | FLAG_SHOW(THP ) 66 | /* older kernels won't have these new flags, so conditionally compile in support for them */ 67 | #ifdef KPF_BALLOON 68 | FLAG_SHOW(BALLOON ) 69 | #endif 70 | #ifdef KPF_ZERO_PAGE 71 | FLAG_SHOW(ZERO_PAGE ) 72 | #endif 73 | #ifdef KPF_IDLE 74 | FLAG_SHOW(IDLE ) 75 | #endif 76 | 77 | { -1, 0, false } // sentinel 78 | }; 79 | 80 | #define kpageflag_count (sizeof(kpageflag_defs)/sizeof(kpageflag_defs[0]) - 1) 81 | 82 | #define ITERATE_FLAGS for (flag const *f = kpageflag_defs; f->flag_num != -1; f++) 83 | 84 | 85 | // x-macro for doing some operation on all the pagemap flags 86 | #define PAGEMAP_X(fn) \ 87 | fn(softdirty ) \ 88 | fn(exclusive ) \ 89 | fn(file ) \ 90 | fn(swapped ) \ 91 | fn(present ) 92 | 93 | static unsigned get_page_size() { 94 | long psize = sysconf(_SC_PAGESIZE); 95 | assert(psize >= 1 && psize <= UINT_MAX); 96 | return (unsigned)psize; 97 | } 98 | 99 | /* round the given pointer down to the page boundary (i.e., return a pointer to the page it lives in) */ 100 | static inline void *pagedown(void *p, unsigned psize) { 101 | return (void *)(((uintptr_t)p) & -(uintptr_t)psize); 102 | } 103 | 104 | /** 105 | * Extract the interesting info from a 64-bit pagemap value, and return it as a page_info.
106 | */ 107 | page_info extract_info(uint64_t bits) { 108 | page_info ret = {}; 109 | ret.pfn = bits & PM_PFRAME_MASK; 110 | ret.softdirty = bits & PM_SOFT_DIRTY; 111 | ret.exclusive = bits & PM_MMAP_EXCLUSIVE; 112 | ret.file = bits & PM_FILE; 113 | ret.swapped = bits & PM_SWAP; 114 | ret.present = bits & PM_PRESENT; 115 | return ret; 116 | } 117 | 118 | /* print page_info to the given file */ 119 | void fprint_info(FILE* f, page_info info) { 120 | fprintf(f, 121 | "PFN: %p\n" 122 | "softdirty = %d\n" 123 | "exclusive = %d\n" 124 | "file = %d\n" 125 | "swapped = %d\n" 126 | "present = %d\n", 127 | (void*)info.pfn, 128 | info.softdirty, 129 | info.exclusive, 130 | info.file, 131 | info.swapped, 132 | info.present); 133 | } 134 | 135 | void print_info(page_info info) { 136 | fprint_info(stdout, info); 137 | } 138 | 139 | flag_count get_flag_count(page_info_array infos, int flag_num) { 140 | flag_count ret = {}; 141 | 142 | if (flag_num < 0 || flag_num > 63) { 143 | return ret; 144 | } 145 | 146 | uint64_t flag = (1ULL << flag_num); 147 | 148 | ret.flag = flag_num; 149 | ret.pages_total = infos.num_pages; 150 | 151 | for (size_t i = 0; i < infos.num_pages; i++) { 152 | page_info info = infos.info[i]; 153 | if (info.kpageflags_ok) { 154 | ret.pages_set += (info.kpageflags & flag) == flag; 155 | ret.pages_available++; 156 | } 157 | } 158 | return ret; 159 | } 160 | 161 | /** 162 | * Print the table header that lines up with the tabular format used by the "table" printing 163 | * functions. Called by fprint_ratios, or you can call it yourself if you want to prefix the 164 | * output with your own columns.
165 | */ 166 | void fprint_info_header(FILE *file) { 167 | fprintf(file, " PFN sdirty excl file swappd presnt "); 168 | ITERATE_FLAGS { if (f->show_default) fprintf(file, "%4.4s ", f->name); } 169 | fprintf(file, "\n"); 170 | } 171 | 172 | /* print one info in a tabular format (as a single row) */ 173 | void fprint_info_row(FILE *file, page_info info) { 174 | fprintf(file, "%12p %7d%7d%7d%7d%7d ", 175 | (void*)info.pfn, 176 | info.softdirty, 177 | info.exclusive, 178 | info.file, 179 | info.swapped, 180 | info.present); 181 | 182 | if (info.kpageflags_ok) { 183 | ITERATE_FLAGS { if (f->show_default) fprintf(file, "%4d ", !!(info.kpageflags & (1ULL << f->flag_num))); } 184 | } 185 | fprintf(file, "\n"); 186 | } 187 | 188 | #define DECLARE_ACCUM(name) size_t name ## _accum = 0; 189 | #define INCR_ACCUM(name) name ## _accum += info->name; 190 | #define PRINT_ACCUM(name) fprintf(file, "%7.4f", (double)name ## _accum / infos.num_pages); 191 | 192 | 193 | void fprint_ratios_noheader(FILE *file, page_info_array infos) { 194 | PAGEMAP_X(DECLARE_ACCUM); 195 | size_t total_kpage_ok = 0; 196 | size_t flag_totals[kpageflag_count] = {}; 197 | for (size_t p = 0; p < infos.num_pages; p++) { 198 | page_info *info = &infos.info[p]; 199 | PAGEMAP_X(INCR_ACCUM); 200 | if (info->kpageflags_ok) { 201 | total_kpage_ok++; 202 | int i = 0; 203 | ITERATE_FLAGS { 204 | flag_totals[i++] += !!(info->kpageflags & (1ULL << f->flag_num)); 205 | } 206 | } 207 | } 208 | 209 | fprintf(file, "%12s ", "----------"); // write to the given file, not stdout 210 | PAGEMAP_X(PRINT_ACCUM) 211 | 212 | int i = 0; 213 | if (total_kpage_ok > 0) { 214 | ITERATE_FLAGS { 215 | if (f->show_default) fprintf(file, " %4.2f", (double)flag_totals[i] / total_kpage_ok); 216 | i++; 217 | } 218 | } 219 | fprintf(file, "\n"); 220 | } 221 | 222 | /* 223 | * Print the per-flag ratios for the given infos as a single summary row, preceded by the table header.
224 | */ 225 | void fprint_ratios(FILE *file, page_info_array infos) { 226 | fprint_info_header(file); 227 | fprint_ratios_noheader(file, infos); 228 | } 229 | 230 | /* 231 | * Print a full table for the given infos: the header followed by 232 | * one row per page. 233 | */ 234 | void fprint_table(FILE *f, page_info_array infos) { 235 | fprintf(f, "%zu total pages\n", infos.num_pages); 236 | fprint_info_header(f); 237 | for (size_t p = 0; p < infos.num_pages; p++) { 238 | fprint_info_row(f, infos.info[p]); 239 | } 240 | } 241 | 242 | 243 | 244 | /** 245 | * Get info for a single page indicated by the given pointer (which may point anywhere in the page) 246 | */ 247 | page_info get_page_info(void *p) { 248 | unsigned psize = get_page_size(); 249 | FILE *pagemap_file = fopen("/proc/self/pagemap", "rb"); 250 | if (!pagemap_file) err(EXIT_FAILURE, "failed to open pagemap"); 251 | 252 | if (fseek(pagemap_file, (uintptr_t)p / psize * sizeof(uint64_t), SEEK_SET)) err(EXIT_FAILURE, "pagemap seek failed"); 253 | 254 | uint64_t bits; 255 | size_t readc; 256 | if ((readc = fread(&bits, sizeof(bits), 1, pagemap_file)) != 1) err(EXIT_FAILURE, "unexpected fread return: %zu", readc); 257 | 258 | page_info info = extract_info(bits); 259 | 260 | if (info.pfn) { 261 | // we got a pfn, try to read /proc/kpageflags 262 | FILE *kpageflags_file = fopen("/proc/kpageflags", "rb"); 263 | if (!kpageflags_file) { 264 | warn("failed to open kpageflags"); 265 | } else { 266 | if (fseek(kpageflags_file, info.pfn * sizeof(bits), SEEK_SET)) err(EXIT_FAILURE, "kpageflags seek failed"); 267 | if ((readc = fread(&bits, sizeof(bits), 1, kpageflags_file)) != 1) err(EXIT_FAILURE, "unexpected fread return: %zu", readc); 268 | info.kpageflags_ok = true; 269 | info.kpageflags = bits; 270 | fclose(kpageflags_file); 271 | } 272 | } 273 | 274 | fclose(pagemap_file); 275 | 276 | return info; 277 | } 278 | 279 | /** 280 | * Get information for each page in the range from start
(inclusive) to end (exclusive). 281 | */ 282 | page_info_array get_info_for_range(void *start, void *end) { 283 | unsigned psize = get_page_size(); 284 | void *start_page = pagedown(start, psize); 285 | void *end_page = pagedown(end - 1, psize) + psize; 286 | size_t page_count = start < end ? (end_page - start_page) / psize : 0; 287 | assert(page_count == 0 || start_page < end_page); 288 | 289 | page_info *infos = malloc((page_count + 1) * sizeof(page_info)); 290 | 291 | for (size_t p = 0; p < page_count; p++) { 292 | infos[p] = get_page_info((char *)start + p * psize); 293 | } 294 | 295 | return (page_info_array){ page_count, infos }; 296 | } 297 | 298 | void free_info_array(page_info_array infos) { 299 | free(infos.info); 300 | } 301 | 302 | int flag_from_name(char const *name) { 303 | ITERATE_FLAGS { 304 | if (strcasecmp(f->name, name) == 0) { 305 | return f->flag_num; 306 | } 307 | } 308 | return -1; 309 | } 310 | 311 | 312 | -------------------------------------------------------------------------------- /page-info.h: -------------------------------------------------------------------------------- 1 | /* 2 | * page-info.h 3 | */ 4 | 5 | #ifndef PAGE_INFO_H_ 6 | #define PAGE_INFO_H_ 7 | 8 | #include <stdbool.h> 9 | #include <stddef.h> 10 | #include <stdint.h> 11 | #include <stdio.h> 12 | 13 | #ifdef __cplusplus 14 | extern "C" { 15 | #endif 16 | 17 | typedef struct { 18 | /* page frame number: if present, the physical frame for the page */ 19 | uint64_t pfn; 20 | /* soft-dirty set */ 21 | bool softdirty; 22 | /* exclusively mapped, see e.g., https://patchwork.kernel.org/patch/6787921/ */ 23 | bool exclusive; 24 | /* is a file mapping */ 25 | bool file; 26 | /* page is swapped out */ 27 | bool swapped; 28 | /* page is present, i.e., a physical page is allocated */ 29 | bool present; 30 | /* if true, the kpageflags were successfully loaded, if false they were not (and are all zero) */ 31 | bool kpageflags_ok; 32 | /* the 64-bit flag value extracted from /proc/kpageflags only if pfn is non-null */ 33 |
uint64_t kpageflags; 34 | 35 | } page_info; 36 | /* 37 | * Information for a number of virtually consecutive pages. 38 | */ 39 | typedef struct { 40 | /* how many page_info structures are in the array pointed to by info */ 41 | size_t num_pages; 42 | 43 | /* pointer to the array of page_info structures */ 44 | page_info *info; 45 | } page_info_array; 46 | 47 | 48 | typedef struct { 49 | /* the number of pages on which this flag was set, always <= pages_available */ 50 | size_t pages_set; 51 | 52 | /* the number of pages on which information could be obtained */ 53 | size_t pages_available; 54 | 55 | /* the total number of pages examined, which may be greater than pages_available if 56 | * the flag value could not be obtained for some pages (usually because the pfn is not available 57 | * since the page is not yet present or because running as non-root). 58 | */ 59 | size_t pages_total; 60 | 61 | /* the flag the values were queried for */ 62 | int flag; 63 | 64 | } flag_count; 65 | 66 | /** 67 | * Examine the page info in infos to count the number of times a specified /proc/kpageflags flag was set, 68 | * effectively giving you a ratio, so you can say "80% of the pages for this allocation are backed by 69 | * huge pages" or whatever. 70 | * 71 | * The flags *must* come from kpageflags (these are not the same as those in /proc/pid/pagemap) and 72 | * are declared in linux/kernel-page-flags.h. 73 | * 74 | * Ideally, the flag information is available for all the pages in the range, so you can 75 | * say something about the entire range, but this is often not the case because (a) flags 76 | * are not available for pages that aren't present and (b) flags are generally never available 77 | * for non-root users. So the ratio structure indicates both the total number of pages as 78 | * well as the number of pages for which the flag information was available.
79 | */ 80 | flag_count get_flag_count(page_info_array infos, int flag); 81 | 82 | /** 83 | * Given the case-insensitive name of a flag, return the flag number (the index of the bit 84 | * representing this flag), or -1 if the flag is not found. The "names" of the flags are 85 | * the same as the macro names in <linux/kernel-page-flags.h> without the KPF_ prefix. 86 | * 87 | * For example, the name of the transparent hugepages flag is "THP" and the corresponding 88 | * macro is KPF_THP, and the value of this macro and returned by this function is 22. 89 | * 90 | * You can generate the corresponding mask value to check the flag using (1ULL << value). 91 | */ 92 | int flag_from_name(char const *name); 93 | 94 | /** 95 | * Print the info in the page_info structure to stdout. 96 | */ 97 | void print_info(page_info info); 98 | 99 | /** 100 | * Print the info in the page_info structure to the given file. 101 | */ 102 | void fprint_info(FILE* file, page_info info); 103 | 104 | 105 | /** 106 | * Print the table header that lines up with the tabular format used by the "table" printing 107 | * functions. Called by fprint_ratios, or you can call it yourself if you want to prefix the 108 | * output with your own columns. 109 | */ 110 | void fprint_info_header(FILE *file); 111 | 112 | /* print one info in a tabular format (as a single row) */ 113 | void fprint_info_row(FILE *file, page_info info); 114 | 115 | 116 | /** 117 | * Print the ratio for each flag in infos. The ratio is the number of times the flag was set over 118 | * the total number of pages (or the total number of pages for which the information could be obtained). 119 | */ 120 | void fprint_ratios_noheader(FILE *file, page_info_array infos); 121 | /* 122 | * Like fprint_ratios_noheader, but the summary row is preceded by the table header. 123 | */ 124 | void fprint_ratios(FILE *file, page_info_array infos); 125 | 126 | /* 127 | * Print a full table for the given infos: the header followed by 128 | * one row per page.
129 | */ 130 | void fprint_table(FILE *f, page_info_array infos); 131 | 132 | 133 | /** 134 | * Get info for a single page indicated by the given pointer (which may point anywhere in the page). 135 | */ 136 | page_info get_page_info(void *p); 137 | 138 | /** 139 | * Get information for each page in the range from start (inclusive) to end (exclusive). 140 | */ 141 | page_info_array get_info_for_range(void *start, void *end); 142 | 143 | /** 144 | * Free the memory associated with the given page_info_array. You shouldn't use it after this call. 145 | */ 146 | void free_info_array(page_info_array infos); 147 | 148 | #ifdef __cplusplus 149 | } 150 | #endif 151 | 152 | #endif /* PAGE_INFO_H_ */ 153 | -------------------------------------------------------------------------------- /pcg_basic.c.h: -------------------------------------------------------------------------------- 1 | /* 2 | * PCG Random Number Generation for C. 3 | * 4 | * Copyright 2014 Melissa O'Neill 5 | * 6 | * Licensed under the Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License. 8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | * 18 | * For additional information about the PCG random number generation scheme, 19 | * including its license and other licensing options, visit 20 | * 21 | * http://www.pcg-random.org 22 | */ 23 | 24 | /* 25 | * This code is derived from the full C implementation, which is in turn 26 | * derived from the canonical C++ PCG implementation. 
The C++ version 27 | * has many additional features and is preferable if you can use C++ in 28 | * your project. 29 | */ 30 | 31 | #include "pcg_basic.h" 32 | 33 | // state for global RNGs 34 | 35 | static pcg32_random_t pcg32_global = PCG32_INITIALIZER; 36 | 37 | // pcg32_srandom(initstate, initseq) 38 | // pcg32_srandom_r(rng, initstate, initseq): 39 | // Seed the rng. Specified in two parts, state initializer and a 40 | // sequence selection constant (a.k.a. stream id) 41 | 42 | void pcg32_srandom_r(pcg32_random_t* rng, uint64_t initstate, uint64_t initseq) 43 | { 44 | rng->state = 0U; 45 | rng->inc = (initseq << 1u) | 1u; 46 | pcg32_random_r(rng); 47 | rng->state += initstate; 48 | pcg32_random_r(rng); 49 | } 50 | 51 | void pcg32_srandom(uint64_t seed, uint64_t seq) 52 | { 53 | pcg32_srandom_r(&pcg32_global, seed, seq); 54 | } 55 | 56 | // pcg32_random() 57 | // pcg32_random_r(rng) 58 | // Generate a uniformly distributed 32-bit random number 59 | 60 | uint32_t pcg32_random_r(pcg32_random_t* rng) 61 | { 62 | uint64_t oldstate = rng->state; 63 | rng->state = oldstate * 6364136223846793005ULL + rng->inc; 64 | uint32_t xorshifted = ((oldstate >> 18u) ^ oldstate) >> 27u; 65 | uint32_t rot = oldstate >> 59u; 66 | return (xorshifted >> rot) | (xorshifted << ((-rot) & 31)); 67 | } 68 | 69 | uint32_t pcg32_random() 70 | { 71 | return pcg32_random_r(&pcg32_global); 72 | } 73 | 74 | 75 | // pcg32_boundedrand(bound): 76 | // pcg32_boundedrand_r(rng, bound): 77 | // Generate a uniformly distributed number, r, where 0 <= r < bound 78 | 79 | uint32_t pcg32_boundedrand_r(pcg32_random_t* rng, uint32_t bound) 80 | { 81 | // To avoid bias, we need to make the range of the RNG a multiple of 82 | // bound, which we do by dropping output less than a threshold. 
83 | // A naive scheme to calculate the threshold would be to do 84 | // 85 | // uint32_t threshold = 0x100000000ull % bound; 86 | // 87 | // but 64-bit div/mod is slower than 32-bit div/mod (especially on 88 | // 32-bit platforms). In essence, we do 89 | // 90 | // uint32_t threshold = (0x100000000ull-bound) % bound; 91 | // 92 | // because this version will calculate the same modulus, but the LHS 93 | // value is less than 2^32. 94 | 95 | uint32_t threshold = -bound % bound; 96 | 97 | // Uniformity guarantees that this loop will terminate. In practice, it 98 | // should usually terminate quickly; on average (assuming all bounds are 99 | // equally likely), 82.25% of the time, we can expect it to require just 100 | // one iteration. In the worst case, someone passes a bound of 2^31 + 1 101 | // (i.e., 2147483649), which invalidates almost 50% of the range. In 102 | // practice, bounds are typically small and only a tiny amount of the range 103 | // is eliminated. 104 | for (;;) { 105 | uint32_t r = pcg32_random_r(rng); 106 | if (r >= threshold) 107 | return r % bound; 108 | } 109 | } 110 | 111 | 112 | uint32_t pcg32_boundedrand(uint32_t bound) 113 | { 114 | return pcg32_boundedrand_r(&pcg32_global, bound); 115 | } 116 | 117 | -------------------------------------------------------------------------------- /pcg_basic.h: -------------------------------------------------------------------------------- 1 | /* 2 | * PCG Random Number Generation for C. 3 | * 4 | * Copyright 2014 Melissa O'Neill 5 | * 6 | * Licensed under the Apache License, Version 2.0 (the "License"); 7 | * you may not use this file except in compliance with the License. 8 | * You may obtain a copy of the License at 9 | * 10 | * http://www.apache.org/licenses/LICENSE-2.0 11 | * 12 | * Unless required by applicable law or agreed to in writing, software 13 | * distributed under the License is distributed on an "AS IS" BASIS, 14 | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
15 | * See the License for the specific language governing permissions and 16 | * limitations under the License. 17 | * 18 | * For additional information about the PCG random number generation scheme, 19 | * including its license and other licensing options, visit 20 | * 21 | * http://www.pcg-random.org 22 | */ 23 | 24 | /* 25 | * This code is derived from the full C implementation, which is in turn 26 | * derived from the canonical C++ PCG implementation. The C++ version 27 | * has many additional features and is preferable if you can use C++ in 28 | * your project. 29 | */ 30 | 31 | #ifndef PCG_BASIC_H_INCLUDED 32 | #define PCG_BASIC_H_INCLUDED 1 33 | 34 | #include <stdint.h> 35 | 36 | #if __cplusplus 37 | extern "C" { 38 | #endif 39 | 40 | struct pcg_state_setseq_64 { // Internals are *Private*. 41 | uint64_t state; // RNG state. All values are possible. 42 | uint64_t inc; // Controls which RNG sequence (stream) is 43 | // selected. Must *always* be odd. 44 | }; 45 | typedef struct pcg_state_setseq_64 pcg32_random_t; 46 | 47 | // If you *must* statically initialize it, here's one. 48 | 49 | #define PCG32_INITIALIZER { 0x853c49e6748fea9bULL, 0xda3e39cb94b95bdbULL } 50 | 51 | // pcg32_srandom(initstate, initseq) 52 | // pcg32_srandom_r(rng, initstate, initseq): 53 | // Seed the rng. Specified in two parts, state initializer and a 54 | // sequence selection constant (a.k.a.
stream id) 55 | 56 | void pcg32_srandom(uint64_t initstate, uint64_t initseq); 57 | void pcg32_srandom_r(pcg32_random_t* rng, uint64_t initstate, 58 | uint64_t initseq); 59 | 60 | // pcg32_random() 61 | // pcg32_random_r(rng) 62 | // Generate a uniformly distributed 32-bit random number 63 | 64 | uint32_t pcg32_random(void); 65 | uint32_t pcg32_random_r(pcg32_random_t* rng); 66 | 67 | // pcg32_boundedrand(bound): 68 | // pcg32_boundedrand_r(rng, bound): 69 | // Generate a uniformly distributed number, r, where 0 <= r < bound 70 | 71 | uint32_t pcg32_boundedrand(uint32_t bound); 72 | uint32_t pcg32_boundedrand_r(pcg32_random_t* rng, uint32_t bound); 73 | 74 | #if __cplusplus 75 | } 76 | #endif 77 | 78 | #endif // PCG_BASIC_H_INCLUDED 79 | -------------------------------------------------------------------------------- /perf-timer.c: -------------------------------------------------------------------------------- 1 | #include "perf-timer.h" 2 | 3 | #include "jevents/rdpmc.h" 4 | #include "jevents/jevents.h" 5 | 6 | #include <stddef.h> 7 | #include <stdio.h> 8 | #include <string.h> 9 | 10 | typedef struct { 11 | const char *name, *short_name; 12 | const char *event_string; 13 | } event; 14 | 15 | const event ALL_EVENTS[] = { //123456789012 16 | { "cpu_clk_unhalted.thread_p" , "CYCLES" , "cpu/event=0x3c/" } , 17 | { "hw_interrupts.received" , "INTERRUPTS" , "cpu/config=0x1cb/" } , 18 | { "l2_rqsts.references" , "L2.ALL" , "cpu/umask=0xFF,event=0x24/"} , 19 | { "l2_rqsts.all_rfo" , "L2.RFO_ALL" , "cpu/umask=0xE2,event=0x24/"} , 20 | { "l2_rqsts.rfo_miss" , "L2.RFO_MISS" , "cpu/umask=0x22,event=0x24/"} , 21 | { "l2_rqsts.miss" , "L2.ALL_MISS" , "cpu/umask=0x3F,event=0x24/"} , 22 | { "l2_rqsts.all_pf" , "L2.ALL_PF" , "cpu/umask=0xF8,event=0x24/"} , 23 | { "llc.ref" , "LLC.REFS" , "cpu/umask=0x4F,event=0x2E/"} , 24 | { "llc.miss" , "LLC.MISS" , "cpu/umask=0x41,event=0x2E/"} , 25 | { "mem_inst_retired.all_stores" , "ALL_STORES" , "cpu/umask=0x82,event=0xd0/"} , 26 | { 0 } // sentinel 27 | }; 28 | 29 |
typedef struct { 30 | // the associated event 31 | const event *event; 32 | // the jevents context object 33 | struct rdpmc_ctx jevent_ctx; 34 | } event_ctx; 35 | 36 | size_t context_count; 37 | event_ctx contexts[MAX_EVENTS]; // programmed events go here 38 | 39 | /** 40 | * Print a string representation of the given perf_event_attr to the given 41 | * file, suitable for use as an event specifier for perf, or just for display. 42 | */ 43 | void printf_perf_attr(FILE *f, const struct perf_event_attr* attr) { 44 | char* pmu = resolve_pmu(attr->type); 45 | fputs(pmu ? pmu : "???", f); 46 | 47 | #define APPEND_IF_NZ1(field) APPEND_IF_NZ2(field,field) 48 | #define APPEND_IF_NZ2(name, field) if (attr->field) fprintf(f, "/" #name "=0x%lx, ", (long)attr->field); 49 | 50 | APPEND_IF_NZ1(config); 51 | APPEND_IF_NZ1(config1); 52 | APPEND_IF_NZ1(config2); 53 | APPEND_IF_NZ2(period, sample_period); 54 | APPEND_IF_NZ1(sample_type); 55 | APPEND_IF_NZ1(read_format); 56 | 57 | fprintf(f, "/"); 58 | } 59 | 60 | void print_caps(FILE *f, const struct rdpmc_ctx *ctx) { 61 | fprintf(f, " caps: R: %d UT: %d ZT: %d index: 0x%x", 62 | ctx->buf->cap_user_rdpmc, ctx->buf->cap_user_time, ctx->buf->cap_user_time_zero, ctx->buf->index); 63 | } 64 | 65 | /* list the events in markdown format */ 66 | void list_events() { 67 | const char *fmt = "| %-27s | %-12s |\n"; 68 | printf(fmt, "Full Name", "Short Name"); 69 | printf(fmt, "-------------------------", "-----------"); 70 | for (const event *e = ALL_EVENTS; e->name; e++) { 71 | printf(fmt, e->name, e->short_name); 72 | } 73 | } 74 | 75 | 76 | void setup_counters(bool verbose, const char *counter_string) { 77 | for (const event *e = ALL_EVENTS; e->name; e++) { 78 | if (!strstr(counter_string, e->short_name)) { 79 | continue; 80 | } 81 | fprintf(stderr, "Enabling event %s (%s)\n", e->short_name, e->name); 82 | struct perf_event_attr attr = {}; 83 | int err = jevent_name_to_attr(e->event_string, &attr); 84 | if (err) { 85 | fprintf(stderr, "Unable to resolve event %s
- report this as a bug along with your CPU model string\n", e->name); 86 | fprintf(stderr, "jevents error %2d: %s\n", err, jevent_error_to_string(err)); 87 | fprintf(stderr, "jevents details : %s\n", jevent_get_error_details()); 88 | } else { 89 | struct rdpmc_ctx ctx = {}; 90 | attr.sample_period = 0; 91 | if (rdpmc_open_attr(&attr, &ctx, 0)) { 92 | fprintf(stderr, "Failed to program event '%s'. Resolved to: ", e->name); 93 | printf_perf_attr(stderr, &attr); 94 | } else { 95 | if (verbose) { 96 | printf("Resolved and programmed event '%s' to '", e->name); 97 | printf_perf_attr(stdout, &attr); 98 | print_caps(stdout, &ctx); 99 | printf("\n"); 100 | } 101 | contexts[context_count++] = (event_ctx){e, ctx}; 102 | if (context_count == MAX_EVENTS) { 103 | break; 104 | } 105 | } 106 | } 107 | } 108 | } 109 | 110 | void print_counter_headings(const char* format) { 111 | for (size_t i = 0; i < context_count; i++) { 112 | printf(format, contexts[i].event->short_name); 113 | } 114 | } 115 | 116 | event_counts read_counters() { 117 | event_counts ret = {}; 118 | for (size_t i=0; i < context_count; i++) { 119 | ret.counts[i] = rdpmc_read(&contexts[i].jevent_ctx); 120 | } 121 | return ret; 122 | } 123 | 124 | size_t num_counters() { 125 | return context_count; 126 | } 127 | 128 | event_counts calc_delta(event_counts before, event_counts after) { 129 | event_counts ret = {}; 130 | for (size_t i=0; i < MAX_EVENTS; i++) { 131 | ret.counts[i] = after.counts[i] - before.counts[i]; 132 | } 133 | return ret; 134 | } 135 | -------------------------------------------------------------------------------- /perf-timer.h: -------------------------------------------------------------------------------- 1 | /* simple PMU counting capabilities */ 2 | #include <stdbool.h> 3 | #include <stddef.h> 4 | #include <stdint.h> 5 | 6 | #define MAX_EVENTS 8 7 | 8 | typedef struct { 9 | uint64_t counts[MAX_EVENTS]; 10 | } event_counts; 11 | 12 | void list_events(); 13 | 14 | void setup_counters(bool verbose, const char *counter_string);
15 | 16 | void print_counter_headings(const char *delim); 17 | 18 | event_counts read_counters(); 19 | 20 | /* number of successfully programmed counters */ 21 | size_t num_counters(); 22 | 23 | event_counts calc_delta(event_counts before, event_counts after); 24 | -------------------------------------------------------------------------------- /random-writes.c: -------------------------------------------------------------------------------- 1 | 2 | // as a hack we just include pcg_basic.c (renamed to .c.h) so the rand functions can be inlined 3 | #include "pcg_basic.c.h" 4 | 5 | #include "common.h" 6 | #include "hedley.h" 7 | 8 | #include <stddef.h> 9 | #include <stdint.h> 10 | #include <immintrin.h> 11 | 12 | typedef uint64_t lcg_state; 13 | 14 | #ifdef USE_PCG 15 | #define RAND_FUNC pcg32_random_r 16 | #define RAND_INIT PCG32_INITIALIZER 17 | typedef pcg32_random_t rng_state; 18 | #else 19 | #define RAND_FUNC lcg_next 20 | typedef lcg_state rng_state; 21 | #define RAND_INIT 0x1234567890ABCDEFull 22 | #endif 23 | 24 | 25 | int random_pcg(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 26 | pcg32_random_t rng = PCG32_INITIALIZER; 27 | uint32_t total = 0; 28 | do { 29 | total += pcg32_random_r(&rng); 30 | } while (--iters > 0); 31 | return total; 32 | } 33 | 34 | uint32_t lcg_next(lcg_state* state) { 35 | uint64_t newstate = *state * 6364136223846793005ull + 1ull; 36 | *state = newstate; 37 | return newstate >> 32; 38 | } 39 | 40 | int random_lcg(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 41 | lcg_state rng = 123; 42 | uint32_t total = 0; 43 | do { 44 | total += lcg_next(&rng); 45 | } while (--iters > 0); 46 | return total; 47 | } 48 | 49 | 50 | int write_random_singleu(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 51 | rng_state rng = RAND_INIT; 52 | iters /= 8u; 53 | do { 54 | uint64_t val = RAND_FUNC(&rng); 55 | a2[((val ) & (size2 - 1))] = 2; 56 | a2[((val + 0xd6b45560u) & (size2 - 1))] = 3; 57 | a2[((val + 0x23a7b9cau) & (size2 - 1))] = 4; 58
| a2[((val + 0x60776172u) & (size2 - 1))] = 5; 59 | a2[((val + 0x43b006cau) & (size2 - 1))] = 6; 60 | a2[((val + 0xa8b0af69u) & (size2 - 1))] = 7; 61 | a2[((val + 0x66da7813u) & (size2 - 1))] = 8; 62 | a2[((val + 0xcc667058u) & (size2 - 1))] = 9; 63 | } while (--iters > 0); 64 | return 0; 65 | } 66 | 67 | int write_random_single(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 68 | rng_state rng = RAND_INIT; 69 | do { 70 | uint32_t val = RAND_FUNC(&rng); 71 | a2[(val & (size2 - 1))] = 2; 72 | } while (--iters > 0); 73 | return 0; 74 | } 75 | 76 | int write_linear(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 77 | size_t index = 0; 78 | do { 79 | a2[index & (size2 - 1)] = iters; 80 | index += 64; 81 | } while (--iters > 0); 82 | return 0; 83 | } 84 | 85 | int write_linear_sfence(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 86 | size_t index = 0; 87 | do { 88 | a2[index & (size2 - 1)] = iters; 89 | index += 64; 90 | _mm_sfence(); 91 | } while (--iters > 0); 92 | return 0; 93 | } 94 | 95 | HEDLEY_ALWAYS_INLINE 96 | static inline int write_linearT(size_t iters, char* a1, size_t size1, char *a2, size_t size2, void (*fence)()) { 97 | size_t index = 0, index2 = 0; 98 | do { 99 | a2[index & (size2 - 1)] = iters; 100 | size_t temp = 0; 101 | fence(); 102 | a2[(index & (size2 - 1)) + 1] = iters; 103 | temp += a1[index2 + temp] & 0x100; 104 | temp += a1[index2 + temp] & 0x100; 105 | temp += a1[index2 + temp] & 0x100; 106 | temp += a1[index2 + temp] & 0x100; 107 | temp += a1[index2 + temp] & 0x100; 108 | //assert(temp == 0); 109 | index += 64 + temp; 110 | index2 = (index2 + 64) & (size1 - 1); 111 | } while (--iters > 0); 112 | return 0; 113 | } 114 | 115 | static inline void lfence() { 116 | _mm_lfence(); 117 | } 118 | 119 | static inline void sfence() { 120 | _mm_sfence(); 121 | } 122 | 123 | int write_linearHL(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 124 | return write_linearT(iters, a1, size1, a2, 
size2, lfence); 125 | } 126 | 127 | int write_linearHS(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 128 | return write_linearT(iters, a1, size1, a2, size2, sfence); 129 | } 130 | 131 | int read_linear(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 132 | size_t index = 0; 133 | char total = 0; 134 | do { 135 | total += a2[index & (size2 - 1)]; 136 | index += 64; 137 | } while (--iters > 0); 138 | return total; 139 | } 140 | 141 | int writes_inter(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 142 | rng_state rng = RAND_INIT; 143 | do { 144 | uint32_t val = RAND_FUNC(&rng); 145 | a1[(val & (size1 - 1))] = 1; 146 | a2[(val & (size2 - 1))] = 2; 147 | } while (--iters > 0); 148 | return 0; 149 | } 150 | 151 | int writes_inter_pf_fixed(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 152 | rng_state rng = RAND_INIT; 153 | do { 154 | uint32_t val = RAND_FUNC(&rng); 155 | __builtin_prefetch(a1 + (val & (size1 - 1))); 156 | a1[(val & (size1 - 1))] = 1; 157 | a2[(val & (size2 - 1))] = 2; 158 | } while (--iters > 0); 159 | return 0; 160 | } 161 | 162 | 163 | int writes_inter_pf_var(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 164 | rng_state rng = RAND_INIT; 165 | do { 166 | uint32_t val = RAND_FUNC(&rng); 167 | __builtin_prefetch(a2 + (val & (size2 - 1))); 168 | a1[(val & (size1 - 1))] = 1; 169 | a2[(val & (size2 - 1))] = 2; 170 | } while (--iters > 0); 171 | return 0; 172 | } 173 | 174 | 175 | int writes_inter_pf_both(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 176 | rng_state rng = RAND_INIT; 177 | do { 178 | uint32_t val = RAND_FUNC(&rng); 179 | __builtin_prefetch(a1 + (val & (size1 - 1))); 180 | __builtin_prefetch(a2 + (val & (size2 - 1))); 181 | a1[(val & (size1 - 1))] = 1; 182 | a2[(val & (size2 - 1))] = 2; 183 | } while (--iters > 0); 184 | return 0; 185 | } 186 | 187 | // unroll by 2 and reorder writes to the same region to make them adjacent 188 | int 
writes_inter_u2(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 189 | rng_state rng = RAND_INIT; 190 | iters /= 2; // cut the iterations in half since we do double the work in each itr 191 | do { 192 | uint32_t val1 = RAND_FUNC(&rng); 193 | uint32_t val2 = RAND_FUNC(&rng); 194 | a1[(val1 & (size1 - 1))] = 1; 195 | a1[(val2 & (size1 - 1))] = 1; 196 | a2[(val1 & (size2 - 1))] = 2; 197 | a2[(val2 & (size2 - 1))] = 2; 198 | } while (--iters > 0); 199 | return 0; 200 | } 201 | 202 | // unroll by 4 and reorder writes to the same region to make them adjacent 203 | int writes_inter_u4(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 204 | rng_state rng = RAND_INIT; 205 | iters /= 4; 206 | do { 207 | uint32_t val1 = RAND_FUNC(&rng); 208 | uint32_t val2 = RAND_FUNC(&rng); 209 | uint32_t val3 = RAND_FUNC(&rng); 210 | uint32_t val4 = RAND_FUNC(&rng); 211 | a1[(val1 & (size1 - 1))] = 1; 212 | a1[(val2 & (size1 - 1))] = 1; 213 | a1[(val3 & (size1 - 1))] = 1; 214 | a1[(val4 & (size1 - 1))] = 1; 215 | a2[(val1 & (size2 - 1))] = 2; 216 | a2[(val2 & (size2 - 1))] = 2; 217 | a2[(val3 & (size2 - 1))] = 2; 218 | a2[(val4 & (size2 - 1))] = 2; 219 | } while (--iters > 0); 220 | return 0; 221 | } 222 | 223 | int writes_inter_sfenceA(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 224 | rng_state rng = RAND_INIT; 225 | do { 226 | uint32_t val = RAND_FUNC(&rng); 227 | a1[(val & (size1 - 1))] = 1; 228 | _mm_sfence(); 229 | a2[(val & (size2 - 1))] = 2; 230 | } while (--iters > 0); 231 | return 0; 232 | } 233 | 234 | int writes_inter_sfenceB(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 235 | rng_state rng = RAND_INIT; 236 | do { 237 | uint32_t val = RAND_FUNC(&rng); 238 | a1[(val & (size1 - 1))] = 1; 239 | a2[(val & (size2 - 1))] = 2; 240 | _mm_sfence(); 241 | } while (--iters > 0); 242 | return 0; 243 | } 244 | 245 | int writes_inter_sfenceC(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 246 | rng_state rng = 
RAND_INIT; 247 | do { 248 | uint32_t val = RAND_FUNC(&rng); 249 | a1[(val & (size1 - 1))] = 1; 250 | _mm_sfence(); 251 | a2[(val & (size2 - 1))] = 2; 252 | _mm_sfence(); 253 | } while (--iters > 0); 254 | return 0; 255 | } 256 | 257 | int random_read2(size_t iters, char* a1, size_t size1, char *a2, size_t size2) { 258 | char total = 0; 259 | rng_state rng = RAND_INIT; 260 | do { 261 | uint32_t val = RAND_FUNC(&rng); 262 | total += a1[(val & (size1 - 1))]; 263 | total += a2[(val & (size2 - 1))]; 264 | } while (--iters > 0); 265 | return total; 266 | } 267 | 268 | -------------------------------------------------------------------------------- /scripts/all.sh: -------------------------------------------------------------------------------- 1 | # run all the plot generation scripts to generate assets 2 | if [ -z "$SUFFIX" ]; then 3 | echo "Set filename SUFFIX to new or old depending on microcode" 4 | exit 1 5 | fi 6 | 7 | # turn off prefetching 8 | sudo wrmsr -a 0x1a4 "$((2#1111))" && sudo rdmsr -a -x 0x1A4 9 | 10 | OUTFILE=assets/i-vs-s-$SUFFIX.svg ./scripts/rwrite-1-vs-2.sh 11 | OUTFILE=assets/i-plus-counters-$SUFFIX.svg ./scripts/interleaved-1.sh 12 | OUTFILE=assets/i-sfence-$SUFFIX.svg ./scripts/rwrite2-vs-sfence.sh 13 | OUTFILE=assets/i-prefetch-$SUFFIX.svg ./scripts/prefetch.sh 14 | 15 | export ARRAY1_SIZE=2048 16 | 17 | OUTFILE=assets/i-vs-s-2mib-$SUFFIX.svg ./scripts/rwrite-1-vs-2.sh 18 | OUTFILE=assets/i-plus-counters-2mib-$SUFFIX.svg ./scripts/interleaved-1.sh 19 | OUTFILE=assets/i-sfence-2mib-$SUFFIX.svg ./scripts/rwrite2-vs-sfence.sh 20 | OUTFILE=assets/i-unrolled-2mib-$SUFFIX.svg ./scripts/rwrite2-unrolled.sh 21 | OUTFILE=assets/i-prefetch-2mib-$SUFFIX.svg ./scripts/prefetch.sh 22 | -------------------------------------------------------------------------------- /scripts/common.sh: -------------------------------------------------------------------------------- 1 | #! 
/bin/echo dont-run-this-directly 2 | 3 | set -e 4 | SCRIPTNAME=$(basename "$0") 5 | SCRIPTDIR=$(dirname "$0") 6 | 7 | # call this from the parent directory, not from inside scripts 8 | # use it like script.sh [OUTPUT] 9 | # where OUTPUT is the output filename for the plot, or no filename for an interactive graph 10 | if [ ! -f "$PWD/scripts/$SCRIPTNAME" ]; then 11 | set +e 12 | PARENT="$( cd "$(dirname "$0")/.." ; pwd -P )" 13 | echo "Please run this script from the root project directory: $PARENT" 14 | exit 1 15 | fi 16 | 17 | : ${START:=1} 18 | : ${STOP:=100000} 19 | 20 | if [ -z "$OUTFILE" ]; then 21 | OUT=() 22 | else 23 | OUT=("--out" "$OUTFILE") 24 | fi 25 | 26 | export PLOT=1 27 | 28 | mkdir -p tmp 29 | TDIR=$(mktemp -d "./tmp/XXXXXXXX") 30 | 31 | MICROCODE=$(grep -m1 microcode /proc/cpuinfo | grep -o '0x.*') 32 | NEWLINE=$'\n' 33 | 34 | PLOT1=./scripts/plot1.sh 35 | PLOT2=./scripts/plot2.sh 36 | PLOT3=./scripts/plot3.sh 37 | PLOT4=./scripts/plot4.sh 38 | PLOTPY=./scripts/plot-csv.py 39 | 40 | # arrayify values passed as strings from parent 41 | COLARRAY=($COLS) 42 | C2ARRAY=($COLS2) 43 | #IFS=',' read -r -a arr <<< "$COUNTER_LIST" 44 | 45 | COMMON_ARGS=(-v --xcol 0\ 46 | ${COLS:+--cols ${COLARRAY[@]}} \ 47 | ${COLS2:+--cols2 ${C2ARRAY[@]}} \ 48 | ${CLABELS:+--clabels "$CLABELS"} \ 49 | ${YLABEL:+--ylabel "$YLABEL"} 50 | --title "$TITLE" \ 51 | "${OUT[@]}") 52 | 53 | -------------------------------------------------------------------------------- /scripts/interleaved-1.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | source "$(dirname "$0")/common.sh" 3 | 4 | export CPU_COUNTERS="L2.RFO_ALL,L2.RFO_MISS,L2.ALL" 5 | export TITLE="Interleaved Stores w/ Selected Counters${NEWLINE}(microcode rev: $MICROCODE)" 6 | export COLS="0 1" 7 | export COLS2="2 3 4" 8 | export CLABELS="region size (KiB),interleaved" 9 | export YLABEL="cycles/iteration" 10 | 11 | "$PLOT1" interleaved 12 | 
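All of the store and read kernels driven by these scripts index their regions with `val & (size - 1)`, which equals `val % size` only when `size` is a power of two. A standalone illustration of the idiom (Python, not part of the benchmark; `mask_index` is a made-up name):

```python
def mask_index(val: int, size: int) -> int:
    # The kernels' indexing idiom: keep only the low bits of the random value.
    # Valid only when size is a power of two, so size - 1 is an all-ones mask.
    assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
    return val & (size - 1)

# For power-of-two sizes the mask agrees with modulo for any value:
for val in (0, 63, 64, 65, 1_000_003):
    assert mask_index(val, 64) == val % 64
    assert mask_index(val, 4096) == val % 4096
```

This is why the bench regions can be sized in power-of-two steps: the mask is a single AND, far cheaper than a general modulo in the inner loop.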
-------------------------------------------------------------------------------- /scripts/pdutil.py: -------------------------------------------------------------------------------- 1 | # renames duplicate columns by suffixing _1, _2 etc 2 | class renamer(): 3 | def __init__(self): 4 | self.d = dict() 5 | 6 | def __call__(self, x): 7 | if x not in self.d: 8 | self.d[x] = 0 9 | return x 10 | else: 11 | self.d[x] += 1 12 | return "%s_%d" % (x, self.d[x]) 13 | -------------------------------------------------------------------------------- /scripts/plot-csv.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import matplotlib.pyplot as plt 4 | import pandas as pd 5 | import numpy as np 6 | import csv 7 | import argparse 8 | import sys 9 | import collections 10 | import os 11 | import pdutil 12 | 13 | 14 | p = argparse.ArgumentParser(usage='plot output from PLOT=1 ./bench') 15 | p.add_argument('input', help='CSV file to plot (or stdin)', nargs='*', 16 | type=argparse.FileType('r'), default=[ sys.stdin ]) 17 | p.add_argument('--xcol', help='Column index to use as x axis (default: 0)', type=int, default=0) 18 | p.add_argument('--cols', help='Use only these zero-based columns on primary axis (default: all columns)', 19 | type=int, nargs='+') 20 | p.add_argument('--cols2', help='Use only these zero-based columns on secondary axis (default: no secondary axis)', 21 | type=int, nargs='+') 22 | p.add_argument('--clabels', help="Comma separated list of column names used as label for data series (default: column header)", 23 | type=lambda x: x.split(',')) 24 | p.add_argument('--title', help='Set chart title', default='Some chart (use --title to specify title)') 25 | p.add_argument('--xlabel', help='Set x axis label') 26 | p.add_argument('--ylabel', help='Set y axis label') 27 | p.add_argument('--out', help='output filename') 28 | p.add_argument('--suffix-names', help='Suffix each column name with the file it 
came from', action='store_true') 29 | p.add_argument('--verbose', '-v', help='enable verbose logging', action='store_true') 30 | args = p.parse_args() 31 | 32 | vprint = print if args.verbose else lambda *a: None 33 | vprint("args = ", args) 34 | 35 | xi = args.xcol 36 | dfs = [] 37 | for f in args.input: 38 | df = pd.read_csv(f) 39 | if args.suffix_names: 40 | df = df.add_suffix(' ' + os.path.basename(f.name)) 41 | vprint("----- df from: ", f.name, "-----\n", df.head(), "\n---------------------") 42 | dfs.append(df) 43 | 44 | df = pd.concat(dfs, axis=1) 45 | vprint("----- merged df -----\n", df.head(), "\n---------------------") 46 | 47 | # rename any duplicate columns because otherwise Pandas gets mad 48 | df = df.rename(columns=pdutil.renamer()) 49 | 50 | 51 | vprint("---- renamed df ----\n", df.head(), "\n---------------------") 52 | 53 | def extract_cols(cols, df, name): 54 | vprint(name, "axis columns: ", cols) 55 | if (max(cols) >= len(df.columns)): 56 | print("Column", max(cols), "too large: input only has", len(df.columns), "columns", file=sys.stderr) 57 | exit(1) 58 | pruned = df.iloc[:, cols] 59 | vprint("----- pruned ", name, " df -----\n", pruned.head(), "\n---------------------") 60 | return pruned 61 | 62 | df2 = extract_cols(args.cols2, df, "secondary") if args.cols2 else None 63 | 64 | if args.cols: 65 | df = extract_cols(args.cols, df, "primary") 66 | 67 | if args.clabels: 68 | if len(df.columns) != len(args.clabels): 69 | sys.exit("ERROR: number of column labels not equal to the number of selected columns") 70 | df.columns = args.clabels 71 | 72 | # dupes will break pandas beyond this point, should be impossible due to above renaming 73 | dupes = df.columns.duplicated() 74 | if True in dupes: 75 | print("Duplicate columns after merge and pruning, consider --suffix-names", 76 | df.columns[dupes].values.tolist(), file=sys.stderr) 77 | exit(1) 78 | 79 | # set x labels to strings so we don't get a scatter plot, and 80 | # so the x labels are not 
themselves plotted 81 | df.iloc[:,xi] = df.iloc[:,xi].apply(str) 82 | 83 | 84 | ax = df.plot.line(x=xi, title=args.title, figsize=(12,8), grid=True) 85 | 86 | # this sets the ticks explicitly to one per x value, which means that 87 | # all x values will be shown, but the x-axis could be crowded if there 88 | # are too many, remove to use the auto tick density 89 | ticks = df.iloc[:,xi].values 90 | plt.xticks(ticks=range(len(ticks)), labels=ticks) 91 | 92 | if args.ylabel: 93 | ax.set_ylabel(args.ylabel) 94 | 95 | if args.xlabel: 96 | ax.set_xlabel(args.xlabel) 97 | 98 | # secondary axis handling 99 | if df2 is not None: 100 | df2.plot(secondary_y=True, ax=ax, grid=True) 101 | 102 | if (args.out): 103 | vprint("Saving figure to ", args.out, "...") 104 | plt.savefig(args.out) 105 | else: 106 | vprint("Showing interactive plot...") 107 | plt.show() 108 | -------------------------------------------------------------------------------- /scripts/plot.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import matplotlib.pyplot as plt 4 | import csv 5 | import argparse 6 | import sys 7 | import collections 8 | 9 | 10 | p = argparse.ArgumentParser( 11 | usage='plot output from W_PLOT=1 ./bench') 12 | p.add_argument('file', help='CSV file to plot (or stdin)', nargs='?') 13 | p.add_argument('--out', help='output filename') 14 | args = p.parse_args() 15 | print("args = ", args) 16 | 17 | 18 | if args.file: 19 | input = open(args.file, "r") 20 | else: 21 | input = sys.stdin 22 | 23 | columns = collections.defaultdict(list) 24 | 25 | data = csv.reader(input, delimiter=',') 26 | headers = [h for h in next(data) if h != ''] 27 | x_label = headers[0] 28 | headers.pop(0) 29 | print("headers are ", headers) 30 | for row in data: 31 | for (i,v) in enumerate(row): 32 | if v != '': columns[i].append(float(v)) 33 | 34 | for i,c in columns.items(): 35 | print("Column: ", i, "\n", c) 36 | 37 | x = [str(int(v)) for v in 
columns[0]] 38 | 39 | print("x = ", x) 40 | 41 | fig, ax1 = plt.subplots(figsize=(10,7)) 42 | ax1.set_xlabel(x_label) 43 | ax1.set_ylabel(headers[0]) 44 | 45 | ax2 = ax1.twinx() 46 | ax2.set_ylabel('events') 47 | # the next line sets both axes to use the same color cycle so we don't get dupe colors 48 | ax2._get_lines.prop_cycler = ax1._get_lines.prop_cycler 49 | 50 | for i,header in enumerate(headers, 1): 51 | y = columns[i] 52 | print("Column ", i, " y = ", y) 53 | ax = ax2 if i > 2 else ax1 54 | ax.plot(x, y, label=header.strip(), linewidth=2, marker='.') 55 | 56 | ax1.set_ylim(bottom=0) 57 | plt.title('Performance of interleaved stores') 58 | fig.legend(loc='upper right', ncol=2) 59 | 60 | if (args.out): 61 | plt.savefig(args.out) 62 | else: 63 | plt.show() 64 | -------------------------------------------------------------------------------- /scripts/plot1.sh: -------------------------------------------------------------------------------- 1 | # plot one test, with performance counters, which you set in env var CPU_COUNTERS 2 | 3 | source "$(dirname "$0")/common.sh" 4 | 5 | if [ "$#" -ne 1 ]; then 6 | echo -e "Usage: plot1.sh TEST1\n\tWhere TEST1 is the test name" 7 | exit 1 8 | fi 9 | 10 | FILE1="$TDIR/$1" 11 | 12 | echo "Writing temporary results to $FILE1. Plot to ${OUT[@]}" >&2 13 | 14 | ./bench "$1" $START $STOP > "$FILE1" 15 | 16 | "$PLOTPY" "$FILE1" "${COMMON_ARGS[@]}" 17 | 18 | -------------------------------------------------------------------------------- /scripts/plot2.sh: -------------------------------------------------------------------------------- 1 | # plot two different tests, given as $1 and $2 against each other 2 | 3 | source "$(dirname "$0")/common.sh" 4 | 5 | if [ "$#" -ne 2 ]; then 6 | echo -e "Usage: plot2.sh TEST1 TEST2\n\tWhere TEST1 and TEST2 are test names" 7 | exit 1 8 | fi 9 | 10 | FILE1="$TDIR/$1" 11 | FILE2="$TDIR/$2" 12 | 13 | echo "Writing temporary results to $FILE1 and $FILE2. 
Plot to ${OUT[@]}" >&2 14 | 15 | ./bench "$1" $START $STOP > "$FILE1" 16 | ./bench "$2" $START $STOP > "$FILE2" 17 | 18 | "$PLOTPY" "$FILE1" "$FILE2" "${COMMON_ARGS[@]}" 19 | 20 | -------------------------------------------------------------------------------- /scripts/plot3.sh: -------------------------------------------------------------------------------- 1 | # plot three different tests, given as $1, $2 and $3 against each other 2 | 3 | source "$(dirname "$0")/common.sh" 4 | 5 | if [ "$#" -ne 3 ]; then 6 | echo -e "Usage: plot3.sh TEST1 TEST2 TEST3\n\tWhere TEST1, TEST2 and TEST3 are test names" 7 | exit 1 8 | fi 9 | 10 | FILE1="$TDIR/$1" 11 | FILE2="$TDIR/$2" 12 | FILE3="$TDIR/$3" 13 | 14 | echo "Writing temporary results to $FILE1, $FILE2 and $FILE3. Plot to ${OUT[@]}" >&2 15 | 16 | ./bench "$1" $START $STOP > "$FILE1" 17 | ./bench "$2" $START $STOP > "$FILE2" 18 | ./bench "$3" $START $STOP > "$FILE3" 19 | 20 | "$PLOTPY" "$FILE1" "$FILE2" "$FILE3" "${COMMON_ARGS[@]}" 21 | 22 | -------------------------------------------------------------------------------- /scripts/plot4.sh: -------------------------------------------------------------------------------- 1 | # plot four different tests, given as $1, $2, $3 and $4 against each other 2 | 3 | source "$(dirname "$0")/common.sh" 4 | 5 | if [ "$#" -ne 4 ]; then 6 | echo -e "Usage: plot4.sh TEST1 ... TEST4\n\tWhere TEST{1,2,3,4} are test names" 7 | exit 1 8 | fi 9 | 10 | FILE1="$TDIR/$1" 11 | FILE2="$TDIR/$2" 12 | FILE3="$TDIR/$3" 13 | FILE4="$TDIR/$4" 14 | 15 | echo "Writing temporary results to $FILE1, $FILE2, $FILE3 and $FILE4. 
Plot to ${OUT[@]}" >&2 16 | 17 | ./bench "$1" $START $STOP > "$FILE1" 18 | ./bench "$2" $START $STOP > "$FILE2" 19 | ./bench "$3" $START $STOP > "$FILE3" 20 | ./bench "$4" $START $STOP > "$FILE4" 21 | 22 | "$PLOTPY" "$FILE1" "$FILE2" "$FILE3" "$FILE4" "${COMMON_ARGS[@]}" 23 | 24 | -------------------------------------------------------------------------------- /scripts/prefetch.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | source "$(dirname "$0")/common.sh" 3 | 4 | export TITLE="Interleaved with Prefetching${NEWLINE}(microcode rev: $MICROCODE)" 5 | export COLS="0 1 3 5 7" 6 | export CLABELS="region size (KiB),interleaved,prefetch fixed,prefetch variable,prefetch both" 7 | export YLABEL="cycles/iteration" 8 | 9 | "$PLOT4" interleaved interleaved-pf-fixed interleaved-pf-var interleaved-pf-both 10 | -------------------------------------------------------------------------------- /scripts/rwrite-1-vs-2.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | source "$(dirname "$0")/common.sh" 3 | 4 | export TITLE="Interleaved vs Single Stores${NEWLINE}(microcode rev: $MICROCODE)" 5 | export COLS="0 1 3" 6 | export CLABELS="region size (KiB),single,interleaved" 7 | export YLABEL="cycles/iteration" 8 | 9 | "$PLOT2" wrandom1 interleaved 10 | -------------------------------------------------------------------------------- /scripts/rwrite2-unrolled.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | source "$(dirname "$0")/common.sh" 3 | 4 | export TITLE="Interleaved Stores with Unrolling${NEWLINE}(microcode rev: $MICROCODE)" 5 | export COLS="0 1 3 5" 6 | export CLABELS="region size (KiB),interleaved,interleaved (unroll 2),interleaved (unroll 4)" 7 | export YLABEL="cycles/iteration" 8 | 9 | "$PLOT3" interleaved interleaved-u2 interleaved-u4 10 | 
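Each `./bench` run above emits a CSV with a leading size column, and plot-csv.py concatenates those files column-wise before selecting columns positionally. That is why the scripts pass COLS values like "0 1 3 5": column 0 is the first file's size axis, and the value columns of the merged frame land at the odd indices. A minimal sketch of that positional selection (plain Python with made-up numbers; the real script does the equivalent with pandas `iloc`):

```python
import csv
import io

# Hypothetical merged output of three tests concatenated column-wise:
# each test's CSV is (size, cycles), so the cycles columns land at 1, 3, 5.
merged = io.StringIO(
    "size,cycles,size,cycles,size,cycles\n"
    "4,1.0,4,2.0,4,3.0\n"
    "8,1.5,8,2.5,8,3.5\n"
)
rows = list(csv.reader(merged))
cols = [0, 1, 3, 5]  # same shape as COLS="0 1 3 5" in the scripts above
pruned = [[row[i] for i in cols] for row in rows]
```

The selection keeps one x column plus one y series per test, which is also why the CLABELS lists have exactly one label per selected column.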
-------------------------------------------------------------------------------- /scripts/rwrite2-vs-sfence.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | source "$(dirname "$0")/common.sh" 3 | 4 | export TITLE="Interleaved Stores vs sfence${NEWLINE}(microcode rev: $MICROCODE)" 5 | export COLS="0 1 3 5" 6 | export CLABELS="region size (KiB),interleaved,sfenceA,sfenceB" 7 | export YLABEL="cycles/iteration" 8 | 9 | "$PLOT3" interleaved interleaved-sfenceA interleaved-sfenceB 10 | -------------------------------------------------------------------------------- /scripts/rwrite2-vs-sfenceC.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | source "$(dirname "$0")/common.sh" 3 | 4 | export TITLE="Interleaved Stores vs sfence${NEWLINE}(microcode rev: $MICROCODE)" 5 | export COLS="0 1 3 5 7" 6 | export CLABELS="region size (KiB),interleaved,sfenceA,sfenceB,sfenceC" 7 | export YLABEL="cycles/iteration" 8 | 9 | "$PLOT4" interleaved interleaved-sfenceA interleaved-sfenceB interleaved-sfenceC 10 | --------------------------------------------------------------------------------
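A note on the CLABELS lists used throughout these scripts: plot-csv.py requires exactly one label per selected column, and before that check it disambiguates the duplicate headers produced by the column-wise merge using the renamer class in scripts/pdutil.py. Its renaming behavior, shown standalone (class renamed here to `Renamer` for the example):

```python
# Same logic as the renamer in scripts/pdutil.py: the first occurrence of a
# header keeps its name; later duplicates get _1, _2, ... suffixes.
class Renamer:
    def __init__(self):
        self.d = dict()

    def __call__(self, x):
        if x not in self.d:
            self.d[x] = 0
            return x
        self.d[x] += 1
        return "%s_%d" % (x, self.d[x])

r = Renamer()
renamed = [r(c) for c in ["size", "cycles", "size", "cycles", "size", "cycles"]]
```

Without this step, the duplicated headers from concatenating several bench CSVs would make pandas' later positional and label-based operations ambiguous.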