├── .gitignore
├── Makefile
├── README.md
├── benchmarks
│   ├── Makefile
│   └── bench.cpp
├── figures
│   ├── HotSpotOffCPU.png
│   ├── benchmark_linear_complexity.png
│   ├── dot_disassembly.png
│   ├── frontend_v_backend.png
│   ├── hotspotoffcpuconfig.png
│   ├── logo.svg
│   ├── perf_report_homescreen.png
│   ├── read_portal.svg
│   ├── vtkm_cuda.svg
│   ├── vtkm_openmp.svg
│   ├── vtkm_tbb_rendering.svg
│   └── ymm_dot_product.png
├── perf_permissions.sh
└── src
    ├── decent_code.cpp
    ├── dot.asm
    ├── mwe.cpp
    └── use_asm.cpp

/.gitignore:
--------------------------------------------------------------------------------
dot
perf.data
perf.data.old
a.out
*.o
*.x

--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
CXX = g++
CPPFLAGS = --std=c++17 -g -fno-omit-frame-pointer -O3 -march=native -fno-finite-math-only -ffast-math -fopenmp

all: dot

dot: src/mwe.cpp
	$(CXX) $(CPPFLAGS) $< -o $@

use_asm.x: src/use_asm.cpp
	yasm -f elf64 -g dwarf2 src/dot.asm -o dot.o
	$(CXX) -o $@ dot.o $<

clean:
	rm -f dot *.x *.o perf.data perf.data.old a.out src/*.o src/*.x src/perf.data src/perf.data.old
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
slidenumbers: true

# Performance Tuning

![inline](figures/logo.svg)

Nick Thompson

^ Thanks for coming. First let's give a shoutout to Matt Wolf for putting this tutorial together, and to Barney Maccabe for putting the support behind it to make it happen.

---

Session 1: Using `perf`

[Follow along](https://github.com/NAThompson/performance_tuning_tutorial):

```
$ git clone https://github.com/NAThompson/performance_tuning_tutorial
$ cd performance_tuning_tutorial
$ make
$ ./dot 100000000
```

^ This is a tutorial, so definitely follow along. I will be pacing this under the assumption that you are following along, so you'll get bored if you're only watching. In addition, at the end of the tutorial we'll do a short quiz, not to stress anyone out, but to solidify the concepts. I hope that'll galvanize us to bring a bit more intensity than is usually brought to a six-hour training session! If the stakes are too low, we're just gonna waste two good mornings.

^ Please get the notes from github, and attempt to issue the commands.

---

## What is `perf`?

- Performance tools for Linux
- Designed to profile the kernel, but can also profile userspace apps
- Sampling based
- Maintained in the Linux kernel source tree

---

## Installing `perf`: Ubuntu

```bash
$ sudo apt install linux-tools-common
$ sudo apt install linux-tools-generic
$ sudo apt install linux-tools-`uname -r`
```

^ Installation is pretty easy on Ubuntu.

---

## Installing `perf`: CentOS

```bash
$ yum install perf
```

---

## Access `perf`:

`perf` is available on Summit (summit.olcf.ornl.gov), Andes (andes.olcf.ornl.gov), and the SNS nodes (analysis.sns.gov).

I have verified that all the commands of this tutorial work on Andes.

---

## Installing `perf`: Source build

```bash
$ git clone --depth=1 https://github.com/torvalds/linux.git
$ cd linux/tools/perf
$ make
$ ./perf
```

^ I like doing source builds of `perf`. Not only because I often don't have root, but also because `perf` improves over time, so I like to get the latest version. For example, new hardware counters were recently added for the Power9 architecture.

---

## Please do a source build for this tutorial!

A source build is the first step to owning your tools, and will help us all be on the same page.

---

## Ubuntu Dependencies

```
$ sudo apt install -y bison flex libslang2-dev systemtap-sdt-dev \
     libnuma-dev libcap-dev libbabeltrace-ctf-dev libiberty-dev python-dev
```

---

# `perf_permissions.sh`

```bash
#!/bin/bash

# Taken from Milian Wolff's talk "Linux perf for Qt developers"
sudo mount -o remount,mode=755 /sys/kernel/debug
sudo mount -o remount,mode=755 /sys/kernel/debug/tracing
echo "0" | sudo tee /proc/sys/kernel/kptr_restrict
echo "-1" | sudo tee /proc/sys/kernel/perf_event_paranoid
sudo chown `whoami` /sys/kernel/debug/tracing/uprobe_events
sudo chmod a+rw /sys/kernel/debug/tracing/uprobe_events
```

^ If we have root, we have the ability to extract more information from `perf` traces. Kernel debug symbols are a nice-to-have, not a need-to-have, so if you don't have root, don't fret too much.

---

## `perf` MWE

```bash
$ perf stat ls
data  Desktop  Documents  Downloads  Music  Pictures  Public  Templates  TIS  Videos

 Performance counter stats for 'ls':

              2.78 msec task-clock:u              #    0.094 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               283      page-faults:u             #    0.102 M/sec
           838,657      cycles:u                  #    0.302 GHz
           584,659      instructions:u            #    0.70  insn per cycle
           128,106      branches:u                #   46.109 M/sec
             7,907      branch-misses:u           #    6.17% of all branches

       0.029630910 seconds time elapsed

       0.000000000 seconds user
       0.003539000 seconds sys
```

^ This is the `perf` "hello world". You might see something a bit different depending on your architecture and `perf` version.

---

## Why `perf`?

There are lots of great performance analysis tools (Intel VTune, Score-P, TAU, cachegrind), but my opinion is that `perf` should be the first tool you reach for.

---

## Why `perf`?

- No fighting for a license, and no installing Java runtimes on HPC clusters
- No need to vandalize source code, or be constrained to a specific set of languages
- Text GUI, so easy to use in a terminal and over `ssh`

---

## Why `perf`?

- Available on any Linux system
- Not limited to x86: works on ARM, RISC-V, PowerPC, SPARC
- Samples your program rather than modeling it
- Doesn't noticeably slow your program down

^ I was trained in mathematics, and I love learning math because it feels permanent. The situation in computer science is much worse. For example, if no one decides to write a Fortran compiler that targets the new Apple M1 chip, there's no Fortran on the Apple M1! So learning tools which will last is important to me.

^ `perf` is part of the Linux kernel, so it has credibility that it will survive for a long time. It also works on any architecture Linux compiles on, so it's widely available. As a sampling profiler, it relies on statistics, not a model of your program.

---

## Why not `perf`?

- Text GUI, so fancy graphics must be generated by post-processing
- *Only* available on Linux
- Significant limitations when profiling GPUs

---

### `src/mwe.cpp`

```cpp
#include <iostream>
#include <vector>

double dot_product(double* a, double* b, size_t n) {
    double d = 0;
    for (size_t i = 0; i < n; ++i) {
        d += a[i]*b[i];
    }
    return d;
}

int main(int argc, char** argv) {
    if (argc != 2) {
        std::cerr << "Usage: ./dot 10\n";
        return 1;
    }
    size_t n = atoi(argv[1]);
    std::vector<double> a(n);
    std::vector<double> b(n);
    for (size_t i = 0; i < n; ++i) {
        a[i] = i;
        b[i] = 1/double(i+3);
    }
    double d = dot_product(a.data(), b.data(), n);
    std::cout << "a·b = " << d << "\n";
}
```

---

## Running the MWE under `perf`

```bash
$ g++ src/mwe.cpp
$ perf stat ./a.out 1000000000
a.b = 1e+09

 Performance counter stats for './a.out 1000000000':

         14,881.09 msec task-clock:u              # 0.999 CPUs utilized
                 0      context-switches:u        # 0.000 K/sec
                 0      cpu-migrations:u          # 0.000 K/sec
            17,595      page-faults:u             # 0.001 M/sec
    39,657,728,345      cycles:u                  # 2.665 GHz                    (50.00%)
    27,974,789,022      stalled-cycles-frontend:u # 70.54% frontend cycles idle  (50.01%)
     6,000,965,962      stalled-cycles-backend:u  # 15.13% backend cycles idle   (50.01%)
    88,999,950,765      instructions:u            # 2.24 insn per cycle
                                                  # 0.31 stalled cycles per insn (50.00%)
    15,998,544,101      branches:u                # 1075.093 M/sec               (49.99%)
            37,578      branch-misses:u           # 0.00% of all branches        (49.99%)

      14.892496917 seconds time elapsed

      13.566616000 seconds user
       1.199643000 seconds sys
```

^ If you have a different `perf` version, you might see `stalled-cycles:frontend` and `stalled-cycles:backend`.
Stalled frontend cycles are those where instructions could not be fetched and decoded fast enough to keep the execution units fed.
Stalled backend cycles are those where data did not arrive fast enough. Backend cycles stall much more frequently than frontend cycles. See [here](https://stackoverflow.com/questions/22165299) for more details.

---

## Learning from `perf stat`

- 2.24 instructions/cycle and a large number of stalled frontend cycles means we're probably CPU bound. Right? Right?? (Stay tuned)
- Our branch miss rate is really good!

But it's not super informative, nor is it actionable.

---

## Aside on 'frontend-cycles' vs 'backend-cycles'

![inline](figures/frontend_v_backend.png)

[Source](https://software.intel.com/content/www/us/en/develop/documentation/vtune-cookbook/top/methodologies/top-down-microarchitecture-analysis-method.html)

This is how Intel divvies up the "frontend" and "backend" of the CPU. The frontend is responsible for instruction fetch, decode, and scheduling; the backend is responsible for executing instructions and fetching data.
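
---

## Feeling a backend stall

To see a backend-bound workload in isolation, here's a small experiment you can run under `perf stat -d`. It's a sketch of mine, not a file in the tutorial repo: each load depends on the previous one and lands on an unpredictable cache line, so the backend spends most of its cycles waiting on memory.

```cpp
#include <algorithm>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main() {
    // Walk a random permutation: every load depends on the previous
    // load's result, so the CPU cannot hide the cache misses.
    size_t n = 1 << 24;
    std::vector<size_t> next(n);
    std::iota(next.begin(), next.end(), size_t(0));
    std::shuffle(next.begin(), next.end(), std::mt19937_64(42));

    size_t i = 0;
    for (size_t hops = 0; hops < 4*n; ++hops) {
        i = next[i];
    }
    std::cout << i << "\n"; // keep the chase from being optimized out
}
```

Compare its `stalled-cycles-backend` against the dot product's: the dot product streams sequentially and prefetches well; this walk does not.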

---

> The cycles stalled in the back-end are a waste because the CPU has to wait for resources (usually memory) or to finish long latency instructions (e.g. transcendentals: sqrt, reciprocals, divisions, etc.). The cycles stalled in the front-end are a waste because that means that the Front-End does not feed the Back End with micro-operations. This can mean that you have misses in the Instruction cache, or complex instructions that are not already decoded in the micro-op cache. Just-in-time compiled code usually expresses this behavior.

-- [stackoverflow](https://stackoverflow.com/a/29059380/)

---

## Learning from `perf stat`

`perf` is written by kernel developers, so the `perf stat` defaults are for them.

At ORNL, we're HPC developers, so let's make some changes. What stats do we have available?

---

```
$ perf list
List of pre-defined events (to be used in -e):

  branch-misses                              [Hardware event]
  cache-misses                               [Hardware event]
  cache-references                           [Hardware event]
  instructions                               [Hardware event]
  task-clock                                 [Software event]

  L1-dcache-load-misses                      [Hardware cache event]
  L1-dcache-loads                            [Hardware cache event]
  LLC-load-misses                            [Hardware cache event]
  LLC-loads                                  [Hardware cache event]

  cache-misses OR cpu/cache-misses/          [Kernel PMU event]
  cache-references OR cpu/cache-references/  [Kernel PMU event]
  power/energy-cores/                        [Kernel PMU event]
  power/energy-pkg/                          [Kernel PMU event]
  power/energy-ram/                          [Kernel PMU event]
```

^ Every architecture has a different set of PMCs, so this list will be different for everyone. I like the `power` measurements, since speed is not the only sensible objective we might want to pursue.

---

## Custom events

```
$ perf stat -e instructions,cycles,L1-dcache-load-misses,L1-dcache-loads,LLC-load-misses,LLC-loads ./dot 100000000
a.b = 9.99999e+07

 Performance counter stats for './dot 100000000':

     8,564,368,466      instructions:u          # 1.41 insn per cycle          (49.98%)
     6,060,955,584      cycles:u                                               (66.65%)
        34,089,080      L1-dcache-load-misses:u # 0.90% of all L1-dcache hits  (83.34%)
     3,805,929,303      L1-dcache-loads:u                                      (83.32%)
           854,522      LLC-load-misses:u       # 39.87% of all LL-cache hits  (33.31%)
         2,143,437      LLC-loads:u                                            (33.31%)

       5.045450844 seconds time elapsed

       2.856660000 seconds user
       2.185739000 seconds sys
```

^ Hmm . . . a 40% LL cache miss rate, yet 1.4 instructions/cycle. This CPU-bound vs memory-bound distinction is a bit complicated . . .

^ Personally I don't regard CPU-bound vs memory-bound as an "actionable" way of thinking. We can turn a slow CPU-bound program into a fast memory-bound program just by not doing dumb stuff.

---

## Custom events: gotchas

These events are not stable across CPU architectures, nor even across `perf` versions!

The events expose the functionality of hardware counters; different hardware has different counters.

And someone needs to do the work of exposing them in `perf`!

---

```
$ perf list
  cycle_activity.stalls_l1d_pending
       [Execution stalls due to L1 data cache misses]
  cycle_activity.stalls_l2_pending
       [Execution stalls due to L2 cache misses]
  cycle_activity.stalls_ldm_pending
       [Execution stalls due to memory subsystem]
$ perf stat -e cycle_activity.stalls_ldm_pending,cycle_activity.stalls_l2_pending,cycle_activity.stalls_l1d_pending,cycles ./dot 100000000
a.b = 9.99999e+07

 Performance counter stats for './dot 100000000':

       509,998,525      cycle_activity.stalls_ldm_pending:u
       127,137,070      cycle_activity.stalls_l2_pending:u
        70,555,574      cycle_activity.stalls_l1d_pending:u
     5,708,220,052      cycles:u

       3.637099623 seconds time elapsed

       2.463966000 seconds user
       1.172459000 seconds sys
```

---

## Kinda painful typing these events: Use `-d` (`--detailed`)

```
$ perf stat -d ./dot 100000000

 Performance counter stats for './dot 100000000':

          1,945.17 msec task-clock:u            # 0.970 CPUs utilized
                 0      context-switches:u      # 0.000 K/sec
                 0      cpu-migrations:u        # 0.000 K/sec
           390,463      page-faults:u           # 0.201 M/sec
     3,329,516,701      cycles:u                # 1.712 GHz                    (49.97%)
     1,272,884,914      instructions:u          # 0.38 insn per cycle          (62.50%)
       150,445,759      branches:u              # 77.343 M/sec                 (62.55%)
            14,766      branch-misses:u         # 0.01% of all branches        (62.53%)
        76,672,490      L1-dcache-loads:u       # 39.417 M/sec                 (62.53%)
        51,315,841      L1-dcache-load-misses:u # 66.93% of all L1-dcache hits (62.52%)
         7,867,383      LLC-loads:u             # 4.045 M/sec                  (49.94%)
         7,618,746      LLC-load-misses:u       # 96.84% of all LL-cache hits  (49.96%)

       2.005801176 seconds time elapsed

       0.982545000 seconds user
       0.963534000 seconds sys
```

---

## `perf stat -d` output on Andes

```
[nthompson@andes-login1]~/performance_tuning_tutorial% perf stat -d ./dot 1000000000
a.b = 1e+09

 Performance counter stats for './dot 1000000000':

          2,242.43 msec task-clock:u              # 0.999 CPUs utilized
                 0      context-switches:u        # 0.000 K/sec
                 0      cpu-migrations:u          # 0.000 K/sec
             8,456      page-faults:u             # 0.004 M/sec
     2,972,264,893      cycles:u                  # 1.325 GHz                    (29.99%)
         1,366,982      stalled-cycles-frontend:u # 0.05% frontend cycles idle   (30.02%)
       747,429,126      stalled-cycles-backend:u  # 25.15% backend cycles idle   (30.07%)
     3,499,896,128      instructions:u            # 1.18 insn per cycle
                                                  # 0.21 stalled cycles per insn (30.06%)
       749,888,957      branches:u                # 334.410 M/sec                (30.02%)
             9,206      branch-misses:u           # 0.00% of all branches        (29.98%)
     1,108,395,106      L1-dcache-loads:u         # 494.284 M/sec                (29.97%)
        36,998,921      L1-dcache-load-misses:u   # 3.34% of all L1-dcache accesses (29.97%)
                 0      LLC-loads:u               # 0.000 K/sec                  (29.97%)
                 0      LLC-load-misses:u         # 0.00% of all LL-cache accesses (29.97%)

       2.244079417 seconds time elapsed

       1.000742000 seconds user
       1.214037000 seconds sys
```

^ Lots of backend cycles stalled in this one. This could be from high-latency operations like divisions, or from slow memory accesses.
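
---

## Are the divisions to blame?

You can isolate the division hypothesis directly. This is a sketch of mine, not repo code: the two loops do the same arithmetic, but the first pays a long-latency double divide per element. Compile without `-ffast-math` (which would do the reciprocal rewrite for you) and compare `stalled-cycles-backend`, or the `cycle_activity` counters above, across the two loops.

```cpp
#include <iostream>
#include <vector>

int main() {
    size_t n = 100000000;
    std::vector<double> a(n, 1.5);
    double s1 = 0, s2 = 0;
    for (size_t i = 0; i < n; ++i) {
        s1 += a[i] / 3.0;   // long latency: one double divide per element
    }
    const double third = 1.0/3.0;
    for (size_t i = 0; i < n; ++i) {
        s2 += a[i] * third; // same math; multiply latency is far lower
    }
    std::cout << s1 << " " << s2 << "\n";
}
```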

---

## `perf stat` on a different type of computation

```
[nthompson@andes-login1]~/performance_tuning_tutorial% perf stat -d git archive --format=tar.gz --prefix=HEAD/ HEAD > HEAD.tar.gz

 Performance counter stats for 'git archive --format=tar.gz --prefix=HEAD/ HEAD':

             99.81 msec task-clock:u              # 0.795 CPUs utilized
                 0      context-switches:u        # 0.000 K/sec
                 0      cpu-migrations:u          # 0.000 K/sec
             1,408      page-faults:u             # 0.014 M/sec
       276,165,489      cycles:u                  # 2.767 GHz                    (28.07%)
        72,227,873      stalled-cycles-frontend:u # 26.15% frontend cycles idle  (28.36%)
        60,614,109      stalled-cycles-backend:u  # 21.95% backend cycles idle   (29.37%)
       394,352,577      instructions:u            # 1.43 insn per cycle
                                                  # 0.18 stalled cycles per insn (30.58%)
        66,882,750      branches:u                # 670.113 M/sec                (31.95%)
         2,974,856      branch-misses:u           # 4.45% of all branches        (32.23%)
       183,326,327      L1-dcache-loads:u         # 1836.788 M/sec               (31.19%)
            49,245      L1-dcache-load-misses:u   # 0.03% of all L1-dcache accesses (30.05%)
                 0      LLC-loads:u               # 0.000 K/sec                  (29.37%)
                 0      LLC-load-misses:u         # 0.00% of all LL-cache accesses (28.84%)

       0.125489614 seconds time elapsed

       0.092736000 seconds user
       0.006905000 seconds sys
```

^ Compression has much higher instruction complexity than a dot product, and we see that reflected here in the stalled frontend cycles. We also have a much higher branch miss rate.

---

## `perf stat` is great for reporting . . .

But not super actionable.

---

## Get Actionable Data

```
$ perf record -g ./dot 100000000
a.b = 9.99999e+07
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.735 MB perf.data (5894 samples) ]
$ perf report -g -M intel
```

![inline](figures/perf_report_homescreen.png)

---

## Wait, what's actionable about this?

See how half the time is spent in the `std::vector` allocator?

That's a clue.

---

## Self and Children

- The `Self` column says how much time was spent within the function itself.
- The `Children` column says how much time was spent in the function *plus* the functions it calls.

- If a function's `Children` value is large but its `Self` value is small, the hotspot is in its callees, not in the function itself!

---

If `Self` and `Children` is confusing, just get rid of it:

```bash
$ perf report -g -M intel --no-children
```

---

## More intelligible `perf report`

```bash
$ perf report --no-children -s dso,sym,srcline
```

Best to put this in a `perf config`:

```
$ perf config --user report.children=false
$ cat ~/.perfconfig
[report]
	children = false
```

---

## Some other nice config options

```
$ perf config --user annotate.disassembler_style=intel
$ perf config --user report.percent-limit=0.1
```

---

## Disassembly

![inline](figures/dot_disassembly.png)
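
---

## Reading the hot loop

Before reaching for references, it helps to know the shape of what you're looking at. Below is the hot loop from `src/mwe.cpp`, with the sort of scalar x86_64 assembly an unoptimized or `-O2` build produces written as comments. This is a sketch: your compiler's exact registers and instruction choices will differ, but the shape, two loads, a multiply, an add, and loop bookkeeping, is what the `perf` annotation shows.

```cpp
// a arrives in rdi, b in rsi, n in rdx (System V ABI; see the detour below).
double dot_product(double* a, double* b, size_t n) {
    double d = 0;                    //   pxor   xmm0, xmm0         ; d = 0
    for (size_t i = 0; i < n; ++i) { //   xor    eax, eax           ; i = 0
        d += a[i]*b[i];              //   movsd  xmm1, [rdi+rax*8]  ; load a[i]
                                     //   mulsd  xmm1, [rsi+rax*8]  ; * b[i]
                                     //   addsd  xmm0, xmm1         ; d += ...
    }                                //   inc rax; cmp rax, rdx; jb <loop>
    return d;                        //   ret                       ; result in xmm0
}
```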

---

## What is happening?????

- If you don't know x86 assembly, I recommend Ray Seyfarth's [Introduction to 64 Bit Assembly Language Programming for Linux and OS X](http://rayseyfarth.com/asm/)

- If you need to look up instructions one at a time, Felix Cloutier's [x64 reference](https://www.felixcloutier.com/x86/) is a great resource.

- If you need to examine how compiler flags interact with generated assembly, try [godbolt](https://godbolt.org).

---

## Detour: System V ABI (Linux)

- Floating point arguments are passed in registers `xmm0`-`xmm7`.
- Integer arguments are passed in registers `rdi`, `rsi`, `rdx`, `rcx`, `r8`, and `r9`, in that order.
- A floating point return value is placed in register `xmm0`.
- Integer return values are placed in `rax`.

Knowing this makes your godbolt output a bit easier to read!

---

## The default assembly generated by gcc is braindead

See the [godbolt](https://godbolt.org/z/8qqhGj).

- Superfluous stack writes.
- No AVX instructions, no fused multiply-adds.

Consequence: lots of time spent moving data around.

---

## Sidetrack: Fused multiply-add

We'll use the fused multiply-add instruction as a "canonical" example of an instruction which we *want* generated, but which, due to history, chaos, and dysfunction, generally *isn't*.

---

## Sidetrack: Fused multiply-add

The fma is defined as

$$
\mathrm{fma}(a,b,c) := \mathrm{rnd}(a*b + c)
$$

i.e., the multiplication and addition are performed in a single instruction, with a single rounding.

^ I recently determined `gcc` wasn't generating fma's in our flagship product VTK-m. It's often said that it's meaningless to talk about performance of code compiled without optimizations. Implicit in this statement is another: It's incredibly difficult to convince the compiler to generate optimized assembly! (The Intel compiler is very good in this regard.)

---

## My preferred CPPFLAGS:

```
-g -O3 -ffast-math -fno-finite-math-only -march=native -fno-omit-frame-pointer
```

How does that look on [godbolt](https://godbolt.org/z/4dnfYb)?

Key instruction: `vfmadd132pd`; a vectorized fused multiply-add on `xmm`/`ymm` registers.

---

## Recompile with good flags

```
$ make
$ perf stat ./dot 100000000
a.b = 9.99999e+07

 Performance counter stats for './dot 100000000':

          2,428.06 msec task-clock:u       # 0.998 CPUs utilized
                 0      context-switches:u # 0.000 K/sec
                 0      cpu-migrations:u   # 0.000 K/sec
           390,994      page-faults:u      # 0.161 M/sec
     3,651,637,732      cycles:u           # 1.504 GHz
     1,676,766,309      instructions:u     # 0.46 insn per cycle
       225,636,250      branches:u         # 92.929 M/sec
             9,303      branch-misses:u    # 0.00% of all branches

       2.432163719 seconds time elapsed
```

1/3rd of the instructions/cycle, yet twice as fast, because it ran ~1/5th the number of instructions.

---

# Exercise

Look at the code of `src/mwe.cpp`. Is it really measuring a dot product? Look at it under `perf report`.

Fix it if not.

^ The performance of `src/mwe.cpp` is dominated by the cost of initializing data.
^ The data initialization converts integers to floats and does divisions. Removing these increases the performance.
^ Even once this is done, 40% of the time is spent in data allocation. This indicates a need for a more sophisticated approach.
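
---

## Exercise: one possible fix

A sketch of one way to act on those notes (the replacement initialization below is my arbitrary choice, not the repo's): keep the vectors, but make the setup loop cheap, so that `perf report` attributes the time to `dot_product` itself.

```cpp
// In src/mwe.cpp, replace the initialization loop. No int-to-double
// conversions and no divisions; the values are arbitrary nonzero doubles.
double x = 1.0;
for (size_t i = 0; i < n; ++i) {
    a[i] = x;
    b[i] = 3.0 - x;
    x += 1e-9;
}
```

The remaining allocation cost is harder to hide in a standalone binary; that's part of the motivation for the microbenchmark framework in Session 2.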

---

## Register width

- The 8008 architecture from 1972 had 8 bit registers, now vaguely resembling our current `al` register.

- 16 bit registers were added with the 8086 in 1978; these are now labelled `ax`.

- 32 bit registers arrived with the 80386 architecture in 1985; these are now prefixed with `e`, such as `eax`, `ebx`, and so on.

- 64 bit registers were added in 2003 for the `x86_64` architecture. They are prefixed with `r`, such as the `rax` and `rbx` registers.

---

## Register width

Compilers utilize the full width of integer registers without much fuss. The situation for floating point registers is much worse.

---

## Floating point register width

An `xmm` register is 128 bits wide, and can hold 2 doubles, or 4 floats.

AVX introduced the `ymm` registers, which are 256 bits wide, and can hold 4 doubles, or 8 floats. (AVX2 extended most integer instructions to this width.)

AVX-512 (2016) introduced the `zmm` registers, which can hold 8 doubles or 16 floats.

---

To determine whether your CPU has `ymm` registers, check for AVX instruction support:

```bash
$ lscpu | grep avx
Flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon
pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64
monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi
flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
```

or (on CentOS)

```bash
$ cat /proc/cpuinfo | grep avx
```

---

## Mind bogglement

I couldn't get `gcc` or `clang` to generate AVX-512 instructions, so I went looking for the story . . .

---

## Mind bogglement

> I hope AVX512 dies a painful death, and that Intel starts fixing real problems instead of trying to create magic instructions to then create benchmarks that they can look good on. I hope Intel gets back to basics: gets their process working again, and concentrate more on regular code that isn't HPC or some other pointless special case.

-- [Linus Torvalds](https://www.realworldtech.com/forum/?threadid=193189&curpostid=193190)

---

## Vector instructions

Even if the CS people don't like AVX-512, it is still difficult to find the magical incantations required to generate AVX2 instructions.

It generally requires an `-march=native` compiler flag.

---

## Beautiful assembly:

![inline](figures/ymm_dot_product.png)

---

## Exercise

On *Andes*, what causes this error?

```
$ module load intel/19.0.3
$ icc -march=skylake-avx512 src/mwe.cpp
$ ./a.out 1000000
zsh: illegal hardware instruction (core dumped)
```

---

Compiler defaults are for *compatibility*, not for performance!
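
---

## Middle road: function multiversioning

If you must ship one binary to machines with different vector units, GCC's function multiversioning gives a middle road between `-march=native` speed and default-target compatibility. A sketch on our running example (the `target_clones` attribute is real GCC functionality; whether it pays off for your code is something to measure):

```cpp
#include <cstddef>

// GCC emits one clone per listed target plus a default, and an ifunc
// resolver picks the best clone for the host CPU at load time.
__attribute__((target_clones("avx2", "default")))
double dot_product(double* a, double* b, size_t n) {
    double d = 0;
    for (size_t i = 0; i < n; ++i) {
        d += a[i]*b[i];
    }
    return d;
}
```

Compile with `-O3` but no `-march` flag: the AVX2 clone still vectorizes, and the binary still runs on pre-AVX2 hardware.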

---

## `perf report` commands

- `k`: Show line numbers of source code
- `o`: Show instruction number
- `t`: Switch between percentage and samples
- `J`: Number of jump sources on target; number of places that can jump here.
- `s`: Hide/Show source code
- `h`: Show options

---

## perf gotchas

- perf sometimes attributes the time in a single instruction to the *next* instruction.

---

## perf gotchas

```
       │     if (absx < 1)
  7.76 │       ucomis xmm1,QWORD PTR [rbp-0x20]
  0.95 │     ↓ jbe    a6
  1.82 │       movsd  xmm0,QWORD PTR ds:0x46a198
  0.01 │       movsd  xmm1,QWORD PTR ds:0x46a1a0
  0.01 │       movsd  xmm2,QWORD PTR ds:0x46a100
```

Hmm, so moving data into `xmm1` and `xmm2` is 182x faster than moving data into `xmm0` . . .

Looks like a misattribution of the `jbe`.

---

> . . if you're trying to capture the IP on some PMC event, and there's a delay between the PMC overflow and capturing the IP, then the IP will point to the wrong address. This is skew. Another contributing problem is that micro-ops are processed in parallel and out-of-order, while the instruction pointer points to the resumption instruction, not the instruction that caused the event.

--[Brendan Gregg](http://www.brendangregg.com/perf.html)

---

## Two sensible goals

Reduce power consumption, and/or reduce runtime.

Not necessarily the same thing. Benchmark power consumption:

```
$ perf list | grep energy
  power/energy-cores/     [Kernel PMU event]
  power/energy-pkg/       [Kernel PMU event]
  power/energy-ram/       [Kernel PMU event]
$ perf stat -e energy-cores ./dot 100000000

 Performance counter stats for 'system wide':

              8.55 Joules power/energy-cores/
```

---

## Improving reproducibility

For small optimizations (< 2% gains), our perf data often gets swamped in noise.

```
$ perf stat -e uops_retired.all,instructions,cycles -r 5 ./dot 100000000

 Performance counter stats for './dot 100000000' (5 runs):

     1,817,358,542      uops_retired.all:u                      ( +- 0.00% )
     1,276,765,688      instructions:u     # 0.45 insn per cycle ( +- 0.00% )
     2,823,559,592      cycles:u                                ( +- 0.11% )

            2.1110 +- 0.0422 seconds time elapsed ( +- 2.00% )
```

---

# Improving reproducibility

Small optimizations are really important, but really hard to measure reliably.

See [Producing wrong data without doing anything obviously wrong!](https://users.cs.northwestern.edu/~robby/courses/322-2013-spring/mytkowicz-wrong-data.pdf)

Link order, environment variables, [running in a new directory](https://youtu.be/koTf7u0v41o?t=1318), and the cache set of hot instructions can have a huge impact on performance!

---

## Improving reproducibility

Instruction count and uops are reproducible, but time and cycles are not.

Use instruction count and uops retired as imperfect metrics for small optimizations when variance in runtime will swamp improvements.

---

## Long tail `perf`

Attaching to a running process or MPI rank:

```bash
$ top # find rogue process
$ perf stat -d -p `pidof paraview`
^C
```

---

## Long tail `perf`

Sometimes, `perf` will gather *way* too much data, creating a huge `perf.data` file.

Solve this by reducing the sampling frequency:

```bash
$ perf record -F 10 ./dot 100000000
```

or by compressing (requires `perf` compiled with `zstd` support):

```bash
$ perf record -z ./dot 100000
```

---

## Exercise

Replace the computation of $$\mathbf{a}\cdot \mathbf{b}$$ with the computation of $$\left\|\mathbf{a}\right\|^2$$.

This halves the number of memory references/flop.

Is it observable under `perf stat`?

^ I see a meaningful reduction in L1 cache miss rate.

---

# Exercise

Parallelize the dot product using a framework of your choice.

How does it look under `perf`?

---

# Solution: OpenMP

```cpp
double dot_product(double* a, double* b, size_t n) {
    double d = 0;
    #pragma omp parallel for reduction(+:d)
    for (size_t i = 0; i < n; ++i) {
        d += a[i]*b[i];
    }
    return d;
}
```

---

# Solution: C++17

```cpp
double d = std::transform_reduce(std::execution::par_unseq,
                                 a.begin(), a.end(), b.begin(), 0.0);
```

(FYI: I had to do a [source build](https://github.com/oneapi-src/oneTBB/) of TBB to get this to work.)

---

## Parallel `perf` lessons

`perf` is great at finding *hotspots*, not so great at finding coldspots.

[hotspot](https://github.com/KDAB/hotspot), discussed later, will overcome this problem.

---

Break?

---

## Session 2 Goals

- Learn about google/benchmark
- Profile entire workflows and generate flamegraphs and timecharts

---

## Challenges we need to overcome

- Our MWE spent fully half its time initializing data. That's not very interesting.
- We could only specify one vector length at a time. What if we'd written a performance bug that induced quadratic scaling? (A classic example follows.)
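
---

## The classic quadratic-scaling bug

Here's the shape of the bug, as a sketch (hypothetical code, not from this repo): the loop looks like one linear pass, but `erase` at the front shifts the whole tail on every call.

```cpp
#include <vector>

// Looks like one pass over v; actually O(n^2), since each
// erase(begin()) moves every remaining element left by one.
void drain(std::vector<double>& v) {
    while (!v.empty()) {
        // ... use v.front() ...
        v.erase(v.begin());
    }
}
```

The `Complexity()` machinery demonstrated next makes this kind of bug jump out: the fitted curve comes back $$\mathcal{O}(N^2)$$ when you expected $$\mathcal{O}(N)$$.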

---

## A [google/benchmark](https://github.com/google/benchmark/) [example](https://github.com/boostorg/math/blob/develop/reporting/performance/chebyshev_clenshaw.cpp):

```bash
$ ./reporting/performance/chebyshev_clenshaw.x --benchmark_filter=^ChebyshevClenshaw
2020-10-16T15:36:34-04:00
Running ./reporting/performance/chebyshev_clenshaw.x
Run on (16 X 2300 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 2.49, 2.29, 2.09
----------------------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
----------------------------------------------------------------------------
ChebyshevClenshaw/2       0.966 ns        0.965 ns    637018028
ChebyshevClenshaw/4        1.69 ns         1.69 ns    413440355
ChebyshevClenshaw/8        4.26 ns         4.25 ns    161924589
ChebyshevClenshaw/16       13.3 ns         13.3 ns     52107759
ChebyshevClenshaw/32       39.4 ns         39.4 ns     17071255
ChebyshevClenshaw/64        108 ns          108 ns      6438439
ChebyshevClenshaw/128       246 ns          245 ns      2852707
ChebyshevClenshaw/256       522 ns          521 ns      1316359
ChebyshevClenshaw/512      1100 ns         1100 ns       640076
ChebyshevClenshaw/1024     2180 ns         2179 ns       311353
ChebyshevClenshaw/2048     4499 ns         4496 ns       152754
ChebyshevClenshaw/4096     9086 ns         9081 ns        79369
ChebyshevClenshaw_BigO     2.27 N          2.26 N
ChebyshevClenshaw_RMS         4 %             4 %
```

---

## Goals for google/benchmark

- Empirically determine asymptotic complexity; is it $$\mathcal{O}(N)$$, $$\mathcal{O}(N^2)$$, or $$\mathcal{O}(\log(N))$$?
- Test inputs of different lengths
- Test different types (`float`, `double`, `long double`)
- Dominate the runtime with interesting and relevant operations so our `perf` traces are more meaningful.

---

## Installation

- Grab a [release tarball](https://github.com/google/benchmark/releases)
- `pip install google-benchmark`
- `brew install google-benchmark`
- `spack install benchmark`

---

## Installation

Source build:

```bash
$ git clone https://github.com/google/benchmark.git
$ cd benchmark && mkdir build && cd build
build$ cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_TESTING=OFF ../ -G Ninja
build$ ninja
build$ sudo ninja install
```

---

## Example: `benchmarks/bench.cpp`

```cpp
#include <random>
#include <vector>
#include <benchmark/benchmark.h>

template<typename Real>
void DotProduct(benchmark::State& state) {
    std::vector<Real> a(state.range(0));
    std::vector<Real> b(state.range(0));
    std::random_device rd;
    std::uniform_real_distribution<Real> unif(-1,1);
    for (size_t i = 0; i < a.size(); ++i) {
        a[i] = unif(rd);
        b[i] = unif(rd);
    }

    for (auto _ : state) {
        benchmark::DoNotOptimize(dot_product(a.data(), b.data(), a.size()));
    }
    state.SetComplexityN(state.range(0));
}

BENCHMARK_TEMPLATE(DotProduct, float)->RangeMultiplier(2)->Range(1<<3, 1<<18)->Complexity();
BENCHMARK_TEMPLATE(DotProduct, double)->DenseRange(8, 1024*1024, 512)->Complexity();
BENCHMARK_TEMPLATE(DotProduct, long double)->RangeMultiplier(2)->Range(1<<3, 1<<18)->Complexity(benchmark::oN);

BENCHMARK_MAIN();
```

---

Instantiate a benchmark on type float:

```cpp
BENCHMARK_TEMPLATE(DotProduct, float);
```

Test on vectors of length 8, 16, 32, ..., 262144:

```
->RangeMultiplier(2)->Range(1<<3, 1<<18)
```

Regress the performance data against $$\mathcal{O}(\log(n)), \mathcal{O}(n), \mathcal{O}(n^2), \mathcal{O}(n^3)$$:

```
->Complexity();
```

---

Force regression against $$\mathcal{O}(n)$$:

```
->Complexity(benchmark::oN);
```

Repeat the calculation until confidence in the runtime is obtained:

```cpp
for (auto _ : state) { ... }
```

Make sure the compiler doesn't elide these instructions:

```cpp
benchmark::DoNotOptimize(dot_product(a.data(), b.data(), a.size()));
```

---

## google/benchmark party tricks: Visualize complexity

Set a counter to the length of the vector:

```
state.counters["n"] = state.range(0);
```

Then get the output as CSV:

```
benchmarks$ ./dot_bench --benchmark_format=csv
```

Finally, copy-paste the console output into [scatterplot.online](https://scatterplot.online/)

---

![inline](figures/benchmark_linear_complexity.png)

---

## `SetBytesProcessed`

We can attack the memory-bound vs CPU-bound question via `SetBytesProcessed`.
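The call itself isn't shown in the excerpt above, so here's a sketch of how the counter might be set inside `DotProduct`; the factor of 2 counts both input arrays:

```cpp
// Inside DotProduct, after the for (auto _ : state) loop:
// total bytes = iterations x (two arrays x n elements x sizeof(Real)).
state.SetBytesProcessed(int64_t(state.iterations()) *
                        int64_t(state.range(0)) * 2 * sizeof(Real));
```

With the counter set, google/benchmark reports a `bytes_per_second` column: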
1095 | 1096 | ``` 1097 | ./dot_bench --benchmark_filter=DotProduct\/64 0.004 us 0.004 us 155850953 bytes_per_second=212.277G/s n=64 1111 | DotProduct/128 0.010 us 0.010 us 73113102 bytes_per_second=200.232G/s n=128 1112 | DotProduct/256 0.015 us 0.015 us 45589300 bytes_per_second=247.706G/s n=256 1113 | DotProduct/512 0.029 us 0.029 us 24430471 bytes_per_second=266.21G/s n=512 1114 | DotProduct/1024 0.056 us 0.056 us 12490510 bytes_per_second=273.686G/s n=1024 1115 | DotProduct/2048 0.158 us 0.158 us 4413687 bytes_per_second=193.436G/s n=2.048k 1116 | DotProduct/4096 0.676 us 0.676 us 1035341 bytes_per_second=90.2688G/s n=4.096k 1117 | DotProduct/8192 1.33 us 1.33 us 520428 bytes_per_second=91.5784G/s n=8.192k 1118 | DotProduct/16384 2.71 us 2.71 us 258728 bytes_per_second=89.9407G/s n=16.384k 1119 | DotProduct/32768 5.51 us 5.51 us 127636 bytes_per_second=88.5911G/s n=32.768k 1120 | DotProduct/65536 19.9 us 19.9 us 35225 bytes_per_second=49.1777G/s n=65.536k 1121 | DotProduct/131072 77.7 us 77.7 us 9013 bytes_per_second=25.141G/s n=131.072k 1122 | DotProduct/262144 157 us 157 us 4458 bytes_per_second=24.8915G/s n=262.144k 1123 | DotProduct/524288 330 us 330 us 2129 bytes_per_second=23.6636G/s n=524.288k 1124 | DotProduct/1048576 812 us 812 us 835 bytes_per_second=19.2495G/s n=1048.58k 1125 | ``` 1126 | 1127 | --- 1128 | 1129 | ## Is this good or not? 1130 | 1131 | ```bash 1132 | $ sudo lshw -class memory 1133 | *-memory:0 1134 | description: System Memory 1135 | physical id: 3d 1136 | slot: System board or motherboard 1137 | *-bank:0 1138 | description: DIMM DDR4 Synchronous 2666 MHz (0.4 ns) 1139 | physical id: 0 1140 | serial: #@ 1141 | slot: CPU1_DIMM_A0 1142 | size: 8GiB 1143 | width: 64 bits 1144 | clock: 2666MHz (0.4ns) 1145 | ``` 1146 | 1147 | So our RAM can transfer 8bytes at 2.666Ghz--19.2GB/second. 1148 | 1149 | --- 1150 | 1151 | ## Exercise 1152 | 1153 | Determine the size of the lowest level cache on your machine. 1154 | 1155 | Can you empirically observe cache effects? 1156 | 1157 | Hint: Use the `DenseRange` option. 1158 | 1159 | --- 1160 | 1161 | ## Long tail `google/benchmark` 1162 | 1163 | If you have root, you can decrease run-to-run variance via 1164 | 1165 | ``` 1166 | $ sudo cpupower frequency-set --governor performance 1167 | ``` 1168 | 1169 | --- 1170 | 1171 | # perf + google/benchmark 1172 | 1173 | ``` 1174 | $ perf record -g ./dot_bench --benchmark_filter=DotProduct\ The number of iterations to run is determined dynamically by running the benchmark a few times and measuring the time taken and ensuring that the ultimate result will be statistically stable. 1199 | --[Google benchmark docs](https://github.com/google/benchmark) 1200 | 1201 | --- 1202 | 1203 | ## Exercise 1204 | 1205 | Profile a squared norm using google/benchmark. 1206 | 1207 | Compute it in both `float` and `double` precision, determine asymptotic complexity, and the number of bytes/second you are able to process. 1208 | 1209 | --- 1210 | 1211 | ## Exercise 1212 | 1213 | Compare interpolation search to binary search use `perf` and `googlebenchmark`. 1214 | 1215 | --- 1216 | 1217 | ## Break? 1218 | 1219 | --- 1220 | 1221 | ## Session 3 1222 | 1223 | Flamegraphs 1224 | 1225 | --- 1226 | 1227 | # _What is google/benchmark not good for?_ 1228 | 1229 | Profiling workflows. It's a *microbenchmark* library. 1230 | 1231 | But huge problems can often arise integrating even well-designed and performant functions. 1232 | 1233 | What to do? 

---

## [Flamegraph](https://gitlab.kitware.com/vtk/vtk-m/-/issues/499) of VTK-m graphics pipeline

![inline](figures/read_portal.svg)

---

Flamegraphs present *sorted*, unique stack frames, with each frame's width drawn proportional to its samples divided by the total samples.

Sorting the stack frames means the x-axis is not a time axis! Great for multithreaded code. The x-axis is sorted alphabetically.

The y-axis is the call stack depth.

See the [paper](https://queue.acm.org/detail.cfm?id=2927301).

---

## Flamegraphs

```
$ git clone https://github.com/brendangregg/FlameGraph.git
```

---

## Flamegraph MWE

In a directory with a `perf.data` file, run

```
$ perf script | ~/FlameGraph/stackcollapse-perf.pl | ~/FlameGraph/flamegraph.pl > flame.svg
$ firefox flame.svg
```

I find this hard to remember, so I have an alias:

```
$ alias | grep flame
flamegraph='perf script | ~/FlameGraph/stackcollapse-perf.pl | ~/FlameGraph/flamegraph.pl > flame.svg'
```

---

## Viewing flamegraphs

Firefox is best, but there's no Firefox on Andes. Try ImageMagick:

```
$ ssh -X `whoami`@andes.olcf.ornl.gov
$ module load imagemagick/7.0.8-7-py3
$ magick display flame.svg
```

---

## Flamegraph example: VTK-m Volume Rendering

```
$ git clone https://gitlab.kitware.com/vtk/vtk-m.git
$ cd vtk-m && mkdir build && cd build
$ cmake ../ \
    -DCMAKE_CXX_FLAGS="${CMAKE_CXX_FLAGS} -march=native -fno-omit-frame-pointer -Wfatal-errors -ffast-math -fno-finite-math-only -O3 -g" \
    -DVTKm_ENABLE_EXAMPLES=ON -DVTKm_ENABLE_OPENMP=ON -DVTKm_ENABLE_TESTING=OFF -G Ninja
$ ninja
$ perf stat -d ./examples/demo/Demo
$ perf record -g ./examples/demo/Demo
```

Note: If you have a huge program you'd like to profile, compile it now and follow along!

---

# Step by step: `perf script`

Dumps all recorded stack traces:

```
$ perf script
perf 20820 510465.112358:          1 cycles:
        ffffffff9f277a8a native_write_msr+0xa ([kernel.kallsyms])
        ffffffff9f20d7ed __intel_pmu_enable_all.constprop.31+0x4d ([kernel.kallsyms])
        ffffffff9f20dc29 intel_tfa_pmu_enable_all+0x39 ([kernel.kallsyms])
        ffffffff9f207aec x86_pmu_enable+0x11c ([kernel.kallsyms])
        ffffffff9f40ac26 ctx_resched+0x96 ([kernel.kallsyms])
        ffffffff9f415562 perf_event_exec+0x182 ([kernel.kallsyms])
        ffffffff9f4e65e2 setup_new_exec+0xc2 ([kernel.kallsyms])
        ffffffff9f55a9ff load_elf_binary+0x3af ([kernel.kallsyms])
        ffffffff9f4e4441 search_binary_handler+0x91 ([kernel.kallsyms])
        ffffffff9f4e5696 __do_execve_file.isra.39+0x6f6 ([kernel.kallsyms])
        ffffffff9f4e5a49 __x64_sys_execve+0x39 ([kernel.kallsyms])
        ffffffff9f204417 do_syscall_64+0x57 ([kernel.kallsyms])
```

---

## Step by step: `stackcollapse-perf.pl`

Merges duplicate stack samples and sorts them alphabetically:

```
$ perf script | ~/FlameGraph/stackcollapse-perf.pl > out.folded
$ cat out.folded | more
Demo;[libgomp.so.1.0.0] 1856
Demo;[unknown];[libgomp.so.1.0.0];vtkm::cont::DeviceAdapterAlgorithm::ScheduleTask 8
```

---

## Generate the flamegraph

```
$ perf script | ~/FlameGraph/stackcollapse-perf.pl > out.folded
$ ~/FlameGraph/flamegraph.pl out.folded --title="VTK-m rendering and isocontouring" > flame.svg
```

---

## Convert sorted/unique stack frames into a pretty picture

```
$ cat out.folded | ~/FlameGraph/flamegraph.pl
(writes the finished flamegraph as SVG/XML to stdout)
```