├── README.md
├── crunch_mpiGraph
├── hostlist_lite.pm
├── makefile
└── mpiGraph.c


/README.md:
--------------------------------------------------------------------------------
# mpiGraph
Benchmark to generate network bandwidth images

## Build
    make

## Run
Run one MPI task per node:

    SLURM:    srun -n <nodes> -N <nodes> ./mpiGraph 1048576 10 10 > mpiGraph.out
    Open MPI: mpirun --map-by node -np <nodes> ./mpiGraph 1048576 10 10 > mpiGraph.out

General usage:

    mpiGraph <size> <iters> <window>

To compute bandwidth, each task averages the bandwidth from *iters* iterations.
In each iteration, a process sends *window* messages of *size* bytes to another process
while it simultaneously receives an equal number of messages of equal size from another process.
The source and destination processes in each step are not necessarily the same process.

Watch progress:

    tail -f mpiGraph.out

## Results
Parse output and create html report:

    crunch_mpiGraph mpiGraph.out

View results in a web browser:

    firefox file:///path/to/mpiGraph.out_html/index.html

# Description

This package consists of an MPI application called "mpiGraph" written in C
to measure message bandwidth and an associated "crunch_mpiGraph"
script written in Perl to parse the application output and generate an HTML
report. The mpiGraph application is designed to inspect the health
and scalability of a high-performance interconnect while subjecting it
to heavy load. This is useful to detect hardware and software
problems in a system, such as slow nodes, links, switches, or
contention in switch routing. It is also useful to characterize how
interconnect performance changes with different settings or how one
interconnect type compares to another.

Typically, one MPI task is run per node (or per interconnect link).
For a job of N MPI tasks, the N tasks are logically arranged in a ring
counting ranks from 0 and increasing to the right with the end
wrapping back to rank 0. Then a series of N-1 steps is executed.
In each step, each MPI task sends to the task D units to the right and
simultaneously receives from the task D units to the left. The value
of D starts at 1 and runs to N-1, so that by the end of the N-1 steps,
each task has sent to and received from every other task in the run,
excluding itself. At the end of the run, two NxN matrices of
bandwidths are gathered and written to stdout -- one for send
bandwidths and one for receive bandwidths.

The crunch_mpiGraph script is then run on this output to generate a
report. It includes a pair of bitmap images
representing bandwidth values between different task pairings.
Pixels in these images are colored depending on relative bandwidth
values. The maximum bandwidth value is set to pure white (value
255) and other values are scaled toward black (0) depending on their
percentage of the maximum. One can then visually inspect and identify anomalous
behavior in the system. One may zoom in and inspect image
features in more detail by hovering the mouse cursor over the image.
JavaScript embedded in the HTML report opens a pop-up tooltip with a
zoomed-in view of the cursor location.
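
The arithmetic behind the two ideas in this section, the ring-shift pairing and the grayscale scaling, can be sketched in a few lines of C. This is only an illustrative sketch under stated assumptions: the function names `ring_send_to`, `ring_recv_from`, `gray_level`, and `print_schedule` are hypothetical and are not taken from mpiGraph.c or crunch_mpiGraph, and no MPI calls are made.

```c
#include <stdio.h>

/* Hypothetical helper (not from mpiGraph.c): the rank that `rank` sends to
 * in the step with shift distance `dist`, where 1 <= dist <= nranks-1. */
int ring_send_to(int rank, int dist, int nranks)
{
    return (rank + dist) % nranks;
}

/* Hypothetical helper: the rank that `rank` receives from in the same step.
 * Adding nranks keeps the left operand of % non-negative. */
int ring_recv_from(int rank, int dist, int nranks)
{
    return (rank - dist + nranks) % nranks;
}

/* Hypothetical helper: map a bandwidth to a gray level in 0..255 relative
 * to max_bw (assumed > 0). Ratios outside [0,1] are clamped, and the
 * fractional part is truncated. */
int gray_level(double bw, double max_bw)
{
    double perf = bw / max_bw;
    if (perf > 1.0) { perf = 1.0; }
    if (perf < 0.0) { perf = 0.0; }
    return (int)(perf * 255);
}

/* Print the send and receive partner of one rank for each of the N-1 steps. */
void print_schedule(int rank, int nranks)
{
    for (int dist = 1; dist < nranks; dist++) {
        printf("step %d: rank %d sends to %d, receives from %d\n",
               dist, rank,
               ring_send_to(rank, dist, nranks),
               ring_recv_from(rank, dist, nranks));
    }
}
```

Calling `print_schedule(0, 4)` from a small `main` walks rank 0 through its three steps: it sends to ranks 1, 2, 3 while receiving from ranks 3, 2, 1. Only in the middle step of an even-sized ring do the send and receive partners coincide, which is why the source and destination in a step are not necessarily the same process.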

## References
[Contention-free Routing for Shift-based Communication in MPI Applications on Large-scale Infiniband Clusters](https://www.osti.gov/biblio/967277), Adam Moody, LLNL-TR-418522, Oct 2009

--------------------------------------------------------------------------------
/crunch_mpiGraph:
--------------------------------------------------------------------------------
#!/usr/bin/perl -w

# Copyright (c) 2007, Lawrence Livermore National Security (LLNS), LLC
# Produced at the Lawrence Livermore National Laboratory (LLNL)
# Written by Adam Moody .
# UCRL-CODE-232117.
# All rights reserved.
#
# This file is part of mpiGraph. For details, see
# http://www.sourceforge.net/projects/mpigraph
# Please also read the Additional BSD Notice below.
#
# Redistribution and use in source and binary forms, with or without modification,
# are permitted provided that the following conditions are met:
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the disclaimer below.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the disclaimer (as noted below) in the documentation
# and/or other materials provided with the distribution.
# * Neither the name of the LLNL nor the names of its contributors may be used to
# endorse or promote products derived from this software without specific prior
# written permission.
# *
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
# IN NO EVENT SHALL LLNL, THE U.S.
DEPARTMENT 28 | # OF ENERGY OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 29 | # EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 30 | # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 31 | # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 32 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF 33 | # THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 34 | # 35 | # Additional BSD Notice 36 | # 1. This notice is required to be provided under our contract with the U.S. Department 37 | # of Energy (DOE). This work was produced at LLNL under Contract No. W-7405-ENG-48 38 | # with the DOE. 39 | # 2. Neither the United States Government nor LLNL nor any of their employees, makes 40 | # any warranty, express or implied, or assumes any liability or responsibility for 41 | # the accuracy, completeness, or usefulness of any information, apparatus, product, 42 | # or process disclosed, or represents that its use would not infringe privately-owned 43 | # rights. 44 | # 3. Also, reference herein to any specific commercial products, process, or services 45 | # by trade name, trademark, manufacturer or otherwise does not necessarily constitute 46 | # or imply its endorsement, recommendation, or favoring by the United States Government 47 | # or LLNL. The views and opinions of authors expressed herein do not necessarily state 48 | # or reflect those of the United States Government or LLNL and shall not be used for 49 | # advertising or product endorsement purposes. 50 | 51 | # mpiGraph.c ======================================================================== 52 | # 53 | # The crunch_mpiGraph script is then run on this output to generate a report. A key 54 | # component in this report are a pair of bitmap images representing bandwidth values 55 | # between different task pairings. 
Pixels in this image are colored depending on 56 | # relative bandwidth values. The maximum bandwidth value is set to pure white (value 57 | # 255) and other values are scaled to black (0) depending on their percentage of the 58 | # maximum. Interesting patterns, or the lack thereof, make it easy to visually inspect 59 | # and identify anomalous behavior in the system. One may then zoom in and inspect 60 | # image features in more detail by hovering the mouse cursor over the feature. 61 | # Javascript embedded in the HTML report manages a pop-up tooltip with a zoomed-in 62 | # view of the cursor location. 63 | 64 | use FindBin; 65 | use lib "$FindBin::Bin/."; 66 | use hostlist_lite qw(compress); 67 | 68 | # print usage then exit 69 | sub usage 70 | { 71 | my $exit_code = shift @_; 72 | print "\n"; 73 | print " crunch_mpiGraph -- process mpiGraph output into images or sort links by performance\n"; 74 | print "\n"; 75 | print " Usage: crunch_mpiGraph [-z|-nz] [-dist] \n"; 76 | print "\n"; 77 | print " Node-to-node bandwidth options:\n"; 78 | print " -max Set maximum bandwidth value (in MB/sec) to scale by\n"; 79 | print " -z add javascript to html for zooming\n"; 80 | print " This may produce very large index.html files, but it's very useful for inspecting detailed features\n"; 81 | print " -nz Disable javascript zoom in report\n"; 82 | print " -dist set horizontal index to distance away from node in a given row, rather than MPI rank, so that\n"; 83 | print " data taken on different nodes at the same time are aligned into columns\n"; 84 | print "\n"; 85 | exit $exit_code; 86 | } 87 | 88 | # print usage if no arguments given on command line 89 | if (not @ARGV) { usage(1); } 90 | 91 | my @imagerows = (); 92 | 93 | # set up default parameters and read command line arguments 94 | $mode = "html"; 95 | $strip_mpi_rank = 1; 96 | $maxbw = 0; 97 | my %args = ( 98 | dist => 0, 99 | zoom => 1, 100 | ); 101 | while (@ARGV) { 102 | $arg = shift @ARGV; 103 | if ($arg =~ /^-/) { 
        if ($arg =~ /^-max/) { $maxbw = shift @ARGV; }
        elsif ($arg =~ /^-html/) { $mode = "html"; }
        elsif ($arg =~ /^-dist/) { $args{dist} = 1; }
        elsif ($arg =~ /^-mpirank/) { $strip_mpi_rank = 0; }
        elsif ($arg =~ /^-z/) { $args{zoom} = 1; }
        elsif ($arg =~ /^-nz/) { $args{zoom} = 0; }
        else { print "Unrecognized option: $arg\n"; usage(1); }
    } else {
        push @{$args{files}}, $arg;
    }
}

# compute_percentages($base, @values)
# returns each value formatted as a percentage of $base
sub compute_percentages
{
    my $base = shift @_;
    my @ret = ();
    foreach my $val (@_) {
        # a literal percent sign in a sprintf format must be written as %%
        push @ret, sprintf("%3.1f%%", $val / $base * 100);
    }
    return @ret;
}

# read in the mpiGraph files, values are averaged if there is more than one file
my %sendtable = ();
my %recvtable = ();
foreach my $file (@{$args{files}}) {
    # an unreadable input file is an error, so exit with a non-zero status
    if (not -r $file) { print "File unreadable: $file\n"; exit 1; }
    ($msgsize, $times, $window, $testuser, $testtime) = read_file($file, \%sendtable, \%recvtable);
}
my @sendstats = min_max_avg(\%sendtable);
my @recvstats = min_max_avg(\%recvtable);
my @sendpercs = compute_percentages($sendstats[1], @sendstats);
my @recvpercs = compute_percentages($recvstats[1], @recvstats);

####################
# NODE-TO-NODE BANDWIDTH REPORT
# Given mpiGraph output, generate a report including bandwidth images.
####################

if ($mode eq "html") {
    # generate html report
    $dir = (@{$args{files}})[0] . "_html";
    mkdir $dir;

    # write out send bitmap
    output_table_bitmap(\%sendtable, "$dir/send.bmp", @sendstats);
    my $sendimgjs = "var sendimgjs = [\n" . join(",\n", @imagerows) . "\n];\n";

    # write out recv bitmap
    output_table_bitmap(\%recvtable, "$dir/recv.bmp", @recvstats);
    my $recvimgjs = "var recvimgjs = [\n" . join(",\n", @imagerows) .
"\n];\n"; 156 | 157 | # write out index.html 158 | print "Writing $dir/index.html ...\n"; 159 | open(HTML, ">$dir/index.html"); 160 | print HTML "\n"; 161 | 162 | # write javascript 163 | print HTML "\n"; 296 | 297 | # now write out main body starting with header 298 | print HTML "\n"; 299 | print HTML "

mpiGraph Details

\n"; 300 | print HTML "\n"; 301 | print HTML "\n"; 302 | my ($hostcount, $hoststr) = output_rankmap(\%sendtable, "$dir/map.txt"); 303 | print HTML "\n"; 304 | print HTML "\n"; 305 | my $passwd = `grep ":$testuser:" /etc/passwd`; 306 | my @parts = split(":", $passwd); 307 | print HTML "\n"; 308 | print HTML "\n"; 309 | print HTML "\n"; 310 | print HTML "\n"; 311 | print HTML "
Date:" . localtime($testtime) . "
Nodes:" . $hostcount . "
Node list:" . $hoststr . "
Run by:" . $parts[0] . " (" . $parts[4]. ")
MsgSize:$msgsize
Times:$times
Window:$window

\n"; 312 | print HTML "MPI rank to node mapping
"; 313 | 314 | my ($min, $max, $avg); 315 | 316 | # write out send bandwidth portion 317 | print HTML "

Send Bandwidth

\n"; 318 | print HTML "\n\n"; 319 | ($min, $max, $avg) = @sendstats; 320 | print HTML ""; 321 | ($min, $max, $avg) = @sendpercs; 322 | print HTML ""; 323 | print HTML "\n
min MB/smax MB/savg MB/s
$min$max$avg
$min$max$avg
\n"; 324 | print HTML ""; 325 | print HTML ""; 326 | print HTML "
\n"; 327 | 328 | # write out recv bandwidth portion 329 | print HTML "

Receive Bandwidth

\n"; 330 | print HTML "\n"; 331 | ($min, $max, $avg) = @recvstats; 332 | print HTML ""; 333 | ($min, $max, $avg) = @recvpercs; 334 | print HTML ""; 335 | print HTML "\n
min MB/smax MB/savg MB/s
$min$max$avg
$min$max$avg
\n"; 336 | print HTML "\n"; 337 | print HTML "\n"; 338 | print HTML "
\n";

    # close index.html
    print HTML "";
    close(HTML);

    # create histogram images
    `cat $dir/recv.bmp | anytopnm | ppmtopgm | pnmhistmap | pnmtopng > $dir/recv_hist.png`;
    `cat $dir/send.bmp | anytopnm | ppmtopgm | pnmhistmap | pnmtopng > $dir/send_hist.png`;

    # write out node performance pages
    # output_table(\%sendtable, "$dir/send.html", "$dir/send_rows_cols.html", @sendstats);
    # output_table(\%recvtable, "$dir/recv.html", "$dir/recv_rows_cols.html", @recvstats);

    my $pwd = `pwd`;
    chomp $pwd;
    print "Report complete: firefox file://$pwd/$dir/index.html\n";
}

# this reads in an mpiGraph output file and fills sendtable and recvtable
# it returns the parameters used to run the job, followed by the uid of the
# file's owner and the file's modification time
sub read_file
{
    my $file = shift @_;
    my $sendtable = shift @_;
    my $recvtable = shift @_;

    print "Reading $file...\n";
    my $sendflag = 0;
    my $recvflag = 0;
    open(IN, "<", $file) or die "Cannot open $file: $!\n";
    while (my $line = <IN>)
    {
        chomp $line;

        # empty lines indicate end of table
        if (not $line) {
            $sendflag = 0;
            $recvflag = 0;
        }

        my @parts = split('\t', $line);

        # process header info, may start read of table data
        if ($line =~ /MsgSize\t/) { $msgsize = $parts[1]; next; }
        elsif ($line =~ /Times\t/) { $times = $parts[1]; next; }
        elsif ($line =~ /Window\t/) { $window = $parts[1]; next; }
        elsif ($line =~ /Send\t/) { $sendflag = 1; @colnames = @parts; next; }
        elsif ($line =~ /Recv\t/) { $recvflag = 1; @colnames = @parts; next; }

        # if reading table, read in next row
        if ($sendflag or $recvflag) {
            $rh = $parts[0];
            $rh =~ s/[a-zA-Z\s]*$//; # strip off trailing letters and spaces
            $rh =~ s/^\s*\d+://; # strip off leading spaces and [0-9]+:
            if ($strip_mpi_rank) { $rh =~ s/:\d+$//; } # remove mpi rank from node name
            #TODO: hack to remove suffix starting with '-'
            #print "-- $rh -- ";
            $rh =~ s/\-\w*$//; # strip off trailing suffix starting with '-'
            #print "-- $rh -- \n";
            for(my $i = 1; $i < @parts; $i++) {
                $ch = $colnames[$i];
                if ($strip_mpi_rank) { $ch =~ s/:\d+$//; } # remove mpi rank from node name
                $ch =~ s/\-\w*$//; # strip off trailing suffix starting with '-'
                if ($sendflag) { push @{$$sendtable{$rh}{$ch}}, $parts[$i]; }
                if ($recvflag) { push @{$$recvtable{$rh}{$ch}}, $parts[$i]; }
            }
        }
    }
    close(IN);

    return $msgsize, $times, $window, (stat($file))[4,9];
}

# return avg of a list of numbers (undef for an empty list)
sub avg
{
    if (@_) {
        my $sum = 0;
        foreach my $val (@_) { $sum += $val; }
        return $sum / @_;
    }
}

# find min, max, and compute average of all off-diagonal values in table
sub min_max_avg
{
    my $table = shift @_;
    # min and max are seeded from the first off-diagonal value seen in the loop,
    # rather than from an arbitrary cell that could lie on the diagonal
    my ($min, $max);
    my $sum = 0;
    my $count = 0;
    foreach my $row (sort keys %$table) {
        foreach my $col (sort keys %{$$table{$row}}) {
            if ($col eq $row) { next; }
            my $value = avg(@{$$table{$row}{$col}});
            if (not defined $min or $value < $min) { $min = $value; }
            if (not defined $max or $value > $max) { $max = $value; }
            $sum += $value;
            $count++;
        }
    }
    if ($count == 0) { die "No node-to-node bandwidth values found\n"; }
    my $avg = $sum / $count;
    $min = sprintf("%0.3f", $min);
    $max = sprintf("%0.3f", $max);
    $avg = sprintf("%0.3f", $avg);
    return ($min, $max, $avg);
}

# output mapping of MPI rank to nodename
sub output_rankmap
{
    my $table = shift @_;
    my $outfile = shift @_;
    print "Writing $outfile ...\n";
    open(MAP, ">", $outfile) or die "Cannot write $outfile: $!\n";
    print MAP "Rank\tNode\n";
    my @nodes = ();
    my @rows = keys %$table;
459 | @rows = (sort {($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0]} @rows); 460 | $rank = -1; 461 | foreach $row (@rows) 462 | { 463 | if ($strip_mpi_rank) { ($node) = ($row =~ /([a-zA-Z]+\d*)/); $rank++; } 464 | else { ($node, $rank) = ($row =~ /([a-zA-Z]+\d*):(\d+)/); } 465 | if ($node) { 466 | print MAP "$rank\t$node\n"; 467 | push @nodes, $node; 468 | } 469 | } 470 | close(MAP); 471 | # my $nodelist = join(",", @nodes); # if you don't have compress(), use this instead 472 | my $nodelist = hostlist_lite::compress(@nodes); 473 | return (scalar(@nodes), $nodelist); 474 | } 475 | 476 | # write out bandwidth table 477 | # and average across rows and down columns for html report 478 | sub output_table 479 | { 480 | my $table = ""; 481 | my $outfile = ""; 482 | my $outfile_rows_cols = ""; 483 | my @stats = (); 484 | 485 | if ($mode eq "html") { 486 | $table = shift @_; 487 | $outfile = shift @_; 488 | $outfile_rows_cols = shift @_; 489 | @stats = @_; 490 | } else { 491 | $table = shift @_; 492 | @stats = @_; 493 | } 494 | if (not $outfile_rows_cols) { $outfile_rows_cols = $outfile; } 495 | 496 | print "Writing $outfile_rows_cols ...\n"; 497 | my ($min, $max, $avg) = @stats; 498 | if ($mode eq "html") { 499 | open(OUT, ">$outfile"); 500 | print OUT table(( 501 | row((cell("Min"), cell("Max"), cell("Avg"))), 502 | row((cell($min), cell($max), cell($avg))), 503 | )); 504 | close(OUT); 505 | } 506 | 507 | my @outrows = (); 508 | my %rowsum = (); 509 | my %colsum = (); 510 | my @rows = keys %$table; 511 | @rows = (sort {($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0]} @rows); 512 | my @cols = @rows; 513 | my $row_count = scalar(@rows); 514 | foreach my $r (@rows) { 515 | $rowsum{$r} = 0; 516 | $colsum{$r} = 0; 517 | } 518 | my $row_index = 0; 519 | foreach my $r (@rows) 520 | { 521 | my @outcells = (); 522 | foreach my $c (@cols) 523 | { 524 | my $row = $r; 525 | my $col = $c; 526 | $row =~ s/:[\w\s]*$//; 527 | $col =~ s/:[\w\s]*$//; 528 | my $val = avg(@{$$table{$r}{$c}}); 529 
| if ($row eq $col) { 530 | $color = ""; 531 | } else { 532 | $perf = $val / $max; 533 | $color = set_color($perf); 534 | $rowsum{$r} += $perf; 535 | $colsum{$c} += $perf; 536 | } 537 | push @outcells, cell($val, $color); 538 | } 539 | if ($args{dist}) { 540 | for(my $s = 0; $s < $row_index; $s++) { my $temp = shift @outcells; push @outcells, $temp; } 541 | } 542 | push @outrows, row(@outcells); 543 | $row_index++; 544 | } 545 | if ($mode eq "html") { 546 | my $br = "
\n"; 547 | open (OUT, ">$outfile_rows_cols"); 548 | 549 | print OUT "Ordered by slowest-to-fastest
\n"; 550 | print OUT "Ordered by MPI rank
\n"; 551 | 552 | print OUT $br; 553 | 554 | my @slowestrows = (sort {$rowsum{$a} <=> $rowsum{$b}} keys %rowsum); 555 | my @slowestcols = (sort {$colsum{$a} <=> $colsum{$b}} keys %colsum); 556 | my $minrow = sprintf("%0.1f", $rowsum{$slowestrows[ 0]} * 100 / ($row_count)); 557 | my $maxrow = sprintf("%0.1f", $rowsum{$slowestrows[-1]} * 100 / ($row_count)); 558 | my $mincol = sprintf("%0.1f", $colsum{$slowestcols[ 0]} * 100 / ($row_count)); 559 | my $maxcol = sprintf("%0.1f", $colsum{$slowestcols[-1]} * 100 / ($row_count)); 560 | print OUT "Minimum and maximum performance" . $br; 561 | print OUT "\n"; 562 | print OUT "\n"; 563 | print OUT "\n"; 564 | print OUT "
Min Row \%MaxMax Row \%MaxMin Col \%MaxMax Col \%Max
$minrow\%$maxrow\%$mincol\%$maxcol\%
\n"; 565 | 566 | print OUT $br, $br; 567 | 568 | # print average across row and column for each rank, in order of slowest to fastest 569 | print OUT "\n"; 570 | print OUT "Ordered by slowest-to-fastest" . $br; 571 | print OUT "\n"; 572 | print OUT "\n"; 573 | for(my $r = 0; $r < @slowestrows; $r++) 574 | { 575 | print OUT ""; 576 | my $i = $slowestrows[$r]; 577 | $perf = $rowsum{$i} / ($row_count); 578 | $perf = sprintf("%0.4f", $perf); 579 | $bw = sprintf("%0.1f", $perf*$max); 580 | $perc = sprintf("%0.1f", $perf*100); 581 | print OUT ""; 582 | print OUT ""; 583 | print OUT ""; 584 | 585 | $i = $slowestcols[$r]; 586 | $perf = $colsum{$i} / ($row_count); 587 | $perf = sprintf("%0.4f", $perf); 588 | $bw = sprintf("%0.1f", $perf*$max); 589 | $perc = sprintf("%0.1f", $perf*100); 590 | print OUT ""; 591 | print OUT ""; 592 | print OUT ""; 593 | print OUT "\n"; 594 | } 595 | print OUT "
Rowavg MB/s\%MaxColumnavg MB/s\%Max
" . $i . "" . $bw . "" . $perc . "%" . $i . "" . $bw . "" . $perc . "%
\n"; 596 | 597 | print OUT "

\n"; 598 | 599 | # print average across row and column for each rank, in order of MPI ranks 600 | print OUT "
\n"; 601 | print OUT "Ordered by MPI rank" . $br; 602 | print OUT "\n"; 603 | print OUT "\n"; 604 | for(my $r = 0; $r < @rows; $r++) 605 | { 606 | print OUT ""; 607 | my $i = $rows[$r]; 608 | $perf = $rowsum{$i} / ($row_count); 609 | $perf = sprintf("%0.4f", $perf); 610 | $bw = sprintf("%0.1f", $perf*$max); 611 | $perc = sprintf("%0.1f", $perf*100); 612 | print OUT ""; 613 | print OUT ""; 614 | print OUT ""; 615 | 616 | $perf = $colsum{$i} / ($row_count); 617 | $perf = sprintf("%0.4f", $perf); 618 | $bw = sprintf("%0.1f", $perf*$max); 619 | $perc = sprintf("%0.1f", $perf*100); 620 | print OUT ""; 621 | print OUT ""; 622 | print OUT ""; 623 | print OUT "\n"; 624 | } 625 | print OUT "
Rowavg MB/s\%MaxColumnavg MB/s\%Max
" . $i . "" . $bw . "" . $perc . "%" . $i . "" . $bw . "" . $perc . "
\n"; 626 | 627 | close(OUT); 628 | } else { 629 | print table(@outrows) . "\n"; 630 | print "\n"; 631 | } 632 | } 633 | 634 | # convert table to a list of rows, each with comma-separated values 635 | # then call write_bitmap() function to write bitmap file 636 | sub output_table_bitmap 637 | { 638 | my $table = shift @_; 639 | my $outfile = shift @_; 640 | my @stats = @_; 641 | 642 | print "Writing $outfile ...\n"; 643 | my @outrows = (); 644 | my ($min, $max, $avg) = @stats; 645 | if ($maxbw) { $max = $maxbw; } 646 | my @rows = keys %$table; 647 | @rows = (sort {($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0]} @rows); 648 | @cols = @rows; 649 | my $row_index = 0; 650 | foreach $row (@rows) { 651 | my @outcells = (); 652 | foreach $col (@cols) { 653 | if ($row eq $col) { 654 | $perf = 1; 655 | } else { 656 | my $value = avg(@{$$table{$row}{$col}}); 657 | $perf = $value / $max; 658 | if ($perf < 0.0) { print "ERROR: avg bw < 0: " . join(", ", @{$$table{$row}{$col}}), "\n"; } 659 | if ($perf > 1.0) { print "ERROR: avg bw > max bw: " . join(", ", @{$$table{$row}{$col}}), "\n"; } 660 | } 661 | if ($perf > 1.0) { 662 | print "ERROR: $perf > 1.0, setting to 1.0 and continuing ... \n"; 663 | $perf = 1.0; 664 | } 665 | if ($perf < 0) { 666 | print "ERROR: $perf < 0.0, setting to 0.0 and continuing ... 
\n";
                $perf = 0.0;
            }
            $color = int($perf * 255);
            push @outcells, $color;
        }
        if ($args{dist}) {
            for(my $s = 0; $s < $row_index; $s++) { my $temp = shift @outcells; push @outcells, $temp; }
        }
        push @outrows, join(",", @outcells);
        $row_index++;
    }
    write_bitmap($outfile, @outrows);
}

# write_bitmap(filename, @rows with comma-delimited column values)
sub write_bitmap
{
    my $bmpfile = shift @_;
    my @rows = @_;
    my @vals = split(',', $rows[0]);

    # get data dimensions
    my $height = scalar(@rows);
    my $width = scalar(@vals);

    # scale the image up so that it is at least 100 pixels tall
    my $factor = 1;
    while ($factor * $height < 100) { $factor *= 2; }
    my $height_scaled = $height * $factor;
    my $width_scaled = $width * $factor;

    # bitmap rows must have a byte count that is a multiple of 4
    # pad it out if necessary (this data won't be displayed)
    my $pad = ($width_scaled * 3) % 4;
    if ($pad > 0) { $pad = 4 - $pad; }

    # file size is the 54-byte header plus the pixel rows, including row padding
    my $filesize = ($width_scaled * 3 + $pad) * $height_scaled + 54;
    print "Writing $bmpfile ($width_scaled x $height_scaled) bitmap, $filesize bytes ...\n";

    # bitmap file format: see http://www.fortunecity.com/skyscraper/windows/364/bmpffrmt.html
    # open a file in binary mode and print bitmap file header
    open(OUT, ">", $bmpfile) or die "Cannot write $bmpfile: $!\n";
    binmode(OUT);
    print OUT "BM";                                          # magic number
    print OUT pack "I", $filesize;                           # total file size
    print OUT pack "xx";                                     # reserved
    print OUT pack "xx";                                     # reserved
    print OUT pack "I", 54;                                  # offset to pixel data
    print OUT pack "III", 40, $width_scaled, $height_scaled; # info header size, width, height
    print OUT pack "SS", 1, 24;                              # color planes, bits per pixel
    print OUT pack "IIIIII", 0, 0, 0, 0, 0, 0;               # no compression, remaining fields unset

    # print the pixel values
    @imagerows = (); # store pixel values to be printed in javascript notation (json)
    # run down the rows, last row first since bitmaps are stored bottom-up
    for ($y = $height-1; $y >= 0; $y--) {
        my $row = $rows[$y];
        chomp $row;
        my @vals = split(',', $row);
        # may use more than one pixel per data point depending on minimum image size
        for ($j = 0; $j < $factor; $j++) {
            my @imagerow = ();
            # print column values for this row
            for ($x = 0; $x < $width; $x++) {
                my $col = $vals[$x];
                # may use more than one pixel per data point depending on minimum image size
                for ($i = 0; $i < $factor; $i++) {
                    if ($col >= 256 or $col < 0) { print "ERROR: Invalid pixel value: $col\n"; }
                    print OUT pack "CCC", $col, $col, $col;
                    push @imagerow, $col;
                }
            }
            # pad out the row if necessary
            for($p=0; $p < $pad; $p++) {
                $col = 0;
                print OUT pack "C", $col;
            }
            push @imagerows, "[" . join(",", @imagerow) . "]";
        }
    }
    close(OUT);
}

# given a table reference, returns array of node names in javascript (json)
sub return_rankmapjs
{
    my $table = shift @_;
    my @rows = keys %$table;
    @rows = (sort {($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0]} @rows);
    my @nodes = ();
    my $node = "";
    my $rank = "";
    foreach my $row (@rows) {
        if ($strip_mpi_rank) { ($node) = ($row =~ /([a-zA-Z\-]+\d*)/); $rank++; }
        else { ($node, $rank) = ($row =~ /([a-zA-Z\-]+\d*):(\d+)/); }
        if ($node) { push @nodes, '"' . $node . '"'; }
    }
    return (scalar(@nodes), "var rankmap = [" . join(",", @nodes) . "];");
}

# returns color string for a value from [0,1]
sub set_color
{
    my $val = shift @_;
    if ($mode eq "html") {
        # just use grayscale in html
        # zero-pad to two hex digits so that the result is always a valid #rrggbb color
        my $gray = sprintf("%02x", int($val*255));
        return "#" . $gray . $gray .
$gray; 775 | } else { 776 | return ""; 777 | } 778 | 779 | my $len = scalar(@$legend); 780 | for($i=0; $i<$len; $i += 2) { 781 | if ($val < $$legend[$i]) { return $$legend[$i+1]; } 782 | } 783 | return $$legend[$len-1]; 784 | } 785 | 786 | # given the contents and color for a cell, 787 | # return a string representing the colored cell 788 | sub cell 789 | { 790 | my $content = shift @_; 791 | my $color = shift @_; 792 | 793 | my $spacing = 10; 794 | my $maxlen = 100; 795 | my $len = length($content); 796 | my $extra = ($len < $spacing) ? $spacing - $len : 0; 797 | if ($mode ne "html") 798 | { 799 | $content = ' ' x $extra . $content; 800 | $len = length($content); 801 | $offset = ($len > $maxlen) ? $len - $maxlen : 0; 802 | $content = substr($content, $offset); 803 | } 804 | if ($mode eq "html") { 805 | if ($color) { return "$content"; } 806 | return "$content"; 807 | } else { 808 | return $content; 809 | } 810 | } 811 | 812 | # given a list of cells, return a string representing the row 813 | sub row 814 | { 815 | if ($mode eq "html") { 816 | return "" . join("", @_) . ""; 817 | } else { 818 | return join("\t", @_); 819 | } 820 | } 821 | 822 | # given a list of rows, return a string representing the table 823 | sub table 824 | { 825 | if ($mode eq "html") { 826 | return "\n" . join("\n", @_) . "\n
\n"; 827 | } else { 828 | return join("\n", @_) . "\n"; 829 | } 830 | } 831 | -------------------------------------------------------------------------------- /hostlist_lite.pm: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2010, Lawrence Livermore National Security (LLNS), LLC 2 | # Produced at the Lawrence Livermore National Laboratory (LLNL) 3 | # Written by Adam Moody . 4 | # UCRL-CODE-232117. 5 | # All rights reserved. 6 | # 7 | # This file is part of mpiGraph. For details, see 8 | # http://www.sourceforge.net/projects/mpigraph 9 | # Please also read the Additional BSD Notice below. 10 | # 11 | # Redistribution and use in source and binary forms, with or without modification, 12 | # are permitted provided that the following conditions are met: 13 | # * Redistributions of source code must retain the above copyright notice, this 14 | # list of conditions and the disclaimer below. 15 | # * Redistributions in binary form must reproduce the above copyright notice, 16 | # this list of conditions and the disclaimer (as noted below) in the documentation 17 | # and/or other materials provided with the distribution. 18 | # * Neither the name of the LLNL nor the names of its contributors may be used to 19 | # endorse or promote products derived from this software without specific prior 20 | # written permission. 21 | # * 22 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 23 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 24 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 25 | # IN NO EVENT SHALL LLNL, THE U.S. 
DEPARTMENT 26 | # OF ENERGY OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 27 | # EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 28 | # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 29 | # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 30 | # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF 31 | # THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 32 | # 33 | # Additional BSD Notice 34 | # 1. This notice is required to be provided under our contract with the U.S. Department 35 | # of Energy (DOE). This work was produced at LLNL under Contract No. W-7405-ENG-48 36 | # with the DOE. 37 | # 2. Neither the United States Government nor LLNL nor any of their employees, makes 38 | # any warranty, express or implied, or assumes any liability or responsibility for 39 | # the accuracy, completeness, or usefulness of any information, apparatus, product, 40 | # or process disclosed, or represents that its use would not infringe privately-owned 41 | # rights. 42 | # 3. Also, reference herein to any specific commercial products, process, or services 43 | # by trade name, trademark, manufacturer or otherwise does not necessarily constitute 44 | # or imply its endorsement, recommendation, or favoring by the United States Government 45 | # or LLNL. The views and opinions of authors expressed herein do not necessarily state 46 | # or reflect those of the United States Government or LLNL and shall not be used for 47 | # advertising or product endorsement purposes. 48 | 49 | package hostlist_lite; 50 | use strict; 51 | 52 | # This package processes SLURM-style hostlist strings. 
53 | # 54 | # expand($hostlist) 55 | # returns a list of individual hostnames given a hostlist string 56 | # compress(@hostlist) 57 | # returns an ordered hostlist string given a list of hostnames 58 | # diff(\@hostlist1, \@hostlist2) 59 | # subtracts elements in hostlist2 from hostlist1 and returns list of remainder 60 | # intersect(\@hostlist1, \@hostlist2) 61 | # returns list of nodes that are in both hostlist1 and hostlist2 62 | # 63 | # 64 | # Author: Adam Moody (moody20@llnl.gov) 65 | 66 | # Returns a list of hostnames, given a hostlist string 67 | # expand("rhea[2-4,6]") returns ('rhea2','rhea3','rhea4','rhea6') 68 | sub expand { 69 | # read in our hostlist, should be first parameter 70 | if (@_ != 1) { 71 | return undef; 72 | } 73 | my $nodeset = shift @_; 74 | 75 | my $machine = undef; 76 | my @lowhighs = (); 77 | if ($nodeset =~ /([a-zA-Z]+)\[([\d,-]+)\]/) { 78 | # hostlist with brackets, e.g., atlas[2-5,28,30] 79 | $machine = $1; 80 | my @ranges = split ",", $2; 81 | foreach my $range (@ranges) { 82 | my $low = undef; 83 | my $high = undef; 84 | if ($range =~ /(\d+)-(\d+)/) { 85 | # low-to-high range 86 | $low = $1; 87 | $high = $2; 88 | } else { 89 | # single element range 90 | $low = $range; 91 | $high = $range; 92 | } 93 | push @lowhighs, $low, $high; 94 | } 95 | } else { 96 | # single node hostlist, e.g., atlas2 97 | $nodeset =~ /([a-zA-Z]+)(\d+)/; 98 | $machine = $1; 99 | push @lowhighs, $2, $2; 100 | } 101 | 102 | # produce our list of nodes 103 | my @nodes = (); 104 | while(@lowhighs) { 105 | my $low = shift @lowhighs; 106 | my $high = shift @lowhighs; 107 | for(my $i = $low; $i <= $high; $i++) { 108 | push @nodes, $machine . 
$i; 109 | } 110 | } 111 | 112 | return @nodes; 113 | } 114 | 115 | # Returns a hostlist string given a list of hostnames 116 | # compress('rhea2','rhea3','rhea4','rhea6') returns "rhea[2-4,6]" 117 | sub compress { 118 | if (@_ == 0) { 119 | return ""; 120 | } 121 | 122 | # pull the machine name from the first node name 123 | my @numbers = (); 124 | my ($machine) = ($_[0] =~ /([a-zA-Z]+)(\d+)/); 125 | foreach my $host (@_) { 126 | # get the machine name and node number for this node 127 | my ($name, $number) = ($host =~ /([a-zA-Z]+)(\d+)/); 128 | 129 | # check that all nodes belong to the same machine 130 | if ($name ne $machine) { 131 | return undef; 132 | } 133 | 134 | # record the number 135 | push @numbers, $number; 136 | } 137 | 138 | # order the nodes by number 139 | my @sorted = sort {$a <=> $b} @numbers; 140 | 141 | # TODO: toss out duplicates? 142 | 143 | # build the ranges 144 | my @ranges = (); 145 | my $low = $sorted[0]; 146 | my $last = $low; 147 | for(my $i=1; $i < @sorted; $i++) { 148 | my $high = $sorted[$i]; 149 | if($high == $last + 1) { 150 | $last = $high; 151 | next; 152 | } 153 | if($last > $low) { 154 | push @ranges, $low . "-" . $last; 155 | } else { 156 | push @ranges, $low; 157 | } 158 | $low = $high; 159 | $last = $low; 160 | } 161 | if($last > $low) { 162 | push @ranges, $low . "-" . $last; 163 | } else { 164 | push @ranges, $low; 165 | } 166 | 167 | # join the ranges with commas and return the compressed hostlist 168 | return $machine . "[" . join(",", @ranges) . 
"]"; 169 | } 170 | 171 | # Given references to two lists, subtract elements in list 2 from list 1 and return remainder 172 | sub diff { 173 | # we should have two list references 174 | if (@_ != 2) { 175 | return undef; 176 | } 177 | my $set1 = $_[0]; 178 | my $set2 = $_[1]; 179 | 180 | my %nodes = (); 181 | 182 | # build list of nodes from set 1 183 | foreach my $node (@$set1) { 184 | $nodes{$node} = 1; 185 | } 186 | 187 | # remove nodes from set 2 188 | foreach my $node (@$set2) { 189 | delete $nodes{$node}; 190 | } 191 | 192 | my @nodelist = (keys %nodes); 193 | if (@nodelist > 0) { 194 | my $list = hostlist_lite::compress(@nodelist); 195 | return hostlist_lite::expand($list); 196 | } 197 | return (); 198 | } 199 | 200 | # Given references to two lists, return list of intersection nodes 201 | sub intersect { 202 | # we should have two list references 203 | if (@_ != 2) { 204 | return undef; 205 | } 206 | my $set1 = $_[0]; 207 | my $set2 = $_[1]; 208 | 209 | my %nodes = (); 210 | 211 | # build list of nodes from set 1 212 | my %tmp_nodes = (); 213 | foreach my $node (@$set1) { 214 | $tmp_nodes{$node} = 1; 215 | } 216 | 217 | # keep nodes from set 2 that also appear in set 1 218 | foreach my $node (@$set2) { 219 | if (defined $tmp_nodes{$node}) { 220 | $nodes{$node} = 1; 221 | } 222 | } 223 | 224 | my @nodelist = (keys %nodes); 225 | if (@nodelist > 0) { 226 | my $list = hostlist_lite::compress(@nodelist); 227 | return hostlist_lite::expand($list); 228 | } 229 | return (); 230 | } 231 | 232 | 1; 233 | -------------------------------------------------------------------------------- /makefile: -------------------------------------------------------------------------------- 1 | all: clean 2 | mpicc -o mpiGraph mpiGraph.c 3 | 4 | debug: 5 | mpicc -g -O0 -o mpiGraph mpiGraph.c 6 | 7 | clean: 8 | rm -rf mpiGraph.o mpiGraph 9 | -------------------------------------------------------------------------------- /mpiGraph.c: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/LLNL/mpiGraph/5f6cbd9883f0204cc65ee5205f35518eab704ba7/mpiGraph.c --------------------------------------------------------------------------------